French Site - Query Problems

  • JCF1976
    Guest replied
    Originally posted by wrensoft
    I spent some time making up some example pages

    Thank you for your efforts.

    Originally posted by wrensoft
    ...but couldn't reproduce the main part of the problem you described in the end.

    I made two example pages using ASCII characters

    What method did you use, and what software, out of curiosity?

    Originally posted by wrensoft
    (no multibyte and no character entities in the HTML).

    Could you explain exactly what you mean?

    Originally posted by wrensoft
    I then set the page character sets to UTF-8 and ISO-8859-1 for the two files.

    In Zoom I selected the PHP option with the UTF-8 character set and checked this was carried over to the search_template file. I also set the 'Enable accent/diacritic/ligature insensitivity' option.

    I then did searches for the words you mentioned, both with and without accents. I got the same set of results with and without accents, as expected.

    I am curious how you actually got those results when searching with the accents. Does it matter how someone enters the words into the search field? I wonder whether typing with a French keyboard layout, copying and pasting, or some other method would affect the results. I also wonder whether you copied the words from a page that used ASCII in the HTML, or from a text editor where the words were not in ASCII. These are just things that occur to me and that I am curious about.
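On the question of whether the input method matters: one thing that can differ between input methods is the Unicode form of the accented letters. The minimal Python sketch below (not Zoom's code; the search script in this thread is PHP) shows that the same word can arrive either as precomposed letters (NFC) or as base letters followed by combining accents (NFD), and that the two compare unequal unless the search normalizes them first. Which form a given keyboard layout or copy-and-paste source actually produces is an assumption to check in practice.

```python
import unicodedata

# The same word in two Unicode forms: precomposed letters (NFC), which a French
# keyboard layout typically produces, and base letters followed by combining
# accents (NFD), which some copy-and-paste sources may supply.
precomposed = unicodedata.normalize("NFC", "délibérèrent")
decomposed = unicodedata.normalize("NFD", precomposed)

print(precomposed == decomposed)          # False: different code points
print(len(precomposed), len(decomposed))  # 12 15

# Normalizing both sides before comparing makes the difference disappear.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```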

    Originally posted by wrensoft
    ...However, I think you are right about the highlighting of the search word not working with this combination of configuration settings, character sets and accented search words. So we need to have a look at this part of the problem to see if it can be fixed or improved in the next patch release.

    It's good, then, that you see this and can work on the issue. I also noticed that when I click the links in the results, the jump-to works but the highlighting does not.



  • David
    replied
    I spent some time making up some example pages but couldn't reproduce the main part of the problem you described in the end.

    I made two example pages using ASCII characters (no multibyte characters and no character entities in the HTML). I then set the page character sets to UTF-8 and ISO-8859-1 for the two files.

    In Zoom I selected the PHP option with the UTF-8 character set and checked this was carried over to the search_template file. I also set the 'Enable accent/diacritic/ligature insensitivity' option.

    I then did searches for the words you mentioned, both with and without accents. I got the same set of results with and without accents, as expected.

    Here is a screenshot. [Screenshot not preserved.]
    However, I think you are right about the highlighting of the search word not working with this combination of configuration settings, character sets and accented search words. So we need to have a look at this part of the problem to see if it can be fixed or improved in the next patch release.
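The highlighting problem is a general one for accent-insensitive search: the match is found on accent-folded text, but the highlight has to be applied to the original, accented text. The minimal Python sketch below shows one way to do that, assuming the page text is already decoded, in NFC form, and plain text (no HTML tags to skip or escape). `fold_char` and `highlight` are hypothetical names; this is not how Zoom's PHP search script is implemented, and ligatures such as "œ" are not handled.

```python
import unicodedata

def fold_char(ch: str) -> str:
    """Fold one character: drop combining accents and lower-case it ('É' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def highlight(text: str, query: str, tag: str = "b") -> str:
    """Wrap accent-insensitive matches of query in <tag>...</tag>, keeping the page's own accents."""
    # Fold the text one character at a time and remember, for every folded
    # character, the index of the original character it came from.
    folded_chars, origin = [], []
    for i, ch in enumerate(text):
        for f in fold_char(ch):
            folded_chars.append(f)
            origin.append(i)
    folded_text = "".join(folded_chars)
    needle = "".join(fold_char(c) for c in query)
    if not needle:
        return text

    pieces, last = [], 0
    pos = folded_text.find(needle)
    while pos != -1:
        # Map the folded match back to a slice of the original text.
        start = origin[pos]
        end = origin[pos + len(needle) - 1] + 1
        pieces.append(text[last:start])
        pieces.append(f"<{tag}>{text[start:end]}</{tag}>")
        last = end
        pos = folded_text.find(needle, pos + len(needle))
    pieces.append(text[last:])
    return "".join(pieces)

print(highlight("Ils délibérèrent longtemps.", "delibererent"))
# Ils <b>délibérèrent</b> longtemps.
print(highlight("Il otera meme", "même"))
# Il otera <b>meme</b>
```

The key design choice is the `origin` list: because the comparison happens on folded text, the sketch keeps a map from folded positions back to original positions so the highlighted word keeps its accents on the page.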



  • JCF1976
    Guest replied
    PHP

    Originally posted by wrensoft
    Can you give a couple of examples of what words you are searching for and let us know what Zoom script option you are using (PHP, ASP, etc.).

    If you are only using plain ASCII, then you probably don't need UTF-8 and might try using the Windows-1252 (English/Latin) character set. Having said that, UTF-8 should still work. I am wondering, however, if the UTF-8 encoding of the French accented characters is different from the Windows-1252 encoding and if this is what is messing up the accent insensitivity option.

    Thanks for your response. Hopefully this can be resolved. I am using PHP, and the index covers about 1,200 pages. I wondered whether I needed UTF-8, since I am using ASCII. It's probably not necessary, but I also tend to think it doesn't hurt.

    Here are a couple examples of searches:

    délibérèrent

    ôtera même

    I almost tried the Windows-1252 crawl/index option. I can still try that and then encode the search page in Windows-1252, unless you think it is not necessary.
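Before switching, it may help to see what actually changes. Browsers encode submitted form data in the character set of the page that contains the form, so the same search word reaches the server as different bytes depending on whether the search page is UTF-8 or Windows-1252. The minimal Python sketch below illustrates this; the parameter name `query` is hypothetical and not necessarily the one Zoom's search form uses.

```python
from urllib.parse import quote, parse_qs

term = "délibérèrent"

# Browsers encode submitted form data in the charset of the page holding the
# form; "query" is a hypothetical parameter name used only for this sketch.
utf8_qs = "query=" + quote(term, encoding="utf-8")
cp1252_qs = "query=" + quote(term, encoding="cp1252")
print(utf8_qs)    # query=d%C3%A9lib%C3%A9r%C3%A8rent
print(cp1252_qs)  # query=d%E9lib%E9r%E8rent

# A script that assumes UTF-8 cannot decode the Windows-1252 bytes: each
# accented letter comes back as U+FFFD, so nothing in a UTF-8 index matches.
print(parse_qs(cp1252_qs, encoding="utf-8")["query"][0])
# Decoding with the charset the page actually used recovers the word.
print(parse_qs(cp1252_qs, encoding="cp1252")["query"][0])  # délibérèrent
```

So if the search page and the search script disagree about the character set, accented queries can break even when unaccented ones work, which is why the page encoding and the script's setting need to match.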

    I look forward to further responses, ASAP.



  • David
    replied
    Can you give a couple of examples of what words you are searching for and let us know what Zoom script option you are using (PHP, ASP, etc.).

    If you are only using plain ASCII, then you probably don't need UTF-8 and might try using the Windows-1252 (English/Latin) character set. Having said that, UTF-8 should still work. I am wondering, however, if the UTF-8 encoding of the French accented characters is different from the Windows-1252 encoding and if this is what is messing up the accent insensitivity option.
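The two encodings do differ for accented letters: "ê" is two bytes in UTF-8 but one byte in Windows-1252. If bytes written in one encoding are read as the other, the accent-folding step no longer sees an "ê" to fold. The minimal Python sketch below, with a hypothetical `strip_accents` helper, only illustrates that failure mode; it says nothing about where, if anywhere, Zoom mixes the two encodings.

```python
import unicodedata

def strip_accents(s: str) -> str:
    """Drop combining marks after canonical decomposition ('ê' -> 'e')."""
    return "".join(c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c))

word = "même"
print(word.encode("utf-8"))   # b'm\xc3\xaame'  ("ê" is two bytes)
print(word.encode("cp1252"))  # b'm\xeame'      ("ê" is one byte)

# Decoded with the charset it was written in, the word folds as intended.
print(strip_accents(word.encode("utf-8").decode("utf-8")))  # meme

# Decoded with the wrong charset, "ê" turns into "Ãª", and folding yields
# something that can no longer match "meme".
mojibake = word.encode("utf-8").decode("cp1252")
print(mojibake)                 # mÃªme
print(strip_accents(mojibake))  # mAªme
```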



  • JCF1976
    Guest replied
    Forum Members

    I look forward to receiving help from Ray and/or David, but if any forum members are French-speaking or have experience crawling, indexing and querying French sites, I would appreciate your contributions to this thread!



  • JCF1976
    Guest started a topic French Site - Query Problems


    Greetings. I have a French site that I just finished. The accents (in the HTML) are all in ASCII. This is very good for viewing in browsers, but may be causing a problem with Zoom, particularly with querying.
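If "in ASCII" here means the accented letters are written in the HTML as character references such as `&eacute;` (an assumption; the post doesn't show the markup), then an indexer has to decode those references before it can fold accents or compare words. Otherwise it would store something like "d&eacute;lib..." and a query typed as "délibérèrent" would not match it. The minimal Python sketch below uses the standard library's `html.unescape` on made-up sample text.

```python
import html

# Accented letters written with ASCII-only character references, as they might
# appear in the page source (hypothetical sample text).
source = "Ils d&eacute;lib&eacute;r&egrave;rent ; il &ocirc;tera m&#234;me."
text = html.unescape(source)
print(text)  # Ils délibérèrent ; il ôtera même.
```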

    All of my pages are in UTF-8. I enabled the "Enable accent/diacritic/ligature insensitivity" setting, and I selected the UTF-8 setting.

    What happens:

    #1: If I run a query using words with accent marks, it doesn't return the matching results from the index.

    #2: If I run a query without the accents, it returns the results (which contain the accents) but doesn't highlight the words.

    What am I doing wrong? I need this to work properly. I would like visitors to be able to search for words with or without the accent marks and get the right results.
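The behaviour being asked for is usually achieved by folding both the indexed words and the query to the same accent-free, lower-case form before comparing them. The minimal Python sketch below shows that idea with a small in-memory dictionary standing in for the index. `fold`, `index`, and `search` are hypothetical names, not Zoom's API, and ligatures are expanded by hand because Unicode NFD normalization does not split "œ" or "æ".

```python
import unicodedata

# Ligatures are expanded by hand: NFD normalization leaves "œ" and "æ" intact.
LIGATURES = str.maketrans({"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"})

def fold(term: str) -> str:
    """Accent-, ligature- and case-fold a term: 'Délibérèrent' -> 'delibererent'."""
    decomposed = unicodedata.normalize("NFD", term.translate(LIGATURES))
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()

# A toy index mapping folded keys to the words as they appear on the pages.
index: dict[str, list[str]] = {}
for word in ["délibérèrent", "ôtera", "même", "cœur"]:
    index.setdefault(fold(word), []).append(word)

def search(query: str) -> list[str]:
    # The query is folded the same way, so accented and unaccented input
    # land on the same key.
    return index.get(fold(query.strip()), [])

print(search("delibererent"))  # ['délibérèrent']
print(search("DÉLIBÉRÈRENT"))  # ['délibérèrent']
print(search("coeur"))         # ['cœur']
```

Because the index keeps the original accented words alongside the folded keys, results can still be displayed with their accents even though matching ignores them.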

    (I have V5 Pro)