As far as we are aware, there is no issue to be resolved. We did some testing and posted the results of our tests (see above), but didn't see the problem you are talking about. Once the configuration is correct, it works fine for French as far as we know, and no one has provided an example to the contrary.
So unless you are prepared to provide exact details of your configuration and copies of your input files, we don't plan to investigate this issue further. Without them, there is nothing for us to investigate.
French Site - Query Problems
Guest replied: Further Work
David, I did further work on this problem. I made another copy of the whole site and used a find-and-replace function to replace all of the ASCII-coded characters (HTML entities) with the actual accented characters. I also downloaded the most recent version of your software and used it to crawl the site again.
I am restricted to offline searches. I don't know whether that makes a difference; I am sure you would say that it does not. If Zoom would obey my online robots.txt file, I could try crawling the site online to see whether that makes a difference. This is another reason why you would not be able to crawl the site and do testing yourselves.
I see a slight improvement after the work I did, and possibly after the work you have done on your program. If I do a one-word search with accents in the word, it does come up in the results, so it appears that Zoom did index the accented words. A multi-word search also gives results, but it's hard to tell exactly what kind of results I am getting. I have also tried multi-word searches in quotes that contain accented words, and these do not work. So this appears to be where your indexing breaks down.
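The thread does not say how Zoom matches quoted phrases internally. Purely as an illustration, the sketch below shows one common approach to accent-insensitive phrase matching: fold every word to a base form (Unicode NFD decomposition with the combining accents dropped, then lowercased) and apply exactly the same folding to the indexed text and to the quoted query before comparing word sequences. The names `fold` and `phrase_match` are hypothetical and are not part of Zoom.

```python
import unicodedata


def fold(word: str) -> str:
    """Hypothetical helper: strip accents and case, e.g. 'Même' -> 'meme'."""
    # NFD splits a letter such as 'ê' into 'e' plus a combining circumflex.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks, keep the base letters, then lowercase.
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def phrase_match(text: str, phrase: str) -> bool:
    """Hypothetical helper: True if the phrase's words occur consecutively in text."""
    words = [fold(w) for w in text.split()]
    target = [fold(w) for w in phrase.split()]
    n = len(target)
    # Slide a window of n words across the text and compare folded sequences.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


page = "Les ténèbres étaient à la surface de l'abîme"
print(phrase_match(page, "ténèbres étaient à la surface"))
print(phrase_match(page, "tenebres etaient a la surface"))
```

With this approach, accented and unaccented spellings of the same phrase are treated alike, while the words must still appear in the quoted order.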
Also, the words are not highlighted when I search with accented words. This is disappointing, of course, and I hope you will be able to fix it.
I downloaded your files and took a look at them. I also reviewed the searches you performed. Again, I saw that you did not try any searches with two or more accented words in quotes. This kind of search is critical on my site.
I look forward to your further responses. Based on m00di's posts, it's evident that others are also interested in this being resolved.
Guest replied:
Originally posted by m00di:
Hi
Did you find a solution for this? I am having the same issue.
Thanks
Guest replied: Thanks! I'll download it and test it this evening!
I have uploaded the set of working example files I made to our server. So instead of us trying to reproduce the problem with your files (which we don't have), you can try to provoke the problem by editing our files, or work out what is different by comparing your files with ours.
You can download the set of files here:
http://www.wrensoft.com/test/french/accenttest.zip
and see it working here:
http://www.wrensoft.com/test/french/...uery=m%C3%AAme
http://www.wrensoft.com/test/french/...oom_query=meme
This set of index files was generated with the UTF-8 option selected in Zoom 5 on a Windows XP machine. I tested the search behaviour on Windows/PHP and Unix/PHP, and it was the same.
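The first test URL carries the query as `m%C3%AAme`, which is "même" percent-encoded from its UTF-8 bytes (ê is the two bytes C3 AA). The sketch below, using only Python's standard `urllib.parse`, shows how the same word looks under each encoding. The ISO-8859-15 form is included only for comparison; how a single-byte page would actually submit the query is an assumption, not something tested in this thread.

```python
from urllib.parse import quote, unquote

word = "même"

# Percent-encode from UTF-8 bytes (quote's default), as in the test URL.
as_utf8 = quote(word)
# Percent-encode from ISO-8859-15 bytes, where ê is a single byte (0xEA).
as_latin9 = quote(word, encoding="iso-8859-15")
print(as_utf8, as_latin9)

# Decoding the single-byte form as UTF-8 cannot recover the ê:
# unquote's default errors="replace" substitutes U+FFFD instead.
print(repr(unquote(as_latin9)))
# Decoding it with the matching encoding works.
print(unquote(as_latin9, encoding="iso-8859-15"))
```

If the page submitting the query and the script reading it disagree about the encoding, this is the kind of mismatch that turns an accented query into unrecognisable text.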
Guest replied: I'd rather not
Originally posted by wrensoft:
Can you put the HTML pages in question on a public web site where we can see the files? Or put the entire search function on a public site and post the URL. E-mailing us your Zoom configuration file would also help us match your configuration.
Guest replied: UTF-8
Okay, I switched everything back to UTF-8 and recrawled the files that had been converted from ASCII codes to accented French characters, then reposted everything with the changes. Now we're back to the original problem: queries with accented characters do not return any results.
Guest replied: Vouliez-vous dire: tenebres autant au la surface?
I noticed that the suggested search (the "Vouliez-vous dire", i.e. "Did you mean", line) isn't correct either:
Vouliez-vous dire: tenebres autant au la surface?
instead of:
Vouliez-vous dire: tenebres etaient a la surface?
Guest replied: ISO-8859-15 setting
By the way, it should go without saying that when I crawled the files locally with Zoom, I used the ISO-8859-15 setting.
Guest replied: ténèbres étaient à la surface
Okay, I just changed all of the ASCII characters to actual accented French vowels. Before changing the vowels, I also changed the encoding of all the files to ISO-8859-15. I reran the Zoom crawler locally, uploaded the new files, and ran a query with the following words:
ténèbres étaient à la surface
The search result page displayed:
Résultats de la recherche pour : ta©našbres a©taient a la surface dans toutes les categories
and in fact, the search field itself displays:
ténÚbres étaient à la surface
instead of:
ténèbres étaient à la surface
I had also changed the encoding of the search template to ISO-8859-15, so I am not sure what to make of this.
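One possible reading of the garbled heading (an assumption; the thread does not confirm the cause) is that the UTF-8 bytes of the query were decoded as ISO-8859-15 somewhere along the way. In UTF-8, é and è are each two bytes beginning with C3, which ISO-8859-15 shows as Ã followed by © or š. That closely resembles the "ta©našbres" on the results page, where the Ã also seems to have been reduced to a plain a. The sketch reproduces only the decoding step:

```python
query = "ténèbres étaient à la surface"

# Assumed failure mode: bytes written as UTF-8 but read back as ISO-8859-15.
garbled = query.encode("utf-8").decode("iso-8859-15")
print(garbled)

# Every byte maps to some ISO-8859-15 character, so nothing is lost yet:
# reversing the two steps restores the original text.
restored = garbled.encode("iso-8859-15").decode("utf-8")
print(restored == query)
```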
Guest replied: More testing
I am going to do some more testing this weekend, including converting the ASCII characters to real French characters IN the code. (Don't worry! I'll do the testing on a copy of the site.)
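A bulk find-and-replace like this can be scripted. The sketch below is a hypothetical helper, not a Zoom feature: it converts only the HTML character entities that stand for a single letter (such as `&eacute;`, `&#233;` or `&#xE9;`) and leaves markup-significant ones like `&lt;`, `&amp;` and `&nbsp;` alone, because blindly unescaping every entity can break the page. After conversion, the file still has to be saved in the encoding that its charset declaration names.

```python
import html
import re

# Named (&eacute;), decimal (&#233;) and hex (&#xE9;) entity references.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_letter_entities(text: str) -> str:
    """Hypothetical helper: replace entities that decode to one letter.

    Entities that decode to anything else (&lt;, &gt;, &amp;, &nbsp;, or
    unknown names) are kept verbatim so the HTML structure is unchanged.
    """
    def replace(match: re.Match) -> str:
        char = html.unescape(match.group(0))
        return char if len(char) == 1 and char.isalpha() else match.group(0)

    return ENTITY.sub(replace, text)


if __name__ == "__main__":
    sample = "<p>Les t&eacute;n&egrave;bres &#233;taient &agrave; la surface &amp; &lt;b&gt;</p>"
    print(decode_letter_entities(sample))
```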
Guest replied: ISO-8859-15
I found this information helpful (from http://en.wikipedia.org/wiki/ISO_8859-1):
ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):
... French (missing Œ, œ and rare Ÿ)
* Note that Windows-1252 and ISO-8859-15 do contain these.
... Relationship to ISO/IEC 8859-15
Although ISO/IEC 8859-1 has enough characters for most French text, it is missing a few less-common letters. It is also missing a single-glyph representation for the letter IJ, two Finnish letters used for transcription of some foreign names and in a few loanwords (Š and Ž), typographic quotation marks and dashes, and common symbols such as the euro sign (€) and dagger (†).
In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
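The quoted passage can be checked directly: Python's built-in codecs show exactly which byte values ISO-8859-15 reassigns relative to ISO-8859-1. The sketch below lists them. Note that 0xA6 is the broken bar ¦ in ISO-8859-1, and that the accented letters discussed in this thread (é, è, ê, à) keep the same byte values in both encodings.

```python
def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte value with the given single-byte encoding."""
    return bytes([value]).decode(encoding)


# Byte values whose character differs between the two encodings.
changed = [b for b in range(256)
           if decode_byte(b, "iso-8859-1") != decode_byte(b, "iso-8859-15")]

for b in changed:
    print(f"0x{b:02X}: {decode_byte(b, 'iso-8859-1')} -> {decode_byte(b, 'iso-8859-15')}")

# The French letters used in this thread encode identically in both.
for letter in "éèêà":
    print(letter, letter.encode("iso-8859-1") == letter.encode("iso-8859-15"))
```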
The test files were made using a text editor.
Please explain in more detail exactly what you mean.
HTML character entities are special strings, defined in the WWW standards, that are used to represent special characters, including accented characters in some character sets.
It should not matter whether you cut and paste or type in the accented characters, provided of course that you aren't forcing a Unicode-to-single-byte conversion on a multibyte character. That should not be the case here, as the accented characters in question are represented by a single byte.
So we need more details, and perhaps copies of your HTML pages, if we are going to reproduce the problem.
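To illustrate the point about single-byte characters: an entity such as `&egrave;` and a typed or pasted è end up as the same character, and in ISO-8859-15 each accented letter takes one byte, whereas UTF-8 uses two. The word below is taken from the query earlier in the thread. The sketch uses only Python's standard `html` module and built-in codecs, and the euro sign line shows what a forced conversion does to a character that has no single-byte slot.

```python
import html

# Entity form (as in the original pages) and the literal accented form.
from_entities = html.unescape("t&eacute;n&egrave;bres")
typed = "ténèbres"
print(from_entities == typed)

# Same eight characters, different byte counts per encoding.
latin9_bytes = typed.encode("iso-8859-15")
utf8_bytes = typed.encode("utf-8")
print(len(typed), len(latin9_bytes), len(utf8_bytes))

# A character with no ISO-8859-1 byte cannot survive a forced conversion.
print("€".encode("iso-8859-1", errors="replace"))  # b'?'
```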