I have just downloaded the Free Edition of Zoom to try it before deciding whether to buy. My site is multilingual (Russian/English/German) and uses UTF-8 encoding. Indexing of 50 pages completed without problems. But when I search, it seems impossible to find any word that starts with a capital Russian letter (family names, towns, first words of sentences, etc.). The "Search result for" string shows that the query word was converted to lowercase correctly. Also, when I search for adjacent words, I can see the problem words in the results, so they must have been indexed. What is wrong? Searching for English words causes no problems at all.
Capital letters in Russian UTF-8 problem
-
Here is the search page of my site: http://www.icon-art.info/search/search.php.
I am not sure I can post Cyrillic words on this forum, so the idea is as follows. Have a look at this page: http://www.icon-art.info/library.php?lng=ru - it has been indexed. You can check this by searching for any lowercase word from it. But if you search for any word that starts with a capital Cyrillic letter (e.g. the family names of authors), the search fails.
-
We have had a look at the site and agree the behaviour is not correct.
We don't have a full solution yet and will not be able to investigate the problem in detail until after Christmas.
So it would be good if you could leave the search page on your web site for a couple of weeks.
As a temporary solution, you could remove the following lines of code in search.php:
if ($UseUTF8 == 1 && function_exists('mb_strtolower'))
    $query = mb_strtolower($query, "UTF-8");
else
This will avoid the conversion of Russian search words into lower case, and you should then be able to do case-sensitive searches in Russian.
---
David
-
We've fixed this problem in the latest build (4.2.1007) released today. This is available for download here:
http://www.wrensoft.com/zoom/whatsnew.html
The latest version should now be able to perform case-insensitive searches on Cyrillic words (for UTF-8 encoded websites) without any problems. This also applies to other foreign languages encoded with UTF-8.
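For readers wondering why case-insensitive matching needs special handling in UTF-8, here is a minimal sketch. It is not Zoom's actual code: normalize_word(), the sample word list and the query are all hypothetical, and it assumes PHP 7.4 or later with the mbstring extension loaded. The idea is that the query and the indexed words must both be lowercased with the same UTF-8-aware function before they are compared.

```php
<?php
// Hypothetical helper, not part of Zoom: lowercases a UTF-8 string,
// falling back to byte-wise strtolower() when mbstring is missing.
function normalize_word(string $word): string
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($word, 'UTF-8');
    }
    return strtolower($word); // only reliable for ASCII letters
}

// Hypothetical indexed words, stored as they appear on the page.
$indexed = ['Москва', 'икона', 'Library'];
$query   = 'МОСКВА';

// Normalize both sides the same way before comparing.
$needle  = normalize_word($query);
$matches = array_values(array_filter(
    $indexed,
    fn($w) => normalize_word($w) === $needle
));

print_r($matches);
```

With mbstring available, this should list only the capitalized word Москва as a match. If only the query were lowercased and the stored words were left as they are, the comparison would fail for every capitalized Cyrillic word, which resembles the symptom described at the top of this thread.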