
Capital letters in Russian UTF-8 problem


  • Capital letters in Russian UTF-8 problem

    I have just downloaded the Free Edition of Zoom to try it before deciding whether to buy. My site is multilingual (Russian/English/German) and uses UTF-8 encoding. Indexing 50 pages went fine. But when I search, it seems impossible to find any word that starts with a capital Russian letter (family names, towns, the first words of sentences, etc.). The "Search result for" string shows that the query word was converted to lowercase correctly. Also, when I search for adjacent words I can see the problem words in the results, so they must have been indexed. What is wrong? Searching for English words causes no problems at all.

  • #2
    Can you post the URL of your website's search function and details of the words you are searching for, so that we can see the problem?

    -----
    David


    • #3
      Here is the search page of my site: http://www.icon-art.info/search/search.php.

      I am not sure that I can post Cyrillic words on this forum, so the idea is as follows. Have a look at this page: http://www.icon-art.info/library.php?lng=ru. It has been indexed; you can check this by searching for any lowercase word. But if you search for any word that starts with a capital Cyrillic letter (e.g. the family names of authors), the search fails.


      • #4
        We have had a look at the site and agree the behaviour is not correct.

        We don't have a full solution yet and will not be able to investigate the problem in detail until after Christmas.

        So if you can leave the search page on your web site for a couple of weeks that would be good.

        As a temporary solution, you could remove the following lines of code from search.php:

        if ($UseUTF8 == 1 && function_exists('mb_strtolower'))
            $query = mb_strtolower($query, "UTF-8");
        else

        This skips the conversion of Russian search words to lowercase, so you should then be able to do case-sensitive searches in Russian.
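
To illustrate what the lowercasing step does to a Russian query, here is a minimal sketch. The function name `lowercase_query`, its parameters, and the sample word are made up for illustration; only the `mb_strtolower` call and the `function_exists` check mirror the lines quoted above. The `strtolower` fallback is an assumption about what follows the `else`, since that part of search.php is not shown in this thread.

```php
<?php
// Hypothetical helper, not taken from Zoom's search.php.
// It mirrors the quoted check: use the multibyte-aware function
// only when UTF-8 mode is on and the mbstring extension is loaded.
function lowercase_query(string $query, bool $useUtf8): string
{
    if ($useUtf8 && function_exists('mb_strtolower')) {
        // Folds Cyrillic capitals as well as ASCII ones.
        return mb_strtolower($query, 'UTF-8');
    }
    // Assumed fallback (not shown in the thread): byte-oriented lowercasing.
    return strtolower($query);
}

$typed   = "Москва";                       // query as the visitor typed it
$lowered = lowercase_query($typed, true);  // query after the lowercasing step

// When mbstring is available the two strings differ, so a search for
// $typed and a search for $lowered look up different index entries.
$caseChanged = ($typed !== $lowered);
```

If the lowercasing step is skipped, the query keeps the capitalisation the visitor typed, which is why the workaround gives case-sensitive searches rather than a real fix.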

        ---
        David


        • #5
          Originally posted by Wrensoft
          So if you can leave the search page on your web site for a couple of weeks that would be good.
          The search page will stay there for as long as you need for your investigations. Good luck and Merry Christmas!


          • #6
            We've fixed this problem in the latest build (4.2.1007) released today. This is available for download here:
            http://www.wrensoft.com/zoom/whatsnew.html

            The latest version should now be able to perform case-insensitive searches on Cyrillic words (for UTF-8 encoded websites) without any problems. This also applies to other languages encoded in UTF-8.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
