Garbage characters in result Title


  • Garbage characters in result Title

    I am using Zoom 5.1 Build 1011 Professional, CGI (Win32)

    Offline indexing of approx 7500 .htm pages (Title and page contents), 60000 unique words, Result display of title and context only.
    Language = Spanish, encoding windows-1252.

    When I search for w* (all words starting with "w"), the first two results are displayed as [;:]w[:;] and [;:]web.

    When I search for we* or web, the output is [;:]web (the only word matching in the dataset).

    All other results appear OK, including other short titles of 1, 2, or 3 characters.

    (This appears to be true for any single letter+*, e.g. z* displays [;:]z[:;], but most other letters give too many results...)

    The <title> tags for all these cases look fine, <title>w</title> and <title>web</title>. I tried adding blank spaces to make the titles longer, but the results are the same.

    Any suggestions how to avoid this problem would be greatly appreciated.

    Franz J Mayrhofer
    Spanish Department
    Gavilan College

  • #2
    There is a known issue described here and I think this is the same problem.

    However, it does not normally occur for searches like "z*". It is more likely when more than one search term attempts to highlight the same word (e.g. a search for "zoom z*" could trigger this bug). Having substring match enabled may also change this behaviour. Can you provide us with a URL to your search page so that we can take a closer look?
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Raymond,

      thanks for the quick response.

      I looked at the problem described by your link (Weird results with wild card searches * and ?) and did some more testing - while my problem may be related to the earlier issue, it appears to be much more widespread, and therefore more serious:

      The same problem also happens without the use of any wildcard characters or substring match. Any searches for 2 or 3 character words which are also used as the title of a page display the same [;:] or [:;] strings in the result.

      That is, searches for ir, no, si, tal, que... all have the same problem and display the title as [;:]ir[:;], [;:]no[:;], [;:]si[:;], [;:]tal, [;:]que ...

      Since the project I am working on is a Spanish Dictionary (the CD version of the "Diccionario panhispánico de dudas" of the Real Academia Española) it has many pages for 2 and 3 character words, and some of them are rather likely to be used as search terms.

      Since this is targeted for a CD, and not for a web page, I can't provide you with a URL, but I believe you should be able to duplicate the more general problem easily: search for 2 or 3 character words which are also used as the title of a page. The problem seems to be related to highlighting these short titles; other occurrences of the words in the results (context) are highlighted without problems.

      I also could mail you a preliminary copy of the CD if necessary...

      I have come up with a temporary "patch" for the problem, by calling the following function on load of the search page, but would obviously prefer a real fix.

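      // Workaround: strip stray highlight markers once the page has loaded.
      // createTextRange()/findText() belong to Internet Explorer's TextRange API.
      function pageInit() {
        var rng = document.body.createTextRange();
        // || short-circuits, so rng sits on the first marker that was found
        while (rng.findText("[;:]") || rng.findText("[:;]")) {
          rng.text = "";                          // delete the matched marker
          rng = document.body.createTextRange();  // restart from the top of the page
        }
      }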

      <body onLoad="pageInit();">
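
The same workaround can be sketched without the TextRange API, for browsers that do not provide it. This is only an illustration under assumptions: the two marker strings come from the posts above, while the function names `stripMarkers` and `cleanTextNodes` are invented here and are not part of Zoom. It will not catch a marker that happens to be split across two text nodes.

```javascript
// Marker strings as they appear in the search results described above.
const MARKERS = ["[;:]", "[:;]"];

// Remove every occurrence of both marker strings from a plain string.
function stripMarkers(text) {
  return MARKERS.reduce((out, marker) => out.split(marker).join(""), text);
}

// Walk all text nodes under `root` and clean them in place (browser only).
function cleanTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    const cleaned = stripMarkers(node.nodeValue);
    if (cleaned !== node.nodeValue) node.nodeValue = cleaned;
  }
}

// Only touch the DOM when one exists, so the script also loads outside a browser.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => cleanTextNodes(document.body));
}
```

Like the original patch, this only hides the symptom in the rendered page; the markers are still present in the HTML that the search CGI returns.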

      Franz J Mayrhofer
      Spanish Department
      Gavilan College



      • #4
        The problem cannot be reproduced as you have described. For example, try a search for *to* on our search page. It works as expected. Results 11 and 15 have "to" in the page title and the highlight is correct.

        To be fair, our search page uses the PHP option and not the CGI option, but I tested the CGI option as well and got the same good result.

        So it must be something special about your Zoom settings, your pages, or your system that is causing the problem.

        If you can't provide us with the URL, can you zip up the entire project and e-mail it to us, or make it available for download?



        • #5
          Maybe I wasn't entirely clear in my last message: in order to reproduce the problem, the title has to match the 2- or 3-character search word exactly.

          The example you suggest, searching for *to* on the Zoom search page, returns only pages with long titles. Even though some of those titles contain "to", they are highlighted correctly for me as well. Only cases where the search word and the title are exactly the same have the problem...

          E-mailing or downloading the project is not feasible, since it is 150 MB, but I will try to make a partial copy with the same symptoms. I'll let you know if I succeed.



          • #6
            OK, I have recreated the problem with a "mini" version of the CD project. It is still 8MB zipped, but better than 150MB...
            (Most of the dictionary word lists are empty, but letters U thru Z are populated, which provides plenty of examples)

            You can find the "MiniDPD.zip" file at http://hhh.gavilan.edu/fmayrhofer/download/MiniDPD.zip

            After downloading and unzipping it, start it by clicking the "Server2Go.exe" file. You should see a splash screen, possibly a black screen (while Server2Go is loading), another splash screen, and finally the "Diccionario panhispánico de dudas" Presentación screen.

            There are two search methods: the "basic" dictionary-list-of-words via the "Consulta" text box on the left, and the "advanced" search of the dictionary definitions accessed by clicking on the "Búsqueda avanzada" link at the top right of the page.

            Zoom search is only used for this "advanced" search.

            On the "Búsqueda avanzada" page, enter w* in the "Buscar:" text box and click the "Enviar" button.
            You should see the problem in the first two results, [;:]w[:;] and [;:]web

            Other search terms illustrating the problems are web, uno, ver, v?z, y*, yo, yen, ...

            Refer to the "search_template.html" file in the "cgi-bin" folder to activate the temporary "onLoad" patch mentioned in my previous post.

            Please let me know if you need anything else

            Franz J Mayrhofer
            Spanish Department
            Gavilan College



            • #7
              Thanks for the files and additional information, Franz.

              We've confirmed that this is a bug in the CGI build, and that it is different from the problem described in the other thread. Here, the highlighting process is aborted when the piece of text to be highlighted is very short (in your examples above, this would be the single-character or 3-character titles, etc.).

              We will fix this in the next build (5.1.1013). Thanks for bringing it to our attention.
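
The failure mode described above can be pictured with a toy two-pass highlighter. This is a guess at the general shape of such a bug, not Zoom's actual code: the assumption is that matches are first wrapped in placeholder markers and then converted to HTML, and that the second pass is skipped for very short text. `MIN_LEN`, the function names, and the `highlight` class are all invented for illustration, and the toy does not reproduce the exact output reported in the thread (for example, `[;:]web` with no closing marker).

```javascript
// Placeholder markers, matching the strings seen in the thread.
const OPEN = "[;:]";
const CLOSE = "[:;]";

// Pass 1: wrap each whole-word match of `term` in placeholder markers.
function markMatches(text, term) {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return text.replace(new RegExp(`\\b${escaped}\\b`, "gi"), (m) => OPEN + m + CLOSE);
}

// Pass 2: turn the placeholders into real HTML tags.
function renderMarkers(text) {
  return text.split(OPEN).join('<span class="highlight">').split(CLOSE).join("</span>");
}

// Buggy variant: gives up before pass 2 when the text is very short.
// MIN_LEN is an invented threshold, purely for illustration.
const MIN_LEN = 4;
function highlightBuggy(text, term) {
  const marked = markMatches(text, term);
  return text.length < MIN_LEN ? marked : renderMarkers(marked);
}

// Fixed variant: always runs both passes, whatever the text length.
function highlightFixed(text, term) {
  return renderMarkers(markMatches(text, term));
}
```

In this sketch a one-character title such as `w` comes back from `highlightBuggy` with the raw markers still in place, while a longer text such as `sitio web` is rendered normally, which mirrors the observation that only short titles were affected.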
              Last edited by Ray; Mar-05-2008, 06:33 AM.
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine



              • #8
                Update: The new build is up with this bug fixed. Download from here.
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine



                • #9
                  Thank you, Raymond, for the incredibly fast response and fix - I just upgraded to Build 1013, and the problem is corrected.

                  I have been adding Zoom search to more of my web pages, and to the CD versions, and am very pleased with the results - you have an outstanding product, and your excellent support system makes it even better.

                  Franz J Mayrhofer
                  Spanish Department
                  Gavilan College
