Words missing from search

  • Words missing from search

    I have used Zoom to search a small medical site of about 50 pages, with the output in JavaScript. The search engine has found a list of about 8000 unique words. However, the search does not seem to have found medical terms such as drug names or names of specific medical conditions. Is there a simple explanation for this?

  • #2
    The free version of the software is limited to 50 pages, so maybe some pages were not indexed? If so, words that appear only on those pages would not be indexed.

    If a word has more than 40 characters in it, it will be broken up into multiple words.

    Can you give some examples of words not indexed and the pages they are found on?

    -----
    David


    • #3
      Thanks for your response.

      The search engine appears to have tried to index all the pages I expected it to (judging by the output in verbose mode). However, the word list is incomplete. Since this is an intranet site, I would need to email sample pages to you, if this is possible.


      • #4
        Yes, you can email your files to us at the address on the top of our support page.

        Can you give us some examples of the drug names and search terms in question? For example, do they contain an unusual character, or are they hyphenated, such as "blah-somethingitis"? The way that words are indexed depends on the "Indexing Options" specified in the Configuration window. You may find that your options are causing certain words to be split up.
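To illustrate how an indexing option can split a hyphenated term, here is a minimal Python sketch. It is not Zoom's actual tokenizer: the `tokenize` function and its `join_chars` parameter are hypothetical names invented for this example, and the splitting rule is an assumption.

```python
import re


def tokenize(text, join_chars=""):
    """Split text into lowercase words.

    Characters listed in join_chars (a hypothetical option) are treated as
    part of a word instead of as separators.
    """
    # Letters and digits always belong to a word; join_chars are added on request.
    pattern = r"[A-Za-z0-9" + re.escape(join_chars) + r"]+"
    return [word.lower() for word in re.findall(pattern, text)]


# Hyphen treated as a separator: the term is indexed as two words.
print(tokenize("blah-somethingitis"))                  # ['blah', 'somethingitis']

# Hyphen treated as a word joiner: the term is indexed as one word.
print(tokenize("blah-somethingitis", join_chars="-"))  # ['blah-somethingitis']
```

With the first setting, a search for the full hyphenated term only matches if the search side splits the query the same way, which is why the option in use matters.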
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          Thanks for the response. The sorts of drug names not found are dopamine, dobutamine and isoprenaline. Various eponymous syndromes are also not being found.

          I have sent an example page and output from Zoom search by email.


          • #6
            I can see the problem. These words don't appear in the visible body text of the document. They are present in the HTML, but only as the alternate text of an image.

            This is your HTML:

            Code:
            [img]Drug%20Dosage_files/shapeimage_2.png[/img]
            So all the text is obscured by the image shapeimage_2.png. The text you want indexed never appears on the page as text, so it is not indexed.

            In this release of Zoom we don't index images, so there isn't much point indexing the alternate image text (especially 10KB of alternate text).

            In their document, Techniques For Accessibility Evaluation And Repair Tools, the Evaluation and Repair Tools Working Group of the W3C Web Accessibility Initiative recommends a maximum of 150 characters for alternate text.
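To find which exported pages have this problem, a small script can flag images whose alternate text exceeds the 150-character guideline. This is a minimal sketch using Python's standard `html.parser` module. It assumes the page text ended up in the `alt` attribute of an `<img>` tag. The sample markup is made up for illustration and is not the actual page from this thread.

```python
from html.parser import HTMLParser

ALT_LIMIT = 150  # maximum alternate-text length quoted from the W3C document


class AltTextAudit(HTMLParser):
    """Collect <img> tags whose alt text is longer than ALT_LIMIT characters."""

    def __init__(self):
        super().__init__()
        self.long_alts = []  # list of (src, alt_length) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt") or ""  # a missing or valueless alt counts as empty
        if len(alt) > ALT_LIMIT:
            self.long_alts.append((attr_map.get("src") or "", len(alt)))


def find_long_alts(html):
    """Return (src, alt_length) for every image with oversized alt text."""
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.long_alts


# Made-up sample: one image carrying a large block of text in its alt attribute.
sample = '<p>Dosage</p><img src="shapeimage_2.png" alt="' + "dopamine " * 50 + '">'
for src, length in find_long_alts(sample):
    print(f"{src}: alt text is {length} characters")
```

Running `find_long_alts` over each exported page would list the pages where the text has been turned into an image and so will not be indexed.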

            -----
            David


            • #7
              Thanks for your advice.

              I suppose this is the problem with my using what seems to be a simple WYSIWYG program to produce the HTML code from existing Word documents. Looking at the various pages it has produced, in some cases the text has been turned into an enormous image file and in others it has not, and the latter have indexed properly. I suspect this may be related to the amount of formatting present in the original documents, or to odd fonts.
