I have used Zoom to search a small medical site of about 50 pages, with the output in JavaScript. The search engine has found a list of about 8000 unique words. However, the search does not seem to have found medical terms such as drug names or names of specific medical conditions. Is there a simple explanation for this?
Words missing from search
-
The free version of the software is limited to 50 pages, so some pages may not have been indexed. Words that appear only on those pages would then be missing from the index.
If a word has more than 40 characters in it, it will be broken up into multiple words.
Can you give some examples of words that were not indexed and the pages they are found on?
-----
David
-
Thanks for your response.
The search engine appears to have tried to index all the pages I expected it to (judging by the output in verbose mode). However, the word list is incomplete. Since this is an intranet site, I would need to email sample pages etc. to you, if this is possible.
-
Yes, you can email your files to us at the address on the top of our support page.
Can you give us some examples of the drug names and search terms in question? For example, do they contain an unusual character, or are they hyphenated, such as "blah-somethingitis"? The way that words are indexed depends on the "Indexing Options" specified in the Configuration window. You may find that your options are causing certain words to be split up.
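To illustrate why a hyphenated term can go missing, here is a minimal Python sketch of a word splitter. It is not Zoom's actual indexing code; the function `tokenize` and its `join_chars` parameter are hypothetical names, standing in for whatever the "Indexing Options" control.

```python
import re

def tokenize(text, join_chars=""):
    """Split text into lowercase words.

    Characters listed in join_chars are treated as part of a word
    instead of as separators (a hypothetical stand-in for an
    indexing option, not Zoom's real behaviour).
    """
    pattern = "[A-Za-z0-9" + re.escape(join_chars) + "]+"
    return [word.lower() for word in re.findall(pattern, text)]

# Hyphen treated as a separator: the term is stored as two words.
print(tokenize("blah-somethingitis"))

# Hyphen treated as a word character: the term is stored whole.
print(tokenize("blah-somethingitis", join_chars="-"))
```

If the index stores the two halves separately, a search for the whole hyphenated term may find nothing unless the query is split the same way.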
-
I can see the problem. These words don't appear in the body of the document. Well they do, but they really don't. Kind of. Sort of.
This is your HTML,
Code:[img]Drug%20Dosage_files/shapeimage_2.png[/img]
In this release of Zoom we don't index images, so there isn't much point indexing the alternate image text (especially 10KB of alternate text).
In its document, Techniques For Accessibility Evaluation And Repair Tools, the Evaluation and Repair Tools Working Group of the W3C Web Accessibility Initiative recommends a maximum of 150 characters for alternate text.
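If you want to find which pages have this problem, a rough Python sketch like the one below lists every image whose alternate text is longer than the 150 characters mentioned above. It uses only the standard library; the names `AltTextAudit`, `find_long_alts` and `ALT_LIMIT` are made up for this example and have nothing to do with Zoom itself.

```python
from html.parser import HTMLParser

ALT_LIMIT = 150  # recommended maximum length for alternate text

class AltTextAudit(HTMLParser):
    """Collect <img> tags whose alt text is longer than ALT_LIMIT."""

    def __init__(self):
        super().__init__()
        self.long_alts = []  # list of (src, alt length) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        alt = attr.get("alt") or ""
        if len(alt) > ALT_LIMIT:
            self.long_alts.append((attr.get("src") or "", len(alt)))

def find_long_alts(html):
    """Return (src, alt length) for each image with oversized alt text."""
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.long_alts
```

Running `find_long_alts` over the text of each exported page should show which ones had their body text converted into a single image with the text hidden in the alt attribute.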
-----
David
-
Thanks for your advice.
I suppose this is the problem with using what seems to be a simple WYSIWYG program to produce the HTML code from existing Word documents. Looking at the various pages it has produced, in some cases the text has been turned into an enormous image file and in others it has not, and the latter have indexed properly. I suspect this may be related to the amount of formatting present in the original documents or to odd fonts.