Did You Mean? - Suggestions


  • Did You Mean? - Suggestions

    Hello,

    I was wondering if anyone has some advice and suggestions on how to control or manage the "Did you Mean" Suggestions that are provided?


    Is there a dictionary or file that can be updated to provide better results? When I run test searches, it doesn't display the suggestions I would want to see.

    Any ideas on how to manipulate them to provide more relevant suggestions?


    Thanks
    Steve

  • #2
    The spelling suggestions are based on a variant of the Soundex algorithm, which isn't perfectly accurate, especially for non-English languages.

    The suggestions come from the search dictionary. This means that only words that appear on your web site are suggested. It also means that misspelled words can be suggested if those misspellings appear on your web site.

    You can't edit the dictionary, but what you can do is:
    1) Add recommended links for search words that return zero or too few results.
    2) Add meta data to relevant pages to make sure some valid search results are returned, removing the need for the "did you mean..." suggestions.
    3) Add synonyms as required to remove the need for the "did you mean..." suggestions.
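
To illustrate why phonetic matching against the site's own dictionary can produce loose suggestions, here is a minimal sketch of classic American Soundex in Python. It is not Zoom's code: the post above only says Zoom used "a variant" of Soundex, and the function names `soundex` and `suggest` and the sample word list are assumptions made for this example.

```python
# Classic American Soundex (illustration only, not Zoom's variant).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def suggest(query, index_words):
    """Return the indexed words that share the query's Soundex code."""
    target = soundex(query)
    return [w for w in index_words
            if w != query.lower() and soundex(w) == target]


# Hypothetical search dictionary: only words found on the site.
index_words = ["david", "dvd", "divide", "zoom"]
print(suggest("daviid", index_words))  # ['david', 'dvd', 'divide']
```

Under plain Soundex, "david", "dvd" and "divide" all map to the code D130, so the code alone cannot say which one is the best suggestion. Some extra ranking step (word frequency, edit distance) would be needed for that.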



    • #3
      Not accurate at all.

      Sorry to dig up an old thread, but it seemed to provide some context.

      We are just starting to integrate Zoom and have been pleased with everything except the "Did you mean" suggestions. In the previous posts you said that "the suggestions made come from the search dictionary"; however, we are not finding that to be the case at all.

      For example, if you search for "daviid" (two i's) Zoom comes back with:

      Did you mean: dvd or divid or deviat?

      DVD does appear in our index; however, divid and deviat are nowhere to be found in our index. I've triple checked. I could give many, many examples of strange suggestions. In the above example, we do have HUNDREDS of occurrences of "David", which was not suggested at all and seems to be the most likely candidate.

      Have you ever thought about switching from Soundex to something based on Bayes' theorem? Peter Norvig from Google has a great article on spelling correction using Bayes. And, this guy has taken it a step further. Because the Zoom spelling suggester is so inaccurate, we are probably going to write a wrapper that takes the query and uses something based on one of those two articles to make suggestions, and disable the Zoom suggestions completely. The test results we have obtained using those 21 lines of code are many times more accurate than what we are getting from Zoom.

      But, I'd still like to know why we are getting such crazy results if it's only supposed to be using the dictionary from the search index. Is that a bug? And, I'd really love to see something more accurate built into Zoom so we don't have to deal with both solutions.
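
For readers who have not seen that approach, below is a rough Python sketch in the spirit of Norvig's corrector: generate every string within one or two edits of the query, keep the ones that occur in the index, and rank them by how often they occur. It is not the wrapper mentioned above and not Zoom code; the `word_counts` table is a made-up stand-in for word frequencies taken from a search index.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def suggest(query, word_counts, limit=3):
    """Return up to `limit` indexed words, closest edit distance first,
    most frequent first within that distance."""
    query = query.lower()
    if query in word_counts:
        return [query]  # already a known word, nothing to correct
    candidates = {w for w in edits1(query) if w in word_counts}
    if not candidates:
        candidates = {w for w in edits2(query) if w in word_counts}
    return sorted(candidates, key=lambda w: (-word_counts[w], w))[:limit]


# Hypothetical frequencies; a real wrapper would read them from the index.
word_counts = Counter({"david": 300, "dvd": 40, "divide": 5})
print(suggest("daviid", word_counts))  # ['david']
```

Here "david" is a single deletion away from "daviid", while "dvd" is three edits away and is never considered. The cost is the candidate generation: the two-edit set grows quickly with query length, which matters if this runs on every search.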
      Last edited by boswell; Sep-23-2008, 12:33 AM.



      • #4
        As pointed out, this is an old thread. Zoom no longer uses the Soundex algorithm for its spelling suggestion feature.

        Can you tell us which version and build of Zoom you are using?

        If your search function is online, could you also provide us with a URL to it (you can PM or email us if you prefer to keep it private).

        The suggested words are definitely from the index (there is nowhere else it can retrieve words from), but it's possible that you have words indexed in an unexpected manner (for example, you may have PDF files of scanned documents with bad OCR).

        Or if you are using an old version, it might just be a bug that has since been fixed. Other possibilities include a corrupted index or a modified search script.

        A search for "daviid" on our website certainly suggests "David". So if yours does otherwise, we would like to have a look at why.

        Originally posted by boswell:
        "Have you ever thought about switching from Soundex to something based on Bayes' theorem? Peter Norvig from Google has a great article on spelling correction using Bayes. And, this guy has taken it a step further."

        Interesting links; we can certainly take a look at them.

        Most alternative spelling suggestion algorithms have requirements that differ from ours (e.g. we need hosting platform independence), and that is a constraint we have to consider where Google does not. For example, they may need a complete English language dictionary file to be hosted on the server. This could mean a 10MB+ file to be uploaded along with all your index data, which would be overkill for some of our users with limited shared hosting space and a 1-2MB index. Not to mention the processing time required to load such a large file on top of the search itself.

        Many algorithms also rely on a training procedure of some sort, which makes them much more difficult to set up and install on a server: users have to set up file permissions (to allow the training process to save data somewhere) and execute a training script/function on the web server. Our current design has advantages in ease of installation, minimal requirements and minimal cost on search time.

        Having said that, it'll be interesting to hear how your implementation fares. Did you use the existing Zoom dictionary or add additional files? How much time did it add to the search process? How does it scale?

        While the current spelling suggestion feature in Zoom is not perfect, I would say it should be much better than what you are describing, which seems to be incorrect behaviour. If we're given more information, we can take a look and see what the problem is.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
