Problems with search spelling suggestions

  • Problems with search spelling suggestions

    I am trying to find a search engine for a medical site I am developing, which will have about 25K pages. I purchased the professional version of Zoom today to test it. I like what I see so far: the documentation is excellent and the software offers lots of options to tweak the search results (which I have already put to use with good results).

    Unfortunately, I have come upon one problem which may be a deal-breaker: the spelling suggestions are abysmal. Let me provide some examples.

    If I search for Parkinsin or Parkinsen (instead of Parkinson), it suggests:
    Did you mean: Prognosis or Pregnancy or prognostic?

    If I search for alzimer (instead of alzhiemer), it suggests:
    Did you mean: ALS-mixed or Al-Samarrai?

    If I search for Lou Gerig (instead of Lou Gehrig), it suggests:
    Did you mean: lou Jorg?

    Note that I have been testing many different search engines (open source, paid, SaaS), and the autocorrect on every other search engine I have tried provides appropriate suggestions for the above examples.

    The spelling suggestions in Zoom do work OK for some terms, but they perform quite badly for many others. I thought this might be a problem with our installation or data files, but the spelling suggestions are wacko on your site as well (I assume you are using Zoom for your search). For example, if I search for speling (instead of spelling) on your site, it suggests:
    Did you mean: supply or spill or -_Splas?

    Is there any way to improve the spelling suggestions via the settings, or might this be fixed in the V7 alpha?

  • #2
    I agree, those are some particularly bad suggestions.

    There are some possible causes for suboptimal behaviour:

    (a) The index files are incomplete on your server, and it is using a mix of files from an older session with files from a newer session. If you are manually copying or uploading the files to your web server, you may be missing some files in the process, e.g. "zoom_spelling.zdat"

    (b) You may have partially indexed your content, and some important pages were excluded. The spelling suggestions only offer words that have been indexed (and popular ones at that). If you did not index the pages containing "Parkinson" or "Alzhiemer" for example, then it will not be suggested.

    So make sure that searching for those (correctly spelled) words actually returns results, and that important parts of the site containing those words didn't get skipped (confirm in the Index Log and under "Index"->"Manage existing index", or simply try searching for those pages).

    So in a few of your examples (Parkinson and Gehrig), the above issues MAY be a cause.

    However, for "Alzimer/Alzhiemer" this is definitely a deficiency in the metaphone algorithm.

    Basically, for the benefit of a smaller index file to manage for our end users (instead of needing to host a large English phonetic dictionary on your web server), we have to rely on a programmatic way of determining what is a likely misspelling. We judge this by phonetics but there's no definitive set of phonetic rules in the English language. In this particular case (and others), the single character difference is significant enough in the pronunciation of the word that it makes it distinct from the correct spelling (meanwhile "alzhimer" would have been suitably corrected as "alzhiemer").

    There have been some advances in phonetic algorithms since we last worked on this feature, so there may well be some improvements in this area that we can investigate for V7. We will add this to our list of things to do.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      re: spelling suggestion problems

      Thanks for your response.

      I did a little checking to look more deeply into the spelling suggestion problem. First, I deleted all index files on the server, re-indexed, then uploaded the new index files to the server. All of the spelling suggestion problems persisted. I also checked to make sure that the terms which should have been provided as suggestions really were indexed properly. They are. So if I type in “Parkinsen” for search, it will say “Did you mean: Prognosis or Pregnancy or prognostic?”, even though “Parkinson” is in the index (the word “Parkinson” is included in at least 50 different pages of the test site).

      I also installed the Version 7 alpha to see if search suggestions had improved, and found things to be pretty much the same.

      Some more examples:

      Searching for “pakage” (instead of “package”) returns: Did you mean: Peacock or PIK3CA or Pocock?

      Searching for “Daviid” (instead of “David”) returns: Did you mean: divides or devoid or Tufts?

      Note that this is not just an issue on our server. I can get the same strange results on Wrensoft's website. For example:

      Searching for “speling” (instead of “spelling”) returns: Did you mean: supply or spill or -_Splas?

      Searching for “busines” (instead of “business”) returns: Did you mean: Bison?

      In your message you mentioned that the problems might be a limitation of the phonetic algorithm. But the suggestions are so bad that I wonder if you in fact have a bug in your code. I mean, how can “Tufts” be a logical suggestion for a search for “Daviid”? I also noted by reading the forum that this problem has existed for a while (http://www.wrensoft.com/forum/showthread.php?t=2111), even though you have apparently changed phonetic algorithms.

      We are using the PHP version of the search, and I know that there are plenty of script examples/classes out there that provide better spelling suggestions than what is offered with Zoom. I wonder if you might try to use a different code base for this aspect of the program.
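
One family of approaches the poster may have in mind ranks candidate words by edit distance (the number of single-character insertions, deletions, and substitutions) instead of, or in addition to, a phonetic key. The following is a minimal sketch under that assumption, not Zoom's code; `edit_distance`, `rank_suggestions`, and the `max_distance` threshold are illustrative names and values.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with two rows of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def rank_suggestions(query, candidates, max_distance=2):
    """Return candidates within max_distance edits, closest first."""
    q = query.lower()
    scored = [(edit_distance(q, c.lower()), c) for c in candidates]
    return [c for distance, c in sorted(scored) if distance <= max_distance]


# Candidates taken from the examples in this thread.
candidates = ["Prognosis", "Pregnancy", "prognostic", "Parkinson"]
suggestions = rank_suggestions("Parkinsin", candidates)
```

With these candidates only "Parkinson" is within two edits of "Parkinsin", so the phonetically similar but visually distant words drop out. The cost is that every query is compared against every candidate, which a real index would need to narrow down first.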

      Sorry if I am going on a bit about this. Overall I am quite impressed with Zoom. The search index performance is first-rate, and I love how meta tags can be used to provide word and document boosts. It would be great if the spelling suggestions performed to the same high standard as the rest of the product.


      • #4
        Originally posted by gregjames700:
        In your message you mentioned that the problems might be a limitation of the phonetic algorithm. But the suggestions are so bad that I wonder if you in fact have a bug in your code. I mean, how can “Tufts” be a logical suggestion for a search for “Daviid”?
        Actually, you'd be surprised. The metaphone code for "David" is "TFT" and the metaphone code for "Tuft" is "TFT". So there is a serious shortcoming of the metaphone algorithm in this scenario. Reducing English phonetics to an algorithm is not as easy as it may seem.

        But you're partially right about it being a little worse than it could be. We looked into our code and realized that we're also stemming the word before generating the metaphone code. This means "Tufts" (which normally would have a metaphone of "TFTS") became "Tuft" ("TFT"). So we can improve on it slightly by using the non-stemmed word, but that doesn't avoid things like "Davids" and "Tufts" matching.
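
The effect of stemming before computing the phonetic key can be sketched as follows. This is only an illustration: the metaphone codes are hard-coded from the values quoted in this post (Python's standard library has no metaphone function), and `naive_stem` and `build_buckets` are hypothetical stand-ins, not Zoom's actual stemmer or index code.

```python
from collections import defaultdict

# Metaphone codes exactly as quoted in the post above.
QUOTED_CODES = {"david": "TFT", "tuft": "TFT", "tufts": "TFTS"}


def naive_stem(word):
    """Hypothetical stemmer stand-in: strip a single trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def build_buckets(words, stem_first):
    """Group indexed words by phonetic code, optionally stemming first."""
    buckets = defaultdict(set)
    for word in words:
        key_word = naive_stem(word) if stem_first else word
        buckets[QUOTED_CODES[key_word]].add(word)
    return buckets


indexed = ["david", "tufts"]
with_stemming = build_buckets(indexed, stem_first=True)
without_stemming = build_buckets(indexed, stem_first=False)
```

With stemming, "tufts" lands in the same "TFT" bucket as "david"; without it, "tufts" moves to "TFTS" and no longer collides with "david". As the post notes, "Davids" and "Tufts" would still share a bucket either way.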

        I also noted by reading the forum that this problem has existed for a while (http://www.wrensoft.com/forum/showthread.php?t=2111), even though you have apparently changed phonetic algorithms.
        Yes, we were previously using soundex. Now we're using a metaphone variation. Soundex may have worked better in some scenarios (like the ones we are looking at) but worked less well overall.
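
For comparison, here is a small sketch of the standard American Soundex rules. It is an assumption that this matches the soundex variant Zoom previously used; the function name `soundex` is just illustrative.

```python
# Digit assigned to each consonant group in American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key (first letter plus three digits)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


keys = {w: soundex(w) for w in
        ["Parkinson", "Parkinsin", "Prognosis", "Pregnancy", "David", "Tufts"]}
```

Because Soundex keeps the first letter, "David" and "Tufts" get different keys, which fits the remark that soundex handled some of these cases better. On the other hand, "Parkinson", "Prognosis", and "Pregnancy" all share one key under these rules, so a key match alone cannot tell them apart.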

        Ultimately, I think we need to revisit this with a new algorithm to get any significant improvement. We'll look into this for V7.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          spelling suggestions

          I have read this thread on spelling suggestions with interest, as I have found the suggestions to be so poor that I will turn the feature off until something better is available. Is it possible to limit the suggestions to a list, or to provide a list that is considered first for suggestions? Or maybe to provide additional weightings, like those for the search results, that would control how likely those terms are to be suggested? For example, spelling suggestions could be limited to those found only in title tags and custom meta tags, if desired, and the file names of pages themselves could be excluded from spelling suggestions, if desired.
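
The idea of a restricted, weighted suggestion list could look roughly like the sketch below. It is not an existing Zoom feature or setting; `suggest`, the `weighted_terms` mapping, and the weights and cutoff are all hypothetical, and the similarity measure is Python's `difflib` rather than a phonetic algorithm.

```python
import difflib


def suggest(query, weighted_terms, n=3, cutoff=0.75):
    """Suggest only from an allow-list of terms, preferring higher weights.

    weighted_terms maps each allowed term to a weight, for example a
    larger weight for terms collected from title tags.
    """
    if not weighted_terms:
        return []
    q = query.lower()
    lowered = {term.lower(): term for term in weighted_terms}
    # Keep only allowed terms that are similar enough to the query.
    matches = difflib.get_close_matches(
        q, list(lowered), n=len(lowered), cutoff=cutoff)

    def score(key):
        similarity = difflib.SequenceMatcher(None, q, key).ratio()
        return similarity * weighted_terms[lowered[key]]

    ranked = sorted(matches, key=score, reverse=True)
    return [lowered[key] for key in ranked[:n]]


# Hypothetical allow-list, e.g. built from title tags and custom meta tags.
terms = {"Parkinson": 2.0, "Lou Gehrig": 1.5, "package": 1.0}
first = suggest("Parkinsin", terms)
second = suggest("Lou Gerig", terms)
```

Anything not in the list, such as page file names, can never be suggested, and the weights give the site owner a direct way to favour the terms that matter most.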
