Zoom forgets default match and then miscounts results

  • Zoom forgets default match and then miscounts results

    A couple of issues, one relatively minor and one more serious.

    I'm using search.cgi version 4.2 (1013) PRO, I have 'Default to "match all search words"' set on, the Basic search form selected, and 'Provide spelling suggestions when less than 2 results found' checked. I'm not using categories. (The full configuration file is below.)

    From a page on my site (that is, not from the modified search template page), I search for a misspelled word. I get no matching results and the suggestion: "Did you mean xxxx or yyyy?" I click on xxxx and get the results I expected. Great! However:

    Problem 1:

    I'm now on the search template page. If I view source I find the hidden form field zoom_and has changed from 1 to 0. This is unexpected behavior. It means that if I now do a multi-word search from the search template page I get an implicit OR rather than the implicit AND I expected.

    Problem 2:

    From the search template page I search for aaaa bbbb, and get the following summary: "9 results found containing all search terms. 40 results found containing some search terms. 5 pages of results." This is inconsistent, and when I check page 5, I find there are actually 49 search results.

    Nick

    Configuration file, omitting the strings at the bottom:

    UseUTF8 = 0
    Charset = "iso-8859-1"
    MapAccents = 0
    MinWordLen = 2
    Highlighting = 1
    GotoHighlight = 0
    TemplateFilename = "search_template.html"
    FormFormat = 1
    Logging = 1
    LogFileName = "../search_logs/searchwords.log"
    MaxKeyWordLineLen = 256
    MaxDictIDLen = 2
    NumKeywords = 4953
    NumPages = 423
    DictReservedLimit = 32
    DictReservedPrefixes = 21
    DictReservedSuffixes = 12
    DictReservedNoSpaces = 27
    WordSplit = 1
    ZoomInfo = 0
    Timing = 0
    DefaultToAnd = 1
    SearchAsSubstring = 0
    ToLowerSearchWords = 1
    ContextSize = 30
    MaxContextKeywords = 3
    AllowExactPhrase = 1
    MaxContextSeeks = 500
    UseLinkTarget = 0
    UseDateTime = 0
    WordJoinChars = ".-_'"
    Spelling = 1
    SpellingWhenLessThan = 2
    NumSpellings = 1623
    UseCats = 0
    DisplayNumber = 1
    DisplayTitle = 1
    DisplayMetaDesc = 1
    DisplayContext = 1
    DisplayTerms = 0
    DisplayScore = 0
    DisplayURL = 1
    DisplayDate = 0

  • #2
    Re: Zoom forgets default match and then miscounts results

    Originally posted by quintic
    I'm now on the search template page. If I view source I find the hidden form field zoom_and has changed from 1 to 0. This is unexpected behavior. It means that if I now do a multi-word search from the search template page I get an implicit OR rather than the implicit AND I expected.
    This was by design. We are not able to make implicit AND suggestions because we do not have the information required for this. Our suggestions work by suggesting a more common spelling of a word, on a word-by-word basis, based on the popularity of that word across the entire website. So for example, let's say you search for "zoom animalz" on our website and that we actually have a page on animals, but it does not make mention of "zoom" on it. In this case, the spelling suggestion decides that "animals" is a better spelling and offers the suggestion "zoom animals". Now if you have "boolean AND" turned on here, this suggestion would return 0 results, and look a bit silly to the end user - why suggest something that you have no results for?

    So to put it simply, the spelling suggestion feature is only capable of providing implicit OR suggestions.
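
A minimal Python sketch of the word-by-word behaviour described above. The tiny index, the `difflib` similarity matching, and the function names `suggest` and `search` are all assumptions for illustration; this is not Zoom's actual implementation.

```python
from difflib import get_close_matches

# Hypothetical index: word -> ids of the pages that contain it.
INDEX = {
    "zoom": {1, 2, 3},
    "animals": {4},
    "search": {1, 2},
}

def suggest(query):
    """Correct each word on its own, preferring the most common close match."""
    corrected = []
    for word in query.lower().split():
        if word in INDEX:
            corrected.append(word)
            continue
        candidates = get_close_matches(word, list(INDEX), n=3, cutoff=0.8)
        # "Popularity" here is simply the number of pages containing the word.
        best = max(candidates, key=lambda w: len(INDEX[w]), default=word)
        corrected.append(best)
    return " ".join(corrected)

def search(query, match_all):
    """Return page ids matching all words (AND) or any word (OR)."""
    sets = [INDEX.get(w, set()) for w in query.lower().split()]
    if not sets:
        return set()
    return set.intersection(*sets) if match_all else set.union(*sets)

suggestion = suggest("zoom animalz")
print(suggestion)                           # zoom animals
print(search(suggestion, match_all=True))   # set()
print(search(suggestion, match_all=False))  # {1, 2, 3, 4}
```

Because each word is corrected independently, nothing guarantees that the corrected words ever appear on the same page, which is why the AND search comes back empty in this example.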

    Originally posted by quintic
    From the search template page I search for aaaa bbbb, and get the following summary: "9 results found containing all search terms. 40 results found containing some search terms. 5 pages of results." This is inconsistent, and when I check page 5, I find there are actually 49 search results.
    This seems correct? I'm not sure where the inconsistency is that you are referring to.

    The summary states that there are 9 results found where all words are matched, and 40 results found where only some of the words were matched. This means that, in an example where we're searching for "cat dog", we found 9 results where both the words "cat" and "dog" appeared; and 40 results where only one of the words appeared. So in total, there were 49 results found.
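
The arithmetic can be checked with a short Python sketch. The `summarize` function and the page size of 10 results per page are assumptions (the page size is chosen because it is consistent with 49 results filling 5 pages); the counts reproduce the "cat dog" example.

```python
import math

def summarize(pages, terms, per_page=10):
    """pages maps a page id to the set of words found on that page."""
    terms = set(terms)
    # Pages containing every search term.
    all_terms = sum(1 for words in pages.values() if terms <= words)
    # Pages containing at least one term, but not all of them.
    some_terms = sum(
        1 for words in pages.values()
        if (terms & words) and not terms <= words
    )
    total = all_terms + some_terms
    return all_terms, some_terms, total, math.ceil(total / per_page)

# 9 pages with both words, 40 pages with only one of them.
pages = {i: {"cat", "dog"} for i in range(9)}
pages.update({i: {"cat"} for i in range(9, 29)})
pages.update({i: {"dog"} for i in range(29, 49)})

print(summarize(pages, ["cat", "dog"]))  # (9, 40, 49, 5)
```

The two counts are disjoint, so they add up to the total rather than one containing the other.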

    Note that this message should only appear when the matching is switched to "any search words". If it were set to "all search words", it should never appear.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      Re: Zoom forgets default match and then miscounts results

      Originally posted by Ray
      So for example, let's say you search for "zoom animalz" on our website and that we actually have a page on animals, but it does not make mention of "zoom" on it. In this case, the spelling suggestion decides that "animals" is a better spelling and offers the suggestion "zoom animals". Now if you have "boolean AND" turned on here, this suggestion would return 0 results, and look a bit silly to the end user - why suggest something that you have no results for?
      That's debatable, I think. Google does use boolean AND in this scenario. Now I admit that Google is not perfect, and that searching the whole web (as they do by default) is less likely to return zero results for the spelling correction, but I prefer their approach in this case. I don't want my AND switching to an OR behind the scenes, thank you! If you want to avoid returning zero results in such circumstances, you could do a second search (albeit at some cost in performance) and make the suggestion only if it would return some results. Debatable, as I say.
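
The "second search" idea could look roughly like the following Python sketch. The index layout and the names `search_all` and `validated_suggestion` are hypothetical; the point is only that a suggestion is offered when an implicit AND search for it finds at least one page.

```python
def search_all(index, query):
    """Implicit AND: ids of the pages that contain every word in the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

def validated_suggestion(index, suggestion):
    """Return the suggestion only if an AND search for it has results."""
    return suggestion if search_all(index, suggestion) else None

# Hypothetical index: word -> ids of the pages that contain it.
index = {"red": {1, 2}, "roses": {2, 3}, "zoom": {7}, "animals": {9}}

print(validated_suggestion(index, "red roses"))     # red roses
print(validated_suggestion(index, "zoom animals"))  # None
```

The extra lookup is the performance cost mentioned above; it is paid only when a suggestion is about to be shown.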

      In any case, that wasn't my point. My point was that after the search for zoom animals, with its implicit OR, a subsequent search from the search template page retains the OR.

      Suppose I search for red rosez on flowers.com, for which "match all search words" is set on. I click on "Did you mean: red roses?" and am taken to a page with links for all kinds of roses and all kinds of red flowers. (Not just red roses; the implicit OR is at work.) Somewhat confused, from the search template page I attempt a new search, for pink carnations. Now I am confronted with links to lots of pink flowers and to carnations in general. Disenchanted -- the search engine appears to be behaving differently than usual, and returning spurious results -- I pick up the phone to the local florist. (Note that Google, Yahoo! and most other search engines default to an implicit AND; most people will never have seen an implicit OR in action.)

      To put it simply, the new search for pink carnations uses an implicit OR, when it should use an implicit AND. (Assuming Default to "match all search words" is set on, as it is on my site.)
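
One way to picture the behaviour I'm asking for, as a Python sketch rather than a description of how search.cgi works internally: the suggestion link may carry a one-off `zoom_and=0`, but the hidden field written into the Basic search form is taken from the configured default (`DefaultToAnd`), not from the request that produced the page. The function names and the `zoom_query` parameter name are assumptions.

```python
from html import escape
from urllib.parse import urlencode

DEFAULT_TO_AND = 1  # mirrors DefaultToAnd = 1 in the configuration file

def suggestion_link(suggestion, script="search.cgi"):
    """Build a suggestion link that runs a one-off OR search."""
    params = {"zoom_query": suggestion, "zoom_and": 0}
    return script + "?" + urlencode(params)

def hidden_and_field(request_params, default=DEFAULT_TO_AND):
    """Render the Basic form's hidden field from the configured default.

    request_params is deliberately ignored: the Basic form has no visible
    match control, so a zoom_and value in the URL was not chosen by the user.
    """
    return '<input type="hidden" name="zoom_and" value="%s">' % escape(str(default))

print(suggestion_link("red roses"))
# search.cgi?zoom_query=red+roses&zoom_and=0
print(hidden_and_field({"zoom_query": "red roses", "zoom_and": "0"}))
# <input type="hidden" name="zoom_and" value="1">
```

With this split, the search for pink carnations that follows is submitted with `zoom_and=1` again.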

      Originally posted by Ray
      The summary states that there are 9 results found where all words are matched, and 40 results found where only some of the words were matched. This means that, in an example where we're searching for "cat dog", we found 9 results where both the words "cat" and "dog" appeared; and 40 results where only one of the words appeared. So in total, there were 49 results found.
      I see what you mean. I would have thought "some" implied "all", so that the 9 results would be within the 40, but the above is fine, too.

      Originally posted by Ray
      Note that this message should only appear when the matching is switched to "any search words". If it were set to "all search words", it should never appear.
      It also appears when clicking on a suggested spelling correction, but you indicated above that that was done by design.

      Thank you for your response.

      Nick

      • #4
        Re: Zoom forgets default match and then miscounts results

        Originally posted by quintic
        Originally posted by Ray
        So for example, let's say you search for "zoom animalz" on our website and that we actually have a page on animals, but it does not make mention of "zoom" on it. In this case, the spelling suggestion decides that "animals" is a better spelling and offers the suggestion "zoom animals". Now if you have "boolean AND" turned on here, this suggestion would return 0 results, and look a bit silly to the end user - why suggest something that you have no results for?
        That's debatable, I think. Google does use boolean AND in this scenario. Now I admit that Google is not perfect, and that searching the whole web (as they do by default) is less likely to return zero results for the spelling correction, but I prefer their approach in this case. I don't want my AND switching to an OR behind the scenes, thank you! If you want to avoid returning zero results in such circumstances, you could do a second search (albeit at some cost in performance) and make the suggestion only if it would return some results. Debatable, as I say.

        In any case, that wasn't my point. My point was that after the search for zoom animals, with its implicit OR, a subsequent search from the search template page retains the OR.

        Suppose I search for red rosez on flowers.com, for which "match all search words" is set on. I click on "Did you mean: red roses?" and am taken to a page with links for all kinds of roses and all kinds of red flowers. (Not just red roses; the implicit OR is at work.) Somewhat confused, from the search template page I attempt a new search, for pink carnations. Now I am confronted with links to lots of pink flowers and to carnations in general. Disenchanted -- the search engine appears to be behaving differently than usual, and returning spurious results -- I pick up the phone to the local florist. (Note that Google, Yahoo! and most other search engines default to an implicit AND; most people will never have seen an implicit OR in action.)

        To put it simply, the new search for pink carnations uses an implicit OR, when it should use an implicit AND. (Assuming Default to "match all search words" is set on, as it is on my site.)

        Nick
        I agree with Nick's logic. I vote to modify Zoom along the lines he suggests on this point.

        • #5
          Re: Zoom forgets default match and then miscounts results

          Originally posted by quintic
          That's debatable, I think. Google does use boolean AND in this scenario. Now I admit that Google is not perfect, and that searching the whole web (as they do by default) is less likely to return zero results for the spelling correction, but I prefer their approach in this case. I don't want my AND switching to an OR behind the scenes, thank you! If you want to avoid returning zero results in such circumstances, you could do a second search (albeit at some cost in performance) and make the suggestion only if it would return some results. Debatable, as I say.
          My original reply was not to suggest that the current spelling suggestion design is perfect. I was actually pointing out the technical limitation of it. We do not store enough indexed data to make a better spelling suggestion than what we have at the moment, and it was a compromise made for the benefit of flexibility, which I'll explain below.

          Google and other larger search engines rely on a combination of user search history and linguistic dictionaries to make their suggestions. We did not go with this approach because it involves far greater server-side setup and maintenance requirements for the website owner. For example, analysing user search history would require the maintenance of an ongoing database on the web server of previous search behaviours. Linguistic dictionaries would be limited to certain supported languages, and only known words in the dictionary. A very thorough dictionary with names, places, etc. would be truly enormous in size. This is something that Google et al. can afford to manage (one of Google's main attributes is the enormous amount of storage they possess) - however, it is impractical for the majority of our users to increase the size of the index data so significantly.

          So this is the reason why the spelling suggestion works as it is at the moment. In any case, there is always room for improvement, and it is something we do plan on looking into in the future.

          Originally posted by quintic
          Suppose I search for red rosez on flowers.com, for which "match all search words" is set on. I click on "Did you mean: red roses?" and am taken to a page with links for all kinds of roses and all kinds of red flowers. (Not just red roses; the implicit OR is at work.) Somewhat confused, from the search template page I attempt a new search, for pink carnations. Now I am confronted with links to lots of pink flowers and to carnations in general.
          Yes, I can see how this would be confusing for the end user. The problem, as you said, is that after clicking on the spelling suggestion link, the search form is set to "match any search words" without the user doing so him/herself.

          We'll have to look into this for Version 5.0 and make a change accordingly. Perhaps a zero result suggestion is OK in retrospect. If users do complain about this in the future, we might have to point them to this thread.

          Originally posted by quintic
          (Note that Google, Yahoo! and most other search engines default to an implicit AND; most people will never have seen an implicit OR in action.)
          We're aware of this and it is the reason why we have the "Default to 'match all search words'" option available. We are considering making this the default option in the next release. In some cases however, particularly on single/smaller sites, some of our users prefer the 'match any search words' option. At the end of the day, both options are there and supported. As our capability of indexing larger sites increases (you may note from our other threads that we're indexing up to a million pages in our tests with the upcoming V5.0), it may be true that "match all search words" has now become the more reasonable default.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
