Literal search with text within quotes is interpreting a hyphen as the NOT operator

  • Literal search with text within quotes is interpreting a hyphen as the NOT operator

    There appears to be a bug in the way Zoom search is handling a literal search string enclosed in double quotation marks, and including a hyphen.

    Use case: the known target literal string in the index is "QRT 1.4 -4" (without the quotes). In this case the hyphen followed by number four stands for "maintenance release 4" in the technology major.minor release series "QRT 1.4".

    When searching for the literal string "QRT 1.4 -4" (with the quotes), Zoom returns: "No results found." If quotes are removed, or if the substring "-4" is removed from the literal string, a target document can be found eventually, but buried well down in the results list, because of dozens of similar documents which only differ by the maintenance release tag.

    The hyphen is configured to join words by default in the Indexing options page of the Zoom Search configuration.

    I would expect a literal string in quotes to override any such indexing word rules, but apparently this is not the case.

    Perhaps during indexing Zoom is incorrectly excluding "-4" from the index for some reason.

    Can anyone clarify what is happening and provide a workaround?

    Richard

  • #2
    Here is a link to the tips on advanced searches in Zoom: https://www.zoomsearchengine.com/zoo...earchtips.html
    The minus sign (hyphen) is used as a Boolean NOT, so it excludes the word that follows it.

    But words that start with a hyphen are in fact included in the index.
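To make the distinction concrete, here is a toy model of the two behaviours being discussed. It is not Zoom's parser, and `parse_query` and its `quote_aware` flag are hypothetical names invented for this illustration. It only shows how a leading hyphen can end up read as NOT when quotes are not treated as protecting it.

```python
import re

def parse_query(query, quote_aware):
    """Split a query into (required, excluded) terms.

    Toy model only, NOT Zoom's actual code. With quote_aware=False a
    leading hyphen always means NOT, even inside double quotes.
    """
    required, excluded = [], []
    if quote_aware:
        # Keep each double-quoted phrase as one literal token.
        tokens = re.findall(r'"[^"]*"|\S+', query)
    else:
        # Drop the quotes and split on whitespace.
        tokens = query.replace('"', ' ').split()
    for tok in tokens:
        if tok.startswith('"'):
            required.append(tok.strip('"'))
        elif tok.startswith('-') and len(tok) > 1:
            excluded.append(tok[1:])  # leading hyphen read as NOT
        else:
            required.append(tok)
    return required, excluded

print(parse_query('"QRT 1.4 -4"', quote_aware=False))
print(parse_query('"QRT 1.4 -4"', quote_aware=True))
```

In the first call the "4" lands in the excluded list, which matches the symptom reported in this thread; in the second the whole phrase stays one required term, which is what the original poster expected.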

    As for a workaround, you could try replacing the hyphen with a wild card question mark, for example: https://www.passmark.com/search/zoom...=%22%3F0.19%22 - and the second search result finds text containing a hyphen.
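For anyone building such search links in a script, the wildcard substitution and the URL encoding can be sketched as below. The helper names `wildcard_phrase` and `encode_query` are made up for this example; only `urllib.parse.quote` is a real library call. Whether the wildcard actually matches depends on the site's index and search settings.

```python
from urllib.parse import quote

def wildcard_phrase(phrase):
    """Swap a leading hyphen on each word for '?' and wrap the phrase in quotes."""
    words = [
        "?" + w[1:] if w.startswith("-") and len(w) > 1 else w
        for w in phrase.split()
    ]
    return '"' + " ".join(words) + '"'

def encode_query(query):
    """Percent-encode a query string for use as a URL parameter value."""
    return quote(query, safe="")

print(encode_query(wildcard_phrase("-0.19")))       # %22%3F0.19%22
print(encode_query(wildcard_phrase("QRT 1.4 -4")))  # %22QRT%201.4%20%3F4%22
```

The first output is the same encoded value that appears at the end of the example link above.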


    • #3
      Thanks for the info. I know about the hyphen as a NOT operator, and it works very well. But for me it should be ignored as such because I'm enclosing my search terms in quotes to force a literal search for the string exactly as typed.

      I tested your workaround example with "?0.19", including the quotes, with the link to the Passmark search you provided, which does indeed return a "-0.19" in result number 2.

      In my use case, however, "QRT 1.4. ?4" still returns "No results found", unfortunately. So I'm guessing that there may be a search configuration option that I may need to change in order to get the result you get on the Passmark site.
      Richard


      • #4
        "But for me it should be ignored"
        While I agree that this might be a reasonable expectation, it wasn't designed like this.
        It would probably be possible to change the behaviour, but it would be a lot of work to change it in all scripts (PHP, ASP, .NET, JS, CGI), and it is a rare case. I don't think anyone else has encountered the issue in the last 15 years. People just aren't using <space>-4 in product codes and version numbers.

        Check your dictionary file (zoom_dictionary.zdat) to see if your -4 text was indexed as a word.
        What script are you using, PHP, CGI, etc...? Maybe there is different behaviour in this case.






        • #5
          I must admit our release numbering convention is a special case, but that was a decision made by our software development team for various reasons, and over which I have no control.

          Searching through my zoom_dictionary.zdat for "-4" returns matches, as expected, wherever the hyphen is preceded by an alphanumeric character, but none where it is preceded by a space. Curiously, when I run the regex "\s-[0-9]" over the file, I find loads of "<space>-1" matches, but no other digit ever follows the hyphen.

          We are compiling for ASP.NET.
          Richard
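A check like the regex search described in this post can be scripted so the counts per digit are easy to compare. This is a minimal sketch that assumes the dictionary file can be scanned as raw bytes; the internal layout of zoom_dictionary.zdat is not described in this thread, and `count_space_hyphen_digits` is a name chosen for the example.

```python
import re
from collections import Counter

def count_space_hyphen_digits(data):
    """Count matches of whitespace + '-' + digit in bytes, keyed by the digit."""
    pattern = re.compile(rb"\s-([0-9])")
    return Counter(m.group(1).decode("ascii") for m in pattern.finditer(data))

# Small in-memory sample; "foo-4" is skipped because no whitespace precedes the hyphen.
sample = b"QRT 1.4 -4\nfoo-4 bar -1 baz -1\n"
print(count_space_hyphen_digits(sample))

# Against the real file (path is an assumption):
# with open("zoom_dictionary.zdat", "rb") as f:
#     print(count_space_hyphen_digits(f.read()))
```

If only the key "1" shows up for the real file, that would agree with the observation above that "<space>-4" is missing from the dictionary.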
