Limit on the number of words in the Word Skip list


  • Limit on the number of words in the Word Skip list

    Does anyone know if there is a limit on the size of the "Word Skip list"?

    What I would like to know is not the actual limit, but whether any negative
    performance impact has been documented from exceeding a certain number of words,
    either in the speed of indexing or the speed of searching.

    Am I correct in believing that skipping words gives no reduction
    in the index file sizes, and that the words are still indexed, just not searchable?

    I have to index several hundred thousand pages (500k-600k) and am just looking at all angles of the process.


    Zoom 64bit - Ver 6.* - 30 WPG - OFFLINE --> CGI / IIS7

    Thank you in advance.

  • #2
    The word skip list will have little to no effect on performance.

    However, we have this FAQ for indexing large sites, which might provide some useful tips.



    • #3
      There is a maximum of 400 skip words that you can enter (see chapter 8.5, "Technical limitations", in the Users Guide).

      This does reduce index size slightly. While the word is still stored in the dictionary (because we need to reconstruct context data), we do not store any data in terms of how many times the word appears, nor do we keep track of which pages it appeared on. So for very common words like "the", "he", "she", "and", etc., this would be a significant saving on a very large set of files.

      As mentioned, however, it won't necessarily make indexing or searching faster or slower.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine
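The behaviour described in #3 can be illustrated with a toy inverted index. This is only a sketch under stated assumptions, not Zoom's actual implementation or file format: `build_index`, `dictionary`, `postings` and `MAX_SKIP_WORDS` are hypothetical names, and only the 400-word limit comes from the post above. It shows a skipped word still getting a dictionary entry while no per-page counts are stored for it.

```python
from collections import defaultdict

# Limit quoted in post #3; how Zoom enforces it internally is an assumption.
MAX_SKIP_WORDS = 400


def build_index(pages, skip_words):
    """Build a toy index from {page_id: text}.

    Returns (dictionary, postings):
      dictionary: word -> word id (every word, skipped or not)
      postings:   word id -> {page_id: occurrence count} (non-skipped words only)
    """
    if len(skip_words) > MAX_SKIP_WORDS:
        raise ValueError(f"at most {MAX_SKIP_WORDS} skip words are allowed")

    skip = {w.lower() for w in skip_words}
    dictionary = {}
    postings = defaultdict(dict)

    for page_id, text in pages.items():
        for word in text.lower().split():
            # Every word gets a dictionary entry, even if it is skipped.
            word_id = dictionary.setdefault(word, len(dictionary))
            if word in skip:
                # Skipped: no counts and no page references are stored.
                continue
            postings[word_id][page_id] = postings[word_id].get(page_id, 0) + 1

    return dictionary, dict(postings)


pages = {"a.html": "the cat and the dog", "b.html": "the bird"}
dictionary, postings = build_index(pages, ["the", "and"])
```

In this toy model the saving comes entirely from `postings`: a word such as "the" would otherwise carry one entry per page it appears on, which is why skipping very common words matters more on a large set of files.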



      • #4
        Thank you.

        I checked out the wiki index test. Interesting.

        I am using FastCGI on IIS7 with a dedicated server.

        That article made me feel like I should have more faith and fewer worries
        about when I would need to switch search systems.

        I have tried at least 10 other search programs, and none really seem to grasp the meaning of "search".

        1. OmniFind OYE. Great, if you like overloaded system resources, having minimal options to control what is indexed, and are an expert at using XML and REST. If you enjoy three of the four things above, then try it if you must.

        2. Search engine Builder. Yeah, it builds a search engine all right. It also builds an index file the size of the hard drive, and indexes at the speed of an old lady. All in all, not very good.

        3. K-search. Expect to keep the page count lower than with Zoom, and cross your fingers that you can get it to work.


        Actually, there was one competitor I used that might have outdone Zoom in the feature area. I won't mention it, since I don't think many people reading this are willing to shell out $25,000 USD for a licence that expires.

        There are many more, but I have to go.

        Thank you.



        • #5
          Thanks for the feedback.

          While your server might support FastCGI, I think it is unlikely that you are actually using FastCGI with Zoom. More likely you are using the CGI version.

          The FastCGI code that we have is pretty new (as of March 2010), and isn't in any public release. We have, however, supplied it to a large bespoke customer for testing.

          We are deciding if we will include FastCGI as an option in the next major release. It is way faster than the other scripts (20x faster than CGI and 100x faster than PHP), but the installation process is much harder, and the system requirements are significantly higher and more restrictive.
