Re-indexing : Time and space!


  • Re-indexing : Time and space!

    We have a site, hosted on a Linux server, which is expanding towards 750,000 pages, including our forums.

    Zoom looks promising for our site search, but it would help to have some idea of how long the initial indexing might take, how large the generated index files might be, and what power and capacity an office PC running XP Pro would need to do this (e.g. whether it could be run as a background task).

    I assume that incremental indexing would be the way to go after set up.

    Any suggestions will be appreciated.

  • #2
    This previous post will give you a basic impression of indexing speed (although the numbers given are from an older version - the latest version should be just as fast in all aspects, and in most cases, faster):
    http://wrensoft.com/forum/showthread.php?t=514

    There are really too many factors involved to give a meaningful estimate.

    If you are indexing your forums, then you would most likely be indexing with Spider Mode, which means that the speed of your internet connection (between the indexing computer and the web server) and the speed at which the server is able to serve files are the most significant factors in how fast you can index.
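    To show why connection and server speed dominate, here is a back-of-envelope sketch in Python. It is only an illustration: the per-page fetch times are hypothetical placeholders, not measurements of Zoom or of any particular server, and the function name is made up for this example. It also assumes pages are fetched one at a time, which may not match how a given indexer actually downloads.

```python
def estimate_crawl_hours(page_count, seconds_per_page):
    """Rough wall-clock hours to fetch page_count pages one after another.

    seconds_per_page is an assumed average covering network latency plus
    the time the server takes to serve each page.
    """
    return page_count * seconds_per_page / 3600


PAGE_COUNT = 750_000  # the page count quoted in the original question

# Hypothetical per-page fetch times (fast LAN, typical, slow dynamic pages).
for seconds_per_page in (0.1, 0.5, 2.0):
    hours = estimate_crawl_hours(PAGE_COUNT, seconds_per_page)
    print(f"{seconds_per_page:>4} s/page -> {hours:,.1f} hours")
```

    The estimate scales linearly with the per-page time, so it is worth timing a small sample of your own pages and substituting that figure before drawing any conclusions.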

    The size of the resultant index files would depend on the size and nature of your content. For example, there's a big difference between having 750,000 HTML pages (each 3 KB in size) and 750,000 PDF documents (each several MB, and possibly containing hundreds of pages of small-font text). The nature of the content is another factor (e.g. many pages of simple, common vocabulary versus many pages containing hundreds of serial numbers and/or product codes).
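    The gap between those two cases can be put into numbers with a small Python sketch. Note what it does and does not show: it estimates the raw volume of content that would have to be read, not the size of the resulting index files, which depends on vocabulary as described above. The 3 KB figure comes from the example above; the 3 MB per PDF is an assumed stand-in for "several MB", and the function name is hypothetical.

```python
def raw_content_gb(doc_count, avg_doc_kb):
    """Total raw content in GB (1 GB = 1024 * 1024 KB) for doc_count documents."""
    return doc_count * avg_doc_kb / (1024 * 1024)


DOC_COUNT = 750_000  # the page count quoted in the original question

html_gb = raw_content_gb(DOC_COUNT, 3)        # 3 KB per HTML page
pdf_gb = raw_content_gb(DOC_COUNT, 3 * 1024)  # assumed 3 MB per PDF document

print(f"HTML pages: about {html_gb:,.1f} GB of raw content")
print(f"PDF files:  about {pdf_gb:,.1f} GB of raw content")
```

    Under these assumptions the two cases differ by a factor of 1,024, which is why a page count alone says little about indexing time or disk requirements.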

    750,000 pages is a lot of pages. How are you counting this number? (Did you add the number of threads in the included forum, assuming 1 thread per page?) It's worth confirming this because we often see users overestimate their page count. If the majority of this is due to the forum, you could potentially separate the search function into two sets of index files (i.e. a "search the site" function and a "search the forums" function), which would be easier to manage.

    If you like, you can give us the URL to your site and we'd have a better idea of the scale we're looking at.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      Thanks, Ray. Details sent via PM.
