Indexing speed benchmarks


  • Indexing speed benchmarks

    We have had a couple of enquiries about indexing speed and the number of pages that can be indexed per second. So we thought it would be worth posting some example benchmarks.

    We already have a lot of detailed website search engine benchmark information on our website. See:
    http://www.wrensoft.com/zoom/benchmarks.html

    But we didn't have much detail about the speed of indexing.

    Indexing speed can vary enormously depending on many factors, including:
    • File size
    • Network speed, load & latency
    • Whether conversion is required (for example, from Word or PDF to text)
    • The Zoom configuration & number of threads in use
    • CPU speed
    • How much RAM is installed & whether the total amount of data to be indexed forces swap space to be used
    • Other load on the machine doing the indexing or on the server
    • Whether you are using spider mode or offline mode
    • Internet & file system caching
    • Whether you are indexing pages generated by a database (e.g. .php/sql pages), and the speed of that database
    • The throttling value set in the Zoom configuration window (in V5 of Zoom)
    These results are from V4 of Zoom, but V5 performance is at least as good.

    Example 1:
    Spider mode on a 1500/256 ADSL connection indexing a remote Internet site in the same country, no caching, mix of PHP and HTML pages, 2 threads, HP XW8200 PC, 1GB RAM.
    RESULT: 100 pages in 10 seconds (10 pages / second)

    Example 2:
    Spider mode on a 1500/256 ADSL connection indexing a remote Internet site on the other side of the world, no caching, mix of PHP, HTML & PDF files (ranging from 10KB to large 1MB files), 4 threads, HP XW8200 PC, 1GB RAM.
    RESULT: 151 pages in 58 seconds (2.6 pages / second)

    Example 3:
    Offline mode. Local files stored on local hard disk, HTML pages (9KB - 15KB each). 2 threads, HP XW8200 PC, 1GB RAM, 10K RPM drive
    RESULT: 5000 pages in 62 seconds (80.6 pages / second)
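The pages-per-second figures above are simply pages divided by elapsed seconds. As a rough sketch (not part of Zoom itself), the same arithmetic can be used to estimate how long a larger site might take at each measured rate. The 10,000-page site size and the function names are assumptions made up for illustration, and real sites will differ for all the reasons listed earlier.

```python
# (pages indexed, elapsed seconds) copied from the three examples above.
benchmarks = {
    "Example 1 (spider, same country)": (100, 10),
    "Example 2 (spider, other side of the world)": (151, 58),
    "Example 3 (offline, local disk)": (5000, 62),
}


def pages_per_second(pages, seconds):
    """Average indexing throughput for one benchmark run."""
    return pages / seconds


def estimated_seconds(total_pages, rate):
    """Naive estimate: assumes the measured rate holds for the whole site."""
    return total_pages / rate


rates = {name: pages_per_second(p, s) for name, (p, s) in benchmarks.items()}

SITE_PAGES = 10_000  # hypothetical site size, not taken from the benchmarks

for name, rate in rates.items():
    minutes = estimated_seconds(SITE_PAGES, rate) / 60
    print(f"{name}: {rate:.1f} pages/s, roughly {minutes:.1f} min for {SITE_PAGES} pages")
```

The estimate is linear, so it ignores effects such as swap usage on very large indexes or a few large PDFs dominating the run.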

    The conclusions are:
    • Dramatic speed differences are possible depending on your configuration.
    • Offline mode is much faster than spider mode because files do not need downloading.
    • Spider mode speed is limited by the ADSL connection speed and its latency for remote sites.
    • Offline mode speed is limited by the speed of the local hard disk and CPU speed.
    • Large PDF files take much longer to index than small HTML files.
    ------
    David

  • #2
    If both the server and the indexing machine have a good connection speed, the number of threads used strongly affects indexing speed. Our server is 2500 miles from the machine used to build the indexes. The indexing machine has a download speed of 5+Mb/s. The full site has ~1000 files, ranging from 1KB HTML files to 600+KB PDFs.
    Code:
     Zoom       Index Speed
    Threads     (Pages/sec)
       1           2.8
       2           5.4
       5          11.4
      9-10        13.6
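To make the diminishing returns in the table easier to see, here is a small sketch that computes the speedup over one thread and the per-thread efficiency. The "9-10" row is treated as 9 threads, which is an assumption made only so the arithmetic has a single number to work with.

```python
# (threads, pages per second) copied from the table above.
# Assumption: the "9-10" row is entered as 9 threads.
measurements = [(1, 2.8), (2, 5.4), (5, 11.4), (9, 13.6)]

baseline_rate = measurements[0][1]  # single-thread throughput


def scaling(threads, rate, baseline=baseline_rate):
    """Return (speedup over 1 thread, efficiency per thread)."""
    speedup = rate / baseline
    efficiency = speedup / threads
    return speedup, efficiency


results = {threads: scaling(threads, rate) for threads, rate in measurements}

for threads, (speedup, efficiency) in results.items():
    print(f"{threads:>2} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these figures the gain per extra thread drops off noticeably between 5 threads and 9-10 threads.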

    • #3
      For top speed we maintain a copy of our web site on a local PC (running Microweb which, whilst meant for use on CDs, can be run from a hard-disk folder).

      We publish our edits to that folder. It means we can check them by browsing from other PCs before we copy the new content to the real web site.

      For Zoom it means that we can build our indexes on that PC in offline mode for maximum speed, test the new index files and then publish them to the real site.

      The other advantage is a working backup copy of the whole website, just in case, and the ability to make further wholesale copies when we want to try new code.
      Mark Gallagher
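The "edit locally, test, then copy to the real site" step described above could be scripted along these lines. This is only a sketch under the assumption that both the staging copy and the live site are reachable as ordinary folders (for example a mapped network drive); the function name and folder layout are hypothetical, and a site published over FTP would need a different transfer step.

```python
import filecmp
import shutil
from pathlib import Path


def publish_changed_files(staging_dir, live_dir):
    """Copy files that are new or different from staging_dir into live_dir.

    Returns the relative paths that were copied, so the result can be
    reviewed before or after publishing.
    """
    staging = Path(staging_dir)
    live = Path(live_dir)
    copied = []
    for src in sorted(staging.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(staging)
        dst = live / rel
        # Skip files whose contents are already identical on the live side.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Files deleted from the staging copy are left alone on the live side, which is the cautious choice for a sketch like this.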
