We have had a couple of enquiries about indexing speed and the number of pages that can be indexed per second. So we thought it would be worth posting some example benchmarks.
We already have a lot of detailed website search engine benchmark information on our website. See:
http://www.wrensoft.com/zoom/benchmarks.html
But we didn't have much detail about the speed of indexing.
The indexing speed can vary enormously depending on many factors, including:
- File size
- Network speed, load & latency.
- If conversion is required from Word or PDF to text (for example)
- The Zoom configuration & number of threads in use
- CPU speed
- How much RAM is installed & if the total amount of data to be indexed forces swap space to be used
- Other load on the machine doing the indexing or the server.
- If you are using spider mode or offline mode.
- Internet & file system caching
- If you are indexing pages generated by a database (e.g. .php/sql pages) and the speed of that database.
- The throttling value set in the Zoom configuration window (in V5 of Zoom)
Example 1:
Spider mode on a 1500/256 ADSL connection indexing a remote Internet site in the same country, no caching, mix of PHP and HTML pages, 2 threads, HP XW8200 PC, 1GB RAM.
RESULT: 100 pages in 10 seconds (10 pages / second)
Example 2:
Spider mode on a 1500/256 ADSL connection indexing a remote Internet site on the other side of the world, no caching, mix of PHP, HTML & PDF files (from 10KB up to large 1MB files), 4 threads, HP XW8200 PC, 1GB RAM.
RESULT: 151 pages in 58 seconds (2.6 pages / second)
Example 3:
Offline mode. Local files stored on local hard disk, HTML pages (9KB - 15KB each). 2 threads, HP XW8200 PC, 1GB RAM, 10K RPM drive
RESULT: 5000 pages in 62 seconds (80.6 pages / second)
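For anyone wanting to compare their own runs against these figures, here is a rough Python sketch of the arithmetic behind the RESULT lines. The page counts and times are taken from the three examples above; the 50,000 page site and the function names are only illustrative, and the projection assumes the rate stays constant, which it often will not.

```python
# (pages indexed, elapsed seconds) as reported in the three examples
runs = {
    "Example 1 (spider, same country)": (100, 10),
    "Example 2 (spider, overseas, with PDFs)": (151, 58),
    "Example 3 (offline, local disk)": (5000, 62),
}


def pages_per_second(pages, seconds):
    """Average indexing rate for a finished run."""
    return pages / seconds


def estimated_minutes(total_pages, rate):
    """Naive projection: assumes the measured rate holds for the whole site."""
    return total_pages / rate / 60


rates = {name: pages_per_second(p, s) for name, (p, s) in runs.items()}

SITE_PAGES = 50_000  # hypothetical site size, not from the original post
for name, rate in rates.items():
    minutes = estimated_minutes(SITE_PAGES, rate)
    print(f"{name}: {rate:.1f} pages/second, "
          f"roughly {minutes:.0f} minutes for {SITE_PAGES} pages")
```

Treat the projected times as ballpark figures only, since file size, caching and server load can change the rate part way through a run.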
The conclusions are:
- Dramatic speed differences are possible depending on your configuration.
- Offline mode is much faster than spider mode because files do not need downloading.
- Spider mode speed is limited by the ADSL connection speed and its latency for remote sites.
- Offline mode speed is limited by the speed of the local hard disk and CPU speed.
- Large PDF files take much longer to index than small HTML files.
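To illustrate why the connection speed caps spider mode, here is a back-of-the-envelope Python sketch. It assumes the "1500" in "1500/256 ADSL" is the downstream rate in kbit/s, and the average page sizes are made-up values for illustration. It ignores latency, protocol overhead and server response time, so real figures will be lower than this ceiling.

```python
def max_pages_per_second(link_kbit_per_s, avg_page_kb):
    """Upper bound on pages downloaded per second for a given link speed."""
    link_kb_per_s = link_kbit_per_s / 8  # kilobits/s to kilobytes/s
    return link_kb_per_s / avg_page_kb


DOWNLINK_KBIT = 1500  # assumption: downstream speed of the 1500/256 ADSL line

# Hypothetical average page sizes in KB, not measured values
for avg_kb in (10, 50, 200):
    ceiling = max_pages_per_second(DOWNLINK_KBIT, avg_kb)
    print(f"{avg_kb} KB average page: at most {ceiling:.2f} pages/second")
```

With small pages the link alone would allow more than the 10 pages / second seen in Example 1, which suggests latency and server response time also matter. With large files such as PDFs, the link itself quickly becomes the bottleneck.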
David