Ever since approximately version 5.1.1007 I have been having a problem. I have about 8,000 pages, and most of these are added to the queue via one page containing several thousand links. In versions prior to ~5.1.1007, queueing and indexing were concurrent. Now it seems that no indexing takes place while the queue climbs to ~4,000 URLs; only then does indexing start. While the queue is being built, CPU usage spikes to 100% and stays there until indexing gets underway. Consequently, the whole process is substantially slower in recent versions of Zoom Search than it was before ~5.1.1007. Is this now by design, or might something else be going on here?
CPU usage & URL queue
-
I don't think there have been any changes to this area in the last couple of patches.
Indexing of a page (including adding the links found on it to the queue) and downloading of pages can take place at the same time.
If you configured Zoom to use more threads, then multiple downloads (and indexing) can occur at the same time.
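To illustrate the general idea, here is a toy Python model of a multi-threaded spider. It is not Zoom's code: the in-memory `SITE` dictionary and the `crawl` function are hypothetical stand-ins, and no real downloading happens. Each worker thread takes a URL from a shared queue, "downloads" and "indexes" it, and queues any new links it finds, so pages get indexed while the queue is still growing.

```python
import queue
import threading

# Hypothetical in-memory "site": each URL maps to the links found on that page.
# Stands in for real HTTP downloads so the sketch runs offline.
SITE = {
    "/index.html": ["/links.html"],
    "/links.html": [f"/page{i}.html" for i in range(20)],
    **{f"/page{i}.html": [] for i in range(20)},
}


def crawl(start, num_threads=4):
    """Toy multi-threaded spider: each worker downloads a page, indexes it,
    and queues the links it finds, so indexing and queueing overlap."""
    work = queue.Queue()
    seen = {start}
    indexed = []
    lock = threading.Lock()
    work.put(start)

    def worker():
        while True:
            url = work.get()
            if url is None:               # sentinel: time to shut down
                work.task_done()
                return
            links = SITE.get(url, [])     # "download" the page
            with lock:
                indexed.append(url)       # "index" the page
                for link in links:        # queue links not seen before
                    if link not in seen:
                        seen.add(link)
                        work.put(link)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    work.join()                           # wait until every queued URL is done
    for _ in threads:
        work.put(None)                    # one sentinel per worker
    for t in threads:
        t.join()
    return indexed
```

Because child links are queued before the parent's `task_done()` call, `work.join()` cannot return while any discovered page is still pending.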
Needless to say, dual-core or dual-CPU machines do a better job of performing multiple tasks at the same time. What CPU are you using, what limits do you have set, and how much RAM is in the machine?
Do you have logging on? Turning this off might speed things up slightly.
If you want, e-mail us the one big page and we can test it here.
-
Originally posted by wrensoft:
If you configured Zoom to use more threads, then multiple downloads (and indexing) can occur at the same time.
Needless to say, dual-core or dual-CPU machines do a better job of performing multiple tasks at the same time. What CPU are you using, what limits do you have set, and how much RAM is in the machine?
Do you have logging on? Turning this off might speed things up slightly.
If you want, e-mail us the one big page and we can test it here.
-
I do have logging turned on, but do you mean...
I'm not sure how sending the page would help...
-
Note that there is a difference between queueing URLs for one domain and queueing URLs for multiple domains. If you have changed your configuration so that it indexes URLs on different domains (e.g. you changed your spider option to "index and follow internal and external links"), or if you always had this option set and only recently started adding links on the page that go to more or different domains, then you may be noticing this difference.
While Zoom can download and index pages from a single domain (or start point) concurrently when multiple threads are enabled, it cannot download pages from multiple domains at the same time.
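A toy Python model of that limitation, assuming (hypothetically) that the spider groups queued URLs by domain and works on one domain at a time. `crawl_by_domain` is an illustrative name, not part of Zoom, and the "download" is only recorded, not performed.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def crawl_by_domain(urls, num_threads=4):
    """Toy model: URLs on one domain are fetched in parallel,
    but the domains themselves are processed one after another."""
    by_domain = defaultdict(list)         # keeps first-seen domain order
    for url in urls:
        by_domain[urlsplit(url).netloc].append(url)

    finished = []
    lock = threading.Lock()

    def fetch(url):
        # A real spider would download and index `url` here.
        with lock:
            finished.append(url)

    for domain_urls in by_domain.values():
        # Parallel within one domain...
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            list(pool.map(fetch, domain_urls))
        # ...but leaving the `with` block waits for every fetch,
        # so the next domain only starts once this one is finished.
    return finished


urls = [
    "http://a.example/1", "http://b.example/1",
    "http://a.example/2", "http://b.example/2",
    "http://a.example/3", "http://b.example/3",
]
print(crawl_by_domain(urls))
```

In this model, if the links on a page are spread across many domains with only a few URLs each, most of the threads sit idle, so adding more threads helps little.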
As suggested above, it would help if you could show us the page of links so we can look at the nature of your URLs and see whether this is the case. Otherwise, we can only speculate.