A really useful feature to prevent timeouts would be for the indexer to pause itself (and possibly beep) whenever it encounters an error. Is this possible?
I'm not sure why this would prevent timeouts. Perhaps you have a specific situation in mind that you can elaborate on. Are you talking about timeouts during spider mode indexing?
A timeout may occur when the Indexer (in spider mode) is expecting a response from the web server and fails to receive one after about a minute of waiting. This is often due to a failure to connect to the server, or the server may be overloaded and unable to handle the number of requests it is getting (so some get rejected).
In these cases, pausing won't really do much, unless you mean that it should pause and re-send the request after a period of time. Otherwise, pausing and beeping would just halt indexing whenever a page times out. This would also be unworkable for scheduled indexing runs, which may be unattended, as the indexer would simply halt and freeze.
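For what it's worth, the "pause and re-send the request after a period of time" idea would amount to something like the following minimal Python sketch, included only to illustrate the concept (the fetch call, retry count, and delay values are all illustrative and not anything Zoom actually does):

import time
import urllib.request
from urllib.error import URLError

def fetch_with_retry(url, retries=3, wait_seconds=300, timeout=60):
    """Fetch a URL; on a timeout or connection error, wait and try again.

    All of these defaults are illustrative, not Zoom settings.
    """
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except (URLError, TimeoutError) as err:
            if attempt == retries:
                raise  # out of retries: report it like any other failed page
            print(f"{url}: {err}; waiting {wait_seconds}s before retrying")
            time.sleep(wait_seconds)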
Yes, it is in spider mode where I have the problem. It seems that once the server starts to reject requests and the pages to scan run out (while there are still pages to download on the website), it moves on to the next URL because it thinks there are no more pages in that website.
Your point about resuming after a certain amount of time was a good one. That would be great. I've found that if I wait about 10 minutes or so before resuming, everything is fine.
Maybe you could have a time limit, or no time limit at all, i.e. a manual restart.
We don't think it would be practical to retry every possible error that can happen when retrieving a page (because most of the time they really are errors, possibly caused by configuration problems etc., which would not fix themselves just by waiting a period of time). However, if there is a more identifiable error behind your problem, then we can consider whether something reasonable can be done.
Perhaps you can send us your index log ("File"->"Save index log to file") and show us what happens specifically when you are having the connection issues that impair your indexing.
The other alternative is, of course, to use a computer with a better Internet connection to do your spider mode indexing. Or consider the possibility of indexing offline or on a local server.
With our previous search solution we were indexing via an offline server, but as we're looking to apply Zoom to a dynamic and oft-refreshed site that is built from contributions from hundreds of sources per hour, that's not really practical.
I understand that some errors are simply errors, but I'm seeing timeouts on numerous pages (even when I have a strong internet connection) that could easily be fixed with better log maintenance. The log already keeps track of the kinds of errors reported (to a reasonably fine degree) so it's a matter of parsing the log and making a list of pages to try again later.
I wrote a script that is feeding those pages back to the indexer one by one as starting points, but it would be easier if the indexer had this function natively.
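Roughly, the script does something like this (a simplified Python sketch; the exact wording it matches in the saved index log, and the way the collected URLs get handed back to the indexer as start points, are assumptions on my part, not anything built into Zoom):

import re
import sys

# Assumed log format: any line mentioning a timeout also contains the page URL.
URL_RE = re.compile(r"https?://\S+")

def collect_timed_out_urls(log_path):
    """Return URLs from log lines that report a timeout, in order, without duplicates."""
    urls = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if "timeout" in line.lower() or "timed out" in line.lower():
                match = URL_RE.search(line)
                if match and match.group(0) not in urls:
                    urls.append(match.group(0))
    return urls

if __name__ == "__main__":
    # Usage: python retry_list.py saved_index_log.txt > retry_urls.txt
    for url in collect_timed_out_urls(sys.argv[1]):
        print(url)

I then feed each URL from the resulting list back to the indexer one by one as a starting point.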
Consider this a feature request, I guess.
My Zoom-searchable poetry archives web site.
http://poetryx.com
I am not sure how you can have a "strong" internet connection if you have timeouts all the time. That would seem to be a contradiction. I would be complaining to our ISP if we had this problem.
I think Ray wanted to see part of your log file. To identify the exact error you are getting. There are several different timeout situations and from your description we can't tell which particular one you are getting.
I use Comcast at home (horrible connectivity, but it's usually free because of all of the downtime) and Verizon DSL at the office (2.4mbps, always up, fast both uploading and downloading - in other words, "strong"). Sometimes I use a wireless connection too, which is always patchy.
In any case, since I've broken up my site into smaller chunks for spidering I haven't seen the problem. It seems to occur consistently (regardless of whether I'm on the DSL line, a client's T1, wireless, etc.) when I just start the spider off at the root of the site and let it wander.
The next time I see the problem I'll send my log file.
Or heck, you can try it: http://poetryx.com (making sure that the base url includes the subdomains poetry.poetryx.com and articles.poetryx.com).