

Auto pause upon error


  • Auto pause upon error

    A really useful feature to prevent timeouts would be for the indexer to pause itself whenever it encounters an error, and possibly beep. Is this possible?

    Thanks

  • #2
    I'm not sure why this would prevent timeouts? Perhaps you have a specific situation in mind that you can elaborate on. Are you talking about timeouts during spider mode indexing?

    A timeout may occur when the Indexer (in spider mode) is expecting a response from the web server and fails to receive one after about a minute of waiting. This is often due to a failure to connect to the server, or the server may be overloaded and cannot handle the number of requests it is getting (so some get rejected).

    In these cases, pausing won't really do much, unless you mean that it should pause and re-send the request after a period of time. Otherwise, pausing and beeping would just halt indexing whenever a page times out. This would also be a problem for scheduled indexing runs, which may be unattended, as the indexer would just halt and freeze.
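
    To illustrate the "pause and re-send" idea, here is a rough sketch in Python. This is illustrative only, not how Zoom works internally; the URL, timeout, retry count and delay values are made-up examples.

    ```python
    # A rough sketch of "pause and re-send": on a timeout, wait a
    # while and then retry the request instead of halting the crawl.
    import time
    import urllib.error
    import urllib.request

    def fetch_with_retry(url, timeout=60, retries=3, wait=600):
        """Try to fetch url; on failure, sleep `wait` seconds and retry."""
        for attempt in range(1, retries + 1):
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.read()
            except (urllib.error.URLError, TimeoutError) as err:
                if attempt == retries:
                    raise  # give up after the last attempt
                print(f"Attempt {attempt} failed ({err}); retrying in {wait}s")
                time.sleep(wait)

    # Example use with a made-up URL:
    page = fetch_with_retry("http://www.example.com/somepage.html")
    ```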
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



  • #3
    Hi there

    Yes, it is in spider mode where I have the problem. It seems that once the server starts to reject requests, and once the pages to scan run out (while there are still pages to download on the website), it moves on to the next URL because it thinks there are no more pages on that website.

    Your point about resuming after a certain amount of time was a good one. That would be great. I've found that the indexing has to be resumed after about 10 minutes or so for everything to be fine when I resume.

    Maybe you could have a time limit, or no time limit, i.e. a manual restart.

    Thank you



  • #4
    If the web server needs 10 minutes of rest before you can view a page, then there is something wrong with the web server.

    If Zoom can't download a page, then a normal user can't view that page on the web site either.

    So it would seem more logical to fix the problem on the server rather than trying to make Zoom work around the web server problem.

    --------
    David



  • #5
    I have the same problem, but with connectivity on my ISP's side, not my server's side.

    Comcast goes down more often than an heiress in a homemade movie, and when the Indexer can't spider the pages it just skips them.

    It would be nice if it could re-try skipped pages after it was done.
    My Zoom-searchable poetry archives web site.
    http://poetryx.com



  • #6
    We don't think it would be practical to retry every possible error that can happen when retrieving a page (because most of the time, they really are errors, such as configuration problems, which would not fix themselves just by waiting a period of time). However, if there is a more identifiable error for your problem, then we can consider whether something reasonable can be done.

    Perhaps you can send us your index log ("File"->"Save index log to file") and show us what happens specifically when you are having the connection issues which impair your indexing.

    The other alternative is, of course, to use a computer with a better Internet connection to do your spider mode indexing. Or consider the possibility of indexing offline or on a local server.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



  • #7
    With our previous search solution we were indexing via an offline server, but as we're looking to apply Zoom to a dynamic and oft-refreshed site that is built from contributions from hundreds of sources per hour, that's not really practical.

    I understand that some errors are simply errors, but I'm seeing timeouts on numerous pages (even when I have a strong internet connection) that could easily be fixed with better log handling. The log already keeps track of the kinds of errors reported (to a reasonably fine degree), so it's a matter of parsing the log and making a list of pages to try again later.

    I wrote a script that feeds those pages back to the indexer one by one as starting points, but it would be easier if the indexer had this function natively.
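
    A minimal sketch of that kind of script, assuming a made-up log line format and log file name (Zoom's actual index log format may differ):

    ```python
    # Scan a saved index log for timed-out URLs and collect them for
    # a second indexing pass. The log line pattern and the file name
    # "indexlog.txt" are assumptions for illustration only.
    import re

    TIMEOUT_LINE = re.compile(r"timed?\s*out.*?(https?://\S+)", re.IGNORECASE)

    def pages_to_retry(log_path):
        """Return the URLs of pages the log reports as timed out."""
        retry = []
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                match = TIMEOUT_LINE.search(line)
                if match:
                    retry.append(match.group(1))
        return retry

    # Each URL could then be fed back to the indexer as an extra
    # start point for a follow-up pass.
    for url in pages_to_retry("indexlog.txt"):
        print(url)
    ```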

    Consider this a feature request, I guess.
    My Zoom-searchable poetry archives web site.
    http://poetryx.com



  • #8
    I am not sure how you can have a "strong" internet connection if you have timeouts all the time. That would seem to be a contradiction. I would be complaining to our ISP if we had this problem.

    I think Ray wanted to see part of your log file, to identify the exact error you are getting. There are several different timeout situations, and from your description we can't tell which particular one you are getting.

    --------
    David



  • #9
    Originally posted by Wrensoft:
      I am not sure how you can have a "strong" internet connection if you have timeouts all the time. That would seem to be a contradiction. I would be complaining to our ISP if we had this problem.

    I use Comcast at home (horrible connectivity, but it's usually free because of all of the downtime) and Verizon DSL at the office (2.4 Mbps, always up, fast both uploading and downloading - in other words, "strong"). Sometimes I use a wireless connection too, which is always patchy.

    In any case, since I've broken up my site into smaller chunks for spidering I haven't seen the problem. It seems to occur consistently (regardless of whether I'm on the DSL line, a client's T1, wireless, etc.) when I just start the spider off at the root of the site and let it wander.

    The next time I see the problem I'll send my log file.

    Or heck, you can try it: http://poetryx.com (making sure that the base URL includes the subdomains poetry.poetryx.com and articles.poetryx.com).
    My Zoom-searchable poetry archives web site.
    http://poetryx.com

