406 Errors

  • 406 Errors

    I am unable to get a complete index of our website. Even after reducing threads to 1 and increasing throttling to 15 seconds, I typically get two 406 errors (No acceptable response available) on each pass. If I log in to the website using the login used by ZoomIndexer, I have no problem accessing any of the files / URLs shown with errors. Furthermore, the errors are attributed to different PDF files in each indexing attempt. How can I get a complete index?
    I looked for a retry setting in the User Guide and in the help, but did not find any such feature.
    I have tried both build 1004 and build 1008 with the same results.
    P.S. Errors like this have appeared intermittently over the past 4 months, but until now I have been able to resolve them by single-threading and throttling. Unfortunately, I have now reached the end of the line with this tactic.

  • #2
    One page request per 15 seconds is a very low level of load. A modern web server should do something around 200 requests per second.
    Also, a 406 error is not the typical error a web server returns when it is overloaded. 406 is almost never used in practice.
    So further reducing the load is not the solution.
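
Putting the two rates in this reply on the same scale (the 200 requests per second is a rough estimate for a typical modern server, not a measured figure for this host):

```latex
\text{load ratio} = \frac{200\ \text{requests/s}}{1/15\ \text{requests/s}} = 200 \times 15 = 3000
```

So one request every 15 seconds is roughly 1/3000 of that estimated capacity.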

    Instead, you need to investigate the root cause. Start by looking at the web server's error log, which may contain additional detail.

    What type of hosting package are you using? Who is it with and what type of package is it (e.g. VPS, dedicated hosting, cheap shared hosting)?

    • #3
      Thanks for responding. I haven't been back here in a while as I wasn't notified of your response. The platform we are using is Wild Apricot, and I have no access to logs.
      I see that meanwhile several new Zoom Search versions have been released. I will try again with build 1011.
      P.S. I found out how to get e-mail notifications by editing my profile settings.

      • #4
        So it is very likely that Wild Apricot (what kind of name is that?) is deliberately blocking you from downloading all the pages on the site.
        This could be a form of lock-in to stop you moving the content elsewhere, or a form of load control to squeeze the maximum number of sites onto a physical machine. (If that were the case, no popular site would work on their service.)

        • #5
          Once again I failed to get an e-mail notification. I have (re-)subscribed to this thread (and all of my other threads) and hope to get e-mail notifications in the future.
          I am still having the sporadic 406 errors with build 1011.
          Wild Apricot is indeed a peculiar name; I have no idea how they came by it. However, it is the most reliable and feature-rich club membership platform we have found at a halfway affordable price. Given how much I have slowed down the capture, I doubt that load control is responsible. Furthermore, Wild Apricot hosts much larger sites, used by much larger populations than ours. Nonetheless, I will open a ticket with them to investigate. Meanwhile, I have asked Ray whether there is some way to capture the server response in order to identify the mismatch between the requested content type and what the server wants to return.

          • #6
            Wireshark is the best tool for capturing network traffic.
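
If the site is served over HTTPS, Wireshark will only see encrypted bytes unless the TLS session keys are exported, so a lighter-weight alternative is to request one of the failing URLs directly and record what was asked for and what came back. Since 406 (Not Acceptable) signals a content-negotiation failure, comparing the Accept header sent with the Content-Type returned, and reading the error body, is the most direct check. The sketch below is a hypothetical stand-alone Python script, not part of Zoom Search Engine; the `probe` function name, the default `Accept: */*` and the User-Agent string are assumptions, and reproducing the crawler exactly would require matching whatever headers the indexer really sends, which this thread does not state.

```python
"""Fetch one URL and record what was asked for versus what came back.

Hypothetical diagnostic sketch for HTTP 406 (Not Acceptable). The Accept and
User-Agent defaults are placeholders, not Zoom Search Engine settings.
"""
import urllib.error
import urllib.request


def probe(url: str, accept: str = "*/*", user_agent: str = "406-probe/0.1") -> dict:
    """Request `url` with the given Accept header and return a summary dict."""
    req = urllib.request.Request(url, headers={"Accept": accept, "User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            status, headers = resp.status, resp.headers
            body = resp.read(500)  # first 500 bytes are enough for a diagnosis
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry headers and a body.
        status, headers = err.code, err.headers
        body = err.read(500)
    return {
        "status": status,
        "sent_accept": accept,
        "content_type": headers.get("Content-Type"),
        "headers": dict(headers.items()),
        "body_start": body.decode("utf-8", errors="replace"),
    }


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        result = probe(target)
        print(result["status"], result["sent_accept"], "->", result["content_type"])
        if result["status"] == 406:
            # The body of a 406 often names the representations the server would accept.
            print(result["body_start"])
```

Run against a URL that failed in the last indexing pass, e.g. `python probe.py <URL>`, ideally repeatedly, since the errors here are intermittent.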

            • #7
              Again, I failed to get an e-mail notification of your post, despite having enabled "Automatic Subscriptions" and "E-mail Notifications" in my profile settings. I am still experiencing intermittent 406 errors with ZSE v8.0.1016 using single threading and the maximum pause of 15 seconds between page requests. After extensive research with Wild Apricot, it appears that the problem arises neither from network problems nor from any attempt by Wild Apricot to block migration of content (I can successfully back up the site with wget). Instead, it seems to arise from a mismatch between the throttling mechanisms used by Wild Apricot and Zoom Search Engine: Wild Apricot limits the frequency of requests, whereas Zoom Search Engine only supports throttling by pausing between pages, and the number of requests generated by a page download can vary enormously. Since the error rarely, if ever, recurs with the same URL on successive runs of Zoom Search Engine, a simple solution would be for Zoom Search Engine to provide a retry option. This feature has been requested repeatedly since (at least) 2008. Given that curl, which underpins Zoom Search Engine, provides a number of retry options, I fail to understand the resistance to incorporating such an option into Zoom Search Engine.
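
The mismatch described here, a host that limits requests per unit of time versus a crawler that only pauses between pages, can be illustrated with a client-side sliding-window limiter: every outgoing request, not every page, is counted, and a request is held back until fewer than a set number of requests fall within the last `window_s` seconds. This is a hypothetical Python sketch, not a Zoom Search Engine feature; the class name, the cap and the window are assumptions, because Wild Apricot's actual limit is not stated in this thread.

```python
"""Client-side limiter that caps *requests* per time window.

Hypothetical sketch: the cap and window are illustrative, since the host's real
limit is unknown. Every HTTP request passes through acquire(), so the cap holds
no matter how many requests a single page happens to trigger.
"""
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    def __init__(self, max_requests: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # start times of recent requests

    def acquire(self) -> float:
        """Block until a request may be sent; return the total time waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget requests that have slid out of the window.
            while self._stamps and now - self._stamps[0] >= self.window_s:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return waited
            # Sleep until the oldest request in the window expires, then re-check.
            delay = self.window_s - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay
```

For example, `SlidingWindowLimiter(max_requests=4, window_s=60.0)` would allow at most four requests in any 60-second span; calling `acquire()` before every download enforces that regardless of how the requests are grouped into pages. The injectable `clock` and `sleep` only exist to make the behaviour easy to test.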

              P.S. I can't add anything more sophisticated than JavaScript to the Wild Apricot website, so I am unable to use ZSE's incremental indexing feature to achieve a complete index. Each time an indexing run yields a 406 error, I have to start over from scratch and hope I don't get another (usually different) one.
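
A retry option of the kind requested here could be as simple as re-fetching only the failed URL a few times, with a growing pause, before recording it as an error. For comparison, for HTTP curl's own `--retry` treats only timeouts and the status codes 408, 429, 500, 502, 503 and 504 as transient, so that option alone would not retry a 406 (newer curl releases add `--retry-all-errors` for broader retries). The Python sketch below is hypothetical: `fetch` stands for whatever function downloads a URL and returns its status code, and the set of retried codes and the delays are assumptions to tune for the host.

```python
"""Retry only the URL that failed, instead of restarting the whole crawl.

Hypothetical sketch: `fetch` is any callable that downloads a URL and returns
its HTTP status; it is not a Zoom Search Engine API. RETRY_ON and the delays
are assumed values.
"""
import time
from typing import Callable, Iterable

RETRY_ON = frozenset({406, 429, 503})


def fetch_with_retry(url: str, fetch: Callable[[str], int], attempts: int = 3,
                     delay_s: float = 60.0, retry_on: Iterable[int] = RETRY_ON,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch(url) up to `attempts` times; return the last status seen."""
    retry_codes = frozenset(retry_on)
    status = fetch(url)
    for attempt in range(1, attempts):
        if status not in retry_codes:
            break
        # Linear backoff: wait delay_s, then 2 * delay_s, and so on.
        sleep(delay_s * attempt)
        status = fetch(url)
    return status
```

With `attempts=3` and `delay_s=60`, a URL that fails twice is fetched a third time after pauses of 60 and 120 seconds, and any other status (for example 200 or 404) ends the loop immediately.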

              • #8
                "throttling mechanisms used by Wild Apricot"
                Why do they throttle at all?

                "but the number of requests generated by a page download can vary enormously."
                Not really. One HTML page download is just one download.
                If you continue on and download all the images linked to by the page that will be more, but Zoom doesn't do this by default.

                As mentioned 18 months ago, one page request per 15 seconds is a very low level of load. A modern web server should do something around 200 requests per second.
                If Wild Apricot can't deal with that, just dump them; it is a garbage product that isn't worth persevering with. We aren't going to change our product to work around their artificial limits.

                Also, as pointed out, 406 is a dumb error code to use. 406 (Not Acceptable) implies the server is available but will never serve this file, so retries make no sense.
                If the server returned HTTP 503 (Service Unavailable), it would make more sense and would be more standard, as this is the general code for a server that is overloaded or too busy.
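
The distinction drawn here can be made concrete. Under HTTP semantics, 503 (Service Unavailable) and 429 (Too Many Requests) describe temporary conditions and may carry a `Retry-After` header holding either a number of seconds or an HTTP date, whereas 406 (Not Acceptable) reports a content-negotiation failure that repeating the same request is not expected to fix. A minimal Python sketch of how a client might act on that, with a hypothetical `retry_delay` function and an assumed 30-second default:

```python
"""Decide whether a response is worth retrying, and after how long.

Sketch following HTTP semantics: 429 and 503 are temporary and may carry
Retry-After; 406 is a negotiation failure, so no retry is suggested.
The 30-second default is an assumed value.
"""
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

TEMPORARY = {429, 503}


def retry_delay(status: int, headers: Mapping[str, str],
                default_s: float = 30.0,
                now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying is pointless."""
    if status not in TEMPORARY:
        return None
    # Note: a plain dict is case-sensitive; real HTTP header objects are not.
    value = headers.get("Retry-After")
    if value is None:
        return default_s
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default_s  # unparseable header: fall back to the default
    if when.tzinfo is None:  # treat a zone-less date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Under this logic a 406 never triggers a retry, while a 503 with `Retry-After: 120` tells the client to wait two minutes, which is the behaviour a standard overload response would allow.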


                • #9
                  Once again I received no notification of your response here, despite my subscription.

                  Originally posted by David
                  Why do they throttle at all?

                  You would have to ask them. Presumably it is because they host a large number of websites and can't afford to have excessive demands on one of them impairing the performance of others.

                  Originally posted by David
                  Not really. One HTML page download is just one download.
                  If you continue on and download all the images linked to by the page that will be more, but Zoom doesn't do this by default.

                  As mentioned 18 months ago, one page request per 15 seconds is a very low level of load. A modern web server should do something around 200 requests per second.
                  If Wild Apricot can't deal with that, just dump them; it is a garbage product that isn't worth persevering with. We aren't going to change our product to work around their artificial limits.

                  Also, as pointed out, 406 is a dumb error code to use. 406 (Not Acceptable) implies the server is available but will never serve this file, so retries make no sense.
                  If the server returned HTTP 503 (Service Unavailable), it would make more sense and would be more standard, as this is the general code for a server that is overloaded or too busy.

                  Thanks for your sympathy ;-} I can't argue with your assessment. However, after exhaustive analysis of available platforms, we couldn't find anything better for our purposes, at least nothing we can afford. If you can suggest alternative club website and administration software for a small club (100-200 members) with similar pricing and features, I will gladly take a look.
                  I have reported these errors to Wild Apricot and, although they have responded to other change requests, they seem unwilling (or unable) to handle this. In any case, I don't see how changing the error code would improve anything. Meanwhile, I am running Zoom Search Engine against a local backup copy of the website as a workaround.



                  • #10
                    "and can't afford to have excessive demands on one of them impairing the performance of others"
                    Or maybe it is simple greed, deciding to spend as little as possible on infrastructure.

                    There is a study here showing that a Raspberry Pi 3B is able to serve 1100 pages per second for static pages and 100 pages per second for complex dynamic pages.
                    https://www.diva-portal.org/smash/ge...FULLTEXT01.pdf

                    Raspberry Pi is a $40 computer.



