skip URL when indexing - Locks up on a URL

  • skip URL when indexing - Locks up on a URL

    Hi all,

    When indexing a URL, is there a way to skip the URL if the indexer locks up on it?

    Thanks

  • #2
    If you know in advance that there is a problem with a particular URL, you could just add that URL to the skip list.

    In most cases, though, the indexer should also time out on bad URLs or on a lost internet connection. What is the URL in question?
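
The two safeguards mentioned in this reply (a skip list and a per-URL timeout) can be illustrated with a minimal Python sketch. This is not Zoom's implementation: the function names `should_skip`, `fetch_with_timeout`, and `index_urls`, the substring matching rule, and the 10-second default are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List
import urllib.request


def should_skip(url: str, skip_list: Iterable[str]) -> bool:
    """Return True if the URL contains any skip-list entry.

    Substring matching is an assumption of this sketch.
    """
    return any(entry in url for entry in skip_list)


def fetch_with_timeout(url: str, timeout: float) -> bytes:
    """Download a URL, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def index_urls(
    urls: Iterable[str],
    skip_list: Iterable[str],
    fetch: Callable[[str, float], bytes] = fetch_with_timeout,
    timeout: float = 10.0,  # hypothetical default, not a Zoom setting
) -> List[str]:
    """Return the URLs that were fetched successfully, in order."""
    skip_entries = list(skip_list)
    indexed = []
    for url in urls:
        if should_skip(url, skip_entries):
            continue  # known-bad URL: never attempt it
        try:
            fetch(url, timeout)
        except OSError:
            # Timeouts and connection failures (TimeoutError and
            # urllib.error.URLError are OSError subclasses): give up
            # on this URL and carry on with the next one.
            continue
        indexed.append(url)
    return indexed
```

The `fetch` parameter is injectable so the loop can be exercised without network access. The key point is that both a skip-list hit and a timeout end with `continue`, so one bad URL cannot stall the rest of the list.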



    • #3
      skip urls

      Originally posted by wrensoft
      If you know in advance that there is a problem with a particular URL, you could just add that URL to the skip list.

      In most cases, though, the indexer should also time out on bad URLs or on a lost internet connection. What is the URL in question?

      Here are just some of the URLs to skip:

      http://www.montezumawell.com/montezumawelldirectory.html
      http://stats.superstats.com/b/ss/vsign_1362749/1
      <snip>
      http://www.abigdir.com/
      <snip>
      http://www.buildweblinks.com/Recreation_/Travel_/
      http://www.bestwebdirectory.info/

      There are over 400 URLs that are now not getting indexed, and I can't understand why.



      • #4
        Actually, we could only find a problem when indexing one of the above URLs:
        http://stats.superstats.com/b/ss/vsign_1362749/1

        All the other URLs you listed indexed fine.

        I presume you have these URLs added as start points (by clicking on the "More" button in Spider Mode).

        The problem with the above URL is that it is actually a script which serves an image file, so it is useless as a website URL. If you click on the link, it leads to a seemingly blank page. In fact, it is a 43-byte GIF image of 2x2 transparent pixels.

        The Indexer stalled indefinitely when given this URL to index as a start point. This was due to a bug: the Indexer failed to move on to the next start point when the start point was an image AND the image was smaller than the "Minimum image file size to index" limit (5 kb by default). Both conditions hold here, so the URL caused the Indexer to stop and not continue further.
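
The intended behaviour described above (skip an undersized image start point, then carry on with the next one) can be sketched as follows. This is a hedged illustration in Python, not the Indexer's actual code: the `StartPoint` type, the `process_start_points` function, and the use of a MIME content type to detect images are all hypothetical, and the 5 kb limit is taken as 5 * 1024 bytes.

```python
from dataclasses import dataclass
from typing import List

# "Minimum image file size to index" default from the post above,
# assumed here to mean 5 * 1024 bytes.
MIN_IMAGE_SIZE_BYTES = 5 * 1024


@dataclass
class StartPoint:
    url: str
    content_type: str  # e.g. "text/html" or "image/gif"
    size_bytes: int


def process_start_points(
    start_points: List[StartPoint],
    min_image_size: int = MIN_IMAGE_SIZE_BYTES,
) -> List[str]:
    """Return the URLs that get indexed, in order."""
    indexed = []
    for point in start_points:
        is_image = point.content_type.startswith("image/")
        if is_image and point.size_bytes < min_image_size:
            # Undersized image: skip it, but still advance to the
            # next start point instead of stalling.
            continue
        indexed.append(point.url)
    return indexed
```

With the 43-byte GIF from this thread placed between two ordinary pages, the sketch skips the GIF and still indexes the page that follows it.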

        We have fixed this bug for the next release (V5.1.1009). But in the meantime, you should be able to safely remove the URL from your list (since it is not a valid page and will be skipped anyway), and the rest of your URLs should index fine.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
