Perhaps someone with a bit more experience can help me out.
When I run the spider to index my intranet site, I get the following broken link output in the log (I've replaced a large chunk of characters with bold x to preserve privacy):
10:17:14 - Broken link found on page:
http://xxx.xx.ca/opx/Pxx/Px_NorthXxx/xxxments/paxx%20xxund/paxx%20unddetails.htm
10:17:21 - (Broken link URL is:
http://xxxweb.xxx.xxx.xx.ca/opx/Pxx/Px_NorthXxx/common/Xxxulance.htm )
I understand that the broken link URL is found on the page identified by the previous line in the log. No issues there, I fix the link on the server and all is good.
However, when I run the spider again, the original broken link isn't found anymore (good), but it finds another broken link on the server. This new broken link is similar to the first (same name, same URL) - the only difference is that it's on a different page, located in a different branch of the file tree.
This happens every time I run the spider. It gets very tiresome fixing, running, fixing, running, when the spider should find and identify every occurrence of the same broken URL so I could fix them all at once.
So, I guess my question is this: why does the spider seem to stop after finding the first broken link, even when the same broken link URL exists elsewhere in the directory structure?
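To be clear about the behavior I'm hoping for, here's a rough sketch (the page names and link data are made up for illustration; a real spider would fetch them from the server):

```python
# Sketch of the desired behavior: report every page that references each
# broken URL, grouped by URL, so all occurrences can be fixed in one pass
# instead of one per spider run. All paths below are hypothetical.
from collections import defaultdict

def group_broken_links(pages, link_is_alive):
    """Map each broken URL to the list of pages that reference it."""
    broken = defaultdict(list)
    for page, links in pages.items():
        for url in links:
            if not link_is_alive(url):
                broken[url].append(page)
    return dict(broken)

if __name__ == "__main__":
    # Hypothetical site: two pages in different branches link to the same
    # dead URL, mirroring the situation described above.
    pages = {
        "/deptA/details.htm": ["/common/contact.htm", "/index.htm"],
        "/deptB/overview.htm": ["/common/contact.htm"],
    }
    alive = {"/index.htm"}  # stand-in for a real HTTP liveness check
    report = group_broken_links(pages, lambda u: u in alive)
    for url, found_on in report.items():
        print(f"Broken link {url} found on {len(found_on)} page(s): {found_on}")
```

With output like that, one fix on the server would clear every occurrence at once.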
Is there a setting I can change?
Any help would be appreciated