Perhaps someone with a bit more experience can help me out.
When I run the spider to index my intranet site, I get the following broken link output in the log (I've replaced a large chunk of characters with bold x to preserve privacy):
10:17:14 - Broken link found on page:
http://xxx.xx.ca/opx/Pxx/Px_NorthXxx/xxxments/paxx%20xxund/paxx%20unddetails.htm
10:17:21 - (Broken link URL is:
http://xxxweb.xxx.xxx.xx.ca/opx/Pxx/Px_NorthXxx/common/Xxxulance.htm )
I understand that the broken link URL is found on the page identified by the previous line in the log. No issues there, I fix the link on the server and all is good.
However, when I run the spider again, the original broken link isn't found anymore (good), but it finds another broken link on the server. This new broken link is similar to the first (same name, same URL) - the only difference is that it's on a different page, located in a different branch of the file tree.
This happens every time I run the spider. It gets very tiresome fixing, running, fixing, running, when the spider should find and identify every occurrence of the same broken URL so I could fix them all at once.
So, I guess my question is this: why does the spider seem to stop after finding the first broken link, even when the same broken link URL exists elsewhere in the directory structure?
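To be clear about the behavior I'm hoping for, here's a rough sketch (the page names and link data are made up for illustration; a real spider would fetch them from the server):

```python
# Sketch of the desired behavior: report every page that references each
# broken URL, grouped by URL, so all occurrences can be fixed in one pass
# instead of one per spider run. All paths below are hypothetical.
from collections import defaultdict

def group_broken_links(pages, link_is_alive):
    """Map each broken URL to the list of pages that reference it."""
    broken = defaultdict(list)
    for page, links in pages.items():
        for url in links:
            if not link_is_alive(url):
                broken[url].append(page)
    return dict(broken)

if __name__ == "__main__":
    # Hypothetical site: two pages in different branches link to the same
    # dead URL, mirroring the situation described above.
    pages = {
        "/deptA/details.htm": ["/common/contact.htm", "/index.htm"],
        "/deptB/overview.htm": ["/common/contact.htm"],
    }
    alive = {"/index.htm"}  # stand-in for a real HTTP liveness check
    report = group_broken_links(pages, lambda u: u in alive)
    for url, found_on in report.items():
        print(f"Broken link {url} found on {len(found_on)} page(s): {found_on}")
```

With output like that, one fix on the server would clear every occurrence at once.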
Is there a setting I can change?
Any help would be appreciated