I am racking my brain trying to figure out what is wrong with Zoom, and why it won't search through my site. I have tried indexing it in offline mode and by crawling the site on the internet. I have the site set up with many separate folders, with files in those folders, rather than one big list of HTML files. It is a typical way to lay out a site. No big deal. The folders are like "/home/", "/guestbook/", "/downloads/", etc. Everything is properly linked together (with relative rather than absolute links). I have had plenty of experience indexing sites with Zoom, so this one is puzzling me. It keeps giving me errors like, "Skipping http://subdomain.mywebsite.com/new (External site - does not match base URL)". Please help!
Searching a Site Based on Folders
-
That skipping message means that the URL being indexed does not match the base URL specified. This would happen in Spider Mode if you have a Base URL which points to a different domain to the URL being indexed.
For example, you may be indexing a page such as "http://www.mysite.com/mypage.html" with a base URL of "http://www.mysite.com/"
This means that only links matching that base URL will be considered part of the site. It will not automatically follow links to every other domain name (or subdomain) because they would typically be other sites (and doing so would mean the spider would end up indexing the rest of the Internet).
If you have links to a subdomain which you wish to index as part of the same site, you should specify multiple base URLs separated by a semicolon. For example, "http://www.mywebsite.com/;http://subdomain.mywebsite.com/"
Information on multiple base URLs and indexing subdomains can be found in chapter 2.1.6 "Base URL" of the Users Guide here:
http://www.wrensoft.com/zoom/usersguide.html
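To illustrate why the skip message appears, here is a rough Python sketch of a base URL check. This is not Zoom's actual code; the function name `matches_base_url` and the plain prefix comparison are assumptions made only for illustration, and the real matching rules are described in the Users Guide.

```python
def matches_base_url(url: str, base_urls: str) -> bool:
    """Return True if url starts with any base URL in a
    semicolon-separated list (hypothetical helper, not part of Zoom)."""
    # Split the setting on semicolons and ignore empty entries.
    bases = [b.strip() for b in base_urls.split(";") if b.strip()]
    # Assumption: a URL belongs to the site if it begins with a base URL.
    return any(url.startswith(base) for base in bases)


single = "http://www.mywebsite.com/"
both = "http://www.mywebsite.com/;http://subdomain.mywebsite.com/"

# A page in a folder under the base URL is treated as part of the site.
print(matches_base_url("http://www.mywebsite.com/guestbook/index.html", single))

# The subdomain URL from the skip message fails against a single base URL...
print(matches_base_url("http://subdomain.mywebsite.com/new", single))

# ...and passes once the subdomain is added as a second base URL.
print(matches_base_url("http://subdomain.mywebsite.com/new", both))
```

Under this assumption, folders are never the issue: only the part of the URL that has to match the base URL matters, which is why adding the subdomain as a second base URL should stop the skipping.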
-
I addressed the most likely cause of the problem from the information you have given me.
The only useful information you gave me was the skip message. Did you check if the problem is what I suggested? If you have, and have reasons to show that this is not the case, then do tell us and let us know why.
All you have told me is that you have folders, and you have index files for these folders and they are linked to other folders. There is nothing unusual about that. That's essentially every website out there. Zoom should have no problem indexing a website because of that. The most likely problem, given your skip message, is what I suggested, so I recommend you check that and let us know if you have reasons to believe that is not the case.
Also, it would be helpful to have the exact message given (with the real URL intact) as well as the start and base URL you are using. E-mail us the actual ZCFG file with your indexing configuration if you are lost.