Hi,
I just installed Zoom 6 pro and successfully indexed my entire site except for one important area (my blog) which Zoom isn't indexing.
The bulk of my site consists of static HTML pages organized in folders. Zoom finds and indexes all of these (5000+ pages).
http://www.johnkane.com/
However, my blog runs on WordPress, and all of its pages are generated dynamically. The permalinks contain no file names (with or without extensions) and end in slashes:
http://www.johnkane.com/blog/
http://www.johnkane.com/blog/2010-02-03-mushroom-macros/
There are no files to scan for, only URLs that return pages when requested. The URLs are linked from the blog's root page and from the pages it links to.
Zoom's log returns the error:
X No files to spider from http://www.johnkane.com/blog/
Oddly, it seems I can partially solve this by unchecking "Enable robots.txt support" under "Advanced spider mode options" on the "Spider options" tab, even though my robots.txt file (in the root of my domain) doesn't restrict the /blog/ folder.
The problem with this approach (ignoring robots.txt) is that Zoom will crawl content in folders not referenced by my site, such as orphan or admin pages, which I don't want.
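As a sanity check on the robots.txt claim, a small Python sketch like the one below could confirm whether a given set of rules blocks the blog URLs. The rules shown and the user-agent name "ExampleSpider" are placeholders for illustration only; they are not my actual robots.txt or the agent string Zoom sends, and Zoom's own robots.txt handling may differ from Python's parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: block an admin folder, leave everything else open.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    for url in (
        "http://www.johnkane.com/blog/",
        "http://www.johnkane.com/blog/2010-02-03-mushroom-macros/",
        "http://www.johnkane.com/admin/",  # hypothetical path
    ):
        print(url, is_allowed(SAMPLE_ROBOTS, "ExampleSpider", url))
```

Pasting the real robots.txt contents into SAMPLE_ROBOTS would show whether any rule unexpectedly matches /blog/, which would at least narrow the problem down to either the file or the spider.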
I'm hoping I can reconfigure Zoom, WordPress, or both to resolve this. Any suggestions appreciated!