I can't seem to get Zoom Search to exclude specific files via robots.txt if they have typical variables as part of the URL. Robots.txt works fine for directories, or for a filename without variables. I'm trying to tune robots.txt for my forum (VB, the same one you use), and while I know it can be done in the Zoom Search configuration (which I've confirmed works), could this indicate a bug?
For example, looking at the log, Zoom Search indexes this file (the actual line is much longer, but you get the idea; the thread editor doesn't allow the full line):
http://www.mysite.com/forums/search.php?f=8
The robots.txt file (confirmed uploaded with these entries; the forum is under /forums):
User-agent: *
Disallow: /forums/search.php
Disallow: /forums/calendar.php
....
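As a cross-check of what the rules above should do, here is a minimal Python sketch using the standard library's `urllib.robotparser` (a reference parser only, not Zoom's own code). Disallow values are matched as path prefixes, and this parser includes the query string in the part it compares, so `search.php?f=8` should be blocked just like `search.php`. The `showthread.php` URL is a hypothetical allowed page, and the `....` entries are left out.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt above (the elided "...." entries are omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/search.php
Disallow: /forums/calendar.php
"""

def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Make sure the rules count as loaded; can_fetch() denies everything otherwise.
    parser.modified()
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/forums/search.php",
        "http://www.mysite.com/forums/search.php?f=8",
        "http://www.mysite.com/forums/showthread.php?t=1",  # hypothetical allowed page
    ):
        print(is_allowed(url), url)
```

The first two URLs should print `False` and the third `True`; a crawler honoring the same prefix rule would skip the `search.php?f=8` line seen in the log.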
The log shows it reads the robots.txt file fine, and it excludes the other files and directories listed in it.
I have also set the General configuration option "Reload all files (do not use cache)".
Where it gets really bad is watching it try to index the VB forum calendar: I'm guessing it goes endlessly through prior and future months via the links on the calendar page. It's the same problem as above: robots.txt excludes calendar.php, but Zoom seems to process it whenever the URL carries one or more passed variables. I have to stop it manually, as I'm not sure how long it might run.
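The calendar loop follows from the same matching rule. The sketch below is a hand-rolled Python matcher, assuming plain prefix matching with no wildcard support, that filters links before they are queued; once every `calendar.php` URL is dropped regardless of its variables, there are no prior or next month links left to follow. The calendar parameters (`month`, `year`) and the `showthread.php` link are hypothetical stand-ins, not taken from the log.

```python
from urllib.parse import urlsplit

# Disallow values copied from the robots.txt above ("...." entries omitted).
DISALLOW = ["/forums/search.php", "/forums/calendar.php"]

def path_and_query(url: str) -> str:
    """Return the part of a URL that the Disallow prefixes are compared against."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path

def is_disallowed(url: str, rules=DISALLOW) -> bool:
    """Prefix match: any query variables after the rule's path do not matter."""
    target = path_and_query(url)
    return any(rule and target.startswith(rule) for rule in rules)

def crawlable(links):
    """Drop excluded links before queuing them, so calendar pages never expand."""
    return [link for link in links if not is_disallowed(link)]

if __name__ == "__main__":
    # Hypothetical links of the kind a forum calendar page might contain.
    links = [
        "http://www.mysite.com/forums/calendar.php?month=4&year=2010",
        "http://www.mysite.com/forums/calendar.php?month=6&year=2010",
        "http://www.mysite.com/forums/showthread.php?t=123",
    ]
    print(crawlable(links))
```

Run as a script, it should print only the `showthread.php` link, so the month-to-month chain never starts.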
Perhaps you will see the same issue on the Wrensoft site, since you're using the same forum system.