Hopefully this might help some other web developers....
Some of you might be interested to see what we have done with our new vBulletin forums on the PassMark site.
We wanted a site-wide search engine that would search all of the vB forum posts, the content of our PDF files, and all the other web pages on our site. To do this we used Zoom to spider all of our site content.
What follows is a brief overview of how to get vB and Zoom to play together nicely.
The first requirement we had was that people should be able to search the whole site, just our product pages, or just the forums. This was easily done by creating search categories in Zoom. We set this up so that all pages in the /forum/ directory were put into the Forum category.
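To make the category idea concrete, here is a rough sketch in Python of the kind of rule involved. It is not Zoom's own configuration format or code, only an illustration of mapping a URL to a category by its path. The function name category_for and the "Products" category name are made up for the example; only the Forum category and the /forum/ directory come from our setup.

```python
from urllib.parse import urlparse


def category_for(url: str) -> str:
    """Assign a page to a search category based on its URL path.

    Hypothetical helper for illustration only; "Products" is an
    assumed name for the non-forum category.
    """
    path = urlparse(url).path
    # Everything under /forum/ belongs to the Forum category.
    if path.startswith("/forum/"):
        return "Forum"
    # All other pages fall into the (assumed) Products category.
    return "Products"


pages = [
    "http://www.example.com/forum/showthread.php?t=123",
    "http://www.example.com/products/index.htm",
]
categories = {page: category_for(page) for page in pages}
```

A user who picks "just the forums" would then only see results whose category is Forum, while a whole-site search ignores the category.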
The main problem turned out to be that Zoom would index too deeply into vBulletin, finding all kinds of unwanted links, so careful configuration is required to avoid indexing too much irrelevant information. If you allow the spider to index the large number of content-irrelevant pages created by vB, you reduce the effectiveness of your search results (by returning too many pages that a user would not find useful), significantly extend the time required to index your site, and waste bandwidth and disk space.
This configuration is needed because the spider is designed to follow every legitimately different link on a web page. But with the vB script (and other forum scripts), there are often many useless pages that are simply user options, e.g. login procedures, sorting options, various display modes of the same page, etc.
The Zoom spider can skip certain URLs if you enter a fragment of the URL into a skip list. Assuming your forum is installed in a directory called /forum/, this is the skip list required to index just the thread content and nothing else.
/forum/private.php
/forum/usercp.php
/forum/faq.php
/forum/memberlist.php
/forum/calendar.php
/forum/search.php
/forum/forumdisplay.php?do=markread
/forum/login.php
/forum/modcp/
/forum/member.php
/forum/showthread.php?goto=newpost
/forum/newthread.php
&daysprune=-1&order=
/forum/showthread.php?p=
/forum/showthread.php?mode=hybrid
/forum/showpost.php
/forum/editpost.php
/forum/newreply.php
/forum/online.php
/forum/profile.php
/forum/report.php
/forum/postings.php
/forum/misc.php
/forum/subscription.php
/forum/poll.php
/forum/sendmessage.php
/forum/printthread.php
&goto=nextnewest
&goto=nextoldest
/forum/infraction.php
/forum/archive/
/forum/viewtopic.php
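For anyone wondering how a skip list like this behaves, here is a minimal sketch in Python. It assumes a plain substring match (a URL is skipped if any fragment appears anywhere in it), which is how the skip list is described above; it is not Zoom's actual code, and details such as case handling are not modeled. The function name should_skip and the example.com URLs are made up for the illustration, and only a few of the fragments are included.

```python
# A few fragments taken from the skip list above.
SKIP_LIST = [
    "/forum/private.php",
    "/forum/showthread.php?p=",
    "/forum/printthread.php",
    "&goto=nextnewest",
]


def should_skip(url: str, skip_list=SKIP_LIST) -> bool:
    """Return True if any skip-list fragment occurs anywhere in the URL."""
    return any(fragment in url for fragment in skip_list)


urls = [
    "http://www.example.com/forum/showthread.php?t=123",
    "http://www.example.com/forum/showthread.php?p=456",
    "http://www.example.com/forum/showthread.php?t=123&goto=nextnewest",
    "http://www.example.com/forum/printthread.php?t=123",
]

# Only the plain thread URL survives the filter.
to_index = [u for u in urls if not should_skip(u)]
```

This also shows why entries such as &goto=nextnewest have no /forum/ prefix: being fragments, they match wherever they appear in the URL, whatever script they are attached to.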
After indexing our site, this got the basic search function working. The next, minor, problem was ensuring that our HTML product pages and PDFs ranked above the forum results.
The easy solution was to force down the ranking of the forum results in the search index by inserting the following meta tag into the vBulletin headinclude template.
<meta name="ZOOMPAGEBOOST" content="-5" />
And now we can search all of the Passmark.com site from a single search page.
----
David