Scanning versus indexing

AndrewD

Newbie

Join Date: Dec 2006

Posts: 49
#1

Dec-26-2006, 08:58 AM

I'm wondering if I'm missing something obvious...

Our site is driven by a patched version of VBulletin, and the interesting content is stored as documents (usually PDFs). Access to everything is script-controlled, so, for example, people get at a particular document with something like getscript.php?action=retrieve... which fetches the content and sends it to the browser as a download. The raw PDFs are not directly visible on the site.

Zoom does a super job of finding everything, but along the way it also indexes all the surrounding junk in the forums (usernames, headers, footers, etc.). I've read the instructions for integrating with bulletin boards, which help, but...

What I think I'd like to do is scan everything and follow links, but only index the content that is retrieved by the final getscript.php?action=retrieve call.

Is this straightforward to set up?

I suppose I could achieve the same result by pre-scanning my database to produce a dynamic search config file, telling Zoom exactly which scripts to index.
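A rough sketch of that pre-scanning idea, in Python, just to show the shape of it. The `id` query parameter, the example.com base URL and the function names are all made up for illustration (the real URL is elided after action=retrieve above), and the output is a plain one-URL-per-line list rather than any particular Zoom config format:

```python
from urllib.parse import urlencode


def build_start_urls(base_url, doc_ids, id_param="id"):
    """Return one retrieval URL per document ID.

    `id_param` is a hypothetical parameter name; substitute whatever
    the real getscript.php expects after action=retrieve.
    """
    urls = []
    for doc_id in doc_ids:
        query = urlencode({"action": "retrieve", id_param: doc_id})
        urls.append(f"{base_url}?{query}")
    return urls


def write_url_list(path, urls):
    """Write the URLs to a text file, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # In practice the IDs would come from a query against the forum database.
    for url in build_start_urls("http://example.com/getscript.php", [3, 7]):
        print(url)
```

The list would need regenerating whenever documents are added or removed, which is the main cost of this approach compared with letting the spider follow links.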

Is there another way?

thanks
David

Administrator

Join Date: Dec 2004

Posts: 4709
#2

Dec-26-2006, 09:31 AM

To avoid indexing the content in the VBulletin DB while still following links to the PDFs, make a small customisation to VBulletin.

Add <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags to the header and footer of the page, respectively.

You still need to use the skip list recommended for forums to skip unwanted pages, however.
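To illustrate the effect, here is a small Python sketch. It assumes the tags in question are Zoom's <!--ZOOMSTOP--> and <!--ZOOMRESTART--> comment markers, and it only mimics the behaviour described above (text between the markers is left out of the index, links are still found). It is not Zoom's actual implementation, and the function names are invented for the example:

```python
import re

# Assumed marker names; check them against the Zoom documentation.
STOP = "<!--ZOOMSTOP-->"
RESTART = "<!--ZOOMRESTART-->"


def wrap_page(header, body, footer):
    """Put the stop marker at the top of the header and the restart
    marker at the end of the footer, so the whole forum page is skipped."""
    return f"{STOP}{header}{body}{footer}{RESTART}"


def indexable_text(html):
    """Drop everything between a stop marker and the next restart marker
    (or the end of the page if there is no restart marker)."""
    pattern = re.compile(
        re.escape(STOP) + r".*?(?:" + re.escape(RESTART) + r"|\Z)",
        re.DOTALL,
    )
    return pattern.sub("", html)


def extract_links(html):
    """Collect href targets from the full page, markers or not."""
    return re.findall(r'href="([^"]+)"', html)


if __name__ == "__main__":
    page = wrap_page(
        "<div>Forum header</div>",
        '<a href="getscript.php?action=retrieve">doc</a>',
        "<div>Forum footer</div>",
    )
    print(repr(indexable_text(page)))  # nothing left to index on this page
    print(extract_links(page))         # the document link is still found
```

The point is that the forum pages contribute links but no searchable text, so only the documents those links lead to end up in the index.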