I'm wondering if I'm missing something obvious...
Our site is driven by a patched version of vBulletin, and the interesting content is stored as documents (usually PDFs). Access to everything is script-controlled, so (for example) people get at a particular document with something like getscript.php?action=retrieve... which fetches the content and downloads it to the browser. The raw PDFs are not directly visible on the site.
Zoom does a super job of finding everything, but along the way it also indexes all the surrounding junk in the forums (usernames, headers, footers, and so on). I've read the instructions for integrating with bulletin boards, which helps, but...
What I think I'd like to do is have Zoom scan everything and follow links, but only index the content that is returned by the final getscript.php?action=retrieve request.
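To make the rule concrete, here is a minimal sketch of the decision I'm after, written in Python purely for illustration. It is not Zoom configuration, and the function name `should_index` is my own invention: every link gets followed, but a page is only indexed when this returns True.

```python
from urllib.parse import urlparse, parse_qs


def should_index(url):
    """Return True only for getscript.php requests with action=retrieve."""
    parsed = urlparse(url)
    # Compare the last path segment, so the script may live in any directory.
    script_name = parsed.path.rsplit("/", 1)[-1]
    # parse_qs maps each query parameter to a list of its values.
    actions = parse_qs(parsed.query).get("action", [])
    return script_name == "getscript.php" and "retrieve" in actions
```

Anything else (thread pages, member lists, and so on) would be crawled for links only.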
Is this straightforward to set up?
I suppose I could achieve the same result by pre-scanning my database to generate the search configuration dynamically, telling Zoom exactly which script URLs to index.
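A rough sketch of that pre-scan idea, assuming the indexer can be pointed at a plain text file with one URL per line (I have not confirmed how Zoom wants this supplied). The `docid` parameter, the example.com host, the output file name, and both function names are hypothetical; the real query string after action=retrieve would go in the template, and the ids would come from a query against the forum database.

```python
from urllib.parse import quote

# Hypothetical URL pattern: "docid" is a made-up parameter name standing in
# for whatever really follows action=retrieve on the live site.
URL_TEMPLATE = "https://example.com/getscript.php?action=retrieve&docid={doc_id}"


def build_url_list(doc_ids, url_template=URL_TEMPLATE):
    """Return one retrieval URL per document id, skipping duplicate ids."""
    seen = set()
    urls = []
    for doc_id in doc_ids:
        if doc_id in seen:
            continue
        seen.add(doc_id)
        # Percent-encode the id so unusual characters cannot break the URL.
        urls.append(url_template.format(doc_id=quote(str(doc_id), safe="")))
    return urls


def write_url_list(path, urls):
    """Write the URLs to a plain text file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Placeholder ids; in practice these would be read from the database.
    example_ids = [101, 102, 102, 205]
    write_url_list("zoom_start_urls.txt", build_url_list(example_ids))
```

The downside is that the list has to be regenerated whenever documents are added or removed, which is why I'd prefer a filter-based setup if one exists.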
Is there another way?
Thanks