"Index Only" is not limiting indexing to only those PDFs I want to index

  • "Index Only" is not limiting indexing to only those PDFs I want to index

    I have a folder on my website with about 400 PDFs. I have set up several different configurations, each of which has a different list of URLs that are set to [Index Only]. When I index using each configuration and then do a test search in those areas, all PDFs in the folder are being indexed and returned in results, not just the ones I have defined in the configuration.

    Each configuration writes its results to a different folder on my website, and each search page links to the folder that contains the index files from its own scan.

    My questions: Am I misunderstanding how this is supposed to work? I only want to return results for the URLs I set to be scanned with the [Index Only] setting. Am I doing that correctly?

    Thanks

  • #2
    If you just want to index a collection of PDF files, and you have this collection on your computer (or you can download it to your computer), then it would make more sense to use Offline Mode and simply index that folder. Then you wouldn't be susceptible to issues with crawling and link following, etc.
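    If the PDFs are on your computer, one way to give each configuration its own subset in Offline Mode is to copy each subset into a separate folder and point that configuration at it. Below is a minimal Python sketch under that assumption. The function `stage_pdfs` and the folder names are hypothetical illustrations and are not part of Zoom.

```python
import shutil
from pathlib import Path


def stage_pdfs(source_dir, wanted_names, target_dir):
    """Copy only the listed PDFs from source_dir into target_dir.

    Hypothetical helper (not part of Zoom). Returns the sorted list of copied
    file names and raises FileNotFoundError if any listed file is missing,
    so typos in the list show up before indexing.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    # Check every requested file first so a bad list copies nothing.
    missing = [name for name in wanted_names if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {source}: {missing}")
    target.mkdir(parents=True, exist_ok=True)
    for name in wanted_names:
        # copy2 also preserves file timestamps.
        shutil.copy2(source / name, target / name)
    return sorted(set(wanted_names))
```

    For example, `stage_pdfs("all_pdfs", ["a.pdf", "c.pdf"], "config1_pdfs")` would produce a folder containing only those two files. You would then index `config1_pdfs` in Offline Mode for the first configuration, and repeat the process for the other configurations.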

    Having said that, given your description, there's probably something else happening that you're not aware of. In theory, it should work fine. But because you're indexing in Spider Mode, you depend on how your web server responds to each URL request. For example, one of these URLs might be wrong or require authentication, which could cause your web server to redirect to a different page that links elsewhere. The "Index only" setting should stop further crawling from that page, but it's hard to predict what else might be amiss with your settings.
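    To illustrate the difference between an "index only" start URL and one whose links are followed, here is a toy Python model of a spider. This is a hypothetical sketch and not Zoom's actual crawler. The `SITE` map, `fetch`, `crawl`, and the mode names are all assumptions made for the example. It shows two things. First, an index-only URL that redirects causes the redirect target to be indexed instead, with no further crawling. Second, a single start URL in "follow" mode can pull in every PDF in the folder.

```python
from collections import deque

# Hypothetical in-memory "web server": each URL is either a redirect or a page
# with outgoing links. Illustrative data only.
SITE = {
    "/pdfs/a.pdf": {"links": []},
    "/pdfs/b.pdf": {"redirect": "/login.html"},  # e.g. requires authentication
    "/login.html": {"links": ["/pdfs/", "/index.html"]},
    "/pdfs/": {"links": ["/pdfs/a.pdf", "/pdfs/b.pdf", "/pdfs/c.pdf"]},
    "/pdfs/c.pdf": {"links": []},
    "/index.html": {"links": []},
}


def fetch(url, max_redirects=5):
    """Follow redirects; return (final_url, page), or (url, None) on failure."""
    for _ in range(max_redirects + 1):
        page = SITE.get(url)
        if page is None:
            return url, None  # like a 404
        if "redirect" not in page:
            return url, page
        url = page["redirect"]
    return url, None  # too many redirects


def crawl(start_urls):
    """start_urls: list of (url, mode), mode being 'index_only' or 'follow'."""
    indexed, seen = [], set()
    queue = deque(start_urls)
    while queue:
        url, mode = queue.popleft()
        final_url, page = fetch(url)
        if page is None or final_url in seen:
            continue
        seen.add(final_url)
        indexed.append(final_url)  # the page actually served gets indexed
        if mode == "follow":  # only 'follow' pages have their links queued
            for link in page["links"]:
                queue.append((link, "follow"))
    return indexed
```

    In this model, `crawl([("/pdfs/a.pdf", "index_only"), ("/pdfs/b.pdf", "index_only")])` indexes `/pdfs/a.pdf` and `/login.html` but not `b.pdf` itself. Setting `/pdfs/b.pdf` to "follow" instead ends up indexing the folder listing along with `a.pdf` and `c.pdf`. A saved index log is the place to check which of these situations actually occurred.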

    I presume each of the URLs points to a single PDF? So you have 400 start URLs specified?

    If you want us to look into it further, email us the .zcfg configuration file along with a saved index log from a previous indexing session, and we can verify what's going on. If the search index/page is live somewhere, give us the URL and we can take a look at that too.

    But as mentioned, Offline Mode should really be a more straightforward way to approach this scenario.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine