PDF searches not working


  • PDF searches not working

    I have added the pdf plug-in to Zoom Professional, and it shows up as loading properly. I have checked all the appropriate settings to make sure it is set to scan PDFs; however, not all my PDFs get indexed.

    When I do an index of my full site, a couple of PDFs do get processed, but I have more than just those two in my site.

    I have removed password protection on them and removed any special characters in the filename to see if this caused it, but nothing works.

    I am running Version 4.0.1016 (Professional) on Windows XP. I have checked the log (with Verbose on) and I don't show any PDFs being excluded or failing. They just don't show up.

    Are there some additional settings for PDFs that I am missing, or are there specific settings in the PDF files that may be blocking the scan?
    Mark

  • #2
    If a few of your PDFs are being indexed, then there is a good chance that the PDF plug-in install and the PDF configuration in Zoom have been done correctly.

    You don't see any error messages or skipped-file messages. So the most likely explanation is that you are using Spider mode in Zoom and the spider never finds the PDF documents, because it doesn't find any links to the documents.

    Can you try adding a link to a missing PDF file from your start page to see if that corrects the problem?

    -----
    David


    • #3
      I tried linking to the document, but that still didn't work. I even created a bare-bones HTML file that only had direct links to the PDFs and it still didn't find them.

      Is there another way to index them? You said that spider mode rarely finds them. Should I index another way? And why aren't my direct links picked up either?
      Mark


      • #4
        The direct links should work, unless you are indexing a cached copy. You can prevent this by clicking on "Configure" -> "General" and checking the box for "Reload all files (do not use cache)".

        Spider mode has no problem finding PDFs, so it is not that it "rarely finds them". However, it is common for some websites to be spider-unfriendly and not provide valid HTML links to certain documents (which is what a spider relies on to crawl a website). Common spider-unfriendly elements include JavaScript menus or links. For more information, see this FAQ:
        http://www.wrensoft.com/zoom/support...spider_finding
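
To illustrate why plain HTML links matter, here is a minimal Python sketch of the general idea. It is not how Zoom's spider is implemented; the class name `PdfLinkCollector`, the function `find_pdf_links`, the sample markup, and the `example.com` base URL are all hypothetical. It collects only `href` targets of ordinary `<a>` tags, which is roughly what a link-following crawler can see, so a PDF referenced only from a `javascript:` link or an `onclick` handler goes unnoticed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collects PDF targets found in plain <a href="..."> links only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # onclick handlers on other elements are invisible here
        href = dict(attrs).get("href")
        if not href or href.lower().startswith("javascript:"):
            return  # script-driven links carry no crawlable URL
        absolute = urljoin(self.base_url, href)
        # Ignore any query string when checking the file extension.
        if absolute.lower().split("?", 1)[0].endswith(".pdf"):
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return absolute URLs of PDFs reachable through plain HTML links."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_links


# Hypothetical page: one plain link and two script-driven references.
SAMPLE_HTML = """
<a href="docs/report.pdf">Report</a>
<a href="javascript:openDoc('docs/hidden.pdf')">Hidden</a>
<span onclick="location='docs/menu.pdf'">Menu item</span>
"""

if __name__ == "__main__":
    for link in find_pdf_links(SAMPLE_HTML, "http://example.com/"):
        print(link)
```

Running a check like this against a start page is one way to see whether the missing PDFs are linked in a form a crawler can follow at all.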

        The alternative to spider mode is Offline mode. This allows you to index the files off a local hard disk, where the Indexer can locate all files under a certain folder (and all sub-folders).
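
As a rough illustration of the folder-based approach described above (again a sketch, not Zoom's own code; the function name `list_pdfs` and the example path are hypothetical), the following Python snippet lists every PDF under a folder and all of its sub-folders. Comparing such a list with the files that appear in the index log can help spot documents that were never picked up.

```python
from pathlib import Path


def list_pdfs(root):
    """Return every .pdf file under root and its sub-folders, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Hypothetical site folder; replace with the real local path.
    for pdf in list_pdfs("."):
        print(pdf)
```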

        For more information about the differences between Offline and Spider mode, refer to the Users Guide:
        http://www.wrensoft.com/zoom/usersguide.html
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          If you could post the URL to your bare-bones HTML file, that would be helpful.

          ----
          David


          • #6
            I turned off caching and did an offline index run (I have direct access to the server) and everything worked well. I would have posted the URL, but the files are password protected and contain sensitive material for our client, so I couldn't provide access outside of the company.

            Thanks for all the information! Problem solved.
            Mark
