Zoom Professional Not Indexing PDF Content

  • Zoom Professional Not Indexing PDF Content

    Is there anything special one needs to do in order to have the content of PDF files indexed? I have the .pdf extension specified in the search configuration, have the Zoom Indexer plugins installed...but when I search for something contained in a PDF it doesn't always work.

    Maybe there are types of PDFs that are supported and some types that are not?

    THANKS.
    _______________
    kyler boudreau
    hvacwebsite.com

  • #2
    Zoom converts PDF files to plain text and indexes all words found in the entire PDF or DOC document. Images, diagrams, graphs, etc. will, however, not be indexed.

    See also this FAQ:
    Q. Why can't I find words from my scanned PDF files? (PDFs created from scanning in physical documents)

    It is also possible that the PDF is encrypted (in which case you can enter the decryption password in Zoom).

    Or maybe the spider isn't finding your PDF files at all? In that case, see these FAQs:
    Q. Why are links in my Javascript menus being skipped?
    Q. I am indexing with spider mode but it is not finding all the pages on my web site

    If you still have a problem, please post the URL of the site, the PDF file, your search function and let us know the search word used.


    • #3
      Thanks for the response!

      This site isn't using any Flash or JavaScript. There is a literature page that has a list of PDF links. Here are the details:

      site: http://thermostatsusa.com/literature.asp
      pdf: http://thermostatsusa.com/pdfs/c11ns_submittal.pdf
      search function: all words
      search words tried: s1-thec11ns, hydronic, sleek styling

      I've tried another PDF on this page and it didn't work either. Kinda strange...if you look at my home page (http://www.thermostatsusa.com) you'll see a direct html link to the literature.asp page which then has the direct PDF links.

      Thanks for the help!
      _______________
      kyler boudreau
      hvacwebsite.com


      • #4
        I tried indexing the PDF file in question, and all the search words you mentioned were found.

        The problem is that Zoom is configured to not index the PDF files at all. You should be able to see this in the index log.

        As described in this FAQ, you should enable "Verbose Mode" if you need to see why certain pages are not indexed. It will display the reason why certain links are not being scanned.

        When I attempt to spider from the following URL with Verbose Mode:
        http://thermostatsusa.com/literature.asp

        I see a list of PDF skipped messages such as the following:

        10:44:39 - [SKIPPED] Skipping http://thermostatsusa.com/pdfs/c11ns_submittal.pdf (Blocked by robots.txt)
        10:44:39 - [SKIPPED] Skipping http://thermostatsusa.com/pdfs/c11p5s_submittal.pdf (Blocked by robots.txt)
        10:44:39 - [SKIPPED] Skipping http://thermostatsusa.com/pdfs/c11p5s_manual.pdf (Blocked by robots.txt)
        10:44:39 - [SKIPPED] Skipping http://thermostatsusa.com/pdfs/c11p5s_installation.pdf (Blocked by robots.txt)

        And as expected, when I checked the robots.txt file on your site, this is what I saw:

        # FULL access (All Spiders)
        User-agent: *
        Disallow: /stats
        Disallow: /images
        Disallow: /pdf
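
        This means that the "robots.txt" file on your site is explicitly telling all spiders to not index your PDF files. Disallow rules are matched as path prefixes, so "Disallow: /pdf" also covers everything under your /pdfs/ folder. So it is little surprise that Zoom has also ignored your PDF files.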

        You can configure Zoom to not obey the "robots.txt" file (on the "Scan Options" tab, uncheck the option that is labelled "Enable 'robots.txt' support"). Or you can change your robots.txt file so that Zoom is allowed to access the PDF folder.
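
To check how a given robots.txt treats a URL before re-indexing, something like the following sketch may help. It uses Python's standard `urllib.robotparser` rather than Zoom itself, so it only approximates how a well-behaved spider reads the rules. The user agent name `ExampleSpider` and the edited robots.txt (with the `/pdf` rule removed) are assumptions made for illustration, not Zoom's actual user agent or a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as quoted in this thread.
ORIGINAL_ROBOTS_TXT = """\
# FULL access (All Spiders)
User-agent: *
Disallow: /stats
Disallow: /images
Disallow: /pdf
"""

# Hypothetical edited version: the "/pdf" rule is removed so the
# PDF folder can be crawled. One possible change, not the only one.
EDITED_ROBOTS_TXT = """\
# FULL access (All Spiders)
User-agent: *
Disallow: /stats
Disallow: /images
"""

PDF_URL = "http://thermostatsusa.com/pdfs/c11ns_submittal.pdf"
PAGE_URL = "http://thermostatsusa.com/literature.asp"


def is_allowed(robots_txt, url, user_agent="ExampleSpider"):
    """Return True if the rules in robots_txt let user_agent fetch url.

    "ExampleSpider" is a placeholder name; it falls under the
    "User-agent: *" section because no section names it explicitly.
    """
    parser = RobotFileParser()
    # Parse the rules from text instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, rules in (("original", ORIGINAL_ROBOTS_TXT),
                         ("edited", EDITED_ROBOTS_TXT)):
        for url in (PDF_URL, PAGE_URL):
            print(label, url, "allowed:", is_allowed(rules, url))
```

With the original rules the PDF URL is reported as not allowed, because its path `/pdfs/...` starts with the disallowed prefix `/pdf`, while `literature.asp` is still allowed. With the edited rules both are allowed.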
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          oh....well guess I'm the moron. THANKS for taking the time to help me fix this!

          Have already told another web developer about your product - love it so far.

          Thanks again for the help.
          _______________
          kyler boudreau
          hvacwebsite.com
