PDF Plugin Installed, but PDFs not indexed


  • PDF Plugin Installed, but PDFs not indexed

    We just installed our PDF plugin today and have finally gotten things indexing, but not the PDF files. There are plenty linked from all of our pages so they should be picked up by the spider just fine. Is there something else we need to do besides adding them as one of the formats to search? Thanks.

  • #2
    The plugin needs to be installed.
    .pdf needs to be added as a file extension to scan (in the scan options config window).

    If you are using offline mode this is enough. If you are using spider mode then you need to have HTML pages with links to the PDF files.

    Check the log; you might find some other issues there. For example, a robots.txt file may be blocking access to the files, or the file size may be larger than the limits you set in the limits configuration window.
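
    As a rough way to test those two causes outside the indexer, here is a minimal Python sketch. It is not part of Zoom and does not reproduce its internal checks; the robots.txt content, the URLs, the size limit, and the function name `skip_reason` are all hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forms/
"""

# Assumed size limit; substitute whatever you configured in the indexer.
MAX_FILE_BYTES = 2 * 1024 * 1024


def skip_reason(url, size_bytes, robots_txt=ROBOTS_TXT,
                max_bytes=MAX_FILE_BYTES, user_agent="*"):
    """Return why a file might be skipped, or None if neither check trips."""
    parser = RobotFileParser()
    # Parse the rules from a string so no network access is needed.
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        return "blocked by robots.txt"
    if size_bytes > max_bytes:
        return "file larger than size limit"
    return None


if __name__ == "__main__":
    print(skip_reason("http://www.example.com/forms/form.pdf", 1000))
    print(skip_reason("http://www.example.com/docs/big.pdf", 3 * 1024 * 1024))
    print(skip_reason("http://www.example.com/docs/small.pdf", 1000))
```

    If either check reports a reason for one of your PDFs, look for the matching skip message in the indexer log.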



    • #3
      It's installed, and PDF is added as an extension to index. It also says in the log that the PDF plugin is there and PDF file support is enabled.

      I only have about 8 HTML files but there are about 50 PDFs linked from those pages that should be picked up by the spider. I don't have any major size limitations (2MB or smaller file size is all).

      The only error I don't understand in the log is that it failed to open the zoom.ini file. Would that have something to do with it not being able to find any PDFs?



      • #4
        Actually, it looks like it's skipping all of the PDFs and says "External site - does not match base URL." These are all linked relatively from the page (e.g. ../../forms/form.pdf). Is it viewing these as an external site? Is there any way to get it to recognize these links as internal? Thanks.



        • #5
          Never mind, I found this one in the support section. My starting point was too specific, so the PDF links weren't matching the base URL. Thanks for your help.



          • #6
            Right. Yes, you can do that by adjusting the Base URL. Click on "More" next to your spider start URL, then click on "Edit" for that start point. You can override the pre-determined Base URL there. Click the "Help" button for some examples and instructions.

            If you can tell us what the URLs of the PDFs are (e.g. give us the full skipped message) and also tell us what your current start URL and base URL are, we can probably tell you what you need to change the base URL to.
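
            To see why a relative link such as ../../forms/form.pdf can end up outside the base URL, here is a small Python sketch. It assumes the match is a simple "resolved URL starts with base URL" prefix test, which is a guess for illustration and not Zoom's actual code; the example.com URLs and the function name `classify_link` are hypothetical.

```python
from urllib.parse import urljoin


def classify_link(page_url, href, base_url):
    """Resolve href against the page it appears on, then compare to base_url.

    Assumption: a link counts as internal when its absolute URL begins
    with base_url (plain prefix match).
    """
    absolute = urljoin(page_url, href)
    status = "internal" if absolute.startswith(base_url) else "external"
    return absolute, status


if __name__ == "__main__":
    page = "http://www.example.com/dept/info/index.html"  # hypothetical page
    link = "../../forms/form.pdf"

    # A base URL derived from a very specific start point: the PDF falls outside it.
    print(classify_link(page, link, "http://www.example.com/dept/info/"))

    # A broader base URL covers the folder the PDFs live in.
    print(classify_link(page, link, "http://www.example.com/"))
```

            Under that assumption, the ../../ steps climb above the start folder, so the resolved URL no longer begins with the narrow base URL and only the broader one matches.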

            EDIT: Guess I hit "Post" too late (got held up on the phone). Glad you sorted it out!
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine

