Search text within PDF


  • Search text within PDF

    Hi guys, may I know whether Zoom Search can search the text within PDF files? The Wrensoft website states that it can search PDF files. Does it search the content within the PDF, or just search for the PDF file type? Thanks in advance.

  • #2
    Zoom (with the PDF plugin installed) will search through all the text WITHIN the PDF files, and not just the filename or filetype. It can also index the meta information such as title, description, author, keywords, etc.

    You will need one of the registered editions (Standard, Pro or Enterprise) to use plugins with Zoom.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine


    • #3
      Originally posted by Ray
      Zoom (with the PDF plugin installed) will search through all the text WITHIN the PDF files, and not just the filename or filetype. It can also index the meta information such as title, description, author, keywords, etc.

      You will need one of the registered editions (Standard, Pro or Enterprise) to use plugins with Zoom.
      I have unzipped the pdf plugin into the plugins folder but am having trouble indexing pdfs in a "newsletters" directory. The site navigates to these newsletters via a php page that opens an html page in an iframe. The html page has a dropdown containing the pdfs.

      http://havasuchamber.com/inner.php?newsletters.html

      A couple other pdfs on the site ARE indexed... just not the ones in the newsletter directory.

      These files are at http://havasuchamber.com/newsletters/

      Is there a way to FORCE files in a certain directory that is missed? Similar to "skip pages" but opposite?


      • #4
        Originally posted by timhodge
        I have unzipped the pdf plugin into the plugins folder but am having trouble indexing pdfs in a "newsletters" directory. The site navigates to these newsletters via a php page that opens an html page in an iframe. The html page has a dropdown containing the pdfs.
        Your problem is due to your reliance on JavaScript menus to link to the PDF files. The spider cannot follow links generated this way. This problem is explained in this FAQ:
        Q. I am indexing with spider mode but it is not finding all the pages on my web site

        Originally posted by timhodge
        These files are at http://havasuchamber.com/newsletters/

        Is there a way to FORCE files in a certain directory that is missed? Similar to "skip pages" but opposite?
        In your case, you should be able to simply add http://havasuchamber.com/newsletters/ as an additional start point by clicking on the "More" button in Spider Mode. This will ask the spider to crawl the newsletters folder after indexing your original start point. Since you have directory listing enabled on your server, it will find links to the other PDF files.

        For other users (or other sites in the future) where directory listing is not enabled, you may add the URL of each file you wish to index as an individual start point. This essentially works like an "opposite skip list" and lets you specify files that must be indexed even when the spider fails to find links to them.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
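
For anyone who needs to build that list of individual start points by hand, here is a minimal Python sketch of one way to do it. It is not part of Zoom and does not use any Zoom API. It only takes the HTML of a directory listing page (which you save or fetch yourself) and prints the absolute URL of every PDF link, ready to paste in as start points. The function name `pdf_start_points`, the class `LinkCollector`, and the sample file names `jan.pdf` and `feb.pdf` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pdf_start_points(listing_html, base_url):
    """Return absolute URLs of all PDF links found in listing_html.

    base_url is the address of the listing page itself, used to
    resolve relative links. Duplicates are dropped, order is kept.
    """
    collector = LinkCollector()
    collector.feed(listing_html)
    urls = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        if absolute.lower().endswith(".pdf") and absolute not in urls:
            urls.append(absolute)
    return urls


if __name__ == "__main__":
    # Hypothetical listing content; the file names are invented.
    sample = (
        '<a href="/">Parent Directory</a>'
        '<a href="jan.pdf">jan.pdf</a>'
        '<a href="feb.pdf">feb.pdf</a>'
    )
    for url in pdf_start_points(sample, "http://havasuchamber.com/newsletters/"):
        print(url)
```

Each printed URL can then be added as an extra start point via the "More" button in Spider Mode, as described above.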
