no text extracted from pdf


  • no text extracted from pdf

    Hello. I am using this software to index an intranet that is essentially a file server (very few web pages). I am getting confusing results, so I am currently indexing only the bare-bones filename and page title (which I assume, for a document, means the document title). I get an error that says "No text content extracted from PDF file", which makes no sense to me. If the configuration is set to index only the Page Title and Filename, why would it be trying to extract text?

  • #2
    To get the page title, the text from the PDF file still needs to be extracted. From memory, we attempt a full conversion of the PDF to text even if only the title is required. If the document contains no text (e.g. it consists only of images), you'll see this warning.
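
To see in advance which files are likely to trigger this warning, a rough check along the following lines may help. This is only a sketch and not how the indexer itself works: it assumes Poppler's `pdftotext` command-line tool is installed, and the function names `pdftotext_extract` and `find_textless_pdfs` are made up for illustration. A file is flagged when the extracted text is empty or whitespace only.

```python
import subprocess
from typing import Callable, Iterable, List


def pdftotext_extract(path: str) -> str:
    """Extract text with Poppler's pdftotext CLI (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def find_textless_pdfs(
    paths: Iterable[str],
    extract: Callable[[str], str] = pdftotext_extract,
) -> List[str]:
    """Return the paths whose extracted text is empty or whitespace only.

    The extractor is pluggable so that any other PDF-to-text tool can be
    substituted for the pdftotext-based default.
    """
    # str.strip() also removes form feeds, which page-based extractors
    # commonly emit between pages even when the pages contain no text.
    return [p for p in paths if not extract(p).strip()]
```

Files reported by this check are probably image-only (for example, scanned) and would need OCR before any text or title can be indexed.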



    • #3
      If your PDFs are created from scanned paper documents, you may want to take a look at this FAQ:
      Q. Why can't I find words from my scanned PDF files? (PDFs created from scanning in physical documents)
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        It is probably because your PDF file is a scanned PDF, which cannot be indexed as-is. You need to convert it into a normal, text-based PDF using OCR Terminal or another OCR application.

