Hello. I am using this software to index an intranet that is essentially a file server (very few web pages). I am getting confusing results, so I am currently indexing only the bare-bones filename and page title (which, for a document, I assume means the document title). I get an error that says "No text content extracted from PDF file", which makes no sense to me. If the configuration is set to index only the page title and filename, why would it be trying to extract text?
no text extracted from pdf
To get the page title, the text from the PDF file still needs to be extracted. From memory, we attempt a full conversion of the PDF (to text) even if only the title is required. If there is no text in the document (e.g. it is just full of pictures), then you'll see this warning.
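If you want to find out which of your PDFs have no text layer before indexing, a rough check along these lines may help. This is only a sketch and not how the indexer itself works: it assumes the third-party `pypdf` package is installed, and the helper names `has_text_layer` and `pdf_page_texts` are made up for illustration.

```python
from typing import Iterable, List


def has_text_layer(page_texts: Iterable[str], min_chars: int = 1) -> bool:
    """Return True if the extracted page texts contain at least
    `min_chars` non-whitespace characters in total."""
    total = sum(len("".join(text.split())) for text in page_texts if text)
    return total >= min_chars


def pdf_page_texts(path: str) -> List[str]:
    """Extract text from each page of a PDF.

    Assumption: pypdf is installed (pip install pypdf). It is imported
    here so the check above can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty string for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def is_probably_scanned(path: str) -> bool:
    """Hypothetical helper: True if no text could be extracted at all."""
    return not has_text_layer(pdf_page_texts(path))
```

Files for which `is_probably_scanned` returns True are likely image-only (for example, scanned paper documents) and would be expected to trigger the "No text content extracted" warning.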
If your PDFs are created from scanned paper documents, you may want to take a look at this FAQ:
Q. Why can't I find words from my scanned PDF files? (PDFs created from scanning in physical documents)