[PDF plugin error] Failed to read or parse PDF file. File may require a password.

    Since build 1015 I am getting errors like the above for PDF files which are *not* password-protected and are not even in a login-protected section of our website.
    These files can easily be downloaded via Firefox and opened with PDF-Xchange Editor. Like the previously reported 406 errors, these errors generally occur for different files on successive runs of Zoom Search Indexer: a file which triggers the error on one run can be indexed without problems on a later run. Since I depend on JavaScript, I also depend on a complete, error-free indexing run. This can take many repetitions, each lasting hours, because I have to use single-threading and the longest possible delay between pages to minimize these errors.

  • #2
    I tried running the indexer offline on a backup copy of the website and got similar errors on PDF files which can easily be opened by copying the path from the ZSE log and pasting it into the Windows "Run" dialog. In one case the file really couldn't be found, because ZSE had modified the filename, replacing "ü" (u with umlaut) with "u?". However, there were two other errors for files which can be opened without a password.

    P.S. The index generated offline was unusable in any case. When it was added to the website, the search results referenced pages which didn't contain the search term.
    Last edited by imcz; 06-28-2021, 11:44 AM.


    • #3
      There are several different ways PDF files can be protected. They can be encrypted with a password, but they can also be flagged to prevent printing and text extraction (but still allow viewing).
      See example below from Adobe Acrobat.

      [Screenshot: PDF-Page-Extraction.png, document security settings in Adobe Acrobat]

      Other PDF files might have no text in them (i.e. no OCR layer) and just be a photograph or a scan.

      If you think you have a different case, can you post a link to an example file?
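A file can open without prompting for a password and still forbid text extraction: this happens when it is encrypted with only an owner password, and the permissions integer (the /P entry of the encryption dictionary) has the "copy or otherwise extract text and graphics" bit cleared. The sketch below decodes such a /P value using the 1-based bit positions from the PDF specification's user access permissions table. It assumes the integer has already been read from the file with some PDF library (not shown), and the helper names are hypothetical. Whether Zoom checks bit 5, bit 10, or both is not stated in the thread.

```python
# Hypothetical helpers: decode the /P (user access permissions) integer from a
# PDF encryption dictionary. Reading /P out of the file is assumed to happen elsewhere.

# 1-based bit positions, as listed in the PDF 1.7 (ISO 32000-1) permissions table
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "extract_text_and_graphics": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Return which operations a /P value allows."""
    unsigned = p & 0xFFFFFFFF  # /P is stored as a signed 32-bit integer
    return {
        name: bool(unsigned & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


def allows_text_extraction(p: int) -> bool:
    """True if bit 5 (copy or otherwise extract text and graphics) is set."""
    return decode_permissions(p)["extract_text_and_graphics"]


if __name__ == "__main__":
    for p in (-4, -20):
        # -4 sets every permission bit; -20 clears bit 5 but keeps printing allowed
        print(p, allows_text_extraction(p))
```

A value of -4 grants everything; -20 differs only in bit 5, which is exactly the "viewable and printable but not extractable" case described above.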


      • #4
        Dear David,
        Thanks for responding. Unfortunately, I once again failed to receive (even in my spam folder) an e-mail notification of your response, even though I am subscribed to this thread.
        Here is a link to a document which has "no security" (at least according to PDF Xchange Editor) and still fails to index:
        PDF Security properties of non-indexed file
        It is a regular authored PDF with embedded text, not a scan.
        I have attached the log file, which also shows the DLL load error which I reported in another thread.
        To answer your question there, the error occurs consistently (every time).
        Thanks for your support.

        P.S. Here is another link to a PDF with no security which provokes the same error.
        Last edited by imcz; 07-16-2021, 12:07 PM.


        • #5
          Adobe says something different for the same document.

          [Screenshot: No-page-extraction-pdf.png, Adobe Acrobat security properties for the same document]


          • #6
            Sorry, my bad. I misread the filename, so the link I provided points to a different document (202107.pdf) from the one causing the error (202102.pdf). I can confirm your analysis of 202102.pdf. The error caused by the file in the second (P.S.) link seems to be due to a discrepancy between the filename in the wget backup we are scanning (we scan a backup because of the notorious 406 errors) and the link ZSE is using.
            ZSE is looking for "Kunstfu?hrung, flyer 2020_english_Webversion.pdf", but the file is really named "Kunstführung, flyer 2020_english_Webversion.pdf".
            I'm not yet sure whether the problem lies with ZSE or with wget. If you have any ideas, please let me know.
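One possible explanation, not confirmed in the thread: "u?" is what a decomposed (Unicode NFD) "ü", that is "u" followed by the combining diaeresis U+0308, looks like when it is written to a log or console that cannot represent the combining mark, while the file on disk uses the precomposed (NFC) form U+00FC. The standard-library Python sketch below shows that the two forms are different strings and produce different percent-encoded URLs, and that normalizing both to NFC makes them compare equal. The filename comes from the post; the helper names are hypothetical.

```python
import unicodedata
from urllib.parse import quote

# Filename from the post, written with the precomposed "ü" (U+00FC, NFC form)
on_disk = "Kunstf\u00fchrung, flyer 2020_english_Webversion.pdf"


def to_nfc(name: str) -> str:
    return unicodedata.normalize("NFC", name)


def to_nfd(name: str) -> str:
    return unicodedata.normalize("NFD", name)


def ascii_view(name: str) -> str:
    """Mimic a non-Unicode log: each unencodable code point becomes '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")


def names_match(a: str, b: str) -> bool:
    """Compare two filenames after normalizing both to NFC."""
    return to_nfc(a) == to_nfc(b)


if __name__ == "__main__":
    decomposed = to_nfd(on_disk)
    print(ascii_view(decomposed))            # Kunstfu?hrung, flyer 2020_english_Webversion.pdf
    print(decomposed == on_disk)             # False: different code point sequences
    print(names_match(decomposed, on_disk))  # True once both are NFC
    print(quote(to_nfc("\u00fc")), quote(to_nfd("\u00fc")))  # %C3%BC u%CC%88
```

If this is the cause, the sketch does not tell you which tool (wget, the web server, or the indexer) produced the decomposed form; it only shows how to detect and compare the two forms.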


            • #7
              Hi, I have addressed this issue in your email, and we will follow up there so we don't end up discussing this in multiple places.
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine