Questions about searching PDF files

  • Questions about searching PDF files

    I am interested in purchasing Zoom for my small commercial site. It appears that there is no way of testing how Zoom works with PDF files, as the free version does not support plugins. Is this correct? (My site has 2000-3000 PDF files that I am interested in making searchable. I realize that those that were created graphically will not be searchable unless I run them through OCR.)

    What is displayed in the search results list for a PDF file? A PDF file has no title or description to display.

    Thank you very much.

    Robin Miller

  • #2
    Hello

    Good question!

    ... Is there a difference between older and newer versions of PDF files?

    thx



    • #3
      You can give a PDF file a title and description using something like Acrobat or other PDF creating programs. Not sure what is meant by old and new, but I have PDF files created years ago and they work as well as new ones.

      The search result is the same as a web page.

      Bob
      Robert Isaac
      Volvo Owners Club



      • #4
        Bob, thank you for your response. However, for a PDF file that has no title or description, I don't see how the search results can display in the manner shown in all the screenshots on Wrensoft's website. Each entry in the search results appears to have three components: the title, the description, and the snippet of the file containing the search words. In looking through the user's manual, I do not see that the user can change the fields that are displayed. Therefore my question still is, what is displayed in the title and description fields when there is no title or description information? Blank lines? If so, is there any way to change the fields that are displayed--to show the file name, for instance, rather than the document title?

        Is there a way to create a *.desc file that applies globally to a file type, such as to all PDF files?

        Thanks,
        Robin Miller



        • #5
          Just to explain further: I publish a newsletter that summarizes recent decisions by U.S. courts. Each summary of a court decision has a link to the full text of the decision as released by the court. This is a PDF document that the different courts create in various ways: most are converted from a word processing file, but some are scanned, and various versions of Acrobat are involved. Since there is no way to test the Professional version, I am trying to determine how Zoom will work under these conditions.

          I have scanned the last 40 pages of the forum thread titles. I understand that:

          --Documents created through scanning have no textual content and will not index (or will index with no content)

          --"Jump to match and highlighting in document" will work under certain conditions; these conditions seem to be that the user is using a recent PDF viewer, at least if it's made by Adobe rather than a third party.

          One of the things I don't know--and this is absolutely critical--is what the search results page will look like. If there are no document titles and descriptions to include in the results, will the user see simply a list of document snippets (contexts)? This would probably be useless to me.

          Let me also confirm: There is no way to implement "Jump to match and highlighting in document" for Word files?

          Thanks,
          Robin



          • #6
            "It appears that there is no way of testing how Zoom works with PDF files, as the free version does not support plugins. Is this correct?"
            The free edition doesn't do PDF files. You can E-mail us and request a trial enterprise key however.



            • #7
              "You can give a PDF file a title and description using something like Acrobat or other PDF creating programs."
              In addition to setting the document meta data directly within the PDF file you can also set the meta data using an external .desc file. See the Zoom User's Guide for details.

              If you are using .desc files, then you need a .desc file per document.

              But it is probably better to set the document title in the PDF file itself, using Adobe Acrobat for example. .desc files are useful if you can't edit the PDF for some reason.

              It should also be noted that most PDF files do in fact have a title. Even if the document author doesn't bother to set one. This is because programs like Word automatically guess at a title, and this ends up in the PDF. This guessing does mean that the titles make no sense from time to time however.

              If a title isn't available for a PDF, then the file name will automatically be used.
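
The fallback described above can be sketched roughly as follows. This is only an illustration of the behaviour described in this post, not Zoom's actual code; the function name `display_title` and its parameters are hypothetical.

```python
import os
from typing import Optional


def display_title(file_path: str, meta_title: Optional[str]) -> str:
    """Pick the title to show for a search result (hypothetical helper)."""
    # Use the document's own title when one is set and not blank.
    if meta_title and meta_title.strip():
        return meta_title.strip()
    # Otherwise fall back to the file name, as described above.
    return os.path.basename(file_path)


print(display_title("/decisions/smith_v_jones.pdf", None))
print(display_title("/decisions/smith_v_jones.pdf", "Smith v. Jones"))
```

A title that Word guessed badly would still count as "set" here, so it would be shown as-is rather than replaced by the file name.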



              • #8
                "There is no way to implement "Jump to match and highlighting in document" for Word files?"
                This is correct.



                • #9
                  Thank you very much!

                  I have looked at some of the PDF documents posted to my website, and you're right that many of them have titles, and also that many of the titles aren't useful. I can see, then, that my challenge is to go back and manually edit the titles on these documents. This will be a medium-term project, but I would like to make the documents searchable by my subscribers. I will also need to use OCR to add a text layer to the documents that were created by scanning. Once I've done this, I will certainly use Zoom. It looks like a great website search engine.

                  My remaining questions are (1) what is displayed, if anything, in the search results for a PDF document that has an empty description field in its metadata; and (2) can I suppress display of the document description in the results list?

                  Thanks again,
                  Robin



                  • #10
                    I thought of another question, and I certainly appreciate your time. With respect to Ray's information on searching metadata fields in PDF documents:

                    http://www.wrensoft.com/forum/showthread.php?t=4344

                    Can this process be employed to search any of the metadata fields in a PDF document?

                    Can the website user leave the text search box empty and search only the metadata field(s)?

                    Thanks again,
                    Robin



                    • #11
                      Hi Robin,

                      Originally posted by cmplxgal:
                      My remaining questions are (1) what is displayed, if anything, in the search results for a PDF document that has an empty description field in its metadata; and (2) can I suppress display of the document description in the results list?
                      If there is no meta description for any result (PDF or HTML or otherwise), then only the context description is shown (the part of the content where the matched word was found).

                      If context description is not enabled (such as when using the JavaScript platform), then a portion of the page content is extracted from the beginning of the document.
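
As a rough sketch of the order described above (an illustration only, not Zoom's implementation): the name `result_body`, its parameters, and the 150-character default are all made up for the example. It also assumes that the start-of-document extract is only used when there is no meta description.

```python
from typing import List, Optional


def result_body(meta_description: Optional[str],
                context: Optional[str],
                content: str,
                context_enabled: bool = True,
                max_chars: int = 150) -> List[str]:
    """Return the lines shown under a result's title (hypothetical helper)."""
    lines: List[str] = []
    # The meta description is shown whenever the document has one.
    if meta_description and meta_description.strip():
        lines.append(meta_description.strip())
    if context_enabled:
        # The context is the part of the content where the match was found.
        if context and context.strip():
            lines.append(context.strip())
    elif not lines:
        # No meta description and no context: take the start of the document.
        lines.append(content.strip()[:max_chars])
    return lines


print(result_body(None, "...the court held that...", "Full text of the decision"))
```

With no meta description and context enabled, only the context snippet comes back, which matches the first case above.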

                      Originally posted by cmplxgal:
                      I thought of another question, and I certainly appreciate your time. With respect to Ray's information on searching metadata fields in PDF documents:

                      http://www.wrensoft.com/forum/showthread.php?t=4344

                      Can this process be employed to search any of the metadata fields in a PDF document?
                      Only those which are extracted by the PDF plugin. This should apply for Title, Author, and Subject (which is used as a meta description). You will have to disable them from indexing as noted in the previous thread.

                      Originally posted by cmplxgal:
                      Can the website user leave the text search box empty and search only the metadata field(s)?
                      Yes. You can see how this behaves on our demo "fruit shop" site featuring the Custom Meta Fields features.
                      --Ray
                      Wrensoft Web Software
                      Sydney, Australia
                      Zoom Search Engine



                      • #12
                        Thank you again. I am thrilled to have found an excellent solution for adding a search capability to my newsletter's website.

                        My best,
                        Robin



                        • #13
                          ZoomSearch 5.1 Professional Edition can index PDF files?

                          Hello,

                          I have bought Zoom Search 5.1 Professional Edition and I need now to index also PDF files (but I can see, I cannot)...

                          Is it possible for this Edition to index PDF files? If it is, what should I do to make this possible, and if not, what should I do then?

                          Best Regards,

                          Arben Cokaj



                          • #14
                            "but I can see, I cannot.."
                            Why not?

                            You need to add .PDF to the list of file types to index and have the plug-ins installed.
                            http://www.wrensoft.com/zoom/plugins.html
                            That should be all that is required.



                            • #15
                              Thank you very much, indeed. I tried it and everything was going OK.

                              But I would like to ask, how can I index files that are bigger than 1 MB (1024 KB)? Because I have some PDF files that are bigger and they cannot be downloaded.

                              Is it possible to change the specification of the file size from 1 MB (1024 KB), to something more?

                              Best Regards,

                              Arben

