PDF files not being indexed correctly

  • #1

    I just reinstalled Zoom and indexed the site, but it will not index all the files in the directory. For the 10 files in the directory it only finds 210 unique words, even though these files are 3 to 6 pages long. Do I need to add some program for them to be indexed correctly?

  • #2
    You need to tell us whether you are using (a) offline mode or (b) spider mode, and also whether you are indexing HTML pages, PDF files, etc.

    Also, you should read these FAQs first:
    Q. Why are some of my pages being skipped by the indexer?

    Q. Why are links in my Javascript menus being skipped?

    Q. I am indexing with spider mode but it is not finding all the pages on my web site
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Sorry for not including all the information.
      I am scanning in offline mode, and this directory contains only PDF files, 9 of them.

      I read the linked FAQs and checked all the settings in the configuration; they seem correct.



      • #4
        Did you check the log to determine whether the files are actually being indexed, or whether there was an error (e.g. they are encrypted)?

        Also, do the files really contain text, or are they just images (e.g. scanned-in text)?

        Can you upload one of the files to a location where we can see it?
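A rough way to check whether a PDF contains real text before uploading it is sketched below in Python. It is a heuristic under stated assumptions, not how Zoom itself parses PDFs: it inflates each Flate-compressed stream with zlib, looks for the PDF text-showing operators (BT ... Tj/TJ ... ET), and separately counts image objects (/Subtype /Image). A page that came straight off a copier typically shows images but no text operators. Encrypted files, other compression filters and compressed object streams (PDF 1.5+) are not handled, so a negative result is a hint, not proof.

```python
import re
import sys
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF stream.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects: BT ... (text-showing operator Tj or TJ) ... ET.
TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)
# Image dictionaries; only visible if not inside a compressed object stream.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def decoded_streams(pdf_bytes: bytes):
    """Yield each stream's data, inflated if it is zlib/Flate-compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj keeps trailing bytes (the EOL before "endstream")
            # in unused_data instead of raising.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (uncompressed or another filter): use raw bytes.
            yield raw


def looks_text_based(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream contains a text-showing operator."""
    return any(TEXT_RE.search(data) for data in decoded_streams(pdf_bytes))


def count_images(pdf_bytes: bytes) -> int:
    """Count image objects declared in the uncompressed part of the file."""
    return len(IMAGE_RE.findall(pdf_bytes))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        verdict = "has text operators" if looks_text_based(data) else "no text found (image-only?)"
        print(f"{path}: {verdict}, {count_images(data)} image object(s)")
```

Run it as, e.g., `python check_pdfs.py *.pdf` (the script name is hypothetical). Files reported as image-only are the ones that would need OCR before any indexer can find words in them.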



        • #5
          There are no errors in the log file. Some of our PDFs are scanned by our copier, so they could contain an image inside. Still, 9 documents and only 210 unique words?



          • #6
            "so they could be an image inside"
            Maybe you should check?

            Can you upload one of the offending files to a location where we can see it?



            • #7
              See this FAQ for reference:
              Q. Why can't I find words from my scanned PDF files? (PDFs created from scanning in physical documents)

              Without seeing the files ourselves, we can't be sure whether there really are more than 210 unique words (note that "unique words" is not the same as "total words"; e.g. 10 occurrences of the word "dog" count as only 1 unique word).
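To illustrate the difference between unique and total words, here is a minimal Python sketch. The tokenization rule (runs of letters, digits and apostrophes, compared case-insensitively) is an assumption for illustration only; Zoom's own word-splitting rules may differ.

```python
import re


def word_counts(text: str) -> tuple[int, int]:
    """Return (total_words, unique_words) for a piece of text.

    Assumed tokenization: runs of letters, digits and apostrophes,
    lower-cased so "Dog" and "dog" count as the same word.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return len(words), len(set(words))


if __name__ == "__main__":
    # 10 occurrences of "dog" are 10 total words but only 1 unique word.
    print(word_counts("dog " * 10))  # (10, 1)
    print(word_counts("The dog saw the dog. The dog ran."))  # (8, 4)
```

Because every repeated word collapses into one entry, a set of documents written around a narrow topic can have far fewer unique words than its page count suggests.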
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
