pdf not found

  • pdf not found

    Hi,
    I have just bought Zoom Search Engine V6 Professional. It works great, but I have ONE BIG problem.
    It doesn't find PDF files.
    I have all the plugins, I added all the file extensions, and I added the URL of the folder where the PDFs are (like: http://www.website.com/brochure_online/ ),
    but after this it still doesn't find the PDFs!
    I hope that someone can help me.
    Thanks

    PS: the PDFs are text only, not scanned.
    PPS: sorry for my English!

  • #2
    Can you have a look in the Zoom log (on the tab called log) to see if there were any warnings or errors?

    Are you sure that when you enter the folder path, the PDF files are actually listed? Try this in a browser to check. It might be that the web server is not configured to return a directory listing, and so the PDF files are not found by the spider.
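
The browser check described above can also be scripted. This is a minimal Python sketch and not part of Zoom: the function names `classify_folder_response` and `check_folder` are hypothetical, the messages are illustrative wording, and looking for ".pdf" in the page body is only a rough heuristic for "the PDFs are listed".

```python
import urllib.error
import urllib.request


def classify_folder_response(status: int, body: str) -> str:
    """Describe what a spider would get back from a folder URL."""
    if status == 403:
        return "forbidden: the server refuses to list this folder"
    if status == 404:
        return "not found: the folder URL does not exist"
    if status == 200 and ".pdf" in body.lower():
        # Rough heuristic: the returned page mentions at least one .pdf name.
        return "ok: the page mentions .pdf files a spider could follow"
    if status == 200:
        return "ok, but the returned page mentions no .pdf files"
    return f"unexpected HTTP status {status}"


def check_folder(url: str) -> str:
    """Fetch the folder URL and classify the response (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify_folder_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses such as 403 Forbidden.
        return classify_folder_response(err.code, "")
```

Calling `check_folder("http://www.website.com/brochure_online/")` would then report whether the folder URL returns a listing or is refused by the server.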

    Other reasons might be that the PDFs are encrypted with a password, are corrupt, or don't contain the words you are searching for.

    See also these FAQs:
    Q. Why are some of my PDF files failing to index with a "PDF plugin error"?

    Q. Why can't I find words from my scanned PDF files? (PDFs created from scanning in physical documents)

    Q. I am indexing with spider mode but it is not finding all the pages on my web site

    • #3
      Hi,
      I just had a look in the Zoom log. There aren't any errors, but there are some warnings, all of them about the folder where the PDF files are.
      The warning is:
      "Could not download file: http://www.website.com/brochure_online/ (forbidden)"

      Do you know what it means?
      Also, the file extensions list shows ".pdf indexed: 0",
      which confirms that it doesn't index the PDF files!

      The PDF files are all unlocked and full of text, and if I go to the direct URL (http://www.website.com/brochure_online/file.pdf) I can read the file!

      Thanks for your support. It's really important to me that this works.

      • #4
        As already pointed out,
        "Are you sure that when you enter the folder path, the PDF files are actually listed? Try this in a browser to check. It might be that the web server is not configured to return a directory listing, and so the PDF files are not found by the spider."

        If you get a forbidden error from your server, it means that viewing the listing of this folder on your server is (unsurprisingly) forbidden. To solve the problem, you could allow directory listings on your server, use a directory listing script, or add an index.html page with a list of links to the PDF files.
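
As an illustration of the index.html option, here is a minimal Python sketch that builds such a page from the PDFs in a local copy of the folder. It is an assumption-laden example and not a Zoom feature: the names `build_index_html` and `write_index` and the page title are hypothetical, and the generated file would still need to be uploaded next to the PDFs.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index_html(pdf_names: list[str], title: str = "Brochures") -> str:
    """Return a minimal HTML page that links to each PDF by relative URL."""
    items = "\n".join(
        # quote() percent-encodes the href; html.escape() protects the link text.
        f'  <li><a href="{quote(name)}">{html.escape(name)}</a></li>'
        for name in sorted(pdf_names)
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head><body>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>\n"
    )


def write_index(folder: str) -> Path:
    """Write index.html into `folder`, listing the PDFs found there."""
    root = Path(folder)
    names = [p.name for p in root.iterdir() if p.suffix.lower() == ".pdf"]
    target = root / "index.html"
    target.write_text(build_index_html(names), encoding="utf-8")
    return target
```

With a page like this in place, the folder URL returns real links, so a spider has something to follow even when directory listings stay disabled.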

        • #5
          Another alternative is to index in Offline Mode, if you have these files on your hard disk and you don't need to index any dynamically generated PHP or ASP pages.

          Spider mode relies on the web server providing links through web pages. If you point it at a URL that does not link to the files, it has no way to find them, just as a visitor to your web site can't download all the files that make up the site unless you link to them.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine

          • #6
            OK, that could be a solution!
            But I have a question for you: Zoom doesn't index the filenames (pdf, jpg, swf) in a specific folder. Should I add the folder's URL to the spider URLs as well?
            I ask because I have also checked the filename option on the indexing options tab.
            Thanks for your support

            • #7
              I presume you are still using Spider Mode regarding this question.

              You can only use a folder URL if there is a page returned. For example, a URL like this:
              http://www.mysite.com/foldername/

              If you go to that URL in your browser, you will see what a spider would see. If your web server is configured to return directory listings, that URL will return a generated page with a list of links to all the subfolders and files, in which case the spider can follow the links and crawl the files.

              If your web server is not configured to return a directory listing, such a URL is useless to the spider. A spider has no way to find all the files under that URL, just as a user has no way to find and download everything on your web server if you do not link to it.
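
To make the point about links concrete, here is a minimal Python sketch of the general technique a spider relies on: it can only discover what the returned page links to. This is an illustration, not Zoom's actual implementation; `LinkCollector` and `find_pdf_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "file.pdf" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def find_pdf_links(base_url: str, page_html: str) -> list[str]:
    """Return the absolute URLs of all linked .pdf files on the page."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return [u for u in collector.links if u.lower().endswith(".pdf")]
```

Fed a directory listing page, `find_pdf_links` returns the PDF URLs. Fed an error page with no links, it returns an empty list, which is the situation behind ".pdf indexed: 0".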

              Again, Offline Mode does not have this problem. Because the files are simply on your hard disk, it can find all the contents of the folder.
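
By contrast, finding files on a local disk needs no links at all. The following minimal Python sketch shows the idea of walking a folder directly; it only illustrates the principle and is not how Zoom's Offline Mode is implemented. The name `list_files_by_extension` is hypothetical.

```python
from pathlib import Path


def list_files_by_extension(folder: str, extensions: set[str]) -> list[Path]:
    """Return every file under `folder` (recursively) with a matching extension."""
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        # Compare extensions case-insensitively so ".PDF" matches ".pdf".
        if p.is_file() and p.suffix.lower() in extensions
    )
```

For example, `list_files_by_extension("brochure_online", {".pdf"})` would list every PDF in a local copy of the folder, including those in subfolders.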
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
