Limit search results to pdfs only?


  • Limit search results to pdfs only?

    Is there a way to limit the search results to PDF documents only?

    We built and integrated a customized search engine, and only need the added ability to search PDF documents.

    Is it possible to configure Zoom to limit the search output this way?

    Thanks,

  • #2
    You could use Categories (that's what we have done) and it seems to work nicely. Search through the forums and you will find others that wanted to do the same thing. The manual also shows some good info for Categories.

    Shawn



    • #3
      Thanks Shawn. I actually read all the posts in this forum (I think) and didn't see anything similar. Undoubtedly, I missed something.

      I will review the Categories section of the manual. In the meantime, if you could point me to the other threads discussing this, I would appreciate it.

      Thanks!

      Teddy



      • #4
        For my site, I use only .asp files, so I set up one category (Web Pages) with the pattern .asp. Then I set up another category called PDF Files with the pattern .pdf.

        I am not sure how to do multiple patterns because I never researched it (it didn't apply to me), so if your pages are not all in one file format, it might be more difficult.
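
For anyone who wants to see the idea in code, here is a minimal Python sketch of sorting URLs into categories by file extension. It is not Zoom's own matching logic: the `CATEGORIES` table and the `categorize` and `only_category` helpers are hypothetical names made up for illustration, and the sketch assumes a pattern is simply compared against the end of the URL path. Allowing several patterns per category is just a matter of listing more than one extension.

```python
from urllib.parse import urlparse

# Hypothetical category table (an assumption for illustration, not Zoom's
# configuration format): category name -> file extension patterns.
CATEGORIES = {
    "Web Pages": (".asp", ".html"),  # a category may list several patterns
    "PDF Files": (".pdf",),
}


def categorize(url):
    """Return the names of all categories whose pattern matches the URL path."""
    # Compare against the path only, so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    return [name for name, patterns in CATEGORIES.items()
            if path.endswith(patterns)]


def only_category(urls, category):
    """Keep only the URLs that fall into the given category."""
    return [u for u in urls if category in categorize(u)]
```

Calling `only_category(results, "PDF Files")` on a list of result URLs would then keep just the PDF documents.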

        HTH,
        Shawn



        • #5
          If you only want to index PDF files, and not ASP files nor HTML files, there are several options.

          - Use the offline indexing option in Zoom to index just the folders that contain your PDFs.

          - Create a list of your PDF documents (their URLs) and import that list into Zoom as a list of spider start points.

          - Create a PDF site map page in ASP or HTML, then start the spider indexing at that map page, but enclose the page content in the tags <!--ZOOMSTOP--> and <!--ZOOMRESTART-->. The links to the PDFs will still be followed but any text on the map page will be ignored.
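
The second and third options both need a list of PDF URLs, which is easy to generate with a script. The following is a rough Python sketch, not anything shipped with Zoom: `pdf_urls` and `site_map_html` are hypothetical helpers, the folder and base URL are placeholders you would supply, and the `<!--ZOOMSTOP-->` / `<!--ZOOMRESTART-->` comments in the generated page assume those are the skip tags your Zoom version recognizes (check the manual).

```python
import html
import os
from urllib.parse import quote


def pdf_urls(root_dir, base_url):
    """Map every .pdf file under root_dir to a URL under base_url."""
    urls = []
    for folder, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.lower().endswith(".pdf"):
                # Path relative to the site root, with forward slashes.
                rel = os.path.relpath(os.path.join(folder, name), root_dir)
                rel = rel.replace(os.sep, "/")
                urls.append(base_url.rstrip("/") + "/" + quote(rel))
    return sorted(urls)


def site_map_html(urls):
    """Build a bare site map page that links to each URL.

    The links sit between skip tags (assumed to be ZOOMSTOP/ZOOMRESTART)
    so that the text of the map page itself is not indexed.
    """
    links = "\n".join(
        '<a href="{0}">{0}</a><br>'.format(html.escape(u, quote=True))
        for u in urls
    )
    return ("<html><body>\n<!--ZOOMSTOP-->\n" + links +
            "\n<!--ZOOMRESTART-->\n</body></html>\n")
```

Writing `"\n".join(pdf_urls(folder, base))` to a text file gives a one-URL-per-line list for the second option, and saving the output of `site_map_html` as a page on the site gives a starting point for the third.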

          In all of the cases above, you will need to have downloaded the PDF plugin,
          http://www.wrensoft.com/zoom/plugins.html
          and added .PDF to the list of file extensions to scan.

          If you need more details on any of these options let me know.

          ------
          David



          • #6
            Another option would be to filter out any URLs that have .html in them, or whatever page extensions you are using that aren't PDFs.



            • #7
              Although the suggestion from Broman has simplistic appeal, it will probably not work for most web sites. If you are using spider mode but don't index the HTML files, then you will never find the links to the PDF files that are on the HTML pages (because the HTML pages are skipped).
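
To make the problem concrete, here is a toy Python crawl over a made-up site. It is only an illustration of the reasoning, not how Zoom's spider is implemented: the `SITE` link table and the `crawl` function are hypothetical, and the sketch assumes a skipped page is never opened, so its links are never seen.

```python
from collections import deque

# Hypothetical site used for illustration: each page maps to the links on it.
SITE = {
    "/index.html": ["/docs.html", "/about.html"],
    "/docs.html": ["/manual.pdf", "/faq.pdf"],
    "/about.html": [],
    "/manual.pdf": [],
    "/faq.pdf": [],
}


def crawl(start, skip_extensions=()):
    """Breadth-first crawl; pages with a skipped extension are never opened."""
    seen = {start}
    queue = deque([start])
    indexed = []
    while queue:
        page = queue.popleft()
        if page.endswith(tuple(skip_extensions)):
            continue  # assumed behaviour: a skipped page is not parsed for links
        indexed.append(page)
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# Skipping .html stops the crawl at the start page, so no PDF is ever reached.
found_when_skipping = crawl("/index.html", skip_extensions=(".html",))

# Crawling everything and filtering afterwards does reach the PDFs.
pdfs_only = [p for p in crawl("/index.html") if p.endswith(".pdf")]
```

With this toy site, `found_when_skipping` comes back empty, while `pdfs_only` contains both PDF files.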

              So the three suggestions previously provided are probably better for most web sites that only want to search PDF files.

              ----
              David



              • #8
                Duh, I guess I kinda forgot that the HTML pages needed to be spidered through. My oversight.
