searching embedded pdf indexes


  • searching embedded pdf indexes

    Hello,

    We have been using the Zoom search engine for a while now, and it's been pretty amazing.

    The issue we are currently having probably has nothing to do with Zoom, but I figured I'd ask anyways.

    We use Adobe Acrobat to embed our indexes so that the Adobe PDF search is instantaneous. The Zoom search finds the words in the OCRed PDFs, but once the user clicks a PDF, a new search opens up, one that is powered by Adobe and no longer by Zoom.

    The thing is, when you embed an index into a PDF, it covers just that one PDF. To do it for hundreds of PDFs, we have to make an index catalog file that spans everything we want to search. But Zoom doesn't search that index; it searches the PDFs.

    I guess my question is, is there a way for Zoom search to look at that embedded catalog index file so when they open up a pdf, the adobe search is fast.

    Or does this have nothing to do with Zoom search?

  • #2
    It doesn't have much to do with Zoom Search.

    The index created by Adobe Acrobat (catalog or otherwise) cannot be used by Zoom because (1) it's in their proprietary format, and (2) they store very different information than what we need/store. For example, we have multiple hit positions of the word within the file, and they may not. We have key word stemming, synonyms, categories, etc. and they do not have any of this data. So we will always have to parse the PDF file and harvest our own index data.
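
    To illustrate what "multiple hit positions of the word within the file" means, here is a minimal sketch of a positional index in Python. It is not Zoom's actual index format or code; the function name `build_positional_index`, the sample file names, and the sample text are all made up for illustration, and it assumes the text has already been extracted from the PDFs.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map each lowercased word to {document name: [word positions]}.

    Hypothetical helper for illustration only; a real search index
    would also hold stemming, synonym, and category data.
    """
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        # Split the text into lowercase alphanumeric words.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word][name].append(position)
    return index


# Made-up sample text standing in for text extracted from two PDFs.
docs = {
    "a.pdf": "Embedded index search. Search is fast.",
    "b.pdf": "Catalog index spans many files.",
}
index = build_positional_index(docs)

# Every position at which "search" occurs, per document.
search_hits = dict(index["search"])
```

    An index that only records which files contain a word, without the positions, could not be converted into this structure, which is one reason the PDF text has to be parsed again.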

    Not entirely sure what you mean by this:
    "I guess my question is, is there a way for Zoom search to look at that embedded catalog index file so when they open up a pdf, the adobe search is fast."

    Assuming "they" are the end users (reading and searching your PDF documents) then they will not be affected. It will not make the Adobe search faster -- we have no way of pre-loading the Acrobat index for them, and they have no way of using something we may have pre-loaded.

    So in short, we can't make the Acrobat search faster. Nor can their index make Zoom's search faster.

    Having good OCR'ed PDF files is already a good step to getting accurate data available for Zoom to index.

    Hope that answers your question.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
