
Skip 1 page of all PDF files?


  • Skip 1 page of all PDF files?

    Hi,
    I've just started playing with Zoom and am finding it to be extremely good. I have one [probably simple] question that deserves a little history first.
    We are a document imaging bureau offering an image hosting service to our clients (i.e. they can search for and download documents via the internet using index data that we have captured and hold in a database). Most of our images are produced as PDFs, as they can be OCR'd and hence made searchable.

    We are looking to offer a full text search facility, which is where Zoom enters the room.
    All is working well, but I would prefer it if the first page of each PDF file was not indexed, as this is an index sheet that we attach to the front of every file and is not really related to the scanned document itself.

    I'm aware of the ZOOMSTOP and ZOOMSTART tags which I could put at the start and end of the first page, but the OCR 'might' not read it correctly (we scan at 200dpi and not 300dpi due to speed and file size constraints) and could mess things up.
    The PDFs are all between 10 and 400 pages long and there are potentially tens of thousands of files.

    So... is there a configuration option that I haven't found yet that tells Zoom to ignore the first page of every file?

  • #2
    There is no option to ignore the 1st page of a PDF file.

    If it was an HTML file you could insert tags to skip part of the document, but it is not possible to insert tags into a PDF file.

    My advice would be not to OCR the 1st page and leave the 1st page as an image. If there is no text on the 1st page then Zoom will ignore that page.



    • #3
      This would normally be the simplest option... but as we're dealing with thousands of pages each day, our OCR system is pretty much fully automated and cannot be configured to ignore any pages.

      No worries, it's not a huge issue - and if it becomes one I can automate the removal of the first page from every PDF before the OCR takes place.
      Last edited by Ted; May-13-2008, 02:28 PM.
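
The first-page removal mentioned above could be scripted in many ways. What follows is only a minimal sketch, assuming the qpdf command-line tool is installed; the thread does not name any tool, so qpdf, the function names, and the directory layout are all illustrative assumptions. It writes a copy of each PDF without page 1, leaving the originals untouched.

```python
import subprocess
from pathlib import Path


def build_qpdf_command(src: Path, dst: Path) -> list[str]:
    """Build a qpdf command that copies pages 2..last of src into dst.

    Assumption: qpdf is on PATH. "." refers back to the primary input
    file and "2-z" selects page 2 through the last page.
    """
    return ["qpdf", str(src), "--pages", ".", "2-z", "--", str(dst)]


def strip_first_pages(in_dir: Path, out_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Write a copy of every *.pdf in in_dir, minus its first page, to out_dir.

    Both directories are hypothetical placeholders. With dry_run=True the
    commands are only collected and returned, not executed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for src in sorted(in_dir.glob("*.pdf")):
        cmd = build_qpdf_command(src, out_dir / src.name)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError on any non-zero exit status.
            subprocess.run(cmd, check=True)
    return commands
```

A dry run such as `strip_first_pages(Path("scans"), Path("scans_trimmed"), dry_run=True)` returns the commands without touching any files, which may be a sensible first step before running it across tens of thousands of documents.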



      • #4
        It's ZOOMRESTART, not ZOOMSTART...just in case you need to use it for another issue at some point.


        Good luck,
        Leon



        • #5
          Originally posted by MergeThis
          It's ZOOMRESTART, not ZOOMSTART...just in case you need to use it for another issue at some point.


          Good luck,
          Leon
          Hi Leon,
          I was just typing madly away... knew it wasn't quite right.

          Thanks for the pointer though.
          Ted
