Hi,
I've just started playing with Zoom and am finding it to be extremely good. I have one [probably simple] question that deserves a little history first.
We are a document imaging bureau offering an image hosting service to our clients (i.e. they can search for and download documents via the internet using index data that we have captured and hold in a database). Most of our images are produced as PDFs, as they can be OCR'd and hence made searchable.
We are looking to offer a full text search facility, which is where Zoom enters the room.
All is working well, but I would prefer it if the first page of each PDF file was not indexed as this is an index sheet that we attach to the front of every file and is not really related to the scanned document itself.
I'm aware of the ZOOMSTOP and ZOOMSTART tags, which I could put at the start and end of the first page, but the OCR 'might' not read them correctly (we scan at 200dpi rather than 300dpi due to speed and file size constraints), and that could mess things up.
The PDFs are all between 10 and 400 pages long and there are potentially tens of thousands of files.
So... is there a configuration option that I haven't found yet that tells Zoom to ignore the first page of every file?