I am looking to buy software to add a search engine to my website (jjonz.us/RadioLogs).
When finished, the site will have 45,000 PDF files. My problem is that these PDF files are scanned images. I have tried to OCR them with Acrobat 7, but Acrobat is not a great OCR program and my results have been unsatisfactory. Instead, I think I will use OmniPage, which will create a text file for each PDF file.
Here is the problem. If I run a search of the OCR'd text files, the results will show up as hits in the text files. Using your software, is there any way to cross-reference the results back to the original PDF file?
Thanks for your help.
jj
P.S. I sent the above message as an email, but it was undeliverable, with the following message:
Delay reason: SMTP error from remote mail server after RCPT TO:<info [at] wrensoft.com>:
host mailwash4.pair.com [66.39.2.4]: 450 <info [at] wrensoft.com>:
Recipient address rejected: Service temporarily unavailable