Challenge: Grab text in an uploaded document and auto-post to an HTML template.
Google and other online search services offer a "View as HTML" option for the content of crawled documents (DOC, RTF, DOCX, PDF, etc.). Does Wrensoft plan to add this feature to the Zoom Search Engine results page? I know Zoom already does a nice job of indexing various document formats.
Here's my challenge. I'd like to have users be able to upload a document to a dedicated site where the textual content is automatically parsed and posted to an HTML page. The goal is to have the resulting HTML page be found and indexed by internet web crawlers and bots (not just the Zoom Search Engine). It must be a no-brainer for the person who submits the document. No cut-and-paste. No monitoring of the resulting HTML pages. It all gets done automatically and reliably.
I realize some documents pose inherent problems, such as difficult-to-read text (custom encoding) or rasterized type. I'm just looking for ideas on how to get started with the task of parsing text and slapping it on an HTML page. Is there something off-the-shelf we can use? I would really like to be able to hook this into the Zoom Search Engine.
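To make the "slapping it on an HTML page" half concrete, here is a minimal sketch in Python of what I have in mind. It assumes the text has already been pulled out of the uploaded document by some separate extraction step (that is the part I'm asking about) and only shows the templating. The names `PAGE_TEMPLATE` and `text_to_html` are made up for illustration and have nothing to do with Zoom or any Wrensoft API.

```python
import html
import re
from string import Template

# Hypothetical page template; a real site would use its own layout.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>$title</title>
</head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def text_to_html(title: str, text: str) -> str:
    """Wrap already-extracted plain text in a simple, crawlable HTML page."""
    # Treat blank lines as paragraph breaks and drop empty chunks.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Escape each paragraph so stray <, > and & cannot break the markup,
    # then keep single line breaks visible with <br>.
    body = "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>\n") + "</p>"
        for p in paragraphs
    )
    return PAGE_TEMPLATE.substitute(title=html.escape(title), body=body)


if __name__ == "__main__":
    print(text_to_html("Sample upload", "First paragraph.\n\nSecond paragraph."))
```

Escaping the text before it goes into the template seems like the one non-negotiable step, since uploaded documents can contain anything. Everything upstream of this (reading DOC, DOCX, RTF, PDF) is still the open question.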