Indexing files I don't want it to

Beard

Newbie

Join Date: Nov 2007

Posts: 1
#1


Nov-23-2007, 09:58 AM

Hi,
On our company Intranet, each directory holds three copies of every page: the page was originally written in Word (don't ask; we've told them not to make web pages in Word), then the Word document is converted to HTML, and another copy is made in PDF format. How can I stop the ZoomSearch software from indexing the PDF documents? I've got CRC turned on, and it's skipping the Word documents as they are identical. I'm pretty sure the PDF files aren't listed on any of the web pages, so there's no reason to index them. I can't really do anything to the PDF files themselves, as the site is huge.

I'm currently using spider mode.

cheers
David

Administrator

Join Date: Dec 2004

Posts: 4709
#2

Nov-23-2007, 09:19 PM

If you don't want to index PDF files, then remove .PDF from the list of file types to scan (on the scan options tab). But this seems too obvious, so maybe I am missing the point?
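Removing .PDF from the list of file types means the spider skips any link whose extension is not on that list. A minimal sketch of that kind of extension filter in Python, assuming a hypothetical `SCAN_EXTENSIONS` set and `should_index` helper (this is an illustration of the idea, not Zoom's actual code or configuration format):

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical whitelist standing in for the "file types to scan" list.
# ".pdf" has deliberately been left out.
SCAN_EXTENSIONS = {".html", ".htm", ".doc"}


def should_index(url: str, extensions=SCAN_EXTENSIONS) -> bool:
    """Return True if the URL's file extension is on the whitelist."""
    # urlparse drops the query string, so "report.pdf?v=2" still ends in ".pdf"
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()  # case-insensitive match
    return ext in extensions


print(should_index("http://intranet/dept/report.html"))  # True
print(should_index("http://intranet/dept/report.PDF"))   # False
```

This simple version also rejects extensionless URLs such as "http://intranet/dept/", so a real spider would need a separate rule for those.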

Also, using the CRC option will not filter out Word documents that happen to have the same text as HTML documents. Documents need to be byte-for-byte identical before they are filtered with the CRC option (at least in V5 of the software).
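To show why documents with the same text are not caught, here is a minimal sketch of CRC-based duplicate filtering in Python using the standard `zlib.crc32`. The helpers `crc32_of` and `is_duplicate` are hypothetical and show the general technique, not Zoom's implementation. The checksum covers the raw file bytes, so the same visible text in different markup or a different file format produces a different CRC.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """CRC-32 over the raw bytes, masked to an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_duplicate(data: bytes, seen: set) -> bool:
    """Treat a CRC that was already seen as a duplicate; otherwise record it."""
    crc = crc32_of(data)
    if crc in seen:
        return True
    seen.add(crc)
    return False


seen = set()
html_page = b"<html><body>Quarterly report</body></html>"
html_copy = b"<html><body>Quarterly report</body></html>"
same_text = b"<p>Quarterly report</p>"  # same words, different bytes

print(is_duplicate(html_page, seen))  # False (first time seen)
print(is_duplicate(html_copy, seen))  # True (byte-for-byte identical)
print(is_duplicate(same_text, seen))  # False (same text, different bytes)
```

A matching CRC is strong but not absolute evidence of identical content, because different inputs can occasionally collide on a 32-bit checksum.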
