Indexing files I don't want it to

  • Indexing files I don't want it to

    Hi,
    On our company intranet, every directory holds three copies of each page: the page was originally written in Word (don't ask, we've told them not to build web pages in Word), the Word document is then converted to HTML, and another copy is made in PDF format. How can I stop the ZoomSearch software from indexing the PDF documents? I've got CRC turned on, and it's skipping the Word documents as they are identical. I'm pretty sure the PDF files aren't listed in the web pages anywhere, so there's no reason to index them. I can't really do anything to the PDF files themselves, as the site is huge.

    I'm currently using spider mode.

    Cheers

  • #2
    If you don't want to index PDF files, then remove .PDF from the list of file types to scan (on the scan options tab). But this seems too obvious, so maybe I am missing the point?

    Also, using the CRC option will not filter out Word documents that happen to have the same text as HTML documents. Documents need to be byte for byte identical before they are filtered with the CRC option (at least in V5 of the software).
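
To illustrate the byte-for-byte point, here is a minimal Python sketch of checksum-based duplicate filtering. It is not how Zoom itself is implemented; the `crc32_of` and `is_duplicate` helpers and the sample byte strings are made up for illustration, and the "Word" bytes are only a stand-in for a real .doc file.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 checksum of the raw bytes as an unsigned integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_duplicate(data: bytes, seen: set) -> bool:
    """Report whether these exact bytes were seen before; record them if not."""
    checksum = crc32_of(data)
    if checksum in seen:
        return True
    seen.add(checksum)
    return False


# Hypothetical file contents: the same visible text stored in two formats.
html_page = b"<html><body><p>Holiday policy</p></body></html>"
html_copy = bytes(html_page)                     # identical bytes
word_like = b"\xd0\xcf\x11\xe0 Holiday policy"   # stand-in for a Word file

seen_checksums = set()
first = is_duplicate(html_page, seen_checksums)   # first time these bytes appear
second = is_duplicate(html_copy, seen_checksums)  # same bytes, same checksum
third = is_duplicate(word_like, seen_checksums)   # same text, different bytes
print(first, second, third)
```

Only the exact copy is caught. The Word-style file carries the same text but different bytes, so a checksum comparison treats it as a separate document.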
