Exclude folders setting not working - Zoom Version 5.0

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Exclude folders setting not working - Zoom Version 5.0

    Hello all,
    We recently purchased Zoom Enterprise version 5.0.
    After adding certain web folders (containing sensitive information) to the list of folders excluded from the general search, the Zoom indexer still indexes pages in these excluded folders.
    We use Spider Mode (ASP). The website runs on .NET Framework 2.0.

    The folders are added using the following syntax (without the quotes):
    "/Admin/"
    "/Security/"

    I even tried giving the full path to the folder, still with no success.
    I read somewhere on the forum that folder names are case sensitive, which seems odd for a web folder.
    Anyway, I tried all of these, with no success.


    Please help

    Thanks

  • #2
    Web folder names are most certainly case sensitive. This is actually the most common situation, because the majority of web servers are Unix-based (where file and path names have always been case sensitive). By definition, file names and paths in URLs are always case sensitive; only IIS and Windows map the names so that they match regardless of case.
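As a generic illustration of this rule (not of how Zoom itself handles URLs), the Python sketch below follows RFC 3986: the scheme and host name are case-insensitive and can safely be lowercased, while the path is left exactly as written. The function name normalize_url is hypothetical, and the sketch assumes URLs without a user:password@ part, since that part is case sensitive too.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host, which are case-insensitive per RFC 3986.

    The path, query and fragment are left untouched because they are case
    sensitive. Assumption: the URL has no user:password@ component, which
    would also be lowercased here even though it is case sensitive.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))


a = normalize_url("HTTP://TestServer/myweb/Reports/support.aspx")
b = normalize_url("http://testserver/myweb/reports/support.aspx")
print(a)       # http://testserver/myweb/Reports/support.aspx
print(a == b)  # False: the paths still differ in case
```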

    Can you give us the messages in the Index Log that report the files being indexed (the ones you want to skip), along with their exact URLs? This will give us a better idea of what the URLs look like exactly, and whether the skip entries need to be entirely lowercase, uppercase, mixed case, etc.

    Remember that in Spider Mode the skip list is matched against the entire URL of the file being crawled (including the slashes). The URLs that would be skipped by your examples above should look something like:
    http://www.mysite.com/mypages/Admin/blah.html
    http://www.mysite.com/Security/somethingorother.asp
    http://www.mysite.com/page.asp?file=/Security/new.html
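The matching described above can be approximated with a case-sensitive substring test over the full URL. This Python sketch assumes that is all the skip list does; Zoom's actual matching logic is not shown in this thread, and the function name is_skipped is hypothetical. The last sample URL is an added example showing that "/Admin/" does not match a lowercase "/admin/".

```python
def is_skipped(url: str, skip_list: list[str]) -> bool:
    """Return True if any skip entry appears anywhere in the full URL.

    Assumption: matching is a plain, case-sensitive substring test,
    so the query string is searched as well as the path.
    """
    return any(entry in url for entry in skip_list)


skip_list = ["/Admin/", "/Security/"]
urls = [
    "http://www.mysite.com/mypages/Admin/blah.html",
    "http://www.mysite.com/Security/somethingorother.asp",
    "http://www.mysite.com/page.asp?file=/Security/new.html",
    "http://www.mysite.com/admin/blah.html",  # lowercase: not matched
]

# Prints True for the first three URLs and False for the last one.
for url in urls:
    print(is_skipped(url, skip_list), url)
```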
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3

      Hello

      Sorry, I had overlooked Unix-based web servers.
      Anyway, here's the log entry:

      08:40:23 - [DOWNLOAD] Downloading file http://testserver/myweb/reports/support.aspx (64864 bytes)

      08:40:23 - [INDEXED] Indexing http://testserver/myweb/reports/support.aspx

      I added /Reports/ as an entry in the Skip options tab.

      I would like to skip indexing any .aspx pages within the reports folder.

      Also, while I have your attention: I noticed that if a PDF filename contains an '&' character, indexing that PDF gives an error. Is there a workaround for this?

      Thanks for the help



      • #4
        As mentioned above, the skip list is case sensitive: differences between upper and lower case are significant.

        So a skip entry of "/Reports/" will NOT skip
        http://testserver/myweb/reports/support.aspx

        To skip that URL, you will need a skip entry of "/reports/". Add this to your skip list, and try indexing again.
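To catch this kind of case-only mismatch before re-indexing, the indexed URLs in the Index Log can be checked against the skip list. This Python sketch is hypothetical: the function name check_skip_entries is illustrative, it only recognises the "[INDEXED] Indexing <URL>" line format quoted earlier in this thread, and it assumes the skip list uses plain, case-sensitive substring matching.

```python
import re

# Matches the "[INDEXED] Indexing <URL>" lines quoted from the Index Log;
# other log line formats are ignored by this sketch.
INDEXED_LINE = re.compile(r"\[INDEXED\] Indexing (\S+)")


def check_skip_entries(log_lines, skip_list):
    """Return (entry, url) pairs where a skip entry would match the indexed
    URL only if case were ignored, i.e. the entry fails because of case."""
    problems = []
    for line in log_lines:
        m = INDEXED_LINE.search(line)
        if not m:
            continue
        url = m.group(1)
        for entry in skip_list:
            if entry not in url and entry.lower() in url.lower():
                problems.append((entry, url))
    return problems


log = [
    "08:40:23 - [DOWNLOAD] Downloading file http://testserver/myweb/reports/support.aspx (64864 bytes)",
    "08:40:23 - [INDEXED] Indexing http://testserver/myweb/reports/support.aspx",
]

for entry, url in check_skip_entries(log, ["/Reports/"]):
    print(f"{entry!r} does not match {url} (case differs)")
```

With the skip entry changed to "/reports/", check_skip_entries returns an empty list, because the entry now matches the URL exactly.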

        As for your ampersand ('&') problem, this is probably because you are not encoding the ampersand characters properly on your web pages. It is illegal HTML and an invalid URL to have a stand-alone ampersand character in a URL. Such characters need to be encoded, either as an HTML entity ("&amp;") or escaped in the URL ("%26").

        More information on this here:
        http://htmlhelp.com/tools/validator/problems.html#amp
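A minimal Python sketch of both kinds of encoding, using the standard library's urllib.parse.quote and html.escape. The filename "Terms & Conditions.pdf" and the /docs/ path are hypothetical examples, not files from this thread.

```python
import html
from urllib.parse import quote

filename = "Terms & Conditions.pdf"  # hypothetical filename containing '&'

# Percent-encode the filename for use in a URL path: '&' -> %26, ' ' -> %20.
url_path = "/docs/" + quote(filename)
print(url_path)  # /docs/Terms%20%26%20Conditions.pdf

# If the unencoded URL is written into an HTML attribute instead,
# the '&' must at least be escaped as the entity &amp;.
href = html.escape("/docs/" + filename)
print(href)  # /docs/Terms &amp; Conditions.pdf
```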

        If this is not the problem, then please give us the actual URLs of the files in question, and the error message you are seeing.
        --Ray
