Skip List filename matching

  • Skip List filename matching

    The Page and Folder skip list description reads: "Pages and folders with names containing any of the following words will not be scanned."

    Does this include partial matches? For example, if I list "most_common.html" (no quotes) on the skip list, will it exclude files that contain that string such as "abc_most_common.html" and "def_most_common.html"?

    I found a specific mention in the user guide stating that this applies to paths, but it doesn't specify whether it also applies to the filename itself.

    Andrew

  • #2
    From the user guide:

    This is a list of pages and folders that will not be scanned during the indexing process. Note that filenames and paths are case sensitive. Typically you would want to filter pages that the user should never be able to get to directly via the search function. Note that if the path to a page partially or fully matches any entry in this list it will be filtered. For example, an entry of “\private\” will filter “\private\file1.htm”, “\private\file2.htm” and “photos\private\athome.htm”.

    Both file names and the path are matched against.
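
To illustrate the matching rule described above, here is a minimal sketch in Python. It assumes the skip list behaves as a plain, case-sensitive substring test against the full path (file name included), which is how the user guide excerpt reads. The function name `is_skipped` and the sample paths are hypothetical and not part of the product.

```python
def is_skipped(path: str, skip_list: list[str]) -> bool:
    """Return True if any skip entry occurs anywhere in the path.

    Assumption: matching is a case-sensitive substring test, as the
    user guide excerpt suggests. Empty entries are ignored.
    """
    return any(entry in path for entry in skip_list if entry)


# Hypothetical skip list and paths, for illustration only.
skip_list = ["most_common.html", "\\private\\"]

for path in [
    "abc_most_common.html",          # partial file name match
    "photos\\private\\athome.htm",   # partial path match
    "ABC_MOST_COMMON.html",          # different case, so no match
]:
    print(path, is_skipped(path, skip_list))
```

Under this reading, an entry of "most_common.html" would also filter "abc_most_common.html" and "def_most_common.html".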

    See also this FAQ
    Q. How should I index my site if it features a message board, forum, or calendar and other similarly complex scripts?



    • #3
      Thanks.

      A.



      • #4
        I'd like to put in a request for allowing more than 200 entries on the skip list. For my site I had a list of about 350 that I wanted to include. When I found out that the max was 200, I switched strategies to use the robots.txt support instead. But then I realized that robots.txt cannot mark a file as followed but not indexed (which was my goal): inclusion in the robots.txt file only does the equivalent of NOFOLLOW and NOINDEX together. So I went back to the skip list approach. I was able to whittle down my list of 350 to just under 200 by using strings that some URLs had in common. It's a less satisfactory solution for me because I've used less specific strings in the Skip List, so I'm not 100% sure I haven't accidentally excluded some pages I want indexed.
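
One way to check a shortened skip list for accidental exclusions is to test it against a full list of site URLs before indexing. The sketch below is only an illustration: it assumes case-sensitive substring matching, and the function name `audit_skip_list` and all URLs are hypothetical.

```python
def audit_skip_list(skip_list, urls, intended_to_skip):
    """Return URLs the skip list would filter that were not meant to be skipped.

    Assumption: an entry filters a URL when it appears anywhere in it
    (case-sensitive substring match). Empty entries are ignored.
    """
    intended = set(intended_to_skip)
    return [
        url
        for url in urls
        if url not in intended and any(e in url for e in skip_list if e)
    ]


# Hypothetical data: one short entry stands in for two specific ones.
skip_list = ["_common"]
urls = [
    "abc_most_common.html",
    "def_most_common.html",
    "uncommon_words.html",
    "least_common_guide.html",
]
intended_to_skip = ["abc_most_common.html", "def_most_common.html"]

print(audit_skip_list(skip_list, urls, intended_to_skip))
# prints ['least_common_guide.html']
```

Any URL reported here is one that a less specific string has caught unintentionally, so that entry would need to be made more specific.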

        If it would be possible to allow for increased entries in this list in a future update, I would find this feature useful.

        Andrew



        • #5
          From release V5.1 build 1013 (5/March/200 we increased the maximum number of skip pages from 200 to 1000.



          • #6
            Oh. I must have missed the update. I've just got it installed and am looking forward to going back to my original Skip list.

            Thanks,

            Andrew
