  • Content Filtering

    I have the following 3 content filters:

    -We have over 15 years experience in selling
    -<h1 class="header">Site Map</h1>
    -<h1 class="header">Office Chairs <span class="greenh1">&</span> Office Seating</h1>

    To filter out the following 3 pages:

    http://www.furniturerunner.com/
    and
    http://www.furniturerunner.com/sitemap.asp
    and
    http://www.furniturerunner.com/chairs/

    However, only the second page is filtered out. I have tried various pieces of text found only on the home page and on the chairs home page, but neither page gets filtered out.

    Any ideas on what I can do?

  • #2
    If you are trying to filter a small set of URLs, then just add the URLs to the page and folder skip list (instead of using a filter).


    • #3
      Skip List

      I want http://www.furniturerunner.com/chairs/chairdetail.asp?id=1 (and about 500 more dynamically created pages like it) to be indexed, so if I add http://www.furniturerunner.com/chairs/ to the skip list, it skips the whole folder. I also have no idea how to skip the index page, as its URL is simply http://www.furniturerunner.com/

      Please advise.


      • #4
        Do you have a list of the URLs you want indexed?

        Or do you have a single page on your site that lists all the URLs that you want indexed (e.g. a product_list.asp page)?

        Can you put NOINDEX meta tags on the pages you don't want indexed?


        • #5
          Lists

          There are no lists currently, as there are over 4,000 URLs.

          The only such page is the sitemap.asp file, although it also lists pages I want to exclude (there are over 50 entries in my skip list).

          I can't put a NOINDEX tag as I still want google et al to pick them up.

          The Zoom Search Engine is live at http://www.shreddingmachines.co.uk/ (another site I run), using exactly the same method. There, most pages are skipped, but about 4 pages are filtered out using the content filter.

          I'm a bit confused as to why it doesn't work in this instance.


          • #6
            Originally posted by chrave43:
            I can't put a NOINDEX tag as I still want google et al to pick them up.
            You can use a "robots.txt" file instead. It lets you specify the User-agent that your rules apply to, and Zoom can be identified as "ZoomSpider".
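
            A minimal sketch of that idea, using Python's standard `urllib.robotparser` as a stand-in for a crawler's rule matcher. Zoom's own handling of robots.txt may differ, and the disallowed paths and the `allowed` helper are only illustrations. It shows rules scoped to the "ZoomSpider" user agent leaving other crawlers unaffected, and that a plain `Disallow: /chairs/` rule matches by prefix:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: they apply to ZoomSpider and to no other crawler.
ROBOTS_TXT = """\
User-agent: ZoomSpider
Disallow: /sitemap.asp
Disallow: /chairs/
"""

def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

site = "http://www.furniturerunner.com"
for agent in ("ZoomSpider", "Googlebot"):
    for path in ("/", "/sitemap.asp", "/chairs/", "/chairs/chairdetail.asp?id=1"):
        print(agent, path, allowed(agent, site + path))
```

            Under this parser, the `/chairs/` rule also blocks the detail pages beneath it, so robots.txt on its own may not be enough to separate the chairs home page from the product pages.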

            I think, as suggested in the other thread, you might be best off applying a POSITIVE content filter. This is a rule saying that a page must contain the given text for it to be indexed. You can make up your own tag for this, since the filter works on the HTML. So, add a content filter of

            +<meta name="ZOOMONLY" content="indexme">

            and then add that tag to only the pages you're interested in. That should do the job. Remember, this is a made-up tag for this example. You can use any unique identifying text or tag of your own design.
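
            To make the rule concrete, here is a rough Python sketch of how "+" and "-" content filters could be evaluated against a page's HTML. It assumes plain, case-sensitive substring matching on the raw source, which may not be exactly what Zoom does, and `should_index` is a hypothetical helper, not part of Zoom:

```python
def should_index(html, filters):
    """Decide whether a page is indexed under '+'/'-' content filters.

    A '-' filter excludes any page containing its text; a '+' filter
    requires the text to be present. Matching is a plain substring test
    (an assumption made for this sketch).
    """
    required = [f[1:] for f in filters if f.startswith("+")]
    banned = [f[1:] for f in filters if f.startswith("-")]
    if any(text in html for text in banned):
        return False
    return all(text in html for text in required)

FILTERS = ['+<meta name="ZOOMONLY" content="indexme">']

# Made-up page sources for illustration.
product_page = '<head><meta name="ZOOMONLY" content="indexme"></head><h1>Chair 1</h1>'
chairs_home = '<head></head><h1 class="header">Office Chairs</h1>'

print(should_index(product_page, FILTERS))  # tagged page is kept
print(should_index(chairs_home, FILTERS))   # untagged page is dropped
```

            With this approach, the home page and the chairs home page need no filter text of their own: they are left out simply because they lack the tag.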

            I'm also guessing your first page is your start URL? (This would explain why your content filter didn't knock it out.) If you want it skipped but still want the spider to follow all the links on it, you can change the spider option for the start point: click on the "More" button next to your Start Spider URL, hit "Edit", and set it to "Follow links only" so that it does not index the content.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
