Avoiding duplicate pages

  • Avoiding duplicate pages

    Our web site has multiple links to the same page, and every one of them ends up in the index.

    See http://www.southhams.gov.uk/ and search for 'listed' to see the issue.

    Does anyone have a solution for this, apart from manually excluding the duplicates?

    Thanks in advance for any ideas.

    AllanP

  • #2
    The "same page" is in an entirely different directory:

    ksp-development_and_planning-developmentcontrol/1app-planning-application-forms.htm

    1app-national-forms-submitting-planning-apps/1app-planning-application-forms.htm


    Good luck,
    Leon



    • #3
      They seem to be copies of the same file placed in different folders. Perhaps there are some folders or filenames that you would want to exclude from indexing on the "Skip Options" tab.

      But if these files are 100% identical, you can easily skip them all by turning on the "Duplicate page detection - Use CRC to skip files with identical content" option on the "Scan Options" tab of the Configuration window.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine
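      For anyone wondering what a CRC-based duplicate check amounts to, here is a minimal Python sketch. It is not Zoom's actual implementation, only an illustration of the idea: checksum each page's content and skip any URL whose checksum has already been seen. The function name `skip_duplicates` and the shape of the `pages` argument are assumptions made for the example.

```python
import zlib


def skip_duplicates(pages):
    """Split pages into kept URLs and skipped duplicates.

    `pages` is a hypothetical mapping of URL -> page content (bytes).
    Returns (kept, skipped), where `skipped` pairs each duplicate URL
    with the URL of the first page that had the same checksum.
    """
    first_seen = {}  # CRC-32 checksum -> first URL with that content
    kept, skipped = [], []
    for url, content in pages.items():
        checksum = zlib.crc32(content)
        if checksum in first_seen:
            skipped.append((url, first_seen[checksum]))
        else:
            first_seen[checksum] = url
            kept.append(url)
    return kept, skipped
```

      Note that a check like this only catches content that is identical byte for byte. A single changed character anywhere on the page gives a different checksum, so the page is treated as new.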



      • #4
        Thanks for the input.
        I should have said that the pages and links are generated by our CMS, which builds the navigation automatically, so there are no 'folders' as such.
        I do have Duplicate page detection turned on in Scan Options, but the CMS displays the page content in different templates depending on where in the navigation the link was followed. As far as the CRC is concerned, the result is a different page, because (for example) the banner may change colour and the breadcrumb will differ. I tried to exclude these differences using ZOOMSTOP and ZOOMRESTART tags, but I guess the duplicate check ignores them. Oh well, back to manual removal I guess.

        AllanP



        • #5
          You would be better off getting your CMS to not generate those duplicate pages if you do not want them.

          Having almost identical content under many different URLs like that could get your site penalized by Internet-wide search engines such as Google and Yahoo, because many sites use a similar method to keyword spam. (Those URLs technically present themselves as different folders, regardless of whether they reflect true folders on the filesystem.) Zoom will not penalize this.

          V6 of Zoom will have an improved "duplicate page detection" method, which will exclude ZOOMSTOP and ZOOMRESTART sections.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
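          As a rough illustration of a duplicate check that leaves out ZOOMSTOP and ZOOMRESTART sections, here is a Python sketch. It is not how Zoom implements the feature. It assumes the tags appear in the page as the HTML comments `<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->`, and the name `indexable_crc` is made up for the example.

```python
import re
import zlib

# Assumption: the tags are written as HTML comments. Everything from a
# ZOOMSTOP comment up to the next ZOOMRESTART comment is dropped.
ZOOM_SKIPPED = re.compile(
    r"<!--\s*ZOOMSTOP\s*-->.*?<!--\s*ZOOMRESTART\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def indexable_crc(html: str) -> int:
    """CRC-32 of the page with ZOOMSTOP..ZOOMRESTART sections removed."""
    stripped = ZOOM_SKIPPED.sub("", html)
    return zlib.crc32(stripped.encode("utf-8"))


# Two renderings of one page that differ only inside the skipped section,
# e.g. a banner or breadcrumb that depends on the navigation path.
page_a = "<!--ZOOMSTOP--><div>Planning &gt; Forms</div><!--ZOOMRESTART--><p>Form list</p>"
page_b = "<!--ZOOMSTOP--><div>1App &gt; Forms</div><!--ZOOMRESTART--><p>Form list</p>"
same_content = indexable_crc(page_a) == indexable_crc(page_b)
```

          With the template-specific parts wrapped in the tags, both renderings give the same checksum, so the second one can be skipped as a duplicate. A ZOOMSTOP with no matching ZOOMRESTART is left untouched by this sketch.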
