duplicate content detection is not correct

  • duplicate content detection is not correct

    I checked all the CRC matches found, and their contents are merely similar, not identical. Is this what duplicate content detection is meant to do?

  • #2
    The skipped files counter in the indexing status box always overcounts. When there are 25 skipped files in indexlog.txt, it says 56. When there are no skipped files, it is still above 0 and keeps growing as indexing proceeds.


    • #3
      The CRC option is for detecting and removing pages that have identical content but different URLs. (Not pages which might just be similar).
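
To illustrate the distinction between identical and merely similar content, here is a minimal Python sketch. It is not Zoom's actual implementation: the function names `content_crc` and `find_duplicates`, the use of `zlib.crc32`, and the final text comparison are all assumptions made for illustration.

```python
import zlib


def content_crc(page_text: str) -> int:
    """Return the CRC-32 of a page's content, encoded as UTF-8."""
    return zlib.crc32(page_text.encode("utf-8"))


def find_duplicates(pages: dict) -> dict:
    """Map each duplicate URL to the first URL seen with identical content.

    `pages` maps URL -> page text (hypothetical input format).
    """
    seen = {}        # CRC value -> first URL that produced it
    duplicates = {}  # duplicate URL -> URL of the original page
    for url, text in pages.items():
        crc = content_crc(text)
        # Compare the text as well, so a CRC collision between two
        # different pages is not reported as a duplicate.
        if crc in seen and pages[seen[crc]] == text:
            duplicates[url] = seen[crc]
        else:
            seen.setdefault(crc, url)
    return duplicates


pages = {
    "http://example.com/a": "Hello world",
    "http://example.com/b": "Hello world",   # identical to /a
    "http://example.com/c": "Hello world!",  # similar, but not identical
}
print(find_duplicates(pages))
```

In this sketch only `/b` is reported as a duplicate of `/a`; the page at `/c` differs by one character and is kept. If pages that look merely similar are being flagged, it is worth checking whether the part of the page that is actually indexed is the same on both.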

      Regarding the skipped page count: turn on verbose mode, so you get a full log, before assuming the counter is wrong. There might be files skipped that you are not aware of.


      • #4
        Where is the verbose mode located? Do you mean the debug mode?


        • #5
          It is a button on the main index window. Alongside "Start indexing", "Configure", "Exit", there is a button that says "Verbose is off" (when Verbose mode is off) and "Verbose is on" (when Verbose mode is on).
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
