Additional rules to remove duplicates from results.


  • Additional rules to remove duplicates from results.

    On a DotNetNuke portal, I tested the option "Use CRC to skip files with identical content". However, I still get duplicate results like these:

    StartPage
    http://myportal.com/default.aspx
    ...Lorem...
    term : 1

    StartPage
    http://myportal.com/default.aspx
    ...Lorem ipsum...
    term : 1

    StartPage
    http://myportal.com/StartPage/tabId=xx/default.aspx
    ...Lorem...
    term : 1

    Thank you in advance.

  • #2
    Without seeing the actual web site (and web pages) in question, it's hard for us to comment. Generally, though, this happens because the generated pages are actually different (if only slightly). That is especially likely for result #1 and result #3 in your example above, which have two different URLs.

    Chances are the page contains something that changes on every request, e.g. a 'Current Time' display, a "Time taken to generate page" message, or advertising. These all need to be excluded and filtered out as described in this FAQ:
    Q. How do I prevent parts of my webpage from being indexed (eg. exclude navigation menus, or page footers)?
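
    To illustrate why a changing fragment defeats a CRC comparison, here is a minimal Python sketch. It is not Zoom's actual implementation: the content_crc helper, the regex-based stripping, and the sample pages are hypothetical, and it assumes the exclusion markers are the <!--ZOOMSTOP--> and <!--ZOOMRESTART--> comments covered by the FAQ.

```python
import re
import zlib

# Assumed exclusion markers; the regex stripping below is only an illustration.
EXCLUDED = re.compile(r"<!--ZOOMSTOP-->.*?<!--ZOOMRESTART-->",
                      re.DOTALL | re.IGNORECASE)


def content_crc(html: str, strip_excluded: bool = True) -> int:
    """Return a CRC32 of the page, optionally ignoring excluded regions."""
    if strip_excluded:
        html = EXCLUDED.sub("", html)
    return zlib.crc32(html.encode("utf-8"))


# Two hypothetical renderings of the same page; only the timing footer differs.
page_a = "<p>Lorem ipsum</p><!--ZOOMSTOP-->Generated in 0.12s<!--ZOOMRESTART-->"
page_b = "<p>Lorem ipsum</p><!--ZOOMSTOP-->Generated in 0.34s<!--ZOOMRESTART-->"

# Raw pages differ, so their CRCs differ and both would be indexed.
print(content_crc(page_a, strip_excluded=False) ==
      content_crc(page_b, strip_excluded=False))  # False

# With the changing footer excluded, the CRCs match.
print(content_crc(page_a) == content_crc(page_b))  # True
```

    The point of the sketch is only that a single changing byte outside an excluded region is enough to make two otherwise identical pages look different to a checksum.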

    Results #1 and #2 are more curious, as that should not happen. Since you are "paraphrasing" your results, my first question is whether this is exactly what you are seeing (two identical URLs) or whether you have omitted an important bit of information (different upper and lower casing, for example).
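
    A quick way to check the casing question is to compare the two URLs both exactly and case-insensitively. This is a small Python sketch; the differ_only_by_case helper and the second URL are hypothetical examples, not taken from your actual results.

```python
def differ_only_by_case(url_a: str, url_b: str) -> bool:
    """True when the URLs are not identical but match once casing is ignored."""
    return url_a != url_b and url_a.lower() == url_b.lower()


# Hypothetical pair: the same page linked with different casing.
print(differ_only_by_case("http://myportal.com/default.aspx",
                          "http://myportal.com/Default.aspx"))  # True

# Truly identical URLs do not differ at all.
print(differ_only_by_case("http://myportal.com/default.aspx",
                          "http://myportal.com/default.aspx"))  # False
```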

    You should also make sure you are running the latest available build, just to be sure. You can download it from here.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
