
Duplicate results

  • Duplicate results

    I've been using Zoom for several years and have just upgraded to v5.1.

    It seems I've changed something in my config, but I can't figure out what.

    Please go to the site http://www.tekrati.com - In the header you'll find a search box. Try the word "quarterly" (without quotes) in the search box. You'll find that search result 1 and 2 are exactly the same.

    I've got the use crc ... checked. I'm using Zoomstop & restart to make sure I'm not picking up the title twice.

    Is there something else I should be doing to eliminate the duplicate?
    Last edited by Tekrati Admin; Sep-04-2007, 05:50 AM. Reason: typo

  • #2
    You'll notice that the two URLs are slightly different:

    http://computers.tekrati.com/research/News.asp?id=9269
    and
    http://computers.tekrati.com/research/news.asp?id=9269

    Note that the path portion of a URL is case sensitive (only the scheme and host name are not). However, because Windows uses a case-insensitive file system, IIS ignores case differences in the path and returns the same page for both requests. From a web client's point of view (in this case, a spider), the two URLs could refer to different pages, and some web servers may return totally different pages for "mypage.html" and "MyPage.html".

    The CRC detection is not filtering them out because the pages contain dynamically generated ads. Each time the page is revisited, the ads change, and so the CRC is different as well.
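
To illustrate why a content checksum misses these duplicates, here is a minimal Python sketch. It is not Zoom's actual CRC routine; the page template, the ad text, and the helper name `page_crc` are all made up for the example, and it simply uses the standard library's CRC-32.

```python
import zlib


def page_crc(html: str) -> int:
    """Return a CRC-32 checksum of the page body (hypothetical helper)."""
    return zlib.crc32(html.encode("utf-8"))


# Hypothetical page: identical article text, but the ad slot changes per request.
TEMPLATE = (
    "<html><body><h1>Quarterly report</h1>"
    "<div class='ad'>{ad}</div></body></html>"
)

first_visit = TEMPLATE.format(ad="Ad banner A")
second_visit = TEMPLATE.format(ad="Ad banner B")

# The article is the same, yet the checksums differ because the ad differs,
# so a CRC-based duplicate check treats the two responses as distinct pages.
print(page_crc(first_visit) == page_crc(second_visit))
```

A checksum over the whole page can only catch duplicates whose bytes are identical, which is why consistent linking is the more reliable fix here.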

    The reason you are only seeing this problem now and not before is most likely because the link to "News.asp" (as opposed to "news.asp") was only added recently.

    The recommended long-term solution is to avoid linking to the same file with URLs that differ only in upper/lowercase. This is generally good practice for SEO (Search Engine Optimization). If you turn on verbose mode in Zoom and switch to single-threaded mode, you should be able to locate which page contains the inconsistent link to "News.asp" as opposed to "news.asp". Changing this link to be consistent with your other links will eliminate the problem.
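
If you can export a list of the URLs the spider visited (for example, copied from the verbose log), a short script can flag the ones that differ only by case. This is a rough sketch, not a Zoom feature: the function name `find_case_variants` is hypothetical, and the third URL in the sample list is invented for the example.

```python
from collections import defaultdict


def find_case_variants(urls):
    """Group URLs that are identical except for letter case."""
    groups = defaultdict(set)
    for url in urls:
        # Lowercasing the whole URL is a simplification: it also folds
        # the query string, which may be too aggressive for some sites.
        groups[url.lower()].add(url)
    # Keep only the groups that contain more than one spelling.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


links = [
    "http://computers.tekrati.com/research/News.asp?id=9269",
    "http://computers.tekrati.com/research/news.asp?id=9269",
    "http://computers.tekrati.com/research/news.asp?id=9270",  # hypothetical
]

for variants in find_case_variants(links).values():
    print(variants)
```

Each printed group is a set of links that IIS serves as one page but the spider indexes separately.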

    A quick solution would be to simply add "News.asp" to your Page skip list. Since URLs are case sensitive, the page skip list is also case sensitive, so "news.asp" will still be indexed.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
