Multiple results for same page


  • Multiple results for same page

    Hello,

    I have a problem with my search results. When I type a word into
    my search form, the same file appears several times on the results page.
    On the Configuration window's "Scan options" tab, I ticked the option "Use CRC to skip files with identical content". But perhaps that is not the right solution...

  • #2
    Using the CRC option is one solution. This requires the HTML of the pages to be exactly identical, not just similar, so it won't work on pages that contain dynamically generated advertising, dates, etc. This option also means that each file has to be downloaded before the Indexer can determine that it is a duplicate and skip it.
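
    To illustrate why the pages must be byte-for-byte identical, here is a minimal sketch of checksum-based duplicate skipping. It is not Zoom's actual implementation: the function names, the sample URLs, and the use of CRC-32 over UTF-8 encoded HTML are all assumptions made for the example.

```python
import zlib


def content_crc(html: str) -> int:
    """Return the CRC-32 of a page's HTML (encoded as UTF-8)."""
    return zlib.crc32(html.encode("utf-8"))


def skip_duplicates(pages):
    """Yield (url, html) pairs whose content checksum has not been seen yet.

    `pages` is an iterable of (url, html) pairs that were already downloaded;
    the checksum can only be computed once the content is available.
    """
    seen = set()
    for url, html in pages:
        crc = content_crc(html)
        if crc in seen:
            continue  # identical content was already kept under another URL
        seen.add(crc)
        yield url, html


# Hypothetical sample pages: b.php is identical to a.php, while c.php
# differs only by a generated date comment.
pages = [
    ("http://www.mysite.com/a.php", "<html><body>Same content</body></html>"),
    ("http://www.mysite.com/b.php", "<html><body>Same content</body></html>"),
    ("http://www.mysite.com/c.php", "<html><body>Same content</body><!-- 2007-08-23 --></html>"),
]
kept = [url for url, _ in skip_duplicates(pages)]
print(kept)
```

    In this sketch the exact copy is skipped, but the page that differs only by a generated date is still kept, which is the limitation described above.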

    But it depends on what these duplicate pages look like. Can you give us some URLs to the pages that are the same that you wish to skip?

    For example, they most likely have different URLs, and this is why they are considered different pages. If they are the same pages with different URLs, such as:

    http://www.mysite.com/showpage.php?id=1
    http://www.mysite.com/showpage.php?id=1&sort=1
    http://www.mysite.com/showpage.php?id=1&sort=2
    http://www.mysite.com/showpage.php?id=1&style=green

    ... etc., then you could potentially skip all the unnecessary pages by adding the entries to the "Skip pages" list (on the "Skip options" tab of the Configuration window). In this example, the skip list entries would be:

    &sort=
    &style=
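
    As a rough illustration of the effect of those two entries, the sketch below assumes a skip entry matches when it appears anywhere in the URL as a plain substring. That matching rule and the helper name `should_skip` are assumptions for the example, not a description of Zoom's internals.

```python
# Skip list entries from the example above.
SKIP_LIST = ["&sort=", "&style="]


def should_skip(url, skip_list=SKIP_LIST):
    """Return True if any skip entry occurs as a substring of the URL."""
    return any(entry in url for entry in skip_list)


urls = [
    "http://www.mysite.com/showpage.php?id=1",
    "http://www.mysite.com/showpage.php?id=1&sort=1",
    "http://www.mysite.com/showpage.php?id=1&sort=2",
    "http://www.mysite.com/showpage.php?id=1&style=green",
]

# Only URLs that match no skip entry would go on to be indexed.
to_index = [u for u in urls if not should_skip(u)]
print(to_index)
```

    Under that assumption, only the plain `showpage.php?id=1` URL remains, so the page is indexed once.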

    If you search in the forums, you will find several other discussions on preventing "duplicate" pages from being indexed.

    There is a FAQ page on indexing forums and other dynamically generated sites, which also explains the setup required to spider such sites and gives more examples. The requirement is similar:
    Q. How should I index my site if it features a message board, forum, or calendar and other similarly complex scripts?
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Try http://www.reference-appro.eu/recherche.php

      and search for Blé.
      Results 2 to 26 are all the same page.
      Last edited by jmp; Aug-23-2007, 07:35 AM.



      • #4
        Your problem is due to your website's use of session IDs. The repeated pages each have a different PHPSESSID:

        http://www.reference-appro.eu/metier.php?PHPSESSID=efc59e40220c54932bf943334a41701a
        http://www.reference-appro.eu/metier.php?PHPSESSID=ebcebf8b52bb8a09fa9aa483019a6785
        http://www.reference-appro.eu/metier.php?PHPSESSID=acd7b5d9fd16cf4d65d558897e69b9ca
        ... etc.

        Check whether you have cookie support enabled in Zoom. PHP sessions do not generally add the PHPSESSID= parameter to URLs unless cookies are disabled. Check the option "Use cookies from Windows and IE" on the "Authentication" tab of the Configuration window and re-index. Make sure to also check the "Reload all files" option on the "General" tab.
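
        If you want to confirm that the repeated results really are one page, a small script along these lines can help. It is only a diagnostic sketch using Python's standard library, not something Zoom does itself, and the helper name `strip_session_id` is made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="PHPSESSID"):
    """Return the URL with the given session parameter removed from its query."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Session IDs as quoted in this post, written without embedded spaces.
urls = [
    "http://www.reference-appro.eu/metier.php?PHPSESSID=efc59e40220c54932bf943334a41701a",
    "http://www.reference-appro.eu/metier.php?PHPSESSID=ebcebf8b52bb8a09fa9aa483019a6785",
    "http://www.reference-appro.eu/metier.php?PHPSESSID=acd7b5d9fd16cf4d65d558897e69b9ca",
]

# Every URL collapses to the same address once the session ID is removed.
unique_pages = {strip_session_id(u) for u in urls}
print(unique_pages)
```

        All three URLs reduce to the same address, which is why the indexer treats one page as many when the session ID is present.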

        More information on this issue can be found by searching for "PHPSESSID" on our website.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine

