page has been skipped problem


  • page has been skipped problem

    Log file: Skipping http://content.abc.com/ContactList/index.htm (Identical page found: CRC signature matched)

    I have removed "index" from the Word skip list so that it can be indexed (other index pages are no problem).

    I found that none of the pages marked (Identical page found: CRC signature matched) can be indexed.

    Why is this page being skipped?

  • #2
    CRC = Cyclic redundancy check.
    It detects duplicate pages on your site and prevents the duplicate being indexed.

    You might have this page appearing at two different URLs. e.g.
    http://content.abc.com/ContactList/index.htm
    and
    http://content.abc.com/ContactList/
    or
    http://content.abc.com/ContactList/index.html
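
As a rough illustration of the idea (not Zoom's actual implementation), duplicate detection by content signature can be sketched as below. The function name `find_duplicates` and the sample page contents are hypothetical; only the URLs come from this thread. The sketch assumes the signature is a CRC-32 of the page content, and that the first URL seen with a given signature is the one kept.

```python
import zlib


def find_duplicates(pages):
    """Return (skipped_url, first_url) pairs for pages whose content
    has the same CRC-32 as a page seen earlier in the crawl."""
    seen = {}      # CRC-32 of content -> first URL seen with that content
    skipped = []
    for url, content in pages:
        signature = zlib.crc32(content.encode("utf-8"))
        if signature in seen:
            # Same content already indexed under another URL: skip this one.
            skipped.append((url, seen[signature]))
        else:
            seen[signature] = url
    return skipped


# Hypothetical crawl results as (URL, page content) pairs.
pages = [
    ("http://content.abc.com/ContactList/", "<html>Contact list</html>"),
    ("http://content.abc.com/ContactList/index.htm", "<html>Contact list</html>"),
    ("http://content.abc.com/About/index.htm", "<html>About us</html>"),
]

for skipped_url, first_url in find_duplicates(pages):
    print(f"Skipping {skipped_url} (identical to {first_url})")
```

In this sketch the page is skipped only because the same content was already seen under a different URL, so one copy of the content is still indexed.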


    • #3
      Originally posted by wrensoft View Post
      CRC = Cyclic redundancy check.
      It detects duplicate pages on your site and prevents the duplicate being indexed.

      You might have this page appearing at two different URLs. e.g.
      http://content.abc.com/ContactList/index.htm
      and
      http://content.abc.com/ContactList/
      or
      http://content.abc.com/ContactList/index.html
      But I cannot find any other successfully indexed page such as "/ContactList/index.htm", "/ContactList/" or "/ContactList/index.html" in the index log.
      Now I cannot index the links inside this page.


      • #4
        The duplicate page could have any URL. But the content will be the same.

        Just turn off CRC in the Scan Options configuration window


        • #5
          Originally posted by wrensoft View Post
          The duplicate page could have any URL. But the content will be the same.

          Just turn off CRC in the Scan Options configuration window
          I don't understand "The duplicate page could have any URL. But the content will be the same."
          How can a different URL be the same page?
          If the page is already indexed, the links in this page are also indexed, right?
          But no links in this page are being indexed.


          • #6
            How can a different URL be the same page?
            The concept isn't complex. The content on one page is the same as the content on another page. It is a duplicate.

            But as suggested just turn off the CRC function if it is all too complex to understand.

            If the page is already indexed, the links in this page are also indexed
            If you are using Spider mode then links within your site are followed to find other pages.


            • #7
              Originally posted by wrensoft View Post
              The concept isn't complex. The content on one page is the same as the content on another page. It is a duplicate.

              But as suggested just turn off the CRC function if it is all too complex to understand.

              If you are using Spider mode then links within your site are followed to find other pages.
              Yes, I'm using Spider mode (follow links only).
              If I turn off the CRC function, many identical URLs appear in the search results.
              If I turn on the CRC function, the page cannot be indexed, and the links in that page are not indexed either.
              If another page has the same content, the links inside the page should still be indexed.
              At the moment they are not. Please help, thanks.


              • #8
                many identical URLs appear in the search results
                I think this is impossible. Each URL will appear at most once in the search results.


                • #9
                  Originally posted by deandean View Post
                  If I turn off the CRC function, many identical URLs appear in the search results.
                  As explained above, this shouldn't be possible. I think you might be confused. Are you referring to URLs like the following as the same:

                  http://mysite.com/test/
                  http://mysite.com/test/index.html
                  http://www.mysite.com/test/default.html

                  These are NOT the same URLs. They may go to the same page but that is dependent on the way your server is configured.

                  The CRC detection prevents this from happening by actually looking at the contents of each page and determining if they are duplicates. Although ideally, you should not be linking to the same page with different names and URLs throughout your site, so that this would not be necessary.

                  Originally posted by deandean View Post
                  If I turn on the CRC function, the page cannot be indexed, and the links in that page are not indexed either.
                  If another page has the same content, the links inside the page should still be indexed.
                  You've made an assumption here. The problem might actually be that the page (or an identical copy of the page somewhere else) was indexed, but the links were not crawled for other reasons (links are inside JavaScript, links are external to the base URL, etc.). You should verify this first.
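
One way to check this outside the indexer is a small script that lists the links a simple crawler could actually follow from a page. The sketch below is an assumption-laden illustration, not how Zoom parses pages: `LinkCollector`, `crawlable_links` and the sample HTML are hypothetical, and "within the base URL" is approximated by a plain prefix test.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.page_url, value))


def crawlable_links(page_url, html, base_url):
    """Return links found in plain <a> tags that stay under base_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [link for link in collector.links if link.startswith(base_url)]


# Hypothetical page: one relative link, one external link, one script-only link.
html = (
    '<a href="ContactList/index.htm">Contacts</a>'
    '<a href="http://other.example.com/page.htm">Elsewhere</a>'
    '<script>window.location = "hidden.htm";</script>'
)

links = crawlable_links(
    "http://content.abc.com/index.htm", html, "http://content.abc.com/"
)
print(links)  # ['http://content.abc.com/ContactList/index.htm']
```

Here the external link is filtered out and the script-only navigation is never seen, which are two of the reasons mentioned above for links not being followed.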

                  It would likely be much quicker if you can give us a URL to the site in question, and maybe even provide us with the ZCFG file, and tell us what page or link you are trying to index.
                  --Ray
                  Wrensoft Web Software
                  Sydney, Australia
                  Zoom Search Engine


                  • #10
                    I found that entering the URL "http://content.abc.com/ContactList/" goes to the "http://content.abc.com/ContactList/index.htm" page.
                    So is that why "http://content.abc.com/ContactList/index.htm" will be indexed?
                    And so the links inside "http://content.abc.com/ContactList/index.htm" cannot be found?

                    You know I cannot turn off CRC,
                    because I have 3 pages, e.g. a.htm, b.htm and c.htm,
                    and each of these 3 pages contains a link to, e.g., z.htm.
                    After searching for a keyword inside z.htm, the result will be 3, so that's why I need to turn on the CRC.

                    And I found that the link uses <a href="ContactList/index.htm" ...........>

                    Now with CRC turned off the page can be indexed, but many links are the same.
                    With CRC turned on, the page is skipped, so the links inside the page are not indexed either.

                    How do I attach the ZCFG file here?


                    • #11
                      I'm sorry, but I have read your post a couple of times, and I really can't make much sense of it. Statements like "the result will be 3, so that's why I need to turn on the CRC" are illogical.

                      You also contradict your initial post, and your use of fictitious examples rather than the real URLs and real search words further confuses matters.

                      I am guessing English isn't your first language, but this is no excuse for only supplying half the information.

                      You can't insert files into the forum. You need to E-Mail them.


                      • #12
                        Yes, I'm sorry, English is not my first language,
                        but I think the sample I gave you is very clear.
                        OK, let me explain it this way.

                        a.htm
                        <a href="z.htm" ...........>

                        b.htm
                        <a href="z.htm" ...........>

                        c.htm
                        <a href="z.htm" ...........>

                        The same <a href="z.htm" ...........> link is in all 3 pages, right?
                        So suppose z.htm contains the words "Hello zoom search". With CRC turned off, I start indexing, and afterwards I search for the keyword "Hello".
                        The results will be:
                        =========================================================
                        z page
                        "Hello zoom search"
                        Terms matched: 1 - Score: 14 - 20 Mar 2009 - URL: http://content.abc.com/z.htm

                        z page
                        "Hello zoom search"
                        Terms matched: 1 - Score: 14 - 20 Mar 2009 - URL: http://content.abc.com/z.htm

                        z page
                        "Hello zoom search"
                        Terms matched: 1 - Score: 14 - 20 Mar 2009 - URL: http://content.abc.com/z.htm
                        =========================================================
                        Right? So that is what I meant by "3 results".
                        And sorry again, I cannot give you the real URL because it is not my private website; it belongs to the company I am working for. So I need to give the example in my own way.
                        Please help, many thanks.


                        • #13
                          Unless you are using a corrupted set of index files, the fictitious example you describe can't happen in real life. As already pointed out, each URL will appear at most once in the search results. So your scenario is impossible regardless of the CRC setting.
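
To illustrate why this is expected, here is a minimal sketch of a spider that tracks visited URLs. It is a generic breadth-first crawl, not Zoom's code; the `crawl` function and the `links` table are hypothetical and reuse the a.htm, b.htm, c.htm and z.htm names from the example above.

```python
from collections import deque


def crawl(start_urls, links):
    """Breadth-first crawl over a link table.

    Returns each reachable URL exactly once, in discovery order,
    no matter how many pages link to it.
    """
    visited = set()
    order = []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # already indexed under this exact URL
        visited.add(url)
        order.append(url)
        queue.extend(links.get(url, []))
    return order


# Three pages that all link to the same z.htm.
links = {
    "a.htm": ["z.htm"],
    "b.htm": ["z.htm"],
    "c.htm": ["z.htm"],
}

indexed = crawl(["a.htm", "b.htm", "c.htm"], links)
print(indexed)  # ['a.htm', 'b.htm', 'c.htm', 'z.htm']
```

Although z.htm is queued three times, it is indexed once, so a search for a word on z.htm would return a single result for that URL.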


                          • #14
                            I agree. This should not be possible unless:

                            (a) You have a corrupted set of index files, by mixing files from different indexing sessions or uploading them incorrectly.

                            (b) You have modified our search script source code and broken it.

                            (c) You are missing details in your fictitious example, and it is not an accurate representation of what you are seeing.

                            If you are sure it is not any of the above reasons, then ZIP up your search files (all files generated by the Indexer) and the ZCFG configuration file, and e-mail them to us.
                            --Ray
                            Wrensoft Web Software
                            Sydney, Australia
                            Zoom Search Engine


                            • #15
                              Originally posted by Ray View Post
                              As explained above, this shouldn't be possible. I think you might be confused. Are you referring to URLs like the following as the same:
                              http://mysite.com/test/
                              http://mysite.com/test/index.html
                              http://www.mysite.com/test/default.html

                              These are NOT the same URLs. They may go to the same page but that is dependent on the way your server is configured.

                              The CRC detection prevents this from happening by actually looking at the contents of each page and determining if they are duplicates. Although ideally, you should not be linking to the same page with different names and URLs throughout your site, so that this would not be necessary.
                              OK, but I want to confirm this:
                              If entering "http://mysite.com/test/" goes to the page "http://mysite.com/test/index.html",
                              and CRC is on, then the page "http://mysite.com/test/index.html" will be skipped, right?
                              And will the links (content) inside the page "http://mysite.com/test/index.html" be indexed or not?
                              Thanks so much.
