Same page showing multiple times with double slash


  • Same page showing multiple times with double slash

    I wonder if anyone can help solve a mystery: when a search word is entered in my site search, e.g. Vwadyck Zendera (I have used this because it is an uncommon name on the site), the results show the same page 3 times. It happens with all searches.
    The URL for my site is worldwar2exraf.co.uk
    Would appreciate some help
    Thanks
    Len Smith

  • #2
    The URLs of your results give...

    http://www.worldwar2exraf.co.uk/Ground%20Crew%20Notice%20Board/Page%2036.htm
    http://www.worldwar2exraf.co.uk//Ground%20Crew%20Notice%20Board/Page%2036.htm
    http://www.worldwar2exraf.co.uk///Ground%20Crew%20Notice%20Board/Page%2036.htm
    http://www.worldwar2exraf.co.uk////Ground%20Crew%20Notice%20Board/Page%2036.htm

    Notice the extra slashes. Do you have multiple copies of the same file?



    • #3
      No, there is only one file for each page. I noticed the extra slashes too and checked to make sure, but everything is as it should be. That's what I can't understand: no matter what you enter on any page, it still shows the same page 3 times. Very strange!
      Regards
      Len Smith
      Last edited by len smith; Jul-19-2007, 07:59 PM.



      • #4
        As the URLs are different they are treated as different pages.

        The problem is that the same page is being indexed multiple times at different URLs. This, in turn, is a result of you having a (slightly) broken link on your web site. You probably have a relative link somewhere on your site that has a double slash in the URL. The page reached through that double-slash URL then links to a page with a triple slash, and so on: an infinite loop.

        So the possible solutions are,

        1) Fix the broken link, although finding it might take a small amount of detective work.

        2) Turn on CRC duplicate page checking in Zoom.

        Option 1) is a much better solution than 2) as option 2) needs to download the page before it can work out it is a duplicate, plus option 2) might fail in any case if some of your content is dynamically changing.
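
To illustrate the trade-off described above, here is a minimal Python sketch of checksum-based duplicate detection. It is not Zoom's actual implementation; the function name `find_duplicates`, the `pages` dictionary and the example.com URLs are made up for illustration. It shows why every page body has to be downloaded before a duplicate can be recognised, and why a page whose content changes on each request slips through.

```python
import zlib

def find_duplicates(pages):
    """Map each duplicate URL to the first URL seen with the same body CRC.

    `pages` is a hypothetical dict of URL -> already-downloaded page bytes.
    """
    first_seen = {}   # CRC value -> first URL that had this content
    duplicates = {}   # duplicate URL -> URL it duplicates
    for url, body in pages.items():
        crc = zlib.crc32(body)
        if crc in first_seen:
            duplicates[url] = first_seen[crc]
        else:
            first_seen[crc] = url
    return duplicates

# Illustrative data only: three URLs that the server maps to the same file.
pages = {
    "http://www.example.com/a/page.htm": b"<html>same content</html>",
    "http://www.example.com//a/page.htm": b"<html>same content</html>",
    # Dynamic content (a timestamp here) changes the checksum,
    # so this copy is NOT detected as a duplicate.
    "http://www.example.com///a/page.htm": b"<html>same content 10:40:33</html>",
}

print(find_duplicates(pages))
```

Only the double-slash copy is reported as a duplicate; the triple-slash copy is missed because its body differs by a few bytes.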

        If you can't find the broken link, let us know. We might be able to find it for you if your site is not too large.



        • #5
          I recently helped another user with a similar problem, so I'll post my tips from my original e-mail below. This should give you (and anyone else who is having this problem) a guide as to how you can use Zoom to find your broken links, and in this particular case, find the links with the extra slash that is making your website spider unfriendly.

          Extract from my e-mail below:

          ... in situations like this, you can configure Zoom to help you
          locate these problems in your website. This is what you do:
          • On the "General" tab of the Configuration window, set Zoom to
            "Single-threaded downloading". This helps make things a lot clearer
            as to which link came from where in your log.
          • If you are making changes as you go, it probably helps to check the
            "Reload all files (do not use cache)" option to avoid indexing cached pages.
          • On the "Index Log" tab, make sure you have the following boxes enabled:
            Indexing, Spidering, Initialization, Downloading, Information, Error,
            Warning, Plugin, Summary, Broken Links.
          • Turn on "Save index log to file" and specify a place for the log file to
            be written to.
          • Enable "Debug mode" so that the log file will be written out as indexing
            goes.
          Now when you re-index the site, a log text file will be created with all the
          index messages. Once it gets to a point where the looping occurs, you can
          stop it, and open the log file in Notepad or any text editor. This will
          allow you to browse through the log in more detail.
          You will find "Queued URL: ..." messages in the log, which immediately
          follow the page that they were spidered from ("Spidering for ..."). So, in
          doing the above, I have found the following as the first occurrence of a URL
          containing ".uk//":

          07/05/07 10:39:24 - Spidering for links on
          http://www.mysite.co.uk/main/cycleracks_rear2-3.htm
          07/05/07 10:39:24 - Queued URL:
          http://www.mysite.co.uk/main/euroclassic_mof.html
          07/05/07 10:39:24 - Queued URL:
          http://www.mysite.co.uk//main/backup_box.htm
          07/05/07 10:39:24 - Queued URL:
          http://www.mysite.co.uk/main/euroway_mof.html

          As we can see here, this double slashed link was found on the
          cycleracks_rear2-3.htm page. And going to that page and looking for that
          link, indeed, we see this in the HTML source:

          Code:
           903 can also be used to carry the Thule <a
          href="..//main/backup_box.htm">BackUp 
                 luggage box</a><font face="Arial, Helvetica, sans-serif">. <br />
          So this is one of the problems to fix. It might even be the cause of all the
          other links, because what happens is that, the spider will proceed to that
          URL later on, and your server will respond as if it was a unique page:

          07/05/07 10:40:32 - Downloading file
          http://www.mysite.co.uk//main/backup_box.htm
          07/05/07 10:40:33 - Spidering for links on
          http://www.mysite.co.uk//main/backup_box.htm
          07/05/07 10:40:33 - Queued URL:
          http://www.mysite.co.uk//main/specials.htm
          07/05/07 10:40:33 - Queued URL:
          http://www.mysite.co.uk//main/cycleracks_rear2-3.htm

          Now because of the relative links on that page (links to "specials.htm"
          etc.) it will think this is a new path and all these links are unique, and
          cause a cascading effect - where the entire website is re-indexed with two
          slashes - which will lead to then indexing with 3 slashes, etc.
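
The cascade can be reproduced with a short Python sketch. The helper `resolve_keeping_slashes` is hypothetical and deliberately simplified: it handles only relative-path links and keeps empty path segments, which is how the log above suggests the spider and server behave. It is not Zoom's code. Note that some URL libraries (Python's own `urljoin` among them, at least in the versions I have checked) quietly drop empty segments, so they would hide this problem.

```python
from urllib.parse import urlsplit

def resolve_keeping_slashes(base_url, href):
    """Resolve a relative-path href against base_url, keeping empty segments.

    Simplified sketch: no support for absolute hrefs, queries or fragments.
    """
    base = urlsplit(base_url)
    # Directory of the page the link was found on, including the trailing "/".
    directory = base.path[: base.path.rfind("/") + 1]
    merged = directory + href
    resolved = []
    # Skip the empty string that precedes the leading "/".
    for segment in merged.split("/")[1:]:
        if segment == "..":
            if resolved:
                resolved.pop()        # go up one level
        elif segment != ".":
            resolved.append(segment)  # an empty segment becomes an extra "/"
    return f"{base.scheme}://{base.netloc}/" + "/".join(resolved)

page = "http://www.mysite.co.uk/main/cycleracks_rear2-3.htm"
for _ in range(3):
    target = resolve_keeping_slashes(page, "..//main/backup_box.htm")
    print(target)
    # The extra-slash copy of the site links back to this page relatively,
    # so the same bad link is found again one slash deeper.
    page = resolve_keeping_slashes(target, "cycleracks_rear2-3.htm")
```

Each pass through the loop prints the backup_box.htm URL with one more slash after the host name: two, then three, then four.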

          I can't confirm that this is the only occurrence of the double slash on your
          site without modifying the page on your website and repeating the above
          process. So I would recommend you do that, and see how you go from there.
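
For a large log, the manual search described above could be automated along these lines. This is a rough Python sketch that assumes the log uses the "Spidering for links on" and "Queued URL:" messages exactly as they appear in the excerpts in this post; the real log layout may differ. The function name `find_multi_slash_links` is made up.

```python
import re

# Matches either message type followed by its URL, whether the URL is on
# the same line or wrapped onto the next one.
ENTRY = re.compile(r"(Spidering for links on|Queued URL:)\s+(\S+)")

def find_multi_slash_links(log_text):
    """Return (source page, queued URL) pairs where the URL has '//' after the scheme."""
    results = []
    source_page = None
    for kind, url in ENTRY.findall(log_text):
        if kind.startswith("Spidering"):
            source_page = url  # queued URLs that follow were found on this page
        elif "//" in url.split("://", 1)[-1]:
            results.append((source_page, url))
    return results

sample_log = """\
07/05/07 10:39:24 - Spidering for links on
http://www.mysite.co.uk/main/cycleracks_rear2-3.htm
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk/main/euroclassic_mof.html
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk//main/backup_box.htm
07/05/07 10:39:24 - Queued URL:
http://www.mysite.co.uk/main/euroway_mof.html
"""

for source, queued in find_multi_slash_links(sample_log):
    print(f"{queued}\n  was found on {source}")
```

The first pair reported is the one worth fixing first, since later ones are often just knock-on effects of it. A URL whose query string legitimately contains "//" would also be flagged, so treat the output as a list of suspects.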
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine



          • #6
            Thanks for that Guys.
            Well, I have checked and rechecked, and although everything points to several broken links, when I look at the links shown they all seem to be working perfectly with no problems at all. I am wondering if it is because I use Pop menu magic for my navigation, as the pages that show as broken links are the ones that use this navigation system.
            All very strange indeed.
            Any help or solutions would be appreciated, paid or otherwise.
            regards
            len smith



            • #7
              Now if I had tried your option 2 (turn on CRC) in the first place, I could have saved myself endless hours of searching. I have just tried this option and guess what? It works fine now.
              Thanks once again Regards
              Len Smith



              • #8
                I had a look at your site. You have thousands of pages on the site, so it was a little like looking for a needle in a haystack.

                Nevertheless I found at least one (partially) broken link which would cause this problem.

                On this page,
                http://www.worldwar2exraf.co.uk/Photo%20Gallery%202207/Aircrew%202007/aircrew%20single%206.html

                You have this HTML code,

                <a href="../..//Aircrew Notice Board/aircrew notice board 168.html#1608">View details on Notice Board </a>

                As expected it was a relative link with a double slash in the link.
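
When the pages are available as local files, links like this can also be hunted down with a small script. The following Python sketch uses the standard `html.parser` module; the class name `SlashLinkFinder` and the helper `has_extra_slashes` are made up for illustration, and the second link in the sample HTML is invented to show a clean case.

```python
from html.parser import HTMLParser

def has_extra_slashes(href):
    """True if the path part of href contains '//' (beyond the scheme's own)."""
    # Ignore the "//" that legitimately follows a scheme such as "http:".
    without_scheme = href.split("://", 1)[-1]
    # Look only at the path, not the query string or fragment.
    path = without_scheme.split("?", 1)[0].split("#", 1)[0]
    return "//" in path

class SlashLinkFinder(HTMLParser):
    """Collects <a href="..."> values that contain a doubled slash."""

    def __init__(self):
        super().__init__()
        self.suspect_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and has_extra_slashes(value):
                self.suspect_hrefs.append(value)

html = (
    '<a href="../..//Aircrew Notice Board/aircrew notice board 168.html#1608">'
    "View details on Notice Board </a>"
    '<a href="../index.html">Home</a>'
)

finder = SlashLinkFinder()
finder.feed(html)
print(finder.suspect_hrefs)
```

Only the first link is reported. Protocol-relative links such as `//host/page.html` would be flagged too, so those need checking by eye.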



                • #9
                  Thanks for that. Yep, there are approximately 15,000+ pages on the site, and it is growing all the time due to the number of people who use the site for posting details. As you say, like looking for a needle in a haystack.
                  Thanks for finding that page.
                  I have to say, though, your second suggestion of turning on CRC seems to work just fine for the site.
                  Thanks for that. It really pulled me out of a hole
                  Best Regards
                  Len



                  • #10
                    Hi to Wrensoft
                    It seems there were a few more links like that on the same page you mentioned, which I have now put right. I am wondering how many more I have missed. The silly thing is the links still worked, too.
                    I do appreciate the time you took over this, I see you spent well over an hour on the site looking for the problem.

                    Thanks for that.
                    Regards
                    Len Smith



                    • #11
                      Seeing how difficult it is to hunt for such problems on your website (and the potential trouble you will have when indexing a site with such problems), we decided to add some functionality in the latest build of Zoom which will strip out the multiple slashes in URLs, and thus prevent duplicate pages being indexed because of this.
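
As a rough idea of what stripping multiple slashes from a URL involves, here is a Python sketch. It is only an illustration under my own assumptions, not the code used in Zoom; the function name `collapse_slashes` is made up. It collapses runs of slashes in the path only, leaving the "//" after the scheme and anything in the query string untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url):
    """Collapse runs of '/' in the path component of url into a single '/'."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

urls = [
    "http://www.worldwar2exraf.co.uk/Ground%20Crew%20Notice%20Board/Page%2036.htm",
    "http://www.worldwar2exraf.co.uk//Ground%20Crew%20Notice%20Board/Page%2036.htm",
    "http://www.worldwar2exraf.co.uk///Ground%20Crew%20Notice%20Board/Page%2036.htm",
    "http://www.worldwar2exraf.co.uk////Ground%20Crew%20Notice%20Board/Page%2036.htm",
]

# All four variants normalise to the same URL, so the page is queued only once.
print({collapse_slashes(u) for u in urls})
```

Normalising before a URL is queued means the duplicate is never downloaded at all, which is the advantage over a checksum comparison made after the download.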

                      You can download the latest build with this new feature (V5.1.1003) here:
                      http://www.wrensoft.com/zoom/whatsnew.html

                      We should note however, that we still recommend website designers to avoid such linking issues and fix any existing problems. They will cause similar issues with other search engines and may potentially render your website search engine unfriendly (a little while back, Google was also showing pages with multiple slashes in its URLs and it would rank down a site because of this, thinking it had too many duplicate pages).
                      --Ray
                      Wrensoft Web Software
                      Sydney, Australia
                      Zoom Search Engine



                      • #12
                        Thanks for that Ray
                        after this reply I will be downloading the latest build.
                        What stars you are at Wrensoft.
                        My problem is that not only do I have thousands of pages, but each page also contains up to 100 links (often more).
                        Thankfully we rate very high with google so broken links don't seem to be too much of a problem with the site.
                        I do appreciate the time you have spent on my site trying to help
                        Regards
                        Len Smith



                        • #13
                          works like a dream
                          thanks for that
                          Len Smith
