Sitemap question

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sitemap question

    I just upgraded to the latest version of Zoom in order to try out the revised sitemap generation feature.

    My site has about 20,000 pages. When I generate the sitemap from the top level domain, I only get 179 links. Some of these generated links appear on the main page, some are one level down, and some page links are two levels down -- but many pages that are two levels down (and beyond) are not included.

    What criteria are used to generate sitemaps? Can the amount of detail be controlled via a preference?

  • #2
    In case you haven't already, you may want to first have a look at the Help file (click on the "Help" button on the Sitemaps tab of the Configuration window, or refer to the corresponding chapter in the Users Guide).

    If that doesn't answer your question, can you tell us which sitemap option you are using (the text file "urllist.txt" or an XML/Google sitemap)? As explained in the Help file, the Google sitemap format depends on the Sitemap base URL to determine which pages can legitimately be in the sitemap file. Generally, it should never exclude pages that are any number of levels below the base URL; it only excludes pages that are above it or in a different folder/domain.

    You should check that these pages really are several levels down, as you describe, and note that the URLs themselves must reflect this accurately. For example, "http://mysite.com/news/archive/test.html" may seem to be several levels below "http://www.mysite.com/news/", but it fails this test because the domain name is different: strictly speaking, it is a different URL, even though many web servers are configured to redirect it to the www version so that the two appear identical. The same applies to case-sensitive URLs. In such cases, you should change the absolute links on your web pages to be more consistent.
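A minimal Python sketch of the scope rule described above, assuming the sitemaps.org convention that a sitemap may only list URLs with the same scheme and host as its base URL and a path at or below the base URL's directory. The function name `in_sitemap_scope` is hypothetical, and this is only an illustration of the rule, not Zoom's actual implementation.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(url: str, base_url: str) -> bool:
    """Return True if url lies at or below base_url (hypothetical helper).

    Sketch of the sitemaps.org location rule; not Zoom's own code.
    """
    u, b = urlsplit(url), urlsplit(base_url)
    # Scheme and host (including any port) must match exactly, so
    # "mysite.com" and "www.mysite.com" count as different hosts.
    if u.scheme.lower() != b.scheme.lower() or u.netloc.lower() != b.netloc.lower():
        return False
    # Use the directory part of the base path ("/news/sitemap.xml" -> "/news/").
    base_path = b.path if b.path.endswith("/") else b.path.rsplit("/", 1)[0] + "/"
    # Paths are compared case-sensitively, as many web servers treat them.
    return u.path.startswith(base_path)


if __name__ == "__main__":
    base = "http://www.mysite.com/news/"
    print(in_sitemap_scope("http://www.mysite.com/news/archive/test.html", base))  # True
    print(in_sitemap_scope("http://mysite.com/news/archive/test.html", base))      # False
    print(in_sitemap_scope("http://www.mysite.com/News/archive/test.html", base)) # False
```

The second and third calls show the two pitfalls mentioned above: a missing "www" and a difference in case both put an otherwise "deeper" page outside the sitemap's scope.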
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      We tried both sitemap options in order to compare them. Except for format, the content appears to be the same.

      The main domain is http://www.globalgourmet.com/

      The main (home) page is http://www.globalgourmet.com/index.html

      We don't use http://globalgourmet.com/

      The site is updated weekly and monthly, so some of the links change, but over 90% of them always stay the same. Zoom picks up all the main links, then goes into a strange tailspin: it adds repeated links to the sitemap and includes links that are several levels down, yet skips more important links it should have found on its way to those lower pages. The other anomaly is that an extra colon is added to some of the links ("http://:www..."). I first thought we had added those colons by mistake, but I can't find them anywhere.

      The structure of our site includes much of the content in the "food" directory, which is a second level directory of other directories -- that is, there are almost no files in that directory, only more directories, each containing files. This came about because the site started in 1994 and the main domain name changed several times during its evolution.

      Here's an extract from the Yahoo-style sitemap, part of the way in, where the errors begin:

      http://www.globalgourmet.com/food/resources/
      http://www.globalgourmet.com/food/special/mothers-day/
      http://www.globalgourmet.com/food/kgk/
      http://www.globalgourmet.com/food/kgk/katesbooks.html
      http://www.globalgourmet.com/food/cookbook/
      http://www.globalgourmet.com/food/special/holidays/
      http://www.globalgourmet.com/food/office/
      http://www.globalgourmet.com/destinations/countries.html
      http://:www.globalgourmet.com/food/foodscpe/resources/cocktails/index.html
      http://:www.globalgourmet.com/food/foodscpe/resources/cocktails/index.html
      http://www.globalgourmet.com/food/foodscpe/resources/cocktails/index.html
      http://:www.globalgourmet.com/food/special/lunar/
      http://:www.globalgourmet.com/food/special/lunar/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/resources/
      http://:www.globalgourmet.com/food/ild/
      http://:www.globalgourmet.com/food/ild/
      http://:www.globalgourmet.com/food/ild/ildarc.html
      http://:www.globalgourmet.com/food/ild/ildarc.html
      http://:www.globalgourmet.com/food/ild/
      http://:www.globalgourmet.com/food/special/chocolate/
      http://:www.globalgourmet.com/food/ild/ildarc.html
      http://:www.globalgourmet.com/food/special/chocolate/
      http://:www.globalgourmet.com/food/ild/ildarc.html
      http://:www.globalgourmet.com/food/special/easter/
      http://:www.globalgourmet.com/food/special/easter/index.html#passover/
      http://:www.globalgourmet.com/food/wine/
      http://:www.globalgourmet.com/food/winearc.html/
      http://:www.globalgourmet.com/food/wine/
      http://:www.globalgourmet.com/food/wine/
      http://:www.globalgourmet.com/food/wine/
      http://:www.globalgourmet.com/food/special/st-patrick/
      http://:www.globalgourmet.com/food/special/st-patrick/
      http://:www.globalgourmet.com/food/special/st-patrick/
      http://:www.globalgourmet.com/food/special/st-patrick/
      http://:www.globalgourmet.com/food/special/st-patrick/
      http://:www.globalgourmet.com/food/special/thanksgiving/
      http://:www.globalgourmet.com/food/special/thanksgiving/
      http://:www.globalgourmet.com/food/special/holiday/
      http://:www.globalgourmet.com/food/special/holiday/
      http://:www.globalgourmet.com/food/special/holiday/
      http://:www.globalgourmet.com/food/special/holiday/
      http://:www.globalgourmet.com/food/egg/egg0197/orleans.html
      http://:www.globalgourmet.com/food/egg/egg0197/orleans.html
      http://:www.globalgourmet.com/food/egg/egg0197/orleans.html
      http://:www.globalgourmet.com/food/egg/egg0298/oysters.html
      http://:www.globalgourmet.com/food/egg/egg0298/oysters.html
      http://:www.globalgourmet.com/food/sleuth/0699/index.html
      http://:www.globalgourmet.com/food/sleuth/0699/index.html

      etc.

      Note all the repeats and the added colons.
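If the problem recurs, one way to spot both symptoms in a generated urllist.txt is a small check like the one below. This is a hypothetical Python sketch, not a Zoom feature: `check_urllist` and `WELL_FORMED` are illustrative names, and the pattern is a deliberately simple, assumed definition of a well-formed http(s) URL.

```python
import re
from collections import Counter

# Assumed definition of "well formed" for this check: http(s) scheme, "://",
# then a hostname starting with a letter or digit, an optional :port, then "/" or end.
WELL_FORMED = re.compile(r"^https?://[A-Za-z0-9][A-Za-z0-9.-]*(:\d+)?(/|$)")


def check_urllist(text: str) -> tuple[list[str], dict[str, int]]:
    """Return (malformed URLs, repeated URLs with counts) for a urllist.txt-style string."""
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    # dict.fromkeys drops repeats while keeping first-seen order.
    malformed = list(dict.fromkeys(u for u in urls if not WELL_FORMED.match(u)))
    counts = Counter(urls)
    duplicates = {u: n for u, n in counts.items() if n > 1}
    return malformed, duplicates


if __name__ == "__main__":
    sample = """\
http://www.globalgourmet.com/food/wine/
http://:www.globalgourmet.com/food/wine/
http://:www.globalgourmet.com/food/wine/
"""
    malformed, duplicates = check_urllist(sample)
    print(malformed)   # ['http://:www.globalgourmet.com/food/wine/']
    print(duplicates)  # {'http://:www.globalgourmet.com/food/wine/': 2}
```

Duplicates are counted on the exact string, so "http://www..." and "http://:www..." versions of the same page are reported separately; the malformed list catches the latter.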

      • #4
        We have not been able to reproduce this behaviour by running tests on the same site. Could you e-mail us your ZCFG file so we can take a closer look?

        Please also confirm that you are using the latest version and build (V5.0 build 100, as available here). It may also help to let us know which OS you are using.
        --Ray
        • #5
          I am using the latest version (V5.0 build 1008; I upgraded from the original 5.0). The first time I used the new version after installation, I got the odd file I quoted in my previous post.

          While waiting for your reply, I decided to reindex the site. This time the sitemap files look more like I expected.

          I'm not sure what caused the anomaly on the first run. I never changed any configuration parameters; I simply restarted Vista and reindexed.

          • #6
            I will presume that this problem is resolved for the moment. Please let us know if this problem occurs again, and if possible, try to narrow it down to a reproducible scenario/set of actions. Then e-mail us your findings and your ZCFG file and we will be able to look into it.
            --Ray