
Problems concerning Google sitemap files


  • Problems concerning Google sitemap files

    Hello,

    our website http://www.mykothek.de currently has about 4,100 pages. It is still under construction; only about 10% is done so far. We were looking for a way to add search capabilities to our pages and, after testing the Free Edition of your program, we bought "Zoom Search Engine", Version 6.0 (Build 1020), Professional Edition.

    I must say that I am very pleased with the abilities of "Zoom Search Engine" for indexing and searching our website. Many thanks to you from Germany.


    I should add that I always use "offline mode" for indexing the pages, and that I prefer relative paths both for local use (JavaScript platform) and for server-side use (PHP platform). That means the base URL in "Start Options" is "../". However, for creating the Google sitemap files, the sitemap base URL in "Sitemaps" is http://www.mykothek.de/.

    So far, all the files required for searching have been created successfully, and they work very well. The Google sitemap files have been created too, but the file "sitemap.xml" is completely empty except for the following header:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
    http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    </urlset>
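
One way to confirm that a generated file really contains no entries is to count its <url> elements. The following is only a minimal Python sketch and not part of Zoom; the function name count_sitemap_urls and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 schema, as seen in the header above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def count_sitemap_urls(xml_text: str) -> int:
    """Return the number of <url> entries directly under <urlset>."""
    # Encode first so the parser honours the encoding in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return len(root.findall(f"{{{SITEMAP_NS}}}url"))


# Header-only file, as described in this post (illustrative copy).
EMPTY_SITEMAP = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
    ' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9\n'
    ' http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">\n'
    '</urlset>'
)

print(count_sitemap_urls(EMPTY_SITEMAP))  # 0
```

A count of 0 means the file is well-formed but lists no pages, which matches the symptom described here.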


    I don't know the reason for this strange behaviour, but I am sure you will have some idea about it.

    Many thanks in advance for your advice.

  • #2
    My guess is that you haven't set the "Sitemap base URL" correctly.

    From the help file,

    Sitemap Base URL
    Note that a Sitemap Base URL is required to generate an XML sitemap. This is necessary for two reasons:

    1. To determine the URL of your split sitemap files (e.g. "sitemap2.xml", etc.) should it be necessary to generate a sitemap index ("sitemap_index.xml") as explained above.

    2. To exclude any indexed URLs which are under a different domain (or base URL) to the location of the sitemap file, as required by Google's Sitemap specifications.

    With the default option "Include only URLs within the Sitemap Base URL" selected, the latter requirement is enforced to obey Google requirements for sitemap files to only contain URLs which are below the base URL of the sitemap file. For example, you can not normally have a URL such as "http://www.myotherdomain.com/mypage.html" included in a sitemap file located at http://www.mysite.com/sitemap.xml.

    When "Include only URLs within the Sitemap Base URL" is selected, Zoom will automatically exclude from your XML sitemap any URLs that are not within the specified Sitemap Base URL, even though those URLs may have been indexed (or may be recommended links). This means that your XML sitemap may not contain all the indexed URLs if this is set up incorrectly...
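
The rule quoted above can be illustrated with a short Python sketch. This is not Zoom's actual implementation, only an assumption about how such a filter could behave; the function name is_within_base is hypothetical.

```python
from urllib.parse import urlsplit


def is_within_base(url: str, base_url: str) -> bool:
    """Return True if url lies under base_url (same scheme, host and path prefix)."""
    u, b = urlsplit(url), urlsplit(base_url)
    # A relative URL such as "../mypage.html" has no scheme or host to compare.
    if not u.scheme or not u.netloc:
        return False
    if (u.scheme.lower(), u.netloc.lower()) != (b.scheme.lower(), b.netloc.lower()):
        return False
    # Treat the base path as a directory so a missing trailing slash does not matter.
    base_path = b.path if b.path.endswith("/") else b.path + "/"
    return (u.path or "/").startswith(base_path)


print(is_within_base("http://www.mysite.com/mypage.html", "http://www.mysite.com/"))         # True
print(is_within_base("http://www.myotherdomain.com/mypage.html", "http://www.mysite.com/"))  # False
print(is_within_base("../mypage.html", "http://www.mysite.com/"))                            # False
```

Under this assumption, every page indexed with a relative base URL such as "../" would be filtered out, which would leave a sitemap with a header and no entries.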



    • #3
      Hello,

      many thanks for your quick reply.

      Sorry, but that is not the cause. I did set the sitemap base URL correctly, to: http://wwww.mykothek.de/
      All the files of the website are either in the root directory (http://wwww.mykothek.de/) itself or in subdirectories of the root.


      Please remember that I am talking only about indexing in OFFLINE MODE for the PHP PLATFORM.
      After some tests, I finally found out the following:

      1.) The base URL (in the Start options menu) must be identical to the sitemap base URL (in the Sitemap menu).

      2.) Then, and only then, will the file "sitemap.xml" be created correctly.

      3.) The trailing slash may be added or omitted; it doesn't matter.

      4.) By the way, you may use ANY URL, for example http://_anything_/, for the base URL AND for the sitemap base URL, and the program will create FORMALLY correct files "urllist.txt", "sitemap.xml" and "zoom_pagedata.zdat". Of course, these files are useless because the URL http://_anything_/ doesn't exist. But this is how the program works. Please try it yourself.


      As a consequence of the above, I did the following to get the desired files:

      1.) Set the base URL to: http://wwww.mykothek.de/

      2.) Set the sitemap base URL to: http://wwww.mykothek.de/

      3.) Created the files "urllist.txt", "sitemap.xml", and all the files necessary for searching.

      4.) In the file "zoom_pagedata.zdat" (in spite of the warning not to edit this file), replaced the string "http://wwww.mykothek.de/" with the string "../" using a plain text editor. This way I got the desired relative paths.

      5.) Uploaded all the files for searching to the server, and tested searching. And now, everything works very well. You can see for yourself at our website http://www.mykothek.de/
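
The manual replacement in step 4 could also be scripted. The following Python sketch is only an illustration of that workaround: the function name replace_base_url is made up, it assumes the base URL appears in "zoom_pagedata.zdat" as plain bytes (as the text-editor approach suggests), and it keeps a backup because the file is not meant to be edited.

```python
from pathlib import Path


def replace_base_url(path: Path, old: bytes, new: bytes) -> int:
    """Replace every occurrence of old with new in the file; return the count."""
    data = path.read_bytes()
    count = data.count(old)
    if count:
        # Keep an untouched copy next to the original before overwriting it.
        backup = path.with_name(path.name + ".bak")
        backup.write_bytes(data)
        path.write_bytes(data.replace(old, new))
    return count


if __name__ == "__main__":
    target = Path("zoom_pagedata.zdat")
    if target.exists():
        n = replace_base_url(target, b"http://wwww.mykothek.de/", b"../")
        print(f"replaced {n} occurrence(s)")
```

Working on raw bytes leaves everything else in the file untouched, but whether the edited file remains valid for the search script is something to test after each re-index.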


      Once more, I would point out that a base URL set to "../" AND a sitemap base URL set to "http://wwww.mykothek.de/" will create an empty file "sitemap.xml". I would be interested in your opinion about this behaviour of the program.

      Regards



      • #4
        It is true that if you set your base URL to a relative path such as "../" you would not be able to specify a sitemap base URL to correspond.

        The reason for this is that we support multiple start directories (if you click on the "More" button on the 'Start options' panel). So potentially you could be indexing several folders, each mapped to a very different base URL (e.g. start point 1 may have a base URL of http://www.mysite.com/ while start point 2 may have a base URL of http://www.someothersite.com/, etc.)

        But Google Sitemaps will only accept a sitemap file which contains links to ONE site (and reject a file containing more than one domain), so the Sitemap Base URL would determine which of these pages would be used.

        Note that when the base URL is set to "../" it becomes impossible to determine which start point belongs to the sitemap. Of course, in the case where you only have one start point, this is not an issue (and probably why it seems strange to you).
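
To make the ambiguity concrete, here is a rough Python sketch. It is not how Zoom is implemented internally; the folder names, the dictionary layout and the function name sitemap_candidates are all assumptions chosen for illustration.

```python
def sitemap_candidates(start_points: dict[str, str], sitemap_base: str) -> list[str]:
    """Return the start folders whose base URL falls under sitemap_base."""
    return [
        folder
        for folder, base_url in start_points.items()
        if base_url.startswith(sitemap_base)
    ]


# Two start points mapped to different absolute base URLs: one clearly matches.
absolute = {
    "C:/sites/first": "http://www.mysite.com/",
    "C:/sites/second": "http://www.someothersite.com/",
}
print(sitemap_candidates(absolute, "http://www.mysite.com/"))  # ['C:/sites/first']

# The same start points with relative base URLs: nothing can be matched.
relative = {
    "C:/sites/first": "../",
    "C:/sites/second": "../",
}
print(sitemap_candidates(relative, "http://www.mysite.com/"))  # []
```

With relative base URLs, both folders look the same from the sitemap's point of view, so there is nothing to tie either of them to the Sitemap Base URL.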

        We could perhaps make it behave differently when there is only one start point, we are in offline mode, and you are using a relative path for a base URL. But as you can see, it is pretty complex to explain the variety of situations that we need to cater for (and that a user may care little about), and it could be even more confusing if the behaviour changed under various obscure circumstances. We may need to come up with a better approach to addressing this.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
