Manually Adding URL List & duplicate pages

  • Manually Adding URL List & duplicate pages

    Is there a way to manually add a URL list to Zoom? I have a text file with all of my URLs (over 3K links) and would like to simply add them to the database without having Zoom index the site. Is this possible, or does Zoom need to index the pages in order to update the specific meta tag details (title, description, etc.)?

    Thanks!

  • #2
    You can add/import a list of URLs to index by clicking on the "More" button in Spider Mode.
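    If you import a list this way, it may help to tidy the text file first. The sketch below is a hypothetical Python helper, not part of Zoom. It assumes the file holds one URL per line, drops blank lines, lines starting with "#", and anything that is not an http(s) URL, and removes exact duplicates while keeping the original order. Zoom still has to fetch and index each URL after the import.

```python
from urllib.parse import urlsplit


def load_url_list(lines):
    """Return cleaned, de-duplicated http(s) URLs in their original order.

    `lines` is any iterable of strings, such as an open text file.
    Assumes one URL per line (hypothetical file layout).
    """
    seen = set()
    urls = []
    for raw in lines:
        url = raw.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        if urlsplit(url).scheme not in ("http", "https"):
            continue  # skip entries that are not web URLs
        if url not in seen:  # exact-match duplicates only
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = ["http://www.domain.com/page1.html\n", "\n",
              "http://www.domain.com/page1.html\n", "ftp://domain.com/file\n"]
    print(load_url_list(sample))
```

    For a real file you would call it as load_url_list(f) inside a "with open(...)" block; the file name is up to you.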

    However, adding URLs to the database without Zoom indexing them is pointless, and not possible. I suspect you actually meant that Zoom would index something from these URLs, perhaps their filenames alone? Zoom is a search engine, after all, and it needs something to search by in order to return a page as a result. If Zoom does not look at the file, it has no relevant information to match a search against.

    And yes, for Zoom to pick up meta information such as the title and description, it needs to index the file. By default, Zoom also indexes the actual content of the pages.
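    To illustrate why the file has to be read, here is a minimal Python sketch that pulls the title and meta description out of a page's HTML using the standard library's html.parser. It only demonstrates the idea; it is not how Zoom itself parses pages, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # title text may arrive in several chunks


def extract_meta(html):
    """Return (title, description) found in an HTML string."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description.strip()


if __name__ == "__main__":
    page = ('<html><head><title>Page 1</title>'
            '<meta name="description" content="About page 1"></head>'
            '<body>Body text</body></html>')
    print(extract_meta(page))
```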

    You can, however, configure Zoom extensively to control what it indexes and what it ignores, so you can ask Zoom to index only the title and description and ignore the content, if that is what you are after. Please see the "Indexing Options" tab in the Configuration window.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Raymond,

      Thanks for the info. One other thing I noticed is that duplicate links are being indexed. For example, Zoom indexes both of these:

      http://www.domain.com/page1.html
      http://domain.com/page1.html

      How do I avoid indexing both URLs?

      Thanks!



      • #4
        We treat these URLs differently because they are different URLs. On some servers, these URLs can point to different pages (or can even be on different physical servers).

        A more obvious example would be:
        http://german.domain.com/page1.html
        http://french.domain.com/page1.html
        Here there are two sub-domains in use.

        The best solution is to only ever use a single sub-domain in all your URLs, i.e., always use www, or never use www. Don't mix your drinks, nor your URLs.

        Another solution would be to skip all URLs that contain "http://www." (assuming all of your pages are available on both sub-domains).

        A third solution: if the content of the pages is 100% identical, you can use the CRC option in Zoom to block duplicate content.
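        The first two solutions can be illustrated with a small Python sketch. The hypothetical canonical_url() rewrites a www. host to the bare domain, so both forms collapse to one URL (the single-sub-domain idea applied after the fact), and the hypothetical apply_skip_list() drops any URL containing a skip pattern such as "http://www.". Treating a skip pattern as a plain substring match is an assumption made for this sketch, not a description of Zoom's skip rules. Either approach is only safe if both host names really serve the same pages.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Map http://www.domain.com/... to http://domain.com/... (hypothetical helper)."""
    parts = urlsplit(url)
    netloc = parts.netloc.lower()  # host names are case-insensitive
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]
    return urlunsplit(parts._replace(netloc=netloc))


def apply_skip_list(urls, skip_patterns):
    """Drop every URL containing any skip pattern (assumed substring match)."""
    return [u for u in urls if not any(p in u for p in skip_patterns)]


urls = ["http://www.domain.com/page1.html", "http://domain.com/page1.html"]

# Single sub-domain: normalise hosts, then keep the first copy of each URL.
unique = list(dict.fromkeys(canonical_url(u) for u in urls))

# Skip rule: drop the www. variants entirely.
kept = apply_skip_list(urls, ["http://www."])

print(unique, kept)
```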



        • #5
          Yes, the content is exactly the same. How do I enable the CRC option? Thanks!



          • #6
            Tick the "Use CRC" box.

            It is on the Scan options tab of the config window.
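            For illustration only, here is a minimal Python sketch of CRC-based duplicate detection, assuming each page's content is available as bytes. It computes a CRC32 checksum with zlib.crc32 and keeps only the first URL seen for each checksum. Zoom's actual "Use CRC" implementation is not described in this thread, so treat this as the general idea rather than its exact behaviour; the function name is hypothetical.

```python
import zlib


def crc_dedupe(pages):
    """Keep the first URL for each distinct CRC32 of the page content.

    `pages` is an iterable of (url, content_bytes) pairs (hypothetical input).
    """
    first_url_for_crc = {}
    kept = []
    for url, body in pages:
        checksum = zlib.crc32(body)
        if checksum in first_url_for_crc:
            continue  # same checksum already seen under another URL
        first_url_for_crc[checksum] = url
        kept.append(url)
    return kept


pages = [
    ("http://www.domain.com/page1.html", b"<html>Page 1</html>"),
    ("http://domain.com/page1.html", b"<html>Page 1</html>"),
    ("http://domain.com/page2.html", b"<html>Page 2</html>"),
]
print(crc_dedupe(pages))
```

            Because the checksum covers the exact bytes, anything that differs per request (a timestamp, session ID, or visitor counter) makes two otherwise identical pages look different, which is why the content must be 100% identical. CRC32 is also only 32 bits, so two different pages can, in rare cases, share a checksum.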

