
Indexing skip over directory, but keep search results


  • Indexing skip over directory, but keep search results

    We have a website with thousands of pages inside a single directory, /manufacturers/. Because of this, indexing the entire site takes up to an hour every time. We are trying to figure out whether there is any way to do one full indexing run and then, on subsequent runs, skip that /manufacturers/ directory without losing those search results. I looked into Updating an Existing Index, but there were no clear examples of how to do this or whether it is exactly what I am after.

    Another similar question I have: is there any way of telling Zoom to look at an XML sitemap and to only crawl pages that have been updated? Our XML sitemap automatically updates the modified dates of the pages, and I figured that could tell Zoom which files to update and which ones to skip over.

  • #2
    Are you using offline mode or spider mode?
    Are the pages static HTML pages or dynamic pages generated using a script?

    There are command line options that would allow you to feed in a list of pages that need to be deleted (-deletepage) or added (-addpages).
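    For illustration, the list of changed pages could be prepared in a plain text file and passed to the indexer from a script. This is only a sketch: the executable name `ZoomIndexer.exe` and the one-URL-per-line list format are assumptions; only the `-addpages` and `-deletepage` option names come from this thread, so check the Zoom Users Guide for the exact syntax.

    ```shell
    # Sketch: collect URLs that changed since the last crawl into a list file,
    # then feed that list to the indexer so only those pages are updated.
    # NOTE: "ZoomIndexer.exe" and the list-file format are assumptions.
    cat > changed_pages.txt <<'EOF'
    http://www.example.com/manufacturers/acme.php
    http://www.example.com/manufacturers/bravo.php
    EOF

    # Dry run: echo the command rather than executing it, since the indexer
    # is a Windows application and may not be installed on this machine.
    echo ZoomIndexer.exe -addpages changed_pages.txt
    ```

    A client could then be given a batch file like this to run, rather than the full command line syntax.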

    Comment


    • #3
      Originally posted by boswebdev View Post
      Another similar question I have: is there any way of telling Zoom to look at an XML sitemap and to only crawl pages that have been updated? Our XML sitemap automatically updates the modified dates of the pages, and I figured that could tell Zoom which files to update and which ones to skip over.
      We don't support crawling an existing sitemap at the moment. Zoom can, however, generate an XML sitemap based on the files it crawled and indexed (which, unfortunately, is the opposite of what you're after).

      We plan to add support for crawling based on a given XML sitemap file in V7.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine

      Comment


      • #4
        Originally posted by wrensoft View Post
        Are you using offline mode or spider mode?
        Are the pages static HTML pages or dynamic pages generated using a script?

        There are command line options that would allow you to feed in a list of pages that need to be deleted (-deletepage) or added (-addpages).
        The pages are dynamic (database driven), and we cannot use the command line since we need to hand this off to a client in the easiest way possible. Let's say Zoom crawls my pages a to z and indexes them all. The next time I index, all I need to do is crawl pages a to f, but keep a through z in my search results. All of our pages need to show up in the search results on our client's website. A large portion of the pages are only going to be updated once or twice a year, yet they take an hour to re-index. We would like to find a way to update just a portion of the index each time we re-crawl, instead of re-indexing the entire site.

        Comment


        • #5
          If you are using spider mode, and your pages are returning valid date stamps, then it should be possible to use the "Update existing index" function.

          It should be just a matter of selecting this option from the menu.

          With each subsequent incremental update, the index gets larger and less efficient. We recommend performing a full re-index regularly where possible (perhaps once a month, depending on how often you perform a partial index).

          Note that the valid date stamps requirement is critical. Without good date / time stamps, it is impossible to know which pages are new.
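          A quick way to verify the date stamp prerequisite is to check whether the server sends a Last-Modified header for a dynamic page (a sketch; the URL is a placeholder, and here a saved response is parsed so the check is self-contained — in practice you would pipe in `curl -sI http://yoursite/page.php` instead):

          ```shell
          # Sketch: check whether an HTTP response includes a Last-Modified
          # header, which incremental ("Update existing index") crawls rely on.
          # The saved response below stands in for a live curl -sI request.
          cat > headers.txt <<'EOF'
          HTTP/1.1 200 OK
          Content-Type: text/html
          Last-Modified: Tue, 15 Nov 2011 12:45:26 GMT
          EOF

          if grep -qi '^last-modified:' headers.txt; then
            echo "OK: page returns a Last-Modified date stamp"
          else
            echo "WARNING: no Last-Modified header; changed pages cannot be detected"
          fi
          ```

          Dynamic, database-driven pages often omit this header unless the script sets it explicitly, which is worth checking before relying on incremental updates.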

          Comment
