
Indexing skip over directory, but keep search results


  • Indexing skip over directory, but keep search results

    We have a website with thousands of pages inside a single directory, /manufacturers/. Because of this, indexing the entire site takes up to an hour every time. We are trying to figure out whether there is any way to do one full indexing run and then, on subsequent runs, skip that /manufacturers/ directory without losing those search results. I looked into Updating an Existing Index, but there were no clear examples of how to do this or whether it is exactly what I am after.

    Another similar question I have: is there any way of telling Zoom to look at an XML sitemap and to only crawl pages that have been updated? Our XML sitemap automatically updates the modified dates of the pages, and I figured that could tell Zoom which files to update and which ones to skip over.

  • #2
    Are you using offline mode or spider mode?
    Are the pages static HTML pages or dynamic pages generated using a script?

    There are command line options that would allow you to feed in a list of pages that need to be deleted (-deletepage) or added (-addpages).
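    For illustration, the list of changed pages could be prepared in a plain text file and passed to the indexer from a script. This is only a sketch: the executable name `ZoomIndexer.exe` and the one-URL-per-line list format are assumptions; only the `-addpages` and `-deletepage` option names come from this thread, so check the Zoom Users Guide for the exact syntax.

    ```shell
    # Sketch: collect URLs that changed since the last crawl into a list file,
    # then feed that list to the indexer so only those pages are updated.
    # NOTE: "ZoomIndexer.exe" and the list-file format are assumptions.
    cat > changed_pages.txt <<'EOF'
    http://www.example.com/manufacturers/acme.php
    http://www.example.com/manufacturers/bravo.php
    EOF

    # Dry run: echo the command rather than executing it, since the indexer
    # is a Windows application and may not be installed on this machine.
    echo ZoomIndexer.exe -addpages changed_pages.txt
    ```

    A client could then be given a batch file like this to run, rather than the full command line syntax.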

    Comment


    • #3
      Originally posted by boswebdev View Post
      Another similar question I have: is there any way of telling Zoom to look at an XML sitemap and to only crawl pages that have been updated? Our XML sitemap automatically updates the modified dates of the pages, and I figured that could tell Zoom which files to update and which ones to skip over.
      We don't support crawling an existing sitemap at the moment. Zoom can, however, generate an XML sitemap based on the files it crawled and indexed (which, unfortunately, is the opposite of what you're after).

      We plan to add support for crawling based on a given XML sitemap file in V7.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine

      Comment


      • #4
        Originally posted by wrensoft View Post
        Are you using offline mode or spider mode?
        Are the pages static HTML pages or dynamic pages generated using a script?

        There are command line options that would allow you to feed in a list of pages that need to be deleted (-deletepage) or added (-addpages).
        The pages are dynamic (database driven), and we cannot use the command line since we need to hand this off to a client in the easiest way possible. Let's say Zoom crawls my pages a to z and indexes them all. The next time I index, all I need to do is crawl pages a to f, but keep a through z in my search results. All of our pages need to show up in the search results on our client's website. A large portion of the pages are only going to be updated once or twice a year, yet they take an hour to re-index. We would like to find a way to update just a portion of the index each time we re-crawl, instead of re-indexing the entire site.

        Comment


        • #5
          If you are using spider mode, and your pages are returning valid date stamps, then it should be possible to use the "Update existing index" function.

          It should be just a matter of selecting this option from the menu.

          With each subsequent incremental update, the index gets larger and less efficient. We recommend performing a full re-index regularly where possible (perhaps once a month, depending on how often you perform a partial index).

          Note that the valid date stamps requirement is critical. Without good date / time stamps, it is impossible to know which pages are new.
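          A quick way to verify the date stamp prerequisite is to check whether the server sends a Last-Modified header for a dynamic page (a sketch; the URL is a placeholder, and here a saved response is parsed so the check is self-contained — in practice you would pipe in `curl -sI http://yoursite/page.php` instead):

          ```shell
          # Sketch: check whether an HTTP response includes a Last-Modified
          # header, which incremental ("Update existing index") crawls rely on.
          # The saved response below stands in for a live curl -sI request.
          cat > headers.txt <<'EOF'
          HTTP/1.1 200 OK
          Content-Type: text/html
          Last-Modified: Tue, 15 Nov 2011 12:45:26 GMT
          EOF

          if grep -qi '^last-modified:' headers.txt; then
            echo "OK: page returns a Last-Modified date stamp"
          else
            echo "WARNING: no Last-Modified header; changed pages cannot be detected"
          fi
          ```

          Dynamic, database-driven pages often omit this header unless the script sets it explicitly, which is worth checking before relying on incremental updates.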

          Comment
