Crawler Saving Files for Local Indexing

  • Crawler Saving Files for Local Indexing

    Incremental Backup is a great feature useful for many sites. However, Off-Line Backup is still much faster and has many other advantages, including being less fragile. I would think that the advantages of both could be easily achieved in a future release with an option to have the crawler save a copy of the files it wants to index to a local folder.

    This would allow an initial full crawl of a site that would make the relevant files local for local indexing. Thereafter, incremental crawls would allow the off-line files to be maintained as a copy of what was on the site. The actual indexing could be to the off-line copy and be robust.
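    The workflow described above could be sketched roughly as follows. This is only an illustration, not how Zoom works internally: `SyncPlan`, `plan_sync`, and the idea of a manifest mapping each URL to a fingerprint (for example a content hash or last-modified value) are all hypothetical names and assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """What an incremental crawl would have to do to the local copy."""
    fetch: list[str] = field(default_factory=list)   # new or changed on the site
    delete: list[str] = field(default_factory=list)  # removed from the site
    keep: list[str] = field(default_factory=list)    # local copy still matches


def plan_sync(local_manifest: dict[str, str], remote_state: dict[str, str]) -> SyncPlan:
    """Compare the saved manifest against what the crawler currently sees.

    Both arguments map a URL to a fingerprint. These are assumptions made
    for this sketch, not part of any real Zoom API.
    """
    plan = SyncPlan()
    for url, fingerprint in sorted(remote_state.items()):
        if local_manifest.get(url) == fingerprint:
            plan.keep.append(url)
        else:
            # Either never saved before or the fingerprint has changed.
            plan.fetch.append(url)
    for url in sorted(local_manifest):
        if url not in remote_state:
            # The page no longer exists remotely, so the local copy is stale.
            plan.delete.append(url)
    return plan
```

    With an empty manifest every URL lands in `fetch`, which corresponds to the initial full crawl; later runs only touch what changed. The indexer would then read from the local folder once the plan has been applied.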

    It would also solve the problem of indexing multiple sites into the same index when some are local and some remote.

    Of course, this would not work for all sites, but then nothing does. It would be most useful for sites that rarely remove pages, and there are lots of those. Since Zoom already has the file locally in order to index it, this would seem to be a minor enhancement to implement and a major benefit to many users.
    -Gabe Fineman
    Washington, DC [still defranchised]

  • #2
    Incremental Backup is a great feature...
    I think you mean Incremental indexing, as Zoom is not a backup application.

    If the remote files are copied locally, then how can you know that the local files are up to date 1 month later when you re-index the files? The only way is to hit the remote site, and if you are always hitting the remote site, the big advantage of local indexing is gone.
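    For reference, the usual way to check whether a local copy is still current is an HTTP conditional request. It still hits the remote site, as noted above, but an unchanged page comes back as a bodiless 304 response instead of a full download. A minimal sketch using Python's standard library; `build_conditional_request` and `local_copy_is_current` are hypothetical helper names, and the ETag and Last-Modified values are assumed to have been saved from an earlier response.

```python
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that asks the server to reply only if the page changed.

    `etag` and `last_modified` are validator values remembered from the
    previous crawl (an assumption for this sketch).
    """
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def local_copy_is_current(status_code):
    """HTTP 304 Not Modified means the saved copy can be reused as-is."""
    return status_code == 304
```

    The request is only built here, not sent. Note that `urllib.request.urlopen` reports a 304 by raising `HTTPError` with `code == 304`, so a real crawler would need to catch that. This also does nothing for servers that send no validators, which is common for dynamically generated pages.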

    It would also not work for most dynamic pages.

    Plus you might need GBs of extra local disk space to store what might be out-of-date files. There would also be a lot of extra disk activity (at the moment we don't write most pages out to disk).

    Plus it doesn't make sense for a lot of our customers, who have stable high-speed internet connections (especially those on a LAN with an intranet).

    So I think only a few customers would get some advantage from this.
