Updating settings without re-indexing


  • Updating settings without re-indexing

    After I spider a site, is it possible to regenerate the index files without fetching the files from the server again each time?

    For example, I'm experimenting with some of the many options you have in your product. I don't want to reindex the site each time I do this. Is there a way to just regenerate the output pages based on the last spider and the new settings?


    (Let me know if that makes any sense)

    - Greg

  • #2
    The settings are held in a settings file, for example settings.php or settings.asp.

    So sometimes it is safe to edit and upload this file directly without re-indexing the site, but sometimes it isn't.

    For example, if you index with the PHP option, page pointers are 2-byte integers, but if you index with the CGI option, page pointers are 4-byte integers. If you index with the JavaScript option, integers aren't used at all. Other options such as date sorting, context results, and categories can also affect the data stored and the index files required.

    Inconsistencies between the settings file and the data in the other index files will cause strange, hard-to-diagnose behaviour.
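
To illustrate why a mismatch in pointer width matters, here is a minimal Python sketch. The byte layout, the little-endian encoding, and the `read_pointers` helper are assumptions made up for illustration; this is not Zoom's actual index format.

```python
import struct

# Hypothetical index data: three page pointers written as 2-byte
# little-endian unsigned integers (illustrative only, not Zoom's real format).
page_pointers = [1, 2, 300]
index_bytes = struct.pack("<3H", *page_pointers)


def read_pointers(data, width):
    """Decode as many whole little-endian unsigned integers of `width` bytes as fit."""
    code = {2: "H", 4: "I"}[width]
    count = len(data) // width
    return list(struct.unpack(f"<{count}{code}", data[: count * width]))


# Settings agree with how the data was written: pointers come back intact.
matching = read_pointers(index_bytes, 2)

# Settings claim 4-byte pointers, but the data holds 2-byte ones:
# neighbouring pointers are fused into one meaningless value.
mismatched = read_pointers(index_bytes, 4)

print(matching)
print(mismatched)
```

Nothing fails loudly in the mismatched case; the reader simply gets plausible-looking but wrong numbers, which is the kind of behaviour that is hard to trace back to a settings change.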

    So in general, to keep the process as simple as possible, we try to force a re-index of the site after any setting has been changed.

    -------
    David



    • #3
      Re-index vs. Regeneration

      So from what I understand, re-indexing is always safer because it keeps things in sync.

      Suppose I have a large site (1000s of pages). The spider does its work and fetches the files. These files are presumably cached somewhere on the computer? If so, it would be useful to be able to change some options and regenerate the index and settings from the cached files. If the files aren't cached, then I understand why you would need to re-spider every time.

      Unrelated question: are there any plans to track links (back and forth) between pages to help determine page rank (à la Google)? This would greatly improve accuracy, since you could show hub-and-spoke relationships.

      (So far I really like the product, I registered a few hours ago)

      - Greg



      • #4
        The spider defaults to using the Windows cache (shared with Internet Explorer and other apps using the Windows Internet API). To disable the cache, you can check the "Reload all files (do not use cache)" option in the Configuration window, under the "General" tab.

        When the cache is enabled, the spidering process should be much faster, as it is simply spidering the files from the cache. When it is disabled, it will need to re-download all the files.
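
As a rough illustration of the difference, here is a small Python sketch of a fetcher with an optional cache. The `CachingFetcher` class, the `reload_all` flag, and `fake_download` are hypothetical names invented for this example; Zoom itself relies on the Windows cache rather than anything like this in-memory dictionary.

```python
from typing import Callable, Dict


class CachingFetcher:
    """Fetch pages through a local cache unless a full reload is requested."""

    def __init__(self, download: Callable[[str], bytes], reload_all: bool = False):
        self.download = download      # hypothetical function that contacts the server
        self.reload_all = reload_all  # loosely mirrors "Reload all files (do not use cache)"
        self.cache: Dict[str, bytes] = {}
        self.downloads = 0            # how many times the server was actually hit

    def fetch(self, url: str) -> bytes:
        # Serve from the cache when allowed and available.
        if not self.reload_all and url in self.cache:
            return self.cache[url]
        self.downloads += 1
        body = self.download(url)
        self.cache[url] = body
        return body


def fake_download(url: str) -> bytes:
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    return ("<html>" + url + "</html>").encode()


cached = CachingFetcher(fake_download)
uncached = CachingFetcher(fake_download, reload_all=True)
for _ in range(3):
    cached.fetch("http://example.com/a.html")
    uncached.fetch("http://example.com/a.html")

print(cached.downloads, uncached.downloads)
```

With the cache in use, three passes over the same page cost one download; with the reload flag set, every pass goes back to the server.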

        We do not currently have plans to implement a page rank system which is dependent on links between pages. This method is generally more effective on an Internet-wide search engine than on one which only indexes several sites (which is what Zoom is mostly used for), because there is greater value in different domains/websites linking to one another than in inter-page relationships within the same site. Within one website, most pages are fairly well interlinked, so they would have a very similar rank.

        However, it is something we are considering, and something like this can be of benefit. We are also working on a lot of new features for V5.0, one of which is an option to allow you to give priority to pages higher up in the website hierarchy (those which are closer to the front page). This can give you a similar effect on a well-structured website. A variety of other new features will also keep us open to further possibilities (Zoom will be tracking where some links came from, and the text used to link to a page).
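
One way to picture a hierarchy-based priority is the Python sketch below. It is only a guess at the general idea: measuring depth by counting URL path segments, the `decay` factor, and both function names are assumptions for illustration, not how V5.0 will actually work.

```python
from urllib.parse import urlparse


def hierarchy_depth(url: str) -> int:
    """Count non-empty path segments; the front page ("/") has depth 0."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def boosted_score(base_score: float, url: str, decay: float = 0.8) -> float:
    """Scale a page's score down by `decay` for each level below the front page."""
    return base_score * decay ** hierarchy_depth(url)


pages = [
    "http://example.com/products/zoom/faq.html",
    "http://example.com/",
    "http://example.com/products/",
]

# With equal base scores, pages closer to the front page sort first.
ranked = sorted(pages, key=lambda u: boosted_score(1.0, u), reverse=True)
for url in ranked:
    print(url, round(boosted_score(1.0, url), 3))
```

Path depth is a crude proxy (a front page served as /index.html would count as depth 1 here), so a real implementation would more likely use click distance from the start page as discovered by the spider.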
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine

