Client and Remote Improvements

  • Client and Remote Improvements

    I really like this product. I think it's a great value just as it is, but there are a few things that would make it a lot more powerful, both on the client side and remote side.

    I see a lot of questions about how to skip or include files and/or folders during the indexing process. For some this is simple, as their file organization and naming conventions are well structured, which lends itself to the built-in extension-inclusion and file/folder deny word lists. But some of us inherit less-than-organized legacy sites that make filling out these lists pretty cumbersome.

    A more comprehensive approach would be to let the spider first prepare a site manifest as a pre-scan, presented as a collapsible/expandable tree with a checkbox beside each folder and file. At the top level, the entire list could be set to 'include' or 'deny' whichever checkboxes are selected, and the checkboxes could then be used very selectively to qualify the files to be indexed. New files and folders added after the initial scan could be highlighted. The nice thing about this approach is that it would really help those using the non-Pro versions to budget their file-count allotment before indexing begins. Initially, in lieu of a visual approach, the system could simply read a user-prepared concordance file.
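    To make the interim, non-visual idea concrete, here is a minimal sketch of an indexer-side filter driven by a user-prepared manifest file. The manifest format (one glob pattern per line, prefixed '+' for include or '-' for deny) is assumed purely for illustration; it is not Zoom's actual configuration syntax:

        # Sketch: filter a pre-scanned file list against a user-prepared manifest.
        # Manifest format (assumed): one glob pattern per line, prefixed with
        # "+" (include) or "-" (deny). Later lines override earlier ones.
        import fnmatch
        import os

        def load_manifest(path):
            """Parse the manifest into ordered (action, pattern) rules."""
            rules = []
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    action, pattern = line[0], line[1:].strip()
                    if action in "+-":
                        rules.append((action, pattern))
            return rules

        def is_included(relpath, rules, default="+"):
            """Last matching rule wins, so later lines can override earlier ones."""
            verdict = default
            for action, pattern in rules:
                if fnmatch.fnmatch(relpath, pattern):
                    verdict = action
            return verdict == "+"

        def select_files(root, rules):
            """Walk the site root and yield only files the manifest allows."""
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    rel = os.path.relpath(os.path.join(dirpath, name), root)
                    if is_included(rel.replace(os.sep, "/"), rules):
                        yield rel

    With this scheme, a manifest line '- legacy/*' followed by '+ legacy/reports/*.html' would deny the whole legacy tree but re-admit the reports, since the last matching rule wins.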

    On the remote side, a very powerful plug-in would be to track sessions so that a user's searches could be profiled and saved. The results of the searches would be stored in a backend db, along with options to automatically re-run the profile searches on a recurring basis, with the user receiving an e-mail notification when new references to the search criteria are added to the site. I know this sounds daunting, but it's all pretty standard stuff. This would be invaluable for dynamic PHP sites, especially forum-based software. The profile searches could be scheduled to run at off-peak times.
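    As a rough sketch of what that backend might look like, assume a SQLite store, a hypothetical search_site() hook into the site's search, and a local mail server; none of this exists in Zoom today, and all names here are made up for illustration:

        # Sketch of the suggested profile store: saved searches in SQLite,
        # re-run on a schedule, with an e-mail when new results appear.
        import sqlite3
        import smtplib
        from email.message import EmailMessage

        def init_db(conn):
            conn.executescript("""
                CREATE TABLE IF NOT EXISTS profiles (
                    id INTEGER PRIMARY KEY, email TEXT, query TEXT);
                CREATE TABLE IF NOT EXISTS seen (
                    profile_id INTEGER, url TEXT,
                    PRIMARY KEY (profile_id, url));
            """)

        def run_profiles(conn, search_site):
            """Re-run every saved query; mail only URLs not seen before."""
            for pid, email, query in conn.execute(
                    "SELECT id, email, query FROM profiles"):
                hits = set(search_site(query))       # hypothetical search hook
                seen = {u for (u,) in conn.execute(
                    "SELECT url FROM seen WHERE profile_id = ?", (pid,))}
                new = hits - seen
                if new:
                    notify(email, query, sorted(new))
                    conn.executemany(
                        "INSERT OR IGNORE INTO seen VALUES (?, ?)",
                        [(pid, u) for u in new])
                    conn.commit()

        def notify(to_addr, query, urls):
            msg = EmailMessage()
            msg["Subject"] = f"New results for '{query}'"
            msg["From"] = "search@example.com"       # placeholder address
            msg["To"] = to_addr
            msg.set_content("\n".join(urls))
            with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA
                smtp.send_message(msg)

    A cron job (or the Windows Task Scheduler) could then invoke run_profiles() at off-peak times, matching the scheduling idea above.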

  • #2
    Doing a pre-scan in order to build a visual tree of the site would be almost as intensive and time-consuming as indexing the full site.

    If the idea is to limit the indexing to a subset of the site because indexing the full site would take too long, then the concept defeats itself: you can't index your site because it is too large, but you also can't limit the indexing until you have indexed the site.

    There are also many sites with forms or calendar scripts that generate a near-infinite number of pages, so the pre-scan will never complete on these sites.
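    The trap is easy to see in miniature: a calendar page whose only link is to the "next month" hands the spider one more URL for every URL it fetches. The URL scheme below is made up for illustration; only a hard page budget stops the crawl:

        # Illustration of the crawler-trap problem: every calendar page links
        # to the next month, so the URL space is unbounded and a link-following
        # pre-scan never reaches the end of the site.
        def calendar_links(url):
            """Hypothetical link extractor for a page like /cal.php?y=2008&m=5."""
            query = url.split("?", 1)[1]
            params = dict(p.split("=") for p in query.split("&"))
            y, m = int(params["y"]), int(params["m"])
            y, m = (y + 1, 1) if m == 12 else (y, m + 1)
            return [f"/cal.php?y={y}&m={m}"]         # always one more page

        frontier = ["/cal.php?y=2008&m=1"]
        crawled = 0
        while frontier and crawled < 10:             # a hard page budget is the
            page = frontier.pop()                    # only thing that stops it
            frontier.extend(calendar_links(page))
            crawled += 1
        print(f"stopped after {crawled} pages; frontier still non-empty: {bool(frontier)}")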

    Also, building a visual tree of a site would be very RAM- and CPU-intensive once you start indexing 200,000+ pages. Details of every page need to be held in RAM to maintain the display, and several customers are already hitting the RAM limits of their PCs with hundreds of thousands of pages.
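    A rough back-of-envelope estimate shows the scale; every figure here is guessed for illustration, not measured from Zoom:

        # Rough estimate of the RAM needed just to hold a visual tree. The
        # per-node figures are assumptions: a URL and title plus the
        # bookkeeping a typical GUI tree control keeps for each node.
        PAGES = 200_000
        URL_BYTES = 100          # average URL length (assumed)
        TITLE_BYTES = 60         # average page title (assumed)
        NODE_OVERHEAD = 400      # tree-node object + GUI widget state (assumed)

        total = PAGES * (URL_BYTES + TITLE_BYTES + NODE_OVERHEAD)
        print(f"~{total / 2**20:.0f} MB just for the tree display")  # ~107 MB

    That is on the order of 100 MB before any actual index data, a large slice of RAM on many of the PCs in question.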

    Finally, many sites don't form a simple tree (except in concept). More often they are a mesh structure with many cross links, so a tree view doesn't represent the true page links on most sites. This leads to user confusion as people manipulate a tree view that is not representative of their non-hierarchical site.
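    A tiny example (with a made-up site map) shows the mismatch: expand a cross-linked site into a tree and the shared page appears once per branch, while a true cycle would make the expansion endless:

        # A site is a graph, not a tree: expand a small cross-linked site into
        # a tree view and the shared page occupies two different branches.
        links = {
            "/": ["/products", "/support"],
            "/products": ["/downloads"],
            "/support": ["/downloads"],      # cross link to the same page
            "/downloads": [],
        }

        def tree_paths(url, path=()):
            """List every tree position a page would occupy."""
            path = path + (url,)
            yield path
            for child in links[url]:
                yield from tree_paths(child, path)

        for p in tree_paths("/"):
            print(" > ".join(p))
        # /downloads is listed twice, once per branch, though it is one page.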

    On the 2nd suggestion: most forum software already has a facility to notify users of new posts. Also, Zoom has no concept of a user (and doesn't store e-mail addresses). To notify users of new pages, Zoom would need to manage user accounts (login/logout/add/delete/edit) and to remember which results each user has already seen; the storage requirements would be massive (imagine tracking every previously seen result for thousands of users). In our opinion this is a job better suited to a CMS than to a search engine.

    ----
    David
