include file list

  • include file list

    Perhaps this is a feature request. Our client has been acquired by a larger client and would like to offer a limited form of cross-site search. The search should index only product pages. Unfortunately, the various sites use different technologies and architectures to present their products. They also have varying amounts of supporting content pages.

    Although a skip list might work okay, it may limit how many product pages the indexer will traverse. Using categories to separate the product pages from the non-product pages is more applicable, but it wastes a lot of index space, since we do not want to offer non-product results, and it will most likely exceed the 65,000-page recommendation (ASP).

    It would be much easier if we could have an inclusion list that tells Zoom which pages to include in the resulting index. It wouldn't limit link following, only the amount of data in the index. Can this be done in the current build in some other way?

  • #2
    It is hard to give exact advice as we don't know the layout of the sites and exactly which URLs you want indexed and which you don't.

    The list of pages to be potentially included in an index is determined by the start point list.

    By creating or importing a list of start points into Zoom, in conjunction with the file and folder skip list, you can expand or limit the indexing to certain sets of files.

    For example, you might have two start points in your case. These start points might point to each company's product index page (assuming they have a web page with a list of their products).

    -----
    David

    • #3
      In this particular case we have a very large product count (1000-2000 per individual site, 8 sites) and multiple pages along the path to an individual product. For example:

      starting point 1: www.site1.com/catalog
      intermediate pages: category.htm, subcategory.htm, productgroup.htm
      product detail page: product.htm

      The intermediate pages are unimportant in terms of the search results, but are vital in terms of spidering.

      So, it would be great to be able to provide a *skip, but spider* list so that the contents are not included, but the spider continues to follow the links. Much like a noindex, follow directive in a robots meta tag.
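The requested behaviour can be sketched in a few lines of Python. This is only an illustration of "follow everything, index only some pages", not how Zoom works internally. The `crawl` function, the `should_index` predicate, and the in-memory `site` map (a stand-in for fetching and parsing real pages) are all hypothetical names.

```python
from collections import deque


def crawl(start, links, should_index):
    """Breadth-first crawl that follows every page but indexes only some.

    `links` maps a URL to the URLs it links to (a stand-in for fetching and
    parsing a page); `should_index` decides whether a page's content is kept.
    """
    seen = {start}
    indexed = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if should_index(url):
            indexed.append(url)      # content goes into the index
        for target in links.get(url, []):
            if target not in seen:   # links are followed either way
                seen.add(target)
                queue.append(target)
    return indexed


# Hypothetical link structure mirroring the path described above.
site = {
    "catalog": ["category.htm"],
    "category.htm": ["subcategory.htm"],
    "subcategory.htm": ["productgroup.htm"],
    "productgroup.htm": ["product.htm"],
}

product_pages = crawl("catalog", site, lambda url: url == "product.htm")
```

Here the intermediate pages are all visited, but only `product.htm` ends up in `product_pages`.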

      • #4
        One easy solution is just to wrap the HTML on the category pages in <!--ZOOMSTOP--> and <!--ZOOMRESTART--> tags.

        This will allow the links to be followed but the text to be ignored. See section 6.5 of the user's guide.
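If the category pages are generated or post-processed by a script, the wrapping could be automated. The sketch below assumes the tags in question are Zoom's `<!--ZOOMSTOP-->` and `<!--ZOOMRESTART-->` comment markers (the tag names were lost from the post above, so check them against the user's guide). The function name `wrap_body_for_follow_only` is made up for this example.

```python
import re

# Assumed marker names; verify against section 6.5 of the user's guide.
ZOOM_STOP = "<!--ZOOMSTOP-->"
ZOOM_RESTART = "<!--ZOOMRESTART-->"

_BODY_RE = re.compile(
    r"(<body\b[^>]*>)(.*?)(</body\s*>)", re.IGNORECASE | re.DOTALL
)


def wrap_body_for_follow_only(html: str) -> str:
    """Wrap the contents of <body> in the stop/restart markers.

    The links stay in the markup, so a spider can still follow them, while
    the text between the markers is meant to be left out of the index.
    Pages without a <body> element, or already wrapped, are returned as-is.
    """
    def _wrap(match):
        open_tag, content, close_tag = match.groups()
        if ZOOM_STOP in content:
            return match.group(0)
        return f"{open_tag}\n{ZOOM_STOP}\n{content}\n{ZOOM_RESTART}\n{close_tag}"

    return _BODY_RE.sub(_wrap, html, count=1)
```

A regular expression is enough for a sketch like this; for messy real-world markup a proper HTML parser, or simply editing the page template by hand, would be safer.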

        Another option that could maybe be made to work would be to use the 'Follow links only' flag for the start points. But this only works one level deep: the top-level text is ignored, but the next level is indexed. Because you have multiple intermediate levels, you would need a lot of start points if you were going to use this option.

        A third possibility might be to create a whole new page on the site(s). This new, private page, called, for example, CompleteProductList.php, would list all products without any category hierarchy. You could then use this as the Zoom start point, maybe in conjunction with one of the methods above.
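As a rough illustration of the flat product list idea, here is a minimal Python sketch that builds a static HTML page instead of the PHP page suggested above. The function name `build_product_list_page` and the `(url, name)` input format are assumptions; how the product URLs are collected (database query, CMS export, and so on) depends on each site.

```python
from html import escape


def build_product_list_page(products, title="Complete Product List"):
    """Return a flat HTML page linking to every product detail page.

    `products` is an iterable of (url, name) pairs, with no category
    hierarchy: one link per product.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(name)}</a></li>'
        for url, name in products
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )
```

The generated page could be saved somewhere the spider can reach and then used as the start point for that site.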

        ----
        David
