Incremental Indexing

  • Incremental Indexing

    Hi, I am new to this forum. This question may already have been answered, but I was not able to find it.

    I have implemented Zoom Search on my news portal. New items are posted to the site daily and older news is archived. To see archived news, the user has to select a date and then open the news for that date. How can I make Zoom Search search the archived news, given that there is no direct link to it?

    Your urgent help will be appreciated.

    Thanks

  • #2
    You titled the post Incremental Indexing, but I don't think your main problem relates to Incremental Indexing.

    If you have content that is hidden behind a form, then no search function will be able to find it (including Google).

    There are several solutions. Some of these are:
    - Use offline mode instead of spider mode (links are not required in this case).
    - Create links. This could be via an automatic directory listing or a hidden HTML page.
    - Add a list of pages to Zoom as new start points (see chapter 2.1.3 in the Users Guide).


    • #3
      Search Indexing for inaccessible links

      Thanks for your response.
      We recently bought the Enterprise license of Zoom Search.

      To help find the most suitable solution, let me give you some insight into how our news portal works.

      One edition is published on the portal each day. When users access the site, they are redirected to the latest edition (the Edition ID is passed in the query string).

      Let's suppose the home page URL for the last three days was:
      1st Day : http://www.mynewsportal.com/Default.aspx?EditionID=1001
      2nd Day : http://www.mynewsportal.com/Default.aspx?EditionID=1002
      3rd Day : http://www.mynewsportal.com/Default.aspx?EditionID=1003

      What I want is that when we index the site, the indexer appends the results to the previous day's results. That way, if we index the site on day 1, then on day 2, and then on day 3, the index will have results for all editions.

      Thanks
      Pankaj
      Last edited by pankaj121; Oct-26-2010, 07:00 AM.


      • #4
        Are you saying that you lose the links to the previous editions, so they become inaccessible to the end user? (In which case, why would you want search results to point to them if you don't want them to be accessible?)

        Possible solution #1:
        - Use incremental indexing to add each new edition to the existing index. You can do this manually by starting Zoom and entering the new edition URL (via "Indexing"->"Incremental indexing"->"Add start points to existing index" or "Add new pages..."). Please see the Users Guide for more information.

        Alternatively, you can script this to run at the same time as you update your main page's redirection. Note that Zoom has command-line parameters you can call:
        http://www.wrensoft.com/zoom/support...mmandline.html

        Possible solution #2:
        - Set up Zoom to always index a particular hidden start page (e.g. http://www.mynewsportal.com/spiderstart.asp).
        - This page will contain an up-to-date list of links to each of the editions you want to index.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
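
As a rough illustration of possible solution #2 above, the hidden start page could be regenerated by a small script each time a new edition is published. This is only a sketch: the function names, the page markup, and the list of edition IDs are assumptions made for illustration, and how the page is written to disk and served (for example at the spiderstart.asp URL mentioned above) is left to the site.

```python
from html import escape


def edition_url(base_url, edition_id):
    """Build the URL of one edition from the portal's query-string scheme."""
    return f"{base_url}?EditionID={edition_id}"


def build_start_page(base_url, edition_ids):
    """Return an HTML page containing one plain link per edition.

    A spider that starts at this page can reach every listed edition,
    even though the public site only links to the latest one.
    """
    items = []
    for edition_id in edition_ids:
        href = escape(edition_url(base_url, edition_id), quote=True)
        items.append(f'    <li><a href="{href}">Edition {edition_id}</a></li>')
    body = "\n".join(items)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head><title>Spider start page</title></head>\n"
        "<body>\n  <ul>\n"
        f"{body}\n"
        "  </ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical edition IDs taken from the example URLs in this thread.
    print(build_start_page("http://www.mynewsportal.com/Default.aspx",
                           [1001, 1002, 1003]))
```

Because the page is plain HTML with ordinary links, it does not depend on JavaScript or on the date-selection form that users go through.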


        • #5
          I am using the incremental indexing option from the command line as follows:
          ZoomIndexer.exe -s zoom.zcfg -update

          This opens the Zoom Search Engine Indexer GUI, but the interface closes within a second or two. None of the files in the output directory are modified. Please advise if I am doing something wrong.


          • #6
            Please note there is more than one incremental indexing feature.

            You are using the "Incremental update" feature, which is different to what I was talking about ("Incremental Add Start Points.." or "Incremental Add Pages"). In other words, you should be using either the -addpage or the -addstartpt command-line option, as given on the command-line parameters page linked in my previous post.

            "Update" only asks the Indexer to check whether any of the existing pages in the index have been modified and, if so, re-indexes them. This won't work if your existing pages don't change and you just have a new page added somewhere (which, as you said, nothing links to).
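
The difference described above can be pictured with a toy model. This is not how Zoom is implemented; the index is modeled as a plain dictionary from URL to page content, and the function names are made up for illustration. It only shows why an update pass that re-checks known URLs never discovers a new, unlinked page.

```python
def incremental_update(index, site):
    """Re-check only URLs already in the index and re-index changed ones.

    Returns the list of URLs that were re-indexed.
    """
    reindexed = []
    for url in list(index):
        current = site.get(url)
        if current is not None and current != index[url]:
            index[url] = current
            reindexed.append(url)
    return reindexed


def add_page(index, site, url):
    """Explicitly add one URL to the index, whether or not anything links to it."""
    index[url] = site[url]


if __name__ == "__main__":
    day1 = "http://www.mynewsportal.com/Default.aspx?EditionID=1001"
    day2 = "http://www.mynewsportal.com/Default.aspx?EditionID=1002"

    site = {day1: "day 1 news"}
    index = dict(site)          # index built on day 1

    site[day2] = "day 2 news"   # new edition published, nothing links to it

    print(incremental_update(index, site))  # nothing changed, nothing found
    print(day2 in index)

    add_page(index, site, day2)             # explicit add picks it up
    print(day2 in index)
```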

            Please see chapter 2.10 in the Users Guide for more information.

            I noted you have modified the post I previously responded to, and there is a mistaken premise in your requirement:

            Originally posted by pankaj121
            One edition is published on the portal each day. When users access the site, they are redirected to the latest edition (the Edition ID is passed in the query string).

            Let's suppose the home page URL for the last three days was:
            1st Day : http://www.mynewsportal.com/Default.aspx?EditionID=1001
            2nd Day : http://www.mynewsportal.com/Default.aspx?EditionID=1002
            3rd Day : http://www.mynewsportal.com/Default.aspx?EditionID=1003

            What I want is that when we index the site, the indexer appends the results to the previous day's results. That way, if we index the site on day 1, then on day 2, and then on day 3, the index will have results for all editions.

            1st problem: You didn't mention how you are doing the redirecting. If you are using JavaScript to redirect, then no spider will be able to follow this.

            2nd problem: Even if you are using HTTP redirection (which spiders can follow), the problem is that once a spider has visited, say, http://www.mynewsportal.com/Default.aspx (and been redirected to one of the above editions), it considers the main page to have been visited and won't revisit it. The "content" of that page would not have changed.

            So my original suggestion to use incremental add page (and explicitly specify the new page to index) would still be the better way to approach this problem.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
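
A script that adds each new edition explicitly, as suggested in this post, might assemble the Indexer command like the sketch below. Only `ZoomIndexer.exe`, `-s zoom.zcfg`, and the option names `-addpage` and `-addstartpt` come from this thread. The assumption that the URL directly follows the option, and the helper name `build_add_command`, are illustrative guesses, so the exact syntax should be checked against the command-line parameters page before use.

```python
ALLOWED_FLAGS = ("-addpage", "-addstartpt")


def build_add_command(indexer_path, config_path, mode_flag, url):
    """Return the argument list for an incremental 'add' run of the Indexer.

    ASSUMPTION: the URL is passed directly after the -addpage/-addstartpt
    option. The thread names the options but does not show their syntax.
    """
    if mode_flag not in ALLOWED_FLAGS:
        raise ValueError(f"unsupported option: {mode_flag}")
    return [indexer_path, "-s", config_path, mode_flag, url]


if __name__ == "__main__":
    new_edition = "http://www.mynewsportal.com/Default.aspx?EditionID=1003"
    command = build_add_command("ZoomIndexer.exe", "zoom.zcfg",
                                "-addstartpt", new_edition)
    # Print rather than execute; once the syntax is confirmed, the same
    # list could be handed to subprocess.run(command, check=True).
    print(" ".join(command))
```

Passing the new edition URL explicitly avoids relying on the spider to rediscover it through the redirecting main page.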
