Is Zoom for me?

  • Is Zoom for me?

    I have multiple clients that use various public searches (e.g. Google, FreeFind), but I have one client who wants to add a search to their loginID/password-protected site for their clients.

    Since the site requires a login, we use a session cookie to track users. When logged in, they are assigned a session variable. Every file has an if/else check to see whether that variable is present, and if there has been no activity for 30 minutes, the user is forced to log in again. All the files run this check before drilling down to the content.
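
    For illustration, the per-page check is roughly the following, sketched here in classic ASP (our Terascript pages do the equivalent; the variable and page names are hypothetical):

        <%
        ' Hypothetical per-page login check - a sketch, not the actual code.
        ' Session.Timeout = 30 (set in IIS or global.asa) expires the session
        ' after 30 minutes of inactivity, forcing the user to log in again.
        If Session("LoginID") = "" Then
            Response.Redirect "login.asp"  ' hypothetical login page
        End If
        %>
        <!-- protected page content follows -->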

    I presume I would need to crawl these in offline mode. Does the crawl still follow links? Also, does the crawl index the content of the whole file? The files are XML-based (using Terascript on the server, .taf files). We do have some ASP for file uploads, and I would be using that or ASPX for the actual searches. We also have a couple of hundred PDF files that are linked and would need to be included. It is not a large site, but probably several hundred files altogether.

    There are a couple of instances where I would like to follow a post argument (?_usetype=vendor), but I presume this would not work in offline mode? There would be just a couple of instances of this - can I specify an exact URL for something like this?

    I am looking at purchasing the Enterprise version and would be looking for a bit of support to get this up and running for this particular client. Does this all sound possible? The site in question is running on a Win2008 server with IIS 7. I would also be looking at adding this to a couple of Win2008 R2 servers with IIS 7.5.

  • #2
    If you index in offline mode, then links are not followed; the indexer instead uses the files found in the file system.
    You should be able to index in spider mode, however, and follow the links.

    Yes, the entire contents of all the popular file types are indexed (Word, PDF, HTML, dynamically scripted pages, etc.).

    To index dynamic scripted pages, you need to be in spider mode.

    I don't know anything about Terascript, but I assume it is like any other scripted language that runs on a server, so it should be OK.

    See also this FAQ.



    • #3
      I downloaded the demo and crawled via the spider. I set a cookie via IE and then started a crawl, and it seems to work well. We get the maximum 50 pages indexed. A couple of questions, if you don't mind.

      I am passing a couple of arguments to start the crawl of the home page. Unfortunately, these arguments are being indexed, and I would like to remove that particular page to keep users from seeing sensitive information. I found a way of removing the page, but is there a way to start from that page and automatically remove it after a crawl? I am thinking of scheduling crawls and would want that page removed each time. A way of passing hidden arguments at the start of the crawl would be a nice feature.

      Also, when moving the index to another physical path, the program seems to break. I keep getting an error that the index file cannot be found, even though I change the path in the configuration. It seems I cannot move the index at all; once moved, moving it back to the original path still generates the error. The only way I could clear the error was to completely uninstall the program, delete any traces I could find, and then reinstall it. Is this a bug, perhaps?

      I really appreciate your time!



      • #4
        There are a few options for keeping the page out of the index.

        1) In the indexer, in the start options window, click the 'more' button, then edit the start point and select the spider options for that start point, e.g. "Follow links only".

        2) Place ZOOMSTOP tags around the text on the start page. No text between them will be indexed, so the page won't appear in search results (a short example follows this list). See:
        http://www.wrensoft.com/zoom/support....html#zoomstop

        3) Use a NOINDEX meta tag on the page:
        <meta name="robots" content="noindex">

        4) Manually remove the page from the index. This is done from the "Index" menu in the indexer.
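
        For option 2, the markers are HTML comments wrapped around the text to exclude; a minimal sketch (the FAQ linked above has the details):

        <!--ZOOMSTOP-->
        ... start-page text that should not appear in the index ...
        <!--ZOOMRESTART-->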


        There should be no problems moving the set of index files to another folder. Are you talking about the set of index files on your web server, or the set on your local machine? (These might be one and the same if you are indexing on the server itself.) What was the exact error message, and what operation were you doing when it appeared?
