Is Zoom for me?

  • Is Zoom for me?

    I have multiple clients that use various public searches (e.g. Google, FreeFind), but I have one client who wants to add a search to their loginID/password-protected site for their clients.

    Since the site requires a login, we use a session cookie to track users. When logged in, they are assigned a session variable. Every file has an if/else check to see whether that variable is present, and if there has been no activity for 30 minutes, the user is forced to log in again. All the files run this check before drilling down to the content.
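
    For illustration, the per-page check is roughly the following, sketched here in classic ASP (our Terascript pages do the equivalent; the variable and page names are hypothetical):

        <%
        ' Hypothetical per-page login check - a sketch, not the actual code.
        ' Session.Timeout = 30 (set in IIS or global.asa) expires the session
        ' after 30 minutes of inactivity, forcing the user to log in again.
        If Session("LoginID") = "" Then
            Response.Redirect "login.asp"  ' hypothetical login page
        End If
        %>
        <!-- protected page content follows -->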

    I presume I would need to crawl these in offline mode. Does the crawl still follow links? Also, does the crawl index the content of the whole file? The files are XML-based (using Terascript on the server, .taf files). We do have some ASP for file uploads, and I would be using that or ASPX for the actual searches. We also have a couple of hundred PDF files that are linked and would need to be included. It is not a large site, but probably several hundred files altogether.

    There are a couple of instances where I would like to follow a post argument (?_usetype=vendor), but I presume this would not work in offline mode? There would be just a couple of instances of this - can I specify an exact URL for something like this?

    I am looking at purchasing the Enterprise version and would be looking for a bit of support to get this up and running for this particular client. Does this all sound possible? The site in question is running on a Win2008 server with IIS 7. I would also be looking at adding this to a couple of Win2008 R2 servers with IIS 7.5.

  • #2
    If you index in offline mode, then links are not followed; the indexer instead uses the files found in the file system.
    You should be able to index in spider mode, however, and follow the links.

    Yes, the entire contents of all the popular file types are indexed (Word, PDF, HTML, dynamically scripted pages, etc.).

    To index dynamic scripted pages, you need to be in spider mode.

    I don't know anything about Terascript, but I assume it is like any other scripted language that runs on a server, so it should be OK.

    See also this FAQ.



    • #3
      I downloaded the demo and crawled via the spider. I set a cookie via IE and then started a crawl, and it seems to work well. We get the maximum 50 pages indexed. A couple of questions, if you don't mind.

      I am passing a couple of arguments to start the crawl of the home page. Unfortunately, these arguments are being indexed, and I would like to remove that particular page to keep users from seeing sensitive information. I found a way of removing the page, but is there a way to start from that page and automatically remove it after a crawl? I am thinking of scheduling crawls and would want that page removed each time. A way of passing hidden arguments at the start of the crawl would be a nice feature.

      Also, when moving the index to another physical path, the program seems to break. I keep getting an error that the index file cannot be found, even though I change the path in the configuration. It seems I cannot move the index at all; once moved, moving it back to the original path still generates the error. The only way I could clear the error was to completely uninstall the program, delete any traces I could find, and then reinstall it. Is this a bug, perhaps?

      I really appreciate your time!



      • #4
        There are a few options for keeping the page out of the index.

        1) In the indexer, in the start options window, click the 'more' button, then edit the start point and select the spider options for that start point, e.g. "Follow links only".

        2) Place ZOOMSTOP tags around the text on the start page. No text between them will be indexed, so the page won't appear in search results (a short example follows this list). See:
        http://www.wrensoft.com/zoom/support....html#zoomstop

        3) Use a NOINDEX meta tag on the page:
        <meta name="robots" content="noindex">

        4) Manually remove the page from the index. This is done from the "Index" menu in the indexer.
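
        For option 2, the markers are HTML comments wrapped around the text to exclude; a minimal sketch (the FAQ linked above has the details):

        <!--ZOOMSTOP-->
        ... start-page text that should not appear in the index ...
        <!--ZOOMRESTART-->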


        There should be no problems moving the set of index files to another folder. Are you talking about the set of index files on your web server, or the set on your local machine? (These might be one and the same if you are indexing on the server itself.) What was the exact error message, and what operation were you doing when it appeared?
