Announcement

**David** · Mar-05-2006, 10:35 PM

We have received your config file, thanks. Unfortunately, because we don't have access to any of your files and a large volume of data is involved, it is not going to be easy to reproduce the problem.

From your post it appears as if the crash happened immediately after the line, "Cleaning up memory used for index data", appeared in the log file. Can you confirm this?

If this was the case, the index files are probably still intact and complete and can be used.

Needless to say, we still need to investigate and fix the bug.

Could you also send us a couple of example HTML files?

I don't think you are going to be able to index 5,000,000 files in a single indexing session on your current hardware (nor with our current software). 200,000 is certainly possible, but 500,000 is probably about the limit given the RAM you have.

Can you split the data and have several databases of 250,000 - 500,000 documents each?
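As a rough illustration of the split suggested above, here is a minimal sketch that works out how a collection could be divided into batches of at most 500,000 documents. The function name `batch_ranges` and its parameters are hypothetical and not part of Zoom; each range would correspond to one separate indexing session and database.

```python
def batch_ranges(total_docs, max_per_batch=500_000):
    """Return (start, end) index pairs, end exclusive, covering total_docs
    documents with at most max_per_batch documents in each pair."""
    if total_docs < 0 or max_per_batch <= 0:
        raise ValueError("total_docs must be >= 0 and max_per_batch must be > 0")
    return [
        (start, min(start + max_per_batch, total_docs))
        for start in range(0, total_docs, max_per_batch)
    ]


# 5,000,000 files at the assumed limit of 500,000 per database
ranges = batch_ranges(5_000_000)
print(len(ranges))             # 10
print(ranges[0], ranges[-1])   # (0, 500000) (4500000, 5000000)
```

With the smaller figure of 250,000 per database, the same collection would need 20 databases.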

----
David

**jimmyd** · Mar-07-2006, 06:05 PM

> From your post it appears as if the crash happened immediately after the line, "Cleaning up memory used for index data", appeared in the log file. Can you confirm this?

Yes, I confirm.

> If this was the case, the index files are probably still intact and complete and can be used.

No, I can't use them. The index is corrupted (I'll send it to you via giga email).

> Could you also send us a couple of example HTML files.

Yes.

> I don't think you are going to be able to index 5,000,000 files in a single indexing session on your current hardware (nor with our current software). 200,000 is certainly possible. But 500,000 is probably about the limit given the RAM you have.

Can your software use disk (a temp file) instead of RAM? Is there a workaround or tip that you know of (virtual RAM, disk cache, etc.)?

thanks

**David** · Mar-07-2006, 08:14 PM

We haven't received any email or files yet.

We did a few tests yesterday with sets of 150,000+ files but couldn't get a crash. So the problem must only occur sometimes, depending on the content being indexed and/or the configuration options.

So if you could send us your Zoom configuration file as well, this might also help us match your scenario.

Would you be willing to send us a copy of your HTML files if we needed them to reproduce the problem? (all 125,000 of them). We would pay for the cost of FedEx'ing a DVD-ROM if they were too large to put on the web.

> Can your software use disk (tmp file) instead of RAM?

As physical RAM gets low, Zoom automatically starts to use space from the Windows swap file. Windows also automatically reduces the size of the disk cache to free up memory. This works well for a while. As time goes on, more and more pieces of the index are written out to the disk. But these pieces are not a complete index, and from time to time they need to be added to as more pages are indexed.

To add a few bytes to data in the swap file, the data needs to be read from disk into RAM, manipulated and then put back onto the disk at a later point. This is an extremely slow operation. Eventually the indexer spends all its time reading from and writing to the disk and doing no real work. (Thrashing is the technical term.)

Virtual RAM is always used by Windows and all Windows applications to 'virtualise' the address space. Each application always gets 2GB of usable virtual address space, even if you only have 512MB of RAM installed. It is this concept of Virtual RAM that limits all 32bit Windows applications to using at most 2GB of RAM. So this is another problem in addition to the disk thrashing.
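To make the arithmetic behind the 2GB figure concrete, here is a small sketch. It assumes the default 32bit Windows layout, in which the 4GB that a 32bit pointer can address is split evenly between the application and the operating system. The variable names are illustrative only.

```python
GIB = 2**30  # bytes in one gigabyte (binary)
MIB = 2**20  # bytes in one megabyte (binary)

# A 32bit pointer can distinguish 2**32 byte addresses.
TOTAL_32BIT = 2**32

# Assumption: default split, half for the application, half for the OS.
USER_SPACE = TOTAL_32BIT // 2

# The installed RAM from the example above.
INSTALLED_RAM = 512 * MIB

print(TOTAL_32BIT // GIB)           # 4
print(USER_SPACE // GIB)            # 2
print(USER_SPACE // INSTALLED_RAM)  # 4
```

The last line shows that the address space an application sees can be four times the physical RAM in this example, which is why the swap file comes into play long before the 2GB ceiling is reached.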

So that was the bad news. The good news is,

1) Windows 64bit will remove the Virtual RAM limit and we are willing to do a 64bit version of Zoom when someone needs it.

2) We have some other ideas about storing index data in a different format so that more of it can be written to disk without there being a risk of needing to swap it between the disk and RAM too often.
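For readers curious what a more disk-friendly format could look like, below is a minimal sketch of one generic technique: writing small sorted "runs" that are never modified once on disk, then merging them in a single streaming pass at the end. This is only an illustration of the general idea and is not necessarily what is planned for Zoom. The file layout, the function names and the sample data are all assumptions.

```python
import heapq
import itertools
import os
import tempfile


def write_run(postings, directory, run_id):
    """Sort one batch of (word, doc_id) pairs and write it as a new file.
    The file is never reopened for modification afterwards."""
    path = os.path.join(directory, f"run{run_id}.txt")
    with open(path, "w", encoding="utf-8") as f:
        for word, doc_id in sorted(postings):
            f.write(f"{word}\t{doc_id}\n")
    return path


def read_run(path):
    """Stream (word, doc_id) pairs from a run file, one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, doc_id = line.rstrip("\n").split("\t")
            yield word, int(doc_id)


def merge_runs(paths):
    """Merge sorted runs sequentially and yield (word, [doc_ids]) groups.
    Only the current line of each run is held in memory at once."""
    merged = heapq.merge(*(read_run(p) for p in paths))
    for word, group in itertools.groupby(merged, key=lambda pair: pair[0]):
        yield word, [doc_id for _, doc_id in group]


# Hypothetical sample data: two batches of pages indexed at different times.
with tempfile.TemporaryDirectory() as tmp:
    run_a = write_run([("zoom", 1), ("index", 1), ("crash", 2)], tmp, 0)
    run_b = write_run([("index", 3), ("zoom", 4)], tmp, 1)
    result = list(merge_runs([run_a, run_b]))

print(result)
# [('crash', [2]), ('index', [1, 3]), ('zoom', [1, 4])]
```

The design point is that new data is only ever appended as a fresh run, so nothing already on disk has to be read back, changed and rewritten while indexing is in progress. The cost is one sequential merge pass at the end.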

But let's fix the crash bug first, then worry about the RAM issue.

----
David

Error: access violation 4.2 build 1010
