Announcement

**David** · May-12-2021, 01:37 AM

Yes, you were a couple of years behind on the patches, so there is some chance that updating might help. It's hard to be sure, as we don't know the cause of the crash.
Exception code 0xc0000005 means "Access Violation". It is the most common type of software error and means that the software tried to read from or write to a memory address that it shouldn't have. Typically this is due to a software bug, but a hardware fault can also cause the same error.
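For anyone who wants to decode such codes themselves: 0xc0000005 is a Windows NTSTATUS value, where the top two bits give the severity, bits 16 to 27 the facility and the low 16 bits the code. A minimal Python sketch follows; the lookup table only lists a few common crash codes from the Windows SDK header ntstatus.h and is not exhaustive.

```python
# A handful of NTSTATUS values often seen in Windows crash reports (from ntstatus.h).
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}

SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(code: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY[(code >> 30) & 0x3],  # bits 30-31
        "customer": bool((code >> 29) & 0x1),       # bit 29: set for non-Microsoft codes
        "facility": (code >> 16) & 0xFFF,           # bits 16-27
        "code": code & 0xFFFF,                      # bits 0-15
        "name": KNOWN_CODES.get(code, "unknown"),
    }


if __name__ == "__main__":
    # Exception codes in the Windows event log are shown in hex, e.g. "0xc0000005".
    print(decode_ntstatus(int("0xc0000005", 16)))
```

For 0xc0000005 this reports severity "error", facility 0, code 5 and the name STATUS_ACCESS_VIOLATION. It only tells you what kind of failure happened, not where; locating the faulting code still needs a crash dump.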

If you can get a crash dump, that can sometimes help.
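One standard way to capture a dump on Windows (not specific to Zoom) is the Windows Error Reporting LocalDumps registry key, which makes Windows write a dump file whenever the named program crashes. The sketch below only generates a .reg file to import with regedit. The executable name ZoomIndexer.exe and the folder C:\CrashDumps are assumptions, so check the real process name in Task Manager first. DumpType 2 requests a full dump; 1 requests a smaller minidump.

```python
def local_dumps_reg(exe_name: str, dump_folder: str,
                    dump_count: int = 5, dump_type: int = 2) -> str:
    """Return the text of a .reg file that enables WER LocalDumps for one executable.

    dump_type: 1 = minidump, 2 = full dump (larger, but more useful for debugging).
    """
    # DumpFolder is a REG_EXPAND_SZ value; .reg files spell those as hex(2):
    # comma-separated UTF-16LE bytes, including the terminating NUL character.
    folder_hex = ",".join(f"{b:02x}" for b in (dump_folder + "\0").encode("utf-16-le"))
    key = ("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\"
           "Windows Error Reporting\\LocalDumps\\" + exe_name)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"DumpFolder"=hex(2):{folder_hex}',
        f'"DumpCount"=dword:{dump_count:08x}',
        f'"DumpType"=dword:{dump_type:08x}',
        "",
    ]
    return "\r\n".join(lines)


def save_reg(text: str, path: str) -> None:
    # "utf-16" writes a BOM plus little-endian text on x86/x64, matching regedit's own exports;
    # newline="" keeps the explicit CRLF line endings unchanged.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)
```

Usage: `save_reg(local_dumps_reg("ZoomIndexer.exe", r"C:\CrashDumps"), "enable_dumps.reg")`, then import the file as an administrator. Full dumps can be large, so it's worth removing the key again once a crash has been captured.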

Turning on logging in Zoom (from the Logging options window) can also help, especially if the crash happens each time on the same web page or document. It will slow down indexing a bit, though.

There is also a lot of hardware out there that just isn't stable enough to run anything under heavy load for a week.

We have this page in the FAQ for indexing large sites. Some of the tips might help.
Incremental indexing in particular might help for your site, especially if the indexing job can be broken up into discrete parts (e.g. different domains).
A lot of the indexing speed is dependent on the speed of the web site you are indexing. If each web page takes 10 seconds to generate, then indexing can be very slow. If your web site can serve 100 pages/second, then indexing can be much quicker.
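As an illustration of breaking a job into discrete parts, the Python sketch below groups a list of start URLs by host name, and optionally by top-level folder, so each group can become its own smaller indexing job. Where the URL list comes from and how each group is turned into a separate Zoom configuration are assumptions left open here; the grouping itself is not a Zoom feature.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_jobs(urls, by_section=False):
    """Group URLs into separate indexing jobs keyed by host (and optionally first folder)."""
    jobs = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        key = parts.hostname or ""  # urlsplit lowercases the host name
        if by_section:
            # "/docs/page1" -> "docs"; the site root stays under the bare host key
            first = parts.path.strip("/").split("/")[0]
            key = f"{key}/{first}" if first else key
        jobs[key].append(url)
    return dict(jobs)


if __name__ == "__main__":
    urls = [
        "https://www.example.com/docs/a",
        "https://www.example.com/blog/b",
        "https://shop.example.com/item/1",
    ]
    for job, members in split_jobs(urls, by_section=True).items():
        print(job, len(members))
```

Grouping by host is the safest split, because links rarely need to be followed across those boundaries; splitting one large host by folder works best when sections don't link heavily to each other.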
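To put rough numbers on that, the sketch below estimates crawl time from page count and per-page response time. It ignores the indexer's own processing time and assumes the site answers concurrent requests as fast as single ones, so treat the results as a lower bound. The 500,000-page figure is only an example in the range of "hundreds of thousands of pages" mentioned later in the thread.

```python
def crawl_time_hours(pages: int, seconds_per_page: float, concurrent_requests: int = 1) -> float:
    """Rough lower bound on crawl time in hours.

    Assumes the server handles `concurrent_requests` requests at once without slowing
    down, and ignores the indexer's own processing time.
    """
    if pages < 0 or seconds_per_page < 0 or concurrent_requests < 1:
        raise ValueError("pages and seconds_per_page must be >= 0, concurrent_requests >= 1")
    return pages * seconds_per_page / concurrent_requests / 3600


if __name__ == "__main__":
    pages = 500_000  # example size only
    slow = crawl_time_hours(pages, 10.0)  # 10 seconds per page
    fast = crawl_time_hours(pages, 0.01)  # 100 pages per second
    print(f"10 s/page:     {slow:,.0f} hours (~{slow / 24:.0f} days)")
    print(f"100 pages/sec: {fast:.1f} hours")
```

For 500,000 pages the arithmetic is 5,000,000 seconds (about 1,389 hours, or roughly 58 days) at 10 seconds per page, versus 5,000 seconds (about 1.4 hours) at 100 pages per second.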

Is all the data already in SQL in a nice structured format? Maybe you don't really need an unstructured text search engine in that case?

**BluejacketSoftware** · May-12-2021, 06:36 PM

Hi David. Years behind? I just purchased and downloaded it last week.

I will attempt to get a crash dump for you should it fail again.

The data is in SQL Server, but pages have to be assembled from blocks, which is all done in software, so I don't think there's a path to connect directly to the database (if that's where you were going). As it stands, I have a search feature built into the site, but unfortunately it relies on the SQL Server Full-Text Search feature, and since there are hundreds of thousands of pages broken down into blocks (i.e. paragraphs, headings, etc.), the search is slow (the database is almost 3/4 TB).

I specifically purchased Zoom to give me a fast, type-ahead search function to replace the slow, labored version. I don't necessarily need it to be 'unstructured', but the features of Zoom far exceed what we could build in-house in a reasonable amount of time.

I will read the FAQ for indexing large sites and see if there's anything in there that I can implement.

We're a C# development house, so we're comfortable getting our hands dirty with software, but this is clearly not written in a managed (i.e. .NET) language, so other than digging through logs and Event Viewer, there's not much we can deconstruct on our end.

I have turned on logging at the diagnostic level so hopefully that will give some insight.

Unfortunately, I just got word that the web server is not responding at the moment. I'm not sure yet whether it's related to the indexer. Going in to troubleshoot now. If it turns out that the indexer caused the web server to stop responding, I'll let you know and try to replicate the problem with enough diagnostic information for you to get to the root of it.

I already looked at the server briefly and noticed that the indexer was running but apparently stuck. I performed an IIS reset, verified that all the app pools were indeed running, and then paused/resumed the indexer. That seemed to re-awaken the indexer, and it started processing again for a while, but the other sites still appear to be unavailable. It might require a server reboot to bring everything back online. The only thing that's different from normal operations is that the indexer was restarted yesterday afternoon. I'm not prepared to blame Zoom yet, though; there's not enough info, and it could be many other things. I'll let you know if I find a root cause.

Scott

**BluejacketSoftware** · May-12-2021, 08:25 PM

Reading the FAQ, I have some questions:

Split indexing process over multiple machines

If it makes sense, split the source files into categories and perform indexing on smaller portions of the data using separate machines, thus greatly reducing the amount of time required to index the complete data set. Wrensoft provides a free software tool known as Zoom MasterNode that can be used as a front-end to these distributed index files so that they can be collectively searched. MasterNode works by taking any search request and transparently dividing the work amongst its slave node machines (where the various actual indexes are stored), which can result in better search performance and greater search capability.
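A minimal sketch of the general scatter-gather pattern described there, not MasterNode's actual implementation: the same query goes to every node in parallel, and the partial results are merged into one ranked list. Each node is modelled as a plain Python function standing in for a network call to a search server, which is an assumption for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Hit = Tuple[float, str]            # (relevance score, URL)
Node = Callable[[str], List[Hit]]  # stands in for a network call to one search node


def federated_search(query: str, nodes: List[Node], top_k: int = 10) -> List[Hit]:
    """Scatter a query to every node in parallel, then gather the best hits."""
    if not nodes:
        return []
    # Scatter: each node searches only its own slice of the data.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partial_results = list(pool.map(lambda node: node(query), nodes))
    # Gather: merge the partial lists and keep the top_k highest scores overall.
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(top_k, merged, key=lambda hit: hit[0])


if __name__ == "__main__":
    node_a = lambda q: [(0.9, "https://a.example/1"), (0.2, "https://a.example/2")]
    node_b = lambda q: [(0.5, "https://b.example/1")]
    print(federated_search("widgets", [node_a, node_b], top_k=2))
```

One caveat with this design: relevance scores from separately built indexes are not always directly comparable, so a real front-end may need to normalise them before merging.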

Question: Doesn't this require a higher license level to run the indexer on multiple machines?

And I finished the web server troubleshooting. It turns out it was an infrastructure problem: my Hyper-V cluster had a hiccup that caused all kinds of trouble. Everything is back online now, and I'm going to resume the index.

**David** · May-13-2021, 12:46 AM

Once you get to the point of dealing with terabytes of data, everything starts to get exponentially harder and slower. Small mistakes, bugs or experiments take days to sort out. Dealing with incremental updates can also be problematic and requires good planning.

Some other ideas.

1) Set up a development / staging environment so you aren't experimenting on the live system. Potentially you can even build the search index on the staging machine, then just move the index to the live machine (without ever indexing the live environment). You could build the dev environment on a single machine, i.e. SQL, web server and indexer all on the same box with a fast M.2 SSD. With no network latency and no background load, the indexing might be 5x faster (just a wild guess).

2) Write a script that pre-generates all your web pages into very simple HTML files (no need for CSS, JS, graphics or any significant formatting). Just include the headings and text. This could be done in any programming language that can call your SQL database. Then, once all the HTML files are made, run the Zoom indexer in offline mode on those simple HTML files. Rewrite the URLs and job done! This has a bunch of advantages: zero network latency, no load on your DB during indexing, much, much faster indexing and easier incremental changes. With offline indexing it can be 10x faster.
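A minimal sketch of that pre-generation step in Python. It assumes a hypothetical table blocks(page_id, position, kind, text) and uses the standard-library sqlite3 module as a stand-in; against SQL Server a DB-API driver such as pyodbc offers the same cursor style. The output names page<id>.html are placeholders, to be mapped back to the real site URLs when rewriting URLs in Zoom. Files whose content hasn't changed are left untouched, which should help later incremental updates if they detect changes by file date (an assumption worth checking against the Zoom documentation).

```python
import html
import sqlite3
from itertools import groupby
from pathlib import Path


def export_pages(conn: sqlite3.Connection, out_dir: str) -> list:
    """Write one minimal HTML file per page and return the paths that changed.

    Assumes a hypothetical table blocks(page_id, position, kind, text) where kind is
    'heading' or 'paragraph'. Only headings and text are emitted: no CSS, JS or images.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    cur.execute("SELECT page_id, kind, text FROM blocks ORDER BY page_id, position")
    changed = []
    for page_id, group in groupby(cur, key=lambda row: row[0]):
        blocks = list(group)
        # Use the first heading as the page title so search results have something readable.
        title = next((text for _, kind, text in blocks if kind == "heading"), f"Page {page_id}")
        body = "\n".join(
            f"<h2>{html.escape(text)}</h2>" if kind == "heading" else f"<p>{html.escape(text)}</p>"
            for _, kind, text in blocks
        )
        doc = (
            "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
            f"<title>{html.escape(title)}</title></head>\n<body>\n{body}\n</body></html>\n"
        )
        path = out / f"page{page_id}.html"
        # Leave unchanged files alone so their modification times stay the same.
        if path.exists() and path.read_text(encoding="utf-8") == doc:
            continue
        path.write_text(doc, encoding="utf-8")
        changed.append(str(path))
    return changed
```

Running it a second time against unchanged data writes nothing and returns an empty list, so only pages whose blocks were edited get new files.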

As for MasterNode: as of May 2016, this product has been discontinued and is no longer supported. There wasn't enough demand for really, really big indexes, so we stopped development. However, it remains open source should anyone want to use it, and no special license is required for it. Despite it being open source, we have had no code contributions for the last 5 years, and no recent testing has been done. So I wouldn't be very confident of MasterNode being a good solution.
