I know I've touched on this subject before, and some posts cover aspects of it, but all things considered I'm still not content with how this is working. I accept that some of this may come down to strategy, to my lack of understanding of best practice with ZOOM, or to the way ZOOM works as opposed to the way I'd like it to work. But make no mistake, I'm a big fan of ZOOM.
It seems to me there are several barriers to indexing large systems with ZOOM despite Wrensoft's performance claims (that's not a criticism - I accept the metrics; it's the practicality of taking ZOOM that far that is the issue for me). Let me expand on some of the issues, questions, concerns and suggestions I have:
Changes to your .cfg file - I know you can add new starting points and keep going, which is great, but if you add new starting points you're often inclined to tweak your settings too, in which case ZOOM requires you to start all over again. I usually tweak settings to improve search results or to reduce indexing load and time. On large operations, restarting or redoing it entirely is a pain and a great consumer of resources.
System disruption - if something goes pear-shaped during a long indexing operation (e.g. loss of connectivity, an app crash [the most common one for me], etc.) then you have the joy of starting all over again. It would be nice if, on longer operations, ZOOM saved its state in the background, say every 10 minutes or so - something like an FTP-style resume would then let it pick up from the last save (rough sketch below).
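For what it's worth, here's a minimal Python sketch of the kind of checkpoint/resume behaviour I'm imagining - this is not how ZOOM works internally, and the file name, interval and start URL are all made up for illustration:

```python
import json
import time
from pathlib import Path

CHECKPOINT = Path("crawl_checkpoint.json")  # hypothetical state file
SAVE_INTERVAL = 600                          # save every 10 minutes

def load_state():
    # Resume from the last checkpoint if one exists
    if CHECKPOINT.exists():
        state = json.loads(CHECKPOINT.read_text())
        return state["queue"], set(state["visited"])
    return ["https://example.com/"], set()   # hypothetical starting point

def save_state(queue, visited):
    CHECKPOINT.write_text(json.dumps({"queue": queue, "visited": sorted(visited)}))

def crawl():
    queue, visited = load_state()
    last_save = time.monotonic()
    while queue:
        url = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        # ... fetch and index `url`, append discovered links to `queue` ...
        if time.monotonic() - last_save >= SAVE_INTERVAL:
            save_state(queue, visited)       # periodic background save
            last_save = time.monotonic()
    save_state(queue, visited)

if __name__ == "__main__":
    crawl()
```

The point is just that a crash or dropped connection would cost at most the last save interval, not the whole run.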
Maximum file size - this is a real pain for me. In some cases I'm indexing (or trying to index) files (usually PDFs) way beyond ZOOM's default limit - I have played with various settings, but that's not the real issue. It looks like ZOOM downloads the file up to the maximum configured size and, once that's reached, ditches it - is that right? If so, that's a great waste of resources, and ultimately the document isn't in the index in any way. What would work better is for ZOOM to index the file up to the maximum size, if that's possible - or alternatively, when the maximum size is reached (or preferably detected before download), to index the first X pages of the document (see the sketch after this paragraph). This is actually a good strategy, because if you're indexing a lot of vintage magazines, as I am, the contents of the magazine are usually listed within the first 10 pages or so, so you pick up a "snapshot" of the magazine in your index rather than it being ditched and not indexed at all.
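To illustrate the two ideas - detecting the size before download, and falling back to a first-pages snapshot - here's a rough sketch. This is not ZOOM's actual behaviour; it uses the third-party requests and pypdf libraries, and the size limit and page count are made-up values:

```python
import requests
from pypdf import PdfReader

MAX_BYTES = 50 * 1024 * 1024  # hypothetical maximum file size
FIRST_PAGES = 10              # enough to capture a magazine's contents page

def size_ok(url: str) -> bool:
    # Check the reported size with a HEAD request before downloading
    resp = requests.head(url, allow_redirects=True, timeout=30)
    length = resp.headers.get("Content-Length")
    return length is not None and int(length) <= MAX_BYTES

def snapshot_text(path: str) -> str:
    # Extract text from only the first few pages of an oversized PDF,
    # giving a "snapshot" for the index instead of ditching the file
    reader = PdfReader(path)
    count = min(FIRST_PAGES, len(reader.pages))
    return "\n".join(reader.pages[i].extract_text() or "" for i in range(count))

# Usage: index normally when size_ok(url) passes; otherwise index
# snapshot_text("vintage_magazine.pdf") rather than nothing at all.
```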
I did have a tinker with MasterNode, creating smaller indexes and then using MasterNode to search them collectively, but I didn't particularly like it due to the trade-off in features and the need to synchronize several .cfg files. As an interim solution I have deployed an ad-free Google Custom Search, but I prefer the level of control that ZOOM gives me over the UX, what is indexed, etc., so I want to get this working.
Thanks folks!!!