

64bit Edition of Zoom Search Engine


  • will replied:

    What about setting the IMAGE_FILE_LARGE_ADDRESS_AWARE flag?

    Regarding what Ray says: I'm pretty sure they are PDFs, but how do I know whether Zoom is indexing binary garbage? Also, how would I stop this from happening?
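
    For reference, here is a minimal sketch (my own illustration, not anything from PassMark) of how one could check whether an EXE already has that flag set, assuming a Windows C toolchain and the standard PE header layout:

        /* Check an executable for IMAGE_FILE_LARGE_ADDRESS_AWARE.
           Minimal sketch: error handling kept to a bare minimum. */
        #include <windows.h>
        #include <stdio.h>

        int main(int argc, char *argv[])
        {
            if (argc < 2) {
                fprintf(stderr, "usage: %s <path-to-exe>\n", argv[0]);
                return 1;
            }
            FILE *f = fopen(argv[1], "rb");
            if (!f) { perror("fopen"); return 1; }

            IMAGE_DOS_HEADER dos;                /* DOS stub header */
            fread(&dos, sizeof(dos), 1, f);
            fseek(f, dos.e_lfanew, SEEK_SET);    /* jump to the PE header */

            DWORD signature;                     /* should be "PE\0\0" */
            IMAGE_FILE_HEADER fh;
            fread(&signature, sizeof(signature), 1, f);
            fread(&fh, sizeof(fh), 1, f);

            printf("LARGE_ADDRESS_AWARE is %s\n",
                   (fh.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
                       ? "set" : "not set");
            fclose(f);
            return 0;
        }

    If it is not set, the usual way to turn it on for a binary you build yourself is the linker's /LARGEADDRESSAWARE option (or editbin /LARGEADDRESSAWARE on an existing EXE). Whether Zoom's EXE can safely run with it is of course a question for the developers.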

  • will replied:

    Windows applications are limited to using 2GB of RAM

    Another source says 32-bit apps are limited to 3GB.

    Can we not have 3GB from Zoom, please?

  • nibb replied:

    Well, I did not know that. So if you buy 64-bit servers, which configuration would be more recommended: 2 servers with 8 GB of RAM, or 4 servers with 4 GB each? For example, you said 4 GB on a 32-bit system is a waste. I don't have any 32-bit systems anymore, but some years ago we had a Windows NT server with a couple of GB of RAM, so I guess that was a waste. I don't remember exactly, but I think it had 6 GB at the time.

  • David replied:

    That's great. So the 64-bit version will have double the power.

    At least double, we hope. It depends on how much we change the index file format and whether we are prepared to drop support for older operating systems.

    With MasterNode you can have multiple machines doing the indexing and multiple machines serving up search results.

    4GB of RAM is almost always a waste on a 32-bit Windows system. Windows applications are limited to using 2GB of RAM due to a lack of virtual address space.
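
    To make that 2GB figure concrete, here is a minimal sketch (my own illustration): compiled as a 32-bit binary, it keeps allocating until the address space runs out, and on 32-bit Windows it typically stops a little under 2GB no matter how much physical RAM is installed:

        /* Allocate until failure to observe the per-process address
           space ceiling. Deliberately leaks: it is a one-shot demo. */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            const size_t chunk = 64 * 1024 * 1024;   /* 64MB per request */
            unsigned long total_mb = 0;

            while (malloc(chunk) != NULL)            /* reserve, never free */
                total_mb += 64;

            printf("allocated roughly %lu MB before malloc failed\n", total_mb);
            return 0;
        }

    The top half of each process's 4GB virtual address space is reserved for the kernel, which is why a single 32-bit application cannot use more than 2GB however much RAM is fitted.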

  • nibb replied:

    That's great. So the 64-bit version will have double the power.

    Can the limits go further with the right hardware? For example, with 5 servers under MasterNode, I see it uses all 5 servers for searching, but could the same be done for indexing? If you put up 5 servers with 2 GB of RAM each, could they all work together to index the data? Also, if I have an 8 GB server, will I then be able to index more than a million files (HTML files)?

  • Ray replied:

    I would like to add that, in the past, we have often found that when users seemingly index a huge number of unique words, there is something else at play. For example, they could be indexing a file type that is not supported, so Zoom ends up indexing a lot of binary garbage, pushing the unique word count up for data that would not be meaningfully searchable anyway.

    On very rare occasions the count may be valid, and the user may actually have 100,000 PDF files, each over 100 MB in size or similar. These are certainly exceptions to the rule, and in such cases there are still alternative methods which may be suitable (for example, limiting the number of words to index per file). So the better you can clarify your scenario, the more likely we can be of help.

  • David replied:

    The 2GB limit I was referring to was a file size limit for some of the index files. Old Linux operating systems are not able to deal with individual files greater than 2GB (technically speaking, they used a signed 32-bit integer for seeking to a file position; a sketch follows at the end of this post). That limit has nothing to do with available RAM.

    It would help if you answered my other questions.

    How many million is "several million"? Having so many unique words is not typical; why do you have, or need, so many?
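
    A minimal sketch of that seek limit, assuming a Linux C toolchain (the file name big.idx is just a placeholder): plain fseek() takes a signed long, which is 32 bits on these systems, so offsets past 2GB - 1 cannot be represented; large-file support via fseeko() and a 64-bit off_t is the usual fix.

        /* Seek past the 2GB mark using large-file support.
           Sketch only; big.idx is a hypothetical index file. */
        #define _FILE_OFFSET_BITS 64   /* make off_t 64-bit on 32-bit Linux */
        #include <stdio.h>
        #include <sys/types.h>

        int main(void)
        {
            FILE *f = fopen("big.idx", "rb");
            if (!f) { perror("fopen"); return 1; }

            /* 3GB does not fit in a signed 32-bit long, so plain fseek()
               cannot reach this offset; fseeko() takes a 64-bit off_t. */
            off_t pos = (off_t)3 * 1024 * 1024 * 1024;
            if (fseeko(f, pos, SEEK_SET) != 0)
                perror("fseeko");
            else
                printf("now at offset %lld\n", (long long)ftello(f));

            fclose(f);
            return 0;
        }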

  • will replied:

    Hi,

    I think it's a unique-word issue (several million of them) that stops the Enterprise edition exceeding ~50,000 files. Once the 2GB limit is reached, I guess your software cuts off indexing. Is there any way to disable that cut-off and let it carry on? After all, Zoom lets indexing start even when it predicts 2.8GB will be needed.

  • David replied:

    We have tested it to just over a million pages / documents. We did most of our tests with moderately sized HTML files. But there are certainly conditions under which you are not going to reach a million pages / documents. If you are indexing large PDF files, for example, you are not going to get to 1M of them. If you don't have enough RAM, you aren't going to make it either.

    Generally the blocking factor will be either 1) you run out of RAM, or 2) the files grow so large that the internal 32-bit pointers that cross-reference records within the index aren't large enough anymore. But there are sometimes other subtle effects in play, like older versions of Linux not being able to seek within files greater than 2GB, etc.

    But 100,000 shouldn't be so hard to reach.

    What limits did you set on the Limits tab?

    What exactly do you mean by "overload"? We don't have any message to that effect in the software as far as I know.

    Are you using spider mode or offline mode?

    What type of content are you indexing?

    What hardware (CPU, RAM, Free disk space, Internet connection) do you have?

  • will replied:
    Hi,

    Are you sure that the Enterprise 5 edition has a million page capacity in all cases?

    My index of 100,000 pages overloads it and I am trying to figure out why. Before indexing starts it says it needs 2.8GB RAM (which we have).

    It overloads before it finishes indexing rather than when it tries to write the files at the end of indexing.

    Does that mean it is a RAM issue, or that the 32-bit edition cannot cope?

    Please reply saying either:
    - 32-bit can handle it with enough computer power
    - I can have a free upgrade to the 64-bit version when it is released

    Infinite capacity? Download the internet here:
    http://www.w3schools.com/downloadwww.htm
    Last edited by will; Jun-26-2007, 10:15 AM.

  • David replied:

    We have a 64-bit version of the software almost ready to go. But we figured that there was no point releasing it until there was some demand.

    64-bit hardware and a 64-bit operating system will be required. The native 64-bit version should immediately double the capacity of the 32-bit software. And there is the potential to increase capacity 10-fold in the medium term (with the right hardware).

    But at the moment the million-page capacity of the 32-bit version is large enough for almost all of our customers. And the customers who want more than a million tend to rather irrationally ask for infinite capacity.

    What is your project?

    If it is just capacity you are after, then there is also the MasterNode distributed search solution.

  • will started the topic: 64bit Edition of Zoom Search Engine

    Hi,

    The warning you get on the configuration "Limits" tab suggests using the "64-bit edition of Zoom".

    Is there such a thing? I can't find it on your website.

    Best Regards,

    Will