

64bit Edition of Zoom Search Engine


  • #16
    2GB limit for 32-bit applications

    Guys,

    The problem is that ZoomIndexer.exe is linked without the IMAGE_FILE_LARGE_ADDRESS_AWARE bit set in its PE header. Please refer to the following for details: http://www.microsoft.com/whdc/system...AE/PAEmem.mspx.

    Without this flag, your app is limited to 2GB of virtual memory, and that is not enough for indexing non-English text (chemistry-related articles, in our case) when the dictionary grows fast. For our purposes, extending it to 3GB would suffice. All you have to do is specify the /LARGEADDRESSAWARE flag (if compiling with the Microsoft compiler). This could even be done without relinking, using some kind of binary editor, but it seems you check a CRC on startup to prevent file modifications. The good thing about this change is that you could get a 1.5x capacity improvement for free!
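For anyone wanting to check whether a given executable already has the bit set, here is a minimal illustrative Python sketch (not part of Zoom) that reads the Characteristics field of the COFF header; the 0x0020 bit is IMAGE_FILE_LARGE_ADDRESS_AWARE:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

def is_large_address_aware(pe_bytes: bytes) -> bool:
    """Return True if the PE image has the LARGE_ADDRESS_AWARE bit set."""
    # The DOS header stores the offset of the "PE\0\0" signature at 0x3C.
    pe_offset = struct.unpack_from("<I", pe_bytes, 0x3C)[0]
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image")
    # Characteristics is a 16-bit field 22 bytes past the signature
    # (4-byte signature + 18 bytes of preceding COFF header fields).
    characteristics = struct.unpack_from("<H", pe_bytes, pe_offset + 22)[0]
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

With the Microsoft toolchain, the bit can also be set after linking via `editbin /LARGEADDRESSAWARE app.exe`, though as noted above a startup CRC check may reject the modified file.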



    • #17
      Originally posted by will View Post
      Yes it is the file system limit not RAM I was referring to.

      Can you not allow 3GB for CGI? Seeing as you have separate Windows and Linux options for the CGI version.
      This is already the case. The 2 GB filesystem limit only applies to the Linux and BSD versions of the CGI output. Have you tried indexing with CGI/Win32 selected in the Indexer? What is the exact error message you are seeing when the Indexer hits the limit?

      Originally posted by will View Post
      - I'm not trying to index 100MB PDFs
      - I'm not trying to index the whole internet (just 100,000 PDFs)

      Avg. words per file: ~5000
      Avg. unique words per file: ~40
      So that's about 4 million unique words in total. That's a large number, despite the relatively small number of documents. Our references to the number of pages/files are approximations based on average document sizes. Your PDFs are clearly quite large, due to the number of chemical names and numeric values involved.

      Originally posted by valt View Post
      Without this flag, your app is limited to 2GB of virtual memory, and that is not enough for indexing non-English text (chemistry-related articles, in our case) when the dictionary grows fast. For our purposes, extending it to 3GB would suffice. All you have to do is specify the /LARGEADDRESSAWARE flag (if compiling with the Microsoft compiler). This could even be done without relinking, using some kind of binary editor, but it seems you check a CRC on startup to prevent file modifications. The good thing about this change is that you could get a 1.5x capacity improvement for free!
      According to Will above, the limit he is hitting is the filesystem limit, rather than the memory limit. So this would not get around that.

      While it is possible to compile the application with the above-mentioned linker flag enabled and extend it to use up to 3GB of virtual memory (in XP and Server 2003 only), this is not without risks or costs. The fact that Windows requires you to opt in with this flag (and apply a change in the boot.ini file), rather than making it the default behaviour, itself suggests that complications can occur, and such a change would need extensive testing. We also have internal memory management in some of our optimizations that may complicate things. On top of that, it is quite possible that the filesystem limit would be reached before the memory limit, as seen in Will's scenario above.
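For reference, the boot-side half of the change mentioned above is the /3GB switch in boot.ini on 32-bit XP/Server 2003. An illustrative entry (the ARC path will differ per machine) looks like:

```
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB
```

Without /3GB the kernel keeps the upper 2GB of address space for itself, so the large-address-aware flag on its own has no effect on these systems.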

      Having said that, if you like, we could provide you with a test build with this flag enabled, and see if it helps much in your situation. We would be interested to hear the results since we have not tested it with data of this size. Please e-mail us for more information.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #18
        Does the professional version have a 64bit edition too? When will it be released?



        • #19
          Betty,

          If you are just asking about a future 64bit version of the software because you have a 64bit operating system, then you don't really need it.

          The 32bit version of the software will run fine on 64bit Windows (up to the memory limits, etc., discussed above).

          As far as we know, there are only one or two customers who would actually benefit from 64bit at the moment. And the above discussion hasn't really changed that, as no one seems to be running out of RAM. So demand for 64bit isn't strong (yet).

          The file system limit isn't really related to 64bit; it is more a matter of continued support for old operating systems and the current structure of the internal pointers in the index files. In other words, files greater than 4GB are possible in most versions of 32bit Windows and in newer versions of 32bit Linux.
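The 2GB and 4GB figures in this thread follow directly from the width of those internal file pointers; a quick illustrative arithmetic check (not Zoom's actual index code):

```python
# The width of a file offset determines the maximum addressable file size:
# a signed 32-bit offset (common on older 32-bit Unix) tops out at 2 GiB,
# an unsigned 32-bit offset at 4 GiB, while a signed 64-bit offset reaches
# 2**63 bytes, which is effectively unlimited.
signed_32 = 2**31 / 2**30      # 2.0 GiB
unsigned_32 = 2**32 / 2**30    # 4.0 GiB

print(signed_32, unsigned_32)  # 2.0 4.0
```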

          There is no date for the release of a 64bit version as yet. We are thinking that in addition to doing a 64bit code release that we also need to change the index file formats (to remove both the limits discussed above, rather than just one of them). But this is a bigger job.



          • #20
            Need 64 bit version

            Hi there,

            I am looking for a 64 bit version of search.cgi, as I currently get a "/lib/ld-linux.so.2: bad ELF interpreter" error when trying to execute search.cgi. I tried symlinking from my lib64 to lib, but that does not work.
            Do you have any 64 bit versions?

            Regards

            Anders



            • #21
              To date, the only 64bit platform we have compiled search.cgi for is Sun SPARC hardware.

              We do, however, sell the source code for search.cgi if you want to re-compile it yourself.

              But all 64bit Linux systems should (in theory) be able to run 32bit binaries, if configured correctly. However, I know from experience that different Linux distributions have serious binary compatibility issues.
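The "bad ELF interpreter" error in the post above typically means the 64-bit system lacks the 32-bit loader /lib/ld-linux.so.2; installing the distribution's 32-bit compatibility libraries usually resolves it (the package name varies by distribution, e.g. glibc.i686 on Red Hat derivatives). To tell a 32-bit binary from a 64-bit one without the `file` utility, the ELF class byte can be inspected; an illustrative sketch:

```python
def elf_class(header: bytes) -> int:
    """Return 32 or 64 depending on the ELF class byte (EI_CLASS).

    The first four bytes of every ELF file are the magic \\x7fELF;
    the fifth byte is 1 for ELFCLASS32 and 2 for ELFCLASS64.
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {1: 32, 2: 64}[header[4]]

# Example: inspect the e_ident block of a binary.
# with open("search.cgi", "rb") as f:
#     print(elf_class(f.read(16)))
```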



              • #22
                Just to mention our success and show that getting the source code for the CGI was beneficial:

                We have successfully compiled, tested, and run the CGI on Linux 2.6 (x86, x86_64, ppc, and ia64), as well as on Win32 and Mac OS X (ppc).

                We had to adjust the Makefiles a little (especially for the ppc variants) and some includes in the source, but it is stable and we are happy.

                Thanks Zoom,
                D



                • #23
                  ASP Limitation

                  While the professional version claims to support 200,000 documents, it won't let me configure more than 65,500 to be indexed. Is this a 32-bit ASP issue? Would 64-bit Zoom using ASP on 64-bit Server 2003 allow a higher limit?



                  • #24
                    jcbeck, this is not a 64bit issue.
                    The problem is with the ASP language itself, which has terrible performance. So we limit the ASP script to 65,000 files in order to stop search times running into minutes and putting a huge load on the server. If you switch to the CGI option you'll get a 10-fold performance improvement, higher capacity, and much less server load.



                    • #25
                      Please note that this is stated in the message box when you configure above the 65,500 page limit, and also on our "Which Edition?" page:

                      *Dependent on memory and hardware resources available. More details here. Sites containing over 65,000 pages in total will need to use the higher performance CGI platform option.
                      --Ray
                      Wrensoft Web Software
                      Sydney, Australia
                      Zoom Search Engine



                      • #26
                        I see this now... Now that this is live (with just 18k documents), we're seeing multi-second queries in ASP (25+ seconds). Is there some way to customize the CGI the way I am customizing the ASP (http://www.wrensoft.com/forum/showthread.php?t=2712), to filter results based on the URL of the result and a variable set from a cookie value?

                        EDIT: From your forums, it appears my only option is to enable XML output, which I can parse and filter: http://www.wrensoft.com/forum/showthread.php?t=1461. Since searching via the CGI with XML output is very fast, the remaining filtering should be reasonably fast within ASP, so I can apply my filters there and update the match counts accordingly before generating the finished output.
                        Last edited by jcbeck; Aug-01-2008, 06:09 PM.
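The parse-and-filter approach described above can be sketched as follows. The XML layout here is simplified and hypothetical; Zoom's actual XML output schema may differ:

```python
import xml.etree.ElementTree as ET

def filter_results(xml_text: str, url_prefix: str):
    """Keep only results whose URL starts with url_prefix.

    Assumes a simplified <results><result><url>...</url></result></results>
    layout for illustration; adapt the element names to the real schema.
    """
    root = ET.fromstring(xml_text)
    return [r for r in root.findall("result")
            if r.findtext("url", "").startswith(url_prefix)]

sample = """<results>
  <result><url>http://example.com/a/1.pdf</url><title>One</title></result>
  <result><url>http://example.com/b/2.pdf</url><title>Two</title></result>
</results>"""

matches = filter_results(sample, "http://example.com/a/")
print(len(matches))  # 1
```

The match count shown to the user can then be recomputed from the filtered list before rendering the final page.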



                        • #27
                          This is getting rather off topic (as it has nothing to do with 64bit). But anyway: the CGI is a natively compiled C++ executable, not a script. We sell the C++ source code as part of an SDK here, so you can edit it and re-compile it if you know C++. The CGI is slightly more complex than the PHP and ASP code, as we spent a fair amount of time optimizing it for very large sets of files (1M+ documents).

