PHP using CGI index - Hitting 300,000 unique word limit


  • #16
    Originally posted by Wrensoft
    I quickly looked back through the old posts, you never actually said what the problem with using the CGI version was?
    We have a DVD disc that I want to put the search on, and I already have PHP running on it (along with a minimal Apache and an SQLite DB). I have yet to test the CGI version, so I will attempt to do that. It is my only option now.

    Thanks,
    Sean


    • #17
      The CGI version should work fine (and run much quicker than PHP).

      -------
      David


      • #18
        David,

        I noticed that when I went to change it to CGI it asks what the host OS will be. The DVD is built only for Windows machines, so would I choose Windows?

        Thanks,
        Sean


        • #19
          Yes, if it is for Windows machines, select Windows.

          David


          • #20
            Well, I ran the CGI indexer and I was able to finish the entire scan of 5300+ docs and it only topped out at 350,763 unique words. Is there any way to increase the 300,000 unique word limit by just 60,000 words? A registry setting or something? Or is it hardcoded in your program?

            I guess I'm just stuck on trying to do it with PHP.

            Thanks a ton,
            Sean

            ******************
            13:41:03 - INDEX SUMMARY
            13:41:03 - Files scanned: 5312
            13:41:03 - Files skipped: 14018
            13:41:03 - Unique words found: 350763
            13:41:03 - Total words found: 61236411
            13:41:03 - Avg. unique words per page: 66
            13:41:03 - Avg. words per page: 11527
            13:41:03 - Errors: 9
            13:41:03 - Total bytes scanned/downloaded: 2742028311
            13:41:03 - File extensions:
            13:41:03 - .pdf scanned: 5312
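
As a quick check on the summary above, the following sketch recomputes the averages and the amount by which this index exceeds the 300,000 unique-word limit discussed in this thread. The variable names are illustrative only, and the use of truncating division is an assumption: it is consistent with the logged values, but the indexer's actual rounding rule is not stated.

```php
<?php
// Figures copied from the index summary above.
$filesScanned = 5312;
$uniqueWords  = 350763;
$totalWords   = 61236411;
$phpLimit     = 300000; // unique-word limit discussed in this thread

// How far this index is over the limit.
$overLimit = $uniqueWords - $phpLimit;

// Assumption: the indexer truncates its averages (not confirmed by the source).
$avgUniquePerPage = intdiv($uniqueWords, $filesScanned);
$avgWordsPerPage  = intdiv($totalWords, $filesScanned);

echo "Over limit by: $overLimit\n";
echo "Avg. unique words per page: $avgUniquePerPage\n";
echo "Avg. words per page: $avgWordsPerPage\n";
```

On these figures the index is 50,763 unique words over the limit, somewhat less than the 60,000 mentioned in the post.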


            • #21
              No. The PHP script is designed with that limitation due to technical issues with the scripting platform (there are performance issues hit by the scripting engine when a certain amount of data needs to be processed).

              As suggested before, the solution is to use the CGI version, which is designed to handle index data exceeding 300,000 unique words. This should work fine with any setup that can run PHP (including a minimal Apache setup), so we are not sure why you are avoiding the CGI version. If you can explain your reasons, we will be better able to address your problem.
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine


              • #22
                Originally posted by Wrensoft
                You could use PHP code like this,
                <?php
                $QSTRING = $_SERVER['QUERY_STRING'];
                while (list ($header, $value) = each ($HTTP_GET_VARS))
                {
                $QSTRING = $QSTRING.'&'.$header.'='.$value;
                }
                virtual("/cgi-bin/search.cgi".'?'.$QSTRING);
                ?>
                All this is returning is the binary of the search.cgi file. How can I get it to actually execute the CGI file? I need to be able to fix this in the next week or so.
                Originally posted by Ray
                there are performance issues hit by the scripting engine when a certain amount of data needs to be processed
                By the way, what kind of performance issues are there with PHP using Zoom?

                Thanks,
                Sean


                • #23
                  Also, what is the skip word list limit?

                  Thanks,
                  Sean


                  • #24
                    Originally posted by seangates
                    Originally posted by Wrensoft
                    You could use PHP code like this,
                    <?php
                    $QSTRING = $_SERVER['QUERY_STRING'];
                    while (list ($header, $value) = each ($HTTP_GET_VARS))
                    {
                    $QSTRING = $QSTRING.'&'.$header.'='.$value;
                    }
                    virtual("/cgi-bin/search.cgi".'?'.$QSTRING);
                    ?>
                    All this is returning is the binary of the search.cgi file. How can I get it to actually execute the CGI file? I need to be able to fix this in the next week or so.
                    The code should work on any normal PHP installation. More details on your setup would be useful, since we cannot tell whether you have a modified PHP build, or whether your minimal Apache has been changed significantly to allow it to run off your DVD.

                    A possible cause may be that your web server is trying to handle all ".CGI" files as Perl scripts, in which case the CGI is not being properly executed.

                    Alternatively, try this (also from the FAQ):
                    Code:
                    <?php
                    $QSTRING = $_SERVER['QUERY_STRING'];
                    while (list ($header, $value) = each ($HTTP_GET_VARS))
                    {
                    $QSTRING = $QSTRING.'&'.$header.'='.$value;
                    }
                    $output = shell_exec("/cgi-bin/search.cgi".'?'.$QSTRING);
                    $output = ereg_replace("Content-type: text/html", "", $output);
                    echo $output;
                    ?>
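
A hedged variant of the snippet above, for readers on newer PHP releases: `each()`, `$HTTP_GET_VARS` and `ereg_replace()` have since been removed from PHP, and `shell_exec()` runs a command line rather than a web request. By CGI convention (RFC 3875), a CGI program reads its query from the `QUERY_STRING` environment variable, so this sketch passes the query that way. The helper names `run_cgi` and `strip_cgi_headers` are hypothetical, the path to search.cgi is a placeholder, and whether Zoom's search.cgi behaves as expected when launched like this has not been confirmed in this thread.

```php
<?php
// Hypothetical helper: a CGI response is "headers, blank line, body".
// Return only the body; if no blank line is found, return the input as-is.
function strip_cgi_headers(string $raw): string
{
    $parts = preg_split("/\r?\n\r?\n/", $raw, 2);
    return count($parts) === 2 ? $parts[1] : $raw;
}

// Hypothetical helper: run a CGI executable with the query string supplied
// through the environment. $cgiPath is a filesystem path, not a URL.
function run_cgi(string $cgiPath, string $queryString): string
{
    // Keep the existing environment and add the CGI variables on top of it.
    $env = array_merge(getenv(), [
        'REQUEST_METHOD'    => 'GET',
        'QUERY_STRING'      => $queryString,
        'GATEWAY_INTERFACE' => 'CGI/1.1',
    ]);

    $pipes = [];
    $proc = proc_open(
        escapeshellarg($cgiPath),
        [1 => ['pipe', 'w']], // capture the program's stdout
        $pipes,
        null,
        $env
    );
    if (!is_resource($proc)) {
        return '';
    }

    $raw = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);

    return strip_cgi_headers($raw === false ? '' : $raw);
}

// Example call (the path is an assumption; adjust it to the real location):
// echo run_cgi('/path/to/cgi-bin/search.cgi', $_SERVER['QUERY_STRING'] ?? '');
```

Splitting on the first blank line removes whatever headers the CGI emits, rather than matching one specific `Content-type` string.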
                    Originally posted by seangates
                    By the way, what kind of performance issues are there with PHP using Zoom?
                    PHP will always be slower, due to the overhead of the scripting engine, which has to process the script at runtime. Here are some benchmarks we have made comparing the search times of the different platforms:
                    http://www.wrensoft.com/zoom/benchmarks.html

                    PHP (and all other scripting variants) does not scale well once the size of the index data exceeds a certain amount. Beyond that point, searches may take unreasonably long or may fail altogether (due to timeouts etc.).

                    Originally posted by seangates
                    Also, what is the skip word list limit?
                    The limit is 400. Note that this is documented in the Technical Limits section of the Users Guide.

                    Note that you can precede a word with an "*" character to filter all words containing this keyword. So you can truncate the list a fair bit by doing so ("*code" will skip "code", "decode", "encode", "coder", etc.)
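
The wildcard rule described above can be modelled with a short sketch. This is an illustration only, not Zoom's actual implementation: the function name `is_skipped` is hypothetical, and case-insensitive matching is an assumption that the post does not confirm.

```php
<?php
// Hypothetical model of the skip-word rule: an entry starting with "*"
// skips every word that contains the rest of the entry; any other entry
// skips only an exact match.
function is_skipped(string $word, array $skipList): bool
{
    $word = strtolower($word); // assumption: matching ignores case
    foreach ($skipList as $entry) {
        $entry = strtolower($entry);
        if (strlen($entry) > 1 && $entry[0] === '*') {
            // Wildcard entry: substring match on the text after "*".
            if (strpos($word, substr($entry, 1)) !== false) {
                return true;
            }
        } elseif ($word === $entry) {
            return true;
        }
    }
    return false;
}

// One wildcard entry covers the four words named in the post.
foreach (['code', 'decode', 'encode', 'coder', 'index'] as $w) {
    echo $w, ': ', is_skipped($w, ['*code']) ? 'skipped' : 'kept', "\n";
}
```

Under this model a single `*code` entry replaces four separate entries, which is how the wildcard helps a list stay under the 400-entry limit.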
                    --Ray
                    Wrensoft Web Software
                    Sydney, Australia
                    Zoom Search Engine
