Any way around the 300K limit for max pages indexed for PHP?

  • Any way around the 300K limit for max pages indexed for PHP?

    Hi,

    Just wondering if there is a way around the 300K limit on the maximum number of pages indexed with PHP. The disclaimer says it's too much load for the server, but my server is a quad-processor system with 12 GB of memory, so load isn't going to be a problem.

    I really want to keep using PHP, as my entire website and scripts are PHP based.

    Thanks

  • #2
    The limit for PHP is much lower than 300,000 pages. The limit is 65,000 pages for PHP. I am not sure where you got the 300,000 number from.

    The problem is that PHP is slow. It is an interpreted language and was never really designed to handle huge data processing jobs in a time critical manner.

    You say you have a quad-CPU system. But PHP scripts only run in a single thread, so at best only one CPU is used. Even that one CPU will probably be limited by the speed at which it can get data from the disk.

    You also say that you have 12GB of RAM. But I am betting you are using a 32-bit operating system, running 32-bit web server applications and a 32-bit version of PHP. With 32-bit addressing you can use at best 2GB of RAM per application, and PHP will be restricted by this limit.

    So 3 of your 4 CPUs and 10GB of your RAM will not be used by PHP.
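
    A quick way to confirm whether an interpreter is running as a 32-bit process is to check its native pointer size. This is a generic sketch (not specific to Zoom or to any particular PHP build):

    ```python
    import struct

    # struct.calcsize('P') returns the size in bytes of a native pointer:
    # 4 bytes means a 32-bit process (at best ~2GB of usable address space),
    # 8 bytes means a 64-bit process.
    bits = struct.calcsize('P') * 8
    print(f"Running as a {bits}-bit process")
    ```

    A 32-bit result means the process cannot address more than a fraction of the 12GB installed, no matter how much RAM the machine has.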

    Next, have a look at the benchmark measurements we made.
    http://www.wrensoft.com/zoom/benchmarks.html
    With 60,000 pages, search times were between 1.7 and 2.5 seconds. So we don't think PHP would scale to 300K pages. Search times would probably be around 10 seconds or more.
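
    The scaling estimate can be reproduced with a simple linear extrapolation (an assumption; in practice search time may grow worse than linearly with index size):

    ```python
    # Benchmarked: roughly 1.7 to 2.5 seconds to search a 60,000-page PHP index.
    measured_pages = 60_000
    measured_time_s = 2.5          # worst case from the benchmarks
    target_pages = 300_000

    # Assume search time grows at least linearly with the page count.
    estimated_time_s = measured_time_s * target_pages / measured_pages
    print(f"Estimated search time at {target_pages} pages: {estimated_time_s:.1f}s")
    ```

    That works out to about 12.5 seconds, consistent with the 10sec+ figure above.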

    So that was the bad news.

    The good news is that we have a CGI option that uses compiled C++ code to get about 4 times the performance of PHP. Again refer to the benchmarks above.

    This CGI version doesn't have a limit of 65K pages and can in theory go all the way to 300K pages.

    However, you also need to build the initial index of the 300K pages, and you need a lot of RAM for this. But 2GB should just be enough. Again, 10GB will be wasted, as the Zoom indexer is only a 32-bit application. But the indexer is multi-threaded, which means your 4 CPUs might not be wasted if you can feed them data quickly enough from your disk & internet connection.

    We are looking for a reason to release a 64-bit version of Zoom, however. A 64-bit version of the indexer and CGI should be able to push the limit to 1M pages or more on a machine like yours. So if you are stuck, let us know. We can do a deal with you to move to 64-bit.

    ----
    David



    • #3
      David,

      Sorry, but I mistyped my post. I meant to put "Max. unique words". My website has about 500,000 unique words when I use the CGI option. The funny thing is, even though there are so many pages and words on my site, the search index is only about 125K in size.

      So I should rephrase my question: is there a way around the unique word limits with PHP?

      As I said, my whole site is PHP based and I would like to keep it that way. My other problem is that I don't know much about CGI at all, or what to do with the files once I have them.

      If you know of a primer on how to use your CGI file I could give that a try. I run IIS 5 on Windows 2000 server.

      Thanks



      • #4
        Okay... I got the CGI version working after some research. However, my question now is that it seems I have to put my zdat files in my scripts directory.

        How can I have my CGI script in my scripts directory and have the zdat files somewhere else? Is that possible to do?

        Do I have to decompile the CGI script? Or invoke it through an embedded script in an HTML page?

        Thanks

        p.s. I still want to do it with PHP



        • #5
          500,000 unique words is a lot of words. There are only about 50,000 unique words in the English dictionary. So it might be worth investigating your pages to determine why you have so many unique words.
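
          One way to see where the unique words come from is to tally word tokens across the pages. The helper below is only a sketch (the tag stripping is crude, and the sample text is made up); with pages ripped from emails, tokens like message IDs, timestamps and hex strings usually inflate the count:

          ```python
          import re
          from collections import Counter

          def unique_words(html_text: str) -> Counter:
              """Strip tags crudely and tally lowercased word tokens."""
              text = re.sub(r"<[^>]+>", " ", html_text)
              return Counter(w.lower() for w in re.findall(r"[A-Za-z0-9']+", text))

          # Example on a single page; in practice, loop over all the HTML files,
          # merge the Counters, then inspect the least common tokens - those are
          # usually machine-generated strings rather than real words.
          sample = "<p>Re: server upgrade, msgid 4f3a9c21</p>"
          print(len(unique_words(sample)), "unique words")
          ```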

          > the search index is only about 125K in size

          Yes, this seems too small. Maybe it is really 1250KB or 12.5MB?
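
          A back-of-the-envelope check shows why 125K looks wrong, assuming the index must store each unique word at least once:

          ```python
          unique_word_count = 500_000
          index_size_bytes = 125 * 1024          # the reported 125K

          bytes_per_word = index_size_bytes / unique_word_count
          print(f"{bytes_per_word:.2f} bytes per unique word")
          # Well under 1 byte per word is impossible if every word is stored,
          # so the real figure is more likely 1250KB or 12.5MB.
          ```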

          There is no way around the PHP limits.

          > I have to put my zdat files in my scripts directory

          All the search related files need to be in the same directory.

          > Do I have to decompile the cgi script?

          You can try if you want. But you'll need to be a very good programmer to decompile C++ code.

          What is the problem with having all the search related files in the same directory?

          ------
          David



          • #6
            David,

            Thanks

            Yes, the index files are small, and this is because the pages are emails. Basically I have an email repository covering the past 3 years; I rip the emails to HTML files and then index them with Zoom.

            As far as the files in the scripts directory, it was more a question of whether they had to be there, which you answered.

            Anyway, everything is working great with CGI, my whole site is indexed, and I learned some new things.

            Thanks

