Search box only, zoom_index.js too large?


  • Search box only, zoom_index.js too large?

    I'm indexing ~850 PDFs, and there are just under 80,000 unique words. I put the search.htm code onto my index.htm page, and the form was working perfectly. As I added more files and raised the limits to get the above numbers, the index page started showing only the Search heading and the text box, with no Submit button or search options like usual. I noticed other people had similar problems, but none of the solutions worked. The only thing that works for me is lowering the max. unique words, which makes my zoom_index file smaller (but then not all of my files get indexed). With 80,000 unique words, it's a 3640 kB file. When I set the max. to ~40,000 unique words, the file obviously gets a lot smaller, and then everything shows up perfectly. What am I doing wrong?

    Also, I noticed the maximum word-skip list is 395 words, is it possible to import a larger list somehow?

    Thanks.

  • #2
    JavaScript is a very limited language. It is not good at tasks that involve heavy data processing. It is also very limited in how much RAM it can use. Internet Explorer limits this to about 10MB (even if you have 1GB of RAM available on the client machine).

    So I think you have hit the limit of what JavaScript can do. If you switch to the server-side scripts (ASP, CGI or PHP), these limits are removed.

    The server side scripts are also much faster and have more functions.

    There is a fair chance that if your skip list is greater than 100 words, you are not using it correctly. Why do you need to skip so many words?

    -----
    David



    • #3
      I'm doing a DVD-ROM, so would using a server-side CGI script work? I was reading up online about using it, but am a little bit lost trying to figure it out.

      Originally posted by David
      There is a fair chance that if your skip list is greater than 100 words, you are not using it correctly. Why do you need to skip so many words?
      There's just a whole heck of a lot of garbage being found in these big PDFs. There are lots of instructional numbers, service numbers, some random French stuff - just a whack load of junk that's showing up in the index. It's getting just under 80,000 unique words, but only about 35,000 are even useful at all. I was trying to make a nice big skip list so that the index file is smaller and quicker. At first I was just going to manually go through the index file and start mass-deleting the useless crap, then quickly realised I was screwed when I needed to re-index.

      I really appreciate the quick response! Thanks.



      • #4
        Originally posted by squeak
        I'm doing a DVD-ROM, so would using a server-side CGI script work? I was reading up online about using it, but am a little bit lost trying to figure it out.
        Here is more information on setting up the CGI version with Server2Go for use on a CD or DVD:
        http://www.wrensoft.com/zoom/support/cgicd.html

        Note that this solution will only work on Windows machines.

        Originally posted by squeak
        There's just a whole heck of a lot of garbage being found in these big PDFs. There are lots of instructional numbers, service numbers, some random French stuff - just a whack load of junk that's showing up in the index. It's getting just under 80,000 unique words, but only about 35,000 are even useful at all. I was trying to make a nice big skip list so that the index file is smaller and quicker. At first I was just going to manually go through the index file and start mass-deleting the useless crap, then quickly realised I was screwed when I needed to re-index.
        We don't recommend manually modifying the index files. In some cases it can break the index data.

        It might be worth checking why there is so much supposed 'garbage' indexed. I presume the numbers etc. are actually found within your documents. If you don't want a set of your documents to be indexed, you can omit them by using the "skip page list". That may be better than attempting to skip all the numbers etc.

        If that isn't possible, you should note that by preceding a skip word with a "*" character, the skip word will then be allowed to match partially (see Help for more information).

        So the following skip word list will skip all words containing numbers:

        Code:
        *1
        *2
        *3
        *4
        *5
        *6
        *7
        *8
        *9
        *0
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
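
To make the effect of the "*" prefix concrete, here is a minimal sketch of the matching rule as described above. It is not Zoom's actual code: the function name `shouldSkip`, the case-insensitive comparison, and the sample words are all assumptions made for illustration. Entries starting with "*" are treated as "match anywhere inside the word", and every other entry must match the whole word.

```javascript
// Hypothetical helper, NOT Zoom's implementation: illustrates the rule
// "a skip word preceded by * may match part of a word".
function shouldSkip(word, skipList) {
  const w = word.toLowerCase(); // assumption: matching ignores case
  return skipList.some((entry) => {
    const e = entry.toLowerCase();
    if (e.startsWith("*")) {
      // Partial match: the text after "*" may appear anywhere in the word.
      return e.length > 1 && w.includes(e.slice(1));
    }
    // Plain entry: the whole word must match.
    return w === e;
  });
}

// The ten-entry list from the post above.
const digitSkipList = ["*1", "*2", "*3", "*4", "*5", "*6", "*7", "*8", "*9", "*0"];

// Made-up sample words.
const words = ["engine", "a4506", "12", "torque", "mk2"];
const kept = words.filter((w) => !shouldSkip(w, digitSkipList));
console.log(kept); // [ 'engine', 'torque' ]
```

Note that under this rule every word containing a digit is dropped, including part or manual numbers, so the list only suits an index where no numbers need to be searchable.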



        • #5
          I cannot use the CGI script because it's not Mac compatible. It's very important that this DVD is cross platform compatible!!

          Are there any other options to fix this issue???

          I can't use anything like *1, *2... because I would like some of the actual manual numbers to be searchable.



          • #6
            There is no great solution. But there are some not so great solutions.

            If you want the search function to be in a browser window and run on both Mac and PC, then you are really limited to Java, Flash and JavaScript. We decided Java was not a good option because Microsoft prevents Java applications from running smoothly in a browser without changing security settings. Flash didn't have the right file access functions. So that leaves you with JavaScript, which is a lame language, but it is cross-platform and runs in a browser.

            However, if you are prepared to step outside of the browser, then other options are possible. You can have a native Windows application that runs on the desktop (not in a browser) and a different native Mac application for the Mac. They would both use the same set of index files, but the user interface would be different, and you would need two different executables.

            Once you have a native executable, then you have access to the full resources of the machine. This is what we did with the CGI Front end.

            We make the source code freely available, so if you can port this code to the Mac, you have a solution.

            Otherwise you can stick with JavaScript, but work on ways to reduce the size of the index. For example, you can certainly have 5 separate search functions and restrict the indexing to just 1/5 of the documents for each set of index files. With some additional custom coding you could add a drop-down list to select the set of index files to use. (You can't use categories to solve this problem, as the index will be just as large as it was to start with.)

            ----
            David
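
The "5 separate search functions" idea could be sketched roughly like this. Everything here is hypothetical: the function names, the `manualN.pdf` file names, and the `searchN/search.html` folder layout are assumptions for illustration, not something Zoom generates for you. The first function deals the documents out into equal-sized groups (one indexing run per group), and the second maps a drop-down choice to the search page for that group.

```javascript
// Hypothetical sketch: split a document list into `setCount` groups,
// dealing files out round-robin so each group has about the same number.
function partitionDocuments(files, setCount) {
  const sets = Array.from({ length: setCount }, () => []);
  files.forEach((file, i) => sets[i % setCount].push(file));
  return sets;
}

// Hypothetical folder layout: set N is indexed into "searchN/search.html".
function searchPageFor(setNumber, setCount) {
  if (!Number.isInteger(setNumber) || setNumber < 1 || setNumber > setCount) {
    throw new RangeError("set number must be between 1 and " + setCount);
  }
  return `search${setNumber}/search.html`;
}

// Made-up file names standing in for the ~850 PDFs.
const files = Array.from({ length: 850 }, (_, i) => `manual${i + 1}.pdf`);
const sets = partitionDocuments(files, 5);
console.log(sets.map((s) => s.length)); // [ 170, 170, 170, 170, 170 ]
console.log(searchPageFor(3, 5)); // search3/search.html
```

On the page itself, the drop-down's change handler would send the browser to the path returned by `searchPageFor`. Splitting by file count does not guarantee an even split of unique words, so grouping related manuals together may shrink each index more than a blind round-robin.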
