Looking for word occurrences within files

  • Looking for word occurrences within files

    I can't find how to make the Zoom engine return, as search results, all occurrences of a character string within files, rather than only the names of the files where the string is found (which is not very useful in my opinion, particularly if the files are quite large).

    A hint, please?

    Martin

  • #2
    By default Zoom DOES display the contents of the file. It displays a small section of the document that contains your search word (context results).

    But there are various ways that you can turn this off, and I am guessing that this is what you have done. The most common issue is that you are using the JavaScript option, which doesn't display context results. If this is the case, try switching to the PHP, ASP or CGI option.

    If this is not the issue, can you:
    1/ Tell us what version of Zoom you are using
    2/ Post an example of what your results look like
    3/ Give us the URL of the site you are trying to index

    --------
    David


    • #3
      David,

      Originally posted by Wrensoft
      But there are various ways that you can turn this off, and I am guessing that this is what you have done. The most common issue is that you are using the JavaScript option, which doesn't display context results.
      You are right, I used JavaScript.

      Originally posted by Wrensoft
      If this is the case, try switching to the PHP, ASP or CGI option.

      If this is not the issue, can you:
      1/ Tell us what version of Zoom you are using
      2/ Post an example of what your results look like
      3/ Give us the URL of the site you are trying to index
      OK, I put two files (one PDF and one HTML, but with the same content) on http://www.umce.ca/cours/martin/Testing_Zoom/
      and created an index with Zoom (v 4). If you use the search engine (...Testing_Zoom/search.php) with, for example, the word 'Cuba', you will get two items, the two file names. That is all. You will not be able to find the 62 occurrences of this word in each of these files.

      You see what I mean?

      Martin


      • #4
        No? I went to the following search:

        http://www.umce.ca/cours/martin/Test...oom_query=cuba

        And it returns a filename for each result, and a context description showing where "cuba" appears within the content of the PDF file and the HTML page.

        Perhaps you've changed your configuration since your last post? Note (if you haven't already) that the appearance of your search results is configured from the "Results Layout" tab in the Configuration window. Your previous posts sounded like you simply had "context description" disabled.

        Let us know if you have any questions.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          Hello Raymond,

          Originally posted by Ray
          No? I went to the following search:

          http://www.umce.ca/cours/martin/Test...oom_query=cuba

          And it returns a filename for each result, and a context description showing where "cuba" appears within the content of the PDF file and the HTML page.
          Hmm, there is no way to attach files here, otherwise I would show you what I get when I click on the above link. Would you please tell me how many results you are getting? For my part, I get only two results, one for each of the files, with the word highlighted in its context (yes, this option was enabled).

          By comparison, Acrobat Reader will find (in the PDF file only, of course) 62 occurrences of the word 'cuba', each of these occurrences reachable with a click.

          And now?

          Martin


          • #6
            Well, I did not realize this limitation of the program (Google is no better in that regard). But Zoom has qualities too.

            Martin


            • #7
              The above URL for your site is no longer available, but my guess is that you are seeing normal behaviour and have some misunderstanding about what to expect in the results.

              Remember that Zoom (and Google) search across multiple documents to find the document that has the most relevance. Acrobat Reader is locating a search word within a single document.

              Zoom determines relevance (and thus the order of your search results) depending on the number of times the search words occur in each document (so yes, it does realize internally that there are 62 occurrences of the word in that document), but it only shows the first one in the context description. If you do a multiple-word search such as "cuba cars", you'll see the first occurrence of each of these words in the context description.

              However, it would make very little sense to display ALL occurrences of these words for EVERY document. It would make for a very long and crowded search result once you start searching more than a few files (unlike your two-file example).

              So the idea is that Zoom will find documents for your search, sorted by relevance. You can then look inside each of these documents to find each occurrence of the word.

              PS. One thing I should note is that the "Terms matched:" line seems to be commonly misunderstood as "Number of words found". This is NOT what it means. Terms matched is the number of search terms (from your query) that were found. E.g. "cuba" is only one search term, so ALL pages found will say "Terms matched: 1".
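
              To make the difference between occurrences, context and "Terms matched" concrete, here is a minimal Python sketch. It is an illustration only and not Zoom's actual code: the function names (`tokenize`, `first_context`, `search`), the result fields, and the scoring by raw occurrence count are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a simplification of real indexing."""
    return re.findall(r"\w+", text.lower())

def first_context(text, term, width=30):
    """Return a snippet around the FIRST occurrence of term, or None."""
    match = re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    return "..." + text[start:end].strip() + "..."

def search(docs, query):
    """docs maps file name -> text. Results are sorted by occurrence count."""
    # Deduplicate query terms while keeping their order.
    terms = list(dict.fromkeys(tokenize(query)))
    results = []
    for name, text in docs.items():
        words = tokenize(text)
        counts = {t: words.count(t) for t in terms}
        matched = [t for t in terms if counts[t] > 0]
        if not matched:
            continue
        results.append({
            "file": name,
            # Hypothetical score: total occurrences of all query terms.
            "score": sum(counts.values()),
            # Number of QUERY TERMS found, not number of occurrences.
            "terms_matched": len(matched),
            # Only the first occurrence of each matched term is shown.
            "context": [first_context(text, t) for t in matched],
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

              With a one-word query such as "cuba", every file returned by this sketch reports `terms_matched` of 1, however many times the word occurs in it; the occurrence count only affects the ordering.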
              --Ray


              • #8
                Hello Ray,

                Originally posted by Ray
                The above URL for your site is no longer available, but my guess is that you are seeing normal behaviour and have some misunderstanding about what to expect in the results.
                Indeed, I removed it as it was no longer useful.

                Originally posted by Ray
                Remember that Zoom (and Google) search across multiple documents to find the document that has the most relevance. Acrobat Reader is locating a search word within a single document.
                I agree, but Acrobat Reader can extend a search to a whole bunch of PDF files located in the same directory, which is extremely useful for my purpose. I just wanted to find a search engine that could do the same with HTML files (or both types of files).

                (snip...)

                Originally posted by Ray
                However, it would make very little sense to display ALL occurrences of these words for EVERY document. It would make for a very long and crowded search result once you start searching more than a few files (unlike your two-file example).
                Well, that depends. For somebody searching many small files, it is OK to point only to the file names. If you are searching through a few large files, a detailed list of all occurrences is then useful.
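
                For a few large files, the kind of listing I have in mind could be sketched like this in Python. This is only a rough illustration of the idea, not a Zoom feature: the function name `find_all_occurrences`, the snippet width, and the assumption that the file contents are already available as plain text are all mine.

```python
import re

def find_all_occurrences(docs, word, width=20):
    """Return (file name, occurrence number, snippet) for every match.

    docs maps file name -> plain text (assumed already extracted
    from the HTML or PDF file).
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in docs.items():
        # Number the occurrences per file, starting at 1.
        for n, match in enumerate(pattern.finditer(text), start=1):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((name, n, text[start:end]))
    return hits
```

                Each tuple gives the file, the rank of the occurrence within that file, and a little surrounding text, which is roughly what Acrobat Reader shows for a directory of PDF files.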

                (snip)

                Originally posted by Ray
                One thing I should note is that the "Terms matched:" line seems to be commonly misunderstood as "Number of words found". This is NOT what it means. Terms matched is the number of search terms (from your query) that were found. E.g. "cuba" is only one search term, so ALL pages found will say "Terms matched: 1".
                I agree with you on that point; this is slightly misleading.

                I understand my problem better now, thank you.

                Martin


                • #9
                  We decided that listing all occurrences on the search page would crowd the results too much. It would also slow down searches a lot, making it impractical for a website search. You may notice that Acrobat Reader takes a long time to search through groups of files.

                  Also, for your information (and for others reading this based on the name of this thread), you can add extra capability to highlight and scroll to occurrences of words within an HTML document with this feature:
                  http://www.wrensoft.com/zoom/support/highlighting.html
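
                  As a rough idea of what highlighting occurrences involves in general, here is a small Python sketch. It is not the feature described at the link above: the function name `highlight`, the `hit-N` ids and the `highlight` CSS class are hypothetical, and the sketch assumes plain text input rather than a full HTML page.

```python
import html
import re

def highlight(text, word):
    """Wrap each occurrence of word in a numbered <span>.

    Returns (html_string, number_of_hits). The numbered ids give a
    page something to scroll to, e.g. a link to "#hit-3".
    """
    pattern = r"(\b" + re.escape(word) + r"\b)"
    # With one capturing group, re.split puts matches at odd indices.
    pieces = re.split(pattern, text, flags=re.IGNORECASE)
    out = []
    hits = 0
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            hits += 1
            out.append(
                f'<span id="hit-{hits}" class="highlight">'
                f"{html.escape(piece)}</span>"
            )
        else:
            # Escape the surrounding text so it is safe to embed in HTML.
            out.append(html.escape(piece))
    return "".join(out), hits
```

                  Text and matches are escaped separately so that the search word is never matched inside an HTML entity such as `&amp;`.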
                  --Ray
