Apostrophes & PDF files


  • Apostrophes & PDF files

    Hi: A Zoom_index.js file generated with Zoom 5 from pdf files has the apostrophes coded as ' (with apostrophes checked in the Zoom configuration options), e.g. Tom's is indexed as tom's. A search of the pdf file for Tom's then doesn't find it (Adobe Reader). If I replace all the ' groups in the Zoom_index.js file with ' the search works fine (see exception below). If the apostrophe option is unchecked in the Zoom configuration, the search also fails (Adobe seems to think it is searching for "tom s" and doesn't find it). I have not tried this with file types other than pdf.
    The exception mentioned above: if Tom's is followed by a comma, e.g. Tom's, then Adobe doesn't find it initially, but if New Search is selected in Adobe Reader and the same search (e.g. Tom's,) is redone, the search is successful (after the ' groups are replaced).
    I hope this post is clear enough. Any suggestions - especially about the second problem?
    Walter K.
    Last edited by whk; Dec-31-2006, 07:00 PM. Reason: clarification

  • #2
    Isn't this really a bug in Adobe Reader?

    How are you doing the search in Adobe Reader? Directly from the Zoom search result, or by using CTRL-F from within Adobe Reader?

    If I replace all the ' groups in the Zoom_index.js file with ' the search works
    These two characters look the same to me. Even when I look at them in hex, they are both the same character (0x27). So I am confused by this comment.


    • #3
      Originally posted by wrensoft View Post
      Isn't this really a bug in Adobe Reader?

      How are you doing the search in Adobe Reader? Directly from the Zoom search result, or by using CTRL-F from within Adobe Reader?


      These two characters look the same to me. Even when I look at them in hex, they are both the same character (0x27). So I am confused by this comment.
      The original zoom_index.js file generated by Zoom has, in place of an apostrophe, an ampersand symbol followed by a # symbol and then 39;. I put this in my first post, but apparently it got translated back to an apostrophe by the posting process. No wonder you got confused; I should have previewed the post. Anyway, when I use Zoom to search pdf files for "Tom's" after generating the Zoom search files for a site that includes several pdf files, it reports that there are no instances of "Tom's", even though there really are. If I replace all instances of ampersand, # and 39; in the zoom_index.js file with a real apostrophe, then the search, using Zoom, works, and clicking on a result brings up Adobe Reader with the search term showing and highlighted. The one exception I found was the strange result when "Tom's" is followed by a comma. In this case, a Zoom search correctly reports (after I do the replacement mentioned above) which pdf files contain "Tom's", but clicking on one brings up Adobe Reader saying there are no instances of "Tom's". Clicking on New Search in the Reader and repeating the search within the Reader then shows the instances, jumping to and highlighting them. I don't know if this is a bug in Zoom or in the Reader.
      Note the quotes around Tom's in this post are for clarity only; I don't use them in the search. Also, other terms with apostrophes besides Tom's give similar results.
      Walter K.
      Note added in edit: Searching for a substring with Zoom substring matching enabled, and then clicking on a pdf listed on the Zoom Results page brings up the reader with a Not Found message. However, clicking on "new search" in the Reader and redoing the same substring search within the Reader then displays the strings containing the substring. This behavior is similar to that described for "Tom's," at the end of my post above. PDF files are tricky.
      Last edited by whk; Jan-01-2007, 07:30 PM. Reason: More findings
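The manual workaround described in this post (swapping the entity-coded apostrophe for a real one) can be sketched as a small script. This is only an illustration under stated assumptions: the function name `decodeApostropheEntities` and the sample string are hypothetical, and nothing here reflects how Zoom itself builds or reads zoom_index.js. Work on a backup copy of the file if you try something like this.

```javascript
// Hypothetical helper, not part of Zoom: replace every "&#39;" sequence
// (ampersand, #, 39, semicolon) in the index text with a literal apostrophe.
function decodeApostropheEntities(indexText) {
  // split/join replaces all occurrences without needing a regular expression
  return indexText.split("&#39;").join("'");
}

// Made-up sample; the real layout of zoom_index.js is not shown in this thread.
const sample = "tom&#39;s president&#39;s";
const fixed = decodeApostropheEntities(sample);
console.log(fixed); // tom's president's

// The replacement character is the plain ASCII apostrophe, code 0x27.
console.log(fixed.charCodeAt(3).toString(16)); // 27
```

The second log line also shows why the two forms looked identical once the forum had decoded the entity: both end up as the same character, 0x27.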


      • #4
        I did some testing and I agree. There appears to be a bug in the V5.0 build 1001 Javascript search option when searching for words with apostrophes and when apostrophes are selected as a join character. It doesn't appear to be limited to just PDF content.

        We'll have a closer look at the best way to fix it and hopefully make a fix available in the next patch release. (Probably around the 20 / Jan / 07).

        We'll also make a note of the other suggestion (having @ as a join char).


        • #5
          Originally posted by wrensoft View Post
          I did some testing and I agree. There appears to be a bug in the V5.0 build 1001 Javascript search option when searching for words with apostrophes and when apostrophes are selected as a join character. It doesn't appear to be limited to just PDF content.

          We'll have a closer look at the best way to fix it and hopefully make a fix available in the next patch release. (Probably around the 20 / Jan / 07).

          We'll also make a note of the other suggestion (having @ as a join char).
          The apostrophe/PDF problem does not appear to have been fixed unless there is some configuration setting I am not making. For example, indexing a PDF file that contains "president's" gives an index that contains "president" but not "president's". Searching for the latter then gives no result, of course. This is different behavior than with the beta version of Zoom 5 (see my previous posts on this subject) but obviously still not right.
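To make the "join character" idea concrete, here is a minimal sketch of a word splitter that either keeps or drops the apostrophe. It is an assumption-laden illustration only: `tokenize` and its `joinChars` parameter are hypothetical names, and this is not Zoom's indexing code. It simply shows why an index can end up holding "president" without "president's" when the apostrophe is not treated as part of the word.

```javascript
// Hypothetical tokenizer, not Zoom's implementation.
// joinChars: characters that should be kept inside words (e.g. "'").
function tokenize(text, joinChars) {
  // Escape characters that have a special meaning inside a regex character class
  const escaped = joinChars.replace(/[\\\]\[^-]/g, "\\$&");
  // Anything that is not a letter, digit, or join character separates words
  const splitter = new RegExp("[^a-z0-9" + escaped + "]+");
  return text
    .toLowerCase()
    .split(splitter)
    .filter(function (word) { return word.length > 0; });
}

// Apostrophe treated as a join character: the word stays whole.
console.log(tokenize("The president's speech", "'"));
// [ 'the', "president's", 'speech' ]

// No join characters: the apostrophe splits the word in two.
console.log(tokenize("The president's speech", ""));
// [ 'the', 'president', 's', 'speech' ]
```

For a search to match, the query has to be split with the same rule as the indexed text; a mismatch between the two would produce the kind of "no results" behavior reported here.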
          On a different topic, it would be nice if, when the search form is opened, the focus would be on the text box so the user could just start typing the search term instead of having to click on the text box first.
          Walter K.


          • #6
            Originally posted by whk View Post
            The apostrophe/PDF problem does not appear to have been fixed unless there is some configuration setting I am not making.
            We've confirmed that the problem with searching for words containing the apostrophe character (with "apostrophes" enabled as a word join character) is still occurring in the Javascript version from the latest build (5.0.1004). We'll have it fixed for the next build.

            Originally posted by whk View Post
            On a different topic, it would be nice if, when the search form is opened, the focus would be on the text box so the user could just start typing the search term instead of having to click on the text box first.
            This requires the use of additional client-side Javascript. We purposely do not provide any Javascript in the default search template, so as to avoid potential problems with other scripts that the user may insert into their search page (e.g. Javascript navigation menus, scripting that places the cursor on a different field, etc.). So it is by design that we leave it to the user to add the Javascript to do this.

            We have previously discussed how to do this in another thread:
            http://www.wrensoft.com/forum/showthread.php?t=42
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
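For readers who want the focus behavior requested above, a minimal client-side sketch follows. It is not taken from the linked thread, and it rests on assumptions: the helper name `focusSearchBox` is hypothetical, and the field name "zoom_query" is assumed to match the text box in your search template, so check your own page's markup before using it.

```javascript
// Hypothetical helper: put the cursor in the first field with the given name.
// Returns true if a field was found and focused, false otherwise.
function focusSearchBox(doc, fieldName) {
  const fields = doc.getElementsByName(fieldName);
  if (fields.length === 0) {
    return false; // no such field on this page, so do nothing
  }
  fields[0].focus();
  return true;
}

// Only hook into the page when running in a browser.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  window.addEventListener("load", function () {
    // "zoom_query" is an assumed field name; adjust to your search form.
    focusSearchBox(document, "zoom_query");
  });
}
```

Passing the document in as a parameter keeps the helper easy to test, and the early return avoids errors on pages where the search box is absent. As noted in the reply above, a script like this can conflict with other scripts that also set focus, so add it only where that is not a concern.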
