Invalid characters for strings extracted from PDF, DOC, ...

  • Invalid characters for strings extracted from PDF, DOC, ...

    Our intranet is in French and Dutch, and the charset of the pages is set to "iso-8859-1".
    In Zoom Search Engine, I set the Languages option to: Specify other encoding = iso-8859-1 (English / Latin 1).

    The search results show correct strings for indexed pages, but strings extracted from documents (PDF, DOC, ...) contain invalid characters.

    I have attached a screenshot:

    (http://www.tiikoni.com/tis/view/?id=e77b641)
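
For readers following along, here is a minimal Python sketch of how this kind of symptom can arise. It assumes the cause is a mismatch between the encoding of the extracted text and the charset the page declares, which the thread has not confirmed. The sample string is hypothetical and is not taken from the poster's documents.

```python
# Hypothetical sample text; the real strings come from the indexed documents.
original = "Données privées"

# Case 1: text stored as UTF-8 but displayed on a page declared as iso-8859-1.
# Each byte of a multi-byte UTF-8 character is shown as a separate character.
utf8_bytes = original.encode("utf-8")
shown_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Case 2: text stored as iso-8859-1 but displayed on a page declared as UTF-8.
# The accented bytes are not valid UTF-8, so they become replacement characters.
latin1_bytes = original.encode("iso-8859-1")
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# When the declared charset matches the stored bytes, the text is intact.
shown_correctly = utf8_bytes.decode("utf-8")

print(shown_as_latin1)
print(shown_as_utf8)
print(shown_correctly)
```

Either direction of mismatch produces garbled output, so comparing the garbled characters in the screenshot with these two patterns can hint at which side has the wrong encoding.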

  • #2
    Is the search page wrapped within another page?

    For example, if you are using PHP ("search.php"), is this included within another PHP page?

    Tell us which platform (PHP, ASP, CGI, etc.) and which script you are using, and whether you are accessing the script directly. If not, try accessing the script directly.

    Also tell us if you have edited the script in any way.

    Ideally, if you can give us the URL to your search page, we can take a look.

    Also make sure the meta charset tag on your "search_template.html" file is consistent with your other settings.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
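
The consistency check suggested above can be sketched in Python as follows. This is only an illustration: the helper names `find_meta_charset` and `charset_matches` are hypothetical, the regular expression is a rough approximation rather than a full HTML parser, and nothing here is part of Zoom Search Engine itself.

```python
import re
from typing import Optional

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_CHARSET = re.compile(
    r"<meta\b[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def find_meta_charset(html: str) -> Optional[str]:
    """Return the charset declared in a meta tag, lowercased, or None if absent."""
    match = _META_CHARSET.search(html)
    return match.group(1).lower() if match else None


def charset_matches(html: str, expected: str) -> bool:
    """True only if a meta charset is declared and equals the expected encoding."""
    found = find_meta_charset(html)
    return found is not None and found == expected.lower()


# Hypothetical template contents, for illustration only.
template = '<html><head><meta charset="iso-8859-1"></head><body></body></html>'
print(find_meta_charset(template))
print(charset_matches(template, "utf-8"))
```

Running `find_meta_charset` on the contents of "search_template.html" and on the wrapping page would show whether either one declares a charset and whether the two agree with the encoding chosen in the indexer.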



    • #3
      Yes, search.php is included in a page that carries the intranet design (so I can't give you a URL).

      When I access search.php directly, the problem is the same.

      No, the script has not been edited.

      I can't find any meta charset tag in search_template.html, only in my index.php (which includes search.php).

      -----------

      The situation is better if:
      - I run Zoom with Language set to "Use Unicode (UTF-8 encoding)";
      - I change the charset of index.php from iso-8859-1 to utf-8.

      But I still get invalid characters in the document titles (in both index.php and search.php).
      See image: http://www.tiikoni.com/tis/view/?id=86e4e31



      • #4
        Very likely it is a character set issue. But if we can't get access to the site, can you e-mail us:

        1) A couple of the documents in question, the ones with the corrupted text
        2) Your Zoom configuration file. Save it from the File menu.
        3) Your search_template.html file.



        • #5
          Email sent.

          Thanks for your support!
