Language problems %XX characters in links with UTF-8


  • Language problems %XX characters in links with UTF-8

    Hi there, great software, good work. I have been using it since V3!
    Now, for the first time, I am running into trouble:
    I am indexing our in-house file server offline; it contains xls, pdf, ppt and doc files.
    Indexing works without problems, and everything is done in UTF-8.
    The problem is the display of the search results:
    the file names of all indexed documents contain non-standard characters, e.g. a space or an ä, ö, or ü, and these look wrong on the results page.
    The UTF-8 is not translated, so instead of a space there is %20,
    ä yields %E4 and so on. This happens only in the links; the displayed file content is correct.
    What am I doing wrong?

    Best regards,
    Michael

  • #2
    Actually, this is correct behaviour. I assume that when you say it is "just happening in the links", you are referring to what appears under each search result, alongside the "Score:" etc., prefixed with "URL: ", and also to what appears in the browser's status bar when you hover your cursor over the link.

    What you are seeing is not really the result of using UTF-8. It is in fact known as "URL encoding". URLs must be encoded because a number of characters are not allowed to appear in them literally (including spaces and the non-ASCII characters you mentioned). Browsers like IE hide this from you and let you enter URLs with spaces, but they automatically convert those characters to percent-encoding behind the scenes. Not all browsers do this, so we cannot leave the characters unencoded.

    More information on URL encoding here: http://en.wikipedia.org/wiki/Percent-encoding
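
As a rough illustration of the percent-encoding described above, here is a minimal Python sketch using the standard library's `urllib.parse`. The file name is made up for the example, and this is not how Zoom itself builds its links. It shows that a space becomes %20 in either case, while ä becomes %E4 only when its Latin-1 byte is encoded; encoding its UTF-8 bytes would give %C3%A4 instead.

```python
from urllib.parse import quote, unquote

# Hypothetical file name, chosen only for illustration.
name = "Geschäftsbericht 2008.xls"

# Percent-encode the UTF-8 bytes of the name (quote's default encoding).
utf8_link = quote(name)

# Percent-encode the Latin-1 (ISO-8859-1) bytes of the same name instead.
latin1_link = quote(name, encoding="latin-1")

print(utf8_link)    # Gesch%C3%A4ftsbericht%202008.xls
print(latin1_link)  # Gesch%E4ftsbericht%202008.xls

# Decoding reverses the process, provided the same character encoding is used.
decoded = unquote(latin1_link, encoding="latin-1")
print(decoded)      # Geschäftsbericht 2008.xls
```

Either encoded form is a valid URL; the browser or server turns it back into the real file name when the link is followed.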
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
