Special chars not showing up right in cgi search

    Hi,
    I have the Enterprise Edition, and I am indexing UTF-8 HTML content with the Unicode character set selected on the Language configuration tab.

    This source content sometimes contains many ® and © symbols. When the HTML pages are served directly, these characters display fine. But when I search using the CGI search and a result contains one of these characters, it shows up as "�" or "?" in Internet Explorer, Firefox, and Opera, on both Windows and Linux.
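The two symptoms usually point to two different failure modes. The sketch below is a general illustration in Python, not a claim about what the Zoom CGI does internally: "�" (U+FFFD) appears when a browser decodes a page as UTF-8 and meets bytes that are not valid UTF-8, while a literal "?" usually means the text was converted to a character set that cannot represent the symbol, so the converter substituted a question mark.

```python
# Symptom 1: "�" (U+FFFD). "©" stored as the single Latin-1 byte 0xA9 is not
# valid UTF-8, so a UTF-8 decoder (e.g. the browser) substitutes U+FFFD.
latin1_bytes = "©".encode("latin-1")                       # b'\xa9'
as_seen_in_utf8_page = latin1_bytes.decode("utf-8", errors="replace")
print(repr(as_seen_in_utf8_page))                          # '\ufffd'

# Symptom 2: "?". Converting "©" to a charset that has no such character
# (here ASCII) with a replacing error handler yields a question mark.
ascii_bytes = "©".encode("ascii", errors="replace")
print(ascii_bytes)                                         # b'?'
```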

    I have tried both build 1000 and build 1001 of the 5.0 indexer. Is there a way to make these characters display correctly?

    Thanks,
    D

  • #2
    Do the copyright symbols appear in the HTML source as character entities or as ordinary characters? An HTML character entity looks like this: "&copy;".
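To make the distinction concrete, here is a small Python sketch using the standard `html` module: the named and numeric entity spellings all stand for the same character, while a literal "©" or "®" saved in a UTF-8 file is stored as two bytes. The particular entity list is just an illustrative sample.

```python
import html

# Entity spellings (named and numeric) and the characters they stand for
ENTITIES = {"&copy;": "©", "&#169;": "©", "&reg;": "®", "&#174;": "®"}
decoded = {entity: html.unescape(entity) for entity in ENTITIES}

# Byte form of each literal character when the file is saved as UTF-8
utf8_bytes = {ch: ch.encode("utf-8") for ch in ("©", "®")}

print(decoded)
print(utf8_bytes)  # {'©': b'\xc2\xa9', '®': b'\xc2\xae'}
```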

    Are you sure that the search template file is also UTF-8?
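One quick way to check the template's encoding is to test whether its raw bytes decode cleanly as UTF-8. This is a minimal sketch; the script name `check_utf8.py` and the file name `search_template.html` are hypothetical placeholders. Note that a file containing only ASCII always passes, so the check is only informative when the template actually contains non-ASCII characters; it also does not check the page's declared charset.

```python
import sys
from pathlib import Path

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(b"\xef\xbb\xbf")

if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py search_template.html
    for name in sys.argv[1:]:
        raw = Path(name).read_bytes()
        status = "valid UTF-8" if is_valid_utf8(raw) else "NOT valid UTF-8"
        bom = " (with BOM)" if has_utf8_bom(raw) else ""
        print(f"{name}: {status}{bom}")
```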

    What font set are you using on the result page? Arial or something more obscure?

    Can you post the URL of the search page and/or the source document so that we can see the problem? That makes it much easier to investigate.



    • #3
      Hi again,
      Those were some really good suggestions. We tried them all, but to no avail:

      We used both character entities and ordinary characters, all in UTF-8.
      Our font was Arial.
      The template was UTF-8; we also tried the default template.
      We tried a different web server (first shttpd, then Apache).
      The encoding on the Language tab in the indexer was still set to UTF-8.

      We set up a test here:
      http://www.gogroundline.com/cgi-bin/Linux_search.cgi

      Most of the links will not work because we only indexed one page. If you search for 'Groupwise', the problem will show up.

      I am attaching part of one of the documents that shows this error. It is included as 'test.html' in the accompanying zip file. The index files are also attached.

      Thanks,
      D
      Last edited by dbuck; Jan-10-2007, 06:36 PM. Reason: spelling



      • #4
        We've confirmed the problem in the latest build (5.0.1001): copyright, trademark and various other symbols may not appear correctly in the context description on a UTF-8 page. This will be fixed in the upcoming build (5.0.1002).
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
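The thread does not say what caused the bug in 5.0.1001. As a general illustration only, one common way a context snippet ends up with "�" is cutting the UTF-8 bytes at a fixed length, which can split a two-byte character such as "®" in half. The Python sketch below contrasts a naive byte cut with one that drops the incomplete tail; the sample string and function names are hypothetical.

```python
def truncate_naive(text: str, max_bytes: int) -> str:
    # Cuts the UTF-8 bytes at an arbitrary offset, possibly mid-character;
    # the dangling lead byte then decodes as U+FFFD.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="replace")

def truncate_safe(text: str, max_bytes: int) -> str:
    # Cuts at the same offset, then drops the incomplete trailing sequence.
    # The input was valid UTF-8, so only the cut tail can be invalid.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

snippet = "Groupwise® support"
print(repr(truncate_naive(snippet, 10)))  # 'Groupwise\ufffd'
print(repr(truncate_safe(snippet, 10)))   # 'Groupwise'
```

The safe variant may return slightly fewer bytes than requested, which is usually preferable to emitting a broken character.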



        • #5
          Thanks!

          D
