Search for Danish capitalised ligatures: Æ, Ø and Å

  • Search for Danish capitalised ligatures: Æ, Ø and Å

    Hi,
    Thanks for a really fine and easy-to-use program. I downloaded it yesterday, version 5.1 (build 1017).

    I have one problem though: if I search for words that contain the capitalised Danish ligatures Æ, Ø and Å, the engine doesn't seem to recognise them. It finds the lower-case ligatures fine (æ, ø and å), but not the capitalised ones.

    My website is in Unicode (UTF-8), I have selected the "Use Unicode (UTF-8 encoding)" option, and the platform is Linux/PHP. I have disabled the insensitivity to ligatures because of the known bug with the display of ligatures in the search result response: Search results for: search word …

    I also tried to replace Æ, Ø and Å with their HTML entities, but it doesn't help.

    Is this a bug, or am I doing something wrong?

  • #2
    I've just tested this and could not reproduce the problem. I was able to search for words containing these characters fine, in upper or lowercase.

    I presume you are referring to the actual search results, not to highlighting. Note that the current version has a known issue with not highlighting some words that contain accented characters.

    You should check if you have enabled "Support single-case languages" on the Languages tab of the Configuration window. This should be disabled (that is, the checkbox should be cleared).

    Can you provide us with a URL to your search page?
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

    • #3
      Yes, I am referring to the search results, not the highlighting.

      Something is odd. "Support single-case languages" is disabled, and the search for upper-case ligatures still doesn't work. If I search for the Danish word "årstiderne", the engine doesn't find it, but if I search for "rstiderne" (the word without the ligature å), the engine finds it.

      I just tried enabling "Support single-case languages" (the checkbox is ticked), and then the engine finds the upper-case ligatures, but the search becomes case sensitive.

      The search-page address is: http://www.designlinien.dk/search.php

      Page with upper and lower case ligatures: http://www.designlinien.dk/jungs-mandala.html

      I have since disabled "Support single-case languages" again (the checkbox is cleared).

      The website's XHTML files are in Unicode (UTF-8), with the Unicode signature (BOM) included and with Unicode normalization form C, according to Dreamweaver.

      Thank you for the help
      Last edited by anne; Oct-03-2008, 03:26 PM.

      • #4
        I think we've found the cause of the problem.

        It appears that you have enabled "Substring match for all searches". First, I should point out that this is generally only recommended for Asian languages. For any Latin-based language (where spaces are used to separate words), it is usually not beneficial. For example, with this feature on, searching for the word "cat" will also return "category", "duplication", "authentication", "location", ... and anything else containing 'cat' within the word. This usually returns far more results than you would want.
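To make the difference concrete, here is a minimal Python sketch of the two matching modes. It is only an illustration of the general idea, not Zoom's actual implementation; the function names and the word list are made up for the example.

```python
def substring_match(term, words):
    """Return every word that contains `term` anywhere inside it."""
    term = term.lower()
    return [w for w in words if term in w.lower()]


def whole_word_match(term, words):
    """Return only the words that equal `term` (ignoring case)."""
    term = term.lower()
    return [w for w in words if w.lower() == term]


# Hypothetical index contents, taken from the example in the post above.
words = ["cat", "category", "duplication", "authentication", "location", "dog"]

print(substring_match("cat", words))   # every word containing "cat"
print(whole_word_match("cat", words))  # only the exact word
```

With substring matching, five of the six words come back for "cat"; with whole-word matching, only "cat" itself does.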

        By default, this option is disabled, and when I tested indexing the example page given above with this setting off, I was able to search for the word "årstiderne" without problem, in either upper or lowercase.

        But when I enabled this option, I was no longer able to search for it. It seems that the substring matching feature in PHP does not support certain UTF-8 characters. We'll add this to our list of things to look at for the next version.
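One plausible way such a failure can arise (an assumption for illustration, not the confirmed cause in Zoom's PHP script) is case folding that works on raw bytes and only knows ASCII. In UTF-8, "Å" and "å" are different two-byte sequences, so an ASCII-only lowercase step leaves them unequal. The Python sketch below mimics this with `bytes.lower()`, which converts only A-Z; the variable names are hypothetical.

```python
def ascii_only_lower(data: bytes) -> bytes:
    """Lowercase only ASCII letters; bytes of non-ASCII characters are untouched."""
    return data.lower()


# Word as it might appear on the page, and the query typed by the user.
page_word = "Årstiderne"
query = "årstiderne"

# Byte-level comparison: "Å" (C3 85) never becomes "å" (C3 A5).
byte_level_hit = ascii_only_lower(query.encode("utf-8")) in ascii_only_lower(
    page_word.encode("utf-8")
)

# Unicode-aware comparison: str.lower() maps "Å" to "å".
unicode_hit = query.lower() in page_word.lower()

# Dropping the first letter avoids the problem entirely.
partial_hit = ascii_only_lower(b"rstiderne") in ascii_only_lower(
    page_word.encode("utf-8")
)

print(byte_level_hit, unicode_hit, partial_hit)
```

This reproduces the symptom reported in #3: the full word is missed by the byte-level comparison, while "rstiderne" is found.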

        Having said that, I think you would likely want to turn off "substring matching" and it will in turn fix this problem. If you sometimes need a partial text match, you can use wildcards as described here.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine

        • #5
          Hi,
          Thank you for tracking down the cause and for planning to look at the problem in the next version.

          The reason I turned on "substring matching" is the nature of the Danish language. For example, the two English words "search engine" are a single word in Danish, like this: "searchengine". So with "substring matching" turned off, the search results become too narrow. Of course, turning it on also helps with the issue of "matching plural form of words to their singular forms".
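As a rough illustration of how the wildcards suggested in #4 could cover compound words and plural forms without global substring matching, here is a small Python sketch. It assumes shell-style `*` wildcards, which may differ from Zoom's actual wildcard syntax, and the Danish word list is invented for the example.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern, words):
    """Return the words matching a shell-style pattern, ignoring case."""
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]


# Hypothetical indexed words: "søgemaskine" is the compound "search engine".
words = ["søgemaskine", "maskine", "søgning", "Årstiderne", "årstid"]

print(wildcard_match("*maskine", words))  # compound word and its last part
print(wildcard_match("årstid*", words))   # singular and definite plural
```

The person searching decides per query where partial matching is wanted, instead of every search returning every word that contains the term.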

          Thanks again. I am looking forward to the new version; there are lots of great new features in it.
