Problems with accents in search


  • Problems with accents in search

    I imported my list of about 600 synonyms, and everything seemed to work fine until I searched for a word with an accent.

    One of the synonyms I imported was:
    Sjögren's=Sjögrens,Sjögren,Sjogren's,Sjogrens,Sjog ren

    I searched for Sjogrens and got the following result:

    Search results for: sjogrens in all categories
    The following word(s) are in the skip word list and have been omitted from your search: "sjogrens"
    No results found.
    The live site for testing is here: http://www.icdmeister.com/site/

    I have the checkbox to "Enable accent/diacritic insensitivity" checked for all 3 options.

    I don't understand why the search isn't recognizing these words set as synonyms. What should I do to make this work properly for words with accented characters?

    Thanks,

    Andrew

  • #2
    Do you have sjogrens as a word in your skip list?
    Do you have apostrophe as a join character?

    Also you can't use phrases as synonyms.



    • #3
      There are several problems here.

      Originally posted by aschecht:
      One of the synonyms I imported was:
      Sjögren's=Sjögrens,Sjögren,Sjogren's,Sjogrens,Sjog ren
      You should check if you have a space in that last synonym there. Space characters are invalid in synonyms because phrases (that is, multiple words) are not supported.

      I have just confirmed that although we prevent people from entering it in manually from the GUI, it is still possible to import a word with a space in it. We will have to fix this for the next build.
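A minimal sketch of the kind of import-time check described above, assuming each imported entry has the `Word=synonym,synonym,...` form shown in this thread. The function name `terms_with_spaces` is hypothetical and this is not Zoom's actual import code; it only illustrates rejecting terms that contain whitespace.

```python
def terms_with_spaces(entry):
    """Return every term in a 'Word=syn1,syn2,...' entry that contains whitespace.

    Hypothetical helper: phrases (multiple words) are not supported as
    synonyms, so any term with an inner space should be flagged on import.
    """
    word, _, synonyms = entry.partition("=")
    # Strip surrounding whitespace so only spaces *inside* a term are flagged.
    terms = [word.strip()] + [s.strip() for s in synonyms.split(",")]
    return [t for t in terms if any(ch.isspace() for ch in t)]


if __name__ == "__main__":
    entry = "Sjögren's=Sjögrens,Sjögren,Sjogren's,Sjogrens,Sjog ren"
    print(terms_with_spaces(entry))  # ['Sjog ren']
```

An empty result means the entry contains no phrases and could be imported as-is.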

      The next likely cause of the problem is tricky. What you need to remember is that the "word" part of the synonym (that is, the left hand side of the equals sign) needs to be something that is indexed. From the Users Guide and Help file:

      A synonym definition has two fields:
      1. Word: This is the word that the synonyms will be mapped to. It must be a word that actually appears in the content of your website.
      2. Synonyms: This is a list of words separated by commas that will be considered equivalent to the indexed word. When a user searches for any of these words, they will get the same search results as if they searched for the indexed word. All occurrences of the words in this synonym list will also appear as a search result when you search for the indexed word.
      Because you have accent insensitivity enabled, the word indexed is actually not the accented version, but the non-accented one.

      So if you change your synonym to the following, it should work:

      Code:
      Sjoegren's=Sjögrens,Sjögren,Sjogren's,Sjogrens,Sjogren
      We should probably change the way this works in the next major release. Note also that 'ö' is treated as the equivalent of 'oe' in the current release. In V6 we may also need to provide an option to switch between this more accurate, older linguistic behaviour and the modern-day "ö=o" behaviour.
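To illustrate the folding described above, here is a rough sketch in Python. It assumes the only special case is 'ö' becoming 'oe' (the one mapping confirmed in this thread) and that every other accented letter simply loses its accent. The names `SPECIAL_FOLDS` and `fold_accents` are hypothetical, and this is not the indexer's real code.

```python
import unicodedata

# Assumption: only "ö" -> "oe" is confirmed in this thread. Further
# special cases would be added here if the indexer is known to use them.
SPECIAL_FOLDS = {"ö": "oe"}


def fold_accents(word):
    """Return a lower-cased, accent-less form of `word` for index lookups."""
    # NFC first, so "o" + combining diaeresis is seen as the single letter "ö".
    word = unicodedata.normalize("NFC", word.lower())
    folded = []
    for ch in word:
        if ch in SPECIAL_FOLDS:
            folded.append(SPECIAL_FOLDS[ch])
        else:
            # Decompose the character and drop its combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            folded.append("".join(c for c in decomposed
                                  if not unicodedata.combining(c)))
    return "".join(folded)


if __name__ == "__main__":
    print(fold_accents("Sjögren's"))  # sjoegren's
    print(fold_accents("Sjogren's"))  # sjogren's
```

Under these assumptions the accented word in the content is stored as `sjoegren's`, which is why the left-hand side of the synonym entry has to use that spelling rather than `Sjögren's` or `Sjogren's`.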
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        Thanks. That's very clear.

        I followed the instructions and made sure that the Word in the synonym list actually appeared in the website. But because I had activated accent insensitivity, the Word was interpreted as "Word without accents", even though the accented form is the true content of the website. That accent-less form is therefore the correct listing for the Word in the synonyms list.

        Originally posted by Ray:
        Note also that 'ö' is currently the equivalent of 'oe' in the current release.
        That sure is tricky - so even though the word in my content is Sjögren's, I need the Word in the synonym list to be Sjoegren's.

        Are there any other accents for which I'll need to substitute something other than the letter without the accent (i.e. like the oe for the ö)?

        Thanks for helping me understand this.

        Andrew



        • #5
          Another accent problem

          OK. I changed the listing in my synonyms list to:

          Word: sjoegren's
          Synonyms: Sjögren's,Sjögrens,Sjögren,Sjogren's,Sjogrens,Sjog ren (note: there's no space in this last word on the synonyms list. It just keeps showing up in this forum post)

          I have Enable accent insensitivity activated. Apostrophe is not set as a character that splits words. The listing in my content is "Sjögren's" (no quotes)

          Here are my search results:
          • sjoegren's - finds correct results
          • Sjögren's - finds correct results
          • Sjögrens - "Did you mean: Sjoegren's or Sjögren's or Sjögrens?" - clicking the first two links goes to the correct results. Clicking the third seems to reload the same page.
          • Sjögren - same results as for Sjögrens
          • Sjogren's - finds correct results
          • Sjogrens - finds correct results
          • Sjogren - finds correct results

          I've had similar inconsistent results with Löffler, Stähli, & Münchausen.

          The site is available at www.icdmeister.com/site/ for testing purposes.

          Any ideas why I'm getting the inconsistent results?

          Andrew
          Last edited by aschecht; Jun-14-2008, 01:49 PM. Reason: explaining the space in the synonym list



          • #6
            OK. I was able to fix this by adding Sjoegrens and Sjoegren to the synonym list. I'll try the same thing with Ä = AE to see if that works and will report back.

            I agree with you Ray that this could be simplified in the next release.

            Andrew



            • #7
              Actually, what you should do is only use the accent-less version of the words in your synonym list, when you have "accent insensitivity" enabled.

              That is, if you have the following synonym entry:

              Word: sjoegren's
              Synonyms: Sjoegrens,Sjoegren,Sjogren's,Sjogrens,Sjogren

              .. and you have accent insensitivity enabled, then all of the following search words will return the same result: sjögren's, sjögrens, sjögren, sjoegren's, sjoegren, sjoegrens, sjogren's, sjogrens, sjogren.

              Perhaps we can avoid this confusion in the next release by converting all synonyms to accent-less versions when accent insensitivity is enabled.
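The conversion suggested above could look roughly like the following sketch. It assumes the same folding rule mentioned earlier in the thread ('ö' becomes 'oe', other accents are dropped) and the `Word=synonym,...` entry format. `fold`, `normalize_entry`, and `resolve` are hypothetical names, not part of Zoom.

```python
import unicodedata


def fold(word):
    """Accent-less, lower-case form. Assumes 'ö' -> 'oe'; other accents are dropped."""
    word = unicodedata.normalize("NFC", word.lower()).replace("ö", "oe")
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_entry(entry):
    """Fold both sides of 'Word=syn1,syn2,...' and drop duplicate synonyms."""
    word, _, synonyms = entry.partition("=")
    folded_word = fold(word.strip())
    folded_synonyms = []
    for synonym in synonyms.split(","):
        folded = fold(synonym.strip())
        # Skip empty terms, repeats, and synonyms identical to the word itself.
        if folded and folded != folded_word and folded not in folded_synonyms:
            folded_synonyms.append(folded)
    return folded_word, folded_synonyms


def resolve(query, word, synonyms):
    """Return the indexed word if the folded query matches it or a synonym."""
    folded = fold(query)
    return word if folded == word or folded in synonyms else None


if __name__ == "__main__":
    word, synonyms = normalize_entry(
        "Sjögren's=Sjögrens,Sjögren,Sjogren's,Sjogrens,Sjogren")
    print(word)      # sjoegren's
    print(synonyms)  # ['sjoegrens', 'sjoegren', "sjogren's", 'sjogrens', 'sjogren']
    print(resolve("Sjögrens", word, synonyms))  # sjoegren's
```

With the entry normalized this way, each of the nine search words listed above folds to either the word or one of its synonyms, so they all resolve to the same indexed word.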
              --Ray



              • #8
                Thanks.

                A.
