windows 1251 support


  • windows 1251 support

    Hi,

    I am evaluating your product right now, so please bear with me in the following posts.

    I am indexing one HTML page in Bulgarian, encoded in Windows 1251. When Zoom is set to use the English 1252 code page, I get these index entries:

    "аст,0,15",
    "монтиране,0,15",
    "пускане,0,15",
    "техни,0,15",
    "ески,0,15",

    But when I change the setting to Windows 1251, the index entries change to the correct ones:

    "част,0,15",
    "монтиране,0,15",
    "пускане,0,15",
    "технически,0,15",

    It seems reasonable that one should use the 1251 setting when indexing a 1251-encoded HTML page. But then why are the rest of the words still indexed correctly under the 1252 setting?

  • #2
    We don't know what text was on the page you indexed and don't speak Bulgarian. So we have no chance of knowing if the output was correct or not.

    But from your post I think you are saying that if you use the Zoom 1251 code page setting on a page that is encoded with 1251, then it gives a correct result.

    But if the result is correct, why did you post the question? I'm confused.

    There are also general guidelines for using Zoom with various languages here:
    http://www.wrensoft.com/zoom/support/languages.html

    ------
    David



    • #3
      My point is that with the Windows 1252 setting, the words in the index look almost exactly like the ones generated with Windows 1251.
      Let me use sample English words just for demonstration:

      windows 1251

      "head"
      "technical"

      windows 1252

      "ead"
      "tec"
      "nical"

      This is what generally happens with the Bulgarian words. Everything looks good when using Windows 1252, except for one missing letter (in this sample, 'h').
      My question is: why does it look this way? Why is only one letter missing?
      Shouldn't it all look like garbage when Windows 1252 is used?



      • #4
        I haven't checked in detail, but I assume that there is a fairly big overlap between the 1252 and 1251 character sets, so the output almost looks OK for words that use only the common subset of letters.
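
One way to reproduce the pattern from post #1 is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that the indexer keeps the raw bytes of each word and uses the selected code page only to decide which bytes count as letters. What is certain is that byte 0xF7 is the letter "ч" in Windows 1251 but the division sign "÷" in Windows 1252, so under 1252 it would act as a word separator. The function name `split_words` is hypothetical and is not part of Zoom.

```python
def split_words(raw: bytes, codepage: str) -> list[str]:
    """Split raw bytes into words, using `codepage` only to classify letters.

    The words are returned as they would appear when the stored bytes are
    later viewed as Windows 1251 (assumed behaviour, for illustration only).
    """
    text = raw.decode(codepage)
    words, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            # A non-letter ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    # Recover the original bytes of each word and display them as 1251.
    return [w.encode(codepage).decode("cp1251") for w in words]


# Two of the words from post #1, encoded the way the page is encoded.
page_bytes = "технически част".encode("cp1251")

print(split_words(page_bytes, "cp1251"))  # ['технически', 'част']
print(split_words(page_bytes, "cp1252"))  # ['техни', 'ески', 'аст']
```

Under this reading, only words containing "ч" (or any other byte that is not a letter in 1252) are split or truncated, which matches "монтиране" and "пускане" coming through unchanged.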

        If you want to have a mixed site with both Bulgarian and English, consider using the UTF-8 character set instead.

        ----
        David
