problems with arabic diacritic marks

  • mrbasserby
    replied
    In reply to Ray:
    Hi, sorry for my bad English. Yes, you got exactly what I mean: when I type Arabic with diacritics
    and the strip option is enabled, no match is found even if the word exists.
    And yes, you're right that users don't type diacritics, except in my case.
    I have about 1,330 pages, most of which contain Arabic text with diacritics from the Holy Quran. Users mostly look for diacritic text, as in the Holy Quran, as well as normal text, to find the exact information they want. So they do copy and paste diacritic Arabic text into the search box to find the exact text or statement they are looking for. Maybe I should add a script that strips the diacritics from the text entered in the search box of the designed page.
    But I need to do more tests, and I will tell you the results I get.
    Thank you.
    Last edited by mrbasserby; May-01-2012, 03:41 AM.
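The idea of stripping the diacritics from whatever is typed or pasted into the search box can be sketched as below. This is a minimal Python illustration under stated assumptions: the helper name `strip_tashkeel` is hypothetical, and the mark ranges (U+064B to U+065F plus the superscript alef U+0670) are a common choice, not necessarily the exact set Zoom's indexer removes. On a live search page the same step would have to run in JavaScript or inside the PHP search script.

```python
import re

# Hypothetical filter: Arabic harakat and related combining marks
# (U+064B..U+065F: tanwin, fatha, damma, kasra, shadda, sukun, ...)
# plus the superscript alef (U+0670). Base letters are left untouched.
TASHKEEL = re.compile("[\u064B-\u065F\u0670]")


def strip_tashkeel(text: str) -> str:
    """Return text with Arabic diacritic marks removed."""
    return TASHKEEL.sub("", text)


if __name__ == "__main__":
    pasted = "\u0627\u0644\u062a\u0651\u0627\u0628\u064f\u0648\u062a\u0650"  # التّابُوتِ
    print(strip_tashkeel(pasted))  # التابوت
```

Note that this only removes combining marks. Letters such as أ (U+0623) and إ (U+0625) are separate code points, so they pass through unchanged.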



  • Ray
    replied
    Just to clarify, it seems like you did not have this feature enabled before? So the behaviour is now different with the feature enabled?

    From your most recent examples, it seems that it now matches all occurrences on a web page (with diacritics and without) so long as the user enters the non-diacritic version of the word into the search box.

    However, it will not match if the user enters the diacritic version of the word into the search box.

    Correct me if the above summary is inaccurate.

    If this is the case, then it is currently behaving as designed. The indexer is capable of stripping diacritic marks from Arabic words because it runs on your computer. However, the search script (PHP, ASP, CGI, etc.) does not have this capability, because most hosting platforms are limited and it would be difficult to impose locale/regional settings on the web server (a lot of people are on shared hosting rather than dedicated servers).

    However, as you and the original poster of this thread have stated, most of the time Arabic users do not type diacritic marks when searching. So perhaps you can simply add some advice on the search_template.html page, before the search box, telling users to enter words without diacritic marks (and that this will match both the diacritic and non-diacritic versions found on pages).
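The behaviour Ray describes comes down to normalization having to be applied on both sides: whatever the indexer does to the words on a page must also be done to the query, or the lookup misses. The toy inverted index below is a hypothetical Python sketch (not Zoom's index format or search script; `build_index`, `search` and the page names are made up for illustration) that reproduces the reported result and then fixes it by normalizing the query with the same function.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical diacritic filter shared by indexer and search:
# harakat U+064B..U+065F and superscript alef U+0670.
TASHKEEL = re.compile("[\u064B-\u065F\u0670]")


def normalize(word: str) -> str:
    return TASHKEEL.sub("", word)


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized word to the set of page names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for word in text.split():
            index[normalize(word)].add(name)  # the indexer strips the marks
    return index


def search(index: dict[str, set[str]], query: str, normalize_query: bool) -> set[str]:
    """Exact-key lookup, optionally normalizing the query first."""
    key = normalize(query) if normalize_query else query
    return index.get(key, set())


PAGES = {
    "plain.html": "\u0627\u0644\u062a\u0627\u0628\u0648\u062a",  # التابوت
    "quran.html": "\u0627\u0644\u062a\u0651\u0627\u0628\u064f\u0648\u062a\u0650",  # التّابُوتِ
}

if __name__ == "__main__":
    index = build_index(PAGES)
    marked_query = PAGES["quran.html"]
    print(sorted(search(index, marked_query, normalize_query=False)))  # []
    print(sorted(search(index, marked_query, normalize_query=True)))   # ['plain.html', 'quran.html']
```

The first lookup mirrors the "no results" case: the index only holds the stripped key, while the query still carries its marks.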



  • mrbasserby
    replied
    Hi,
    yes, I did, but it's not working as I want. Here is another example to make it simpler:
    ------------
    With "Strip Arabic diacritic marks" enabled (and reindexed):
    Search for: التابوت
    Result: "التابوت" and "التّابُوتِ" (OK, excellent)

    Search for: التّابُوتِ
    Result: no results found (it should give me the same results as above)
    ------------
    With "Strip Arabic diacritic marks" disabled (and reindexed):
    Search for: التابوت
    Result: "التابوت" (OK, good)

    Search for: التّابُوتِ
    Result: "التّابُوتِ" (OK, good)


    ------------
    Another example:
    ------------
    With "Strip Arabic diacritic marks" enabled (and reindexed):
    Search for: الأردن
    Result: "الأردن" (it should also give me "الاردن" and "الإردن")

    ------------
    With "Strip Arabic diacritic marks" disabled (and reindexed):
    Search for: الأردن
    Result: "الأردن" (OK, good)





  • Ray
    replied
    Did you enable the option to "Strip Arabic diacritic marks from words" under "Configure"->"Languages"?

    You will have to reindex (and upload your new index files) for it to take effect.



  • mrbasserby
    replied
    Hi, sorry for posting so many messages.
    There is just one more issue: when I type a word in the search box, I expect the search engine to find both the diacritic and non-diacritic forms of the word, no matter whether the input word has diacritics or not. For example, if I put this word in the search box:
    "التّابُوتِ"
    As you can see, it has diacritics, and if it were up to me the search would give me
    all of these results:

    "التّابُوتِ"
    and
    "التابوت"

    But it only works like that when I enter this word:

    "التابوت"

    Then the results are:

    "التّابُوتِ"
    and
    "التابوت"



  • mrbasserby
    replied
    Hi, just one thing:

    when you type "ا" then "ل" then "ا", the result is "الا",

    and when you type "ا" then "ل" then "إ", the result is "الإ".

    In the same way we can write "الأ", as in this example:
    "الأردن"
    Last edited by mrbasserby; Apr-29-2012, 05:15 AM.
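The joined shape that appears when typing ل followed by an alef (the lam-alef ligature) is produced by the font at display time; the stored text still holds separate code points. A quick way to see what actually differs between الأردن and الاردن is to list their code points, as in this small Python sketch (the helper name `describe` is hypothetical; `unicodedata.name` is from the standard library).

```python
import unicodedata


def describe(word: str) -> list[str]:
    """List each code point of word with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]


if __name__ == "__main__":
    for word in ("\u0627\u0644\u0623\u0631\u062f\u0646",   # الأردن
                 "\u0627\u0644\u0627\u0631\u062f\u0646"):  # الاردن
        print(word, describe(word))
```

Only the third code point differs (ALEF WITH HAMZA ABOVE versus plain ALEF), so an exact-match search treats the two spellings as different words.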



  • mrbasserby
    replied
    Hi,
    I have the same problem. I use Zoom Search PHP, and I tried two encodings:
    first UTF-8, then Windows-1256 (Arabic).

    Neither of them strips the diacritics in my searches. For example,
    when I put this Arabic word in the search box:
    "ابليس"
    the results include all the words that match it exactly,
    but not, for example, this:
    "إبليس"
    or this:
    "أبليس"

    Do you see the difference?
    When searching, most Arabic speakers don't bother to write:
    "ء", which is called "hamza" in Arabic,
    or
    "أ", called "hamza above alif",
    or
    "إ", called "hamza below alif",
    and just type a plain "alif",
    like this: "ا"
    The CHM search detects the last two, "أ" and "إ",
    even if I type it like this: "ا"

    So am I missing something?
    Last edited by mrbasserby; Apr-29-2012, 05:15 AM.
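The CHM-style behaviour described above, where typing a bare alef ا also finds أ and إ, can be approximated without changing the stored text by expanding the query instead: each alef-family letter in the query becomes a regex character class. This is a hypothetical Python sketch of that alternative (`ALEF_FAMILY` and `hamza_insensitive_pattern` are made-up names), not a description of how the CHM viewer or Zoom implement it.

```python
import re

# Hypothetical set of letters treated as equivalent: bare alef and the
# hamza/madda forms ا أ إ آ.
ALEF_FAMILY = "\u0627\u0623\u0625\u0622"


def hamza_insensitive_pattern(query: str):
    """Compile a regex in which every alef-family letter matches any of them."""
    parts = []
    for ch in query:
        if ch in ALEF_FAMILY:
            parts.append(f"[{ALEF_FAMILY}]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


if __name__ == "__main__":
    page = "\u0625\u0628\u0644\u064a\u0633 \u0623\u0628\u0644\u064a\u0633"  # إبليس أبليس
    pattern = hamza_insensitive_pattern("\u0627\u0628\u0644\u064a\u0633")   # ابليس
    print(len(pattern.findall(page)))  # 2
```

Query expansion leaves the index untouched but needs pattern matching at search time; normalizing both the index and the query instead keeps each lookup an exact match.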



  • Ray
    replied
    Are you using UTF-8 encoding?

    Can you give us the URL to the page in question?

    Also let us know if you are using PHP, ASP, ASP.NET, CGI, or JavaScript. And the version and build of Zoom you are using (click Help->About in the Indexer).

    Can you also give us the name of the diacritic mark in question. It's a little hard to recognize for those of us who don't read Arabic. Is that an "alif hamza"?
    Last edited by Ray; Apr-15-2010, 02:00 AM.
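The encoding matters for whether Arabic text reaches the search script intact, but the character set alone will not make الأردن match الاردن: in any Arabic-capable encoding the two words remain different byte sequences, while Windows-1252 (a Western European code page) cannot represent Arabic letters at all. A small Python check using the standard `utf-8`, `cp1256` (Windows Arabic) and `cp1252` codecs; the names `WITH_HAMZA`, `BARE_ALEF` and `can_encode` are illustrative.

```python
# Two spellings from this thread: الأردن (alef with hamza) and الاردن (bare alef).
WITH_HAMZA = "\u0627\u0644\u0623\u0631\u062f\u0646"
BARE_ALEF = "\u0627\u0644\u0627\u0631\u062f\u0646"


def can_encode(word: str, codec: str) -> bool:
    """True if every character of word can be represented in the given codec."""
    try:
        word.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for codec in ("utf-8", "cp1256", "cp1252"):
        if not can_encode(BARE_ALEF, codec):
            print(codec, "cannot represent Arabic letters")
            continue
        a, b = WITH_HAMZA.encode(codec), BARE_ALEF.encode(codec)
        # The bytes differ in every Arabic-capable encoding, so a byte or
        # string comparison still treats the two spellings as different words.
        print(codec, a.hex(" "), "|", b.hex(" "), "| equal:", a == b)
```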



  • dfisch
    started a topic problems with arabic diacritic marks

    problems with arabic diacritic marks

    I am having problems with an Arabic-language search: diacritic marks are not being stripped. I have selected "Strip Arabic diacritic marks from words" in the configuration and have reindexed the website.

    For example, I have content on the website that is entered as الأردن,
    but most of our users would search for الاردن, which still returns no results.

    Are there other settings I may need to adjust?
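For this case, content stored as الأردن but searched as الاردن, one compact approach is Unicode canonical decomposition (NFD) followed by dropping combining marks: NFD splits أ into ا plus HAMZA ABOVE (U+0654) and إ into ا plus HAMZA BELOW (U+0655), so the same filter that removes harakat also folds these letters. This is a hypothetical Python sketch (`fold_marks` is a made-up helper, not a Zoom setting) using only the standard `unicodedata` module. It is broader than a plain alef fold: ؤ and ئ decompose the same way and become و and ي, which may or may not be wanted for a given site.

```python
import unicodedata


def fold_marks(text: str) -> str:
    """Decompose to NFD, then drop every character with a nonzero combining class.

    This removes harakat (fatha, damma, kasra, shadda, ...) and also the
    hamza/madda marks split off from أ, إ, آ, ؤ and ئ by decomposition.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    stored = "\u0627\u0644\u0623\u0631\u062f\u0646"  # الأردن, as entered on the site
    typed = "\u0627\u0644\u0627\u0631\u062f\u0646"   # الاردن, as users type it
    print(fold_marks(stored) == fold_marks(typed))  # True
```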