Is it possible to add more special characters or ligatures to the search engine's accent-insensitivity mapping, for example by appending values to $AccentChars and $NormalChars in settings.php?
My client is indexing PDF files that contain transcriptions of 17th-century Italian. The search seems to work well using UTF-8 with "enable accent/diacritic insensitivity..." enabled. However, some of the glyphs and ligatures found in the documents are not contained in the above-mentioned variables, so a user who spells an indexed word with "normal" characters may not get the same results as one who uses the special characters. Adding words with their normalized spelling to the synonyms list does work, but the authors would have to produce a list of hundreds of normalized words, which is not feasible. We need automatic accent/diacritic insensitivity to cover these characters too.
A good example of a special character not included in the above variables is the long s (ſ). We do not get the same search results when a user enters a word with a long s as when they enter it with a normalized regular s.
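To illustrate the behaviour we are after, here is a minimal sketch in Python. It is not the search engine's own code: the `EXTRA_FOLDS` table and the `fold` function are hypothetical names, and the entries are only examples of the kind of pairs we would like to append to $AccentChars and $NormalChars.

```python
import unicodedata

# Hypothetical extra mappings, analogous to appending pairs to
# $AccentChars / $NormalChars. The entries are illustrative only.
EXTRA_FOLDS = {
    "\u017f": "s",   # long s
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",  # st ligature
    "\u00e6": "ae",  # ae ligature
    "\u0153": "oe",  # oe ligature
}


def fold(text: str) -> str:
    """Return a normalized form of text for accent-insensitive matching."""
    # Replace special glyphs and ligatures with their plain equivalents.
    text = "".join(EXTRA_FOLDS.get(ch, ch) for ch in text)
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold so that matching is also case-insensitive.
    return stripped.casefold()


# A query typed with a regular s should match the indexed long-s spelling.
print(fold("e\u017f\u017fere") == fold("essere"))
```

If both the indexed words and the search query went through a mapping like this, a word spelled with a long s and the same word spelled with a regular s would return the same results, without a manually maintained synonyms list.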