Stemming now added to V6!

  • Stemming now added to V6!

    We can now confirm that V6 will feature STEMMING.

    This is a much-requested feature: when it is enabled, search results will also match words that are derivatives of each other (e.g. plurals). For example, searching for the word "fish" will return pages containing the word variants "fish", "fishes", "fishing", etc.

    Adding this feature required some significant changes to the index file format and the way we index and search words, but we are glad to see that the end results seem to be worth the effort.

    Stemming will not be available in the JavaScript version. The PHP and ASP scripts will only support English stemming, while the CGI version offers improved stemming and supports stemming for 16 languages.

    The feature will be enabled by default in V6, but you may want to turn it off if, for example, it is absolutely critical that your website differentiate between "booking", "booker", "book", etc.
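To illustrate the general idea, including why you might turn stemming off for words like "booking" and "book", here is a minimal Python sketch. It assumes a deliberately naive suffix-stripping rule set; the `SUFFIXES` list and the `toy_stem` and `stem_match` functions are hypothetical names for illustration only, and this is not Zoom's actual stemming algorithm, which is language-specific and more sophisticated.

```python
# Minimal sketch of stem-based matching. ASSUMPTION: a toy suffix-stripping
# stemmer for illustration only; this is NOT Zoom's actual algorithm.
SUFFIXES = ("ing", "es", "ed", "er", "s")  # checked in this order


def toy_stem(word: str) -> str:
    """Lower-case the word and strip the first matching suffix,
    keeping a stem of at least 3 characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_match(query: str, page_words: list[str], stemming: bool = True) -> list[str]:
    """Return the page words that match the query.

    With stemming on, words match when their stems are equal;
    with stemming off, only exact (case-insensitive) matches count.
    """
    if not stemming:
        return [w for w in page_words if w.lower() == query.lower()]
    target = toy_stem(query)
    return [w for w in page_words if toy_stem(w) == target]


if __name__ == "__main__":
    words = ["fish", "fishes", "fishing", "dog", "book", "booking", "booker"]
    print(stem_match("fish", words))                  # ['fish', 'fishes', 'fishing']
    print(stem_match("book", words))                  # ['book', 'booking', 'booker']
    print(stem_match("book", words, stemming=False))  # ['book']
```

The same rules that usefully merge "fish" and "fishes" also merge "book" and "booker", which is exactly the situation where disabling stemming may be preferable.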

    More information on V6 here.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine

  • #2
    Stemming and single-case languages

    I notice that stemming is disabled when "support for single-case languages (ie asian)" is enabled.

    Is this intentional? I can't use both?

    Thanks

    • #3
      The stemming algorithm is very language-dependent. It doesn't make sense for most Asian languages, which have no linguistic concepts such as plural forms or verb tenses.
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine

      • #4
        I would like to know which 16 languages stemming works for. I couldn't find them listed in the article. I think it's a great feature.

        • #5
          They are listed in the languages window of the Zoom configuration. (You need to select the CGI script option first, however.)

          • #6
            Thank you

            • #7
              Hi, please tell us: does V6 support the Russian language?
              http://sutki72.net/

              • #8
                Russian is supported, with a few minor exceptions. See:
                http://www.wrensoft.com/zoom/support/languages.html

                For Russian stemming you need to use the CGI option.

                • #9
                  Does that mean that the stemming function does not work for Chinese as well?

                  • #10
                    There is no stemming functionality for Chinese. Linguistically, I don't see how that would work either. There are no plural or singular forms of words, nor present and past tenses, in Chinese or in most other Asian languages that we are aware of.
                    --Ray
                    Wrensoft Web Software
                    Sydney, Australia
                    Zoom Search Engine

                    • #11
                      Do you plan to add stemming for the Czech language? Thank you for your answer.

                      • #12
                        Not at the moment. There hasn't been any demand.

                        • #13
                          I just updated to version 6. One issue I am finding with the stemming feature concerns highlighting. If I search for a word like "domain", I get pages containing both "domain" and "domains". This is expected. However, if I click on a result that takes me to a page containing "domains", nothing is highlighted on the page. Has anyone else mentioned this? The only way I have found to make the page show highlighting is to disable stemming, which defeats the purpose.

                          Thanks.

                          • #14
                            Yes, it is mentioned under "Known limitations" here:
                            http://www.wrensoft.com/zoom/support...ht.html#limits

                            There is no practical or efficient way to get the "jump to highlighting" script (a small JavaScript that runs on each of your content pages) to perform stemming, or to pass it a list of all the matches found on that page. The only alternative would be to greatly increase the complexity of the highlighting script, so that it includes a stemming algorithm and performs a stemming comparison against every word on the page. That is really not wise, considering the script has to be integrated into every page on many different websites, many of which already run a lot of other JavaScript that it could conflict with, not to mention the added execution time.

                            As noted, we simply cannot highlight every occurrence that the actual indexing and matching algorithm considers a match (that algorithm is much more complicated, has far more resources available, and also handles synonyms, diacritics, etc.) without incurring costs elsewhere (e.g. script complexity, more integration problems, and a larger download per page and therefore slower access times).

                            So the "jump to highlighting" script highlights some occurrences; it is not an accurate representation of everything that has been matched.
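The limitation can be illustrated with a minimal Python sketch. It assumes the page-side highlighter does a whole-word, case-insensitive exact match on the search term (the hypothetical `exact_highlight`), and contrasts that with a stem-aware variant (the hypothetical `stemmed_highlight`) which must run a toy suffix-stripping stemmer over every word on the page, the extra per-page work described above. Neither function is Zoom's actual highlighting script, and the stemmer is not Zoom's algorithm.

```python
import re

# ASSUMPTION: toy suffix-stripping stemmer for illustration; not Zoom's algorithm.
SUFFIXES = ("ing", "es", "ed", "er", "s")
WORD = re.compile(r"\w+")


def toy_stem(word: str) -> str:
    """Lower-case and strip the first matching suffix (stem kept >= 3 chars)."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def exact_highlight(page_text: str, term: str) -> str:
    """Wrap whole-word, case-insensitive exact matches of term in <mark> tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", page_text)


def stemmed_highlight(page_text: str, term: str) -> str:
    """Stem EVERY word on the page and mark those whose stem equals the term's."""
    target = toy_stem(term)

    def mark(m: re.Match) -> str:
        word = m.group(0)
        return f"<mark>{word}</mark>" if toy_stem(word) == target else word

    return WORD.sub(mark, page_text)


if __name__ == "__main__":
    page = "Register your domains today. One domain is free."
    print(exact_highlight(page, "domain"))
    # Register your domains today. One <mark>domain</mark> is free.
    print(stemmed_highlight(page, "domain"))
    # Register your <mark>domains</mark> today. One <mark>domain</mark> is free.
```

The exact matcher misses "domains", which mirrors the behaviour reported in #13; the stem-aware version catches it, but only by stemming every word on every page that loads the script.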
                            --Ray
                            Wrensoft Web Software
                            Sydney, Australia
                            Zoom Search Engine

                            • #15
                              Dear admin,
                              I would love to use V6, but I need some support.
                              Does V6 support the Vietnamese language?
                              Have a nice day.
