[DOCS] Reformat `kstem` token filter (#55823)

Makes the following changes to the `kstem` token filter docs: * Rewrite description and adds a Lucene work * Adds detailed analyze example * Adds an analyzer example
2020-04-29 08:52:55 -04:00 · 2020-04-29 08:52:55 -04:00 · 767836c367
parent 6a0e1e161b
commit 767836c367
1 changed files with 109 additions and 3 deletions
--- a/docs/reference/analysis/tokenfilters/kstem-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/kstem-tokenfilter.asciidoc
@ -4,6 +4,112 @@
 <titleabbrev>KStem</titleabbrev>
 ++++
-The `kstem` token filter is a high performance filter for english. All
+Provides http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[KStem]-based stemming for
-terms must already be lowercased (use `lowercase` filter) for this
+the English language. The `kstem` filter combines
-filter to work correctly.
+<<algorithmic-stemmers,algorithmic stemming>> with a built-in
 <<dictionary-stemmers,dictionary>>.
 The `kstem` filter tends to stem less aggressively than other English stemmer
 filters, such as the <<analysis-porterstem-tokenfilter,`porter_stem`>> filter.
 The `kstem` filter is equivalent to the
 <<analysis-stemmer-tokenfilter,`stemmer`>> filter's
 <<analysis-stemmer-tokenfilter-language-parm,`light_english`>> variant.
 This filter uses Lucene's
 {lucene-analysis-docs}s/en/KStemFilter.html[KStemFilter].
 [[analysis-kstem-tokenfilter-analyze-ex]]
 ==== Example
 The following analyze API request uses the `kstem` filter to stem `the foxes
 jumping quickly` to `the fox jump quick`:
 [source,console]
 ----
 GET /_analyze
 {
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
 }
 ----
 The filter produces the following tokens:
 [source,text]
 ----
 [ the, fox, jump, quick ]
 ----
 ////
 [source,console-result]
 ----
 {
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "fox",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "jump",
      "start_offset": 10,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "quick",
      "start_offset": 18,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
 }
 ----
 ////
 [[analysis-kstem-tokenfilter-analyzer-ex]]
 ==== Add to an analyzer
 The following <<indices-create-index,create index API>> request uses the
 `kstem` filter to configure a new <<analysis-custom-analyzer,custom
 analyzer>>.
 [IMPORTANT]
 ====
 To work properly, the `kstem` filter requires lowercase tokens. To ensure tokens
 are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>> filter
 before the `kstem` filter in the analyzer configuration.
 ====
 [source,console]
 ----
 PUT /my_index
 {
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  }
 }
 ----