[DOCS] Reformat `porter_stem` token filter (#56053)

Makes the following changes to the `porter_stem` token filter docs:

* Rewrites description and adds a Lucene link
* Adds detailed analyze example
* Adds an analyzer example
James Rodewig 2020-05-04 10:39:17 -04:00 committed by GitHub
parent e8ef44ce78
commit 4faf5a7916
1 changed file with 107 additions and 11 deletions


@ -4,15 +4,111 @@
<titleabbrev>Porter stem</titleabbrev>
++++

Provides <<algorithmic-stemmers,algorithmic stemming>> for the English language,
based on the http://snowball.tartarus.org/algorithms/porter/stemmer.html[Porter
stemming algorithm].

This filter tends to stem more aggressively than other English
stemmer filters, such as the <<analysis-kstem-tokenfilter,`kstem`>> filter.

The `porter_stem` filter is equivalent to the
<<analysis-stemmer-tokenfilter,`stemmer`>> filter's
<<analysis-stemmer-tokenfilter-language-parm,`english`>> variant.

The `porter_stem` filter uses Lucene's
{lucene-analysis-docs}/en/PorterStemFilter.html[PorterStemFilter].
[[analysis-porterstem-tokenfilter-analyze-ex]]
==== Example
The following analyze API request uses the `porter_stem` filter to stem
`the foxes jumping quickly` to `the fox jump quickli`:
[source,console]
----
GET /_analyze
{
"tokenizer": "standard",
"filter": [ "porter_stem" ],
"text": "the foxes jumping quickly"
}
----
The filter produces the following tokens:
[source,text]
----
[ the, fox, jump, quickli ]
----
////
[source,console-result]
----
{
"tokens": [
{
"token": "the",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "fox",
"start_offset": 4,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "jump",
"start_offset": 10,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "quickli",
"start_offset": 18,
"end_offset": 25,
"type": "<ALPHANUM>",
"position": 3
}
]
}
----
////
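The token changes above can be approximated with a short, self-contained sketch of a few Porter steps. This is a hand-written illustration, not Lucene's `PorterStemFilter`; the `porter_sketch` function and its simplified conditions are assumptions for demonstration only and omit most of the full algorithm:

```python
# Partial, simplified sketch of a few Porter stemmer steps -- just enough to
# show how "foxes", "jumping", and "quickly" map to the tokens above.
# NOT the full algorithm and NOT Lucene's implementation.

def _has_vowel(stem):
    # True if the stem contains at least one of a, e, i, o, u.
    return any(ch in "aeiou" for ch in stem)

def porter_sketch(word):
    # Step 1a: strip plural suffixes.
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Step 1b (simplified): strip "-ing" when a vowel remains in the stem.
    if word.endswith("ing") and _has_vowel(word[:-3]):
        word = word[:-3]
    # Step 1c: terminal "y" becomes "i" when the stem contains a vowel.
    if word.endswith("y") and _has_vowel(word[:-1]):
        word = word[:-1] + "i"
    # Step 5a (simplified): drop a trailing silent "e" from longer stems.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

for token in ["the", "foxes", "jumping", "quickly"]:
    print(porter_sketch(token))  # the, fox, jump, quickli
```

Under these simplified rules, `foxes` loses its plural `s` and silent `e`, `jumping` loses `-ing`, and `quickly` keeps its length but ends in `i`, matching the filter's output for this input.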
[[analysis-porterstem-tokenfilter-analyzer-ex]]
==== Add to an analyzer
The following <<indices-create-index,create index API>> request uses the
`porter_stem` filter to configure a new <<analysis-custom-analyzer,custom
analyzer>>.
[IMPORTANT]
====
To work properly, the `porter_stem` filter requires lowercase tokens. To ensure
tokens are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>>
filter before the `porter_stem` filter in the analyzer configuration.
====
[source,console]
----
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"porter_stem"
]
}
}
}
}
}
----
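Once the index exists, the custom analyzer can be exercised with the analyze API. This is a usage sketch; `my_index` and `my_analyzer` are the names from the request above:

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "the foxes jumping quickly"
}
----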