[DOCS] Reformat `porter_stem` token filter (#56053)

Makes the following changes to the `porter_stem` token filter docs:

* Rewrites description and adds a Lucene link
* Adds detailed analyze example
* Adds an analyzer example
James Rodewig 2020-05-04 10:39:17 -04:00 committed by GitHub
parent e8ef44ce78
commit 4faf5a7916
1 changed file with 107 additions and 11 deletions


@@ -4,15 +4,111 @@
<titleabbrev>Porter stem</titleabbrev>
++++

Provides <<algorithmic-stemmers,algorithmic stemming>> for the English language,
based on the http://snowball.tartarus.org/algorithms/porter/stemmer.html[Porter
stemming algorithm].

This filter tends to stem more aggressively than other English
stemmer filters, such as the <<analysis-kstem-tokenfilter,`kstem`>> filter.

The `porter_stem` filter is equivalent to the
<<analysis-stemmer-tokenfilter,`stemmer`>> filter's
<<analysis-stemmer-tokenfilter-language-parm,`english`>> variant.

The `porter_stem` filter uses Lucene's
{lucene-analysis-docs}/en/PorterStemFilter.html[PorterStemFilter].
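
For comparison, a minimal sketch of the equivalent request defines the `stemmer`
filter inline and selects its `english` language; it should return the same
tokens as the `porter_stem` example in the next section:

[source,console]
----
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "stemmer",
      "language": "english"
    }
  ],
  "text": "the foxes jumping quickly"
}
----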

[[analysis-porterstem-tokenfilter-analyze-ex]]
==== Example

The following analyze API request uses the `porter_stem` filter to stem
`the foxes jumping quickly` to `the fox jump quickli`:

[source,console]
----
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "porter_stem" ],
  "text": "the foxes jumping quickly"
}
----

The filter produces the following tokens:

[source,text]
----
[ the, fox, jump, quickli ]
----
////
[source,console-result]
----
{
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "fox",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "jump",
      "start_offset": 10,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "quickli",
      "start_offset": 18,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
----
////

[[analysis-porterstem-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the
`porter_stem` filter to configure a new <<analysis-custom-analyzer,custom
analyzer>>.

[IMPORTANT]
====
To work properly, the `porter_stem` filter requires lowercase tokens. To ensure
tokens are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>>
filter before the `porter_stem` filter in the analyzer configuration.
====

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  }
}
----
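
To try the analyzer after the index is created, the `my_index` and
`my_analyzer` names from the request above can be passed to the analyze API. A
quick check along these lines should return the same stemmed tokens as the
earlier example (`the fox jump quickli`), since the sample text is already
lowercase and splits identically on whitespace:

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "the foxes jumping quickly"
}
----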