[DOCS] Reformat `porter_stem` token filter (#56053)

Makes the following changes to the `porter_stem` token filter docs:

* Rewrites description and adds a Lucene link
* Adds detailed analyze example
* Adds an analyzer example
James Rodewig 2020-05-04 10:39:17 -04:00 committed by GitHub
parent e8ef44ce78
commit 4faf5a7916
1 changed file with 107 additions and 11 deletions


@ -4,15 +4,111 @@
<titleabbrev>Porter stem</titleabbrev>
++++

Provides <<algorithmic-stemmers,algorithmic stemming>> for the English language,
based on the http://snowball.tartarus.org/algorithms/porter/stemmer.html[Porter
stemming algorithm].

This filter tends to stem more aggressively than other English
stemmer filters, such as the <<analysis-kstem-tokenfilter,`kstem`>> filter.

The `porter_stem` filter is equivalent to the
<<analysis-stemmer-tokenfilter,`stemmer`>> filter's
<<analysis-stemmer-tokenfilter-language-parm,`english`>> variant.

The `porter_stem` filter uses Lucene's
{lucene-analysis-docs}/en/PorterStemFilter.html[PorterStemFilter].
[[analysis-porterstem-tokenfilter-analyze-ex]]
==== Example
The following analyze API request uses the `porter_stem` filter to stem
`the foxes jumping quickly` to `the fox jump quickli`:
[source,console]
----
GET /_analyze
{
"tokenizer": "standard",
"filter": [ "porter_stem" ],
"text": "the foxes jumping quickly"
}
----
The filter produces the following tokens:
[source,text]
----
[ the, fox, jump, quickli ]
----
////
[source,console-result]
----
{
"tokens": [
{
"token": "the",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "fox",
"start_offset": 4,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "jump",
"start_offset": 10,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "quickli",
"start_offset": 18,
"end_offset": 25,
"type": "<ALPHANUM>",
"position": 3
}
]
}
----
////
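The token changes above can be approximated with a short, self-contained sketch of a few Porter steps. This is a hand-written illustration, not Lucene's `PorterStemFilter`; the `porter_sketch` function and its simplified conditions are assumptions for demonstration only and omit most of the full algorithm:

```python
# Partial, simplified sketch of a few Porter stemmer steps -- just enough to
# show how "foxes", "jumping", and "quickly" map to the tokens above.
# NOT the full algorithm and NOT Lucene's implementation.

def _has_vowel(stem):
    # True if the stem contains at least one of a, e, i, o, u.
    return any(ch in "aeiou" for ch in stem)

def porter_sketch(word):
    # Step 1a: strip plural suffixes.
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Step 1b (simplified): strip "-ing" when a vowel remains in the stem.
    if word.endswith("ing") and _has_vowel(word[:-3]):
        word = word[:-3]
    # Step 1c: terminal "y" becomes "i" when the stem contains a vowel.
    if word.endswith("y") and _has_vowel(word[:-1]):
        word = word[:-1] + "i"
    # Step 5a (simplified): drop a trailing silent "e" from longer stems.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

for token in ["the", "foxes", "jumping", "quickly"]:
    print(porter_sketch(token))  # the, fox, jump, quickli
```

Under these simplified rules, `foxes` loses its plural `s` and silent `e`, `jumping` loses `-ing`, and `quickly` keeps its length but ends in `i`, matching the filter's output for this input.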
[[analysis-porterstem-tokenfilter-analyzer-ex]]
==== Add to an analyzer
The following <<indices-create-index,create index API>> request uses the
`porter_stem` filter to configure a new <<analysis-custom-analyzer,custom
analyzer>>.
[IMPORTANT]
====
To work properly, the `porter_stem` filter requires lowercase tokens. To ensure
tokens are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>>
filter before the `porter_stem` filter in the analyzer configuration.
====
[source,console]
----
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"porter_stem"
]
}
}
}
}
}
----
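Once the index exists, the custom analyzer can be exercised with the analyze API. This is a usage sketch; `my_index` and `my_analyzer` are the names from the request above:

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "the foxes jumping quickly"
}
----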