[DOCS] Reformat `kstem` token filter (#55823)

Makes the following changes to the `kstem` token filter docs:

* Rewrites description and adds a Lucene link
* Adds detailed analyze example
* Adds an analyzer example
James Rodewig 2020-04-29 08:52:55 -04:00 committed by GitHub
parent 6a0e1e161b
commit 767836c367
1 changed files with 109 additions and 3 deletions

@@ -4,6 +4,112 @@
<titleabbrev>KStem</titleabbrev>
++++
The `kstem` token filter is a high performance filter for english. All
terms must already be lowercased (use `lowercase` filter) for this
filter to work correctly.
Provides http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[KStem]-based stemming for
the English language. The `kstem` filter combines
<<algorithmic-stemmers,algorithmic stemming>> with a built-in
<<dictionary-stemmers,dictionary>>.
The `kstem` filter tends to stem less aggressively than other English stemmer
filters, such as the <<analysis-porterstem-tokenfilter,`porter_stem`>> filter.
The `kstem` filter is equivalent to the
<<analysis-stemmer-tokenfilter,`stemmer`>> filter's
<<analysis-stemmer-tokenfilter-language-parm,`light_english`>> variant.
This filter uses Lucene's
{lucene-analysis-docs}s/en/KStemFilter.html[KStemFilter].
[[analysis-kstem-tokenfilter-analyze-ex]]
==== Example
The following analyze API request uses the `kstem` filter to stem `the foxes
jumping quickly` to `the fox jump quick`:
[source,console]
----
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
}
----
The filter produces the following tokens:
[source,text]
----
[ the, fox, jump, quick ]
----
////
[source,console-result]
----
{
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "fox",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "jump",
      "start_offset": 10,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "quick",
      "start_offset": 18,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
----
////
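To see how the `kstem` filter compares with a more aggressive stemmer, one option is to run the same text through the `porter_stem` filter and compare the resulting tokens. The request below is an illustrative sketch that is not part of the original example. It only swaps the filter name.

[source,console]
----
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "porter_stem" ],
  "text": "the foxes jumping quickly"
}
----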
[[analysis-kstem-tokenfilter-analyzer-ex]]
==== Add to an analyzer
The following <<indices-create-index,create index API>> request uses the
`kstem` filter to configure a new <<analysis-custom-analyzer,custom
analyzer>>.
[IMPORTANT]
====
To work properly, the `kstem` filter requires lowercase tokens. To ensure tokens
are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>> filter
before the `kstem` filter in the analyzer configuration.
====
[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  }
}
----
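As a quick check, the analyze API can be pointed at the new analyzer. This sketch assumes the `my_index` index from the request above was created successfully. The mixed-case sample text is an illustrative choice, meant to exercise the `lowercase` filter before the `kstem` filter runs.

[source,console]
----
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The Foxes Jumping Quickly"
}
----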