[DOCS] Reformat `kstem` token filter (#55823)
Makes the following changes to the `kstem` token filter docs: * Rewrite description and adds a Lucene work * Adds detailed analyze example * Adds an analyzer example
This commit is contained in:
parent
6a0e1e161b
commit
767836c367
|
@ -4,6 +4,112 @@
|
||||||
<titleabbrev>KStem</titleabbrev>
|
<titleabbrev>KStem</titleabbrev>
|
||||||
++++
|
++++
|
||||||
|
|
||||||
The `kstem` token filter is a high performance filter for english. All
|
Provides http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[KStem]-based stemming for
|
||||||
terms must already be lowercased (use `lowercase` filter) for this
|
the English language. The `kstem` filter combines
|
||||||
filter to work correctly.
|
<<algorithmic-stemmers,algorithmic stemming>> with a built-in
|
||||||
|
<<dictionary-stemmers,dictionary>>.
|
||||||
|
|
||||||
|
The `kstem` filter tends to stem less aggressively than other English stemmer
|
||||||
|
filters, such as the <<analysis-porterstem-tokenfilter,`porter_stem`>> filter.
|
||||||
|
|
||||||
|
The `kstem` filter is equivalent to the
|
||||||
|
<<analysis-stemmer-tokenfilter,`stemmer`>> filter's
|
||||||
|
<<analysis-stemmer-tokenfilter-language-parm,`light_english`>> variant.
|
||||||
|
|
||||||
|
This filter uses Lucene's
|
||||||
|
{lucene-analysis-docs}s/en/KStemFilter.html[KStemFilter].
|
||||||
|
|
||||||
|
[[analysis-kstem-tokenfilter-analyze-ex]]
|
||||||
|
==== Example
|
||||||
|
|
||||||
|
The following analyze API request uses the `kstem` filter to stem `the foxes
|
||||||
|
jumping quickly` to `the fox jump quick`:
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
----
|
||||||
|
GET /_analyze
|
||||||
|
{
|
||||||
|
"tokenizer": "standard",
|
||||||
|
"filter": [ "kstem" ],
|
||||||
|
"text": "the foxes jumping quickly"
|
||||||
|
}
|
||||||
|
----
|
||||||
|
|
||||||
|
The filter produces the following tokens:
|
||||||
|
|
||||||
|
[source,text]
|
||||||
|
----
|
||||||
|
[ the, fox, jump, quick ]
|
||||||
|
----
|
||||||
|
|
||||||
|
////
|
||||||
|
[source,console-result]
|
||||||
|
----
|
||||||
|
{
|
||||||
|
"tokens": [
|
||||||
|
{
|
||||||
|
"token": "the",
|
||||||
|
"start_offset": 0,
|
||||||
|
"end_offset": 3,
|
||||||
|
"type": "<ALPHANUM>",
|
||||||
|
"position": 0
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"token": "fox",
|
||||||
|
"start_offset": 4,
|
||||||
|
"end_offset": 9,
|
||||||
|
"type": "<ALPHANUM>",
|
||||||
|
"position": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"token": "jump",
|
||||||
|
"start_offset": 10,
|
||||||
|
"end_offset": 17,
|
||||||
|
"type": "<ALPHANUM>",
|
||||||
|
"position": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"token": "quick",
|
||||||
|
"start_offset": 18,
|
||||||
|
"end_offset": 25,
|
||||||
|
"type": "<ALPHANUM>",
|
||||||
|
"position": 3
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
----
|
||||||
|
////
|
||||||
|
|
||||||
|
[[analysis-kstem-tokenfilter-analyzer-ex]]
|
||||||
|
==== Add to an analyzer
|
||||||
|
|
||||||
|
The following <<indices-create-index,create index API>> request uses the
|
||||||
|
`kstem` filter to configure a new <<analysis-custom-analyzer,custom
|
||||||
|
analyzer>>.
|
||||||
|
|
||||||
|
[IMPORTANT]
|
||||||
|
====
|
||||||
|
To work properly, the `kstem` filter requires lowercase tokens. To ensure tokens
|
||||||
|
are lowercased, add the <<analysis-lowercase-tokenfilter,`lowercase`>> filter
|
||||||
|
before the `kstem` filter in the analyzer configuration.
|
||||||
|
====
|
||||||
|
|
||||||
|
[source,console]
|
||||||
|
----
|
||||||
|
PUT /my_index
|
||||||
|
{
|
||||||
|
"settings": {
|
||||||
|
"analysis": {
|
||||||
|
"analyzer": {
|
||||||
|
"my_analyzer": {
|
||||||
|
"tokenizer": "whitespace",
|
||||||
|
"filter": [
|
||||||
|
"lowercase",
|
||||||
|
"kstem"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
----
|
||||||
|
|
Loading…
Reference in New Issue