[DOCS] Reformat trim token filter docs (#51649)
Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
parent f5bccad847
commit d336faa0b0

@@ -4,4 +4,107 @@
<titleabbrev>Trim</titleabbrev>
++++

The `trim` token filter trims the whitespace surrounding a token.
Removes leading and trailing whitespace from each token in a stream.

The `trim` filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].

[TIP]
====
Many commonly used tokenizers, such as the
<<analysis-standard-tokenizer,`standard`>> or
<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
default. When using these tokenizers, you don't need to add a separate `trim`
filter.
====

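As a rough check of the tip above, the following sketch sends padded text
through the `whitespace` tokenizer with no token filter. The sample text is
only an illustration. Because the tokenizer splits on whitespace and discards
it, the returned token should be `fox` with no padding.

[source,console]
----
GET _analyze
{
  "tokenizer" : "whitespace",
  "text" : " fox "
}
----
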
[[analysis-trim-tokenfilter-analyze-ex]]
==== Example

To see how the `trim` filter works, you first need to produce a token
containing whitespace.

The following <<indices-analyze,analyze API>> request uses the
<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
`" fox "`.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
----

The API returns the following response. Note the `" fox "` token contains
the original text's whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

To remove the whitespace, add the `trim` filter to the previous analyze API
request.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
----

The API returns the following response. The returned `fox` token does not
include any leading or trailing whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

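The `trim` filter can also be useful after tokenizers that split on something
other than whitespace. As a hedged sketch, the following request assumes a
`pattern` tokenizer configured to split on commas and uses a made-up sample
text. Without the `trim` filter, the tokens would be expected to keep their
padding (`"fox "` and `" dog"`). With it, they should come back as `fox` and
`dog`.

[source,console]
----
GET _analyze
{
  "tokenizer" : {
    "type" : "pattern",
    "pattern" : ","
  },
  "filter" : ["trim"],
  "text" : "fox , dog"
}
----
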
[[analysis-trim-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the `trim`
filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.

[source,console]
----
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
----

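To check the new analyzer, one option is to call the analyze API on the index
and name the analyzer. This sketch assumes the `trim_example` index from the
previous request already exists. The sample text is only an illustration. The
response should contain a single `fox` token.

[source,console]
----
GET trim_example/_analyze
{
  "analyzer" : "keyword_trim",
  "text" : " fox "
}
----
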