[DOCS] Reformat trim token filter docs (#51649)

Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
James Rodewig 2020-03-02 07:47:38 -05:00
parent f5bccad847
commit d336faa0b0
1 changed file with 104 additions and 1 deletion

@@ -4,4 +4,107 @@
<titleabbrev>Trim</titleabbrev>
++++

Removes leading and trailing whitespace from each token in a stream.

The `trim` filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].

[TIP]
====
Many commonly used tokenizers, such as the
<<analysis-standard-tokenizer,`standard`>> or
<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
default. When using these tokenizers, you don't need to add a separate `trim`
filter, as the short example following this tip shows.
====
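
For instance, running the <<indices-analyze,analyze API>> with the `standard` tokenizer alone (a quick sketch; no index or special configuration is assumed) already produces a `fox` token without the surrounding whitespace, so no `trim` filter is needed:

[source,console]
----
GET _analyze
{
  "tokenizer" : "standard",
  "text" : " fox "
}
----
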
[[analysis-trim-tokenfilter-analyze-ex]]
==== Example
To see how the `trim` filter works, you first need to produce a token
containing whitespace.
The following <<indices-analyze,analyze API>> request uses the
<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
`" fox "`.
[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
----
The API returns the following response. Note the `" fox "` token contains
the original text's whitespace.
[source,console-result]
----
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----
To remove the whitespace, add the `trim` filter to the previous analyze API
request.
[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
----
The API returns the following response. The returned `fox` token does not
include any leading or trailing whitespace.
[source,console-result]
----
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----
[[analysis-trim-tokenfilter-analyzer-ex]]
==== Add to an analyzer
The following <<indices-create-index,create index API>> request uses the `trim`
filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.
[source,console]
----
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
----
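
To try out the new analyzer (a minimal sketch, assuming the `trim_example` index above has been created), you can reference it by name in an analyze request against that index:

[source,console]
----
GET trim_example/_analyze
{
  "analyzer" : "keyword_trim",
  "text" : " fox "
}
----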