[DOCS] Reformat trim token filter docs (#51649)
Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
parent f5bccad847
commit d336faa0b0
@@ -4,4 +4,107 @@
<titleabbrev>Trim</titleabbrev>
++++

The `trim` token filter trims the whitespace surrounding a token.
Removes leading and trailing whitespace from each token in a stream.

The `trim` filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].

[TIP]
====
Many commonly used tokenizers, such as the
<<analysis-standard-tokenizer,`standard`>> or
<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
default. When using these tokenizers, you don't need to add a separate `trim`
filter.
====
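
As a rough illustration of the tip above, the following Python sketch mimics
splitting text on whitespace. It is not Elasticsearch code:
`whitespace_tokenize` is a hypothetical helper, and Python's `str.split()` only
approximates how the `whitespace` tokenizer divides text. The sketch suggests
why tokens produced this way carry no leading or trailing whitespace, so
trimming them afterwards would change nothing.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a tokenizer that splits on whitespace.

    str.split() with no arguments splits on runs of whitespace and
    discards empty strings, so no token keeps surrounding whitespace.
    """
    return text.split()


tokens = whitespace_tokenize(" quick  fox ")

# Trimming each token is a no-op because the split already removed whitespace.
already_trimmed = all(token == token.strip() for token in tokens)
print(tokens, already_trimmed)
```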

[[analysis-trim-tokenfilter-analyze-ex]]
==== Example

To see how the `trim` filter works, you first need to produce a token
containing whitespace.

The following <<indices-analyze,analyze API>> request uses the
<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
`" fox "`.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
----

The API returns the following response. Note the `" fox "` token contains
the original text's whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

To remove the whitespace, add the `trim` filter to the previous analyze API
request.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
----

The API returns the following response. The returned `fox` token does not
include any leading or trailing whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----
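
In both responses above, `start_offset` and `end_offset` still span the
original five characters, even though the trimmed token is only three
characters long. The Python sketch below models that behavior under stated
assumptions: `Token`, `keyword_tokenize`, and `trim` are hypothetical names,
not Elasticsearch or Lucene APIs, and Python's `str.strip()` may treat some
Unicode characters differently from Lucene's `TrimFilter`.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Simplified token: text plus offsets into the original input."""
    token: str
    start_offset: int
    end_offset: int
    position: int


def keyword_tokenize(text: str) -> List[Token]:
    """Emit the whole input as a single token, like a keyword-style tokenizer."""
    return [Token(text, 0, len(text), 0)]


def trim(tokens: List[Token]) -> List[Token]:
    """Strip whitespace from each token's text and leave its offsets unchanged."""
    return [t._replace(token=t.token.strip()) for t in tokens]


untrimmed = keyword_tokenize(" fox ")
trimmed = trim(untrimmed)

for tok in trimmed:
    print(repr(tok.token), tok.start_offset, tok.end_offset)
```

Because the offsets keep pointing at the original text, features that rely on
them continue to refer to the untrimmed span of the input.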

[[analysis-trim-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the `trim`
filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.

[source,console]
----
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
----
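
Once the index exists, one way to check the `keyword_trim` analyzer is to call
the analyze API against the `trim_example` index and name the analyzer in the
request body. The Python sketch below only builds and prints such a request;
it does not contact a cluster. `analyze_request` is a hypothetical helper, and
the expectation that the response would match the earlier `trim` example is an
assumption rather than verified output.

```python
import json
from typing import Any, Dict, Tuple

INDEX = "trim_example"     # index name from the create index request above
ANALYZER = "keyword_trim"  # custom analyzer defined in that request


def analyze_request(text: str) -> Tuple[str, str, Dict[str, Any]]:
    """Build (method, path, body) for an analyze API call on the index."""
    path = f"/{INDEX}/_analyze"
    body = {"analyzer": ANALYZER, "text": text}
    return "GET", path, body


method, path, body = analyze_request(" fox ")
print(method, path)
print(json.dumps(body, indent=2))
```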