[DOCS] Reformat trim token filter docs (#51649)

Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
2020-03-02 07:47:38 -05:00 · d336faa0b0
parent f5bccad847
commit d336faa0b0
1 changed file with 104 additions and 1 deletion
--- a/docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
@@ -4,4 +4,107 @@
 <titleabbrev>Trim</titleabbrev>
 ++++
-The `trim` token filter trims the whitespace surrounding a token.
+Removes leading and trailing whitespace from each token in a stream.
 The `trim` filter uses Lucene's
 https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].
 [TIP]
 ====
 Many commonly used tokenizers, such as the
 <<analysis-standard-tokenizer,`standard`>> or
 <<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
 default. When using these tokenizers, you don't need to add a separate `trim`
 filter.
 ====
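 As an illustrative sketch of the tip above, the following
 <<indices-analyze,analyze API>> request uses the `whitespace` tokenizer on
 text with leading and trailing spaces. The sample text `" fox "` is chosen
 here only for illustration. Because the tokenizer splits on whitespace, the
 resulting `fox` token should need no separate `trim` filter.
 [source,console]
 ----
 GET _analyze
 {
  "tokenizer" : "whitespace",
  "text" : " fox "
 }
 ----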
 [[analysis-trim-tokenfilter-analyze-ex]]
 ==== Example
 To see how the `trim` filter works, you first need to produce a token
 containing whitespace.
 The following <<indices-analyze,analyze API>> request uses the
 <<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for 
 `" fox "`.
 [source,console]
 ----
 GET _analyze
 {
  "tokenizer" : "keyword",
  "text" : " fox "
 }
 ----
 The API returns the following response. Note that the `" fox "` token keeps
 the leading and trailing whitespace from the original text.
 [source,console-result]
 ----
 {
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
 }
 ----
 To remove the whitespace, add the `trim` filter to the previous analyze API
 request.
 [source,console]
 ----
 GET _analyze
 {
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
 }
 ----
 The API returns the following response. The returned `fox` token does not
 include any leading or trailing whitespace. The token's offsets are unchanged
 and still span the original text.
 [source,console-result]
 ----
 {
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
 }
 ----
 [[analysis-trim-tokenfilter-analyzer-ex]]
 ==== Add to an analyzer
 The following <<indices-create-index,create index API>> request uses the `trim`
 filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.
 [source,console]
 ----
 PUT trim_example
 {
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
 }
 ----
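 To check the analyzer, you could run the analyze API against the new index.
 The following request is a minimal sketch that assumes the `trim_example`
 index and its `keyword_trim` analyzer were created as shown above. The sample
 text is illustrative. The request should return a single `fox` token with no
 surrounding whitespace.
 [source,console]
 ----
 GET trim_example/_analyze
 {
  "analyzer" : "keyword_trim",
  "text" : " fox "
 }
 ----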