From 9bb9f6336452f3f060f35f51049254215fae0ee0 Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Fri, 6 Mar 2020 07:27:14 -0500
Subject: [PATCH] [DOCS] Note that `trim` filter doesn't change offsets (#53220)

The [word delimiter graph token filter docs][0] note that the `trim`
filter changes the length of tokens without changing their offsets. This
explicitly mentions that in the `trim` filter docs.

[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-word-delimiter-graph-tokenfilter.html
---
 .../analysis/tokenfilters/trim-tokenfilter.asciidoc | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc b/docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
index 19d47f203af..b69f71b7bdc 100644
--- a/docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
@@ -4,7 +4,9 @@
 Trim
 ++++
 
-Removes leading and trailing whitespace from each token in a stream.
+Removes leading and trailing whitespace from each token in a stream. While this
+can change the length of a token, the `trim` filter does _not_ change a token's
+offsets.
 
 The `trim` filter uses Lucene's
 https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].
@@ -37,8 +39,9 @@ GET _analyze
 }
 ----
 
-The API returns the following response. Note the `" fox "` token contains
-the original text's whitespace.
+The API returns the following response. Note the `" fox "` token contains the
+original text's whitespace. Note that despite changing the token's length, the
+`start_offset` and `end_offset` remain the same.
 
 [source,console-result]
 ----
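
Reviewer note: the behavior this patch documents can be modeled with a small sketch. It is not Elasticsearch or Lucene code. The `Token` class and the `keyword_tokenize` and `trim` helpers are hypothetical names introduced only for illustration, and Python's `str.strip()` stands in for Lucene's whitespace handling, which may differ on some characters.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for an analysis token (not a real client type)."""
    text: str
    start_offset: int
    end_offset: int


def keyword_tokenize(text: str) -> Token:
    # Model of a keyword-style tokenizer: the whole input becomes one token,
    # so the offsets span the full original text.
    return Token(text, 0, len(text))


def trim(token: Token) -> Token:
    # Model of the documented behavior: the token text loses its leading and
    # trailing whitespace, but the offsets are carried over unchanged.
    return Token(token.text.strip(), token.start_offset, token.end_offset)


original = keyword_tokenize(" fox ")
trimmed = trim(original)
print(original)
print(trimmed)
```

In this model the trimmed token is three characters long while its offsets still cover all five characters of the input, which is the mismatch between token length and offsets that the new wording calls out.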