[Docs] Clarify behaviour of Pattern Capture Token Filter during search (#26278)

There was some confusion about the fact that tokens emitted from a Pattern Capture Token Filter are treated as synonyms when used to analyze a search query. This commit adds an explanation to the note in the docs to emphasize this behaviour.

Closes #25746
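
As a rough illustration of the behaviour described above, the following Python sketch mimics how captured tokens that share one position act as alternatives at search time. It is not Elasticsearch or Lucene code. The patterns, the helper names `pattern_capture` and `matches`, and the sample addresses are assumptions made for this example. Python's `re` module has no `\p{L}`, so `[^\W\d_]+` stands in for "letters". The sketch also assumes a single-token query and removes duplicate captures for simplicity.

```python
import re

# Hypothetical patterns modelled on an email-style pattern_capture setup.
# [^\W\d_]+ approximates \p{L}+ because Python's `re` lacks \p{L}.
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)"]


def pattern_capture(token, patterns=PATTERNS, preserve_original=True):
    """Return the original token plus every captured group.

    In the real filter all of these share one position; here they are
    simply collected into a list (duplicates dropped as a simplification).
    """
    emitted = [token] if preserve_original else []
    for pattern in patterns:
        for match in re.finditer(pattern, token):
            for group in match.groups():
                if group and group not in emitted:
                    emitted.append(group)
    return emitted


def matches(query, document_text):
    """Tokens at the same position behave like synonyms, so sharing any
    single token is enough for a match, regardless of the operator."""
    query_tokens = set(pattern_capture(query))
    doc_tokens = set(pattern_capture(document_text))
    return bool(query_tokens & doc_tokens)


query = "john-smith_123@foo-bar.com"
print(sorted(pattern_capture(query)))
print(matches(query, "bob@other.com"))      # shares only the token "com"
print(matches(query, "alice@example.org"))  # shares no token
```

Under these assumptions, a document holding an unrelated address such as `bob@other.com` still matches the query, because both sides emit the token `com`.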
2025-03-25 09:28:27 +00:00 · 2017-08-21 14:56:52 +02:00 · 2017-08-21 14:56:52 +02:00 · 254c1b28e9
commit 254c1b28e9
parent 181e881a0f
1 changed file with 6 additions and 4 deletions
--- a/docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc
@@ -131,10 +131,12 @@ Multiple patterns are required to allow overlapping captures, but also
 means that patterns are less dense and easier to understand.

 *Note:* All tokens are emitted in the same position, and with the same
-character offsets, so when combined with highlighting, the whole
-original token will be highlighted, not just the matching subset. For
-instance, querying the above email address for `"smith"` would
-highlight:
+character offsets. This means, for example, that a `match` query for
+`john-smith_123@foo-bar.com` that uses this analyzer will return documents
+containing any of these tokens, even when using the `and` operator.
+Also, when combined with highlighting, the whole original token will 
+be highlighted, not just the matching subset. For instance, querying 
+the above email address for `"smith"` would highlight:

 [source,html]
 --------------------------------------------------