[Docs] Clarify behaviour of Pattern Capture Token Filter during search (#26278)

There was some confusion about the fact that tokens emitted from a Pattern
Capture Token Filter are treated as synonyms when used to analyze a search
query. This commit adds an explanation to the note in the docs to emphasize this
behaviour.

Closes #25746
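The following is a minimal Python model of the synonym behaviour described above. It is not Elasticsearch's or Lucene's code. `pattern_capture`, `build_match_query`, `matches` and `PATTERNS` are hypothetical names. The patterns are illustrative assumptions modelled on an email example, and `[^\W\d_]+` stands in for `\p{L}+` because Python's `re` lacks Unicode property classes. The sketch stacks every captured token at one position, groups query tokens by position into synonym clauses, and shows that `and` then has a single clause to satisfy.

```python
import re
from itertools import groupby

# Hypothetical capture patterns, assumed for illustration. Python's `re` has no
# \p{L}, so [^\W\d_]+ stands in for "one or more Unicode letters".
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)"]


def pattern_capture(token, patterns=PATTERNS, preserve_original=True):
    """Emit the original token plus every non-empty capture-group match.

    Every emitted token gets position 0, modelling how all tokens are emitted
    in the same position. Duplicates are dropped (a simplification of this sketch).
    """
    emitted = [token] if preserve_original else []
    for pattern in patterns:
        for match in re.finditer(pattern, token):
            for group in match.groups():
                if group and group not in emitted:
                    emitted.append(group)
    return [(0, text) for text in emitted]


def build_match_query(tokens, operator="or"):
    """Group (position, text) tokens into one clause per position.

    Tokens sharing a position are alternatives (synonyms) inside their clause;
    `operator` only decides how the clauses are combined.
    """
    clauses = [
        {text for _, text in group}
        for _, group in groupby(sorted(tokens), key=lambda tok: tok[0])
    ]
    return operator, clauses


def matches(doc_terms, query):
    """True if a document's set of indexed terms satisfies the query."""
    operator, clauses = query
    hits = [bool(clause & doc_terms) for clause in clauses]
    return all(hits) if operator == "and" else any(hits)


if __name__ == "__main__":
    tokens = pattern_capture("john-smith_123@foo-bar.com")
    query = build_match_query(tokens, operator="and")
    print(sorted(text for _, text in tokens))
    print(len(query[1]))              # 1: every token shares position 0
    print(matches({"smith"}, query))  # True: one token satisfies "and"
```

In this model the `and` operator combines only one clause, so a document holding any single emitted token (for example just `smith` or `bar`) matches. A query whose tokens sat at two different positions would need a hit in both.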
Christoph Büscher 2017-08-21 14:56:52 +02:00 committed by GitHub
parent 181e881a0f
commit 254c1b28e9
1 changed file with 6 additions and 4 deletions


@@ -131,10 +131,12 @@ Multiple patterns are required to allow overlapping captures, but also
means that patterns are less dense and easier to understand.
*Note:* All tokens are emitted in the same position, and with the same
-character offsets, so when combined with highlighting, the whole
-original token will be highlighted, not just the matching subset. For
-instance, querying the above email address for `"smith"` would
-highlight:
+character offsets. This means, for example, that a `match` query for
+`john-smith_123@foo-bar.com` that uses this analyzer will return documents
+containing any of these tokens, even when using the `and` operator.
+Also, when combined with highlighting, the whole original token will
+be highlighted, not just the matching subset. For instance, querying
+the above email address for `"smith"` would highlight:
[source,html]
--------------------------------------------------