Improvements to docs around multiplexer and synonyms (#41645)

This commit fixes a multiplexer doc error concerning synonyms, and adds
suggestions on how to combine the two filters.
Alan Woodward 2019-05-07 09:09:28 +01:00
parent c808badb23
commit 3a35427b6d
3 changed files with 13 additions and 6 deletions

@@ -116,9 +116,8 @@ And it'd respond:
 duplicate of this token it has been removed from the token stream
 NOTE: The synonym and synonym_graph filters use their preceding analysis chain to
-parse and analyse their synonym lists, and ignore any token filters in the chain
-that produce multiple tokens at the same position. This means that any filters
-within the multiplexer will be ignored for the purpose of synonyms. If you want to
-use filters contained within the multiplexer for parsing synonyms (for example, to
-apply stemming to the synonym lists), then you should append the synonym filter
-to the relevant multiplexer filter list.
+parse and analyse their synonym lists, and will throw an exception if that chain
+contains token filters that produce multiple tokens at the same position.
+If you want to apply synonyms to a token stream containing a multiplexer, then you
+should append the synonym filter to each relevant multiplexer filter list, rather than
+placing it after the multiplexer in the main token chain definition.

@@ -188,6 +188,10 @@ parsing synonyms, e.g. `asciifolding` will only produce the folded version of th
 token. Others, e.g. `multiplexer`, `word_delimiter_graph` or `ngram` will throw an
 error.
+If you need to build analyzers that include both multi-token filters and synonym
+filters, consider using the <<analysis-multiplexer-tokenfilter,multiplexer>> filter,
+with the multi-token filters in one branch and the synonym filter in the other.
 WARNING: The synonym rules should not contain words that are removed by
 a filter that appears after in the chain (a `stop` filter for instance).
 Removing a term from a synonym rule breaks the matching at query time.

@@ -177,3 +177,7 @@ multiple versions of a token may choose which version of the token to emit when
 parsing synonyms, e.g. `asciifolding` will only produce the folded version of the
 token. Others, e.g. `multiplexer`, `word_delimiter_graph` or `ngram` will throw an
 error.
+If you need to build analyzers that include both multi-token filters and synonym
+filters, consider using the <<analysis-multiplexer-tokenfilter,multiplexer>> filter,
+with the multi-token filters in one branch and the synonym filter in the other.
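
For illustration, the arrangement these doc changes recommend might look like the index settings below. This is a minimal sketch, not text from the commit: the names `my_synonyms`, `my_multiplexer` and `my_analyzer`, the sample synonym rule, and the choice of `lowercase` and `porter_stem` are all assumptions made for the example. The synonym filter sits at the end of one multiplexer branch instead of after the multiplexer in the analyzer's main filter chain, and a second branch applies stemming independently.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["quick, fast"]
        },
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": [
            "lowercase, my_synonyms",
            "lowercase, porter_stem"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["my_multiplexer"]
        }
      }
    }
  }
}
```

Placing `my_synonyms` directly in the analyzer's `filter` list after `my_multiplexer` is the layout the updated NOTE says will throw an exception, since the synonym filter would then be preceded by a filter that emits multiple tokens at the same position.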