OpenSearch/docs/reference/analysis
Latest commit a646f85a99 by Alan Woodward (2018-11-29 10:35:38 +00:00):
Ensure TokenFilters only produce single tokens when parsing synonyms (#34331)
A number of token filters can produce multiple tokens at the same position.  This
is a problem when using token chains to parse synonym files, as the SynonymMap
requires that there are no stacked tokens in its input.
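
For illustration, a minimal sketch of how such a stacked stream arises (the sample
text and inline filter settings are arbitrary examples, not taken from the change
itself). With preserve_original enabled, asciifolding emits both the original and
the folded token:

    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [
        {
          "type": "asciifolding",
          "preserve_original": true
        }
      ],
      "text": "café"
    }

Both output tokens ("café" and "cafe") share position 0, which is exactly the
stacked shape the SynonymMap parser cannot accept.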

This commit ensures that, when used to parse synonyms, these token filters either
produce a single version of their input token or throw an error when mappings are
generated (see the sketch after the list below).  In indices created in
Elasticsearch 6.x, deprecation warnings are emitted in place of the error.

* asciifolding and cjk_bigram produce only the folded or bigrammed token
* decompounders, synonyms, and keyword_repeat are skipped
* n-grams, word-delimiter-filter, multiplexer, fingerprint, and phonetic throw errors
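
As a sketch of the fixed behaviour (the index, filter, and analyzer names here are
invented placeholders), consider a synonym filter preceded in its chain by
asciifolding with preserve_original.  When the rule "café => coffee" is parsed,
asciifolding now emits only the folded form "cafe", so the SynonymMap sees an
unstacked stream; at index and search time the filter still emits both forms:

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_folding": {
              "type": "asciifolding",
              "preserve_original": true
            },
            "my_synonyms": {
              "type": "synonym",
              "synonyms": ["café => coffee"]
            }
          },
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "filter": ["lowercase", "my_folding", "my_synonyms"]
            }
          }
        }
      }
    }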

Fixes #34298
Name                    Last commit date            Last commit message
analyzers               2018-11-21 09:00:48 +00:00  Add a prebuilt ICU Analyzer (#34958)
charfilters             2018-08-02 11:12:15 -04:00  fixed elements in array of produced terms (#32519)
tokenfilters            2018-11-29 10:35:38 +00:00  Ensure TokenFilters only produce single tokens when parsing synonyms (#34331)
tokenizers              2018-05-22 16:26:31 +02:00  [Feature] Adding a char_group tokenizer (#24186)
analyzers.asciidoc      2016-05-11 14:17:56 +02:00  First pass at improving analyzer docs (#18269)
anatomy.asciidoc        2016-11-25 14:30:49 +01:00  Correction of the names of numirals (#21531)
charfilters.asciidoc    2017-01-10 15:24:56 +01:00  Hindu-Arabico-Latino Numerals (#22476)
normalizers.asciidoc    2018-10-22 11:54:04 -07:00  Make sure to use the type _doc in the REST documentation. (#34662)
testing.asciidoc        2017-12-14 17:47:53 +01:00  Allow `_doc` as a type. (#27816)
tokenfilters.asciidoc   2018-09-11 09:16:39 +01:00  Add predicate_token_filter (#33431)
tokenizers.asciidoc     2018-05-22 16:26:31 +02:00  [Feature] Adding a char_group tokenizer (#24186)