OpenSearch/docs/reference/analysis
Christoph Büscher 4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both the `edgeNGram` and `ngram` tokenizers
only accepts a list of predefined character classes, which might not fit
every use case. For example, including an underscore ("_") in a token currently
requires the `punctuation` class, which pulls in many other characters.
This change adds a "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which is interpreted as the set of characters to include in a token.

Closes #25894
2019-11-20 10:37:12 +01:00
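As a rough sketch of how the new option can be used (the index, analyzer, and tokenizer names below are illustrative, not part of the change itself), an `ngram` tokenizer lists `"custom"` in `token_chars` and supplies the extra characters via `custom_token_chars`:

```
PUT my_ngram_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [ "letter", "digit", "custom" ],
          "custom_token_chars": "_-"
        }
      }
    }
  }
}
```

With these settings, underscore and hyphen count as token characters alongside letters and digits, so a value such as `foo_1` produces the 3-grams `foo`, `oo_`, and `o_1` instead of being split at the underscore.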
analyzers Implement Lucene EstonianAnalyzer, Stemmer (#49149) 2019-11-18 17:24:21 +01:00
charfilters Fixed grammar in pattern replace char filter docs. (#46546) 2019-09-10 11:04:07 -07:00
tokenfilters [DOCS] Reformat elision token filter docs (#49262) 2019-11-19 10:55:22 -05:00
tokenizers Allow custom characters in token_chars of ngram tokenizers (#49250) 2019-11-20 10:37:12 +01:00
analyzers.asciidoc [DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:47:25 -04:00
anatomy.asciidoc Correction of the names of numerals (#21531) 2016-11-25 14:30:49 +01:00
charfilters.asciidoc Hindu-Arabico-Latino Numerals (#22476) 2017-01-10 15:24:56 +01:00
normalizers.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
testing.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
tokenfilters.asciidoc [DOCS] Reformat compound word token filters (#49006) 2019-11-13 09:36:52 -05:00
tokenizers.asciidoc [DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:47:25 -04:00