OpenSearch/modules/analysis-common
Christoph Büscher 4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to inlcude into a token.

Closes #25894
2019-11-20 10:37:12 +01:00
..
src Allow custom characters in token_chars of ngram tokenizers (#49250) 2019-11-20 10:37:12 +01:00
build.gradle Apply 2-space indent to all gradle scripts (#49071) 2019-11-14 11:01:23 +00:00