4ffa050735
Currently the `token_chars` setting of both the `edgeNGram` and `ngram` tokenizers only accepts a list of predefined character classes, which might not fit every use case. For example, including the underscore "_" in tokens currently requires the `punctuation` class, which brings in many other characters as well. This change adds a "custom" option to the `token_chars` setting. The option requires an additional `custom_token_chars` setting, whose value is interpreted as the set of characters to include in tokens. Closes #25894
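To illustrate the idea, here is a simplified Python model of how `token_chars` and `custom_token_chars` could decide token boundaries before n-grams are emitted. It is a sketch, not Elasticsearch's (Java/Lucene) implementation: the character-class predicates only approximate the real ones, the gram order may differ, and the helper names (`make_token_char_matcher`, `split_on_non_token_chars`, `grams`, `tokenize`) and the `underscore_ngrams` definition are hypothetical. Only the setting keys (`type`, `min_gram`, `max_gram`, `token_chars`, `custom_token_chars`) follow the names used in this change.

```python
import unicodedata

# Approximate character classes (NOT Elasticsearch's own definitions).
CHAR_CLASSES = {
    "letter": str.isalpha,
    "digit": str.isdigit,
    "whitespace": str.isspace,
    "punctuation": lambda c: unicodedata.category(c).startswith("P"),
    "symbol": lambda c: unicodedata.category(c).startswith("S"),
}


def make_token_char_matcher(config):
    """Build a predicate deciding whether a character belongs in a token."""
    token_chars = config.get("token_chars", [])
    if not token_chars:
        return lambda c: True  # empty list: keep all characters
    custom = set()
    if "custom" in token_chars:
        # The "custom" class requires custom_token_chars to be present.
        if not config.get("custom_token_chars"):
            raise ValueError("token_chars 'custom' requires custom_token_chars")
        custom = set(config["custom_token_chars"])
    predicates = [CHAR_CLASSES[name] for name in token_chars if name != "custom"]
    return lambda c: c in custom or any(p(c) for p in predicates)


def split_on_non_token_chars(text, is_token_char):
    """Split text into maximal runs of token characters."""
    tokens, current = [], []
    for c in text:
        if is_token_char(c):
            current.append(c)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def grams(token, min_gram, max_gram, edge=False):
    """Emit n-grams of one run; edge=True keeps only prefixes (edge n-grams)."""
    starts = [0] if edge else range(len(token))
    out = []
    for start in starts:
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(token):
                out.append(token[start:start + n])
    return out


def tokenize(text, config):
    is_token_char = make_token_char_matcher(config)
    edge = config["type"] in ("edge_ngram", "edgeNGram")
    return [
        g
        for run in split_on_non_token_chars(text, is_token_char)
        for g in grams(run, config["min_gram"], config["max_gram"], edge)
    ]


# Hypothetical tokenizer definition: letters, digits, plus "_" only.
underscore_ngrams = {
    "type": "ngram",
    "min_gram": 2,
    "max_gram": 3,
    "token_chars": ["letter", "digit", "custom"],
    "custom_token_chars": "_",
}

print(tokenize("my_id", underscore_ngrams))
# ['my', 'my_', 'y_', 'y_i', '_i', '_id', 'id']
```

In this model, splitting `"my_field-name 42"` with `["letter", "digit", "custom"]` and `"_"` keeps `my_field` together while `-` still breaks tokens, whereas the `punctuation` class would also absorb the `-` and yield `my_field-name`.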