OpenSearch/docs/reference/analysis
Christoph Büscher 4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both the `edgeNGram` and `ngram` tokenizers
only accepts a list of predefined character classes, which might not fit
every use case. For example, including an underscore ("_") in a token currently
requires the `punctuation` class, which pulls in many other characters.
This change adds a "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which is interpreted as the set of characters to include in a token.

Closes #25894
2019-11-20 10:37:12 +01:00
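As a rough sketch of how the new option can be used (the index, analyzer, and tokenizer names below are illustrative, not part of the change itself), an `ngram` tokenizer lists `"custom"` in `token_chars` and supplies the extra characters via `custom_token_chars`:

```
PUT my_ngram_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [ "letter", "digit", "custom" ],
          "custom_token_chars": "_-"
        }
      }
    }
  }
}
```

With these settings, underscore and hyphen count as token characters alongside letters and digits, so a value such as `foo_1` produces the 3-grams `foo`, `oo_`, and `o_1` instead of being split at the underscore.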
analyzers Implement Lucene EstonianAnalyzer, Stemmer (#49149) 2019-11-18 17:24:21 +01:00
charfilters Fixed grammar in pattern replace char filter docs. (#46546) 2019-09-10 11:04:07 -07:00
tokenfilters [DOCS] Reformat elision token filter docs (#49262) 2019-11-19 10:55:22 -05:00
tokenizers Allow custom characters in token_chars of ngram tokenizers (#49250) 2019-11-20 10:37:12 +01:00
analyzers.asciidoc [DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:47:25 -04:00
anatomy.asciidoc Correction of the names of numerals (#21531) 2016-11-25 14:30:49 +01:00
charfilters.asciidoc Hindu-Arabico-Latino Numerals (#22476) 2017-01-10 15:24:56 +01:00
normalizers.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
testing.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502) 2019-09-09 13:38:14 -04:00
tokenfilters.asciidoc [DOCS] Reformat compound word token filters (#49006) 2019-11-13 09:36:52 -05:00
tokenizers.asciidoc [DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:47:25 -04:00