[[analysis-standard-tokenizer]]
=== Standard Tokenizer
A tokenizer of type `standard` that provides grammar-based tokenization
and works well for most European-language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
http://unicode.org/reports/tr29/[Unicode Standard Annex #29].

The following settings can be configured for a `standard` tokenizer:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_length` |The maximum token length. If a token exceeds this
length, it is split at `max_token_length` intervals. Defaults to `255`.
|=======================================================================
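
As a minimal sketch, the following index settings define a custom
analyzer built on a `standard` tokenizer with a reduced
`max_token_length` (the index, tokenizer, and analyzer names here are
illustrative, not required by the API):

[source,js]
--------------------------------------------------
PUT /my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_standard_tokenizer": {
          "type": "standard",
          "max_token_length": 5
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_standard_tokenizer"
        }
      }
    }
  }
}
--------------------------------------------------

With `max_token_length` set to `5`, a longer token such as `jumped`
would be split at 5-character intervals and emitted as `jumpe` and `d`.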