OpenSearch/docs/reference/analysis/tokenizers/keyword-tokenizer.asciidoc

59 lines
1.2 KiB
Plaintext
Raw Normal View History

[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer
The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.
[float]
=== Example output
[source,console]
---------------------------
POST _analyze
{
"tokenizer": "keyword",
"text": "New York"
}
---------------------------
/////////////////////
[source,console-result]
----------------------------
{
"tokens": [
{
"token": "New York",
"start_offset": 0,
"end_offset": 8,
"type": "word",
"position": 0
}
]
}
----------------------------
/////////////////////
The above sentence would produce the following term:
[source,text]
---------------------------
[ New York ]
---------------------------
[float]
=== Configuration
The `keyword` tokenizer accepts the following parameters:
[horizontal]
`buffer_size`::
The number of characters read into the term buffer in a single pass.
Defaults to `256`. The term buffer will grow by this size until all the
text has been consumed. It is advisable not to change this setting.