59 lines
1.2 KiB
Plaintext
59 lines
1.2 KiB
Plaintext
[[analysis-keyword-tokenizer]]
|
|
=== Keyword Tokenizer
|
|
|
|
The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
|
|
is given and outputs the exact same text as a single term. It can be combined
|
|
with token filters to normalise output, e.g. lower-casing email addresses.
|
|
|
|
[float]
|
|
=== Example output
|
|
|
|
[source,console]
|
|
---------------------------
|
|
POST _analyze
|
|
{
|
|
"tokenizer": "keyword",
|
|
"text": "New York"
|
|
}
|
|
---------------------------
|
|
|
|
/////////////////////
|
|
|
|
[source,console-result]
|
|
----------------------------
|
|
{
|
|
"tokens": [
|
|
{
|
|
"token": "New York",
|
|
"start_offset": 0,
|
|
"end_offset": 8,
|
|
"type": "word",
|
|
"position": 0
|
|
}
|
|
]
|
|
}
|
|
----------------------------
|
|
|
|
/////////////////////
|
|
|
|
|
|
The above sentence would produce the following term:
|
|
|
|
[source,text]
|
|
---------------------------
|
|
[ New York ]
|
|
---------------------------
|
|
|
|
[float]
|
|
=== Configuration
|
|
|
|
The `keyword` tokenizer accepts the following parameters:
|
|
|
|
[horizontal]
|
|
`buffer_size`::
|
|
|
|
The number of characters read into the term buffer in a single pass.
|
|
Defaults to `256`. The term buffer will grow by this size until all the
|
|
text has been consumed. It is advisable not to change this setting.
|
|
|