[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer

The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.
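
As a sketch of combining the tokenizer with a token filter, the request below
adds the built-in `lowercase` token filter to an `_analyze` call. The email
address is a made-up example.

[source,js]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "John.SMITH@example.COM"
}
---------------------------
// CONSOLE

The whole input should come back as one lower-cased term:

[source,text]
---------------------------
[ john.smith@example.com ]
---------------------------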

[float]
=== Example output

[source,js]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
---------------------------
// CONSOLE

/////////////////////

[source,js]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------
// TESTRESPONSE

/////////////////////

The above text would produce the following single term:

[source,text]
---------------------------
[ New York ]
---------------------------

[float]
=== Configuration

The `keyword` tokenizer accepts the following parameters:

[horizontal]
`buffer_size`::

    The number of characters read into the term buffer in a single pass.
    Defaults to `256`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.
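
A minimal sketch of where the parameter goes when the tokenizer is configured
as part of a custom analyzer. The index name `my_index` and the names
`my_analyzer` and `my_tokenizer` are placeholders chosen for illustration, and
`buffer_size` is left at its default of `256`, in line with the advice above
not to change it.

[source,js]
----------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword",
          "buffer_size": 256
        }
      }
    }
  }
}
----------------------------
// CONSOLE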