[[analysis-keyword-analyzer]]
=== Keyword analyzer
++++
<titleabbrev>Keyword</titleabbrev>
++++

The `keyword` analyzer is a ``noop'' analyzer which returns the entire input
string as a single token.

[float]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "analyzer": "keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
---------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
      "start_offset": 0,
      "end_offset": 56,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////

The above sentence would produce the following single term:

[source,text]
---------------------------
[ The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. ]
---------------------------
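
The same single token should also come back when the request names the
`keyword` tokenizer directly, with no token filters, rather than the analyzer.
The following ad hoc request is a minimal sketch of that check. It assumes
nothing beyond the standard `tokenizer` parameter of the `_analyze` API:

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
---------------------------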

[float]
=== Configuration

The `keyword` analyzer is not configurable.

[float]
=== Definition

The `keyword` analyzer consists of:

Tokenizer::
* <<analysis-keyword-tokenizer,Keyword Tokenizer>>

To customize the `keyword` analyzer, recreate it as a `custom` analyzer and
modify it, typically by adding token filters. When you simply want strings
that are not split into tokens, prefer the <<keyword, Keyword type>> instead.
If you do need an analyzer, the following request recreates the built-in
`keyword` analyzer and can serve as a starting point for further
customization:

[source,console]
----------------------------------------------------
PUT /keyword_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_keyword": {
          "tokenizer": "keyword",
          "filter": [         <1>
          ]
        }
      }
    }
  }
}
----------------------------------------------------
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: keyword_example, first: keyword, second: rebuilt_keyword}\nendyaml\n/]

<1> You'd add any token filters here.
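
As an illustration of such a customization, the sketch below adds the
built-in `lowercase` token filter, so the single token is emitted in lower
case. The index name `keyword_lowercase_example` and the analyzer name
`lowercase_keyword` are hypothetical and chosen only for this example:

[source,console]
----------------------------------------------------
PUT /keyword_lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

POST /keyword_lowercase_example/_analyze
{
  "analyzer": "lowercase_keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
----------------------------------------------------

With this analyzer, the sentence should come back as one lowercased term:

[source,text]
---------------------------
[ the 2 quick brown-foxes jumped over the lazy dog's bone. ]
---------------------------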