[[analysis-simple-analyzer]]
=== Simple analyzer
++++
<titleabbrev>Simple</titleabbrev>
++++

The `simple` analyzer breaks text into tokens at any non-letter character, such
as numbers, spaces, hyphens and apostrophes, discards non-letter characters,
and changes uppercase to lowercase.

[[analysis-simple-analyzer-ex]]
==== Example

[source,console]
----
POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
----

////

[source,console-result]
----
{
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "quick",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 12,
      "end_offset": 17,
      "type": "word",
      "position": 2
    },
    {
      "token": "foxes",
      "start_offset": 18,
      "end_offset": 23,
      "type": "word",
      "position": 3
    },
    {
      "token": "jumped",
      "start_offset": 24,
      "end_offset": 30,
      "type": "word",
      "position": 4
    },
    {
      "token": "over",
      "start_offset": 31,
      "end_offset": 35,
      "type": "word",
      "position": 5
    },
    {
      "token": "the",
      "start_offset": 36,
      "end_offset": 39,
      "type": "word",
      "position": 6
    },
    {
      "token": "lazy",
      "start_offset": 40,
      "end_offset": 44,
      "type": "word",
      "position": 7
    },
    {
      "token": "dog",
      "start_offset": 45,
      "end_offset": 48,
      "type": "word",
      "position": 8
    },
    {
      "token": "s",
      "start_offset": 49,
      "end_offset": 50,
      "type": "word",
      "position": 9
    },
    {
      "token": "bone",
      "start_offset": 51,
      "end_offset": 55,
      "type": "word",
      "position": 10
    }
  ]
}
----

////

The `simple` analyzer parses the sentence and produces the following
tokens:

[source,text]
----
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
----
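
The splitting rule can be approximated outside {es} to see why these tokens
come out. The following Python sketch is an illustration only, not the
analyzer's actual implementation: the function name `simple_analyze` is
hypothetical, and it assumes that Python's `str.isalpha()` and `str.lower()`
are close enough to the analyzer's notion of a letter and of lowercasing.
Offsets here count Python string positions.

```python
from typing import Dict, List, Union

Token = Dict[str, Union[str, int]]


def simple_analyze(text: str) -> List[Token]:
    """Approximate the `simple` analyzer: split at non-letters, lowercase."""
    tokens: List[Token] = []
    start = None  # start offset of the letter run being scanned, if any
    # The trailing space is a sentinel that flushes a token ending at the
    # very end of the text.
    for i, ch in enumerate(text + " "):
        if ch.isalpha():
            if start is None:
                start = i
        elif start is not None:
            tokens.append({
                "token": text[start:i].lower(),
                "start_offset": start,
                "end_offset": i,
                "position": len(tokens),
            })
            start = None
    return tokens


tokens = simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
terms = [t["token"] for t in tokens]
print(terms)
```

The digit `2` is dropped entirely, the hyphen splits `Brown-Foxes` into two
tokens, and the apostrophe splits `dog's` into `dog` and `s`.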

[[analysis-simple-analyzer-definition]]
==== Definition

The `simple` analyzer is defined by one tokenizer:

Tokenizer::
* <<analysis-lowercase-tokenizer, Lowercase Tokenizer>>

[[analysis-simple-analyzer-customize]]
==== Customize

To customize the `simple` analyzer, duplicate it to create the basis for
a custom analyzer. This custom analyzer can be modified as required, usually by
adding token filters.

[source,console]
----
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_simple_analyzer": {
          "tokenizer": "lowercase",
          "filter": [                          <1>
          ]
        }
      }
    }
  }
}
----
<1> Add token filters here.
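
As one possible customization, the Python sketch below builds the same settings
body with a single token filter filled in. The choice of the built-in `stop`
filter is an assumption made for illustration; any token filter could take its
place. The sketch only constructs and prints the JSON body. Sending it with
`PUT /my-index-000001` is left to whichever client you use.

```python
import json

# Same settings as the request above, with one token filter added.
# The `stop` filter is an illustrative choice, not a recommendation.
body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_simple_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": ["stop"],
                }
            }
        }
    }
}

print(json.dumps(body, indent=2))
```

Filters run in the order they are listed, after the `lowercase` tokenizer has
already split and lowercased the text.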