mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-08 14:05:27 +00:00
36a3b84fc9
* Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same appraoch as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.
278 lines
5.5 KiB
Plaintext
278 lines
5.5 KiB
Plaintext
[[analysis-stop-analyzer]]
|
|
=== Stop Analyzer
|
|
|
|
The `stop` analyzer is the same as the <<analysis-simple-analyzer,`simple` analyzer>>
|
|
but adds support for removing stop words. It defaults to using the
|
|
`_english_` stop words.
|
|
|
|
[float]
|
|
=== Example output
|
|
|
|
[source,js]
|
|
---------------------------
|
|
POST _analyze
|
|
{
|
|
"analyzer": "stop",
|
|
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
|
|
}
|
|
---------------------------
|
|
// CONSOLE
|
|
|
|
/////////////////////
|
|
|
|
[source,js]
|
|
----------------------------
|
|
{
|
|
"tokens": [
|
|
{
|
|
"token": "quick",
|
|
"start_offset": 6,
|
|
"end_offset": 11,
|
|
"type": "word",
|
|
"position": 1
|
|
},
|
|
{
|
|
"token": "brown",
|
|
"start_offset": 12,
|
|
"end_offset": 17,
|
|
"type": "word",
|
|
"position": 2
|
|
},
|
|
{
|
|
"token": "foxes",
|
|
"start_offset": 18,
|
|
"end_offset": 23,
|
|
"type": "word",
|
|
"position": 3
|
|
},
|
|
{
|
|
"token": "jumped",
|
|
"start_offset": 24,
|
|
"end_offset": 30,
|
|
"type": "word",
|
|
"position": 4
|
|
},
|
|
{
|
|
"token": "over",
|
|
"start_offset": 31,
|
|
"end_offset": 35,
|
|
"type": "word",
|
|
"position": 5
|
|
},
|
|
{
|
|
"token": "lazy",
|
|
"start_offset": 40,
|
|
"end_offset": 44,
|
|
"type": "word",
|
|
"position": 7
|
|
},
|
|
{
|
|
"token": "dog",
|
|
"start_offset": 45,
|
|
"end_offset": 48,
|
|
"type": "word",
|
|
"position": 8
|
|
},
|
|
{
|
|
"token": "s",
|
|
"start_offset": 49,
|
|
"end_offset": 50,
|
|
"type": "word",
|
|
"position": 9
|
|
},
|
|
{
|
|
"token": "bone",
|
|
"start_offset": 51,
|
|
"end_offset": 55,
|
|
"type": "word",
|
|
"position": 10
|
|
}
|
|
]
|
|
}
|
|
----------------------------
|
|
// TESTRESPONSE
|
|
|
|
/////////////////////
|
|
|
|
|
|
The above sentence would produce the following terms:
|
|
|
|
[source,text]
|
|
---------------------------
|
|
[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
|
|
---------------------------
|
|
|
|
[float]
|
|
=== Configuration
|
|
|
|
The `stop` analyzer accepts the following parameters:
|
|
|
|
[horizontal]
|
|
`stopwords`::
|
|
|
|
A pre-defined stop words list like `_english_` or an array containing a
|
|
list of stop words. Defaults to `_english_`.
|
|
|
|
`stopwords_path`::
|
|
|
|
The path to a file containing stop words. This path is relative to the
|
|
Elasticsearch `config` directory.
|
|
|
|
|
|
See the <<analysis-stop-tokenfilter,Stop Token Filter>> for more information
|
|
about stop word configuration.
|
|
|
|
[float]
|
|
=== Example configuration
|
|
|
|
In this example, we configure the `stop` analyzer to use a specified list of
|
|
words as stop words:
|
|
|
|
[source,js]
|
|
----------------------------
|
|
PUT my_index?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"analyzer": {
|
|
"my_stop_analyzer": {
|
|
"type": "stop",
|
|
"stopwords": ["the", "over"]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
POST my_index/_analyze
|
|
{
|
|
"analyzer": "my_stop_analyzer",
|
|
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
|
|
}
|
|
----------------------------
|
|
// CONSOLE
|
|
|
|
/////////////////////
|
|
|
|
[source,js]
|
|
----------------------------
|
|
{
|
|
"tokens": [
|
|
{
|
|
"token": "quick",
|
|
"start_offset": 6,
|
|
"end_offset": 11,
|
|
"type": "word",
|
|
"position": 1
|
|
},
|
|
{
|
|
"token": "brown",
|
|
"start_offset": 12,
|
|
"end_offset": 17,
|
|
"type": "word",
|
|
"position": 2
|
|
},
|
|
{
|
|
"token": "foxes",
|
|
"start_offset": 18,
|
|
"end_offset": 23,
|
|
"type": "word",
|
|
"position": 3
|
|
},
|
|
{
|
|
"token": "jumped",
|
|
"start_offset": 24,
|
|
"end_offset": 30,
|
|
"type": "word",
|
|
"position": 4
|
|
},
|
|
{
|
|
"token": "lazy",
|
|
"start_offset": 40,
|
|
"end_offset": 44,
|
|
"type": "word",
|
|
"position": 7
|
|
},
|
|
{
|
|
"token": "dog",
|
|
"start_offset": 45,
|
|
"end_offset": 48,
|
|
"type": "word",
|
|
"position": 8
|
|
},
|
|
{
|
|
"token": "s",
|
|
"start_offset": 49,
|
|
"end_offset": 50,
|
|
"type": "word",
|
|
"position": 9
|
|
},
|
|
{
|
|
"token": "bone",
|
|
"start_offset": 51,
|
|
"end_offset": 55,
|
|
"type": "word",
|
|
"position": 10
|
|
}
|
|
]
|
|
}
|
|
----------------------------
|
|
// TESTRESPONSE
|
|
|
|
/////////////////////
|
|
|
|
|
|
The above example produces the following terms:
|
|
|
|
[source,text]
|
|
---------------------------
|
|
[ quick, brown, foxes, jumped, lazy, dog, s, bone ]
|
|
---------------------------
|
|
|
|
[float]
|
|
=== Definition
|
|
|
|
It consists of:
|
|
|
|
Tokenizer::
|
|
* <<analysis-lowercase-tokenizer,Lower Case Tokenizer>>
|
|
|
|
Token filters::
|
|
* <<analysis-stop-tokenfilter,Stop Token Filter>>
|
|
|
|
If you need to customize the `stop` analyzer beyond the configuration
|
|
parameters then you need to recreate it as a `custom` analyzer and modify
|
|
it, usually by adding token filters. This would recreate the built-in
|
|
`stop` analyzer and you can use it as a starting point for further
|
|
customization:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /stop_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"english_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_english_" <1>
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_stop": {
|
|
"tokenizer": "lowercase",
|
|
"filter": [
|
|
"english_stop" <2>
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: stop_example, first: stop, second: rebuilt_stop}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> You'd add any token filters after `english_stop`.
|