mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-06 21:18:31 +00:00
* Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same appraoch as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.
1773 lines
49 KiB
Plaintext
1773 lines
49 KiB
Plaintext
[[analysis-lang-analyzer]]
|
|
=== Language Analyzers
|
|
|
|
A set of analyzers aimed at analyzing specific language text. The
|
|
following types are supported:
|
|
<<arabic-analyzer,`arabic`>>,
|
|
<<armenian-analyzer,`armenian`>>,
|
|
<<basque-analyzer,`basque`>>,
|
|
<<bengali-analyzer,`bengali`>>,
|
|
<<brazilian-analyzer,`brazilian`>>,
|
|
<<bulgarian-analyzer,`bulgarian`>>,
|
|
<<catalan-analyzer,`catalan`>>,
|
|
<<cjk-analyzer,`cjk`>>,
|
|
<<czech-analyzer,`czech`>>,
|
|
<<danish-analyzer,`danish`>>,
|
|
<<dutch-analyzer,`dutch`>>,
|
|
<<english-analyzer,`english`>>,
|
|
<<finnish-analyzer,`finnish`>>,
|
|
<<french-analyzer,`french`>>,
|
|
<<galician-analyzer,`galician`>>,
|
|
<<german-analyzer,`german`>>,
|
|
<<greek-analyzer,`greek`>>,
|
|
<<hindi-analyzer,`hindi`>>,
|
|
<<hungarian-analyzer,`hungarian`>>,
|
|
<<indonesian-analyzer,`indonesian`>>,
|
|
<<irish-analyzer,`irish`>>,
|
|
<<italian-analyzer,`italian`>>,
|
|
<<latvian-analyzer,`latvian`>>,
|
|
<<lithuanian-analyzer,`lithuanian`>>,
|
|
<<norwegian-analyzer,`norwegian`>>,
|
|
<<persian-analyzer,`persian`>>,
|
|
<<portuguese-analyzer,`portuguese`>>,
|
|
<<romanian-analyzer,`romanian`>>,
|
|
<<russian-analyzer,`russian`>>,
|
|
<<sorani-analyzer,`sorani`>>,
|
|
<<spanish-analyzer,`spanish`>>,
|
|
<<swedish-analyzer,`swedish`>>,
|
|
<<turkish-analyzer,`turkish`>>,
|
|
<<thai-analyzer,`thai`>>.
|
|
|
|
==== Configuring language analyzers
|
|
|
|
===== Stopwords
|
|
|
|
All analyzers support setting custom `stopwords` either internally in
|
|
the config, or by using an external stopwords file by setting
|
|
`stopwords_path`. Check <<analysis-stop-analyzer,Stop Analyzer>> for
|
|
more details.
|
|
|
|
===== Excluding words from stemming
|
|
|
|
The `stem_exclusion` parameter allows you to specify an array
|
|
of lowercase words that should not be stemmed. Internally, this
|
|
functionality is implemented by adding the
|
|
<<analysis-keyword-marker-tokenfilter,`keyword_marker` token filter>>
|
|
with the `keywords` set to the value of the `stem_exclusion` parameter.
|
|
|
|
The following analyzers support setting custom `stem_exclusion` list:
|
|
`arabic`, `armenian`, `basque`, `bengali`, `bulgarian`, `catalan`, `czech`,
|
|
`dutch`, `english`, `finnish`, `french`, `galician`,
|
|
`german`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`,
|
|
`lithuanian`, `norwegian`, `portuguese`, `romanian`, `russian`, `sorani`,
|
|
`spanish`, `swedish`, `turkish`.
|
|
|
|
==== Reimplementing language analyzers
|
|
|
|
The built-in language analyzers can be reimplemented as `custom` analyzers
|
|
(as described below) in order to customize their behaviour.
|
|
|
|
NOTE: If you do not intend to exclude words from being stemmed (the
|
|
equivalent of the `stem_exclusion` parameter above), then you should remove
|
|
the `keyword_marker` token filter from the custom analyzer configuration.
|
|
|
|
[[arabic-analyzer]]
|
|
===== `arabic` analyzer
|
|
|
|
The `arabic` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /arabic_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"arabic_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_arabic_" <1>
|
|
},
|
|
"arabic_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["مثال"] <2>
|
|
},
|
|
"arabic_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "arabic"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_arabic": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"decimal_digit",
|
|
"arabic_stop",
|
|
"arabic_normalization",
|
|
"arabic_keywords",
|
|
"arabic_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"arabic_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: arabic_example, first: arabic, second: rebuilt_arabic}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[armenian-analyzer]]
|
|
===== `armenian` analyzer
|
|
|
|
The `armenian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /armenian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"armenian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_armenian_" <1>
|
|
},
|
|
"armenian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["օրինակ"] <2>
|
|
},
|
|
"armenian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "armenian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_armenian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"armenian_stop",
|
|
"armenian_keywords",
|
|
"armenian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"armenian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: armenian_example, first: armenian, second: rebuilt_armenian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[basque-analyzer]]
|
|
===== `basque` analyzer
|
|
|
|
The `basque` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /basque_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"basque_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_basque_" <1>
|
|
},
|
|
"basque_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["Adibidez"] <2>
|
|
},
|
|
"basque_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "basque"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_basque": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"basque_stop",
|
|
"basque_keywords",
|
|
"basque_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"basque_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: basque_example, first: basque, second: rebuilt_basque}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[bengali-analyzer]]
|
|
===== `bengali` analyzer
|
|
|
|
The `bengali` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /bengali_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"bengali_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_bengali_" <1>
|
|
},
|
|
"bengali_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["উদাহরণ"] <2>
|
|
},
|
|
"bengali_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "bengali"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_bengali": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"decimal_digit",
|
|
"bengali_keywords",
|
|
"indic_normalization",
|
|
"bengali_normalization",
|
|
"bengali_stop",
|
|
"bengali_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"bengali_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: bengali_example, first: bengali, second: rebuilt_bengali}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[brazilian-analyzer]]
|
|
===== `brazilian` analyzer
|
|
|
|
The `brazilian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /brazilian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"brazilian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_brazilian_" <1>
|
|
},
|
|
"brazilian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["exemplo"] <2>
|
|
},
|
|
"brazilian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "brazilian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_brazilian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"brazilian_stop",
|
|
"brazilian_keywords",
|
|
"brazilian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"brazilian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: brazilian_example, first: brazilian, second: rebuilt_brazilian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[bulgarian-analyzer]]
|
|
===== `bulgarian` analyzer
|
|
|
|
The `bulgarian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /bulgarian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"bulgarian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_bulgarian_" <1>
|
|
},
|
|
"bulgarian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["пример"] <2>
|
|
},
|
|
"bulgarian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "bulgarian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_bulgarian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"bulgarian_stop",
|
|
"bulgarian_keywords",
|
|
"bulgarian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"bulgarian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: bulgarian_example, first: bulgarian, second: rebuilt_bulgarian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[catalan-analyzer]]
|
|
===== `catalan` analyzer
|
|
|
|
The `catalan` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /catalan_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"catalan_elision": {
|
|
"type": "elision",
|
|
"articles": [ "d", "l", "m", "n", "s", "t"],
|
|
"articles_case": true
|
|
},
|
|
"catalan_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_catalan_" <1>
|
|
},
|
|
"catalan_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["exemple"] <2>
|
|
},
|
|
"catalan_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "catalan"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_catalan": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"catalan_elision",
|
|
"lowercase",
|
|
"catalan_stop",
|
|
"catalan_keywords",
|
|
"catalan_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"catalan_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: catalan_example, first: catalan, second: rebuilt_catalan}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[cjk-analyzer]]
|
|
===== `cjk` analyzer
|
|
|
|
NOTE: You may find that `icu_analyzer` in the ICU analysis plugin works better
|
|
for CJK text than the `cjk` analyzer. Experiment with your text and queries.
|
|
|
|
The `cjk` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /cjk_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"english_stop": {
|
|
"type": "stop",
|
|
"stopwords": [ <1>
|
|
"a", "and", "are", "as", "at", "be", "but", "by", "for",
|
|
"if", "in", "into", "is", "it", "no", "not", "of", "on",
|
|
"or", "s", "such", "t", "that", "the", "their", "then",
|
|
"there", "these", "they", "this", "to", "was", "will",
|
|
"with", "www"
|
|
]
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_cjk": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"cjk_width",
|
|
"lowercase",
|
|
"cjk_bigram",
|
|
"english_stop"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"cjk_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: cjk_example, first: cjk, second: rebuilt_cjk}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters. The default stop words are
|
|
*almost* the same as the `_english_` set, but not exactly
|
|
the same.
|
|
|
|
[[czech-analyzer]]
|
|
===== `czech` analyzer
|
|
|
|
The `czech` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /czech_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"czech_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_czech_" <1>
|
|
},
|
|
"czech_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["příklad"] <2>
|
|
},
|
|
"czech_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "czech"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_czech": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"czech_stop",
|
|
"czech_keywords",
|
|
"czech_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"czech_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: czech_example, first: czech, second: rebuilt_czech}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[danish-analyzer]]
|
|
===== `danish` analyzer
|
|
|
|
The `danish` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /danish_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"danish_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_danish_" <1>
|
|
},
|
|
"danish_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["eksempel"] <2>
|
|
},
|
|
"danish_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "danish"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_danish": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"danish_stop",
|
|
"danish_keywords",
|
|
"danish_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"danish_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: danish_example, first: danish, second: rebuilt_danish}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[dutch-analyzer]]
|
|
===== `dutch` analyzer
|
|
|
|
The `dutch` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /dutch_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"dutch_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_dutch_" <1>
|
|
},
|
|
"dutch_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["voorbeeld"] <2>
|
|
},
|
|
"dutch_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "dutch"
|
|
},
|
|
"dutch_override": {
|
|
"type": "stemmer_override",
|
|
"rules": [
|
|
"fiets=>fiets",
|
|
"bromfiets=>bromfiets",
|
|
"ei=>eier",
|
|
"kind=>kinder"
|
|
]
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_dutch": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"dutch_stop",
|
|
"dutch_keywords",
|
|
"dutch_override",
|
|
"dutch_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"dutch_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: dutch_example, first: dutch, second: rebuilt_dutch}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[english-analyzer]]
|
|
===== `english` analyzer
|
|
|
|
The `english` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /english_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"english_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_english_" <1>
|
|
},
|
|
"english_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["example"] <2>
|
|
},
|
|
"english_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "english"
|
|
},
|
|
"english_possessive_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "possessive_english"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_english": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"english_possessive_stemmer",
|
|
"lowercase",
|
|
"english_stop",
|
|
"english_keywords",
|
|
"english_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"english_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: english_example, first: english, second: rebuilt_english}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[finnish-analyzer]]
|
|
===== `finnish` analyzer
|
|
|
|
The `finnish` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /finnish_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"finnish_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_finnish_" <1>
|
|
},
|
|
"finnish_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["esimerkki"] <2>
|
|
},
|
|
"finnish_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "finnish"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_finnish": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"finnish_stop",
|
|
"finnish_keywords",
|
|
"finnish_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"finnish_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: finnish_example, first: finnish, second: rebuilt_finnish}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[french-analyzer]]
|
|
===== `french` analyzer
|
|
|
|
The `french` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /french_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"french_elision": {
|
|
"type": "elision",
|
|
"articles_case": true,
|
|
"articles": [
|
|
"l", "m", "t", "qu", "n", "s",
|
|
"j", "d", "c", "jusqu", "quoiqu",
|
|
"lorsqu", "puisqu"
|
|
]
|
|
},
|
|
"french_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_french_" <1>
|
|
},
|
|
"french_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["Exemple"] <2>
|
|
},
|
|
"french_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "light_french"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_french": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"french_elision",
|
|
"lowercase",
|
|
"french_stop",
|
|
"french_keywords",
|
|
"french_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"french_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: french_example, first: french, second: rebuilt_french}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[galician-analyzer]]
|
|
===== `galician` analyzer
|
|
|
|
The `galician` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /galician_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"galician_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_galician_" <1>
|
|
},
|
|
"galician_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["exemplo"] <2>
|
|
},
|
|
"galician_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "galician"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_galician": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"galician_stop",
|
|
"galician_keywords",
|
|
"galician_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"galician_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: galician_example, first: galician, second: rebuilt_galician}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[german-analyzer]]
|
|
===== `german` analyzer
|
|
|
|
The `german` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /german_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"german_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_german_" <1>
|
|
},
|
|
"german_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["Beispiel"] <2>
|
|
},
|
|
"german_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "light_german"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_german": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"german_stop",
|
|
"german_keywords",
|
|
"german_normalization",
|
|
"german_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"german_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: german_example, first: german, second: rebuilt_german}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[greek-analyzer]]
|
|
===== `greek` analyzer
|
|
|
|
The `greek` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /greek_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"greek_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_greek_" <1>
|
|
},
|
|
"greek_lowercase": {
|
|
"type": "lowercase",
|
|
"language": "greek"
|
|
},
|
|
"greek_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["παράδειγμα"] <2>
|
|
},
|
|
"greek_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "greek"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_greek": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"greek_lowercase",
|
|
"greek_stop",
|
|
"greek_keywords",
|
|
"greek_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"greek_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: greek_example, first: greek, second: rebuilt_greek}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[hindi-analyzer]]
|
|
===== `hindi` analyzer
|
|
|
|
The `hindi` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /hindi_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"hindi_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_hindi_" <1>
|
|
},
|
|
"hindi_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["उदाहरण"] <2>
|
|
},
|
|
"hindi_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "hindi"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_hindi": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"decimal_digit",
|
|
"hindi_keywords",
|
|
"indic_normalization",
|
|
"hindi_normalization",
|
|
"hindi_stop",
|
|
"hindi_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"hindi_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: hindi_example, first: hindi, second: rebuilt_hindi}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[hungarian-analyzer]]
|
|
===== `hungarian` analyzer
|
|
|
|
The `hungarian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /hungarian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"hungarian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_hungarian_" <1>
|
|
},
|
|
"hungarian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["példa"] <2>
|
|
},
|
|
"hungarian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "hungarian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_hungarian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"hungarian_stop",
|
|
"hungarian_keywords",
|
|
"hungarian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"hungarian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: hungarian_example, first: hungarian, second: rebuilt_hungarian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
|
|
[[indonesian-analyzer]]
|
|
===== `indonesian` analyzer
|
|
|
|
The `indonesian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /indonesian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"indonesian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_indonesian_" <1>
|
|
},
|
|
"indonesian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["contoh"] <2>
|
|
},
|
|
"indonesian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "indonesian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_indonesian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"indonesian_stop",
|
|
"indonesian_keywords",
|
|
"indonesian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"indonesian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: indonesian_example, first: indonesian, second: rebuilt_indonesian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[irish-analyzer]]
|
|
===== `irish` analyzer
|
|
|
|
The `irish` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /irish_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"irish_hyphenation": {
|
|
"type": "stop",
|
|
"stopwords": [ "h", "n", "t" ],
|
|
"ignore_case": true
|
|
},
|
|
"irish_elision": {
|
|
"type": "elision",
|
|
"articles": [ "d", "m", "b" ],
|
|
"articles_case": true
|
|
},
|
|
"irish_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_irish_" <1>
|
|
},
|
|
"irish_lowercase": {
|
|
"type": "lowercase",
|
|
"language": "irish"
|
|
},
|
|
"irish_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["sampla"] <2>
|
|
},
|
|
"irish_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "irish"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_irish": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"irish_hyphenation",
|
|
"irish_elision",
|
|
"irish_lowercase",
|
|
"irish_stop",
|
|
"irish_keywords",
|
|
"irish_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"irish_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: irish_example, first: irish, second: rebuilt_irish}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[italian-analyzer]]
|
|
===== `italian` analyzer
|
|
|
|
The `italian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /italian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"italian_elision": {
|
|
"type": "elision",
|
|
"articles": [
|
|
"c", "l", "all", "dall", "dell",
|
|
"nell", "sull", "coll", "pell",
|
|
"gl", "agl", "dagl", "degl", "negl",
|
|
"sugl", "un", "m", "t", "s", "v", "d"
|
|
],
|
|
"articles_case": true
|
|
},
|
|
"italian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_italian_" <1>
|
|
},
|
|
"italian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["esempio"] <2>
|
|
},
|
|
"italian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "light_italian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_italian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"italian_elision",
|
|
"lowercase",
|
|
"italian_stop",
|
|
"italian_keywords",
|
|
"italian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"italian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: italian_example, first: italian, second: rebuilt_italian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[latvian-analyzer]]
|
|
===== `latvian` analyzer
|
|
|
|
The `latvian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /latvian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"latvian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_latvian_" <1>
|
|
},
|
|
"latvian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["piemērs"] <2>
|
|
},
|
|
"latvian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "latvian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_latvian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"latvian_stop",
|
|
"latvian_keywords",
|
|
"latvian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"latvian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: latvian_example, first: latvian, second: rebuilt_latvian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[lithuanian-analyzer]]
|
|
===== `lithuanian` analyzer
|
|
|
|
The `lithuanian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /lithuanian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"lithuanian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_lithuanian_" <1>
|
|
},
|
|
"lithuanian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["pavyzdys"] <2>
|
|
},
|
|
"lithuanian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "lithuanian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_lithuanian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"lithuanian_stop",
|
|
"lithuanian_keywords",
|
|
"lithuanian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"lithuanian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: lithuanian_example, first: lithuanian, second: rebuilt_lithuanian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[norwegian-analyzer]]
|
|
===== `norwegian` analyzer
|
|
|
|
The `norwegian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /norwegian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"norwegian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_norwegian_" <1>
|
|
},
|
|
"norwegian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["eksempel"] <2>
|
|
},
|
|
"norwegian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "norwegian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_norwegian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"norwegian_stop",
|
|
"norwegian_keywords",
|
|
"norwegian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"norwegian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: norwegian_example, first: norwegian, second: rebuilt_norwegian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[persian-analyzer]]
|
|
===== `persian` analyzer
|
|
|
|
The `persian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /persian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"char_filter": {
|
|
"zero_width_spaces": {
|
|
"type": "mapping",
|
|
"mappings": [ "\\u200C=> "] <1>
|
|
}
|
|
},
|
|
"filter": {
|
|
"persian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_persian_" <2>
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_persian": {
|
|
"tokenizer": "standard",
|
|
"char_filter": [ "zero_width_spaces" ],
|
|
"filter": [
|
|
"lowercase",
|
|
"decimal_digit",
|
|
"arabic_normalization",
|
|
"persian_normalization",
|
|
"persian_stop"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: persian_example, first: persian, second: rebuilt_persian}\nendyaml\n/]
|
|
<1> Replaces zero-width non-joiners with an ASCII space.
|
|
<2> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
|
|
[[portuguese-analyzer]]
|
|
===== `portuguese` analyzer
|
|
|
|
The `portuguese` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /portuguese_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"portuguese_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_portuguese_" <1>
|
|
},
|
|
"portuguese_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["exemplo"] <2>
|
|
},
|
|
"portuguese_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "light_portuguese"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_portuguese": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"portuguese_stop",
|
|
"portuguese_keywords",
|
|
"portuguese_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"portuguese_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: portuguese_example, first: portuguese, second: rebuilt_portuguese}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[romanian-analyzer]]
|
|
===== `romanian` analyzer
|
|
|
|
The `romanian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /romanian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"romanian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_romanian_" <1>
|
|
},
|
|
"romanian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["exemplu"] <2>
|
|
},
|
|
"romanian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "romanian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_romanian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"romanian_stop",
|
|
"romanian_keywords",
|
|
"romanian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"romanian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: romanian_example, first: romanian, second: rebuilt_romanian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
|
|
[[russian-analyzer]]
|
|
===== `russian` analyzer
|
|
|
|
The `russian` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /russian_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"russian_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_russian_" <1>
|
|
},
|
|
"russian_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["пример"] <2>
|
|
},
|
|
"russian_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "russian"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_russian": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"russian_stop",
|
|
"russian_keywords",
|
|
"russian_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"russian_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: russian_example, first: russian, second: rebuilt_russian}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[sorani-analyzer]]
|
|
===== `sorani` analyzer
|
|
|
|
The `sorani` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /sorani_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"sorani_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_sorani_" <1>
|
|
},
|
|
"sorani_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["mînak"] <2>
|
|
},
|
|
"sorani_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "sorani"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_sorani": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"sorani_normalization",
|
|
"lowercase",
|
|
"decimal_digit",
|
|
"sorani_stop",
|
|
"sorani_keywords",
|
|
"sorani_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"sorani_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: sorani_example, first: sorani, second: rebuilt_sorani}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[spanish-analyzer]]
|
|
===== `spanish` analyzer
|
|
|
|
The `spanish` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /spanish_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"spanish_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_spanish_" <1>
|
|
},
|
|
"spanish_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["ejemplo"] <2>
|
|
},
|
|
"spanish_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "light_spanish"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_spanish": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"spanish_stop",
|
|
"spanish_keywords",
|
|
"spanish_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"spanish_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: spanish_example, first: spanish, second: rebuilt_spanish}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[swedish-analyzer]]
|
|
===== `swedish` analyzer
|
|
|
|
The `swedish` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /swedish_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"swedish_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_swedish_" <1>
|
|
},
|
|
"swedish_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["exempel"] <2>
|
|
},
|
|
"swedish_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "swedish"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_swedish": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"lowercase",
|
|
"swedish_stop",
|
|
"swedish_keywords",
|
|
"swedish_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"swedish_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: swedish_example, first: swedish, second: rebuilt_swedish}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[turkish-analyzer]]
|
|
===== `turkish` analyzer
|
|
|
|
The `turkish` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /turkish_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"turkish_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_turkish_" <1>
|
|
},
|
|
"turkish_lowercase": {
|
|
"type": "lowercase",
|
|
"language": "turkish"
|
|
},
|
|
"turkish_keywords": {
|
|
"type": "keyword_marker",
|
|
"keywords": ["örnek"] <2>
|
|
},
|
|
"turkish_stemmer": {
|
|
"type": "stemmer",
|
|
"language": "turkish"
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_turkish": {
|
|
"tokenizer": "standard",
|
|
"filter": [
|
|
"apostrophe",
|
|
"turkish_lowercase",
|
|
"turkish_stop",
|
|
"turkish_keywords",
|
|
"turkish_stemmer"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"turkish_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: turkish_example, first: turkish, second: rebuilt_turkish}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|
|
<2> This filter should be removed unless there are words which should
|
|
be excluded from stemming.
|
|
|
|
[[thai-analyzer]]
|
|
===== `thai` analyzer
|
|
|
|
The `thai` analyzer could be reimplemented as a `custom` analyzer as follows:
|
|
|
|
[source,js]
|
|
----------------------------------------------------
|
|
PUT /thai_example?include_type_name=true
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"filter": {
|
|
"thai_stop": {
|
|
"type": "stop",
|
|
"stopwords": "_thai_" <1>
|
|
}
|
|
},
|
|
"analyzer": {
|
|
"rebuilt_thai": {
|
|
"tokenizer": "thai",
|
|
"filter": [
|
|
"lowercase",
|
|
"decimal_digit",
|
|
"thai_stop"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[s/"thai_keywords",//]
|
|
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: thai_example, first: thai, second: rebuilt_thai}\nendyaml\n/]
|
|
<1> The default stopwords can be overridden with the `stopwords`
|
|
or `stopwords_path` parameters.
|