[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.

`replace`::

Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beider_morse` encoding.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

The maximum length of the emitted metaphone token. Defaults to `4`.

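As a minimal sketch of how this setting might be combined with the encoder,
the request below defines a `double_metaphone` filter with a longer code
length. The index name `phonetic_dm_sample`, the filter name
`my_double_metaphone`, and the value `6` are hypothetical and chosen only for
illustration.

[source,js]
--------------------------------------------------
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6 <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Illustrative value, not a recommendation. If omitted, the default of `4`
applies.
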
[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::

Whether matching should be `exact` or `approx` (default).

`name_type`::

Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
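
The following sketch shows how the three Beider Morse settings might appear
together in one filter definition. The index name `phonetic_bm_sample`, the
filter name `my_beider_morse`, and the particular values chosen are
hypothetical. The `replace` setting is left out because it is not supported
by the `beider_morse` encoding.

[source,js]
--------------------------------------------------
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact", <1>
            "name_type": "generic",
            "languageset": [ "english", "german" ] <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Illustrative choice. The default is `approx`.
<2> Illustrative choice. Omit `languageset` to let the language be guessed.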