OpenSearch/docs/plugins/analysis-phonetic.asciidoc

121 lines
3.0 KiB
Plaintext

[[analysis-phonetic]]
=== Phonetic Analysis Plugin
The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.
[[analysis-phonetic-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-phonetic
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-phonetic-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-phonetic
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-phonetic-token-filter]]
==== `phonetic` token filter
The `phonetic` token filter takes the following settings:
`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beidermorse`, `daitch_mokotoff`.
`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beidermorse` encoding.
[source,json]
--------------------------------------------------
PUT phonetic_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
}
}
}
}
}
}
POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// AUTOSENSE
<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
[float]
===== Double metaphone settings
If the `double_metaphone` encoder is used, then this additional setting is
supported:
`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.
[float]
===== Beider Morse settings
If the `beider_morse` encoder is used, then these additional settings are
supported:
`rule_type`::
Whether matching should be `exact` or `approx` (default).
`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).
`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `comomon`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.