[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

[[analysis-phonetic-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install analysis-phonetic
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

This plugin can be downloaded for <<plugin-management-custom-url,offline install>> from
{plugin_url}/analysis-phonetic/analysis-phonetic-{version}.zip.

[[analysis-phonetic-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove analysis-phonetic
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

Which phonetic encoder to use. Accepts `metaphone` (default),
`doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beidermorse`, `daitch_mokotoff`.

`replace`::

Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beidermorse` encoding.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
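
For comparison, the following is a minimal sketch of the same filter with
`replace` left at its default of `true`, in which case the phonetic token
should take the place of the original token rather than being emitted
alongside it. The index, analyzer, and filter names (`phonetic_replace_sample`,
`my_replacing_analyzer`, `my_replacing_metaphone`) are hypothetical and chosen
only for illustration:

[source,js]
--------------------------------------------------
PUT phonetic_replace_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_replacing_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_replacing_metaphone"
            ]
          }
        },
        "filter": {
          "my_replacing_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  }
}
--------------------------------------------------
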

[float]
===== Double metaphone settings

If the `doublemetaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

The maximum length of the emitted metaphone token. Defaults to `4`.
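
As an illustration, a filter definition that raises the code length might look
like the sketch below. The index and filter names (`phonetic_dm_sample`,
`my_double_metaphone`) and the value `6` are assumptions made for the example,
not recommendations:

[source,js]
--------------------------------------------------
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "doublemetaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
--------------------------------------------------
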

[float]
===== Beider Morse settings

If the `beidermorse` encoder is used, then these additional settings are
supported:

`rule_type`::

Whether matching should be `exact` or `approx` (default).

`name_type`::

Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
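
The three settings can be combined in a single filter definition. The sketch
below is one possible configuration; the index and filter names
(`phonetic_bm_sample`, `my_beider_morse`) and the chosen values are
assumptions for illustration only. The `replace` setting is left out because
it is not supported by this encoder:

[source,js]
--------------------------------------------------
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beidermorse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": ["english", "german"]
          }
        }
      }
    }
  }
}
--------------------------------------------------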