OpenSearch/docs/plugins/analysis-phonetic.asciidoc
Nik Everett 9c85569883 Test docs for plugins
We weren't doing it before because we weren't starting the plugins.
Now we are.

The hardest part of this was handling the files the tests expect
to be on the filesystem. extraConfigFiles was broken.
2016-06-14 14:32:29 -04:00

121 lines
3.0 KiB
Plaintext

[[analysis-phonetic]]
=== Phonetic Analysis Plugin
The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.
[[analysis-phonetic-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install analysis-phonetic
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
[[analysis-phonetic-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove analysis-phonetic
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[analysis-phonetic-token-filter]]
==== `phonetic` token filter
The `phonetic` token filter takes the following settings:
`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beidermorse`, `daitch_mokotoff`.
`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beidermorse` encoding.
[source,js]
--------------------------------------------------
PUT phonetic_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
}
}
}
}
}
}
GET _cluster/health?wait_for_status=yellow
POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// CONSOLE
<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
[float]
===== Double metaphone settings
If the `double_metaphone` encoder is used, then this additional setting is
supported:
`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.
[float]
===== Beider Morse settings
If the `beider_morse` encoder is used, then these additional settings are
supported:
`rule_type`::
Whether matching should be `exact` or `approx` (default).
`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).
`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `comomon`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.