
[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

[[analysis-phonetic-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-phonetic
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

[[analysis-phonetic-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-phonetic
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

    Which phonetic encoder to use.  Accepts `metaphone` (default),
    `doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
    `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
    `beidermorse`, `daitch_mokotoff`.

`replace`::

    Whether or not the original token should be replaced by the phonetic
    token. Accepts `true` (default) and `false`.  Not supported by
    `beidermorse` encoding.

[source,json]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// AUTOSENSE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`


[float]
===== Double metaphone settings

If the `doublemetaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

    The maximum length of the emitted metaphone token.  Defaults to `4`.
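
As an illustration, a filter using this setting might be defined as in the
following minimal sketch. The index name `phonetic_dm_sample`, the filter name
`my_double_metaphone`, and the value `6` are assumptions chosen for the
example. They do not come from the plugin itself.

[source,json]
--------------------------------------------------
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_double_metaphone": { <1>
            "type": "phonetic",
            "encoder": "doublemetaphone",
            "max_code_len": 6 <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------

<1> Hypothetical filter name, referenced from an analyzer's `filter` list.
<2> Example value only. Omitting the setting leaves the default of `4`.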

[float]
===== Beider Morse settings

If the `beidermorse` encoder is used, then these additional settings are
supported:

`rule_type`::

    Whether matching should be `exact` or `approx` (default).

`name_type`::

    Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

    An array of languages to check. If not specified, then the language will
    be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
    `german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
    `spanish`.
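
The three settings above could be combined as in the following minimal sketch.
The index name `phonetic_bm_sample`, the filter name `my_beider_morse`, and
the chosen values are assumptions made for the example. The `replace` setting
is left out because it is not supported by this encoder.

[source,json]
--------------------------------------------------
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_beider_morse": { <1>
            "type": "phonetic",
            "encoder": "beidermorse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": [ "english", "german" ] <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------

<1> Hypothetical filter name, referenced from an analyzer's `filter` list.
<2> Example selection only. Leaving `languageset` out lets the language be guessed.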