[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]


[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

    Which phonetic encoder to use.  Accepts `metaphone` (default),
    `double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
    `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
    `beider_morse`, `daitch_mokotoff`.

`replace`::

    Whether the original token should be replaced by the phonetic token.
    Accepts `true` (default) and `false`.  Not supported by the
    `beider_morse` encoder.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
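
With `replace` set to `true` instead, only the phonetic codes are kept and the
original tokens are dropped. The following is a minimal sketch of that
variation; the index name `phonetic_replace_sample` is a hypothetical name
chosen for illustration, and the analyzer and filter names simply mirror the
example above:

[source,js]
--------------------------------------------------
PUT phonetic_replace_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE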


[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

    The maximum length of the emitted metaphone token.  Defaults to `4`.
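
As an illustration, a filter using this setting might be declared as in the
sketch below. The index name `double_metaphone_sample`, the filter name
`my_double_metaphone`, and the value `6` are assumptions chosen for the
example, not recommended values:

[source,js]
--------------------------------------------------
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE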

[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::

    Whether matching should be `exact` or `approx` (default).

`name_type`::

    Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

    An array of languages to check. If not specified, then the language will
    be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
    `german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
    `spanish`.
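
The three settings can be combined in a single filter definition. The sketch
below is one possible configuration; the index name `beider_morse_sample`, the
filter name `my_beider_morse`, and the chosen values are assumptions made for
illustration. `replace` is left out because it is not supported by this
encoder:

[source,js]
--------------------------------------------------
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": [ "english", "german" ]
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE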