OpenSearch/docs/plugins/analysis-phonetic.asciidoc
Christoph Büscher 25aac4f77f
Remove include_type_name in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
like they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creation often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00


[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.

`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beider_morse` encoding.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
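
To see the effect of the `replace` setting in isolation, the same text can be
analyzed with `replace` set to `true`. The request below is a minimal sketch,
not part of the original example: it assumes the plugin is installed and uses
an inline filter definition in the `_analyze` API so that no index has to be
created.

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "phonetic",
      "encoder": "metaphone",
      "replace": true
    }
  ],
  "text": "Joe Bloggs"
}
--------------------------------------------------

With `replace` set to `true`, the original tokens `joe` and `bloggs` should be
replaced by their phonetic codes rather than emitted alongside them.
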
[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.
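
As an illustration, a `double_metaphone` filter that emits longer codes than
the default might be configured as sketched below. The index name
`double_metaphone_sample`, the filter name `my_double_metaphone`, and the
value `6` are placeholders chosen for this sketch.

[source,js]
--------------------------------------------------
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
--------------------------------------------------
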
[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::
Whether matching should be `exact` or `approx` (default).

`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
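
The settings above could be combined as in the following sketch. The index
name `beider_morse_sample`, the filter name `my_beider_morse`, and the chosen
values are placeholders for illustration only. The `replace` setting is left
out because it is not supported by the `beider_morse` encoder.

[source,js]
--------------------------------------------------
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": ["english", "german"]
          }
        }
      }
    }
  }
}
--------------------------------------------------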