[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]


[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.

`replace`::

Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beider_morse` encoding.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`


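
To compare the two values of `replace` without creating an index, the filter
can also be defined inline in an `_analyze` request. The following is a minimal
sketch, assuming the plugin is installed on the node that handles the request.
With `replace` set to `true`, only the phonetic tokens should be emitted and
the original `joe` and `bloggs` tokens dropped.

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "phonetic",
      "encoder": "metaphone",
      "replace": true
    }
  ],
  "text": "Joe Bloggs"
}
--------------------------------------------------
// CONSOLE
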
[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

The maximum length of the emitted metaphone token. Defaults to `4`.

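
As an illustration, the setting can be combined with the encoder as sketched
below. The index name `double_metaphone_sample`, the analyzer and filter names,
and the value `6` are hypothetical choices made for this example only.

[source,js]
--------------------------------------------------
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
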
[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::

Whether matching should be `exact` or `approx` (default).

`name_type`::

Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
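
A filter using these settings might be configured as in the following sketch.
The index name `beider_morse_sample`, the analyzer and filter names, and the
chosen values are hypothetical. The `replace` setting is left out because it is
not supported by this encoder.

[source,js]
--------------------------------------------------
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": ["english", "german"]
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
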