From 99542e66a647a5a62362eb28b631d223f7c5f179 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Wed, 5 Jun 2019 22:02:17 +0200
Subject: [PATCH] [Docs] Clarify caveats for phonetic filters replace option (#42807)

The `replace` option in the phonetic token filter can have surprising
side effects, such as those described in #26921. This PR adds a note to
be mindful about such scenarios and offers alternatives to using the
`replace` option.

Closes #26921
---
 docs/plugins/analysis-phonetic.asciidoc       | 8 ++++++++
 docs/reference/query-dsl/match-query.asciidoc | 3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/plugins/analysis-phonetic.asciidoc b/docs/plugins/analysis-phonetic.asciidoc
index e22f819e1eb..3627751670a 100644
--- a/docs/plugins/analysis-phonetic.asciidoc
+++ b/docs/plugins/analysis-phonetic.asciidoc
@@ -65,6 +65,14 @@ GET phonetic_sample/_analyze
 
 <1> Returns: `J`, `joe`, `BLKS`, `bloggs`
 
+It is important to note that `"replace": false` can lead to unexpected behavior since
+the original and the phonetically analyzed version are both kept at the same token position.
+Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
+query does not apply {ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens.
+This can lead to issues that are difficult to diagnose and reason about. For this reason, it
+is often beneficial to use separate fields for analysis with and without phonetic filtering.
+That way searches can be run against both fields with differing boosts and trade-offs (e.g.
+only run a fuzzy `match` query on the original text field, but not on the phonetic version).
 [float]
 ===== Double metaphone settings
 
diff --git a/docs/reference/query-dsl/match-query.asciidoc b/docs/reference/query-dsl/match-query.asciidoc
index 5e45d2b3212..23474811449 100644
--- a/docs/reference/query-dsl/match-query.asciidoc
+++ b/docs/reference/query-dsl/match-query.asciidoc
@@ -56,7 +56,8 @@ rewritten.
 Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled
 by setting `fuzzy_transpositions` to `false`.
 
-Note that fuzzy matching is not applied to terms with synonyms, as under the hood
+NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the
+analysis process produces multiple tokens at the same position. Under the hood
 these terms are expanded to a special synonym query that blends term frequencies,
 which does not support fuzzy expansion.
 
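
Illustration only, not part of the patch: the added note recommends separate fields for analysis with and without phonetic filtering, so that a fuzzy `match` runs only on the original text. Below is a minimal Python sketch of what such request bodies might look like, built as plain dictionaries. The field name `name`, the sub-field `name.phonetic`, the analyzer name `my_phonetic_analyzer`, the filter name `my_metaphone`, and the default boost of 0.5 are all hypothetical. The mapping assumes the typeless 7.x mapping format and an installed analysis-phonetic plugin.

```python
import json

# Hypothetical index definition: "name" is analyzed with the default analyzer,
# and the multi-field "name.phonetic" is analyzed with a phonetic filter.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "my_metaphone": {
                    "type": "phonetic",
                    "encoder": "metaphone",
                    # True (the default) keeps only the phonetic token, so the
                    # phonetic sub-field holds no stacked tokens.
                    "replace": True,
                }
            },
            "analyzer": {
                "my_phonetic_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_metaphone"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "phonetic": {
                        "type": "text",
                        "analyzer": "my_phonetic_analyzer",
                    }
                },
            }
        }
    },
}


def build_query(text, phonetic_boost=0.5):
    """Build a bool query: fuzzy match on the original field,
    plain (non-fuzzy) match with a separate boost on the phonetic field."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": {"query": text, "fuzziness": "AUTO"}}},
                    {
                        "match": {
                            "name.phonetic": {
                                "query": text,
                                "boost": phonetic_boost,
                            }
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print both bodies as JSON, e.g. to paste into a console request.
    print(json.dumps(INDEX_BODY, indent=2))
    print(json.dumps(build_query("Joe Bloggs"), indent=2))
```

Keeping fuzziness on the original field only avoids the stacked-token case described in the note. The boost value is arbitrary and would need tuning against real data.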