[[query-dsl-flt-query]]
=== Fuzzy Like This Query

The fuzzy like this query finds documents that are "like" the provided text by
running it against one or more fields.

[source,js]
--------------------------------------------------
{
    "fuzzy_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "max_query_terms" : 12
    }
}
--------------------------------------------------

`fuzzy_like_this` can be shortened to `flt`.

The `fuzzy_like_this` top level parameters include:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`fields` |A list of the fields to run the fuzzy like this query against.
Defaults to the `_all` field.

|`like_text` |The text to find similar documents for. *Required*.

|`ignore_tf` |Whether term frequency should be ignored. Defaults to `false`.

|`max_query_terms` |The maximum number of query terms that will be
included in any generated query. Defaults to `25`.

|`fuzziness` |The minimum similarity of the term variants. Defaults
to `0.5`. See <<fuzziness>>.

|`prefix_length` |Length of required common prefix on variant terms.
Defaults to `0`.

|`boost` |Sets the boost value of the query. Defaults to `1.0`.

|`analyzer` |The analyzer that will be used to analyze the text.
Defaults to the analyzer associated with the field.
|=======================================================================
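
As an illustration, several of the parameters above can be combined in a
single request body. This is a sketch only: the field names are reused from
the example at the top of this page, and the values chosen for `fuzziness`,
`prefix_length`, `max_query_terms` and `boost` are arbitrary placeholders,
not recommendations.

[source,js]
--------------------------------------------------
{
    "flt" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "ignore_tf" : false,
        "max_query_terms" : 12,
        "fuzziness" : 0.5,
        "prefix_length" : 2,
        "boost" : 1.0
    }
}
--------------------------------------------------
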
[float]
==== How it Works

The query fuzzifies ALL terms provided as strings and then picks the best n
differentiating terms. In effect, this mixes the behaviour of FuzzyQuery
and MoreLikeThis, with special consideration of fuzzy scoring
factors. This generally produces good results when users provide details
in a number of fields, have no knowledge of boolean query syntax, and want
both a degree of fuzzy matching and a fast query.

For each source term, the fuzzy variants are held in a BooleanQuery with
no coord factor (because we are not looking for matches on multiple
variants in any one doc). Additionally, a specialized TermQuery is used
for variants; it does not use the variant term's own IDF, because this
would favor rarer terms, such as misspellings. Instead, all variants use the
same IDF ranking (the one for the source query term) and this is
factored into the variant's boost. If the source query term does not
exist in the index the average IDF of the variants is used.
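
The IDF rule described above can be sketched roughly as follows. This is
not Lucene's code: `idf` and `sharedIdf` are hypothetical helper names, the
formula `1 + ln(numDocs / (docFreq + 1))` is assumed to be the classic
TF-IDF one (the similarity actually in use may differ), and returning `0`
when there are no variants is an assumption made only to keep the sketch
total.

[source,js]
--------------------------------------------------
// Illustrative sketch only; these helpers are hypothetical, not Lucene's API.

// Assumed classic TF-IDF inverse document frequency.
function idf(docFreq, numDocs) {
    return 1 + Math.log(numDocs / (docFreq + 1));
}

// Returns the single IDF value shared by every fuzzy variant of one source term.
// sourceDocFreq:   document frequency of the term as given in like_text
// variantDocFreqs: document frequencies of its fuzzy variants
// numDocs:         total number of documents in the index
function sharedIdf(sourceDocFreq, variantDocFreqs, numDocs) {
    if (sourceDocFreq > 0) {
        // The source term exists in the index: all variants reuse its IDF,
        // so a rare misspelling is not favored over the term itself.
        return idf(sourceDocFreq, numDocs);
    }
    if (variantDocFreqs.length === 0) {
        return 0; // assumption: nothing to score
    }
    // The source term is missing: fall back to the average IDF of the variants.
    var total = 0;
    for (var i = 0; i < variantDocFreqs.length; i++) {
        total += idf(variantDocFreqs[i], numDocs);
    }
    return total / variantDocFreqs.length;
}
--------------------------------------------------

For example, `sharedIdf(0, [9, 99], 1000)` averages the IDF of the two
variants, because a document frequency of `0` means the source term does not
exist in the index.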