mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-07 13:38:49 +00:00
bc5a9ca342
A lot of different APIs currently use different names for the same logical parameter. Since Lucene moved away from the notion of a `similarity` and now uses a `fuzziness`, we should generalize this and encapsulate the generation, parsing and creation of these settings across all queries. This commit adds a new `Fuzziness` class that handles the renaming and generalization in a backwards compatible manner. This commit also adds a `ParseField` class to better support deprecated Query DSL parameters. The `ParseField` class allows specifying parameters that have been deprecated. Those parameters can be more easily tracked and removed in future versions. This also allows running queries in `strict` mode per index to throw exceptions if a query is executed with deprecated keys. Closes #4082
103 lines
2.4 KiB
Plaintext
[[query-dsl-fuzzy-query]]
=== Fuzzy Query

The fuzzy query uses similarity based on Levenshtein edit distance for
`string` fields, and a `+/-` margin on numeric and date fields.

==== String fields

The `fuzzy` query generates all possible matching terms that are within the
maximum edit distance specified in `fuzziness` and then checks the term
dictionary to find out which of those generated terms actually exist in the
index.

Here is a simple example:

[source,js]
--------------------------------------------------
{
    "fuzzy" : { "user" : "ki" }
}
--------------------------------------------------

Or with more advanced settings:

[source,js]
--------------------------------------------------
{
    "fuzzy" : {
        "user" : {
            "value" : "ki",
            "boost" : 1.0,
            "fuzziness" : 2,
            "prefix_length" : 0,
            "max_expansions": 100
        }
    }
}
--------------------------------------------------
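To make "within the maximum edit distance" concrete, the following Python sketch computes the classic Levenshtein distance and keeps the terms of a small, made-up term dictionary that fall within a given `fuzziness`. The names `edit_distance` and `fuzzy_terms` and the sample terms are assumptions for illustration only. For simplicity the sketch scans the dictionary directly instead of generating candidate terms first, so it illustrates the matching rule and not the query's actual implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: each insertion, deletion or
    substitution of a single character costs 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_terms(value, term_dictionary, fuzziness):
    """Hypothetical helper: terms whose distance to `value` is at most `fuzziness`."""
    return sorted(t for t in term_dictionary
                  if edit_distance(value, t) <= fuzziness)


# Made-up term dictionary for the `user` field.
terms = {"ki", "kim", "kimchy", "li", "kit", "bob"}
print(fuzzy_terms("ki", terms, 1))  # ['ki', 'kim', 'kit', 'li']
```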
[float]
===== Parameters

[horizontal]
`fuzziness`::

    The maximum edit distance. Defaults to `AUTO`. See <<fuzziness>>.

`prefix_length`::

    The number of initial characters which will not be ``fuzzified''. This
    helps to reduce the number of terms which must be examined. Defaults
    to `0`.

`max_expansions`::

    The maximum number of terms that the `fuzzy` query will expand to.
    Defaults to `0`.


WARNING: This query can be very heavy if `prefix_length` and `max_expansions`
are both set to their defaults of `0`. This could cause every term in the
index to be examined!

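The effect of these two parameters can be sketched in Python. This is only an illustration under stated assumptions: `terms_to_examine` and `limit_expansions` are hypothetical helpers, the term dictionary is made up, a `max_expansions` of `0` is read as "no limit" (following the warning above), and the cap simply keeps the first terms in list order, which is not necessarily how the real query chooses them.

```python
def terms_to_examine(value, term_dictionary, prefix_length=0):
    """Terms sharing the first `prefix_length` characters of `value`.

    Only these terms would still need an edit-distance check.
    """
    prefix = value[:prefix_length]  # empty string when prefix_length is 0
    return sorted(t for t in term_dictionary if t.startswith(prefix))


def limit_expansions(matching_terms, max_expansions=0):
    """Cap the number of matched terms. Assumption: 0 means "no limit"."""
    matching_terms = list(matching_terms)
    if max_expansions <= 0:
        return matching_terms
    return matching_terms[:max_expansions]


# Made-up term dictionary for illustration.
terms = {"bob", "ka", "ki", "kim", "kit", "li"}

# With prefix_length 0 every term in the dictionary is a candidate.
print(terms_to_examine("ki", terms, prefix_length=0))

# With prefix_length 1 only terms starting with "k" remain.
print(terms_to_examine("ki", terms, prefix_length=1))  # ['ka', 'ki', 'kim', 'kit']

# A non-zero max_expansions caps how many matched terms are used.
print(limit_expansions(["ki", "kim", "kit"], max_expansions=2))  # ['ki', 'kim']
```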
[float]
==== Numeric and date fields

Performs a <<query-dsl-range-query>> ``around'' the value using the
`fuzziness` value as a `+/-` range, where:

    value - fuzziness <= field value <= value + fuzziness

For example:

[source,js]
--------------------------------------------------
{
    "fuzzy" : {
        "price" : {
            "value" : 12,
            "fuzziness" : 2
        }
    }
}
--------------------------------------------------

This results in a range query between 10 and 14. Date fields support
<<time-units,time values>>, for example:

[source,js]
--------------------------------------------------
{
    "fuzzy" : {
        "created" : {
            "value" : "2010-02-05T12:05:07",
            "fuzziness" : "1d"
        }
    }
}
--------------------------------------------------
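The `+/-` margin for numeric and date fields amounts to simple arithmetic, sketched here in Python. The helpers `numeric_bounds` and `date_bounds` are hypothetical, and the time value `1d` is assumed to mean exactly one day, expressed as `timedelta(days=1)`. The sketch only shows how the bounds of the resulting range could be derived, not how the query is implemented.

```python
from datetime import datetime, timedelta


def numeric_bounds(value, fuzziness):
    """Lower and upper bound of the range around a numeric value."""
    return value - fuzziness, value + fuzziness


def date_bounds(value: str, fuzziness: timedelta):
    """Lower and upper bound of the range around an ISO 8601 date string."""
    center = datetime.fromisoformat(value)
    return center - fuzziness, center + fuzziness


# The `price` example: value 12 with fuzziness 2.
print(numeric_bounds(12, 2))  # (10, 14)

# The `created` example: fuzziness "1d" taken as one day (assumption).
low, high = date_bounds("2010-02-05T12:05:07", timedelta(days=1))
print(low.isoformat(), high.isoformat())
# 2010-02-04T12:05:07 2010-02-06T12:05:07
```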

See <<fuzziness>> for more details about accepted values.