[DOCS] Added function score query
parent
aa59ef2e84
commit
765bd026f5
Binary file not shown.
After Width: | Height: | Size: 3.0 KiB |
Binary file not shown.
After Width: | Height: | Size: 3.2 KiB |
Binary file not shown.
After Width: | Height: | Size: 3.3 KiB |
|
@ -18,8 +18,6 @@ include::queries/common-terms-query.asciidoc[]
|
|||
|
||||
include::queries/custom-filters-score-query.asciidoc[]
|
||||
|
||||
include::queries/ids-query.asciidoc[]
|
||||
|
||||
include::queries/custom-score-query.asciidoc[]
|
||||
|
||||
include::queries/custom-boost-factor-query.asciidoc[]
|
||||
|
@ -36,6 +34,8 @@ include::queries/flt-query.asciidoc[]
|
|||
|
||||
include::queries/flt-field-query.asciidoc[]
|
||||
|
||||
include::queries/function-score-query.asciidoc[]
|
||||
|
||||
include::queries/fuzzy-query.asciidoc[]
|
||||
|
||||
include::queries/geo-shape-query.asciidoc[]
|
||||
|
@ -44,6 +44,8 @@ include::queries/has-child-query.asciidoc[]
|
|||
|
||||
include::queries/has-parent-query.asciidoc[]
|
||||
|
||||
include::queries/ids-query.asciidoc[]
|
||||
|
||||
include::queries/indices-query.asciidoc[]
|
||||
|
||||
include::queries/match-all-query.asciidoc[]
|
||||
|
|
|
@ -1,6 +1,8 @@
|
|||
[[query-dsl-custom-boost-factor-query]]
|
||||
=== Custom Boost Factor Query
|
||||
|
||||
deprecated[1.00.Beta,Replaced by <<query-dsl-function-score-query>>]
|
||||
|
||||
`custom_boost_factor` query allows to wrap another query and multiply
|
||||
its score by the provided `boost_factor`. This can sometimes be desired
|
||||
since `boost` value set on specific queries gets normalized, while this
|
||||
|
|
|
@ -1,6 +1,8 @@
|
|||
[[query-dsl-custom-filters-score-query]]
|
||||
=== Custom Filters Score Query
|
||||
|
||||
deprecated[1.00.Beta,Replaced by <<query-dsl-function-score-query>>]
|
||||
|
||||
A `custom_filters_score` query allows to execute a query, and if the hit
|
||||
matches a provided filter (ordered), use either a boost or a script
|
||||
associated with it to compute the score. Here is an example:
|
||||
|
|
|
@ -1,6 +1,8 @@
|
|||
[[query-dsl-custom-score-query]]
|
||||
=== Custom Score Query
|
||||
|
||||
deprecated[1.00.Beta,Replaced by <<query-dsl-function-score-query>>]
|
||||
|
||||
`custom_score` query allows to wrap another query and customize the
|
||||
scoring of it optionally with a computation derived from other field
|
||||
values in the doc (numeric ones) using
|
||||
|
|
|
@ -0,0 +1,484 @@
|
|||
[[query-dsl-function-score-query]]
|
||||
=== Function Score Query
|
||||
|
||||
added[1.00.Beta]
|
||||
|
||||
The `function_score` query allows you to modify the score of documents that are
|
||||
retrieved by a query. This can be useful if, for example, a score
|
||||
function is computationally expensive and it is sufficient to compute
|
||||
the score on a filtered set of documents.
|
||||
|
||||
`function_score` provides the same functionality that
|
||||
<<query-dsl-custom-boost-factor-query>>,
|
||||
<<query-dsl-custom-score-query>> and
|
||||
<<query-dsl-custom-filters-score-query>> provided
|
||||
and additionally adds further scoring functionality such as
|
||||
distance and recency scoring (see description below).
|
||||
|
||||
==== Using function score
|
||||
|
||||
To use `function_score`, the user has to define a query and one or
|
||||
several functions that compute a new score for each document returned
|
||||
by the query.
|
||||
|
||||
`function_score` can be used with only one function like this:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"function_score": {
|
||||
"(query|filter)": {},
|
||||
"boost": "boost for the whole query",
|
||||
"FUNCTION": {},
|
||||
"boost_mode":"(multiply|replace|...)"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Furthermore, several functions can be combined. In this case one can
|
||||
optionally choose to apply the function only if a document matches a
|
||||
given filter:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"function_score": {
|
||||
"(query|filter)": {},
|
||||
"boost": "boost for the whole query",
|
||||
"functions": [
|
||||
{
|
||||
"filter": {},
|
||||
"FUNCTION": {}
|
||||
},
|
||||
{
|
||||
"FUNCTION": {}
|
||||
}
|
||||
],
|
||||
"max_boost": number,
|
||||
"score_mode": "(multiply|max|...)",
|
||||
"boost_mode": "(multiply|replace|...)"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
If no filter is given with a function, this is equivalent to specifying
|
||||
`"match_all": {}`
|
||||
|
||||
First, each document is scored by the defined functions. The parameter
|
||||
`score_mode` specifies how the computed scores are combined:
|
||||
|
||||
[horizontal]
|
||||
`multiply`:: scores are multiplied (default)
|
||||
`sum`:: scores are summed
|
||||
`avg`:: scores are averaged
|
||||
`first`:: the first function that has a matching filter
|
||||
is applied
|
||||
`max`:: maximum score is used
|
||||
`min`:: minimum score is used
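The `score_mode` options can be sketched in a few lines of Python. This is an illustration of the list above, not the Elasticsearch implementation: the helper name `combine_function_scores` is hypothetical, and returning 1.0 when no function matches is an assumption.

```python
from functools import reduce
from operator import mul


def combine_function_scores(scores, score_mode="multiply"):
    """Combine the scores of all functions whose filter matched a document.

    `scores` is listed in the order the functions appear in the request.
    """
    if not scores:
        # Assumption: a document matched by no function keeps a neutral score.
        return 1.0
    if score_mode == "multiply":
        return reduce(mul, scores, 1.0)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "avg":
        return sum(scores) / len(scores)
    if score_mode == "first":
        return scores[0]
    if score_mode == "max":
        return max(scores)
    if score_mode == "min":
        return min(scores)
    raise ValueError(f"unknown score_mode: {score_mode!r}")


scores = [2.0, 0.5, 4.0]
for mode in ("multiply", "sum", "avg", "first", "max", "min"):
    print(mode, combine_function_scores(scores, mode))
```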
|
||||
|
||||
The new score can be restricted to not exceed a certain limit by setting
|
||||
the `max_boost` parameter. The default for `max_boost` is FLT_MAX.
|
||||
|
||||
Finally, the newly computed score is combined with the score of the
|
||||
query. The parameter `boost_mode` defines how:
|
||||
|
||||
`multiply`:: query score and function score are multiplied (default)
|
||||
`replace`:: only function score is used, the query score is ignored
|
||||
`sum`:: query score and function score are added
|
||||
`avg`:: average
|
||||
`max`:: max of query score and function score
|
||||
`min`:: min of query score and function score
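As a rough sketch of the last two steps, the following Python caps the function score at `max_boost` and then merges it with the query score according to `boost_mode`. The helper name `final_score` is hypothetical, and capping with a simple `min` is an assumption based on the description above.

```python
FLT_MAX = 3.4028234663852886e38  # largest finite 32-bit float


def final_score(query_score, function_score, boost_mode="multiply", max_boost=FLT_MAX):
    """Cap the function score at max_boost, then merge it with the query score."""
    capped = min(function_score, max_boost)  # assumption: a plain upper bound
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "replace":
        return capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "avg":
        return (query_score + capped) / 2.0
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "min":
        return min(query_score, capped)
    raise ValueError(f"unknown boost_mode: {boost_mode!r}")


print(final_score(2.0, 3.0))                 # multiply: 6.0
print(final_score(2.0, 3.0, max_boost=1.5))  # function score capped first: 3.0
print(final_score(2.0, 3.0, "replace"))      # query score ignored: 3.0
```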
|
||||
|
||||
|
||||
==== Score functions
|
||||
|
||||
The `function_score` query provides several types of score functions.
|
||||
|
||||
===== Script score
|
||||
|
||||
The `script_score` function allows you to wrap another query and customize
|
||||
the scoring of it optionally with a computation derived from other numeric
|
||||
field values in the doc using a script expression. Here is a
|
||||
simple example:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"script_score" : {
|
||||
"script" : "_score * doc['my_numeric_field'].value"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
On top of the different scripting field values and expression, the
|
||||
`_score` script parameter can be used to retrieve the score based on the
|
||||
wrapped query.
|
||||
|
||||
Scripts are cached for faster execution. If the script has parameters
|
||||
that it needs to take into account, it is preferable to reuse the same
|
||||
script, and provide parameters to it:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"script_score": {
|
||||
"lang": "lang",
|
||||
"params": {
|
||||
"param1": value1,
|
||||
"param2": value2
|
||||
},
|
||||
"script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Note that unlike the <<query-dsl-custom-score-query>>, the
|
||||
score of the query is multiplied with the result of the script scoring. If
|
||||
you wish to inhibit this, set `"boost_mode": "replace"`.
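To make the parameterized script concrete, here is a small Python sketch. It builds two `script_score` clauses that share one script string and differ only in `params`, and it evaluates the same expression in plain Python for one document. The helper names and the sample values are hypothetical, and the `lang` setting is left out.

```python
SCRIPT = "_score * doc['my_numeric_field'].value / pow(param1, param2)"


def script_score_clause(param1, param2):
    """Build a script_score clause; only the params differ between calls."""
    return {
        "script_score": {
            "params": {"param1": param1, "param2": param2},
            "script": SCRIPT,
        }
    }


def evaluate(query_score, field_value, param1, param2):
    """Plain-Python equivalent of SCRIPT for a single document."""
    return query_score * field_value / pow(param1, param2)


first = script_score_clause(2, 3)
second = script_score_clause(10, 2)

# Both clauses carry the identical script text, so a cached script can be reused.
print(first["script_score"]["script"] == second["script_score"]["script"])

# Query score 1.5, field value 8.0, param1=2, param2=3: 1.5 * 8.0 / 2**3
print(evaluate(1.5, 8.0, 2, 3))
```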
|
||||
|
||||
===== Boost factor
|
||||
|
||||
The `boost_factor` score allows you to multiply the score by the provided
|
||||
`boost_factor`. This can sometimes be desired since boost value set on
|
||||
specific queries gets normalized, while for this score function it does
|
||||
not.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"boost_factor" : number
|
||||
--------------------------------------------------
|
||||
|
||||
===== Random
|
||||
|
||||
The `random_score` function generates scores via a pseudo-random number algorithm
|
||||
that is initialized with a `seed`.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"random_score": {
|
||||
"seed" : number
|
||||
}
|
||||
--------------------------------------------------
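The practical effect of a seed is reproducibility: the same seed yields the same sequence of scores. The Python sketch below shows only that idea with the standard library generator. It is not the algorithm Elasticsearch uses, and the function name and document ids are hypothetical.

```python
import random


def random_scores(doc_ids, seed):
    """Assign each document a pseudo-random score in [0, 1) from a seeded generator."""
    rng = random.Random(seed)
    return {doc_id: rng.random() for doc_id in doc_ids}


docs = ["doc-1", "doc-2", "doc-3"]
run_a = random_scores(docs, seed=42)
run_b = random_scores(docs, seed=42)

# The same seed reproduces the same scores, so the ordering is stable across runs.
print(run_a == run_b)
```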
|
||||
|
||||
===== Decay functions
|
||||
|
||||
Decay functions score a document with a function that decays depending
|
||||
on the distance of a numeric field value of the document from a user
|
||||
given origin. This is similar to a range query, but with smooth edges
|
||||
instead of boxes.
|
||||
|
||||
To use distance scoring on a query that has numerical fields, the user
|
||||
has to define an `origin` and a `scale` for each field. The `origin`
|
||||
is needed to define the ``central point'' from which the distance
|
||||
is calculated, and the `scale` to define the rate of decay. The
|
||||
decay function is specified as
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"DECAY_FUNCTION": {
|
||||
"FIELD_NAME": {
|
||||
"origin": "11, 12",
|
||||
"scale": "2km",
|
||||
"offset": "1km",
|
||||
"decay": 0.5
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
where `DECAY_FUNCTION` can be `linear`, `exp` or `gauss` (see below).
|
||||
The `offset` and `decay` parameters are optional.
|
||||
|
||||
`offset`::
|
||||
If an `offset` is defined, the decay function will only compute the
|
||||
decay function for documents with a distance greater than the defined
|
||||
`offset`. The default is 0.
|
||||
`decay`::
|
||||
|
||||
The `decay` parameter defines how documents are scored at the distance
|
||||
given at `scale`. If no `decay` is defined, documents at the distance
|
||||
`scale` will be scored 0.5.
|
||||
|
||||
For example, your documents might represent hotels and contain a geo
|
||||
location field. You want to compute a decay function depending on how
|
||||
far the hotel is from a given location. You might not immediately see
|
||||
what scale to choose for the gauss function, but you can say something
|
||||
like: "At a distance of 2km from the desired location, the score should
|
||||
be reduced to one third."
|
||||
|
||||
You can provide this parameter like this:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"DECAY_FUNCTION": {
|
||||
"location": {
|
||||
"origin": "11, 12",
|
||||
"scale": "2km",
|
||||
"decay" : 0.33
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The parameter `scale` will then be adjusted automatically to ensure that
|
||||
the score function computes a score of 0.33 for hotels that are 2km away
|
||||
from the desired location.
|
||||
|
||||
The `DECAY_FUNCTION` determines the shape of the decay:
|
||||
|
||||
[horizontal]
|
||||
`gauss`::
|
||||
|
||||
Normal decay, computed as:
|
||||
+
|
||||
image:images/Gaussian.png[]
|
||||
|
||||
`exp`::
|
||||
|
||||
Exponential decay, computed as:
|
||||
+
|
||||
image:images/Exponential.png[]
|
||||
|
||||
|
||||
`linear`::
|
||||
Linear decay, computed as:
|
||||
+
|
||||
image:images/Linear.png[]
|
||||
+
|
||||
In contrast to the normal and exponential decay, this function actually
|
||||
sets the score to 0 if the field value exceeds twice the user given
|
||||
scale value.
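Since the formulas are only available as images, the following Python sketch spells out one common formulation of the three shapes for plain numeric fields. It is written under assumptions, not taken from the Elasticsearch source: the distance is `max(0, |value - origin| - offset)`, and each function is calibrated so that it returns `decay` at a distance of `scale`. Geo distances and date arithmetic are not covered.

```python
import math


def _distance(value, origin, offset):
    """Distance from the origin, ignoring anything within the offset."""
    return max(0.0, abs(value - origin) - offset)


def gauss(value, origin, scale, offset=0.0, decay=0.5):
    # Choose the variance so that the score equals `decay` at distance `scale`.
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    d = _distance(value, origin, offset)
    return math.exp(-d ** 2 / (2.0 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    # Choose the rate so that the score equals `decay` at distance `scale`.
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def linear(value, origin, scale, offset=0.0, decay=0.5):
    # The line reaches 0 at distance scale / (1 - decay), i.e. 2 * scale by default.
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


for name, fn in (("gauss", gauss), ("exp", exp_decay), ("linear", linear)):
    print(name, [round(fn(v, origin=0.0, scale=20.0), 3) for v in (0, 10, 20, 40, 80)])
```

With the default `decay` of 0.5, all three functions return 1 at the origin and 0.5 at a distance of `scale`; only `linear` drops to exactly 0, which happens at twice the `scale`.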
|
||||
|
||||
==== Detailed example
|
||||
|
||||
Suppose you are searching for a hotel in a certain town. Your budget is
|
||||
limited. Also, you would like the hotel to be close to the town center,
|
||||
so the farther the hotel is from the desired location the less likely
|
||||
you are to check in.
|
||||
|
||||
You would like the query results that match your criterion (for
|
||||
example, "hotel, Nancy, non-smoker") to be scored with respect to
|
||||
distance to the town center and also the price.
|
||||
|
||||
Intuitively, you would like to define the town center as the origin and
|
||||
maybe you are willing to walk 2km to the town center from the hotel. +
|
||||
In this case your *origin* for the location field is the town center
|
||||
and the *scale* is ~2km.
|
||||
|
||||
If your budget is low, you would probably prefer something cheap over
|
||||
something expensive. For the price field, the *origin* would be 0 Euros
|
||||
and the *scale* depends on how much you are willing to pay, for example 20 Euros.
|
||||
|
||||
In this example, the fields might be called "price" for the price of the
|
||||
hotel and "location" for the coordinates of this hotel.
|
||||
|
||||
The function for `price` in this case would be
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"DECAY_FUNCTION": {
|
||||
"price": {
|
||||
"origin": "0",
|
||||
"scale": "20"
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
and for `location`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
|
||||
"DECAY_FUNCTION": {
|
||||
"location": {
|
||||
"origin": "11, 12",
|
||||
"scale": "2km"
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
where `DECAY_FUNCTION` can be `linear`, `exp` or `gauss`.
|
||||
|
||||
If you want to multiply the original score by these two functions,
|
||||
the request would look like this:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl 'localhost:9200/hotels/_search/' -d '{
|
||||
"query": {
|
||||
"function_score": {
|
||||
"functions": [
|
||||
{
|
||||
"DECAY_FUNCTION": {
|
||||
"price": {
|
||||
"origin": "0",
|
||||
"scale": "20"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"DECAY_FUNCTION": {
|
||||
"location": {
|
||||
"origin": "11, 12",
|
||||
"scale": "2km"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"query": {
|
||||
"match": {
|
||||
"properties": "balcony"
|
||||
}
|
||||
},
|
||||
"score_mode": "multiply"
|
||||
}
|
||||
}
|
||||
}'
|
||||
--------------------------------------------------
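The request above still contains the `DECAY_FUNCTION` placeholder. As a sketch, the Python below builds the same body with a concrete decay function filled in and serializes it to JSON, ready to be passed to `curl -d`. The helper name `hotel_query` is hypothetical; the field names and values are the ones used in this example.

```python
import json

DECAY_FUNCTIONS = ("linear", "exp", "gauss")


def hotel_query(decay_function="gauss"):
    """Build the hotel request body with DECAY_FUNCTION replaced by a real name."""
    if decay_function not in DECAY_FUNCTIONS:
        raise ValueError(f"unknown decay function: {decay_function!r}")
    return {
        "query": {
            "function_score": {
                "functions": [
                    {decay_function: {"price": {"origin": "0", "scale": "20"}}},
                    {decay_function: {"location": {"origin": "11, 12", "scale": "2km"}}},
                ],
                "query": {"match": {"properties": "balcony"}},
                "score_mode": "multiply",
            }
        }
    }


body = json.dumps(hotel_query("gauss"), indent=2)
print(body)
```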
|
||||
|
||||
Next, we show what the computed score looks like for each of the three
|
||||
possible decay functions.
|
||||
|
||||
===== Normal decay, keyword `gauss`
|
||||
|
||||
When choosing `gauss` as the decay function in the above example, the
|
||||
contour and surface plots of the multiplier look like this:
|
||||
|
||||
image::https://f.cloud.github.com/assets/4320215/768157/cd0e18a6-e898-11e2-9b3c-f0145078bd6f.png[width="700px"]
|
||||
|
||||
image::https://f.cloud.github.com/assets/4320215/768160/ec43c928-e898-11e2-8e0d-f3c4519dbd89.png[width="700px"]
|
||||
|
||||
Suppose your original search results match three hotels:
|
||||
|
||||
* "Backpack Nap"
|
||||
* "Drink n Drive"
|
||||
* "BnB Bellevue".
|
||||
|
||||
"Drink n Drive" is pretty far from your defined location (nearly 2 km)
|
||||
and is not too cheap (about 13 Euros), so it gets a low factor
|
||||
of 0.56. "BnB Bellevue" and "Backpack Nap" are both pretty close to the
|
||||
defined location but "BnB Bellevue" is cheaper, so it gets a multiplier
|
||||
of 0.86 whereas "Backpack Nap" gets a value of 0.66.
|
||||
|
||||
===== Exponential decay, keyword `exp`
|
||||
|
||||
When choosing `exp` as the decay function in the above example, the
|
||||
contour and surface plots of the multiplier look like this:
|
||||
|
||||
image::https://f.cloud.github.com/assets/4320215/768161/082975c0-e899-11e2-86f7-174c3a729d64.png[width="700px"]
|
||||
|
||||
image::https://f.cloud.github.com/assets/4320215/768162/0b606884-e899-11e2-907b-aefc77eefef6.png[width="700px"]
|
||||
|
||||
===== Linear decay, keyword `linear`
|
||||
|
||||
When choosing `linear` as the decay function in the above example, the
|
||||
contour and surface plots of the multiplier look like this:
|
||||
|
||||
image::https://f.cloud.github.com/assets/4320215/768164/1775b0ca-e899-11e2-9f4a-776b406305c6.png[width="700px"]
|
||||
|
||||
image::https://f.cloud.github.com/assets/4320215/768165/19d8b1aa-e899-11e2-91bc-6b0553e8d722.png[width="700px"]
|
||||
|
||||
==== Supported fields for decay functions
|
||||
|
||||
Only single valued numeric fields, including time and geo locations,
|
||||
are supported.
|
||||
|
||||
==== What if a field is missing?
|
||||
|
||||
If the numeric field is missing in the document, the function will
|
||||
return 1.
|
||||
|
||||
==== Relation to `custom_boost_factor`, `custom_score` and `custom_filters_score`
|
||||
|
||||
The <<query-dsl-custom-boost-factor-query>>
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"custom_boost_factor": {
|
||||
"boost_factor": 5.2,
|
||||
"query": {...}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
becomes
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"function_score": {
|
||||
"boost_factor": 5.2,
|
||||
"query": {...}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The <<query-dsl-custom-score-query>>
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"custom_score": {
|
||||
"params": {
|
||||
"param1": 2,
|
||||
"param2": 3.1
|
||||
},
|
||||
"query": {...},
|
||||
"script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
becomes
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"function_score": {
|
||||
"boost_mode": "replace",
|
||||
"query": {...},
|
||||
"script_score": {
|
||||
"params": {
|
||||
"param1": 2,
|
||||
"param2": 3.1
|
||||
},
|
||||
"script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
and the <<query-dsl-custom-filters-score-query>>
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"custom_filters_score": {
|
||||
"filters": [
|
||||
{
|
||||
"boost": "3",
|
||||
"filter": {...}
|
||||
},
|
||||
{
|
||||
"filter": {...},
|
||||
"script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
|
||||
}
|
||||
],
|
||||
"params": {
|
||||
"param1": 2,
|
||||
"param2": 3.1
|
||||
},
|
||||
"query": {...},
|
||||
"score_mode": "first"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
becomes:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
"function_score": {
|
||||
"functions": [
|
||||
{
|
||||
"boost_factor": "3",
|
||||
"filter": {...}
|
||||
},
|
||||
{
|
||||
"filter": {...},
|
||||
"script_score": {
|
||||
"params": {
|
||||
"param1": 2,
|
||||
"param2": 3.1
|
||||
},
|
||||
"script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
|
||||
}
|
||||
}
|
||||
],
|
||||
"query": {...},
|
||||
"score_mode": "first"
|
||||
}
|
||||
--------------------------------------------------
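The three rewrites in this section are mechanical, so they can be sketched as a small Python helper that turns a deprecated clause into its `function_score` form. The helper `to_function_score` is hypothetical and not part of any Elasticsearch client. Mapping a filter entry's `boost` to the `boost_factor` function is an assumption, as is copying the shared `params` into each `script_score`.

```python
def _script_score(script, params=None):
    """Wrap a script (and optional params) in a script_score body."""
    body = {"script": script}
    if params is not None:
        body["params"] = params
    return body


def to_function_score(clause):
    """Rewrite a single deprecated custom_* clause as a function_score clause."""
    (name, body), = clause.items()

    if name == "custom_boost_factor":
        return {"function_score": {
            "boost_factor": body["boost_factor"],
            "query": body["query"],
        }}

    if name == "custom_score":
        # boost_mode "replace" keeps the old behaviour: the script result is the score.
        return {"function_score": {
            "boost_mode": "replace",
            "query": body["query"],
            "script_score": _script_score(body["script"], body.get("params")),
        }}

    if name == "custom_filters_score":
        functions = []
        for entry in body["filters"]:
            function = {"filter": entry["filter"]}
            if "boost" in entry:
                # Assumption: a per-filter boost maps to the boost_factor function.
                function["boost_factor"] = entry["boost"]
            else:
                function["script_score"] = _script_score(entry["script"], body.get("params"))
            functions.append(function)
        result = {"functions": functions, "query": body["query"]}
        if "score_mode" in body:
            result["score_mode"] = body["score_mode"]
        return {"function_score": result}

    raise ValueError(f"not a deprecated custom_* clause: {name!r}")


old = {"custom_score": {
    "params": {"param1": 2, "param2": 3.1},
    "query": {"match_all": {}},
    "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)",
}}
print(to_function_score(old))
```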
|
||||
|
||||
|