[DOCS] Added function score query

2013-09-04 21:59:46 +02:00 · 2013-09-04 21:59:46 +02:00 · 765bd026f5
parent aa59ef2e84
commit 765bd026f5
8 changed files with 494 additions and 2 deletions
--- a/docs/reference/images/Exponential.png
+++ b/docs/reference/images/Exponential.png
--- a/docs/reference/images/Gaussian.png
+++ b/docs/reference/images/Gaussian.png
--- a/docs/reference/images/Linear.png
+++ b/docs/reference/images/Linear.png
--- a/docs/reference/query-dsl/queries.asciidoc
+++ b/docs/reference/query-dsl/queries.asciidoc
@@ -18,8 +18,6 @@ include::queries/common-terms-query.asciidoc[]

 include::queries/custom-filters-score-query.asciidoc[]

-include::queries/ids-query.asciidoc[]
-
 include::queries/custom-score-query.asciidoc[]

 include::queries/custom-boost-factor-query.asciidoc[]
@@ -36,6 +34,8 @@ include::queries/flt-query.asciidoc[]

 include::queries/flt-field-query.asciidoc[]

+include::queries/function-score-query.asciidoc[]
+
 include::queries/fuzzy-query.asciidoc[]

 include::queries/geo-shape-query.asciidoc[]
@@ -44,6 +44,8 @@ include::queries/has-child-query.asciidoc[]

 include::queries/has-parent-query.asciidoc[]

+include::queries/ids-query.asciidoc[]
+
 include::queries/indices-query.asciidoc[]

 include::queries/match-all-query.asciidoc[]
--- a/docs/reference/query-dsl/queries/custom-boost-factor-query.asciidoc
+++ b/docs/reference/query-dsl/queries/custom-boost-factor-query.asciidoc
@@ -1,6 +1,8 @@
 [[query-dsl-custom-boost-factor-query]]
 === Custom Boost Factor Query

+deprecated[1.00.Beta,Replaced by <<query-dsl-function-score-query>>]
+
 `custom_boost_factor` query allows to wrap another query and multiply
 its score by the provided `boost_factor`. This can sometimes be desired
 since `boost` value set on specific queries gets normalized, while this
--- a/docs/reference/query-dsl/queries/custom-filters-score-query.asciidoc
+++ b/docs/reference/query-dsl/queries/custom-filters-score-query.asciidoc
@@ -1,6 +1,8 @@
 [[query-dsl-custom-filters-score-query]]
 === Custom Filters Score Query

+deprecated[1.00.Beta,Replaced by <<query-dsl-function-score-query>>]
+
 A `custom_filters_score` query allows to execute a query, and if the hit
 matches a provided filter (ordered), use either a boost or a script
 associated with it to compute the score. Here is an example:
--- a/docs/reference/query-dsl/queries/custom-score-query.asciidoc
+++ b/docs/reference/query-dsl/queries/custom-score-query.asciidoc
@@ -1,6 +1,8 @@
 [[query-dsl-custom-score-query]]
 === Custom Score Query

+deprecated[1.00.Beta,Replaced by <<query-dsl-function-score-query>>]
+
 `custom_score` query allows to wrap another query and customize the
 scoring of it optionally with a computation derived from other field
 values in the doc (numeric ones) using
--- a/docs/reference/query-dsl/queries/function-score-query.asciidoc
+++ b/docs/reference/query-dsl/queries/function-score-query.asciidoc
@@ -0,0 +1,484 @@
+[[query-dsl-function-score-query]]
+=== Function Score Query
+
+added[1.00.Beta]
+
+The `function_score` query allows you to modify the score of documents that are
+retrieved by a query. This can be useful if, for example, a score
+function is computationally expensive and it is sufficient to compute
+the score on a filtered set of documents.
+
+`function_score` provides the same functionality as
+<<query-dsl-custom-boost-factor-query>>,
+<<query-dsl-custom-score-query>> and
+<<query-dsl-custom-filters-score-query>>,
+and adds further scoring functionality such as
+distance and recency scoring (see description below).
+
+==== Using function score
+
+To use `function_score`, the user has to define a query and one or
+more functions that compute a new score for each document returned
+by the query.
+
+`function_score` can be used with only one function like this:
+
+[source,js]
+--------------------------------------------------
+"function_score": {
+    "(query|filter)": {},
+    "boost": "boost for the whole query",
+    "FUNCTION": {},
+    "boost_mode": "(multiply|replace|...)"
+}
+--------------------------------------------------
+
+Furthermore, several functions can be combined. In this case one can
+optionally choose to apply the function only if a document matches a
+given filter:
+
+[source,js]
+--------------------------------------------------
+"function_score": {
+    "(query|filter)": {},
+    "boost": "boost for the whole query",
+    "functions": [
+        {
+            "filter": {},
+            "FUNCTION": {}
+        },
+        {
+            "FUNCTION": {}
+        }
+    ],
+    "max_boost": number,
+    "score_mode": "(multiply|max|...)",
+    "boost_mode": "(multiply|replace|...)"
+}
+--------------------------------------------------
+
+If no filter is given with a function, this is equivalent to specifying
+`"match_all": {}`.
+
+First, each document is scored by the defined functions. The parameter
+`score_mode` specifies how the computed scores are combined:
+
+[horizontal]
+`multiply`::    scores are multiplied (default)
+`sum`::         scores are summed
+`avg`::         scores are averaged
+`first`::       the first function that has a matching filter
+                is applied
+`max`::         maximum score is used
+`min`::         minimum score is used
+
+The new score can be restricted to not exceed a certain limit by setting
+the `max_boost` parameter. The default for `max_boost` is FLT_MAX.
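+
+As a rough illustration of this combination step, the JavaScript sketch below
+applies each `score_mode` to the scores of the functions whose filter matched
+and then caps the result at `max_boost`. The helper name
+`combineFunctionScores`, the neutral value of 1 when no function matches, and
+`Number.MAX_VALUE` as a stand-in for FLT_MAX are assumptions made for this
+example, not the actual implementation.
+

```javascript
// Hypothetical helper: combine per-function scores according to score_mode.
// `scores` holds the scores of the functions whose filter matched, in order.
function combineFunctionScores(scores, scoreMode = "multiply", maxBoost = Number.MAX_VALUE) {
  if (scores.length === 0) {
    return 1; // assumption: no matching function leaves the score unchanged
  }
  let combined;
  switch (scoreMode) {
    case "multiply":
      combined = scores.reduce((acc, s) => acc * s, 1);
      break;
    case "sum":
      combined = scores.reduce((acc, s) => acc + s, 0);
      break;
    case "avg":
      combined = scores.reduce((acc, s) => acc + s, 0) / scores.length;
      break;
    case "first":
      combined = scores[0];
      break;
    case "max":
      combined = Math.max(...scores);
      break;
    case "min":
      combined = Math.min(...scores);
      break;
    default:
      throw new Error("unknown score_mode: " + scoreMode);
  }
  // max_boost restricts the combined function score to an upper limit.
  return Math.min(combined, maxBoost);
}

console.log(combineFunctionScores([2, 3], "multiply"));    // 6
console.log(combineFunctionScores([2, 3], "avg"));         // 2.5
console.log(combineFunctionScores([2, 3], "multiply", 4)); // 4 (capped by max_boost)
```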
+
+Finally, the newly computed score is combined with the score of the
+query. The parameter `boost_mode` defines how:
+
+`multiply`::    query score and function score are multiplied (default)
+`replace`::     only function score is used, the query score is ignored
+`sum`::         query score and function score are added
+`avg`::         average of query score and function score
+`max`::         max of query score and function score
+`min`::         min of query score and function score
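+
+The `boost_mode` options can be read as a simple two-argument function. The
+sketch below is one way to write that down; `applyBoostMode` is a hypothetical
+name and the input scores are made up for illustration.
+

```javascript
// Hypothetical helper: combine the query score with the function score.
function applyBoostMode(queryScore, functionScore, boostMode = "multiply") {
  switch (boostMode) {
    case "multiply":
      return queryScore * functionScore;
    case "replace":
      return functionScore; // the query score is ignored
    case "sum":
      return queryScore + functionScore;
    case "avg":
      return (queryScore + functionScore) / 2;
    case "max":
      return Math.max(queryScore, functionScore);
    case "min":
      return Math.min(queryScore, functionScore);
    default:
      throw new Error("unknown boost_mode: " + boostMode);
  }
}

console.log(applyBoostMode(1.5, 4));            // 6
console.log(applyBoostMode(1.5, 4, "replace")); // 4
console.log(applyBoostMode(1.5, 4, "avg"));     // 2.75
```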
+
+
+==== Score functions
+
+The `function_score` query provides several types of score functions.
+
+===== Script score
+
+The `script_score` function allows you to wrap another query and
+optionally customize its scoring with a computation derived from other
+numeric field values in the doc, using a script expression. Here is a
+simple example:
+
+[source,js]
+--------------------------------------------------
+"script_score" : {
+    "script" : "_score * doc['my_numeric_field'].value"
+}
+--------------------------------------------------
+
+In addition to the usual scripting field values and expressions, the
+`_score` script parameter can be used to retrieve the score based on the
+wrapped query.
+
+Scripts are cached for faster execution. If the script has parameters
+that it needs to take into account, it is preferable to reuse the same
+script, and provide parameters to it:
+
+[source,js]
+--------------------------------------------------
+"script_score": {
+    "lang": "lang",
+    "params": {
+        "param1": value1,
+        "param2": value2
+     },
+    "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
+}
+--------------------------------------------------
+
+Note that unlike the <<query-dsl-custom-score-query>>, the
+score of the query is multiplied by the result of the script scoring. If
+you wish to inhibit this, set `"boost_mode": "replace"`.
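+
+To make the difference concrete, the sketch below evaluates the sample script
+as a plain JavaScript function for a single document. The query score of 2,
+the field value of 3 and all names are made up for illustration.
+

```javascript
// Stand-in for the script "_score * doc['my_numeric_field'].value".
function scriptScore(score, fieldValue) {
  return score * fieldValue;
}

const queryScore = 2; // hypothetical score of the wrapped query
const fieldValue = 3; // hypothetical value of my_numeric_field

const scriptResult = scriptScore(queryScore, fieldValue); // 6

// Default boost_mode "multiply": the query score is applied a second time.
const withMultiply = queryScore * scriptResult; // 12

// boost_mode "replace": only the script result is used, as in custom_score.
const withReplace = scriptResult; // 6

console.log(withMultiply, withReplace); // 12 6
```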
+
+===== Boost factor
+
+The `boost_factor` function allows you to multiply the score by the provided
+`boost_factor`. This can sometimes be desired since the `boost` value set on
+specific queries gets normalized, while for this score function it does
+not.
+
+[source,js]
+--------------------------------------------------
+"boost_factor" : number
+--------------------------------------------------
+
+===== Random
+
+The `random_score` function generates scores using a pseudo-random number
+algorithm that is initialized with a `seed`.
+
+[source,js]
+--------------------------------------------------
+"random_score": {
+    "seed" : number
+}
+--------------------------------------------------
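+
+The effect of a fixed `seed` can be illustrated with any seeded generator. The
+sketch below uses mulberry32, a small generator chosen only for illustration;
+it is not the algorithm used by `random_score`, and `randomScores` is a
+hypothetical helper.
+

```javascript
// mulberry32: a small seeded PRNG, used here only as a stand-in.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `count` pseudo-random scores in [0, 1) for a given seed.
function randomScores(seed, count) {
  const next = mulberry32(seed);
  return Array.from({ length: count }, () => next());
}

const first = randomScores(42, 3);
const second = randomScores(42, 3);

// The same seed reproduces the same sequence of scores.
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```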
+
+===== Decay functions
+
+Decay functions score a document with a function that decays depending
+on the distance of a numeric field value of the document from a
+user-given origin. This is similar to a range query, but with smooth edges
+instead of boxes.
+
+To use distance scoring on a query that has numerical fields, the user
+has to define an `origin` and a `scale` for each field. The `origin`
+is needed to define the ``central point'' from which the distance
+is calculated, and the `scale` to define the rate of decay. The
+decay function is specified as:
+
+[source,js]
+--------------------------------------------------
+"DECAY_FUNCTION": {
+    "FIELD_NAME": {
+          "origin": "11, 12",
+          "scale": "2km",
+          "offset": "1km",
+          "decay": 0.5
+    }
+}
+--------------------------------------------------
+
+where `DECAY_FUNCTION` can be `linear`, `exp` or `gauss` (see below).
+The `offset` and `decay` parameters are optional.
+
+`offset`::
+    If an `offset` is defined, the decay function is only computed for
+    documents with a distance greater than the defined `offset`. The
+    default is 0.
+`decay`::
+    The `decay` parameter defines how documents are scored at the distance
+    given at `scale`. If no `decay` is defined, documents at the distance
+    `scale` will be scored 0.5.
+
+For example, your documents might represent hotels and contain a geo
+location field. You want to compute a decay function depending on how
+far the hotel is from a given location. You might not immediately see
+what scale to choose for the gauss function, but you can say something
+like: "At a distance of 2km from the desired location, the score should
+be reduced to one third."
+
+You can provide this parameter like this:
+
+[source,js]
+--------------------------------------------------
+    "DECAY_FUNCTION": {
+        "location": {
+              "origin": "11, 12",
+              "scale": "2km",
+              "decay" : 0.33
+        }
+    }
+--------------------------------------------------
+
+The parameter `scale` will then be adjusted automatically to ensure that
+the score function computes a score of 0.33 for hotels that are 2km away
+from the desired location.
+
+The `DECAY_FUNCTION` determines the shape of the decay:
+
+[horizontal]
+`gauss`::
+
+Normal decay, computed as:
+
+image:images/Gaussian.png[]
+
+`exp`::
+
+Exponential decay, computed as:
+
+image:images/Exponential.png[]
+
+
+`linear`::
+
+Linear decay, computed as:
+
+image:images/Linear.png[]
+
+In contrast to the normal and exponential decay, this function actually
+sets the score to 0 if the distance of the field value from the origin
+exceeds twice the user-given scale value.
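+
+The exact formulas are given in the images and are not reproduced in the text.
+As a hedged sketch, the JavaScript below uses one parameterization of the three
+shapes that satisfies the properties stated here: a score of 1 at the origin
+(and within `offset`), a score of `decay` at the distance `scale` beyond the
+offset, and, for `linear` with the default `decay`, a score of 0 beyond twice
+the scale. It handles plain numeric fields only; geo distances and dates are
+left out, and all function names are assumptions.
+

```javascript
// Distance beyond the offset; documents within the offset are not decayed.
function effectiveDistance(value, origin, offset = 0) {
  return Math.max(0, Math.abs(value - origin) - offset);
}

// Normal decay: the variance is chosen so that the score equals `decay` at `scale`.
function gaussDecay(value, origin, scale, offset = 0, decay = 0.5) {
  const d = effectiveDistance(value, origin, offset);
  const sigmaSquared = -(scale * scale) / (2 * Math.log(decay));
  return Math.exp(-(d * d) / (2 * sigmaSquared));
}

// Exponential decay: the rate is chosen so that the score equals `decay` at `scale`.
function expDecay(value, origin, scale, offset = 0, decay = 0.5) {
  const d = effectiveDistance(value, origin, offset);
  const lambda = Math.log(decay) / scale;
  return Math.exp(lambda * d);
}

// Linear decay: reaches `decay` at `scale` and 0 at scale / (1 - decay).
function linearDecay(value, origin, scale, offset = 0, decay = 0.5) {
  const d = effectiveDistance(value, origin, offset);
  const s = scale / (1 - decay);
  return Math.max(0, (s - d) / s);
}

// Numeric field with origin 0 and scale 20, default decay of 0.5.
console.log(gaussDecay(20, 0, 20).toFixed(2));  // "0.50"
console.log(expDecay(20, 0, 20).toFixed(2));    // "0.50"
console.log(linearDecay(20, 0, 20));            // 0.5
console.log(linearDecay(50, 0, 20));            // 0 (beyond twice the scale)
```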
+
+==== Detailed example
+
+Suppose you are searching for a hotel in a certain town. Your budget is
+limited. Also, you would like the hotel to be close to the town center,
+so the farther the hotel is from the desired location the less likely
+you are to check in.
+
+You would like the query results that match your criteria (for
+example, "hotel, Nancy, non-smoker") to be scored with respect to
+the distance to the town center and also the price.
+
+Intuitively, you would like to define the town center as the origin and
+maybe you are willing to walk 2km to the town center from the hotel. +
+In this case your *origin* for the location field is the town center
+and the *scale* is ~2km.
+
+If your budget is low, you would probably prefer something cheap over
+something expensive. For the price field, the *origin* would be 0 Euros
+and the *scale* depends on how much you are willing to pay, for example 20 Euros.
+
+In this example, the fields might be called "price" for the price of the
+hotel and "location" for the coordinates of this hotel.
+
+The function for `price` in this case would be
+
+[source,js]
+--------------------------------------------------
+"DECAY_FUNCTION": {
+    "price": {
+          "origin": "0",
+          "scale": "20"
+    }
+}
+--------------------------------------------------
+
+and for `location`:
+
+[source,js]
+--------------------------------------------------
+"DECAY_FUNCTION": {
+    "location": {
+          "origin": "11, 12",
+          "scale": "2km"
+    }
+}
+--------------------------------------------------
+
+where `DECAY_FUNCTION` can be `linear`, `exp` or `gauss`.
+
+If you want to multiply the original score by these two functions,
+the request would look like this:
+
+[source,js]
+--------------------------------------------------
+curl 'localhost:9200/hotels/_search/' -d '{
+"query": {
+    "function_score": {
+        "functions": [
+            {
+                "DECAY_FUNCTION": {
+                    "price": {
+                        "origin": "0",
+                        "scale": "20"
+                    }
+                }
+            },
+            {
+                "DECAY_FUNCTION": {
+                    "location": {
+                        "origin": "11, 12",
+                        "scale": "2km"
+                    }
+                }
+            }
+        ],
+        "query": {
+            "match": {
+                "properties": "balcony"
+            }
+        },
+        "score_mode": "multiply"
+    }
+}
+}'
+--------------------------------------------------
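+
+For a concrete request, `DECAY_FUNCTION` has to be replaced by one of the three
+keywords. The sketch below builds the request body from the example with a
+chosen keyword substituted and serializes it to JSON. The helper name
+`hotelQuery` is an assumption, and sending the body to the search endpoint is
+left out.
+

```javascript
// Hypothetical helper: build the example request body for a given decay function.
function hotelQuery(decayFunction) {
  return {
    query: {
      function_score: {
        functions: [
          { [decayFunction]: { price: { origin: "0", scale: "20" } } },
          { [decayFunction]: { location: { origin: "11, 12", scale: "2km" } } }
        ],
        query: { match: { properties: "balcony" } },
        score_mode: "multiply"
      }
    }
  };
}

// Substitute "gauss" for DECAY_FUNCTION and print the resulting JSON body.
const body = JSON.stringify(hotelQuery("gauss"), null, 2);
console.log(body);
```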
+
+Next, we show what the computed score looks like for each of the three
+possible decay functions.
+
+===== Normal decay, keyword `gauss`
+
+When choosing `gauss` as the decay function in the above example, the
+contour and surface plots of the multiplier look like this:
+
+image::https://f.cloud.github.com/assets/4320215/768157/cd0e18a6-e898-11e2-9b3c-f0145078bd6f.png[width="700px"]
+
+image::https://f.cloud.github.com/assets/4320215/768160/ec43c928-e898-11e2-8e0d-f3c4519dbd89.png[width="700px"]
+
+Suppose your original search results match three hotels:
+
+* "Backpack Nap"
+* "Drink n Drive"
+* "BnB Bellevue"
+
+"Drink n Drive" is pretty far from your defined location (nearly 2 km)
+and is not too cheap (about 13 Euros), so it gets a low factor
+of 0.56. "BnB Bellevue" and "Backpack Nap" are both pretty close to the
+defined location, but "BnB Bellevue" is cheaper, so it gets a multiplier
+of 0.86, whereas "Backpack Nap" gets a value of 0.66.
+
+===== Exponential decay, keyword `exp`
+
+When choosing `exp` as the decay function in the above example, the
+contour and surface plots of the multiplier look like this:
+
+image::https://f.cloud.github.com/assets/4320215/768161/082975c0-e899-11e2-86f7-174c3a729d64.png[width="700px"]
+
+image::https://f.cloud.github.com/assets/4320215/768162/0b606884-e899-11e2-907b-aefc77eefef6.png[width="700px"]
+
+===== Linear decay, keyword `linear`
+
+When choosing `linear` as the decay function in the above example, the
+contour and surface plots of the multiplier look like this:
+
+image::https://f.cloud.github.com/assets/4320215/768164/1775b0ca-e899-11e2-9f4a-776b406305c6.png[width="700px"]
+
+image::https://f.cloud.github.com/assets/4320215/768165/19d8b1aa-e899-11e2-91bc-6b0553e8d722.png[width="700px"]
+
+==== Supported fields for decay functions
+
+Only single-valued numeric fields, including time and geo locations,
+are supported.
+
+==== What if a field is missing?
+
+If the numeric field is missing in the document, the function will
+return 1.
+
+==== Relation to `custom_boost_factor`, `custom_score` and `custom_filters_score`
+
+The <<query-dsl-custom-boost-factor-query>>
+
+[source,js]
+--------------------------------------------------
+"custom_boost_factor": {
+    "boost_factor": 5.2,
+    "query": {...}
+}
+--------------------------------------------------
+
+becomes
+
+[source,js]
+--------------------------------------------------
+"function_score": {
+    "boost_factor": 5.2,
+    "query": {...}
+}
+--------------------------------------------------
+
+The <<query-dsl-custom-score-query>>
+
+[source,js]
+--------------------------------------------------
+"custom_score": {
+    "params": {
+        "param1": 2,
+        "param2": 3.1
+    },
+    "query": {...},
+    "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
+}
+--------------------------------------------------
+
+becomes
+
+[source,js]
+--------------------------------------------------
+"function_score": {
+    "boost_mode": "replace",
+    "query": {...},
+    "script_score": {
+        "params": {
+            "param1": 2,
+            "param2": 3.1
+        },
+        "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
+    }
+}
+--------------------------------------------------
+
+and the <<query-dsl-custom-filters-score-query>>
+
+[source,js]
+--------------------------------------------------
+"custom_filters_score": {
+    "filters": [
+        {
+            "boost": "3",
+            "filter": {...}
+        },
+        {
+            "filter": {...},
+            "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
+        }
+    ],
+    "params": {
+        "param1": 2,
+        "param2": 3.1
+    },
+    "query": {...},
+    "score_mode": "first"
+}
+--------------------------------------------------
+
+becomes:
+
+[source,js]
+--------------------------------------------------
+"function_score": {
+    "functions": [
+        {
+            "boost_factor": "3",
+            "filter": {...}
+        },
+        {
+            "filter": {...},
+            "script_score": {
+                "params": {
+                    "param1": 2,
+                    "param2": 3.1
+                },
+                "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
+            }
+        }
+    ],
+    "query": {...},
+    "score_mode": "first"
+}
+--------------------------------------------------
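+
+The first two mappings above are mechanical enough to be expressed as small
+conversion functions. The sketch below is one possible way to do that; the
+helper names are hypothetical, and `match_all` merely stands in for the elided
+`{...}` query.
+

```javascript
// Hypothetical migration helpers following the mappings shown above.
function migrateCustomBoostFactor(q) {
  const { boost_factor, query } = q.custom_boost_factor;
  return { function_score: { boost_factor, query } };
}

function migrateCustomScore(q) {
  const { query, script, params } = q.custom_score;
  const script_score = params === undefined ? { script } : { params, script };
  // boost_mode "replace" keeps the old behaviour: the script result is the score.
  return { function_score: { boost_mode: "replace", query, script_score } };
}

const migrated = migrateCustomScore({
  custom_score: {
    params: { param1: 2, param2: 3.1 },
    query: { match_all: {} }, // placeholder for the real query
    script: "_score * doc['my_numeric_field'].value / pow(param1, param2)"
  }
});
console.log(JSON.stringify(migrated, null, 2));
```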
+
+