From 3c4150cf725878ecff80341818ddc9610092a234 Mon Sep 17 00:00:00 2001 From: James Rodewig Date: Wed, 31 Jul 2019 14:31:28 -0400 Subject: [PATCH] [DOCS] Reformat rank feature query. Add relevance score section. (#44975) --- .../query-dsl/query_filter_context.asciidoc | 51 ++-- .../query-dsl/rank-feature-query.asciidoc | 226 ++++++++++++------ 2 files changed, 187 insertions(+), 90 deletions(-) diff --git a/docs/reference/query-dsl/query_filter_context.asciidoc b/docs/reference/query-dsl/query_filter_context.asciidoc index c7065948a50..dd2b3cfd478 100644 --- a/docs/reference/query-dsl/query_filter_context.asciidoc +++ b/docs/reference/query-dsl/query_filter_context.asciidoc @@ -1,27 +1,38 @@ [[query-filter-context]] == Query and filter context -The behaviour of a query clause depends on whether it is used in _query context_ or -in _filter context_: +[float] +[[relevance-scores]] +=== Relevance scores -Query context:: -+ --- -A query clause used in query context answers the question ``__How well does this +By default, Elasticsearch sorts matching search results by **relevance +score**, which measures how well each document matches a query. + +The relevance score is a positive floating point number, returned in the +`_score` meta-field of the <> API. The higher the +`_score`, the more relevant the document. While each query type can calculate +relevance scores differently, score calculation also depends on whether the +query clause is run in a **query** or **filter** context. + +[float] +[[query-context]] +=== Query context +In the query context, a query clause answers the question ``__How well does this document match this query clause?__'' Besides deciding whether or not the -document matches, the query clause also calculates a `_score` representing how -well the document matches, relative to other documents. +document matches, the query clause also calculates a relevance score in the +`_score` meta-field. 
-Query context is in effect whenever a query clause is passed to a `query` parameter, -such as the `query` parameter in the <> API. --- +Query context is in effect whenever a query clause is passed to a `query` +parameter, such as the `query` parameter in the +<> API. -Filter context:: -+ --- -In _filter_ context, a query clause answers the question ``__Does this document -match this query clause?__'' The answer is a simple Yes or No -- no scores are -calculated. Filter context is mostly used for filtering structured data, e.g. +[float] +[[filter-context]] +=== Filter context +In a filter context, a query clause answers the question ``__Does this +document match this query clause?__'' The answer is a simple Yes or No -- no +scores are calculated. Filter context is mostly used for filtering structured +data, e.g. * __Does this +timestamp+ fall into the range 2015 to 2016?__ * __Is the +status+ field set to ++"published"++__? @@ -34,8 +45,10 @@ parameter, such as the `filter` or `must_not` parameters in the <> query, the `filter` parameter in the <> query, or the <> aggregation. --- +[float] +[[query-filter-context-ex]] +=== Example of query and filter contexts Below is an example of query clauses being used in query and filter context in the `search` API. This query will match documents where all of the following conditions are met: @@ -80,4 +93,4 @@ significand's precision will be converted to floats with loss of precision. TIP: Use query clauses in query context for conditions which should affect the score of matching documents (i.e. how well does the document match), and use -all other query clauses in filter context. +all other query clauses in filter context. 
\ No newline at end of file diff --git a/docs/reference/query-dsl/rank-feature-query.asciidoc b/docs/reference/query-dsl/rank-feature-query.asciidoc index 18e4562a90a..9a132e3e5d3 100644 --- a/docs/reference/query-dsl/rank-feature-query.asciidoc +++ b/docs/reference/query-dsl/rank-feature-query.asciidoc @@ -4,33 +4,58 @@ Rank feature ++++ -The `rank_feature` query is a specialized query that only works on -<> fields and <> fields. -Its goal is to boost the score of documents based on the values of numeric -features. It is typically put in a `should` clause of a -<> query so that its score is added to the score -of the query. +Boosts the <> of documents based on the +numeric value of a <> or +<> field. -Compared to using <> or other -ways to modify the score, this query has the benefit of being able to -efficiently skip non-competitive hits when -<> is not set to `true`. Speedups may be -spectacular. +The `rank_feature` query is typically used in the `should` clause of a +<> query so its relevance scores are added to other +scores from the `bool` query. -Here is an example that indexes various features: - - https://en.wikipedia.org/wiki/PageRank[`pagerank`], a measure of the - importance of a website, - - `url_length`, the length of the url, which typically correlates negatively - with relevance, - - `topics`, which associates a list of topics with every document alongside a - measure of how well the document is connected to this topic. +Unlike the <> query or other +ways to change <>, the +`rank_feature` query efficiently skips non-competitive hits when the +<> parameter is **not** `true`. This can +dramatically improve query speed. -Then the example includes an example query that searches for `"2016"` and boosts -based or `pagerank`, `url_length` and the `sports` topic. 
+[[rank-feature-query-functions]] +==== Rank feature functions + +To calculate relevance scores based on rank feature fields, the `rank_feature` +query supports the following mathematical functions: + +* <> +* <> +* <> + +If you don't know where to start, we recommend using the `saturation` function. +If no function is provided, the `rank_feature` query uses the `saturation` +function by default. + +[[rank-feature-query-ex-request]] +==== Example request + +[[rank-feature-query-index-setup]] +===== Index setup + +To use the `rank_feature` query, your index must include a +<> or <> field +mapping. To see how you can set up an index for the `rank_feature` query, try +the following example. + +Create a `test` index with the following field mappings: + +- `pagerank`, a <> field which measures the +importance of a website +- `url_length`, a <> field which contains the +length of the website's URL. For this example, a long URL correlates negatively +to relevance, indicated by a `positive_score_impact` value of `false`. +- `topics`, a <> field which contains a list of +topics and a measure of how well each document is connected to this topic [source,js] --------------------------------------------------- -PUT test +---- +PUT /test { "mappings": { "properties": { @@ -47,8 +72,16 @@ PUT test } } } +---- +// CONSOLE +// TESTSETUP -PUT test/_doc/1 + +Index several documents to the `test` index. 
+ +[source,js] +---- +PUT /test/_doc/1?refresh { "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics", "content": "Rio 2016", @@ -60,10 +93,10 @@ PUT test/_doc/1 } } -PUT test/_doc/2 +PUT /test/_doc/2?refresh { "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix", - "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil", + "content": "Formula One motor race held on 13 November 2016", "pagerank": 50.3, "url_length": 47, "topics": { @@ -73,7 +106,7 @@ PUT test/_doc/2 } } -PUT test/_doc/3 +PUT /test/_doc/3?refresh { "url": "http://en.wikipedia.org/wiki/Deadpool_(film)", "content": "Deadpool is a 2016 American superhero film", @@ -84,10 +117,18 @@ PUT test/_doc/3 "super hero": 65 } } +---- +// CONSOLE -POST test/_refresh +[[rank-feature-query-ex-query]] +===== Example query -GET test/_search +The following query searches for `2016` and boosts relevance scores based or +`pagerank`, `url_length`, and the `sports` topic. + +[source,js] +---- +GET /test/_search { "query": { "bool": { @@ -120,31 +161,80 @@ GET test/_search } } } --------------------------------------------------- +---- // CONSOLE -[float] -=== Supported functions -The `rank_feature` query supports 3 functions in order to boost scores using the -values of rank features. If you do not know where to start, we recommend that you -start with the `saturation` function, which is the default when no function is -provided. +[[rank-feature-top-level-params]] +==== Top-level parameters for `rank_feature` -[float] -==== Saturation +`field`:: +(Required, string) <> or +<> field used to boost +<>. -This function gives a score that is equal to `S / (S + pivot)` where `S` is the -value of the rank feature and `pivot` is a configurable pivot value so that the -result will be less than +0.5+ if `S` is less than pivot and greater than +0.5+ -otherwise. Scores are always is +(0, 1)+. 
+`boost`:: ++ +-- +(Optional, float) Floating point number used to decrease or increase +<>. Defaults to `1.0`. -If the rank feature has a negative score impact then the function will be computed as -`pivot / (S + pivot)`, which decreases when `S` increases. +Boost values are relative to the default value of `1.0`. A boost value between +`0` and `1.0` decreases the relevance score. A value greater than `1.0` +increases the relevance score. +-- + +`saturation`:: ++ +-- +(Optional, <>) Saturation +function used to boost <> based on the +value of the rank feature `field`. If no function is provided, the `rank_feature` +query defaults to the `saturation` function. See +<> for more information. + +Only one function `saturation`, `log`, or `sigmoid` can be provided. +-- + +`log`:: ++ +-- +(Optional, <>) Logarithmic +function used to boost <> based on the +value of the rank feature `field`. See +<> for more information. + +Only one function `saturation`, `log`, or `sigmoid` can be provided. +-- + +`sigmoid`:: ++ +-- +(Optional, <>) Sigmoid function used +to boost <> based on the value of the +rank feature `field`. See <> for more +information. + +Only one function `saturation`, `log`, or `sigmoid` can be provided. +-- + + +[[rank-feature-query-notes]] +==== Notes + +[[rank-feature-query-saturation]] +===== Saturation +The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is +the value of the rank feature field and `pivot` is a configurable pivot value so +that the result will be less than `0.5` if `S` is less than pivot and greater +than `0.5` otherwise. Scores are always `(0,1)`. + +If the rank feature has a negative score impact then the function will be +computed as `pivot / (S + pivot)`, which decreases when `S` increases. 
[source,js] -------------------------------------------------- -GET test/_search +GET /test/_search { "query": { "rank_feature": { @@ -157,16 +247,15 @@ GET test/_search } -------------------------------------------------- // CONSOLE -// TEST[continued] -If +pivot+ is not supplied then Elasticsearch will compute a default value that -will be approximately equal to the geometric mean of all feature values that -exist in the index. We recommend this if you haven't had the opportunity to -train a good pivot value. +If a `pivot` value is not provided, {es} computes a default value equal to the +approximate geometric mean of all rank feature values in the index. We recommend +using this default value if you haven't had the opportunity to train a good +pivot value. [source,js] -------------------------------------------------- -GET test/_search +GET /test/_search { "query": { "rank_feature": { @@ -177,20 +266,18 @@ GET test/_search } -------------------------------------------------- // CONSOLE -// TEST[continued] -[float] -==== Logarithm - -This function gives a score that is equal to `log(scaling_factor + S)` where -`S` is the value of the rank feature and `scaling_factor` is a configurable scaling -factor. Scores are unbounded. +[[rank-feature-query-logarithm]] +===== Logarithm +The `log` function gives a score equal to `log(scaling_factor + S)`, where `S` +is the value of the rank feature field and `scaling_factor` is a configurable +scaling factor. Scores are unbounded. This function only supports rank features that have a positive score impact. 
[source,js] -------------------------------------------------- -GET test/_search +GET /test/_search { "query": { "rank_feature": { @@ -203,23 +290,21 @@ GET test/_search } -------------------------------------------------- // CONSOLE -// TEST[continued] -[float] -==== Sigmoid - -This function is an extension of `saturation` which adds a configurable +[[rank-feature-query-sigmoid]] +===== Sigmoid +The `sigmoid` function is an extension of `saturation` which adds a configurable exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. Like for the -`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+ -and scores are in +(0, 1)+. +`saturation` function, `pivot` is the value of `S` that gives a score of `0.5` +and scores are `(0,1)`. -`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should -be computed via training. If you don't have the opportunity to do so, we recommend -that you stick to the `saturation` function instead. +The `exponent` must be positive and is typically in `[0.5, 1]`. A +good value should be computed via training. If you don't have the opportunity to +do so, we recommend you use the `saturation` function instead. [source,js] -------------------------------------------------- -GET test/_search +GET /test/_search { "query": { "rank_feature": { @@ -232,5 +317,4 @@ GET test/_search } } -------------------------------------------------- -// CONSOLE -// TEST[continued] +// CONSOLE \ No newline at end of file
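The three scoring formulas documented in the patch's Notes section (`saturation`, `log`, and `sigmoid`) can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the text states it, not Elasticsearch's implementation. The function names, the `positive_score_impact` argument on `saturation`, the use of the natural logarithm for `log`, and the pivot, scaling factor, and exponent values in the usage lines are all assumptions made for the example. The real query also works in single-precision floats and applies `boost`, both of which this sketch ignores.

```python
import math


def saturation(s, pivot, positive_score_impact=True):
    """Score S / (S + pivot), or pivot / (S + pivot) for a negative impact field."""
    if positive_score_impact:
        return s / (s + pivot)
    # Negative score impact: the score decreases as S increases.
    return pivot / (s + pivot)


def log_score(s, scaling_factor):
    """Score log(scaling_factor + S); unbounded above.

    Assumption: natural logarithm. The documentation only writes `log`.
    """
    return math.log(scaling_factor + s)


def sigmoid(s, pivot, exponent):
    """Score S^exp / (S^exp + pivot^exp); equals 0.5 when S == pivot."""
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


# Hypothetical parameter values, applied to document 2 from the example
# (pagerank 50.3, url_length 47).
print(saturation(50.3, pivot=8))                              # above 0.5, since S > pivot
print(saturation(47, pivot=32, positive_score_impact=False))  # below 0.5, since S > pivot
print(log_score(50.3, scaling_factor=4))
print(sigmoid(50.3, pivot=7, exponent=0.6))
```

With `S` equal to `pivot`, both `saturation` and `sigmoid` return `0.5`, which matches the description of `pivot` as the value of `S` that gives a score of `0.5`.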