[DOCS] Reformat rank feature query. Add relevance score section. (#44975)

2025-03-09 14:34:43 +00:00 · 2019-07-31 14:31:28 -04:00 · 2019-07-31 14:31:28 -04:00 · 3c4150cf72
commit 3c4150cf72
parent 728b0cf9ff
2 changed files with 187 additions and 90 deletions
--- a/docs/reference/query-dsl/query_filter_context.asciidoc
+++ b/docs/reference/query-dsl/query_filter_context.asciidoc
@@ -1,27 +1,38 @@
 [[query-filter-context]]
 == Query and filter context

-The behaviour of a query clause depends on whether it is used in _query context_ or
-in _filter context_:
+[float]
+[[relevance-scores]]
+=== Relevance scores

-Query context::
-+
--
-A query clause used in query context answers the question ``__How well does this
+By default, Elasticsearch sorts matching search results by **relevance
+score**, which measures how well each document matches a query.
+
+The relevance score is a positive floating point number, returned in the
+`_score` meta-field of the <<search-request-body,search>> API. The higher the
+`_score`, the more relevant the document. While each query type can calculate
+relevance scores differently, score calculation also depends on whether the
+query clause is run in a **query** or **filter** context.
+
+[float]
+[[query-context]]
+=== Query context
+In a query context, a query clause answers the question ``__How well does this
 document match this query clause?__'' Besides deciding whether or not the
-document matches, the query clause also calculates a `_score` representing how
-well the document matches, relative to other documents.
+document matches, the query clause also calculates a relevance score in the
+`_score` meta-field.

-Query context is in effect whenever a query clause is passed to a `query` parameter,
-such as the `query` parameter in the <<request-body-search-query,`search`>> API.
--
+Query context is in effect whenever a query clause is passed to a `query`
+parameter, such as the `query` parameter in the
+<<request-body-search-query,search>> API.

-Filter context::
-+
--
-In _filter_ context, a query clause answers the question ``__Does this document
-match this query clause?__''  The answer is a simple Yes or No -- no scores are
-calculated.  Filter context is mostly used for filtering structured data, e.g.
+[float]
+[[filter-context]]
+=== Filter context
+In a filter context, a query clause answers the question ``__Does this
+document match this query clause?__''  The answer is a simple Yes or No -- no
+scores are calculated.  Filter context is mostly used for filtering structured
+data, e.g.

 *  __Does this +timestamp+ fall into the range 2015 to 2016?__
 *  __Is the +status+  field set to ++"published"++__?
@@ -34,8 +45,10 @@ parameter, such as the `filter` or `must_not` parameters in the
 <<query-dsl-bool-query,`bool`>> query, the `filter` parameter in the
 <<query-dsl-constant-score-query,`constant_score`>> query, or the
 <<search-aggregations-bucket-filter-aggregation,`filter`>> aggregation.
--

+[float]
+[[query-filter-context-ex]]
+=== Example of query and filter contexts
 Below is an example of query clauses being used in query and filter context
 in the `search` API.  This query will match documents where all of the following
 conditions are met:
--- a/docs/reference/query-dsl/rank-feature-query.asciidoc
+++ b/docs/reference/query-dsl/rank-feature-query.asciidoc
@@ -4,33 +4,58 @@
 <titleabbrev>Rank feature</titleabbrev>
 ++++

-The `rank_feature` query is a specialized query that only works on
-<<rank-feature,`rank_feature`>> fields and <<rank-features,`rank_features`>> fields.
-Its goal is to boost the score of documents based on the values of numeric
-features. It is typically put in a `should` clause of a
-<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
-of the query.
+Boosts the <<relevance-scores,relevance score>> of documents based on the
+numeric value of a <<rank-feature,`rank_feature`>> or
+<<rank-features,`rank_features`>> field.

-Compared to using <<query-dsl-function-score-query,`function_score`>> or other
-ways to modify the score, this query has the benefit of being able to
-efficiently skip non-competitive hits when
-<<search-uri-request,`track_total_hits`>> is not set to `true`. Speedups may be
-spectacular.
+The `rank_feature` query is typically used in the `should` clause of a
+<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
+scores from the `bool` query.

-Here is an example that indexes various features:
- - https://en.wikipedia.org/wiki/PageRank[`pagerank`], a measure of the
-   importance of a website,
- - `url_length`, the length of the url, which typically correlates negatively
-   with relevance,
- - `topics`, which associates a list of topics with every document alongside a
-   measure of how well the document is connected to this topic.
+Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
+ways to change <<relevance-scores,relevance scores>>, the
+`rank_feature` query efficiently skips non-competitive hits when the
+<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
+dramatically improve query speed.

-Then the example includes an example query that searches for `"2016"` and boosts
-based or `pagerank`, `url_length` and the `sports` topic.
+[[rank-feature-query-functions]]
+==== Rank feature functions
+
+To calculate relevance scores based on rank feature fields, the `rank_feature`
+query supports the following mathematical functions:
+
+* <<rank-feature-query-saturation,Saturation>>
+* <<rank-feature-query-logarithm,Logarithm>>
+* <<rank-feature-query-sigmoid,Sigmoid>>
+
+If you don't know where to start, we recommend using the `saturation` function.
+If no function is provided, the `rank_feature` query uses the `saturation`
+function by default.
+
+[[rank-feature-query-ex-request]]
+==== Example request
+
+[[rank-feature-query-index-setup]]
+===== Index setup
+
+To use the `rank_feature` query, your index must include a
+<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
+mapping. To see how you can set up an index for the `rank_feature` query, try
+the following example.
+
+Create a `test` index with the following field mappings:
+
+- `pagerank`, a <<rank-feature,`rank_feature`>> field which measures the
+importance of a website
+- `url_length`, a <<rank-feature,`rank_feature`>> field which contains the
+length of the website's URL. For this example, a long URL correlates negatively
+with relevance, indicated by a `positive_score_impact` value of `false`.
+- `topics`, a <<rank-features,`rank_features`>> field which contains a list of
+topics and a measure of how well each document is connected to each topic

 [source,js]
--------------------------------------------------
-PUT test
+----
+PUT /test
 {
  "mappings": {
    "properties": {
@@ -47,8 +72,16 @@ PUT test
    }
  }
 }
+----
+// CONSOLE
+// TESTSETUP

-PUT test/_doc/1
+
+Index several documents to the `test` index.
+
+[source,js]
+----
+PUT /test/_doc/1?refresh
 {
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
@@ -60,10 +93,10 @@ PUT test/_doc/1
  }
 }

-PUT test/_doc/2
+PUT /test/_doc/2?refresh
 {
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
-  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
+  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
@@ -73,7 +106,7 @@ PUT test/_doc/2
  }
 }

-PUT test/_doc/3
+PUT /test/_doc/3?refresh
 {
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
@@ -84,10 +117,18 @@ PUT test/_doc/3
    "super hero": 65
  }
 }
+----
+// CONSOLE

-POST test/_refresh
+[[rank-feature-query-ex-query]]
+===== Example query

-GET test/_search 
+The following query searches for `2016` and boosts relevance scores based on
+`pagerank`, `url_length`, and the `sports` topic.
+
+[source,js]
+----
+GET /test/_search 
 {
  "query": {
    "bool": {
@@ -120,31 +161,80 @@ GET test/_search
    }
  }
 }
--------------------------------------------------
+----
 // CONSOLE

-[float]
-=== Supported functions

-The `rank_feature` query supports 3 functions in order to boost scores using the
-values of rank features. If you do not know where to start, we recommend that you
-start with the `saturation` function, which is the default when no function is
-provided.
+[[rank-feature-top-level-params]]
+==== Top-level parameters for `rank_feature`

-[float]
-==== Saturation
+`field`::
+(Required, string) <<rank-feature,`rank_feature`>> or
+<<rank-features,`rank_features`>> field used to boost
+<<relevance-scores,relevance scores>>.

-This function gives a score that is equal to `S / (S + pivot)` where `S` is the
-value of the rank feature and `pivot` is a configurable pivot value so that the
-result will be less than +0.5+ if `S` is less than pivot and greater than +0.5+
-otherwise. Scores are always is +(0, 1)+.
+`boost`::
+
+--
+(Optional, float) Floating point number used to decrease or increase
+<<relevance-scores,relevance scores>>. Defaults to `1.0`.

-If the rank feature has a negative score impact then the function will be computed as
-`pivot / (S + pivot)`, which decreases when `S` increases.
+Boost values are relative to the default value of `1.0`. A boost value between
+`0` and `1.0` decreases the relevance score. A value greater than `1.0`
+increases the relevance score.
+--
+
+`saturation`::
+
+--
+(Optional, <<rank-feature-query-saturation,function object>>) Saturation
+function used to boost <<relevance-scores,relevance scores>> based on the
+value of the rank feature `field`. If no function is provided, the `rank_feature`
+query defaults to the `saturation` function. See
+<<rank-feature-query-saturation,Saturation>> for more information.
+
+Only one function, `saturation`, `log`, or `sigmoid`, can be provided.
+--
+
+`log`::
+
+--
+(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
+function used to boost <<relevance-scores,relevance scores>> based on the
+value of the rank feature `field`. See
+<<rank-feature-query-logarithm,Logarithm>> for more information.
+
+Only one function, `saturation`, `log`, or `sigmoid`, can be provided.
+--
+
+`sigmoid`::
+
+--
+(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
+to boost <<relevance-scores,relevance scores>> based on the value of the
+rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
+information.
+
+Only one function, `saturation`, `log`, or `sigmoid`, can be provided.
+--
+
+
+[[rank-feature-query-notes]]
+==== Notes
+
+[[rank-feature-query-saturation]]
+===== Saturation
+The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
+the value of the rank feature field and `pivot` is a configurable pivot value so
+that the result will be less than `0.5` if `S` is less than pivot and greater
+than `0.5` otherwise. Scores are always in the range `(0,1)`.
+
+If the rank feature has a negative score impact then the function will be
+computed as `pivot / (S + pivot)`, which decreases when `S` increases.

 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
  "query": {
    "rank_feature": {
@@ -157,16 +247,15 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]

-If +pivot+ is not supplied then Elasticsearch will compute a default value that
-will be approximately equal to the geometric mean of all feature values that
-exist in the index. We recommend this if you haven't had the opportunity to
-train a good pivot value.
+If a `pivot` value is not provided, {es} computes a default value equal to the
+approximate geometric mean of all rank feature values in the index. We recommend
+using this default value if you haven't had the opportunity to train a good
+pivot value.

 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
  "query": {
    "rank_feature": {
@@ -177,20 +266,18 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]

-[float]
-==== Logarithm
-
-This function gives a score that is equal to `log(scaling_factor + S)` where
-`S` is the value of the rank feature and `scaling_factor` is a configurable scaling
-factor. Scores are unbounded.
+[[rank-feature-query-logarithm]]
+===== Logarithm
+The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
+is the value of the rank feature field and `scaling_factor` is a configurable
+scaling factor. Scores are unbounded.

 This function only supports rank features that have a positive score impact.

 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
  "query": {
    "rank_feature": {
@@ -203,23 +290,21 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]

-[float]
-==== Sigmoid
-
-This function is an extension of `saturation` which adds a configurable
+[[rank-feature-query-sigmoid]]
+===== Sigmoid
+The `sigmoid` function is an extension of `saturation` which adds a configurable
 exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. Like for the
-`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
-and scores are in +(0, 1)+.
+`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`
+and scores are in `(0,1)`.

-`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
-be computed via training. If you don't have the opportunity to do so, we recommend
-that you stick to the `saturation` function instead.
+The `exponent` must be positive and is typically in `[0.5, 1]`. A
+good value should be computed via training. If you don't have the opportunity to
+do so, we recommend you use the `saturation` function instead.

 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
  "query": {
    "rank_feature": {
@@ -233,4 +318,3 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]
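
Reviewer note: the three scoring formulas described in the Notes section can be checked numerically with a small sketch. This is not Elasticsearch code. It only restates the formulas quoted in the docs above in plain Python. The function names, the `positive_score_impact` argument placement, and the sample values are hypothetical, and the natural logarithm is an assumption because the docs do not state the base.

```python
import math


def saturation(s: float, pivot: float, positive_score_impact: bool = True) -> float:
    """S / (S + pivot), or pivot / (S + pivot) for a negative score impact."""
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


def log_score(s: float, scaling_factor: float) -> float:
    """log(scaling_factor + S). Natural log is assumed here."""
    return math.log(scaling_factor + s)


def sigmoid(s: float, pivot: float, exponent: float) -> float:
    """S^exp / (S^exp + pivot^exp)."""
    return s**exponent / (s**exponent + pivot**exponent)


if __name__ == "__main__":
    # Hypothetical feature value and pivot, chosen only for illustration.
    print(saturation(50.3, 8.0))         # above 0.5 because S > pivot
    print(saturation(47.0, 8.0, False))  # negative impact: shrinks as S grows
    print(log_score(50.3, 4.0))          # unbounded as S grows
    print(sigmoid(50.3, 8.0, 0.6))       # stays inside (0, 1)
```

With `S` equal to `pivot`, both `saturation` and `sigmoid` return `0.5`, which matches the description of `pivot` in the docs. With an exponent of `1`, `sigmoid` reduces to `saturation`.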