[DOCS] Reformat rank feature query. Add relevance score section. (#44975)

James Rodewig 2019-07-31 14:31:28 -04:00
parent 728b0cf9ff
commit 3c4150cf72
2 changed files with 187 additions and 90 deletions

@ -1,27 +1,38 @@
[[query-filter-context]]
== Query and filter context
The behaviour of a query clause depends on whether it is used in _query context_ or
in _filter context_:
[float]
[[relevance-scores]]
=== Relevance scores
Query context::
+
--
A query clause used in query context answers the question ``__How well does this
By default, Elasticsearch sorts matching search results by **relevance
score**, which measures how well each document matches a query.
The relevance score is a positive floating point number, returned in the
`_score` meta-field of the <<search-request-body,search>> API. The higher the
`_score`, the more relevant the document. While each query type can calculate
relevance scores differently, score calculation also depends on whether the
query clause is run in a **query** or **filter** context.
[float]
[[query-context]]
=== Query context
In the query context, a query clause answers the question ``__How well does this
document match this query clause?__'' Besides deciding whether or not the
document matches, the query clause also calculates a `_score` representing how
well the document matches, relative to other documents.
document matches, the query clause also calculates a relevance score in the
`_score` meta-field.
Query context is in effect whenever a query clause is passed to a `query` parameter,
such as the `query` parameter in the <<request-body-search-query,`search`>> API.
--
Query context is in effect whenever a query clause is passed to a `query`
parameter, such as the `query` parameter in the
<<request-body-search-query,search>> API.
Filter context::
+
--
In _filter_ context, a query clause answers the question ``__Does this document
match this query clause?__'' The answer is a simple Yes or No -- no scores are
calculated. Filter context is mostly used for filtering structured data, e.g.
[float]
[[filter-context]]
=== Filter context
In a filter context, a query clause answers the question ``__Does this
document match this query clause?__'' The answer is a simple Yes or No -- no
scores are calculated. Filter context is mostly used for filtering structured
data, e.g.
* __Does this +timestamp+ fall into the range 2015 to 2016?__
* __Is the +status+ field set to ++"published"++__?
@ -34,8 +45,10 @@ parameter, such as the `filter` or `must_not` parameters in the
<<query-dsl-bool-query,`bool`>> query, the `filter` parameter in the
<<query-dsl-constant-score-query,`constant_score`>> query, or the
<<search-aggregations-bucket-filter-aggregation,`filter`>> aggregation.
--
[float]
[[query-filter-context-ex]]
=== Example of query and filter contexts
Below is an example of query clauses being used in query and filter context
in the `search` API. This query will match documents where all of the following
conditions are met:

@ -4,33 +4,58 @@
<titleabbrev>Rank feature</titleabbrev>
++++
The `rank_feature` query is a specialized query that only works on
<<rank-feature,`rank_feature`>> fields and <<rank-features,`rank_features`>> fields.
Its goal is to boost the score of documents based on the values of numeric
features. It is typically put in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.
Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.
Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is not set to `true`. Speedups may be
spectacular.
The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.
Here is an example that indexes various features:
- https://en.wikipedia.org/wiki/PageRank[`pagerank`], a measure of the
importance of a website,
- `url_length`, the length of the url, which typically correlates negatively
with relevance,
- `topics`, which associates a list of topics with every document alongside a
measure of how well the document is connected to this topic.
Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.
Then the example includes an example query that searches for `"2016"` and boosts
based on `pagerank`, `url_length` and the `sports` topic.
[[rank-feature-query-functions]]
==== Rank feature functions
To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:
* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>
If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.
[[rank-feature-query-ex-request]]
==== Example request
[[rank-feature-query-index-setup]]
===== Index setup
To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.
Create a `test` index with the following field mappings:
- `pagerank`, a <<rank-feature,`rank_feature`>> field which measures the
importance of a website
- `url_length`, a <<rank-feature,`rank_feature`>> field which contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field which contains a list of
topics and a measure of how well each document is connected to each topic
[source,js]
--------------------------------------------------
PUT test
----
PUT /test
{
"mappings": {
"properties": {
@ -47,8 +72,16 @@ PUT test
}
}
}
----
// CONSOLE
// TESTSETUP
PUT test/_doc/1
Index several documents to the `test` index.
[source,js]
----
PUT /test/_doc/1?refresh
{
"url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
"content": "Rio 2016",
@ -60,10 +93,10 @@ PUT test/_doc/1
}
}
PUT test/_doc/2
PUT /test/_doc/2?refresh
{
"url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
"content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
"content": "Formula One motor race held on 13 November 2016",
"pagerank": 50.3,
"url_length": 47,
"topics": {
@ -73,7 +106,7 @@ PUT test/_doc/2
}
}
PUT test/_doc/3
PUT /test/_doc/3?refresh
{
"url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
"content": "Deadpool is a 2016 American superhero film",
@ -84,10 +117,18 @@ PUT test/_doc/3
"super hero": 65
}
}
----
// CONSOLE
POST test/_refresh
[[rank-feature-query-ex-query]]
===== Example query
GET test/_search
The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.
[source,js]
----
GET /test/_search
{
"query": {
"bool": {
@ -120,31 +161,80 @@ GET test/_search
}
}
}
--------------------------------------------------
----
// CONSOLE
[float]
=== Supported functions
The `rank_feature` query supports 3 functions in order to boost scores using the
values of rank features. If you do not know where to start, we recommend that you
start with the `saturation` function, which is the default when no function is
provided.
[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`
[float]
==== Saturation
`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.
This function gives a score that is equal to `S / (S + pivot)` where `S` is the
value of the rank feature and `pivot` is a configurable pivot value so that the
result will be less than +0.5+ if `S` is less than pivot and greater than +0.5+
otherwise. Scores are always in +(0, 1)+.
`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.
If the rank feature has a negative score impact then the function will be computed as
`pivot / (S + pivot)`, which decreases when `S` increases.
Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--
`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.
Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--
`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.
Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--
`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.
Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--
[[rank-feature-query-notes]]
==== Notes
[[rank-feature-query-saturation]]
===== Saturation
The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value so
that the result will be less than `0.5` if `S` is less than pivot and greater
than `0.5` otherwise. Scores are always in `(0,1)`.
If the rank feature has a negative score impact then the function will be
computed as `pivot / (S + pivot)`, which decreases when `S` increases.
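To make the two formulas above concrete, here is a minimal Python sketch. It is not the {es} implementation. The function name `saturation_score` and the `positive_score_impact` argument are illustrative assumptions that mirror the mapping parameter of the same name.

```python
def saturation_score(s, pivot, positive_score_impact=True):
    """Sketch of the saturation formulas described in the text.

    s     -- value of the rank feature field (assumed strictly positive)
    pivot -- configurable pivot value (assumed strictly positive)
    """
    if s <= 0 or pivot <= 0:
        raise ValueError("s and pivot must be strictly positive")
    if positive_score_impact:
        # S / (S + pivot): grows toward 1 as S grows, equals 0.5 when S == pivot
        return s / (s + pivot)
    # pivot / (S + pivot): shrinks toward 0 as S grows
    return pivot / (s + pivot)


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    for value in (2.0, 8.0, 50.3):
        print(value, saturation_score(value, pivot=8.0))
```

In this sketch, a value equal to the pivot scores exactly `0.5` under either formula, which is why the pivot acts as the midpoint of the curve.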
[source,js]
--------------------------------------------------
GET test/_search
GET /test/_search
{
"query": {
"rank_feature": {
@ -157,16 +247,15 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
If +pivot+ is not supplied then Elasticsearch will compute a default value that
will be approximately equal to the geometric mean of all feature values that
exist in the index. We recommend this if you haven't had the opportunity to
train a good pivot value.
If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.
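For intuition about the default described above, the following Python sketch computes an exact geometric mean over a handful of feature values. This is an assumption-laden illustration only: {es} computes an approximation from index statistics, and the helper name `geometric_mean` and the sample values are hypothetical.

```python
import math


def geometric_mean(values):
    """Exact geometric mean of strictly positive numbers.

    Computed in log space to avoid overflow when multiplying many values.
    """
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical feature values, not taken from the example index.
    print(geometric_mean([2.0, 8.0, 32.0]))
```

Because the geometric mean sits in the middle of the values on a logarithmic scale, using it as the pivot places roughly typical feature values near a saturation score of `0.5`.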
[source,js]
--------------------------------------------------
GET test/_search
GET /test/_search
{
"query": {
"rank_feature": {
@ -177,20 +266,18 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
[float]
==== Logarithm
This function gives a score that is equal to `log(scaling_factor + S)` where
`S` is the value of the rank feature and `scaling_factor` is a configurable scaling
factor. Scores are unbounded.
[[rank-feature-query-logarithm]]
===== Logarithm
The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
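The formula above can be sketched in Python as follows. This is an illustration rather than the {es} implementation. It assumes `log` means the natural logarithm, and the function name `log_score` is hypothetical.

```python
import math


def log_score(s, scaling_factor):
    """Sketch of log(scaling_factor + S) for a rank feature value S.

    Assumes the natural logarithm and strictly positive inputs.
    """
    if s <= 0 or scaling_factor <= 0:
        raise ValueError("s and scaling_factor must be strictly positive")
    return math.log(scaling_factor + s)


if __name__ == "__main__":
    # Hypothetical values: the score keeps growing with S, so it is unbounded.
    for value in (1.0, 10.0, 100.0, 1000.0):
        print(value, log_score(value, scaling_factor=4.0))
```

Unlike the saturation curve, this score has no upper limit, so very large feature values can dominate other clauses unless the `boost` is tuned accordingly.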
[source,js]
--------------------------------------------------
GET test/_search
GET /test/_search
{
"query": {
"rank_feature": {
@ -203,23 +290,21 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
[float]
==== Sigmoid
This function is an extension of `saturation` which adds a configurable
[[rank-feature-query-sigmoid]]
===== Sigmoid
The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
and scores are in +(0, 1)+.
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`
and scores are in `(0,1)`.
`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
be computed via training. If you don't have the opportunity to do so, we recommend
that you stick to the `saturation` function instead.
The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.
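As a rough illustration of the formula above, the following Python sketch evaluates the sigmoid score. It is not the {es} implementation, and the function name `sigmoid_score` is hypothetical.

```python
def sigmoid_score(s, pivot, exponent):
    """Sketch of S^exp / (S^exp + pivot^exp) for a rank feature value S.

    All three arguments are assumed to be strictly positive.
    """
    if s <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("s, pivot and exponent must be strictly positive")
    s_pow = s ** exponent
    return s_pow / (s_pow + pivot ** exponent)


if __name__ == "__main__":
    # Hypothetical values: a smaller exponent flattens the curve around the pivot.
    for exp in (0.5, 0.6, 1.0):
        print(exp, sigmoid_score(50.3, pivot=7.0, exponent=exp))
```

With an exponent of `1`, the expression reduces to the saturation formula `S / (S + pivot)`, which is why the text describes `sigmoid` as an extension of `saturation`.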
[source,js]
--------------------------------------------------
GET test/_search
GET /test/_search
{
"query": {
"rank_feature": {
@ -233,4 +318,3 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]