Mirror of https://github.com/honeymoose/OpenSearch.git, synced 2025-03-09 14:34:43 +00:00

[DOCS] Reformat rank feature query. Add relevance score section. (#44975)

parent 728b0cf9ff
commit 3c4150cf72

@@ -1,27 +1,38 @@
[[query-filter-context]]
== Query and filter context

[float]
[[relevance-scores]]
=== Relevance scores

By default, Elasticsearch sorts matching search results by **relevance
score**, which measures how well each document matches a query.

The relevance score is a positive floating point number, returned in the
`_score` meta-field of the <<search-request-body,search>> API. The higher the
`_score`, the more relevant the document. While each query type can calculate
relevance scores differently, score calculation also depends on whether the
query clause is run in a **query** or **filter** context.
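Elasticsearch does this ordering itself. The Python sketch below only illustrates the rule on a hand-written list. The `hits` list is a hypothetical stand-in for the `hits.hits` array of a search response, and `sort_by_relevance` is an illustrative helper, not part of any client library.

```python
# Hypothetical hits, shaped like entries of a search response's `hits.hits` array.
hits = [
    {"_id": "1", "_score": 0.8},
    {"_id": "2", "_score": 2.3},
    {"_id": "3", "_score": 1.1},
]


def sort_by_relevance(results):
    """Order hits from most to least relevant: a higher `_score` ranks first."""
    return sorted(results, key=lambda hit: hit["_score"], reverse=True)


ranked = sort_by_relevance(hits)
print([hit["_id"] for hit in ranked])  # prints ['2', '3', '1']
```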

[float]
[[query-context]]
=== Query context
In the query context, a query clause answers the question ``__How well does this
document match this query clause?__'' Besides deciding whether or not the
document matches, the query clause also calculates a relevance score in the
`_score` meta-field.

Query context is in effect whenever a query clause is passed to a `query`
parameter, such as the `query` parameter in the
<<request-body-search-query,search>> API.

[float]
[[filter-context]]
=== Filter context
In a filter context, a query clause answers the question ``__Does this
document match this query clause?__'' The answer is a simple Yes or No -- no
scores are calculated. Filter context is mostly used for filtering structured
data, e.g.

* __Does this +timestamp+ fall into the range 2015 to 2016?__
* __Is the +status+ field set to ++"published"++__?
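The following Python sketch builds a search request body as a plain dict that uses both contexts. A `match` clause in the `bool` query's `must` parameter runs in query context and affects `_score`. The two yes/no questions above run in its `filter` parameter. The `content` field, the search text, and the date bounds used to express "2015 to 2016" are illustrative assumptions.

```python
import json


def build_search_body(text):
    """Return a hypothetical search body mixing query and filter context."""
    return {
        "query": {
            "bool": {
                # Query context: this clause contributes to each hit's `_score`.
                "must": [{"match": {"content": text}}],
                # Filter context: yes/no checks that do not affect `_score`.
                "filter": [
                    {"range": {"timestamp": {"gte": "2015-01-01", "lt": "2017-01-01"}}},
                    {"term": {"status": "published"}},
                ],
            }
        }
    }


print(json.dumps(build_search_body("olympics"), indent=2))
```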

@@ -34,8 +45,10 @@ parameter, such as the `filter` or `must_not` parameters in the
<<query-dsl-bool-query,`bool`>> query, the `filter` parameter in the
<<query-dsl-constant-score-query,`constant_score`>> query, or the
<<search-aggregations-bucket-filter-aggregation,`filter`>> aggregation.

[float]
[[query-filter-context-ex]]
=== Example of query and filter contexts
Below is an example of query clauses being used in query and filter context
in the `search` API. This query will match documents where all of the following
conditions are met:

@@ -4,33 +4,58 @@
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.

The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.
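The Python function below is a sketch of that pattern. It assembles a `bool` query body whose `must` clause matches text and whose `should` clauses hold `rank_feature` queries on the `pagerank`, `url_length`, and `topics.sports` features described on this page. The helper name, the `content` field, and the boost values are hypothetical and chosen only for illustration.

```python
def boosted_search(text):
    """Hypothetical helper: match `text`, then add rank feature scores via `should`."""
    return {
        "query": {
            "bool": {
                # Documents must match the text to be returned at all.
                "must": [{"match": {"content": text}}],
                # Each matching `should` clause adds its score to the `bool` score.
                "should": [
                    {"rank_feature": {"field": "pagerank"}},
                    {"rank_feature": {"field": "url_length", "boost": 0.1}},
                    {"rank_feature": {"field": "topics.sports", "boost": 0.4}},
                ],
            }
        }
    }


body = boosted_search("2016")
print([c["rank_feature"]["field"] for c in body["query"]["bool"]["should"]])
# prints ['pagerank', 'url_length', 'topics.sports']
```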

Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a <<rank-feature,`rank_feature`>> field which measures the
importance of a website
- `url_length`, a <<rank-feature,`rank_feature`>> field which contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field which contains a list of
topics and a measure of how well each document is connected to each topic
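Expressed as a Python dict, the mappings described in the list above might look like the sketch below. It is derived from the list, not copied from the request that follows. `json.dumps` is used only to show the JSON form, in which Python's `False` becomes `false`.

```python
import json

# Field mappings derived from the list above (a sketch, not the exact request body).
test_index = {
    "mappings": {
        "properties": {
            "pagerank": {"type": "rank_feature"},
            # A long URL lowers relevance, so this feature has a negative score impact.
            "url_length": {"type": "rank_feature", "positive_score_impact": False},
            "topics": {"type": "rank_features"},
        }
    }
}

print(json.dumps(test_index, indent=2))
```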

[source,js]
----
PUT /test
{
  "mappings": {
    "properties": {
@@ -47,8 +72,16 @@ PUT test
    }
  }
}
----
// CONSOLE
// TESTSETUP

Index several documents to the `test` index.

[source,js]
----
PUT /test/_doc/1?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
@@ -60,10 +93,10 @@ PUT test/_doc/1
  }
}

PUT /test/_doc/2?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
@@ -73,7 +106,7 @@ PUT test/_doc/2
  }
}

PUT /test/_doc/3?refresh
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
@@ -84,10 +117,18 @@ PUT test/_doc/3
    "super hero": 65
  }
}
----
// CONSOLE

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,js]
----
GET /test/_search
{
  "query": {
    "bool": {
@@ -120,31 +161,80 @@ GET test/_search
    }
  }
}
----
// CONSOLE

[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--
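The rule that at most one function may be given can also be checked on the client before a request is sent. The Python helper below is a hypothetical sketch, not part of any Elasticsearch client. It accepts the `field`, `boost`, `saturation`, `log`, and `sigmoid` parameters listed above and rejects more than one function object. If all three functions are omitted, the choice is left to the server, which defaults to `saturation`.

```python
def rank_feature_query(field, boost=1.0, saturation=None, log=None, sigmoid=None):
    """Build a `rank_feature` query body; hypothetical client-side helper."""
    candidates = {"saturation": saturation, "log": log, "sigmoid": sigmoid}
    functions = {name: spec for name, spec in candidates.items() if spec is not None}
    if len(functions) > 1:
        raise ValueError("only one of saturation, log, or sigmoid can be provided")
    body = {"field": field, **functions}
    if boost != 1.0:
        # `boost` defaults to 1.0 on the server, so it is only sent when changed.
        body["boost"] = boost
    return {"rank_feature": body}


print(rank_feature_query("pagerank", saturation={"pivot": 8}))
# prints {'rank_feature': {'field': 'pagerank', 'saturation': {'pivot': 8}}}
```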

[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation
The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value. The
result is less than `0.5` if `S` is less than `pivot`, equal to `0.5` if `S`
equals `pivot`, and greater than `0.5` otherwise. Scores are always in the range
`(0, 1)`.

If the rank feature has a negative score impact, the function is computed as
`pivot / (S + pivot)`, which decreases as `S` increases.
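The Python sketch below implements the two saturation formulas above and can help show how a given `pivot` spreads scores. It assumes strictly positive feature values and a positive `pivot`, and it computes only the function value described above, not a full relevance score as returned by Elasticsearch.

```python
def saturation(s, pivot, positive_score_impact=True):
    """Score a rank feature value `s` with the saturation function."""
    if s <= 0 or pivot <= 0:
        raise ValueError("feature value and pivot are assumed to be positive")
    if positive_score_impact:
        return s / (s + pivot)  # grows toward 1 as `s` increases
    return pivot / (s + pivot)  # shrinks toward 0 as `s` increases


for value in (2, 8, 24):
    print(value, saturation(value, 8), saturation(value, 8, positive_score_impact=False))
# prints:
# 2 0.2 0.8
# 8 0.5 0.5
# 24 0.75 0.25
```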

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
@@ -157,16 +247,15 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.
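For intuition about that default, the Python sketch below computes an exact geometric mean of a hypothetical list of feature values. Elasticsearch only approximates this value, so the sketch illustrates the idea and does not reproduce the server's computation. Because `S / (S + pivot)` equals `0.5` when `S` equals `pivot`, a feature value at this mean would receive a saturation score of `0.5`.

```python
import math


def geometric_mean(values):
    """Exact geometric mean of strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one strictly positive value")
    # exp(mean(log(v))) avoids overflow from multiplying many values together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


feature_values = [2.0, 8.0, 32.0]  # hypothetical rank feature values
print(round(geometric_mean(feature_values), 6))  # prints 8.0
```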

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
@@ -177,20 +266,18 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[[rank-feature-query-logarithm]]
===== Logarithm
The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.
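The Python sketch below implements the logarithm function. It assumes the natural logarithm, because the text above does not state the base. It also raises an error for a negative score impact to reflect the restriction just described. The `positive_score_impact` argument is a hypothetical stand-in for the mapping setting of the same name.

```python
import math


def log_score(s, scaling_factor, positive_score_impact=True):
    """Score a rank feature value `s` as log(scaling_factor + s)."""
    if not positive_score_impact:
        # The log function only supports features with a positive score impact.
        raise ValueError("log requires a positive score impact")
    return math.log(scaling_factor + s)  # natural log, assumed


# Scores keep growing with `s` instead of approaching a fixed ceiling.
for value in (1, 10, 100, 1000):
    print(value, round(log_score(value, scaling_factor=4), 3))
```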

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
@@ -203,23 +290,21 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[[rank-feature-query-sigmoid]]
===== Sigmoid
The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`,
and scores are in the range `(0, 1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.
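The Python sketch below implements the sigmoid formula and checks two properties stated above. When `S` equals `pivot`, the score is `0.5`. With an exponent of `1`, the function reduces to `saturation`. The sample values are hypothetical, and positive inputs are assumed throughout.

```python
def sigmoid(s, pivot, exponent):
    """Score a rank feature value `s` as s^exp / (s^exp + pivot^exp)."""
    if s <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("feature value, pivot, and exponent are assumed to be positive")
    scaled = s ** exponent
    return scaled / (scaled + pivot ** exponent)


print(sigmoid(7, pivot=7, exponent=0.6))  # prints 0.5
print(sigmoid(2, pivot=8, exponent=1))    # prints 0.2, same as 2 / (2 + 8)
```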

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
@@ -233,4 +318,3 @@ GET test/_search
}
--------------------------------------------------
// CONSOLE
// TEST[continued]