[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.

The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.

Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a <<rank-feature,`rank_feature`>> field that measures the
importance of a website
- `url_length`, a <<rank-feature,`rank_feature`>> field that contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field that contains a list of
topics and a measure of how well each document is connected to each topic

[source,console]
----
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
----
// TESTSETUP

Index several documents to the `test` index.

[source,console]
----
PUT /test/_doc/1?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
----

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,console]
----
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
----
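
As a rough illustration of how the three `should` clauses in this query could
combine, the Python sketch below sums their contributions for two of the sample
documents. It rests on several assumptions that this page does not confirm: that
`boost` multiplies the output of the default `saturation` function, that a
clause on a feature the document lacks adds nothing, and that the pivot values
in `PIVOTS` (which are hypothetical) stand in for whatever defaults {es} would
compute. The `match` clause score is left out entirely.

```python
def saturation(s, pivot, positive_score_impact=True):
    """Saturation score for a strictly positive feature value s."""
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


def rank_feature_boost(doc, pivots):
    """Sum the three boosted should clauses of the example query."""
    # pagerank clause: no explicit boost, so the default of 1.0 applies
    total = 1.0 * saturation(doc["pagerank"], pivots["pagerank"])
    # url_length is mapped with positive_score_impact: false
    total += 0.1 * saturation(doc["url_length"], pivots["url_length"], False)
    # topics.sports only contributes when the document has that feature
    sports = doc["topics"].get("sports")
    if sports is not None:
        total += 0.4 * saturation(sports, pivots["topics.sports"])
    return total


# Hypothetical pivots, chosen only for illustration.
PIVOTS = {"pagerank": 50.0, "url_length": 40.0, "topics.sports": 40.0}

olympics = {"pagerank": 50.3, "url_length": 42,
            "topics": {"sports": 50, "brazil": 30}}
deadpool = {"pagerank": 50.3, "url_length": 37,
            "topics": {"movies": 60, "super hero": 65}}

olympics_boost = rank_feature_boost(olympics, PIVOTS)
deadpool_boost = rank_feature_boost(deadpool, PIVOTS)
```

Under these assumptions the Olympics document receives the larger boost, because
only it carries a `sports` topic value.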

[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.

You can provide only one function: `saturation`, `log`, or `sigmoid`.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.

You can provide only one function: `saturation`, `log`, or `sigmoid`.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.

You can provide only one function: `saturation`, `log`, or `sigmoid`.
--

[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation
The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value.
The result is less than `0.5` if `S` is less than `pivot`, exactly `0.5` if `S`
equals `pivot`, and greater than `0.5` otherwise. Scores are always in the
range `(0,1)`.

If the rank feature has a negative score impact, the function is
computed as `pivot / (S + pivot)`, which decreases when `S` increases.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
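
To make the two saturation formulas concrete, here is a minimal Python sketch
that evaluates them outside of {es}. The function name `saturation` and its
`positive_score_impact` argument are hypothetical helpers for this
illustration, and the guard against non-positive inputs is an assumption of the
sketch rather than something stated on this page.

```python
def saturation(s, pivot, positive_score_impact=True):
    """Evaluate the saturation formula for feature value s and a pivot."""
    if s <= 0 or pivot <= 0:
        # Assumption of this sketch: both inputs are strictly positive.
        raise ValueError("s and pivot must be strictly positive")
    if positive_score_impact:
        return s / (s + pivot)      # grows toward 1 as s increases
    return pivot / (s + pivot)      # shrinks toward 0 as s increases


# pagerank value from the sample documents, pivot from the request above
pagerank_score = saturation(50.3, 8)

# a feature equal to its pivot lands exactly on the midpoint
midpoint = saturation(8, 8)

# url_length-style feature with a negative score impact
url_length_score = saturation(42, 8, positive_score_impact=False)
```

With a pivot of `8`, a `pagerank` of `50.3` scores well above `0.5`, while the
negative-impact variant pushes a long URL below `0.5`.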

If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
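
For intuition about what a geometric-mean pivot looks like, the sketch below
computes an exact geometric mean over a small list in Python. This is only an
illustration: {es} computes an approximation over the values in the index, and
the helper name `geometric_mean` is hypothetical.

```python
import math


def geometric_mean(values):
    """Exact geometric mean, computed in log space to limit overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# url_length values of the three sample documents
pivot_estimate = geometric_mean([42, 47, 37])
```

A pivot near the geometric mean places typical feature values close to a score
of `0.5`, which is one reason it makes a reasonable untrained default.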

[[rank-feature-query-logarithm]]
===== Logarithm
The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
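
The `log` formula can be evaluated the same way. The Python sketch below
assumes a natural logarithm, which this page does not state explicitly, and the
helper name `log_score` is hypothetical.

```python
import math


def log_score(s, scaling_factor):
    """Evaluate log(scaling_factor + s), assuming a natural logarithm."""
    if s <= 0 or scaling_factor <= 0:
        # Assumption of this sketch: both inputs are strictly positive.
        raise ValueError("s and scaling_factor must be strictly positive")
    return math.log(scaling_factor + s)


# pagerank value from the sample documents, scaling factor from the request
pagerank_log = log_score(50.3, 4)

# the score keeps growing with s, which is why it is described as unbounded
small = log_score(100, 4)
large = log_score(1_000_000, 4)
```

Because the result has no upper bound, a `log` clause can dominate other
clauses more easily than `saturation` or `sigmoid`, which stay below `1`.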

[[rank-feature-query-sigmoid]]
===== Sigmoid
The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`
and scores are in the range `(0,1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
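
The relationship between `sigmoid` and `saturation` is easy to check
numerically. The following Python sketch evaluates the sigmoid formula; the
helper name `sigmoid_score` is hypothetical, and the input guard is an
assumption of the sketch.

```python
def sigmoid_score(s, pivot, exponent):
    """Evaluate s^exp / (s^exp + pivot^exp) for positive inputs."""
    if s <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("s, pivot, and exponent must be strictly positive")
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


# parameters from the request above, pagerank from the sample documents
pagerank_sigmoid = sigmoid_score(50.3, 7, 0.6)

# at s == pivot the score is 0.5 regardless of the exponent
at_pivot = sigmoid_score(7, 7, 0.6)

# with an exponent of 1 the formula reduces to saturation: s / (s + pivot)
as_saturation = sigmoid_score(50.3, 7, 1)
```

An exponent below `1` flattens the curve, so for values above the pivot the
score sits closer to `0.5` than the plain saturation score does.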