[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the relevance score of documents based on the numeric value of a
`rank_feature` or `rank_features` field.

The `rank_feature` query is typically used in the `should` clause of a `bool`
query so its relevance scores are added to other scores from the `bool` query.

Unlike the `function_score` query or other ways to change relevance scores, the
`rank_feature` query efficiently skips non-competitive hits when the
`track_total_hits` parameter is **not** `true`. This can dramatically improve
query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a `rank_feature` or
`rank_features` field mapping. To see how you can set up an index for the
`rank_feature` query, try the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a `rank_feature` field which measures the importance of a website
- `url_length`, a `rank_feature` field which contains the length of the
website's URL. For this example, a long URL correlates negatively to relevance,
indicated by a `positive_score_impact` value of `false`.
- `topics`, a `rank_features` field which contains a list of topics and a
measure of how well each document is connected to each topic

[source,console]
----
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
----
// TESTSETUP

Index several documents to the `test` index.
[source,console]
----
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
----

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,console]
----
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
----

[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) `rank_feature` or `rank_features` field used to boost
relevance scores.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
relevance scores. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost relevance scores based on the value of the rank feature
`field`. If no function is provided, the `rank_feature` query defaults to the
`saturation` function. See <<rank-feature-query-saturation,Saturation>> for
more information.

Only one of the `saturation`, `log`, or `sigmoid` functions can be provided.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost relevance scores based on the value of the rank feature
`field`. See <<rank-feature-query-logarithm,Logarithm>> for more information.

Only one of the `saturation`, `log`, or `sigmoid` functions can be provided.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost relevance scores based on the value of the rank feature `field`. See
<<rank-feature-query-sigmoid,Sigmoid>> for more information.

Only one of the `saturation`, `log`, or `sigmoid` functions can be provided.
--

[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation

The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value.
The score is `0.5` when `S` equals `pivot`, less than `0.5` when `S` is smaller
than `pivot`, and greater than `0.5` when `S` is larger. Scores always fall in
the range `(0, 1)`.

If the rank feature has a negative score impact, the function is computed as
`pivot / (S + pivot)`, which decreases as `S` increases.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------

If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------

[[rank-feature-query-logarithm]]
===== Logarithm

The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.
[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------

[[rank-feature-query-sigmoid]]
===== Sigmoid

The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`,
and scores fall in the range `(0, 1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A good value
should be computed via training. If you don't have the opportunity to do so, we
recommend you use the `saturation` function instead.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
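
To get a feel for how the three functions shape scores before picking parameters, it can help to evaluate the formulas by hand. The following Python sketch is illustrative only: the helper names `saturation_score`, `log_score`, and `sigmoid_score` are hypothetical and not part of any {es} API, and the use of the natural logarithm for `log` is an assumption. Scores returned by an actual query are also multiplied by `boost` and may differ slightly because of how feature values are stored.

```python
import math


def saturation_score(s: float, pivot: float, positive_score_impact: bool = True) -> float:
    """S / (S + pivot), or pivot / (S + pivot) for a negative score impact."""
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


def log_score(s: float, scaling_factor: float) -> float:
    """log(scaling_factor + S). Natural logarithm is assumed here."""
    return math.log(scaling_factor + s)


def sigmoid_score(s: float, pivot: float, exponent: float) -> float:
    """S^exp / (S^exp + pivot^exp)."""
    return s ** exponent / (s ** exponent + pivot ** exponent)


if __name__ == "__main__":
    pagerank = 50.3  # value indexed in the example documents
    url_length = 42  # field mapped with positive_score_impact: false

    # Parameters mirror the example requests above.
    print("saturation:", saturation_score(pagerank, pivot=8))
    print("saturation (negative impact):",
          saturation_score(url_length, pivot=8, positive_score_impact=False))
    print("log:", log_score(pagerank, scaling_factor=4))
    print("sigmoid:", sigmoid_score(pagerank, pivot=7, exponent=0.6))
```

Evaluating each function at `S = pivot` is a quick sanity check: both `saturation_score` and `sigmoid_score` should return `0.5` there, regardless of the exponent.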