[[query-dsl-feature-query]]
=== Feature Query

The `feature` query is a specialized query that only works on
<<feature,`feature`>> fields. Its goal is to boost the score of documents based
on the values of numeric features. It is typically put in a `should` clause of
a <<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.

Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is set to `false`. In that case
queries can run substantially faster.

Here is an example:

[source,js]
--------------------------------------------------
PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "pagerank": {
          "type": "feature"
        },
        "url_length": {
          "type": "feature",
          "positive_score_impact": false
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "pagerank": 10,
  "url_length": 50
}

PUT test/_doc/2
{
  "pagerank": 100,
  "url_length": 20
}

POST test/_refresh

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank"
    }
  }
}

GET test/_search
{
  "query": {
    "feature": {
      "field": "url_length"
    }
  }
}
--------------------------------------------------
// CONSOLE
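
As noted at the top of this page, the query usually sits in a `should` clause
of a `bool` query. The following is a minimal sketch of that pattern against
the `test` index. The `match_all` clause is only a placeholder for whatever
query actually selects your documents, and `track_total_hits` is disabled in
the request body so that non-competitive hits can be skipped:

[source,js]
--------------------------------------------------
GET test/_search
{
  "track_total_hits": false,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "should": {
        "feature": {
          "field": "pagerank"
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Each hit's score is then the score of the `must` clause plus the score
produced by the `feature` query.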

[float]
=== Supported functions

The `feature` query supports three functions for boosting scores using the
values of features. If you do not know where to start, we recommend the
`saturation` function, which is the default when no function is provided.

[float]
==== Saturation

This function gives a score that is equal to `S / (S + pivot)` where `S` is the
value of the feature and `pivot` is a configurable pivot value. The result is
less than +0.5+ if `S` is less than `pivot`, equal to +0.5+ if `S` equals
`pivot`, and greater than +0.5+ otherwise. Scores are always in +(0, 1)+.

If the feature has a negative score impact then the function will be computed as
`pivot / (S + pivot)`, which decreases when `S` increases.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
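
The negative-impact form can be tried against the `url_length` field of the
example index, which was mapped with `positive_score_impact` set to `false`.
The `pivot` of 30 below is an arbitrary value chosen for illustration, not a
recommendation:

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "url_length",
      "saturation": {
        "pivot": 30
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Going by the formula above, document 2 (`url_length` of 20) should score about
`30 / (20 + 30) = 0.6` and document 1 (`url_length` of 50) about
`30 / (50 + 30) = 0.375`, so the document with the shorter URL ranks first.
Returned scores may differ slightly from these hand-computed values, most
likely because feature values are indexed with limited precision.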

If +pivot+ is not supplied then Elasticsearch will compute a default value that
will be approximately equal to the geometric mean of all feature values that
exist in the index. We recommend this if you haven't had the opportunity to
train a good pivot value.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[float]
==== Logarithm

This function gives a score that is equal to `log(scaling_factor + S)` where
`S` is the value of the feature and `scaling_factor` is a configurable scaling
factor. Scores are unbounded.

This function only supports features that have a positive score impact.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[float]
==== Sigmoid

This function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
and scores are in +(0, 1)+.

`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
be computed via training. If you don't have the opportunity to do so, we recommend
that you stick to the `saturation` function instead.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Expose Lucene's FeatureField. (#30618) Lucene has a new `FeatureField` which
gives the ability to record numeric features as term frequencies. Its main
benefit is that it allows to boost queries with the values of these features
and efficiently skip non-competitive documents at the same time using block-max
WAND and indexed impacts. 2018-05-23 08:55:21 +02:00