[[query-dsl-feature-query]]
=== Feature Query

The `feature` query is a specialized query that only works on
<<feature,`feature`>> fields. It boosts the score of documents based on the
values of numeric features. It is typically placed in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the rest of the query.

Compared to <<query-dsl-function-score-query,`function_score`>> or other ways
of modifying the score, this query can efficiently skip non-competitive hits
when <<search-uri-request,`track_total_hits`>> is set to `false`, which can
speed up searches considerably.

The following example creates an index with two `feature` fields, indexes two
documents, and then runs a `feature` query against each field:

[source,js]
--------------------------------------------------
PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "pagerank": {
          "type": "feature"
        },
        "url_length": {
          "type": "feature",
          "positive_score_impact": false
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "pagerank": 10,
  "url_length": 50
}

PUT test/_doc/2
{
  "pagerank": 100,
  "url_length": 20
}

POST test/_refresh

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank"
    }
  }
}

GET test/_search
{
  "query": {
    "feature": {
      "field": "url_length"
    }
  }
}
--------------------------------------------------
// CONSOLE

[float]
=== Supported functions

The `feature` query supports three functions for boosting scores using the
values of features. If you do not know where to start, we recommend the
`saturation` function, which is the default when no function is provided.

[float]
==== Saturation

This function gives a score equal to `S / (S + pivot)`, where `S` is the value
of the feature and `pivot` is a configurable pivot value. The result is less
than +0.5+ if `S` is less than `pivot`, equal to +0.5+ if `S` equals `pivot`,
and greater than +0.5+ otherwise. Scores are always in +(0, 1)+.

If the feature has a negative score impact, the function is computed as
`pivot / (S + pivot)`, which decreases as `S` increases.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
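
As a rough illustration of the formula, and not a reproduction of
Elasticsearch's implementation, the following Python sketch evaluates the
saturation function for the two documents in the `test` index. The `saturation`
helper and its parameter names are hypothetical, and the check for strictly
positive inputs is an assumption made to keep the result in +(0, 1)+. Scores
returned by Elasticsearch are not guaranteed to match these values exactly.

```python
def saturation(value, pivot, positive_score_impact=True):
    """Illustrative saturation score; not Elasticsearch's actual code."""
    # Assumption: both inputs are strictly positive, which keeps the
    # score inside (0, 1).
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be strictly positive")
    if positive_score_impact:
        return value / (value + pivot)  # grows towards 1 as value increases
    return pivot / (value + pivot)      # shrinks towards 0 as value increases


# pagerank values of documents 1 and 2 in the example index, with pivot = 8
for doc_id, pagerank in [(1, 10), (2, 100)]:
    print(doc_id, round(saturation(pagerank, pivot=8), 4))
# prints:
# 1 0.5556
# 2 0.9259
```

Passing `positive_score_impact=False` switches the sketch to the
`pivot / (S + pivot)` form, so larger values give lower scores.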

If +pivot+ is not supplied, Elasticsearch computes a default value that is
approximately equal to the geometric mean of all feature values that exist in
the index. We recommend this if you haven't had the opportunity to train a good
pivot value.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
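
To give a feel for what a geometric-mean pivot looks like, the sketch below
computes the exact geometric mean of the two `pagerank` values in the `test`
index. This is only an illustration: Elasticsearch computes an approximation,
how it does so is not covered here, and the `geometric_mean` helper is
hypothetical.

```python
import math


def geometric_mean(values):
    """Exact geometric mean of strictly positive values (illustration only)."""
    # Averaging logarithms avoids overflow when multiplying many values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


pageranks = [10, 100]              # pagerank values in the example index
pivot = geometric_mean(pageranks)  # sqrt(10 * 100), about 31.62
print(round(pivot, 2))
```

With a pivot of this size, document 1 (`pagerank` of 10) would score below
+0.5+ and document 2 (`pagerank` of 100) above it.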

[float]
==== Logarithm

This function gives a score equal to `log(scaling_factor + S)`, where `S` is
the value of the feature and `scaling_factor` is a configurable scaling factor.
Scores are unbounded.

This function only supports features that have a positive score impact.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
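
The following Python sketch evaluates the formula for the two documents in the
`test` index. It assumes that `log` denotes the natural logarithm, which this
page does not state explicitly; the `log_score` helper is hypothetical.

```python
import math


def log_score(value, scaling_factor):
    """Illustrative log score; assumes the natural logarithm."""
    return math.log(scaling_factor + value)


# pagerank values of documents 1 and 2, with scaling_factor = 4
for doc_id, pagerank in [(1, 10), (2, 100)]:
    print(doc_id, round(log_score(pagerank, scaling_factor=4), 4))
# prints:
# 1 2.6391
# 2 4.6444
```

Unlike `saturation`, the result keeps growing with the feature value, which is
why the scores are unbounded.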

[float]
==== Sigmoid

This function is an extension of `saturation` that adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+,
and scores are in +(0, 1)+.

`exponent` must be positive, and is typically in +[0.5, 1]+. A good value should
be computed via training. If you don't have the opportunity to do so, we
recommend that you stick to the `saturation` function instead.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
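
As a minimal sketch of the formula, and not Elasticsearch's implementation, the
Python below evaluates the sigmoid function for the two documents in the `test`
index. The `sigmoid` helper is hypothetical, and the check for strictly
positive inputs is an assumption.

```python
def sigmoid(value, pivot, exponent):
    """Illustrative sigmoid score: S^exp / (S^exp + pivot^exp)."""
    # Assumption: value, pivot and exponent are all strictly positive.
    if value <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("value, pivot and exponent must be strictly positive")
    powered = value ** exponent
    return powered / (powered + pivot ** exponent)


# pagerank values of documents 1 and 2, with pivot = 7 and exponent = 0.6
for doc_id, pagerank in [(1, 10), (2, 100)]:
    print(doc_id, round(sigmoid(pagerank, pivot=7, exponent=0.6), 4))
```

With `exponent` set to 1, the sketch reduces to the `S / (S + pivot)` formula
of the `saturation` function.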