OpenSearch/docs/reference/query-dsl/script-score-query.asciidoc

414 lines
12 KiB
Plaintext

[[query-dsl-script-score-query]]
=== Script Score Query
The `script_score` allows you to modify the score of documents that are
retrieved by a query. This can be useful if, for example, a score
function is computationally expensive and it is sufficient to compute
the score on a filtered set of documents.
To use `script_score`, you have to define a query and a script -
a function to be used to compute a new score for each document returned
by the query. For more information on scripting see
<<modules-scripting, scripting documentation>>.
Here is an example of using `script_score` to assign each matched document
a score equal to the number of likes divided by 10:
[source,js]
--------------------------------------------------
GET /_search
{
"query" : {
"script_score" : {
"query" : {
"match": { "message": "elasticsearch" }
},
"script" : {
"source" : "doc['likes'].value / 10 "
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]
==== Accessing the score of a document within a script
Within a script, you can
{ref}/modules-scripting-fields.html#scripting-score[access]
the `_score` variable which represents the current relevance score of a
document.
==== Predefined functions within a Painless script
You can use any of the available
<<painless-api-reference, painless functions>> in the painless script.
Besides these functions, there are a number of predefined functions
that can help you with scoring. We suggest you to use them instead of
rewriting equivalent functions of your own, as these functions try
to be the most efficient by using the internal mechanisms.
===== saturation
`saturation(value,k) = value/(k + value)`
[source,js]
--------------------------------------------------
"script" : {
"source" : "saturation(doc['likes'].value, 1)"
}
--------------------------------------------------
// NOTCONSOLE
===== sigmoid
`sigmoid(value, k, a) = value^a/ (k^a + value^a)`
[source,js]
--------------------------------------------------
"script" : {
"source" : "sigmoid(doc['likes'].value, 2, 1)"
}
--------------------------------------------------
// NOTCONSOLE
[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields
experimental[]
These functions are used for
for <<dense-vector,`dense_vector`>> and
<<sparse-vector,`sparse_vector`>> fields.
NOTE: During vector functions' calculation, all matched documents are
linearly scanned. Thus, expect the query time grow linearly
with the number of matched documents. For this reason, we recommend
to limit the number of matched documents with a `query` parameter.
For dense_vector fields, `cosineSimilarity` calculates the measure of
cosine similarity between a given query vector and document vectors.
[source,js]
--------------------------------------------------
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, doc['my_dense_vector'])",
"params": {
"queryVector": [4, 3.4, -0.2] <1>
}
}
}
}
}
--------------------------------------------------
// NOTCONSOLE
<1> To take advantage of the script optimizations, provide a query vector as a script parameter.
Similarly, for sparse_vector fields, `cosineSimilaritySparse` calculates cosine similarity
between a given query vector and document vectors.
[source,js]
--------------------------------------------------
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilaritySparse(params.queryVector, doc['my_sparse_vector'])",
"params": {
"queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
}
}
}
}
}
--------------------------------------------------
// NOTCONSOLE
For dense_vector fields, `dotProduct` calculates the measure of
dot product between a given query vector and document vectors.
[source,js]
--------------------------------------------------
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "dotProduct(params.queryVector, doc['my_dense_vector'])",
"params": {
"queryVector": [4, 3.4, -0.2]
}
}
}
}
}
--------------------------------------------------
// NOTCONSOLE
Similarly, for sparse_vector fields, `dotProductSparse` calculates dot product
between a given query vector and document vectors.
[source,js]
--------------------------------------------------
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "dotProductSparse(params.queryVector, doc['my_sparse_vector'])",
"params": {
"queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
}
}
}
}
}
--------------------------------------------------
// NOTCONSOLE
NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error will be thrown.
You can check if a document has a value for the field `my_vector` by
`doc['my_vector'].size() == 0`. Your overall script can look like this:
[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
--------------------------------------------------
// NOTCONSOLE
NOTE: If a document's dense vector field has a number of dimensions
different from the query's vector, an error will be thrown.
[[random-score-function]]
===== Random score function
`random_score` function generates scores that are uniformly distributed
from 0 up to but not including 1.
`randomScore` function has the following syntax:
`randomScore(<seed>, <fieldName>)`.
It has a required parameter - `seed` as an integer value,
and an optional parameter - `fieldName` as a string value.
[source,js]
--------------------------------------------------
"script" : {
"source" : "randomScore(100, '_seq_no')"
}
--------------------------------------------------
// NOTCONSOLE
If the `fieldName` parameter is omitted, the internal Lucene
document ids will be used as a source of randomness. This is very efficient,
but unfortunately not reproducible since documents might be renumbered
by merges.
[source,js]
--------------------------------------------------
"script" : {
"source" : "randomScore(100)"
}
--------------------------------------------------
// NOTCONSOLE
Note that documents that are within the same shard and have the
same value for field will get the same score, so it is usually desirable
to use a field that has unique values for all documents across a shard.
A good default choice might be to use the `_seq_no`
field, whose only drawback is that scores will change if the document is
updated since update operations also update the value of the `_seq_no` field.
[[decay-functions-numeric-fields]]
===== Decay functions for numeric fields
You can read more about decay functions
{ref}/query-dsl-function-score-query.html#function-decay[here].
* `double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)`
* `double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)`
* `double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)`
[source,js]
--------------------------------------------------
"script" : {
"source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['dval'].value)",
"params": { <1>
"origin": 20,
"scale": 10,
"decay" : 0.5,
"offset" : 0
}
}
--------------------------------------------------
// NOTCONSOLE
<1> Using `params` allows to compile the script only once, even if params change.
===== Decay functions for geo fields
* `double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)`
* `double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)`
* `double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)`
[source,js]
--------------------------------------------------
"script" : {
"source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)",
"params": {
"origin": "40, -70.12",
"scale": "200km",
"offset": "0km",
"decay" : 0.2
}
}
--------------------------------------------------
// NOTCONSOLE
===== Decay functions for date fields
* `double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)`
* `double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)`
* `double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)`
[source,js]
--------------------------------------------------
"script" : {
"source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)",
"params": {
"origin": "2008-01-01T01:00:00Z",
"scale": "1h",
"offset" : "0",
"decay" : 0.5
}
}
--------------------------------------------------
// NOTCONSOLE
NOTE: Decay functions on dates are limited to dates in the default format
and default time zone. Also calculations with `now` are not supported.
==== Faster alternatives
Script Score Query calculates the score for every hit (matching document).
There are faster alternative query types that can efficiently skip
non-competitive hits:
* If you want to boost documents on some static fields, use
<<query-dsl-rank-feature-query, Rank Feature Query>>.
==== Transition from Function Score Query
We are deprecating <<query-dsl-function-score-query, Function Score>>, and
Script Score Query will be a substitute for it.
Here we describe how Function Score Query's functions can be
equivalently implemented in Script Score Query:
[[script-score]]
===== `script_score`
What you used in `script_score` of the Function Score query, you
can copy into the Script Score query. No changes here.
[[weight]]
===== `weight`
`weight` function can be implemented in the Script Score query through
the following script:
[source,js]
--------------------------------------------------
"script" : {
"source" : "params.weight * _score",
"params": {
"weight": 2
}
}
--------------------------------------------------
// NOTCONSOLE
[[random-score]]
===== `random_score`
Use `randomScore` function
as described in <<random-score-function, random score function>>.
[[field-value-factor]]
===== `field_value_factor`
`field_value_factor` function can be easily implemented through script:
[source,js]
--------------------------------------------------
"script" : {
"source" : "Math.log10(doc['field'].value * params.factor)",
params" : {
"factor" : 5
}
}
--------------------------------------------------
// NOTCONSOLE
For checking if a document has a missing value, you can use
`doc['field'].size() == 0`. For example, this script will use
a value `1` if a document doesn't have a field `field`:
[source,js]
--------------------------------------------------
"script" : {
"source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value()) * params.factor)",
params" : {
"factor" : 5
}
}
--------------------------------------------------
// NOTCONSOLE
This table lists how `field_value_factor` modifiers can be implemented
through a script:
[cols="<,<",options="header",]
|=======================================================================
| Modifier | Implementation in Script Score
| `none` | -
| `log` | `Math.log10(doc['f'].value)`
| `log1p` | `Math.log10(doc['f'].value + 1)`
| `log2p` | `Math.log10(doc['f'].value + 2)`
| `ln` | `Math.log(doc['f'].value)`
| `ln1p` | `Math.log(doc['f'].value + 1)`
| `ln2p` | `Math.log(doc['f'].value + 2)`
| `square` | `Math.pow(doc['f'].value, 2)`
| `sqrt` | `Math.sqrt(doc['f'].value)`
| `reciprocal` | `1.0 / doc['f'].value`
|=======================================================================
[[decay-functions]]
===== `decay functions`
Script Score query has equivalent <<decay-functions, decay functions>>
that can be used in script.