[[modules-scripting-fields]]
== Accessing document fields and special variables

Depending on where a script is used, it will have access to certain special
variables and document fields.

[float]
== Update scripts

A script used in the <<docs-update,update>>,
<<docs-update-by-query,update-by-query>>, or <<docs-reindex,reindex>>
API has access to the `ctx` variable, which exposes:

[horizontal]
`ctx._source`:: Access to the document <<mapping-source-field,`_source` field>>.
`ctx.op`:: The operation that should be applied to the document: `index` or `delete`.
`ctx._index` etc.:: Access to <<mapping-fields,document meta-fields>>, some of which may be read-only.

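
As a minimal sketch of how the `ctx` variables fit together, the following
request updates a document with a Painless script that either modifies
`ctx._source` or sets `ctx.op`. The index `my_index`, the `popularity` field,
and the `threshold` and `count` parameters are assumptions chosen for
illustration only.

[source,console]
-------------------------------
PUT my_index/_doc/1?refresh
{
  "text": "quick brown fox",
  "popularity": 1
}

POST my_index/_update/1
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.popularity < params.threshold) { ctx.op = 'delete' } else { ctx._source.popularity += params.count }",
    "params": {
      "threshold": 0,
      "count": 4
    }
  }
}
-------------------------------

With these illustrative values the condition is false, so the script takes the
`else` branch and changes `ctx._source` rather than deleting the document.
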
[float]
== Search and aggregation scripts

With the exception of <<request-body-search-script-fields,script fields>>, which are
executed once per search hit, scripts used in search and aggregations are
executed once for every document that might match a query or an aggregation.
Depending on how many documents you have, this could mean millions or billions
of executions: these scripts need to be fast!

Field values can be accessed from a script using
<<modules-scripting-doc-vals,doc values>>,
<<modules-scripting-source,the `_source` field>>, or
<<modules-scripting-stored,stored fields>>,
each of which is explained below.

[[scripting-score]]
[float]
=== Accessing the score of a document within a script

Scripts used in the <<query-dsl-function-score-query,`function_score` query>>,
in <<request-body-search-sort,script-based sorting>>, or in
<<search-aggregations,aggregations>> have access to the `_score` variable which
represents the current relevance score of a document.

Here's an example of using a script in a
<<query-dsl-function-score-query,`function_score` query>> to alter the
relevance `_score` of each document:

[source,console]
-------------------------------------
PUT my_index/_doc/1?refresh
{
  "text": "quick brown fox",
  "popularity": 1
}

PUT my_index/_doc/2?refresh
{
  "text": "quick fox",
  "popularity": 5
}

GET my_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "script_score": {
        "script": {
          "lang": "expression",
          "source": "_score * doc['popularity']"
        }
      }
    }
  }
}
-------------------------------------

[float]
[[modules-scripting-doc-vals]]
=== Doc values

By far the fastest, most efficient way to access a field value from a
script is to use the `doc['field_name']` syntax, which retrieves the field
value from <<doc-values,doc values>>. Doc values are a columnar field value
store, enabled by default on all fields except for <<text,analyzed `text` fields>>.

[source,console]
-------------------------------
PUT my_index/_doc/1?refresh
{
  "cost_price": 100
}

GET my_index/_search
{
  "script_fields": {
    "sales_price": {
      "script": {
        "lang": "expression",
        "source": "doc['cost_price'] * markup",
        "params": {
          "markup": 0.2
        }
      }
    }
  }
}
-------------------------------

Doc values can only return "simple" field values like numbers, dates,
geo-points, terms, etc., or arrays of these values if the field is multi-valued.
They cannot return JSON objects.

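
To illustrate the multi-valued case, the sketch below sums every value of a
numeric field through doc values in a Painless script field. The `ratings`
field, the document ID, and the `ratings_total` script field name are
hypothetical and serve only as an example.

[source,console]
-------------------------------
PUT my_index/_doc/3?refresh
{
  "ratings": [3, 4, 5]
}

GET my_index/_search
{
  "script_fields": {
    "ratings_total": {
      "script": {
        "lang": "painless",
        "source": "long total = 0; for (def rating : doc['ratings']) { total += rating; } return total;"
      }
    }
  }
}
-------------------------------

The loop treats `doc['ratings']` as a list of simple values, so a document that
has no value for the field contributes an empty list rather than a JSON object.
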
[NOTE]
.Missing fields
===================================================

The `doc['field']` syntax throws an error if `field` is missing from the mappings.
In `painless`, you can first check with `doc.containsKey('field')` to guard
access to the `doc` map. Unfortunately, there is no way to check for the
existence of the field in mappings in an `expression` script.

===================================================

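
The guard described in the note above could look like the following sketch.
The `cost_or_default` script field name and the `fallback` parameter are
assumptions made for illustration. The additional `size()` check is not part
of the note: it is included on the assumption that a mapped field may still
have no value in a given document.

[source,console]
-------------------------------
GET my_index/_search
{
  "script_fields": {
    "cost_or_default": {
      "script": {
        "lang": "painless",
        "source": "if (doc.containsKey('cost_price') && doc['cost_price'].size() > 0) { return doc['cost_price'].value; } return params.fallback;",
        "params": {
          "fallback": 0
        }
      }
    }
  }
}
-------------------------------
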
[NOTE]
.Doc values and `text` fields
===================================================

The `doc['field']` syntax can also be used for <<text,analyzed `text` fields>>
if <<fielddata,`fielddata`>> is enabled, but *BEWARE*: enabling fielddata on a
`text` field requires loading all of the terms into the JVM heap, which can be
very expensive both in terms of memory and CPU. It seldom makes sense to
access `text` fields from scripts.

===================================================

[float]
[[modules-scripting-source]]
=== The document `_source`

The document <<mapping-source-field,`_source`>> can be accessed using the
`_source.field_name` syntax. In Painless, `_source` is reached through
`params`, as in `params._source.field_name`. The `_source` is loaded as a
map-of-maps, so properties within object fields can be accessed as, for example,
`_source.name.first`.

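
As a rough sketch of the map-of-maps access, the request below reads two
properties of an object field from `_source` in a Painless script field. The
index name `my_people_index` and the `name.first` and `name.last` properties
are hypothetical, and the script assumes that every hit contains a `name`
object.

[source,console]
-------------------------------
PUT my_people_index/_doc/1?refresh
{
  "name": {
    "first": "Barry",
    "last": "White"
  }
}

GET my_people_index/_search
{
  "script_fields": {
    "full_name": {
      "script": {
        "lang": "painless",
        "source": "params._source.name.first + ' ' + params._source.name.last"
      }
    }
  }
}
-------------------------------
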
[IMPORTANT]
.Prefer doc values to `_source`
=========================================================

Accessing the `_source` field is much slower than using doc values. The
`_source` field is optimised for returning several fields per result, while doc
values are optimised for accessing the value of a specific field in many
documents.

It makes sense to use `_source` when generating a
<<request-body-search-script-fields,script field>> for the top ten hits from a
search result but, for other search and aggregation use cases, always prefer
using doc values.
=========================================================

For instance:

[source,console]
-------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text"
      },
      "last_name": {
        "type": "text"
      }
    }
  }
}

PUT my_index/_doc/1?refresh
{
  "first_name": "Barry",
  "last_name": "White"
}

GET my_index/_search
{
  "script_fields": {
    "full_name": {
      "script": {
        "lang": "painless",
        "source": "params._source.first_name + ' ' + params._source.last_name"
      }
    }
  }
}
-------------------------------

[float]
[[modules-scripting-stored]]
=== Stored fields

_Stored fields_ -- fields explicitly marked as
<<mapping-store,`"store": true`>> in the mapping -- can be accessed using the
`_fields['field_name'].value` or `_fields['field_name']` syntax:

[source,console]
-------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "store": true
      },
      "title": {
        "type": "text",
        "store": true
      }
    }
  }
}

PUT my_index/_doc/1?refresh
{
  "full_name": "Alice Ball",
  "title": "Professor"
}

GET my_index/_search
{
  "script_fields": {
    "name_with_title": {
      "script": {
        "lang": "painless",
        "source": "params._fields['title'].value + ' ' + params._fields['full_name'].value"
      }
    }
  }
}
-------------------------------

[TIP]
.Stored vs `_source`
=======================================================

The `_source` field is just a special stored field, so the performance is
similar to that of other stored fields. The `_source` provides access to the
original document body that was indexed (including the ability to distinguish
`null` values from empty fields, single-value arrays from plain scalars, etc.).

The only time it really makes sense to use stored fields instead of the
`_source` field is when the `_source` is very large and it is less costly to
access a few small stored fields than the entire `_source`.

=======================================================