OpenSearch/docs/reference/modules/scripting.asciidoc

405 lines
15 KiB
Plaintext
Raw Normal View History

[[modules-scripting]]
== Scripting
The scripting module allows to use scripts in order to evaluate custom
expressions. For example, scripts can be used to return "script fields"
as part of a search request, or can be used to evaluate a custom score
for a query and so on.
deprecated[1.3.0,Groovy has replaced Mvel as the default scripting language]
The scripting module uses by default http://groovy.codehaus.org/[groovy]
(previously http://mvel.codehaus.org/[mvel]) as the scripting language with some
extensions. Groovy is used since it is extremely fast and very simple to use.
Additional `lang` plugins are provided to allow to execute scripts in
different languages. Currently supported plugins are `lang-javascript`
for JavaScript, `lang-mvel` for Mvel, and `lang-python` for Python.
All places where a `script` parameter can be used, a `lang` parameter
(on the same level) can be provided to define the language of the
script. The `lang` options are `groovy`, `js`, `mvel`, `python`, and
`native`.
added[1.2.0, Dynamic scripting is disabled for non-sandboxed languages by default since version 1.2.0]
To increase security, Elasticsearch does not allow you to specify scripts for
non-sandboxed languages with a request. Instead, scripts must be placed in the
`scripts` directory inside the configuration directory (the directory where
elasticsearch.yml is). Scripts placed into this directory will automatically be
picked up and be available to be used. Once a script has been placed in this
directory, it can be referenced by name. For example, a script called
`calculate-score.groovy` can be referenced in a request like this:
2014-04-28 05:34:39 -04:00
[source,sh]
--------------------------------------------------
$ tree config
config
├── elasticsearch.yml
├── logging.yml
└── scripts
└── calculate-score.groovy
2014-04-28 05:34:39 -04:00
--------------------------------------------------
[source,sh]
--------------------------------------------------
$ cat config/scripts/calculate-score.groovy
log(_score * 2) + my_modifier
--------------------------------------------------
[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
"query": {
"function_score": {
"query": {
"match": {
"body": "foo"
}
},
"functions": [
{
"script_score": {
"script": "calculate-score",
"params": {
"my_modifier": 8
}
}
}
]
}
}
}'
--------------------------------------------------
The name of the script is derived from the hierarchy of directories it
exists under, and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.
[float]
=== Enabling dynamic scripting
We recommend running Elasticsearch behind an application or proxy, which
protects Elasticsearch from the outside world. If users are allowed to run
dynamic scripts (even in a search request), then they have the same access to
your box as the user that Elasticsearch is running as. For this reason dynamic
scripting is allowed only for sandboxed languages by default.
First, you should not run Elasticsearch as the `root` user, as this would allow
a script to access or do *anything* on your server, without limitations. Second,
you should not expose Elasticsearch directly to users, but instead have a proxy
application inbetween. If you *do* intend to expose Elasticsearch directly to
your users, then you have to decide whether you trust them enough to run scripts
on your box or not. If you do, you can enable dynamic scripting by adding the
following setting to the `config/elasticsearch.yml` file on every node:
[source,yaml]
-----------------------------------
script.disable_dynamic: false
-----------------------------------
While this still allows execution of named scripts provided in the config, or
_native_ Java scripts registered through plugins, it also allows users to run
arbitrary scripts via the API. Instead of sending the name of the file as the
script, the body of the script can be sent instead.
There are three possible configuration values for the `script.disable_dynamic`
setting, the default value is `sandbox`:
[cols="<,<",options="header",]
|=======================================================================
|Value |Description
| `true` |all dynamic scripting is disabled, scripts must be placed in the `config/scripts` directory.
| `false` |all dynamic scripting is enabled, scripts may be sent as strings in requests.
| `sandbox` |scripts may be sent as strings for languages that are sandboxed.
|=======================================================================
[float]
=== Default Scripting Language
The default scripting language (assuming no `lang` parameter is provided) is
`groovy`. In order to change it, set the `script.default_lang` to the
appropriate language.
[float]
=== Groovy Sandboxing
Elasticsearch sandboxes Groovy scripts that are compiled and executed in order
to ensure they don't perform unwanted actions. There are a number of options
that can be used for configuring this sandbox:
`script.groovy.sandbox.receiver_whitelist`::
Comma-separated list of string classes for objects that may have methods
invoked.
`script.groovy.sandbox.package_whitelist`::
Comma-separated list of packages under which new objects may be constructed.
`script.groovy.sandbox.class_whitelist`::
Comma-separated list of classes that are allowed to be constructed.
`script.groovy.sandbox.method_blacklist`::
Comma-separated list of methods that are never allowed to be invoked,
regardless of target object.
`script.groovy.sandbox.enabled`::
Flag to disable the sandbox (defaults to `true` meaning the sandbox is
enabled).
[float]
=== Automatic Script Reloading
The `config/scripts` directory is scanned periodically for changes.
New and changed scripts are reloaded and deleted script are removed
from preloaded scripts cache. The reload frequency can be specified
using `watcher.interval` setting, which defaults to `60s`.
To disable script reloading completely set `script.auto_reload_enabled`
to `false`.
[float]
=== Native (Java) Scripts
Even though `groovy` is pretty fast, this allows to register native Java based
scripts for faster execution.
In order to allow for scripts, the `NativeScriptFactory` needs to be
implemented that constructs the script that will be executed. There are
two main types, one that extends `AbstractExecutableScript` and one that
extends `AbstractSearchScript` (probably the one most users will extend,
with additional helper classes in `AbstractLongSearchScript`,
`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`).
Registering them can either be done by settings, for example:
`script.native.my.type` set to `sample.MyNativeScriptFactory` will
register a script named `my`. Another option is in a plugin, access
`ScriptModule` and call `registerScript` on it.
Executing the script is done by specifying the `lang` as `native`, and
the name of the script as the `script`.
Note, the scripts need to be in the classpath of elasticsearch. One
simple way to do it is to create a directory under plugins (choose a
descriptive name), and place the jar / classes files there, they will be
automatically loaded.
[float]
=== Score
In all scripts that can be used in facets, allow to access the current
doc score using `doc.score`.
make term statistics accessible in scripts term statistics can be accessed via the _shard variable. Below is a minimal example. See documentation on details. ``` DELETE paytest PUT paytest { "mappings": { "test": { "_all": { "auto_boost": true, "enabled": true }, "properties": { "text": { "index_analyzer": "fulltext_analyzer", "store": "yes", "type": "string" } } } }, "settings": { "analysis": { "analyzer": { "fulltext_analyzer": { "filter": [ "my_delimited_payload_filter" ], "tokenizer": "whitespace", "type": "custom" } }, "filter": { "my_delimited_payload_filter": { "delimiter": "+", "encoding": "float", "type": "delimited_payload_filter" } } }, "index": { "number_of_replicas": 0, "number_of_shards": 1 } } } POST paytest/test/1 { "text": "the+1 quick+2 brown+3 fox+4 is quick+10" } POST paytest/test/2 { "text": "the+1 quick+2 red+3 fox+4" } POST paytest/_refresh POST paytest/_search { "script_fields": { "ttf": { "script": "_shard[\"text\"][\"quick\"].ttf()" } } } POST paytest/_search { "script_fields": { "freq": { "script": "_shard[\"text\"][\"quick\"].freq()" } } } POST paytest/test/2/_termvector POST paytest/_search { "script_fields": { "payloads": { "script": "term = _shard[\"text\"].get(\"red\",_PAYLOADS);payloads = []; for(pos : term){payloads.add(pos.payloadAsFloat(-1));} return payloads;" } } } POST paytest/_search { "script_fields": { "tv": { "script": "_shard[\"text\"][\"quick\"].freq()" } }, "query": { "function_score": { "functions": [ { "script_score": { "script": "_shard[\"text\"][\"quick\"].freq()" } } ] } } } ``` closes #3772
2014-01-02 05:17:33 -05:00
[float]
=== Computing scores based on terms in scripts
see <<modules-advanced-scripting, advanced scripting documentation>>
make term statistics accessible in scripts term statistics can be accessed via the _shard variable. Below is a minimal example. See documentation on details. ``` DELETE paytest PUT paytest { "mappings": { "test": { "_all": { "auto_boost": true, "enabled": true }, "properties": { "text": { "index_analyzer": "fulltext_analyzer", "store": "yes", "type": "string" } } } }, "settings": { "analysis": { "analyzer": { "fulltext_analyzer": { "filter": [ "my_delimited_payload_filter" ], "tokenizer": "whitespace", "type": "custom" } }, "filter": { "my_delimited_payload_filter": { "delimiter": "+", "encoding": "float", "type": "delimited_payload_filter" } } }, "index": { "number_of_replicas": 0, "number_of_shards": 1 } } } POST paytest/test/1 { "text": "the+1 quick+2 brown+3 fox+4 is quick+10" } POST paytest/test/2 { "text": "the+1 quick+2 red+3 fox+4" } POST paytest/_refresh POST paytest/_search { "script_fields": { "ttf": { "script": "_shard[\"text\"][\"quick\"].ttf()" } } } POST paytest/_search { "script_fields": { "freq": { "script": "_shard[\"text\"][\"quick\"].freq()" } } } POST paytest/test/2/_termvector POST paytest/_search { "script_fields": { "payloads": { "script": "term = _shard[\"text\"].get(\"red\",_PAYLOADS);payloads = []; for(pos : term){payloads.add(pos.payloadAsFloat(-1));} return payloads;" } } } POST paytest/_search { "script_fields": { "tv": { "script": "_shard[\"text\"][\"quick\"].freq()" } }, "query": { "function_score": { "functions": [ { "script_score": { "script": "_shard[\"text\"][\"quick\"].freq()" } } ] } } } ``` closes #3772
2014-01-02 05:17:33 -05:00
[float]
=== Document Fields
Most scripting revolve around the use of specific document fields data.
The `doc['field_name']` can be used to access specific field data within
a document (the document in question is usually derived by the context
the script is used). Document fields are very fast to access since they
end up being loaded into memory (all the relevant field values/tokens
are loaded to memory).
The following data can be extracted from a field:
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if its a short type, it will be short.
|`doc['field_name'].values` |The native array values of the field. For
example, if its a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.
|`doc['field_name'].lat` |The latitude of a geo point type.
|`doc['field_name'].lon` |The longitude of a geo point type.
|`doc['field_name'].lats` |The latitudes of a geo point type.
|`doc['field_name'].lons` |The longitudes of a geo point type.
|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceWithDefault(lat, lon, default)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].distanceInMiles(lat, lon)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceInMilesWithDefault(lat, lon, default)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceInKmWithDefault(lat, lon, default)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceWithDefault(lat, lon, default)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].arcDistanceInMiles(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].factorDistance(lat, lon)` |The distance factor of this geo point field from the provided lat/lon.
|`doc['field_name'].factorDistance(lat, lon, default)` |The distance factor of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].geohashDistance(geohash)` |The `arc` distance (in meters)
of this geo point field from the provided geohash.
|`doc['field_name'].geohashDistanceInKm(geohash)` |The `arc` distance (in km)
of this geo point field from the provided geohash.
|`doc['field_name'].geohashDistanceInMiles(geohash)` |The `arc` distance (in
miles) of this geo point field from the provided geohash.
|=======================================================================
[float]
=== Stored Fields
2014-03-07 08:21:45 -05:00
Stored fields can also be accessed when executing a script. Note, they
2014-05-03 17:21:52 -04:00
are much slower to access compared with document fields, as they are not
loaded into memory. They can be simply accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.
[float]
=== Source Field
The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.
Accessing `_source` is much slower compared to using `_doc`
but the data is not loaded into memory. For a single field access `_fields` may be
faster than using `_source` due to the extra overhead of potentially parsing large documents.
However, `_source` may be faster if you access multiple fields or if the source has already been
loaded for other purposes.
make term statistics accessible in scripts term statistics can be accessed via the _shard variable. Below is a minimal example. See documentation on details. ``` DELETE paytest PUT paytest { "mappings": { "test": { "_all": { "auto_boost": true, "enabled": true }, "properties": { "text": { "index_analyzer": "fulltext_analyzer", "store": "yes", "type": "string" } } } }, "settings": { "analysis": { "analyzer": { "fulltext_analyzer": { "filter": [ "my_delimited_payload_filter" ], "tokenizer": "whitespace", "type": "custom" } }, "filter": { "my_delimited_payload_filter": { "delimiter": "+", "encoding": "float", "type": "delimited_payload_filter" } } }, "index": { "number_of_replicas": 0, "number_of_shards": 1 } } } POST paytest/test/1 { "text": "the+1 quick+2 brown+3 fox+4 is quick+10" } POST paytest/test/2 { "text": "the+1 quick+2 red+3 fox+4" } POST paytest/_refresh POST paytest/_search { "script_fields": { "ttf": { "script": "_shard[\"text\"][\"quick\"].ttf()" } } } POST paytest/_search { "script_fields": { "freq": { "script": "_shard[\"text\"][\"quick\"].freq()" } } } POST paytest/test/2/_termvector POST paytest/_search { "script_fields": { "payloads": { "script": "term = _shard[\"text\"].get(\"red\",_PAYLOADS);payloads = []; for(pos : term){payloads.add(pos.payloadAsFloat(-1));} return payloads;" } } } POST paytest/_search { "script_fields": { "tv": { "script": "_shard[\"text\"][\"quick\"].freq()" } }, "query": { "function_score": { "functions": [ { "script_score": { "script": "_shard[\"text\"][\"quick\"].freq()" } } ] } } } ``` closes #3772
2014-01-02 05:17:33 -05:00
[float]
=== Groovy Built In Functions
There are several built in functions that can be used within scripts.
They include:
[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`sin(a)` |Returns the trigonometric sine of an angle.
|`cos(a)` |Returns the trigonometric cosine of an angle.
|`tan(a)` |Returns the trigonometric tangent of an angle.
|`asin(a)` |Returns the arc sine of a value.
|`acos(a)` |Returns the arc cosine of a value.
|`atan(a)` |Returns the arc tangent of a value.
|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians
|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.
|`exp(a)` |Returns Euler's number _e_ raised to the power of value.
|`log(a)` |Returns the natural logarithm (base _e_) of a value.
|`log10(a)` |Returns the base 10 logarithm of a value.
|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.
|`cbrt(a)` |Returns the cube root of a double value.
|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.
|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.
|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.
|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.
|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (r,_theta_).
|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.
|`round(a)` |Returns the closest _int_ to the argument.
|`random()` |Returns a random _double_ value.
|`abs(a)` |Returns the absolute value of a value.
|`max(a, b)` |Returns the greater of two values.
|`min(a, b)` |Returns the smaller of two values.
|`ulp(d)` |Returns the size of an ulp of the argument.
|`signum(d)` |Returns the signum function of the argument.
|`sinh(x)` |Returns the hyperbolic sine of a value.
|`cosh(x)` |Returns the hyperbolic cosine of a value.
|`tanh(x)` |Returns the hyperbolic tangent of a value.
|`hypot(x, y)` |Returns sqrt(_x2_ + _y2_) without intermediate overflow
or underflow.
|=======================================================================
[float]
=== Arithmetic precision in MVEL
When dividing two numbers using MVEL based scripts, the engine tries to
be smart and adheres to the default behaviour of java. This means if you
divide two integers (you might have configured the fields as integer in
the mapping), the result will also be an integer. This means, if a
calculation like `1/num` is happening in your scripts and `num` is an
integer with the value of `8`, the result is `0` even though you were
expecting it to be `0.125`. You may need to enforce precision by
explicitly using a double like `1.0/num` in order to get the expected
result.