[[modules-scripting]]
== Scripting

The scripting module lets you use scripts to evaluate custom
expressions. For example, scripts can be used to return "script fields"
as part of a search request, or to evaluate a custom score
for a query.

By default, the scripting module uses http://groovy.codehaus.org/[groovy]
(previously http://mvel.codehaus.org/[mvel] in 1.3.x and earlier) as the
scripting language, with some extensions. Groovy is used because it is
fast and simple to use.

.Groovy dynamic scripting disabled by default from v1.4.3
[IMPORTANT]
===================================================

Elasticsearch versions 1.3.0-1.3.7 and 1.4.0-1.4.2 have a vulnerability in the
Groovy scripting engine. The vulnerability allows an attacker to construct
Groovy scripts that escape the sandbox and execute shell commands as the user
running the Elasticsearch Java VM.

If you are running a vulnerable version of Elasticsearch, you should either
upgrade to at least v1.3.8 or v1.4.3, or disable dynamic Groovy scripts by
adding this setting to the `config/elasticsearch.yml` file on all nodes in the
cluster:

[source,yaml]
-----------------------------------
script.groovy.sandbox.enabled: false
-----------------------------------

This will turn off the Groovy sandbox, thus preventing dynamic Groovy scripts
from being accepted as part of a request or retrieved from the special
`.scripts` index. You will still be able to use Groovy scripts stored in files
in the `config/scripts/` directory on every node.

To convert an inline script to a file, take this simple script
as an example:

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "my_field": {
      "script": "1 + my_var",
      "params": {
        "my_var": 2
      }
    }
  }
}
-----------------------------------

Save the contents of the script as a file called `config/scripts/my_script.groovy`
on every data node in the cluster:

[source,js]
-----------------------------------
1 + my_var
-----------------------------------

Now you can access the script by file name (without the extension):

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "my_field": {
      "script_file": "my_script",
      "params": {
        "my_var": 2
      }
    }
  }
}
-----------------------------------

===================================================


Additional `lang` plugins let you execute scripts in
different languages. Currently supported plugins are `lang-javascript`
for JavaScript, `lang-mvel` for Mvel, and `lang-python` for Python.
Wherever a `script` parameter can be used, a `lang` parameter
(on the same level) can be provided to define the language of the
script. The `lang` options are `groovy`, `js`, `mvel`, `python`,
`expression` and `native`.

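As a minimal sketch of where `lang` sits relative to `script`, the request
below evaluates a script field with the `expression` language. The field name
`price` and the parameter `factor` are hypothetical and serve only as
illustration:

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "scaled_price": {
      "script": "doc['price'].value * factor",
      "lang": "expression",
      "params": {
        "factor": 2
      }
    }
  }
}
-----------------------------------
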
To increase security, Elasticsearch does not allow you to specify scripts for
non-sandboxed languages with a request. Instead, scripts must be placed in the
`scripts` directory inside the configuration directory (the directory where
`elasticsearch.yml` is). Scripts placed in this directory are picked up
automatically and become available for use. Once a script has been placed in this
directory, it can be referenced by name. For example, a script called
`calculate-score.groovy` can be referenced in a request like this:

[source,sh]
--------------------------------------------------
$ tree config
config
├── elasticsearch.yml
├── logging.yml
└── scripts
    └── calculate-score.groovy
--------------------------------------------------

[source,sh]
--------------------------------------------------
$ cat config/scripts/calculate-score.groovy
log(_score * 2) + my_modifier
--------------------------------------------------

[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script_file": "calculate-score",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}'
--------------------------------------------------

The name of a script is derived from the hierarchy of directories it
sits under and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.

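For instance, assuming the `lang-python` plugin is installed and the file
mentioned above exists on every node, a request could reference that script
roughly as follows. The field name `my_field` and the parameter `my_var` are
hypothetical:

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "my_field": {
      "script_file": "group1_group2_test",
      "lang": "python",
      "params": {
        "my_var": 2
      }
    }
  }
}
-----------------------------------
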
[float]
=== Indexed Scripts

If dynamic scripting is enabled, Elasticsearch allows you to store scripts
in an internal index known as `.scripts` and reference them by id. REST
endpoints are provided to manage indexed scripts.

Requests to the scripts endpoint look like this:

[source,js]
-----------------------------------
/_scripts/{lang}/{id}
-----------------------------------

Where the `lang` part is the language the script is in and the `id` part is the id
of the script. In the `.scripts` index the type of the document will be set to the `lang`.


[source,js]
-----------------------------------
curl -XPOST localhost:9200/_scripts/groovy/indexedCalculateScore -d '{
  "script": "log(_score * 2) + my_modifier"
}'
-----------------------------------

This will create a document with id `indexedCalculateScore` and type `groovy` in the
`.scripts` index. The type of the document is the language used by the script.

This script can be accessed at query time by appending `_id` to
the script parameter and passing the script id, so `script` becomes `script_id`:

[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script_id": "indexedCalculateScore",
            "lang" : "groovy",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}'
--------------------------------------------------

Note that you must have dynamic scripting enabled to use indexed scripts
at query time.

The script can be viewed by:

[source,js]
-----------------------------------
curl -XGET localhost:9200/_scripts/groovy/indexedCalculateScore
-----------------------------------

This is rendered as:

[source,js]
-----------------------------------
{
  "script": "log(_score * 2) + my_modifier"
}
-----------------------------------

Indexed scripts can be deleted by:

[source,js]
-----------------------------------
curl -XDELETE localhost:9200/_scripts/groovy/indexedCalculateScore
-----------------------------------



[float]
=== Enabling dynamic scripting

We recommend running Elasticsearch behind an application or proxy, which
protects Elasticsearch from the outside world. If users are allowed to run
dynamic scripts (even in a search request), then they have the same access to
your box as the user that Elasticsearch is running as. For this reason dynamic
scripting is allowed only for sandboxed languages by default.

First, you should not run Elasticsearch as the `root` user, as this would allow
a script to access or do *anything* on your server, without limitations. Second,
you should not expose Elasticsearch directly to users, but instead have a proxy
application in between. If you *do* intend to expose Elasticsearch directly to
your users, then you have to decide whether you trust them enough to run scripts
on your box or not. If you do, you can enable dynamic scripting by adding the
following setting to the `config/elasticsearch.yml` file on every node:

[source,yaml]
-----------------------------------
script.disable_dynamic: false
-----------------------------------

While this still allows execution of named scripts provided in the config, or
_native_ Java scripts registered through plugins, it also allows users to run
arbitrary scripts via the API. Rather than sending the name of a file as the
script, a request can then send the body of the script itself.

There are three possible configuration values for the `script.disable_dynamic`
setting; the default value is `sandbox`:

[cols="<,<",options="header",]
|=======================================================================
|Value |Description
| `true` |All dynamic scripting is disabled; scripts must be placed in the `config/scripts` directory.
| `false` |All dynamic scripting is enabled; scripts may be sent as strings in requests.
| `sandbox` |Scripts may be sent as strings for languages that are sandboxed.
|=======================================================================

[float]
=== Default Scripting Language

The default scripting language (assuming no `lang` parameter is provided) is
`groovy`. To change it, set `script.default_lang` to the
appropriate language.

[float]
=== Groovy Sandboxing

Elasticsearch sandboxes Groovy scripts that are compiled and executed in order
to ensure they don't perform unwanted actions. There are a number of options
that can be used for configuring this sandbox:

`script.groovy.sandbox.receiver_whitelist`::

Comma-separated list of class names for objects that may have methods
invoked.

`script.groovy.sandbox.package_whitelist`::

Comma-separated list of packages under which new objects may be constructed.

`script.groovy.sandbox.class_whitelist`::

Comma-separated list of classes that are allowed to be constructed.

`script.groovy.sandbox.method_blacklist`::

Comma-separated list of methods that are never allowed to be invoked,
regardless of target object.

`script.groovy.sandbox.enabled`::

Flag to enable the sandbox (defaults to `false` added[v1.4.3], meaning the sandbox is
disabled).

When specifying whitelist or blacklist settings for the groovy sandbox, each
option replaces the current list; the values are not additive.

[float]
=== Automatic Script Reloading

The `config/scripts` directory is scanned periodically for changes.
New and changed scripts are reloaded, and deleted scripts are removed
from the preloaded scripts cache. The reload frequency can be specified
using the `watcher.interval` setting, which defaults to `60s`.
To disable script reloading completely, set `script.auto_reload_enabled`
to `false`.

[[native-java-scripts]]
[float]
=== Native (Java) Scripts

Even though `groovy` is pretty fast, you can also register native Java-based
scripts for faster execution.

To provide a native script, implement a `NativeScriptFactory` that
constructs the script that will be executed. There are
two main types, one that extends `AbstractExecutableScript` and one that
extends `AbstractSearchScript` (probably the one most users will extend,
with additional helper classes in `AbstractLongSearchScript`,
`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`).

Scripts can be registered through settings. For example,
`script.native.my.type` set to `sample.MyNativeScriptFactory` will
register a script named `my`. Alternatively, a plugin can access
`ScriptModule` and call `registerScript` on it.

To execute the script, specify the `lang` as `native` and
the name of the script as the `script`.

Note that the scripts need to be in the classpath of Elasticsearch. One
simple way to do this is to create a directory under plugins (choose a
descriptive name), and place the jar / classes files there. They will be
automatically loaded.

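As an illustration, assuming a factory has been registered under the name `my`
as described above, a request might invoke it like this. The query and the
`my_modifier` parameter are hypothetical and simply mirror the earlier
examples:

[source,json]
-----------------------------------
GET /_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "my",
            "lang": "native",
            "params": {
              "my_modifier": 8
            }
          }
        }
      ]
    }
  }
}
-----------------------------------
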
[float]
=== Lucene Expressions Scripts

[WARNING]
========================
This feature is *experimental* and subject to change in future versions.
========================

Lucene's expressions module provides a mechanism to compile a
`javascript` expression to bytecode. This allows very fast execution,
as if you had written a `native` script. Expression scripts can be
used in `script_score`, `script_fields`, sort scripts and numeric aggregation scripts.

See the link:http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html[expressions module documentation]
for details on what operators and functions are available.

The following variables are available in `expression` scripts:

* Single valued document fields, e.g. `doc['myfield'].value`
* Parameters passed into the script, e.g. `mymodifier`
* The current document's score, `_score` (only available when used in a `script_score`)

There are a few limitations relative to other script languages:

* Only numeric fields may be accessed
* Stored fields are not available
* If a field is sparse (only some documents contain a value), documents missing the field will have a value of `0`

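A minimal sketch of an `expression` script used in a `script_score`, staying
within the constraints listed above. The numeric field `popularity`, the query,
and the value given to `mymodifier` are assumptions made for illustration:

[source,json]
-----------------------------------
GET /_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "_score * doc['popularity'].value + mymodifier",
            "lang": "expression",
            "params": {
              "mymodifier": 8
            }
          }
        }
      ]
    }
  }
}
-----------------------------------

Because a missing field evaluates to `0`, documents without a `popularity`
value would score exactly `mymodifier` in this sketch.
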
[float]
=== Score

In all scripts that can be used in aggregations, the current
document's score is accessible in `_score`.

[float]
=== Computing scores based on terms in scripts

See <<modules-advanced-scripting, advanced scripting documentation>>.

[float]
=== Document Fields

Most scripting revolves around the use of specific document field data.
`doc['field_name']` can be used to access specific field data within
a document (the document in question is usually derived from the context
in which the script is used). Document fields are very fast to access since they
end up being loaded into memory (all the relevant field values/tokens
are loaded to memory). Note, however, that the `doc[...]` notation only
allows for simple valued fields (it cannot return a JSON object)
and makes sense only on non-analyzed or single term based fields.

The following data can be extracted from a field:

[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it is a short type, it will be short.

|`doc['field_name'].values` |The native array values of the field. For
example, if it is a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.

|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.

|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.

|`doc['field_name'].lat` |The latitude of a geo point type.

|`doc['field_name'].lon` |The longitude of a geo point type.

|`doc['field_name'].lats` |The latitudes of a geo point type.

|`doc['field_name'].lons` |The longitudes of a geo point type.

|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceWithDefault(lat, lon, default)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].distanceInMiles(lat, lon)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceInMilesWithDefault(lat, lon, default)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceInKmWithDefault(lat, lon, default)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceWithDefault(lat, lon, default)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].arcDistanceInMiles(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].factorDistance(lat, lon)` |The distance factor of this geo point field from the provided lat/lon.

|`doc['field_name'].factorDistance(lat, lon, default)` |The distance factor of this geo point field from the provided lat/lon with a default value.

|`doc['field_name'].geohashDistance(geohash)` |The `arc` distance (in meters)
of this geo point field from the provided geohash.

|`doc['field_name'].geohashDistanceInKm(geohash)` |The `arc` distance (in km)
of this geo point field from the provided geohash.

|`doc['field_name'].geohashDistanceInMiles(geohash)` |The `arc` distance (in
miles) of this geo point field from the provided geohash.
|=======================================================================

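The sketch below combines two of the expressions from the table in a Groovy
script field: it returns the `arc` distance in km from a reference point, or
`-1` when the document has no value for the field. It assumes a geo point field
named `location` and the parameters `lat` and `lon` (all hypothetical), and
that dynamic Groovy scripting is enabled on the cluster. If it is not, the
script body can be saved to a file under `config/scripts` and referenced with
`script_file` instead:

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "distance_km": {
      "script": "doc['location'].empty ? -1 : doc['location'].arcDistanceInKm(lat, lon)",
      "params": {
        "lat": 40.7,
        "lon": -74.0
      }
    }
  }
}
-----------------------------------
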
[float]
=== Stored Fields

Stored fields can also be accessed when executing a script. Note that they
are much slower to access than document fields, as they are not
loaded into memory. They can be accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.

[float]
=== Accessing the score of a document within a script

When using scripting for calculating the score of a document (for instance, with
the `function_score` query), you can access the score using the `_score`
variable inside of a Groovy script.

[float]
=== Source Field

The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.

Accessing `_source` is much slower compared to using `_doc`,
but the data is not loaded into memory. For a single field access, `_fields` may be
faster than using `_source` due to the extra overhead of potentially parsing large documents.
However, `_source` may be faster if you access multiple fields or if the source has already been
loaded for other purposes.


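To make the comparison concrete, the request below reads one value through
`_fields` and one through `_source`. The names `my_field_name` and
`obj2.obj1.field3` are the placeholder names used in this document, not real
fields. The sketch assumes that `my_field_name` is mapped as a stored field and
that dynamic Groovy scripting is enabled:

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "from_stored_field": {
      "script": "_fields['my_field_name'].value"
    },
    "from_source": {
      "script": "_source.obj2.obj1.field3"
    }
  }
}
-----------------------------------
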
[float]
=== Groovy Built In Functions

There are several built in functions that can be used within scripts.
They include:

[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`sin(a)` |Returns the trigonometric sine of an angle.

|`cos(a)` |Returns the trigonometric cosine of an angle.

|`tan(a)` |Returns the trigonometric tangent of an angle.

|`asin(a)` |Returns the arc sine of a value.

|`acos(a)` |Returns the arc cosine of a value.

|`atan(a)` |Returns the arc tangent of a value.

|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians.

|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.

|`exp(a)` |Returns Euler's number _e_ raised to the power of value.

|`log(a)` |Returns the natural logarithm (base _e_) of a value.

|`log10(a)` |Returns the base 10 logarithm of a value.

|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.

|`cbrt(a)` |Returns the cube root of a double value.

|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.

|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.

|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.

|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.

|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (_r_, _theta_).

|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.

|`round(a)` |Returns the closest _int_ to the argument.

|`random()` |Returns a random _double_ value.

|`abs(a)` |Returns the absolute value of a value.

|`max(a, b)` |Returns the greater of two values.

|`min(a, b)` |Returns the smaller of two values.

|`ulp(d)` |Returns the size of an ulp of the argument.

|`signum(d)` |Returns the signum function of the argument.

|`sinh(x)` |Returns the hyperbolic sine of a value.

|`cosh(x)` |Returns the hyperbolic cosine of a value.

|`tanh(x)` |Returns the hyperbolic tangent of a value.

|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================

[float]
=== Arithmetic precision in MVEL

When dividing two numbers in MVEL based scripts, the engine
follows the default behaviour of Java. This means that if you
divide two integers (you might have configured the fields as integer in
the mapping), the result will also be an integer. For example, if a
calculation like `1/num` happens in your script and `num` is an
integer with the value of `8`, the result is `0` even though you were
expecting it to be `0.125`. You may need to enforce precision by
explicitly using a double, as in `1.0/num`, in order to get the expected
result.

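As a sketch, a script field that forces floating-point division could look
like this. It assumes the `lang-mvel` plugin is installed, that dynamic MVEL
scripting is permitted on the cluster, and that `num` is a hypothetical integer
field:

[source,json]
-----------------------------------
GET /_search
{
  "script_fields": {
    "inverse": {
      "script": "1.0 / doc['num'].value",
      "lang": "mvel"
    }
  }
}
-----------------------------------
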