[[modules-scripting]]
== Scripting

The scripting module lets you use scripts to evaluate custom
expressions. For example, scripts can be used to return "script fields"
as part of a search request, or to evaluate a custom score for a query.

By default, the scripting module uses http://mvel.codehaus.org/[mvel] as
the scripting language, with some extensions. mvel is used because it is
fast and simple to use, and in most cases only simple
expressions are needed (for example, mathematical equations).

Additional `lang` plugins allow scripts to be executed in
different languages. Currently supported plugins are `lang-javascript`
for JavaScript, `lang-groovy` for Groovy, and `lang-python` for Python.
Wherever a `script` parameter can be used, a `lang` parameter
(on the same level) can be provided to define the language of the
script. The `lang` options are `mvel`, `js`, `groovy`, `python`, and
`native`.
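
As a rough illustration of a `lang` parameter sitting on the same level as
`script`, a search request with a script field might look like the sketch
below. The `price` field and the `doubled_price` name are hypothetical and
only serve the example.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "doubled_price" : {
            "script" : "doc['price'].value * 2",
            "lang" : "mvel"
        }
    }
}
-----------------------------------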

[float]
=== Default Scripting Language

The default scripting language (used when no `lang` parameter is
provided) is `mvel`. To change it, set the `script.default_lang` setting
to the appropriate language.
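
For instance, a node configuration that switches the default to Groovy could
look like the fragment below. This is a sketch that assumes the `lang-groovy`
plugin is installed on every node.

[source,yaml]
-----------------------------------
# Assumption: the lang-groovy plugin is available on this node
script.default_lang: groovy
-----------------------------------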

[float]
=== Preloaded Scripts

Scripts can always be provided as part of the relevant API, but they can
also be preloaded by placing them under `config/scripts` and then
referencing them by script name (instead of providing the full
script). This reduces the amount of data passed between the client
and the nodes.

The name of the script is derived from the hierarchy of directories it
lives under and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.
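
A request that refers to the preloaded script from the example above by name
might then look like the following sketch. The `preloaded_result` field name
is hypothetical, and the script is assumed to return a value that is usable
as a script field.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "preloaded_result" : {
            "script" : "group1_group2_test"
        }
    }
}
-----------------------------------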

[float]
=== Disabling dynamic scripts

We recommend running Elasticsearch behind an application or proxy
that protects Elasticsearch from the outside world. If users are
allowed to run dynamic scripts (even in a search request), then they
have the same access to your box as the user that Elasticsearch is
running as.

First, you should not run Elasticsearch as the `root` user, as this
would allow a script to access or do *anything* on your server, without
limitations. Second, you should not expose Elasticsearch directly to
users, but instead have a proxy application in between. If you *do*
intend to expose Elasticsearch directly to your users, then you have
to decide whether you trust them enough to run scripts on your box.
If not, then even if you have a proxy which only allows `GET`
requests, you should disable dynamic scripting by adding the following
setting to the `config/elasticsearch.yml` file on every node:

[source,yaml]
-----------------------------------
script.disable_dynamic: true
-----------------------------------

This still allows execution of named scripts provided in the config, and of
_native_ Java scripts registered through plugins, but it prevents
users from running arbitrary scripts via the API.

[float]
=== Automatic Script Reloading

The `config/scripts` directory is scanned periodically for changes.
New and changed scripts are reloaded, and deleted scripts are removed
from the preloaded scripts cache. The reload frequency can be specified
using the `watcher.interval` setting, which defaults to `60s`.
To disable script reloading completely, set `script.auto_reload_enabled`
to `false`.
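
The two settings could be used as in the sketch below. The `30s` interval is
an arbitrary illustrative value, and the second setting is commented out
because the two are alternatives.

[source,yaml]
-----------------------------------
# Scan config/scripts every 30 seconds instead of the default 60s
watcher.interval: 30s

# Alternatively, turn script reloading off entirely
# script.auto_reload_enabled: false
-----------------------------------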

[float]
=== Native (Java) Scripts

Even though `mvel` is pretty fast, native Java based
scripts can be registered for faster execution.

To provide a native script, implement a `NativeScriptFactory`
that constructs the script to be executed. There are
two main types: one that extends `AbstractExecutableScript` and one that
extends `AbstractSearchScript` (probably the one most users will extend,
with additional helper classes in `AbstractLongSearchScript`,
`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`).

Native scripts can be registered through settings: for example,
setting `script.native.my.type` to `sample.MyNativeScriptFactory`
registers a script named `my`. Alternatively, a plugin can access
`ScriptModule` and call `registerScript` on it.
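
Written out as a configuration fragment, the settings-based registration
described above would look roughly like this. The factory class
`sample.MyNativeScriptFactory` is the placeholder name used in the text and
must exist on the classpath.

[source,yaml]
-----------------------------------
# Registers a native script named "my"
script.native.my.type: sample.MyNativeScriptFactory
-----------------------------------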

To execute the script, specify `native` as the `lang` and
the name of the script as the `script`.
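
As a sketch, a script field that runs a native script registered under the
name `my` could be requested as follows. The `native_result` field name is
hypothetical.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "native_result" : {
            "script" : "my",
            "lang" : "native"
        }
    }
}
-----------------------------------
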

Note that the scripts need to be on the Elasticsearch classpath. One
simple way to do this is to create a directory under plugins (choose a
descriptive name) and place the jar or class files there; they will be
automatically loaded.

[float]
=== Score

All scripts that can be used in facets can access the current
doc score using `doc.score`.

[float]
=== Computing scores based on terms in scripts

See the <<modules-advanced-scripting, advanced scripting documentation>>.

[float]
=== Document Fields

Most scripts revolve around the use of specific document field data.
`doc['field_name']` can be used to access specific field data within
a document (the document in question is usually derived from the context
in which the script is used). Document fields are very fast to access since they
end up being loaded into memory (all the relevant field values/tokens
are loaded into memory).

The following data can be extracted from a field:

[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it's a short type, it will be short.

|`doc['field_name'].values` |The native array values of the field. For
example, if it's a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.

|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.

|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.

|`doc['field_name'].lat` |The latitude of a geo point type.

|`doc['field_name'].lon` |The longitude of a geo point type.

|`doc['field_name'].lats` |The latitudes of a geo point type.

|`doc['field_name'].lons` |The longitudes of a geo point type.

|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in miles)
of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.

|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.

|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.

|`doc['field_name'].geohashDistance(geohash)` |The distance (in miles)
of this geo point field from the provided geohash.

|`doc['field_name'].geohashDistanceInKm(geohash)` |The distance (in km)
of this geo point field from the provided geohash.
|=======================================================================
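
A couple of these expressions can be combined in a request along the lines of
the sketch below. The `price` and `location` fields, the script field names,
and the coordinates are hypothetical. Passing values through `params` is an
assumption not covered in this section.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "price_or_zero" : {
            "script" : "doc['price'].empty ? 0 : doc['price'].value"
        },
        "km_from_point" : {
            "script" : "doc['location'].arcDistanceInKm(lat, lon)",
            "params" : { "lat" : 40.7, "lon" : -74.0 }
        }
    }
}
-----------------------------------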

[float]
=== Stored Fields

Stored fields can also be accessed when executing a script. Note that they
are much slower to access than document fields, but are not
loaded into memory. They can be accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.
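
A minimal sketch of reading a stored field from a script field, assuming
`my_field_name` is mapped as a stored field (`stored_value` is a hypothetical
name):

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "stored_value" : {
            "script" : "_fields['my_field_name'].value"
        }
    }
}
-----------------------------------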

[float]
=== Source Field

The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.

Accessing `_source` is much slower than using document fields (`doc`),
but the data is not loaded into memory. For a single field access, `_fields` may be
faster than `_source` due to the extra overhead of potentially parsing large documents.
However, `_source` may be faster if you access multiple fields or if the source has already been
loaded for other purposes.
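
Using the path from the example above, a script field that reads a nested
value out of `_source` might be sketched as follows. The `nested_value` name
is hypothetical, and the document is assumed to actually contain
`obj2.obj1.field3`.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "nested_value" : {
            "script" : "_source.obj2.obj1.field3"
        }
    }
}
-----------------------------------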

[float]
=== mvel Built In Functions

There are several built in functions that can be used within scripts.
They include:

[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`time()` |The current time in milliseconds.

|`sin(a)` |Returns the trigonometric sine of an angle.

|`cos(a)` |Returns the trigonometric cosine of an angle.

|`tan(a)` |Returns the trigonometric tangent of an angle.

|`asin(a)` |Returns the arc sine of a value.

|`acos(a)` |Returns the arc cosine of a value.

|`atan(a)` |Returns the arc tangent of a value.

|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians.

|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.

|`exp(a)` |Returns Euler's number _e_ raised to the power of value.

|`log(a)` |Returns the natural logarithm (base _e_) of a value.

|`log10(a)` |Returns the base 10 logarithm of a value.

|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.

|`cbrt(a)` |Returns the cube root of a double value.

|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.

|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.

|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.

|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.

|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (_r_, _theta_).

|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.

|`round(a)` |Returns the closest _int_ to the argument.

|`random()` |Returns a random _double_ value.

|`abs(a)` |Returns the absolute value of a value.

|`max(a, b)` |Returns the greater of two values.

|`min(a, b)` |Returns the smaller of two values.

|`ulp(d)` |Returns the size of an ulp of the argument.

|`signum(d)` |Returns the signum function of the argument.

|`sinh(x)` |Returns the hyperbolic sine of a value.

|`cosh(x)` |Returns the hyperbolic cosine of a value.

|`tanh(x)` |Returns the hyperbolic tangent of a value.

|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================
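
These functions are called directly inside the script string. The sketch
below dampens a hypothetical numeric `popularity` field with `log10` and caps
a hypothetical `rating` field with `min`. Both field names and both script
field names are assumptions made for illustration.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "damped_popularity" : {
            "script" : "log10(doc['popularity'].value + 1)"
        },
        "capped_rating" : {
            "script" : "min(doc['rating'].value, 5)"
        }
    }
}
-----------------------------------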

[float]
=== Arithmetic precision in MVEL

When dividing two numbers in MVEL based scripts, the engine
adheres to the default behaviour of Java. This means that if you
divide two integers (you might have configured the fields as integer in
the mapping), the result will also be an integer. For example, if a
calculation like `1/num` happens in your script and `num` is an
integer with the value of `8`, the result is `0` even though you were
expecting it to be `0.125`. You may need to enforce precision by
explicitly using a double, as in `1.0/num`, to get the expected
result.
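
The difference can be seen side by side in a sketch like the following, which
assumes a hypothetical field `num` mapped as an integer. As described above,
the first script field truncates to an integer while the second keeps the
fractional part.

[source,js]
-----------------------------------
{
    "query" : { "match_all" : {} },
    "script_fields" : {
        "truncated" : {
            "script" : "1 / doc['num'].value"
        },
        "precise" : {
            "script" : "1.0 / doc['num'].value"
        }
    }
}
-----------------------------------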