Reorganise scripting docs (#18132)

* Reorganize scripting documentation

* Further changes to tidy up scripting docs

Closes #18116

* Add note about .lat/lon potentially returning null

* Added .value to expressions example

* Fixed two bad ASCIIDOC links
Clinton Gormley 2016-05-04 18:17:10 +02:00
parent 5a0cfdd6af
commit 34d90b041f
11 changed files with 1108 additions and 777 deletions

@ -94,8 +94,6 @@ include::modules/network.asciidoc[]
include::modules/node.asciidoc[]
include::modules/painless.asciidoc[]
include::modules/plugins.asciidoc[]
include::modules/scripting.asciidoc[]

@ -1,5 +1,104 @@
include::scripting/scripting.asciidoc[]
[[modules-scripting]]
== Scripting
The scripting module enables you to use scripts to evaluate custom
expressions. For example, you could use a script to return "script fields"
as part of a search request or evaluate a custom score for a query.
TIP: Elasticsearch now has a built-in scripting language called _Painless_
that provides a more secure alternative for implementing
scripts for Elasticsearch. We encourage you to try it out --
for more information, see <<modules-scripting-painless, Painless Scripting Language>>.
The default scripting language is http://groovy-lang.org/[groovy].
Additional `lang` plugins enable you to run scripts written in other languages.
Everywhere a script can be used, you can include a `lang` parameter
to specify the language of the script.
[float]
=== General-purpose languages:
These languages can be used for any purpose in the scripting APIs,
and give the most flexibility.
[cols="<,<,<",options="header",]
|=======================================================================
|Language
|Sandboxed
|Required plugin
|<<modules-scripting-painless, `painless`>>
|yes
|built-in
|<<modules-scripting-groovy, `groovy`>>
|<<modules-scripting-security, no>>
|built-in
|{plugins}/lang-javascript.html[`javascript`]
|<<modules-scripting-security, no>>
|{plugins}/lang-javascript.html[`lang-javascript`]
|{plugins}/lang-python.html[`python`]
|<<modules-scripting-security, no>>
|{plugins}/lang-python.html[`lang-python`]
|=======================================================================
[float]
=== Special-purpose languages:
These languages are less flexible, but typically have higher performance for
certain tasks.
[cols="<,<,<,<",options="header",]
|=======================================================================
|Language
|Sandboxed
|Required plugin
|Purpose
|<<modules-scripting-expression, `expression`>>
|yes
|built-in
|fast custom ranking and sorting
|<<search-template, `mustache`>>
|yes
|built-in
|templates
|<<modules-scripting-native, `java`>>
|n/a
|you write it!
|expert API
|=======================================================================
[WARNING]
.Scripts and security
=================================================
Languages that are sandboxed are designed with security in mind. However,
non-sandboxed languages can be a security issue; please read
<<modules-scripting-security, Scripting and security>> for more details.
=================================================
include::scripting/using.asciidoc[]
include::scripting/fields.asciidoc[]
include::scripting/security.asciidoc[]
include::scripting/groovy.asciidoc[]
include::scripting/painless.asciidoc[]
include::scripting/expression.asciidoc[]
include::scripting/native.asciidoc[]
include::scripting/advanced-scripting.asciidoc[]
include::scripting/security.asciidoc[]

@ -1,13 +1,17 @@
[[modules-advanced-scripting]]
=== Text scoring in scripts
=== Advanced text scoring in scripts
experimental[The functionality described on this page is considered experimental and may be changed or removed in a future release]
Text features, such as term or document frequency for a specific term can be accessed in scripts (see <<modules-scripting, scripting documentation>> ) with the `_index` variable. This can be useful if, for example, you want to implement your own scoring model using for example a script inside a <<query-dsl-function-score-query,function score query>>.
Text features, such as term or document frequency for a specific term, can be
accessed in scripts with the `_index` variable. This can be useful if, for
example, you want to implement your own scoring model using a
script inside a <<query-dsl-function-score-query,function score query>>.
Statistics over the document collection are computed *per shard*, not per
index.
[float]
==== Nomenclature:
=== Nomenclature:
[horizontal]
@ -33,7 +37,7 @@ depending on the shard the current document resides in.
[float]
==== Shard statistics:
=== Shard statistics:
`_index.numDocs()`::
@ -49,7 +53,7 @@ depending on the shard the current document resides in.
[float]
==== Field statistics:
=== Field statistics:
Field statistics can be accessed with a subscript operator like this:
`_index['FIELD']`.
@ -74,7 +78,7 @@ depending on the shard the current document resides in.
The number of terms in a field cannot be accessed using the `_index` variable. See <<token-count>> for how to do that.
[float]
==== Term statistics:
=== Term statistics:
Term statistics for a field can be accessed with a subscript operator like
this: `_index['FIELD']['TERM']`. This will never return null, even if the term or field does not exist.
@ -101,7 +105,7 @@ affect is your set the <<index-options,`index_options`>> to `docs`.
[float]
==== Term positions, offsets and payloads:
=== Term positions, offsets and payloads:
If you need information on the positions of terms in a field, call
`_index['FIELD'].get('TERM', flag)` where flag can be
@ -174,7 +178,7 @@ return score;
[float]
==== Term vectors:
=== Term vectors:
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (see <<term-vector>>). To access them, call
`_index.termVectors()` to get a

@ -0,0 +1,120 @@
[[modules-scripting-expression]]
=== Lucene Expressions Language
Lucene's expressions compile a `javascript` expression to bytecode. They are
designed for high-performance custom ranking and sorting functions and are
enabled for `inline` and `stored` scripting by default.
[float]
=== Performance
Expressions were designed to have competitive performance with custom Lucene code.
This performance is due to low per-document overhead compared with other
scripting engines: expressions do more work "up-front".
This allows for very fast execution, even faster than if you had written a `native` script.
[float]
=== Syntax
Expressions support a subset of javascript syntax: a single expression.
See the link:http://lucene.apache.org/core/6_0_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html[expressions module documentation]
for details on what operators and functions are available.
The following variables are available in `expression` scripts:
* Document fields, e.g. `doc['myfield'].value`
* Variables and methods that the field supports, e.g. `doc['myfield'].empty`
* Parameters passed into the script, e.g. `mymodifier`
* The current document's score, `_score` (only available when used in a `script_score`)
You can use expression scripts for `script_score`, `script_fields`, sort scripts, and numeric aggregation
scripts; simply set the `lang` parameter to `expression`.
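
As a minimal sketch of one of these uses, a sort script might look like the request below. The index `my_index`, the field `popularity`, and the `boost` parameter are hypothetical names chosen for illustration:

[source,js]
-------------------------------------
GET my_index/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "order": "desc",
      "script": {
        "lang": "expression",
        "inline": "doc['popularity'].value * boost",
        "params": {
          "boost": 2
        }
      }
    }
  }
}
-------------------------------------

The same `script` object, with `lang` set to `expression`, should be usable in the other contexts listed above.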
[float]
=== Numeric field API
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The value of the field, as a `double`
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].min()` |The minimum value of the field in this document.
|`doc['field_name'].max()` |The maximum value of the field in this document.
|`doc['field_name'].median()` |The median value of the field in this document.
|`doc['field_name'].avg()` |The average of the values in this document.
|`doc['field_name'].sum()` |The sum of the values in this document.
|`doc['field_name'].count()` |The number of values in this document.
|=======================================================================
When a document is missing the field completely, by default the value will be treated as `0`.
You can treat it as another value instead, e.g. `doc['myfield'].empty ? 100 : doc['myfield'].value`.
When a document has multiple values for the field, by default the minimum value is returned.
You can choose a different value instead, e.g. `doc['myfield'].sum()`.
Boolean fields are exposed as numerics, with `true` mapped to `1` and `false` mapped to `0`.
For example: `doc['on_sale'].value ? doc['price'].value * 0.5 : doc['price'].value`
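
The handling of missing and multi-valued fields can be sketched as a script field. This is only an illustration: the index `my_index`, the field `price`, and the `fallback` parameter are assumed names, not part of the original text:

[source,js]
-------------------------------------
GET my_index/_search
{
  "script_fields": {
    "effective_price": {
      "script": {
        "lang": "expression",
        "inline": "doc['price'].empty ? fallback : doc['price'].max()",
        "params": {
          "fallback": 100
        }
      }
    }
  }
}
-------------------------------------

Here a document without a `price` yields the `fallback` value instead of `0`, and a document with several prices yields the largest one rather than the default minimum.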
[float]
=== Date field API
Date fields are treated as the number of milliseconds since January 1, 1970 and
support the Numeric Fields API above, with these additional methods:
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].getYear()` |Year component, e.g. `1970`.
|`doc['field_name'].getMonth()` |Month component (0-11), e.g. `0` for January.
|`doc['field_name'].getDayOfMonth()` |Day component, e.g. `1` for the first of the month.
|`doc['field_name'].getHourOfDay()` |Hour component (0-23)
|`doc['field_name'].getMinutes()` |Minutes component (0-59)
|`doc['field_name'].getSeconds()` |Seconds component (0-59)
|=======================================================================
The following example shows the difference in years between the `date` fields `date0` and `date1`:
`doc['date1'].getYear() - doc['date0'].getYear()`
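
As a sketch of how such a date expression could be used in a numeric aggregation script, the request below averages the year difference across matching documents. The index name `my_index` and the aggregation name `avg_year_gap` are hypothetical:

[source,js]
-------------------------------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "avg_year_gap": {
      "avg": {
        "script": {
          "lang": "expression",
          "inline": "doc['date1'].getYear() - doc['date0'].getYear()"
        }
      }
    }
  }
}
-------------------------------------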
[float]
=== `geo_point` field API
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].lat` |The latitude of the geo point, or `null`.
|`doc['field_name'].lon` |The longitude of the geo point, or `null`.
|=======================================================================
The following example computes distance in kilometers from Washington, DC:
`haversin(38.9072, -77.0369, doc['field_name'].lat, doc['field_name'].lon)`
In this example the coordinates could have been passed as parameters to the script,
e.g. based on geolocation of the user.
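
Passing the coordinates as parameters might look like the sketch below. The index `my_index`, the `geo_point` field `location`, and the parameter names `lat` and `lon` are assumptions made for illustration; the values are the coordinates of Washington, DC (latitude 38.9072, longitude -77.0369):

[source,js]
-------------------------------------
GET my_index/_search
{
  "script_fields": {
    "distance": {
      "script": {
        "lang": "expression",
        "inline": "haversin(lat, lon, doc['location'].lat, doc['location'].lon)",
        "params": {
          "lat": 38.9072,
          "lon": -77.0369
        }
      }
    }
  }
}
-------------------------------------

Keeping the coordinates in `params` rather than in the expression text means the expression itself stays identical from one request to the next.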
[float]
=== Limitations
There are a few limitations relative to other script languages:
* Only numeric, boolean, date, and geo_point fields may be accessed
* Stored fields are not available

@ -0,0 +1,232 @@
[[modules-scripting-fields]]
=== Accessing document fields and special variables
Depending on where a script is used, it will have access to certain special
variables and document fields.
[float]
== Update scripts
A script used in the <<docs-update,update>>,
<<docs-update-by-query,update-by-query>>, or <<docs-reindex,reindex>>
API will have access to the `ctx` variable which exposes:
[horizontal]
`ctx._source`:: Access to the document <<mapping-source-field,`_source` field>>.
`ctx.op`:: The operation that should be applied to the document: `index` or `delete`.
`ctx._index` etc:: Access to <<mapping-fields,document meta-fields>>, some of which may be read-only.
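
A minimal sketch of an update script that uses both `ctx._source` and `ctx.op` follows. It assumes that inline Groovy scripting has been enabled on the cluster (it is not by default), and the index, type, field `stock`, and parameter `sold` are hypothetical:

[source,js]
-------------------------------------
POST my_index/my_type/1/_update
{
  "script": {
    "lang": "groovy",
    "inline": "if (ctx._source.stock <= 0) { ctx.op = 'delete' } else { ctx._source.stock -= sold }",
    "params": {
      "sold": 1
    }
  }
}
-------------------------------------

In this sketch the document is deleted when its `stock` is already exhausted; otherwise the `stock` value in `_source` is decremented and the document is reindexed.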
[float]
== Search and Aggregation scripts
With the exception of <<search-request-script-fields,script fields>> which are
executed once per search hit, scripts used in search and aggregations will be
executed once for every document which might match a query or an aggregation.
Depending on how many documents you have, this could mean millions or billions
of executions: these scripts need to be fast!
Field values can be accessed from a script using
<<modules-scripting-doc-vals,doc-values>>, or
<<modules-scripting-stored,stored fields or `_source` field>>, which are explained below.
Scripts may also have access to the document's relevance
<<scripting-score,`_score`>> and, via the experimental `_index` variable,
to term statistics for <<modules-advanced-scripting,advanced text scoring>>.
[[scripting-score]]
[float]
=== Accessing the score of a document within a script
Scripts used in the <<query-dsl-function-score-query,`function_score` query>>,
in <<search-request-sort,script-based sorting>>, or in
<<search-aggregations,aggregations>> have access to the `_score` variable which
represents the current relevance score of a document.
Here's an example of using a script in a
<<query-dsl-function-score-query,`function_score` query>> to alter the
relevance `_score` of each document:
[source,js]
-------------------------------------
PUT my_index/my_type/1
{
"text": "quick brown fox",
"popularity": 1
}
PUT my_index/my_type/2
{
"text": "quick fox",
"popularity": 5
}
GET my_index/_search
{
"query": {
"function_score": {
"query": {
"match": {
"text": "quick brown fox"
}
},
"script_score": {
"script": {
"lang": "expression",
"inline": "_score * doc['popularity']"
}
}
}
}
}
-------------------------------------
// AUTOSENSE
[float]
[[modules-scripting-doc-vals]]
=== Doc Values
By far the fastest and most efficient way to access a field value from a
script is to use the `doc['field_name']` syntax, which retrieves the field
value from <<doc-values,doc values>>. Doc values are a columnar field value
store, enabled by default on all fields except for <<text,analyzed `text` fields>>.
[source,js]
-------------------------------
PUT my_index/my_type/1
{
"cost_price": 100
}
GET my_index/_search
{
"script_fields": {
"sales_price": {
"script": {
"lang": "expression",
"inline": "doc['cost_price'] * markup",
"params": {
"markup": 0.2
}
}
}
}
}
-------------------------------
// AUTOSENSE
Doc values can only return "simple" field values like numbers, dates,
geo-points, terms, etc, or arrays of these values if the field is multi-valued.
They cannot return JSON objects.
[NOTE]
.Doc values and `text` fields
===================================================
The `doc['field']` syntax can also be used for <<text,analyzed `text` fields>>
if <<fielddata,`fielddata`>> is enabled, but *BEWARE*: enabling fielddata on a
`text` field requires loading all of the terms into the JVM heap, which can be
very expensive both in terms of memory and CPU. It seldom makes sense to
access `text` fields from scripts.
===================================================
[float]
[[modules-scripting-stored]]
=== Stored Fields and `_source`
_Stored fields_ -- fields explicitly marked as
<<mapping-store,`"store": true`>> -- can be accessed using the
`_fields['field_name'].value` or `_fields['field_name'].values` syntax.
The document <<mapping-source-field,`_source`>>, which is really just a
special stored field, can be accessed using the `_source.field_name` syntax.
The `_source` is loaded as a map-of-maps, so properties within object fields
can be accessed as, for example, `_source.name.first`.
[IMPORTANT]
.Prefer doc-values to stored fields
=========================================================
Stored fields (which include the stored `_source` field) are much slower than
doc-values. They are optimised for returning several fields per result,
while doc values are optimised for accessing the value of a specific field in
many documents.
It makes sense to use `_source` or stored fields when generating a
<<search-request-script-fields,script field>> for the top ten hits from a search
result but, for other search and aggregation use cases, always prefer using
doc values.
=========================================================
For instance:
[source,js]
-------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": { <1>
"type": "text"
},
"first_name": {
"type": "text",
"store": true
},
"last_name": {
"type": "text",
"store": true
}
}
}
}
}
PUT my_index/my_type/1
{
"title": "Mr",
"first_name": "Barry",
"last_name": "White"
}
GET my_index/_search
{
"script_fields": {
"source": {
"script": {
"lang": "groovy",
"inline": "_source.title + ' ' + _source.first_name + ' ' + _source.last_name" <2>
}
},
"stored_fields": {
"script": {
"lang": "groovy",
"inline": "_fields['first_name'].value + ' ' + _fields['last_name'].value"
}
}
}
}
-------------------------------
// AUTOSENSE
<1> The `title` field is not stored and so cannot be used with the `_fields[]` syntax.
<2> The `title` field can still be accessed from the `_source`.
[TIP]
.Stored vs `_source`
=======================================================
The `_source` field is just a special stored field, so the performance is
similar to that of other stored fields. The `_source` provides access to the
original document body that was indexed (including the ability to distinguish
`null` values from empty fields, single-value arrays from plain scalars, etc).
The only time it really makes sense to use stored fields instead of the
`_source` field is when the `_source` is very large and it is less costly to
access a few small stored fields instead of the entire `_source`.
=======================================================

@ -0,0 +1,181 @@
[[modules-scripting-groovy]]
=== Groovy Scripting Language
Groovy is the default scripting language available in Elasticsearch. Although
limited by the <<java-security-manager,Java Security Manager>>, it is not a
sandboxed language and only `file` scripts may be used by default.
Enabling `inline` or `stored` Groovy scripting is a security risk and should
only be considered if your Elasticsearch cluster is protected from the outside
world. Even a simple `while (true) { }` loop could behave as a denial-of-
service attack on your cluster.
See <<modules-scripting-security, Scripting and Security>> for details
on security issues with scripts, including how to customize class
whitelisting.
[float]
=== Doc value properties and methods
Doc values in Groovy support the following properties and methods (depending
on the underlying field type):
`doc['field_name'].value`::
The native value of the field. For example, if it's a short type, it will be short.
`doc['field_name'].values`::
The native array values of the field. For example, if it's a short type,
it will be short[]. Remember, a field can have several values within a
single doc. Returns an empty array if the field has no values.
`doc['field_name'].empty`::
A boolean indicating if the field has no values within the doc.
`doc['field_name'].multiValued`::
A boolean indicating that the field has several values within the corpus.
`doc['field_name'].lat`::
The latitude of a geo point type, or `null`.
`doc['field_name'].lon`::
The longitude of a geo point type, or `null`.
`doc['field_name'].lats`::
The latitudes of a geo point type, or an empty array.
`doc['field_name'].lons`::
The longitudes of a geo point type, or an empty array.
`doc['field_name'].distance(lat, lon)`::
The `plane` distance (in meters) of this geo point field from the provided lat/lon.
`doc['field_name'].distanceWithDefault(lat, lon, default)`::
The `plane` distance (in meters) of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].distanceInMiles(lat, lon)`::
The `plane` distance (in miles) of this geo point field from the provided lat/lon.
`doc['field_name'].distanceInMilesWithDefault(lat, lon, default)`::
The `plane` distance (in miles) of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].distanceInKm(lat, lon)`::
The `plane` distance (in km) of this geo point field from the provided lat/lon.
`doc['field_name'].distanceInKmWithDefault(lat, lon, default)`::
The `plane` distance (in km) of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].arcDistance(lat, lon)`::
The `arc` distance (in meters) of this geo point field from the provided lat/lon.
`doc['field_name'].arcDistanceWithDefault(lat, lon, default)`::
The `arc` distance (in meters) of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].arcDistanceInMiles(lat, lon)`::
The `arc` distance (in miles) of this geo point field from the provided lat/lon.
`doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)`::
The `arc` distance (in miles) of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].arcDistanceInKm(lat, lon)`::
The `arc` distance (in km) of this geo point field from the provided lat/lon.
`doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)`::
The `arc` distance (in km) of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].factorDistance(lat, lon)`::
The distance factor of this geo point field from the provided lat/lon.
`doc['field_name'].factorDistance(lat, lon, default)`::
The distance factor of this geo point field from the provided lat/lon with a default value.
`doc['field_name'].geohashDistance(geohash)`::
The `arc` distance (in meters) of this geo point field from the provided geohash.
`doc['field_name'].geohashDistanceInKm(geohash)`::
The `arc` distance (in km) of this geo point field from the provided geohash.
`doc['field_name'].geohashDistanceInMiles(geohash)`::
The `arc` distance (in miles) of this geo point field from the provided geohash.
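
As a rough sketch of how these doc value methods combine, the script field below returns the arc distance in kilometers and guards against documents without a location. It assumes inline Groovy scripting has been enabled (or that the script body is saved as a file script instead); the index `my_index`, the `geo_point` field `location`, and the parameters `lat` and `lon` are hypothetical:

[source,js]
-------------------------------------
GET my_index/_search
{
  "script_fields": {
    "distance_km": {
      "script": {
        "lang": "groovy",
        "inline": "doc['location'].empty ? -1 : doc['location'].arcDistanceInKm(lat, lon)",
        "params": {
          "lat": 38.9072,
          "lon": -77.0369
        }
      }
    }
  }
}
-------------------------------------

The `-1` sentinel for missing locations is an arbitrary choice made for this example.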
[float]
=== Groovy Built In Functions
There are several built-in functions that can be used within scripts.
They include:
[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`sin(a)` |Returns the trigonometric sine of an angle.
|`cos(a)` |Returns the trigonometric cosine of an angle.
|`tan(a)` |Returns the trigonometric tangent of an angle.
|`asin(a)` |Returns the arc sine of a value.
|`acos(a)` |Returns the arc cosine of a value.
|`atan(a)` |Returns the arc tangent of a value.
|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians
|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.
|`exp(a)` |Returns Euler's number _e_ raised to the power of value.
|`log(a)` |Returns the natural logarithm (base _e_) of a value.
|`log10(a)` |Returns the base 10 logarithm of a value.
|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.
|`cbrt(a)` |Returns the cube root of a double value.
|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.
|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.
|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.
|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.
|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (r,_theta_).
|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.
|`round(a)` |Returns the closest _int_ to the argument.
|`random()` |Returns a random _double_ value.
|`abs(a)` |Returns the absolute value of a value.
|`max(a, b)` |Returns the greater of two values.
|`min(a, b)` |Returns the smaller of two values.
|`ulp(d)` |Returns the size of an ulp of the argument.
|`signum(d)` |Returns the signum function of the argument.
|`sinh(x)` |Returns the hyperbolic sine of a value.
|`cosh(x)` |Returns the hyperbolic cosine of a value.
|`tanh(x)` |Returns the hyperbolic tangent of a value.
|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================
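
For illustration, one of these functions could be used in a `script_score` as sketched below. The example assumes inline Groovy scripting has been enabled; the index `my_index` and the fields `text` and `popularity` are hypothetical:

[source,js]
-------------------------------------
GET my_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "script_score": {
        "script": {
          "lang": "groovy",
          "inline": "_score * log10(doc['popularity'].value + 2)"
        }
      }
    }
  }
}
-------------------------------------

Adding `2` before taking the logarithm keeps the multiplier positive for any non-negative `popularity`.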

@ -0,0 +1,84 @@
[[modules-scripting-native]]
=== Native (Java) Scripts
Sometimes `groovy` and <<modules-scripting-expression, expression>> aren't enough. For those times you can
implement a native script.
The best way to implement a native script is to write a plugin and install it.
The plugin {plugins}/plugin-authors.html[documentation] has more information on
how to write a plugin so that Elasticsearch will properly load it.
To register the actual script you'll need to implement `NativeScriptFactory`
to construct the script. The actual script will extend either
`AbstractExecutableScript` or `AbstractSearchScript`. The second one is likely
the most useful and has several helpful subclasses you can extend like
`AbstractLongSearchScript`, `AbstractDoubleSearchScript`, and
`AbstractFloatSearchScript`. Finally, your plugin should register the native
script by declaring the `onModule(ScriptModule)` method.
If you squashed the whole thing into one class it'd look like:
[source,java]
--------------------------------------------------
public class MyNativeScriptPlugin extends Plugin {
@Override
public String name() {
return "my-native-script";
}
@Override
public String description() {
return "my native script that does something great";
}
public void onModule(ScriptModule scriptModule) {
scriptModule.registerScript("my_script", MyNativeScriptFactory.class);
}
public static class MyNativeScriptFactory implements NativeScriptFactory {
@Override
public ExecutableScript newScript(@Nullable Map<String, Object> params) {
return new MyNativeScript();
}
@Override
public boolean needsScores() {
return false;
}
}
public static class MyNativeScript extends AbstractFloatSearchScript {
@Override
public float runAsFloat() {
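// Source values arrive boxed (Integer, Double, ...), so convert via Number
// rather than casting directly to float.
float a = ((Number) source().get("a")).floatValue();
float b = ((Number) source().get("b")).floatValue();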
return a * b;
}
}
}
--------------------------------------------------
You can execute the script by specifying its `lang` as `native`, and the name
of the script as the `id`:
[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
"query": {
"function_score": {
"query": {
"match": {
"body": "foo"
}
},
"functions": [
{
"script_score": {
"script": {
"id": "my_script",
"lang" : "native"
}
}
}
]
}
}
}'
--------------------------------------------------

@ -1,10 +1,12 @@
[[modules-scripting-painless]]
== Painless Scripting Language
=== Painless Scripting Language
experimental[The Painless scripting language is new and is still marked as experimental. The syntax or API may be changed in the future in non-backwards compatible ways if required.]
_Painless_ is a simple, secure scripting language built in to Elasticsearch as a module.
It is designed specifically for use with Elasticsearch and can safely be used dynamically.
_Painless_ is a simple, secure scripting language available in Elasticsearch
by default. It is designed specifically for use with Elasticsearch and can
safely be used with `inline` and `stored` scripting, which is enabled by
default.
A Painless script is essentially a single function. Painless does not provide support
for defining multiple functions within a script. The Painless syntax is similar to

@ -1,761 +0,0 @@
[[modules-scripting]]
== Scripting
The scripting module enables you to use scripts to evaluate custom
expressions. For example, you could use a script to return "script fields"
as part of a search request or evaluate a custom score for a query.
TIP: Elasticsearch now has a built-in scripting language called _Painless_
that provides a more secure alternative for implementing
scripts for Elasticsearch. We encourage you to try it out--
for more information, see <<modules-scripting-painless, Painless Scripting Language>>.
The default scripting language is http://groovy-lang.org/[groovy]
(http://mvel.codehaus.org/[mvel] was the default in 1.3.x and earlier).
Additional `lang` plugins enable you to run scripts written in other languages.
Everywhere a script can be used, you can include a `lang` parameter
to specify the language of the script. Plugins are available for the following languages:
[cols="<,<,<",options="header",]
|=======================================================================
|Language |Sandboxed |Required plugin
|groovy |no |built-in
|expression |yes |built-in
|mustache |yes |built-in
|painless |yes |built-in (module)
|javascript |no |{plugins}/lang-javascript.html[elasticsearch-lang-javascript]
|python |no |{plugins}/lang-python.html[elasticsearch-lang-python]
|=======================================================================
.Groovy dynamic scripting off by default from v1.4.3
[IMPORTANT]
===================================================
Groovy dynamic scripting is off by default. This prevents Groovy scripts
from being accepted as part of a request or retrieved from the
`.scripts` index. You can still use Groovy file scripts stored in
the `config/scripts/` directory on every node.
To convert an inline script to a file-based script, save the contents
of the `inline` field to a file with the `.groovy` extension and
store it in the `config/scripts` directory on every data node in your
cluster.
For example, if you have the following inline script:
[source,js]
-----------------------------------
GET /_search
{
"script_fields": {
"my_field": {
"inline": "1 + my_var",
"params": {
"my_var": 2
}
}
}
}
-----------------------------------
Save `1 + my_var` in a file called `config/scripts/my_script.groovy`.
To use the script in a request, specify its name (without the `.groovy` extension) in the `file` field:
[source,js]
-----------------------------------
GET /_search
{
"script_fields": {
"my_field": {
"script": {
"file": "my_script",
"params": {
"my_var": 2
}
}
}
}
}
-----------------------------------
===================================================
[float]
=== File-based Scripts
To increase security, Elasticsearch does not allow you to specify scripts for
non-sandboxed languages with a request. Instead, scripts must be placed in the
`scripts` directory inside the configuration directory (the directory where
elasticsearch.yml is). The default location of this `scripts` directory can be
changed by setting `path.scripts` in elasticsearch.yml. Scripts placed into
this directory will automatically be picked up and be available to be used.
Once a script has been placed in this directory, it can be referenced by name.
For example, a script called `calculate-score.groovy` can be referenced in a
request like this:
[source,sh]
--------------------------------------------------
$ tree config
config
├── elasticsearch.yml
├── logging.yml
└── scripts
└── calculate-score.groovy
--------------------------------------------------
[source,sh]
--------------------------------------------------
$ cat config/scripts/calculate-score.groovy
log(_score * 2) + my_modifier
--------------------------------------------------
[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
"query": {
"function_score": {
"query": {
"match": {
"body": "foo"
}
},
"functions": [
{
"script_score": {
"script": {
"lang": "groovy",
"file": "calculate-score",
"params": {
"my_modifier": 8
}
}
}
}
]
}
}
}'
--------------------------------------------------
The name of the script is derived from the hierarchy of directories it
exists under, and the file name without the lang extension. For example,
a script placed under `config/scripts/group1/group2/test.py` will be
named `group1_group2_test`.
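
Following that naming rule, a request referencing the nested script might look like the sketch below. It assumes the `lang-python` plugin is installed; the script field name `my_field` is hypothetical:

[source,js]
--------------------------------------------------
GET /_search
{
  "script_fields": {
    "my_field": {
      "script": {
        "lang": "python",
        "file": "group1_group2_test"
      }
    }
  }
}
--------------------------------------------------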
[float]
[[modules-scripting-stored-scripts]]
=== Stored Scripts
Elasticsearch allows you to store scripts in the cluster state.
REST endpoints are available to manage stored scripts.
Requests to the scripts endpoint look like:
[source,js]
-----------------------------------
/_scripts/{lang}/{id}
-----------------------------------
Where the `lang` part is the language the script is in and the `id` part is the id
of the script.
[source,js]
-----------------------------------
curl -XPOST localhost:9200/_scripts/groovy/calculateScore -d '{
"script": "log(_score * 2) + my_modifier"
}'
-----------------------------------
This will store the script under the id `calculateScore` in the cluster
state.
This script can be accessed at query time by using the `id` and `lang` script parameters:
[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
"query": {
"function_score": {
"query": {
"match": {
"body": "foo"
}
},
"functions": [
{
"script_score": {
"script": {
"id": "calculateScore",
"lang" : "groovy",
"params": {
"my_modifier": 8
}
}
}
}
]
}
}
}'
--------------------------------------------------
The script can be viewed by:
[source,js]
-----------------------------------
curl -XGET localhost:9200/_scripts/groovy/calculateScore
-----------------------------------
This is rendered as:
[source,js]
-----------------------------------
'{
"script": "log(_score * 2) + my_modifier"
}'
-----------------------------------
Stored scripts can be deleted by:
[source,js]
-----------------------------------
curl -XDELETE localhost:9200/_scripts/groovy/calculateScore
-----------------------------------
NOTE: The size of stored scripts is limited to 65535 bytes. This soft limit can be raised with the `script.max_size_in_bytes`
setting, but if scripts are really large then alternatives like native scripts should be considered.
[float]
[[enable-dynamic-scripting]]
=== Enabling dynamic scripting
We recommend running Elasticsearch behind an application or proxy, which
protects Elasticsearch from the outside world. If users are allowed to run
inline scripts (even in a search request) or stored scripts, then they have
the same access to your box as the user that Elasticsearch is running as. For
this reason dynamic scripting is allowed only for sandboxed languages by default.
First, you should not run Elasticsearch as the `root` user, as this would allow
a script to access or do *anything* on your server, without limitations. Second,
you should not expose Elasticsearch directly to users, but instead have a proxy
application in between. If you *do* intend to expose Elasticsearch directly to
your users, then you have to decide whether you trust them enough to run scripts
on your box or not.
It is possible to enable scripts based on their source, for
every script engine, through the following settings that need to be added to the
`config/elasticsearch.yml` file on every node.
[source,yaml]
-----------------------------------
script.inline: true
script.stored: true
-----------------------------------
While this still allows execution of named scripts provided in the config, or
_native_ Java scripts registered through plugins, it also allows users to run
arbitrary scripts via the API. Rather than sending the name of the file as the
script, the body of the script can be sent directly or retrieved from the
cluster state if previously stored.
There are three possible configuration values for any of the fine-grained
script settings:
[cols="<,<",options="header",]
|=======================================================================
|Value |Description
| `false` |scripting is turned off completely, in the context of the setting being set.
| `true` |scripting is turned on, in the context of the setting being set.
| `sandbox` |scripts may be executed only for languages that are sandboxed
|=======================================================================
The default values are the following:
[source,yaml]
-----------------------------------
script.inline: sandbox
script.stored: sandbox
script.file: true
-----------------------------------
NOTE: Global scripting settings affect the `mustache` scripting language.
<<search-template,Search templates>> internally use the `mustache` language,
and will still be enabled by default as the `mustache` engine is sandboxed,
but they will be enabled/disabled according to fine-grained settings
specified in `elasticsearch.yml`.
It is also possible to control which operations can execute scripts. The
supported operations are:
[cols="<,<",options="header",]
|=======================================================================
|Value |Description
| `aggs` |Aggregations (wherever they may be used)
| `search` |Search API, Percolator API and Suggester API (e.g. filters, script_fields)
| `update` |Update API
| `plugin` |Any plugin that makes use of scripts under the generic `plugin` category
|=======================================================================
Plugins can also define custom operations that they use scripts for instead
of using the generic `plugin` category. Those operations can be referred to
in the following form: `${pluginName}_${operation}`.
The following example disables scripting for `update` and `mapping` operations,
regardless of the script source, for any engine. Scripts can still be
executed from sandboxed languages as part of `aggregations`, `search`
and plugins execution though, as the above defaults still get applied.
[source,yaml]
-----------------------------------
script.update: false
script.mapping: false
-----------------------------------
Generic settings get applied in order: operation-based ones have precedence
over source-based ones. Language-specific settings are supported too. They
need to be prefixed with the `script.engine.<engine>` prefix and have
precedence over any other generic settings.
[source,yaml]
-----------------------------------
script.engine.groovy.file.aggs: true
script.engine.groovy.file.mapping: true
script.engine.groovy.file.search: true
script.engine.groovy.file.update: true
script.engine.groovy.file.plugin: true
script.engine.groovy.stored.aggs: true
script.engine.groovy.stored.mapping: false
script.engine.groovy.stored.search: true
script.engine.groovy.stored.update: false
script.engine.groovy.stored.plugin: false
script.engine.groovy.inline.aggs: true
script.engine.groovy.inline.mapping: false
script.engine.groovy.inline.search: false
script.engine.groovy.inline.update: false
script.engine.groovy.inline.plugin: false
-----------------------------------
[float]
=== Default Scripting Language
The default scripting language (assuming no `lang` parameter is provided) is
`groovy`. To change it, set `script.default_lang` to the
appropriate language.
[float]
=== Automatic Script Reloading
The `config/scripts` directory is scanned periodically for changes.
New and changed scripts are reloaded, and deleted scripts are removed
from the preloaded scripts cache. The reload frequency can be specified
using the `resource.reload.interval` setting, which defaults to `60s`.
To disable script reloading completely, set `script.auto_reload_enabled`
to `false`.
[[native-java-scripts]]
[float]
=== Native (Java) Scripts
Sometimes `groovy` and `expression` scripts aren't enough. For those times you can
implement a native script.
The best way to implement a native script is to write a plugin and install it.
The plugin {plugins}/plugin-authors.html[documentation] has more information on
how to write a plugin so that Elasticsearch will properly load it.
To register the actual script you'll need to implement `NativeScriptFactory`
to construct the script. The actual script will extend either
`AbstractExecutableScript` or `AbstractSearchScript`. The second one is likely
the most useful and has several helpful subclasses you can extend like
`AbstractLongSearchScript`, `AbstractDoubleSearchScript`, and
`AbstractFloatSearchScript`. Finally, your plugin should register the native
script by declaring the `onModule(ScriptModule)` method.
If you squashed the whole thing into one class it'd look like:
[source,java]
--------------------------------------------------
public class MyNativeScriptPlugin extends Plugin {
@Override
public String name() {
return "my-native-script";
}
@Override
public String description() {
return "my native script that does something great";
}
public void onModule(ScriptModule scriptModule) {
scriptModule.registerScript("my_script", MyNativeScriptFactory.class);
}
public static class MyNativeScriptFactory implements NativeScriptFactory {
@Override
public ExecutableScript newScript(@Nullable Map<String, Object> params) {
return new MyNativeScript();
}
@Override
public boolean needsScores() {
return false;
}
}
public static class MyNativeScript extends AbstractFloatSearchScript {
@Override
public float runAsFloat() {
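// Source values may be any Number (e.g. Integer or Double), so convert rather than cast to float.
float a = ((Number) source().get("a")).floatValue();
float b = ((Number) source().get("b")).floatValue();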
return a * b;
}
}
}
--------------------------------------------------
You can execute the script by specifying its `lang` as `native`, and the name
of the script as the `id`:
[source,js]
--------------------------------------------------
curl -XPOST localhost:9200/_search -d '{
"query": {
"function_score": {
"query": {
"match": {
"body": "foo"
}
},
"functions": [
{
"script_score": {
"script": {
"id": "my_script",
"lang" : "native"
}
}
}
]
}
}
}'
--------------------------------------------------
[float]
=== Lucene Expressions Scripts
experimental[The Lucene expressions module is undergoing significant development and the exposed functionality is likely to change in the future]
Lucene's expressions module provides a mechanism to compile a
`javascript` expression to bytecode. This allows very fast execution,
as if you had written a `native` script. Expression scripts can be
used in `script_score`, `script_fields`, sort scripts and numeric aggregation scripts.
See the link:http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html[expressions module documentation]
for details on what operators and functions are available.
Variables in `expression` scripts are available to access:
* document fields, e.g. `doc['myfield'].value`
* variables and methods that the field supports, e.g. `doc['myfield'].empty`
* Parameters passed into the script, e.g. `mymodifier`
* The current document's score, `_score` (only available when used in a `script_score`)
[float]
=== Expressions API for numeric fields
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it's a short type, it will be short.
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].min()` |The minimum value of the field in this document.
|`doc['field_name'].max()` |The maximum value of the field in this document.
|`doc['field_name'].median()` |The median value of the field in this document.
|`doc['field_name'].avg()` |The average of the values in this document.
|`doc['field_name'].sum()` |The sum of the values in this document.
|`doc['field_name'].count()` |The number of values in this document.
|=======================================================================
When a document is missing the field completely, by default the value will be treated as `0`.
You can treat it as another value instead, e.g. `doc['myfield'].empty ? 100 : doc['myfield'].value`.
When a document has multiple values for the field, by default the minimum value is returned.
You can choose a different value instead, e.g. `doc['myfield'].sum()`.
Boolean fields are exposed as numerics, with `true` mapped to `1` and `false` mapped to `0`.
For example: `doc['on_sale'] ? doc['price'] * 0.5 : doc['price']`
[float]
=== Additional methods for date fields
Date fields are treated as the number of milliseconds since January 1, 1970 and
support the numeric API above, with these additional methods:
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].getYear()` |Year component, e.g. `1970`.
|`doc['field_name'].getMonth()` |Month component (0-11), e.g. `0` for January.
|`doc['field_name'].getDayOfMonth()` |Day component, e.g. `1` for the first of the month.
|`doc['field_name'].getHourOfDay()` |Hour component (0-23)
|`doc['field_name'].getMinutes()` |Minutes component (0-59)
|`doc['field_name'].getSeconds()` |Seconds component (0-59)
|=======================================================================
The following example shows the difference in years between the `date` fields date0 and date1:
`doc['date1'].getYear() - doc['date0'].getYear()`
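As a rough illustration of what this expression computes, the following standalone Java sketch derives the same year difference from two epoch-millisecond values. It is not Elasticsearch code: the class `YearDifferenceSketch` and its methods are hypothetical, and it assumes that the year component is evaluated in UTC.
[source,java]
--------------------------------------------------
import java.time.Instant;
import java.time.ZoneOffset;

final class YearDifferenceSketch {
    // Year component of an epoch-millisecond timestamp, evaluated in UTC (an assumption).
    static int yearOf(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getYear();
    }

    // Mirrors doc['date1'].getYear() - doc['date0'].getYear().
    static int yearDifference(long date0Millis, long date1Millis) {
        return yearOf(date1Millis) - yearOf(date0Millis);
    }
}
--------------------------------------------------
Only the year components are subtracted, so two dates that lie one second apart across a new-year boundary differ by `1`.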
[float]
=== Expressions API for `geo_point` fields
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].lat` |The latitude of the geo point.
|`doc['field_name'].lon` |The longitude of the geo point.
|=======================================================================
The following example computes distance in kilometers from Washington, DC:
`haversin(38.9072, -77.0369, doc['field_name'].lat, doc['field_name'].lon)`
In this example the coordinates could have been passed as parameters to the script,
e.g. based on geolocation of the user.
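For readers unfamiliar with the haversine formula, here is a minimal standalone Java sketch of a great-circle distance in kilometers. It is an illustration only, not the expressions module's implementation: the class `HaversineSketch`, the method `haversinKm`, and the Earth radius constant are assumptions. Washington, DC lies at about 38.9072° N, 77.0369° W, so its longitude is negative in signed decimal degrees.
[source,java]
--------------------------------------------------
final class HaversineSketch {
    // Mean Earth radius in kilometers (an assumption; other implementations may use a different constant).
    static final double EARTH_RADIUS_KM = 6371.0088;

    // Great-circle distance in kilometers between two points given in signed decimal degrees.
    static double haversinKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 to guard against rounding error for near-antipodal points.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
--------------------------------------------------
For example, `HaversineSketch.haversinKm(38.9072, -77.0369, 40.7128, -74.0060)` gives the approximate distance from Washington, DC to New York City.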
[float]
=== Expressions limitations
There are a few limitations relative to other script languages:
* Only numeric, boolean, date, and geo_point fields may be accessed
* Stored fields are not available
[float]
=== Score
In all scripts that can be used in aggregations, the current
document's score is accessible in `_score`.
[float]
=== Computing scores based on terms in scripts
See the <<modules-advanced-scripting, advanced scripting documentation>>.
[float]
=== Document Fields
Most scripting revolves around the use of specific document field data.
The `doc['field_name']` notation can be used to access specific field data within
a document (the document in question is usually derived from the context
in which the script is used). Document fields are very fast to access since they
end up being loaded into memory (all the relevant field values/tokens
are loaded to memory). Note, however, that the `doc[...]` notation only
allows for simple valued fields (it can't return a JSON object)
and makes sense only on non-analyzed or single term based fields.
The following data can be extracted from a field:
[cols="<,<",options="header",]
|=======================================================================
|Expression |Description
|`doc['field_name'].value` |The native value of the field. For example,
if it's a short type, it will be short.
|`doc['field_name'].values` |The native array values of the field. For
example, if it's a short type, it will be short[]. Remember, a field can
have several values within a single doc. Returns an empty array if the
field has no values.
|`doc['field_name'].empty` |A boolean indicating if the field has no
values within the doc.
|`doc['field_name'].multiValued` |A boolean indicating that the field
has several values within the corpus.
|`doc['field_name'].lat` |The latitude of a geo point type.
|`doc['field_name'].lon` |The longitude of a geo point type.
|`doc['field_name'].lats` |The latitudes of a geo point type.
|`doc['field_name'].lons` |The longitudes of a geo point type.
|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceWithDefault(lat, lon, default)` |The `plane` distance (in meters)
of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].distanceInMiles(lat, lon)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceInMilesWithDefault(lat, lon, default)` |The `plane` distance (in
miles) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].distanceInKmWithDefault(lat, lon, default)` |The `plane` distance (in
km) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceWithDefault(lat, lon, default)` |The `arc` distance (in
meters) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].arcDistanceInMiles(lat, lon)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)` |The `arc` distance (in
miles) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon.
|`doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)` |The `arc` distance (in
km) of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].factorDistance(lat, lon)` |The distance factor of this geo point field from the provided lat/lon.
|`doc['field_name'].factorDistance(lat, lon, default)` |The distance factor of this geo point field from the provided lat/lon with a default value.
|`doc['field_name'].geohashDistance(geohash)` |The `arc` distance (in meters)
of this geo point field from the provided geohash.
|`doc['field_name'].geohashDistanceInKm(geohash)` |The `arc` distance (in km)
of this geo point field from the provided geohash.
|`doc['field_name'].geohashDistanceInMiles(geohash)` |The `arc` distance (in
miles) of this geo point field from the provided geohash.
|=======================================================================
[float]
=== Stored Fields
Stored fields can also be accessed when executing a script. Note, they
are much slower to access compared with document fields, as they are not
loaded into memory. They can be simply accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.
[float]
=== Accessing the score of a document within a script
When using scripting for calculating the score of a document (for instance, with
the `function_score` query), you can access the score using the `_score`
variable inside of a Groovy script.
[float]
=== Source Field
The source field can also be accessed when executing a script. The
source field is loaded per doc, parsed, and then provided to the script
for evaluation. The `_source` forms the context under which the source
field can be accessed, for example `_source.obj2.obj1.field3`.
Accessing `_source` is much slower compared to using `doc`
but the data is not loaded into memory. For a single field access `_fields` may be
faster than using `_source` due to the extra overhead of potentially parsing large documents.
However, `_source` may be faster if you access multiple fields or if the source has already been
loaded for other purposes.
[float]
=== Groovy Built In Functions
There are several built in functions that can be used within scripts.
They include:
[cols="<,<",options="header",]
|=======================================================================
|Function |Description
|`sin(a)` |Returns the trigonometric sine of an angle.
|`cos(a)` |Returns the trigonometric cosine of an angle.
|`tan(a)` |Returns the trigonometric tangent of an angle.
|`asin(a)` |Returns the arc sine of a value.
|`acos(a)` |Returns the arc cosine of a value.
|`atan(a)` |Returns the arc tangent of a value.
|`toRadians(angdeg)` |Converts an angle measured in degrees to an
approximately equivalent angle measured in radians
|`toDegrees(angrad)` |Converts an angle measured in radians to an
approximately equivalent angle measured in degrees.
|`exp(a)` |Returns Euler's number _e_ raised to the power of value.
|`log(a)` |Returns the natural logarithm (base _e_) of a value.
|`log10(a)` |Returns the base 10 logarithm of a value.
|`sqrt(a)` |Returns the correctly rounded positive square root of a
value.
|`cbrt(a)` |Returns the cube root of a double value.
|`IEEEremainder(f1, f2)` |Computes the remainder operation on two
arguments as prescribed by the IEEE 754 standard.
|`ceil(a)` |Returns the smallest (closest to negative infinity) value
that is greater than or equal to the argument and is equal to a
mathematical integer.
|`floor(a)` |Returns the largest (closest to positive infinity) value
that is less than or equal to the argument and is equal to a
mathematical integer.
|`rint(a)` |Returns the value that is closest in value to the argument
and is equal to a mathematical integer.
|`atan2(y, x)` |Returns the angle _theta_ from the conversion of
rectangular coordinates (_x_, _y_) to polar coordinates (r,_theta_).
|`pow(a, b)` |Returns the value of the first argument raised to the
power of the second argument.
|`round(a)` |Returns the closest _int_ to the argument.
|`random()` |Returns a random _double_ value.
|`abs(a)` |Returns the absolute value of a value.
|`max(a, b)` |Returns the greater of two values.
|`min(a, b)` |Returns the smaller of two values.
|`ulp(d)` |Returns the size of an ulp of the argument.
|`signum(d)` |Returns the signum function of the argument.
|`sinh(x)` |Returns the hyperbolic sine of a value.
|`cosh(x)` |Returns the hyperbolic cosine of a value.
|`tanh(x)` |Returns the hyperbolic tangent of a value.
|`hypot(x, y)` |Returns sqrt(_x_^2^ + _y_^2^) without intermediate overflow
or underflow.
|=======================================================================


@ -1,5 +1,139 @@
[[modules-scripting-security]]
=== Scripting and the Java Security Manager
=== Scripting and security
You should never run Elasticsearch as the `root` user, as this would allow a
script to access or do *anything* on your server, without limitations.
You should not expose Elasticsearch directly to users, but instead have a
proxy application in between. If you *do* intend to expose Elasticsearch
directly to your users, then you have to decide whether you trust them enough
to run scripts on your box or not, and apply the appropriate safety measures.
[[enable-dynamic-scripting]]
[float]
=== Enabling dynamic scripting
The `script.*` settings allow for <<security-script-fine,fine-grained>>
control of which script languages (e.g. `groovy`, `painless`) are allowed to
run in which context (e.g. `search`, `aggs`, `update`), and where the script
source is allowed to come from (i.e. `inline`, `stored`, `file`).
For instance, the following setting enables `inline` `update` scripts for
`groovy`:
[source,yaml]
----------------
script.engine.groovy.inline.update: true
----------------
Less fine-grained settings exist which allow you to enable or disable scripts
for all sources, all languages, or all contexts. The following settings
enable `inline` and `stored` scripts for all languages in all contexts:
[source,yaml]
-----------------------------------
script.inline: true
script.stored: true
-----------------------------------
WARNING: The above settings mean that anybody who can send requests to your
Elasticsearch instance can run whatever scripts they choose! This is a
security risk and may well lead to your Elasticsearch cluster being
compromised.
[[security-script-source]]
[float]
=== Script source settings
Scripts may be enabled or disabled depending on their source: `inline`,
`stored` in the cluster state, or from a `file` on each node in the cluster.
Each of these settings takes one of these values:
[horizontal]
`false`:: Scripting is disabled.
`true`:: Scripting is enabled.
`sandbox`:: Scripting is enabled only for sandboxed languages.
The default values are the following:
[source,yaml]
-----------------------------------
script.inline: sandbox
script.stored: sandbox
script.file: true
-----------------------------------
NOTE: Global scripting settings affect the `mustache` scripting language.
<<search-template,Search templates>> internally use the `mustache` language,
and will still be enabled by default as the `mustache` engine is sandboxed,
but they will be enabled/disabled according to fine-grained settings
specified in `elasticsearch.yml`.
[[security-script-context]]
[float]
=== Script context settings
Scripting may also be enabled or disabled in different contexts in the
Elasticsearch API. The supported contexts are:
[horizontal]
`aggs`:: Aggregations
`search`:: Search API, Percolator API and Suggester API
`update`:: Update API
`plugin`:: Any plugin that makes use of scripts under the generic `plugin` category
Plugins can also define custom operations that they use scripts for instead
of using the generic `plugin` category. Those operations can be referred to
in the following form: `${pluginName}_${operation}`.
The following example disables scripting for `update` and `plugin` operations,
regardless of the script source or language. Scripts can still be executed
from sandboxed languages as part of `aggs` and `search` operations though,
as the above defaults still get applied.
[source,yaml]
-----------------------------------
script.update: false
script.plugin: false
-----------------------------------
[[security-script-fine]]
[float]
=== Fine-grained script settings
First, the high-level script settings described above are applied in order
(context settings have precedence over source settings). Then, fine-grained
settings which include the script language take precedence over any high-level
settings.
Fine-grained settings have the form:
[source,yaml]
------------------------
script.engine.{lang}.{source}.{context}: true|false
------------------------
For example:
[source,yaml]
-----------------------------------
script.inline: false <1>
script.stored: false <1>
script.file: false <1>
script.engine.groovy.stored.search: true <2>
script.engine.groovy.stored.aggs: true <2>
script.engine.mustache.stored.search: true <3>
-----------------------------------
<1> Disable all scripting from any source.
<2> Allow stored Groovy scripts to be used for search and aggregations.
<3> Allow stored Mustache templates to be used for search.
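The precedence rules above can be modelled with a small standalone Java sketch. This is not how Elasticsearch implements them: the class `ScriptSettingsSketch`, its methods, and the explicit set of sandboxed languages are hypothetical, and the defaults are simply those listed earlier (`sandbox` for `inline` and `stored`, `true` for `file`).
[source,java]
--------------------------------------------------
import java.util.Map;
import java.util.Set;

final class ScriptSettingsSketch {
    // Documented defaults: script.file is true, script.inline and script.stored are sandbox.
    static String defaultFor(String source) {
        return source.equals("file") ? "true" : "sandbox";
    }

    // Resolve the effective value, most specific setting first:
    // fine-grained (with language), then context, then source, then the default.
    static String effectiveMode(Map<String, String> settings, String lang, String source, String context) {
        String fine = settings.get("script.engine." + lang + "." + source + "." + context);
        if (fine != null) return fine;
        String byContext = settings.get("script." + context);
        if (byContext != null) return byContext;
        String bySource = settings.get("script." + source);
        if (bySource != null) return bySource;
        return defaultFor(source);
    }

    // "sandbox" allows the script only if the language is in the (assumed) sandboxed set.
    static boolean isAllowed(Map<String, String> settings, Set<String> sandboxedLangs,
                             String lang, String source, String context) {
        String mode = effectiveMode(settings, lang, source, context);
        if (mode.equals("sandbox")) return sandboxedLangs.contains(lang);
        return Boolean.parseBoolean(mode);
    }
}
--------------------------------------------------
With the example settings above, this model allows a stored Groovy script in the `search` context but rejects the same script in the `update` context, because no fine-grained setting overrides `script.stored: false` there.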
[[java-security-manager]]
[float]
=== Java Security Manager
Elasticsearch runs with the https://docs.oracle.com/javase/tutorial/essential/environment/security.html[Java Security Manager]
enabled by default. The security policy in Elasticsearch locks down the


@ -0,0 +1,238 @@
[[modules-scripting-using]]
=== How to use scripts
Wherever scripting is supported in the Elasticsearch API, the syntax follows
the same pattern:
[source,js]
-------------------------------------
"script": {
"lang": "...", <1>
"inline" | "id" | "file": "...", <2>
"params": { ... } <3>
}
-------------------------------------
<1> The language the script is written in, which defaults to `groovy`.
<2> The script itself which may be specified as `inline`, `id`, or `file`.
<3> Any named parameters that should be passed into the script.
For example, the following script is used in a search request to return a
<<search-request-script-fields, scripted field>>:
[source,js]
-------------------------------------
PUT my_index/my_type/1
{
"my_field": 5
}
GET my_index/_search
{
"script_fields": {
"my_doubled_field": {
"script": {
"lang": "expression",
"inline": "doc['my_field'] * multiplier",
"params": {
"multiplier": 2
}
}
}
}
}
-------------------------------------
// AUTOSENSE
[float]
=== Script Parameters
`lang`::
Specifies the language the script is written in. Defaults to `groovy` but
may be set to any of the languages listed in <<modules-scripting>>. The
default language may be changed in the `elasticsearch.yml` config file by
setting `script.default_lang` to the appropriate language.
`inline`, `id`, `file`::
Specifies the source of the script. An `inline` script is specified
`inline` as in the example above, a stored script with the specified `id`
is retrieved from the cluster state (see <<modules-scripting-stored-scripts,Stored Scripts>>),
and a `file` script is retrieved from a file in the `config/scripts`
directory (see <<modules-scripting-file-scripts, File Scripts>>).
+
While languages like `expression` and `painless` can be used out of the box as
inline or stored scripts, other languages like `groovy` can only be
specified as `file` unless you first adjust the default
<<modules-scripting-security,scripting security settings>>.
`params`::
Specifies any named parameters that are passed into the script as
variables.
[IMPORTANT]
.Prefer parameters
========================================
The first time Elasticsearch sees a new script, it compiles it and stores the
compiled version in a cache. Compilation can be a heavy process.
If you need to pass variables into the script, you should pass them in as
named `params` instead of hard-coding values into the script itself. For
example, if you want to be able to multiply a field value by different
multipliers, don't hard-code the multiplier into the script:
[source,js]
----------------------
"inline": "doc['my_field'] * 2"
----------------------
Instead, pass it in as a named parameter:
[source,js]
----------------------
"inline": "doc['my_field'] * multiplier",
"params": {
"multiplier": 2
}
----------------------
The first version has to be recompiled every time the multiplier changes. The
second version is only compiled once.
========================================
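The effect of hard-coded values on compilation can be sketched with a toy cache in standalone Java. This is only a model under the assumption that compiled scripts are cached by their source text and that `params` are not part of the cache key; the class `CompileCacheSketch` is hypothetical and performs no real compilation.
[source,java]
--------------------------------------------------
import java.util.HashMap;
import java.util.Map;

final class CompileCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private int compilations = 0;

    // Pretend to compile: the cache key is the script source only, never the params.
    String compile(String source) {
        return cache.computeIfAbsent(source, s -> {
            compilations++; // counts cache misses, i.e. simulated compilations
            return "compiled:" + s;
        });
    }

    // Number of simulated compilations so far.
    int compilations() {
        return compilations;
    }
}
--------------------------------------------------
In this model, requesting `doc['my_field'] * 2`, `doc['my_field'] * 3`, and `doc['my_field'] * 4` triggers three compilations, whereas requesting `doc['my_field'] * multiplier` three times with different `params` triggers only one.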
[float]
[[modules-scripting-file-scripts]]
=== File-based Scripts
To increase security, non-sandboxed languages can only be specified in script
files stored on every node in the cluster. File scripts must be saved in the
`scripts` directory whose default location depends on whether you use the
<<zip-targz-layout,`zip`/`tar.gz`>> (`$ES_HOME/config/scripts/`),
<<rpm-layout,RPM>>, or <<deb-layout,Debian>> package. The default may be
changed with the `path.scripts` setting.
Any files placed in the `scripts` directory will be compiled automatically
when the node starts up and then <<reload-scripts,every 60 seconds thereafter>>.
The file should be named as follows: `{script-name}.{lang}`. For instance,
the following example creates a Groovy script called `calculate-score`:
[source,sh]
--------------------------------------------------
echo "log(_score * 2) + my_modifier" > config/scripts/calculate-score.groovy
--------------------------------------------------
This script can be used as follows:
[source,js]
--------------------------------------------------
GET my_index/_search
{
"query": {
"script": {
"script": {
"lang": "groovy", <1>
"file": "calculate-score", <2>
"params": {
"my_modifier": 2
}
}
}
}
}
--------------------------------------------------
<1> The language of the script, which should correspond with the script file suffix.
<2> The name of the script, which should be the name of the file.
The `scripts` directory may contain sub-directories, in which case the
hierarchy of directories is flattened and concatenated with underscores. A
script in `group1/group2/my_script.groovy` should use `group1_group2_my_script`
as the `file` name.
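The flattening rule can be expressed as a short standalone Java sketch. The class `ScriptNameSketch` and the method `scriptNameFor` are hypothetical helpers for illustration, not part of Elasticsearch, and the sketch assumes a path relative to the `scripts` directory that uses `/` as the separator.
[source,java]
--------------------------------------------------
final class ScriptNameSketch {
    // relativePath is relative to the scripts directory and uses '/' separators (an assumption).
    static String scriptNameFor(String relativePath) {
        int slash = relativePath.lastIndexOf('/');
        int dot = relativePath.lastIndexOf('.');
        // Strip the language extension only if the dot belongs to the file name itself.
        String withoutExtension = dot > slash ? relativePath.substring(0, dot) : relativePath;
        // Flatten the directory hierarchy by joining path segments with underscores.
        return withoutExtension.replace('/', '_');
    }
}
--------------------------------------------------
For example, `ScriptNameSketch.scriptNameFor("group1/group2/test.py")` yields `group1_group2_test`.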
[[reload-scripts]]
[float]
==== Automatic script reloading
The `scripts` directory will be rescanned every `60s` (configurable with the
`resource.reload.interval` setting) and new, changed, or removed scripts will
be compiled, updated, or deleted from the script cache.
Script reloading can be completely disabled by setting
`script.auto_reload_enabled` to `false`.
[float]
[[modules-scripting-stored-scripts]]
=== Stored Scripts
Scripts may be stored in and retrieved from the cluster state using the
`_scripts` end-point:
[source,js]
-----------------------------------
/_scripts/{lang}/{id} <1> <2>
-----------------------------------
<1> The `lang` represents the script language.
<2> The `id` is a unique identifier or script name.
This example stores a Groovy script called `calculate-score` in the cluster
state:
[source,js]
-----------------------------------
POST /_scripts/groovy/calculate-score
{
"script": "log(_score * 2) + my_modifier"
}
-----------------------------------
// AUTOSENSE
This same script can be retrieved with:
[source,js]
-----------------------------------
GET /_scripts/groovy/calculate-score
-----------------------------------
// AUTOSENSE
or deleted with:
[source,js]
-----------------------------------
DELETE /_scripts/groovy/calculate-score
-----------------------------------
// AUTOSENSE
Stored scripts can be used by specifying the `lang` and `id` parameters as follows:
[source,js]
--------------------------------------------------
GET my_index/_search
{
"query": {
"script": {
"script": {
"lang": "groovy",
"id": "calculate-score",
"params": {
"my_modifier": 2
}
}
}
}
}
--------------------------------------------------
NOTE: The size of stored scripts is limited to 65,535 bytes. This soft limit
can be raised with the `script.max_size_in_bytes` setting, but if scripts are
really large then alternatives like <<modules-scripting-native,native>>
scripts should be considered instead.