OpenSearch/docs/painless/painless-getting-started.as...

392 lines
12 KiB
Plaintext

[[painless-getting-started]]
== Getting Started with Painless
include::painless-description.asciidoc[]
[[painless-examples]]
=== Painless Examples
To illustrate how Painless works, let's load some hockey stats into an Elasticsearch index:
[source,js]
----------------------------------------------------------------
PUT hockey/player/_bulk?refresh
{"index":{"_id":1}}
{"first":"johnny","last":"gaudreau","goals":[9,27,1],"assists":[17,46,0],"gp":[26,82,1],"born":"1993/08/13"}
{"index":{"_id":2}}
{"first":"sean","last":"monohan","goals":[7,54,26],"assists":[11,26,13],"gp":[26,82,82],"born":"1994/10/12"}
{"index":{"_id":3}}
{"first":"jiri","last":"hudler","goals":[5,34,36],"assists":[11,62,42],"gp":[24,80,79],"born":"1984/01/04"}
{"index":{"_id":4}}
{"first":"micheal","last":"frolik","goals":[4,6,15],"assists":[8,23,15],"gp":[26,82,82],"born":"1988/02/17"}
{"index":{"_id":5}}
{"first":"sam","last":"bennett","goals":[5,0,0],"assists":[8,1,0],"gp":[26,1,0],"born":"1996/06/20"}
{"index":{"_id":6}}
{"first":"dennis","last":"wideman","goals":[0,26,15],"assists":[11,30,24],"gp":[26,81,82],"born":"1983/03/20"}
{"index":{"_id":7}}
{"first":"david","last":"jones","goals":[7,19,5],"assists":[3,17,4],"gp":[26,45,34],"born":"1984/08/10"}
{"index":{"_id":8}}
{"first":"tj","last":"brodie","goals":[2,14,7],"assists":[8,42,30],"gp":[26,82,82],"born":"1990/06/07"}
{"index":{"_id":39}}
{"first":"mark","last":"giordano","goals":[6,30,15],"assists":[3,30,24],"gp":[26,60,63],"born":"1983/10/03"}
{"index":{"_id":10}}
{"first":"mikael","last":"backlund","goals":[3,15,13],"assists":[6,24,18],"gp":[26,82,82],"born":"1989/03/17"}
{"index":{"_id":11}}
{"first":"joe","last":"colborne","goals":[3,18,13],"assists":[6,20,24],"gp":[26,67,82],"born":"1990/01/30"}
----------------------------------------------------------------
// CONSOLE
// TESTSETUP
[float]
==== Accessing Doc Values from Painless
Document values can be accessed from a `Map` named `doc`.
For example, the following script calculates a player's total goals. This example uses a strongly typed `int` and a `for` loop.
[source,js]
----------------------------------------------------------------
GET hockey/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"lang": "painless",
"source": """
int total = 0;
for (int i = 0; i < doc['goals'].length; ++i) {
total += doc['goals'][i];
}
return total;
"""
}
}
}
}
}
----------------------------------------------------------------
// CONSOLE
Alternatively, you could do the same thing using a script field instead of a function score:
[source,js]
----------------------------------------------------------------
GET hockey/_search
{
"query": {
"match_all": {}
},
"script_fields": {
"total_goals": {
"script": {
"lang": "painless",
"source": """
int total = 0;
for (int i = 0; i < doc['goals'].length; ++i) {
total += doc['goals'][i];
}
return total;
"""
}
}
}
}
----------------------------------------------------------------
// CONSOLE
The following example uses a Painless script to sort the players by their combined first and last names. The names are accessed using
`doc['first'].value` and `doc['last'].value`.
[source,js]
----------------------------------------------------------------
GET hockey/_search
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "string",
"order": "asc",
"script": {
"lang": "painless",
"source": "doc['first.keyword'].value + ' ' + doc['last.keyword'].value"
}
}
}
}
----------------------------------------------------------------
// CONSOLE
[float]
==== Updating Fields with Painless
You can also easily update fields. You access the original source for a field as `ctx._source.<field-name>`.
First, let's look at the source data for a player by submitting the following request:
[source,js]
----------------------------------------------------------------
GET hockey/_search
{
"stored_fields": [
"_id",
"_source"
],
"query": {
"term": {
"_id": 1
}
}
}
----------------------------------------------------------------
// CONSOLE
To change player 1's last name to `hockey`, simply set `ctx._source.last` to the new value:
[source,js]
----------------------------------------------------------------
POST hockey/player/1/_update
{
"script": {
"lang": "painless",
"source": "ctx._source.last = params.last",
"params": {
"last": "hockey"
}
}
}
----------------------------------------------------------------
// CONSOLE
You can also add fields to a document. For example, this script adds a new field that contains
the player's nickname, _hockey_.
[source,js]
----------------------------------------------------------------
POST hockey/player/1/_update
{
"script": {
"lang": "painless",
"source": """
ctx._source.last = params.last;
ctx._source.nick = params.nick
""",
"params": {
"last": "gaudreau",
"nick": "hockey"
}
}
}
----------------------------------------------------------------
// CONSOLE
[float]
[[modules-scripting-painless-dates]]
==== Dates
Date fields are exposed as
`ReadableDateTime`
so they support methods like
`getYear`,
and `getDayOfWeek`.
To get milliseconds since epoch call
`getMillis`.
For example, the following returns every hockey player's birth year:
[source,js]
----------------------------------------------------------------
GET hockey/_search
{
"script_fields": {
"birth_year": {
"script": {
"source": "doc.born.value.year"
}
}
}
}
----------------------------------------------------------------
// CONSOLE
[float]
[[modules-scripting-painless-regex]]
==== Regular expressions
NOTE: Regexes are disabled by default because they circumvent Painless's
protection against long running and memory hungry scripts. To make matters
worse even innocuous looking regexes can have staggering performance and stack
depth behavior. They remain an amazing powerful tool but are too scary to enable
by default. To enable them yourself set `script.painless.regex.enabled: true` in
`elasticsearch.yml`. We'd like very much to have a safe alternative
implementation that can be enabled by default so check this space for later
developments!
Painless's native support for regular expressions has syntax constructs:
* `/pattern/`: Pattern literals create patterns. This is the only way to create
a pattern in painless. The pattern inside the ++/++'s are just
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expressions].
See <<pattern-flags>> for more.
* `=~`: The find operator return a `boolean`, `true` if a subsequence of the
text matches, `false` otherwise.
* `==~`: The match operator returns a `boolean`, `true` if the text matches,
`false` if it doesn't.
Using the find operator (`=~`) you can update all hockey players with "b" in
their last name:
[source,js]
----------------------------------------------------------------
POST hockey/_update_by_query
{
"script": {
"lang": "painless",
"source": """
if (ctx._source.last =~ /b/) {
ctx._source.last += "matched";
} else {
ctx.op = "noop";
}
"""
}
}
----------------------------------------------------------------
// CONSOLE
Using the match operator (`==~`) you can update all the hockey players whose
names start with a consonant and end with a vowel:
[source,js]
----------------------------------------------------------------
POST hockey/_update_by_query
{
"script": {
"lang": "painless",
"source": """
if (ctx._source.last ==~ /[^aeiou].*[aeiou]/) {
ctx._source.last += "matched";
} else {
ctx.op = "noop";
}
"""
}
}
----------------------------------------------------------------
// CONSOLE
You can use the `Pattern.matcher` directly to get a `Matcher` instance and
remove all of the vowels in all of their last names:
[source,js]
----------------------------------------------------------------
POST hockey/_update_by_query
{
"script": {
"lang": "painless",
"source": "ctx._source.last = /[aeiou]/.matcher(ctx._source.last).replaceAll('')"
}
}
----------------------------------------------------------------
// CONSOLE
`Matcher.replaceAll` is just a call to Java's `Matcher`'s
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#replaceAll-java.lang.String-[replaceAll]
method so it supports `$1` and `\1` for replacements:
[source,js]
----------------------------------------------------------------
POST hockey/_update_by_query
{
"script": {
"lang": "painless",
"source": "ctx._source.last = /n([aeiou])/.matcher(ctx._source.last).replaceAll('$1')"
}
}
----------------------------------------------------------------
// CONSOLE
If you need more control over replacements you can call `replaceAll` on a
`CharSequence` with a `Function<Matcher, String>` that builds the replacement.
This does not support `$1` or `\1` to access replacements because you already
have a reference to the matcher and can get them with `m.group(1)`.
IMPORTANT: Calling `Matcher.find` inside of the function that builds the
replacement is rude and will likely break the replacement process.
This will make all of the vowels in the hockey player's last names upper case:
[source,js]
----------------------------------------------------------------
POST hockey/_update_by_query
{
"script": {
"lang": "painless",
"source": """
ctx._source.last = ctx._source.last.replaceAll(/[aeiou]/, m ->
m.group().toUpperCase(Locale.ROOT))
"""
}
}
----------------------------------------------------------------
// CONSOLE
Or you can use the `CharSequence.replaceFirst` to make the first vowel in their
last names upper case:
[source,js]
----------------------------------------------------------------
POST hockey/_update_by_query
{
"script": {
"lang": "painless",
"source": """
ctx._source.last = ctx._source.last.replaceFirst(/[aeiou]/, m ->
m.group().toUpperCase(Locale.ROOT))
"""
}
}
----------------------------------------------------------------
// CONSOLE
Note: all of the `_update_by_query` examples above could really do with a
`query` to limit the data that they pull back. While you *could* use a
{ref}/query-dsl-script-query.html[script query] it wouldn't be as efficient
as using any other query because script queries aren't able to use the inverted
index to limit the documents that they have to check.
[[modules-scripting-painless-dispatch]]
=== How painless dispatches functions
Painless uses receiver, name, and https://en.wikipedia.org/wiki/Arity[arity]
for method dispatch. For example, `s.foo(a, b)` is resolved by first getting
the class of `s` and then looking up the method `foo` with two parameters. This
is different from Groovy which uses the
https://en.wikipedia.org/wiki/Multiple_dispatch[runtime types] of the
parameters and Java which uses the compile time types of the parameters.
The consequence of this that Painless doesn't support overloaded methods like
Java, leading to some trouble when it whitelists classes from the Java
standard library. For example, in Java and Groovy, `Matcher` has two methods:
`group(int)` and `group(String)`. Painless can't whitelist both of these methods
because they have the same name and the same number of parameters. So instead it
has `group(int)` and `namedGroup(String)`.
We have a few justifications for this different way of dispatching methods:
1. It makes operating on `def` types simpler and, presumably, faster. Using
receiver, name, and arity means that when Painless sees a call on a `def` object it
can dispatch the appropriate method without having to do expensive comparisons
of the types of the parameters. The same is true for invocations with `def`
typed parameters.
2. It keeps things consistent. It would be genuinely weird for Painless to
behave like Groovy if any `def` typed parameters were involved and Java
otherwise. It'd be slow for it to behave like Groovy all the time.
3. It keeps Painless maintainable. Adding the Java or Groovy like method
dispatch *feels* like it'd add a ton of complexity which'd make maintenance and
other improvements much more difficult.
include::painless-debugging.asciidoc[]