640 lines
21 KiB
Plaintext
640 lines
21 KiB
Plaintext
[role="xpack"]
|
|
[testenv="basic"]
|
|
[[transform-painless-examples]]
|
|
= Painless examples for {transforms}
|
|
++++
|
|
<titleabbrev>Painless examples</titleabbrev>
|
|
++++
|
|
|
|
These examples demonstrate how to use Painless in {transforms}. You can learn
|
|
more about the Painless scripting language in the
|
|
{painless}/painless-guide.html[Painless guide].
|
|
|
|
* <<painless-top-hits>>
|
|
* <<painless-time-features>>
|
|
* <<painless-group-by>>
|
|
* <<painless-bucket-script>>
|
|
* <<painless-count-http>>
|
|
* <<painless-compare>>
|
|
* <<painless-web-session>>
|
|
|
|
NOTE: While the context of the following examples is the {transform} use case,
|
|
the Painless scripts in the snippets below can be used in other {es} search
|
|
aggregations, too.
|
|
|
|
[[painless-top-hits]]
|
|
== Getting top hits by using scripted metric aggregation
|
|
|
|
This snippet shows how to find the latest document, in other words the document
|
|
with the earliest timestamp. From a technical perspective, it helps to achieve
|
|
the function of a <<search-aggregations-metrics-top-hits-aggregation>> by using
|
|
scripted metric aggregation in a {transform}, which provides a metric output.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
"aggregations": {
|
|
"latest_doc": {
|
|
"scripted_metric": {
|
|
"init_script": "state.timestamp_latest = 0L; state.last_doc = ''", <1>
|
|
"map_script": """ <2>
|
|
def current_date = doc['@timestamp'].getValue().toInstant().toEpochMilli();
|
|
if (current_date > state.timestamp_latest)
|
|
{state.timestamp_latest = current_date;
|
|
state.last_doc = new HashMap(params['_source']);}
|
|
""",
|
|
"combine_script": "return state", <3>
|
|
"reduce_script": """ <4>
|
|
def last_doc = '';
|
|
def timestamp_latest = 0L;
|
|
for (s in states) {if (s.timestamp_latest > (timestamp_latest))
|
|
{timestamp_latest = s.timestamp_latest; last_doc = s.last_doc;}}
|
|
return last_doc
|
|
"""
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
<1> The `init_script` creates a long type `timestamp_latest` and a string type
|
|
`last_doc` in the `state` object.
|
|
<2> The `map_script` defines `current_date` based on the timestamp of the
|
|
document, then compares `current_date` with `state.timestamp_latest`, finally
|
|
returns `state.last_doc` from the shard. By using `new HashMap(...)` you copy
|
|
the source document, this is important whenever you want to pass the full source
|
|
object from one phase to the next.
|
|
<3> The `combine_script` returns `state` from each shard.
|
|
<4> The `reduce_script` iterates through the value of `s.timestamp_latest`
|
|
returned by each shard and returns the document with the latest timestamp
|
|
(`last_doc`). In the response, the top hit (in other words, the `latest_doc`) is
|
|
nested below the `latest_doc` field.
|
|
|
|
Check the
|
|
<<scripted-metric-aggregation-scope,scope of scripts>>
|
|
for detailed explanation on the respective scripts.
|
|
|
|
You can retrieve the last value in a similar way:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
"aggregations": {
|
|
"latest_value": {
|
|
"scripted_metric": {
|
|
"init_script": "state.timestamp_latest = 0L; state.last_value = ''",
|
|
"map_script": """
|
|
def current_date = doc['date'].getValue().toInstant().toEpochMilli();
|
|
if (current_date > state.timestamp_latest)
|
|
{state.timestamp_latest = current_date;
|
|
state.last_value = params['_source']['value'];}
|
|
""",
|
|
"combine_script": "return state",
|
|
"reduce_script": """
|
|
def last_value = '';
|
|
def timestamp_latest = 0L;
|
|
for (s in states) {if (s.timestamp_latest > (timestamp_latest))
|
|
{timestamp_latest = s.timestamp_latest; last_value = s.last_value;}}
|
|
return last_value
|
|
"""
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
[[painless-time-features]]
|
|
== Getting time features by using aggregations
|
|
|
|
This snippet shows how to extract time based features by using Painless in a
|
|
{transform}. The snippet uses an index where `@timestamp` is defined as a `date`
|
|
type field.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
"aggregations": {
|
|
"avg_hour_of_day": { <1>
|
|
"avg":{
|
|
"script": { <2>
|
|
"source": """
|
|
ZonedDateTime date = doc['@timestamp'].value; <3>
|
|
return date.getHour(); <4>
|
|
"""
|
|
}
|
|
}
|
|
},
|
|
"avg_month_of_year": { <5>
|
|
"avg":{
|
|
"script": { <6>
|
|
"source": """
|
|
ZonedDateTime date = doc['@timestamp'].value; <7>
|
|
return date.getMonthValue(); <8>
|
|
"""
|
|
}
|
|
}
|
|
},
|
|
...
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
<1> Name of the aggregation.
|
|
<2> Contains the Painless script that returns the hour of the day.
|
|
<3> Sets `date` based on the timestamp of the document.
|
|
<4> Returns the hour value from `date`.
|
|
<5> Name of the aggregation.
|
|
<6> Contains the Painless script that returns the month of the year.
|
|
<7> Sets `date` based on the timestamp of the document.
|
|
<8> Returns the month value from `date`.
|
|
|
|
[[painless-group-by]]
|
|
== Using Painless in `group_by`
|
|
|
|
It is possible to base the `group_by` property of a {transform} on the output of
|
|
a script. The following example uses the {kib} sample web logs dataset. The goal
|
|
here is to make the {transform} output easier to understand through normalizing
|
|
the value of the fields that the data is grouped by.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST _transform/_preview
|
|
{
|
|
"source": {
|
|
"index": [ <1>
|
|
"kibana_sample_data_logs"
|
|
]
|
|
},
|
|
"pivot": {
|
|
"group_by": {
|
|
"agent": {
|
|
"terms": {
|
|
"script": { <2>
|
|
"source": """String agent = doc['agent.keyword'].value;
|
|
if (agent.contains("MSIE")) {
|
|
return "internet explorer";
|
|
} else if (agent.contains("AppleWebKit")) {
|
|
return "safari";
|
|
} else if (agent.contains('Firefox')) {
|
|
return "firefox";
|
|
} else { return agent }""",
|
|
"lang": "painless"
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"aggregations": { <3>
|
|
"200": {
|
|
"filter": {
|
|
"term": {
|
|
"response": "200"
|
|
}
|
|
}
|
|
},
|
|
"404": {
|
|
"filter": {
|
|
"term": {
|
|
"response": "404"
|
|
}
|
|
}
|
|
},
|
|
"503": {
|
|
"filter": {
|
|
"term": {
|
|
"response": "503"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"dest": { <4>
|
|
"index": "pivot_logs"
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[skip:setup kibana sample data]
|
|
|
|
<1> Specifies the source index or indices.
|
|
<2> The script defines an `agent` string based on the `agent` field of the
|
|
documents, then iterates through the values. If an `agent` field contains
|
|
"MSIE", than the script returns "Internet Explorer". If it contains
|
|
`AppleWebKit`, it returns "safari". It returns "firefox" if the field value
|
|
contains "Firefox". Finally, in every other case, the value of the field is
|
|
returned.
|
|
<3> The aggregations object contains filters that narrow down the results to
|
|
documents that contains `200`, `404`, or `503` values in the `response` field.
|
|
<4> Specifies the destination index of the {transform}.
|
|
|
|
The API returns the following result:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"preview" : [
|
|
{
|
|
"agent" : "firefox",
|
|
"200" : 4931,
|
|
"404" : 259,
|
|
"503" : 172
|
|
},
|
|
{
|
|
"agent" : "internet explorer",
|
|
"200" : 3674,
|
|
"404" : 210,
|
|
"503" : 126
|
|
},
|
|
{
|
|
"agent" : "safari",
|
|
"200" : 4227,
|
|
"404" : 332,
|
|
"503" : 143
|
|
}
|
|
],
|
|
"mappings" : {
|
|
"properties" : {
|
|
"200" : {
|
|
"type" : "long"
|
|
},
|
|
"agent" : {
|
|
"type" : "keyword"
|
|
},
|
|
"404" : {
|
|
"type" : "long"
|
|
},
|
|
"503" : {
|
|
"type" : "long"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
You can see that the `agent` values are simplified so it is easier to interpret
|
|
them. The table below shows how normalization modifies the output of the
|
|
{transform} in our example compared to the non-normalized values.
|
|
|
|
[width="50%"]
|
|
|
|
|===
|
|
| Non-normalized `agent` value | Normalized `agent` value
|
|
|
|
| "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" | "internet explorer"
|
|
| "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24" | "safari"
|
|
| "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1" | "firefox"
|
|
|===
|
|
|
|
[[painless-bucket-script]]
|
|
== Getting duration by using bucket script
|
|
|
|
This example shows you how to get the duration of a session by client IP from a
|
|
data log by using
|
|
<<search-aggregations-pipeline-bucket-script-aggregation,bucket script>>.
|
|
The example uses the {kib} sample web logs dataset.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT _transform/data_log
|
|
{
|
|
"source": {
|
|
"index": "kibana_sample_data_logs"
|
|
},
|
|
"dest": {
|
|
"index": "data-logs-by-client"
|
|
},
|
|
"pivot": {
|
|
"group_by": {
|
|
"machine.os": {"terms": {"field": "machine.os.keyword"}},
|
|
"machine.ip": {"terms": {"field": "clientip"}}
|
|
},
|
|
"aggregations": {
|
|
"time_frame.lte": {
|
|
"max": {
|
|
"field": "timestamp"
|
|
}
|
|
},
|
|
"time_frame.gte": {
|
|
"min": {
|
|
"field": "timestamp"
|
|
}
|
|
},
|
|
"time_length": { <1>
|
|
"bucket_script": {
|
|
"buckets_path": { <2>
|
|
"min": "time_frame.gte.value",
|
|
"max": "time_frame.lte.value"
|
|
},
|
|
"script": "params.max - params.min" <3>
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[skip:setup kibana sample data]
|
|
|
|
<1> To define the length of the sessions, we use a bucket script.
|
|
<2> The bucket path is a map of script variables and their associated path to
|
|
the buckets you want to use for the variable. In this particular case, `min` and
|
|
`max` are variables mapped to `time_frame.gte.value` and `time_frame.lte.value`.
|
|
<3> Finally, the script substracts the start date of the session from the end
|
|
date which results in the duration of the session.
|
|
|
|
[[painless-count-http]]
|
|
== Counting HTTP responses by using scripted metric aggregation
|
|
|
|
You can count the different HTTP response types in a web log data set by using
|
|
scripted metric aggregation as part of the {transform}. You can achieve a similar
|
|
function with filter aggregations, check the
|
|
{ref}/transform-examples.html#example-clientips[Finding suspicious client IPs]
|
|
example for details.
|
|
|
|
The example below assumes that the HTTP response codes are stored as keywords in
|
|
the `response` field of the documents.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
"aggregations": { <1>
|
|
"responses.counts": { <2>
|
|
"scripted_metric": { <3>
|
|
"init_script": "state.responses = ['error':0L,'success':0L,'other':0L]", <4>
|
|
"map_script": """ <5>
|
|
def code = doc['response.keyword'].value;
|
|
if (code.startsWith('5') || code.startsWith('4')) {
|
|
state.responses.error += 1 ;
|
|
} else if(code.startsWith('2')) {
|
|
state.responses.success += 1;
|
|
} else {
|
|
state.responses.other += 1;
|
|
}
|
|
""",
|
|
"combine_script": "state.responses", <6>
|
|
"reduce_script": """ <7>
|
|
def counts = ['error': 0L, 'success': 0L, 'other': 0L];
|
|
for (responses in states) {
|
|
counts.error += responses['error'];
|
|
counts.success += responses['success'];
|
|
counts.other += responses['other'];
|
|
}
|
|
return counts;
|
|
"""
|
|
}
|
|
},
|
|
...
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
<1> The `aggregations` object of the {transform} that contains all aggregations.
|
|
<2> Object of the `scripted_metric` aggregation.
|
|
<3> This `scripted_metric` performs a distributed operation on the web log data
|
|
to count specific types of HTTP responses (error, success, and other).
|
|
<4> The `init_script` creates a `responses` array in the `state` object with
|
|
three properties (`error`, `success`, `other`) with long data type.
|
|
<5> The `map_script` defines `code` based on the `response.keyword` value of the
|
|
document, then it counts the errors, successes, and other responses based on the
|
|
first digit of the responses.
|
|
<6> The `combine_script` returns `state.responses` from each shard.
|
|
<7> The `reduce_script` creates a `counts` array with the `error`, `success`,
|
|
and `other` properties, then iterates through the value of `responses` returned
|
|
by each shard and assigns the different response types to the appropriate
|
|
properties of the `counts` object; error responses to the error counts, success
|
|
responses to the success counts, and other responses to the other counts.
|
|
Finally, returns the `counts` array with the response counts.
|
|
|
|
[[painless-compare]]
|
|
== Comparing indices by using scripted metric aggregations
|
|
|
|
This example shows how to compare the content of two indices by a {transform}
|
|
that uses a scripted metric aggregation.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
POST _transform/_preview
|
|
{
|
|
"id" : "index_compare",
|
|
"source" : { <1>
|
|
"index" : [
|
|
"index1",
|
|
"index2"
|
|
],
|
|
"query" : {
|
|
"match_all" : { }
|
|
}
|
|
},
|
|
"dest" : { <2>
|
|
"index" : "compare"
|
|
},
|
|
"pivot" : {
|
|
"group_by" : {
|
|
"unique-id" : {
|
|
"terms" : {
|
|
"field" : "<unique-id-field>" <3>
|
|
}
|
|
}
|
|
},
|
|
"aggregations" : {
|
|
"compare" : { <4>
|
|
"scripted_metric" : {
|
|
"init_script" : "",
|
|
"map_script" : "state.doc = new HashMap(params['_source'])", <5>
|
|
"combine_script" : "return state", <6>
|
|
"reduce_script" : """ <7>
|
|
if (states.size() != 2) {
|
|
return "count_mismatch"
|
|
}
|
|
if (states.get(0).equals(states.get(1))) {
|
|
return "match"
|
|
} else {
|
|
return "mismatch"
|
|
}
|
|
"""
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[skip:setup kibana sample data]
|
|
|
|
<1> The indices referenced in the `source` object are compared to each other.
|
|
<2> The `dest` index contains the results of the comparison.
|
|
<3> The `group_by` field needs to be a unique identifier for each document.
|
|
<4> Object of the `scripted_metric` aggregation.
|
|
<5> The `map_script` defines `doc` in the state object. By using
|
|
`new HashMap(...)` you copy the source document, this is important whenever you
|
|
want to pass the full source object from one phase to the next.
|
|
<6> The `combine_script` returns `state` from each shard.
|
|
<7> The `reduce_script` checks if the size of the indices are equal. If they are
|
|
not equal, than it reports back a `count_mismatch`. Then it iterates through all
|
|
the values of the two indices and compare them. If the values are equal, then it
|
|
returns a `match`, otherwise returns a `mismatch`.
|
|
|
|
[[painless-web-session]]
|
|
== Getting web session details by using scripted metric aggregation
|
|
|
|
This example shows how to derive multiple features from a single transaction.
|
|
Let's take a look on the example source document from the data:
|
|
|
|
.Source document
|
|
[%collapsible%open]
|
|
=====
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_index":"apache-sessions",
|
|
"_type":"_doc",
|
|
"_id":"KvzSeGoB4bgw0KGbE3wP",
|
|
"_score":1.0,
|
|
"_source":{
|
|
"@timestamp":1484053499256,
|
|
"apache":{
|
|
"access":{
|
|
"sessionid":"571604f2b2b0c7b346dc685eeb0e2306774a63c2",
|
|
"url":"http://www.leroymerlin.fr/v3/search/search.do?keyword=Carrelage%20salle%20de%20bain",
|
|
"path":"/v3/search/search.do",
|
|
"query":"keyword=Carrelage%20salle%20de%20bain",
|
|
"referrer":"http://www.leroymerlin.fr/v3/p/produits/carrelage-parquet-sol-souple/carrelage-sol-et-mur/decor-listel-et-accessoires-carrelage-mural-l1308217717?resultOffset=0&resultLimit=51&resultListShape=MOSAIC&priceStyle=SALEUNIT_PRICE",
|
|
"user_agent":{
|
|
"original":"Mobile Safari 10.0 Mac OS X (iPad) Apple Inc.",
|
|
"os_name":"Mac OS X (iPad)"
|
|
},
|
|
"remote_ip":"0337b1fa-5ed4-af81-9ef4-0ec53be0f45d",
|
|
"geoip":{
|
|
"country_iso_code":"FR",
|
|
"location":{
|
|
"lat":48.86,
|
|
"lon":2.35
|
|
}
|
|
},
|
|
"response_code":200,
|
|
"method":"GET"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
...
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
=====
|
|
|
|
|
|
By using the `sessionid` as a group-by field, you are able to enumerate events
|
|
through the session and get more details of the session by using scripted metric
|
|
aggregation.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST _transform/_preview
|
|
{
|
|
"source": {
|
|
"index": "apache-sessions"
|
|
},
|
|
"pivot": {
|
|
"group_by": {
|
|
"sessionid": { <1>
|
|
"terms": {
|
|
"field": "apache.access.sessionid"
|
|
}
|
|
}
|
|
},
|
|
"aggregations": { <2>
|
|
"distinct_paths": {
|
|
"cardinality": {
|
|
"field": "apache.access.path"
|
|
}
|
|
},
|
|
"num_pages_viewed": {
|
|
"value_count": {
|
|
"field": "apache.access.url"
|
|
}
|
|
},
|
|
"session_details": {
|
|
"scripted_metric": {
|
|
"init_script": "state.docs = []", <3>
|
|
"map_script": """ <4>
|
|
Map span = [
|
|
'@timestamp':doc['@timestamp'].value,
|
|
'url':doc['apache.access.url'].value,
|
|
'referrer':doc['apache.access.referrer'].value
|
|
];
|
|
state.docs.add(span)
|
|
""",
|
|
"combine_script": "return state.docs;", <5>
|
|
"reduce_script": """ <6>
|
|
def all_docs = [];
|
|
for (s in states) {
|
|
for (span in s) {
|
|
all_docs.add(span);
|
|
}
|
|
}
|
|
all_docs.sort((HashMap o1, HashMap o2)->o1['@timestamp'].millis.compareTo(o2['@timestamp'].millis));
|
|
def size = all_docs.size();
|
|
def min_time = all_docs[0]['@timestamp'];
|
|
def max_time = all_docs[size-1]['@timestamp'];
|
|
def duration = max_time.millis - min_time.millis;
|
|
def entry_page = all_docs[0]['url'];
|
|
def exit_path = all_docs[size-1]['url'];
|
|
def first_referrer = all_docs[0]['referrer'];
|
|
def ret = new HashMap();
|
|
ret['first_time'] = min_time;
|
|
ret['last_time'] = max_time;
|
|
ret['duration'] = duration;
|
|
ret['entry_page'] = entry_page;
|
|
ret['exit_path'] = exit_path;
|
|
ret['first_referrer'] = first_referrer;
|
|
return ret;
|
|
"""
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
<1> The data is grouped by `sessionid`.
|
|
<2> The aggregations counts the number of paths and enumerate the viewed pages
|
|
during the session.
|
|
<3> The `init_script` creates an array type `doc` in the `state` object.
|
|
<4> The `map_script` defines a `span` array with a timestamp, a URL, and a
|
|
referrer value which are based on the corresponding values of the document, then
|
|
adds the value of the `span` array to the `doc` object.
|
|
<5> The `combine_script` returns `state.docs` from each shard.
|
|
<6> The `reduce_script` defines various objects like `min_time`, `max_time`, and
|
|
`duration` based on the document fields, then declares a `ret` object, and
|
|
copies the source document by using `new HashMap ()`. Next, the script defines
|
|
`first_time`, `last_time`, `duration` and other fields inside the `ret` object
|
|
based on the corresponding object defined earlier, finally returns `ret`.
|
|
|
|
The API call results in a similar response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"num_pages_viewed" : 2.0,
|
|
"session_details" : {
|
|
"duration" : 131374,
|
|
"first_referrer" : "https://www.bing.com/",
|
|
"entry_page" : "http://www.leroymerlin.fr/v3/p/produits/materiaux-menuiserie/porte-coulissante-porte-interieure-escalier-et-rambarde/barriere-de-securite-l1308218463",
|
|
"first_time" : "2017-01-10T21:22:52.982Z",
|
|
"last_time" : "2017-01-10T21:25:04.356Z",
|
|
"exit_path" : "http://www.leroymerlin.fr/v3/p/produits/materiaux-menuiserie/porte-coulissante-porte-interieure-escalier-et-rambarde/barriere-de-securite-l1308218463?__result-wrapper?pageTemplate=Famille%2FMat%C3%A9riaux+et+menuiserie&resultOffset=0&resultLimit=50&resultListShape=PLAIN&nomenclatureId=17942&priceStyle=SALEUNIT_PRICE&fcr=1&*4294718806=4294718806&*14072=14072&*4294718593=4294718593&*17942=17942"
|
|
},
|
|
"distinct_paths" : 1.0,
|
|
"sessionid" : "000046f8154a80fd89849369c984b8cc9d795814"
|
|
},
|
|
{
|
|
"num_pages_viewed" : 10.0,
|
|
"session_details" : {
|
|
"duration" : 343112,
|
|
"first_referrer" : "https://www.google.fr/",
|
|
"entry_page" : "http://www.leroymerlin.fr/",
|
|
"first_time" : "2017-01-10T16:57:39.937Z",
|
|
"last_time" : "2017-01-10T17:03:23.049Z",
|
|
"exit_path" : "http://www.leroymerlin.fr/v3/p/produits/porte-de-douche-coulissante-adena-e168578"
|
|
},
|
|
"distinct_paths" : 8.0,
|
|
"sessionid" : "000087e825da1d87a332b8f15fa76116c7467da6"
|
|
}
|
|
...
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|