[DOCS] Changes scripted metric to filter aggs in transforms example (#54167)

Author: István Zoltán Szabó
Date:   2020-03-30 09:49:40 +02:00
parent 6e025c12f0
commit 00eaa0ebe5

1 changed file with 35 additions and 50 deletions


@@ -188,20 +188,15 @@ or flight stats for any of the featured destination or origin airports.
 
 [[example-clientips]]
-==== Finding suspicious client IPs by using scripted metrics
+==== Finding suspicious client IPs
 
-With {transforms}, you can use
-{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[scripted
-metric aggregations] on your data. These aggregations are flexible and make
-it possible to perform very complex processing. Let's use scripted metrics to
-identify suspicious client IPs in the web log sample dataset.
-
-We transform the data such that the new index contains the sum of bytes and the
-number of distinct URLs, agents, incoming requests by location, and geographic
-destinations for each client IP. We also use a scripted field to count the
-specific types of HTTP responses that each client IP receives. Ultimately, the
-example below transforms web log data into an entity centric index where the
-entity is `clientip`.
+In this example, we use the web log sample dataset to identify suspicious client
+IPs. We transform the data such that the new index contains the sum of bytes and
+the number of distinct URLs, agents, incoming requests by location, and
+geographic destinations for each client IP. We also use filter aggregations to
+count the specific types of HTTP responses that each client IP receives.
+Ultimately, the example below transforms web log data into an entity centric
+index where the entity is `clientip`.
 
 [source,console]
 ----------------------------------
@@ -230,30 +225,17 @@ PUT _transform/suspicious_client_ips
       "agent_dc": { "cardinality": { "field": "agent.keyword" }},
       "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
       "responses.total": { "value_count": { "field": "timestamp" }},
-      "responses.counts": { <4>
-        "scripted_metric": {
-          "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
-          "map_script": """
-            def code = doc['response.keyword'].value;
-            if (code.startsWith('5') || code.startsWith('4')) {
-              state.responses.error += 1 ;
-            } else if(code.startsWith('2')) {
-              state.responses.success += 1;
-            } else {
-              state.responses.other += 1;
-            }
-            """,
-          "combine_script": "state.responses",
-          "reduce_script": """
-            def counts = ['error': 0L, 'success': 0L, 'other': 0L];
-            for (responses in states) {
-              counts.error += responses['error'];
-              counts.success += responses['success'];
-              counts.other += responses['other'];
-            }
-            return counts;
-            """
-        }
+      "success" : { <4>
+        "filter": {
+          "term": { "response" : "200"}}
+      },
+      "error404" : {
+        "filter": {
+          "term": { "response" : "404"}}
+      },
+      "error503" : {
+        "filter": {
+          "term": { "response" : "503"}}
       },
       "timestamp.min": { "min": { "field": "timestamp" }},
       "timestamp.max": { "max": { "field": "timestamp" }},
@@ -277,11 +259,13 @@ PUT _transform/suspicious_client_ips
 to synchronize the source and destination indices. The worst case
 ingestion delay is 60 seconds.
 <3> The data is grouped by the `clientip` field.
-<4> This `scripted_metric` performs a distributed operation on the web log data
-to count specific types of HTTP responses (error, success, and other).
+<4> Filter aggregation that counts the occurrences of successful (`200`)
+responses in the `response` field. The following two aggregations (`error404`
+and `error503`) count the error responses by error codes.
 <5> This `bucket_script` calculates the duration of the `clientip` access based
 on the results of the aggregation.
 
+
 After you create the {transform}, you must start it:
 
 [source,console]
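
Callout <5> above refers to a `bucket_script` pipeline aggregation defined elsewhere in the same request body, outside the hunks of this diff. A sketch of one way such a duration calculation can be wired to the `timestamp.min` and `timestamp.max` aggregations shown earlier; the aggregation name `timestamp.duration_ms` is illustrative:

[source,js]
----------------------------------
"timestamp.duration_ms": {
  "bucket_script": {
    "buckets_path": {
      "min_time": "timestamp.min.value",
      "max_time": "timestamp.max.value"
    },
    "script": "(params.max_time - params.min_time)"
  }
}
----------------------------------
// NOTCONSOLE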
@@ -290,6 +274,7 @@ POST _transform/suspicious_client_ips/_start
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
+
 Shortly thereafter, the first results should be available in the destination
 index:
 
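
You can also confirm that the {transform} is making progress by querying the transform stats API. A usage sketch (this call is not part of the example above):

[source,console]
----------------------------------
GET _transform/suspicious_client_ips/_stats
----------------------------------
// TEST[skip:setup kibana sample data]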
@@ -299,6 +284,7 @@ GET sample_weblogs_by_clientip/_search
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
+
 The search result shows you data like this for each client IP:
 
 [source,js]
@@ -313,22 +299,20 @@ The search result shows you data like this for each client IP:
         "src_dc" : 2.0,
         "dest_dc" : 2.0
       },
+      "success" : 2,
+      "error404" : 0,
+      "error503" : 0,
       "clientip" : "0.72.176.46",
       "agent_dc" : 2.0,
       "bytes_sum" : 4422.0,
       "responses" : {
-        "total" : 2.0,
-        "counts" : {
-          "other" : 0,
-          "success" : 2,
-          "error" : 0
-        }
+        "total" : 2.0
       },
       "url_dc" : 2.0,
       "timestamp" : {
         "duration_ms" : 5.2191698E8,
-        "min" : "2019-11-25T07:51:57.333Z",
-        "max" : "2019-12-01T08:50:34.313Z"
+        "min" : "2020-03-16T07:51:57.333Z",
+        "max" : "2020-03-22T08:50:34.313Z"
       }
     }
   }
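
As a sanity check on the sample document above, `duration_ms` is `timestamp.max` minus `timestamp.min` expressed in milliseconds: from `2020-03-16T07:51:57.333Z` to `2020-03-22T08:50:34.313Z` is 6 days, 58 minutes, and 36.98 seconds, which is 521,916,980 ms and matches the `5.2191698E8` shown.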
@@ -337,11 +321,12 @@ The search result shows you data like this for each client IP:
 // NOTCONSOLE
 
+
 NOTE: Like other Kibana sample data sets, the web log sample dataset contains
-timestamps relative to when you installed it, including timestamps in the future.
-The {ctransform} will pick up the data points once they are in the past. If you
-installed the web log sample dataset some time ago, you can uninstall and
+timestamps relative to when you installed it, including timestamps in the
+future. The {ctransform} will pick up the data points once they are in the past.
+If you installed the web log sample dataset some time ago, you can uninstall and
 reinstall it and the timestamps will change.
 
 This {transform} makes it easier to answer questions such as:
 
 * Which client IPs are transferring the most amounts of data?
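
The first of these questions maps directly onto a sorted search against the destination index. A sketch, assuming `bytes_sum` is dynamically mapped as a numeric field in `sample_weblogs_by_clientip` (as the sample document above suggests):

[source,console]
----------------------------------
GET sample_weblogs_by_clientip/_search
{
  "size": 3,
  "sort": [ { "bytes_sum": { "order": "desc" } } ]
}
----------------------------------
// TEST[skip:setup kibana sample data]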