[DOCS] Changes scripted metric to filter aggs in transforms example (#54167)

parent 6e025c12f0
commit 00eaa0ebe5
@@ -188,20 +188,15 @@ or flight stats for any of the featured destination or origin airports.
 
 
 [[example-clientips]]
-==== Finding suspicious client IPs by using scripted metrics
+==== Finding suspicious client IPs
 
-With {transforms}, you can use
-{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[scripted
-metric aggregations] on your data. These aggregations are flexible and make
-it possible to perform very complex processing. Let's use scripted metrics to
-identify suspicious client IPs in the web log sample dataset.
-
-We transform the data such that the new index contains the sum of bytes and the
-number of distinct URLs, agents, incoming requests by location, and geographic
-destinations for each client IP. We also use a scripted field to count the
-specific types of HTTP responses that each client IP receives. Ultimately, the
-example below transforms web log data into an entity centric index where the
-entity is `clientip`.
+In this example, we use the web log sample dataset to identify suspicious client
+IPs. We transform the data such that the new index contains the sum of bytes and
+the number of distinct URLs, agents, incoming requests by location, and
+geographic destinations for each client IP. We also use filter aggregations to
+count the specific types of HTTP responses that each client IP receives.
+Ultimately, the example below transforms web log data into an entity centric
+index where the entity is `clientip`.
 
 [source,console]
 ----------------------------------
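As a mental model for the pivot described in the new text, here is a small plain-Python sketch, not part of the docs: it uses made-up log rows with field names mirroring the sample dataset, groups them by `clientip`, and computes the sum of bytes and the distinct-URL count per client, which is what the transform's aggregations do inside Elasticsearch.

```python
# Toy sketch of the pivot: group web log rows by client IP and compute
# the sum of bytes and the number of distinct URLs per client.
# The rows below are invented for illustration; the real transform
# runs these aggregations inside Elasticsearch.
from collections import defaultdict

rows = [
    {"clientip": "0.72.176.46", "bytes": 2211, "url": "/a"},
    {"clientip": "0.72.176.46", "bytes": 2211, "url": "/b"},
    {"clientip": "10.0.0.1",    "bytes": 100,  "url": "/a"},
]

groups = defaultdict(lambda: {"bytes_sum": 0, "urls": set()})
for row in rows:
    g = groups[row["clientip"]]
    g["bytes_sum"] += row["bytes"]          # like the "bytes_sum" sum agg
    g["urls"].add(row["url"])               # like the "url_dc" cardinality agg

summary = {
    ip: {"bytes_sum": g["bytes_sum"], "url_dc": len(g["urls"])}
    for ip, g in groups.items()
}
print(summary["0.72.176.46"])  # {'bytes_sum': 4422, 'url_dc': 2}
```

The toy numbers are chosen so the result matches the example document later in this page (`bytes_sum: 4422`, `url_dc: 2`).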
@@ -230,30 +225,17 @@ PUT _transform/suspicious_client_ips
       "agent_dc": { "cardinality": { "field": "agent.keyword" }},
       "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
       "responses.total": { "value_count": { "field": "timestamp" }},
-      "responses.counts": { <4>
-        "scripted_metric": {
-          "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
-          "map_script": """
-            def code = doc['response.keyword'].value;
-            if (code.startsWith('5') || code.startsWith('4')) {
-              state.responses.error += 1;
-            } else if (code.startsWith('2')) {
-              state.responses.success += 1;
-            } else {
-              state.responses.other += 1;
-            }
-            """,
-          "combine_script": "state.responses",
-          "reduce_script": """
-            def counts = ['error': 0L, 'success': 0L, 'other': 0L];
-            for (responses in states) {
-              counts.error += responses['error'];
-              counts.success += responses['success'];
-              counts.other += responses['other'];
-            }
-            return counts;
-            """
-        }
-      },
+      "success" : { <4>
+        "filter": {
+          "term": { "response" : "200"}}
+      },
+      "error404" : {
+        "filter": {
+          "term": { "response" : "404"}}
+      },
+      "error503" : {
+        "filter": {
+          "term": { "response" : "503"}}
+      },
       "timestamp.min": { "min": { "field": "timestamp" }},
       "timestamp.max": { "max": { "field": "timestamp" }},
@@ -277,11 +259,13 @@ PUT _transform/suspicious_client_ips
 to synchronize the source and destination indices. The worst case
 ingestion delay is 60 seconds.
 <3> The data is grouped by the `clientip` field.
-<4> This `scripted_metric` performs a distributed operation on the web log data
-to count specific types of HTTP responses (error, success, and other).
+<4> Filter aggregation that counts the occurrences of successful (`200`)
+responses in the `response` field. The following two aggregations (`error404`
+and `error503`) count the error responses by error codes.
 <5> This `bucket_script` calculates the duration of the `clientip` access based
 on the results of the aggregation.
 
 After you create the {transform}, you must start it:
 
 [source,console]
@@ -290,6 +274,7 @@ POST _transform/suspicious_client_ips/_start
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
 
 Shortly thereafter, the first results should be available in the destination
 index:
 
@@ -299,6 +284,7 @@ GET sample_weblogs_by_clientip/_search
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
 
 The search result shows you data like this for each client IP:
 
 [source,js]
@@ -313,22 +299,20 @@ The search result shows you data like this for each client IP:
       "src_dc" : 2.0,
       "dest_dc" : 2.0
     },
+    "success" : 2,
+    "error404" : 0,
+    "error503" : 0,
     "clientip" : "0.72.176.46",
     "agent_dc" : 2.0,
     "bytes_sum" : 4422.0,
     "responses" : {
-      "total" : 2.0,
-      "counts" : {
-        "other" : 0,
-        "success" : 2,
-        "error" : 0
-      }
+      "total" : 2.0
     },
     "url_dc" : 2.0,
     "timestamp" : {
       "duration_ms" : 5.2191698E8,
-      "min" : "2019-11-25T07:51:57.333Z",
-      "max" : "2019-12-01T08:50:34.313Z"
+      "min" : "2020-03-16T07:51:57.333Z",
+      "max" : "2020-03-22T08:50:34.313Z"
     }
   }
 }
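The `duration_ms` value in this sample output is simply `timestamp.max` minus `timestamp.min`, which is what the `bucket_script` in callout <5> computes. A quick plain-Python check (not part of the docs) over the dates shown in the new version of the output:

```python
# Reproduce duration_ms = timestamp.max - timestamp.min from the
# sample search result above.
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
t_min = datetime.strptime("2020-03-16T07:51:57.333Z", fmt)
t_max = datetime.strptime("2020-03-22T08:50:34.313Z", fmt)

duration_ms = (t_max - t_min).total_seconds() * 1000
print(duration_ms)  # ~5.2191698E8 ms, matching "duration_ms" in the result
```

Six days plus roughly 59 minutes of access, about 521,916,980 ms, which matches the `5.2191698E8` shown in the example document.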
@@ -337,11 +321,12 @@ The search result shows you data like this for each client IP:
 // NOTCONSOLE
 
 NOTE: Like other Kibana sample data sets, the web log sample dataset contains
-timestamps relative to when you installed it, including timestamps in the future.
-The {ctransform} will pick up the data points once they are in the past. If you
-installed the web log sample dataset some time ago, you can uninstall and
-reinstall it and the timestamps will change.
+timestamps relative to when you installed it, including timestamps in the
+future. The {ctransform} will pick up the data points once they are in the past.
+If you installed the web log sample dataset some time ago, you can uninstall and
+reinstall it and the timestamps will change.
 
 
 This {transform} makes it easier to answer questions such as:
 
 * Which client IPs are transferring the most amounts of data?