[DOCS] Add configuration information for population analysis (elastic/x-pack-elasticsearch#1653)
* [DOCS] Add configuration information for population analysis
* [DOCS] Add ML population analysis examples
* [DOCS] Address feedback for population analysis
* [DOCS] More feedback on population analysis

Original commit: elastic/x-pack-elasticsearch@ffa2bfeed9
parent
00c40c8299
commit
527b789e6f
@@ -33,3 +33,4 @@ The scenarios in this section describe some best practices for generating useful
include::aggregations.asciidoc[]
include::categories.asciidoc[]
include::populations.asciidoc[]
Binary file not shown.
After Width: | Height: | Size: 152 KiB |
Binary file not shown.
After Width: | Height: | Size: 62 KiB |
Binary file not shown.
After Width: | Height: | Size: 215 KiB |
@@ -0,0 +1,87 @@
[[ml-configuring-pop]]
=== Performing Population Analysis

Entities or events in your data can be considered anomalous when:

* Their behavior changes over time, relative to their own previous behavior, or
* Their behavior differs from that of other entities in a specified population.

The latter method of detecting outliers is known as _population analysis_. The
{ml} analytics build a profile of what a "typical" user, machine, or other entity
does over a specified time period and then identify when one is behaving
abnormally compared to the population.

This type of analysis is most useful when the behavior of the population as a
whole is mostly homogeneous and you want to identify outliers. In general,
population analysis is not useful when members of the population inherently
have vastly different behavior. You can, however, segment your data into groups
that behave similarly and run these as separate jobs. For example, you can use a
query filter in the {dfeed} to segment your data or you can use the
`partition_field_name` to split the analysis for the different groups.
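
As a rough sketch of the `partition_field_name` approach, the following job
splits the population analysis by a hypothetical `department` field, so that
each user is compared only with other users in the same department. The job ID
and the `department` field are assumptions made for illustration; they are not
part of the sample data used on this page:

[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/population-by-department
{
  "description" : "Population analysis split by department (illustrative)",
  "analysis_config" : {
    "bucket_span": "10m",
    "influencers": [
      "username",
      "department"
    ],
    "detectors": [
      {
        "function": "mean",
        "field_name": "bytesSent",
        "over_field_name": "username",
        "partition_field_name": "department"
      }
    ]
  },
  "data_description" : {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  }
}
----------------------------------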

Population analysis scales well and has a lower resource footprint than
individual analysis of each series. For example, you can analyze populations
of hundreds of thousands or millions of entities.

To specify the population, use the `over_field_name` property. For example:

[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/population
{
  "description" : "Population analysis",
  "analysis_config" : {
    "bucket_span":"10m",
    "influencers": [
      "username"
    ],
    "detectors": [
      {
        "function": "mean",
        "field_name": "bytesSent",
        "over_field_name": "username" <1>
      }
    ]
  },
  "data_description" : {
    "time_field":"@timestamp",
    "time_format": "epoch_ms"
  }
}
----------------------------------
//CONSOLE
<1> This `over_field_name` property indicates that the metrics for each user (as
identified by their `username` value) are analyzed relative to other users
in each bucket.

//TO-DO: Per sophiec20 "Perhaps add the datafeed config and add a query filter to
//include only workstations as servers and printers would behave differently
//from the population"
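
A {dfeed} query is one way to keep dissimilar entities, such as servers and
printers, out of the population. The following is only a sketch: the {dfeed} ID,
the `network-*` index pattern, the `doc` type, and the `device_type` field are
hypothetical, and the name of the index property (`indexes` here) differs
between releases, so check the {dfeed} API reference for your version. The last
two requests open the job and start the {dfeed}:

[source,js]
----------------------------------
PUT _xpack/ml/datafeeds/datafeed-population
{
  "job_id": "population",
  "indexes": ["network-*"],
  "types": ["doc"],
  "query": {
    "term": {
      "device_type": "workstation"
    }
  }
}

POST _xpack/ml/anomaly_detectors/population/_open

POST _xpack/ml/datafeeds/datafeed-population/_start
----------------------------------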

If your data is stored in {es}, you can create an advanced job with these same
properties. In particular, you specify the `over_field_name` property when you
add detectors:

[role="screenshot"]
image::images/ml-population-job.jpg["Create a detector for population analysis"]

After you open the job and start the {dfeed} or supply data to the job, you can
view the results in {kib}. For example:

[role="screenshot"]
image::images/ml-population-results.jpg["Population analysis results in the Anomaly Explorer"]

As in this case, the results are often quite sparse. There might be just a few
data points for the selected time period. Population analysis is particularly
useful when you have many entities and the data for specific entities is sporadic
or sparse.

If you click a section in the timeline or swim lanes, you can see more
details about the anomalies:

[role="screenshot"]
image::images/ml-population-anomaly.jpg["Anomaly details for a specific user"]

In this example, the user identified as `antonette` sent a high volume of bytes
on the date and time shown. This event is anomalous because the mean is two times
higher than the expected behavior of the population.
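
The same anomaly records can also be retrieved outside of {kib}. As a sketch,
assuming the job ID `population` from the earlier example and a release in
which records carry a `record_score`, a request like the following lists the
highest-scoring records first. For population jobs, the unusual entity is
expected to appear in each record's `over_field_value` property:

[source,js]
----------------------------------
GET _xpack/ml/anomaly_detectors/population/results/records
{
  "sort": "record_score",
  "desc": true
}
----------------------------------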