[role="xpack"]
[[ml-count-functions]]
= Count functions

Count functions detect anomalies when the number of events in a bucket is
anomalous.

Use `non_zero_count` functions if your data is sparse and you want to ignore
cases where the bucket count is zero.

Use `distinct_count` functions to determine when the number of distinct values
in one field is unusual, as opposed to the total count.

Use high-sided functions if you want to monitor unusually high event rates.
Use low-sided functions if you want to look at drops in event rate.

The {ml-features} include the following count functions:

* xref:ml-count[`count`, `high_count`, `low_count`]
* xref:ml-nonzero-count[`non_zero_count`, `high_non_zero_count`, `low_non_zero_count`]
* xref:ml-distinct-count[`distinct_count`, `high_distinct_count`, `low_distinct_count`]

[discrete]
[[ml-count]]
== Count, high_count, low_count

The `count` function detects anomalies when the number of events in a bucket is
anomalous.

The `high_count` function detects anomalies when the count of events in a
bucket is unusually high.

The `low_count` function detects anomalies when the count of events in a
bucket is unusually low.

These functions support the following properties:

* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 1: Analyzing events with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example1
{
  "analysis_config": {
    "detectors": [{
      "function": "count"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate and detects when the event rate is unusual compared to its past
behavior.
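To make the idea of a per-bucket event count concrete, the following Python
sketch groups event timestamps into fixed-width time buckets and counts them.
It only illustrates the quantity that the `count` function models, not how the
{anomaly-job} computes or scores it. The `count_per_bucket` helper, the sample
timestamps, and the 60-second bucket width are assumptions chosen for the
example.

```python
from collections import Counter


def count_per_bucket(timestamps_ms, bucket_span_ms):
    """Map each bucket start time (epoch ms) to the number of events in that bucket."""
    return Counter((ts // bucket_span_ms) * bucket_span_ms for ts in timestamps_ms)


# Hypothetical event timestamps in epoch milliseconds.
events = [0, 10_000, 59_999, 60_000, 185_000]

# Assumed bucket width of 60 seconds.
counts = count_per_bucket(events, bucket_span_ms=60_000)
print(sorted(counts.items()))
```

In this sample, no event falls into the bucket that starts at `120000`, so
`counts[120_000]` reads as zero.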

.Example 2: Analyzing errors with the high_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example2
{
  "analysis_config": {
    "detectors": [{
      "function": "high_count",
      "by_field_name": "error_code",
      "over_field_name": "user"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.

.Example 3: Analyzing status codes with the low_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example3
{
  "analysis_config": {
    "detectors": [{
      "function": "low_count",
      "by_field_name": "status_code"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

In this example, the function detects when the count of events for a
status code is lower than usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate for each status code and detects when a status code has an unusually
low count compared to its past behavior.

.Example 4: Analyzing aggregated data with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example4
{
  "analysis_config": {
    "summary_count_field_name": "events_per_min",
    "detectors": [{
      "function": "count"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you are analyzing an aggregated `events_per_min` field, do not use a sum
function (for example, `sum(events_per_min)`). Instead, use the `count` function
and the `summary_count_field_name` property. For more information, see
<<ml-configuring-aggregation>>.
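As a rough sketch of why `count` with `summary_count_field_name` fits
pre-aggregated input, the following Python example treats each record as
standing for `events_per_min` raw events and adds those values up per bucket.
The record layout, the `summarized_count_per_bucket` helper, and the five-minute
bucket width are hypothetical and are not taken from the {anomaly-job}
implementation.

```python
from collections import Counter


def summarized_count_per_bucket(records, bucket_span_ms, count_field="events_per_min"):
    """Sum the pre-aggregated event counts of all records that fall into each bucket."""
    counts = Counter()
    for record in records:
        bucket_start = (record["timestamp"] // bucket_span_ms) * bucket_span_ms
        counts[bucket_start] += record[count_field]
    return counts


# Hypothetical one-minute summaries; each record stands for many raw events.
records = [
    {"timestamp": 0, "events_per_min": 120},
    {"timestamp": 60_000, "events_per_min": 95},
    {"timestamp": 300_000, "events_per_min": 110},
]

# Assumed bucket width of five minutes.
summarized = summarized_count_per_bucket(records, bucket_span_ms=300_000)
print(sorted(summarized.items()))
```

Counting the records themselves would give two and one for these buckets, which
says nothing about the underlying event rate.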

[discrete]
[[ml-nonzero-count]]
== Non_zero_count, high_non_zero_count, low_non_zero_count

The `non_zero_count` function detects anomalies when the number of events in a
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
this function if you know your data is sparse or has gaps and the gaps are not
important.

The `high_non_zero_count` function detects anomalies when the number of events
in a bucket is unusually high and it ignores cases where the bucket count is
zero.

The `low_non_zero_count` function detects anomalies when the number of events in
a bucket is unusually low and it ignores cases where the bucket count is zero.

These functions support the following properties:

* `by_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

For example, if you have the following number of events per bucket:

========================================

1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1

========================================

The `non_zero_count` function models only the following data:

========================================

1,22,2,43,31,2,1

========================================
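The same filtering can be reproduced with a short Python sketch. It shows only
which buckets remain after the empty ones are dropped, not how the
`non_zero_count` function is implemented. The `non_zero_buckets` helper is a
hypothetical name used for illustration.

```python
def non_zero_buckets(bucket_counts):
    """Return only the bucket counts that contain at least one event."""
    return [count for count in bucket_counts if count != 0]


# Events per bucket, copied from the example above.
bucket_counts = [
    1, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    2, 43, 31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1,
]

modeled = non_zero_buckets(bucket_counts)
print(modeled)
```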

.Example 5: Analyzing signatures with the high_non_zero_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example5
{
  "analysis_config": {
    "detectors": [{
      "function": "high_non_zero_count",
      "by_field_name": "signaturename"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_non_zero_count` function in a detector in your
{anomaly-job}, it models the count of events for the `signaturename` field. It
ignores any buckets where the count is zero and detects when a `signaturename`
value has an unusually high count of events compared to its past behavior.

NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
`low_non_zero_count` functions. If you want to do population analysis and your
data is sparse, use the `count` functions, which are optimized for that scenario.

[discrete]
[[ml-distinct-count]]
== Distinct_count, high_distinct_count, low_distinct_count

The `distinct_count` function detects anomalies where the number of distinct
values in one field is unusual.

The `high_distinct_count` function detects unusually high numbers of distinct
values in one field.

The `low_distinct_count` function detects unusually low numbers of distinct
values in one field.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 6: Analyzing users with the distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example6
{
  "analysis_config": {
    "detectors": [{
      "function": "distinct_count",
      "field_name": "user"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This `distinct_count` function detects when a system has an unusual number
of logged-in users. When you use this function in a detector in your
{anomaly-job}, it models the distinct count of users. It also detects when the
distinct number of users is unusual compared to the past.
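The difference between a total count and a distinct count can be sketched in
Python as follows. The example only computes the per-bucket number of distinct
values that the `distinct_count` function models; it does not reproduce the
anomaly scoring. The `distinct_count_per_bucket` helper, the sample user names,
and the 60-second bucket width are assumptions made for illustration.

```python
from collections import defaultdict


def distinct_count_per_bucket(events, bucket_span_ms):
    """Map each bucket start (epoch ms) to the number of distinct values seen in it."""
    values = defaultdict(set)
    for timestamp_ms, value in events:
        bucket_start = (timestamp_ms // bucket_span_ms) * bucket_span_ms
        values[bucket_start].add(value)
    return {start: len(seen) for start, seen in values.items()}


# Hypothetical (timestamp, user) pairs; "alice" appears twice in the first bucket.
logins = [(0, "alice"), (5_000, "bob"), (30_000, "alice"), (61_000, "carol")]

# Assumed bucket width of 60 seconds.
distinct_users = distinct_count_per_bucket(logins, bucket_span_ms=60_000)
print(distinct_users)
```

The first bucket holds three events but only two distinct users, which is the
value a `distinct_count` detector on `user` would look at.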

.Example 7: Analyzing ports with the high_distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example7
{
  "analysis_config": {
    "detectors": [{
      "function": "high_distinct_count",
      "field_name": "dst_port",
      "over_field_name": "src_ip"
    }]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example detects instances of port scanning. When you use this function in a
detector in your {anomaly-job}, it models the distinct count of ports. It also
detects the `src_ip` values that connect to an unusually high number of different
`dst_port` values compared to other `src_ip` values.
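As an informal illustration of the quantity compared across the population, the
following Python sketch counts the distinct destination ports contacted by each
source IP. It is a simplification under stated assumptions: the
`distinct_ports_by_source` helper and the sample connections are hypothetical,
and the sketch does not model how the {anomaly-job} decides what is unusual.

```python
from collections import defaultdict


def distinct_ports_by_source(connections):
    """Map each source IP to the number of distinct destination ports it contacted."""
    ports = defaultdict(set)
    for src_ip, dst_port in connections:
        ports[src_ip].add(dst_port)
    return {src_ip: len(seen) for src_ip, seen in ports.items()}


# Hypothetical connection log: two ordinary clients and one host probing 100 ports.
connections = [
    ("10.0.0.1", 443),
    ("10.0.0.1", 443),
    ("10.0.0.2", 80),
    ("10.0.0.2", 443),
]
connections += [("10.0.0.9", port) for port in range(1, 101)]

port_counts = distinct_ports_by_source(connections)
print(port_counts)
```

In this sample, `10.0.0.9` contacts far more distinct ports than the other
sources, which is the kind of outlier a population analysis over `src_ip` is
meant to surface.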