[[ml-sum-functions]]
=== Sum Functions

The sum functions detect anomalies when the sum of a field in a bucket is anomalous.

If you want to monitor unusually high totals, use high-sided functions.

If you want to look at drops in totals, use low-sided functions.

If your data is sparse, use `non_null_sum` functions. Buckets without values are
ignored; buckets with a zero value are analyzed.

The {xpackml} features include the following sum functions:

* <<ml-sum,`sum`>>
* <<ml-high-sum,`high_sum`>>
* <<ml-low-sum,`low_sum`>>
* <<ml-nonnull-sum,`non_null_sum`>>
* <<ml-high-nonnull-sum,`high_non_null_sum`>>
* <<ml-low-nonnull-sum,`low_non_null_sum`>>

////
TBD: Incorporate from prelert docs?:
Input data may contain pre-calculated fields giving the total count of some value e.g. transactions per minute.
Ensure you are familiar with our advice on Summarization of Input Data, as this is likely to provide
a more appropriate method than using the sum function.
////

[float]
[[ml-sum]]
==== Sum

The `sum` function detects anomalies where the sum of a field in a bucket is
anomalous.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
models total expenses per employee for each cost center. For each time bucket,
it detects when an employee’s expenses are unusual for a cost center compared
to other employees.

[source,js]
--------------------------------------------------
{
  "function" : "sum",
  "field_name" : "expenses",
  "by_field_name" : "costcenter",
  "over_field_name" : "employee"
}
--------------------------------------------------
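
To show where such a detector sits within a job, the following sketch places
the same detector in the `analysis_config` object of a job configuration. The
`bucket_span` value and the `timestamp` time field are assumptions chosen for
illustration only; choose values that match your own data.

[source,js]
--------------------------------------------------
{
  "analysis_config" : {
    "bucket_span" : "1h",
    "detectors" : [
      {
        "function" : "sum",
        "field_name" : "expenses",
        "by_field_name" : "costcenter",
        "over_field_name" : "employee"
      }
    ]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
--------------------------------------------------

In this sketch, the sum of `expenses` would be calculated for each one-hour
bucket, so the bucket span determines the granularity at which totals are
compared.
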

[float]
[[ml-high-sum]]
==== High_sum

The `high_sum` function detects anomalies where the sum of a field in a bucket
is unusually high.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
models the total `cs_bytes`. It detects `cs_host` values that transfer unusually
high volumes compared to other `cs_host` values.

[source,js]
--------------------------------------------------
{
  "function" : "high_sum",
  "field_name" : "cs_bytes",
  "over_field_name" : "cs_host"
}
--------------------------------------------------

This example looks for volumes of data transferred from a client to a server on
the internet that are unusual compared to other clients. This scenario could be
useful to detect data exfiltration or to find users that are abusing internet
privileges.

[float]
[[ml-low-sum]]
==== Low_sum

The `low_sum` function detects anomalies where the sum of a field in a bucket
is unusually low.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.
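
As an illustration, a detector along the following lines would model the total
of a field and look for unusually low totals. The `sales_amount` and `store`
field names are hypothetical and are used here only to show the shape of the
configuration.

[source,js]
--------------------------------------------------
{
  "function" : "low_sum",
  "field_name" : "sales_amount",
  "by_field_name" : "store"
}
--------------------------------------------------

Under these assumptions, the detector would flag buckets in which the total
`sales_amount` for a store drops unusually low compared to that store's past
behavior.
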

[float]
[[ml-nonnull-sum]]
==== Non_null_sum

The `non_null_sum` function is useful if your data is sparse. Buckets without
values are ignored and buckets with a zero value are analyzed.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for this function.
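
As an illustration, a detector along the following lines would model the total
of a sparsely reported field. The `units_shipped` and `warehouse` field names
are hypothetical. The sketch uses `by_field_name` and omits `over_field_name`,
because population analysis is not applicable for this function.

[source,js]
--------------------------------------------------
{
  "function" : "non_null_sum",
  "field_name" : "units_shipped",
  "by_field_name" : "warehouse"
}
--------------------------------------------------

Under these assumptions, buckets in which a warehouse reports no
`units_shipped` values would be ignored, while buckets in which it reports a
total of zero would still be analyzed.
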

[float]
[[ml-high-nonnull-sum]]
==== High_non_null_sum

The `high_non_null_sum` function is useful if your data is sparse. Buckets
without values are ignored and buckets with a zero value are analyzed.
Use this function if you want to monitor unusually high totals.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for this function.

For example, if you use the following function in a detector in your job, it
models the total `amount_approved` for each employee. It ignores any buckets
where the amount is null. It detects employees who approve unusually high
amounts compared to their past behavior.

[source,js]
--------------------------------------------------
{
  "function" : "high_non_null_sum",
  "field_name" : "amount_approved",
  "by_field_name" : "employee"
}
--------------------------------------------------

//For this credit control system analysis, using non_null_sum will ignore
//periods where the employees are not active on the system.

[float]
[[ml-low-nonnull-sum]]
==== Low_non_null_sum

The `low_non_null_sum` function is useful if your data is sparse. Buckets
without values are ignored and buckets with a zero value are analyzed.
Use this function if you want to look at drops in totals.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for this function.
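
As an illustration, a detector along the following lines would look for drops
in the total of a sparsely reported field. The `payments_received` and
`merchant` field names are hypothetical and are used here only to show the
shape of the configuration.

[source,js]
--------------------------------------------------
{
  "function" : "low_non_null_sum",
  "field_name" : "payments_received",
  "by_field_name" : "merchant"
}
--------------------------------------------------

Under these assumptions, the detector would ignore buckets in which a merchant
has no `payments_received` values and would flag buckets in which the total is
unusually low compared to that merchant's past behavior.
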