OpenSearch/docs/en/ml/functions/sum.asciidoc

120 lines
4.1 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[ml-sum-functions]]
=== Sum Functions
The sum functions detect anomalies when the sum of a field in a bucket is anomalous.
If you want to monitor unusually high totals, use high-sided functions.
If want to look at drops in totals, use low-sided functions.
If your data is sparse, use `non_null_sum` functions. Buckets without values are
ignored; buckets with a zero value are analyzed.
The {xpackml} features include the following sum functions:
* xref:ml-sum[`sum`, `high_sum`, `low_sum`]
* xref:ml-nonnull-sum[`non_null_sum`, `high_non_null_sum`, `low_non_null_sum`]
////
TBD: Incorporate from prelert docs?:
Input data may contain pre-calculated fields giving the total count of some value e.g. transactions per minute.
Ensure you are familiar with our advice on Summarization of Input Data, as this is likely to provide
a more appropriate method to using the sum function.
////
[float]
[[ml-sum]]
==== Sum, High_sum, Low_sum
The `sum` function detects anomalies where the sum of a field in a bucket is
anomalous.
If you want to monitor unusually high sum values, use the `high_sum` function.
If you want to monitor unusually low sum values, use the `low_sum` function.
These functions support the following properties:
* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
For more information about those properties, see
{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
.Example 1: Analyzing total expenses with the sum function
[source,js]
--------------------------------------------------
{
"function" : "sum",
"field_name" : "expenses",
"by_field_name" : "costcenter",
"over_field_name" : "employee"
}
--------------------------------------------------
If you use this `sum` function in a detector in your job, it
models total expenses per employees for each cost center. For each time bucket,
it detects when an employees expenses are unusual for a cost center compared
to other employees.
.Example 2: Analyzing total bytes with the high_sum function
[source,js]
--------------------------------------------------
{
"function" : "high_sum",
"field_name" : "cs_bytes",
"over_field_name" : "cs_host"
}
--------------------------------------------------
If you use this `high_sum` function in a detector in your job, it
models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high
volumes compared to other `cs_hosts`. This example looks for volumes of data
transferred from a client to a server on the internet that are unusual compared
to other clients. This scenario could be useful to detect data exfiltration or
to find users that are abusing internet privileges.
[float]
[[ml-nonnull-sum]]
==== Non_null_sum, High_non_null_sum, Low_non_null_sum
The `non_null_sum` function is useful if your data is sparse. Buckets without
values are ignored and buckets with a zero value are analyzed.
If you want to monitor unusually high totals, use the `high_non_null_sum`
function.
If you want to look at drops in totals, use the `low_non_null_sum` function.
These functions support the following properties:
* `field_name` (required)
* `by_field_name` (optional)
* `partition_field_name` (optional)
For more information about those properties, see
{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for this function.
.Example 3: Analyzing employee approvals with the high_non_null_sum function
[source,js]
--------------------------------------------------
{
"function" : "high_non_null_sum",
"fieldName" : "amount_approved",
"byFieldName" : "employee"
}
--------------------------------------------------
If you use this `high_non_null_sum` function in a detector in your job, it
models the total `amount_approved` for each employee. It ignores any buckets
where the amount is null. It detects employees who approve unusually high
amounts compared to their past behavior.
//For this credit control system analysis, using non_null_sum will ignore
//periods where the employees are not active on the system.