[role="xpack"]
[[ml-functions]]
== Function reference

The {ml-features} include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.

When you create {anomaly-jobs}, you specify one or more detectors, which define
the type of analysis that needs to be done. If you are creating your job by
using {ml} APIs, you specify the functions in detector configuration objects.
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs.
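
As a rough sketch of what detector configuration objects can look like when a job is created through the API, the following Python snippet assembles a request body with two detectors. The field names (`responsetime`, `airline`, `timestamp`) and the bucket span are hypothetical values chosen for illustration, and the `build_job_body` helper is not part of any client library. Only the property names (`analysis_config`, `bucket_span`, `detectors`, `function`, `field_name`, `by_field_name`, `data_description`, `time_field`) come from the job configuration format.

```python
import json


def build_job_body(bucket_span, detectors, time_field):
    """Assemble a job request body from a list of detector dicts."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": detectors,
        },
        "data_description": {"time_field": time_field},
    }


# Hypothetical field names; replace them with fields from your own data.
detectors = [
    {"function": "count"},
    {"function": "mean", "field_name": "responsetime", "by_field_name": "airline"},
]

job_body = build_job_body("15m", detectors, "timestamp")
print(json.dumps(job_body, indent=2))
```

Each entry in `detectors` names exactly one function, so analyzing the same data in two ways means adding two detector objects.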

Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer low and high
variations (for example, `count`, `low_count`, and `high_count`). These variations
apply one-sided tests, detecting anomalies only when the values are low or
high, depending on which alternative is used.
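
The difference between two-sided and one-sided variations can be illustrated with a toy check. This is only a sketch: the fixed "typical range" below is a hypothetical stand-in for whatever the analysis has learned about the data, and `is_flagged` is an illustrative helper, not how the {ml-features} compute anomalies.

```python
def is_flagged(value, typical_low, typical_high, side="both"):
    """Toy check: is `value` outside a typical range on the tested side(s)?

    side="both" mimics a two-sided function such as count,
    side="low" mimics low_count, and side="high" mimics high_count.
    """
    too_low = value < typical_low
    too_high = value > typical_high
    if side == "low":
        return too_low
    if side == "high":
        return too_high
    if side == "both":
        return too_low or too_high
    raise ValueError(f"unknown side: {side!r}")


# Hypothetical typical range of 80 to 120 events per bucket.
for observed in (10, 100, 500):
    results = {s: is_flagged(observed, 80, 120, s) for s in ("both", "low", "high")}
    print(observed, results)
```

In this sketch an unusually low value is flagged by the two-sided and low checks but not by the high check, which mirrors why you might choose `low_count` when only drops in activity matter.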

You can specify a `summary_count_field_name` with any function except `metric`.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-aggregated. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized. In {kib}, use the
**summary_count_field_name** in advanced {anomaly-jobs}. Analyzing aggregated
input data provides a significant boost in performance. For more information, see
<<ml-configuring-aggregation>>.
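
To make the idea of pre-aggregated input concrete, the following sketch collapses raw event timestamps into one record per bucket and stores the number of summarized events in a count field. The `summarize` helper, the sample timestamps, and the field names `timestamp` and `doc_count` are assumptions made for this example. The only requirement taken from the text above is that the field named by `summary_count_field_name` holds the count of raw events behind each record.

```python
from collections import Counter


def summarize(event_times, bucket_seconds):
    """Collapse raw event timestamps (epoch seconds) into one record per bucket."""
    counts = Counter(t - t % bucket_seconds for t in event_times)
    return [
        {"timestamp": start, "doc_count": n}
        for start, n in sorted(counts.items())
    ]


# Hypothetical raw events, expressed as epoch seconds.
raw_events = [0, 10, 59, 60, 61, 185]
summary = summarize(raw_events, bucket_seconds=60)
print(summary)

# The job configuration then names the field that carries the event count.
analysis_config = {
    "bucket_span": "1m",
    "summary_count_field_name": "doc_count",
    "detectors": [{"function": "count"}],
}
```

Six raw events become three summary records here, which hints at where the performance gain comes from: the analysis reads far fewer records while still knowing how many events each one represents.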

If your data is sparse, there may be gaps in the data, which means you might have
empty buckets. You might want to treat these gaps as anomalies or you might want
to ignore them. Your decision depends on your use case and what is important
to you. It also depends on which functions you use. The `sum` and `count`
functions are strongly affected by empty buckets. For this reason, there are
`non_null_sum` and `non_zero_count` functions, which are tolerant to sparse data.
These functions effectively ignore empty buckets.
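
The effect of empty buckets can be seen in a small sketch. It is a toy illustration under stated assumptions: `bucket_counts` is a hypothetical helper, the timestamps are made up, and dropping zero-count buckets only approximates what "effectively ignore empty buckets" means for `non_zero_count`.

```python
def bucket_counts(event_times, bucket_seconds, start, end):
    """Count events in every bucket from start to end, including empty ones."""
    counts = {b: 0 for b in range(start, end, bucket_seconds)}
    for t in event_times:
        if start <= t < end:
            counts[t - (t - start) % bucket_seconds] += 1
    return [counts[b] for b in sorted(counts)]


# Hypothetical sparse data: nothing arrives in the second and third buckets.
events = [5, 20, 185, 190, 200]
all_buckets = bucket_counts(events, 60, 0, 240)
non_empty = [c for c in all_buckets if c > 0]

print(all_buckets)  # a count-style view, zeros included
print(non_empty)    # a non_zero_count-style view, empty buckets dropped
```

In the first series the zeros pull the typical value down and can themselves look unusual. In the second series only buckets that contain data contribute, which is usually what you want when gaps are expected.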

* <<ml-count-functions>>
* <<ml-geo-functions>>
* <<ml-info-functions>>
* <<ml-metric-functions>>
* <<ml-rare-functions>>
* <<ml-sum-functions>>
* <<ml-time-functions>>

include::functions/count.asciidoc[]

include::functions/geo.asciidoc[]

include::functions/info.asciidoc[]

include::functions/metric.asciidoc[]

include::functions/rare.asciidoc[]

include::functions/sum.asciidoc[]

include::functions/time.asciidoc[]