[role="xpack"]
[[ml-functions]]
== Function reference

The {ml-features} include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.

When you create jobs, you specify one or more detectors, which define the type of
analysis that needs to be done. If you are creating your job by using {ml} APIs,
you specify the functions in
{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs.
//For a demonstration of creating jobs in {kib}, see <<ml-getting-started>>.
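
For example, a detector is one entry in the `detectors` array of the job's
`analysis_config`. The following fragment is a minimal sketch, not a complete
job definition. The `responsetime` and `airline` field names are hypothetical
values chosen for illustration:

[source,js]
----
{
  "analysis_config": {
    "detectors": [
      {
        "function": "mean", <1>
        "field_name": "responsetime", <2>
        "by_field_name": "airline" <3>
      }
    ]
  }
}
----
<1> The analysis function that this detector applies.
<2> Hypothetical name of the field to analyze.
<3> Hypothetical name of the field used to split the analysis.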

Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer low and high
variations (for example, `count`, `low_count`, and `high_count`). The low and
high variations apply one-sided tests, detecting anomalies only when the values
are low or high, depending on which alternative is used.
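
As a sketch of the difference, the following detector uses the one-sided
`high_count` function, so it reports only unusually high event counts.
Replacing it with `low_count` would report only unusually low counts, and
`count` would report both. The `error_code` field name is hypothetical:

[source,js]
----
{
  "function": "high_count",
  "by_field_name": "error_code" <1>
}
----
<1> Hypothetical name of the field used to split the analysis.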

You can specify a `summary_count_field_name` with any function except `metric`.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-aggregated. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized. In {kib}, use the
**summary_count_field_name** in advanced jobs. Analyzing aggregated input data
provides a significant boost in performance. For more information, see
<<ml-configuring-aggregation>>.
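
A minimal sketch of a job fragment that analyzes pre-aggregated data might look
like the following. It assumes that each input document carries a hypothetical
`events_per_min` field that holds the number of raw events that the document
summarizes:

[source,js]
----
{
  "analysis_config": {
    "summary_count_field_name": "events_per_min", <1>
    "detectors": [
      { "function": "count" } <2>
    ]
  }
}
----
<1> Hypothetical name of the field that contains the pre-aggregated event count.
<2> Any function except `metric` can be used here.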

If your data is sparse, there might be gaps in the data, which means you might
have empty buckets. You might want to treat these gaps as anomalies or you might
want to ignore them. Your decision depends on your use case and what is
important to you. It also depends on which functions you use. The `sum` and
`count` functions are strongly affected by empty buckets. For this reason, there
are `non_null_sum` and `non_zero_count` functions, which are tolerant of sparse
data. These functions effectively ignore empty buckets.
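
For example, the following detectors are a sketch of how you might analyze
sparse data while ignoring empty buckets. The `host` and `bytes` field names
are hypothetical:

[source,js]
----
{
  "detectors": [
    { "function": "non_zero_count", "by_field_name": "host" }, <1>
    { "function": "non_null_sum", "field_name": "bytes", "by_field_name": "host" } <2>
  ]
}
----
<1> Models the event count for each `host` and ignores buckets where the count
is zero.
<2> Models the sum of `bytes` for each `host` and ignores buckets where the
field has no values.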

* <<ml-count-functions>>
* <<ml-geo-functions>>
* <<ml-info-functions>>
* <<ml-metric-functions>>
* <<ml-rare-functions>>
* <<ml-sum-functions>>
* <<ml-time-functions>>

include::functions/count.asciidoc[]

include::functions/geo.asciidoc[]

include::functions/info.asciidoc[]

include::functions/metric.asciidoc[]

include::functions/rare.asciidoc[]

include::functions/sum.asciidoc[]

include::functions/time.asciidoc[]