[float]
[[ml-functions]]
=== Analytical Functions

The {xpackml} features include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.

When you create jobs, you specify one or more detectors, which define the type of
analysis that needs to be done. If you are creating your job by using {ml} APIs,
you specify the functions in <<ml-detectorconfig,Detector Configuration Objects>>.
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs. For a demonstration of creating jobs in {kib}, see <<ml-getting-started>>.
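
As a rough illustration of how functions are named in detectors when you use the
APIs, the following Python sketch assembles two detector configuration objects
and prints them as JSON. The `make_detector` helper and the `responsetime` field
are hypothetical names chosen for this example; only the `function`,
`field_name`, and `detector_description` properties and the `count` and `mean`
functions are taken from the {ml} APIs. A complete job needs additional settings
that are not shown here.

```python
import json


def make_detector(function, field_name=None, description=None):
    """Build one detector configuration object as a plain dict.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    detector = {"function": function}
    if field_name is not None:
        detector["field_name"] = field_name
    if description is not None:
        detector["detector_description"] = description
    return detector


# "responsetime" is an assumed field name; substitute a numeric field
# from your own data.
detectors = [
    make_detector("count", description="Event rate"),
    make_detector("mean", field_name="responsetime",
                  description="Mean response time"),
]

# Only the detectors are shown; other job settings are omitted on purpose.
analysis_config = {"detectors": detectors}
print(json.dumps({"analysis_config": analysis_config}, indent=2))
```

Each entry in `detectors` names exactly one function, which is why a job that
needs several kinds of analysis lists several detectors.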
//TBD: Determine what these fields are called in Kibana, for people who aren't using APIs
////
TBD: Integrate from prelert docs?:
By default, temporal (time-based) analysis is invoked, unless you also specify an
`over_field_name`, which shifts the analysis to be population- or peer-based.
When you specify `by_field_name` with a function, the analysis considers whether
there is an anomaly for one or more specific values of `by_field_name`.
NOTE: Some functions cannot be used with a `by_field_name` or `over_field_name`.
You can specify a `partition_field_name` with any function. When this is used,
the analysis is replicated for every distinct value of `partition_field_name`.
You can specify a `summary_count_field_name` with any function except metric.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-summarized. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized.
Some functions can benefit from overlapping buckets. This improves the overall
accuracy of the results, but at the cost of a two-bucket delay in seeing the results.
////
Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer low and high
variations (for example, `count`, `low_count`, and `high_count`). These variations
apply one-sided tests, detecting anomalies only when the values are low or
high, depending on which alternative is used.
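
The difference between two-sided and one-sided tests can be sketched with a toy
example in Python. This is an assumption-laden illustration only: the z-score
rule, the `threshold` value, and the `is_anomalous` helper are hypothetical
stand-ins and do not represent how the {ml} features model data or score
anomalies. Only the function names `count`, `low_count`, and `high_count` come
from the text above.

```python
# Which side(s) of the expected range each function variant reacts to.
SIDES = {"count": "both", "low_count": "low", "high_count": "high"}


def is_anomalous(value, mean, std, function="count", threshold=3.0):
    """Toy z-score test (a stand-in, not the actual ML model)."""
    side = SIDES[function]
    z = (value - mean) / std
    if side == "low":
        return z < -threshold      # one-sided: unusually low values only
    if side == "high":
        return z > threshold       # one-sided: unusually high values only
    return abs(z) > threshold      # two-sided: either direction


# Assumed baseline of 100 events per bucket with a standard deviation of 10.
for observed in (20, 105, 180):
    flags = {name: is_anomalous(observed, 100, 10, name) for name in SIDES}
    print(observed, flags)
```

In this sketch, an unusually low value is reported by `count` and `low_count`
but not by `high_count`, and an unusually high value is reported by `count` and
`high_count` but not by `low_count`.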
////
The table below provides a high-level summary of the analytical functions
provided by the API. Each function is described in detail on the following
pages. Note that the examples on these pages use single Detector Configuration
objects.
////
* <<ml-count-functions>>
* <<ml-geo-functions>>
* <<ml-info-functions>>
* <<ml-metric-functions>>
* <<ml-rare-functions>>
* <<ml-sum-functions>>
* <<ml-time-functions>>