
[float]
[[ml-functions]]
=== Analytical Functions

The {xpackml} features include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.

When you create jobs, you specify one or more detectors, which define the type of
analysis that needs to be done. If you are creating your job by using {ml} APIs,
you specify the functions in <<ml-detectorconfig,Detector Configuration Objects>>.
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs. For a demonstration of creating jobs in {kib}, see <<ml-getting-started>>.
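
For example, the following sketch of a job creation request uses the `mean`
function in a single detector. The job ID, field names, and bucket span are
illustrative only:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/example_response_times
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "detector_description": "Mean response time",
        "function": "mean",
        "field_name": "responsetime"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----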

Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer low and high
variations (for example, `count`, `low_count`, and `high_count`). These variations
apply one-sided tests, detecting anomalies only when the values are low or
high, depending on which alternative is used.
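
For example, to detect only unusually high event rates and ignore unusually low
ones, you could use the `high_count` function in a detector such as the
following sketch (the description is illustrative):

[source,js]
----
{
  "detector_description": "Unusually high event rate",
  "function": "high_count"
}
----
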
//For some functions, you can optionally specify a field name in the
//`by_field_name` property. The analysis then considers whether there is an
//anomaly for one of more specific values of that field. In {kib}, use the
//**Key Fields** field in multi-metric jobs or the **by_field_name** field in
//advanced jobs.
////
TODO: Per Sophie, "This is incorrect... Split Data refers to a partition_field_name. Over fields can only be added in Adv Config...
Can you please remove the explanations for by/over/partition fields from the documentation for analytical functions. It's a complex topic and will be easier to review in a separate exercise."
////
//For some functions, you can also optionally specify a field name in the
//`over_field_name` property. This property shifts the analysis to be population-
//or peer-based and uses the field to split the data. In {kib}, use the
//**Split Data** field in multi-metric jobs or the **over_field_name** field in
//advanced jobs.
//You can specify a `partition_field_name` with any function. The analysis is then
//segmented with completely independent baselines for each value of that field.
//In {kib}, use the **partition_field_name** field in advanced jobs.

You can specify a `summary_count_field_name` with any function except `metric`.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-aggregated. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized. In {kib}, use the
**summary_count_field_name** field in advanced jobs. Analyzing aggregated input
data provides a significant boost in performance. For more information, see
<<ml-configuring-aggregation>>.
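
For example, the following sketch of an `analysis_config` counts pre-aggregated
events whose per-bucket totals arrive in a hypothetical `events_per_min` field:

[source,js]
----
{
  "analysis_config": {
    "bucket_span": "10m",
    "summary_count_field_name": "events_per_min",
    "detectors": [
      {
        "function": "count"
      }
    ]
  }
}
----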

If your data is sparse, there may be gaps in the data, which means you might
have empty buckets. You might want to treat these as anomalies or you might want
these gaps to be ignored. Your decision depends on your use case and what is
important to you. It also depends on which functions you use. The `sum` and
`count` functions are strongly affected by empty buckets. For this reason, there
are `non_null_sum` and `non_zero_count` functions, which are tolerant of sparse
data. These functions effectively ignore empty buckets.
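
For example, if events arrive only intermittently, a detector such as the
following sketch uses `non_zero_count` so that empty buckets are not reported as
anomalously low counts (the description is illustrative):

[source,js]
----
{
  "detector_description": "Event rate, ignoring empty buckets",
  "function": "non_zero_count"
}
----
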
////
Some functions can benefit from overlapping buckets. This improves the overall
accuracy of the results but at the cost of a 2 bucket delay in seeing the results.
The table below provides a high-level summary of the analytical functions provided by the API. Each of the functions is described in detail over the following pages. Note the examples given in these pages use single Detector Configuration objects.
////
* <<ml-count-functions>>
* <<ml-geo-functions>>
* <<ml-info-functions>>
* <<ml-metric-functions>>
* <<ml-rare-functions>>
* <<ml-sum-functions>>
* <<ml-time-functions>>