[role="xpack"]
[[ml-configuring-pop]]
=== Performing population analysis

Entities or events in your data can be considered anomalous when:

* Their behavior changes over time, relative to their own previous behavior, or
* Their behavior differs from that of other entities in a specified population.

The latter method of detecting outliers is known as _population analysis_. The
{ml} analytics build a profile of what a "typical" user, machine, or other entity
does over a specified time period and then identify when one is behaving
abnormally compared to the population.

This type of analysis is most useful when the behavior of the population as a
whole is mostly homogeneous and you want to identify outliers. In general,
population analysis is not useful when members of the population inherently
have vastly different behavior. You can, however, segment your data into groups
that behave similarly and run these as separate jobs. For example, you can use a
query filter in the {dfeed} to segment your data or you can use the
`partition_field_name` to split the analysis for the different groups.

Population analysis scales well and has a lower resource footprint than
individual analysis of each series. For example, you can analyze populations
of hundreds of thousands or millions of entities.

To specify the population, use the `over_field_name` property. For example:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/population
{
  "description" : "Population analysis",
  "analysis_config" : {
    "bucket_span":"15m",
    "influencers": [
      "clientip"
    ],
    "detectors": [
      {
        "function": "mean",
        "field_name": "bytes",
        "over_field_name": "clientip" <1>
      }
    ]
  },
  "data_description" : {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
----------------------------------
//CONSOLE
// TEST[skip:needs-licence]
<1> This `over_field_name` property indicates that the metrics for each client
  (as identified by their IP address) are analyzed relative to other clients
  in each bucket.
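
To split the analysis for groups that behave differently, as described earlier,
you can add `partition_field_name` to the same detector. The following is a
minimal sketch rather than a recommended configuration: the job ID
`population_by_country` is hypothetical, and the `geo.src` field is an
assumption about your data. Substitute a field that separates your entities
into groups that behave similarly.

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/population_by_country
{
  "description" : "Population analysis split by source country",
  "analysis_config" : {
    "bucket_span":"15m",
    "influencers": [
      "clientip",
      "geo.src"
    ],
    "detectors": [
      {
        "function": "mean",
        "field_name": "bytes",
        "over_field_name": "clientip",
        "partition_field_name": "geo.src" <1>
      }
    ]
  },
  "data_description" : {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]
<1> Assumed field name. With this property, each client is compared only to
  other clients that share the same `geo.src` value.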

If your data is stored in {es}, you can use the population job wizard in {kib}
to create a job with these same properties. For example, if you add the sample
web logs in {kib}, you can use the following job settings in the population job
wizard:

[role="screenshot"]
image::images/ml-population-job.jpg["Job settings in the population job wizard"]
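
If you create the job with the API rather than the wizard, you also need a
{dfeed} before the job can analyze data that is stored in {es}. The following
is a sketch under stated assumptions, not a definitive configuration: the
{dfeed} ID `datafeed-population` is hypothetical, the index name
`kibana_sample_data_logs` assumes that you added the {kib} sample web logs, and
the `term` query on `geo.src` only illustrates how a query filter can restrict
the analysis to one segment of the data.

[source,js]
----------------------------------
PUT _ml/datafeeds/datafeed-population
{
  "job_id": "population",
  "indices": ["kibana_sample_data_logs"], <1>
  "query": {
    "term": { "geo.src": "US" } <2>
  }
}

POST _ml/anomaly_detectors/population/_open

POST _ml/datafeeds/datafeed-population/_start
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]
<1> Assumed index name. Use the index or indices that contain your data.
<2> Illustrative filter on an assumed field. Omit the `query` property to
  analyze all documents.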

After you open the job and start the {dfeed} or supply data to the job, you can
view the results in {kib}. For example, you can view the results in the
**Anomaly Explorer**:

[role="screenshot"]
image::images/ml-population-results.jpg["Population analysis results in the Anomaly Explorer"]

As in this case, the results are often quite sparse. There might be just a few
data points for the selected time period. Population analysis is particularly
useful when you have many entities and the data for specific entities is sporadic
or sparse.

If you click on a section in the timeline or swimlanes, you can see more
details about the anomalies:

[role="screenshot"]
image::images/ml-population-anomaly.jpg["Anomaly details for a specific user"]

In this example, the client IP address `29.64.62.83` received a high volume of
bytes on the date and time shown. This event is anomalous because the mean is
three times higher than the expected behavior of the population.
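
You can also retrieve the anomalies through the API instead of {kib}. As a
minimal sketch, the following request asks for the anomaly records of the
`population` job, sorted so that the highest scores come first. The
`record_score` threshold of `75` is an arbitrary value chosen for illustration.

[source,js]
----------------------------------
GET _ml/anomaly_detectors/population/results/records
{
  "sort": "record_score",
  "desc": true,
  "record_score": 75 <1>
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]
<1> Returns only records with a score at or above this value. Adjust or remove
  it to suit your data.

For population jobs, each returned record identifies the anomalous entity in
its `over_field_value` property.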