[DOCS] Add configuration information for population analysis (elastic/x-pack-elasticsearch#1653)

* [DOCS] Add configuration information for population analysis
* [DOCS] Add ML population analysis examples
* [DOCS] Address feedback for population analysis
* [DOCS] More feedback on population analysis

Original commit: elastic/x-pack-elasticsearch@ffa2bfeed9
2025-03-27 10:28:28 +00:00 · 2017-06-23 10:53:16 -07:00 · 2017-06-23 10:53:16 -07:00 · 527b789e6f
commit 527b789e6f
parent 00c40c8299
5 changed files with 88 additions and 0 deletions
--- a/docs/en/ml/configuring.asciidoc
+++ b/docs/en/ml/configuring.asciidoc
@@ -33,3 +33,4 @@ The scenarios in this section describe some best practices for generating useful

 include::aggregations.asciidoc[]
 include::categories.asciidoc[]
+include::populations.asciidoc[]
--- a/docs/en/ml/images/ml-population-anomaly.jpg
+++ b/docs/en/ml/images/ml-population-anomaly.jpg
--- a/docs/en/ml/images/ml-population-job.jpg
+++ b/docs/en/ml/images/ml-population-job.jpg
--- a/docs/en/ml/images/ml-population-results.jpg
+++ b/docs/en/ml/images/ml-population-results.jpg
--- a/docs/en/ml/populations.asciidoc
+++ b/docs/en/ml/populations.asciidoc
@@ -0,0 +1,87 @@
+[[ml-configuring-pop]]
+=== Performing Population Analysis
+
+Entities or events in your data can be considered anomalous when:
+
+* Their behavior changes over time, relative to their own previous behavior, or
+* Their behavior differs from that of other entities in a specified population.
+
+The latter method of detecting outliers is known as _population analysis_. The
+{ml} analytics build a profile of what a "typical" user, machine, or other entity
+does over a specified time period and then identify when one is behaving
+abnormally compared to the population.
+
+This type of analysis is most useful when the behavior of the population as a
+whole is mostly homogeneous and you want to identify outliers. In general,
+population analysis is not useful when members of the population inherently
+have vastly different behavior. You can, however, segment your data into groups
+that behave similarly and run these as separate jobs. For example, you can use a
+query filter in the {dfeed} to segment your data or you can use the
+`partition_field_name` to split the analysis for the different groups.
+
+Population analysis scales well and has a lower resource footprint than
+individual analysis of each series. For example, you can analyze populations
+of hundreds of thousands or millions of entities.
+
+To specify the population, use the `over_field_name` property. For example:
+
+[source,js]
+----------------------------------
+PUT _xpack/ml/anomaly_detectors/population
+{
+  "description" : "Population analysis",
+  "analysis_config" : {
+    "bucket_span":"10m",
+    "influencers": [
+      "username"
+    ],
+    "detectors": [
+      {
+        "function": "mean",
+        "field_name": "bytesSent",
+        "over_field_name": "username" <1>
+      }
+    ]
+  },
+  "data_description" : {
+    "time_field":"@timestamp",
+    "time_format": "epoch_ms"
+  }
+}
+----------------------------------
+//CONSOLE
+<1> This `over_field_name` property indicates that the metrics for each user
+  (as identified by their `username` value) are analyzed relative to other users
+  in each bucket.
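+
+As noted earlier, you can also split the analysis so that each group of
+similar entities forms its own population. The following request is a minimal
+sketch of that approach, not a tested configuration. The job ID
+`population_by_department` and the `department` field are hypothetical names
+chosen for illustration; substitute a field from your own data that identifies
+groups with similar behavior:
+
+[source,js]
+----------------------------------
+PUT _xpack/ml/anomaly_detectors/population_by_department
+{
+  "description" : "Population analysis split by department",
+  "analysis_config" : {
+    "bucket_span": "10m",
+    "influencers": [
+      "username",
+      "department"
+    ],
+    "detectors": [
+      {
+        "function": "mean",
+        "field_name": "bytesSent",
+        "over_field_name": "username",
+        "partition_field_name": "department" <1>
+      }
+    ]
+  },
+  "data_description" : {
+    "time_field": "@timestamp",
+    "time_format": "epoch_ms"
+  }
+}
+----------------------------------
+<1> With this hypothetical `partition_field_name` value, each user is compared
+  only to other users who have the same `department` value.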
+
+//TO-DO: Per sophiec20 "Perhaps add the datafeed config and add a query filter to
+//include only workstations as servers and printers would behave differently
+//from the population
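+
+If the job reads its data from a {dfeed}, a query filter can restrict the
+population to entities that are expected to behave similarly. The following
+request is a sketch under stated assumptions: the {dfeed} ID
+`datafeed-population`, the `network-logs` index, the `doc` type, and the
+`deviceType` field are all hypothetical and must be replaced with names from
+your own data. The exact property names can also vary by release; for example,
+some earlier releases use `indexes` rather than `indices`.
+
+[source,js]
+----------------------------------
+PUT _xpack/ml/datafeeds/datafeed-population
+{
+  "job_id": "population",
+  "indices": ["network-logs"],
+  "types": ["doc"],
+  "query": {
+    "term": { "deviceType": "workstation" } <1>
+  }
+}
+----------------------------------
+<1> This hypothetical filter passes only documents from workstations to the
+  job, so that servers and printers, which would behave differently, are not
+  part of the population.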
+
+If your data is stored in {es}, you can create an advanced job with these same
+properties. In particular, you specify the `over_field_name` property when you
+add detectors:
+
+[role="screenshot"]
+image::images/ml-population-job.jpg["Create a detector for population analysis"]
+
+After you open the job and start the {dfeed} or supply data to the job, you can
+view the results in {kib}. For example:
+
+[role="screenshot"]
+image::images/ml-population-results.jpg["Population analysis results in the Anomaly Explorer"]
+
+As in this case, the results are often quite sparse. There might be just a few
+data points for the selected time period. Population analysis is particularly
+useful when you have many entities and the data for specific entities is sporadic
+or sparse.
+
+If you click on a section in the timeline or swim lanes, you can see more
+details about the anomalies:
+
+[role="screenshot"]
+image::images/ml-population-anomaly.jpg["Anomaly details for a specific user"]
+
+In this example, the user identified as `antonette` sent a high volume of bytes
+on the date and time shown. This event is anomalous because the mean for this
+user is two times higher than the value expected for the population.