[DOCS] Add multi-metric job creation to ML getting started tutorial (elastic/x-pack-elasticsearch#1451)

* [DOCS] Getting started with multi-metric jobs

* [DOCS] More work on ML getting started with multi-metric jobs

* [DOCS] Add ML getting started with multi-metric jobs screenshot

* [DOCS] Add ML getting started information about influencers

* [DOCS] Getting started with multi-metric jobs

* [DOCS] Fix ML getting started build error

* [DOCS] Add ML getting started multi-metric snapshots

* [DOCS] Add screenshots and next steps to ML Getting Started tutorial

* [DOCS] Clarified anomaly scores in ML Getting Started pages

* [DOCS] Addressed ML getting started feedback

* [DOCS] Fix ML getting started links

Original commit: elastic/x-pack-elasticsearch@a7e80cfabf
This commit is contained in:
Lisa Cawley 2017-06-27 13:30:15 -07:00 committed by lcawley
parent b710f5906f
commit 6d4be0e5d3
10 changed files with 304 additions and 111 deletions

View File

@@ -0,0 +1,218 @@
[[ml-gs-multi-jobs]]
=== Creating Multi-metric Jobs
The multi-metric job wizard in {kib} provides a simple way to create more
complex jobs with multiple detectors. For example, in the single metric job, you
were tracking total requests versus time. You might also want to track other
metrics like average response time or the maximum number of denied requests.
Instead of creating jobs for each of those metrics, you can combine them in a
multi-metric job.
You can also use multi-metric jobs to split a single time series into multiple
time series based on a categorical field. For example, you can split the data
based on its hostnames, locations, or users. Each time series is modeled
independently. By looking at temporal patterns on a per-entity basis, you might
spot things that would otherwise have been hidden in the lumped view.
Conceptually, you can think of this as running many independent single metric
jobs. By bundling them together in a multi-metric job, however, you can see an
overall score and shared influencers for all the metrics and all the entities in
the job. Multi-metric jobs therefore scale better than having many independent
single metric jobs and provide better results when you have influencers that are
shared across the detectors.
The sample data for this tutorial contains information about the requests that
are received by various applications and services in a system. Let's assume that
you want to monitor the requests received and the response time. In particular,
you might want to track those metrics on a per service basis to see if any
services have unusual patterns.
To create a multi-metric job in {kib}:
. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.
. Click **Machine Learning** in the side navigation, then click **Create new job**. +
+
--
[role="screenshot"]
image::images/ml-kibana.jpg[Job Management]
--
. Click **Create multi metric job**. +
+
--
[role="screenshot"]
image::images/ml-create-job2.jpg["Create a multi metric job"]
--
. Click the `server-metrics` index. +
+
--
[role="screenshot"]
image::images/ml-gs-index.jpg["Select an index"]
--
. Configure the job by providing the following job settings: +
+
--
[role="screenshot"]
image::images/ml-gs-multi-job.jpg["Create a new job from the server-metrics index"]
--
.. For the **Fields**, select `high mean(response)` and `sum(total)`. This
creates two detectors and specifies the analysis function and field that each
detector uses. The first detector uses the high mean function to detect
unusually high average values for the `response` field in each bucket. The
second detector uses the sum function to detect when the sum of the `total`
field is anomalous in each bucket. For more information about any of the
analytical functions, see <<ml-functions>>.
.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into. As was the case in the single
metric example, this value has a significant impact on the analysis. When you're
creating jobs for your own data, you might need to experiment with different
bucket spans depending on the frequency of the input data, the duration of
typical anomalies, and the frequency at which alerting is required.
.. For the **Split Data**, select `service`. When you specify this
option, the analysis is segmented such that you have completely independent
baselines for each distinct value of this field.
//TBD: What is the importance of having separate baselines?
There are seven unique service keyword values in the sample data. Thus for each
of the seven services, you will see the high mean response metrics and sum
total metrics. +
+
--
NOTE: If you are creating a job by using the {ml} APIs or the advanced job
wizard in {kib}, you can accomplish this split by using the
`partition_field_name` property, as shown in the API sketch at the end of this
procedure.
--
.. For the **Key Fields**, select `host`. Note that the `service` field
is also automatically selected because you used it to split the data. These key
fields are also known as _influencers_.
When you identify a field as an influencer, you are indicating that you think
it contains information about someone or something that influences or
contributes to anomalies.
+
--
[TIP]
========================
Picking an influencer is strongly recommended for the following reasons:
* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results
The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.
As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.
========================
//TBD: Is this something you can determine later from looking at results and
//update your job with if necessary? Is it all post-processing or does it affect
//the ongoing modeling?
--
. Click **Use full server-metrics data**. Two graphs are generated for each
`service` value, which represent the high mean `response` values and
sum `total` values over time.
//TBD What is the use of the document count table?
. Provide a name for the job, for example `response_requests_by_app`. The job
name must be unique in your cluster. You can also optionally provide a
description of the job.
. Click **Create Job**. As the job is created, the graphs are updated to give a
visual representation of the progress of {ml} as the data is processed. For
example:
+
--
[role="screenshot"]
image::images/ml-gs-job2-results.jpg["Job results updating as data is processed"]
--
TIP: The `create_multi_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_multi_metric.sh[create_multi_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
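If you want a rough idea of what such a job and {dfeed} look like in API form,
the following requests are a simplified, illustrative sketch only. They are not
the exact output of the wizard or the script: details such as the `bucket_span`
format, the `indexes` property name, and the `@timestamp` time field are
assumptions that vary by version and data set, so treat the downloadable script
as the reference.
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/response_requests_by_app
{
  "description": "Multi-metric job for the server-metrics sample data",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "high_mean", "field_name": "response", "partition_field_name": "service" },
      { "function": "sum", "field_name": "total", "partition_field_name": "service" }
    ],
    "influencers": [ "service", "host" ]
  },
  "data_description": {
    // the time field name is an assumption; use the date field from your index mapping
    "time_field": "@timestamp"
  }
}
PUT _xpack/ml/datafeeds/datafeed-response_requests_by_app
{
  "job_id": "response_requests_by_app",
  "indexes": [ "server-metrics" ]
}
POST _xpack/ml/anomaly_detectors/response_requests_by_app/_open
POST _xpack/ml/datafeeds/datafeed-response_requests_by_app/_start
--------------------------------------------------
Opening the job and starting the {dfeed} roughly correspond to what happens
when you click **Create Job** and choose to use the full `server-metrics` data
in the wizard.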
[[ml-gs-job2-analyze]]
=== Exploring Multi-metric Job Results
The {xpackml} features analyze the input stream of data, model its behavior, and
perform analysis based on the two detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.
You can use the **Anomaly Explorer** in {kib} to view the analysis results:
[role="screenshot"]
image::images/ml-gs-job2-explorer.jpg["Job results in the Anomaly Explorer"]
You can explore the overall anomaly time line, which shows the maximum anomaly
score for each section in the specified time period. You can change the time
period by using the time picker in the {kib} toolbar. Note that the sections in
this time line do not necessarily correspond to the bucket span. If you change
the time period, the sections change size too. The smallest possible size for
these sections is a bucket. If you specify a large time period, the sections can
span many buckets.
On the left is a list of the top influencers for all of the detected anomalies
in that same time period. The list includes maximum anomaly scores, which in
this case are aggregated for each influencer, for each bucket, across all
detectors. There is also a total sum of the anomaly scores for each influencer.
You can use this list to help you narrow down the contributing factors and focus
on the most anomalous entities.
If your job contains influencers, you can also explore swim lanes that
correspond to the values of an influencer. In this example, the swim lanes
correspond to the values for the `service` field that you used to split the data.
Each lane represents a unique application or service name. Since you specified
the `host` field as an influencer, you can also optionally view the results in
swim lanes for each host name:
[role="screenshot"]
image::images/ml-gs-job2-explorer-host.jpg["Job results sorted by host"]
By default, the swim lanes are ordered by their maximum anomaly score values.
You can click on the sections in the swim lane to see details about the
anomalies that occurred in that time interval.
NOTE: The anomaly scores that you see in each section of the **Anomaly Explorer**
might differ slightly. This disparity occurs because for each job we generate
bucket results, influencer results, and record results. Anomaly scores are
generated for each type of result. The anomaly timeline uses the bucket-level
anomaly scores. The list of top influencers uses the influencer-level anomaly
scores. The list of anomalies uses the record-level anomaly scores. For more
information about these different result types, see
{ref}/ml-results-resource.html[Results Resources].
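If you prefer to work with these result types programmatically, you can
retrieve each of them through the results APIs. The following requests are a
minimal sketch that assumes the job name used earlier in this tutorial and
omits optional parameters such as time ranges and score thresholds:
[source,js]
--------------------------------------------------
// Bucket-level results (used by the anomaly timeline)
GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/buckets
// Influencer-level results (used by the top influencers list)
GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/influencers
// Record-level results (used by the anomalies table)
GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/records
--------------------------------------------------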
Click on a section in the swim lanes to obtain more information about the
anomalies in that time period. For example, click on the red section in the swim
lane for `server_2`:
[role="screenshot"]
image::images/ml-gs-job2-explorer-anomaly.jpg["Job results for an anomaly"]
You can see exact times when anomalies occurred and which detectors or metrics
caught the anomaly. Also note that because you split the data by the `service`
field, you see separate charts for each applicable service. In particular, you
see charts for each service for which there is data on the specified host in the
specified time interval.
Below the charts, there is a table that provides more information, such as the
typical and actual values and the influencers that contributed to the anomaly.
[role="screenshot"]
image::images/ml-gs-job2-explorer-table.jpg["Job results table"]
Notice that there are anomalies for both detectors, that is to say for both the
`high_mean(response)` and the `sum(total)` metrics in this time interval. By
investigating multiple metrics in a single job, you might see relationships
between events in your data that would otherwise be overlooked.

View File

@@ -0,0 +1,55 @@
[[ml-gs-next]]
=== Next Steps
By completing this tutorial, you've learned how you can detect anomalous
behavior in a simple set of sample data. You created single and multi-metric
jobs in {kib}, which creates and opens jobs and creates and starts {dfeeds} for
you under the covers. You examined the results of the {ml} analysis in the
**Single Metric Viewer** and **Anomaly Explorer** in {kib}.
If you want to learn about advanced job options, you might be interested in
the following video tutorial:
https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning Lab 3 - Detect Outliers in a Population].
If you intend to use {ml} APIs in your applications, a good next step might be
to learn about the APIs by retrieving information about these sample jobs.
For example, the following APIs retrieve information about the jobs and {dfeeds}.
[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors
GET _xpack/ml/datafeeds
--------------------------------------------------
// CONSOLE
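You can also check the state and basic usage statistics of the sample jobs and
{dfeeds} with the stats APIs, for example:
[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/_stats
GET _xpack/ml/datafeeds/_stats
--------------------------------------------------
// CONSOLE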
For more information about the {ml} APIs, see <<ml-api-quickref>>.
Ultimately, the next step is to start applying {ml} to your own data.
As mentioned in <<ml-gs-data>>, there are three things to consider when you're
thinking about where {ml} will be most impactful:
. It must be time series data.
. It should be information that contains key performance indicators for the
health, security, or success of your business or system. The better you know the
data, the quicker you will be able to create jobs that generate useful
insights.
. Ideally, the data is located in {es} and you can therefore create a {dfeed}
that retrieves data in real time. If your data is outside of {es}, you
cannot use {kib} to create your jobs and you cannot use {dfeeds}. Machine
learning analysis is still possible, however, by using APIs to create and manage
jobs and to post data to them.
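For example, a minimal sketch of that API-only flow might look like the
following. The job name `my_custom_job` and the document fields are
hypothetical; they must match the job configuration you created:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/my_custom_job/_open
// The post data API accepts newline-delimited JSON documents
POST _xpack/ml/anomaly_detectors/my_custom_job/_data
{"time":"2017-04-01T00:00:00Z","total":40}
{"time":"2017-04-01T00:10:00Z","total":53}
// Flush to force the analysis to process the buffered data and write results
POST _xpack/ml/anomaly_detectors/my_custom_job/_flush
--------------------------------------------------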
Once you have decided which data to analyze, you can start considering which
analysis functions you want to use. For more information, see <<ml-functions>>.
In general, it is a good idea to start with single metric jobs for your
key performance indicators. After you examine these simple analysis results,
you will have a better idea of what the influencers might be. You can create
multi-metric jobs and split the data or use more complex analysis functions
as necessary.
//TODO: Add link to configuration section: For examples of
//more complicated configuration options, see <<>>.
If you encounter problems, we're here to help. See <<xpack-help>> and
<<ml-troubleshooting>>.

View File

@@ -13,13 +13,14 @@ Ready to get some hands-on experience with the {xpackml} features? This
tutorial shows you how to:
* Load a sample data set into {es}
* Create a {ml} job
* Create single and multi-metric {ml} jobs in {kib}
* Use the results to identify possible anomalies in the data
At the end of this tutorial, you should have a good idea of what {ml} is and
will hopefully be inspired to use it to detect anomalies in your own data.
You might also be interested in these video tutorials:
You might also be interested in these video tutorials, which use the same sample
data:
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
@@ -278,8 +279,8 @@ You should see output similar to the following:
[source,shell]
----------------------------------
health status index ... pri rep docs.count docs.deleted store.size ...
green open server-metrics ... 1 0 905940 0 120.5mb ...
health status index ... pri rep docs.count ...
green open server-metrics ... 1 0 905940 ...
----------------------------------
Next, you must define an index pattern for this data set:
@@ -305,7 +306,7 @@ This data set can now be analyzed in {ml} jobs in {kib}.
[[ml-gs-jobs]]
=== Creating Jobs
=== Creating Single Metric Jobs
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytical task. They also contain the results of the
@@ -322,7 +323,25 @@ web browser so that it does not block pop-up windows or create an
exception for your Kibana URL.
--
To work with jobs in {kib}:
You can choose to create single metric, multi-metric, or advanced jobs in
{kib}. At this point in the tutorial, the goal is to detect anomalies in the
total requests received by your applications and services. The sample data
contains a single key performance indicator to track this, which is the total
requests over time. It is therefore logical to start by creating a single metric
job for this KPI.
TIP: If you are using aggregated data, you can create an advanced job
and configure it to use a `summary_count_field`. The {ml} algorithms will
make the best possible use of summarized data in this case. For simplicity, in
this tutorial we will not make use of that advanced functionality.
//TO-DO: Add link to aggregations.asciidoc: For more information, see <<>>.
A single metric job contains a single _detector_. A detector defines the type of
analysis that will occur (for example, `max`, `average`, or `rare` analytical
functions) and the fields that will be analyzed.
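For example, in API terms, a job with a single detector that sums the `total`
field might be defined like this. This is an illustrative sketch only; the job
name, bucket span, and `@timestamp` time field are assumptions based on this
tutorial's sample data:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/total-requests
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "sum", "field_name": "total" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
--------------------------------------------------
In this tutorial, however, the wizard builds the configuration for you.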
To create a single metric job in {kib}:
. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.
@@ -334,30 +353,7 @@ go to `http://localhost:5601/`.
image::images/ml-kibana.jpg[Job Management]
--
You can choose to create single metric, multi-metric, or advanced jobs in
{kib}. In this tutorial, the goal is to detect anomalies in the total requests
received by your applications and services. The sample data contains a single
key performance indicator to track this, which is the total requests over time.
It is therefore logical to start by creating a single metric job for this KPI.
TIP: If you are using aggregated data, you can create an advanced job
and configure it to use a `summary_count_field`. The {ml} algorithms will
make the best possible use of summarized data in this case. For simplicity in this tutorial
we will not make use of that advanced functionality.
[float]
[[ml-gs-job1-create]]
==== Creating a Single Metric Job
A single metric job contains a single _detector_. A detector defines the type of
analysis that will occur (for example, `max`, `average`, or `rare` analytical
functions) and the fields that will be analyzed.
To create a single metric job in {kib}:
. Click **Machine Learning** in the side navigation,
then click **Create new job**.
. Click **Create new job**.
. Click **Create single metric job**. +
+
@@ -568,9 +564,8 @@ button: image:images/ml-stop-feed.jpg["Stop {dfeed}"]
Now that you have processed all the data, let's start exploring the job results.
[[ml-gs-jobresults]]
=== Exploring Job Results
[[ml-gs-job1-analyze]]
=== Exploring Single Metric Job Results
The {xpackml} features analyze the input stream of data, model its behavior,
and perform analysis based on the detectors you defined in your job. When an
@@ -589,11 +584,6 @@ Anomaly Explorer::
also swim lanes for each influencer. By selecting a block in a swim lane, the
anomaly details are displayed alongside the original source data (where
applicable).
//TBD: Are they swimlane blocks, tiles, segments or cards? hmmm
//TBD: Do the time periods in the heat map correspond to buckets? hmmm is it a heat map?
//As time is the x-axis, and the block sizes stay the same, it feels more intuitive call it a swimlane.
//The swimlane bucket intervals depends on the time range selected. Their smallest possible
//granularity is a bucket, but if you have a big time range selected, then they will span many buckets
Single Metric Viewer::
This view contains a chart that represents the actual and expected values over
@@ -601,10 +591,6 @@ Single Metric Viewer::
where `model_plot_config` is enabled. As in the **Anomaly Explorer**, anomalous
data points are shown in different colors depending on their score.
[float]
[[ml-gs-job1-analyze]]
==== Exploring Single Metric Job Results
By default when you view the results for a single metric job, the
**Single Metric Viewer** opens:
[role="screenshot"]
@@ -652,7 +638,7 @@ You can see the same information in a different format by using the
image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"]
Click one of the red blocks in the swim lane to see details about the anomalies
Click one of the red sections in the swim lane to see details about the anomalies
that occurred in that time interval. For example:
[role="screenshot"]
image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"]
@@ -663,71 +649,5 @@ contributing to the problem? Are the anomalies confined to particular
applications or servers? You can begin to troubleshoot these situations by
layering additional jobs or creating multi-metric jobs.
////
The troubleshooting job would not create alarms of its own, but rather would
help explain the overall situation. It's usually a different job because it's
operating on different indices. Layering jobs is an important concept.
////
////
[float]
[[ml-gs-job2-create]]
==== Creating a Multi-Metric Job
TBD.
* Walk through creation of a simple multi-metric job.
* Provide overview of:
** partition fields,
** influencers
*** An influencer is someone or something that has influenced or contributed to the anomaly.
Results are aggregated for each influencer, for each bucket, across all detectors.
In this way, a combined anomaly score is calculated for each influencer,
which determines its relative anomalousness. You can specify one or many influencers.
Picking an influencer is strongly recommended for the following reasons:
**** It allow you to blame someone/something for the anomaly
**** It simplifies and aggregates results
*** The best influencer is the person or thing that you want to blame for the anomaly.
In many cases, users or client IP make excellent influencers.
*** By/over/partition fields are usually good candidates for influencers.
*** Influencers can be any field in the source data; they do not need to be fields
specified in detectors, although they often are.
** by/over fields,
*** detectors
**** You can have more than one detector in a job which is more efficient than
running multiple jobs against the same data stream.
//http://www.prelert.com/docs/behavioral_analytics/latest/concepts/multivariate.html
[float]
[[ml-gs-job2-analyze]]
===== Viewing Multi-Metric Job Results
TBD.
* Walk through exploration of job results.
* Describe how influencer detection accelerates root cause identification.
////
////
* Provide brief overview of statistical models and/or link to more info.
* Possibly discuss effect of altering bucket span.
The anomaly score is a sophisticated aggregation of the anomaly records in the
bucket. The calculation is optimized for high throughput, gracefully ages
historical data, and reduces the signal to noise levels. It adjusts for
variations in event rate, takes into account the frequency and the level of
anomalous activity and is adjusted relative to past anomalous behavior.
In addition, [the anomaly score] is boosted if anomalous activity occurs for related entities,
for example if disk IO and CPU are both behaving unusually for a given host.
** Once an anomalous time interval has been identified, it can be expanded to
view the detailed anomaly records which are the significant causal factors.
////
////
[[ml-gs-alerts]]
=== Creating Alerts for Job Results
TBD.
* Walk through creation of simple alert for anomalous data?
////
include::getting-started-multi.asciidoc[]
include::getting-started-next.asciidoc[]

7 binary image files added (tutorial screenshots), not shown. Sizes: 64 KiB, 226 KiB, 78 KiB, 179 KiB, 240 KiB, 371 KiB, 58 KiB.