[DOCS] Add ML Getting Started job analysis pages (elastic/x-pack-elasticsearch#1185)

* [DOCS] ML getting started file extraction

* [DOCS] ML Getting Started exploring job results

Original commit: elastic/x-pack-elasticsearch@7b46e7beb3
This commit is contained in:
Lisa Cawley 2017-04-24 16:26:55 -07:00 committed by lcawley
parent 918f4fb962
commit ee612a3dd8
5 changed files with 150 additions and 67 deletions

View File

@ -16,7 +16,6 @@ tutorial shows you how to:
* Create a {ml} job
* Use the results to identify possible anomalies in the data
{nbsp}
At the end of this tutorial, you should have a good idea of what {ml} is and
will hopefully be inspired to use it to detect anomalies in your own data.
@ -155,12 +154,13 @@ available publicly on https://github.com/elastic/examples
//Download this data set by clicking here:
//See https://download.elastic.co/demos/kibana/gettingstarted/shakespeare.json[shakespeare.json].
////
Use the following commands to extract the files:
[source,shell]
----------------------------------
tar xvf server_metrics.tar.gz
----------------------------------
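If you want to confirm what you extracted, a quick check such as the following
can help. This is an optional sketch; the exact file listing depends on the
archive, but it should include the `server-metrics_*.json` files and the
`upload_server-metrics.sh` script that are used later in this tutorial:

[source,shell]
----------------------------------
# List the archive contents without extracting them
tar tvf server_metrics.tar.gz

# After extraction, confirm that the data files and upload script are present
ls -lh server-metrics_*.json upload_server-metrics.sh
----------------------------------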
Each document in the server-metrics data set has the following schema:
[source,js]
@ -191,7 +191,12 @@ and specify a field's characteristics, such as the field's searchability or
whether or not it's _tokenized_, or broken up into separate words.
The sample data includes an `upload_server-metrics.sh` script, which you can use
to create the mappings and load the data set. Before you run it, however, you
must edit the USERNAME and PASSWORD variables with your actual user ID and
password. If you want to test adding data to an existing data feed, you must
also comment out the final two commands related to `server-metrics_4.json`.
The script runs a command similar
to the following example, which sets up a mapping for the data set:
[source,shell]
@ -247,8 +252,7 @@ http://localhost:9200/server-metrics -d '{
----------------------------------
NOTE: If you run this command, you must replace `elasticpassword` with your
actual password.
////
This mapping specifies the following qualities for the data set:
@ -262,7 +266,7 @@ This mapping specifies the following qualities for the data set:
You can then use the Elasticsearch `bulk` API to load the data set. The
`upload_server-metrics.sh` script runs commands similar to the following
example, which loads three of the JSON files:
[source,shell]
----------------------------------
@ -276,10 +280,10 @@ http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_2.json
curl -u elastic:elasticpassword -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_3.json"
----------------------------------
//curl -u elastic:elasticpassword -X POST -H "Content-Type: application/json"
//http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_4.json"
These commands might take some time to run, depending on the computing resources
available.
@ -291,13 +295,13 @@ You can verify that the data was loaded successfully with the following command:
curl 'http://localhost:9200/_cat/indices?v' -u elastic:elasticpassword
----------------------------------
For three sample JSON files, you should see output similar to the following:
[source,shell]
----------------------------------
health status index ... pri rep docs.count docs.deleted store.size ...
green open server-metrics ... 1 0 680400 0 101.7mb ...
----------------------------------
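As an optional check, you can also retrieve a document count directly with the
`_count` API. This sketch assumes the same index name and credentials as the
previous commands; the count should match the `docs.count` value above (for
example, 680400 for three sample files):

[source,shell]
----------------------------------
curl -u elastic:elasticpassword 'http://localhost:9200/server-metrics/_count?pretty'
----------------------------------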
Next, you must define an index pattern for this data set:
@ -423,12 +427,8 @@ at which alerting is required.
//non-aggregating functions.
--
. Click **Use full transaction_counts data**. A graph is generated,
which represents the total number of requests over time.
. Provide a name for the job, for example `total-requests`. The job name must
be unique in your cluster. You can also optionally provide a description of the
@ -442,10 +442,14 @@ the {ml} that occurs as the data is processed.
//To explore the results, click **View Results**.
//TBD: image::images/ml-gs-job1-results.jpg["The total-requests job is created"]
TIP: The `create_single_metic.sh` script creates a similar job and data feed by
using the {ml} APIs. For API reference information, see <<ml-apis>>.
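For example, after the job exists you can retrieve its configuration with the
get jobs API. This is a sketch that assumes the job is named `total-requests`
and uses the 5.x API path; see <<ml-apis>> for the exact syntax in your version:

[source,shell]
----------------------------------
curl -u elastic:elasticpassword 'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests?pretty'
----------------------------------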
[[ml-gs-job1-manage]]
=== Managing Jobs
After you create a job, you can see its status in the **Job Management** tab:
image::images/ml-gs-job1-manage.jpg["Status information for the total-requests job"]
The following information is provided for each job:
@ -458,14 +462,11 @@ The optional description of the job.
Processed records::
The number of records that have been processed by the job.
NOTE: Depending on how you send data to the job, the number of processed
records is not always equal to the number of input records. For more information,
see the `processed_record_count` description in <<ml-datacounts,Data Counts Objects>>.
Memory status::
The status of the mathematical models. When you create jobs by using the APIs or
by using the advanced options in Kibana, you can specify a `model_memory_limit`.
@ -510,71 +511,137 @@ the job.
You can also click one of the **Actions** buttons to start the data feed, edit
the job or data feed, and clone or delete the job, for example.
[float]
[[ml-gs-job1-datafeed]]
==== Managing Data Feeds
A data feed can be started and stopped multiple times throughout its lifecycle.
If you want to retrieve more data from Elasticsearch and the data feed is
stopped, you must restart it.
For example, if you only loaded three of the sample JSON files, you can now load
the fourth using the Elasticsearch `bulk` API as follows:
[source,shell]
----------------------------------
curl -u elastic:elasticpassword -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_4.json"
----------------------------------
You can optionally verify that the data was loaded successfully with the
following command:
[source,shell]
----------------------------------
curl 'http://localhost:9200/_cat/indices?v' -u elastic:elasticpassword
----------------------------------
For the four sample JSON files, you should see output similar to the following:
[source,shell]
----------------------------------
health status index ... pri rep docs.count docs.deleted store.size ...
green open server-metrics ... 1 0 907200 0 136.2mb ...
----------------------------------
To use this new data in your job:
. In the **Machine Learning** / **Job Management** tab, click the following
button to start the data feed: image::images/ml-start-feed.jpg["Start data feed"].
. Choose a start time and end time. For example,
click **Continue from 2017-04-22** and **No end time**, then click **Start**.
image::images/ml-gs-job1-datafeed.jpg["Restarting a data feed"]
* TBD: Why do I not see increases in the job count stats after this occurs?
How can I determine that it has been successfully processed?
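One way to check is to drive this step with the {ml} APIs instead of Kibana.
The following sketch assumes that the data feed is named
`datafeed-total-requests` and uses the 5.x API paths; see <<ml-apis>> for the
exact parameters in your version:

[source,shell]
----------------------------------
# Start the data feed from a given time with no end time; the start value
# accepts epoch milliseconds or an ISO 8601 date (check the API docs)
curl -u elastic:elasticpassword -X POST -H "Content-Type: application/json" 'http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests/_start' -d '{
  "start": "2017-04-22T00:00:00Z"
}'

# Check the job counts; processed_record_count should increase as the
# additional data is analyzed
curl -u elastic:elasticpassword 'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests/_stats?pretty'
----------------------------------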
[[ml-gs-jobresults]]
=== Exploring Job Results
The {xpack} {ml} features analyze the input stream of data, model its behavior,
and perform analysis based on the detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.
Result records for each anomaly are stored in `.ml-notifications` and
`.ml-anomalies*` indices in Elasticsearch. By default, the name of the
index where {ml} results are stored is `shared`, which corresponds to
the `.ml-anomalies-shared` index.
//For example, these results include the probability of detecting that anomaly.
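If you want to inspect the stored results directly, you can query the results
index with the standard `_search` API. This is a sketch only; field names can
vary by version, but bucket results carry a `result_type` of `bucket` and an
`anomaly_score`:

[source,shell]
----------------------------------
curl -u elastic:elasticpassword -X POST -H "Content-Type: application/json" 'http://localhost:9200/.ml-anomalies-shared/_search?pretty' -d '{
  "size": 5,
  "query": {
    "bool": {
      "filter": [
        { "term": { "job_id": "total-requests" } },
        { "term": { "result_type": "bucket" } }
      ]
    }
  },
  "sort": [ { "anomaly_score": { "order": "desc" } } ]
}'
----------------------------------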
You can use the **Anomaly Explorer** or the
**Single Metric Viewer** in Kibana to view the analysis results.
Anomaly Explorer::
This view contains heatmap charts, where the color for each section of the
timeline is determined by the maximum anomaly score in that period.
//TBD: Do the time periods in the heat map correspond to buckets?
Single Metric Viewer::
This view contains a time series chart that represents the analysis.
As in the **Anomaly Explorer**, anomalous data points are shown in
different colors depending on their probability.
[float]
[[ml-gs-job1-analyze]]
==== Exploring Single Metric Job Results
By default when you view the results for a single metric job,
the **Single Metric Viewer** opens:
image::images/ml-gs-job1-analysis.jpg["Single Metric Viewer for total-requests job"]
The blue line in the chart represents the actual data values. The shaded blue area
represents the expected behavior that was calculated by the model.
//TBD: What is meant by "95% prediction bounds"?
If you slide the time selector from the beginning to the end of the data, you
can see how the model improves as it processes more data. At the beginning, the
expected range of values is quite broad and the model does not capture the
periodicity in the data. But it quickly learns and begins to reflect the daily
variation.
Any data points outside the range that was predicted by the model are marked
as anomalies. When you have high volumes of real-life data, many anomalies
might be found. These vary in probability from very likely to highly unlikely,
that is to say, from not particularly anomalous to highly anomalous. There can
be none, one or two, tens, or sometimes hundreds of anomalies found within each
bucket, and there can be many thousands found per job. To provide a sensible
view of the results, an _anomaly score_ is calculated for each bucket time
interval. The anomaly score is a value from 0 to 100, which indicates the
significance of the observed anomaly compared to previously seen anomalies.
Highly anomalous values are shown in red and low-scored values are shown in
blue. An interval with a high anomaly score is significant and requires
investigation.
Slide the time selector to a section of the time series that contains a red data
point. If you hover over the point, you can see more information about that
data point. You can also see details in the **Anomalies** section of the viewer.
For example:
image::images/ml-gs-job1-anomalies.jpg["Single Metric Viewer Anomalies for total-requests job"]
For each anomaly you can see key details such as the time, the actual and
expected ("typical") values, and their probability.
You can see the same information in a different format by using the **Anomaly Explorer**:
image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"]
Click one of the red areas in the heatmap to see details about that anomaly. For
example:
image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"]
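The same record-level details (time, actual and typical values, and their
probability) can also be retrieved programmatically with the get records API.
This is a sketch that assumes the 5.x API path and the `total-requests` job;
see <<ml-apis>> for the exact parameters in your version:

[source,shell]
----------------------------------
curl -u elastic:elasticpassword 'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests/results/records?pretty'
----------------------------------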
After you have identified anomalies, often the next step is to try to determine
the context of those situations. For example, are there other factors that are
contributing to the problem? Are the anomalies confined to particular
applications or servers? You can begin to troubleshoot these situations by
layering additional jobs or creating multi-metric jobs.
////
[float]
[[ml-gs-job2-create]]
@ -614,6 +681,22 @@ TBD.
* Walk through exploration of job results.
* Describe how influencer detection accelerates root cause identification.
////
////
* Provide brief overview of statistical models and/or link to more info.
* Possibly discuss effect of altering bucket span.
The anomaly score is a sophisticated aggregation of the anomaly records in the
bucket. The calculation is optimized for high throughput, gracefully ages
historical data, and reduces the signal to noise levels. It adjusts for
variations in event rate, takes into account the frequency and the level of
anomalous activity and is adjusted relative to past anomalous behavior.
In addition, [the anomaly score] is boosted if anomalous activity occurs for related entities,
for example if disk IO and CPU are both behaving unusually for a given host.
** Once an anomalous time interval has been identified, it can be expanded to
view the detailed anomaly records which are the significant causal factors.
////
////
[[ml-gs-alerts]]
=== Creating Alerts for Job Results

Binary file not shown. (added, 124 KiB)

Binary file not shown. (added, 1.2 KiB)

View File

@ -3,7 +3,7 @@
==== Start Data Feeds
A data feed must be started in order to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request

View File

@ -3,7 +3,7 @@
==== Stop Data Feeds
A data feed that is stopped ceases to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request