[DOCS] Subdivided getting started with ML pages (elastic/x-pack-elasticsearch#3167)
* [DOCS] Subdivided getting started with ML pages
* [DOCS] Added new getting started page to build.gradle

Original commit: elastic/x-pack-elasticsearch@968187b048
This commit is contained in: parent 11ab50d9dc, commit 90a1da82ee

docs/build.gradle
@@ -9,7 +9,7 @@ apply plugin: 'elasticsearch.docs-test'
 * only remove entries from this list. When it is empty we'll remove it
 * entirely and have a party! There will be cake and everything.... */
buildRestTests.expectedUnconvertedCandidates = [
    'en/ml/getting-started.asciidoc',
    'en/ml/getting-started-data.asciidoc',
    'en/ml/functions/count.asciidoc',
    'en/ml/functions/geo.asciidoc',
    'en/ml/functions/info.asciidoc',

docs/en/ml/getting-started-data.asciidoc (new file, 220 lines)
@@ -0,0 +1,220 @@
[[ml-gs-data]]
=== Identifying Data for Analysis

For the purposes of this tutorial, we provide sample data that you can play with
and search in {es}. When you consider your own data, however, it's important to
take a moment and think about where the {xpackml} features will be most
impactful.

The first consideration is that the data must be a time series. The {ml}
features are designed to model and detect anomalies in time series data.

The second consideration, especially when you are first learning to use {ml},
is the importance of the data and how familiar you are with it. Ideally, it is
information that contains key performance indicators (KPIs) for the health,
security, or success of your business or system. It is information that you need
to monitor and act on when anomalous behavior occurs. You might even have {kib}
dashboards that you're already using to watch this data. The better you know the
data, the quicker you will be able to create {ml} jobs that generate useful
insights.

The final consideration is where the data is located. This tutorial assumes that
your data is stored in {es}. It guides you through the steps required to create
a _{dfeed}_ that passes data to a job. If your own data is outside of {es},
you can still analyze it by using the post data API.

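For example, the following minimal sketch sends a batch of newline-delimited
JSON records to a hypothetical job named `my_job` through the post data API;
the job must already exist and be open, and the file name is illustrative:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X POST -H 'Content-Type: application/json'
http://localhost:9200/_xpack/ml/anomaly_detectors/my_job/_data --data-binary "@my_data.json"
----------------------------------
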
IMPORTANT: If you want to create {ml} jobs in {kib}, you must use {dfeeds}.
That is to say, you must store your input data in {es}. When you create
a job, you select an existing index pattern and {kib} configures the {dfeed}
for you under the covers.

[float]
[[ml-gs-sampledata]]
==== Obtaining a Sample Data Set

In this step, we will upload some sample data to {es}. This is standard
{es} functionality, and it is needed to set the stage for using {ml}.

The sample data for this tutorial contains information about the requests that
are received by various applications and services in a system. A system
administrator might use this type of information to track the total number of
requests across all of the infrastructure. If the number of requests increases
or decreases unexpectedly, for example, this might be an indication that there
is a problem or that resources need to be redistributed. By using the {xpack}
{ml} features to model the behavior of this data, it is easier to identify
anomalies and take appropriate action.

Download the sample data by clicking here:
https://download.elastic.co/demos/machine_learning/gettingstarted/server_metrics.tar.gz[server_metrics.tar.gz]

Use the following commands to extract the files:

[source,shell]
----------------------------------
tar -zxvf server_metrics.tar.gz
----------------------------------

Each document in the server-metrics data set has the following schema:

[source,js]
----------------------------------
{
  "index":
  {
    "_index":"server-metrics",
    "_type":"metric",
    "_id":"1177"
  }
}
{
  "@timestamp":"2017-03-23T13:00:00",
  "accept":36320,
  "deny":4156,
  "host":"server_2",
  "response":2.4558210155,
  "service":"app_3",
  "total":40476
}
----------------------------------

TIP: The sample data sets include summarized data. For example, the `total`
value is a sum of the requests that were received by a specific service at a
particular time. If your data is stored in {es}, you can generate
this type of sum or average by using aggregations. One of the benefits of
summarizing data this way is that {es} automatically distributes
these calculations across your cluster. You can then feed this summarized data
into {xpackml} instead of raw results, which reduces the volume
of data that must be considered while detecting anomalies. For the purposes of
this tutorial, however, these summary values are stored in {es}. For more
information, see <<ml-configuring-aggregation>>.

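For instance, here is a minimal sketch of how such a summary could be generated
with a `date_histogram` aggregation and a nested `sum` (the aggregation names
are illustrative):

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X GET -H 'Content-Type: application/json'
'http://localhost:9200/server-metrics/_search?size=0' -d '{
  "aggs":{
    "requests_over_time":{
      "date_histogram":{"field":"@timestamp","interval":"10m"},
      "aggs":{
        "total_requests":{"sum":{"field":"total"}}
      }
    }
  }
}'
----------------------------------
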
Before you load the data set, you need to set up {ref}/mapping.html[_mappings_]
for the fields. Mappings divide the documents in the index into logical groups
and specify a field's characteristics, such as the field's searchability or
whether or not it's _tokenized_, or broken up into separate words.

The sample data includes an `upload_server-metrics.sh` script, which you can use
to create the mappings and load the data set. You can download it by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/upload_server-metrics.sh[upload_server-metrics.sh]
Before you run it, however, you must edit the USERNAME and PASSWORD variables
with your actual user ID and password.

The script runs a command similar to the following example, which sets up a
mapping for the data set:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json'
http://localhost:9200/server-metrics -d '{
  "settings":{
    "number_of_shards":1,
    "number_of_replicas":0
  },
  "mappings":{
    "metric":{
      "properties":{
        "@timestamp":{ "type":"date" },
        "accept":{ "type":"long" },
        "deny":{ "type":"long" },
        "host":{ "type":"keyword" },
        "response":{ "type":"float" },
        "service":{ "type":"keyword" },
        "total":{ "type":"long" }
      }
    }
  }
}'
----------------------------------

NOTE: If you run this command, you must replace `x-pack-test-password` with your
actual password.

////
This mapping specifies the following qualities for the data set:

* The _@timestamp_ field is a date.
//that uses the ISO format `epoch_second`,
//which is the number of seconds since the epoch.
* The _accept_, _deny_, and _total_ fields are long numbers.
* The _host
////

You can then use the {es} `bulk` API to load the data set. The
`upload_server-metrics.sh` script runs commands similar to the following
example, which loads the four JSON files:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_1.json"

curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_2.json"

curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_3.json"

curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_4.json"
----------------------------------

TIP: These commands upload about 200 MB of data in total. The data is split
into four files because the `_bulk` API has a maximum request size limit of
100 MB.

These commands might take some time to run, depending on the computing resources
available.

You can verify that the data was loaded successfully with the following command:

[source,shell]
----------------------------------
curl 'http://localhost:9200/_cat/indices?v' -u elastic:x-pack-test-password
----------------------------------

You should see output similar to the following:

[source,shell]
----------------------------------
health status index          ... pri rep docs.count ...
green  open   server-metrics ... 1   0   905940     ...
----------------------------------

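Alternatively, a quick document count (a minimal sketch, assuming the same
credentials) confirms that all 905,940 documents arrived:

[source,shell]
----------------------------------
curl 'http://localhost:9200/server-metrics/_count' -u elastic:x-pack-test-password
----------------------------------
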
Next, you must define an index pattern for this data set:

. Open {kib} in your web browser and log in. If you are running {kib}
locally, go to `http://localhost:5601/`.

. Click the **Management** tab, then **Index Patterns**.

. If you already have index patterns, click the plus sign (+) to define a new
one. Otherwise, the **Configure an index pattern** wizard is already open.

. For this tutorial, any pattern that matches the name of the index you've
loaded will work. For example, enter `server-metrics*` as the index pattern.

. Verify that the **Index contains time-based events** option is checked.

. Select the `@timestamp` field from the **Time-field name** list.

. Click **Create**.

This data set can now be analyzed in {ml} jobs in {kib}.

docs/en/ml/getting-started-single.asciidoc (new file, 354 lines)
@@ -0,0 +1,354 @@
[[ml-gs-jobs]]
=== Creating Single Metric Jobs

Machine learning jobs contain the configuration information and metadata
necessary to perform an analytical task. They also contain the results of the
analytical task.

[NOTE]
--
This tutorial uses {kib} to create jobs and view results, but you can
alternatively use APIs to accomplish most tasks.
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].

The {xpackml} features in {kib} use pop-ups. You must configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.
--

You can choose to create single metric, multi-metric, or advanced jobs in
{kib}. At this point in the tutorial, the goal is to detect anomalies in the
total requests received by your applications and services. The sample data
contains a single key performance indicator to track this, which is the total
requests over time. It is therefore logical to start by creating a single metric
job for this KPI.

TIP: If you are using aggregated data, you can create an advanced job
and configure it to use a `summary_count_field_name`. The {ml} algorithms will
make the best possible use of summarized data in this case. For simplicity, in
this tutorial we will not make use of that advanced functionality. For more
information, see <<ml-configuring-aggregation>>.

A single metric job contains a single _detector_. A detector defines the type of
analysis that will occur (for example, `max`, `average`, or `rare` analytical
functions) and the fields that will be analyzed.

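In the {ml} APIs, a detector is expressed as a small JSON object. As a sketch,
the detector that the following steps configure (a `sum` function applied to
the `total` field) would look like this:

[source,js]
----------------------------------
{ "function": "sum", "field_name": "total" }
----------------------------------
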
To create a single metric job in {kib}:

. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.

. Click **Machine Learning** in the side navigation: +
+
--
[role="screenshot"]
image::images/ml-kibana.jpg[Job Management]
--

. Click **Create new job**.

. Click **Create single metric job**. +
+
--
[role="screenshot"]
image::images/ml-create-jobs.jpg["Create a new job"]
--

. Click the `server-metrics` index. +
+
--
[role="screenshot"]
image::images/ml-gs-index.jpg["Select an index"]
--

. Configure the job by providing the following information: +
+
--
[role="screenshot"]
image::images/ml-gs-single-job.jpg["Create a new job from the server-metrics index"]
--

.. For the **Aggregation**, select `Sum`. This value specifies the analysis
function that is used.
+
--
Some of the analytical functions look for single anomalous data points. For
example, `max` identifies the maximum value that is seen within a bucket.
Others perform some aggregation over the length of the bucket. For example,
`mean` calculates the mean of all the data points seen within the bucket.
Similarly, `count` calculates the total number of data points within the bucket.
In this tutorial, you are using the `sum` function, which calculates the sum of
the specified field's values within the bucket.
--

.. For the **Field**, select `total`. This value specifies the field that
the detector uses in the function.
+
--
NOTE: Some functions such as `count` and `rare` do not require fields.
--

.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into.
+
--
The {xpackml} features use the concept of a bucket to divide up the time series
into batches for processing. For example, if you are monitoring
the total number of requests in the system,
//and receive a data point every 10 minutes
using a bucket span of 1 hour would mean that at the end of each hour, it
calculates the sum of the requests for the last hour and computes the
anomalousness of that value compared to previous hours.

The bucket span has two purposes: it dictates over what time span to look for
anomalous features in data, and it determines how quickly anomalies can be
detected. Choosing a shorter bucket span enables anomalies to be detected more
quickly. However, there is a risk of being too sensitive to natural variations
or noise in the input data. Choosing too long a bucket span can mean that
interesting anomalies are averaged away. There is also the possibility that the
aggregation might smooth out some anomalies based on when the bucket starts
in time.

The bucket span has a significant impact on the analysis. When you're trying to
determine what value to use, take into account the granularity at which you
want to perform the analysis, the frequency of the input data, the duration of
typical anomalies, and the frequency at which alerting is required.
--

. Determine whether you want to process all of the data or only part of it. If
you want to analyze all of the existing data, click
**Use full server-metrics* data**. If you want to see what happens when you
stop and start {dfeeds} and process additional data over time, click the time
picker in the {kib} toolbar. Since the sample data spans the period
between March 23, 2017 and April 22, 2017, click **Absolute**. Set the start
time to March 23, 2017 and the end time to April 1, 2017, for example. Once
you've set up the time range, click the **Go** button. +
+
--
[role="screenshot"]
image::images/ml-gs-job1-time.jpg["Setting the time range for the {dfeed}"]
--
+
--
A graph is generated, which represents the total number of requests over time.
--

. Provide a name for the job, for example `total-requests`. The job name must
be unique in your cluster. You can also optionally provide a description of the
job.

. Click **Create Job**. +
+
--
[role="screenshot"]
image::images/ml-gs-job1.jpg["A graph of the total number of requests over time"]
--

As the job is created, the graph is updated to give a visual representation of
the progress of {ml} as the data is processed. This view is only available while
the job is running.

TIP: The `create_single_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_single_metric.sh[create_single_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].

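As a rough sketch of what such a script does (the request bodies here are
illustrative rather than a copy of the script), it creates the job and its
{dfeed} with two API calls:

[source,shell]
----------------------------------
# Create the job with a sum detector on the "total" field.
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json'
http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests -d '{
  "analysis_config":{
    "bucket_span":"10m",
    "detectors":[{"function":"sum","field_name":"total"}]
  },
  "data_description":{"time_field":"@timestamp"}
}'

# Create a datafeed that reads from the server-metrics index.
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json'
http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests -d '{
  "job_id":"total-requests",
  "indexes":["server-metrics"]
}'
----------------------------------
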
[[ml-gs-job1-manage]]
=== Managing Jobs

After you create a job, you can see its status in the **Job Management** tab: +

[role="screenshot"]
image::images/ml-gs-job1-manage1.jpg["Status information for the total-requests job"]

The following information is provided for each job:

Job ID::
The unique identifier for the job.

Description::
The optional description of the job.

Processed records::
The number of records that have been processed by the job.

Memory status::
The status of the mathematical models. When you create jobs by using the APIs or
by using the advanced options in {kib}, you can specify a `model_memory_limit`.
That value is the maximum amount of memory resources that the mathematical
models can use. Once that limit is approached, data pruning becomes more
aggressive. Upon exceeding that limit, new entities are not modeled. For more
information about this setting, see
{ref}/ml-job-resource.html#ml-apilimits[Analysis Limits]. The memory status
field reflects whether you have reached or exceeded the model memory limit. It
can have one of the following values: +
`ok`::: The models stayed below the configured value.
`soft_limit`::: The models used more than 60% of the configured memory limit
and older unused models will be pruned to free up space.
`hard_limit`::: The models used more space than the configured memory limit.
As a result, not all incoming data was processed.

Job state::
The status of the job, which can be one of the following values: +
`open`::: The job is available to receive and process data.
`closed`::: The job finished successfully with its model state persisted.
The job must be opened before it can accept further data.
`closing`::: The job close action is in progress and has not yet completed.
A closing job cannot accept further data.
`failed`::: The job did not finish successfully due to an error.
This situation can occur due to invalid input data.
If the job has irrevocably failed, it must be force closed and then deleted.
If the {dfeed} can be corrected, the job can be closed and then re-opened.

{dfeed-cap} state::
The status of the {dfeed}, which can be one of the following values: +
started::: The {dfeed} is actively receiving data.
stopped::: The {dfeed} is stopped and will not receive data until it is
restarted.

Latest timestamp::
The timestamp of the last processed record.

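If you prefer the APIs, the same status information is available from the get
job statistics endpoint; a minimal sketch:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X GET
'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests/_stats'
----------------------------------
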
If you click the arrow beside the name of a job, you can show or hide additional
information, such as the settings, configuration information, or messages for
the job.

You can also click one of the **Actions** buttons to start the {dfeed}, edit
the job or {dfeed}, and clone or delete the job, for example.

[float]
[[ml-gs-job1-datafeed]]
==== Managing {dfeeds-cap}

A {dfeed} can be started and stopped multiple times throughout its lifecycle.
If you want to retrieve more data from {es} and the {dfeed} is stopped, you must
restart it.

For example, if you did not use the full data when you created the job, you can
now process the remaining data by restarting the {dfeed}:

. In the **Machine Learning** / **Job Management** tab, click the following
button to start the {dfeed}: image:images/ml-start-feed.jpg["Start {dfeed}"]

. Choose a start time and end time. For example,
click **Continue from 2017-04-01 23:59:00** and select **2017-04-30** as the
search end time. Then click **Start**. The date picker defaults to the latest
timestamp of processed data. Be careful not to leave any gaps in the analysis,
otherwise you might miss anomalies. +
+
--
[role="screenshot"]
image::images/ml-gs-job1-datafeed.jpg["Restarting a {dfeed}"]
--

The {dfeed} state changes to `started`, the job state changes to `opened`,
and the number of processed records increases as the new data is analyzed. The
latest timestamp information also increases. For example:

[role="screenshot"]
image::images/ml-gs-job1-manage2.jpg["Job opened and {dfeed} started"]

TIP: If your data is being loaded continuously, you can continue running the job
in real time. For this, start your {dfeed} and select **No end time**.

If you want to stop the {dfeed} at this point, you can click the following
button: image:images/ml-stop-feed.jpg["Stop {dfeed}"]

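The equivalent operations are also available through the {dfeed} APIs; for
instance, a sketch of starting and stopping this tutorial's {dfeed} by name
(the time values are illustrative):

[source,shell]
----------------------------------
# Start the datafeed for a fixed window of data.
curl -u elastic:x-pack-test-password -X POST
'http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests/_start?start=2017-04-01T23:59:00Z&end=2017-04-30T00:00:00Z'

# Stop it again.
curl -u elastic:x-pack-test-password -X POST
'http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests/_stop'
----------------------------------
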
Now that you have processed all the data, let's start exploring the job results.

[[ml-gs-job1-analyze]]
=== Exploring Single Metric Job Results

The {xpackml} features analyze the input stream of data, model its behavior,
and perform analysis based on the detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.

Result records for each anomaly are stored in `.ml-anomalies-*` indices in {es}.
By default, the name of the index where {ml} results are stored is labelled
`shared`, which corresponds to the `.ml-anomalies-shared` index.

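You can also retrieve these results through the {ml} APIs; for example, a
minimal sketch that fetches the bucket-level results for this job:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X GET
'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests/results/buckets'
----------------------------------
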
You can use the **Anomaly Explorer** or the **Single Metric Viewer** in {kib} to
view the analysis results.

Anomaly Explorer::
This view contains swim lanes showing the maximum anomaly score over time.
There is an overall swim lane that shows the overall score for the job, and
also swim lanes for each influencer. By selecting a block in a swim lane, the
anomaly details are displayed alongside the original source data (where
applicable).

Single Metric Viewer::
This view contains a chart that represents the actual and expected values over
time. It is only available for jobs that analyze a single time series and
have `model_plot_config` enabled. As in the **Anomaly Explorer**, anomalous
data points are shown in different colors depending on their score.

By default, when you view the results for a single metric job, the
**Single Metric Viewer** opens:

[role="screenshot"]
image::images/ml-gs-job1-analysis.jpg["Single Metric Viewer for total-requests job"]

The blue line in the chart represents the actual data values. The shaded blue
area represents the bounds for the expected values. The area between the upper
and lower bounds covers the most likely values for the model. If a value is
outside of this area, it can be said to be anomalous.

If you slide the time selector from the beginning of the data to the end of the
data, you can see how the model improves as it processes more data. At the
beginning, the expected range of values is quite broad and the model is not
capturing the periodicity in the data. But it quickly learns and begins to
reflect the daily variation.

Any data points outside the range that was predicted by the model are marked
as anomalies. When you have high volumes of real-life data, many anomalies
might be found. These vary in probability from very likely to highly unlikely,
that is to say, from not particularly anomalous to highly anomalous. There
can be none, one or two, tens, or sometimes hundreds of anomalies found within
each bucket. There can be many thousands found per job. In order to provide
a sensible view of the results, an _anomaly score_ is calculated for each bucket
time interval. The anomaly score is a value from 0 to 100, which indicates
the significance of the observed anomaly compared to previously seen anomalies.
Highly anomalous values are shown in red and low scored values are
indicated in blue. An interval with a high anomaly score is significant and
requires investigation.

Slide the time selector to a section of the time series that contains a red
anomaly data point. If you hover over the point, you can see more information
about that data point. You can also see details in the **Anomalies** section
of the viewer. For example:

[role="screenshot"]
image::images/ml-gs-job1-anomalies.jpg["Single Metric Viewer Anomalies for total-requests job"]

For each anomaly, you can see key details such as the time, the actual and
expected ("typical") values, and their probability.

By default, the table contains all anomalies that have a severity of "warning"
or higher in the selected section of the timeline. If you are only interested in
critical anomalies, for example, you can change the severity threshold for this
table.

The anomalies table also automatically calculates an interval for the data in
the table. If the time difference between the earliest and latest records in the
table is less than two days, the data is aggregated by hour to show the details
of the highest severity anomaly for each detector. Otherwise, it is
aggregated by day. You can change the interval for the table, for example, to
show all anomalies.

You can see the same information in a different format by using the
**Anomaly Explorer**:

[role="screenshot"]
image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"]

Click one of the red sections in the swim lane to see details about the anomalies
that occurred in that time interval. For example:

[role="screenshot"]
image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"]

After you have identified anomalies, often the next step is to try to determine
the context of those situations. For example, are there other factors that are
contributing to the problem? Are the anomalies confined to particular
applications or servers? You can begin to troubleshoot these situations by
layering additional jobs or creating multi-metric jobs.

docs/en/ml/getting-started.asciidoc
@@ -79,583 +79,7 @@ significant changes to the system. You can alternatively assign the

For more information, see <<built-in-roles>>.

include::getting-started-data.asciidoc[]
include::getting-started-single.asciidoc[]
include::getting-started-multi.asciidoc[]
include::getting-started-next.asciidoc[]