[DOCS] Moves ML content to stack-docs

lcawl 2018-06-07 09:26:00 -07:00
parent d0f35d204e
commit 1de38a2488
31 changed files with 0 additions and 1388 deletions


@@ -1,210 +0,0 @@
[[ml-gs-data]]
=== Identifying Data for Analysis
For the purposes of this tutorial, we provide sample data that you can play with
and search in {es}. When you consider your own data, however, it's important to
take a moment and think about where the {xpackml} features will be most
impactful.
The first consideration is that it must be time series data. The {ml} features
are designed to model and detect anomalies in time series data.
The second consideration, especially when you are first learning to use {ml},
is the importance of the data and how familiar you are with it. Ideally, it is
information that contains key performance indicators (KPIs) for the health,
security, or success of your business or system. It is information that you need
to monitor and act on when anomalous behavior occurs. You might even have {kib}
dashboards that you're already using to watch this data. The better you know the
data, the quicker you will be able to create {ml} jobs that generate useful
insights.
The final consideration is where the data is located. This tutorial assumes that
your data is stored in {es}. It guides you through the steps required to create
a _{dfeed}_ that passes data to a job. If your own data is outside of {es},
you can still analyze it by using the post data API.
IMPORTANT: If you want to create {ml} jobs in {kib}, you must use {dfeeds}.
That is to say, you must store your input data in {es}. When you create
a job, you select an existing index pattern and {kib} configures the {dfeed}
for you under the covers.
[float]
[[ml-gs-sampledata]]
==== Obtaining a Sample Data Set
In this step we will upload some sample data to {es}. This is standard
{es} functionality, and is needed to set the stage for using {ml}.
The sample data for this tutorial contains information about the requests that
are received by various applications and services in a system. A system
administrator might use this type of information to track the total number of
requests across all of the infrastructure. If the number of requests increases
or decreases unexpectedly, for example, this might be an indication that there
is a problem or that resources need to be redistributed. By using the {xpack}
{ml} features to model the behavior of this data, it is easier to identify
anomalies and take appropriate action.
Download this sample data by clicking here:
https://download.elastic.co/demos/machine_learning/gettingstarted/server_metrics.tar.gz[server_metrics.tar.gz]
Use the following commands to extract the files:
[source,sh]
----------------------------------
tar -zxvf server_metrics.tar.gz
----------------------------------
Each document in the server-metrics data set has the following schema:
[source,js]
----------------------------------
{
"index":
{
"_index":"server-metrics",
"_type":"metric",
"_id":"1177"
}
}
{
"@timestamp":"2017-03-23T13:00:00",
"accept":36320,
"deny":4156,
"host":"server_2",
"response":2.4558210155,
"service":"app_3",
"total":40476
}
----------------------------------
// NOTCONSOLE
TIP: The sample data sets include summarized data. For example, the `total`
value is a sum of the requests that were received by a specific service at a
particular time. If your data is stored in {es}, you can generate
this type of sum or average by using aggregations. One of the benefits of
summarizing data this way is that {es} automatically distributes
these calculations across your cluster. You can then feed this summarized data
into {xpackml} instead of raw results, which reduces the volume
of data that must be considered while detecting anomalies. For the purposes of
this tutorial, however, these summary values are stored in {es}. For more
information, see <<ml-configuring-aggregation>>.
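To illustrate, if you had raw data with one document per request, an aggregation
similar to the following hedged sketch could produce this kind of per-service
sum. The `raw-requests` index and its `request_count` field are hypothetical;
they are not part of the sample data set.
[source,js]
----------------------------------
GET raw-requests/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "@timestamp", "interval": "10m" },
      "aggs": {
        "by_service": {
          "terms": { "field": "service" },
          "aggs": {
            "total": { "sum": { "field": "request_count" } }
          }
        }
      }
    }
  }
}
----------------------------------
// NOTCONSOLE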
Before you load the data set, you need to set up {ref}/mapping.html[_mappings_]
for the fields. Mappings divide the documents in the index into logical groups
and specify a field's characteristics, such as the field's searchability or
whether or not it's _tokenized_, or broken up into separate words.
The sample data includes an `upload_server-metrics.sh` script, which you can use
to create the mappings and load the data set. You can download it by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/upload_server-metrics.sh[upload_server-metrics.sh]
Before you run it, however, you must edit the USERNAME and PASSWORD variables
with your actual user ID and password.
The script runs a command similar to the following example, which sets up a
mapping for the data set:
[source,sh]
----------------------------------
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json' \
http://localhost:9200/server-metrics -d '{
"settings":{
"number_of_shards":1,
"number_of_replicas":0
},
"mappings":{
"metric":{
"properties":{
"@timestamp":{
"type":"date"
},
"accept":{
"type":"long"
},
"deny":{
"type":"long"
},
"host":{
"type":"keyword"
},
"response":{
"type":"float"
},
"service":{
"type":"keyword"
},
"total":{
"type":"long"
}
}
}
}
}'
----------------------------------
// NOTCONSOLE
NOTE: If you run this command, you must replace `x-pack-test-password` with your
actual password.
You can then use the {es} `bulk` API to load the data set. The
`upload_server-metrics.sh` script runs commands similar to the following
example, which loads the four JSON files:
[source,sh]
----------------------------------
curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" \
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_1.json"
curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" \
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_2.json"
curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" \
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_3.json"
curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" \
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_4.json"
----------------------------------
// NOTCONSOLE
TIP: These commands upload a total of about 200 MB of data. The data is split
into four files because the `_bulk` API has a default maximum request size of 100 MB.
These commands might take some time to run, depending on the computing resources
available.
You can verify that the data was loaded successfully with the following command:
[source,sh]
----------------------------------
curl 'http://localhost:9200/_cat/indices?v' -u elastic:x-pack-test-password
----------------------------------
// NOTCONSOLE
You should see output similar to the following:
[source,txt]
----------------------------------
health status index ... pri rep docs.count ...
green open server-metrics ... 1 0 905940 ...
----------------------------------
// NOTCONSOLE
Next, you must define an index pattern for this data set:
. Open {kib} in your web browser and log in. If you are running {kib}
locally, go to `http://localhost:5601/`.
. Click the **Management** tab, then **{kib}** > **Index Patterns**.
. If you already have index patterns, click **Create index pattern** to define a
new one. Otherwise, the **Create index pattern** wizard is already open.
. For this tutorial, any pattern that matches the name of the index you've
loaded will work. For example, enter `server-metrics*` as the index pattern.
. In the **Configure settings** step, select the `@timestamp` field in the
**Time Filter field name** list.
. Click **Create index pattern**.
This data set can now be analyzed in {ml} jobs in {kib}.


@@ -1,76 +0,0 @@
[[ml-gs-forecast]]
=== Creating Forecasts
In addition to detecting anomalous behavior in your data, you can use
{ml} to predict future behavior. For more information, see <<ml-forecasting>>.
To create a forecast in {kib}:
. Go to the **Single Metric Viewer** and select one of the jobs that you created
in this tutorial. For example, select the `total-requests` job.
. Click **Forecast**. +
+
--
[role="screenshot"]
image::images/ml-gs-forecast.jpg["Create a forecast from the Single Metric Viewer"]
--
. Specify a duration for your forecast. This value indicates how far to
extrapolate beyond the last record that was processed. You must use time units,
such as `30d` for 30 days. For more information, see
{ref}/common-options.html#time-units[Time Units]. In this example, we use a
duration of 1 week: +
+
--
[role="screenshot"]
image::images/ml-gs-duration.jpg["Specify a duration of 1w"]
--
. View the forecast in the **Single Metric Viewer**: +
+
--
[role="screenshot"]
image::images/ml-gs-forecast-results.jpg["View a forecast from the Single Metric Viewer"]
The yellow line in the chart represents the predicted data values. The shaded
yellow area represents the bounds for the predicted values, which also gives an
indication of the confidence of the predictions. Note that the bounds generally
increase with time (that is to say, the confidence levels decrease), since you
are forecasting further into the future. Eventually, if the confidence levels
become too low, the forecast stops.
--
. Optional: Compare the forecast to actual data. +
+
--
You can try this with the sample data by choosing a subset of the data when you
create the job, as described in <<ml-gs-jobs>>. Create the forecast then process
the remaining data, as described in <<ml-gs-job1-datafeed>>.
--
.. After you restart the {dfeed}, re-open the forecast by selecting the job in
the **Single Metric Viewer**, clicking **Forecast**, and selecting your forecast
from the list. For example: +
+
--
[role="screenshot"]
image::images/ml-gs-forecast-open.jpg["Open a forecast in the Single Metric Viewer"]
--
.. View the forecast and actual data in the **Single Metric Viewer**: +
+
--
[role="screenshot"]
image::images/ml-gs-forecast-actual.jpg["View a forecast over actual data in the Single Metric Viewer"]
The chart contains the actual data values, the bounds for the expected values,
the anomalies, the forecast data values, and the bounds for the forecast. This
combination of actual and forecast data gives you an indication of how well the
{xpack} {ml} features can extrapolate the future behavior of the data.
--
Now that you have seen how easy it is to create forecasts with the sample data,
consider what type of events you might want to predict in your own data. For
more information and ideas, as well as a list of limitations related to
forecasts, see <<ml-forecasting>>.
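If you prefer to use the APIs, you can also create a forecast with the forecast
jobs API. The following is a hedged sketch that assumes the `total-requests` job
from this tutorial is open; the one-week duration matches the example above.
[source,js]
----------------------------------
POST _xpack/ml/anomaly_detectors/total-requests/_forecast
{
  "duration": "1w"
}
----------------------------------
// NOTCONSOLE
The response includes a forecast identifier, which you can use to find the
corresponding forecast results.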


@@ -1,211 +0,0 @@
[[ml-gs-multi-jobs]]
=== Creating Multi-metric Jobs
The multi-metric job wizard in {kib} provides a simple way to create more
complex jobs with multiple detectors. For example, in the single metric job, you
were tracking total requests versus time. You might also want to track other
metrics like average response time or the maximum number of denied requests.
Instead of creating jobs for each of those metrics, you can combine them in a
multi-metric job.
You can also use multi-metric jobs to split a single time series into multiple
time series based on a categorical field. For example, you can split the data
based on its hostnames, locations, or users. Each time series is modeled
independently. By looking at temporal patterns on a per-entity basis, you might
spot things that would otherwise have been hidden in the lumped view.
Conceptually, you can think of this as running many independent single metric
jobs. By bundling them together in a multi-metric job, however, you can see an
overall score and shared influencers for all the metrics and all the entities in
the job. Multi-metric jobs therefore scale better than having many independent
single metric jobs and provide better results when you have influencers that are
shared across the detectors.
The sample data for this tutorial contains information about the requests that
are received by various applications and services in a system. Let's assume that
you want to monitor the requests received and the response time. In particular,
you might want to track those metrics on a per service basis to see if any
services have unusual patterns.
To create a multi-metric job in {kib}:
. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.
. Click **Machine Learning** in the side navigation, then click **Create new job**.
. Select the index pattern that you created for the sample data. For example,
`server-metrics*`.
. In the **Use a wizard** section, click **Multi metric**.
. Configure the job by providing the following job settings: +
+
--
[role="screenshot"]
image::images/ml-gs-multi-job.jpg["Create a new job from the server-metrics index"]
--
.. For the **Fields**, select `high mean(response)` and `sum(total)`. This
creates two detectors and specifies the analysis function and field that each
detector uses. The first detector uses the high mean function to detect
unusually high average values for the `response` field in each bucket. The
second detector uses the sum function to detect when the sum of the `total`
field is anomalous in each bucket. For more information about any of the
analytical functions, see <<ml-functions>>.
.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into. As was the case in the single
metric example, this value has a significant impact on the analysis. When you're
creating jobs for your own data, you might need to experiment with different
bucket spans depending on the frequency of the input data, the duration of
typical anomalies, and the frequency at which alerting is required.
.. For the **Split Data**, select `service`. When you specify this
option, the analysis is segmented such that you have completely independent
baselines for each distinct value of this field.
//TBD: What is the importance of having separate baselines?
There are seven unique service keyword values in the sample data. Thus for each
of the seven services, you will see the high mean response metrics and sum
total metrics. +
+
--
NOTE: If you are creating a job by using the {ml} APIs or the advanced job
wizard in {kib}, you can accomplish this split by using the
`partition_field_name` property.
--
.. For the **Key Fields (Influencers)**, select `host`. Note that the `service` field
is also automatically selected because you used it to split the data. These key
fields are also known as _influencers_.
When you identify a field as an influencer, you are indicating that you think
it contains information about someone or something that influences or
contributes to anomalies.
+
--
[TIP]
========================
Picking an influencer is strongly recommended for the following reasons:
* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results
The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent influencers.
Influencers can be any field in your data; they do not need to be fields that
are specified in your detectors, though they often are.
As a best practice, do not pick too many influencers. For example, you generally
do not need more than three. If you pick many influencers, the results can be
overwhelming and there is a small overhead to the analysis.
========================
//TBD: Is this something you can determine later from looking at results and
//update your job with if necessary? Is it all post-processing or does it affect
//the ongoing modeling?
--
. Click **Use full server-metrics* data**. Two graphs are generated for each
`service` value, which represent the high mean `response` values and
sum `total` values over time. For example:
+
--
[role="screenshot"]
image::images/ml-gs-job2-split.jpg["Kibana charts for data split by service"]
--
. Provide a name for the job, for example `response_requests_by_app`. The job
name must be unique in your cluster. You can also optionally provide a
description of the job.
. Click **Create Job**.
When the job is created, you can choose to view the results, continue the job in
real-time, and create a watch. In this tutorial, we will proceed to view the
results.
TIP: The `create_multi_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_multi_metric.sh[create_multi_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
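For reference, the job configuration that such a script creates is conceptually
similar to the following hedged sketch. The settings mirror the choices made in
the wizard above; the exact contents of the downloaded script might differ.
[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/response_requests_by_app
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "high_mean", "field_name": "response", "partition_field_name": "service" },
      { "function": "sum", "field_name": "total", "partition_field_name": "service" }
    ],
    "influencers": [ "service", "host" ]
  },
  "data_description": { "time_field": "@timestamp" }
}
----------------------------------
// NOTCONSOLE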
[[ml-gs-job2-analyze]]
=== Exploring Multi-metric Job Results
The {xpackml} features analyze the input stream of data, model its behavior, and
perform analysis based on the two detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.
You can use the **Anomaly Explorer** in {kib} to view the analysis results:
[role="screenshot"]
image::images/ml-gs-job2-explorer.jpg["Job results in the Anomaly Explorer"]
You can explore the overall anomaly time line, which shows the maximum anomaly
score for each section in the specified time period. You can change the time
period by using the time picker in the {kib} toolbar. Note that the sections in
this time line do not necessarily correspond to the bucket span. If you change
the time period, the sections change size too. The smallest possible size for
these sections is a bucket. If you specify a large time period, the sections can
span many buckets.
On the left is a list of the top influencers for all of the detected anomalies
in that same time period. The list includes maximum anomaly scores, which in
this case are aggregated for each influencer, for each bucket, across all
detectors. There is also a total sum of the anomaly scores for each influencer.
You can use this list to help you narrow down the contributing factors and focus
on the most anomalous entities.
If your job contains influencers, you can also explore swim lanes that
correspond to the values of an influencer. In this example, the swim lanes
correspond to the values for the `service` field that you used to split the data.
Each lane represents a unique application or service name. Since you specified
the `host` field as an influencer, you can also optionally view the results in
swim lanes for each host name:
[role="screenshot"]
image::images/ml-gs-job2-explorer-host.jpg["Job results sorted by host"]
By default, the swim lanes are ordered by their maximum anomaly score values.
You can click on the sections in the swim lane to see details about the
anomalies that occurred in that time interval.
NOTE: The anomaly scores that you see in each section of the **Anomaly Explorer**
might differ slightly. This disparity occurs because for each job we generate
bucket results, influencer results, and record results. Anomaly scores are
generated for each type of result. The anomaly timeline uses the bucket-level
anomaly scores. The list of top influencers uses the influencer-level anomaly
scores. The list of anomalies uses the record-level anomaly scores. For more
information about these different result types, see
{ref}/ml-results-resource.html[Results Resources].
Click on a section in the swim lanes to obtain more information about the
anomalies in that time period. For example, click on the red section in the swim
lane for `server_2`:
[role="screenshot"]
image::images/ml-gs-job2-explorer-anomaly.jpg["Job results for an anomaly"]
You can see exact times when anomalies occurred and which detectors or metrics
caught the anomaly. Also note that because you split the data by the `service`
field, you see separate charts for each applicable service. In particular, you
see charts for each service for which there is data on the specified host in the
specified time interval.
Below the charts, there is a table that provides more information, such as the
typical and actual values and the influencers that contributed to the anomaly.
[role="screenshot"]
image::images/ml-gs-job2-explorer-table.jpg["Job results table"]
Notice that there are anomalies for both detectors, that is to say for both the
`high_mean(response)` and the `sum(total)` metrics in this time interval. The
table aggregates the anomalies to show the highest severity anomaly per detector
and entity, which is the by, over, or partition field value that is displayed
in the **found for** column. To view all the anomalies without any aggregation,
set the **Interval** to `Show all`.
By investigating multiple metrics in a single job, you might see relationships
between events in your data that would otherwise be overlooked.


@@ -1,55 +0,0 @@
[[ml-gs-next]]
=== Next Steps
By completing this tutorial, you've learned how you can detect anomalous
behavior in a simple set of sample data. You created single and multi-metric
jobs in {kib}, which creates and opens jobs and creates and starts {dfeeds} for
you under the covers. You examined the results of the {ml} analysis in the
**Single Metric Viewer** and **Anomaly Explorer** in {kib}. You also
extrapolated the future behavior of a job by creating a forecast.
If you want to learn about advanced job options, you might be interested in
the following video tutorial:
https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning Lab 3 - Detect Outliers in a Population].
If you intend to use {ml} APIs in your applications, a good next step might be
to learn about the APIs by retrieving information about these sample jobs.
For example, the following APIs retrieve information about the jobs and {dfeeds}.
[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors
GET _xpack/ml/datafeeds
--------------------------------------------------
// CONSOLE
For more information about the {ml} APIs, see <<ml-api-quickref>>.
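You can also retrieve the analysis results through the APIs. For example, the
following hedged sketch fetches the buckets with an anomaly score of 80 or
higher for the single metric job that you created in this tutorial:
[source,js]
----------------------------------
GET _xpack/ml/anomaly_detectors/total-requests/results/buckets
{
  "anomaly_score": 80,
  "sort": "anomaly_score",
  "desc": true
}
----------------------------------
// NOTCONSOLE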
Ultimately, the next step is to start applying {ml} to your own data.
As mentioned in <<ml-gs-data>>, there are three things to consider when you're
thinking about where {ml} will be most impactful:
. It must be time series data.
. It should be information that contains key performance indicators for the
health, security, or success of your business or system. The better you know the
data, the quicker you will be able to create jobs that generate useful
insights.
. Ideally, the data is located in {es} and you can therefore create a {dfeed}
that retrieves data in real time. If your data is outside of {es}, you
cannot use {kib} to create your jobs and you cannot use {dfeeds}. Machine
learning analysis is still possible, however, by using APIs to create and manage
jobs and to post data to them.
Once you have decided which data to analyze, you can start considering which
analysis functions you want to use. For more information, see <<ml-functions>>.
In general, it is a good idea to start with single metric jobs for your
key performance indicators. After you examine these simple analysis results,
you will have a better idea of what the influencers might be. You can create
multi-metric jobs and split the data or create more complex analysis functions
as necessary. For examples of more complicated configuration options, see
<<ml-configuring>>.
If you encounter problems, we're here to help. See <<help>> and
<<ml-troubleshooting>>.


@@ -1,331 +0,0 @@
[[ml-gs-jobs]]
=== Creating Single Metric Jobs
At this point in the tutorial, the goal is to detect anomalies in the
total requests received by your applications and services. The sample data
contains a single key performance indicator (KPI) to track this, which is the total
requests over time. It is therefore logical to start by creating a single metric
job for this KPI.
TIP: If you are using aggregated data, you can create an advanced job
and configure it to use a `summary_count_field_name`. The {ml} algorithms will
make the best possible use of summarized data in this case. For simplicity, in
this tutorial we will not make use of that advanced functionality. For more
information, see <<ml-configuring-aggregation>>.
A single metric job contains a single _detector_. A detector defines the type of
analysis that will occur (for example, `max`, `average`, or `rare` analytical
functions) and the fields that will be analyzed.
To create a single metric job in {kib}:
. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.
. Click **Machine Learning** in the side navigation.
. Click **Create new job**.
. Select the index pattern that you created for the sample data. For example,
`server-metrics*`.
. In the **Use a wizard** section, click **Single metric**.
. Configure the job by providing the following information: +
+
--
[role="screenshot"]
image::images/ml-gs-single-job.jpg["Create a new job from the server-metrics index"]
--
.. For the **Aggregation**, select `Sum`. This value specifies the analysis
function that is used.
+
--
Some of the analytical functions look for single anomalous data points. For
example, `max` identifies the maximum value that is seen within a bucket.
Others perform some aggregation over the length of the bucket. For example,
`mean` calculates the mean of all the data points seen within the bucket.
Similarly, `count` calculates the total number of data points within the bucket.
In this tutorial, you are using the `sum` function, which calculates the sum of
the specified field's values within the bucket. For descriptions of all the
functions, see <<ml-functions>>.
--
.. For the **Field**, select `total`. This value specifies the field that
the detector uses in the function.
+
--
NOTE: Some functions such as `count` and `rare` do not require fields.
--
.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into.
+
--
The {xpackml} features use the concept of a bucket to divide up the time series
into batches for processing. For example, if you are monitoring
the total number of requests in the system,
using a bucket span of 1 hour would mean that at the end of each hour, it
calculates the sum of the requests for the last hour and computes the
anomalousness of that value compared to previous hours.
The bucket span has two purposes: it dictates over what time span to look for
anomalous features in data, and also determines how quickly anomalies can be
detected. Choosing a shorter bucket span enables anomalies to be detected more
quickly. However, there is a risk of being too sensitive to natural variations
or noise in the input data. Choosing too long a bucket span can mean that
interesting anomalies are averaged away. There is also the possibility that the
aggregation might smooth out some anomalies based on when the bucket starts
in time.
The bucket span has a significant impact on the analysis. When you're trying to
determine what value to use, take into account the granularity at which you
want to perform the analysis, the frequency of the input data, the duration of
typical anomalies, and the frequency at which alerting is required.
--
. Determine whether you want to process all of the data or only part of it. If
you want to analyze all of the existing data, click
**Use full server-metrics* data**. If you want to see what happens when you
stop and start {dfeeds} and process additional data over time, click the time
picker in the {kib} toolbar. Since the sample data spans a period of time
between March 23, 2017 and April 22, 2017, click **Absolute**. Set the start
time to March 23, 2017 and the end time to April 1, 2017, for example. Once
you've got the time range set up, click the **Go** button. +
+
--
[role="screenshot"]
image::images/ml-gs-job1-time.jpg["Setting the time range for the {dfeed}"]
--
+
--
A graph is generated, which represents the total number of requests over time.
Note that the **Estimate bucket span** option is no longer greyed out in the
**Bucket span** field. This is an experimental feature that you can use to help
determine an appropriate bucket span for your data. For the purposes of this
tutorial, we will leave the bucket span at 10 minutes.
--
. Provide a name for the job, for example `total-requests`. The job name must
be unique in your cluster. You can also optionally provide a description of the
job and create a job group.
. Click **Create Job**. +
+
--
[role="screenshot"]
image::images/ml-gs-job1.jpg["A graph of the total number of requests over time"]
--
As the job is created, the graph is updated to give a visual representation of
the progress of {ml} as the data is processed. This view is only available whilst the
job is running.
When the job is created, you can choose to view the results, continue the job
in real-time, and create a watch. In this tutorial, we will look at how to
manage jobs and {dfeeds} before we view the results.
TIP: The `create_single_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_single_metric.sh[create_single_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
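If you would rather work with the APIs directly, the job and {dfeed} that the
wizard creates correspond roughly to the following hedged sketch. The job and
{dfeed} identifiers are illustrative, and the wizard might set additional
options, such as model plot configuration.
[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/total-requests
{
  "description": "Total sum of requests",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "detector_description": "Sum of total", "function": "sum", "field_name": "total" }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}

PUT _xpack/ml/datafeeds/datafeed-total-requests
{
  "job_id": "total-requests",
  "indices": [ "server-metrics" ]
}
----------------------------------
// NOTCONSOLE
You must then open the job and start the {dfeed} before any data is analyzed,
as shown in the next sections.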
[[ml-gs-job1-manage]]
=== Managing Jobs
After you create a job, you can see its status in the **Job Management** tab: +
[role="screenshot"]
image::images/ml-gs-job1-manage1.jpg["Status information for the total-requests job"]
The following information is provided for each job:
Job ID::
The unique identifier for the job.
Description::
The optional description of the job.
Processed records::
The number of records that have been processed by the job.
Memory status::
The status of the mathematical models. When you create jobs by using the APIs or
by using the advanced options in {kib}, you can specify a `model_memory_limit`.
That value is the maximum amount of memory resources that the mathematical
models can use. Once that limit is approached, data pruning becomes more
aggressive. Upon exceeding that limit, new entities are not modeled. For more
information about this setting, see
{ref}/ml-job-resource.html#ml-apilimits[Analysis Limits]. The memory status
field reflects whether you have reached or exceeded the model memory limit. It
can have one of the following values: +
`ok`::: The models stayed below the configured value.
`soft_limit`::: The models used more than 60% of the configured memory limit
and older unused models will be pruned to free up space.
`hard_limit`::: The models used more space than the configured memory limit.
As a result, not all incoming data was processed.
Job state::
The status of the job, which can be one of the following values: +
`opened`::: The job is available to receive and process data.
`closed`::: The job finished successfully with its model state persisted.
The job must be opened before it can accept further data.
`closing`::: The job close action is in progress and has not yet completed.
A closing job cannot accept further data.
`failed`::: The job did not finish successfully due to an error.
This situation can occur due to invalid input data.
If the job has failed irrevocably, it must be force closed and then deleted.
If the {dfeed} can be corrected, the job can be closed and then re-opened.
{dfeed-cap} state::
The status of the {dfeed}, which can be one of the following values: +
`started`::: The {dfeed} is actively receiving data.
`stopped`::: The {dfeed} is stopped and will not receive data until it is
re-started.
Latest timestamp::
The timestamp of the last processed record.
If you click the arrow beside the name of the job, you can show or hide additional
information, such as the settings, configuration information, or messages for
the job.
You can also click one of the **Actions** buttons to start the {dfeed}, edit
the job or {dfeed}, and clone or delete the job, for example.
[float]
[[ml-gs-job1-datafeed]]
==== Managing {dfeeds-cap}
A {dfeed} can be started and stopped multiple times throughout its lifecycle.
If you want to retrieve more data from {es} and the {dfeed} is stopped, you must
restart it.
For example, if you did not use the full data when you created the job, you can
now process the remaining data by restarting the {dfeed}:
. In the **Machine Learning** / **Job Management** tab, click the following
button to start the {dfeed}: image:images/ml-start-feed.jpg["Start {dfeed}"]
. Choose a start time and end time. For example,
click **Continue from 2017-04-01 23:59:00** and select **2017-04-30** as the
search end time. Then click **Start**. The date picker defaults to the latest
timestamp of processed data. Be careful not to leave any gaps in the analysis,
otherwise you might miss anomalies. +
+
--
[role="screenshot"]
image::images/ml-gs-job1-datafeed.jpg["Restarting a {dfeed}"]
--
The {dfeed} state changes to `started`, the job state changes to `opened`,
and the number of processed records increases as the new data is analyzed. The
latest timestamp information also increases.
TIP: If your data is being loaded continuously, you can continue running the job
in real time. For this, start your {dfeed} and select **No end time**.
If you want to stop the {dfeed} at this point, you can click the following
button: image:images/ml-stop-feed.jpg["Stop {dfeed}"]
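The equivalent operations are also available through the APIs. The following is
a hedged sketch, assuming the job and {dfeed} identifiers used earlier in this
tutorial:
[source,js]
----------------------------------
POST _xpack/ml/anomaly_detectors/total-requests/_open

POST _xpack/ml/datafeeds/datafeed-total-requests/_start
{
  "start": "2017-04-01T23:59:00Z",
  "end": "2017-04-30T00:00:00Z"
}

POST _xpack/ml/datafeeds/datafeed-total-requests/_stop
----------------------------------
// NOTCONSOLE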
Now that you have processed all the data, let's start exploring the job results.
[[ml-gs-job1-analyze]]
=== Exploring Single Metric Job Results
The {xpackml} features analyze the input stream of data, model its behavior,
and perform analysis based on the detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.
Result records for each anomaly are stored in `.ml-anomalies-*` indices in {es}.
By default, the name of the index where {ml} results are stored is labelled
`shared`, which corresponds to the `.ml-anomalies-shared` index.
You can use the **Anomaly Explorer** or the **Single Metric Viewer** in {kib} to
view the analysis results.
Anomaly Explorer::
This view contains swim lanes showing the maximum anomaly score over time.
There is an overall swim lane that shows the overall score for the job, and
also swim lanes for each influencer. By selecting a block in a swim lane, the
anomaly details are displayed alongside the original source data (where
applicable).
Single Metric Viewer::
This view contains a chart that represents the actual and expected values over
time. This is only available for jobs that analyze a single time series and
where `model_plot_config` is enabled. As in the **Anomaly Explorer**, anomalous
data points are shown in different colors depending on their score.
By default when you view the results for a single metric job, the
**Single Metric Viewer** opens:
[role="screenshot"]
image::images/ml-gs-job1-analysis.jpg["Single Metric Viewer for total-requests job"]
The blue line in the chart represents the actual data values. The shaded blue
area represents the bounds for the expected values. The values between the upper
and lower bounds are the most likely values according to the model. If a value is
outside of this area, it can be considered anomalous.
If you slide the time selector from the beginning of the data to the end of the
data, you can see how the model improves as it processes more data. At the
beginning, the expected range of values is pretty broad and the model is not
capturing the periodicity in the data. But it quickly learns and begins to
reflect the daily variation.
Any data points outside the range that was predicted by the model are marked
as anomalies. When you have high volumes of real-life data, many anomalies
might be found. These vary in probability from very likely to highly unlikely,
that is to say, from not particularly anomalous to highly anomalous. A single
bucket might contain no anomalies, only one or two, or sometimes hundreds, and a
single job can produce many thousands. To provide a sensible view of the
results, an _anomaly score_ is calculated for each bucket
time interval. The anomaly score is a value from 0 to 100, which indicates
the significance of the observed anomaly compared to previously seen anomalies.
The highly anomalous values are shown in red and the low scored values are
indicated in blue. An interval with a high anomaly score is significant and
requires investigation.
Slide the time selector to a section of the time series that contains a red
anomaly data point. If you hover over the point, you can see more information
about that data point. You can also see details in the **Anomalies** section
of the viewer. For example:
[role="screenshot"]
image::images/ml-gs-job1-anomalies.jpg["Single Metric Viewer Anomalies for total-requests job"]
For each anomaly you can see key details such as the time, the actual and
expected ("typical") values, and their probability.
By default, the table contains all anomalies that have a severity of "warning"
or higher in the selected section of the timeline. If you are only interested in
critical anomalies, for example, you can change the severity threshold for this
table.
The anomalies table also automatically calculates an interval for the data in
the table. If the time difference between the earliest and latest records in the
table is less than two days, the data is aggregated by hour to show the details
of the highest severity anomaly for each detector. Otherwise, it is
aggregated by day. You can change the interval for the table, for example, to
show all anomalies.
You can see the same information in a different format by using the
**Anomaly Explorer**:
[role="screenshot"]
image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"]
Click one of the red sections in the swim lane to see details about the anomalies
that occurred in that time interval. For example:
[role="screenshot"]
image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"]
After you have identified anomalies, often the next step is to try to determine
the context of those situations. For example, are there other factors that are
contributing to the problem? Are the anomalies confined to particular
applications or servers? You can begin to troubleshoot these situations by
layering additional jobs or creating multi-metric jobs.


@@ -1,99 +0,0 @@
[[ml-gs-wizards]]
=== Creating Jobs in {kib}
++++
<titleabbrev>Creating Jobs</titleabbrev>
++++
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytical task. They also contain the results of the
analytical task.
[NOTE]
--
This tutorial uses {kib} to create jobs and view results, but you can
alternatively use APIs to accomplish most tasks.
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
The {xpackml} features in {kib} use pop-ups. You must configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.
--
{kib} provides wizards that help you create typical {ml} jobs. For example, you
can use wizards to create single metric, multi-metric, population, and advanced
jobs.
To see the job creation wizards:
. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.
. Click **Machine Learning** in the side navigation.
. Click **Create new job**.
. Click the `server-metrics*` index pattern.
You can then choose from a list of job wizards. For example:
[role="screenshot"]
image::images/ml-create-job.jpg["Job creation wizards in {kib}"]
If you are not certain which wizard to use, there is also a **Data Visualizer**
that can help you explore the fields in your data.
To learn more about the sample data:
. Click **Data Visualizer**. +
+
--
[role="screenshot"]
image::images/ml-data-visualizer.jpg["Data Visualizer in {kib}"]
--
. Select a time period that you're interested in exploring by using the time
picker in the {kib} toolbar. Alternatively, click
**Use full server-metrics* data** to view data over the full time range. In this
sample data, the documents relate to March and April 2017.
. Optional: Change the number of documents per shard that are used in the
visualizations. There is a relatively small number of documents in the sample
data, so you can choose a value of `all`. For larger data sets, keep in mind
that using a large sample size increases query run times and increases the load
on the cluster.
[role="screenshot"]
image::images/ml-data-metrics.jpg["Data Visualizer output for metrics in {kib}"]
The fields in the indices are listed in two sections. The first section contains
the numeric ("metric") fields. The second section contains non-metric fields
(such as `keyword`, `text`, `date`, `boolean`, `ip`, and `geo_point` data types).
For metric fields, the **Data Visualizer** indicates how many documents contain
the field in the selected time period. It also provides information about the
minimum, median, and maximum values, the number of distinct values, and their
distribution. You can use the distribution chart to get a better idea of how
the values in the data are clustered. Alternatively, you can view the top values
for metric fields. For example:
[role="screenshot"]
image::images/ml-data-topmetrics.jpg["Data Visualizer output for top values in {kib}"]
For date fields, the **Data Visualizer** provides the earliest and latest field
values and the number and percentage of documents that contain the field
during the selected time period. For example:
[role="screenshot"]
image::images/ml-data-dates.jpg["Data Visualizer output for date fields in {kib}"]
For keyword fields, the **Data Visualizer** provides the number of distinct
values, a list of the top values, and the number and percentage of documents
that contain the field during the selected time period. For example:
[role="screenshot"]
image::images/ml-data-keywords.jpg["Data Visualizer output for keyword fields in {kib}"]
In this tutorial, you will create single and multi-metric jobs that use the
`total`, `response`, `service`, and `host` fields. Though there is an option to
create an advanced job directly from the **Data Visualizer**, we will use the
single and multi-metric job creation wizards instead.


@@ -1,92 +0,0 @@
[[ml-getting-started]]
== Getting started with machine learning
++++
<titleabbrev>Getting started</titleabbrev>
++++
Ready to get some hands-on experience with the {xpackml} features? This
tutorial shows you how to:
* Load a sample data set into {es}
* Create single and multi-metric {ml} jobs in {kib}
* Use the results to identify possible anomalies in the data
At the end of this tutorial, you should have a good idea of what {ml} is and
will hopefully be inspired to use it to detect anomalies in your own data.
You might also be interested in these video tutorials, which use the same sample
data:
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job]
* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job]
[float]
[[ml-gs-sysoverview]]
=== System Overview
To follow the steps in this tutorial, you will need the following
components of the Elastic Stack:
* {es} {version}, which stores the data and the analysis results
* {kib} {version}, which provides a helpful user interface for creating and
viewing jobs
See the https://www.elastic.co/support/matrix[Elastic Support Matrix] for
information about supported operating systems.
See {stack-ref}/installing-elastic-stack.html[Installing the Elastic Stack] for
information about installing each of the components.
NOTE: To get started, you can install {es} and {kib} on a
single VM or even on your laptop (requires 64-bit OS).
As you add more data and your traffic grows,
you'll want to replace the single {es} instance with a cluster.
By default, when you install {es} and {kib}, {xpack} is installed and the
{ml} features are enabled. You cannot use {ml} with the free basic license, but
you can try all of the {xpack} features with a <<license-management,trial license>>.
If you have multiple nodes in your cluster, you can optionally dedicate nodes to
specific purposes. If you want to control which nodes are
_machine learning nodes_ or limit which nodes run resource-intensive
activity related to jobs, see
{ref}/modules-node.html#modules-node-xpack[{ml} node settings].
[float]
[[ml-gs-users]]
==== Users, Roles, and Privileges
The {xpackml} features implement cluster privileges and built-in roles to
make it easier to control which users have authority to view and manage the jobs,
{dfeeds}, and results.
By default, you can perform all of the steps in this tutorial by using the
built-in `elastic` superuser. However, you must set its password before the user
can do anything. For information about how to set that password, see
<<security-getting-started>>.
If you are performing these steps in a production environment, take extra care
because `elastic` has the `superuser` role and you could inadvertently make
significant changes to the system. You can alternatively assign the
`machine_learning_admin` and `kibana_user` roles to a user ID of your choice.
For more information, see <<built-in-roles>> and <<privileges-list-cluster>>.
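For example, you might create a dedicated user with those roles by using the
user API. This is a hedged sketch; the user name and password are placeholders.
[source,js]
----------------------------------
POST _xpack/security/user/ml_tutorial_user
{
  "password": "change-this-password",
  "roles": [ "machine_learning_admin", "kibana_user" ],
  "full_name": "Machine Learning Tutorial User"
}
----------------------------------
// NOTCONSOLE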
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-data.asciidoc
include::getting-started-data.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-wizards.asciidoc
include::getting-started-wizards.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-single.asciidoc
include::getting-started-single.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-multi.asciidoc
include::getting-started-multi.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-forecast.asciidoc
include::getting-started-forecast.asciidoc[]
:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-next.asciidoc
include::getting-started-next.asciidoc[]


@@ -1,198 +0,0 @@
[[ml-limitations]]
== Machine Learning Limitations
The following limitations and known problems apply to the {version} release of
{xpack}:
[float]
=== Categorization uses English dictionary words
//See x-pack-elasticsearch/#3021
Categorization identifies static parts of unstructured logs and groups similar
messages together. The default categorization tokenizer assumes English language
log messages. For other languages you must define a different
`categorization_analyzer` for your job. For more information, see
<<ml-configuring-categories>>.
Additionally, a dictionary used to influence the categorization process contains
only English words. This means categorization might work better in English than
in other languages. The ability to customize the dictionary will be added in a
future release.
[float]
=== Pop-ups must be enabled in browsers
//See x-pack-elasticsearch/#844
The {xpackml} features in {kib} use pop-ups. You must configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.
[float]
=== Anomaly Explorer omissions and limitations
//See x-pack-elasticsearch/#844 and x-pack-kibana/#1461
In {kib}, Anomaly Explorer charts are not displayed for anomalies
that were due to categorization, `time_of_day` functions, or `time_of_week`
functions. Those particular results do not display well as time series
charts.
The charts are also not displayed for detectors that use script fields. In that
case, the original source data cannot be easily searched because it has been
somewhat transformed by the script.
The Anomaly Explorer charts can also look odd in circumstances where there
is very little data to plot. For example, if there is only one data point, it is
represented as a single dot. If there are only two data points, they are joined
by a line.
[float]
=== Jobs close on the {dfeed} end date
//See x-pack-elasticsearch/#1037
If you start a {dfeed} and specify an end date, it will close the job when
the {dfeed} stops. This behavior avoids having numerous open one-time jobs.
If you do not specify an end date when you start a {dfeed}, the job
remains open when you stop the {dfeed}. This behavior avoids the overhead
of closing and re-opening large jobs when there are pauses in the {dfeed}.
[float]
=== Jobs created in {kib} must use {dfeeds}
If you create jobs in {kib}, you must use {dfeeds}. If the data that you want to
analyze is not stored in {es}, you cannot use {dfeeds} and therefore you cannot
create your jobs in {kib}. You can, however, use the {ml} APIs to create jobs
and to send batches of data directly to the jobs. For more information, see
<<ml-dfeeds>> and <<ml-api-quickref>>.
[float]
=== Post data API requires JSON format
The post data API enables you to send data to a job for analysis. The data that
you send to the job must use the JSON format.
For more information about this API, see
{ref}/ml-post-data.html[Post Data to Jobs].
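For example, you could send a small batch of newline-delimited JSON documents to
a job with a request similar to this hedged sketch. The job name is a
placeholder, and in practice you would typically send the data from a file, for
example with `curl --data-binary`.
[source,js]
----------------------------------
POST _xpack/ml/anomaly_detectors/my-job/_data
{"@timestamp":"2017-03-23T13:00:00","total":40476}
{"@timestamp":"2017-03-23T13:10:00","total":39482}
----------------------------------
// NOTCONSOLE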
[float]
=== Misleading high missing field counts
//See x-pack-elasticsearch/#684
One of the counts associated with a {ml} job is `missing_field_count`,
which indicates the number of records that are missing a configured field.
//This information is most useful when your job analyzes CSV data. In this case,
//missing fields indicate data is not being analyzed and you might receive poor results.
Since jobs analyze JSON data, the `missing_field_count` might be misleading.
Missing fields might be expected due to the structure of the data and therefore
do not generate poor results.
For more information about `missing_field_count`,
see {ref}/ml-jobstats.html#ml-datacounts[Data Counts Objects].
[float]
=== Terms aggregation size affects data analysis
//See x-pack-elasticsearch/#601
By default, the `terms` aggregation returns the buckets for the top ten terms.
You can change this default behavior by setting the `size` parameter.
If you send pre-aggregated data to a job for analysis, you must ensure
that the `size` is configured correctly. Otherwise, some data might not be
analyzed.
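For example, in a {dfeed} that aggregates by a high-cardinality field, set
`size` to at least the number of distinct values that you expect. The following
is a hedged sketch only; the `response-by-service` job is hypothetical and would
need a matching aggregation-aware configuration, such as a
`summary_count_field_name`.
[source,js]
----------------------------------
PUT _xpack/ml/datafeeds/datafeed-response-by-service
{
  "job_id": "response-by-service",
  "indices": [ "server-metrics" ],
  "aggregations": {
    "buckets": {
      "date_histogram": { "field": "@timestamp", "interval": "600s" },
      "aggregations": {
        "@timestamp": { "max": { "field": "@timestamp" } },
        "service": {
          "terms": { "field": "service", "size": 10 },
          "aggregations": {
            "response": { "avg": { "field": "response" } }
          }
        }
      }
    }
  }
}
----------------------------------
// NOTCONSOLE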
[float]
=== Time-based index patterns are not supported
//See x-pack-elasticsearch/#1910
It is not possible to create an {xpackml} analysis job that uses time-based
index patterns, for example `[logstash-]YYYY.MM.DD`.
This applies to the single metric or multi metric job creation wizards in {kib}.
[float]
=== Fields named "by", "count", or "over" cannot be used to split data
//See x-pack-elasticsearch/#858
You cannot use the following field names in the `by_field_name` or
`over_field_name` properties in a job: `by`; `count`; `over`. This limitation
also applies to those properties when you create advanced jobs in {kib}.
[float]
=== Jobs created in {kib} use model plot config and pre-aggregated data
//See x-pack-elasticsearch/#844
If you create single or multi-metric jobs in {kib}, it might enable some
options under the covers that you'd want to reconsider for large or
long-running jobs.
For example, when you create a single metric job in {kib}, it generally
enables the `model_plot_config` advanced configuration option. That configuration
option causes model information to be stored along with the results and provides
a more detailed view into anomaly detection. It is specifically used by the
**Single Metric Viewer** in {kib}. When this option is enabled, however, it can
add considerable overhead to the performance of the system. If you have jobs
with many entities, for example data from tens of thousands of servers, storing
this additional model information for every bucket might be problematic. If you
are not certain that you need this option or if you experience performance
issues, edit your job configuration to disable this option.
For more information, see
{ref}/ml-job-resource.html#ml-apimodelplotconfig[Model Plot Config].
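Depending on your version, you might be able to turn this option off on an
existing job with the update jobs API; otherwise, clone the job and disable the
option before you run it. A hedged sketch:
[source,js]
----------------------------------
POST _xpack/ml/anomaly_detectors/total-requests/_update
{
  "model_plot_config": {
    "enabled": false
  }
}
----------------------------------
// NOTCONSOLE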
Likewise, when you create a single or multi-metric job in {kib}, in some cases
it uses aggregations on the data that it retrieves from {es}. One of the
benefits of summarizing data this way is that {es} automatically distributes
these calculations across your cluster. This summarized data is then fed into
{xpackml} instead of raw results, which reduces the volume of data that must
be considered while detecting anomalies. However, if you have two jobs, one of
which uses pre-aggregated data and another that does not, their results might
differ. This difference is due to the difference in precision of the input data.
The {ml} analytics are designed to be aggregation-aware and the likely increase
in performance that is gained by pre-aggregating the data makes the potentially
poorer precision worthwhile. If you want to view or change the aggregations
that are used in your job, refer to the `aggregations` property in your {dfeed}.
For more information, see {ref}/ml-datafeed-resource.html[Datafeed Resources].
[float]
=== Security Integration
When {security} is enabled, a {dfeed} stores the roles of the user who created
or updated the {dfeed} **at that time**. If the privileges that are associated
with those roles change, the {dfeed} subsequently runs with the updated
permissions. However, if the user is assigned different roles after the {dfeed}
was created or updated, the {dfeed} continues to run with the permissions that
were associated with the original roles. For more information, see <<ml-dfeeds>>.
[float]
=== Forecasts cannot be created for population jobs
If you use an `over_field_name` property in your job (that is to say, it's a
_population job_), you cannot create a forecast. If you try to create a forecast
for this type of job, an error occurs. For more information about forecasts,
see <<ml-forecasting>>.
[float]
=== Forecasts cannot be created for jobs that use geographic, rare, or time functions
If you use any of the following analytical functions in your job, you cannot
create a forecast:
* `lat_long`
* `rare` and `freq_rare`
* `time_of_day` and `time_of_week`
If you try to create a forecast for this type of job, an error occurs. For more
information about any of these functions, see <<ml-functions>>.
[float]
=== Jobs must be stopped before upgrades
You must stop any {ml} jobs that are running before you start the upgrade
process. For more information, see <<stopping-ml>> and
{stack-ref}/upgrading-elastic-stack.html[Upgrading the Elastic Stack].


@@ -1,116 +0,0 @@
[[ml-troubleshooting]]
== {xpackml} Troubleshooting
++++
<titleabbrev>{xpackml}</titleabbrev>
++++
Use the information in this section to troubleshoot common problems and find
answers for frequently asked questions.
* <<ml-rollingupgrade>>
* <<ml-mappingclash>>
To get help, see <<help>>.
[[ml-rollingupgrade]]
=== Machine learning features unavailable after rolling upgrade
This problem occurs after you upgrade all of the nodes in your cluster to
{version} by using rolling upgrades. When you try to use {xpackml} features for
the first time, all attempts fail, though `GET _xpack` and `GET _xpack/usage`
indicate that {xpack} is enabled.
*Symptoms:*
* Errors when you click *Machine Learning* in {kib}.
For example: `Jobs list could not be created` and `An internal server error occurred`.
* Null pointer and remote transport exceptions when you run {ml} APIs such as
`GET _xpack/ml/anomaly_detectors` and `GET _xpack/ml/datafeeds`.
* Errors in the log files on the master nodes.
For example: `unable to install ml metadata upon startup`
*Resolution:*
After you upgrade all master-eligible nodes to {es} {version} and {xpack}
{version}, restart the current master node, which triggers the {xpackml}
features to re-initialize.
For more information, see {ref}/rolling-upgrades.html[Rolling upgrades].
[[ml-mappingclash]]
=== Job creation failure due to mapping clash
This problem occurs when you try to create a job.
*Symptoms:*
* Illegal argument exception occurs when you click *Create Job* in {kib} or run
the create job API. For example:
`Save failed: [status_exception] This job would cause a mapping clash
with existing field [field_name] - avoid the clash by assigning a dedicated
results index` or `Save failed: [illegal_argument_exception] Can't merge a non
object mapping [field_name] with an object mapping [field_name]`.
*Resolution:*
This issue typically occurs when two or more jobs store their results in the
same index and the results contain fields with the same name but different
data types or different `fields` settings.
By default, {ml} results are stored in the `.ml-anomalies-shared` index in {es}.
To resolve this issue, click *Advanced > Use dedicated index* when you create
the job in {kib}. If you are using the create job API, specify an index name in
the `results_index_name` property.
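For example, the following hedged sketch creates a job whose results are stored
in a dedicated index. The job name and index name are illustrative; the results
are typically stored in an index with a `.ml-anomalies-custom-` prefix.
[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/dedicated-results-job
{
  "results_index_name": "dedicated-results-job",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [ { "function": "sum", "field_name": "total" } ]
  },
  "data_description": { "time_field": "@timestamp" }
}
----------------------------------
// NOTCONSOLE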
[[ml-jobnames]]
=== {kib} cannot display jobs with invalid characters in their name
This problem occurs when you create a job by using the
{ref}/ml-put-job.html[Create Jobs API] then try to view that job in {kib}. In
particular, the problem occurs when you use a period (.) in the job identifier.
*Symptoms:*
* When you try to open a job (named, for example, `job.test`) in the
**Anomaly Explorer** or the **Single Metric Viewer**, the job name is split and
the text after the period is assumed to be the job name. If a job does not exist
with that abbreviated name, an error occurs. For example:
`Warning Requested job test does not exist`. If a job exists with that
abbreviated name, it is displayed.
*Resolution:*
Create jobs in {kib} or ensure that you create jobs with valid identifiers when
you use the {ml} APIs. For more information about valid identifiers, see
{ref}/ml-put-job.html[Create Jobs API] or
{ref}/ml-job-resource.html[Job Resources].
[[ml-upgradedf]]
=== Upgraded nodes fail to start due to {dfeed} issues
This problem occurs when you have a {dfeed} that contains search or query
domain specific language (DSL) that was discontinued. For example, if you
created a {dfeed} query in 5.x using search syntax that was deprecated in 5.x
and removed in 6.0, you must fix the {dfeed} before you upgrade to 6.0.
*Symptoms:*
* If {ref}/logging.html#deprecation-logging[deprecation logging] is enabled
before the upgrade, deprecation messages are generated when the {dfeeds} attempt
to retrieve data.
* After the upgrade, nodes fail to start and the error indicates that they
failed to read the local state.
*Resolution:*
Before you upgrade, identify the problematic search or query DSL. In 5.6.5 and
later, the Upgrade Assistant detects these scenarios. If you cannot fix the DSL
before the upgrade, you must delete the {dfeed} then re-create it with valid DSL
after the upgrade.
If you do not fix or delete the {dfeed} before the upgrade, then to successfully
start the failing nodes you must downgrade them and fix the problem as described
above.
See also {stack-ref}/upgrading-elastic-stack.html[Upgrading the Elastic Stack].