[[ml-gs-jobs]]
=== Creating Single Metric Jobs

At this point in the tutorial, the goal is to detect anomalies in the
total requests received by your applications and services. The sample data
contains a single key performance indicator (KPI) to track this, which is the
total requests over time. It is therefore logical to start by creating a
single metric job for this KPI.

TIP: If you are using aggregated data, you can create an advanced job
and configure it to use a `summary_count_field_name`. The {ml} algorithms will
make the best possible use of summarized data in this case. For simplicity, in
this tutorial we will not make use of that advanced functionality. For more
information, see <<ml-configuring-aggregation>>.

A single metric job contains a single _detector_. A detector defines the type
of analysis that will occur (for example, `max`, `average`, or `rare`
analytical functions) and the fields that will be analyzed.

To create a single metric job in {kib}:

. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.

. Click **Machine Learning** in the side navigation.

. Click **Create new job**.

. Select the index pattern that you created for the sample data. For example,
`server-metrics*`.

. In the **Use a wizard** section, click **Single metric**.

. Configure the job by providing the following information:
+
--
[role="screenshot"]
image::images/ml-gs-single-job.jpg["Create a new job from the server-metrics index"]
--

.. For the **Aggregation**, select `Sum`. This value specifies the analysis
function that is used.
+
--
Some of the analytical functions look for single anomalous data points. For
example, `max` identifies the maximum value that is seen within a bucket.
Others perform some aggregation over the length of the bucket. For example,
`mean` calculates the mean of all the data points seen within the bucket.
Similarly, `count` calculates the total number of data points within the
bucket. In this tutorial, you are using the `sum` function, which calculates
the sum of the specified field's values within the bucket. For descriptions of
all the functions, see <<ml-functions>>.
--

.. For the **Field**, select `total`. This value specifies the field that the
detector uses in the function.
+
--
NOTE: Some functions such as `count` and `rare` do not require fields.
--

.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into.
+
--
The {xpackml} features use the concept of a bucket to divide up the time
series into batches for processing. For example, if you are monitoring the
total number of requests in the system, using a bucket span of 1 hour would
mean that at the end of each hour, it calculates the sum of the requests for
the last hour and computes the anomalousness of that value compared to
previous hours.

The bucket span has two purposes: it dictates over what time span to look for
anomalous features in data, and also determines how quickly anomalies can be
detected. Choosing a shorter bucket span enables anomalies to be detected more
quickly. However, there is a risk of being too sensitive to natural variations
or noise in the input data. Choosing too long a bucket span can mean that
interesting anomalies are averaged away. There is also the possibility that
the aggregation might smooth out some anomalies based on when the bucket
starts in time.

The bucket span has a significant impact on the analysis. When you're trying
to determine what value to use, take into account the granularity at which you
want to perform the analysis, the frequency of the input data, the duration of
typical anomalies, and the frequency at which alerting is required.
--

. Determine whether you want to process all of the data or only part of it. If
you want to analyze all of the existing data, click
**Use full server-metrics* data**. If you want to see what happens when you
stop and start {dfeeds} and process additional data over time, click the time
picker in the {kib} toolbar. Since the sample data spans a period of time
between March 23, 2017 and April 22, 2017, click **Absolute**. Set the start
time to March 23, 2017 and the end time to April 1, 2017, for example. Once
you've got the time range set up, click the **Go** button.
+
--
[role="screenshot"]
image::images/ml-gs-job1-time.jpg["Setting the time range for the {dfeed}"]
--
+
--
A graph is generated, which represents the total number of requests over time.

Note that the **Estimate bucket span** option is no longer greyed out in the
**Bucket span** field. This is an experimental feature that you can use to
help determine an appropriate bucket span for your data. For the purposes of
this tutorial, we will leave the bucket span at 10 minutes.
--

. Provide a name for the job, for example `total-requests`. The job name must
be unique in your cluster. You can also optionally provide a description of
the job and create a job group.

. Click **Create Job**.
+
--
[role="screenshot"]
image::images/ml-gs-job1.jpg["A graph of the total number of requests over time"]
--

As the job is created, the graph is updated to give a visual representation of
the progress of {ml} as the data is processed. This view is only available
whilst the job is running.

When the job is created, you can choose to view the results, continue the job
in real time, and create a watch. In this tutorial, we will look at how to
manage jobs and {dfeeds} before we view the results.

TIP: The `create_single_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking here:
https://download.elastic.co/demos/machine_learning/gettingstarted/create_single_metric.sh[create_single_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
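To give a sense of what the wizard builds for you, the following is a minimal
sketch of how a similar job and {dfeed} could be created through the {ml}
APIs. It is not the exact configuration the wizard generates: the
`@timestamp` time field and the `datafeed-total-requests` identifier are
assumptions, and parameter names can vary between versions, so compare against
the job JSON shown in the **Job Management** tab before relying on it.

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/total-requests
{
  "description": "An example job for the total-requests KPI",
  "analysis_config": {
    "bucket_span": "10m", <1>
    "detectors": [
      {
        "function": "sum", <2>
        "field_name": "total"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp" <3>
  }
}

PUT _xpack/ml/datafeeds/datafeed-total-requests
{
  "job_id": "total-requests",
  "indices": ["server-metrics*"]
}
--------------------------------------------------
<1> The same 10 minute bucket span that you entered in the wizard.
<2> The `sum` detector operating on the `total` field.
<3> The time field name is an assumption; use the field that your index
actually contains.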
[[ml-gs-job1-manage]]
=== Managing Jobs

After you create a job, you can see its status in the **Job Management** tab:

[role="screenshot"]
image::images/ml-gs-job1-manage1.jpg["Status information for the total-requests job"]

The following information is provided for each job:

Job ID::
The unique identifier for the job.

Description::
The optional description of the job.

Processed records::
The number of records that have been processed by the job.

Memory status::
The status of the mathematical models. When you create jobs by using the APIs
or by using the advanced options in {kib}, you can specify a
`model_memory_limit`. That value is the maximum amount of memory resources
that the mathematical models can use. Once that limit is approached, data
pruning becomes more aggressive. Upon exceeding that limit, new entities are
not modeled. For more information about this setting, see
{ref}/ml-job-resource.html#ml-apilimits[Analysis Limits]. The memory status
field reflects whether you have reached or exceeded the model memory limit. It
can have one of the following values:
+
`ok`::: The models stayed below the configured value.
`soft_limit`::: The models used more than 60% of the configured memory limit
and older unused models will be pruned to free up space.
`hard_limit`::: The models used more space than the configured memory limit.
As a result, not all incoming data was processed.

Job state::
The status of the job, which can be one of the following values:
+
`opened`::: The job is available to receive and process data.
`closed`::: The job finished successfully with its model state persisted. The
job must be opened before it can accept further data.
`closing`::: The job close action is in progress and has not yet completed. A
closing job cannot accept further data.
`failed`::: The job did not finish successfully due to an error. This
situation can occur due to invalid input data. If the job has irrevocably
failed, it must be force closed and then deleted. If the {dfeed} can be
corrected, the job can be closed and then re-opened.

{dfeed-cap} state::
The status of the {dfeed}, which can be one of the following values:
+
`started`::: The {dfeed} is actively receiving data.
`stopped`::: The {dfeed} is stopped and will not receive data until it is
re-started.

Latest timestamp::
The timestamp of the last processed record.

If you click the arrow beside the name of a job, you can show or hide
additional information, such as the settings, configuration information, or
messages for the job.

You can also click one of the **Actions** buttons to start the {dfeed}, edit
the job or {dfeed}, and clone or delete the job, for example.
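Much of this status information is also available through the APIs. As a
minimal sketch, the get job statistics API returns the counts and states that
back the columns described above (the endpoint path follows the X-Pack API
reference linked earlier):

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/total-requests/_stats
--------------------------------------------------

The response should include a `data_counts.processed_record_count` value, a
`model_size_stats.memory_status` value, and the job `state`, which correspond
to the **Processed records**, **Memory status**, and **Job state** columns.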
[float]
[[ml-gs-job1-datafeed]]
==== Managing {dfeeds-cap}

A {dfeed} can be started and stopped multiple times throughout its lifecycle.
If you want to retrieve more data from {es} and the {dfeed} is stopped, you
must restart it.

For example, if you did not use the full data when you created the job, you
can now process the remaining data by restarting the {dfeed}:

. In the **Machine Learning** / **Job Management** tab, click the following
button to start the {dfeed}: image:images/ml-start-feed.jpg["Start {dfeed}"]

. Choose a start time and end time. For example, click
**Continue from 2017-04-01 23:59:00** and select **2017-04-30** as the search
end time. Then click **Start**. The date picker defaults to the latest
timestamp of processed data. Be careful not to leave any gaps in the analysis,
otherwise you might miss anomalies.
+
--
[role="screenshot"]
image::images/ml-gs-job1-datafeed.jpg["Restarting a {dfeed}"]
--

The {dfeed} state changes to `started`, the job state changes to `opened`,
and the number of processed records increases as the new data is analyzed. The
latest timestamp information also increases.

TIP: If your data is being loaded continuously, you can continue running the
job in real time. For this, start your {dfeed} and select **No end time**.

If you want to stop the {dfeed} at this point, you can click the following
button: image:images/ml-stop-feed.jpg["Stop {dfeed}"]
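The same operations are available through the start and stop {dfeed} APIs. A
minimal sketch follows, assuming the {dfeed} is named
`datafeed-total-requests` (the `datafeed-<job ID>` naming convention that
{kib} typically applies):

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-total-requests/_start
{
  "start": "2017-04-01T23:59:00Z",
  "end": "2017-04-30T00:00:00Z"
}

POST _xpack/ml/datafeeds/datafeed-total-requests/_stop
--------------------------------------------------

As with the date picker, take care that consecutive `start` and `end` values
do not leave gaps, otherwise you might miss anomalies.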
Now that you have processed all the data, let's start exploring the job
results.

[[ml-gs-job1-analyze]]
=== Exploring Single Metric Job Results

The {xpackml} features analyze the input stream of data, model its behavior,
and perform analysis based on the detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.

Result records for each anomaly are stored in `.ml-anomalies-*` indices in
{es}. By default, the name of the index where {ml} results are stored is
labelled `shared`, which corresponds to the `.ml-anomalies-shared` index.

You can use the **Anomaly Explorer** or the **Single Metric Viewer** in {kib}
to view the analysis results.

Anomaly Explorer::
This view contains swim lanes showing the maximum anomaly score over time.
There is an overall swim lane that shows the overall score for the job, and
also swim lanes for each influencer. When you select a block in a swim lane,
the anomaly details are displayed alongside the original source data (where
applicable).

Single Metric Viewer::
This view contains a chart that represents the actual and expected values over
time. This is only available for jobs that analyze a single time series and
where `model_plot_config` is enabled. As in the **Anomaly Explorer**,
anomalous data points are shown in different colors depending on their score.

By default when you view the results for a single metric job, the
**Single Metric Viewer** opens:

[role="screenshot"]
image::images/ml-gs-job1-analysis.jpg["Single Metric Viewer for total-requests job"]

The blue line in the chart represents the actual data values. The shaded blue
area represents the bounds for the expected values. The area between the upper
and lower bounds covers the most likely values for the model. If a value is
outside of this area, it can be said to be anomalous.

If you slide the time selector from the beginning of the data to the end of
the data, you can see how the model improves as it processes more data. At the
beginning, the expected range of values is pretty broad and the model is not
capturing the periodicity in the data. But it quickly learns and begins to
reflect the daily variation.

Any data points outside the range that was predicted by the model are marked
as anomalies. When you have high volumes of real-life data, many anomalies
might be found. These vary in probability from very likely to highly unlikely,
that is to say, from not particularly anomalous to highly anomalous. There can
be none, one or two, tens, or sometimes hundreds of anomalies found within
each bucket. There can be many thousands found per job. In order to provide a
sensible view of the results, an _anomaly score_ is calculated for each bucket
time interval. The anomaly score is a value from 0 to 100, which indicates the
significance of the observed anomaly compared to previously seen anomalies.
The highly anomalous values are shown in red and the low scored values are
indicated in blue. An interval with a high anomaly score is significant and
requires investigation.

Slide the time selector to a section of the time series that contains a red
anomaly data point. If you hover over the point, you can see more information
about that data point. You can also see details in the **Anomalies** section
of the viewer. For example:

[role="screenshot"]
image::images/ml-gs-job1-anomalies.jpg["Single Metric Viewer Anomalies for total-requests job"]

For each anomaly you can see key details such as the time, the actual and
expected ("typical") values, and their probability.

By default, the table contains all anomalies that have a severity of "warning"
or higher in the selected section of the timeline. If you are only interested
in critical anomalies, for example, you can change the severity threshold for
this table.

The anomalies table also automatically calculates an interval for the data in
the table. If the time difference between the earliest and latest records in
the table is less than two days, the data is aggregated by hour to show the
details of the highest severity anomaly for each detector. Otherwise, it is
aggregated by day. You can change the interval for the table, for example, to
show all anomalies.
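You can retrieve the same anomaly records through the get records API. The
following sketch asks for only critical anomalies (severity scores of 75 or
higher), sorted with the highest scores first. The `record_score` parameter
name is taken from the API reference linked earlier, but it may differ in
older versions, so verify it against the reference for your release:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/total-requests/results/records
{
  "sort": "record_score",
  "desc": true,
  "record_score": 75
}
--------------------------------------------------

Each returned record should include the actual and typical values and the
probability, which is the same information shown in the anomalies table.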
You can see the same information in a different format by using the
**Anomaly Explorer**:

[role="screenshot"]
image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"]

Click one of the red sections in the swim lane to see details about the
anomalies that occurred in that time interval. For example:

[role="screenshot"]
image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"]

After you have identified anomalies, often the next step is to try to
determine the context of those situations. For example, are there other
factors that are contributing to the problem? Are the anomalies confined to
particular applications or servers? You can begin to troubleshoot these
situations by layering additional jobs or creating multi-metric jobs.