[[ml-gs-multi-jobs]]
=== Creating Multi-metric Jobs

The multi-metric job wizard in {kib} provides a simple way to create more
complex jobs with multiple detectors. For example, in the single metric job,
you were tracking total requests versus time. You might also want to track
other metrics like average response time or the maximum number of denied
requests. Instead of creating jobs for each of those metrics, you can combine
them in a multi-metric job.

You can also use multi-metric jobs to split a single time series into multiple
time series based on a categorical field. For example, you can split the data
based on its hostnames, locations, or users. Each time series is modeled
independently. By looking at temporal patterns on a per-entity basis, you might
spot things that would otherwise have been hidden in the lumped view.

Conceptually, you can think of this as running many independent single metric
jobs. By bundling them together in a multi-metric job, however, you can see an
overall score and shared influencers for all the metrics and all the entities
in the job. Multi-metric jobs therefore scale better than having many
independent single metric jobs and provide better results when you have
influencers that are shared across the detectors.

The sample data for this tutorial contains information about the requests that
are received by various applications and services in a system. Let's assume
that you want to monitor the requests received and the response time. In
particular, you might want to track those metrics on a per-service basis to see
if any services have unusual patterns.

To create a multi-metric job in {kib}:

. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.

. Click **Machine Learning** in the side navigation, then click
**Create new job**.

. Select the index pattern that you created for the sample data. For example,
`server-metrics*`.

. In the **Use a wizard** section, click **Multi metric**.

. Configure the job by providing the following job settings:
+
--
[role="screenshot"]
image::images/ml-gs-multi-job.jpg["Create a new job from the server-metrics index"]
--

.. For the **Fields**, select `high mean(response)` and `sum(total)`. This
creates two detectors and specifies the analysis function and field that each
detector uses. The first detector uses the high mean function to detect
unusually high average values for the `response` field in each bucket. The
second detector uses the sum function to detect when the sum of the `total`
field is anomalous in each bucket. For more information about any of the
analytical functions, see <>.

.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into. As was the case in the single
metric example, this value has a significant impact on the analysis. When
you're creating jobs for your own data, you might need to experiment with
different bucket spans depending on the frequency of the input data, the
duration of typical anomalies, and the frequency at which alerting is required.

.. For the **Split Data**, select `service`. When you specify this option, the
analysis is segmented such that you have completely independent baselines for
each distinct value of this field.
//TBD: What is the importance of having separate baselines?
There are seven unique service keyword values in the sample data. Thus for each
of the seven services, you will see the high mean response metrics and sum
total metrics.
+
--
NOTE: If you are creating a job by using the {ml} APIs or the advanced job
wizard in {kib}, you can accomplish this split by using the
`partition_field_name` property, as shown in the example request after this
procedure.
--

.. For the **Key Fields (Influencers)**, select `host`. Note that the `service`
field is also automatically selected because you used it to split the data.
These key fields are also known as _influencers_. When you identify a field as
an influencer, you are indicating that you think it contains information about
someone or something that influences or contributes to anomalies.
+
--
[TIP]
========================
Picking an influencer is strongly recommended for the following reasons:

* It allows you to more easily assign blame for the anomaly
* It simplifies and aggregates the results

The best influencer is the person or thing that you want to blame for the
anomaly. In many cases, users or client IP addresses make excellent
influencers. Influencers can be any field in your data; they do not need to be
fields that are specified in your detectors, though they often are.

As a best practice, do not pick too many influencers. For example, you
generally do not need more than three. If you pick many influencers, the
results can be overwhelming and there is a small overhead to the analysis.
========================
//TBD: Is this something you can determine later from looking at results and
//update your job with if necessary? Is it all post-processing or does it
//affect the ongoing modeling?
--

. Click **Use full server-metrics* data**. Two graphs are generated for each
`service` value, which represent the high mean `response` values and sum
`total` values over time. For example:
+
--
[role="screenshot"]
image::images/ml-gs-job2-split.jpg["Kibana charts for data split by service"]
--

. Provide a name for the job, for example `response_requests_by_app`. The job
name must be unique in your cluster. You can also optionally provide a
description of the job.

. Click **Create Job**.

When the job is created, you can choose to view the results, continue the job
in real time, and create a watch. In this tutorial, we will proceed to view the
results.

TIP: The `create_multi_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking here:
https://download.elastic.co/demos/machine_learning/gettingstarted/create_multi_metric.sh[create_multi_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
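
If you prefer to work with the {ml} APIs directly, the request below is a
minimal sketch of a job configuration that roughly corresponds to the wizard
settings in this procedure. The job ID, the `@timestamp` time field, and the
exact property values are illustrative assumptions; refer to the downloadable
script and the API reference above for authoritative examples.

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/response_requests_by_app
{
  "description": "Sketch: requests and response time, split by service",
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "high_mean",
        "field_name": "response",
        "partition_field_name": "service"
      },
      {
        "function": "sum",
        "field_name": "total",
        "partition_field_name": "service"
      }
    ],
    "influencers": [ "service", "host" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
--------------------------------------------------

In this sketch, the **Split Data** choice maps to the `partition_field_name`
property on each detector, and the **Key Fields (Influencers)** choices map to
the `influencers` array. A {dfeed} that reads from the `server-metrics*` index
pattern would then be created and started separately.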

[[ml-gs-job2-analyze]]
=== Exploring Multi-metric Job Results

The {xpackml} features analyze the input stream of data, model its behavior,
and perform analysis based on the two detectors you defined in your job. When
an event occurs outside of the model, that event is identified as an anomaly.

You can use the **Anomaly Explorer** in {kib} to view the analysis results:

[role="screenshot"]
image::images/ml-gs-job2-explorer.jpg["Job results in the Anomaly Explorer"]

You can explore the overall anomaly timeline, which shows the maximum anomaly
score for each section in the specified time period. You can change the time
period by using the time picker in the {kib} toolbar. Note that the sections in
this timeline do not necessarily correspond to the bucket span. If you change
the time period, the sections change size too. The smallest possible size for
these sections is a bucket. If you specify a large time period, the sections
can span many buckets.

On the left is a list of the top influencers for all of the detected anomalies
in that same time period. The list includes maximum anomaly scores, which in
this case are aggregated for each influencer, for each bucket, across all
detectors. There is also a total sum of the anomaly scores for each influencer.
You can use this list to help you narrow down the contributing factors and
focus on the most anomalous entities.

If your job contains influencers, you can also explore swim lanes that
correspond to the values of an influencer. In this example, the swim lanes
correspond to the values for the `service` field that you used to split the
data. Each lane represents a unique application or service name.

Since you specified the `host` field as an influencer, you can also optionally
view the results in swim lanes for each host name:

[role="screenshot"]
image::images/ml-gs-job2-explorer-host.jpg["Job results sorted by host"]

By default, the swim lanes are ordered by their maximum anomaly score values.
You can click on the sections in the swim lane to see details about the
anomalies that occurred in that time interval.

NOTE: The anomaly scores that you see in each section of the **Anomaly
Explorer** might differ slightly. This disparity occurs because for each job we
generate bucket results, influencer results, and record results. Anomaly scores
are generated for each type of result. The anomaly timeline uses the
bucket-level anomaly scores. The list of top influencers uses the
influencer-level anomaly scores. The list of anomalies uses the record-level
anomaly scores. For more information about these different result types, see
{ref}/ml-results-resource.html[Results Resources].

Click on a section in the swim lanes to obtain more information about the
anomalies in that time period. For example, click on the red section in the
swim lane for `server_2`:

[role="screenshot"]
image::images/ml-gs-job2-explorer-anomaly.jpg["Job results for an anomaly"]

You can see exact times when anomalies occurred and which detectors or metrics
caught the anomaly. Also note that because you split the data by the `service`
field, you see separate charts for each applicable service. In particular, you
see charts for each service for which there is data on the specified host in
the specified time interval.

Below the charts, there is a table that provides more information, such as the
typical and actual values and the influencers that contributed to the anomaly.

[role="screenshot"]
image::images/ml-gs-job2-explorer-table.jpg["Job results table"]

Notice that there are anomalies for both detectors, that is to say for both the
`high_mean(response)` and the `sum(total)` metrics in this time interval. The
table aggregates the anomalies to show the highest severity anomaly per
detector and entity, which is the by, over, or partition field value that is
displayed in the **found for** column. To view all the anomalies without any
aggregation, set the **Interval** to `Show all`.

By investigating multiple metrics in a single job, you might see relationships
between events in your data that would otherwise be overlooked.
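
If you want to look at these results outside of the **Anomaly Explorer**, the
same bucket, influencer, and record results can be retrieved with the {ml}
results APIs. The requests below are a minimal sketch that assumes the
`response_requests_by_app` job ID used in this tutorial; see
{ref}/ml-apis.html[Machine Learning APIs] for the full set of request options.

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/buckets

GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/influencers

GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/records
--------------------------------------------------

As described in the note above, the bucket results back the anomaly timeline,
the influencer results back the top influencers list and swim lanes, and the
record results back the anomalies table.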