diff --git a/docs/en/ml/getting-started-multi.asciidoc b/docs/en/ml/getting-started-multi.asciidoc
new file mode 100644
index 00000000000..a9d89efd07a
--- /dev/null
+++ b/docs/en/ml/getting-started-multi.asciidoc
@@ -0,0 +1,218 @@
+[[ml-gs-multi-jobs]]
+=== Creating Multi-metric Jobs
+
+The multi-metric job wizard in {kib} provides a simple way to create more
+complex jobs with multiple detectors. For example, in the single metric job, you
+were tracking total requests versus time. You might also want to track other
+metrics like average response time or the maximum number of denied requests.
+Instead of creating jobs for each of those metrics, you can combine them in a
+multi-metric job.
+
+You can also use multi-metric jobs to split a single time series into multiple
+time series based on a categorical field. For example, you can split the data
+based on its hostnames, locations, or users. Each time series is modeled
+independently. By looking at temporal patterns on a per-entity basis, you might
+spot things that would otherwise have been hidden in the lumped view.
+
+Conceptually, you can think of this as running many independent single metric
+jobs. By bundling them together in a multi-metric job, however, you can see an
+overall score and shared influencers for all the metrics and all the entities in
+the job. Multi-metric jobs therefore scale better than many independent single
+metric jobs and provide better results when you have influencers that are
+shared across the detectors.
+
+The sample data for this tutorial contains information about the requests that
+are received by various applications and services in a system. Let's assume that
+you want to monitor the requests received and the response time. In particular,
+you might want to track those metrics on a per-service basis to see if any
+services have unusual patterns.
+
+To create a multi-metric job in {kib}:
+
+. Open {kib} in your web browser and log in. If you are running {kib} locally,
+go to `http://localhost:5601/`.
+
+. Click **Machine Learning** in the side navigation, then click **Create new job**.
+
++
+--
+[role="screenshot"]
+image::images/ml-kibana.jpg[Job Management]
+--
+
+. Click **Create multi metric job**.
+
++
+--
+[role="screenshot"]
+image::images/ml-create-job2.jpg["Create a multi metric job"]
+--
+
+. Click the `server-metrics` index.
+
++
+--
+[role="screenshot"]
+image::images/ml-gs-index.jpg["Select an index"]
+--
+
+. Configure the job by providing the following job settings:
+
++
+--
+[role="screenshot"]
+image::images/ml-gs-multi-job.jpg["Create a new job from the server-metrics index"]
+--
+
+.. For the **Fields**, select `high mean(response)` and `sum(total)`. This
+creates two detectors and specifies the analysis function and field that each
+detector uses. The first detector uses the high mean function to detect
+unusually high average values for the `response` field in each bucket. The
+second detector uses the sum function to detect when the sum of the `total`
+field is anomalous in each bucket. For more information about any of the
+analytical functions, see <>.
+
+.. For the **Bucket span**, enter `10m`. This value specifies the size of the
+interval that the analysis is aggregated into. As was the case in the single
+metric example, this value has a significant impact on the analysis.
+When you're creating jobs for your own data, you might need to experiment with
+different bucket spans depending on the frequency of the input data, the
+duration of typical anomalies, and the frequency at which alerting is required.
+
+.. For the **Split Data**, select `service`. When you specify this
+option, the analysis is segmented such that you have completely independent
+baselines for each distinct value of this field.
+//TBD: What is the importance of having separate baselines?
+There are seven unique service keyword values in the sample data. Thus for each
+of the seven services, you will see the high mean response metrics and sum
+total metrics.
+
++
+--
+NOTE: If you are creating a job by using the {ml} APIs or the advanced job
+wizard in {kib}, you can accomplish this split by using the
+`partition_field_name` property.
+
+--
+
+.. For the **Key Fields**, select `host`. Note that the `service` field
+is also automatically selected because you used it to split the data. These key
+fields are also known as _influencers_.
+When you identify a field as an influencer, you are indicating that you think
+it contains information about someone or something that influences or
+contributes to anomalies.
++
+--
+[TIP]
+========================
+Picking an influencer is strongly recommended for the following reasons:
+
+* It allows you to more easily assign blame for the anomaly
+* It simplifies and aggregates the results
+
+The best influencer is the person or thing that you want to blame for the
+anomaly. In many cases, users or client IP addresses make excellent influencers.
+Influencers can be any field in your data; they do not need to be fields that
+are specified in your detectors, though they often are.
+
+As a best practice, do not pick too many influencers. For example, you generally
+do not need more than three. If you pick many influencers, the results can be
+overwhelming and there is a small overhead to the analysis.
+
+========================
+//TBD: Is this something you can determine later from looking at results and
+//update your job with if necessary? Is it all post-processing or does it affect
+//the ongoing modeling?
+--
+
+. Click **Use full server-metrics data**. Two graphs are generated for each
+`service` value, which represent the high mean `response` values and
+sum `total` values over time.
+//TBD: What is the use of the document count table?
+
+. Provide a name for the job, for example `response_requests_by_app`. The job
+name must be unique in your cluster. You can also optionally provide a
+description of the job.
+
+. Click **Create Job**. As the job is created, the graphs are updated to give a
+visual representation of the progress of {ml} as the data is processed. For
+example:
++
+--
+[role="screenshot"]
+image::images/ml-gs-job2-results.jpg["Job results updating as data is processed"]
+--
+
+TIP: The `create_multi_metric.sh` script creates a similar job and {dfeed} by
+using the {ml} APIs. You can download that script by clicking
+here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_multi_metric.sh[create_multi_metric.sh].
+A sketch of a comparable configuration is also shown below. For API reference
+information, see {ref}/ml-apis.html[Machine Learning APIs].
+
+[[ml-gs-job2-analyze]]
+=== Exploring Multi-metric Job Results
+
+The {xpackml} features analyze the input stream of data, model its behavior, and
+perform analysis based on the two detectors you defined in your job. When an
+event occurs outside of the model, that event is identified as an anomaly.
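+
+If you are curious how the job you just configured maps onto the {ml} APIs, the
+following is a minimal sketch of a comparable job configuration. The job name,
+the time field, and the version-specific syntax shown here are illustrative
+assumptions rather than the definitive configuration; the downloadable
+`create_multi_metric.sh` script is the authoritative version:
+
+[source,js]
+--------------------------------------------------
+// Sketch only: the job name, time_field, and exact syntax are assumptions
+PUT _xpack/ml/anomaly_detectors/response_requests_by_app_api
+{
+  "description": "Illustrative multi-metric job for the server-metrics sample data",
+  "analysis_config": {
+    "bucket_span": "10m",
+    "detectors": [
+      {
+        "function": "high_mean",
+        "field_name": "response",
+        "partition_field_name": "service"
+      },
+      {
+        "function": "sum",
+        "field_name": "total",
+        "partition_field_name": "service"
+      }
+    ],
+    "influencers": [ "service", "host" ]
+  },
+  "data_description": {
+    "time_field": "@timestamp"
+  }
+}
+--------------------------------------------------
+
+The `partition_field_name` property corresponds to the **Split Data** selection
+and the `influencers` array corresponds to the **Key Fields** selection in the
+wizard.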
+
+You can use the **Anomaly Explorer** in {kib} to view the analysis results:
+
+[role="screenshot"]
+image::images/ml-gs-job2-explorer.jpg["Job results in the Anomaly Explorer"]
+
+You can explore the overall anomaly timeline, which shows the maximum anomaly
+score for each section in the specified time period. You can change the time
+period by using the time picker in the {kib} toolbar. Note that the sections in
+this timeline do not necessarily correspond to the bucket span. If you change
+the time period, the sections change size too. The smallest possible size for
+these sections is a bucket. If you specify a large time period, the sections can
+span many buckets.
+
+On the left is a list of the top influencers for all of the detected anomalies
+in that same time period. The list includes maximum anomaly scores, which in
+this case are aggregated for each influencer, for each bucket, across all
+detectors. There is also a total sum of the anomaly scores for each influencer.
+You can use this list to help you narrow down the contributing factors and focus
+on the most anomalous entities.
+
+If your job contains influencers, you can also explore swim lanes that
+correspond to the values of an influencer. In this example, the swim lanes
+correspond to the values for the `service` field that you used to split the data.
+Each lane represents a unique application or service name. Since you specified
+the `host` field as an influencer, you can also optionally view the results in
+swim lanes for each host name:
+
+[role="screenshot"]
+image::images/ml-gs-job2-explorer-host.jpg["Job results sorted by host"]
+
+By default, the swim lanes are ordered by their maximum anomaly score values.
+You can click on the sections in the swim lane to see details about the
+anomalies that occurred in that time interval.
+
+NOTE: The anomaly scores that you see in each section of the **Anomaly Explorer**
+might differ slightly. This disparity occurs because for each job we generate
+bucket results, influencer results, and record results. Anomaly scores are
+generated for each type of result. The anomaly timeline uses the bucket-level
+anomaly scores. The list of top influencers uses the influencer-level anomaly
+scores. The list of anomalies uses the record-level anomaly scores. For more
+information about these different result types, see
+{ref}/ml-results-resource.html[Results Resources].
+
+Click on a section in the swim lanes to obtain more information about the
+anomalies in that time period. For example, click on the red section in the swim
+lane for `server_2`:
+
+[role="screenshot"]
+image::images/ml-gs-job2-explorer-anomaly.jpg["Job results for an anomaly"]
+
+You can see exact times when anomalies occurred and which detectors or metrics
+caught the anomaly. Also note that because you split the data by the `service`
+field, you see separate charts for each applicable service. In particular, you
+see charts for each service for which there is data on the specified host in the
+specified time interval.
+
+Below the charts, there is a table that provides more information, such as the
+typical and actual values and the influencers that contributed to the anomaly.
+
+[role="screenshot"]
+image::images/ml-gs-job2-explorer-table.jpg["Job results table"]
+
+Notice that there are anomalies for both detectors, that is, for both the
+`high_mean(response)` and the `sum(total)` metrics in this time interval. By
+investigating multiple metrics in a single job, you might see relationships
+between events in your data that would otherwise be overlooked.
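+
+If you prefer to work with the results programmatically, you can retrieve the
+same record-level results through the {ml} APIs. The following is a minimal
+sketch that assumes the job name `response_requests_by_app` used earlier;
+parameter and field names can differ slightly between versions:
+
+[source,js]
+--------------------------------------------------
+// Sketch only: the job name and parameters are assumptions based on this tutorial
+GET _xpack/ml/anomaly_detectors/response_requests_by_app/results/records
+{
+  "sort": "record_score",
+  "desc": true,
+  "page": { "size": 5 }
+}
+--------------------------------------------------
+
+Each record identifies the detector that flagged it and includes the actual and
+typical values and the influencers, which is the same information that you see
+in the anomalies table in the **Anomaly Explorer**.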
diff --git a/docs/en/ml/getting-started-next.asciidoc b/docs/en/ml/getting-started-next.asciidoc
new file mode 100644
index 00000000000..fde7fb1aebc
--- /dev/null
+++ b/docs/en/ml/getting-started-next.asciidoc
@@ -0,0 +1,55 @@
+[[ml-gs-next]]
+=== Next Steps
+
+By completing this tutorial, you've learned how to detect anomalous
+behavior in a simple set of sample data. You created single and multi-metric
+jobs in {kib}, which creates and opens jobs and creates and starts {dfeeds} for
+you under the covers. You examined the results of the {ml} analysis in the
+**Single Metric Viewer** and **Anomaly Explorer** in {kib}.
+
+If you want to learn about advanced job options, you might be interested in
+the following video tutorial:
+https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning Lab 3 - Detect Outliers in a Population].
+
+If you intend to use {ml} APIs in your applications, a good next step might be
+to learn about the APIs by retrieving information about these sample jobs.
+For example, the following requests retrieve information about the jobs and {dfeeds}:
+
+[source,js]
+--------------------------------------------------
+GET _xpack/ml/anomaly_detectors
+
+GET _xpack/ml/datafeeds
+--------------------------------------------------
+// CONSOLE
+
+For more information about the {ml} APIs, see <>.
+
+Ultimately, the next step is to start applying {ml} to your own data.
+As mentioned in <>, there are three things to consider when you're
+thinking about where {ml} will be most impactful:
+
+. It must be time series data.
+. It should be information that contains key performance indicators for the
+health, security, or success of your business or system. The better you know the
+data, the quicker you will be able to create jobs that generate useful
+insights.
+. Ideally, the data is located in {es} and you can therefore create a {dfeed}
+that retrieves data in real time. If your data is outside of {es}, you
+cannot use {kib} to create your jobs and you cannot use {dfeeds}. Machine
+learning analysis is still possible, however, by using APIs to create and manage
+jobs and to post data to them.
+
+Once you have decided which data to analyze, you can start considering which
+analysis functions you want to use. For more information, see <>.
+
+In general, it is a good idea to start with single metric jobs for your
+key performance indicators. After you examine these simple analysis results,
+you will have a better idea of what the influencers might be. You can create
+multi-metric jobs and split the data or create more complex analysis functions
+as necessary.
+//TODO: Add link to configuration section: For examples of
+//more complicated configuration options, see <<>>.
+
+If you encounter problems, we're here to help. See <> and
+<>.
diff --git a/docs/en/ml/getting-started.asciidoc b/docs/en/ml/getting-started.asciidoc
index f81655b3411..976bbdd3d0f 100644
--- a/docs/en/ml/getting-started.asciidoc
+++ b/docs/en/ml/getting-started.asciidoc
@@ -13,13 +13,14 @@ Ready to get some hands-on experience with the {xpackml} features?
This tutorial shows you how to: * Load a sample data set into {es} -* Create a {ml} job +* Create single and multi-metric {ml} jobs in {kib} * Use the results to identify possible anomalies in the data At the end of this tutorial, you should have a good idea of what {ml} is and will hopefully be inspired to use it to detect anomalies in your own data. -You might also be interested in these video tutorials: +You might also be interested in these video tutorials, which use the same sample +data: * https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job] * https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job] @@ -278,8 +279,8 @@ You should see output similar to the following: [source,shell] ---------------------------------- -health status index ... pri rep docs.count docs.deleted store.size ... -green open server-metrics ... 1 0 905940 0 120.5mb ... +health status index ... pri rep docs.count ... +green open server-metrics ... 1 0 905940 ... ---------------------------------- Next, you must define an index pattern for this data set: @@ -305,7 +306,7 @@ This data set can now be analyzed in {ml} jobs in {kib}. [[ml-gs-jobs]] -=== Creating Jobs +=== Creating Single Metric Jobs Machine learning jobs contain the configuration information and metadata necessary to perform an analytical task. They also contain the results of the @@ -322,7 +323,25 @@ web browser so that it does not block pop-up windows or create an exception for your Kibana URL. -- -To work with jobs in {kib}: +You can choose to create single metric, multi-metric, or advanced jobs in +{kib}. At this point in the tutorial, the goal is to detect anomalies in the +total requests received by your applications and services. The sample data +contains a single key performance indicator to track this, which is the total +requests over time. It is therefore logical to start by creating a single metric +job for this KPI. + +TIP: If you are using aggregated data, you can create an advanced job +and configure it to use a `summary_count_field`. The {ml} algorithms will +make the best possible use of summarized data in this case. For simplicity, in +this tutorial we will not make use of that advanced functionality. + +//TO-DO: Add link to aggregations.asciidoc: For more information, see <<>>. + +A single metric job contains a single _detector_. A detector defines the type of +analysis that will occur (for example, `max`, `average`, or `rare` analytical +functions) and the fields that will be analyzed. + +To create a single metric job in {kib}: . Open {kib} in your web browser and log in. If you are running {kib} locally, go to `http://localhost:5601/`. @@ -334,30 +353,7 @@ go to `http://localhost:5601/`. image::images/ml-kibana.jpg[Job Management] -- -You can choose to create single metric, multi-metric, or advanced jobs in -{kib}. In this tutorial, the goal is to detect anomalies in the total requests -received by your applications and services. The sample data contains a single -key performance indicator to track this, which is the total requests over time. -It is therefore logical to start by creating a single metric job for this KPI. - -TIP: If you are using aggregated data, you can create an advanced job -and configure it to use a `summary_count_field`. The {ml} algorithms will -make the best possible use of summarized data in this case. 
For simplicity in this tutorial -we will not make use of that advanced functionality. - - -[float] -[[ml-gs-job1-create]] -==== Creating a Single Metric Job - -A single metric job contains a single _detector_. A detector defines the type of -analysis that will occur (for example, `max`, `average`, or `rare` analytical -functions) and the fields that will be analyzed. - -To create a single metric job in {kib}: - -. Click **Machine Learning** in the side navigation, -then click **Create new job**. +. Click **Create new job**. . Click **Create single metric job**. + + @@ -568,9 +564,8 @@ button: image:images/ml-stop-feed.jpg["Stop {dfeed}"] Now that you have processed all the data, let's start exploring the job results. - -[[ml-gs-jobresults]] -=== Exploring Job Results +[[ml-gs-job1-analyze]] +=== Exploring Single Metric Job Results The {xpackml} features analyze the input stream of data, model its behavior, and perform analysis based on the detectors you defined in your job. When an @@ -589,11 +584,6 @@ Anomaly Explorer:: also swim lanes for each influencer. By selecting a block in a swim lane, the anomaly details are displayed alongside the original source data (where applicable). -//TBD: Are they swimlane blocks, tiles, segments or cards? hmmm -//TBD: Do the time periods in the heat map correspond to buckets? hmmm is it a heat map? -//As time is the x-axis, and the block sizes stay the same, it feels more intuitive call it a swimlane. -//The swimlane bucket intervals depends on the time range selected. Their smallest possible -//granularity is a bucket, but if you have a big time range selected, then they will span many buckets Single Metric Viewer:: This view contains a chart that represents the actual and expected values over @@ -601,10 +591,6 @@ Single Metric Viewer:: where `model_plot_config` is enabled. As in the **Anomaly Explorer**, anomalous data points are shown in different colors depending on their score. -[float] -[[ml-gs-job1-analyze]] -==== Exploring Single Metric Job Results - By default when you view the results for a single metric job, the **Single Metric Viewer** opens: [role="screenshot"] @@ -652,7 +638,7 @@ You can see the same information in a different format by using the image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"] -Click one of the red blocks in the swim lane to see details about the anomalies +Click one of the red sections in the swim lane to see details about the anomalies that occurred in that time interval. For example: [role="screenshot"] image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"] @@ -663,71 +649,5 @@ contributing to the problem? Are the anomalies confined to particular applications or servers? You can begin to troubleshoot these situations by layering additional jobs or creating multi-metric jobs. -//// -The troubleshooting job would not create alarms of its own, but rather would -help explain the overall situation. It's usually a different job because it's -operating on different indices. Layering jobs is an important concept. -//// -//// -[float] -[[ml-gs-job2-create]] -==== Creating a Multi-Metric Job - -TBD. - -* Walk through creation of a simple multi-metric job. -* Provide overview of: -** partition fields, -** influencers -*** An influencer is someone or something that has influenced or contributed to the anomaly. -Results are aggregated for each influencer, for each bucket, across all detectors. 
-In this way, a combined anomaly score is calculated for each influencer, -which determines its relative anomalousness. You can specify one or many influencers. -Picking an influencer is strongly recommended for the following reasons: -**** It allow you to blame someone/something for the anomaly -**** It simplifies and aggregates results -*** The best influencer is the person or thing that you want to blame for the anomaly. -In many cases, users or client IP make excellent influencers. -*** By/over/partition fields are usually good candidates for influencers. -*** Influencers can be any field in the source data; they do not need to be fields -specified in detectors, although they often are. -** by/over fields, -*** detectors -**** You can have more than one detector in a job which is more efficient than -running multiple jobs against the same data stream. - -//http://www.prelert.com/docs/behavioral_analytics/latest/concepts/multivariate.html - -[float] -[[ml-gs-job2-analyze]] -===== Viewing Multi-Metric Job Results - -TBD. - -* Walk through exploration of job results. -* Describe how influencer detection accelerates root cause identification. - -//// -//// -* Provide brief overview of statistical models and/or link to more info. -* Possibly discuss effect of altering bucket span. - -The anomaly score is a sophisticated aggregation of the anomaly records in the -bucket. The calculation is optimized for high throughput, gracefully ages -historical data, and reduces the signal to noise levels. It adjusts for -variations in event rate, takes into account the frequency and the level of -anomalous activity and is adjusted relative to past anomalous behavior. -In addition, [the anomaly score] is boosted if anomalous activity occurs for related entities, -for example if disk IO and CPU are both behaving unusually for a given host. -** Once an anomalous time interval has been identified, it can be expanded to -view the detailed anomaly records which are the significant causal factors. -//// -//// -[[ml-gs-alerts]] -=== Creating Alerts for Job Results - -TBD. - -* Walk through creation of simple alert for anomalous data? 
- -//// +include::getting-started-multi.asciidoc[] +include::getting-started-next.asciidoc[] diff --git a/docs/en/ml/images/ml-create-job2.jpg b/docs/en/ml/images/ml-create-job2.jpg new file mode 100644 index 00000000000..a927d04cd3e Binary files /dev/null and b/docs/en/ml/images/ml-create-job2.jpg differ diff --git a/docs/en/ml/images/ml-gs-job2-explorer-anomaly.jpg b/docs/en/ml/images/ml-gs-job2-explorer-anomaly.jpg new file mode 100644 index 00000000000..9b042b96c6b Binary files /dev/null and b/docs/en/ml/images/ml-gs-job2-explorer-anomaly.jpg differ diff --git a/docs/en/ml/images/ml-gs-job2-explorer-host.jpg b/docs/en/ml/images/ml-gs-job2-explorer-host.jpg new file mode 100644 index 00000000000..de9df812d37 Binary files /dev/null and b/docs/en/ml/images/ml-gs-job2-explorer-host.jpg differ diff --git a/docs/en/ml/images/ml-gs-job2-explorer-table.jpg b/docs/en/ml/images/ml-gs-job2-explorer-table.jpg new file mode 100644 index 00000000000..121c10b44b9 Binary files /dev/null and b/docs/en/ml/images/ml-gs-job2-explorer-table.jpg differ diff --git a/docs/en/ml/images/ml-gs-job2-explorer.jpg b/docs/en/ml/images/ml-gs-job2-explorer.jpg new file mode 100644 index 00000000000..a94cbb0f646 Binary files /dev/null and b/docs/en/ml/images/ml-gs-job2-explorer.jpg differ diff --git a/docs/en/ml/images/ml-gs-job2-results.jpg b/docs/en/ml/images/ml-gs-job2-results.jpg new file mode 100644 index 00000000000..c68e3c125ae Binary files /dev/null and b/docs/en/ml/images/ml-gs-job2-results.jpg differ diff --git a/docs/en/ml/images/ml-gs-multi-job.jpg b/docs/en/ml/images/ml-gs-multi-job.jpg new file mode 100644 index 00000000000..57c8094ba8e Binary files /dev/null and b/docs/en/ml/images/ml-gs-multi-job.jpg differ