[DOCS] Subdivided getting started with ML pages (elastic/x-pack-elasticsearch#3167)
* [DOCS] Subdivided getting started with ML pages
* [DOCS] Added new getting started page to build.gradle

Original commit: elastic/x-pack-elasticsearch@968187b048
This commit is contained in: parent 11ab50d9dc, commit 90a1da82ee

docs/build.gradle
@@ -9,7 +9,7 @@ apply plugin: 'elasticsearch.docs-test'
 * only remove entries from this list. When it is empty we'll remove it
 * entirely and have a party! There will be cake and everything.... */
buildRestTests.expectedUnconvertedCandidates = [
    'en/ml/getting-started.asciidoc',
    'en/ml/getting-started-data.asciidoc',
    'en/ml/functions/count.asciidoc',
    'en/ml/functions/geo.asciidoc',
    'en/ml/functions/info.asciidoc',

docs/en/ml/getting-started-data.asciidoc (new file, 220 lines)
@@ -0,0 +1,220 @@
[[ml-gs-data]]
=== Identifying Data for Analysis

For the purposes of this tutorial, we provide sample data that you can play with
and search in {es}. When you consider your own data, however, it's important to
take a moment and think about where the {xpackml} features will be most
impactful.

The first consideration is that the data must be a time series. The {ml}
features are designed to model and detect anomalies in time series data.

The second consideration, especially when you are first learning to use {ml},
is the importance of the data and how familiar you are with it. Ideally, it is
information that contains key performance indicators (KPIs) for the health,
security, or success of your business or system. It is information that you need
to monitor and act on when anomalous behavior occurs. You might even have {kib}
dashboards that you're already using to watch this data. The better you know the
data, the quicker you will be able to create {ml} jobs that generate useful
insights.

The final consideration is where the data is located. This tutorial assumes that
your data is stored in {es}. It guides you through the steps required to create
a _{dfeed}_ that passes data to a job. If your own data is outside of {es},
you can still analyze it by using the post data API.

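For example, the following minimal sketch sends a batch of newline-delimited
JSON records to a hypothetical job named `my_job` through the post data API;
the job must already exist and be open, and the file name is illustrative:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X POST -H 'Content-Type: application/json'
http://localhost:9200/_xpack/ml/anomaly_detectors/my_job/_data --data-binary "@my_data.json"
----------------------------------
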
IMPORTANT: If you want to create {ml} jobs in {kib}, you must use {dfeeds}.
That is to say, you must store your input data in {es}. When you create
a job, you select an existing index pattern and {kib} configures the {dfeed}
for you under the covers.

[float]
[[ml-gs-sampledata]]
==== Obtaining a Sample Data Set

In this step, we will upload some sample data to {es}. This is standard
{es} functionality, and it is needed to set the stage for using {ml}.

The sample data for this tutorial contains information about the requests that
are received by various applications and services in a system. A system
administrator might use this type of information to track the total number of
requests across all of the infrastructure. If the number of requests increases
or decreases unexpectedly, for example, this might be an indication that there
is a problem or that resources need to be redistributed. By using the {xpack}
{ml} features to model the behavior of this data, it is easier to identify
anomalies and take appropriate action.

Download the sample data by clicking here:
https://download.elastic.co/demos/machine_learning/gettingstarted/server_metrics.tar.gz[server_metrics.tar.gz]

Use the following commands to extract the files:

[source,shell]
----------------------------------
tar -zxvf server_metrics.tar.gz
----------------------------------

Each document in the server-metrics data set has the following schema:

[source,js]
----------------------------------
{
  "index":
  {
    "_index":"server-metrics",
    "_type":"metric",
    "_id":"1177"
  }
}
{
  "@timestamp":"2017-03-23T13:00:00",
  "accept":36320,
  "deny":4156,
  "host":"server_2",
  "response":2.4558210155,
  "service":"app_3",
  "total":40476
}
----------------------------------

TIP: The sample data sets include summarized data. For example, the `total`
value is a sum of the requests that were received by a specific service at a
particular time. If your data is stored in {es}, you can generate
this type of sum or average by using aggregations. One of the benefits of
summarizing data this way is that {es} automatically distributes
these calculations across your cluster. You can then feed this summarized data
into {xpackml} instead of raw results, which reduces the volume
of data that must be considered while detecting anomalies. For the purposes of
this tutorial, however, these summary values are stored in {es}. For more
information, see <<ml-configuring-aggregation>>.

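For instance, here is a minimal sketch of how such a summary could be generated
with a `date_histogram` aggregation and a nested `sum` (the aggregation names
are illustrative):

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X GET -H 'Content-Type: application/json'
'http://localhost:9200/server-metrics/_search?size=0' -d '{
  "aggs":{
    "requests_over_time":{
      "date_histogram":{"field":"@timestamp","interval":"10m"},
      "aggs":{
        "total_requests":{"sum":{"field":"total"}}
      }
    }
  }
}'
----------------------------------
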
Before you load the data set, you need to set up {ref}/mapping.html[_mappings_]
for the fields. Mappings divide the documents in the index into logical groups
and specify a field's characteristics, such as the field's searchability or
whether or not it's _tokenized_, or broken up into separate words.

The sample data includes an `upload_server-metrics.sh` script, which you can use
to create the mappings and load the data set. You can download it by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/upload_server-metrics.sh[upload_server-metrics.sh]
Before you run it, however, you must edit the USERNAME and PASSWORD variables
with your actual user ID and password.

The script runs a command similar to the following example, which sets up a
mapping for the data set:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json'
http://localhost:9200/server-metrics -d '{
  "settings":{
    "number_of_shards":1,
    "number_of_replicas":0
  },
  "mappings":{
    "metric":{
      "properties":{
        "@timestamp":{ "type":"date" },
        "accept":{ "type":"long" },
        "deny":{ "type":"long" },
        "host":{ "type":"keyword" },
        "response":{ "type":"float" },
        "service":{ "type":"keyword" },
        "total":{ "type":"long" }
      }
    }
  }
}'
----------------------------------

NOTE: If you run this command, you must replace `x-pack-test-password` with your
actual password.

////
This mapping specifies the following qualities for the data set:

* The _@timestamp_ field is a date.
//that uses the ISO format `epoch_second`,
//which is the number of seconds since the epoch.
* The _accept_, _deny_, and _total_ fields are long numbers.
* The _host
////

You can then use the {es} `bulk` API to load the data set. The
`upload_server-metrics.sh` script runs commands similar to the following
example, which loads the four JSON files:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_1.json"

curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_2.json"

curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_3.json"

curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json"
http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_4.json"
----------------------------------

TIP: These commands upload about 200 MB of data in total. The data is split
into four files because the `_bulk` API has a maximum request size limit of
100 MB.

These commands might take some time to run, depending on the computing resources
available.

You can verify that the data was loaded successfully with the following command:

[source,shell]
----------------------------------
curl 'http://localhost:9200/_cat/indices?v' -u elastic:x-pack-test-password
----------------------------------

You should see output similar to the following:

[source,shell]
----------------------------------
health status index          ... pri rep docs.count ...
green  open   server-metrics ... 1   0   905940     ...
----------------------------------

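Alternatively, a quick document count (a minimal sketch, assuming the same
credentials) confirms that all 905,940 documents arrived:

[source,shell]
----------------------------------
curl 'http://localhost:9200/server-metrics/_count' -u elastic:x-pack-test-password
----------------------------------
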
Next, you must define an index pattern for this data set:

. Open {kib} in your web browser and log in. If you are running {kib}
locally, go to `http://localhost:5601/`.

. Click the **Management** tab, then **Index Patterns**.

. If you already have index patterns, click the plus sign (+) to define a new
one. Otherwise, the **Configure an index pattern** wizard is already open.

. For this tutorial, any pattern that matches the name of the index you've
loaded will work. For example, enter `server-metrics*` as the index pattern.

. Verify that the **Index contains time-based events** option is checked.

. Select the `@timestamp` field from the **Time-field name** list.

. Click **Create**.

This data set can now be analyzed in {ml} jobs in {kib}.

docs/en/ml/getting-started-single.asciidoc (new file, 354 lines)
@@ -0,0 +1,354 @@
[[ml-gs-jobs]]
=== Creating Single Metric Jobs

Machine learning jobs contain the configuration information and metadata
necessary to perform an analytical task. They also contain the results of the
analytical task.

[NOTE]
--
This tutorial uses {kib} to create jobs and view results, but you can
alternatively use APIs to accomplish most tasks.
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].

The {xpackml} features in {kib} use pop-ups. You must configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.
--

You can choose to create single metric, multi-metric, or advanced jobs in
{kib}. At this point in the tutorial, the goal is to detect anomalies in the
total requests received by your applications and services. The sample data
contains a single key performance indicator to track this, which is the total
requests over time. It is therefore logical to start by creating a single metric
job for this KPI.

TIP: If you are using aggregated data, you can create an advanced job
and configure it to use a `summary_count_field_name`. The {ml} algorithms will
make the best possible use of summarized data in this case. For simplicity, in
this tutorial we will not make use of that advanced functionality. For more
information, see <<ml-configuring-aggregation>>.

A single metric job contains a single _detector_. A detector defines the type of
analysis that will occur (for example, `max`, `average`, or `rare` analytical
functions) and the fields that will be analyzed.

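In the {ml} APIs, a detector is expressed as a small JSON object. As a sketch,
the detector that the following steps configure (a `sum` function applied to
the `total` field) would look like this:

[source,js]
----------------------------------
{ "function": "sum", "field_name": "total" }
----------------------------------
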
To create a single metric job in {kib}:

. Open {kib} in your web browser and log in. If you are running {kib} locally,
go to `http://localhost:5601/`.

. Click **Machine Learning** in the side navigation: +
+
--
[role="screenshot"]
image::images/ml-kibana.jpg[Job Management]
--

. Click **Create new job**.

. Click **Create single metric job**. +
+
--
[role="screenshot"]
image::images/ml-create-jobs.jpg["Create a new job"]
--

. Click the `server-metrics` index. +
+
--
[role="screenshot"]
image::images/ml-gs-index.jpg["Select an index"]
--

. Configure the job by providing the following information: +
+
--
[role="screenshot"]
image::images/ml-gs-single-job.jpg["Create a new job from the server-metrics index"]
--

.. For the **Aggregation**, select `Sum`. This value specifies the analysis
function that is used.
+
--
Some of the analytical functions look for single anomalous data points. For
example, `max` identifies the maximum value that is seen within a bucket.
Others perform some aggregation over the length of the bucket. For example,
`mean` calculates the mean of all the data points seen within the bucket.
Similarly, `count` calculates the total number of data points within the bucket.
In this tutorial, you are using the `sum` function, which calculates the sum of
the specified field's values within the bucket.
--

.. For the **Field**, select `total`. This value specifies the field that
the detector uses in the function.
+
--
NOTE: Some functions such as `count` and `rare` do not require fields.
--

.. For the **Bucket span**, enter `10m`. This value specifies the size of the
interval that the analysis is aggregated into.
+
--
The {xpackml} features use the concept of a bucket to divide up the time series
into batches for processing. For example, if you are monitoring
the total number of requests in the system,
//and receive a data point every 10 minutes
using a bucket span of 1 hour would mean that at the end of each hour, it
calculates the sum of the requests for the last hour and computes the
anomalousness of that value compared to previous hours.

The bucket span has two purposes: it dictates over what time span to look for
anomalous features in data, and it determines how quickly anomalies can be
detected. Choosing a shorter bucket span enables anomalies to be detected more
quickly. However, there is a risk of being too sensitive to natural variations
or noise in the input data. Choosing too long a bucket span can mean that
interesting anomalies are averaged away. There is also the possibility that the
aggregation might smooth out some anomalies based on when the bucket starts
in time.

The bucket span has a significant impact on the analysis. When you're trying to
determine what value to use, take into account the granularity at which you
want to perform the analysis, the frequency of the input data, the duration of
typical anomalies, and the frequency at which alerting is required.
--

. Determine whether you want to process all of the data or only part of it. If
you want to analyze all of the existing data, click
**Use full server-metrics* data**. If you want to see what happens when you
stop and start {dfeeds} and process additional data over time, click the time
picker in the {kib} toolbar. Since the sample data spans the period
between March 23, 2017 and April 22, 2017, click **Absolute**. Set the start
time to March 23, 2017 and the end time to April 1, 2017, for example. Once
you've set up the time range, click the **Go** button. +
+
--
[role="screenshot"]
image::images/ml-gs-job1-time.jpg["Setting the time range for the {dfeed}"]
--
+
--
A graph is generated, which represents the total number of requests over time.
--

. Provide a name for the job, for example `total-requests`. The job name must
be unique in your cluster. You can also optionally provide a description of the
job.

. Click **Create Job**. +
+
--
[role="screenshot"]
image::images/ml-gs-job1.jpg["A graph of the total number of requests over time"]
--

As the job is created, the graph is updated to give a visual representation of
the progress of {ml} as the data is processed. This view is only available while
the job is running.

TIP: The `create_single_metric.sh` script creates a similar job and {dfeed} by
using the {ml} APIs. You can download that script by clicking
here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_single_metric.sh[create_single_metric.sh]
For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].

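As a rough sketch of what such a script does (the request bodies here are
illustrative rather than a copy of the script), it creates the job and its
{dfeed} with two API calls:

[source,shell]
----------------------------------
# Create the job with a sum detector on the "total" field.
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json'
http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests -d '{
  "analysis_config":{
    "bucket_span":"10m",
    "detectors":[{"function":"sum","field_name":"total"}]
  },
  "data_description":{"time_field":"@timestamp"}
}'

# Create a datafeed that reads from the server-metrics index.
curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json'
http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests -d '{
  "job_id":"total-requests",
  "indexes":["server-metrics"]
}'
----------------------------------
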
[[ml-gs-job1-manage]]
=== Managing Jobs

After you create a job, you can see its status in the **Job Management** tab: +

[role="screenshot"]
image::images/ml-gs-job1-manage1.jpg["Status information for the total-requests job"]

The following information is provided for each job:

Job ID::
The unique identifier for the job.

Description::
The optional description of the job.

Processed records::
The number of records that have been processed by the job.

Memory status::
The status of the mathematical models. When you create jobs by using the APIs or
by using the advanced options in {kib}, you can specify a `model_memory_limit`.
That value is the maximum amount of memory resources that the mathematical
models can use. Once that limit is approached, data pruning becomes more
aggressive. Upon exceeding that limit, new entities are not modeled. For more
information about this setting, see
{ref}/ml-job-resource.html#ml-apilimits[Analysis Limits]. The memory status
field reflects whether you have reached or exceeded the model memory limit. It
can have one of the following values: +
`ok`::: The models stayed below the configured value.
`soft_limit`::: The models used more than 60% of the configured memory limit
and older unused models will be pruned to free up space.
`hard_limit`::: The models used more space than the configured memory limit.
As a result, not all incoming data was processed.

Job state::
The status of the job, which can be one of the following values: +
`open`::: The job is available to receive and process data.
`closed`::: The job finished successfully with its model state persisted.
The job must be opened before it can accept further data.
`closing`::: The job close action is in progress and has not yet completed.
A closing job cannot accept further data.
`failed`::: The job did not finish successfully due to an error.
This situation can occur due to invalid input data.
If the job has irrevocably failed, it must be force closed and then deleted.
If the {dfeed} can be corrected, the job can be closed and then re-opened.

{dfeed-cap} state::
The status of the {dfeed}, which can be one of the following values: +
started::: The {dfeed} is actively receiving data.
stopped::: The {dfeed} is stopped and will not receive data until it is
restarted.

Latest timestamp::
The timestamp of the last processed record.

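If you prefer the APIs, the same status information is available from the get
job statistics endpoint; a minimal sketch:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X GET
'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests/_stats'
----------------------------------
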
If you click the arrow beside the name of a job, you can show or hide additional
information, such as the settings, configuration information, or messages for
the job.

You can also click one of the **Actions** buttons to start the {dfeed}, edit
the job or {dfeed}, and clone or delete the job, for example.

[float]
[[ml-gs-job1-datafeed]]
==== Managing {dfeeds-cap}

A {dfeed} can be started and stopped multiple times throughout its lifecycle.
If you want to retrieve more data from {es} and the {dfeed} is stopped, you must
restart it.

For example, if you did not use the full data when you created the job, you can
now process the remaining data by restarting the {dfeed}:

. In the **Machine Learning** / **Job Management** tab, click the following
button to start the {dfeed}: image:images/ml-start-feed.jpg["Start {dfeed}"]

. Choose a start time and end time. For example,
click **Continue from 2017-04-01 23:59:00** and select **2017-04-30** as the
search end time. Then click **Start**. The date picker defaults to the latest
timestamp of processed data. Be careful not to leave any gaps in the analysis,
otherwise you might miss anomalies. +
+
--
[role="screenshot"]
image::images/ml-gs-job1-datafeed.jpg["Restarting a {dfeed}"]
--

The {dfeed} state changes to `started`, the job state changes to `opened`,
and the number of processed records increases as the new data is analyzed. The
latest timestamp information also increases. For example:

[role="screenshot"]
image::images/ml-gs-job1-manage2.jpg["Job opened and {dfeed} started"]

TIP: If your data is being loaded continuously, you can continue running the job
in real time. For this, start your {dfeed} and select **No end time**.

If you want to stop the {dfeed} at this point, you can click the following
button: image:images/ml-stop-feed.jpg["Stop {dfeed}"]

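The equivalent operations are also available through the {dfeed} APIs; for
instance, a sketch of starting and stopping this tutorial's {dfeed} by name
(the time values are illustrative):

[source,shell]
----------------------------------
# Start the datafeed for a fixed window of data.
curl -u elastic:x-pack-test-password -X POST
'http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests/_start?start=2017-04-01T23:59:00Z&end=2017-04-30T00:00:00Z'

# Stop it again.
curl -u elastic:x-pack-test-password -X POST
'http://localhost:9200/_xpack/ml/datafeeds/datafeed-total-requests/_stop'
----------------------------------
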
Now that you have processed all the data, let's start exploring the job results.

[[ml-gs-job1-analyze]]
=== Exploring Single Metric Job Results

The {xpackml} features analyze the input stream of data, model its behavior,
and perform analysis based on the detectors you defined in your job. When an
event occurs outside of the model, that event is identified as an anomaly.

Result records for each anomaly are stored in `.ml-anomalies-*` indices in {es}.
By default, the name of the index where {ml} results are stored is labelled
`shared`, which corresponds to the `.ml-anomalies-shared` index.

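You can also retrieve these results through the {ml} APIs; for example, a
minimal sketch that fetches the bucket-level results for this job:

[source,shell]
----------------------------------
curl -u elastic:x-pack-test-password -X GET
'http://localhost:9200/_xpack/ml/anomaly_detectors/total-requests/results/buckets'
----------------------------------
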
You can use the **Anomaly Explorer** or the **Single Metric Viewer** in {kib} to
view the analysis results.

Anomaly Explorer::
This view contains swim lanes showing the maximum anomaly score over time.
There is an overall swim lane that shows the overall score for the job, and
also swim lanes for each influencer. By selecting a block in a swim lane, the
anomaly details are displayed alongside the original source data (where
applicable).

Single Metric Viewer::
This view contains a chart that represents the actual and expected values over
time. It is only available for jobs that analyze a single time series and
have `model_plot_config` enabled. As in the **Anomaly Explorer**, anomalous
data points are shown in different colors depending on their score.

By default, when you view the results for a single metric job, the
**Single Metric Viewer** opens:

[role="screenshot"]
image::images/ml-gs-job1-analysis.jpg["Single Metric Viewer for total-requests job"]

The blue line in the chart represents the actual data values. The shaded blue
area represents the bounds for the expected values. The area between the upper
and lower bounds covers the most likely values for the model. If a value is
outside of this area, it can be said to be anomalous.

If you slide the time selector from the beginning of the data to the end of the
data, you can see how the model improves as it processes more data. At the
beginning, the expected range of values is quite broad and the model is not
capturing the periodicity in the data. But it quickly learns and begins to
reflect the daily variation.

Any data points outside the range that was predicted by the model are marked
as anomalies. When you have high volumes of real-life data, many anomalies
might be found. These vary in probability from very likely to highly unlikely,
that is to say, from not particularly anomalous to highly anomalous. There
can be none, one or two, tens, or sometimes hundreds of anomalies found within
each bucket. There can be many thousands found per job. In order to provide
a sensible view of the results, an _anomaly score_ is calculated for each bucket
time interval. The anomaly score is a value from 0 to 100, which indicates
the significance of the observed anomaly compared to previously seen anomalies.
Highly anomalous values are shown in red and low scored values are
indicated in blue. An interval with a high anomaly score is significant and
requires investigation.

Slide the time selector to a section of the time series that contains a red
anomaly data point. If you hover over the point, you can see more information
about that data point. You can also see details in the **Anomalies** section
of the viewer. For example:

[role="screenshot"]
image::images/ml-gs-job1-anomalies.jpg["Single Metric Viewer Anomalies for total-requests job"]

For each anomaly, you can see key details such as the time, the actual and
expected ("typical") values, and their probability.

By default, the table contains all anomalies that have a severity of "warning"
or higher in the selected section of the timeline. If you are only interested in
critical anomalies, for example, you can change the severity threshold for this
table.

The anomalies table also automatically calculates an interval for the data in
the table. If the time difference between the earliest and latest records in the
table is less than two days, the data is aggregated by hour to show the details
of the highest severity anomaly for each detector. Otherwise, it is
aggregated by day. You can change the interval for the table, for example, to
show all anomalies.

You can see the same information in a different format by using the
**Anomaly Explorer**:

[role="screenshot"]
image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"]

Click one of the red sections in the swim lane to see details about the anomalies
that occurred in that time interval. For example:

[role="screenshot"]
image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"]

After you have identified anomalies, often the next step is to try to determine
the context of those situations. For example, are there other factors that are
contributing to the problem? Are the anomalies confined to particular
applications or servers? You can begin to troubleshoot these situations by
layering additional jobs or creating multi-metric jobs.

docs/en/ml/getting-started.asciidoc
@@ -79,583 +79,7 @@ significant changes to the system. You can alternatively assign the

For more information, see <<built-in-roles>>.

include::getting-started-data.asciidoc[]
include::getting-started-single.asciidoc[]
include::getting-started-multi.asciidoc[]
include::getting-started-next.asciidoc[]