diff --git a/x-pack/docs/en/ml/getting-started-data.asciidoc b/x-pack/docs/en/ml/getting-started-data.asciidoc deleted file mode 100644 index 6a0c6bbecc8..00000000000 --- a/x-pack/docs/en/ml/getting-started-data.asciidoc +++ /dev/null @@ -1,210 +0,0 @@ -[[ml-gs-data]] -=== Identifying Data for Analysis - -For the purposes of this tutorial, we provide sample data that you can play with -and search in {es}. When you consider your own data, however, it's important to -take a moment and think about where the {xpackml} features will be most -impactful. - -The first consideration is that it must be time series data. The {ml} features -are designed to model and detect anomalies in time series data. - -The second consideration, especially when you are first learning to use {ml}, -is the importance of the data and how familiar you are with it. Ideally, it is -information that contains key performance indicators (KPIs) for the health, -security, or success of your business or system. It is information that you need -to monitor and act on when anomalous behavior occurs. You might even have {kib} -dashboards that you're already using to watch this data. The better you know the -data, the quicker you will be able to create {ml} jobs that generate useful -insights. - -The final consideration is where the data is located. This tutorial assumes that -your data is stored in {es}. It guides you through the steps required to create -a _{dfeed}_ that passes data to a job. If your own data is outside of {es}, -analysis is still possible by using a post data API. - -IMPORTANT: If you want to create {ml} jobs in {kib}, you must use {dfeeds}. -That is to say, you must store your input data in {es}. When you create -a job, you select an existing index pattern and {kib} configures the {dfeed} -for you under the covers. - - -[float] -[[ml-gs-sampledata]] -==== Obtaining a Sample Data Set - -In this step we will upload some sample data to {es}. This is standard -{es} functionality, and is needed to set the stage for using {ml}. - -The sample data for this tutorial contains information about the requests that -are received by various applications and services in a system. A system -administrator might use this type of information to track the total number of -requests across all of the infrastructure. If the number of requests increases -or decreases unexpectedly, for example, this might be an indication that there -is a problem or that resources need to be redistributed. By using the {xpack} -{ml} features to model the behavior of this data, it is easier to identify -anomalies and take appropriate action. - -Download this sample data by clicking here: -https://download.elastic.co/demos/machine_learning/gettingstarted/server_metrics.tar.gz[server_metrics.tar.gz] - -Use the following commands to extract the files: - -[source,sh] ----------------------------------- -tar -zxvf server_metrics.tar.gz ----------------------------------- - -Each document in the server-metrics data set has the following schema: - -[source,js] ----------------------------------- -{ - "index": - { - "_index":"server-metrics", - "_type":"metric", - "_id":"1177" - } -} -{ - "@timestamp":"2017-03-23T13:00:00", - "accept":36320, - "deny":4156, - "host":"server_2", - "response":2.4558210155, - "service":"app_3", - "total":40476 -} ----------------------------------- -// NOTCONSOLE - -TIP: The sample data sets include summarized data. 
For example, the `total` -value is a sum of the requests that were received by a specific service at a -particular time. If your data is stored in {es}, you can generate -this type of sum or average by using aggregations. One of the benefits of -summarizing data this way is that {es} automatically distributes -these calculations across your cluster. You can then feed this summarized data -into {xpackml} instead of raw results, which reduces the volume -of data that must be considered while detecting anomalies. For the purposes of -this tutorial, however, these summary values are stored in {es}. For more -information, see <>. - -Before you load the data set, you need to set up {ref}/mapping.html[_mappings_] -for the fields. Mappings divide the documents in the index into logical groups -and specify a field's characteristics, such as the field's searchability or -whether or not it's _tokenized_, or broken up into separate words. - -The sample data includes an `upload_server-metrics.sh` script, which you can use -to create the mappings and load the data set. You can download it by clicking -here: https://download.elastic.co/demos/machine_learning/gettingstarted/upload_server-metrics.sh[upload_server-metrics.sh] -Before you run it, however, you must edit the USERNAME and PASSWORD variables -with your actual user ID and password. - -The script runs a command similar to the following example, which sets up a -mapping for the data set: - -[source,sh] ----------------------------------- -curl -u elastic:x-pack-test-password -X PUT -H 'Content-Type: application/json' -http://localhost:9200/server-metrics -d '{ - "settings":{ - "number_of_shards":1, - "number_of_replicas":0 - }, - "mappings":{ - "metric":{ - "properties":{ - "@timestamp":{ - "type":"date" - }, - "accept":{ - "type":"long" - }, - "deny":{ - "type":"long" - }, - "host":{ - "type":"keyword" - }, - "response":{ - "type":"float" - }, - "service":{ - "type":"keyword" - }, - "total":{ - "type":"long" - } - } - } - } -}' ----------------------------------- -// NOTCONSOLE - -NOTE: If you run this command, you must replace `x-pack-test-password` with your -actual password. - -You can then use the {es} `bulk` API to load the data set. The -`upload_server-metrics.sh` script runs commands similar to the following -example, which loads the four JSON files: - -[source,sh] ----------------------------------- -curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" -http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_1.json" - -curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" -http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_2.json" - -curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" -http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_3.json" - -curl -u elastic:x-pack-test-password -X POST -H "Content-Type: application/json" -http://localhost:9200/server-metrics/_bulk --data-binary "@server-metrics_4.json" ----------------------------------- -// NOTCONSOLE - -TIP: This will upload 200MB of data. This is split into 4 files as there is a -maximum 100MB limit when using the `_bulk` API. - -These commands might take some time to run, depending on the computing resources -available. 
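An earlier tip noted that summary values such as `total` can be generated with {es} aggregations. As a rough sketch of that idea (the interval and field choices here are illustrative, not a step in this tutorial), a `date_histogram` aggregation with a `sum` sub-aggregation computes hourly request totals and lets {es} distribute the work across the cluster:

[source,js]
----------------------------------
GET server-metrics/_search
{
  "size": 0,
  "aggs": {
    "requests_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h"
      },
      "aggs": {
        "total_requests": {
          "sum": { "field": "total" }
        }
      }
    }
  }
}
----------------------------------
// NOTCONSOLE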
- -You can verify that the data was loaded successfully with the following command: - -[source,sh] ----------------------------------- -curl 'http://localhost:9200/_cat/indices?v' -u elastic:x-pack-test-password ----------------------------------- -// NOTCONSOLE - -You should see output similar to the following: - -[source,txt] ----------------------------------- -health status index ... pri rep docs.count ... -green open server-metrics ... 1 0 905940 ... ----------------------------------- -// NOTCONSOLE - -Next, you must define an index pattern for this data set: - -. Open {kib} in your web browser and log in. If you are running {kib} -locally, go to `http://localhost:5601/`. - -. Click the **Management** tab, then **{kib}** > **Index Patterns**. - -. If you already have index patterns, click **Create Index** to define a new -one. Otherwise, the **Create index pattern** wizard is already open. - -. For this tutorial, any pattern that matches the name of the index you've -loaded will work. For example, enter `server-metrics*` as the index pattern. - -. In the **Configure settings** step, select the `@timestamp` field in the -**Time Filter field name** list. - -. Click **Create index pattern**. - -This data set can now be analyzed in {ml} jobs in {kib}. diff --git a/x-pack/docs/en/ml/getting-started-forecast.asciidoc b/x-pack/docs/en/ml/getting-started-forecast.asciidoc deleted file mode 100644 index bc445195bd4..00000000000 --- a/x-pack/docs/en/ml/getting-started-forecast.asciidoc +++ /dev/null @@ -1,76 +0,0 @@ -[[ml-gs-forecast]] -=== Creating Forecasts - -In addition to detecting anomalous behavior in your data, you can use -{ml} to predict future behavior. For more information, see <>. - -To create a forecast in {kib}: - -. Go to the **Single Metric Viewer** and select one of the jobs that you created -in this tutorial. For example, select the `total-requests` job. - -. Click **Forecast**. + -+ --- -[role="screenshot"] -image::images/ml-gs-forecast.jpg["Create a forecast from the Single Metric Viewer"] --- - -. Specify a duration for your forecast. This value indicates how far to -extrapolate beyond the last record that was processed. You must use time units, -such as `30d` for 30 days. For more information, see -{ref}/common-options.html#time-units[Time Units]. In this example, we use a -duration of 1 week: + -+ --- -[role="screenshot"] -image::images/ml-gs-duration.jpg["Specify a duration of 1w"] --- - -. View the forecast in the **Single Metric Viewer**: + -+ --- -[role="screenshot"] -image::images/ml-gs-forecast-results.jpg["View a forecast from the Single Metric Viewer"] - -The yellow line in the chart represents the predicted data values. The shaded -yellow area represents the bounds for the predicted values, which also gives an -indication of the confidence of the predictions. Note that the bounds generally -increase with time (that is to say, the confidence levels decrease), since you -are forecasting further into the future. Eventually if the confidence levels are -too low, the forecast stops. --- - -. Optional: Compare the forecast to actual data. + -+ --- -You can try this with the sample data by choosing a subset of the data when you -create the job, as described in <>. Create the forecast then process -the remaining data, as described in <>. --- - -.. After you restart the {dfeed}, re-open the forecast by selecting the job in -the **Single Metric Viewer**, clicking **Forecast**, and selecting your forecast -from the list. 
For example: + -+ --- -[role="screenshot"] -image::images/ml-gs-forecast-open.jpg["Open a forecast in the Single Metric Viewer"] --- - -.. View the forecast and actual data in the **Single Metric Viewer**: + -+ --- -[role="screenshot"] -image::images/ml-gs-forecast-actual.jpg["View a forecast over actual data in the Single Metric Viewer"] - -The chart contains the actual data values, the bounds for the expected values, -the anomalies, the forecast data values, and the bounds for the forecast. This -combination of actual and forecast data gives you an indication of how well the -{xpack} {ml} features can extrapolate the future behavior of the data. --- - -Now that you have seen how easy it is to create forecasts with the sample data, -consider what type of events you might want to predict in your own data. For -more information and ideas, as well as a list of limitations related to -forecasts, see <>. diff --git a/x-pack/docs/en/ml/getting-started-multi.asciidoc b/x-pack/docs/en/ml/getting-started-multi.asciidoc deleted file mode 100644 index 804abacc605..00000000000 --- a/x-pack/docs/en/ml/getting-started-multi.asciidoc +++ /dev/null @@ -1,211 +0,0 @@ -[[ml-gs-multi-jobs]] -=== Creating Multi-metric Jobs - -The multi-metric job wizard in {kib} provides a simple way to create more -complex jobs with multiple detectors. For example, in the single metric job, you -were tracking total requests versus time. You might also want to track other -metrics like average response time or the maximum number of denied requests. -Instead of creating jobs for each of those metrics, you can combine them in a -multi-metric job. - -You can also use multi-metric jobs to split a single time series into multiple -time series based on a categorical field. For example, you can split the data -based on its hostnames, locations, or users. Each time series is modeled -independently. By looking at temporal patterns on a per entity basis, you might -spot things that might have otherwise been hidden in the lumped view. - -Conceptually, you can think of this as running many independent single metric -jobs. By bundling them together in a multi-metric job, however, you can see an -overall score and shared influencers for all the metrics and all the entities in -the job. Multi-metric jobs therefore scale better than having many independent -single metric jobs and provide better results when you have influencers that are -shared across the detectors. - -The sample data for this tutorial contains information about the requests that -are received by various applications and services in a system. Let's assume that -you want to monitor the requests received and the response time. In particular, -you might want to track those metrics on a per service basis to see if any -services have unusual patterns. - -To create a multi-metric job in {kib}: - -. Open {kib} in your web browser and log in. If you are running {kib} locally, -go to `http://localhost:5601/`. - -. Click **Machine Learning** in the side navigation, then click **Create new job**. - -. Select the index pattern that you created for the sample data. For example, -`server-metrics*`. - -. In the **Use a wizard** section, click **Multi metric**. - -. Configure the job by providing the following job settings: + -+ --- -[role="screenshot"] -image::images/ml-gs-multi-job.jpg["Create a new job from the server-metrics index"] --- - -.. For the **Fields**, select `high mean(response)` and `sum(total)`. 
This -creates two detectors and specifies the analysis function and field that each -detector uses. The first detector uses the high mean function to detect -unusually high average values for the `response` field in each bucket. The -second detector uses the sum function to detect when the sum of the `total` -field is anomalous in each bucket. For more information about any of the -analytical functions, see <>. - -.. For the **Bucket span**, enter `10m`. This value specifies the size of the -interval that the analysis is aggregated into. As was the case in the single -metric example, this value has a significant impact on the analysis. When you're -creating jobs for your own data, you might need to experiment with different -bucket spans depending on the frequency of the input data, the duration of -typical anomalies, and the frequency at which alerting is required. - -.. For the **Split Data**, select `service`. When you specify this -option, the analysis is segmented such that you have completely independent -baselines for each distinct value of this field. -//TBD: What is the importance of having separate baselines? -There are seven unique service keyword values in the sample data. Thus for each -of the seven services, you will see the high mean response metrics and sum -total metrics. + -+ --- -NOTE: If you are creating a job by using the {ml} APIs or the advanced job -wizard in {kib}, you can accomplish this split by using the -`partition_field_name` property. - --- - -.. For the **Key Fields (Influencers)**, select `host`. Note that the `service` field -is also automatically selected because you used it to split the data. These key -fields are also known as _influencers_. -When you identify a field as an influencer, you are indicating that you think -it contains information about someone or something that influences or -contributes to anomalies. -+ --- -[TIP] -======================== -Picking an influencer is strongly recommended for the following reasons: - -* It allows you to more easily assign blame for the anomaly -* It simplifies and aggregates the results - -The best influencer is the person or thing that you want to blame for the -anomaly. In many cases, users or client IP addresses make excellent influencers. -Influencers can be any field in your data; they do not need to be fields that -are specified in your detectors, though they often are. - -As a best practice, do not pick too many influencers. For example, you generally -do not need more than three. If you pick many influencers, the results can be -overwhelming and there is a small overhead to the analysis. - -======================== -//TBD: Is this something you can determine later from looking at results and -//update your job with if necessary? Is it all post-processing or does it affect -//the ongoing modeling? --- - -. Click **Use full server-metrics* data**. Two graphs are generated for each -`service` value, which represent the high mean `response` values and -sum `total` values over time. For example: -+ --- -[role="screenshot"] -image::images/ml-gs-job2-split.jpg["Kibana charts for data split by service"] --- - -. Provide a name for the job, for example `response_requests_by_app`. The job -name must be unique in your cluster. You can also optionally provide a -description of the job. - -. Click **Create Job**. - -When the job is created, you can choose to view the results, continue the job in -real-time, and create a watch. In this tutorial, we will proceed to view the -results. 
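For reference, a hedged sketch of roughly what this two-detector, split-by-`service` configuration looks like through the {ml} APIs follows. The job ID and settings mirror the wizard choices above, but treat the exact body as illustrative rather than the precise configuration that {kib} generates:

[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/response_requests_by_app
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "high_mean",
        "field_name": "response",
        "partition_field_name": "service"
      },
      {
        "function": "sum",
        "field_name": "total",
        "partition_field_name": "service"
      }
    ],
    "influencers": [ "service", "host" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----------------------------------
// NOTCONSOLE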
- -TIP: The `create_multi_metric.sh` script creates a similar job and {dfeed} by -using the {ml} APIs. You can download that script by clicking -here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_multi_metric.sh[create_multi_metric.sh] -For API reference information, see {ref}/ml-apis.html[Machine Learning APIs]. - -[[ml-gs-job2-analyze]] -=== Exploring Multi-metric Job Results - -The {xpackml} features analyze the input stream of data, model its behavior, and -perform analysis based on the two detectors you defined in your job. When an -event occurs outside of the model, that event is identified as an anomaly. - -You can use the **Anomaly Explorer** in {kib} to view the analysis results: - -[role="screenshot"] -image::images/ml-gs-job2-explorer.jpg["Job results in the Anomaly Explorer"] - -You can explore the overall anomaly timeline, which shows the maximum anomaly -score for each section in the specified time period. You can change the time -period by using the time picker in the {kib} toolbar. Note that the sections in -this timeline do not necessarily correspond to the bucket span. If you change -the time period, the sections change size too. The smallest possible size for -these sections is a bucket. If you specify a large time period, the sections can -span many buckets. - -On the left is a list of the top influencers for all of the detected anomalies -in that same time period. The list includes maximum anomaly scores, which in -this case are aggregated for each influencer, for each bucket, across all -detectors. There is also a total sum of the anomaly scores for each influencer. -You can use this list to help you narrow down the contributing factors and focus -on the most anomalous entities. - -If your job contains influencers, you can also explore swim lanes that -correspond to the values of an influencer. In this example, the swim lanes -correspond to the values for the `service` field that you used to split the data. -Each lane represents a unique application or service name. Since you specified -the `host` field as an influencer, you can also optionally view the results in -swim lanes for each host name: - -[role="screenshot"] -image::images/ml-gs-job2-explorer-host.jpg["Job results sorted by host"] - -By default, the swim lanes are ordered by their maximum anomaly score values. -You can click on the sections in the swim lane to see details about the -anomalies that occurred in that time interval. - -NOTE: The anomaly scores that you see in each section of the **Anomaly Explorer** -might differ slightly. This disparity occurs because for each job we generate -bucket results, influencer results, and record results. Anomaly scores are -generated for each type of result. The anomaly timeline uses the bucket-level -anomaly scores. The list of top influencers uses the influencer-level anomaly -scores. The list of anomalies uses the record-level anomaly scores. For more -information about these different result types, see -{ref}/ml-results-resource.html[Results Resources]. - -Click on a section in the swim lanes to obtain more information about the -anomalies in that time period. For example, click on the red section in the swim -lane for `server_2`: - -[role="screenshot"] -image::images/ml-gs-job2-explorer-anomaly.jpg["Job results for an anomaly"] - -You can see exact times when anomalies occurred and which detectors or metrics -caught the anomaly.
Also note that because you split the data by the `service` -field, you see separate charts for each applicable service. In particular, you -see charts for each service for which there is data on the specified host in the -specified time interval. - -Below the charts, there is a table that provides more information, such as the -typical and actual values and the influencers that contributed to the anomaly. - -[role="screenshot"] -image::images/ml-gs-job2-explorer-table.jpg["Job results table"] - -Notice that there are anomalies for both detectors, that is to say for both the -`high_mean(response)` and the `sum(total)` metrics in this time interval. The -table aggregates the anomalies to show the highest severity anomaly per detector -and entity, which is the by, over, or partition field value that is displayed -in the **found for** column. To view all the anomalies without any aggregation, -set the **Interval** to `Show all`. - -By -investigating multiple metrics in a single job, you might see relationships -between events in your data that would otherwise be overlooked. diff --git a/x-pack/docs/en/ml/getting-started-next.asciidoc b/x-pack/docs/en/ml/getting-started-next.asciidoc deleted file mode 100644 index 90d1e7798ee..00000000000 --- a/x-pack/docs/en/ml/getting-started-next.asciidoc +++ /dev/null @@ -1,55 +0,0 @@ -[[ml-gs-next]] -=== Next Steps - -By completing this tutorial, you've learned how you can detect anomalous -behavior in a simple set of sample data. You created single and multi-metric -jobs in {kib}, which creates and opens jobs and creates and starts {dfeeds} for -you under the covers. You examined the results of the {ml} analysis in the -**Single Metric Viewer** and **Anomaly Explorer** in {kib}. You also -extrapolated the future behavior of a job by creating a forecast. - -If you want to learn about advanced job options, you might be interested in -the following video tutorial: -https://www.elastic.co/videos/machine-learning-lab-3-detect-outliers-in-a-population[Machine Learning Lab 3 - Detect Outliers in a Population]. - -If you intend to use {ml} APIs in your applications, a good next step might be -to learn about the APIs by retrieving information about these sample jobs. -For example, the following APIs retrieve information about the jobs and {dfeeds}. - -[source,js] --------------------------------------------------- -GET _xpack/ml/anomaly_detectors - -GET _xpack/ml/datafeeds --------------------------------------------------- -// CONSOLE - -For more information about the {ml} APIs, see <>. - -Ultimately, the next step is to start applying {ml} to your own data. -As mentioned in <>, there are three things to consider when you're -thinking about where {ml} will be most impactful: - -. It must be time series data. -. It should be information that contains key performance indicators for the -health, security, or success of your business or system. The better you know the -data, the quicker you will be able to create jobs that generate useful -insights. -. Ideally, the data is located in {es} and you can therefore create a {dfeed} -that retrieves data in real time. If your data is outside of {es}, you -cannot use {kib} to create your jobs and you cannot use {dfeeds}. Machine -learning analysis is still possible, however, by using APIs to create and manage -jobs and to post data to them. - -Once you have decided which data to analyze, you can start considering which -analysis functions you want to use. For more information, see <>. 
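As you experiment, the results APIs give you another programmatic window into what a job has found. For example, a minimal sketch that retrieves the highest-scoring anomaly records for the `total-requests` job created in this tutorial (the body parameters shown are optional):

[source,js]
----------------------------------
GET _xpack/ml/anomaly_detectors/total-requests/results/records
{
  "sort": "record_score",
  "desc": true
}
----------------------------------
// NOTCONSOLE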
- -In general, it is a good idea to start with single metric jobs for your -key performance indicators. After you examine these simple analysis results, -you will have a better idea of what the influencers might be. You can create -multi-metric jobs and split the data or create more complex analysis functions -as necessary. For examples of more complicated configuration options, see -<>. - -If you encounter problems, we're here to help. See <> and -<>. diff --git a/x-pack/docs/en/ml/getting-started-single.asciidoc b/x-pack/docs/en/ml/getting-started-single.asciidoc deleted file mode 100644 index 3befdbaf34d..00000000000 --- a/x-pack/docs/en/ml/getting-started-single.asciidoc +++ /dev/null @@ -1,331 +0,0 @@ -[[ml-gs-jobs]] -=== Creating Single Metric Jobs - -At this point in the tutorial, the goal is to detect anomalies in the -total requests received by your applications and services. The sample data -contains a single key performance indicator (KPI) to track this, which is the total -requests over time. It is therefore logical to start by creating a single metric -job for this KPI. - -TIP: If you are using aggregated data, you can create an advanced job -and configure it to use a `summary_count_field_name`. The {ml} algorithms will -make the best possible use of summarized data in this case. For simplicity, in -this tutorial we will not make use of that advanced functionality. For more -information, see <>. - -A single metric job contains a single _detector_. A detector defines the type of -analysis that will occur (for example, `max`, `mean`, or `rare` analytical -functions) and the fields that will be analyzed. - -To create a single metric job in {kib}: - -. Open {kib} in your web browser and log in. If you are running {kib} locally, -go to `http://localhost:5601/`. - -. Click **Machine Learning** in the side navigation. - -. Click **Create new job**. - -. Select the index pattern that you created for the sample data. For example, -`server-metrics*`. - -. In the **Use a wizard** section, click **Single metric**. - -. Configure the job by providing the following information: + -+ --- -[role="screenshot"] -image::images/ml-gs-single-job.jpg["Create a new job from the server-metrics index"] --- - -.. For the **Aggregation**, select `Sum`. This value specifies the analysis -function that is used. -+ --- -Some of the analytical functions look for single anomalous data points. For -example, `max` identifies the maximum value that is seen within a bucket. -Others perform some aggregation over the length of the bucket. For example, -`mean` calculates the mean of all the data points seen within the bucket. -Similarly, `count` calculates the total number of data points within the bucket. -In this tutorial, you are using the `sum` function, which calculates the sum of -the specified field's values within the bucket. For descriptions of all the -functions, see <>. --- - -.. For the **Field**, select `total`. This value specifies the field that -the detector uses in the function. -+ --- -NOTE: Some functions such as `count` and `rare` do not require fields. --- - -.. For the **Bucket span**, enter `10m`. This value specifies the size of the -interval that the analysis is aggregated into. -+ --- -The {xpackml} features use the concept of a bucket to divide up the time series -into batches for processing.
For example, if you are monitoring -the total number of requests in the system, -using a bucket span of 1 hour would mean that at the end of each hour, it -calculates the sum of the requests for the last hour and computes the -anomalousness of that value compared to previous hours. - -The bucket span has two purposes: it dictates over what time span to look for -anomalous features in data, and also determines how quickly anomalies can be -detected. Choosing a shorter bucket span enables anomalies to be detected more -quickly. However, there is a risk of being too sensitive to natural variations -or noise in the input data. Choosing too long a bucket span can mean that -interesting anomalies are averaged away. There is also the possibility that the -aggregation might smooth out some anomalies based on when the bucket starts -in time. - -The bucket span has a significant impact on the analysis. When you're trying to -determine what value to use, take into account the granularity at which you -want to perform the analysis, the frequency of the input data, the duration of -typical anomalies, and the frequency at which alerting is required. --- - -. Determine whether you want to process all of the data or only part of it. If -you want to analyze all of the existing data, click -**Use full server-metrics* data**. If you want to see what happens when you -stop and start {dfeeds} and process additional data over time, click the time -picker in the {kib} toolbar. Since the sample data spans a period of time -between March 23, 2017 and April 22, 2017, click **Absolute**. Set the start -time to March 23, 2017 and the end time to April 1, 2017, for example. Once -you've got the time range set up, click the **Go** button. + -+ --- -[role="screenshot"] -image::images/ml-gs-job1-time.jpg["Setting the time range for the {dfeed}"] --- -+ --- -A graph is generated, which represents the total number of requests over time. - -Note that the **Estimate bucket span** option is no longer greyed out in the -**Bucket span** field. This is an experimental feature that you can use to help -determine an appropriate bucket span for your data. For the purposes of this -tutorial, we will leave the bucket span at 10 minutes. --- - -. Provide a name for the job, for example `total-requests`. The job name must -be unique in your cluster. You can also optionally provide a description of the -job and create a job group. - -. Click **Create Job**. + -+ --- -[role="screenshot"] -image::images/ml-gs-job1.jpg["A graph of the total number of requests over time"] --- - -As the job is created, the graph is updated to give a visual representation of -the progress of {ml} as the data is processed. This view is only available whilst the -job is running. - -When the job is created, you can choose to view the results, continue the job -in real time, and create a watch. In this tutorial, we will look at how to -manage jobs and {dfeeds} before we view the results. - -TIP: The `create_single_metric.sh` script creates a similar job and {dfeed} by -using the {ml} APIs. You can download that script by clicking -here: https://download.elastic.co/demos/machine_learning/gettingstarted/create_single_metric.sh[create_single_metric.sh] -For API reference information, see {ref}/ml-apis.html[Machine Learning APIs].
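If you explore those APIs, opening the job and starting its {dfeed} might look roughly like the following sketch. The `datafeed-total-requests` ID follows the naming convention that {kib} typically uses when it creates a {dfeed} for you, so treat it as an assumption:

[source,js]
----------------------------------
POST _xpack/ml/anomaly_detectors/total-requests/_open

POST _xpack/ml/datafeeds/datafeed-total-requests/_start
{
  "start": "2017-03-23T00:00:00Z",
  "end": "2017-04-01T00:00:00Z"
}
----------------------------------
// NOTCONSOLE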
- -[[ml-gs-job1-manage]] -=== Managing Jobs - -After you create a job, you can see its status in the **Job Management** tab: + - -[role="screenshot"] -image::images/ml-gs-job1-manage1.jpg["Status information for the total-requests job"] - -The following information is provided for each job: - -Job ID:: -The unique identifier for the job. - -Description:: -The optional description of the job. - -Processed records:: -The number of records that have been processed by the job. - -Memory status:: -The status of the mathematical models. When you create jobs by using the APIs or -by using the advanced options in {kib}, you can specify a `model_memory_limit`. -That value is the maximum amount of memory resources that the mathematical -models can use. Once that limit is approached, data pruning becomes more -aggressive. Upon exceeding that limit, new entities are not modeled. For more -information about this setting, see -{ref}/ml-job-resource.html#ml-apilimits[Analysis Limits]. The memory status -field reflects whether you have reached or exceeded the model memory limit. It -can have one of the following values: + -`ok`::: The models stayed below the configured value. -`soft_limit`::: The models used more than 60% of the configured memory limit -and older unused models will be pruned to free up space. -`hard_limit`::: The models used more space than the configured memory limit. -As a result, not all incoming data was processed. - -Job state:: -The status of the job, which can be one of the following values: + -`opened`::: The job is available to receive and process data. -`closed`::: The job finished successfully with its model state persisted. -The job must be opened before it can accept further data. -`closing`::: The job close action is in progress and has not yet completed. -A closing job cannot accept further data. -`failed`::: The job did not finish successfully due to an error. -This situation can occur due to invalid input data. -If the job has irrevocably failed, it must be force closed and then deleted. -If the {dfeed} can be corrected, the job can be closed and then re-opened. - -{dfeed-cap} state:: -The status of the {dfeed}, which can be one of the following values: + -`started`::: The {dfeed} is actively receiving data. -`stopped`::: The {dfeed} is stopped and will not receive data until it is -restarted. - -Latest timestamp:: -The timestamp of the last processed record. - - -If you click the arrow beside the name of the job, you can show or hide -additional information, such as the settings, configuration information, or -messages for the job. - -You can also click one of the **Actions** buttons to start the {dfeed}, edit -the job or {dfeed}, and clone or delete the job, for example. - -[float] -[[ml-gs-job1-datafeed]] -==== Managing {dfeeds-cap} - -A {dfeed} can be started and stopped multiple times throughout its lifecycle. -If you want to retrieve more data from {es} and the {dfeed} is stopped, you must -restart it. - -For example, if you did not use the full data when you created the job, you can -now process the remaining data by restarting the {dfeed}: - -. In the **Machine Learning** / **Job Management** tab, click the following -button to start the {dfeed}: image:images/ml-start-feed.jpg["Start {dfeed}"] - - -. Choose a start time and end time. For example, -click **Continue from 2017-04-01 23:59:00** and select **2017-04-30** as the -search end time. Then click **Start**. The date picker defaults to the latest -timestamp of processed data.
Be careful not to leave any gaps in the analysis; -otherwise, you might miss anomalies. + -+ --- -[role="screenshot"] -image::images/ml-gs-job1-datafeed.jpg["Restarting a {dfeed}"] --- - -The {dfeed} state changes to `started`, the job state changes to `opened`, -and the number of processed records increases as the new data is analyzed. The -latest timestamp information also increases. - -TIP: If your data is being loaded continuously, you can continue running the job -in real time. For this, start your {dfeed} and select **No end time**. - -If you want to stop the {dfeed} at this point, you can click the following -button: image:images/ml-stop-feed.jpg["Stop {dfeed}"] - -Now that you have processed all the data, let's start exploring the job results. - -[[ml-gs-job1-analyze]] -=== Exploring Single Metric Job Results - -The {xpackml} features analyze the input stream of data, model its behavior, -and perform analysis based on the detectors you defined in your job. When an -event occurs outside of the model, that event is identified as an anomaly. - -Result records for each anomaly are stored in `.ml-anomalies-*` indices in {es}. -By default, the name of the index where {ml} results are stored is labelled -`shared`, which corresponds to the `.ml-anomalies-shared` index. - -You can use the **Anomaly Explorer** or the **Single Metric Viewer** in {kib} to -view the analysis results. - -Anomaly Explorer:: - This view contains swim lanes showing the maximum anomaly score over time. - There is an overall swim lane that shows the overall score for the job, and - also swim lanes for each influencer. By selecting a block in a swim lane, the - anomaly details are displayed alongside the original source data (where - applicable). - -Single Metric Viewer:: - This view contains a chart that represents the actual and expected values over - time. This is only available for jobs that analyze a single time series and - where `model_plot_config` is enabled. As in the **Anomaly Explorer**, anomalous - data points are shown in different colors depending on their score. - -By default, when you view the results for a single metric job, the -**Single Metric Viewer** opens: -[role="screenshot"] -image::images/ml-gs-job1-analysis.jpg["Single Metric Viewer for total-requests job"] - - -The blue line in the chart represents the actual data values. The shaded blue -area represents the bounds for the expected values. The area between the upper -and lower bounds contains the most likely values for the model. If a value is -outside of this area, it can be said to be anomalous. - -If you slide the time selector from the beginning of the data to the end of the -data, you can see how the model improves as it processes more data. At the -beginning, the expected range of values is fairly broad and the model is not -capturing the periodicity in the data. But it quickly learns and begins to -reflect the daily variation. - -Any data points outside the range that was predicted by the model are marked -as anomalies. When you have high volumes of real-life data, many anomalies -might be found. These vary in probability from very likely to highly unlikely, -that is to say, from not particularly anomalous to highly anomalous. There -can be none, one or two, tens, or sometimes hundreds of anomalies found within -each bucket. There can be many thousands found per job. In order to provide -a sensible view of the results, an _anomaly score_ is calculated for each bucket -time interval.
The anomaly score is a value from 0 to 100, which indicates -the significance of the observed anomaly compared to previously seen anomalies. -The highly anomalous values are shown in red and the low scored values are -indicated in blue. An interval with a high anomaly score is significant and -requires investigation. - -Slide the time selector to a section of the time series that contains a red -anomaly data point. If you hover over the point, you can see more information -about that data point. You can also see details in the **Anomalies** section -of the viewer. For example: -[role="screenshot"] -image::images/ml-gs-job1-anomalies.jpg["Single Metric Viewer Anomalies for total-requests job"] - -For each anomaly you can see key details such as the time, the actual and -expected ("typical") values, and their probability. - -By default, the table contains all anomalies that have a severity of "warning" -or higher in the selected section of the timeline. If you are only interested in -critical anomalies, for example, you can change the severity threshold for this -table. - -The anomalies table also automatically calculates an interval for the data in -the table. If the time difference between the earliest and latest records in the -table is less than two days, the data is aggregated by hour to show the details -of the highest severity anomaly for each detector. Otherwise, it is -aggregated by day. You can change the interval for the table, for example, to -show all anomalies. - -You can see the same information in a different format by using the -**Anomaly Explorer**: -[role="screenshot"] -image::images/ml-gs-job1-explorer.jpg["Anomaly Explorer for total-requests job"] - - -Click one of the red sections in the swim lane to see details about the anomalies -that occurred in that time interval. For example: -[role="screenshot"] -image::images/ml-gs-job1-explorer-anomaly.jpg["Anomaly Explorer details for total-requests job"] - -After you have identified anomalies, often the next step is to try to determine -the context of those situations. For example, are there other factors that are -contributing to the problem? Are the anomalies confined to particular -applications or servers? You can begin to troubleshoot these situations by -layering additional jobs or creating multi-metric jobs. diff --git a/x-pack/docs/en/ml/getting-started-wizards.asciidoc b/x-pack/docs/en/ml/getting-started-wizards.asciidoc deleted file mode 100644 index 2eb6b5c2904..00000000000 --- a/x-pack/docs/en/ml/getting-started-wizards.asciidoc +++ /dev/null @@ -1,99 +0,0 @@ -[[ml-gs-wizards]] -=== Creating Jobs in {kib} -++++ -Creating Jobs -++++ - -Machine learning jobs contain the configuration information and metadata -necessary to perform an analytical task. They also contain the results of the -analytical task. - -[NOTE] --- -This tutorial uses {kib} to create jobs and view results, but you can -alternatively use APIs to accomplish most tasks. -For API reference information, see {ref}/ml-apis.html[Machine Learning APIs]. - -The {xpackml} features in {kib} use pop-ups. You must configure your -web browser so that it does not block pop-up windows or create an -exception for your {kib} URL. --- - -{kib} provides wizards that help you create typical {ml} jobs. For example, you -can use wizards to create single metric, multi-metric, population, and advanced -jobs. - -To see the job creation wizards: - -. Open {kib} in your web browser and log in. If you are running {kib} locally, -go to `http://localhost:5601/`. - -. 
Click **Machine Learning** in the side navigation. - -. Click **Create new job**. - -. Click the `server-metrics*` index pattern. - -You can then choose from a list of job wizards. For example: - -[role="screenshot"] -image::images/ml-create-job.jpg["Job creation wizards in {kib}"] - -If you are not certain which wizard to use, there is also a **Data Visualizer** -that can help you explore the fields in your data. - -To learn more about the sample data: - -. Click **Data Visualizer**. + -+ --- -[role="screenshot"] -image::images/ml-data-visualizer.jpg["Data Visualizer in {kib}"] --- - -. Select a time period that you're interested in exploring by using the time -picker in the {kib} toolbar. Alternatively, click -**Use full server-metrics* data** to view data over the full time range. In this -sample data, the documents relate to March and April 2017. - -. Optional: Change the number of documents per shard that are used in the -visualizations. There is a relatively small number of documents in the sample -data, so you can choose a value of `all`. For larger data sets, keep in mind -that using a large sample size increases query run times and increases the load -on the cluster. - -[role="screenshot"] -image::images/ml-data-metrics.jpg["Data Visualizer output for metrics in {kib}"] - -The fields in the indices are listed in two sections. The first section contains -the numeric ("metric") fields. The second section contains non-metric fields -(such as `keyword`, `text`, `date`, `boolean`, `ip`, and `geo_point` data types). - -For metric fields, the **Data Visualizer** indicates how many documents contain -the field in the selected time period. It also provides information about the -minimum, median, and maximum values, the number of distinct values, and their -distribution. You can use the distribution chart to get a better idea of how -the values in the data are clustered. Alternatively, you can view the top values -for metric fields. For example: - -[role="screenshot"] -image::images/ml-data-topmetrics.jpg["Data Visualizer output for top values in {kib}"] - -For date fields, the **Data Visualizer** provides the earliest and latest field -values and the number and percentage of documents that contain the field -during the selected time period. For example: - -[role="screenshot"] -image::images/ml-data-dates.jpg["Data Visualizer output for date fields in {kib}"] - -For keyword fields, the **Data Visualizer** provides the number of distinct -values, a list of the top values, and the number and percentage of documents -that contain the field during the selected time period. For example: - -[role="screenshot"] -image::images/ml-data-keywords.jpg["Data Visualizer output for keyword fields in {kib}"] - -In this tutorial, you will create single and multi-metric jobs that use the -`total`, `response`, `service`, and `host` fields. Though there is an option to -create an advanced job directly from the **Data Visualizer**, we will use the -single and multi-metric job creation wizards instead. diff --git a/x-pack/docs/en/ml/getting-started.asciidoc b/x-pack/docs/en/ml/getting-started.asciidoc deleted file mode 100644 index 0f1b7164d4a..00000000000 --- a/x-pack/docs/en/ml/getting-started.asciidoc +++ /dev/null @@ -1,92 +0,0 @@ -[[ml-getting-started]] -== Getting started with machine learning -++++ -Getting started -++++ - -Ready to get some hands-on experience with the {xpackml} features?
This -tutorial shows you how to: - -* Load a sample data set into {es} -* Create single and multi-metric {ml} jobs in {kib} -* Use the results to identify possible anomalies in the data - -At the end of this tutorial, you should have a good idea of what {ml} is and -will hopefully be inspired to use it to detect anomalies in your own data. - -You might also be interested in these video tutorials, which use the same sample -data: - -* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-single-metric-job[Machine Learning for the Elastic Stack: Creating a single metric job] -* https://www.elastic.co/videos/machine-learning-tutorial-creating-a-multi-metric-job[Machine Learning for the Elastic Stack: Creating a multi-metric job] - - -[float] -[[ml-gs-sysoverview]] -=== System Overview - -To follow the steps in this tutorial, you will need the following -components of the Elastic Stack: - -* {es} {version}, which stores the data and the analysis results -* {kib} {version}, which provides a helpful user interface for creating and -viewing jobs - -See the https://www.elastic.co/support/matrix[Elastic Support Matrix] for -information about supported operating systems. - -See {stack-ref}/installing-elastic-stack.html[Installing the Elastic Stack] for -information about installing each of the components. - -NOTE: To get started, you can install {es} and {kib} on a -single VM or even on your laptop (requires 64-bit OS). -As you add more data and your traffic grows, -you'll want to replace the single {es} instance with a cluster. - -By default, when you install {es} and {kib}, {xpack} is installed and the -{ml} features are enabled. You cannot use {ml} with the free basic license, but -you can try all of the {xpack} features with a <>. - -If you have multiple nodes in your cluster, you can optionally dedicate nodes to -specific purposes. If you want to control which nodes are -_machine learning nodes_ or limit which nodes run resource-intensive -activity related to jobs, see -{ref}/modules-node.html#modules-node-xpack[{ml} node settings]. - -[float] -[[ml-gs-users]] -==== Users, Roles, and Privileges - -The {xpackml} features implement cluster privileges and built-in roles to -make it easier to control which users have authority to view and manage the jobs, -{dfeeds}, and results. - -By default, you can perform all of the steps in this tutorial by using the -built-in `elastic` super user. However, the password must be set before the user -can do anything. For information about how to set that password, see -<>. - -If you are performing these steps in a production environment, take extra care -because `elastic` has the `superuser` role and you could inadvertently make -significant changes to the system. You can alternatively assign the -`machine_learning_admin` and `kibana_user` roles to a user ID of your choice. - -For more information, see <> and <>. 
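For example, a minimal sketch of creating such a user with the {security} user API; the user name and password here are placeholders, not values that this tutorial depends on:

[source,js]
----------------------------------
POST /_xpack/security/user/ml_tutorial_user
{
  "password": "l0ng-r4nd0m-p@ssw0rd",
  "roles": [ "machine_learning_admin", "kibana_user" ]
}
----------------------------------
// NOTCONSOLE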
- -:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-data.asciidoc -include::getting-started-data.asciidoc[] - -:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-wizards.asciidoc -include::getting-started-wizards.asciidoc[] - -:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-single.asciidoc -include::getting-started-single.asciidoc[] - -:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-multi.asciidoc -include::getting-started-multi.asciidoc[] - -:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-forecast.asciidoc -include::getting-started-forecast.asciidoc[] - -:edit_url: https://github.com/elastic/elasticsearch/edit/{branch}/x-pack/docs/en/ml/getting-started-next.asciidoc -include::getting-started-next.asciidoc[] diff --git a/x-pack/docs/en/ml/images/ml-gs-aggregations.jpg b/x-pack/docs/en/ml/images/ml-gs-aggregations.jpg deleted file mode 100644 index 446dce79727..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-aggregations.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-duration.jpg b/x-pack/docs/en/ml/images/ml-gs-duration.jpg deleted file mode 100644 index 0e93b3f4ccd..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-duration.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-forecast-actual.jpg b/x-pack/docs/en/ml/images/ml-gs-forecast-actual.jpg deleted file mode 100644 index 6733b6e3477..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-forecast-actual.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-forecast-open.jpg b/x-pack/docs/en/ml/images/ml-gs-forecast-open.jpg deleted file mode 100644 index e654c9e7804..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-forecast-open.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-forecast-results.jpg b/x-pack/docs/en/ml/images/ml-gs-forecast-results.jpg deleted file mode 100644 index f6911b41939..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-forecast-results.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-forecast.jpg b/x-pack/docs/en/ml/images/ml-gs-forecast.jpg deleted file mode 100644 index eeb8923b412..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-forecast.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-analysis.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-analysis.jpg deleted file mode 100644 index 9b34c916c80..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-analysis.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-anomalies.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-anomalies.jpg deleted file mode 100644 index d0d77827c90..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-anomalies.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-datafeed.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-datafeed.jpg deleted file mode 100644 index aa36b5f13ea..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-datafeed.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-explorer-anomaly.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-explorer-anomaly.jpg deleted file mode 100644 index 9e6c76a5518..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-explorer-anomaly.jpg and /dev/null differ diff --git 
a/x-pack/docs/en/ml/images/ml-gs-job1-explorer.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-explorer.jpg deleted file mode 100644 index bb436a72e50..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-explorer.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-manage1.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-manage1.jpg deleted file mode 100644 index a2cba454e9d..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-manage1.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-results.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-results.jpg deleted file mode 100644 index 0b04fec0e2d..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-results.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1-time.jpg b/x-pack/docs/en/ml/images/ml-gs-job1-time.jpg deleted file mode 100644 index 9cecf7e8b54..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1-time.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job1.jpg b/x-pack/docs/en/ml/images/ml-gs-job1.jpg deleted file mode 100644 index 7251bfc3f6b..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job1.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job2-explorer-anomaly.jpg b/x-pack/docs/en/ml/images/ml-gs-job2-explorer-anomaly.jpg deleted file mode 100644 index f7579dd338f..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job2-explorer-anomaly.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job2-explorer-host.jpg b/x-pack/docs/en/ml/images/ml-gs-job2-explorer-host.jpg deleted file mode 100644 index cfe3f4fba6d..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job2-explorer-host.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job2-explorer-table.jpg b/x-pack/docs/en/ml/images/ml-gs-job2-explorer-table.jpg deleted file mode 100644 index cb3b8205bc8..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job2-explorer-table.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job2-explorer.jpg b/x-pack/docs/en/ml/images/ml-gs-job2-explorer.jpg deleted file mode 100644 index 20809aa3d1b..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job2-explorer.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-job2-split.jpg b/x-pack/docs/en/ml/images/ml-gs-job2-split.jpg deleted file mode 100644 index 4e07b865532..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-job2-split.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-multi-job.jpg b/x-pack/docs/en/ml/images/ml-gs-multi-job.jpg deleted file mode 100644 index 03bb6ae1196..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-multi-job.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/images/ml-gs-single-job.jpg b/x-pack/docs/en/ml/images/ml-gs-single-job.jpg deleted file mode 100644 index 5d813444db9..00000000000 Binary files a/x-pack/docs/en/ml/images/ml-gs-single-job.jpg and /dev/null differ diff --git a/x-pack/docs/en/ml/limitations.asciidoc b/x-pack/docs/en/ml/limitations.asciidoc deleted file mode 100644 index 1efe6b19027..00000000000 --- a/x-pack/docs/en/ml/limitations.asciidoc +++ /dev/null @@ -1,198 +0,0 @@ -[[ml-limitations]] -== Machine Learning Limitations - -The following limitations and known problems apply to the {version} release of -{xpack}: - -[float] -=== Categorization uses English dictionary words -//See x-pack-elasticsearch/#3021 -Categorization identifies static parts of unstructured logs and 
groups similar -messages together. The default categorization tokenizer assumes English language -log messages. For other languages you must define a different -`categorization_analyzer` for your job. For more information, see -<>. - -Additionally, a dictionary used to influence the categorization process contains -only English words. This means categorization might work better in English than -in other languages. The ability to customize the dictionary will be added in a -future release. - -[float] -=== Pop-ups must be enabled in browsers -//See x-pack-elasticsearch/#844 - -The {xpackml} features in {kib} use pop-ups. You must configure your -web browser so that it does not block pop-up windows or create an -exception for your {kib} URL. - -[float] -=== Anomaly Explorer omissions and limitations -//See x-pack-elasticsearch/#844 and x-pack-kibana/#1461 - -In {kib}, Anomaly Explorer charts are not displayed for anomalies -that were due to categorization, `time_of_day` functions, or `time_of_week` -functions. Those particular results do not display well as time series -charts. - -The charts are also not displayed for detectors that use script fields. In that -case, the original source data cannot be easily searched because it has been -somewhat transformed by the script. - -The Anomaly Explorer charts can also look odd in circumstances where there -is very little data to plot. For example, if there is only one data point, it is -represented as a single dot. If there are only two data points, they are joined -by a line. - -[float] -=== Jobs close on the {dfeed} end date -//See x-pack-elasticsearch/#1037 - -If you start a {dfeed} and specify an end date, it will close the job when -the {dfeed} stops. This behavior avoids having numerous open one-time jobs. - -If you do not specify an end date when you start a {dfeed}, the job -remains open when you stop the {dfeed}. This behavior avoids the overhead -of closing and re-opening large jobs when there are pauses in the {dfeed}. - -[float] -=== Jobs created in {kib} must use {dfeeds} - -If you create jobs in {kib}, you must use {dfeeds}. If the data that you want to -analyze is not stored in {es}, you cannot use {dfeeds} and therefore you cannot -create your jobs in {kib}. You can, however, use the {ml} APIs to create jobs -and to send batches of data directly to the jobs. For more information, see -<> and <>. - -[float] -=== Post data API requires JSON format - -The post data API enables you to send data to a job for analysis. The data that -you send to the job must use the JSON format. - -For more information about this API, see -{ref}/ml-post-data.html[Post Data to Jobs]. - - -[float] -=== Misleading high missing field counts -//See x-pack-elasticsearch/#684 - -One of the counts associated with a {ml} job is `missing_field_count`, -which indicates the number of records that are missing a configured field. -//This information is most useful when your job analyzes CSV data. In this case, -//missing fields indicate data is not being analyzed and you might receive poor results. - -Since jobs analyze JSON data, the `missing_field_count` might be misleading. -Missing fields might be expected due to the structure of the data and therefore -do not generate poor results. - -For more information about `missing_field_count`, -see {ref}/ml-jobstats.html#ml-datacounts[Data Counts Objects]. 
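To check this count for one of your jobs, you can retrieve its statistics with the job stats API; the response includes `missing_field_count` under `data_counts`. A sketch, assuming a job named `total-requests`:

[source,js]
----------------------------------
GET _xpack/ml/anomaly_detectors/total-requests/_stats
----------------------------------
// NOTCONSOLE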
-
-
-[float]
-=== Terms aggregation size affects data analysis
-//See x-pack-elasticsearch/#601
-
-By default, the `terms` aggregation returns the buckets for the top ten terms.
-You can change this default behavior by setting the `size` parameter.
-
-If you send pre-aggregated data to a job for analysis, you must ensure that
-`size` is configured correctly. Otherwise, some data might not be analyzed.
-
-
-[float]
-=== Time-based index patterns are not supported
-//See x-pack-elasticsearch/#1910
-
-It is not possible to create an {xpackml} analysis job that uses time-based
-index patterns, for example `[logstash-]YYYY.MM.DD`.
-This applies to the single metric and multi metric job creation wizards in {kib}.
-
-
-[float]
-=== Fields named "by", "count", or "over" cannot be used to split data
-//See x-pack-elasticsearch/#858
-
-You cannot use the following field names in the `by_field_name` or
-`over_field_name` properties in a job: `by`, `count`, or `over`. This limitation
-also applies to those properties when you create advanced jobs in {kib}.
-
-
-[float]
-=== Jobs created in {kib} use model plot config and pre-aggregated data
-//See x-pack-elasticsearch/#844
-
-If you create single or multi-metric jobs in {kib}, the job creation wizards
-might enable some options under the covers that you'd want to reconsider for
-large or long-running jobs.
-
-For example, when you create a single metric job in {kib}, it generally
-enables the `model_plot_config` advanced configuration option. That configuration
-option causes model information to be stored along with the results and provides
-a more detailed view into anomaly detection. It is specifically used by the
-**Single Metric Viewer** in {kib}. When this option is enabled, however, it can
-add considerable overhead to the performance of the system. If you have jobs
-with many entities, for example data from tens of thousands of servers, storing
-this additional model information for every bucket might be problematic. If you
-are not certain that you need this option or if you experience performance
-issues, edit your job configuration to disable this option.
-
-For more information, see
-{ref}/ml-job-resource.html#ml-apimodelplotconfig[Model Plot Config].
-
-Likewise, when you create a single or multi-metric job in {kib}, in some cases
-it uses aggregations on the data that it retrieves from {es}. One of the
-benefits of summarizing data this way is that {es} automatically distributes
-these calculations across your cluster. This summarized data is then fed into
-{xpackml} instead of raw results, which reduces the volume of data that must
-be considered while detecting anomalies. However, if you have two jobs, one of
-which uses pre-aggregated data and another that does not, their results might
-differ. This difference is due to the difference in precision of the input data.
-The {ml} analytics are designed to be aggregation-aware and the likely increase
-in performance that is gained by pre-aggregating the data makes the potentially
-poorer precision worthwhile. If you want to view or change the aggregations
-that are used in your job, refer to the `aggregations` property in your {dfeed}.
-
-For more information, see {ref}/ml-datafeed-resource.html[Datafeed Resources].
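-
-As a sketch (the job, index, and field names here are hypothetical), a {dfeed}
-that feeds pre-aggregated data split by host might look like the following. The
-explicit `size` on the `terms` aggregation must be at least the number of
-distinct hosts; otherwise buckets for some hosts are never created and that
-data is not analyzed:
-
-[source,js]
-----------------------------------
-PUT _xpack/ml/datafeeds/datafeed-my_agg_job
-{
-  "job_id": "my_agg_job",
-  "indices": ["my-index"],
-  "aggregations": {
-    "buckets": {
-      "date_histogram": {
-        "field": "@timestamp",
-        "interval": "300s",
-        "time_zone": "UTC"
-      },
-      "aggregations": {
-        "@timestamp": {
-          "max": { "field": "@timestamp" }
-        },
-        "host": {
-          "terms": {
-            "field": "host",
-            "size": 100
-          },
-          "aggregations": {
-            "total": {
-              "sum": { "field": "total" }
-            }
-          }
-        }
-      }
-    }
-  }
-}
-----------------------------------
-// NOTCONSOLE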
-
-[float]
-=== Security Integration
-
-When {security} is enabled, a {dfeed} stores the roles of the user who created
-or updated the {dfeed} **at that time**. This means that if the definitions of
-those roles change, the {dfeed} subsequently runs with the new permissions that
-are associated with the updated role definitions. However, if the user is
-assigned a different set of roles after creating or updating the {dfeed}, the
-{dfeed} continues to run with the permissions that are associated with the
-originally stored roles. For more information, see <>.
-
-[float]
-=== Forecasts cannot be created for population jobs
-
-If you use an `over_field_name` property in your job (that is to say, it's a
-_population job_), you cannot create a forecast. If you try to create a forecast
-for this type of job, an error occurs. For more information about forecasts,
-see <>.
-
-[float]
-=== Forecasts cannot be created for jobs that use geographic, rare, or time functions
-
-If you use any of the following analytical functions in your job, you cannot
-create a forecast:
-
-* `lat_long`
-* `rare` and `freq_rare`
-* `time_of_day` and `time_of_week`
-
-If you try to create a forecast for this type of job, an error occurs. For more
-information about any of these functions, see <>.
-
-[float]
-=== Jobs must be stopped before upgrades
-
-You must stop any {ml} jobs that are running before you start the upgrade
-process. For more information, see <> and
-{stack-ref}/upgrading-elastic-stack.html[Upgrading the Elastic Stack].
diff --git a/x-pack/docs/en/ml/troubleshooting.asciidoc b/x-pack/docs/en/ml/troubleshooting.asciidoc
deleted file mode 100644
index d5244cebdae..00000000000
--- a/x-pack/docs/en/ml/troubleshooting.asciidoc
+++ /dev/null
@@ -1,116 +0,0 @@
-[[ml-troubleshooting]]
-== {xpackml} Troubleshooting
-++++
-{xpackml}
-++++
-
-Use the information in this section to troubleshoot common problems and find
-answers for frequently asked questions.
-
-* <>
-* <>
-
-To get help, see <>.
-
-[[ml-rollingupgrade]]
-=== Machine learning features unavailable after rolling upgrade
-
-This problem occurs after you upgrade all of the nodes in your cluster to
-{version} by using rolling upgrades. When you try to use {xpackml} features for
-the first time, all attempts fail, though `GET _xpack` and `GET _xpack/usage`
-indicate that {xpack} is enabled.
-
-*Symptoms:*
-
-* Errors when you click *Machine Learning* in {kib}.
-For example: `Jobs list could not be created` and `An internal server error occurred`.
-* Null pointer and remote transport exceptions when you run {ml} APIs such as
-`GET _xpack/ml/anomaly_detectors` and `GET _xpack/ml/datafeeds`.
-* Errors in the log files on the master nodes.
-For example: `unable to install ml metadata upon startup`.
-
-*Resolution:*
-
-After you upgrade all master-eligible nodes to {es} {version} and {xpack}
-{version}, restart the current master node, which triggers the {xpackml}
-features to re-initialize.
-
-For more information, see {ref}/rolling-upgrades.html[Rolling upgrades].
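-
-One way to confirm that the {ml} features re-initialized is to re-run one of
-the calls that previously failed, for example:
-
-[source,js]
-----------------------------------
-GET _xpack/ml/anomaly_detectors
-----------------------------------
-// NOTCONSOLE
-
-After a successful restart, this request returns the list of jobs (possibly
-empty) rather than a null pointer or remote transport exception.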
-
-[[ml-mappingclash]]
-=== Job creation failure due to mapping clash
-
-This problem occurs when you try to create a job.
-
-*Symptoms:*
-
-* Illegal argument exception occurs when you click *Create Job* in {kib} or run
-the create job API. For example:
-`Save failed: [status_exception] This job would cause a mapping clash
-with existing field [field_name] - avoid the clash by assigning a dedicated
-results index` or `Save failed: [illegal_argument_exception] Can't merge a non
-object mapping [field_name] with an object mapping [field_name]`.
-
-*Resolution:*
-
-This issue typically occurs when two or more jobs store their results in the
-same index and the results contain fields with the same name but different
-data types or different `fields` settings.
-
-By default, {ml} results are stored in the `.ml-anomalies-shared` index in {es}.
-To resolve this issue, click *Advanced > Use dedicated index* when you create
-the job in {kib}. If you are using the create job API, specify an index name in
-the `results_index_name` property.
-
-[[ml-jobnames]]
-=== {kib} cannot display jobs with invalid characters in their name
-
-This problem occurs when you create a job by using the
-{ref}/ml-put-job.html[Create Jobs API] and then try to view that job in {kib}.
-In particular, the problem occurs when you use a period (.) in the job
-identifier.
-
-*Symptoms:*
-
-* When you try to open a job (named, for example, `job.test`) in the
-**Anomaly Explorer** or the **Single Metric Viewer**, the job name is split and
-the text after the period is assumed to be the job name. If a job does not exist
-with that abbreviated name, an error occurs. For example:
-`Warning Requested job test does not exist`. If a job exists with that
-abbreviated name, it is displayed.
-
-*Resolution:*
-
-Create jobs in {kib} or ensure that you create jobs with valid identifiers when
-you use the {ml} APIs. For more information about valid identifiers, see
-{ref}/ml-put-job.html[Create Jobs API] or
-{ref}/ml-job-resource.html[Job Resources].
-
-[[ml-upgradedf]]
-=== Upgraded nodes fail to start due to {dfeed} issues
-
-This problem occurs when you have a {dfeed} that contains search or query
-domain-specific language (DSL) that was discontinued. For example, if you
-created a {dfeed} query in 5.x using search syntax that was deprecated in 5.x
-and removed in 6.0, you must fix the {dfeed} before you upgrade to 6.0.
-
-*Symptoms:*
-
-* If {ref}/logging.html#deprecation-logging[deprecation logging] is enabled
-before the upgrade, deprecation messages are generated when the {dfeeds} attempt
-to retrieve data.
-* After the upgrade, nodes fail to start and the error indicates that they
-failed to read the local state.
-
-*Resolution:*
-
-Before you upgrade, identify the problematic search or query DSL. In 5.6.5 and
-later, the Upgrade Assistant detects these scenarios. If you cannot fix the DSL
-before the upgrade, you must delete the {dfeed} and then re-create it with valid
-DSL after the upgrade (see the sketch at the end of this section).
-
-If you do not fix or delete the {dfeed} before the upgrade, you must downgrade
-the failing nodes, fix or delete the {dfeed} as described above, and then
-restart the upgrade.
-
-See also {stack-ref}/upgrading-elastic-stack.html[Upgrading the Elastic Stack].
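-
-For illustration, the delete-and-re-create approach might look like the
-following sketch. The {dfeed} and job names are hypothetical, the {dfeed} must
-be stopped before it can be deleted, and the `match_all` query stands in for
-whatever valid DSL suits your data:
-
-[source,js]
-----------------------------------
-DELETE _xpack/ml/datafeeds/datafeed-my_job
-
-PUT _xpack/ml/datafeeds/datafeed-my_job
-{
-  "job_id": "my_job",
-  "indices": ["my-index"],
-  "query": {
-    "match_all": {}
-  }
-}
-----------------------------------
-// NOTCONSOLE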