diff --git a/docs/en/ml/api-quickref.asciidoc b/docs/en/ml/api-quickref.asciidoc
new file mode 100644
index 00000000000..b8261be2626
--- /dev/null
+++ b/docs/en/ml/api-quickref.asciidoc
@@ -0,0 +1,84 @@
+[[ml-api-quickref]]
+== API Quick Reference
+
+All {ml} endpoints have the following base:
+
+----
+/_xpack/ml/
+----
+
+The main {ml} resources can be accessed with a variety of endpoints:
+
+* <<ml-api-jobs,/anomaly_detectors/>>: Create and manage {ml} jobs.
+* <<ml-api-datafeeds,/datafeeds/>>: Select data from {es} to be analyzed.
+* <<ml-api-results,/results/>>: Access the results of a {ml} job.
+* <<ml-api-snapshots,/model_snapshots/>>: Manage model snapshots.
+* <<ml-api-validate,/validate/>>: Validate subsections of job configurations.
+
+[float]
+[[ml-api-jobs]]
+=== /anomaly_detectors/
+
+* <<ml-put-job,/anomaly_detectors/<job_id+++>+++>>: Create a job
+* <<ml-open-job,/anomaly_detectors/<job_id>/_open>>: Open a job
+* <<ml-post-data,/anomaly_detectors/<job_id>/_data>>: Send data to a job
+* <<ml-get-job,/anomaly_detectors>>: List jobs
+* <<ml-get-job,/anomaly_detectors/<job_id+++>+++>>: Get job details
+* <<ml-get-job-stats,/anomaly_detectors/<job_id>/_stats>>: Get job statistics
+* <<ml-update-job,/anomaly_detectors/<job_id>/_update>>: Update certain properties of the job configuration
+* <<ml-flush-job,/anomaly_detectors/<job_id>/_flush>>: Force a job to analyze buffered data
+* <<ml-close-job,/anomaly_detectors/<job_id>/_close>>: Close a job
+* <<ml-delete-job,/anomaly_detectors/<job_id+++>+++>>: Delete a job
+
+[float]
+[[ml-api-datafeeds]]
+=== /datafeeds/
+
+* <<ml-put-datafeed,/datafeeds/<feed_id+++>+++>>: Create a data feed
+* <<ml-start-datafeed,/datafeeds/<feed_id>/_start>>: Start a data feed
+* <<ml-get-datafeed,/datafeeds>>: List data feeds
+* <<ml-get-datafeed,/datafeeds/<feed_id+++>+++>>: Get data feed details
+* <<ml-get-datafeed-stats,/datafeeds/<feed_id>/_stats>>: Get statistical information for data feeds
+* <<ml-preview-datafeed,/datafeeds/<feed_id>/_preview>>: Get a preview of a data feed
+* <<ml-update-datafeed,/datafeeds/<feed_id>/_update>>: Update certain settings for a data feed
+* <<ml-stop-datafeed,/datafeeds/<feed_id>/_stop>>: Stop a data feed
+* <<ml-delete-datafeed,/datafeeds/<feed_id+++>+++>>: Delete a data feed
+
+[float]
+[[ml-api-results]]
+=== /results/
+
+* <<ml-get-bucket,/results/buckets>>: List the buckets in the results
+* <<ml-get-bucket,/results/buckets/<timestamp+++>+++>>: Get bucket details
+* <<ml-get-category,/results/categories>>: List the categories in the results
+* <<ml-get-category,/results/categories/<category_id+++>+++>>: Get category details
+* <<ml-get-influencer,/results/influencers>>: Get influencer details
+* <<ml-get-record,/results/records>>: Get records from the results
+
+[float]
+[[ml-api-snapshots]]
+=== /model_snapshots/
+
+* <<ml-get-snapshot,/model_snapshots>>: List model snapshots
+* <<ml-get-snapshot,/model_snapshots/<snapshot_id+++>+++>>: Get model snapshot details
+* <<ml-revert-snapshot,/model_snapshots/<snapshot_id>/_revert>>: Revert a model snapshot
+* <<ml-update-snapshot,/model_snapshots/<snapshot_id>/_update>>: Update certain settings for a model snapshot
+* <<ml-delete-snapshot,/model_snapshots/<snapshot_id+++>+++>>: Delete a model snapshot
+
+[float]
+[[ml-api-validate]]
+=== /validate/
+
+* <<ml-valid-detector,/anomaly_detectors/_validate/detector>>: Validate a detector
+* <<ml-valid-job,/anomaly_detectors/_validate>>: Validate a job
+//[float]
+//== Where to Go Next
+
+//<<ml-getting-started,Getting Started>> :: Enable machine learning and start
+//discovering anomalies in your data.
+
+//[float]
+//== Have Comments, Questions, or Feedback?
+
+//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
+//suggestions.
diff --git a/docs/en/ml/getting-started.asciidoc b/docs/en/ml/getting-started.asciidoc
new file mode 100644
index 00000000000..b3b7404f8e7
--- /dev/null
+++ b/docs/en/ml/getting-started.asciidoc
@@ -0,0 +1,11 @@
+[[ml-getting-started]]
+== Getting Started
+
+To start exploring anomalies in your data:
+
+. Open Kibana in your web browser and log in. If you are running Kibana
+locally, go to `http://localhost:5601/`.
+
+. Click **ML** in the side navigation ...
+
+//image::graph-open.jpg["Accessing Graph"]
diff --git a/docs/en/ml/index.asciidoc b/docs/en/ml/index.asciidoc
new file mode 100644
index 00000000000..3a2abbf8ea4
--- /dev/null
+++ b/docs/en/ml/index.asciidoc
@@ -0,0 +1,23 @@
+[[xpack-ml]]
+= Machine Learning in the Elastic Stack
+
+[partintro]
+--
+Data stored in {es} contains valuable insights into the behavior and
+performance of your business and systems. However, the following questions can
+be difficult to answer:
+
+* Is the response time of my website unusual?
+* Are users exfiltrating data in unusual ways?
+
+The good news is that the {xpack} machine learning capabilities enable you to
+answer these types of questions easily.
+--
+
+include::introduction.asciidoc[]
+include::getting-started.asciidoc[]
+include::ml-scenarios.asciidoc[]
+include::api-quickref.asciidoc[]
+
+//include::troubleshooting.asciidoc[] Referenced from x-pack/docs/public/xpack-troubleshooting.asciidoc
+//include::release-notes.asciidoc[] Referenced from x-pack/docs/public/xpack-release-notes.asciidoc
diff --git a/docs/en/ml/introduction.asciidoc b/docs/en/ml/introduction.asciidoc
new file mode 100644
index 00000000000..b15e7f15d96
--- /dev/null
+++ b/docs/en/ml/introduction.asciidoc
@@ -0,0 +1,34 @@
+[[ml-introduction]]
+== Introduction
+
+Machine learning in {xpack} automates the analysis of time-series data by
+creating accurate baselines of normal behavior and identifying anomalous
+patterns in that data.
+
+Driven by proprietary machine learning algorithms, the analysis detects,
+scores, and links anomalies with statistically significant influencers in the
+data. Anomalies include temporal deviations in values, counts, or frequencies;
+statistical rarity; and unusual behavior for a member of a population.
+
+Automated periodicity detection and quick adaptation to changing data ensure
+that you do not need to specify algorithms, models, or other data
+science-related configurations to get the benefits of {ml}.
+//image::graph-network.jpg["Graph network"]
+
+=== Integration with the Elastic Stack
+
+Machine learning is tightly integrated with the Elastic Stack.
+Data is pulled from {es} for analysis and anomaly results are displayed in
+{kb} dashboards.
+
+//[float]
+//== Where to Go Next
+
+//<<ml-getting-started,Getting Started>> :: Enable machine learning and start
+//discovering anomalies in your data.
+
+//[float]
+//== Have Comments, Questions, or Feedback?
+
+//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
+//suggestions.
diff --git a/docs/en/ml/limitations.asciidoc b/docs/en/ml/limitations.asciidoc
new file mode 100644
index 00000000000..f93953451ff
--- /dev/null
+++ b/docs/en/ml/limitations.asciidoc
@@ -0,0 +1,32 @@
+[[ml-limitations]]
+== Machine Learning Limitations
+
+[float]
+=== Misleading High Missing Field Counts
+//See x-pack-elasticsearch/#684
+
+One of the counts associated with a {ml} job is +missing_field_count+,
+which indicates the number of records that are missing a configured field.
+This information is most useful when your job analyzes CSV data. In that
+case, missing fields indicate that data is not being analyzed and you might
+receive poor results.
+
+If your job analyzes JSON data, the +missing_field_count+ might be misleading.
+Missing fields might be expected due to the structure of the data and
+therefore do not generate poor results.
+
+
+//When you refer to a file script in a watch, the watch itself is not updated
+//if you change the script on the filesystem.
+
+//Currently, the only way to reload a file script in a watch is to delete
+//the watch and recreate it.
+
+//=== The _data Endpoint Requires Data to be in JSON Format
+
+//See x-pack-elasticsearch/#777
+
+//=== TBD
+
+//See x-pack-elasticsearch/#601
+//When you use aggregations, you must ensure +size+ is configured correctly.
+//Otherwise, not all data will be analyzed.
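+
+As a quick way to inspect the +missing_field_count+ described above, you can
+use the get job statistics API. The following request is only a sketch; the
+job name `it-ops-kpi` is a hypothetical example, and the
++missing_field_count+ value appears in the `data_counts` object of the
+response:
+
+[source,js]
+--------------------------------------------------
+GET _xpack/ml/anomaly_detectors/it-ops-kpi/_stats
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]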
diff --git a/docs/en/ml/ml-scenarios.asciidoc b/docs/en/ml/ml-scenarios.asciidoc
new file mode 100644
index 00000000000..da47718108a
--- /dev/null
+++ b/docs/en/ml/ml-scenarios.asciidoc
@@ -0,0 +1,100 @@
+[[ml-scenarios]]
+== Use Cases
+
+Enterprises, government organizations, and cloud-based service providers
+process daily volumes of machine data so massive that real-time human
+analysis is impossible. Changing behaviors hidden in this data provide the
+information needed to quickly resolve a massive service outage, detect
+security breaches before they result in the theft of millions of credit
+records, or identify the next big trend in consumer patterns. Current search
+and analysis, performance management, and cyber security tools cannot find
+these anomalies without significant human work in the form of thresholds,
+rules, signatures, and data models.
+
+Advanced anomaly detection techniques learn the normal behavior patterns
+represented by the data and identify and cross-correlate anomalies. As a
+result, performance, security, and operational anomalies and their causes can
+be identified as they develop and acted on before they impact the business.
+
+While anomaly detection is applicable to any type of data, we focus on
+machine data scenarios. Enterprise application developers, cloud service
+providers, and technology vendors need to harness the power of machine
+learning based anomaly detection analytics to better manage complex online
+services, detect the earliest signs of advanced security threats, and gain
+insight into the business opportunities and risks represented by changing
+behaviors hidden in their massive data sets. Here are some real-world
+examples.
+
+=== Eliminating noise generated by threshold-based alerts
+
+Modern IT systems are highly instrumented and can generate TBs of machine
+data a day. Traditional methods for analyzing data involve alerting when
+metric values exceed a known value (static thresholds) or looking for simple
+statistical deviations (dynamic thresholds).
+
+Setting accurate thresholds for each metric at different times of day is
+practically impossible. As a result, static thresholds generate large volumes
+of false positives (threshold set too low) and false negatives (threshold set
+too high).
+
+The {ml} features in {xpack} automatically learn and calculate the probability
+of a value being anomalous based on its historical behavior.
+This enables accurate alerting and highlights only the subset of relevant
+metrics that have changed. These alerts provide actionable insight into what
+is a growing mountain of data.
+
+=== Reducing troubleshooting times and subject matter expert (SME) involvement
+
+It is said that 75 percent of troubleshooting time is spent mining data to
+try to identify the root cause of an incident. The {ml} features in {xpack}
+automatically analyze data and boil down the massive volume of information
+to the few metrics or log messages that have changed behavior.
+This allows the subject matter experts (SMEs) to focus on the subset of
+information that is relevant to an issue, which greatly reduces triage time.
+
+//In a major credit services provider, within a month of deployment, the company
+//reported that its overall time to triage was reduced by 70 percent and the use of
+//outside SMEs’ time to troubleshoot was decreased by 80 percent.
+
+=== Finding and fixing issues before they impact the end user
+
+Large-scale systems, such as online banking, typically require complex
+infrastructures involving hundreds of different interdependent applications.
+Just accessing an account summary page might involve dozens of different
+databases, systems, and applications.
+
+Because of their importance to the business, these systems are typically
+highly resilient and a critical problem will not be allowed to recur.
+If a problem happens, it is likely to be complicated and the result of a
+causal sequence of events that spans multiple interacting resources.
+Troubleshooting would require the analysis of large volumes of data with a
+wide range of characteristics and data types. A variety of experts from
+multiple disciplines would need to participate in time-consuming "war rooms"
+to mine the data for answers.
+
+By using {ml} in real time, you can analyze large volumes of data, alert on
+early indicators of problems, and highlight the events that likely
+contributed to the problem.
+
+=== Finding rare events that may be symptomatic of a security issue
+
+With several hundred servers under management, the presence of new processes
+running might indicate a security breach.
+
+Using typical operational management techniques, each server would require a
+period of baselining to identify which processes are considered standard.
+Ideally a baseline would be created for each server (or server group)
+and would be periodically updated, which makes this a large management
+overhead.
+
+By using the {ml} features in {xpack}, baselines are automatically built from
+the normal behavior patterns of each host and alerts are generated when rare
+events occur.
+
+=== Finding anomalies in periodic data
+
+For data that has periodicity, it is difficult for standard monitoring tools
+to accurately tell whether a change in the data is due to a service outage or
+is the result of usual time schedules. Daily and weekly trends in the data,
+along with peak and off-peak hours, make it difficult to identify anomalies
+using standard threshold-based methods. For example, minimum and maximum
+thresholds for SMS text activity at 2am would be very different from the
+thresholds that would be effective during the day.
+
+By using {ml}, time-related trends are automatically identified and smoothed,
+leaving the residual to be analyzed for anomalies.
diff --git a/docs/en/ml/release-notes.asciidoc b/docs/en/ml/release-notes.asciidoc
new file mode 100644
index 00000000000..e26c368dc62
--- /dev/null
+++ b/docs/en/ml/release-notes.asciidoc
@@ -0,0 +1,12 @@
+[[ml-release-notes]]
+== Machine Learning Release Notes
+
+[[ml-change-list]]
+=== Change List
+
+[float]
+==== 5.4.0
+
+May 2017
+
+* Introduces Machine Learning in the Elastic Stack.
diff --git a/docs/en/ml/troubleshooting.asciidoc b/docs/en/ml/troubleshooting.asciidoc
new file mode 100644
index 00000000000..c3ded04993a
--- /dev/null
+++ b/docs/en/ml/troubleshooting.asciidoc
@@ -0,0 +1,4 @@
+[[ml-troubleshooting]]
+== Machine Learning Troubleshooting
+
+TBD
diff --git a/docs/en/rest-api/ml-api.asciidoc b/docs/en/rest-api/ml-api.asciidoc
new file mode 100644
index 00000000000..6abf5cd5921
--- /dev/null
+++ b/docs/en/rest-api/ml-api.asciidoc
@@ -0,0 +1,70 @@
+[[ml-apis]]
+== Machine Learning APIs
+
+Use machine learning to detect anomalies in time series data.
+
+* <<ml-api-datafeed-endpoint,Datafeeds>>
+* <<ml-api-job-endpoint,Jobs>>
+* <<ml-api-snapshot-endpoint,Model Snapshots>>
+* <<ml-api-result-endpoint,Results>>
+* <<ml-api-definitions,Definitions>>
+
+[[ml-api-datafeed-endpoint]]
+=== Datafeeds
+
+include::ml/put-datafeed.asciidoc[]
+include::ml/delete-datafeed.asciidoc[]
+include::ml/get-datafeed.asciidoc[]
+include::ml/get-datafeed-stats.asciidoc[]
+include::ml/preview-datafeed.asciidoc[]
+include::ml/start-datafeed.asciidoc[]
+include::ml/stop-datafeed.asciidoc[]
+include::ml/update-datafeed.asciidoc[]
+
+[[ml-api-job-endpoint]]
+=== Jobs
+
+include::ml/close-job.asciidoc[]
+include::ml/put-job.asciidoc[]
+include::ml/delete-job.asciidoc[]
+include::ml/get-job.asciidoc[]
+include::ml/get-job-stats.asciidoc[]
+include::ml/flush-job.asciidoc[]
+include::ml/open-job.asciidoc[]
+include::ml/post-data.asciidoc[]
+include::ml/update-job.asciidoc[]
+include::ml/validate-job.asciidoc[]
+include::ml/validate-detector.asciidoc[]
+
+[[ml-api-snapshot-endpoint]]
+=== Model Snapshots
+
+include::ml/delete-snapshot.asciidoc[]
+include::ml/get-snapshot.asciidoc[]
+include::ml/revert-snapshot.asciidoc[]
+include::ml/update-snapshot.asciidoc[]
+
+[[ml-api-result-endpoint]]
+=== Results
+
+include::ml/get-bucket.asciidoc[]
+include::ml/get-category.asciidoc[]
+include::ml/get-influencer.asciidoc[]
+include::ml/get-record.asciidoc[]
+
+[[ml-api-definitions]]
+=== Definitions
+
+include::ml/datafeedresource.asciidoc[]
+include::ml/jobresource.asciidoc[]
+include::ml/jobcounts.asciidoc[]
+include::ml/snapshotresource.asciidoc[]
+include::ml/resultsresource.asciidoc[]
+
+
+//* <>
+//* <>
+//* <>
+//* <>
+//* <>
+//* <>
diff --git a/docs/en/rest-api/ml.asciidoc b/docs/en/rest-api/ml.asciidoc
deleted file mode 100644
index 2ff53b2c7a4..00000000000
--- a/docs/en/rest-api/ml.asciidoc
+++ /dev/null
@@ -1,20 +0,0 @@
-[[ml-api]]
-== Machine Learning APIs
-
-Use machine learning to detect anomalies in time series data.
-
-//=== Job Management APIs
-//* <>
-//* <>
-//* <>
-//* <>
-//* <>
-//* <>
-
-
-//include::ml/put-job.asciidoc[]
-//include::ml/delete-job.asciidoc[]
-//include::ml/get-job.asciidoc[]
-//include::ml/open-close-job.asciidoc[]
-//include::ml/flush-job.asciidoc[]
-//include::ml/post-data.asciidoc[]
diff --git a/docs/en/rest-api/ml/close-job.asciidoc b/docs/en/rest-api/ml/close-job.asciidoc
new file mode 100644
index 00000000000..d7950c1a51e
--- /dev/null
+++ b/docs/en/rest-api/ml/close-job.asciidoc
@@ -0,0 +1,63 @@
+[[ml-close-job]]
+==== Close Jobs
+
+An anomaly detection job must be opened in order for it to be ready to
+receive and analyze data. A job can be opened and closed multiple times
+throughout its lifecycle.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/<job_id>/_close`
+
+===== Description
+
+A job can be closed once all data has been analyzed.
+
+When you close a job, it runs housekeeping tasks such as pruning the model
+history, flushing buffers, calculating final results, and persisting the
+internal models. Depending upon the size of the job, it could take several
+minutes to close and the equivalent time to re-open.
+
+Once closed, the anomaly detection job has almost no overhead on the cluster
+(except for maintaining its metadata). A closed job cannot receive data or
+perform analysis operations; however, you can still explore and navigate
+results.
+
+//NOTE:
+//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
+
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+===== Query Parameters
+
+`close_timeout`::
+  (+time+; default: ++30 min++) Controls the time to wait until a job has closed
+
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+===== Examples
+
+The following example closes the `event_rate` job:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/anomaly_detectors/event_rate/_close
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is closed, you receive the following results:
+----
+{
+  "closed": true
+}
+----
diff --git a/docs/en/rest-api/ml/datafeedresource.asciidoc b/docs/en/rest-api/ml/datafeedresource.asciidoc
new file mode 100644
index 00000000000..d6fca9df56c
--- /dev/null
+++ b/docs/en/rest-api/ml/datafeedresource.asciidoc
@@ -0,0 +1,10 @@
+[[ml-datafeed-resource]]
+==== Data Feed Resources
+
+A data feed resource has the following properties:
+
+TBD
+////
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig,analysis configuration objects>>.
+////
diff --git a/docs/en/rest-api/ml/delete-datafeed.asciidoc b/docs/en/rest-api/ml/delete-datafeed.asciidoc
new file mode 100644
index 00000000000..a4bcdfbf22f
--- /dev/null
+++ b/docs/en/rest-api/ml/delete-datafeed.asciidoc
@@ -0,0 +1,56 @@
+[[ml-delete-datafeed]]
+==== Delete Data Feeds
+
+The delete data feed API allows you to delete an existing data feed.
+
+===== Request
+
+`DELETE _xpack/ml/datafeeds/<feed_id>`
+
+////
+===== Description
+
+All job configuration, model state and results are deleted.
+
+IMPORTANT: Deleting a job must be done via this API only. Do not delete the
+           job directly from the `.ml-*` indices using the Elasticsearch
+           DELETE Document API. When {security} is enabled, make sure no `write`
+           privileges are granted to anyone over the `.ml-*` indices.
+
+Before you can delete a job, you must delete the data feeds that are associated with it.
+//See <<>>.
+
+It is not currently possible to delete multiple jobs using wildcards or a comma separated list.
+////
+===== Path Parameters
+
+`feed_id` (required)::
+  (+string+) Identifier for the data feed
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+
+===== Examples
+
+The following example deletes the `datafeed-it-ops` data feed:
+
+[source,js]
+--------------------------------------------------
+DELETE _xpack/ml/datafeeds/datafeed-it-ops
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the data feed is deleted, you receive the following results:
+----
+{
+  "acknowledged": true
+}
+----
diff --git a/docs/en/rest-api/ml/delete-job.asciidoc b/docs/en/rest-api/ml/delete-job.asciidoc
new file mode 100644
index 00000000000..5199b846177
--- /dev/null
+++ b/docs/en/rest-api/ml/delete-job.asciidoc
@@ -0,0 +1,55 @@
+[[ml-delete-job]]
+==== Delete Jobs
+
+The delete job API allows you to delete an existing anomaly detection job.
+
+===== Request
+
+`DELETE _xpack/ml/anomaly_detectors/<job_id>`
+
+===== Description
+
+All job configuration, model state, and results are deleted.
+
+IMPORTANT: Deleting a job must be done via this API only. Do not delete the
+           job directly from the `.ml-*` indices using the Elasticsearch
+           DELETE Document API. When {security} is enabled, make sure no `write`
+           privileges are granted to anyone over the `.ml-*` indices.
+
+Before you can delete a job, you must delete the data feeds that are associated with it.
+//See <<>>.
+
+It is not currently possible to delete multiple jobs using wildcards or a
+comma separated list.
+
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+
+===== Examples
+
+The following example deletes the `event_rate` job:
+
+[source,js]
+--------------------------------------------------
+DELETE _xpack/ml/anomaly_detectors/event_rate
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is deleted, you receive the following results:
+----
+{
+  "acknowledged": true
+}
+----
diff --git a/docs/en/rest-api/ml/delete-snapshot.asciidoc b/docs/en/rest-api/ml/delete-snapshot.asciidoc
new file mode 100644
index 00000000000..54b7fa0534b
--- /dev/null
+++ b/docs/en/rest-api/ml/delete-snapshot.asciidoc
@@ -0,0 +1,60 @@
+[[ml-delete-snapshot]]
+==== Delete Model Snapshots
+
+The delete model snapshot API allows you to delete an existing model snapshot.
+
+===== Request
+
+`DELETE _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
+
+////
+===== Description
+
+All job configuration, model state and results are deleted.
+
+IMPORTANT: Deleting a job must be done via this API only. Do not delete the
+           job directly from the `.ml-*` indices using the Elasticsearch
+           DELETE Document API. When {security} is enabled, make sure no `write`
+           privileges are granted to anyone over the `.ml-*` indices.
+
+Before you can delete a job, you must delete the data feeds that are associated with it.
+//See <<>>.
+
+It is not currently possible to delete multiple jobs using wildcards or a comma separated list.
+////
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+`snapshot_id` (required)::
+  (+string+) Identifier for the model snapshot
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+
+===== Examples
+
+The following example deletes the `event_rate` job:
+
+[source,js]
+--------------------------------------------------
+DELETE _xpack/ml/anomaly_detectors/event_rate
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is deleted, you receive the following results:
+----
+{
+  "acknowledged": true
+}
+----
+////
diff --git a/docs/en/rest-api/ml/flush-job.asciidoc b/docs/en/rest-api/ml/flush-job.asciidoc
new file mode 100644
index 00000000000..f1eba72afb9
--- /dev/null
+++ b/docs/en/rest-api/ml/flush-job.asciidoc
@@ -0,0 +1,49 @@
+[[ml-flush-job]]
+==== Flush Jobs
+
+The flush job API forces any buffered data to be processed by the {ml} job.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/<job_id>/_flush`
+
+===== Description
+
+The flush job API is only applicable when sending data for analysis using the
+<<ml-post-data,post data>> API. Depending on the content of the buffer, it
+might additionally calculate new results.
+
+The flush and close operations are similar; however, a flush is more
+efficient if you expect to send more data for analysis. When flushing, the
+job remains open and is available to continue analyzing data. A close
+operation additionally prunes and persists the model state to disk, and the
+job must be opened again before analyzing further data.
+
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+===== Query Parameters
+
+`calc_interim`::
+  (+boolean+; default: ++false++) If true, calculates interim results for the
+  most recent bucket or all buckets within the latency period
+
+`start`::
+  (+string+; default: ++null++) When used in conjunction with `calc_interim`,
+  specifies the range of buckets on which to calculate interim results
+
+`end`::
+  (+string+; default: ++null++) When used in conjunction with `calc_interim`,
+  specifies the range of buckets on which to calculate interim results
+
+`advance_time`::
+  (+string+; default: ++null++) Specifies that no data prior to the date `advance_time` is expected
+
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
diff --git a/docs/en/rest-api/ml/get-bucket.asciidoc b/docs/en/rest-api/ml/get-bucket.asciidoc
new file mode 100644
index 00000000000..2f6f0808480
--- /dev/null
+++ b/docs/en/rest-api/ml/get-bucket.asciidoc
@@ -0,0 +1,86 @@
+[[ml-get-bucket]]
+==== Get Buckets
+
+The get bucket API allows you to retrieve information about the buckets in
+the results from a job.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/results/buckets` +
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/results/buckets/<timestamp>`
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job
+
+`timestamp`::
+  (+string+) The timestamp of a single bucket result. If you do not specify
+  this optional parameter, the API returns information about all buckets that
+  you have authority to view in the job.
+
+////
+===== Results
+
+The API returns information about the job resource. For more information, see
+<<ml-job-resource,Job Resources>>.
+
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "description": "First simple job",
+      "create_time": 1491007356077,
+      "finished_time": 1491007365347,
+      "analysis_config": {
+        "bucket_span": "5m",
+        "latency": "0ms",
+        "summary_count_field_name": "doc_count",
+        "detectors": [
+          {
+            "detector_description": "low_sum(events_per_min)",
+            "function": "low_sum",
+            "field_name": "events_per_min",
+            "detector_rules": []
+          }
+        ],
+        "influencers": [],
+        "use_per_partition_normalization": false
+      },
+      "data_description": {
+        "time_field": "@timestamp",
+        "time_format": "epoch_ms"
+      },
+      "model_plot_config": {
+        "enabled": true
+      },
+      "model_snapshot_retention_days": 1,
+      "model_snapshot_id": "1491007364",
+      "results_index_name": "shared"
+    }
+  ]
+}
+----
+////
diff --git a/docs/en/rest-api/ml/get-category.asciidoc b/docs/en/rest-api/ml/get-category.asciidoc
new file mode 100644
index 00000000000..c2b99875f19
--- /dev/null
+++ b/docs/en/rest-api/ml/get-category.asciidoc
@@ -0,0 +1,86 @@
+[[ml-get-category]]
+==== Get Categories
+
+The get categories API allows you to retrieve information about the
+categories in the results for a job.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/results/categories` +
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/results/categories/<category_id>`
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job.
+
+`category_id`::
+  (+string+) Identifier for the category. If you do not specify this optional
+  parameter, the API returns information about all categories that you have
+  authority to view.
+
+////
+===== Results
+
+The API returns information about the job resource. For more information, see
+<<ml-job-resource,Job Resources>>.
+
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "description": "First simple job",
+      "create_time": 1491007356077,
+      "finished_time": 1491007365347,
+      "analysis_config": {
+        "bucket_span": "5m",
+        "latency": "0ms",
+        "summary_count_field_name": "doc_count",
+        "detectors": [
+          {
+            "detector_description": "low_sum(events_per_min)",
+            "function": "low_sum",
+            "field_name": "events_per_min",
+            "detector_rules": []
+          }
+        ],
+        "influencers": [],
+        "use_per_partition_normalization": false
+      },
+      "data_description": {
+        "time_field": "@timestamp",
+        "time_format": "epoch_ms"
+      },
+      "model_plot_config": {
+        "enabled": true
+      },
+      "model_snapshot_retention_days": 1,
+      "model_snapshot_id": "1491007364",
+      "results_index_name": "shared"
+    }
+  ]
+}
+----
+////
diff --git a/docs/en/rest-api/ml/get-datafeed-stats.asciidoc b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
new file mode 100644
index 00000000000..42fb5168018
--- /dev/null
+++ b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
@@ -0,0 +1,105 @@
+[[ml-get-datafeed-stats]]
+==== Get Data Feed Statistics
+
+The get data feed statistics API allows you to retrieve usage information for
+data feeds.
+
+===== Request
+
+`GET _xpack/ml/datafeeds/_stats` +
+
+`GET _xpack/ml/datafeeds/<feed_id>/_stats`
+
+////
+===== Description
+
+TBD
+////
+===== Path Parameters
+
+`feed_id`::
+  (+string+) Identifier for the data feed. If you do not specify this optional
+  parameter, the API returns information about all data feeds that you have
+  authority to view.
+
+
+////
+===== Results
+
+The API returns the following usage information:
+
+`job_id`::
+  (+string+) The unique identifier for the job.
+
+`data_counts`::
+  (+object+) An object that describes the number of records processed and any
+  related error counts. See <<ml-datacounts,data counts objects>>.
+
+`model_size_stats`::
+  (+object+) An object that provides information about the size and contents
+  of the model. See <<ml-modelsizestats,model size stats objects>>.
+
+`state`::
+  (+string+) The status of the job, which can be one of the following values:
+  running:: The job is actively receiving and processing data.
+  closed:: The job finished successfully with its model state persisted.
+  The job is still available to accept further data. NOTE: If you send data
+  in a periodic cycle and close the job at the end of each transaction, the
+  job is marked as closed in the intervals between when data is sent. For
+  example, if data is sent every minute and it takes 1 second to process,
+  the job has a closed state for 59 seconds.
+  failed:: The job did not finish successfully due to an error. NOTE: This
+  can occur due to invalid input data. In this case, sending corrected data
+  to a failed job re-opens the job and resets it to a running state.
+
+
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "data_counts": {
+        "job_id": "it-ops",
+        "processed_record_count": 43272,
+        "processed_field_count": 86544,
+        "input_bytes": 2846163,
+        "input_field_count": 86544,
+        "invalid_date_count": 0,
+        "missing_field_count": 0,
+        "out_of_order_timestamp_count": 0,
+        "empty_bucket_count": 0,
+        "sparse_bucket_count": 0,
+        "bucket_count": 4329,
+        "earliest_record_timestamp": 1454020560000,
+        "latest_record_timestamp": 1455318900000,
+        "last_data_time": 1491235405945,
+        "input_record_count": 43272
+      },
+      "model_size_stats": {
+        "job_id": "it-ops",
+        "result_type": "model_size_stats",
+        "model_bytes": 25586,
+        "total_by_field_count": 3,
+        "total_over_field_count": 0,
+        "total_partition_field_count": 2,
+        "bucket_allocation_failures_count": 0,
+        "memory_status": "ok",
+        "log_time": 1491235406000,
+        "timestamp": 1455318600000
+      },
+      "state": "closed"
+    }
+  ]
+}
+----
+////
diff --git a/docs/en/rest-api/ml/get-datafeed.asciidoc b/docs/en/rest-api/ml/get-datafeed.asciidoc
new file mode 100644
index 00000000000..b8b6dbf4e18
--- /dev/null
+++ b/docs/en/rest-api/ml/get-datafeed.asciidoc
@@ -0,0 +1,92 @@
+[[ml-get-datafeed]]
+==== Get Data Feeds
+
+The get data feeds API allows you to retrieve configuration information about
+data feeds.
+
+===== Request
+
+`GET _xpack/ml/datafeeds/` +
+
+`GET _xpack/ml/datafeeds/<feed_id>`
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`feed_id`::
+  (+string+) Identifier for the data feed. If you do not specify this optional
+  parameter, the API returns information about all data feeds that you have
+  authority to view.
+
+===== Results
+
+The API returns information about the data feed resource.
+//For more information, see <<ml-datafeed-resource,Data Feed Resources>>.
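+
+For example, the following request retrieves configuration information for a
+single data feed. The name `datafeed-it-ops` is an assumption that matches
+the example results below:
+
+[source,js]
+--------------------------------------------------
+GET _xpack/ml/datafeeds/datafeed-it-ops
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]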
+
+////
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+===== Examples
+
+.Example results for a single data feed
+----
+{
+  "count": 1,
+  "datafeeds": [
+    {
+      "datafeed_id": "datafeed-it-ops",
+      "job_id": "it-ops",
+      "query_delay": "60s",
+      "frequency": "150s",
+      "indexes": [
+        "it_ops_metrics"
+      ],
+      "types": [
+        "network",
+        "kpi",
+        "sql"
+      ],
+      "query": {
+        "match_all": {
+          "boost": 1
+        }
+      },
+      "aggregations": {
+        "@timestamp": {
+          "histogram": {
+            "field": "@timestamp",
+            "interval": 30000,
+            "offset": 0,
+            "order": {
+              "_key": "asc"
+            },
+            "keyed": false,
+            "min_doc_count": 0
+          },
+          "aggregations": {
+            "events_per_min": {
+              "sum": {
+                "field": "events_per_min"
+              }
+            }
+          }
+        }
+      },
+      "scroll_size": 1000
+    }
+  ]
+}
+----
diff --git a/docs/en/rest-api/ml/get-influencer.asciidoc b/docs/en/rest-api/ml/get-influencer.asciidoc
new file mode 100644
index 00000000000..66bf2170260
--- /dev/null
+++ b/docs/en/rest-api/ml/get-influencer.asciidoc
@@ -0,0 +1,81 @@
+[[ml-get-influencer]]
+==== Get Influencers
+
+The get influencers API allows you to retrieve information about the
+influencers in the results for a job.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/results/influencers`
+
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job.
+
+////
+===== Results
+
+The API returns information about the job resource. For more information, see
+<<ml-job-resource,Job Resources>>.
+
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "description": "First simple job",
+      "create_time": 1491007356077,
+      "finished_time": 1491007365347,
+      "analysis_config": {
+        "bucket_span": "5m",
+        "latency": "0ms",
+        "summary_count_field_name": "doc_count",
+        "detectors": [
+          {
+            "detector_description": "low_sum(events_per_min)",
+            "function": "low_sum",
+            "field_name": "events_per_min",
+            "detector_rules": []
+          }
+        ],
+        "influencers": [],
+        "use_per_partition_normalization": false
+      },
+      "data_description": {
+        "time_field": "@timestamp",
+        "time_format": "epoch_ms"
+      },
+      "model_plot_config": {
+        "enabled": true
+      },
+      "model_snapshot_retention_days": 1,
+      "model_snapshot_id": "1491007364",
+      "results_index_name": "shared"
+    }
+  ]
+}
+----
+////
diff --git a/docs/en/rest-api/ml/get-job-stats.asciidoc b/docs/en/rest-api/ml/get-job-stats.asciidoc
new file mode 100644
index 00000000000..d4131c3ba4e
--- /dev/null
+++ b/docs/en/rest-api/ml/get-job-stats.asciidoc
@@ -0,0 +1,103 @@
+[[ml-get-job-stats]]
+==== Get Job Statistics
+
+The get job statistics API allows you to retrieve usage information for jobs.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/_stats` +
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/_stats`
+
+////
+===== Description
+
+TBD
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job. If you do not specify this optional
+  parameter, the API returns information about all jobs that you have
+  authority to view.
+
+===== Results
+
+The API returns the following usage information:
+
+`job_id`::
+  (+string+) The unique identifier for the job.
+
+`data_counts`::
+  (+object+) An object that describes the number of records processed and any
+  related error counts. See <<ml-datacounts,data counts objects>>.
+
+`model_size_stats`::
+  (+object+) An object that provides information about the size and contents
+  of the model. See <<ml-modelsizestats,model size stats objects>>.
+
+`state`::
+  (+string+) The status of the job, which can be one of the following values:
+  running:: The job is actively receiving and processing data.
+  closed:: The job finished successfully with its model state persisted.
+  The job is still available to accept further data. NOTE: If you send data
+  in a periodic cycle and close the job at the end of each transaction, the
+  job is marked as closed in the intervals between when data is sent. For
+  example, if data is sent every minute and it takes 1 second to process,
+  the job has a closed state for 59 seconds.
+  failed:: The job did not finish successfully due to an error. NOTE: This
+  can occur due to invalid input data. In this case, sending corrected data
+  to a failed job re-opens the job and resets it to a running state.
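+
+For example, the following request retrieves usage information for the
+hypothetical `it-ops-kpi` job that is shown in the example results below:
+
+[source,js]
+--------------------------------------------------
+GET _xpack/ml/anomaly_detectors/it-ops-kpi/_stats
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]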
+
+
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "data_counts": {
+        "job_id": "it-ops",
+        "processed_record_count": 43272,
+        "processed_field_count": 86544,
+        "input_bytes": 2846163,
+        "input_field_count": 86544,
+        "invalid_date_count": 0,
+        "missing_field_count": 0,
+        "out_of_order_timestamp_count": 0,
+        "empty_bucket_count": 0,
+        "sparse_bucket_count": 0,
+        "bucket_count": 4329,
+        "earliest_record_timestamp": 1454020560000,
+        "latest_record_timestamp": 1455318900000,
+        "last_data_time": 1491235405945,
+        "input_record_count": 43272
+      },
+      "model_size_stats": {
+        "job_id": "it-ops",
+        "result_type": "model_size_stats",
+        "model_bytes": 25586,
+        "total_by_field_count": 3,
+        "total_over_field_count": 0,
+        "total_partition_field_count": 2,
+        "bucket_allocation_failures_count": 0,
+        "memory_status": "ok",
+        "log_time": 1491235406000,
+        "timestamp": 1455318600000
+      },
+      "state": "closed"
+    }
+  ]
+}
+----
diff --git a/docs/en/rest-api/ml/get-job.asciidoc b/docs/en/rest-api/ml/get-job.asciidoc
new file mode 100644
index 00000000000..7a0112f37a0
--- /dev/null
+++ b/docs/en/rest-api/ml/get-job.asciidoc
@@ -0,0 +1,82 @@
+[[ml-get-job]]
+==== Get Job Details
+
+The get jobs API allows you to retrieve configuration information about jobs.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/` +
+
+`GET _xpack/ml/anomaly_detectors/<job_id>`
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job. If you do not specify this optional
+  parameter, the API returns information about all jobs that you have
+  authority to view.
+
+===== Results
+
+The API returns information about the job resource. For more information, see
+<<ml-job-resource,Job Resources>>.
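+
+For example, the following request retrieves configuration information for
+the hypothetical `it-ops-kpi` job that is shown in the example results below:
+
+[source,js]
+--------------------------------------------------
+GET _xpack/ml/anomaly_detectors/it-ops-kpi
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]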
+
+////
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "description": "First simple job",
+      "create_time": 1491007356077,
+      "finished_time": 1491007365347,
+      "analysis_config": {
+        "bucket_span": "5m",
+        "latency": "0ms",
+        "summary_count_field_name": "doc_count",
+        "detectors": [
+          {
+            "detector_description": "low_sum(events_per_min)",
+            "function": "low_sum",
+            "field_name": "events_per_min",
+            "detector_rules": []
+          }
+        ],
+        "influencers": [],
+        "use_per_partition_normalization": false
+      },
+      "data_description": {
+        "time_field": "@timestamp",
+        "time_format": "epoch_ms"
+      },
+      "model_plot_config": {
+        "enabled": true
+      },
+      "model_snapshot_retention_days": 1,
+      "model_snapshot_id": "1491007364",
+      "results_index_name": "shared"
+    }
+  ]
+}
+----
diff --git a/docs/en/rest-api/ml/get-record.asciidoc b/docs/en/rest-api/ml/get-record.asciidoc
new file mode 100644
index 00000000000..fded2df5ed8
--- /dev/null
+++ b/docs/en/rest-api/ml/get-record.asciidoc
@@ -0,0 +1,81 @@
+[[ml-get-record]]
+==== Get Records
+
+The get records API allows you to retrieve anomaly records from the results
+that were generated by a job.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/results/records`
+
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job.
+
+////
+===== Results
+
+The API returns information about the job resource. For more information, see
+<<ml-job-resource,Job Resources>>.
+
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "description": "First simple job",
+      "create_time": 1491007356077,
+      "finished_time": 1491007365347,
+      "analysis_config": {
+        "bucket_span": "5m",
+        "latency": "0ms",
+        "summary_count_field_name": "doc_count",
+        "detectors": [
+          {
+            "detector_description": "low_sum(events_per_min)",
+            "function": "low_sum",
+            "field_name": "events_per_min",
+            "detector_rules": []
+          }
+        ],
+        "influencers": [],
+        "use_per_partition_normalization": false
+      },
+      "data_description": {
+        "time_field": "@timestamp",
+        "time_format": "epoch_ms"
+      },
+      "model_plot_config": {
+        "enabled": true
+      },
+      "model_snapshot_retention_days": 1,
+      "model_snapshot_id": "1491007364",
+      "results_index_name": "shared"
+    }
+  ]
+}
+----
+////
diff --git a/docs/en/rest-api/ml/get-snapshot.asciidoc b/docs/en/rest-api/ml/get-snapshot.asciidoc
new file mode 100644
index 00000000000..c8043f3f47d
--- /dev/null
+++ b/docs/en/rest-api/ml/get-snapshot.asciidoc
@@ -0,0 +1,86 @@
+[[ml-get-snapshot]]
+==== Get Model Snapshots
+
+The get model snapshots API allows you to retrieve information about model
+snapshots.
+
+===== Request
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/model_snapshots` +
+
+`GET _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
+////
+===== Description
+
+OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
+////
+===== Path Parameters
+
+`job_id`::
+  (+string+) Identifier for the job.
+
+`snapshot_id`::
+  (+string+) Identifier for the model snapshot. If you do not specify this
+  optional parameter, the API returns information about all model snapshots
+  that you have authority to view.
+
+////
+===== Results
+
+The API returns information about the job resource. For more information, see
+<<ml-job-resource,Job Resources>>.
+
+===== Query Parameters
+
+`_stats`::
+(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+.Example results for a single job
+----
+{
+  "count": 1,
+  "jobs": [
+    {
+      "job_id": "it-ops-kpi",
+      "description": "First simple job",
+      "create_time": 1491007356077,
+      "finished_time": 1491007365347,
+      "analysis_config": {
+        "bucket_span": "5m",
+        "latency": "0ms",
+        "summary_count_field_name": "doc_count",
+        "detectors": [
+          {
+            "detector_description": "low_sum(events_per_min)",
+            "function": "low_sum",
+            "field_name": "events_per_min",
+            "detector_rules": []
+          }
+        ],
+        "influencers": [],
+        "use_per_partition_normalization": false
+      },
+      "data_description": {
+        "time_field": "@timestamp",
+        "time_format": "epoch_ms"
+      },
+      "model_plot_config": {
+        "enabled": true
+      },
+      "model_snapshot_retention_days": 1,
+      "model_snapshot_id": "1491007364",
+      "results_index_name": "shared"
+    }
+  ]
+}
+----
+////
diff --git a/docs/en/rest-api/ml/jobcounts.asciidoc b/docs/en/rest-api/ml/jobcounts.asciidoc
new file mode 100644
index 00000000000..9d7be6e8868
--- /dev/null
+++ b/docs/en/rest-api/ml/jobcounts.asciidoc
@@ -0,0 +1,120 @@
+[[ml-jobcounts]]
+==== Job Counts
+
+The `data_counts` object provides information about the operational progress
+of a job. It describes the number of records processed and any related error
+counts.
+
+NOTE: Job count values are cumulative for the lifetime of a job. If a model
+snapshot is reverted or old results are deleted, the job counts are not reset.
+
+[[ml-datacounts]]
+===== Data Counts Objects
+
+A `data_counts` object has the following properties:
+
+`job_id`::
+  (+string+) The unique identifier for the job.
+
+`processed_record_count`::
+  (+long+) The number of records that have been processed by the job.
+  This value includes records with missing fields, since they are nonetheless
+  analyzed. The following records are not processed:
+  * Records not in chronological order and outside the latency window
+  * Records with invalid timestamps
+  * Records filtered by an exclude transform
+
+`processed_field_count`::
+  (+long+) The total number of fields in all the records that have been
+  processed by the job. Only fields that are specified in the detector
+  configuration object contribute to this count. The timestamp is not
+  included in this count.
+
+`input_bytes`::
+  (+long+) The number of raw bytes read by the job.
+
+`input_field_count`::
+  (+long+) The total number of record fields read by the job. This count
+  includes fields that are not used in the analysis.
+
+`invalid_date_count`::
+  (+long+) The number of records with either a missing date field or a date
+  that could not be parsed.
+
+`missing_field_count`::
+  (+long+) The number of records that are missing a field that the job is
+  configured to analyze. Records with missing fields are still processed
+  because it is possible that not all fields are missing. The value of
+  `processed_record_count` includes this count.
+
+`out_of_order_timestamp_count`::
+  (+long+) The number of records that are out of time sequence and outside
+  of the latency window.
+  These records are discarded, since jobs require time series data to be in
+  ascending chronological order.
+
+`empty_bucket_count`::
+  TBD
+
+`sparse_bucket_count`::
+  TBD
+
+`bucket_count`::
+  (+long+) The number of bucket results produced by the job.
+
+`earliest_record_timestamp`::
+  (+string+) The timestamp of the earliest chronologically ordered record,
+  in milliseconds since the epoch.
+
+`latest_record_timestamp`::
+  (+string+) The timestamp of the last chronologically ordered record,
+  in milliseconds since the epoch. If the records are not in strict
+  chronological order, this value might not be the same as the timestamp of
+  the last record.
+
+`last_data_time`::
+  TBD
+
+`input_record_count`::
+  (+long+) The number of data records read by the job.
+
+
+[[ml-modelsizestats]]
+===== Model Size Stats Objects
+
+The `model_size_stats` object has the following properties:
+
+`job_id`::
+  (+string+) The unique identifier for the job.
+
+`result_type`::
+  TBD
+
+`model_bytes`::
+  (+long+) The number of bytes of memory used by the models. This is the
+  maximum value since the last time the model was persisted. If the job is
+  closed, this value indicates the latest size.
+
+`total_by_field_count`::
+  (+long+) The number of `by` field values that were analyzed by the models.
+
+NOTE: The `by` field values are counted separately for each detector and
+partition.
+
+`total_over_field_count`::
+  (+long+) The number of `over` field values that were analyzed by the models.
+
+NOTE: The `over` field values are counted separately for each detector and
+partition.
+
+`total_partition_field_count`::
+  (+long+) The number of `partition` field values that were analyzed by the
+  models.
+
+`bucket_allocation_failures_count`::
+  TBD
+
+`memory_status`::
+  (+string+) The status of the mathematical models. This property can have
+  one of the following values:
+  "ok":: The models stayed below the configured value.
+  "soft_limit":: The models used more than 60% of the configured memory limit
+  and older unused models will be pruned to free up space.
+  "hard_limit":: The models used more space than the configured memory limit.
+  As a result, not all incoming data was processed.
+
+`log_time`::
+  TBD
+
+`timestamp`::
+  TBD
diff --git a/docs/en/rest-api/ml/jobresource.asciidoc b/docs/en/rest-api/ml/jobresource.asciidoc
new file mode 100644
index 00000000000..b894c80fb21
--- /dev/null
+++ b/docs/en/rest-api/ml/jobresource.asciidoc
@@ -0,0 +1,243 @@
+[[ml-job-resource]]
+==== Job Resources
+
+A job resource has the following properties:
+
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the
+  data. See <<ml-analysisconfig,analysis configuration objects>>.
+
+`analysis_limits`::
+  (+object+) Defines limits on the number of field values and time buckets to
+  be analyzed. See <<ml-apilimits,analysis limits>>.
+
+`create_time`::
+  (+string+) The time the job was created, in milliseconds since the epoch.
+  For example, `1491007356077`.
+
+`data_description`::
+  (+object+) Describes the data format and how APIs parse timestamp fields.
+  See <<ml-datadescription,data description objects>>.
+
+`description`::
+  (+string+) An optional description of the job.
+
+`finished_time`::
+  (+string+) If the job closed or failed, this is the time the job finished,
+  in milliseconds since the epoch. Otherwise, it is `null`.
+  For example, `1491007365347`.
+
+`job_id`::
+  (+string+) The unique identifier for the job.
+
+`model_plot_config`:: TBD
+  `enabled`:: TBD. For example, `true`.
+
+`model_snapshot_id`::
+  TBD. For example, `1491007364`.
+
+
+`model_snapshot_retention_days`::
+  (+long+) The time in days that model snapshots are retained for the job.
+  Older snapshots are deleted. The default value is 1 day.
+
+`results_index_name`::
+  TBD. For example, `shared`.
+
+[[ml-analysisconfig]]
+===== Analysis Configuration Objects
+
+An analysis configuration object has the following properties:
+
+`batch_span`::
+  (+unsigned integer+) The interval into which to batch seasonal data,
+  measured in seconds. This is an advanced option, which is usually left at
+  the default value.
+////
+  Requires `period` to be specified
+////
+
+`bucket_span`::
+  (+unsigned integer+, required) The size of the interval that the analysis
+  is aggregated into, measured in seconds. The default value is 300 seconds
+  (5 minutes).
+
+`categorization_field_name`::
+  (+string+) If not null, the values of the specified field will be
+  categorized. The resulting categories can be used in a detector by setting
+  `by_field_name`, `over_field_name`, or `partition_field_name` to the
+  keyword `mlcategory`.
+
+`categorization_filters`::
+  (+array of strings+) If `categorization_field_name` is specified, you can
+  also define optional filters. This property expects an array of regular
+  expressions. The expressions are used to filter out matching sequences
+  from the categorization field values. This functionality is useful for
+  fine-tuning categorization by excluding sequences that should not be taken
+  into consideration when categories are defined. For example, you can
+  exclude SQL statements that appear in your log files.
+
+`detectors`::
+  (+array+, required) An array of detector configuration objects, which
+  describe the anomaly detectors that are used in the job.
+  See <<ml-detectorconfig,detector configuration objects>>.
+
+NOTE: If the `detectors` array does not contain at least one detector,
+no analysis can occur and an error is returned.
+
+`influencers`::
+  (+array of strings+) An array of influencer field names. Typically these
+  can be the by, over, or partition fields that are used in the detector
+  configuration. You might also want to use a field name that is not
+  specifically named in a detector, but is available as part of the input
+  data. When you use multiple detectors, the use of influencers is
+  recommended as it aggregates results for each influencer entity.
+
+`latency`::
+  (+unsigned integer+) The size of the window, in seconds, in which to expect
+  data that is out of time order. The default value is 0 seconds (no latency).
+
+NOTE: Latency is only applicable when you send data by using the
+<<ml-post-data,post data>> API.
+
+`multivariate_by_fields`::
+  (+boolean+) If set to `true`, the analysis will automatically find
+  correlations between metrics for a given `by` field value and report
+  anomalies when those correlations cease to hold. For example, suppose CPU
+  and memory usage on host A is usually highly correlated with the same
+  metrics on host B. Perhaps this correlation occurs because they are
+  running a load-balanced application. If you enable this property, then
+  anomalies will be reported when, for example, CPU usage on host A is high
+  and the value of CPU usage on host B is low. That is to say, you will see
+  an anomaly when the CPU of host A is unusual given the CPU of host B.
+
+NOTE: To use the `multivariate_by_fields` property, you must also specify
+`by_field_name` in your detector.
+
+`overlapping_buckets`::
+  (+boolean+) If set to `true`, an additional analysis occurs that runs out
+  of phase by half a bucket length.
+  This requires more system resources and enhances detection of anomalies
+  that span bucket boundaries.
+
+`period`::
+  (+unsigned integer+) The repeat interval for periodic data in multiples of
+  `batch_span`. If this property is not specified, daily and weekly
+  periodicity are automatically determined. This is an advanced option,
+  which is usually left at the default value.
+
+`summary_count_field_name`::
+  (+string+) If not null, the data fed to the job is expected to be
+  pre-summarized. This property value is the name of the field that contains
+  the count of raw data points that have been summarized. The same
+  `summary_count_field_name` applies to all detectors in the job.
+
+NOTE: The `summary_count_field_name` property cannot be used with the
+`metric` function.
+
+
+`use_per_partition_normalization`::
+  TBD
+
+[[ml-detectorconfig]]
+===== Detector Configuration Objects
+
+Detector configuration objects specify which data fields a job analyzes.
+They also specify which analytical functions are used.
+You can specify multiple detectors for a job.
+Each detector has the following properties:
+
+`by_field_name`::
+  (+string+) The field used to split the data. In particular, this property
+  is used for analyzing the splits with respect to their own history. It is
+  used for finding unusual values in the context of the split.
+
+`detector_description`::
+  (+string+) A description of the detector. For example,
+  `low_sum(events_per_min)`.
+
+`detector_rules`::
+  TBD
+
+`exclude_frequent`::
+  (+string+) Contains one of the following values: `all`, `none`, `by`, or
+  `over`. If set, frequent entities are excluded from influencing the
+  anomaly results. Entities can be considered frequent over time or frequent
+  in a population. If you are working with both over and by fields, then you
+  can set `exclude_frequent` to `all` for both fields, or to `by` or `over`
+  for those specific fields.
+
+`field_name`::
+  (+string+) The field that the detector uses in the function. If you use an
+  event rate function such as `count` or `rare`, do not specify this field.
+
+NOTE: The `field_name` cannot contain double quotes or backslashes.
+
+`function`::
+  (+string+, required) The analysis function that is used. For example,
+  `count`, `rare`, `mean`, `min`, `max`, or `sum`. The default function is
+  `metric`, which looks for anomalies in all of `min`, `max`, and `mean`.
+
+NOTE: You cannot use the `metric` function with pre-summarized input. If
+`summary_count_field_name` is not null, you must specify a function other
+than `metric`.
+
+`over_field_name`::
+  (+string+) The field used to split the data. In particular, this property
+  is used for analyzing the splits with respect to the history of all
+  splits. It is used for finding unusual values in the population of all
+  splits.
+
+`partition_field_name`::
+  (+string+) The field used to segment the analysis. When you use this
+  property, you have completely independent baselines for each value of this
+  field.
+
+`use_null`::
+  (+boolean+) Defines whether a new series is used as the null series when
+  there is no value for the by or partition fields. The default value is
+  `false`.
+
+IMPORTANT: Field names are case sensitive. For example, a field named `Bytes`
+is different from one named `bytes`.
+
+[[ml-datadescription]]
+===== Data Description Objects
+
+The data description settings define the format of the input data.
+
+When data is read from {es}, you must configure a datafeed, which defines the
+index from which data is taken and the time period that is covered.
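+
+As a minimal sketch, a datafeed that matches the example results in
+<<ml-get-datafeed,Get Data Feeds>> could be created as follows. The feed,
+job, and index names are assumptions rather than requirements:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/datafeeds/datafeed-it-ops
+{
+  "job_id": "it-ops",
+  "indexes": ["it_ops_metrics"],
+  "types": ["network", "kpi", "sql"],
+  "query": {
+    "match_all": {}
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]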
+[[ml-datadescription]]
+===== Data Description Objects
+
+The data description settings define the format of the input data.
+
+When data is read from Elasticsearch, the data feed must be configured.
+The data feed defines which index the data is taken from and over what time period.
+
+When data is received via the <<ml-post-data,post data>> API,
+you must specify the data format (for example, JSON or CSV). In this scenario,
+the posted data is not stored in Elasticsearch. Only the results for anomaly detection are retained.
+
+When you create a job, by default it accepts data in tab-separated-values format and expects
+an epoch time value in a field named `time`. The `time` field must be measured in seconds from the epoch.
+If, however, your data is not in this format, you can provide a data description object that specifies the
+format of your data.
+
+A data description object has the following properties:
+
+`fieldDelimiter`::
+  TBD
+
+`format`::
+  TBD
+
+`time_field`::
+  (+string+) The name of the field that contains the timestamp.
+  The default value is `time`.
+
+`time_format`::
+  (+string+) The time format, which can be `epoch`, `epoch_ms`, or a custom pattern.
+  The default value is `epoch`, which refers to UNIX or epoch time (the number of seconds
+  since 1 Jan 1970) and corresponds to the `time_t` type in C and C++.
+  The value `epoch_ms` indicates that time is measured in milliseconds since the epoch.
+  The `epoch` and `epoch_ms` time formats accept either integer or real values.
+
+NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class.
+When you use date-time formatting patterns, it is recommended that you provide
+the full date, time, and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`.
+If the pattern that you specify is not sufficient to produce a complete
+timestamp, job creation fails.
+
+`quotecharacter`::
+  TBD
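+
+For example, the following data description declares that the timestamp is
+held in a field named `@timestamp` and is measured in milliseconds since the
+epoch, as in the <<ml-put-job,create jobs>> example:
+
+[source,js]
+--------------------------------------------------
+"data_description": {
+  "time_field": "@timestamp",
+  "time_format": "epoch_ms"
+}
+--------------------------------------------------
+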
+[[ml-apilimits]]
+===== Analysis Limits
+
+Limits can be applied to the size of the mathematical models that are held in memory.
+These limits can be set per job and do not control the memory used by other processes.
+If necessary, the limits can also be updated after the job is created.
+
+The `analysis_limits` object has the following properties:
+
+`categorization_examples_limit`::
+  (+long+) The maximum number of examples stored per category in memory and
+  in the results data store. The default value is 4. If you increase this value,
+  more examples are available, but it requires that you have more storage available.
+  If you set this value to `0`, no examples are stored.
+
+////
+NOTE: The `categorization_examples_limit` only applies to analysis that uses categorization.
+////
+`model_memory_limit`::
+  (+long+) The maximum amount of memory, in MiB, that the mathematical models can use.
+  Once this limit is approached, data pruning becomes more aggressive.
+  Upon exceeding this limit, new entities are not modeled. The default value is 4096.
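+
+For example, the following limits object raises the model memory ceiling to
+8192 MiB and keeps the default of 4 stored categorization examples. The values
+are illustrative:
+
+[source,js]
+--------------------------------------------------
+"analysis_limits": {
+  "model_memory_limit": 8192,
+  "categorization_examples_limit": 4
+}
+--------------------------------------------------
+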
diff --git a/docs/en/rest-api/ml/open-job.asciidoc b/docs/en/rest-api/ml/open-job.asciidoc
new file mode 100644
index 00000000000..1f3bb5d27c1
--- /dev/null
+++ b/docs/en/rest-api/ml/open-job.asciidoc
@@ -0,0 +1,63 @@
+[[ml-open-job]]
+==== Open Jobs
+
+An anomaly detection job must be opened in order for it to be ready to receive and analyze data.
+A job can be opened and closed multiple times throughout its lifecycle.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/{job_id}/_open`
+
+===== Description
+
+A job must be open in order for it to accept and analyze data.
+
+When you open a new job, it starts with an empty model.
+
+When you open an existing job, the most recent model state is automatically loaded.
+The job is ready to resume its analysis from where it left off once new data is received.
+
+===== Path Parameters
+
+`job_id` (required)::
+(+string+) Identifier for the job
+
+===== Request Body
+
+`open_timeout`::
+  (+time+; default: ++30 min++) Controls the time to wait until a job has opened.
+
+`ignore_downtime`::
+  (+boolean+; default: ++true++) If true (default), any gap in data since it was
+  last closed is treated as a maintenance window. That is to say, it is not an anomaly.
+
+////
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+===== Examples
+
+The following example opens the `event_rate` job and sets an optional property:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/anomaly_detectors/event_rate/_open
+{
+  "ignore_downtime": false
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job opens, you receive the following results:
+----
+{
+  "opened": true
+}
+----
diff --git a/docs/en/rest-api/ml/post-data.asciidoc b/docs/en/rest-api/ml/post-data.asciidoc
new file mode 100644
index 00000000000..1753ce07418
--- /dev/null
+++ b/docs/en/rest-api/ml/post-data.asciidoc
@@ -0,0 +1,56 @@
+[[ml-post-data]]
+==== Post Data to Jobs
+
+The post data API allows you to send data to an anomaly detection job for analysis.
+The job must have been opened prior to sending data.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/<job_id>/_data --data-binary @{data-file.json}`
+
+===== Description
+
+File sizes are limited to 100 MB, so if your file is larger,
+split it into multiple files and upload each one separately in sequential time order.
+When running in real time, it is generally recommended that you perform
+many small uploads rather than queueing data in order to upload larger files.
+
+IMPORTANT: Data can only be accepted from a single connection.
+  Do not attempt to access the data endpoint from different threads at the same time.
+  Use a single connection synchronously to send data, close, flush, or delete a single job.
+  It is not currently possible to post data to multiple jobs by using wildcards
+  or a comma-separated list.
+
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+===== Request Body
+
+`reset_start`::
+  (+string+; default: ++null++) Specifies the start of the bucket resetting range
+
+`reset_end`::
+  (+string+; default: ++null++) Specifies the end of the bucket resetting range
+
+////
+===== Responses
+
+  200
+  (EmptyResponse) The cluster has been successfully deleted
+  404
+  (BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+  412
+  (BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+===== Examples
+
+The following example sends data from the `data-file.json` file to the `my_analysis` job:
+
+[source,js]
+--------------------------------------------------
+$ curl -s -XPOST localhost:9200/_xpack/ml/anomaly_detectors/my_analysis/_data --data-binary @data-file.json
+--------------------------------------------------
diff --git a/docs/en/rest-api/ml/preview-datafeed.asciidoc b/docs/en/rest-api/ml/preview-datafeed.asciidoc
new file mode 100644
index 00000000000..7f3b9b37714
--- /dev/null
+++ b/docs/en/rest-api/ml/preview-datafeed.asciidoc
@@ -0,0 +1,84 @@
+[[ml-preview-datafeed]]
+==== Preview Data Feeds
+
+The preview data feed API allows you to preview the data that a data feed will
+send to an anomaly detection job.
+
+===== Request
+
+`GET _xpack/ml/datafeeds/<feed_id>/_preview`
+
+////
+===== Description
+
+Important:: Updates do not take effect until after the job is closed and new
+data is sent to it.
+////
+===== Path Parameters
+
+`feed_id` (required)::
+  (+string+) Identifier for the data feed
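+
+===== Examples
+
+The following sketch previews a hypothetical data feed named `datafeed-it-ops`;
+the response format is not yet documented here:
+
+[source,js]
+--------------------------------------------------
+GET _xpack/ml/datafeeds/datafeed-it-ops/_preview
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+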
+////
+===== Request Body
+
+The following properties can be updated after the job is created:
+
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+[NOTE]
+* You can update the `analysis_limits` only while the job is closed.
+* The `model_memory_limit` property value cannot be decreased.
+* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
+increasing the `model_memory_limit` is not recommended.
+
+`description`::
+  (+string+) An optional description of the job.
+
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+////
+////
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example updates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
+{
+  "description":"New description",
+  "analysis_limits":{
+    "model_memory_limit": 8192
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is updated, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "New description",
+  ...
+  "analysis_limits": {
+    "model_memory_limit": 8192
+  ...
+}
+----
+////
diff --git a/docs/en/rest-api/ml/put-datafeed.asciidoc b/docs/en/rest-api/ml/put-datafeed.asciidoc
new file mode 100644
index 00000000000..fc13e149c4d
--- /dev/null
+++ b/docs/en/rest-api/ml/put-datafeed.asciidoc
@@ -0,0 +1,109 @@
+[[ml-put-datafeed]]
+==== Create Data Feeds
+
+The create data feed API allows you to instantiate a data feed.
+
+===== Request
+
+`PUT _xpack/ml/datafeeds/<feed_id>`
+
+////
+===== Description
+
+TBD
+////
+===== Path Parameters
+
+`feed_id` (required)::
+  (+string+) Identifier for the data feed
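+
+===== Examples
+
+The following sketch creates a data feed for a hypothetical `it-ops-kpi` job.
+The request body is an assumption (the supported properties are still being
+documented), and `it-ops` and `logs` are placeholder index and type names:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
+{
+  "job_id": "it-ops-kpi",
+  "indexes": ["it-ops"],
+  "types": ["logs"]
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+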
+////
+===== Request Body
+
+`description`::
+  (+string+) An optional description of the job.
+
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>.
+
+`data_description`::
+  (+object+) Describes the format of the input data.
+  See <<ml-datadescription,data description objects>>.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example creates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/it-ops-kpi
+{
+  "description":"First simple job",
+  "analysis_config":{
+    "bucket_span": "5m",
+    "latency": "0ms",
+    "detectors":[
+      {
+        "detector_description": "low_sum(events_per_min)",
+        "function":"low_sum",
+        "field_name": "events_per_min"
+      }
+    ]
+  },
+  "data_description": {
+    "time_field":"@timestamp",
+    "time_format":"epoch_ms"
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is created, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "First simple job",
+  "create_time": 1491247016391,
+  "analysis_config": {
+    "bucket_span": "5m",
+    "latency": "0ms",
+    "detectors": [
+      {
+        "detector_description": "low_sum(events_per_min)",
+        "function": "low_sum",
+        "field_name": "events_per_min",
+        "detector_rules": []
+      }
+    ],
+    "influencers": [],
+    "use_per_partition_normalization": false
+  },
+  "data_description": {
+    "time_field": "@timestamp",
+    "time_format": "epoch_ms"
+  },
+  "model_snapshot_retention_days": 1,
+  "results_index_name": "shared"
+}
+----
+////
diff --git a/docs/en/rest-api/ml/put-job.asciidoc b/docs/en/rest-api/ml/put-job.asciidoc
new file mode 100644
index 00000000000..90edc8a8e75
--- /dev/null
+++ b/docs/en/rest-api/ml/put-job.asciidoc
@@ -0,0 +1,109 @@
+[[ml-put-job]]
+==== Create Jobs
+
+The create job API allows you to instantiate a {ml} job.
+
+===== Request
+
+`PUT _xpack/ml/anomaly_detectors/<job_id>`
+
+////
+===== Description
+
+TBD
+////
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+===== Request Body
+
+`description`::
+  (+string+) An optional description of the job.
+
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>.
+
+`data_description`::
+  (+object+) Describes the format of the input data.
+  See <<ml-datadescription,data description objects>>.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+////
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+////
+////
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+
+===== Examples
+
+The following example creates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/it-ops-kpi
+{
+  "description":"First simple job",
+  "analysis_config":{
+    "bucket_span": "5m",
+    "latency": "0ms",
+    "detectors":[
+      {
+        "detector_description": "low_sum(events_per_min)",
+        "function":"low_sum",
+        "field_name": "events_per_min"
+      }
+    ]
+  },
+  "data_description": {
+    "time_field":"@timestamp",
+    "time_format":"epoch_ms"
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is created, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "First simple job",
+  "create_time": 1491247016391,
+  "analysis_config": {
+    "bucket_span": "5m",
+    "latency": "0ms",
+    "detectors": [
+      {
+        "detector_description": "low_sum(events_per_min)",
+        "function": "low_sum",
+        "field_name": "events_per_min",
+        "detector_rules": []
+      }
+    ],
+    "influencers": [],
+    "use_per_partition_normalization": false
+  },
+  "data_description": {
+    "time_field": "@timestamp",
+    "time_format": "epoch_ms"
+  },
+  "model_snapshot_retention_days": 1,
+  "results_index_name": "shared"
+}
+----
diff --git a/docs/en/rest-api/ml/resultsresource.asciidoc b/docs/en/rest-api/ml/resultsresource.asciidoc
new file mode 100644
index 00000000000..a8aae6e974b
--- /dev/null
+++ b/docs/en/rest-api/ml/resultsresource.asciidoc
@@ -0,0 +1,10 @@
+[[ml-results-resource]]
+==== Results Resources
+
+A results resource has the following properties:
+
+TBD
+////
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig,analysis configuration objects>>.
+////
diff --git a/docs/en/rest-api/ml/revert-snapshot.asciidoc b/docs/en/rest-api/ml/revert-snapshot.asciidoc
new file mode 100644
index 00000000000..cb0f7572665
--- /dev/null
+++ b/docs/en/rest-api/ml/revert-snapshot.asciidoc
@@ -0,0 +1,89 @@
+[[ml-revert-snapshot]]
+==== Revert Model Snapshots
+
+The revert model snapshot API allows you to revert a job to a previous model snapshot.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_revert`
+
+////
+===== Description
+
+TBD
+////
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+`snapshot_id` (required)::
+  (+string+) Identifier for the model snapshot
+
+===== Request Body
+
+TBD
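+
+===== Examples
+
+The following sketch reverts the `it-ops-kpi` job to a hypothetical snapshot
+identifier (snapshot identifiers can be retrieved with the get model snapshots
+API):
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/anomaly_detectors/it-ops-kpi/model_snapshots/1491247016/_revert
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+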
+////
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+[NOTE]
+* You can update the `analysis_limits` only while the job is closed.
+* The `model_memory_limit` property value cannot be decreased.
+* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
+increasing the `model_memory_limit` is not recommended.
+
+`description`::
+  (+string+) An optional description of the job.
+
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example updates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
+{
+  "description":"New description",
+  "analysis_limits":{
+    "model_memory_limit": 8192
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is updated, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "New description",
+  ...
+  "analysis_limits": {
+    "model_memory_limit": 8192
+  ...
+}
+----
+////
diff --git a/docs/en/rest-api/ml/snapshotresource.asciidoc b/docs/en/rest-api/ml/snapshotresource.asciidoc
new file mode 100644
index 00000000000..0c0b53fa35e
--- /dev/null
+++ b/docs/en/rest-api/ml/snapshotresource.asciidoc
@@ -0,0 +1,10 @@
+[[ml-snapshot-resource]]
+==== Model Snapshot Resources
+
+A model snapshot resource has the following properties:
+
+TBD
+////
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig,analysis configuration objects>>.
+////
diff --git a/docs/en/rest-api/ml/start-datafeed.asciidoc b/docs/en/rest-api/ml/start-datafeed.asciidoc
new file mode 100644
index 00000000000..e03054746a9
--- /dev/null
+++ b/docs/en/rest-api/ml/start-datafeed.asciidoc
@@ -0,0 +1,64 @@
+[[ml-start-datafeed]]
+==== Start Data Feeds
+
+A data feed must be started in order for it to retrieve data from {es}.
+A data feed can be started and stopped multiple times throughout its lifecycle.
+
+===== Request
+
+`POST _xpack/ml/datafeeds/<feed_id>/_start`
+
+////
+===== Description
+
+A job must be open in order for it to accept and analyze data.
+
+When you open a new job, it starts with an empty model.
+
+When you open an existing job, the most recent model state is automatically loaded.
+The job is ready to resume its analysis from where it left off, once new data is received.
+////
+===== Path Parameters
+
+`feed_id` (required)::
+(+string+) Identifier for the data feed
+////
+===== Request Body
+
+`open_timeout`::
+  (+time+; default: ++30 min++) Controls the time to wait until a job has opened
+
+`ignore_downtime`::
+  (+boolean+; default: ++true++) If true (default), any gap in data since it was
+  last closed is treated as a maintenance window. That is to say, it is not an anomaly
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example opens the `event_rate` job:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/anomaly_detectors/event_rate/_open
+{
+  "ignore_downtime":false
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job opens, you receive the following results:
+----
+{
+  "opened": true
+}
+----
+////
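+
+===== Examples
+
+The following sketch starts a hypothetical data feed named `datafeed-it-ops-kpi`:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+Assuming the response mirrors the open jobs API, a successful start returns:
+----
+{
+  "started": true
+}
+----
+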
diff --git a/docs/en/rest-api/ml/stop-datafeed.asciidoc b/docs/en/rest-api/ml/stop-datafeed.asciidoc
new file mode 100644
index 00000000000..b3485007dd4
--- /dev/null
+++ b/docs/en/rest-api/ml/stop-datafeed.asciidoc
@@ -0,0 +1,64 @@
+[[ml-stop-datafeed]]
+==== Stop Data Feeds
+
+The stop data feed API allows you to stop a data feed.
+A data feed can be started and stopped multiple times throughout its lifecycle.
+
+===== Request
+
+`POST _xpack/ml/datafeeds/<feed_id>/_stop`
+
+////
+===== Description
+
+A job can be closed once all data has been analyzed.
+
+When you close a job, it runs housekeeping tasks such as pruning the model history,
+flushing buffers, calculating final results, and persisting the internal models.
+Depending upon the size of the job, it could take several minutes to close and
+the equivalent time to re-open.
+
+Once closed, the anomaly detection job has almost no overhead on the cluster
+(except for maintaining its metadata). A closed job is blocked for receiving
+data and analysis operations; however, you can still explore and navigate results.
+
+//NOTE:
+//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
+////
+===== Path Parameters
+
+`feed_id` (required)::
+  (+string+) Identifier for the data feed
+////
+===== Query Parameters
+
+`close_timeout`::
+  (+time+; default: ++30 min++) Controls the time to wait until a job has closed
+
+===== Responses
+
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example closes the `event_rate` job:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/anomaly_detectors/event_rate/_close
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is closed, you receive the following results:
+----
+{
+  "closed": true
+}
+----
+////
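+
+===== Examples
+
+The following sketch stops the hypothetical `datafeed-it-ops-kpi` data feed:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_stop
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+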
diff --git a/docs/en/rest-api/ml/update-datafeed.asciidoc b/docs/en/rest-api/ml/update-datafeed.asciidoc
new file mode 100644
index 00000000000..6dc6fc35bb4
--- /dev/null
+++ b/docs/en/rest-api/ml/update-datafeed.asciidoc
@@ -0,0 +1,84 @@
+[[ml-update-datafeed]]
+==== Update Data Feeds
+
+The update data feed API allows you to update certain properties of a data feed.
+
+===== Request
+
+`POST _xpack/ml/datafeeds/<feed_id>/_update`
+
+////
+===== Description
+
+Important:: Updates do not take effect until after the job is closed and new
+data is sent to it.
+////
+===== Path Parameters
+
+`feed_id` (required)::
+  (+string+) Identifier for the data feed
+
+////
+===== Request Body
+
+The following properties can be updated after the job is created:
+
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+[NOTE]
+* You can update the `analysis_limits` only while the job is closed.
+* The `model_memory_limit` property value cannot be decreased.
+* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
+increasing the `model_memory_limit` is not recommended.
+
+`description`::
+  (+string+) An optional description of the job.
+
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+////
+////
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example updates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
+{
+  "description":"New description",
+  "analysis_limits":{
+    "model_memory_limit": 8192
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is updated, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "New description",
+  ...
+  "analysis_limits": {
+    "model_memory_limit": 8192
+  ...
+}
+----
+////
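+
+===== Examples
+
+The following sketch updates the hypothetical `datafeed-it-ops-kpi` data feed.
+The updatable properties are still TBD, so the `scroll_size` value shown is an
+assumption:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_update
+{
+  "scroll_size": 1000
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+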
diff --git a/docs/en/rest-api/ml/update-job.asciidoc b/docs/en/rest-api/ml/update-job.asciidoc
new file mode 100644
index 00000000000..2939af314cf
--- /dev/null
+++ b/docs/en/rest-api/ml/update-job.asciidoc
@@ -0,0 +1,84 @@
+[[ml-update-job]]
+==== Update Jobs
+
+The update job API allows you to update certain properties of a job.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/<job_id>/_update`
+
+////
+===== Description
+
+Important:: Updates do not take effect until after the job is closed and new
+data is sent to it.
+////
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+===== Request Body
+
+The following properties can be updated after the job is created:
+
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+[NOTE]
+* You can update the `analysis_limits` only while the job is closed.
+* The `model_memory_limit` property value cannot be decreased.
+* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
+increasing the `model_memory_limit` is not recommended.
+
+`description`::
+  (+string+) An optional description of the job.
+
+////
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+////
+////
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+////
+
+===== Examples
+
+The following example updates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+POST _xpack/ml/anomaly_detectors/it-ops-kpi/_update
+{
+  "description":"New description",
+  "analysis_limits":{
+    "model_memory_limit": 8192
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is updated, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "New description",
+  ...
+  "analysis_limits": {
+    "model_memory_limit": 8192
+  ...
+}
+----
diff --git a/docs/en/rest-api/ml/update-snapshot.asciidoc b/docs/en/rest-api/ml/update-snapshot.asciidoc
new file mode 100644
index 00000000000..7368486c41d
--- /dev/null
+++ b/docs/en/rest-api/ml/update-snapshot.asciidoc
@@ -0,0 +1,89 @@
+[[ml-update-snapshot]]
+==== Update Model Snapshots
+
+The update model snapshot API allows you to update certain properties of a snapshot.
+
+===== Request
+
+`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_update`
+
+////
+===== Description
+
+Important:: Updates do not take effect until after the job is closed and new
+data is sent to it.
+////
+===== Path Parameters
+
+`job_id` (required)::
+  (+string+) Identifier for the job
+
+`snapshot_id` (required)::
+  (+string+) Identifier for the model snapshot
+
+===== Request Body
+
+The following properties can be updated after the snapshot is created:
+
+TBD
+
+////
+`analysis_config`::
+  (+object+) The analysis configuration, which specifies how to analyze the data.
+  See <<ml-analysisconfig,analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
+
+`analysis_limits`::
+  Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
+
+[NOTE]
+* You can update the `analysis_limits` only while the job is closed.
+* The `model_memory_limit` property value cannot be decreased.
+* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
+increasing the `model_memory_limit` is not recommended.
+
+`description`::
+  (+string+) An optional description of the job.
+
+This expects data to be sent in JSON format using the POST `_data` API.
+
+===== Responses
+
+TBD
+200
+(EmptyResponse) The cluster has been successfully deleted
+404
+(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
+412
+(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
+
+===== Examples
+
+The following example updates the `it-ops-kpi` job:
+
+[source,js]
+--------------------------------------------------
+PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
+{
+  "description":"New description",
+  "analysis_limits":{
+    "model_memory_limit": 8192
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:todo]
+
+When the job is updated, you receive the following results:
+----
+{
+  "job_id": "it-ops-kpi",
+  "description": "New description",
+  ...
+  "analysis_limits": {
+    "model_memory_limit": 8192
+  ...
+} +---- +//// diff --git a/docs/en/rest-api/ml/validate-detector.asciidoc b/docs/en/rest-api/ml/validate-detector.asciidoc new file mode 100644 index 00000000000..aa49c2d8073 --- /dev/null +++ b/docs/en/rest-api/ml/validate-detector.asciidoc @@ -0,0 +1,61 @@ +[[ml-valid-detector]] +==== Validate Detectors + +TBD + +===== Request + +`POST _xpack/ml/anomaly_detectors/_validate/detector` + +===== Description + +TBD + +//// +===== Path Parameters + +`job_id` (required):: +(+string+) Identifier for the job +//// +===== Request Body + +TBD +//// +`open_timeout`:: + (+time+; default: ++30 min++) Controls the time to wait until a job has opened + +`ignore_downtime`:: + (+boolean+; default: ++true++) If true (default), any gap in data since it was + last closed is treated as a maintenance window. That is to say, it is not an anomaly + + +===== Responses + +200 +(EmptyResponse) The cluster has been successfully deleted +404 +(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found) +412 +(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error) + +===== Examples + +The following example opens the `event_rate` job: + +[source,js] +-------------------------------------------------- +POST _xpack/ml/anomaly_detectors/event_rate/_open +{ + "ignore_downtime":false +} +-------------------------------------------------- +// CONSOLE +// TEST[skip:todo] + +When the job opens, you receive the following results: +---- +{ + "opened": true +} +---- +//// diff --git a/docs/en/rest-api/ml/validate-job.asciidoc b/docs/en/rest-api/ml/validate-job.asciidoc new file mode 100644 index 00000000000..cf947bafb0a --- /dev/null +++ b/docs/en/rest-api/ml/validate-job.asciidoc @@ -0,0 +1,61 @@ +[[ml-valid-job]] +==== Validate Jobs + +TBD + +===== Request + +`POST _xpack/ml/anomaly_detectors/_validate` + +===== Description + +TBD + +//// +===== Path Parameters + +`job_id` (required):: +(+string+) Identifier for the job +//// +===== Request Body + +TBD +//// +`open_timeout`:: + (+time+; default: ++30 min++) Controls the time to wait until a job has opened + +`ignore_downtime`:: + (+boolean+; default: ++true++) If true (default), any gap in data since it was + last closed is treated as a maintenance window. That is to say, it is not an anomaly + + +===== Responses + +200 +(EmptyResponse) The cluster has been successfully deleted +404 +(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found) +412 +(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error) + +===== Examples + +The following example opens the `event_rate` job: + +[source,js] +-------------------------------------------------- +POST _xpack/ml/anomaly_detectors/event_rate/_open +{ + "ignore_downtime":false +} +-------------------------------------------------- +// CONSOLE +// TEST[skip:todo] + +When the job opens, you receive the following results: +---- +{ + "opened": true +} +---- +////