[DOCS] Add ML documentation to master (elastic/x-pack-elasticsearch#959)

Original commit: elastic/x-pack-elasticsearch@666a10bd23
This commit is contained in:
Lisa Cawley 2017-04-04 15:26:39 -07:00 committed by GitHub
parent d11fbfa70c
commit 843a0d8b3f
43 changed files with 2928 additions and 20 deletions

View File

@ -0,0 +1,84 @@
[[ml-api-quickref]]
== API Quick Reference
All {ml} endpoints have the following base:
----
/_xpack/ml/
----
The main {ml} resources can be accessed with a variety of endpoints:
* <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs.
* <<ml-api-datafeeds,+/datafeeds/+>>: Select data from {es} to be analyzed.
* <<ml-api-results,+/results/+>>: Access the results of a {ml} job.
* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots.
* <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations.
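For example, combining the base shown above with the +/anomaly_detectors/+ resource
produces a request that lists all jobs (a sketch of the get jobs API described later in this section):

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]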
[float]
[[ml-api-jobs]]
=== /anomaly_detectors/
* <<ml-put-job,PUT /anomaly_detectors/<job_id+++>+++>>: Create a job
* <<ml-open-job,POST /anomaly_detectors/<job_id>/_open>>: Open a job
* <<ml-post-data,POST /anomaly_detectors/<job_id+++>+++>>: Send data to a job
* <<ml-get-job,GET /anomaly_detectors>>: List jobs
* <<ml-get-job,GET /anomaly_detectors/<job_id+++>+++>>: Get job details
* <<ml-get-job-stats,GET /anomaly_detectors/<job_id>/_stats>>: Get job statistics
* <<ml-update-job,POST /anomaly_detectors/<job_id>/_update>>: Update certain properties of the job configuration
* <<ml-flush-job,POST /anomaly_detectors/<job_id>/_flush>>: Force a job to analyze buffered data
* <<ml-close-job,POST /anomaly_detectors/<job_id>/_close>>: Close a job
* <<ml-delete-job,DELETE /anomaly_detectors/<job_id+++>+++>>: Delete job
[float]
[[ml-api-datafeeds]]
=== /datafeeds/
* <<ml-put-datafeed,PUT /datafeeds/<feed_id+++>+++>>: Create a data feed
* <<ml-start-datafeed,POST /datafeeds/<feed_id>/_start>>: Start a data feed
* <<ml-get-datafeed,GET /datafeeds>>: List data feeds
* <<ml-get-datafeed,GET /datafeeds/<feed_id+++>+++>>: Get data feed details
* <<ml-get-datafeed-stats,GET /datafeeds/<feed_id>/_stats>>: Get statistical information for data feeds
* <<ml-preview-datafeed,GET /datafeeds/<feed_id>/_preview>>: Get a preview of a data feed
* <<ml-update-datafeed,POST /datafeeds/<feed_id>/_update>>: Update certain settings for a data feed
* <<ml-stop-datafeed,POST /datafeeds/<feed_id>/_stop>>: Stop a data feed
* <<ml-delete-datafeed,DELETE /datafeeds/<feed_id+++>+++>>: Delete data feed
[float]
[[ml-api-results]]
=== /results/
* <<ml-get-bucket,GET /results/buckets>>: List the buckets in the results
* <<ml-get-bucket,GET /results/buckets/<bucket_id+++>+++>>: Get bucket details
* <<ml-get-category,GET /results/categories>>: List the categories in the results
* <<ml-get-category,GET /results/categories/<category_id+++>+++>>: Get category details
* <<ml-get-influencer,GET /results/influencers>>: Get influencer details
* <<ml-get-record,GET /results/records>>: Get records from the results
[float]
[[ml-api-snapshots]]
=== /model_snapshots/
* <<ml-get-snapshot,GET /model_snapshots>>: List model snapshots
* <<ml-get-snapshot,GET /model_snapshots/<snapshot_id+++>+++>>: Get model snapshot details
* <<ml-revert-snapshot,POST /model_snapshots/<snapshot_id>/_revert>>: Revert a model snapshot
* <<ml-update-snapshot,POST /model_snapshots/<snapshot_id>/_update>>: Update certain settings for a model snapshot
* <<ml-delete-snapshot,DELETE /model_snapshots/<snapshot_id+++>+++>>: Delete a model snapshot
[float]
[[ml-api-validate]]
=== /validate/
* <<ml-valid-detector,POST /anomaly_detectors/_validate/detector>>: Validate a detector
* <<ml-valid-job,POST /anomaly_detectors/_validate>>: Validate a job
//[float]
//== Where to Go Next
//<<ml-getting-started, Getting Started>> :: Enable machine learning and start
//discovering anomalies in your data.
//[float]
//== Have Comments, Questions, or Feedback?
//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
//suggestions.

View File

@ -0,0 +1,11 @@
[[ml-getting-started]]
== Getting Started
To start exploring anomalies in your data:
. Open Kibana in your web browser and log in. If you are running Kibana
locally, go to `http://localhost:5601/`.
. Click **ML** in the side navigation ...
//image::graph-open.jpg["Accessing Graph"]

docs/en/ml/index.asciidoc Normal file
View File

@ -0,0 +1,23 @@
[[xpack-ml]]
= Machine Learning in the Elastic Stack
[partintro]
--
Data stored in {es} contains valuable insights into the behavior and
performance of your business and systems. However, the following questions can
be difficult to answer:
* Is the response time of my website unusual?
* Are any of my users exfiltrating an unusual amount of data?
The good news is that the {xpack} machine learning capabilities enable you to
easily answer these types of questions.
--
include::introduction.asciidoc[]
include::getting-started.asciidoc[]
include::ml-scenarios.asciidoc[]
include::api-quickref.asciidoc[]
//include::troubleshooting.asciidoc[] Referenced from x-pack/docs/public/xpack-troubleshooting.asciidoc
//include::release-notes.asciidoc[] Referenced from x-pack/docs/public/xpack-release-notes.asciidoc

View File

@ -0,0 +1,34 @@
[[ml-introduction]]
== Introduction
Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.
Proprietary machine learning algorithms detect anomalies related to temporal
deviations in values, counts, or frequencies, statistical rarity, and unusual
behaviors for a member of a population, then score them and link them with
statistically significant influencers in the data.
Automated periodicity detection and quick adaptation to changing data ensure
that you don't need to specify algorithms, models, or other data
science-related configurations in order to get the benefits of {ml}.
//image::graph-network.jpg["Graph network"]
=== Integration with the Elastic Stack
Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis and anomaly results are displayed in
{kb} dashboards.
//[float]
//== Where to Go Next
//<<ml-getting-started, Getting Started>> :: Enable machine learning and start
//discovering anomalies in your data.
//[float]
//== Have Comments, Questions, or Feedback?
//Head over to our {forum}[Graph Discussion Forum] to share your experience, questions, and
//suggestions.

View File

@ -0,0 +1,32 @@
[[ml-limitations]]
== Machine Learning Limitations
[float]
=== Misleading High Missing Field Counts
//See x-pack-elasticsearch/#684
One of the counts associated with a {ml} job is +missing_field_count+,
which indicates the number of records that are missing a configured field.
This information is most useful when your job analyzes CSV data. In this case,
missing fields indicate that data is not being analyzed and you might receive poor results.
If your job analyzes JSON data, the +missing_field_count+ might be misleading.
Missing fields might be expected due to the structure of the data and therefore do
not generate poor results.
//When you refer to a file script in a watch, the watch itself is not updated
//if you change the script on the filesystem.
//Currently, the only way to reload a file script in a watch is to delete
//the watch and recreate it.
//=== The _data Endpoint Requires Data to be in JSON Format
//See x-pack-elasticsearch/#777
//=== tBD
//See x-pack-elasticsearch/#601
//When you use aggregations, you must ensure +size+ is configured correctly.
//Otherwise, not all data will be analyzed.

View File

@ -0,0 +1,100 @@
[[ml-scenarios]]
== Use Cases
Enterprises, government organizations, and cloud-based service providers process
volumes of machine data daily that are so massive as to make real-time human
analysis impossible. Changing behaviors hidden in this data provide the
information needed to quickly resolve a massive service outage, detect security
breaches before they result in the theft of millions of credit records, or
identify the next big trend in consumer patterns. Current search and analysis,
performance management, and cyber security tools are unable to find these
anomalies without significant human work in the form of thresholds, rules,
signatures, and data models.
Advanced anomaly detection techniques learn the normal behavior patterns
represented by the data and identify and cross-correlate anomalies. As a result,
performance, security, and operational anomalies and their causes can be
identified as they develop, so they can be acted on before they impact the business.
While anomaly detection is applicable to any type of data, we focus on machine
data scenarios. Enterprise application developers, cloud service providers, and
technology vendors need to harness the power of machine learning based anomaly
detection analytics to better manage complex online services, detect the
earliest signs of advanced security threats, and gain insight into the business
opportunities and risks represented by changing behaviors hidden in their
massive data sets. Here are some real-world examples.
=== Eliminating noise generated by threshold-based alerts
Modern IT systems are highly instrumented and can generate TBs of machine data
a day. Traditional methods for analyzing data involve alerting when metric
values exceed a known value (static thresholds) or looking for simple statistical deviations (dynamic thresholds).
Setting accurate thresholds for each metric at different times of day is
practically impossible. As a result, static thresholds generate large volumes
of false positives (thresholds set too low) and false negatives (thresholds set too high).
The {ml} features in {xpack} automatically learn and calculate the probability
of a value being anomalous based on its historical behavior.
This enables accurate alerting and highlights only the subset of relevant metrics
that have changed. These alerts provide actionable insight into what is a growing
mountain of data.
=== Reducing troubleshooting times and subject matter expert (SME) involvement
It is said that 75 percent of troubleshooting time is spent mining data to try
and identify the root cause of an incident. The {ml} features in {xpack}
automatically analyze data and boil down the massive volume of information
to the few metrics or log messages that have changed behavior.
This allows the subject matter experts (SMEs) to focus on the subset of
information that is relevant to an issue, which greatly reduces triage time.
//In a major credit services provider, within a month of deployment, the company
//reported that its overall time to triage was reduced by 70 percent and the use of
//outside SMEs time to troubleshoot was decreased by 80 percent.
=== Finding and fixing issues before they impact the end user
Large-scale systems, such as online banking, typically require complex
infrastructures involving hundreds of different interdependent applications.
Just accessing an account summary page might involve dozens of different
databases, systems and applications.
Because of their importance to the business, these systems are typically highly
resilient, and a critical problem will not be allowed to recur.
If a problem happens, it is likely to be complicated and to be the result of a
causal sequence of events that spans multiple interacting resources.
Troubleshooting would require the analysis of large volumes of data with a wide
range of characteristics and data types. A variety of experts from multiple
disciplines would need to participate in time-consuming “war rooms” to mine
the data for answers.
By using {ml} in real-time, large volumes of data can be analyzed to provide
alerts to early indicators of problems and highlight the events that were likely
to have contributed to the problem.
=== Finding rare events that may be symptomatic of a security issue
With several hundred servers under management, the presence of new processes
running might indicate a security breach.
Using typical operational management techniques, each server would require a
period of baselining in order to identify which processes are considered standard.
Ideally a baseline would be created for each server (or server group)
and would be periodically updated, making this a large management overhead.
By using {ml} features in {xpack}, baselines are automatically built based
upon normal behavior patterns for each host and alerts are generated when rare
events occur.
=== Finding anomalies in periodic data
For data that has periodicity, it is difficult for standard monitoring tools to
accurately tell whether a change in the data is due to a service outage or is a
result of usual time schedules. Daily and weekly trends in the data, along with
peak and off-peak hours, make it difficult to identify anomalies using standard
threshold-based methods. Minimum and maximum thresholds for SMS text activity at 2am
would be very different from the thresholds that are effective during the day.
By using {ml}, time-related trends are automatically identified and smoothed,
leaving the residual to be analyzed for anomalies.

View File

@ -0,0 +1,12 @@
[[ml-release-notes]]
== Machine Learning Release Notes
[[ml-change-list]]
=== Change List
[float]
==== 5.4.0
May 2017
* Introduces Machine Learning in the Elastic Stack.

View File

@ -0,0 +1,4 @@
[[ml-troubleshooting]]
== Machine Learning Troubleshooting
TBD

View File

@ -0,0 +1,70 @@
[[ml-apis]]
== Machine Learning APIs
Use machine learning to detect anomalies in time series data.
* <<ml-api-datafeed-endpoint,Datafeeds>>
* <<ml-api-job-endpoint,Jobs>>
* <<ml-api-snapshot-endpoint, Model Snapshots>>
* <<ml-api-result-endpoint,Results>>
* <<ml-api-definitions, Definitions>>
[[ml-api-datafeed-endpoint]]
=== Datafeeds
include::ml/put-datafeed.asciidoc[]
include::ml/delete-datafeed.asciidoc[]
include::ml/get-datafeed.asciidoc[]
include::ml/get-datafeed-stats.asciidoc[]
include::ml/preview-datafeed.asciidoc[]
include::ml/start-datafeed.asciidoc[]
include::ml/stop-datafeed.asciidoc[]
include::ml/update-datafeed.asciidoc[]
[[ml-api-job-endpoint]]
=== Jobs
include::ml/close-job.asciidoc[]
include::ml/put-job.asciidoc[]
include::ml/delete-job.asciidoc[]
include::ml/get-job.asciidoc[]
include::ml/get-job-stats.asciidoc[]
include::ml/flush-job.asciidoc[]
include::ml/open-job.asciidoc[]
include::ml/post-data.asciidoc[]
include::ml/update-job.asciidoc[]
include::ml/validate-job.asciidoc[]
include::ml/validate-detector.asciidoc[]
[[ml-api-snapshot-endpoint]]
=== Model Snapshots
include::ml/delete-snapshot.asciidoc[]
include::ml/get-snapshot.asciidoc[]
include::ml/revert-snapshot.asciidoc[]
include::ml/update-snapshot.asciidoc[]
[[ml-api-result-endpoint]]
=== Results
include::ml/get-bucket.asciidoc[]
include::ml/get-category.asciidoc[]
include::ml/get-influencer.asciidoc[]
include::ml/get-record.asciidoc[]
[[ml-api-definitions]]
=== Definitions
include::ml/datafeedresource.asciidoc[]
include::ml/jobresource.asciidoc[]
include::ml/jobcounts.asciidoc[]
include::ml/snapshotresource.asciidoc[]
include::ml/resultsresource.asciidoc[]
//* <<ml-put-job>>
//* <<ml-delete-job>>
//* <<ml-get-job>>
//* <<ml-open-close-job>>
//* <<ml-flush-job>>
//* <<ml-post-data>>

View File

@ -1,20 +0,0 @@
[[ml-api]]
== Machine Learning APIs
Use machine learning to detect anomalies in time series data.
//=== Job Management APIs
//* <<ml-put-job>>
//* <<ml-delete-job>>
//* <<ml-get-job>>
//* <<ml-open-close-job>>
//* <<ml-flush-job>>
//* <<ml-post-data>>
//include::ml/put-job.asciidoc[]
//include::ml/delete-job.asciidoc[]
//include::ml/get-job.asciidoc[]
//include::ml/open-close-job.asciidoc[]
//include::ml/flush-job.asciidoc[]
//include::ml/post-data.asciidoc[]

View File

@ -0,0 +1,63 @@
[[ml-close-job]]
==== Close Jobs
An anomaly detection job must be opened in order for it to be ready to receive and analyze data.
A job may be opened and closed multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_close`
===== Description
A job can be closed once all data has been analyzed.
When you close a job, it runs housekeeping tasks such as pruning the model history,
flushing buffers, calculating final results, and persisting the internal models.
Depending upon the size of the job, it could take several minutes to close and
the equivalent time to re-open.
Once closed, the anomaly detection job has almost no overhead on the cluster
(except for maintaining its metadata). A closed job cannot receive data or perform
analysis operations, but you can still explore and navigate results.
//NOTE:
//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Query Parameters
`close_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has closed
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example closes the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_close
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is closed, you receive the following results:
----
{
"closed": true
}
----

View File

@ -0,0 +1,10 @@
[[ml-datafeed-resource]]
==== Data Feed Resources
A data feed resource has the following properties:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
////

View File

@ -0,0 +1,56 @@
[[ml-delete-datafeed]]
==== Delete Data Feeds
The delete data feed API allows you to delete an existing data feed.
===== Request
`DELETE _xpack/ml/datafeeds/<feed_id>`
////
===== Description
All job configuration, model state and results are deleted.
IMPORTANT: Deleting a job must be done via this API only. Do not delete the
job directly from the `.ml-*` indices using the Elasticsearch
DELETE Document API. When {security} is enabled, make sure no `write`
privileges are granted to anyone over the `.ml-*` indices.
Before you can delete a job, you must delete the data feeds that are associated with it.
//See <<>>.
It is not currently possible to delete multiple jobs using wildcards or a comma separated list.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example deletes the `datafeed-it-ops` data feed:
[source,js]
--------------------------------------------------
DELETE _xpack/ml/datafeeds/datafeed-it-ops
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the data feed is deleted, you receive the following results:
----
{
"acknowledged": true
}
----

View File

@ -0,0 +1,55 @@
[[ml-delete-job]]
==== Delete Jobs
The delete job API allows you to delete an existing anomaly detection job.
===== Request
`DELETE _xpack/ml/anomaly_detectors/<job_id>`
===== Description
All job configuration, model state and results are deleted.
IMPORTANT: Deleting a job must be done via this API only. Do not delete the
job directly from the `.ml-*` indices using the Elasticsearch
DELETE Document API. When {security} is enabled, make sure no `write`
privileges are granted to anyone over the `.ml-*` indices.
Before you can delete a job, you must delete the data feeds that are associated with it.
//See <<>>.
It is not currently possible to delete multiple jobs using wildcards or a comma-separated list.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example deletes the `event_rate` job:
[source,js]
--------------------------------------------------
DELETE _xpack/ml/anomaly_detectors/event_rate
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is deleted, you receive the following results:
----
{
"acknowledged": true
}
----

View File

@ -0,0 +1,60 @@
[[ml-delete-snapshot]]
==== Delete Model Snapshots
The delete model snapshot API allows you to delete an existing model snapshot.
===== Request
`DELETE _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
////
===== Description
All job configuration, model state and results are deleted.
IMPORTANT: Deleting a job must be done via this API only. Do not delete the
job directly from the `.ml-*` indices using the Elasticsearch
DELETE Document API. When {security} is enabled, make sure no `write`
privileges are granted to anyone over the `.ml-*` indices.
Before you can delete a job, you must delete the data feeds that are associated with it.
//See <<>>.
It is not currently possible to delete multiple jobs using wildcards or a comma separated list.
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
`snapshot_id` (required)::
(+string+) Identifier for the model snapshot
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example deletes the `event_rate` job:
[source,js]
--------------------------------------------------
DELETE _xpack/ml/anomaly_detectors/event_rate
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is deleted, you receive the following results:
----
{
"acknowledged": true
}
----
////

View File

@ -0,0 +1,49 @@
[[ml-flush-job]]
==== Flush Jobs
The flush job API forces any buffered data to be processed by the {ml} job.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_flush`
===== Description
The flush job API is only applicable when sending data for analysis using the POST `_data` API.
Depending on the content of the buffer, it might additionally calculate new results.
The flush and close operations are similar; however, flush is more efficient if you expect to send more data for analysis.
When flushing, the job remains open and is available to continue analyzing data.
A close operation additionally prunes and persists the model state to disk, and the job must be opened again before analyzing further data.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Query Parameters
`calc_interim`::
(+boolean+; default: ++false++) If true, calculates interim
results for the most recent bucket or all buckets within the latency period
`start`::
(+string+; default: ++null++) When used in conjunction with `calc_interim`,
specifies the start of the range of buckets on which to calculate interim results
`end`::
(+string+; default: ++null++) When used in conjunction with `calc_interim`,
specifies the end of the range of buckets on which to calculate interim results
`advance_time`::
(+string+; default: ++null++) Specifies that no data prior to the date `advance_time` is expected
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
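===== Examples

The following is a minimal sketch of a flush request; it reuses the hypothetical `event_rate`
job from the other examples and calculates interim results for the most recent bucket:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_flush?calc_interim=true
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]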

View File

@ -0,0 +1,86 @@
[[ml-get-bucket]]
==== Get Buckets
The get bucket API allows you to retrieve information about buckets in the results from a job.
===== Request
`GET _xpack/ml/anomaly_detectors/<job_id>/results/buckets` +
`GET _xpack/ml/anomaly_detectors/<job_id>/results/buckets/<timestamp>`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job
`timestamp`::
(+string+) The timestamp of a single bucket result. If you do not specify this optional parameter,
the API returns information about all buckets that you have authority to view in the job.
////
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
}
]
}
----
////
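===== Examples

As a sketch, the following request retrieves all bucket results for the hypothetical
`it-ops-kpi` job that appears in the other examples:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi/results/buckets
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]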

View File

@ -0,0 +1,86 @@
[[ml-get-category]]
==== Get Categories
The get categories API allows you to retrieve information about the categories in the results for a job.
===== Request
`GET _xpack/ml/anomaly_detectors/<job_id>/results/categories` +
`GET _xpack/ml/anomaly_detectors/<job_id>/results/categories/<category_id>`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job.
`category_id`::
(+string+) Identifier for the category. If you do not specify this optional parameter,
the API returns information about all categories that you have authority to view.
////
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
}
]
}
----
////
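===== Examples

As a sketch, the following request retrieves all categories for the hypothetical
`it-ops-kpi` job used in the other examples:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi/results/categories
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]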

View File

@ -0,0 +1,105 @@
[[ml-get-datafeed-stats]]
==== Get Data Feed Statistics
The get data feed statistics API allows you to retrieve usage information for data feeds.
===== Request
`GET _xpack/ml/datafeeds/_stats` +
`GET _xpack/ml/datafeeds/<feed_id>/_stats`
////
===== Description
TBD
////
===== Path Parameters
`feed_id`::
(+string+) Identifier for the data feed. If you do not specify this optional parameter,
the API returns information about all data feeds that you have authority to view.
////
===== Results
The API returns the following usage information:
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`data_counts`::
(+object+) An object that describes the number of records processed and any related error counts.
See <<ml-datacounts,data counts objects>>.
`model_size_stats`::
(+object+) An object that provides information about the size and contents of the model.
See <<ml-modelsizestats,model size stats objects>>
`state`::
(+string+) The status of the job, which can be one of the following values:
running:: The job is actively receiving and processing data.
closed:: The job finished successfully with its model state persisted.
The job is still available to accept further data. NOTE: If you send data in a periodic cycle
and close the job at the end of each transaction, the job is marked as closed in the intervals
between when data is sent. For example, if data is sent every minute and it takes 1 second to process,
the job has a closed state for 59 seconds.
failed:: The job did not finish successfully due to an error. NOTE: This can occur due to invalid input data.
In this case, sending corrected data to a failed job re-opens the job and resets it to a running state.
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"data_counts": {
"job_id": "it-ops",
"processed_record_count": 43272,
"processed_field_count": 86544,
"input_bytes": 2846163,
"input_field_count": 86544,
"invalid_date_count": 0,
"missing_field_count": 0,
"out_of_order_timestamp_count": 0,
"empty_bucket_count": 0,
"sparse_bucket_count": 0,
"bucket_count": 4329,
"earliest_record_timestamp": 1454020560000,
"latest_record_timestamp": 1455318900000,
"last_data_time": 1491235405945,
"input_record_count": 43272
},
"model_size_stats": {
"job_id": "it-ops",
"result_type": "model_size_stats",
"model_bytes": 25586,
"total_by_field_count": 3,
"total_over_field_count": 0,
"total_partition_field_count": 2,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1491235406000,
"timestamp": 1455318600000
},
"state": "closed"
}
]
}
----
////
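===== Examples

As a sketch, the following request retrieves usage information for the hypothetical
`datafeed-it-ops` data feed used in the other examples:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-it-ops/_stats
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]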

View File

@ -0,0 +1,92 @@
[[ml-get-datafeed]]
==== Get Data Feeds
The get data feeds API allows you to retrieve configuration information about data feeds.
===== Request
`GET _xpack/ml/datafeeds/` +
`GET _xpack/ml/datafeeds/<feed_id>`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`feed_id`::
(+string+) Identifier for the data feed. If you do not specify this optional parameter,
the API returns information about all data feeds that you have authority to view.
===== Results
The API returns information about the data feed resource.
//For more information, see <<ml-job-resource,job resources>>.
////
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
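The following sketch requests the configuration of the hypothetical `datafeed-it-ops`
data feed shown in the results below:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-it-ops
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]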
.Example results for a single data feed
----
{
"count": 1,
"datafeeds": [
{
"datafeed_id": "datafeed-it-ops",
"job_id": "it-ops",
"query_delay": "60s",
"frequency": "150s",
"indexes": [
"it_ops_metrics"
],
"types": [
"network",
"kpi",
"sql"
],
"query": {
"match_all": {
"boost": 1
}
},
"aggregations": {
"@timestamp": {
"histogram": {
"field": "@timestamp",
"interval": 30000,
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"events_per_min": {
"sum": {
"field": "events_per_min"
}
}
}
}
},
"scroll_size": 1000
}
]
}
----

View File

@ -0,0 +1,81 @@
[[ml-get-influencer]]
==== Get Influencers
The get influencers API allows you to retrieve information about the influencers in a job.
===== Request
`GET _xpack/ml/anomaly_detectors/<job_id>/results/influencers`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job.
////
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
}
]
}
----
////
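===== Examples

As a sketch, the following request retrieves the influencers for the hypothetical
`it-ops-kpi` job used in the other examples:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi/results/influencers
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]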

View File

@ -0,0 +1,103 @@
[[ml-get-job-stats]]
==== Get Job Statistics
The get job statistics API allows you to retrieve usage information for jobs.
===== Request
`GET _xpack/ml/anomaly_detectors/_stats` +
`GET _xpack/ml/anomaly_detectors/<job_id>/_stats`
////
===== Description
TBD
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job. If you do not specify this optional parameter,
the API returns information about all jobs that you have authority to view.
===== Results
The API returns the following usage information:
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`data_counts`::
(+object+) An object that describes the number of records processed and any related error counts.
See <<ml-datacounts,data counts objects>>.
`model_size_stats`::
(+object+) An object that provides information about the size and contents of the model.
See <<ml-modelsizestats,model size stats objects>>
`state`::
(+string+) The status of the job, which can be one of the following values:
running:: The job is actively receiving and processing data.
closed:: The job finished successfully with its model state persisted.
The job is still available to accept further data. NOTE: If you send data in a periodic cycle
and close the job at the end of each transaction, the job is marked as closed in the intervals
between when data is sent. For example, if data is sent every minute and it takes 1 second to process,
the job has a closed state for 59 seconds.
failed:: The job did not finish successfully due to an error. NOTE: This can occur due to invalid input data.
In this case, sending corrected data to a failed job re-opens the job and resets it to a running state.
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
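The following sketch requests statistics for the hypothetical `it-ops-kpi` job
shown in the results below:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi/_stats
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]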
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"data_counts": {
"job_id": "it-ops",
"processed_record_count": 43272,
"processed_field_count": 86544,
"input_bytes": 2846163,
"input_field_count": 86544,
"invalid_date_count": 0,
"missing_field_count": 0,
"out_of_order_timestamp_count": 0,
"empty_bucket_count": 0,
"sparse_bucket_count": 0,
"bucket_count": 4329,
"earliest_record_timestamp": 1454020560000,
"latest_record_timestamp": 1455318900000,
"last_data_time": 1491235405945,
"input_record_count": 43272
},
"model_size_stats": {
"job_id": "it-ops",
"result_type": "model_size_stats",
"model_bytes": 25586,
"total_by_field_count": 3,
"total_over_field_count": 0,
"total_partition_field_count": 2,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1491235406000,
"timestamp": 1455318600000
},
"state": "closed"
}
]
}
----

View File

@ -0,0 +1,82 @@
[[ml-get-job]]
==== Get Job Details
The get jobs API allows you to retrieve configuration information about jobs.
===== Request
`GET _xpack/ml/anomaly_detectors/` +
`GET _xpack/ml/anomaly_detectors/<job_id>`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job. If you do not specify this optional parameter,
the API returns information about all jobs that you have authority to view.
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
////
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
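The following sketch requests the configuration of the hypothetical `it-ops-kpi` job
shown in the results below:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]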
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
}
]
}
----

View File

@ -0,0 +1,81 @@
[[ml-get-record]]
==== Get Records
The get records API allows you to retrieve records from the results that were generated by a job.
===== Request
`GET _xpack/ml/anomaly_detectors/<job_id>/results/records`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job.
////
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
}
]
}
----
////
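===== Examples

As a sketch, the following request retrieves the records for the hypothetical
`it-ops-kpi` job used in the other examples:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi/results/records
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]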

View File

@ -0,0 +1,86 @@
[[ml-get-snapshot]]
==== Get Model Snapshots
The get model snapshots API allows you to retrieve information about model snapshots.
===== Request
`GET _xpack/ml/anomaly_detectors/<job_id>/model_snapshots` +
`GET _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
////
===== Description
OUTDATED?: The get job API can also be applied to all jobs by using `_all` as the job name.
////
===== Path Parameters
`job_id`::
(+string+) Identifier for the job.
`snapshot_id`::
(+string+) Identifier for the model snapshot. If you do not specify this optional parameter,
the API returns information about all model snapshots that you have authority to view.
////
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
.Example results for a single job
----
{
"count": 1,
"jobs": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
}
]
}
----
////
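===== Examples

As a sketch, the following request retrieves all model snapshots for the hypothetical
`it-ops-kpi` job used in the other examples:

[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it-ops-kpi/model_snapshots
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]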

View File

@ -0,0 +1,120 @@
[[ml-jobcounts]]
==== Job Counts
The `data_counts` object provides information about the operational progress of a job.
It describes the number of records processed and any related error counts.
NOTE: Job count values are cumulative for the lifetime of a job. If a model snapshot is reverted
or old results are deleted, the job counts are not reset.
[[ml-datacounts]]
===== Data Counts Objects
A `data_counts` object has the following properties:
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`processed_record_count`::
(+long+) The number of records that have been processed by the job.
This value includes records with missing fields, since they are nonetheless analyzed.
The following records are not processed:
* Records not in chronological order and outside the latency window
* Records with invalid timestamps
* Records filtered by an exclude transform
`processed_field_count`::
(+long+) The total number of fields in all the records that have been processed by the job.
Only fields that are specified in the detector configuration object contribute to this count.
The time stamp is not included in this count.
`input_bytes`::
(+long+) The number of raw bytes read by the job.
`input_field_count`::
(+long+) The total number of record fields read by the job. This count includes
fields that are not used in the analysis.
`invalid_date_count`::
(+long+) The number of records with either a missing date field or a date that could not be parsed.
`missing_field_count`::
(+long+) The number of records that are missing a field that the job is configured to analyze.
Records with missing fields are still processed because it is possible that not all fields are missing.
The value of `processed_record_count` includes this count.
`out_of_order_timestamp_count`::
(+long+) The number of records that are out of time sequence and outside of the latency window.
These records are discarded, since jobs require time series data to be in ascending chronological order.
`empty_bucket_count`::
TBD
`sparse_bucket_count`::
TBD
`bucket_count`::
(+long+) The number of bucket results produced by the job.
`earliest_record_timestamp`::
(+string+) The timestamp of the earliest chronologically ordered record.
The datetime string is in ISO 8601 format.
`latest_record_timestamp`::
(+string+) The timestamp of the last chronologically ordered record.
If the records are not in strict chronological order, this value might not be
the same as the timestamp of the last record.
The datetime string is in ISO 8601 format.
`last_data_time`::
TBD
`input_record_count`::
(+long+) The number of data records read by the job.
[[ml-modelsizestats]]
===== Model Size Stats Objects
The `model_size_stats` object has the following properties:
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`result_type`::
TBD
`model_bytes`::
(+long+) The number of bytes of memory used by the models. This is the maximum value since the
last time the model was persisted. If the job is closed, this value indicates the latest size.
`total_by_field_count`::
(+long+) The number of `by` field values that were analyzed by the models.
NOTE: The `by` field values are counted separately for each detector and partition.
`total_over_field_count`::
(+long+) The number of `over` field values that were analyzed by the models.
NOTE: The `over` field values are counted separately for each detector and partition.
`total_partition_field_count`::
(+long+) The number of `partition` field values that were analyzed by the models.
`bucket_allocation_failures_count`::
TBD
`memory_status`::
(+string+) The status of the mathematical models. This property can have one of the following values:
"ok":: The models stayed below the configured value.
"soft_limit":: The models used more than 60% of the configured memory limit and older unused models will
be pruned to free up space.
"hard_limit":: The models used more space than the configured memory limit. As a result,
not all incoming data was processed.
`log_time`::
TBD
`timestamp`::
TBD

View File

@ -0,0 +1,243 @@
[[ml-job-resource]]
==== Job Resources
A job resource has the following properties:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
`analysis_limits`::
(+object+) Defines limits on the number of field values and time buckets to be analyzed.
See <<ml-apilimits,analysis limits>>.
`create_time`::
(+string+) The time the job was created, in milliseconds since the epoch. For example, `1491007356077`.
`data_description`::
(+object+) Describes the data format and how APIs parse timestamp fields. See <<ml-datadescription,data description objects>>.
`description`::
(+string+) An optional description of the job.
`finished_time`::
(+string+) If the job closed or failed, this is the time the job finished, in milliseconds since the epoch.
Otherwise, it is `null`. For example, `1491007365347`.
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`model_plot_config`:: TBD
`enabled`:: TBD. For example, `true`.
`model_snapshot_id`::
TBD. For example, `1491007364`.
`model_snapshot_retention_days`::
(+long+) The time in days that model snapshots are retained for the job. Older snapshots are deleted.
The default value is 1 day.
`results_index_name`::
TBD. For example, `shared`.
[[ml-analysisconfig]]
===== Analysis Configuration Objects
An analysis configuration object has the following properties:
`batch_span`::
(+unsigned integer+) The interval into which to batch seasonal data, measured in seconds.
This is an advanced option which is usually left as the default value.
////
Requires `period` to be specified
////
`bucket_span`::
(+unsigned integer+, required) The size of the interval that the analysis is aggregated into, measured in seconds.
The default value is 300 seconds (5 minutes).
`categorization_field_name`::
(+string+) If not null, the values of the specified field will be categorized.
The resulting categories can be used in a detector by setting `by_field_name`,
`over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
`categorization_filters`::
(+array of strings+) If `categorization_field_name` is specified, you can also define optional filters.
This property expects an array of regular expressions.
The expressions are used to filter out matching sequences off the categorization field values.
This functionality is useful to fine tune categorization by excluding sequences
that should not be taken into consideration for defining categories.
For example, you can exclude SQL statements that appear in your log files.
`detectors`::
(+array+, required) An array of detector configuration objects,
which describe the anomaly detectors that are used in the job.
See <<ml-detectorconfig,detector configuration objects>>.
NOTE: If the `detectors` array does not contain at least one detector, no analysis can occur
and an error is returned.
`influencers`::
(+array of strings+) An array of influencer field names.
Typically these can be the by, over, or partition fields that are used in the detector configuration.
You might also want to use a field name that is not specifically named in a detector,
but is available as part of the input data. When you use multiple detectors,
the use of influencers is recommended as it aggregates results for each influencer entity.
`latency`::
(+unsigned integer+) The size of the window, in seconds, in which to expect data that is out of time order.
The default value is 0 seconds (no latency).
NOTE: Latency is only applicable when you send data by using the <<ml-post-data, Post Data to Jobs>> API.
`multivariate_by_fields`::
(+boolean+) If set to `true`, the analysis will automatically find correlations
between metrics for a given `by` field value and report anomalies when those
correlations cease to hold. For example, suppose CPU and memory usage on host A
is usually highly correlated with the same metrics on host B. Perhaps this
correlation occurs because they are running a load-balanced application.
If you enable this property, then anomalies will be reported when, for example,
CPU usage on host A is high and the value of CPU usage on host B is low.
That is to say, you'll see an anomaly when the CPU of host A is unusual given the CPU of host B.
NOTE: To use the `multivariate_by_fields` property, you must also specify `by_field_name` in your detector.
`overlapping_buckets`::
(+boolean+) If set to `true`, an additional analysis occurs that runs out of phase by half a bucket length.
This requires more system resources and enhances detection of anomalies that span bucket boundaries.
`period`::
(+unsigned integer+) The repeat interval for periodic data in multiples of `batch_span`.
If this property is not specified, daily and weekly periodicity are automatically determined.
This is an advanced option which is usually left as the default value.
`summary_count_field_name`::
(+string+) If not null, the data fed to the job is expected to be pre-summarized.
This property value is the name of the field that contains the count of raw data points that have been summarized.
The same `summary_count_field_name` applies to all detectors in the job.
NOTE: The `summary_count_field_name` property cannot be used with the `metric` function.
`use_per_partition_normalization`::
TBD
[[ml-detectorconfig]]
===== Detector Configuration Objects
Detector configuration objects specify which data fields a job analyzes.
They also specify which analytical functions are used.
You can specify multiple detectors for a job.
Each detector has the following properties:
`by_field_name`::
(+string+) The field used to split the data.
In particular, this property is used for analyzing the splits with respect to their own history.
It is used for finding unusual values in the context of the split.
`detector_description`::
(+string+) A description of the detector. For example, `low_sum(events_per_min)`.
`detector_rules`::
TBD
`exclude_frequent`::
(+string+) Contains one of the following values: `all`, `none`, `by`, or `over`.
If set, frequent entities are excluded from influencing the anomaly results.
Entities can be considered frequent over time or frequent in a population.
If you are working with both over and by fields, then you can set `exclude_frequent`
to `all` for both fields, or to `by` or `over` for those specific fields.
`field_name`::
(+string+) The field that the detector uses in the function. If you use an event rate
function such as `count` or `rare`, do not specify this field.
NOTE: The `field_name` cannot contain double quotes or backslashes.
`function`::
(+string+, required) The analysis function that is used.
For example, `count`, `rare`, `mean`, `min`, `max`, and `sum`.
The default function is `metric`, which looks for anomalies in all of `min`, `max`,
and `mean`.
NOTE: You cannot use the `metric` function with pre-summarized input. If `summary_count_field_name`
is not null, you must specify a function other than `metric`.
`over_field_name`::
(+string+) The field used to split the data.
In particular, this property is used for analyzing the splits with respect to the history of all splits.
It is used for finding unusual values in the population of all splits.
`partition_field_name`::
(+string+) The field used to segment the analysis.
When you use this property, you have completely independent baselines for each value of this field.
`use_null`::
(+boolean+) Defines whether a new series is used as the null series
when there is no value for the by or partition fields. The default value is `false`.
IMPORTANT: Field names are case sensitive. For example, a field named 'Bytes' is different from one named 'bytes'.
[[ml-datadescription]]
===== Data Description Objects
The data description settings define the format of the input data.
When data is read from Elasticsearch, a datafeed must be configured.
The datafeed defines which index the data is taken from and over what time period.
When data is received via the <<ml-post-data, Post Data to Jobs>> API,
you must specify the data format (for example, JSON or CSV). In this scenario,
the data posted is not stored in Elasticsearch. Only the results for anomaly detection are retained.
When you create a job, by default it accepts data in tab-separated-values format and expects
an Epoch time value in a field named `time`. The `time` field must be measured in seconds from the Epoch.
If, however, your data is not in this format, you can provide a data description object that specifies the
format of your data.
A data description object has the following properties:
`field_delimiter`::
TBD
`format`::
TBD
`time_field`::
(+string+) The name of the field that contains the timestamp.
The default value is `time`.
`time_format`::
(+string+) The time format, which can be `epoch`, `epoch_ms`, or a custom pattern.
The default value is `epoch`, which refers to UNIX or Epoch time (the number of seconds
since 1 Jan 1970) and corresponds to the time_t type in C and C++.
The value `epoch_ms` indicates that time is measured in milliseconds since the epoch.
The `epoch` and `epoch_ms` time formats accept either integer or real values. +
NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class. When you use date-time formatting patterns, it is recommended that you provide the full date, time, and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`. If the pattern that you specify is not sufficient to produce a complete timestamp, job creation fails.
`quote_character`::
TBD
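
For example, a minimal data description for JSON documents that carry a
millisecond timestamp might look like the following sketch (the field name
`@timestamp` is illustrative):

[source,js]
--------------------------------------------------
{
  "time_field": "@timestamp",
  "time_format": "epoch_ms"
}
--------------------------------------------------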
[[ml-apilimits]]
===== Analysis Limits
Limits can be applied for the size of the mathematical models that are held in memory.
These limits can be set per job and do not control the memory used by other processes.
If necessary, the limits can also be updated after the job is created.
The `analysis_limits` object has the following properties:
`categorization_examples_limit`::
(+long+) The maximum number of examples stored per category in memory and
in the results data store. The default value is 4. If you increase this value,
more examples are available; however, it requires that you have more storage available.
If you set this value to `0`, no examples are stored.
////
NOTE: The `categorization_examples_limit` only applies to analysis that uses categorization.
////
`model_memory_limit`::
(+long+) The maximum amount of memory, in MiB, that the mathematical models can use.
Once this limit is approached, data pruning becomes more aggressive.
Upon exceeding this limit, new entities are not modeled. The default value is 4096.
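
For example, the following sketch caps the model memory at 2048 MiB and keeps
the default number of categorization examples (the values are illustrative):

[source,js]
--------------------------------------------------
"analysis_limits": {
  "model_memory_limit": 2048,
  "categorization_examples_limit": 4
}
--------------------------------------------------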


View File

@ -0,0 +1,63 @@
[[ml-open-job]]
==== Open Jobs
An anomaly detection job must be opened in order for it to be ready to receive and analyze data.
A job may be opened and closed multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_open`
===== Description
A job must be open in order for it to accept and analyze data.
When you open a new job, it starts with an empty model.
When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Request Body
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened.
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since the job was
last closed is treated as a maintenance window; that is, it is not treated as an anomaly.
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example opens the `event_rate` job and sets an optional property:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
"ignore_downtime": false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job opens, you receive the following results:
----
{
"opened": true
}
----

View File

@ -0,0 +1,56 @@
[[ml-post-data]]
==== Post Data to Jobs
The post data API allows you to send data to an anomaly detection job for analysis.
The job must have been opened prior to sending data.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_data --data-binary @{data-file.json}`
===== Description
File sizes are limited to 100 MB, so if your file is larger,
then split it into multiple files and upload each one separately in sequential time order.
When running in real time, it is generally recommended that you perform
many small uploads, rather than queueing data to upload larger files.
IMPORTANT: Data can only be accepted from a single connection.
Do not attempt to access the data endpoint from different threads at the same time.
Use a single connection synchronously to send data, close, flush or delete a single job.
+
It is not currently possible to post data to multiple jobs using wildcards or a comma separated list.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Request Body
`reset_start`::
(+string+; default: ++null++) Specifies the start of the bucket resetting range.
`reset_end`::
(+string+; default: ++null++) Specifies the end of the bucket resetting range.
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example sends data from the file `data-file.json` to a job called `my_analysis`.
[source,js]
--------------------------------------------------
$ curl -s -XPOST localhost:9200/_xpack/ml/anomaly_detectors/my_analysis/_data --data-binary @data-file.json
--------------------------------------------------

View File

@ -0,0 +1,84 @@
[[ml-preview-datafeed]]
==== Preview Data Feeds
The preview data feed API allows you to preview the data that a data feed will retrieve and send for analysis.
===== Request
`GET _xpack/ml/datafeeds/<feed_id>/_preview`
////
===== Description
Important:: Updates do not take effect until after then job is closed and new
data is sent to it.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Request Body
The following properties can be updated after the job is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
////
////
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example updates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
----
////
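===== Examples
The following sketch previews a data feed; the name `datafeed-it-ops-kpi` is an
assumption for illustration:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-it-ops-kpi/_preview
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

The response contains the documents that the data feed would deliver for
analysis, so its exact contents depend on your data.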

View File

@ -0,0 +1,109 @@
[[ml-put-datafeed]]
==== Create Data Feeds
The create data feed API allows you to instantiate a data feed.
===== Request
`PUT _xpack/ml/datafeeds/<feed_id>`
////
===== Description
TBD
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Request Body
`description`::
(+string+) An optional description of the job.
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>.
`data_description`::
(+object+) Describes the format of the input data.
See <<ml-datadescription,data description objects>>.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example creates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi
{
"description":"First simple job",
"analysis_config":{
"bucket_span": "5m",
"latency": "0ms",
"detectors":[
{
"detector_description": "low_sum(events_per_min)",
"function":"low_sum",
"field_name": "events_per_min"
}
]
},
"data_description": {
"time_field":"@timestamp",
"time_format":"epoch_ms"
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is created, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491247016391,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_snapshot_retention_days": 1,
"results_index_name": "shared"
}
----
////
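===== Examples
The following sketch creates a data feed for the `it-ops-kpi` job; the data feed
name, index, type, and query shown here are assumptions for illustration:

[source,js]
--------------------------------------------------
PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
{
  "job_id": "it-ops-kpi",
  "indexes": ["it_ops_metrics"],
  "types": ["kpi"],
  "query": {
    "match_all": {}
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the data feed is created, the response typically echoes the resulting
data feed configuration.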

View File

@ -0,0 +1,109 @@
[[ml-put-job]]
==== Create Jobs
The create job API allows you to instantiate a {ml} job.
===== Request
`PUT _xpack/ml/anomaly_detectors/<job_id>`
////
===== Description
TBD
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Request Body
`description`::
(+string+) An optional description of the job.
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>.
`data_description`::
(+object+) Describes the format of the input data.
See <<ml-datadescription,data description objects>>.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
////
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
////
////
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example creates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi
{
"description":"First simple job",
"analysis_config":{
"bucket_span": "5m",
"latency": "0ms",
"detectors":[
{
"detector_description": "low_sum(events_per_min)",
"function":"low_sum",
"field_name": "events_per_min"
}
]
},
"data_description": {
"time_field":"@timestamp",
"time_format":"epoch_ms"
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is created, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491247016391,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_snapshot_retention_days": 1,
"results_index_name": "shared"
}
----

View File

@ -0,0 +1,10 @@
[[ml-results-resource]]
==== Results Resources
A results resource has the following properties:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
////

View File

@ -0,0 +1,89 @@
[[ml-revert-snapshot]]
==== Revert Model Snapshots
The revert model snapshot API allows you to revert a job to a specific saved model snapshot.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_revert`
////
===== Description
Important:: Updates do not take effect until after then job is closed and new
data is sent to it.
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
`snapshot_id` (required)::
(+string+) Identifier for the model snapshot
===== Request Body
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example updates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
----
////
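===== Examples
As a sketch, the following request reverts the `it-ops-kpi` job to an earlier
snapshot; the snapshot identifier is illustrative, and `delete_intervening_results`
is assumed here as the option that also removes results recorded after that snapshot:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/it-ops-kpi/model_snapshots/1491852978/_revert
{
  "delete_intervening_results": true
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]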

View File

@ -0,0 +1,10 @@
[[ml-snapshot-resource]]
==== Model Snapshot Resources
A model snapshot resource has the following properties:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
////

View File

@ -0,0 +1,64 @@
[[ml-start-datafeed]]
==== Start Data Feeds
A data feed must be started in order for it to retrieve data from Elasticsearch.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/datafeeds/<feed_id>/_start`
////
===== Description
A job must be open in order to it to accept and analyze data.
When you open a new job, it starts with an empty model.
When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Request Body
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example opens the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
"ignore_downtime":false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job opens, you receive the following results:
----
{
"opened": true
}
----
////
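===== Examples
The following sketch starts a data feed; the name `datafeed-it-ops-kpi` and the
start time are assumptions for illustration:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-04-07T18:22:16Z"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the data feed starts, you receive an acknowledgement similar to the
following (the response format is an assumption):
----
{
  "started": true
}
----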

View File

@ -0,0 +1,64 @@
[[ml-stop-datafeed]]
==== Stop Data Feeds
A data feed can be started and stopped multiple times throughout its lifecycle.
When you stop a data feed, it ceases to retrieve data from Elasticsearch.
===== Request
`POST _xpack/ml/datafeeds/<feed_id>/_stop`
////
===== Description
A job can be closed once all data has been analyzed.
When you close a job, it runs housekeeping tasks such as pruning the model history,
flushing buffers, calculating final results and persisting the internal models.
Depending upon the size of the job, it could take several minutes to close and
the equivalent time to re-open.
Once closed, the anomaly detection job has almost no overhead on the cluster
(except for maintaining its meta data). A closed job is blocked for receiving
data and analysis operations, however you can still explore and navigate results.
//NOTE:
//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Query Parameters
`close_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has closed
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example closes the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_close
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is closed, you receive the following results:
----
{
"closed": true
}
----
////
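===== Examples
The following sketch stops a data feed; the name `datafeed-it-ops-kpi` is an
assumption for illustration:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_stop
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the data feed stops, you receive an acknowledgement similar to the
following (the response format is an assumption):
----
{
  "stopped": true
}
----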

View File

@ -0,0 +1,84 @@
[[ml-update-datafeed]]
==== Update Data Feeds
The update data feed API allows you to update certain properties of a data feed.
===== Request
`POST _xpack/ml/datafeeds/<feed_id>/_update`
////
===== Description
Important:: Updates do not take effect until after then job is closed and new
data is sent to it.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Request Body
The following properties can be updated after the job is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
////
////
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example updates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
----
////
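===== Examples
As a sketch, the following request updates the scroll size that a data feed uses
when it queries Elasticsearch; the data feed name and the property shown are
assumptions for illustration:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_update
{
  "scroll_size": 1000
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]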

View File

@ -0,0 +1,84 @@
[[ml-update-job]]
==== Update Jobs
The update job API allows you to update certain properties of a job.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_update`
////
===== Description
Important:: Updates do not take effect until after then job is closed and new
data is sent to it.
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Request Body
The following properties can be updated after the job is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
////
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
////
////
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
----

View File

@ -0,0 +1,89 @@
[[ml-update-snapshot]]
==== Update Model Snapshots
The update model snapshot API allows you to update certain properties of a snapshot.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_update`
////
===== Description
Important:: Updates do not take effect until after then job is closed and new
data is sent to it.
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
`snapshot_id` (required)::
(+string+) Identifier for the model snapshot
===== Request Body
The following properties can be updated after the model snapshot is created:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example updates the `it-ops-kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
----
////
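===== Examples
As a sketch, the following request updates the description of a snapshot that
belongs to the `it-ops-kpi` job; the snapshot identifier and the `retain`
property are assumptions for illustration:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/it-ops-kpi/model_snapshots/1491852978/_update
{
  "description": "Baseline model before the upgrade",
  "retain": true
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]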

View File

@ -0,0 +1,61 @@
[[ml-valid-detector]]
==== Validate Detectors
TBD
===== Request
`POST _xpack/ml/anomaly_detectors/_validate/detector`
===== Description
TBD
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
////
===== Request Body
TBD
////
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example opens the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
"ignore_downtime":false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job opens, you receive the following results:
----
{
"opened": true
}
----
////
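===== Examples
As a sketch, the following request validates a single detector configuration
without creating a job; the field name is illustrative:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/_validate/detector
{
  "function": "low_sum",
  "field_name": "events_per_min"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

If the detector configuration is valid, you receive an acknowledgement; the
exact response format shown here is an assumption:
----
{
  "acknowledged": true
}
----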

View File

@ -0,0 +1,61 @@
[[ml-valid-job]]
==== Validate Jobs
TBD
===== Request
`POST _xpack/ml/anomaly_detectors/_validate`
===== Description
TBD
////
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
////
===== Request Body
TBD
////
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
===== Examples
The following example opens the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
"ignore_downtime":false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job opens, you receive the following results:
----
{
"opened": true
}
----
////
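===== Examples
As a sketch, the following request validates a job configuration without
creating the job; the field names and values are illustrative:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/_validate
{
  "description": "Validate a simple job configuration",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "low_sum",
        "field_name": "events_per_min"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

If the job configuration is valid, you receive an acknowledgement; the exact
response format shown here is an assumption:
----
{
  "acknowledged": true
}
----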