[DOCS] Add ML data feed API examples (elastic/x-pack-elasticsearch#1016)

* [DOCS] Added examples for all ML job APIs

* [DOCS] Add ML datafeed API examples

Original commit: elastic/x-pack-elasticsearch@9634356371
This commit is contained in:
Lisa Cawley 2017-04-10 08:59:27 -07:00 committed by lcawley
parent 00bc35cf9f
commit 90575b18f4
16 changed files with 409 additions and 338 deletions

View File

@ -5,8 +5,8 @@ Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.
Driven by proprietary machine learning algorithms, anomalies related to temporal
deviations in values/counts/frequencies, statistical rarity, and unusual
Driven by proprietary machine learning algorithms, anomalies related to
temporal deviations in values/counts/frequencies, statistical rarity, and unusual
behaviors for a member of a population are detected, scored and linked with
statistically significant influencers in the data.
@ -15,12 +15,52 @@ that you don't need to specify algorithms, models, or other data
science-related configurations in order to get the benefits of {ml}.
//image::graph-network.jpg["Graph network"]
[float]
=== Integration with the Elastic Stack
Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis and anomaly results are displayed in
{kb} dashboards.
[float]
[[ml-concepts]]
=== Basic Concepts
There are a few concepts that are core to {ml} in {xpack}.
Understanding these concepts from the outset will greatly ease the
learning process.
Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.
Data feeds::
Jobs can analyze either a batch of data from a data store or a stream of data
in real time. The latter involves data that is retrieved from {es} and is
referred to as a _data feed_.
Detectors::
Part of the configuration information associated with a job, detectors define
the type of analysis that needs to be done (for example, max, average, rare).
They also specify which fields to analyze. You can have more than one detector
in a job, which is more efficient than running multiple jobs against the same
data stream. For a list of the properties associated with detectors, see
<<ml-detectorconfig, Detector Configuration Objects>>.
Buckets::
Part of the configuration information associated with a job, the _bucket span_
defines the time interval across which the job analyzes data. When setting the
bucket span, take into account the granularity at which you want to analyze,
the frequency of the input data, and the frequency at which alerting is required.
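
To make these concepts concrete, a minimal job configuration might combine a
bucket span with a single detector. The sketch below is illustrative only: the
job name `example-job` is hypothetical, and the detector and data description
values are borrowed from the create job example elsewhere in these docs.

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/example-job
{
  "description": "Example job with one detector and 5 minute buckets",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "low_sum",
        "field_name": "events_per_min"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
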
//[float]
//== Where to Go Next

View File

@ -10,7 +10,16 @@ Use machine learning to detect anomalies in time series data.
* <<ml-api-definitions, Definitions>>
[[ml-api-datafeed-endpoint]]
=== Datafeeds
=== Data Feeds
* <<ml-put-datafeed,Create data feeds>>
* <<ml-delete-datafeed,Delete data feeds>>
* <<ml-get-datafeed,Get data feed details>>
* <<ml-get-datafeed-stats,Get data feed statistics>>
* <<ml-preview-datafeed,Preview data feeds>>
* <<ml-start-datafeed,Start data feeds>>
* <<ml-stop-datafeed,Stop data feeds>>
* <<ml-update-datafeed,Update data feeds>>
include::ml/put-datafeed.asciidoc[]
include::ml/delete-datafeed.asciidoc[]

View File

@ -3,8 +3,63 @@
A data feed resource has the following properties:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
////
`aggregations`::
(+object+) TBD. The aggregations object describes the aggregations that are
applied to the search query.
For more information, see {ref}/search-aggregations.html[Aggregations].
For example:
`{"@timestamp": {"histogram": {"field": "@timestamp",
"interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false,
"min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {
"field": "events_per_min"}}}}}`.
`datafeed_id`::
(+string+) A numerical character string that uniquely identifies the data feed.
`frequency`::
TBD. A time interval. For example: "150s"
`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]
`job_id` (required)::
(+string+) A numerical character string that uniquely identifies the job.
`query`::
(+object+) TBD. The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.
`query_delay`::
TBD. For example: "60s"
`scroll_size`::
TBD.
The maximum number of hits to be returned with each batch of search results?
The default value is `1000`.
`types` (required)::
(+array+) TBD. For example: ["network","sql","kpi"]
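
Taken together, a data feed resource might look like the following sketch. The
values are borrowed from the create data feed example; the optional
`aggregations` and `frequency` properties are omitted for brevity.

----
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  "query_delay": "1m",
  "indexes": [ "it_ops_metrics" ],
  "types": [ "kpi", "sql", "network" ],
  "query": {
    "match_all": { "boost": 1 }
  },
  "scroll_size": 1000
}
----
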
[[ml-datafeed-counts]]
==== Data Feed Counts
The get data feed statistics API provides information about the operational
progress of a data feed. For example:
`assigment_explanation`::
TBD
For example: ""
`node`::
(+object+) TBD
The node that is running the query?
For example: `{"id": "0-o0tOoRTwKFZifatTWKNw","name": "0-o0tOo",
"ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg","transport_address": "127.0.0.1:9300",
"attributes": {"max_running_jobs": "10"}}
`state`::
(+string+) The status of the data feed,
which can be one of the following values:
* `started`: The data feed is actively receiving data.
* `stopped`: The data feed is stopped and will not receive data until it is re-started.
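
For example, the statistics for a started data feed might look like the
following trimmed sketch (the values are borrowed from the get data feed
statistics example):

----
{
  "datafeed_id": "datafeed-farequote",
  "state": "started",
  "node": {
    "id": "0-o0tOoRTwKFZifatTWKNw",
    "name": "0-o0tOo",
    "transport_address": "127.0.0.1:9300"
  }
}
----
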

View File

@ -7,25 +7,14 @@ The delete data feed API allows you to delete an existing data feed.
`DELETE _xpack/ml/datafeeds/<feed_id>`
////
===== Description
All job configuration, model state and results are deleted.
NOTE: You must stop the data feed before you can delete it.
IMPORTANT: Deleting a job must be done via this API only. Do not delete the
job directly from the `.ml-*` indices using the Elasticsearch
DELETE Document API. When {security} is enabled, make sure no `write`
privileges are granted to anyone over the `.ml-*` indices.
Before you can delete a job, you must delete the data feeds that are associated with it.
//See <<>>.
It is not currently possible to delete multiple jobs using wildcards or a comma-separated list.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
(+string+) Identifier for the data feed
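
For example, the following sketch deletes the `datafeed-it-ops-kpi` data feed
that is created in the create data feed example (the data feed must be stopped
first):

[source,js]
--------------------------------------------------
DELETE _xpack/ml/datafeeds/datafeed-it-ops-kpi
--------------------------------------------------
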
////
===== Responses

View File

@ -1,7 +1,8 @@
[[ml-get-datafeed-stats]]
==== Get Data Feed Statistics
The get data feed statistics API allows you to retrieve usage information for data feeds.
The get data feed statistics API allows you to retrieve usage information for
data feeds.
===== Request
@ -9,47 +10,40 @@ The get data feed statistics API allows you to retrieve usage information for da
`GET _xpack/ml/datafeeds/<feed_id>/_stats`
////
===== Description
TBD
////
If the data feed is stopped, the only information you receive is the
`datafeed_id` and the `state`.
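
For example, a stopped data feed might be reported with only those two fields,
roughly as follows:

----
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "state": "stopped"
}
----
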
===== Path Parameters
`feed_id`::
(+string+) Identifier for the data feed. If you do not specify this optional parameter,
the API returns information about all data feeds that you have authority to view.
(+string+) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds that you have authority to view.
////
===== Results
The API returns the following usage information:
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`assigment_explanation`::
TBD
For example: ""
`data_counts`::
(+object+) An object that describes the number of records processed and any related error counts.
See <<ml-datacounts,data counts objects>>.
`datafeed_id`::
(+string+) A numerical character string that uniquely identifies the data feed.
`model_size_stats`::
(+object+) An object that provides information about the size and contents of the model.
See <<ml-modelsizestats,model size stats objects>>
`node`::
(+object+) TBD
`state`::
(+string+) The status of the job, which can be one of the following values:
running:: The job is actively receiving and processing data.
closed:: The job finished successfully with its model state persisted.
The job is still available to accept further data. NOTE: If you send data in a periodic cycle
and close the job at the end of each transaction, the job is marked as closed in the intervals
between when data is sent. For example, if data is sent every minute and it takes 1 second to process,
the job has a closed state for 59 seconds.
failed:: The job did not finish successfully due to an error. NOTE: This can occur due to invalid input data.
In this case, sending corrected data to a failed job re-opens the job and resets it to a running state.
(+string+) The status of the data feed, which can be one of the following values:
* `started`: The data feed is actively receiving data.
* `stopped`: The data feed is stopped and will not receive data until
it is re-started.
//failed?
////
===== Responses
200
@ -58,48 +52,28 @@ The API returns the following usage information:
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
.Example results for a single job
.Example results for a started job
----
{
"count": 1,
"jobs": [
"datafeeds": [
{
"job_id": "it-ops-kpi",
"data_counts": {
"job_id": "it-ops",
"processed_record_count": 43272,
"processed_field_count": 86544,
"input_bytes": 2846163,
"input_field_count": 86544,
"invalid_date_count": 0,
"missing_field_count": 0,
"out_of_order_timestamp_count": 0,
"empty_bucket_count": 0,
"sparse_bucket_count": 0,
"bucket_count": 4329,
"earliest_record_timestamp": 1454020560000,
"latest_record_timestamp": 1455318900000,
"last_data_time": 1491235405945,
"input_record_count": 43272
"datafeed_id": "datafeed-farequote",
"state": "started",
"node": {
"id": "0-o0tOoRTwKFZifatTWKNw",
"name": "0-o0tOo",
"ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg",
"transport_address": "127.0.0.1:9300",
"attributes": {
"max_running_jobs": "10"
}
},
"model_size_stats": {
"job_id": "it-ops",
"result_type": "model_size_stats",
"model_bytes": 25586,
"total_by_field_count": 3,
"total_over_field_count": 0,
"total_partition_field_count": 2,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1491235406000,
"timestamp": 1455318600000
},
"state": "closed"
"assigment_explanation": ""
}
]
}
----
////

View File

@ -1,7 +1,8 @@
[[ml-get-datafeed]]
==== Get Data Feeds
The get data feeds API allows you to retrieve configuration information about data feeds.
The get data feeds API allows you to retrieve configuration information for
data feeds.
===== Request
@ -16,19 +17,19 @@ OUTDATED?: The get job API can also be applied to all jobs by using `_all` as th
===== Path Parameters
`feed_id`::
(+string+) Identifier for the data feed. If you do not specify this optional parameter,
the API returns information about all data feeds that you have authority to view.
(+string+) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds that you have authority to view.
===== Results
The API returns information about the data feed resource.
//For more information, see <<ml-job-resource,job resources>>.
For more information, see <<ml-datafeed-resource,data feed resources>>.
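
For example, assuming the `GET _xpack/ml/datafeeds/<feed_id>` form of the
request, you could retrieve the configuration of the data feed created in the
create data feed example as follows:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-it-ops-kpi
--------------------------------------------------
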
////
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
None
===== Responses

View File

@ -27,8 +27,7 @@ The API returns information about the job resource. For more information, see
////
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
None
===== Responses

View File

@ -52,8 +52,8 @@ An analysis configuration object has the following properties:
Requires `period` to be specified
////
`bucket_span`::
(+unsigned integer+, required) The size of the interval that the analysis is aggregated into, measured in seconds.
`bucket_span` (required)::
(+unsigned integer+) The size of the interval that the analysis is aggregated into, measured in seconds.
The default value is 300 seconds (5 minutes).
`categorization_field_name`::
@ -69,8 +69,8 @@ An analysis configuration object has the following properties:
that should not be taken into consideration for defining categories.
For example, you can exclude SQL statements that appear in your log files.
`detectors`::
(+array+, required) An array of detector configuration objects,
`detectors` (required)::
(+array+) An array of detector configuration objects,
which describe the anomaly detectors that are used in the job.
See <<ml-detectorconfig,detector configuration objects>>.
@ -154,8 +154,8 @@ Each detector has the following properties:
NOTE: The `field_name` cannot contain double quotes or backslashes.
`function`::
(+string+, required) The analysis function that is used.
`function` (required)::
(+string+) The analysis function that is used.
For example, `count`, `rare`, `mean`, `min`, `max`, and `sum`.
The default function is `metric`, which looks for anomalies in all of `min`, `max`,
and `mean`.
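
For instance, a single detector that looks for unusually low totals of an
`events_per_min` field might be declared as follows (the values are borrowed
from the create job example):

----
{
  "detector_description": "low_sum(events_per_min)",
  "function": "low_sum",
  "field_name": "events_per_min"
}
----
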

View File

@ -1,63 +0,0 @@
[[ml-open-job]]
==== Open Jobs
An anomaly detection job must be opened in order for it to be ready to receive and analyze data.
A job may be opened and closed multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_open`
===== Description
A job must be open in order for it to accept and analyze data.
When you open a new job, it starts with an empty model.
When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Request Body
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example opens the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
"ignore_downtime":false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job opens, you receive the following results:
----
{
"opened": true
}
----

View File

@ -20,7 +20,7 @@ The job is ready to resume its analysis from where it left off, once new data is
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
(+string+) Identifier for the job
===== Request Body

View File

@ -7,12 +7,13 @@ The preview data feed API allows you to preview a data feed.
`GET _xpack/ml/datafeeds/<feed_id>/_preview`
////
===== Description
Important:: Updates do not take effect until after the job is closed and new
data is sent to it.
////
TBD
//How much data does it return?
The API returns example data by using the current data feed settings.
===== Path Parameters
`feed_id` (required)::
@ -21,25 +22,7 @@ data is sent to it.
////
===== Request Body
The following properties can be updated after the job is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
None
===== Responses
@ -52,33 +35,33 @@ TBD
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `it-ops-kpi` job:
The following example obtains a preview of the `datafeed-farequote` data feed:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
GET _xpack/ml/datafeeds/datafeed-farequote/_preview
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
The data that is returned for this example is as follows:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
[
{
"@timestamp": 1454803200000,
"responsetime": 132.20460510253906
},
{
"@timestamp": 1454803200000,
"responsetime": 990.4628295898438
},
{
"@timestamp": 1454803200000,
"responsetime": 877.5927124023438
},
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
]
----
////

View File

@ -1,43 +1,57 @@
[[ml-put-datafeed]]
==== Create Data Feeds
The create data feed API allows you to instantiate a data feed.
The create data feed API enables you to instantiate a data feed.
===== Request
`PUT _xpack/ml/datafeeds/<feed_id>`
////
===== Description
TBD
////
You must create a job before you create a data feed. You can associate only one
data feed with each job.
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
(+string+) A numerical character string that uniquely identifies the data feed.
////
===== Request Body
`aggregations`::
(+object+) TBD. For example: {"@timestamp": {"histogram": {"field": "@timestamp",
"interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false,
"min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {
"field": "events_per_min"}}}}}
`description`::
(+string+) An optional description of the job.
`frequency`::
TBD. For example: "150s"
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>.
`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]
`data_description`::
(+object+) Describes the format of the input data.
See <<ml-datadescription,data description objects>>.
`job_id` (required)::
(+string+) A numerical character string that uniquely identifies the job.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.
`query_delay`::
TBD. For example: "60s"
This expects data to be sent in JSON format using the POST `_data` API.
`scroll_size`::
TBD. For example, 1000
`types` (required)::
TBD. For example: ["network","sql","kpi"]
For more information about these properties,
see <<ml-datafeed-resource, Data Feed Resources>>.
////
===== Responses
TBD
@ -48,62 +62,55 @@ TBD
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example creates the `it-ops-kpi` job:
The following example creates the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi
PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
{
"description":"First simple job",
"analysis_config":{
"bucket_span": "5m",
"latency": "0ms",
"detectors":[
{
"detector_description": "low_sum(events_per_min)",
"function":"low_sum",
"field_name": "events_per_min"
}
"job_id": "it-ops-kpi",
"query":
{
"match_all":
{
"boost": 1
}
},
"indexes": [
"it_ops_metrics"
],
"types": [
"kpi",
"sql",
"network"
]
},
"data_description": {
"time_field":"@timestamp",
"time_format":"epoch_ms"
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is created, you receive the following results:
When the data feed is created, you receive the following results:
----
{
"datafeed_id": "datafeed-it-ops-kpi",
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491247016391,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
"query_delay": "1m",
"indexes": [
"it_ops_metrics"
],
"types": [
"kpi",
"sql",
"network"
],
"query": {
"match_all": {
"boost": 1
}
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_snapshot_retention_days": 1,
"results_index_name": "shared"
"scroll_size": 1000
}
----
////

View File

@ -1,7 +1,7 @@
[[ml-put-job]]
==== Create Jobs
The create job API allows you to instantiate a {ml} job.
The create job API enables you to instantiate a {ml} job.
===== Request

View File

@ -1,37 +1,73 @@
[[ml-start-datafeed]]
==== Start Data Feeds
A data feed must be started in order for it to be ready to receive and analyze data.
A data feed must be started in order to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/datafeeds/<feed_id>/_start`
////
===== Description
A job must be open in order for it to accept and analyze data.
When you start a data feed, you can specify a start time. This allows you to
include a training period, providing you have this data available in {es}.
If you want to analyze from the beginning of a dataset, you can specify any date
earlier than that beginning date.
When you open a new job, it starts with an empty model.
If you do not specify a start time and the data feed is associated with a new
job, the analysis starts from the earliest time for which data is available.
When you start a data feed, you can also specify an end time. If you do so, the
job analyzes data from the start time until the end time, at which point the
analysis stops. This scenario is useful for a one-off batch analysis. If you
do not specify an end time, the data feed runs continuously.
If the system restarts, any jobs that had data feeds running are also restarted.
When a stopped data feed is restarted, it continues processing input data from
the next millisecond after it was stopped. If your data contains the same
timestamp (for example, it is summarized by minute), then data loss is possible
for the timestamp value when the data feed stopped. This situation can occur
because the job might not have completely processed all data for that millisecond.
If you specify a `start` value that is earlier than the timestamp of the latest
processed record, that value is ignored.
NOTE: Before you can start a data feed, the job must be open. Otherwise, an error
occurs.
When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
(+string+) Identifier for the data feed
===== Request Body
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`end`::
(+string+) The time that the data feed should end. This value is exclusive.
The default value is an empty string.
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
`start`::
(+string+) The time that the data feed should begin. This value is inclusive.
The default value is an empty string.
These `start` and `end` times can be specified by using one of the
following formats:
* ISO 8601 format with milliseconds, for example `2017-01-22T06:00:00.000Z`
* ISO 8601 format without milliseconds, for example `2017-01-22T06:00:00+00:00`
* Seconds from the Epoch, for example `1390370400`
NOTE: When a URL is expected (for example, in browsers), the `+` used in time
zone designators has to be encoded as `%2B`.
Date-time arguments using either of the ISO 8601 formats must have a time zone
designator, where Z is accepted as an abbreviation for UTC time.
`timeout`::
(+time+) Controls the amount of time to wait until a data feed starts.
The default value is 20 seconds.
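
For example, a one-off batch analysis over a fixed period could supply both
times in ISO 8601 format. This sketch assumes the `datafeed-it-ops-kpi` data
feed shown in the example below:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-01-01T00:00:00Z",
  "end": "2017-02-01T00:00:00Z"
}
--------------------------------------------------
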
////
===== Responses
200
@ -40,16 +76,16 @@ The job is ready to resume its analysis from where it left off, once new data is
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example opens the `event_rate` job:
The following example starts the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
"ignore_downtime":false
"start": "2017-04-07T18:22:16Z"
}
--------------------------------------------------
// CONSOLE
@ -58,7 +94,6 @@ POST _xpack/ml/anomaly_detectors/event_rate/_open
When the data feed starts, you receive the following results:
----
{
"opened": true
"started": true
}
----
////

View File

@ -1,6 +1,7 @@
[[ml-stop-datafeed]]
==== Stop Data Feeds
A data feed that is stopped ceases to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request
@ -10,31 +11,23 @@ A data feed can be opened and closed multiple times throughout its lifecycle.
////
===== Description
A job can be closed once all data has been analyzed.
When you close a job, it runs housekeeping tasks such as pruning the model history,
flushing buffers, calculating final results and persisting the internal models.
Depending upon the size of the job, it could take several minutes to close and
the equivalent time to re-open.
Once closed, the anomaly detection job has almost no overhead on the cluster
(except for maintaining its meta data). A closed job is blocked for receiving
data and analysis operations, however you can still explore and navigate results.
//NOTE:
//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
TBD
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
(+string+) Identifier for the data feed
===== Request Body
`force`::
(+boolean+) If true, the data feed is stopped forcefully.
`timeout`::
(+time+) Controls the amount of time to wait until a data feed stops.
The default value is 20 seconds.
////
===== Query Parameters
`close_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has closed
===== Responses
200
@ -43,22 +36,24 @@ data and analysis operations, however you can still explore and navigate results
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example closes the `event_rate` job:
The following example stops the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_close
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_stop
{
"timeout": "30s"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is closed, you receive the following results:
When the data feed stops, you receive the following results:
----
{
"closed": true
"stopped": true
}
----
////

View File

@ -10,75 +10,122 @@ The update data feed API allows you to update certain properties of a data feed.
////
===== Description
Important:: Updates do not take effect until after the job is closed and new
data is sent to it.
TBD
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Request Body
The following properties can be updated after the job is created:
The following properties can be updated after the data feed is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`aggregations`::
(+object+) TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
`frequency`::
TBD. For example: "150s"
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]
`description`::
(+string+) An optional description of the job.
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
This expects data to be sent in JSON format using the POST `_data` API.
`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.
`query_delay`::
TBD. For example: "60s"
`scroll_size`::
TBD. For example, 1000
`types` (required)::
TBD. For example: ["network","sql","kpi"]
For more information about these properties,
see <<ml-datafeed-resource, Data Feed Resources>>.
////
===== Responses
TBD
////
////
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
"aggregations": {
"@timestamp": {
"histogram": {
"field": "@timestamp",
"interval": 30000,
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"events_per_min": {
"sum": {
"field": "events_per_min"
}
}
}
}
},
"frequency": "160s"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
When the data feed is updated, you receive the following results:
----
{
"datafeed_id": "datafeed-it-ops-kpi",
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
"query_delay": "1m",
"frequency": "160s",
...
"aggregations": {
"@timestamp": {
"histogram": {
"field": "@timestamp",
"interval": 30000,
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"events_per_min": {
"sum": {
"field": "events_per_min"
}
}
}
}
},
"scroll_size": 1000
}
----
////