[DOCS] Add ML data feed API examples (elastic/x-pack-elasticsearch#1016)
* [DOCS] Added examples for all ML job APIs * [DOCS] Add ML datafeed API examples Original commit: elastic/x-pack-elasticsearch@9634356371
This commit is contained in:
parent 00bc35cf9f
commit 90575b18f4
@ -5,8 +5,8 @@ Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.

Driven by proprietary machine learning algorithms, anomalies related to
temporal deviations in values/counts/frequencies, statistical rarity, and unusual
behaviors for a member of a population are detected, scored and linked with
statistically significant influencers in the data.

@ -15,12 +15,52 @@ that you don’t need to specify algorithms, models, or other data
science-related configurations in order to get the benefits of {ml}.
//image::graph-network.jpg["Graph network"]

[float]
=== Integration with the Elastic Stack

Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis and anomaly results are displayed in
{kb} dashboards.

[float]
[[ml-concepts]]
=== Basic Concepts

There are a few concepts that are core to {ml} in {xpack}.
Understanding these concepts from the outset greatly eases the
learning process.

Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.

Data feeds::
Jobs can analyze either a batch of data from a data store or a stream of data
in real time. The latter involves data that is retrieved from {es} and is
referred to as a _data feed_.

Detectors::
Part of the configuration information associated with a job, detectors define
the type of analysis that needs to be done (for example, max, average, rare).
They also specify which fields to analyze. You can have more than one detector
in a job, which is more efficient than running multiple jobs against the same
data stream. For a list of the properties associated with detectors, see
<<ml-detectorconfig, Detector Configuration Objects>> and the sketch after
this list.

Buckets::
Part of the configuration information associated with a job, the _bucket span_
defines the time interval across which the job analyzes. When setting the
bucket span, take into account the granularity at which you want to analyze,
the frequency of the input data, and the frequency at which alerting is required.
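
The following sketch compresses these concepts into configuration terms. It is
illustrative only, adapted from the create-job and create-data-feed examples
later in this document: a job with one detector and a five-minute bucket span,
plus the data feed that supplies it with data from {es}.

----
PUT _xpack/ml/anomaly_detectors/it-ops-kpi
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "low_sum",
        "field_name": "events_per_min"
      }
    ]
  }
}

PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
{
  "job_id": "it-ops-kpi",
  "indexes": ["it_ops_metrics"]
}
----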

//[float]
//== Where to Go Next

@ -10,7 +10,16 @@ Use machine learning to detect anomalies in time series data.
* <<ml-api-definitions, Definitions>>

[[ml-api-datafeed-endpoint]]
=== Data Feeds

* <<ml-put-datafeed,Create data feeds>>
* <<ml-delete-datafeed,Delete data feeds>>
* <<ml-get-datafeed,Get data feed details>>
* <<ml-get-datafeed-stats,Get data feed statistics>>
* <<ml-preview-datafeed,Preview data feeds>>
* <<ml-start-datafeed,Start data feeds>>
* <<ml-stop-datafeed,Stop data feeds>>
* <<ml-update-datafeed,Update data feeds>>

include::ml/put-datafeed.asciidoc[]
include::ml/delete-datafeed.asciidoc[]

@ -3,8 +3,63 @@

A data feed resource has the following properties:

`aggregations`::
(+object+) TBD. The aggregations that are applied to the search query?
For more information, see {ref}/search-aggregations.html[Aggregations].
For example:
`{"@timestamp": {"histogram": {"field": "@timestamp", "interval": 30000, "offset": 0, "order": {"_key": "asc"}, "keyed": false, "min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {"field": "events_per_min"}}}}}`

`datafeed_id`::
(+string+) A numerical character string that uniquely identifies the data feed.

`frequency`::
(+time+) TBD. For example: "150s"

`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]

`job_id` (required)::
(+string+) A numerical character string that uniquely identifies the job.

`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.

`query_delay`::
TBD. For example: "60s"

`scroll_size`::
TBD. The maximum number of hits to be returned with each batch of search results?
The default value is `1000`.

`types` (required)::
(+array+) TBD. For example: ["network","sql","kpi"]

[[ml-datafeed-counts]]
==== Data Feed Counts

The get data feed statistics API provides information about the operational
progress of a data feed. For example:

`assignment_explanation`::
TBD. For example: ""

`node`::
(+object+) TBD. The node that is running the query?
For example: `{"id": "0-o0tOoRTwKFZifatTWKNw", "name": "0-o0tOo", "ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg", "transport_address": "127.0.0.1:9300", "attributes": {"max_running_jobs": "10"}}`

`state`::
(+string+) The status of the data feed, which can be one of the following values:
* `started`: The data feed is actively receiving data.
* `stopped`: The data feed is stopped and will not receive data until it is re-started.

@ -7,25 +7,14 @@ The delete data feed API allows you to delete an existing data feed.

`DELETE _xpack/ml/datafeeds/<feed_id>`

===== Description

NOTE: You must stop the data feed before you can delete it.

===== Path Parameters

`feed_id` (required)::
(+string+) Identifier for the data feed
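
===== Examples

The following sketch deletes the `datafeed-it-ops-kpi` data feed (an identifier
borrowed from the create example later in this document; substitute your own
feed ID):

[source,js]
--------------------------------------------------
DELETE _xpack/ml/datafeeds/datafeed-it-ops-kpi
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]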

////
===== Responses

@ -1,7 +1,8 @@
[[ml-get-datafeed-stats]]
==== Get Data Feed Statistics

The get data feed statistics API allows you to retrieve usage information for
data feeds.

===== Request

@ -9,47 +10,40 @@ The get data feed statistics API allows you to retrieve usage information for da

`GET _xpack/ml/datafeeds/<feed_id>/_stats`

===== Description

If the data feed is stopped, the only information you receive is the
`datafeed_id` and the `state`.

===== Path Parameters

`feed_id`::
(+string+) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds that you have authority to view.
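
===== Examples

A minimal request sketch (the `datafeed-farequote` identifier is borrowed from
the commented example results below):

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-farequote/_stats
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

For a stopped data feed, the response would contain only the identifier and the
state, along the lines of:

----
{
  "count": 1,
  "datafeeds": [
    {
      "datafeed_id": "datafeed-farequote",
      "state": "stopped"
    }
  ]
}
----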

////
===== Results

The API returns the following usage information:

`assignment_explanation`::
TBD. For example: ""

`datafeed_id`::
(+string+) A numerical character string that uniquely identifies the data feed.

`node`::
(+object+) TBD

`state`::
(+string+) The status of the data feed, which can be one of the following values:
* `started`: The data feed is actively receiving data.
* `stopped`: The data feed is stopped and will not receive data until it is re-started.
//failed?
////
===== Responses

200
@ -58,48 +52,28 @@ The API returns the following usage information:
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)

////
===== Examples

.Example results for a started data feed
----
{
  "count": 1,
  "datafeeds": [
    {
      "datafeed_id": "datafeed-farequote",
      "state": "started",
      "node": {
        "id": "0-o0tOoRTwKFZifatTWKNw",
        "name": "0-o0tOo",
        "ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg",
        "transport_address": "127.0.0.1:9300",
        "attributes": {
          "max_running_jobs": "10"
        }
      },
      "assignment_explanation": ""
    }
  ]
}
----
////

@ -1,7 +1,8 @@
[[ml-get-datafeed]]
==== Get Data Feeds

The get data feeds API allows you to retrieve configuration information for
data feeds.

===== Request

@ -16,19 +17,19 @@ OUTDATED?: The get job API can also be applied to all jobs by using `_all` as th
===== Path Parameters

`feed_id`::
(+string+) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds that you have authority to view.

===== Results

The API returns information about the data feed resource.
For more information, see <<ml-datafeed-resource,data feed resources>>.
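
===== Examples

A minimal request sketch, reusing the `datafeed-it-ops-kpi` identifier from the
create example later in this document:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-it-ops-kpi
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]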

////
===== Query Parameters

None

===== Responses

@ -27,8 +27,7 @@ The API returns information about the job resource. For more information, see
////
===== Query Parameters

None

===== Responses

@ -52,8 +52,8 @@ An analysis configuration object has the following properties:
Requires `period` to be specified
////

`bucket_span` (required)::
(+unsigned integer+) The size of the interval that the analysis is aggregated into, measured in seconds.
The default value is 300 seconds (5 minutes).

`categorization_field_name`::
@ -69,8 +69,8 @@ An analysis configuration object has the following properties:
that should not be taken into consideration for defining categories.
For example, you can exclude SQL statements that appear in your log files.

`detectors` (required)::
(+array+) An array of detector configuration objects,
which describe the anomaly detectors that are used in the job.
See <<ml-detectorconfig,detector configuration objects>>.

@ -154,8 +154,8 @@ Each detector has the following properties:

NOTE: The `field_name` cannot contain double quotes or backslashes.

`function` (required)::
(+string+) The analysis function that is used.
For example, `count`, `rare`, `mean`, `min`, `max`, and `sum`.
The default function is `metric`, which looks for anomalies in all of `min`, `max`,
and `mean`.
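
For example, a minimal detector sketch (the function comes from the list above;
the field name is hypothetical):

----
{
  "function": "max",
  "field_name": "responsetime"
}
----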
@ -1,63 +0,0 @@
[[ml-open-job]]
==== Open Jobs

An anomaly detection job must be opened in order for it to be ready to receive and analyze data.
A job may be opened and closed multiple times throughout its lifecycle.

===== Request

`POST _xpack/ml/anomaly_detectors/<job_id>/_open`

===== Description

A job must be open in order for it to accept and analyze data.

When you open a new job, it starts with an empty model.

When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.

===== Path Parameters

`job_id` (required)::
(+string+) Identifier for the job

===== Request Body

`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened.

`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly.

////
===== Responses

200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples

The following example opens the `event_rate` job:

[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
  "ignore_downtime": false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the job opens, you receive the following results:
----
{
  "opened": true
}
----
@ -20,7 +20,7 @@ The job is ready to resume its analysis from where it left off, once new data is
===== Path Parameters

`job_id` (required)::
(+string+) Identifier for the job

===== Request Body

@ -7,12 +7,13 @@ The preview data feed API allows you to preview a data feed.

`GET _xpack/ml/datafeeds/<feed_id>/_preview`

////
===== Description

TBD
////
//How much data does it return?
The API returns example data by using the current data feed settings.

===== Path Parameters

`feed_id` (required)::
@ -21,25 +22,7 @@ data is sent to it.
////
===== Request Body

None

===== Responses

@ -52,33 +35,33 @@ TBD
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples

The following example obtains a preview of the `datafeed-farequote` data feed:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-farequote/_preview
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

The data that is returned for this example is as follows:
----
[
  {
    "@timestamp": 1454803200000,
    "responsetime": 132.20460510253906
  },
  {
    "@timestamp": 1454803200000,
    "responsetime": 990.4628295898438
  },
  {
    "@timestamp": 1454803200000,
    "responsetime": 877.5927124023438
  },
  ...
]
----

@ -1,43 +1,57 @@
[[ml-put-datafeed]]
==== Create Data Feeds

The create data feed API enables you to instantiate a data feed.

===== Request

`PUT _xpack/ml/datafeeds/<feed_id>`

===== Description

You must create a job before you create a data feed. You can associate only one
data feed with each job.

===== Path Parameters

`feed_id` (required)::
(+string+) A numerical character string that uniquely identifies the data feed.

===== Request Body

`aggregations`::
(+object+) TBD. For example:
`{"@timestamp": {"histogram": {"field": "@timestamp", "interval": 30000, "offset": 0, "order": {"_key": "asc"}, "keyed": false, "min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {"field": "events_per_min"}}}}}`

`frequency`::
TBD. For example: "150s"

`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]

`job_id` (required)::
(+string+) A numerical character string that uniquely identifies the job.

`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.

`query_delay`::
TBD. For example: "60s"

`scroll_size`::
TBD. For example, 1000

`types` (required)::
TBD. For example: ["network","sql","kpi"]

For more information about these properties,
see <<ml-datafeed-resource, Data Feed Resources>>.

////
===== Responses

TBD
@ -48,62 +62,55 @@ TBD
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples

The following example creates the `datafeed-it-ops-kpi` data feed:

[source,js]
--------------------------------------------------
PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
{
  "job_id": "it-ops-kpi",
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "indexes": [
    "it_ops_metrics"
  ],
  "types": [
    "kpi",
    "sql",
    "network"
  ]
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the data feed is created, you receive the following results:
----
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  "query_delay": "1m",
  "indexes": [
    "it_ops_metrics"
  ],
  "types": [
    "kpi",
    "sql",
    "network"
  ],
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "scroll_size": 1000
}
----

@ -1,7 +1,7 @@
[[ml-put-job]]
==== Create Jobs

The create job API enables you to instantiate a {ml} job.

===== Request

@ -1,37 +1,73 @@
[[ml-start-datafeed]]
==== Start Data Feeds

A data feed must be started in order to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.

===== Request

`POST _xpack/ml/datafeeds/<feed_id>/_start`

===== Description

When you start a data feed, you can specify a start time. This allows you to
include a training period, provided you have this data available in {es}.
If you want to analyze from the beginning of a dataset, you can specify any date
earlier than that beginning date.

If you do not specify a start time and the data feed is associated with a new
job, the analysis starts from the earliest time for which data is available.

When you start a data feed, you can also specify an end time. If you do so, the
job analyzes data from the start time until the end time, at which point the
analysis stops. This scenario is useful for a one-off batch analysis. If you
do not specify an end time, the data feed runs continuously.

If the system restarts, any jobs that had data feeds running are also restarted.

When a stopped data feed is restarted, it continues processing input data from
the next millisecond after it was stopped. If your data contains the same
timestamp (for example, it is summarized by minute), then data loss is possible
for the timestamp value when the data feed stopped. This situation can occur
because the job might not have completely processed all data for that millisecond.
If you specify a `start` value that is earlier than the timestamp of the latest
processed record, that value is ignored.

NOTE: Before you can start a data feed, the job must be open. Otherwise, an error
occurs.
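
For example, a sketch of a one-off batch analysis over a fixed window (the feed
name and times are illustrative; `start` and `end` are described in the request
body section below):

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-01-01T00:00:00Z",
  "end": "2017-02-01T00:00:00Z"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]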

===== Path Parameters

`feed_id` (required)::
(+string+) Identifier for the data feed

===== Request Body

`end`::
(+string+) The time that the data feed should end. This value is exclusive.
The default value is an empty string.

`start`::
(+string+) The time that the data feed should begin. This value is inclusive.
The default value is an empty string.

The `start` and `end` times can be specified by using one of the
following formats:

* ISO 8601 format with milliseconds, for example `2017-01-22T06:00:00.000Z`
* ISO 8601 format without milliseconds, for example `2017-01-22T06:00:00+00:00`
* Seconds from the Epoch, for example `1390370400`

NOTE: When a URL is expected (for example, in browsers), the `+` used in time
zone designators has to be encoded as `%2B`.

Date-time arguments using either of the ISO 8601 formats must have a time zone
designator, where Z is accepted as an abbreviation for UTC time.

`timeout`::
(+time+) Controls the amount of time to wait until a data feed starts.
The default value is 20 seconds.

////
===== Responses

200
@ -40,16 +76,16 @@ The job is ready to resume its analysis from where it left off, once new data is
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples

The following example starts the `datafeed-it-ops-kpi` data feed:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-04-07T18:22:16Z"
}
--------------------------------------------------
// CONSOLE
@ -58,7 +94,6 @@ POST _xpack/ml/anomaly_detectors/event_rate/_open
// TEST[skip:todo]

When the data feed starts, you receive the following results:
----
{
  "started": true
}
----

@ -1,6 +1,7 @@
[[ml-stop-datafeed]]
==== Stop Data Feeds

A data feed that is stopped ceases to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.

===== Request

@ -10,31 +11,23 @@ A data feed can be opened and closed multiple times throughout its lifecycle.
////
===== Description

TBD
////
===== Path Parameters

`feed_id` (required)::
(+string+) Identifier for the data feed

===== Request Body

`force`::
(+boolean+) If true, the data feed is stopped forcefully.

`timeout`::
(+time+) Controls the amount of time to wait until a data feed stops.
The default value is 20 seconds.

////
===== Responses

200
@ -43,22 +36,24 @@ data and analysis operations, however you can still explore and navigate results
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples

The following example stops the `datafeed-it-ops-kpi` data feed:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_stop
{
  "timeout": "30s"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the data feed stops, you receive the following results:
----
{
  "stopped": true
}
----

@ -10,75 +10,122 @@ The update data feed API allows you to update certain properties of a data feed.

////
===== Description

TBD
////
===== Path Parameters

`feed_id` (required)::
(+string+) Identifier for the data feed

===== Request Body

The following properties can be updated after the data feed is created:

`aggregations`::
(+object+) TBD.

`frequency`::
TBD. For example: "150s"

`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]

`job_id`::
(+string+) A numerical character string that uniquely identifies the job.

`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.

`query_delay`::
TBD. For example: "60s"

`scroll_size`::
TBD. For example, 1000

`types` (required)::
TBD. For example: ["network","sql","kpi"]

For more information about these properties,
see <<ml-datafeed-resource, Data Feed Resources>>.

////
===== Responses

TBD

200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples

The following example updates the `datafeed-it-ops-kpi3` data feed:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi3/_update
{
  "aggregations": {
    "@timestamp": {
      "histogram": {
        "field": "@timestamp",
        "interval": 30000,
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 0
      },
      "aggregations": {
        "events_per_min": {
          "sum": {
            "field": "events_per_min"
          }
        }
      }
    }
  },
  "frequency": "160s"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]

When the data feed is updated, you receive the following results:
----
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  ...
  "query_delay": "1m",
  "frequency": "160s",
  ...
  "aggregations": {
    "@timestamp": {
      "histogram": {
        "field": "@timestamp",
        "interval": 30000,
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 0
      },
      "aggregations": {
        "events_per_min": {
          "sum": {
            "field": "events_per_min"
          }
        }
      }
    }
  },
  "scroll_size": 1000
}
----