[DOCS] Add ML data feed API examples (elastic/x-pack-elasticsearch#1016)

* [DOCS] Added examples for all ML job APIs

* [DOCS] Add ML datafeed API examples

Original commit: elastic/x-pack-elasticsearch@9634356371
This commit is contained in:
Lisa Cawley 2017-04-10 08:59:27 -07:00 committed by lcawley
parent 00bc35cf9f
commit 90575b18f4
16 changed files with 409 additions and 338 deletions

View File

@ -5,8 +5,8 @@ Machine learning in {xpack} automates the analysis of time-series data by
creating accurate baselines of normal behaviors in the data, and identifying
anomalous patterns in that data.
Driven by proprietary machine learning algorithms, anomalies related to temporal
deviations in values/counts/frequencies, statistical rarity, and unusual
Driven by proprietary machine learning algorithms, anomalies related to
temporal deviations in values/counts/frequencies, statistical rarity, and unusual
behaviors for a member of a population are detected, scored and linked with
statistically significant influencers in the data.
@ -15,12 +15,52 @@ that you don't need to specify algorithms, models, or other data
science-related configurations in order to get the benefits of {ml}.
//image::graph-network.jpg["Graph network"]
[float]
=== Integration with the Elastic Stack
Machine learning is tightly integrated with the Elastic Stack.
Data is pulled from {es} for analysis and anomaly results are displayed in
{kb} dashboards.
[float]
[[ml-concepts]]
=== Basic Concepts
There are a few concepts that are core to {ml} in {xpack}.
Understanding these concepts from the outset will greatly ease the
learning process.
Jobs::
Machine learning jobs contain the configuration information and metadata
necessary to perform an analytics task. For a list of the properties associated
with a job, see <<ml-job-resource, Job Resources>>.
Data feeds::
Jobs can analyze either a batch of data from a data store or a stream of data
in real time. The latter involves data that is retrieved from {es} and is
referred to as a _data feed_.
Detectors::
Part of the configuration information associated with a job, detectors define
the type of analysis that needs to be done (for example, max, average, rare).
They also specify which fields to analyze. You can have more than one detector
in a job, which is more efficient than running multiple jobs against the same
data stream. For a list of the properties associated with detectors, see
<<ml-detectorconfig, Detector Configuration Objects>>.
Buckets::
Part of the configuration information associated with a job, the _bucket span_
defines the time interval across which the job analyzes data. When setting the
bucket span, take into account the granularity at which you want to analyze,
the frequency of the input data, and the frequency at which alerting is required.
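
To make these concepts concrete, a minimal job configuration might combine a
bucket span with a single detector. The sketch below is illustrative only: the
job name `example-job` is hypothetical, and the detector and data description
values are borrowed from the create job example elsewhere in these docs.

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/example-job
{
  "description": "Example job with one detector and 5 minute buckets",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "low_sum",
        "field_name": "events_per_min"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
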
//[float]
//== Where to Go Next

View File

@ -10,7 +10,16 @@ Use machine learning to detect anomalies in time series data.
* <<ml-api-definitions, Definitions>>
[[ml-api-datafeed-endpoint]]
=== Datafeeds
=== Data Feeds
* <<ml-put-datafeed,Create data feeds>>
* <<ml-delete-datafeed,Delete data feeds>>
* <<ml-get-datafeed,Get data feed details>>
* <<ml-get-datafeed-stats,Get data feed statistics>>
* <<ml-preview-datafeed,Preview data feeds>>
* <<ml-start-datafeed,Start data feeds>>
* <<ml-stop-datafeed,Stop data feeds>>
* <<ml-update-datafeed,Update data feeds>>
include::ml/put-datafeed.asciidoc[]
include::ml/delete-datafeed.asciidoc[]

View File

@ -3,8 +3,63 @@
A data feed resource has the following properties:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
////
`aggregations`::
(+object+) TBD. The aggregations object describes the aggregations that are
applied to the search query.
For more information, see {ref}/search-aggregations.html[Aggregations].
For example:
`{"@timestamp": {"histogram": {"field": "@timestamp",
"interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false,
"min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {
"field": "events_per_min"}}}}}`.
`datafeed_id`::
(+string+) A numerical character string that uniquely identifies the data feed.
`frequency`::
TBD. A time interval. For example: "150s"
`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]
`job_id` (required)::
(+string+) A numerical character string that uniquely identifies the job.
`query`::
(+object+) TBD. The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.
`query_delay`::
TBD. For example: "60s"
`scroll_size`::
TBD.
The maximum number of hits to be returned with each batch of search results?
The default value is `1000`.
`types` (required)::
(+array+) TBD. For example: ["network","sql","kpi"]
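
Taken together, a data feed resource might look like the following sketch. The
values are borrowed from the create data feed example; the optional
`aggregations` and `frequency` properties are omitted for brevity.

----
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  "query_delay": "1m",
  "indexes": [ "it_ops_metrics" ],
  "types": [ "kpi", "sql", "network" ],
  "query": {
    "match_all": { "boost": 1 }
  },
  "scroll_size": 1000
}
----
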
[[ml-datafeed-counts]]
==== Data Feed Counts
The get data feed statistics API provides information about the operational
progress of a data feed. For example:
`assigment_explanation`::
TBD
For example: ""
`node`::
(+object+) TBD
The node that is running the query?
For example: `{"id": "0-o0tOoRTwKFZifatTWKNw","name": "0-o0tOo",
"ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg","transport_address": "127.0.0.1:9300",
"attributes": {"max_running_jobs": "10"}}
`state`::
(+string+) The status of the data feed,
which can be one of the following values:
* `started`: The data feed is actively receiving data.
* `stopped`: The data feed is stopped and will not receive data until it is re-started.
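
For example, the statistics for a started data feed might look like the
following trimmed sketch (the values are borrowed from the get data feed
statistics example):

----
{
  "datafeed_id": "datafeed-farequote",
  "state": "started",
  "node": {
    "id": "0-o0tOoRTwKFZifatTWKNw",
    "name": "0-o0tOo",
    "transport_address": "127.0.0.1:9300"
  }
}
----
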

View File

@ -7,25 +7,14 @@ The delete data feed API allows you to delete an existing data feed.
`DELETE _xpack/ml/datafeeds/<feed_id>`
////
===== Description
All job configuration, model state and results are deleted.
NOTE: You must stop the data feed before you can delete it.
IMPORTANT: Deleting a job must be done via this API only. Do not delete the
job directly from the `.ml-*` indices using the Elasticsearch
DELETE Document API. When {security} is enabled, make sure no `write`
privileges are granted to anyone over the `.ml-*` indices.
Before you can delete a job, you must delete the data feeds that are associated with it.
//See <<>>.
It is not currently possible to delete multiple jobs using wildcards or a comma-separated list.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
(+string+) Identifier for the data feed
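
For example, the following sketch deletes the `datafeed-it-ops-kpi` data feed
that is created in the create data feed example (the data feed must be stopped
first):

[source,js]
--------------------------------------------------
DELETE _xpack/ml/datafeeds/datafeed-it-ops-kpi
--------------------------------------------------
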
////
===== Responses

View File

@ -1,7 +1,8 @@
[[ml-get-datafeed-stats]]
==== Get Data Feed Statistics
The get data feed statistics API allows you to retrieve usage information for data feeds.
The get data feed statistics API allows you to retrieve usage information for
data feeds.
===== Request
@ -9,47 +10,40 @@ The get data feed statistics API allows you to retrieve usage information for da
`GET _xpack/ml/datafeeds/<feed_id>/_stats`
////
===== Description
TBD
////
If the data feed is stopped, the only information you receive is the
`datafeed_id` and the `state`.
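
For example, a stopped data feed might be reported with only those two fields,
roughly as follows:

----
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "state": "stopped"
}
----
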
===== Path Parameters
`feed_id`::
(+string+) Identifier for the data feed. If you do not specify this optional parameter,
the API returns information about all data feeds that you have authority to view.
(+string+) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds that you have authority to view.
////
===== Results
The API returns the following usage information:
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
`assigment_explanation`::
TBD
For example: ""
`data_counts`::
(+object+) An object that describes the number of records processed and any related error counts.
See <<ml-datacounts,data counts objects>>.
`datafeed_id`::
(+string+) A numerical character string that uniquely identifies the data feed.
`model_size_stats`::
(+object+) An object that provides information about the size and contents of the model.
See <<ml-modelsizestats,model size stats objects>>
`node`::
(+object+) TBD
`state`::
(+string+) The status of the job, which can be one of the following values:
running:: The job is actively receiving and processing data.
closed:: The job finished successfully with its model state persisted.
The job is still available to accept further data. NOTE: If you send data in a periodic cycle
and close the job at the end of each transaction, the job is marked as closed in the intervals
between when data is sent. For example, if data is sent every minute and it takes 1 second to process,
the job has a closed state for 59 seconds.
failed:: The job did not finish successfully due to an error. NOTE: This can occur due to invalid input data.
In this case, sending corrected data to a failed job re-opens the job and resets it to a running state.
(+string+) The status of the data feed, which can be one of the following values:
* `started`: The data feed is actively receiving data.
* `stopped`: The data feed is stopped and will not receive data until
it is re-started.
//failed?
////
===== Responses
200
@ -58,48 +52,28 @@ The API returns the following usage information:
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
.Example results for a single job
.Example results for a started job
----
{
"count": 1,
"jobs": [
"datafeeds": [
{
"job_id": "it-ops-kpi",
"data_counts": {
"job_id": "it-ops",
"processed_record_count": 43272,
"processed_field_count": 86544,
"input_bytes": 2846163,
"input_field_count": 86544,
"invalid_date_count": 0,
"missing_field_count": 0,
"out_of_order_timestamp_count": 0,
"empty_bucket_count": 0,
"sparse_bucket_count": 0,
"bucket_count": 4329,
"earliest_record_timestamp": 1454020560000,
"latest_record_timestamp": 1455318900000,
"last_data_time": 1491235405945,
"input_record_count": 43272
"datafeed_id": "datafeed-farequote",
"state": "started",
"node": {
"id": "0-o0tOoRTwKFZifatTWKNw",
"name": "0-o0tOo",
"ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg",
"transport_address": "127.0.0.1:9300",
"attributes": {
"max_running_jobs": "10"
}
},
"model_size_stats": {
"job_id": "it-ops",
"result_type": "model_size_stats",
"model_bytes": 25586,
"total_by_field_count": 3,
"total_over_field_count": 0,
"total_partition_field_count": 2,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1491235406000,
"timestamp": 1455318600000
},
"state": "closed"
"assigment_explanation": ""
}
]
}
----
////

View File

@ -1,7 +1,8 @@
[[ml-get-datafeed]]
==== Get Data Feeds
The get data feeds API allows you to retrieve configuration information about data feeds.
The get data feeds API allows you to retrieve configuration information for
data feeds.
===== Request
@ -16,19 +17,19 @@ OUTDATED?: The get job API can also be applied to all jobs by using `_all` as th
===== Path Parameters
`feed_id`::
(+string+) Identifier for the data feed. If you do not specify this optional parameter,
the API returns information about all data feeds that you have authority to view.
(+string+) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds that you have authority to view.
===== Results
The API returns information about the data feed resource.
//For more information, see <<ml-job-resource,job resources>>.
For more information, see <<ml-datafeed-resource,data feed resources>>.
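
For example, assuming the `GET _xpack/ml/datafeeds/<feed_id>` form of the
request, you could retrieve the configuration of the data feed created in the
create data feed example as follows:

[source,js]
--------------------------------------------------
GET _xpack/ml/datafeeds/datafeed-it-ops-kpi
--------------------------------------------------
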
////
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
None
===== Responses

View File

@ -27,8 +27,7 @@ The API returns information about the job resource. For more information, see
////
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
None
===== Responses

View File

@ -52,8 +52,8 @@ An analysis configuration object has the following properties:
Requires `period` to be specified
////
`bucket_span`::
(+unsigned integer+, required) The size of the interval that the analysis is aggregated into, measured in seconds.
`bucket_span` (required)::
(+unsigned integer+) The size of the interval that the analysis is aggregated into, measured in seconds.
The default value is 300 seconds (5 minutes).
`categorization_field_name`::
@ -69,8 +69,8 @@ An analysis configuration object has the following properties:
that should not be taken into consideration for defining categories.
For example, you can exclude SQL statements that appear in your log files.
`detectors`::
(+array+, required) An array of detector configuration objects,
`detectors` (required)::
(+array+) An array of detector configuration objects,
which describe the anomaly detectors that are used in the job.
See <<ml-detectorconfig,detector configuration objects>>.
@ -154,8 +154,8 @@ Each detector has the following properties:
NOTE: The `field_name` cannot contain double quotes or backslashes.
`function`::
(+string+, required) The analysis function that is used.
`function` (required)::
(+string+) The analysis function that is used.
For example, `count`, `rare`, `mean`, `min`, `max`, and `sum`.
The default function is `metric`, which looks for anomalies in all of `min`, `max`,
and `mean`.
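
For instance, a single detector that looks for unusually low totals of an
`events_per_min` field might be declared as follows (the values are borrowed
from the create job example):

----
{
  "detector_description": "low_sum(events_per_min)",
  "function": "low_sum",
  "field_name": "events_per_min"
}
----
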

View File

@ -1,63 +0,0 @@
[[ml-open-job]]
==== Open Jobs
An anomaly detection job must be opened in order for it to be ready to receive and analyze data.
A job may be opened and closed multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/_open`
===== Description
A job must be open in order for it to accept and analyze data.
When you open a new job, it starts with an empty model.
When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
===== Request Body
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example opens the `event_rate` job:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
{
"ignore_downtime":false
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job opens, you receive the following results:
----
{
"opened": true
}
----

View File

@ -20,7 +20,7 @@ The job is ready to resume its analysis from where it left off, once new data is
===== Path Parameters
`job_id` (required)::
(+string+) Identifier for the job
(+string+) Identifier for the job
===== Request Body

View File

@ -7,12 +7,13 @@ The preview data feed API allows you to preview a data feed.
`GET _xpack/ml/datafeeds/<feed_id>/_preview`
////
===== Description
Important:: Updates do not take effect until after the job is closed and new
data is sent to it.
////
TBD
//How much data does it return?
The API returns example data by using the current data feed settings.
===== Path Parameters
`feed_id` (required)::
@ -21,25 +22,7 @@ data is sent to it.
////
===== Request Body
The following properties can be updated after the job is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
None
===== Responses
@ -52,33 +35,33 @@ TBD
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `it-ops-kpi` job:
The following example obtains a preview of the `datafeed-farequote` data feed:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
}
GET _xpack/ml/datafeeds/datafeed-farequote/_preview
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
The data that is returned for this example is as follows:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
[
{
"@timestamp": 1454803200000,
"responsetime": 132.20460510253906
},
{
"@timestamp": 1454803200000,
"responsetime": 990.4628295898438
},
{
"@timestamp": 1454803200000,
"responsetime": 877.5927124023438
},
...
"analysis_limits": {
"model_memory_limit": 8192
...
}
]
----
////

View File

@ -1,43 +1,57 @@
[[ml-put-datafeed]]
==== Create Data Feeds
The create data feed API allows you to instantiate a data feed.
The create data feed API enables you to instantiate a data feed.
===== Request
`PUT _xpack/ml/datafeeds/<feed_id>`
////
===== Description
TBD
////
You must create a job before you create a data feed. You can associate only one
data feed with each job.
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
(+string+) A numerical character string that uniquely identifies the data feed.
////
===== Request Body
`aggregations`::
(+object+) TBD. For example: {"@timestamp": {"histogram": {"field": "@timestamp",
"interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false,
"min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {
"field": "events_per_min"}}}}}
`description`::
(+string+) An optional description of the job.
`frequency`::
TBD. For example: "150s"
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>.
`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]
`data_description`::
(+object+) Describes the format of the input data.
See <<ml-datadescription,data description objects>>.
`job_id` (required)::
(+string+) A numerical character string that uniquely identifies the job.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.
`query_delay`::
TBD. For example: "60s"
This expects data to be sent in JSON format using the POST `_data` API.
`scroll_size`::
TBD. For example, 1000
`types` (required)::
TBD. For example: ["network","sql","kpi"]
For more information about these properties,
see <<ml-datafeed-resource, Data Feed Resources>>.
////
===== Responses
TBD
@ -48,62 +62,55 @@ TBD
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example creates the `it-ops-kpi` job:
The following example creates the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi
PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
{
"description":"First simple job",
"analysis_config":{
"bucket_span": "5m",
"latency": "0ms",
"detectors":[
{
"detector_description": "low_sum(events_per_min)",
"function":"low_sum",
"field_name": "events_per_min"
}
"job_id": "it-ops-kpi",
"query":
{
"match_all":
{
"boost": 1
}
},
"indexes": [
"it_ops_metrics"
],
"types": [
"kpi",
"sql",
"network"
]
},
"data_description": {
"time_field":"@timestamp",
"time_format":"epoch_ms"
}
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is created, you receive the following results:
When the data feed is created, you receive the following results:
----
{
"datafeed_id": "datafeed-it-ops-kpi",
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491247016391,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
"query_delay": "1m",
"indexes": [
"it_ops_metrics"
],
"types": [
"kpi",
"sql",
"network"
],
"query": {
"match_all": {
"boost": 1
}
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_snapshot_retention_days": 1,
"results_index_name": "shared"
"scroll_size": 1000
}
----
////

View File

@ -1,7 +1,7 @@
[[ml-put-job]]
==== Create Jobs
The create job API allows you to instantiate a {ml} job.
The create job API enables you to instantiate a {ml} job.
===== Request

View File

@ -1,37 +1,73 @@
[[ml-start-datafeed]]
==== Start Data Feeds
A data feed must be started in order for it to be ready to receive and analyze data.
A data feed must be started in order to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/datafeeds/<feed_id>/_start`
////
===== Description
A job must be open in order for it to accept and analyze data.
When you start a data feed, you can specify a start time. This allows you to
include a training period, providing you have this data available in {es}.
If you want to analyze from the beginning of a dataset, you can specify any date
earlier than that beginning date.
When you open a new job, it starts with an empty model.
If you do not specify a start time and the data feed is associated with a new
job, the analysis starts from the earliest time for which data is available.
When you start a data feed, you can also specify an end time. If you do so, the
job analyzes data from the start time until the end time, at which point the
analysis stops. This scenario is useful for a one-off batch analysis. If you
do not specify an end time, the data feed runs continuously.
If the system restarts, any jobs that had data feeds running are also restarted.
When a stopped data feed is restarted, it continues processing input data from
the next millisecond after it was stopped. If your data contains the same
timestamp (for example, it is summarized by minute), then data loss is possible
for the timestamp value when the data feed stopped. This situation can occur
because the job might not have completely processed all data for that millisecond.
If you specify a `start` value that is earlier than the timestamp of the latest
processed record, that value is ignored.
NOTE: Before you can start a data feed, the job must be open. Otherwise, an error
occurs.
When you open an existing job, the most recent model state is automatically loaded.
The job is ready to resume its analysis from where it left off, once new data is received.
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
(+string+) Identifier for the data feed
===== Request Body
`open_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has opened
`end`::
(+string+) The time that the data feed should end. This value is exclusive.
The default value is an empty string.
`ignore_downtime`::
(+boolean+; default: ++true++) If true (default), any gap in data since it was
last closed is treated as a maintenance window. That is to say, it is not an anomaly
`start`::
(+string+) The time that the data feed should begin. This value is inclusive.
The default value is an empty string.
These `start` and `end` times can be specified by using one of the
following formats:
* ISO 8601 format with milliseconds, for example `2017-01-22T06:00:00.000Z`
* ISO 8601 format without milliseconds, for example `2017-01-22T06:00:00+00:00`
* Seconds from the Epoch, for example `1390370400`
NOTE: When a URL is expected (for example, in browsers), the `+` used in time
zone designators has to be encoded as `%2B`.
Date-time arguments using either of the ISO 8601 formats must have a time zone
designator, where Z is accepted as an abbreviation for UTC time.
`timeout`::
(+time+) Controls the amount of time to wait until a data feed starts.
The default value is 20 seconds.
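
For example, a one-off batch analysis over a fixed period could supply both
times in ISO 8601 format. This sketch assumes the `datafeed-it-ops-kpi` data
feed shown in the example below:

[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-01-01T00:00:00Z",
  "end": "2017-02-01T00:00:00Z"
}
--------------------------------------------------
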
////
===== Responses
200
@ -40,16 +76,16 @@ The job is ready to resume its analysis from where it left off, once new data is
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example opens the `event_rate` job:
The following example starts the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_open
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
"ignore_downtime":false
"start": "2017-04-07T18:22:16Z"
}
--------------------------------------------------
// CONSOLE
@ -58,7 +94,6 @@ POST _xpack/ml/anomaly_detectors/event_rate/_open
When the data feed starts, you receive the following results:
----
{
"opened": true
"started": true
}
----
////

View File

@ -1,6 +1,7 @@
[[ml-stop-datafeed]]
==== Stop Data Feeds
A data feed that is stopped ceases to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request
@ -10,31 +11,23 @@ A data feed can be opened and closed multiple times throughout its lifecycle.
////
===== Description
A job can be closed once all data has been analyzed.
When you close a job, it runs housekeeping tasks such as pruning the model history,
flushing buffers, calculating final results and persisting the internal models.
Depending upon the size of the job, it could take several minutes to close and
the equivalent time to re-open.
Once closed, the anomaly detection job has almost no overhead on the cluster
(except for maintaining its meta data). A closed job is blocked for receiving
data and analysis operations, however you can still explore and navigate results.
//NOTE:
//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
TBD
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
(+string+) Identifier for the data feed
===== Request Body
`force`::
(+boolean+) If true, the data feed is stopped forcefully.
`timeout`::
(+time+) Controls the amount of time to wait until a data feed stops.
The default value is 20 seconds.
////
===== Query Parameters
`close_timeout`::
(+time+; default: ++30 min++) Controls the time to wait until a job has closed
===== Responses
200
@ -43,22 +36,24 @@ data and analysis operations, however you can still explore and navigate results
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example closes the `event_rate` job:
The following example stops the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
POST _xpack/ml/anomaly_detectors/event_rate/_close
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_stop
{
"timeout": "30s"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is closed, you receive the following results:
When the data feed stops, you receive the following results:
----
{
"closed": true
"stopped": true
}
----
////

View File

@ -10,75 +10,122 @@ The update data feed API allows you to update certain properties of a data feed.
////
===== Description
Important:: Updates do not take effect until after the job is closed and new
data is sent to it.
TBD
////
===== Path Parameters
`feed_id` (required)::
(+string+) Identifier for the data feed
////
===== Request Body
The following properties can be updated after the job is created:
The following properties can be updated after the data feed is created:
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`aggregations`::
(+object+) TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
`frequency`::
TBD. For example: "150s"
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`indexes` (required)::
(+array+) An array of index names. For example: ["it_ops_metrics"]
`description`::
(+string+) An optional description of the job.
`job_id`::
(+string+) A numerical character string that uniquely identifies the job.
This expects data to be sent in JSON format using the POST `_data` API.
`query`::
(+object+) The query that retrieves the data.
By default, this property has the following value: `{"match_all": {"boost": 1}}`.
`query_delay`::
TBD. For example: "60s"
`scroll_size`::
TBD. For example, 1000
`types` (required)::
TBD. For example: ["network","sql","kpi"]
For more information about these properties,
see <<ml-datafeed-resource, Data Feed Resources>>.
////
===== Responses
TBD
////
////
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
"aggregations": {
"@timestamp": {
"histogram": {
"field": "@timestamp",
"interval": 30000,
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"events_per_min": {
"sum": {
"field": "events_per_min"
}
}
}
}
},
"frequency": "160s"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
When the data feed is updated, you receive the following results:
----
{
"datafeed_id": "datafeed-it-ops-kpi",
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
"query_delay": "1m",
"frequency": "160s",
...
"aggregations": {
"@timestamp": {
"histogram": {
"field": "@timestamp",
"interval": 30000,
"offset": 0,
"order": {
"_key": "asc"
},
"keyed": false,
"min_doc_count": 0
},
"aggregations": {
"events_per_min": {
"sum": {
"field": "events_per_min"
}
}
}
}
},
"scroll_size": 1000
}
----
////