//lcawley Verified example output 2017-04
[[ml-start-datafeed]]
==== Start Data Feeds
A data feed must be started in order to retrieve data from {es}.
A data feed can be started and stopped multiple times throughout its lifecycle.
===== Request
`POST _xpack/ml/datafeeds/<feed_id>/_start`
===== Description
NOTE: Before you can start a data feed, the job must be open. Otherwise, an error
occurs.
When you start a data feed, you can specify a start time. This allows you to
include a training period, provided that this data is available in {es}.
If you want to analyze from the beginning of a dataset, you can specify any date
earlier than that beginning date.
If you do not specify a start time and the data feed is associated with a new
job, the analysis starts from the earliest time for which data is available.
When you start a data feed, you can also specify an end time. If you do so, the
job analyzes data from the start time until the end time, at which point the
analysis stops. This scenario is useful for a one-off batch analysis. If you
do not specify an end time, the data feed runs continuously.
The `start` and `end` times can be specified by using one of the
following formats: +
- ISO 8601 format with milliseconds, for example `2017-01-22T06:00:00.000Z`
- ISO 8601 format without milliseconds, for example `2017-01-22T06:00:00+00:00`
- Seconds from the Epoch, for example `1390370400`
Date-time arguments using either of the ISO 8601 formats must have a time zone
designator, where Z is accepted as an abbreviation for UTC time.
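As an illustration of the three formats listed above, the following Python
sketch renders one time zone aware `datetime` in each of them. The function
name `datafeed_time_formats` and the dictionary keys are hypothetical and are
not part of the API.

```python
from datetime import datetime, timezone


def datafeed_time_formats(moment: datetime) -> dict:
    """Render one timezone-aware datetime in the three accepted formats."""
    if moment.tzinfo is None:
        # ISO 8601 values must carry a time zone designator
        raise ValueError("a time zone designator is required")
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return {
        # ISO 8601 with milliseconds, using Z as the UTC designator
        "iso_millis": utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z",
        # ISO 8601 without milliseconds, with an explicit +00:00 offset
        "iso_seconds": utc.isoformat(timespec="seconds"),
        # Whole seconds since the Epoch
        "epoch_seconds": str(int(utc.timestamp())),
    }


formats = datafeed_time_formats(datetime(2017, 1, 22, 6, 0, 0, tzinfo=timezone.utc))
```

Rejecting naive datetimes mirrors the requirement that ISO 8601 values include
a time zone designator.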
NOTE: When a URL is expected (for example, in browsers), the `+` used in time
zone designators must be encoded as `%2B`.
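A minimal sketch of that encoding step, using Python's standard
`urllib.parse.quote`. The helper name `encode_time_for_url` is hypothetical,
and keeping `:` unencoded is a readability choice made here, not a requirement
stated by the API.

```python
from urllib.parse import quote


def encode_time_for_url(value: str) -> str:
    """Percent-encode a time value for use in a URL; "+" becomes "%2B"."""
    # ":" is left as-is for readability; "+" is not in the safe set
    return quote(value, safe=":")


encoded = encode_time_for_url("2017-01-22T06:00:00+00:00")
```

Values that use `Z` as the designator contain no `+` and pass through
unchanged.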
If the system restarts, any jobs that had data feeds running are also restarted.
When a stopped data feed is restarted, it continues processing input data from
the next millisecond after it was stopped. If several records share the same
timestamp (for example, the data is summarized by minute), data loss is possible
for the timestamp at which the data feed stopped. This situation can occur
because the job might not have completely processed all data for that millisecond.
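The following Python sketch models the restart behavior described above. It is
a simplified illustration, not the actual implementation: the function names,
the record layout, and the sample data are all hypothetical.

```python
def resume_timestamp_ms(stopped_at_ms: int) -> int:
    """Return the first timestamp read after a restart: the next millisecond."""
    return stopped_at_ms + 1


def records_read_after_restart(records, stopped_at_ms):
    """Keep only the records at or after the resume timestamp."""
    resume_at = resume_timestamp_ms(stopped_at_ms)
    return [r for r in records if r["timestamp_ms"] >= resume_at]


# Hypothetical data summarized by minute: two records share timestamp 60000.
records = [
    {"timestamp_ms": 60000, "host": "a"},
    {"timestamp_ms": 60000, "host": "b"},
    {"timestamp_ms": 120000, "host": "a"},
]

# Suppose the feed stopped at 60000 before the record for host "b" was processed.
remaining = records_read_after_restart(records, stopped_at_ms=60000)
```

In this model, the unprocessed record for host `b` is never read again, because
its timestamp falls before the resume point.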
If you specify a `start` value that is earlier than the timestamp of the latest
processed record, that value is ignored.
You must have `manage_ml` or `manage` cluster privileges to use this API.
For more information, see <<privileges-list-cluster>>.
===== Path Parameters
`feed_id` (required)::
(string) Identifier for the data feed.
===== Request Body
`end`::
(string) The time that the data feed should end. This value is exclusive.
The default value is an empty string.
`start`::
(string) The time that the data feed should begin. This value is inclusive.
The default value is an empty string.
`timeout`::
(time) Controls the amount of time to wait until a data feed starts.
The default value is 20 seconds.
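As a sketch of how the path parameter and the optional body fields fit
together, the following Python helper assembles the request without sending
it. The helper name `build_start_request` is hypothetical, and the `30s`
timeout string is an assumed time-unit notation rather than a value taken from
this page.

```python
import json


def build_start_request(feed_id, start=None, end=None, timeout=None):
    """Return the HTTP method, path, and JSON body for a start request."""
    path = f"_xpack/ml/datafeeds/{feed_id}/_start"
    # Only send the optional fields that the caller actually set
    fields = (("start", start), ("end", end), ("timeout", timeout))
    body = {name: value for name, value in fields if value is not None}
    return "POST", path, json.dumps(body)


# One-off batch analysis: inclusive start, exclusive end
method, path, body = build_start_request(
    "datafeed-it-ops-kpi",
    start="2017-04-07T18:22:16Z",
    end="2017-04-08T00:00:00Z",
    timeout="30s",  # assumed notation for a time value
)
```

Omitting `end` yields a body without that field, which corresponds to a data
feed that runs continuously.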
////
===== Responses
200
(EmptyResponse) The cluster has been successfully deleted
404
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example starts the `datafeed-it-ops-kpi` data feed:
[source,js]
--------------------------------------------------
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-04-07T18:22:16Z"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the data feed starts, you receive the following results:
[source,js]
----
{
  "started": true
}
----