[role="xpack"]
[[ml-datafeed-resource]]
=== {dfeed-cap} Resources

A {dfeed} resource has the following properties:

`aggregations`::
  (object) If set, the {dfeed} performs aggregation searches.
  Support for aggregations is limited; use them only with
  low-cardinality data. For more information, see
  {xpack-ref}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].

`chunking_config`::
  (object) Specifies how data searches are split into time chunks.
  See <<ml-datafeed-chunking-config>>.
  For example: `{"mode": "manual", "time_span": "3h"}`

`datafeed_id`::
 (string) A numerical character string that uniquely identifies the {dfeed}.
 This property is informational; you cannot change the identifier for existing
 {dfeeds}.

`frequency`::
  (time units) The interval at which scheduled queries are made while the
  {dfeed} runs in real time. The default value is either the bucket span for short
  bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
  span. For example: `150s`.

`indices`::
  (array) An array of index names. For example: `["it_ops_metrics"]`

`job_id`::
 (string) The unique identifier for the job to which the {dfeed} sends data.

`query`::
  (object) The {es} query domain-specific language (DSL). This value
  corresponds to the query object in an {es} search POST body. All the
  options that are supported by {es} can be used, as this object is
  passed verbatim to {es}. By default, this property has the following
  value: `{"match_all": {"boost": 1}}`.

`query_delay`::
  (time units) The number of seconds behind real time that data is queried. For
  example, if data from 10:04 a.m. might not be searchable in {es} until
  10:06 a.m., set this property to 120 seconds. The default value is randomly
  selected between `60s` and `120s`. This randomness improves the query
  performance when there are multiple jobs running on the same node.

`script_fields`::
  (object) Specifies scripts that evaluate custom expressions and return
  script fields to the {dfeed}.
  The <<ml-detectorconfig,detector configuration objects>> in a job can contain
  functions that use these script fields.
  For more information, see
  {xpack-ref}/ml-configuring-transform.html[Transforming Data With Script Fields].

`scroll_size`::
  (unsigned integer) The `size` parameter that is used in {es} searches.
  The default value is `1000`.

`types`::
  (array) A list of types to search for within the specified indices. For
  example: `[]`. This property is provided for backwards compatibility with
  releases earlier than 6.0.0. For more information, see <<removal-of-types>>.
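
As a rough illustration of how these properties fit together, the following
Python sketch assembles a {dfeed} configuration as a dictionary and prints it
as JSON. The helper name `build_datafeed_config` and the job identifier
`it-ops-kpi` are hypothetical and not part of any API. The defaults for
`query` and `scroll_size` are the ones documented above. `query_delay` is set
to `120s` to match the 10:04 a.m. example.

```python
import copy
import json

# Defaults documented above for `query` and `scroll_size`.
DEFAULT_QUERY = {"match_all": {"boost": 1}}
DEFAULT_SCROLL_SIZE = 1000


def build_datafeed_config(job_id, indices, **overrides):
    """Return a datafeed configuration dict (hypothetical helper)."""
    config = {
        "job_id": job_id,
        "indices": list(indices),
        "query": copy.deepcopy(DEFAULT_QUERY),
        "scroll_size": DEFAULT_SCROLL_SIZE,
    }
    # Optional properties such as frequency, query_delay, or
    # chunking_config are added only when the caller supplies them.
    config.update(overrides)
    return config


config = build_datafeed_config(
    "it-ops-kpi",  # hypothetical job identifier
    ["it_ops_metrics"],
    frequency="150s",
    query_delay="120s",
    chunking_config={"mode": "manual", "time_span": "3h"},
)
print(json.dumps(config, indent=2))
```

Properties that are left out of the dictionary, such as `aggregations` or
`script_fields`, are simply not configured in this sketch.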

[[ml-datafeed-chunking-config]]
==== Chunking Configuration Objects

{dfeeds-cap} might be required to search over long time periods, such as
several months or years. The search is split into time chunks to ensure that
the load on {es} is managed. The chunking configuration controls how the size
of these time chunks is calculated and is an advanced configuration option.

A chunking configuration object has the following properties:

`mode`::
  There are three available modes: +
  `auto`::: The chunk size will be dynamically calculated. This is the default
  and recommended value.
  `manual`::: Chunking will be applied according to the specified `time_span`.
  `off`::: No chunking will be applied.

`time_span`::
  (time units) The time span that each search queries.
  This setting is only applicable when the mode is set to `manual`.
  For example: `3h`.
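
To make the effect of `manual` mode concrete, the following Python sketch
splits a search period into consecutive chunks of a fixed `time_span`. It only
illustrates the idea and is not the algorithm that {dfeeds} use internally. The
function name `manual_chunks` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def manual_chunks(start, end, time_span):
    """Yield (chunk_start, chunk_end) pairs that cover [start, end)."""
    if time_span <= timedelta(0):
        raise ValueError("time_span must be positive")
    chunk_start = start
    while chunk_start < end:
        # The final chunk is shortened so that it never runs past `end`.
        chunk_end = min(chunk_start + time_span, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


# An 8-hour search period with a time_span of 3h.
chunks = list(
    manual_chunks(
        datetime(2018, 1, 1, 0, 0),
        datetime(2018, 1, 1, 8, 0),
        timedelta(hours=3),
    )
)
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

With these sample values the period is covered by two 3-hour chunks followed
by one 2-hour chunk.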

[float]
[[ml-datafeed-counts]]
==== {dfeed-cap} Counts

The get {dfeed} statistics API provides information about the operational
progress of a {dfeed}. All of these properties are informational; you cannot
update their values:

`assignment_explanation`::
  (string) For started {dfeeds} only, contains messages relating to the
  selection of a node.

`datafeed_id`::
 (string) A numerical character string that uniquely identifies the {dfeed}.

`node`::
  (object) The node upon which the {dfeed} is started. The {dfeed} and job will
  be on the same node.
  `id`::: The unique identifier of the node. For example,
  `0-o0tOoRTwKFZifatTWKNw`.
  `name`::: The node name. For example, `0-o0tOo`.
  `ephemeral_id`::: The node ephemeral ID.
  `transport_address`::: The host and port where transport connections are
  accepted. For example, `127.0.0.1:9300`.
  `attributes`::: The node attributes. For example, `{"ml.max_open_jobs": "10"}`.

`state`::
  (string) The status of the {dfeed}, which can be one of the following values: +
  `started`::: The {dfeed} is actively receiving data.
  `stopped`::: The {dfeed} is stopped and will not receive data until it is
  restarted.
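
As an illustration of how these properties might be consumed, the following
Python sketch turns one {dfeed} statistics entry into a single summary line.
The helper name `summarize_datafeed_stats` and the identifier
`datafeed-it-ops-kpi` are hypothetical. The sample entry is hand-written from
the example values above and omits `ephemeral_id`. The sketch also assumes
that the `node` object is absent when a {dfeed} is stopped.

```python
def summarize_datafeed_stats(stats):
    """Return a one-line summary of a datafeed statistics entry."""
    summary = "{}: {}".format(stats["datafeed_id"], stats["state"])
    # Assumption: `node` is only present while the datafeed is started.
    node = stats.get("node")
    if node:
        summary += " on {} ({})".format(node["name"], node["transport_address"])
    return summary


# Hand-written sample built from the example values in this section.
sample = {
    "datafeed_id": "datafeed-it-ops-kpi",  # hypothetical identifier
    "state": "started",
    "node": {
        "id": "0-o0tOoRTwKFZifatTWKNw",
        "name": "0-o0tOo",
        "transport_address": "127.0.0.1:9300",
        "attributes": {"ml.max_open_jobs": "10"},
    },
    "assignment_explanation": "",
}
print(summarize_datafeed_stats(sample))
```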