diff --git a/docs/en/ml/introduction.asciidoc b/docs/en/ml/introduction.asciidoc index 84d3965e601..a2fc2f15345 100644 --- a/docs/en/ml/introduction.asciidoc +++ b/docs/en/ml/introduction.asciidoc @@ -19,7 +19,7 @@ science-related configurations in order to get the benefits of {ml}. === Integration with the Elastic Stack Machine learning is tightly integrated with the Elastic Stack. -Data is pulled from {es} for analysis and anomaly results are displayed in {kb} +Data is pulled from {es} for analysis and anomaly results are displayed in {kib} dashboards. [float] @@ -36,9 +36,9 @@ Jobs:: with a job, see <>. Data feeds:: - Jobs can analyze either a one-off batch of data or continuously in real-time. - Data feeds retrieve data from {es} for analysis. Alternatively you can - <> from any source directly to an API. + Jobs can analyze a one-off batch of data or run continuously in real time. + Data feeds retrieve data from {es} for analysis. Alternatively, you can + <> from any source directly to an API. Detectors:: Part of the configuration information associated with a job, detectors define diff --git a/docs/en/rest-api/ml/datafeedresource.asciidoc b/docs/en/rest-api/ml/datafeedresource.asciidoc index 1f547078b5e..288d1627839 100644 --- a/docs/en/rest-api/ml/datafeedresource.asciidoc +++ b/docs/en/rest-api/ml/datafeedresource.asciidoc @@ -7,15 +7,15 @@ A data feed resource has the following properties: `aggregations`:: (object) If set, the data feed performs aggregation searches. For syntax information, see {ref}/search-aggregations.html[Aggregations]. - Support for aggregations is limited and should only be used with - low cardinality data: + Support for aggregations is limited; use them only with + low-cardinality data. For example: `{"@timestamp": {"histogram": {"field": "@timestamp", "interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false, "min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": { "field": "events_per_min"}}}}}`. 
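For readability, the single-line aggregations value above corresponds to the following pretty-printed JSON (the `@timestamp` aggregation name and the `events_per_min` field come from the example itself and are illustrative, not a required schema):

```json
{
  "@timestamp": {
    "histogram": {
      "field": "@timestamp",
      "interval": 30000,
      "offset": 0,
      "order": { "_key": "asc" },
      "keyed": false,
      "min_doc_count": 0
    },
    "aggregations": {
      "events_per_min": {
        "sum": { "field": "events_per_min" }
      }
    }
  }
}
```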
- //TBD link to a Working with aggregations page +//TBD link to a Working with aggregations page `chunking_config`:: (object) Specifies how data searches are split into time chunks. See <>. @@ -37,19 +37,19 @@ A data feed resource has the following properties: (string) The unique identifier for the job to which the data feed sends data. `query`:: - (object) The Elasticsearch query domain-specific language (DSL). This value - corresponds to the query object in an Elasticsearch search POST body. All the - options that are supported by Elasticsearch can be used, as this object is - passed verbatim to Elasticsearch. By default, this property has the following - value: `{"match_all": {"boost": 1}}`. + (object) The {es} query domain-specific language (DSL). This value + corresponds to the query object in an {es} search POST body. All the + options that are supported by {es} can be used, as this object is + passed verbatim to {es}. By default, this property has the following + value: `{"match_all": {"boost": 1}}`. `query_delay`:: - (time units) The number of seconds behind real-time that data is queried. For - example, if data from 10:04 a.m. might not be searchable in Elasticsearch - until 10:06 a.m., set this property to 120 seconds. The default value is `60s`. + (time units) The number of seconds behind real time that data is queried. For + example, if data from 10:04 a.m. might not be searchable in {es} until + 10:06 a.m., set this property to 120 seconds. The default value is `60s`. `scroll_size`:: - (unsigned integer) The `size` parameter that is used in Elasticsearch searches. + (unsigned integer) The `size` parameter that is used in {es} searches. The default value is `1000`. 
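Putting the properties above together, a data feed resource that relies on the default `query`, `query_delay`, and `scroll_size` values might look like the following sketch (the datafeed and job IDs, index, and type names are hypothetical, and `indexes` is assumed alongside the documented required `types` property):

```json
{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  "indexes": ["it_ops_metrics"],
  "types": ["kpi"],
  "query": { "match_all": { "boost": 1 } },
  "query_delay": "60s",
  "scroll_size": 1000
}
```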
`types` (required):: @@ -59,7 +59,7 @@ A data feed resource has the following properties: [[ml-datafeed-chunking-config]] ===== Chunking Configuration Objects -Data feeds may be required to search over long time periods, for several months +Data feeds might be required to search over long time periods, for several months or years. This search is split into time chunks in order to ensure the load on {es} is managed. Chunking configuration controls how the size of these time chunks are calculated and is an advanced configuration option. @@ -68,7 +68,7 @@ A chunking configuration object has the following properties: `mode` (required):: There are three available modes: + - `auto`::: The chunk size will be dynamically calculated. This is the default + `auto`::: The chunk size will be dynamically calculated. This is the default and recommended value. `manual`::: Chunking will be applied according to the specified `time_span`. `off`::: No chunking will be applied. @@ -86,21 +86,25 @@ The get data feed statistics API provides information about the operational progress of a data feed. For example: `assignment_explanation`:: - (string) For started data feeds only, contains messages relating to the selection - of a node. + (string) For started data feeds only, contains messages relating to the + selection of a node. `datafeed_id`:: (string) A numerical character string that uniquely identifies the data feed. `node`:: - (object) The node upon which the data feed is started. The data feed and job will be on the same node. - `id`::: The unique identifier of the node. For example, "0-o0tOoRTwKFZifatTWKNw". + (object) The node upon which the data feed is started. The data feed and + job will be on the same node. + `id`::: The unique identifier of the node. For example, + "0-o0tOoRTwKFZifatTWKNw". `name`::: The node name. For example, "0-o0tOo". - `ephemeral_id`::: The node ephemeral id. - `transport_address`::: The host and port where transport HTTP connections are accepted. 
For example, "127.0.0.1:9300". + `ephemeral_id`::: The node ephemeral ID. + `transport_address`::: The host and port where transport HTTP connections are + accepted. For example, "127.0.0.1:9300". `attributes`::: For example, {"max_running_jobs": "10"}. `state`:: (string) The status of the data feed, which can be one of the following values: + `started`::: The data feed is actively receiving data. - `stopped`::: The data feed is stopped and will not receive data until it is re-started. + `stopped`::: The data feed is stopped and will not receive data until it is + restarted. diff --git a/docs/en/rest-api/ml/post-data.asciidoc b/docs/en/rest-api/ml/post-data.asciidoc index e61f695cd77..15e6ee35fe4 100644 --- a/docs/en/rest-api/ml/post-data.asciidoc +++ b/docs/en/rest-api/ml/post-data.asciidoc @@ -13,10 +13,10 @@ The job must have been opened prior to sending data. ===== Description -File sizes are limited to 100 Mb, so if your file is larger, -then split it into multiple files and upload each one separately in sequential time order. -When running in real-time, it is generally recommended to perform -many small uploads, rather than queueing data to upload larger files. +File sizes are limited to 100 MB, so if your file is larger, then split it into +multiple files and upload each one separately in sequential time order. When +running in real time, it is generally recommended to perform many small uploads, +rather than queueing data to upload larger files. When uploading data, check the <> for progress. The following records will not be processed: @@ -26,10 +26,10 @@ The following records will not be processed: //TBD link to Working with Out of Order timeseries concept doc -IMPORTANT: Data can only be accepted from a single connection. -Use a single connection synchronously to send data, close, flush, or delete a single job. +IMPORTANT: Data can only be accepted from a single connection. 
Use a single +connection synchronously to send data, close, flush, or delete a single job. It is not currently possible to post data to multiple jobs using wildcards -or a comma separated list. +or a comma-separated list. ===== Path Parameters diff --git a/docs/en/rest-api/ml/put-datafeed.asciidoc b/docs/en/rest-api/ml/put-datafeed.asciidoc index 71c9f86360c..af917589bf0 100644 --- a/docs/en/rest-api/ml/put-datafeed.asciidoc +++ b/docs/en/rest-api/ml/put-datafeed.asciidoc @@ -29,8 +29,8 @@ data feed to each job. For more information, see <>. `chunking_config`:: - (object) The chunking configuration, which specifies how data searches are - chunked. See <>. + (object) Specifies how data searches are split into time chunks. + See <>. `frequency`:: (time units) The interval at which scheduled queries are made while the data @@ -45,21 +45,19 @@ data feed to each job. (string) A numerical character string that uniquely identifies the job. `query`:: - (object) The Elasticsearch query domain-specific language (DSL). This value - corresponds to the query object in an Elasticsearch search POST body. All the - options that are supported by Elasticsearch can be used, as this object is - passed verbatim to Elasticsearch. By default, this property has the following - value: `{"match_all": {"boost": 1}}`. If this property is not specified, the - default value is `“match_all”: {}`. + (object) The {es} query domain-specific language (DSL). This value + corresponds to the query object in an {es} search POST body. All the + options that are supported by {es} can be used, as this object is + passed verbatim to {es}. By default, this property has the following + value: `{"match_all": {"boost": 1}}`. `query_delay`:: - (time units) The number of seconds behind real-time that data is queried. For - example, if data from 10:04 a.m. might not be searchable in Elasticsearch - until 10:06 a.m., set this property to 120 seconds. The default value is 60 - seconds. For example: "60s". 
+ (time units) The number of seconds behind real time that data is queried. For + example, if data from 10:04 a.m. might not be searchable in {es} until + 10:06 a.m., set this property to 120 seconds. The default value is `60s`. `scroll_size`:: - (unsigned integer) The `size` parameter that is used in Elasticsearch searches. + (unsigned integer) The `size` parameter that is used in {es} searches. The default value is `1000`. `types` (required):: diff --git a/docs/en/rest-api/ml/update-datafeed.asciidoc b/docs/en/rest-api/ml/update-datafeed.asciidoc index 70a4a4a32d8..32bfde349ea 100644 --- a/docs/en/rest-api/ml/update-datafeed.asciidoc +++ b/docs/en/rest-api/ml/update-datafeed.asciidoc @@ -24,8 +24,8 @@ The following properties can be updated after the data feed is created: For more information, see <>. `chunking_config`:: - (object) The chunking configuration, which specifies how data searches are - chunked. See <>. + (object) Specifies how data searches are split into time chunks. + See <>. `frequency`:: (time units) The interval at which scheduled queries are made while the data @@ -40,21 +40,19 @@ The following properties can be updated after the data feed is created: (string) A numerical character string that uniquely identifies the job. `query`:: - (object) The Elasticsearch query domain-specific language (DSL). This value - corresponds to the query object in an Elasticsearch search POST body. All the - options that are supported by Elasticsearch can be used, as this object is - passed verbatim to Elasticsearch. By default, this property has the following - value: `{"match_all": {"boost": 1}}`. If this property is not specified, the - default value is `“match_all”: {}`. + (object) The {es} query domain-specific language (DSL). This value + corresponds to the query object in an {es} search POST body. All the + options that are supported by {es} can be used, as this object is + passed verbatim to {es}. 
By default, this property has the following + value: `{"match_all": {"boost": 1}}`. `query_delay`:: - (time units) The number of seconds behind real-time that data is queried. For - example, if data from 10:04 a.m. might not be searchable in Elasticsearch - until 10:06 a.m., set this property to 120 seconds. The default value is 60 - seconds. For example: "60s". + (time units) The number of seconds behind real time that data is queried. For + example, if data from 10:04 a.m. might not be searchable in {es} until + 10:06 a.m., set this property to 120 seconds. The default value is `60s`. `scroll_size`:: - (unsigned integer) The `size` parameter that is used in Elasticsearch searches. + (unsigned integer) The `size` parameter that is used in {es} searches. The default value is `1000`. `types` (required)::
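As a sketch of how the updatable properties fit together, an update request might look like the following (the datafeed ID is hypothetical, the endpoint assumes the `_xpack/ml` REST prefix used by these APIs, and `query_delay` is set to `120s` following the 10:04 a.m. example above):

```
POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_update
{
  "query": { "match_all": { "boost": 1 } },
  "query_delay": "120s",
  "scroll_size": 1000
}
```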