[DOCS] Overall review (elastic/x-pack-elasticsearch#1237)
* [DOCS] Overall review
* [DOCS] General review
* [DOCS] typo
* [DOCS] Fix for processed_record_count with aggs
* [DOCS] Added latency tbd

Original commit: elastic/x-pack-elasticsearch@9e8cf664c1
This commit is contained in:
parent 642b1f7c19
commit ffb3bb6493
@@ -10,11 +10,11 @@ All {ml} endpoints have the following base:
 
 The main {ml} resources can be accessed with a variety of endpoints:
 
-* <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs.
-* <<ml-api-datafeeds,+/datafeeds/+>>: Update data to be analyzed.
-* <<ml-api-results,+/results/+>>: Access the results of a {ml} job.
-* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots.
-* <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations.
+* <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs
+* <<ml-api-datafeeds,+/datafeeds/+>>: Select data from {es} to be analyzed
+* <<ml-api-results,+/results/+>>: Access the results of a {ml} job
+* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots
+* <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations
 
 [float]
 [[ml-api-jobs]]
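As a sketch of how these resources map onto the `_xpack/ml` base path (the job and datafeed names here are hypothetical, not part of the diff):

----
PUT _xpack/ml/anomaly_detectors/my-job                  // create a job
PUT _xpack/ml/datafeeds/datafeed-my-job                 // create a datafeed
GET _xpack/ml/anomaly_detectors/my-job/results/buckets  // read results
GET _xpack/ml/anomaly_detectors/my-job/model_snapshots  // list model snapshots
----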
@@ -19,8 +19,8 @@ science-related configurations in order to get the benefits of {ml}.
 === Integration with the Elastic Stack
 
 Machine learning is tightly integrated with the Elastic Stack.
-Data is pulled from {es} for analysis and anomaly results are displayed in
-{kb} dashboards.
+Data is pulled from {es} for analysis and anomaly results are displayed in {kb}
+dashboards.
 
 [float]
 [[ml-concepts]]
@@ -36,23 +36,25 @@ Jobs::
 with a job, see <<ml-job-resource, Job Resources>>.
 
 Data feeds::
-Jobs can analyze either a batch of data from a data store or a stream of data
-in real-time. The latter involves data that is retrieved from {es} and is
-referred to as a data feed.
+Jobs can analyze either a one-off batch of data or continuously in real-time.
+Data feeds retrieve data from {es} for analysis. Alternatively you can
+<<ml-post-data,POST data>> from any source directly to an API.
 
 Detectors::
 Part of the configuration information associated with a job, detectors define
 the type of analysis that needs to be done (for example, max, average, rare).
 They also specify which fields to analyze. You can have more than one detector
 in a job, which is more efficient than running multiple jobs against the same
-data stream. For a list of the properties associated with detectors, see
+data. For a list of the properties associated with detectors, see
 <<ml-detectorconfig, Detector Configuration Objects>>.
 
 Buckets::
 Part of the configuration information associated with a job, the _bucket span_
-defines the time interval across which the job analyzes. When setting the
+defines the time interval used to summarize and model the data. This is typically
+between 5 minutes to 1 hour, and it depends on your data characteristics. When setting the
 bucket span, take into account the granularity at which you want to analyze,
-the frequency of the input data, and the frequency at which alerting is required.
+the frequency of the input data, the typical duration of the anomalies
+and the frequency at which alerting is required.
 
 Machine learning nodes::
 A {ml} node is a node that has `xpack.ml.enabled` and `node.ml` set to `true`,
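A minimal sketch of how detectors and the bucket span fit together in a job configuration (the job name and the `responsetime` field are hypothetical):

----
PUT _xpack/ml/anomaly_detectors/my-job
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "max", "field_name": "responsetime" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----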
@@ -12,14 +12,14 @@ Use machine learning to detect anomalies in time series data.
 [[ml-api-datafeed-endpoint]]
 === Data Feeds
 
-* <<ml-put-datafeed,Create data feeds>>
-* <<ml-delete-datafeed,Delete data feeds>>
-* <<ml-get-datafeed,Get data feeds>>
+* <<ml-put-datafeed,Create data feed>>
+* <<ml-delete-datafeed,Delete data feed>>
+* <<ml-get-datafeed,Get data feed info>>
 * <<ml-get-datafeed-stats,Get data feed statistics>>
-* <<ml-preview-datafeed,Preview data feeds>>
-* <<ml-start-datafeed,Start data feeds>>
-* <<ml-stop-datafeed,Stop data feeds>>
-* <<ml-update-datafeed,Update data feeds>>
+* <<ml-preview-datafeed,Preview data feed>>
+* <<ml-start-datafeed,Start data feed>>
+* <<ml-stop-datafeed,Stop data feed>>
+* <<ml-update-datafeed,Update data feed>>
 
 include::ml/put-datafeed.asciidoc[]
 include::ml/delete-datafeed.asciidoc[]
@@ -35,15 +35,15 @@ include::ml/update-datafeed.asciidoc[]
 
 You can use APIs to perform the following activities:
 
-* <<ml-close-job,Close jobs>>
-* <<ml-put-job,Create jobs>>
-* <<ml-delete-job,Delete jobs>>
-* <<ml-get-job,Get jobs>>
+* <<ml-close-job,Close job>>
+* <<ml-put-job,Create job>>
+* <<ml-delete-job,Delete job>>
+* <<ml-get-job,Get job info>>
 * <<ml-get-job-stats,Get job statistics>>
-* <<ml-flush-job,Flush jobs>>
-* <<ml-open-job,Open jobs>>
-* <<ml-post-data,Post data to jobs>>
-* <<ml-update-job,Update jobs>>
+* <<ml-flush-job,Flush job>>
+* <<ml-open-job,Open job>>
+* <<ml-post-data,Post data to job>>
+* <<ml-update-job,Update job>>
 * <<ml-valid-detector,Validate detectors>>
 * <<ml-valid-job,Validate job>>
 
@@ -62,10 +62,10 @@ include::ml/validate-job.asciidoc[]
 [[ml-api-snapshot-endpoint]]
 === Model Snapshots
 
-* <<ml-delete-snapshot,Delete model snapshots>>
-* <<ml-get-snapshot,Get model snapshots>>
-* <<ml-revert-snapshot,Revert model snapshots>>
-* <<ml-update-snapshot,Update model snapshots>>
+* <<ml-delete-snapshot,Delete model snapshot>>
+* <<ml-get-snapshot,Get model snapshot info>>
+* <<ml-revert-snapshot,Revert model snapshot>>
+* <<ml-update-snapshot,Update model snapshot>>
 
 include::ml/delete-snapshot.asciidoc[]
 include::ml/get-snapshot.asciidoc[]
@@ -91,7 +91,7 @@ include::ml/get-record.asciidoc[]
 * <<ml-datafeed-resource,Data feeds>>
 * <<ml-datafeed-counts,Data feed counts>>
 * <<ml-job-resource,Jobs>>
-* <<ml-jobstats,Job Stats>>
+* <<ml-jobstats,Job statistics>>
 * <<ml-snapshot-resource,Model snapshots>>
 * <<ml-results-resource,Results>>
 
@@ -7,16 +7,18 @@ A data feed resource has the following properties:
 `aggregations`::
 (object) If set, the data feed performs aggregation searches.
 For syntax information, see {ref}/search-aggregations.html[Aggregations].
-Support for aggregations is limited: TBD.
+Support for aggregations is limited and should only be used with
+low cardinality data:
 For example:
 `{"@timestamp": {"histogram": {"field": "@timestamp",
 "interval": 30000,"offset": 0,"order": {"_key": "asc"},"keyed": false,
 "min_doc_count": 0}, "aggregations": {"events_per_min": {"sum": {
 "field": "events_per_min"}}}}}`.
+//TBD link to a Working with aggregations page
 
 `chunking_config`::
-(object) The chunking configuration, which specifies how data searches are
-chunked. See <<ml-datafeed-chunking-config>>.
+(object) Specifies how data searches are split into time chunks.
+See <<ml-datafeed-chunking-config>>.
 For example: {"mode": "manual", "time_span": "3h"}
 
 `datafeed_id`::
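The inline aggregation example in that hunk, reformatted for readability (same content, just pretty-printed):

----
"aggregations": {
  "@timestamp": {
    "histogram": {
      "field": "@timestamp",
      "interval": 30000,
      "offset": 0,
      "order": { "_key": "asc" },
      "keyed": false,
      "min_doc_count": 0
    },
    "aggregations": {
      "events_per_min": {
        "sum": { "field": "events_per_min" }
      }
    }
  }
}
----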
@@ -39,14 +41,12 @@ A data feed resource has the following properties:
 corresponds to the query object in an Elasticsearch search POST body. All the
 options that are supported by Elasticsearch can be used, as this object is
 passed verbatim to Elasticsearch. By default, this property has the following
-value: `{"match_all": {"boost": 1}}`. If this property is not specified, the
-default value is `"match_all": {}`.
+value: `{"match_all": {"boost": 1}}`.
 
 `query_delay`::
 (time units) The number of seconds behind real-time that data is queried. For
 example, if data from 10:04 a.m. might not be searchable in Elasticsearch
-until 10:06 a.m., set this property to 120 seconds. The default value is 60
-seconds. For example: "60s".
+until 10:06 a.m., set this property to 120 seconds. The default value is `60s`.
 
 `scroll_size`::
 (unsigned integer) The `size` parameter that is used in Elasticsearch searches.
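Taken together, these properties might appear in a datafeed body like this (a hedged sketch; the index, type, and job names are hypothetical):

----
PUT _xpack/ml/datafeeds/datafeed-my-job
{
  "job_id": "my-job",
  "indexes": ["my-index"],
  "types": ["my-type"],
  "query": { "match_all": { "boost": 1 } },
  "query_delay": "60s",
  "scroll_size": 1000
}
----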
@@ -59,11 +59,17 @@ A data feed resource has the following properties:
 [[ml-datafeed-chunking-config]]
 ===== Chunking Configuration Objects
 
+Data feeds may be required to search over long time periods, for several months
+or years. This search is split into time chunks in order to ensure the load
+on {es} is managed. Chunking configuration controls how the size of these time
+chunks are calculated and is an advanced configuration option.
+
 A chunking configuration object has the following properties:
 
 `mode` (required)::
 There are three available modes: +
-`auto`::: The chunk size will be dynamically calculated.
+`auto`::: The chunk size will be dynamically calculated. This is the default
+and recommended value.
 `manual`::: Chunking will be applied according to the specified `time_span`.
 `off`::: No chunking will be applied.
 
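For example, the manual chunking shown in the resource docs above would sit in the datafeed body as:

----
"chunking_config": {
  "mode": "manual",
  "time_span": "3h"
}
----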
@@ -79,20 +85,20 @@ A chunking configuration object has the following properties:
 The get data feed statistics API provides information about the operational
 progress of a data feed. For example:
 
-`assigment_explanation`::
-TBD. For example: " "
+`assignment_explanation`::
+(string) For started data feeds only, contains messages relating to the selection
+of a node.
 
 `datafeed_id`::
 (string) A numerical character string that uniquely identifies the data feed.
 
 `node`::
-(object) TBD
-The node that is running the query?
-`id`::: TBD. For example, "0-o0tOoRTwKFZifatTWKNw".
-`name`::: TBD. For example, "0-o0tOo".
-`ephemeral_id`::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
-`transport_address`::: TBD. For example, "127.0.0.1:9300".
-`attributes`::: TBD. For example, {"max_running_jobs": "10"}.
+(object) The node upon which the data feed is started. The data feed and job will be on the same node.
+`id`::: The unique identifier of the node. For example, "0-o0tOoRTwKFZifatTWKNw".
+`name`::: The node name. For example, "0-o0tOo".
+`ephemeral_id`::: The node ephemeral id.
+`transport_address`::: The host and port where transport HTTP connections are accepted. For example, "127.0.0.1:9300".
+`attributes`::: For example, {"max_running_jobs": "10"}.
 
 `state`::
 (string) The status of the data feed, which can be one of the following values: +
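Assembled from the example values in that hunk, the `node` object in a get data feed statistics response looks roughly like:

----
"node": {
  "id": "0-o0tOoRTwKFZifatTWKNw",
  "name": "0-o0tOo",
  "ephemeral_id": "DOZltLxLS_SzYpW6hQ9hyg",
  "transport_address": "127.0.0.1:9300",
  "attributes": { "max_running_jobs": "10" }
}
----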
@@ -118,14 +118,8 @@ necessarily a cause for concern.
 This value includes records with missing fields, since they are nonetheless
 analyzed. +
 If you use data feeds and have aggregations in your search query,
-the `processed_record_count` differs from the `input_record_count`. +
-If you use the <<ml-post-data,post data API>> to provide data to the job,
-the following records are not processed: +
-+
---
-* Records not in chronological order and outside the latency window
-* Records with invalid timestamp
---
+the `processed_record_count` will be the number of aggregated records
+processed, not the number of {es} documents.
 
 `sparse_bucket_count`::
 (long) The number of buckets that contained few data points compared to the
@@ -167,12 +161,12 @@ The `model_size_stats` object has the following properties:
 (string) For internal use. The type of result.
 
 `total_by_field_count`::
-(long) The number of `by` field values that were analyzed by the models.
+(long) The number of `by` field values that were analyzed by the models.+
 
 NOTE: The `by` field values are counted separately for each detector and partition.
 
 `total_over_field_count`::
-(long) The number of `over` field values that were analyzed by the models.
+(long) The number of `over` field values that were analyzed by the models.+
 
 NOTE: The `over` field values are counted separately for each detector and partition.
 
@@ -196,12 +190,10 @@ This information is available only for open jobs.
 (string) The node name.
 
 `ephemeral_id`::
+(string) The ephemeral id of the node.
 
 `transport_address`::
 (string) The host and port where transport HTTP connections are accepted.
 
 `attributes`::
-(object) {ml} attributes.
-`max_running_jobs`::: The maximum number of concurrently open jobs that are
-allowed per node.
+(object) For example, {"max_running_jobs": "10"}.
 
@@ -15,9 +15,17 @@ The job must have been opened prior to sending data.
 
 File sizes are limited to 100 Mb, so if your file is larger,
 then split it into multiple files and upload each one separately in sequential time order.
-When running in real-time, it is generally recommended to arrange to perform
+When running in real-time, it is generally recommended to perform
 many small uploads, rather than queueing data to upload larger files.
 
+When uploading data, check the <<ml-datacounts,job data counts>> for progress.
+The following records will not be processed:
+
+* Records not in chronological order and outside the latency window
+* Records with an invalid timestamp
+
+//TBD link to Working with Out of Order timeseries concept doc
+
 IMPORTANT: Data can only be accepted from a single connection.
 Use a single connection synchronously to send data, close, flush, or delete a single job.
 It is not currently possible to post data to multiple jobs using wildcards
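A sketch of posting newline-delimited records to the post data API (the job name and the fields are hypothetical):

----
POST _xpack/ml/anomaly_detectors/my-job/_data
{"@timestamp": 1486425600000, "events_per_min": 21}
{"@timestamp": 1486425660000, "events_per_min": 27}
----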
@@ -14,7 +14,6 @@ When choosing a new value, consider the following:
 * Persistence enables snapshots to be reverted.
 * The time taken to persist a job is proportional to the size of the model in memory.
 //* The smallest allowed value is 3600 (1 hour).
-////
 
 A model snapshot resource has the following properties:
 
|
@ -34,7 +33,8 @@ A model snapshot resource has the following properties:
|
||||||
(object) Summary information describing the model. See <<ml-snapshot-stats,Model Size Statistics>>.
|
(object) Summary information describing the model. See <<ml-snapshot-stats,Model Size Statistics>>.
|
||||||
|
|
||||||
`retain`::
|
`retain`::
|
||||||
(boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
|
(boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots
|
||||||
|
older than `model_snapshot_retention_days`.
|
||||||
However, this snapshot will be deleted when the job is deleted.
|
However, this snapshot will be deleted when the job is deleted.
|
||||||
The default value is false.
|
The default value is false.
|
||||||
|
|
||||||
|
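Setting `retain` on an existing snapshot goes through the update model snapshot API, roughly like this (the job and snapshot identifiers are hypothetical):

----
POST _xpack/ml/anomaly_detectors/my-job/model_snapshots/1491852978/_update
{
  "retain": true
}
----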
@@ -89,4 +89,4 @@ The `model_size_stats` object has the following properties:
 
 `total_partition_field_count`::
 (long) The number of _partition_ field values analyzed.
-////
+
@@ -1,6 +1,6 @@
 [[ml-settings]]
 == Machine Learning Settings
-You do not need to configure any settings to use {ml}.
+You do not need to configure any settings to use {ml}. It is enabled by default.
 
 [float]
 [[general-ml-settings]]
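The two node settings named earlier in this commit (`xpack.ml.enabled` and `node.ml`) would appear in `elasticsearch.yml` as follows; both default to `true`, so this sketch only makes the defaults explicit:

----
xpack.ml.enabled: true
node.ml: true
----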