From 528ac3d9021d8b76063d1d42aa25743a2f864e06 Mon Sep 17 00:00:00 2001 From: Sophie Chang Date: Mon, 24 Apr 2017 17:31:31 +0100 Subject: [PATCH] [DOCS] ML API docs review (elastic/x-pack-elasticsearch#1169) * [DOCS] Fix for prelertcategory * [DOCS] _preview returns a page of data * [DOCS] Added adv options e.g. background_persist_interval" * [DOCS] Clarify meanings of model_snapshot params * [DOCS] Format fixes * [DOCS] Include _all keyword * [DOCS] Explain retain. * [DOCS] Further explanations for model size limits * [DOCS] Format fixes in quick ref * [DOCS] Update for exclude_interim * [DOCS] Update for exclude_interim * [DOCS] Update for exclude_interim Original commit: elastic/x-pack-elasticsearch@cdd2fcefdd3ea7cd2b517142c1bed1d2a02775de --- docs/en/ml/api-quickref.asciidoc | 20 ++++---- docs/en/rest-api/ml/datafeedresource.asciidoc | 8 ++-- docs/en/rest-api/ml/get-bucket.asciidoc | 4 +- .../rest-api/ml/get-datafeed-stats.asciidoc | 3 +- docs/en/rest-api/ml/get-datafeed.asciidoc | 3 +- docs/en/rest-api/ml/get-influencer.asciidoc | 4 +- docs/en/rest-api/ml/get-job-stats.asciidoc | 4 +- docs/en/rest-api/ml/get-job.asciidoc | 4 +- docs/en/rest-api/ml/get-record.asciidoc | 4 +- docs/en/rest-api/ml/jobresource.asciidoc | 22 ++++++++- docs/en/rest-api/ml/preview-datafeed.asciidoc | 10 ++-- docs/en/rest-api/ml/snapshotresource.asciidoc | 47 ++++++++++--------- docs/en/rest-api/ml/update-job.asciidoc | 5 +- docs/en/rest-api/ml/update-snapshot.asciidoc | 10 ++-- 14 files changed, 86 insertions(+), 62 deletions(-) diff --git a/docs/en/ml/api-quickref.asciidoc b/docs/en/ml/api-quickref.asciidoc index 255dd8ff19e..f241ea593fe 100644 --- a/docs/en/ml/api-quickref.asciidoc +++ b/docs/en/ml/api-quickref.asciidoc @@ -13,7 +13,7 @@ The main {ml} resources can be accessed with a variety of endpoints: * <>: Create and manage {ml} jobs. * <>: Update data to be analyzed. * <>: Access the results of a {ml} job. -* <>: Manage model snapshots. +* <>: Manage model snapshots. 
* <>: Validate subsections of job configurations. [float] @@ -22,7 +22,7 @@ The main {ml} resources can be accessed with a variety of endpoints: * <>: Create a job * </_open>>: Open a job -* </_data>>: Send data to a job +* </_data>>: Send data to a job * <>: List jobs * <+++>>: Get job details * </_stats>>: Get job statistics @@ -35,15 +35,15 @@ The main {ml} resources can be accessed with a variety of endpoints: [[ml-api-datafeeds]] === /datafeeds/ -* <+++>>: Create a data feed -* </_start>>: Start a data feed +* <+++>>: Create a data feed +* </_start>>: Start a data feed * <>: List data feeds -* <+++>>: Get data feed details -* </_stats>>: Get statistical information for data feeds -* </_preview>>: Get a preview of a data feed -* </_update>>: Update certain settings for a data feed -* </_stop>>: Stop a data feed -* <+++>>: Delete data feed +* <+++>>: Get data feed details +* </_stats>>: Get statistical information for data feeds +* </_preview>>: Get a preview of a data feed +* </_update>>: Update certain settings for a data feed +* </_stop>>: Stop a data feed +* <+++>>: Delete data feed [float] [[ml-api-results]] diff --git a/docs/en/rest-api/ml/datafeedresource.asciidoc b/docs/en/rest-api/ml/datafeedresource.asciidoc index 86b3ac09dfd..31a2849afc9 100644 --- a/docs/en/rest-api/ml/datafeedresource.asciidoc +++ b/docs/en/rest-api/ml/datafeedresource.asciidoc @@ -64,11 +64,11 @@ progress of a data feed. For example: The node that is running the query? `id`::: TBD. For example, "0-o0tOoRTwKFZifatTWKNw". `name`::: TBD. For example, "0-o0tOo". - `ephemeral_id::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg". - `transport_address::: TBD. For example, "127.0.0.1:9300". + `ephemeral_id`::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg". + `transport_address`::: TBD. For example, "127.0.0.1:9300". `attributes`::: TBD. For example, {"max_running_jobs": "10"}. 
`state`:: (string) The status of the data feed, which can be one of the following values: + - started::: The data feed is actively receiving data. - stopped::: The data feed is stopped and will not receive data until it is re-started. + `started`::: The data feed is actively receiving data. + `stopped`::: The data feed is stopped and will not receive data until it is re-started. diff --git a/docs/en/rest-api/ml/get-bucket.asciidoc b/docs/en/rest-api/ml/get-bucket.asciidoc index f1e27bec94a..4a98b920758 100644 --- a/docs/en/rest-api/ml/get-bucket.asciidoc +++ b/docs/en/rest-api/ml/get-bucket.asciidoc @@ -45,8 +45,8 @@ roles provide these privileges. For more information, see `from`:: (integer) Skips the specified number of buckets. -`include_interim`:: - (boolean) If true, the output includes interim results. +`exclude_interim`:: + (boolean) If true, the output excludes interim results. These are included by default. `size`:: (integer) Specifies the maximum number of buckets to obtain. diff --git a/docs/en/rest-api/ml/get-datafeed-stats.asciidoc b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc index 9e023048735..a82d4f5ce84 100644 --- a/docs/en/rest-api/ml/get-datafeed-stats.asciidoc +++ b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc @@ -23,8 +23,7 @@ privileges to use this API. For more information, see < `feed_id`:: (string) Identifier for the data feed. - If you do not specify this optional parameter, the API returns information - about all data feeds. + Does not support wildcards; however, you may specify `_all` to get information about all data feeds. ===== Results diff --git a/docs/en/rest-api/ml/get-datafeed.asciidoc b/docs/en/rest-api/ml/get-datafeed.asciidoc index 8de5b0ab90a..661170a1dba 100644 --- a/docs/en/rest-api/ml/get-datafeed.asciidoc +++ b/docs/en/rest-api/ml/get-datafeed.asciidoc @@ -22,8 +22,7 @@ privileges to use this API. For more information, see < `feed_id`:: (string) Identifier for the data feed. 
- If you do not specify this optional parameter, the API returns information - about all data feeds. + Does not support wildcards; however, you may specify `_all` or leave it blank to get information about all data feeds. ===== Results diff --git a/docs/en/rest-api/ml/get-influencer.asciidoc b/docs/en/rest-api/ml/get-influencer.asciidoc index da4cd0d9cbd..d0192bc8caa 100644 --- a/docs/en/rest-api/ml/get-influencer.asciidoc +++ b/docs/en/rest-api/ml/get-influencer.asciidoc @@ -34,8 +34,8 @@ roles provide these privileges. For more information, see `from`:: (integer) Skips the specified number of influencers. -`include_interim`:: - (boolean) If true, the output includes interim results. +`exclude_interim`:: + (boolean) If true, the output excludes interim results. These are included by default. `influencer_score`:: (double) Returns influencers with anomaly scores higher than this value. diff --git a/docs/en/rest-api/ml/get-job-stats.asciidoc b/docs/en/rest-api/ml/get-job-stats.asciidoc index e6d2e4fb082..d32409e46f9 100644 --- a/docs/en/rest-api/ml/get-job-stats.asciidoc +++ b/docs/en/rest-api/ml/get-job-stats.asciidoc @@ -19,8 +19,8 @@ privileges to use this API. For more information, see < ===== Path Parameters `job_id`:: - (string) Identifier for the job. If you do not specify this optional parameter, - the API returns information about all jobs. + (string) A required identifier for the job. + Does not support wildcards; however, you may specify `_all` to get information about all jobs. ===== Results diff --git a/docs/en/rest-api/ml/get-job.asciidoc b/docs/en/rest-api/ml/get-job.asciidoc index 28a47a0c589..ee5e7142278 100644 --- a/docs/en/rest-api/ml/get-job.asciidoc +++ b/docs/en/rest-api/ml/get-job.asciidoc @@ -19,8 +19,8 @@ privileges to use this API. For more information, see < ===== Path Parameters `job_id`:: - (string) Identifier for the job. If you do not specify this optional parameter, - the API returns information about all jobs. 
+ (string) Identifier for the job. + Does not support wildcards; however, you may specify `_all` or leave it blank to get information about all jobs. ===== Results diff --git a/docs/en/rest-api/ml/get-record.asciidoc b/docs/en/rest-api/ml/get-record.asciidoc index fb52dc0986f..84ed7a39349 100644 --- a/docs/en/rest-api/ml/get-record.asciidoc +++ b/docs/en/rest-api/ml/get-record.asciidoc @@ -33,8 +33,8 @@ roles provide these privileges. For more information, see `from`:: (integer) Skips the specified number of records. -`include_interim`:: - (boolean) If true, the output includes interim results. +`exclude_interim`:: + (boolean) If true, the output excludes interim results. These are included by default. `record_score`:: (double) Returns records with anomaly scores higher than this value. diff --git a/docs/en/rest-api/ml/jobresource.asciidoc b/docs/en/rest-api/ml/jobresource.asciidoc index 1ba57f3a80f..17127ae3cd4 100644 --- a/docs/en/rest-api/ml/jobresource.asciidoc +++ b/docs/en/rest-api/ml/jobresource.asciidoc @@ -12,6 +12,13 @@ A job resource has the following properties: (object) Defines approximate limits on the memory resource requirements for the job. See <>. +`background_persist_interval`:: + (time units) Advanced configuration option. + The time between each periodic persistence of the model. + The default value is a randomized value between 3 and 4 hours, which avoids all jobs persisting at exactly the same time. + For very large models (several GB), persistence could take 10-20 minutes, so please do not set this value too low. + The smallest allowed value is 1 hour. + `create_time`:: (string) The time the job was created, in ISO 8601 format. For example, `1491007356077`. @@ -29,7 +36,7 @@ A job resource has the following properties: `job_id`:: (string) The unique identifier for the job. - + `job_type`:: (string) Reserved for future use, currently set to `anomaly_detector`. 
@@ -45,11 +52,22 @@ A job resource has the following properties: (long) The time in days that model snapshots are retained for the job. Older snapshots are deleted. The default value is 1 day. +`renormalization_window_days`:: + (long) Advanced configuration option. + The period over which adjustments to the score are applied, as new data is seen. + The default value is the longer of 30 days or 100 `bucket_spans`. + `results_index_name`:: (string) The name of the index in which to store the {ml} results. The default value is `shared`, which corresponds to the index name `.ml-anomalies-shared` +`results_retention_days`:: + (long) Advanced configuration option. + The number of days for which job results are retained. + Once per day at 00:30 (server time), results older than this period will be deleted from Elasticsearch. + The default value is null, i.e. results are retained. + [[ml-analysisconfig]] ===== Analysis Configuration Objects @@ -62,7 +80,7 @@ An analysis configuration object has the following properties: `categorization_field_name`:: (string) If not null, the values of the specified field will be categorized. The resulting categories can be used in a detector by setting `by_field_name`, - `over_field_name`, or `partition_field_name` to the keyword `prelertcategory`. + `over_field_name`, or `partition_field_name` to the keyword `mlcategory`. `categorization_filters`:: (array of strings) If `categorization_field_name` is specified, diff --git a/docs/en/rest-api/ml/preview-datafeed.asciidoc b/docs/en/rest-api/ml/preview-datafeed.asciidoc index b1b9baa007b..ef461a88f8a 100644 --- a/docs/en/rest-api/ml/preview-datafeed.asciidoc +++ b/docs/en/rest-api/ml/preview-datafeed.asciidoc @@ -6,20 +6,20 @@ The preview data feed API enables you to preview a data feed. ===== Request -`GET _xpack/ml/datafeeds//_preview` +`GET _xpack/ml/datafeeds//_preview` ===== Description -//TBD: How much data does it return? 
-The API returns example data by using the current data feed settings. +The API returns the first "page" of results from the `search` created using the current data feed settings. +This shows the structure of the data that will be passed to the anomaly detection engine. You must have `monitor_ml`, `monitor`, `manage_ml`, or `manage` cluster privileges to use this API. For more information, see <>. ===== Path Parameters -`feed_id` (required):: +`datafeed_id` (required):: (string) Identifier for the data feed //// @@ -41,7 +41,7 @@ TBD //// ===== Examples -The following example obtains a previews of the `datafeed-farequote` data feed: +The following example obtains a preview of the `datafeed-farequote` data feed: [source,js] -------------------------------------------------- diff --git a/docs/en/rest-api/ml/snapshotresource.asciidoc b/docs/en/rest-api/ml/snapshotresource.asciidoc index eb113d9d03a..07fd44e31cd 100644 --- a/docs/en/rest-api/ml/snapshotresource.asciidoc +++ b/docs/en/rest-api/ml/snapshotresource.asciidoc @@ -2,13 +2,11 @@ [[ml-snapshot-resource]] ==== Model Snapshot Resources -//// Model snapshots are saved to disk periodically. -By default, this is occurs approximately every 3 hours. -//TBD: Can you change this setting? +By default, this occurs approximately every 3 to 4 hours and is configurable using the `background_persist_interval` setting. By default, model snapshots are retained for one day. You can change this -behavior with by updating the `model_snapshot_retention_days` for the job. +behavior by updating the `model_snapshot_retention_days` for the job. When choosing a new value, consider the following: * Persistence enables resilience in the event of a system failure. @@ -23,30 +21,31 @@ A model snapshot resource has the following properties: (string) An optional description of the job. `job_id`:: - (string) A numerical character string that uniquely identifies the job. 
+ (string) A numerical character string that uniquely identifies the job that the snapshot was created for. `latest_record_time_stamp`:: - () TBD. For example: 1455232663000. + (date) The timestamp of the latest processed record. `latest_result_time_stamp`:: - () TBD. For example: 1455229800000. + (date) The timestamp of the latest bucket result. `model_size_stats`:: - (object) TBD. See <>. + (object) Summary information describing the model. See <>. `retain`:: - (boolean) TBD. For example: false. + (boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`. + However, this snapshot will be deleted when the job is deleted. + The default value is false. `snapshot_id`:: (string) A numerical character string that uniquely identifies the model snapshot. For example: "1491852978". `snapshot_doc_count`:: - () TBD. For example: 1. + (long) For internal use only. `timestamp`:: - (date) The creation timestamp for the snapshot, specified in ISO 8601 format. - For example: 1491852978000. + (date) The creation timestamp for the snapshot. [float] [[ml-snapshot-stats]] @@ -55,31 +54,37 @@ A model snapshot resource has the following properties: The `model_size_stats` object has the following properties: `bucket_allocation_failures_count`:: - () TBD. For example: 0. + (long) The number of buckets for which entities were not processed due to memory limit constraints. `job_id`:: (string) A numerical character string that uniquely identifies the job. `log_time`:: - () TBD. For example: 1491852978000. + (date) The timestamp that the `model_size_stats` were recorded, according to server-time. `memory_status`:: - () TBD. For example: "ok". + (string) The status of the memory in relation to its `model_memory_limit`. + Contains one of the following values. + `ok`::: The internal models stayed below the configured value. 
+ `soft_limit`::: The internal models require more than 60% of the configured memory limit and more aggressive pruning will + be performed in order to try to reclaim space. + `hard_limit`::: The internal models require more space than the configured memory limit. + Some incoming data could not be processed. `model_bytes`:: - () TBD. For example: 100393. + (long) An approximation of the memory resources required for this analysis. `result_type`:: - () TBD. For example: "model_size_stats". + (string) Internal. This value is always set to "model_size_stats". `timestamp`:: - () TBD. For example: 1455229800000. + (date) The timestamp that the `model_size_stats` were recorded, according to the bucket timestamp of the data. `total_by_field_count`:: - () TBD. For example: 13. + (long) The number of _by_ field values analyzed. Note that these are counted separately for each detector and partition. `total_over_field_count`:: - () TBD. For example: 0. + (long) The number of _over_ field values analyzed. Note that these are counted separately for each detector and partition. `total_partition_field_count`:: - () TBD. For example: 2. + (long) The number of _partition_ field values analyzed. diff --git a/docs/en/rest-api/ml/update-job.asciidoc b/docs/en/rest-api/ml/update-job.asciidoc index 19072cc5500..a14e53258f1 100644 --- a/docs/en/rest-api/ml/update-job.asciidoc +++ b/docs/en/rest-api/ml/update-job.asciidoc @@ -13,7 +13,7 @@ The update job API allows you to update certain properties of a job. You must have `manage_ml`, or `manage` cluster privileges to use this API. For more information, see <>. -//TBD: Important:: Updates do not take effect until after then job is closed and new data is sent to it. +//TBD: Important:: Updates do not take effect until after the job is closed and re-opened. ===== Path Parameters @@ -34,7 +34,8 @@ The following properties can be updated after the job is created: * You can update the `analysis_limits` only while the job is closed. 
* The `model_memory_limit` property value cannot be decreased. * If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`, - increasing the `model_memory_limit` is not recommended. + this means that the job was unable to process some data. You may wish to re-run the job + with an increased `model_memory_limit`. `description`:: (string) An optional description of the job. diff --git a/docs/en/rest-api/ml/update-snapshot.asciidoc b/docs/en/rest-api/ml/update-snapshot.asciidoc index 8b557ea895e..6aaf3014290 100644 --- a/docs/en/rest-api/ml/update-snapshot.asciidoc +++ b/docs/en/rest-api/ml/update-snapshot.asciidoc @@ -11,10 +11,10 @@ The update model snapshot API enables you to update certain properties of a snap ===== Description -//TBD. Is the following still true? +//TBD. Is the following still true? - not sure but close/open would be the method Updates to the configuration are only applied after the job has been closed -and new data has been sent to it. +and re-opened. You must have `manage_ml`, or `manage` cluster privileges to use this API. For more information, see <>. @@ -32,10 +32,12 @@ For more information, see <>. The following properties can be updated after the model snapshot is created: `description`:: - (string) An optional description of the model snapshot. + (string) An optional description of the model snapshot. For example, "Before Black Friday". `retain`:: - (boolean) TBD. + (boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`. + Note that this snapshot will still be deleted when the job is deleted. + The default value is false. //// ===== Responses
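Reviewer note: the `memory_status` values and the `retain` flag documented in this patch can be read as two small decision rules. The Python sketch below only restates the wording of the patch; it is not the anomaly detection engine's code. The function names `memory_status` and `survives_cleanup`, the use of bytes for both arguments, and the handling of the exact boundary values are assumptions made for illustration.

```python
def memory_status(model_bytes: int, limit_bytes: int) -> str:
    """Classify model memory use with the thresholds described in the patch.

    Hypothetical helper: both arguments are assumed to be in bytes.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if model_bytes > limit_bytes:
        # More space than the configured limit: some data could not be processed.
        return "hard_limit"
    if model_bytes * 100 > limit_bytes * 60:
        # More than 60% of the limit: more aggressive pruning is performed.
        return "soft_limit"
    return "ok"


def survives_cleanup(age_days: float, retention_days: int, retain: bool = False) -> bool:
    """Return True if a snapshot is kept by the automatic cleanup.

    Hypothetical helper: a snapshot older than `retention_days` is removed
    unless `retain` is true. Deleting the job removes it regardless of `retain`.
    """
    return retain or age_days <= retention_days
```

For instance, `memory_status(700_000, 1_000_000)` falls in the `soft_limit` band, and `survives_cleanup(3, 1, retain=True)` keeps a three-day-old snapshot even though the default retention is one day.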