From 528ac3d9021d8b76063d1d42aa25743a2f864e06 Mon Sep 17 00:00:00 2001
From: Sophie Chang <sophiec20@users.noreply.github.com>
Date: Mon, 24 Apr 2017 17:31:31 +0100
Subject: [PATCH] [DOCS] ML API docs review (elastic/x-pack-elasticsearch#1169)

* [DOCS] Fix for prelertcategory

* [DOCS] _preview returns a page of data

* [DOCS] Added adv options e.g. background_persist_interval

* [DOCS] Clarify meanings of model_snapshot params

* [DOCS] Format fixes

* [DOCS] Include _all keyword

* [DOCS] Explain retain.

* [DOCS] Further explanations for model size limits

* [DOCS] Format fixes in quick ref

* [DOCS] Update for exclude_interim

* [DOCS] Update for exclude_interim

* [DOCS] Update for exclude_interim

Original commit: elastic/x-pack-elasticsearch@cdd2fcefdd3ea7cd2b517142c1bed1d2a02775de
---
 docs/en/ml/api-quickref.asciidoc              | 20 ++++----
 docs/en/rest-api/ml/datafeedresource.asciidoc |  8 ++--
 docs/en/rest-api/ml/get-bucket.asciidoc       |  4 +-
 .../rest-api/ml/get-datafeed-stats.asciidoc   |  3 +-
 docs/en/rest-api/ml/get-datafeed.asciidoc     |  3 +-
 docs/en/rest-api/ml/get-influencer.asciidoc   |  4 +-
 docs/en/rest-api/ml/get-job-stats.asciidoc    |  4 +-
 docs/en/rest-api/ml/get-job.asciidoc          |  4 +-
 docs/en/rest-api/ml/get-record.asciidoc       |  4 +-
 docs/en/rest-api/ml/jobresource.asciidoc      | 22 ++++++++-
 docs/en/rest-api/ml/preview-datafeed.asciidoc | 10 ++--
 docs/en/rest-api/ml/snapshotresource.asciidoc | 47 ++++++++++---------
 docs/en/rest-api/ml/update-job.asciidoc       |  5 +-
 docs/en/rest-api/ml/update-snapshot.asciidoc  | 10 ++--
 14 files changed, 86 insertions(+), 62 deletions(-)
diff --git a/docs/en/ml/api-quickref.asciidoc b/docs/en/ml/api-quickref.asciidoc
index 255dd8ff19e..f241ea593fe 100644
--- a/docs/en/ml/api-quickref.asciidoc
+++ b/docs/en/ml/api-quickref.asciidoc
@@ -13,7 +13,7 @@ The main {ml} resources can be accessed with a variety of endpoints:
 * <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs.
 * <<ml-api-datafeeds,+/datafeeds/+>>: Update data to be analyzed.
 * <<ml-api-results,+/results/+>>: Access the results of a {ml} job.
-* <<ml-api-snapshots,+/modelsnapshots/+>>: Manage model snapshots.
+* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots.
 * <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations.
 
 [float]
@@ -22,7 +22,7 @@ The main {ml} resources can be accessed with a variety of endpoints:
 
 * <<ml-put-job,POST /anomaly_detectors>>: Create a job
 * <<ml-open-job,POST /anomaly_detectors/<job_id>/_open>>: Open a job
-* <<ml-post-data,POST anomaly_detectors/<job_id>/_data>>: Send data to a job
+* <<ml-post-data,POST /anomaly_detectors/<job_id>/_data>>: Send data to a job
 * <<ml-get-job,GET /anomaly_detectors>>: List jobs
 * <<ml-get-job,GET /anomaly_detectors/<job_id+++>+++>>: Get job details
 * <<ml-get-job-stats,GET /anomaly_detectors/<job_id>/_stats>>: Get job statistics
@@ -35,15 +35,15 @@ The main {ml} resources can be accessed with a variety of endpoints:
 [[ml-api-datafeeds]]
 === /datafeeds/
 
-* <<ml-put-datafeed,PUT /datafeeds/<datafeedID+++>+++>>: Create a data feed
-* <<ml-start-datafeed,POST /datafeeds/<feed_id>/_start>>: Start a data feed
+* <<ml-put-datafeed,PUT /datafeeds/<datafeed_id+++>+++>>: Create a data feed
+* <<ml-start-datafeed,POST /datafeeds/<datafeed_id>/_start>>: Start a data feed
 * <<ml-get-datafeed,GET /datafeeds>>: List data feeds
-* <<ml-get-datafeed,GET /datafeeds/<feed_id+++>+++>>: Get data feed details
-* <<ml-get-datafeed-stats,GET /datafeeds/<feed_id>/_stats>>: Get statistical information for data feeds
-* <<ml-preview-datafeed,GET /datafeeds/<feed_id>/_preview>>: Get a preview of a data feed
-* <<ml-update-datafeed,POST /datafeeds/<feedid>/_update>>: Update certain settings for a data feed
-* <<ml-stop-datafeed,POST /datafeeds/<feed_id>/_stop>>: Stop a data feed
-* <<ml-delete-datafeed,DELETE /datafeeds/<feed_id+++>+++>>: Delete data feed
+* <<ml-get-datafeed,GET /datafeeds/<datafeed_id+++>+++>>: Get data feed details
+* <<ml-get-datafeed-stats,GET /datafeeds/<datafeed_id>/_stats>>: Get statistical information for data feeds
+* <<ml-preview-datafeed,GET /datafeeds/<datafeed_id>/_preview>>: Get a preview of a data feed
+* <<ml-update-datafeed,POST /datafeeds/<datafeed_id>/_update>>: Update certain settings for a data feed
+* <<ml-stop-datafeed,POST /datafeeds/<datafeed_id>/_stop>>: Stop a data feed
+* <<ml-delete-datafeed,DELETE /datafeeds/<datafeed_id+++>+++>>: Delete data feed
 
 [float]
 [[ml-api-results]]
diff --git a/docs/en/rest-api/ml/datafeedresource.asciidoc b/docs/en/rest-api/ml/datafeedresource.asciidoc
index 86b3ac09dfd..31a2849afc9 100644
--- a/docs/en/rest-api/ml/datafeedresource.asciidoc
+++ b/docs/en/rest-api/ml/datafeedresource.asciidoc
@@ -64,11 +64,11 @@ progress of a data feed. For example:
   The node that is running the query?
   `id`::: TBD. For example, "0-o0tOoRTwKFZifatTWKNw".
   `name`::: TBD. For example, "0-o0tOo".
-  `ephemeral_id::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
-  `transport_address::: TBD. For example, "127.0.0.1:9300".
+  `ephemeral_id`::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
+  `transport_address`::: TBD. For example, "127.0.0.1:9300".
   `attributes`::: TBD. For example, {"max_running_jobs": "10"}.
 
 `state`::
   (string) The status of the data feed, which can be one of the following values: +
-  started::: The data feed is actively receiving data.
-  stopped::: The data feed is stopped and will not receive data until it is re-started.
+  `started`::: The data feed is actively receiving data.
+  `stopped`::: The data feed is stopped and will not receive data until it is re-started.
diff --git a/docs/en/rest-api/ml/get-bucket.asciidoc b/docs/en/rest-api/ml/get-bucket.asciidoc
index f1e27bec94a..4a98b920758 100644
--- a/docs/en/rest-api/ml/get-bucket.asciidoc
+++ b/docs/en/rest-api/ml/get-bucket.asciidoc
@@ -45,8 +45,8 @@ roles provide these privileges. For more information, see
 `from`::
   (integer) Skips the specified number of buckets.
 
-`include_interim`::
-  (boolean) If true, the output includes interim results.
+`exclude_interim`::
+  (boolean) If true, the output excludes interim results. These are included by default.
 
 `size`::
   (integer) Specifies the maximum number of buckets to obtain.
diff --git a/docs/en/rest-api/ml/get-datafeed-stats.asciidoc b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
index 9e023048735..a82d4f5ce84 100644
--- a/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
+++ b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
@@ -23,8 +23,7 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
 
 `feed_id`::
   (string) Identifier for the data feed.
-  If you do not specify this optional parameter, the API returns information
-  about all data feeds.
+  This parameter does not support wildcards, but you can specify `_all` to get information about all data feeds.
 
 ===== Results
 
diff --git a/docs/en/rest-api/ml/get-datafeed.asciidoc b/docs/en/rest-api/ml/get-datafeed.asciidoc
index 8de5b0ab90a..661170a1dba 100644
--- a/docs/en/rest-api/ml/get-datafeed.asciidoc
+++ b/docs/en/rest-api/ml/get-datafeed.asciidoc
@@ -22,8 +22,7 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
 
 `feed_id`::
   (string) Identifier for the data feed.
-  If you do not specify this optional parameter, the API returns information
-  about all data feeds.
+  This parameter does not support wildcards, but you can specify `_all` or omit the identifier to get information about all data feeds.
 
 ===== Results
 
diff --git a/docs/en/rest-api/ml/get-influencer.asciidoc b/docs/en/rest-api/ml/get-influencer.asciidoc
index da4cd0d9cbd..d0192bc8caa 100644
--- a/docs/en/rest-api/ml/get-influencer.asciidoc
+++ b/docs/en/rest-api/ml/get-influencer.asciidoc
@@ -34,8 +34,8 @@ roles provide these privileges. For more information, see
 `from`::
   (integer) Skips the specified number of influencers.
 
-`include_interim`::
-  (boolean) If true, the output includes interim results.
+`exclude_interim`::
+  (boolean) If true, the output excludes interim results. These are included by default.
 
 `influencer_score`::
   (double) Returns influencers with anomaly scores higher than this value.
diff --git a/docs/en/rest-api/ml/get-job-stats.asciidoc b/docs/en/rest-api/ml/get-job-stats.asciidoc
index e6d2e4fb082..d32409e46f9 100644
--- a/docs/en/rest-api/ml/get-job-stats.asciidoc
+++ b/docs/en/rest-api/ml/get-job-stats.asciidoc
@@ -19,8 +19,8 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
 ===== Path Parameters
 
 `job_id`::
-  (string) Identifier for the job. If you do not specify this optional parameter,
-  the API returns information about all jobs.
+  (string) A required identifier for the job.
+  This parameter does not support wildcards, but you can specify `_all` to get information about all jobs.
 
 
 ===== Results
diff --git a/docs/en/rest-api/ml/get-job.asciidoc b/docs/en/rest-api/ml/get-job.asciidoc
index 28a47a0c589..ee5e7142278 100644
--- a/docs/en/rest-api/ml/get-job.asciidoc
+++ b/docs/en/rest-api/ml/get-job.asciidoc
@@ -19,8 +19,8 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
 ===== Path Parameters
 
 `job_id`::
-  (string) Identifier for the job. If you do not specify this optional parameter,
-  the API returns information about all jobs.
+  (string) Identifier for the job.
+  This parameter does not support wildcards, but you can specify `_all` or omit the identifier to get information about all jobs.
 
 ===== Results
 
diff --git a/docs/en/rest-api/ml/get-record.asciidoc b/docs/en/rest-api/ml/get-record.asciidoc
index fb52dc0986f..84ed7a39349 100644
--- a/docs/en/rest-api/ml/get-record.asciidoc
+++ b/docs/en/rest-api/ml/get-record.asciidoc
@@ -33,8 +33,8 @@ roles provide these privileges. For more information, see
 `from`::
   (integer) Skips the specified number of records.
 
-`include_interim`::
-  (boolean) If true, the output includes interim results.
+`exclude_interim`::
+  (boolean) If true, the output excludes interim results. These are included by default.
 
 `record_score`::
   (double) Returns records with anomaly scores higher than this value.
diff --git a/docs/en/rest-api/ml/jobresource.asciidoc b/docs/en/rest-api/ml/jobresource.asciidoc
index 1ba57f3a80f..17127ae3cd4 100644
--- a/docs/en/rest-api/ml/jobresource.asciidoc
+++ b/docs/en/rest-api/ml/jobresource.asciidoc
@@ -12,6 +12,13 @@ A job resource has the following properties:
   (object) Defines approximate limits on the memory resource requirements for the job.
   See <<ml-apilimits,analysis limits>>.
 
+`background_persist_interval`::
+  (time units) Advanced configuration option.
+  The time between each periodic persistence of the model.
+  The default value is a randomized value between 3 and 4 hours, which avoids all jobs persisting at exactly the same time.
+  For very large models (several GB), persistence could take 10-20 minutes, so do not set this value too low.
+  The smallest allowed value is 1 hour.
+
 `create_time`::
   (string) The time the job was created, in ISO 8601 format.
   For example, `1491007356077`.
@@ -29,7 +36,7 @@ A job resource has the following properties:
 
 `job_id`::
   (string) The unique identifier for the job.
-
+ 
 `job_type`::
   (string) Reserved for future use, currently set to `anomaly_detector`.
 
@@ -45,11 +52,22 @@ A job resource has the following properties:
   (long) The time in days that model snapshots are retained for the job.
   Older snapshots are deleted. The default value is 1 day.
 
+`renormalization_window_days`::
+  (long) Advanced configuration option.
+  The period over which adjustments to the score are applied, as new data is seen.
+  The default value is the longer of 30 days or 100 `bucket_spans`.
+
 `results_index_name`::
   (string) The name of the index in which to store the {ml} results.
   The default value is `shared`,
   which corresponds to the index name `.ml-anomalies-shared`
 
+`results_retention_days`::
+  (long) Advanced configuration option.
+  The number of days for which job results are retained.
+  Once per day at 00:30 (server time), results older than this period are deleted from Elasticsearch.
+  The default value is null, which means that results are retained.
+
 [[ml-analysisconfig]]
 ===== Analysis Configuration Objects
 
@@ -62,7 +80,7 @@ An analysis configuration object has the following properties:
 `categorization_field_name`::
   (string) If not null, the values of the specified field will be categorized.
   The resulting categories can be used in a detector by setting `by_field_name`,
-  `over_field_name`, or `partition_field_name` to the keyword `prelertcategory`.
+  `over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
 
 `categorization_filters`::
   (array of strings) If `categorization_field_name` is specified,
diff --git a/docs/en/rest-api/ml/preview-datafeed.asciidoc b/docs/en/rest-api/ml/preview-datafeed.asciidoc
index b1b9baa007b..ef461a88f8a 100644
--- a/docs/en/rest-api/ml/preview-datafeed.asciidoc
+++ b/docs/en/rest-api/ml/preview-datafeed.asciidoc
@@ -6,20 +6,20 @@ The preview data feed API enables you to preview a data feed.
 
 ===== Request
 
-`GET _xpack/ml/datafeeds/<feed_id>/_preview`
+`GET _xpack/ml/datafeeds/<datafeed_id>/_preview`
 
 
 ===== Description
 
-//TBD: How much data does it return?
-The API returns example data by using the current data feed settings.
+The API returns the first "page" of results from the `search` created using the current data feed settings.
+This shows the structure of the data that will be passed to the anomaly detection engine.
 
 You must have `monitor_ml`, `monitor`, `manage_ml`, or `manage` cluster
 privileges to use this API. For more information, see <<privileges-list-cluster>>.
 
 ===== Path Parameters
 
-`feed_id` (required)::
+`datafeed_id` (required)::
   (string) Identifier for the data feed
 
 ////
@@ -41,7 +41,7 @@ TBD
 ////
 ===== Examples
 
-The following example obtains a previews of the `datafeed-farequote` data feed:
+The following example obtains a preview of the `datafeed-farequote` data feed:
 
 [source,js]
 --------------------------------------------------
diff --git a/docs/en/rest-api/ml/snapshotresource.asciidoc b/docs/en/rest-api/ml/snapshotresource.asciidoc
index eb113d9d03a..07fd44e31cd 100644
--- a/docs/en/rest-api/ml/snapshotresource.asciidoc
+++ b/docs/en/rest-api/ml/snapshotresource.asciidoc
@@ -2,13 +2,11 @@
 [[ml-snapshot-resource]]
 ==== Model Snapshot Resources
 
-////
 Model snapshots are saved to disk periodically.
-By default, this is occurs approximately every 3 hours.
-//TBD: Can you change this setting?
+By default, this occurs approximately every 3 to 4 hours and is configurable using the setting `background_persist_interval`.
 
 By default, model snapshots are retained for one day. You can change this
-behavior with by updating the `model_snapshot_retention_days` for the job.
+behavior by updating the `model_snapshot_retention_days` for the job.
 When choosing a new value, consider the following:
 
 * Persistence enables resilience in the event of a system failure.
@@ -23,30 +21,31 @@ A model snapshot resource has the following properties:
   (string) An optional description of the job.
 
 `job_id`::
-  (string) A numerical character string that uniquely identifies the job.
+  (string) A numerical character string that uniquely identifies the job that the snapshot was created for.
 
 `latest_record_time_stamp`::
-  () TBD. For example: 1455232663000.
+  (date) The timestamp of the latest processed record.
 
 `latest_result_time_stamp`::
-  () TBD. For example: 1455229800000.
+  (date) The timestamp of the latest bucket result.
 
 `model_size_stats`::
-  (object) TBD. See <<ml-snapshot-stats,Model Size Statistics>>.
+  (object) Summary information describing the model. See <<ml-snapshot-stats,Model Size Statistics>>.
 
 `retain`::
-  (boolean) TBD. For example: false.
+  (boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
+  However, this snapshot will be deleted when the job is deleted.
+  The default value is false.
 
 `snapshot_id`::
   (string) A numerical character string that uniquely identifies the model
   snapshot. For example: "1491852978".
 
 `snapshot_doc_count`::
-  () TBD. For example: 1.
+  (long) For internal use only.
 
 `timestamp`::
-  (date) The creation timestamp for the snapshot, specified in ISO 8601 format.
-  For example: 1491852978000.
+  (date) The creation timestamp for the snapshot.
 
 [float]
 [[ml-snapshot-stats]]
@@ -55,31 +54,37 @@ A model snapshot resource has the following properties:
 The `model_size_stats` object has the following properties:
 
 `bucket_allocation_failures_count`::
-  () TBD. For example: 0.
+  (long) The number of buckets for which entities were not processed due to memory limit constraints.
 
 `job_id`::
   (string) A numerical character string that uniquely identifies the job.
 
 `log_time`::
-  () TBD. For example: 1491852978000.
+  (date) The timestamp at which the `model_size_stats` were recorded, according to server time.
 
 `memory_status`::
-  () TBD. For example: "ok".
+  (string) The status of the memory in relation to its `model_memory_limit`.
+  Contains one of the following values.
+  `ok`::: The internal models stayed below the configured value.
+  `soft_limit`::: The internal models require more than 60% of the configured memory limit and more aggressive pruning will
+  be performed in order to try to reclaim space.
+  `hard_limit`::: The internal models require more space than the configured memory limit.
+  Some incoming data could not be processed.
 
 `model_bytes`::
-  () TBD. For example: 100393.
+  (long) An approximation of the memory resources required for this analysis.
 
 `result_type`::
-  () TBD. For example: "model_size_stats".
+  (string) Internal. This value is always set to "model_size_stats".
 
 `timestamp`::
-  () TBD. For example: 1455229800000.
+  (date) The timestamp at which the `model_size_stats` were recorded, according to the bucket timestamp of the data.
 
 `total_by_field_count`::
-  () TBD. For example: 13.
+  (long) The number of _by_ field values analyzed. Note that these are counted separately for each detector and partition.
 
 `total_over_field_count`::
-  () TBD. For example: 0.
+  (long) The number of _over_ field values analyzed. Note that these are counted separately for each detector and partition.
 
 `total_partition_field_count`::
-  () TBD. For example: 2.
+  (long) The number of _partition_ field values analyzed.
diff --git a/docs/en/rest-api/ml/update-job.asciidoc b/docs/en/rest-api/ml/update-job.asciidoc
index 19072cc5500..a14e53258f1 100644
--- a/docs/en/rest-api/ml/update-job.asciidoc
+++ b/docs/en/rest-api/ml/update-job.asciidoc
@@ -13,7 +13,7 @@ The update job API allows you to update certain properties of a job.
 
 You must have `manage_ml`, or `manage` cluster privileges to use this API.
 For more information, see <<privileges-list-cluster>>.
-//TBD: Important:: Updates do not take effect until after then job is closed and new data is sent to it.
+//TBD: Important:: Updates do not take effect until after the job is closed and re-opened.
 
 ===== Path Parameters
 
@@ -34,7 +34,8 @@ The following properties can be updated after the job is created:
   * You can update the `analysis_limits` only while the job is closed.
   * The `model_memory_limit` property value cannot be decreased.
   * If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
-  increasing the `model_memory_limit` is not recommended.
+  the job was unable to process some data. You might want to re-run the job
+  with an increased `model_memory_limit`.
 
 `description`::
   (string) An optional description of the job.
diff --git a/docs/en/rest-api/ml/update-snapshot.asciidoc b/docs/en/rest-api/ml/update-snapshot.asciidoc
index 8b557ea895e..6aaf3014290 100644
--- a/docs/en/rest-api/ml/update-snapshot.asciidoc
+++ b/docs/en/rest-api/ml/update-snapshot.asciidoc
@@ -11,10 +11,10 @@ The update model snapshot API enables you to update certain properties of a snap
 
 ===== Description
 
-//TBD. Is the following still true?
+//TBD. Is the following still true? - not sure but close/open would be the method
 
 Updates to the configuration are only applied after the job has been closed
-and new data has been sent to it.
+and re-opened.
 
 You must have `manage_ml`, or `manage` cluster privileges to use this API.
 For more information, see <<privileges-list-cluster>>.
@@ -32,10 +32,12 @@ For more information, see <<privileges-list-cluster>>.
 The following properties can be updated after the model snapshot is created:
 
 `description`::
-  (string) An optional description of the model snapshot.
+  (string) An optional description of the model snapshot. For example, "Before Black Friday".
 
 `retain`::
-  (boolean) TBD.
+  (boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
+  Note that this snapshot will still be deleted when the job is deleted.
+  The default value is false.
 
 ////
 ===== Responses