[DOCS] ML API docs review (elastic/x-pack-elasticsearch#1169)

* [DOCS] Fix for prelertcategory
* [DOCS] _preview returns a page of data
* [DOCS] Added adv options e.g. background_persist_interval
* [DOCS] Clarify meanings of model_snapshot params
* [DOCS] Format fixes
* [DOCS] Include _all keyword
* [DOCS] Explain retain.
* [DOCS] Further explanations for model size limits
* [DOCS] Format fixes in quick ref
* [DOCS] Update for exclude_interim
* [DOCS] Update for exclude_interim
* [DOCS] Update for exclude_interim

Original commit: elastic/x-pack-elasticsearch@cdd2fcefdd
2025-02-23 21:38:15 +00:00 · 2017-04-24 17:31:31 +01:00 · 2017-04-24 17:31:31 +01:00 · 528ac3d902
commit 528ac3d902
parent 2c2261881d
14 changed files with 86 additions and 62 deletions
--- a/docs/en/ml/api-quickref.asciidoc
+++ b/docs/en/ml/api-quickref.asciidoc
@ -13,7 +13,7 @@ The main {ml} resources can be accessed with a variety of endpoints:
 * <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs.
 * <<ml-api-datafeeds,+/datafeeds/+>>: Update data to be analyzed.
 * <<ml-api-results,+/results/+>>: Access the results of a {ml} job.
-* <<ml-api-snapshots,+/modelsnapshots/+>>: Manage model snapshots.
+* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots.
 * <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations.

 [float]
@ -22,7 +22,7 @@ The main {ml} resources can be accessed with a variety of endpoints:

 * <<ml-put-job,POST /anomaly_detectors>>: Create a job
 * <<ml-open-job,POST /anomaly_detectors/<job_id>/_open>>: Open a job
-* <<ml-post-data,POST anomaly_detectors/<job_id>/_data>>: Send data to a job
+* <<ml-post-data,POST /anomaly_detectors/<job_id>/_data>>: Send data to a job
 * <<ml-get-job,GET /anomaly_detectors>>: List jobs
 * <<ml-get-job,GET /anomaly_detectors/<job_id+++>+++>>: Get job details
 * <<ml-get-job-stats,GET /anomaly_detectors/<job_id>/_stats>>: Get job statistics
@ -35,15 +35,15 @@ The main {ml} resources can be accessed with a variety of endpoints:
 [[ml-api-datafeeds]]
 === /datafeeds/

-* <<ml-put-datafeed,PUT /datafeeds/<datafeedID+++>+++>>: Create a data feed
-* <<ml-start-datafeed,POST /datafeeds/<feed_id>/_start>>: Start a data feed
+* <<ml-put-datafeed,PUT /datafeeds/<datafeed_id+++>+++>>: Create a data feed
+* <<ml-start-datafeed,POST /datafeeds/<datafeed_id>/_start>>: Start a data feed
 * <<ml-get-datafeed,GET /datafeeds>>: List data feeds
-* <<ml-get-datafeed,GET /datafeeds/<feed_id+++>+++>>: Get data feed details
-* <<ml-get-datafeed-stats,GET /datafeeds/<feed_id>/_stats>>: Get statistical information for data feeds
-* <<ml-preview-datafeed,GET /datafeeds/<feed_id>/_preview>>: Get a preview of a data feed
-* <<ml-update-datafeed,POST /datafeeds/<feedid>/_update>>: Update certain settings for a data feed
-* <<ml-stop-datafeed,POST /datafeeds/<feed_id>/_stop>>: Stop a data feed
-* <<ml-delete-datafeed,DELETE /datafeeds/<feed_id+++>+++>>: Delete data feed
+* <<ml-get-datafeed,GET /datafeeds/<datafeed_id+++>+++>>: Get data feed details
+* <<ml-get-datafeed-stats,GET /datafeeds/<datafeed_id>/_stats>>: Get statistical information for data feeds
+* <<ml-preview-datafeed,GET /datafeeds/<datafeed_id>/_preview>>: Get a preview of a data feed
+* <<ml-update-datafeed,POST /datafeeds/<datafeed_id>/_update>>: Update certain settings for a data feed
+* <<ml-stop-datafeed,POST /datafeeds/<datafeed_id>/_stop>>: Stop a data feed
+* <<ml-delete-datafeed,DELETE /datafeeds/<datafeed_id+++>+++>>: Delete data feed

 [float]
 [[ml-api-results]]
--- a/docs/en/rest-api/ml/datafeedresource.asciidoc
+++ b/docs/en/rest-api/ml/datafeedresource.asciidoc
@ -64,11 +64,11 @@ progress of a data feed. For example:
  The node that is running the query?
  `id`::: TBD. For example, "0-o0tOoRTwKFZifatTWKNw".
  `name`::: TBD. For example, "0-o0tOo".
-  `ephemeral_id::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
-  `transport_address::: TBD. For example, "127.0.0.1:9300".
+  `ephemeral_id`::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
+  `transport_address`::: TBD. For example, "127.0.0.1:9300".
  `attributes`::: TBD. For example, {"max_running_jobs": "10"}.

 `state`::
  (string) The status of the data feed, which can be one of the following values: +
-  started::: The data feed is actively receiving data.
-  stopped::: The data feed is stopped and will not receive data until it is re-started.
+  `started`::: The data feed is actively receiving data.
+  `stopped`::: The data feed is stopped and will not receive data until it is re-started.
--- a/docs/en/rest-api/ml/get-bucket.asciidoc
+++ b/docs/en/rest-api/ml/get-bucket.asciidoc
@ -45,8 +45,8 @@ roles provide these privileges. For more information, see
 `from`::
  (integer) Skips the specified number of buckets.

-`include_interim`::
-  (boolean) If true, the output includes interim results.
+`exclude_interim`::
+  (boolean) If true, the output excludes interim results. These are included by default.

 `size`::
  (integer) Specifies the maximum number of buckets to obtain.
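The switch from `include_interim` to `exclude_interim` inverts the default: interim results are now returned unless the caller opts out. A minimal Python sketch of building such a request URL follows. The `results/buckets` path layout, the helper name `buckets_url`, and the job ID `farequote` are assumptions for illustration and are not taken from this change.

```python
from urllib.parse import urlencode


def buckets_url(job_id, exclude_interim=False, offset=None, size=None):
    """Build a relative URL for fetching bucket results.

    The path layout is an assumption modeled on the `_xpack/ml/...`
    endpoints that appear elsewhere in this change.
    """
    params = {}
    if exclude_interim:
        # Interim results are included by default, so the flag is only
        # sent when the caller wants them filtered out.
        params["exclude_interim"] = "true"
    if offset is not None:
        # `from` is a Python keyword, hence the `offset` argument name.
        params["from"] = offset
    if size is not None:
        params["size"] = size
    path = f"_xpack/ml/anomaly_detectors/{job_id}/results/buckets"
    return f"{path}?{urlencode(params)}" if params else path
```

Omitting the flag entirely, rather than sending `exclude_interim=false`, keeps the request aligned with the documented default.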
--- a/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
+++ b/docs/en/rest-api/ml/get-datafeed-stats.asciidoc
@ -23,8 +23,7 @@ privileges to use this API. For more information, see <<privileges-list-cluster>

 `feed_id`::
  (string) Identifier for the data feed.
-  If you do not specify this optional parameter, the API returns information
-  about all data feeds.
+  Does not support wildcards; however, you can specify `_all` to get information about all data feeds.

 ===== Results

--- a/docs/en/rest-api/ml/get-datafeed.asciidoc
+++ b/docs/en/rest-api/ml/get-datafeed.asciidoc
@ -22,8 +22,7 @@ privileges to use this API. For more information, see <<privileges-list-cluster>

 `feed_id`::
  (string) Identifier for the data feed.
-  If you do not specify this optional parameter, the API returns information
-  about all data feeds.
+  Does not support wildcards; however, you can specify `_all` or leave it blank to get information about all data feeds.

 ===== Results

--- a/docs/en/rest-api/ml/get-influencer.asciidoc
+++ b/docs/en/rest-api/ml/get-influencer.asciidoc
@ -34,8 +34,8 @@ roles provide these privileges. For more information, see
 `from`::
  (integer) Skips the specified number of influencers.

-`include_interim`::
-  (boolean) If true, the output includes interim results.
+`exclude_interim`::
+  (boolean) If true, the output excludes interim results. These are included by default.

 `influencer_score`::
  (double) Returns influencers with anomaly scores higher than this value.
--- a/docs/en/rest-api/ml/get-job-stats.asciidoc
+++ b/docs/en/rest-api/ml/get-job-stats.asciidoc
@ -19,8 +19,8 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
 ===== Path Parameters

 `job_id`::
-  (string) Identifier for the job. If you do not specify this optional parameter,
-  the API returns information about all jobs.
+  (string) A required identifier for the job.
+  Does not support wildcards; however, you can specify `_all` to get information about all jobs.


 ===== Results
--- a/docs/en/rest-api/ml/get-job.asciidoc
+++ b/docs/en/rest-api/ml/get-job.asciidoc
@ -19,8 +19,8 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
 ===== Path Parameters

 `job_id`::
-  (string) Identifier for the job. If you do not specify this optional parameter,
-  the API returns information about all jobs.
+  (string) Identifier for the job.
+  Does not support wildcards; however, you can specify `_all` or leave it blank to get information about all jobs.

 ===== Results

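The two job endpoints treat a missing identifier differently: the get job API accepts `_all` or no identifier, while the stats API requires one. The following Python sketch shows one way a client might encode those rules. The helper name `job_path` and the `_xpack/ml/anomaly_detectors` prefix are assumptions for illustration; only the `_all` and wildcard behavior comes from the text above.

```python
def job_path(job_id=None, stats=False):
    """Return a relative path for the get job or get job stats API."""
    if job_id and "*" in job_id:
        # The documentation states that wildcards are not supported.
        raise ValueError("wildcards are not supported; use '_all' instead")
    base = "_xpack/ml/anomaly_detectors"  # assumed URL prefix
    if stats:
        # The stats API requires an identifier, so fall back to `_all`.
        return f"{base}/{job_id or '_all'}/_stats"
    # The get job API accepts `_all` or no identifier at all.
    return f"{base}/{job_id}" if job_id else base
```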
--- a/docs/en/rest-api/ml/get-record.asciidoc
+++ b/docs/en/rest-api/ml/get-record.asciidoc
@ -33,8 +33,8 @@ roles provide these privileges. For more information, see
 `from`::
  (integer) Skips the specified number of records.

-`include_interim`::
-  (boolean) If true, the output includes interim results.
+`exclude_interim`::
+  (boolean) If true, the output excludes interim results. These are included by default.

 `record_score`::
  (double) Returns records with anomaly scores higher than this value.
--- a/docs/en/rest-api/ml/jobresource.asciidoc
+++ b/docs/en/rest-api/ml/jobresource.asciidoc
@ -12,6 +12,13 @@ A job resource has the following properties:
  (object) Defines approximate limits on the memory resource requirements for the job.
  See <<ml-apilimits,analysis limits>>.

+`background_persist_interval`::
+  (time units) Advanced configuration option.
+  The time between each periodic persistence of the model.
+  The default value is a randomized value between 3 and 4 hours, which avoids all jobs persisting at exactly the same time.
+  For very large models (several GB), persistence could take 10-20 minutes, so do not set this value too low.
+  The smallest allowed value is 1 hour.
+
 `create_time`::
  (string) The time the job was created, in ISO 8601 format.
  For example, `1491007356077`.
@ -29,7 +36,7 @@ A job resource has the following properties:

 `job_id`::
  (string) The unique identifier for the job.
-
+ 
 `job_type`::
  (string) Reserved for future use, currently set to `anomaly_detector`.

@ -45,11 +52,22 @@ A job resource has the following properties:
  (long) The time in days that model snapshots are retained for the job.
  Older snapshots are deleted. The default value is 1 day.

+`renormalization_window_days`::
+  (long) Advanced configuration option.
+  The period over which adjustments to the score are applied, as new data is seen.
+  The default value is the longer of 30 days or 100 `bucket_spans`.
+
 `results_index_name`::
  (string) The name of the index in which to store the {ml} results.
  The default value is `shared`,
  which corresponds to the index name `.ml-anomalies-shared`

+`results_retention_days`::
+  (long) Advanced configuration option.
+  The number of days for which job results are retained.
+  Once per day at 00:30 (server time), results older than this period will be deleted from Elasticsearch.
+  The default value is null, which means that results are retained.
+
 [[ml-analysisconfig]]
 ===== Analysis Configuration Objects

@ -62,7 +80,7 @@ An analysis configuration object has the following properties:
 `categorization_field_name`::
  (string) If not null, the values of the specified field will be categorized.
  The resulting categories can be used in a detector by setting `by_field_name`,
-  `over_field_name`, or `partition_field_name` to the keyword `prelertcategory`.
+  `over_field_name`, or `partition_field_name` to the keyword `mlcategory`.

 `categorization_filters`::
  (array of strings) If `categorization_field_name` is specified,
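The defaults described for the new advanced job options can be expressed as simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and representing the intervals as `timedelta` values (rather than the "time units" and day counts the API uses) is an assumption made to keep the example self-contained.

```python
from datetime import timedelta

# Smallest allowed `background_persist_interval`, per the text above.
MIN_PERSIST_INTERVAL = timedelta(hours=1)


def default_renormalization_window(bucket_span):
    """Return the longer of 30 days or 100 bucket spans."""
    return max(timedelta(days=30), 100 * bucket_span)


def check_persist_interval(interval):
    """Reject intervals shorter than the documented 1 hour minimum."""
    if interval < MIN_PERSIST_INTERVAL:
        raise ValueError("background_persist_interval must be at least 1 hour")
    return interval
```

For example, a one-hour bucket span yields 100 hours, which is shorter than 30 days, so the 30-day floor applies; a one-day bucket span yields a 100-day window.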
--- a/docs/en/rest-api/ml/preview-datafeed.asciidoc
+++ b/docs/en/rest-api/ml/preview-datafeed.asciidoc
@ -6,20 +6,20 @@ The preview data feed API enables you to preview a data feed.

 ===== Request

-`GET _xpack/ml/datafeeds/<feed_id>/_preview`
+`GET _xpack/ml/datafeeds/<datafeed_id>/_preview`


 ===== Description

-//TBD: How much data does it return?
-The API returns example data by using the current data feed settings.
+The API returns the first "page" of results from the `search` created using the current data feed settings.
+This shows the structure of the data that will be passed to the anomaly detection engine.

 You must have `monitor_ml`, `monitor`, `manage_ml`, or `manage` cluster
 privileges to use this API. For more information, see <<privileges-list-cluster>>.

 ===== Path Parameters

-`feed_id` (required)::
+`datafeed_id` (required)::
  (string) Identifier for the data feed

 ////
@ -41,7 +41,7 @@ TBD
 ////
 ===== Examples

-The following example obtains a previews of the `datafeed-farequote` data feed:
+The following example obtains a preview of the `datafeed-farequote` data feed:

 [source,js]
 --------------------------------------------------
--- a/docs/en/rest-api/ml/snapshotresource.asciidoc
+++ b/docs/en/rest-api/ml/snapshotresource.asciidoc
@ -2,13 +2,11 @@
 [[ml-snapshot-resource]]
 ==== Model Snapshot Resources

-////
 Model snapshots are saved to disk periodically.
-By default, this is occurs approximately every 3 hours.
-//TBD: Can you change this setting?
+By default, this occurs approximately every 3 to 4 hours and is configurable using the `background_persist_interval` setting.

 By default, model snapshots are retained for one day. You can change this
-behavior with by updating the `model_snapshot_retention_days` for the job.
+behavior by updating the `model_snapshot_retention_days` for the job.
 When choosing a new value, consider the following:

 * Persistence enables resilience in the event of a system failure.
@ -23,30 +21,31 @@ A model snapshot resource has the following properties:
  (string) An optional description of the job.

 `job_id`::
-  (string) A numerical character string that uniquely identifies the job.
+  (string) A numerical character string that uniquely identifies the job that the snapshot was created for.

 `latest_record_time_stamp`::
-  () TBD. For example: 1455232663000.
+  (date) The timestamp of the latest processed record.

 `latest_result_time_stamp`::
-  () TBD. For example: 1455229800000.
+  (date) The timestamp of the latest bucket result.

 `model_size_stats`::
-  (object) TBD. See <<ml-snapshot-stats,Model Size Statistics>>.
+  (object) Summary information describing the model. See <<ml-snapshot-stats,Model Size Statistics>>.

 `retain`::
-  (boolean) TBD. For example: false.
+  (boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
+  However, this snapshot will be deleted when the job is deleted.
+  The default value is false.

 `snapshot_id`::
  (string) A numerical character string that uniquely identifies the model
  snapshot. For example: "1491852978".

 `snapshot_doc_count`::
-  () TBD. For example: 1.
+  (long) For internal use only.

 `timestamp`::
-  (date) The creation timestamp for the snapshot, specified in ISO 8601 format.
-  For example: 1491852978000.
+  (date) The creation timestamp for the snapshot.

 [float]
 [[ml-snapshot-stats]]
@ -55,31 +54,37 @@ A model snapshot resource has the following properties:
 The `model_size_stats` object has the following properties:

 `bucket_allocation_failures_count`::
-  () TBD. For example: 0.
+  (long) The number of buckets for which entities were not processed due to memory limit constraints.

 `job_id`::
  (string) A numerical character string that uniquely identifies the job.

 `log_time`::
-  () TBD. For example: 1491852978000.
+  (date) The time at which the `model_size_stats` were recorded, according to server time.

 `memory_status`::
-  () TBD. For example: "ok".
+  (string) The status of the memory in relation to its `model_memory_limit`.
+  Contains one of the following values: +
+  `ok`::: The internal models stayed below the configured value.
+  `soft_limit`::: The internal models require more than 60% of the configured memory limit and more aggressive pruning will
+  be performed in order to try to reclaim space.
+  `hard_limit`::: The internal models require more space than the configured memory limit.
+  Some incoming data could not be processed.

 `model_bytes`::
-  () TBD. For example: 100393.
+  (long) An approximation of the memory resources required for this analysis.

 `result_type`::
-  () TBD. For example: "model_size_stats".
+  (string) Internal. This value is always set to "model_size_stats".

 `timestamp`::
-  () TBD. For example: 1455229800000.
+  (date) The time at which the `model_size_stats` were recorded, according to the bucket timestamp of the data.

 `total_by_field_count`::
-  () TBD. For example: 13.
+  (long) The number of _by_ field values analyzed. Note that these are counted separately for each detector and partition.

 `total_over_field_count`::
-  () TBD. For example: 0.
+  (long) The number of _over_ field values analyzed. Note that these are counted separately for each detector and partition.

 `total_partition_field_count`::
-  () TBD. For example: 2.
+  (long) The number of _partition_ field values analyzed.
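The `memory_status` thresholds and the `retain` flag described in this file can be read as two small decision rules. The Python sketch below restates them for illustration; it is not how the analysis engine computes the status. Both function names are hypothetical, passing the limit in bytes is an assumption, and the exact boundary handling (strict comparisons) is a guess where the text does not say.

```python
from datetime import datetime, timedelta, timezone


def classify_memory_status(model_bytes, model_memory_limit_bytes):
    """Map model size against its limit to a `memory_status` value."""
    if model_bytes > model_memory_limit_bytes:
        # Over the limit: some incoming data could not be processed.
        return "hard_limit"
    if model_bytes > 0.6 * model_memory_limit_bytes:
        # Above 60% of the limit: more aggressive pruning is performed.
        return "soft_limit"
    return "ok"


def eligible_for_cleanup(snapshot_time, now, retention_days, retain=False):
    """Return True if automatic cleanup may delete this snapshot."""
    if retain:
        # Retained snapshots survive cleanup, but not deletion of the job.
        return False
    return now - snapshot_time > timedelta(days=retention_days)
```

With the default `model_snapshot_retention_days` of 1, a snapshot taken two days ago is eligible for cleanup unless `retain` is true.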
--- a/docs/en/rest-api/ml/update-job.asciidoc
+++ b/docs/en/rest-api/ml/update-job.asciidoc
@ -13,7 +13,7 @@ The update job API allows you to update certain properties of a job.

 You must have `manage_ml`, or `manage` cluster privileges to use this API.
 For more information, see <<privileges-list-cluster>>.
-//TBD: Important:: Updates do not take effect until after then job is closed and new data is sent to it.
+//TBD: Important:: Updates do not take effect until after the job is closed and re-opened.

 ===== Path Parameters

@ -34,7 +34,8 @@ The following properties can be updated after the job is created:
  * You can update the `analysis_limits` only while the job is closed.
  * The `model_memory_limit` property value cannot be decreased.
  * If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
-  increasing the `model_memory_limit` is not recommended.
+  this means that the job was unable to process some data. You might want to re-run the job
+  with an increased `model_memory_limit`.

 `description`::
  (string) An optional description of the job.
--- a/docs/en/rest-api/ml/update-snapshot.asciidoc
+++ b/docs/en/rest-api/ml/update-snapshot.asciidoc
@ -11,10 +11,10 @@ The update model snapshot API enables you to update certain properties of a snap

 ===== Description

-//TBD. Is the following still true?
+//TBD. Is the following still true? - not sure but close/open would be the method

 Updates to the configuration are only applied after the job has been closed
-and new data has been sent to it.
+and re-opened.

 You must have `manage_ml`, or `manage` cluster privileges to use this API.
 For more information, see <<privileges-list-cluster>>.
@ -32,10 +32,12 @@ For more information, see <<privileges-list-cluster>>.
 The following properties can be updated after the model snapshot is created:

 `description`::
-  (string) An optional description of the model snapshot.
+  (string) An optional description of the model snapshot. For example, "Before Black Friday".

 `retain`::
-  (boolean) TBD.
+  (boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
+  Note that this snapshot will still be deleted when the job is deleted.
+  The default value is false.

 ////
 ===== Responses