* [DOCS] Fix for prelertcategory

* [DOCS] _preview returns a page of data

* [DOCS] Added adv options e.g. background_persist_interval

* [DOCS] Clarify meanings of model_snapshot params

* [DOCS] Format fixes

* [DOCS] Include _all keyword

* [DOCS] Explain retain.

* [DOCS] Further explanations for model size limits

* [DOCS] Format fixes in quick ref

* [DOCS] Update for exclude_interim

* [DOCS] Update for exclude_interim

* [DOCS] Update for exclude_interim

Original commit: elastic/x-pack-elasticsearch@cdd2fcefdd
This commit is contained in:
Sophie Chang 2017-04-24 17:31:31 +01:00 committed by lcawley
parent 2c2261881d
commit 528ac3d902
14 changed files with 86 additions and 62 deletions

View File

@ -13,7 +13,7 @@ The main {ml} resources can be accessed with a variety of endpoints:
* <<ml-api-jobs,+/anomaly_detectors/+>>: Create and manage {ml} jobs.
* <<ml-api-datafeeds,+/datafeeds/+>>: Update data to be analyzed.
* <<ml-api-results,+/results/+>>: Access the results of a {ml} job.
* <<ml-api-snapshots,+/modelsnapshots/+>>: Manage model snapshots.
* <<ml-api-snapshots,+/model_snapshots/+>>: Manage model snapshots.
* <<ml-api-validate,+/validate/+>>: Validate subsections of job configurations.
[float]
@ -22,7 +22,7 @@ The main {ml} resources can be accessed with a variety of endpoints:
* <<ml-put-job,POST /anomaly_detectors>>: Create a job
* <<ml-open-job,POST /anomaly_detectors/<job_id>/_open>>: Open a job
* <<ml-post-data,POST anomaly_detectors/<job_id>/_data>>: Send data to a job
* <<ml-post-data,POST /anomaly_detectors/<job_id>/_data>>: Send data to a job
* <<ml-get-job,GET /anomaly_detectors>>: List jobs
* <<ml-get-job,GET /anomaly_detectors/<job_id+++>+++>>: Get job details
* <<ml-get-job-stats,GET /anomaly_detectors/<job_id>/_stats>>: Get job statistics
@ -35,15 +35,15 @@ The main {ml} resources can be accessed with a variety of endpoints:
[[ml-api-datafeeds]]
=== /datafeeds/
* <<ml-put-datafeed,PUT /datafeeds/<datafeedID+++>+++>>: Create a data feed
* <<ml-start-datafeed,POST /datafeeds/<feed_id>/_start>>: Start a data feed
* <<ml-put-datafeed,PUT /datafeeds/<datafeed_id+++>+++>>: Create a data feed
* <<ml-start-datafeed,POST /datafeeds/<datafeed_id>/_start>>: Start a data feed
* <<ml-get-datafeed,GET /datafeeds>>: List data feeds
* <<ml-get-datafeed,GET /datafeeds/<feed_id+++>+++>>: Get data feed details
* <<ml-get-datafeed-stats,GET /datafeeds/<feed_id>/_stats>>: Get statistical information for data feeds
* <<ml-preview-datafeed,GET /datafeeds/<feed_id>/_preview>>: Get a preview of a data feed
* <<ml-update-datafeed,POST /datafeeds/<feedid>/_update>>: Update certain settings for a data feed
* <<ml-stop-datafeed,POST /datafeeds/<feed_id>/_stop>>: Stop a data feed
* <<ml-delete-datafeed,DELETE /datafeeds/<feed_id+++>+++>>: Delete data feed
* <<ml-get-datafeed,GET /datafeeds/<datafeed_id+++>+++>>: Get data feed details
* <<ml-get-datafeed-stats,GET /datafeeds/<datafeed_id>/_stats>>: Get statistical information for data feeds
* <<ml-preview-datafeed,GET /datafeeds/<datafeed_id>/_preview>>: Get a preview of a data feed
* <<ml-update-datafeed,POST /datafeeds/<datafeed_id>/_update>>: Update certain settings for a data feed
* <<ml-stop-datafeed,POST /datafeeds/<datafeed_id>/_stop>>: Stop a data feed
* <<ml-delete-datafeed,DELETE /datafeeds/<datafeed_id+++>+++>>: Delete data feed
[float]
[[ml-api-results]]

View File

@ -64,11 +64,11 @@ progress of a data feed. For example:
The node that is running the query?
`id`::: TBD. For example, "0-o0tOoRTwKFZifatTWKNw".
`name`::: TBD. For example, "0-o0tOo".
`ephemeral_id::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
`transport_address::: TBD. For example, "127.0.0.1:9300".
`ephemeral_id`::: TBD. For example, "DOZltLxLS_SzYpW6hQ9hyg".
`transport_address`::: TBD. For example, "127.0.0.1:9300".
`attributes`::: TBD. For example, {"max_running_jobs": "10"}.
`state`::
(string) The status of the data feed, which can be one of the following values: +
started::: The data feed is actively receiving data.
stopped::: The data feed is stopped and will not receive data until it is re-started.
`started`::: The data feed is actively receiving data.
`stopped`::: The data feed is stopped and will not receive data until it is re-started.

View File

@ -45,8 +45,8 @@ roles provide these privileges. For more information, see
`from`::
(integer) Skips the specified number of buckets.
`include_interim`::
(boolean) If true, the output includes interim results.
`exclude_interim`::
(boolean) If true, the output excludes interim results. By default, interim results are included.
`size`::
(integer) Specifies the maximum number of buckets to obtain.
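
As a rough illustration of how `exclude_interim`, `from`, and `size` might be combined, the following Python sketch builds a request path for bucket results. The helper name `buckets_query` and the endpoint layout `_xpack/ml/anomaly_detectors/<job_id>/results/buckets` are assumptions made for this example, as is passing the options as query-string parameters; check them against your cluster version.

```python
from urllib.parse import urlencode


def buckets_query(job_id, exclude_interim=False, skip=None, size=None):
    """Build a request path for bucket results (hypothetical helper)."""
    params = {}
    # Interim results are included by default, so the flag is only sent
    # when the caller wants them excluded.
    if exclude_interim:
        params["exclude_interim"] = "true"
    if skip is not None:
        params["from"] = skip  # number of buckets to skip
    if size is not None:
        params["size"] = size  # maximum number of buckets to obtain
    # Assumed endpoint layout, not confirmed by this commit.
    path = f"_xpack/ml/anomaly_detectors/{job_id}/results/buckets"
    return f"{path}?{urlencode(params)}" if params else path
```

Calling `buckets_query("farequote", exclude_interim=True, size=10)` yields a path whose query string carries only the options that were set.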

View File

@ -23,8 +23,7 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
`feed_id`::
(string) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds.
This parameter does not support wildcards, but you can specify `_all` to get information about all data feeds.
===== Results

View File

@ -22,8 +22,7 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
`feed_id`::
(string) Identifier for the data feed.
If you do not specify this optional parameter, the API returns information
about all data feeds.
This parameter does not support wildcards, but you can specify `_all` or omit the identifier to get information about all data feeds.
===== Results

View File

@ -34,8 +34,8 @@ roles provide these privileges. For more information, see
`from`::
(integer) Skips the specified number of influencers.
`include_interim`::
(boolean) If true, the output includes interim results.
`exclude_interim`::
(boolean) If true, the output excludes interim results. By default, interim results are included.
`influencer_score`::
(double) Returns influencers with anomaly scores higher than this value.

View File

@ -19,8 +19,8 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
===== Path Parameters
`job_id`::
(string) Identifier for the job. If you do not specify this optional parameter,
the API returns information about all jobs.
(string) A required identifier for the job.
This parameter does not support wildcards, but you can specify `_all` to get information about all jobs.
===== Results

View File

@ -19,8 +19,8 @@ privileges to use this API. For more information, see <<privileges-list-cluster>
===== Path Parameters
`job_id`::
(string) Identifier for the job. If you do not specify this optional parameter,
the API returns information about all jobs.
(string) Identifier for the job.
This parameter does not support wildcards, but you can specify `_all` or omit the identifier to get information about all jobs.
===== Results

View File

@ -33,8 +33,8 @@ roles provide these privileges. For more information, see
`from`::
(integer) Skips the specified number of records.
`include_interim`::
(boolean) If true, the output includes interim results.
`exclude_interim`::
(boolean) If true, the output excludes interim results. By default, interim results are included.
`record_score`::
(double) Returns records with anomaly scores higher than this value.

View File

@ -12,6 +12,13 @@ A job resource has the following properties:
(object) Defines approximate limits on the memory resource requirements for the job.
See <<ml-apilimits,analysis limits>>.
`background_persist_interval`::
(time units) Advanced configuration option.
The time between each periodic persistence of the model.
The default value is a randomized value between 3 and 4 hours, which avoids all jobs persisting at exactly the same time.
For very large models (several GB), persistence could take 10-20 minutes, so do not set this value too low.
The smallest allowed value is 1 hour.
`create_time`::
(string) The time the job was created, in ISO 8601 format.
For example, `1491007356077`.
@ -29,7 +36,7 @@ A job resource has the following properties:
`job_id`::
(string) The unique identifier for the job.
`job_type`::
(string) Reserved for future use, currently set to `anomaly_detector`.
@ -45,11 +52,22 @@ A job resource has the following properties:
(long) The time in days that model snapshots are retained for the job.
Older snapshots are deleted. The default value is 1 day.
`renormalization_window_days`::
(long) Advanced configuration option.
The period over which adjustments to the score are applied, as new data is seen.
The default value is the longer of 30 days or 100 `bucket_spans`.
`results_index_name`::
(string) The name of the index in which to store the {ml} results.
The default value is `shared`,
which corresponds to the index name `.ml-anomalies-shared`
`results_retention_days`::
(long) Advanced configuration option.
The number of days for which job results are retained.
Once per day at 00:30 (server time), results older than this period will be deleted from Elasticsearch.
The default value is null, which means that results are retained.
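
The advanced options above could be assembled into a job configuration fragment along the following lines. This is only a sketch: the helper name `advanced_job_settings` is hypothetical, and expressing the interval as a whole number of hours in the form `"2h"` is an assumption about the time-unit syntax. The one-hour minimum comes from the `background_persist_interval` description.

```python
MIN_PERSIST_INTERVAL_HOURS = 1  # smallest allowed value, per the description above


def advanced_job_settings(persist_interval_hours=None,
                          renormalization_window_days=None,
                          results_retention_days=None):
    """Return only the advanced settings that were explicitly provided."""
    settings = {}
    if persist_interval_hours is not None:
        if persist_interval_hours < MIN_PERSIST_INTERVAL_HOURS:
            raise ValueError("background_persist_interval must be at least 1 hour")
        # Assumed time-unit format: whole hours with an "h" suffix.
        settings["background_persist_interval"] = f"{persist_interval_hours}h"
    if renormalization_window_days is not None:
        settings["renormalization_window_days"] = renormalization_window_days
    if results_retention_days is not None:
        # Leaving this unset keeps the default (null), so results are retained.
        settings["results_retention_days"] = results_retention_days
    return settings
```

Omitted options are left out of the fragment entirely, so the documented defaults apply.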
[[ml-analysisconfig]]
===== Analysis Configuration Objects
@ -62,7 +80,7 @@ An analysis configuration object has the following properties:
`categorization_field_name`::
(string) If not null, the values of the specified field will be categorized.
The resulting categories can be used in a detector by setting `by_field_name`,
`over_field_name`, or `partition_field_name` to the keyword `prelertcategory`.
`over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
`categorization_filters`::
(array of strings) If `categorization_field_name` is specified,

View File

@ -6,20 +6,20 @@ The preview data feed API enables you to preview a data feed.
===== Request
`GET _xpack/ml/datafeeds/<feed_id>/_preview`
`GET _xpack/ml/datafeeds/<datafeed_id>/_preview`
===== Description
//TBD: How much data does it return?
The API returns example data by using the current data feed settings.
The API returns the first "page" of results from the `search` created using the current data feed settings.
This shows the structure of the data that will be passed to the anomaly detection engine.
You must have `monitor_ml`, `monitor`, `manage_ml`, or `manage` cluster
privileges to use this API. For more information, see <<privileges-list-cluster>>.
===== Path Parameters
`feed_id` (required)::
`datafeed_id` (required)::
(string) Identifier for the data feed
////
@ -41,7 +41,7 @@ TBD
////
===== Examples
The following example obtains a previews of the `datafeed-farequote` data feed:
The following example obtains a preview of the `datafeed-farequote` data feed:
[source,js]
--------------------------------------------------

View File

@ -2,13 +2,11 @@
[[ml-snapshot-resource]]
==== Model Snapshot Resources
////
Model snapshots are saved to disk periodically.
By default, this is occurs approximately every 3 hours.
//TBD: Can you change this setting?
By default, this occurs approximately every 3 to 4 hours and is configurable using the `background_persist_interval` setting.
By default, model snapshots are retained for one day. You can change this
behavior with by updating the `model_snapshot_retention_days` for the job.
behavior by updating the `model_snapshot_retention_days` for the job.
When choosing a new value, consider the following:
* Persistence enables resilience in the event of a system failure.
@ -23,30 +21,31 @@ A model snapshot resource has the following properties:
(string) An optional description of the job.
`job_id`::
(string) A numerical character string that uniquely identifies the job.
(string) A numerical character string that uniquely identifies the job that the snapshot was created for.
`latest_record_time_stamp`::
() TBD. For example: 1455232663000.
(date) The timestamp of the latest processed record.
`latest_result_time_stamp`::
() TBD. For example: 1455229800000.
(date) The timestamp of the latest bucket result.
`model_size_stats`::
(object) TBD. See <<ml-snapshot-stats,Model Size Statistics>>.
(object) Summary information describing the model. See <<ml-snapshot-stats,Model Size Statistics>>.
`retain`::
(boolean) TBD. For example: false.
(boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
However, this snapshot will be deleted when the job is deleted.
The default value is false.
`snapshot_id`::
(string) A numerical character string that uniquely identifies the model
snapshot. For example: "1491852978".
`snapshot_doc_count`::
() TBD. For example: 1.
(long) For internal use only.
`timestamp`::
(date) The creation timestamp for the snapshot, specified in ISO 8601 format.
For example: 1491852978000.
(date) The creation timestamp for the snapshot.
[float]
[[ml-snapshot-stats]]
@ -55,31 +54,37 @@ A model snapshot resource has the following properties:
The `model_size_stats` object has the following properties:
`bucket_allocation_failures_count`::
() TBD. For example: 0.
(long) The number of buckets for which entities were not processed due to memory limit constraints.
`job_id`::
(string) A numerical character string that uniquely identifies the job.
`log_time`::
() TBD. For example: 1491852978000.
(date) The timestamp at which the `model_size_stats` were recorded, according to server time.
`memory_status`::
() TBD. For example: "ok".
(string) The status of the memory in relation to its `model_memory_limit`.
Contains one of the following values.
`ok`::: The internal models stayed below the configured value.
`soft_limit`::: The internal models require more than 60% of the configured memory limit and more aggressive pruning will
be performed in order to try to reclaim space.
`hard_limit`::: The internal models require more space than the configured memory limit.
Some incoming data could not be processed.
`model_bytes`::
() TBD. For example: 100393.
(long) An approximation of the memory resources required for this analysis.
`result_type`::
() TBD. For example: "model_size_stats".
(string) Internal. This value is always set to "model_size_stats".
`timestamp`::
() TBD. For example: 1455229800000.
(date) The timestamp at which the `model_size_stats` were recorded, according to the bucket timestamp of the data.
`total_by_field_count`::
() TBD. For example: 13.
(long) The number of _by_ field values analyzed. Note that these are counted separately for each detector and partition.
`total_over_field_count`::
() TBD. For example: 0.
(long) The number of _over_ field values analyzed. Note that these are counted separately for each detector and partition.
`total_partition_field_count`::
() TBD. For example: 2.
(long) The number of _partition_ field values analyzed.
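
The `memory_status` thresholds described above can be pictured with a small Python sketch. It is purely illustrative: the real status is determined internally by the analytics engine, and the function name `classify_memory_status` and the direct comparison of `model_bytes` against a limit in bytes are assumptions made for this example.

```python
def classify_memory_status(model_bytes, model_memory_limit_bytes):
    """Illustrate the ok / soft_limit / hard_limit thresholds (not the engine's logic)."""
    if model_bytes > model_memory_limit_bytes:
        # More space needed than the configured limit: some data may be dropped.
        return "hard_limit"
    # Integer arithmetic avoids floating-point rounding at the 60% boundary.
    if model_bytes * 100 > model_memory_limit_bytes * 60:
        # Above 60% of the limit: more aggressive pruning is expected.
        return "soft_limit"
    return "ok"
```

For instance, a model of 70 bytes against a 100-byte limit would fall into the `soft_limit` band under these assumptions.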

View File

@ -13,7 +13,7 @@ The update job API allows you to update certain properties of a job.
You must have `manage_ml`, or `manage` cluster privileges to use this API.
For more information, see <<privileges-list-cluster>>.
//TBD: Important:: Updates do not take effect until after then job is closed and new data is sent to it.
//TBD: Important:: Updates do not take effect until after the job is closed and re-opened.
===== Path Parameters
@ -34,7 +34,8 @@ The following properties can be updated after the job is created:
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
the job was unable to process some data. You might want to re-run the job
with an increased `model_memory_limit`.
`description`::
(string) An optional description of the job.

View File

@ -11,10 +11,10 @@ The update model snapshot API enables you to update certain properties of a snap
===== Description
//TBD. Is the following still true?
//TBD. Is the following still true? - not sure but close/open would be the method
Updates to the configuration are only applied after the job has been closed
and new data has been sent to it.
and re-opened.
You must have `manage_ml`, or `manage` cluster privileges to use this API.
For more information, see <<privileges-list-cluster>>.
@ -32,10 +32,12 @@ For more information, see <<privileges-list-cluster>>.
The following properties can be updated after the model snapshot is created:
`description`::
(string) An optional description of the model snapshot.
(string) An optional description of the model snapshot. For example, "Before Black Friday".
`retain`::
(boolean) TBD.
(boolean) If true, this snapshot will not be deleted during automatic cleanup of snapshots older than `model_snapshot_retention_days`.
Note that this snapshot will still be deleted when the job is deleted.
The default value is false.
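
To make the interaction between `retain` and `model_snapshot_retention_days` concrete, here is a minimal Python sketch of which snapshots an automatic cleanup would select. The function `snapshots_to_delete` and the dictionary layout are hypothetical. The sketch mirrors only the rule stated above and does not reproduce the actual cleanup implementation.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, retention_days, now):
    """Return IDs of snapshots older than the retention window and not retained."""
    cutoff = now - timedelta(days=retention_days)
    return [
        s["snapshot_id"]
        for s in snapshots
        # A snapshot with retain=True survives cleanup regardless of its age.
        if s["timestamp"] < cutoff and not s.get("retain", False)
    ]
```

Deleting the job itself is a separate path: as noted above, retained snapshots are still removed in that case.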
////
===== Responses