[DOCS] Add snapshot API examples

Original commit: elastic/x-pack-elasticsearch@8928c3c1be
This commit is contained in:
lcawley 2017-04-11 13:25:38 -07:00
parent a0771019a5
commit 298ee9f554
11 changed files with 379 additions and 198 deletions

@ -62,6 +62,11 @@ include::ml/validate-job.asciidoc[]
[[ml-api-snapshot-endpoint]]
=== Model Snapshots
* <<ml-delete-snapshot,Delete model snapshots>>
* <<ml-get-snapshot,Get model snapshots>>
* <<ml-revert-snapshot,Revert model snapshots>>
* <<ml-update-snapshot,Update model snapshots>>
include::ml/delete-snapshot.asciidoc[]
include::ml/get-snapshot.asciidoc[]
include::ml/revert-snapshot.asciidoc[]
@ -83,6 +88,13 @@ include::ml/get-record.asciidoc[]
[[ml-api-definitions]]
=== Definitions
* <<ml-datafeed-resource,Data feeds>>
* <<ml-datafeed-counts,Data feed counts>>
* <<ml-job-resource,Jobs>>
* <<ml-jobcounts,Job counts>>
* <<ml-snapshot-resource,Model snapshots>>
* <<ml-results-resource,Results>>
include::ml/datafeedresource.asciidoc[]
include::ml/jobresource.asciidoc[]
include::ml/jobcounts.asciidoc[]

@ -48,8 +48,7 @@ The get data feed statistics API provides information about the operational
progress of a data feed. For example:
`assignment_explanation`::
TBD
For example: ""
TBD. For example: ""
`node`::
(+object+) TBD
@ -61,5 +60,5 @@ progress of a data feed. For example:
`state`::
(+string+) The status of the data feed,
which can be one of the following values:
* started:: The data feed is actively receiving data.
* stopped:: The data feed is stopped and will not receive data until it is re-started.
started:: The data feed is actively receiving data.
stopped:: The data feed is stopped and will not receive data until it is restarted.

@ -7,21 +7,15 @@ The delete model snapshot API allows you to delete an existing model snapshot.
`DELETE _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
////
===== Description
All job configuration, model state and results are deleted.
IMPORTANT: You cannot delete the active model snapshot. To delete that snapshot,
first revert to a different one.
IMPORTANT: Deleting a job must be done via this API only. Do not delete the
job directly from the `.ml-*` indices using the Elasticsearch
DELETE Document API. When {security} is enabled, make sure no `write`
privileges are granted to anyone over the `.ml-*` indices.
//TBD: Where do you see restorePriority? Per old docs, the active model snapshot
//is "...the snapshot with the highest restorePriority".
Before you can delete a job, you must delete the data feeds that are associated with it.
//See <<>>.
It is not currently possible to delete multiple jobs using wildcards or a comma separated list.
////
===== Path Parameters
`job_id` (required)::

@ -38,20 +38,19 @@ The API returns the following usage information:
`state`::
(+string+) The status of the job, which can be one of the following values:
open:: The job is actively receiving and processing data.
closed:: The job finished successfully with its model state persisted.
`open`::: The job is actively receiving and processing data.
`closed`::: The job finished successfully with its model state persisted.
The job is still available to accept further data.
`closing`::: TBD
`failed`::: The job did not finish successfully due to an error.
This situation can occur due to invalid input data. In this case,
sending corrected data to a failed job re-opens the job and
resets it to an open state.
closing:: TBD
NOTE: If you send data in a periodic cycle and close the job at the end of each transaction,
the job is marked as closed in the intervals between when data is sent.
For example, if data is sent every minute and it takes 1 second to process, the job has a closed state for 59 seconds.
failed:: The job did not finish successfully due to an error. NOTE: This can occur due to invalid input data.
In this case, sending corrected data to a failed job re-opens the job and resets it to a running state.
NOTE: If you send data in a periodic cycle and close the job at the end of
each transaction, the job is marked as closed in the intervals between
when data is sent. For example, if data is sent every minute and it takes
1 second to process, the job has a closed state for 59 seconds.
////
===== Responses

@ -19,20 +19,44 @@ OUTDATED?: The get job API can also be applied to all jobs by using `_all` as th
(+string+) Identifier for the job.
`snapshot_id`::
(+string+) Identifier for the job. If you do not specify this optional parameter,
(+string+) Identifier for the model snapshot. If you do not specify this optional parameter,
the API returns information about all model snapshots that you have authority to view.
////
===== Request Body
`desc`::
(+boolean+) If true, the results are sorted in descending order.
`description`::
(+string+) Returns snapshots that match this description.
//TBD: I couldn't get this to work. What description value is it using?
NOTE: It might be necessary to URL encode the description.
`end`::
(+date+) Returns snapshots with timestamps earlier than this time.
`from`::
(+integer+) Skips the specified number of snapshots.
`size`::
(+integer+) Specifies the maximum number of snapshots to obtain.
`sort`::
(+string+) Specifies the sort field for the requested snapshots.
//By default, the snapshots are sorted by the xxx value.
`start`::
(+string+) Returns snapshots with timestamps after this time.
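The request body options above can be assembled programmatically. The following is a minimal sketch, not part of the API itself: the helper names `build_snapshot_query` and `snapshots_path` are hypothetical, and only the option names listed above (`desc`, `end`, `from`, `size`, `sort`, `start`) are taken from this page. The `description` option is left out because its behavior is still marked TBD.

```python
from typing import Any, Dict, Optional


def snapshots_path(job_id: str) -> str:
    """Return the GET path for the model snapshots of a job."""
    return f"_xpack/ml/anomaly_detectors/{job_id}/model_snapshots"


def build_snapshot_query(
    start: Optional[str] = None,
    end: Optional[str] = None,
    sort: Optional[str] = None,
    desc: Optional[bool] = None,
    from_: Optional[int] = None,  # trailing underscore: "from" is a Python keyword
    size: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request body that contains only the options that were set."""
    candidates = {
        "start": start,
        "end": end,
        "sort": sort,
        "desc": desc,
        "from": from_,
        "size": size,
    }
    # Drop unset options; keep explicit False and 0 values.
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    print("GET", snapshots_path("it_ops_new_logs"))
    print(build_snapshot_query(start="1491852977000"))
```

Omitting unset options, rather than sending nulls, leaves any server-side defaults in effect.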
===== Results
The API returns information about the job resource. For more information, see
<<ml-job-resource,job resources>>.
The API returns the following information:
===== Query Parameters
`_stats`::
(+boolean+; default: ++true++) If true (default false), will just validate the cluster definition but will not perform the creation
`model_snapshots`::
(+array+) An array of model snapshot objects. For more information, see
<<ml-snapshot-resource,Model Snapshots>>.
////
===== Responses
200
@ -41,46 +65,50 @@ The API returns information about the job resource. For more information, see
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
.Example results for a single job
The following example gets model snapshot information for the
`it_ops_new_logs` job:
[source,js]
--------------------------------------------------
GET _xpack/ml/anomaly_detectors/it_ops_new_logs/model_snapshots
{
"start": "1491852977000"
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
In this example, the API provides a single result:
----
{
"count": 1,
"jobs": [
"model_snapshots": [
{
"job_id": "it-ops-kpi",
"description": "First simple job",
"create_time": 1491007356077,
"finished_time": 1491007365347,
"analysis_config": {
"bucket_span": "5m",
"latency": "0ms",
"summary_count_field_name": "doc_count",
"detectors": [
{
"detector_description": "low_sum(events_per_min)",
"function": "low_sum",
"field_name": "events_per_min",
"detector_rules": []
}
],
"influencers": [],
"use_per_partition_normalization": false
"job_id": "it_ops_new_logs",
"timestamp": 1491852978000,
"description": "State persisted due to job close at 2017-04-10T12:36:18-0700",
"snapshot_id": "1491852978",
"snapshot_doc_count": 1,
"model_size_stats": {
"job_id": "it_ops_new_logs",
"result_type": "model_size_stats",
"model_bytes": 100393,
"total_by_field_count": 13,
"total_over_field_count": 0,
"total_partition_field_count": 2,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1491852978000,
"timestamp": 1455229800000
},
"data_description": {
"time_field": "@timestamp",
"time_format": "epoch_ms"
},
"model_plot_config": {
"enabled": true
},
"model_snapshot_retention_days": 1,
"model_snapshot_id": "1491007364",
"results_index_name": "shared"
"latest_record_time_stamp": 1455232663000,
"latest_result_time_stamp": 1455229800000,
"retain": false
}
]
}
----
////

@ -18,9 +18,10 @@ A `data_counts` object has the following properties:
`processed_record_count`::
(+long+) The number of records that have been processed by the job.
This value includes records with missing fields, since they are nonetheless analyzed.
+
The following records are not processed:
* Records not in chronological order and outside the latency window
* Records with invalid timestamps
* Records with an invalid timestamp
* Records filtered by an exclude transform
`processed_field_count`::
@ -107,11 +108,9 @@ NOTE: The `over` field values are counted separately for each detector and parti
`memory_status`::
(+string+) The status of the mathematical models. This property can have one of the following values:
"ok":: The models stayed below the configured value.
"soft_limit":: The models used more than 60% of the configured memory limit and older unused models will
be pruned to free up space.
"hard_limit":: The models used more space than the configured memory limit. As a result,
not all incoming data was processed.
`ok`::: The models stayed below the configured value.
`soft_limit`::: The models used more than 60% of the configured memory limit, and older unused models will be pruned to free up space.
`hard_limit`::: The models used more space than the configured memory limit. As a result, not all incoming data was processed.
`log_time`::
TBD

@ -32,62 +32,105 @@ the anomaly records into buckets.
A record object has the following properties:
`actual`::
TBD. For example, [633].
(+number+) The actual value for the bucket.
`bucket_span`::
TBD. For example, 600.
(+number+) The length of the bucket in seconds.
This value matches the `bucket_span` that is specified in the job.
//`byFieldName`::
//TBD: This field did not appear in my results, but it might be a valid property.
// (+string+) The name of the analyzed field, if it was specified in the detector.
//`byFieldValue`::
//TBD: This field did not appear in my results, but it might be a valid property.
// (+string+) The value of `by_field_name`, if it was specified in the detector.
//`causes`
//TBD: This field did not appear in my results, but it might be a valid property.
// (+array+) If an over field was specified in the detector, this property
// contains an array of anomaly records that are the causes for the anomaly
// that has been identified for the over field.
// If no over fields exist, this field is not present.
// This sub-resource contains the most anomalous records for the `over_field_name`.
// For scalability reasons, a maximum of the 10 most significant causes of
// the anomaly will be returned. As part of the core analytical modeling,
// these low-level anomaly records are aggregated for their parent over field record.
// The causes resource contains similar elements to the record resource,
// namely actual, typical, *FieldName and *FieldValue.
// Probability and scores are not applicable to causes.
`detector_index`::
TBD. For example, 0.
(+number+) A unique identifier for the detector.
//`fieldName`::
// TBD: This field did not appear in my results, but it might be a valid property.
// (+string+) Certain functions require a field to operate on. For those functions,
// this is the name of the field to be analyzed.
`function`::
TBD. For example, "low_non_zero_count".
(+string+) The function in which the anomaly occurs.
`function_description`::
TBD. For example, "count".
(+string+) The description of the function in which the anomaly occurs, as
specified in the detector configuration information.
`influencers`::
TBD. For example, [{
"influencer_field_name": "kpi_indicator",
"influencer_field_values": [
"online_purchases"]}].
(+array+) If `influencers` was specified in the detector configuration, then
this array contains influencers that contributed to or were to blame for an
anomaly.
`initial_record_score`::
TBD. For example, 94.1386.
(++) TBD. For example, 94.1386.
`is_interim`::
TBD. For example, false.
(+boolean+) If true, then this anomaly record is an interim result.
In other words, it is calculated based on partial input data.
`job_id`::
TBD. For example, "it_ops_new_kpi".
(+string+) A character string that uniquely identifies the job.
`kpi_indicator`::
TBD. For example, ["online_purchases"]
(++) TBD. For example, ["online_purchases"]
`partition_field_name`::
TBD. For example, "kpi_indicator".
(+string+) The name of the partition field that was used in the analysis, if
such a field was specified in the detector.
//`overFieldName`::
// TBD: This field did not appear in my results, but it might be a valid property.
// (+string+) The name of the over field, if `over_field_name` was specified
// in the detector.
`partition_field_value`::
TBD. For example, "online_purchases".
(+string+) The value of the partition field that was used in the analysis, if
`partition_field_name` was specified in the detector.
`probability`::
TBD. For example, 0.0000772031.
(+number+) The probability of the individual anomaly occurring.
This value is in the range 0 to 1. For example, 0.0000772031.
//This value is held to a high precision of over 300 decimal places.
//In scientific notation, a value of 3.24E-300 is highly unlikely and therefore
//highly anomalous.
`record_score`::
TBD. For example, 94.1386.
(+number+) An anomaly score for the bucket time interval.
The score is calculated based on a sophisticated aggregation of the anomalies
in the bucket.
//Use this score for rate-controlled alerting.
`result_type`::
TBD. For example, "record".
(+string+) TBD. For example, "record".
`sequence_num`::
TBD. For example, 1.
(++) TBD. For example, 1.
`timestamp`::
(+date+) The start time of the bucket that contains the record,
specified in ISO 8601 format. For example, 1454020800000.
(+date+) The start time of the bucket that contains the record, specified in
ISO 8601 format. For example, 1454020800000.
`typical`::
TBD. For example, [3596.71].
(+number+) The typical value for the bucket, according to analytical modeling.
[float]
[[ml-results-influencers]]
@ -106,42 +149,53 @@ records that contain this influencer.
An influencer object has the following properties:
`bucket_span`::
TBD. For example, 300.
(++) TBD. For example, 300.
// Same as for buckets? i.e. (+unsigned integer+) The length of the bucket in seconds.
// This value is equal to the `bucket_span` value in the job configuration.
`job_id`::
(+string+) A character string that uniquely identifies the job.
`influencer_score`::
TBD. For example: 94.1386.
(+number+) An anomaly score for the influencer in this bucket time interval.
The score is calculated based upon a sophisticated aggregation of the anomalies
in the bucket for this entity. For example: 94.1386.
`initial_influencer_score`::
TBD. For example, 83.3831.
(++) TBD. For example, 83.3831.
`influencer_field_name`::
TBD. For example, "bucket_time".
(+string+) The field name of the influencer.
`influencer_field_value`::
TBD. For example, "online_purchases".
(+string+) The entity that influenced, contributed to, or was to blame for the
anomaly.
`is_interim`::
TBD. For example, false.
(+boolean+) If true, then this is an interim result.
In other words, it is calculated based on partial input data.
`kpi_indicator`::
TBD. For example, "online_purchases".
(++) TBD. For example, "online_purchases".
`probability`::
TBD. For example, 0.0000109783.
(+number+) The probability that the influencer has this behavior.
This value is in the range 0 to 1. For example, 0.0000109783.
// For example, 0.03 means 3%. This value is held to a high precision of over
//300 decimal places. In scientific notation, a value of 3.24E-300 is highly
//unlikely and therefore highly anomalous.
`result_type`::
TBD. For example, "influencer".
(++) TBD. For example, "influencer".
//TBD: How is this different from the "bucket_influencer" type?
`sequence_num`::
`TBD. For example, 2.
(++) TBD. For example, 2.
`timestamp`::
TBD. For example, 1454943900000.
(+date+) Influencers are produced in buckets. This value is the start time
of the bucket, specified in ISO 8601 format. For example, 1454943900000.
[float]
[[ml-results-buckets]]

@ -1,18 +1,54 @@
[[ml-revert-snapshot]]
==== Update Model Snapshots
==== Revert Model Snapshots
The update model snapshot API allows you to update certain properties of a snapshot.
The revert model snapshot API allows you to revert to a specific snapshot.
===== Request
`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_update`
`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_revert`
////
===== Description
Important:: Updates do not take effect until after the job is closed and new
data is sent to it.
The {ml} feature in {xpack} reacts quickly to anomalous input, learning new behaviors in data.
Highly anomalous input increases the variance in the models whilst the system learns
whether this is a new step-change in behavior or a one-off event. In the case
where this anomalous input is known to be a one-off, then it might be appropriate
to reset the model state to a time before this event. For example, you might
consider reverting to a saved snapshot after Black Friday
or a critical system failure.
////
To revert to a saved snapshot, you must follow this sequence:
. Close the job
. Revert to a snapshot
. Open the job
. Send new data to the job
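The sequence above can be sketched as an ordered list of requests. This is a minimal sketch under stated assumptions: the helper name `revert_sequence` is hypothetical, the `_revert` path and the `delete_intervening_results` option come from this page, and the `_close` and `_open` endpoint paths are assumed to follow the same pattern. The final step, sending new data, is left out because its payload depends on the job.

```python
from typing import List, Optional, Tuple

# (HTTP method, path, optional JSON body)
Request = Tuple[str, str, Optional[dict]]

BASE = "_xpack/ml/anomaly_detectors"


def revert_sequence(
    job_id: str,
    snapshot_id: str,
    delete_intervening_results: bool = False,
) -> List[Request]:
    """Return the requests to issue, in order, to revert a job to a snapshot."""
    job = f"{BASE}/{job_id}"
    return [
        # 1. Close the job (endpoint path is an assumption).
        ("POST", f"{job}/_close", None),
        # 2. Revert to the snapshot (path and option as documented here).
        (
            "POST",
            f"{job}/model_snapshots/{snapshot_id}/_revert",
            {"delete_intervening_results": delete_intervening_results},
        ),
        # 3. Open the job again (endpoint path is an assumption).
        ("POST", f"{job}/_open", None),
        # 4. Sending new data is job-specific and is not modeled here.
    ]


if __name__ == "__main__":
    for method, path, body in revert_sequence("it_ops_new_kpi", "1491856080", True):
        print(method, path, body if body is not None else "")
```

Building the plan as data before sending anything makes the required ordering explicit and easy to check.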
When reverting to a snapshot, there is a choice to make about whether or not
you want to keep the results that were created between the time of the snapshot
and the current time. In the case of Black Friday for instance, you might want
to keep the results and carry on processing data from the current time,
though without the models learning the one-off behavior and compensating for it.
However, if a critical system failure occurs and you decide to reset
the models to a previous known good state and process data from that time,
it makes sense to delete the intervening results for the known bad period and
resend data from that earlier time.
Any gaps in data since the snapshot time will be treated as nulls and not modeled.
If there is a partial bucket at the end of the snapshot and/or at the beginning
of the new input data, then this will be ignored and treated as a gap.
For jobs with many entities, the model state may be very large.
If a model state is several GB, reverting it might take 10-20 minutes, depending
on the machine specification and available resources. Plan for this time
accordingly.
Model size (in bytes) is available as part of the Job Resource Model Size Stats.
////
IMPORTANT: Before you revert to a saved snapshot, you must close the job.
Sending data to a closed job changes its status to `open`, so you must also
ensure that you do not expect data imminently.
===== Path Parameters
`job_id` (required)::
@ -23,30 +59,16 @@ data is sent to it.
===== Request Body
The following properties can be updated after the job is created:
`delete_intervening_results`::
(+boolean+) If true, deletes the results in the time period between the
latest results and the time of the reverted snapshot. It also resets the
model to accept records for this time period.
TBD
NOTE: If you choose not to delete intervening results when reverting a snapshot,
the job will not accept input data that is older than the current time.
If you want to resend data, then delete the intervening results.
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
`description`::
(+string+) An optional description of the job.
This expects data to be sent in JSON format using the POST `_data` API.
===== Responses
TBD
@ -56,34 +78,49 @@ TBD
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `it-ops-kpi` job:
The following example reverts to the `1491856080` snapshot for the
`it_ops_new_kpi` job:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
POST _xpack/ml/anomaly_detectors/it_ops_new_kpi/model_snapshots/1491856080/_revert
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
"delete_intervening_results": true
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
When the operation is complete, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
...
"acknowledged": true,
"model": {
"job_id": "it_ops_new_kpi",
"timestamp": 1491856080000,
"description": "State persisted due to job close at 2017-04-10T13:28:00-0700",
"snapshot_id": "1491856080",
"snapshot_doc_count": 1,
"model_size_stats": {
"job_id": "it_ops_new_kpi",
"result_type": "model_size_stats",
"model_bytes": 29518,
"total_by_field_count": 3,
"total_over_field_count": 0,
"total_partition_field_count": 2,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1491856080000,
"timestamp": 1455318000000
},
"latest_record_time_stamp": 1455318669000,
"latest_result_time_stamp": 1455318000000,
"retain": false
}
}
----
////

@ -1,10 +1,82 @@
[[ml-snapshot-resource]]
==== Model Snapshot Resources
////
Model snapshots are saved to disk periodically.
By default, this occurs approximately every 3 hours.
//TBD: Can you change this setting?
By default, model snapshots are retained for one day. You can change this
behavior by updating the `model_snapshot_retention_days` property for the job.
When choosing a new value, consider the following:
* Persistence enables resilience in the event of a system failure.
* Persistence allows for snapshots to be reverted.
* The time taken to persist a job is proportional to the size of the model in memory.
//* The smallest allowed value is 3600 (1 hour).
////
A model snapshot resource has the following properties:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
////
`description`::
(+string+) An optional description of the model snapshot.
`job_id`::
(+string+) A character string that uniquely identifies the job.
`latest_record_time_stamp`::
(++) TBD. For example: 1455232663000.
`latest_result_time_stamp`::
(++) TBD. For example: 1455229800000.
`model_size_stats`::
(+object+) TBD.
`retain`::
(+boolean+) TBD. For example: false.
`snapshot_id`::
(+string+) A numerical character string that uniquely identifies the model
snapshot. For example: "1491852978".
`snapshot_doc_count`::
(++) TBD. For example: 1.
`timestamp`::
(+date+) The creation timestamp for the snapshot, specified in ISO 8601 format.
For example: 1491852978000.
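The example timestamp values on this page appear to be milliseconds since the epoch rather than ISO 8601 strings. The following minimal sketch converts one into the other so that values can be compared; the helper name `epoch_ms_to_iso8601` is hypothetical, and reading the values as epoch milliseconds is an assumption based on the examples.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch-based timestamps.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_iso8601(epoch_ms: int) -> str:
    """Convert milliseconds since the epoch to an ISO 8601 string in UTC."""
    # Integer timedelta arithmetic avoids floating-point rounding.
    return (EPOCH + timedelta(milliseconds=epoch_ms)).isoformat()


if __name__ == "__main__":
    print(epoch_ms_to_iso8601(1491852978000))  # 2017-04-10T19:36:18+00:00
```

The result, 19:36:18 UTC, matches the 12:36:18-0700 close time that appears in the description of the `1491852978` snapshot in the get model snapshots example.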
===== Model Size Statistics
The `model_size_stats` object has the following properties:
`bucket_allocation_failures_count`::
(++) TBD. For example: 0.
`job_id`::
(+string+) A character string that uniquely identifies the job.
`log_time`::
(++) TBD. For example: 1491852978000.
`memory_status`::
(++) TBD. For example: "ok".
`model_bytes`::
(++) TBD. For example: 100393.
`result_type`::
(++) TBD. For example: "model_size_stats".
`timestamp`::
(++) TBD. For example: 1455229800000.
`total_by_field_count`::
(++) TBD. For example: 13.
`total_over_field_count`::
(++) TBD. For example: 0.
`total_partition_field_count`::
(++) TBD. For example: 2.

@ -7,12 +7,14 @@ The update model snapshot API allows you to update certain properties of a snaps
`POST _xpack/ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>/_update`
////
===== Description
Important:: Updates do not take effect until after the job is closed and new
data is sent to it.
////
//TBD. Is the following still true?
Updates to the configuration are only applied after the job has been closed
and new data has been sent to it.
===== Path Parameters
`job_id` (required)::
@ -23,30 +25,15 @@ data is sent to it.
===== Request Body
The following properties can be updated after the job is created:
TBD
////
`analysis_config`::
(+object+) The analysis configuration, which specifies how to analyze the data.
See <<ml-analysisconfig, analysis configuration objects>>. In particular, the following properties can be updated: `categorization_filters`, `detector_description`, TBD.
`analysis_limits`::
Optionally specifies runtime limits for the job. See <<ml-apilimits,analysis limits>>.
[NOTE]
* You can update the `analysis_limits` only while the job is closed.
* The `model_memory_limit` property value cannot be decreased.
* If the `memory_status` property in the `model_size_stats` object has a value of `hard_limit`,
increasing the `model_memory_limit` is not recommended.
The following properties can be updated after the model snapshot is created:
`description`::
(+string+) An optional description of the job.
(+string+) An optional description of the model snapshot.
`retain`::
(+boolean+) TBD.
This expects data to be sent in JSON format using the POST `_data` API.
////
===== Responses
TBD
@ -56,34 +43,34 @@ TBD
(BasicFailedReply) The cluster specified by {cluster_id} cannot be found (code: clusters.cluster_not_found)
412
(BasicFailedReply) The Elasticsearch cluster has not been shutdown yet (code: clusters.cluster_plan_state_error)
////
===== Examples
The following example updates the `it-ops-kpi` job:
The following example updates the snapshot identified as `1491852978`:
[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/it-ops-kpi/_update
POST _xpack/ml/anomaly_detectors/it_ops_new_logs/model_snapshots/1491852978/_update
{
"description":"New description",
"analysis_limits":{
"model_memory_limit": 8192
}
"description": "Snapshot 1",
"retain": true
}
--------------------------------------------------
// CONSOLE
// TEST[skip:todo]
When the job is updated, you receive the following results:
When the snapshot is updated, you receive the following results:
----
{
"job_id": "it-ops-kpi",
"description": "New description",
...
"analysis_limits": {
"model_memory_limit": 8192
"acknowledged": true,
"model": {
"job_id": "it_ops_new_logs",
"timestamp": 1491852978000,
"description": "Snapshot 1",
...
"retain": true
}
}
----
////