[DOCS] Review of API docs part 1 (elastic/x-pack-elasticsearch#1118)
* [DOCS] Review of close job and job stats
* [DOCS] Add force close
* [DOCS] Remove invalid params from get records
* [DOCS] Remove invalid params from get buckets
* [DOCS] Job resource corrections

Original commit: elastic/x-pack-elasticsearch@bc68d05097
This commit is contained in:
parent 7ee48846ec
commit e2cc00ab8e
@@ -91,7 +91,7 @@ include::ml/get-record.asciidoc[]
 * <<ml-datafeed-resource,Data feeds>>
 * <<ml-datafeed-counts,Data feed counts>>
 * <<ml-job-resource,Jobs>>
-* <<ml-jobcounts,Job counts>>
+* <<ml-jobstats,Job Stats>>
 * <<ml-snapshot-resource,Model snapshots>>
 * <<ml-results-resource,Results>>
@@ -5,6 +5,9 @@
 The close job API enables you to close a job.
 A job can be opened and closed multiple times throughout its lifecycle.
 
+A closed job cannot receive data or perform analysis
+operations, but you can still explore and navigate results.
+
 ===== Request
 
 `POST _xpack/ml/anomaly_detectors/<job_id>/_close`
@@ -18,14 +21,15 @@ flushing buffers, calculating final results and persisting the model snapshots.
 Depending upon the size of the job, it could take several minutes to close and
 the equivalent time to re-open.
 
-After it is closed, the job has almost no overhead on the cluster except for
-maintaining its meta data. A closed job cannot receive data or perform analysis
-operations, but you can still explore and navigate results.
+After it is closed, the job has a minimal overhead on the cluster except for
+maintaining its metadata.
+Therefore, it is best practice to close jobs that are no longer required to process data.
+
+When a datafeed that has a specified end date stops, it will automatically close the job.
 
 You must have `manage_ml`, or `manage` cluster privileges to use this API.
 For more information, see <<privileges-list-cluster>>.
-//NOTE: TBD
-//OUTDATED?: If using the {prelert} UI, the job will be automatically closed when stopping a datafeed job.
 
 ===== Path Parameters
@@ -35,9 +39,14 @@ For more information, see <<privileges-list-cluster>>.
 ===== Query Parameters
 
 `close_timeout`::
-(time) Controls the time to wait until a job has closed.
+(time units) Controls the time to wait until a job has closed.
 The default value is 30 minutes.
 
+`force`::
+(boolean) Use to close a failed job, or to forcefully close a job which has not
+responded to its initial close request.
+
 ////
 ===== Responses
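Taken together, a force close that also shortens the wait might look like the following sketch (the job name `event-rate-job` is illustrative):

[source,js]
----
POST _xpack/ml/anomaly_detectors/event-rate-job/_close?close_timeout=5m&force=true
----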
@@ -29,7 +29,7 @@ The API returns the following information:
 `jobs`::
 (array) An array of job count objects.
-For more information, see <<ml-jobcounts,Job Counts>>.
+For more information, see <<ml-jobstats,Job Stats>>.
 
 ////
 ===== Responses
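As a sketch of where this `jobs` array comes from, a stats request for a single job would be (the job name is illustrative):

[source,js]
----
GET _xpack/ml/anomaly_detectors/event-rate-job/_stats
----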
@@ -27,10 +27,6 @@ privileges to use this API. For more information, see <<privileges-list-cluster>>
 `end`::
 (string) Returns records with timestamps earlier than this time.
 
-`expand`::
-(boolean) TBD
-//This field did not work on older build.
-
 `from`::
 (integer) Skips the specified number of records.
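For example, a records query that uses the `end` and `from` parameters might look like this sketch (the job name and timestamp are illustrative, and sending the parameters in the request body is an assumption):

[source,js]
----
GET _xpack/ml/anomaly_detectors/event-rate-job/results/records
{
  "end": "2017-04-01T00:00:00Z",
  "from": 10
}
----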
@@ -1,12 +1,12 @@
 //lcawley Verified example output 2017-04-11
-[[ml-jobcounts]]
-==== Job Counts
+[[ml-jobstats]]
+==== Job Stats
 
 The get job statistics API provides information about the operational
 progress of a job.
 
-NOTE: Job count values are cumulative for the lifetime of a job. If a model snapshot is reverted
-or old results are deleted, the job counts are not reset.
+`assignment_explanation`::
+(string) For open jobs only, contains messages relating to the selection of an executing node.
 
 `data_counts`::
 (object) An object that describes the number of records processed and any related error counts.
@@ -19,31 +19,36 @@ or old results are deleted, the job counts are not reset.
 (object) An object that provides information about the size and contents of the model.
 See <<ml-modelsizestats,model size stats objects>>
 
+`node`::
+(object) For open jobs only, contains information about the executing node.
+See <<ml-stats-node,node object>>.
+
+`open_time`::
+(string) For open jobs only, the elapsed time for which the job has been open.
+For example, `28746386s`.
+
 `state`::
 (string) The status of the job, which can be one of the following values:
-`closed`::: The job finished successfully with its model state persisted.
-The job is still available to accept further data. +
-+
---
-NOTE: If you send data in a periodic cycle and close the job at the end of
-each transaction, the job is marked as closed in the intervals between
-when data is sent. For example, if data is sent every minute and it takes
-1 second to process, the job has a closed state for 59 seconds.
-
---
-`closing`::: TBD. The job is in the process of closing?
+`open`::: The job is available to receive and process data.
+`closed`::: The job finished successfully with its model state persisted.
+The job must be opened before it can accept further data.
+`closing`::: The job close action is in progress and has not yet completed.
+A closing job cannot accept further data.
 `failed`::: The job did not finish successfully due to an error.
-This situation can occur due to invalid input data. In this case,
-sending corrected data to a failed job re-opens the job and
-resets it to an open state.
-`open`::: The job is actively receiving and processing data.
+This situation can occur due to invalid input data.
+If the job has irrevocably failed, it must be force closed and then deleted.
+If the datafeed can be corrected, the job can be closed and then re-opened.
 
 [float]
 [[ml-datacounts]]
 ===== Data Counts Objects
 
 The `data_counts` object describes the number of records processed
-and any related error counts. It has the following properties:
+and any related error counts.
 
+The `data_counts` values are cumulative for the lifetime of a job. If a model snapshot is reverted
+or old results are deleted, the job counts are not reset.
+
 `bucket_count`::
 (long) The number of bucket results produced by the job.
@@ -53,7 +58,9 @@ and any related error counts. It has the following properties:
 The datetime string is in ISO 8601 format.
 
 `empty_bucket_count`::
-() TBD
+(long) The number of buckets which did not contain any data. If your data contains many
+empty buckets, consider increasing your `bucket_span` or using functions that are tolerant
+of gaps in data, such as `mean`, `non_null_sum`, or `non_zero_count`.
 
 `input_bytes`::
 (long) The number of raw bytes read by the job.
@@ -72,16 +79,16 @@ and any related error counts. It has the following properties:
 (string) A numerical character string that uniquely identifies the job.
 
 `last_data_time`::
-() TBD
+(datetime) The timestamp at which data was last analyzed, according to server time.
+
+`latest_empty_bucket_timestamp`::
+(date) The timestamp of the last bucket that did not contain any data.
 
 `latest_record_timestamp`::
-(string) The timestamp of the last chronologically ordered record.
-If the records are not in strict chronological order, this value might not be
-the same as the timestamp of the last record.
-The datetime string is in ISO 8601 format.
+(date) The timestamp of the last processed record.
 
 `latest_sparse_bucket_timestamp`::
-() TBD
+(date) The timestamp of the last bucket that was considered sparse.
 
 `missing_field_count`::
 (long) The number of records that are missing a field that the job is configured to analyze.
@@ -97,6 +104,7 @@ necessarily a cause for concern.
 
 `out_of_order_timestamp_count`::
 (long) The number of records that are out of time sequence and outside of the latency window.
+This is only applicable when using the `_data` endpoint.
 These records are discarded, since jobs require time series data to be in ascending chronological order.
 
 `processed_field_count`::
|
@ -108,13 +116,16 @@ necessarily a cause for concern.
|
||||||
(long) The number of records that have been processed by the job.
|
(long) The number of records that have been processed by the job.
|
||||||
This value includes records with missing fields, since they are nonetheless analyzed.
|
This value includes records with missing fields, since they are nonetheless analyzed.
|
||||||
+
|
+
|
||||||
The following records are not processed:
|
When using datafeeds, the `processed_record_count` will differ from the `input_record_count`
|
||||||
|
if you are using aggregations in your search query.
|
||||||
|
+
|
||||||
|
When posting to the `/_data` endpoint, the following records are not processed:
|
||||||
* Records not in chronological order and outside the latency window
|
* Records not in chronological order and outside the latency window
|
||||||
* Records with invalid timestamp
|
* Records with invalid timestamp
|
||||||
* Records filtered by an exclude transform
|
|
||||||
|
|
||||||
`sparse_bucket_count`::
|
`sparse_bucket_count`::
|
||||||
() TBD
|
(long) The number of buckets which contained few data points compared to the expected number
|
||||||
|
of data points. If your data contains many sparse buckets, consider using a longer `bucket_span`.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-modelsizestats]]
|
[[ml-modelsizestats]]
|
||||||
|
@@ -123,13 +134,14 @@ necessarily a cause for concern.
 The `model_size_stats` object has the following properties:
 
 `bucket_allocation_failures_count`::
-() TBD
+(long) The number of buckets for which new entities in incoming data were not processed due to
+insufficient model memory, as signified by a `hard_limit` `memory_status`.
 
 `job_id`::
 (string) A numerical character string that uniquely identifies the job.
 
 `log_time`::
-() TBD
+(date) The timestamp of the `model_size_stats` according to server time.
 
 `memory_status`::
 (string) The status of the mathematical models. This property can have one of the following values:
@@ -142,24 +154,46 @@ The `model_size_stats` object has the following properties:
 last time the model was persisted. If the job is closed, this value indicates the latest size.
 
 `result_type`::
-TBD
+(string) For internal use. The type of result.
 
 `total_by_field_count`::
-(long) The number of `by` field values that were analyzed by the models. +
+(long) The number of `by` field values that were analyzed by the models.
 +
 --
 NOTE: The `by` field values are counted separately for each detector and partition.
 
 --
 
 `total_over_field_count`::
-(long) The number of `over` field values that were analyzed by the models. +
+(long) The number of `over` field values that were analyzed by the models.
 +
 --
 NOTE: The `over` field values are counted separately for each detector and partition.
 
 --
 
 `total_partition_field_count`::
 (long) The number of `partition` field values that were analyzed by the models.
 
 `timestamp`::
-TBD
+(date) The timestamp of the `model_size_stats` according to the timestamp of the data.
+
+[float]
+[[ml-stats-node]]
+===== Node Objects
+
+The `node` object contains properties of the executing node and is only available for open jobs.
+
+`id`::
+(string) The unique identifier of the executing node.
+
+`name`::
+(string) The node's name.
+
+`ephemeral_id`::
+
+`transport_address`::
+(string) Host and port where transport HTTP connections are accepted.
+
+`attributes`::
+(object) {ml} attributes.
+`max_running_jobs`::: The maximum number of concurrently open jobs allowed per node.
@@ -5,10 +5,11 @@
 A job resource has the following properties:
 
 `analysis_config`::
-(object) The analysis configuration, which specifies how to analyze the data. See <<ml-analysisconfig, analysis configuration objects>>.
+(object) The analysis configuration, which specifies how to analyze the data.
+See <<ml-analysisconfig, analysis configuration objects>>.
 
 `analysis_limits`::
-(object) Defines limits on the number of field values and time buckets to be analyzed.
+(object) Defines approximate limits on the memory resource requirements for the job.
 See <<ml-apilimits,analysis limits>>.
 
 `create_time`::
@@ -21,17 +22,17 @@ A job resource has the following properties:
 (string) An optional description of the job.
 
 `finished_time`::
-(string) If the job closed of failed, this is the time the job finished, in ISO 8601 format.
-Otherwise, it is `null`. For example, `1491007365347`.
+(string) If the job closed or failed, this is the time the job finished, otherwise it is `null`.
 
 `job_id`::
-(string) A numerical character string that uniquely identifies the job.
+(string) The unique identifier for the job.
 
 `job_type`::
-(string) TBD. For example: "anomaly_detector".
+(string) Reserved for future use, currently set to `anomaly_detector`.
 
-`model_plot_config`:: TBD
-`enabled`:: TBD. For example, `true`.
+`model_plot_config`::
+(object) Configuration properties for storing additional model information.
+See <<ml-apimodelplotconfig, model plot configuration>>.
 
 `model_snapshot_id`::
 (string) A numerical character string that uniquely identifies the model
@@ -42,7 +43,8 @@ A job resource has the following properties:
 Older snapshots are deleted. The default value is 1 day.
 
 `results_index_name`::
-() TBD. For example, `shared`.
+(string) The name of the index in which {ml} results are stored.
+The default value is `shared`, which corresponds to the index name `.ml-anomalies-shared`.
 
 [[ml-analysisconfig]]
 ===== Analysis Configuration Objects
@@ -50,8 +52,8 @@ A job resource has the following properties:
 An analysis configuration object has the following properties:
 
 `bucket_span` (required)::
-(unsigned integer) The size of the interval that the analysis is aggregated into, measured in seconds. The default value is 5 minutes.
-//TBD: Is this now measured in minutes?
+(time units) The size of the interval that the analysis is aggregated into, typically between `5m` and `1h`.
+The default value is `5m`.
 
 `categorization_field_name`::
 (string) If not null, the values of the specified field will be categorized.
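Putting the pieces of the job resource together, a minimal create-job request might look like the following sketch (the job name, detector, and time field are assumptions for illustration):

[source,js]
----
PUT _xpack/ml/anomaly_detectors/event-rate-job
{
  "description": "Hypothetical event rate job",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      { "function": "count", "detector_description": "Low event rate" }
    ]
  },
  "data_description": {
    "format": "JSON",
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
----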
@@ -84,9 +86,9 @@ and an error is returned.
 the use of influencers is recommended as it aggregates results for each influencer entity.
 
 `latency`::
-(unsigned integer) The size of the window, in seconds, in which to expect data that is out of time order. The default value is 0 milliseconds (no latency). +
-+
---
+(unsigned integer) The size of the window, in seconds, in which to expect data that is out of time order.
+The default value is 0 (no latency).
 NOTE: Latency is only applicable when you send data by using the <<ml-post-data, Post Data to Jobs>> API.
 
 --
@@ -103,10 +105,10 @@ NOTE: Latency is only applicable when you send data by using the <<ml-post-data, Post Data to Jobs>> API.
 --
 NOTE: To use the `multivariate_by_fields` property, you must also specify `by_field_name` in your detector.
 
---
-`overlapping_buckets`::
-(boolean) If set to `true`, an additional analysis occurs that runs out of phase by half a bucket length.
-This requires more system resources and enhances detection of anomalies that span bucket boundaries.
+// LEAVE UNDOCUMENTED
+// `overlapping_buckets`::
+// (boolean) If set to `true`, an additional analysis occurs that runs out of phase by half a bucket length.
+// This requires more system resources and enhances detection of anomalies that span bucket boundaries.
 
 `summary_count_field_name`::
 (string) If not null, the data fed to the job is expected to be pre-summarized.
@@ -115,10 +117,11 @@ NOTE: To use the `multivariate_by_fields` property, you must also specify `by_field_name` in your detector.
 +
 --
 NOTE: The `summary_count_field_name` property cannot be used with the `metric` function.
 
 --
-`use_per_partition_normalization`::
-() TBD
+// LEAVE UNDOCUMENTED
+// `use_per_partition_normalization`::
+// () TBD
 
 [[ml-detectorconfig]]
 ===== Detector Configuration Objects
@@ -134,10 +137,11 @@ Each detector has the following properties:
 It is used for finding unusual values in the context of the split.
 
 `detector_description`::
-(string) A description of the detector. For example, `low_sum(events_per_min)`.
+(string) A description of the detector. For example, `Low event rate`.
 
-`detector_rules`::
-(array) TBD
+// LEAVE UNDOCUMENTED
+// `detector_rules`::
+// (array) TBD
 
 `exclude_frequent`::
 (string) Contains one of the following values: `all`, `none`, `by`, or `over`.
@@ -152,19 +156,12 @@ Each detector has the following properties:
 +
 --
 NOTE: The `field_name` cannot contain double quotes or backslashes.
 
 --
 
 `function` (required)::
 (string) The analysis function that is used.
 For example, `count`, `rare`, `mean`, `min`, `max`, and `sum`.
-The default function is `metric`, which looks for anomalies in all of `min`, `max`,
-and `mean`. +
-+
---
-NOTE: You cannot use the `metric` function with pre-summarized input. If `summary_count_field_name`
-is not null, you must specify a function other than `metric`.
-
---
 
 `over_field_name`::
 (string) The field used to split the data.
 In particular, this property is used for analyzing the splits with respect to the history of all splits.
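Combining the properties above, a detector that models the volume sent by each client could be sketched as follows (the field names are hypothetical):

[source,js]
----
{
  "detector_description": "High bytes per client",
  "function": "sum",
  "field_name": "bytes",
  "over_field_name": "client_ip",
  "exclude_frequent": "over"
}
----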
@@ -180,33 +177,21 @@ NOTE: You cannot use the `metric` function with pre-summarized input. If `summary_count_field_name`
 +
 --
 IMPORTANT: Field names are case sensitive, for example a field named 'Bytes' is different to one named 'bytes'.
 
 --
 
 [[ml-datadescription]]
 ===== Data Description Objects
 
-The data description settings define the format of the input data.
-
-When data is read from Elasticsearch, the datafeed must be configured.
-This defines which index data will be taken from, and over what time period.
+The data description defines the format of the input data when posting time-ordered data to the `_data` endpoint.
+Please note that when configuring a datafeed, these settings are set automatically.
 
 When data is received via the <<ml-post-data, Post Data to Jobs>> API,
-you must specify the data format (for example, JSON or CSV). In this scenario,
 the data posted is not stored in Elasticsearch. Only the results for anomaly detection are retained.
 
-When you create a job, by default it accepts data in tab-separated-values format and expects
-an Epoch time value in a field named `time`. The `time` field must be measured in seconds from the Epoch.
-If, however, your data is not in this format, you can provide a data description object that specifies the
-format of your data.
-
 A data description object has the following properties:
 
-`fieldDelimiter`::
-() TBD
-
 `format`::
-() TBD
+(string) Only `JSON` format is supported at this time.
 
 `time_field`::
 (string) The name of the field that contains the timestamp.
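For JSON documents that carry a millisecond epoch timestamp in a field named `timestamp` (an assumed field name), the data description would be:

[source,js]
----
"data_description": {
  "format": "JSON",
  "time_field": "timestamp",
  "time_format": "epoch_ms"
}
----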
@@ -215,23 +200,22 @@ A data description object has the following properties:
 `time_format`::
 (string) The time format, which can be `epoch`, `epoch_ms`, or a custom pattern.
 The default value is `epoch`, which refers to UNIX or Epoch time (the number of seconds
-since 1 Jan 1970) and corresponds to the time_t type in C and C++.
+since 1 Jan 1970).
 The value `epoch_ms` indicates that time is measured in milliseconds since the epoch.
 The `epoch` and `epoch_ms` time formats accept either integer or real values. +
 +
 --
 NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class. When you use date-time formatting patterns, it is recommended that you provide the full date, time and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`. If the pattern that you specify is not sufficient to produce a complete timestamp, job creation fails.
 
 --
-`quotecharacter`::
-() TBD
 
 [[ml-apilimits]]
 ===== Analysis Limits
 
-Limits can be applied for the size of the mathematical models that are held in memory.
-These limits can be set per job and do not control the memory used by other processes.
-If necessary, the limits can also be updated after the job is created.
+Limits can be applied for the resources required to hold the mathematical models in memory.
+These limits are approximate and can be set per job.
+They do not control the memory used by other processes, for example the Elasticsearch Java processes.
+If necessary, the limits can be increased after the job is created.
 
 The `analysis_limits` object has the following properties:
@@ -241,10 +225,33 @@ The `analysis_limits` object has the following properties:
 more examples are available, however it requires that you have more storage available.
 If you set this value to `0`, no examples are stored.
 
-////
 NOTE: The `categorization_examples_limit` only applies to analysis that uses categorization.
-////
 
 `model_memory_limit`::
 (long) The maximum amount of memory, in MiB, that the mathematical models can use.
 Once this limit is approached, data pruning becomes more aggressive.
 Upon exceeding this limit, new entities are not modeled. The default value is 4096.
 
+[[ml-apimodelplotconfig]]
+===== Model Plot Config
+
+This advanced configuration option stores model information along with the results, allowing a more detailed view into anomaly detection.
+Enabling this can add considerable overhead to the performance of the system and is not feasible for jobs with many entities.
+
+Model plot provides a simplified and indicative view of the model and its bounds.
+It does not display complex features such as multivariate correlations or multimodal data.
+As such, anomalies may occasionally be reported which cannot be seen in the model plot.
+
+Model plot config can be configured when the job is created or updated later. It must be disabled if performance issues are experienced.
+
+The `model_plot_config` object has the following properties:
+
+`enabled`::
+(boolean) If true, enables calculation and storage of the model bounds for each entity that is being analyzed.
+By default, this is not enabled.
+
+`terms`::
+(string) Limits data collection to this comma separated list of _partition_ or _by_ field names.
+If `terms` is not specified or is an empty string, no filtering is applied.
+For example, `"CPU,NetworkIn,DiskWrites"`.
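The two objects documented above might appear together in a job configuration as in this sketch (all values are illustrative):

[source,js]
----
"analysis_limits": {
  "model_memory_limit": 4096,
  "categorization_examples_limit": 4
},
"model_plot_config": {
  "enabled": true,
  "terms": "CPU,NetworkIn,DiskWrites"
}
----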
@@ -88,4 +88,4 @@ For example:
 }
 ----
 
-For more information about these properties, see <<ml-jobcounts,Job Counts>>.
+For more information about these properties, see <<ml-jobstats,Job Stats>>.