diff --git a/docs/reference/ml/anomaly-detection/apis/get-bucket.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-bucket.asciidoc index 027de1385e8..441c916304d 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-bucket.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-bucket.asciidoc @@ -40,99 +40,180 @@ bucket. include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] ``:: - (Optional, string) The timestamp of a single bucket result. If you do not - specify this parameter, the API returns information about all buckets. +(Optional, string) The timestamp of a single bucket result. If you do not +specify this parameter, the API returns information about all buckets. [[ml-get-bucket-request-body]] ==== {api-request-body-title} `anomaly_score`:: - (Optional, double) Returns buckets with anomaly scores greater or equal than - this value. +(Optional, double) Returns buckets with anomaly scores greater than or equal +to this value. `desc`:: - (Optional, boolean) If true, the buckets are sorted in descending order. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results] `end`:: - (Optional, string) Returns buckets with timestamps earlier than this time. +(Optional, string) Returns buckets with timestamps earlier than this time. `exclude_interim`:: - (Optional, boolean) If true, the output excludes interim results. By default, - interim results are included. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results] `expand`:: - (Optional, boolean) If true, the output includes anomaly records. +(Optional, boolean) If true, the output includes anomaly records. -`page`:: -`from`::: - (Optional, integer) Skips the specified number of buckets. -`size`::: - (Optional, integer) Specifies the maximum number of buckets to obtain. +`page`.`from`:: +(Optional, integer) Skips the specified number of buckets. + +`page`.`size`:: +(Optional, integer) Specifies the maximum number of buckets to obtain. 
`sort`:: - (Optional, string) Specifies the sort field for the requested buckets. By - default, the buckets are sorted by the `timestamp` field. +(Optional, string) Specifies the sort field for the requested buckets. By +default, the buckets are sorted by the `timestamp` field. `start`:: - (Optional, string) Returns buckets with timestamps after this time. +(Optional, string) Returns buckets with timestamps after this time. [[ml-get-bucket-results]] ==== {api-response-body-title} -The API returns the following information: +The API returns an array of bucket objects, which have the following properties: -`buckets`:: - (array) An array of bucket objects. For more information, see - <>. +`anomaly_score`:: +(number) The maximum anomaly score, between 0-100, for any of the bucket +influencers. This is an overall, rate-limited score for the job. All the anomaly +records in the bucket contribute to this score. This value might be updated as +new data is analyzed. + +`bucket_influencers`:: +(array) An array of bucket influencer objects, which have the following +properties: + +`bucket_influencers`.`anomaly_score`::: +(number) A normalized score between 0-100, which is calculated for each bucket +influencer. This score might be updated as newer data is analyzed. + +`bucket_influencers`.`bucket_span`::: +(number) +include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results] + +`bucket_influencers`.`initial_anomaly_score`::: +(number) The score between 0-100 for each bucket influencer. This score is the +initial value that was calculated at the time the bucket was processed. + +`bucket_influencers`.`influencer_field_name`::: +(string) The field name of the influencer. + +`bucket_influencers`.`influencer_field_value`::: +(string) The field value of the influencer. 
+ +`bucket_influencers`.`is_interim`::: +(boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim] + +`bucket_influencers`.`job_id`::: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] + +`bucket_influencers`.`probability`::: +(number) The probability that the bucket has this behavior, in the range 0 to 1. +This value can be held to a high precision of over 300 decimal places, so the +`anomaly_score` is provided as a human-readable and friendly interpretation of +this. + +`bucket_influencers`.`raw_anomaly_score`::: +(number) Internal. + +`bucket_influencers`.`result_type`::: +(string) Internal. This value is always set to `bucket_influencer`. + +`bucket_influencers`.`timestamp`::: +(date) The start time of the bucket for which these results were calculated. + +`bucket_span`:: +(number) +include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results] + +`event_count`:: +(number) The number of input data records processed in this bucket. + +`initial_anomaly_score`:: +(number) The maximum `anomaly_score` for any of the bucket influencers. This is +the initial value that was calculated at the time the bucket was processed. + +`is_interim`:: +(boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim] + +`job_id`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] + +`processing_time_ms`:: +(number) The amount of time, in milliseconds, that it took to analyze the bucket +contents and calculate results. + +`result_type`:: +(string) Internal. This value is always set to `bucket`. + +`timestamp`:: +(date) The start time of the bucket. This timestamp uniquely identifies the +bucket. ++ +-- +NOTE: Events that occur exactly at the timestamp of the bucket are included in +the results for the bucket. 
+ +-- [[ml-get-bucket-example]] ==== {api-examples-title} -The following example gets bucket information for the `it-ops-kpi` job: - [source,console] -------------------------------------------------- -GET _ml/anomaly_detectors/it-ops-kpi/results/buckets +GET _ml/anomaly_detectors/low_request_rate/results/buckets { "anomaly_score": 80, "start": "1454530200001" } -------------------------------------------------- -// TEST[skip:todo] +// TEST[skip:Kibana sample data] In this example, the API returns a single result that matches the specified score and time constraints: [source,js] ---- { - "count": 1, - "buckets": [ + "count" : 1, + "buckets" : [ { - "job_id": "it-ops-kpi", - "timestamp": 1454943900000, - "anomaly_score": 94.1706, - "bucket_span": 300, - "initial_anomaly_score": 94.1706, - "event_count": 153, - "is_interim": false, - "bucket_influencers": [ + "job_id" : "low_request_rate", + "timestamp" : 1578398400000, + "anomaly_score" : 91.58505459594764, + "bucket_span" : 3600, + "initial_anomaly_score" : 91.58505459594764, + "event_count" : 0, + "is_interim" : false, + "bucket_influencers" : [ { - "job_id": "it-ops-kpi", - "result_type": "bucket_influencer", - "influencer_field_name": "bucket_time", - "initial_anomaly_score": 94.1706, - "anomaly_score": 94.1706, - "raw_anomaly_score": 2.32119, - "probability": 0.00000575042, - "timestamp": 1454943900000, - "bucket_span": 300, - "is_interim": false + "job_id" : "low_request_rate", + "result_type" : "bucket_influencer", + "influencer_field_name" : "bucket_time", + "initial_anomaly_score" : 91.58505459594764, + "anomaly_score" : 91.58505459594764, + "raw_anomaly_score" : 0.5758246639716365, + "probability" : 1.7340849573442696E-4, + "timestamp" : 1578398400000, + "bucket_span" : 3600, + "is_interim" : false } ], - "processing_time_ms": 2, - "partition_scores": [], - "result_type": "bucket" + "processing_time_ms" : 0, + "result_type" : "bucket" } ] } ----- +---- \ No newline at end of file diff --git 
a/docs/reference/ml/anomaly-detection/apis/get-category.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-category.asciidoc index 914ca5daa16..9a7f4d79ab8 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-category.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-category.asciidoc @@ -28,45 +28,76 @@ privileges. See <> and [[ml-get-category-desc]] ==== {api-description-title} -For more information about categories, see +When `categorization_field_name` is specified in the job configuration, it is +possible to view the definitions of the resulting categories. A category +definition describes the common terms matched and contains examples of matched +values. + +The anomaly results from a categorization analysis are available as bucket, +influencer, and record results. For example, the results might indicate that +at 16:45 there was an unusual count of log message category 11. You can then +examine the description and examples of that category. For more information, see {ml-docs}/ml-configuring-categories.html[Categorizing log messages]. [[ml-get-category-path-parms]] ==== {api-path-parms-title} +``:: +(Optional, long) Identifier for the category. If you do not specify this +parameter, the API returns information about all categories in the {anomaly-job}. + ``:: (Required, string) include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] -``:: - (Optional, long) Identifier for the category. If you do not specify this - parameter, the API returns information about all categories in the - {anomaly-job}. - [[ml-get-category-request-body]] ==== {api-request-body-title} -`page`:: -`from`::: - (Optional, integer) Skips the specified number of categories. -`size`::: - (Optional, integer) Specifies the maximum number of categories to obtain. +`page`.`from`:: +(Optional, integer) Skips the specified number of categories. + +`page`.`size`:: +(Optional, integer) Specifies the maximum number of categories to obtain. 
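The `page`.`from` and `page`.`size` options used across these results APIs behave like a skip/limit pair over the matching results. A minimal Python sketch of that semantics (the `paginate` helper and the sample category list are hypothetical illustrations, not part of the API):

```python
def paginate(results, from_=0, size=100):
    # Mimic the ML results APIs' page.from / page.size semantics:
    # skip the first `from_` results, then return at most `size` of them.
    return results[from_:from_ + size]

categories = [{"category_id": i} for i in range(1, 11)]

# Skip four categories, then take at most three: category_ids 5, 6, 7.
page = paginate(categories, from_=4, size=3)
```

Requesting a `from_` past the end of the results simply yields an empty page, which is a convenient loop-termination condition when paging through all categories.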
[[ml-get-category-results]] ==== {api-response-body-title} -The API returns the following information: +The API returns an array of category objects, which have the following +properties: -`categories`:: - (array) An array of category objects. For more information, see - <>. +`category_id`:: +(unsigned integer) A unique identifier for the category. + +`examples`:: +(array) A list of examples of actual values that matched the category. + +`grok_pattern`:: +experimental[] (string) A Grok pattern that could be used in {ls} or an ingest +pipeline to extract fields from messages that match the category. This field is +experimental and may be changed or removed in a future release. The Grok +patterns that are found are not optimal, but are often a good starting point for +manual tweaking. + +`job_id`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] + +`max_matching_length`:: +(unsigned integer) The maximum length of the fields that matched the category. +The value is increased by 10% to enable matching for similar fields that have +not been analyzed. + +`regex`:: +(string) A regular expression that is used to search for values that match the +category. + +`terms`:: +(string) A space separated list of the common tokens that are matched in values +of the category. [[ml-get-category-example]] ==== {api-examples-title} -The following example gets information about one category for the -`esxi_log` job: - [source,console] -------------------------------------------------- GET _ml/anomaly_detectors/esxi_log/results/categories diff --git a/docs/reference/ml/anomaly-detection/apis/get-influencer.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-influencer.asciidoc index 2165d8ef9f7..93d38326b23 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-influencer.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-influencer.asciidoc @@ -23,6 +23,13 @@ need `read` index privilege on the index that stores the results. The privileges. 
See <> and <>. +[[ml-get-influencer-desc]] +==== {api-description-title} + +Influencers are the entities that have contributed to, or are to blame for, +the anomalies. Influencer results are available only if an +`influencer_field_name` is specified in the job configuration. + [[ml-get-influencer-path-parms]] ==== {api-path-parms-title} @@ -34,75 +41,119 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] ==== {api-request-body-title} `desc`:: - (Optional, boolean) If true, the results are sorted in descending order. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results] `end`:: - (Optional, string) Returns influencers with timestamps earlier than this time. +(Optional, string) Returns influencers with timestamps earlier than this time. `exclude_interim`:: - (Optional, boolean) If true, the output excludes interim results. By default, - interim results are included. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results] `influencer_score`:: - (Optional, double) Returns influencers with anomaly scores greater than or - equal to this value. +(Optional, double) Returns influencers with anomaly scores greater than or equal +to this value. -`page`:: -`from`::: - (Optional, integer) Skips the specified number of influencers. -`size`::: - (Optional, integer) Specifies the maximum number of influencers to obtain. +`page`.`from`:: +(Optional, integer) Skips the specified number of influencers. + +`page`.`size`:: +(Optional, integer) Specifies the maximum number of influencers to obtain. `sort`:: - (Optional, string) Specifies the sort field for the requested influencers. By - default, the influencers are sorted by the `influencer_score` value. +(Optional, string) Specifies the sort field for the requested influencers. By +default, the influencers are sorted by the `influencer_score` value. `start`:: - (Optional, string) Returns influencers with timestamps after this time. 
+(Optional, string) Returns influencers with timestamps after this time. [[ml-get-influencer-results]] ==== {api-response-body-title} -The API returns the following information: +The API returns an array of influencer objects, which have the following +properties: -`influencers`:: - (array) An array of influencer objects. - For more information, see <>. +`bucket_span`:: +(number) +include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results] + +`influencer_score`:: +(number) A normalized score between 0-100, which is based on the probability of +the influencer in this bucket aggregated across detectors. Unlike +`initial_influencer_score`, this value will be updated by a re-normalization +process as new data is analyzed. + +`influencer_field_name`:: +(string) The field name of the influencer. + +`influencer_field_value`:: +(string) The entity that influenced, contributed to, or was to blame for the +anomaly. + +`initial_influencer_score`:: +(number) A normalized score between 0-100, which is based on the probability of +the influencer aggregated across detectors. This is the initial value that was +calculated at the time the bucket was processed. + +`is_interim`:: +(boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim] + +`job_id`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] + +`probability`:: +(number) The probability that the influencer has this behavior, in the range 0 +to 1. This value can be held to a high precision of over 300 decimal places, so +the `influencer_score` is provided as a human-readable and friendly +interpretation of this. + +`result_type`:: +(string) Internal. This value is always set to `influencer`. + +`timestamp`:: +(date) +include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results] + +NOTE: Additional influencer properties are added, depending on the fields being +analyzed. 
For example, if it's analyzing `user_name` as an influencer, then a +field `user_name` is added to the result document. This information enables you to +filter the anomaly results more easily. [[ml-get-influencer-example]] ==== {api-examples-title} -The following example gets influencer information for the `it_ops_new_kpi` job: - [source,console] -------------------------------------------------- -GET _ml/anomaly_detectors/it_ops_new_kpi/results/influencers +GET _ml/anomaly_detectors/high_sum_total_sales/results/influencers { "sort": "influencer_score", "desc": true } -------------------------------------------------- -// TEST[skip:todo] +// TEST[skip:Kibana sample data] In this example, the API returns the following information, sorted based on the influencer score in descending order: [source,js] ---- { - "count": 28, + "count": 189, "influencers": [ { - "job_id": "it_ops_new_kpi", + "job_id": "high_sum_total_sales", "result_type": "influencer", - "influencer_field_name": "kpi_indicator", - "influencer_field_value": "online_purchases", - "kpi_indicator": "online_purchases", - "influencer_score": 94.1386, - "initial_influencer_score": 94.1386, - "probability": 0.000111612, - "bucket_span": 600, - "is_interim": false, - "timestamp": 1454943600000 + "influencer_field_name": "customer_full_name.keyword", + "influencer_field_value": "Wagdi Shaw", + "customer_full_name.keyword" : "Wagdi Shaw", + "influencer_score": 99.02493, + "initial_influencer_score" : 94.67233079580171, + "probability" : 1.4784807245686567E-10, + "bucket_span" : 3600, + "is_interim" : false, + "timestamp" : 1574661600000 }, ... 
] diff --git a/docs/reference/ml/anomaly-detection/apis/get-overall-buckets.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-overall-buckets.asciidoc index a678aa51442..f4c9bcec178 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-overall-buckets.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-overall-buckets.asciidoc @@ -66,45 +66,63 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection-wildcard-li include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-jobs] `bucket_span`:: - (Optional, string) The span of the overall buckets. Must be greater or equal - to the largest bucket span of the specified {anomaly-jobs}, which is the - default value. +(Optional, string) The span of the overall buckets. Must be greater than or +equal to the largest bucket span of the specified {anomaly-jobs}, which is the +default value. `end`:: - (Optional, string) Returns overall buckets with timestamps earlier than this - time. +(Optional, string) Returns overall buckets with timestamps earlier than this +time. `exclude_interim`:: - (Optional, boolean) If `true`, the output excludes interim overall buckets. - Overall buckets are interim if any of the job buckets within the overall - bucket interval are interim. By default, interim results are included. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results] ++ +-- +If any of the job bucket results within the overall bucket interval are interim +results, the overall bucket results are interim results. +-- `overall_score`:: - (Optional, double) Returns overall buckets with overall scores greater or - equal than this value. +(Optional, double) Returns overall buckets with overall scores greater than or +equal to this value. `start`:: - (Optional, string) Returns overall buckets with timestamps after this time. +(Optional, string) Returns overall buckets with timestamps after this time. 
`top_n`:: - (Optional, integer) The number of top {anomaly-job} bucket scores to be used - in the `overall_score` calculation. The default value is `1`. +(Optional, integer) The number of top {anomaly-job} bucket scores to be used in +the `overall_score` calculation. The default value is `1`. [[ml-get-overall-buckets-results]] ==== {api-response-body-title} -The API returns the following information: +The API returns an array of overall bucket objects, which have the following +properties: -`overall_buckets`:: - (array) An array of overall bucket objects. For more information, see - <>. +`bucket_span`:: +(number) The length of the bucket in seconds. Matches the job with the longest `bucket_span` value. + +`is_interim`:: +(boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim] + +`jobs`:: +(array) An array of objects that contain the `max_anomaly_score` per `job_id`. + +`overall_score`:: +(number) The `top_n` average of the maximum bucket `anomaly_score` per job. + +`result_type`:: +(string) Internal. This is always set to `overall_bucket`. + +`timestamp`:: +(date) +include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results] [[ml-get-overall-buckets-example]] ==== {api-examples-title} -The following example gets overall buckets for {anomaly-jobs} with IDs matching -`job-*`: - [source,console] -------------------------------------------------- GET _ml/anomaly_detectors/job-*/results/overall_buckets diff --git a/docs/reference/ml/anomaly-detection/apis/get-record.asciidoc b/docs/reference/ml/anomaly-detection/apis/get-record.asciidoc index b5bbb15580e..d392b9f4f79 100644 --- a/docs/reference/ml/anomaly-detection/apis/get-record.asciidoc +++ b/docs/reference/ml/anomaly-detection/apis/get-record.asciidoc @@ -22,6 +22,22 @@ need `read` index privilege on the index that stores the results. The `machine_learning_admin` and `machine_learning_user` roles provide these privileges. See <> and <>. 
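The `overall_score` of an overall bucket is described above as the `top_n` average of the maximum bucket `anomaly_score` per job. A simplified Python sketch of that calculation (an illustration of the documented formula, not the actual {es} implementation):

```python
def overall_score(max_scores_by_job, top_n=1):
    # Take each job's maximum bucket anomaly score for the overall bucket
    # interval, keep the top_n highest, and average them.
    # top_n defaults to 1, as in the API.
    top = sorted(max_scores_by_job.values(), reverse=True)[:top_n]
    return sum(top) / len(top)

scores = {"job-1": 90.0, "job-2": 10.0, "job-3": 50.0}
overall_score(scores, top_n=1)  # 90.0
overall_score(scores, top_n=2)  # (90.0 + 50.0) / 2 = 70.0
```

This is why raising `top_n` tends to lower the `overall_score`: a single highly anomalous job is averaged against less anomalous ones.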
+[[ml-get-record-desc]] +==== {api-description-title} + +Records contain the detailed analytical results. They describe the anomalous +activity that has been identified in the input data based on the detector +configuration. + +There can be many anomaly records depending on the characteristics and size of +the input data. In practice, there are often too many to be able to manually +process them. The {ml-features} therefore perform a sophisticated aggregation of +the anomaly records into buckets. + +The number of record results depends on the number of anomalies found in each +bucket, which relates to the number of time series being modeled and the number +of detectors. + [[ml-get-record-path-parms]] ==== {api-path-parms-title} @@ -33,83 +49,194 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] ==== {api-request-body-title} `desc`:: - (Optional, boolean) If true, the results are sorted in descending order. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results] `end`:: - (Optional, string) Returns records with timestamps earlier than this time. +(Optional, string) Returns records with timestamps earlier than this time. `exclude_interim`:: - (Optional, boolean) If true, the output excludes interim results. By default, - interim results are included. +(Optional, boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results] -`page`:: -`from`::: - (Optional, integer) Skips the specified number of records. -`size`::: - (Optional, integer) Specifies the maximum number of records to obtain. +`page`.`from`:: +(Optional, integer) Skips the specified number of records. + +`page`.`size`:: +(Optional, integer) Specifies the maximum number of records to obtain. `record_score`:: - (Optional, double) Returns records with anomaly scores greater or equal than - this value. +(Optional, double) Returns records with anomaly scores greater than or equal +to this value. 
`sort`:: - (Optional, string) Specifies the sort field for the requested records. By - default, the records are sorted by the `anomaly_score` value. +(Optional, string) Specifies the sort field for the requested records. By +default, the records are sorted by the `anomaly_score` value. `start`:: - (Optional, string) Returns records with timestamps after this time. +(Optional, string) Returns records with timestamps after this time. [[ml-get-record-results]] ==== {api-response-body-title} -The API returns the following information: +The API returns an array of record objects, which have the following properties: -`records`:: - (array) An array of record objects. For more information, see - <>. +`actual`:: +(array) The actual value for the bucket. + +`bucket_span`:: +(number) +include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results] + +`by_field_name`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=by-field-name] + +`by_field_value`:: +(string) The value of the by field. + +`causes`:: +(array) For population analysis, an over field must be specified in the detector. +This property contains an array of anomaly records that are the causes for the +anomaly that has been identified for the over field. If no over fields exist, +this field is not present. This sub-resource contains the most anomalous records +for the `over_field_name`. For scalability reasons, a maximum of the 10 most +significant causes of the anomaly are returned. As part of the core analytical +modeling, these low-level anomaly records are aggregated for their parent over +field record. The causes resource contains similar elements to the record +resource, namely `actual`, `typical`, `geo_results.actual_point`, +`geo_results.typical_point`, `*_field_name` and `*_field_value`. Probability and +scores are not applicable to causes. 
+ +`detector_index`:: +(number) +include::{docdir}/ml/ml-shared.asciidoc[tag=detector-index] + +`field_name`:: +(string) Certain functions require a field to operate on, for example, `sum()`. +For those functions, this value is the name of the field to be analyzed. + +`function`:: +(string) The function in which the anomaly occurs, as specified in the +detector configuration. For example, `max`. + +`function_description`:: +(string) The description of the function in which the anomaly occurs, as +specified in the detector configuration. + +`geo_results.actual_point`:: +(string) The actual value for the bucket formatted as a `geo_point`. If the +detector function is `lat_long`, this is a comma delimited string of the +latitude and longitude. + +`geo_results.typical_point`:: +(string) The typical value for the bucket formatted as a `geo_point`. If the +detector function is `lat_long`, this is a comma delimited string of the +latitude and longitude. + +`influencers`:: +(array) If `influencers` was specified in the detector configuration, this array +contains influencers that contributed to or were to blame for an anomaly. + +`initial_record_score`:: +(number) A normalized score between 0-100, which is based on the probability of +the anomalousness of this record. This is the initial value that was calculated +at the time the bucket was processed. + +`is_interim`:: +(boolean) +include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim] + +`job_id`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection] + +`over_field_name`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=over-field-name] + +`over_field_value`:: +(string) The value of the over field. + +`partition_field_name`:: +(string) +include::{docdir}/ml/ml-shared.asciidoc[tag=partition-field-name] + +`partition_field_value`:: +(string) The value of the partition field. + +`probability`:: +(number) The probability of the individual anomaly occurring, in the range `0` +to `1`. 
This value can be held to a high precision of over 300 decimal places, +so the `record_score` is provided as a human-readable and friendly +interpretation of this. + +`multi_bucket_impact`:: +(number) An indication of how strongly an anomaly is multi bucket or single +bucket. The value is on a scale of `-5.0` to `+5.0` where `-5.0` means the +anomaly is purely single bucket and `+5.0` means the anomaly is purely multi +bucket. + +`record_score`:: +(number) A normalized score between 0-100, which is based on the probability of +the anomalousness of this record. Unlike `initial_record_score`, this value will +be updated by a re-normalization process as new data is analyzed. + +`result_type`:: +(string) Internal. This is always set to `record`. + +`timestamp`:: +(date) +include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results] + +`typical`:: +(array) The typical value for the bucket, according to analytical modeling. + +NOTE: Additional record properties are added, depending on the fields being +analyzed. For example, if it is analyzing `hostname` as a _by field_, then a field +`hostname` is added to the result document. This information enables you to +filter the anomaly results more easily. 
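The `record_score`, `sort`, and `desc` request-body options described above amount to a filter-then-sort over the record objects. A client-side Python sketch of that behavior (a hypothetical helper for illustration; the real filtering and sorting happen server-side):

```python
def select_records(records, record_score=0.0, sort="record_score", desc=False):
    # Keep records whose score meets the threshold, then sort them by the
    # requested field; desc=True sorts in descending order, as in the API.
    kept = [r for r in records if r["record_score"] >= record_score]
    return sorted(kept, key=lambda r: r[sort], reverse=desc)

records = [
    {"record_score": 94.9, "timestamp": 1577793600000},
    {"record_score": 12.3, "timestamp": 1577797200000},
    {"record_score": 75.0, "timestamp": 1577800800000},
]

# Only the two records scoring at least 50, highest score first.
top = select_records(records, record_score=50.0, desc=True)
```

Sorting by `timestamp` instead of `record_score` is useful when you want to walk anomalies in the order they occurred rather than by severity.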
[[ml-get-record-example]] ==== {api-examples-title} -The following example gets record information for the `it-ops-kpi` job: - [source,console] -------------------------------------------------- -GET _ml/anomaly_detectors/it-ops-kpi/results/records +GET _ml/anomaly_detectors/low_request_rate/results/records { "sort": "record_score", "desc": true, "start": "1454944100000" } -------------------------------------------------- -// TEST[skip:todo] +// TEST[skip:Kibana sample data] -In this example, the API returns twelve results for the specified time constraints: +In this example, the API returns four results for the specified time constraints: [source,js] ---- { - "count": 12, - "records": [ + "count" : 4, + "records" : [ { - "job_id": "it-ops-kpi", - "result_type": "record", - "probability": 0.00000332668, - "record_score": 72.9929, - "initial_record_score": 65.7923, - "bucket_span": 300, - "detector_index": 0, - "is_interim": false, - "timestamp": 1454944200000, - "function": "low_sum", - "function_description": "sum", - "typical": [ - 1806.48 + "job_id" : "low_request_rate", + "result_type" : "record", + "probability" : 1.3882308899968812E-4, + "multi_bucket_impact" : -5.0, + "record_score" : 94.98554565630553, + "initial_record_score" : 94.98554565630553, + "bucket_span" : 3600, + "detector_index" : 0, + "is_interim" : false, + "timestamp" : 1577793600000, + "function" : "low_count", + "function_description" : "count", + "typical" : [ + 28.254208230188834 ], - "actual": [ - 288 - ], - "field_name": "events_per_min" + "actual" : [ + 0.0 + ] }, ... ] diff --git a/docs/reference/ml/anomaly-detection/apis/resultsresource.asciidoc b/docs/reference/ml/anomaly-detection/apis/resultsresource.asciidoc deleted file mode 100644 index e05b9318d16..00000000000 --- a/docs/reference/ml/anomaly-detection/apis/resultsresource.asciidoc +++ /dev/null @@ -1,479 +0,0 @@ -[role="xpack"] -[testenv="platinum"] -[[ml-results-resource]] -=== Results resources - -Several different result types are created for each job. 
You can query anomaly -results for _buckets_, _influencers_, and _records_ by using the results API. -Summarized bucket results over multiple jobs can be queried as well; those -results are called _overall buckets_. - -Results are written for each `bucket_span`. The timestamp for the results is the -start of the bucket time interval. - -The results include scores, which are calculated for each anomaly result type and -each bucket interval. These scores are aggregated in order to reduce noise, and -normalized in order to identify and rank the most mathematically significant -anomalies. - -Bucket results provide the top level, overall view of the job and are ideal for -alerts. For example, the bucket results might indicate that at 16:05 the system -was unusual. This information is a summary of all the anomalies, pinpointing -when they occurred. - -Influencer results show which entities were anomalous and when. For example, -the influencer results might indicate that at 16:05 `user_name: Bob` was unusual. -This information is a summary of all the anomalies for each entity, so there -can be a lot of these results. Once you have identified a notable bucket time, -you can look to see which entities were significant. - -Record results provide details about what the individual anomaly was, when it -occurred and which entity was involved. For example, the record results might -indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was -1067 bytes. Once you have identified a bucket time and perhaps a significant -entity too, you can drill through to the record results in order to investigate -the anomalous behavior. - -Categorization results contain the definitions of _categories_ that have been -identified. These are only applicable for jobs that are configured to analyze -unstructured log data using categorization. These results do not contain a -timestamp or any calculated scores. 
For more information, see -{ml-docs}/ml-configuring-categories.html[Categorizing log messages]. - -* <> -* <> -* <> -* <> -* <> - -NOTE: All of these resources and properties are informational; you cannot -change their values. - -[float] -[[ml-results-buckets]] -==== Buckets - -Bucket results provide the top level, overall view of the job and are best for -alerting. - -Each bucket has an `anomaly_score`, which is a statistically aggregated and -normalized view of the combined anomalousness of all the record results within -each bucket. - -One bucket result is written for each `bucket_span` for each job, even if it is -not considered to be anomalous. If the bucket is not anomalous, it has an -`anomaly_score` of zero. - -When you identify an anomalous bucket, you can investigate further by expanding -the bucket resource to show the records as nested objects. Alternatively, you -can access the records resource directly and filter by the date range. - -A bucket resource has the following properties: - -`anomaly_score`:: - (number) The maximum anomaly score, between 0-100, for any of the bucket - influencers. This is an overall, rate-limited score for the job. All the - anomaly records in the bucket contribute to this score. This value might be - updated as new data is analyzed. - -`bucket_influencers`:: - (array) An array of bucket influencer objects. - For more information, see <>. - -`bucket_span`:: - (number) The length of the bucket in seconds. - This value matches the `bucket_span` that is specified in the job. - -`event_count`:: - (number) The number of input data records processed in this bucket. - -`initial_anomaly_score`:: - (number) The maximum `anomaly_score` for any of the bucket influencers. - This is the initial value that was calculated at the time the bucket was - processed. - -`is_interim`:: - (boolean) If true, this is an interim result. In other words, the bucket - results are calculated based on partial input data. 
-
-`job_id`::
- (string) The unique identifier for the job that these results belong to.
-
-`processing_time_ms`::
- (number) The amount of time, in milliseconds, that it took to analyze the
- bucket contents and calculate results.
-
-`result_type`::
- (string) Internal. This value is always set to `bucket`.
-
-`timestamp`::
- (date) The start time of the bucket. This timestamp uniquely identifies the
- bucket.
-
-NOTE: Events that occur exactly at the timestamp of the bucket are included in
-the results for the bucket.
-
-
-[float]
-[[ml-results-bucket-influencers]]
-==== Bucket Influencers
-
-Bucket influencer results are available as nested objects contained within
-bucket results. These results are an aggregation for each type of influencer.
-For example, if both `client_ip` and `user_name` were specified as influencers,
-then you would be able to determine when the `client_ip` or `user_name` values
-were collectively anomalous.
-
-There is a built-in bucket influencer called `bucket_time` which is always
-available. This bucket influencer is the aggregation of all records in the
-bucket; it is not limited to a type of influencer.
-
-NOTE: A bucket influencer is a type of influencer. For example, `client_ip` or
-`user_name` can be bucket influencers, whereas `192.168.88.2` and `Bob` are
-influencers.
-
-A bucket influencer object has the following properties:
-
-`anomaly_score`::
- (number) A normalized score between 0-100, which is calculated for each bucket
- influencer. This score might be updated as newer data is analyzed.
-
-`bucket_span`::
- (number) The length of the bucket in seconds. This value matches the `bucket_span`
- that is specified in the job.
-
-`initial_anomaly_score`::
- (number) The score between 0-100 for each bucket influencer. This score is
- the initial value that was calculated at the time the bucket was processed.
-
-`influencer_field_name`::
- (string) The field name of the influencer.
- For example, `client_ip` or
- `user_name`.
-
-`influencer_field_value`::
- (string) The field value of the influencer. For example, `192.168.88.2` or
- `Bob`.
-
-`is_interim`::
- (boolean) If true, this is an interim result. In other words, the bucket
- influencer results are calculated based on partial input data.
-
-`job_id`::
- (string) The unique identifier for the job that these results belong to.
-
-`probability`::
- (number) The probability that the bucket has this behavior, in the range 0
- to 1. For example, 0.0000109783. This value can be held to a high precision
- of over 300 decimal places, so the `anomaly_score` is provided as a
- human-readable and friendly interpretation of this.
-
-`raw_anomaly_score`::
- (number) Internal.
-
-`result_type`::
- (string) Internal. This value is always set to `bucket_influencer`.
-
-`timestamp`::
- (date) The start time of the bucket for which these results were calculated.
-
-[float]
-[[ml-results-influencers]]
-==== Influencers
-
-Influencers are the entities that have contributed to, or are to blame for,
-the anomalies. Influencer results are available only if an
-`influencer_field_name` is specified in the job configuration.
-
-Influencers are given an `influencer_score`, which is calculated based on the
-anomalies that have occurred in each bucket interval. For jobs with more than
-one detector, this gives a powerful view of the most anomalous entities.
-
-For example, if you are analyzing unusual bytes sent and unusual domains
-visited and you specified `user_name` as the influencer, then an
-`influencer_score` for each anomalous user name is written per bucket. For
-example, if `user_name: Bob` had an `influencer_score` greater than 75, then
-`Bob` would be considered very anomalous during this time interval in one or
-both of those areas (unusual bytes sent or unusual domains visited).
-
-One influencer result is written per bucket for each influencer that is
-considered anomalous.
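The triage step described above, ranking entities by `influencer_score`, can be sketched as follows. The field names are from the influencer reference; the sample results are illustrative, not real API output:

```python
# Sketch: rank influencer results by the documented `influencer_score`
# to surface the most anomalous entities. Sample data is illustrative.
sample_influencers = [
    {"influencer_field_name": "user_name", "influencer_field_value": "Bob",
     "influencer_score": 94.1},
    {"influencer_field_name": "user_name", "influencer_field_value": "Alice",
     "influencer_score": 10.3},
    {"influencer_field_name": "client_ip", "influencer_field_value": "192.168.88.2",
     "influencer_score": 76.8},
]

def top_influencers(influencers, min_score=75.0):
    """Entities scoring above min_score, highest first."""
    hits = [i for i in influencers if i["influencer_score"] > min_score]
    return sorted(hits, key=lambda i: i["influencer_score"], reverse=True)

for i in top_influencers(sample_influencers):
    print(i["influencer_field_value"], i["influencer_score"])
```

The `min_score=75` default mirrors the "greater than 75 is very anomalous" rule of thumb above; it is a starting point to tune, not a fixed API semantic.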
-
-When you identify an influencer with a high score, you can investigate further
-by accessing the records resource for that bucket and enumerating the anomaly
-records that contain the influencer.
-
-An influencer object has the following properties:
-
-`bucket_span`::
- (number) The length of the bucket in seconds. This value matches the `bucket_span`
- that is specified in the job.
-
-`influencer_score`::
- (number) A normalized score between 0-100, which is based on the probability
- of the influencer in this bucket aggregated across detectors. Unlike
- `initial_influencer_score`, this value is updated by a re-normalization
- process as new data is analyzed.
-
-`initial_influencer_score`::
- (number) A normalized score between 0-100, which is based on the probability
- of the influencer aggregated across detectors. This is the initial value that
- was calculated at the time the bucket was processed.
-
-`influencer_field_name`::
- (string) The field name of the influencer.
-
-`influencer_field_value`::
- (string) The entity that influenced, contributed to, or was to blame for the
- anomaly.
-
-`is_interim`::
- (boolean) If true, this is an interim result. In other words, the influencer
- results are calculated based on partial input data.
-
-`job_id`::
- (string) The unique identifier for the job that these results belong to.
-
-`probability`::
- (number) The probability that the influencer has this behavior, in the range
- 0 to 1. For example, 0.0000109783. This value can be held to a high precision
- of over 300 decimal places, so the `influencer_score` is provided as a
- human-readable and friendly interpretation of this.
-// For example, 0.03 means 3%. This value is held to a high precision of over
-//300 decimal places. In scientific notation, a value of 3.24E-300 is highly
-//unlikely and therefore highly anomalous.
-
-`result_type`::
- (string) Internal. This value is always set to `influencer`.
-
-`timestamp`::
- (date) The start time of the bucket for which these results were calculated.
-
-NOTE: Additional influencer properties are added, depending on the fields being
-analyzed. For example, if the job analyzes `user_name` as an influencer, then a
-field `user_name` is added to the result document. This information enables you
-to filter the anomaly results more easily.
-
-
-[float]
-[[ml-results-records]]
-==== Records
-
-Records contain the detailed analytical results. They describe the anomalous
-activity that has been identified in the input data based on the detector
-configuration.
-
-For example, if you are looking for unusually large data transfers, an anomaly
-record can identify the source IP address, the destination, the time window
-during which it occurred, the expected and actual size of the transfer, and the
-probability of this occurrence.
-
-There can be many anomaly records, depending on the characteristics and size of
-the input data. In practice, there are often too many to be able to manually
-process them. The {ml-features} therefore perform a sophisticated
-aggregation of the anomaly records into buckets.
-
-The number of record results depends on the number of anomalies found in each
-bucket, which relates to the number of time series being modeled and the number
-of detectors.
-
-A record object has the following properties:
-
-`actual`::
- (array) The actual value for the bucket.
-
-`bucket_span`::
- (number) The length of the bucket in seconds.
- This value matches the `bucket_span` that is specified in the job.
-
-`by_field_name`::
- (string) The name of the analyzed field. This value is present only if
- it is specified in the detector. For example, `client_ip`.
-
-`by_field_value`::
- (string) The value of `by_field_name`. This value is present only if
- it is specified in the detector. For example, `192.168.66.2`.
-
-`causes`::
- (array) For population analysis, an over field must be specified in the
- detector.
- This property contains an array of anomaly records that are the
- causes for the anomaly that has been identified for the over field. If no
- over fields exist, this field is not present. This sub-resource contains
- the most anomalous records for the `over_field_name`. For scalability reasons,
- a maximum of the 10 most significant causes of the anomaly are returned. As
- part of the core analytical modeling, these low-level anomaly records are
- aggregated for their parent over field record. The causes resource contains
- similar elements to the record resource, namely `actual`, `typical`,
- `geo_results.actual_point`, `geo_results.typical_point`,
- `*_field_name` and `*_field_value`.
- Probability and scores are not applicable to causes.
-
-`detector_index`::
- (number) A unique identifier for the detector.
-
-`field_name`::
- (string) Certain functions require a field to operate on, for example, `sum()`.
- For those functions, this value is the name of the field to be analyzed.
-
-`function`::
- (string) The function in which the anomaly occurs, as specified in the
- detector configuration. For example, `max`.
-
-`function_description`::
- (string) The description of the function in which the anomaly occurs, as
- specified in the detector configuration.
-
-`influencers`::
- (array) If `influencers` was specified in the detector configuration, then
- this array contains influencers that contributed to or were to blame for an
- anomaly.
-
-`initial_record_score`::
- (number) A normalized score between 0-100, which is based on the
- probability of the anomalousness of this record. This is the initial value
- that was calculated at the time the bucket was processed.
-
-`is_interim`::
- (boolean) If true, this is an interim result. In other words, the anomaly
- record is calculated based on partial input data.
-
-`job_id`::
- (string) The unique identifier for the job that these results belong to.
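A record's `actual` and `typical` arrays carry the detail behind summaries like "Bob sent 837262434 bytes, when the typical value was 1067 bytes". A minimal sketch of rendering such a summary, assuming an illustrative sample record (field names are from this reference; the values are made up):

```python
# Sketch: summarize a record result from its documented fields.
# `actual` and `typical` are arrays per the field reference; for a
# simple detector each holds a single value. Sample data is illustrative.
record = {
    "function": "sum",
    "field_name": "bytes",
    "over_field_name": "user_name",
    "over_field_value": "Bob",
    "actual": [837262434.0],
    "typical": [1067.0],
    "record_score": 90.7,
}

def summarize(rec):
    actual, typical = rec["actual"][0], rec["typical"][0]
    return (f"{rec['over_field_value']}: {rec['function']}({rec['field_name']}) "
            f"actual {actual:.0f} vs typical {typical:.0f} "
            f"(score {rec['record_score']})")

print(summarize(record))
```

Note that `over_field_*`, `by_field_*`, and `partition_field_*` are only present when configured in the detector, so production code should treat them as optional keys.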
-
-`over_field_name`::
- (string) The name of the over field that was used in the analysis. This value
- is present only if it was specified in the detector. Over fields are used
- in population analysis. For example, `user`.
-
-`over_field_value`::
- (string) The value of `over_field_name`. This value is present only if it
- was specified in the detector. For example, `Bob`.
-
-`partition_field_name`::
- (string) The name of the partition field that was used in the analysis. This
- value is present only if it was specified in the detector. For example,
- `region`.
-
-`partition_field_value`::
- (string) The value of `partition_field_name`. This value is present only if
- it was specified in the detector. For example, `us-east-1`.
-
-`probability`::
- (number) The probability of the individual anomaly occurring, in the range
- 0 to 1. For example, 0.0000772031. This value can be held to a high precision
- of over 300 decimal places, so the `record_score` is provided as a
- human-readable and friendly interpretation of this.
-//In scientific notation, a value of 3.24E-300 is highly unlikely and therefore
-//highly anomalous.
-
-`multi_bucket_impact`::
- (number) An indication of how strongly an anomaly is multi-bucket or
- single-bucket. The value is on a scale of -5 to +5, where -5 means the anomaly
- is purely single-bucket and +5 means the anomaly is purely multi-bucket.
-
-`record_score`::
- (number) A normalized score between 0-100, which is based on the probability
- of the anomalousness of this record. Unlike `initial_record_score`, this
- value will be updated by a re-normalization process as new data is analyzed.
-
-`result_type`::
- (string) Internal. This is always set to `record`.
-
-`timestamp`::
- (date) The start time of the bucket for which these results were calculated.
-
-`typical`::
- (array) The typical value for the bucket, according to analytical modeling.
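The `multi_bucket_impact` scale above (-5 purely single-bucket, +5 purely multi-bucket) lends itself to a simple label for triage dashboards. A sketch; the cut-off values here are illustrative choices, not part of the API:

```python
# Sketch: label a record's documented `multi_bucket_impact` (-5..+5).
# The +/-2 thresholds are illustrative, not defined by the reference.
def impact_label(multi_bucket_impact):
    if multi_bucket_impact <= -2:
        return "mostly single-bucket"
    if multi_bucket_impact >= 2:
        return "mostly multi-bucket"
    return "mixed"

print(impact_label(-5))  # mostly single-bucket
print(impact_label(4))   # mostly multi-bucket
print(impact_label(0))   # mixed
```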
-
-`geo_results.actual_point`::
- (string) The actual value for the bucket formatted as a `geo_point`.
- If the detector function is `lat_long`, this is a comma-delimited string
- of the latitude and longitude.
-
-`geo_results.typical_point`::
- (string) The typical value for the bucket formatted as a `geo_point`.
- If the detector function is `lat_long`, this is a comma-delimited string
- of the latitude and longitude.
-
-NOTE: Additional record properties are added, depending on the fields being
-analyzed. For example, if the job analyzes `hostname` as a _by field_, then a
-field `hostname` is added to the result document. This information enables you
-to filter the anomaly results more easily.
-
-
-[float]
-[[ml-results-categories]]
-==== Categories
-
-When `categorization_field_name` is specified in the job configuration, it is
-possible to view the definitions of the resulting categories. A category
-definition describes the common terms matched and contains examples of matched
-values.
-
-The anomaly results from a categorization analysis are available as bucket,
-influencer, and record results. For example, the results might indicate that
-at 16:45 there was an unusual count of log message category 11. You can then
-examine the description and examples of that category.
-
-A category resource has the following properties:
-
-`category_id`::
- (unsigned integer) A unique identifier for the category.
-
-`examples`::
- (array) A list of examples of actual values that matched the category.
-
-`grok_pattern`::
- experimental[] (string) A Grok pattern that could be used in Logstash or an
- ingest pipeline to extract fields from messages that match the category. This
- field is experimental and may be changed or removed in a future release. The
- Grok patterns that are found are not optimal, but are often a good starting
- point for manual tweaking.
-
-`job_id`::
- (string) The unique identifier for the job that these results belong to.
-
-`max_matching_length`::
- (unsigned integer) The maximum length of the fields that matched the category.
- The value is increased by 10% to enable matching for similar fields that have
- not been analyzed.
-
-`regex`::
- (string) A regular expression that is used to search for values that match the
- category.
-
-`terms`::
- (string) A space-separated list of the common tokens that are matched in
- values of the category.
-
-[float]
-[[ml-results-overall-buckets]]
-==== Overall Buckets
-
-Overall buckets provide a summary of bucket results over multiple jobs.
-Their `bucket_span` equals the longest `bucket_span` of the jobs in question.
-The `overall_score` is the `top_n` average of the max `anomaly_score` per job
-within the overall bucket time interval.
-This means that you can fine-tune the `overall_score` so that it is more
-or less sensitive to the number of jobs that detect an anomaly at the same time.
-
-An overall bucket resource has the following properties:
-
-`timestamp`::
- (date) The start time of the overall bucket.
-
-`bucket_span`::
- (number) The length of the bucket in seconds. Matches the longest
- `bucket_span` of the jobs in question.
-
-`overall_score`::
- (number) The `top_n` average of the max bucket `anomaly_score` per job.
-
-`jobs`::
- (array) An array of objects that contain the `max_anomaly_score` per `job_id`.
-
-`is_interim`::
- (boolean) If true, this is an interim result. In other words, the overall
- bucket is calculated based on partial input data.
-
-`result_type`::
- (string) Internal. This is always set to `overall_bucket`.
diff --git a/docs/reference/redirects.asciidoc b/docs/reference/redirects.asciidoc
index 9a54ea8b645..3baf2c6ed07 100644
--- a/docs/reference/redirects.asciidoc
+++ b/docs/reference/redirects.asciidoc
@@ -498,3 +498,20 @@ the details in <>.
 This page was deleted.
 [[ml-snapshot-stats]]
 See <> and <>.
+
+[role="exclude",id="ml-results-resource"]
+=== Results resources
+
+This page was deleted.
+[[ml-results-buckets]]
+See <>,
+[[ml-results-bucket-influencers]]
+<>,
+[[ml-results-influencers]]
+<>,
+[[ml-results-records]]
+<>,
+[[ml-results-categories]]
+<>, and
+[[ml-results-overall-buckets]]
+<>.
diff --git a/docs/reference/rest-api/defs.asciidoc b/docs/reference/rest-api/defs.asciidoc
index 8767d3ed481..ce5fe0f8d35 100644
--- a/docs/reference/rest-api/defs.asciidoc
+++ b/docs/reference/rest-api/defs.asciidoc
@@ -6,9 +6,7 @@ These resource definitions are used in APIs related to {ml-features} and
 {security-features} and in {kib} advanced {ml} job configuration options.
 
 * <>
-* <>
 * <>
 
 include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
 include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
-include::{es-repo-dir}/ml/anomaly-detection/apis/resultsresource.asciidoc[]
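The overall bucket scoring removed above is defined as the `top_n` average of the per-job maximum `anomaly_score` within the overall bucket. A minimal sketch of that arithmetic, using illustrative per-job scores (not real results):

```python
# Sketch of the documented overall_score calculation: the average of the
# top_n largest per-job max anomaly scores in the overall bucket.
# Sample scores are illustrative.
def overall_score(max_scores_per_job, top_n=1):
    """Average of the top_n largest per-job max anomaly scores."""
    top = sorted(max_scores_per_job, reverse=True)[:top_n]
    return sum(top) / len(top)

per_job_max = [88.0, 42.0, 15.0]
print(overall_score(per_job_max, top_n=1))  # 88.0
print(overall_score(per_job_max, top_n=2))  # 65.0
```

This shows why `top_n` tunes sensitivity: with `top_n=1` a single anomalous job drives the score, while larger `top_n` values require several jobs to be anomalous at the same time.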