From 019b1f7ece66fc319d5fc1d3b222620fbc328917 Mon Sep 17 00:00:00 2001
From: Lisa Cawley
Date: Tue, 25 Apr 2017 08:15:53 -0700
Subject: [PATCH] [DOCS] Fix doc build errors for elastic/x-pack-elasticsearch#1197 (elastic/x-pack-elasticsearch#1199)

Original commit: elastic/x-pack-elasticsearch@30f69513ab34f9a7cae6b5bd9fc78d1f2a1aca5c
---
 docs/en/rest-api/ml/resultsresource.asciidoc | 52 ++++++++++----------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/docs/en/rest-api/ml/resultsresource.asciidoc b/docs/en/rest-api/ml/resultsresource.asciidoc
index e2ccddac536..2e3bff309e8 100644
--- a/docs/en/rest-api/ml/resultsresource.asciidoc
+++ b/docs/en/rest-api/ml/resultsresource.asciidoc
@@ -7,7 +7,7 @@ Anomaly results for _buckets_, _influencers_ and _records_ can be queried using
 These results are written for every `bucket_span`, with the timestamp being the start of the time interval.
 
-As part of the results, scores are calculated for each anomaly result type and each bucket interval. 
+As part of the results, scores are calculated for each anomaly result type and each bucket interval.
 These are aggregated in order to reduce noise, and normalized in order to identify and rank the most mathematically significant anomalies.
 
 Bucket results provide the top level, overall view of the job and are ideal for alerting on.
@@ -16,17 +16,17 @@ This is a summary of all the anomalies, pinpointing when they occurred.
 
 Influencer results show which entities were anomalous and when.
 For example, at 16:05 `user_name: Bob` was unusual.
-This is a summary of all anomalies for each entity, so there can be a lot of these results. 
-Once you have identified a noteable bucket time, you can look to see which entites were significant. 
+This is a summary of all anomalies for each entity, so there can be a lot of these results.
+Once you have identified a notable bucket time, you can look to see which entities were significant.
 Record results provide the detail showing what the individual anomaly was, when it occurred and which entity was involved.
 For example, at 16:05 Bob sent 837262434 bytes, when the typical value was 1067 bytes.
-Once you have identifed a bucket time and/or a significant entity, you can drill through to the record results
+Once you have identified a bucket time and/or a significant entity, you can drill through to the record results
 in order to investigate the anomalous behavior.
 
//TBD Add links to categorization
 Categorization results contain the definitions of _categories_ that have been identified.
-These are only applicable for jobs that are configured to analyze unstructured log data using categorization. 
+These are only applicable for jobs that are configured to analyze unstructured log data using categorization.
 These results do not contain a timestamp or any calculated scores.
 
 * <>
@@ -43,7 +43,7 @@ Bucket results provide the top level, overall view of the job and are best for a
 Each bucket has an `anomaly_score`, which is a statistically aggregated and normalized view of the combined anomalousness of all record results within each bucket.
 
-One bucket result is written for each `bucket_span` for each job, even if it is not considered to be anomalous 
+One bucket result is written for each `bucket_span` for each job, even if it is not considered to be anomalous
 (when it will have an `anomaly_score` of zero).
 
 Upon identifying an anomalous bucket, you can investigate further by either
@@ -71,7 +71,7 @@ A bucket resource has the following properties:
 `initial_anomaly_score`::
   (number) The maximum `anomaly_score` for any of the bucket influencers.
-  This is this initial value calculated at the time the bucket was processed. 
+  This is the initial value calculated at the time the bucket was processed.
 
 `is_interim`::
   (boolean) If true, then this bucket result is an interim result.
@@ -91,20 +91,18 @@ A bucket resource has the following properties:
 `timestamp`::
   (date) The start time of the bucket. This timestamp uniquely identifies the bucket.
 +
-+
---
+
 NOTE: Events that occur exactly at the timestamp of the bucket are included in the results for the bucket.
---
 
 [float]
 [[ml-results-bucket-influencers]]
-====== Bucket Influencers
+===== Bucket Influencers
 
 Bucket influencer results are available as nested objects contained within bucket results.
-These results are an aggregation for each the type of influencer. 
-For example if both client_ip and user_name were specified as influencers, 
+These results are an aggregation for each type of influencer.
+For example, if both client_ip and user_name were specified as influencers,
 then you would be able to find when client_ip's or user_name's were collectively anomalous.
 
 There is a built-in bucket influencer called `bucket_time` which is always available.
@@ -125,7 +123,7 @@ An bucket influencer object has the following properties:
 `initial_anomaly_score`::
   (number) The score between 0-100 for each bucket influencers.
-  This is this initial value calculated at the time the bucket was processed. 
+  This is the initial value calculated at the time the bucket was processed.
 
 `influencer_field_name`::
   (string) The field name of the influencer. For example `client_ip` or `user_name`.
@@ -170,7 +168,7 @@ For jobs with more than one detector, this gives a powerful view of the most ano
 For example, if analyzing unusual bytes sent and unusual domains visited, if user_name was specified as the influencer,
 then an 'influencer_score' for each anomalous user_name would be written per bucket.
-E.g. If `user_name: Bob` had an `influencer_score` > 75, 
+For example, if `user_name: Bob` had an `influencer_score` > 75,
 then `Bob` would be considered very anomalous during this time interval in either or both of those attack vectors.
 One `influencer` result is written per bucket for each influencer that is considered anomalous.
@@ -186,13 +184,13 @@ An influencer object has the following properties:
   This value matches the `bucket_span` that is specified in the job.
 
 `influencer_score`::
-  (number) A normalized score between 0-100, based on the probability of the influencer in this bucket, 
+  (number) A normalized score between 0-100, based on the probability of the influencer in this bucket,
   aggregated across detectors.
   Unlike `initial_influencer_score`, this value will be updated by a re-normalization process as new data is analyzed.
 
 `initial_influencer_score`::
   (number) A normalized score between 0-100, based on the probability of the influencer, aggregated across detectors.
-  This is this initial value calculated at the time the bucket was processed. 
+  This is the initial value calculated at the time the bucket was processed.
 
 `influencer_field_name`::
   (string) The field name of the influencer.
@@ -209,9 +207,9 @@ An influencer object has the following properties:
   (string) The unique identifier for the job that these results belong to.
 
 `probability`::
-  (number) The probability that the influencer has this behavior, in the range 0 to 1. 
+  (number) The probability that the influencer has this behavior, in the range 0 to 1.
   For example, 0.0000109783.
-  This value can be held to a high precision of over 300 decimal places, 
+  This value can be held to a high precision of over 300 decimal places,
   so the `influencer_score` is provided as a human-readable and friendly interpretation of this.
 // For example, 0.03 means 3%. This value is held to a high precision of over
 //300 decimal places. In scientific notation, a value of 3.24E-300 is highly
@@ -226,8 +224,8 @@ An influencer object has the following properties:
 `timestamp`::
   (date) The start time of the bucket for which these results have been calculated for.
 
-NOTE: Additional influencer properties are added, depending on the fields being analyzed. 
-For example, if analysing `user_name` as an influencer, then a field `user_name` would be added to the 
+NOTE: Additional influencer properties are added, depending on the fields being analyzed.
+For example, if analyzing `user_name` as an influencer, then a field `user_name` would be added to the
 result document.
 This allows easier filtering of the anomaly results.
@@ -278,7 +276,7 @@ A record object has the following properties:
   For scalability reasons, a maximum of the 10 most significant causes of the anomaly will be returned.
   As part of the core analytical modeling, these low-level anomaly records are aggregated for their parent over field record.
-  The causes resource contains similar elements to the record resource, 
+  The causes resource contains similar elements to the record resource,
   namely `actual`, `typical`, `*_field_name` and `*_field_value`.
   Probability and scores are not applicable to causes.
@@ -303,7 +301,7 @@ A record object has the following properties:
 `initial_record_score`::
   (number) A normalized score between 0-100, based on the probability of the anomalousness of this record.
-  This is this initial value calculated at the time the bucket was processed. 
+  This is the initial value calculated at the time the bucket was processed.
 
 `is_interim`::
   (boolean) If true, then this anomaly record is an interim result.
@@ -352,8 +350,8 @@ A record object has the following properties:
 `typical`::
   (array) The typical value for the bucket, according to analytical modeling.
 
-NOTE: Additional record properties are added, depending on the fields being analyzed. 
-For example, if analyzing `hostname` as a _by field_, then a field `hostname` would be added to the 
+NOTE: Additional record properties are added, depending on the fields being analyzed.
+For example, if analyzing `hostname` as a _by field_, then a field `hostname` would be added to the
 result document.
@@ -361,8 +359,8 @@ result document.
 This allows easier filtering of the anomaly results.
 
 [[ml-results-categories]]
 ===== Categories
 
-When `categorization_field_name` is specified in the job configuration, 
-it is possible to view the definitions of the resulting categories. 
+When `categorization_field_name` is specified in the job configuration,
+it is possible to view the definitions of the resulting categories.
 A category definition describes the common terms matched and contains examples of matched values.
 The anomaly results from a categorization analysis are available as _buckets_, _influencers_ and _records_ results.
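
The documentation changed by this patch notes that `probability` values can be held to over 300 decimal places (e.g. 3.24E-300) while the `*_score` fields are normalized to a human-readable 0-100 range. As a rough intuition for why such a transform is useful, here is a log-scale mapping sketch; this is purely illustrative and is not Elastic's actual normalization algorithm, which aggregates scores and re-normalizes them as new data is analyzed:

```python
import math

def illustrative_score(probability: float, floor: float = 1e-300) -> float:
    """Map a tiny probability onto a 0-100 scale via a log transform.

    Hypothetical sketch for intuition only: the real X-Pack ML normalizer
    works differently (aggregation plus re-normalization over the job's
    result history).
    """
    p = max(probability, floor)  # clamp so log10 is defined even for 0
    # -log10(p) grows as p shrinks; scale so p == floor maps to 100.
    return min(100.0, -math.log10(p) / -math.log10(floor) * 100.0)

# A probability like the documented example 0.0000109783 maps to a small
# score, while an extreme value like 3.24e-300 maps close to 100.
print(illustrative_score(0.0000109783))
print(illustrative_score(3.24e-300))
```

The point of the sketch is simply that a logarithmic rescaling turns probabilities spanning hundreds of orders of magnitude into a bounded, comparable score, which is why the docs describe `influencer_score` as a "human-readable and friendly interpretation" of `probability`.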