//lcawley Verified example output 2017-04-11
[[ml-results-resource]]
==== Results Resources

The results of a job are organized into _records_ and _buckets_.
The results are aggregated and normalized in order to identify the mathematically
significant anomalies.

When categorization is specified, the results also contain category definitions.

* <<ml-results-records,Records>>
* <<ml-results-influencers,Influencers>>
* <<ml-results-buckets,Buckets>>
* <<ml-results-categories,Categories>>

[float]
[[ml-results-records]]
===== Records

Records contain the analytic results. They detail the anomalous activity that
has been identified in the input data based upon the detector configuration.
For example, if you are looking for unusually large data transfers,
an anomaly record would identify the source IP address, the destination,
the time window during which it occurred, the expected and actual size of the
transfer, and the probability of this occurring.
Something that is highly improbable is therefore highly anomalous.

There can be many anomaly records depending upon the characteristics and size
of the input data; in practice, there are often too many to process manually.
The {xpack} {ml} features therefore perform a sophisticated aggregation of
the anomaly records into buckets.

A record object has the following properties:

`actual`::
(number) The actual value for the bucket.

`bucket_span`::
(number) The length of the bucket in seconds.
This value matches the `bucket_span` that is specified in the job.

//`byFieldName`::
//TBD: This field did not appear in my results, but it might be a valid property.
// (string) The name of the analyzed field, if it was specified in the detector.

//`byFieldValue`::
//TBD: This field did not appear in my results, but it might be a valid property.
// (string) The value of `by_field_name`, if it was specified in the detector.

//`causes`
//TBD: This field did not appear in my results, but it might be a valid property.
// (array) If an over field was specified in the detector, this property
// contains an array of anomaly records that are the causes for the anomaly
// that has been identified for the over field.
// If no over fields exist, this field will not be present.
// This sub-resource contains the most anomalous records for the `over_field_name`.
// For scalability reasons, a maximum of the 10 most significant causes of
// the anomaly will be returned. As part of the core analytical modeling,
// these low-level anomaly records are aggregated for their parent over field record.
// The causes resource contains similar elements to the record resource,
// namely actual, typical, *FieldName and *FieldValue.
// Probability and scores are not applicable to causes.

`detector_index`::
(number) A unique identifier for the detector.

`field_name`::
(string) Certain functions require a field to operate on.
For those functions, this is the name of the field to be analyzed.

`function`::
(string) The function in which the anomaly occurs.

`function_description`::
(string) The description of the function in which the anomaly occurs, as
specified in the detector configuration information.

`influencers`::
(array) If `influencers` was specified in the detector configuration, then
this array contains influencers that contributed to or were to blame for an
anomaly.

`initial_record_score`::
(number) TBD. For example, 94.1386.

`is_interim`::
(boolean) If true, then this anomaly record is an interim result.
In other words, it is calculated based on partial input data.

`job_id`::
(string) A string that uniquely identifies the job.

//`kpi_indicator`::
// () TBD. For example, ["online_purchases"]
// I did not receive this in later tests. Is it still valid?

`partition_field_name`::
(string) The name of the partition field that was used in the analysis, if
such a field was specified in the detector.

//`overFieldName`::
// TBD: This field did not appear in my results, but it might be a valid property.
// (string) The name of the over field, if `over_field_name` was specified
// in the detector.

`partition_field_value`::
(string) The value of the partition field that was used in the analysis, if
`partition_field_name` was specified in the detector.

`probability`::
(number) The probability of the individual anomaly occurring.
This value is in the range 0 to 1. For example, 0.0000772031.
//This value is held to a high precision of over 300 decimal places.
//In scientific notation, a value of 3.24E-300 is highly unlikely and therefore
//highly anomalous.

`record_score`::
(number) An anomaly score for the bucket time interval.
The score is calculated based on a sophisticated aggregation of the anomalies
in the bucket.
//Use this score for rate-controlled alerting.

`result_type`::
(string) TBD. For example, "record".

`sequence_num`::
(number) TBD. For example, 1.

`timestamp`::
(date) The start time of the bucket that contains the record, expressed in
milliseconds since the epoch. For example, 1454020800000.

`typical`::
(number) The typical value for the bucket, according to analytical modeling.

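Because there can be too many anomaly records to inspect by hand, it can help to narrow them down by score. The following Python fragment is a minimal sketch of that idea, not part of the {ml} APIs: the sample records, the `significant_records` helper, and the threshold of 75 are all hypothetical, and only the property names are taken from the list above.

```python
# Hypothetical record objects. The property names come from the record
# properties described above; every value is illustrative only.
records = [
    {"timestamp": 1454020800000, "record_score": 94.1386,
     "probability": 0.0000772031, "is_interim": False},
    {"timestamp": 1454021100000, "record_score": 12.5,
     "probability": 0.02, "is_interim": False},
    {"timestamp": 1454021400000, "record_score": 88.0,
     "probability": 0.0003, "is_interim": True},
]


def significant_records(records, min_score=75.0):
    """Return final (non-interim) records whose record_score is at or
    above min_score, ordered from least probable to most probable."""
    selected = [
        r for r in records
        if not r["is_interim"] and r["record_score"] >= min_score
    ]
    return sorted(selected, key=lambda r: r["probability"])


for record in significant_records(records):
    print(record["timestamp"], record["record_score"], record["probability"])
```

Sorting by `probability` in ascending order puts the least probable, and therefore most anomalous, records first. Interim records are skipped here on the assumption that results based on partial input data might still change.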

[float]
[[ml-results-influencers]]
===== Influencers

Influencers are the entities that have contributed to, or are to blame for,
the anomalies. Influencers are given an anomaly score, which is calculated
based on the anomalies that have occurred in each bucket interval.
For jobs with more than one detector, this gives a powerful view of the most
anomalous entities.

Upon identifying an influencer with a high score, you can investigate further
by accessing the records resource for that bucket and enumerating the anomaly
records that contain this influencer.

An influencer object has the following properties:

`bucket_span`::
(number) TBD. For example, 300.

// Same as for buckets? i.e. (unsigned integer) The length of the bucket in seconds.
// This value is equal to the `bucket_span` value in the job configuration.

`influencer_score`::
(number) An anomaly score for the influencer in this bucket time interval.
The score is calculated based upon a sophisticated aggregation of the anomalies
in the bucket for this entity. For example: 94.1386.

`initial_influencer_score`::
(number) TBD. For example, 83.3831.

`influencer_field_name`::
(string) The field name of the influencer.

`influencer_field_value`::
(string) The entity that influenced, contributed to, or was to blame for the
anomaly.

`is_interim`::
(boolean) If true, then this is an interim result.
In other words, it is calculated based on partial input data.

`job_id`::
(string) A string that uniquely identifies the job.

`kpi_indicator`::
(string) TBD. For example, "online_purchases".

`probability`::
(number) The probability that the influencer has this behavior.
This value is in the range 0 to 1. For example, 0.0000109783.
// For example, 0.03 means 3%. This value is held to a high precision of over
//300 decimal places. In scientific notation, a value of 3.24E-300 is highly
//unlikely and therefore highly anomalous.

`result_type`::
(string) TBD. For example, "influencer".

`sequence_num`::
(number) TBD. For example, 2.

`timestamp`::
(date) Influencers are produced in buckets. This value is the start time
of the bucket, expressed in milliseconds since the epoch.
For example, 1454943900000.

A bucket influencer object has the following properties:

`anomaly_score`::
(number) TBD
//It is unclear how this differs from the influencer_score.
//An anomaly score for the influencer in this bucket time interval.
//The score is calculated based upon a sophisticated aggregation of the anomalies
//in the bucket for this entity. For example: 94.1386.

`bucket_span`::
(number) TBD. For example, 300.
////
// Same as for buckets? i.e. (unsigned integer) The length of the bucket in seconds.
// This value is equal to the `bucket_span` value in the job configuration.
////
`initial_anomaly_score`::
(number) TBD. For example, 83.3831.

`influencer_field_name`::
(string) The field name of the influencer.

`is_interim`::
(boolean) If true, then this is an interim result.
In other words, it is calculated based on partial input data.

`job_id`::
(string) A string that uniquely identifies the job.

`probability`::
(number) The probability that the influencer has this behavior.
This value is in the range 0 to 1. For example, 0.0000109783.
// For example, 0.03 means 3%. This value is held to a high precision of over
//300 decimal places. In scientific notation, a value of 3.24E-300 is highly
//unlikely and therefore highly anomalous.

`raw_anomaly_score`::
(number) TBD. For example, 2.32119.

`result_type`::
(string) TBD. For example, "bucket_influencer".

`sequence_num`::
(number) TBD. For example, 2.

`timestamp`::
(date) Influencers are produced in buckets. This value is the start time
of the bucket, expressed in milliseconds since the epoch.
For example, 1454943900000.

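To find the entities worth investigating first, influencer objects can be ranked by score. The following Python fragment is a rough sketch under stated assumptions: the sample influencers (including the `user` field and its values) and the `top_influencers` helper are hypothetical, and taking the highest `influencer_score` per entity across buckets is just one possible way to summarize them.

```python
# Hypothetical influencer objects. The property names follow the influencer
# properties described above; the field name "user" and all values are
# illustrative only.
influencers = [
    {"influencer_field_name": "user", "influencer_field_value": "alice",
     "influencer_score": 94.1386, "timestamp": 1454943900000},
    {"influencer_field_name": "user", "influencer_field_value": "bob",
     "influencer_score": 40.2, "timestamp": 1454943900000},
    {"influencer_field_name": "user", "influencer_field_value": "alice",
     "influencer_score": 83.3831, "timestamp": 1454944200000},
]


def top_influencers(influencers, limit=10):
    """Return ((field_name, field_value), best_score) pairs, keeping the
    highest influencer_score seen for each entity, highest score first."""
    best = {}
    for inf in influencers:
        key = (inf["influencer_field_name"], inf["influencer_field_value"])
        best[key] = max(best.get(key, 0.0), inf["influencer_score"])
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


for (field_name, field_value), score in top_influencers(influencers):
    print(field_name, field_value, score)
```

Once an entity with a high score is identified this way, its `timestamp` values indicate which buckets to look at when enumerating the related anomaly records.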

[float]
[[ml-results-buckets]]
===== Buckets

Buckets are the grouped and time-ordered view of the job results.
A bucket time interval is defined by `bucket_span`, which is specified in the
job configuration.

Each bucket has an `anomaly_score`, which is a statistically aggregated and
normalized view of the combined anomalousness of the records. You can use this
score for rate-controlled alerting.

//TBD: Still correct?
//Each bucket also has a maxNormalizedProbability that is equal to the highest
//normalizedProbability of the records within the bucket. This gives an indication
// of the most anomalous event that has occurred within the time interval.
//Unlike anomalyScore this does not take into account the number of correlated
//anomalies that have happened.
Upon identifying an anomalous bucket, you can investigate further by either
expanding the bucket resource to show the records as nested objects or by
accessing the records resource directly and filtering upon date range.

A bucket resource has the following properties:

`anomaly_score`::
(number) The aggregated and normalized anomaly score.
All the anomaly records in the bucket contribute to this score.

`bucket_influencers`::
(array) An array of bucket influencer objects.
For more information, see <<ml-results-influencers,Influencers>>.

`bucket_span`::
(unsigned integer) The length of the bucket in seconds. This value is
equal to the `bucket_span` value in the job configuration.

`event_count`::
(unsigned integer) The number of input data records processed in this bucket.

`initial_anomaly_score`::
(number) The value of `anomaly_score` at the time the bucket result was
created. This score is normalized based on the data that had been seen at that
time; it is not re-normalized and therefore is not adjusted for more recent data.
//TBD. This description is unclear.

`is_interim`::
(boolean) If true, then this bucket result is an interim result.
In other words, it is calculated based on partial input data.

`job_id`::
(string) A string that uniquely identifies the job.

`partition_scores`::
(array) TBD. For example, [].

`processing_time_ms`::
(unsigned integer) The time in milliseconds taken to analyze the bucket
contents and produce results.

`record_count`::
(unsigned integer) The number of anomaly records in this bucket.

`result_type`::
(string) TBD. For example, "bucket".

`timestamp`::
(date) The start time of the bucket, expressed in milliseconds since the epoch.
For example, 1454020800000. This timestamp uniquely identifies the bucket. +
+
--
NOTE: Events that occur exactly at the timestamp of the bucket are included in
the results for the bucket.

--

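A bucket's `timestamp` and `bucket_span` together describe the time interval that the bucket covers, which is useful when filtering the records resource by date range. The following Python fragment is a minimal sketch of that calculation. It assumes that `timestamp` values such as 1454020800000 are milliseconds since the epoch and that `bucket_span` is in seconds; the sample buckets, the helper names, and the threshold of 75 are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical bucket objects. The property names follow the bucket
# properties described above; all values are illustrative only.
buckets = [
    {"timestamp": 1454020800000, "bucket_span": 300,
     "anomaly_score": 94.1386, "is_interim": False},
    {"timestamp": 1454021100000, "bucket_span": 300,
     "anomaly_score": 3.2, "is_interim": False},
]


def bucket_window(bucket):
    """Return the (start, end) of the bucket as UTC datetimes.

    Assumes `timestamp` is in milliseconds since the epoch and
    `bucket_span` is in seconds.
    """
    start = datetime.fromtimestamp(bucket["timestamp"] / 1000, tz=timezone.utc)
    return start, start + timedelta(seconds=bucket["bucket_span"])


def anomalous_buckets(buckets, threshold=75.0):
    """Keep final (non-interim) buckets at or above the score threshold."""
    return [
        b for b in buckets
        if not b["is_interim"] and b["anomaly_score"] >= threshold
    ]


for bucket in anomalous_buckets(buckets):
    start, end = bucket_window(bucket)
    print(start.isoformat(), end.isoformat(), bucket["anomaly_score"])
```

The resulting start and end times can then serve as the date range when retrieving the records for an anomalous bucket.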

[float]
[[ml-results-categories]]
===== Categories

When `categorization_field_name` is specified in the job configuration, it is
possible to view the definitions of the resulting categories. A category
definition describes the common terms matched and contains examples of matched
values.

A category resource has the following properties:

`category_id`::
(unsigned integer) A unique identifier for the category.

`examples`::
(array) A list of examples of actual values that matched the category.

`job_id`::
(string) A string that uniquely identifies the job.

`max_matching_length`::
(unsigned integer) The maximum length of the fields that matched the
category.
//TBD: Still true? "The value is increased by 10% to enable matching for
//similar fields that have not been analyzed"

`regex`::
(string) A regular expression that is used to search for values that match
the category.

`terms`::
(string) A space-separated list of the common tokens that are matched in
values of the category.
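The `regex` and `terms` properties can be used to check whether a new value belongs to a category. The following Python fragment is a sketch only: the sample category (its regular expression, terms, and examples) and the helper names are hypothetical, and it assumes the regular expression is simple enough to behave the same way in Python's `re` module as in the engine that produced it.

```python
import re

# Hypothetical category object. The property names follow the category
# properties described above; all values are illustrative only.
category = {
    "category_id": 1,
    "regex": ".*?Node.+?started.*",
    "terms": "Node started",
    "examples": ["Node 1 started", "Node 2 started"],
}


def category_tokens(category):
    """Split the space-separated `terms` string into a list of tokens."""
    return category["terms"].split()


def matches_category(category, value):
    """Return True if the category's regular expression is found in value.

    Assumes the expression is compatible with Python's `re` syntax.
    """
    return re.search(category["regex"], value) is not None


print(category_tokens(category))
print(matches_category(category, "Node 3 started"))
print(matches_category(category, "Disk usage high"))
```

A quick sanity check for a category definition is that every entry in `examples` matches its own `regex`.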