[7.x][DOCS] Move machine learning results definitions into APIs (#50543)

This commit is contained in:
Lisa Cawley 2019-12-31 13:21:17 -08:00 committed by GitHub
parent f8eef43fc6
commit ab5a69d1e2
8 changed files with 486 additions and 642 deletions


@@ -40,99 +40,180 @@ bucket.
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`<timestamp>`::
(Optional, string) The timestamp of a single bucket result. If you do not
specify this parameter, the API returns information about all buckets.
[[ml-get-bucket-request-body]]
==== {api-request-body-title}
`anomaly_score`::
(Optional, double) Returns buckets with anomaly scores greater than or equal to
this value.
`desc`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results]
`end`::
(Optional, string) Returns buckets with timestamps earlier than this time.
`exclude_interim`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
`expand`::
(Optional, boolean) If true, the output includes anomaly records.
`page`.`from`::
(Optional, integer) Skips the specified number of buckets.
`page`.`size`::
(Optional, integer) Specifies the maximum number of buckets to obtain.
`sort`::
(Optional, string) Specifies the sort field for the requested buckets. By
default, the buckets are sorted by the `timestamp` field.
`start`::
(Optional, string) Returns buckets with timestamps after this time.
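As a sketch of how these request-body parameters fit together, the hypothetical helper below (not part of any Elasticsearch client) assembles the documented fields into a JSON-ready dict:

```python
# Hypothetical helper: build a get-buckets request body from the
# parameters documented above. Field names follow the API docs;
# the defaults chosen here are illustrative, not the server's.
def buckets_request(anomaly_score=None, start=None, end=None,
                    sort="timestamp", desc=False, from_=0, size=100):
    body = {"sort": sort, "desc": desc, "page": {"from": from_, "size": size}}
    if anomaly_score is not None:
        body["anomaly_score"] = anomaly_score
    if start is not None:
        body["start"] = start
    if end is not None:
        body["end"] = end
    return body

print(buckets_request(anomaly_score=80, start="1454530200001"))
```

The resulting dict can be serialized with `json.dumps` and sent as the request body.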
[[ml-get-bucket-results]]
==== {api-response-body-title}
The API returns an array of bucket objects, which have the following properties:
`anomaly_score`::
(number) The maximum anomaly score, between 0-100, for any of the bucket
influencers. This is an overall, rate-limited score for the job. All the anomaly
records in the bucket contribute to this score. This value might be updated as
new data is analyzed.
`bucket_influencers`::
(array) An array of bucket influencer objects, which have the following
properties:
`bucket_influencers`.`anomaly_score`:::
(number) A normalized score between 0-100, which is calculated for each bucket
influencer. This score might be updated as newer data is analyzed.
`bucket_influencers`.`bucket_span`:::
(number)
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
`bucket_influencers`.`initial_anomaly_score`:::
(number) The score between 0-100 for each bucket influencer. This score is the
initial value that was calculated at the time the bucket was processed.
`bucket_influencers`.`influencer_field_name`:::
(string) The field name of the influencer.
`bucket_influencers`.`influencer_field_value`:::
(string) The field value of the influencer.
`bucket_influencers`.`is_interim`:::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
`bucket_influencers`.`job_id`:::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`bucket_influencers`.`probability`:::
(number) The probability that the bucket has this behavior, in the range 0 to 1.
This value can be held to a high precision of over 300 decimal places, so the
`anomaly_score` is provided as a human-readable and friendly interpretation of
this.
`bucket_influencers`.`raw_anomaly_score`:::
(number) Internal.
`bucket_influencers`.`result_type`:::
(string) Internal. This value is always set to `bucket_influencer`.
`bucket_influencers`.`timestamp`:::
(date) The start time of the bucket for which these results were calculated.
`bucket_span`::
(number)
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
`event_count`::
(number) The number of input data records processed in this bucket.
`initial_anomaly_score`::
(number) The maximum `anomaly_score` for any of the bucket influencers. This is
the initial value that was calculated at the time the bucket was processed.
`is_interim`::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
`job_id`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`processing_time_ms`::
(number) The amount of time, in milliseconds, that it took to analyze the bucket
contents and calculate results.
`result_type`::
(string) Internal. This value is always set to `bucket`.
`timestamp`::
(date) The start time of the bucket. This timestamp uniquely identifies the
bucket.
+
--
NOTE: Events that occur exactly at the timestamp of the bucket are included in
the results for the bucket.
--
[[ml-get-bucket-example]]
==== {api-examples-title}
The following example gets bucket information for the `low_request_rate` job:
[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/low_request_rate/results/buckets
{
"anomaly_score": 80,
"start": "1454530200001"
}
--------------------------------------------------
// TEST[skip:Kibana sample data]
In this example, the API returns a single result that matches the specified
score and time constraints:
[source,js]
----
{
"count" : 1,
"buckets" : [
{
"job_id" : "low_request_rate",
"timestamp" : 1578398400000,
"anomaly_score" : 91.58505459594764,
"bucket_span" : 3600,
"initial_anomaly_score" : 91.58505459594764,
"event_count" : 0,
"is_interim" : false,
"bucket_influencers" : [
{
"job_id" : "low_request_rate",
"result_type" : "bucket_influencer",
"influencer_field_name" : "bucket_time",
"initial_anomaly_score" : 91.58505459594764,
"anomaly_score" : 91.58505459594764,
"raw_anomaly_score" : 0.5758246639716365,
"probability" : 1.7340849573442696E-4,
"timestamp" : 1578398400000,
"bucket_span" : 3600,
"is_interim" : false
}
],
"processing_time_ms" : 0,
"result_type" : "bucket"
}
]
}
----
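As a sketch of consuming this response, the hypothetical helper below filters the `buckets` array by `anomaly_score`; the sample response is hand-made in the shape of the example above:

```python
# Hypothetical helper, not part of any Elasticsearch client: extract
# (timestamp, anomaly_score) pairs for buckets at or above a threshold.
# Field names follow the get-buckets API response documented above.
def top_buckets(response, min_score=75.0):
    return [
        (b["timestamp"], b["anomaly_score"])
        for b in response.get("buckets", [])
        if b["anomaly_score"] >= min_score
    ]

# Hand-made sample in the documented shape:
response = {
    "count": 2,
    "buckets": [
        {"timestamp": 1578398400000, "anomaly_score": 91.6},
        {"timestamp": 1578402000000, "anomaly_score": 12.3},
    ],
}
print(top_buckets(response))  # [(1578398400000, 91.6)]
```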


@@ -28,45 +28,76 @@ privileges. See <<security-privileges>> and
[[ml-get-category-desc]]
==== {api-description-title}
When `categorization_field_name` is specified in the job configuration, it is
possible to view the definitions of the resulting categories. A category
definition describes the common terms matched and contains examples of matched
values.
The anomaly results from a categorization analysis are available as bucket,
influencer, and record results. For example, the results might indicate that
at 16:45 there was an unusual count of log message category 11. You can then
examine the description and examples of that category. For more information, see
{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
[[ml-get-category-path-parms]]
==== {api-path-parms-title}
`<job_id>`::
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`<category_id>`::
(Optional, long) Identifier for the category. If you do not specify this
parameter, the API returns information about all categories in the
{anomaly-job}.
[[ml-get-category-request-body]]
==== {api-request-body-title}
`page`.`from`::
(Optional, integer) Skips the specified number of categories.
`page`.`size`::
(Optional, integer) Specifies the maximum number of categories to obtain.
[[ml-get-category-results]]
==== {api-response-body-title}
The API returns an array of category objects, which have the following
properties:
`category_id`::
(unsigned integer) A unique identifier for the category.
`examples`::
(array) A list of examples of actual values that matched the category.
`grok_pattern`::
experimental[] (string) A Grok pattern that could be used in {ls} or an ingest
pipeline to extract fields from messages that match the category. This field is
experimental and may be changed or removed in a future release. The Grok
patterns that are found are not optimal, but are often a good starting point for
manual tweaking.
`job_id`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`max_matching_length`::
(unsigned integer) The maximum length of the fields that matched the category.
The value is increased by 10% to enable matching for similar fields that have
not been analyzed.
`regex`::
(string) A regular expression that is used to search for values that match the
category.
`terms`::
(string) A space separated list of the common tokens that are matched in values
of the category.
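As a sketch of how the `regex` field can be used, the snippet below tests whether a raw log line belongs to a category; the category object is hand-made sample data in the documented shape (the regex is the style of pattern the API returns for VMware ESX logs):

```python
import re

# Hand-made sample category in the shape documented above; not real API output.
category = {
    "category_id": 1,
    "regex": ".*?Vpxa.*?verbose.*?VpxaHalCnxHostagent.*?WaitForUpdatesDone",
    "terms": "Vpxa verbose VpxaHalCnxHostagent WaitForUpdatesDone",
}

# Hypothetical helper: a line matches the category when its regex matches.
def matches_category(line, category):
    return re.search(category["regex"], line) is not None

line = ("Vpxa: [49EC0B90 verbose 'VpxaHalCnxHostagent'] "
        "[WaitForUpdatesDone] Received callback")
print(matches_category(line, category))  # True
```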
[[ml-get-category-example]]
==== {api-examples-title}
The following example gets information about one category for the
`esxi_log` job:
[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/esxi_log/results/categories


@@ -23,6 +23,13 @@ need `read` index privilege on the index that stores the results. The
privileges. See <<security-privileges>> and
<<built-in-roles>>.
[[ml-get-influencer-desc]]
==== {api-description-title}
Influencers are the entities that have contributed to, or are to blame for,
the anomalies. Influencer results are available only if an
`influencer_field_name` is specified in the job configuration.
[[ml-get-influencer-path-parms]]
==== {api-path-parms-title}
@@ -34,75 +41,119 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
==== {api-request-body-title}
`desc`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results]
`end`::
(Optional, string) Returns influencers with timestamps earlier than this time.
`exclude_interim`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
`influencer_score`::
(Optional, double) Returns influencers with anomaly scores greater than or equal
to this value.
`page`.`from`::
(Optional, integer) Skips the specified number of influencers.
`page`.`size`::
(Optional, integer) Specifies the maximum number of influencers to obtain.
`sort`::
(Optional, string) Specifies the sort field for the requested influencers. By
default, the influencers are sorted by the `influencer_score` value.
`start`::
(Optional, string) Returns influencers with timestamps after this time.
[[ml-get-influencer-results]]
==== {api-response-body-title}
The API returns an array of influencer objects, which have the following
properties:
`bucket_span`::
(number)
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
`influencer_score`::
(number) A normalized score between 0-100, which is based on the probability of
the influencer in this bucket aggregated across detectors. Unlike
`initial_influencer_score`, this value will be updated by a re-normalization
process as new data is analyzed.
`influencer_field_name`::
(string) The field name of the influencer.
`influencer_field_value`::
(string) The entity that influenced, contributed to, or was to blame for the
anomaly.
`initial_influencer_score`::
(number) A normalized score between 0-100, which is based on the probability of
the influencer aggregated across detectors. This is the initial value that was
calculated at the time the bucket was processed.
`is_interim`::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
`job_id`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`probability`::
(number) The probability that the influencer has this behavior, in the range 0
to 1. This value can be held to a high precision of over 300 decimal places, so
the `influencer_score` is provided as a human-readable and friendly
interpretation of this.
`result_type`::
(string) Internal. This value is always set to `influencer`.
`timestamp`::
(date)
include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results]
NOTE: Additional influencer properties are added, depending on the fields being
analyzed. For example, if it's analyzing `user_name` as an influencer, then a
field `user_name` is added to the result document. This information enables you to
filter the anomaly results more easily.
[[ml-get-influencer-example]]
==== {api-examples-title}
The following example gets influencer information for the
`high_sum_total_sales` job:
[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/high_sum_total_sales/results/influencers
{
"sort": "influencer_score",
"desc": true
}
--------------------------------------------------
// TEST[skip:Kibana sample data]
In this example, the API returns the following information, sorted based on the
influencer score in descending order:
[source,js]
----
{
"count": 189,
"influencers": [
{
"job_id": "high_sum_total_sales",
"result_type": "influencer",
"influencer_field_name": "customer_full_name.keyword",
"influencer_field_value": "Wagdi Shaw",
"customer_full_name.keyword" : "Wagdi Shaw",
"influencer_score": 99.02493,
"initial_influencer_score" : 94.67233079580171,
"probability" : 1.4784807245686567E-10,
"bucket_span" : 3600,
"is_interim" : false,
"timestamp" : 1574661600000
},
...
]
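As a sketch of reducing these results client-side, the hypothetical helper below keeps the highest `influencer_score` seen per influencer value; the sample data is hand-made in the documented shape:

```python
from collections import defaultdict

# Hypothetical helper: highest influencer_score per influencer value.
# Field names follow the get-influencers API response documented above.
def max_score_per_influencer(influencers):
    best = defaultdict(float)
    for inf in influencers:
        value = inf["influencer_field_value"]
        best[value] = max(best[value], inf["influencer_score"])
    return dict(best)

# Hand-made sample data in the documented shape:
influencers = [
    {"influencer_field_value": "Wagdi Shaw", "influencer_score": 99.02},
    {"influencer_field_value": "Wagdi Shaw", "influencer_score": 42.10},
    {"influencer_field_value": "Abd Shaw", "influencer_score": 17.50},
]
print(max_score_per_influencer(influencers))
```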


@@ -66,45 +66,63 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection-wildcard-li
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-jobs]
`bucket_span`::
(Optional, string) The span of the overall buckets. Must be greater or equal to
the largest bucket span of the specified {anomaly-jobs}, which is the default
value.
`end`::
(Optional, string) Returns overall buckets with timestamps earlier than this
time.
`exclude_interim`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
+
--
If any of the job bucket results within the overall bucket interval are interim
results, the overall bucket results are interim results.
--
`overall_score`::
(Optional, double) Returns overall buckets with overall scores greater than or
equal to this value.
`start`::
(Optional, string) Returns overall buckets with timestamps after this time.
`top_n`::
(Optional, integer) The number of top {anomaly-job} bucket scores to be used in
the `overall_score` calculation. The default value is `1`.
[[ml-get-overall-buckets-results]]
==== {api-response-body-title}
The API returns an array of overall bucket objects, which have the following
properties:
`bucket_span`::
(number) The length of the bucket in seconds. Matches the job with the longest `bucket_span` value.
`is_interim`::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
`jobs`::
(array) An array of objects that contain the `max_anomaly_score` per `job_id`.
`overall_score`::
(number) The `top_n` average of the maximum bucket `anomaly_score` per job.
`result_type`::
(string) Internal. This is always set to `overall_bucket`.
`timestamp`::
(date)
include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results]
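The `overall_score` definition above can be sketched numerically: take the maximum bucket `anomaly_score` per job, keep the `top_n` highest of those, and average them. The helper below is a hypothetical illustration, not the server's implementation:

```python
# Hypothetical sketch of the documented overall_score: the top_n average
# of the maximum bucket anomaly_score per job.
def overall_score(max_scores_per_job, top_n=1):
    top = sorted(max_scores_per_job, reverse=True)[:top_n]
    return sum(top) / len(top)

# Three jobs whose max bucket anomaly_score in the interval is 80, 50, 20:
print(overall_score([80.0, 50.0, 20.0], top_n=1))  # 80.0
print(overall_score([80.0, 50.0, 20.0], top_n=2))  # 65.0
```

With `top_n=1` (the default) the overall score is simply the single worst job's maximum; raising `top_n` smooths the score across jobs.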
[[ml-get-overall-buckets-example]]
==== {api-examples-title}
The following example gets overall buckets for {anomaly-jobs} with IDs matching
`job-*`:
[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/job-*/results/overall_buckets


@@ -22,6 +22,22 @@ need `read` index privilege on the index that stores the results. The
`machine_learning_admin` and `machine_learning_user` roles provide these
privileges. See <<security-privileges>> and <<built-in-roles>>.
[[ml-get-record-desc]]
==== {api-description-title}
Records contain the detailed analytical results. They describe the anomalous
activity that has been identified in the input data based on the detector
configuration.
There can be many anomaly records depending on the characteristics and size of
the input data. In practice, there are often too many to be able to manually
process them. The {ml-features} therefore perform a sophisticated aggregation of
the anomaly records into buckets.
The number of record results depends on the number of anomalies found in each
bucket, which relates to the number of time series being modeled and the number
of detectors.
[[ml-get-record-path-parms]]
==== {api-path-parms-title}
@@ -33,83 +49,194 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
==== {api-request-body-title}
`desc`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results]
`end`::
(Optional, string) Returns records with timestamps earlier than this time.
`exclude_interim`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
`page`.`from`::
(Optional, integer) Skips the specified number of records.
`page`.`size`::
(Optional, integer) Specifies the maximum number of records to obtain.
`record_score`::
(Optional, double) Returns records with anomaly scores greater than or equal to
this value.
`sort`::
(Optional, string) Specifies the sort field for the requested records. By
default, the records are sorted by the `anomaly_score` value.
`start`::
(Optional, string) Returns records with timestamps after this time.
[[ml-get-record-results]]
==== {api-response-body-title}
The API returns an array of record objects, which have the following properties:
`actual`::
(array) The actual value for the bucket.
`bucket_span`::
(number)
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
`by_field_name`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=by-field-name]
`by_field_value`::
(string) The value of the by field.
`causes`::
(array) For population analysis, an over field must be specified in the detector.
This property contains an array of anomaly records that are the causes for the
anomaly that has been identified for the over field. If no over fields exist,
this field is not present. This sub-resource contains the most anomalous records
for the `over_field_name`. For scalability reasons, a maximum of the 10 most
significant causes of the anomaly are returned. As part of the core analytical
modeling, these low-level anomaly records are aggregated for their parent over
field record. The causes resource contains similar elements to the record
resource, namely `actual`, `typical`, `geo_results.actual_point`,
`geo_results.typical_point`, `*_field_name` and `*_field_value`. Probability and
scores are not applicable to causes.
`detector_index`::
(number)
include::{docdir}/ml/ml-shared.asciidoc[tag=detector-index]
`field_name`::
(string) Certain functions require a field to operate on, for example, `sum()`.
For those functions, this value is the name of the field to be analyzed.
`function`::
(string) The function in which the anomaly occurs, as specified in the
detector configuration. For example, `max`.
`function_description`::
(string) The description of the function in which the anomaly occurs, as
specified in the detector configuration.
`geo_results.actual_point`::
(string) The actual value for the bucket formatted as a `geo_point`. If the
detector function is `lat_long`, this is a comma delimited string of the
latitude and longitude.
`geo_results.typical_point`::
(string) The typical value for the bucket formatted as a `geo_point`. If the
detector function is `lat_long`, this is a comma delimited string of the
latitude and longitude.
`influencers`::
(array) If `influencers` was specified in the detector configuration, this array
contains influencers that contributed to or were to blame for an anomaly.
`initial_record_score`::
(number) A normalized score between 0-100, which is based on the probability of
the anomalousness of this record. This is the initial value that was calculated
at the time the bucket was processed.
`is_interim`::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
`job_id`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
`over_field_name`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=over-field-name]
`over_field_value`::
(string) The value of the over field.
`partition_field_name`::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=partition-field-name]
`partition_field_value`::
(string) The value of the partition field.
`probability`::
(number) The probability of the individual anomaly occurring, in the range `0`
to `1`. This value can be held to a high precision of over 300 decimal places,
so the `record_score` is provided as a human-readable and friendly
interpretation of this.
`multi_bucket_impact`::
(number) An indication of how strongly an anomaly is multi-bucket or
single-bucket. The value is on a scale of `-5.0` to `+5.0`, where `-5.0` means
the anomaly is purely single-bucket and `+5.0` means the anomaly is purely
multi-bucket.
`record_score`::
(number) A normalized score between 0-100, which is based on the probability of
the anomalousness of this record. Unlike `initial_record_score`, this value will
be updated by a re-normalization process as new data is analyzed.
`result_type`::
(string) Internal. This is always set to `record`.
`timestamp`::
(date)
include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results]
`typical`::
(array) The typical value for the bucket, according to analytical modeling.
NOTE: Additional record properties are added, depending on the fields being
analyzed. For example, if it's analyzing `hostname` as a _by field_, then a field
`hostname` is added to the result document. This information enables you to
filter the anomaly results more easily.
[[ml-get-record-example]]
==== {api-examples-title}
The following example gets record information for the `low_request_rate` job:
[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/low_request_rate/results/records
{
"sort": "record_score",
"desc": true,
"start": "1454944100000"
}
--------------------------------------------------
// TEST[skip:Kibana sample data]
In this example, the API returns four results for the specified time
constraints:
[source,js]
----
{
"count" : 4,
"records" : [
{
"job_id" : "low_request_rate",
"result_type" : "record",
"probability" : 1.3882308899968812E-4,
"multi_bucket_impact" : -5.0,
"record_score" : 94.98554565630553,
"initial_record_score" : 94.98554565630553,
"bucket_span" : 3600,
"detector_index" : 0,
"is_interim" : false,
"timestamp" : 1577793600000,
"function" : "low_count",
"function_description" : "count",
"typical" : [
28.254208230188834
],
"actual" : [
0.0
]
},
...
]
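As a sketch of consuming record results, the hypothetical helper below summarizes each record as the deviation of `actual` from `typical`; the sample data is hand-made in the documented shape (single-valued `actual`/`typical` arrays, as in the example above):

```python
# Hypothetical helper: (timestamp, actual - typical) per record.
# Assumes single-valued actual/typical arrays, as in the example above.
def deviations(records):
    out = []
    for r in records:
        actual = r["actual"][0]
        typical = r["typical"][0]
        out.append((r["timestamp"], actual - typical))
    return out

# Hand-made sample data in the documented shape:
records = [
    {"timestamp": 1577793600000, "typical": [28.25], "actual": [0.0]},
    {"timestamp": 1577797200000, "typical": [30.0], "actual": [5.0]},
]
print(deviations(records))  # [(1577793600000, -28.25), (1577797200000, -25.0)]
```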


@@ -1,479 +0,0 @@
[role="xpack"]
[testenv="platinum"]
[[ml-results-resource]]
=== Results resources
Several different result types are created for each job. You can query anomaly
results for _buckets_, _influencers_, and _records_ by using the results API.
Summarized bucket results over multiple jobs can be queried as well; those
results are called _overall buckets_.
Results are written for each `bucket_span`. The timestamp for the results is the
start of the bucket time interval.
The results include scores, which are calculated for each anomaly result type and
each bucket interval. These scores are aggregated in order to reduce noise, and
normalized in order to identify and rank the most mathematically significant
anomalies.
Bucket results provide the top level, overall view of the job and are ideal for
alerts. For example, the bucket results might indicate that at 16:05 the system
was unusual. This information is a summary of all the anomalies, pinpointing
when they occurred.
Influencer results show which entities were anomalous and when. For example,
the influencer results might indicate that at 16:05 `user_name: Bob` was unusual.
This information is a summary of all the anomalies for each entity, so there
can be a lot of these results. Once you have identified a notable bucket time,
you can look to see which entities were significant.
Record results provide details about what the individual anomaly was, when it
occurred and which entity was involved. For example, the record results might
indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was
1067 bytes. Once you have identified a bucket time and perhaps a significant
entity too, you can drill through to the record results in order to investigate
the anomalous behavior.
Categorization results contain the definitions of _categories_ that have been
identified. These are only applicable for jobs that are configured to analyze
unstructured log data using categorization. These results do not contain a
timestamp or any calculated scores. For more information, see
{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
* <<ml-results-buckets,Buckets>>
* <<ml-results-influencers,Influencers>>
* <<ml-results-records,Records>>
* <<ml-results-categories,Categories>>
* <<ml-results-overall-buckets,Overall Buckets>>
NOTE: All of these resources and properties are informational; you cannot
change their values.
[float]
[[ml-results-buckets]]
==== Buckets
Bucket results provide the top level, overall view of the job and are best for
alerting.
Each bucket has an `anomaly_score`, which is a statistically aggregated and
normalized view of the combined anomalousness of all the record results within
each bucket.
One bucket result is written for each `bucket_span` for each job, even if it is
not considered to be anomalous. If the bucket is not anomalous, it has an
`anomaly_score` of zero.
When you identify an anomalous bucket, you can investigate further by expanding
the bucket resource to show the records as nested objects. Alternatively, you
can access the records resource directly and filter by the date range.
A bucket resource has the following properties:
`anomaly_score`::
(number) The maximum anomaly score, between 0-100, for any of the bucket
influencers. This is an overall, rate-limited score for the job. All the
anomaly records in the bucket contribute to this score. This value might be
updated as new data is analyzed.
`bucket_influencers`::
(array) An array of bucket influencer objects.
For more information, see <<ml-results-bucket-influencers,Bucket Influencers>>.
`bucket_span`::
(number) The length of the bucket in seconds.
This value matches the `bucket_span` that is specified in the job.
`event_count`::
(number) The number of input data records processed in this bucket.
`initial_anomaly_score`::
(number) The maximum `anomaly_score` for any of the bucket influencers.
This is the initial value that was calculated at the time the bucket was
processed.
`is_interim`::
(boolean) If true, this is an interim result. In other words, the bucket
results are calculated based on partial input data.
`job_id`::
(string) The unique identifier for the job that these results belong to.
`processing_time_ms`::
(number) The amount of time, in milliseconds, that it took to analyze the
bucket contents and calculate results.
`result_type`::
(string) Internal. This value is always set to `bucket`.
`timestamp`::
(date) The start time of the bucket. This timestamp uniquely identifies the
bucket. +
NOTE: Events that occur exactly at the timestamp of the bucket are included in
the results for the bucket.
[float]
[[ml-results-bucket-influencers]]
==== Bucket Influencers
Bucket influencer results are available as nested objects contained within
bucket results. These results are an aggregation for each type of influencer.
For example, if both `client_ip` and `user_name` were specified as influencers,
then you would be able to determine when the `client_ip` or `user_name` values
were collectively anomalous.
There is a built-in bucket influencer called `bucket_time`, which is always
available. This bucket influencer is the aggregation of all records in the
bucket; it is not limited to any one type of influencer.
NOTE: A bucket influencer is a type of influencer. For example, `client_ip` or
`user_name` can be bucket influencers, whereas `192.168.88.2` and `Bob` are
influencers.
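Ranking influencer types by their aggregated score within a bucket can be
sketched as follows. The bucket influencer objects are illustrative; only the
`influencer_field_name` and `anomaly_score` properties described below are
assumed.

```python
# Illustrative bucket influencer objects for a single bucket.
bucket_influencers = [
    {"influencer_field_name": "bucket_time", "anomaly_score": 88.0},
    {"influencer_field_name": "client_ip", "anomaly_score": 92.5},
    {"influencer_field_name": "user_name", "anomaly_score": 40.1},
]

def top_influencer_types(bucket_influencers):
    """Rank influencer types by anomaly_score, highest first."""
    return sorted(
        ((bi["influencer_field_name"], bi["anomaly_score"])
         for bi in bucket_influencers),
        key=lambda pair: pair[1],
        reverse=True,
    )

print(top_influencer_types(bucket_influencers))
```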
A bucket influencer object has the following properties:
`anomaly_score`::
(number) A normalized score between 0-100, which is calculated for each bucket
influencer. This score might be updated as newer data is analyzed.
`bucket_span`::
(number) The length of the bucket in seconds. This value matches the `bucket_span`
that is specified in the job.
`initial_anomaly_score`::
(number) The score between 0-100 for each bucket influencer. This score is
the initial value that was calculated at the time the bucket was processed.
`influencer_field_name`::
(string) The field name of the influencer. For example `client_ip` or
`user_name`.
`influencer_field_value`::
(string) The field value of the influencer. For example `192.168.88.2` or
`Bob`.
`is_interim`::
(boolean) If true, this is an interim result. In other words, the bucket
influencer results are calculated based on partial input data.
`job_id`::
(string) The unique identifier for the job that these results belong to.
`probability`::
(number) The probability that the bucket has this behavior, in the range 0
to 1. For example, 0.0000109783. This value can be held to a high precision
of over 300 decimal places, so the `anomaly_score` is provided as a
human-readable and friendly interpretation of this.
`raw_anomaly_score`::
(number) Internal.
`result_type`::
(string) Internal. This value is always set to `bucket_influencer`.
`timestamp`::
(date) The start time of the bucket for which these results were calculated.
[float]
[[ml-results-influencers]]
==== Influencers
Influencers are the entities that have contributed to, or are to blame for,
the anomalies. Influencer results are available only if an
`influencer_field_name` is specified in the job configuration.
Influencers are given an `influencer_score`, which is calculated based on the
anomalies that have occurred in each bucket interval. For jobs with more than
one detector, this gives a powerful view of the most anomalous entities.
For example, if you are analyzing unusual bytes sent and unusual domains
visited and you specified `user_name` as the influencer, then an
`influencer_score` for each anomalous user name is written per bucket. For
example, if `user_name: Bob` had an `influencer_score` greater than 75, then
`Bob` would be considered very anomalous during this time interval in one or
both of those areas (unusual bytes sent or unusual domains visited).
One influencer result is written per bucket for each influencer that is
considered anomalous.
When you identify an influencer with a high score, you can investigate further
by accessing the records resource for that bucket and enumerating the anomaly
records that contain the influencer.
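Since one influencer result is written per bucket per anomalous influencer,
finding the most anomalous entities over a time range amounts to taking each
entity's peak `influencer_score`. A minimal sketch, with illustrative
influencer results for a job that uses `user_name` as the influencer:

```python
# Illustrative influencer results across two buckets.
influencer_results = [
    {"influencer_field_value": "Bob", "influencer_score": 83.7,
     "timestamp": 1454944800000},
    {"influencer_field_value": "Alice", "influencer_score": 12.4,
     "timestamp": 1454944800000},
    {"influencer_field_value": "Bob", "influencer_score": 41.0,
     "timestamp": 1454945700000},
]

def peak_scores(results):
    """Map each influencer entity to its maximum influencer_score."""
    peaks = {}
    for r in results:
        value = r["influencer_field_value"]
        peaks[value] = max(peaks.get(value, 0.0), r["influencer_score"])
    return peaks

print(peak_scores(influencer_results))
```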
An influencer object has the following properties:
`bucket_span`::
(number) The length of the bucket in seconds. This value matches the `bucket_span`
that is specified in the job.
`influencer_score`::
(number) A normalized score between 0-100, which is based on the probability
of the influencer in this bucket aggregated across detectors. Unlike
`initial_influencer_score`, this value will be updated by a re-normalization
process as new data is analyzed.
`initial_influencer_score`::
(number) A normalized score between 0-100, which is based on the probability
of the influencer aggregated across detectors. This is the initial value that
was calculated at the time the bucket was processed.
`influencer_field_name`::
(string) The field name of the influencer.
`influencer_field_value`::
(string) The entity that influenced, contributed to, or was to blame for the
anomaly.
`is_interim`::
(boolean) If true, this is an interim result. In other words, the influencer
results are calculated based on partial input data.
`job_id`::
(string) The unique identifier for the job that these results belong to.
`probability`::
(number) The probability that the influencer has this behavior, in the range
0 to 1. For example, 0.0000109783. This value can be held to a high precision
of over 300 decimal places, so the `influencer_score` is provided as a
human-readable and friendly interpretation of this.
`result_type`::
(string) Internal. This value is always set to `influencer`.
`timestamp`::
(date) The start time of the bucket for which these results were calculated.
NOTE: Additional influencer properties are added, depending on the fields
being analyzed. For example, if you are analyzing `user_name` as an
influencer, a `user_name` field is added to the result document. This
information enables you to filter the anomaly results more easily.
[float]
[[ml-results-records]]
==== Records
Records contain the detailed analytical results. They describe the anomalous
activity that has been identified in the input data based on the detector
configuration.
For example, if you are looking for unusually large data transfers, an anomaly
record can identify the source IP address, the destination, the time window
during which it occurred, the expected and actual size of the transfer, and the
probability of this occurrence.
There can be many anomaly records depending on the characteristics and size of
the input data. In practice, there are often too many to be able to manually
process them. The {ml-features} therefore perform a sophisticated
aggregation of the anomaly records into buckets.
The number of record results depends on the number of anomalies found in each
bucket, which relates to the number of time series being modeled and the number of
detectors.
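Turning a raw record into a human-readable summary can be sketched as below.
The record object is illustrative; which fields are present depends on the
detector configuration, as described in the property list that follows.

```python
# Illustrative anomaly record for a max(bytes_sent) detector split by
# client_ip; actual and typical are arrays, as documented below.
record = {
    "function": "max", "field_name": "bytes_sent",
    "by_field_name": "client_ip", "by_field_value": "192.168.66.2",
    "actual": [10662021.0], "typical": [9513.5],
    "record_score": 91.3,
}

def summarize(record):
    """One-line description of an anomaly record."""
    entity = f'{record["by_field_name"]}={record["by_field_value"]}'
    return (f'{record["function"]}({record["field_name"]}) for {entity}: '
            f'actual {record["actual"][0]}, typical {record["typical"][0]} '
            f'(record_score {record["record_score"]})')

print(summarize(record))
```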
A record object has the following properties:
`actual`::
(array) The actual value for the bucket.
`bucket_span`::
(number) The length of the bucket in seconds.
This value matches the `bucket_span` that is specified in the job.
`by_field_name`::
(string) The name of the analyzed field. This value is present only if
it is specified in the detector. For example, `client_ip`.
`by_field_value`::
(string) The value of `by_field_name`. This value is present only if
it is specified in the detector. For example, `192.168.66.2`.
`causes`::
(array) For population analysis, an over field must be specified in the
detector. This property contains an array of anomaly records that are the
causes for the anomaly that has been identified for the over field. If no
over fields exist, this field is not present. This sub-resource contains
the most anomalous records for the `over_field_name`. For scalability reasons,
a maximum of the 10 most significant causes of the anomaly are returned. As
part of the core analytical modeling, these low-level anomaly records are
aggregated for their parent over field record. The causes resource contains
similar elements to the record resource, namely `actual`, `typical`,
`geo_results.actual_point`, `geo_results.typical_point`,
`*_field_name` and `*_field_value`.
Probability and scores are not applicable to causes.
`detector_index`::
(number) A unique identifier for the detector.
`field_name`::
(string) Certain functions require a field to operate on, for example, `sum()`.
For those functions, this value is the name of the field to be analyzed.
`function`::
(string) The function in which the anomaly occurs, as specified in the
detector configuration. For example, `max`.
`function_description`::
(string) The description of the function in which the anomaly occurs, as
specified in the detector configuration.
`influencers`::
(array) If `influencers` was specified in the detector configuration, then
this array contains influencers that contributed to or were to blame for an
anomaly.
`initial_record_score`::
(number) A normalized score between 0-100, which is based on the
probability of the anomalousness of this record. This is the initial value
that was calculated at the time the bucket was processed.
`is_interim`::
(boolean) If true, this is an interim result. In other words, the anomaly
record is calculated based on partial input data.
`job_id`::
(string) The unique identifier for the job that these results belong to.
`over_field_name`::
(string) The name of the over field that was used in the analysis. This value
is present only if it was specified in the detector. Over fields are used
in population analysis. For example, `user`.
`over_field_value`::
(string) The value of `over_field_name`. This value is present only if it
was specified in the detector. For example, `Bob`.
`partition_field_name`::
(string) The name of the partition field that was used in the analysis. This
value is present only if it was specified in the detector. For example,
`region`.
`partition_field_value`::
(string) The value of `partition_field_name`. This value is present only if
it was specified in the detector. For example, `us-east-1`.
`probability`::
(number) The probability of the individual anomaly occurring, in the range
0 to 1. For example, 0.0000772031. This value can be held to a high precision
of over 300 decimal places, so the `record_score` is provided as a
human-readable and friendly interpretation of this.
`multi_bucket_impact`::
(number) An indication of how strongly an anomaly is multi-bucket or
single-bucket. The value is on a scale of -5 to +5, where -5 means the
anomaly is purely single-bucket and +5 means it is purely multi-bucket.
`record_score`::
(number) A normalized score between 0-100, which is based on the probability
of the anomalousness of this record. Unlike `initial_record_score`, this
value will be updated by a re-normalization process as new data is analyzed.
`result_type`::
(string) Internal. This is always set to `record`.
`timestamp`::
(date) The start time of the bucket for which these results were calculated.
`typical`::
(array) The typical value for the bucket, according to analytical modeling.
`geo_results.actual_point`::
(string) The actual value for the bucket formatted as a `geo_point`.
If the detector function is `lat_long`, this is a comma-delimited string
of the latitude and longitude.
`geo_results.typical_point`::
(string) The typical value for the bucket formatted as a `geo_point`.
If the detector function is `lat_long`, this is a comma-delimited string
of the latitude and longitude.
NOTE: Additional record properties are added, depending on the fields being
analyzed. For example, if you are analyzing `hostname` as a _by field_, a
`hostname` field is added to the result document. This information enables
you to filter the anomaly results more easily.
[float]
[[ml-results-categories]]
==== Categories
When `categorization_field_name` is specified in the job configuration, it is
possible to view the definitions of the resulting categories. A category
definition describes the common terms matched and contains examples of matched
values.
The anomaly results from a categorization analysis are available as bucket,
influencer, and record results. For example, the results might indicate that
at 16:45 there was an unusual count of log message category 11. You can then
examine the description and examples of that category.
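A category's `regex` can be applied to new messages, for example to route or
count log lines that belong to a known category. The category object below is
illustrative, not real API output; real patterns found by categorization are
typically longer.

```python
import re

# Illustrative category definition for a categorization job.
category = {
    "category_id": 11,
    "regex": r".*?Failed.+?password.+?for.+?user.*",
    "terms": "Failed password for user",
}

def matches_category(category, message):
    """True if the message matches the category's regular expression."""
    return re.search(category["regex"], message) is not None

msg = "Jan  1 00:00:01 host sshd[1234]: Failed password for user bob"
print(matches_category(category, msg))
```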
A category resource has the following properties:
`category_id`::
(unsigned integer) A unique identifier for the category.
`examples`::
(array) A list of examples of actual values that matched the category.
`grok_pattern`::
experimental[] (string) A Grok pattern that could be used in Logstash or an
Ingest Pipeline to extract fields from messages that match the category. This
field is experimental and may be changed or removed in a future release. The
Grok patterns that are found are not optimal, but are often a good starting
point for manual tweaking.
`job_id`::
(string) The unique identifier for the job that these results belong to.
`max_matching_length`::
(unsigned integer) The maximum length of the fields that matched the category.
The value is increased by 10% to enable matching for similar fields that have
not been analyzed.
`regex`::
(string) A regular expression that is used to search for values that match the
category.
`terms`::
(string) A space-separated list of the common tokens that are matched in
values of the category.
[float]
[[ml-results-overall-buckets]]
==== Overall Buckets
Overall buckets provide a summary of bucket results over multiple jobs.
Their `bucket_span` equals the longest `bucket_span` of the jobs in question.
The `overall_score` is the `top_n` average of the max `anomaly_score` per job
within the overall bucket time interval.
This means that you can fine-tune the `overall_score` so that it is more
or less sensitive to the number of jobs that detect an anomaly at the same time.
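The `overall_score` calculation described above can be sketched directly: take
the max `anomaly_score` per job within the overall bucket interval, then
average the `top_n` largest of those maxima. The per-job scores below are
illustrative.

```python
def overall_score(max_scores_per_job, top_n=1):
    """top_n average of the max bucket anomaly_score per job."""
    top = sorted(max_scores_per_job, reverse=True)[:top_n]
    return sum(top) / len(top)

# Max anomaly_score per job within one overall bucket interval (illustrative).
scores = [90.0, 10.0, 2.0]

# top_n=1 reacts to a single anomalous job; larger top_n requires several
# jobs to be anomalous at the same time before the overall score is high.
print(overall_score(scores, top_n=1))
print(overall_score(scores, top_n=3))
```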
An overall bucket resource has the following properties:
`timestamp`::
(date) The start time of the overall bucket.
`bucket_span`::
(number) The length of the bucket in seconds. Matches the `bucket_span`
of the job with the longest one.
`overall_score`::
(number) The `top_n` average of the max bucket `anomaly_score` per job.
`jobs`::
(array) An array of objects that contain the `max_anomaly_score` per `job_id`.
`is_interim`::
(boolean) If true, this is an interim result. In other words, the overall
bucket results are calculated based on partial input data.
`result_type`::
(string) Internal. This is always set to `overall_bucket`.


@ -498,3 +498,20 @@ the details in <<ml-get-job-stats>>.
This page was deleted.
[[ml-snapshot-stats]]
See <<ml-update-snapshot>> and <<ml-get-snapshot>>.
[role="exclude",id="ml-results-resource"]
=== Results resources
This page was deleted.
[[ml-results-buckets]]
See <<ml-get-bucket>>,
[[ml-results-bucket-influencers]]
<<ml-get-bucket>>,
[[ml-results-influencers]]
<<ml-get-influencer>>,
[[ml-results-records]]
<<ml-get-record>>,
[[ml-results-categories]]
<<ml-get-category>>, and
[[ml-results-overall-buckets]]
<<ml-get-overall-buckets>>.


@ -6,9 +6,7 @@ These resource definitions are used in APIs related to {ml-features} and
{security-features} and in {kib} advanced {ml} job configuration options.
* <<ml-dfa-analysis-objects>>
* <<ml-results-resource,{anomaly-detect-cap} results>>
* <<role-mapping-resources,Role mappings>>
include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
include::{es-repo-dir}/ml/anomaly-detection/apis/resultsresource.asciidoc[]