[7.x][DOCS] Move machine learning results definitions into APIs (#50543)
This commit is contained in:
parent
f8eef43fc6
commit
ab5a69d1e2
|
@ -40,99 +40,180 @@ bucket.
|
|||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`<timestamp>`::
|
||||
(Optional, string) The timestamp of a single bucket result. If you do not
|
||||
specify this parameter, the API returns information about all buckets.
|
||||
(Optional, string) The timestamp of a single bucket result. If you do not
|
||||
specify this parameter, the API returns information about all buckets.
|
||||
|
||||
[[ml-get-bucket-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`anomaly_score`::
|
||||
(Optional, double) Returns buckets with anomaly scores greater or equal than
|
||||
this value.
|
||||
(Optional, double) Returns buckets with anomaly scores greater or equal than
|
||||
this value.
|
||||
|
||||
`desc`::
|
||||
(Optional, boolean) If true, the buckets are sorted in descending order.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results]
|
||||
|
||||
`end`::
|
||||
(Optional, string) Returns buckets with timestamps earlier than this time.
|
||||
(Optional, string) Returns buckets with timestamps earlier than this time.
|
||||
|
||||
`exclude_interim`::
|
||||
(Optional, boolean) If true, the output excludes interim results. By default,
|
||||
interim results are included.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
|
||||
|
||||
`expand`::
|
||||
(Optional, boolean) If true, the output includes anomaly records.
|
||||
(Optional, boolean) If true, the output includes anomaly records.
|
||||
|
||||
`page`::
|
||||
`from`:::
|
||||
(Optional, integer) Skips the specified number of buckets.
|
||||
`size`:::
|
||||
(Optional, integer) Specifies the maximum number of buckets to obtain.
|
||||
`page`.`from`::
|
||||
(Optional, integer) Skips the specified number of buckets.
|
||||
|
||||
`page`.`size`::
|
||||
(Optional, integer) Specifies the maximum number of buckets to obtain.
|
||||
|
||||
`sort`::
|
||||
(Optional, string) Specifies the sort field for the requested buckets. By
|
||||
default, the buckets are sorted by the `timestamp` field.
|
||||
(Optional, string) Specifies the sort field for the requested buckets. By
|
||||
default, the buckets are sorted by the `timestamp` field.
|
||||
|
||||
`start`::
|
||||
(Optional, string) Returns buckets with timestamps after this time.
|
||||
(Optional, string) Returns buckets with timestamps after this time.
|
||||
|
||||
[[ml-get-bucket-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
The API returns an array of bucket objects, which have the following properties:
|
||||
|
||||
`buckets`::
|
||||
(array) An array of bucket objects. For more information, see
|
||||
<<ml-results-buckets,Buckets>>.
|
||||
`anomaly_score`::
|
||||
(number) The maximum anomaly score, between 0-100, for any of the bucket
|
||||
influencers. This is an overall, rate-limited score for the job. All the anomaly
|
||||
records in the bucket contribute to this score. This value might be updated as
|
||||
new data is analyzed.
|
||||
|
||||
`bucket_influencers`::
|
||||
(array) An array of bucket influencer objects, which have the following
|
||||
properties:
|
||||
|
||||
`bucket_influencers`.`anomaly_score`:::
|
||||
(number) A normalized score between 0-100, which is calculated for each bucket
|
||||
influencer. This score might be updated as newer data is analyzed.
|
||||
|
||||
`bucket_influencers`.`bucket_span`:::
|
||||
(number)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
|
||||
|
||||
`bucket_influencers`.`initial_anomaly_score`:::
|
||||
(number) The score between 0-100 for each bucket influencer. This score is the
|
||||
initial value that was calculated at the time the bucket was processed.
|
||||
|
||||
`bucket_influencers`.`influencer_field_name`:::
|
||||
(string) The field name of the influencer.
|
||||
|
||||
`bucket_influencers`.`influencer_field_value`:::
|
||||
(string) The field value of the influencer.
|
||||
|
||||
`bucket_influencers`.`is_interim`:::
|
||||
(boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
|
||||
|
||||
`bucket_influencers`.`job_id`:::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`bucket_influencers`.`probability`:::
|
||||
(number) The probability that the bucket has this behavior, in the range 0 to 1.
|
||||
This value can be held to a high precision of over 300 decimal places, so the
|
||||
`anomaly_score` is provided as a human-readable and friendly interpretation of
|
||||
this.
|
||||
|
||||
`bucket_influencers`.`raw_anomaly_score`:::
|
||||
(number) Internal.
|
||||
|
||||
`bucket_influencers`.`result_type`:::
|
||||
(string) Internal. This value is always set to `bucket_influencer`.
|
||||
|
||||
`bucket_influencers`.`timestamp`:::
|
||||
(date) The start time of the bucket for which these results were calculated.
|
||||
|
||||
`bucket_span`::
|
||||
(number)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
|
||||
|
||||
`event_count`::
|
||||
(number) The number of input data records processed in this bucket.
|
||||
|
||||
`initial_anomaly_score`::
|
||||
(number) The maximum `anomaly_score` for any of the bucket influencers. This is
|
||||
the initial value that was calculated at the time the bucket was processed.
|
||||
|
||||
`is_interim`::
|
||||
(boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
|
||||
|
||||
`job_id`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`processing_time_ms`::
|
||||
(number) The amount of time, in milliseconds, that it took to analyze the bucket
|
||||
contents and calculate results.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This value is always set to `bucket`.
|
||||
|
||||
`timestamp`::
|
||||
(date) The start time of the bucket. This timestamp uniquely identifies the
|
||||
bucket.
|
||||
+
|
||||
--
|
||||
NOTE: Events that occur exactly at the timestamp of the bucket are included in
|
||||
the results for the bucket.
|
||||
|
||||
--
|
||||
|
||||
[[ml-get-bucket-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets bucket information for the `it-ops-kpi` job:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/anomaly_detectors/it-ops-kpi/results/buckets
|
||||
GET _ml/anomaly_detectors/low_request_rate/results/buckets
|
||||
{
|
||||
"anomaly_score": 80,
|
||||
"start": "1454530200001"
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[skip:todo]
|
||||
// TEST[skip:Kibana sample data]
|
||||
|
||||
In this example, the API returns a single result that matches the specified
|
||||
score and time constraints:
|
||||
[source,js]
|
||||
----
|
||||
{
|
||||
"count": 1,
|
||||
"buckets": [
|
||||
"count" : 1,
|
||||
"buckets" : [
|
||||
{
|
||||
"job_id": "it-ops-kpi",
|
||||
"timestamp": 1454943900000,
|
||||
"anomaly_score": 94.1706,
|
||||
"bucket_span": 300,
|
||||
"initial_anomaly_score": 94.1706,
|
||||
"event_count": 153,
|
||||
"is_interim": false,
|
||||
"bucket_influencers": [
|
||||
"job_id" : "low_request_rate",
|
||||
"timestamp" : 1578398400000,
|
||||
"anomaly_score" : 91.58505459594764,
|
||||
"bucket_span" : 3600,
|
||||
"initial_anomaly_score" : 91.58505459594764,
|
||||
"event_count" : 0,
|
||||
"is_interim" : false,
|
||||
"bucket_influencers" : [
|
||||
{
|
||||
"job_id": "it-ops-kpi",
|
||||
"result_type": "bucket_influencer",
|
||||
"influencer_field_name": "bucket_time",
|
||||
"initial_anomaly_score": 94.1706,
|
||||
"anomaly_score": 94.1706,
|
||||
"raw_anomaly_score": 2.32119,
|
||||
"probability": 0.00000575042,
|
||||
"timestamp": 1454943900000,
|
||||
"bucket_span": 300,
|
||||
"is_interim": false
|
||||
"job_id" : "low_request_rate",
|
||||
"result_type" : "bucket_influencer",
|
||||
"influencer_field_name" : "bucket_time",
|
||||
"initial_anomaly_score" : 91.58505459594764,
|
||||
"anomaly_score" : 91.58505459594764,
|
||||
"raw_anomaly_score" : 0.5758246639716365,
|
||||
"probability" : 1.7340849573442696E-4,
|
||||
"timestamp" : 1578398400000,
|
||||
"bucket_span" : 3600,
|
||||
"is_interim" : false
|
||||
}
|
||||
],
|
||||
"processing_time_ms": 2,
|
||||
"partition_scores": [],
|
||||
"result_type": "bucket"
|
||||
"processing_time_ms" : 0,
|
||||
"result_type" : "bucket"
|
||||
}
|
||||
]
|
||||
}
|
||||
----
|
||||
----
|
|
@ -28,45 +28,76 @@ privileges. See <<security-privileges>> and
|
|||
[[ml-get-category-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
For more information about categories, see
|
||||
When `categorization_field_name` is specified in the job configuration, it is
|
||||
possible to view the definitions of the resulting categories. A category
|
||||
definition describes the common terms matched and contains examples of matched
|
||||
values.
|
||||
|
||||
The anomaly results from a categorization analysis are available as bucket,
|
||||
influencer, and record results. For example, the results might indicate that
|
||||
at 16:45 there was an unusual count of log message category 11. You can then
|
||||
examine the description and examples of that category. For more information, see
|
||||
{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
|
||||
|
||||
[[ml-get-category-path-parms]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<category_id>`::
|
||||
(Optional, long) Identifier for the category. If you do not specify this
|
||||
parameter, the API returns information about all categories in the {anomaly-job}.
|
||||
|
||||
`<job_id>`::
|
||||
(Required, string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`<category_id>`::
|
||||
(Optional, long) Identifier for the category. If you do not specify this
|
||||
parameter, the API returns information about all categories in the
|
||||
{anomaly-job}.
|
||||
|
||||
[[ml-get-category-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`page`::
|
||||
`from`:::
|
||||
(Optional, integer) Skips the specified number of categories.
|
||||
`size`:::
|
||||
(Optional, integer) Specifies the maximum number of categories to obtain.
|
||||
`page`.`from`::
|
||||
(Optional, integer) Skips the specified number of categories.
|
||||
|
||||
`page`.`size`::
|
||||
(Optional, integer) Specifies the maximum number of categories to obtain.
|
||||
|
||||
[[ml-get-category-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
The API returns an array of category objects, which have the following
|
||||
properties:
|
||||
|
||||
`categories`::
|
||||
(array) An array of category objects. For more information, see
|
||||
<<ml-results-categories,Categories>>.
|
||||
`category_id`::
|
||||
(unsigned integer) A unique identifier for the category.
|
||||
|
||||
`examples`::
|
||||
(array) A list of examples of actual values that matched the category.
|
||||
|
||||
`grok_pattern`::
|
||||
experimental[] (string) A Grok pattern that could be used in {ls} or an ingest
|
||||
pipeline to extract fields from messages that match the category. This field is
|
||||
experimental and may be changed or removed in a future release. The Grok
|
||||
patterns that are found are not optimal, but are often a good starting point for
|
||||
manual tweaking.
|
||||
|
||||
`job_id`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`max_matching_length`::
|
||||
(unsigned integer) The maximum length of the fields that matched the category.
|
||||
The value is increased by 10% to enable matching for similar fields that have
|
||||
not been analyzed.
|
||||
|
||||
`regex`::
|
||||
(string) A regular expression that is used to search for values that match the
|
||||
category.
|
||||
|
||||
`terms`::
|
||||
(string) A space separated list of the common tokens that are matched in values
|
||||
of the category.
|
||||
|
||||
[[ml-get-category-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets information about one category for the
|
||||
`esxi_log` job:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/anomaly_detectors/esxi_log/results/categories
|
||||
|
|
|
@ -23,6 +23,13 @@ need `read` index privilege on the index that stores the results. The
|
|||
privileges. See <<security-privileges>> and
|
||||
<<built-in-roles>>.
|
||||
|
||||
[[ml-get-influencer-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
Influencers are the entities that have contributed to, or are to blame for,
|
||||
the anomalies. Influencer results are available only if an
|
||||
`influencer_field_name` is specified in the job configuration.
|
||||
|
||||
[[ml-get-influencer-path-parms]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
|
@ -34,75 +41,119 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
|||
==== {api-request-body-title}
|
||||
|
||||
`desc`::
|
||||
(Optional, boolean) If true, the results are sorted in descending order.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results]
|
||||
|
||||
`end`::
|
||||
(Optional, string) Returns influencers with timestamps earlier than this time.
|
||||
(Optional, string) Returns influencers with timestamps earlier than this time.
|
||||
|
||||
`exclude_interim`::
|
||||
(Optional, boolean) If true, the output excludes interim results. By default,
|
||||
interim results are included.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
|
||||
|
||||
`influencer_score`::
|
||||
(Optional, double) Returns influencers with anomaly scores greater than or
|
||||
equal to this value.
|
||||
(Optional, double) Returns influencers with anomaly scores greater than or equal
|
||||
to this value.
|
||||
|
||||
`page`::
|
||||
`from`:::
|
||||
(Optional, integer) Skips the specified number of influencers.
|
||||
`size`:::
|
||||
(Optional, integer) Specifies the maximum number of influencers to obtain.
|
||||
`page`.`from`::
|
||||
(Optional, integer) Skips the specified number of influencers.
|
||||
|
||||
`page`.`size`::
|
||||
(Optional, integer) Specifies the maximum number of influencers to obtain.
|
||||
|
||||
`sort`::
|
||||
(Optional, string) Specifies the sort field for the requested influencers. By
|
||||
default, the influencers are sorted by the `influencer_score` value.
|
||||
(Optional, string) Specifies the sort field for the requested influencers. By
|
||||
default, the influencers are sorted by the `influencer_score` value.
|
||||
|
||||
`start`::
|
||||
(Optional, string) Returns influencers with timestamps after this time.
|
||||
(Optional, string) Returns influencers with timestamps after this time.
|
||||
|
||||
[[ml-get-influencer-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
The API returns an array of influencer objects, which have the following
|
||||
properties:
|
||||
|
||||
`influencers`::
|
||||
(array) An array of influencer objects.
|
||||
For more information, see <<ml-results-influencers,Influencers>>.
|
||||
`bucket_span`::
|
||||
(number)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
|
||||
|
||||
`influencer_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability of
|
||||
the influencer in this bucket aggregated across detectors. Unlike
|
||||
`initial_influencer_score`, this value will be updated by a re-normalization
|
||||
process as new data is analyzed.
|
||||
|
||||
`influencer_field_name`::
|
||||
(string) The field name of the influencer.
|
||||
|
||||
`influencer_field_value`::
|
||||
(string) The entity that influenced, contributed to, or was to blame for the
|
||||
anomaly.
|
||||
|
||||
`initial_influencer_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability of
|
||||
the influencer aggregated across detectors. This is the initial value that was
|
||||
calculated at the time the bucket was processed.
|
||||
|
||||
`is_interim`::
|
||||
(boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
|
||||
|
||||
`job_id`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`probability`::
|
||||
(number) The probability that the influencer has this behavior, in the range 0
|
||||
to 1. This value can be held to a high precision of over 300 decimal places, so
|
||||
the `influencer_score` is provided as a human-readable and friendly
|
||||
interpretation of this.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This value is always set to `influencer`.
|
||||
|
||||
`timestamp`::
|
||||
(date)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results]
|
||||
|
||||
NOTE: Additional influencer properties are added, depending on the fields being
|
||||
analyzed. For example, if it's analyzing `user_name` as an influencer, then a
|
||||
field `user_name` is added to the result document. This information enables you to
|
||||
filter the anomaly results more easily.
|
||||
|
||||
[[ml-get-influencer-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets influencer information for the `it_ops_new_kpi` job:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/anomaly_detectors/it_ops_new_kpi/results/influencers
|
||||
GET _ml/anomaly_detectors/high_sum_total_sales/results/influencers
|
||||
{
|
||||
"sort": "influencer_score",
|
||||
"desc": true
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[skip:todo]
|
||||
// TEST[skip:Kibana sample data]
|
||||
|
||||
In this example, the API returns the following information, sorted based on the
|
||||
influencer score in descending order:
|
||||
[source,js]
|
||||
----
|
||||
{
|
||||
"count": 28,
|
||||
"count": 189,
|
||||
"influencers": [
|
||||
{
|
||||
"job_id": "it_ops_new_kpi",
|
||||
"job_id": "high_sum_total_sales",
|
||||
"result_type": "influencer",
|
||||
"influencer_field_name": "kpi_indicator",
|
||||
"influencer_field_value": "online_purchases",
|
||||
"kpi_indicator": "online_purchases",
|
||||
"influencer_score": 94.1386,
|
||||
"initial_influencer_score": 94.1386,
|
||||
"probability": 0.000111612,
|
||||
"bucket_span": 600,
|
||||
"is_interim": false,
|
||||
"timestamp": 1454943600000
|
||||
"influencer_field_name": "customer_full_name.keyword",
|
||||
"influencer_field_value": "Wagdi Shaw",
|
||||
"customer_full_name.keyword" : "Wagdi Shaw",
|
||||
"influencer_score": 99.02493,
|
||||
"initial_influencer_score" : 94.67233079580171,
|
||||
"probability" : 1.4784807245686567E-10,
|
||||
"bucket_span" : 3600,
|
||||
"is_interim" : false,
|
||||
"timestamp" : 1574661600000
|
||||
},
|
||||
...
|
||||
]
|
||||
|
|
|
@ -66,45 +66,63 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection-wildcard-li
|
|||
include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-jobs]
|
||||
|
||||
`bucket_span`::
|
||||
(Optional, string) The span of the overall buckets. Must be greater or equal
|
||||
to the largest bucket span of the specified {anomaly-jobs}, which is the
|
||||
default value.
|
||||
(Optional, string) The span of the overall buckets. Must be greater or equal to
|
||||
the largest bucket span of the specified {anomaly-jobs}, which is the default
|
||||
value.
|
||||
|
||||
`end`::
|
||||
(Optional, string) Returns overall buckets with timestamps earlier than this
|
||||
time.
|
||||
(Optional, string) Returns overall buckets with timestamps earlier than this
|
||||
time.
|
||||
|
||||
`exclude_interim`::
|
||||
(Optional, boolean) If `true`, the output excludes interim overall buckets.
|
||||
Overall buckets are interim if any of the job buckets within the overall
|
||||
bucket interval are interim. By default, interim results are included.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
|
||||
+
|
||||
--
|
||||
If any of the job bucket results within the overall bucket interval are interim
|
||||
results, the overall bucket results are interim results.
|
||||
--
|
||||
|
||||
`overall_score`::
|
||||
(Optional, double) Returns overall buckets with overall scores greater or
|
||||
equal than this value.
|
||||
(Optional, double) Returns overall buckets with overall scores greater than or
|
||||
equal to this value.
|
||||
|
||||
`start`::
|
||||
(Optional, string) Returns overall buckets with timestamps after this time.
|
||||
(Optional, string) Returns overall buckets with timestamps after this time.
|
||||
|
||||
`top_n`::
|
||||
(Optional, integer) The number of top {anomaly-job} bucket scores to be used
|
||||
in the `overall_score` calculation. The default value is `1`.
|
||||
(Optional, integer) The number of top {anomaly-job} bucket scores to be used in
|
||||
the `overall_score` calculation. The default value is `1`.
|
||||
|
||||
[[ml-get-overall-buckets-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
The API returns an array of overall bucket objects, which have the following
|
||||
properties:
|
||||
|
||||
`overall_buckets`::
|
||||
(array) An array of overall bucket objects. For more information, see
|
||||
<<ml-results-overall-buckets,Overall Buckets>>.
|
||||
`bucket_span`::
|
||||
(number) The length of the bucket in seconds. Matches the job with the longest `bucket_span` value.
|
||||
|
||||
`is_interim`::
|
||||
(boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
|
||||
|
||||
`jobs`::
|
||||
(array) An array of objects that contain the `max_anomaly_score` per `job_id`.
|
||||
|
||||
`overall_score`::
|
||||
(number) The `top_n` average of the maximum bucket `anomaly_score` per job.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This is always set to `overall_bucket`.
|
||||
|
||||
`timestamp`::
|
||||
(date)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results]
|
||||
|
||||
[[ml-get-overall-buckets-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets overall buckets for {anomaly-jobs} with IDs matching
|
||||
`job-*`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/anomaly_detectors/job-*/results/overall_buckets
|
||||
|
|
|
@ -22,6 +22,22 @@ need `read` index privilege on the index that stores the results. The
|
|||
`machine_learning_admin` and `machine_learning_user` roles provide these
|
||||
privileges. See <<security-privileges>> and <<built-in-roles>>.
|
||||
|
||||
[[ml-get-record-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
Records contain the detailed analytical results. They describe the anomalous
|
||||
activity that has been identified in the input data based on the detector
|
||||
configuration.
|
||||
|
||||
There can be many anomaly records depending on the characteristics and size of
|
||||
the input data. In practice, there are often too many to be able to manually
|
||||
process them. The {ml-features} therefore perform a sophisticated aggregation of
|
||||
the anomaly records into buckets.
|
||||
|
||||
The number of record results depends on the number of anomalies found in each
|
||||
bucket, which relates to the number of time series being modeled and the number
|
||||
of detectors.
|
||||
|
||||
[[ml-get-record-path-parms]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
|
@ -33,83 +49,194 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
|||
==== {api-request-body-title}
|
||||
|
||||
`desc`::
|
||||
(Optional, boolean) If true, the results are sorted in descending order.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=desc-results]
|
||||
|
||||
`end`::
|
||||
(Optional, string) Returns records with timestamps earlier than this time.
|
||||
(Optional, string) Returns records with timestamps earlier than this time.
|
||||
|
||||
`exclude_interim`::
|
||||
(Optional, boolean) If true, the output excludes interim results. By default,
|
||||
interim results are included.
|
||||
(Optional, boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-interim-results]
|
||||
|
||||
`page`::
|
||||
`from`:::
|
||||
(Optional, integer) Skips the specified number of records.
|
||||
`size`:::
|
||||
(Optional, integer) Specifies the maximum number of records to obtain.
|
||||
`page`.`from`::
|
||||
(Optional, integer) Skips the specified number of records.
|
||||
|
||||
`page`.`size`::
|
||||
(Optional, integer) Specifies the maximum number of records to obtain.
|
||||
|
||||
`record_score`::
|
||||
(Optional, double) Returns records with anomaly scores greater or equal than
|
||||
this value.
|
||||
(Optional, double) Returns records with anomaly scores greater or equal than
|
||||
this value.
|
||||
|
||||
`sort`::
|
||||
(Optional, string) Specifies the sort field for the requested records. By
|
||||
default, the records are sorted by the `anomaly_score` value.
|
||||
(Optional, string) Specifies the sort field for the requested records. By
|
||||
default, the records are sorted by the `anomaly_score` value.
|
||||
|
||||
`start`::
|
||||
(Optional, string) Returns records with timestamps after this time.
|
||||
(Optional, string) Returns records with timestamps after this time.
|
||||
|
||||
[[ml-get-record-results]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The API returns the following information:
|
||||
The API returns an array of record objects, which have the following properties:
|
||||
|
||||
`records`::
|
||||
(array) An array of record objects. For more information, see
|
||||
<<ml-results-records>>.
|
||||
`actual`::
|
||||
(array) The actual value for the bucket.
|
||||
|
||||
`bucket_span`::
|
||||
(number)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span-results]
|
||||
|
||||
`by_field_name`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=by-field-name]
|
||||
|
||||
`by_field_value`::
|
||||
(string) The value of the by field.
|
||||
|
||||
`causes`::
|
||||
(array) For population analysis, an over field must be specified in the detector.
|
||||
This property contains an array of anomaly records that are the causes for the
|
||||
anomaly that has been identified for the over field. If no over fields exist,
|
||||
this field is not present. This sub-resource contains the most anomalous records
|
||||
for the `over_field_name`. For scalability reasons, a maximum of the 10 most
|
||||
significant causes of the anomaly are returned. As part of the core analytical
|
||||
modeling, these low-level anomaly records are aggregated for their parent over
|
||||
field record. The causes resource contains similar elements to the record
|
||||
resource, namely `actual`, `typical`, `geo_results.actual_point`,
|
||||
`geo_results.typical_point`, `*_field_name` and `*_field_value`. Probability and
|
||||
scores are not applicable to causes.
|
||||
|
||||
`detector_index`::
|
||||
(number)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=detector-index]
|
||||
|
||||
`field_name`::
|
||||
(string) Certain functions require a field to operate on, for example, `sum()`.
|
||||
For those functions, this value is the name of the field to be analyzed.
|
||||
|
||||
`function`::
|
||||
(string) The function in which the anomaly occurs, as specified in the
|
||||
detector configuration. For example, `max`.
|
||||
|
||||
`function_description`::
|
||||
(string) The description of the function in which the anomaly occurs, as
|
||||
specified in the detector configuration.
|
||||
|
||||
`geo_results.actual_point`::
|
||||
(string) The actual value for the bucket formatted as a `geo_point`. If the
|
||||
detector function is `lat_long`, this is a comma delimited string of the
|
||||
latitude and longitude.
|
||||
|
||||
`geo_results.typical_point`::
|
||||
(string) The typical value for the bucket formatted as a `geo_point`. If the
|
||||
detector function is `lat_long`, this is a comma delimited string of the
|
||||
latitude and longitude.
|
||||
|
||||
`influencers`::
|
||||
(array) If `influencers` was specified in the detector configuration, this array
|
||||
contains influencers that contributed to or were to blame for an anomaly.
|
||||
|
||||
`initial_record_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability of
|
||||
the anomalousness of this record. This is the initial value that was calculated
|
||||
at the time the bucket was processed.
|
||||
|
||||
`is_interim`::
|
||||
(boolean)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=is-interim]
|
||||
|
||||
`job_id`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
|
||||
|
||||
`over_field_name`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=over-field-name]
|
||||
|
||||
`over_field_value`::
|
||||
(string) The value of the over field.
|
||||
|
||||
`partition_field_name`::
|
||||
(string)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=partition-field-name]
|
||||
|
||||
`partition_field_value`::
|
||||
(string) The value of the partition field.
|
||||
|
||||
`probability`::
|
||||
(number) The probability of the individual anomaly occurring, in the range `0`
|
||||
to `1`. This value can be held to a high precision of over 300 decimal places,
|
||||
so the `record_score` is provided as a human-readable and friendly
|
||||
interpretation of this.
|
||||
|
||||
`multi_bucket_impact`::
|
||||
(number) an indication of how strongly an anomaly is multi bucket or single
|
||||
bucket. The value is on a scale of `-5.0` to `+5.0` where `-5.0` means the
|
||||
anomaly is purely single bucket and `+5.0` means the anomaly is purely multi
|
||||
bucket.
|
||||
|
||||
`record_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability of
|
||||
the anomalousness of this record. Unlike `initial_record_score`, this value will
|
||||
be updated by a re-normalization process as new data is analyzed.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This is always set to `record`.
|
||||
|
||||
`timestamp`::
|
||||
(date)
|
||||
include::{docdir}/ml/ml-shared.asciidoc[tag=timestamp-results]
|
||||
|
||||
`typical`::
|
||||
(array) The typical value for the bucket, according to analytical modeling.
|
||||
|
||||
NOTE: Additional record properties are added, depending on the fields being
|
||||
analyzed. For example, if it's analyzing `hostname` as a _by field_, then a field
|
||||
`hostname` is added to the result document. This information enables you to
|
||||
filter the anomaly results more easily.
|
||||
|
||||
[[ml-get-record-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
The following example gets record information for the `it-ops-kpi` job:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET _ml/anomaly_detectors/it-ops-kpi/results/records
|
||||
GET _ml/anomaly_detectors/low_request_rate/results/records
|
||||
{
|
||||
"sort": "record_score",
|
||||
"desc": true,
|
||||
"start": "1454944100000"
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[skip:todo]
|
||||
// TEST[skip:Kibana sample data]
|
||||
|
||||
In this example, the API returns twelve results for the specified
|
||||
time constraints:
|
||||
[source,js]
|
||||
----
|
||||
{
|
||||
"count": 12,
|
||||
"records": [
|
||||
"count" : 4,
|
||||
"records" : [
|
||||
{
|
||||
"job_id": "it-ops-kpi",
|
||||
"result_type": "record",
|
||||
"probability": 0.00000332668,
|
||||
"record_score": 72.9929,
|
||||
"initial_record_score": 65.7923,
|
||||
"bucket_span": 300,
|
||||
"detector_index": 0,
|
||||
"is_interim": false,
|
||||
"timestamp": 1454944200000,
|
||||
"function": "low_sum",
|
||||
"function_description": "sum",
|
||||
"typical": [
|
||||
1806.48
|
||||
"job_id" : "low_request_rate",
|
||||
"result_type" : "record",
|
||||
"probability" : 1.3882308899968812E-4,
|
||||
"multi_bucket_impact" : -5.0,
|
||||
"record_score" : 94.98554565630553,
|
||||
"initial_record_score" : 94.98554565630553,
|
||||
"bucket_span" : 3600,
|
||||
"detector_index" : 0,
|
||||
"is_interim" : false,
|
||||
"timestamp" : 1577793600000,
|
||||
"function" : "low_count",
|
||||
"function_description" : "count",
|
||||
"typical" : [
|
||||
28.254208230188834
|
||||
],
|
||||
"actual": [
|
||||
288
|
||||
],
|
||||
"field_name": "events_per_min"
|
||||
"actual" : [
|
||||
0.0
|
||||
]
|
||||
},
|
||||
...
|
||||
]
|
||||
|
|
|
@ -1,479 +0,0 @@
|
|||
[role="xpack"]
|
||||
[testenv="platinum"]
|
||||
[[ml-results-resource]]
|
||||
=== Results resources
|
||||
|
||||
Several different result types are created for each job. You can query anomaly
|
||||
results for _buckets_, _influencers_, and _records_ by using the results API.
|
||||
Summarized bucket results over multiple jobs can be queried as well; those
|
||||
results are called _overall buckets_.
|
||||
|
||||
Results are written for each `bucket_span`. The timestamp for the results is the
|
||||
start of the bucket time interval.
|
||||
|
||||
The results include scores, which are calculated for each anomaly result type and
|
||||
each bucket interval. These scores are aggregated in order to reduce noise, and
|
||||
normalized in order to identify and rank the most mathematically significant
|
||||
anomalies.
|
||||
|
||||
Bucket results provide the top level, overall view of the job and are ideal for
|
||||
alerts. For example, the bucket results might indicate that at 16:05 the system
|
||||
was unusual. This information is a summary of all the anomalies, pinpointing
|
||||
when they occurred.
|
||||
|
||||
Influencer results show which entities were anomalous and when. For example,
|
||||
the influencer results might indicate that at 16:05 `user_name: Bob` was unusual.
|
||||
This information is a summary of all the anomalies for each entity, so there
|
||||
can be a lot of these results. Once you have identified a notable bucket time,
|
||||
you can look to see which entities were significant.
|
||||
|
||||
Record results provide details about what the individual anomaly was, when it
|
||||
occurred and which entity was involved. For example, the record results might
|
||||
indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was
|
||||
1067 bytes. Once you have identified a bucket time and perhaps a significant
|
||||
entity too, you can drill through to the record results in order to investigate
|
||||
the anomalous behavior.
|
||||
|
||||
Categorization results contain the definitions of _categories_ that have been
|
||||
identified. These are only applicable for jobs that are configured to analyze
|
||||
unstructured log data using categorization. These results do not contain a
|
||||
timestamp or any calculated scores. For more information, see
|
||||
{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
|
||||
|
||||
* <<ml-results-buckets,Buckets>>
|
||||
* <<ml-results-influencers,Influencers>>
|
||||
* <<ml-results-records,Records>>
|
||||
* <<ml-results-categories,Categories>>
|
||||
* <<ml-results-overall-buckets,Overall Buckets>>
|
||||
|
||||
NOTE: All of these resources and properties are informational; you cannot
|
||||
change their values.
|
||||
|
||||
[float]
|
||||
[[ml-results-buckets]]
|
||||
==== Buckets
|
||||
|
||||
Bucket results provide the top level, overall view of the job and are best for
|
||||
alerting.
|
||||
|
||||
Each bucket has an `anomaly_score`, which is a statistically aggregated and
|
||||
normalized view of the combined anomalousness of all the record results within
|
||||
each bucket.
|
||||
|
||||
One bucket result is written for each `bucket_span` for each job, even if it is
|
||||
not considered to be anomalous. If the bucket is not anomalous, it has an
|
||||
`anomaly_score` of zero.
|
||||
|
||||
When you identify an anomalous bucket, you can investigate further by expanding
|
||||
the bucket resource to show the records as nested objects. Alternatively, you
|
||||
can access the records resource directly and filter by the date range.
|
||||
|
||||
A bucket resource has the following properties:
|
||||
|
||||
`anomaly_score`::
|
||||
(number) The maximum anomaly score, between 0-100, for any of the bucket
|
||||
influencers. This is an overall, rate-limited score for the job. All the
|
||||
anomaly records in the bucket contribute to this score. This value might be
|
||||
updated as new data is analyzed.
|
||||
|
||||
`bucket_influencers`::
|
||||
(array) An array of bucket influencer objects.
|
||||
For more information, see <<ml-results-bucket-influencers,Bucket Influencers>>.
|
||||
|
||||
`bucket_span`::
|
||||
(number) The length of the bucket in seconds.
|
||||
This value matches the `bucket_span` that is specified in the job.
|
||||
|
||||
`event_count`::
|
||||
(number) The number of input data records processed in this bucket.
|
||||
|
||||
`initial_anomaly_score`::
|
||||
(number) The maximum `anomaly_score` for any of the bucket influencers.
|
||||
This is the initial value that was calculated at the time the bucket was
|
||||
processed.
|
||||
|
||||
`is_interim`::
|
||||
(boolean) If true, this is an interim result. In other words, the bucket
|
||||
results are calculated based on partial input data.
|
||||
|
||||
`job_id`::
|
||||
(string) The unique identifier for the job that these results belong to.
|
||||
|
||||
`processing_time_ms`::
|
||||
(number) The amount of time, in milliseconds, that it took to analyze the
|
||||
bucket contents and calculate results.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This value is always set to `bucket`.
|
||||
|
||||
`timestamp`::
|
||||
(date) The start time of the bucket. This timestamp uniquely identifies the
|
||||
bucket. +
|
||||
|
||||
NOTE: Events that occur exactly at the timestamp of the bucket are included in
|
||||
the results for the bucket.
|
||||
|
||||
|
||||
[float]
|
||||
[[ml-results-bucket-influencers]]
|
||||
==== Bucket Influencers
|
||||
|
||||
Bucket influencer results are available as nested objects contained within
|
||||
bucket results. These results are an aggregation for each type of influencer.
|
||||
For example, if both `client_ip` and `user_name` were specified as influencers,
|
||||
then you would be able to determine when the `client_ip` or `user_name` values
|
||||
were collectively anomalous.
|
||||
|
||||
There is a built-in bucket influencer called `bucket_time` which is always
|
||||
available. This bucket influencer is the aggregation of all records in the
|
||||
bucket; it is not just limited to a type of influencer.
|
||||
|
||||
NOTE: A bucket influencer is a type of influencer. For example, `client_ip` or
|
||||
`user_name` can be bucket influencers, whereas `192.168.88.2` and `Bob` are
|
||||
influencers.
|
||||
|
||||
An bucket influencer object has the following properties:
|
||||
|
||||
`anomaly_score`::
|
||||
(number) A normalized score between 0-100, which is calculated for each bucket
|
||||
influencer. This score might be updated as newer data is analyzed.
|
||||
|
||||
`bucket_span`::
|
||||
(number) The length of the bucket in seconds. This value matches the `bucket_span`
|
||||
that is specified in the job.
|
||||
|
||||
`initial_anomaly_score`::
|
||||
(number) The score between 0-100 for each bucket influencer. This score is
|
||||
the initial value that was calculated at the time the bucket was processed.
|
||||
|
||||
`influencer_field_name`::
|
||||
(string) The field name of the influencer. For example `client_ip` or
|
||||
`user_name`.
|
||||
|
||||
`influencer_field_value`::
|
||||
(string) The field value of the influencer. For example `192.168.88.2` or
|
||||
`Bob`.
|
||||
|
||||
`is_interim`::
|
||||
(boolean) If true, this is an interim result. In other words, the bucket
|
||||
influencer results are calculated based on partial input data.
|
||||
|
||||
`job_id`::
|
||||
(string) The unique identifier for the job that these results belong to.
|
||||
|
||||
`probability`::
|
||||
(number) The probability that the bucket has this behavior, in the range 0
|
||||
to 1. For example, 0.0000109783. This value can be held to a high precision
|
||||
of over 300 decimal places, so the `anomaly_score` is provided as a
|
||||
human-readable and friendly interpretation of this.
|
||||
|
||||
`raw_anomaly_score`::
|
||||
(number) Internal.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This value is always set to `bucket_influencer`.
|
||||
|
||||
`timestamp`::
|
||||
(date) The start time of the bucket for which these results were calculated.
|
||||
|
||||
[float]
|
||||
[[ml-results-influencers]]
|
||||
==== Influencers
|
||||
|
||||
Influencers are the entities that have contributed to, or are to blame for,
|
||||
the anomalies. Influencer results are available only if an
|
||||
`influencer_field_name` is specified in the job configuration.
|
||||
|
||||
Influencers are given an `influencer_score`, which is calculated based on the
|
||||
anomalies that have occurred in each bucket interval. For jobs with more than
|
||||
one detector, this gives a powerful view of the most anomalous entities.
|
||||
|
||||
For example, if you are analyzing unusual bytes sent and unusual domains
|
||||
visited and you specified `user_name` as the influencer, then an
|
||||
`influencer_score` for each anomalous user name is written per bucket. For
|
||||
example, if `user_name: Bob` had an `influencer_score` greater than 75, then
|
||||
`Bob` would be considered very anomalous during this time interval in one or
|
||||
both of those areas (unusual bytes sent or unusual domains visited).
|
||||
|
||||
One influencer result is written per bucket for each influencer that is
|
||||
considered anomalous.
|
||||
|
||||
When you identify an influencer with a high score, you can investigate further
|
||||
by accessing the records resource for that bucket and enumerating the anomaly
|
||||
records that contain the influencer.
|
||||
|
||||
An influencer object has the following properties:
|
||||
|
||||
`bucket_span`::
|
||||
(number) The length of the bucket in seconds. This value matches the `bucket_span`
|
||||
that is specified in the job.
|
||||
|
||||
`influencer_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability
|
||||
of the influencer in this bucket aggregated across detectors. Unlike
|
||||
`initial_influencer_score`, this value will be updated by a re-normalization
|
||||
process as new data is analyzed.
|
||||
|
||||
`initial_influencer_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability
|
||||
of the influencer aggregated across detectors. This is the initial value that
|
||||
was calculated at the time the bucket was processed.
|
||||
|
||||
`influencer_field_name`::
|
||||
(string) The field name of the influencer.
|
||||
|
||||
`influencer_field_value`::
|
||||
(string) The entity that influenced, contributed to, or was to blame for the
|
||||
anomaly.
|
||||
|
||||
`is_interim`::
|
||||
(boolean) If true, this is an interim result. In other words, the influencer
|
||||
results are calculated based on partial input data.
|
||||
|
||||
`job_id`::
|
||||
(string) The unique identifier for the job that these results belong to.
|
||||
|
||||
`probability`::
|
||||
(number) The probability that the influencer has this behavior, in the range
|
||||
0 to 1. For example, 0.0000109783. This value can be held to a high precision
|
||||
of over 300 decimal places, so the `influencer_score` is provided as a
|
||||
human-readable and friendly interpretation of this.
|
||||
// For example, 0.03 means 3%. This value is held to a high precision of over
|
||||
//300 decimal places. In scientific notation, a value of 3.24E-300 is highly
|
||||
//unlikely and therefore highly anomalous.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This value is always set to `influencer`.
|
||||
|
||||
`timestamp`::
|
||||
(date) The start time of the bucket for which these results were calculated.
|
||||
|
||||
NOTE: Additional influencer properties are added, depending on the fields being
|
||||
analyzed. For example, if it's analyzing `user_name` as an influencer, then a
|
||||
field `user_name` is added to the result document. This information enables you to
|
||||
filter the anomaly results more easily.
|
||||
|
||||
|
||||
[float]
|
||||
[[ml-results-records]]
|
||||
==== Records
|
||||
|
||||
Records contain the detailed analytical results. They describe the anomalous
|
||||
activity that has been identified in the input data based on the detector
|
||||
configuration.
|
||||
|
||||
For example, if you are looking for unusually large data transfers, an anomaly
|
||||
record can identify the source IP address, the destination, the time window
|
||||
during which it occurred, the expected and actual size of the transfer, and the
|
||||
probability of this occurrence.
|
||||
|
||||
There can be many anomaly records depending on the characteristics and size of
|
||||
the input data. In practice, there are often too many to be able to manually
|
||||
process them. The {ml-features} therefore perform a sophisticated
|
||||
aggregation of the anomaly records into buckets.
|
||||
|
||||
The number of record results depends on the number of anomalies found in each
|
||||
bucket, which relates to the number of time series being modeled and the number of
|
||||
detectors.
|
||||
|
||||
A record object has the following properties:
|
||||
|
||||
`actual`::
|
||||
(array) The actual value for the bucket.
|
||||
|
||||
`bucket_span`::
|
||||
(number) The length of the bucket in seconds.
|
||||
This value matches the `bucket_span` that is specified in the job.
|
||||
|
||||
`by_field_name`::
|
||||
(string) The name of the analyzed field. This value is present only if
|
||||
it is specified in the detector. For example, `client_ip`.
|
||||
|
||||
`by_field_value`::
|
||||
(string) The value of `by_field_name`. This value is present only if
|
||||
it is specified in the detector. For example, `192.168.66.2`.
|
||||
|
||||
`causes`::
|
||||
(array) For population analysis, an over field must be specified in the
|
||||
detector. This property contains an array of anomaly records that are the
|
||||
causes for the anomaly that has been identified for the over field. If no
|
||||
over fields exist, this field is not present. This sub-resource contains
|
||||
the most anomalous records for the `over_field_name`. For scalability reasons,
|
||||
a maximum of the 10 most significant causes of the anomaly are returned. As
|
||||
part of the core analytical modeling, these low-level anomaly records are
|
||||
aggregated for their parent over field record. The causes resource contains
|
||||
similar elements to the record resource, namely `actual`, `typical`,
|
||||
`geo_results.actual_point`, `geo_results.typical_point`,
|
||||
`*_field_name` and `*_field_value`.
|
||||
Probability and scores are not applicable to causes.
|
||||
|
||||
`detector_index`::
|
||||
(number) A unique identifier for the detector.
|
||||
|
||||
`field_name`::
|
||||
(string) Certain functions require a field to operate on, for example, `sum()`.
|
||||
For those functions, this value is the name of the field to be analyzed.
|
||||
|
||||
`function`::
|
||||
(string) The function in which the anomaly occurs, as specified in the
|
||||
detector configuration. For example, `max`.
|
||||
|
||||
`function_description`::
|
||||
(string) The description of the function in which the anomaly occurs, as
|
||||
specified in the detector configuration.
|
||||
|
||||
`influencers`::
|
||||
(array) If `influencers` was specified in the detector configuration, then
|
||||
this array contains influencers that contributed to or were to blame for an
|
||||
anomaly.
|
||||
|
||||
`initial_record_score`::
|
||||
(number) A normalized score between 0-100, which is based on the
|
||||
probability of the anomalousness of this record. This is the initial value
|
||||
that was calculated at the time the bucket was processed.
|
||||
|
||||
`is_interim`::
|
||||
(boolean) If true, this is an interim result. In other words, the anomaly
|
||||
record is calculated based on partial input data.
|
||||
|
||||
`job_id`::
|
||||
(string) The unique identifier for the job that these results belong to.
|
||||
|
||||
`over_field_name`::
|
||||
(string) The name of the over field that was used in the analysis. This value
|
||||
is present only if it was specified in the detector. Over fields are used
|
||||
in population analysis. For example, `user`.
|
||||
|
||||
`over_field_value`::
|
||||
(string) The value of `over_field_name`. This value is present only if it
|
||||
was specified in the detector. For example, `Bob`.
|
||||
|
||||
`partition_field_name`::
|
||||
(string) The name of the partition field that was used in the analysis. This
|
||||
value is present only if it was specified in the detector. For example,
|
||||
`region`.
|
||||
|
||||
`partition_field_value`::
|
||||
(string) The value of `partition_field_name`. This value is present only if
|
||||
it was specified in the detector. For example, `us-east-1`.
|
||||
|
||||
`probability`::
|
||||
(number) The probability of the individual anomaly occurring, in the range
|
||||
0 to 1. For example, 0.0000772031. This value can be held to a high precision
|
||||
of over 300 decimal places, so the `record_score` is provided as a
|
||||
human-readable and friendly interpretation of this.
|
||||
//In scientific notation, a value of 3.24E-300 is highly unlikely and therefore
|
||||
//highly anomalous.
|
||||
|
||||
`multi_bucket_impact`::
|
||||
(number) an indication of how strongly an anomaly is multi bucket or single bucket.
|
||||
The value is on a scale of -5 to +5 where -5 means the anomaly is purely single
|
||||
bucket and +5 means the anomaly is purely multi bucket.
|
||||
|
||||
`record_score`::
|
||||
(number) A normalized score between 0-100, which is based on the probability
|
||||
of the anomalousness of this record. Unlike `initial_record_score`, this
|
||||
value will be updated by a re-normalization process as new data is analyzed.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This is always set to `record`.
|
||||
|
||||
`timestamp`::
|
||||
(date) The start time of the bucket for which these results were calculated.
|
||||
|
||||
`typical`::
|
||||
(array) The typical value for the bucket, according to analytical modeling.
|
||||
|
||||
`geo_results.actual_point`::
|
||||
(string) The actual value for the bucket formatted as a `geo_point`.
|
||||
If the detector function is `lat_long`, this is a comma delimited string
|
||||
of the latitude and longitude.
|
||||
|
||||
`geo_results.typical_point`::
|
||||
(string) The typical value for the bucket formatted as a `geo_point`.
|
||||
If the detector function is `lat_long`, this is a comma delimited string
|
||||
of the latitude and longitude.
|
||||
|
||||
NOTE: Additional record properties are added, depending on the fields being
|
||||
analyzed. For example, if it's analyzing `hostname` as a _by field_, then a field
|
||||
`hostname` is added to the result document. This information enables you to
|
||||
filter the anomaly results more easily.
|
||||
|
||||
|
||||
[float]
|
||||
[[ml-results-categories]]
|
||||
==== Categories
|
||||
|
||||
When `categorization_field_name` is specified in the job configuration, it is
|
||||
possible to view the definitions of the resulting categories. A category
|
||||
definition describes the common terms matched and contains examples of matched
|
||||
values.
|
||||
|
||||
The anomaly results from a categorization analysis are available as bucket,
|
||||
influencer, and record results. For example, the results might indicate that
|
||||
at 16:45 there was an unusual count of log message category 11. You can then
|
||||
examine the description and examples of that category.
|
||||
|
||||
A category resource has the following properties:
|
||||
|
||||
`category_id`::
|
||||
(unsigned integer) A unique identifier for the category.
|
||||
|
||||
`examples`::
|
||||
(array) A list of examples of actual values that matched the category.
|
||||
|
||||
`grok_pattern`::
|
||||
experimental[] (string) A Grok pattern that could be used in Logstash or an
|
||||
Ingest Pipeline to extract fields from messages that match the category. This
|
||||
field is experimental and may be changed or removed in a future release. The
|
||||
Grok patterns that are found are not optimal, but are often a good starting
|
||||
point for manual tweaking.
|
||||
|
||||
`job_id`::
|
||||
(string) The unique identifier for the job that these results belong to.
|
||||
|
||||
`max_matching_length`::
|
||||
(unsigned integer) The maximum length of the fields that matched the category.
|
||||
The value is increased by 10% to enable matching for similar fields that have
|
||||
not been analyzed.
|
||||
|
||||
`regex`::
|
||||
(string) A regular expression that is used to search for values that match the
|
||||
category.
|
||||
|
||||
`terms`::
|
||||
(string) A space separated list of the common tokens that are matched in
|
||||
values of the category.
|
||||
|
||||
[float]
|
||||
[[ml-results-overall-buckets]]
|
||||
==== Overall Buckets
|
||||
|
||||
Overall buckets provide a summary of bucket results over multiple jobs.
|
||||
Their `bucket_span` equals the longest `bucket_span` of the jobs in question.
|
||||
The `overall_score` is the `top_n` average of the max `anomaly_score` per job
|
||||
within the overall bucket time interval.
|
||||
This means that you can fine-tune the `overall_score` so that it is more
|
||||
or less sensitive to the number of jobs that detect an anomaly at the same time.
|
||||
|
||||
An overall bucket resource has the following properties:
|
||||
|
||||
`timestamp`::
|
||||
(date) The start time of the overall bucket.
|
||||
|
||||
`bucket_span`::
|
||||
(number) The length of the bucket in seconds. Matches the `bucket_span`
|
||||
of the job with the longest one.
|
||||
|
||||
`overall_score`::
|
||||
(number) The `top_n` average of the max bucket `anomaly_score` per job.
|
||||
|
||||
`jobs`::
|
||||
(array) An array of objects that contain the `max_anomaly_score` per `job_id`.
|
||||
|
||||
`is_interim`::
|
||||
(boolean) If true, this is an interim result. In other words, the anomaly
|
||||
record is calculated based on partial input data.
|
||||
|
||||
`result_type`::
|
||||
(string) Internal. This is always set to `overall_bucket`.
|
|
@ -498,3 +498,20 @@ the details in <<ml-get-job-stats>>.
|
|||
This page was deleted.
|
||||
[[ml-snapshot-stats]]
|
||||
See <<ml-update-snapshot>> and <<ml-get-snapshot>>.
|
||||
|
||||
[role="exclude",id="ml-results-resource"]
|
||||
=== Results resources
|
||||
|
||||
This page was deleted.
|
||||
[[ml-results-buckets]]
|
||||
See <<ml-get-bucket>>,
|
||||
[[ml-results-bucket-influencers]]
|
||||
<<ml-get-bucket>>,
|
||||
[[ml-results-influencers]]
|
||||
<<ml-get-influencer>>,
|
||||
[[ml-results-records]]
|
||||
<<ml-get-record>>,
|
||||
[[ml-results-categories]]
|
||||
<<ml-get-category>>, and
|
||||
[[ml-results-overall-buckets]]
|
||||
<<ml-get-overall-buckets>>.
|
||||
|
|
|
@ -6,9 +6,7 @@ These resource definitions are used in APIs related to {ml-features} and
|
|||
{security-features} and in {kib} advanced {ml} job configuration options.
|
||||
|
||||
* <<ml-dfa-analysis-objects>>
|
||||
* <<ml-results-resource,{anomaly-detect-cap} results>>
|
||||
* <<role-mapping-resources,Role mappings>>
|
||||
|
||||
include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
|
||||
include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
|
||||
include::{es-repo-dir}/ml/anomaly-detection/apis/resultsresource.asciidoc[]
|
||||
|
|
Loading…
Reference in New Issue