[DOCS] Updates anomaly detection terminology (#44888)

parent cef375f883
commit a041d1eacf
@@ -4,7 +4,7 @@
 By default, {dfeeds} fetch data from {es} using search and scroll requests.
 It can be significantly more efficient, however, to aggregate data in {es}
-and to configure your jobs to analyze aggregated data.
+and to configure your {anomaly-jobs} to analyze aggregated data.
 
 One of the benefits of aggregating data this way is that {es} automatically
 distributes these calculations across your cluster. You can then feed this
@@ -19,8 +19,8 @@ of the last record in the bucket. If you use a terms aggregation and the
 cardinality of a term is high, then the aggregation might not be effective and
 you might want to just use the default search and scroll behavior.
 
-When you create or update a job, you can include the names of aggregations, for
-example:
+When you create or update an {anomaly-job}, you can include the names of
+aggregations, for example:
 
 [source,js]
 ----------------------------------
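The body of the example is elided by this diff view. As a rough sketch of the shape such a configuration takes, a {dfeed} that feeds aggregated data to an {anomaly-job} nests a `date_histogram` under `aggregations`; the job, index, and field names below are hypothetical:

[source,js]
----------------------------------
PUT _ml/datafeeds/datafeed-farequote
{
  "job_id": "farequote",
  "indices": ["farequote"],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "time",
        "fixed_interval": "360s",
        "time_zone": "UTC"
      },
      "aggregations": {
        "time": {
          "max": { "field": "time" }
        },
        "responsetime": {
          "avg": { "field": "responsetime" }
        }
      }
    }
  }
}
----------------------------------
// NOTCONSOLE

The nested `max` aggregation on the time field matters in this kind of setup: each aggregated bucket must carry a timestamp for the job to analyze it.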
@@ -68,8 +68,8 @@ we do not want the detailed SQL to be considered in the message categorization.
 This particular categorization filter removes the SQL statement from the categorization
 algorithm.
 
-If your data is stored in {es}, you can create an advanced job with these same
-properties:
+If your data is stored in {es}, you can create an advanced {anomaly-job} with
+these same properties:
 
 [role="screenshot"]
 image::images/ml-category-advanced.jpg["Advanced job configuration options related to categorization"]
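The same properties can also be set through the API. A minimal sketch of such a job, where the job name and the filter regex are hypothetical:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/it_ops_app_logs
{
  "analysis_config": {
    "bucket_span": "30m",
    "categorization_field_name": "message",
    "categorization_filters": ["\\[SQL: .*\\]"],
    "detectors": [{
      "function": "count",
      "by_field_name": "mlcategory"
    }]
  },
  "data_description": {
    "time_field": "time"
  }
}
----------------------------------
// NOTCONSOLE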
@@ -209,7 +209,7 @@ letters in tokens whereas the `ml_classic` tokenizer does, although that could
 be fixed by using more complex regular expressions.
 
 For more information about the `categorization_analyzer` property, see
-{ref}/ml-job-resource.html#ml-categorizationanalyzer[Categorization Analyzer].
+{ref}/ml-job-resource.html#ml-categorizationanalyzer[Categorization analyzer].
 
 NOTE: To add the `categorization_analyzer` property in {kib}, you must use the
 **Edit JSON** tab and copy the `categorization_analyzer` object from one of the
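As a purely hypothetical sketch, the copied object might look like the following; the `pattern_replace` char filter and the `ml_classic` tokenizer here are illustrative choices, not a recommendation:

[source,js]
----------------------------------
"categorization_analyzer": {
  "char_filter": [{
    "type": "pattern_replace",
    "pattern": "\\[SQL: .*\\]"
  }],
  "tokenizer": "ml_classic"
}
----------------------------------
// NOTCONSOLE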
@@ -7,8 +7,8 @@ your cluster and all master-eligible nodes must have {ml} enabled. By default,
 all nodes are {ml} nodes. For more information about these settings, see
 {ref}/modules-node.html#ml-node[{ml} nodes].
 
-To use the {ml-features} to analyze your data, you must create a job and
-send your data to that job.
+To use the {ml-features} to analyze your data, you can create an {anomaly-job}
+and send your data to that job.
 
 * If your data is stored in {es}:
 
@@ -2,17 +2,17 @@
 [[ml-configuring-url]]
 === Adding custom URLs to machine learning results
 
-When you create an advanced job or edit any job in {kib}, you can optionally
-attach one or more custom URLs.
+When you create an advanced {anomaly-job} or edit any {anomaly-jobs} in {kib},
+you can optionally attach one or more custom URLs.
 
 The custom URLs provide links from the anomalies table in the *Anomaly Explorer*
 or *Single Metric Viewer* window in {kib} to {kib} dashboards, the *Discover*
 page, or external websites. For example, you can define a custom URL that
 provides a way for users to drill down to the source data from the results set.
 
-When you edit a job in {kib}, it simplifies the creation of the custom URLs for
-{kib} dashboards and the *Discover* page and it enables you to test your URLs.
-For example:
+When you edit an {anomaly-job} in {kib}, it simplifies the creation of the
+custom URLs for {kib} dashboards and the *Discover* page and it enables you to
+test your URLs. For example:
 
 [role="screenshot"]
 image::images/ml-customurl-edit.jpg["Edit a job to add a custom URL"]
@@ -29,7 +29,8 @@ As in this case, the custom URL can contain
 are populated when you click the link in the anomalies table. In this example,
 the custom URL contains `$earliest$`, `$latest$`, and `$service$` tokens, which
 pass the beginning and end of the time span of the selected anomaly and the
-pertinent `service` field value to the target page. If you were interested in the following anomaly, for example:
+pertinent `service` field value to the target page. If you were interested in
+the following anomaly, for example:
 
 [role="screenshot"]
 image::images/ml-customurl.jpg["An example of the custom URL links in the Anomaly Explorer anomalies table"]
@@ -43,8 +44,8 @@ image::images/ml-customurl-discover.jpg["An example of the results on the Discov
 Since we specified a time range of 2 hours, the time filter restricts the
 results to the time period two hours before and after the anomaly.
 
-You can also specify these custom URL settings when you create or update jobs by
-using the {ml} APIs.
+You can also specify these custom URL settings when you create or update
+{anomaly-jobs} by using the APIs.
 
 [float]
 [[ml-configuring-url-strings]]
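A minimal sketch of such an update through the API; the job name and the query details are hypothetical:

[source,js]
----------------------------------
POST _ml/anomaly_detectors/sample_job/_update
{
  "custom_settings": {
    "custom_urls": [
      {
        "url_name": "Raw data",
        "time_range": "2h",
        "url_value": "discover#/?_g=(time:(from:'$earliest$',mode:absolute,to:'$latest$'))&_a=(query:(language:kuery,query:'service:$service$'))"
      }
    ]
  }
}
----------------------------------
// NOTCONSOLE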
@@ -74,9 +75,9 @@ time as the earliest and latest times. The same is also true if the interval is
 set to `Auto` and a one hour interval was chosen. You can override this behavior
 by using the `time_range` setting.
 
-The `$mlcategoryregex$` and `$mlcategoryterms$` tokens pertain to jobs where you
-are categorizing field values. For more information about this type of analysis,
-see <<ml-configuring-categories>>.
+The `$mlcategoryregex$` and `$mlcategoryterms$` tokens pertain to {anomaly-jobs}
+where you are categorizing field values. For more information about this type of
+analysis, see <<ml-configuring-categories>>.
 
 The `$mlcategoryregex$` token passes the regular expression value of the
 category of the selected anomaly, as identified by the value of the `mlcategory`
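For instance, a hypothetical `url_value` could use `$mlcategoryterms$` in a {kib} *Discover* query string:

[source,js]
----------------------------------
"url_value": "discover#/?_a=(query:(language:kuery,query:'message:$mlcategoryterms$'))"
----------------------------------
// NOTCONSOLE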
@@ -22,8 +22,8 @@ functions are not really affected. In these situations, it all comes out okay in
 the end as the delayed data is distributed randomly. An example would be a `mean`
 metric for a field in a large collection of data. In this case, checking for
 delayed data may not provide much benefit. If data are consistently delayed,
-however, jobs with a `low_count` function may provide false positives. In this
-situation, it would be useful to see if data comes in after an anomaly is
+however, {anomaly-jobs} with a `low_count` function may provide false positives.
+In this situation, it would be useful to see if data comes in after an anomaly is
 recorded so that you can determine a next course of action.
 
 ==== How do we detect delayed data?
@@ -35,11 +35,11 @@ Every 15 minutes or every `check_window`, whichever is smaller, the datafeed
 triggers a document search over the configured indices. This search looks over a
 time span with a length of `check_window` ending with the latest finalized bucket.
 That time span is partitioned into buckets, whose length equals the bucket span
-of the associated job. The `doc_count` of those buckets are then compared with
-the job's finalized analysis buckets to see whether any data has arrived since
-the analysis. If there is indeed missing data due to their ingest delay, the end
-user is notified. For example, you can see annotations in {kib} for the periods
-where these delays occur.
+of the associated {anomaly-job}. The `doc_count` of those buckets are then
+compared with the job's finalized analysis buckets to see whether any data has
+arrived since the analysis. If there is indeed missing data due to ingest
+delay, the end user is notified. For example, you can see annotations in {kib}
+for the periods where these delays occur.
 
 ==== What to do about delayed data?
 
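The check is configured per {dfeed}. A minimal sketch, with hypothetical job and index names:

[source,js]
----------------------------------
PUT _ml/datafeeds/datafeed-sample
{
  "job_id": "sample_job",
  "indices": ["sample-index"],
  "delayed_data_check_config": {
    "enabled": true,
    "check_window": "2h"
  }
}
----------------------------------
// NOTCONSOLE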
@@ -16,17 +16,18 @@ Let us see how those can be configured by examples.
 
 ==== Specifying custom rule scope
 
-Let us assume we are configuring a job in order to detect DNS data exfiltration.
-Our data contain fields "subdomain" and "highest_registered_domain".
-We can use a detector that looks like `high_info_content(subdomain) over highest_registered_domain`.
-If we run such a job it is possible that we discover a lot of anomalies on
-frequently used domains that we have reasons to trust. As security analysts, we
-are not interested in such anomalies. Ideally, we could instruct the detector to
-skip results for domains that we consider safe. Using a rule with a scope allows
-us to achieve this.
+Let us assume we are configuring an {anomaly-job} in order to detect DNS data
+exfiltration. Our data contain fields "subdomain" and "highest_registered_domain".
+We can use a detector that looks like
+`high_info_content(subdomain) over highest_registered_domain`. If we run such a
+job, it is possible that we discover a lot of anomalies on frequently used
+domains that we have reasons to trust. As security analysts, we are not
+interested in such anomalies. Ideally, we could instruct the detector to skip
+results for domains that we consider safe. Using a rule with a scope allows us
+to achieve this.
 
 First, we need to create a list of our safe domains. Those lists are called
-_filters_ in {ml}. Filters can be shared across jobs.
+_filters_ in {ml}. Filters can be shared across {anomaly-jobs}.
 
 We create our filter using the {ref}/ml-put-filter.html[put filter API]:
 
@@ -41,8 +42,8 @@ PUT _ml/filters/safe_domains
 // CONSOLE
 // TEST[skip:needs-licence]
 
-Now, we can create our job specifying a scope that uses the `safe_domains`
-filter for the `highest_registered_domain` field:
+Now, we can create our {anomaly-job} specifying a scope that uses the
+`safe_domains` filter for the `highest_registered_domain` field:
 
 [source,js]
 ----------------------------------
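The body of that example is elided by this diff view. A sketch along those lines, with the rule scoped to the `safe_domains` filter (the job name and time field are hypothetical):

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/dns_exfiltration_with_rule
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [{
      "function": "high_info_content",
      "field_name": "subdomain",
      "over_field_name": "highest_registered_domain",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "highest_registered_domain": {
            "filter_id": "safe_domains",
            "filter_type": "include"
          }
        }
      }]
    }]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------
// NOTCONSOLE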
@@ -139,8 +140,8 @@ example, 0.02. Given our knowledge about how CPU utilization behaves we might
 determine that anomalies with such small actual values are not interesting for
 investigation.
 
-Let us now configure a job with a rule that will skip results where CPU
-utilization is less than 0.20.
+Let us now configure an {anomaly-job} with a rule that will skip results where
+CPU utilization is less than 0.20.
 
 [source,js]
 ----------------------------------
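The elided example plausibly takes this shape; a sketch with hypothetical job and field names, using a rule condition on the actual value:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/cpu_with_rule
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [{
      "function": "high_mean",
      "field_name": "cpu_utilization",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [{
          "applies_to": "actual",
          "operator": "lt",
          "value": 0.20
        }]
      }]
    }]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------
// NOTCONSOLE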
@@ -214,18 +215,18 @@ PUT _ml/anomaly_detectors/rule_with_range
 ==== Custom rules in the life-cycle of a job
 
 Custom rules only affect results created after the rules were applied.
-Let us imagine that we have configured a job and it has been running
+Let us imagine that we have configured an {anomaly-job} and it has been running
 for some time. After observing its results we decide that we can employ
 rules in order to get rid of some uninteresting results. We can use
-the {ref}/ml-update-job.html[update job API] to do so. However, the rule we
-added will only be in effect for any results created from the moment we added
-the rule onwards. Past results will remain unaffected.
+the {ref}/ml-update-job.html[update {anomaly-job} API] to do so. However, the
+rule we added will only be in effect for any results created from the moment we
+added the rule onwards. Past results will remain unaffected.
 
-==== Using custom rules VS filtering data
+==== Using custom rules vs. filtering data
 
 It might appear like using rules is just another way of filtering the data
-that feeds into a job. For example, a rule that skips results when the
-partition field value is in a filter sounds equivalent to having a query
+that feeds into an {anomaly-job}. For example, a rule that skips results when
+the partition field value is in a filter sounds equivalent to having a query
 that filters out such documents. But it is not. There is a fundamental
 difference. When the data is filtered before reaching a job it is as if they
 never existed for the job. With rules, the data still reaches the job and
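A sketch of such an update, adding a rule to the first detector of a hypothetical running job:

[source,js]
----------------------------------
POST _ml/anomaly_detectors/sample_job/_update
{
  "detectors": [{
    "detector_index": 0,
    "custom_rules": [{
      "actions": ["skip_result"],
      "conditions": [{
        "applies_to": "actual",
        "operator": "lt",
        "value": 10.0
      }]
    }]
  }]
}
----------------------------------
// NOTCONSOLE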
@@ -5,10 +5,10 @@
 The {ml-features} include analysis functions that provide a wide variety of
 flexible ways to analyze data for anomalies.
 
-When you create jobs, you specify one or more detectors, which define the type of
-analysis that needs to be done. If you are creating your job by using {ml} APIs,
-you specify the functions in
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+When you create {anomaly-jobs}, you specify one or more detectors, which define
+the type of analysis that needs to be done. If you are creating your job by
+using {ml} APIs, you specify the functions in
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 If you are creating your job in {kib}, you specify the functions differently
 depending on whether you are creating single metric, multi-metric, or advanced
 jobs.
@@ -24,8 +24,8 @@ You can specify a `summary_count_field_name` with any function except `metric`.
 When you use `summary_count_field_name`, the {ml} features expect the input
 data to be pre-aggregated. The value of the `summary_count_field_name` field
 must contain the count of raw events that were summarized. In {kib}, use the
-**summary_count_field_name** in advanced jobs. Analyzing aggregated input data
-provides a significant boost in performance. For more information, see
+**summary_count_field_name** in advanced {anomaly-jobs}. Analyzing aggregated
+input data provides a significant boost in performance. For more information, see
 <<ml-configuring-aggregation>>.
 
 If your data is sparse, there may be gaps in the data which means you might have
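For instance, if each input document is a pre-aggregated bucket whose `doc_count` field records how many raw events it summarizes, a sketch of the job configuration (names hypothetical) might be:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/pre_aggregated_job
{
  "analysis_config": {
    "bucket_span": "10m",
    "summary_count_field_name": "doc_count",
    "detectors": [{
      "function": "mean",
      "field_name": "responsetime"
    }]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------
// NOTCONSOLE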
@@ -40,7 +40,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing events with the count function
 [source,js]
@@ -65,8 +65,9 @@ This example is probably the simplest possible analysis. It identifies
 time buckets during which the overall count of events is higher or lower than
 usual.
 
-When you use this function in a detector in your job, it models the event rate
-and detects when the event rate is unusual compared to its past behavior.
+When you use this function in a detector in your {anomaly-job}, it models the
+event rate and detects when the event rate is unusual compared to its past
+behavior.
 
 .Example 2: Analyzing errors with the high_count function
 [source,js]
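The example bodies are elided by the diff view; for orientation, a minimal detector that uses the `count` function needs no field name at all (job name hypothetical):

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/count_example
{
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [{
      "function": "count"
    }]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----------------------------------
// NOTCONSOLE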
@@ -89,7 +90,7 @@ PUT _ml/anomaly_detectors/example2
 // CONSOLE
 // TEST[skip:needs-licence]
 
-If you use this `high_count` function in a detector in your job, it
+If you use this `high_count` function in a detector in your {anomaly-job}, it
 models the event rate for each error code. It detects users that generate an
 unusually high count of error codes compared to other users.
 
@@ -117,9 +118,9 @@ PUT _ml/anomaly_detectors/example3
 In this example, the function detects when the count of events for a
 status code is lower than usual.
 
-When you use this function in a detector in your job, it models the event rate
-for each status code and detects when a status code has an unusually low count
-compared to its past behavior.
+When you use this function in a detector in your {anomaly-job}, it models the
+event rate for each status code and detects when a status code has an unusually
+low count compared to its past behavior.
 
 .Example 4: Analyzing aggregated data with the count function
 [source,js]
@@ -168,7 +169,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 For example, if you have the following number of events per bucket:
 
@@ -206,10 +207,10 @@ PUT _ml/anomaly_detectors/example5
 // CONSOLE
 // TEST[skip:needs-licence]
 
-If you use this `high_non_zero_count` function in a detector in your job, it
-models the count of events for the `signaturename` field. It ignores any buckets
-where the count is zero and detects when a `signaturename` value has an
-unusually high count of events compared to its past behavior.
+If you use this `high_non_zero_count` function in a detector in your
+{anomaly-job}, it models the count of events for the `signaturename` field. It
+ignores any buckets where the count is zero and detects when a `signaturename`
+value has an unusually high count of events compared to its past behavior.
 
 NOTE: Population analysis (using an `over_field_name` property value) is not
 supported for the `non_zero_count`, `high_non_zero_count`, and
@@ -238,7 +239,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 6: Analyzing users with the distinct_count function
 [source,js]
@@ -261,9 +262,9 @@ PUT _ml/anomaly_detectors/example6
 // TEST[skip:needs-licence]
 
 This `distinct_count` function detects when a system has an unusual number
-of logged in users. When you use this function in a detector in your job, it
-models the distinct count of users. It also detects when the distinct number of
-users is unusual compared to the past.
+of logged in users. When you use this function in a detector in your
+{anomaly-job}, it models the distinct count of users. It also detects when the
+distinct number of users is unusual compared to the past.
 
 .Example 7: Analyzing ports with the high_distinct_count function
 [source,js]
@@ -287,6 +288,6 @@ PUT _ml/anomaly_detectors/example7
 // TEST[skip:needs-licence]
 
 This example detects instances of port scanning. When you use this function in a
-detector in your job, it models the distinct count of ports. It also detects the
-`src_ip` values that connect to an unusually high number of different
+detector in your {anomaly-job}, it models the distinct count of ports. It also
+detects the `src_ip` values that connect to an unusually high number of different
 `dst_ports` values compared to other `src_ip` values.
@@ -7,9 +7,9 @@ input data.
 
 The {ml-features} include the following geographic function: `lat_long`.
 
-NOTE: You cannot create forecasts for jobs that contain geographic functions.
-You also cannot add rules with conditions to detectors that use geographic
-functions.
+NOTE: You cannot create forecasts for {anomaly-jobs} that contain geographic
+functions. You also cannot add rules with conditions to detectors that use
+geographic functions.
 
 [float]
 [[ml-lat-long]]
@@ -26,7 +26,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing transactions with the lat_long function
 [source,js]
@@ -49,15 +49,15 @@ PUT _ml/anomaly_detectors/example1
 // CONSOLE
 // TEST[skip:needs-licence]
 
-If you use this `lat_long` function in a detector in your job, it
+If you use this `lat_long` function in a detector in your {anomaly-job}, it
 detects anomalies where the geographic location of a credit card transaction is
 unusual for a particular customer’s credit card. An anomaly might indicate fraud.
 
 IMPORTANT: The `field_name` that you supply must be a single string that contains
 two comma-separated numbers of the form `latitude,longitude`, a `geo_point` field,
 a `geo_shape` field that contains point values, or a `geo_centroid` aggregation.
-The `latitude` and `longitude` must be in the range -180 to 180 and represent a point on the
-surface of the Earth.
+The `latitude` and `longitude` must be in the range -180 to 180 and represent a
+point on the surface of the Earth.
 
 For example, JSON data might contain the following transaction coordinates:
 
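The example document itself is elided here; a hypothetical document with such coordinates could look like:

[source,js]
----------------------------------
{
  "time": 1460464275,
  "transactionCoordinates": "40.7,-74.0",
  "creditCardNumber": "1234123412341234"
}
----------------------------------
// NOTCONSOLE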
@@ -75,6 +75,6 @@ In {es}, location data is likely to be stored in `geo_point` fields. For more
 information, see {ref}/geo-point.html[Geo-point datatype]. This data type is
 supported natively in {ml-features}. Specifically, the {dfeed}, when pulling data from
 a `geo_point` field, will transform the data into the appropriate `lat,lon` string
-format before sending to the {ml} job.
+format before sending to the {anomaly-job}.
 
 For more information, see <<ml-configuring-transform>>.
@@ -29,7 +29,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing subdomain strings with the info_content function
 [source,js]
@@ -42,9 +42,9 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `info_content` function in a detector in your job, it models
-information that is present in the `subdomain` string. It detects anomalies
-where the information content is unusual compared to the other
+If you use this `info_content` function in a detector in your {anomaly-job}, it
+models information that is present in the `subdomain` string. It detects
+anomalies where the information content is unusual compared to the other
 `highest_registered_domain` values. An anomaly could indicate an abuse of the
 DNS protocol, such as malicious command and control activity.
 
@@ -63,8 +63,8 @@ choice.
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `high_info_content` function in a detector in your job, it
-models information content that is held in the DNS query string. It detects
+If you use this `high_info_content` function in a detector in your {anomaly-job},
+it models information content that is held in the DNS query string. It detects
 `src_ip` values where the information content is unusually high compared to
 other `src_ip` values. This example is similar to the example for the
 `info_content` function, but it reports anomalies only where the amount of
@@ -81,8 +81,8 @@ information content is higher than expected.
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `low_info_content` function in a detector in your job, it models
-information content that is present in the message string for each
+If you use this `low_info_content` function in a detector in your {anomaly-job},
+it models information content that is present in the message string for each
 `logfilename`. It detects anomalies where the information content is low
 compared to its past behavior. For example, this function detects unusually low
 amounts of information in a collection of rolling log files. Low information
@@ -35,7 +35,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing minimum transactions with the min function
 [source,js]
@@ -48,9 +48,9 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `min` function in a detector in your job, it detects where the
-smallest transaction is lower than previously observed. You can use this
-function to detect items for sale at unintentionally low prices due to data
+If you use this `min` function in a detector in your {anomaly-job}, it detects
+where the smallest transaction is lower than previously observed. You can use
+this function to detect items for sale at unintentionally low prices due to data
 entry mistakes. It models the minimum amount for each product over time.
 
 [float]
@@ -70,7 +70,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 2: Analyzing maximum response times with the max function
 [source,js]
@@ -83,9 +83,9 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `max` function in a detector in your job, it detects where the
-longest `responsetime` is longer than previously observed. You can use this
-function to detect applications that have `responsetime` values that are
+If you use this `max` function in a detector in your {anomaly-job}, it detects
+where the longest `responsetime` is longer than previously observed. You can use
+this function to detect applications that have `responsetime` values that are
 unusually lengthy. It models the maximum `responsetime` for each application
 over time and detects when the longest `responsetime` is unusually long compared
 to previous applications.
@@ -132,7 +132,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 4: Analyzing response times with the median function
 [source,js]
@@ -145,9 +145,9 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `median` function in a detector in your job, it models the
-median `responsetime` for each application over time. It detects when the median
-`responsetime` is unusual compared to previous `responsetime` values.
+If you use this `median` function in a detector in your {anomaly-job}, it models
+the median `responsetime` for each application over time. It detects when the
+median `responsetime` is unusual compared to previous `responsetime` values.
 
 [float]
 [[ml-metric-mean]]
@@ -170,7 +170,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 5: Analyzing response times with the mean function
 [source,js]
@@ -183,8 +183,8 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `mean` function in a detector in your job, it models the mean
-`responsetime` for each application over time. It detects when the mean
+If you use this `mean` function in a detector in your {anomaly-job}, it models
+the mean `responsetime` for each application over time. It detects when the mean
 `responsetime` is unusual compared to previous `responsetime` values.
 
 .Example 6: Analyzing response times with the high_mean function
@@ -198,9 +198,10 @@ If you use this `mean` function in a detector in your job, it models the mean
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `high_mean` function in a detector in your job, it models the
-mean `responsetime` for each application over time. It detects when the mean
-`responsetime` is unusually high compared to previous `responsetime` values.
+If you use this `high_mean` function in a detector in your {anomaly-job}, it
+models the mean `responsetime` for each application over time. It detects when
+the mean `responsetime` is unusually high compared to previous `responsetime`
+values.
 
 .Example 7: Analyzing response times with the low_mean function
 [source,js]
@@ -213,9 +214,10 @@ mean `responsetime` for each application over time. It detects when the mean
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `low_mean` function in a detector in your job, it models the
-mean `responsetime` for each application over time. It detects when the mean
-`responsetime` is unusually low compared to previous `responsetime` values.
+If you use this `low_mean` function in a detector in your {anomaly-job}, it
+models the mean `responsetime` for each application over time. It detects when
+the mean `responsetime` is unusually low compared to previous `responsetime`
+values.
 
 [float]
 [[ml-metric-metric]]
@@ -236,7 +238,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 8: Analyzing response times with the metric function
 [source,js]
@@ -249,8 +251,8 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `metric` function in a detector in your job, it models the
-mean, min, and max `responsetime` for each application over time. It detects
+If you use this `metric` function in a detector in your {anomaly-job}, it models
+the mean, min, and max `responsetime` for each application over time. It detects
 when the mean, min, or max `responsetime` is unusual compared to previous
 `responsetime` values.
 
@@ -273,7 +275,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 9: Analyzing response times with the varp function
 [source,js]
@@ -286,10 +288,10 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `varp` function in a detector in your job, it models the
-variance in values of `responsetime` for each application over time. It detects
-when the variance in `responsetime` is unusual compared to past application
-behavior.
+If you use this `varp` function in a detector in your {anomaly-job}, it models
+the variance in values of `responsetime` for each application over time. It
+detects when the variance in `responsetime` is unusual compared to past
+application behavior.
 
 .Example 10: Analyzing response times with the high_varp function
 [source,js]
@@ -302,10 +304,10 @@ behavior.
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `high_varp` function in a detector in your job, it models the
-variance in values of `responsetime` for each application over time. It detects
-when the variance in `responsetime` is unusual compared to past application
-behavior.
+If you use this `high_varp` function in a detector in your {anomaly-job}, it
+models the variance in values of `responsetime` for each application over time.
+It detects when the variance in `responsetime` is unusual compared to past
+application behavior.
 
 .Example 11: Analyzing response times with the low_varp function
 [source,js]
@@ -318,7 +320,7 @@ behavior.
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `low_varp` function in a detector in your job, it models the
-variance in values of `responsetime` for each application over time. It detects
-when the variance in `responsetime` is unusual compared to past application
-behavior.
+If you use this `low_varp` function in a detector in your {anomaly-job}, it
+models the variance in values of `responsetime` for each application over time.
+It detects when the variance in `responsetime` is unusual compared to past
+application behavior.
@@ -13,8 +13,8 @@ number of times (frequency) rare values occur.
 ====
 * The `rare` and `freq_rare` functions should not be used in conjunction with
 `exclude_frequent`.
-* You cannot create forecasts for jobs that contain `rare` or `freq_rare`
-functions.
+* You cannot create forecasts for {anomaly-jobs} that contain `rare` or
+`freq_rare` functions.
 * You cannot add rules with conditions to detectors that use `rare` or
 `freq_rare` functions.
 * Shorter bucket spans (less than 1 hour, for example) are recommended when
@@ -47,7 +47,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing status codes with the rare function
 [source,js]
@@ -59,10 +59,11 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `rare` function in a detector in your job, it detects values
-that are rare in time. It models status codes that occur over time and detects
-when rare status codes occur compared to the past. For example, you can detect
-status codes in a web access log that have never (or rarely) occurred before.
+If you use this `rare` function in a detector in your {anomaly-job}, it detects
+values that are rare in time. It models status codes that occur over time and
+detects when rare status codes occur compared to the past. For example, you can
+detect status codes in a web access log that have never (or rarely) occurred
+before.
 
 .Example 2: Analyzing status codes in a population with the rare function
 [source,js]
@@ -75,15 +76,15 @@ status codes in a web access log that have never (or rarely) occurred before.
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `rare` function in a detector in your job, it detects values
-that are rare in a population. It models status code and client IP interactions
-that occur. It defines a rare status code as one that occurs for few client IP
-values compared to the population. It detects client IP values that experience
-one or more distinct rare status codes compared to the population. For example
-in a web access log, a `clientip` that experiences the highest number of
-different rare status codes compared to the population is regarded as highly
-anomalous. This analysis is based on the number of different status code values,
-not the count of occurrences.
+If you use this `rare` function in a detector in your {anomaly-job}, it detects
+values that are rare in a population. It models status code and client IP
+interactions that occur. It defines a rare status code as one that occurs for
+few client IP values compared to the population. It detects client IP values
+that experience one or more distinct rare status codes compared to the
+population. For example, in a web access log, a `clientip` that experiences the
+highest number of different rare status codes compared to the population is
+regarded as highly anomalous. This analysis is based on the number of different
+status code values, not the count of occurrences.
 
 NOTE: To define a status code as rare the {ml-features} look at the number
 of distinct status codes that occur, not the number of times the status code
@@ -105,7 +106,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 3: Analyzing URI values in a population with the freq_rare function
 [source,js]
@@ -118,7 +119,7 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `freq_rare` function in a detector in your job, it
+If you use this `freq_rare` function in a detector in your {anomaly-job}, it
 detects values that are frequently rare in a population. It models URI paths and
 client IP interactions that occur. It defines a rare URI path as one that is
 visited by few client IP values compared to the population. It detects the
@@ -2,7 +2,8 @@
 [[ml-sum-functions]]
 === Sum functions
 
-The sum functions detect anomalies when the sum of a field in a bucket is anomalous.
+The sum functions detect anomalies when the sum of a field in a bucket is
+anomalous.
 
 If you want to monitor unusually high totals, use high-sided functions.
 
@@ -35,7 +36,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing total expenses with the sum function
 [source,js]
@@ -49,7 +50,7 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `sum` function in a detector in your job, it
+If you use this `sum` function in a detector in your {anomaly-job}, it
 models total expenses per employee for each cost center. For each time bucket,
 it detects when an employee’s expenses are unusual for a cost center compared
 to other employees.
@@ -65,7 +66,7 @@ to other employees.
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `high_sum` function in a detector in your job, it
+If you use this `high_sum` function in a detector in your {anomaly-job}, it
 models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high
 volumes compared to other `cs_hosts`. This example looks for volumes of data
 transferred from a client to a server on the internet that are unusual compared
@@ -91,7 +92,7 @@ These functions support the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 NOTE: Population analysis (that is to say, use of the `over_field_name` property)
 is not applicable for this function.
@ -107,9 +108,7 @@ is not applicable for this function.
|
|||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
// NOTCONSOLE
|
// NOTCONSOLE
|
||||||
|
|
||||||
If you use this `high_non_null_sum` function in a detector in your job, it
|
If you use this `high_non_null_sum` function in a detector in your {anomaly-job},
|
||||||
models the total `amount_approved` for each employee. It ignores any buckets
|
it models the total `amount_approved` for each employee. It ignores any buckets
|
||||||
where the amount is null. It detects employees who approve unusually high
|
where the amount is null. It detects employees who approve unusually high
|
||||||
amounts compared to their past behavior.
|
amounts compared to their past behavior.
|
||||||
//For this credit control system analysis, using non_null_sum will ignore
|
|
||||||
//periods where the employees are not active on the system.
|
|
||||||
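
Here too the example body is elided. A sketch of the detector the text
describes, assuming `amount_approved` is summed per `employee`; note the use of
`by_field_name`, since the NOTE above rules out population analysis for this
function:

[source,js]
--------------------------------------------------
{
  "function" : "high_non_null_sum",
  "field_name" : "amount_approved",
  "by_field_name" : "employee"
}
--------------------------------------------------
// NOTCONSOLE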

@@ -14,22 +14,25 @@ The {ml-features} include the following time functions:
 
 [NOTE]
 ====
-* NOTE: You cannot create forecasts for jobs that contain time functions.
-* The `time_of_day` function is not aware of the difference between days, for instance
-work days and weekends. When modeling different days, use the `time_of_week` function.
-In general, the `time_of_week` function is more suited to modeling the behavior of people
-rather than machines, as people vary their behavior according to the day of the week.
-* Shorter bucket spans (for example, 10 minutes) are recommended when performing a
-`time_of_day` or `time_of_week` analysis. The time of the events being modeled are not
-affected by the bucket span, but a shorter bucket span enables quicker alerting on unusual
-events.
-* Unusual events are flagged based on the previous pattern of the data, not on what we
-might think of as unusual based on human experience. So, if events typically occur
-between 3 a.m. and 5 a.m., and event occurring at 3 p.m. is be flagged as unusual.
-* When Daylight Saving Time starts or stops, regular events can be flagged as anomalous.
-This situation occurs because the actual time of the event (as measured against a UTC
-baseline) has changed. This situation is treated as a step change in behavior and the new
-times will be learned quickly.
+* NOTE: You cannot create forecasts for {anomaly-jobs} that contain time
+functions.
+* The `time_of_day` function is not aware of the difference between days, for
+instance work days and weekends. When modeling different days, use the
+`time_of_week` function. In general, the `time_of_week` function is more suited
+to modeling the behavior of people rather than machines, as people vary their
+behavior according to the day of the week.
+* Shorter bucket spans (for example, 10 minutes) are recommended when performing
+a `time_of_day` or `time_of_week` analysis. The time of the events being modeled
+is not affected by the bucket span, but a shorter bucket span enables quicker
+alerting on unusual events.
+* Unusual events are flagged based on the previous pattern of the data, not on
+what we might think of as unusual based on human experience. So, if events
+typically occur between 3 a.m. and 5 a.m., an event occurring at 3 p.m. is
+flagged as unusual.
+* When Daylight Saving Time starts or stops, regular events can be flagged as
+anomalous. This situation occurs because the actual time of the event (as
+measured against a UTC baseline) has changed. This situation is treated as a
+step change in behavior and the new times will be learned quickly.
 ====
 
 [float]
@@ -51,7 +54,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 1: Analyzing events with the time_of_day function
 [source,js]
@@ -63,7 +66,7 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `time_of_day` function in a detector in your job, it
+If you use this `time_of_day` function in a detector in your {anomaly-job}, it
 models when events occur throughout a day for each process. It detects when an
 event occurs for a process that is at an unusual time in the day compared to
 its past behavior.
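
The example body for `time_of_day` is elided above. Since the time functions
model the timestamp of the events rather than a metric field, a detector of this
kind can be as small as the following sketch (the `process` field name is an
assumption taken from the description):

[source,js]
--------------------------------------------------
{
  "function" : "time_of_day",
  "by_field_name" : "process"
}
--------------------------------------------------
// NOTCONSOLE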
@@ -82,7 +85,7 @@ This function supports the following properties:
 * `partition_field_name` (optional)
 
 For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
 
 .Example 2: Analyzing events with the time_of_week function
 [source,js]
@@ -95,7 +98,7 @@ For more information about those properties, see
 --------------------------------------------------
 // NOTCONSOLE
 
-If you use this `time_of_week` function in a detector in your job, it
+If you use this `time_of_week` function in a detector in your {anomaly-job}, it
 models when events occur throughout the week for each `eventcode`. It detects
 when a workstation event occurs at an unusual time during the week for that
 `eventcode` compared to other workstations. It detects events for a
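
The description is cut off at the hunk boundary. A sketch of the `time_of_week`
detector it outlines, assuming `eventcode` as the by field and `workstation` as
the population field:

[source,js]
--------------------------------------------------
{
  "function" : "time_of_week",
  "by_field_name" : "eventcode",
  "over_field_name" : "workstation"
}
--------------------------------------------------
// NOTCONSOLE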

@@ -57,9 +57,9 @@ PUT _ml/anomaly_detectors/population
 in each bucket.
 
 If your data is stored in {es}, you can use the population job wizard in {kib}
-to create a job with these same properties. For example, if you add the sample
-web logs in {kib}, you can use the following job settings in the population job
-wizard:
+to create an {anomaly-job} with these same properties. For example, if you add
+the sample web logs in {kib}, you can use the following job settings in the
+population job wizard:
 
 [role="screenshot"]
 image::images/ml-population-job.jpg["Job settings in the population job wizard"]
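
The `population` job definition referenced in the hunk header is not shown
here. One plausible shape for its population detector, assuming the {kib}
sample web logs fields `bytes` and `clientip`:

[source,js]
--------------------------------------------------
{
  "function" : "mean",
  "field_name" : "bytes",
  "over_field_name" : "clientip"
}
--------------------------------------------------
// NOTCONSOLE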

@@ -1,22 +1,22 @@
 [role="xpack"]
 [[stopping-ml]]
-== Stopping machine learning
+== Stopping {ml} {anomaly-detect}
 
-An orderly shutdown of {ml} ensures that:
+An orderly shutdown ensures that:
 
 * {dfeeds-cap} are stopped
 * Buffers are flushed
 * Model history is pruned
 * Final results are calculated
 * Model snapshots are saved
-* Jobs are closed
+* {anomaly-jobs-cap} are closed
 
 This process ensures that jobs are in a consistent state in case you want to
 subsequently re-open them.
 
 [float]
 [[stopping-ml-datafeeds]]
-=== Stopping {dfeeds-cap}
+=== Stopping {dfeeds}
 
 When you stop a {dfeed}, it ceases to retrieve data from {es}. You can stop a
 {dfeed} by using {kib} or the
@@ -25,7 +25,7 @@ request stops the `feed1` {dfeed}:
 
 [source,js]
 --------------------------------------------------
-POST _ml/datafeeds/datafeed-total-requests/_stop
+POST _ml/datafeeds/feed1/_stop
 --------------------------------------------------
 // CONSOLE
 // TEST[skip:setup:server_metrics_startdf]
@@ -39,7 +39,7 @@ A {dfeed} can be started and stopped multiple times throughout its lifecycle.
 
 [float]
 [[stopping-all-ml-datafeeds]]
-==== Stopping All {dfeeds-cap}
+==== Stopping all {dfeeds}
 
 If you are upgrading your cluster, you can use the following request to stop all
 {dfeeds}:
@@ -53,19 +53,20 @@ POST _ml/datafeeds/_all/_stop
 
 [float]
 [[closing-ml-jobs]]
-=== Closing Jobs
+=== Closing {anomaly-jobs}
 
-When you close a job, it cannot receive data or perform analysis operations.
-If a job is associated with a {dfeed}, you must stop the {dfeed} before you can
-close the jobs. If the {dfeed} has an end date, the job closes automatically on
-that end date.
+When you close an {anomaly-job}, it cannot receive data or perform analysis
+operations. If a job is associated with a {dfeed}, you must stop the {dfeed}
+before you can close the job. If the {dfeed} has an end date, the job closes
+automatically on that end date.
 
-You can close a job by using the {ref}/ml-close-job.html[close job API]. For
+You can close a job by using the
+{ref}/ml-close-job.html[close {anomaly-job} API]. For
 example, the following request closes the `job1` job:
 
 [source,js]
 --------------------------------------------------
-POST _ml/anomaly_detectors/total-requests/_close
+POST _ml/anomaly_detectors/job1/_close
 --------------------------------------------------
 // CONSOLE
 // TEST[skip:setup:server_metrics_openjob]
@@ -73,14 +74,15 @@ POST _ml/anomaly_detectors/total-requests/_close
 NOTE: You must have `manage_ml` or `manage` cluster privileges to stop {dfeeds}.
 For more information, see <<security-privileges>>.
 
-A job can be opened and closed multiple times throughout its lifecycle.
+{anomaly-jobs-cap} can be opened and closed multiple times throughout their
+lifecycle.
 
 [float]
 [[closing-all-ml-datafeeds]]
-==== Closing All Jobs
+==== Closing all {anomaly-jobs}
 
 If you are upgrading your cluster, you can use the following request to close
-all open jobs on the cluster:
+all open {anomaly-jobs} on the cluster:
 
 [source,js]
 ----------------------------------
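
The request itself falls outside the hunk, but given the `_all` pattern used
for {dfeeds} above, it is presumably:

[source,js]
----------------------------------
POST _ml/anomaly_detectors/_all/_close
----------------------------------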

@@ -7,9 +7,9 @@ it is analyzed. {dfeeds-cap} contain an optional `script_fields` property, where
 you can specify scripts that evaluate custom expressions and return script
 fields.
 
-If your {dfeed} defines script fields, you can use those fields in your job.
-For example, you can use the script fields in the analysis functions in one or
-more detectors.
+If your {dfeed} defines script fields, you can use those fields in your
+{anomaly-job}. For example, you can use the script fields in the analysis
+functions in one or more detectors.
 
 * <<ml-configuring-transform1>>
 * <<ml-configuring-transform2>>
@@ -146,12 +146,14 @@ PUT _ml/datafeeds/datafeed-test1
 within the job.
 <2> The script field is defined in the {dfeed}.
 
-This `test1` job contains a detector that uses a script field in a mean analysis
-function. The `datafeed-test1` {dfeed} defines the script field. It contains a
-script that adds two fields in the document to produce a "total" error count.
+This `test1` {anomaly-job} contains a detector that uses a script field in a
+mean analysis function. The `datafeed-test1` {dfeed} defines the script field.
+It contains a script that adds two fields in the document to produce a "total"
+error count.
 
 The syntax for the `script_fields` property is identical to that used by {es}.
-For more information, see {ref}/search-request-body.html#request-body-search-script-fields[Script Fields].
+For more information, see
+{ref}/search-request-body.html#request-body-search-script-fields[Script fields].
 
 You can preview the contents of the {dfeed} by using the following API:
 
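
The `script_fields` definition itself sits outside this hunk. A sketch of the
kind of Painless script field the text describes; the names `total_error_count`,
`error_count`, and `aborted_count` are assumptions based on the "total" error
count wording:

[source,js]
--------------------------------------------------
"script_fields": {
  "total_error_count": {
    "script": {
      "lang": "painless",
      "source": "doc['error_count'].value + doc['aborted_count'].value"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

The preview call mentioned above would be along the lines of
`GET _ml/datafeeds/datafeed-test1/_preview`.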
@@ -181,15 +183,15 @@ insufficient data to generate meaningful results.
 //For a full demonstration of
 //how to create jobs with sample data, see <<ml-getting-started>>.
 
-You can alternatively use {kib} to create an advanced job that uses script
-fields. To add the `script_fields` property to your {dfeed}, you must use the
-**Edit JSON** tab. For example:
+You can alternatively use {kib} to create an advanced {anomaly-job} that uses
+script fields. To add the `script_fields` property to your {dfeed}, you must use
+the **Edit JSON** tab. For example:
 
 [role="screenshot"]
 image::images/ml-scriptfields.jpg[Adding script fields to a {dfeed} in {kib}]
 
 [[ml-configuring-transform-examples]]
-==== Common Script Field Examples
+==== Common script field examples
 
 While the possibilities are limitless, there are a number of common scenarios
 where you might use script fields in your {dfeeds}.
@@ -199,7 +201,7 @@ where you might use script fields in your {dfeeds}.
 Some of these examples use regular expressions. By default, regular
 expressions are disabled because they circumvent the protection that Painless
 provides against long running and memory hungry scripts. For more information,
-see {ref}/modules-scripting-painless.html[Painless Scripting Language].
+see {ref}/modules-scripting-painless.html[Painless scripting language].
 
 Machine learning analysis is case sensitive. For example, "John" is considered
 to be different than "john". This is one reason you might consider using scripts
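
The sentence is truncated at the end of the diff; it presumably continues into
the case-normalization examples. A sketch of a lowercasing script field of that
kind (the `username` field name is an assumption):

[source,js]
--------------------------------------------------
"script_fields": {
  "lowercase_username": {
    "script": {
      "lang": "painless",
      "source": "doc['username'].value.toLowerCase()"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE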
|
Loading…
x
Reference in New Issue
Block a user