This commit is contained in:
parent
8dc5880c3f
commit
fb212269ce
|
@ -26,7 +26,7 @@ apply plugin: 'elasticsearch.rest-resources'
|
||||||
|
|
||||||
/* List of files that have snippets that will not work until platinum tests can occur ... */
|
/* List of files that have snippets that will not work until platinum tests can occur ... */
|
||||||
buildRestTests.expectedUnconvertedCandidates = [
|
buildRestTests.expectedUnconvertedCandidates = [
|
||||||
'reference/ml/anomaly-detection/transforms.asciidoc',
|
'reference/ml/anomaly-detection/ml-configuring-transform.asciidoc',
|
||||||
'reference/ml/anomaly-detection/apis/delete-calendar-event.asciidoc',
|
'reference/ml/anomaly-detection/apis/delete-calendar-event.asciidoc',
|
||||||
'reference/ml/anomaly-detection/apis/get-bucket.asciidoc',
|
'reference/ml/anomaly-detection/apis/get-bucket.asciidoc',
|
||||||
'reference/ml/anomaly-detection/apis/get-category.asciidoc',
|
'reference/ml/anomaly-detection/apis/get-category.asciidoc',
|
||||||
|
|
|
@ -1,52 +0,0 @@
|
||||||
[role="xpack"]
|
|
||||||
[[ml-configuring]]
|
|
||||||
== Configuring machine learning
|
|
||||||
|
|
||||||
If you want to use {ml-features}, there must be at least one {ml} node in
|
|
||||||
your cluster and all master-eligible nodes must have {ml} enabled. By default,
|
|
||||||
all nodes are {ml} nodes. For more information about these settings, see
|
|
||||||
{ref}/modules-node.html#ml-node[{ml} nodes].
|
|
||||||
|
|
||||||
To use the {ml-features} to analyze your data, you can create an {anomaly-job}
|
|
||||||
and send your data to that job.
|
|
||||||
|
|
||||||
* If your data is stored in {es}:
|
|
||||||
|
|
||||||
** You can create a {dfeed}, which retrieves data from {es} for analysis.
|
|
||||||
** You can use {kib} to expedite the creation of jobs and {dfeeds}.
|
|
||||||
|
|
||||||
* If your data is not stored in {es}, you can
|
|
||||||
{ref}/ml-post-data.html[POST data] from any source directly to an API.
|
|
||||||
|
|
||||||
The results of {ml} analysis are stored in {es} and you can use {kib} to help
|
|
||||||
you visualize and explore the results.
|
|
||||||
|
|
||||||
//For a tutorial that walks you through these configuration steps,
|
|
||||||
//see <<ml-getting-started>>.
|
|
||||||
|
|
||||||
Though it is quite simple to analyze your data and provide quick {ml} results,
|
|
||||||
gaining deep insights might require some additional planning and configuration.
|
|
||||||
The scenarios in this section describe some best practices for generating useful
|
|
||||||
{ml} results and insights from your data.
|
|
||||||
|
|
||||||
* <<ml-configuring-url>>
|
|
||||||
* <<ml-configuring-aggregation>>
|
|
||||||
* <<ml-configuring-categories>>
|
|
||||||
* <<ml-configuring-detector-custom-rules>>
|
|
||||||
* <<ml-configuring-pop>>
|
|
||||||
* <<ml-configuring-transform>>
|
|
||||||
* <<ml-delayed-data-detection>>
|
|
||||||
|
|
||||||
include::customurl.asciidoc[]
|
|
||||||
|
|
||||||
include::aggregations.asciidoc[]
|
|
||||||
|
|
||||||
include::detector-custom-rules.asciidoc[]
|
|
||||||
|
|
||||||
include::categories.asciidoc[]
|
|
||||||
|
|
||||||
include::populations.asciidoc[]
|
|
||||||
|
|
||||||
include::transforms.asciidoc[]
|
|
||||||
|
|
||||||
include::delayed-data-detection.asciidoc[]
|
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-count-functions]]
|
[[ml-count-functions]]
|
||||||
=== Count functions
|
= Count functions
|
||||||
|
|
||||||
Count functions detect anomalies when the number of events in a bucket is
|
Count functions detect anomalies when the number of events in a bucket is
|
||||||
anomalous.
|
anomalous.
|
||||||
|
@ -22,7 +22,7 @@ The {ml-features} include the following count functions:
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-count]]
|
[[ml-count]]
|
||||||
===== Count, high_count, low_count
|
== Count, high_count, low_count
|
||||||
|
|
||||||
The `count` function detects anomalies when the number of events in a bucket is
|
The `count` function detects anomalies when the number of events in a bucket is
|
||||||
anomalous.
|
anomalous.
|
||||||
|
@ -145,7 +145,7 @@ and the `summary_count_field_name` property. For more information, see
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-nonzero-count]]
|
[[ml-nonzero-count]]
|
||||||
===== Non_zero_count, high_non_zero_count, low_non_zero_count
|
== Non_zero_count, high_non_zero_count, low_non_zero_count
|
||||||
|
|
||||||
The `non_zero_count` function detects anomalies when the number of events in a
|
The `non_zero_count` function detects anomalies when the number of events in a
|
||||||
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
|
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
|
||||||
|
@ -215,7 +215,7 @@ data is sparse, use the `count` functions, which are optimized for that scenario
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-distinct-count]]
|
[[ml-distinct-count]]
|
||||||
===== Distinct_count, high_distinct_count, low_distinct_count
|
== Distinct_count, high_distinct_count, low_distinct_count
|
||||||
|
|
||||||
The `distinct_count` function detects anomalies where the number of distinct
|
The `distinct_count` function detects anomalies where the number of distinct
|
||||||
values in one field is unusual.
|
values in one field is unusual.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-functions]]
|
[[ml-functions]]
|
||||||
== Function reference
|
= Function reference
|
||||||
|
|
||||||
The {ml-features} include analysis functions that provide a wide variety of
|
The {ml-features} include analysis functions that provide a wide variety of
|
||||||
flexible ways to analyze data for anomalies.
|
flexible ways to analyze data for anomalies.
|
||||||
|
@ -41,17 +41,3 @@ These functions effectively ignore empty buckets.
|
||||||
* <<ml-rare-functions>>
|
* <<ml-rare-functions>>
|
||||||
* <<ml-sum-functions>>
|
* <<ml-sum-functions>>
|
||||||
* <<ml-time-functions>>
|
* <<ml-time-functions>>
|
||||||
|
|
||||||
include::functions/count.asciidoc[]
|
|
||||||
|
|
||||||
include::functions/geo.asciidoc[]
|
|
||||||
|
|
||||||
include::functions/info.asciidoc[]
|
|
||||||
|
|
||||||
include::functions/metric.asciidoc[]
|
|
||||||
|
|
||||||
include::functions/rare.asciidoc[]
|
|
||||||
|
|
||||||
include::functions/sum.asciidoc[]
|
|
||||||
|
|
||||||
include::functions/time.asciidoc[]
|
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-geo-functions]]
|
[[ml-geo-functions]]
|
||||||
=== Geographic functions
|
= Geographic functions
|
||||||
|
|
||||||
The geographic functions detect anomalies in the geographic location of the
|
The geographic functions detect anomalies in the geographic location of the
|
||||||
input data.
|
input data.
|
||||||
|
@ -13,7 +13,7 @@ geographic functions.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-lat-long]]
|
[[ml-lat-long]]
|
||||||
==== Lat_long
|
== Lat_long
|
||||||
|
|
||||||
The `lat_long` function detects anomalies in the geographic location of the
|
The `lat_long` function detects anomalies in the geographic location of the
|
||||||
input data.
|
input data.
|
|
@ -1,5 +1,5 @@
|
||||||
[[ml-info-functions]]
|
[[ml-info-functions]]
|
||||||
=== Information Content Functions
|
= Information Content Functions
|
||||||
|
|
||||||
The information content functions detect anomalies in the amount of information
|
The information content functions detect anomalies in the amount of information
|
||||||
that is contained in strings within a bucket. These functions can be used as
|
that is contained in strings within a bucket. These functions can be used as
|
||||||
|
@ -12,7 +12,7 @@ The {ml-features} include the following information content functions:
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-info-content]]
|
[[ml-info-content]]
|
||||||
==== Info_content, High_info_content, Low_info_content
|
== Info_content, High_info_content, Low_info_content
|
||||||
|
|
||||||
The `info_content` function detects anomalies in the amount of information that
|
The `info_content` function detects anomalies in the amount of information that
|
||||||
is contained in strings in a bucket.
|
is contained in strings in a bucket.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-metric-functions]]
|
[[ml-metric-functions]]
|
||||||
=== Metric functions
|
= Metric functions
|
||||||
|
|
||||||
The metric functions include functions such as mean, min and max. These values
|
The metric functions include functions such as mean, min and max. These values
|
||||||
are calculated for each bucket. Field values that cannot be converted to
|
are calculated for each bucket. Field values that cannot be converted to
|
||||||
|
@ -20,7 +20,7 @@ function.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-metric-min]]
|
[[ml-metric-min]]
|
||||||
==== Min
|
== Min
|
||||||
|
|
||||||
The `min` function detects anomalies in the arithmetic minimum of a value.
|
The `min` function detects anomalies in the arithmetic minimum of a value.
|
||||||
The minimum value is calculated for each bucket.
|
The minimum value is calculated for each bucket.
|
||||||
|
@ -55,7 +55,7 @@ entry mistakes. It models the minimum amount for each product over time.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-metric-max]]
|
[[ml-metric-max]]
|
||||||
==== Max
|
== Max
|
||||||
|
|
||||||
The `max` function detects anomalies in the arithmetic maximum of a value.
|
The `max` function detects anomalies in the arithmetic maximum of a value.
|
||||||
The maximum value is calculated for each bucket.
|
The maximum value is calculated for each bucket.
|
||||||
|
@ -113,7 +113,7 @@ response times for each bucket.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-metric-median]]
|
[[ml-metric-median]]
|
||||||
==== Median, high_median, low_median
|
== Median, high_median, low_median
|
||||||
|
|
||||||
The `median` function detects anomalies in the statistical median of a value.
|
The `median` function detects anomalies in the statistical median of a value.
|
||||||
The median value is calculated for each bucket.
|
The median value is calculated for each bucket.
|
||||||
|
@ -151,7 +151,7 @@ median `responsetime` is unusual compared to previous `responsetime` values.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-metric-mean]]
|
[[ml-metric-mean]]
|
||||||
==== Mean, high_mean, low_mean
|
== Mean, high_mean, low_mean
|
||||||
|
|
||||||
The `mean` function detects anomalies in the arithmetic mean of a value.
|
The `mean` function detects anomalies in the arithmetic mean of a value.
|
||||||
The mean value is calculated for each bucket.
|
The mean value is calculated for each bucket.
|
||||||
|
@ -221,7 +221,7 @@ values.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-metric-metric]]
|
[[ml-metric-metric]]
|
||||||
==== Metric
|
== Metric
|
||||||
|
|
||||||
The `metric` function combines `min`, `max`, and `mean` functions. You can use
|
The `metric` function combines `min`, `max`, and `mean` functions. You can use
|
||||||
it as a shorthand for a combined analysis. If you do not specify a function in
|
it as a shorthand for a combined analysis. If you do not specify a function in
|
||||||
|
@ -258,7 +258,7 @@ when the mean, min, or max `responsetime` is unusual compared to previous
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-metric-varp]]
|
[[ml-metric-varp]]
|
||||||
==== Varp, high_varp, low_varp
|
== Varp, high_varp, low_varp
|
||||||
|
|
||||||
The `varp` function detects anomalies in the variance of a value which is a
|
The `varp` function detects anomalies in the variance of a value which is a
|
||||||
measure of the variability and spread in the data.
|
measure of the variability and spread in the data.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-rare-functions]]
|
[[ml-rare-functions]]
|
||||||
=== Rare functions
|
= Rare functions
|
||||||
|
|
||||||
The rare functions detect values that occur rarely in time or rarely for a
|
The rare functions detect values that occur rarely in time or rarely for a
|
||||||
population.
|
population.
|
||||||
|
@ -35,7 +35,7 @@ The {ml-features} include the following rare functions:
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-rare]]
|
[[ml-rare]]
|
||||||
==== Rare
|
== Rare
|
||||||
|
|
||||||
The `rare` function detects values that occur rarely in time or rarely for a
|
The `rare` function detects values that occur rarely in time or rarely for a
|
||||||
population. It detects anomalies according to the number of distinct rare values.
|
population. It detects anomalies according to the number of distinct rare values.
|
||||||
|
@ -93,7 +93,7 @@ is rare, even if it occurs for that client IP in every bucket.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-freq-rare]]
|
[[ml-freq-rare]]
|
||||||
==== Freq_rare
|
== Freq_rare
|
||||||
|
|
||||||
The `freq_rare` function detects values that occur rarely for a population.
|
The `freq_rare` function detects values that occur rarely for a population.
|
||||||
It detects anomalies according to the number of times (frequency) that rare
|
It detects anomalies according to the number of times (frequency) that rare
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-sum-functions]]
|
[[ml-sum-functions]]
|
||||||
=== Sum functions
|
= Sum functions
|
||||||
|
|
||||||
The sum functions detect anomalies when the sum of a field in a bucket is
|
The sum functions detect anomalies when the sum of a field in a bucket is
|
||||||
anomalous.
|
anomalous.
|
||||||
|
@ -19,7 +19,7 @@ The {ml-features} include the following sum functions:
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-sum]]
|
[[ml-sum]]
|
||||||
==== Sum, high_sum, low_sum
|
== Sum, high_sum, low_sum
|
||||||
|
|
||||||
The `sum` function detects anomalies where the sum of a field in a bucket is
|
The `sum` function detects anomalies where the sum of a field in a bucket is
|
||||||
anomalous.
|
anomalous.
|
||||||
|
@ -75,7 +75,7 @@ to find users that are abusing internet privileges.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-nonnull-sum]]
|
[[ml-nonnull-sum]]
|
||||||
==== Non_null_sum, high_non_null_sum, low_non_null_sum
|
== Non_null_sum, high_non_null_sum, low_non_null_sum
|
||||||
|
|
||||||
The `non_null_sum` function is useful if your data is sparse. Buckets without
|
The `non_null_sum` function is useful if your data is sparse. Buckets without
|
||||||
values are ignored and buckets with a zero value are analyzed.
|
values are ignored and buckets with a zero value are analyzed.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-time-functions]]
|
[[ml-time-functions]]
|
||||||
=== Time functions
|
= Time functions
|
||||||
|
|
||||||
The time functions detect events that happen at unusual times, either of the day
|
The time functions detect events that happen at unusual times, either of the day
|
||||||
or of the week. These functions can be used to find unusual patterns of behavior,
|
or of the week. These functions can be used to find unusual patterns of behavior,
|
||||||
|
@ -37,7 +37,7 @@ step change in behavior and the new times will be learned quickly.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-time-of-day]]
|
[[ml-time-of-day]]
|
||||||
==== Time_of_day
|
== Time_of_day
|
||||||
|
|
||||||
The `time_of_day` function detects when events occur that are outside normal
|
The `time_of_day` function detects when events occur that are outside normal
|
||||||
usage patterns. For example, it detects unusual activity in the middle of the
|
usage patterns. For example, it detects unusual activity in the middle of the
|
||||||
|
@ -73,7 +73,7 @@ its past behavior.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-time-of-week]]
|
[[ml-time-of-week]]
|
||||||
==== Time_of_week
|
== Time_of_week
|
||||||
|
|
||||||
The `time_of_week` function detects when events occur that are outside normal
|
The `time_of_week` function detects when events occur that are outside normal
|
||||||
usage patterns. For example, it detects login events on the weekend.
|
usage patterns. For example, it detects login events on the weekend.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-configuring-aggregation]]
|
[[ml-configuring-aggregation]]
|
||||||
=== Aggregating data for faster performance
|
= Aggregating data for faster performance
|
||||||
|
|
||||||
By default, {dfeeds} fetch data from {es} using search and scroll requests.
|
By default, {dfeeds} fetch data from {es} using search and scroll requests.
|
||||||
It can be significantly more efficient, however, to aggregate data in {es}
|
It can be significantly more efficient, however, to aggregate data in {es}
|
||||||
|
@ -17,7 +17,7 @@ search and scroll behavior.
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[aggs-limits-dfeeds]]
|
[[aggs-limits-dfeeds]]
|
||||||
==== Requirements and limitations
|
== Requirements and limitations
|
||||||
|
|
||||||
There are some limitations to using aggregations in {dfeeds}. Your aggregation
|
There are some limitations to using aggregations in {dfeeds}. Your aggregation
|
||||||
must include a `date_histogram` aggregation, which in turn must contain a `max`
|
must include a `date_histogram` aggregation, which in turn must contain a `max`
|
||||||
|
@ -48,7 +48,7 @@ functions, set the interval to the same value as the bucket span.
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[aggs-include-jobs]]
|
[[aggs-include-jobs]]
|
||||||
==== Including aggregations in {anomaly-jobs}
|
== Including aggregations in {anomaly-jobs}
|
||||||
|
|
||||||
When you create or update an {anomaly-job}, you can include the names of
|
When you create or update an {anomaly-job}, you can include the names of
|
||||||
aggregations, for example:
|
aggregations, for example:
|
||||||
|
@ -134,7 +134,7 @@ that match values in the job configuration are fed to the job.
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[aggs-dfeeds]]
|
[[aggs-dfeeds]]
|
||||||
==== Nested aggregations in {dfeeds}
|
== Nested aggregations in {dfeeds}
|
||||||
|
|
||||||
{dfeeds-cap} support complex nested aggregations. This example uses the
|
{dfeeds-cap} support complex nested aggregations. This example uses the
|
||||||
`derivative` pipeline aggregation to find the first order derivative of the
|
`derivative` pipeline aggregation to find the first order derivative of the
|
||||||
|
@ -180,7 +180,7 @@ counter `system.network.out.bytes` for each value of the field `beat.name`.
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[aggs-single-dfeeds]]
|
[[aggs-single-dfeeds]]
|
||||||
==== Single bucket aggregations in {dfeeds}
|
== Single bucket aggregations in {dfeeds}
|
||||||
|
|
||||||
{dfeeds-cap} support not only multi-bucket aggregations, but also single bucket
|
{dfeeds-cap} support not only multi-bucket aggregations, but also single bucket
|
||||||
aggregations. The following shows two `filter` aggregations, each gathering the
|
aggregations. The following shows two `filter` aggregations, each gathering the
|
||||||
|
@ -232,7 +232,7 @@ number of unique entries for the `error` field.
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[aggs-define-dfeeds]]
|
[[aggs-define-dfeeds]]
|
||||||
==== Defining aggregations in {dfeeds}
|
== Defining aggregations in {dfeeds}
|
||||||
|
|
||||||
When you define an aggregation in a {dfeed}, it must have the following form:
|
When you define an aggregation in a {dfeed}, it must have the following form:
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[testenv="platinum"]
|
[testenv="platinum"]
|
||||||
[[ml-configuring-categories]]
|
[[ml-configuring-categories]]
|
||||||
=== Detecting anomalous categories of data
|
= Detecting anomalous categories of data
|
||||||
|
|
||||||
Categorization is a {ml} process that tokenizes a text field, clusters similar
|
Categorization is a {ml} process that tokenizes a text field, clusters similar
|
||||||
data together, and classifies it into categories. It works best on
|
data together, and classifies it into categories. It works best on
|
||||||
|
@ -100,7 +100,7 @@ SQL statement from the categorization algorithm.
|
||||||
|
|
||||||
[discrete]
|
[discrete]
|
||||||
[[ml-configuring-analyzer]]
|
[[ml-configuring-analyzer]]
|
||||||
==== Customizing the categorization analyzer
|
== Customizing the categorization analyzer
|
||||||
|
|
||||||
Categorization uses English dictionary words to identify log message categories.
|
Categorization uses English dictionary words to identify log message categories.
|
||||||
By default, it also uses English tokenization rules. For this reason, if you use
|
By default, it also uses English tokenization rules. For this reason, if you use
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-configuring-detector-custom-rules]]
|
[[ml-configuring-detector-custom-rules]]
|
||||||
=== Customizing detectors with custom rules
|
= Customizing detectors with custom rules
|
||||||
|
|
||||||
<<ml-rules,Custom rules>> enable you to change the behavior of anomaly
|
<<ml-rules,Custom rules>> enable you to change the behavior of anomaly
|
||||||
detectors based on domain-specific knowledge.
|
detectors based on domain-specific knowledge.
|
||||||
|
@ -15,7 +15,7 @@ scope and conditions. For the full list of specification details, see the
|
||||||
{anomaly-jobs} API.
|
{anomaly-jobs} API.
|
||||||
|
|
||||||
[[ml-custom-rules-scope]]
|
[[ml-custom-rules-scope]]
|
||||||
==== Specifying custom rule scope
|
== Specifying custom rule scope
|
||||||
|
|
||||||
Let us assume we are configuring an {anomaly-job} in order to detect DNS data
|
Let us assume we are configuring an {anomaly-job} in order to detect DNS data
|
||||||
exfiltration. Our data contain fields "subdomain" and "highest_registered_domain".
|
exfiltration. Our data contain fields "subdomain" and "highest_registered_domain".
|
||||||
|
@ -131,7 +131,7 @@ Such a detector will skip results when the values of all 3 scoped fields
|
||||||
are included in the referenced filters.
|
are included in the referenced filters.
|
||||||
|
|
||||||
[[ml-custom-rules-conditions]]
|
[[ml-custom-rules-conditions]]
|
||||||
==== Specifying custom rule conditions
|
== Specifying custom rule conditions
|
||||||
|
|
||||||
Imagine a detector that looks for anomalies in CPU utilization.
|
Imagine a detector that looks for anomalies in CPU utilization.
|
||||||
Given a machine that is idle for long enough, small movement in CPU could
|
Given a machine that is idle for long enough, small movement in CPU could
|
||||||
|
@ -211,7 +211,7 @@ PUT _ml/anomaly_detectors/rule_with_range
|
||||||
// TEST[skip:needs-licence]
|
// TEST[skip:needs-licence]
|
||||||
|
|
||||||
[[ml-custom-rules-lifecycle]]
|
[[ml-custom-rules-lifecycle]]
|
||||||
==== Custom rules in the lifecycle of a job
|
== Custom rules in the lifecycle of a job
|
||||||
|
|
||||||
Custom rules only affect results created after the rules were applied.
|
Custom rules only affect results created after the rules were applied.
|
||||||
Let us imagine that we have configured an {anomaly-job} and it has been running
|
Let us imagine that we have configured an {anomaly-job} and it has been running
|
||||||
|
@ -222,7 +222,7 @@ rule we added will only be in effect for any results created from the moment we
|
||||||
added the rule onwards. Past results will remain unaffected.
|
added the rule onwards. Past results will remain unaffected.
|
||||||
|
|
||||||
[[ml-custom-rules-filtering]]
|
[[ml-custom-rules-filtering]]
|
||||||
==== Using custom rules vs. filtering data
|
== Using custom rules vs. filtering data
|
||||||
|
|
||||||
It might appear like using rules is just another way of filtering the data
|
It might appear like using rules is just another way of filtering the data
|
||||||
that feeds into an {anomaly-job}. For example, a rule that skips results when
|
that feeds into an {anomaly-job}. For example, a rule that skips results when
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-configuring-pop]]
|
[[ml-configuring-populations]]
|
||||||
=== Performing population analysis
|
= Performing population analysis
|
||||||
|
|
||||||
Entities or events in your data can be considered anomalous when:
|
Entities or events in your data can be considered anomalous when:
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-configuring-transform]]
|
[[ml-configuring-transform]]
|
||||||
=== Transforming data with script fields
|
= Transforming data with script fields
|
||||||
|
|
||||||
If you use {dfeeds}, you can add scripts to transform your data before
|
If you use {dfeeds}, you can add scripts to transform your data before
|
||||||
it is analyzed. {dfeeds-cap} contain an optional `script_fields` property, where
|
it is analyzed. {dfeeds-cap} contain an optional `script_fields` property, where
|
||||||
|
@ -190,7 +190,7 @@ the **Edit JSON** tab. For example:
|
||||||
image::images/ml-scriptfields.jpg[Adding script fields to a {dfeed} in {kib}]
|
image::images/ml-scriptfields.jpg[Adding script fields to a {dfeed} in {kib}]
|
||||||
|
|
||||||
[[ml-configuring-transform-examples]]
|
[[ml-configuring-transform-examples]]
|
||||||
==== Common script field examples
|
== Common script field examples
|
||||||
|
|
||||||
While the possibilities are limitless, there are a number of common scenarios
|
While the possibilities are limitless, there are a number of common scenarios
|
||||||
where you might use script fields in your {dfeeds}.
|
where you might use script fields in your {dfeeds}.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-configuring-url]]
|
[[ml-configuring-url]]
|
||||||
=== Adding custom URLs to machine learning results
|
= Adding custom URLs to machine learning results
|
||||||
|
|
||||||
When you create an advanced {anomaly-job} or edit any {anomaly-jobs} in {kib},
|
When you create an advanced {anomaly-job} or edit any {anomaly-jobs} in {kib},
|
||||||
you can optionally attach one or more custom URLs.
|
you can optionally attach one or more custom URLs.
|
||||||
|
@ -49,7 +49,7 @@ You can also specify these custom URL settings when you create or update
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[ml-configuring-url-strings]]
|
[[ml-configuring-url-strings]]
|
||||||
==== String substitution in custom URLs
|
== String substitution in custom URLs
|
||||||
|
|
||||||
You can use dollar sign ($) delimited tokens in a custom URL. These tokens are
|
You can use dollar sign ($) delimited tokens in a custom URL. These tokens are
|
||||||
substituted for the values of the corresponding fields in the anomaly records.
|
substituted for the values of the corresponding fields in the anomaly records.
|
|
@ -1,6 +1,6 @@
|
||||||
[role="xpack"]
|
[role="xpack"]
|
||||||
[[ml-delayed-data-detection]]
|
[[ml-delayed-data-detection]]
|
||||||
=== Handling delayed data
|
= Handling delayed data
|
||||||
|
|
||||||
Delayed data are documents that are indexed late. That is to say, it is data
|
Delayed data are documents that are indexed late. That is to say, it is data
|
||||||
related to a time that the {dfeed} has already processed.
|
related to a time that the {dfeed} has already processed.
|
||||||
|
@ -15,7 +15,7 @@ if it is set too high, analysis drifts farther away from real-time. The balance
|
||||||
that is struck depends upon each use case and the environmental factors of the
|
that is struck depends upon each use case and the environmental factors of the
|
||||||
cluster.
|
cluster.
|
||||||
|
|
||||||
==== Why worry about delayed data?
|
== Why worry about delayed data?
|
||||||
|
|
||||||
This is a particularly prescient question. If data are delayed randomly (and
|
This is a particularly prescient question. If data are delayed randomly (and
|
||||||
consequently are missing from analysis), the results of certain types of
|
consequently are missing from analysis), the results of certain types of
|
||||||
|
@ -27,7 +27,7 @@ however, {anomaly-jobs} with a `low_count` function may provide false positives.
|
||||||
In this situation, it would be useful to see if data comes in after an anomaly is
|
In this situation, it would be useful to see if data comes in after an anomaly is
|
||||||
recorded so that you can determine a next course of action.
|
recorded so that you can determine a next course of action.
|
||||||
|
|
||||||
==== How do we detect delayed data?
|
== How do we detect delayed data?
|
||||||
|
|
||||||
In addition to the `query_delay` field, there is a delayed data check config,
|
In addition to the `query_delay` field, there is a delayed data check config,
|
||||||
which enables you to configure the datafeed to look in the past for delayed data.
|
which enables you to configure the datafeed to look in the past for delayed data.
|
||||||
|
@ -41,7 +41,7 @@ arrived since the analysis. If there is indeed missing data due to their ingest
|
||||||
delay, the end user is notified. For example, you can see annotations in {kib}
|
delay, the end user is notified. For example, you can see annotations in {kib}
|
||||||
for the periods where these delays occur.
|
for the periods where these delays occur.
|
||||||
|
|
||||||
==== What to do about delayed data?
|
== What to do about delayed data?
|
||||||
|
|
||||||
The most common course of action is simply to do nothing. For many functions
|
The most common course of action is simply to do nothing. For many functions
|
||||||
and situations, ignoring the data is acceptable. However, if the amount of
|
and situations, ignoring the data is acceptable. However, if the amount of
|
|
@ -1,88 +0,0 @@
|
||||||
[role="xpack"]
|
|
||||||
[[stopping-ml]]
|
|
||||||
== Stopping {ml} {anomaly-detect}
|
|
||||||
|
|
||||||
An orderly shutdown ensures that:
|
|
||||||
|
|
||||||
* {dfeeds-cap} are stopped
|
|
||||||
* Buffers are flushed
|
|
||||||
* Model history is pruned
|
|
||||||
* Final results are calculated
|
|
||||||
* Model snapshots are saved
|
|
||||||
* {anomaly-jobs-cap} are closed
|
|
||||||
|
|
||||||
This process ensures that jobs are in a consistent state in case you want to
|
|
||||||
subsequently re-open them.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[stopping-ml-datafeeds]]
|
|
||||||
=== Stopping {dfeeds}
|
|
||||||
|
|
||||||
When you stop a {dfeed}, it ceases to retrieve data from {es}. You can stop a
|
|
||||||
{dfeed} by using {kib} or the
|
|
||||||
{ref}/ml-stop-datafeed.html[stop {dfeeds} API]. For example, the following
|
|
||||||
request stops the `feed1` {dfeed}:
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
--------------------------------------------------
|
|
||||||
POST _ml/datafeeds/feed1/_stop
|
|
||||||
--------------------------------------------------
|
|
||||||
// TEST[skip:setup:server_metrics_startdf]
|
|
||||||
|
|
||||||
NOTE: You must have `manage_ml` or `manage` cluster privileges to stop {dfeeds}.
|
|
||||||
For more information, see {ref}/security-privileges.html[Security privileges].
|
|
||||||
|
|
||||||
A {dfeed} can be started and stopped multiple times throughout its lifecycle.
|
|
||||||
|
|
||||||
//For examples of stopping {dfeeds} in {kib}, see <<ml-gs-job1-manage>>.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[stopping-all-ml-datafeeds]]
|
|
||||||
==== Stopping all {dfeeds}
|
|
||||||
|
|
||||||
If you are upgrading your cluster, you can use the following request to stop all
|
|
||||||
{dfeeds}:
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
----------------------------------
|
|
||||||
POST _ml/datafeeds/_all/_stop
|
|
||||||
----------------------------------
|
|
||||||
// TEST[skip:needs-licence]
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[closing-ml-jobs]]
|
|
||||||
=== Closing {anomaly-jobs}
|
|
||||||
|
|
||||||
When you close an {anomaly-job}, it cannot receive data or perform analysis
|
|
||||||
operations. If a job is associated with a {dfeed}, you must stop the {dfeed}
|
|
||||||
before you can close the job. If the {dfeed} has an end date, the job closes
|
|
||||||
automatically on that end date.
|
|
||||||
|
|
||||||
You can close a job by using the
|
|
||||||
{ref}/ml-close-job.html[close {anomaly-job} API]. For
|
|
||||||
example, the following request closes the `job1` job:
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
--------------------------------------------------
|
|
||||||
POST _ml/anomaly_detectors/job1/_close
|
|
||||||
--------------------------------------------------
|
|
||||||
// TEST[skip:setup:server_metrics_openjob]
|
|
||||||
|
|
||||||
NOTE: You must have `manage_ml` or `manage` cluster privileges to close {anomaly-jobs}.
|
|
||||||
For more information, see {ref}/security-privileges.html[Security privileges].
|
|
||||||
|
|
||||||
{anomaly-jobs-cap} can be opened and closed multiple times throughout their
|
|
||||||
lifecycle.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[closing-all-ml-datafeeds]]
|
|
||||||
==== Closing all {anomaly-jobs}
|
|
||||||
|
|
||||||
If you are upgrading your cluster, you can use the following request to close
|
|
||||||
all open {anomaly-jobs} on the cluster:
|
|
||||||
|
|
||||||
[source,console]
|
|
||||||
----------------------------------
|
|
||||||
POST _ml/anomaly_detectors/_all/_close
|
|
||||||
----------------------------------
|
|
||||||
// TEST[skip:needs-licence]
|
|
|
@ -1115,7 +1115,7 @@ tag::over-field-name[]
|
||||||
The field used to split the data. In particular, this property is used for
|
The field used to split the data. In particular, this property is used for
|
||||||
analyzing the splits with respect to the history of all splits. It is used for
|
analyzing the splits with respect to the history of all splits. It is used for
|
||||||
finding unusual values in the population of all splits. For more information,
|
finding unusual values in the population of all splits. For more information,
|
||||||
see {ml-docs}/ml-configuring-pop.html[Performing population analysis].
|
see {ml-docs}/ml-configuring-populations.html[Performing population analysis].
|
||||||
end::over-field-name[]
|
end::over-field-name[]
|
||||||
|
|
||||||
tag::partition-field-name[]
|
tag::partition-field-name[]
|
||||||
|
|