[DOCS] Updates anomaly detection terminology (#44888)

Lisa Cawley 2019-07-26 11:07:01 -07:00 committed by lcawl
parent cef375f883
commit a041d1eacf
17 changed files with 213 additions and 201 deletions

View File

@@ -4,7 +4,7 @@
By default, {dfeeds} fetch data from {es} using search and scroll requests.
It can be significantly more efficient, however, to aggregate data in {es}
-and to configure your jobs to analyze aggregated data.
+and to configure your {anomaly-jobs} to analyze aggregated data.
One of the benefits of aggregating data this way is that {es} automatically
distributes these calculations across your cluster. You can then feed this
@@ -19,8 +19,8 @@ of the last record in the bucket. If you use a terms aggregation and the
cardinality of a term is high, then the aggregation might not be effective and
you might want to just use the default search and scroll behavior.
-When you create or update a job, you can include the names of aggregations, for
-example:
+When you create or update an {anomaly-job}, you can include the names of
+aggregations, for example:
[source,js]
----------------------------------

View File

@@ -68,8 +68,8 @@ we do not want the detailed SQL to be considered in the message categorization.
This particular categorization filter removes the SQL statement from the categorization
algorithm.
-If your data is stored in {es}, you can create an advanced job with these same
-properties:
+If your data is stored in {es}, you can create an advanced {anomaly-job} with
+these same properties:
[role="screenshot"]
image::images/ml-category-advanced.jpg["Advanced job configuration options related to categorization"]
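For readers who prefer the API to the advanced job wizard, the following is a
minimal sketch of such an {anomaly-job}; the job name, field names, bucket span,
and filter regex are illustrative assumptions, not values taken from this page:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/it_ops_categorization_sketch
{
  "analysis_config": {
    "bucket_span": "30m",
    "categorization_field_name": "message",
    "categorization_filters": [ "\\[SQL: .*\\]" ],
    "detectors": [
      {
        "function": "count",
        "by_field_name": "mlcategory"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]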
@@ -209,7 +209,7 @@ letters in tokens whereas the `ml_classic` tokenizer does, although that could
be fixed by using more complex regular expressions.
For more information about the `categorization_analyzer` property, see
-{ref}/ml-job-resource.html#ml-categorizationanalyzer[Categorization Analyzer].
+{ref}/ml-job-resource.html#ml-categorizationanalyzer[Categorization analyzer].
NOTE: To add the `categorization_analyzer` property in {kib}, you must use the
**Edit JSON** tab and copy the `categorization_analyzer` object from one of the

View File

@@ -7,8 +7,8 @@ your cluster and all master-eligible nodes must have {ml} enabled. By default,
all nodes are {ml} nodes. For more information about these settings, see
{ref}/modules-node.html#ml-node[{ml} nodes].
-To use the {ml-features} to analyze your data, you must create a job and
-send your data to that job.
+To use the {ml-features} to analyze your data, you can create an {anomaly-job}
+and send your data to that job.
* If your data is stored in {es}:

View File

@@ -2,17 +2,17 @@
[[ml-configuring-url]]
=== Adding custom URLs to machine learning results
-When you create an advanced job or edit any job in {kib}, you can optionally
-attach one or more custom URLs.
+When you create an advanced {anomaly-job} or edit any {anomaly-jobs} in {kib},
+you can optionally attach one or more custom URLs.
The custom URLs provide links from the anomalies table in the *Anomaly Explorer*
or *Single Metric Viewer* window in {kib} to {kib} dashboards, the *Discover*
page, or external websites. For example, you can define a custom URL that
provides a way for users to drill down to the source data from the results set.
-When you edit a job in {kib}, it simplifies the creation of the custom URLs for
-{kib} dashboards and the *Discover* page and it enables you to test your URLs.
-For example:
+When you edit an {anomaly-job} in {kib}, it simplifies the creation of the
+custom URLs for {kib} dashboards and the *Discover* page and it enables you to
+test your URLs. For example:
[role="screenshot"]
image::images/ml-customurl-edit.jpg["Edit a job to add a custom URL"]
@@ -29,7 +29,8 @@ As in this case, the custom URL can contain
are populated when you click the link in the anomalies table. In this example,
the custom URL contains `$earliest$`, `$latest$`, and `$service$` tokens, which
pass the beginning and end of the time span of the selected anomaly and the
-pertinent `service` field value to the target page. If you were interested in the following anomaly, for example:
+pertinent `service` field value to the target page. If you were interested in
+the following anomaly, for example:
[role="screenshot"]
image::images/ml-customurl.jpg["An example of the custom URL links in the Anomaly Explorer anomalies table"]
@@ -43,8 +44,8 @@ image::images/ml-customurl-discover.jpg["An example of the results on the Discov
Since we specified a time range of 2 hours, the time filter restricts the
results to the time period two hours before and after the anomaly.
-You can also specify these custom URL settings when you create or update jobs by
-using the {ml} APIs.
+You can also specify these custom URL settings when you create or update
+{anomaly-jobs} by using the APIs.
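A minimal sketch of what that can look like with the
{ref}/ml-update-job.html[update {anomaly-job} API]; the job name, label, and
target URL below are hypothetical placeholders, not values from this page:

[source,js]
----------------------------------
POST _ml/anomaly_detectors/sample_job/_update
{
  "custom_settings": {
    "custom_urls": [
      {
        "url_name": "Drill down to source data",
        "time_range": "2h",
        "url_value": "http://my.drilldown.example/view?service=$service$&from=$earliest$&to=$latest$"
      }
    ]
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]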
[float]
[[ml-configuring-url-strings]]
@@ -74,9 +75,9 @@ time as the earliest and latest times. The same is also true if the interval is
set to `Auto` and a one hour interval was chosen. You can override this behavior
by using the `time_range` setting.
-The `$mlcategoryregex$` and `$mlcategoryterms$` tokens pertain to jobs where you
-are categorizing field values. For more information about this type of analysis,
-see <<ml-configuring-categories>>.
+The `$mlcategoryregex$` and `$mlcategoryterms$` tokens pertain to {anomaly-jobs}
+where you are categorizing field values. For more information about this type of
+analysis, see <<ml-configuring-categories>>.
The `$mlcategoryregex$` token passes the regular expression value of the
category of the selected anomaly, as identified by the value of the `mlcategory`

View File

@@ -22,8 +22,8 @@ functions are not really affected. In these situations, it all comes out okay in
the end as the delayed data is distributed randomly. An example would be a `mean`
metric for a field in a large collection of data. In this case, checking for
delayed data may not provide much benefit. If data are consistently delayed,
-however, jobs with a `low_count` function may provide false positives. In this
-situation, it would be useful to see if data comes in after an anomaly is
+however, {anomaly-jobs} with a `low_count` function may provide false positives.
+In this situation, it would be useful to see if data comes in after an anomaly is
recorded so that you can determine a next course of action.
==== How do we detect delayed data?
@@ -35,11 +35,11 @@ Every 15 minutes or every `check_window`, whichever is smaller, the datafeed
triggers a document search over the configured indices. This search looks over a
time span with a length of `check_window` ending with the latest finalized bucket.
That time span is partitioned into buckets, whose length equals the bucket span
-of the associated job. The `doc_count` of those buckets are then compared with
-the job's finalized analysis buckets to see whether any data has arrived since
-the analysis. If there is indeed missing data due to their ingest delay, the end
-user is notified. For example, you can see annotations in {kib} for the periods
-where these delays occur.
+of the associated {anomaly-job}. The `doc_count` of those buckets is then
+compared with the job's finalized analysis buckets to see whether any data has
+arrived since the analysis. If data is indeed missing due to ingest delay, the
+end user is notified. For example, you can see annotations in {kib} for the
+periods where these delays occur.
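For reference, the check described above is controlled by the
`delayed_data_check_config` setting of the {dfeed}. A minimal sketch (the
{dfeed} name, job name, index, and window are arbitrary examples, not values
from this page):

[source,js]
----------------------------------
PUT _ml/datafeeds/datafeed-sample_job
{
  "job_id": "sample_job",
  "indices": ["my-index"],
  "delayed_data_check_config": {
    "enabled": true,
    "check_window": "2h"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]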
==== What to do about delayed data?

View File

@@ -16,17 +16,18 @@ Let us see how those can be configured by examples.
==== Specifying custom rule scope
-Let us assume we are configuring a job in order to detect DNS data exfiltration.
-Our data contain fields "subdomain" and "highest_registered_domain".
-We can use a detector that looks like `high_info_content(subdomain) over highest_registered_domain`.
-If we run such a job it is possible that we discover a lot of anomalies on
-frequently used domains that we have reasons to trust. As security analysts, we
-are not interested in such anomalies. Ideally, we could instruct the detector to
-skip results for domains that we consider safe. Using a rule with a scope allows
-us to achieve this.
+Let us assume we are configuring an {anomaly-job} in order to detect DNS data
+exfiltration. Our data contain fields "subdomain" and "highest_registered_domain".
+We can use a detector that looks like
+`high_info_content(subdomain) over highest_registered_domain`. If we run such a
+job, it is possible that we discover a lot of anomalies on frequently used
+domains that we have reasons to trust. As security analysts, we are not
+interested in such anomalies. Ideally, we could instruct the detector to skip
+results for domains that we consider safe. Using a rule with a scope allows us
+to achieve this.
First, we need to create a list of our safe domains. Those lists are called
-_filters_ in {ml}. Filters can be shared across jobs.
+_filters_ in {ml}. Filters can be shared across {anomaly-jobs}.
We create our filter using the {ref}/ml-put-filter.html[put filter API]:
@@ -41,8 +42,8 @@ PUT _ml/filters/safe_domains
// CONSOLE
// TEST[skip:needs-licence]
-Now, we can create our job specifying a scope that uses the `safe_domains`
-filter for the `highest_registered_domain` field:
+Now, we can create our {anomaly-job} specifying a scope that uses the
+`safe_domains` filter for the `highest_registered_domain` field:
[source,js]
----------------------------------
@@ -139,8 +140,8 @@ example, 0.02. Given our knowledge about how CPU utilization behaves we might
determine that anomalies with such small actual values are not interesting for
investigation.
-Let us now configure a job with a rule that will skip results where CPU
-utilization is less than 0.20.
+Let us now configure an {anomaly-job} with a rule that will skip results where
+CPU utilization is less than 0.20.
[source,js]
----------------------------------
@@ -214,18 +215,18 @@ PUT _ml/anomaly_detectors/rule_with_range
==== Custom rules in the life-cycle of a job
Custom rules only affect results created after the rules were applied.
-Let us imagine that we have configured a job and it has been running
+Let us imagine that we have configured an {anomaly-job} and it has been running
for some time. After observing its results we decide that we can employ
rules in order to get rid of some uninteresting results. We can use
-the {ref}/ml-update-job.html[update job API] to do so. However, the rule we
-added will only be in effect for any results created from the moment we added
-the rule onwards. Past results will remain unaffected.
+the {ref}/ml-update-job.html[update {anomaly-job} API] to do so. However, the
+rule we added will only be in effect for any results created from the moment we
+added the rule onwards. Past results will remain unaffected.
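A hedged sketch of what such an update can look like, attaching a scope-based
rule to the first detector of an existing job (the job name here is a
placeholder; the `safe_domains` filter is the one created earlier on this page):

[source,js]
----------------------------------
POST _ml/anomaly_detectors/dns_exfiltration/_update
{
  "detectors": [
    {
      "detector_index": 0,
      "custom_rules": [
        {
          "actions": ["skip_result"],
          "scope": {
            "highest_registered_domain": {
              "filter_id": "safe_domains",
              "filter_type": "include"
            }
          }
        }
      ]
    }
  ]
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]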
-==== Using custom rules VS filtering data
+==== Using custom rules vs. filtering data
It might appear like using rules is just another way of filtering the data
-that feeds into a job. For example, a rule that skips results when the
-partition field value is in a filter sounds equivalent to having a query
+that feeds into an {anomaly-job}. For example, a rule that skips results when
+the partition field value is in a filter sounds equivalent to having a query
that filters out such documents. But it is not. There is a fundamental
difference. When the data is filtered before reaching a job it is as if they
never existed for the job. With rules, the data still reaches the job and

View File

@@ -5,10 +5,10 @@
The {ml-features} include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.
-When you create jobs, you specify one or more detectors, which define the type of
-analysis that needs to be done. If you are creating your job by using {ml} APIs,
-you specify the functions in
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+When you create {anomaly-jobs}, you specify one or more detectors, which define
+the type of analysis that needs to be done. If you are creating your job by
+using {ml} APIs, you specify the functions in
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs.
@@ -24,8 +24,8 @@ You can specify a `summary_count_field_name` with any function except `metric`.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-aggregated. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized. In {kib}, use the
-**summary_count_field_name** in advanced jobs. Analyzing aggregated input data
-provides a significant boost in performance. For more information, see
+**summary_count_field_name** in advanced {anomaly-jobs}. Analyzing aggregated
+input data provides a significant boost in performance. For more information, see
<<ml-configuring-aggregation>>.
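As a minimal sketch of the idea (field names and bucket span are illustrative
assumptions), an `analysis_config` for pre-aggregated input where each input
document carries its event count in a `doc_count` field might look like:

[source,js]
--------------------------------------------------
"analysis_config": {
  "bucket_span": "10m",
  "summary_count_field_name": "doc_count",
  "detectors": [
    {
      "function": "mean",
      "field_name": "responsetime"
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE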
If your data is sparse, there may be gaps in the data which means you might have

View File

@@ -40,7 +40,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing events with the count function
[source,js]
@@ -65,8 +65,9 @@ This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.
-When you use this function in a detector in your job, it models the event rate
-and detects when the event rate is unusual compared to its past behavior.
+When you use this function in a detector in your {anomaly-job}, it models the
+event rate and detects when the event rate is unusual compared to its past
+behavior.
.Example 2: Analyzing errors with the high_count function
[source,js]
@@ -89,7 +90,7 @@ PUT _ml/anomaly_detectors/example2
// CONSOLE
// TEST[skip:needs-licence]
-If you use this `high_count` function in a detector in your job, it
+If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.
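The shape of such a detector is, as a sketch (the field names are assumptions
chosen to match the description above):

[source,js]
--------------------------------------------------
{
  "function": "high_count",
  "by_field_name": "error_code",
  "over_field_name": "user"
}
--------------------------------------------------
// NOTCONSOLE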
@@ -117,9 +118,9 @@ PUT _ml/anomaly_detectors/example3
In this example, the function detects when the count of events for a
status code is lower than usual.
-When you use this function in a detector in your job, it models the event rate
-for each status code and detects when a status code has an unusually low count
-compared to its past behavior.
+When you use this function in a detector in your {anomaly-job}, it models the
+event rate for each status code and detects when a status code has an unusually
+low count compared to its past behavior.
.Example 4: Analyzing aggregated data with the count function
[source,js]
@@ -168,7 +169,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
For example, if you have the following number of events per bucket:
@@ -206,10 +207,10 @@ PUT _ml/anomaly_detectors/example5
// CONSOLE
// TEST[skip:needs-licence]
-If you use this `high_non_zero_count` function in a detector in your job, it
-models the count of events for the `signaturename` field. It ignores any buckets
-where the count is zero and detects when a `signaturename` value has an
-unusually high count of events compared to its past behavior.
+If you use this `high_non_zero_count` function in a detector in your
+{anomaly-job}, it models the count of events for the `signaturename` field. It
+ignores any buckets where the count is zero and detects when a `signaturename`
+value has an unusually high count of events compared to its past behavior.
NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
@@ -238,7 +239,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 6: Analyzing users with the distinct_count function
[source,js]
@@ -261,9 +262,9 @@ PUT _ml/anomaly_detectors/example6
// TEST[skip:needs-licence]
This `distinct_count` function detects when a system has an unusual number
-of logged in users. When you use this function in a detector in your job, it
-models the distinct count of users. It also detects when the distinct number of
-users is unusual compared to the past.
+of logged in users. When you use this function in a detector in your
+{anomaly-job}, it models the distinct count of users. It also detects when the
+distinct number of users is unusual compared to the past.
.Example 7: Analyzing ports with the high_distinct_count function
[source,js]
@@ -287,6 +288,6 @@ PUT _ml/anomaly_detectors/example7
// TEST[skip:needs-licence]
This example detects instances of port scanning. When you use this function in a
-detector in your job, it models the distinct count of ports. It also detects the
-`src_ip` values that connect to an unusually high number of different
+detector in your {anomaly-job}, it models the distinct count of ports. It also
+detects the `src_ip` values that connect to an unusually high number of different
`dst_ports` values compared to other `src_ip` values.

View File

@@ -7,9 +7,9 @@ input data.
The {ml-features} include the following geographic function: `lat_long`.
-NOTE: You cannot create forecasts for jobs that contain geographic functions.
-You also cannot add rules with conditions to detectors that use geographic
-functions.
+NOTE: You cannot create forecasts for {anomaly-jobs} that contain geographic
+functions. You also cannot add rules with conditions to detectors that use
+geographic functions.
[float]
[[ml-lat-long]]
@@ -26,7 +26,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties,
-see {ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing transactions with the lat_long function
[source,js]
@@ -49,15 +49,15 @@ PUT _ml/anomaly_detectors/example1
// CONSOLE
// TEST[skip:needs-licence]
-If you use this `lat_long` function in a detector in your job, it
+If you use this `lat_long` function in a detector in your {anomaly-job}, it
detects anomalies where the geographic location of a credit card transaction is
unusual for a particular customer's credit card. An anomaly might indicate fraud.
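The detector being described has roughly this shape (a sketch; the field names
are illustrative assumptions):

[source,js]
--------------------------------------------------
{
  "function": "lat_long",
  "field_name": "transaction_coordinates",
  "by_field_name": "credit_card_number"
}
--------------------------------------------------
// NOTCONSOLE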
IMPORTANT: The `field_name` that you supply must be a single string that contains
two comma-separated numbers of the form `latitude,longitude`, a `geo_point` field,
a `geo_shape` field that contains point values, or a `geo_centroid` aggregation.
-The `latitude` and `longitude` must be in the range -180 to 180 and represent a point on the
-surface of the Earth.
+The `latitude` and `longitude` must be in the range -180 to 180 and represent a
+point on the surface of the Earth.
For example, JSON data might contain the following transaction coordinates:
@@ -75,6 +75,6 @@ In {es}, location data is likely to be stored in `geo_point` fields. For more
information, see {ref}/geo-point.html[Geo-point datatype]. This data type is
supported natively in {ml-features}. Specifically, when a {dfeed} pulls data from
a `geo_point` field, it transforms the data into the appropriate `lat,lon` string
-format before sending to the {ml} job.
+format before sending it to the {anomaly-job}.
For more information, see <<ml-configuring-transform>>.

View File

@@ -29,7 +29,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing subdomain strings with the info_content function
[source,js]
@@ -42,9 +42,9 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `info_content` function in a detector in your job, it models
-information that is present in the `subdomain` string. It detects anomalies
-where the information content is unusual compared to the other
+If you use this `info_content` function in a detector in your {anomaly-job}, it
+models information that is present in the `subdomain` string. It detects
+anomalies where the information content is unusual compared to the other
`highest_registered_domain` values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.
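As a sketch, a detector of this kind looks like the following (the two field
names come from the description above; treat the snippet as illustrative):

[source,js]
--------------------------------------------------
{
  "function": "info_content",
  "field_name": "subdomain",
  "over_field_name": "highest_registered_domain"
}
--------------------------------------------------
// NOTCONSOLE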
@@ -63,8 +63,8 @@ choice.
--------------------------------------------------
// NOTCONSOLE
-If you use this `high_info_content` function in a detector in your job, it
-models information content that is held in the DNS query string. It detects
+If you use this `high_info_content` function in a detector in your {anomaly-job},
+it models information content that is held in the DNS query string. It detects
`src_ip` values where the information content is unusually high compared to
other `src_ip` values. This example is similar to the example for the
`info_content` function, but it reports anomalies only where the amount of
@@ -81,8 +81,8 @@ information content is higher than expected.
--------------------------------------------------
// NOTCONSOLE
-If you use this `low_info_content` function in a detector in your job, it models
-information content that is present in the message string for each
+If you use this `low_info_content` function in a detector in your {anomaly-job},
+it models information content that is present in the message string for each
`logfilename`. It detects anomalies where the information content is low
compared to its past behavior. For example, this function detects unusually low
amounts of information in a collection of rolling log files. Low information

View File

@@ -35,7 +35,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing minimum transactions with the min function
[source,js]
@@ -48,9 +48,9 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `min` function in a detector in your job, it detects where the
-smallest transaction is lower than previously observed. You can use this
-function to detect items for sale at unintentionally low prices due to data
+If you use this `min` function in a detector in your {anomaly-job}, it detects
+where the smallest transaction is lower than previously observed. You can use
+this function to detect items for sale at unintentionally low prices due to data
entry mistakes. It models the minimum amount for each product over time.
[float]
@@ -70,7 +70,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 2: Analyzing maximum response times with the max function
[source,js]
@@ -83,9 +83,9 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `max` function in a detector in your job, it detects where the
-longest `responsetime` is longer than previously observed. You can use this
-function to detect applications that have `responsetime` values that are
+If you use this `max` function in a detector in your {anomaly-job}, it detects
+where the longest `responsetime` is longer than previously observed. You can use
+this function to detect applications that have `responsetime` values that are
unusually lengthy. It models the maximum `responsetime` for each application
over time and detects when the longest `responsetime` is unusually long compared
to previous applications.
@@ -132,7 +132,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 4: Analyzing response times with the median function
[source,js]
@@ -145,9 +145,9 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `median` function in a detector in your job, it models the
-median `responsetime` for each application over time. It detects when the median
-`responsetime` is unusual compared to previous `responsetime` values.
+If you use this `median` function in a detector in your {anomaly-job}, it models
+the median `responsetime` for each application over time. It detects when the
+median `responsetime` is unusual compared to previous `responsetime` values.
[float]
[[ml-metric-mean]]
@@ -170,7 +170,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 5: Analyzing response times with the mean function
[source,js]
@@ -183,8 +183,8 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `mean` function in a detector in your job, it models the mean
-`responsetime` for each application over time. It detects when the mean
+If you use this `mean` function in a detector in your {anomaly-job}, it models
+the mean `responsetime` for each application over time. It detects when the mean
`responsetime` is unusual compared to previous `responsetime` values.
.Example 6: Analyzing response times with the high_mean function
@@ -198,9 +198,10 @@ If you use this `mean` function in a detector in your job, it models the mean
--------------------------------------------------
// NOTCONSOLE
-If you use this `high_mean` function in a detector in your job, it models the
-mean `responsetime` for each application over time. It detects when the mean
-`responsetime` is unusually high compared to previous `responsetime` values.
+If you use this `high_mean` function in a detector in your {anomaly-job}, it
+models the mean `responsetime` for each application over time. It detects when
+the mean `responsetime` is unusually high compared to previous `responsetime`
+values.
.Example 7: Analyzing response times with the low_mean function
[source,js]
@@ -213,9 +214,10 @@ mean `responsetime` for each application over time. It detects when the mean
--------------------------------------------------
// NOTCONSOLE
-If you use this `low_mean` function in a detector in your job, it models the
-mean `responsetime` for each application over time. It detects when the mean
-`responsetime` is unusually low compared to previous `responsetime` values.
+If you use this `low_mean` function in a detector in your {anomaly-job}, it
+models the mean `responsetime` for each application over time. It detects when
+the mean `responsetime` is unusually low compared to previous `responsetime`
+values.
[float]
[[ml-metric-metric]]
@@ -236,7 +238,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 8: Analyzing response times with the metric function
[source,js]
@@ -249,8 +251,8 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `metric` function in a detector in your job, it models the
-mean, min, and max `responsetime` for each application over time. It detects
+If you use this `metric` function in a detector in your {anomaly-job}, it models
+the mean, min, and max `responsetime` for each application over time. It detects
when the mean, min, or max `responsetime` is unusual compared to previous
`responsetime` values.
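A sketch of the detector just described (field names are illustrative
assumptions):

[source,js]
--------------------------------------------------
{
  "function": "metric",
  "field_name": "responsetime",
  "by_field_name": "application"
}
--------------------------------------------------
// NOTCONSOLE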
@@ -273,7 +275,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 9: Analyzing response times with the varp function
[source,js]
@@ -286,10 +288,10 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `varp` function in a detector in your job, it models the
-variance in values of `responsetime` for each application over time. It detects
-when the variance in `responsetime` is unusual compared to past application
-behavior.
+If you use this `varp` function in a detector in your {anomaly-job}, it models
+the variance in values of `responsetime` for each application over time. It
+detects when the variance in `responsetime` is unusual compared to past
+application behavior.
.Example 10: Analyzing response times with the high_varp function
[source,js]
@@ -302,10 +304,10 @@ behavior.
--------------------------------------------------
// NOTCONSOLE
-If you use this `high_varp` function in a detector in your job, it models the
-variance in values of `responsetime` for each application over time. It detects
-when the variance in `responsetime` is unusual compared to past application
-behavior.
+If you use this `high_varp` function in a detector in your {anomaly-job}, it
+models the variance in values of `responsetime` for each application over time.
+It detects when the variance in `responsetime` is unusual compared to past
+application behavior.
.Example 11: Analyzing response times with the low_varp function
[source,js]
@@ -318,7 +320,7 @@ behavior.
--------------------------------------------------
// NOTCONSOLE
-If you use this `low_varp` function in a detector in your job, it models the
-variance in values of `responsetime` for each application over time. It detects
-when the variance in `responsetime` is unusual compared to past application
-behavior.
+If you use this `low_varp` function in a detector in your {anomaly-job}, it
+models the variance in values of `responsetime` for each application over time.
+It detects when the variance in `responsetime` is unusual compared to past
+application behavior.

View File

@@ -13,8 +13,8 @@ number of times (frequency) rare values occur.
====
* The `rare` and `freq_rare` functions should not be used in conjunction with
`exclude_frequent`.
-* You cannot create forecasts for jobs that contain `rare` or `freq_rare`
-functions.
+* You cannot create forecasts for {anomaly-jobs} that contain `rare` or
+`freq_rare` functions.
* You cannot add rules with conditions to detectors that use `rare` or
`freq_rare` functions.
* Shorter bucket spans (less than 1 hour, for example) are recommended when
@@ -47,7 +47,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing status codes with the rare function
[source,js]
@@ -59,10 +59,11 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `rare` function in a detector in your job, it detects values
-that are rare in time. It models status codes that occur over time and detects
-when rare status codes occur compared to the past. For example, you can detect
-status codes in a web access log that have never (or rarely) occurred before.
+If you use this `rare` function in a detector in your {anomaly-job}, it detects
+values that are rare in time. It models status codes that occur over time and
+detects when rare status codes occur compared to the past. For example, you can
+detect status codes in a web access log that have never (or rarely) occurred
+before.
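A sketch of such a detector (the field name is an illustrative assumption):

[source,js]
--------------------------------------------------
{
  "function": "rare",
  "by_field_name": "status"
}
--------------------------------------------------
// NOTCONSOLE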
.Example 2: Analyzing status codes in a population with the rare function
[source,js]
@@ -75,15 +76,15 @@ status codes in a web access log that have never (or rarely) occurred before.
--------------------------------------------------
// NOTCONSOLE
-If you use this `rare` function in a detector in your job, it detects values
-that are rare in a population. It models status code and client IP interactions
-that occur. It defines a rare status code as one that occurs for few client IP
-values compared to the population. It detects client IP values that experience
-one or more distinct rare status codes compared to the population. For example
-in a web access log, a `clientip` that experiences the highest number of
-different rare status codes compared to the population is regarded as highly
-anomalous. This analysis is based on the number of different status code values,
-not the count of occurrences.
+If you use this `rare` function in a detector in your {anomaly-job}, it detects
+values that are rare in a population. It models status code and client IP
+interactions that occur. It defines a rare status code as one that occurs for
+few client IP values compared to the population. It detects client IP values
+that experience one or more distinct rare status codes compared to the
+population. For example in a web access log, a `clientip` that experiences the
+highest number of different rare status codes compared to the population is
+regarded as highly anomalous. This analysis is based on the number of different
+status code values, not the count of occurrences.
NOTE: To define a status code as rare the {ml-features} look at the number
of distinct status codes that occur, not the number of times the status code
@@ -105,7 +106,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 3: Analyzing URI values in a population with the freq_rare function
[source,js]
@@ -118,7 +119,7 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `freq_rare` function in a detector in your job, it
+If you use this `freq_rare` function in a detector in your {anomaly-job}, it
detects values that are frequently rare in a population. It models URI paths and
client IP interactions that occur. It defines a rare URI path as one that is
visited by few client IP values compared to the population. It detects the

View File

@@ -2,7 +2,8 @@
[[ml-sum-functions]]
=== Sum functions
-The sum functions detect anomalies when the sum of a field in a bucket is anomalous.
+The sum functions detect anomalies when the sum of a field in a bucket is
+anomalous.
If you want to monitor unusually high totals, use high-sided functions.
@@ -35,7 +36,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing total expenses with the sum function
[source,js]
@@ -49,7 +50,7 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `sum` function in a detector in your job, it
+If you use this `sum` function in a detector in your {anomaly-job}, it
models total expenses per employee for each cost center. For each time bucket,
it detects when an employee's expenses are unusual for a cost center compared
to other employees.
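The detector sketched from that description (field names are illustrative
assumptions):

[source,js]
--------------------------------------------------
{
  "function": "sum",
  "field_name": "expenses",
  "by_field_name": "costcenter",
  "over_field_name": "employee"
}
--------------------------------------------------
// NOTCONSOLE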
@@ -65,7 +66,7 @@ to other employees.
--------------------------------------------------
// NOTCONSOLE
-If you use this `high_sum` function in a detector in your job, it
+If you use this `high_sum` function in a detector in your {anomaly-job}, it
models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high
volumes compared to other `cs_hosts`. This example looks for volumes of data
transferred from a client to a server on the internet that are unusual compared
@@ -91,7 +92,7 @@ These functions support the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for this function.
@@ -107,9 +108,7 @@ is not applicable for this function.
--------------------------------------------------
// NOTCONSOLE
-If you use this `high_non_null_sum` function in a detector in your job, it
-models the total `amount_approved` for each employee. It ignores any buckets
+If you use this `high_non_null_sum` function in a detector in your {anomaly-job},
+it models the total `amount_approved` for each employee. It ignores any buckets
where the amount is null. It detects employees who approve unusually high
amounts compared to their past behavior.
-//For this credit control system analysis, using non_null_sum will ignore
-//periods where the employees are not active on the system.

View File

@@ -14,22 +14,25 @@ The {ml-features} include the following time functions:
[NOTE]
====
-* NOTE: You cannot create forecasts for jobs that contain time functions.
-* The `time_of_day` function is not aware of the difference between days, for instance
-work days and weekends. When modeling different days, use the `time_of_week` function.
-In general, the `time_of_week` function is more suited to modeling the behavior of people
-rather than machines, as people vary their behavior according to the day of the week.
-* Shorter bucket spans (for example, 10 minutes) are recommended when performing a
-`time_of_day` or `time_of_week` analysis. The time of the events being modeled are not
-affected by the bucket span, but a shorter bucket span enables quicker alerting on unusual
-events.
-* Unusual events are flagged based on the previous pattern of the data, not on what we
-might think of as unusual based on human experience. So, if events typically occur
-between 3 a.m. and 5 a.m., and event occurring at 3 p.m. is be flagged as unusual.
-* When Daylight Saving Time starts or stops, regular events can be flagged as anomalous.
-This situation occurs because the actual time of the event (as measured against a UTC
-baseline) has changed. This situation is treated as a step change in behavior and the new
-times will be learned quickly.
+* NOTE: You cannot create forecasts for {anomaly-jobs} that contain time
+functions.
+* The `time_of_day` function is not aware of the difference between days, for
+instance work days and weekends. When modeling different days, use the
+`time_of_week` function. In general, the `time_of_week` function is more suited
+to modeling the behavior of people rather than machines, as people vary their
+behavior according to the day of the week.
+* Shorter bucket spans (for example, 10 minutes) are recommended when performing
+a `time_of_day` or `time_of_week` analysis. The time of the events being modeled
+is not affected by the bucket span, but a shorter bucket span enables quicker
+alerting on unusual events.
+* Unusual events are flagged based on the previous pattern of the data, not on
+what we might think of as unusual based on human experience. So, if events
+typically occur between 3 a.m. and 5 a.m., an event occurring at 3 p.m. is
+flagged as unusual.
+* When Daylight Saving Time starts or stops, regular events can be flagged as
+anomalous. This situation occurs because the actual time of the event (as
+measured against a UTC baseline) has changed. This situation is treated as a
+step change in behavior and the new times will be learned quickly.
====
[float]
@@ -51,7 +54,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 1: Analyzing events with the time_of_day function
[source,js]
@@ -63,7 +66,7 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `time_of_day` function in a detector in your job, it
+If you use this `time_of_day` function in a detector in your {anomaly-job}, it
models when events occur throughout a day for each process. It detects when an
event occurs for a process that is at an unusual time in the day compared to
its past behavior.
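A sketch of that detector (the field name is an illustrative assumption):

[source,js]
--------------------------------------------------
{
  "function": "time_of_day",
  "by_field_name": "process"
}
--------------------------------------------------
// NOTCONSOLE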
@@ -82,7 +85,7 @@ This function supports the following properties:
* `partition_field_name` (optional)
For more information about those properties, see
-{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
+{ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].
.Example 2: Analyzing events with the time_of_week function
[source,js]
@@ -95,7 +98,7 @@ For more information about those properties, see
--------------------------------------------------
// NOTCONSOLE
-If you use this `time_of_week` function in a detector in your job, it
+If you use this `time_of_week` function in a detector in your {anomaly-job}, it
models when events occur throughout the week for each `eventcode`. It detects
when a workstation event occurs at an unusual time during the week for that
`eventcode` compared to other workstations. It detects events for a

View File

@ -57,9 +57,9 @@ PUT _ml/anomaly_detectors/population
in each bucket. in each bucket.
If your data is stored in {es}, you can use the population job wizard in {kib} If your data is stored in {es}, you can use the population job wizard in {kib}
to create a job with these same properties. For example, if you add the sample to create an {anomaly-job} with these same properties. For example, if you add
web logs in {kib}, you can use the following job settings in the population job the sample web logs in {kib}, you can use the following job settings in the
wizard: population job wizard:
[role="screenshot"] [role="screenshot"]
image::images/ml-population-job.jpg["Job settings in the population job wizard"] image::images/ml-population-job.jpg["Job settings in the population job wizard"]
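For comparison, the analysis portion of an equivalent population job created
through the API might look roughly like the following sketch. The `clientip`
field and the 15-minute bucket span are assumptions based on the sample web
logs data set rather than settings copied from the wizard:

[source,js]
--------------------------------------------------
"analysis_config" : {
  "bucket_span" : "15m",
  "influencers" : [ "clientip" ],
  "detectors" : [
    {
      "function" : "count",
      "over_field_name" : "clientip"
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE

The `over_field_name` is what makes this a population analysis: each client IP
is modeled against the behavior of the population of client IPs as a whole.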
View File
@ -1,22 +1,22 @@
[role="xpack"] [role="xpack"]
[[stopping-ml]] [[stopping-ml]]
== Stopping machine learning == Stopping {ml} {anomaly-detect}
An orderly shutdown of {ml} ensures that: An orderly shutdown ensures that:
* {dfeeds-cap} are stopped * {dfeeds-cap} are stopped
* Buffers are flushed * Buffers are flushed
* Model history is pruned * Model history is pruned
* Final results are calculated * Final results are calculated
* Model snapshots are saved * Model snapshots are saved
* Jobs are closed * {anomaly-jobs-cap} are closed
This process ensures that jobs are in a consistent state in case you want to This process ensures that jobs are in a consistent state in case you want to
subsequently re-open them. subsequently re-open them.
[float] [float]
[[stopping-ml-datafeeds]] [[stopping-ml-datafeeds]]
=== Stopping {dfeeds-cap} === Stopping {dfeeds}
When you stop a {dfeed}, it ceases to retrieve data from {es}. You can stop a When you stop a {dfeed}, it ceases to retrieve data from {es}. You can stop a
{dfeed} by using {kib} or the {dfeed} by using {kib} or the
@ -25,7 +25,7 @@ request stops the `feed1` {dfeed}:
[source,js] [source,js]
-------------------------------------------------- --------------------------------------------------
POST _ml/datafeeds/datafeed-total-requests/_stop POST _ml/datafeeds/feed1/_stop
-------------------------------------------------- --------------------------------------------------
// CONSOLE // CONSOLE
// TEST[skip:setup:server_metrics_startdf] // TEST[skip:setup:server_metrics_startdf]
@ -39,7 +39,7 @@ A {dfeed} can be started and stopped multiple times throughout its lifecycle.
[float] [float]
[[stopping-all-ml-datafeeds]] [[stopping-all-ml-datafeeds]]
==== Stopping All {dfeeds-cap} ==== Stopping all {dfeeds}
If you are upgrading your cluster, you can use the following request to stop all If you are upgrading your cluster, you can use the following request to stop all
{dfeeds}: {dfeeds}:
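Spelled out, that is a single wildcard request:

[source,js]
--------------------------------------------------
POST _ml/datafeeds/_all/_stop
--------------------------------------------------
// CONSOLE
// TEST[skip:illustrative sketch]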
@ -53,19 +53,20 @@ POST _ml/datafeeds/_all/_stop
[float] [float]
[[closing-ml-jobs]] [[closing-ml-jobs]]
=== Closing Jobs === Closing {anomaly-jobs}
When you close a job, it cannot receive data or perform analysis operations. When you close an {anomaly-job}, it cannot receive data or perform analysis
If a job is associated with a {dfeed}, you must stop the {dfeed} before you can operations. If a job is associated with a {dfeed}, you must stop the {dfeed}
close the jobs. If the {dfeed} has an end date, the job closes automatically on before you can close the job. If the {dfeed} has an end date, the job closes
that end date. automatically on that end date.
You can close a job by using the {ref}/ml-close-job.html[close job API]. For You can close a job by using the
{ref}/ml-close-job.html[close {anomaly-job} API]. For
example, the following request closes the `job1` job: example, the following request closes the `job1` job:
[source,js] [source,js]
-------------------------------------------------- --------------------------------------------------
POST _ml/anomaly_detectors/total-requests/_close POST _ml/anomaly_detectors/job1/_close
-------------------------------------------------- --------------------------------------------------
// CONSOLE // CONSOLE
// TEST[skip:setup:server_metrics_openjob] // TEST[skip:setup:server_metrics_openjob]
@ -73,14 +74,15 @@ POST _ml/anomaly_detectors/total-requests/_close
NOTE: You must have `manage_ml` or `manage` cluster privileges to stop {dfeeds}. NOTE: You must have `manage_ml` or `manage` cluster privileges to stop {dfeeds}.
For more information, see <<security-privileges>>. For more information, see <<security-privileges>>.
A job can be opened and closed multiple times throughout its lifecycle. {anomaly-jobs-cap} can be opened and closed multiple times throughout their
lifecycle.
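For example, a job that was closed before an upgrade can be brought back with
the open jobs API; a minimal sketch, reusing the `job1` name from the example
above:

[source,js]
--------------------------------------------------
POST _ml/anomaly_detectors/job1/_open
--------------------------------------------------
// CONSOLE
// TEST[skip:illustrative sketch]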
[float] [float]
[[closing-all-ml-datafeeds]] [[closing-all-ml-datafeeds]]
==== Closing All Jobs ==== Closing all {anomaly-jobs}
If you are upgrading your cluster, you can use the following request to close If you are upgrading your cluster, you can use the following request to close
all open jobs on the cluster: all open {anomaly-jobs} on the cluster:
[source,js] [source,js]
---------------------------------- ----------------------------------
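POST _ml/anomaly_detectors/_all/_close
----------------------------------
// CONSOLE
// TEST[skip:sketch of the close-all request]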
View File
@ -7,9 +7,9 @@ it is analyzed. {dfeeds-cap} contain an optional `script_fields` property, where
you can specify scripts that evaluate custom expressions and return script you can specify scripts that evaluate custom expressions and return script
fields. fields.
If your {dfeed} defines script fields, you can use those fields in your job. If your {dfeed} defines script fields, you can use those fields in your
For example, you can use the script fields in the analysis functions in one or {anomaly-job}. For example, you can use the script fields in the analysis
more detectors. functions in one or more detectors.
* <<ml-configuring-transform1>> * <<ml-configuring-transform1>>
* <<ml-configuring-transform2>> * <<ml-configuring-transform2>>
@ -146,12 +146,14 @@ PUT _ml/datafeeds/datafeed-test1
within the job. within the job.
<2> The script field is defined in the {dfeed}. <2> The script field is defined in the {dfeed}.
This `test1` job contains a detector that uses a script field in a mean analysis This `test1` {anomaly-job} contains a detector that uses a script field in a
function. The `datafeed-test1` {dfeed} defines the script field. It contains a mean analysis function. The `datafeed-test1` {dfeed} defines the script field.
script that adds two fields in the document to produce a "total" error count. It contains a script that adds two fields in the document to produce a "total"
error count.
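As a sketch of what that looks like in the {dfeed} definition, the
`script_fields` object might be similar to the following; the `error_count` and
`aborted_count` field names are placeholders for whichever two fields your
documents contain:

[source,js]
--------------------------------------------------
"script_fields": {
  "total_error_count": {
    "script": {
      "lang": "painless",
      "source": "doc['error_count'].value + doc['aborted_count'].value"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE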
The syntax for the `script_fields` property is identical to that used by {es}. The syntax for the `script_fields` property is identical to that used by {es}.
For more information, see {ref}/search-request-body.html#request-body-search-script-fields[Script Fields]. For more information, see
{ref}/search-request-body.html#request-body-search-script-fields[Script fields].
You can preview the contents of the {dfeed} by using the following API: You can preview the contents of the {dfeed} by using the following API:
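For the `datafeed-test1` example above, the preview call is a simple GET
request:

[source,js]
--------------------------------------------------
GET _ml/datafeeds/datafeed-test1/_preview
--------------------------------------------------
// CONSOLE
// TEST[skip:illustrative sketch]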
@ -181,15 +183,15 @@ insufficient data to generate meaningful results.
//For a full demonstration of //For a full demonstration of
//how to create jobs with sample data, see <<ml-getting-started>>. //how to create jobs with sample data, see <<ml-getting-started>>.
You can alternatively use {kib} to create an advanced job that uses script You can alternatively use {kib} to create an advanced {anomaly-job} that uses
fields. To add the `script_fields` property to your {dfeed}, you must use the script fields. To add the `script_fields` property to your {dfeed}, you must use
**Edit JSON** tab. For example: the **Edit JSON** tab. For example:
[role="screenshot"] [role="screenshot"]
image::images/ml-scriptfields.jpg[Adding script fields to a {dfeed} in {kib}] image::images/ml-scriptfields.jpg[Adding script fields to a {dfeed} in {kib}]
[[ml-configuring-transform-examples]] [[ml-configuring-transform-examples]]
==== Common Script Field Examples ==== Common script field examples
While the possibilities are limitless, there are a number of common scenarios While the possibilities are limitless, there are a number of common scenarios
where you might use script fields in your {dfeeds}. where you might use script fields in your {dfeeds}.
@ -199,7 +201,7 @@ where you might use script fields in your {dfeeds}.
Some of these examples use regular expressions. By default, regular Some of these examples use regular expressions. By default, regular
expressions are disabled because they circumvent the protection that Painless expressions are disabled because they circumvent the protection that Painless
provides against long running and memory hungry scripts. For more information, provides against long running and memory hungry scripts. For more information,
see {ref}/modules-scripting-painless.html[Painless Scripting Language]. see {ref}/modules-scripting-painless.html[Painless scripting language].
Machine learning analysis is case sensitive. For example, "John" is considered Machine learning analysis is case sensitive. For example, "John" is considered
to be different than "john". This is one reason you might consider using scripts to be different than "john". This is one reason you might consider using scripts