[DOCS] Modify ML analytical functions (elastic/x-pack-elasticsearch#1467)
* [DOCS] Modify ML analytical functions
* [DOCS] Fix ML function section titles

Original commit: elastic/x-pack-elasticsearch@f95ae012bb
parent fa95474ab8
commit 27b0af7eae
@ -6,39 +6,32 @@ that is contained in strings within a bucket. These functions can be used as
a more sophisticated method to identify incidences of data exfiltration or
command-and-control (C2C) activity, when analyzing the size in bytes of the data
might not be sufficient.

The {xpackml} features include the following information content functions:

* `info_content`, `high_info_content`, `low_info_content`

[float]
[[ml-info-content]]
==== Info_content, High_info_content, Low_info_content

The `info_content` function detects anomalies in the amount of information that
is contained in strings in a bucket.

If you want to monitor for unusually high amounts of information,
use `high_info_content`. If you want to look at drops in information content,
use `low_info_content`.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 1: Analyzing subdomain strings with the info_content function
[source,js]
--------------------------------------------------
{
@ -48,36 +41,17 @@ DNS protocol, such as malicious command and control activity.
}
--------------------------------------------------

If you use this `info_content` function in a detector in your job, it models
information that is present in the `subdomain` string. It detects anomalies
where the information content is unusual compared to the other
`highest_registered_domain` values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.

NOTE: In this example, both high and low values are considered anomalous.
In many use cases, the `high_info_content` function is a more appropriate
choice.
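For reference, a detector of this kind can be written as a single configuration
object. The following is an illustrative sketch rather than the snippet from
Example 1, using the `subdomain` and `highest_registered_domain` fields named
above:

[source,js]
--------------------------------------------------
{
  "function" : "info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------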

.Example 2: Analyzing query strings with the high_info_content function
[source,js]
--------------------------------------------------
{
@ -87,33 +61,14 @@ information content is higher than expected.
}
--------------------------------------------------

If you use this `high_info_content` function in a detector in your job, it
models information content that is held in the DNS query string. It detects
`src_ip` values where the information content is unusually high compared to
other `src_ip` values. This example is similar to the example for the
`info_content` function, but it reports anomalies only where the amount of
information content is higher than expected.
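As an illustrative sketch (the name of the DNS query string field, `query`, is
an assumption; `src_ip` is taken from the description above), such a detector
might be configured as:

[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip"
}
--------------------------------------------------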

.Example 3: Analyzing message strings with the low_info_content function
[source,js]
--------------------------------------------------
{
@ -122,3 +77,11 @@ have been disabled.
  "by_field_name" : "logfilename"
}
--------------------------------------------------

If you use this `low_info_content` function in a detector in your job, it models
information content that is present in the message string for each
`logfilename`. It detects anomalies where the information content is low
compared to its past behavior. For example, this function detects unusually low
amounts of information in a collection of rolling log files. Low information
might indicate that a process has entered an infinite loop or that logging
features have been disabled.
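For illustration only, a `low_info_content` detector along these lines might
look as follows; `logfilename` comes from the description, while `message` is
an assumed name for the message field:

[source,js]
--------------------------------------------------
{
  "function" : "low_info_content",
  "field_name" : "message",
  "by_field_name" : "logfilename"
}
--------------------------------------------------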
@ -9,16 +9,10 @@ The {xpackml} features include the following metric functions:

* <<ml-metric-min,`min`>>
* <<ml-metric-max,`max`>>
* xref:ml-metric-median[`median`, `high_median`, `low_median`]
* xref:ml-metric-mean[`mean`, `high_mean`, `low_mean`]
* <<ml-metric-metric,`metric`>>
* xref:ml-metric-varp[`varp`, `high_varp`, `low_varp`]

[float]
[[ml-metric-min]]
@ -35,18 +29,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 1: Analyzing minimum transactions with the min function
[source,js]
--------------------------------------------------
{
@ -56,6 +43,10 @@ product over time.
}
--------------------------------------------------

If you use this `min` function in a detector in your job, it detects where the
smallest transaction is lower than previously observed. You can use this
function to detect items for sale at unintentionally low prices due to data
entry mistakes. It models the minimum amount for each product over time.
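A minimal sketch of such a detector, assuming the transaction amount and
product fields are called `amount` and `product`, might be:

[source,js]
--------------------------------------------------
{
  "function" : "min",
  "field_name" : "amount",
  "by_field_name" : "product"
}
--------------------------------------------------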

[float]
[[ml-metric-max]]
@ -72,18 +63,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 2: Analyzing maximum response times with the max function
[source,js]
--------------------------------------------------
{
@ -93,11 +77,14 @@ unusually long compared to previous applications.
}
--------------------------------------------------

If you use this `max` function in a detector in your job, it detects where the
longest `responsetime` is longer than previously observed. You can use this
function to detect applications that have `responsetime` values that are
unusually lengthy. It models the maximum `responsetime` for each application
over time and detects when the longest `responsetime` is unusually long
compared to previous applications.

.Example 3: Two detectors with max and high_mean functions
[source,js]
--------------------------------------------------
{
@ -112,29 +99,35 @@ for each bucket. For example:
}
--------------------------------------------------

The analysis in the previous example can be performed alongside `high_mean`
functions by application. By combining detectors and using the same influencer,
this job can detect both unusually long individual response times and average
response times for each bucket.
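As a sketch of that idea (not the snippet from Example 3), the two detectors
can sit side by side in the job's `analysis_config`, with `application`
declared as the shared influencer; the field names here are assumptions:

[source,js]
--------------------------------------------------
{
  "analysis_config" : {
    "detectors" : [
      { "function" : "max", "field_name" : "responsetime", "by_field_name" : "application" },
      { "function" : "high_mean", "field_name" : "responsetime", "by_field_name" : "application" }
    ],
    "influencers" : [ "application" ]
  }
}
--------------------------------------------------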

[float]
[[ml-metric-median]]
==== Median, High_median, Low_median

The `median` function detects anomalies in the statistical median of a value.
The median value is calculated for each bucket.

If you want to monitor unusually high median values, use the `high_median`
function.

If you are just interested in unusually low median values, use the `low_median`
function.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 4: Analyzing response times with the median function
[source,js]
--------------------------------------------------
{
@ -144,68 +137,34 @@ values.
}
--------------------------------------------------

If you use this `median` function in a detector in your job, it models the
median `responsetime` for each application over time. It detects when the median
`responsetime` is unusual compared to previous `responsetime` values.
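For illustration, such a detector might be configured as follows, assuming
`responsetime` and `application` are the field names (as in the surrounding
descriptions):

[source,js]
--------------------------------------------------
{
  "function" : "median",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------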

[float]
[[ml-metric-mean]]
==== Mean, High_mean, Low_mean

The `mean` function detects anomalies in the arithmetic mean of a value.
The mean value is calculated for each bucket.

If you want to monitor unusually high average values, use the `high_mean`
function.

If you are just interested in unusually low average values, use the `low_mean`
function.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 5: Analyzing response times with the mean function
[source,js]
--------------------------------------------------
{
@ -215,30 +174,11 @@ values.
}
--------------------------------------------------

If you use this `mean` function in a detector in your job, it models the mean
`responsetime` for each application over time. It detects when the mean
`responsetime` is unusual compared to previous `responsetime` values.

.Example 6: Analyzing response times with the high_mean function
[source,js]
--------------------------------------------------
{
@ -248,30 +188,11 @@ when the mean `responsetime` is unusually high compared to previous
}
--------------------------------------------------

If you use this `high_mean` function in a detector in your job, it models the
mean `responsetime` for each application over time. It detects when the mean
`responsetime` is unusually high compared to previous `responsetime` values.

.Example 7: Analyzing response times with the low_mean function
[source,js]
--------------------------------------------------
{
@ -281,6 +202,10 @@ compared to previous `responsetime` values.
}
--------------------------------------------------

If you use this `low_mean` function in a detector in your job, it models the
mean `responsetime` for each application over time. It detects when the mean
`responsetime` is unusually low compared to previous `responsetime` values.

[float]
[[ml-metric-metric]]
==== Metric
@ -303,11 +228,7 @@ This function supports the following properties:
For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 8: Analyzing response times with the metric function
[source,js]
--------------------------------------------------
{
@ -317,30 +238,33 @@ previous `responsetime` values.
}
--------------------------------------------------

If you use this `metric` function in a detector in your job, it models the
mean, min, and max `responsetime` for each application over time. It detects
when the mean, min, or max `responsetime` is unusual compared to previous
`responsetime` values.
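A sketch of a `metric` detector of this kind, reusing the same assumed field
names, might be:

[source,js]
--------------------------------------------------
{
  "function" : "metric",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------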

[float]
[[ml-metric-varp]]
==== Varp, High_varp, Low_varp

The `varp` function detects anomalies in the variance of a value, which is a
measure of the variability and spread in the data.

If you want to monitor unusually high variance, use the `high_varp` function.

If you are just interested in unusually low variance, use the `low_varp` function.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 9: Analyzing response times with the varp function
[source,js]
--------------------------------------------------
{
@ -350,30 +274,12 @@ to past application behavior.
}
--------------------------------------------------

If you use this `varp` function in a detector in your job, it models the
variance in values of `responsetime` for each application over time. It detects
when the variance in `responsetime` is unusual compared to past application
behavior.
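Illustratively (again with `responsetime` and `application` as assumed field
names), a `varp` detector could be written as:

[source,js]
--------------------------------------------------
{
  "function" : "varp",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------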

.Example 10: Analyzing response times with the high_varp function
[source,js]
--------------------------------------------------
{
@ -383,31 +289,12 @@ to past application behavior.
}
--------------------------------------------------

If you use this `high_varp` function in a detector in your job, it models the
variance in values of `responsetime` for each application over time. It detects
when the variance in `responsetime` is unusual compared to past application
behavior.

.Example 11: Analyzing response times with the low_varp function
[source,js]
--------------------------------------------------
{
@ -416,3 +303,8 @@ to past application behavior.
  "by_field_name" : "application"
}
--------------------------------------------------

If you use this `low_varp` function in a detector in your job, it models the
variance in values of `responsetime` for each application over time. It detects
when the variance in `responsetime` is unusual compared to past application
behavior.
@ -40,17 +40,11 @@ This function supports the following properties:
* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 1: Analyzing status codes with the rare function
[source,js]
--------------------------------------------------
{
@ -59,16 +53,12 @@ access log that have never (or rarely) occurred before.
}
--------------------------------------------------

If you use this `rare` function in a detector in your job, it detects values
that are rare in time. It models status codes that occur over time and detects
when rare status codes occur compared to the past. For example, you can detect
status codes in a web access log that have never (or rarely) occurred before.

.Example 2: Analyzing status codes in a population with the rare function
[source,js]
--------------------------------------------------
{
@ -78,14 +68,21 @@ status code values, not the count of occurrences.
}
--------------------------------------------------

If you use this `rare` function in a detector in your job, it detects values
that are rare in a population. It models status code and client IP interactions
that occur. It defines a rare status code as one that occurs for few client IP
values compared to the population. It detects client IP values that experience
one or more distinct rare status codes compared to the population. For example,
in a web access log, a `clientip` that experiences the highest number of
different rare status codes compared to the population is regarded as highly
anomalous. This analysis is based on the number of different status code values,
not the count of occurrences.

NOTE: To define a status code as rare, the {xpackml} features look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.
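As an illustrative sketch (not the Example 2 snippet), such a population
analysis might be configured as follows; `clientip` is named in the description,
while `status` is an assumed name for the status code field. There is no
`field_name` here: `rare` operates on the categories of the `by_field_name`.

[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status",
  "over_field_name" : "clientip"
}
--------------------------------------------------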

//TBD: Still pertinent? "Here with rare we look at the number of distinct status codes."

[float]
[[ml-freq-rare]]
==== Freq_rare
@ -99,21 +96,11 @@ This function supports the following properties:
* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 3: Analyzing URI values in a population with the freq_rare function
[source,js]
--------------------------------------------------
{
@ -123,9 +110,17 @@ URI paths, not the number of different URI path values.
}
--------------------------------------------------

If you use this `freq_rare` function in a detector in your job, it
detects values that are frequently rare in a population. It models URI paths and
client IP interactions that occur. It defines a rare URI path as one that is
visited by few client IP values compared to the population. It detects the
client IP values that experience many interactions with rare URI paths compared
to the population. For example, in a web access log, a client IP that visits
one or more rare URI paths many times compared to the population is regarded as
highly anomalous. This analysis is based on the count of interactions with rare
URI paths, not the number of different URI path values.

NOTE: To define a URI path as rare, the analytics consider the number of
distinct values that occur and not the number of times the URI path occurs.
If a single client IP visits a single unique URI path, this is rare, even if it
occurs for that client IP in every bucket.

//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened."
@ -13,12 +13,8 @@ ignored; buckets with a zero value are analyzed.

The {xpackml} features include the following sum functions:

* xref:ml-sum[`sum`, `high_sum`, `low_sum`]
* xref:ml-nonnull-sum[`non_null_sum`, `high_non_null_sum`, `low_non_null_sum`]

////
TBD: Incorporate from prelert docs?:
@ -29,27 +25,26 @@ a more appropriate method to using the sum function.

[float]
[[ml-sum]]
==== Sum, High_sum, Low_sum

The `sum` function detects anomalies where the sum of a field in a bucket is
anomalous.

If you want to monitor unusually high sum values, use the `high_sum` function.

If you want to monitor unusually low sum values, use the `low_sum` function.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 1: Analyzing total expenses with the sum function
[source,js]
--------------------------------------------------
{
@ -60,28 +55,12 @@ to other employees.
}
--------------------------------------------------

If you use this `sum` function in a detector in your job, it models total
expenses per employee for each cost center. For each time bucket, it detects
when an employee’s expenses are unusual for a cost center compared to other
employees.

.Example 2: Analyzing total bytes with the high_sum function
[source,js]
--------------------------------------------------
{
@ -91,42 +70,30 @@ volumes compared to other `cs_hosts`.
}
--------------------------------------------------

If you use this `high_sum` function in a detector in your job, it models total
`cs_bytes`. It detects `cs_hosts` that transfer unusually high volumes compared
to other `cs_hosts`. This example looks for volumes of data transferred from a
client to a server on the internet that are unusual compared to other clients.
This scenario could be useful to detect data exfiltration or to find users that
are abusing internet privileges.
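A sketch of such a detector, with `cs_bytes` from the description and `cs_host`
assumed as the name of the host field, might be:

[source,js]
--------------------------------------------------
{
  "function" : "high_sum",
  "field_name" : "cs_bytes",
  "over_field_name" : "cs_host"
}
--------------------------------------------------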

[float]
[[ml-nonnull-sum]]
==== Non_null_sum, High_non_null_sum, Low_non_null_sum

The `non_null_sum` function is useful if your data is sparse. Buckets without
values are ignored and buckets with a zero value are analyzed.

If you want to monitor unusually high totals, use the `high_non_null_sum`
function.

If you want to look at drops in totals, use the `low_non_null_sum` function.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.
@ -134,32 +101,7 @@ see <<ml-detectorconfig,Detector Configuration Objects>>.
NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for these functions.

.Example 3: Analyzing employee approvals with the high_non_null_sum function
[source,js]
--------------------------------------------------
{
@ -169,26 +111,9 @@ amounts compared to their past behavior.
}
--------------------------------------------------

If you use this `high_non_null_sum` function in a detector in your job, it
models the total `amount_approved` for each employee. It ignores any buckets
where the amount is null. It detects employees who approve unusually high
amounts compared to their past behavior.
//For this credit control system analysis, using non_null_sum will ignore
//periods where the employees are not active on the system.
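For illustration, such a detector might look as follows; `amount_approved` is
named in the description, `employee` is an assumed field name, and there is no
`over_field_name` because population analysis does not apply to these functions:

[source,js]
--------------------------------------------------
{
  "function" : "high_non_null_sum",
  "field_name" : "amount_approved",
  "by_field_name" : "employee"
}
--------------------------------------------------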
@ -47,16 +47,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 1: Analyzing events with the time_of_day function
[source,js]
--------------------------------------------------
{
@ -65,6 +60,10 @@ its past behavior.
}
--------------------------------------------------

If you use this `time_of_day` function in a detector in your job, it
models when events occur throughout a day for each process. It detects when an
event occurs for a process that is at an unusual time in the day compared to
its past behavior.

[float]
[[ml-time-of-week]]
@ -78,17 +77,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

.Example 2: Analyzing events with the time_of_week function
[source,js]
--------------------------------------------------
{
@ -97,3 +90,9 @@ particular workstation that are outside the normal usage pattern.
  "over_field_name" : "workstation"
}
--------------------------------------------------

If you use this `time_of_week` function in a detector in your job, it
models when events occur throughout the week for each `eventcode`. It detects
when a workstation event occurs at an unusual time during the week for that
`eventcode` compared to other workstations. It detects events for a
particular workstation that are outside the normal usage pattern.
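As a final illustrative sketch (not the snippet from Example 2), such a
detector pairs the `eventcode` and `workstation` fields named in the
description:

[source,js]
--------------------------------------------------
{
  "function" : "time_of_week",
  "by_field_name" : "eventcode",
  "over_field_name" : "workstation"
}
--------------------------------------------------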