[DOCS] Modify ML analytical functions (elastic/x-pack-elasticsearch#1467)

* [DOCS] Modify ML analytical functions

* [DOCS] Fix ML function section titles

Original commit: elastic/x-pack-elasticsearch@f95ae012bb
Lisa Cawley 2017-05-19 10:48:15 -07:00 committed by GitHub
parent fa95474ab8
commit 27b0af7eae
5 changed files with 184 additions and 410 deletions


@@ -6,39 +6,32 @@ that is contained in strings within a bucket. These functions can be used as
a more sophisticated method to identify incidences of data exfiltration or
C2C activity, when analyzing the size in bytes of the data might not be sufficient.
-If you want to monitor for unusually high amounts of information, use `high_info_content`.
-If you want to look at drops in information content, use `low_info_content`.

The {xpackml} features include the following information content functions:

-* <<ml-info-content,`info_content`>>
-* <<ml-high-info-content,`high_info_content`>>
-* <<ml-low-info-content,`low_info_content`>>
+* `info_content`, `high_info_content`, `low_info_content`

[float]
[[ml-info-content]]
-==== Info_content
+==== Info_content, High_info_content, Low_info_content
The `info_content` function detects anomalies in the amount of information that
is contained in strings in a bucket.

-This function supports the following properties:
+If you want to monitor for unusually high amounts of information,
+use `high_info_content`.
+If you want to look at drops in information content, use `low_info_content`.
+
+These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job, it
-models information that is present in the `subdomain` string. It detects
-anomalies where the information content is unusual compared to the other
-`highest_registered_domain` values. An anomaly could indicate an abuse of the
-DNS protocol, such as malicious command and control activity.
+.Example 1: Analyzing subdomain strings with the info_content function
[source,js]
--------------------------------------------------
{
@@ -48,36 +41,17 @@ DNS protocol, such as malicious command and control activity.
}
--------------------------------------------------
-NOTE: Both high and low values are considered anomalous. In many use cases, the
-`high_info_content` function is often a more appropriate choice.
+If you use this `info_content` function in a detector in your job, it models
+information that is present in the `subdomain` string. It detects anomalies
+where the information content is unusual compared to the other
+`highest_registered_domain` values. An anomaly could indicate an abuse of the
+DNS protocol, such as malicious command and control activity.
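
A minimal sketch of such a detector, using the fields named in the text
(`subdomain` as the analyzed string, `highest_registered_domain` as the
population field), might look like this:

[source,js]
--------------------------------------------------
// sketch only; field names taken from the surrounding prose
{
  "function" : "info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------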

-[float]
-[[ml-high-info-content]]
-==== High_info_content
+NOTE: In this example, both high and low values are considered anomalous.
+In many use cases, the `high_info_content` function is often a more appropriate
+choice.
-The `high_info_content` function detects anomalies in the amount of information
-that is contained in strings in a bucket. Use this function if you want to
-monitor for unusually high amounts of information.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job, it
-models information content that is held in the DNS query string. It detects
-`src_ip` values where the information content is unusually high compared to
-other `src_ip` values. This example is similar to the example for the
-`info_content` function, but it reports anomalies only where the amount of
-information content is higher than expected.
-//TBD: Still pertinent? "This configuration identifies activity typical of DGA malware."
+.Example 2: Analyzing query strings with the high_info_content function

[source,js]
--------------------------------------------------
{
@@ -87,33 +61,14 @@ information content is higher than expected.
}
--------------------------------------------------
-[float]
-[[ml-low-info-content]]
-==== Low_info_content
-
-The `low_info_content` function detects anomalies in the amount of information
-that is contained in strings in a bucket. Use this function if you want to look
-at drops in information content.
+If you use this `high_info_content` function in a detector in your job, it
+models information content that is held in the DNS query string. It detects
+`src_ip` values where the information content is unusually high compared to
+other `src_ip` values. This example is similar to the example for the
+`info_content` function, but it reports anomalies only where the amount of
+information content is higher than expected.
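
A sketch of such a detector, assuming a hypothetical `query` field for the DNS
query string and the `src_ip` population field named above, might be:

[source,js]
--------------------------------------------------
// sketch only; `query` is a placeholder field name
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip"
}
--------------------------------------------------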
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job, it
-models information content that is present in the message string for each
-`logfilename`. It detects anomalies where the information content is low compared
-to its past behavior. For example, this function detects unusually low amounts
-of information in a collection of rolling log files. Low information might
-indicate that a process has entered an infinite loop or that logging features
-have been disabled.
+.Example 3: Analyzing message strings with the low_info_content function

[source,js]
--------------------------------------------------
{
@@ -122,3 +77,11 @@ have been disabled.
"by_field_name" : "logfilename" "by_field_name" : "logfilename"
} }
-------------------------------------------------- --------------------------------------------------
+
+If you use this `low_info_content` function in a detector in your job, it models
+information content that is present in the message string for each
+`logfilename`. It detects anomalies where the information content is low
+compared to its past behavior. For example, this function detects unusually low
+amounts of information in a collection of rolling log files. Low information
+might indicate that a process has entered an infinite loop or that logging
+features have been disabled.
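
A sketch of such a detector, assuming a hypothetical `message` field alongside
the `logfilename` field shown in the fragment above, might be:

[source,js]
--------------------------------------------------
// sketch only; `message` is a placeholder field name
{
  "function" : "low_info_content",
  "field_name" : "message",
  "by_field_name" : "logfilename"
}
--------------------------------------------------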


@@ -9,16 +9,10 @@ The {xpackml} features include the following metric functions:
* <<ml-metric-min,`min`>>
* <<ml-metric-max,`max`>>
-* <<ml-metric-median,`median`>>
-* <<ml-metric-high-median,`high_median`>>
-* <<ml-metric-low-median,`low_median`>>
-* <<ml-metric-mean,`mean`>>
-* <<ml-metric-high-mean,`high_mean`>>
-* <<ml-metric-low-mean,`low_mean`>>
+* xref:ml-metric-median[`median`, `high_median`, `low_median`]
+* xref:ml-metric-mean[`mean`, `high_mean`, `low_mean`]
* <<ml-metric-metric,`metric`>>
-* <<ml-metric-varp,`varp`>>
-* <<ml-metric-high-varp,`high_varp`>>
-* <<ml-metric-low-varp,`low_varp`>>
+* xref:ml-metric-varp[`varp`, `high_varp`, `low_varp`]

[float]
[[ml-metric-min]]
@@ -35,18 +29,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job,
-it detects where the smallest transaction is lower than previously observed.
-You can use this function to detect items for sale at unintentionally low
-prices due to data entry mistakes. It models the minimum amount for each
-product over time.
-//Detect when the minimum amount for a product is unusually low compared to its past amounts
+.Example 1: Analyzing minimum transactions with the min function

[source,js]
--------------------------------------------------
{
@@ -56,6 +43,10 @@ product over time.
}
--------------------------------------------------
+
+If you use this `min` function in a detector in your job, it detects where the
+smallest transaction is lower than previously observed. You can use this
+function to detect items for sale at unintentionally low prices due to data
+entry mistakes. It models the minimum amount for each product over time.
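
A sketch of such a detector, assuming hypothetical `amount` and `product`
fields, might be:

[source,js]
--------------------------------------------------
// sketch only; field names are placeholders
{
  "function" : "min",
  "field_name" : "amount",
  "by_field_name" : "product"
}
--------------------------------------------------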

[float]
[[ml-metric-max]]
@@ -72,18 +63,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job,
-it detects where the longest `responsetime` is longer than previously observed.
-You can use this function to detect applications that have `responsetime`
-values that are unusually lengthy. It models the maximum `responsetime` for
-each application over time and detects when the longest `responsetime` is
-unusually long compared to previous applications.
+.Example 2: Analyzing maximum response times with the max function

[source,js]
--------------------------------------------------
{
@@ -93,11 +77,14 @@ unusually long compared to previous applications.
}
--------------------------------------------------
-This analysis can be performed alongside `high_mean` functions by
-application. By combining detectors and using the same influencer this would
-detect both unusually long individual response times and average response times
-for each bucket. For example:
+If you use this `max` function in a detector in your job, it detects where the
+longest `responsetime` is longer than previously observed. You can use this
+function to detect applications that have `responsetime` values that are
+unusually lengthy. It models the maximum `responsetime` for each application
+over time and detects when the longest `responsetime` is unusually long compared
+to previous applications.
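
A sketch of such a detector, using the `responsetime` and `application` fields
named above, might be:

[source,js]
--------------------------------------------------
// sketch only
{
  "function" : "max",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------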
+.Example 3: Two detectors with max and high_mean functions

[source,js]
--------------------------------------------------
{
@@ -112,29 +99,35 @@ for each bucket. For example:
}
--------------------------------------------------
+
+The analysis in the previous example can be performed alongside `high_mean`
+functions by application. By combining detectors and using the same influencer
+this job can detect both unusually long individual response times and average
+response times for each bucket.
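
A sketch of such a two-detector configuration, with `application` assumed as
the shared influencer, might be:

[source,js]
--------------------------------------------------
// sketch only; abbreviated analysis_config
{
  "analysis_config" : {
    "influencers" : [ "application" ],
    "detectors" : [
      {
        "function" : "max",
        "field_name" : "responsetime",
        "by_field_name" : "application"
      },
      {
        "function" : "high_mean",
        "field_name" : "responsetime",
        "by_field_name" : "application"
      }
    ]
  }
}
--------------------------------------------------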

[float]
[[ml-metric-median]]
-==== Median
+==== Median, High_median, Low_median
The `median` function detects anomalies in the statistical median of a value.
The median value is calculated for each bucket.

-This function supports the following properties:
+If you want to monitor unusually high median values, use the `high_median`
+function.
+If you are just interested in unusually low median values, use the `low_median`
+function.
+
+These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job,
-it models the median `responsetime` for each application over time. It detects
-when the median `responsetime` is unusual compared to previous `responsetime`
-values.
+.Example 4: Analyzing response times with the median function

[source,js]
--------------------------------------------------
{
@@ -144,68 +137,34 @@ values.
}
--------------------------------------------------
-[float]
-[[ml-metric-high-median]]
-==== High_median
+If you use this `median` function in a detector in your job, it models the
+median `responsetime` for each application over time. It detects when the median
+`responsetime` is unusual compared to previous `responsetime` values.
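
A sketch of such a detector, using the fields named above, might be as follows;
the `high_median` and `low_median` variants differ only in the value of
`function`:

[source,js]
--------------------------------------------------
// sketch only
{
  "function" : "median",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------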
-
-The `high_median` function detects anomalies in the statistical median of a value.
-The median value is calculated for each bucket.
-Use this function if you want to monitor unusually high median values.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-[float]
-[[ml-metric-low-median]]
-==== Low_median
-
-The `low_median` function detects anomalies in the statistical median of a value.
-The median value is calculated for each bucket.
-Use this function if you are just interested in unusually low median values.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.

[float]
[[ml-metric-mean]]
-==== Mean
+==== Mean, High_mean, Low_mean
The `mean` function detects anomalies in the arithmetic mean of a value.
The mean value is calculated for each bucket.

-This function supports the following properties:
+If you want to monitor unusually high average values, use the `high_mean`
+function.
+If you are just interested in unusually low average values, use the `low_mean`
+function.
+
+These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job,
-it models the mean `responsetime` for each application over time. It detects
-when the mean `responsetime` is unusual compared to previous `responsetime`
-values.
+.Example 5: Analyzing response times with the mean function

[source,js]
--------------------------------------------------
{
@@ -215,30 +174,11 @@ values.
}
--------------------------------------------------
-[float]
-[[ml-metric-high-mean]]
-==== High_mean
+If you use this `mean` function in a detector in your job, it models the mean
+`responsetime` for each application over time. It detects when the mean
+`responsetime` is unusual compared to previous `responsetime` values.
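
A sketch of such a detector might be as follows; the `high_mean` and `low_mean`
variants differ only in the value of `function`:

[source,js]
--------------------------------------------------
// sketch only
{
  "function" : "mean",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------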
-
-The `high_mean` function detects anomalies in the arithmetic mean of a value.
-The mean value is calculated for each bucket.
-Use this function if you want to monitor unusually high average values.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job,
-it models the mean `responsetime` for each application over time. It detects
-when the mean `responsetime` is unusually high compared to previous
-`responsetime` values.
+
+.Example 6: Analyzing response times with the high_mean function

[source,js]
--------------------------------------------------
{
@@ -248,30 +188,11 @@ when the mean `responsetime` is unusually high compared to previous
}
--------------------------------------------------
-[float]
-[[ml-metric-low-mean]]
-==== Low_mean
+If you use this `high_mean` function in a detector in your job, it models the
+mean `responsetime` for each application over time. It detects when the mean
+`responsetime` is unusually high compared to previous `responsetime` values.
-
-The `low_mean` function detects anomalies in the arithmetic mean of a value.
-The mean value is calculated for each bucket.
-Use this function if you are just interested in unusually low average values.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job,
-it models the mean `responsetime` for each application over time. It detects
-when the mean `responsetime` is unusually low
-compared to previous `responsetime` values.
+
+.Example 7: Analyzing response times with the low_mean function

[source,js]
--------------------------------------------------
{
@@ -281,6 +202,10 @@ compared to previous `responsetime` values.
}
--------------------------------------------------
+
+If you use this `low_mean` function in a detector in your job, it models the
+mean `responsetime` for each application over time. It detects when the mean
+`responsetime` is unusually low compared to previous `responsetime` values.

[float]
[[ml-metric-metric]]
==== Metric
@@ -303,11 +228,7 @@ This function supports the following properties:
For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job,
-it models the mean, min, and max `responsetime` for each application over time.
-It detects when the mean, min, or max `responsetime` is unusual compared to
-previous `responsetime` values.
+.Example 8: Analyzing response times with the metric function

[source,js]
--------------------------------------------------
{
@@ -317,30 +238,33 @@ previous `responsetime` values.
}
--------------------------------------------------
+
+If you use this `metric` function in a detector in your job, it models the
+mean, min, and max `responsetime` for each application over time. It detects
+when the mean, min, or max `responsetime` is unusual compared to previous
+`responsetime` values.

[float]
[[ml-metric-varp]]
-==== Varp
+==== Varp, High_varp, Low_varp
The `varp` function detects anomalies in the variance of a value which is a
measure of the variability and spread in the data.

-This function supports the following properties:
+If you want to monitor unusually high variance, use the `high_varp` function.
+If you are just interested in unusually low variance, use the `low_varp` function.
+
+These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job,
-it models the variance in values of `responsetime` for each application
-over time. It detects when the variance in `responsetime` is unusual compared
-to past application behavior.
+.Example 9: Analyzing response times with the varp function

[source,js]
--------------------------------------------------
{
@@ -350,30 +274,12 @@ to past application behavior.
}
--------------------------------------------------
-[float]
-[[ml-metric-high-varp]]
-==== High_varp
+If you use this `varp` function in a detector in your job, it models the
+variance in values of `responsetime` for each application over time. It detects
+when the variance in `responsetime` is unusual compared to past application
+behavior.
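
A sketch of such a detector might be as follows; the `high_varp` and `low_varp`
variants differ only in the value of `function`:

[source,js]
--------------------------------------------------
// sketch only
{
  "function" : "varp",
  "field_name" : "responsetime",
  "by_field_name" : "application"
}
--------------------------------------------------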
-
-The `high_varp` function detects anomalies in the variance of a value which is a
-measure of the variability and spread in the data. Use this function if you want
-to monitor unusually high variance.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job,
-it models the variance in values of `responsetime` for each application
-over time. It detects when the variance in `responsetime` is unusual compared
-to past application behavior.
+
+.Example 10: Analyzing response times with the high_varp function

[source,js]
--------------------------------------------------
{
@@ -383,31 +289,12 @@ to past application behavior.
}
--------------------------------------------------
+
+If you use this `high_varp` function in a detector in your job, it models the
+variance in values of `responsetime` for each application over time. It detects
+when the variance in `responsetime` is unusual compared to past application
+behavior.
-[float]
-[[ml-metric-low-varp]]
-==== Low_varp
-
-The `low_varp` function detects anomalies in the variance of a value which is a
-measure of the variability and spread in the data. Use this function if you are
-just interested in unusually low variance.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job,
-it models the variance in values of `responsetime` for each application
-over time. It detects when the variance in `responsetime` is unusual compared
-to past application behavior.
+.Example 11: Analyzing response times with the low_varp function

[source,js]
--------------------------------------------------
{
@@ -416,3 +303,8 @@ to past application behavior.
"by_field_name" : "application" "by_field_name" : "application"
} }
-------------------------------------------------- --------------------------------------------------
+
+If you use this `low_varp` function in a detector in your job, it models the
+variance in values of `responsetime` for each application over time. It detects
+when the variance in `responsetime` is unusual compared to past application
+behavior.


@@ -40,17 +40,11 @@ This function supports the following properties:
* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job, it
-detects values that are rare in time. It models status codes that occur over
-time and detects when rare status codes occur compared to the past. For example,
-you can detect status codes in a web
-access log that have never (or rarely) occurred before.
+.Example 1: Analyzing status codes with the rare function

[source,js]
--------------------------------------------------
{
@@ -59,16 +53,12 @@ access log that have never (or rarely) occurred before.
}
--------------------------------------------------
-If you use the following function in a detector in your job, it
-detects values that are rare in a population. It models status code and client
-IP interactions that occur. It defines a rare status code as one that occurs for
-few client IP values compared to the population. It detects client IP values
-that experience one or more distinct rare status codes compared to the
-population. For example in a web access log, a `clientip` that experiences the
-highest number of different rare status codes compared to the population is
-regarded as highly anomalous. This analysis is based on the number of different
-status code values, not the count of occurrences.
+If you use this `rare` function in a detector in your job, it detects values
+that are rare in time. It models status codes that occur over time and detects
+when rare status codes occur compared to the past. For example, you can detect
+status codes in a web access log that have never (or rarely) occurred before.
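
A sketch of such a detector, assuming a hypothetical `status` field for the
status code, might be:

[source,js]
--------------------------------------------------
// sketch only; `status` is a placeholder field name
{
  "function" : "rare",
  "by_field_name" : "status"
}
--------------------------------------------------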
+
+.Example 2: Analyzing status codes in a population with the rare function

[source,js]
--------------------------------------------------
{
@@ -78,14 +68,21 @@ status code values, not the count of occurrences.
}
--------------------------------------------------
+
+If you use this `rare` function in a detector in your job, it detects values
+that are rare in a population. It models status code and client IP interactions
+that occur. It defines a rare status code as one that occurs for few client IP
+values compared to the population. It detects client IP values that experience
+one or more distinct rare status codes compared to the population. For example
+in a web access log, a `clientip` that experiences the highest number of
+different rare status codes compared to the population is regarded as highly
+anomalous. This analysis is based on the number of different status code values,
+not the count of occurrences.

NOTE: To define a status code as rare the {xpackml} features look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.
-//TBD: Still pertinent? "Here with rare we look at the number of distinct status codes."

[float]
[[ml-freq-rare]]
==== Freq_rare
@@ -99,21 +96,11 @@ This function supports the following properties:
* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job, it
-detects values that are frequently rare in a population. It models URI paths and
-client IP interactions that occur. It defines a rare URI path as one that is
-visited by few client IP values compared to the population. It detects the
-client IP values that experience many interactions with rare URI paths compared
-to the population. For example in a web access log, a client IP that visits
-one or more rare URI paths many times compared to the population is regarded as
-highly anomalous. This analysis is based on the count of interactions with rare
-URI paths, not the number of different URI path values.
+.Example 3: Analyzing URI values in a population with the freq_rare function

[source,js]
--------------------------------------------------
{
@@ -123,9 +110,17 @@ URI paths, not the number of different URI path values.
}
--------------------------------------------------
+
+If you use this `freq_rare` function in a detector in your job, it
+detects values that are frequently rare in a population. It models URI paths and
+client IP interactions that occur. It defines a rare URI path as one that is
+visited by few client IP values compared to the population. It detects the
+client IP values that experience many interactions with rare URI paths compared
+to the population. For example in a web access log, a client IP that visits
+one or more rare URI paths many times compared to the population is regarded as
+highly anomalous. This analysis is based on the count of interactions with rare
+URI paths, not the number of different URI path values.
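
A sketch of such a detector, assuming hypothetical `uri` and `clientip` fields,
might be:

[source,js]
--------------------------------------------------
// sketch only; field names are placeholders
{
  "function" : "freq_rare",
  "by_field_name" : "uri",
  "over_field_name" : "clientip"
}
--------------------------------------------------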

NOTE: To define a URI path as rare, the analytics consider the number of
distinct values that occur and not the number of times the URI path occurs.
If a single client IP visits a single unique URI path, this is rare, even if it
occurs for that client IP in every bucket.
-//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened."


@@ -13,12 +13,8 @@ ignored; buckets with a zero value are analyzed.
The {xpackml} features include the following sum functions:

-* <<ml-sum,`sum`>>
-* <<ml-high-sum,`high_sum`>>
-* <<ml-low-sum,`low_sum`>>
-* <<ml-nonnull-sum,`non_null_sum`>>
-* <<ml-high-nonnull-sum,`high_non_null_sum`>>
-* <<ml-low-nonnull-sum,`low_non_null_sum`>>
+* xref:ml-sum[`sum`, `high_sum`, `low_sum`]
+* xref:ml-nonnull-sum[`non_null_sum`, `high_non_null_sum`, `low_non_null_sum`]

////
TBD: Incorporate from prelert docs?:
@@ -29,27 +25,26 @@ a more appropriate method to using the sum function.
[float]
[[ml-sum]]
-==== Sum
+==== Sum, High_sum, Low_sum
The `sum` function detects anomalies where the sum of a field in a bucket is
anomalous.

-This function supports the following properties:
+If you want to monitor unusually high sum values, use the `high_sum` function.
+If you want to monitor unusually low sum values, use the `low_sum` function.
+
+These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job, it
-models total expenses per employee for each cost center. For each time bucket,
-it detects when an employee's expenses are unusual for a cost center compared
-to other employees.
+.Example 1: Analyzing total expenses with the sum function

[source,js]
--------------------------------------------------
{
@@ -60,28 +55,12 @@ to other employees.
}
--------------------------------------------------
-[float]
-[[ml-high-sum]]
-==== High_sum
+If you use this `sum` function in a detector in your job, it
+models total expenses per employee for each cost center. For each time bucket,
+it detects when an employee's expenses are unusual for a cost center compared
+to other employees.
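
A sketch of such a detector, assuming hypothetical `expenses`, `costcenter`,
and `employee` fields, might be:

[source,js]
--------------------------------------------------
// sketch only; field names are placeholders
{
  "function" : "sum",
  "field_name" : "expenses",
  "by_field_name" : "costcenter",
  "over_field_name" : "employee"
}
--------------------------------------------------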
-
-The `high_sum` function detects anomalies where the sum of a field in a bucket
-is unusually high.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-For example, if you use the following function in a detector in your job, it
-models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high
-volumes compared to other `cs_hosts`.
+
+.Example 2: Analyzing total bytes with the high_sum function

[source,js]
--------------------------------------------------
{
@@ -91,42 +70,30 @@ volumes compared to other `cs_hosts`.
}
--------------------------------------------------
-This example looks for volumes of data transferred from a client to a server on
-the internet that are unusual compared to other clients. This scenario could be
-useful to detect data exfiltration or to find users that are abusing internet
-privileges.
+If you use this `high_sum` function in a detector in your job, it
+models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high
+volumes compared to other `cs_hosts`. This example looks for volumes of data
+transferred from a client to a server on the internet that are unusual compared
+to other clients. This scenario could be useful to detect data exfiltration or
+to find users that are abusing internet privileges.
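
A sketch of such a detector, using the `cs_bytes` field named above and a
hypothetical `cs_host` population field, might be:

[source,js]
--------------------------------------------------
// sketch only; `cs_host` is a placeholder field name
{
  "function" : "high_sum",
  "field_name" : "cs_bytes",
  "over_field_name" : "cs_host"
}
--------------------------------------------------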
-[float]
-[[ml-low-sum]]
-==== Low_sum
-
-The `low_sum` function detects anomalies where the sum of a field in a bucket
-is unusually low.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `over_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.

[float]
[[ml-nonnull-sum]]
-==== Non_null_sum
+==== Non_null_sum, High_non_null_sum, Low_non_null_sum
The `non_null_sum` function is useful if your data is sparse. Buckets without
values are ignored and buckets with a zero value are analyzed.

-This function supports the following properties:
+If you want to monitor unusually high totals, use the `high_non_null_sum`
+function.
+If you want to look at drops in totals, use the `low_non_null_sum` function.
+
+These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.
@@ -134,32 +101,7 @@ see <<ml-detectorconfig,Detector Configuration Objects>>.

NOTE: Population analysis (that is to say, use of the `over_field_name` property)
is not applicable for this function.
-[float]
-[[ml-high-nonnull-sum]]
-==== High_non_null_sum
-
-The `high_non_null_sum` function is useful if your data is sparse. Buckets
-without values are ignored and buckets with a zero value are analyzed.
-Use this function if you want to monitor unusually high totals.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-NOTE: Population analysis (that is to say, use of the `over_field_name` property)
-is not applicable for this function.
-
-For example, if you use the following function in a detector in your job, it
-models the total `amount_approved` for each employee. It ignores any buckets
-where the amount is null. It detects employees who approve unusually high
-amounts compared to their past behavior.
+.Example 3: Analyzing employee approvals with the high_non_null_sum function

[source,js]
--------------------------------------------------
{
@@ -169,26 +111,9 @@ amounts compared to their past behavior.
}
--------------------------------------------------
+
+If you use this `high_non_null_sum` function in a detector in your job, it
+models the total `amount_approved` for each employee. It ignores any buckets
+where the amount is null. It detects employees who approve unusually high
+amounts compared to their past behavior.

//For this credit control system analysis, using non_null_sum will ignore
//periods where the employees are not active on the system.
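
A sketch of such a detector, using the `amount_approved` field named above and
a hypothetical `employee` field, might be:

[source,js]
--------------------------------------------------
// sketch only; `employee` is a placeholder field name
{
  "function" : "high_non_null_sum",
  "field_name" : "amount_approved",
  "by_field_name" : "employee"
}
--------------------------------------------------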
-
-[float]
-[[ml-low-nonnull-sum]]
-==== Low_non_null_sum
-
-The `low_non_null_sum` function is useful if your data is sparse. Buckets
-without values are ignored and buckets with a zero value are analyzed.
-Use this function if you want to look at drops in totals.
-
-This function supports the following properties:
-
-* `field_name` (required)
-* `by_field_name` (optional)
-* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)
-
-For more information about those properties,
-see <<ml-detectorconfig,Detector Configuration Objects>>.
-
-NOTE: Population analysis (that is to say, use of the `over_field_name` property)
-is not applicable for this function.


@@ -47,16 +47,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job, it
-models when events occur throughout a day for each process. It detects when an
-event occurs for a process that is at an unusual time in the day compared to
-its past behavior.
+.Example 1: Analyzing events with the time_of_day function

[source,js]
--------------------------------------------------
{
@@ -65,6 +60,10 @@ its past behavior.
}
--------------------------------------------------
+
+If you use this `time_of_day` function in a detector in your job, it
+models when events occur throughout a day for each process. It detects when an
+event occurs for a process that is at an unusual time in the day compared to
+its past behavior.
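
A sketch of such a detector, assuming a hypothetical `process` field, might be:

[source,js]
--------------------------------------------------
// sketch only; `process` is a placeholder field name
{
  "function" : "time_of_day",
  "by_field_name" : "process"
}
--------------------------------------------------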

[float]
[[ml-time-of-week]]
@@ -78,17 +77,11 @@ This function supports the following properties:
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
-* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

-For example, if you use the following function in a detector in your job, it
-models when events occur throughout the week for each `eventcode`. It detects
-when a workstation event occurs at an unusual time during the week for that
-`eventcode` compared to other workstations. It detects events for a
-particular workstation that are outside the normal usage pattern.
+.Example 2: Analyzing events with the time_of_week function

[source,js]
--------------------------------------------------
{
@@ -97,3 +90,9 @@ particular workstation that are outside the normal usage pattern.
"over_field_name" : "workstation" "over_field_name" : "workstation"
} }
-------------------------------------------------- --------------------------------------------------
+
+If you use this `time_of_week` function in a detector in your job, it
+models when events occur throughout the week for each `eventcode`. It detects
+when a workstation event occurs at an unusual time during the week for that
+`eventcode` compared to other workstations. It detects events for a
+particular workstation that are outside the normal usage pattern.
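
A sketch of such a detector, using the `eventcode` and `workstation` fields
named above (the latter visible in the fragment), might be:

[source,js]
--------------------------------------------------
// sketch only
{
  "function" : "time_of_week",
  "by_field_name" : "eventcode",
  "over_field_name" : "workstation"
}
--------------------------------------------------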