[role="xpack"]
[[ml-rare-functions]]
=== Rare functions

The rare functions detect values that occur rarely in time or rarely for a
population.

The `rare` analysis detects anomalies according to the number of distinct rare
values. This differs from `freq_rare`, which detects anomalies according to the
number of times (frequency) rare values occur.
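
To make the difference concrete, the following toy sketch assumes the set of rare values is already known. For one entity, it counts how many different rare values the entity produced (the `rare` view) and how many times it produced any rare value (the `freq_rare` view). The helper names `distinctRareCount` and `rareOccurrenceCount` are hypothetical, and the sketch does not attempt to reproduce how the {xpackml} features learn which values are rare.

```javascript
// Toy illustration only: the set of rare values is given, not learned.
// Counts how many *different* rare values an entity produced
// (the quantity the `rare` function is described as using).
function distinctRareCount(values, rareSet) {
  return new Set(values.filter((v) => rareSet.has(v))).size;
}

// Counts how many *times* an entity produced any rare value
// (the quantity the `freq_rare` function is described as using).
function rareOccurrenceCount(values, rareSet) {
  return values.filter((v) => rareSet.has(v)).length;
}

// Hypothetical data: one entity produces the rare value "503" three times
// and the common value "200" once.
const rare = new Set(["503", "418"]);
const observed = ["503", "503", "200", "503"];

console.log(distinctRareCount(observed, rare)); // 1
console.log(rareOccurrenceCount(observed, rare)); // 3
```

The same entity scores 1 under the distinct-value view but 3 under the frequency view, because it repeated a single rare value.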

[NOTE]
====
* The `rare` and `freq_rare` functions should not be used in conjunction with
`exclude_frequent`.
* You cannot create forecasts for jobs that contain `rare` or `freq_rare`
functions.
* Shorter bucket spans (less than 1 hour, for example) are recommended when
looking for rare events. The functions model whether something happens in a
bucket at least once. With longer bucket spans, it is more likely that
entities will be seen in a bucket and therefore they appear less rare.
Picking the ideal bucket span depends on the characteristics of the data,
with shorter bucket spans typically being measured in minutes, not hours.
* To model rare data, a learning period of at least 20 buckets is required
for typical data.
====
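
The bucket span effect described in the note can be illustrated with a toy calculation. The hypothetical helper `fractionOfBucketsWithValue` records only whether a value appears in a bucket at least once, mirroring the note, and reports the share of buckets that contain it. The evenly spaced event data is an assumption made for the example.

```javascript
// Toy illustration: share of buckets in which a value appears at least once.
// Assumes every timestamp falls in the range [0, totalSec).
function fractionOfBucketsWithValue(timestampsSec, totalSec, bucketSpanSec) {
  const bucketCount = Math.ceil(totalSec / bucketSpanSec);
  // Presence only: several events in the same bucket count once.
  const occupied = new Set(
    timestampsSec.map((t) => Math.floor(t / bucketSpanSec))
  );
  return occupied.size / bucketCount;
}

// Hypothetical data: one event every 2 hours over 24 hours.
const DAY = 24 * 3600;
const events = [];
for (let t = 0; t < DAY; t += 2 * 3600) events.push(t);

console.log(fractionOfBucketsWithValue(events, DAY, 15 * 60)); // 0.125
console.log(fractionOfBucketsWithValue(events, DAY, 4 * 3600)); // 1
```

The same value appears in 12.5% of 15-minute buckets but in every 4-hour bucket, so with the longer span it no longer looks rare at all.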

The {xpackml} features include the following rare functions:

* <<ml-rare,`rare`>>
* <<ml-freq-rare,`freq_rare`>>


[float]
[[ml-rare]]
==== Rare

The `rare` function detects values that occur rarely in time or rarely for a
population. It detects anomalies according to the number of distinct rare values.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see
{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].

.Example 1: Analyzing status codes with the rare function
[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `rare` function in a detector in your job, it detects values
that are rare in time. It models status codes that occur over time and detects
when rare status codes occur compared to the past. For example, you can detect
status codes in a web access log that have never (or rarely) occurred before.
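
The idea of "rare in time" can be sketched with a simplified presence count: for each status code, count the historical buckets in which it appeared at least once, then flag codes in the current bucket whose count falls below a threshold. The function `rareCodesInBucket` and the `minBuckets` threshold are hypothetical, and the fixed cutoff is only a simplification for illustration.

```javascript
// Toy sketch of "rare in time": a status code is treated as rare if it
// appeared in fewer than `minBuckets` historical buckets.
// `minBuckets` is a hypothetical threshold, not a real job setting.
function rareCodesInBucket(historyBuckets, currentBucket, minBuckets) {
  const bucketsSeen = new Map();
  for (const bucket of historyBuckets) {
    // Presence per bucket only: repeats inside one bucket count once.
    for (const code of new Set(bucket)) {
      bucketsSeen.set(code, (bucketsSeen.get(code) || 0) + 1);
    }
  }
  return [...new Set(currentBucket)].filter(
    (code) => (bucketsSeen.get(code) || 0) < minBuckets
  );
}

// Hypothetical history of four buckets of status codes.
const history = [
  ["200", "200", "304"],
  ["200", "404"],
  ["200", "304"],
  ["200", "200", "200"],
];

console.log(rareCodesInBucket(history, ["200", "500", "404"], 2));
// [ '500', '404' ]
```

`500` never appeared before and `404` appeared in only one earlier bucket, so both are flagged, while `200` is not.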

.Example 2: Analyzing status codes in a population with the rare function
[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status",
  "over_field_name" : "clientip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `rare` function in a detector in your job, it detects values
that are rare in a population. It models the interactions between status codes
and client IPs. It defines a rare status code as one that occurs for few client
IP values compared to the population. It detects client IP values that
experience one or more distinct rare status codes compared to the population.
For example, in a web access log, a `clientip` that experiences the highest
number of different rare status codes compared to the population is regarded as
highly anomalous. This analysis is based on the number of different status code
values, not the count of occurrences.

NOTE: To define a status code as rare, the {xpackml} features look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.
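
The population behavior described above, including the point that repeats do not make a code less rare for a client, can be sketched as follows. A status code is treated as rare when at most `maxClients` distinct client IPs saw it, and each client IP is scored by the number of different rare codes it experienced. The function name, the threshold, and the sample events are hypothetical simplifications, not the actual analytics.

```javascript
// Toy sketch of `rare` over a population (simplified; not the actual model).
// A status code counts as rare when at most `maxClients` distinct client IPs
// saw it. Each client IP is scored by the number of *different* rare codes it
// saw, so repeats of the same rare code do not raise its score.
function rareScoresByClient(events, maxClients) {
  const clientsPerCode = new Map();
  for (const { clientip, status } of events) {
    if (!clientsPerCode.has(status)) clientsPerCode.set(status, new Set());
    clientsPerCode.get(status).add(clientip);
  }

  const distinctRare = new Map();
  for (const { clientip, status } of events) {
    if (!distinctRare.has(clientip)) distinctRare.set(clientip, new Set());
    if (clientsPerCode.get(status).size <= maxClients) {
      distinctRare.get(clientip).add(status);
    }
  }

  const scores = new Map();
  for (const [ip, codes] of distinctRare) scores.set(ip, codes.size);
  return scores;
}

// Hypothetical events: 10.0.0.3 hits the rare code 418 five times,
// 10.0.0.4 hits two different rare codes once each.
const events = [
  { clientip: "10.0.0.1", status: "200" },
  { clientip: "10.0.0.2", status: "200" },
  { clientip: "10.0.0.3", status: "200" },
  { clientip: "10.0.0.4", status: "200" },
  ...Array.from({ length: 5 }, () => ({ clientip: "10.0.0.3", status: "418" })),
  { clientip: "10.0.0.4", status: "500" },
  { clientip: "10.0.0.4", status: "503" },
];

const scores = rareScoresByClient(events, 1);
console.log(scores.get("10.0.0.3")); // 1
console.log(scores.get("10.0.0.4")); // 2
```

Client `10.0.0.3` experienced the rare code `418` five times but scores only 1, while `10.0.0.4` scores 2 because it experienced two different rare codes.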


[float]
[[ml-freq-rare]]
==== Freq_rare

The `freq_rare` function detects values that occur rarely for a population.
It detects anomalies according to the number of times (frequency) that rare
values occur.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (required)
* `partition_field_name` (optional)

For more information about those properties, see
{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].
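
To keep the property requirements for both rare functions straight, the following sketch encodes them as a client-side check: `by_field_name` is required for `rare` and `freq_rare`, `over_field_name` is also required for `freq_rare`, and `exclude_frequent` should not be combined with either function. The `checkRareDetector` helper is hypothetical, is not part of any Elasticsearch API, and does not replace whatever validation the job APIs perform.

```javascript
// Hypothetical client-side check; not part of any Elasticsearch API.
// Returns a list of problems found in a detector object that uses a rare
// function, based on the property rules described on this page.
function checkRareDetector(detector) {
  const fn = detector.function;
  if (fn !== "rare" && fn !== "freq_rare") {
    return ["not a rare function: " + fn];
  }
  const problems = [];
  if (!detector.by_field_name) {
    problems.push("by_field_name is required");
  }
  if (fn === "freq_rare" && !detector.over_field_name) {
    problems.push("over_field_name is required for freq_rare");
  }
  // The page advises against combining rare functions with exclude_frequent,
  // so any value for it is reported.
  if (detector.exclude_frequent !== undefined) {
    problems.push("exclude_frequent should not be used with rare functions");
  }
  return problems;
}

console.log(checkRareDetector({ function: "rare", by_field_name: "status" }));
// []
console.log(checkRareDetector({ function: "freq_rare", by_field_name: "uri" }));
// [ 'over_field_name is required for freq_rare' ]
```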

.Example 3: Analyzing URI values in a population with the freq_rare function
[source,js]
--------------------------------------------------
{
  "function" : "freq_rare",
  "by_field_name" : "uri",
  "over_field_name" : "clientip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `freq_rare` function in a detector in your job, it
detects values that are frequently rare in a population. It models the
interactions between URI paths and client IPs. It defines a rare URI path as
one that is visited by few client IP values compared to the population. It
detects the client IP values that experience many interactions with rare URI
paths compared to the population. For example, in a web access log, a client IP
that visits one or more rare URI paths many times compared to the population is
regarded as highly anomalous. This analysis is based on the count of
interactions with rare URI paths, not the number of different URI path values.

NOTE: To define a URI path as rare, the analytics consider the number of
distinct values that occur and not the number of times the URI path occurs.
If a single client IP visits a single unique URI path, this is rare, even if it
occurs for that client IP in every bucket.
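
The frequency-based population behavior can be sketched in a simplified form: a URI path is treated as rare when at most `maxClients` distinct client IPs visited it, and each client IP is scored by its total number of visits to rare paths. The function name, the threshold, and the sample events are hypothetical, and the sketch is not the actual analytics.

```javascript
// Toy sketch of `freq_rare` over a population (simplified; not the actual model).
// A URI counts as rare when at most `maxClients` distinct client IPs visited it.
// Each client IP is scored by how many times it hit rare URIs, so repeated
// visits to the same rare URI do raise its score.
function freqRareScoresByClient(events, maxClients) {
  const clientsPerUri = new Map();
  for (const { clientip, uri } of events) {
    if (!clientsPerUri.has(uri)) clientsPerUri.set(uri, new Set());
    clientsPerUri.get(uri).add(clientip);
  }

  const scores = new Map();
  for (const { clientip, uri } of events) {
    const isRare = clientsPerUri.get(uri).size <= maxClients;
    scores.set(clientip, (scores.get(clientip) || 0) + (isRare ? 1 : 0));
  }
  return scores;
}

// Hypothetical events: 10.0.0.3 visits the rare path /admin five times,
// 10.0.0.4 visits two different rare paths once each.
const events = [
  { clientip: "10.0.0.1", uri: "/" },
  { clientip: "10.0.0.2", uri: "/" },
  { clientip: "10.0.0.3", uri: "/" },
  ...Array.from({ length: 5 }, () => ({ clientip: "10.0.0.3", uri: "/admin" })),
  { clientip: "10.0.0.4", uri: "/debug" },
  { clientip: "10.0.0.4", uri: "/backup" },
];

const scores = freqRareScoresByClient(events, 1);
console.log(scores.get("10.0.0.3")); // 5
console.log(scores.get("10.0.0.4")); // 2
```

Here `10.0.0.3` scores 5 because it visited one rare path five times. Counting distinct rare paths instead, which is the `rare` view, would give it a score of 1 and give `10.0.0.4` a score of 2.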