mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-07 21:48:39 +00:00
6d6c776cd4
* [DOCS] Add ML rare functions * [DOCS] Address feedback in ML rare functions Original commit: elastic/x-pack-elasticsearch@388274557c
132 lines
5.0 KiB
Plaintext
[[ml-rare-functions]]
=== Rare Functions

The rare functions detect values that occur rarely in time or rarely for a
population.

The `rare` analysis detects anomalies according to the number of distinct rare
values. This differs from `freq_rare`, which detects anomalies according to the
number of times (frequency) rare values occur.

[NOTE]
====
* The `rare` and `freq_rare` functions should not be used in conjunction with
`exclude_frequent`.
* Shorter bucket spans (less than 1 hour, for example) are recommended when
looking for rare events. The functions model whether something happens in a
bucket at least once. With longer bucket spans, it is more likely that
entities will be seen in a bucket and therefore they appear less rare.
Picking the ideal bucket span depends on the characteristics of the data,
with shorter bucket spans typically being measured in minutes, not hours.
* To model rare data, a learning period of at least 20 buckets is required
for typical data.
====
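
As a rough illustration of the bucket span advice above, the following sketch
shows how a `rare` detector might sit inside a job's `analysis_config` with a
bucket span measured in minutes. The `15m` value and the `timestamp` field name
are assumptions chosen for the example, not recommendations from this
documentation.

[source,js]
--------------------------------------------------
{
  "analysis_config" : {
    "bucket_span" : "15m",
    "detectors" : [
      {
        "function" : "rare",
        "by_field_name" : "status"
      }
    ]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
--------------------------------------------------

With a 15-minute bucket span, a learning period of 20 buckets corresponds to
roughly 5 hours of data.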

The {xpackml} features include the following rare functions:

* <<ml-rare,`rare`>>
* <<ml-freq-rare,`freq_rare`>>


[float]
[[ml-rare]]
==== Rare

The `rare` function detects values that occur rarely in time or rarely for a
population. It detects anomalies according to the number of distinct rare values.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
detects values that are rare in time. It models status codes that occur over
time and detects when rare status codes occur compared to the past. In a web
access log, this lets you detect status codes that have never (or rarely)
occurred before.

[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status"
}
--------------------------------------------------
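
The optional `partition_field_name` property listed above can be combined with
the same detector. The following sketch assumes the data contains a `host`
field, which is a hypothetical name used only for illustration. Under that
assumption, status codes would be modeled separately for each host, so a code
that is common on one host could still be rare on another.

[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status",
  "partition_field_name" : "host"
}
--------------------------------------------------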

If you use the following function in a detector in your job, it
detects values that are rare in a population. It models status code and client
IP interactions that occur. It defines a rare status code as one that occurs for
few client IP values compared to the population. It detects client IP values
that experience one or more distinct rare status codes compared to the
population. For example, in a web access log, a `clientip` that experiences the
highest number of different rare status codes compared to the population is
regarded as highly anomalous. This analysis is based on the number of different
status code values, not the count of occurrences.

[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status",
  "over_field_name" : "clientip"
}
--------------------------------------------------

NOTE: To define a status code as rare, the {xpackml} features look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.

//TBD: Still pertinent? "Here with rare we look at the number of distinct status codes."


[float]
[[ml-freq-rare]]
==== Freq_rare

The `freq_rare` function detects values that occur rarely for a population.
It detects anomalies according to the number of times (frequency) that rare
values occur.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
detects values that are frequently rare in a population. It models URI paths and
client IP interactions that occur. It defines a rare URI path as one that is
visited by few client IP values compared to the population. It detects the
client IP values that experience many interactions with rare URI paths compared
to the population. For example, in a web access log, a client IP that visits
one or more rare URI paths many times compared to the population is regarded as
highly anomalous. This analysis is based on the count of interactions with rare
URI paths, not the number of different URI path values.

[source,js]
--------------------------------------------------
{
  "function" : "freq_rare",
  "by_field_name" : "uri",
  "over_field_name" : "clientip"
}
--------------------------------------------------

NOTE: To define a URI path as rare, the analytics consider the number of
distinct values that occur, not the number of times the URI path occurs.
If a single client IP visits a single unique URI path, this is rare, even if it
occurs for that client IP in every bucket.
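
To contrast the two analyses on the same data, a job could in principle carry
both detectors side by side. The following sketch of a `detectors` array is an
illustration only. The `detector_description` text is made up for the example,
and the `uri` and `clientip` field names are carried over from the example
above.

[source,js]
--------------------------------------------------
[
  {
    "detector_description" : "Clients that visit many distinct rare URI paths",
    "function" : "rare",
    "by_field_name" : "uri",
    "over_field_name" : "clientip"
  },
  {
    "detector_description" : "Clients that visit rare URI paths many times",
    "function" : "freq_rare",
    "by_field_name" : "uri",
    "over_field_name" : "clientip"
  }
]
--------------------------------------------------

In this sketch, the first detector would score a client by how many different
rare URI paths it touches, while the second would score it by how often it
interacts with rare URI paths.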

//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened."