[DOCS] Add ML rare functions (elastic/x-pack-elasticsearch#1351)
* [DOCS] Add ML rare functions
* [DOCS] Address feedback in ML rare functions

Original commit: elastic/x-pack-elasticsearch@388274557c
parent cf5f8e4bad
commit 6d6c776cd4
@@ -1,11 +1,6 @@
[[ml-rare-functions]]
=== Rare Functions

The {xpackml} features include the following rare functions:

* `rare`
* `freq_rare`

The rare functions detect values that occur rarely in time or rarely for a
population.

@@ -27,16 +22,110 @@ with shorter bucket spans typically being measured in minutes, not hours.
for typical data.
====

////
rare:: rare items
The {xpackml} features include the following rare functions:

freq_rare:: frequently rare items
* <<ml-rare,`rare`>>
* <<ml-freq-rare,`freq_rare`>>


[float]
[[ml-rare]]
==== Rare

The `rare` function detects values that occur rarely in time or rarely for a
population. It detects anomalies according to the number of distinct rare values.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
detects values that are rare in time. It models status codes that occur over
time and detects when rare status codes occur compared to the past. For example,
you can detect status codes in a web
access log that have never (or rarely) occurred before.

[source,js]
--------------------------------------------------
{ "function" : "min", "fieldName" : "amt", "byFieldName" : "product" }
{
  "function" : "rare",
  "by_field_name" : "status"
}
--------------------------------------------------

If you use the following function in a detector in your job, it
detects values that are rare in a population. It models status code and client
IP interactions that occur. It defines a rare status code as one that occurs for
few client IP values compared to the population. It detects client IP values
that experience one or more distinct rare status codes compared to the
population. For example, in a web access log, a `clientip` that experiences the
highest number of different rare status codes compared to the population is
regarded as highly anomalous. This analysis is based on the number of different
status code values, not the count of occurrences.

////
[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status",
  "over_field_name" : "clientip"
}
--------------------------------------------------

NOTE: To define a status code as rare, the {xpackml} features look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.
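
The distinction between distinct values and occurrence counts can be
illustrated with a toy Python sketch. It is not how the {xpackml} analytics are
implemented: it replaces their modeling with a hypothetical fixed cutoff
(`max_client_fraction`), and the function name and sample events are invented
for illustration. It scores each `clientip` by the number of distinct rare
`status` values it experiences, so repeats of the same rare status do not raise
the score.

```python
from collections import defaultdict


def distinct_rare_values(events, max_client_fraction=0.25):
    """Count distinct rare status values per client IP.

    `events` is an iterable of (clientip, status) pairs. A status is treated
    as rare when at most `max_client_fraction` of all clients have seen it.
    The cutoff is an assumption made for this sketch, not a real setting.
    """
    clients_by_status = defaultdict(set)
    statuses_by_client = defaultdict(set)
    for clientip, status in events:
        clients_by_status[status].add(clientip)
        statuses_by_client[clientip].add(status)

    total_clients = len(statuses_by_client)
    rare = {
        status
        for status, clients in clients_by_status.items()
        if len(clients) / total_clients <= max_client_fraction
    }
    # Sets ignore repeats, so only the number of different rare values counts.
    return {
        clientip: len(statuses & rare)
        for clientip, statuses in statuses_by_client.items()
    }


# Hypothetical sample data: every client sees 200, one client also sees
# 503 (three times) and 418 (once).
events = [
    ("10.0.0.1", 200), ("10.0.0.2", 200), ("10.0.0.3", 200), ("10.0.0.4", 200),
    ("10.0.0.4", 503), ("10.0.0.4", 503), ("10.0.0.4", 503),
    ("10.0.0.4", 418),
]
scores = distinct_rare_values(events)
```

In this sample, the three 503 responses contribute a single distinct rare value
for `10.0.0.4`, the same as the single 418 response.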

//TBD: Still pertinent? "Here with rare we look at the number of distinct status codes."


[float]
[[ml-freq-rare]]
==== Freq_rare

The `freq_rare` function detects values that occur rarely for a population.
It detects anomalies according to the number of times (frequency) that rare
values occur.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (required)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
detects values that are frequently rare in a population. It models URI paths and
client IP interactions that occur. It defines a rare URI path as one that is
visited by few client IP values compared to the population. It detects the
client IP values that experience many interactions with rare URI paths compared
to the population. For example, in a web access log, a client IP that visits
one or more rare URI paths many times compared to the population is regarded as
highly anomalous. This analysis is based on the count of interactions with rare
URI paths, not the number of different URI path values.

[source,js]
--------------------------------------------------
{
  "function" : "freq_rare",
  "by_field_name" : "uri",
  "over_field_name" : "clientip"
}
--------------------------------------------------

NOTE: To define a URI path as rare, the analytics consider the number of
distinct values that occur and not the number of times the URI path occurs.
If a single client IP visits a single unique URI path, this is rare, even if it
occurs for that client IP in every bucket.
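
As a rough illustration of how `freq_rare` scoring differs from `rare`, the
following Python sketch counts interactions with rare URI paths per client IP.
It is a simplification under stated assumptions, not the actual analytics: the
fixed cutoff `max_client_fraction`, the function name, and the sample events
are all hypothetical. A URI path is still classed as rare by how few clients
visit it, but the score is the number of visits to rare paths.

```python
from collections import Counter, defaultdict


def rare_value_frequency(events, max_client_fraction=0.25):
    """Count interactions with rare URI paths per client IP.

    `events` is an iterable of (clientip, uri) pairs. A URI path is treated
    as rare when at most `max_client_fraction` of all clients have visited
    it. The cutoff is an assumption made for this sketch, not a real setting.
    """
    events = list(events)  # the events are scanned twice below
    clients_by_uri = defaultdict(set)
    for clientip, uri in events:
        clients_by_uri[uri].add(clientip)

    total_clients = len({clientip for clientip, _ in events})
    rare = {
        uri
        for uri, clients in clients_by_uri.items()
        if len(clients) / total_clients <= max_client_fraction
    }
    # Every visit to a rare path counts, including repeat visits.
    hits = Counter()
    for clientip, uri in events:
        if uri in rare:
            hits[clientip] += 1
    return hits


# Hypothetical sample data: every client visits /index, one client also
# visits /admin five times.
events = [
    ("10.0.0.1", "/index"), ("10.0.0.2", "/index"),
    ("10.0.0.3", "/index"), ("10.0.0.4", "/index"),
] + [("10.0.0.4", "/admin")] * 5
hits = rare_value_frequency(events)
```

Here `10.0.0.4` visits only one rare path, yet it scores five because each
visit counts. A distinct-value count, as in `rare`, would give it one.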

//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened."