[DOCS] Add ML rare functions (elastic/x-pack-elasticsearch#1351)

* [DOCS] Add ML rare functions

* [DOCS] Address feedback in ML rare functions

Original commit: elastic/x-pack-elasticsearch@388274557c
Lisa Cawley 2017-05-17 09:34:30 -07:00 committed by GitHub
parent cf5f8e4bad
commit 6d6c776cd4
1 changed files with 99 additions and 10 deletions

@@ -1,11 +1,6 @@
[[ml-rare-functions]]
=== Rare Functions
The {xpackml} features include the following rare functions:
* `rare`
* `freq_rare`
The rare functions detect values that occur rarely in time or rarely for a
population.
@@ -27,16 +22,110 @@ with shorter bucket spans typically being measured in minutes, not hours.
for typical data.
====
////
rare:: rare items
freq_rare:: frequently rare items
The {xpackml} features include the following rare functions:
* <<ml-rare,`rare`>>
* <<ml-freq-rare,`freq_rare`>>
[float]
[[ml-rare]]
==== Rare
The `rare` function detects values that occur rarely in time or rarely for a
population. It detects anomalies according to the number of distinct rare values.
This function supports the following properties:
* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)
For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.
For example, if you use the following function in a detector in your job, it
detects values that are rare in time. It models status codes that occur over
time and detects when rare status codes occur compared to the past. In a web
access log, for instance, you can detect status codes that have never (or
rarely) occurred before.
[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status"
}
--------------------------------------------------
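
To show where such a detector sits, the following is a minimal sketch of a job
configuration body that wraps it. The five-minute `bucket_span`, the
`timestamp` time field, and the description text are assumptions made for
illustration; they do not come from the original example.

[source,js]
--------------------------------------------------
{
  "description" : "Hypothetical sketch: rare status codes over time",
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [
      {
        "function" : "rare",
        "by_field_name" : "status"
      }
    ]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
--------------------------------------------------

Because no `over_field_name` is set, each status code is compared with its own
history rather than with a population.
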
If you use the following function in a detector in your job, it
detects values that are rare in a population. It models the interactions
between status codes and client IP values. It defines a rare status code as one
that occurs for few client IP values compared to the population. It detects
client IP values that experience one or more distinct rare status codes compared
to the population. For example, in a web access log, a `clientip` that
experiences the highest number of different rare status codes compared to the
population is regarded as highly anomalous. This analysis is based on the number
of different status code values, not the count of occurrences.
////
[source,js]
--------------------------------------------------
{
"function" : "rare",
"by_field_name" : "status",
"over_field_name" : "clientip"
}
--------------------------------------------------
NOTE: To define a status code as rare, the {xpackml} features look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.
//TBD: Still pertinent? "Here with rare we look at the number of distinct status codes."
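
The optional `partition_field_name` property listed for this function can split
the population analysis into independent models. The following is a minimal
sketch, assuming the web access log also carries a `hostname` field. That field
name and the description text are hypothetical and do not appear in the
original example.

[source,js]
--------------------------------------------------
{
  "detector_description" : "Hypothetical sketch: rare status codes per client IP, split by hostname",
  "function" : "rare",
  "by_field_name" : "status",
  "over_field_name" : "clientip",
  "partition_field_name" : "hostname"
}
--------------------------------------------------

With this configuration, a status code is judged rare relative to the client IP
population within its own `hostname` partition, not relative to all traffic.
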
[float]
[[ml-freq-rare]]
==== Freq_rare
The `freq_rare` function detects values that occur rarely for a population.
It detects anomalies according to the number of times (frequency) that rare
values occur.
This function supports the following properties:
* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)
For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.
For example, if you use the following function in a detector in your job, it
detects values that are frequently rare in a population. It models the
interactions between URI paths and client IP values. It defines a rare URI path
as one that is visited by few client IP values compared to the population. It
detects the client IP values that experience many interactions with rare URI
paths compared to the population. For example, in a web access log, a client IP
that visits one or more rare URI paths many times compared to the population is
regarded as highly anomalous. This analysis is based on the count of
interactions with rare URI paths, not the number of different URI path values.
[source,js]
--------------------------------------------------
{
"function" : "freq_rare",
"by_field_name" : "uri",
"over_field_name" : "clientip"
}
--------------------------------------------------
NOTE: To define a URI path as rare, the analytics consider the number of
distinct values that occur and not the number of times the URI path occurs.
If a single client IP visits a single unique URI path, this is rare, even if it
occurs for that client IP in every bucket.
//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened."
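
To contrast the two functions, the following sketch places a `rare` detector
and a `freq_rare` detector on the same fields in one `analysis_config`. The
five-minute `bucket_span` and the description texts are assumptions made for
illustration, not values from the original examples.

[source,js]
--------------------------------------------------
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [
      {
        "detector_description" : "Hypothetical sketch: number of distinct rare URI paths per client IP",
        "function" : "rare",
        "by_field_name" : "uri",
        "over_field_name" : "clientip"
      },
      {
        "detector_description" : "Hypothetical sketch: number of visits to rare URI paths per client IP",
        "function" : "freq_rare",
        "by_field_name" : "uri",
        "over_field_name" : "clientip"
      }
    ]
  }
}
--------------------------------------------------

The first detector scores a client IP by how many different rare URI paths it
visits. The second scores a client IP by how many times it visits rare URI
paths.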