From 6d6c776cd4e6cbd05896905508f37ba7705b4044 Mon Sep 17 00:00:00 2001
From: Lisa Cawley
Date: Wed, 17 May 2017 09:34:30 -0700
Subject: [PATCH] [DOCS] Add ML rare functions (elastic/x-pack-elasticsearch#1351)

* [DOCS] Add ML rare functions

* [DOCS] Address feedback in ML rare functions

Original commit: elastic/x-pack-elasticsearch@388274557ce30865145fbbb1d859147ca9e0f974
---
 docs/en/ml/functions/rare.asciidoc | 109 ++++++++++++++++++++++++++---
 1 file changed, 99 insertions(+), 10 deletions(-)

diff --git a/docs/en/ml/functions/rare.asciidoc b/docs/en/ml/functions/rare.asciidoc
index c63673c5847..1a45f718426 100644
--- a/docs/en/ml/functions/rare.asciidoc
+++ b/docs/en/ml/functions/rare.asciidoc
@@ -1,11 +1,6 @@
 [[ml-rare-functions]]
 === Rare Functions
 
-The {xpackml} features include the following rare functions:
-
-* `rare`
-* `freq_rare`
-
 The rare functions detect values that occur rarely in time or rarely for a
 population.
 
@@ -27,16 +22,110 @@ with shorter bucket spans typically being measured in minutes, not hours.
 for typical data.
 ====
 
-////
-rare:: rare items
+The {xpackml} features include the following rare functions:
 
-freq_rare:: frequently rare items
+* <>
+* <>
 
+[float]
+[[ml-rare]]
+==== Rare
+
+The `rare` function detects values that occur rarely in time or rarely for a
+population. It detects anomalies according to the number of distinct rare values.
+
+This function supports the following properties:
+
+* `by_field_name` (required)
+* `over_field_name` (optional)
+* `partition_field_name` (optional)
+* `summary_count_field_name` (optional)
+
+For more information about those properties,
+see <>.
+
+For example, if you use the following function in a detector in your job, it
+detects values that are rare in time. It models status codes that occur over
+time and detects when rare status codes occur compared to the past. For example,
+you can detect status codes in a web
+access log that have never (or rarely) occurred before.
+
 [source,js]
 --------------------------------------------------
-{ "function" : "min", "fieldName" : "amt", "byFieldName" : "product" }
+{
+  "function" : "rare",
+  "by_field_name" : "status"
+}
 --------------------------------------------------
+If you use the following function in a detector in your job, it
+detects values that are rare in a population. It models status code and client
+IP interactions that occur. It defines a rare status code as one that occurs for
+few client IP values compared to the population. It detects client IP values
+that experience one or more distinct rare status codes compared to the
+population. For example in a web access log, a `clientip` that experiences the
+highest number of different rare status codes compared to the population is
+regarded as highly anomalous. This analysis is based on the number of different
+status code values, not the count of occurrences.
 
-////
+[source,js]
+--------------------------------------------------
+{
+  "function" : "rare",
+  "by_field_name" : "status",
+  "over_field_name" : "clientip"
+}
+--------------------------------------------------
+
+NOTE: To define a status code as rare the {xpackml} features look at the number
+of distinct status codes that occur, not the number of times the status code
+occurs. If a single client IP experiences a single unique status code, this
+is rare, even if it occurs for that client IP in every bucket.
+
+//TBD: Still pertinent? "Here with rare we look at the number of distinct status codes.""
+
+
+[float]
+[[ml-freq-rare]]
+==== Freq_rare
+
+The `freq_rare` function detects values that occur rarely for a population.
+It detects anomalies according to the number of times (frequency) that rare
+values occur.
+
+This function supports the following properties:
+
+* `by_field_name` (required)
+* `over_field_name` (optional)
+* `partition_field_name` (optional)
+* `summary_count_field_name` (optional)
+
+For more information about those properties,
+see <>.
+
+For example, if you use the following function in a detector in your job, it
+detects values that are frequently rare in a population. It models URI paths and
+client IP interactions that occur. It defines a rare URI path as one that is
+visited by few client IP values compared to the population. It detects the
+client IP values that experience many interactions with rare URI paths compared
+to the population. For example in a web access log, a client IP that visits
+one or more rare URI paths many times compared to the population is regarded as
+highly anomalous. This analysis is based on the count of interactions with rare
+URI paths, not the number of different URI path values.
+
+[source,js]
+--------------------------------------------------
+{
+  "function" : "freq_rare",
+  "by_field_name" : "uri",
+  "over_field_name" : "clientip"
+}
+--------------------------------------------------
+
+NOTE: To define a URI path as rare, the analytics consider the number of
+distinct values that occur and not the number of times the URI path occurs.
+If a single client IP visits a single unique URI path, this is rare, even if it
+occurs for that client IP in every bucket.
+
+//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened.""
 
 
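The patch distinguishes the population form of `rare` (scored on the number of distinct rare values a client IP experiences) from `freq_rare` (scored on the number of interactions a client IP has with rare values). The JavaScript below is a toy sketch of that counting difference only, under stated assumptions: it is not how X-Pack machine learning actually scores anomalies (real jobs model values probabilistically per bucket), the event objects are hypothetical, and the `maxEntities` threshold is a hypothetical stand-in for "occurs for few client IP values compared to the population".

```javascript
// Toy contrast between `rare` and `freq_rare` population analysis.
// NOT the X-Pack ML implementation. `maxEntities` is a hypothetical threshold:
// a by-field value counts as "rare" if at most that many distinct
// over-field entities (for example, client IPs) interacted with it.
function findRareValues(events, byField, overField, maxEntities) {
  const entitiesPerValue = new Map();
  for (const e of events) {
    const value = e[byField];
    if (!entitiesPerValue.has(value)) entitiesPerValue.set(value, new Set());
    entitiesPerValue.get(value).add(e[overField]);
  }
  const rare = new Set();
  for (const [value, entities] of entitiesPerValue) {
    if (entities.size <= maxEntities) rare.add(value);
  }
  return rare;
}

// `rare`-style signal: how many DISTINCT rare values each entity experienced.
function distinctRareCounts(events, byField, overField, maxEntities) {
  const rare = findRareValues(events, byField, overField, maxEntities);
  const seen = new Map();
  for (const e of events) {
    if (!rare.has(e[byField])) continue;
    if (!seen.has(e[overField])) seen.set(e[overField], new Set());
    seen.get(e[overField]).add(e[byField]);
  }
  const counts = new Map();
  for (const [entity, values] of seen) counts.set(entity, values.size);
  return counts;
}

// `freq_rare`-style signal: how many INTERACTIONS with rare values each entity had.
function rareInteractionCounts(events, byField, overField, maxEntities) {
  const rare = findRareValues(events, byField, overField, maxEntities);
  const counts = new Map();
  for (const e of events) {
    if (!rare.has(e[byField])) continue;
    counts.set(e[overField], (counts.get(e[overField]) || 0) + 1);
  }
  return counts;
}

// Hypothetical web access log events: four clients share a common URI,
// one client hits a single rare URI five times, another hits three rare URIs once each.
const events = [
  ...["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"].map((ip) => ({ clientip: ip, uri: "/index" })),
  ...Array.from({ length: 5 }, () => ({ clientip: "5.5.5.5", uri: "/admin" })),
  ...["/a", "/b", "/c"].map((uri) => ({ clientip: "6.6.6.6", uri })),
];

// Distinct rare URIs per client: 5.5.5.5 -> 1, 6.6.6.6 -> 3
console.log(Object.fromEntries(distinctRareCounts(events, "uri", "clientip", 1)));
// Interactions with rare URIs per client: 5.5.5.5 -> 5, 6.6.6.6 -> 3
console.log(Object.fromEntries(rareInteractionCounts(events, "uri", "clientip", 1)));
```

With these hypothetical events, the client that visits three different rare URIs once each ranks highest on the `rare`-style count, while the client that repeatedly visits one rare URI ranks highest on the `freq_rare`-style count. That mirrors the "This analysis is based on..." sentences in the two population examples.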