diff --git a/docs/en/ml/functions/info.asciidoc b/docs/en/ml/functions/info.asciidoc index 453b2a11b00..07b5f9f9fed 100644 --- a/docs/en/ml/functions/info.asciidoc +++ b/docs/en/ml/functions/info.asciidoc @@ -6,39 +6,32 @@ that is contained in strings within a bucket. These functions can be used as a more sophisticated method to identify incidences of data exfiltration or C2C activity, when analyzing the size in bytes of the data might not be sufficient. -If you want to monitor for unusually high amounts of information, use `high_info_content`. -If want to look at drops in information content, use `low_info_content`. - The {xpackml} features include the following information content functions: -* <> -* <> -* <> +* `info_content`, `high_info_content`, `low_info_content` [float] [[ml-info-content]] -==== Info_content +==== Info_content, High_info_content, Low_info_content The `info_content` function detects anomalies in the amount of information that is contained in strings in a bucket. -This function supports the following properties: +If you want to monitor for unusually high amounts of information, +use `high_info_content`. +If you want to look at drops in information content, use `low_info_content`. + +These functions support the following properties: * `field_name` (required) * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, it -models information that is present in the `subdomain` string. It detects -anomalies where the information content is unusual compared to the other -`highest_registered_domain` values. An anomaly could indicate an abuse of the -DNS protocol, such as malicious command and control activity. 
- +.Example 1: Analyzing subdomain strings with the info_content function [source,js] -------------------------------------------------- { @@ -48,36 +41,17 @@ DNS protocol, such as malicious command and control activity. } -------------------------------------------------- -NOTE: Both high and low values are considered anomalous. In many use cases, the -`high_info_content` function is often a more appropriate choice. +If you use this `info_content` function in a detector in your job, it models +information that is present in the `subdomain` string. It detects anomalies +where the information content is unusual compared to the other +`highest_registered_domain` values. An anomaly could indicate an abuse of the +DNS protocol, such as malicious command and control activity. -[float] -[[ml-high-info-content]] -==== High_info_content - -The `high_info_content` function detects anomalies in the amount of information -that is contained in strings in a bucket. Use this function if you want to -monitor for unusually high amounts of information. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, it -models information content that is held in the DNS query string. It detects -`src_ip` values where the information content is unusually high compared to -other `src_ip` values. This example is similar to the example for the -`info_content` function, but it reports anomalies only where the amount of -information content is higher than expected. -//TBD: Still pertinent? "This configuration identifies activity typical of DGA malware."" +NOTE: In this example, both high and low values are considered anomalous. 
+In many use cases, the `high_info_content` function is often a more appropriate +choice. +.Example 2: Analyzing query strings with the high_info_content function [source,js] -------------------------------------------------- { @@ -87,33 +61,14 @@ information content is higher than expected. } -------------------------------------------------- -[float] -[[ml-low-info-content]] -==== Low_info_content - -The `low_info_content` function detects anomalies in the amount of information -that is contained in strings in a bucket. Use this function if you want to look -at drops in information content. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, it -models information content that is present in the message string for each -`logfilename`. It detects anomalies where the information content is low compared -to its past behavior. For example, this function detects unusually low amounts -of information in a collection of rolling log files. Low information might -indicate that a process has entered an infinite loop or that logging features -have been disabled. +If you use this `high_info_content` function in a detector in your job, it +models information content that is held in the DNS query string. It detects +`src_ip` values where the information content is unusually high compared to +other `src_ip` values. This example is similar to the example for the +`info_content` function, but it reports anomalies only where the amount of +information content is higher than expected. +.Example 3: Analyzing message strings with the low_info_content function [source,js] -------------------------------------------------- { @@ -122,3 +77,11 @@ have been disabled. 
"by_field_name" : "logfilename" } -------------------------------------------------- + +If you use this `low_info_content` function in a detector in your job, it models +information content that is present in the message string for each +`logfilename`. It detects anomalies where the information content is low +compared to its past behavior. For example, this function detects unusually low +amounts of information in a collection of rolling log files. Low information +might indicate that a process has entered an infinite loop or that logging +features have been disabled. diff --git a/docs/en/ml/functions/metric.asciidoc b/docs/en/ml/functions/metric.asciidoc index d9cdfc6f6e3..203d6b44e88 100644 --- a/docs/en/ml/functions/metric.asciidoc +++ b/docs/en/ml/functions/metric.asciidoc @@ -9,16 +9,10 @@ The {xpackml} features include the following metric functions: * <> * <> -* <> -* <> -* <> -* <> -* <> -* <> +* xref:ml-metric-median[`median`, `high_median`, `low_median`] +* xref:ml-metric-mean[`mean`, `high_mean`, `low_mean`] * <> -* <> -* <> -* <> +* xref:ml-metric-varp[`varp`, `high_varp`, `low_varp`] [float] [[ml-metric-min]] @@ -35,18 +29,11 @@ This function supports the following properties: * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, -it detects where the smallest transaction is lower than previously observed. -You can use this function to detect items for sale at unintentionally low -prices due to data entry mistakes. It models the minimum amount for each -product over time. -//Detect when the minumum amount for a product is unusually low compared to its past amounts - +.Example 1: Analyzing minimum transactions with the min function [source,js] -------------------------------------------------- { @@ -56,6 +43,10 @@ product over time. 
} -------------------------------------------------- +If you use this `min` function in a detector in your job, it detects where the +smallest transaction is lower than previously observed. You can use this +function to detect items for sale at unintentionally low prices due to data +entry mistakes. It models the minimum amount for each product over time. [float] [[ml-metric-max]] @@ -72,18 +63,11 @@ This function supports the following properties: * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, -it detects where the longest `responsetime` is longer than previously observed. -You can use this function to detect applications that have `responsetime` -values that are unusually lengthy. It models the maximum `responsetime` for -each application over time and detects when the longest `responsetime` is -unusually long compared to previous applications. - +.Example 2: Analyzing maximum response times with the max function [source,js] -------------------------------------------------- { @@ -93,11 +77,14 @@ unusually long compared to previous applications. } -------------------------------------------------- -This analysis can be performed alongside `high_mean` functions by -application. By combining detectors and using the same influencer this would -detect both unusually long individual response times and average response times -for each bucket. For example: +If you use this `max` function in a detector in your job, it detects where the +longest `responsetime` is longer than previously observed. You can use this +function to detect applications that have `responsetime` values that are +unusually lengthy. 
It models the maximum `responsetime` for each application +over time and detects when the longest `responsetime` is unusually long compared +to previous applications. +.Example 3: Two detectors with max and high_mean functions [source,js] -------------------------------------------------- { @@ -112,29 +99,35 @@ for each bucket. For example: } -------------------------------------------------- +The analysis in the previous example can be performed alongside `high_mean` +functions by application. By combining detectors and using the same influencer +this job can detect both unusually long individual response times and average +response times for each bucket. + [float] [[ml-metric-median]] -==== Median +==== Median, High_median, Low_median The `median` function detects anomalies in the statistical median of a value. The median value is calculated for each bucket. -This function supports the following properties: +If you want to monitor unusually high median values, use the `high_median` +function. + +If you are just interested in unusually low median values, use the `low_median` +function. + +These functions support the following properties: * `field_name` (required) * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, -it models the median `responsetime` for each application over time. It detects -when the median `responsetime` is unusual compared to previous `responsetime` -values. - +.Example 4: Analyzing response times with the median function [source,js] -------------------------------------------------- { @@ -144,68 +137,34 @@ values. } -------------------------------------------------- -[float] -[[ml-metric-high-median]] -==== High_median - -The `high_median` function detects anomalies in the statistical median of a value. 
-The median value is calculated for each bucket. -Use this function if you want to monitor unusually high median values. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -[float] -[[ml-metric-low-median]] -==== Low_median - -The `low_median` function detects anomalies in the statistical median of a value. -The median value is calculated for each bucket. -Use this function if you are just interested in unusually low median values. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - +If you use this `median` function in a detector in your job, it models the +median `responsetime` for each application over time. It detects when the median +`responsetime` is unusual compared to previous `responsetime` values. [float] [[ml-metric-mean]] -==== Mean +==== Mean, High_mean, Low_mean The `mean` function detects anomalies in the arithmetic mean of a value. The mean value is calculated for each bucket. -This function supports the following properties: +If you want to monitor unusually high average values, use the `high_mean` +function. + +If you are just interested in unusually low average values, use the `low_mean` +function. + +These functions support the following properties: * `field_name` (required) * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. 
-For example, if you use the following function in a detector in your job, -it models the mean `responsetime` for each application over time. It detects -when the mean `responsetime` is unusual compared to previous `responsetime` -values. - +.Example 5: Analyzing response times with the mean function [source,js] -------------------------------------------------- { @@ -215,30 +174,11 @@ values. } -------------------------------------------------- -[float] -[[ml-metric-high-mean]] -==== High_mean - -The `high_mean` function detects anomalies in the arithmetic mean of a value. -The mean value is calculated for each bucket. -Use this function if you want to monitor unusually high average values. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, -it models the mean `responsetime` for each application over time. It detects -when the mean `responsetime` is unusually high compared to previous -`responsetime` values. +If you use this `mean` function in a detector in your job, it models the mean +`responsetime` for each application over time. It detects when the mean +`responsetime` is unusual compared to previous `responsetime` values. +.Example 6: Analyzing response times with the high_mean function [source,js] -------------------------------------------------- { @@ -248,30 +188,11 @@ when the mean `responsetime` is unusually high compared to previous } -------------------------------------------------- -[float] -[[ml-metric-low-mean]] -==== Low_mean - -The `low_mean` function detects anomalies in the arithmetic mean of a value. -The mean value is calculated for each bucket. -Use this function if you are just interested in unusually low average values. 
- -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, -it models the mean `responsetime` for each application over time. It detects -when the mean `responsetime` is unusually low -compared to previous `responsetime` values. +If you use this `high_mean` function in a detector in your job, it models the +mean `responsetime` for each application over time. It detects when the mean +`responsetime` is unusually high compared to previous `responsetime` values. +.Example 7: Analyzing response times with the low_mean function [source,js] -------------------------------------------------- { @@ -281,6 +202,10 @@ compared to previous `responsetime` values. } -------------------------------------------------- +If you use this `low_mean` function in a detector in your job, it models the +mean `responsetime` for each application over time. It detects when the mean +`responsetime` is unusually low compared to previous `responsetime` values. + [float] [[ml-metric-metric]] ==== Metric @@ -303,11 +228,7 @@ This function supports the following properties: For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, -it models the mean, min, and max `responsetime` for each application over time. -It detects when the mean, min, or max `responsetime` is unusual compared to -previous `responsetime` values. - +.Example 8: Analyzing response times with the metric function [source,js] -------------------------------------------------- { @@ -317,30 +238,33 @@ previous `responsetime` values. 
} -------------------------------------------------- +If you use this `metric` function in a detector in your job, it models the +mean, min, and max `responsetime` for each application over time. It detects +when the mean, min, or max `responsetime` is unusual compared to previous +`responsetime` values. [float] [[ml-metric-varp]] -==== Varp +==== Varp, High_varp, Low_varp The `varp` function detects anomalies in the variance of a value which is a measure of the variability and spread in the data. -This function supports the following properties: +If you want to monitor unusually high variance, use the `high_varp` function. + +If you are just interested in unusually low variance, use the `low_varp` function. + +These functions support the following properties: * `field_name` (required) * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, -it models models the variance in values of `responsetime` for each application -over time. It detects when the variance in `responsetime` is unusual compared -to past application behavior. - +.Example 9: Analyzing response times with the varp function [source,js] -------------------------------------------------- { @@ -350,30 +274,12 @@ to past application behavior. } -------------------------------------------------- -[float] -[[ml-metric-high-varp]] -==== High_varp - -The `high_varp` function detects anomalies in the variance of a value which is a -measure of the variability and spread in the data. Use this function if you want -to monitor unusually high variance. 
- -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, -it models models the variance in values of `responsetime` for each application -over time. It detects when the variance in `responsetime` is unusual compared -to past application behavior. +If you use this `varp` function in a detector in your job, it models the +variance in values of `responsetime` for each application over time. It detects +when the variance in `responsetime` is unusual compared to past application +behavior. +.Example 10: Analyzing response times with the high_varp function [source,js] -------------------------------------------------- { @@ -383,31 +289,12 @@ to past application behavior. } -------------------------------------------------- +If you use this `high_varp` function in a detector in your job, it models the +variance in values of `responsetime` for each application over time. It detects +when the variance in `responsetime` is unusual compared to past application +behavior. -[float] -[[ml-metric-low-varp]] -==== Low_varp - -The `low_varp` function detects anomalies in the variance of a value which is a -measure of the variability and spread in the data. Use this function if you are -just interested in unusually low variance. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, -it models models the variance in values of `responsetime` for each application -over time. 
It detects when the variance in `responsetime` is unusual compared -to past application behavior. - +.Example 11: Analyzing response times with the low_varp function [source,js] -------------------------------------------------- { @@ -416,3 +303,8 @@ to past application behavior. "by_field_name" : "application" } -------------------------------------------------- + +If you use this `low_varp` function in a detector in your job, it models the +variance in values of `responsetime` for each application over time. It detects +when the variance in `responsetime` is unusual compared to past application +behavior. diff --git a/docs/en/ml/functions/rare.asciidoc b/docs/en/ml/functions/rare.asciidoc index 1a45f718426..86cbf33ca45 100644 --- a/docs/en/ml/functions/rare.asciidoc +++ b/docs/en/ml/functions/rare.asciidoc @@ -40,17 +40,11 @@ This function supports the following properties: * `by_field_name` (required) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, it -detects values that are rare in time. It models status codes that occur over -time and detects when rare status codes occur compared to the past. For example, -you can detect status codes in a web -access log that have never (or rarely) occurred before. - +.Example 1: Analyzing status codes with the rare function [source,js] -------------------------------------------------- { @@ -59,16 +53,12 @@ access log that have never (or rarely) occurred before. } -------------------------------------------------- -If you use the following function in a detector in your job, it -detects values that are rare in a population. It models status code and client -IP interactions that occur. It defines a rare status code as one that occurs for -few client IP values compared to the population. 
It detects client IP values -that experience one or more distinct rare status codes compared to the -population. For example in a web access log, a `clientip` that experiences the -highest number of different rare status codes compared to the population is -regarded as highly anomalous. This analysis is based on the number of different -status code values, not the count of occurrences. +If you use this `rare` function in a detector in your job, it detects values +that are rare in time. It models status codes that occur over time and detects +when rare status codes occur compared to the past. For example, you can detect +status codes in a web access log that have never (or rarely) occurred before. +.Example 2: Analyzing status codes in a population with the rare function [source,js] -------------------------------------------------- { @@ -78,14 +68,21 @@ status code values, not the count of occurrences. } -------------------------------------------------- +If you use this `rare` function in a detector in your job, it detects values +that are rare in a population. It models status code and client IP interactions +that occur. It defines a rare status code as one that occurs for few client IP +values compared to the population. It detects client IP values that experience +one or more distinct rare status codes compared to the population. For example +in a web access log, a `clientip` that experiences the highest number of +different rare status codes compared to the population is regarded as highly +anomalous. This analysis is based on the number of different status code values, +not the count of occurrences. + NOTE: To define a status code as rare the {xpackml} features look at the number of distinct status codes that occur, not the number of times the status code occurs. If a single client IP experiences a single unique status code, this is rare, even if it occurs for that client IP in every bucket. -//TBD: Still pertinent? 
"Here with rare we look at the number of distinct status codes."" - - [float] [[ml-freq-rare]] ==== Freq_rare @@ -99,21 +96,11 @@ This function supports the following properties: * `by_field_name` (required) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, it -detects values that are frequently rare in a population. It models URI paths and -client IP interactions that occur. It defines a rare URI path as one that is -visited by few client IP values compared to the population. It detects the -client IP values that experience many interactions with rare URI paths compared -to the population. For example in a web access log, a client IP that visits -one or more rare URI paths many times compared to the population is regarded as -highly anomalous. This analysis is based on the count of interactions with rare -URI paths, not the number of different URI path values. - +.Example 3: Analyzing URI values in a population with the freq_rare function [source,js] -------------------------------------------------- { @@ -123,9 +110,17 @@ URI paths, not the number of different URI path values. } -------------------------------------------------- +If you use this `freq_rare` function in a detector in your job, it +detects values that are frequently rare in a population. It models URI paths and +client IP interactions that occur. It defines a rare URI path as one that is +visited by few client IP values compared to the population. It detects the +client IP values that experience many interactions with rare URI paths compared +to the population. For example in a web access log, a client IP that visits +one or more rare URI paths many times compared to the population is regarded as +highly anomalous. 
This analysis is based on the count of interactions with rare +URI paths, not the number of different URI path values. + NOTE: To define a URI path as rare, the analytics consider the number of distinct values that occur and not the number of times the URI path occurs. If a single client IP visits a single unique URI path, this is rare, even if it occurs for that client IP in every bucket. - -//TBD: Still pertinent? "Here with freq_rare we look at the number of times interactions have happened."" diff --git a/docs/en/ml/functions/sum.asciidoc b/docs/en/ml/functions/sum.asciidoc index d1cef15815e..9c3e01c701b 100644 --- a/docs/en/ml/functions/sum.asciidoc +++ b/docs/en/ml/functions/sum.asciidoc @@ -13,12 +13,8 @@ ignored; buckets with a zero value are analyzed. The {xpackml} features include the following sum functions: -* <> -* <> -* <> -* <> -* <> -* <> +* xref:ml-sum[`sum`, `high_sum`, `low_sum`] +* xref:ml-nonnull-sum[`non_null_sum`, `high_non_null_sum`, `low_non_null_sum`] //// TBD: Incorporate from prelert docs?: @@ -29,27 +25,26 @@ a more appropriate method to using the sum function. [float] [[ml-sum]] -==== Sum +==== Sum, High_sum, Low_sum The `sum` function detects anomalies where the sum of a field in a bucket is anomalous. -This function supports the following properties: +If you want to monitor unusually high sum values, use the `high_sum` function. + +If you want to monitor unusually low sum values, use the `low_sum` function. + +These functions support the following properties: * `field_name` (required) * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, it -models total expenses per employees for each cost center. For each time bucket, -it detects when an employee’s expenses are unusual for a cost center compared -to other employees. 
- +.Example 1: Analyzing total expenses with the sum function [source,js] -------------------------------------------------- { @@ -60,28 +55,12 @@ to other employees. } -------------------------------------------------- -[float] -[[ml-high-sum]] -==== High_sum - -The `high_sum` function detects anomalies where the sum of a field in a bucket -is unusually high. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -For example, if you use the following function in a detector in your job, it -models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high -volumes compared to other `cs_hosts`. +If you use this `sum` function in a detector in your job, it +models total expenses per employee for each cost center. For each time bucket, +it detects when an employee’s expenses are unusual for a cost center compared +to other employees. +.Example 2: Analyzing total bytes with the high_sum function [source,js] -------------------------------------------------- { @@ -91,42 +70,30 @@ volumes compared to other `cs_hosts`. } -------------------------------------------------- -This example looks for volumes of data transferred from a client to a server on -the internet that are unusual compared to other clients. This scenario could be -useful to detect data exfiltration or to find users that are abusing internet -privileges. - -[float] -[[ml-low-sum]] -==== Low_sum - -The `low_sum` function detects anomalies where the sum of a field in a bucket -is unusually low. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `over_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. 
+If you use this `high_sum` function in a detector in your job, it +models total `cs_bytes`. It detects `cs_hosts` that transfer unusually high +volumes compared to other `cs_hosts`. This example looks for volumes of data +transferred from a client to a server on the internet that are unusual compared +to other clients. This scenario could be useful to detect data exfiltration or +to find users that are abusing internet privileges. [float] [[ml-nonnull-sum]] -==== Non_null_sum +==== Non_null_sum, High_non_null_sum, Low_non_null_sum The `non_null_sum` function is useful if your data is sparse. Buckets without values are ignored and buckets with a zero value are analyzed. -This function supports the following properties: +If you want to monitor unusually high totals, use the `high_non_null_sum` +function. + +If you want to look at drops in totals, use the `low_non_null_sum` function. + +These functions support the following properties: * `field_name` (required) * `by_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. @@ -134,32 +101,7 @@ see <>. NOTE: Population analysis (that is to say, use of the `over_field_name` property) is not applicable for this function. -[float] -[[ml-high-nonnull-sum]] -==== High_non_null_sum - -The `high_non_null_sum` function is useful if your data is sparse. Buckets -without values are ignored and buckets with a zero value are analyzed. -Use this function if you want to monitor unusually high totals. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -NOTE: Population analysis (that is to say, use of the `over_field_name` property) -is not applicable for this function. 
- -For example, if you use the following function in a detector in your job, it -models the total `amount_approved` for each employee. It ignores any buckets -where the amount is null. It detects employees who approve unusually high -amounts compared to their past behavior. - +.Example 3: Analyzing employee approvals with the high_non_null_sum function [source,js] -------------------------------------------------- { @@ -169,26 +111,9 @@ amounts compared to their past behavior. } -------------------------------------------------- +If you use this `high_non_null_sum` function in a detector in your job, it +models the total `amount_approved` for each employee. It ignores any buckets +where the amount is null. It detects employees who approve unusually high +amounts compared to their past behavior. //For this credit control system analysis, using non_null_sum will ignore //periods where the employees are not active on the system. - -[float] -[[ml-low-nonnull-sum]] -==== Low_non_null_sum - -The `low_non_null_sum` function is useful if your data is sparse. Buckets -without values are ignored and buckets with a zero value are analyzed. -Use this function if you want to look at drops in totals. - -This function supports the following properties: - -* `field_name` (required) -* `by_field_name` (optional) -* `partition_field_name` (optional) -* `summary_count_field_name` (optional) - -For more information about those properties, -see <>. - -NOTE: Population analysis (that is to say, use of the `over_field_name` property) -is not applicable for this function. 
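For context on where these detector snippets fit, a detector object is one entry in the `detectors` array of a job's `analysis_config`. The following request is a minimal sketch of a complete job that uses the `high_non_null_sum` detector from Example 3; the job ID, bucket span, and time field are illustrative, while `amount_approved` and `employee` echo the example above:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/approvals
{
  "analysis_config": {
    "bucket_span": "1d",
    "detectors": [
      {
        "function": "high_non_null_sum",
        "field_name": "amount_approved",
        "by_field_name": "employee"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------

Because the `non_null_sum` variants ignore buckets with no values, periods when an employee is not active on the system do not drag down the modeled totals.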
diff --git a/docs/en/ml/functions/time.asciidoc b/docs/en/ml/functions/time.asciidoc index 38781275d3f..e17301eb6f5 100644 --- a/docs/en/ml/functions/time.asciidoc +++ b/docs/en/ml/functions/time.asciidoc @@ -47,16 +47,11 @@ This function supports the following properties: * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, it -models when events occur throughout a day for each process. It detects when an -event occurs for a process that is at an unusual time in the day compared to -its past behavior. - +.Example 1: Analyzing events with the time_of_day function [source,js] -------------------------------------------------- { @@ -65,6 +60,10 @@ its past behavior. } -------------------------------------------------- +If you use this `time_of_day` function in a detector in your job, it +models when events occur throughout the day for each process. It detects +when an event occurs at an unusual time of day for a process compared to +its past behavior. [float] [[ml-time-of-week]] @@ -78,17 +77,11 @@ This function supports the following properties: * `by_field_name` (optional) * `over_field_name` (optional) * `partition_field_name` (optional) -* `summary_count_field_name` (optional) For more information about those properties, see <>. -For example, if you use the following function in a detector in your job, it -models when events occur throughout the week for each `eventcode`. It detects -when a workstation event occurs at an unusual time during the week for that -`eventcode` compared to other workstations. It detects events for a -particular workstation that are outside the normal usage pattern. 
- +.Example 2: Analyzing events with the time_of_week function [source,js] -------------------------------------------------- { @@ -97,3 +90,9 @@ particular workstation that are outside the normal usage pattern. "over_field_name" : "workstation" } -------------------------------------------------- + +If you use this `time_of_week` function in a detector in your job, it +models when events occur throughout the week for each `eventcode`. It detects +when a workstation event occurs at an unusual time during the week for that +`eventcode` compared to other workstations. In other words, it finds events +for a particular workstation that fall outside the normal usage pattern.
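As a sketch of how the `time_of_week` detector from Example 2 fits into a complete job, the following request embeds it in `analysis_config`. Note that the time functions model the event timestamp itself, so the detector takes no `field_name`; the job ID, bucket span, and time field here are illustrative:

[source,js]
--------------------------------------------------
PUT _xpack/ml/anomaly_detectors/weekly_events
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "time_of_week",
        "by_field_name": "eventcode",
        "over_field_name": "workstation"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
--------------------------------------------------

Because `over_field_name` is set, this is a population analysis: each workstation's event timing is compared against the behavior of all workstations for the same `eventcode`.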