[[ml-info-functions]]
= Information Content Functions

The information content functions detect anomalies in the amount of information
that is contained in strings within a bucket. These functions offer a more
sophisticated way to identify incidents of data exfiltration or command and
control (C2) activity when analyzing the size of the data in bytes might not be
sufficient.

The {ml-features} include the following information content functions:

* `info_content`, `high_info_content`, `low_info_content`

[discrete]
[[ml-info-content]]
== Info_content, High_info_content, Low_info_content

The `info_content` function detects anomalies in the amount of information that
is contained in strings in a bucket.

If you want to monitor for unusually high amounts of information,
use `high_info_content`.
If you want to look at drops in information content, use `low_info_content`.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about these properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

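As a rough illustration of how the optional properties combine, the following
detector sketch analyzes each `src_ip` relative to the population of other
`src_ip` values and models each partition independently. The field names
`query`, `src_ip`, and `dns_server` are assumptions chosen for this sketch;
substitute fields that exist in your own data.

[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip",
  "partition_field_name" : "dns_server"
}
--------------------------------------------------
// NOTCONSOLE
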
.Example 1: Analyzing subdomain strings with the info_content function
[source,js]
--------------------------------------------------
{
  "function" : "info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `info_content` function in a detector in your {anomaly-job}, it
models information that is present in the `subdomain` string. It detects
anomalies where the information content is unusual compared to the other
`highest_registered_domain` values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.

NOTE: In this example, both high and low values are considered anomalous.
In many use cases, the `high_info_content` function is a more appropriate
choice.

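For context, a detector is defined inside the `analysis_config.detectors` array
of an {anomaly-job}. The following is a minimal sketch of a job body that uses
the `high_info_content` variant of the detector from Example 1. The `15m`
bucket span and the `timestamp` time field are assumptions made for this
sketch, not recommendations; choose values that match your data.

[source,js]
--------------------------------------------------
{
  "analysis_config" : {
    "bucket_span" : "15m",
    "detectors" : [
      {
        "function" : "high_info_content",
        "field_name" : "subdomain",
        "over_field_name" : "highest_registered_domain"
      }
    ]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
--------------------------------------------------
// NOTCONSOLE
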
.Example 2: Analyzing query strings with the high_info_content function
[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `high_info_content` function in a detector in your {anomaly-job},
it models information content that is held in the DNS query string. It detects
`src_ip` values where the information content is unusually high compared to
other `src_ip` values. This example is similar to the example for the
`info_content` function, but it reports anomalies only where the amount of
information content is higher than expected.

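To build intuition for why encoded payloads stand out, the following JavaScript
sketch totals a simple information estimate for the `query` strings of each
`src_ip` in one bucket. It is an illustration only: character-level Shannon
entropy is used here as a stand-in, and this page does not state that the
{ml-features} measure information content this way. The sample events and the
helper names `shannonBits`, `infoBySource`, and `highestSource` are
hypothetical.

[source,js]
--------------------------------------------------
// Illustration only: character-level Shannon entropy stands in for
// "information content". This is an assumption made for this sketch,
// not the measure used by the info_content functions.
function shannonBits(text) {
  const counts = new Map();
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  const n = [...text].length;
  let bitsPerChar = 0;
  for (const c of counts.values()) {
    const p = c / n;
    bitsPerChar -= p * Math.log2(p);
  }
  // Total estimated bits for the whole string.
  return bitsPerChar * n;
}

// Hypothetical DNS events that fall into a single bucket.
const bucket = [
  { src_ip: "10.0.0.1", query: "www.example.com" },
  { src_ip: "10.0.0.1", query: "mail.example.com" },
  { src_ip: "10.0.0.2", query: "www.example.com" },
  { src_ip: "10.0.0.3", query: "a9f3k2x7q1b8z4m6.example.com" },
  { src_ip: "10.0.0.3", query: "p0w5e2r8t1y6u3i9.example.com" }
];

// Sum the estimated bits of every query string per source IP.
function infoBySource(events) {
  const totals = new Map();
  for (const { src_ip, query } of events) {
    totals.set(src_ip, (totals.get(src_ip) || 0) + shannonBits(query));
  }
  return totals;
}

// Return the source IP with the largest total, or null for an empty map.
function highestSource(totals) {
  let best = null;
  for (const [ip, bits] of totals) {
    if (best === null || bits > best.bits) {
      best = { ip, bits };
    }
  }
  return best;
}

console.log(highestSource(infoBySource(bucket)).ip);
--------------------------------------------------
// NOTCONSOLE

Run with Node.js, the sketch prints `10.0.0.3`, the source whose queries carry
long, random-looking labels. A real detector compares each `src_ip` against
learned population behavior rather than picking a single maximum.
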
.Example 3: Analyzing message strings with the low_info_content function
[source,js]
--------------------------------------------------
{
  "function" : "low_info_content",
  "field_name" : "message",
  "by_field_name" : "logfilename"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `low_info_content` function in a detector in your {anomaly-job},
it models information content that is present in the `message` string for each
`logfilename`. It detects anomalies where the information content is low
compared to its past behavior. For example, this function detects unusually low
amounts of information in a collection of rolling log files. Low information
might indicate that a process has entered an infinite loop or that logging
features have been disabled.