
[[ml-info-functions]]
=== Information Content Functions

The information content functions detect anomalies in the amount of information
that is contained in strings within a bucket. These functions offer a more
sophisticated way to identify incidents of data exfiltration or command and
control (C2) activity when analyzing the size of the data in bytes might not be
sufficient.

If you want to monitor for unusually high amounts of information, use `high_info_content`.
If you want to look at drops in information content, use `low_info_content`.
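
To see why a byte count alone can miss this kind of activity, the following
sketch compares two sets of strings that have the same raw size. It uses the
zlib-compressed length of the distinct strings as a stand-in for information
content. That proxy, and the names `info_content_proxy`, `raw_size`,
`repetitive`, and `encoded`, are assumptions made for illustration; they do not
describe how these functions compute information content internally.

```python
import random
import zlib


def info_content_proxy(strings):
    """Compressed size, in bytes, of the distinct strings in one bucket.

    Assumption: compressed length is only a stand-in for information content.
    """
    payload = "\n".join(sorted(set(strings))).encode("utf-8")
    return len(zlib.compress(payload, 9))


def raw_size(strings):
    """Total size in bytes of all strings, the naive measure."""
    return sum(len(s.encode("utf-8")) for s in strings)


rng = random.Random(0)  # fixed seed so the run is repeatable
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

# Hypothetical bucket contents: 50 subdomain labels of 32 characters each.
repetitive = ["www" + "0" * 29] * 50
encoded = ["".join(rng.choice(alphabet) for _ in range(32)) for _ in range(50)]

# Same raw size, very different compressed size.
print(raw_size(repetitive), info_content_proxy(repetitive))
print(raw_size(encoded), info_content_proxy(encoded))
```

Both sets have the same raw size, yet the second one compresses far less. A
byte count cannot show that difference, but a measure of information content can.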

The {xpackml} features include the following information content functions:

* <<ml-info-content,`info_content`>>
* <<ml-high-info-content,`high_info_content`>>
* <<ml-low-info-content,`low_info_content`>>

[float]
[[ml-info-content]]
==== Info_content

The `info_content` function detects anomalies in the amount of information that
is contained in strings in a bucket.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
models information that is present in the `subdomain` string. It detects
anomalies where the information content is unusual compared to the other
`highest_registered_domain` values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.

[source,js]
--------------------------------------------------
{
  "function" : "info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------

NOTE: Both high and low values are considered anomalous. In many use cases, the
`high_info_content` function is a more appropriate choice.
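
The difference between the two-sided and one-sided variants can be sketched with
a toy population comparison. The scoring rule below (distance from the
population median in units of median absolute deviation), the `threshold`
value, and the sample numbers are all hypothetical. They illustrate the idea of
comparing each `over_field_name` value with the others and are not the model
that the detector actually uses.

```python
from statistics import median


def population_anomalies(info_by_entity, side="both", threshold=3.0):
    """Return entities whose value is far from the population median.

    side: "both", "high", or "low".
    threshold: distance from the median in units of MAD (an arbitrary choice).
    """
    if side not in ("both", "high", "low"):
        raise ValueError("side must be 'both', 'high', or 'low'")
    values = list(info_by_entity.values())
    center = median(values)
    # Median absolute deviation; fall back to 1.0 if the MAD is zero.
    mad = median(abs(v - center) for v in values) or 1.0
    flagged = []
    for entity, value in info_by_entity.items():
        score = (value - center) / mad
        is_high = score > threshold
        is_low = score < -threshold
        if (is_high and side != "low") or (is_low and side != "high"):
            flagged.append(entity)
    return sorted(flagged)


# Hypothetical information content per highest_registered_domain in one bucket.
info_by_domain = {
    "example.com": 410,
    "example.org": 395,
    "example.net": 402,
    "example.edu": 420,
    "tunnel.example": 2900,
    "quiet.example": 12,
}

print(population_anomalies(info_by_domain, side="both"))
print(population_anomalies(info_by_domain, side="high"))
print(population_anomalies(info_by_domain, side="low"))
```

With `side="both"` the sketch flags the unusually high and the unusually low
domain, which mirrors `info_content`. The `"high"` and `"low"` settings mirror
`high_info_content` and `low_info_content`.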

[float]
[[ml-high-info-content]]
==== High_info_content

The `high_info_content` function detects anomalies in the amount of information
that is contained in strings in a bucket. Use this function if you want to
monitor for unusually high amounts of information.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
models information content that is held in the DNS query string. It detects
`src_ip` values where the information content is unusually high compared to
other `src_ip` values. This example is similar to the example for the
`info_content` function, but it reports anomalies only where the amount of
information content is higher than expected.
//TBD: Still pertinent? "This configuration identifies activity typical of DGA malware."

[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip"
}
--------------------------------------------------

[float]
[[ml-low-info-content]]
==== Low_info_content

The `low_info_content` function detects anomalies in the amount of information
that is contained in strings in a bucket. Use this function if you want to look
at drops in information content.

This function supports the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)
* `summary_count_field_name` (optional)

For more information about those properties,
see <<ml-detectorconfig,Detector Configuration Objects>>.

For example, if you use the following function in a detector in your job, it
models information content that is present in the `message` string for each
`logfilename`. It detects anomalies where the information content is low compared
to the past behavior of that `logfilename`, such as unusually low amounts
of information in a collection of rolling log files. Low information might
indicate that a process has entered an infinite loop or that logging features
have been disabled.

[source,js]
--------------------------------------------------
{
  "function" : "low_info_content",
  "field_name" : "message",
  "by_field_name" : "logfilename"
}
--------------------------------------------------
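
The per-`logfilename` comparison against past behavior can be sketched as
follows. This is a simplified illustration under several assumptions: the
compressed length of the distinct messages stands in for information content,
a bucket is flagged when it falls below half the mean of that file's earlier
buckets, and the helper names, the `ratio` value, and the sample events are all
hypothetical. The real detector models each series statistically rather than
with a fixed ratio.

```python
import zlib
from collections import defaultdict


def info_content_proxy(strings):
    """Compressed size of the distinct strings; a stand-in, not the real measure."""
    payload = "\n".join(sorted(set(strings))).encode("utf-8")
    return len(zlib.compress(payload, 9))


def bucket_info_content(events, bucket_span):
    """Group (epoch_seconds, logfilename, message) events into time buckets.

    Returns {logfilename: [(bucket_start, info_content), ...]} in time order.
    """
    grouped = defaultdict(list)
    for ts, logfilename, message in events:
        bucket_start = ts - ts % bucket_span
        grouped[(logfilename, bucket_start)].append(message)
    series = defaultdict(list)
    for logfilename, bucket_start in sorted(grouped):
        value = info_content_proxy(grouped[(logfilename, bucket_start)])
        series[logfilename].append((bucket_start, value))
    return series


def low_buckets(series, ratio=0.5):
    """Flag buckets below ratio * mean of the same series' earlier buckets."""
    flagged = []
    for logfilename, points in series.items():
        history = []
        for bucket_start, value in points:
            if history and value < ratio * (sum(history) / len(history)):
                flagged.append((logfilename, bucket_start))
            history.append(value)
    return flagged


# Hypothetical events: two log files, four 300-second buckets, 20 lines each.
events = []
for start in (0, 300, 600, 900):
    for i in range(20):
        ts = start + i * 10
        events.append((ts, "db.log", f"query {ts} returned {(i * 53) % 97} rows"))
        if start < 900:
            events.append((ts, "app.log", f"request {ts} handled in {(i * 37) % 101} ms"))
        else:
            # Simulated stuck process: the same line repeated over and over.
            events.append((ts, "app.log", "retrying connection"))

series = bucket_info_content(events, bucket_span=300)
print(low_buckets(series))
```

Only the last `app.log` bucket is flagged, because its messages carry far less
information than that file's own history. `db.log` is judged against its own
baseline and stays normal.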