[role="xpack"]
[[ml-configuring-detector-custom-rules]]
=== Customizing detectors with custom rules

<<ml-rules,Custom rules>> enable you to change the behavior of anomaly
detectors based on domain-specific knowledge.

Custom rules describe _when_ a detector should take a certain _action_ instead
of following its default behavior. To specify the _when_, a rule uses
`scope` and `conditions`. You can think of `scope` as the categorical
specification of a rule, while `conditions` are the numerical part.
A rule can have a scope, one or more conditions, or a combination of
scope and conditions. For the full list of specification details, see the
{ref}/ml-put-job.html#put-customrules[`custom_rules` object] in the create
{anomaly-jobs} API.

[[ml-custom-rules-scope]]
==== Specifying custom rule scope

Let us assume we are configuring an {anomaly-job} in order to detect DNS data
exfiltration. Our data contain the fields "subdomain" and
"highest_registered_domain". We can use a detector that looks like
`high_info_content(subdomain) over highest_registered_domain`. If we run such a
job, it is possible that we discover a lot of anomalies on frequently used
domains that we have reason to trust. As security analysts, we are not
interested in such anomalies. Ideally, we could instruct the detector to skip
results for domains that we consider safe. Using a rule with a scope allows us
to achieve this.

First, we need to create a list of our safe domains. Those lists are called
_filters_ in {ml}. Filters can be shared across {anomaly-jobs}.

You can create a filter in **Anomaly Detection > Settings > Filter Lists** in
{kib} or by using the {ref}/ml-put-filter.html[put filter API]:

[source,console]
----------------------------------
PUT _ml/filters/safe_domains
{
  "description": "Our list of safe domains",
  "items": ["safe.com", "trusted.com"]
}
----------------------------------
// TEST[skip:needs-licence]

Now, we can create our {anomaly-job} specifying a scope that uses the
`safe_domains` filter for the `highest_registered_domain` field:

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/dns_exfiltration_with_rule
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"high_info_content",
      "field_name": "subdomain",
      "over_field_name": "highest_registered_domain",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "highest_registered_domain": {
            "filter_id": "safe_domains",
            "filter_type": "include"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

As time advances and we see more data and more results, we might encounter new
domains that we want to add to the filter. We can do that in
**Anomaly Detection > Settings > Filter Lists** in {kib} or by using the
{ref}/ml-update-filter.html[update filter API]:

[source,console]
----------------------------------
POST _ml/filters/safe_domains/_update
{
  "add_items": ["another-safe.com"]
}
----------------------------------
// TEST[skip:setup:ml_filter_safe_domains]

Note that we can use any of the `partition_field_name`, `over_field_name`, or
`by_field_name` fields in the `scope`.

In the following example we scope multiple fields:

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/scoping_multiple_fields
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"count",
      "partition_field_name": "my_partition",
      "over_field_name": "my_over",
      "by_field_name": "my_by",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "my_partition": {
            "filter_id": "filter_1"
          },
          "my_over": {
            "filter_id": "filter_2"
          },
          "my_by": {
            "filter_id": "filter_3"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

Such a detector will skip results when the values of all three scoped fields
are included in the referenced filters.

[[ml-custom-rules-conditions]]
==== Specifying custom rule conditions

Imagine a detector that looks for anomalies in CPU utilization.
Given a machine that is idle for long enough, small movements in CPU
utilization could result in anomalous results where the `actual` value is
quite small, for example, 0.02. Given our knowledge about how CPU utilization
behaves, we might determine that anomalies with such small actual values are
not interesting for investigation.

Let us now configure an {anomaly-job} with a rule that will skip results where
CPU utilization is less than 0.20.

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/cpu_with_rule
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"high_mean",
      "field_name": "cpu_utilization",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 0.20
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

When there are multiple conditions, they are combined with a logical `and`.
This is useful when we want the rule to apply to a range. We simply create
a rule with two conditions, one for each end of the desired range.

Here is an example where a count detector will skip results when the count
is greater than 30 and less than 50:

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/rule_with_range
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"count",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "gt",
            "value": 30
          },
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 50
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

[[ml-custom-rules-lifecycle]]
==== Custom rules in the lifecycle of a job

Custom rules only affect results created after the rules were applied.
Let us imagine that we have configured an {anomaly-job} and it has been running
for some time. After observing its results, we decide that we can employ
rules to get rid of some uninteresting results. We can use
the {ref}/ml-update-job.html[update {anomaly-job} API] to do so. However, the
rule we added will only be in effect for results created from the moment we
added the rule onwards. Past results remain unaffected.
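
As a sketch, adding the earlier CPU condition to a running job could look like
the following. The job name `typical_cpu` is hypothetical; it stands for a job
with a single `high_mean` detector that was created without rules, and
`detector_index` `0` refers to that detector:

[source,console]
----------------------------------
POST _ml/anomaly_detectors/typical_cpu/_update
{
  "detectors": [{
    "detector_index": 0,
    "custom_rules": [{
      "actions": ["skip_result"],
      "conditions": [
        {
          "applies_to": "actual",
          "operator": "lt",
          "value": 0.20
        }
      ]
    }]
  }]
}
----------------------------------
// TEST[skip:needs-licence]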

[[ml-custom-rules-filtering]]
==== Using custom rules vs. filtering data

It might appear that using rules is just another way of filtering the data
that feeds into an {anomaly-job}. For example, a rule that skips results when
the partition field value is in a filter sounds equivalent to having a query
that filters out such documents. But it is not. There is a fundamental
difference. When the data is filtered before reaching a job, it is as if the
data never existed for the job. With rules, the data still reaches the job and
affects its behavior (depending on the rule actions).

For example, a rule with the `skip_result` action means all data will still
be modeled. On the other hand, a rule with the `skip_model_update` action means
results will still be created even though the model will not be updated by
data matched by the rule.
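
The two actions can also be combined in a single rule. As a sketch (a
`custom_rules` fragment, not a complete request), a rule that both suppresses
results and keeps the matching data out of the model could look like:

[source,js]
----------------------------------
"custom_rules": [{
  "actions": ["skip_result", "skip_model_update"],
  "conditions": [
    {
      "applies_to": "actual",
      "operator": "lt",
      "value": 0.20
    }
  ]
}]
----------------------------------
// NOTCONSOLE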