[DOCS] Add ML categorization of messages (elastic/x-pack-elasticsearch#1666)
* [DOCS] Add ML categorization of messages
* [DOCS] Describe ML categorization_examples_limit property
* [DOCS] Updated ML categorization of messages
* [DOCS] Add links to ML categorization

Original commit: elastic/x-pack-elasticsearch@6403f6ce84
parent 29811ea1d8
commit 62ee1bc635
@ -0,0 +1,87 @@
[[ml-configuring-categories]]
=== Categorizing log messages

Application log events are often unstructured and contain variable data. For
example:
//Obtained from it_ops_new_app_logs.json
[source,js]
----------------------------------
{"time":1454516381000,"message":"org.jdbi.v2.exceptions.UnableToExecuteStatementException: com.mysql.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request [statement:\"SELECT id, customer_id, name, force_disabled, enabled FROM customers\"]","type":"logs"}
----------------------------------
//NOTCONSOLE

You can use {ml} to observe the static parts of the message, cluster similar
messages together, and classify them into message categories. The {ml} model
learns what volume and pattern is normal for each category over time. You can
then detect anomalies and surface rare events or unusual types of messages by
using count or rare functions. For example:

//Obtained from it_ops_new_app_logs.sh
[source,js]
----------------------------------
PUT _xpack/ml/anomaly_detectors/it_ops_new_logs
{
  "description" : "IT Ops Application Logs",
  "analysis_config" : {
    "categorization_field_name": "message", <1>
    "bucket_span":"30m",
    "detectors" :[{
      "function":"count",
      "by_field_name": "mlcategory", <2>
      "detector_description": "Unusual message counts"
    }],
    "categorization_filters":[ "\\[statement:.*\\]"]
  },
  "analysis_limits":{
    "categorization_examples_limit": 5
  },
  "data_description" : {
    "time_field":"time",
    "time_format": "epoch_ms"
  }
}
----------------------------------
//CONSOLE
<1> The `categorization_field_name` property indicates which field will be
categorized.
<2> The resulting categories can be used in a detector by setting `by_field_name`,
`over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
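
The `mlcategory` keyword can also be paired with the `rare` function mentioned earlier. As a minimal sketch, the following Python snippet builds a job body that mirrors the example above but uses a `rare` detector. The property names come from the example; the description strings and the helper name `build_rare_category_job` are illustrative assumptions.

```python
import json


def build_rare_category_job(categorization_field="message", bucket_span="30m"):
    """Build a job body that looks for rarely seen message categories.

    Property names are taken from the example above; the description
    strings are hypothetical placeholders.
    """
    return {
        "description": "IT Ops Application Logs (rare categories)",  # hypothetical
        "analysis_config": {
            "categorization_field_name": categorization_field,
            "bucket_span": bucket_span,
            "detectors": [{
                "function": "rare",
                "by_field_name": "mlcategory",
                "detector_description": "Rare message categories",  # hypothetical
            }],
        },
        "data_description": {"time_field": "time", "time_format": "epoch_ms"},
    }


if __name__ == "__main__":
    # Print the body so that it can be pasted after a PUT request line.
    print(json.dumps(build_rare_category_job(), indent=2))
```

The printed JSON could be sent as the body of a `PUT _xpack/ml/anomaly_detectors/<job_id>` request, in the same way as the example above.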

The optional `categorization_examples_limit` property specifies the
maximum number of examples that are stored in memory and in the results data
store for each category. The default value is `4`. Note that this setting does
not affect the categorization; it just affects the list of visible examples. If
you increase this value, more examples are available, but you must have more
storage available. If you set this value to `0`, no examples are stored.

The optional `categorization_filters` property can contain an array of regular
expressions. If a categorization field value matches the regular expression, the
portion of the field that is matched is not taken into consideration when
defining categories. The categorization filters are applied in the order they
are listed in the job configuration, which allows you to disregard multiple
sections of the categorization field value. In this example, the detailed SQL
should not influence message categorization, so the filter removes the SQL
statement before the message text reaches the categorization
algorithm.
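
To make the effect of a categorization filter concrete, here is a rough Python sketch that strips each filter match from a message in the order the filters are listed. It only approximates the behavior described above: it assumes Python's `re` syntax is close enough to the regular expression dialect that {ml} uses, and the helper name `apply_categorization_filters` is hypothetical.

```python
import re


def apply_categorization_filters(message, filters):
    """Remove every filter match from the message, in the order listed."""
    for pattern in filters:
        message = re.sub(pattern, "", message)
    return message


# The sample log message and the filter from the job example above.
message = (
    "org.jdbi.v2.exceptions.UnableToExecuteStatementException: "
    "com.mysql.jdbc.exceptions.MySQLTimeoutException: "
    "Statement cancelled due to timeout or client request "
    '[statement:"SELECT id, customer_id, name, force_disabled, enabled FROM customers"]'
)
filters = [r"\[statement:.*\]"]

filtered = apply_categorization_filters(message, filters)
print(filtered)
```

Only the static exception text remains after filtering, which is the part of the message that should drive the category definition.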

If your data is stored in {es}, you can create an advanced job with these same
properties:

[role="screenshot"]
image::images/ml-category-advanced.jpg["Advanced job configuration options related to categorization"]

NOTE: To add the `categorization_examples_limit` property, you must use the
**Edit JSON** tab and copy the `analysis_limits` object from the API example.


After you open the job and start the {dfeed} or supply data to the job, you can
view the results in {kib}. For example:

[role="screenshot"]
image::images/ml-category-anomalies.jpg["Categorization example in the Anomaly Explorer"]

For this type of job, the **Anomaly Explorer** contains extra information for
each anomaly: the name of the category (for example, `mlcategory 11`) and
examples of the messages in that category. In this case, you can use these
details to investigate occurrences of unusually high message counts for specific
message categories.
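
The definition of a specific category can also be retrieved through the get categories API that this commit links to. The following Python sketch builds the request path for that call. It assumes that the number shown in a label such as `mlcategory 11` is the category identifier, and the helper name `category_results_path` is hypothetical.

```python
from urllib.parse import quote


def category_results_path(job_id, category_id):
    """Return the path for GET .../results/categories/<category_id>."""
    return "_xpack/ml/anomaly_detectors/{}/results/categories/{}".format(
        quote(job_id, safe=""),  # escape characters that are unsafe in a URL path
        int(category_id),        # category identifiers are treated as integers here
    )


if __name__ == "__main__":
    print("GET " + category_results_path("it_ops_new_logs", 11))
```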
@ -29,5 +29,7 @@ The scenarios in this section describe some best practices for generating useful
{ml} results and insights from your data.

* <<ml-configuring-aggregation>>
* <<ml-configuring-categories>>

include::aggregations.asciidoc[]
include::categories.asciidoc[]
Binary file not shown. Size: 118 KiB
Binary file not shown. Size: 347 KiB
@ -12,7 +12,9 @@ categories.

`GET _xpack/ml/anomaly_detectors/<job_id>/results/categories/<category_id>`

//===== Description
==== Description

For more information about categories, see <<ml-configuring-categories>>.

==== Path Parameters
@ -85,6 +85,7 @@ An analysis configuration object has the following properties:
(string) If not null, the values of the specified field will be categorized.
The resulting categories can be used in a detector by setting `by_field_name`,
`over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
For more information, see <<ml-configuring-categories>>.

`categorization_filters`::
(array of strings) If `categorization_field_name` is specified,
@ -93,7 +94,8 @@ An analysis configuration object has the following properties:
off the categorization field values. This functionality is useful to fine tune
categorization by excluding sequences that should not be taken into
consideration for defining categories. For example, you can exclude SQL
statements that appear in your log files.
statements that appear in your log files. For more information,
see <<ml-configuring-categories>>.

`detectors`::
(array) An array of detector configuration objects,
@ -263,6 +265,7 @@ The `analysis_limits` object has the following properties:
If you set this value to `0`, no examples are stored. +

NOTE: The `categorization_examples_limit` only applies to analysis that uses categorization.
For more information, see <<ml-configuring-categories>>.

`model_memory_limit`::
(long) The approximate maximum amount of memory resources that are required
@ -3,7 +3,7 @@
=== Results Resources

Several different result types are created for each job. You can query anomaly
results for _buckets_, _influencers_ and _records_ by using the results API.
results for _buckets_, _influencers_, and _records_ by using the results API.

Results are written for each `bucket_span`. The timestamp for the results is the
start of the bucket time interval.
@ -31,11 +31,11 @@ indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was
entity too, you can drill through to the record results in order to investigate
the anomalous behavior.

//TBD Add links to categorization
Categorization results contain the definitions of _categories_ that have been
identified. These are only applicable for jobs that are configured to analyze
unstructured log data using categorization. These results do not contain a
timestamp or any calculated scores.
timestamp or any calculated scores. For more information,
see <<ml-configuring-categories>>.

* <<ml-results-buckets,Buckets>>
* <<ml-results-influencers,Influencers>>