diff --git a/docs/en/ml/categories.asciidoc b/docs/en/ml/categories.asciidoc
new file mode 100644
index 00000000000..f28a0885268
--- /dev/null
+++ b/docs/en/ml/categories.asciidoc
@@ -0,0 +1,87 @@
+[[ml-configuring-categories]]
+=== Categorizing log messages
+
+Application log events are often unstructured and contain variable data. For
+example:
+//Obtained from it_ops_new_app_logs.json
+[source,js]
+----------------------------------
+{"time":1454516381000,"message":"org.jdbi.v2.exceptions.UnableToExecuteStatementException: com.mysql.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request [statement:\"SELECT id, customer_id, name, force_disabled, enabled FROM customers\"]","type":"logs"}
+----------------------------------
+//NOTCONSOLE
+
+You can use {ml} to observe the static parts of the message, cluster similar
+messages together, and classify them into message categories. The {ml} model
+learns what volume and pattern is normal for each category over time. You can
+then detect anomalies and surface rare events or unusual types of messages by
+using count or rare functions. For example:
+
+//Obtained from it_ops_new_app_logs.sh
+[source,js]
+----------------------------------
+PUT _xpack/ml/anomaly_detectors/it_ops_new_logs
+{
+  "description" : "IT Ops Application Logs",
+  "analysis_config" : {
+    "categorization_field_name": "message", <1>
+    "bucket_span":"30m",
+    "detectors" :[{
+      "function":"count",
+      "by_field_name": "mlcategory", <2>
+      "detector_description": "Unusual message counts"
+    }],
+    "categorization_filters":[ "\\[statement:.*\\]"]
+  },
+  "analysis_limits":{
+    "categorization_examples_limit": 5
+  },
+  "data_description" : {
+    "time_field":"time",
+    "time_format": "epoch_ms"
+  }
+}
+----------------------------------
+//CONSOLE
+<1> The `categorization_field_name` property indicates which field will be
+categorized.
+<2> The resulting categories can be used in a detector by setting `by_field_name`,
+`over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
+
+The optional `categorization_examples_limit` property specifies the maximum
+number of examples that are stored in memory and in the results data store for
+each category. The default value is `4`. Note that this setting does not affect
+the categorization; it only affects the list of visible examples. If you
+increase this value, more examples are available, but you must have more
+storage available. If you set this value to `0`, no examples are stored.
+
+The optional `categorization_filters` property can contain an array of regular
+expressions. If a categorization field value matches a regular expression, the
+matched portion of the field is not taken into consideration when categories
+are defined. The categorization filters are applied in the order they are
+listed in the job configuration, which enables you to disregard multiple
+sections of the categorization field value. In this example, we do not want the
+detailed SQL statement to be considered when messages are categorized, so the
+filter removes it before the categorization algorithm processes the field
+value.
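+
+If the field contains other variable sections that you want to ignore, you can
+list more than one expression. For instance, the following sketch adds a
+second, hypothetical filter for messages that embed a numeric error code such
+as `[errorCode:1234]`; adjust the expressions to match your own data:
+
+[source,js]
+----------------------------------
+"categorization_filters": [ "\\[statement:.*\\]", "\\[errorCode:[0-9]+\\]" ]
+----------------------------------
+//NOTCONSOLE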
+
+If your data is stored in {es}, you can create an advanced job with these same
+properties:
+
+[role="screenshot"]
+image::images/ml-category-advanced.jpg["Advanced job configuration options related to categorization"]
+
+NOTE: To add the `categorization_examples_limit` property, you must use the
+**Edit JSON** tab and copy the `analysis_limits` object from the API example.
+
+After you open the job and start the {dfeed} or supply data to the job, you can
+view the results in {kib}. For example:
+
+[role="screenshot"]
+image::images/ml-category-anomalies.jpg["Categorization example in the Anomaly Explorer"]
+
+For this type of job, the **Anomaly Explorer** contains extra information for
+each anomaly: the name of the category (for example, `mlcategory 11`) and
+examples of the messages in that category. In this case, you can use these
+details to investigate occurrences of unusually high message counts for
+specific message categories.
diff --git a/docs/en/ml/configuring.asciidoc b/docs/en/ml/configuring.asciidoc
index cbbca119ee3..064bc490bca 100644
--- a/docs/en/ml/configuring.asciidoc
+++ b/docs/en/ml/configuring.asciidoc
@@ -29,5 +29,7 @@ The scenarios in this section describe some best practices for generating
 useful {ml} results and insights from your data.
 
 * <>
+* <<ml-configuring-categories>>
 
 include::aggregations.asciidoc[]
+include::categories.asciidoc[]
diff --git a/docs/en/ml/images/ml-category-advanced.jpg b/docs/en/ml/images/ml-category-advanced.jpg
new file mode 100644
index 00000000000..0a862903c0b
Binary files /dev/null and b/docs/en/ml/images/ml-category-advanced.jpg differ
diff --git a/docs/en/ml/images/ml-category-anomalies.jpg b/docs/en/ml/images/ml-category-anomalies.jpg
new file mode 100644
index 00000000000..2d8f805b963
Binary files /dev/null and b/docs/en/ml/images/ml-category-anomalies.jpg differ
diff --git a/docs/en/rest-api/ml/get-category.asciidoc b/docs/en/rest-api/ml/get-category.asciidoc
index fb44764ab24..a038d39d655 100644
--- a/docs/en/rest-api/ml/get-category.asciidoc
+++ b/docs/en/rest-api/ml/get-category.asciidoc
@@ -12,7 +12,9 @@ categories.
 
 `GET _xpack/ml/anomaly_detectors/<job_id>/results/categories/<category_id>`
 
-//===== Description
+==== Description
+
+For more information about categories, see <<ml-configuring-categories>>.
 
 ==== Path Parameters
 
diff --git a/docs/en/rest-api/ml/jobresource.asciidoc b/docs/en/rest-api/ml/jobresource.asciidoc
index 5240b59e66b..a1018316a7b 100644
--- a/docs/en/rest-api/ml/jobresource.asciidoc
+++ b/docs/en/rest-api/ml/jobresource.asciidoc
@@ -85,6 +85,7 @@ An analysis configuration object has the following properties:
   (string) If not null, the values of the specified field will be categorized.
   The resulting categories can be used in a detector by setting `by_field_name`,
   `over_field_name`, or `partition_field_name` to the keyword `mlcategory`.
+  For more information, see <<ml-configuring-categories>>.
 
 `categorization_filters`::
   (array of strings) If `categorization_field_name` is specified,
@@ -93,7 +94,8 @@ An analysis configuration object has the following properties:
   off the categorization field values. This functionality is useful to fine
   tune categorization by excluding sequences that should not be taken into
   consideration for defining categories. For example, you can exclude SQL
-  statements that appear in your log files.
+  statements that appear in your log files. For more information,
+  see <<ml-configuring-categories>>.
 
 `detectors`::
   (array) An array of detector configuration objects,
@@ -263,6 +265,7 @@ The `analysis_limits` object has the following properties:
   If you set this value to `0`, no examples are stored.
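+
+For example, the following `analysis_limits` object uses an illustrative value
+of `2`, so at most two examples are kept for each category:
+
+[source,js]
+--------------------------------------------------
+"analysis_limits": {
+  "categorization_examples_limit": 2
+}
+--------------------------------------------------
+//NOTCONSOLE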
 
 NOTE: The `categorization_examples_limit` only applies to analysis that uses categorization.
+For more information, see <<ml-configuring-categories>>.
 
 `model_memory_limit`::
   (long) The approximate maximum amount of memory resources that are required
diff --git a/docs/en/rest-api/ml/resultsresource.asciidoc b/docs/en/rest-api/ml/resultsresource.asciidoc
index fc03fe6fe95..0519eb26258 100644
--- a/docs/en/rest-api/ml/resultsresource.asciidoc
+++ b/docs/en/rest-api/ml/resultsresource.asciidoc
@@ -3,7 +3,7 @@
 === Results Resources
 
 Several different result types are created for each job. You can query anomaly
-results for _buckets_, _influencers_ and _records_ by using the results API.
+results for _buckets_, _influencers_, and _records_ by using the results API.
 Results are written for each `bucket_span`. The timestamp for the results is
 the start of the bucket time interval.
 
@@ -31,11 +31,11 @@ indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was
 entity too, you can drill through to the record results in order to investigate
 the anomalous behavior.
 
-//TBD Add links to categorization
 Categorization results contain the definitions of _categories_ that have been
 identified. These are only applicable for jobs that are configured to analyze
 unstructured log data using categorization. These results do not contain a
-timestamp or any calculated scores.
+timestamp or any calculated scores. For more information,
+see <<ml-configuring-categories>>.
 
 * <>
 * <>
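+
+For example, a single category definition from the job described in
+<<ml-configuring-categories>> might look like this; the values shown here are
+illustrative only:
+
+[source,js]
+--------------------------------------------------
+{
+  "job_id" : "it_ops_new_logs",
+  "category_id" : 11,
+  "terms" : "org.jdbi.v2.exceptions.UnableToExecuteStatementException",
+  "regex" : ".*?org\\.jdbi\\.v2\\.exceptions\\.UnableToExecuteStatementException.*",
+  "max_matching_length" : 120,
+  "examples" : [
+    "org.jdbi.v2.exceptions.UnableToExecuteStatementException: ..."
+  ]
+}
+--------------------------------------------------
+//NOTCONSOLE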