---
layout: default
title: Anomaly detection
nav_order: 46
has_children: true
redirect_from:
  - /monitoring-plugins/ad/
---

# Anomaly detection

An anomaly in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure.

It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.

Anomaly detection automatically detects anomalies in your OpenSearch data in near real time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://api.semanticscholar.org/CorpusID:927435).

You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.

To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.

## Step 1: Define a detector

A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.

1. Choose **Create detector**.
1. Add in the detector details.
   - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. Specify the data source.   
   - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices.
   - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
1. Specify a timestamp.    
   - Select the **Timestamp field** in your index.
1. Define operation settings.
   - For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
      - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
      The shorter you set this interval, the fewer data points the detector aggregates.
      The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.

      - We recommend setting the detector interval based on your actual data. If the interval is too long, results might be delayed. If it's too short, the detector might miss some data and might not have enough consecutive data points for the shingle process.

   - (Optional) To add extra processing time for data collection, specify a **Window delay** value.
      - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
      - For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Specify custom result index.
   - If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The anomaly detection plugin adds an `opensearch-ad-plugin-result-` prefix to the index name that you input. For example, if you input `abc` as the result index name, the final index name is `opensearch-ad-plugin-result-abc`.

   You can use a hyphen (`-`) to separate namespaces in the index name so that you can manage custom result index permissions by pattern. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the result index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
   {: .note }

      - If the custom index you specify doesn’t already exist, the anomaly detection plugin creates this index when you create the detector and start your real-time or historical analysis.
      - If the custom index already exists, the plugin checks if the index mapping of the custom index matches the anomaly result file. You need to make sure the custom index has valid mapping as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
   - To use the custom result index option, you need the following permissions:
      - `indices:admin/create` - If the custom index already exists, you don't need this.
      - `indices:data/write/index` - You need the `write` permission for the anomaly detection plugin to write results into the custom index for a single-entity detector.
      - `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space.
      - `indices:data/write/bulk*` -  You need the `bulk*` permission because the anomaly detection plugin uses the bulk API to write results into the custom index.
   - Managing the custom result index:
      - The anomaly detection dashboard queries all detectors’ results from all custom result indices. Having too many custom result indices might impact the performance of the anomaly detection plugin.
      - You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to roll over old result indices. You can also manually delete or archive any old result indices. We recommend reusing a custom result index for multiple detectors.
1. Choose **Next**.   
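
Two of the settings above involve small calculations: the window delay shifts the data window, and the plugin prefixes the custom result index name. The following is a minimal sketch of both, not the plugin's implementation. The helper names `query_window` and `result_index_name` are hypothetical, and the date is arbitrary.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatch

# Prefix the plugin adds to custom result index names (from the text above).
RESULT_INDEX_PREFIX = "opensearch-ad-plugin-result-"


def query_window(run_time, interval_minutes, window_delay_minutes=0):
    """Return the (start, end) of the data window for one detector run.

    Hypothetical helper that mirrors the arithmetic in the window delay
    example; it is not taken from the plugin's source.
    """
    end = run_time - timedelta(minutes=window_delay_minutes)
    start = end - timedelta(minutes=interval_minutes)
    return start, end


def result_index_name(user_input):
    """Return the final index name for a user-supplied result index name."""
    return RESULT_INDEX_PREFIX + user_input


# Detector runs at 2:00 with a 10-minute interval and a 1-minute window delay.
run = datetime(2021, 11, 10, 2, 0)
start, end = query_window(run, interval_minutes=10, window_delay_minutes=1)
print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))  # 01:49 - 01:59

# A role pattern scoped to the "financial" department in the "us" area.
index = result_index_name("financial-us-group1")
print(fnmatch(index, "opensearch-ad-plugin-result-financial-us-*"))  # True
```

The pattern check uses shell-style wildcards only to illustrate the naming idea; how a role pattern is evaluated depends on your security configuration.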

After you define the detector, the next step is to configure the model.

## Step 2: Configure the model

#### Add features to your detector

A feature is the field in your index that you want to check for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.

For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.

A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting.
{: .note }

1. On the **Configure Model** page, enter the **Feature name** and check **Enable feature**.
1. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value**, choose the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query.
1. Select a field.
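
If you choose **Custom expression**, the JSON you enter is an aggregation query. The sketch below builds one such query for a single feature. The mapping from the method names above to OpenSearch aggregation types is an assumption, and the feature name `max_memory` and field `memory_usage` are hypothetical.

```python
import json

# Assumed mapping from the method names in the text to OpenSearch
# metric aggregation types.
AGGREGATION_TYPES = {
    "average": "avg",
    "count": "value_count",
    "sum": "sum",
    "min": "min",
    "max": "max",
}


def feature_aggregation(feature_name, method, field):
    """Build a single-metric aggregation for one feature (illustrative only)."""
    if method not in AGGREGATION_TYPES:
        raise ValueError(f"unsupported aggregation method: {method}")
    return {feature_name: {AGGREGATION_TYPES[method]: {"field": field}}}


# Hypothetical feature: the maximum of a memory usage field.
print(json.dumps(feature_aggregation("max_memory", "max", "memory_usage"), indent=2))
```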

#### (Optional) Set category fields for high cardinality

You can categorize anomalies based on a keyword or IP field type.

The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues.

To set a category field, choose **Enable a category field** and select a field. You can’t change the category fields after you create the detector.

Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster:

```
(data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector)
```

To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting.

This formula provides a good starting point, but make sure to test with a representative workload.
{: .note }

For example, for a cluster with three data nodes, each with 8 GB of JVM heap, a maximum memory percentage of 10% (the default), and a detector entity model size of 1 MB, the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1 MB) * 3 = 2429.

If the actual total number of unique entities is higher than the number that you calculate (in this case, 2429), the anomaly detector makes its best effort to model the extra entities. The detector prioritizes entities that occur more often and are more recent.
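
The sizing formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic. The function name is hypothetical, and 1 MB is taken as 10^6 bytes so that the result matches the worked example.

```python
def recommended_entity_count(data_nodes, heap_bytes, max_memory_fraction, entity_model_bytes):
    """Apply the sizing formula from this section."""
    return data_nodes * heap_bytes * max_memory_fraction / entity_model_bytes


estimate = recommended_entity_count(
    data_nodes=3,
    heap_bytes=8.096e9,       # heap size used in the example above
    max_memory_fraction=0.1,  # 10% default
    entity_model_bytes=1e6,   # 1 MB, assumed to mean 10^6 bytes
)
print(round(estimate))  # 2429
```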

#### (Advanced settings) Set a shingle size

Set the number of aggregation intervals from your data stream to consider in a detection window. It’s best to choose this value based on your actual data to see which one leads to the best results for your use case.

The anomaly detector expects the shingle size to be between 1 and 60. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal.
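
To make the idea of a shingle concrete, the sketch below groups consecutive aggregated values into overlapping windows. It illustrates the general technique only and makes no claim about how the plugin builds shingles internally. The function name and the sample values are hypothetical.

```python
def shingles(values, shingle_size):
    """Return every run of `shingle_size` consecutive values."""
    if shingle_size < 1:
        raise ValueError("shingle_size must be at least 1")
    return [
        values[i:i + shingle_size]
        for i in range(len(values) - shingle_size + 1)
    ]


# Hypothetical aggregated values, one per detector interval.
aggregated = [10, 12, 11, 50, 12]
for window in shingles(aggregated, shingle_size=3):
    print(window)
```

Note that no shingle can be formed until at least `shingle_size` consecutive values exist, which is why gaps in the data slow the detector down.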

#### Preview sample anomalies

Preview sample anomalies and adjust the feature settings if needed.
For sample previews, the anomaly detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results.

Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results.

1. Choose **Preview sample anomalies**.
    - If you don't see any sample anomaly result, check the detector interval and make sure you have more than 400 data points for some entities during the preview date range.
1. Choose **Next**.
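
The preview works from sparse samples and estimates the points in between. The text does not say which interpolation method the plugin uses, so the sketch below assumes simple linear interpolation purely for illustration. The function name and sample values are hypothetical.

```python
def interpolate(samples, step):
    """Linearly interpolate between (minute, value) samples every `step` minutes.

    `samples` must be sorted by minute and contain at least one entry.
    """
    points = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        t = t0
        while t < t1:
            fraction = (t - t0) / (t1 - t0)
            points.append((t, v0 + fraction * (v1 - v0)))
            t += step
    points.append(samples[-1])  # keep the final real sample
    return points


# Hypothetical samples taken every 30 minutes, filled in at 10-minute steps.
sparse = [(0, 10.0), (30, 40.0), (60, 25.0)]
for minute, value in interpolate(sparse, step=10):
    print(minute, round(value, 2))
```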

## Step 3: Set up detector jobs

To start a real-time detector to find anomalies in your data in near real-time, check **Start real-time detector automatically (recommended)**.

Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), check **Run historical analysis detection** and select a date range (at least 128 detection intervals).

Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.

We recommend experimenting with historical analysis with different feature sets and checking the precision before moving on to real-time detectors.
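
The minimum date range for historical analysis follows directly from the detector interval. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
from datetime import timedelta

MIN_INTERVALS = 128  # minimum number of detection intervals, from the text above


def minimum_history(interval_minutes):
    """Return the shortest date range that covers MIN_INTERVALS intervals."""
    return timedelta(minutes=interval_minutes * MIN_INTERVALS)


# A 10-minute detector interval needs a range of at least 21 hours 20 minutes.
print(minimum_history(10))  # 21:20:00
```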

## Step 4: Review and create

Review your model configuration and select **Create detector**.

## Step 5: Observe the results

Choose the **Real-time results** or **Historical analysis** tab. For real-time results, you need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies.

A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner.
Use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector) operation to make sure you have sufficient data points.

If you see the detector pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find a lot of missing data points from the aggregated data, consider increasing the detector interval.
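
One way to check for missing data points is a date histogram aggregation that uses the detector interval as the bucket size. The sketch below builds such a search body and scans a response for empty buckets. The field name `timestamp` and the sample buckets are hypothetical, and sending the request to your cluster is left out.

```python
import json


def gap_check_query(timestamp_field, interval):
    """Build a search body that counts documents per detector interval."""
    return {
        "size": 0,
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": timestamp_field,
                    "fixed_interval": interval,
                    "min_doc_count": 0,  # keep empty buckets so gaps are visible
                }
            }
        },
    }


def empty_intervals(buckets):
    """Return the start time of each bucket that contains no documents."""
    return [b["key_as_string"] for b in buckets if b["doc_count"] == 0]


print(json.dumps(gap_check_query("timestamp", "10m"), indent=2))

# Hypothetical buckets in the shape that a date histogram response uses.
sample_buckets = [
    {"key_as_string": "2021-11-10T02:00:00.000Z", "doc_count": 42},
    {"key_as_string": "2021-11-10T02:10:00.000Z", "doc_count": 0},
]
print(empty_intervals(sample_buckets))
```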

![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/ad.png)

Analyze anomalies with the following visualizations:

- **Live anomalies** - displays live anomaly results for the last 60 intervals. For example, if the interval is 10 minutes, it shows results for the last 600 minutes. The chart refreshes every 30 seconds.
- **Anomaly history** (for historical analysis) / **Anomaly overview** (for real-time results) - plots the anomaly grade with the corresponding measure of confidence.
- **Anomaly occurrence** - shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each detected anomaly.
- **Feature breakdown** - plots the features based on the aggregation method. You can vary the date-time range of the detector.

`Anomaly grade` is a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of 0 represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly.

`Data confidence` is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade. Confidence increases as the model observes more data and learns the data behavior and trends. Note that confidence is distinct from model accuracy.

If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0).
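
If you process results yourself, the same rule applies: a result counts as an anomaly only when its grade is above 0. The sketch below filters a list of result records on that rule, with an optional confidence floor. The record shape and the `confidence` field name are assumptions, and the sample values are made up.

```python
def anomalies(results, min_confidence=0.0):
    """Keep results with a non-zero anomaly grade and enough confidence."""
    return [
        r for r in results
        if r["anomaly_grade"] > 0 and r["confidence"] >= min_confidence
    ]


# Hypothetical result records.
sample_results = [
    {"anomaly_grade": 0.0, "confidence": 0.95},
    {"anomaly_grade": 0.7, "confidence": 0.90},
    {"anomaly_grade": 0.3, "confidence": 0.40},
]
print(anomalies(sample_results, min_confidence=0.8))
```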

Choose and drag over the anomaly line chart to zoom in and see a more detailed view of an anomaly.
{: .note }

## Step 6: Set up alerts

Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).

If you stop or delete a detector, make sure to delete any monitors associated with it.

## Step 7: Adjust the model

To see all the configuration settings for a detector, choose the **Detector configuration** tab.

1. To change the detector configuration, or to fine-tune the time interval to minimize false positives, go to the **Detector configuration** section and choose **Edit**.
   - You need to stop real-time and historical analysis before you can change the detector's configuration. Confirm that you want to stop the detector and proceed.
1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.

## Step 8: Manage your detectors

To start, stop, or delete a detector, go to the **Detectors** page.

1. Choose the detector name.
1. Choose **Actions** and select **Start real-time detectors**, **Stop real-time detectors**, or **Delete detectors**.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
+								---
 								layout: default
 								title: Anomaly detection
 								nav_order: 46
 								has_children: true
-												Perhaps I can redirect myself out of this long nightmare

											
										
										
											2021-06-10 18:09:17 -04:00
+								redirect_from:
 								  - /monitoring-plugins/ad/
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
+								---
 								# Anomaly detection
 								An anomaly in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure.
 								It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.
-												Clean up broken links

											
										
										
											2021-08-12 17:59:49 -04:00
+								Anomaly detection  automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://api.semanticscholar.org/CorpusID:927435).
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												No more relative links

											
										
										
											2021-06-09 22:15:41 -04:00
+								You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
 								To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad feedback

											
										
										
											2021-10-05 21:25:45 -04:00
+								## Step 1: Define a detector
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad changes for 1.1

											
										
										
											2021-10-04 15:08:00 -04:00
+. Choose **Create detector**.
-												..

Signed-off-by: ashwinkumar12345 <kumarjao@users.noreply.github.com>

											
										
										
											2021-11-10 14:46:32 -05:00
+. Add in the detector details.
 								   - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
 . Specify the data source.
 								   - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices.
 								   - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
 . Specify a timestamp.
 								   - Select the **Timestamp field** in your index.
 . Define operation settings.
 								   - For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
 								      - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
 								      The shorter you set this interval, the fewer data points the detector aggregates.
 								      The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.
 								      - We recommend setting the detector interval based on your actual data. If it's too long it might delay the results, and if it's too short it might miss some data. It also won't have a sufficient number of consecutive data points for the shingle process.
 								   - (Optional) To add extra processing time for data collection, specify a **Window delay** value.
 								      - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
 								      - For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
 . Specify custom result index.
-												Incorporated feedback

Signed-off-by: ashwinkumar12345 <kumarjao@users.noreply.github.com>

											
										
										
											2021-11-11 16:59:50 -05:00
+								   - If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The anomaly detection plugin adds an `opensearch-ad-plugin-result-` prefix to the index name that you input. For example, if you input `abc` as the result index name, the final index name is `opensearch-ad-plugin-result-abc`.
 								   You can use the dash “-” sign to separate the namespace to manage custom result index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the result index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
 								   {: .note }
-												..

Signed-off-by: ashwinkumar12345 <kumarjao@users.noreply.github.com>

											
										
										
											2021-11-10 14:46:32 -05:00
+								      - If the custom index you specify doesn’t already exist, the anomaly detection plugin creates this index when you create the detector and start your real-time or historical analysis.
 								      - If the custom index already exists, the plugin checks if the index mapping of the custom index matches the anomaly result file. You need to make sure the custom index has valid mapping as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
 								   - To use the custom result index option, you need the following permissions:
 								      - `indices:admin/create` - If the custom index already exists, you don't need this.
 								      - `indices:data/write/index` - You need the `write` permission for the anomaly detection plugin to write results into the custom index for a single-entity detector.
 								      - `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space.
 								      - `indices:data/write/bulk*` -  You need the `bulk*` permission because the anomaly detection plugin uses the bulk API to write results into the custom index.
 								   - Managing the custom result index:
 								      - The anomaly detection dashboard queries all detectors’ results from all custom result indices. Having too many custom result indices might impact the performance of the anomaly detection plugin.
-												Incorporated feedback

Signed-off-by: ashwinkumar12345 <kumarjao@users.noreply.github.com>

											
										
										
											2021-11-11 16:59:50 -05:00
+								      - You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to rollover old result indices. You can also manually delete or archive any old result indices. We recommend reusing a custom result index for multiple detectors.
-												..

Signed-off-by: ashwinkumar12345 <kumarjao@users.noreply.github.com>

											
										
										
											2021-11-10 14:46:32 -05:00
+. Choose **Next**.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								After you define the detector, the next step is to configure the model.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad feedback

											
										
										
											2021-10-05 21:25:45 -04:00
+								## Step 2: Configure the model
-												ad changes for 1.1

											
										
										
											2021-10-04 15:08:00 -04:00
 								#### Add features to your detector
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
 								A feature is the field in your index that you want to check for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly.
 								For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
+								{: .note }
-												ad changes for 1.1

											
										
										
											2021-10-04 15:08:00 -04:00
+. On the **Configure Model** page, enter the **Feature name** and check **Enable feature**.
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value**, choose the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query.
 . Select a field.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								#### (Optional) Set category fields for high cardinality
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
 								You can categorize anomalies based on a keyword or IP field type.
 								The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues.
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								To set a category field, choose **Enable a category field** and select a field. You can’t change the category fields after you create the detector.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
 								Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster:
 								```
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								(data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector)
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
+								```
-												more updates

											
										
										
											2021-10-03 13:35:18 -04:00
+								To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
+								This formula provides a good starting point, but make sure to test with a representative workload.
 								{: .note }
-												updated requests and responses

											
										
										
											2021-10-05 05:36:58 -04:00
+								For example, for a cluster with three data nodes, each with 8 GB of JVM heap size, a maximum memory percentage of 10% (default), and the entity model size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1 MB ) * 3 = 2429.
-												more updates

											
										
										
											2021-10-03 13:35:18 -04:00
-												minor changes

											
										
										
											2021-10-05 15:41:37 -04:00
+								If the actual total number of unique entities higher than this number that you calculate (in this case: 2429), the anomaly detector makes its best effort to model the extra entities. The detector prioritizes entities that occur more often and are more recent.
-												Breaks site

											
										
										
											2021-05-28 13:48:19 -04:00
-												ad 1.1

											
										
										
											2021-10-01 14:22:47 -04:00
+								#### (Advanced settings) Set a shingle size
-												Breaks site

											
										
										
Set the number of aggregation intervals from your data stream to consider in a detection window. Choose this value based on your actual data, and compare a few settings to see which one leads to the best results for your use case.

The anomaly detector expects the shingle size to be in the range of 1 to 60. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal.
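
To make the idea of a detection window concrete, the following sketch groups a stream of aggregated values into overlapping windows of `shingle_size` consecutive intervals. It illustrates the concept only and is not the plugin's implementation; the function name `shingles` and the sample values are assumptions.

```python
from collections import deque
from typing import Iterable, Iterator, List


def shingles(values: Iterable[float], shingle_size: int = 8) -> Iterator[List[float]]:
    """Yield sliding windows of the most recent `shingle_size` values."""
    # The text states that the shingle size must be in the range of 1 to 60.
    if not 1 <= shingle_size <= 60:
        raise ValueError("shingle size must be between 1 and 60")
    window = deque(maxlen=shingle_size)
    for value in values:
        window.append(value)
        # Nothing is emitted until the first full window is available.
        if len(window) == shingle_size:
            yield list(window)


# Hypothetical per-interval aggregates with one spike.
points = [10, 11, 10, 12, 50, 11]
for window in shingles(points, shingle_size=3):
    print(window)
```

With a larger `shingle_size`, the single spike is one value among many in each window, which fits the guidance that larger values help ignore noise.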

											
										
										
#### Preview sample anomalies

Preview sample anomalies and adjust the feature settings if needed.
For sample previews, the anomaly detection plugin selects a small number of data samples (for example, one data point every 30 minutes) and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results.

Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results.
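
The sampling-and-interpolation idea can be sketched as follows. The text does not say which interpolation method the plugin uses, so this example assumes simple linear interpolation between sparse samples; the function name `interpolate_linear` and the sample values are hypothetical.

```python
from typing import List, Tuple


def interpolate_linear(samples: List[Tuple[int, float]], step: int) -> List[Tuple[int, float]]:
    """Fill in a point every `step` minutes between sparse (minute, value) samples.

    Assumes `samples` is sorted by time with strictly increasing timestamps.
    """
    filled: List[Tuple[int, float]] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        t = t0
        while t < t1:
            # Position of t between the two surrounding samples, from 0 to 1.
            fraction = (t - t0) / (t1 - t0)
            filled.append((t, v0 + fraction * (v1 - v0)))
            t += step
    if samples:
        # The last sample is not covered by the loop above, so add it explicitly.
        filled.append(samples[-1])
    return filled


# One real sample every 30 minutes, estimated points every 10 minutes.
sparse = [(0, 10.0), (30, 40.0), (60, 25.0)]
dense = interpolate_linear(sparse, step=10)
print(dense)
```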

											
										
										
1. Choose **Preview sample anomalies**.
    - If you don't see any sample anomaly result, check the detector interval and make sure you have more than 400 data points for some entities during the preview date range.
1. Choose **Next**.

											
										
										
## Step 3: Set up detector jobs

To start a real-time detector to find anomalies in your data in near real-time, check **Start real-time detector automatically (recommended)**.

Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), check **Run historical analysis detection** and select a date range (at least 128 detection intervals).

Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.

We recommend experimenting with historical analysis with different feature sets and checking the precision before moving on to real-time detectors.
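
As a rough aid for picking a date range, the sketch below computes the latest start time that still covers 128 detection intervals. The function name `latest_start` and the example timestamps are hypothetical; only the 128-interval minimum comes from the text above.

```python
from datetime import datetime, timedelta

# Minimum number of detection intervals required for historical analysis (from the text).
MIN_INTERVALS = 128


def latest_start(end: datetime, interval_minutes: int) -> datetime:
    """Return the latest start time whose range up to `end` spans MIN_INTERVALS intervals."""
    return end - timedelta(minutes=interval_minutes * MIN_INTERVALS)


# With a 10-minute detector interval, the range must span at least 1,280 minutes.
end = datetime(2021, 10, 5, 12, 0)
print(latest_start(end, interval_minutes=10))
```

Any start time at or before the returned value satisfies the minimum; longer ranges give the detector more history to analyze.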

											
										
										
## Step 4: Review and create

Review your model configuration and select **Create detector**.

											
										
										
## Step 5: Observe the results

Choose the **Real-time results** or **Historical analysis** tab. For real-time results, you need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start because it waits for sufficient data to generate anomalies.

A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner.
Use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to make sure you have sufficient data points.

If you see the detector pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find a lot of missing data points from the aggregated data, consider increasing the detector interval.

											
										
										
![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/ad.png)

											
										
										
Analyze anomalies with the following visualizations:

- **Live anomalies** - displays live anomaly results for the last 60 intervals. For example, if the interval is 10 minutes, it shows results for the last 600 minutes. The chart refreshes every 30 seconds.
- **Anomaly history** (for historical analysis) / **Anomaly overview** (for real-time results) - plots the anomaly grade with the corresponding measure of confidence.
- **Anomaly occurrence** - shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each detected anomaly.
- **Feature breakdown** - plots the features based on the aggregation method. You can vary the date-time range of the detector.

											
										
										
`Anomaly grade` is a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of 0 represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly.

`Data confidence` is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade. Confidence increases as the model observes more data and learns the data behavior and trends. Note that confidence is distinct from model accuracy.

If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0).

Choose and drag over the anomaly line chart to zoom in and see a more detailed view of an anomaly.
{: .note }
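
If you export anomaly results for your own analysis, a filter like the following separates anomalies from normal points. It is a sketch that assumes each result is a dictionary with `anomaly_grade` and `confidence` keys; the function name `anomalous_points` and the sample records are hypothetical.

```python
from typing import Dict, List


def anomalous_points(results: List[Dict], min_confidence: float = 0.0) -> List[Dict]:
    """Keep results with a non-zero anomaly grade, most severe first."""
    hits = [
        r for r in results
        # A grade of 0 means "not an anomaly"; any higher value indicates relative severity.
        if r["anomaly_grade"] > 0 and r["confidence"] >= min_confidence
    ]
    return sorted(hits, key=lambda r: r["anomaly_grade"], reverse=True)


# Hypothetical results for three detection intervals.
sample_results = [
    {"interval": 1, "anomaly_grade": 0.0, "confidence": 0.95},
    {"interval": 2, "anomaly_grade": 0.4, "confidence": 0.97},
    {"interval": 3, "anomaly_grade": 0.9, "confidence": 0.60},
]
for point in anomalous_points(sample_results, min_confidence=0.9):
    print(point)
```

Raising `min_confidence` is one way to ignore grades reported while the model has seen little data, at the cost of possibly missing early anomalies.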

											
										
										
## Step 6: Set up alerts

Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).

If you stop or delete a detector, make sure to delete any monitors associated with it.

											
										
										
## Step 7: Adjust the model

To see all the configuration settings for a detector, choose the **Detector configuration** tab.

1. To make any changes to the detector configuration, or fine-tune the time interval to minimize any false positives, go to the **Detector configuration** section and choose **Edit**.
    - You need to stop real-time detection and historical analysis to change the detector's configuration. Confirm that you want to stop the detector and proceed.
1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.

											
										
										
## Step 8: Manage your detectors

To start, stop, or delete a detector, go to the **Detectors** page.

1. Choose the detector name.
1. Choose **Actions** and select **Start real-time detectors**, **Stop real-time detectors**, or **Delete detectors**.