This commit is contained in:
ashwinkumar12345 2021-10-01 11:22:47 -07:00
parent c378da8799
commit 1fdf8b9517
2 changed files with 101 additions and 108 deletions


@ -240,56 +240,6 @@ POST _plugins/_anomaly_detection/detectors
} }
``` ```
To create a historical detector:
#### Request
```json
POST _plugins/_anomaly_detection/detectors
{
"name": "test1",
"description": "test historical detector",
"time_field": "timestamp",
"indices": [
"host-cloudwatch"
],
"filter_query": {
"match_all": {
"boost": 1
}
},
"detection_interval": {
"period": {
"interval": 1,
"unit": "Minutes"
}
},
"window_delay": {
"period": {
"interval": 1,
"unit": "Minutes"
}
},
"feature_attributes": [
{
"feature_name": "F1",
"feature_enabled": true,
"aggregation_query": {
"f_1": {
"sum": {
"field": "value"
}
}
}
}
],
"detection_date_range": {
"start_time": 1577840401000,
"end_time": 1606121925000
}
}
```
You can specify the following options. You can specify the following options.
Options | Description | Type | Required Options | Description | Type | Required
@ -303,7 +253,6 @@ Options | Description | Type | Required
`detection_interval` | The time interval for your anomaly detector. | `object` | Yes `detection_interval` | The time interval for your anomaly detector. | `object` | Yes
`window_delay` | Add extra processing time for data collection. | `object` | No `window_delay` | Add extra processing time for data collection. | `object` | No
`category_field` | Categorizes or slices data with a dimension. Similar to `GROUP BY` in SQL. | `list` | No `category_field` | Categorizes or slices data with a dimension. Similar to `GROUP BY` in SQL. | `list` | No
`detection_date_range` | Specify the start time and end time for a historical detector. | `object` | No
--- ---
@ -316,10 +265,44 @@ Passes a date range to the anomaly detector to return any anomalies within that
#### Request #### Request
```json ```json
POST _plugins/_anomaly_detection/detectors/<detectorId>/_preview POST _plugins/_anomaly_detection/detectors/_preview
{ {
"period_start": 1588838250000, "period_start": 1612982516000,
"period_end": 1589443050000 "period_end": 1614278539000,
"detector": {
"name": "test-detector",
"description": "test nab_art_daily_jumpsdown",
"time_field": "timestamp",
"indices": [
"nab_art_daily_jumpsdown"
],
"detection_interval": {
"period": {
"interval": 1,
"unit": "Minutes"
}
},
"window_delay": {
"period": {
"interval": 1,
"unit": "Minutes"
}
},
"feature_attributes": [
{
"feature_name": "F1",
"feature_enabled": true,
"aggregation_query": {
"f_1": {
"sum": {
"field": "value"
}
}
}
}
]
}
} }
``` ```
@ -446,6 +429,17 @@ If you specify a category field, each result is associated with an entity:
``` ```
Or, you can specify the detector ID:
```json
POST _plugins/_anomaly_detection/detectors/_preview
{
"detector_id": "sYkUvHcBiZv51f-Lv8QN",
"period_start": 1612982516000,
"period_end": 1614278539000
}
```
--- ---
## Start detector job ## Start detector job
@ -472,6 +466,15 @@ POST _plugins/_anomaly_detection/detectors/<detectorId>/_start
} }
``` ```
To start historical analysis:
```json
POST _plugins/_anomaly_detection/detectors/<detectorId>/_start
{
"start_time": 1503168590000,
"end_time": 1617301324000
}
```
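The `start_time` and `end_time` values in the request above appear to be Unix epoch timestamps in milliseconds. As a minimal sketch under that assumption, the request body could be built from calendar dates as follows. The helper names `to_epoch_ms` and `historical_start_body` and the example dates are hypothetical and not part of the plugin API.

```python
import json
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def historical_start_body(start: datetime, end: datetime) -> dict:
    """Build a body for the _start call (assumes epoch-millisecond fields)."""
    if end <= start:
        raise ValueError("end must be later than start")
    return {"start_time": to_epoch_ms(start), "end_time": to_epoch_ms(end)}


# Hypothetical two-month analysis window, expressed in UTC.
body = historical_start_body(
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    datetime(2021, 3, 1, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

Using timezone-aware datetimes avoids depending on the local time zone of the machine that builds the request.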
--- ---
@ -493,6 +496,12 @@ POST _plugins/_anomaly_detection/detectors/<detectorId>/_stop
Stopped detector: m4ccEnIBTXsGi3mvMt9p Stopped detector: m4ccEnIBTXsGi3mvMt9p
``` ```
To stop historical analysis:
```json
POST _plugins/_anomaly_detection/detectors/<detectorId>/_stop?historical=true
```
--- ---
## Search detector result ## Search detector result
@ -786,15 +795,6 @@ POST _plugins/_anomaly_detection/detectors/results/_search
} }
``` ```
In historical detectors, specify the `detector_id`.
To get the latest task:
#### Request
```json
GET _plugins/_anomaly_detection/detectors/<detector_id>?task=true
```
To query the anomaly results with `task_id`: To query the anomaly results with `task_id`:
#### Request #### Request


@ -17,24 +17,22 @@ Anomaly detection automatically detects anomalies in your OpenSearch data in ne
You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected. You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.
To use the anomaly detection plugin, your computer needs to have more than one CPU core.
{: .note }
## Get started with Anomaly Detection ## Get started with Anomaly Detection
To get started, choose **Anomaly Detection** in OpenSearch Dashboards. To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
To first test with sample streaming data, choose **Sample Detectors** and try out one of the preconfigured detectors. To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.
### Step 1: Create a detector ### Step 1: Define a detector
A detector is an individual anomaly detection task. You can create multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources. A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
1. Choose **Create Detector**. 1. Choose **Create Detector**.
1. Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector. 1. Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices. 1. For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indices.
1. (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
1. Select the **Timestamp field** in your index. 1. Select the **Timestamp field** in your index.
1. (Optional) For **Data filter**, filter the index you chose as the data source. From the **Filter type** menu, choose **Visual filter**, and then design your filter query by selecting **Fields**, **Operator**, and **Value**, or choose **Custom Expression** and add your own JSON filter query. 1. (Optional) For **Data filter**, filter the index you chose as the data source. From the **Filter type** menu, choose **Visual filter**, and then design your filter query by selecting **Fields**, **Operator**, and **Value**, or choose **Custom Expression** and add your own JSON filter query.
1. For **Detector operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data. 1. For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data.
- The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model. - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model.
The shorter you set this interval, the fewer data points the detector aggregates. The shorter you set this interval, the fewer data points the detector aggregates.
The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals. The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals.
@ -44,9 +42,9 @@ Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. - For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute.
Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00.
Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time. Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Choose **Create**. 1. Choose **Next**.
After you create the detector, the next step is to add features to it. After you define the detector, the next step is to configure the model.
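The window delay example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the plugin's implementation; the function name `detection_window` and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta


def detection_window(run_time, interval_minutes, window_delay_minutes):
    """Return the (start, end) of the data window queried at run_time.

    The window delay shifts the whole interval back in time so that
    late-arriving data has been ingested before the detector reads it.
    """
    end = run_time - timedelta(minutes=window_delay_minutes)
    start = end - timedelta(minutes=interval_minutes)
    return start, end


# Detector runs at 2:00 with a 10-minute interval and a 1-minute window delay.
run_time = datetime(2021, 10, 1, 2, 0)
start, end = detection_window(run_time, interval_minutes=10, window_delay_minutes=1)
print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))  # 01:49 - 01:59
```

With a window delay of 0, the same function returns 1:50 - 2:00, which is the window that misses the last minute of data in the example.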
### Step 2: Add features to your detector ### Step 2: Add features to your detector
@ -54,24 +52,25 @@ A feature is the field in your index that you want to check for anomalies. A det
For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.
A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. We recommend experimenting with a historical detector with different feature sets and checking the precision before moving on to real-time detectors. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting. A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting.
{: .note } {: .note }
1. On the **Model configuration** page, enter the **Feature name**. 1. On the **Configure Model** page, enter the **Feature name** and check **Enabled feature name**.
1. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value** menu, choose the **field** and the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query. 1. For **Find anomalies based on**, choose the method to find anomalies. For **Field Value**, choose the **aggregation method**. Or choose **Custom expression**, and add your own JSON aggregation query.
1. Select a field.
#### (Optional) Set a category field for high cardinality #### (Optional) Set category fields for high cardinality
You can categorize anomalies based on a keyword or IP field type. You can categorize anomalies based on a keyword or IP field type.
The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues. The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues.
To set a category field, choose **Enable a category field** and select a field. To set a category field, choose **Enable a category field** and select a field. You can't change the category fields after you create the detector.
Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster: Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster:
``` ```
(data nodes * heap size * anomaly detection maximum memory percentage) / (entity size of a detector) (data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector)
``` ```
This formula provides a good starting point, but make sure to test with a representative workload. This formula provides a good starting point, but make sure to test with a representative workload.
@ -79,7 +78,7 @@ This formula provides a good starting point, but make sure to test with a repres
For example, for a cluster with 3 data nodes, each with 8G of JVM heap size, a maximum memory percentage of 10% (default), and the entity size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1M ) * 3 = 2429. For example, for a cluster with 3 data nodes, each with 8G of JVM heap size, a maximum memory percentage of 10% (default), and the entity size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1M ) * 3 = 2429.
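The calculation in this example can be reproduced with a short script. It is a sketch of the formula as written above, using the same figures (8.096 * 10^9 bytes of heap, 10% maximum memory, 1 MB taken as 10^6 bytes); the function name `max_supported_entities` is hypothetical.

```python
def max_supported_entities(data_nodes, heap_bytes, max_memory_fraction, entity_model_bytes):
    """(data nodes * heap size * max memory percentage) / entity model size."""
    return (data_nodes * heap_bytes * max_memory_fraction) / entity_model_bytes


# Figures from the example: 3 data nodes, 10% memory limit, 1 MB per entity model.
estimate = max_supported_entities(
    data_nodes=3,
    heap_bytes=8.096e9,
    max_memory_fraction=0.1,
    entity_model_bytes=1e6,
)
print(round(estimate))  # 2429
```

Treat the result as a starting estimate and validate it against a representative workload.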
#### Set a shingle size #### (Advanced settings) Set a shingle size
Set the number of aggregation intervals from your data stream to consider in a detection window. It's best to choose this value based on your actual data to see which one leads to the best results for your use case. Set the number of aggregation intervals from your data stream to consider in a detection window. It's best to choose this value based on your actual data to see which one leads to the best results for your use case.
@ -92,10 +91,25 @@ For sample previews, the anomaly detection plugin selects a small number of data
Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results. Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results.
1. Choose **Save and start detector**. 1. Choose **Preview sample anomalies**.
1. Choose between automatically starting the detector (recommended) or manually starting the detector at a later time. - If you don't see any sample anomaly result, check the detector interval and make sure you have more than 400 data points for some entities during the preview date range.
1. Choose **Next**.
### Step 3: Observe the results ### Step 3: Set up detector jobs
To start a real-time detector to find anomalies in your data in near real-time, check **Start real-time detector automatically (recommended)**.
Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), check **Run historical analysis detection** and select a date range (at least 128 detection intervals).
Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.
We recommend experimenting with historical analysis with different feature sets and checking the precision before moving on to real-time detectors.
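Before you select a date range for historical analysis, it can help to check that the range spans at least 128 detection intervals. The following sketch shows that check; the function name `covers_min_intervals` and the example dates are hypothetical, and the check only compares durations, so it does not verify that data is present in each interval.

```python
from datetime import datetime, timedelta

MIN_INTERVALS = 128  # minimum number of detection intervals stated above


def covers_min_intervals(start, end, interval_minutes, min_intervals=MIN_INTERVALS):
    """Return True if [start, end] spans at least min_intervals detection intervals."""
    required = timedelta(minutes=interval_minutes * min_intervals)
    return (end - start) >= required


# With a 10-minute interval, 128 intervals correspond to 21 hours 20 minutes.
start = datetime(2021, 9, 1, 0, 0)
print(covers_min_intervals(start, start + timedelta(days=1), interval_minutes=10))
print(covers_min_intervals(start, start + timedelta(hours=21), interval_minutes=10))
```

The first range (one day) is long enough for a 10-minute interval, while the second (21 hours) falls short by 20 minutes.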
### Step 4: Review and create
Review your model configuration and select **Create detector**.
### Step 5: Observe the results
Choose the **Anomaly results** tab. You need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies. Choose the **Anomaly results** tab. You need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies.
@ -106,7 +120,7 @@ If you see the detector pending in "initialization" for longer than a day, aggre
![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/ad.png) ![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/ad.png)
Analize anomalies with the following visualizations: Analyze anomalies with the following visualizations:
- **Live anomalies** - displays live anomaly results for the last 60 intervals. For example, if the interval is 10, it shows results for the last 600 minutes. The chart refreshes every 30 seconds. - **Live anomalies** - displays live anomaly results for the last 60 intervals. For example, if the interval is 10, it shows results for the last 600 minutes. The chart refreshes every 30 seconds.
- **Anomaly history** - plots the anomaly grade with the corresponding measure of confidence. - **Anomaly history** - plots the anomaly grade with the corresponding measure of confidence.
@ -135,31 +149,10 @@ To see all the configuration settings for a detector, choose the **Detector conf
1. To make any changes to the detector configuration, or fine-tune the time interval to minimize any false positives, go to the **Detector configuration** section and choose **Edit**. 1. To make any changes to the detector configuration, or fine-tune the time interval to minimize any false positives, go to the **Detector configuration** section and choose **Edit**.
- You need to stop the detector to change its configuration. Confirm that you want to stop the detector and proceed. - You need to stop the detector to change its configuration. Confirm that you want to stop the detector and proceed.
1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**. 1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.
- Choose between automatically starting the detector (recommended) or manually starting the detector at a later time.
### Step 6: Analyze historical data ### Step 8: Manage your detectors
Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it. To start, stop, or delete a detector, go to the **Detectors** page.
To use a historical detector, you need to specify a date range that has data present in at least 1,000 detection intervals. 1. Choose the detector name.
{: .note } 2. Choose **Actions** and select **Start real-time detectors**, **Stop real-time detectors**, or **Delete detectors**.
1. Choose **Historical detectors** and **Create historical detector**.
1. Enter the **Name** of the detector and a brief **Description**.
1. For **Data source**, choose the index to use as the data source. You can optionally use index patterns to choose multiple indices.
1. For **Time range**, select a time range for historical analysis.
1. For **Detector settings**, choose to use the settings of an existing detector. Or choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval.
1. (Optional) Choose to run the historical detector automatically after creating it.
1. Choose **Create**.
- You can stop the historical detector even before it completes.
### Step 7: Manage your detectors
To change or delete a detector, go to the **Detector details** page.
1. To make changes to your detector, choose the detector name.
1. Choose **Actions** and **Edit detector**.
- You need to stop the detector to change its configuration. Confirm that you want to stop the detector and proceed.
1. Make your changes and choose **Save changes**.
To delete your detector, choose **Actions** and **Delete detector**. In the pop-up box, type `delete` to confirm and choose **Delete**.