more updates
@@ -73,10 +73,14 @@ Only a certain number of unique entities are supported in the category field. Us
(data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector)
```
To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting.
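
For example, the following requests sketch how you might read off a detector's model size and then raise the memory percentage. The detector ID is a placeholder, the `models` profile type is assumed to be available in your version, and `0.2` (20%) is only an illustrative value:

```
GET _plugins/_anomaly_detection/detectors/<detectorId>/_profile/models

PUT _cluster/settings
{
  "persistent": {
    "plugins.anomaly_detection.model_max_size_percent": 0.2
  }
}
```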
This formula provides a good starting point, but make sure to test with a representative workload.
{: .note }
For example, for a cluster with 3 data nodes, each with 8 GB of JVM heap, a maximum memory percentage of 10% (the default), and an entity model size of 1 MB, the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1M) * 3 = 2,429.
If you set the total number of unique entities higher than the number you calculate (in this case, 2,429), the anomaly detector makes a best effort to model the extra entities. The detector prioritizes entities that occur more often and are more recent.
#### (Advanced settings) Set a shingle size
@@ -111,7 +115,7 @@ Review your model configuration and select **Create detector**.
### Step 5: Observe the results
Choose the **Real-time results** or **Historical analysis** tab. For real-time results, you need to wait for some time to see the anomaly results. If the detector interval is 10 minutes, the detector might take more than an hour to start, as it's waiting for sufficient data to generate anomalies.
A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner.
Use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to make sure you have sufficient data points.
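
For example, a request along these lines (with a placeholder detector ID, and assuming the `init_progress` profile type is supported in your version) shows how far along initialization is and roughly how many more data points the detector needs:

```
GET _plugins/_anomaly_detection/detectors/<detectorId>/_profile/init_progress
```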
@@ -123,7 +127,7 @@ If you see the detector pending in "initialization" for longer than a day, aggre
Analyze anomalies with the following visualizations:
- **Live anomalies** - displays live anomaly results for the last 60 intervals. For example, if the interval is 10 minutes, it shows results for the last 600 minutes. The chart refreshes every 30 seconds.
- **Anomaly history** (for historical analysis) / **Anomaly overview** (for real-time results) - plots the anomaly grade with the corresponding measure of confidence.
- **Feature breakdown** - plots the features based on the aggregation method. You can vary the date-time range of the detector.
- **Anomaly occurrence** - shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each detected anomaly.
@@ -133,12 +137,13 @@ Analyze anomalies with the following visualizations:
If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0).
Choose a filled rectangle to see a more detailed view of the anomaly.
Choose and drag over the anomaly line chart to zoom in and see a more detailed view of an anomaly.
{: .note }
### Step 4: Set up alerts
Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).
If you stop or delete a detector, make sure to delete any monitors associated with it.
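
If you manage monitors through the Alerting API rather than the UI, a cleanup request looks roughly like the following; the monitor ID is a placeholder:

```
DELETE _plugins/_alerting/monitors/<monitorId>
```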
@@ -147,7 +152,7 @@ If you stop or delete a detector, make sure to delete any monitors associated wi
To see all the configuration settings for a detector, choose the **Detector configuration** tab.
1. To make any changes to the detector configuration, or to fine-tune the time interval to minimize any false positives, go to the **Detector configuration** section and choose **Edit**.
- You need to stop a real-time or historical detector to change its configuration. Confirm that you want to stop the detector and proceed. (See the example requests after this list.)
1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.
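
Stopping and restarting a detector can also be done through the anomaly detection API. The following is a minimal sketch with a placeholder detector ID; the UI performs equivalent operations for you:

```
POST _plugins/_anomaly_detection/detectors/<detectorId>/_stop

POST _plugins/_anomaly_detection/detectors/<detectorId>/_start
```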
### Step 8: Manage your detectors
@@ -155,4 +160,4 @@ To see all the configuration settings for a detector, choose the **Detector conf
To start, stop, or delete a detector, go to the **Detectors** page.
1. Choose the detector name.
2. Choose **Actions** and select **Start real-time detectors**, **Stop real-time detectors**, or **Delete detectors**.
@@ -29,14 +29,15 @@ Setting | Default | Description
`plugins.anomaly_detection.max_multi_entity_anomaly_detectors` | 10 | The maximum number of high cardinality detectors (with category field) in a cluster.
`plugins.anomaly_detection.max_anomaly_features` | 5 | The maximum number of features for a detector.
`plugins.anomaly_detection.ad_result_history_rollover_period` | 12h | How often the rollover condition is checked. If the condition is met, the plugin rolls over the result index to a new index.
`plugins.anomaly_detection.ad_result_history_retention_period` | 30d | The maximum age of the result index. If its age exceeds the threshold, the plugin deletes the rolled over result index. If the cluster has only one result index, the plugin keeps the index even if it's older than its configured retention period.
`plugins.anomaly_detection.ad_result_history_max_docs` | 250,000,000 | The maximum number of documents in one result index. The plugin only counts refreshed documents in the primary shards.
`plugins.anomaly_detection.ad_result_history_max_docs_per_shard` | 1,350,000,000 | The maximum number of documents in a single shard of the result index. The anomaly detection plugin only counts the refreshed documents in the primary shards.
`plugins.anomaly_detection.max_entities_per_query` | 1,000,000 | The maximum unique values per detection interval for high cardinality detectors. If the category field has more unique values than this limit in a detector interval, the plugin selects the top values and orders them by `doc_count`.
`plugins.anomaly_detection.max_entities_for_preview` | 5 | The maximum unique category field values displayed with the preview operation for high cardinality detectors. If the category field has more unique values than this limit, the plugin selects the top values and orders them by `doc_count`.
`plugins.anomaly_detection.max_primary_shards` | 10 | The maximum number of primary shards an anomaly detection index can have.
`plugins.anomaly_detection.filter_by_backend_roles` | False | When you enable the security plugin and set this to `true`, the plugin filters results based on the user's backend role(s).
`plugins.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000.
`plugins.anomaly_detection.max_batch_task_per_node` | 10 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks and you're not sure if the data nodes are capable of running more historical detectors, add more data nodes instead of changing this setting to a higher value.
`plugins.anomaly_detection.max_old_ad_task_docs_per_detector` | 1 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster.
`plugins.anomaly_detection.batch_task_piece_size` | 1,000 | The date range for a historical task is split into smaller pieces, and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if the detector interval is 1 minute and one piece is 1,000 minutes, the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000.
`plugins.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds.
`plugins.anomaly_detection.max_top_entities_for_historical_analysis` | 1,000 | The maximum number of top entities that you run for a high-cardinality detector historical analysis.
`plugins.anomaly_detection.max_running_entities_per_detector_for_historical_analysis` | 10 | The number of entity tasks that you can run in parallel for one high cardinality (HC) detector. The cluster's available task slots also limit how many entities can run in parallel. For example, a cluster with 3 data nodes has 10 task slots per node by default, for a total of 30. If you have already started 2 HC detectors, each running 10 entities, and then start a single-flow detector that takes 1 task slot, the number of available task slots is 10 * 3 - 10 * 2 - 1 = 9. If you then start a new HC detector, it can run only 9 entities in parallel, not 10.
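
You can update most of these settings with the cluster settings API; check a setting's documentation before relying on it being dynamic. The following is only an illustrative sketch, and the values shown are not recommendations:

```
PUT _cluster/settings
{
  "persistent": {
    "plugins.anomaly_detection.batch_task_piece_interval_seconds": 10,
    "plugins.anomaly_detection.max_batch_task_per_node": 5
  }
}
```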