Fix the wrong formula to calculate the total number of unique entities supported in the anomaly detection plugin (#4474)

* Fix wrong formula to calculate the total number of unique entities supported in anomaly detection plugin

* Add a blog post for detailed and comprehensive explain for anomaly detector

* Update _observing-your-data/ad/index.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _observing-your-data/ad/index.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update index.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _observing-your-data/ad/index.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _observing-your-data/ad/index.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _observing-your-data/ad/index.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
This commit is contained in:
quocbao 2023-12-22 04:17:49 +07:00 committed by GitHub
parent e63b67b873
commit 3a4143cabb
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 9 additions and 5 deletions

View File

@ -107,13 +107,17 @@ Only a certain number of unique entities are supported in the category field. Us
To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting. To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting.
This formula provides a good starting point, but make sure to test with a representative workload. Consider a cluster with 3 data nodes, each with 8 GB of JVM heap size and the default 10% memory allocation. With an entity model size of 1 MB, the following formula calculates the estimated number of unique entities:
```
(8096 MB * 0.1 / 1 MB ) * 3 = 2429
```
If the actual total number of unique entities is higher than the number that you calculate (in this case, 2,429), the anomaly detector will attempt to model the extra entities. The detector prioritizes entities that occur more often and are more recent.
This formula serves as a starting point. Make sure to test it with a representative workload. You can find more information in the [Improving Anomaly Detection: One million entities in one minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/) blog post.
{: .note } {: .note }
For example, for a cluster with three data nodes, each with 8 GB of JVM heap size, a maximum memory percentage of 10% (default), and the entity model size of the detector as 1MB: the total number of unique entities supported is (8.096 * 10^9 * 0.1 / 1 MB ) * 3 = 2429.
If the actual total number of unique entities higher than this number that you calculate (in this case: 2429), the anomaly detector makes its best effort to model the extra entities. The detector prioritizes entities that occur more often and are more recent.
#### (Advanced settings) Set a shingle size #### (Advanced settings) Set a shingle size
Set the number of aggregation intervals from your data stream to consider in a detection window. Its best to choose this value based on your actual data to see which one leads to the best results for your use case. Set the number of aggregation intervals from your data stream to consider in a detection window. Its best to choose this value based on your actual data to see which one leads to the best results for your use case.