[DOC] Reconcile Alerting Monitors 2.9 Documentation Changes (#4710)
* Reconcile PR changes that weren't published due to files being edited simultaneously

---------

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com>
Parent: 8e9d0ed4de
Commit: fc14355c1f

@@ -14,9 +14,9 @@ An anomaly in OpenSearch is any unusual behavior change in your time-series data

 It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior.

 Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9).

-You can pair the anomaly detection plugin with the [alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.
+You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected.

 To get started, choose **Anomaly Detection** in OpenSearch Dashboards.
 To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets.

@@ -43,21 +43,21 @@ A detector is an individual anomaly detection task. You can define multiple dete

 - (Optional) To add extra processing time for data collection, specify a **Window delay** value.
 - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
-- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49 - 1:59, so the detector accounts for all 10 minutes of the detector interval time.
+- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time.
 1. Specify custom result index.
 - If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The anomaly detection plugin adds an `opensearch-ad-plugin-result-` prefix to the index name that you input. For example, if you input `abc` as the result index name, the final index name is `opensearch-ad-plugin-result-abc`.

 You can use the dash “-” sign to separate the namespace to manage custom result index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the result index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
 {: .note }

-- If the custom index you specify doesn’t already exist, the anomaly detection plugin creates this index when you create the detector and start your real-time or historical analysis.
+- If the custom index you specify doesn’t already exist, the Anomaly Detection plugin creates this index when you create the detector and start your real-time or historical analysis.
 - If the custom index already exists, the plugin checks if the index mapping of the custom index matches the anomaly result file. You need to make sure the custom index has valid mapping as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
 - To use the custom result index option, you need the following permissions:
 - `indices:admin/create` - If the custom index already exists, you don't need this.
-- `indices:data/write/index` - You need the `write` permission for the anomaly detection plugin to write results into the custom index for a single-entity detector.
+- `indices:data/write/index` - You need the `write` permission for the Anomaly Detection plugin to write results into the custom index for a single-entity detector.
 - `indices:data/read/search` - You need the `search` permission because the Anomaly Detection plugin needs to search custom result indexes to show results on the anomaly detection UI.
 - `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space.
-- `indices:data/write/bulk*` - You need the `bulk*` permission because the anomaly detection plugin uses the bulk API to write results into the custom index.
+- `indices:data/write/bulk*` - You need the `bulk*` permission because the Anomaly Detection plugin uses the bulk API to write results into the custom index.
 - Managing the custom result index:
 - The anomaly detection dashboard queries all detectors’ results from all custom result indexes. Having too many custom result indexes might impact the performance of the Anomaly Detection plugin.
 - You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to rollover old result indexes. You can also manually delete or archive any old result indexes. We recommend reusing a custom result index for multiple detectors.

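The window-delay example in the hunk above can be checked with a small Python sketch. It assumes the detector queries exactly one interval that ends `window_delay` before the scheduled run time. `detection_window` is a hypothetical helper, not a plugin API, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta


def detection_window(run_time: datetime, interval: timedelta, window_delay: timedelta):
    """Return the (start, end) range that one detector run queries.

    Hypothetical helper: assumes the window is one detector interval that
    ends window_delay before the scheduled run time.
    """
    end = run_time - window_delay
    start = end - interval
    return start, end


# Values from the documentation: 10-minute interval, 1-minute ingestion delay,
# detector run at 2:00 (the date itself is arbitrary).
run = datetime(2023, 1, 1, 2, 0)
start, end = detection_window(run, timedelta(minutes=10), timedelta(minutes=1))
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # 01:49 01:59
```

With a zero delay the same call returns 1:50 to 2:00. That is the window that misses the last minute of late-arriving data.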
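The hunk above also describes how custom result indexes are named and how a dash-separated namespace maps onto role permissions. The Python sketch below illustrates both rules. It assumes role index patterns behave like shell-style wildcards (`fnmatch.fnmatchcase`), and the Security plugin's real matching rules may differ. Both helper functions are hypothetical.

```python
from fnmatch import fnmatchcase

# Prefix that, per the documentation, the plugin adds to a custom result index name.
RESULT_INDEX_PREFIX = "opensearch-ad-plugin-result-"


def final_result_index(name: str) -> str:
    """Return the index name the plugin writes to for a user-supplied name."""
    return RESULT_INDEX_PREFIX + name


def covered_by_pattern(index: str, pattern: str) -> bool:
    """Approximate a role's index pattern with shell-style wildcard matching.

    Assumption: '*' matches any run of characters, as in fnmatch.
    """
    return fnmatchcase(index, pattern)


us_pattern = "opensearch-ad-plugin-result-financial-us-*"
idx = final_result_index("financial-us-group1")
print(idx)                                  # opensearch-ad-plugin-result-financial-us-group1
print(covered_by_pattern(idx, us_pattern))  # True
print(covered_by_pattern(final_result_index("financial-eu-group1"), us_pattern))  # False
```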
@@ -123,7 +123,7 @@ The anomaly detector expects the shingle size to be in the range of 1 and 60. Th
 #### Preview sample anomalies

 Preview sample anomalies and adjust the feature settings if needed.
-For sample previews, the anomaly detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results.
+For sample previews, the Anomaly Detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results.

 Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results.

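The sample-preview hunk says only that the plugin samples sparse points (for example, one every 30 minutes) and "uses interpolation" to estimate the rest. The sketch below assumes straight-line (linear) interpolation between neighboring samples, which may not be the plugin's actual method. `interpolate_samples` and the sample values are illustrative.

```python
def interpolate_samples(samples: list[tuple[int, float]], step: int) -> list[tuple[int, float]]:
    """Estimate points every `step` minutes between sparse (minute, value) samples.

    Assumes samples are sorted by time and uses linear interpolation
    between each pair of neighboring samples.
    """
    filled = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        for t in range(t0, t1, step):
            frac = (t - t0) / (t1 - t0)
            filled.append((t, v0 + frac * (v1 - v0)))
    filled.append(samples[-1])  # keep the final real sample itself
    return filled


# One real sample every 30 minutes, filled in at a 10-minute detector interval.
sparse = [(0, 100.0), (30, 130.0), (60, 70.0)]
for minute, value in interpolate_samples(sparse, 10):
    print(minute, round(value, 2))
```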
@@ -137,7 +137,7 @@ To start a real-time detector to find anomalies in your data in near real-time,

 Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), check **Run historical analysis detection** and select a date range (at least 128 detection intervals).

-Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.
+Analyzing historical data helps you get familiar with the Anomaly Detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.

 We recommend experimenting with historical analysis with different feature sets and checking the precision before moving on to real-time detectors.

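The "at least 128 detection intervals" rule in the hunk above translates directly into a minimum date range for historical analysis. The following is a minimal Python sketch. It assumes the range is measured in whole detection intervals between the start and end dates, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

MIN_HISTORICAL_INTERVALS = 128  # minimum stated in the documentation


def min_historical_range(interval: timedelta) -> timedelta:
    """Shortest date range that covers the required number of detection intervals."""
    return interval * MIN_HISTORICAL_INTERVALS


def range_is_long_enough(start: datetime, end: datetime, interval: timedelta) -> bool:
    """True if [start, end) spans at least MIN_HISTORICAL_INTERVALS full intervals."""
    return (end - start) // interval >= MIN_HISTORICAL_INTERVALS


print(min_historical_range(timedelta(minutes=10)))  # 21:20:00
print(min_historical_range(timedelta(hours=1)))     # 5 days, 8:00:00
```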
@@ -189,11 +189,11 @@ If you set the category field, you see an additional **Heat map** chart. The hea

 If you have set multiple category fields, you can select a subset of fields to filter and sort the fields by. Selecting a subset of fields lets you see the top values of one field that share a common value with another field.

-For example, if you have a detector with the category fields `ip` and `endpoint`, you can select `endpoint` in the **View by** dropdown menu. Then, select a specific cell to overlay the top 20 values of `ip` on the charts. The anomaly detection plugin selects the top `ip` by default. You can see a maximum of 5 individual time-series values at the same time.
+For example, if you have a detector with the category fields `ip` and `endpoint`, you can select `endpoint` in the **View by** dropdown menu. Then select a specific cell to overlay the top 20 values of `ip` on the charts. The Anomaly Detection plugin selects the top `ip` by default. You can see a maximum of 5 individual time-series values at the same time.

 ## Step 6: Set up alerts

-Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/monitors/).
+Under **Real-time results**, choose **Set up alerts** and configure a monitor to notify you when anomalies are detected. For steps to create a monitor and set up notifications based on your anomaly detector, see [Monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/monitors/).

 If you stop or delete a detector, make sure to delete any monitors associated with it.


@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Actions
-nav_order: 15
+nav_order: 50
 grand_parent: Alerting
 parent: Monitors
 ---

@@ -1,10 +1,12 @@
 ---
 layout: default
 title: Composite monitors
-nav_order: 3
-parent: Alerting
+nav_order: 25
+parent: Monitors
+grand_parent: Alerting
 has_children: false
 redirect_from:
+- /observing-your-data/alerting/composite-monitors/
 ---

 # Composite monitors

@@ -36,18 +38,17 @@ Composite monitors remove the limitations of basic monitors in the following way

 ## Key terms

-The key terms in the following table describe the basic concepts of composite monitors. For additional terms common to all types of monitors, see [Key terms]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/monitors/#key-terms) for basic monitors.
+The key terms in the following table describe the basic concepts of composite monitors. For additional terms common to all types of monitors, see [Key terms]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/index/#key-terms) in the Alerting section.

 | Term | Definition |
 | :--- | :--- |
 | Composite monitor | A composite monitor is a type of monitor that supports the execution of multiple monitors in a sequential workflow. It supports configuring triggers to create chained alerts. |
 | Delegate monitor | Delegate monitors are executed sequentially according to their order in a composite monitor's definition. When a delegate monitor's trigger conditions are met, it generates an audit alert. This audit alert then becomes a condition for the composite monitor's trigger. The composite monitor supports per query, per bucket, and per document monitors as delegate monitors. |
 | workflow ID | The workflow ID provides an identifier for the entire workflow of all delegate monitors. It is synonymous with a composite monitor's monitor ID. |
-| Chained alert | Chained alerts are generated from composite monitor triggers when delegate monitors generate audit alerts. The chained alert trigger condition supports the use of the logical operators AND, OR, and NOT so you can combine multiple functions into a single expression. |
+| Chained alert | Chained alerts are generated from composite monitor triggers when delegate monitors generate audit alerts. The chained alert trigger condition supports the use of the logical operators `AND`, `OR`, and `NOT` so you can combine multiple functions into a single expression. |
 | Audit alert | Delegate monitors generate alerts in an **audit** state. Users are not notified about each individual audit alert and don't need to acknowledge them. Audit alerts are used to evaluate chained alert trigger conditions in composite monitors. |
 | Execution | A single run of all delegate monitors in the sequence defined in the composite monitor's configuration. |

-
 ## Basic workflow

 You create composite monitors by combining individual monitors in a workflow that executes each monitor in a defined sequence. When individual audit alerts from the delegate monitors meet the trigger conditions for a composite monitor, the composite monitor generates its own chained alert. Consider the following sequence of events to understand how a simple composite monitor configured with two delegate monitors executes its workflow. In this example, the trigger condition for the composite monitor is met when the first monitor and the second monitor both generate an alert.

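The basic workflow described in the hunk above can be simulated in a few lines of Python. This is a conceptual sketch, not the Alerting plugin's implementation. Each delegate monitor is modeled as a hypothetical callable that reports whether its trigger fired, and each fired trigger becomes an audit alert. A composite trigger condition (here, both monitors must alert) then decides whether a chained alert is generated. The monitor IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditAlert:
    monitor_id: str       # delegate monitor that generated the alert
    state: str = "AUDIT"  # audit alerts are not sent to users


def run_execution(
    delegates: list[tuple[str, Callable[[], bool]]],
    condition: Callable[[set[str]], bool],
) -> tuple[list[AuditAlert], bool]:
    """Run one execution of a composite monitor.

    Delegates run sequentially in definition order, and each fired trigger
    yields an audit alert. The composite condition then sees the set of monitor
    IDs that alerted and decides whether a chained alert is generated.
    """
    audit_alerts = []
    for monitor_id, trigger_fired in delegates:
        if trigger_fired():
            audit_alerts.append(AuditAlert(monitor_id))
    alerted = {alert.monitor_id for alert in audit_alerts}
    return audit_alerts, condition(alerted)


def both_alerted(ids: set[str]) -> bool:
    """Composite trigger from the example: the first AND the second monitor alerted."""
    return "monitor-1" in ids and "monitor-2" in ids


alerts, chained = run_execution(
    [("monitor-1", lambda: True), ("monitor-2", lambda: False)], both_alerted
)
print([a.monitor_id for a in alerts], chained)  # ['monitor-1'] False
```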
@@ -64,8 +65,7 @@ In this simple example, the first monitor could be a per document monitor config

 You can manage composite monitors using the REST API or OpenSearch Dashboards. This section covers API functionality for composite monitors.

-
-### Create Composite Monitor
+### Create composite monitor

 This API allows you to create a composite monitor.

@@ -384,7 +384,7 @@ POST /_plugins/_alerting/workflows/<workflow_id>/_execute
 ```


-### Get Chained Alerts
+### Get chained alerts

 This API returns an array of chained alerts generated in composite monitor workflows:

@@ -534,7 +534,7 @@ GET /_plugins/_alerting/workflows/alerts?workflowIds=<workflow_ids>&getAssociate
 | `associatedAlerts` | Array | A list of audit alerts generated by the delegate monitors. |


-### Acknowledge Chained Alerts
+### Acknowledge chained alerts

 [After getting your alerts](#get-chained-alerts), you can acknowledge multiple active alerts in one call. If the alert is already in an ERROR, COMPLETED, or ACKNOWLEDGED state, it appears in the failed array.

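The acknowledge behavior described in the hunk above works like a partition. Active alerts are acknowledged, and alerts already in an ERROR, COMPLETED, or ACKNOWLEDGED state end up in the failed array. The Python sketch below models only that rule. The function, the in-memory state map, and the `acknowledged`/`failed` keys are illustrative assumptions, not the API's actual response schema. Treating unknown alert IDs as failures is also an assumption.

```python
def acknowledge(alerts: dict[str, str], alert_ids: list[str]) -> dict[str, list[str]]:
    """Acknowledge several alerts in one call.

    `alerts` maps alert ID to current state and is updated in place. Only
    ACTIVE alerts are acknowledged. ERROR, COMPLETED, and ACKNOWLEDGED alerts
    (and, as an assumption here, unknown IDs) are reported as failed.
    """
    result: dict[str, list[str]] = {"acknowledged": [], "failed": []}
    for alert_id in alert_ids:
        if alerts.get(alert_id) == "ACTIVE":
            alerts[alert_id] = "ACKNOWLEDGED"
            result["acknowledged"].append(alert_id)
        else:
            result["failed"].append(alert_id)
    return result


states = {"a1": "ACTIVE", "a2": "COMPLETED", "a3": "ACTIVE"}
print(acknowledge(states, ["a1", "a2"]))  # {'acknowledged': ['a1'], 'failed': ['a2']}
```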
@@ -563,7 +563,6 @@ POST _plugins/_alerting/workflows/<workflow_id>/_acknowledge/alerts
 }
 ```

-
 ## Creating composite monitors in OpenSearch Dashboards

 Begin by navigating to the **Create monitor** page in OpenSearch Dashboards: **Alerting > Monitors** and select **Create monitor**. Give the monitor a name and then select **Composite monitor** as the monitor type. Steps for creating a composite monitor workflow and trigger conditions vary depending on whether you use the **Visual editor** or the **Extraction query editor**. The first provides basic UI selectors for defining the composite monitor, while the second allows you to build the workflow and trigger conditions using a script. After deciding which method to use, refer to the corresponding section.
@@ -647,7 +646,6 @@ The extraction query editor follows the same general steps as the visual editor,
 (monitor[id=8d36S4kB0DWOHH7wpkET] || monitor[id=4t36S4kB0DWOHH7wL0Hk])
 ```

-
 ### Viewing monitor details

 After a composite monitor is created, it appears in the list of monitors on the **Monitors** tab. The **Type** column indicates the type of monitor, including the composite monitor type. The **Associations with composite monitors** column provides a count of how many composite monitors a basic monitor is used in as a delegate monitor. Select a monitor in the **Monitor name** column to open its details window.
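The trigger condition in the hunk above combines two delegate monitors with `||`. The Python sketch below evaluates conditions of that shape against the set of monitor IDs that produced audit alerts in one execution, using a small recursive-descent parser rather than `eval`. Only `||` and parentheses appear in this diff. Spelling AND as `&&` and NOT as `!` is an assumption, as is the operator precedence (NOT, then AND, then OR). `tokenize` and `evaluate` are hypothetical helpers, not part of the Alerting plugin.

```python
import re

# One token per match: monitor[id=...], &&, ||, !, or a parenthesis.
TOKEN_RE = re.compile(r"\s*(monitor\[id=([^\]]+)\]|&&|\|\||!|\(|\))")


def tokenize(expr: str) -> list[str]:
    """Split a condition into tokens; monitor references become 'id:<monitor id>'."""
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        tokens.append("id:" + m.group(2) if m.group(2) else m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str, alerted: set[str]) -> bool:
    """Evaluate a chained alert condition against monitor IDs that generated audit alerts."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse, so every token is consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # highest precedence
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok.startswith("id:"):
            return tok[3:] in alerted
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"trailing token {peek()!r}")
    return result


condition = "(monitor[id=8d36S4kB0DWOHH7wpkET] || monitor[id=4t36S4kB0DWOHH7wL0Hk])"
print(evaluate(condition, {"4t36S4kB0DWOHH7wL0Hk"}))  # True
print(evaluate(condition, set()))                     # False
```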
@@ -659,4 +657,3 @@ For composite monitors, The **Alerts** section of the details window includes th
 Select this icon to open the **Alert details** window. This window shows you all of the audit alerts that were part of the execution that generated the chained alert and includes the delegate monitor that generated the audit alert. Select the **X** in the upper-right corner of the window to close **Alert details**.

 After returning to the **Alerts** section of the monitor's details window, you can select the check box to the left of the **Alert start time** to highlight the alert. After the alert is highlighted, you can select **Acknowledge** in the upper-right portion of this section. The alert is acknowledged and the status in the **State** column changes from Active to Acknowledged.
-

@@ -16,28 +16,17 @@ To create an alert, do the following:
 - Configure one or more _triggers_, which define the conditions that generate events. Optional.
 - Configure _actions_, which is what happens after an alert is triggered. Optional.

-## Getting started
+## Key terms

-To get started with creating alerts:
+The following table lists alerting terminology commonly used in OpenSearch and throughout the Alerting documentation.

-1. Choose **Alerting** from the OpenSearch Plugins main menu, then **Create monitor**. If alerts exist, you'll see a list of those alerts and the Create monitor button won't appear. In this case, select the **Monitors** tab, then **Create monitor**.
-2. Create a per query, per bucket, per cluster metrics, or per document monitor. For instructions, see [Monitors]({{site.url}}{{site.baseurl}}/observing-your-data/notifications/index/).
-3. Create one or more triggers. For instructions, see [Triggers[({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/)].
-4. For Actions, set up a notification channel for the alert. For instructions, see [Actions]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/actions/).
-
-## Alerting terminology
-
-The following table lists alerting terminology commonly used in OpenSearch.
-
 Term | Definition
 :--- | :---
-Monitor | A job that runs on a defined schedule and queries OpenSearch indexes. The results of these queries are then used as input for one or more *triggers*.
-Trigger | A condition that, if met, generates an *alert*.
-Tag | A label that can be applied to multiple queries to combine them with the logical `OR` operation in a per document monitor. You cannot use tags with other monitor types.
-Alert | An event associated with a trigger. When an alert is created, the trigger performs *actions*, which can include sending a notification.
-Action | The information that you want the monitor to send after being triggered. Actions have a *channel*, a message subject, and a message body.
-Channel | A notification channel to use in an action. Supported channels are Amazon Chime, Slack, Amazon Simple Notification Service (Amazon SNS), email, or custom webhook. See [Notifications]({{site.url}}{{site.baseurl}}/notifications-plugin/index/) for more information.
-Finding | An entry for an individual document found by a per document monitor query that contains the document ID, index name, and timestamp. Findings are stored in the Findings index `.opensearch-alerting-finding*`.
+Monitor | Job that runs on a defined schedule and queries OpenSearch indexes. The results of these queries are then used as input for one or more triggers.
+Trigger | Conditions that, if met, generate alerts. See [Triggers]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/).
+Alert | Event associated with a trigger. When an alert is created, the trigger performs actions, including sending notifications.
+Action | Specific task that is performed when an alert is triggered. See [Actions]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/actions/).
+Notification | Message that is sent to users when an alert is triggered. See [Notifications]({{site.url}}{{site.baseurl}}/notifications-plugin/index/).

 ## Alert states

@ -50,3 +39,18 @@ Acknowledged | The alert is acknowledged but the root cause is not fixed.
|
||||||
Completed | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to `false`.
|
Completed | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to `false`.
|
||||||
Error | An error occurred while executing the trigger---usually the result of a bad trigger or destination.
|
Error | An error occurred while executing the trigger---usually the result of a bad trigger or destination.
|
||||||
Deleted | The monitor or trigger associated with this alert was deleted while the alert was ongoing.
|
Deleted | The monitor or trigger associated with this alert was deleted while the alert was ongoing.
|
||||||
|
|
||||||
|
## Creating an alert monitor
|
||||||
|
|
||||||
|
You can follow these basic steps to create an alert monitor:
|
||||||
|
|
||||||
|
1. In the **OpenSearch Plugins** main menu, choose **Alerting**.
|
||||||
|
1. Choose **Create monitor**. See [Monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/monitors/) for more information about the monitor types.
|
||||||
|
1. Enter the **Monitor details**, including monitor type, method, and schedule.
|
||||||
|
1. Select a data source from the dropdown list.
|
||||||
|
1. Define the metrics in the **Query** section.
|
||||||
|
1. Add a trigger. See [Triggers]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/) for more information about triggers.
|
||||||
|
1. Add an action. See [Actions]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/actions/) for more information about actions.
|
||||||
|
1. Select **Create**.
|
||||||
|
|
||||||
|
Learn more about creating specific monitor types in their respective documentation.
|
|
@ -10,348 +10,24 @@ redirect_from:
|
||||||
|
|
||||||
# Monitors
|
# Monitors
|
||||||
|
|
||||||
---
|
Proactively monitor your data in OpenSearch with features available in Alerting and Anomaly Detection. For example, you can pair Anomaly Detection with Alerting to ensure that you're notified as soon as an anomaly is detected. To do this, set up a detector to automatically detect outliers in your streaming data and a monitor to alert you through notifications when data exceeds certain thresholds.
|
||||||
|
|
||||||
<details closed markdown="block">
|
|
||||||
<summary>
|
|
||||||
Table of contents
|
|
||||||
</summary>
|
|
||||||
{: .text-delta }
|
|
||||||
- TOC
|
|
||||||
{:toc}
|
|
||||||
</details>
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Monitor types
|
## Monitor types
|
||||||
|
|
||||||
The OpenSearch Dashboards Alerting plugin provides four basic monitor types as well as a composite monitor that can integrate the functionality of multiple monitors into a single workflow:
|
The Alerting plugin provides the following monitor types:
|
||||||
* **per query** – This monitor runs a query and generates alert notifications based on the matching criteria.
|
|
||||||
* **per bucket** – This monitor runs a query that evaluates trigger criteria based on aggregated values in the dataset.
|
|
||||||
* **per cluster metrics** – This monitor runs API requests on the cluster to monitor its health.
|
|
||||||
* **per document** – This monitor runs a query (or multiple queries combined by a tag) that returns individual documents that match the alert notification trigger condition.
|
|
||||||
* **composite monitor** – The composite monitor lets you run multiple monitors in a single workflow and generate a single alert based on multiple trigger conditions. See [Composite monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/composite-monitors/) for information about creating and using these types of monitors.
|
|
||||||
|
|
||||||
## Key terms
|
1. **per query**: Runs a query and generates alert notifications based on the matching criteria. See [Per query monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/per-query-bucket-monitors/) for information about creating and using this monitor type.
|
||||||
|
1. **per bucket**: Runs a query that evaluates trigger criteria based on aggregated values in the dataset. See [Per bucket monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/per-query-bucket-monitors/) for information about creating and using this monitor type.
|
||||||
|
1. **per cluster metrics**: Runs API requests on the cluster to monitor its health. See [Per cluster metrics monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/per-cluster-metrics-monitors/) for information about creating and using this monitor type.
|
||||||
|
1. **per document**: Runs a query (or multiple queries combined by a tag) that returns individual documents that match the alert notification trigger condition. See [Per document monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/per-document-monitors/) for information about creating and using this monitor type.
|
||||||
|
1. **composite monitor**: Runs multiple monitors in a single workflow and generates a single alert based on multiple trigger conditions. See [Composite monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/composite-monitors/) for information about creating and using this monitor type.
|
||||||
|
|
||||||
Term | Definition
|
The maximum number of monitors you can create is 1,000. You can change the default maximum number of monitors for your cluster by updating the `plugins.alerting.monitor.max_monitors` setting using the [cluster settings API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/settings/).
|
||||||
:--- | :---
|
{: .tip}
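For example, a request like the following raises the limit through the cluster settings API. The value `2000` is only an illustration, so choose a limit that your cluster can sustain.

```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.alerting.monitor.max_monitors": 2000
  }
}
```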
|
||||||
Monitor | A job that runs on a defined schedule and queries OpenSearch indexes. The results of these queries are then used as input for one or more *triggers*.
|
|
||||||
Trigger | Conditions that, if met, generate *alerts*.
|
|
||||||
Tag | A label that can be applied to multiple queries to combine them with the logical OR operation in a per document monitor. You cannot use tags with other monitor types.
|
|
||||||
Alert | An event associated with a trigger. When an alert is created, the trigger performs *actions*, which can include sending a notification.
|
|
||||||
Action | The information that you want the monitor to send out after being triggered. Actions have a *destination*, a message subject, and a message body.
|
|
||||||
Destination | A reusable location for an action. Supported locations are Amazon Chime, Email, Slack, or custom webhook.
|
|
||||||
Finding | An entry for an individual document found by a per document monitor query that contains the document ID, index name, and timestamp. Findings are stored in the Findings index: `.opensearch-alerting-finding*`.
|
|
||||||
Channel | A notification channel to use in an action. See [Notifications]({{site.url}}{{site.baseurl}}/notifications-plugin/index) for more information.
|
|
||||||
|
|
||||||
## Per document monitors
|
## Monitor variables
|
||||||
|
|
||||||
Introduced 2.0
|
The following table lists the variables available for customizing your monitors.
|
||||||
{: .label .label-purple }
|
|
||||||
|
|
||||||
Per document monitors allow you to define up to 10 queries that compare the selected field with your desired value. You can compare fields of supported data types using the following operators:
|
|
||||||
|
|
||||||
- `is`
|
|
||||||
- `is not`
|
|
||||||
- `is greater than`
|
|
||||||
- `is greater than equal`
|
|
||||||
- `is less than`
|
|
||||||
- `is less than equal`
|
|
||||||
|
|
||||||
Each trigger can use up to 10 tags. You add a tag as a single trigger condition instead of specifying a single query. The Alerting plugin processes the trigger conditions from all tagged queries as a logical `OR` operation, so if any of the query conditions are met, it triggers an alert. The Alerting plugin then tells the Notifications plugin to send the notification to a channel.
|
|
||||||
|
|
||||||
The Alerting plugin also creates a list of document findings that contains metadata about which document matches each query. Security analytics can use the document findings data to keep track of and analyze the query data separately from the alert processes.
|
|
||||||
|
|
||||||
|
|
||||||
The Alerting API provides a document-level monitor that programmatically accomplishes the same function as the per document monitor in OpenSearch Dashboards. To learn more, see [Document-level monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/#document-level-monitors).
|
|
||||||
{: .note}
|
|
||||||
|
|
||||||
### Document findings
|
|
||||||
|
|
||||||
When a per document monitor executes a query that matches a document in an index, a finding is created. OpenSearch provides a Findings index: `.opensearch-alerting-finding*` that contains findings data for all per document monitor queries. You can search the findings index with the Alerting API search operation. To learn more, see [Search the findings index]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/#search-the-findings-index).
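As a minimal sketch, the following request returns stored findings through the Alerting API. Optional query parameters for filtering, sorting, and paging are described in the linked API section, and none are assumed here.

```json
GET _plugins/_alerting/findings/_search
```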
|
|
||||||
|
|
||||||
The following metadata is provided for each document finding entry:
|
|
||||||
|
|
||||||
* **Document** – The document ID and index name. For example: `Re5akdirhj3fl | test-logs-index`.
|
|
||||||
* **Query** – The query name that matched the document.
|
|
||||||
* **Time found** – The timestamp that indicates when the document was found during the monitor run.
|
|
||||||
|
|
||||||
You can configure an alert notification for each finding. However, we don't recommend this unless your rules are well defined enough to prevent a huge volume of findings in a high-ingestion cluster.
|
|
||||||
|
|
||||||
|
|
||||||
## Create destinations
|
|
||||||
|
|
||||||
1. Choose **Alerting**, **Destinations**, **Add destination**.
|
|
||||||
1. Specify a name for the destination so that you can identify it later.
|
|
||||||
1. For **Type**, choose Slack, Amazon Chime, custom webhook, or [email](#email-as-a-destination).
|
|
||||||
|
|
||||||
For Email, refer to the [Email as a destination](#email-as-a-destination) section below. For all other types, specify the webhook URL. See the documentation for [Slack](https://api.slack.com/incoming-webhooks) and [Amazon Chime](https://docs.aws.amazon.com/chime/latest/ug/webhooks.html) to learn more about webhooks.
|
|
||||||
|
|
||||||
If you're using custom webhooks, you must specify more information: parameters and headers. For example, if your endpoint requires basic authentication, you might need to add a header with a key of `Authorization` and a value of `Basic <Base64-encoded-credential-string>`. You might also need to change `Content-Type` to whatever your webhook requires. Popular values are `application/json`, `application/xml`, and `text/plain`.
|
|
||||||
|
|
||||||
This information is stored in plain text in the OpenSearch cluster. We will improve this design in the future, but for now, the encoded credentials (which are neither encrypted nor hashed) might be visible to other OpenSearch users.
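In newer versions, custom webhooks are configured as notification channels instead of destinations. The following sketch shows where such an authorization header might go. It assumes the Notifications plugin's `_plugins/_notifications/configs` API and its `webhook` configuration with a `header_params` object, so verify both against the [Notifications]({{site.url}}{{site.baseurl}}/notifications-plugin/index/) documentation. The channel name, URL, and credential string are placeholders.

```json
POST _plugins/_notifications/configs
{
  "config": {
    "name": "example-webhook",
    "description": "Hypothetical webhook that requires basic authentication",
    "config_type": "webhook",
    "is_enabled": true,
    "webhook": {
      "url": "https://hooks.example.com/alerts",
      "header_params": {
        "Authorization": "Basic <Base64-encoded-credential-string>",
        "Content-Type": "application/json"
      }
    }
  }
}
```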
|
|
||||||
|
|
||||||
|
|
||||||
### Email as a destination
|
|
||||||
|
|
||||||
To send or receive an alert notification as an email, choose **Email** as the destination type. Next, add at least one sender and recipient. We recommend adding email groups if you want to notify more than a few people of an alert. You can configure senders and recipients using **Manage senders** and **Manage email groups**.
|
|
||||||
|
|
||||||
#### Manage senders
|
|
||||||
|
|
||||||
You need to specify an email account from which the Alerting plugin can send notifications.
|
|
||||||
|
|
||||||
To configure a sender email, do the following:
|
|
||||||
|
|
||||||
1. After you choose **Email** as the destination type, choose **Manage senders**.
|
|
||||||
1. Choose **Add sender**, **New sender** and enter a unique name.
|
|
||||||
1. Enter the email address, SMTP host (for example, `smtp.gmail.com` for a Gmail account), and the port.
|
|
||||||
1. Choose an encryption method, or use the default value of **None**. However, most email providers require SSL or TLS, which require a username and password in the OpenSearch keystore. Refer to [Authenticate sender account](#authenticate-sender-account) to learn more.
|
|
||||||
1. Choose **Save** to save the configuration and create the sender. You can create a sender even before you add your credentials to the OpenSearch keystore. However, you must [authenticate each sender account](#authenticate-sender-account) before you use the destination to send your alert.
|
|
||||||
|
|
||||||
You can reuse senders across many different destinations, but each destination only supports one sender.
|
|
||||||
|
|
||||||
|
|
||||||
#### Manage email groups or recipients
|
|
||||||
|
|
||||||
Use email groups to create and manage reusable lists of email addresses. For example, one alert might email the DevOps team, whereas another might email the executive team and the engineering team.
|
|
||||||
|
|
||||||
You can enter individual email addresses or an email group in the **Recipients** field.
|
|
||||||
|
|
||||||
1. After you choose **Email** as the destination type, choose **Manage email groups**. Then choose **Add email group**, **New email group**.
|
|
||||||
1. Enter a unique name.
|
|
||||||
1. For recipient emails, enter any number of email addresses.
|
|
||||||
1. Choose **Save**.
|
|
||||||
|
|
||||||
|
|
||||||
#### Authenticate sender account
|
|
||||||
|
|
||||||
If your email provider requires SSL or TLS, you must authenticate each sender account before you can send an email. Enter these credentials in the OpenSearch keystore using the CLI. Run the following commands (in your OpenSearch directory) to enter your username and password. The `<sender_name>` is the name you entered for **Sender** earlier.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
./bin/opensearch-keystore add plugins.alerting.destination.email.<sender_name>.username
|
|
||||||
./bin/opensearch-keystore add plugins.alerting.destination.email.<sender_name>.password
|
|
||||||
```
|
|
||||||
|
|
||||||
Keystore settings are node-specific. You must run these commands on each node.
|
|
||||||
{: .note}
|
|
||||||
|
|
||||||
To change or update your credentials (after you've added them to the keystore on every node), call the reload API to automatically update those credentials without restarting OpenSearch:
|
|
||||||
|
|
||||||
```json
|
|
||||||
POST _nodes/reload_secure_settings
|
|
||||||
{
|
|
||||||
"secure_settings_password": "1234"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
## Create a monitor
|
|
||||||
|
|
||||||
1. Choose **Alerting**, **Monitors**, **Create monitor**.
|
|
||||||
1. Specify a name for the monitor.
|
|
||||||
1. Choose **Per query monitor**, **Per bucket monitor**, **Per cluster metrics monitor**, or **Per document monitor**.
|
|
||||||
|
|
||||||
OpenSearch supports the following types of monitors:
|
|
||||||
|
|
||||||
- **Per query monitors** run your specified query and then check whether the query's results trigger any alerts. Per query monitors can only trigger one alert at a time.
|
|
||||||
- **Per bucket monitors** let you create buckets based on selected fields and then categorize your results into those buckets. The Alerting plugin runs each bucket's unique results against a script you define later, so you have finer control over which results should trigger alerts. Furthermore, each bucket can trigger an alert.
|
|
||||||
|
|
||||||
The maximum number of monitors you can create is 1,000. You can change the default maximum number of monitors for your cluster by updating the `plugins.alerting.monitor.max_monitors` setting using the cluster settings API.
|
|
||||||
|
|
||||||
1. Decide how you want to define your query and triggers. You can use any of the following methods: visual editor, query editor, or anomaly detector.
|
|
||||||
|
|
||||||
- Visual definition works well for monitors that you can define as "some value is above or below some threshold for some amount of time."
|
|
||||||
|
|
||||||
- Query definition gives you flexibility in terms of what you query for (using [OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index)) and how you evaluate the results of that query (Painless scripting).
|
|
||||||
|
|
||||||
This example averages the `cpu_usage` field:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"size": 0,
|
|
||||||
"query": {
|
|
||||||
"match_all": {}
|
|
||||||
},
|
|
||||||
"aggs": {
|
|
||||||
"avg_cpu": {
|
|
||||||
"avg": {
|
|
||||||
"field": "cpu_usage"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
You can even filter query results using `{% raw %}{{period_start}}{% endraw %}` and `{% raw %}{{period_end}}{% endraw %}`:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"size": 0,
|
|
||||||
"query": {
|
|
||||||
"bool": {
|
|
||||||
"filter": [{
|
|
||||||
"range": {
|
|
||||||
"timestamp": {
|
|
||||||
"from": "{% raw %}{{period_end}}{% endraw %}||-1h",
|
|
||||||
"to": "{% raw %}{{period_end}}{% endraw %}",
|
|
||||||
"include_lower": true,
|
|
||||||
"include_upper": true,
|
|
||||||
"format": "epoch_millis",
|
|
||||||
"boost": 1
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}],
|
|
||||||
"adjust_pure_negative": true,
|
|
||||||
"boost": 1
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"aggregations": {}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
"Start" and "end" refer to the interval at which the monitor runs. See [Available variables](#available-variables).
|
|
||||||
|
|
||||||
To define a monitor visually, choose **Visual editor**. Then choose a source index, a timeframe, an aggregation (for example, `count()` or `average()`), a data filter if you want to monitor a subset of your source index, and a group-by field if you want to include an aggregation field in your query. At least one group-by field is required if you're defining a bucket-level monitor. Visual definition works well for most monitors.
|
|
||||||
|
|
||||||
If you use the Security plugin, you can only choose indexes that you have permission to access. For details, see [Alerting security]({{site.url}}{{site.baseurl}}/security/).
|
|
||||||
|
|
||||||
To use a query, choose **Extraction query editor**, add your query (using [OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index)), and test it using the **Run** button.
|
|
||||||
|
|
||||||
The monitor makes this query to OpenSearch as often as the schedule dictates; check the **Query Performance** section and make sure you're comfortable with the performance implications.
|
|
||||||
|
|
||||||
To use an anomaly detector, choose **Anomaly detector** and select your **Detector**.
|
|
||||||
|
|
||||||
The anomaly detection option is for pairing with the anomaly detection plugin. See [Anomaly Detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/).
|
|
||||||
For an anomaly detector, choose an appropriate schedule for the monitor based on the detector interval. Otherwise, the alerting monitor might miss reading the results.
|
|
||||||
|
|
||||||
For example, assume you set the monitor interval and the detector interval as 5 minutes, and you start the detector at 12:00. If an anomaly is detected at 12:05, it might be available at 12:06 because of the delay between writing the anomaly and it being available for queries. The monitor reads the anomaly results between 12:00 and 12:05, so it does not get the anomaly results available at 12:06.
|
|
||||||
|
|
||||||
To avoid this issue, make sure the alerting monitor interval is at least twice the detector interval.
|
|
||||||
When you create a monitor using OpenSearch Dashboards, the anomaly detector plugin generates a default monitor schedule that's twice the detector interval.
|
|
||||||
|
|
||||||
Whenever you update a detector’s interval, make sure to update the associated monitor interval as well, because the Anomaly Detection plugin does not do this automatically.
|
|
||||||
|
|
||||||
**Note**: Anomaly detection is available only if you are defining a per query monitor.
|
|
||||||
{: .note}
|
|
||||||
|
|
||||||
1. Choose how frequently to run your monitor. You can run it either by time intervals (minutes, hours, or days) or on a schedule. If you run it on a daily, weekly, or monthly schedule or according to a [custom cron expression]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron/), then you also need to provide the time zone.
|
|
||||||
|
|
||||||
1. Add a trigger to your monitor.
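The steps above can also be performed through the Alerting API. The following is a sketch of a query-level monitor that runs at the top of every hour on a cron schedule, filters the last hour of data using `period_end`, and triggers when any document matches. The index `server-metrics`, the `timestamp` field, the monitor and trigger names, and the time zone are hypothetical. See the [Alerting API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/) reference for the full request schema.

```json
POST _plugins/_alerting/monitors
{
  "type": "monitor",
  "monitor_type": "query_level_monitor",
  "name": "example-hourly-monitor",
  "enabled": true,
  "schedule": {
    "cron": {
      "expression": "0 * * * *",
      "timezone": "America/Los_Angeles"
    }
  },
  "inputs": [{
    "search": {
      "indices": ["server-metrics"],
      "query": {
        "size": 0,
        "query": {
          "range": {
            "timestamp": {
              "gte": "{% raw %}{{period_end}}{% endraw %}||-1h",
              "lte": "{% raw %}{{period_end}}{% endraw %}",
              "format": "epoch_millis"
            }
          }
        }
      }
    }
  }],
  "triggers": [{
    "name": "any-documents",
    "severity": "1",
    "condition": {
      "script": {
        "source": "ctx.results[0].hits.total.value > 0",
        "lang": "painless"
      }
    },
    "actions": []
  }]
}
```

Because the cron schedule and the range filter both span one hour, each run examines the data written since the previous run.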
|
|
||||||
|
|
||||||
|
|
||||||
## Create triggers
|
|
||||||
|
|
||||||
Steps to create a trigger differ depending on whether you chose **Visual editor**, **Extraction query editor**, or **Anomaly detector** when you created the monitor.
|
|
||||||
|
|
||||||
You begin by specifying a name and severity level for the trigger. Severity levels help you manage alerts. A trigger with a high severity level (for example, 1) might page a specific individual, whereas a trigger with a low severity level might message a chat room.
|
|
||||||
|
|
||||||
Remember that query-level monitors run your trigger's script just once against the query's results, but bucket-level monitors execute your trigger's script on each bucket, so you should create a trigger that best fits the monitor you chose. If you want to execute multiple scripts, you must create multiple triggers.
|
|
||||||
|
|
||||||
### Visual editor
|
|
||||||
|
|
||||||
For a query-level monitor's **Trigger condition**, specify a threshold for the aggregation and timeframe you chose earlier, such as "is below 1,000" or "is exactly 10."
|
|
||||||
|
|
||||||
The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true.
|
|
||||||
|
|
||||||
Bucket-level monitors also require you to specify a threshold and value for your aggregation and timeframe, but you can use a maximum of five conditions to better refine your trigger. Optionally, you can also use a keyword filter to filter for a specific field in your index.
|
|
||||||
|
|
||||||
Document-level monitors provide the added option to use tags that represent multiple queries connected by the logical OR operator.
|
|
||||||
|
|
||||||
To create a multiple query combination trigger, complete the following steps:
|
|
||||||
|
|
||||||
1. Create a per document monitor with more than one query.
|
|
||||||
2. Create the first query with a field, an operator, and a value. For example, set the query to search the `region` field with either the "is" or "is not" operator, and set the value to "us-west-2".
|
|
||||||
3. Select **Add Tag** and give the tag a name.
|
|
||||||
4. Create the second query and add the same tag to it.
|
|
||||||
5. Create the trigger condition and specify the tag name. This creates a combination trigger that checks both queries that contain the tag. The monitor evaluates the queries with a logical OR operation, so if either query's conditions are met, it generates the alert notification.
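Through the API, the same combination is expressed as queries inside a `doc_level_input`, each with a `tags` list. The fragment below is a sketch based on our reading of the [Document-level monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/#document-level-monitors) API reference. The index name, query IDs, query names, and the `us-regions` tag are hypothetical. With both queries tagged, a trigger condition that references the tag, such as `query[tag=us-regions]`, should match a document that satisfies either query.

```json
{
  "inputs": [{
    "doc_level_input": {
      "description": "Hypothetical log index monitored for two regions",
      "indices": ["app-logs"],
      "queries": [
        {
          "id": "query-us-west-2",
          "name": "region-us-west-2",
          "query": "region:\"us-west-2\"",
          "tags": ["us-regions"]
        },
        {
          "id": "query-us-east-1",
          "name": "region-us-east-1",
          "query": "region:\"us-east-1\"",
          "tags": ["us-regions"]
        }
      ]
    }
  }]
}
```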
|
|
||||||
|
|
||||||
### Extraction query
|
|
||||||
|
|
||||||
If you're using a query-level monitor, specify a Painless script that returns true or false. Painless is the default OpenSearch scripting language and has a syntax similar to Groovy.
|
|
||||||
|
|
||||||
Trigger condition scripts revolve around the `ctx.results[0]` variable, which corresponds to the extraction query response. For example, your script might reference `ctx.results[0].hits.total.value` or `ctx.results[0].hits.hits[i]._source.error_code`.
|
|
||||||
|
|
||||||
A return value of true means the trigger condition has been met, and the trigger should execute its actions. Test your script using the **Run** button.
|
|
||||||
|
|
||||||
The **Info** link next to **Trigger condition** contains a useful summary of the variables and results available to your query.
|
|
||||||
{: .tip }
|
|
||||||
|
|
||||||
Bucket-level monitors require you to specify more information in your trigger condition. At a minimum, you must have the following fields:
|
|
||||||
|
|
||||||
- `buckets_path`, which maps variable names to metrics to use in your script.
|
|
||||||
- `parent_bucket_path`, which is a path to a multi-bucket aggregation. The path can include single-bucket aggregations, but the last aggregation must be multi-bucket. For example, if you have a pipeline such as `agg1>agg2>agg3`, `agg1` and `agg2` are single-bucket aggregations, but `agg3` must be a multi-bucket aggregation.
|
|
||||||
- `script`, which is the script that OpenSearch runs to evaluate whether to trigger any alerts.
|
|
||||||
|
|
||||||
For example, you might have a script that looks like the following:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"buckets_path": {
|
|
||||||
"count_var": "_count"
|
|
||||||
},
|
|
||||||
"parent_bucket_path": "composite_agg",
|
|
||||||
"script": {
|
|
||||||
"source": "params.count_var > 5"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
After mapping the `count_var` variable to the `_count` metric, you can use `count_var` in your script and reference `_count` data. Finally, `composite_agg` is a path to a multi-bucket aggregation.
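The `parent_bucket_path` must point to a multi-bucket aggregation in the monitor's extraction query. As a sketch, an extraction query like the following would provide the `composite_agg` path used above. The `host.keyword` field is a hypothetical grouping field.

```json
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "composite_agg": {
      "composite": {
        "sources": [
          { "host": { "terms": { "field": "host.keyword" } } }
        ]
      }
    }
  }
}
```

With this query, `_count` is the document count of each `host` bucket, so the trigger above evaluates `params.count_var > 5` once per host.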
|
|
||||||
|
|
||||||
### Anomaly detector
|
|
||||||
|
|
||||||
For **Trigger type**, choose **Anomaly detector grade and confidence**.
|
|
||||||
|
|
||||||
Specify the **Anomaly grade condition** for the aggregation and timeframe you chose earlier, such as "IS ABOVE 0.7" or "IS EXACTLY 0.5." The *anomaly grade* is a number between 0 and 1 that indicates how anomalous a data point is.
|
|
||||||
|
|
||||||
Specify the **Anomaly confidence condition** for the aggregation and timeframe you chose earlier, such as "IS ABOVE 0.7" or "IS EXACTLY 0.5." The *anomaly confidence* is an estimate of the probability that the reported anomaly grade matches the expected anomaly grade.
|
|
||||||
|
|
||||||
The line moves up and down as you increase and decrease the threshold. Once this line is crossed, the trigger evaluates to true.
|
|
||||||
|
|
||||||
|
|
||||||
#### Sample scripts
|
|
||||||
|
|
||||||
{::comment}
|
|
||||||
These scripts are Painless, not Groovy, but calling them Groovy in Jekyll gets us syntax highlighting in the generated HTML.
|
|
||||||
{:/comment}
|
|
||||||
|
|
||||||
```groovy
|
|
||||||
// Evaluates to true if the query returned any documents
|
|
||||||
ctx.results[0].hits.total.value > 0
|
|
||||||
```
|
|
||||||
|
|
||||||
```groovy
|
|
||||||
// Returns true if the avg_cpu aggregation exceeds 90, and false otherwise
|
|
||||||
if (ctx.results[0].aggregations.avg_cpu.value > 90) {
|
|
||||||
return true;
|
|
||||||
}
return false;
|
|
||||||
```
|
|
||||||
|
|
||||||
```groovy
|
|
||||||
// Performs some crude custom scoring and returns true if that score exceeds a certain value
|
|
||||||
int score = 0;
|
|
||||||
for (int i = 0; i < ctx.results[0].hits.hits.length; i++) {
|
|
||||||
// Weighs 500 errors 10 times as heavily as 503 errors
|
|
||||||
if (ctx.results[0].hits.hits[i]._source.http_status_code == "500") {
|
|
||||||
score += 10;
|
|
||||||
} else if (ctx.results[0].hits.hits[i]._source.http_status_code == "503") {
|
|
||||||
score += 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (score > 99) {
|
|
||||||
return true;
|
|
||||||
} else {
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### Available variables
|
|
||||||
|
|
||||||
You can include the following variables in your message using Mustache templates to see more information about your monitors.
|
|
||||||
|
|
||||||
#### Monitor variables
|
|
||||||
|
|
||||||
Variable | Data type | Description
|
Variable | Data type | Description
|
||||||
:--- | :--- | :---
|
:--- | :--- | :---
|
||||||
|
@ -366,37 +42,13 @@ Variable | Data type | Description
|
||||||
`ctx.monitor.inputs.search.indices` | Array | An array that contains the indexes the monitor observes.
|
`ctx.monitor.inputs.search.indices` | Array | An array that contains the indexes the monitor observes.
|
||||||
`ctx.monitor.inputs.search.query` | N/A | The definition used to define the monitor.
|
`ctx.monitor.inputs.search.query` | N/A | The definition used to define the monitor.
|
||||||
|
|
||||||
#### Trigger variables
|
The following table lists other variables you can use with your monitors.
|
||||||
|
|
||||||
Variable | Data type | Description
|
|
||||||
:--- | :--- | :---
|
|
||||||
`ctx.trigger.id` | String | The trigger's ID.
|
|
||||||
`ctx.trigger.name` | String | The trigger's name.
|
|
||||||
`ctx.trigger.severity` | String | The trigger's severity.
|
|
||||||
`ctx.trigger.condition`| Object | Contains the Painless script used when creating the monitor.
|
|
||||||
`ctx.trigger.condition.script.source` | String | The script used to define the trigger.
|
|
||||||
`ctx.trigger.condition.script.lang` | String | The language used to define the script. Must be `painless`.
|
|
||||||
`ctx.trigger.actions`| Array | An array with one element that contains information about the action the monitor needs to trigger.
|
|
||||||
|
|
||||||
#### Action variables
|
|
||||||
|
|
||||||
Variable | Data type | Description
|
|
||||||
:--- | :--- | :---
|
|
||||||
`ctx.trigger.actions.id` | String | The action's ID.
|
|
||||||
`ctx.trigger.actions.name` | String | The action's name.
|
|
||||||
`ctx.trigger.actions.message_template.source` | String | The message to send in the alert.
|
|
||||||
`ctx.trigger.actions.message_template.lang` | String | The scripting language used to define the message. Must be Mustache.
|
|
||||||
`ctx.trigger.actions.throttle_enabled` | Boolean | Whether throttling is enabled for this action. See [adding actions](#add-actions) for more information about throttling.
|
|
||||||
`ctx.trigger.actions.subject_template.source` | String | The message's subject in the alert.
|
|
||||||
`ctx.trigger.actions.subject_template.lang` | String | The scripting language used to define the subject. Must be Mustache.
|
|
||||||
|
|
||||||
#### Other variables
|
|
||||||
|
|
||||||
Variable | Data type | Description
|
Variable | Data type | Description
|
||||||
:--- | :--- | :---
|
:--- | :--- | :---
|
||||||
`ctx.results` | Array | An array with one element (that is, `ctx.results[0]`). Contains the query results. This variable is empty if the trigger was unable to retrieve results. See `ctx.error`.
|
`ctx.results` | Array | An array with one element (that is, `ctx.results[0]`). Contains the query results. This variable is empty if the trigger was unable to retrieve results. See `ctx.error`.
|
||||||
`ctx.last_update_time` | Milliseconds | Unix epoch time of when the monitor was last updated.
|
`ctx.last_update_time` | Milliseconds | Unix epoch time of when the monitor was last updated.
|
||||||
`ctx.periodStart` | String | Unix timestamp for the beginning of the period during which the alert triggered. For example, if a monitor runs every ten minutes, a period might begin at 10:40 and end at 10:50.
|
`ctx.periodStart` | String | Unix timestamp for the beginning of the period during which the alert triggered. For example, if a monitor runs every 10 minutes, a period might begin at 10:40 and end at 10:50.
|
||||||
`ctx.periodEnd` | String | The end of the period during which the alert triggered.
|
`ctx.periodEnd` | String | The end of the period during which the alert triggered.
|
||||||
`ctx.error` | String | The error message if the trigger was unable to retrieve results or unable to evaluate the trigger, typically due to a compile error or null pointer exception. Null otherwise.
|
`ctx.error` | String | The error message if the trigger was unable to retrieve results or unable to evaluate the trigger, typically due to a compile error or null pointer exception. Null otherwise.
|
||||||
`ctx.alert` | Object | The current, active alert (if it exists). Includes `ctx.alert.id`, `ctx.alert.version`, and `ctx.alert.isAcknowledged`. Null if no alert is active. Only available with query-level monitors.
|
`ctx.alert` | Object | The current, active alert (if it exists). Includes `ctx.alert.id`, `ctx.alert.version`, and `ctx.alert.isAcknowledged`. Null if no alert is active. Only available with query-level monitors.
|
||||||
|
@ -406,186 +58,3 @@ Variable | Data type | Description
|
||||||
`bucket_keys` | String | Comma-separated list of the monitor's bucket key values. Available only for `ctx.dedupedAlerts`, `ctx.newAlerts`, and `ctx.completedAlerts`. Accessed through `ctx.dedupedAlerts[0].bucket_keys`.
|
`bucket_keys` | String | Comma-separated list of the monitor's bucket key values. Available only for `ctx.dedupedAlerts`, `ctx.newAlerts`, and `ctx.completedAlerts`. Accessed through `ctx.dedupedAlerts[0].bucket_keys`.
|
||||||
`parent_bucket_path` | String | The parent bucket path of the bucket that triggered the alert. Accessed through `ctx.dedupedAlerts[0].parent_bucket_path`.
|
`parent_bucket_path` | String | The parent bucket path of the bucket that triggered the alert. Accessed through `ctx.dedupedAlerts[0].parent_bucket_path`.
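As a sketch of how these bucket variables are used, the following action message iterates over `ctx.newAlerts` with a Mustache section and prints the bucket keys of each new alert. The message wording is only an example. The same pattern should also work with `ctx.dedupedAlerts` or `ctx.completedAlerts`.

```json
{
  "message_template": {
    "source": "{% raw %}New alerts for {{ctx.monitor.name}}:\n{{#ctx.newAlerts}}- {{bucket_keys}}\n{{/ctx.newAlerts}}{% endraw %}"
  }
}
```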
|
||||||
|
|
||||||
|
|
||||||
## Add actions
|
|
||||||
|
|
||||||
The final step in creating a monitor is to add one or more actions. Actions send notifications when trigger conditions are met. To learn which communication channels are supported, see the [Notifications plugin]({{site.url}}{{site.baseurl}}/notifications-plugin/index).
|
|
||||||
|
|
||||||
If you don't want to receive notifications for alerts, you don't have to add actions to your triggers. Instead, you can periodically check OpenSearch Dashboards.
|
|
||||||
{: .tip }
|
|
||||||
|
|
||||||
1. Specify a name for the action.
|
|
||||||
1. Choose a [notification channel]({{site.url}}{{site.baseurl}}/notifications-plugin/index).
|
|
||||||
1. Add a subject and body for the message.
|
|
||||||
|
|
||||||
You can add variables to your messages using [Mustache templates](https://mustache.github.io/mustache.5.html). You have access to `ctx.action.name`, the name of the current action, as well as all [trigger variables](#available-variables).
|
|
||||||
|
|
||||||
If your destination is a custom webhook that expects a particular data format, you might need to include JSON (or even XML) directly in the message body:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{% raw %}{ "text": "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue. - Trigger: {{ctx.trigger.name}} - Severity: {{ctx.trigger.severity}} - Period start: {{ctx.periodStart}} - Period end: {{ctx.periodEnd}}" }{% endraw %}
|
|
||||||
```
|
|
||||||
|
|
||||||
In this case, the message content must conform to the `Content-Type` header in the [custom webhook]({{site.url}}{{site.baseurl}}/notifications-plugin/index).
|
|
||||||
1. If you're using a bucket-level monitor, you can choose whether the monitor should perform an action for each execution or for each alert.
|
|
||||||
|
|
||||||
1. (Optional) Use action throttling to limit the number of notifications you receive within a given span of time.
|
|
||||||
|
|
||||||
For example, if a monitor checks a trigger condition every minute, you could receive one notification per minute. If you set action throttling to 60 minutes, you receive no more than one notification per hour, even if the trigger condition is met dozens of times in that hour.
|
|
||||||
|
|
||||||
1. Choose **Create**.
|
|
||||||
|
|
||||||
After an action sends a message, the content of that message has left the purview of the Security plugin. Securing access to the message (for example, access to the Slack channel) is your responsibility.
|
|
||||||
|
|
||||||
|
|
||||||
#### Sample message
|
|
||||||
|
|
||||||
```mustache
|
|
||||||
{% raw %}Monitor {{ctx.monitor.name}} just entered an alert state. Please investigate the issue.
|
|
||||||
- Trigger: {{ctx.trigger.name}}
|
|
||||||
- Severity: {{ctx.trigger.severity}}
|
|
||||||
- Period start: {{ctx.periodStart}}
|
|
||||||
- Period end: {{ctx.periodEnd}}{% endraw %}
|
|
||||||
```
|
|
||||||
|
|
||||||
If you want to use the `ctx.results` variable in a message, use `{% raw %}{{ctx.results.0}}{% endraw %}` rather than `{% raw %}{{ctx.results[0]}}{% endraw %}`. This difference is due to how Mustache handles bracket notation.
|
|
||||||
{: .note }
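The following minimal Python sketch illustrates why the dotted form works. It models a simplified dotted-name lookup in which each segment is either a map key or, if it is numeric, a list index. It is an assumption-based illustration, not the template engine that OpenSearch actually uses, and `resolve` and the sample `context` are hypothetical.

```python
from typing import Any


def resolve(name: str, context: dict) -> Any:
    """Resolve a Mustache-style dotted name such as "ctx.results.0".

    Simplified model: each dot-separated segment is looked up as a dictionary
    key, or as a list index when the segment is all digits. Returns None when
    a segment cannot be resolved (Mustache renders a missing value as empty).
    """
    current: Any = context
    for segment in name.split("."):
        if isinstance(current, dict) and segment in current:
            current = current[segment]
        elif isinstance(current, list) and segment.isdigit() and int(segment) < len(current):
            current = current[int(segment)]
        else:
            return None
    return current


context = {"ctx": {"results": [{"hits": {"total": {"value": 42}}}]}}

print(resolve("ctx.results.0.hits.total.value", context))   # 42
print(resolve("ctx.results[0].hits.total.value", context))  # None: "results[0]" is not a key
```

In this model, `results[0]` is treated as one literal key name, which is why the bracketed form resolves to nothing.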
|
|
||||||
|
|
||||||
### Questions about destinations
|
|
||||||
|
|
||||||
Q: What plugins do I need installed besides Alerting?
|
|
||||||
|
|
||||||
A: To continue using the notification action in the Alerting plugin, you need to install the backend plugins `notifications-core` and `notifications`. You can also install the Notifications Dashboards plugin to manage notification channels through OpenSearch Dashboards.
|
|
||||||
|
|
||||||
Q: Can I still create destinations?
|
|
||||||
A: No, destinations have been deprecated and can no longer be created or edited.
|
|
||||||
|
|
||||||
Q: Will I need to move my destinations to the Notifications plugin?
|
|
||||||
A: No. For users who upgrade, a background process automatically moves destinations to notification channels. These channels have the same IDs as the destinations, and monitor execution chooses the correct ID, so you don't have to make any changes to the monitor's definition. The migrated destinations are deleted.
|
|
||||||
|
|
||||||
Q: What happens if any destinations fail to migrate?
|
|
||||||
A: If a destination failed to migrate, the monitor will continue using it until the monitor is migrated to a notification channel. You don't need to do anything in this case.
|
|
||||||
|
|
||||||
Q: Do I need to install the Notifications plugins if monitors can still use destinations?
|
|
||||||
A: Yes. The fallback to destinations prevents failures in sending messages if migration fails; however, the Notifications plugin is what actually sends the message. If the Notifications plugin is not installed, the action fails.
|
|
||||||
|
|
||||||
|
|
||||||
## Work with alerts
|
|
||||||
|
|
||||||
Alerts persist until you resolve the root cause and have the following states:
|
|
||||||
|
|
||||||
State | Description
|
|
||||||
:--- | :---
|
|
||||||
Active | The alert is ongoing and unacknowledged. Alerts remain in this state until you acknowledge them, delete the trigger associated with the alert, or delete the monitor entirely.
|
|
||||||
Acknowledged | Someone has acknowledged the alert but has not fixed the root cause.
|
|
||||||
Completed | The alert is no longer ongoing. Alerts enter this state after the corresponding trigger evaluates to false.
|
|
||||||
Error | An error occurred while executing the trigger, usually the result of a bad trigger or destination.
|
|
||||||
Deleted | Someone deleted the monitor or trigger associated with this alert while the alert was ongoing.
|
|
||||||
|
|
||||||
|
|
||||||
## Create cluster metrics monitor
|
|
||||||
|
|
||||||
In addition to monitoring conditions for indexes, the Alerting plugin allows monitoring conditions for clusters. You can set alerts based on cluster metrics to watch for the following conditions:
|
|
||||||
|
|
||||||
- The health of your cluster reaches a status of yellow or red
|
|
||||||
- Cluster-level metrics, such as CPU usage and JVM memory usage, reach specified thresholds
|
|
||||||
- Node-level metrics, such as available disk space, JVM memory usage, and CPU usage, reach a specified threshold
|
|
||||||
- The total number of documents stored reaches a specified amount
|
|
||||||
|
|
||||||
To create a cluster metrics monitor:
|
|
||||||
|
|
||||||
1. Select **Alerting** > **Monitors** > **Create monitor**.
|
|
||||||
2. Select the **Per cluster metrics monitor** option.
|
|
||||||
3. In the Query section, pick the **Request type** from the dropdown.
|
|
||||||
4. (Optional) If you want to filter the API response to use only certain path parameters, enter those parameters under **Query parameters**. Most APIs that can be used to monitor cluster status support path parameters as described in their documentation (for example, comma-separated lists of index names).
|
|
||||||
5. In the Triggers section, indicate what conditions trigger an alert. The trigger condition autopopulates a Painless `ctx` variable. For example, a cluster monitor watching for Cluster Stats uses the trigger condition `ctx.results[0].indices.count <= 0`, which triggers an alert based on the number of indexes returned by the query. For more specificity, add any additional Painless conditions supported by the API. To see an example of the condition response, select **Preview condition response**.
|
|
||||||
6. In the Actions section, indicate how you want your users to be notified when a trigger condition is met.
|
|
||||||
7. Select **Create**. Your new monitor appears in the **Monitors** list.
|
|
||||||
|
|
||||||
### Supported APIs
|
|
||||||
|
|
||||||
Trigger conditions use responses from the following API endpoints. Most APIs that can be used to monitor cluster status support path parameters as described in their documentation (for example, comma-separated lists of index names). However, they do not support query parameters.
|
|
||||||
|
|
||||||
1. [_cluster/health]({{site.url}}{{site.baseurl}}/api-reference/cluster-health/)
|
|
||||||
2. [_cluster/stats]({{site.url}}{{site.baseurl}}/api-reference/cluster-stats/)
|
|
||||||
3. [_cluster/settings]({{site.url}}{{site.baseurl}}/api-reference/cluster-settings/)
|
|
||||||
4. [_nodes/stats]({{site.url}}{{site.baseurl}}/opensearch/popular-api/#get-node-statistics)
|
|
||||||
5. [_cat/pending_tasks]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-pending-tasks/)
|
|
||||||
6. [_cat/recovery]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-recovery/)
|
|
||||||
7. [_cat/snapshots]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-snapshots/)
|
|
||||||
8. [_cat/tasks]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-tasks/)
|
|
||||||
|
|
||||||
### Restrict API fields
|
|
||||||
|
|
||||||
If you want to hide fields from the API response that you do not want exposed for alerting, reconfigure the [supported_json_payloads.json](https://github.com/opensearch-project/alerting/blob/main/alerting/src/main/resources/org/opensearch/alerting/settings/supported_json_payloads.json) file inside the Alerting plugin. The file functions as an allow list for the API fields you want to use in an alert. By default, all APIs and their parameters can be used for monitors and trigger conditions.
|
|
||||||
|
|
||||||
However, you can modify the file so that cluster metrics monitors can be created only for the APIs it references. Furthermore, only fields referenced in the file can be used in trigger conditions. The following excerpt from `supported_json_payloads.json` allows a cluster metrics monitor to be created for the `_cluster/stats` API and allows trigger conditions on the `indices.shards.total` and `indices.shards.index.shards.min` fields.
|
|
||||||
|
|
||||||
```json
|
|
||||||
"/_cluster/stats": {
|
|
||||||
"indices": [
|
|
||||||
"shards.total",
|
|
||||||
"shards.index.shards.min"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Painless triggers
|
|
||||||
|
|
||||||
Painless scripts define triggers for cluster metrics monitors, similar to query-level or bucket-level monitors that are defined using the extraction query definition option. Painless scripts are composed of at least one statement and any additional functions you wish to execute.
|
|
||||||
|
|
||||||
The cluster metrics monitor supports up to **ten** triggers.
|
|
||||||
|
|
||||||
In this example, a JSON object creates a trigger that sends an alert when the cluster health is yellow. `script` sets `source` to the Painless script `ctx.results[0].status == "yellow"`, with the inner quotes escaped inside the JSON string.
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"name": "Cluster Health Monitor",
|
|
||||||
"type": "monitor",
|
|
||||||
"monitor_type": "query_level_monitor",
|
|
||||||
"enabled": true,
|
|
||||||
"schedule": {
|
|
||||||
"period": {
|
|
||||||
"unit": "MINUTES",
|
|
||||||
"interval": 1
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"inputs": [
|
|
||||||
{
|
|
||||||
"uri": {
|
|
||||||
"api_type": "CLUSTER_HEALTH",
|
|
||||||
"path": "_cluster/health/",
|
|
||||||
"path_params": "",
|
|
||||||
"url": "http://localhost:9200/_cluster/health/"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"triggers": [
|
|
||||||
{
|
|
||||||
"query_level_trigger": {
|
|
||||||
"id": "Tf_L_nwBti6R6Bm-18qC",
|
|
||||||
"name": "Yellow status trigger",
|
|
||||||
"severity": "1",
|
|
||||||
"condition": {
|
|
||||||
"script": {
|
|
||||||
"source": "ctx.results[0].status == \"yellow\"",
|
|
||||||
"lang": "painless"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"actions": []
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
See [trigger variables](#trigger-variables) for more Painless `ctx` options.
|
|
||||||
|
|
||||||
### Limitations
|
|
||||||
|
|
||||||
Currently, the cluster metrics monitor has the following limitations:
|
|
||||||
|
|
||||||
- You cannot create monitors for remote clusters.
|
|
||||||
- The OpenSearch cluster must be in a state where an index's conditions can be monitored and actions can be executed against the index.
|
|
||||||
- Removing resource permissions from a user will not prevent that user’s preexisting monitors for that resource from executing.
|
|
||||||
- Users with permissions to create monitors are not blocked from creating monitors for resources for which they do not have permissions; however, those monitors will not execute.
|
|
||||||
|
|
|
@ -0,0 +1,123 @@
|
||||||
|
---
|
||||||
|
layout: default
|
||||||
|
title: Per cluster metrics monitors
|
||||||
|
nav_order: 15
|
||||||
|
parent: Monitors
|
||||||
|
grand_parent: Alerting
|
||||||
|
has_children: false
|
||||||
|
---
|
||||||
|
|
||||||
|
# Per cluster metrics monitors
|
||||||
|
|
||||||
|
Per cluster metrics monitors are a type of alert monitor that collects and analyzes metrics from a single cluster, providing insights into the cluster's performance and health. You can set alerts to monitor certain conditions, such as when:
|
||||||
|
|
||||||
|
- Cluster health reaches yellow or red status.
|
||||||
|
- Cluster-level metrics---for example, CPU usage and JVM memory usage---reach specified thresholds.
|
||||||
|
- Node-level metrics---for example, available disk space, JVM memory usage, and CPU usage---reach specified thresholds.
|
||||||
|
- The total number of documents stored reaches a specified threshold.
|
||||||
|
|
||||||
|
## Create a cluster metrics monitor
|
||||||
|
|
||||||
|
To create a cluster metrics monitor, follow these steps:
|
||||||
|
|
||||||
|
1. Select **Alerting** > **Monitors** > **Create monitor**.
|
||||||
|
2. Select the **Per cluster metrics monitor** option.
|
||||||
|
3. In the Query section, pick the **Request type** from the dropdown list.
|
||||||
|
4. (Optional) If you want to filter the API response to use only certain path parameters, enter those parameters under **Query parameters**. Most APIs that can be used to monitor cluster status support path parameters as described in their documentation (for example, comma-separated lists of index names).
|
||||||
|
5. In the [Triggers]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/) section, indicate which conditions will trigger an alert. The trigger condition autopopulates a Painless `ctx` variable. For example, a cluster monitor watching for Cluster Stats uses the trigger condition `ctx.results[0].indices.count <= 0`, which triggers an alert based on the number of indexes returned by the query. For more specificity, add any additional Painless conditions supported by the API. To see an example of the condition response, select **Preview condition response**.
|
||||||
|
6. In the Actions section, indicate how you want your users to be notified when a trigger condition is met.
|
||||||
|
7. Select **Create**. Your new monitor appears in the **Monitors** list.
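As a rough illustration of the trigger condition in step 5, the following Python sketch mirrors the Painless expression `ctx.results[0].indices.count <= 0` against two hypothetical, heavily trimmed `_cluster/stats` responses. It assumes the API response is exposed to the trigger as the first element of `ctx.results`. The function name is hypothetical, and the real condition runs as Painless inside the Alerting plugin, not in Python.

```python
def indices_count_trigger(ctx: dict) -> bool:
    # Python mirror of the Painless condition: ctx.results[0].indices.count <= 0
    return ctx["results"][0]["indices"]["count"] <= 0


# Hypothetical, heavily trimmed _cluster/stats responses.
no_indexes = {"indices": {"count": 0}}
some_indexes = {"indices": {"count": 12}}

# Assumption: the monitor exposes each API response through the ctx.results list.
print(indices_count_trigger({"results": [no_indexes]}))    # True: the trigger fires
print(indices_count_trigger({"results": [some_indexes]}))  # False
```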
|
||||||
|
|
||||||
|
The following example shows a configuration of a cluster metrics monitor.
|
||||||
|
|
||||||
|
<img src="{{site.url}}{{site.baseurl}}/images/cluster-metrics.png" alt="Cluster metrics monitor" width="700"/>
|
||||||
|
|
||||||
|
## Supported APIs
|
||||||
|
|
||||||
|
Trigger conditions use responses from the following API endpoints. Most APIs that can be used to monitor cluster status support path parameters (for example, comma-separated lists of index names). They do not support query parameters.
|
||||||
|
|
||||||
|
- [_cluster/health]({{site.url}}{{site.baseurl}}/api-reference/cluster-health/)
|
||||||
|
- [_cluster/stats]({{site.url}}{{site.baseurl}}/api-reference/cluster-stats/)
|
||||||
|
- [_cluster/settings]({{site.url}}{{site.baseurl}}/api-reference/cluster-settings/)
|
||||||
|
- [_nodes/stats]({{site.url}}{{site.baseurl}}/opensearch/popular-api/#get-node-statistics)
|
||||||
|
- [_cat/indices]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-indices/)
|
||||||
|
- [_cat/pending_tasks]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-pending-tasks/)
|
||||||
|
- [_cat/recovery]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-recovery/)
|
||||||
|
- [_cat/shards]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-shards/)
|
||||||
|
- [_cat/snapshots]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-snapshots/)
|
||||||
|
- [_cat/tasks]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-tasks/)
|
||||||
|
|
||||||
|
## Restrict API fields
|
||||||
|
|
||||||
|
If you want to hide fields from the API response and not expose them for alerting, reconfigure the [supported_json_payloads.json](https://github.com/opensearch-project/alerting/blob/main/alerting/src/main/resources/org/opensearch/alerting/settings/supported_json_payloads.json) file inside the Alerting plugin. The file functions as an allow list for the API fields you want to use in an alert. By default, all APIs and their parameters can be used for monitors and trigger conditions.
|
||||||
|
|
||||||
|
However, you can modify the file so that cluster metrics monitors can be created only for the APIs it references. Furthermore, only fields referenced in the file can be used in trigger conditions. The following excerpt from `supported_json_payloads.json` allows a cluster metrics monitor to be created for the `_cluster/stats` API and allows trigger conditions on the `indices.shards.total` and `indices.shards.index.shards.min` fields.
|
||||||
|
|
||||||
|
```json
|
||||||
|
"/_cluster/stats": {
|
||||||
|
"indices": [
|
||||||
|
"shards.total",
|
||||||
|
"shards.index.shards.min"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
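To make the allow-list behavior concrete, the following Python sketch keeps only the fields named under an API entry and drops everything else. It is a hypothetical model, not the Alerting plugin's implementation: `filter_response`, `get_path`, `set_path`, and the sample response are assumptions, and the sketch takes the inner object of the `/_cluster/stats` entry as its allow list.

```python
from typing import Any, Dict, List


def get_path(obj: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as "indices.shards.total"; return None if any key is missing."""
    current: Any = obj
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(obj: Dict[str, Any], dotted: str, value: Any) -> None:
    """Set a value at a dotted path, creating intermediate dictionaries as needed."""
    keys = dotted.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def filter_response(response: Dict[str, Any], allowed: Dict[str, List[str]]) -> Dict[str, Any]:
    """Return a new dictionary that contains only the allow-listed fields."""
    filtered: Dict[str, Any] = {}
    for section, paths in allowed.items():
        for path in paths:
            full_path = f"{section}.{path}"  # for example, "indices.shards.total"
            value = get_path(response, full_path)
            if value is not None:
                set_path(filtered, full_path, value)
    return filtered


allow_list = {"indices": ["shards.total", "shards.index.shards.min"]}
stats_response = {
    "cluster_name": "my-cluster",
    "indices": {
        "count": 3,
        "shards": {"total": 6, "index": {"shards": {"min": 2, "max": 2}}},
    },
}
print(filter_response(stats_response, allow_list))
# {'indices': {'shards': {'total': 6, 'index': {'shards': {'min': 2}}}}}
```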
|
||||||
|
|
||||||
|
## Painless triggers
|
||||||
|
|
||||||
|
Painless scripts define triggers for cluster metrics monitors, similar to per query or per bucket monitors, which are defined using the extraction query definition option. Painless scripts are composed of at least one statement and any additional functions you wish to run.
|
||||||
|
|
||||||
|
The cluster metrics monitor supports up to **ten** triggers.
|
||||||
|
|
||||||
|
In the following example, a JSON object creates a trigger that sends an alert when the cluster health is yellow. `script` sets `source` to the Painless script `ctx.results[0].status == "yellow"`, with the inner quotes escaped inside the JSON string.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"name": "Cluster Health Monitor",
|
||||||
|
"type": "monitor",
|
||||||
|
"monitor_type": "query_level_monitor",
|
||||||
|
"enabled": true,
|
||||||
|
"schedule": {
|
||||||
|
"period": {
|
||||||
|
"unit": "MINUTES",
|
||||||
|
"interval": 1
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"inputs": [
|
||||||
|
{
|
||||||
|
"uri": {
|
||||||
|
"api_type": "CLUSTER_HEALTH",
|
||||||
|
"path": "_cluster/health/",
|
||||||
|
"path_params": "",
|
||||||
|
"url": "http://localhost:9200/_cluster/health/"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"triggers": [
|
||||||
|
{
|
||||||
|
"query_level_trigger": {
|
||||||
|
"id": "Tf_L_nwBti6R6Bm-18qC",
|
||||||
|
"name": "Yellow status trigger",
|
||||||
|
"severity": "1",
|
||||||
|
"condition": {
|
||||||
|
"script": {
|
||||||
|
"source": "ctx.results[0].status == \"yellow\"",
|
||||||
|
"lang": "painless"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"actions": []
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
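If you prefer the API to OpenSearch Dashboards, a monitor definition like the one above can be submitted to the Alerting create monitor endpoint, `POST _plugins/_alerting/monitors`. The following Python sketch uses only the standard library to build that request. It assumes an unsecured cluster at `http://localhost:9200` (matching the `url` field in the example); a cluster running the Security plugin would also need HTTPS and credentials. `build_create_monitor_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_create_monitor_request(base_url: str, monitor: dict) -> urllib.request.Request:
    """Build, but do not send, a POST request to the Alerting create monitor endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_alerting/monitors",
        data=json.dumps(monitor).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same monitor as the JSON example above, expressed as a Python dictionary.
monitor = {
    "name": "Cluster Health Monitor",
    "type": "monitor",
    "monitor_type": "query_level_monitor",
    "enabled": True,
    "schedule": {"period": {"unit": "MINUTES", "interval": 1}},
    "inputs": [{"uri": {
        "api_type": "CLUSTER_HEALTH",
        "path": "_cluster/health/",
        "path_params": "",
        "url": "http://localhost:9200/_cluster/health/",
    }}],
    "triggers": [{"query_level_trigger": {
        "id": "Tf_L_nwBti6R6Bm-18qC",
        "name": "Yellow status trigger",
        "severity": "1",
        # json.dumps escapes the inner quotes, producing the \"yellow\" seen in the JSON.
        "condition": {"script": {"source": 'ctx.results[0].status == "yellow"', "lang": "painless"}},
        "actions": [],
    }}],
}

request = build_create_monitor_request("http://localhost:9200", monitor)
# To send it to a local, unsecured test cluster (assumption), uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the body as a dictionary lets `json.dumps` escape the quotes around `yellow`, which avoids hand-escaping mistakes in the `source` string.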
|
||||||
|
|
||||||
|
See [Trigger variables]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/#trigger-variables) for more Painless `ctx` variable options.
|
||||||
|
|
||||||
|
## Limitations
|
||||||
|
|
||||||
|
Per cluster metrics monitors have the following limitations:
|
||||||
|
|
||||||
|
- You cannot create monitors for remote clusters.
|
||||||
|
- The OpenSearch cluster must be in a state where an index's conditions can be monitored and actions can be executed against the index.
|
||||||
|
- Removing resource permissions from a user will not prevent that user’s preexisting monitors for that resource from executing.
|
||||||
|
- Users with permissions to create monitors are not blocked from creating monitors for resources for which they do not have permissions; however, those monitors will not run.
|
|
@ -0,0 +1,52 @@
|
||||||
|
---
|
||||||
|
layout: default
|
||||||
|
title: Per document monitors
|
||||||
|
nav_order: 20
|
||||||
|
parent: Monitors
|
||||||
|
grand_parent: Alerting
|
||||||
|
has_children: false
|
||||||
|
---
|
||||||
|
|
||||||
|
# Per document monitors
|
||||||
|
Introduced 2.0
|
||||||
|
{: .label .label-purple }
|
||||||
|
|
||||||
|
Per document monitors are a type of alert monitor that can be used to identify and alert on specific documents in an OpenSearch index. For example, you can use the monitor to:
|
||||||
|
|
||||||
|
- Detect corrupted data or unauthorized changes.
|
||||||
|
- Enforce data quality policies, such as ensuring all documents contain a certain field or that values in a field are within a certain range.
|
||||||
|
- Track changes to a specific document over time, which can be helpful for auditing and compliance purposes.
|
||||||
|
|
||||||
|
## Defining queries
|
||||||
|
|
||||||
|
Per document monitors allow you to define up to 10 queries that compare a selected field with a desired value. You can compare supported field data types using the following operators:
|
||||||
|
|
||||||
|
- `is`
|
||||||
|
- `is not`
|
||||||
|
- `is greater than`
|
||||||
|
- `is greater than equal`
|
||||||
|
- `is less than`
|
||||||
|
- `is less than equal`
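Conceptually, each query applies one of these operators to a document field. The following Python sketch maps the operator labels to Python comparisons for illustration only. The `OPERATORS` table and `query_matches` helper are hypothetical, and treating a missing field as a non-match is an assumption of the sketch, not documented plugin behavior.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical mapping from the Dashboards operator labels to Python comparisons.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "is": operator.eq,
    "is not": operator.ne,
    "is greater than": operator.gt,
    "is greater than equal": operator.ge,
    "is less than": operator.lt,
    "is less than equal": operator.le,
}


def query_matches(document: dict, field: str, op: str, value: Any) -> bool:
    """Return True if document[field] satisfies the comparison.

    Assumption: a document without the field never matches.
    """
    if field not in document:
        return False
    return OPERATORS[op](document[field], value)


doc = {"status_code": 503, "service": "checkout"}
print(query_matches(doc, "status_code", "is greater than equal", 500))  # True
print(query_matches(doc, "service", "is not", "checkout"))              # False
```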
|
||||||
|
|
||||||
|
Each [trigger]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/) can use up to 10 tags; you add a tag as a single trigger condition instead of specifying a single query. The [Alerting plugin]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/monitors/) processes the trigger conditions from all queries as a logical `OR` operation, so if any of the query conditions are met, it triggers an alert. The Alerting plugin then tells the [Notifications plugin]({{site.url}}{{site.baseurl}}/observing-your-data/notifications/index/) to send the alert notification to a channel.
|
||||||
|
|
||||||
|
You can only use _tags_ (labels that can be applied to multiple queries to combine them with the logical `OR` operation) in a per document monitor.
|
||||||
|
{: .important}
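The tag-based `OR` behavior can be sketched as follows. In this hypothetical Python model, each query carries a predicate and a list of tags, and a trigger that uses a tag fires when any query with that tag matches the document. The `DocQuery` class, `trigger_fires` function, and sample queries are illustrative assumptions, not Alerting plugin APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DocQuery:
    name: str
    predicate: Callable[[dict], bool]  # the query's field comparison
    tags: List[str] = field(default_factory=list)


def trigger_fires(document: dict, queries: List[DocQuery], tag: str) -> bool:
    """Fire if ANY query carrying the tag matches the document (logical OR)."""
    return any(q.predicate(document) for q in queries if tag in q.tags)


queries = [
    DocQuery("server_errors", lambda d: d.get("status_code", 0) >= 500, tags=["errors"]),
    DocQuery("auth_failures", lambda d: d.get("event") == "login_failed", tags=["errors", "security"]),
    DocQuery("slow_requests", lambda d: d.get("latency_ms", 0) > 2000, tags=["perf"]),
]

doc = {"status_code": 200, "event": "login_failed", "latency_ms": 120}
print(trigger_fires(doc, queries, "errors"))  # True: auth_failures matches
print(trigger_fires(doc, queries, "perf"))    # False
```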
|
||||||
|
|
||||||
|
## Document findings
|
||||||
|
|
||||||
|
The Alerting plugin creates a list of _Findings_ that contain metadata about which document matches each query. A _Finding_ is a record of a document that the per document monitor query identified as meeting the alert condition. Key components of a finding include the document ID, the timestamp, and the alert condition details. Findings are stored in the Findings index, `.opensearch-alerting-finding*`.
|
||||||
|
|
||||||
|
Security Analytics can use the findings data to keep track of and analyze the query data separately from the alert processes. See [Working with findings]({{site.url}}{{site.baseurl}}/security-analytics/usage/findings/) to learn more.
|
||||||
|
{: .note}
|
||||||
|
|
||||||
|
The Alerting API also provides a _document-level monitor_ that programmatically accomplishes the same function as the _per document monitor_ in OpenSearch Dashboards. See [Document-level monitors]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/api/#document-level-monitors) to learn more.
|
||||||
|
|
||||||
|
To prevent a large volume of findings in a high-ingestion cluster, configuring alert notifications for each finding is not recommended unless rules are well defined.
|
||||||
|
{: .important}
|
||||||
|
|
||||||
|
The following metadata is provided for each document findings entry:
|
||||||
|
|
||||||
|
* **Document**: The document ID and index name. For example: `Re5akdirhj3fl | test-logs-index`.
|
||||||
|
* **Query**: The query name that matched the document.
|
||||||
|
* **Time found**: The timestamp that indicates when the document was found during the monitor run.
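The following Python sketch models a findings entry with the three pieces of metadata listed above. The `Finding` class and `record_findings` helper are hypothetical, and producing one finding per matching query is an assumption of the sketch; the actual documents in `.opensearch-alerting-finding*` may use a different structure.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory model of one document findings entry."""
    document_id: str
    index: str
    query_name: str
    time_found: datetime

    def document_label(self) -> str:
        # Same "<document ID> | <index name>" layout as the Document example.
        return f"{self.document_id} | {self.index}"


def record_findings(document_id: str, index: str, matched_queries: List[str],
                    now: Optional[datetime] = None) -> List[Finding]:
    """Create one finding per query that matched the document."""
    found_at = now or datetime.now(timezone.utc)
    return [Finding(document_id, index, name, found_at) for name in matched_queries]


findings = record_findings(
    "Re5akdirhj3fl", "test-logs-index", ["auth_failures"],
    now=datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc),
)
for finding in findings:
    print(finding.document_label(), finding.query_name, finding.time_found.isoformat())
# Re5akdirhj3fl | test-logs-index auth_failures 2023-08-01T12:00:00+00:00
```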
|
|
@ -0,0 +1,99 @@
|
||||||
|
---
|
||||||
|
layout: default
|
||||||
|
title: Per query and per bucket monitors
|
||||||
|
nav_order: 5
|
||||||
|
parent: Monitors
|
||||||
|
grand_parent: Alerting
|
||||||
|
has_children: false
|
||||||
|
---
|
||||||
|
|
||||||
|
# Per query and per bucket monitors
|
||||||
|
|
||||||
|
Per query monitors are a type of alert monitor that runs a query against an OpenSearch index and alerts on the results; for example, you can use them to detect and respond to anomalies in the results of specific queries. Per query monitors trigger only one alert at a time.
|
||||||
|
|
||||||
|
Per bucket monitors are a type of alert monitor that can be used to identify and alert on specific buckets of data that are created by a query against an OpenSearch index.
|
||||||
|
|
||||||
|
## Creating a per query or per bucket monitor
|
||||||
|
|
||||||
|
To create a per query or per bucket monitor, follow these steps:
|
||||||
|
|
||||||
|
**Step 1.** Define your query and [triggers]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/). You can use any of these methods: visual editor, query editor, or anomaly detector.
|
||||||
|
|
||||||
|
- Visual definition works well for monitors that can be defined as "some value is above or below some threshold for some amount of time." It also works well for most monitors.
|
||||||
|
|
||||||
|
- Query definition provides flexibility in relation to your query (using [OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index)) and how you evaluate the results of that query (Painless scripting).
|
||||||
|
|
||||||
|
The following example averages the `cpu_usage` field:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"size": 0,
|
||||||
|
"query": {
|
||||||
|
"match_all": {}
|
||||||
|
},
|
||||||
|
"aggs": {
|
||||||
|
"avg_cpu": {
|
||||||
|
"avg": {
|
||||||
|
"field": "cpu_usage"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
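When the monitor runs the `avg_cpu` query above, the trigger evaluates the aggregation result. The following Python sketch mirrors a hypothetical Painless condition, `ctx.results[0].aggregations.avg_cpu.value > 80`, against trimmed sample responses. It assumes the search response is exposed to the trigger as `ctx.results[0]`; the threshold of 80 and the function names are illustrative only. The `avg` aggregation returns `null` when no documents match, so the sketch guards against `None`.

```python
from typing import Optional


def avg_cpu(search_response: dict) -> Optional[float]:
    """Read the avg_cpu aggregation value (None when no documents matched)."""
    return search_response["aggregations"]["avg_cpu"]["value"]


def cpu_trigger(ctx: dict, threshold: float = 80.0) -> bool:
    """Python mirror of a hypothetical Painless condition:
    ctx.results[0].aggregations.avg_cpu.value > 80
    """
    value = avg_cpu(ctx["results"][0])
    return value is not None and value > threshold


# Trimmed, hypothetical responses; "size": 0 means no individual hits are returned.
busy = {"hits": {"hits": []}, "aggregations": {"avg_cpu": {"value": 91.5}}}
idle = {"hits": {"hits": []}, "aggregations": {"avg_cpu": {"value": 12.0}}}
empty = {"hits": {"hits": []}, "aggregations": {"avg_cpu": {"value": None}}}

for response in (busy, idle, empty):
    print(cpu_trigger({"results": [response]}))
# True
# False
# False
```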
|
||||||
|
|
||||||
|
You can also filter query results using `{% raw %}{{period_start}}{% endraw %}` and `{% raw %}{{period_end}}{% endraw %}`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"size": 0,
|
||||||
|
"query": {
|
||||||
|
"bool": {
|
||||||
|
"filter": [{
|
||||||
|
"range": {
|
||||||
|
"timestamp": {
|
||||||
|
"from": "{% raw %}{{period_end}}{% endraw %}||-1h",
|
||||||
|
"to": "{% raw %}{{period_end}}{% endraw %}",
|
||||||
|
"include_lower": true,
|
||||||
|
"include_upper": true,
|
||||||
|
"format": "epoch_millis",
|
||||||
|
"boost": 1
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}],
|
||||||
|
"adjust_pure_negative": true,
|
||||||
|
"boost": 1
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"aggregations": {}
|
||||||
|
}
|
||||||
|
```
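To see what the range filter looks like at run time, the following Python sketch substitutes the `period_start` and `period_end` placeholders with epoch-millisecond values and parses the result. It is a simplified stand-in for the monitor's own template rendering, and the template is a trimmed copy of the example above; the helper name and timestamps are hypothetical.

{% raw %}
```python
import json
import re

# Trimmed copy of the query template above.
TEMPLATE = """
{"size": 0,
 "query": {"bool": {"filter": [{"range": {"timestamp": {
   "from": "{{period_end}}||-1h",
   "to": "{{period_end}}",
   "format": "epoch_millis"}}}]}}}
"""


def render_period_placeholders(template: str, period_start_ms: int, period_end_ms: int) -> dict:
    """Replace the period_start/period_end placeholders with epoch-millisecond
    strings and parse the rendered JSON. Simplified stand-in for Mustache."""
    values = {"period_start": str(period_start_ms), "period_end": str(period_end_ms)}
    pattern = re.compile(r"\{\{\s*(period_start|period_end)\s*\}\}")
    rendered = pattern.sub(lambda match: values[match.group(1)], template)
    return json.loads(rendered)


query = render_period_placeholders(TEMPLATE, 1_699_996_400_000, 1_700_000_000_000)
print(query["query"]["bool"]["filter"][0]["range"]["timestamp"])
# {'from': '1700000000000||-1h', 'to': '1700000000000', 'format': 'epoch_millis'}
```
{% endraw %}

The `||-1h` suffix is date math applied to the rendered anchor, so the filter covers the hour ending at the period end.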
|
||||||
|
|
||||||
|
`period_start` and `period_end` refer to the start and end of the interval at which the monitor runs. See [Monitor variables]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/monitors/#monitor-variables).
|
||||||
|
|
||||||
|
To define a monitor visually, choose **Visual editor**. Then choose a source index, a time frame, an aggregation (for example, `count()` or `average()`), a data filter (if you want to monitor a subset of your source index), and a group-by field if you want to include an aggregation field in your query. At least one group-by field is required if you're defining a per bucket monitor.
|
||||||
|
|
||||||
|
Visual definition works well for most monitors.
|
||||||
|
{: .tip }
|
||||||
|
|
||||||
|
If you use the Security plugin, you can only choose indexes that you have permission to access. For details, see [Alerting security]({{site.url}}{{site.baseurl}}/security/).
|
||||||
|
|
||||||
|
To use a query, choose **Extraction query editor**, add your query (using [OpenSearch query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index)), and test it using the **Run** button.
|
||||||
|
|
||||||
|
The monitor makes this query to OpenSearch as often as the schedule dictates; check the **Query Performance** section and make sure you're comfortable with the performance implications.
|
||||||
|
|
||||||
|
Anomaly detection is available only if you are defining a per query monitor.
|
||||||
|
{: .warning}
|
||||||
|
|
||||||
|
To use an anomaly detector, choose **Anomaly detector** and select your **Detector**.
|
||||||
|
|
||||||
|
The anomaly detection option is for pairing with the Anomaly Detection plugin. See [Anomaly Detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/).
|
||||||
|
|
||||||
|
For an anomaly detector, choose an appropriate schedule for the monitor based on the detector interval. Otherwise, the alerting monitor might miss reading the results. For example, assume you set both the monitor interval and the detector interval to 5 minutes and you start the detector at 12:00. If an anomaly is detected at 12:05, it might not be available until 12:06 because of the delay between writing the anomaly and it becoming available for queries. The monitor reads the anomaly results between 12:00 and 12:05, so it does not get the anomaly results available at 12:06.
|
||||||
|
|
||||||
|
To avoid this issue, make sure the alerting monitor interval is at least twice the detector interval. When you create a monitor using OpenSearch Dashboards, the Anomaly Detection plugin generates a default monitor schedule that's twice the detector interval.
|
||||||
|
|
||||||
|
Whenever you update a detector’s interval, make sure to update the associated monitor interval, as the Anomaly Detection plugin does not do this automatically.
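The interval rule reduces to simple arithmetic. The following Python sketch checks whether a monitor interval is at least twice the detector interval. The function names are hypothetical, and because the Anomaly Detection plugin does not update the monitor for you, you would re-run such a check after changing a detector's interval.

```python
from datetime import timedelta


def recommended_monitor_interval(detector_interval: timedelta) -> timedelta:
    """Twice the detector interval, the default that OpenSearch Dashboards generates."""
    return 2 * detector_interval


def monitor_interval_is_safe(monitor_interval: timedelta, detector_interval: timedelta) -> bool:
    """True if the monitor interval is at least twice the detector interval."""
    return monitor_interval >= recommended_monitor_interval(detector_interval)


detector = timedelta(minutes=5)
print(recommended_monitor_interval(detector))                     # 0:10:00
print(monitor_interval_is_safe(timedelta(minutes=5), detector))   # False: the 12:05/12:06 case
print(monitor_interval_is_safe(timedelta(minutes=10), detector))  # True
```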
|
||||||
|
|
||||||
|
**Step 2.** Choose how often to run the monitor: either by time interval (minutes, hours, or days) or on a schedule. If you run it by time interval or on a [custom cron expression]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron/), then you must provide the time zone.
|
||||||
|
|
||||||
|
**Step 3.** Add a trigger to the monitor.
|
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Triggers
|
title: Triggers
|
||||||
nav_order: 10
|
nav_order: 40
|
||||||
grand_parent: Alerting
|
grand_parent: Alerting
|
||||||
parent: Monitors
|
parent: Monitors
|
||||||
---
|
---
|
||||||
|
|
Binary file not shown.
After Width: | Height: | Size: 174 KiB |