Add searchbp metrics to Performance Analyzer (#5390)

* added searchbp metrics

Signed-off-by: Heather Halter <hdhalter@amazon.com>

* Update reference.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update reference.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Heather Halter <hdhalter@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Heather Halter 2024-02-13 11:11:52 -08:00 committed by GitHub
parent 9799a90c6b
commit b6b2ed7fa0
1 changed files with 165 additions and 19 deletions


@@ -743,27 +743,173 @@ The following metrics are relevant to the cluster as a whole and do not require
</tbody>
</table>
## Relevant dimensions: `NodeID`, `searchbp_mode`
<table>
<thead style="text-align: left">
<tr>
<th>Metric</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SearchBP_Shard_Stats_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_LimitReachedCount
</td>
<td>The number of times that the cancellable task total exceeded the set cancellation threshold at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_Heap_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the shard task level because of excessive heap usage since the node last restarted.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_Heap_Usage_CurrentMax
</td>
<td>The maximum heap usage for tasks currently running at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_Heap_Usage_RollingAvg
</td>
<td>The rolling average heap usage for the _n_ most recent tasks at the shard task level. The default value for _n_ is `100`.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_CPU_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the shard task level because of excessive CPU usage since the node last restarted.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_CPU_Usage_CurrentMax
</td>
<td>The maximum CPU time for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_CPU_Usage_CurrentAvg
</td>
<td>The average CPU time for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_ElaspedTime_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the shard task level because of excessive elapsed time since the node last restarted.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_ElaspedTime_Usage_CurrentMax
</td>
<td>The maximum time elapsed for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>SearchBP_Shard_Stats_Resource_ElaspedTime_Usage_CurrentAvg
</td>
<td>The average time elapsed for all tasks currently running on the node at the shard task level.
</td>
</tr>
<tr>
<td>Searchbp_Task_Stats_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_LimitReachedCount
</td>
<td>The number of times that the cancellable task total exceeded the set cancellation threshold at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_Heap_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the search task level because of excessive heap usage since the node last restarted.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_Heap_Usage_CurrentMax
</td>
<td>The maximum heap usage for tasks currently running at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_Heap_Usage_RollingAvg
</td>
<td>The rolling average heap usage for the _n_ most recent tasks at the search task level. The default value for _n_ is `10`.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_CPU_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the search task level because of excessive CPU usage since the node last restarted.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_CPU_Usage_CurrentMax
</td>
<td>The maximum CPU time for all tasks currently running on the node at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_CPU_Usage_CurrentAvg
</td>
<td>The average CPU time for all tasks currently running on the node at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_ElaspedTime_Usage_CancellationCount
</td>
<td>The number of tasks marked for cancellation at the search task level because of excessive elapsed time since the node last restarted.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_ElaspedTime_Usage_CurrentMax
</td>
<td>The maximum time elapsed for all tasks currently running on the node at the search task level.
</td>
</tr>
<tr>
<td>SearchBP_Task_Stats_Resource_ElaspedTime_Usage_CurrentAvg
</td>
<td>The average time elapsed for all tasks currently running on the node at the search task level.
</td>
</tr>
</tbody>
</table>
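The `RollingAvg` metrics in the preceding table report an average over the _n_ most recent tasks. The following Python sketch illustrates what such a windowed average means. It is not taken from the OpenSearch source code, and the `RollingAverage` class and its method names are hypothetical:

```python
from collections import deque


class RollingAverage:
    """Mean of the n most recent observations (hypothetical helper, for illustration only)."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A bounded deque discards the oldest value once the window is full.
        self._window = deque(maxlen=window_size)
        self._total = 0.0

    def record(self, value):
        """Add one observation (for example, a task's heap usage) and return the new average."""
        if len(self._window) == self._window.maxlen:
            # The oldest value is about to be evicted, so remove it from the running total.
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self.average()

    def average(self):
        """Return the mean of the values currently in the window, or 0.0 if it is empty."""
        return self._total / len(self._window) if self._window else 0.0


# Window of 3: after four observations, only the last three count.
heap_avg = RollingAverage(window_size=3)
for usage in (10, 20, 30, 40):
    heap_avg.record(usage)
print(heap_avg.average())
```

With a window of `100` (shard task level) or `10` (search task level), a single unusually large task shifts the average less at the shard task level than at the search task level.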
## Dimensions reference
| Dimension | Return values |
|----------------------|-------------------------------------------------|
| ShardID | The ID of the shard, for example, `1`. |
| IndexName | The name of the index, for example, `my-index`. |
| Operation | The type of operation, for example, `shardbulk`. |
| ShardRole | The shard role, for example, `primary` or `replica`. |
| Exception | OpenSearch exceptions, for example, `org.opensearch.index_not_found_exception`. |
| Indices | The list of indexes in the request URL. |
| HTTPRespCode | The response code from OpenSearch, for example, `200`. |
| MemType | The memory type, for example, `totYoungGC`, `totFullGC`, `Survivor`, `PermGen`, `OldGen`, `Eden`, `NonHeap`, or `Heap`. |
| DiskName | The name of the disk, for example, `sda1`. |
| DestAddr | The destination address, for example, `010015AC`. |
| Direction | The direction, for example, `in` or `out`. |
| ThreadPoolType | The OpenSearch thread pools, for example, `index`, `search`, or `snapshot`. |
| CBType | The circuit breaker type, for example, `accounting`, `fielddata`, `in_flight_requests`, `parent`, or `request`. |
| ClusterManagerTaskInsertOrder| The order in which the task was inserted, for example, `3691`. |
| ClusterManagerTaskPriority | The priority of the task, for example, `URGENT`. OpenSearch executes higher-priority tasks before lower-priority ones, regardless of `insert_order`. |
| ClusterManagerTaskType | The task type, for example, `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, or `Update snapshot state`. |
| ClusterManagerTaskMetadata | The metadata for the task (if any). |
| CacheType | The cache type, for example, `Field_Data_Cache`, `Shard_Request_Cache`, or `Node_Query_Cache`. |
| `ShardID` | The ID of the shard, for example, `1`. |
| `IndexName` | The name of the index, for example, `my-index`. |
| `Operation` | The type of operation, for example, `shardbulk`. |
| `ShardRole` | The shard role, for example, `primary` or `replica`. |
| `Exception` | OpenSearch exceptions, for example, `org.opensearch.index_not_found_exception`. |
| `Indices` | The list of indexes in the request URL. |
| `HTTPRespCode` | The OpenSearch response code, for example, `200`. |
| `MemType` | The memory type, for example, `totYoungGC`, `totFullGC`, `Survivor`, `PermGen`, `OldGen`, `Eden`, `NonHeap`, or `Heap`. |
| `DiskName` | The name of the disk, for example, `sda1`. |
| `DestAddr` | The destination address, for example, `010015AC`. |
| `Direction` | The direction, for example, `in` or `out`. |
| `ThreadPoolType` | The OpenSearch thread pools, for example, `index`, `search`, or `snapshot`. |
| `CBType` | The circuit breaker type, for example, `accounting`, `fielddata`, `in_flight_requests`, `parent`, or `request`. |
| `ClusterManagerTaskInsertOrder`| The order in which the task was inserted, for example, `3691`. |
| `ClusterManagerTaskPriority` | The priority of the task, for example, `URGENT`. OpenSearch executes higher-priority tasks before lower-priority ones, regardless of `insert_order`. |
| `ClusterManagerTaskType` | The task type, for example, `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, or `Update snapshot state`. |
| `ClusterManagerTaskMetadata` | The metadata for the task (if any). |
| `CacheType` | The cache type, for example, `Field_Data_Cache`, `Shard_Request_Cache`, or `Node_Query_Cache`. |
| `NodeID` | The ID of the node. |
| `Searchbp_mode` | The search backpressure mode, for example, `monitor_only` (default), `enforced`, or `disabled`. |
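
As an illustration of how the metrics and dimensions fit together, the following Python sketch builds a Performance Analyzer metrics request URL. It assumes the default Performance Analyzer port (`9600`) and the `_plugins/_performanceanalyzer/metrics` endpoint with `metrics`, `agg`, `dim`, and `nodes` parameters. The `build_metrics_url` helper is hypothetical, and the dimension names are copied from the heading of the search backpressure metrics table:

```python
from urllib.parse import urlencode


def build_metrics_url(metrics, aggs, dims, host="localhost", port=9600, nodes="all"):
    """Build a Performance Analyzer metrics URL (hypothetical helper, not part of any client library)."""
    if len(metrics) != len(aggs):
        # Assumption: one aggregation is supplied per requested metric.
        raise ValueError("provide exactly one aggregation per metric")
    query = urlencode(
        {
            "metrics": ",".join(metrics),
            "agg": ",".join(aggs),
            "dim": ",".join(dims),
            "nodes": nodes,
        },
        safe=",",  # keep commas readable instead of percent-encoding them
    )
    return f"http://{host}:{port}/_plugins/_performanceanalyzer/metrics?{query}"


url = build_metrics_url(
    metrics=["SearchBP_Shard_Stats_CancellationCount"],
    aggs=["sum"],
    dims=["NodeID", "searchbp_mode"],
)
print(url)
```

The sketch only constructs the URL. Sending the request and interpreting the response depend on your cluster's security and network configuration.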