Merge pull request #275 from kjoseph07/karisjos-add-metrics

Add PA metrics to the metrics documentation page
Keith Chan 2021-11-23 09:46:35 -08:00 committed by GitHub
commit 2b3014c66b
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 162 additions and 4 deletions


@ -26,7 +26,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<tr> <tr>
<td>CPU_Utilization <td>CPU_Utilization
</td> </td>
<td rowspan="18">ShardID, IndexName, Operation, ShardRole <td rowspan="20">ShardID, IndexName, Operation, ShardRole
</td> </td>
<td>CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds. <td>CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds.
</td> </td>
@ -121,6 +121,18 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The total number of times that the associated thread(s) blocked to enter or reenter a monitor (i.e. the number of times a thread has been in the blocked state). <td>The total number of times that the associated thread(s) blocked to enter or reenter a monitor (i.e. the number of times a thread has been in the blocked state).
</td> </td>
</tr> </tr>
<tr>
<td>Thread_Waited_Time
</td>
<td>Average time (seconds) that the associated thread(s) waited to enter or reenter a monitor in WAITING or TIMED_WAITING state.
</td>
</tr>
<tr>
<td>Thread_Waited_Event
</td>
<td>The total number of times that the associated thread(s) waited to enter or reenter a monitor (i.e. the number of times a thread has been in the WAITING or TIMED_WAITING state).
</td>
</tr>
<tr> <tr>
<td>ShardEvents <td>ShardEvents
</td> </td>
@ -315,6 +327,37 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>Estimated disk usage of the shard in bytes. <td>Estimated disk usage of the shard in bytes.
</td> </td>
</tr> </tr>
<tr>
<td>Indexing_Pressure_Current_Limits
</td>
<td rowspan="5">ShardID, IndexName, IndexingStage
</td>
<td>Total heap size (in bytes) that is available for utilization by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Current_Bytes
</td>
<td>Total heap size (in bytes) occupied by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Last_Successful_Timestamp
</td>
<td>Timestamp of the last successful request for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Rejection_Count
</td>
<td>Total rejections performed by OpenSearch for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
</td>
</tr>
<tr>
<td>Indexing_Pressure_Average_Window_Throughput
</td>
<td>Average throughput of the last n requests for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica). The value of n is determined by the `shard_indexing_pressure.secondary_parameter.throughput.request_size_window` setting.
</td>
</tr>
<tr> <tr>
<td>Latency <td>Latency
</td> </td>
@ -424,7 +467,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
</td> </td>
<td rowspan="5">Direction <td rowspan="5">Direction
</td> </td>
<td>The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error <td>The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error.
</td> </td>
</tr> </tr>
<tr> <tr>
@ -454,7 +497,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<tr> <tr>
<td>ThreadPool_QueueSize <td>ThreadPool_QueueSize
</td> </td>
<td rowspan="4">ThreadPoolType <td rowspan="6">ThreadPoolType
</td> </td>
<td>The size of the task queue. <td>The size of the task queue.
</td> </td>
@ -477,10 +520,22 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The approximate number of threads that are actively executing tasks. <td>The approximate number of threads that are actively executing tasks.
</td> </td>
</tr> </tr>
<tr>
<td>ThreadPool_QueueLatency
</td>
<td>The latency of the task queue.
</td>
</tr>
<tr>
<td>ThreadPool_QueueCapacity
</td>
<td>The current capacity of the task queue.
</td>
</tr>
<tr> <tr>
<td>Master_PendingQueueSize <td>Master_PendingQueueSize
</td> </td>
<td>N/A <td>Master_PendingTaskType
</td> </td>
<td>The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks (create index, update mapping, allocate shard, fail shard, etc.). <td>The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks (create index, update mapping, allocate shard, fail shard, etc.).
</td> </td>
@ -533,6 +588,108 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
<td>The time (milliseconds) that a master task has been executed. <td>The time (milliseconds) that a master task has been executed.
</td> </td>
</tr> </tr>
<tr>
<td>Cache_MaxSize
</td>
<td>CacheType
</td>
<td>The maximum size of the cache in bytes.
</td>
</tr>
<tr>
<td>AdmissionControl_RejectionCount (WIP)
</td>
<td rowspan="3">ControllerName
</td>
<td>Total rejections performed by a Controller of Admission Control.
</td>
</tr>
<tr>
<td>AdmissionControl_CurrentValue (WIP)
</td>
<td>Current value for Controller of Admission Control.
</td>
</tr>
<tr>
<td>AdmissionControl_ThresholdValue (WIP)
</td>
<td>Threshold value for Controller of Admission Control.
</td>
</tr>
<tr>
<td>Data_RetryingPendingTasksCount (WIP)
</td>
<td rowspan="2">NodeID
</td>
<td>The number of throttled pending tasks that the data node is actively retrying. This is an absolute value at that point in time.
</td>
</tr>
<tr>
<td>Master_ThrottledPendingTasksCount (WIP)
</td>
<td>The total number of pending tasks throttled by the master node. This metric is cumulative, so use the max aggregation.
</td>
</tr>
<tr>
<td>Election_Term (WIP)
</td>
<td rowspan="5">N/A
</td>
<td>A number that increases monotonically with every master election.
</td>
</tr>
<tr>
<td>PublishClusterState_Latency (WIP)
</td>
<td>The time taken by a quorum of nodes to publish the new cluster state. This metric is available for the current master.
</td>
</tr>
<tr>
<td>PublishClusterState_Failure (WIP)
</td>
<td>The number of times that the publish new cluster state action failed on the master node.
</td>
</tr>
<tr>
<td>ClusterApplierService_Latency (WIP)
</td>
<td>The time taken by each node to apply the cluster state sent by the master.
</td>
</tr>
<tr>
<td>ClusterApplierService_Failure (WIP)
</td>
<td>The number of times that the apply cluster state action failed on each node.
</td>
</tr>
<tr>
<td>Shard_State (WIP)
</td>
<td>IndexName, NodeName, ShardType, ShardID
</td>
<td>The state of each shard, such as STARTED, UNASSIGNED, or RELOCATING.
</td>
</tr>
<tr>
<td>LeaderCheck_Latency (WIP)
</td>
<td rowspan="4">WIP
</td>
<td rowspan="4">WIP
</td>
</tr>
<tr>
<td>FollowerCheck_Failure (WIP)
</td>
</tr>
<tr>
<td>LeaderCheck_Failure (WIP)
</td>
</tr>
<tr>
<td>FollowerCheck_Latency (WIP)
</td>
</tr>
</tbody> </tbody>
</table> </table>
@ -558,3 +715,4 @@ MasterTaskInsertOrder | The order in which the task was inserted (e.g. `3691`).
MasterTaskPriority | Priority of the task (e.g. `URGENT`). OpenSearch executes higher priority tasks before lower priority ones, regardless of `insert_order`. MasterTaskPriority | Priority of the task (e.g. `URGENT`). OpenSearch executes higher priority tasks before lower priority ones, regardless of `insert_order`.
MasterTaskType | `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, `Update snapshot state` MasterTaskType | `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, `Update snapshot state`
MasterTaskMetadata | Metadata for the task (if any). MasterTaskMetadata | Metadata for the task (if any).
CacheType | `Field_Data_Cache`, `Shard_Request_Cache`, `Node_Query_Cache`
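
The `CPU_Utilization` definition in the table (CPU time in milliseconds over the past five seconds, divided by 5000 milliseconds) can be sketched as a small calculation. This is only an illustration of the arithmetic in that description, not Performance Analyzer code; the function name `cpu_utilization` and its parameters are hypothetical.

```python
# Hypothetical helper illustrating the CPU_Utilization ratio described above.
# It is not part of Performance Analyzer; names and defaults are assumptions.

SAMPLING_WINDOW_MS = 5000  # the five-second window from the metric description


def cpu_utilization(cpu_time_ms: float, window_ms: float = SAMPLING_WINDOW_MS) -> float:
    """Return CPU time used in the window divided by the window length."""
    if cpu_time_ms < 0:
        raise ValueError("cpu_time_ms must be non-negative")
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return cpu_time_ms / window_ms


if __name__ == "__main__":
    # Thread(s) that used 1250 ms of CPU time in the last five seconds.
    print(cpu_utilization(1250))
```

For example, 1250 ms of CPU time within the five-second window gives a ratio of 0.25.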