From b05daacf3878a53bdb241e0d79a97b34b0cfd2a0 Mon Sep 17 00:00:00 2001
From: Karishma Joseph
Date: Mon, 15 Nov 2021 17:25:54 -0800
Subject: [PATCH] Add PA metrics to the metrics documentation page

Signed-off-by: Karishma Joseph
---
 _monitoring-plugins/pa/reference.md | 166 +++++++++++++++++++++++++++-
 1 file changed, 162 insertions(+), 4 deletions(-)

diff --git a/_monitoring-plugins/pa/reference.md b/_monitoring-plugins/pa/reference.md
index 77b61d89..dc33f0e9 100644
--- a/_monitoring-plugins/pa/reference.md
+++ b/_monitoring-plugins/pa/reference.md
@@ -26,7 +26,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 CPU_Utilization
- ShardID, IndexName, Operation, ShardRole
+ ShardID, IndexName, Operation, ShardRole
 CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds.
@@ -121,6 +121,18 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 The total number of times that the associated thread(s) blocked to enter or reenter a monitor (i.e. the number of times a thread has been in the blocked state).
+
+ Thread_Waited_Time
+
+ Average time (seconds) that the associated thread(s) waited to enter or reenter a monitor in the WAITING or TIMED_WAITING state.
+
+
+
+ Thread_Waited_Event
+
+ The total number of times that the associated thread(s) waited to enter or reenter a monitor (i.e. the number of times a thread has been in the WAITING or TIMED_WAITING state).
+
+
 ShardEvents
@@ -315,6 +327,37 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 Estimated disk usage of the shard in bytes.
+
+ Indexing_Pressure_Current_Limits
+
+ ShardID, IndexName, IndexingStage
+ Total heap size (in bytes) available to a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+
+
+
+ Indexing_Pressure_Current_Bytes
+
+ Total heap size (in bytes) occupied by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+
+
+
+ Indexing_Pressure_Last_Successful_Timestamp
+
+ Timestamp of the last successful request for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+
+
+
+ Indexing_Pressure_Rejection_Count
+
+ Total rejections performed by OpenSearch for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+
+
+
+ Indexing_Pressure_Average_Window_Throughput
+
+ Average throughput of the last n requests (the value of n is determined by the `shard_indexing_pressure.secondary_parameter.throughput.request_size_window` setting) for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+
+
 Latency
@@ -424,7 +467,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 Direction
- The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error
+ The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error.
@@ -454,7 +497,7 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 ThreadPool_QueueSize
- ThreadPoolType
+ ThreadPoolType
 The size of the task queue.
@@ -477,10 +520,22 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 The approximate number of threads that are actively executing tasks.
+
+ ThreadPool_QueueLatency
+
+ The latency of the task queue.
+
+
+
+ ThreadPool_QueueCapacity
+
+ The current capacity of the task queue.
+
+
 Master_PendingQueueSize
- N/A
+ Master_PendingTaskType
 The current number of pending tasks in the cluster state update thread.
 Each node has a cluster state update thread that submits cluster state update tasks (create index, update mapping, allocate shard, fail shard, etc.).
@@ -533,6 +588,108 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 The time (milliseconds) that a master task has been executed.
+
+ Cache_MaxSize
+
+ CacheType
+
+ The maximum size of the cache in bytes.
+
+
+
+ AdmissionControl_RejectionCount (WIP)
+
+ ControllerName
+
+ Total rejections performed by a Controller of Admission Control.
+
+
+
+ AdmissionControl_CurrentValue (WIP)
+
+ Current value for a Controller of Admission Control.
+
+
+
+ AdmissionControl_ThresholdValue (WIP)
+
+ Threshold value for a Controller of Admission Control.
+
+
+
+ Data_RetryingPendingTasksCount (WIP)
+
+ NodeID
+
+ Number of throttled pending tasks that the data node is actively retrying. This is an absolute value at that point in time.
+
+
+
+ Master_ThrottledPendingTasksCount (WIP)
+
+ Total number of pending tasks throttled by the master node. This is a cumulative metric, so look at the max aggregation.
+
+
+
+ Election_Term (WIP)
+
+ N/A
+
+ A number that increases monotonically with every master election.
+
+
+
+ PublishClusterState_Latency (WIP)
+
+ The time taken by a quorum of nodes to publish a new cluster state. This metric is available for the current master.
+
+
+
+ PublishClusterState_Failure (WIP)
+
+ The number of times the publish new cluster state action failed on the master node.
+
+
+
+ ClusterApplierService_Latency (WIP)
+
+ The time taken by each node to apply the cluster state sent by the master.
+
+
+
+ ClusterApplierService_Failure (WIP)
+
+ The number of times the apply cluster state action failed on each node.
+
+
+
+ Shard_State (WIP)
+
+ IndexName, NodeName, ShardType, ShardID
+
+ The state of each shard (STARTED, UNASSIGNED, RELOCATING, and so on).
+
+
+
+ LeaderCheck_Latency (WIP)
+
+ WIP
+
+ WIP
+
+
+
+ FollowerCheck_Failure (WIP)
+
+
+
+ LeaderCheck_Failure (WIP)
+
+
+
+ FollowerCheck_Latency (WIP)
+
+
@@ -558,3 +715,4 @@ MasterTaskInsertOrder | The order in which the task was inserted (e.g. `3691`).
 MasterTaskPriority | Priority of the task (e.g. `URGENT`). OpenSearch executes higher priority tasks before lower priority ones, regardless of `insert_order`.
 MasterTaskType | `shard-started`, `create-index`, `delete-index`, `refresh-mapping`, `put-mapping`, `CleanupSnapshotRestoreState`, `Update snapshot state`
 MasterTaskMetadata | Metadata for the task (if any).
+CacheType | `Field_Data_Cache`, `Shard_Request_Cache`, `Node_Query_Cache`