diff --git a/_monitoring-your-cluster/pa/reference.md b/_monitoring-your-cluster/pa/reference.md
index 9fed7646..4d9e8532 100644
--- a/_monitoring-your-cluster/pa/reference.md
+++ b/_monitoring-your-cluster/pa/reference.md
@@ -9,27 +9,26 @@ redirect_from:

 # Metrics reference

-This page contains all Performance Analyzer metrics. All metrics support the `avg`, `sum`, `min`, and `max` aggregations, although certain metrics measure only one thing, making the choice of aggregation irrelevant.
+Performance Analyzer provides a number of metrics to help you evaluate performance. The following tables describe the available metrics, grouped by the dimensions that are most relevant to each metric. All metrics support the `avg`, `sum`, `min`, and `max` aggregations, although for certain metrics, the measured value is the same regardless of aggregation type.

-For information on dimensions, see the [dimensions reference](#dimensions-reference).
+For information about each dimension, see the [dimensions reference](#dimensions-reference) later in this topic.

 This list is extensive. We recommend using Ctrl/Cmd + F to find what you're looking for.
 {: .tip }

+## Relevant dimensions: `ShardID`, `IndexName`, `Operation`, `ShardRole`
+
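The intro says every metric accepts the `avg`, `sum`, `min`, and `max` aggregations, and the new heading groups metrics by dimension. As a minimal sketch, assuming the Performance Analyzer REST endpoint is reachable at `localhost:9600` under `/_plugins/_performanceanalyzer/metrics` and accepts `metrics`, `agg`, `dim`, and `nodes` query parameters (verify these against the Performance Analyzer API page for your version), the following builds a query URL that pairs one aggregation with each metric. The helper `build_metrics_url` and the one-aggregation-per-metric check are assumptions for illustration.

```python
from urllib.parse import urlencode

# Aggregations that the reference lists as supported by every metric.
SUPPORTED_AGGS = {"avg", "sum", "min", "max"}

# Assumed endpoint; adjust host, port, and path for your deployment.
BASE_URL = "http://localhost:9600/_plugins/_performanceanalyzer/metrics"


def build_metrics_url(metrics, aggs, dims=(), nodes="all", base=BASE_URL):
    """Hypothetical helper: build a Performance Analyzer metrics query URL."""
    if len(metrics) != len(aggs):
        # Assumption: one aggregation is supplied per requested metric.
        raise ValueError("expected exactly one aggregation per metric")
    unknown = [a for a in aggs if a not in SUPPORTED_AGGS]
    if unknown:
        raise ValueError(f"unsupported aggregation(s): {unknown}")
    params = {"metrics": ",".join(metrics), "agg": ",".join(aggs)}
    if dims:
        params["dim"] = ",".join(dims)
    params["nodes"] = nodes
    # Keep commas literal so the URL uses the comma-separated list form.
    return f"{base}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(build_metrics_url(["CPU_Utilization", "Heap_AllocRate"],
                            ["avg", "max"], dims=["ShardID", "IndexName"]))
```

Validating the aggregation names locally is only a convenience; the server remains the authority on which combinations it accepts.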
-Metric | Dimensions | Description
+Metric | Description
-CPU_Utilization | ShardID, IndexName, Operation, ShardRole | CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds.
+CPU_Utilization | CPU usage ratio. CPU time (in milliseconds) used by the associated thread(s) in the past five seconds, divided by 5000 milliseconds.
-Heap_AllocRate | An approximation of the heap memory allocated, in bytes, per second in the past five seconds
+Heap_AllocRate | An approximation of the heap memory, in bytes, allocated per second over the last 5 seconds.
-Thread_Blocked_Time | Average time (seconds) that the associated thread(s) blocked to enter or reenter a monitor.
+Thread_Blocked_Time | The average amount of time, in seconds, that the associated thread has been blocked from entering or reentering a monitor.
-Thread_Blocked_Event | The total number of times that the associated thread(s) blocked to enter or reenter a monitor (i.e. the number of times a thread has been in the blocked state).
+Thread_Blocked_Event | The total number of times that the associated thread has been blocked from entering or reentering a monitor (that is, the number of times a thread has been in the `BLOCKED` state).
-Thread_Waited_Time | Average time (seconds) that the associated thread(s) waited to enter or reenter a monitor in WAITING or TIMED_WAITING state.
+Thread_Waited_Time | The average amount of time, in seconds, that the associated thread has waited to enter or reenter a monitor (that is, the amount of time a thread has been in the `WAITING` or `TIMED_WAITING` state).
-Thread_Waited_Event | The total number of times that the associated thread(s) waited to enter or reenter a monitor (i.e. the number of times a thread has been in the WAITING or TIMED_WAITING state).
+Thread_Waited_Event | The total number of times that the associated thread has waited to enter or reenter a monitor (that is, the number of times a thread has been in the `WAITING` or `TIMED_WAITING` state).
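The `CPU_Utilization` row defines the value as CPU time used by the associated threads in the past five seconds divided by 5000 milliseconds. That arithmetic can be sketched as follows; the function name is hypothetical, and the input is assumed to be the summed CPU time of the associated threads over one 5-second sample.

```python
SAMPLE_WINDOW_MS = 5000  # 5-second sampling window, per the metric description


def cpu_utilization(cpu_time_ms: float, window_ms: float = SAMPLE_WINDOW_MS) -> float:
    """Hypothetical helper: CPU time used in the window divided by the window length."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return cpu_time_ms / window_ms


# 1,250 ms of CPU time in a 5-second window gives a ratio of 0.25.
print(cpu_utilization(1250))
```

If the CPU time of several threads running in parallel is summed into the numerator, the ratio can exceed 1.0, so it is a ratio of CPU time to wall-clock time rather than a percentage capped at 100.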
 | The total number of documents indexed in the past five seconds.
+Metric | Description
-Indexing_ThrottleTime | ShardID, IndexName | Time (milliseconds) that the index has been under merge throttling control in the past five seconds.
+Indexing_ThrottleTime | Time (milliseconds) that the index has been under merge throttling control in the past five seconds.
 | Estimated disk usage of the shard in bytes.
+Metric | Description
-Indexing_Pressure_Current_Limits | ShardID, IndexName, IndexingStage | Total heap size (in bytes) that is available for utilization by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+Indexing_Pressure_Current_Limits | The total heap size, in bytes, that is available for use by an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
-Indexing_Pressure_Current_Bytes | Total heap size (in bytes) occupied by a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+Indexing_Pressure_Current_Bytes | The total heap size, in bytes, occupied by an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
-Indexing_Pressure_Last_Successful_Timestamp | Timestamp of a request that was successful for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+Indexing_Pressure_Last_Successful_Timestamp | The timestamp of the last successful request for an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
-Indexing_Pressure_Rejection_Count | Total rejections performed by OpenSearch for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+Indexing_Pressure_Rejection_Count | The total number of rejections performed by OpenSearch for an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
-Indexing_Pressure_Average_Window_Throughput | Average throughput of the last n requests (The value of n is determined by `shard_indexing_pressure.secondary_parameter.throughput.request_size_window` setting) for a shard of an index in a particular indexing stage (Coordinating, Primary or Replica).
+Indexing_Pressure_Average_Window_Throughput | The average throughput of the last n requests (the value of n is determined by the `shard_indexing_pressure.secondary_parameter.throughput.request_size_window` setting) for an index shard in a particular indexing stage (Coordinating, Primary, or Replica).
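`Indexing_Pressure_Current_Bytes` and `Indexing_Pressure_Current_Limits` are both reported per index shard and indexing stage, so one way to read them together is as the fraction of each stage's heap limit currently in use. This ratio is an interpretation, not a metric that Performance Analyzer emits; the helper name and the dictionary-by-stage input shape are hypothetical.

```python
STAGES = ("Coordinating", "Primary", "Replica")


def pressure_utilization(current_bytes: dict, current_limits: dict) -> dict:
    """Hypothetical helper: fraction of each stage's heap limit currently occupied."""
    result = {}
    for stage in STAGES:
        limit = current_limits.get(stage, 0)
        used = current_bytes.get(stage, 0)
        # A missing or zero limit is reported as 0.0 instead of dividing by zero.
        result[stage] = used / limit if limit else 0.0
    return result


print(pressure_utilization({"Primary": 50, "Replica": 30},
                           {"Primary": 200, "Replica": 120}))
```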
-Latency | Operation, Exception, Indices, HTTPRespCode, ShardID, IndexName, ShardRole | Latency (milliseconds) of a request.
+Metric | Description
+Latency | Latency (milliseconds) of a request.
+Metric | Description
-GC_Collection_Event | MemType | The number of garbage collections that have occurred in the past five seconds.
+GC_Collection_Event | The number of garbage collections that have occurred in the past five seconds.
 | The amount of used memory in bytes.
+Metric | Description
-Disk_Utilization | DiskName | Disk utilization rate: percentage of disk time spent reading and writing by the OpenSearch process in the past five seconds.
+Disk_Utilization | Disk utilization rate: percentage of disk time spent reading and writing by the OpenSearch process in the past five seconds.
 | Service rate: MB read or written per second in the past five seconds. This metric assumes that each disk sector stores 512 bytes.
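The disk utilization row and the service-rate row that follows it both describe 5-second windows, and the service rate converts sector counts using 512 bytes per sector. The sketch below reproduces that arithmetic from raw inputs; the function names and inputs (milliseconds of disk busy time, sectors transferred) are hypothetical, and because the reference does not say whether "MB" means 10^6 or 2^20 bytes, the divisor is a parameter that defaults to 2^20 as an assumption.

```python
WINDOW_S = 5.0      # sampling window from the metric descriptions
SECTOR_BYTES = 512  # sector size that the service-rate metric assumes


def disk_utilization_pct(busy_ms: float, window_s: float = WINDOW_S) -> float:
    """Percentage of the window that the disk spent reading and writing."""
    return busy_ms / (window_s * 1000) * 100


def disk_service_rate_mb_s(sectors: int, window_s: float = WINDOW_S,
                           bytes_per_mb: int = 1024 * 1024) -> float:
    """MB read or written per second; MB is assumed to mean 2**20 bytes here."""
    return sectors * SECTOR_BYTES / bytes_per_mb / window_s


print(disk_utilization_pct(1250), disk_service_rate_mb_s(20480))
```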
+Metric | Description
+Net_TCP_NumFlows | The number of samples collected. Performance Analyzer collects 1 sample every 5 seconds.
+Net_TCP_TxQ | The average number of TCP packets in the send buffer.
+Net_TCP_RxQ | The average number of TCP packets in the receive buffer.
+Net_TCP_Lost | The average number of unrecovered recurring timeouts. This number is reset when the recovery finishes or `SND.UNA` is advanced. `SND.UNA` is the sequence number of the first byte of data that has been sent but not yet acknowledged.
+Net_TCP_SendCWND | The average size, in bytes, of the sending congestion window.
+Net_TCP_SSThresh | The average size, in bytes, of the slow start threshold.
+Metric | Description
-Net_TCP_NumFlows | DestAddr | Number of samples collected. Performance Analyzer collects one sample every five seconds.
-Net_TCP_TxQ | Average number of TCP packets in the send buffer.
-Net_TCP_RxQ | Average number of TCP packets in the receive buffer.
-Net_TCP_Lost | Average number of unrecovered recurring timeouts. This number is reset when the recovery finishes or `SND.UNA` is advanced. `SND.UNA` is the sequence number of the first byte of data that has been sent, but not yet acknowledged.
-Net_TCP_SendCWND | Average size (bytes) of the sending congestion window.
-Net_TCP_SSThresh | Average size (bytes) of the slow start size threshold.
-Net_PacketRate4 | Direction | The total number of IPv4 datagrams transmitted/received from/by interfaces per second, including those transmitted or received in error.
+Net_PacketRate4 | The total number of IPv4 datagrams transmitted or received per second by the network interfaces, including those transmitted or received in error.
 Net_PacketDropRate4
@@ -455,225 +531,317 @@ This list is extensive. We recommend using Ctrl/Cmd + F to find what you're look
 | The number of bits transmitted or received per second by all network interfaces.
-ThreadPool_QueueSize | ThreadPoolType | The size of the task queue.
+Metric | Description
+ThreadPool_QueueSize | The size of the task queue.
+ThreadPool_RejectedReqs | The number of rejected executions.
+ThreadPool_TotalThreads | The current number of threads in the pool.
+ThreadPool_ActiveThreads | The approximate number of threads that are actively executing tasks.
+ThreadPool_QueueLatency | The latency of the task queue.
+ThreadPool_QueueCapacity | The current capacity of the task queue.
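Several thread pool metrics are more informative as ratios: queue size against queue capacity, and active threads against total threads. The sketch below derives both from values read for one `ThreadPoolType`. The helper name is hypothetical, and treating a non-positive capacity or thread count as "unknown" (returning `None`) is an assumption, not documented Performance Analyzer behavior.

```python
def thread_pool_pressure(queue_size: int, queue_capacity: int,
                         active_threads: int, total_threads: int) -> dict:
    """Hypothetical helper: derive fill ratios from ThreadPool_* metric values."""
    return {
        # How full the task queue is; None when the capacity is not positive.
        "queue_fill": queue_size / queue_capacity if queue_capacity > 0 else None,
        # Share of pool threads that are actively executing tasks.
        "busy_ratio": active_threads / total_threads if total_threads > 0 else None,
    }


print(thread_pool_pressure(queue_size=250, queue_capacity=1000,
                           active_threads=6, total_threads=8))
```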
+Metric | Description
+ClusterManager_PendingQueueSize | The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks, such as create index, update mapping, allocate shard, and fail shard.
+Metric | Description
+HTTP_RequestDocs | The number of items in the request (only for the `_bulk` request type).
+HTTP_TotalRequests | The number of requests completed in the last 5 seconds.
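Because `HTTP_TotalRequests` counts requests completed in a 5-second window, converting it to a per-second rate means dividing by the window length. A small sketch, assuming the input is a list of consecutive, non-overlapping 5-second samples (the helper name is hypothetical):

```python
WINDOW_S = 5  # HTTP_TotalRequests counts requests completed in the last 5 seconds


def requests_per_second(samples, window_s: float = WINDOW_S) -> float:
    """Average request rate over consecutive HTTP_TotalRequests samples."""
    if not samples:
        raise ValueError("need at least one sample")
    # Total requests divided by the total wall-clock time the samples cover.
    return sum(samples) / (len(samples) * window_s)


print(requests_per_second([600, 400]))
```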
+Metric | Description
+CB_EstimatedSize | The current number of estimated bytes.
-ThreadPool_RejectedReqs | The number of rejected executions.
-ThreadPool_TotalThreads | The current number of threads in the pool.
-ThreadPool_ActiveThreads | The approximate number of threads that are actively executing tasks.
-ThreadPool_QueueLatency | The latency of the task queue.
-ThreadPool_QueueCapacity | The current capacity of the task queue.
-Master_PendingQueueSize | Master_PendingTaskType | The current number of pending tasks in the cluster state update thread. Each node has a cluster state update thread that submits cluster state update tasks (create index, update mapping, allocate shard, fail shard, etc.).
-HTTP_RequestDocs | Operation, Exception, Indices, HTTPRespCode | The number of items in the request (only for `_bulk` request type).
-HTTP_TotalRequests | The number of finished requests in the past five seconds.
-CB_EstimatedSize | CBType | The current number of estimated bytes.
-CB_TrippedEvents | The number of times the circuit breaker has tripped.
+CB_TrippedEvents | The number of times that the circuit breaker has tripped.
-CB_ConfiguredSize | The limit (bytes) for how much memory operations can use.
+CB_ConfiguredSize | The limit, in bytes, on the amount of memory that operations can use.
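`CB_EstimatedSize` and `CB_ConfiguredSize` are both byte counts for the same breaker, so their ratio shows how close a breaker is to its limit. The sketch below computes that ratio and flags it against a threshold; the helper name and the 0.9 default for `warn_at` are hypothetical choices, not values taken from OpenSearch.

```python
def breaker_usage(estimated_bytes: int, configured_bytes: int, warn_at: float = 0.9):
    """Return (fraction of the breaker limit in use, whether it meets warn_at)."""
    if configured_bytes <= 0:
        raise ValueError("configured_bytes must be positive")
    ratio = estimated_bytes / configured_bytes
    return ratio, ratio >= warn_at


print(breaker_usage(450, 1000), breaker_usage(950, 1000))
```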
+Metric | Description
+ClusterManager_Task_Queue_Time | The amount of time, in milliseconds, that a cluster manager task spent in the queue.
+ClusterManager_Task_Run_Time | The amount of time, in milliseconds, that a cluster manager task has been running.
+Metric | Description
+Cache_MaxSize | The maximum size of the cache, in bytes.
+Metric | Description
+AdmissionControl_RejectionCount | The total number of rejections performed by an Admission Control controller.
+AdmissionControl_CurrentValue | The current value of an Admission Control controller.
+AdmissionControl_ThresholdValue | The threshold value of an Admission Control controller.
+Metric | Description
+Data_RetryingPendingTasksCount | The number of throttled pending tasks on which the data node is actively performing retries. This is an absolute, point-in-time value.
-Master_Task_Queue_Time | MasterTaskInsertOrder, MasterTaskPriority, MasterTaskType, MasterTaskMetadata | The time (milliseconds) that a master task spent in the queue.
-Master_Task_Run_Time | The time (milliseconds) that a master task has been executed.
-Cache_MaxSize | CacheType | The max size of the cache in bytes.
-AdmissionControl_RejectionCount (WIP) | ControllerName | Total rejections performed by a Controller of Admission Control.
-AdmissionControl_CurrentValue (WIP) | Current value for Controller of Admission Control.
-AdmissionControl_ThresholdValue (WIP) | Threshold value for Controller of Admission Control.
-Data_RetryingPendingTasksCount (WIP) | NodeID | Number of throttled pending tasks on which data node is actively performing retries. It will be an absolute metric at that point of time.
-Master_ThrottledPendingTasksCount (WIP) | Sum of total pending tasks which got throttled by node (master node). It is a cumulative metric so look at the max aggregation.
-Election_Term (WIP) | N/A | Monotonically increasing number with every master election.
-PublishClusterState_Latency (WIP) | The time taken by quorum of nodes to publish new cluster state. This metric is available for current master.
-PublishClusterState_Failure (WIP) | The number of times publish new cluster state action failed on master node.
-ClusterApplierService_Latency (WIP) | The time taken by each node to apply cluster state sent by master.
-ClusterApplierService_Failure (WIP) | The number of times apply cluster state action failed on each node.
-Shard_State (WIP) | IndexName, NodeName, ShardType, ShardID | The state of each shard - whether it is STARTED, UNASSIGNED, RELOCATING etc.
-LeaderCheck_Latency (WIP) | WIP | WIP
-FollowerCheck_Failure (WIP) |
-LeaderCheck_Failure (WIP) |
-FollowerCheck_Latency (WIP) |
+ClusterManager_ThrottledPendingTasksCount | The total number of pending tasks that were throttled by the cluster manager node. Because this is a cumulative metric, use the `max` aggregation.
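The advice to use `max` for a cumulative metric follows from how the four aggregations treat a counter that only grows. The sketch below applies `avg`, `sum`, `min`, and `max` to a few hypothetical `ClusterManager_ThrottledPendingTasksCount` samples, assuming the counter does not reset between them: `max` equals the latest running total, `sum` counts earlier totals again, and `avg` and `min` understate it.

```python
def aggregate(samples, agg: str):
    """Apply one of the four aggregations listed in the reference to samples."""
    funcs = {
        "avg": lambda xs: sum(xs) / len(xs),
        "sum": sum,
        "min": min,
        "max": max,
    }
    if not samples:
        raise ValueError("need at least one sample")
    return funcs[agg](samples)


# Hypothetical samples of a cumulative counter, oldest first.
samples = [3, 5, 5, 9]
for agg in ("avg", "sum", "min", "max"):
    print(agg, aggregate(samples, agg))
```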
+Metric | Description
+Election_Term | A number that increases monotonically with every cluster manager election.
+PublishClusterState_Latency | The amount of time taken by a quorum of nodes to publish the new cluster state. This metric is available for the current cluster manager node.
+PublishClusterState_Failure | The number of times that publishing a new cluster state failed on the cluster manager node.
+ClusterApplierService_Latency | The amount of time taken by each node to apply the cluster state sent by the cluster manager.
+ClusterApplierService_Failure | The number of times that the apply cluster state action failed on each node.
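Since `Election_Term` increases with every cluster manager election, a change between two consecutive samples indicates that at least one election happened in between. The sketch below reports the sample positions where that occurs; it deliberately ignores the size of the jump, because the reference only says the term increases, not by how much. The helper name and list-of-samples input are hypothetical.

```python
def elections_observed(terms):
    """Return indices of samples whose Election_Term differs from the previous sample."""
    return [i for i in range(1, len(terms)) if terms[i] != terms[i - 1]]


# Hypothetical Election_Term samples, oldest first.
print(elections_observed([4, 4, 5, 5, 7]))
```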
+Metric | Description
+Shard_State | The state of each shard, for example, `STARTED`, `UNASSIGNED`, or `RELOCATING`.
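`Shard_State` reports one state per shard, so a quick health view is a count per state plus a list of shards that are not `STARTED`. The sketch below assumes the values have already been collected as `(shard, state)` pairs; that input shape and the shard labels are hypothetical.

```python
from collections import Counter


def summarize_shard_states(states):
    """Count shards per state and list the shards that are not STARTED."""
    counts = Counter(state for _, state in states)
    not_started = [shard for shard, state in states if state != "STARTED"]
    return counts, not_started


counts, not_started = summarize_shard_states([
    ("idx-0", "STARTED"),
    ("idx-1", "UNASSIGNED"),
    ("idx-2", "STARTED"),
    ("idx-3", "RELOCATING"),
])
print(dict(counts), not_started)
```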