From e6d729c27cedb06236c9f1a5a54e34d69e393aa2 Mon Sep 17 00:00:00 2001 From: Arpit Agarwal Date: Fri, 13 Jun 2014 02:56:47 +0000 Subject: [PATCH] HADOOP-6350: Merging r1602324 from trunk to branch-2. git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2@1602325 13f79535-47bb-0310-9956-ffa450edef68 --- .../hadoop-common/CHANGES.txt | 2 + .../hadoop-common/src/site/apt/Metrics.apt.vm | 732 ++++++++++++++++++ hadoop-project/src/site/site.xml | 1 + 3 files changed, 735 insertions(+) create mode 100644 hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt index 3c1aed5301f..18dd7a6ad40 100644 --- a/hadoop-common-project/hadoop-common/CHANGES.txt +++ b/hadoop-common-project/hadoop-common/CHANGES.txt @@ -75,6 +75,8 @@ Release 2.5.0 - UNRELEASED HADOOP-10376. Refactor refresh*Protocols into a single generic refreshConfigProtocol. (Chris Li via Arpit Agarwal) + HADOOP-6350. Documenting Hadoop metrics. (Akira Ajisaka via Arpit Agarwal) + OPTIMIZATIONS BUG FIXES diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm new file mode 100644 index 00000000000..55e532df9fc --- /dev/null +++ b/hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm @@ -0,0 +1,732 @@ +~~ Licensed to the Apache Software Foundation (ASF) under one or more +~~ contributor license agreements. See the NOTICE file distributed with +~~ this work for additional information regarding copyright ownership. +~~ The ASF licenses this file to You under the Apache License, Version 2.0 +~~ (the "License"); you may not use this file except in compliance with +~~ the License. 
You may obtain a copy of the License at +~~ +~~ http://www.apache.org/licenses/LICENSE-2.0 +~~ +~~ Unless required by applicable law or agreed to in writing, software +~~ distributed under the License is distributed on an "AS IS" BASIS, +~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +~~ See the License for the specific language governing permissions and +~~ limitations under the License. + + --- + Metrics Guide + --- + --- + ${maven.build.timestamp} + +%{toc} + +Overview + + Metrics are statistical information exposed by Hadoop daemons, + used for monitoring, performance tuning, and debugging. + Many metrics are available by default, + and they are very useful for troubleshooting. + This page describes the available metrics in detail. + + Each section below describes one context into which metrics are grouped. + + The documentation of the Metrics 2.0 framework is + {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}. + +jvm context + +* JvmMetrics + + Each metrics record contains tags such as ProcessName, SessionId, + and Hostname as additional information along with metrics. 
+ +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Current non-heap memory used in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Current non-heap memory committed in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Max non-heap memory size in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Current heap memory used in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Current heap memory committed in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Max heap memory size in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Max memory size in MB +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of NEW threads +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of RUNNABLE threads +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of BLOCKED threads +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of WAITING threads +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of TIMED_WAITING threads +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of TERMINATED threads +*-------------------------------------+--------------------------------------+ +|<<>> | Total GC count and GC time in msec, grouped by the kind of GC. \ + | ex.) 
GcCountPS Scavenge=6, GcTimeMillisPS Scavenge=40, + | GcCountPS MarkSweep=0, GcTimeMillisPS MarkSweep=0 +*-------------------------------------+--------------------------------------+ +|<<>> | Total GC count +*-------------------------------------+--------------------------------------+ +|<<>> | Total GC time in msec +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of FATAL logs +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of ERROR logs +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of WARN logs +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of INFO logs +*-------------------------------------+--------------------------------------+ + +rpc context + +* rpc + + Each metrics record contains tags such as Hostname + and port (the port number to which the server is bound) + as additional information along with metrics. 
+ +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of received bytes +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of sent bytes +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of RPC calls +*-------------------------------------+--------------------------------------+ +|<<>> | Average queue time in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of RPC calls (same to + | RpcQueueTimeNumOps) +*-------------------------------------+--------------------------------------+ +|<<>> | Average Processing time in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of authentication failures +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of authentication successes +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of authorization failures +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of authorization successes +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of open connections +*-------------------------------------+--------------------------------------+ +|<<>> | Current length of the call queue +*-------------------------------------+--------------------------------------+ +|<<>><<>> | Shows total number of RPC calls +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. 
+*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 50th percentile of RPC queue time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 75th percentile of RPC queue time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 90th percentile of RPC queue time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 95th percentile of RPC queue time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 99th percentile of RPC queue time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | Shows total number of RPC calls +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 50th percentile of RPC processing time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 75th percentile of RPC processing time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. 
+*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 90th percentile of RPC processing time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 95th percentile of RPC processing time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 99th percentile of RPC processing time in milliseconds +| | ( seconds granularity) if <<>> is set to +| | true. is specified by <<>>. +*-------------------------------------+--------------------------------------+ + +* RetryCache/NameNodeRetryCache + + RetryCache metrics are useful for monitoring NameNode failover. + Each metrics record contains the Hostname tag. + +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of RetryCache hits +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of times the RetryCache was cleared +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of times the RetryCache was updated +*-------------------------------------+--------------------------------------+ + +rpcdetailed context + + Metrics in the rpcdetailed context are exposed in a unified manner by the RPC + layer. Two metrics are exposed for each RPC, based on its name. + Metrics named "(RPC method name)NumOps" indicate the total number of + method calls, and metrics named "(RPC method name)AvgTime" show the + average turnaround time for method calls in milliseconds. 
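The naming convention for the rpcdetailed context can be illustrated with a small sketch. It assumes a metrics record has already been collected into a plain dictionary; the helper names (`split_rpc_metric`, `group_by_method`) and the sample values, including the method names, are hypothetical and are not part of Hadoop.

```python
# Suffixes that the rpcdetailed context appends to each RPC method name.
SUFFIXES = ("NumOps", "AvgTime")


def split_rpc_metric(name):
    """Split a metric name into (method name, suffix).

    Returns None when the name does not follow the
    "(RPC method name)NumOps" / "(RPC method name)AvgTime" pattern,
    which is the case for tags such as Hostname or port.
    """
    for suffix in SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], suffix
    return None


def group_by_method(record):
    """Group the per-method metrics of one record by RPC method name."""
    grouped = {}
    for name, value in record.items():
        parts = split_rpc_metric(name)
        if parts is None:
            continue  # not a per-method metric (for example, a tag)
        method, suffix = parts
        grouped.setdefault(method, {})[suffix] = value
    return grouped


# Hypothetical record: two tags plus metrics for two RPC methods.
sample = {
    "Hostname": "nn1.example.com",
    "port": "8020",
    "GetBlockLocationsNumOps": 120,
    "GetBlockLocationsAvgTime": 1.5,
    "MkdirsNumOps": 3,
    "MkdirsAvgTime": 4.0,
}

if __name__ == "__main__":
    for method, values in sorted(group_by_method(sample).items()):
        print(method, values["NumOps"], "calls,", values["AvgTime"], "ms average")
```

Grouping by method name makes it easy to compare call counts and average times side by side. RPCs that have not been called simply do not appear in the result.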
+ +* rpcdetailed + + Each metrics record contains tags such as Hostname + and port (the port number to which the server is bound) + as additional information along with metrics. + + Metrics for RPCs that have not been called are not included + in the metrics record. + +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of times the method is called +*-------------------------------------+--------------------------------------+ +|<<>> | Average turnaround time of the method in + | milliseconds +*-------------------------------------+--------------------------------------+ + +dfs context + +* namenode + + Each metrics record contains tags such as ProcessName, SessionId, + and Hostname as additional information along with metrics. + +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of files created +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of files and directories created by create + | or mkdir operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of files appended +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of getBlockLocations operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of rename operations (NOT the number of + | files/dirs renamed) +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of directory listing operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of delete operations 
+*-------------------------------------+--------------------------------------+ +|<<>> | Total number of files and directories deleted by delete + | or rename operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of getFileInfo and getLinkFileInfo + | operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of addBlock operations succeeded +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of getAdditionalDatanode + | operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of createSymlink operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of getLinkTarget operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of files and directories listed by + | directory listing operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of allowSnapshot operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of disallowSnapshot operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of createSnapshot operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of deleteSnapshot operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of renameSnapshot operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of snapshottableDirectoryStatus + | operations +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of getSnapshotDiffReport + | operations 
+*-------------------------------------+--------------------------------------+ +|<<>> | Total number of Journal transactions +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of Journal transactions in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of Journal syncs +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of Journal syncs in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of Journal transactions batched + | in sync +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of processing block reports from + | DataNode +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of processing block reports in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of processing cache reports from + | DataNode +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of processing cache reports in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | The interval between FSNameSystem starts and the last + | time safemode leaves in milliseconds. 
\ + | (sometimes not equal to the time in SafeMode, + | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}}) +*-------------------------------------+--------------------------------------+ +|<<>> | Time loading FS Image at startup in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Time loading FS Image at startup in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of edits downloads from SecondaryNameNode +*-------------------------------------+--------------------------------------+ +|<<>> | Average edits download time in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> |Total number of fsimage downloads from SecondaryNameNode +*-------------------------------------+--------------------------------------+ +|<<>> | Average fsimage download time in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of fsimage uploads to SecondaryNameNode +*-------------------------------------+--------------------------------------+ +|<<>> | Average fsimage upload time in milliseconds +*-------------------------------------+--------------------------------------+ + +* FSNamesystem + + Each metrics record contains tags such as HAState and Hostname + as additional information along with metrics. 
+ +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of missing blocks +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of expired heartbeats +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of transactions since + | last checkpoint +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of transactions since last + | edit log roll +*-------------------------------------+--------------------------------------+ +|<<>> | Last transaction ID written to the edit log +*-------------------------------------+--------------------------------------+ +|<<>> | Time in milliseconds since epoch of last checkpoint +*-------------------------------------+--------------------------------------+ +|<<>> | Current raw capacity of DataNodes in bytes +*-------------------------------------+--------------------------------------+ +|<<>> | Current raw capacity of DataNodes in GB +*-------------------------------------+--------------------------------------+ +|<<>> | Current used capacity across all DataNodes in bytes +*-------------------------------------+--------------------------------------+ +|<<>> | Current used capacity across all DataNodes in GB +*-------------------------------------+--------------------------------------+ +|<<>> | Current remaining capacity in bytes +*-------------------------------------+--------------------------------------+ +|<<>> | Current remaining capacity in GB +*-------------------------------------+--------------------------------------+ +|<<>> | Current space used by DataNodes for non DFS + | purposes in bytes +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of connections 
+*-------------------------------------+--------------------------------------+ +|<<>> | Current number of snapshottable directories +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of snapshots +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of allocated blocks in the system +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of files and directories +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of blocks pending to be + | replicated +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of blocks under replicated +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of blocks with corrupt replicas. +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of blocks scheduled for + | replications +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of blocks pending deletion +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of excess blocks +*-------------------------------------+--------------------------------------+ +|<<>> | (HA-only) Current number of blocks + | postponed to replicate +*-------------------------------------+--------------------------------------+ +|<<>> | (HA-only) Current number of pending + | block-related messages for later + | processing in the standby NameNode +*-------------------------------------+--------------------------------------+ +|<<>> | (HA-only) Time in milliseconds since the + | last time standby NameNode load edit log. 
+ | In active NameNode, set to 0 +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of block capacity +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of DataNodes marked stale due to delayed + | heartbeat +*-------------------------------------+--------------------------------------+ +|<<>> |Current number of files and directories (same as FilesTotal) +*-------------------------------------+--------------------------------------+ + +* JournalNode + + The server-side metrics for a journal from the JournalNode's perspective. + Each metrics record contains Hostname tag as additional information + along with metrics. + +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Number of sync operations (1 minute granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 50th percentile of sync +| | latency in microseconds (1 minute granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 75th percentile of sync +| | latency in microseconds (1 minute granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 90th percentile of sync +| | latency in microseconds (1 minute granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 95th percentile of sync +| | latency in microseconds (1 minute granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 99th percentile of sync +| | latency in microseconds (1 minute granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | Number of sync operations (5 minutes granularity) 
+*-------------------------------------+--------------------------------------+ +|<<>> | The 50th percentile of sync +| | latency in microseconds (5 minutes granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 75th percentile of sync +| | latency in microseconds (5 minutes granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 90th percentile of sync +| | latency in microseconds (5 minutes granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 95th percentile of sync +| | latency in microseconds (5 minutes granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 99th percentile of sync +| | latency in microseconds (5 minutes granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | Number of sync operations (1 hour granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 50th percentile of sync +| | latency in microseconds (1 hour granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 75th percentile of sync +| | latency in microseconds (1 hour granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 90th percentile of sync +| | latency in microseconds (1 hour granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 95th percentile of sync +| | latency in microseconds (1 hour granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | The 99th percentile of sync +| | latency in microseconds (1 hour granularity) +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of batches written since startup 
+*-------------------------------------+--------------------------------------+ +|<<>> | Total number of transactions written since startup +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of bytes written since startup +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of batches written where this +| | node was lagging +*-------------------------------------+--------------------------------------+ +|<<>> | Current writer's epoch number +*-------------------------------------+--------------------------------------+ +|<<>> | The number of transactions that this JournalNode is +| | lagging +*-------------------------------------+--------------------------------------+ +|<<>> | The highest transaction id stored on this JournalNode +*-------------------------------------+--------------------------------------+ +|<<>> | The last epoch number which this node has promised +| | not to accept any lower epoch, or 0 if no promises have been made +*-------------------------------------+--------------------------------------+ + +* datanode + + Each metrics record contains tags such as SessionId and Hostname + as additional information along with metrics. 
+ +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of bytes written to DataNode +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of bytes read from DataNode +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks written to DataNode +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks read from DataNode +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks replicated +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks removed +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks verified +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of verifications failures +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks cached +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blocks uncached +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of read operations from local client +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of read operations from remote + | client +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of write operations from local + | client +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of write operations from remote + | client 
+*-------------------------------------+--------------------------------------+ +|<<>> | Total number of operations to get local path + | names of blocks +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of fsync +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of volume failures occurred +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of read operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of read operations in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of write operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of write operations in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of blockChecksum operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of blockChecksum operations in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of block copy operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of block copy operations in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of block replace operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of block replace operations in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of heartbeats +*-------------------------------------+--------------------------------------+ +|<<>> | Average heartbeat time in milliseconds 
+*-------------------------------------+--------------------------------------+ +|<<>> | Total number of block report operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of block report operations in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of cache report operations +*-------------------------------------+--------------------------------------+ +|<<>> | Average time of cache report operations in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of ack round trip +*-------------------------------------+--------------------------------------+ +|<<>> | Average time from ack send to +| | receive minus the downstream ack time in nanoseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of flushes +*-------------------------------------+--------------------------------------+ +|<<>> | Average flush time in nanoseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of fsync +*-------------------------------------+--------------------------------------+ +|<<>> | Average fsync time in nanoseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of sending + | packets +*-------------------------------------+--------------------------------------+ +|<<>> | Average waiting time of +| | sending packets in nanoseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of sending packets +*-------------------------------------+--------------------------------------+ +|<<>> | Average transfer time of sending + | packets in nanoseconds +*-------------------------------------+--------------------------------------+ + +ugi context + +* UgiMetrics + + UgiMetrics is 
related to user and group information. + Each metrics record contains Hostname tag as additional information + along with metrics. + +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of successful Kerberos logins +*-------------------------------------+--------------------------------------+ +|<<>> | Average time for successful Kerberos logins in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of failed Kerberos logins +*-------------------------------------+--------------------------------------+ +|<<>> | Average time for failed Kerberos logins in + | milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of group resolutions +*-------------------------------------+--------------------------------------+ +|<<>> | Average time for group resolution in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Total number of group resolutions ( seconds granularity). is +| | specified by <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 50th percentile of group resolution time in milliseconds +| | ( seconds granularity). is specified by +| | <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 75th percentile of group resolution time in milliseconds +| | ( seconds granularity). is specified by +| | <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 90th percentile of group resolution time in milliseconds +| | ( seconds granularity). is specified by +| | <<>>. 
+*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 95th percentile of group resolution time in milliseconds +| | ( seconds granularity). is specified by +| | <<>>. +*-------------------------------------+--------------------------------------+ +|<<>><<>> | +| | Shows the 99th percentile of group resolution time in milliseconds +| | ( seconds granularity). is specified by +| | <<>>. +*-------------------------------------+--------------------------------------+ + +metricssystem context + +* MetricsSystem + + MetricsSystem shows the statistics for metrics snapshots and publishes. + Each metrics record contains Hostname tag as additional information + along with metrics. + +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of active metrics sources +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of metrics sources +*-------------------------------------+--------------------------------------+ +|<<>> | Current number of active sinks +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of sinks \ + | (BUT usually less than <<>>, + | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}}) +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of operations to snapshot statistics from + | a metrics source +*-------------------------------------+--------------------------------------+ +|<<>> | Average time in milliseconds to snapshot statistics + | from a metrics source +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of operations to publish statistics to a + | sink +*-------------------------------------+--------------------------------------+ 
+|<<>> | Average time in milliseconds to publish statistics to + | a sink +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of dropped publishes +*-------------------------------------+--------------------------------------+ +|<<>><<>> | Total number of sink operations for the + | +*-------------------------------------+--------------------------------------+ +|<<>><<>> | Average time in milliseconds of sink + | operations for the +*-------------------------------------+--------------------------------------+ +|<<>><<>> | Total number of dropped sink operations + | for the +*-------------------------------------+--------------------------------------+ +|<<>><<>> | Current queue length of sink operations \ + | (BUT always set to 0 because nothing + | increments this metric, see + | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}}) +*-------------------------------------+--------------------------------------+ + +default context + +* StartupProgress + + StartupProgress metrics show the statistics of NameNode startup. + Four metrics are exposed for each startup phase based on its name. + The startup phases are <<>>, <<>>, + <<>>, and <<>>. + Each metrics record contains Hostname tag as additional information + along with metrics. 
+ +*-------------------------------------+--------------------------------------+ +|| Name || Description +*-------------------------------------+--------------------------------------+ +|<<>> | Total elapsed time in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Current rate completed in NameNode startup progress \ + | (The max value is not 100 but 1.0) +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of steps completed in the phase +*-------------------------------------+--------------------------------------+ +|<<>> | Total elapsed time in the phase in milliseconds +*-------------------------------------+--------------------------------------+ +|<<>> | Total number of steps in the phase +*-------------------------------------+--------------------------------------+ +|<<>> | Current rate completed in the phase \ + | (The max value is not 100 but 1.0) +*-------------------------------------+--------------------------------------+ diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml index a4e8799a454..b846b20a536 100644 --- a/hadoop-project/src/site/site.xml +++ b/hadoop-project/src/site/site.xml @@ -136,6 +136,7 @@ +