From 7525fa93869c7343c80b7b64344dcb520b8e9fdf Mon Sep 17 00:00:00 2001
From: Misty Stanley-Jones
Date: Thu, 2 Oct 2014 09:21:58 +1000
Subject: [PATCH] HBASE-11981 Document how to find the units of measure for a given HBase metric

---
 src/main/docbkx/ops_mgt.xml | 201 ++++++------------------------------
 1 file changed, 34 insertions(+), 167 deletions(-)

diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index aafb422df56..7341ead3d15 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -985,174 +985,41 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
 which may swamp your installation. Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics.
-
- Most Important RegionServer Metrics -
- <varname>blockCacheExpressCachingRatio (formerly - blockCacheHitCachingRatio)</varname> - Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to - look in the cache (i.e., cacheBlocks=true). -
-
- <varname>callQueueLength</varname> - Point in time length of the RegionServer call queue. If requests arrive faster than - the RegionServer handlers can process them they will back up in the callQueue. -
-
- <varname>compactionQueueLength (formerly compactionQueueSize)</varname> - Point in time length of the compaction queue. This is the number of Stores in the - RegionServer that have been targeted for compaction. -
-
- <varname>flushQueueSize</varname> - Point in time number of enqueued regions in the MemStore awaiting flush. -
-
- <varname>hdfsBlocksLocalityIndex</varname> - Point in time percentage of HDFS blocks that are local to this RegionServer. The - higher the better. -
-
- <varname>memstoreSizeMB</varname> - Point in time sum of all the memstore sizes in this RegionServer (MB). Watch for this - nearing or exceeding the configured high-watermark for MemStore memory in the - RegionServer. -
-
- <varname>numberOfOnlineRegions</varname> - Point in time number of regions served by the RegionServer. This is an important - metric to track for RegionServer-Region density. -
-
- <varname>readRequestsCount</varname> - Number of read requests for this RegionServer since startup. Note: this is a 32-bit - integer and can roll. -
-
- <varname>slowHLogAppendCount</varname> - Number of slow HLog append writes for this RegionServer since startup, where "slow" is - > 1 second. This is a good "canary" metric for HDFS. -
-
- <varname>usedHeapMB</varname> - Point in time amount of memory used by the RegionServer (MB). -
-
- <varname>writeRequestsCount</varname> - Number of write requests for this RegionServer since startup. Note: this is a 32-bit - integer and can roll. -
- +
+ Units of Measure for Metrics
+ Different metrics are expressed in different units, as appropriate. Often, the unit of
+ measure is in the name (as in the metric shippedKBs). Otherwise, use the
+ following guidelines. When in doubt, you may need to examine the source for a given
+ metric.
+
+
+ Metrics that refer to a point in time are usually expressed as a timestamp.
+
+
+ Metrics that refer to an age (such as ageOfLastShippedOp) are usually
+ expressed in milliseconds.
+
+
+ Metrics that refer to memory sizes are in bytes.
+
+
+ Sizes of queues (such as sizeOfLogQueue) are expressed as the number of
+ items in the queue. To estimate the size of the queued data, multiply the number of
+ items by the block size (default is 64 MB in HDFS).
+
+
+ Metrics that count operations of a given type (such as
+ logEditsRead) are expressed as an integer.
+
+
-
- Other RegionServer Metrics -
- <varname>blockCacheCount</varname> - Point in time block cache item count in memory. This is the number of blocks of - StoreFiles (HFiles) in the cache. -
-
- <varname>blockCacheEvictedCount</varname> - Number of blocks that had to be evicted from the block cache due to heap size - constraints by RegionServer since startup. -
-
- <varname>blockCacheFreeMB</varname> - Point in time block cache memory available (MB). -
-
- <varname>blockCacheHitCount</varname> - Number of blocks of StoreFiles (HFiles) read from the cache by RegionServer since - startup. -
-
- <varname>blockCacheHitRatio</varname> - Block cache hit ratio (0 to 100) from RegionServer startup. Includes all read - requests, although those with cacheBlocks=false will always read from disk and be counted - as a "cache miss", which means that full-scan MapReduce jobs can affect this metric - significantly. -
-
- <varname>blockCacheMissCount</varname> - Number of blocks of StoreFiles (HFiles) requested but not read from the cache from - RegionServer startup. -
-
- <varname>blockCacheSizeMB</varname> - Point in time block cache size in memory (MB). i.e., memory in use by the - BlockCache -
-
- <varname>fsPreadLatency*</varname> - There are several filesystem positional read latency (ms) metrics, all measured from - RegionServer startup. -
-
- <varname>fsReadLatency*</varname> - There are several filesystem read latency (ms) metrics, all measured from RegionServer - startup. The issue with interpretation is that ALL reads go into this metric (e.g., - single-record Gets, full table Scans), including reads required for compactions. This - metric is only interesting "over time" when comparing major releases of HBase or your own - code. -
-
- <varname>fsWriteLatency*</varname> - There are several filesystem write latency (ms) metrics, all measured from - RegionServer startup. The issue with interpretation is that ALL writes go into this metric - (e.g., single-record Puts, full table re-writes due to compaction). This metric is only - interesting "over time" when comparing major releases of HBase or your own code. -
-
- <varname>NumberOfStores</varname> - Point in time number of Stores open on the RegionServer. A Store corresponds to a - ColumnFamily. For example, if a table (which contains the column family) has 3 regions on - a RegionServer, there will be 3 stores open for that column family. -
-
- <varname>NumberOfStorefiles</varname> - Point in time number of StoreFiles open on the RegionServer. A store may have more - than one StoreFile (HFile). -
-
- <varname>requestsPerSecond</varname> - Point in time number of read and write requests. Requests correspond to RegionServer - RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 - will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request - will constitute 1 request per HFile. This metric is less interesting than - readRequestsCount and writeRequestsCount in terms of measuring activity due to this metric - being periodic. -
-
- <varname>storeFileIndexSizeMB</varname> - Point in time sum of all the StoreFile index sizes in this RegionServer (MB) -
+
+ Most Important RegionServer Metrics
+ Previously, this section contained a list of the most important RegionServer metrics.
+ However, the list was extremely out of date. In some cases, the name of a given metric has
+ changed. In other cases, the metric no longer appears to be exposed. An effort is underway to
+ create automatic documentation for each metric based upon information pulled from its
+ implementation.
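
The unit-of-measure guidelines added in this patch can be read as a small set of naming heuristics. The sketch below is one illustrative way to encode them. The function `guess_unit` and its return strings are hypothetical and not part of HBase, and the only metric names used are those that appear in the patch text. It covers just the rules that can be inferred from a metric's name, and falls back to "check the source" for everything else, as the new section itself advises.

```python
import re

# Hypothetical helper, not an HBase API: encodes the name-based
# guidelines from the "Units of Measure for Metrics" section.
_SUFFIX_UNITS = [
    (re.compile(r"KBs?$"), "kilobytes"),   # e.g. shippedKBs
    (re.compile(r"MBs?$"), "megabytes"),   # e.g. memstoreSizeMB
]


def guess_unit(metric_name: str) -> str:
    """Return a best-guess unit for a metric name, or a fallback hint."""
    # Rule 1: the unit is often part of the name itself.
    for pattern, unit in _SUFFIX_UNITS:
        if pattern.search(metric_name):
            return unit

    lowered = metric_name.lower()

    # Rule 2: ages (such as ageOfLastShippedOp) are usually milliseconds.
    if lowered.startswith("ageof"):
        return "milliseconds"

    # Rule 3: queue sizes (such as sizeOfLogQueue) count items in the queue.
    if "queue" in lowered:
        return "items in queue"

    # Anything else cannot be decided from the name alone.
    return "unknown (check the metric's source)"


if __name__ == "__main__":
    for name in ("shippedKBs", "ageOfLastShippedOp", "sizeOfLogQueue", "logEditsRead"):
        print(f"{name}: {guess_unit(name)}")
```

The heuristic deliberately leaves `logEditsRead` unresolved: the patch says such counters are integers, but nothing in the name distinguishes a counter from a byte size or a timestamp, so the safe answer is to consult the metric's implementation.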