diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index 3fc025a8446..cfd029cf92e 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -11,7 +11,7 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
+ * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
@@ -556,77 +556,115 @@ false
Metric Setup
See Metrics for
- an introduction and how to enable Metrics emission.
+ an introduction and how to enable Metrics emission. This remains valid for HBase 0.94.x.
+
+ For HBase 0.95.x and up, see
+
+ Warning To Ganglia Users
+ By default, HBase emits a large number of metrics per RegionServer, which may swamp your Ganglia installation.
+ Options include increasing Ganglia server capacity or configuring HBase to emit fewer metrics.
+
+
- RegionServer Metrics
- hbase.regionserver.blockCacheCount
- Block cache item count in memory. This is the number of blocks of StoreFiles (HFiles) in the cache.
-
- hbase.regionserver.blockCacheEvictedCount
- Number of blocks that had to be evicted from the block cache due to heap size constraints.
-
- hbase.regionserver.blockCacheFree
- Block cache memory available (bytes).
-
- hbase.regionserver.blockCacheHitCachingRatio
+ Most Important RegionServer Metrics
+ blockCacheExpressCachingRatio (formerly blockCacheHitCachingRatio)
Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to look in the cache (i.e., cacheBlocks=true).
- hbase.regionserver.blockCacheHitCount
- Number of blocks of StoreFiles (HFiles) read from the cache.
+ callQueueLength
+ Point in time length of the RegionServer call queue. If requests arrive faster than the RegionServer handlers can process
+ them, they will back up in the callQueue.
- hbase.regionserver.blockCacheHitRatio
- Block cache hit ratio (0 to 100). Includes all read requests, although those with cacheBlocks=false
- will always read from disk and be counted as a "cache miss".
+ compactionQueueLength (formerly compactionQueueSize)
+ Point in time length of the compaction queue. This is the number of Stores in the RegionServer that have been targeted for compaction.
- hbase.regionserver.blockCacheMissCount
- Number of blocks of StoreFiles (HFiles) requested but not read from the cache.
+ flushQueueSize
+ Point in time number of enqueued regions in the MemStore awaiting flush.
- hbase.regionserver.blockCacheSize
- Block cache size in memory (bytes). i.e., memory in use by the BlockCache
+ hdfsBlocksLocalityIndex
+ Point in time percentage of HDFS blocks that are local to this RegionServer. The higher the better.
- hbase.regionserver.compactionQueueSize
- Size of the compaction queue. This is the number of Stores in the RegionServer that have been targeted for compaction.
+ memstoreSizeMB
+ Point in time sum of all the memstore sizes in this RegionServer (MB). Watch for this nearing or exceeding
+ the configured high-watermark for MemStore memory in the RegionServer.
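A monitoring check for memstoreSizeMB against the high-watermark could look like the sketch below. It is only an illustration: the function name, the `warn_at` threshold, and the idea of expressing the watermark as a fraction of the maximum heap are assumptions, and the actual fraction must be taken from your own RegionServer configuration.

```python
def memstore_pressure(memstore_size_mb, max_heap_mb, upper_limit_fraction, warn_at=0.9):
    """Classify MemStore usage relative to the configured high-watermark.

    memstore_size_mb     -- sampled value of memstoreSizeMB
    max_heap_mb          -- RegionServer maximum heap in MB (supplied by the caller)
    upper_limit_fraction -- assumed: high-watermark as a fraction of the heap
    warn_at              -- hypothetical threshold, as a share of the watermark
    Returns (status, usage) where usage is memstore size / watermark.
    """
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    if not 0 < upper_limit_fraction <= 1:
        raise ValueError("upper_limit_fraction must be in (0, 1]")

    watermark_mb = max_heap_mb * upper_limit_fraction
    usage = memstore_size_mb / watermark_mb

    if usage >= 1.0:
        status = "over"   # at or above the high-watermark
    elif usage >= warn_at:
        status = "warn"   # nearing the high-watermark
    else:
        status = "ok"
    return status, usage
```

For example, `memstore_pressure(3700, 8000, 0.5)` classifies a 3700 MB MemStore total against a 4000 MB watermark.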
- hbase.regionserver.flushQueueSize
- Number of enqueued regions in the MemStore awaiting flush.
+ numberOfOnlineRegions
+ Point in time number of regions served by the RegionServer. This is an important metric to track for RegionServer-Region density.
+
- hbase.regionserver.fsReadLatency_avg_time
- Filesystem read latency (ms). This is the average time to read from HDFS.
+ readRequestsCount
+ Number of read requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll over.
- hbase.regionserver.fsReadLatency_num_ops
- Filesystem read operations.
+ slowHLogAppendCount
+ Number of slow HLog append writes for this RegionServer since startup, where "slow" is > 1 second. This is
+ a good "canary" metric for HDFS.
- hbase.regionserver.fsSyncLatency_avg_time
- Filesystem sync latency (ms). Latency to sync the write-ahead log records to the filesystem.
+ usedHeapMB
+ Point in time amount of memory used by the RegionServer (MB).
- hbase.regionserver.fsSyncLatency_num_ops
- Number of operations to sync the write-ahead log records to the filesystem.
+ writeRequestsCount
+ Number of write requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll over.
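Because readRequestsCount and writeRequestsCount are cumulative 32-bit counters, a script that turns two samples into a rate has to tolerate a wrap. The following is a minimal sketch, assuming the counter wraps modulo 2^32 and wraps at most once between samples; the function names are hypothetical and not part of HBase.

```python
COUNTER_MODULUS = 2 ** 32  # assumption: a 32-bit counter wraps modulo 2^32


def counter_delta(previous, current, modulus=COUNTER_MODULUS):
    """Increase between two samples of a wrapping counter.

    Modular subtraction gives the same result whether the samples are
    read as signed or unsigned 32-bit values, provided the counter
    wrapped at most once between the two samples.
    Note: a RegionServer restart resets the counter and is
    indistinguishable from a wrap here; callers should handle restarts
    separately.
    """
    return (current - previous) % modulus


def requests_per_second(previous, current, interval_seconds):
    """Average request rate over the sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

Sampling the counter at a fixed interval and passing consecutive samples to `requests_per_second` yields a rate that does not jump when the counter rolls over.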
- hbase.regionserver.fsWriteLatency_avg_time
- Filesystem write latency (ms). Total latency for all writers, including StoreFiles and write-head log.
+
+
+
+ Other RegionServer Metrics
+ blockCacheCount
+ Point in time block cache item count in memory. This is the number of blocks of StoreFiles (HFiles) in the cache.
- hbase.regionserver.fsWriteLatency_num_ops
- Number of filesystem write operations, including StoreFiles and write-ahead log.
+ blockCacheEvictedCount
+ Number of blocks that had to be evicted from the block cache due to heap size constraints since RegionServer startup.
- hbase.regionserver.memstoreSizeMB
- Sum of all the memstore sizes in this RegionServer (MB)
+ blockCacheFreeMB
+ Point in time block cache memory available (MB).
- hbase.regionserver.regions
- Number of regions served by the RegionServer
+ blockCacheHitCount
+ Number of blocks of StoreFiles (HFiles) read from the cache since RegionServer startup.
- hbase.regionserver.requests
- Total number of read and write requests. Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request will constitute 1 request per HFile.
+ blockCacheHitRatio
+ Block cache hit ratio (0 to 100) since RegionServer startup. Includes all read requests, although those with cacheBlocks=false
+ will always read from disk and be counted as a "cache miss", which means that full-scan MapReduce jobs can affect
+ this metric significantly.
- hbase.regionserver.storeFileIndexSizeMB
- Sum of all the StoreFile index sizes in this RegionServer (MB)
+ blockCacheMissCount
+ Number of blocks of StoreFiles (HFiles) requested but not read from the cache since RegionServer startup.
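Since the hit and miss counts are cumulative, a ratio computed over a recent interval can be more telling than one computed since startup. The sketch below assumes the ratio is hits / (hits + misses) expressed as a percentage; that relationship and the function names are assumptions for illustration, not taken from the HBase source.

```python
def hit_ratio_percent(hits, misses):
    """Hit ratio on a 0 to 100 scale, or None when there were no requests."""
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


def interval_hit_ratio(previous, current):
    """Hit ratio for the interval between two samples.

    previous, current -- (hit_count, miss_count) tuples, e.g. sampled
    values of blockCacheHitCount and blockCacheMissCount.
    Assumes neither counter was reset between the samples.
    """
    hits = current[0] - previous[0]
    misses = current[1] - previous[1]
    return hit_ratio_percent(hits, misses)
```

A full-scan job that runs between two samples would show up as a low interval ratio even when the ratio since startup has barely moved.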
- hbase.regionserver.stores
- Number of Stores open on the RegionServer. A Store corresponds to a ColumnFamily. For example, if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that column family.
+ blockCacheSizeMB
+ Point in time block cache size in memory (MB), i.e., memory in use by the BlockCache.
- hbase.regionserver.storeFiles
- Number of StoreFiles open on the RegionServer. A store may have more than one StoreFile (HFile).
+ fsPreadLatency*
+ There are several filesystem positional read latency (ms) metrics, all measured from RegionServer startup.
+
+ fsReadLatency*
+ There are several filesystem read latency (ms) metrics, all measured from RegionServer startup. The issue with
+ interpretation is that ALL reads go into this metric (e.g., single-record Gets, full table Scans), including
+ reads required for compactions. This metric is only interesting "over time" when comparing
+ major releases of HBase or your own code.
+
+ fsWriteLatency*
+ There are several filesystem write latency (ms) metrics, all measured from RegionServer startup. The issue with
+ interpretation is that ALL writes go into this metric (e.g., single-record Puts, full table re-writes due to compaction).
+ This metric is only interesting "over time" when comparing
+ major releases of HBase or your own code.
+
+ NumberOfStores
+ Point in time number of Stores open on the RegionServer. A Store corresponds to a ColumnFamily. For example,
+ if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that
+ column family.
+
+ NumberOfStorefiles
+ Point in time number of StoreFiles open on the RegionServer. A store may have more than one StoreFile (HFile).
+
+ requestsPerSecond
+ Point in time number of read and write requests. Requests correspond to RegionServer RPC calls,
+ thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call
+ (i.e., not each row). A bulk-load request will constitute 1 request per HFile.
+ This metric is less interesting than readRequestsCount and writeRequestsCount for measuring activity
+ because it is periodic.
+
+ storeFileIndexSizeMB
+ Point in time sum of all the StoreFile index sizes in this RegionServer (MB).
@@ -642,8 +680,7 @@ false
HBase:
- Requests
- Compactions queue
+ See OS: