diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml index 3fc025a8446..cfd029cf92e 100644 --- a/src/main/docbkx/ops_mgt.xml +++ b/src/main/docbkx/ops_mgt.xml @@ -11,7 +11,7 @@ /** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information + * distributed with this work forf additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance @@ -556,77 +556,115 @@ false
Metric Setup See Metrics for - an introduction and how to enable Metrics emission. + an introduction and how to enable Metrics emission. Still valid for HBase 0.94.x. + + For HBase 0.95.x and up, see
+
+ Warning To Ganglia Users + Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer which may swamp your installation. + Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics. + +
- RegionServer Metrics -
<varname>hbase.regionserver.blockCacheCount</varname> - Block cache item count in memory. This is the number of blocks of StoreFiles (HFiles) in the cache. -
-
<varname>hbase.regionserver.blockCacheEvictedCount</varname> - Number of blocks that had to be evicted from the block cache due to heap size constraints. -
-
<varname>hbase.regionserver.blockCacheFree</varname> - Block cache memory available (bytes). -
-
<varname>hbase.regionserver.blockCacheHitCachingRatio</varname> + Most Important RegionServer Metrics +
<varname>blockCacheExpressCachingRatio (formerly blockCacheHitCachingRatio)</varname> Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to look in the cache (i.e., cacheBlocks=true).
-
<varname>hbase.regionserver.blockCacheHitCount</varname> - Number of blocks of StoreFiles (HFiles) read from the cache. +
<varname>callQueueLength</varname> + Point in time length of the RegionServer call queue. If requests arrive faster than the RegionServer handlers can process + them they will back up in the callQueue.
-
<varname>hbase.regionserver.blockCacheHitRatio</varname> - Block cache hit ratio (0 to 100). Includes all read requests, although those with cacheBlocks=false - will always read from disk and be counted as a "cache miss". +
<varname>compactionQueueLength (formerly compactionQueueSize)</varname> + Point in time length of the compaction queue. This is the number of Stores in the RegionServer that have been targeted for compaction.
-
<varname>hbase.regionserver.blockCacheMissCount</varname> - Number of blocks of StoreFiles (HFiles) requested but not read from the cache. +
<varname>flushQueueSize</varname> + Point in time number of enqueued regions in the MemStore awaiting flush.
-
<varname>hbase.regionserver.blockCacheSize</varname> - Block cache size in memory (bytes). i.e., memory in use by the BlockCache +
<varname>hdfsBlocksLocalityIndex</varname> + Point in time percentage of HDFS blocks that are local to this RegionServer. The higher the better.
-
<varname>hbase.regionserver.compactionQueueSize</varname> - Size of the compaction queue. This is the number of Stores in the RegionServer that have been targeted for compaction. +
<varname>memstoreSizeMB</varname> + Point in time sum of all the memstore sizes in this RegionServer (MB). Watch for this nearing or exceeding + the configured high-watermark for MemStore memory in the RegionServer.
-
<varname>hbase.regionserver.flushQueueSize</varname> - Number of enqueued regions in the MemStore awaiting flush. +
<varname>numberOfOnlineRegions</varname> + Point in time number of regions served by the RegionServer. This is an important metric to track for RegionServer-Region density. +
-
<varname>hbase.regionserver.fsReadLatency_avg_time</varname> - Filesystem read latency (ms). This is the average time to read from HDFS. +
<varname>readRequestsCount</varname> + Number of read requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll over.
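Reviewer note: because readRequestsCount and writeRequestsCount are described as 32-bit counters that can roll, a monitoring script that subtracts two samples needs to tolerate a wrap. The following is a minimal sketch, not HBase code. The function names and the sampling interval are hypothetical, and it assumes at most one wrap between samples and no RegionServer restart in between (a restart resets the counter and would be misread as a wrap).

```python
# Illustrative only: helper names are hypothetical, not part of HBase.
COUNTER_MODULUS = 1 << 32  # a 32-bit counter wraps every 2**32 increments


def counter_delta(previous, current):
    """Return the increase between two samples of a 32-bit counter.

    Modular arithmetic gives the same answer whether the counter is
    reported as signed or unsigned, provided it wrapped at most once.
    """
    return (current - previous) % COUNTER_MODULUS


def requests_per_second(previous, current, interval_seconds):
    """Convert two counter samples into an average rate over the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

For example, two samples of readRequestsCount taken 60 seconds apart can be passed to `requests_per_second(previous, current, 60)` to obtain an average read rate for that window.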
-
<varname>hbase.regionserver.fsReadLatency_num_ops</varname> - Filesystem read operations. +
<varname>slowHLogAppendCount</varname> + Number of slow HLog append writes for this RegionServer since startup, where "slow" is > 1 second. This is + a good "canary" metric for HDFS.
-
<varname>hbase.regionserver.fsSyncLatency_avg_time</varname> - Filesystem sync latency (ms). Latency to sync the write-ahead log records to the filesystem. +
<varname>usedHeapMB</varname> + Point in time amount of memory used by the RegionServer (MB).
-
<varname>hbase.regionserver.fsSyncLatency_num_ops</varname> - Number of operations to sync the write-ahead log records to the filesystem. +
<varname>writeRequestsCount</varname> + Number of write requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll over.
-
<varname>hbase.regionserver.fsWriteLatency_avg_time</varname> - Filesystem write latency (ms). Total latency for all writers, including StoreFiles and write-head log. + +
+
+ Other RegionServer Metrics +
<varname>blockCacheCount</varname> + Point in time block cache item count in memory. This is the number of blocks of StoreFiles (HFiles) in the cache.
-
<varname>hbase.regionserver.fsWriteLatency_num_ops</varname> - Number of filesystem write operations, including StoreFiles and write-ahead log. +
<varname>blockCacheEvictedCount</varname> + Number of blocks that had to be evicted from the block cache due to heap size constraints by RegionServer since startup.
-
<varname>hbase.regionserver.memstoreSizeMB</varname> - Sum of all the memstore sizes in this RegionServer (MB) +
<varname>blockCacheFreeMB</varname> + Point in time block cache memory available (MB).
-
<varname>hbase.regionserver.regions</varname> - Number of regions served by the RegionServer +
<varname>blockCacheHitCount</varname> + Number of blocks of StoreFiles (HFiles) read from the cache by RegionServer since startup.
-
<varname>hbase.regionserver.requests</varname> - Total number of read and write requests. Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request will constitute 1 request per HFile. +
<varname>blockCacheHitRatio</varname> + Block cache hit ratio (0 to 100) from RegionServer startup. Includes all read requests, although those with cacheBlocks=false + will always read from disk and be counted as a "cache miss", which means that full-scan MapReduce jobs can affect + this metric significantly.
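Reviewer note: since blockCacheHitRatio is measured from RegionServer startup, one large scan can skew it for a long time. A ratio over a recent window can be derived from two samples of blockCacheHitCount and blockCacheMissCount. This is a minimal sketch under the assumption that the ratio is hits divided by hits plus misses, scaled to 0 to 100; the function names are hypothetical and this is not how HBase itself is confirmed to compute the metric.

```python
# Illustrative only: helper names are hypothetical, not part of HBase.


def hit_ratio_percent(hits, misses):
    """Return the cache hit ratio on a 0 to 100 scale (0.0 when idle)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


def windowed_hit_ratio(prev_hits, prev_misses, hits, misses):
    """Hit ratio for the interval between two samples of the counters.

    Assumes neither counter wrapped or reset between the two samples.
    """
    return hit_ratio_percent(hits - prev_hits, misses - prev_misses)
```

A since-startup ratio of 90% can hide a recent window at 50%, which is the kind of dip a full-scan MapReduce job reading with cacheBlocks=false would produce.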
-
<varname>hbase.regionserver.storeFileIndexSizeMB</varname> - Sum of all the StoreFile index sizes in this RegionServer (MB) +
<varname>blockCacheMissCount</varname> + Number of blocks of StoreFiles (HFiles) requested but not read from the cache from RegionServer startup.
-
<varname>hbase.regionserver.stores</varname> - Number of Stores open on the RegionServer. A Store corresponds to a ColumnFamily. For example, if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that column family. +
<varname>blockCacheSizeMB</varname> + Point in time block cache size in memory (MB), i.e., memory in use by the BlockCache.
-
<varname>hbase.regionserver.storeFiles</varname> - Number of StoreFiles open on the RegionServer. A store may have more than one StoreFile (HFile). +
<varname>fsPreadLatency*</varname> + There are several filesystem positional read latency (ms) metrics, all measured from RegionServer startup. +
+
<varname>fsReadLatency*</varname> + There are several filesystem read latency (ms) metrics, all measured from RegionServer startup. The issue with + interpretation is that ALL reads go into this metric (e.g., single-record Gets, full table Scans), including + reads required for compactions. This metric is only interesting "over time" when comparing + major releases of HBase or your own code. +
+
<varname>fsWriteLatency*</varname> + There are several filesystem write latency (ms) metrics, all measured from RegionServer startup. The issue with + interpretation is that ALL writes go into this metric (e.g., single-record Puts, full table re-writes due to compaction). + This metric is only interesting "over time" when comparing + major releases of HBase or your own code. +
+
<varname>NumberOfStores</varname> + Point in time number of Stores open on the RegionServer. A Store corresponds to a ColumnFamily. For example, + if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that + column family. +
+
<varname>NumberOfStorefiles</varname> + Point in time number of StoreFiles open on the RegionServer. A store may have more than one StoreFile (HFile). +
+
<varname>requestsPerSecond</varname> + Point in time number of read and write requests. Requests correspond to RegionServer RPC calls, + thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call + (i.e., not each row). A bulk-load request will constitute 1 request per HFile. + This metric is less interesting than readRequestsCount and writeRequestsCount in terms of measuring activity + due to this metric being periodic. +
+
<varname>storeFileIndexSizeMB</varname> + Point in time sum of all the StoreFile index sizes in this RegionServer (MB).
@@ -642,8 +680,7 @@ false HBase: - Requests - Compactions queue + See OS: