diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml index 29f244f5cff..ac45ecf5556 100644 --- a/src/main/docbkx/ops_mgt.xml +++ b/src/main/docbkx/ops_mgt.xml @@ -951,196 +951,132 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart -- -
+
HBase Metrics -
+ HBase emits metrics which adhere to the Hadoop metrics API. Starting with HBase 0.95, HBase is configured to emit a default + set of metrics with a default sampling period of 10 seconds. You can use HBase + metrics in conjunction with Ganglia. You can also filter which metrics are emitted and extend + the metrics framework to capture custom metrics appropriate for your environment. 
Metric Setup - See Metrics for an introduction and - how to enable Metrics emission. Still valid for HBase 0.94.x. - For HBase 0.95.x and up, see + For HBase 0.95 and newer, HBase ships with a default metrics configuration, or + sink. This includes a wide variety of individual metrics, and emits + them every 10 seconds by default. To configure metrics for a given region server, edit the + conf/hadoop-metrics2-hbase.properties file. Restart the region server + for the changes to take effect. + To change the sampling rate for the default sink, edit the line beginning with + *.period. To filter which metrics are emitted or to extend the metrics + framework, see + + HBase Metrics and Ganglia + By default, HBase emits a large number of metrics per region server. Ganglia may have + difficulty processing all these metrics. Consider increasing the capacity of the Ganglia + server or reducing the number of metrics emitted by HBase. See Metrics Filtering. +
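The sampling-rate change described above can be scripted. The following is a minimal sketch rather than a tool shipped with HBase: the helper name `set_sampling_period` is hypothetical, and it assumes the file uses the plain `key=value` properties syntax, with the period given as the value of the line beginning with `*.period`. It operates on the file's text only. Writing the result back to conf/hadoop-metrics2-hbase.properties and restarting the region server are still up to you.

```python
import re


def set_sampling_period(properties_text: str, seconds: int) -> str:
    """Return the text with every uncommented '*.period' line set to `seconds`.

    Hypothetical helper: assumes 'key=value' properties syntax.
    """
    # Match an uncommented line such as '*.period=10'; keep everything up to '='.
    pattern = re.compile(r"^(\s*\*\.period\s*=\s*)\S+\s*$")
    updated = []
    for line in properties_text.splitlines():
        match = pattern.match(line)
        updated.append(f"{match.group(1)}{seconds}" if match else line)
    return "\n".join(updated) + "\n"
```

Commented-out lines (those starting with `#`) are left untouched, so a disabled `*.period` entry stays disabled.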
-
- Warning To Ganglia Users - Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer - which may swamp your installation. Options include either increasing Ganglia server - capacity, or configuring HBase to emit fewer metrics. +
+ Disabling Metrics + To disable metrics for a region server, edit the + conf/hadoop-metrics2-hbase.properties file and comment out any + uncommented lines. Restart the region server for the changes to take effect.
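As an illustration of the step above, here is a small sketch that comments out every uncommented line in the text of a properties file. The function name `comment_out_all` is hypothetical and not part of HBase. It assumes that lines beginning with `#` or `!` are already comments, as in standard Java properties files, and it does not handle values continued across lines with a trailing backslash.

```python
def comment_out_all(properties_text: str) -> str:
    """Return the text with every non-blank, uncommented line prefixed by '# '.

    Hypothetical helper: treats lines starting with '#' or '!' as comments.
    """
    result = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")):
            result.append("# " + line)  # disable this setting
        else:
            result.append(line)  # blank line or existing comment: keep as-is
    return "\n".join(result) + "\n"
```

Apply it to the contents of conf/hadoop-metrics2-hbase.properties, keep a backup of the original file, and restart the region server afterwards.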
-
- Most Important RegionServer Metrics -
- <varname>blockCacheExpressCachingRatio (formerly - blockCacheHitCachingRatio)</varname> - Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to - look in the cache (i.e., cacheBlocks=true). -
-
- <varname>callQueueLength</varname> - Point in time length of the RegionServer call queue. If requests arrive faster than - the RegionServer handlers can process them they will back up in the callQueue. -
-
- <varname>compactionQueueLength (formerly compactionQueueSize)</varname> - Point in time length of the compaction queue. This is the number of Stores in the - RegionServer that have been targeted for compaction. -
-
- <varname>flushQueueSize</varname> - Point in time number of enqueued regions in the MemStore awaiting flush. -
-
- <varname>hdfsBlocksLocalityIndex</varname> - Point in time percentage of HDFS blocks that are local to this RegionServer. The - higher the better. -
-
- <varname>memstoreSizeMB</varname> - Point in time sum of all the memstore sizes in this RegionServer (MB). Watch for this - nearing or exceeding the configured high-watermark for MemStore memory in the - RegionServer. -
-
- <varname>numberOfOnlineRegions</varname> - Point in time number of regions served by the RegionServer. This is an important - metric to track for RegionServer-Region density. -
-
- <varname>readRequestsCount</varname> - Number of read requests for this RegionServer since startup. Note: this is a 32-bit - integer and can roll. -
-
- <varname>slowHLogAppendCount</varname> - Number of slow HLog append writes for this RegionServer since startup, where "slow" is - > 1 second. This is a good "canary" metric for HDFS. -
-
- <varname>usedHeapMB</varname> - Point in time amount of memory used by the RegionServer (MB). -
-
- <varname>writeRequestsCount</varname> - Number of write requests for this RegionServer since startup. Note: this is a 32-bit - integer and can roll. -
+
+ Discovering Available Metrics + Rather than listing each metric which HBase emits by default, you can browse through the + available metrics, either as a JSON output or via JMX. At this time, the JSON output does + not include the description field which is included in the JMX view. Different metrics are + exposed for the Master process and each region server process. + + Access a JSON Output of Available Metrics + + After starting HBase, access the region server's web UI, at + http://localhost:60030 by default. + + + Click the Metrics Dump link near the top. The metrics for the region server are + presented as a dump of the JMX bean in JSON format. + + + To view metrics for the Master, connect to the Master's web UI instead (defaults to + http://localhost:60010) and click its Metrics + Dump link. + + + + + Browse the JMX Output of Available Metrics + You can use many different tools to view JMX content by browsing MBeans. This + procedure uses jvisualvm, which is an application usually available in the JDK. + + + Start HBase, if it is not already running. + + + Run the jvisualvm command on a host with a GUI display. + You can launch it from the command line or by another method appropriate for your operating + system. + + + Be sure the VisualVM-MBeans plugin is installed. Browse to + Tools + Plugins + . Click Installed and check whether the plugin is + listed. If not, click Available Plugins, select it, and click + Install. When finished, click + Close. + + + To view details for a given HBase process, double-click the process in the + Local sub-tree in the left-hand panel. A detailed view opens in + the right-hand panel. Click the MBeans tab, which appears + at the top of the right-hand panel. 
+ + + To access the HBase metrics, navigate to the appropriate sub-bean: + + + Master: + Hadoop + HBase + Master + Server + + + + RegionServer: + Hadoop + HBase + RegionServer + Server + + + + + + The name of each metric and its current value are displayed in the + Attributes tab. For a view which includes more details, including + the description of each attribute, click the Metadata tab. + +
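The JSON dump described in the first procedure can also be read from a script. The sketch below rests on several assumptions that this document does not confirm: that the Metrics Dump link serves its JSON at the `/jmx` path of the web UI, that the document has a top-level `beans` list, and that each bean carries a `name` field resembling `Hadoop:service=HBase,name=RegionServer,sub=Server`. The function names are hypothetical. Check the actual dump in your browser before relying on any of this.

```python
import json
from urllib.request import urlopen


def fetch_metrics_dump(base_url: str) -> dict:
    """Download and parse the JSON metrics dump from a web UI.

    Assumption: the dump is served at '<base_url>/jmx'.
    """
    with urlopen(base_url.rstrip("/") + "/jmx") as response:
        return json.load(response)


def find_bean(dump: dict, name_fragment: str) -> dict:
    """Return the first bean whose 'name' contains name_fragment, or {}.

    Assumption: the dump looks like {"beans": [{"name": ..., ...}, ...]}.
    """
    for bean in dump.get("beans", []):
        if name_fragment in bean.get("name", ""):
            return bean
    return {}
```

For example, `find_bean(fetch_metrics_dump("http://localhost:60030"), "name=RegionServer,sub=Server")` would return the region server's Server bean if the assumptions above hold; its keys are the metric names and its values the current readings.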
-
- Other RegionServer Metrics -
- <varname>blockCacheCount</varname> - Point in time block cache item count in memory. This is the number of blocks of - StoreFiles (HFiles) in the cache. -
-
- <varname>blockCacheEvictedCount</varname> - Number of blocks that had to be evicted from the block cache due to heap size - constraints by RegionServer since startup. -
-
- <varname>blockCacheFreeMB</varname> - Point in time block cache memory available (MB). -
-
- <varname>blockCacheHitCount</varname> - Number of blocks of StoreFiles (HFiles) read from the cache by RegionServer since - startup. -
-
- <varname>blockCacheHitRatio</varname> - Block cache hit ratio (0 to 100) from RegionServer startup. Includes all read - requests, although those with cacheBlocks=false will always read from disk and be counted - as a "cache miss", which means that full-scan MapReduce jobs can affect this metric - significantly. -
-
- <varname>blockCacheMissCount</varname> - Number of blocks of StoreFiles (HFiles) requested but not read from the cache from - RegionServer startup. -
-
- <varname>blockCacheSizeMB</varname> - Point in time block cache size in memory (MB). i.e., memory in use by the - BlockCache -
-
- <varname>fsPreadLatency*</varname> - There are several filesystem positional read latency (ms) metrics, all measured from - RegionServer startup. -
-
- <varname>fsReadLatency*</varname> - There are several filesystem read latency (ms) metrics, all measured from RegionServer - startup. The issue with interpretation is that ALL reads go into this metric (e.g., - single-record Gets, full table Scans), including reads required for compactions. This - metric is only interesting "over time" when comparing major releases of HBase or your own - code. -
-
- <varname>fsWriteLatency*</varname> - There are several filesystem write latency (ms) metrics, all measured from - RegionServer startup. The issue with interpretation is that ALL writes go into this metric - (e.g., single-record Puts, full table re-writes due to compaction). This metric is only - interesting "over time" when comparing major releases of HBase or your own code. -
-
- <varname>NumberOfStores</varname> - Point in time number of Stores open on the RegionServer. A Store corresponds to a - ColumnFamily. For example, if a table (which contains the column family) has 3 regions on - a RegionServer, there will be 3 stores open for that column family. -
-
- <varname>NumberOfStorefiles</varname> - Point in time number of StoreFiles open on the RegionServer. A store may have more - than one StoreFile (HFile). -
-
- <varname>requestsPerSecond</varname> - Point in time number of read and write requests. Requests correspond to RegionServer - RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 - will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request - will constitute 1 request per HFile. This metric is less interesting than - readRequestsCount and writeRequestsCount in terms of measuring activity due to this metric - being periodic. -
-
- <varname>storeFileIndexSizeMB</varname> - Point in time sum of all the StoreFile index sizes in this RegionServer (MB) -
+
+ Most Important RegionServer Metrics + Previously, this section contained a list of the most important RegionServer metrics. + However, the list was extremely out of date. In some cases, the name of a given metric has + changed. In other cases, the metric seems to no longer be exposed. An effort is underway to + create automatic documentation for each metric based upon information pulled from its + implementation.
-
+
+