<!-- HBASE-11607 Document HBase metrics (Misty Stanley-Jones) -->
</section>
<!-- node mgt -->

<section xml:id="hbase_metrics">
  <title>HBase Metrics</title>
  <para>HBase emits metrics which adhere to the <link
      xlink:href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html"
      >Hadoop metrics</link> API. Starting with HBase 0.95, HBase is configured to emit a default
    set of metrics with a default sampling period of every 10 seconds. You can use HBase
    metrics in conjunction with Ganglia. You can also filter which metrics are emitted and extend
    the metrics framework to capture custom metrics appropriate for your environment.</para>
  <section xml:id="metric_setup">
    <title>Metric Setup</title>
    <para>For HBase 0.94.x and earlier, see <link
        xlink:href="http://hbase.apache.org/metrics.html">Metrics</link> for an introduction to
      metrics and instructions for enabling their emission.</para>
    <para>For HBase 0.95 and newer, HBase ships with a default metrics configuration, or
      <firstterm>sink</firstterm>. This includes a wide variety of individual metrics, and emits
      them every 10 seconds by default. To configure metrics for a given region server, edit the
      <filename>conf/hadoop-metrics2-hbase.properties</filename> file. Restart the region server
      for the changes to take effect.</para>
    <para>To change the sampling rate for the default sink, edit the line beginning with
      <literal>*.period</literal>. To filter which metrics are emitted or to extend the metrics
      framework, see the <link
        xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html"
        >Hadoop Metrics2 documentation</link>.</para>
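As an illustration of the kind of entries this file contains, the sketch below changes the sampling period and forwards metrics to a hypothetical Ganglia collector. The sink class and host name are assumptions for illustration; consult the comments in the <filename>hadoop-metrics2-hbase.properties</filename> file shipped with your HBase version for the property names it actually supports.

```
# Minimal sketch of conf/hadoop-metrics2-hbase.properties (illustrative only).

# Sample all metric sources every 30 seconds instead of the default 10.
*.period=30

# Hypothetical example: forward HBase metrics to a Ganglia 3.1 collector.
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.sink.ganglia.servers=ganglia-collector.example.com:8649
```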
    <note xml:id="rs_metrics_ganglia">
      <title>HBase Metrics and Ganglia</title>
      <para>By default, HBase emits a large number of metrics per region server. Ganglia may have
        difficulty processing all these metrics. Consider increasing the capacity of the Ganglia
        server or reducing the number of metrics emitted by HBase. See <link
          xlink:href="http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html#filtering"
          >Metrics Filtering</link>.</para>
    </note>
  </section>
  <section>
    <title>Disabling Metrics</title>
    <para>To disable metrics for a region server, edit the
      <filename>conf/hadoop-metrics2-hbase.properties</filename> file and comment out any
      uncommented lines. Restart the region server for the changes to take effect.</para>
  </section>
  <section xml:id="rs_metrics">
    <title>Most Important RegionServer Metrics</title>
    <section xml:id="hbase.regionserver.blockCacheHitCachingRatio">
      <title><varname>blockCacheExpressCachingRatio</varname> (formerly
        <varname>blockCacheHitCachingRatio</varname>)</title>
      <para>Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to
        look in the cache, i.e. reads with cacheBlocks=true.</para>
    </section>
    <section xml:id="hbase.regionserver.callQueueLength">
      <title><varname>callQueueLength</varname></title>
      <para>Point in time length of the RegionServer call queue. If requests arrive faster than
        the RegionServer handlers can process them, they back up in the call queue.</para>
    </section>
    <section xml:id="hbase.regionserver.compactionQueueSize">
      <title><varname>compactionQueueLength</varname> (formerly
        <varname>compactionQueueSize</varname>)</title>
      <para>Point in time length of the compaction queue. This is the number of Stores in the
        RegionServer that have been targeted for compaction.</para>
    </section>
    <section xml:id="hbase.regionserver.flushQueueSize">
      <title><varname>flushQueueSize</varname></title>
      <para>Point in time number of regions with MemStores enqueued and awaiting flush.</para>
    </section>
    <section xml:id="hbase.regionserver.hdfsBlocksLocalityIndex">
      <title><varname>hdfsBlocksLocalityIndex</varname></title>
      <para>Point in time percentage of HDFS blocks that are local to this RegionServer. The
        higher, the better.</para>
    </section>
    <section xml:id="hbase.regionserver.memstoreSizeMB">
      <title><varname>memstoreSizeMB</varname></title>
      <para>Point in time sum of all the MemStore sizes in this RegionServer (MB). Watch for this
        nearing or exceeding the configured high-watermark for MemStore memory in the
        RegionServer.</para>
    </section>
    <section xml:id="hbase.regionserver.regions">
      <title><varname>numberOfOnlineRegions</varname></title>
      <para>Point in time number of regions served by the RegionServer. This is an important
        metric to track for RegionServer-Region density.</para>
    </section>
    <section xml:id="hbase.regionserver.readRequestsCount">
      <title><varname>readRequestsCount</varname></title>
      <para>Number of read requests for this RegionServer since startup. Note that this is a
        32-bit integer and can roll over.</para>
    </section>
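Because this counter is a 32-bit integer that can roll over, monitoring code that computes per-interval deltas should account for wraparound. A minimal sketch (not HBase code) of the usual modular-arithmetic approach:

```python
UINT32_MOD = 2 ** 32


def counter_delta(prev, curr, mod=UINT32_MOD):
    """Return the number of events between two samples of a wrapping counter.

    Assumes the counter wrapped at most once between the two samples.
    """
    return (curr - prev) % mod


# Normal case: the counter simply increased between samples.
print(counter_delta(1_000, 1_500))    # 500
# Rollover case: the counter wrapped past 2**32 between samples.
print(counter_delta(2**32 - 10, 40))  # 50
```

The same treatment applies to <varname>writeRequestsCount</varname> below, which is also a 32-bit counter.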
    <section xml:id="hbase.regionserver.slowHLogAppendCount">
      <title><varname>slowHLogAppendCount</varname></title>
      <para>Number of slow HLog append writes for this RegionServer since startup, where "slow"
        means longer than 1 second. This is a good "canary" metric for HDFS.</para>
    </section>
    <section xml:id="hbase.regionserver.usedHeapMB">
      <title><varname>usedHeapMB</varname></title>
      <para>Point in time amount of memory used by the RegionServer (MB).</para>
    </section>
    <section xml:id="hbase.regionserver.writeRequestsCount">
      <title><varname>writeRequestsCount</varname></title>
      <para>Number of write requests for this RegionServer since startup. Note that this is a
        32-bit integer and can roll over.</para>
    </section>
  </section>
  <section>
    <title>Discovering Available Metrics</title>
    <para>Rather than listing each metric which HBase emits by default, you can browse through the
      available metrics, either as a JSON output or via JMX. At this time, the JSON output does
      not include the description field which is included in the JMX view. Different metrics are
      exposed for the Master process and each region server process.</para>
    <procedure>
      <title>Access a JSON Output of Available Metrics</title>
      <step>
        <para>After starting HBase, access the region server's web UI, at
          <literal>http://localhost:60030</literal> by default.</para>
      </step>
      <step>
        <para>Click the <guilabel>Metrics Dump</guilabel> link near the top. The metrics for the
          region server are presented as a dump of the JMX bean in JSON format.</para>
      </step>
      <step>
        <para>To view metrics for the Master, connect to the Master's web UI instead (defaults to
          <literal>http://localhost:60010</literal>) and click its <guilabel>Metrics
          Dump</guilabel> link.</para>
      </step>
    </procedure>
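The JSON dump can also be consumed programmatically. The following Python sketch parses a dump in the shape served by the web UI (a top-level <literal>beans</literal> array) and lists the bean names. The sample payload is illustrative, not actual HBase output; to read from a live server you would fetch the dump with an HTTP client instead of using the inline sample.

```python
import json


def list_bean_names(jmx_json_text):
    """Return the name of every MBean in a JMX JSON dump.

    The dump is a JSON object with a top-level "beans" array; each entry
    carries a "name" key alongside its metric attributes.
    """
    doc = json.loads(jmx_json_text)
    return [bean["name"] for bean in doc.get("beans", [])]


# Illustrative sample only; a real dump contains many more beans and attributes.
sample = '''
{
  "beans": [
    {"name": "Hadoop:service=HBase,name=RegionServer,sub=Server",
     "readRequestCount": 1024},
    {"name": "java.lang:type=Memory", "HeapMemoryUsage": {"used": 123456}}
  ]
}
'''

for name in list_bean_names(sample):
    print(name)
```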
    <procedure>
      <title>Browse the JMX Output of Available Metrics</title>
      <para>You can use many different tools to view JMX content by browsing MBeans. This
        procedure uses <command>jvisualvm</command>, an application usually available in the
        JDK.</para>
      <step>
        <para>Start HBase, if it is not already running.</para>
      </step>
      <step>
        <para>Run the <command>jvisualvm</command> command on a host with a GUI display. You can
          launch it from the command line or by another method appropriate for your operating
          system.</para>
      </step>
      <step>
        <para>Be sure the <guilabel>VisualVM-MBeans</guilabel> plugin is installed. Browse to
          <menuchoice>
            <guimenu>Tools</guimenu>
            <guimenuitem>Plugins</guimenuitem>
          </menuchoice>. Click <guilabel>Installed</guilabel> and check whether the plugin is
          listed. If not, click <guilabel>Available Plugins</guilabel>, select it, and click
          <guibutton>Install</guibutton>. When finished, click
          <guibutton>Close</guibutton>.</para>
      </step>
      <step>
        <para>To view details for a given HBase process, double-click the process in the
          <guilabel>Local</guilabel> sub-tree in the left-hand panel. A detailed view opens in
          the right-hand panel. Click the <guilabel>MBeans</guilabel> tab at the top of the
          right-hand panel.</para>
      </step>
      <step>
        <para>To access the HBase metrics, navigate to the appropriate sub-bean:</para>
        <itemizedlist>
          <listitem>
            <para>Master: <menuchoice>
                <guimenu>Hadoop</guimenu>
                <guisubmenu>HBase</guisubmenu>
                <guisubmenu>Master</guisubmenu>
                <guisubmenu>Server</guisubmenu>
              </menuchoice></para>
          </listitem>
          <listitem>
            <para>RegionServer: <menuchoice>
                <guimenu>Hadoop</guimenu>
                <guisubmenu>HBase</guisubmenu>
                <guisubmenu>RegionServer</guisubmenu>
                <guisubmenu>Server</guisubmenu>
              </menuchoice></para>
          </listitem>
        </itemizedlist>
      </step>
      <step>
        <para>The name of each metric and its current value are displayed in the
          <guilabel>Attributes</guilabel> tab. For a view which includes more details, including
          the description of each attribute, click the <guilabel>Metadata</guilabel> tab.</para>
      </step>
    </procedure>
  </section>
  <section xml:id="rs_metrics_other">
    <title>Other RegionServer Metrics</title>
    <section xml:id="hbase.regionserver.blockCacheCount">
      <title><varname>blockCacheCount</varname></title>
      <para>Point in time block cache item count in memory. This is the number of blocks of
        StoreFiles (HFiles) in the cache.</para>
    </section>
    <section xml:id="hbase.regionserver.blockCacheEvictedCount">
      <title><varname>blockCacheEvictedCount</varname></title>
      <para>Number of blocks that had to be evicted from the block cache due to heap size
        constraints, since RegionServer startup.</para>
    </section>
    <section xml:id="hbase.regionserver.blockCacheFree">
      <title><varname>blockCacheFreeMB</varname></title>
      <para>Point in time block cache memory available (MB).</para>
    </section>
    <section xml:id="hbase.regionserver.blockCacheHitCount">
      <title><varname>blockCacheHitCount</varname></title>
      <para>Number of blocks of StoreFiles (HFiles) read from the cache since RegionServer
        startup.</para>
    </section>
    <section xml:id="hbase.regionserver.blockCacheHitRatio">
      <title><varname>blockCacheHitRatio</varname></title>
      <para>Block cache hit ratio (0 to 100) since RegionServer startup. This includes all read
        requests, although reads with cacheBlocks=false always read from disk and are counted
        as "cache misses", which means that full-scan MapReduce jobs can affect this metric
        significantly.</para>
    </section>
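A hit ratio of this kind is conventionally derived from the hit and miss counters. A quick sketch of the standard arithmetic, with hypothetical sample values (not HBase code):

```python
def block_cache_hit_ratio(hit_count, miss_count):
    """Percentage of block reads served from the cache (0 to 100)."""
    total = hit_count + miss_count
    if total == 0:
        # No reads yet; report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * hit_count / total


# Hypothetical sample: 9,000 cache hits and 1,000 misses -> 90% hit ratio.
print(block_cache_hit_ratio(9_000, 1_000))  # 90.0
```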
    <section xml:id="hbase.regionserver.blockCacheMissCount">
      <title><varname>blockCacheMissCount</varname></title>
      <para>Number of blocks of StoreFiles (HFiles) requested but not read from the cache, since
        RegionServer startup.</para>
    </section>
    <section xml:id="hbase.regionserver.blockCacheSize">
      <title><varname>blockCacheSizeMB</varname></title>
      <para>Point in time block cache size in memory (MB), i.e. the memory in use by the
        BlockCache.</para>
    </section>
    <section xml:id="hbase.regionserver.fsPreadLatency">
      <title><varname>fsPreadLatency*</varname></title>
      <para>There are several filesystem positional read latency (ms) metrics, all measured since
        RegionServer startup.</para>
    </section>
    <section xml:id="hbase.regionserver.fsReadLatency">
      <title><varname>fsReadLatency*</varname></title>
      <para>There are several filesystem read latency (ms) metrics, all measured since
        RegionServer startup. The difficulty in interpretation is that ALL reads go into these
        metrics (e.g., single-record Gets, full table Scans), including reads required for
        compactions. These metrics are only interesting "over time", when comparing major
        releases of HBase or of your own code.</para>
    </section>
    <section xml:id="hbase.regionserver.fsWriteLatency">
      <title><varname>fsWriteLatency*</varname></title>
      <para>There are several filesystem write latency (ms) metrics, all measured since
        RegionServer startup. The difficulty in interpretation is that ALL writes go into these
        metrics (e.g., single-record Puts, full table re-writes due to compaction). These metrics
        are only interesting "over time", when comparing major releases of HBase or of your own
        code.</para>
    </section>
    <section xml:id="hbase.regionserver.stores">
      <title><varname>NumberOfStores</varname></title>
      <para>Point in time number of Stores open on the RegionServer. A Store corresponds to a
        ColumnFamily. For example, if a table (which contains the column family) has 3 regions on
        a RegionServer, there will be 3 stores open for that column family.</para>
    </section>
    <section xml:id="hbase.regionserver.storeFiles">
      <title><varname>NumberOfStorefiles</varname></title>
      <para>Point in time number of StoreFiles open on the RegionServer. A store may have more
        than one StoreFile (HFile).</para>
    </section>
    <section xml:id="hbase.regionserver.requests">
      <title><varname>requestsPerSecond</varname></title>
      <para>Point in time number of read and write requests. Requests correspond to RegionServer
        RPC calls, so a single Get results in 1 request, but a Scan with caching set to 1000
        results in 1 request for each 'next' call (i.e., not one per row). A bulk-load request
        constitutes 1 request per HFile. Because this metric is periodic, it is less useful for
        measuring activity than readRequestsCount and writeRequestsCount.</para>
    </section>
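To make the Scan accounting above concrete, here is a sketch of the request arithmetic with hypothetical numbers. It assumes each 'next' call fetches <literal>caching</literal> rows and ignores region-boundary effects, so it is an approximation rather than an exact model of HBase's RPC behavior:

```python
import math


def scan_rpc_requests(total_rows, caching):
    """Approximate number of RegionServer RPC calls a client Scan issues,
    assuming each 'next' call fetches `caching` rows."""
    return math.ceil(total_rows / caching)


# Hypothetical: scanning 100,000 rows with caching=1000 issues about 100
# RPC requests, while the same scan with caching=1 issues 100,000.
print(scan_rpc_requests(100_000, 1000))  # 100
print(scan_rpc_requests(100_000, 1))     # 100000
```

This is why a caching value that is too low inflates RPC counts, while a very high value trades RPC round-trips for client memory.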
    <section xml:id="hbase.regionserver.storeFileIndexSizeMB">
      <title><varname>storeFileIndexSizeMB</varname></title>
      <para>Point in time sum of all the StoreFile index sizes in this RegionServer (MB).</para>
    </section>
  </section>
</section>

<section xml:id="ops.monitoring">