<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="ops_mgt"
xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:m="http://www.w3.org/1998/Math/MathML"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:db="http://docbook.org/ns/docbook">
<!--
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<title>HBase Operational Management</title>
<para>This chapter will cover operational tools and practices required of a running HBase cluster.
The subject of operations is related to the topics of <xref linkend="trouble" />, <xref linkend="performance"/>,
and <xref linkend="configuration" /> but is a distinct topic in itself.</para>
<section xml:id="tools">
<title >HBase Tools and Utilities</title>
<para>Here we list HBase tools for administration, analysis, fixup, and
debugging.</para>
<section xml:id="driver"><title>Driver</title>
<para>There is a <code>Driver</code> class executed by the HBase jar that can be used to invoke frequently accessed utilities. For example,
<programlisting>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar
</programlisting>
... will return...
<programlisting>
An example program must be given as the first argument.
Valid program names are:
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster
export: Write table data to HDFS.
import: Import data written by Export.
importtsv: Import data in TSV format.
rowcounter: Count rows in HBase table
verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed
</programlisting>
... for allowable program names.
</para>
</section>
<section xml:id="hbck">
<title>HBase <application>hbck</application></title>
<subtitle>An <emphasis>fsck</emphasis> for your HBase install</subtitle>
<para>To run <application>hbck</application> against your HBase cluster run
<programlisting>$ ./bin/hbase hbck</programlisting>
At the end of the command's output it prints <emphasis>OK</emphasis>
or <emphasis>INCONSISTENCY</emphasis>. If your cluster reports
inconsistencies, pass <command>-details</command> to see more detail emitted.
If there are inconsistencies, run <command>hbck</command> a few times because the
inconsistency may be transient (e.g. the cluster is starting up or a region is
splitting).
Passing <command>-fix</command> may correct the inconsistency (this is
an experimental feature).
</para>
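<para>For example, a typical check-then-repair sequence might look like the following:
<programlisting>$ ./bin/hbase hbck -details
$ ./bin/hbase hbck -fix</programlisting>
Both flags are described above; re-run <command>hbck</command> afterwards to confirm the cluster reports <emphasis>OK</emphasis>.
</para>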
<para>For more information, see <xref linkend="hbck.in.depth"/>.
</para>
</section>
<section xml:id="hfile_tool2"><title>HFile Tool</title>
<para>See <xref linkend="hfile_tool" />.</para>
</section>
<section xml:id="wal_tools">
<title>WAL Tools</title>
<section xml:id="hlog_tool">
<title><classname>HLog</classname> tool</title>
<para>The main method on <classname>HLog</classname> offers manual
split and dump facilities. Pass it WALs or the product of a split, the
content of the <filename>recovered.edits</filename> directory.</para>
<para>You can get a textual dump of a WAL file content by doing the
following:<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012</code> </programlisting>The
return code will be non-zero if there are issues with the file, so you can test the
health of the file by redirecting <varname>STDOUT</varname> to
<code>/dev/null</code> and testing the program's return code.</para>
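<para>For example, a quick health check of a single WAL file might look like the following (the file path is the same illustrative path as above):
<programlisting>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 > /dev/null
$ echo $?</programlisting>
A non-zero exit status indicates problems with the file.
</para>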
<para>Similarly you can force a split of a log file directory by
doing:<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/</code></programlisting></para>
<section xml:id="hlog_tool.prettyprint">
<title><classname>HLogPrettyPrinter</classname></title>
<para><classname>HLogPrettyPrinter</classname> is a tool with configurable options to print the contents of an HLog.
</para>
</section>
</section>
</section>
<section xml:id="compression.tool"><title>Compression Tool</title>
<para>See <xref linkend="compression.test" />.</para>
</section>
<section xml:id="copytable">
<title>CopyTable</title>
<para>
CopyTable is a utility that can copy part of, or all of, a table, either to the same cluster or to another cluster. The usage is as follows:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
</programlisting>
</para>
<para>
Options:
<itemizedlist>
<listitem><varname>starttime</varname> Beginning of the time range. Without endtime, this means from starttime to forever.</listitem>
<listitem><varname>endtime</varname> End of the time range.</listitem>
<listitem><varname>versions</varname> Number of cell versions to copy.</listitem>
<listitem><varname>new.name</varname> New table's name.</listitem>
<listitem><varname>peer.adr</varname> Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent</listitem>
<listitem><varname>families</varname> Comma-separated list of ColumnFamilies to copy.</listitem>
<listitem><varname>all.cells</varname> Also copy delete markers and uncollected deleted cells (advanced option).</listitem>
</itemizedlist>
Args:
<itemizedlist>
<listitem>tablename Name of table to copy.</listitem>
</itemizedlist>
</para>
<para>Example of copying 'TestTable' to a replication peer cluster, limited to a one-hour time window:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
--starttime=1265875194289 --endtime=1265878794289
--peer.adr=server1,server2,server3:2181:/hbase TestTable</programlisting>
</para>
<note><title>Scanner Caching</title>
<para>Caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
</para>
</note>
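<para>As another illustrative sketch, the following copies only the ColumnFamily <code>cf1</code> of 'TestTable' into a new table 'TestTableCopy' on the same cluster (the table and family names are examples only):
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --families=cf1 --new.name=TestTableCopy TestTable</programlisting>
The target table must already exist with the relevant ColumnFamilies.
</para>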
<para>
See Jonathan Hsieh's <link xlink:href="http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/">Online HBase Backups with CopyTable</link> blog post for more on <command>CopyTable</command>.
</para>
</section>
<section xml:id="export">
<title>Export</title>
<para>Export is a utility that will dump the contents of a table to HDFS in a sequence file. Invoke via:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export &lt;tablename&gt; &lt;outputdir&gt; [&lt;versions&gt; [&lt;starttime&gt; [&lt;endtime&gt;]]]
</programlisting>
</para>
<para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
</para>
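<para>For example, the following sketch exports a single version of each cell from 'TestTable' for a one-hour window (the table name and timestamps are illustrative):
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export TestTable /export/TestTable 1 1265875194289 1265878794289</programlisting>
</para>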
</section>
<section xml:id="import">
<title>Import</title>
<para>Import is a utility that will load data that has been exported back into HBase. Invoke via:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import &lt;tablename&gt; &lt;inputdir&gt;
</programlisting>
</para>
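<para>For example, the following sketch loads the output of the Export example above back into a table (the paths are illustrative); the target table is assumed to already exist with the appropriate ColumnFamilies:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import TestTable /export/TestTable</programlisting>
</para>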
</section>
<section xml:id="importtsv">
<title>ImportTsv</title>
<para>ImportTsv is a utility that will load data in TSV format into HBase. It has two distinct usages: loading data from TSV format in HDFS
into HBase via Puts, and preparing StoreFiles to be loaded via the <code>completebulkload</code> utility.
</para>
<para>To load data via Puts (i.e., non-bulk loading):
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c &lt;tablename&gt; &lt;hdfs-inputdir&gt;
</programlisting>
</para>
<para>To generate StoreFiles for bulk-loading:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir &lt;tablename&gt; &lt;hdfs-data-inputdir&gt;
</programlisting>
</para>
<para>These generated StoreFiles can be loaded into HBase via <xref linkend="completebulkload"/>.
</para>
<section xml:id="importtsv.options"><title>ImportTsv Options</title>
Running ImportTsv with no arguments prints brief usage information:
<programlisting>
Usage: importtsv -Dimporttsv.columns=a,b,c &lt;tablename&gt; &lt;inputdir&gt;
Imports the given input directory of TSV data into the specified table.
The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used
as the row key for each imported record. You must specify exactly one column
to be the row key, and you must specify a column name for every column that exists in the
input data.
By default importtsv will load data directly into HBase. To instead generate
HFiles of data to prepare for a bulk data load, pass the option:
-Dimporttsv.bulk.output=/path/for/output
Note: if you do not use this option, then the target table must already exist in HBase
Other options that may be specified with -D include:
-Dimporttsv.skip.bad.lines=false - fail if encountering an invalid line
'-Dimporttsv.separator=|' - eg separate on pipes instead of tabs
-Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import
-Dimporttsv.mapper.class=my.Mapper - A user-defined Mapper to use instead of org.apache.hadoop.hbase.mapreduce.TsvImporterMapper
</programlisting>
</section>
<section xml:id="importtsv.example"><title>ImportTsv Example</title>
<para>For example, assume that we are loading data into a table called 'datatsv' with a ColumnFamily called 'd' with two columns "c1" and "c2".
</para>
<para>Assume that an input file exists as follows:
<programlisting>
row1 c1 c2
row2 c1 c2
row3 c1 c2
row4 c1 c2
row5 c1 c2
row6 c1 c2
row7 c1 c2
row8 c1 c2
row9 c1 c2
row10 c1 c2
</programlisting>
</para>
<para>For ImportTsv to use this input file, the command line needs to look like this:
<programlisting>
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2 -Dimporttsv.bulk.output=hdfs://storefileoutput datatsv hdfs://inputfile
</programlisting>
... and in this example the first column is the rowkey, which is why the HBASE_ROW_KEY is used. The second and third columns in the file will be imported as "d:c1" and "d:c2", respectively.
</para>
</section>
<section xml:id="importtsv.warning"><title>ImportTsv Warning</title>
<para>If you are preparing a lot of data for bulk loading, make sure the target HBase table is pre-split appropriately.
</para>
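<para>For example, a pre-split target table could be created in the HBase shell before running the bulk load, along these lines (the split points are illustrative):
<programlisting>hbase(main):001:0> create 'datatsv', 'd', {SPLITS => ['row3', 'row6', 'row9']}</programlisting>
</para>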
</section>
<section xml:id="importtsv.also"><title>See Also</title>
<para>For more information about bulk-loading HFiles into HBase, see <xref linkend="arch.bulk.load"/>.</para>
</section>
</section>
<section xml:id="completebulkload">
<title>CompleteBulkLoad</title>
<para>The <code>completebulkload</code> utility will move generated StoreFiles into an HBase table. This utility is often used
in conjunction with output from <xref linkend="importtsv"/>.
</para>
<para>There are two ways to invoke this utility, with explicit classname and via the driver:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles &lt;hdfs://storefileoutput&gt; &lt;tablename&gt;
</programlisting>
... and via the Driver:
<programlisting>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar completebulkload &lt;hdfs://storefileoutput&gt; &lt;tablename&gt;
</programlisting>
</para>
<para>For more information about bulk-loading HFiles into HBase, see <xref linkend="arch.bulk.load"/>.
</para>
</section>
<section xml:id="walplayer">
<title>WALPlayer</title>
<para>WALPlayer is a utility to replay WAL files into HBase.
</para>
<para>The WAL can be replayed for a set of tables or all tables, and a timerange can be provided (in milliseconds). The WAL is filtered to this set of tables. The output can optionally be mapped to another set of tables.
</para>
<para>WALPlayer can also generate HFiles for later bulk importing; in that case only a single table can be specified, and no mapping is allowed.
</para>
<para>Invoke via:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer [options] &lt;wal inputdir&gt; &lt;tables&gt; [&lt;tableMappings&gt;]
</programlisting>
</para>
<para>For example:
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2
</programlisting>
</para>
</section>
<section xml:id="rowcounter">
<title>RowCounter</title>
<para>RowCounter is a utility that will count all the rows of a table. This is a good utility to use
as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter &lt;tablename&gt; [&lt;column1&gt; &lt;column2&gt;...]
</programlisting>
</para>
<para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
</para>
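<para>For example, the following sketch counts only rows that have data in the ColumnFamily <code>info</code> of a table named 'usertable' (both names are illustrative):
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter usertable info</programlisting>
</para>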
</section>
</section> <!-- tools -->
<section xml:id="ops.regionmgt">
<title>Region Management</title>
<section xml:id="ops.regionmgt.majorcompact">
<title>Major Compaction</title>
<para>Major compactions can be requested via the HBase shell or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
</para>
<para>Note: major compactions do NOT do region merges. See <xref linkend="compaction"/> for more information about compactions.
</para>
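<para>For example, a major compaction of a single table can be requested from the shell along these lines (the table name is illustrative):
<programlisting>$ echo "major_compact 'TestTable'" | ./bin/hbase shell</programlisting>
</para>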
</section>
<section xml:id="ops.regionmgt.merge">
<title>Merge</title>
<para>Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge).</para>
<programlisting>$ bin/hbase org.apache.hadoop.hbase.util.Merge &lt;tablename&gt; &lt;region1&gt; &lt;region2&gt;
</programlisting>
<para>If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must
be run when the cluster is down.
See the <link xlink:href="http://ofps.oreilly.com/titles/9781449396107/performance.html">O'Reilly HBase Book</link> for
an example of usage.
</para>
<para>Additionally, there is a Ruby script attached to <link xlink:href="https://issues.apache.org/jira/browse/HBASE-1621">HBASE-1621</link>
for region merging.
</para>
</section>
</section>
<section xml:id="node.management"><title>Node Management</title>
<section xml:id="decommission"><title>Node Decommission</title>
<para>You can stop an individual RegionServer by running the following
script in the HBase directory on the particular node:
<programlisting>$ ./bin/hbase-daemon.sh stop regionserver</programlisting>
The RegionServer will first close all regions and then shut itself down.
On shutdown, the RegionServer's ephemeral node in ZooKeeper will expire.
The master will notice the RegionServer gone and will treat it as
a 'crashed' server; it will reassign the regions the RegionServer was carrying.
<note><title>Disable the Load Balancer before Decommissioning a node</title>
<para>If the load balancer runs while a node is shutting down, then
there could be contention between the Load Balancer and the
Master's recovery of the just decommissioned RegionServer.
Avoid any problems by disabling the balancer first.
See <xref linkend="lb" /> below.
</para>
</note>
</para>
<para>
A downside to the above stop of a RegionServer is that regions could be offline for
a good period of time. Regions are closed in order. If there are many regions on the server, the
first region to close may not be back online until all regions close and the master
notices the RegionServer's znode is gone. HBase 0.90.2 added a facility for having
a node gradually shed its load and then shut itself down: the
<filename>graceful_stop.sh</filename> script. Here is its usage:
<programlisting>$ ./bin/graceful_stop.sh
Usage: graceful_stop.sh [--config &lt;conf-dir&gt;] [--restart] [--reload] [--thrift] [--rest] &lt;hostname&gt;
thrift If we should stop/start thrift before/after the hbase stop/start
rest If we should stop/start rest before/after the hbase stop/start
restart If we should restart after graceful stop
reload Move offloaded regions back on to the stopped server
debug Move offloaded regions back on to the stopped server
hostname Hostname of server we are to stop</programlisting>
</para>
<para>
To decommission a loaded RegionServer, run the following:
<programlisting>$ ./bin/graceful_stop.sh HOSTNAME</programlisting>
where <varname>HOSTNAME</varname> is the host carrying the RegionServer
you would decommission.
<note><title>On <varname>HOSTNAME</varname></title>
<para>The <varname>HOSTNAME</varname> passed to <filename>graceful_stop.sh</filename>
must match the hostname that hbase is using to identify RegionServers.
Check the list of RegionServers in the master UI for how HBase is
referring to servers. It's usually a hostname but can also be an FQDN.
Whatever HBase is using, this is what you should pass to the
<filename>graceful_stop.sh</filename> decommission
script. If you pass IPs, the script is not yet smart enough to make
a hostname (or FQDN) out of it, and so it will fail when it checks if the server is
currently running; the graceful unloading of regions will not run.
</para>
</note> The <filename>graceful_stop.sh</filename> script will move the regions off the
decommissioned RegionServer one at a time to minimize region churn.
It will verify the region deployed in the new location before it
moves the next region, and so on, until the decommissioned server
is carrying zero regions. At this point, the <filename>graceful_stop.sh</filename> script
tells the RegionServer to <command>stop</command>. The master will at this point notice the
RegionServer gone but all regions will have already been redeployed
and because the RegionServer went down cleanly, there will be no
WAL logs to split.
<note xml:id="lb"><title>Load Balancer</title>
<para>
It is assumed that the Region Load Balancer is disabled while the
<command>graceful_stop</command> script runs (otherwise the balancer
and the decommission script will end up fighting over region deployments).
Use the shell to disable the balancer:
<programlisting>hbase(main):001:0> balance_switch false
true
0 row(s) in 0.3590 seconds</programlisting>
This turns the balancer OFF. To reenable, do:
<programlisting>hbase(main):001:0> balance_switch true
false
0 row(s) in 0.3590 seconds</programlisting>
</para>
</note>
</para>
</section>
<section xml:id="rolling">
<title>Rolling Restart</title>
<para>
You can also ask this script to restart a RegionServer after the shutdown
AND move its old regions back into place. The latter you might do to
retain data locality. A primitive rolling restart might be effected by
running something like the following:
<programlisting>$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &amp;> /tmp/log.txt &amp;
</programlisting>
Tail the output of <filename>/tmp/log.txt</filename> to follow the script's
progress. The above does RegionServers only. Be sure to disable the
load balancer before doing the above. You'd need to do the master
update separately, before you run the above script.
Here is a pseudo-script for how you might craft a rolling restart script:
<orderedlist>
<listitem><para>Untar your release, verify its configuration, and
then rsync it across the cluster. If this is 0.90.2, patch it
with HBASE-3744 and HBASE-3756.
</para>
</listitem>
<listitem>
<para>Run hbck to ensure the cluster is consistent:
<programlisting>$ ./bin/hbase hbck</programlisting>
Effect repairs if inconsistent.
</para>
</listitem>
<listitem>
<para>Restart the Master: <programlisting>$ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master</programlisting>
</para>
</listitem>
<listitem>
<para>
Disable the region balancer:<programlisting>$ echo "balance_switch false" | ./bin/hbase shell</programlisting>
</para>
</listitem>
<listitem>
<para>Run the <filename>graceful_stop.sh</filename> script per RegionServer. For example:
<programlisting>$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &amp;> /tmp/log.txt &amp;
</programlisting>
If you are running thrift or rest servers on the RegionServer, pass --thrift or --rest options (See usage
for <filename>graceful_stop.sh</filename> script).
</para>
</listitem>
<listitem>
<para>Restart the Master again. This will clear out the dead servers list and re-enable the balancer.
</para>
</listitem>
<listitem>
<para>Run hbck to ensure the cluster is consistent.
</para>
</listitem>
</orderedlist>
</para>
</section>
</section> <!-- node mgt -->
<section xml:id="hbase_metrics">
<title>HBase Metrics</title>
<section xml:id="metric_setup">
<title>Metric Setup</title>
<para>See <link xlink:href="http://hbase.apache.org/metrics.html">Metrics</link> for
an introduction and how to enable Metrics emission.
</para>
</section>
<section xml:id="rs_metrics">
<title>RegionServer Metrics</title>
<section xml:id="hbase.regionserver.blockCacheCount"><title><varname>hbase.regionserver.blockCacheCount</varname></title>
<para>Block cache item count in memory. This is the number of blocks of StoreFiles (HFiles) in the cache.</para>
</section>
<section xml:id="hbase.regionserver.blockCacheEvictedCount"><title><varname>hbase.regionserver.blockCacheEvictedCount</varname></title>
<para>Number of blocks that had to be evicted from the block cache due to heap size constraints.</para>
</section>
<section xml:id="hbase.regionserver.blockCacheFree"><title><varname>hbase.regionserver.blockCacheFree</varname></title>
<para>Block cache memory available (bytes).</para>
</section>
<section xml:id="hbase.regionserver.blockCacheHitCachingRatio"><title><varname>hbase.regionserver.blockCacheHitCachingRatio</varname></title>
<para>Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to look in the cache (i.e., cacheBlocks=true). </para>
</section>
<section xml:id="hbase.regionserver.blockCacheHitCount"><title><varname>hbase.regionserver.blockCacheHitCount</varname></title>
<para>Number of blocks of StoreFiles (HFiles) read from the cache.</para>
</section>
<section xml:id="hbase.regionserver.blockCacheHitRatio"><title><varname>hbase.regionserver.blockCacheHitRatio</varname></title>
<para>Block cache hit ratio (0 to 100). Includes all read requests, although those with cacheBlocks=false
will always read from disk and be counted as a "cache miss".</para>
</section>
<section xml:id="hbase.regionserver.blockCacheMissCount"><title><varname>hbase.regionserver.blockCacheMissCount</varname></title>
<para>Number of blocks of StoreFiles (HFiles) requested but not read from the cache.</para>
</section>
<section xml:id="hbase.regionserver.blockCacheSize"><title><varname>hbase.regionserver.blockCacheSize</varname></title>
<para>Block cache size in memory (bytes), i.e., memory in use by the BlockCache.</para>
</section>
<section xml:id="hbase.regionserver.compactionQueueSize"><title><varname>hbase.regionserver.compactionQueueSize</varname></title>
<para>Size of the compaction queue. This is the number of Stores in the RegionServer that have been targeted for compaction.</para>
</section>
<section xml:id="hbase.regionserver.flushQueueSize"><title><varname>hbase.regionserver.flushQueueSize</varname></title>
<para>Number of regions enqueued and awaiting MemStore flush.</para>
</section>
<section xml:id="hbase.regionserver.fsReadLatency_avg_time"><title><varname>hbase.regionserver.fsReadLatency_avg_time</varname></title>
<para>Filesystem read latency (ms). This is the average time to read from HDFS.</para>
</section>
<section xml:id="hbase.regionserver.fsReadLatency_num_ops"><title><varname>hbase.regionserver.fsReadLatency_num_ops</varname></title>
<para>Filesystem read operations.</para>
</section>
<section xml:id="hbase.regionserver.fsSyncLatency_avg_time"><title><varname>hbase.regionserver.fsSyncLatency_avg_time</varname></title>
<para>Filesystem sync latency (ms). Latency to sync the write-ahead log records to the filesystem.</para>
</section>
<section xml:id="hbase.regionserver.fsSyncLatency_num_ops"><title><varname>hbase.regionserver.fsSyncLatency_num_ops</varname></title>
<para>Number of operations to sync the write-ahead log records to the filesystem.</para>
</section>
<section xml:id="hbase.regionserver.fsWriteLatency_avg_time"><title><varname>hbase.regionserver.fsWriteLatency_avg_time</varname></title>
<para>Filesystem write latency (ms). Total latency for all writers, including StoreFiles and the write-ahead log.</para>
</section>
<section xml:id="hbase.regionserver.fsWriteLatency_num_ops"><title><varname>hbase.regionserver.fsWriteLatency_num_ops</varname></title>
<para>Number of filesystem write operations, including StoreFiles and write-ahead log.</para>
</section>
<section xml:id="hbase.regionserver.memstoreSizeMB"><title><varname>hbase.regionserver.memstoreSizeMB</varname></title>
<para>Sum of all the memstore sizes in this RegionServer (MB)</para>
</section>
<section xml:id="hbase.regionserver.regions"><title><varname>hbase.regionserver.regions</varname></title>
<para>Number of regions served by the RegionServer</para>
</section>
<section xml:id="hbase.regionserver.requests"><title><varname>hbase.regionserver.requests</varname></title>
<para>Total number of read and write requests. Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request will constitute 1 request per HFile.</para>
</section>
<section xml:id="hbase.regionserver.storeFileIndexSizeMB"><title><varname>hbase.regionserver.storeFileIndexSizeMB</varname></title>
<para>Sum of all the StoreFile index sizes in this RegionServer (MB)</para>
</section>
<section xml:id="hbase.regionserver.stores"><title><varname>hbase.regionserver.stores</varname></title>
<para>Number of Stores open on the RegionServer. A Store corresponds to a ColumnFamily. For example, if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that column family. </para>
</section>
<section xml:id="hbase.regionserver.storeFiles"><title><varname>hbase.regionserver.storeFiles</varname></title>
<para>Number of StoreFiles open on the RegionServer. A store may have more than one StoreFile (HFile).</para>
</section>
</section>
</section>
<section xml:id="ops.monitoring">
<title >HBase Monitoring</title>
<section xml:id="ops.monitoring.overview">
<title>Overview</title>
<para>The following metrics are arguably the most important to monitor for each RegionServer for
"macro monitoring", preferably with a system like <link xlink:href="http://opentsdb.net/">OpenTSDB</link>.
If your cluster is having performance issues it's likely that you'll see something unusual with
this group.
</para>
<para>HBase:
<itemizedlist>
<listitem>Requests</listitem>
<listitem>Compactions queue</listitem>
</itemizedlist>
</para>
<para>OS:
<itemizedlist>
<listitem>IO Wait</listitem>
<listitem>User CPU</listitem>
</itemizedlist>
</para>
<para>Java:
<itemizedlist>
<listitem>GC</listitem>
</itemizedlist>
</para>
<para>
For more information on HBase metrics, see <xref linkend="hbase_metrics"/>.
</para>
</section>
<section xml:id="ops.slow.query">
<title>Slow Query Log</title>
<para>The HBase slow query log consists of parseable JSON structures describing the properties of those client operations (Gets, Puts, Deletes, etc.) that either took too long to run, or produced too much output. The thresholds for "too long to run" and "too much output" are configurable, as described below. The output is produced inline in the main region server logs so that it is easy to discover further details from context with other logged events. It is also prepended with identifying tags <constant>(responseTooSlow)</constant>, <constant>(responseTooLarge)</constant>, <constant>(operationTooSlow)</constant>, and <constant>(operationTooLarge)</constant> in order to enable easy filtering with grep, in case the user desires to see only slow queries.
</para>
<section><title>Configuration</title>
<para>There are two configuration knobs that can be used to adjust the thresholds for when queries are logged.
</para>
<itemizedlist>
<listitem>
<varname>hbase.ipc.warn.response.time</varname> Maximum number of milliseconds that a query can be run without being logged. Defaults to 10000, or 10 seconds. Can be set to -1 to disable logging by time.
</listitem>
<listitem><varname>hbase.ipc.warn.response.size</varname> Maximum byte size of response that a query can return without being logged. Defaults to 100 megabytes. Can be set to -1 to disable logging by size.
</listitem>
</itemizedlist>
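<para>For example, to lower both thresholds, something like the following could be added to <filename>hbase-site.xml</filename> (the values shown are illustrative, not recommendations):
<programlisting>&lt;property&gt;
  &lt;name&gt;hbase.ipc.warn.response.time&lt;/name&gt;
  &lt;value&gt;5000&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;hbase.ipc.warn.response.size&lt;/name&gt;
  &lt;value&gt;52428800&lt;/value&gt;
&lt;/property&gt;</programlisting>
</para>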
</section>
<section><title>Metrics</title>
<para>The slow query log exposes two metrics to JMX.
<itemizedlist><listitem><varname>hadoop.regionserver_rpc_slowResponse</varname> a global metric reflecting the durations of all responses that triggered logging.</listitem>
<listitem><varname>hadoop.regionserver_rpc_methodName.aboveOneSec</varname> A metric reflecting the durations of all responses that lasted for more than one second.</listitem>
</itemizedlist>
</para>
</section>
<section><title>Output</title>
<para>The output is tagged with operation e.g. <constant>(operationTooSlow)</constant> if the call was a client operation, such as a Put, Get, or Delete, which we expose detailed fingerprint information for. If not, it is tagged <constant>(responseTooSlow)</constant> and still produces parseable JSON output, but with less verbose information solely regarding its duration and size in the RPC itself. <constant>TooLarge</constant> is substituted for <constant>TooSlow</constant> if the response size triggered the logging, with <constant>TooLarge</constant> appearing even in the case that both size and duration triggered logging.
</para>
</section>
<section><title>Example</title>
<para>
<programlisting>2011-09-08 10:01:25,824 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow): {"tables":{"riley2":{"puts":[{"totalColumns":11,"families":{"actions":[{"timestamp":1315501284459,"qualifier":"0","vlen":9667580},{"timestamp":1315501284459,"qualifier":"1","vlen":10122412},{"timestamp":1315501284459,"qualifier":"2","vlen":11104617},{"timestamp":1315501284459,"qualifier":"3","vlen":13430635}]},"row":"cfcd208495d565ef66e7dff9f98764da:0"}],"families":["actions"]}},"processingtimems":956,"client":"10.47.34.63:33623","starttimems":1315501284456,"queuetimems":0,"totalPuts":1,"class":"HRegionServer","responsesize":0,"method":"multiPut"}</programlisting>
</para>
<para>Note that everything inside the "tables" structure is output produced by MultiPut's fingerprint, while the rest of the information is RPC-specific, such as processing time and client IP/port. Other client operations follow the same pattern and the same general structure, with necessary differences due to the nature of the individual operations. In the case that the call is not a client operation, that detailed fingerprint information will be completely absent.
</para>
<para>This particular example indicates that the likely cause of slowness is simply a very large (on the order of 100MB) multiput, as we can tell by the "vlen," or value length, fields of each Put in the multiPut.
</para>
</section>
</section>
</section>
<section xml:id="cluster_replication">
<title>Cluster Replication</title>
<para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
</para>
</section>
<section xml:id="ops.backup">
<title >HBase Backup</title>
<para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.
Each approach has pros and cons.
</para>
<para>For additional information, see <link xlink:href="http://blog.sematext.com/2011/03/11/hbase-backup-options/">HBase Backup Options</link> over on the Sematext Blog.
</para>
<section xml:id="ops.backup.fullshutdown"><title>Full Shutdown Backup</title>
<para>Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used as a back-end analytic capacity
and not serving front-end web pages. The benefits are that the NameNode/Master and RegionServers are down, so there is no chance of missing
any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down. The steps include:
</para>
<section xml:id="ops.backup.fullshutdown.stop"><title>Stop HBase</title>
<para>
</para>
</section>
<section xml:id="ops.backup.fullshutdown.distcp"><title>Distcp</title>
<para>Distcp could be used to copy the contents of the HBase directory in HDFS either to another directory on the same cluster, or
to a different cluster.
</para>
<para>Note: Distcp works in this situation because the cluster is down and there are no in-flight edits to files.
Distcp-ing of files in the HBase directory is not generally recommended on a live cluster.
</para>
</section>
<section xml:id="ops.backup.fullshutdown.restore"><title>Restore (if needed)</title>
<para>The backup of the hbase directory from HDFS is copied onto the 'real' hbase directory via distcp. The act of copying these files
creates new HDFS metadata, which is why a restore of the NameNode edits from the time of the HBase backup isn't required for this kind of
restore, because it's a restore (via distcp) of a specific HDFS directory (i.e., the HBase part) not the entire HDFS file-system.
</para>
</section>
</section>
<section xml:id="ops.backup.live.replication"><title>Live Cluster Backup - Replication</title>
<para>This approach assumes that there is a second cluster.
See the HBase page on <link xlink:href="http://hbase.apache.org/replication.html">replication</link> for more information.
</para>
</section>
<section xml:id="ops.backup.live.copytable"><title>Live Cluster Backup - CopyTable</title>
<para>The <xref linkend="copytable" /> utility could be used either to copy data from one table to another on the
same cluster, or to copy data to a table on another cluster.
</para>
<para>Since the cluster is up, there is a risk that edits could be missed in the copy process.
</para>
</section>
<section xml:id="ops.backup.live.export"><title>Live Cluster Backup - Export</title>
<para>The <xref linkend="export" /> approach dumps the content of a table to HDFS on the same cluster. To restore the data, the
<xref linkend="import" /> utility would be used.
</para>
<para>Since the cluster is up, there is a risk that edits could be missed in the export process.
</para>
</section>
</section> <!-- backup -->
<section xml:id="ops.capacity"><title>Capacity Planning</title>
<section xml:id="ops.capacity.storage"><title>Storage</title>
<para>A common question for HBase administrators is estimating how much storage will be required for an HBase cluster.
There are several aspects to consider, the most important of which is what data will be loaded into the cluster. Start
with a solid understanding of how HBase handles data internally (KeyValue).
</para>
<section xml:id="ops.capacity.storage.kv"><title>KeyValue</title>
<para>HBase storage will be dominated by KeyValues. See <xref linkend="keyvalue" /> and <xref linkend="keysize" /> for
how HBase stores data internally.
</para>
<para>It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the
rowkey-length, ColumnFamily name-length and attribute lengths will drive the size of the database more than any other
factor.
</para>
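<para>As a rough illustrative sketch (not an exact sizing formula): a cell with a 10-byte rowkey, a 1-byte ColumnFamily name, a 2-byte qualifier, and a 100-byte value occupies approximately 4 (key length) + 4 (value length) + 2 (row length) + 10 (row) + 1 (family length) + 1 (family) + 2 (qualifier) + 8 (timestamp) + 1 (key type) + 100 (value) = 133 bytes in a StoreFile before compression, and roughly three times that on disk once HDFS block replication is factored in (see <xref linkend="ops.capacity.storage.hdfs" />).
</para>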
</section>
<section xml:id="ops.capacity.storage.sf"><title>StoreFiles and Blocks</title>
<para>KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis.
Blocks are aggregated into StoreFiles. See <xref linkend="regions.arch" />.
</para>
</section>
<section xml:id="ops.capacity.storage.hdfs"><title>HDFS Block Replication</title>
<para>Because HBase runs on top of HDFS, factor HDFS block replication into storage calculations.
</para>
</section>
</section>
<section xml:id="ops.capacity.regions"><title>Regions</title>
<para>Another common question for HBase administrators is determining the right number of regions per
RegionServer. This affects both storage and hardware planning. See <xref linkend="perf.number.of.regions" />.
</para>
</section>
</section>
</chapter>