HBASE-6701 Revisit thrust of paragraph on splitting (Misty Stanley-Jones)

Michael Stack 2014-06-02 09:52:01 -07:00
parent 768c4d6775
commit db9cb9ca08
1 changed file with 85 additions and 92 deletions


index e70ebc6..96f8c27 100644
@@ -1194,7 +1194,7 @@
   xml:id="recommended_configurations.zk">
   <title>ZooKeeper Configuration</title>
   <section
-    xml:id="zookeeper.session.timeout">
+    xml:id="sect.zookeeper.session.timeout">
     <title><varname>zookeeper.session.timeout</varname></title>
     <para>The default timeout is three minutes (specified in milliseconds). This means that if
       a server crashes, it will be three minutes before the Master notices the crash and
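
For context, the zookeeper.session.timeout value discussed in this hunk is set in hbase-site.xml. A minimal sketch follows; the one-minute value is illustrative only and is not something prescribed by this commit:

<property>
  <name>zookeeper.session.timeout</name>
  <!-- Milliseconds; 60000 = 1 minute. A lower value lets the Master notice a dead
       RegionServer sooner, but setting it too low risks false session expirations
       during long GC pauses or heavy load. -->
  <value>60000</value>
</property>
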
@@ -1295,41 +1295,52 @@
 <section
   xml:id="disable.splitting">
   <title>Managed Splitting</title>
-  <para> Rather than let HBase auto-split your Regions, manage the splitting manually <footnote>
-      <para>What follows is taken from the javadoc at the head of the
-        <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> tool added to
-        HBase post-0.90.0 release. </para>
-    </footnote>. With growing amounts of data, splits will continually be needed. Since you
-    always know exactly what regions you have, long-term debugging and profiling is much
-    easier with manual splits. It is hard to trace the logs to understand region level
-    problems if it keeps splitting and getting renamed. Data offlining bugs + unknown number
-    of split regions == oh crap! If an <classname>HLog</classname> or
-    <classname>StoreFile</classname> was mistakenly unprocessed by HBase due to a weird bug
-    and you notice it a day or so later, you can be assured that the regions specified in
-    these files are the same as the current regions and you have less headaches trying to
-    restore/replay your data. You can finely tune your compaction algorithm. With roughly
-    uniform data growth, it's easy to cause split / compaction storms as the regions all
-    roughly hit the same data size at the same time. With manual splits, you can let
-    staggered, time-based major compactions spread out your network IO load. </para>
-  <para> How do I turn off automatic splitting? Automatic splitting is determined by the
-    configuration value <code>hbase.hregion.max.filesize</code>. It is not recommended that
-    you set this to <varname>Long.MAX_VALUE</varname> in case you forget about manual splits.
-    A suggested setting is 100GB, which would result in > 1hr major compactions if reached. </para>
-  <para>What's the optimal number of pre-split regions to create? Mileage will vary depending
-    upon your application. You could start low with 10 pre-split regions / server and watch as
-    data grows over time. It's better to err on the side of too little regions and rolling
-    split later. A more complicated answer is that this depends upon the largest storefile in
-    your region. With a growing data size, this will get larger over time. You want the
-    largest region to be just big enough that the <classname>Store</classname> compact
-    selection algorithm only compacts it due to a timed major. If you don't, your cluster can
-    be prone to compaction storms as the algorithm decides to run major compactions on a large
-    series of regions all at once. Note that compaction storms are due to the uniform data
-    growth, not the manual split decision. </para>
-  <para> If you pre-split your regions too thin, you can increase the major compaction
-    interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. If your
-    data size grows too large, use the (post-0.90.0 HBase)
-    <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> script to perform a
-    network IO safe rolling split of all regions. </para>
+  <para>HBase generally handles splitting your regions, based upon the settings in your
+    <filename>hbase-default.xml</filename> and <filename>hbase-site.xml</filename>
+    configuration files. Important settings include
+    <varname>hbase.regionserver.region.split.policy</varname>,
+    <varname>hbase.hregion.max.filesize</varname>,
+    <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
+    is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
+    For most use patterns, most of the time, you should use automatic splitting.</para>
+  <para>Instead of allowing HBase to split your regions automatically, you can choose to
+    manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
+    splits works if you know your keyspace well, otherwise let HBase figure where to split for you.
+    Manual splitting can mitigate region creation and movement under load. It also makes it so
+    region boundaries are known and invariant (if you disable region splitting). If you use manual
+    splits, it is easier doing staggered, time-based major compactions to spread out your network IO
+    load.</para>
+
+  <formalpara>
+    <title>Disable Automatic Splitting</title>
+    <para>To disable automatic splitting, set <varname>hbase.hregion.max.filesize</varname> to
+      a very large value, such as <literal>100 GB</literal>. It is not recommended to set it to
+      its absolute maximum value of <literal>Long.MAX_VALUE</literal>.</para>
+  </formalpara>
+  <note>
+    <title>Automatic Splitting Is Recommended</title>
+    <para>If you disable automatic splits to diagnose a problem or during a period of fast
+      data growth, it is recommended to re-enable them when your situation becomes more
+      stable. The potential benefits of managing region splits yourself are not
+      undisputed.</para>
+  </note>
+  <formalpara>
+    <title>Determine the Optimal Number of Pre-Split Regions</title>
+    <para>The optimal number of pre-split regions depends on your application and environment.
+      A good rule of thumb is to start with 10 pre-split regions per server and watch as data
+      grows over time. It is better to err on the side of too few regions and perform rolling
+      splits later. The optimal number of regions depends upon the largest StoreFile in your
+      region. The size of the largest StoreFile will increase with time if the amount of data
+      grows. The goal is for the largest region to be just large enough that the compaction
+      selection algorithm only compacts it during a timed major compaction. Otherwise, the
+      cluster can be prone to compaction storms where a large number of regions are under
+      compaction at the same time. It is important to understand that the data growth causes
+      compaction storms, and not the manual split decision.</para>
+  </formalpara>
+  <para>If the regions are split into too many large regions, you can increase the major
+    compaction interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>.
+    HBase 0.90 introduced <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>,
+    which provides a network-IO-safe rolling split of all regions.</para>
 </section>
 <section
   xml:id="managed.compactions">
@@ -1356,62 +1367,44 @@
       <varname>mapreduce.reduce.speculative</varname> to false. </para>
   </section>
 </section>
-<section
-  xml:id="other_configuration">
-  <title>Other Configurations</title>
-  <section
-    xml:id="balancer_config">
-    <title>Balancer</title>
-    <para>The balancer is a periodic operation which is run on the master to redistribute
-      regions on the cluster. It is configured via <varname>hbase.balancer.period</varname> and
-      defaults to 300000 (5 minutes). </para>
-    <para>See <xref
-        linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
+<section xml:id="other_configuration"><title>Other Configurations</title>
+  <section xml:id="balancer_config"><title>Balancer</title>
+    <para>The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via
+      <varname>hbase.balancer.period</varname> and defaults to 300000 (5 minutes). </para>
+    <para>See <xref linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
     </para>
   </section>
-  <section
-    xml:id="disabling.blockcache">
-    <title>Disabling Blockcache</title>
-    <para>Do not turn off block cache (You'd do it by setting
-      <varname>hbase.block.cache.size</varname> to zero). Currently we do not do well if you
-      do this because the regionserver will spend all its time loading hfile indices over and
-      over again. If your working set it such that block cache does you no good, at least size
-      the block cache such that hfile indices will stay up in the cache (you can get a rough
-      idea on the size you need by surveying regionserver UIs; you'll see index block size
-      accounted near the top of the webpage).</para>
+  <section xml:id="disabling.blockcache"><title>Disabling Blockcache</title>
+    <para>Do not turn off block cache (You'd do it by setting <varname>hbase.block.cache.size</varname> to zero).
+      Currently we do not do well if you do this because the regionserver will spend all its time loading hfile
+      indices over and over again. In fact, in later versions of HBase, it is not possible to disable the
+      block cache completely.
+      HBase will cache meta blocks -- the INDEX and BLOOM blocks -- even if the block cache
+      is disabled.</para>
   </section>
-  <section
-    xml:id="nagles">
-    <title><link
-        xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small
-      package problem</title>
-    <para>If a big 40ms or so occasional delay is seen in operations against HBase, try the
-      Nagles' setting. For example, see the user mailing list thread, <link
-        xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent
-      scan performance with caching set to 1</link> and the issue cited therein where setting
-      notcpdelay improved scan speeds. You might also see the graphs on the tail of <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner
-      caching to a better default</link> where our Lars Hofhansl tries various data sizes w/
-      Nagle's on and off measuring the effect.</para>
+  <section xml:id="nagles">
+    <title><link xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small package problem</title>
+    <para>If a big 40ms or so occasional delay is seen in operations against HBase,
+      try the Nagles' setting. For example, see the user mailing list thread,
+      <link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
+      and the issue cited therein where setting notcpdelay improved scan speeds. You might also
+      see the graphs on the tail of <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner caching to a better default</link>
+      where our Lars Hofhansl tries various data sizes w/ Nagle's on and off measuring the effect.</para>
   </section>
-  <section
-    xml:id="mttr">
+  <section xml:id="mttr">
     <title>Better Mean Time to Recover (MTTR)</title>
-    <para>This section is about configurations that will make servers come back faster after a
-      fail. See the Deveraj Das an Nicolas Liochon blog post <link
-        xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction
-      to HBase Mean Time to Recover (MTTR)</link> for a brief introduction.</para>
-    <para>The issue <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode
-      into loop with lease recovery requests</link> is messy but has a bunch of good
-      discussion toward the end on low timeouts and how to effect faster recovery including
-      citation of fixes added to HDFS. Read the Varun Sharma comments. The below suggested
-      configurations are Varun's suggestions distilled and tested. Make sure you are running on
-      a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help
-      HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and
-      late hadoop 1 has some). Set the following in the RegionServer. </para>
-    <programlisting><![CDATA[
+    <para>This section is about configurations that will make servers come back faster after a fail.
+      See the Devaraj Das and Nicolas Liochon blog post
+      <link xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction to HBase Mean Time to Recover (MTTR)</link>
+      for a brief introduction.</para>
+    <para>The issue <link xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode into loop with lease recovery requests</link>
+      is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes
+      added to HDFS. Read the Varun Sharma comments. The below suggested configurations are Varun's suggestions distilled and tested. Make sure you are
+      running on a late-version HDFS so you have the fixes he refers to and himself adds to HDFS that help HBase MTTR
+      (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and late hadoop 1 has some).
+      Set the following in the RegionServer.</para>
+    <programlisting>
+<![CDATA[<property>
 <property>
   <name>hbase.lease.recovery.dfs.timeout</name>
   <value>23000</value>