HBASE-6701 Revisit thrust of paragraph on splitting (Misty Stanley-Jones)
commit db9cb9ca08
parent 768c4d6775
@@ -1194,7 +1194,7 @@ index e70ebc6..96f8c27 100644
     xml:id="recommended_configurations.zk">
     <title>ZooKeeper Configuration</title>
     <section
-      xml:id="zookeeper.session.timeout">
+      xml:id="sect.zookeeper.session.timeout">
       <title><varname>zookeeper.session.timeout</varname></title>
       <para>The default timeout is three minutes (specified in milliseconds). This means that if
         a server crashes, it will be three minutes before the Master notices the crash and
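For reference, the three-minute default described in this hunk is configured in hbase-site.xml. A minimal sketch follows; the 180000 ms value simply restates the three-minute default from the paragraph above (3 minutes = 180000 milliseconds) and is not a change made by this commit.

  <property>
    <name>zookeeper.session.timeout</name>
    <!-- 180000 ms = 3 minutes, the default described in the hunk above -->
    <value>180000</value>
  </property>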
@@ -1295,41 +1295,52 @@ index e70ebc6..96f8c27 100644
     <section
       xml:id="disable.splitting">
       <title>Managed Splitting</title>
-      <para> Rather than let HBase auto-split your Regions, manage the splitting manually <footnote>
-          <para>What follows is taken from the javadoc at the head of the
-            <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> tool added to
-            HBase post-0.90.0 release. </para>
-        </footnote>. With growing amounts of data, splits will continually be needed. Since you
-        always know exactly what regions you have, long-term debugging and profiling is much
-        easier with manual splits. It is hard to trace the logs to understand region level
-        problems if it keeps splitting and getting renamed. Data offlining bugs + unknown number
-        of split regions == oh crap! If an <classname>HLog</classname> or
-        <classname>StoreFile</classname> was mistakenly unprocessed by HBase due to a weird bug
-        and you notice it a day or so later, you can be assured that the regions specified in
-        these files are the same as the current regions and you have less headaches trying to
-        restore/replay your data. You can finely tune your compaction algorithm. With roughly
-        uniform data growth, it's easy to cause split / compaction storms as the regions all
-        roughly hit the same data size at the same time. With manual splits, you can let
-        staggered, time-based major compactions spread out your network IO load. </para>
-      <para> How do I turn off automatic splitting? Automatic splitting is determined by the
-        configuration value <code>hbase.hregion.max.filesize</code>. It is not recommended that
-        you set this to <varname>Long.MAX_VALUE</varname> in case you forget about manual splits.
-        A suggested setting is 100GB, which would result in > 1hr major compactions if reached. </para>
-      <para>What's the optimal number of pre-split regions to create? Mileage will vary depending
-        upon your application. You could start low with 10 pre-split regions / server and watch as
-        data grows over time. It's better to err on the side of too little regions and rolling
-        split later. A more complicated answer is that this depends upon the largest storefile in
-        your region. With a growing data size, this will get larger over time. You want the
-        largest region to be just big enough that the <classname>Store</classname> compact
-        selection algorithm only compacts it due to a timed major. If you don't, your cluster can
-        be prone to compaction storms as the algorithm decides to run major compactions on a large
-        series of regions all at once. Note that compaction storms are due to the uniform data
-        growth, not the manual split decision. </para>
-      <para> If you pre-split your regions too thin, you can increase the major compaction
-        interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. If your
-        data size grows too large, use the (post-0.90.0 HBase)
-        <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> script to perform a
-        network IO safe rolling split of all regions. </para>
+      <para>HBase generally handles splitting your regions, based upon the settings in your
+        <filename>hbase-default.xml</filename> and <filename>hbase-site.xml</filename>
+        configuration files. Important settings include
+        <varname>hbase.regionserver.region.split.policy</varname>,
+        <varname>hbase.hregion.max.filesize</varname>,
+        <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
+        is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
+        For most use patterns, most of the time, you should use automatic splitting.</para>
+      <para>Instead of allowing HBase to split your regions automatically, you can choose to
+        manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
+        splits works if you know your keyspace well, otherwise let HBase figure where to split for you.
+        Manual splitting can mitigate region creation and movement under load. It also makes it so
+        region boundaries are known and invariant (if you disable region splitting). If you use manual
+        splits, it is easier doing staggered, time-based major compactions spread out your network IO
+        load.</para>
+
+      <formalpara>
+        <title>Disable Automatic Splitting</title>
+        <para>To disable automatic splitting, set <varname>hbase.hregion.max.filesize</varname> to
+          a very large value, such as <literal>100 GB</literal> It is not recommended to set it to
+          its absolute maximum value of <literal>Long.MAX_VALUE</literal>.</para>
+      </formalpara>
+      <note>
+        <title>Automatic Splitting Is Recommended</title>
+        <para>If you disable automatic splits to diagnose a problem or during a period of fast
+          data growth, it is recommended to re-enable them when your situation becomes more
+          stable. The potential benefits of managing region splits yourself are not
+          undisputed.</para>
+      </note>
+      <formalpara>
+        <title>Determine the Optimal Number of Pre-Split Regions</title>
+        <para>The optimal number of pre-split regions depends on your application and environment.
+          A good rule of thumb is to start with 10 pre-split regions per server and watch as data
+          grows over time. It is better to err on the side of too few regions and perform rolling
+          splits later. The optimal number of regions depends upon the largest StoreFile in your
+          region. The size of the largest StoreFile will increase with time if the amount of data
+          grows. The goal is for the largest region to be just large enough that the compaction
+          selection algorithm only compacts it during a timed major compaction. Otherwise, the
+          cluster can be prone to compaction storms where a large number of regions under
+          compaction at the same time. It is important to understand that the data growth causes
+          compaction storms, and not the manual split decision.</para>
+      </formalpara>
+      <para>If the regions are split into too many large regions, you can increase the major
+        compaction interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>.
+        HBase 0.90 introduced <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>,
+        which provides a network-IO-safe rolling split of all regions.</para>
     </section>
     <section
       xml:id="managed.compactions">
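For reference, the "Disable Automatic Splitting" guidance added in this hunk corresponds to an hbase-site.xml entry along the lines of the sketch below. The 100 GB figure comes from the new text; expressing it as 107374182400 bytes is simple arithmetic, and the ConstantSizeRegionSplitPolicy entry is only an illustration of the hbase.regionserver.region.split.policy setting named above, not something this commit prescribes.

  <!-- Sketch: effectively disable automatic splitting by raising the split threshold -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- 100 GB expressed in bytes (100 * 1024^3) -->
    <value>107374182400</value>
  </property>
  <!-- Illustration only: a size-based policy that splits purely on hbase.hregion.max.filesize -->
  <property>
    <name>hbase.regionserver.region.split.policy</name>
    <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
  </property>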
@@ -1356,62 +1367,44 @@ index e70ebc6..96f8c27 100644
       <varname>mapreduce.reduce.speculative</varname> to false. </para>
     </section>
   </section>
-
-  <section
-    xml:id="other_configuration">
-    <title>Other Configurations</title>
-    <section
-      xml:id="balancer_config">
-      <title>Balancer</title>
-      <para>The balancer is a periodic operation which is run on the master to redistribute
-        regions on the cluster. It is configured via <varname>hbase.balancer.period</varname> and
-        defaults to 300000 (5 minutes). </para>
-      <para>See <xref
-        linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
-      </para>
-    </section>
-    <section
-      xml:id="disabling.blockcache">
-      <title>Disabling Blockcache</title>
-      <para>Do not turn off block cache (You'd do it by setting
-        <varname>hbase.block.cache.size</varname> to zero). Currently we do not do well if you
-        do this because the regionserver will spend all its time loading hfile indices over and
-        over again. If your working set it such that block cache does you no good, at least size
-        the block cache such that hfile indices will stay up in the cache (you can get a rough
-        idea on the size you need by surveying regionserver UIs; you'll see index block size
-        accounted near the top of the webpage).</para>
-    </section>
-    <section
-      xml:id="nagles">
-      <title><link
-        xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small
-        package problem</title>
-      <para>If a big 40ms or so occasional delay is seen in operations against HBase, try the
-        Nagles' setting. For example, see the user mailing list thread, <link
-        xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent
-        scan performance with caching set to 1</link> and the issue cited therein where setting
-        notcpdelay improved scan speeds. You might also see the graphs on the tail of <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner
-        caching to a better default</link> where our Lars Hofhansl tries various data sizes w/
-        Nagle's on and off measuring the effect.</para>
-    </section>
-    <section
-      xml:id="mttr">
-      <title>Better Mean Time to Recover (MTTR)</title>
-      <para>This section is about configurations that will make servers come back faster after a
-        fail. See the Deveraj Das an Nicolas Liochon blog post <link
-        xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction
-        to HBase Mean Time to Recover (MTTR)</link> for a brief introduction.</para>
-      <para>The issue <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode
-        into loop with lease recovery requests</link> is messy but has a bunch of good
-        discussion toward the end on low timeouts and how to effect faster recovery including
-        citation of fixes added to HDFS. Read the Varun Sharma comments. The below suggested
-        configurations are Varun's suggestions distilled and tested. Make sure you are running on
-        a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help
-        HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and
-        late hadoop 1 has some). Set the following in the RegionServer. </para>
-      <programlisting><![CDATA[
+  <section xml:id="other_configuration"><title>Other Configurations</title>
+    <section xml:id="balancer_config"><title>Balancer</title>
+      <para>The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via
+        <varname>hbase.balancer.period</varname> and defaults to 300000 (5 minutes). </para>
+      <para>See <xref linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
+      </para>
+    </section>
+    <section xml:id="disabling.blockcache"><title>Disabling Blockcache</title>
+      <para>Do not turn off block cache (You'd do it by setting <varname>hbase.block.cache.size</varname> to zero).
+        Currently we do not do well if you do this because the regionserver will spend all its time loading hfile
+        indices over and over again. In fact, in later versions of HBase, it is not possible to disable the
+        block cache completely.
+        HBase will cache meta blocks -- the INDEX and BLOOM blocks -- even if the block cache
+        is disabled.</para>
+    </section>
+    <section xml:id="nagles">
+      <title><link xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small package problem</title>
+      <para>If a big 40ms or so occasional delay is seen in operations against HBase,
+        try the Nagles' setting. For example, see the user mailing list thread,
+        <link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
+        and the issue cited therein where setting notcpdelay improved scan speeds. You might also
+        see the graphs on the tail of <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner caching to a better default</link>
+        where our Lars Hofhansl tries various data sizes w/ Nagle's on and off measuring the effect.</para>
+    </section>
+    <section xml:id="mttr">
+      <title>Better Mean Time to Recover (MTTR)</title>
+      <para>This section is about configurations that will make servers come back faster after a fail.
+        See the Deveraj Das an Nicolas Liochon blog post
+        <link xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction to HBase Mean Time to Recover (MTTR)</link>
+        for a brief introduction.</para>
+      <para>The issue <link xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode into loop with lease recovery requests</link>
+        is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes
+        added to HDFS. Read the Varun Sharma comments. The below suggested configurations are Varun's suggestions distilled and tested. Make sure you are
+        running on a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help HBase MTTR
+        (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and late hadoop 1 has some).
+        Set the following in the RegionServer.</para>
+      <programlisting>
+<![CDATA[<property>
 <property>
   <name>hbase.lease.recovery.dfs.timeout</name>
   <value>23000</value>
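As a concrete reading of the balancer text in this hunk, the 5-minute default corresponds to the hbase-site.xml entry sketched below; the value restates the default (300000 ms) rather than recommending a change.

  <property>
    <name>hbase.balancer.period</name>
    <!-- 300000 ms = 5 minutes, the default stated in the text above -->
    <value>300000</value>
  </property>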
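The Nagle's discussion in this hunk notes that setting "notcpdelay" improved scan speeds but does not spell out the property names. The sketch below is an assumption: the names hbase.ipc.client.tcpnodelay and hbase.ipc.server.tcpnodelay should be verified against the hbase-default.xml shipped with your HBase version before use.

  <!-- Assumed property names -- verify against hbase-default.xml for your HBase version -->
  <property>
    <name>hbase.ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>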