HBASE-6701 Revisit thrust of paragraph on splitting (Misty Stanley-Jones)

Michael Stack 2014-06-02 09:52:01 -07:00
parent 768c4d6775
commit db9cb9ca08
1 changed file with 85 additions and 92 deletions


index e70ebc6..96f8c27 100644
@@ -1194,7 +1194,7 @@
   xml:id="recommended_configurations.zk">
   <title>ZooKeeper Configuration</title>
   <section
-    xml:id="zookeeper.session.timeout">
+    xml:id="sect.zookeeper.session.timeout">
     <title><varname>zookeeper.session.timeout</varname></title>
     <para>The default timeout is three minutes (specified in milliseconds). This means that if
       a server crashes, it will be three minutes before the Master notices the crash and
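
For context, the zookeeper.session.timeout value discussed in this hunk is set in hbase-site.xml. A minimal sketch follows; the one-minute value is illustrative only and is not something prescribed by this commit:

<property>
  <name>zookeeper.session.timeout</name>
  <!-- Milliseconds; 60000 = 1 minute. A lower value lets the Master notice a dead
       RegionServer sooner, but setting it too low risks false session expirations
       during long GC pauses or heavy load. -->
  <value>60000</value>
</property>
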
@@ -1295,41 +1295,52 @@
 <section
   xml:id="disable.splitting">
   <title>Managed Splitting</title>
-  <para> Rather than let HBase auto-split your Regions, manage the splitting manually <footnote>
-      <para>What follows is taken from the javadoc at the head of the
-        <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> tool added to
-        HBase post-0.90.0 release. </para>
-    </footnote>. With growing amounts of data, splits will continually be needed. Since you
-    always know exactly what regions you have, long-term debugging and profiling is much
-    easier with manual splits. It is hard to trace the logs to understand region level
-    problems if it keeps splitting and getting renamed. Data offlining bugs + unknown number
-    of split regions == oh crap! If an <classname>HLog</classname> or
-    <classname>StoreFile</classname> was mistakenly unprocessed by HBase due to a weird bug
-    and you notice it a day or so later, you can be assured that the regions specified in
-    these files are the same as the current regions and you have less headaches trying to
-    restore/replay your data. You can finely tune your compaction algorithm. With roughly
-    uniform data growth, it's easy to cause split / compaction storms as the regions all
-    roughly hit the same data size at the same time. With manual splits, you can let
-    staggered, time-based major compactions spread out your network IO load. </para>
-  <para> How do I turn off automatic splitting? Automatic splitting is determined by the
-    configuration value <code>hbase.hregion.max.filesize</code>. It is not recommended that
-    you set this to <varname>Long.MAX_VALUE</varname> in case you forget about manual splits.
-    A suggested setting is 100GB, which would result in > 1hr major compactions if reached. </para>
-  <para>What's the optimal number of pre-split regions to create? Mileage will vary depending
-    upon your application. You could start low with 10 pre-split regions / server and watch as
-    data grows over time. It's better to err on the side of too little regions and rolling
-    split later. A more complicated answer is that this depends upon the largest storefile in
-    your region. With a growing data size, this will get larger over time. You want the
-    largest region to be just big enough that the <classname>Store</classname> compact
-    selection algorithm only compacts it due to a timed major. If you don't, your cluster can
-    be prone to compaction storms as the algorithm decides to run major compactions on a large
-    series of regions all at once. Note that compaction storms are due to the uniform data
-    growth, not the manual split decision. </para>
-  <para> If you pre-split your regions too thin, you can increase the major compaction
-    interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. If your
-    data size grows too large, use the (post-0.90.0 HBase)
-    <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname> script to perform a
-    network IO safe rolling split of all regions. </para>
+  <para>HBase generally handles splitting your regions, based upon the settings in your
+    <filename>hbase-default.xml</filename> and <filename>hbase-site.xml</filename>
+    configuration files. Important settings include
+    <varname>hbase.regionserver.region.split.policy</varname>,
+    <varname>hbase.hregion.max.filesize</varname>,
+    <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
+    is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
+    For most use patterns, most of the time, you should use automatic splitting.</para>
+  <para>Instead of allowing HBase to split your regions automatically, you can choose to
+    manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
+    splits works if you know your keyspace well, otherwise let HBase figure where to split for you.
+    Manual splitting can mitigate region creation and movement under load. It also makes it so
+    region boundaries are known and invariant (if you disable region splitting). If you use manual
+    splits, it is easier doing staggered, time-based major compactions to spread out your network IO
+    load.</para>
+
+  <formalpara>
+    <title>Disable Automatic Splitting</title>
+    <para>To disable automatic splitting, set <varname>hbase.hregion.max.filesize</varname> to
+      a very large value, such as <literal>100 GB</literal>. It is not recommended to set it to
+      its absolute maximum value of <literal>Long.MAX_VALUE</literal>.</para>
+  </formalpara>
+  <note>
+    <title>Automatic Splitting Is Recommended</title>
+    <para>If you disable automatic splits to diagnose a problem or during a period of fast
+      data growth, it is recommended to re-enable them when your situation becomes more
+      stable. The potential benefits of managing region splits yourself are not
+      undisputed.</para>
+  </note>
+  <formalpara>
+    <title>Determine the Optimal Number of Pre-Split Regions</title>
+    <para>The optimal number of pre-split regions depends on your application and environment.
+      A good rule of thumb is to start with 10 pre-split regions per server and watch as data
+      grows over time. It is better to err on the side of too few regions and perform rolling
+      splits later. The optimal number of regions depends upon the largest StoreFile in your
+      region. The size of the largest StoreFile will increase with time if the amount of data
+      grows. The goal is for the largest region to be just large enough that the compaction
+      selection algorithm only compacts it during a timed major compaction. Otherwise, the
+      cluster can be prone to compaction storms where a large number of regions are under
+      compaction at the same time. It is important to understand that the data growth causes
+      compaction storms, and not the manual split decision.</para>
+  </formalpara>
+  <para>If the regions are split into too many large regions, you can increase the major
+    compaction interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>.
+    HBase 0.90 introduced <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>,
+    which provides a network-IO-safe rolling split of all regions.</para>
 </section>
 <section
   xml:id="managed.compactions">
@@ -1356,62 +1367,44 @@
       <varname>mapreduce.reduce.speculative</varname> to false. </para>
   </section>
 </section>
-<section
-  xml:id="other_configuration">
-  <title>Other Configurations</title>
-  <section
-    xml:id="balancer_config">
-    <title>Balancer</title>
-    <para>The balancer is a periodic operation which is run on the master to redistribute
-      regions on the cluster. It is configured via <varname>hbase.balancer.period</varname> and
-      defaults to 300000 (5 minutes). </para>
-    <para>See <xref
-        linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
+<section xml:id="other_configuration"><title>Other Configurations</title>
+  <section xml:id="balancer_config"><title>Balancer</title>
+    <para>The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via
+      <varname>hbase.balancer.period</varname> and defaults to 300000 (5 minutes). </para>
+    <para>See <xref linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer.
     </para>
   </section>
-  <section
-    xml:id="disabling.blockcache">
-    <title>Disabling Blockcache</title>
-    <para>Do not turn off block cache (You'd do it by setting
-      <varname>hbase.block.cache.size</varname> to zero). Currently we do not do well if you
-      do this because the regionserver will spend all its time loading hfile indices over and
-      over again. If your working set it such that block cache does you no good, at least size
-      the block cache such that hfile indices will stay up in the cache (you can get a rough
-      idea on the size you need by surveying regionserver UIs; you'll see index block size
-      accounted near the top of the webpage).</para>
+  <section xml:id="disabling.blockcache"><title>Disabling Blockcache</title>
+    <para>Do not turn off block cache (You'd do it by setting <varname>hbase.block.cache.size</varname> to zero).
+      Currently we do not do well if you do this because the regionserver will spend all its time loading hfile
+      indices over and over again. In fact, in later versions of HBase, it is not possible to disable the
+      block cache completely.
+      HBase will cache meta blocks -- the INDEX and BLOOM blocks -- even if the block cache
+      is disabled.</para>
   </section>
-  <section
-    xml:id="nagles">
-    <title><link
-        xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small
-      package problem</title>
-    <para>If a big 40ms or so occasional delay is seen in operations against HBase, try the
-      Nagles' setting. For example, see the user mailing list thread, <link
-        xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent
-      scan performance with caching set to 1</link> and the issue cited therein where setting
-      notcpdelay improved scan speeds. You might also see the graphs on the tail of <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner
-      caching to a better default</link> where our Lars Hofhansl tries various data sizes w/
-      Nagle's on and off measuring the effect.</para>
+  <section xml:id="nagles">
+    <title><link xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small package problem</title>
+    <para>If a big 40ms or so occasional delay is seen in operations against HBase,
+      try the Nagles' setting. For example, see the user mailing list thread,
+      <link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
+      and the issue cited therein where setting notcpdelay improved scan speeds. You might also
+      see the graphs on the tail of <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner caching to a better default</link>
+      where our Lars Hofhansl tries various data sizes w/ Nagle's on and off measuring the effect.</para>
   </section>
-  <section
-    xml:id="mttr">
+  <section xml:id="mttr">
     <title>Better Mean Time to Recover (MTTR)</title>
-    <para>This section is about configurations that will make servers come back faster after a
-      fail. See the Deveraj Das an Nicolas Liochon blog post <link
-        xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction
-      to HBase Mean Time to Recover (MTTR)</link> for a brief introduction.</para>
-    <para>The issue <link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode
-      into loop with lease recovery requests</link> is messy but has a bunch of good
-      discussion toward the end on low timeouts and how to effect faster recovery including
-      citation of fixes added to HDFS. Read the Varun Sharma comments. The below suggested
-      configurations are Varun's suggestions distilled and tested. Make sure you are running on
-      a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help
-      HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and
-      late hadoop 1 has some). Set the following in the RegionServer. </para>
-    <programlisting><![CDATA[
+    <para>This section is about configurations that will make servers come back faster after a fail.
+      See the Devaraj Das and Nicolas Liochon blog post
+      <link xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction to HBase Mean Time to Recover (MTTR)</link>
+      for a brief introduction.</para>
+    <para>The issue <link xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode into loop with lease recovery requests</link>
+      is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes
+      added to HDFS. Read the Varun Sharma comments. The below suggested configurations are Varun's suggestions distilled and tested. Make sure you are
+      running on a late-version HDFS so you have the fixes he refers to and himself adds to HDFS that help HBase MTTR
+      (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and late hadoop 1 has some).
+      Set the following in the RegionServer.</para>
+    <programlisting>
+<![CDATA[<property>
 <property>
   <name>hbase.lease.recovery.dfs.timeout</name>
   <value>23000</value>