From db9cb9ca08985ab0b12901abf66609caaac923bf Mon Sep 17 00:00:00 2001 From: Michael Stack Date: Mon, 2 Jun 2014 09:52:01 -0700 Subject: [PATCH] HBASE-6701 Revisit thrust of paragraph on splitting (Misty Stanley-Jones) --- src/main/docbkx/configuration.xml | 177 ++++++++++++++---------------- 1 file changed, 85 insertions(+), 92 deletions(-) diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml index 2fbc3e83b07..5e6bdbb88b3 100644 --- a/src/main/docbkx/configuration.xml +++ b/src/main/docbkx/configuration.xml @@ -1194,7 +1194,7 @@ index e70ebc6..96f8c27 100644 xml:id="recommended_configurations.zk"> ZooKeeper Configuration
+ xml:id="sect.zookeeper.session.timeout"> <varname>zookeeper.session.timeout</varname> The default timeout is three minutes (specified in milliseconds). This means that if a server crashes, it will be three minutes before the Master notices the crash and @@ -1295,41 +1295,52 @@ index e70ebc6..96f8c27 100644
Managed Splitting - Rather than let HBase auto-split your Regions, manage the splitting manually - What follows is taken from the javadoc at the head of the - org.apache.hadoop.hbase.util.RegionSplitter tool added to - HBase post-0.90.0 release. - . With growing amounts of data, splits will continually be needed. Since you - always know exactly what regions you have, long-term debugging and profiling is much - easier with manual splits. It is hard to trace the logs to understand region level - problems if it keeps splitting and getting renamed. Data offlining bugs + unknown number - of split regions == oh crap! If an HLog or - StoreFile was mistakenly unprocessed by HBase due to a weird bug - and you notice it a day or so later, you can be assured that the regions specified in - these files are the same as the current regions and you have less headaches trying to - restore/replay your data. You can finely tune your compaction algorithm. With roughly - uniform data growth, it's easy to cause split / compaction storms as the regions all - roughly hit the same data size at the same time. With manual splits, you can let - staggered, time-based major compactions spread out your network IO load. - How do I turn off automatic splitting? Automatic splitting is determined by the - configuration value hbase.hregion.max.filesize. It is not recommended that - you set this to Long.MAX_VALUE in case you forget about manual splits. - A suggested setting is 100GB, which would result in > 1hr major compactions if reached. - What's the optimal number of pre-split regions to create? Mileage will vary depending - upon your application. You could start low with 10 pre-split regions / server and watch as - data grows over time. It's better to err on the side of too little regions and rolling - split later. A more complicated answer is that this depends upon the largest storefile in - your region. With a growing data size, this will get larger over time. You want the - largest region to be just big enough that the Store compact - selection algorithm only compacts it due to a timed major. If you don't, your cluster can - be prone to compaction storms as the algorithm decides to run major compactions on a large - series of regions all at once. Note that compaction storms are due to the uniform data - growth, not the manual split decision. - If you pre-split your regions too thin, you can increase the major compaction - interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your - data size grows too large, use the (post-0.90.0 HBase) - org.apache.hadoop.hbase.util.RegionSplitter script to perform a - network IO safe rolling split of all regions. + HBase generally handles splitting your regions, based upon the settings in your + hbase-default.xml and hbase-site.xml + configuration files. Important settings include + hbase.regionserver.region.split.policy, + hbase.hregion.max.filesize, and + hbase.regionserver.regionSplitLimit. A simplistic view of splitting + is that when a region grows to hbase.hregion.max.filesize, it is split. + For most use patterns, you should use automatic splitting. + Instead of allowing HBase to split your regions automatically, you can choose to + manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing + splits works if you know your keyspace well; otherwise, let HBase figure out where to split for you. + Manual splitting can mitigate region creation and movement under load.
It also makes it so that + region boundaries are known and invariant (if you disable region splitting). If you use manual + splits, it is easier to do staggered, time-based major compactions to spread out your network IO + load. + + + Disable Automatic Splitting + To disable automatic splitting, set hbase.hregion.max.filesize to + a very large value, such as 100 GB. It is not recommended to set it to + its absolute maximum value of Long.MAX_VALUE. A configuration sketch appears at + the end of this section. + + + Automatic Splitting Is Recommended + If you disable automatic splits to diagnose a problem or during a period of fast + data growth, it is recommended to re-enable them when your situation becomes more + stable. The potential benefits of managing region splits yourself are + disputed. + + + Determine the Optimal Number of Pre-Split Regions + The optimal number of pre-split regions depends on your application and environment. + A good rule of thumb is to start with 10 pre-split regions per server and watch as data + grows over time. It is better to err on the side of too few regions and perform rolling + splits later. The optimal number of regions depends upon the largest StoreFile in your + region. The size of the largest StoreFile will increase with time if the amount of data + grows. The goal is for the largest region to be just large enough that the compaction + selection algorithm only compacts it during a timed major compaction. Otherwise, the + cluster can be prone to compaction storms where a large number of regions are under + compaction at the same time. It is important to understand that the uniform data growth + causes compaction storms, not the manual split decision. + + If your regions are split too thin (into too many small regions), you can increase the major + compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. + HBase 0.90 introduced org.apache.hadoop.hbase.util.RegionSplitter, + which provides a network-IO-safe rolling split of all regions.
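+ A minimal hbase-site.xml sketch of disabling automatic
+ splitting (the property names are the ones discussed above; the
+ ConstantSizeRegionSplitPolicy class name and the concrete values are
+ illustrative assumptions to verify against your HBase version):
+ <programlisting><![CDATA[
+ <!-- Raise the split threshold to 100 GB (expressed in bytes) so
+      regions effectively never split on their own. -->
+ <property>
+   <name>hbase.hregion.max.filesize</name>
+   <value>107374182400</value>
+ </property>
+ <!-- Keep the purely size-based split policy so the threshold above
+      is the only automatic trigger. -->
+ <property>
+   <name>hbase.regionserver.region.split.policy</name>
+   <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
+ </property>
+ ]]></programlisting>
+ You can then split on your own schedule, for example with the
+ RegionSplitter tool mentioned above; consult its usage output for the
+ arguments your release supports.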
@@ -1356,62 +1367,44 @@ index e70ebc6..96f8c27 100644 mapreduce.reduce.speculative to false.
- -
- Other Configurations -
- Balancer - The balancer is a periodic operation which is run on the master to redistribute - regions on the cluster. It is configured via hbase.balancer.period and - defaults to 300000 (5 minutes). - See for more information on the LoadBalancer. - -
-
- Disabling Blockcache - Do not turn off block cache (You'd do it by setting - hbase.block.cache.size to zero). Currently we do not do well if you - do this because the regionserver will spend all its time loading hfile indices over and - over again. If your working set it such that block cache does you no good, at least size - the block cache such that hfile indices will stay up in the cache (you can get a rough - idea on the size you need by surveying regionserver UIs; you'll see index block size - accounted near the top of the webpage). -
-
- <link - xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small - package problem - If a big 40ms or so occasional delay is seen in operations against HBase, try the - Nagles' setting. For example, see the user mailing list thread, Inconsistent - scan performance with caching set to 1 and the issue cited therein where setting - notcpdelay improved scan speeds. You might also see the graphs on the tail of HBASE-7008 Set scanner - caching to a better default where our Lars Hofhansl tries various data sizes w/ - Nagle's on and off measuring the effect. -
-
- Better Mean Time to Recover (MTTR) - This section is about configurations that will make servers come back faster after a - fail. See the Deveraj Das an Nicolas Liochon blog post Introduction - to HBase Mean Time to Recover (MTTR) for a brief introduction. - The issue HBASE-8354 forces Namenode - into loop with lease recovery requests is messy but has a bunch of good - discussion toward the end on low timeouts and how to effect faster recovery including - citation of fixes added to HDFS. Read the Varun Sharma comments. The below suggested - configurations are Varun's suggestions distilled and tested. Make sure you are running on - a late-version HDFS so you have the fixes he refers too and himself adds to HDFS that help - HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and - late hadoop 1 has some). Set the following in the RegionServer. - Other Configurations +
Balancer + The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via + hbase.balancer.period and defaults to 300000 (5 minutes). + See for more information on the LoadBalancer. + +
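+ A minimal hbase-site.xml sketch (the property name is from this
+ section; the 15-minute value is illustrative only):
+ <programlisting><![CDATA[
+ <!-- Run the balancer every 15 minutes instead of the default 5. -->
+ <property>
+   <name>hbase.balancer.period</name>
+   <value>900000</value>
+ </property>
+ ]]></programlisting>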
+
Disabling Blockcache + Do not turn off the block cache (you would do so by setting hbase.block.cache.size to zero). + Currently we do not do well if you do this, because the RegionServer will spend all its time loading HFile + indices over and over again. In fact, in later versions of HBase, it is not possible to disable the + block cache completely. + HBase will cache meta blocks -- the INDEX and BLOOM blocks -- even if the block cache + is disabled.
+
<link xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small packet problem + If an occasional delay of around 40ms is seen in operations against HBase, + try the Nagle's setting. For example, see the user mailing list thread, + Inconsistent scan performance with caching set to 1 + and the issue cited therein where setting tcpnodelay improved scan speeds. You might also + see the graphs at the tail of HBASE-7008 Set scanner caching to a better default + where Lars Hofhansl tries various data sizes with Nagle's on and off, measuring the effect. +
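+ A minimal hbase-site.xml sketch of turning Nagle's algorithm off for
+ HBase RPC (the hbase.ipc.client.tcpnodelay property name is an
+ assumption drawn from the threads cited above; verify it against your
+ version's hbase-default.xml):
+ <programlisting><![CDATA[
+ <!-- Send RPC packets immediately instead of buffering small writes. -->
+ <property>
+   <name>hbase.ipc.client.tcpnodelay</name>
+   <value>true</value>
+ </property>
+ ]]></programlisting>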
+
Better Mean Time to Recover (MTTR) + This section is about configurations that will make servers come back faster after a failure. + See the Devaraj Das and Nicolas Liochon blog post + Introduction to HBase Mean Time to Recover (MTTR) + for a brief introduction. + The issue HBASE-8354 forces Namenode into loop with lease recovery requests + is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery, including citation of fixes + added to HDFS. Read the Varun Sharma comments. The suggested configurations below are Varun's suggestions distilled and tested. Make sure you are + running on a late-version HDFS so you have the fixes he refers to and himself added to HDFS that help HBase MTTR + (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and late hadoop 1 has some). + Set the following in the RegionServer. +
+ <property>
+   <name>hbase.lease.recovery.dfs.timeout</name>
+   <value>23000</value>
+ </property>