HBASE-5138 [ref manual] Add a discussion on the number of regions

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1435998 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-01-20 23:10:33 +00:00
parent c56eeb9598
commit a611da16b3
1 changed file with 32 additions and 0 deletions

@@ -1051,6 +1051,38 @@ index e70ebc6..96f8c27 100644
RegionSize can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
<section xml:id="too_many_regions">
<title>Too many regions</title>
<para>
Here are some issues you will run into when you have lots of regions per RegionServer:
<unorderedlist>
<listitem><para>
MSLAB requires 2MB per memstore (that's 2MB per family per region).
1000 regions that have 2 families each use 3.9GB of heap (1000 × 2 × 2MB = 4000MB) before storing any data. NB: the 2MB value is configurable through <varname>hbase.hregion.memstore.mslab.chunksize</varname>.
</para></listitem>
<listitem><para>If you fill all the regions at roughly the same rate, global memory usage forces tiny
flushes when you have too many regions, and those flushes in turn generate compactions.
Rewriting the same data tens of times is the last thing you want.
For example, consider filling 1000 regions (with one family each) equally, with a lower bound for global memstore
usage of 5GB (this assumes the RegionServer has a big heap).
Once usage reaches 5GB, the RegionServer force-flushes the biggest region;
at that point almost all regions hold about 5MB of data, so
it flushes only that much. After another 5MB is inserted, it flushes another
region that by then holds a bit over 5MB, and so on.
</para></listitem>
<listitem><para>The master, as currently implemented, is allergic to tons of regions, and
takes a lot of time assigning them and moving them around in batches.
The reason is that it relies heavily on ZooKeeper (ZK) and is not very asynchronous
at the moment (this could really be improved, and has been improved a bunch
in HBase 0.96).
</para></listitem>
</unorderedlist>
</para>
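<para>The MSLAB figure in the first item above is a straight multiplication. This minimal Python sketch reproduces it under the same assumption the item makes, namely that each memstore pins one MSLAB chunk. The function name <code>mslab_heap_floor</code> and its <code>chunk_size</code> parameter are illustrative; <code>chunk_size</code> stands in for the configurable 2MB value.</para>

```python
MB = 1024 * 1024
GB = 1024 * MB


def mslab_heap_floor(regions, families_per_region, chunk_size=2 * MB):
    """Heap pinned by MSLAB chunks alone, in bytes (illustrative helper).

    Assumes one chunk per memstore, and one memstore per column family
    per region, as described in the list item above.
    """
    memstores = regions * families_per_region
    return memstores * chunk_size


# The example from the text: 1000 regions with 2 families each.
floor = mslab_heap_floor(1000, 2)
print("%.1f GB" % (floor / GB))  # 3.9 GB

# Halving the chunk size halves the floor.
print("%.1f GB" % (mslab_heap_floor(1000, 2, chunk_size=1 * MB) / GB))  # 2.0 GB
```

<para>Because the floor scales with regions × families, adding column families multiplies the cost just as adding regions does.</para>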
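<para>The flush cascade in the second item can be reproduced with a small simulation. This is a deliberately simplified model, not HBase's actual flushing code: writes go round-robin to every region, and whenever total memstore usage reaches the global limit, the largest memstore is flushed. The function <code>simulate_forced_flushes</code> and its parameters are hypothetical, and all sizes are in KB.</para>

```python
def simulate_forced_flushes(num_regions, global_limit_kb, write_kb, total_writes):
    """Simplified model (an assumption, not HBase code) of forced flushes.

    Writes are spread round-robin over all regions; whenever global memstore
    usage reaches global_limit_kb, the largest memstore is flushed.
    Returns the size of every flush, in KB.
    """
    sizes = [0] * num_regions  # memstore size per region, in KB
    total = 0                  # global memstore usage, in KB
    flushes = []
    for i in range(total_writes):
        # Fill every region at the same rate.
        region = i % num_regions
        sizes[region] += write_kb
        total += write_kb
        # Global limit reached: flush the biggest region until back under it.
        while total >= global_limit_kb:
            biggest = max(range(num_regions), key=sizes.__getitem__)
            flushes.append(sizes[biggest])
            total -= sizes[biggest]
            sizes[biggest] = 0
    return flushes


# 1000 regions with one family each, a 5GB global limit, 64KB writes.
flushes = simulate_forced_flushes(1000, 5 * 1024 * 1024, 64, 400_000)
print("flushes:", len(flushes))
print("first flush: %d KB" % flushes[0])
print("largest flush: %.1f MB" % (max(flushes) / 1024))
```

<para>In this model the first forced flush is 5248KB (about 5MB), and no flush ever reaches 20MB, because the 5GB limit is shared by 1000 memstores and no single one gets close to a normal flush size before the limit trips.</para>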
<para>Another issue is the effect of the number of regions on MapReduce jobs, since reading a table with TableInputFormat by default runs one map task per region.
Keeping 5 regions per RegionServer would give a job too few maps, whereas 1000 would generate too many.
</para>
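<para>A rough count shows why both extremes hurt. The sketch assumes one map task per region (the default split behavior of TableInputFormat) and a hypothetical cluster of 10 RegionServers, each node also running 8 concurrent map slots. The names <code>map_tasks</code>, <code>map_waves</code> and <code>map_slots_per_node</code> are illustrative, not Hadoop or HBase settings.</para>

```python
import math


def map_tasks(region_servers, regions_per_server):
    """Map tasks for a full-table scan, assuming one input split
    (and so one map task) per region."""
    return region_servers * regions_per_server


def map_waves(num_maps, region_servers, map_slots_per_node):
    """Rounds of map tasks needed when each node runs map_slots_per_node
    concurrent maps (a hypothetical setting used only for illustration)."""
    return math.ceil(num_maps / (region_servers * map_slots_per_node))


# Hypothetical 10-node cluster with 8 map slots per node (80 slots total).
for per_rs in (5, 1000):
    maps = map_tasks(10, per_rs)
    print(per_rs, "regions/RS:", maps, "maps in", map_waves(maps, 10, 8), "wave(s)")
```

<para>With 5 regions per RegionServer, the 50 maps leave 30 of the 80 slots idle, while 1000 regions per RegionServer produce 10000 maps that run in 125 waves, each task paying its own startup overhead.</para>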
</section>
</section>
<section xml:id="disable.splitting">