HBASE-5138 [ref manual] Add a discussion on the number of regions
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1435998 13f79535-47bb-0310-9956-ffa450edef68
parent c56eeb9598
commit a611da16b3
@ -1051,6 +1051,38 @@ index e70ebc6..96f8c27 100644
RegionSize can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
<section xml:id="too_many_regions">
<title>Too many regions</title>
<para>
Here are some issues you will run into when you have lots of regions per regionserver:
<unorderedlist>
<listitem><para>
MSLAB requires 2MB per memstore (that's 2MB per family per region).
1000 regions with 2 families each use 3.9GB of heap before storing any data. NB: the 2MB value is configurable.
</para></listitem>
<listitem><para>If you fill all the regions at roughly the same rate, global memory usage forces tiny
flushes when you have too many regions, which in turn generates compactions.
Rewriting the same data tens of times is the last thing you want.
For example, consider filling 1000 regions (with one family) equally, with a lower bound for global memstore
usage of 5GB (the region server would have a big heap).
Once usage reaches 5GB, the region server force-flushes the biggest region.
At that point almost all regions hold about 5MB of data, so
that is the amount flushed. After another 5MB is inserted, it flushes another
region, which now holds a bit over 5MB of data, and so on.
</para></listitem>
<listitem><para>The master, as it stands, is allergic to tons of regions, and will
take a lot of time assigning them and moving them around in batches.
The reason is that assignment is heavy on ZK usage and not very asynchronous
at the moment (this could really be improved, and has been improved a bunch
in HBase 0.96).
</para></listitem>
</unorderedlist>
</para>
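<para>
The arithmetic behind the first two items can be sketched in a few lines of Python. This is an illustration only, not HBase code: the function names, the 100KB write step, and the "flush the biggest region until usage drops under the limit" policy are simplifying assumptions, and a real region server also applies per-region flush thresholds that are ignored here.
</para>

```python
def mslab_overhead_bytes(regions, families, chunk_bytes=2 * 1024 * 1024):
    # One MSLAB chunk per memstore, i.e. per family per region.
    # chunk_bytes defaults to the 2MB value quoted above.
    return regions * families * chunk_bytes


def simulate_flushes(regions, global_limit, step, rounds):
    # Hypothetical model: every region receives `step` bytes per round
    # (an even write load). Whenever total memstore usage reaches
    # `global_limit`, the biggest region is flushed, repeatedly, until
    # usage drops back under the limit. Returns the flushed sizes.
    memstores = [0] * regions
    flushed = []
    for _ in range(rounds):
        memstores = [size + step for size in memstores]
        while sum(memstores) >= global_limit:
            biggest = max(range(regions), key=memstores.__getitem__)
            flushed.append(memstores[biggest])
            memstores[biggest] = 0
    return flushed


if __name__ == "__main__":
    overhead = mslab_overhead_bytes(regions=1000, families=2)
    print("MSLAB overhead: %.1f GB" % (overhead / 1024.0 ** 3))

    # 1000 regions, 5GB global limit (decimal units keep the numbers round).
    sizes = simulate_flushes(regions=1000, global_limit=5 * 1000 ** 3,
                             step=100 * 1000, rounds=60)
    print("flushes: %d" % len(sizes))
    print("first flush: %.1f MB" % (sizes[0] / 1e6))
    print("largest flush: %.1f MB" % (max(sizes) / 1e6))
```

<para>
Under these assumptions every flush in the early phase is only a few megabytes, far smaller than a memstore that is allowed to fill up on its own, which is why so many small files and compactions follow.
</para>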
<para>Another issue is the effect of the number of regions on MapReduce jobs,
which typically get one map task per region.
Keeping only 5 regions per regionserver gives a job too few map tasks, whereas 1000 generates too many.
</para>
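<para>
As a rough illustration of the scale involved, the sketch below counts map tasks for a table scan. It assumes one map task per region and a hypothetical 10-node cluster; neither the function name nor the cluster size comes from the text above.
</para>

```python
def map_task_count(region_servers, regions_per_server):
    # Assumption: the job gets exactly one map task per region.
    return region_servers * regions_per_server


if __name__ == "__main__":
    # Hypothetical 10-regionserver cluster.
    for per_server in (5, 1000):
        print("%d regions/RS: %d map tasks"
              % (per_server, map_task_count(10, per_server)))
```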
</section>
</section>
<section xml:id="disable.splitting">