More edits to the 'how many regions' section from our man Kevin O' Dell

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1436497 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-01-21 17:25:44 +00:00
parent 7e1a05c200
commit df99d9db4d
1 changed file with 20 additions and 4 deletions


index e70ebc6..96f8c27 100644
@@ -1052,9 +1052,11 @@
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
<section xml:id="too_many_regions">
<title>Too many regions</title>
<title>How many regions per RegionServer?</title>
<para>
Here are some issues you will run into when lots of regions per regionserver:
Typically you want to keep your region count low on HBase, for several reasons.
Around 100 regions per RegionServer has usually yielded the best results.
Here are some of the reasons for keeping the region count low:
<unorderedlist>
<listitem><para>
MSLAB requires 2MB per memstore (that's 2MB per family per region).
@@ -1069,6 +1071,16 @@
at that point they should almost all have about 5MB of data so
it would flush that amount. 5MB inserted later, it would flush another
region that will now have a bit over 5MB of data, and so on.
A basic formula for the number of regions to have per RegionServer
looks like this:
Heap * upper global memstore limit = amount of heap devoted to memstore,
then amount of heap devoted to memstore / (number of regions per RS * CFs)
= rough memstore size per region and column family, assuming every region is being written to.
A more accurate formula counts only the regions that are actively written:
Heap * upper global memstore limit = amount of heap devoted to memstore, then
amount of heap devoted to memstore / (number of actively written regions per RS * CFs).
This can allow a higher region count from the write perspective if you know how many
regions you will be writing to at one time.
</para></listitem>
<listitem><para>The master, as it stands, is allergic to tons of regions, and will
take a lot of time assigning them and moving them around in batches.
@@ -1076,12 +1088,16 @@
at the moment (could really be improved -- and has been improved a bunch
in HBase 0.96).
</para></listitem>
<listitem><para>
In older versions of HBase (pre-v2 HFile, 0.90 and earlier), tons of regions
on a few RegionServers can cause the store file index to grow, raising heap usage
and potentially creating memory pressure or an OOME on the RegionServers.
</para></listitem>
</unorderedlist>
</para>
<para>Another issue is the effect of the number of regions on MapReduce jobs.
A job over an HBase table typically runs one map task per region, so keeping 5 regions
per RS would be too low for a job, whereas 1000 will generate too many maps.
</para>
</para>
</section>
</section>
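The sizing formulas added in this change can be checked with a little arithmetic. The following is a minimal sketch, not part of HBase: the function names are hypothetical, and the 10GB heap, the 0.4 upper global memstore limit, and the region and column family counts are assumed values chosen only for illustration. The MSLAB helper applies the "2MB per family per region" figure quoted in the changed section.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def memstore_heap_bytes(heap_bytes, upper_global_memstore_limit):
    """Heap * upper global memstore limit = heap devoted to memstore."""
    return heap_bytes * upper_global_memstore_limit


def rough_memstore_size_bytes(heap_bytes, upper_global_memstore_limit,
                              regions_written, column_families):
    """Heap devoted to memstore / (regions * CFs).

    Pass every region on the RegionServer for the basic formula, or only
    the actively written regions for the more accurate one.
    """
    memstore_heap = memstore_heap_bytes(heap_bytes, upper_global_memstore_limit)
    return memstore_heap / (regions_written * column_families)


def mslab_overhead_bytes(regions, column_families, mslab_bytes_per_memstore=2 * MB):
    """MSLAB cost at 2MB per memstore, i.e. per family per region."""
    return regions * column_families * mslab_bytes_per_memstore


if __name__ == "__main__":
    heap = 10 * GB   # assumed RegionServer heap
    limit = 0.4      # assumed upper global memstore limit
    cfs = 2          # assumed column families per table

    all_regions = rough_memstore_size_bytes(heap, limit, 100, cfs)
    active_only = rough_memstore_size_bytes(heap, limit, 25, cfs)
    print("memstore size, 100 regions written: %.2f MB" % (all_regions / MB))
    print("memstore size, 25 regions written:  %.2f MB" % (active_only / MB))
    print("MSLAB overhead, 100 regions: %d MB" % (mslab_overhead_bytes(100, cfs) // MB))
```

With fewer actively written regions, each memstore gets a larger share of the same heap budget, which is why the second formula can justify a higher total region count.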