More edits to the 'how many regions' section from our man Kevin O' Dell
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1436497 13f79535-47bb-0310-9956-ffa450edef68
parent 7e1a05c200
commit df99d9db4d
index e70ebc6..96f8c27 100644
@@ -1052,9 +1052,11 @@
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
<section xml:id="too_many_regions">
<title>Too many regions</title>
<title>How many regions per RegionServer?</title>
<para>
Here are some issues you will run into when lots of regions per regionserver:
Keep the number of regions per RegionServer low, for several reasons.
Around 100 regions per RegionServer has usually yielded the best results.
Reasons for keeping the region count low include:
<unorderedlist>
<listitem><para>
MSLAB requires 2MB per memstore (that's 2MB per family per region).
@@ -1069,6 +1071,16 @@
at that point they should almost all have about 5MB of data so
it would flush that amount. 5MB inserted later, it would flush another
region that will now have a bit over 5MB of data, and so on.
A basic formula for the number of regions to have per RegionServer looks like this:
heap * upper global memstore limit = amount of heap devoted to memstore;
then amount of heap devoted to memstore / (number of regions per RS * CFs)
gives the rough memstore size if every region is being written to.
A more accurate formula counts only the regions that are being written to:
amount of heap devoted to memstore / (number of actively written regions per RS * CFs).
This can allow a higher region count from the write perspective if you know how many
regions you will be writing to at one time.
</para></listitem>
<listitem><para>The master as is is allergic to tons of regions, and will
take a lot of time assigning them and moving them around in batches.
@@ -1076,12 +1088,16 @@
at the moment (could really be improved -- and has been improved a bunch
in HBase 0.96).
</para></listitem>
<listitem><para>
In older versions of HBase (pre-v2 HFile, 0.90 and earlier), a large number of regions
on a few RegionServers can cause the store file index to grow, raising heap usage and
potentially causing memory pressure or OutOfMemoryErrors (OOME) on the RegionServers.
</para></listitem>
</unorderedlist>
</para>
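<para>The memstore arithmetic from the list above can be sketched in a few lines of Python.
This is only an illustration, not part of HBase: the function names are made up here, and the
10GB heap and 0.4 upper global memstore limit are assumed example values, so substitute
the numbers from your own cluster.</para>

```python
MB = 1024 ** 2
GB = 1024 ** 3


def memstore_heap_bytes(heap_bytes, upper_global_memstore_limit):
    """Heap * upper global memstore limit = heap devoted to memstores."""
    return heap_bytes * upper_global_memstore_limit


def memstore_size_bytes(heap_bytes, upper_global_memstore_limit,
                        regions_per_rs, column_families):
    """Rough memstore size if every counted region is being written to.

    Pass the number of actively written regions as regions_per_rs
    to get the more accurate estimate.
    """
    stores = regions_per_rs * column_families
    return memstore_heap_bytes(heap_bytes, upper_global_memstore_limit) / stores


def mslab_overhead_bytes(regions_per_rs, column_families, mslab_bytes=2 * MB):
    """MSLAB cost: 2MB per memstore, that is, per family per region."""
    return regions_per_rs * column_families * mslab_bytes


if __name__ == "__main__":
    heap = 10 * GB   # assumed heap size, for illustration only
    limit = 0.4      # assumed upper global memstore limit
    for regions in (100, 1000):
        size = memstore_size_bytes(heap, limit, regions, 1)
        slab = mslab_overhead_bytes(regions, 1)
        print(f"{regions} regions: ~{size / MB:.1f} MB per memstore, "
              f"{slab / MB:.0f} MB of MSLAB")
```

<para>Under these assumed values, going from 100 to 1000 regions with one family each shrinks the
rough per-memstore share by a factor of ten while the MSLAB cost grows by the same factor,
which is the pressure the list describes.</para>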
<para>Another issue is the effect of the number of regions on MapReduce jobs, which typically run one map task per region.
Keeping 5 regions per RS would give a job too few map tasks, whereas 1000 will generate too many.
</para>
</para>
</section>
</section>