Inserted an email Ryan wrote the list on 'considerations sizing regions'

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1000113 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2010-09-22 18:05:32 +00:00
parent af3bf8cad0
commit 0de40fe1b0
1 changed files with 49 additions and 0 deletions


@@ -66,6 +66,55 @@
<para>TODO: Review all of the below to ensure it matches what was
committed -- St.Ack 20100901</para>
</note>
<section>
<title>
Region Size
</title>
<para>Choosing a region size is tricky. There are a few factors to consider:
</para>
<itemizedlist>
<listitem>
<para>
Regions are the basic element of availability and distribution.
</para>
</listitem>
<listitem>
<para>
HBase scales by spreading regions across many servers. Thus if you
have only 2 regions for 16GB of data on a 20-node cluster, at most 2
servers hold the data and the remaining nodes go unused.
</para>
</listitem>
<listitem>
<para>
A high region count has been known to make things slow. This is
getting better, but it is probably better to have 700 regions than
3000 for the same amount of data.
</para>
</listitem>
<listitem>
<para>
A low region count prevents parallel scalability, as per the second
point above. This really can't be stressed enough, since a common problem
is loading 200MB of data into HBase and then wondering why your awesome
10-node cluster is mostly idle.
</para>
</listitem>
<listitem>
<para>
There is not much difference in memory footprint between 1 region and
10 in terms of indexes, etc., held by the regionserver.
</para>
</listitem>
</itemizedlist>
<para>It's probably best to stick to the default region size,
perhaps going smaller for hot tables (or manually splitting hot regions
to spread the load over the cluster), or going with a 1GB region size
if your cell sizes tend to be largish (100KB and up).
</para>
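<para>As an illustration that is not part of the original email, the 1GB
case above could be sketched in <filename>hbase-site.xml</filename> as
follows. This assumes the maximum region size is controlled by the
<varname>hbase.hregion.max.filesize</varname> property, that its value is
given in bytes, and that a cluster-wide setting is what you want. Check
the defaults shipped with your release before changing it.
</para>
<programlisting><![CDATA[
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 1GB = 1024 * 1024 * 1024 bytes; illustrative value only -->
  <value>1073741824</value>
</property>
]]></programlisting>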
</section>
<section>
<title>Region Transitions</title>