From 0de40fe1b0b9366768e02199531b1cd5aebbcdc9 Mon Sep 17 00:00:00 2001
From: Michael Stack
Date: Wed, 22 Sep 2010 18:05:32 +0000
Subject: [PATCH] Inserted an email Ryan wrote the list on 'considerations sizing regions'

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1000113 13f79535-47bb-0310-9956-ffa450edef68
---
 src/docbkx/book.xml | 49 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml
index 3bb86f402d1..6a794a59b02 100644
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -66,6 +66,55 @@
 TODO: Review all of the below to ensure it matches what was committed -- St.Ack 20100901
+
+
+ Region Size
+
+Region size is one of those tricky things; there are a few factors to consider:
+
+Regions are the basic element of availability and distribution.
+
+HBase scales by having regions across many servers. Thus if you
+have 2 regions for 16GB of data on a 20 node cluster, at most 2 servers
+hold that data and the rest sit idle, which is a net loss.
+
+A high region count has been known to make things slow. This is
+getting better, but it is probably better to have 700 regions than
+3000 for the same amount of data.
+
+A low region count prevents parallel scalability, as per point #2.
+This really can't be stressed enough: a common problem is loading
+200MB of data into HBase and then wondering why your awesome 10 node
+cluster is mostly idle.
+
+There is not much memory footprint difference between 1 region and
+10 in terms of indexes, etc., held by the regionserver.
+
+It's probably best to stick to the default region size,
+perhaps going smaller for hot tables (or manually splitting hot regions
+to spread the load over the cluster), or going with a 1GB region size
+if your cell sizes tend to be largish (100k and up).
+
+
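The arithmetic behind the region-count points can be sketched roughly as follows. This is only an illustration, not HBase code: the function names are hypothetical, the 256MB maximum region size used in the example is an assumed value rather than one stated in the text, and real region counts depend on split timing and key distribution.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def min_region_count(data_bytes: int, max_region_bytes: int) -> int:
    """Lower bound on the number of regions needed to hold the data.

    Uses integer ceiling division; a table always has at least one region.
    """
    return max(1, -(-data_bytes // max_region_bytes))


def busy_servers(region_count: int, node_count: int) -> int:
    """Upper bound on how many servers can host a region of this table."""
    return min(region_count, node_count)


def idle_servers(region_count: int, node_count: int) -> int:
    """Servers left with none of this table's regions."""
    return node_count - busy_servers(region_count, node_count)


# 2 regions on a 20 node cluster: at most 2 servers do the work.
print(busy_servers(2, 20), idle_servers(2, 20))

# 200MB loaded with an assumed 256MB max region size stays in one region,
# so a 10 node cluster has 9 servers idle for this table.
regions = min_region_count(200 * MB, 256 * MB)
print(regions, idle_servers(regions, 10))
```

Under these assumptions, a table only begins to use the whole cluster once its region count reaches the node count, which is why a small load on a large cluster looks mostly idle.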
Region Transitions