hbase-5028 book.xml - adding info on region assignment and file locality

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1214412 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Doug Meil 2011-12-14 19:14:10 +00:00
parent 9ed28b472c
commit 685284c604
1 changed file with 96 additions and 12 deletions


@ -1554,6 +1554,8 @@ scan.setFilter(filter);
<para>Periodically, and when there are not any regions in transition,
a load balancer will run and move regions around to balance cluster load.
See <xref linkend="balancer_config" /> for configuring this property.</para>
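<para>The following is a simplified, hypothetical sketch of count-based balancing. It only illustrates the idea of moving regions around until load is even, and it is not the HBase <code>DefaultLoadBalancer</code> implementation. It assumes a plain dictionary that maps each RegionServer name to the list of region names it hosts. Like the rule above, it does nothing while any region is in transition. The function name <code>balance_by_region_count</code> is illustrative only.</para>

```python
def balance_by_region_count(assignments, regions_in_transition):
    """Plan region moves so no server hosts more than one region above any other.

    assignments: hypothetical model, dict of server name -> list of region names.
    regions_in_transition: collection of regions currently being (re)assigned.
    Returns a list of (region, from_server, to_server) tuples; the input is not modified.
    """
    # Mirror the documented rule: only balance when no regions are in transition.
    if regions_in_transition or not assignments:
        return []
    load = {server: list(regions) for server, regions in assignments.items()}
    moves = []
    while True:
        busiest = max(load, key=lambda s: len(load[s]))
        idlest = min(load, key=lambda s: len(load[s]))
        gap = len(load[busiest]) - len(load[idlest])
        if 1 >= gap:
            break  # every server is within one region of every other server
        region = load[busiest].pop()
        load[idlest].append(region)
        moves.append((region, busiest, idlest))
    return moves
```

<para>For example, suppose <code>rs1</code> hosts four regions, <code>rs2</code> hosts none and <code>rs3</code> hosts one. The sketch plans two moves from <code>rs1</code> to <code>rs2</code>, which leaves counts of 2, 2 and 1. Each step moves one region from the busiest server to the least-loaded one. This keeps every step simple and guarantees that the loop terminates.</para>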
<para>See <xref linkend="regions.arch.assignment"/> for more information on region assignment.
</para>
</section>
<section xml:id="master.processes.catalog"><title>CatalogJanitor</title>
<para>Periodically checks and cleans up the .META. table. See <xref linkend="arch.catalog.meta" /> for more information on META.</para>
@ -1714,6 +1716,90 @@ scan.setFilter(filter);
</para>
</section>
<section xml:id="regions.arch.assignment">
<title>Region-RegionServer Assignment</title>
<para>This section describes how Regions are assigned to RegionServers.
</para>
<section xml:id="regions.arch.assignment.startup">
<title>Startup</title>
<para>When HBase starts, regions are assigned as follows (short version):
</para>
<orderedlist>
<listitem>
<para>The Master invokes the <code>AssignmentManager</code> upon startup.</para>
</listitem>
<listitem>
<para>The <code>AssignmentManager</code> looks at the existing region assignments
in META.</para>
</listitem>
<listitem>
<para>If the region assignment is still valid (i.e., if the RegionServer is still online),
then the assignment is kept.
</para>
</listitem>
<listitem>
<para>If the assignment is invalid, then the <code>LoadBalancerFactory</code> is invoked to assign the
region. The <code>DefaultLoadBalancer</code> will randomly assign the region to a RegionServer.
</para>
</listitem>
</orderedlist>
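<para>The startup steps above can be sketched as follows. This is a hypothetical model, not the <code>AssignmentManager</code> code. META is represented as a dictionary from region name to the RegionServer recorded for it (or <code>None</code>). The online RegionServers are a set of names. The random choice stands in for the <code>DefaultLoadBalancer</code> behavior described in the last step. The function name <code>assign_on_startup</code> is illustrative only.</para>

```python
import random


def assign_on_startup(meta_assignments, online_servers, rng=None):
    """Return a dict of region -> RegionServer following the startup steps.

    meta_assignments: hypothetical model of META, region name -> server name or None.
    online_servers: set of RegionServer names currently online.
    rng: optional random.Random, e.g. seeded for reproducible output.
    """
    if rng is None:
        rng = random.Random()
    if not online_servers:
        raise ValueError("no RegionServers are online")
    candidates = sorted(online_servers)  # stable order so a seeded rng is repeatable
    plan = {}
    for region, server in meta_assignments.items():
        if server in online_servers:
            plan[region] = server  # assignment is still valid: keep it
        else:
            plan[region] = rng.choice(candidates)  # invalid: pick a random online server
    return plan
```

<para>In this model, a region whose recorded RegionServer is still online stays where it is. Only regions with a missing or stale assignment are placed at random.</para>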
</section>
<section xml:id="regions.arch.assignment.failover">
<title>Failover</title>
<para>When a RegionServer fails (short version):
</para>
<orderedlist>
<listitem>
<para>The regions immediately become unavailable because the RegionServer is down.</para>
</listitem>
<listitem>
<para>The Master will detect that the RegionServer has failed.</para>
</listitem>
<listitem>
<para>The region assignments are considered invalid, and the regions are reassigned just
as in the startup sequence.
</para>
</listitem>
</orderedlist>
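<para>The following is a minimal sketch of the failover steps. It uses the same kind of hypothetical model, a dictionary from region name to RegionServer name. The detection step is taken as an input: the caller passes the name of the failed RegionServer, and the sketch does not model how the Master notices the failure. The regions that were on the failed server are reported as unavailable. They are then reassigned at random among the surviving servers, as in the startup sequence. The function name <code>handle_server_failure</code> is illustrative only.</para>

```python
import random


def handle_server_failure(assignments, failed_server, online_servers, rng=None):
    """Return (unavailable_regions, new_assignments) after failed_server dies.

    assignments: hypothetical model, region name -> hosting RegionServer name.
    online_servers: RegionServer names believed online before the failure.
    """
    if rng is None:
        rng = random.Random()
    survivors = sorted(s for s in online_servers if s != failed_server)
    if not survivors:
        raise RuntimeError("no live RegionServers can take over the regions")
    # Step 1: every region on the failed server is unavailable until reassigned.
    unavailable = sorted(r for r, s in assignments.items() if s == failed_server)
    # Step 3: those assignments are invalid, so reassign them as at startup.
    new_assignments = dict(assignments)
    for region in unavailable:
        new_assignments[region] = rng.choice(survivors)
    return unavailable, new_assignments
```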
</section>
<section xml:id="regions.arch.balancer">
<title>Region Load Balancing</title>
<para>
Regions can be periodically moved by the <xref linkend="master.processes.loadbalancer" />.
</para>
</section>
</section> <!-- assignment -->
<section xml:id="regions.arch.locality">
<title>Region-RegionServer Locality</title>
<para>Over time, Region-RegionServer locality is achieved via an aspect of
HDFS block replication. When choosing where to write its replicas, the HDFS client
by default does the following:
<orderedlist>
<listitem><para>The first replica is written to the local node (the DataNode on the writer's host).</para>
</listitem>
<listitem><para>The second replica is written to a node in another rack (if the cluster has more than one rack).</para>
</listitem>
<listitem><para>The third replica is written to a different node in the same rack as the second replica.</para>
</listitem>
</orderedlist>
HBase eventually achieves locality for a region after a flush or a compaction.
This is because the RegionServer writes the new StoreFiles itself, so their first replicas land on its node.
In a RegionServer failover situation, a RegionServer may be assigned regions with non-local
StoreFiles (i.e., none of the replicas are local). However, as new data is written
in the region, or as the table is compacted and StoreFiles are rewritten, they will eventually become "local"
to the RegionServer.
</para>
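<para>The effect of a flush or compaction on locality can be illustrated with a small model. It assumes each RegionServer runs on a host that is also a DataNode with the same host name. Each HDFS block of a region's StoreFiles is represented as the list of hosts holding its replicas. A rewrite is modeled only by its first replica landing on the writing host, per the default placement described above. The function names <code>locality_fraction</code> and <code>rewrite_blocks</code> are hypothetical.</para>

```python
def locality_fraction(blocks, host):
    """Fraction of blocks with at least one replica on host (1.0 if there are no blocks)."""
    if not blocks:
        return 1.0  # nothing stored, so nothing is non-local
    local = sum(1 for replicas in blocks if host in replicas)
    return local / len(blocks)


def rewrite_blocks(blocks, writer_host):
    """Model a flush or compaction that rewrites the data from writer_host.

    Only the first replica is modeled faithfully: it lands on the writer's node.
    The remaining hosts are kept as placeholders for the other replicas.
    """
    return [[writer_host] + [h for h in replicas[1:] if h != writer_host]
            for replicas in blocks]
```

<para>For example, after a failover, a RegionServer on host <code>rs3</code> may serve blocks <code>[["rs1", "dn2", "dn3"], ["rs1", "dn4", "dn5"]]</code>, which gives a locality fraction of 0.0. After <code>rewrite_blocks(blocks, "rs3")</code>, the fraction for <code>rs3</code> is 1.0.</para>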
<para>For more information, see <link xlink:href="http://hadoop.apache.org/common/docs/r0.20.205.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps">HDFS Design on Replica Placement</link>
and also Lars George's blog on <link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS locality</link>.
</para>
</section>
<section>
<title>Region Splits</title>
@ -1725,15 +1811,6 @@ scan.setFilter(filter);
splits (and for why you might do this)</para>
</section>
<section xml:id="store">
<title>Store</title>
<para>A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
@ -2729,13 +2806,15 @@ Comparator class used for Bloom filter keys, a UTF>8 encoded string stored usi
<para><link xlink:href="http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install">Getting The Most From Your HBase Install</link> by Ryan Rawson, Jonathan Gray (Hadoop World 2009).
</para>
</section>
<section xml:id="other.info.papers"><title>HBase Papers</title>
<para><link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> by Google (2006).
</para>
<para><link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS Locality</link> by Lars George (2010).
</para>
<para><link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link> by Ian Varley (2009).
</para>
</section>
<section xml:id="other.info.sites"><title>HBase Sites</title>
<para><link xlink:href="http://www.cloudera.com/blog/category/hbase/">Cloudera's HBase Blog</link> has a lot of links to useful HBase information.
<itemizedlist>
<listitem><link xlink:href="http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/">CAP Confusion</link> is a relevant entry for background information on
@ -2746,10 +2825,15 @@ Comparator class used for Bloom filter keys, a UTF>8 encoded string stored usi
<para><link xlink:href="http://wiki.apache.org/hadoop/HBase/HBasePresentations">HBase Wiki</link> has a page with a number of presentations.
</para>
</section>
<section xml:id="other.info.books"><title>HBase Books</title>
<para><link xlink:href="http://shop.oreilly.com/product/0636920014348.do">HBase: The Definitive Guide</link> by Lars George.
</para>
</section>
<section xml:id="other.info.books.hadoop"><title>Hadoop Books</title>
<para><link xlink:href="http://shop.oreilly.com/product/9780596521981.do">Hadoop: The Definitive Guide</link> by Tom White.
</para>
</section>
</appendix>
<appendix xml:id="asf" ><title>HBase and the Apache Software Foundation</title>