HBASE-4871 hbase book. docs cleanup.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1206362 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-25 22:20:48 +00:00
parent 66e47a99f9
commit ec4ebc7ee3
4 changed files with 48 additions and 47 deletions


@@ -1556,41 +1556,30 @@ scan.setFilter(filter);
<section xml:id="regions.arch">
<title>Regions</title>
<para>This section is all about Regions.</para>
<note>
<para>Regions are the basic element of availability and
distribution for tables, and are comprised of a Store per Column Family.
</para>
</note>
<section xml:id="arch.regions.size">
<title>Region Size</title>
<para>Determining the "right" region size can be tricky, and there are a few factors
to consider:</para>
<itemizedlist>
<listitem>
<para>HBase scales by having regions across many servers. Thus if
you have 2 regions for 16GB of data on a 20-node cluster, your data
will be concentrated on just two machines and nearly the entire
cluster will be idle. This really can't be stressed enough, since a
common problem is loading 200MB of data into HBase and then wondering
why your awesome 10-node cluster isn't doing anything (a worked
example follows this list).</para>
</listitem>
<listitem>
<para>On the other hand, a high region count has been known to make things slow.
This is getting better with each release of HBase, but it is probably better to have
700 regions than 3000 for the same amount of data.</para>
</listitem>
<listitem>
@@ -1599,10 +1588,12 @@ scan.setFilter(filter);
</listitem>
</itemizedlist>
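<para>To make the arithmetic concrete (illustrative numbers, not a recommendation):
with a 256MB maximum region size, 64GB of table data splits into roughly 256 regions,
which can be spread evenly across a 20-node cluster; the same 64GB held in only 4
regions can be served by at most 4 RegionServers, leaving the other 16 idle for that
table.</para>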
<para>When starting off, it's probably best to stick to the default region size, perhaps going
smaller for hot tables (or manually split hot regions to spread the load over
the cluster), or go with larger region sizes if your cell sizes tend to be
largish (100k and up).</para>
<para>See <xref linkend="bigger.regions"/> for more information on configuration.
</para>
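<para>As a sketch of the manual-split option mentioned above (assuming the
0.90/0.92-era client API; the table name is illustrative, not from this book):</para>
<programlisting>
// Minimal sketch: ask HBase to split the regions of a hot table.
// To target a single hot region, pass that region's name instead.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SplitHotTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.split("myHotTable"); // asynchronous; each region splits near its midpoint
  }
}
</programlisting>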
</section>
<section>


@@ -1028,6 +1028,11 @@ index e70ebc6..96f8c27 100644
throughput is affected since every request that hits that region server will take longer,
which exacerbates the problem even more.
</para>
<para>You can get a sense of whether you have too few or too many handlers by
enabling <xref linkend="rpc.logging" />
on an individual RegionServer and then tailing its logs (queued requests
consume memory).
</para>
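<para>For illustration, a minimal <filename>hbase-site.xml</filename> sketch raising the
handler count (the value shown is arbitrary, not a recommendation):</para>
<programlisting><![CDATA[
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value>
</property>
]]></programlisting>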
</section>
<section xml:id="big_memory">
<title>Configuration for large memory machines</title>
@@ -1054,11 +1059,20 @@ index e70ebc6..96f8c27 100644
Consider going to larger regions to cut down on the total number of regions
on your cluster. Generally, fewer regions to manage makes for a smoother-running
cluster (you can always manually split the big regions later should one prove
hot and you want to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to low-hundreds
per RegionServer. Adjust the region size as appropriate to achieve this number.
</para>
<para>For the 0.90.x codebase, the upper bound of region size is about 4GB, with a default of 256MB.
For the 0.92.x codebase, due to the HFile v2 change, much larger region sizes can be supported (e.g., 20GB).
</para>
<para>You may need to experiment with this setting based on your hardware configuration and application needs.
</para>
<para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
Region size can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
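<para>For illustration, a minimal sketch of both approaches (the 1GB value is arbitrary,
not a recommendation). Cluster-wide, in <filename>hbase-site.xml</filename>:</para>
<programlisting><![CDATA[
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>  <!-- bytes; 1GB -->
</property>
]]></programlisting>
<para>Per-table, assuming the 0.90/0.92-era client API and a hypothetical table name:</para>
<programlisting>
// import org.apache.hadoop.hbase.HTableDescriptor;
HTableDescriptor desc = new HTableDescriptor("myTable");
desc.setMaxFileSize(1073741824L); // bytes; overrides the cluster-wide default for this table
</programlisting>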
</section>
<section xml:id="disable.splitting">
<title>Managed Splitting</title>


@@ -140,14 +140,6 @@
<para>The number of regions for an HBase table is driven by the <xref
linkend="bigger.regions" />. Also, see the architecture
section on <xref linkend="arch.regions.size" />.</para>
</section>
<section xml:id="perf.compactions.and.splits">
@@ -161,15 +153,7 @@
<section xml:id="perf.handlers">
<title><varname>hbase.regionserver.handler.count</varname></title>
<para>See <xref linkend="hbase.regionserver.handler.count"/>.
</para>
</section>
<section xml:id="perf.hfile.block.cache.size">
<title><varname>hfile.block.cache.size</varname></title>


@@ -574,6 +574,18 @@ hadoop 17789 155 35.2 9067824 8604364 ? S<l Mar04 9855:48 /usr/java/j
</section>
</section>
<section xml:id="trouble.network">
<title>Network</title>
<section xml:id="trouble.network.spikes">
<title>Network Spikes</title>
<para>If you are seeing periodic network spikes, you might want to check the compactionQueues to see if major
compactions are happening.
</para>
<para>See <xref linkend="managed.compactions"/> for more information on managing compactions.
</para>
</section>
</section>
<section xml:id="trouble.rs">
<title>RegionServer</title>
<para>For more information on the RegionServers, see <xref linkend="regionserver.arch"/>.