HBASE-4871 hbase book. docs cleanup.
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1206362 13f79535-47bb-0310-9956-ffa450edef68
@@ -1556,41 +1556,30 @@ scan.setFilter(filter);
<section xml:id="regions.arch">
|
||||
<title>Regions</title>
|
||||
<para>This section is all about Regions.</para>
|
||||
<note>
|
||||
<para>Regions are comprised of a Store per Column Family.
|
||||
</para>
|
||||
</note>
|
||||
<para>Regions are the basic element of availability and
|
||||
distribution for tables, and are comprised of a Store per Column Family.
|
||||
</para>

<section xml:id="arch.regions.size">
<title>Region Size</title>

<para>Determining the "right" region size can be tricky, and there are a few factors
to consider:</para>

<itemizedlist>
<listitem>
<para>Regions are the basic element of availability and
distribution.</para>
</listitem>

<listitem>
<para>HBase scales by having regions across many servers. Thus if
you have 2 regions for 16GB data, on a 20 node machine your data
will be concentrated on just a few machines - nearly the entire
cluster will be idle. This really can't be stressed enough, since a
common problem is loading 200MB of data into HBase and then wondering why
your awesome 10 node cluster isn't doing anything.</para>
</listitem>

<listitem>
<para>On the other hand, high region count has been known to make things slow.
This is getting better with each release of HBase, but it is probably better to have
700 regions than 3000 for the same amount of data.</para>
</listitem>

<listitem>

@@ -1599,10 +1588,12 @@ scan.setFilter(filter);
</listitem>
</itemizedlist>

<para>When starting off, it's probably best to stick to the default region size, perhaps going
smaller for hot tables (or manually split hot regions to spread the load over
the cluster), or go with larger region sizes if your cell sizes tend to be
largish (100k and up).</para>
<para>See <xref linkend="bigger.regions"/> for more information on configuration.
</para>
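<para>For illustration, here is a minimal sketch of manually splitting a hot table's regions
from the Java client API of this era. The table name <code>myTable</code> is hypothetical;
<code>HBaseAdmin.split</code> can also be passed an individual region name instead of a
table name.</para>
<programlisting><![CDATA[
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SplitHotTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Ask the master to split each region of the (hypothetical) hot table;
    // the resulting daughter regions can then be balanced across the cluster.
    admin.split("myTable");
  }
}
]]></programlisting>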
</section>

<section>
index e70ebc6..96f8c27 100644
@@ -1028,6 +1028,11 @@
throughput is affected since every request that hits that region server will take longer,
which exacerbates the problem even more.
</para>
<para>You can get a sense of whether you have too little or too many handlers by
<xref linkend="rpc.logging" />
on an individual RegionServer then tailing its logs (Queued requests
consume memory).
</para>
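<para>For reference, the handler count is set in <filename>hbase-site.xml</filename>;
a minimal sketch, where the value <code>30</code> is only an illustration to be tuned
against your own workload:</para>
<programlisting><![CDATA[
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- Number of RPC handler threads per RegionServer; illustrative value only -->
  <value>30</value>
</property>
]]></programlisting>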
</section>
<section xml:id="big_memory">
<title>Configuration for large memory machines</title>

@@ -1054,11 +1059,20 @@
Consider going to larger regions to cut down on the total number of regions
on your cluster. Generally, having fewer Regions to manage makes for a smoother-running
cluster (you can always manually split the big Regions later, should one prove
hot, to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to low-hundreds
per RegionServer. Adjust the regionsize as appropriate to achieve this number.
</para>
<para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
For the 0.92.x codebase, due to the HFile v2 change, much larger regionsizes can be supported (e.g., 20Gb).
</para>
<para>You may need to experiment with this setting based on your hardware configuration and application needs.
</para>
<para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
RegionSize can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
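<para>As a sketch of both approaches, with a 1Gb regionsize chosen purely as an
illustration (at roughly 100Gb of data per RegionServer, 1Gb regions work out to about
100 regions per server, inside the suggested range), first cluster-wide in
<filename>hbase-site.xml</filename>:</para>
<programlisting><![CDATA[
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- Maximum region size in bytes before a split; 1073741824 = 1Gb (illustrative) -->
  <value>1073741824</value>
</property>
]]></programlisting>
<para>And per-table via the Java client, where the table and column family names are
hypothetical:</para>
<programlisting><![CDATA[
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

HTableDescriptor desc = new HTableDescriptor("myTable");
desc.addFamily(new HColumnDescriptor("cf"));
desc.setMaxFileSize(1073741824L); // 1Gb, in bytes; overrides hbase.hregion.max.filesize for this table
]]></programlisting>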

</section>
<section xml:id="disable.splitting">
<title>Managed Splitting</title>

@@ -140,14 +140,6 @@
<para>The number of regions for an HBase table is driven by the <xref
linkend="bigger.regions" />. Also, see the architecture
section on <xref linkend="arch.regions.size" />.</para>
</section>

<section xml:id="perf.compactions.and.splits">

@@ -161,15 +153,7 @@
<section xml:id="perf.handlers">
|
||||
<title><varname>hbase.regionserver.handler.count</varname></title>
|
||||
<para>See <xref linkend="hbase.regionserver.handler.count"/>.
|
||||
This setting in essence sets how many requests are
|
||||
concurrently being processed inside the RegionServer at any
|
||||
one time. If set too high, then throughput may suffer as
|
||||
the concurrent requests contend; if set too low, requests will
|
||||
be stuck waiting to get into the machine. You can get a
|
||||
sense of whether you have too little or too many handlers by
|
||||
<xref linkend="rpc.logging" />
|
||||
on an individual RegionServer then tailing its logs (Queued requests
|
||||
consume memory).</para>
|
||||
</para>
|
||||
</section>
<section xml:id="perf.hfile.block.cache.size">
<title><varname>hfile.block.cache.size</varname></title>

@@ -574,6 +574,18 @@ hadoop 17789 155 35.2 9067824 8604364 ? S<l Mar04 9855:48 /usr/java/j
</section>
</section>

<section xml:id="trouble.network">
<title>Network</title>
<section xml:id="trouble.network.spikes">
<title>Network Spikes</title>
<para>If you are seeing periodic network spikes, you might want to check the compactionQueues to see if major
compactions are happening.
</para>
<para>See <xref linkend="managed.compactions"/> for more information on managing compactions.
</para>
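<para>A common pattern from <xref linkend="managed.compactions"/> is to disable the
time-based major compactions in <filename>hbase-site.xml</filename> and trigger them
yourself during off-peak hours (for example from a cron job, via the shell's
<code>major_compact</code> command or <code>HBaseAdmin.majorCompact</code>). A minimal
sketch of the property change:</para>
<programlisting><![CDATA[
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- Period between automatic major compactions, in ms; 0 disables them -->
  <value>0</value>
</property>
]]></programlisting>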
</section>
</section>

<section xml:id="trouble.rs">
<title>RegionServer</title>
<para>For more information on the RegionServers, see <xref linkend="regionserver.arch"/>.