HBASE-4871 hbase book. docs cleanup.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1206362 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-25 22:20:48 +00:00
parent 66e47a99f9
commit ec4ebc7ee3
4 changed files with 48 additions and 47 deletions


@ -1556,41 +1556,30 @@ scan.setFilter(filter);
<section xml:id="regions.arch">
<title>Regions</title>
<para>This section is all about Regions.</para>
<para>Regions are the basic element of availability and
distribution for tables, and are comprised of a Store per Column Family.
</para>
<section xml:id="arch.regions.size">
<title>Region Size</title>
<para>Determining the "right" region size can be tricky, and there are a few factors
to consider:</para>
<itemizedlist>
<listitem>
<para>HBase scales by having regions across many servers. Thus if
you have 2 regions for 16GB of data on a 20-node cluster, your data
will be concentrated on just a few machines and nearly the entire
cluster will be idle. This really can't be stressed enough, since a
common problem is loading 200MB of data into HBase and then wondering why
your awesome 10-node cluster isn't doing anything.</para>
</listitem>
<listitem>
<para>On the other hand, high region count has been known to make things slow.
This is getting better with each release of HBase, but it is probably better to have
700 regions than 3000 for the same amount of data.</para>
</listitem>
<listitem>
@ -1599,10 +1588,12 @@ scan.setFilter(filter);
</listitem>
</itemizedlist>
<para>When starting off, it's probably best to stick to the default region size, perhaps going
smaller for hot tables (or manually splitting hot regions to spread the load over
the cluster; see the sketch below), or going with larger region sizes if your cell sizes tend to be
largish (100k and up).</para>
<para>See <xref linkend="bigger.regions"/> for more information on configuration.
</para>
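<para>To make the "manually split" option concrete, here is a minimal sketch (not from the book; the
table name, column family, and split keys are made up for illustration) of splitting an existing hot
table and of pre-splitting a new table at creation time via <code>HBaseAdmin</code>:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionSplitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Ask the master to split the regions of an existing (hot) table:
    admin.split("myTable");

    // Or pre-split a new table so load spreads across the cluster
    // from the start; one region is created per split-key interval.
    HTableDescriptor desc = new HTableDescriptor("myNewTable");
    desc.addFamily(new HColumnDescriptor("cf"));
    byte[][] splitKeys = new byte[][] {
      Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("u")
    };
    admin.createTable(desc, splitKeys);
  }
}
</programlisting>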
</section>
<section>


@ -1028,6 +1028,11 @@ index e70ebc6..96f8c27 100644
throughput is affected since every request that hits that region server will take longer,
which exacerbates the problem even more.
</para>
<para>You can get a sense of whether you have too few or too many handlers by
enabling <xref linkend="rpc.logging" />
on an individual RegionServer and then tailing its logs (queued requests
consume memory).
</para>
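<para>As a sketch of what that can look like in practice (assuming the
<code>org.apache.hadoop.ipc</code> logger named in the <xref linkend="rpc.logging" /> section),
you could bump that logger to DEBUG in <filename>log4j.properties</filename> for a short burst
and then tail the RegionServer log:</para>
<programlisting>
# log4j.properties -- enable RPC-level logging for short bursts only;
# the output is voluminous.  Note the logger is hadoop.ipc, not hbase.ipc.
log4j.logger.org.apache.hadoop.ipc=DEBUG
</programlisting>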
</section>
<section xml:id="big_memory">
<title>Configuration for large memory machines</title>
@ -1054,11 +1059,20 @@ index e70ebc6..96f8c27 100644
Consider going to larger regions to cut down on the total number of regions
on your cluster. Generally, fewer Regions to manage makes for a smoother-running
cluster (You can always later manually split the big Regions should one prove
hot and you want to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to the low hundreds
per RegionServer. Adjust the region size as appropriate to achieve this number.
</para>
<para>For the 0.90.x codebase, the upper bound of the region size is about 4GB, with a default of 256MB.
For the 0.92.x codebase, much larger region sizes (e.g., 20GB) can be supported due to the HFile v2 change.
</para>
<para>You may need to experiment with this setting based on your hardware configuration and application needs.
</para>
<para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
Region size can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
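<para>For example, a sketch of both approaches; the 1GB value is illustrative, not a recommendation,
and <code>hbase.hregion.max.filesize</code> is given in bytes:</para>
<programlisting>
&lt;!-- hbase-site.xml: cluster-wide maximum region size --&gt;
&lt;property&gt;
  &lt;name&gt;hbase.hregion.max.filesize&lt;/name&gt;
  &lt;value&gt;1073741824&lt;/value&gt;  &lt;!-- 1GB --&gt;
&lt;/property&gt;
</programlisting>
<para>And the per-table equivalent in Java:</para>
<programlisting>
HTableDescriptor desc = new HTableDescriptor("myTable");
desc.setMaxFileSize(1073741824L);  // 1GB regions for this table only
</programlisting>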
</section>
<section xml:id="disable.splitting">
<title>Managed Splitting</title>


@ -140,14 +140,6 @@
<para>The number of regions for an HBase table is driven by the <xref
linkend="bigger.regions" />. Also, see the architecture
section on <xref linkend="arch.regions.size" />.</para>
</section>
<section xml:id="perf.compactions.and.splits">
@ -161,15 +153,7 @@
<section xml:id="perf.handlers">
<title><varname>hbase.regionserver.handler.count</varname></title>
<para>See <xref linkend="hbase.regionserver.handler.count"/>.
</para>
</section>
<section xml:id="perf.hfile.block.cache.size">
<title><varname>hfile.block.cache.size</varname></title>


@ -574,6 +574,18 @@ hadoop 17789 155 35.2 9067824 8604364 ? S&lt;l Mar04 9855:48 /usr/java/j
</section>
</section>
<section xml:id="trouble.network">
<title>Network</title>
<section xml:id="trouble.network.spikes">
<title>Network Spikes</title>
<para>If you are seeing periodic network spikes, you might want to check the compaction
queues to see if major compactions are happening.
</para>
<para>See <xref linkend="managed.compactions"/> for more information on managing compactions.
</para>
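<para>As one way to watch those queues, RegionServer metrics (including
<code>compactionQueueSize</code>) can be dumped via the Hadoop metrics framework by editing
<filename>hadoop-metrics.properties</filename> in the HBase conf directory; the period and
file name below are illustrative:</para>
<programlisting>
# hadoop-metrics.properties -- dump hbase metrics to a local file every 10s
hbase.class=org.apache.hadoop.metrics.file.FileContext
hbase.period=10
hbase.fileName=/tmp/metrics_hbase.log
</programlisting>
<para>Then grep the output for <code>compactionQueueSize</code> to see whether compactions
coincide with the spikes.</para>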
</section>
</section>
<section xml:id="trouble.rs">
<title>RegionServer</title>
<para>For more information on RegionServers, see <xref linkend="regionserver.arch"/>.