HBASE-4786 book.xml,performance.xml adding and reorg of schema info

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1201992 13f79535-47bb-0310-9956-ffa450edef68
2011-11-15 01:14:10 +00:00
commit dab526e492
parent 1c9f356fca
2 changed files with 60 additions and 37 deletions
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -545,7 +545,8 @@ admin.modifyColumn(table, cf2 );    // modifying existing ColumnFamily
 admin.enableTable(table);                
      </programlisting>
      </para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
-      <para>
+      <para>Note:  online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table
+      to be disabled.
      </para>
  </section>   
  <section xml:id="number.of.cfs">
@@ -739,17 +740,6 @@ System.out.println("md5 digest as string length: " + sbDigest.length);    // ret
      </para>
    </section> 
  </section>
-  <section xml:id="cf.in.memory">
-  <title>
-  In-Memory ColumnFamilies
-  </title>
-  <para>ColumnFamilies can optionally be defined as in-memory.  Data is still persisted to disk, just like any other ColumnFamily.  
-  In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table
-  will be in memory.
-  </para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
-  </para>
-  </section>
  <section xml:id="ttl">
  <title>Time To Live (TTL)</title>
  <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
@@ -775,20 +765,6 @@ System.out.println("md5 digest as string length: " + sbDigest.length);    // ret
  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
  </para>
  </section>
-  <section xml:id="schema.bloom">
-  <title>Bloom Filters</title>
-  <para>Bloom Filters can be enabled per-ColumnFamily.
-        Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
-        ROWCOL)</code> to enable blooms per Column Family. Default =
-        <varname>NONE</varname> for no bloom filters. If
-        <varname>ROW</varname>, the hash of the row will be added to the bloom
-        on each insert. If <varname>ROWCOL</varname>, the hash of the row +
-        column family + column family qualifier will be added to the bloom on
-        each key insert.</para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and 
-  <xref linkend="blooms"/> for more information.
-  </para>
-  </section>
  <section xml:id="secondary.indexes">
  <title>
  Secondary Indexes and Alternate Query Paths
@@ -874,6 +850,11 @@ System.out.println("md5 digest as string length: " + sbDigest.length);    // ret
      </para>
    </section>
  </section>
+  <section xml:id="schema.ops"><title>Operational and Performance Configuration Options</title>
+    <para>See the Performance section <xref linkend="perf.schema"/> for more information on operational and performance
+    schema design options, such as Bloom Filters, table-configured regionsizes, and blocksizes.
+    </para>
+  </section>  

  </chapter>   <!--  schema design -->

--- a/src/docbkx/performance.xml
+++ b/src/docbkx/performance.xml
@@ -140,10 +140,13 @@
      <para>The number of regions for an HBase table is driven by the <xref
              linkend="bigger.regions" />. Also, see the architecture
          section on <xref linkend="arch.regions.size" /></para>
-       <para>A lower number of regions is preferred, generally in the range of 20 to 200
-       per RegionServer.  Adjust the regionsize as appropriate to achieve this number.  There
-       are some clusters that set the regionsize to 20Gb, for example, so you may need to 
-       experiment with this setting based on your hardware configuration and application needs.
+       <para>A lower number of regions is preferred, generally in the range of 20 to the low hundreds
+       per RegionServer.  Adjust the regionsize as appropriate to achieve this number.
+       </para>
+       <para>For the 0.90.x codebase, the upper bound of regionsize is about 4GB.
+       For the 0.92.x codebase, the HFile v2 change allows much larger regionsizes to be supported (e.g., 20GB).
+       </para>
+       <para>You may need to experiment with this setting based on your hardware configuration and application needs.
       </para>
    </section>

@@ -155,12 +158,6 @@
      something you want to consider.</para>
    </section>

-    <section xml:id="perf.compression">
-      <title>Compression</title>
-      <para>Production systems should use compression with their column family definitions.  See <xref linkend="compression" /> for more information.
-      </para>
-    </section>
-
    <section xml:id="perf.handlers">
        <title><varname>hbase.regionserver.handler.count</varname></title>
        <para>See <xref linkend="hbase.regionserver.handler.count"/>. 
@@ -218,7 +215,52 @@
      <title>Key and Attribute Lengths</title>
      <para>See <xref linkend="keysize" />.</para>
    </section>
-  </section>
+    <section xml:id="schema.regionsize"><title>Table RegionSize</title>
+    <para>The regionsize can be set on a per-table basis via <code>setMaxFileSize</code> on
+    <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>
+    when certain tables require a different regionsize than the configured default.
+    </para>
+    <para>See <xref linkend="perf.number.of.regions"/> for more information.
+    </para>
+    </section>
+    <section xml:id="schema.bloom">
+    <title>Bloom Filters</title>
+    <para>Bloom Filters can be enabled per-ColumnFamily.
+        Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
+        ROWCOL)</code> to enable blooms per Column Family. Default =
+        <varname>NONE</varname> for no bloom filters. If
+        <varname>ROW</varname>, the hash of the row will be added to the bloom
+        on each insert. If <varname>ROWCOL</varname>, the hash of the row +
+        column family + column family qualifier will be added to the bloom on
+        each key insert.</para>
+    <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and 
+    <xref linkend="blooms"/> for more information.
+    </para>
+    </section>
+    <section xml:id="schema.cf.blocksize"><title>ColumnFamily BlockSize</title>
+    <para>The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k.  Larger cell values require larger blocksizes. 
+    There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting
+    indexes should be roughly halved).
+    </para>
+    <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> 
+    and <xref linkend="store"/> for more information.
+    </para>
+    </section>
+    <section xml:id="cf.in.memory">
+    <title>In-Memory ColumnFamilies</title>
+    <para>ColumnFamilies can optionally be defined as in-memory.  Data is still persisted to disk, just like any other ColumnFamily.  
+    In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table
+    will be in memory.
+    </para>
+    <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+    </para>
+    </section>
+    <section xml:id="perf.compression">
+      <title>Compression</title>
+      <para>Production systems should use compression with their ColumnFamily definitions.  See <xref linkend="compression" /> for more information.
+      </para>
+    </section>
+  </section>  <!--  perf schema -->
  
  <section xml:id="perf.writing">
    <title>Writing to HBase</title>