HBASE-4730 book.xml, ops_mgt.xml, performance.xml - handful of changes

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1196792 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Doug Meil 2011-11-02 20:40:14 +00:00
parent b44e09085c
commit f0444014b8
3 changed files with 27 additions and 1 deletion


@ -565,6 +565,12 @@ admin.enableTable(table);
second and third column family in the case where data access is usually column scoped;
i.e., you query one column family or the other, but usually not both at the same time.
</para>
<section xml:id="number.of.cfs.card"><title>Cardinality of ColumnFamilies</title>
<para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., the number of rows) of each.
If ColumnFamily-A has 1 million rows and ColumnFamily-B has 1 billion rows, ColumnFamily-A's data will likely be spread
across many, many regions (and RegionServers). This makes mass scans of ColumnFamily-A less efficient.
</para>
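<para>As a back-of-the-envelope check of the example above, the following sketch
(using assumed, illustrative numbers, not measurements) shows why scans of the
smaller family become less efficient:</para>

```java
// Back-of-the-envelope illustration of the cardinality example above.
// All numbers are assumptions for illustration only.
public class CardinalityExample {
    public static void main(String[] args) {
        long cfaRows = 1_000_000L;     // ColumnFamily-A: 1 million rows
        long cfbRows = 1_000_000_000L; // ColumnFamily-B: 1 billion rows
        // Region splits are driven by total table size, which here is
        // dominated by ColumnFamily-B; assume the table splits into:
        long regions = 1_000L;
        // Both families live in the same row-oriented regions, so
        // ColumnFamily-A's rows are thinly spread across all of them.
        long cfaRowsPerRegion = cfaRows / regions;
        System.out.println(cfaRowsPerRegion + " CF-A rows per region");
    }
}
```

<para>A mass scan of ColumnFamily-A must still open every region, even though each
region holds only about a thousand of its rows.</para>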
</section>
</section>
<section xml:id="rowkey.design"><title>Rowkey Design</title>
<section xml:id="timeseries">
@ -972,6 +978,11 @@ public static class MyMapper extends TableMapper&lt;ImmutableBytesWritable, Put&
</para>
</section>
<section xml:id="mapreduce.example.readwrite.multi">
<title>HBase MapReduce Read/Write Example With Multi-Table Output</title>
<para>TODO: example for <classname>MultiTableOutputFormat</classname>.
</para>
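<para>Pending the full example, here is a minimal sketch of how
<classname>MultiTableOutputFormat</classname> can be wired up. The table names
below ("sourceTable", "targetTableA", "targetTableB") are assumptions for
illustration; the key emitted by the mapper names the table that receives each
<classname>Put</classname>:</para>

```java
// Sketch only (assumed table names); requires the HBase and Hadoop jars.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleMultiTableWrite");
job.setJarByClass(MyDriver.class);   // class that contains the mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the Scan default, which is bad for MapReduce
scan.setCacheBlocks(false);  // always set this for MR jobs

TableMapReduceUtil.initTableMapperJob(
    "sourceTable",                   // input table (assumed name)
    scan,
    MyMultiTableMapper.class,
    ImmutableBytesWritable.class,    // mapper output key: the target table name
    Put.class,                       // mapper output value
    job);
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.setNumReduceTasks(0);            // map-only: writes go straight to the tables

public static class MyMultiTableMapper
    extends TableMapper<ImmutableBytesWritable, Put> {
  private static final ImmutableBytesWritable TABLE_A =
      new ImmutableBytesWritable(Bytes.toBytes("targetTableA"));
  private static final ImmutableBytesWritable TABLE_B =
      new ImmutableBytesWritable(Bytes.toBytes("targetTableB"));

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    Put put = new Put(row.get());
    // ... populate the Put from 'value' ...
    // The emitted key selects the destination table for this Put.
    context.write(TABLE_A, put);   // or TABLE_B, depending on the row
  }
}
```

<para><classname>MultiTableOutputFormat</classname> accepts
<classname>Delete</classname> values as well, so deletes can be routed to
multiple tables the same way.</para>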
</section>
<section xml:id="mapreduce.example.summary">
<title>HBase MapReduce Summary Example</title>
<para>The following example uses HBase as a MapReduce source and sink with a summarization step. This example will
@ -1575,7 +1586,6 @@ scan.setFilter(filter);
<para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html">HFile source code</link>.
</para>
</section>
<section xml:id="hfile_tool">
<title>HFile Tool</title>
@ -1589,7 +1599,13 @@ scan.setFilter(filter);
usage for other things to do with the <classname>HFile</classname>
tool.</para>
</section>
<section xml:id="store.file.dir">
<title>StoreFile Directory Structure on HDFS</title>
<para>For more information on what StoreFiles look like on HDFS with respect to the directory structure, see <xref linkend="trouble.namenode.hbase.objects" />.
</para>
</section>
</section> <!-- hfile -->
<section xml:id="hfile.blocks">
<title>Blocks</title>
<para>StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis.


@ -417,6 +417,11 @@ false
</para>
</section>
</section>
<section xml:id="ops.capacity.regions"><title>Regions</title>
<para>Another common question for HBase administrators is determining the right number of regions per
RegionServer. This affects both storage and hardware planning. See <xref linkend="perf.number.of.regions" />.
</para>
</section>
</section>
</chapter>


@ -140,6 +140,11 @@
<para>The number of regions for an HBase table is driven by the <xref
linkend="bigger.regions" />. Also, see the architecture
section on <xref linkend="arch.regions.size" />.</para>
<para>A lower number of regions is preferred, generally in the range of 20 to 200
per RegionServer. Adjust the region size as appropriate to achieve this number. Some
clusters set the region size to 20 GB, for example, so you may need to
experiment with this setting based on your hardware configuration and application needs.
</para>
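<para>For example (a sketch using assumed numbers): at 100 regions per
RegionServer with a 20 GB region size, each RegionServer addresses roughly
2 TB of table data, before HDFS replication is counted:</para>

```java
// Rough capacity arithmetic; the numbers are illustrative assumptions.
public class RegionCapacity {
    public static void main(String[] args) {
        long regionSizeGb = 20L;       // e.g., a 20 GB region size
        long regionsPerServer = 100L;  // within the suggested 20-200 range
        long dataPerServerGb = regionSizeGb * regionsPerServer;
        System.out.println(dataPerServerGb + " GB addressed per RegionServer");
    }
}
```

<para>The same arithmetic can be inverted to pick a region size: divide the data
each RegionServer is expected to hold by your target region count.</para>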
</section>
<section xml:id="perf.compactions.and.splits">