From a10fb0ccad58f635ce6978961156f8b8ce11b439 Mon Sep 17 00:00:00 2001
From: Doug Meil <dmeil@apache.org>
Date: Tue, 29 Nov 2011 19:08:17 +0000
Subject: [PATCH] hbase-4892 book.xml, ops_mgt.xml book changes.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208028 13f79535-47bb-0310-9956-ffa450edef68
---
 src/docbkx/book.xml    | 50 +++++++++++++++++++++++++++---------------
 src/docbkx/ops_mgt.xml | 25 ++++++++++++++++++++-
 2 files changed, 56 insertions(+), 19 deletions(-)
diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml
index 3c1216961c1..7f077d29b5a 100644
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -271,6 +271,12 @@ for(Result result : htable.getScanner(scan)) {
         <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
         HTable.delete</link>.
         </para>
+        <para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
+        These tombstones, along with the dead values, are cleaned up on major compactions.
+        </para>
+        <para>See <xref linkend="version.delete"/> for more information on deleting versions of columns.         
+        </para>
+ 
       </section>
             
     </section>
@@ -428,28 +434,20 @@ htable.put(put);
           
         </section>
 
-        <section>
+        <section xml:id="version.delete">
           <title>Delete</title>
 
-          <para>When performing a delete operation in HBase, there are two
-          ways to specify the versions to be deleted</para>
-
-          <itemizedlist>
-            <listitem>
-              <para>Delete all versions older than a certain timestamp</para>
+          <para>There are three different types of internal delete markers: 
+            <itemizedlist>
+            <listitem><para>Delete:  for a specific version of a column.</para>
             </listitem>
-
-            <listitem>
-              <para>Delete the version at a specific timestamp</para>
+            <listitem><para>Delete column:  for all versions of a column.</para>
+            </listitem>
+            <listitem><para>Delete family:  for all columns of a particular ColumnFamily.</para>
             </listitem>
           </itemizedlist>
-
-          <para>A delete can apply to a complete row, a complete column
-          family, or to just one column. It is only in the last case that you
-          can delete explicit versions. For the deletion of a row or all the
-          columns within a family, it always works by deleting all cells older
-          than a certain version.</para>
-
+          When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
+         </para>
           <para>Deletes work by creating <emphasis>tombstone</emphasis>
           markers. For example, let's suppose we want to delete a row. For
           this you can specify a version, or else by default the
@@ -466,8 +464,10 @@ htable.put(put);
             </footnote>. If the version you specified when deleting a row is
           larger than the version of any value in the row, then you can
           consider the complete row to be deleted.</para>
+          <para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
+          </para>
         </section>
-      </section>
+       </section>
 
       <section>
         <title>Current Limitations</title>
@@ -1113,6 +1113,20 @@ if (!b) {
 }
     </programlisting>
    </section>
+   <section xml:id="mapreduce.example.summary.noreducer">
+    <title>HBase MapReduce Summary Without Reducer</title>
+       <para>It is also possible to perform summaries without a reducer - if you use HBase as the reducer.
+       </para> 
+       <para>There would need to exist an HTable target table for the job summary.  The HTable method <code>incrementColumnValue</code>
+       would be used to atomically increment values.  From a performance perspective, it might make sense to keep a Map 
+       of keys with their values to be incremented for each map-task, and make one update per key during the <code>
+       cleanup</code> method of the mapper.  However, your mileage may vary depending on the number of rows to be processed and 
+       unique keys.
+       </para>
+       <para>In the end, the summary results are in HBase.
+       </para>
+   </section>
+   
    </section> <!--  mr examples -->
    <section xml:id="mapreduce.htable.access">
    <title>Accessing Other HBase Tables in a MapReduce Job</title>
diff --git a/src/docbkx/ops_mgt.xml b/src/docbkx/ops_mgt.xml
index fa534842564..d86d2f6fa6e 100644
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@@ -132,6 +132,30 @@
     </section>
            
     </section>  <!--  tools -->
+
+  <section xml:id="ops.regionmgt">
+    <title>Region Management</title>
+    <section xml:id="ops.regionmgt.majorcompact">
+      <title>Major Compaction</title>
+      <para>Major compactions can be requested via the HBase shell or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
+      </para>
+      <para>Note:  major compactions do NOT do region merges.  See <xref linkend="compaction"/> for more information about compactions.
+      
+      </para>
+    </section>
+    <section xml:id="ops.regionmgt.merge">
+      <title>Merge</title>
+      <para>Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge).</para>
+<programlisting>$ bin/hbase org.apache.hadoop.hbase.util.Merge &lt;tablename&gt; &lt;region1&gt; &lt;region2&gt;
+</programlisting>
+      <para>If you feel you have too many regions and want to consolidate them, Merge is the utility you need.  Merge must
+      be run when the cluster is down.  
+      See the <link xlink:href="http://ofps.oreilly.com/titles/9781449396107/performance.html">O'Reilly HBase Book</link> for
+      an example of usage.
+      </para>
+      
+    </section>
+  </section>
     
     <section xml:id="node.management"><title>Node Management</title>
      <section xml:id="decommission"><title>Node Decommission</title>
@@ -340,7 +364,6 @@ false
     <para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
     </para>
   </section>
-  
   <section xml:id="ops.backup">
     <title >HBase Backup</title>
     <para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.