hbase-4892 book.xml, ops_mgt.xml book changes.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208028 13f79535-47bb-0310-9956-ffa450edef68
2011-11-29 19:08:17 +00:00 · 2011-11-29 19:08:17 +00:00 · a10fb0ccad
parent 8db322ebdd
commit a10fb0ccad
2 changed files with 56 additions and 19 deletions
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@ -271,6 +271,12 @@ for(Result result : htable.getScanner(scan)) {
        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
        HTable.delete</link>.
        </para>
+        <para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
+        These tombstones, along with the dead values, are cleaned up on major compactions.
+        </para>
+        <para>See <xref linkend="version.delete"/> for more information on deleting versions of columns.         
+        </para>
+ 
      </section>
            
    </section>
@ -428,28 +434,20 @@ htable.put(put);
          
        </section>

-        <section>
+        <section xml:id="version.delete">
          <title>Delete</title>

-          <para>When performing a delete operation in HBase, there are two
-          ways to specify the versions to be deleted</para>
-
+          <para>There are three different types of internal delete markers: 
            <itemizedlist>
-            <listitem>
-              <para>Delete all versions older than a certain timestamp</para>
+            <listitem><para>Delete:  for a specific version of a column.</para>
            </listitem>
-
-            <listitem>
-              <para>Delete the version at a specific timestamp</para>
+            <listitem><para>Delete column:  for all versions of a column.</para>
+            </listitem>
+            <listitem><para>Delete family:  for all columns of a particular ColumnFamily.</para>
            </listitem>
          </itemizedlist>
-
-          <para>A delete can apply to a complete row, a complete column
-          family, or to just one column. It is only in the last case that you
-          can delete explicit versions. For the deletion of a row or all the
-          columns within a family, it always works by deleting all cells older
-          than a certain version.</para>
-
+          When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
+         </para>
          <para>Deletes work by creating <emphasis>tombstone</emphasis>
          markers. For example, let's suppose we want to delete a row. For
          this you can specify a version, or else by default the
@ -466,6 +464,8 @@ htable.put(put);
            </footnote>. If the version you specified when deleting a row is
          larger than the version of any value in the row, then you can
          consider the complete row to be deleted.</para>
+          <para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
+          </para>
        </section>
       </section>

@ -1113,6 +1113,20 @@ if (!b) {
 }
    </programlisting>
   </section>
+   <section xml:id="mapreduce.example.summary.noreducer">
+    <title>HBase MapReduce Summary Without Reducer</title>
+       <para>It is also possible to perform summaries without a reducer, if you use HBase as the reducer.
+       </para> 
+       <para>An HTable target table would need to exist for the job summary.  The HTable method <code>incrementColumnValue</code>
+       would be used to atomically increment values.  From a performance perspective, it might make sense to keep a Map
+       of keys to the amounts to be incremented in each map-task, and make one update per key during the
+       <code>cleanup</code> method of the mapper.  However, your mileage may vary depending on the number of rows to be processed and
+       the number of unique keys.
+       </para>
+       <para>In the end, the summary results are in HBase.
+       </para>
+   </section>
+   
   </section> <!--  mr examples -->
   <section xml:id="mapreduce.htable.access">
   <title>Accessing Other HBase Tables in a MapReduce Job</title>
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@ -133,6 +133,30 @@
           
    </section>  <!--  tools -->

+  <section xml:id="ops.regionmgt">
+    <title>Region Management</title>
+    <section xml:id="ops.regionmgt.majorcompact">
+      <title>Major Compaction</title>
+      <para>Major compactions can be requested via the HBase shell or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
+      </para>
+      <para>Note:  major compactions do NOT do region merges.  See <xref linkend="compaction"/> for more information about compactions.
+      
+      </para>
+    </section>
+    <section xml:id="ops.regionmgt.merge">
+      <title>Merge</title>
+      <para>Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge).</para>
+<programlisting>$ bin/hbase org.apache.hadoop.hbase.util.Merge &lt;tablename&gt; &lt;region1&gt; &lt;region2&gt;
+</programlisting>
+      <para>If you feel you have too many regions and want to consolidate them, Merge is the utility you need.  Merge must
+      be run when the cluster is down.
+      See the <link xlink:href="http://ofps.oreilly.com/titles/9781449396107/performance.html">O'Reilly HBase Book</link> for
+      an example of usage.
+      </para>
+      
+    </section>
+  </section>
+    
    <section xml:id="node.management"><title>Node Management</title>
     <section xml:id="decommission"><title>Node Decommission</title>
        <para>You can stop an individual RegionServer by running the following
@ -340,7 +364,6 @@ false
    <para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
    </para>
  </section>
-  
  <section xml:id="ops.backup">
    <title >HBase Backup</title>
    <para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.