hbase-4892 book.xml, ops_mgt.xml book changes.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208028 13f79535-47bb-0310-9956-ffa450edef68
2011-11-29 19:08:17 +00:00 · 2011-11-29 19:08:17 +00:00 · a10fb0ccad
parent 8db322ebdd
commit a10fb0ccad
2 changed files with 56 additions and 19 deletions
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -271,6 +271,12 @@ for(Result result : htable.getScanner(scan)) {
        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
        HTable.delete</link>.
        </para>
        <para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
        These tombstones, along with the dead values, are cleaned up on major compactions.
        </para>
        <para>See <xref linkend="version.delete"/> for more information on deleting versions of columns.         
        </para>
      </section>
    </section>
@@ -428,28 +434,20 @@ htable.put(put);
        </section>
-        <section>
+        <section xml:id="version.delete">
          <title>Delete</title>
-          <para>When performing a delete operation in HBase, there are two
+          <para>There are three different types of internal delete markers: 
-          ways to specify the versions to be deleted</para>
+            <itemizedlist>
-
+            <listitem><para>Delete:  for a specific version of a column.</para>
          <itemizedlist>
            <listitem>
              <para>Delete all versions older than a certain timestamp</para>
            </listitem>
-
+            <listitem><para>Delete column:  for all versions of a column.</para>
-            <listitem>
+            </listitem>
-              <para>Delete the version at a specific timestamp</para>
+            <listitem><para>Delete family:  for all columns of a particular ColumnFamily.
            </listitem>
          </itemizedlist>
-
+          When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
-          <para>A delete can apply to a complete row, a complete column
+         </para>
          family, or to just one column. It is only in the last case that you
          can delete explicit versions. For the deletion of a row or all the
          columns within a family, it always works by deleting all cells older
          than a certain version.</para>
          <para>Deletes work by creating <emphasis>tombstone</emphasis>
          markers. For example, let's suppose we want to delete a row. For
          this you can specify a version, or else by default the
@@ -466,8 +464,10 @@ htable.put(put);
            </footnote>. If the version you specified when deleting a row is
          larger than the version of any value in the row, then you can
          consider the complete row to be deleted.</para>
          <para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
          </para>
        </section>
-      </section>
+       </section>
      <section>
        <title>Current Limitations</title>
@@ -1113,6 +1113,20 @@ if (!b) {
 }
    </programlisting>
   </section>
   <section xml:id="mapreduce.example.summary.noreducer">
    <title>HBase MapReduce Summary Without Reducer</title>
       <para>It is also possible to perform summaries without a reducer, if you use HBase as the reducer.
       </para> 
       <para>An HTable target table would need to exist for the job summary.  The HTable method <code>incrementColumnValue</code>
       would be used to atomically increment values.  From a performance perspective, it might make sense to keep a Map 
       of keys with their values to be incremented for each map-task, and make one update per key during the 
       <code>cleanup</code> method of the mapper.  However, your mileage may vary depending on the number of rows to be processed and 
       the number of unique keys.
       </para>
       <para>In the end, the summary results are in HBase.
       </para>
   </section>
   </section> <!--  mr examples -->
   <section xml:id="mapreduce.htable.access">
   <title>Accessing Other HBase Tables in a MapReduce Job</title>
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@@ -132,6 +132,30 @@
    </section>
    </section>  <!--  tools -->
  <section xml:id="ops.regionmgt">
    <title>Region Management</title>
    <section xml:id="ops.regionmgt.majorcompact">
      <title>Major Compaction</title>
      <para>Major compactions can be requested via the HBase shell or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
      </para>
      <para>Note:  major compactions do NOT do region merges.  See <xref linkend="compaction"/> for more information about compactions.
      </para>
    </section>
    <section xml:id="ops.regionmgt.merge">
      <title>Merge</title>
      <para>Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge).</para>
 <programlisting>$ bin/hbase org.apache.hadoop.hbase.util.Merge &lt;tablename&gt; &lt;region1&gt; &lt;region2&gt;
 </programlisting>
      <para>If you feel you have too many regions and want to consolidate them, Merge is the utility you need.  Merge must
      be run when the cluster is down.  
      See the <link xlink:href="http://ofps.oreilly.com/titles/9781449396107/performance.html">O'Reilly HBase Book</link> for
      an example of usage.
      </para>
    </section>
  </section>
    <section xml:id="node.management"><title>Node Management</title>
     <section xml:id="decommission"><title>Node Decommission</title>
@@ -340,7 +364,6 @@ false
    <para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
    </para>
  </section>
  <section xml:id="ops.backup">
    <title >HBase Backup</title>
    <para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.