hbase-4892 book.xml, ops_mgt.xml book changes.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208028 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-29 19:08:17 +00:00
parent 8db322ebdd
commit a10fb0ccad
2 changed files with 56 additions and 19 deletions

@ -271,6 +271,12 @@ for(Result result : htable.getScanner(scan)) {
<link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
HTable.delete</link>.
</para>
<para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
These tombstones, along with the dead values, are cleaned up on major compactions.
</para>
<para>See <xref linkend="version.delete"/> for more information on deleting versions of columns.
</para>
</section>
</section>
@ -428,28 +434,20 @@ htable.put(put);
</section>
<section>
<section xml:id="version.delete">
<title>Delete</title>
<para>When performing a delete operation in HBase, there are two
ways to specify the versions to be deleted</para>
<para>There are three different types of internal delete markers:
<itemizedlist>
<listitem>
<para>Delete all versions older than a certain timestamp</para>
<listitem><para>Delete: for a specific version of a column.</para>
</listitem>
<listitem>
<para>Delete the version at a specific timestamp</para>
<listitem><para>Delete column: for all versions of a column.</para>
</listitem>
<listitem><para>Delete family: for all columns of a particular ColumnFamily.</para>
</listitem>
</itemizedlist>
<para>A delete can apply to a complete row, a complete column
family, or to just one column. It is only in the last case that you
can delete explicit versions. For the deletion of a row or all the
columns within a family, it always works by deleting all cells older
than a certain version.</para>
When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
</para>
<para>Deletes work by creating <emphasis>tombstone</emphasis>
markers. For example, let's suppose we want to delete a row. For
this you can specify a version, or else by default the
@ -466,6 +464,8 @@ htable.put(put);
</footnote>. If the version you specified when deleting a row is
larger than the version of any value in the row, then you can
consider the complete row to be deleted.</para>
<para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
</para>
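<para>The masking rules of the three marker types can be sketched as a small, self-contained model. This is an illustration only, not HBase's internal implementation: the class <code>TombstoneSketch</code>, its nested types, and the <code>masks</code> and <code>visible</code> methods are hypothetical names, and the model assumes that a version marker hides exactly one timestamp while column and family markers hide every cell at or below the marker's timestamp.
</para>

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of delete markers; hypothetical names, not HBase classes.
public class TombstoneSketch {

    public enum MarkerType { DELETE, DELETE_COLUMN, DELETE_FAMILY }

    public static final class Cell {
        final String family;
        final String qualifier;
        final long ts;

        public Cell(String family, String qualifier, long ts) {
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
        }
    }

    public static final class Marker {
        final MarkerType type;
        final String family;
        final String qualifier; // ignored for DELETE_FAMILY
        final long ts;

        public Marker(MarkerType type, String family, String qualifier, long ts) {
            this.type = type;
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
        }
    }

    // Returns true if the marker hides the cell under the assumed rules.
    public static boolean masks(Marker m, Cell c) {
        if (!m.family.equals(c.family)) {
            return false;
        }
        switch (m.type) {
            case DELETE:        // one specific version of one column
                return m.qualifier.equals(c.qualifier) && c.ts == m.ts;
            case DELETE_COLUMN: // all versions of one column up to the marker
                return m.qualifier.equals(c.qualifier) && c.ts <= m.ts;
            case DELETE_FAMILY: // all columns of the family up to the marker
                return c.ts <= m.ts;
            default:
                return false;
        }
    }

    // Cells that a read would still return, given the markers present.
    public static List<Cell> visible(List<Cell> cells, List<Marker> markers) {
        List<Cell> result = new ArrayList<>();
        for (Cell c : cells) {
            boolean hidden = false;
            for (Marker m : markers) {
                if (masks(m, c)) {
                    hidden = true;
                    break;
                }
            }
            if (!hidden) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("cf", "a", 1L));
        cells.add(new Cell("cf", "a", 2L));
        cells.add(new Cell("cf", "b", 2L));
        List<Marker> markers = new ArrayList<>();
        markers.add(new Marker(MarkerType.DELETE_COLUMN, "cf", "a", 2L));
        System.out.println(visible(cells, markers).size() + " cell(s) still visible");
    }
}
```

<para>In this model nothing is removed from the cell list; the markers only hide cells at read time, which mirrors the point that tombstones and dead values remain until a major compaction.
</para>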
</section>
</section>
@ -1113,6 +1113,20 @@ if (!b) {
}
</programlisting>
</section>
<section xml:id="mapreduce.example.summary.noreducer">
<title>HBase MapReduce Summary Without Reducer</title>
<para>It is also possible to perform summaries without a reducer, if you use HBase itself as the reducer.
</para>
<para>An HTable target table would need to exist for the job summary. The HTable method <code>incrementColumnValue</code>
would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map
of keys with the amounts to be incremented for each map-task, and make one update per key during the <code>cleanup</code>
method of the mapper. However, your mileage may vary depending on the number of rows to be processed and the number of
unique keys.
</para>
<para>In the end, the summary results are in HBase.
</para>
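<para>The buffering idea described above can be sketched without any HBase dependency. This is a minimal sketch under stated assumptions: <code>MapSideSummary</code> and <code>CounterSink</code> are hypothetical names, and <code>CounterSink.increment</code> merely stands in for a call such as <code>incrementColumnValue</code> on the target table. The <code>map</code> and <code>cleanup</code> methods mimic when a mapper's methods of the same names would run.
</para>

```java
import java.util.HashMap;
import java.util.Map;

// Per-task buffering of counts, flushed once per key at cleanup time.
// Hypothetical sketch; CounterSink stands in for the HBase table update.
public class MapSideSummary {

    public interface CounterSink {
        void increment(String key, long amount);
    }

    private final Map<String, Long> pending = new HashMap<>();
    private final CounterSink sink;

    public MapSideSummary(CounterSink sink) {
        this.sink = sink;
    }

    // Called once per input record: only the in-memory Map is touched.
    public void map(String key) {
        pending.merge(key, 1L, Long::sum);
    }

    // Called once per map task: one update per distinct key.
    public void cleanup() {
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            sink.increment(e.getKey(), e.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        // An in-memory Map plays the role of the summary table here.
        Map<String, Long> table = new HashMap<>();
        MapSideSummary task = new MapSideSummary((k, n) -> table.merge(k, n, Long::sum));
        for (String k : new String[] {"a", "b", "a", "a"}) {
            task.map(k);
        }
        task.cleanup();
        System.out.println(table);
    }
}
```

<para>The trade-off is memory: the Map grows with the number of unique keys seen by a task, so this approach suits jobs where that number is modest.
</para>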
</section>
</section> <!-- mr examples -->
<section xml:id="mapreduce.htable.access">
<title>Accessing Other HBase Tables in a MapReduce Job</title>

@ -133,6 +133,30 @@
</section> <!-- tools -->
<section xml:id="ops.regionmgt">
<title>Region Management</title>
<section xml:id="ops.regionmgt.majorcompact">
<title>Major Compaction</title>
<para>Major compactions can be requested via the HBase shell or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
</para>
<para>Note: major compactions do NOT do region merges. See <xref linkend="compaction"/> for more information about compactions.
</para>
</section>
<section xml:id="ops.regionmgt.merge">
<title>Merge</title>
<para>Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge).</para>
<programlisting>$ bin/hbase org.apache.hadoop.hbase.util.Merge &lt;tablename&gt; &lt;region1&gt; &lt;region2&gt;
</programlisting>
<para>If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must
be run when the cluster is down.
See the <link xlink:href="http://ofps.oreilly.com/titles/9781449396107/performance.html">O'Reilly HBase Book</link> for
an example of usage.
</para>
</section>
</section>
<section xml:id="node.management"><title>Node Management</title>
<section xml:id="decommission"><title>Node Decommission</title>
<para>You can stop an individual RegionServer by running the following
@ -340,7 +364,6 @@ false
<para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
</para>
</section>
<section xml:id="ops.backup">
<title >HBase Backup</title>
<para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.