hbase-4892 book.xml, ops_mgt.xml book changes.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208028 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-29 19:08:17 +00:00
parent 8db322ebdd
commit a10fb0ccad
2 changed files with 56 additions and 19 deletions


@ -271,6 +271,12 @@ for(Result result : htable.getScanner(scan)) {
<link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29"> <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
HTable.delete</link>. HTable.delete</link>.
</para> </para>
<para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
These tombstones, along with the dead values, are cleaned up on major compactions.
</para>
<para>See <xref linkend="version.delete"/> for more information on deleting versions of columns.
</para>
</section> </section>
</section> </section>
@ -428,28 +434,20 @@ htable.put(put);
</section> </section>
<section> <section xml:id="version.delete">
<title>Delete</title> <title>Delete</title>
<para>When performing a delete operation in HBase, there are two <para>There are three different types of internal delete markers:
ways to specify the versions to be deleted</para> <itemizedlist>
<listitem><para>Delete: for a specific version of a column.</para>
<itemizedlist>
<listitem>
<para>Delete all versions older than a certain timestamp</para>
</listitem> </listitem>
<listitem><para>Delete column: for all versions of a column.</para>
<listitem> </listitem>
<para>Delete the version at a specific timestamp</para> <listitem><para>Delete family: for all columns of a particular ColumnFamily</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
<para>A delete can apply to a complete row, a complete column </para>
family, or to just one column. It is only in the last case that you
can delete explicit versions. For the deletion of a row or all the
columns within a family, it always works by deleting all cells older
than a certain version.</para>
<para>Deletes work by creating <emphasis>tombstone</emphasis> <para>Deletes work by creating <emphasis>tombstone</emphasis>
markers. For example, let's suppose we want to delete a row. For markers. For example, let's suppose we want to delete a row. For
this you can specify a version, or else by default the this you can specify a version, or else by default the
@ -466,8 +464,10 @@ htable.put(put);
</footnote>. If the version you specified when deleting a row is </footnote>. If the version you specified when deleting a row is
larger than the version of any value in the row, then you can larger than the version of any value in the row, then you can
consider the complete row to be deleted.</para> consider the complete row to be deleted.</para>
<para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
</para>
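<para>The three marker types listed above can be sketched as a small model. This is an illustration only, not HBase's internal code: <code>MarkerType</code> and <code>DeleteMarkerSketch</code> are hypothetical names, and the rule that column and family markers cover every cell at or before the marker's timestamp is an assumption of this sketch.
</para>

```java
// Simplified model of delete markers; these names are illustrative, not HBase classes.
enum MarkerType { DELETE, DELETE_COLUMN, DELETE_FAMILY }

class DeleteMarkerSketch {
    final MarkerType type;
    final String family;
    final String qualifier; // unused for DELETE_FAMILY, may be null
    final long timestamp;

    DeleteMarkerSketch(MarkerType type, String family, String qualifier, long timestamp) {
        this.type = type;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    // Returns true if this marker hides the given cell from readers.
    boolean masks(String cellFamily, String cellQualifier, long cellTimestamp) {
        if (!family.equals(cellFamily)) {
            return false;
        }
        switch (type) {
            case DELETE:        // one specific version of one column
                return qualifier.equals(cellQualifier) && cellTimestamp == timestamp;
            case DELETE_COLUMN: // assumed: all versions of one column up to the marker
                return qualifier.equals(cellQualifier) && cellTimestamp <= timestamp;
            case DELETE_FAMILY: // assumed: all columns of the family up to the marker
                return cellTimestamp <= timestamp;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        DeleteMarkerSketch marker =
            new DeleteMarkerSketch(MarkerType.DELETE_COLUMN, "cf", "q", 5L);
        System.out.println(marker.masks("cf", "q", 4L)); // older version of the same column
        System.out.println(marker.masks("cf", "q", 6L)); // version written after the marker
    }
}
```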
</section> </section>
</section> </section>
<section> <section>
<title>Current Limitations</title> <title>Current Limitations</title>
@ -1113,6 +1113,20 @@ if (!b) {
} }
</programlisting> </programlisting>
</section> </section>
<section xml:id="mapreduce.example.summary.noreducer">
<title>HBase MapReduce Summary Without Reducer</title>
<para>It is also possible to perform summaries without a reducer - if you use HBase as the reducer.
</para>
<para>There would need to exist an HTable target table for the job summary. The HTable method <code>incrementColumnValue</code>
would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map
of keys to the amounts to be incremented for each map-task, and make one update per key during the
<code>cleanup</code> method of the mapper. However, your mileage may vary depending on the number of rows to be processed and
the number of unique keys.
</para>
<para>In the end, the summary results are in HBase.
</para>
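<para>The buffering idea described above can be sketched without any HBase dependency. In this sketch <code>CounterSink</code> is a hypothetical stand-in for the single call a real job would make to <code>HTable.incrementColumnValue</code>, and <code>map</code> and <code>cleanup</code> only mirror the roles of the mapper methods of the same names. It is a minimal illustration of the pattern, not a working MapReduce job.
</para>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the target table; a real job would call
// HTable.incrementColumnValue here instead.
interface CounterSink {
    void increment(String rowKey, long amount);
}

class InMapperSummarySketch {
    // Per-map-task buffer: summary key -> pending increment.
    private final Map<String, Long> pending = new HashMap<>();
    private final CounterSink sink;

    InMapperSummarySketch(CounterSink sink) {
        this.sink = sink;
    }

    // Plays the role of Mapper.map: only touches the in-memory buffer.
    void map(String summaryKey) {
        pending.merge(summaryKey, 1L, Long::sum);
    }

    // Plays the role of Mapper.cleanup: one update per distinct key.
    void cleanup() {
        for (Map.Entry<String, Long> entry : pending.entrySet()) {
            sink.increment(entry.getKey(), entry.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        Map<String, Long> table = new HashMap<>(); // in-memory stand-in for the summary table
        int[] updates = {0};
        InMapperSummarySketch task = new InMapperSummarySketch((row, amount) -> {
            updates[0]++;
            table.merge(row, amount, Long::sum);
        });
        for (String key : new String[] {"a", "b", "a", "a"}) {
            task.map(key);
        }
        task.cleanup();
        System.out.println(table + " written in " + updates[0] + " updates");
    }
}
```

<para>Four input records produce two updates here, one per distinct key. Whether that saving matters depends on the ratio of rows to unique keys, and the buffer has to fit in the map task's memory.
</para>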
</section>
</section> <!-- mr examples --> </section> <!-- mr examples -->
<section xml:id="mapreduce.htable.access"> <section xml:id="mapreduce.htable.access">
<title>Accessing Other HBase Tables in a MapReduce Job</title> <title>Accessing Other HBase Tables in a MapReduce Job</title>


@ -132,6 +132,30 @@
</section> </section>
</section> <!-- tools --> </section> <!-- tools -->
<section xml:id="ops.regionmgt">
<title>Region Management</title>
<section xml:id="ops.regionmgt.majorcompact">
<title>Major Compaction</title>
<para>Major compactions can be requested via the HBase shell or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
</para>
<para>Note: major compactions do NOT do region merges. See <xref linkend="compaction"/> for more information about compactions.
</para>
</section>
<section xml:id="ops.regionmgt.merge">
<title>Merge</title>
<para>Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge).</para>
<programlisting>$ bin/hbase org.apache.hadoop.hbase.util.Merge &lt;tablename&gt; &lt;region1&gt; &lt;region2&gt;
</programlisting>
<para>If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must
be run when the cluster is down.
See the <link xlink:href="http://ofps.oreilly.com/titles/9781449396107/performance.html">O'Reilly HBase Book</link> for
an example of usage.
</para>
</section>
</section>
<section xml:id="node.management"><title>Node Management</title> <section xml:id="node.management"><title>Node Management</title>
<section xml:id="decommission"><title>Node Decommission</title> <section xml:id="decommission"><title>Node Decommission</title>
@ -340,7 +364,6 @@ false
<para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>. <para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
</para> </para>
</section> </section>
<section xml:id="ops.backup"> <section xml:id="ops.backup">
<title >HBase Backup</title> <title >HBase Backup</title>
<para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. <para>There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.