From a10fb0ccad58f635ce6978961156f8b8ce11b439 Mon Sep 17 00:00:00 2001 From: Doug Meil Date: Tue, 29 Nov 2011 19:08:17 +0000 Subject: [PATCH] hbase-4892 book.xml, ops_mgt.xml book changes. git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208028 13f79535-47bb-0310-9956-ffa450edef68 --- src/docbkx/book.xml | 50 +++++++++++++++++++++++++++--------------- src/docbkx/ops_mgt.xml | 25 ++++++++++++++++++++- 2 files changed, 56 insertions(+), 19 deletions(-) diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index 3c1216961c1..7f077d29b5a 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -271,6 +271,12 @@ for(Result result : htable.getScanner(scan)) { HTable.delete. + HBase does not modify data in place, and so deletes are handled by creating new markers called tombstones. + These tombstones, along with the dead values, are cleaned up on major compactions. + + See for more information on deleting versions of columns. + + @@ -428,28 +434,20 @@ htable.put(put); -
+
Delete - When performing a delete operation in HBase, there are two - ways to specify the versions to be deleted - - - - Delete all versions older than a certain timestamp + There are three different types of internal delete markers: + + Delete: for a specific version of a column. - - - Delete the version at a specific timestamp + Delete column: for all versions of a column. + + Delete family: for all columns of a particular ColumnFamily - - A delete can apply to a complete row, a complete column - family, or to just one column. It is only in the last case that you - can delete explicit versions. For the deletion of a row or all the - columns within a family, it always works by deleting all cells older - than a certain version. - + When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column). + Deletes work by creating tombstone markers. For example, let's suppose we want to delete a row. For this you can specify a version, or else by default the @@ -466,8 +464,10 @@ htable.put(put); . If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted. + Also see for more information on the internal KeyValue format. +
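The three marker types described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `TombstoneSketch`, `Cell`, and every method name here are hypothetical and are not HBase classes, and the rule that column and family markers mask values at or below the marker's timestamp is a simplification of the version semantics discussed in this section. It shows that a delete only adds a marker, and that the marker and the dead values disappear together on a (simulated) major compaction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are HBase classes.
class TombstoneSketch {
    // PUT is a stored value; the other three mirror the marker types in the text.
    enum Type { PUT, DELETE, DELETE_COLUMN, DELETE_FAMILY }

    static final class Cell {
        final String family;
        final String qualifier; // null for DELETE_FAMILY markers
        final long ts;
        final Type type;
        final String value;     // null for markers

        Cell(String family, String qualifier, long ts, Type type, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
            this.type = type;
            this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<Cell>();

    void put(String family, String qualifier, long ts, String value) {
        cells.add(new Cell(family, qualifier, ts, Type.PUT, value));
    }

    // Nothing is modified in place: each delete appends a marker.
    void deleteVersion(String family, String qualifier, long ts) {
        cells.add(new Cell(family, qualifier, ts, Type.DELETE, null));
    }

    void deleteColumn(String family, String qualifier, long ts) {
        cells.add(new Cell(family, qualifier, ts, Type.DELETE_COLUMN, null));
    }

    void deleteFamily(String family, long ts) {
        cells.add(new Cell(family, null, ts, Type.DELETE_FAMILY, null));
    }

    // A value is hidden if any marker in the same family covers it.
    private boolean masked(Cell put) {
        for (Cell m : cells) {
            if (m.type == Type.PUT || !m.family.equals(put.family)) {
                continue;
            }
            boolean sameColumn = put.qualifier.equals(m.qualifier);
            if (m.type == Type.DELETE && sameColumn && m.ts == put.ts) return true;
            if (m.type == Type.DELETE_COLUMN && sameColumn && m.ts >= put.ts) return true;
            if (m.type == Type.DELETE_FAMILY && m.ts >= put.ts) return true;
        }
        return false;
    }

    // Values a reader would still see, in insertion order.
    List<String> visibleValues() {
        List<String> out = new ArrayList<String>();
        for (Cell c : cells) {
            if (c.type == Type.PUT && !masked(c)) out.add(c.value);
        }
        return out;
    }

    int storedCells() {
        return cells.size();
    }

    // Simulated major compaction: drop markers together with the values they hide.
    void majorCompact() {
        List<Cell> kept = new ArrayList<Cell>();
        for (Cell c : cells) {
            if (c.type == Type.PUT && !masked(c)) kept.add(c);
        }
        cells.clear();
        cells.addAll(kept);
    }

    public static void main(String[] args) {
        TombstoneSketch row = new TombstoneSketch();
        row.put("cf", "a", 1L, "a@1");
        row.put("cf", "a", 2L, "a@2");
        row.put("cf", "b", 1L, "b@1");
        row.deleteVersion("cf", "a", 2L);
        System.out.println(row.visibleValues() + " stored=" + row.storedCells());
        row.majorCompact();
        System.out.println(row.visibleValues() + " stored=" + row.storedCells());
    }
}
```

In this model the stored cell count grows on a delete and shrinks only after `majorCompact()`, which is the behaviour the text attributes to tombstones.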
-
+
Current Limitations @@ -1113,6 +1113,20 @@ if (!b) { }
+
+ HBase MapReduce Summary Without Reducer + It is also possible to perform summaries without a reducer if you use HBase as the reducer. + + An HTable target table would need to exist for the job summary. The HTable method incrementColumnValue + would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map + of keys with the amounts to be incremented for each map task, and make one update per key during the + cleanup method of the mapper. However, your mileage may vary depending on the number of rows to be processed and + the number of unique keys. + + In the end, the summary results are in HBase. + +
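The buffering idea described above can be sketched in plain Java without any Hadoop or HBase dependency. The names `InMapperSummarySketch` and `CounterSink` are hypothetical; `CounterSink` merely stands in for the target table, and in a real job its `increment` call would presumably be replaced by the HTable `incrementColumnValue` method mentioned in the text. The `map` and `cleanup` methods here only imitate the mapper lifecycle and are not the Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: mimics a map task that buffers increments and flushes them once.
class InMapperSummarySketch {
    // Hypothetical stand-in for the summary table.
    interface CounterSink {
        void increment(String key, long amount);
    }

    private final Map<String, Long> pending = new HashMap<String, Long>();
    private final CounterSink sink;

    InMapperSummarySketch(CounterSink sink) {
        this.sink = sink;
    }

    // Called once per input row: only the local Map is touched.
    void map(String key) {
        Long current = pending.get(key);
        pending.put(key, current == null ? 1L : current + 1L);
    }

    // Called once at the end of the task: one update per distinct key.
    void cleanup() {
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            sink.increment(e.getKey(), e.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        final Map<String, Long> table = new HashMap<String, Long>();
        final int[] calls = {0};
        InMapperSummarySketch task = new InMapperSummarySketch(new CounterSink() {
            public void increment(String key, long amount) {
                calls[0]++;
                Long old = table.get(key);
                table.put(key, old == null ? amount : old + amount);
            }
        });
        for (String k : new String[] {"a", "b", "a", "a"}) {
            task.map(k);
        }
        task.cleanup();
        System.out.println(table + " after " + calls[0] + " sink calls");
    }
}
```

The trade-off is memory against round trips: the Map grows with the number of unique keys seen by the task, while the number of updates sent to the table drops from one per row to one per key.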
+
Accessing Other HBase Tables in a MapReduce Job diff --git a/src/docbkx/ops_mgt.xml b/src/docbkx/ops_mgt.xml index fa534842564..d86d2f6fa6e 100644 --- a/src/docbkx/ops_mgt.xml +++ b/src/docbkx/ops_mgt.xml @@ -132,6 +132,30 @@
+ +
+ Region Management +
+ Major Compaction + Major compactions can be requested via the HBase shell or HBaseAdmin.majorCompact. + + Note: major compactions do NOT do region merges. See for more information about compactions. + + +
+
+ Merge + Merge is a utility that can merge adjoining regions in the same table (see org.apache.hadoop.hbase.util.Merge). +$ bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2> + + If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must + be run when the cluster is down. + See the O'Reilly HBase Book for + an example of usage. + + +
+
Node Management
Node Decommission @@ -340,7 +364,6 @@ false See Cluster Replication.
-
HBase Backup There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.