From f0aeae2ba449123336d3c11bae2230e6538f010a Mon Sep 17 00:00:00 2001
From: Doug Meil
Date: Mon, 3 Oct 2011 16:07:10 +0000
Subject: [PATCH] HBASE-4530 expanding backup section

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1178437 13f79535-47bb-0310-9956-ffa450edef68
---
 src/docbkx/ops_mgt.xml | 62 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 3 deletions(-)

diff --git a/src/docbkx/ops_mgt.xml b/src/docbkx/ops_mgt.xml
index 172da244273..fe29eff524a 100644
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@@ -89,13 +89,26 @@
 --peer.adr=server1,server2,server3:2181:/hbase TestTable
+
+ Export + Export is a utility that will dump the contents of a table to HDFS in a sequence file. Invoke via: +$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]] + + +
+
+ Import + Import is a utility that will load data that has been exported back into HBase. Invoke via: +$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir> + + +
RowCounter RowCounter is a utility that will count all the rows of a table. This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency. -$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter +$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...] -
@@ -240,8 +253,51 @@ false
HBase Backup - See HBase Backup Options over on the Sematext Blog. + There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster. + Each approach has pros and cons. + For additional information, see HBase Backup Options over on the Sematext Blog. + +
Full Shutdown Backup + Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used in a back-end analytic capacity + and not serving front-end web pages. The benefit is that the NameNode/Master and RegionServers are down, so there is no chance of missing + any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down. The steps include: + +
Stop HBase + + +
+
Backup NameNode + + +
+
Distcp + Distcp can be used to copy the contents of the HBase directory in HDFS either to another location on the same cluster or to a different cluster. + + Note: Distcp works in this situation because the cluster is down and there are no in-flight edits to files. + This is not recommended on a live cluster. + +
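The full-shutdown steps above can be sketched as a small script. This is only an illustration, not part of the patch: the `HBASE_HOME` location, the `SRC` and `DST` HDFS URIs, and the `DRY_RUN` guard are hypothetical values chosen for the example, and the NameNode metadata backup is left as a comment because it is site-specific. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of a full-shutdown backup: stop HBase, then DistCp the HBase directory.
# HBASE_HOME, SRC and DST are hypothetical; adjust them for your cluster.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
SRC="${SRC:-hdfs://nn1:8020/hbase}"
DST="${DST:-hdfs://nn2:8020/hbase-backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

full_shutdown_backup() {
  # Step 1: stop HBase so there are no in-flight edits.
  run "$HBASE_HOME/bin/stop-hbase.sh"
  # Step 2: back up NameNode metadata (site-specific, not shown here).
  # Step 3: copy the HBase directory with DistCp.
  run hadoop distcp "$SRC" "$DST"
}

full_shutdown_backup
```

Setting `DRY_RUN=0` would execute the commands instead of printing them; review the printed commands against your own cluster layout before doing so.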
+
+
Live Cluster Backup - Replication + This approach assumes that there is a second cluster. + See the HBase page on replication for more information. + +
+
Live Cluster Backup - CopyTable + The CopyTable utility can be used either to copy data from one table to another on the + same cluster, or to copy data to another table on another cluster. + + Since the cluster is up, there is a risk that edits could be missed in the copy process. + +
+
Live Cluster Backup - Export + The Export approach dumps the content of a table to HDFS on the same cluster. To restore the data, the + Import utility would be used. + + Since the cluster is up, there is a risk that edits could be missed in the export process. + +
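As a rough illustration of the Export approach and its matching restore, the sketch below wraps the Export and Import invocations shown earlier in this patch. The table name `TestTable`, the `/backup/TestTable` directory, and the `DRY_RUN` guard are assumptions made for the example; the script is expected to run from the HBase install directory so that `bin/hbase` resolves.

```shell
#!/usr/bin/env bash
# Sketch of a live-cluster backup with Export and a later restore with Import.
# TABLE and BACKUP_DIR are hypothetical example values.
TABLE="${TABLE:-TestTable}"
BACKUP_DIR="${BACKUP_DIR:-/backup/TestTable}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Dump the table to a sequence file directory in HDFS.
export_table() {
  run bin/hbase org.apache.hadoop.hbase.mapreduce.Export "$1" "$2"
}

# Load previously exported data back into the table.
import_table() {
  run bin/hbase org.apache.hadoop.hbase.mapreduce.Import "$1" "$2"
}

export_table "$TABLE" "$BACKUP_DIR"
```

A restore would call `import_table "$TABLE" "$BACKUP_DIR"` as a separate, later step. Because the cluster stays up during the export, edits written while the job runs may not be captured.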