HBASE-15646 Add some docs about exporting and importing snapshots using S3

parent a050e1d9f8
commit 92f5595e7e
@@ -1111,6 +1111,37 @@ Only a subset of all configurations can currently be changed in the running server.
Here is an incomplete list: `hbase.regionserver.thread.compaction.large`, `hbase.regionserver.thread.compaction.small`, `hbase.regionserver.thread.split`, `hbase.regionserver.thread.merge`, as well as compaction policy and configurations and adjustment to offpeak hours.
For the full list consult the patch attached to link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting Online Config Change from 89-fb].

[[amazon_s3_configuration]]
== Using Amazon S3 Storage

HBase is designed to be tightly coupled with HDFS, and testing of other filesystems
has not been thorough.

The following limitations have been reported:

- RegionServers should be deployed in Amazon EC2 to mitigate latency and bandwidth
limitations when accessing the filesystem, and RegionServers must remain available
to preserve data locality.
- S3 writes each inbound and outbound file to disk, which adds overhead to each operation.
- The best performance is achieved when all clients and servers are in the Amazon
cloud, rather than a heterogeneous architecture.
- You must be aware of the location of `hadoop.tmp.dir` so that the local `/tmp/`
directory is not filled to capacity.
- HBase has a different file usage pattern than MapReduce jobs and has been optimized for
HDFS, rather than distant networked storage.
- The `s3a://` protocol is strongly recommended. The `s3n://` and `s3://` protocols have serious
limitations and do not use the Amazon AWS SDK. The `s3a://` protocol is supported
for use with HBase if you use Hadoop 2.6.1 or higher with HBase 1.2 or higher. Hadoop
2.6.0 is not supported with HBase at all.
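
As a quick sanity check that the `s3a://` filesystem is reachable from a given host,
you can list the bucket with the standard Hadoop shell. This is a minimal sketch; the
bucket name is a placeholder, and in practice you would normally set `fs.s3a.access.key`
and `fs.s3a.secret.key` in `core-site.xml` rather than passing them on the command line:

----
$ hadoop fs -Dfs.s3a.access.key=<access-key> \
-Dfs.s3a.secret.key=<secret-key> \
-ls s3a://<bucket>/
----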

Configuration details for Amazon S3 and associated Amazon services such as EMR are
out of the scope of the HBase documentation. See the
link:https://wiki.apache.org/hadoop/AmazonS3[Hadoop Wiki entry on Amazon S3 Storage]
and
link:http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase.html[Amazon's documentation for deploying HBase in EMR].

One use case that is well-suited for Amazon S3 is storing snapshots. See <<snapshots_s3>>.

ifdef::backend-docbook[]
[index]
== Index

@@ -2050,6 +2050,74 @@ The following example limits the above example to 200 MB/sec.
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200
----

[[snapshots_s3]]
=== Storing Snapshots in an Amazon S3 Bucket

For general information and limitations of using Amazon S3 storage with HBase, see
<<amazon_s3_configuration>>. You can also store and retrieve snapshots from Amazon
S3, using the following procedure.

NOTE: You can also store snapshots in Microsoft Azure Blob Storage. See <<snapshots_azure>>.

.Prerequisites
- You must be using HBase 1.0 or higher and Hadoop 2.6.1 or higher, which is the first
configuration that uses the Amazon AWS SDK.
- You must use the `s3a://` protocol to connect to Amazon S3. The older `s3n://`
and `s3://` protocols have various limitations and do not use the Amazon AWS SDK.
- The `s3a://` URI must be configured and available on the server where you run
the commands to export and restore the snapshot.

After you have fulfilled the prerequisites, take the snapshot like you normally would.
Afterward, you can export it using the `org.apache.hadoop.hbase.snapshot.ExportSnapshot`
command like the one below, substituting your own `s3a://` path in the `copy-from`
or `copy-to` directive and substituting or modifying other options as required:

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from hdfs://srv2:8082/hbase \
-copy-to s3a://<bucket>/<namespace>/hbase \
-chuser MyUser \
-chgroup MyGroup \
-chmod 700 \
-mappers 16
----

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from s3a://<bucket>/<namespace>/hbase \
-copy-to hdfs://srv2:8082/hbase \
-chuser MyUser \
-chgroup MyGroup \
-chmod 700 \
-mappers 16
----
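
Once a snapshot has been exported back into the cluster's HDFS, it can be restored with
the usual HBase shell commands. A minimal sketch, assuming a hypothetical table named
`mytable` (a table must be disabled before it can be restored):

----
hbase> disable 'mytable'
hbase> restore_snapshot 'MySnapshot'
hbase> enable 'mytable'
----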

You can also use the `org.apache.hadoop.hbase.snapshot.SnapshotInfo` utility with the `s3a://` path by including the
`-remote-dir` option.

----
$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
-remote-dir s3a://<bucket>/<namespace>/hbase \
-list-snapshots
----
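
To inspect a single snapshot rather than list them all, the `-snapshot` option can be
combined with `-remote-dir`. A sketch, reusing the snapshot name from the examples above:

----
$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
-remote-dir s3a://<bucket>/<namespace>/hbase \
-snapshot MySnapshot
----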

[[snapshots_azure]]
=== Storing Snapshots in Microsoft Azure Blob Storage

You can store snapshots in Microsoft Azure Blob Storage using the same techniques
as in <<snapshots_s3>>.

.Prerequisites
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
higher. No version of HBase supports Hadoop 2.7.0.
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.

After you meet the prerequisites, follow the instructions
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
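
For example, the first export command from <<snapshots_s3>> might look like the following
with a `wasb://` URI. This is a sketch only; the container and storage account names are
placeholders:

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from hdfs://srv2:8082/hbase \
-copy-to wasb://<container>@<account>.blob.core.windows.net/hbase \
-mappers 16
----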

[[ops.capacity]]
== Capacity Planning and Region Sizing