HBASE-15646 Add some docs about exporting and importing snapshots using S3

This commit is contained in:
Misty Stanley-Jones 2016-04-13 12:14:29 -07:00
parent a050e1d9f8
commit 92f5595e7e
2 changed files with 99 additions and 0 deletions

@ -1111,6 +1111,37 @@ Only a subset of all configurations can currently be changed in the running serv
Here is an incomplete list: `hbase.regionserver.thread.compaction.large`, `hbase.regionserver.thread.compaction.small`, `hbase.regionserver.thread.split`, `hbase.regionserver.thread.merge`, as well as compaction policy and configurations and adjustment to offpeak hours.
For the full list consult the patch attached to link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting Online Config Change from 89-fb].
[[amazon_s3_configuration]]
== Using Amazon S3 Storage
HBase is designed to be tightly coupled with HDFS, and testing of other filesystems
has not been thorough.
The following limitations have been reported:
- RegionServers should be deployed in Amazon EC2 to mitigate latency and bandwidth
limitations when accessing the filesystem, and RegionServers must remain available
to preserve data locality.
- S3 writes each inbound and outbound file to disk, which adds overhead to each operation.
- The best performance is achieved when all clients and servers are in the Amazon
cloud, rather than a heterogeneous architecture.
- You must be aware of the location of `hadoop.tmp.dir` so that the local `/tmp/`
directory is not filled to capacity.
- HBase has a different file usage pattern than MapReduce jobs and has been optimized for
HDFS, rather than distant networked storage.
- The `s3a://` protocol is strongly recommended. The `s3n://` and `s3://` protocols have serious
limitations and do not use the Amazon AWS SDK. The `s3a://` protocol is supported
for use with HBase if you use Hadoop 2.6.1 or higher with HBase 1.2 or higher. Hadoop
2.6.0 is not supported with HBase at all.
Configuration details for Amazon S3 and associated Amazon services such as EMR are
out of the scope of the HBase documentation. See the
link:https://wiki.apache.org/hadoop/AmazonS3[Hadoop Wiki entry on Amazon S3 Storage]
and
link:http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase.html[Amazon's documentation for deploying HBase in EMR].
One use case that is well-suited for Amazon S3 is storing snapshots. See <<snapshots_s3>>.
ifdef::backend-docbook[]
[index]
== Index

@ -2050,6 +2050,74 @@ The following example limits the above example to 200 MB/sec.
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200
----
[[snapshots_s3]]
=== Storing Snapshots in an Amazon S3 Bucket
For general information and limitations of using Amazon S3 storage with HBase, see
<<amazon_s3_configuration>>. You can also store and retrieve snapshots from Amazon
S3, using the following procedure.
NOTE: You can also store snapshots in Microsoft Azure Blob Storage. See <<snapshots_azure>>.
.Prerequisites
- You must be using HBase 1.0 or higher and Hadoop 2.6.1 or higher, which is the first
configuration that uses the Amazon AWS SDK.
- You must use the `s3a://` protocol to connect to Amazon S3. The older `s3n://`
and `s3://` protocols have various limitations and do not use the Amazon AWS SDK.
- The `s3a://` URI must be configured and available on the server where you run
the commands to export and restore the snapshot.
After you have fulfilled the prerequisites, take the snapshot as you normally would.
Afterward, you can export it using the `org.apache.hadoop.hbase.snapshot.ExportSnapshot`
utility with commands like the ones below, substituting your own `s3a://` path in the `-copy-from`
or `-copy-to` option and substituting or modifying other options as required.
The first command copies a snapshot from HDFS to Amazon S3, and the second copies it
from Amazon S3 back to HDFS:
----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from hdfs://srv2:8082/hbase \
-copy-to s3a://<bucket>/<namespace>/hbase \
-chuser MyUser \
-chgroup MyGroup \
-chmod 700 \
-mappers 16
----
----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from s3a://<bucket>/<namespace>/hbase \
-copy-to hdfs://srv2:8082/hbase \
-chuser MyUser \
-chgroup MyGroup \
-chmod 700 \
-mappers 16
----
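The `s3a://` prerequisite can be checked before an export starts. The following is a minimal sketch and is not part of HBase: the function name `export_snapshot_to_s3`, the `DRY_RUN` variable, and the bucket path are hypothetical. It rejects any destination that does not use `s3a://` and, by default, only prints the `ExportSnapshot` command so that it can be reviewed first. The optional `-copy-from`, `-chuser`, `-chgroup`, and `-chmod` options are left out for brevity.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HBase): export a snapshot to S3,
# refusing any destination scheme other than s3a://.
# DRY_RUN=1 (the default here) prints the command instead of running it.
export_snapshot_to_s3() {
  snapshot="$1"
  dest="$2"
  case "$dest" in
    s3a://*) ;;  # accepted scheme
    *) echo "error: destination must use s3a://, got: $dest" >&2
       return 1 ;;
  esac
  # Build the command as positional parameters so arguments stay intact.
  set -- hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$snapshot" -copy-to "$dest" -mappers 16
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Example bucket and path are placeholders; substitute your own.
export_snapshot_to_s3 MySnapshot "s3a://my-bucket/my-namespace/hbase"
```

Set `DRY_RUN=0` in the environment to run the command instead of printing it.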
You can also use the `org.apache.hadoop.hbase.snapshot.SnapshotInfo` utility with the `s3a://` path by including the
`-remote-dir` option.
----
$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
-remote-dir s3a://<bucket>/<namespace>/hbase \
-list-snapshots
----
[[snapshots_azure]]
=== Storing Snapshots in Microsoft Azure Blob Storage
You can store snapshots in Microsoft Azure Blob Storage using the same techniques
as in <<snapshots_s3>>.
.Prerequisites
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
higher. No version of HBase supports Hadoop 2.7.0.
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
After you meet the prerequisites, follow the instructions
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
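As an illustration of that substitution, the sketch below composes an export command for an Azure destination. The container name, storage account name, and path are placeholder assumptions. The URI follows the `wasb://<container>@<account>.blob.core.windows.net/<path>` form described in the Hadoop Azure documentation. The command is printed rather than executed so that it can be reviewed first.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own container,
# storage account, and destination path.
CONTAINER="my-container"
ACCOUNT="myaccount"
DEST="wasb://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/hbase"

# Print the export command for review; remove the leading "echo" to run it.
echo hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot MySnapshot \
  -copy-to "$DEST" \
  -mappers 16
```

Use `wasbs://` in place of `wasb://` to access the storage account over TLS.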
[[ops.capacity]]
== Capacity Planning and Region Sizing