HBASE-15646 Add some docs about exporting and importing snapshots using S3
parent a050e1d9f8
commit 92f5595e7e
@@ -1111,6 +1111,37 @@ Only a subset of all configurations can currently be changed in the running server.
Here is an incomplete list: `hbase.regionserver.thread.compaction.large`, `hbase.regionserver.thread.compaction.small`, `hbase.regionserver.thread.split`, `hbase.regionserver.thread.merge`, as well as compaction policy and configuration, and adjustment of off-peak hours.
For the full list consult the patch attached to link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting Online Config Change from 89-fb].

[[amazon_s3_configuration]]
== Using Amazon S3 Storage

HBase is designed to be tightly coupled with HDFS, and testing of other filesystems
has not been thorough.

The following limitations have been reported:

- RegionServers should be deployed in Amazon EC2 to mitigate latency and bandwidth
limitations when accessing the filesystem, and RegionServers must remain available
to preserve data locality.
- S3 writes each inbound and outbound file to disk, which adds overhead to each operation.
- The best performance is achieved when all clients and servers are in the Amazon
cloud, rather than a heterogeneous architecture.
- You must be aware of the location of `hadoop.tmp.dir` so that the local `/tmp/`
directory is not filled to capacity.
- HBase has a different file usage pattern than MapReduce jobs and has been optimized for
HDFS, rather than distant networked storage.
- The `s3a://` protocol is strongly recommended. The `s3n://` and `s3://` protocols have serious
limitations and do not use the Amazon AWS SDK. The `s3a://` protocol is supported
for use with HBase if you use Hadoop 2.6.1 or higher with HBase 1.2 or higher. Hadoop
2.6.0 is not supported with HBase at all. A quick way to check that `s3a://` access is
working is sketched after this list.

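As an aside, one simple way to confirm that a host can reach a bucket through the
`s3a://` connector, assuming the Hadoop `hadoop-aws` module and your S3 credentials are
already configured, is to list the bucket with the standard Hadoop filesystem shell (the
bucket name is a placeholder):

----
$ hadoop fs -ls s3a://<bucket>/
----
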
Configuration details for Amazon S3 and associated Amazon services such as EMR are
out of the scope of the HBase documentation. See the
link:https://wiki.apache.org/hadoop/AmazonS3[Hadoop Wiki entry on Amazon S3 Storage]
and
link:http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase.html[Amazon's documentation for deploying HBase in EMR].

One use case that is well-suited for Amazon S3 is storing snapshots. See <<snapshots_s3>>.

ifdef::backend-docbook[]
[index]
== Index

@@ -2050,6 +2050,74 @@ The following example limits the above example to 200 MB/sec.
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200
----

[[snapshots_s3]]
=== Storing Snapshots in an Amazon S3 Bucket

For general information and limitations of using Amazon S3 storage with HBase, see
<<amazon_s3_configuration>>. You can also store and retrieve snapshots from Amazon
S3, using the following procedure.

NOTE: You can also store snapshots in Microsoft Azure Blob Storage. See <<snapshots_azure>>.

.Prerequisites
- You must be using HBase 1.0 or higher and Hadoop 2.6.1 or higher, which is the first
configuration that uses the Amazon AWS SDK.
- You must use the `s3a://` protocol to connect to Amazon S3. The older `s3n://`
and `s3://` protocols have various limitations and do not use the Amazon AWS SDK.
- The `s3a://` URI must be configured and available on the server where you run
the commands to export and restore the snapshot. A minimal configuration sketch is
shown after this list.
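
A minimal sketch of the last prerequisite, assuming credentials are supplied directly in
the configuration: the Hadoop S3A connector reads the following properties from
`core-site.xml` (or `hbase-site.xml`); the values shown are placeholders. Other S3A
credential mechanisms, such as environment variables or IAM instance profiles, also work;
see the Hadoop S3A documentation.

[source,xml]
----
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
----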

After you have fulfilled the prerequisites, take the snapshot like you normally would,
for example from the HBase shell, as sketched below. Afterward, you can export it using
the `org.apache.hadoop.hbase.snapshot.ExportSnapshot` command, substituting your own
`s3a://` path in the `copy-from` or `copy-to` directive and substituting or modifying
other options as required.

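A minimal example of taking the snapshot from the HBase shell, where the table name
`mytable` is a placeholder for your own table:

----
hbase> snapshot 'mytable', 'MySnapshot'
----

The export to Amazon S3 then looks like the following:
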
----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot \
    -copy-from hdfs://srv2:8082/hbase \
    -copy-to s3a://<bucket>/<namespace>/hbase \
    -chuser MyUser \
    -chgroup MyGroup \
    -chmod 700 \
    -mappers 16
----

To copy the snapshot back from Amazon S3 into HDFS, reverse the `-copy-from` and
`-copy-to` arguments:

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot \
    -copy-from s3a://<bucket>/<namespace>/hbase \
    -copy-to hdfs://srv2:8082/hbase \
    -chuser MyUser \
    -chgroup MyGroup \
    -chmod 700 \
    -mappers 16
----

You can also use the `org.apache.hadoop.hbase.snapshot.SnapshotInfo` utility with the `s3a://` path by including the
`-remote-dir` option.

----
$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
    -remote-dir s3a://<bucket>/<namespace>/hbase \
    -list-snapshots
----

[[snapshots_azure]]
=== Storing Snapshots in Microsoft Azure Blob Storage

You can store snapshots in Microsoft Azure Blob Storage using the same techniques
as in <<snapshots_s3>>.

.Prerequisites
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
higher. No version of HBase supports Hadoop 2.7.0.
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html. A minimal
configuration sketch is shown after this list.
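
A minimal configuration sketch, assuming the `hadoop-azure` module and its Azure storage
dependencies are on the classpath: the storage account key can be supplied in
`core-site.xml` (or `hbase-site.xml`). The account name and key below are placeholders.

[source,xml]
----
<property>
  <name>fs.azure.account.key.YOUR_ACCOUNT_NAME.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_KEY</value>
</property>
----
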
After you meet the prerequisites, follow the instructions
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.

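For example, an export of the same snapshot to Azure might look like the following
sketch; the container and account names are placeholders, and the URI form follows the
Hadoop Azure documentation linked above:

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot \
    -copy-from hdfs://srv2:8082/hbase \
    -copy-to wasbs://<container>@<account>.blob.core.windows.net/hbase \
    -mappers 16
----
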
[[ops.capacity]]
== Capacity Planning and Region Sizing