HBASE-15646 Add some docs about exporting and importing snapshots using S3

parent a050e1d9f8
commit 92f5595e7e
@@ -1111,6 +1111,37 @@ Only a subset of all configurations can currently be changed in the running server.
Here is an incomplete list: `hbase.regionserver.thread.compaction.large`, `hbase.regionserver.thread.compaction.small`, `hbase.regionserver.thread.split`, `hbase.regionserver.thread.merge`, as well as compaction policy and configurations and adjustment to offpeak hours.
For the full list consult the patch attached to link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting Online Config Change from 89-fb].

[[amazon_s3_configuration]]
== Using Amazon S3 Storage

HBase is designed to be tightly coupled with HDFS, and testing of other filesystems
has not been thorough.

The following limitations have been reported:

- RegionServers should be deployed in Amazon EC2 to mitigate latency and bandwidth
limitations when accessing the filesystem, and RegionServers must remain available
to preserve data locality.
- S3 writes each inbound and outbound file to disk, which adds overhead to each operation.
- The best performance is achieved when all clients and servers are in the Amazon
cloud, rather than a heterogeneous architecture.
- You must be aware of the location of `hadoop.tmp.dir` so that the local `/tmp/`
directory is not filled to capacity.
- HBase has a different file usage pattern than MapReduce jobs and has been optimized for
HDFS, rather than distant networked storage.
- The `s3a://` protocol is strongly recommended. The `s3n://` and `s3://` protocols have serious
limitations and do not use the Amazon AWS SDK. The `s3a://` protocol is supported
for use with HBase if you use Hadoop 2.6.1 or higher with HBase 1.2 or higher. Hadoop
2.6.0 is not supported with HBase at all.
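
As a quick sanity check that the `s3a://` filesystem is reachable from a given host,
you can list the bucket with the standard Hadoop shell. This is a minimal sketch; the
bucket name is a placeholder, and in practice you would normally set `fs.s3a.access.key`
and `fs.s3a.secret.key` in `core-site.xml` rather than passing them on the command line:

----
$ hadoop fs -Dfs.s3a.access.key=<access-key> \
-Dfs.s3a.secret.key=<secret-key> \
-ls s3a://<bucket>/
----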

Configuration details for Amazon S3 and associated Amazon services such as EMR are
out of the scope of the HBase documentation. See the
link:https://wiki.apache.org/hadoop/AmazonS3[Hadoop Wiki entry on Amazon S3 Storage]
and
link:http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase.html[Amazon's documentation for deploying HBase in EMR].

One use case that is well-suited for Amazon S3 is storing snapshots. See <<snapshots_s3>>.

ifdef::backend-docbook[]
[index]
== Index

@@ -2050,6 +2050,74 @@ The following example limits the above example to 200 MB/sec.
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200
----

[[snapshots_s3]]
=== Storing Snapshots in an Amazon S3 Bucket

For general information and limitations of using Amazon S3 storage with HBase, see
<<amazon_s3_configuration>>. You can also store and retrieve snapshots from Amazon
S3, using the following procedure.

NOTE: You can also store snapshots in Microsoft Azure Blob Storage. See <<snapshots_azure>>.

.Prerequisites
- You must be using HBase 1.0 or higher and Hadoop 2.6.1 or higher, which is the first
configuration that uses the Amazon AWS SDK.
- You must use the `s3a://` protocol to connect to Amazon S3. The older `s3n://`
and `s3://` protocols have various limitations and do not use the Amazon AWS SDK.
- The `s3a://` URI must be configured and available on the server where you run
the commands to export and restore the snapshot.

After you have fulfilled the prerequisites, take the snapshot like you normally would.
Afterward, you can export it using the `org.apache.hadoop.hbase.snapshot.ExportSnapshot`
command like the one below, substituting your own `s3a://` path in the `copy-from`
or `copy-to` directive and substituting or modifying other options as required:

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from hdfs://srv2:8082/hbase \
-copy-to s3a://<bucket>/<namespace>/hbase \
-chuser MyUser \
-chgroup MyGroup \
-chmod 700 \
-mappers 16
----

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from s3a://<bucket>/<namespace>/hbase \
-copy-to hdfs://srv2:8082/hbase \
-chuser MyUser \
-chgroup MyGroup \
-chmod 700 \
-mappers 16
----
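
Once a snapshot has been exported back into the cluster's HDFS, it can be restored with
the usual HBase shell commands. A minimal sketch, assuming a hypothetical table named
`mytable` (a table must be disabled before it can be restored):

----
hbase> disable 'mytable'
hbase> restore_snapshot 'MySnapshot'
hbase> enable 'mytable'
----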

You can also use the `org.apache.hadoop.hbase.snapshot.SnapshotInfo` utility with the `s3a://` path by including the
`-remote-dir` option.

----
$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
-remote-dir s3a://<bucket>/<namespace>/hbase \
-list-snapshots
----
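
To inspect a single snapshot rather than list them all, the `-snapshot` option can be
combined with `-remote-dir`. A sketch, reusing the snapshot name from the examples above:

----
$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
-remote-dir s3a://<bucket>/<namespace>/hbase \
-snapshot MySnapshot
----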

[[snapshots_azure]]
=== Storing Snapshots in Microsoft Azure Blob Storage

You can store snapshots in Microsoft Azure Blob Storage using the same techniques
as in <<snapshots_s3>>.

.Prerequisites
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
higher. No version of HBase supports Hadoop 2.7.0.
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.

After you meet the prerequisites, follow the instructions
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
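
For example, the first export command from <<snapshots_s3>> might look like the following
with a `wasb://` URI. This is a sketch only; the container and storage account names are
placeholders:

----
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
-snapshot MySnapshot \
-copy-from hdfs://srv2:8082/hbase \
-copy-to wasb://<container>@<account>.blob.core.windows.net/hbase \
-mappers 16
----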

[[ops.capacity]]
== Capacity Planning and Region Sizing