diff --git a/src/main/asciidoc/_chapters/backup_restore.adoc b/src/main/asciidoc/_chapters/backup_restore.adoc new file mode 100644 index 00000000000..a9dbcf5ec12 --- /dev/null +++ b/src/main/asciidoc/_chapters/backup_restore.adoc @@ -0,0 +1,912 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[casestudies]] += Backup and Restore +:doctype: book +:numbered: +:toc: left +:icons: font +:experimental: + +[[br.overview]] +== Overview + +Backup and restore is a standard operation provided by many databases. An effective backup and restore +strategy helps ensure that users can recover data in case of unexpected failures. The HBase backup and restore +feature helps ensure that enterprises using HBase as a canonical data repository can recover from catastrophic +failures. Another important feature is the ability to restore the database to a particular +point-in-time, commonly referred to as a snapshot. + +The HBase backup and restore feature provides the ability to create full backups and incremental backups on +tables in an HBase cluster. The full backup is the foundation on which incremental backups are applied +to build iterative snapshots. 
Incremental backups can be run on a schedule to capture changes over time, +for example by using a Cron task. Incremental backups are more cost-effective than full backups because they only capture +the changes since the last backup, and they enable administrators to restore the database to any prior incremental backup. Furthermore, the +utilities enable table-level data backup and recovery if you do not want to restore the entire dataset +of the backup. + +The backup and restore feature supplements the HBase Replication feature. While HBase replication is ideal for +creating "hot" copies of the data (where the replicated data is immediately available for query), the backup and +restore feature is ideal for creating "cold" copies of data (where a manual step must be taken to restore the system). +Previously, users only had the ability to create full backups via the ExportSnapshot functionality. The incremental +backup implementation is the novel improvement over the previous "art" provided by ExportSnapshot. + +[[br.terminology]] +== Terminology + +The backup and restore feature introduces new terminology which can be used to understand how control flows through the +system. + +* _A backup_: A logical unit of data and metadata which can restore a table to its state at a specific point in time. +* _Full backup_: A type of backup which wholly encapsulates the contents of the table at a point in time. +* _Incremental backup_: A type of backup which contains the changes in a table since the previous backup. +* _Backup set_: A user-defined name which references one or more tables over which a backup can be executed. +* _Backup ID_: A unique name which identifies one backup from the rest, e.g. `backupId_1467823988425` + +[[br.planning]] +== Planning + +There are some common strategies which can be used to implement backup and restore in your environment. The following sections +show how these strategies are implemented and identify potential tradeoffs with each. 
+ +WARNING: The backup and restore tools have not been tested on Transparent Data Encryption (TDE) enabled HDFS clusters. +This is related to the open issue link:https://issues.apache.org/jira/browse/HBASE-16178[HBASE-16178]. + +[[br.intracluster.backup]] +=== Backup within a cluster + +This strategy stores the backups on the same cluster on which the backup was taken. This approach is only appropriate for testing +as it does not provide any additional safety on top of what the software itself already provides. + +.Intra-Cluster Backup +image::backup-intra-cluster.png[] + +[[br.dedicated.cluster.backup]] +=== Backup using a dedicated cluster + +This strategy provides greater fault tolerance and provides a path towards disaster recovery. In this setting, you will +store the backup on a separate HDFS cluster by supplying the backup destination cluster’s HDFS URL to the backup utility. +You should consider backing up to a different physical location, such as a different data center. + +Typically, a backup-dedicated HDFS cluster uses a more economical hardware profile to save money. + +.Dedicated HDFS Cluster Backup +image::backup-dedicated-cluster.png[] + +[[br.cloud.or.vendor.backup]] +=== Backup to the Cloud or a storage vendor appliance + +Another approach to safeguarding HBase incremental backups is to store the data on provisioned, secure servers that belong +to third-party vendors and that are located off-site. The vendor can be a public cloud provider or a storage vendor who uses +a Hadoop-compatible file system, such as S3 or another HDFS-compatible destination. + +.Backup to Cloud or Vendor Storage Solutions +image::backup-cloud-appliance.png[] + +NOTE: The HBase backup utility does not support backup to multiple destinations. A workaround is to manually create copies +of the backup files from HDFS or S3. 
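The manual copy mentioned in the note above can be made with any tool that can read the backup destination. The following is a minimal sketch, assuming Hadoop's `hadoop distcp` tool is available and that both URIs are reachable from the cluster. The two locations and the `mirror_backup_cmd` helper are hypothetical, and the helper only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical locations: the primary backup root and a second copy of it.
PRIMARY_ROOT="hdfs://host5:8020/data/backup"
SECONDARY_ROOT="s3a://example-bucket/hbase-backup-copy"   # assumed bucket name

# Print (do not run) a DistCp command that mirrors one backup root to another.
# -update copies only files that are missing or differ at the destination.
mirror_backup_cmd() {
  printf 'hadoop distcp -update %s %s\n' "$1" "$2"
}

mirror_backup_cmd "$PRIMARY_ROOT" "$SECONDARY_ROOT"
```

Once the printed command looks right, run it as a user who can read the source and write to the destination. Running the copy after each backup keeps the second location current.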
+ +[[br.initial.setup]] +== First-time configuration steps + +This section contains the necessary configuration changes that must be made in order to use the backup and restore feature. +As this feature makes significant use of YARN's MapReduce framework to parallelize these I/O heavy operations, configuration +changes extend outside of just `hbase-site.xml`. + +=== Allow the "hbase" system user in YARN + +The YARN *container-executor.cfg* configuration file must have the following property setting: _allowed.system.users=hbase_. No spaces +are allowed in entries of this configuration file. + +WARNING: Skipping this step will result in runtime errors when executing the first backup tasks. + +*Example of a valid container-executor.cfg file for backup and restore:* + +[source] +---- +yarn.nodemanager.log-dirs=/var/log/hadoop/mapred +yarn.nodemanager.linux-container-executor.group=yarn +banned.users=hdfs,yarn,mapred,bin +allowed.system.users=hbase +min.user.id=500 +---- + +=== HBase-specific changes + +Add the following properties to hbase-site.xml and restart HBase if it is already running. + +NOTE: The ",..." is an ellipsis meant to imply that this is a comma-separated list of values, not literal text which should be added to hbase-site.xml. + +[source] +---- +<property> + <name>hbase.backup.enable</name> + <value>true</value> +</property> +<property> + <name>hbase.master.logcleaner.plugins</name> + <value>org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...</value> +</property> +<property> + <name>hbase.procedure.master.classes</name> + <value>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager,...</value> +</property> +<property> + <name>hbase.procedure.regionserver.classes</name> + <value>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager,...</value> +</property> +<property> + <name>hbase.coprocessor.region.classes</name> + <value>org.apache.hadoop.hbase.backup.BackupObserver,...</value> +</property> +<property> + <name>hbase.master.hfilecleaner.plugins</name> + <value>org.apache.hadoop.hbase.backup.BackupHFileCleaner,...</value> +</property> +---- + +== Backup and Restore commands + +This section covers the command-line utilities that administrators would run to create, restore, and merge backups. 
Tools to +inspect details on specific backup sessions are covered in the next section, <>. + +Run the command `hbase backup help ` to access the online help that provides basic information about a command +and its options. The information below is captured in this help message for each command. + +// hbase backup create + +[[br.creating.complete.backup]] +### Creating a Backup Image + +[NOTE] +==== +For HBase clusters also using Apache Phoenix: include the SQL system catalog tables in the backup. In the event that you +need to restore the HBase backup, access to the system catalog tables enables you to resume Phoenix interoperability with the +restored data. +==== + +The first step in running the backup and restore utilities is to perform a full backup and to store the data in a separate image +from the source. At a minimum, you must do this to get a baseline before you can rely on incremental backups. + +Run the following command as the HBase superuser: + +[source] +---- +hbase backup create +---- + +After the command finishes running, the console prints a SUCCESS or FAILURE status message. The SUCCESS message includes a _backup_ ID. +The backup ID is the Unix time (also known as Epoch time) at which the HBase master received the backup request from the client. + +[TIP] +==== +Record the backup ID that appears at the end of a successful backup. In case the source cluster fails and you need to recover the +dataset with a restore operation, having the backup ID readily available can save time. +==== + +[[br.create.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_type_:: + The type of backup to execute: _full_ or _incremental_. As a reminder, an _incremental_ backup requires a _full_ backup to + already exist. + +_backup_path_:: + The _backup_path_ argument specifies the full filesystem URI of where to store the backup image. Valid prefixes are + _hdfs:_, _webhdfs:_, _gpfs:_, and _s3fs:_. 
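As an illustration of how the two positional arguments combine, the sketch below builds a full backup command and a later incremental one against the same destination. The destination URI, the table names, and the `backup_create_cmd` helper are placeholders chosen for this example, and the helper only prints the commands instead of running them. The `-t` option it uses is described under the named arguments.

```shell
#!/bin/sh
BACKUP_ROOT="hdfs://host5:8020/data/backup"   # placeholder destination
TABLES="SALES2,SALES3"                        # placeholder table list

# Print an `hbase backup create` command for the given type.
# Rejects anything other than "full" or "incremental".
backup_create_cmd() {
  case "$1" in
    full|incremental) ;;
    *) echo "type must be 'full' or 'incremental'" >&2; return 1 ;;
  esac
  printf 'hbase backup create %s %s -t %s\n' "$1" "$2" "$3"
}

backup_create_cmd full "$BACKUP_ROOT" "$TABLES"          # baseline first
backup_create_cmd incremental "$BACKUP_ROOT" "$TABLES"   # then the changes
```

Both commands point at the same _backup_path_, so the incremental image lands next to the full image it depends on.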
+ +[[br.create.named.cli.arguments]] +#### Named Command-Line Arguments + +_-t _:: + A comma-separated list of tables to back up. If no tables are specified, all tables are backed up. No regular-expression or + wildcard support is present; all table names must be explicitly listed. See <> for more + information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these + named options is required. + +_-s _:: + Identify tables to back up based on a backup set. See <> for the purpose and usage + of backup sets. Mutually exclusive with the _-t_ option. + +_-w _:: + (Optional) Specifies the number of parallel workers to copy data to the backup destination. Backups are currently executed by MapReduce jobs + so this value corresponds to the number of Mappers that will be spawned by the job. + +_-b _:: + (Optional) Specifies the bandwidth of each worker in MB per second. + +_-d_:: + (Optional) Enables "DEBUG" mode which prints additional logging about the backup creation. + +_-q _:: + (Optional) Allows specification of the name of a YARN queue which the MapReduce job to create the backup should be executed in. This option + is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance. + +[[br.usage.examples]] +#### Example usage + +[source] +---- +$ hbase backup create full hdfs://host5:8020/data/backup -t SALES2,SALES3 -w 3 +---- + +This command creates a full backup image of two tables, SALES2 and SALES3, in the HDFS instance whose NameNode is host5:8020 +in the path _/data/backup_. The _-w_ option specifies that no more than three parallel workers complete the operation. + +// hbase backup restore + +[[br.restoring.backup]] +### Restoring a Backup Image + +Run the following command as an HBase superuser. You can only restore a backup on a running HBase cluster because the data must be +redistributed to the RegionServers for the operation to complete successfully. 
+ +[source] +---- +hbase restore +---- + +[[br.restore.positional.args]] +#### Positional Command-Line Arguments + +_backup_path_:: + The _backup_path_ argument specifies the full filesystem URI of where the backup image is stored. Valid prefixes are + _hdfs:_, _webhdfs:_, _gpfs:_, and _s3fs:_. + +_backup_id_:: + The backup ID that uniquely identifies the backup image to be restored. + + +[[br.restore.named.args]] +#### Named Command-Line Arguments + +_-t _:: + A comma-separated list of tables to restore. See <> for more + information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these + named options is required. + +_-s _:: + Identify tables to restore based on a backup set. See <> for the purpose and usage + of backup sets. Mutually exclusive with the _-t_ option. + +_-q _:: + (Optional) Allows specification of the name of a YARN queue which the MapReduce job to restore the backup should be executed in. This option + is useful to prevent restore tasks from stealing resources away from other MapReduce jobs of high importance. + +_-c_:: + (Optional) Perform a dry-run of the restore. The actions are checked, but not executed. + +_-m _:: + (Optional) A comma-separated list of tables to restore into. If this option is not provided, the original table name is used. When + this option is provided, there must be an equal number of entries provided in the `-t` option. + +_-o_:: + (Optional) Overwrites the target table for the restore if the table already exists. + + +[[br.restore.usage]] +#### Example of Usage + +[source] +---- +hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2 +---- + +This command restores two tables of an incremental backup image. In this example: +• `/tmp/backup_incremental` is the path to the directory containing the backup image. +• `backupId_1467823988425` is the backup ID. 
+• `mytable1` and `mytable2` are the names of tables in the backup image to be restored. + +// hbase backup merge + +[[br.merge.backup]] +### Merging Incremental Backup Images + +This command can be used to merge two or more incremental backup images into a single incremental +backup image. This can be used to consolidate multiple, small incremental backup images into a single +larger incremental backup image. This command could be used to merge hourly incremental backups +into a daily incremental backup image, or daily incremental backups into a weekly incremental backup. + +[source] +---- +$ hbase backup merge +---- + +[[br.merge.backup.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_backup_ids_:: + A comma-separated list of incremental backup image IDs that are to be combined into a single image. + +[[br.merge.backup.named.cli.arguments]] +#### Named Command-Line Arguments + +None. + +[[br.merge.backup.example]] +#### Example usage + +[source] +---- +$ hbase backup merge backupId_1467823988425,backupId_1467827588425 +---- + +// hbase backup set + +[[br.using.backup.sets]] +### Using Backup Sets + +Backup sets can ease the administration of HBase data backups and restores by reducing the amount of repetitive input +of table names. You can group tables into a named backup set with the `hbase backup set add` command. You can then use +the -set option to invoke the name of a backup set in the `hbase backup create` or `hbase backup restore` rather than list +individually every table in the group. You can have multiple backup sets. + +NOTE: Note the differentiation between the `hbase backup set add` command and the _-set_ option. The `hbase backup set add` +command must be run before using the `-set` option in a different command because backup sets must be named and defined +before using backup sets as a shortcut. 
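The order of operations described in the note above can be sketched as follows: the set is defined first, and only then referenced by name. This sketch assumes the set is referenced with the `-s` option listed under the named arguments of `hbase backup create`. The set name, table names, destination URI, and both helper functions are placeholders, and the helpers only print the commands.

```shell
#!/bin/sh
SET_NAME="Q1Data"                              # placeholder set name
BACKUP_ROOT="hdfs://host5:8020/data/backup"    # placeholder destination

# Step 1: print the command that defines (or extends) the backup set.
define_set_cmd() {
  printf 'hbase backup set add %s %s\n' "$1" "$2"
}

# Step 2: print a backup command that refers to the set by name with -s.
backup_set_cmd() {
  printf 'hbase backup create %s %s -s %s\n' "$1" "$2" "$3"
}

define_set_cmd "$SET_NAME" "TEAM_3,TEAM_4"
backup_set_cmd full "$BACKUP_ROOT" "$SET_NAME"
```

Running the second command before the first would refer to a set that does not exist yet, which is the situation the note warns about.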
+ +If you run the `hbase backup set add` command and specify a backup set name that does not yet exist on your system, a new set +is created. If you run the command with the name of an existing backup set name, then the tables that you specify are added +to the set. + +In this command, the backup set name is case-sensitive. + +NOTE: The metadata of backup sets are stored within HBase. If you do not have access to the original HBase cluster with the +backup set metadata, then you must specify individual table names to restore the data. + +To create a backup set, run the following command as the HBase superuser: + +[source] +---- +$ hbase backup set +---- + +[[br.set.subcommands]] +#### Backup Set Subcommands + +The following list details subcommands of the hbase backup set command. + +NOTE: You must enter one (and no more than one) of the following subcommands after hbase backup set to complete an operation. +Also, the backup set name is case-sensitive in the command-line utility. + +_add_:: + Adds table[s] to a backup set. Specify a _backup_set_name_ value after this argument to create a backup set. + +_remove_:: + Removes tables from the set. Specify the tables to remove in the tables argument. + +_list_:: + Lists all backup sets. + +_describe_:: + Displays a description of a backup set. The information includes whether the set has full + or incremental backups, start and end times of the backups, and a list of the tables in the set. This subcommand must precede + a valid value for the _backup_set_name_ value. + +_delete_:: + Deletes a backup set. Enter the value for the _backup_set_name_ option directly after the `hbase backup set delete` command. + +[[br.set.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_backup_set_name_:: + Use to assign or invoke a backup set name. The backup set name must contain only printable characters and cannot have any spaces. + +_tables_:: + List of tables (or a single table) to include in the backup set. 
Enter the table names as a comma-separated list. If no tables + are specified, all tables are included in the set. + +TIP: Maintain a log or other record of the case-sensitive backup set names and the corresponding tables in each set on a separate +or remote cluster as part of your backup strategy. This information can help you in case of failure on the primary cluster. + +[[br.set.usage]] +#### Example of Usage + +[source] +---- +$ hbase backup set add Q1Data TEAM_3,TEAM_4 +---- + +Depending on the environment, this command results in _one_ of the following actions: + +* If the `Q1Data` backup set does not exist, a backup set containing tables `TEAM_3` and `TEAM_4` is created. +* If the `Q1Data` backup set exists already, the tables `TEAM_3` and `TEAM_4` are added to the `Q1Data` backup set. + +[[br.administration]] +## Administration of Backup Images + +The `hbase backup` command has several subcommands that help with administering backup images as they accumulate. Most production +environments require recurring backups, so it is necessary to have utilities to help manage the data of the backup repository. +Some subcommands enable you to find information that can help identify backups that are relevant in a search for particular data. +You can also delete backup images. + +The following list details each `hbase backup` subcommand that can help administer backups. Run the full command-subcommand line as +the HBase superuser. + +// hbase backup progress + +[[br.managing.backup.progress]] +### Managing Backup Progress + +You can monitor a running backup in another terminal session by running the _hbase backup progress_ command and specifying the backup ID as an argument. 
+ +For example, run the following command as hbase superuser to view the progress of a backup + +[source] +---- +$ hbase backup progress +---- + +[[br.progress.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_backup_id_:: + Specifies the backup that you want to monitor by seeing the progress information. The backupId is case-sensitive. + +[[br.progress.named.cli.arguments]] +#### Named Command-Line Arguments + +None. + +[[br.progress.example]] +#### Example usage + +[source] +---- +hbase backup progress backupId_1467823988425 +---- + +// hbase backup history + +[[br.managing.backup.history]] +### Managing Backup History + +This command displays a log of backup sessions. The information for each session includes backup ID, type (full or incremental), the tables +in the backup, status, and start and end time. Specify the number of backup sessions to display with the optional -n argument. + +[source] +---- +$ hbase backup history +---- + +[[br.history.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_backup_id_:: + Specifies the backup that you want to monitor by seeing the progress information. The backupId is case-sensitive. + +[[br.history.named.cli.arguments]] +#### Named Command-Line Arguments + +_-n _:: + (Optional) The maximum number of backup records (Default: 10). + +_-p _:: + The full filesystem URI of where backup images are stored. + +_-s _:: + The name of the backup set to obtain history for. Mutually exclusive with the _-t_ option. + +_-t_ :: + The name of table to obtain history for. Mutually exclusive with the _-s_ option. + +[[br.history.backup.example]] +#### Example usage + +[source] +---- +$ hbase backup history +$ hbase backup history -n 20 +$ hbase backup history -t WebIndexRecords +---- + +// hbase backup describe + +[[br.describe.backup]] +### Describing a Backup Image + +This command can be used to obtain information about a specific backup image. 
+ +[source] +---- +$ hbase backup describe +---- + +[[br.describe.backup.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_backup_id_:: + The ID of the backup image to describe. + +[[br.describe.backup.named.cli.arguments]] +#### Named Command-Line Arguments + +None. + +[[br.describe.backup.example]] +#### Example usage + +[source] +---- +$ hbase backup describe backupId_1467823988425 +---- + +// hbase backup delete + +[[br.delete.backup]] +### Deleting a Backup Image + +This command can be used to delete a backup image which is no longer needed. + +[source] +---- +$ hbase backup delete +---- + +[[br.delete.backup.positional.cli.arguments]] +#### Positional Command-Line Arguments + +_backup_id_:: + The ID of the backup image which should be deleted. + +[[br.delete.backup.named.cli.arguments]] +#### Named Command-Line Arguments + +None. + +[[br.delete.backup.example]] +#### Example usage + +[source] +---- +$ hbase backup delete backupId_1467823988425 +---- + +// hbase backup repair + +[[br.repair.backup]] +### Backup Repair Command + +This command attempts to correct any inconsistencies in persisted backup metadata which exist as +the result of software errors or unhandled failure scenarios. While the backup implementation tries +to correct all errors on its own, this tool may be necessary in the cases where the system cannot +automatically recover. + +[source] +---- +$ hbase backup repair +---- + +[[br.repair.backup.positional.cli.arguments]] +#### Positional Command-Line Arguments + +None. + +[[br.repair.backup.named.cli.arguments]] +#### Named Command-Line Arguments + +None. + +[[br.repair.backup.example]] +#### Example usage + +[source] +---- +$ hbase backup repair +---- + +[[br.backup.configuration]] +## Configuration keys + +The backup and restore feature includes both required and optional configuration keys. + +### Required properties + +_hbase.backup.enable_: Controls whether or not the feature is enabled (Default: `false`). 
Set this value to `true`. + +_hbase.master.logcleaner.plugins_: A comma-separated list of classes invoked when cleaning logs in the HBase Master. Set +this value to `org.apache.hadoop.hbase.backup.master.BackupLogCleaner` or append it to the current value. + +_hbase.procedure.master.classes_: A comma-separated list of classes invoked with the Procedure framework in the Master. Set +this value to `org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager` or append it to the current value. + +_hbase.procedure.regionserver.classes_: A comma-separated list of classes invoked with the Procedure framework in the RegionServer. +Set this value to `org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager` or append it to the current value. + +_hbase.coprocessor.region.classes_: A comma-separated list of RegionObservers deployed on tables. Set this value to +`org.apache.hadoop.hbase.backup.BackupObserver` or append it to the current value. + +_hbase.master.hfilecleaner.plugins_: A comma-separated list of HFileCleaners deployed on the Master. Set this value +to `org.apache.hadoop.hbase.backup.BackupHFileCleaner` or append it to the current value. + +### Optional properties + +_hbase.backup.system.ttl_: The time-to-live in seconds of data in the `hbase:backup` tables (default: forever). This property +is only relevant prior to the creation of the `hbase:backup` table. Use the `alter` command in the HBase shell to modify the TTL +when this table already exists. See the <> for more details on the impact of this +configuration property. + +_hbase.backup.attempts.max_: The number of attempts to perform when taking hbase table snapshots (default: 10). + +_hbase.backup.attempts.pause.ms_: The amount of time to wait between failed snapshot attempts in milliseconds (default: 10000). 
+ +_hbase.backup.logroll.timeout.millis_: The amount of time (in milliseconds) to wait for RegionServers to execute a WAL rolling +in the Master's procedure framework (default: 30000). + +[[br.best.practices]] +## Best Practices + +### Formulate a restore strategy and test it. + +Before you rely on a backup and restore strategy for your production environment, identify how backups must be performed, +and more importantly, how restores must be performed. Test the plan to ensure that it is workable. +At a minimum, store backup data from a production cluster on a different cluster or server. To further safeguard the data, +use a backup location that is at a different physical location. + +If you have an unrecoverable loss of data on your primary production cluster as a result of computer system issues, you may +be able to restore the data from a different cluster or server at the same site. However, a disaster that destroys the whole +site renders locally stored backups useless. Consider storing the backup data and necessary resources (both computing capacity +and operator expertise) to restore the data at a site sufficiently remote from the production site. In the case of a catastrophe +at the whole primary site (fire, earthquake, etc.), the remote backup site can be very valuable. + +### Secure a full backup image first. + +As a baseline, you must complete a full backup of HBase data at least once before you can rely on incremental backups. The full +backup should be stored outside of the source cluster. To ensure complete dataset recovery, you must run the restore utility +with the option to restore the baseline full backup. The full backup is the foundation of your dataset. Incremental backup data +is applied on top of the full backup during the restore operation to return you to the point in time when the backup was last taken. + +### Define and use backup sets for groups of tables that are logical subsets of the entire dataset. 
+ +You can group tables into an object called a backup set. A backup set can save time when you have a particular group of tables +that you expect to repeatedly back up or restore. + +When you create a backup set, you type table names to include in the group. The backup set includes not only groups of related +tables, but also retains the HBase backup metadata. Afterwards, you can invoke the backup set name to indicate what tables apply +to the command execution instead of entering all the table names individually. + +### Document the backup and restore strategy, and ideally log information about each backup. + +Document the whole process so that the knowledge base can transfer to new administrators after employee turnover. As an extra +safety precaution, also log the calendar date, time, and other relevant details about the data of each backup. This metadata +can potentially help locate a particular dataset in case of source cluster failure or primary site disaster. Maintain duplicate +copies of all documentation: one copy at the production cluster site and another at the backup location or wherever it can be +accessed by an administrator remotely from the production cluster. + +[[br.s3.backup.scenario]] +## Scenario: Safeguarding Application Datasets on Amazon S3 + +This scenario describes how a hypothetical retail business uses backups to safeguard application data and then restore the dataset +after failure. + +The HBase administration team uses backup sets to store data from a group of tables that have interrelated information for an +application called green. In this example, one table contains transaction records and the other contains customer details. The +two tables need to be backed up and be recoverable as a group. + +The admin team also wants to ensure daily backups occur automatically. 
+ +.Tables Composing The Backup Set +image::backup-app-components.png[] + +The following is an outline of the steps and examples of commands that are used to back up the data for the _green_ application and +to recover the data later. All commands are run when logged in as the HBase superuser. + +1. A backup set called _green_set_ is created as an alias for both the transactions table and the customer table. The backup set can +be used for all operations to avoid typing each table name. The backup set name is case-sensitive and should be formed with only +printable characters and without spaces. + +[source] +---- +$ hbase backup set add green_set transactions +$ hbase backup set add green_set customer +---- + +2. The first backup of green_set data must be a full backup. The following command example shows how credentials are passed to Amazon +S3 and specifies the file system with the s3a: prefix. + +[source] +---- +$ ACCESS_KEY=ABCDEFGHIJKLMNOPQRST +$ SECRET_KEY=123456789abcdefghijklmnopqrstuvwxyzABCD +$ sudo -u hbase hbase backup create full \ + s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set +---- + +3. Incremental backups should be run according to a schedule that ensures essential data recovery in the event of a catastrophe. At +this retail company, the HBase admin team decides that automated daily backups secure the data sufficiently. The team decides that +they can implement this by modifying an existing Cron job that is defined in `/etc/crontab`. Consequently, IT modifies the Cron job +by adding the following line: + +[source] +---- +@daily hbase hbase backup create incremental s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set +---- + +4. A catastrophic IT incident disables the production cluster that the green application uses. An HBase system administrator of the +backup cluster must restore the _green_set_ dataset to the point in time closest to the recovery objective. 
+ +NOTE: If the administrator of the backup HBase cluster has the backup ID with relevant details in accessible records, the following +search with the `hdfs dfs -ls` command and manually scanning the backup ID list can be bypassed. Consider continuously maintaining +and protecting a detailed log of backup IDs outside the production cluster in your environment. + +The HBase administrator runs the following command on the directory where backups are stored to print the list of successful backup +IDs on the console: + +`hdfs dfs -ls -t /prodhbasebackups/backups` + +5. The admin scans the list to see which backup was created at a date and time closest to the recovery objective. To do this, the +admin converts the calendar timestamp of the recovery point in time to Unix time because backup IDs are uniquely identified with +Unix time. The backup IDs are listed in reverse chronological order, meaning the most recent successful backup appears first. + +The admin notices that the following line in the command output corresponds with the _green_set_ backup that needs to be restored: + +`/prodhbasebackups/backups/backup_1467823988425` + +6. The admin restores green_set by invoking the backup ID and the `-o` (overwrite) option. The `-o` option truncates all existing data +in the destination and populates the tables with data from the backup dataset. Without this flag, the backup data is appended to the +existing data in the destination. In this case, the admin decides to overwrite the data because it is corrupted. + +[source] +---- +$ sudo -u hbase hbase restore -s green_set \ + s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups backup_1467823988425 -o +---- + +[[br.data.security]] +## Security of Backup Data + +Because this feature makes it easy to copy data to remote locations, it is important to state clearly the procedural +concerns that exist around data security. 
Like the HBase replication feature, backup and restore provides the constructs to automatically +copy data from within a corporate boundary to some system outside of that boundary. When storing sensitive data, it is imperative that the +locations to which backup and restore, or any other feature which extracts data from HBase, sends data have undergone a security audit to ensure +that only authenticated users are allowed to access that data. + +For example, with the above example of backing up data to S3, it is of the utmost importance that the proper permissions are assigned +to the S3 bucket to ensure that only a minimum set of authorized users are allowed to access this data. Because the data is no longer +being accessed via HBase, and its authentication and authorization controls, we must ensure that the filesystem storing that data is +providing a comparable level of security. This is a manual step which users *must* implement on their own. + +[[br.technical.details]] +## Technical Details of Incremental Backup and Restore + +HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore +solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture +the data changes since the previous backup was created. A WAL roll (creating new WALs) is executed across all RegionServers to track +the WALs that need to be in the backup. + +After the incremental backup image is created, the source backup files are usually on the same node as the data source. A process similar +to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation +starts, a two-step process is initiated. First, the full backup is restored from the full backup image.
Second, all WAL files from +incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase +Bulk Load utility automatically imports as restored data in the table. + +You can only restore on a live HBase cluster because the data must be redistributed to complete the restore operation successfully. + +[[br.filesystem.growth.warning]] +## A Warning on File System Growth + +As a reminder, incremental backups are implemented by retaining the write-ahead logs which HBase primarily uses for data durability. +Thus, to ensure that all data needing to be included in a backup is still available in the system, the HBase backup and restore feature +retains all write-ahead logs since the last backup until the next incremental backup is executed. + +Like HBase Snapshots, this can have an unexpectedly large impact on the HDFS usage of HBase for high volume tables. Take care in enabling +and using the backup and restore feature, specifically with a mind to removing backup sessions when they are not actively being used. + +The only automated upper bound on retained write-ahead logs for backup and restore is based on the TTL of the `hbase:backup` system table which, +as of the time this document is written, is infinite (backup table entries are never automatically deleted). This requires that administrators +perform backups on a schedule whose frequency is relative to the amount of available space on HDFS (e.g. less available HDFS space requires +more aggressive backup merges and deletions). As a reminder, the TTL can be altered on the `hbase:backup` table using the `alter` command +in the HBase shell. Modifying the configuration property `hbase.backup.system.ttl` in hbase-site.xml after the system table exists has no effect.
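As a rough aid for choosing a backup frequency, the volume of write-ahead logs retained between two backups can be approximated as the sustained WAL write rate multiplied by the interval between backups. The following is a minimal sketch under that assumption. The function name `retained_wal_gb` and the example figures (5 MB/s, 24 hours) are hypothetical planning inputs, not values reported by HBase.

```shell
#!/usr/bin/env bash
# Estimate how much WAL data accumulates between two backups.
# Both inputs are hypothetical planning figures, not values reported by HBase.
#   $1: sustained WAL write rate across the cluster, in MB/s
#   $2: hours between consecutive backups
retained_wal_gb() {
  LC_ALL=C awk -v rate="$1" -v hours="$2" \
    'BEGIN { printf "%.1f\n", rate * 3600 * hours / 1024 }'
}

# Example: 5 MB/s of sustained WAL writes with one backup every 24 hours.
retained_wal_gb 5 24   # prints 421.9 (GB)
```

Comparing the result against the free space on HDFS gives a first indication of whether the chosen backup interval is sustainable.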
+ +[[br.backup.capacity.planning]] +## Capacity Planning + +When designing a distributed system deployment, it is critical that some basic mathematical rigor is applied to ensure sufficient computational +capacity is available given the data and software requirements of the system. For this feature, the availability of network capacity is the largest +bottleneck when estimating the performance of some implementation of backup and restore. The second most costly function is the speed at which +data can be read/written. + +### Full Backups + +To estimate the duration of a full backup, we have to understand the general actions which are invoked: + +* Write-ahead log roll on each RegionServer: ones to tens of seconds per RegionServer in parallel. Relative to the load on each RegionServer. +* Take an HBase snapshot of the table(s): tens of seconds. Relative to the number of regions and files that comprise the table. +* Export the snapshot to the destination: see below. Relative to the size of the data and the network bandwidth to the destination. + +[[br.export.snapshot.cost]] +To approximate how long the final step will take, we have to make some assumptions on hardware. Be aware that these will *not* be accurate for your +system -- these are numbers that you or your administrator know for your system. Let's say the speed of reading data from HDFS on a single node is +capped at 80MB/s (across all Mappers that run on that host), a modern network interface controller (NIC) supports 10Gb/s, the top-of-rack switch can +handle 40Gb/s, and the WAN between your clusters is 10Gb/s. This means that you can only ship data to your remote at a speed of 1.25GB/s -- meaning +that 16 nodes (`1.25 * 1024 / 80 = 16`) participating in the ExportSnapshot should be able to fully saturate the link between clusters. With more +nodes in the cluster, we can still saturate the network but at a lesser impact on any one node which helps ensure local SLAs are met.
If the size +of the snapshot is 10TB, this full backup would take in the ballpark of 2.3 hours (`10 * 1024 / 1.25 / (60 * 60) = 2.28hrs`). + +As a general statement, it is very likely that the WAN bandwidth between your local cluster and the remote storage is the largest +bottleneck to the speed of a full backup. + +When the concern is restricting the computational impact of backups to a "production system", the above formulas can be reused with the optional +command-line arguments to `hbase backup create`: `-b`, `-w`, `-q`. The `-b` option defines the bandwidth at which each worker (Mapper) would +write data. The `-w` argument limits the number of workers that would be spawned in the DistCp job. The `-q` option allows the user to specify a YARN +queue which can limit the specific nodes where the workers will be spawned -- this can quarantine the backup workers performing the copy to +a set of non-critical nodes. Relating the `-b` and `-w` options to our earlier equations: `-b` would be used to restrict each node from reading +data at the full 80MB/s and `-w` is used to limit the job from spawning 16 worker tasks. + +### Incremental Backup + +Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost. + +* Identify new write-ahead logs since last full or incremental backup: negligible. A priori knowledge from the backup system table(s). +* Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data. Relative to write speed of HDFS. +* DistCp the HFiles to the destination: <<br.export.snapshot.cost,see above>>. + +For the second step, the dominating cost of this operation would be rewriting the data (under the assumption that a majority of the +data in the WAL is preserved). In this case, we can assume an aggregate write speed of 30MB/s per node.
Continuing our 16-node cluster example, +this would require approximately 15 minutes to perform this step for 50GB of data (50 * 1024 / 60 / 60 = 14.2). The time taken to copy the +data (50 / 1.25 = 40 seconds) is small enough to be ignored; starting the DistCp MapReduce job would likely take longer than the copy itself. + +[[br.limitations]] +## Limitations of the Backup and Restore Utility + +*Serial backup operations* + +Backup operations cannot be run concurrently. An operation includes actions like create, delete, restore, and merge. Only one active backup session is supported. link:https://issues.apache.org/jira/browse/HBASE-16391[HBASE-16391] +will introduce support for multiple backup sessions. + +*No means to cancel backups* + +Neither backup nor restore operations can be canceled (link:https://issues.apache.org/jira/browse/HBASE-15997[HBASE-15997], link:https://issues.apache.org/jira/browse/HBASE-15998[HBASE-15998]). +The workaround to cancel a backup would be to kill the client-side backup command (`control-C`), ensure all relevant MapReduce jobs have exited, and then +run the `hbase backup repair` command to ensure the system backup metadata is consistent. + +*Backups can only be saved to a single location* + +Copying backup information to multiple locations is an exercise left to the user. link:https://issues.apache.org/jira/browse/HBASE-15476[HBASE-15476] will +introduce the ability to specify multiple backup destinations intrinsically. + +*HBase superuser access is required* + +Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore, which can pose a problem for shared HBase installations. Current mitigations would require +coordination with system administrators to build and deploy a backup and restore strategy (link:https://issues.apache.org/jira/browse/HBASE-14138[HBASE-14138]).
+ +*Backup restoration is an online operation* + +Performing a restore from a backup requires that the HBase cluster be online; this is a caveat of the current implementation (link:https://issues.apache.org/jira/browse/HBASE-16573[HBASE-16573]). + +*Some operations may fail and require re-run* + +The HBase backup feature is primarily client driven. While there is the standard HBase retry logic built into the HBase Connection, persistent errors in executing operations +may propagate back to the client (e.g. snapshot failure due to region splits). The backup implementation should be moved from client-side into the ProcedureV2 framework +in the future which would provide additional robustness around transient/retryable failures. The `hbase backup repair` command is meant to correct states which the system +cannot automatically detect and recover from. + +*Avoidance of declaration of public API* + +While the Java API to interact with this feature exists and its implementation is separated from an interface, insufficient rigor has been applied to determine if +it is exactly what we intend to ship to users. As such, it is marked for a `Private` audience with the expectation that, as users begin to try the feature, there +will be modifications that would necessitate breaking compatibility (link:https://issues.apache.org/jira/browse/HBASE-17517[HBASE-17517]). + +*Lack of global metrics for backup and restore* + +Individual backup and restore operations contain metrics about the amount of work the operation included, but there is no centralized location (e.g. the Master UI) +which presents this information for consumption (link:https://issues.apache.org/jira/browse/HBASE-16565[HBASE-16565]).
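The full-backup arithmetic from the Capacity Planning section can be reproduced with a small shell sketch. The function names `nodes_to_saturate` and `export_hours` are hypothetical helpers introduced here for illustration. The inputs are the illustrative figures used in that section (80MB/s per-node read speed, a 1.25GB/s WAN link, a 10TB snapshot), not measurements from a real cluster.

```shell
#!/usr/bin/env bash
# Reproduce the full-backup estimate from the Capacity Planning section.
# All inputs are illustrative hardware assumptions; substitute your own numbers.

# Number of nodes needed to saturate a link, rounded up.
#   $1: link speed in GB/s
#   $2: per-node HDFS read speed in MB/s
nodes_to_saturate() {
  LC_ALL=C awk -v link="$1" -v node="$2" \
    'BEGIN { n = link * 1024 / node; c = int(n); if (c < n) c++; print c }'
}

# Hours needed to export a snapshot over a saturated link.
#   $1: snapshot size in TB
#   $2: link speed in GB/s
export_hours() {
  LC_ALL=C awk -v tb="$1" -v link="$2" \
    'BEGIN { printf "%.2f\n", tb * 1024 / link / 3600 }'
}

nodes_to_saturate 1.25 80   # prints 16
export_hours 10 1.25        # prints 2.28
```

Re-running the two functions with a lower per-node read speed or a smaller link speed shows how throttling the copy would lengthen the backup window.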
diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc index f6c6cdb0dd7..1ceb3baa7e1 100644 --- a/src/main/asciidoc/book.adoc +++ b/src/main/asciidoc/book.adoc @@ -19,7 +19,7 @@ */ //// -= Apache HBase (TM) Reference Guide += Apache HBase (TM) Reference Guide :Author: Apache HBase Team :Email: :doctype: book @@ -62,6 +62,7 @@ include::_chapters/mapreduce.adoc[] include::_chapters/security.adoc[] include::_chapters/architecture.adoc[] include::_chapters/hbase_mob.adoc[] +include::_chapters/backup_restore.adoc[] include::_chapters/hbase_apis.adoc[] include::_chapters/external_apis.adoc[] include::_chapters/thrift_filter_language.adoc[] @@ -92,5 +93,3 @@ include::_chapters/asf.adoc[] include::_chapters/orca.adoc[] include::_chapters/tracing.adoc[] include::_chapters/rpc.adoc[] - - diff --git a/src/main/site/resources/images/backup-app-components.png b/src/main/site/resources/images/backup-app-components.png new file mode 100644 index 00000000000..5e403e25c6c Binary files /dev/null and b/src/main/site/resources/images/backup-app-components.png differ diff --git a/src/main/site/resources/images/backup-cloud-appliance.png b/src/main/site/resources/images/backup-cloud-appliance.png new file mode 100644 index 00000000000..76b6d5a7964 Binary files /dev/null and b/src/main/site/resources/images/backup-cloud-appliance.png differ diff --git a/src/main/site/resources/images/backup-dedicated-cluster.png b/src/main/site/resources/images/backup-dedicated-cluster.png new file mode 100644 index 00000000000..bca282d8308 Binary files /dev/null and b/src/main/site/resources/images/backup-dedicated-cluster.png differ diff --git a/src/main/site/resources/images/backup-intra-cluster.png b/src/main/site/resources/images/backup-intra-cluster.png new file mode 100644 index 00000000000..113c577f63c Binary files /dev/null and b/src/main/site/resources/images/backup-intra-cluster.png differ