HDFS-8974. Convert docs in xdoc format to markdown. Contributed by Masatake Iwasaki.

Akira Ajisaka 2015-09-10 16:50:57 +09:00
parent b539bb41bb
commit c260a8aa93
5 changed files with 615 additions and 653 deletions


@ -563,6 +563,9 @@ Release 2.8.0 - UNRELEASED
HDFS-7116. Add a command to get the balancer bandwidth
(Rakesh R via vinayakumarb)
HDFS-8974. Convert docs in xdoc format to markdown.
(Masatake Iwasaki via aajisaka)
OPTIMIZATIONS
HDFS-8026. Trace FSOutputSummer#writeChecksumChunks rather than


@ -0,0 +1,311 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
HDFS Rolling Upgrade
====================
* [Introduction](#Introduction)
* [Upgrade](#Upgrade)
* [Upgrade without Downtime](#Upgrade_without_Downtime)
* [Upgrading Non-Federated Clusters](#Upgrading_Non-Federated_Clusters)
* [Upgrading Federated Clusters](#Upgrading_Federated_Clusters)
* [Upgrade with Downtime](#Upgrade_with_Downtime)
* [Upgrading Non-HA Clusters](#Upgrading_Non-HA_Clusters)
* [Downgrade and Rollback](#Downgrade_and_Rollback)
* [Downgrade](#Downgrade)
* [Downgrade without Downtime](#Downgrade_without_Downtime)
* [Downgrade with Downtime](#Downgrade_with_Downtime)
* [Rollback](#Rollback)
* [Commands and Startup Options for Rolling Upgrade](#Commands_and_Startup_Options_for_Rolling_Upgrade)
* [DFSAdmin Commands](#DFSAdmin_Commands)
* [dfsadmin -rollingUpgrade](#dfsadmin_-rollingUpgrade)
* [dfsadmin -getDatanodeInfo](#dfsadmin_-getDatanodeInfo)
* [dfsadmin -shutdownDatanode](#dfsadmin_-shutdownDatanode)
* [NameNode Startup Options](#NameNode_Startup_Options)
* [namenode -rollingUpgrade](#namenode_-rollingUpgrade)
Introduction
------------
*HDFS rolling upgrade* allows upgrading individual HDFS daemons.
For example, the datanodes can be upgraded independently of the namenodes.
A namenode can be upgraded independently of the other namenodes.
The namenodes can be upgraded independently of datanodes and journal nodes.
Upgrade
-------
In Hadoop v2, HDFS supports highly-available (HA) namenode services and wire compatibility.
These two capabilities make it feasible to upgrade HDFS without incurring HDFS downtime.
In order to upgrade an HDFS cluster without downtime, the cluster must be set up with HA.
If a new feature that is enabled in the new software release may not work with the old software release,
the upgrade should be done with the following steps.
1. Disable new feature.
2. Upgrade the cluster.
3. Enable the new feature.
Note that rolling upgrade is supported only from Hadoop-2.4.0 onwards.
### Upgrade without Downtime
In an HA cluster, there are two or more *NameNodes (NNs)*, many *DataNodes (DNs)*,
a few *JournalNodes (JNs)* and a few *ZooKeeperNodes (ZKNs)*.
*JNs* are relatively stable and, in most cases, do not require an upgrade when upgrading HDFS.
In the rolling upgrade procedure described here,
only *NNs* and *DNs* are considered but *JNs* and *ZKNs* are not.
Upgrading *JNs* and *ZKNs* may incur cluster downtime.
#### Upgrading Non-Federated Clusters
Suppose there are two namenodes *NN1* and *NN2*,
where *NN1* and *NN2* are respectively in active and standby states.
The following are the steps for upgrading an HA cluster:
1. Prepare Rolling Upgrade
    1. Run "[`hdfs dfsadmin -rollingUpgrade prepare`](#dfsadmin_-rollingUpgrade)"
       to create an fsimage for rollback.
    1. Run "[`hdfs dfsadmin -rollingUpgrade query`](#dfsadmin_-rollingUpgrade)"
       to check the status of the rollback image.
       Wait and re-run the command until
       the "`Proceed with rolling upgrade`" message is shown.
1. Upgrade Active and Standby *NNs*
    1. Shutdown and upgrade *NN2*.
    1. Start *NN2* as standby with the
       "[`-rollingUpgrade started`](#namenode_-rollingUpgrade)" option.
    1. Failover from *NN1* to *NN2*
       so that *NN2* becomes active and *NN1* becomes standby.
    1. Shutdown and upgrade *NN1*.
    1. Start *NN1* as standby with the
       "[`-rollingUpgrade started`](#namenode_-rollingUpgrade)" option.
1. Upgrade *DNs*
    1. Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
        1. Run "[`hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade`](#dfsadmin_-shutdownDatanode)"
           to shutdown one of the chosen datanodes.
        1. Run "[`hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>`](#dfsadmin_-getDatanodeInfo)"
           to check and wait for the datanode to shutdown.
        1. Upgrade and restart the datanode.
        1. Perform the above steps for all the chosen datanodes in the subset in parallel.
    1. Repeat the above steps until all datanodes in the cluster are upgraded.
1. Finalize Rolling Upgrade
    1. Run "[`hdfs dfsadmin -rollingUpgrade finalize`](#dfsadmin_-rollingUpgrade)"
       to finalize the rolling upgrade.
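
The datanode step of the procedure above can be sketched as a small shell helper. This is only an illustrative sketch, not part of Hadoop: the function name `plan_datanode_upgrade`, the host names and the IPC port `50020` are assumptions, and the helper prints the commands instead of running them, so the actual software upgrade and restart remain a site-specific manual step.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop tool): print the commands for upgrading
# one subset of datanodes. Each argument is a DATANODE_HOST:IPC_PORT pair.
# Nothing is executed against the cluster.
plan_datanode_upgrade() {
  for dn in "$@"; do
    # Ask the datanode to shut down in upgrade mode.
    echo "hdfs dfsadmin -shutdownDatanode $dn upgrade"
    # Re-run this until the datanode no longer answers.
    echo "hdfs dfsadmin -getDatanodeInfo $dn"
    # Placeholder for the site-specific upgrade and restart.
    echo "# upgrade and restart $dn"
  done
}

# Example subset: two datanodes with assumed host names and port.
plan_datanode_upgrade dn1.example.com:50020 dn2.example.com:50020
```

Printing the plan first makes it easy to review a subset (for example one rack) before any datanode is actually stopped.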
#### Upgrading Federated Clusters
In a federated cluster, there are multiple namespaces
and a pair of active and standby *NNs* for each namespace.
The procedure for upgrading a federated cluster is similar to upgrading a non-federated cluster
except that Step 1 and Step 4 are performed on each namespace
and Step 2 is performed on each pair of active and standby *NNs*, i.e.
1. Prepare Rolling Upgrade for Each Namespace
1. Upgrade Active and Standby *NN* pairs for Each Namespace
1. Upgrade *DNs*
1. Finalize Rolling Upgrade for Each Namespace
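
The per-namespace steps can be sketched as a loop over the nameservices. This sketch assumes that the namespace is selected with the generic `-fs` option of `hdfs dfsadmin`; the function name `plan_for_each_namespace` and the URIs `hdfs://ns1` and `hdfs://ns2` are hypothetical. The helper only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop tool): print one dfsadmin command per
# namespace. $1 is the rolling upgrade action (prepare, query or finalize);
# the remaining arguments are nameservice URIs.
# Assumption: the generic -fs option selects the target namespace.
plan_for_each_namespace() {
  action=$1
  shift
  for ns in "$@"; do
    echo "hdfs dfsadmin -fs $ns -rollingUpgrade $action"
  done
}

# Step 1 for a federation with two assumed nameservices.
plan_for_each_namespace prepare hdfs://ns1 hdfs://ns2
```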
### Upgrade with Downtime
For non-HA clusters,
it is impossible to upgrade HDFS without downtime since it requires restarting the namenodes.
However, datanodes can still be upgraded in a rolling manner.
#### Upgrading Non-HA Clusters
In a non-HA cluster, there is a *NameNode (NN)*, a *SecondaryNameNode (SNN)*
and many *DataNodes (DNs)*.
The procedure for upgrading a non-HA cluster is similar to upgrading an HA cluster
except that Step 2 "Upgrade Active and Standby *NNs*" is replaced by the following:
* Upgrade *NN* and *SNN*
    1. Shutdown *SNN*.
    1. Shutdown and upgrade *NN*.
    1. Start *NN* with the
       "[`-rollingUpgrade started`](#namenode_-rollingUpgrade)" option.
    1. Upgrade and restart *SNN*.
Downgrade and Rollback
----------------------
When the upgraded release is undesirable
or, in some unlikely case, the upgrade fails (due to bugs in the newer release),
administrators may choose to downgrade HDFS back to the pre-upgrade release,
or rollback HDFS to the pre-upgrade release and the pre-upgrade state.
Note that downgrade can be done in a rolling fashion but rollback cannot.
Rollback requires cluster downtime.
Note also that downgrade and rollback are possible only after a rolling upgrade is started and
before the upgrade is terminated.
An upgrade can be terminated by either finalize, downgrade or rollback.
Therefore, it may not be possible to perform rollback after finalize or downgrade,
or to perform downgrade after finalize.
Downgrade
---------
*Downgrade* restores the software back to the pre-upgrade release
and preserves the user data.
Suppose time *T* is the rolling upgrade start time and the upgrade is terminated by downgrade.
Then, the files created before or after *T* remain available in HDFS.
The files deleted before or after *T* remain deleted in HDFS.
A newer release is downgradable to the pre-upgrade release
only if both the namenode layout version and the datanode layout version
are not changed between these two releases.
### Downgrade without Downtime
In an HA cluster,
when a rolling upgrade from an old software release to a new software release is in progress,
it is possible to downgrade, in a rolling fashion, the upgraded machines back to the old software release.
Same as before, suppose *NN1* and *NN2* are respectively in active and standby states.
Below are the steps for rolling downgrade:
1. Downgrade *DNs*
    1. Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
        1. Run "[`hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade`](#dfsadmin_-shutdownDatanode)"
           to shutdown one of the chosen datanodes.
        1. Run "[`hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>`](#dfsadmin_-getDatanodeInfo)"
           to check and wait for the datanode to shutdown.
        1. Downgrade and restart the datanode.
        1. Perform the above steps for all the chosen datanodes in the subset in parallel.
    1. Repeat the above steps until all upgraded datanodes in the cluster are downgraded.
1. Downgrade Active and Standby *NNs*
    1. Shutdown and downgrade *NN2*.
    1. Start *NN2* as standby normally. (Note that it is incorrect to use the
       "[`-rollingUpgrade downgrade`](#namenode_-rollingUpgrade)" option here.)
    1. Failover from *NN1* to *NN2*
       so that *NN2* becomes active and *NN1* becomes standby.
    1. Shutdown and downgrade *NN1*.
    1. Start *NN1* as standby normally. (Note that it is incorrect to use the
       "[`-rollingUpgrade downgrade`](#namenode_-rollingUpgrade)" option here.)
1. Finalize Rolling Downgrade
    1. Run "[`hdfs dfsadmin -rollingUpgrade finalize`](#dfsadmin_-rollingUpgrade)"
       to finalize the rolling downgrade.
Note that the datanodes must be downgraded before downgrading the namenodes
since protocols may be changed in a backward compatible manner but not forward compatible,
i.e. old datanodes can talk to the new namenodes but not vice versa.
### Downgrade with Downtime
Administrators may choose to first shutdown the cluster and then downgrade it.
The following are the steps:
1. Shutdown all *NNs* and *DNs*.
1. Restore the pre-upgrade release in all machines.
1. Start *NNs* with the
"[`-rollingUpgrade downgrade`](#namenode_-rollingUpgrade)" option.
1. Start *DNs* normally.
Rollback
--------
*Rollback* restores the software back to the pre-upgrade release
but also reverts the user data back to the pre-upgrade state.
Suppose time *T* is the rolling upgrade start time and the upgrade is terminated by rollback.
The files created before *T* remain available in HDFS but the files created after *T* become unavailable.
The files deleted before *T* remain deleted in HDFS but the files deleted after *T* are restored.
Rollback from a newer release to the pre-upgrade release is always supported.
However, it cannot be done in a rolling fashion. It requires cluster downtime.
Suppose *NN1* and *NN2* are respectively in active and standby states.
Below are the steps for rollback:
* Rollback HDFS
    1. Shutdown all *NNs* and *DNs*.
    1. Restore the pre-upgrade release in all machines.
    1. Start *NN1* as Active with the
       "[`-rollingUpgrade rollback`](#namenode_-rollingUpgrade)" option.
    1. Run `-bootstrapStandby` on *NN2* and start it normally as standby.
    1. Start *DNs* with the "`-rollback`" option.
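
The difference between downgrade and rollback with respect to the upgrade start time *T* can be illustrated with a toy model. This is a sketch of the semantics described in this document only, not of any HDFS code: the function `file_available` is hypothetical, times are plain integers, and an event at exactly *T* is treated as happening after *T*, which is an assumption.

```shell
#!/bin/sh
# Hypothetical model: is a file available once the upgrade is terminated?
#   $1 = downgrade or rollback
#   $2 = T, the rolling upgrade start time
#   $3 = file creation time
#   $4 = file deletion time, or "-" if the file was never deleted
# Prints "yes" or "no".
file_available() {
  mode=$1; t=$2; created=$3; deleted=$4
  case $mode in
    downgrade)
      # User data is preserved: a file is gone only if it was deleted,
      # no matter whether that happened before or after T.
      if [ "$deleted" = "-" ]; then echo yes; else echo no; fi
      ;;
    rollback)
      # State reverts to time T: creations and deletions after T are undone.
      if [ "$created" -ge "$t" ]; then
        echo no
      elif [ "$deleted" = "-" ] || [ "$deleted" -ge "$t" ]; then
        echo yes
      else
        echo no
      fi
      ;;
  esac
}

file_available downgrade 100 150 -    # created after T, kept by downgrade
file_available rollback  100 150 -    # created after T, lost by rollback
file_available rollback  100 50 120   # deleted after T, restored by rollback
```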
Commands and Startup Options for Rolling Upgrade
------------------------------------------------
### DFSAdmin Commands
#### `dfsadmin -rollingUpgrade`
hdfs dfsadmin -rollingUpgrade <query|prepare|finalize>
Execute a rolling upgrade action.
* Options:
| Option | Description |
| --- | --- |
| `query` | Query the current rolling upgrade status. |
| `prepare` | Prepare a new rolling upgrade. |
| `finalize` | Finalize the current rolling upgrade. |
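
Waiting for the rollback image with the `query` action can be sketched as a polling loop. This is a minimal sketch under stated assumptions: the function name `wait_for_rollback_image`, the retry count and the delay are hypothetical, and the loop simply searches the command output for the `Proceed with rolling upgrade` text mentioned in this document.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop tool): re-run a status command until its
# output contains "Proceed with rolling upgrade", or give up.
#   $1 = maximum number of tries
#   $2 = seconds to sleep between tries
#   remaining arguments = the status command to run
wait_for_rollback_image() {
  max_tries=$1; delay=$2
  shift 2
  try=1
  while [ "$try" -le "$max_tries" ]; do
    if "$@" 2>/dev/null | grep -q "Proceed with rolling upgrade"; then
      return 0   # rollback image is ready
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1       # message never appeared
}

# Intended use on a real cluster (not executed here):
#   wait_for_rollback_image 60 10 hdfs dfsadmin -rollingUpgrade query
```

Passing the status command as arguments keeps the loop independent of the cluster, so it can be tried out with a stub command first.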
#### `dfsadmin -getDatanodeInfo`
hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>
Get the information about the given datanode.
This command can be used for checking if a datanode is alive
like the Unix `ping` command.
#### `dfsadmin -shutdownDatanode`
hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> [upgrade]
Submit a shutdown request for the given datanode.
If the optional `upgrade` argument is specified,
clients accessing the datanode will be advised to wait for it to restart
and the fast start-up mode will be enabled.
When the restart does not happen in time, clients will timeout and ignore the datanode.
In such case, the fast start-up mode will also be disabled.
Note that the command does not wait for the datanode shutdown to complete.
The "[`dfsadmin -getDatanodeInfo`](#dfsadmin_-getDatanodeInfo)"
command can be used for checking if the datanode shutdown is completed.
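
Because `-shutdownDatanode` does not wait for completion, a caller has to poll. The following sketch assumes that `hdfs dfsadmin -getDatanodeInfo` exits with a non-zero status once the datanode is unreachable; that exit-status behavior is an assumption and should be checked against the release in use. The function name `wait_for_datanode_shutdown`, the host name and the port are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop tool): poll a probe command until it
# fails, which is taken to mean that the datanode has stopped.
#   $1 = maximum number of tries
#   $2 = seconds to sleep between tries
#   remaining arguments = the probe command
wait_for_datanode_shutdown() {
  max_tries=$1; delay=$2
  shift 2
  try=1
  while [ "$try" -le "$max_tries" ]; do
    if ! "$@" >/dev/null 2>&1; then
      return 0   # probe failed: the datanode no longer answers
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1       # still reachable after all tries
}

# Intended use on a real cluster (not executed here):
#   wait_for_datanode_shutdown 30 2 hdfs dfsadmin -getDatanodeInfo dn1.example.com:50020
```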
### NameNode Startup Options
#### `namenode -rollingUpgrade`
hdfs namenode -rollingUpgrade <downgrade|rollback|started>
When a rolling upgrade is in progress,
the `-rollingUpgrade` namenode startup option is used to specify
various rolling upgrade options.
* Options:
| Option | Description |
| --- | --- |
| `downgrade` | Restores the namenode back to the pre-upgrade release and preserves the user data. |
| `rollback` | Restores the namenode back to the pre-upgrade release but also reverts the user data back to the pre-upgrade state. |
| `started` | Specifies a rolling upgrade already started so that the namenode should allow image directories with different layout versions during startup. |


@ -0,0 +1,301 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
HDFS Snapshots
==============
* [HDFS Snapshots](#HDFS_Snapshots)
* [Overview](#Overview)
* [Snapshottable Directories](#Snapshottable_Directories)
* [Snapshot Paths](#Snapshot_Paths)
* [Upgrading to a version of HDFS with snapshots](#Upgrading_to_a_version_of_HDFS_with_snapshots)
* [Snapshot Operations](#Snapshot_Operations)
* [Administrator Operations](#Administrator_Operations)
* [Allow Snapshots](#Allow_Snapshots)
* [Disallow Snapshots](#Disallow_Snapshots)
* [User Operations](#User_Operations)
* [Create Snapshots](#Create_Snapshots)
* [Delete Snapshots](#Delete_Snapshots)
* [Rename Snapshots](#Rename_Snapshots)
* [Get Snapshottable Directory Listing](#Get_Snapshottable_Directory_Listing)
* [Get Snapshots Difference Report](#Get_Snapshots_Difference_Report)
Overview
--------
HDFS Snapshots are read-only point-in-time copies of the file system.
Snapshots can be taken on a subtree of the file system or the entire file system.
Some common use cases of snapshots are data backup, protection against user errors
and disaster recovery.
The implementation of HDFS Snapshots is efficient:
* Snapshot creation is instantaneous:
the cost is *O(1)* excluding the inode lookup time.
* Additional memory is used only when modifications are made relative to a snapshot:
memory usage is *O(M)*,
where *M* is the number of modified files/directories.
* Blocks in datanodes are not copied:
the snapshot files record the block list and the file size.
There is no data copying.
* Snapshots do not adversely affect regular HDFS operations:
modifications are recorded in reverse chronological order
so that the current data can be accessed directly.
The snapshot data is computed by subtracting the modifications
from the current data.
### Snapshottable Directories
Snapshots can be taken on any directory once the directory has been set as
*snapshottable*.
A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
There is no limit on the number of snapshottable directories.
Administrators may set any directory to be snapshottable.
If there are snapshots in a snapshottable directory,
the directory can be neither deleted nor renamed
before all the snapshots are deleted.
Nested snapshottable directories are currently not allowed.
In other words, a directory cannot be set to snapshottable
if one of its ancestors/descendants is a snapshottable directory.
### Snapshot Paths
For a snapshottable directory,
the path component *".snapshot"* is used for accessing its snapshots.
Suppose `/foo` is a snapshottable directory,
`/foo/bar` is a file/directory in `/foo`,
and `/foo` has a snapshot `s0`.
Then, the path `/foo/.snapshot/s0/bar`
refers to the snapshot copy of `/foo/bar`.
The usual API and CLI can work with the ".snapshot" paths.
The following are some examples.
* Listing all the snapshots under a snapshottable directory:
hdfs dfs -ls /foo/.snapshot
* Listing the files in snapshot `s0`:
hdfs dfs -ls /foo/.snapshot/s0
* Copying a file from snapshot `s0`:
hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
Note that this example uses the preserve option to preserve
timestamps, ownership, permission, ACLs and XAttrs.
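
The path rule above can be written down as a small helper. This is only a sketch for illustration: the function name `snapshot_path` is hypothetical, and it does nothing more than string concatenation.

```shell
#!/bin/sh
# Hypothetical helper: build the path of a snapshot copy.
#   $1 = snapshottable directory
#   $2 = snapshot name
#   $3 = path relative to the snapshottable directory (optional)
snapshot_path() {
  dir=${1%/}   # drop one trailing slash so "/" and "/foo/" also work
  if [ -n "${3:-}" ]; then
    echo "$dir/.snapshot/$2/$3"
  else
    echo "$dir/.snapshot/$2"
  fi
}

snapshot_path /foo s0 bar   # prints /foo/.snapshot/s0/bar
```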
Upgrading to a version of HDFS with snapshots
---------------------------------------------
The HDFS snapshot feature introduces a new reserved path name used to
interact with snapshots: `.snapshot`. When upgrading from an
older version of HDFS, existing paths named `.snapshot` need
to first be renamed or deleted to avoid conflicting with the reserved path.
See the upgrade section in
[the HDFS user guide](HdfsUserGuide.html#Upgrade_and_Rollback)
for more information.
Snapshot Operations
-------------------
### Administrator Operations
The operations described in this section require superuser privilege.
#### Allow Snapshots
Allow snapshots of a directory to be created.
If the operation completes successfully, the directory becomes snapshottable.
* Command:
hdfs dfsadmin -allowSnapshot <path>
* Arguments:
| Argument | Description |
| --- | --- |
| path | The path of the snapshottable directory. |
See also the corresponding Java API
`void allowSnapshot(Path path)` in `HdfsAdmin`.
#### Disallow Snapshots
Disallow the creation of snapshots of a directory.
All snapshots of the directory must be deleted before disallowing snapshots.
* Command:
hdfs dfsadmin -disallowSnapshot <path>
* Arguments:
| Argument | Description |
| --- | --- |
| path | The path of the snapshottable directory. |
See also the corresponding Java API
`void disallowSnapshot(Path path)` in `HdfsAdmin`.
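
Since all snapshots must be deleted before snapshots can be disallowed, the order of commands matters. The sketch below prints that order using the `-deleteSnapshot` and `-disallowSnapshot` commands from this document; the function name `plan_disallow_snapshots` and the snapshot names are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop tool): print the commands needed to make
# a directory non-snapshottable again.
#   $1 = snapshottable directory
#   remaining arguments = names of its existing snapshots
plan_disallow_snapshots() {
  dir=$1
  shift
  # Every snapshot has to be removed first.
  for snap in "$@"; do
    echo "hdfs dfs -deleteSnapshot $dir $snap"
  done
  # Only then can snapshots be disallowed.
  echo "hdfs dfsadmin -disallowSnapshot $dir"
}

plan_disallow_snapshots /foo s0 s1
```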
### User Operations
This section describes user operations.
Note that the HDFS superuser can perform all the operations
without satisfying the permission requirement in the individual operations.
#### Create Snapshots
Create a snapshot of a snapshottable directory.
This operation requires owner privilege of the snapshottable directory.
* Command:
hdfs dfs -createSnapshot <path> [<snapshotName>]
* Arguments:
| Argument | Description |
| --- | --- |
| path | The path of the snapshottable directory. |
| snapshotName | The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format `"'s'yyyyMMdd-HHmmss.SSS"`, e.g. `"s20130412-151029.033"`. |
See also the corresponding Java API
`Path createSnapshot(Path path)` and
`Path createSnapshot(Path path, String snapshotName)`
in [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).
The snapshot path is returned in these methods.
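
The default name format `'s'yyyyMMdd-HHmmss.SSS` can be recognized with a simple pattern. The helper below is a hypothetical sketch, useful for example to tell generated names apart from names chosen by users; it checks the shape of the name only, not whether the digits form a valid date.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 looks like a default snapshot name,
# i.e. "s", 8 date digits, "-", 6 time digits, ".", 3 millisecond digits.
is_default_snapshot_name() {
  printf '%s\n' "$1" | grep -Eq '^s[0-9]{8}-[0-9]{6}\.[0-9]{3}$'
}

if is_default_snapshot_name "s20130412-151029.033"; then
  echo "default name"
fi
```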
#### Delete Snapshots
Delete a snapshot from a snapshottable directory.
This operation requires owner privilege of the snapshottable directory.
* Command:
hdfs dfs -deleteSnapshot <path> <snapshotName>
* Arguments:
| Argument | Description |
| --- | --- |
| path | The path of the snapshottable directory. |
| snapshotName | The snapshot name. |
See also the corresponding Java API
`void deleteSnapshot(Path path, String snapshotName)`
in [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).
#### Rename Snapshots
Rename a snapshot.
This operation requires owner privilege of the snapshottable directory.
* Command:
hdfs dfs -renameSnapshot <path> <oldName> <newName>
* Arguments:
| Argument | Description |
| --- | --- |
| path | The path of the snapshottable directory. |
| oldName | The old snapshot name. |
| newName | The new snapshot name. |
See also the corresponding Java API
`void renameSnapshot(Path path, String oldName, String newName)`
in [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).
#### Get Snapshottable Directory Listing
Get all the snapshottable directories where the current user has permission to take snapshots.
* Command:
hdfs lsSnapshottableDir
* Arguments: none
See also the corresponding Java API
`SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()`
in `DistributedFileSystem`.
#### Get Snapshots Difference Report
Get the differences between two snapshots.
This operation requires read access privilege for all files/directories in both snapshots.
* Command:
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
* Arguments:
| Argument | Description |
| --- | --- |
| path | The path of the snapshottable directory. |
| fromSnapshot | The name of the starting snapshot. |
| toSnapshot | The name of the ending snapshot. |
Note that snapshotDiff can be used to get the difference report between two snapshots, or between
a snapshot and the current status of a directory. Users can use "." to represent the current status.
* Results:
| Symbol | Meaning |
| --- | --- |
| \+ | The file/directory has been created. |
| \- | The file/directory has been deleted. |
| M | The file/directory has been modified. |
| R | The file/directory has been renamed. |
A *RENAME* entry indicates a file/directory has been renamed but
is still under the same snapshottable directory. A file/directory is
reported as deleted if it was renamed to outside of the snapshottable directory.
A file/directory renamed from outside of the snapshottable directory is
reported as newly created.
The snapshot difference report does not guarantee the same operation sequence.
For example, if we rename the directory *"/foo"* to *"/foo2"*, and
then append new data to the file *"/foo2/bar"*, the difference report will
be:
R. /foo -> /foo2
M. /foo/bar
I.e., the changes on the files/directories under a renamed directory are
reported using the original path before the rename (*"/foo/bar"* in
the above example).
See also the corresponding Java API
`SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)`
in `DistributedFileSystem`.
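
Because modifications under a renamed directory are reported with the original path, a reader of the report may want to map such a path to its current location. The helper below is a hypothetical sketch of that mapping for a single rename entry; the function name `current_path` is an assumption and it does not parse real report output.

```shell
#!/bin/sh
# Hypothetical helper: translate a path from a difference report into its
# location after a directory rename.
#   $1 = original directory (left side of the R entry)
#   $2 = renamed directory (right side of the R entry)
#   $3 = path as it appears in the report
current_path() {
  old=$1; new=$2; reported=$3
  case $reported in
    "$old")
      # The renamed directory itself.
      echo "$new"
      ;;
    "$old"/*)
      # Something below the renamed directory: swap the prefix.
      rest=${reported#"$old"}
      echo "$new$rest"
      ;;
    *)
      # Not affected by this rename.
      echo "$reported"
      ;;
  esac
}

current_path /foo /foo2 /foo/bar   # prints /foo2/bar
```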


@ -1,350 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document xmlns="http://maven.apache.org/XDOC/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
<properties>
<title>HDFS Rolling Upgrade</title>
</properties>
<body>
<h1>HDFS Rolling Upgrade</h1>
<macro name="toc">
<param name="section" value="0"/>
<param name="fromDepth" value="0"/>
<param name="toDepth" value="4"/>
</macro>
<section name="Introduction" id="Introduction">
<p>
<em>HDFS rolling upgrade</em> allows upgrading individual HDFS daemons.
For example, the datanodes can be upgraded independently of the namenodes.
A namenode can be upgraded independently of the other namenodes.
The namenodes can be upgraded independently of datanodes and journal nodes.
</p>
</section>
<section name="Upgrade" id="Upgrade">
<p>
In Hadoop v2, HDFS supports highly-available (HA) namenode services and wire compatibility.
These two capabilities make it feasible to upgrade HDFS without incurring HDFS downtime.
In order to upgrade an HDFS cluster without downtime, the cluster must be set up with HA.
</p>
<p>
If a new feature that is enabled in the new software release may not work with the old software release,
the upgrade should be done with the following steps.
</p>
<ol>
<li>Disable new feature.</li>
<li>Upgrade the cluster.</li>
<li>Enable the new feature.</li>
</ol>
<p>
Note that rolling upgrade is supported only from Hadoop-2.4.0 onwards.
</p>
<subsection name="Upgrade without Downtime" id="UpgradeWithoutDowntime">
<p>
In an HA cluster, there are two or more <em>NameNodes (NNs)</em>, many <em>DataNodes (DNs)</em>,
a few <em>JournalNodes (JNs)</em> and a few <em>ZooKeeperNodes (ZKNs)</em>.
<em>JNs</em> are relatively stable and, in most cases, do not require an upgrade when upgrading HDFS.
In the rolling upgrade procedure described here,
only <em>NNs</em> and <em>DNs</em> are considered but <em>JNs</em> and <em>ZKNs</em> are not.
Upgrading <em>JNs</em> and <em>ZKNs</em> may incur cluster downtime.
</p>
<h4>Upgrading Non-Federated Clusters</h4>
<p>
Suppose there are two namenodes <em>NN1</em> and <em>NN2</em>,
where <em>NN1</em> and <em>NN2</em> are respectively in active and standby states.
The following are the steps for upgrading an HA cluster:
</p>
<ol>
<li>Prepare Rolling Upgrade<ol>
<li>Run "<code><a href="#dfsadmin_-rollingUpgrade">hdfs dfsadmin -rollingUpgrade prepare</a></code>"
to create an fsimage for rollback.
</li>
<li>Run "<code><a href="#dfsadmin_-rollingUpgrade">hdfs dfsadmin -rollingUpgrade query</a></code>"
to check the status of the rollback image.
Wait and re-run the command until
the "<tt>Proceed with rolling upgrade</tt>" message is shown.
</li>
</ol></li>
<li>Upgrade Active and Standby <em>NNs</em><ol>
<li>Shutdown and upgrade <em>NN2</em>.</li>
<li>Start <em>NN2</em> as standby with the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade started</code></a>" option.</li>
<li>Failover from <em>NN1</em> to <em>NN2</em>
so that <em>NN2</em> becomes active and <em>NN1</em> becomes standby.</li>
<li>Shutdown and upgrade <em>NN1</em>.</li>
<li>Start <em>NN1</em> as standby with the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade started</code></a>" option.</li>
</ol></li>
<li>Upgrade <em>DNs</em><ol>
<li>Choose a small subset of datanodes (e.g. all datanodes under a particular rack).</li>
<ol>
<li>Run "<code><a href="#dfsadmin_-shutdownDatanode">hdfs dfsadmin -shutdownDatanode &lt;DATANODE_HOST:IPC_PORT&gt; upgrade</a></code>"
to shutdown one of the chosen datanodes.</li>
<li>Run "<code><a href="#dfsadmin_-getDatanodeInfo">hdfs dfsadmin -getDatanodeInfo &lt;DATANODE_HOST:IPC_PORT&gt;</a></code>"
to check and wait for the datanode to shutdown.</li>
<li>Upgrade and restart the datanode.</li>
<li>Perform the above steps for all the chosen datanodes in the subset in parallel.</li>
</ol>
<li>Repeat the above steps until all datanodes in the cluster are upgraded.</li>
</ol></li>
<li>Finalize Rolling Upgrade<ul>
<li>Run "<code><a href="#dfsadmin_-rollingUpgrade">hdfs dfsadmin -rollingUpgrade finalize</a></code>"
to finalize the rolling upgrade.</li>
</ul></li>
</ol>
<h4>Upgrading Federated Clusters</h4>
<p>
In a federated cluster, there are multiple namespaces
and a pair of active and standby <em>NNs</em> for each namespace.
The procedure for upgrading a federated cluster is similar to upgrading a non-federated cluster
except that Step 1 and Step 4 are performed on each namespace
and Step 2 is performed on each pair of active and standby <em>NNs</em>, i.e.
</p>
<ol>
<li>Prepare Rolling Upgrade for Each Namespace</li>
<li>Upgrade Active and Standby <em>NN</em> pairs for Each Namespace</li>
<li>Upgrade <em>DNs</em></li>
<li>Finalize Rolling Upgrade for Each Namespace</li>
</ol>
</subsection>
<subsection name="Upgrade with Downtime" id="UpgradeWithDowntime">
<p>
For non-HA clusters,
it is impossible to upgrade HDFS without downtime since it requires restarting the namenodes.
However, datanodes can still be upgraded in a rolling manner.
</p>
<h4>Upgrading Non-HA Clusters</h4>
<p>
In a non-HA cluster, there is a <em>NameNode (NN)</em>, a <em>SecondaryNameNode (SNN)</em>
and many <em>DataNodes (DNs)</em>.
The procedure for upgrading a non-HA cluster is similar to upgrading an HA cluster
except that Step 2 "Upgrade Active and Standby <em>NNs</em>" is replaced by the following:
</p>
<ul>
<li>Upgrade <em>NN</em> and <em>SNN</em><ol>
<li>Shutdown <em>SNN</em></li>
<li>Shutdown and upgrade <em>NN</em>.</li>
<li>Start <em>NN</em> with the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade started</code></a>" option.</li>
<li>Upgrade and restart <em>SNN</em></li>
</ol></li>
</ul>
</subsection>
</section>
<section name="Downgrade and Rollback" id="DowngradeAndRollback">
<p>
When the upgraded release is undesirable
or, in some unlikely case, the upgrade fails (due to bugs in the newer release),
administrators may choose to downgrade HDFS back to the pre-upgrade release,
or rollback HDFS to the pre-upgrade release and the pre-upgrade state.
</p>
<p>
Note that downgrade can be done in a rolling fashion but rollback cannot.
Rollback requires cluster downtime.
</p>
<p>
Note also that downgrade and rollback are possible only after a rolling upgrade is started and
before the upgrade is terminated.
An upgrade can be terminated by either finalize, downgrade or rollback.
Therefore, it may not be possible to perform rollback after finalize or downgrade,
or to perform downgrade after finalize.
</p>
</section>
<section name="Downgrade" id="Downgrade">
<p>
<em>Downgrade</em> restores the software back to the pre-upgrade release
and preserves the user data.
Suppose time <em>T</em> is the rolling upgrade start time and the upgrade is terminated by downgrade.
Then, the files created before or after <em>T</em> remain available in HDFS.
The files deleted before or after <em>T</em> remain deleted in HDFS.
</p>
<p>
A newer release is downgradable to the pre-upgrade release
only if both the namenode layout version and the datanode layout version
are not changed between these two releases.
</p>
<subsection name="Downgrade without Downtime" id="DowngradeWithoutDowntime">
<p>
In an HA cluster,
when a rolling upgrade from an old software release to a new software release is in progress,
it is possible to downgrade, in a rolling fashion, the upgraded machines back to the old software release.
Same as before, suppose <em>NN1</em> and <em>NN2</em> are respectively in active and standby states.
Below are the steps for rolling downgrade:
</p>
<ol>
<li>Downgrade <em>DNs</em><ol>
<li>Choose a small subset of datanodes (e.g. all datanodes under a particular rack).</li>
<ol>
<li>Run "<code><a href="#dfsadmin_-shutdownDatanode">hdfs dfsadmin -shutdownDatanode &lt;DATANODE_HOST:IPC_PORT&gt; upgrade</a></code>"
to shutdown one of the chosen datanodes.</li>
<li>Run "<code><a href="#dfsadmin_-getDatanodeInfo">hdfs dfsadmin -getDatanodeInfo &lt;DATANODE_HOST:IPC_PORT&gt;</a></code>"
to check and wait for the datanode to shutdown.</li>
<li>Downgrade and restart the datanode.</li>
<li>Perform the above steps for all the chosen datanodes in the subset in parallel.</li>
</ol>
<li>Repeat the above steps until all upgraded datanodes in the cluster are downgraded.</li>
</ol></li>
<li>Downgrade Active and Standby <em>NNs</em><ol>
<li>Shutdown and downgrade <em>NN2</em>.</li>
<li>Start <em>NN2</em> as standby normally. (Note that it is incorrect to use the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade downgrade</code></a>"
option here.)
</li>
<li>Failover from <em>NN1</em> to <em>NN2</em>
so that <em>NN2</em> becomes active and <em>NN1</em> becomes standby.</li>
<li>Shutdown and downgrade <em>NN1</em>.</li>
<li>Start <em>NN1</em> as standby normally. (Note that it is incorrect to use the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade downgrade</code></a>"
option here.)
</li>
</ol></li>
<li>Finalize Rolling Downgrade<ul>
<li>Run "<code><a href="#dfsadmin_-rollingUpgrade">hdfs dfsadmin -rollingUpgrade finalize</a></code>"
to finalize the rolling downgrade.</li>
</ul></li>
</ol>
<p>
Note that the datanodes must be downgraded before downgrading the namenodes
since protocols may be changed in a backward compatible manner but not forward compatible,
i.e. old datanodes can talk to the new namenodes but not vice versa.
</p>
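<p>
The datanode steps above can be sketched as a small shell function.
This is only an illustration: the host names and the IPC port <code>50020</code>
are placeholders, the <code>run</code> helper and <code>DRY_RUN</code> variable are hypothetical,
and the batch is processed sequentially for simplicity rather than in parallel.
By default the commands are printed, not executed.
</p>

```shell
#!/bin/sh
# Sketch: downgrade one batch of datanodes, following the steps above.
# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Usage: downgrade_batch HOST:IPC_PORT...
downgrade_batch() {
  for dn in "$@"; do
    # Ask the datanode to shut down and tell clients to expect a restart.
    run hdfs dfsadmin -shutdownDatanode "$dn" upgrade
    # Check the datanode; in a real run, repeat this until it stops answering.
    run hdfs dfsadmin -getDatanodeInfo "$dn"
    # Restoring the old release and restarting is site-specific.
    echo "# TODO: downgrade and restart $dn"
  done
}

# Placeholder hosts; replace with the datanodes of one rack.
downgrade_batch dn1.example.com:50020 dn2.example.com:50020
```

<p>
Setting <code>DRY_RUN=0</code> makes the helper execute the commands instead of printing them.
</p>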
</subsection>
<subsection name="Downgrade with Downtime" id="DowngradeWithDowntime">
<p>
An administrator may choose to first shut down the cluster and then downgrade it.
The following are the steps:
</p>
<ol>
<li>Shutdown all <em>NNs</em> and <em>DNs</em>.</li>
<li>Restore the pre-upgrade release in all machines.</li>
<li>Start <em>NNs</em> with the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade downgrade</code></a>" option.</li>
<li>Start <em>DNs</em> normally.</li>
</ol>
</subsection>
</section>
<section name="Rollback" id="Rollback">
<p>
<em>Rollback</em> restores the software back to the pre-upgrade release
and also reverts the user data back to the pre-upgrade state.
Suppose time <em>T</em> is the rolling upgrade start time and the upgrade is terminated by rollback.
The files created before <em>T</em> remain available in HDFS but the files created after <em>T</em> become unavailable.
The files deleted before <em>T</em> remain deleted in HDFS but the files deleted after <em>T</em> are restored.
</p>
<p>
Rollback from a newer release to the pre-upgrade release is always supported.
However, it cannot be done in a rolling fashion. It requires cluster downtime.
Suppose <em>NN1</em> and <em>NN2</em> are respectively in active and standby states.
Below are the steps for rollback:
</p>
<ul>
<li>Rollback HDFS<ol>
<li>Shutdown all <em>NNs</em> and <em>DNs</em>.</li>
<li>Restore the pre-upgrade release in all machines.</li>
<li>Start <em>NN1</em> as Active with the
"<a href="#namenode_-rollingUpgrade"><code>-rollingUpgrade rollback</code></a>" option.</li>
<li>Run "<code>-bootstrapStandby</code>" on <em>NN2</em> and start it normally as standby.</li>
<li>Start <em>DNs</em> with the "<code>-rollback</code>" option.</li>
</ol></li>
</ul>
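<p>
As a rough sketch, the start-up commands of the rollback procedure can be listed per host.
The function name <code>rollback_plan</code> and the host names are hypothetical;
the function only prints the plan and assumes that everything is already stopped
and that the pre-upgrade release has been restored on all machines.
</p>

```shell
#!/bin/sh
# Sketch: print, per host, the start-up commands for a rollback.
# Assumes steps 1 and 2 (shutdown, restore old release) are already done.
# Usage: rollback_plan NN1_HOST NN2_HOST DN_HOST...
rollback_plan() {
  nn1=$1; nn2=$2; shift 2
  # Step 3: start NN1 (the namenode that is to be active) with the rollback option.
  echo "$nn1: hdfs namenode -rollingUpgrade rollback"
  # Step 4: re-initialise NN2 from NN1, then start it normally as standby.
  echo "$nn2: hdfs namenode -bootstrapStandby"
  echo "$nn2: hdfs namenode"
  # Step 5: start every datanode with the -rollback option.
  for dn in "$@"; do
    echo "$dn: hdfs datanode -rollback"
  done
}

rollback_plan nn1.example.com nn2.example.com dn1.example.com dn2.example.com
```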
</section>
<section name="Commands and Startup Options for Rolling Upgrade" id="dfsadminCommands">
<subsection name="DFSAdmin Commands" id="dfsadminCommands">
<h4><code>dfsadmin -rollingUpgrade</code></h4>
<source>hdfs dfsadmin -rollingUpgrade &lt;query|prepare|finalize&gt;</source>
<p>
Execute a rolling upgrade action.
<ul><li>Options:<table>
<tr><td><code>query</code></td><td>Query the current rolling upgrade status.</td></tr>
<tr><td><code>prepare</code></td><td>Prepare a new rolling upgrade.</td></tr>
<tr><td><code>finalize</code></td><td>Finalize the current rolling upgrade.</td></tr>
</table></li></ul>
</p>
<h4><code>dfsadmin -getDatanodeInfo</code></h4>
<source>hdfs dfsadmin -getDatanodeInfo &lt;DATANODE_HOST:IPC_PORT&gt;</source>
<p>
Get the information about the given datanode.
This command can be used for checking if a datanode is alive
like the Unix <code>ping</code> command.
</p>
<h4><code>dfsadmin -shutdownDatanode</code></h4>
<source>hdfs dfsadmin -shutdownDatanode &lt;DATANODE_HOST:IPC_PORT&gt; [upgrade]</source>
<p>
Submit a shutdown request for the given datanode.
If the optional <code>upgrade</code> argument is specified,
clients accessing the datanode will be advised to wait for it to restart
and the fast start-up mode will be enabled.
When the restart does not happen in time, clients will time out and ignore the datanode.
In such a case, the fast start-up mode will also be disabled.
</p>
<p>
Note that the command does not wait for the datanode shutdown to complete.
The "<a href="#dfsadmin_-getDatanodeInfo">dfsadmin -getDatanodeInfo</a>"
command can be used for checking if the datanode shutdown is completed.
</p>
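<p>
Because the shutdown request returns immediately, a script usually has to poll.
The following sketch does so with <code>-getDatanodeInfo</code>.
It relies on one assumption that should be verified for the release in use:
that the command exits with a non-zero status once the datanode can no longer be reached.
The function name and its retry parameters are hypothetical.
</p>

```shell
#!/bin/sh
# Sketch: poll a datanode until it stops answering -getDatanodeInfo.
# Assumption: the command exits non-zero once the datanode is unreachable.
# Usage: wait_for_shutdown HOST:IPC_PORT [MAX_TRIES] [SLEEP_SECONDS]
wait_for_shutdown() {
  dn=$1; max=${2:-30}; pause=${3:-2}; i=0
  while [ "$i" -lt "$max" ]; do
    if ! hdfs dfsadmin -getDatanodeInfo "$dn" >/dev/null 2>/dev/null; then
      return 0    # no answer: the datanode is down
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1        # still answering after MAX_TRIES polls
}
```

<p>
A caller would typically run <code>hdfs dfsadmin -shutdownDatanode</code> first
and then call <code>wait_for_shutdown</code> with the same <code>HOST:IPC_PORT</code>,
treating a non-zero return as a timeout.
</p>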
</subsection>
<subsection name="NameNode Startup Options" id="dfsadminCommands">
<h4><code>namenode -rollingUpgrade</code></h4>
<source>hdfs namenode -rollingUpgrade &lt;downgrade|rollback|started&gt;</source>
<p>
When a rolling upgrade is in progress,
the <code>-rollingUpgrade</code> namenode startup option is used to specify
various rolling upgrade options.
</p>
<ul><li>Options:<table>
<tr><td><code>downgrade</code></td>
<td>Restores the namenode back to the pre-upgrade release
and preserves the user data.</td>
</tr>
<tr><td><code>rollback</code></td>
<td>Restores the namenode back to the pre-upgrade release
and also reverts the user data back to the pre-upgrade state.</td>
</tr>
<tr><td><code>started</code></td>
<td>Specifies a rolling upgrade already started
so that the namenode should allow image directories
with different layout versions during startup.</td>
</tr>
</table></li></ul>
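<p>
A wrapper script may want to reject misspelled options before starting a namenode.
The sketch below builds the start-up command only for the three values listed above;
the function name <code>namenode_start_cmd</code> is hypothetical and the command is printed, not run.
</p>

```shell
#!/bin/sh
# Sketch: build the namenode start-up command for a rolling-upgrade option,
# accepting only the three documented values.
# Usage: namenode_start_cmd OPTION
namenode_start_cmd() {
  case "$1" in
    downgrade|rollback|started)
      echo "hdfs namenode -rollingUpgrade $1" ;;
    *)
      return 1 ;;    # not a documented option
  esac
}

namenode_start_cmd downgrade
```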
</subsection>
</section>
</body>
</document>

@ -1,303 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document xmlns="http://maven.apache.org/XDOC/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
<properties>
<title>HDFS Snapshots</title>
</properties>
<body>
<h1>HDFS Snapshots</h1>
<macro name="toc">
<param name="section" value="0"/>
<param name="fromDepth" value="0"/>
<param name="toDepth" value="4"/>
</macro>
<section name="Overview" id="Overview">
<p>
HDFS Snapshots are read-only point-in-time copies of the file system.
Snapshots can be taken on a subtree of the file system or the entire file system.
Some common use cases of snapshots are data backup, protection against user errors
and disaster recovery.
</p>
<p>
The implementation of HDFS Snapshots is efficient:
</p>
<ul>
<li>Snapshot creation is instantaneous:
the cost is <em>O(1)</em> excluding the inode lookup time.</li>
<li>Additional memory is used only when modifications are made relative to a snapshot:
memory usage is <em>O(M)</em>,
where <em>M</em> is the number of modified files/directories.</li>
<li>Blocks in datanodes are not copied:
the snapshot files record the block list and the file size.
There is no data copying.</li>
<li>Snapshots do not adversely affect regular HDFS operations:
modifications are recorded in reverse chronological order
so that the current data can be accessed directly.
The snapshot data is computed by subtracting the modifications
from the current data.</li>
</ul>
<subsection name="Snapshottable Directories" id="SnapshottableDirectories">
<p>
Snapshots can be taken on any directory once the directory has been set as
<em>snapshottable</em>.
A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
There is no limit on the number of snapshottable directories.
Administrators may set any directory to be snapshottable.
If there are snapshots in a snapshottable directory,
the directory can be neither deleted nor renamed
before all the snapshots are deleted.
</p>
<p>
Nested snapshottable directories are currently not allowed.
In other words, a directory cannot be set to snapshottable
if one of its ancestors/descendants is a snapshottable directory.
</p>
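<p>
The nesting rule can be checked on path strings before calling <code>-allowSnapshot</code>.
The following is a minimal sketch under the assumption that both paths are absolute and already normalized;
the function name <code>is_nested</code> is hypothetical and no HDFS call is made.
</p>

```shell
#!/bin/sh
# Sketch: would CANDIDATE and EXISTING form a nested pair?
# Returns 0 when one path is a strict ancestor of the other, 1 otherwise.
# Usage: is_nested CANDIDATE EXISTING
is_nested() {
  a=${1%/}/    # a trailing slash keeps /foo from matching /foobar
  b=${2%/}/
  if [ "$a" = "$b" ]; then return 1; fi   # same directory, not nested
  case "$a" in "$b"*) return 0 ;; esac    # EXISTING is an ancestor of CANDIDATE
  case "$b" in "$a"*) return 0 ;; esac    # EXISTING is a descendant of CANDIDATE
  return 1
}

if is_nested /foo/bar /foo; then
  echo "/foo/bar cannot be made snapshottable while /foo is"
fi
```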
</subsection>
<subsection name="Snapshot Paths" id="SnapshotPaths">
<p>
For a snapshottable directory,
the path component <em>".snapshot"</em> is used for accessing its snapshots.
Suppose <code>/foo</code> is a snapshottable directory,
<code>/foo/bar</code> is a file/directory in <code>/foo</code>,
and <code>/foo</code> has a snapshot <code>s0</code>.
Then, the path <source>/foo/.snapshot/s0/bar</source>
refers to the snapshot copy of <code>/foo/bar</code>.
The usual API and CLI can work with the ".snapshot" paths.
The following are some examples.
</p>
<ul>
<li>Listing all the snapshots under a snapshottable directory:
<source>hdfs dfs -ls /foo/.snapshot</source></li>
<li>Listing the files in snapshot <code>s0</code>:
<source>hdfs dfs -ls /foo/.snapshot/s0</source></li>
<li>Copying a file from snapshot <code>s0</code>:
<source>hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp</source>
<p>Note that this example uses the preserve option to preserve
timestamps, ownership, permission, ACLs and XAttrs.</p></li>
</ul>
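<p>
Scripts that restore files often need to translate a live path into its snapshot path.
A minimal sketch of that mapping follows; the function name <code>snapshot_path</code> is hypothetical
and the function works purely on strings.
</p>

```shell
#!/bin/sh
# Sketch: map a path under a snapshottable directory to its copy in a snapshot.
# Usage: snapshot_path SNAPSHOTTABLE_DIR SNAPSHOT_NAME PATH
snapshot_path() {
  root=${1%/}; snap=$2; path=$3
  case "$path" in
    "$root"/*)
      rel=${path#"$root"/}                  # part of PATH below the directory
      echo "$root/.snapshot/$snap/$rel" ;;
    *)
      return 1 ;;                           # PATH is not under SNAPSHOTTABLE_DIR
  esac
}

snapshot_path /foo s0 /foo/bar    # prints /foo/.snapshot/s0/bar
```

<p>
The result can be passed to the usual commands, for example as the source of <code>hdfs dfs -cp</code>.
</p>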
</subsection>
</section>
<section name="Upgrading to a version of HDFS with snapshots" id="Upgrade">
<p>
The HDFS snapshot feature introduces a new reserved path name used to
interact with snapshots: <tt>.snapshot</tt>. When upgrading from an
older version of HDFS, existing paths named <tt>.snapshot</tt> need
to first be renamed or deleted to avoid conflicting with the reserved path.
See the upgrade section in
<a href="HdfsUserGuide.html#Upgrade_and_Rollback">the HDFS user guide</a>
for more information. </p>
</section>
<section name="Snapshot Operations" id="SnapshotOperations">
<subsection name="Administrator Operations" id="AdministratorOperations">
<p>
The operations described in this section require superuser privilege.
</p>
<h4>Allow Snapshots</h4>
<p>
Allow snapshots of a directory to be created.
If the operation completes successfully, the directory becomes snapshottable.
</p>
<ul>
<li>Command:
<source>hdfs dfsadmin -allowSnapshot &lt;path&gt;</source></li>
<li>Arguments:<table>
<tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
</table></li>
</ul>
<p>
See also the corresponding Java API
<code>void allowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
</p>
<h4>Disallow Snapshots</h4>
<p>
Disallow snapshots of a directory from being created.
All snapshots of the directory must be deleted before disallowing snapshots.
</p>
<ul>
<li>Command:
<source>hdfs dfsadmin -disallowSnapshot &lt;path&gt;</source></li>
<li>Arguments:<table>
<tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
</table></li>
</ul>
<p>
See also the corresponding Java API
<code>void disallowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
</p>
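<p>
Since all snapshots must be deleted first, disallowing snapshots is usually a two-stage operation.
The sketch below prints the required commands for a given list of snapshot names;
the <code>run</code> helper, the <code>DRY_RUN</code> variable and the function name are hypothetical,
and the directory and snapshot names are placeholders.
</p>

```shell
#!/bin/sh
# Sketch: delete the named snapshots, then disallow snapshots on the directory.
# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Usage: disallow_snapshots DIR SNAPSHOT_NAME...
disallow_snapshots() {
  dir=$1; shift
  for snap in "$@"; do
    run hdfs dfs -deleteSnapshot "$dir" "$snap"
  done
  run hdfs dfsadmin -disallowSnapshot "$dir"
}

disallow_snapshots /foo s0 s1
```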
</subsection>
<subsection name="User Operations" id="UserOperations">
<p>
This section describes user operations.
Note that the HDFS superuser can perform all the operations
without satisfying the permission requirement in the individual operations.
</p>
<h4>Create Snapshots</h4>
<p>
Create a snapshot of a snapshottable directory.
This operation requires owner privilege of the snapshottable directory.
</p>
<ul>
<li>Command:
<source>hdfs dfs -createSnapshot &lt;path&gt; [&lt;snapshotName&gt;]</source></li>
<li>Arguments:<table>
<tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
<tr><td>snapshotName</td><td>
The snapshot name, which is an optional argument.
When it is omitted, a default name is generated using a timestamp with the format
<code>"'s'yyyyMMdd-HHmmss.SSS"</code>, e.g. "s20130412-151029.033".
</td></tr>
</table></li>
</ul>
<p>
See also the corresponding Java API
<code>Path createSnapshot(Path path)</code> and
<code>Path createSnapshot(Path path, String snapshotName)</code>
in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
The snapshot path is returned in these methods.
</p>
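<p>
When the default timestamp name is not descriptive enough, a script can pass its own name.
The sketch below builds a name from a prefix and the current date;
the <code>run</code> helper, the <code>DRY_RUN</code> variable, the function name and the
<code>PREFIX-yyyyMMdd</code> naming scheme are all assumptions made for illustration.
</p>

```shell
#!/bin/sh
# Sketch: create a snapshot named PREFIX-yyyyMMdd.
# With DRY_RUN=1 (the default) the command is only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Usage: create_named_snapshot DIR PREFIX
create_named_snapshot() {
  name="$2-$(date +%Y%m%d)"
  run hdfs dfs -createSnapshot "$1" "$name"
}

create_named_snapshot /foo nightly
```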
<h4>Delete Snapshots</h4>
<p>
Delete a snapshot from a snapshottable directory.
This operation requires owner privilege of the snapshottable directory.
</p>
<ul>
<li>Command:
<source>hdfs dfs -deleteSnapshot &lt;path&gt; &lt;snapshotName&gt;</source></li>
<li>Arguments:<table>
<tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
<tr><td>snapshotName</td><td>The snapshot name.</td></tr>
</table></li>
</ul>
<p>
See also the corresponding Java API
<code>void deleteSnapshot(Path path, String snapshotName)</code>
in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
</p>
<h4>Rename Snapshots</h4>
<p>
Rename a snapshot.
This operation requires owner privilege of the snapshottable directory.
</p>
<ul>
<li>Command:
<source>hdfs dfs -renameSnapshot &lt;path&gt; &lt;oldName&gt; &lt;newName&gt;</source></li>
<li>Arguments:<table>
<tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
<tr><td>oldName</td><td>The old snapshot name.</td></tr>
<tr><td>newName</td><td>The new snapshot name.</td></tr>
</table></li>
</ul>
<p>
See also the corresponding Java API
<code>void renameSnapshot(Path path, String oldName, String newName)</code>
in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
</p>
<h4>Get Snapshottable Directory Listing</h4>
<p>
Get all the snapshottable directories where the current user has permission to take snapshots.
</p>
<ul>
<li>Command:
<source>hdfs lsSnapshottableDir</source></li>
<li>Arguments: none</li>
</ul>
<p>
See also the corresponding Java API
<code>SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()</code>
in <code>DistributedFileSystem</code>.
</p>
<h4>Get Snapshots Difference Report</h4>
<p>
Get the differences between two snapshots.
This operation requires read access privilege for all files/directories in both snapshots.
</p>
<ul>
<li>Command:
<source>hdfs snapshotDiff &lt;path&gt; &lt;fromSnapshot&gt; &lt;toSnapshot&gt;</source></li>
<li>Arguments:<table>
<tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
<tr><td>fromSnapshot</td><td>The name of the starting snapshot.</td></tr>
<tr><td>toSnapshot</td><td>The name of the ending snapshot.</td></tr>
</table></li>
<p>
Note that snapshotDiff can be used to get the difference report between two snapshots, or between
a snapshot and the current status of a directory. Users can use "." to represent the current status.
</p>
<li>Results:
<table>
<tr><td>+</td><td>The file/directory has been created.</td></tr>
<tr><td>-</td><td>The file/directory has been deleted.</td></tr>
<tr><td>M</td><td>The file/directory has been modified.</td></tr>
<tr><td>R</td><td>The file/directory has been renamed.</td></tr>
</table>
</li>
</ul>
<p>
A <em>RENAME</em> entry indicates a file/directory has been renamed but
is still under the same snapshottable directory. A file/directory is
reported as deleted if it was renamed to outside of the snapshottable directory.
A file/directory renamed from outside of the snapshottable directory is
reported as newly created.
</p>
<p>
The snapshot difference report does not guarantee the same operation sequence.
For example, if we rename the directory <em>"/foo"</em> to <em>"/foo2"</em>, and
then append new data to the file <em>"/foo2/bar"</em>, the difference report will
be:
<source>
R. /foo -> /foo2
M. /foo/bar
</source>
I.e., the changes on the files/directories under a renamed directory are
reported using the original path before the rename (<em>"/foo/bar"</em> in
the above example).
</p>
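<p>
A difference report can be summarized by counting entries per result type.
The sketch below reads a report on standard input and relies only on the first character of each line;
the function name <code>summarize_diff</code> is hypothetical,
and any line that does not start with one of the four markers is ignored.
</p>

```shell
#!/bin/sh
# Sketch: count the entries of a snapshot difference report by type.
# Reads the report on standard input; other lines are ignored.
summarize_diff() {
  created=0; deleted=0; modified=0; renamed=0
  while IFS= read -r line; do
    case "$line" in
      +*) created=$((created + 1)) ;;
      -*) deleted=$((deleted + 1)) ;;
      M*) modified=$((modified + 1)) ;;
      R*) renamed=$((renamed + 1)) ;;
    esac
  done
  echo "created=$created deleted=$deleted modified=$modified renamed=$renamed"
}

# The two-line report from the example above:
printf 'R. /foo -> /foo2\nM. /foo/bar\n' | summarize_diff
# prints: created=0 deleted=0 modified=1 renamed=1
```

<p>
On a cluster, the input would come from a command such as
<code>hdfs snapshotDiff /foo s0 .</code> piped into the function,
where "." stands for the current state of the directory.
</p>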
<p>
See also the corresponding Java API
<code>SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)</code>
in <code>DistributedFileSystem</code>.
</p>
</subsection>
</section>
</body>
</document>