diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index bbb6066e17d..445c50f8410 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -911,6 +911,9 @@ Release 2.8.0 - UNRELEASED
HDFS-7116. Add a command to get the balancer bandwidth
(Rakesh R via vinayakumarb)
+ HDFS-8974. Convert docs in xdoc format to markdown.
+ (Masatake Iwasaki via aajisaka)
+
OPTIMIZATIONS
HDFS-8026. Trace FSOutputSummer#writeChecksumChunks rather than
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsRollingUpgrade.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsRollingUpgrade.md
new file mode 100644
index 00000000000..5415912b8f4
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsRollingUpgrade.md
@@ -0,0 +1,293 @@
+
+
+HDFS Rolling Upgrade
+====================
+
+* [Introduction](#Introduction)
+* [Upgrade](#Upgrade)
+    * [Upgrade without Downtime](#Upgrade_without_Downtime)
+        * [Upgrading Non-Federated Clusters](#Upgrading_Non-Federated_Clusters)
+        * [Upgrading Federated Clusters](#Upgrading_Federated_Clusters)
+    * [Upgrade with Downtime](#Upgrade_with_Downtime)
+        * [Upgrading Non-HA Clusters](#Upgrading_Non-HA_Clusters)
+* [Downgrade and Rollback](#Downgrade_and_Rollback)
+* [Downgrade](#Downgrade)
+* [Rollback](#Rollback)
+* [Commands and Startup Options for Rolling Upgrade](#Commands_and_Startup_Options_for_Rolling_Upgrade)
+    * [DFSAdmin Commands](#DFSAdmin_Commands)
+        * [dfsadmin -rollingUpgrade](#dfsadmin_-rollingUpgrade)
+        * [dfsadmin -getDatanodeInfo](#dfsadmin_-getDatanodeInfo)
+        * [dfsadmin -shutdownDatanode](#dfsadmin_-shutdownDatanode)
+    * [NameNode Startup Options](#NameNode_Startup_Options)
+        * [namenode -rollingUpgrade](#namenode_-rollingUpgrade)
+
+
+Introduction
+------------
+
+*HDFS rolling upgrade* allows upgrading individual HDFS daemons.
+For example, the datanodes can be upgraded independently of the namenodes.
+A namenode can be upgraded independently of the other namenodes.
+The namenodes can be upgraded independently of datanodes and journal nodes.
+
+
+Upgrade
+-------
+
+In Hadoop v2, HDFS supports highly-available (HA) namenode services and wire compatibility.
+These two capabilities make it feasible to upgrade HDFS without incurring HDFS downtime.
+To upgrade an HDFS cluster without downtime, the cluster must be set up with HA.
+
+If a new feature is enabled in the new software release, it may not work with the old software release after the upgrade.
+In such cases, the upgrade should be done with the following steps.
+
+1. Disable new feature.
+2. Upgrade the cluster.
+3. Enable the new feature.
+
+Note that rolling upgrade is supported only from Hadoop-2.4.0 onwards.
+
+
+### Upgrade without Downtime
+
+In an HA cluster, there are two or more *NameNodes (NNs)*, many *DataNodes (DNs)*,
+a few *JournalNodes (JNs)* and a few *ZooKeeperNodes (ZKNs)*.
+*JNs* are relatively stable and, in most cases, do not require an upgrade when upgrading HDFS.
+In the rolling upgrade procedure described here,
+only *NNs* and *DNs* are considered but *JNs* and *ZKNs* are not.
+Upgrading *JNs* and *ZKNs* may incur cluster downtime.
+
+#### Upgrading Non-Federated Clusters
+
+Suppose there are two namenodes *NN1* and *NN2*,
+where *NN1* and *NN2* are respectively in active and standby states.
+The following are the steps for upgrading an HA cluster:
+
+1. Prepare Rolling Upgrade
+    1. Run "[`hdfs dfsadmin -rollingUpgrade prepare`](#dfsadmin_-rollingUpgrade)"
+       to create an fsimage for rollback.
+    1. Run "[`hdfs dfsadmin -rollingUpgrade query`](#dfsadmin_-rollingUpgrade)"
+       to check the status of the rollback image.
+       Wait and re-run the command until
+       the "`Proceed with rolling upgrade`" message is shown.
+1. Upgrade Active and Standby *NNs*
+    1. Shut down and upgrade *NN2*.
+    1. Start *NN2* as standby with the
+       "[`-rollingUpgrade started`](#namenode_-rollingUpgrade)" option.
+    1. Fail over from *NN1* to *NN2*
+       so that *NN2* becomes active and *NN1* becomes standby.
+    1. Shut down and upgrade *NN1*.
+    1. Start *NN1* as standby with the
+       "[`-rollingUpgrade started`](#namenode_-rollingUpgrade)" option.
+1. Upgrade *DNs*
+    1. Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
+    1. Run "[`hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade`](#dfsadmin_-shutdownDatanode)"
+       to shut down one of the chosen datanodes.
+    1. Run "[`hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>`](#dfsadmin_-getDatanodeInfo)"
+       to check and wait for the datanode to shut down.
+    1. Upgrade and restart the datanode.
+    1. Perform the above steps for all the chosen datanodes in the subset in parallel.
+    1. Repeat the above steps until all datanodes in the cluster are upgraded.
+1. Finalize Rolling Upgrade
+    1. Run "[`hdfs dfsadmin -rollingUpgrade finalize`](#dfsadmin_-rollingUpgrade)"
+       to finalize the rolling upgrade.
+
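The "wait and re-run" part of the Prepare step above could be scripted roughly as follows. This is a minimal sketch, not part of HDFS: the function name `wait_for_rollback_image` and its interval and attempt arguments are hypothetical, and it only assumes that the query output contains the `Proceed with rolling upgrade` message quoted above.

```shell
# Sketch: poll "hdfs dfsadmin -rollingUpgrade query" until the rollback
# image is ready. wait_for_rollback_image is a hypothetical helper name.
wait_for_rollback_image() {
  interval="${1:-10}"   # seconds between polls (assumed default)
  attempts="${2:-60}"   # maximum number of polls (assumed default)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The procedure says to re-run the query until this message is shown.
    if hdfs dfsadmin -rollingUpgrade query 2>&1 |
        grep -q "Proceed with rolling upgrade"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1   # gave up: the message never appeared
}

# Example use (after "hdfs dfsadmin -rollingUpgrade prepare"):
#   wait_for_rollback_image 10 60 && echo "safe to upgrade the namenodes"
```

The function returns non-zero when the message never appears, so a wrapper script can stop before touching any namenode.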
+
+#### Upgrading Federated Clusters
+
+In a federated cluster, there are multiple namespaces
+and a pair of active and standby *NNs* for each namespace.
+The procedure for upgrading a federated cluster is similar to upgrading a non-federated cluster
+except that Step 1 and Step 4 are performed on each namespace
+and Step 2 is performed on each pair of active and standby *NNs*, i.e.
+
+1. Prepare Rolling Upgrade for Each Namespace
+1. Upgrade Active and Standby *NN* pairs for Each Namespace
+1. Upgrade *DNs*
+1. Finalize Rolling Upgrade for Each Namespace
+
+
+### Upgrade with Downtime
+
+For non-HA clusters,
+it is impossible to upgrade HDFS without downtime since it requires restarting the namenodes.
+However, datanodes can still be upgraded in a rolling manner.
+
+
+#### Upgrading Non-HA Clusters
+
+In a non-HA cluster, there is a *NameNode (NN)*, a *SecondaryNameNode (SNN)*
+and many *DataNodes (DNs)*.
+The procedure for upgrading a non-HA cluster is similar to upgrading an HA cluster
+except that Step 2 "Upgrade Active and Standby *NNs*" is replaced with the following:
+
+* Upgrade *NN* and *SNN*
+    1. Shut down *SNN*.
+    1. Shut down and upgrade *NN*.
+    1. Start *NN* with the
+       "[`-rollingUpgrade started`](#namenode_-rollingUpgrade)" option.
+    1. Upgrade and restart *SNN*.
+
+
+Downgrade and Rollback
+----------------------
+
+When the upgraded release is undesirable
+or, in some unlikely case, the upgrade fails (due to bugs in the newer release),
+administrators may choose to downgrade HDFS back to the pre-upgrade release,
+or roll back HDFS to the pre-upgrade release and the pre-upgrade state.
+
+Note that downgrade can be done in a rolling fashion but rollback cannot.
+Rollback requires cluster downtime.
+
+Note also that downgrade and rollback are possible only after a rolling upgrade is started and
+before the upgrade is terminated.
+An upgrade can be terminated by either finalize, downgrade or rollback.
+Therefore, it may not be possible to perform rollback after finalize or downgrade,
+or to perform downgrade after finalize.
+
+
+Downgrade
+---------
+
+*Downgrade* restores the software back to the pre-upgrade release
+and preserves the user data.
+Suppose time *T* is the rolling upgrade start time and the upgrade is terminated by downgrade.
+Then, the files created before or after *T* remain available in HDFS.
+The files deleted before or after *T* remain deleted in HDFS.
+
+A newer release is downgradable to the pre-upgrade release
+only if both the namenode layout version and the datanode layout version
+are not changed between these two releases.
+
+In an HA cluster,
+when a rolling upgrade from an old software release to a new software release is in progress,
+it is possible to downgrade, in a rolling fashion, the upgraded machines back to the old software release.
+As before, suppose *NN1* and *NN2* are respectively in active and standby states.
+Below are the steps for rolling downgrade without downtime:
+
+1. Downgrade *DNs*
+    1. Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
+    1. Run "[`hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade`](#dfsadmin_-shutdownDatanode)"
+       to shut down one of the chosen datanodes.
+    1. Run "[`hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>`](#dfsadmin_-getDatanodeInfo)"
+       to check and wait for the datanode to shut down.
+    1. Downgrade and restart the datanode.
+    1. Perform the above steps for all the chosen datanodes in the subset in parallel.
+    1. Repeat the above steps until all upgraded datanodes in the cluster are downgraded.
+1. Downgrade Active and Standby *NNs*
+    1. Shut down and downgrade *NN2*.
+    1. Start *NN2* as standby normally.
+    1. Fail over from *NN1* to *NN2*
+       so that *NN2* becomes active and *NN1* becomes standby.
+    1. Shut down and downgrade *NN1*.
+    1. Start *NN1* as standby normally.
+1. Finalize Rolling Downgrade
+    1. Run "[`hdfs dfsadmin -rollingUpgrade finalize`](#dfsadmin_-rollingUpgrade)"
+       to finalize the rolling downgrade.
+
+Note that the datanodes must be downgraded before downgrading the namenodes
+because protocols may be changed in a backward-compatible but not forward-compatible manner,
+i.e. old datanodes can talk to the new namenodes, but new datanodes may not be able to talk to the old namenodes.
+
+
+Rollback
+--------
+
+*Rollback* restores the software back to the pre-upgrade release
+and also reverts the user data back to the pre-upgrade state.
+Suppose time *T* is the rolling upgrade start time and the upgrade is terminated by rollback.
+The files created before *T* remain available in HDFS but the files created after *T* become unavailable.
+The files deleted before *T* remain deleted in HDFS but the files deleted after *T* are restored.
+
+Rollback from a newer release to the pre-upgrade release is always supported.
+However, it cannot be done in a rolling fashion. It requires cluster downtime.
+Suppose *NN1* and *NN2* are respectively in active and standby states.
+Below are the steps for rollback:
+
+* Rollback HDFS
+    1. Shut down all *NNs* and *DNs*.
+    1. Restore the pre-upgrade release on all machines.
+    1. Start *NN1* as Active with the
+       "[`-rollingUpgrade rollback`](#namenode_-rollingUpgrade)" option.
+    1. Run `-bootstrapStandby` on *NN2* and start it normally as standby.
+    1. Start *DNs* with the "`-rollback`" option.
+
+
+Commands and Startup Options for Rolling Upgrade
+------------------------------------------------
+
+### DFSAdmin Commands
+
+#### `dfsadmin -rollingUpgrade`
+
+    hdfs dfsadmin -rollingUpgrade <query|prepare|finalize>
+
+Execute a rolling upgrade action.
+
+* Options:
+
+ | --- | --- |
+ | `query` | Query the current rolling upgrade status. |
+ | `prepare` | Prepare a new rolling upgrade. |
+ | `finalize` | Finalize the current rolling upgrade. |
+
+
+#### `dfsadmin -getDatanodeInfo`
+
+    hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>
+
+Get the information about the given datanode.
+This command can be used for checking if a datanode is alive
+like the Unix `ping` command.
+
+
+#### `dfsadmin -shutdownDatanode`
+
+    hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> [upgrade]
+
+Submit a shutdown request for the given datanode.
+If the optional `upgrade` argument is specified,
+clients accessing the datanode will be advised to wait for it to restart
+and the fast start-up mode will be enabled.
+When the restart does not happen in time, clients will time out and ignore the datanode.
+In that case, the fast start-up mode will also be disabled.
+
+Note that the command does not wait for the datanode shutdown to complete.
+The "[`dfsadmin -getDatanodeInfo`](#dfsadmin_-getDatanodeInfo)"
+command can be used for checking if the datanode shutdown is completed.
+
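Because the shutdown request returns immediately, a script usually has to poll before upgrading the datanode. The following is a rough sketch only: `wait_for_datanode_shutdown` is a hypothetical helper, the address in the usage comment is made up, and the sketch assumes that `hdfs dfsadmin -getDatanodeInfo` exits with a non-zero status once the datanode no longer answers, which should be checked against the release in use.

```shell
# Sketch: wait until a datanode stops answering -getDatanodeInfo.
# ASSUMPTION: the command exits non-zero when the datanode is unreachable.
wait_for_datanode_shutdown() {
  dn="$1"               # datanode address, e.g. host:ipc_port
  interval="${2:-5}"    # seconds between polls (assumed default)
  attempts="${3:-60}"   # maximum number of polls (assumed default)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if ! hdfs dfsadmin -getDatanodeInfo "$dn" >/dev/null 2>&1; then
      return 0          # no answer: treat the datanode as shut down
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1              # still answering after all attempts
}

# Example use (hypothetical address):
#   hdfs dfsadmin -shutdownDatanode dn1.example.com:50020 upgrade
#   wait_for_datanode_shutdown dn1.example.com:50020
```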
+
+### NameNode Startup Options
+
+#### `namenode -rollingUpgrade`
+
+    hdfs namenode -rollingUpgrade <rollback|started>
+
+When a rolling upgrade is in progress,
+the `-rollingUpgrade` namenode startup option is used to specify
+various rolling upgrade options.
+
+* Options:
+
+ | --- | --- |
+ | `rollback` | Restores the namenode back to the pre-upgrade release and also reverts the user data back to the pre-upgrade state. |
+ | `started` | Specifies that a rolling upgrade has already started, so that the namenode allows image directories with different layout versions during startup. |
+
+**WARN: the downgrade option is obsolete.**
+It is not necessary to start the namenode with the downgrade option explicitly.
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md
new file mode 100644
index 00000000000..94a37cd77c1
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md
@@ -0,0 +1,301 @@
+
+
+HDFS Snapshots
+==============
+
+* [HDFS Snapshots](#HDFS_Snapshots)
+    * [Overview](#Overview)
+        * [Snapshottable Directories](#Snapshottable_Directories)
+        * [Snapshot Paths](#Snapshot_Paths)
+    * [Upgrading to a version of HDFS with snapshots](#Upgrading_to_a_version_of_HDFS_with_snapshots)
+    * [Snapshot Operations](#Snapshot_Operations)
+        * [Administrator Operations](#Administrator_Operations)
+            * [Allow Snapshots](#Allow_Snapshots)
+            * [Disallow Snapshots](#Disallow_Snapshots)
+        * [User Operations](#User_Operations)
+            * [Create Snapshots](#Create_Snapshots)
+            * [Delete Snapshots](#Delete_Snapshots)
+            * [Rename Snapshots](#Rename_Snapshots)
+            * [Get Snapshottable Directory Listing](#Get_Snapshottable_Directory_Listing)
+            * [Get Snapshots Difference Report](#Get_Snapshots_Difference_Report)
+
+
+Overview
+--------
+
+HDFS Snapshots are read-only point-in-time copies of the file system.
+Snapshots can be taken on a subtree of the file system or the entire file system.
+Some common use cases of snapshots are data backup, protection against user errors
+and disaster recovery.
+
+The implementation of HDFS Snapshots is efficient:
+
+
+* Snapshot creation is instantaneous:
+ the cost is *O(1)* excluding the inode lookup time.
+
+* Additional memory is used only when modifications are made relative to a snapshot:
+ memory usage is *O(M)*,
+ where *M* is the number of modified files/directories.
+
+* Blocks in datanodes are not copied:
+ the snapshot files record the block list and the file size.
+ There is no data copying.
+
+* Snapshots do not adversely affect regular HDFS operations:
+ modifications are recorded in reverse chronological order
+ so that the current data can be accessed directly.
+ The snapshot data is computed by subtracting the modifications
+ from the current data.
+
+
+### Snapshottable Directories
+
+Snapshots can be taken on any directory once the directory has been set as
+*snapshottable*.
+A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
+There is no limit on the number of snapshottable directories.
+Administrators may set any directory to be snapshottable.
+If there are snapshots in a snapshottable directory,
+the directory can be neither deleted nor renamed
+before all the snapshots are deleted.
+
+Nested snapshottable directories are currently not allowed.
+In other words, a directory cannot be set to snapshottable
+if one of its ancestors/descendants is a snapshottable directory.
+
+
+### Snapshot Paths
+
+For a snapshottable directory,
+the path component *".snapshot"* is used for accessing its snapshots.
+Suppose `/foo` is a snapshottable directory,
+`/foo/bar` is a file/directory in `/foo`,
+and `/foo` has a snapshot `s0`.
+Then, the path `/foo/.snapshot/s0/bar`
+refers to the snapshot copy of `/foo/bar`.
+The usual API and CLI can work with the ".snapshot" paths.
+The following are some examples.
+
+* Listing all the snapshots under a snapshottable directory:
+
+        hdfs dfs -ls /foo/.snapshot
+
+* Listing the files in snapshot `s0`:
+
+        hdfs dfs -ls /foo/.snapshot/s0
+
+* Copying a file from snapshot `s0`:
+
+        hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
+
+    Note that this example uses the preserve option to preserve
+    timestamps, ownership, permission, ACLs and XAttrs.
+
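The path rule described above (snapshottable directory, then `.snapshot`, then the snapshot name, then the relative path) can be captured in a small helper. This is an illustrative sketch; `snapshot_path` is a hypothetical function name, not an HDFS command.

```shell
# Sketch: build the ".snapshot" path for a file inside a snapshot.
# snapshot_path is a hypothetical helper; it only manipulates strings.
snapshot_path() {
  dir="${1%/}"    # snapshottable directory, one trailing slash removed
  snap="$2"       # snapshot name
  rel="${3#/}"    # path relative to the directory, leading slash removed
  if [ -n "$rel" ]; then
    printf '%s/.snapshot/%s/%s\n' "$dir" "$snap" "$rel"
  else
    printf '%s/.snapshot/%s\n' "$dir" "$snap"
  fi
}

# Example use:
#   hdfs dfs -ls "$(snapshot_path /foo s0)"
#   hdfs dfs -cp -ptopax "$(snapshot_path /foo s0 bar)" /tmp
```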
+
+Upgrading to a version of HDFS with snapshots
+---------------------------------------------
+
+The HDFS snapshot feature introduces a new reserved path name used to
+interact with snapshots: `.snapshot`. When upgrading from an
+older version of HDFS, existing paths named `.snapshot` need
+to first be renamed or deleted to avoid conflicting with the reserved path.
+See the upgrade section in
+[the HDFS user guide](HdfsUserGuide.html#Upgrade_and_Rollback)
+for more information.
+
+
+Snapshot Operations
+-------------------
+
+
+### Administrator Operations
+
+The operations described in this section require superuser privilege.
+
+
+#### Allow Snapshots
+
+
+Allow the creation of snapshots of a directory.
+If the operation completes successfully, the directory becomes snapshottable.
+
+* Command:
+
+        hdfs dfsadmin -allowSnapshot <path>
+
+* Arguments:
+
+ | --- | --- |
+ | path | The path of the snapshottable directory. |
+
+See also the corresponding Java API
+`void allowSnapshot(Path path)` in `HdfsAdmin`.
+
+
+#### Disallow Snapshots
+
+Disallow the creation of snapshots of a directory.
+All snapshots of the directory must be deleted before disallowing snapshots.
+
+* Command:
+
+        hdfs dfsadmin -disallowSnapshot <path>
+
+* Arguments:
+
+ | --- | --- |
+ | path | The path of the snapshottable directory. |
+
+See also the corresponding Java API
+`void disallowSnapshot(Path path)` in `HdfsAdmin`.
+
+
+### User Operations
+
+This section describes user operations.
+Note that the HDFS superuser can perform all the operations
+without satisfying the permission requirements of the individual operations.
+
+
+#### Create Snapshots
+
+Create a snapshot of a snapshottable directory.
+This operation requires owner privilege of the snapshottable directory.
+
+* Command:
+
+        hdfs dfs -createSnapshot <path> [<snapshotName>]
+
+* Arguments:
+
+ | --- | --- |
+ | path | The path of the snapshottable directory. |
+ | snapshotName | The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format `"'s'yyyyMMdd-HHmmss.SSS"`, e.g. `"s20130412-151029.033"`. |
+
+See also the corresponding Java API
+`Path createSnapshot(Path path)` and
+`Path createSnapshot(Path path, String snapshotName)`
+in [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).
+The snapshot path is returned in these methods.
+
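When cleaning up snapshots by script, it can help to tell auto-named snapshots apart from user-named ones. The sketch below checks a name against the default format `'s'yyyyMMdd-HHmmss.SSS` given above; `is_default_snapshot_name` is a hypothetical helper, and it checks only the shape of the name, not whether the digits form a valid date.

```shell
# Sketch: does a snapshot name look like the generated default,
# i.e. 's' + yyyyMMdd + '-' + HHmmss + '.' + SSS ?
# is_default_snapshot_name is a hypothetical helper name.
is_default_snapshot_name() {
  printf '%s\n' "$1" | grep -Eq '^s[0-9]{8}-[0-9]{6}\.[0-9]{3}$'
}

# Example use:
#   is_default_snapshot_name s20130412-151029.033 && echo "auto-named"
```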
+
+#### Delete Snapshots
+
+Delete a snapshot from a snapshottable directory.
+This operation requires owner privilege of the snapshottable directory.
+
+* Command:
+
+        hdfs dfs -deleteSnapshot <path> <snapshotName>
+
+* Arguments:
+
+ | --- | --- |
+ | path | The path of the snapshottable directory. |
+ | snapshotName | The snapshot name. |
+
+See also the corresponding Java API
+`void deleteSnapshot(Path path, String snapshotName)`
+in [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).
+
+
+#### Rename Snapshots
+
+Rename a snapshot.
+This operation requires owner privilege of the snapshottable directory.
+
+* Command:
+
+        hdfs dfs -renameSnapshot <path> <oldName> <newName>
+
+* Arguments:
+
+ | --- | --- |
+ | path | The path of the snapshottable directory. |
+ | oldName | The old snapshot name. |
+ | newName | The new snapshot name. |
+
+See also the corresponding Java API
+`void renameSnapshot(Path path, String oldName, String newName)`
+in [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).
+
+
+#### Get Snapshottable Directory Listing
+
+Get all the snapshottable directories where the current user has permission to take snapshots.
+
+* Command:
+
+        hdfs lsSnapshottableDir
+
+* Arguments: none
+
+See also the corresponding Java API
+`SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()`
+in `DistributedFileSystem`.
+
+
+#### Get Snapshots Difference Report
+
+Get the differences between two snapshots.
+This operation requires read access privilege for all files/directories in both snapshots.
+
+* Command:
+
+        hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
+
+* Arguments:
+
+ | --- | --- |
+ | path | The path of the snapshottable directory. |
+ | fromSnapshot | The name of the starting snapshot. |
+ | toSnapshot | The name of the ending snapshot. |
+
+ Note that snapshotDiff can be used to get the difference report between two snapshots, or between
+ a snapshot and the current status of a directory. Users can use "." to represent the current status.
+
+* Results:
+
+ | --- | --- |
+ | \+ | The file/directory has been created. |
+ | \- | The file/directory has been deleted. |
+ | M | The file/directory has been modified. |
+ | R | The file/directory has been renamed. |
+
+A *RENAME* entry indicates a file/directory has been renamed but
+is still under the same snapshottable directory. A file/directory is
+reported as deleted if it was renamed to outside of the snapshottable directory.
+A file/directory renamed from outside of the snapshottable directory is
+reported as newly created.
+
+The snapshot difference report does not guarantee the same operation sequence.
+For example, if we rename the directory *"/foo"* to *"/foo2"*, and
+then append new data to the file *"/foo2/bar"*, the difference report will
+be:
+
+    R. /foo -> /foo2
+    M. /foo/bar
+
+I.e., the changes on the files/directories under a renamed directory are
+reported using the original path before the rename (*"/foo/bar"* in
+the above example).
+
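A script that correlates the report with the current file system has to translate current paths back to their pre-rename form. The sketch below does that for a single renamed directory; `diff_report_path` is a hypothetical helper, and it assumes the rename pair (old path, new path) has already been read from an `R` entry.

```shell
# Sketch: map a current path to the path the difference report uses,
# given one rename "old -> new". diff_report_path is a hypothetical helper.
diff_report_path() {
  old="$1"     # directory path before the rename, e.g. /foo
  new="$2"     # directory path after the rename, e.g. /foo2
  path="$3"    # current path to translate, e.g. /foo2/bar
  case "$path" in
    "$new")
      printf '%s\n' "$old" ;;
    "$new"/*)
      suffix=${path#"$new"}          # part below the renamed directory
      printf '%s\n' "$old$suffix" ;;
    *)
      printf '%s\n' "$path" ;;       # not under the renamed directory
  esac
}

# Example use, matching the report above:
#   diff_report_path /foo /foo2 /foo2/bar
```

Paths outside the renamed directory are returned unchanged, including siblings that merely share a prefix with the new name.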
+See also the corresponding Java API
+`SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)`
+in `DistributedFileSystem`.
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/xdoc/HdfsRollingUpgrade.xml b/hadoop-hdfs-project/hadoop-hdfs/src/site/xdoc/HdfsRollingUpgrade.xml
deleted file mode 100644
index f0b0ccf0df0..00000000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/xdoc/HdfsRollingUpgrade.xml
+++ /dev/null
@@ -1,329 +0,0 @@
-
-
-
-
-
- HDFS Rolling Upgrade
-
-
-
-
-
HDFS Rolling Upgrade
-
-
-
-
-
-
-
-
- HDFS rolling upgrade allows upgrading individual HDFS daemons.
- For examples, the datanodes can be upgraded independent of the namenodes.
- A namenode can be upgraded independent of the other namenodes.
- The namenodes can be upgraded independent of datanods and journal nodes.
-
-
-
-
-
- In Hadoop v2, HDFS supports highly-available (HA) namenode services and wire compatibility.
- These two capabilities make it feasible to upgrade HDFS without incurring HDFS downtime.
- In order to upgrade a HDFS cluster without downtime, the cluster must be setup with HA.
-
-
- If there is any new feature which is enabled in new software release, may not work with old software release after upgrade.
- In such cases upgrade should be done by following steps.
-
-
-
Disable new feature.
-
Upgrade the cluster.
-
Enable the new feature.
-
-
- Note that rolling upgrade is supported only from Hadoop-2.4.0 onwards.
-
-
-
- In a HA cluster, there are two or more NameNodes (NNs), many DataNodes (DNs),
- a few JournalNodes (JNs) and a few ZooKeeperNodes (ZKNs).
- JNs is relatively stable and does not require upgrade when upgrading HDFS in most of the cases.
- In the rolling upgrade procedure described here,
- only NNs and DNs are considered but JNs and ZKNs are not.
- Upgrading JNs and ZKNs may incur cluster downtime.
-
-
-
Upgrading Non-Federated Clusters
-
- Suppose there are two namenodes NN1 and NN2,
- where NN1 and NN2 are respectively in active and standby states.
- The following are the steps for upgrading a HA cluster:
-
Run "hdfs dfsadmin -rollingUpgrade query"
- to check the status of the rollback image.
- Wait and re-run the command until
- the "Proceed with rolling upgrade" message is shown.
-
- In a federated cluster, there are multiple namespaces
- and a pair of active and standby NNs for each namespace.
- The procedure for upgrading a federated cluster is similar to upgrading a non-federated cluster
- except that Step 1 and Step 4 are performed on each namespace
- and Step 2 is performed on each pair of active and standby NNs, i.e.
-
-
-
Prepare Rolling Upgrade for Each Namespace
-
Upgrade Active and Standby NN pairs for Each Namespace
-
Upgrade DNs
-
Finalize Rolling Upgrade for Each Namespace
-
-
-
-
-
-
- For non-HA clusters,
- it is impossible to upgrade HDFS without downtime since it requires restarting the namenodes.
- However, datanodes can still be upgraded in a rolling manner.
-
-
-
Upgrading Non-HA Clusters
-
- In a non-HA cluster, there are a NameNode (NN), a SecondaryNameNode (SNN)
- and many DataNodes (DNs).
- The procedure for upgrading a non-HA cluster is similar to upgrading a HA cluster
- except that Step 2 "Upgrade Active and Standby NNs" is changed to below:
-
- When the upgraded release is undesirable
- or, in some unlikely case, the upgrade fails (due to bugs in the newer release),
- administrators may choose to downgrade HDFS back to the pre-upgrade release,
- or rollback HDFS to the pre-upgrade release and the pre-upgrade state.
-
-
- Note that downgrade can be done in a rolling fashion but rollback cannot.
- Rollback requires cluster downtime.
-
-
- Note also that downgrade and rollback are possible only after a rolling upgrade is started and
- before the upgrade is terminated.
- An upgrade can be terminated by either finalize, downgrade or rollback.
- Therefore, it may not be possible to perform rollback after finalize or downgrade,
- or to perform downgrade after finalize.
-
-
-
-
-
- Downgrade restores the software back to the pre-upgrade release
- and preserves the user data.
- Suppose time T is the rolling upgrade start time and the upgrade is terminated by downgrade.
- Then, the files created before or after T remain available in HDFS.
- The files deleted before or after T remain deleted in HDFS.
-
-
- A newer release is downgradable to the pre-upgrade release
- only if both the namenode layout version and the datenode layout version
- are not changed between these two releases.
-
-
- In a HA cluster,
- when a rolling upgrade from an old software release to a new software release is in progress,
- it is possible to downgrade, in a rolling fashion, the upgraded machines back to the old software release.
- Same as before, suppose NN1 and NN2 are respectively in active and standby states.
- Below are the steps for rolling downgrade without downtime:
-
-
-
Downgrade DNs
-
Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
- Note that the datanodes must be downgraded before downgrading the namenodes
- since protocols may be changed in a backward compatible manner but not forward compatible,
- i.e. old datanodes can talk to the new namenodes but not vice versa.
-
-
-
-
-
- Rollback restores the software back to the pre-upgrade release
- but also reverts the user data back to the pre-upgrade state.
- Suppose time T is the rolling upgrade start time and the upgrade is terminated by rollback.
- The files created before T remain available in HDFS but the files created after T become unavailable.
- The files deleted before T remain deleted in HDFS but the files deleted after T are restored.
-
-
- Rollback from a newer release to the pre-upgrade release is always supported.
- However, it cannot be done in a rolling fashion. It requires cluster downtime.
- Suppose NN1 and NN2 are respectively in active and standby states.
- Below are the steps for rollback:
-
Run `-bootstrapStandby' on NN2 and start it normally as standby.
-
Start DNs with the "-rollback" option.
-
-
-
-
-
-
-
-
-
dfsadmin -rollingUpgrade
-
-
- Execute a rolling upgrade action.
-
Options:
-
query
Query the current rolling upgrade status.
-
prepare
Prepare a new rolling upgrade.
-
finalize
Finalize the current rolling upgrade.
-
-
-
-
dfsadmin -getDatanodeInfo
-
-
- Get the information about the given datanode.
- This command can be used for checking if a datanode is alive
- like the Unix ping command.
-
-
-
dfsadmin -shutdownDatanode
-
-
- Submit a shutdown request for the given datanode.
- If the optional upgrade argument is specified,
- clients accessing the datanode will be advised to wait for it to restart
- and the fast start-up mode will be enabled.
- When the restart does not happen in time, clients will timeout and ignore the datanode.
- In such case, the fast start-up mode will also be disabled.
-
-
- Note that the command does not wait for the datanode shutdown to complete.
- The "dfsadmin -getDatanodeInfo"
- command can be used for checking if the datanode shutdown is completed.
-
-
-
-
-
-
namenode -rollingUpgrade
-
-
- When a rolling upgrade is in progress,
- the -rollingUpgrade namenode startup option is used to specify
- various rolling upgrade options.
-
-
Options:
-
rollback
-
Restores the namenode back to the pre-upgrade release
- but also reverts the user data back to the pre-upgrade state.
-
-
started
-
Specifies a rolling upgrade already started
- so that the namenode should allow image directories
- with different layout versions during startup.
-
-
-
- WARN: downgrade options is obsolete.
- It is not necessary to start namenode with downgrade options explicitly.
-
- HDFS Snapshots are read-only point-in-time copies of the file system.
- Snapshots can be taken on a subtree of the file system or the entire file system.
- Some common use cases of snapshots are data backup, protection against user errors
- and disaster recovery.
-
-
-
- The implementation of HDFS Snapshots is efficient:
-
-
-
Snapshot creation is instantaneous:
- the cost is O(1) excluding the inode lookup time.
-
Additional memory is used only when modifications are made relative to a snapshot:
- memory usage is O(M),
- where M is the number of modified files/directories.
-
Blocks in datanodes are not copied:
- the snapshot files record the block list and the file size.
- There is no data copying.
-
Snapshots do not adversely affect regular HDFS operations:
- modifications are recorded in reverse chronological order
- so that the current data can be accessed directly.
- The snapshot data is computed by subtracting the modifications
- from the current data.
-
-
-
-
- Snapshots can be taken on any directory once the directory has been set as
- snapshottable.
- A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
- There is no limit on the number of snapshottable directories.
- Administrators may set any directory to be snapshottable.
- If there are snapshots in a snapshottable directory,
- the directory can be neither deleted nor renamed
- before all the snapshots are deleted.
-
-
-
- Nested snapshottable directories are currently not allowed.
- In other words, a directory cannot be set to snapshottable
- if one of its ancestors/descendants is a snapshottable directory.
-
-
-
-
-
-
- For a snapshottable directory,
- the path component ".snapshot" is used for accessing its snapshots.
- Suppose /foo is a snapshottable directory,
- /foo/bar is a file/directory in /foo,
- and /foo has a snapshot s0.
- Then, the path
- refers to the snapshot copy of /foo/bar.
- The usual API and CLI can work with the ".snapshot" paths.
- The following are some examples.
-
-
-
Listing all the snapshots under a snapshottable directory:
-
-
Listing the files in snapshot s0:
-
-
Copying a file from snapshot s0:
-
-
Note that this example uses the preserve option to preserve
- timestamps, ownership, permission, ACLs and XAttrs.
-
-
-
-
-
-
-
- The HDFS snapshot feature introduces a new reserved path name used to
- interact with snapshots: .snapshot. When upgrading from an
- older version of HDFS, existing paths named .snapshot need
- to first be renamed or deleted to avoid conflicting with the reserved path.
- See the upgrade section in
- the HDFS user guide
- for more information.
-
-
-
-
-
-
- The operations described in this section require superuser privilege.
-
-
-
Allow Snapshots
-
- Allowing snapshots of a directory to be created.
- If the operation completes successfully, the directory becomes snapshottable.
-
-
-
Command:
-
-
Arguments:
-
path
The path of the snapshottable directory.
-
-
-
- See also the corresponding Java API
- void allowSnapshot(Path path) in HdfsAdmin.
-
-
-
Disallow Snapshots
-
- Disallowing snapshots of a directory to be created.
- All snapshots of the directory must be deleted before disallowing snapshots.
-
-
-
Command:
-
-
Arguments:
-
path
The path of the snapshottable directory.
-
-
-
- See also the corresponding Java API
- void disallowSnapshot(Path path) in HdfsAdmin.
-
-
-
-
-
- The section describes user operations.
- Note that HDFS superuser can perform all the operations
- without satisfying the permission requirement in the individual operations.
-
-
-
Create Snapshots
-
- Create a snapshot of a snapshottable directory.
- This operation requires owner privilege of the snapshottable directory.
-
-
-
Command:
-
-
Arguments:
-
path
The path of the snapshottable directory.
-
snapshotName
- The snapshot name, which is an optional argument.
- When it is omitted, a default name is generated using a timestamp with the format
- "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".
-
-
-
-
- See also the corresponding Java API
- Path createSnapshot(Path path) and
- Path createSnapshot(Path path, String snapshotName)
- in FileSystem.
- The snapshot path is returned in these methods.
-
-
-
Delete Snapshots
-
- Delete a snapshot of from a snapshottable directory.
- This operation requires owner privilege of the snapshottable directory.
-
-
-
Command:
-
-
Arguments:
-
path
The path of the snapshottable directory.
-
snapshotName
The snapshot name.
-
-
-
- See also the corresponding Java API
- void deleteSnapshot(Path path, String snapshotName)
- in FileSystem.
-
-
-
Rename Snapshots
-
- Rename a snapshot.
- This operation requires owner privilege of the snapshottable directory.
-
-
-
Command:
-
-
Arguments:
-
path
The path of the snapshottable directory.
-
oldName
The old snapshot name.
-
newName
The new snapshot name.
-
-
-
- See also the corresponding Java API
- void renameSnapshot(Path path, String oldName, String newName)
- in FileSystem.
-
-
-
Get Snapshottable Directory Listing
-
- Get all the snapshottable directories where the current user has permission to take snapshtos.
-
-
-
Command:
-
-
Arguments: none
-
-
- See also the corresponding Java API
- SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()
- in DistributedFileSystem.
-
-
-
Get Snapshots Difference Report
-
- Get the differences between two snapshots.
- This operation requires read access privilege for all files/directories in both snapshots.
-
-
-
Command:
-
-
Arguments:
-
path
The path of the snapshottable directory.
-
fromSnapshot
The name of the starting snapshot.
-
toSnapshot
The name of the ending snapshot.
-
-
- Note that snapshotDiff can be used to get the difference report between two snapshots, or between
- a snapshot and the current status of a directory.Users can use "." to represent the current status.
-
-
Results:
-
-
+
The file/directory has been created.
-
-
The file/directory has been deleted.
-
M
The file/directory has been modified.
-
R
The file/directory has been renamed.
-
-
-
-
- A RENAME entry indicates a file/directory has been renamed but
- is still under the same snapshottable directory. A file/directory is
- reported as deleted if it was renamed to outside of the snapshottble directory.
- A file/directory renamed from outside of the snapshottble directory is
- reported as newly created.
-
-
- The snapshot difference report does not guarantee the same operation sequence.
- For example, if we rename the directory "/foo" to "/foo2", and
- then append new data to the file "/foo2/bar", the difference report will
- be:
-
- I.e., the changes on the files/directories under a renamed directory is
- reported using the original path before the rename ("/foo/bar" in
- the above example).
-
-
- See also the corresponding Java API
- SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)
- in DistributedFileSystem.
-