HDFS-7581. HDFS documentation needs updating post-shell rewrite (aw)

Allen Wittenauer 2015-01-15 07:48:55 -08:00
parent 533e551eb4
commit ce0117636a
6 changed files with 560 additions and 287 deletions


@ -265,6 +265,8 @@ Trunk (Unreleased)
HDFS-7407. Minor typo in privileged pid/out/log names (aw)
HDFS-7581. HDFS documentation needs updating post-shell rewrite (aw)
Release 2.7.0 - UNRELEASED
INCOMPATIBLE CHANGES


@ -34,40 +34,40 @@ HDFS Federation
* Consists of directories, files and blocks
* It supports all the namespace related file system operations such as
create, delete, modify and list files and directories.
* <<Block Storage Service>> has two parts
* Block Management (which is done in Namenode)
* Provides datanode cluster membership by handling registrations, and
periodic heart beats.
* Processes block reports and maintains location of blocks.
* Supports block related operations such as create, delete, modify and
get block location.
* Manages replica placement and replication of a block for under
replicated blocks and deletes blocks that are over replicated.
* Storage - is provided by datanodes by storing blocks on the local file
system and allows read/write access.
The prior HDFS architecture allows only a single namespace for the
entire cluster. A single Namenode manages this namespace. HDFS
Federation addresses this limitation by adding support for multiple
Namenodes/namespaces to the HDFS file system.
* {Multiple Namenodes/Namespaces}
In order to scale the name service horizontally, federation uses multiple
independent Namenodes/namespaces. The Namenodes are federated; that is, the
Namenodes are independent and do not require coordination with each other.
The datanodes are used as common storage for blocks by all the Namenodes.
Each datanode registers with all the Namenodes in the cluster. Datanodes
send periodic heartbeats and block reports and handle commands from the
Namenodes.
Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views,
@ -78,48 +78,48 @@ HDFS Federation
<<Block Pool>>
A Block Pool is a set of blocks that belong to a single namespace.
Datanodes store blocks for all the block pools in the cluster.
It is managed independently of other block pools. This allows a namespace
to generate Block IDs for new blocks without the need for coordination
with the other namespaces. The failure of a Namenode does not prevent
the datanode from serving other Namenodes in the cluster.
A Namespace and its block pool together are called Namespace Volume.
It is a self-contained unit of management. When a Namenode/namespace
is deleted, the corresponding block pool at the datanodes is deleted.
Each namespace volume is upgraded as a unit, during cluster upgrade.
<<ClusterID>>
A new identifier <<ClusterID>> is added to identify all the nodes in
the cluster. When a Namenode is formatted, this identifier is provided
or auto generated. This ID should be used for formatting the other
Namenodes into the cluster.
** Key Benefits
* Namespace Scalability - HDFS cluster storage scales horizontally but
the namespace does not. Large deployments, or deployments using a large
number of small files, benefit from scaling the namespace by adding more
Namenodes to the cluster.
* Performance - File system operation throughput is limited by a single
Namenode in the prior architecture. Adding more Namenodes to the cluster
scales the file system read/write operations throughput.
* Isolation - A single Namenode offers no isolation in a multi-user
environment. An experimental application can overload the Namenode
and slow down production-critical applications. With multiple Namenodes,
different categories of applications and users can be isolated to
different namespaces.
* {Federation Configuration}
Federation configuration is <<backward compatible>> and allows existing
single-Namenode configurations to work without any change. The new
configuration is designed such that all the nodes in the cluster have the
same configuration, without the need for deploying different configurations
based on the type of the node in the cluster.
A new abstraction called <<<NameServiceID>>> is added with
@ -132,12 +132,12 @@ HDFS Federation
** Configuration:
<<Step 1>>: Add the following parameters to your configuration:
<<<dfs.nameservices>>>: Configure with a list of comma-separated
NameServiceIDs. This will be used by the Datanodes to determine all the
Namenodes in the cluster.
<<Step 2>>: For each Namenode and Secondary Namenode/BackupNode/Checkpointer
add the following configuration suffixed with the corresponding
<<<NameServiceID>>> into the common configuration file.
*---------------------+--------------------------------------------+
@ -159,7 +159,7 @@ HDFS Federation
| BackupNode | <<<dfs.namenode.backup.address>>> |
| | <<<dfs.secondary.namenode.keytab.file>>> |
*---------------------+--------------------------------------------+
Here is an example configuration with two namenodes:
----
@ -200,31 +200,31 @@ HDFS Federation
** Formatting Namenodes
<<Step 1>>: Format a namenode using the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
----
Choose a unique cluster_id that will not conflict with other clusters in
your environment. If it is not provided, then a unique ClusterID is
auto-generated.
<<Step 2>>: Format additional namenode using the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
----
Note that the cluster_id in step 2 must be the same as the
cluster_id in step 1. If they are different, the additional Namenodes
will not be part of the federated cluster.
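For example, a sketch of the two steps with a hypothetical cluster ID <<<mycluster1>>>:
----
# on the first Namenode host
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId mycluster1
# on each additional Namenode host, reusing the same ID
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId mycluster1
----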
** Upgrading from an older release and configuring federation
Older releases support only a single Namenode.
Upgrade the cluster to a newer release in order to enable federation.
During upgrade you can provide a ClusterID as follows:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --config $HADOOP_CONF_DIR --daemon start namenode -upgrade -clusterId <cluster_ID>
----
If ClusterID is not provided, it is auto generated.
@ -234,8 +234,8 @@ HDFS Federation
* Add configuration parameter <<<dfs.nameservices>>> to the configuration.
* Update the configuration with the NameServiceID suffix. Configuration
key names have changed post release 0.20. You must use the new configuration
parameter names for federation.
* Add new Namenode related config to the configuration files.
@ -244,11 +244,11 @@ HDFS Federation
* Start the new Namenode, Secondary/Backup.
* Refresh the datanodes to pick up the newly added Namenode by running
the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNamenodes <datanode_host_name>:<datanode_rpc_port>
----
* The above command must be run against all the datanodes in the cluster.
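As a sketch, assuming the datanode hostnames are listed one per line in the
<<<slaves>>> file and the datanodes use the default IPC port of 50020, the
refresh can be scripted as:
----
[hdfs]$ for dn in $(cat $HADOOP_CONF_DIR/slaves); do
          $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNamenodes ${dn}:50020
        done
----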
@ -260,37 +260,37 @@ HDFS Federation
To start the cluster run the following command:
----
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
----
To stop the cluster run the following command:
----
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
----
These commands can be run from any node where the HDFS configuration is
available. The command uses the configuration to determine the Namenodes
in the cluster and starts the Namenode process on those nodes. The
datanodes are started on the nodes specified in the <<<slaves>>> file. The
script can be used as a reference for building your own scripts for
starting and stopping the cluster.
** Balancer
Balancer has been changed to work with multiple Namenodes in the cluster to
balance the cluster. Balancer can be run using the command:
----
"$HADOOP_PREFIX"/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer [-policy <policy>]
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer [-policy <policy>]
----
Policy could be:
* <<<datanode>>> - this is the <default> policy. This balances the storage at
the datanode level. This is similar to the balancing policy from prior releases.
* <<<blockpool>>> - this balances the storage at the block pool level.
Balancing at the block pool level also balances storage at the datanode level.
Note that Balancer only balances the data and does not balance the namespace.
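For example, a sketch of starting the balancer as a daemon with the block pool policy:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer -policy blockpool
----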
@ -298,43 +298,43 @@ HDFS Federation
** Decommissioning
Decommissioning is similar to prior releases. The nodes that need to be
decommissioned are added to the exclude file at all of the Namenodes. Each
Namenode decommissions its Block Pool. When all the Namenodes finish
decommissioning a datanode, the datanode is considered decommissioned.
<<Step 1>>: To distribute an exclude file to all the Namenodes, use the
following command:
----
"$HADOOP_PREFIX"/bin/distributed-exclude.sh <exclude_file>
[hdfs]$ $HADOOP_PREFIX/sbin/distributed-exclude.sh <exclude_file>
----
<<Step 2>>: Refresh all the Namenodes to pick up the new exclude file.
----
"$HADOOP_PREFIX"/bin/refresh-namenodes.sh
[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
----
The above command uses HDFS configuration to determine the Namenodes
configured in the cluster and refreshes all the Namenodes to pick up
the new exclude file.
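Putting the two steps together, a sketch of decommissioning a single datanode
(the hostname and exclude file path are hypothetical):
----
[hdfs]$ echo "dn3.example.com" >> /tmp/exclude_file
[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh /tmp/exclude_file
[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
----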
** Cluster Web Console
Similar to Namenode status web page, a Cluster Web Console is added in
federation to monitor the federated cluster at
<<<http://<any_nn_host:port>/dfsclusterhealth.jsp>>>.
Any Namenode in the cluster can be used to access this web page.
The web page provides the following information:
* Cluster summary that shows number of files, number of blocks and
total configured storage capacity, available and used storage information
for the entire cluster.
* Provides list of Namenodes and summary that includes number of files,
blocks, missing blocks, number of live and dead data nodes for each
Namenode. It also provides a link to conveniently access Namenode web UI.
* It also provides decommissioning status of datanodes.


@ -18,7 +18,7 @@
HDFS Commands Guide
%{toc|section=1|fromDepth=2|toDepth=3}
* Overview
@ -26,39 +26,37 @@ HDFS Commands Guide
hdfs script without any arguments prints the description for all
commands.
Usage: <<<hdfs [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
Hadoop has an option parsing framework that employs parsing generic options as
well as running classes.
*-------------------------+---------------------------------------------------+
|| COMMAND_OPTIONS        || Description
*-------------------------+---------------------------------------------------+
| SHELL_OPTIONS           | The common set of shell options. These are documented on the {{{../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell Options}Commands Manual}} page.
*-------------------------+---------------------------------------------------+
| GENERIC_OPTIONS         | The common set of options supported by multiple commands. See the Hadoop {{{../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic Options}Commands Manual}} for more information.
*-------------------------+---------------------------------------------------+
| COMMAND COMMAND_OPTIONS | Various commands with their options are described
|                         | in the following sections. The commands have been
|                         | grouped into {{User Commands}} and
|                         | {{Administration Commands}}.
*-------------------------+---------------------------------------------------+
* {User Commands}
Commands useful for users of a hadoop cluster.
** <<<classpath>>>
Usage: <<<hdfs classpath>>>
Prints the class path needed to get the Hadoop jar and the required libraries.
** <<<dfs>>>
Usage: <<<hdfs dfs [COMMAND [COMMAND_OPTIONS]]>>>
Run a filesystem command on the file system supported in Hadoop.
The various COMMAND_OPTIONS can be found at
@ -66,97 +64,307 @@ HDFS Commands Guide
** <<<fetchdt>>>
Usage: <<<hdfs fetchdt [--webservice <namenode_http_addr>] <path> >>>
*------------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------------+---------------------------------------------+
| --webservice <https_address> | use http protocol instead of RPC
*------------------------------+---------------------------------------------+
| <fileName> | File name to store the token into.
*------------------------------+---------------------------------------------+
Gets Delegation Token from a NameNode.
See {{{./HdfsUserGuide.html#fetchdt}fetchdt}} for more info.
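For example (the NameNode host and output path are hypothetical), a sketch of
fetching a token over HTTP and storing it in a local file:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs fetchdt --webservice http://nn1.example.com:50070 /tmp/nn1.token
----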
** <<<fsck>>>
Usage:
---
hdfs fsck <path>
     [-list-corruptfileblocks |
     [-move | -delete | -openforwrite]
     [-files [-blocks [-locations | -racks]]]
     [-includeSnapshots] [-showprogress]
---
*----------------------------+--------------------------------------------------+
|| COMMAND_OPTION             || Description
*----------------------------+--------------------------------------------------+
| <path>                      | Start checking from this path.
*----------------------------+--------------------------------------------------+
| -delete                     | Delete corrupted files.
*----------------------------+--------------------------------------------------+
| -files                      | Print out files being checked.
*----------------------------+--------------------------------------------------+
| -files -blocks              | Print out the block report.
*----------------------------+--------------------------------------------------+
| -files -blocks -locations   | Print out locations for every block.
*----------------------------+--------------------------------------------------+
| -files -blocks -racks       | Print out network topology for data-node locations.
*----------------------------+--------------------------------------------------+
|                             | Include snapshot data if the given path
| -includeSnapshots           | indicates a snapshottable directory or
|                             | there are snapshottable directories under it.
*----------------------------+--------------------------------------------------+
| -list-corruptfileblocks     | Print out list of missing blocks and
|                             | files they belong to.
*----------------------------+--------------------------------------------------+
| -move                       | Move corrupted files to /lost+found.
*----------------------------+--------------------------------------------------+
| -openforwrite               | Print out files opened for write.
*----------------------------+--------------------------------------------------+
| -showprogress               | Print out dots for progress in output. Default is OFF
|                             | (no progress).
*----------------------------+--------------------------------------------------+
Runs the HDFS filesystem checking utility.
See {{{./HdfsUserGuide.html#fsck}fsck}} for more info.
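For example, a sketch that checks the <<</user>>> tree and prints per-file block locations:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs fsck /user -files -blocks -locations
----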
** <<<getconf>>>
Usage:
---
hdfs getconf -namenodes
hdfs getconf -secondaryNameNodes
hdfs getconf -backupNodes
hdfs getconf -includeFile
hdfs getconf -excludeFile
hdfs getconf -nnRpcAddresses
hdfs getconf -confKey [key]
---
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -namenodes | gets list of namenodes in the cluster.
*------------------------+---------------------------------------------+
| -secondaryNameNodes | gets list of secondary namenodes in the cluster.
*------------------------+---------------------------------------------+
| -backupNodes | gets list of backup nodes in the cluster.
*------------------------+---------------------------------------------+
| -includeFile | gets the include file path that defines the datanodes that can join the cluster.
*------------------------+---------------------------------------------+
| -excludeFile | gets the exclude file path that defines the datanodes that need to be decommissioned.
*------------------------+---------------------------------------------+
| -nnRpcAddresses | gets the namenode rpc addresses
*------------------------+---------------------------------------------+
| -confKey [key] | gets a specific key from the configuration
*------------------------+---------------------------------------------+
Gets configuration information from the configuration directory, post-processing.
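For example, a sketch that lists the configured Namenodes and then looks up a
single key (the key name is just an illustration):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs getconf -namenodes
[hdfs]$ $HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.name.dir
----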
** <<<groups>>>
Usage: <<<hdfs groups [username ...]>>>
Returns the group information given one or more usernames.
** <<<lsSnapshottableDir>>>
Usage: <<<hdfs lsSnapshottableDir [-help]>>>
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -help | print help
*------------------------+---------------------------------------------+
Get the list of snapshottable directories. When this is run as a super user,
it returns all snapshottable directories. Otherwise it returns those directories
that are owned by the current user.
** <<<jmxget>>>
Usage: <<<hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]>>>
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -help | print help
*------------------------+---------------------------------------------+
| -localVM ConnectorURL | connect to the VM on the same machine
*------------------------+---------------------------------------------+
| -port <mbean server port> | specify mbean server port, if missing
| | it will try to connect to MBean Server in
| | the same VM
*------------------------+---------------------------------------------+
| -service | specify the JMX service, either DataNode or NameNode; NameNode is the default
*------------------------+---------------------------------------------+
Dump JMX information from a service.
** <<<oev>>>
Usage: <<<hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE>>>
*** Required command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
|-i,--inputFile <arg> | edits file to process, xml (case
| insensitive) extension means XML format,
| any other filename means binary format
*------------------------+---------------------------------------------+
| -o,--outputFile <arg> | Name of output file. If the specified
| file exists, it will be overwritten,
| format of the file is determined
| by -p option
*------------------------+---------------------------------------------+
*** Optional command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -f,--fix-txids | Renumber the transaction IDs in the input,
| so that there are no gaps or invalid transaction IDs.
*------------------------+---------------------------------------------+
| -h,--help | Display usage information and exit
*------------------------+---------------------------------------------+
| -r,--recover | When reading binary edit logs, use recovery
| mode. This will give you the chance to skip
| corrupt parts of the edit log.
*------------------------+---------------------------------------------+
| -p,--processor <arg> | Select which type of processor to apply
| against image file, currently supported
| processors are: binary (native binary format
| that Hadoop uses), xml (default, XML
| format), stats (prints statistics about
| edits file)
*------------------------+---------------------------------------------+
| -v,--verbose | More verbose output, prints the input and
| output filenames, for processors that write
| to a file, also output to screen. On large
| image files this will dramatically increase
| processing time (default is false).
*------------------------+---------------------------------------------+
Hadoop offline edits viewer.
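As a sketch (the edits file name is hypothetical), converting a binary edits
file to XML and then back to binary:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml
[hdfs]$ $HADOOP_PREFIX/bin/hdfs oev -i edits.xml -o edits.rebuilt -p binary
----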
** <<<oiv>>>
Usage: <<<hdfs oiv [OPTIONS] -i INPUT_FILE>>>
*** Required command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
|-i,--inputFile <arg> | image file to process, xml (case
| insensitive) extension means XML format,
| any other filename means binary format
*------------------------+---------------------------------------------+
*** Optional command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -h,--help | Display usage information and exit
*------------------------+---------------------------------------------+
| -o,--outputFile <arg> | Name of output file. If the specified
| file exists, it will be overwritten,
| format of the file is determined
| by -p option
*------------------------+---------------------------------------------+
| -p,--processor <arg> | Select which type of processor to apply
| against image file, currently supported
| processors are: binary (native binary format
| that Hadoop uses), xml (default, XML
| format), stats (prints statistics about
| edits file)
*------------------------+---------------------------------------------+
Hadoop Offline Image Viewer for newer image files.
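As a sketch (the fsimage file name is hypothetical, and the <<<XML>>> processor
name is assumed to be accepted as shown), dumping an image to XML:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs oiv -p XML -i fsimage_0000000000000000055 -o fsimage.xml
----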
** <<<oiv_legacy>>>
Usage: <<<hdfs oiv_legacy [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE>>>
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -h,--help | Display usage information and exit
*------------------------+---------------------------------------------+
|-i,--inputFile <arg> | image file to process, xml (case
| insensitive) extension means XML format,
| any other filename means binary format
*------------------------+---------------------------------------------+
| -o,--outputFile <arg> | Name of output file. If the specified
| file exists, it will be overwritten,
| format of the file is determined
| by -p option
*------------------------+---------------------------------------------+
Hadoop offline image viewer for older versions of Hadoop.
** <<<snapshotDiff>>>
Usage: <<<hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot> >>>
Determine the difference between HDFS snapshots. See the
{{{./HdfsSnapshots.html#Get_Snapshots_Difference_Report}HDFS Snapshot Documentation}} for more information.
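For example (the path and snapshot names are hypothetical):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs snapshotDiff /user/alice s1 s2
----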
** <<<version>>>
Usage: <<<hdfs version>>>
Prints the version.
* Administration Commands
Commands useful for administrators of a hadoop cluster.
** <<<balancer>>>
Usage: <<<hdfs balancer [-threshold <threshold>] [-policy <policy>]>>>
*------------------------+----------------------------------------------------+
|| COMMAND_OPTION | Description
*------------------------+----------------------------------------------------+
| -policy <policy> | <<<datanode>>> (default): Cluster is balanced if
| | each datanode is balanced. \
| | <<<blockpool>>>: Cluster is balanced if each block
| | pool in each datanode is balanced.
*------------------------+----------------------------------------------------+
| -threshold <threshold> | Percentage of disk capacity. This overwrites the
| | default threshold.
*------------------------+----------------------------------------------------+
Runs a cluster balancing utility. An administrator can simply press Ctrl-C
to stop the rebalancing process. See
{{{./HdfsUserGuide.html#Balancer}Balancer}} for more details.
Note that the <<<blockpool>>> policy is more strict than the <<<datanode>>>
policy.
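For example, a sketch that runs the balancer with a tighter 5% threshold:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs balancer -threshold 5
----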
** <<<cacheadmin>>>
Usage: <<<hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]>>>
See the {{{./CentralizedCacheManagement.html#cacheadmin_command-line_interface}HDFS Cache Administration Documentation}} for more information.
** <<<crypto>>>
Usage:
---
hdfs crypto -createZone -keyName <keyName> -path <path>
hdfs crypto -help <command-name>
hdfs crypto -listZones
---
See the {{{./TransparentEncryption.html#crypto_command-line_interface}HDFS Transparent Encryption Documentation}} for more information.
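A minimal sketch of creating an encryption zone, assuming a KMS is already
configured; the key name and path are hypothetical:
----
[hdfs]$ $HADOOP_PREFIX/bin/hadoop key create myzonekey
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfs -mkdir /secure
[hdfs]$ $HADOOP_PREFIX/bin/hdfs crypto -createZone -keyName myzonekey -path /secure
----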
** <<<datanode>>>
Usage: <<<hdfs datanode [-regular | -rollback | -rollingupgrade rollback]>>>
@ -172,12 +380,14 @@ HDFS Commands Guide
| -rollingupgrade rollback | Rollback a rolling upgrade operation.
*-----------------+-----------------------------------------------------------+
Runs a HDFS datanode.
** <<<dfsadmin>>>
Runs a HDFS dfsadmin client.
Usage:
------------------------------------------
hdfs dfsadmin [GENERIC_OPTIONS]
[-report [-live] [-dead] [-decommissioning]]
[-safemode enter | leave | get | wait]
[-saveNamespace]
@ -210,7 +420,7 @@ HDFS Commands Guide
[-getDatanodeInfo <datanode_host:ipc_port>]
[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
[-help [cmd]]
------------------------------------------
*-----------------+-----------------------------------------------------------+
|| COMMAND_OPTION || Description
@ -323,11 +533,11 @@ HDFS Commands Guide
*-----------------+-----------------------------------------------------------+
| -allowSnapshot \<snapshotDir\> | Allowing snapshots of a directory to be
| created. If the operation completes successfully, the
| directory becomes snapshottable. See the {{{./HdfsSnapshots.html}HDFS Snapshot Documentation}} for more information.
*-----------------+-----------------------------------------------------------+
| -disallowSnapshot \<snapshotDir\> | Disallowing snapshots of a directory to
| be created. All snapshots of the directory must be deleted
| before disallowing snapshots. See the {{{./HdfsSnapshots.html}HDFS Snapshot Documentation}} for more information.
*-----------------+-----------------------------------------------------------+
| -fetchImage \<local directory\> | Downloads the most recent fsimage from the
| NameNode and saves it in the specified local directory.
@ -351,30 +561,68 @@ HDFS Commands Guide
| is specified.
*-----------------+-----------------------------------------------------------+
** <<<haadmin>>>
Usage:
---
hdfs haadmin -checkHealth <serviceId>
hdfs haadmin -failover [--forcefence] [--forceactive] <serviceId> <serviceId>
hdfs haadmin -getServiceState <serviceId>
hdfs haadmin -help <command>
hdfs haadmin -transitionToActive <serviceId> [--forceactive]
hdfs haadmin -transitionToStandby <serviceId>
---
*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description
*--------------------+--------------------------------------------------------+
| -checkHealth | check the health of the given NameNode
*--------------------+--------------------------------------------------------+
| -failover | initiate a failover between two NameNodes
*--------------------+--------------------------------------------------------+
| -getServiceState | determine whether the given NameNode is Active or Standby
*--------------------+--------------------------------------------------------+
| -transitionToActive | transition the state of the given NameNode to Active (Warning: No fencing is done)
*--------------------+--------------------------------------------------------+
| -transitionToStandby | transition the state of the given NameNode to Standby (Warning: No fencing is done)
*--------------------+--------------------------------------------------------+
See {{{./HDFSHighAvailabilityWithNFS.html#Administrative_commands}HDFS HA with NFS}} or
{{{./HDFSHighAvailabilityWithQJM.html#Administrative_commands}HDFS HA with QJM}} for more
information on this command.
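For example (the NameNode IDs <<<nn1>>> and <<<nn2>>> are hypothetical), a
sketch of checking the current state of one NameNode and then requesting a failover:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs haadmin -getServiceState nn1
[hdfs]$ $HADOOP_PREFIX/bin/hdfs haadmin -failover nn1 nn2
----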
** <<<journalnode>>>
Usage: <<<hdfs journalnode>>>
This command starts a journalnode for use with {{{./HDFSHighAvailabilityWithQJM.html#Administrative_commands}HDFS HA with QJM}}.
** <<<mover>>>
Usage: <<<hdfs mover [-p <files/dirs> | -f <local file name>]>>>
*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description
*--------------------+--------------------------------------------------------+
| -f \<local file\> | Specify a local file containing a list of HDFS files/dirs to migrate.
*--------------------+--------------------------------------------------------+
| -p \<files/dirs\> | Specify a space separated list of HDFS files/dirs to migrate.
*--------------------+--------------------------------------------------------+
Runs the data migration utility.
See {{{./ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool}Mover}} for more details.
Note that, when both -p and -f options are omitted, the default path is the root directory.
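For example (the paths are hypothetical), migrating two directories whose
storage policy has changed:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs mover -p /archive/2013 /archive/2014
----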
** <<<namenode>>>
Usage:
------------------------------------------
hdfs namenode [-backup] |
[-checkpoint] |
[-format [-clusterid cid ] [-force] [-nonInteractive] ] |
[-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
@ -387,7 +635,7 @@ HDFS Commands Guide
[-bootstrapStandby] |
[-recover [-force] ] |
[-metadataVersion ]
------------------------------------------
*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description
@ -443,11 +691,23 @@ HDFS Commands Guide
| metadata versions of the software and the image.
*--------------------+--------------------------------------------------------+
Runs the namenode. More info about the upgrade, rollback and finalize is at
{{{./HdfsUserGuide.html#Upgrade_and_Rollback}Upgrade Rollback}}.
** <<<nfs3>>>
Usage: <<<hdfs nfs3>>>
This command starts the NFS3 gateway for use with the {{{./HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service}HDFS NFS3 Service}}.
** <<<portmap>>>
Usage: <<<hdfs portmap>>>
This command starts the RPC portmap for use with the {{{./HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service}HDFS NFS3 Service}}.
** <<<secondarynamenode>>>
Usage: <<<hdfs secondarynamenode [-checkpoint [force]] | [-format] |
[-geteditsize]>>>
@ -465,6 +725,33 @@ HDFS Commands Guide
| the NameNode.
*----------------------+------------------------------------------------------+
Runs the HDFS secondary namenode.
See {{{./HdfsUserGuide.html#Secondary_NameNode}Secondary Namenode}}
for more info.
** <<<storagepolicies>>>
Usage: <<<hdfs storagepolicies>>>
Lists out all storage policies. See the {{{./ArchivalStorage.html}HDFS Storage Policy Documentation}} for more information.
** <<<zkfc>>>
Usage: <<<hdfs zkfc [-formatZK [-force] [-nonInteractive]]>>>
*----------------------+------------------------------------------------------+
|| COMMAND_OPTION || Description
*----------------------+------------------------------------------------------+
| -formatZK | Format the Zookeeper instance
*----------------------+------------------------------------------------------+
| -h | Display help
*----------------------+------------------------------------------------------+
This command starts a Zookeeper Failover Controller process for use with {{{./HDFSHighAvailabilityWithQJM.html#Administrative_commands}HDFS HA with QJM}}.
* Debug Commands
Useful commands to help administrators debug HDFS issues, like validating
@ -472,30 +759,25 @@ HDFS Commands Guide
** <<<verify>>>
Usage: <<<hdfs debug verify [-meta <metadata-file>] [-block <block-file>]>>>
*------------------------+----------------------------------------------------+
|| COMMAND_OPTION | Description
*------------------------+----------------------------------------------------+
| -block <block-file> | Optional parameter to specify the absolute path for
| | the block file on the local file system of the data
| | node.
*------------------------+----------------------------------------------------+
| -meta <metadata-file> | Absolute path for the metadata file on the local file
| | system of the data node.
*------------------------+----------------------------------------------------+
Verify HDFS metadata and block files. If a block file is specified, we
will verify that the checksums in the metadata file match the block
file.
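For example (the local file paths are hypothetical; block and metadata files
normally live under the datanode data directories):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs debug verify -meta /tmp/blk_1073741825_1001.meta -block /tmp/blk_1073741825
----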
** <<<recoverLease>>>
Usage: <<<hdfs debug recoverLease [-path <path>] [-retries <num-retries>]>>>
*-------------------------------+--------------------------------------------+
@ -507,3 +789,6 @@ HDFS Commands Guide
| | recoverLease. The default number of retries
| | is 1.
*-------------------------------+---------------------------------------------+
Recover the lease on the specified path. The path must reside on an
HDFS filesystem. The default number of retries is 1.
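For example (the path is hypothetical):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs debug recoverLease -path /user/alice/pending-writes.log -retries 3
----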


@ -25,7 +25,7 @@ HDFS High Availability
This guide provides an overview of the HDFS High Availability (HA) feature and
how to configure and manage an HA HDFS cluster, using NFS for the shared
storage required by the NameNodes.
This document assumes that the reader has a general understanding of the
components and node types in an HDFS cluster. Please refer to the
HDFS Architecture guide for details.
@ -44,7 +44,7 @@ HDFS High Availability
an HDFS cluster. Each cluster had a single NameNode, and if that machine or
process became unavailable, the cluster as a whole would be unavailable
until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways:
* In the case of an unplanned event such as a machine crash, the cluster would
@ -52,7 +52,7 @@ HDFS High Availability
* Planned maintenance events such as software or hardware upgrades on the
NameNode machine would result in windows of cluster downtime.
The HDFS High Availability feature addresses the above problems by providing
the option of running two redundant NameNodes in the same cluster in an
Active/Passive configuration with a hot standby. This allows a fast failover to
@ -67,7 +67,7 @@ HDFS High Availability
for all client operations in the cluster, while the Standby is simply acting
as a slave, maintaining enough state to provide a fast failover if
necessary.
In order for the Standby node to keep its state synchronized with the Active
node, the current implementation requires that the two nodes both have access
to a directory on a shared storage device (eg an NFS mount from a NAS). This
@ -80,12 +80,12 @@ HDFS High Availability
a failover, the Standby will ensure that it has read all of the edits from the
shared storage before promoting itself to the Active state. This ensures that
the namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby node
have up-to-date information regarding the location of blocks in the cluster.
In order to achieve this, the DataNodes are configured with the location of
both NameNodes, and send block location information and heartbeats to both.
It is vital for the correct operation of an HA cluster that only one of the
NameNodes be Active at a time. Otherwise, the namespace state would quickly
diverge between the two, risking data loss or other incorrect results. In
@ -116,7 +116,7 @@ HDFS High Availability
network, and power). Because of this, it is recommended that the shared storage
server be a high-quality dedicated NAS appliance rather than a simple Linux
server.
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of
the namespace state, and thus it is not necessary to run a Secondary NameNode,
CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an
@ -133,7 +133,7 @@ HDFS High Availability
The new configuration is designed such that all the nodes in the cluster may
have the same configuration without the need for deploying different
configuration files to different machines based on the type of the node.
Like HDFS Federation, HA clusters reuse the <<<nameservice ID>>> to identify a
single HDFS instance that may in fact consist of multiple HA NameNodes. In
addition, a new abstraction called <<<NameNode ID>>> is added with HA. Each
@ -330,7 +330,7 @@ HDFS High Availability
<<dfs_namenode_rpc-address>> will contain the RPC address of the target node, even
though the configuration may specify that variable as
<<dfs.namenode.rpc-address.ns1.nn1>>.
Additionally, the following variables referring to the target node to be fenced
are also available:
@ -345,7 +345,7 @@ HDFS High Availability
*-----------------------:-----------------------------------+
| $target_namenodeid | the namenode ID of the NN to be fenced |
*-----------------------:-----------------------------------+
These environment variables may also be used as substitutions in the shell
command itself. For example:
@ -355,7 +355,7 @@ HDFS High Availability
<value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
---
If the shell command returns an exit
code of 0, the fencing is determined to be successful. If it returns any other
exit code, the fencing was not successful and the next fencing method in the
@ -386,7 +386,7 @@ HDFS High Availability
* If you are setting up a fresh HDFS cluster, you should first run the format
command (<hdfs namenode -format>) on one of NameNodes.
* If you have already formatted the NameNode, or are converting a
non-HA-enabled cluster to be HA-enabled, you should now copy over the
contents of your NameNode metadata directories to the other, unformatted
@ -394,7 +394,7 @@ HDFS High Availability
unformatted NameNode. Running this command will also ensure that the shared
edits directory (as configured by <<dfs.namenode.shared.edits.dir>>) contains
sufficient edits transactions to be able to start both NameNodes.
* If you are converting a non-HA NameNode to be HA, you should run the
command "<hdfs -initializeSharedEdits>", which will initialize the shared
edits directory with the edits data from the local NameNode edits directories.
@ -484,7 +484,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
of coordination data, notifying clients of changes in that data, and
monitoring clients for failures. The implementation of automatic HDFS failover
relies on ZooKeeper for the following things:
* <<Failure detection>> - each of the NameNode machines in the cluster
maintains a persistent session in ZooKeeper. If the machine crashes, the
ZooKeeper session will expire, notifying the other NameNode that a failover
@ -585,7 +585,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
from one of the NameNode hosts.
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs zkfc -formatZK
----
This will create a znode in ZooKeeper inside of which the automatic failover
@ -605,7 +605,7 @@ $ hdfs zkfc -formatZK
can start the daemon by running:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start zkfc
----
** Securing access to ZooKeeper
@ -646,7 +646,7 @@ digest:hdfs-zkfcs:mypassword
a command like the following:
----
[hdfs]$ java -cp $ZK_HOME/lib/*:$ZK_HOME/zookeeper-3.4.2.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider hdfs-zkfcs:mypassword
output: hdfs-zkfcs:mypassword->hdfs-zkfcs:P/OQvnYyU/nF/mGYvB/xurX8dYs=
----
@ -726,24 +726,24 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
using the same <<<hdfs haadmin>>> command. It will perform a coordinated
failover.
* BookKeeper as a Shared storage (EXPERIMENTAL)
One option for shared storage for the NameNode is BookKeeper.
BookKeeper achieves high availability and strong durability guarantees by replicating
edit log entries across multiple storage nodes. The edit log can be striped across
the storage nodes for high performance. Fencing is supported in the protocol, i.e.,
BookKeeper will not allow two writers to write the single edit log.
The meta data for BookKeeper is stored in ZooKeeper.
In the current HA architecture, a ZooKeeper cluster is required for ZKFC. The
same cluster can be used for BookKeeper metadata.
For more details on building a BookKeeper cluster, please refer to the
{{{http://zookeeper.apache.org/bookkeeper/docs/trunk/bookkeeperConfig.html }BookKeeper documentation}}
The BookKeeperJournalManager is an implementation of the HDFS JournalManager interface, which allows custom write ahead logging implementations to be plugged into the HDFS NameNode.
**<<BookKeeper Journal Manager>>
To use BookKeeperJournalManager, add the following to hdfs-site.xml.
@ -772,12 +772,12 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
classpath. We explain how to generate a jar file with the journal manager and
its dependencies, and how to put it into the classpath below.
*** <<More configuration options>>
* <<dfs.namenode.bookkeeperjournal.output-buffer-size>> -
Number of bytes a bookkeeper journal stream will buffer before
forcing a flush. Default is 1024.
----
<property>
<name>dfs.namenode.bookkeeperjournal.output-buffer-size</name>
@ -785,7 +785,7 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.ensemble-size>> -
Number of bookkeeper servers in edit log ensembles. This
is the number of bookkeeper servers which need to be available
for the edit log to be writable. Default is 3.
@ -797,7 +797,7 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.quorum-size>> -
Number of bookkeeper servers in the write quorum. This is the
number of bookkeeper servers which must have acknowledged the
write of an entry before it is considered written. Default is 2.
@ -809,7 +809,7 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.digestPw>> -
Password to use when creating edit log segments.
----
@ -819,9 +819,9 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.zk.session.timeout>> -
Session timeout for Zookeeper client from BookKeeper Journal Manager.
Hadoop recommends that this value should be less than the ZKFC
session timeout value. Default value is 3000.
----
@ -838,22 +838,22 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
$ mvn clean package -Pdist
This will generate a jar with the BookKeeperJournalManager,
hadoop-hdfs/src/contrib/bkjournal/target/hadoop-hdfs-bkjournal-<VERSION>.jar
Note that the -Pdist part of the build command is important; it copies
the dependent bookkeeper-server jar under
hadoop-hdfs/src/contrib/bkjournal/target/lib.
*** <<Putting the BookKeeperJournalManager in the NameNode classpath>>
To run a HDFS namenode using BookKeeper as a backend, copy the bkjournal and
bookkeeper-server jar, mentioned above, into the lib directory of hdfs. In the
standard distribution of HDFS, this is at $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/
cp hadoop-hdfs/src/contrib/bkjournal/target/hadoop-hdfs-bkjournal-<VERSION>.jar $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/
*** <<Current limitations>>
1) Security in BookKeeper. BookKeeper does not support SASL nor SSL for
connections between the NameNode and BookKeeper storage nodes.


@ -25,7 +25,7 @@ HDFS High Availability Using the Quorum Journal Manager
This guide provides an overview of the HDFS High Availability (HA) feature
and how to configure and manage an HA HDFS cluster, using the Quorum Journal
Manager (QJM) feature.
This document assumes that the reader has a general understanding of the
components and node types in an HDFS cluster. Please refer to the
HDFS Architecture guide for details.
@ -44,7 +44,7 @@ HDFS High Availability Using the Quorum Journal Manager
an HDFS cluster. Each cluster had a single NameNode, and if that machine or
process became unavailable, the cluster as a whole would be unavailable
until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways:
* In the case of an unplanned event such as a machine crash, the cluster would
@ -52,7 +52,7 @@ HDFS High Availability Using the Quorum Journal Manager
* Planned maintenance events such as software or hardware upgrades on the
NameNode machine would result in windows of cluster downtime.
The HDFS High Availability feature addresses the above problems by providing
the option of running two redundant NameNodes in the same cluster in an
Active/Passive configuration with a hot standby. This allows a fast failover to
@ -67,7 +67,7 @@ HDFS High Availability Using the Quorum Journal Manager
for all client operations in the cluster, while the Standby is simply acting
as a slave, maintaining enough state to provide a fast failover if
necessary.
In order for the Standby node to keep its state synchronized with the Active
node, both nodes communicate with a group of separate daemons called
"JournalNodes" (JNs). When any namespace modification is performed by the
@ -78,12 +78,12 @@ HDFS High Availability Using the Quorum Journal Manager
failover, the Standby will ensure that it has read all of the edits from the
JounalNodes before promoting itself to the Active state. This ensures that the
namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby node
have up-to-date information regarding the location of blocks in the cluster.
In order to achieve this, the DataNodes are configured with the location of
both NameNodes, and send block location information and heartbeats to both.
It is vital for the correct operation of an HA cluster that only one of the
NameNodes be Active at a time. Otherwise, the namespace state would quickly
diverge between the two, risking data loss or other incorrect results. In
@ -113,7 +113,7 @@ HDFS High Availability Using the Quorum Journal Manager
you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when
running with N JournalNodes, the system can tolerate at most (N - 1) / 2
failures and continue to function normally.
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of
the namespace state, and thus it is not necessary to run a Secondary NameNode,
CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an
@ -130,7 +130,7 @@ HDFS High Availability Using the Quorum Journal Manager
The new configuration is designed such that all the nodes in the cluster may
have the same configuration without the need for deploying different
configuration files to different machines based on the type of the node.
Like HDFS Federation, HA clusters reuse the <<<nameservice ID>>> to identify a
single HDFS instance that may in fact consist of multiple HA NameNodes. In
addition, a new abstraction called <<<NameNode ID>>> is added with HA. Each
@ -347,7 +347,7 @@ HDFS High Availability Using the Quorum Journal Manager
<<dfs_namenode_rpc-address>> will contain the RPC address of the target node, even
though the configuration may specify that variable as
<<dfs.namenode.rpc-address.ns1.nn1>>.
Additionally, the following variables referring to the target node to be fenced
are also available:
@ -362,7 +362,7 @@ HDFS High Availability Using the Quorum Journal Manager
*-----------------------:-----------------------------------+
| $target_namenodeid | the namenode ID of the NN to be fenced |
*-----------------------:-----------------------------------+
These environment variables may also be used as substitutions in the shell
command itself. For example:
@ -372,7 +372,7 @@ HDFS High Availability Using the Quorum Journal Manager
<value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
---
If the shell command returns an exit
code of 0, the fencing is determined to be successful. If it returns any other
exit code, the fencing was not successful and the next fencing method in the
@ -424,7 +424,7 @@ HDFS High Availability Using the Quorum Journal Manager
* If you are setting up a fresh HDFS cluster, you should first run the format
command (<hdfs namenode -format>) on one of the NameNodes.
* If you have already formatted the NameNode, or are converting a
non-HA-enabled cluster to be HA-enabled, you should now copy over the
contents of your NameNode metadata directories to the other, unformatted
@ -432,7 +432,7 @@ HDFS High Availability Using the Quorum Journal Manager
unformatted NameNode. Running this command will also ensure that the
JournalNodes (as configured by <<dfs.namenode.shared.edits.dir>>) contain
sufficient edits transactions to be able to start both NameNodes.
* If you are converting a non-HA NameNode to be HA, you should run the
command "<hdfs namenode -initializeSharedEdits>", which will initialize the
JournalNodes with the edits data from the local NameNode edits directories
(these commands are summarized in the sketch below).
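Purely as a summary sketch of the steps above (which host plays which role is your
choice; run each command on the host indicated in the comment):

----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format                 # fresh cluster: on the first NameNode only
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -bootstrapStandby       # on the other, unformatted NameNode
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -initializeSharedEdits  # converting a non-HA NameNode: on the existing NameNode
----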
@ -522,7 +522,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
of coordination data, notifying clients of changes in that data, and
monitoring clients for failures. The implementation of automatic HDFS failover
relies on ZooKeeper for the following things:
* <<Failure detection>> - each of the NameNode machines in the cluster
maintains a persistent session in ZooKeeper. If the machine crashes, the
ZooKeeper session will expire, notifying the other NameNode that a failover
@ -623,7 +623,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
from one of the NameNode hosts.
----
$ hdfs zkfc -formatZK
[hdfs]$ $HADOOP_PREFIX/bin/hdfs zkfc -formatZK
----
This will create a znode in ZooKeeper inside of which the automatic failover
@ -643,7 +643,7 @@ $ hdfs zkfc -formatZK
can start the daemon by running:
----
$ hadoop-daemon.sh start zkfc
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start zkfc
----
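Once the ZKFC daemons are running on both NameNode hosts, one of the NameNodes should
automatically become active. As a quick check (where <<<nn1>>> is one of your configured
NameNode IDs), the following prints either "active" or "standby":

----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs haadmin -getServiceState nn1
----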
** Securing access to ZooKeeper
@ -684,7 +684,7 @@ digest:hdfs-zkfcs:mypassword
a command like the following:
----
$ java -cp $ZK_HOME/lib/*:$ZK_HOME/zookeeper-3.4.2.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider hdfs-zkfcs:mypassword
[hdfs]$ java -cp $ZK_HOME/lib/*:$ZK_HOME/zookeeper-3.4.2.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider hdfs-zkfcs:mypassword
output: hdfs-zkfcs:mypassword->hdfs-zkfcs:P/OQvnYyU/nF/mGYvB/xurX8dYs=
----
@ -786,18 +786,18 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
operations, the operation will fail.
[[3]] Start one of the NNs with the <<<'-upgrade'>>> flag.
[[4]] On start, this NN will not enter the standby state as usual in an HA
setup. Rather, this NN will immediately enter the active state, perform an
upgrade of its local storage dirs, and also perform an upgrade of the shared
edit log.
[[5]] At this point the other NN in the HA pair will be out of sync with
the upgraded NN. In order to bring it back in sync and once again have a highly
available setup, you should re-bootstrap this NameNode by running the NN with
the <<<'-bootstrapStandby'>>> flag. It is an error to start this second NN with
the <<<'-upgrade'>>> flag.
Note that if at any time you want to restart the NameNodes before finalizing
or rolling back the upgrade, you should start the NNs as normal, i.e. without
any special startup flag.
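As an illustrative sketch of the commands involved in the upgrade steps above (the
NameNodes may equally be started through your usual service scripts; only the flags
matter here):

----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -upgrade           # on the first NN: becomes active, upgrades local and shared edits
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -bootstrapStandby  # on the second NN: re-sync with the upgraded NN
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -finalizeUpgrade   # later, once you are satisfied the upgrade is good
----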
View File
@ -36,15 +36,15 @@ HDFS NFS Gateway
HDFS file system.
* Users can stream data directly to HDFS through the mount point. File
append is supported but random write is not supported.
append is supported but random write is not supported.
The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and a HADOOP_CONF directory.
The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
* {Configuration}
The NFS-gateway uses a proxy user to proxy all the users accessing the NFS mounts.
The NFS-gateway uses a proxy user to proxy all the users accessing the NFS mounts.
In non-secure mode, the user running the gateway is the proxy user, while in secure mode the
user in Kerberos keytab is the proxy user. Suppose the proxy user is 'nfsserver'
and users belonging to the groups 'users-group1'
@ -57,10 +57,10 @@ HDFS NFS Gateway
<name>hadoop.proxyuser.nfsserver.groups</name>
<value>root,users-group1,users-group2</value>
<description>
The 'nfsserver' user is allowed to proxy all members of the 'users-group1' and
The 'nfsserver' user is allowed to proxy all members of the 'users-group1' and
'users-group2' groups. Note that in most cases you will need to include the
group "root" because the user "root" (which usually belonges to "root" group) will
generally be the user that initially executes the mount on the NFS client system.
generally be the user that initially executes the mount on the NFS client system.
Set this to '*' to allow nfsserver user to proxy any group.
</description>
</property>
@ -78,7 +78,7 @@ HDFS NFS Gateway
----
The above is the only configuration required for the NFS gateway in non-secure mode. For Kerberized
hadoop clusters, the following configurations need to be added to hdfs-site.xml for the gateway (NOTE: replace
hadoop clusters, the following configurations need to be added to hdfs-site.xml for the gateway (NOTE: replace
string "nfsserver" with the proxy user name and ensure the user contained in the keytab is
also the same proxy user):
@ -95,7 +95,7 @@ HDFS NFS Gateway
<value>nfsserver/_HOST@YOUR-REALM.COM</value>
</property>
----
The rest of the NFS gateway configurations are optional for both secure and non-secure mode.
The AIX NFS client has a {{{https://issues.apache.org/jira/browse/HDFS-6549}few known issues}}
@ -119,11 +119,11 @@ HDFS NFS Gateway
It is strongly recommended that users update a few configuration properties based on their use
cases. All the following configuration properties can be added or updated in hdfs-site.xml.
* If the client mounts the export with access time update allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to restart after
* If the client mounts the export with access time update allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to restart after
this property is changed. On some Unix systems, the user can disable access time update
by mounting the export with "noatime". If the export is mounted with "noatime", the user
by mounting the export with "noatime". If the export is mounted with "noatime", the user
does not need to change the following property, and thus there is no need to restart the NameNode.
----
@ -149,11 +149,11 @@ HDFS NFS Gateway
this property is updated.
----
<property>
<property>
<name>nfs.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
----
----
* By default, the export can be mounted by any client. To better control the access,
users can update the following property. The value string contains machine name and
@ -161,7 +161,7 @@ HDFS NFS Gateway
characters. The machine name format can be a single host, a Java regular expression, or an IPv4 address. The
access privilege uses rw or ro to specify read/write or read-only access of the machines to exports. If the access
privilege is not provided, the default is read-only. Entries are separated by ";".
For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;". Only the NFS gateway needs to restart after
For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;". Only the NFS gateway needs to restart after
this property is updated.
----
@ -171,22 +171,22 @@ HDFS NFS Gateway
</property>
----
* JVM and log settings. You can export JVM settings (e.g., heap size and GC log) in
HADOOP_NFS3_OPTS. More NFS related settings can be found in hadoop-env.sh.
To get NFS debug trace, you can edit the log4j.properties file
* JVM and log settings. You can export JVM settings (e.g., heap size and GC log) in
HADOOP_NFS3_OPTS; an example is given after the log settings below. More NFS-related
settings can be found in hadoop-env.sh.
To get NFS debug trace, you can edit the log4j.properties file
to add the following. Note, debug trace, especially for ONCRPC, can be very verbose.
To change logging level:
-----------------------------------------------
-----------------------------------------------
log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
-----------------------------------------------
-----------------------------------------------
To get more details of ONCRPC requests:
-----------------------------------------------
-----------------------------------------------
log4j.logger.org.apache.hadoop.oncrpc=DEBUG
-----------------------------------------------
-----------------------------------------------
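As an example only (the heap size and GC log path below are arbitrary values, not
recommendations), the HADOOP_NFS3_OPTS variable mentioned above can be exported from
hadoop-env.sh like this:

-----------------------------------------------
export HADOOP_NFS3_OPTS="-Xmx2048m -Xloggc:/var/log/hadoop/nfs3-gc.log"
-----------------------------------------------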
* {Start and stop NFS gateway service}
@ -195,53 +195,39 @@ HDFS NFS Gateway
The NFS gateway process has both nfsd and mountd. It shares the HDFS root "/" as the
only export. It is recommended to use the portmap included in the NFS gateway package. Even
though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
{{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}. More detailed discussions can
be found in {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.
[[1]] Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-------------------------
service nfs stop
service rpcbind stop
-------------------------
[[2]] Start package included portmap (needs root privileges):
[[1]] Stop nfsv3 and rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-------------------------
hdfs portmap
OR
[root]> service nfs stop
[root]> service rpcbind stop
-------------------------
hadoop-daemon.sh start portmap
[[2]] Start Hadoop's portmap (needs root privileges):
-------------------------
[root]> $HADOOP_PREFIX/bin/hdfs --daemon start portmap
-------------------------
[[3]] Start mountd and nfsd.
No root privileges are required for this command. In non-secure mode, the NFS gateway
should be started by the proxy user mentioned at the beginning of this user guide.
While in secure mode, any user can start NFS gateway
should be started by the proxy user mentioned at the beginning of this user guide.
While in secure mode, any user can start NFS gateway
as long as the user has read access to the Kerberos keytab defined in "nfs.keytab.file".
-------------------------
hdfs nfs3
OR
hadoop-daemon.sh start nfs3
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start nfs3
-------------------------
Note, if the hadoop-daemon.sh script starts the NFS gateway, its log can be found in the hadoop log folder.
[[4]] Stop NFS gateway services.
-------------------------
hadoop-daemon.sh stop nfs3
hadoop-daemon.sh stop portmap
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop nfs3
[root]> $HADOOP_PREFIX/bin/hdfs --daemon stop portmap
-------------------------
Optionally, you can forgo running the Hadoop-provided portmap daemon and
@ -263,7 +249,7 @@ HDFS NFS Gateway
[[1]] Execute the following command to verify if all the services are up and running:
-------------------------
rpcinfo -p $nfs_server_ip
[root]> rpcinfo -p $nfs_server_ip
-------------------------
You should see output similar to the following:
@ -293,11 +279,11 @@ HDFS NFS Gateway
[[2]] Verify if the HDFS namespace is exported and can be mounted.
-------------------------
showmount -e $nfs_server_ip
[root]> showmount -e $nfs_server_ip
-------------------------
You should see output similar to the following:
-------------------------
Exports list on $nfs_server_ip :
@ -307,22 +293,22 @@ HDFS NFS Gateway
* {Mount the export “/”}
Currently NFS v3 only uses TCP as the transport protocol.
Currently NFS v3 only uses TCP as the transport protocol.
NLM is not supported so mount option "nolock" is needed. It's recommended to use
hard mount. This is because, even after the client sends all data to
the NFS gateway, it may take the NFS gateway some extra time to transfer data to HDFS
hard mount. This is because, even after the client sends all data to
the NFS gateway, it may take the NFS gateway some extra time to transfer data to HDFS
when writes were reordered by the NFS client kernel.
If a soft mount has to be used, the user should give it a relatively
If a soft mount has to be used, the user should give it a relatively
long timeout (at least no less than the default timeout on the host).
The users can mount the HDFS namespace as shown below:
-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/ $mount_point
-------------------------------------------------------------------
[root]> mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/ $mount_point
-------------------------------------------------------------------
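Equivalently (the gateway host name and the mount point <<</hdfs_mount>>> below are purely
illustrative), the same mount can be made persistent with an /etc/fstab entry such as:

-------------------------------------------------------------------
nfs_server.example.com:/   /hdfs_mount   nfs   vers=3,proto=tcp,nolock,noacl   0 0
-------------------------------------------------------------------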
Then the users can access HDFS as part of the local file system, except that
Then the users can access HDFS as part of the local file system, except that
hard link and random write are not supported yet. To optimize the performance
of large file I/O, one can increase the NFS transfer size (rsize and wsize) during mount.
By default, the NFS gateway supports 1MB as the maximum transfer size. For larger data
@ -347,7 +333,7 @@ HDFS NFS Gateway
* {User authentication and mapping}
The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client
accesses the mount point, the NFS client passes the UID to the NFS gateway.
accesses the mount point, the NFS client passes the UID to the NFS gateway.
The NFS gateway does a lookup to find the user name from the UID, and then passes the
username to HDFS along with the HDFS requests.
For example, if the NFS client has current user as "admin", when the user accesses
@ -358,7 +344,7 @@ HDFS NFS Gateway
The system administrator must ensure that the user on NFS client host has the same
name and UID as that on the NFS gateway host. This is usually not a problem if
the same user management system (e.g., LDAP/NIS) is used to create and deploy users on
HDFS nodes and NFS client node. In case the user account is created manually on different hosts, one might need to
HDFS nodes and NFS client node. In case the user account is created manually on different hosts, one might need to
modify UID (e.g., do "usermod -u 123 myusername") on either NFS client or NFS gateway host
in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found
in {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
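For example, to confirm that a given account (here the placeholder <<<myusername>>>) resolves to
the same UID on both sides, compare the output of:

-------------------------
[root]> id -u myusername    # on the NFS client host
[root]> id -u myusername    # on the NFS gateway host; the two numbers should match
-------------------------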