HADOOP-10908. Common needs updates for shell rewrite (aw)

This commit is contained in:
Allen Wittenauer 2015-01-05 14:26:41 -08:00
parent 41d72cbd48
commit 94d342e607
5 changed files with 558 additions and 489 deletions


@ -344,6 +344,8 @@ Trunk (Unreleased)
HADOOP-11397. Can't override HADOOP_IDENT_STRING (Kengo Seki via aw)
HADOOP-10908. Common needs updates for shell rewrite (aw)
OPTIMIZATIONS
HADOOP-7761. Improve the performance of raw comparisons. (todd)


@ -11,83 +11,81 @@
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop Map Reduce Next Generation-${project.version} - Cluster Setup
Hadoop ${project.version} - Cluster Setup
---
---
${maven.build.timestamp}
%{toc|section=1|fromDepth=0}
Hadoop MapReduce Next Generation - Cluster Setup
Hadoop Cluster Setup
* {Purpose}
This document describes how to install, configure and manage non-trivial
This document describes how to install and configure
Hadoop clusters ranging from a few nodes to extremely large clusters
with thousands of nodes.
with thousands of nodes. To play with Hadoop, you may first want to
install it on a single machine (see {{{./SingleCluster.html}Single Node Setup}}).
To play with Hadoop, you may first want to install it on a single
machine (see {{{./SingleCluster.html}Single Node Setup}}).
This document does not cover advanced topics such as {{{./SecureMode.html}Security}} or
High Availability.
* {Prerequisites}
Download a stable version of Hadoop from Apache mirrors.
* Install Java. See the {{{http://wiki.apache.org/hadoop/HadoopJavaVersions}Hadoop Wiki}} for known good versions.
* Download a stable version of Hadoop from Apache mirrors.
* {Installation}
Installing a Hadoop cluster typically involves unpacking the software on all
the machines in the cluster or installing RPMs.
the machines in the cluster or installing it via a packaging system as
appropriate for your operating system. It is important to divide up the hardware
into functions.
Typically one machine in the cluster is designated as the NameNode and
another machine as the ResourceManager, exclusively. These are the masters.
another machine as the ResourceManager, exclusively. These are the masters. Other
services (such as the Web App Proxy Server and MapReduce Job History server) are usually
run either on dedicated hardware or on shared infrastructure, depending upon the load.
The rest of the machines in the cluster act as both DataNode and NodeManager.
These are the slaves.
* {Running Hadoop in Non-Secure Mode}
* {Configuring Hadoop in Non-Secure Mode}
The following sections describe how to configure a Hadoop cluster.
{Configuration Files}
Hadoop configuration is driven by two types of important configuration files:
Hadoop's Java configuration is driven by two types of important configuration files:
* Read-only default configuration - <<<core-default.xml>>>,
<<<hdfs-default.xml>>>, <<<yarn-default.xml>>> and
<<<mapred-default.xml>>>.
* Site-specific configuration - <<conf/core-site.xml>>,
<<conf/hdfs-site.xml>>, <<conf/yarn-site.xml>> and
<<conf/mapred-site.xml>>.
* Site-specific configuration - <<<etc/hadoop/core-site.xml>>>,
<<<etc/hadoop/hdfs-site.xml>>>, <<<etc/hadoop/yarn-site.xml>>> and
<<<etc/hadoop/mapred-site.xml>>>.
Additionally, you can control the Hadoop scripts found in the bin/
directory of the distribution, by setting site-specific values via the
<<conf/hadoop-env.sh>> and <<yarn-env.sh>>.
{Site Configuration}
Additionally, you can control the Hadoop scripts found in the bin/
directory of the distribution, by setting site-specific values via the
<<<etc/hadoop/hadoop-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>>.
To configure the Hadoop cluster you will need to configure the
<<<environment>>> in which the Hadoop daemons execute as well as the
<<<configuration parameters>>> for the Hadoop daemons.
The Hadoop daemons are NameNode/DataNode and ResourceManager/NodeManager.
HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons
are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be
used, then the MapReduce Job History Server will also be running. For
large installations, these are generally running on separate hosts.
** {Configuring Environment of Hadoop Daemons}
Administrators should use the <<conf/hadoop-env.sh>> and
<<conf/yarn-env.sh>> script to do site-specific customization of the
Hadoop daemons' process environment.
Administrators should use the <<<etc/hadoop/hadoop-env.sh>>> and optionally the
<<<etc/hadoop/mapred-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>> scripts to do
site-specific customization of the Hadoop daemons' process environment.
At the very least you should specify the <<<JAVA_HOME>>> so that it is
At the very least, you must specify the <<<JAVA_HOME>>> so that it is
correctly defined on each remote node.
In most cases you should also specify <<<HADOOP_PID_DIR>>> and
<<<HADOOP_SECURE_DN_PID_DIR>>> to point to directories that can only be
written to by the users that are going to run the hadoop daemons.
Otherwise there is the potential for a symlink attack.
Administrators can configure individual daemons using the configuration
options shown below in the table:
@ -114,20 +112,42 @@ Hadoop MapReduce Next Generation - Cluster Setup
statement should be added in hadoop-env.sh:
----
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
----
See <<<etc/hadoop/hadoop-env.sh>>> for other examples.
Other useful configuration parameters that you can customize include:
* <<<HADOOP_LOG_DIR>>> / <<<YARN_LOG_DIR>>> - The directory where the
daemons' log files are stored. They are automatically created if they
don't exist.
* <<<HADOOP_PID_DIR>>> - The directory where the
daemons' process id files are stored.
* <<<HADOOP_HEAPSIZE>>> / <<<YARN_HEAPSIZE>>> - The maximum amount of
heapsize to use, in MB, e.g. if the variable is set to 1000 the heap
will be set to 1000MB. This is used to configure the heap
size for the daemon. By default, the value is 1000. If you want to
configure the values separately for each daemon, you can use the
per-daemon heapsize variables listed in the table below.
* <<<HADOOP_LOG_DIR>>> - The directory where the
daemons' log files are stored. Log files are automatically created
if they don't exist.
* <<<HADOOP_HEAPSIZE_MAX>>> - The maximum amount of
memory to use for the Java heap. Units supported by the JVM
are also supported here. If no unit is present, it will be assumed
the number is in megabytes. By default, Hadoop will let the JVM
determine how much to use. This value can be overridden on
a per-daemon basis using the appropriate <<<_OPTS>>> variable listed above.
For example, setting <<<HADOOP_HEAPSIZE_MAX=1g>>> and
<<<HADOOP_NAMENODE_OPTS="-Xmx5g">>> will configure the NameNode with 5GB heap.
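As a concrete sketch, the example above could be placed in
<<<etc/hadoop/hadoop-env.sh>>> (the values shown are only illustrative):

----
# default heap for all daemons; the JVM accepts units such as m or g
export HADOOP_HEAPSIZE_MAX=1g
# give the NameNode a larger heap via its _OPTS variable
export HADOOP_NAMENODE_OPTS="-Xmx5g"
----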
In most cases, you should specify the <<<HADOOP_PID_DIR>>> and
<<<HADOOP_LOG_DIR>>> directories such that they can only be
written to by the users that are going to run the hadoop daemons.
Otherwise there is the potential for a symlink attack.
It is also traditional to configure <<<HADOOP_PREFIX>>> in the system-wide
shell environment configuration. For example, a simple script inside
<<</etc/profile.d>>>:
---
HADOOP_PREFIX=/path/to/hadoop
export HADOOP_PREFIX
---
*--------------------------------------+--------------------------------------+
|| Daemon || Environment Variable |
@ -141,12 +161,12 @@ Hadoop MapReduce Next Generation - Cluster Setup
| Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_HEAPSIZE |
*--------------------------------------+--------------------------------------+
** {Configuring the Hadoop Daemons in Non-Secure Mode}
** {Configuring the Hadoop Daemons}
This section deals with important parameters to be specified in
the given configuration files:
* <<<conf/core-site.xml>>>
* <<<etc/hadoop/core-site.xml>>>
*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
@ -157,7 +177,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
| | | Size of read/write buffer used in SequenceFiles. |
*-------------------------+-------------------------+------------------------+
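For illustration, a minimal <<<etc/hadoop/core-site.xml>>> covering these
parameters might look as follows (the NameNode URI and buffer size are
placeholder values):

+---+
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn.example.com:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
+---+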
* <<<conf/hdfs-site.xml>>>
* <<<etc/hadoop/hdfs-site.xml>>>
* Configurations for NameNode:
@ -195,7 +215,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
| | | stored in all named directories, typically on different devices. |
*-------------------------+-------------------------+------------------------+
* <<<conf/yarn-site.xml>>>
* <<<etc/hadoop/yarn-site.xml>>>
* Configurations for ResourceManager and NodeManager:
@ -341,9 +361,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
| | | Be careful, set this too small and you will spam the name node. |
*-------------------------+-------------------------+------------------------+
* <<<conf/mapred-site.xml>>>
* <<<etc/hadoop/mapred-site.xml>>>
* Configurations for MapReduce Applications:
@ -395,22 +413,6 @@ Hadoop MapReduce Next Generation - Cluster Setup
| | | Directory where history files are managed by the MR JobHistory Server. |
*-------------------------+-------------------------+------------------------+
* {Hadoop Rack Awareness}
The HDFS and the YARN components are rack-aware.
The NameNode and the ResourceManager obtain the rack information of the
slaves in the cluster by invoking an API <resolve> in an
administrator-configured module.
The API resolves the DNS name (or IP address) to a rack id.
The site-specific module to use can be configured using the configuration
item <<<topology.node.switch.mapping.impl>>>. The default implementation
runs a script/command configured using
<<<topology.script.file.name>>>. If <<<topology.script.file.name>>> is
not set, the rack id </default-rack> is returned for any passed IP address.
* {Monitoring Health of NodeManagers}
Hadoop provides a mechanism by which administrators can configure the
@ -433,7 +435,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
node was healthy is also displayed on the web interface.
The following parameters can be used to control the node health
monitoring script in <<<conf/yarn-site.xml>>>.
monitoring script in <<<etc/hadoop/yarn-site.xml>>>.
*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
@ -465,181 +467,111 @@ Hadoop MapReduce Next Generation - Cluster Setup
disk is either raided or a failure in the boot disk is identified by the
health checker script.
* {Slaves file}
* {Slaves File}
Typically you choose one machine in the cluster to act as the NameNode and
one machine to act as the ResourceManager, exclusively. The rest of the
machines act as both a DataNode and NodeManager and are referred to as
<slaves>.
List all slave hostnames or IP addresses in your <<<etc/hadoop/slaves>>>
file, one per line. Helper scripts (described below) will use the
<<<etc/hadoop/slaves>>> file to run commands on many hosts at once. It is not
used for any of the Java-based Hadoop configuration. In order
to use this functionality, ssh trusts (via either passphraseless ssh or
some other means, such as Kerberos) must be established for the accounts
used to run Hadoop.
List all slave hostnames or IP addresses in your <<<conf/slaves>>> file,
one per line.
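For example, a hypothetical <<<etc/hadoop/slaves>>> file for a four-node
cluster (hostnames are only illustrative):

----
slave1.example.com
slave2.example.com
slave3.example.com
slave4.example.com
----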
* {Hadoop Rack Awareness}
Many Hadoop components are rack-aware and take advantage of the
network topology for performance and safety. Hadoop daemons obtain the
rack information of the slaves in the cluster by invoking an administrator
configured module. See the {{{./RackAwareness.html}Rack Awareness}}
documentation for more specific information.
It is highly recommended to configure rack awareness prior to starting HDFS.
* {Logging}
Hadoop uses Apache log4j via the Apache Commons Logging framework for
logging. Edit the <<<conf/log4j.properties>>> file to customize the
Hadoop uses {{{http://logging.apache.org/log4j/2.x/}Apache log4j}} via the Apache Commons Logging framework for
logging. Edit the <<<etc/hadoop/log4j.properties>>> file to customize the
Hadoop daemons' logging configuration (log-formats and so on).
* {Operating the Hadoop Cluster}
Once all the necessary configuration is complete, distribute the files to the
<<<HADOOP_CONF_DIR>>> directory on all the machines.
<<<HADOOP_CONF_DIR>>> directory on all the machines. This should be the
same directory on all machines.
** Hadoop Startup
To start a Hadoop cluster you will need to start both the HDFS and YARN
cluster.
Format a new distributed filesystem:
----
$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
----
Start the HDFS with the following command, run on the designated NameNode:
----
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
----
Run a script to start DataNodes on all slaves:
----
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
----
Start the YARN with the following command, run on the designated
ResourceManager:
----
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
----
Run a script to start NodeManagers on all slaves:
----
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
----
Start a standalone WebAppProxy server. If multiple servers
are used with load balancing it should be run on each of them:
----
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR
----
Start the MapReduce JobHistory Server with the following command, run on the
designated server:
----
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
----
** Hadoop Shutdown
Stop the NameNode with the following command, run on the designated
NameNode:
----
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
----
Run a script to stop DataNodes on all slaves:
----
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
----
Stop the ResourceManager with the following command, run on the designated
ResourceManager:
----
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
----
Run a script to stop NodeManagers on all slaves:
----
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
----
Stop the WebAppProxy server. If multiple servers are used with load
balancing it should be run on each of them:
----
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR
----
Stop the MapReduce JobHistory Server with the following command, run on the
designated server:
----
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
----
* {Operating the Hadoop Cluster}
Once all the necessary configuration is complete, distribute the files to the
<<<HADOOP_CONF_DIR>>> directory on all the machines.
This section also describes the various Unix users who should be starting the
various components and uses the same Unix accounts and groups used previously:
In general, it is recommended that HDFS and YARN run as separate users.
In the majority of installations, HDFS processes execute as 'hdfs'. YARN
typically uses the 'yarn' account.
** Hadoop Startup
To start a Hadoop cluster you will need to start both the HDFS and YARN
cluster.
Format a new distributed filesystem as <hdfs>:
The first time you bring up HDFS, it must be formatted. Format a new
distributed filesystem as <hdfs>:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
----
Start the HDFS with the following command, run on the designated NameNode
as <hdfs>:
Start the HDFS NameNode with the following command on the
designated node as <hdfs>:
----
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode
----
Run a script to start DataNodes on all slaves as <root> with a special
environment variable <<<HADOOP_SECURE_DN_USER>>> set to <hdfs>:
Start an HDFS DataNode with the following command on each
designated node as <hdfs>:
----
[root]$ HADOOP_SECURE_DN_USER=hdfs $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start datanode
----
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
HDFS processes can be started with a utility script. As <hdfs>:
----
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
----
Start the YARN with the following command, run on the designated
ResourceManager as <yarn>:
----
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
----
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start resourcemanager
----
Run a script to start NodeManagers on all slaves as <yarn>:
Run a script to start a NodeManager on each designated host as <yarn>:
----
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
----
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start nodemanager
----
Start a standalone WebAppProxy server. Run on the WebAppProxy
server as <yarn>. If multiple servers are used with load balancing
it should be run on each of them:
----
[yarn]$ $HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR
----
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start proxyserver
----
Start the MapReduce JobHistory Server with the following command, run on the
designated server as <mapred>:
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
YARN processes can be started with a utility script. As <yarn>:
----
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
----
[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
----
Start the MapReduce JobHistory Server with the following command, run
on the designated server as <mapred>:
----
[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon start historyserver
----
** Hadoop Shutdown
@ -647,42 +579,58 @@ $ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOO
as <hdfs>:
----
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop namenode
----
Run a script to stop DataNodes on all slaves as <root>:
Run a script to stop a DataNode as <hdfs>:
----
[root]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop datanode
----
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
HDFS processes may be stopped with a utility script. As <hdfs>:
----
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
----
Stop the ResourceManager with the following command, run on the designated
ResourceManager as <yarn>:
----
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
----
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop resourcemanager
----
Run a script to stop NodeManagers on all slaves as <yarn>:
Run a script to stop a NodeManager on a slave as <yarn>:
----
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
----
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop nodemanager
----
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
YARN processes can be stopped with a utility script. As <yarn>:
----
[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
----
Stop the WebAppProxy server. Run on the WebAppProxy server as
<yarn>. If multiple servers are used with load balancing it
should be run on each of them:
----
[yarn]$ $HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR
[yarn]$ $HADOOP_PREFIX/bin/yarn stop proxyserver
----
Stop the MapReduce JobHistory Server with the following command, run on the
designated server as <mapred>:
----
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
----
[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon stop historyserver
----
* {Web Interfaces}


@ -21,102 +21,161 @@
%{toc}
Overview
Hadoop Commands Guide
All hadoop commands are invoked by the <<<bin/hadoop>>> script. Running the
hadoop script without any arguments prints the description for all
commands.
* Overview
Usage: <<<hadoop [--config confdir] [--loglevel loglevel] [COMMAND]
[GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
All of the Hadoop commands and subprojects follow the same basic structure:
Hadoop has an option parsing framework that parses generic
options and runs the requested class.
Usage: <<<shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
*--------+---------+
|| FIELD || Description
*-----------------------+---------------+
|| COMMAND_OPTION || Description
| shellcommand | The command of the project being invoked. For example,
| Hadoop common uses <<<hadoop>>>, HDFS uses <<<hdfs>>>,
| and YARN uses <<<yarn>>>.
*---------------+-------------------+
| SHELL_OPTIONS | Options that the shell processes prior to executing Java.
*-----------------------+---------------+
| <<<--config confdir>>>| Overwrites the default Configuration directory. Default is <<<${HADOOP_HOME}/conf>>>.
| COMMAND | Action to perform.
*-----------------------+---------------+
| <<<--loglevel loglevel>>>| Overwrites the log level. Valid log levels are
| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.
| | Default is INFO.
| GENERIC_OPTIONS | The common set of options supported by
| multiple commands.
*-----------------------+---------------+
| GENERIC_OPTIONS | The common set of options supported by multiple commands.
| COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.
| COMMAND_OPTIONS | Various commands with their options are
| described in this documentation for the
| Hadoop common sub-project. HDFS and YARN are
| covered in other documents.
*-----------------------+---------------+
Generic Options
** {Shell Options}
The following options are supported by {{dfsadmin}}, {{fs}}, {{fsck}},
{{job}} and {{fetchdt}}. Applications should implement
{{{../../api/org/apache/hadoop/util/Tool.html}Tool}} to support
GenericOptions.
All of the shell commands will accept a common set of options. For some commands,
these options are ignored. For example, passing <<<--hostnames>>> on a
command that only executes on a single host will be ignored.
*-----------------------+---------------+
|| SHELL_OPTION || Description
*-----------------------+---------------+
| <<<--buildpaths>>> | Enables developer versions of jars.
*-----------------------+---------------+
| <<<--config confdir>>> | Overwrites the default Configuration
| directory. Default is <<<${HADOOP_PREFIX}/conf>>>.
*-----------------------+----------------+
| <<<--daemon mode>>> | If the command supports daemonization (e.g.,
| <<<hdfs namenode>>>), execute in the appropriate
| mode. Supported modes are <<<start>>> to start the
| process in daemon mode, <<<stop>>> to stop the
| process, and <<<status>>> to determine the active
| status of the process. <<<status>>> will return
| an {{{http://refspecs.linuxbase.org/LSB_3.0.0/LSB-generic/LSB-generic/iniscrptact.html}LSB-compliant}} result code.
| If no option is provided, commands that support
| daemonization will run in the foreground.
*-----------------------+---------------+
| <<<--debug>>> | Enables shell level configuration debugging information
*-----------------------+---------------+
| <<<--help>>> | Shell script usage information.
*-----------------------+---------------+
| <<<--hostnames>>> | A space-delimited list of hostnames on which to execute
| a multi-host subcommand. By default, the content of
| the <<<slaves>>> file is used.
*-----------------------+----------------+
| <<<--hosts>>> | A file that contains a list of hostnames on which to execute
| a multi-host subcommand. By default, the content of the
| <<<slaves>>> file is used.
*-----------------------+----------------+
| <<<--loglevel loglevel>>> | Overrides the log level. Valid log levels are
| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.
| | Default is INFO.
*-----------------------+---------------+
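For example, a daemon-capable subcommand such as <<<hdfs namenode>>> can be
started in the background and then queried for its status, which is reported
via an LSB-style exit code (the subcommand here is just an illustration):

* <<<hdfs --daemon start namenode>>>

* <<<hdfs --daemon status namenode>>>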
** {Generic Options}
Many subcommands honor a common set of configuration options to alter their behavior:
*------------------------------------------------+-----------------------------+
|| GENERIC_OPTION || Description
*------------------------------------------------+-----------------------------+
|<<<-conf \<configuration file\> >>> | Specify an application
| configuration file.
*------------------------------------------------+-----------------------------+
|<<<-D \<property\>=\<value\> >>> | Use value for given property.
*------------------------------------------------+-----------------------------+
|<<<-jt \<local\> or \<resourcemanager:port\>>>> | Specify a ResourceManager.
| Applies only to job.
*------------------------------------------------+-----------------------------+
|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
| to be copied to the map
| reduce cluster. Applies only
| to job.
*------------------------------------------------+-----------------------------+
|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
| files to include in the
| classpath. Applies only to
| job.
*------------------------------------------------+-----------------------------+
|<<<-archives \<comma separated list of archives\> >>> | Specify comma separated
| archives to be unarchived on
| the compute machines. Applies
| only to job.
*------------------------------------------------+-----------------------------+
|<<<-conf \<configuration file\> >>> | Specify an application
| configuration file.
*------------------------------------------------+-----------------------------+
|<<<-D \<property\>=\<value\> >>> | Use value for given property.
*------------------------------------------------+-----------------------------+
|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
| to be copied to the map
| reduce cluster. Applies only
| to job.
*------------------------------------------------+-----------------------------+
|<<<-jt \<local\> or \<resourcemanager:port\>>>> | Specify a ResourceManager.
| Applies only to job.
*------------------------------------------------+-----------------------------+
|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
| files to include in the
| classpath. Applies only to
| job.
*------------------------------------------------+-----------------------------+
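As a sketch, these options might be combined on a command line as follows
(the jar, class, property, and file names are only illustrative, and the
invoked class is assumed to implement the Tool interface):

* <<<hadoop fs -D fs.defaultFS=hdfs://nn.example.com:8020 -ls />>>

* <<<hadoop jar app.jar MyTool -files cache.txt -libjars extra.jar input output>>>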
User Commands
Hadoop Common Commands
All of these commands are executed from the <<<hadoop>>> shell command. They
have been broken up into {{User Commands}} and
{{Administration Commands}}.
* User Commands
Commands useful for users of a hadoop cluster.
* <<<archive>>>
** <<<archive>>>
Creates a hadoop archive. More information can be found at
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html}
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html}
Hadoop Archives Guide}}.
* <<<credential>>>
** <<<checknative>>>
Command to manage credentials, passwords and secrets within credential providers.
Usage: <<<hadoop checknative [-a] [-h] >>>
The CredentialProvider API in Hadoop allows for the separation of applications
and how they store their required passwords/secrets. In order to indicate
a particular provider type and location, the user must provide the
<hadoop.security.credential.provider.path> configuration element in core-site.xml
or use the command line option <<<-provider>>> on each of the following commands.
This provider path is a comma-separated list of URLs that indicates the type and
location of a list of providers that should be consulted.
For example, the following path:
*-----------------+-----------------------------------------------------------+
|| COMMAND_OPTION || Description
*-----------------+-----------------------------------------------------------+
| -a | Check all libraries are available.
*-----------------+-----------------------------------------------------------+
| -h | print help
*-----------------+-----------------------------------------------------------+
<<<user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks>>>
This command checks the availability of the Hadoop native code. See the
{{{NativeLibraries.html}Native Libraries Guide}} for more information. By default, this command
only checks the availability of libhadoop.
indicates that the current user's credentials file should be consulted through
the User Provider, that the local file located at <<</tmp/test.jceks>>> is a Java Keystore
Provider and that the file located within HDFS at <<<nn1.example.com/my/path/test.jceks>>>
is also a store for a Java Keystore Provider.
** <<<classpath>>>
When utilizing the credential command it will often be for provisioning a password
or secret to a particular credential store provider. In order to explicitly
indicate which provider store to use the <<<-provider>>> option should be used. Otherwise,
given a path of multiple providers, the first non-transient provider will be used.
This may or may not be the one that you intended.
Usage: <<<hadoop classpath [--glob|--jar <path>|-h|--help]>>>
Example: <<<-provider jceks://file/tmp/test.jceks>>>
*-----------------+-----------------------------------------------------------+
|| COMMAND_OPTION || Description
*-----------------+-----------------------------------------------------------+
| --glob | expand wildcards
*-----------------+-----------------------------------------------------------+
| --jar <path> | write classpath as manifest in jar named <path>
*-----------------+-----------------------------------------------------------+
| -h, --help | print help
*-----------------+-----------------------------------------------------------+
Prints the class path needed to get the Hadoop jar and the required
libraries. If called without arguments, then prints the classpath set up by
the command scripts, which is likely to contain wildcards in the classpath
entries. Additional options print the classpath after wildcard expansion or
write the classpath into the manifest of a jar file. The latter is useful in
environments where wildcards cannot be used and the expanded classpath exceeds
the maximum supported command line length.
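Examples (the jar path is only illustrative):

* <<<hadoop classpath>>>

* <<<hadoop classpath --glob>>>

* <<<hadoop classpath --jar /tmp/hadoop-classpath.jar>>>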
** <<<credential>>>
Usage: <<<hadoop credential <subcommand> [options]>>>
@ -143,109 +202,96 @@ User Commands
| indicated.
*-------------------+-------------------------------------------------------+
* <<<distcp>>>
Command to manage credentials, passwords and secrets within credential providers.
The CredentialProvider API in Hadoop allows for the separation of applications
and how they store their required passwords/secrets. In order to indicate
a particular provider type and location, the user must provide the
<hadoop.security.credential.provider.path> configuration element in core-site.xml
or use the command line option <<<-provider>>> on each of the following commands.
This provider path is a comma-separated list of URLs that indicates the type and
location of a list of providers that should be consulted. For example, the following path:
<<<user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks>>>
indicates that the current user's credentials file should be consulted through
the User Provider, that the local file located at <<</tmp/test.jceks>>> is a Java Keystore
Provider and that the file located within HDFS at <<<nn1.example.com/my/path/test.jceks>>>
is also a store for a Java Keystore Provider.
When utilizing the credential command it will often be for provisioning a password
or secret to a particular credential store provider. In order to explicitly
indicate which provider store to use the <<<-provider>>> option should be used. Otherwise,
given a path of multiple providers, the first non-transient provider will be used.
This may or may not be the one that you intended.
Example: <<<-provider jceks://file/tmp/test.jceks>>>
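As a further sketch, a secret can be provisioned into, and later listed from,
a specific keystore provider using the <<<create>>> and <<<list>>> subcommands
(the alias name is only illustrative):

* <<<hadoop credential create ssl.server.keystore.password -provider jceks://file/tmp/test.jceks>>>

* <<<hadoop credential list -provider jceks://file/tmp/test.jceks>>>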
** <<<distch>>>
Usage: <<<hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions>>>
*-------------------+-------------------------------------------------------+
||COMMAND_OPTION || Description
*-------------------+-------------------------------------------------------+
| -f | List of objects to change
*-------------------+-------------------------------------------------------+
| -i | Ignore failures
*-------------------+-------------------------------------------------------+
| -log | Directory to log output
*-------------------+-------------------------------------------------------+
Change the ownership and permissions on many files at once.
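Example (the path and account names are only illustrative, and the mode is
assumed to be given in octal):

* <<<hadoop distch /user/joe:joe:hadoop:755>>>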
** <<<distcp>>>
Copy file or directories recursively. More information can be found at
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistCp.html}
Hadoop DistCp Guide}}.
* <<<fs>>>
** <<<fs>>>
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#dfs}<<<hdfs dfs>>>}}
instead.
This command is documented in the {{{./FileSystemShell.html}File System Shell Guide}}. It is a synonym for <<<hdfs dfs>>> when HDFS is in use.
* <<<fsck>>>
** <<<jar>>>
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#fsck}<<<hdfs fsck>>>}}
instead.
Usage: <<<hadoop jar <jar> [mainClass] args...>>>
* <<<fetchdt>>>
Runs a jar file.
Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<<yarn jar>>>}}
to launch YARN applications instead.
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#fetchdt}
<<<hdfs fetchdt>>>}} instead.
** <<<jnipath>>>
* <<<jar>>>
Usage: <<<hadoop jnipath>>>
Runs a jar file. Users can bundle their Map Reduce code in a jar file and
execute it using this command.
Print the computed java.library.path.
Usage: <<<hadoop jar <jar> [mainClass] args...>>>
** <<<key>>>
Streaming jobs are run via this command. Examples can be found in the
Streaming documentation.
Manage keys via the KeyProvider.
The word count example is also run using the jar command. It can be found in the
Wordcount example.
** <<<trace>>>
Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<<yarn jar>>>}}
to launch YARN applications instead.
View and modify Hadoop tracing settings. See the {{{./Tracing.html}Tracing Guide}}.
* <<<job>>>
Deprecated. Use
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job}
<<<mapred job>>>}} instead.
* <<<pipes>>>
Deprecated. Use
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#pipes}
<<<mapred pipes>>>}} instead.
* <<<queue>>>
Deprecated. Use
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#queue}
<<<mapred queue>>>}} instead.
* <<<version>>>
Prints the version.
** <<<version>>>
Usage: <<<hadoop version>>>
* <<<CLASSNAME>>>
Prints the version.
The hadoop script can be used to invoke any class.
** <<<CLASSNAME>>>
Usage: <<<hadoop CLASSNAME>>>
Runs the class named <<<CLASSNAME>>>.
Runs the class named <<<CLASSNAME>>>. The class must be part of a package.
* <<<classpath>>>
Prints the class path needed to get the Hadoop jar and the required
libraries. If called without arguments, then prints the classpath set up by
the command scripts, which is likely to contain wildcards in the classpath
entries. Additional options print the classpath after wildcard expansion or
write the classpath into the manifest of a jar file. The latter is useful in
environments where wildcards cannot be used and the expanded classpath exceeds
the maximum supported command line length.
Usage: <<<hadoop classpath [--glob|--jar <path>|-h|--help]>>>
*-----------------+-----------------------------------------------------------+
|| COMMAND_OPTION || Description
*-----------------+-----------------------------------------------------------+
| --glob | expand wildcards
*-----------------+-----------------------------------------------------------+
| --jar <path> | write classpath as manifest in jar named <path>
*-----------------+-----------------------------------------------------------+
| -h, --help | print help
*-----------------+-----------------------------------------------------------+
Administration Commands
* {Administration Commands}
Commands useful for administrators of a hadoop cluster.
* <<<balancer>>>
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#balancer}
<<<hdfs balancer>>>}} instead.
* <<<daemonlog>>>
Get/Set the log level for each daemon.
** <<<daemonlog>>>
Usage: <<<hadoop daemonlog -getlevel <host:port> <name> >>>
Usage: <<<hadoop daemonlog -setlevel <host:port> <name> <level> >>>
@ -262,22 +308,20 @@ Administration Commands
| connects to http://<host:port>/logLevel?log=<name>
*------------------------------+-----------------------------------------------------------+
* <<<datanode>>>
Get/Set the log level for each daemon.
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#datanode}
<<<hdfs datanode>>>}} instead.
* Files
* <<<dfsadmin>>>
** <<etc/hadoop/hadoop-env.sh>>
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#dfsadmin}
<<<hdfs dfsadmin>>>}} instead.
This file stores the global settings used by all Hadoop shell commands.
* <<<namenode>>>
** <<etc/hadoop/hadoop-user-functions.sh>>
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#namenode}
<<<hdfs namenode>>>}} instead.
This file allows advanced users to override some shell functionality.
* <<<secondarynamenode>>>
** <<~/.hadooprc>>
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#secondarynamenode}
<<<hdfs secondarynamenode>>>}} instead.
This stores the personal environment for an individual user. It is
processed after the hadoop-env.sh and hadoop-user-functions.sh files
and can contain the same settings.


@ -45,46 +45,62 @@ bin/hadoop fs <args>
Differences are described with each of the commands. Error information is
sent to stderr and the output is sent to stdout.
appendToFile
If HDFS is being used, <<<hdfs dfs>>> is a synonym.
Usage: <<<hdfs dfs -appendToFile <localsrc> ... <dst> >>>
See the {{{./CommandsManual.html}Commands Manual}} for generic shell options.
* appendToFile
Usage: <<<hadoop fs -appendToFile <localsrc> ... <dst> >>>
Append single src, or multiple srcs from local file system to the
destination file system. Also reads input from stdin and appends to
destination file system.
* <<<hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile>>>
* <<<hadoop fs -appendToFile localfile /user/hadoop/hadoopfile>>>
* <<<hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile>>>
* <<<hadoop fs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile>>>
* <<<hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
* <<<hadoop fs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
* <<<hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile>>>
* <<<hadoop fs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile>>>
Reads the input from stdin.
Exit Code:
Returns 0 on success and 1 on error.
cat
* cat
Usage: <<<hdfs dfs -cat URI [URI ...]>>>
Usage: <<<hadoop fs -cat URI [URI ...]>>>
Copies source paths to stdout.
Example:
* <<<hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
* <<<hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
* <<<hdfs dfs -cat file:///file3 /user/hadoop/file4>>>
* <<<hadoop fs -cat file:///file3 /user/hadoop/file4>>>
Exit Code:
Returns 0 on success and -1 on error.
chgrp
* checksum
Usage: <<<hdfs dfs -chgrp [-R] GROUP URI [URI ...]>>>
Usage: <<<hadoop fs -checksum URI>>>
Returns the checksum information of a file.
Example:
* <<<hadoop fs -checksum hdfs://nn1.example.com/file1>>>
* <<<hadoop fs -checksum file:///etc/hosts>>>
* chgrp
Usage: <<<hadoop fs -chgrp [-R] GROUP URI [URI ...]>>>
Change group association of files. The user must be the owner of files, or
else a super-user. Additional information is in the
@ -94,9 +110,9 @@ chgrp
* The -R option will make the change recursively through the directory structure.
chmod
* chmod
Usage: <<<hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]>>>
Usage: <<<hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]>>>
Change the permissions of files. With -R, make the change recursively
through the directory structure. The user must be the owner of the file, or
@ -107,9 +123,9 @@ chmod
* The -R option will make the change recursively through the directory structure.
chown
* chown
Usage: <<<hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]>>>
Usage: <<<hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ]>>>
Change the owner of files. The user must be a super-user. Additional information
is in the {{{../hadoop-hdfs/HdfsPermissionsGuide.html}Permissions Guide}}.
@ -118,9 +134,9 @@ chown
* The -R option will make the change recursively through the directory structure.
copyFromLocal
* copyFromLocal
Usage: <<<hdfs dfs -copyFromLocal <localsrc> URI>>>
Usage: <<<hadoop fs -copyFromLocal <localsrc> URI>>>
Similar to put command, except that the source is restricted to a local
file reference.
@ -129,16 +145,16 @@ copyFromLocal
* The -f option will overwrite the destination if it already exists.
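Example (paths are only illustrative):

* <<<hadoop fs -copyFromLocal localfile /user/hadoop/hadoopfile>>>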
copyToLocal
* copyToLocal
Usage: <<<hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst> >>>
Usage: <<<hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst> >>>
Similar to get command, except that the destination is restricted to a
local file reference.
count
* count
Usage: <<<hdfs dfs -count [-q] [-h] <paths> >>>
Usage: <<<hadoop fs -count [-q] [-h] <paths> >>>
Count the number of directories, files and bytes under the paths that match
the specified file pattern. The output columns with -count are: DIR_COUNT,
@ -151,19 +167,19 @@ count
Example:
* <<<hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
* <<<hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
* <<<hdfs dfs -count -q hdfs://nn1.example.com/file1>>>
* <<<hadoop fs -count -q hdfs://nn1.example.com/file1>>>
* <<<hdfs dfs -count -q -h hdfs://nn1.example.com/file1>>>
* <<<hadoop fs -count -q -h hdfs://nn1.example.com/file1>>>
Exit Code:
Returns 0 on success and -1 on error.
cp
* cp
Usage: <<<hdfs dfs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest> >>>
Usage: <<<hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest> >>>
Copy files from source to destination. This command allows multiple sources
as well in which case the destination must be a directory.
@ -177,7 +193,7 @@ cp
Options:
* The -f option will overwrite the destination if it already exists.
* The -p option will preserve file attributes [topx] (timestamps,
ownership, permission, ACL, XAttr). If -p is specified with no <arg>,
then preserves timestamps, ownership, permission. If -pa is specified,
@ -187,17 +203,41 @@ cp
Example:
* <<<hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2>>>
* <<<hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2>>>
* <<<hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir>>>
* <<<hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir>>>
Exit Code:
Returns 0 on success and -1 on error.
du
* createSnapshot
Usage: <<<hdfs dfs -du [-s] [-h] URI [URI ...]>>>
See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}.
* deleteSnapshot
See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}.
* df
Usage: <<<hadoop fs -df [-h] URI [URI ...]>>>
Displays free space.
Options:
* The -h option will format file sizes in a "human-readable" fashion (e.g.
64.0m instead of 67108864)
Example:
* <<<hadoop fs -df /user/hadoop/dir1>>>
* du
Usage: <<<hadoop fs -du [-s] [-h] URI [URI ...]>>>
Displays sizes of files and directories contained in the given directory or
the length of a file in case it's just a file.
@ -212,29 +252,29 @@ du
Example:
* hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1
* <<<hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1>>>
Exit Code:
Returns 0 on success and -1 on error.
dus
* dus
Usage: <<<hdfs dfs -dus <args> >>>
Usage: <<<hadoop fs -dus <args> >>>
Displays a summary of file lengths.
<<Note:>> This command is deprecated. Instead use <<<hdfs dfs -du -s>>>.
<<Note:>> This command is deprecated. Instead use <<<hadoop fs -du -s>>>.
expunge
* expunge
Usage: <<<hdfs dfs -expunge>>>
Usage: <<<hadoop fs -expunge>>>
Empty the Trash. Refer to the {{{../hadoop-hdfs/HdfsDesign.html}
HDFS Architecture Guide}} for more information on the Trash feature.
find
* find
Usage: <<<hdfs dfs -find <path> ... <expression> ... >>>
Usage: <<<hadoop fs -find <path> ... <expression> ... >>>
Finds all files that match the specified expression and applies selected
actions to them. If no <path> is specified then defaults to the current
@ -269,15 +309,15 @@ find
Example:
<<<hdfs dfs -find / -name test -print>>>
<<<hadoop fs -find / -name test -print>>>
Exit Code:
Returns 0 on success and -1 on error.
get
* get
Usage: <<<hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst> >>>
Usage: <<<hadoop fs -get [-ignorecrc] [-crc] <src> <localdst> >>>
Copy files to the local file system. Files that fail the CRC check may be
copied with the -ignorecrc option. Files and CRCs may be copied using the
@ -285,17 +325,17 @@ get
Example:
* <<<hdfs dfs -get /user/hadoop/file localfile>>>
* <<<hadoop fs -get /user/hadoop/file localfile>>>
* <<<hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile>>>
* <<<hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile>>>
Exit Code:
Returns 0 on success and -1 on error.
getfacl
* getfacl
Usage: <<<hdfs dfs -getfacl [-R] <path> >>>
Usage: <<<hadoop fs -getfacl [-R] <path> >>>
Displays the Access Control Lists (ACLs) of files and directories. If a
directory has a default ACL, then getfacl also displays the default ACL.
@ -308,17 +348,17 @@ getfacl
Examples:
* <<<hdfs dfs -getfacl /file>>>
* <<<hadoop fs -getfacl /file>>>
* <<<hdfs dfs -getfacl -R /dir>>>
* <<<hadoop fs -getfacl -R /dir>>>
Exit Code:
Returns 0 on success and non-zero on error.
getfattr
* getfattr
Usage: <<<hdfs dfs -getfattr [-R] {-n name | -d} [-e en] <path> >>>
Usage: <<<hadoop fs -getfattr [-R] {-n name | -d} [-e en] <path> >>>
Displays the extended attribute names and values (if any) for a file or
directory.
@ -337,26 +377,32 @@ getfattr
Examples:
* <<<hdfs dfs -getfattr -d /file>>>
* <<<hadoop fs -getfattr -d /file>>>
* <<<hdfs dfs -getfattr -R -n user.myAttr /dir>>>
* <<<hadoop fs -getfattr -R -n user.myAttr /dir>>>
Exit Code:
Returns 0 on success and non-zero on error.
getmerge
* getmerge
Usage: <<<hdfs dfs -getmerge <src> <localdst> [addnl]>>>
Usage: <<<hadoop fs -getmerge <src> <localdst> [addnl]>>>
Takes a source directory and a destination file as input and concatenates
files in src into the destination local file. Optionally addnl can be set to
enable adding a newline character at the
end of each file.
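Example (paths are only illustrative):

* <<<hadoop fs -getmerge /user/hadoop/dir1 ./dir1-merged.txt>>>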
ls
* help
Usage: <<<hdfs dfs -ls [-R] <args> >>>
Usage: <<<hadoop fs -help>>>
Return usage output.
* ls
Usage: <<<hadoop fs -ls [-R] <args> >>>
Options:
@ -377,23 +423,23 @@ permissions userid groupid modification_date modification_time dirname
Example:
* <<<hdfs dfs -ls /user/hadoop/file1>>>
* <<<hadoop fs -ls /user/hadoop/file1>>>
Exit Code:
Returns 0 on success and -1 on error.
lsr
* lsr
Usage: <<<hdfs dfs -lsr <args> >>>
Usage: <<<hadoop fs -lsr <args> >>>
Recursive version of ls.
<<Note:>> This command is deprecated. Instead use <<<hdfs dfs -ls -R>>>
<<Note:>> This command is deprecated. Instead use <<<hadoop fs -ls -R>>>
mkdir
* mkdir
Usage: <<<hdfs dfs -mkdir [-p] <paths> >>>
Usage: <<<hadoop fs -mkdir [-p] <paths> >>>
Takes path uri's as argument and creates directories.
@ -403,30 +449,30 @@ mkdir
Example:
* <<<hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2>>>
* <<<hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2>>>
* <<<hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir>>>
* <<<hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir>>>
Exit Code:
Returns 0 on success and -1 on error.
moveFromLocal
* moveFromLocal
Usage: <<<hdfs dfs -moveFromLocal <localsrc> <dst> >>>
Usage: <<<hadoop fs -moveFromLocal <localsrc> <dst> >>>
Similar to put command, except that the source localsrc is deleted after
it's copied.
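Example (paths are only illustrative):

* <<<hadoop fs -moveFromLocal localfile /user/hadoop/hadoopfile>>>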
moveToLocal
* moveToLocal
Usage: <<<hdfs dfs -moveToLocal [-crc] <src> <dst> >>>
Usage: <<<hadoop fs -moveToLocal [-crc] <src> <dst> >>>
Displays a "Not implemented yet" message.
mv
* mv
Usage: <<<hdfs dfs -mv URI [URI ...] <dest> >>>
Usage: <<<hadoop fs -mv URI [URI ...] <dest> >>>
Moves files from source to destination. This command allows multiple sources
as well in which case the destination needs to be a directory. Moving files
@ -434,38 +480,42 @@ mv
Example:
* <<<hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2>>>
* <<<hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2>>>
* <<<hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1>>>
* <<<hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1>>>
Exit Code:
Returns 0 on success and -1 on error.
put
* put
Usage: <<<hdfs dfs -put <localsrc> ... <dst> >>>
Usage: <<<hadoop fs -put <localsrc> ... <dst> >>>
Copy single src, or multiple srcs from local file system to the destination
file system. Also reads input from stdin and writes to destination file
system.
* <<<hdfs dfs -put localfile /user/hadoop/hadoopfile>>>
* <<<hadoop fs -put localfile /user/hadoop/hadoopfile>>>
* <<<hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir>>>
* <<<hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir>>>
* <<<hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
* <<<hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
* <<<hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile>>>
* <<<hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile>>>
Reads the input from stdin.
Exit Code:
Returns 0 on success and -1 on error.
rm
* renameSnapshot
Usage: <<<hdfs dfs -rm [-f] [-r|-R] [-skipTrash] URI [URI ...]>>>
See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}.
* rm
Usage: <<<hadoop fs -rm [-f] [-r|-R] [-skipTrash] URI [URI ...]>>>
Delete files specified as args.
@ -484,23 +534,37 @@ rm
Example:
* <<<hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir>>>
* <<<hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir>>>
Exit Code:
Returns 0 on success and -1 on error.
rmr
* rmdir
Usage: <<<hdfs dfs -rmr [-skipTrash] URI [URI ...]>>>
Usage: <<<hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]>>>
Delete a directory.
Options:
* --ignore-fail-on-non-empty: When using wildcards, do not fail if a directory still contains files.
Example:
* <<<hadoop fs -rmdir /user/hadoop/emptydir>>>
* rmr
Usage: <<<hadoop fs -rmr [-skipTrash] URI [URI ...]>>>
Recursive version of delete.
<<Note:>> This command is deprecated. Instead use <<<hdfs dfs -rm -r>>>
<<Note:>> This command is deprecated. Instead use <<<hadoop fs -rm -r>>>
setfacl
* setfacl
Usage: <<<hdfs dfs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] >>>
Usage: <<<hadoop fs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] >>>
Sets Access Control Lists (ACLs) of files and directories.
@ -528,27 +592,27 @@ setfacl
Examples:
* <<<hdfs dfs -setfacl -m user:hadoop:rw- /file>>>
* <<<hadoop fs -setfacl -m user:hadoop:rw- /file>>>
* <<<hdfs dfs -setfacl -x user:hadoop /file>>>
* <<<hadoop fs -setfacl -x user:hadoop /file>>>
* <<<hdfs dfs -setfacl -b /file>>>
* <<<hadoop fs -setfacl -b /file>>>
* <<<hdfs dfs -setfacl -k /dir>>>
* <<<hadoop fs -setfacl -k /dir>>>
* <<<hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file>>>
* <<<hadoop fs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file>>>
* <<<hdfs dfs -setfacl -R -m user:hadoop:r-x /dir>>>
* <<<hadoop fs -setfacl -R -m user:hadoop:r-x /dir>>>
* <<<hdfs dfs -setfacl -m default:user:hadoop:r-x /dir>>>
* <<<hadoop fs -setfacl -m default:user:hadoop:r-x /dir>>>
Exit Code:
Returns 0 on success and non-zero on error.
setfattr
* setfattr
Usage: <<<hdfs dfs -setfattr {-n name [-v value] | -x name} <path> >>>
Usage: <<<hadoop fs -setfattr {-n name [-v value] | -x name} <path> >>>
Sets an extended attribute name and value for a file or directory.
@ -566,19 +630,19 @@ setfattr
Examples:
* <<<hdfs dfs -setfattr -n user.myAttr -v myValue /file>>>
* <<<hadoop fs -setfattr -n user.myAttr -v myValue /file>>>
* <<<hdfs dfs -setfattr -n user.noValue /file>>>
* <<<hadoop fs -setfattr -n user.noValue /file>>>
* <<<hdfs dfs -setfattr -x user.myAttr /file>>>
* <<<hadoop fs -setfattr -x user.myAttr /file>>>
Exit Code:
Returns 0 on success and non-zero on error.
setrep
* setrep
Usage: <<<hdfs dfs -setrep [-R] [-w] <numReplicas> <path> >>>
Usage: <<<hadoop fs -setrep [-R] [-w] <numReplicas> <path> >>>
Changes the replication factor of a file. If <path> is a directory then
the command recursively changes the replication factor of all files under
@ -593,28 +657,28 @@ setrep
Example:
* <<<hdfs dfs -setrep -w 3 /user/hadoop/dir1>>>
* <<<hadoop fs -setrep -w 3 /user/hadoop/dir1>>>
Exit Code:
Returns 0 on success and -1 on error.
stat
* stat
Usage: <<<hdfs dfs -stat URI [URI ...]>>>
Usage: <<<hadoop fs -stat URI [URI ...]>>>
Returns the stat information on the path.
Example:
* <<<hdfs dfs -stat path>>>
* <<<hadoop fs -stat path>>>
Exit Code:
Returns 0 on success and -1 on error.
tail
* tail
Usage: <<<hdfs dfs -tail [-f] URI>>>
Usage: <<<hadoop fs -tail [-f] URI>>>
Displays last kilobyte of the file to stdout.
@ -624,43 +688,54 @@ tail
Example:
* <<<hdfs dfs -tail pathname>>>
* <<<hadoop fs -tail pathname>>>
Exit Code:
Returns 0 on success and -1 on error.
test
* test
Usage: <<<hdfs dfs -test -[ezd] URI>>>
Usage: <<<hadoop fs -test -[defsz] URI>>>
Options:
* The -e option will check to see if the file exists, returning 0 if true.
* -d: if the path is a directory, return 0.
* The -z option will check to see if the file is zero length, returning 0 if true.
* -e: if the path exists, return 0.
* The -d option will check to see if the path is directory, returning 0 if true.
* -f: if the path is a file, return 0.
* -s: if the path is not empty, return 0.
* -z: if the file is zero length, return 0.
Example:
* <<<hdfs dfs -test -e filename>>>
* <<<hadoop fs -test -e filename>>>
text
* text
Usage: <<<hdfs dfs -text <src> >>>
Usage: <<<hadoop fs -text <src> >>>
Takes a source file and outputs the file in text format. The allowed formats
are zip and TextRecordInputStream.
touchz
* touchz
Usage: <<<hdfs dfs -touchz URI [URI ...]>>>
Usage: <<<hadoop fs -touchz URI [URI ...]>>>
Create a file of zero length.
Example:
* <<<hdfs dfs -touchz pathname>>>
* <<<hadoop fs -touchz pathname>>>
Exit Code:
Returns 0 on success and -1 on error.
* usage
Usage: <<<hadoop fs -usage command>>>
Return the help for an individual command.


@ -11,12 +11,12 @@
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
Hadoop ${project.version} - Setting up a Single Node Cluster.
---
---
${maven.build.timestamp}
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
Hadoop - Setting up a Single Node Cluster.
%{toc|section=1|fromDepth=0}
@ -46,7 +46,9 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
HadoopJavaVersions}}.
[[2]] ssh must be installed and sshd must be running to use the Hadoop
scripts that manage remote Hadoop daemons.
scripts that manage remote Hadoop daemons if the optional start
and stop scripts are to be used. Additionally, it is recommended that
pdsh also be installed for better ssh resource management.
** Installing Software
@ -57,7 +59,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
----
$ sudo apt-get install ssh
$ sudo apt-get install rsync
$ sudo apt-get install pdsh
----
* Download
@ -75,9 +77,6 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
----
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
----
Try the following command:
@ -158,6 +157,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
----
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0700 ~/.ssh/authorized_keys
----
** Execution
@ -228,7 +228,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
$ sbin/stop-dfs.sh
----
** YARN on Single Node
** YARN on a Single Node
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting
a few parameters and running ResourceManager daemon and NodeManager daemon
@ -239,7 +239,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
[[1]] Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<<<etc/hadoop/mapred-site.xml>>>:
+---+
<configuration>
@ -250,7 +250,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
</configuration>
+---+
etc/hadoop/yarn-site.xml:
<<<etc/hadoop/yarn-site.xml>>>:
+---+
<configuration>