HADOOP-10908. Common needs updates for shell rewrite (aw)
parent
41d72cbd48
commit
94d342e607
|
@ -344,6 +344,8 @@ Trunk (Unreleased)
|
|||
|
||||
HADOOP-11397. Can't override HADOOP_IDENT_STRING (Kengo Seki via aw)
|
||||
|
||||
HADOOP-10908. Common needs updates for shell rewrite (aw)
|
||||
|
||||
OPTIMIZATIONS
|
||||
|
||||
HADOOP-7761. Improve the performance of raw comparisons. (todd)
|
||||
|
|
|
@ -11,83 +11,81 @@
|
|||
~~ limitations under the License. See accompanying LICENSE file.
|
||||
|
||||
---
|
||||
Hadoop Map Reduce Next Generation-${project.version} - Cluster Setup
|
||||
Hadoop ${project.version} - Cluster Setup
|
||||
---
|
||||
---
|
||||
${maven.build.timestamp}
|
||||
|
||||
%{toc|section=1|fromDepth=0}
|
||||
|
||||
Hadoop MapReduce Next Generation - Cluster Setup
|
||||
Hadoop Cluster Setup
|
||||
|
||||
* {Purpose}
|
||||
|
||||
This document describes how to install, configure and manage non-trivial
|
||||
This document describes how to install and configure
|
||||
Hadoop clusters ranging from a few nodes to extremely large clusters
|
||||
with thousands of nodes.
|
||||
with thousands of nodes. To play with Hadoop, you may first want to
|
||||
install it on a single machine (see {{{./SingleCluster.html}Single Node Setup}}).
|
||||
|
||||
To play with Hadoop, you may first want to install it on a single
|
||||
machine (see {{{./SingleCluster.html}Single Node Setup}}).
|
||||
This document does not cover advanced topics such as {{{./SecureMode.html}Security}} or
|
||||
High Availability.
|
||||
|
||||
* {Prerequisites}
|
||||
|
||||
Download a stable version of Hadoop from Apache mirrors.
|
||||
* Install Java. See the {{{http://wiki.apache.org/hadoop/HadoopJavaVersions}Hadoop Wiki}} for known good versions.
|
||||
* Download a stable version of Hadoop from Apache mirrors.
|
||||
|
||||
* {Installation}
|
||||
|
||||
Installing a Hadoop cluster typically involves unpacking the software on all
|
||||
the machines in the cluster or installing RPMs.
|
||||
the machines in the cluster or installing it via a packaging system as
|
||||
appropriate for your operating system. It is important to divide up the hardware
|
||||
into functions.
|
||||
|
||||
Typically one machine in the cluster is designated as the NameNode and
|
||||
another machine as the ResourceManager, exclusively. These are the masters.
|
||||
another machine as the ResourceManager, exclusively. These are the masters. Other
|
||||
services (such as Web App Proxy Server and MapReduce Job History server) are usually
|
||||
run either on dedicated hardware or on shared infrastructure, depending upon the load.
|
||||
|
||||
The rest of the machines in the cluster act as both DataNode and NodeManager.
|
||||
These are the slaves.
|
||||
|
||||
* {Running Hadoop in Non-Secure Mode}
|
||||
* {Configuring Hadoop in Non-Secure Mode}
|
||||
|
||||
The following sections describe how to configure a Hadoop cluster.
|
||||
|
||||
{Configuration Files}
|
||||
|
||||
Hadoop configuration is driven by two types of important configuration files:
|
||||
Hadoop's Java configuration is driven by two types of important configuration files:
|
||||
|
||||
* Read-only default configuration - <<<core-default.xml>>>,
|
||||
<<<hdfs-default.xml>>>, <<<yarn-default.xml>>> and
|
||||
<<<mapred-default.xml>>>.
|
||||
|
||||
* Site-specific configuration - <<conf/core-site.xml>>,
|
||||
<<conf/hdfs-site.xml>>, <<conf/yarn-site.xml>> and
|
||||
<<conf/mapred-site.xml>>.
|
||||
* Site-specific configuration - <<<etc/hadoop/core-site.xml>>>,
|
||||
<<<etc/hadoop/hdfs-site.xml>>>, <<<etc/hadoop/yarn-site.xml>>> and
|
||||
<<<etc/hadoop/mapred-site.xml>>>.
|
||||
|
||||
|
||||
Additionally, you can control the Hadoop scripts found in the bin/
|
||||
directory of the distribution, by setting site-specific values via the
|
||||
<<conf/hadoop-env.sh>> and <<yarn-env.sh>>.
|
||||
|
||||
{Site Configuration}
|
||||
Additionally, you can control the Hadoop scripts found in the bin/
|
||||
directory of the distribution, by setting site-specific values via the
|
||||
<<<etc/hadoop/hadoop-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>>.
|
||||
|
||||
To configure the Hadoop cluster you will need to configure the
|
||||
<<<environment>>> in which the Hadoop daemons execute as well as the
|
||||
<<<configuration parameters>>> for the Hadoop daemons.
|
||||
|
||||
The Hadoop daemons are NameNode/DataNode and ResourceManager/NodeManager.
|
||||
HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons
|
||||
are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be
|
||||
used, then the MapReduce Job History Server will also be running. For
|
||||
large installations, these are generally running on separate hosts.
|
||||
|
||||
|
||||
** {Configuring Environment of Hadoop Daemons}
|
||||
|
||||
Administrators should use the <<conf/hadoop-env.sh>> and
|
||||
<<conf/yarn-env.sh>> script to do site-specific customization of the
|
||||
Hadoop daemons' process environment.
|
||||
Administrators should use the <<<etc/hadoop/hadoop-env.sh>>> and optionally the
|
||||
<<<etc/hadoop/mapred-env.sh>>> and <<<etc/hadoop/yarn-env.sh>>> scripts to do
|
||||
site-specific customization of the Hadoop daemons' process environment.
|
||||
|
||||
At the very least you should specify the <<<JAVA_HOME>>> so that it is
|
||||
At the very least, you must specify the <<<JAVA_HOME>>> so that it is
|
||||
correctly defined on each remote node.
|
||||
|
||||
In most cases you should also specify <<<HADOOP_PID_DIR>>> and
|
||||
<<<HADOOP_SECURE_DN_PID_DIR>>> to point to directories that can only be
|
||||
written to by the users that are going to run the hadoop daemons.
|
||||
Otherwise there is the potential for a symlink attack.
|
||||
|
||||
Administrators can configure individual daemons using the configuration
|
||||
options shown below in the table:
|
||||
|
||||
|
@ -114,20 +112,42 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
statement should be added in hadoop-env.sh :
|
||||
|
||||
----
|
||||
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
|
||||
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
|
||||
----
|
||||
|
||||
See <<<etc/hadoop/hadoop-env.sh>>> for other examples.
|
||||
|
||||
Other useful configuration parameters that you can customize include:
|
||||
|
||||
* <<<HADOOP_LOG_DIR>>> / <<<YARN_LOG_DIR>>> - The directory where the
|
||||
daemons' log files are stored. They are automatically created if they
|
||||
don't exist.
|
||||
* <<<HADOOP_PID_DIR>>> - The directory where the
|
||||
daemons' process id files are stored.
|
||||
|
||||
* <<<HADOOP_HEAPSIZE>>> / <<<YARN_HEAPSIZE>>> - The maximum amount of
|
||||
heapsize to use, in MB e.g. if the variable is set to 1000 the heap
|
||||
will be set to 1000MB. This is used to configure the heap
|
||||
size for the daemon. By default, the value is 1000. If you want to
|
||||
configure the values separately for each daemon you can use.
|
||||
* <<<HADOOP_LOG_DIR>>> - The directory where the
|
||||
daemons' log files are stored. Log files are automatically created
|
||||
if they don't exist.
|
||||
|
||||
* <<<HADOOP_HEAPSIZE_MAX>>> - The maximum amount of
|
||||
memory to use for the Java heapsize. Units supported by the JVM
|
||||
are also supported here. If no unit is present, it will be assumed
|
||||
the number is in megabytes. By default, Hadoop will let the JVM
|
||||
determine how much to use. This value can be overridden on
|
||||
a per-daemon basis using the appropriate <<<_OPTS>>> variable listed above.
|
||||
For example, setting <<<HADOOP_HEAPSIZE_MAX=1g>>> and
|
||||
<<<HADOOP_NAMENODE_OPTS="-Xmx5g">>> will configure the NameNode with 5GB heap.
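The unit rule described for <<<HADOOP_HEAPSIZE_MAX>>> (a bare number is treated as megabytes, while a value that already carries a JVM unit is passed through) can be sketched in shell. This is only an illustration of the rule as stated above; <<<heapsize_to_xmx>>> is a hypothetical helper and is not claimed to be how the Hadoop scripts implement it.

```shell
# Hypothetical helper (not part of Hadoop): turn a heap size setting into
# a JVM -Xmx flag, assuming megabytes when no unit suffix is present.
heapsize_to_xmx() {
  case "$1" in
    '')       return 1 ;;                    # nothing set: let the JVM decide
    *[!0-9]*) printf -- '-Xmx%s\n' "$1" ;;   # already has a unit, e.g. 1g
    *)        printf -- '-Xmx%sm\n' "$1" ;;  # digits only: assume megabytes
  esac
}

heapsize_to_xmx 1000
heapsize_to_xmx 1g
```

The first call prints a flag with an <<<m>>> suffix appended; the second passes the value through unchanged.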
|
||||
|
||||
In most cases, you should specify the <<<HADOOP_PID_DIR>>> and
|
||||
<<<HADOOP_LOG_DIR>>> directories such that they can only be
|
||||
written to by the users that are going to run the hadoop daemons.
|
||||
Otherwise there is the potential for a symlink attack.
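As a minimal sketch of the advice above, the following creates PID and log directories that only the owning user can write to. The directory locations are assumptions made for illustration (a temporary directory is used so the snippet can be run anywhere); a real site would choose fixed paths owned by the account that runs the daemons and set the two variables in <<<etc/hadoop/hadoop-env.sh>>>.

```shell
# Assumed, illustrative locations; substitute site-specific paths.
base_dir="$(mktemp -d)"
HADOOP_PID_DIR="${base_dir}/pid"
HADOOP_LOG_DIR="${base_dir}/log"

# Create the directories and remove all access for group and others,
# so no other user can plant a symlink inside them.
mkdir -p "${HADOOP_PID_DIR}" "${HADOOP_LOG_DIR}"
chmod 700 "${HADOOP_PID_DIR}" "${HADOOP_LOG_DIR}"

export HADOOP_PID_DIR HADOOP_LOG_DIR
```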
|
||||
|
||||
It is also traditional to configure <<<HADOOP_PREFIX>>> in the system-wide
|
||||
shell environment configuration. For example, a simple script inside
|
||||
<<</etc/profile.d>>>:
|
||||
|
||||
---
|
||||
HADOOP_PREFIX=/path/to/hadoop
|
||||
export HADOOP_PREFIX
|
||||
---
|
||||
|
||||
*--------------------------------------+--------------------------------------+
|
||||
|| Daemon || Environment Variable |
|
||||
|
@ -141,12 +161,12 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
| Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_HEAPSIZE |
|
||||
*--------------------------------------+--------------------------------------+
|
||||
|
||||
** {Configuring the Hadoop Daemons in Non-Secure Mode}
|
||||
** {Configuring the Hadoop Daemons}
|
||||
|
||||
This section deals with important parameters to be specified in
|
||||
the given configuration files:
|
||||
|
||||
* <<<conf/core-site.xml>>>
|
||||
* <<<etc/hadoop/core-site.xml>>>
|
||||
|
||||
*-------------------------+-------------------------+------------------------+
|
||||
|| Parameter || Value || Notes |
|
||||
|
@ -157,7 +177,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
| | | Size of read/write buffer used in SequenceFiles. |
|
||||
*-------------------------+-------------------------+------------------------+
|
||||
|
||||
* <<<conf/hdfs-site.xml>>>
|
||||
* <<<etc/hadoop/hdfs-site.xml>>>
|
||||
|
||||
* Configurations for NameNode:
|
||||
|
||||
|
@ -195,7 +215,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
| | | stored in all named directories, typically on different devices. |
|
||||
*-------------------------+-------------------------+------------------------+
|
||||
|
||||
* <<<conf/yarn-site.xml>>>
|
||||
* <<<etc/hadoop/yarn-site.xml>>>
|
||||
|
||||
* Configurations for ResourceManager and NodeManager:
|
||||
|
||||
|
@ -341,9 +361,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
| | | Be careful, set this too small and you will spam the name node. |
|
||||
*-------------------------+-------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
* <<<conf/mapred-site.xml>>>
|
||||
* <<<etc/hadoop/mapred-site.xml>>>
|
||||
|
||||
* Configurations for MapReduce Applications:
|
||||
|
||||
|
@ -395,22 +413,6 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
| | | Directory where history files are managed by the MR JobHistory Server. |
|
||||
*-------------------------+-------------------------+------------------------+
|
||||
|
||||
* {Hadoop Rack Awareness}
|
||||
|
||||
The HDFS and the YARN components are rack-aware.
|
||||
|
||||
The NameNode and the ResourceManager obtains the rack information of the
|
||||
slaves in the cluster by invoking an API <resolve> in an administrator
|
||||
configured module.
|
||||
|
||||
The API resolves the DNS name (also IP address) to a rack id.
|
||||
|
||||
The site-specific module to use can be configured using the configuration
|
||||
item <<<topology.node.switch.mapping.impl>>>. The default implementation
|
||||
of the same runs a script/command configured using
|
||||
<<<topology.script.file.name>>>. If <<<topology.script.file.name>>> is
|
||||
not set, the rack id </default-rack> is returned for any passed IP address.
|
||||
|
||||
* {Monitoring Health of NodeManagers}
|
||||
|
||||
Hadoop provides a mechanism by which administrators can configure the
|
||||
|
@ -433,7 +435,7 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
node was healthy is also displayed on the web interface.
|
||||
|
||||
The following parameters can be used to control the node health
|
||||
monitoring script in <<<conf/yarn-site.xml>>>.
|
||||
monitoring script in <<<etc/hadoop/yarn-site.xml>>>.
|
||||
|
||||
*-------------------------+-------------------------+------------------------+
|
||||
|| Parameter || Value || Notes |
|
||||
|
@ -465,165 +467,87 @@ Hadoop MapReduce Next Generation - Cluster Setup
|
|||
disk is either raided or a failure in the boot disk is identified by the
|
||||
health checker script.
|
||||
|
||||
* {Slaves file}
|
||||
* {Slaves File}
|
||||
|
||||
Typically you choose one machine in the cluster to act as the NameNode and
|
||||
one machine as to act as the ResourceManager, exclusively. The rest of the
|
||||
machines act as both a DataNode and NodeManager and are referred to as
|
||||
<slaves>.
|
||||
List all slave hostnames or IP addresses in your <<<etc/hadoop/slaves>>>
|
||||
file, one per line. Helper scripts (described below) will use the
|
||||
<<<etc/hadoop/slaves>>> file to run commands on many hosts at once. It is not
|
||||
used for any of the Java-based Hadoop configuration. In order
|
||||
to use this functionality, ssh trusts (via either passphraseless ssh or
|
||||
some other means, such as Kerberos) must be established for the accounts
|
||||
used to run Hadoop.
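To illustrate the one-host-per-line format, the sketch below reads a slaves file and prints the ssh command that could be run for each entry. It is a simplified illustration, not the logic of the actual helper scripts; the hostnames are hypothetical and nothing is executed remotely.

```shell
# Build a throwaway slaves file with two hypothetical hosts.
slaves_file="$(mktemp)"
printf '%s\n' worker1.example.com worker2.example.com > "${slaves_file}"

# Read one host per line, skipping blank lines, and collect the
# command that would be run on each host.
planned=""
while IFS= read -r host; do
  [ -z "${host}" ] && continue
  planned="${planned}ssh ${host} hdfs --daemon start datanode
"
done < "${slaves_file}"

printf '%s' "${planned}"
rm -f "${slaves_file}"
```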
|
||||
|
||||
List all slave hostnames or IP addresses in your <<<conf/slaves>>> file,
|
||||
one per line.
|
||||
* {Hadoop Rack Awareness}
|
||||
|
||||
Many Hadoop components are rack-aware and take advantage of the
|
||||
network topology for performance and safety. Hadoop daemons obtain the
|
||||
rack information of the slaves in the cluster by invoking an administrator
|
||||
configured module. See the {{{./RackAwareness.html}Rack Awareness}}
|
||||
documentation for more specific information.
|
||||
|
||||
It is highly recommended to configure rack awareness prior to starting HDFS.
|
||||
|
||||
* {Logging}
|
||||
|
||||
Hadoop uses the Apache log4j via the Apache Commons Logging framework for
|
||||
logging. Edit the <<<conf/log4j.properties>>> file to customize the
|
||||
Hadoop uses the {{{http://logging.apache.org/log4j/2.x/}Apache log4j}} via the Apache Commons Logging framework for
|
||||
logging. Edit the <<<etc/hadoop/log4j.properties>>> file to customize the
|
||||
Hadoop daemons' logging configuration (log-formats and so on).
|
||||
|
||||
* {Operating the Hadoop Cluster}
|
||||
|
||||
Once all the necessary configuration is complete, distribute the files to the
|
||||
<<<HADOOP_CONF_DIR>>> directory on all the machines.
|
||||
<<<HADOOP_CONF_DIR>>> directory on all the machines. This should be the
|
||||
same directory on all machines.
|
||||
|
||||
** Hadoop Startup
|
||||
|
||||
To start a Hadoop cluster you will need to start both the HDFS and YARN
|
||||
cluster.
|
||||
|
||||
Format a new distributed filesystem:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
|
||||
----
|
||||
|
||||
Start the HDFS with the following command, run on the designated NameNode:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
|
||||
----
|
||||
|
||||
Run a script to start DataNodes on all slaves:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
|
||||
----
|
||||
|
||||
Start the YARN with the following command, run on the designated
|
||||
ResourceManager:
|
||||
|
||||
----
|
||||
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
|
||||
----
|
||||
|
||||
Run a script to start NodeManagers on all slaves:
|
||||
|
||||
----
|
||||
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
|
||||
----
|
||||
|
||||
Start a standalone WebAppProxy server. If multiple servers
|
||||
are used with load balancing it should be run on each of them:
|
||||
|
||||
----
|
||||
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR
|
||||
----
|
||||
|
||||
Start the MapReduce JobHistory Server with the following command, run on the
|
||||
designated server:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
|
||||
----
|
||||
|
||||
** Hadoop Shutdown
|
||||
|
||||
Stop the NameNode with the following command, run on the designated
|
||||
NameNode:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
|
||||
----
|
||||
|
||||
Run a script to stop DataNodes on all slaves:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
|
||||
----
|
||||
|
||||
Stop the ResourceManager with the following command, run on the designated
|
||||
ResourceManager:
|
||||
|
||||
----
|
||||
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
|
||||
----
|
||||
|
||||
Run a script to stop NodeManagers on all slaves:
|
||||
|
||||
----
|
||||
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
|
||||
----
|
||||
|
||||
Stop the WebAppProxy server. If multiple servers are used with load
|
||||
balancing it should be run on each of them:
|
||||
|
||||
----
|
||||
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR
|
||||
----
|
||||
|
||||
|
||||
Stop the MapReduce JobHistory Server with the following command, run on the
|
||||
designated server:
|
||||
|
||||
----
|
||||
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
|
||||
----
|
||||
|
||||
|
||||
* {Operating the Hadoop Cluster}
|
||||
|
||||
Once all the necessary configuration is complete, distribute the files to the
|
||||
<<<HADOOP_CONF_DIR>>> directory on all the machines.
|
||||
|
||||
This section also describes the various Unix users who should be starting the
|
||||
various components and uses the same Unix accounts and groups used previously:
|
||||
In general, it is recommended that HDFS and YARN run as separate users.
|
||||
In the majority of installations, HDFS processes execute as 'hdfs'. YARN
|
||||
is typically using the 'yarn' account.
|
||||
|
||||
** Hadoop Startup
|
||||
|
||||
To start a Hadoop cluster you will need to start both the HDFS and YARN
|
||||
cluster.
|
||||
|
||||
Format a new distributed filesystem as <hdfs>:
|
||||
The first time you bring up HDFS, it must be formatted. Format a new
|
||||
distributed filesystem as <hdfs>:
|
||||
|
||||
----
|
||||
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
|
||||
----
|
||||
|
||||
Start the HDFS with the following command, run on the designated NameNode
|
||||
as <hdfs>:
|
||||
Start the HDFS NameNode with the following command on the
|
||||
designated node as <hdfs>:
|
||||
|
||||
----
|
||||
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
|
||||
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode
|
||||
----
|
||||
|
||||
Run a script to start DataNodes on all slaves as <root> with a special
|
||||
environment variable <<<HADOOP_SECURE_DN_USER>>> set to <hdfs>:
|
||||
Start an HDFS DataNode with the following command on each
|
||||
designated node as <hdfs>:
|
||||
|
||||
----
|
||||
[root]$ HADOOP_SECURE_DN_USER=hdfs $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
|
||||
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start datanode
|
||||
----
|
||||
|
||||
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
|
||||
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
|
||||
HDFS processes can be started with a utility script. As <hdfs>:
|
||||
|
||||
----
|
||||
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
|
||||
----
|
||||
|
||||
Start the YARN with the following command, run on the designated
|
||||
ResourceManager as <yarn>:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
|
||||
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start resourcemanager
|
||||
----
|
||||
|
||||
Run a script to start NodeManagers on all slaves as <yarn>:
|
||||
Run a script to start a NodeManager on each designated host as <yarn>:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
|
||||
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start nodemanager
|
||||
----
|
||||
|
||||
Start a standalone WebAppProxy server. Run on the WebAppProxy
|
||||
|
@ -631,14 +555,22 @@ $ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOO
|
|||
it should be run on each of them:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR
|
||||
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start proxyserver
|
||||
----
|
||||
|
||||
Start the MapReduce JobHistory Server with the following command, run on the
|
||||
designated server as <mapred>:
|
||||
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
|
||||
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
|
||||
YARN processes can be started with a utility script. As <yarn>:
|
||||
|
||||
----
|
||||
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
|
||||
[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
|
||||
----
|
||||
|
||||
Start the MapReduce JobHistory Server with the following command, run
|
||||
on the designated server as <mapred>:
|
||||
|
||||
----
|
||||
[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon start historyserver
|
||||
----
|
||||
|
||||
** Hadoop Shutdown
|
||||
|
@ -647,26 +579,42 @@ $ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOO
|
|||
as <hdfs>:
|
||||
|
||||
----
|
||||
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
|
||||
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop namenode
|
||||
----
|
||||
|
||||
Run a script to stop DataNodes on all slaves as <root>:
|
||||
Run a script to stop a DataNode as <hdfs>:
|
||||
|
||||
----
|
||||
[root]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
|
||||
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop datanode
|
||||
----
|
||||
|
||||
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
|
||||
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
|
||||
HDFS processes may be stopped with a utility script. As <hdfs>:
|
||||
|
||||
----
|
||||
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
|
||||
----
|
||||
|
||||
Stop the ResourceManager with the following command, run on the designated
|
||||
ResourceManager as <yarn>:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
|
||||
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop resourcemanager
|
||||
----
|
||||
|
||||
Run a script to stop NodeManagers on all slaves as <yarn>:
|
||||
Run a script to stop a NodeManager on a slave as <yarn>:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
|
||||
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop nodemanager
|
||||
----
|
||||
|
||||
If <<<etc/hadoop/slaves>>> and ssh trusted access is configured
|
||||
(see {{{./SingleCluster.html}Single Node Setup}}), all of the
|
||||
YARN processes can be stopped with a utility script. As <yarn>:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
|
||||
----
|
||||
|
||||
Stop the WebAppProxy server. Run on the WebAppProxy server as
|
||||
|
@ -674,14 +622,14 @@ $ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOO
|
|||
should be run on each of them:
|
||||
|
||||
----
|
||||
[yarn]$ $HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR
|
||||
[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop proxyserver
|
||||
----
|
||||
|
||||
Stop the MapReduce JobHistory Server with the following command, run on the
|
||||
designated server as <mapred>:
|
||||
|
||||
----
|
||||
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
|
||||
[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon stop historyserver
|
||||
----
|
||||
|
||||
* {Web Interfaces}
|
||||
|
|
|
@ -21,102 +21,161 @@
|
|||
|
||||
%{toc}
|
||||
|
||||
Overview
|
||||
Hadoop Commands Guide
|
||||
|
||||
All hadoop commands are invoked by the <<<bin/hadoop>>> script. Running the
|
||||
hadoop script without any arguments prints the description for all
|
||||
commands.
|
||||
* Overview
|
||||
|
||||
Usage: <<<hadoop [--config confdir] [--loglevel loglevel] [COMMAND]
|
||||
[GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
|
||||
All of the Hadoop commands and subprojects follow the same basic structure:
|
||||
|
||||
Hadoop has an option parsing framework that employs parsing generic
|
||||
options as well as running classes.
|
||||
Usage: <<<shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
|
||||
|
||||
*--------+---------+
|
||||
|| FIELD || Description
|
||||
*-----------------------+---------------+
|
||||
|| COMMAND_OPTION || Description
|
||||
| shellcommand | The command of the project being invoked. For example,
|
||||
| Hadoop common uses <<<hadoop>>>, HDFS uses <<<hdfs>>>,
|
||||
| and YARN uses <<<yarn>>>.
|
||||
*---------------+-------------------+
|
||||
| SHELL_OPTIONS | Options that the shell processes prior to executing Java.
|
||||
*-----------------------+---------------+
|
||||
| <<<--config confdir>>>| Overwrites the default Configuration directory. Default is <<<${HADOOP_HOME}/conf>>>.
|
||||
| COMMAND | Action to perform.
|
||||
*-----------------------+---------------+
|
||||
| <<<--loglevel loglevel>>>| Overwrites the log level. Valid log levels are
|
||||
| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.
|
||||
| | Default is INFO.
|
||||
| GENERIC_OPTIONS | The common set of options supported by
|
||||
| multiple commands.
|
||||
*-----------------------+---------------+
|
||||
| GENERIC_OPTIONS | The common set of options supported by multiple commands.
|
||||
| COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.
|
||||
| COMMAND_OPTIONS | Various commands with their options are
|
||||
| described in this documentation for the
|
||||
| Hadoop common sub-project. HDFS and YARN are
|
||||
| covered in other documents.
|
||||
*-----------------------+---------------+
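As a small illustration of the field order in the usage line, the sketch below assembles one command line from the five parts and prints it. The NameNode address is a hypothetical value, and the command is only printed, not executed.

```shell
# Each variable corresponds to one field of the usage line.
shellcommand="hadoop"
shell_options="--loglevel DEBUG"
command="fs"
generic_options="-D fs.defaultFS=hdfs://nn.example.com:8020"  # hypothetical host
command_options="-ls /"

cmdline="${shellcommand} ${shell_options} ${command} ${generic_options} ${command_options}"
echo "${cmdline}"
```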
|
||||
|
||||
Generic Options
|
||||
** {Shell Options}
|
||||
|
||||
The following options are supported by {{dfsadmin}}, {{fs}}, {{fsck}},
|
||||
{{job}} and {{fetchdt}}. Applications should implement
|
||||
{{{../../api/org/apache/hadoop/util/Tool.html}Tool}} to support
|
||||
GenericOptions.
|
||||
All of the shell commands will accept a common set of options. For some commands,
|
||||
these options are ignored. For example, passing <<<--hostnames>>> on a
|
||||
command that only executes on a single host will be ignored.
|
||||
|
||||
*-----------------------+---------------+
|
||||
|| SHELL_OPTION || Description
|
||||
*-----------------------+---------------+
|
||||
| <<<--buildpaths>>> | Enables developer versions of jars.
|
||||
*-----------------------+---------------+
|
||||
| <<<--config confdir>>> | Overwrites the default Configuration
|
||||
| directory. Default is <<<${HADOOP_PREFIX}/conf>>>.
|
||||
*-----------------------+----------------+
|
||||
| <<<--daemon mode>>> | If the command supports daemonization (e.g.,
|
||||
| <<<hdfs namenode>>>), execute in the appropriate
|
||||
| mode. Supported modes are <<<start>>> to start the
|
||||
| process in daemon mode, <<<stop>>> to stop the
|
||||
| process, and <<<status>>> to determine the active
|
||||
| status of the process. <<<status>>> will return
|
||||
| an {{{http://refspecs.linuxbase.org/LSB_3.0.0/LSB-generic/LSB-generic/iniscrptact.html}LSB-compliant}} result code.
|
||||
| If no option is provided, commands that support
|
||||
| daemonization will run in the foreground.
|
||||
*-----------------------+---------------+
|
||||
| <<<--debug>>> | Enables shell level configuration debugging information
|
||||
*-----------------------+---------------+
|
||||
| <<<--help>>> | Shell script usage information.
|
||||
*-----------------------+---------------+
|
||||
| <<<--hostnames>>> | A space delimited list of hostnames where to execute
|
||||
| a multi-host subcommand. By default, the content of
|
||||
| the <<<slaves>>> file is used.
|
||||
*-----------------------+----------------+
|
||||
| <<<--hosts>>> | A file that contains a list of hostnames where to execute
|
||||
| a multi-host subcommand. By default, the content of the
|
||||
| <<<slaves>>> file is used.
|
||||
*-----------------------+----------------+
|
||||
| <<<--loglevel loglevel>>> | Overrides the log level. Valid log levels are
|
||||
| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE.
|
||||
| | Default is INFO.
|
||||
*-----------------------+---------------+
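Because <<<--daemon status>>> is described as returning an LSB-compliant result code, a wrapper script could translate that code into a message. The function below is a hypothetical helper based on the status codes defined by the LSB init script specification; it is not part of Hadoop.

```shell
# Hypothetical helper: map an LSB "status" exit code to a description.
describe_lsb_status() {
  case "$1" in
    0) echo "running" ;;
    1) echo "dead, pid file exists" ;;
    2) echo "dead, lock file exists" ;;
    3) echo "not running" ;;
    *) echo "unknown" ;;
  esac
}

describe_lsb_status 0
describe_lsb_status 3
```

On a host with Hadoop installed, it could be combined with a status check, for example <<<hdfs --daemon status namenode; describe_lsb_status $?>>>.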
|
||||
|
||||
** {Generic Options}
|
||||
|
||||
Many subcommands honor a common set of configuration options to alter their behavior:
|
||||
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|| GENERIC_OPTION || Description
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-conf \<configuration file\> >>> | Specify an application
|
||||
| configuration file.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-D \<property\>=\<value\> >>> | Use value for given property.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-jt \<local\> or \<resourcemanager:port\>>>> | Specify a ResourceManager.
|
||||
| Applies only to job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
|
||||
| to be copied to the map
|
||||
| reduce cluster. Applies only
|
||||
| to job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
|
||||
| files to include in the
|
||||
| classpath. Applies only to
|
||||
| job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-archives \<comma separated list of archives\> >>> | Specify comma separated
|
||||
| archives to be unarchived on
|
||||
| the compute machines. Applies
|
||||
| only to job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-conf \<configuration file\> >>> | Specify an application
|
||||
| configuration file.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-D \<property\>=\<value\> >>> | Use value for given property.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
|
||||
| to be copied to the map
|
||||
| reduce cluster. Applies only
|
||||
| to job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-jt \<local\> or \<resourcemanager:port\>>>> | Specify a ResourceManager.
|
||||
| Applies only to job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
|
||||
| files to include in the
|
||||
| classpath. Applies only to
|
||||
| job.
|
||||
*------------------------------------------------+-----------------------------+
|
||||
|
||||
User Commands
|
||||
Hadoop Common Commands
|
||||
|
||||
All of these commands are executed from the <<<hadoop>>> shell command. They
|
||||
have been broken up into {{User Commands}} and
|
||||
{{Administration Commands}}.
|
||||
|
||||
* User Commands
|
||||
|
||||
Commands useful for users of a hadoop cluster.
|
||||
|
||||
* <<<archive>>>
|
||||
** <<<archive>>>
|
||||
|
||||
Creates a hadoop archive. More information can be found at
|
||||
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html}
|
||||
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html}
|
||||
Hadoop Archives Guide}}.
|
||||
|
||||
* <<<credential>>>
|
||||
** <<<checknative>>>
|
||||
|
||||
Command to manage credentials, passwords and secrets within credential providers.
|
||||
Usage: <<<hadoop checknative [-a] [-h] >>>
|
||||
|
||||
The CredentialProvider API in Hadoop allows for the separation of applications
|
||||
and how they store their required passwords/secrets. In order to indicate
|
||||
a particular provider type and location, the user must provide the
|
||||
<hadoop.security.credential.provider.path> configuration element in core-site.xml
|
||||
or use the command line option <<<-provider>>> on each of the following commands.
|
||||
This provider path is a comma-separated list of URLs that indicates the type and
|
||||
location of a list of providers that should be consulted.
|
||||
For example, the following path:
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
|| COMMAND_OPTION || Description
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| -a | Check all libraries are available.
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| -h | print help
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
|
||||
<<<user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks>>>
|
||||
This command checks the availability of the Hadoop native code. See
|
||||
{{{NativeLibraries.html}}} for more information. By default, this command
|
||||
only checks the availability of libhadoop.
|
||||
|
||||
indicates that the current user's credentials file should be consulted through
|
||||
the User Provider, that the local file located at <<</tmp/test.jceks>>> is a Java Keystore
|
||||
Provider and that the file located within HDFS at <<<nn1.example.com/my/path/test.jceks>>>
|
||||
is also a store for a Java Keystore Provider.
|
||||
** <<<classpath>>>
|
||||
|
||||
When utilizing the credential command it will often be for provisioning a password
|
||||
or secret to a particular credential store provider. In order to explicitly
|
||||
indicate which provider store to use the <<<-provider>>> option should be used. Otherwise,
|
||||
given a path of multiple providers, the first non-transient provider will be used.
|
||||
This may or may not be the one that you intended.
|
||||
Usage: <<<hadoop classpath [--glob|--jar <path>|-h|--help]>>>
|
||||
|
||||
Example: <<<-provider jceks://file/tmp/test.jceks>>>
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
|| COMMAND_OPTION || Description
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| --glob | expand wildcards
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| --jar <path> | write classpath as manifest in jar named <path>
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| -h, --help | print help
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
|
||||
Prints the class path needed to get the Hadoop jar and the required
|
||||
libraries. If called without arguments, then prints the classpath set up by
|
||||
the command scripts, which is likely to contain wildcards in the classpath
|
||||
entries. Additional options print the classpath after wildcard expansion or
|
||||
write the classpath into the manifest of a jar file. The latter is useful in
|
||||
environments where wildcards cannot be used and the expanded classpath exceeds
|
||||
the maximum supported command line length.
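
As a rough illustration of what wildcard expansion involves, the shell sketch below expands every classpath entry ending in <<</*>>> into the <<<.jar>>> files found in that directory. It is not how <<<hadoop classpath --glob>>> is implemented; <<<expand_classpath>>> is a hypothetical helper, and it assumes entries contain no colons and ignores details such as upper-case <<<.JAR>>> suffixes.

```shell
# Expand "dir/*" entries of a colon-separated classpath into the jars in dir.
# Hypothetical helper for illustration only; not part of Hadoop.
expand_classpath() {
  result=
  old_ifs=$IFS
  IFS=:
  set -f                      # do not glob while splitting the classpath
  for entry in $1; do
    case $entry in
      */\*)
        dir=${entry%/\*}      # strip the trailing "/*"
        set +f                # allow globbing for the jar lookup only
        for jar in "$dir"/*.jar; do
          # Skip the unexpanded pattern when the directory has no jars.
          [ -e "$jar" ] && result=${result:+$result:}$jar
        done
        set -f
        ;;
      *)
        result=${result:+$result:}$entry
        ;;
    esac
  done
  set +f                      # note: re-enables globbing unconditionally
  IFS=$old_ifs
  printf '%s\n' "$result"
}
```

For a directory holding many jars the expanded string grows quickly, which is the situation in which writing the classpath into a jar manifest with <<<--jar>>> becomes the more practical option.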
|
||||
|
||||
** <<<credential>>>
|
||||
|
||||
Usage: <<<hadoop credential <subcommand> [options]>>>
|
||||
|
||||
|
@ -143,109 +202,96 @@ User Commands
|
|||
| indicated.
|
||||
*-------------------+-------------------------------------------------------+
|
||||
|
||||
* <<<distcp>>>
|
||||
Command to manage credentials, passwords and secrets within credential providers.
|
||||
|
||||
The CredentialProvider API in Hadoop allows for the separation of applications
|
||||
and how they store their required passwords/secrets. In order to indicate
|
||||
a particular provider type and location, the user must provide the
|
||||
<hadoop.security.credential.provider.path> configuration element in core-site.xml
|
||||
or use the command line option <<<-provider>>> on each of the following commands.
|
||||
This provider path is a comma-separated list of URLs that indicates the type and
|
||||
location of a list of providers that should be consulted. For example, the following path:
|
||||
<<<user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks>>>
|
||||
|
||||
indicates that the current user's credentials file should be consulted through
|
||||
the User Provider, that the local file located at <<</tmp/test.jceks>>> is a Java Keystore
|
||||
Provider and that the file located within HDFS at <<<nn1.example.com/my/path/test.jceks>>>
|
||||
is also a store for a Java Keystore Provider.
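
To make the structure of such a provider path concrete, the following shell sketch splits it on commas and prints the URL scheme of each entry next to the entry itself. <<<list_providers>>> is a hypothetical helper written for this illustration; it only parses the string and does not contact any credential provider.

```shell
# Print "<scheme> <url>" for every entry of a comma-separated provider path.
# Hypothetical helper; entries are assumed to contain no glob characters.
list_providers() {
  old_ifs=$IFS
  IFS=,
  for url in $1; do
    # Everything before the first ':' is the scheme (user, jceks, ...).
    printf '%s %s\n' "${url%%:*}" "$url"
  done
  IFS=$old_ifs
}

list_providers 'user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks'
```

Run against the example path, this lists one <<<user>>> entry followed by two <<<jceks>>> entries, in the order in which the providers would be consulted.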
|
||||
|
||||
When utilizing the credential command it will often be for provisioning a password
|
||||
or secret to a particular credential store provider. In order to explicitly
|
||||
indicate which provider store to use the <<<-provider>>> option should be used. Otherwise,
|
||||
given a path of multiple providers, the first non-transient provider will be used.
|
||||
This may or may not be the one that you intended.
|
||||
|
||||
Example: <<<-provider jceks://file/tmp/test.jceks>>>
|
||||
|
||||
** <<<distch>>>
|
||||
|
||||
Usage: <<<hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions>>>
|
||||
|
||||
*-------------------+-------------------------------------------------------+
|
||||
||COMMAND_OPTION || Description
|
||||
*-------------------+-------------------------------------------------------+
|
||||
| -f | List of objects to change
|
||||
*-------------------+-------------------------------------------------------+
|
||||
| -i | Ignore failures
|
||||
*-------------------+-------------------------------------------------------+
|
||||
| -log | Directory to log output
|
||||
*-------------------+-------------------------------------------------------+
|
||||
|
||||
Change the ownership and permissions on many files at once.
|
||||
|
||||
** <<<distcp>>>
|
||||
|
||||
Copy files or directories recursively. More information can be found at
|
||||
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistCp.html}
|
||||
Hadoop DistCp Guide}}.
|
||||
|
||||
* <<<fs>>>
|
||||
** <<<fs>>>
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#dfs}<<<hdfs dfs>>>}}
|
||||
instead.
|
||||
This command is documented in the {{{./FileSystemShell.html}File System Shell Guide}}. It is a synonym for <<<hdfs dfs>>> when HDFS is in use.
|
||||
|
||||
* <<<fsck>>>
|
||||
** <<<jar>>>
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#fsck}<<<hdfs fsck>>>}}
|
||||
instead.
|
||||
Usage: <<<hadoop jar <jar> [mainClass] args...>>>
|
||||
|
||||
* <<<fetchdt>>>
|
||||
Runs a jar file.
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#fetchdt}
|
||||
<<<hdfs fetchdt>>>}} instead.
|
||||
Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<<yarn jar>>>}}
|
||||
to launch YARN applications instead.
|
||||
|
||||
* <<<jar>>>
|
||||
** <<<jnipath>>>
|
||||
|
||||
Runs a jar file. Users can bundle their Map Reduce code in a jar file and
|
||||
execute it using this command.
|
||||
Usage: <<<hadoop jnipath>>>
|
||||
|
||||
Usage: <<<hadoop jar <jar> [mainClass] args...>>>
|
||||
Print the computed java.library.path.
|
||||
|
||||
The streaming jobs are run via this command. Examples can be referred from
|
||||
Streaming examples
|
||||
** <<<key>>>
|
||||
|
||||
Word count example is also run using jar command. It can be referred from
|
||||
Wordcount example
|
||||
Manage keys via the KeyProvider.
|
||||
|
||||
Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<<yarn jar>>>}}
|
||||
to launch YARN applications instead.
|
||||
** <<<trace>>>
|
||||
|
||||
* <<<job>>>
|
||||
View and modify Hadoop tracing settings. See the {{{./Tracing.html}Tracing Guide}}.
|
||||
|
||||
Deprecated. Use
|
||||
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job}
|
||||
<<<mapred job>>>}} instead.
|
||||
|
||||
* <<<pipes>>>
|
||||
|
||||
Deprecated. Use
|
||||
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#pipes}
|
||||
<<<mapred pipes>>>}} instead.
|
||||
|
||||
* <<<queue>>>
|
||||
|
||||
Deprecated. Use
|
||||
{{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#queue}
|
||||
<<<mapred queue>>>}} instead.
|
||||
|
||||
* <<<version>>>
|
||||
|
||||
Prints the version.
|
||||
** <<<version>>>
|
||||
|
||||
Usage: <<<hadoop version>>>
|
||||
|
||||
* <<<CLASSNAME>>>
|
||||
Prints the version.
|
||||
|
||||
hadoop script can be used to invoke any class.
|
||||
** <<<CLASSNAME>>>
|
||||
|
||||
Usage: <<<hadoop CLASSNAME>>>
|
||||
|
||||
Runs the class named <<<CLASSNAME>>>.
|
||||
Runs the class named <<<CLASSNAME>>>. The class must be part of a package.
|
||||
|
||||
* <<<classpath>>>
|
||||
|
||||
Prints the class path needed to get the Hadoop jar and the required
|
||||
libraries. If called without arguments, then prints the classpath set up by
|
||||
the command scripts, which is likely to contain wildcards in the classpath
|
||||
entries. Additional options print the classpath after wildcard expansion or
|
||||
write the classpath into the manifest of a jar file. The latter is useful in
|
||||
environments where wildcards cannot be used and the expanded classpath exceeds
|
||||
the maximum supported command line length.
|
||||
|
||||
Usage: <<<hadoop classpath [--glob|--jar <path>|-h|--help]>>>
|
||||
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
|| COMMAND_OPTION || Description
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| --glob | expand wildcards
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| --jar <path> | write classpath as manifest in jar named <path>
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
| -h, --help | print help
|
||||
*-----------------+-----------------------------------------------------------+
|
||||
|
||||
Administration Commands
|
||||
* {Administration Commands}
|
||||
|
||||
Commands useful for administrators of a hadoop cluster.
|
||||
|
||||
* <<<balancer>>>
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#balancer}
|
||||
<<<hdfs balancer>>>}} instead.
|
||||
|
||||
* <<<daemonlog>>>
|
||||
|
||||
Get/Set the log level for each daemon.
|
||||
** <<<daemonlog>>>
|
||||
|
||||
Usage: <<<hadoop daemonlog -getlevel <host:port> <name> >>>
|
||||
Usage: <<<hadoop daemonlog -setlevel <host:port> <name> <level> >>>
|
||||
|
@ -262,22 +308,20 @@ Administration Commands
|
|||
| connects to http://<host:port>/logLevel?log=<name>
|
||||
*------------------------------+-----------------------------------------------------------+
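
The URL pattern in the table can be assembled by hand when debugging. The sketch below builds the address that a <<<-getlevel>>> request would correspond to; <<<loglevel_url>>> is a hypothetical helper, and the host, port and logger name in the usage line are placeholders rather than values taken from this document.

```shell
# Build the logLevel URL for a daemon's HTTP endpoint.
# $1 = host:port, $2 = logger name. Hypothetical helper for illustration.
loglevel_url() {
  printf 'http://%s/logLevel?log=%s\n' "$1" "$2"
}

# Placeholder host, port and logger name.
loglevel_url nn1.example.com:50070 org.apache.hadoop.hdfs
```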
|
||||
|
||||
* <<<datanode>>>
|
||||
Get/Set the log level for each daemon.
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#datanode}
|
||||
<<<hdfs datanode>>>}} instead.
|
||||
* Files
|
||||
|
||||
* <<<dfsadmin>>>
|
||||
** <<etc/hadoop/hadoop-env.sh>>
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#dfsadmin}
|
||||
<<<hdfs dfsadmin>>>}} instead.
|
||||
This file stores the global settings used by all Hadoop shell commands.
|
||||
|
||||
* <<<namenode>>>
|
||||
** <<etc/hadoop/hadoop-user-functions.sh>>
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#namenode}
|
||||
<<<hdfs namenode>>>}} instead.
|
||||
This file allows for advanced users to override some shell functionality.
|
||||
|
||||
* <<<secondarynamenode>>>
|
||||
** <<~/.hadooprc>>
|
||||
|
||||
Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#secondarynamenode}
|
||||
<<<hdfs secondarynamenode>>>}} instead.
|
||||
This stores the personal environment for an individual user. It is
|
||||
processed after the hadoop-env.sh and hadoop-user-functions.sh files
|
||||
and can contain the same settings.
|
||||
|
|
|
@ -45,46 +45,62 @@ bin/hadoop fs <args>
|
|||
Differences are described with each of the commands. Error information is
|
||||
sent to stderr and the output is sent to stdout.
|
||||
|
||||
appendToFile
|
||||
If HDFS is being used, <<<hdfs dfs>>> is a synonym.
|
||||
|
||||
Usage: <<<hdfs dfs -appendToFile <localsrc> ... <dst> >>>
|
||||
See the {{{./CommandsManual.html}Commands Manual}} for generic shell options.
|
||||
|
||||
* appendToFile
|
||||
|
||||
Usage: <<<hadoop fs -appendToFile <localsrc> ... <dst> >>>
|
||||
|
||||
Append single src, or multiple srcs from local file system to the
|
||||
destination file system. Also reads input from stdin and appends to
|
||||
destination file system.
|
||||
|
||||
* <<<hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -appendToFile localfile /user/hadoop/hadoopfile>>>
|
||||
|
||||
* <<<hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile>>>
|
||||
|
||||
* <<<hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
|
||||
* <<<hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
Reads the input from stdin.
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and 1 on error.
|
||||
|
||||
cat
|
||||
* cat
|
||||
|
||||
Usage: <<<hdfs dfs -cat URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -cat URI [URI ...]>>>
|
||||
|
||||
Copies source paths to stdout.
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
|
||||
* <<<hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
|
||||
|
||||
* <<<hdfs dfs -cat file:///file3 /user/hadoop/file4>>>
|
||||
* <<<hadoop fs -cat file:///file3 /user/hadoop/file4>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
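
When these exit codes are checked from a script, note that a POSIX shell reports only the low 8 bits of a process's exit status, so a documented -1 appears as 255 in <<<$?>>>. The sketch below wraps a command and reports its status; <<<run_checked>>> is a hypothetical helper, and testing for "non-zero" is generally safer than comparing against a specific value.

```shell
# Run a command, print a short verdict, and pass its exit status through.
# Hypothetical helper; any command line can be given as arguments.
run_checked() {
  "$@"
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "ok"
  else
    echo "failed with status $status"
  fi
  return "$status"
}

# Example use with a file system shell command (requires a Hadoop install):
#   run_checked hadoop fs -cat /user/hadoop/file4
```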
|
||||
|
||||
chgrp
|
||||
* checksum
|
||||
|
||||
Usage: <<<hdfs dfs -chgrp [-R] GROUP URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -checksum URI>>>
|
||||
|
||||
Returns the checksum information of a file.
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hadoop fs -checksum hdfs://nn1.example.com/file1>>>
|
||||
|
||||
* <<<hadoop fs -checksum file:///etc/hosts>>>
|
||||
|
||||
* chgrp
|
||||
|
||||
Usage: <<<hadoop fs -chgrp [-R] GROUP URI [URI ...]>>>
|
||||
|
||||
Change group association of files. The user must be the owner of files, or
|
||||
else a super-user. Additional information is in the
|
||||
|
@ -94,9 +110,9 @@ chgrp
|
|||
|
||||
* The -R option will make the change recursively through the directory structure.
|
||||
|
||||
chmod
|
||||
* chmod
|
||||
|
||||
Usage: <<<hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]>>>
|
||||
|
||||
Change the permissions of files. With -R, make the change recursively
|
||||
through the directory structure. The user must be the owner of the file, or
|
||||
|
@ -107,9 +123,9 @@ chmod
|
|||
|
||||
* The -R option will make the change recursively through the directory structure.
|
||||
|
||||
chown
|
||||
* chown
|
||||
|
||||
Usage: <<<hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]>>>
|
||||
|
||||
Change the owner of files. The user must be a super-user. Additional information
|
||||
is in the {{{../hadoop-hdfs/HdfsPermissionsGuide.html}Permissions Guide}}.
|
||||
|
@ -118,9 +134,9 @@ chown
|
|||
|
||||
* The -R option will make the change recursively through the directory structure.
|
||||
|
||||
copyFromLocal
|
||||
* copyFromLocal
|
||||
|
||||
Usage: <<<hdfs dfs -copyFromLocal <localsrc> URI>>>
|
||||
Usage: <<<hadoop fs -copyFromLocal <localsrc> URI>>>
|
||||
|
||||
Similar to put command, except that the source is restricted to a local
|
||||
file reference.
|
||||
|
@ -129,16 +145,16 @@ copyFromLocal
|
|||
|
||||
* The -f option will overwrite the destination if it already exists.
|
||||
|
||||
copyToLocal
|
||||
* copyToLocal
|
||||
|
||||
Usage: <<<hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst> >>>
|
||||
Usage: <<<hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst> >>>
|
||||
|
||||
Similar to get command, except that the destination is restricted to a
|
||||
local file reference.
|
||||
|
||||
count
|
||||
* count
|
||||
|
||||
Usage: <<<hdfs dfs -count [-q] [-h] <paths> >>>
|
||||
Usage: <<<hadoop fs -count [-q] [-h] <paths> >>>
|
||||
|
||||
Count the number of directories, files and bytes under the paths that match
|
||||
the specified file pattern. The output columns with -count are: DIR_COUNT,
|
||||
|
@ -151,19 +167,19 @@ count
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
|
||||
* <<<hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
|
||||
|
||||
* <<<hdfs dfs -count -q hdfs://nn1.example.com/file1>>>
|
||||
* <<<hadoop fs -count -q hdfs://nn1.example.com/file1>>>
|
||||
|
||||
* <<<hdfs dfs -count -q -h hdfs://nn1.example.com/file1>>>
|
||||
* <<<hadoop fs -count -q -h hdfs://nn1.example.com/file1>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
cp
|
||||
* cp
|
||||
|
||||
Usage: <<<hdfs dfs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest> >>>
|
||||
Usage: <<<hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest> >>>
|
||||
|
||||
Copy files from source to destination. This command allows multiple sources
|
||||
as well in which case the destination must be a directory.
|
||||
|
@ -187,17 +203,41 @@ cp
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2>>>
|
||||
* <<<hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2>>>
|
||||
|
||||
* <<<hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir>>>
|
||||
* <<<hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
du
|
||||
* createSnapshot
|
||||
|
||||
Usage: <<<hdfs dfs -du [-s] [-h] URI [URI ...]>>>
|
||||
See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}.
|
||||
|
||||
|
||||
* deleteSnapshot
|
||||
|
||||
See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}.
|
||||
|
||||
* df
|
||||
|
||||
Usage: <<<hadoop fs -df [-h] URI [URI ...]>>>
|
||||
|
||||
Displays free space.
|
||||
|
||||
Options:
|
||||
|
||||
* The -h option will format file sizes in a "human-readable" fashion (e.g.
|
||||
64.0m instead of 67108864)
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hadoop fs -df /user/hadoop/dir1>>>
|
||||
|
||||
* du
|
||||
|
||||
Usage: <<<hadoop fs -du [-s] [-h] URI [URI ...]>>>
|
||||
|
||||
Displays sizes of files and directories contained in the given directory or
|
||||
the length of a file in case it's just a file.
|
||||
|
@ -212,29 +252,29 @@ du
|
|||
|
||||
Example:
|
||||
|
||||
* hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1
|
||||
* <<<hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1>>>
|
||||
|
||||
Exit Code:
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
dus
|
||||
* dus
|
||||
|
||||
Usage: <<<hdfs dfs -dus <args> >>>
|
||||
Usage: <<<hadoop fs -dus <args> >>>
|
||||
|
||||
Displays a summary of file lengths.
|
||||
|
||||
<<Note:>> This command is deprecated. Instead use <<<hdfs dfs -du -s>>>.
|
||||
<<Note:>> This command is deprecated. Instead use <<<hadoop fs -du -s>>>.
|
||||
|
||||
expunge
|
||||
* expunge
|
||||
|
||||
Usage: <<<hdfs dfs -expunge>>>
|
||||
Usage: <<<hadoop fs -expunge>>>
|
||||
|
||||
Empty the Trash. Refer to the {{{../hadoop-hdfs/HdfsDesign.html}
|
||||
HDFS Architecture Guide}} for more information on the Trash feature.
|
||||
|
||||
find
|
||||
* find
|
||||
|
||||
Usage: <<<hdfs dfs -find <path> ... <expression> ... >>>
|
||||
Usage: <<<hadoop fs -find <path> ... <expression> ... >>>
|
||||
|
||||
Finds all files that match the specified expression and applies selected
|
||||
actions to them. If no <path> is specified then defaults to the current
|
||||
|
@ -269,15 +309,15 @@ find
|
|||
|
||||
Example:
|
||||
|
||||
<<<hdfs dfs -find / -name test -print>>>
|
||||
<<<hadoop fs -find / -name test -print>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
get
|
||||
* get
|
||||
|
||||
Usage: <<<hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst> >>>
|
||||
Usage: <<<hadoop fs -get [-ignorecrc] [-crc] <src> <localdst> >>>
|
||||
|
||||
Copy files to the local file system. Files that fail the CRC check may be
|
||||
copied with the -ignorecrc option. Files and CRCs may be copied using the
|
||||
|
@ -285,17 +325,17 @@ get
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -get /user/hadoop/file localfile>>>
|
||||
* <<<hadoop fs -get /user/hadoop/file localfile>>>
|
||||
|
||||
* <<<hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile>>>
|
||||
* <<<hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
getfacl
|
||||
* getfacl
|
||||
|
||||
Usage: <<<hdfs dfs -getfacl [-R] <path> >>>
|
||||
Usage: <<<hadoop fs -getfacl [-R] <path> >>>
|
||||
|
||||
Displays the Access Control Lists (ACLs) of files and directories. If a
|
||||
directory has a default ACL, then getfacl also displays the default ACL.
|
||||
|
@ -308,17 +348,17 @@ getfacl
|
|||
|
||||
Examples:
|
||||
|
||||
* <<<hdfs dfs -getfacl /file>>>
|
||||
* <<<hadoop fs -getfacl /file>>>
|
||||
|
||||
* <<<hdfs dfs -getfacl -R /dir>>>
|
||||
* <<<hadoop fs -getfacl -R /dir>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and non-zero on error.
|
||||
|
||||
getfattr
|
||||
* getfattr
|
||||
|
||||
Usage: <<<hdfs dfs -getfattr [-R] {-n name | -d} [-e en] <path> >>>
|
||||
Usage: <<<hadoop fs -getfattr [-R] {-n name | -d} [-e en] <path> >>>
|
||||
|
||||
Displays the extended attribute names and values (if any) for a file or
|
||||
directory.
|
||||
|
@ -337,26 +377,32 @@ getfattr
|
|||
|
||||
Examples:
|
||||
|
||||
* <<<hdfs dfs -getfattr -d /file>>>
|
||||
* <<<hadoop fs -getfattr -d /file>>>
|
||||
|
||||
* <<<hdfs dfs -getfattr -R -n user.myAttr /dir>>>
|
||||
* <<<hadoop fs -getfattr -R -n user.myAttr /dir>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and non-zero on error.
|
||||
|
||||
getmerge
|
||||
* getmerge
|
||||
|
||||
Usage: <<<hdfs dfs -getmerge <src> <localdst> [addnl]>>>
|
||||
Usage: <<<hadoop fs -getmerge <src> <localdst> [addnl]>>>
|
||||
|
||||
Takes a source directory and a destination file as input and concatenates
|
||||
files in src into the destination local file. Optionally addnl can be set to
|
||||
enable adding a newline character at the
|
||||
end of each file.
|
||||
|
||||
ls
|
||||
* help
|
||||
|
||||
Usage: <<<hdfs dfs -ls [-R] <args> >>>
|
||||
Usage: <<<hadoop fs -help>>>
|
||||
|
||||
Return usage output.
|
||||
|
||||
* ls
|
||||
|
||||
Usage: <<<hadoop fs -ls [-R] <args> >>>
|
||||
|
||||
Options:
|
||||
|
||||
|
@ -377,23 +423,23 @@ permissions userid groupid modification_date modification_time dirname
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -ls /user/hadoop/file1>>>
|
||||
* <<<hadoop fs -ls /user/hadoop/file1>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
lsr
|
||||
* lsr
|
||||
|
||||
Usage: <<<hdfs dfs -lsr <args> >>>
|
||||
Usage: <<<hadoop fs -lsr <args> >>>
|
||||
|
||||
Recursive version of ls.
|
||||
|
||||
<<Note:>> This command is deprecated. Instead use <<<hdfs dfs -ls -R>>>
|
||||
<<Note:>> This command is deprecated. Instead use <<<hadoop fs -ls -R>>>
|
||||
|
||||
mkdir
|
||||
* mkdir
|
||||
|
||||
Usage: <<<hdfs dfs -mkdir [-p] <paths> >>>
|
||||
Usage: <<<hadoop fs -mkdir [-p] <paths> >>>
|
||||
|
||||
Takes path URIs as arguments and creates directories.
|
||||
|
||||
|
@ -403,30 +449,30 @@ mkdir
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2>>>
|
||||
* <<<hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2>>>
|
||||
|
||||
* <<<hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir>>>
|
||||
* <<<hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
moveFromLocal
|
||||
* moveFromLocal
|
||||
|
||||
Usage: <<<hdfs dfs -moveFromLocal <localsrc> <dst> >>>
|
||||
Usage: <<<hadoop fs -moveFromLocal <localsrc> <dst> >>>
|
||||
|
||||
Similar to put command, except that the source localsrc is deleted after
|
||||
it's copied.
|
||||
|
||||
moveToLocal
|
||||
* moveToLocal
|
||||
|
||||
Usage: <<<hdfs dfs -moveToLocal [-crc] <src> <dst> >>>
|
||||
Usage: <<<hadoop fs -moveToLocal [-crc] <src> <dst> >>>
|
||||
|
||||
Displays a "Not implemented yet" message.
|
||||
|
||||
mv
|
||||
* mv
|
||||
|
||||
Usage: <<<hdfs dfs -mv URI [URI ...] <dest> >>>
|
||||
Usage: <<<hadoop fs -mv URI [URI ...] <dest> >>>
|
||||
|
||||
Moves files from source to destination. This command allows multiple sources
|
||||
as well in which case the destination needs to be a directory. Moving files
|
||||
|
@ -434,38 +480,42 @@ mv
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2>>>
|
||||
* <<<hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2>>>
|
||||
|
||||
* <<<hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1>>>
|
||||
* <<<hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
put
|
||||
* put
|
||||
|
||||
Usage: <<<hdfs dfs -put <localsrc> ... <dst> >>>
|
||||
Usage: <<<hadoop fs -put <localsrc> ... <dst> >>>
|
||||
|
||||
Copy single src, or multiple srcs from local file system to the destination
|
||||
file system. Also reads input from stdin and writes to destination file
|
||||
system.
|
||||
|
||||
* <<<hdfs dfs -put localfile /user/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -put localfile /user/hadoop/hadoopfile>>>
|
||||
|
||||
* <<<hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir>>>
|
||||
* <<<hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir>>>
|
||||
|
||||
* <<<hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
|
||||
* <<<hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
* <<<hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile>>>
|
||||
Reads the input from stdin.
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
rm
|
||||
* renameSnapshot
|
||||
|
||||
Usage: <<<hdfs dfs -rm [-f] [-r|-R] [-skipTrash] URI [URI ...]>>>
|
||||
See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}.
|
||||
|
||||
* rm
|
||||
|
||||
Usage: <<<hadoop fs -rm [-f] [-r|-R] [-skipTrash] URI [URI ...]>>>
|
||||
|
||||
Delete files specified as args.
|
||||
|
||||
|
@ -484,23 +534,37 @@ rm
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir>>>
|
||||
* <<<hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
rmr
|
||||
* rmdir
|
||||
|
||||
Usage: <<<hdfs dfs -rmr [-skipTrash] URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]>>>
|
||||
|
||||
Delete a directory.
|
||||
|
||||
Options:
|
||||
|
||||
* --ignore-fail-on-non-empty: When using wildcards, do not fail if a directory still contains files.
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hadoop fs -rmdir /user/hadoop/emptydir>>>
|
||||
|
||||
* rmr
|
||||
|
||||
Usage: <<<hadoop fs -rmr [-skipTrash] URI [URI ...]>>>
|
||||
|
||||
Recursive version of delete.
|
||||
|
||||
<<Note:>> This command is deprecated. Instead use <<<hdfs dfs -rm -r>>>
|
||||
<<Note:>> This command is deprecated. Instead use <<<hadoop fs -rm -r>>>
|
||||
|
||||
setfacl
|
||||
* setfacl
|
||||
|
||||
Usage: <<<hdfs dfs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] >>>
|
||||
Usage: <<<hadoop fs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] >>>
|
||||
|
||||
Sets Access Control Lists (ACLs) of files and directories.
|
||||
|
||||
|
@ -528,27 +592,27 @@ setfacl
|
|||
|
||||
Examples:
|
||||
|
||||
* <<<hdfs dfs -setfacl -m user:hadoop:rw- /file>>>
|
||||
* <<<hadoop fs -setfacl -m user:hadoop:rw- /file>>>
|
||||
|
||||
* <<<hdfs dfs -setfacl -x user:hadoop /file>>>
|
||||
* <<<hadoop fs -setfacl -x user:hadoop /file>>>
|
||||
|
||||
* <<<hdfs dfs -setfacl -b /file>>>
|
||||
* <<<hadoop fs -setfacl -b /file>>>
|
||||
|
||||
* <<<hdfs dfs -setfacl -k /dir>>>
|
||||
* <<<hadoop fs -setfacl -k /dir>>>
|
||||
|
||||
* <<<hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file>>>
|
||||
* <<<hadoop fs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file>>>
|
||||
|
||||
* <<<hdfs dfs -setfacl -R -m user:hadoop:r-x /dir>>>
|
||||
* <<<hadoop fs -setfacl -R -m user:hadoop:r-x /dir>>>
|
||||
|
||||
* <<<hdfs dfs -setfacl -m default:user:hadoop:r-x /dir>>>
|
||||
* <<<hadoop fs -setfacl -m default:user:hadoop:r-x /dir>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and non-zero on error.
|
||||
|
||||
setfattr
|
||||
* setfattr
|
||||
|
||||
Usage: <<<hdfs dfs -setfattr {-n name [-v value] | -x name} <path> >>>
|
||||
Usage: <<<hadoop fs -setfattr {-n name [-v value] | -x name} <path> >>>
|
||||
|
||||
Sets an extended attribute name and value for a file or directory.
|
||||
|
||||
|
@ -566,19 +630,19 @@ setfattr
|
|||
|
||||
Examples:
|
||||
|
||||
* <<<hdfs dfs -setfattr -n user.myAttr -v myValue /file>>>
|
||||
* <<<hadoop fs -setfattr -n user.myAttr -v myValue /file>>>
|
||||
|
||||
* <<<hdfs dfs -setfattr -n user.noValue /file>>>
|
||||
* <<<hadoop fs -setfattr -n user.noValue /file>>>
|
||||
|
||||
* <<<hdfs dfs -setfattr -x user.myAttr /file>>>
|
||||
* <<<hadoop fs -setfattr -x user.myAttr /file>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and non-zero on error.
|
||||
|
||||
setrep
|
||||
* setrep
|
||||
|
||||
Usage: <<<hdfs dfs -setrep [-R] [-w] <numReplicas> <path> >>>
|
||||
Usage: <<<hadoop fs -setrep [-R] [-w] <numReplicas> <path> >>>
|
||||
|
||||
Changes the replication factor of a file. If <path> is a directory then
|
||||
the command recursively changes the replication factor of all files under
|
||||
|
@ -593,28 +657,28 @@ setrep
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -setrep -w 3 /user/hadoop/dir1>>>
|
||||
* <<<hadoop fs -setrep -w 3 /user/hadoop/dir1>>>
|
||||
|
||||
Exit Code:
|
||||
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
stat
|
||||
* stat
|
||||
|
||||
Usage: <<<hdfs dfs -stat URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -stat URI [URI ...]>>>
|
||||
|
||||
Returns the stat information on the path.
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -stat path>>>
|
||||
* <<<hadoop fs -stat path>>>
|
||||
|
||||
Exit Code:
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
tail
|
||||
* tail
|
||||
|
||||
Usage: <<<hdfs dfs -tail [-f] URI>>>
|
||||
Usage: <<<hadoop fs -tail [-f] URI>>>
|
||||
|
||||
Displays the last kilobyte of the file to stdout.
|
||||
|
||||
|
@ -624,43 +688,54 @@ tail
|
|||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -tail pathname>>>
|
||||
* <<<hadoop fs -tail pathname>>>
|
||||
|
||||
Exit Code:
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
test
|
||||
* test
|
||||
|
||||
Usage: <<<hdfs dfs -test -[ezd] URI>>>
|
||||
Usage: <<<hadoop fs -test -[defsz] URI>>>
|
||||
|
||||
Options:
|
||||
|
||||
* The -e option will check to see if the file exists, returning 0 if true.
|
||||
* -d: if the path is a directory, return 0.
|
||||
|
||||
* The -z option will check to see if the file is zero length, returning 0 if true.
|
||||
* -e: if the path exists, return 0.
|
||||
|
||||
* The -d option will check to see if the path is directory, returning 0 if true.
|
||||
* -f: if the path is a file, return 0.
|
||||
|
||||
* -s: if the path is not empty, return 0.
|
||||
|
||||
* -z: if the file is zero length, return 0.
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -test -e filename>>>
|
||||
* <<<hadoop fs -test -e filename>>>
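Because <<<-test>>> reports its result only through the exit status, it is typically used inside a shell conditional. The following is a minimal sketch, not a definitive recipe: the function name <<<ensure_marker>>> and the <<<HADOOP_CMD>>> override are assumptions introduced here so the logic can be tried against a stub. It combines <<<-test -e>>> with <<<-touchz>>> to create an empty file only when the path does not exist yet.

```shell
# ensure_marker: hypothetical helper, not part of Hadoop.
# HADOOP_CMD is an assumed override; it defaults to the real "hadoop" command.
ensure_marker() {
  path="$1"
  if ${HADOOP_CMD:-hadoop} fs -test -e "$path"; then
    # -test -e exited with 0: the path already exists
    echo "exists: $path"
  else
    # non-zero exit: create a zero-length file at the path
    ${HADOOP_CMD:-hadoop} fs -touchz "$path" && echo "created: $path"
  fi
}

# Example call (requires a working Hadoop installation):
#   ensure_marker /user/hadoop/_READY
```

The check and the create are two separate commands, so this sketch is not atomic; another client could create the path between them.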
|
||||
|
||||
text
|
||||
* text
|
||||
|
||||
Usage: <<<hdfs dfs -text <src> >>>
|
||||
Usage: <<<hadoop fs -text <src> >>>
|
||||
|
||||
Takes a source file and outputs the file in text format. The allowed formats
|
||||
are zip and TextRecordInputStream.
|
||||
|
||||
touchz
|
||||
* touchz
|
||||
|
||||
Usage: <<<hdfs dfs -touchz URI [URI ...]>>>
|
||||
Usage: <<<hadoop fs -touchz URI [URI ...]>>>
|
||||
|
||||
Create a file of zero length.
|
||||
|
||||
Example:
|
||||
|
||||
* <<<hdfs dfs -touchz pathname>>>
|
||||
* <<<hadoop fs -touchz pathname>>>
|
||||
|
||||
Exit Code:
|
||||
Returns 0 on success and -1 on error.
|
||||
|
||||
|
||||
* usage
|
||||
|
||||
Usage: <<<hadoop fs -usage command>>>
|
||||
|
||||
Return the help for an individual command.
|
|
@ -11,12 +11,12 @@
|
|||
~~ limitations under the License. See accompanying LICENSE file.
|
||||
|
||||
---
|
||||
Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
|
||||
Hadoop ${project.version} - Setting up a Single Node Cluster.
|
||||
---
|
||||
---
|
||||
${maven.build.timestamp}
|
||||
|
||||
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
||||
Hadoop - Setting up a Single Node Cluster.
|
||||
|
||||
%{toc|section=1|fromDepth=0}
|
||||
|
||||
|
@ -46,7 +46,9 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
HadoopJavaVersions}}.
|
||||
|
||||
[[2]] ssh must be installed and sshd must be running to use the Hadoop
|
||||
scripts that manage remote Hadoop daemons.
|
||||
scripts that manage remote Hadoop daemons if the optional start
|
||||
and stop scripts are to be used. Additionally, it is recommended that
|
||||
pdsh also be installed for better ssh resource management.
|
||||
|
||||
** Installing Software
|
||||
|
||||
|
@ -57,7 +59,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
|
||||
----
|
||||
$ sudo apt-get install ssh
|
||||
$ sudo apt-get install rsync
|
||||
$ sudo apt-get install pdsh
|
||||
----
|
||||
|
||||
* Download
|
||||
|
@ -75,9 +77,6 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
----
|
||||
# set to the root of your Java installation
|
||||
export JAVA_HOME=/usr/java/latest
|
||||
|
||||
# Assuming your installation directory is /usr/local/hadoop
|
||||
export HADOOP_PREFIX=/usr/local/hadoop
|
||||
----
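Before trying any Hadoop command, it can help to confirm that <<<JAVA_HOME>>> points at a usable Java installation. The check below is a minimal sketch under stated assumptions: the function name <<<check_java_home>>> is hypothetical, and the test for <<<bin/java>>> assumes the conventional JDK/JRE directory layout.

```shell
# check_java_home: hypothetical helper, not part of Hadoop.
# Returns 0 if JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable java under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Example call:
#   check_java_home || exit 1
```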
|
||||
|
||||
Try the following command:
|
||||
|
@ -158,6 +157,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
----
|
||||
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
|
||||
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
|
||||
$ chmod 0700 ~/.ssh/authorized_keys
|
||||
----
|
||||
|
||||
** Execution
|
||||
|
@ -228,7 +228,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
$ sbin/stop-dfs.sh
|
||||
----
|
||||
|
||||
** YARN on Single Node
|
||||
** YARN on a Single Node
|
||||
|
||||
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting
|
||||
a few parameters and running ResourceManager daemon and NodeManager daemon
|
||||
|
@ -239,7 +239,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
|
||||
[[1]] Configure parameters as follows:
|
||||
|
||||
etc/hadoop/mapred-site.xml:
|
||||
<<<etc/hadoop/mapred-site.xml>>>:
|
||||
|
||||
+---+
|
||||
<configuration>
|
||||
|
@ -250,7 +250,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
|
|||
</configuration>
|
||||
+---+
|
||||
|
||||
etc/hadoop/yarn-site.xml:
|
||||
<<<etc/hadoop/yarn-site.xml>>>:
|
||||
|
||||
+---+
|
||||
<configuration>
|
||||
|
|