HADOOP-6738. Move cluster_setup.xml, hod_scheduler, commands_manual from MapReduce to Common.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@951480 13f79535-47bb-0310-9956-ffa450edef68
parent 04a00b74f5
commit ea5200d922
|
@ -932,6 +932,9 @@ Release 0.21.0 - Unreleased
|
||||||
HADOOP-6585. Add FileStatus#isDirectory and isFile. (Eli Collins via
|
HADOOP-6585. Add FileStatus#isDirectory and isFile. (Eli Collins via
|
||||||
tomwhite)
|
tomwhite)
|
||||||
|
|
||||||
|
HADOOP-6738. Move cluster_setup.xml from MapReduce to Common.
|
||||||
|
(Tom White via tomwhite)
|
||||||
|
|
||||||
OPTIMIZATIONS
|
OPTIMIZATIONS
|
||||||
|
|
||||||
HADOOP-5595. NameNode does not need to run a replicator to choose a
|
HADOOP-5595. NameNode does not need to run a replicator to choose a
|
||||||
|
|
|
@ -33,20 +33,20 @@
|
||||||
Hadoop clusters ranging from a few nodes to extremely large clusters with
|
Hadoop clusters ranging from a few nodes to extremely large clusters with
|
||||||
thousands of nodes.</p>
|
thousands of nodes.</p>
|
||||||
<p>
|
<p>
|
||||||
To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Single Node Setup</a>).
|
To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Hadoop Quick Start</a>).
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<title>Prerequisites</title>
|
<title>Pre-requisites</title>
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
<li>
|
<li>
|
||||||
Make sure all <a href="single_node_setup.html#PreReqs">required software</a>
|
Make sure all <a href="single_node_setup.html#PreReqs">requisite</a> software
|
||||||
is installed on all nodes in your cluster.
|
is installed on all nodes in your cluster.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<a href="single_node_setup.html#Download">Download</a> the Hadoop software.
|
<a href="single_node_setup.html#Download">Get</a> the Hadoop software.
|
||||||
</li>
|
</li>
|
||||||
</ol>
|
</ol>
|
||||||
</section>
|
</section>
|
||||||
|
@ -81,21 +81,23 @@
|
||||||
<ol>
|
<ol>
|
||||||
<li>
|
<li>
|
||||||
Read-only default configuration -
|
Read-only default configuration -
|
||||||
<a href="ext:common-default">src/common/common-default.xml</a>,
|
<a href="ext:common-default">src/core/core-default.xml</a>,
|
||||||
<a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a> and
|
<a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a>,
|
||||||
<a href="ext:mapred-default">src/mapred/mapred-default.xml</a>.
|
<a href="ext:mapred-default">src/mapred/mapred-default.xml</a> and
|
||||||
|
<a href="ext:mapred-queues">conf/mapred-queues.xml.template</a>.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
Site-specific configuration -
|
Site-specific configuration -
|
||||||
<em>conf/core-site.xml</em>,
|
<a href="#core-site.xml">conf/core-site.xml</a>,
|
||||||
<em>conf/hdfs-site.xml</em> and
|
<a href="#hdfs-site.xml">conf/hdfs-site.xml</a>,
|
||||||
<em>conf/mapred-site.xml</em>.
|
<a href="#mapred-site.xml">conf/mapred-site.xml</a> and
|
||||||
|
<a href="#mapred-queues.xml">conf/mapred-queues.xml</a>.
|
||||||
</li>
|
</li>
|
||||||
</ol>
|
</ol>
|
||||||
|
|
||||||
<p>To learn more about how the Hadoop framework is controlled by these
|
<p>To learn more about how the Hadoop framework is controlled by these
|
||||||
configuration files see
|
configuration files, look
|
||||||
<a href="ext:api/org/apache/hadoop/conf/configuration">Class Configuration</a>.</p>
|
<a href="ext:api/org/apache/hadoop/conf/configuration">here</a>.</p>
|
||||||
|
|
||||||
<p>Additionally, you can control the Hadoop scripts found in the
|
<p>Additionally, you can control the Hadoop scripts found in the
|
||||||
<code>bin/</code> directory of the distribution, by setting site-specific
|
<code>bin/</code> directory of the distribution, by setting site-specific
|
||||||
|
@ -163,9 +165,8 @@
|
||||||
<title>Configuring the Hadoop Daemons</title>
|
<title>Configuring the Hadoop Daemons</title>
|
||||||
|
|
||||||
<p>This section deals with important parameters to be specified in the
|
<p>This section deals with important parameters to be specified in the
|
||||||
following:
|
following:</p>
|
||||||
<br/>
|
<anchor id="core-site.xml"/><p><code>conf/core-site.xml</code>:</p>
|
||||||
<code>conf/core-site.xml</code>:</p>
|
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -180,7 +181,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
<p><br/><code>conf/hdfs-site.xml</code>:</p>
|
<anchor id="hdfs-site.xml"/><p><code>conf/hdfs-site.xml</code>:</p>
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -212,7 +213,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
<p><br/><code>conf/mapred-site.xml</code>:</p>
|
<anchor id="mapred-site.xml"/><p><code>conf/mapred-site.xml</code>:</p>
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -221,12 +222,12 @@
|
||||||
<th>Notes</th>
|
<th>Notes</th>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.job.tracker</td>
|
<td>mapreduce.jobtracker.address</td>
|
||||||
<td>Host or IP and port of <code>JobTracker</code>.</td>
|
<td>Host or IP and port of <code>JobTracker</code>.</td>
|
||||||
<td><em>host:port</em> pair.</td>
|
<td><em>host:port</em> pair.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.system.dir</td>
|
<td>mapreduce.jobtracker.system.dir</td>
|
||||||
<td>
|
<td>
|
||||||
Path on the HDFS where the Map/Reduce framework stores
|
Path on the HDFS where the Map/Reduce framework stores
|
||||||
system files e.g. <code>/hadoop/mapred/system/</code>.
|
system files e.g. <code>/hadoop/mapred/system/</code>.
|
||||||
|
@ -237,7 +238,7 @@
|
||||||
</td>
|
</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.local.dir</td>
|
<td>mapreduce.cluster.local.dir</td>
|
||||||
<td>
|
<td>
|
||||||
Comma-separated list of paths on the local filesystem where
|
Comma-separated list of paths on the local filesystem where
|
||||||
temporary Map/Reduce data is written.
|
temporary Map/Reduce data is written.
|
||||||
|
@ -264,7 +265,7 @@
|
||||||
</td>
|
</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.hosts/mapred.hosts.exclude</td>
|
<td>mapreduce.jobtracker.hosts.filename/mapreduce.jobtracker.hosts.exclude.filename</td>
|
||||||
<td>List of permitted/excluded TaskTrackers.</td>
|
<td>List of permitted/excluded TaskTrackers.</td>
|
||||||
<td>
|
<td>
|
||||||
If necessary, use these files to control the list of allowable
|
If necessary, use these files to control the list of allowable
|
||||||
|
@ -272,82 +273,331 @@
|
||||||
</td>
|
</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.queue.names</td>
|
<td>mapreduce.cluster.job-authorization-enabled</td>
|
||||||
<td>Comma separated list of queues to which jobs can be submitted.</td>
|
<td>Boolean, specifying whether job ACLs are supported for
|
||||||
|
authorizing view and modification of a job</td>
|
||||||
<td>
|
<td>
|
||||||
The Map/Reduce system always supports at least one queue
|
If <em>true</em>, job ACLs are checked while viewing or
|
||||||
with the name as <em>default</em>. Hence, this parameter's
|
modifying a job. More details are available at
|
||||||
value should always contain the string <em>default</em>.
|
<a href ="ext:mapred-tutorial/JobAuthorization">Job Authorization</a>.
|
||||||
Some job schedulers supported in Hadoop, like the
|
|
||||||
<a href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html">Capacity Scheduler</a>,
|
|
||||||
support multiple queues. If such a scheduler is
|
|
||||||
being used, the list of configured queue names must be
|
|
||||||
specified here. Once queues are defined, users can submit
|
|
||||||
jobs to a queue using the property name
|
|
||||||
<em>mapred.job.queue.name</em> in the job configuration.
|
|
||||||
There could be a separate
|
|
||||||
configuration file for configuring properties of these
|
|
||||||
queues that is managed by the scheduler.
|
|
||||||
Refer to the documentation of the scheduler for information on
|
|
||||||
the same.
|
|
||||||
</td>
|
</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
|
||||||
<td>mapred.acls.enabled</td>
|
</table>
|
||||||
<td>Specifies whether ACLs are supported for controlling job
|
|
||||||
submission and administration</td>
|
|
||||||
<td>
|
|
||||||
If <em>true</em>, ACLs would be checked while submitting
|
|
||||||
and administering jobs. ACLs can be specified using the
|
|
||||||
configuration parameters of the form
|
|
||||||
<em>mapred.queue.queue-name.acl-name</em>, defined below.
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
</table>
|
|
||||||
|
|
||||||
<p><br/><code> conf/mapred-queue-acls.xml</code></p>
|
|
||||||
|
|
||||||
<table>
|
|
||||||
<tr>
|
|
||||||
<th>Parameter</th>
|
|
||||||
<th>Value</th>
|
|
||||||
<th>Notes</th>
|
|
||||||
</tr>
|
|
||||||
<tr>
|
|
||||||
<td>mapred.queue.<em>queue-name</em>.acl-submit-job</td>
|
|
||||||
<td>List of users and groups that can submit jobs to the
|
|
||||||
specified <em>queue-name</em>.</td>
|
|
||||||
<td>
|
|
||||||
The list of users and groups are both comma separated
|
|
||||||
list of names. The two lists are separated by a blank.
|
|
||||||
Example: <em>user1,user2 group1,group2</em>.
|
|
||||||
If you wish to define only a list of groups, provide
|
|
||||||
a blank at the beginning of the value.
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
<tr>
|
|
||||||
<td>mapred.queue.<em>queue-name</em>.acl-administer-job</td>
|
|
||||||
<td>List of users and groups that can change the priority
|
|
||||||
or kill jobs that have been submitted to the
|
|
||||||
specified <em>queue-name</em>.</td>
|
|
||||||
<td>
|
|
||||||
The list of users and groups are both comma separated
|
|
||||||
list of names. The two lists are separated by a blank.
|
|
||||||
Example: <em>user1,user2 group1,group2</em>.
|
|
||||||
If you wish to define only a list of groups, provide
|
|
||||||
a blank at the beginning of the value. Note that an
|
|
||||||
owner of a job can always change the priority or kill
|
|
||||||
his/her own job, irrespective of the ACLs.
|
|
||||||
</td>
|
|
||||||
</tr>
|
|
||||||
</table>
|
|
||||||
|
|
||||||
|
|
||||||
<p>Typically all the above parameters are marked as
|
<p>Typically all the above parameters are marked as
|
||||||
<a href="ext:api/org/apache/hadoop/conf/configuration/final_parameters">
|
<a href="ext:api/org/apache/hadoop/conf/configuration/final_parameters">
|
||||||
final</a> to ensure that they cannot be overridden by user-applications.
|
final</a> to ensure that they cannot be overridden by user-applications.
|
||||||
</p>
|
</p>
|
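Reviewer note: a minimal sketch of how the renamed parameters from the table above might look in conf/mapred-site.xml when marked final. The host name, port and local paths are hypothetical placeholders; only the property names and the /hadoop/mapred/system/ path come from the documentation being changed.

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: illustrative values only -->
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <!-- host:port pair; host and port are assumptions -->
    <value>jobtracker.example.com:8021</value>
    <!-- final: user applications cannot override this value -->
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/hadoop/mapred/system/</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <!-- comma-separated list of local paths; paths are assumptions -->
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
    <final>true</final>
  </property>
</configuration>
```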
||||||
|
|
||||||
|
<anchor id="mapred-queues.xml"/><p><code>conf/mapred-queues.xml
|
||||||
|
</code>:</p>
|
||||||
|
<p>This file is used to configure the queues in the Map/Reduce
|
||||||
|
system. Queues are abstract entities in the JobTracker that can be
|
||||||
|
used to manage collections of jobs. They provide a way for
|
||||||
|
administrators to organize jobs in specific ways and to enforce
|
||||||
|
certain policies on such collections, thus providing varying
|
||||||
|
levels of administrative control and management functions on jobs.
|
||||||
|
</p>
|
||||||
|
<p>One can imagine the following sample scenarios:</p>
|
||||||
|
<ul>
|
||||||
|
<li> Jobs submitted by a particular group of users can all be
|
||||||
|
submitted to one queue. </li>
|
||||||
|
<li> Long running jobs in an organization can be submitted to a
|
||||||
|
queue. </li>
|
||||||
|
<li> Short running jobs can be submitted to a queue and the number
|
||||||
|
of jobs that can run concurrently can be restricted. </li>
|
||||||
|
</ul>
|
||||||
|
<p>The usage of queues is closely tied to the scheduler configured
|
||||||
|
at the JobTracker via <em>mapreduce.jobtracker.taskscheduler</em>.
|
||||||
|
The degree of support of queues depends on the scheduler used. Some
|
||||||
|
schedulers support a single queue, while others support more complex
|
||||||
|
configurations. Schedulers also implement the policies that apply
|
||||||
|
to jobs in a queue. Some schedulers, such as the Fairshare scheduler,
|
||||||
|
implement their own mechanisms for collections of jobs and do not rely
|
||||||
|
on queues provided by the framework. The administrators are
|
||||||
|
encouraged to refer to the documentation of the scheduler they are
|
||||||
|
interested in for determining the level of support for queues.</p>
|
||||||
|
<p>The Map/Reduce framework supports some basic operations on queues
|
||||||
|
such as job submission to a specific queue, access control for queues,
|
||||||
|
queue states, viewing configured queues and their properties
|
||||||
|
and refresh of queue properties. In order to fully implement some of
|
||||||
|
these operations, the framework relies on the configured
|
||||||
|
scheduler.</p>
|
||||||
|
<p>The following types of queue configurations are possible:</p>
|
||||||
|
<ul>
|
||||||
|
<li> Single queue: The default configuration in Map/Reduce comprises
|
||||||
|
a single queue, as supported by the default scheduler. All jobs
|
||||||
|
are submitted to this default queue which maintains jobs in a priority
|
||||||
|
based FIFO order.</li>
|
||||||
|
<li> Multiple single level queues: Multiple queues are defined, and
|
||||||
|
jobs can be submitted to any of these queues. Different policies
|
||||||
|
can be applied to these queues by schedulers that support this
|
||||||
|
configuration to provide a better level of support. For example,
|
||||||
|
the <a href="ext:capacity-scheduler">capacity scheduler</a>
|
||||||
|
provides ways of configuring different
|
||||||
|
capacity and fairness guarantees on these queues.</li>
|
||||||
|
<li> Hierarchical queues: Hierarchical queues are a configuration in
|
||||||
|
which queues can contain other queues within them recursively. The
|
||||||
|
queues that contain other queues are referred to as
|
||||||
|
container queues. Queues that do not contain other queues are
|
||||||
|
referred to as leaf or job queues. Jobs can only be submitted to leaf
|
||||||
|
queues. Hierarchical queues can potentially offer a higher level
|
||||||
|
of control to administrators, as schedulers can now build a
|
||||||
|
hierarchy of policies where policies applicable to a container
|
||||||
|
queue can provide context for policies applicable to queues it
|
||||||
|
contains. It also opens up possibilities for delegating queue
|
||||||
|
administration where administration of queues in a container queue
|
||||||
|
can be turned over to a different set of administrators, within
|
||||||
|
the context provided by the container queue. For example, the
|
||||||
|
<a href="ext:capacity-scheduler">capacity scheduler</a>
|
||||||
|
uses hierarchical queues to partition capacity of a cluster
|
||||||
|
among container queues, and allows the queues they contain to divide
|
||||||
|
that capacity in more ways.</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<p>Most of the configuration of the queues can be refreshed/reloaded
|
||||||
|
without restarting the Map/Reduce sub-system by editing this
|
||||||
|
configuration file as described in the section on
|
||||||
|
<a href="commands_manual.html#RefreshQueues">reloading queue
|
||||||
|
configuration</a>.
|
||||||
|
Not all configuration properties can be reloaded,
|
||||||
|
as the description of each property below explains.</p>
|
||||||
|
|
||||||
|
<p>The format of conf/mapred-queues.xml is different from the other
|
||||||
|
configuration files, supporting nested configuration
|
||||||
|
elements to support hierarchical queues. The format is as follows:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<source>
|
||||||
|
<queues aclsEnabled="$aclsEnabled">
|
||||||
|
<queue>
|
||||||
|
<name>$queue-name</name>
|
||||||
|
<state>$state</state>
|
||||||
|
<queue>
|
||||||
|
<name>$child-queue1</name>
|
||||||
|
<properties>
|
||||||
|
<property key="$key" value="$value"/>
|
||||||
|
...
|
||||||
|
</properties>
|
||||||
|
<queue>
|
||||||
|
<name>$grand-child-queue1</name>
|
||||||
|
...
|
||||||
|
</queue>
|
||||||
|
</queue>
|
||||||
|
<queue>
|
||||||
|
<name>$child-queue2</name>
|
||||||
|
...
|
||||||
|
</queue>
|
||||||
|
...
|
||||||
|
...
|
||||||
|
...
|
||||||
|
<queue>
|
||||||
|
<name>$leaf-queue</name>
|
||||||
|
<acl-submit-job>$acls</acl-submit-job>
|
||||||
|
<acl-administer-jobs>$acls</acl-administer-jobs>
|
||||||
|
<properties>
|
||||||
|
<property key="$key" value="$value"/>
|
||||||
|
...
|
||||||
|
</properties>
|
||||||
|
</queue>
|
||||||
|
</queue>
|
||||||
|
</queues>
|
||||||
|
</source>
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Tag/Attribute</th>
|
||||||
|
<th>Value</th>
|
||||||
|
<th>
|
||||||
|
<a href="commands_manual.html#RefreshQueues">Refresh-able?</a>
|
||||||
|
</th>
|
||||||
|
<th>Notes</th>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="queues_tag"/>queues</td>
|
||||||
|
<td>Root element of the configuration file.</td>
|
||||||
|
<td>Not-applicable</td>
|
||||||
|
<td>All the queues are nested inside this root element of the
|
||||||
|
file. There can be only one root queues element in the file.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>aclsEnabled</td>
|
||||||
|
<td>Boolean attribute to the
|
||||||
|
<a href="#queues_tag"><em>&lt;queues&gt;</em></a> tag
|
||||||
|
specifying whether ACLs are supported for controlling job
|
||||||
|
submission and administration for <em>all</em> the queues
|
||||||
|
configured.
|
||||||
|
</td>
|
||||||
|
<td>Yes</td>
|
||||||
|
<td>If <em>false</em>, ACLs are ignored for <em>all</em> the
|
||||||
|
configured queues. <br/><br/>
|
||||||
|
If <em>true</em>, the user and group details of the user
|
||||||
|
are checked against the configured ACLs of the corresponding
|
||||||
|
job-queue while submitting and administering jobs. ACLs can be
|
||||||
|
specified for each queue using the queue-specific tags
|
||||||
|
"acl-$acl_name", defined below. ACLs are checked only against
|
||||||
|
the job-queues, i.e. the leaf-level queues; ACLs configured
|
||||||
|
for the rest of the queues in the hierarchy are ignored.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="queue_tag"/>queue</td>
|
||||||
|
<td>A child element of the
|
||||||
|
<a href="#queues_tag"><em>&lt;queues&gt;</em></a> tag or another
|
||||||
|
<a href="#queue_tag"><em>&lt;queue&gt;</em></a>. Denotes a queue
|
||||||
|
in the system.
|
||||||
|
</td>
|
||||||
|
<td>Not applicable</td>
|
||||||
|
<td>Queues can be hierarchical and so this element can contain
|
||||||
|
children of this same type.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>name</td>
|
||||||
|
<td>Child element of a
|
||||||
|
<a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
|
||||||
|
name of the queue.</td>
|
||||||
|
<td>No</td>
|
||||||
|
<td>Name of the queue cannot contain the character <em>":"</em>
|
||||||
|
which is reserved as the queue-name delimiter when addressing a
|
||||||
|
queue in a hierarchy.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>state</td>
|
||||||
|
<td>Child element of a
|
||||||
|
<a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
|
||||||
|
state of the queue.
|
||||||
|
</td>
|
||||||
|
<td>Yes</td>
|
||||||
|
<td>Each queue has a corresponding state. A queue in
|
||||||
|
<em>'running'</em> state can accept new jobs, while a queue in
|
||||||
|
<em>'stopped'</em> state will stop accepting any new jobs. State
|
||||||
|
is defined and respected by the framework only for the
|
||||||
|
leaf-level queues and is ignored for all other queues.
|
||||||
|
<br/><br/>
|
||||||
|
The state of the queue can be viewed from the command line using
|
||||||
|
<code>'bin/mapred queue'</code> command and also on the Web
|
||||||
|
UI.<br/><br/>
|
||||||
|
Administrators can stop and start queues at runtime using the
|
||||||
|
feature of <a href="commands_manual.html#RefreshQueues">reloading
|
||||||
|
queue configuration</a>. If a queue is stopped at runtime, it
|
||||||
|
will complete all the existing running jobs and will stop
|
||||||
|
accepting any new jobs.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>acl-submit-job</td>
|
||||||
|
<td>Child element of a
|
||||||
|
<a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
|
||||||
|
list of users and groups that can submit jobs to the specified
|
||||||
|
queue.</td>
|
||||||
|
<td>Yes</td>
|
||||||
|
<td>
|
||||||
|
Applicable only to leaf-queues.<br/><br/>
|
||||||
|
The lists of users and groups are each a comma-separated
|
||||||
|
list of names. The two lists are separated by a blank.
|
||||||
|
Example: <em>user1,user2 group1,group2</em>.
|
||||||
|
If you wish to define only a list of groups, provide
|
||||||
|
a blank at the beginning of the value.
|
||||||
|
<br/><br/>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>acl-administer-jobs</td>
|
||||||
|
<td>Child element of a
|
||||||
|
<a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
|
||||||
|
list of users and groups that can change the priority of a job
|
||||||
|
or kill a job that has been submitted to the specified queue.
|
||||||
|
</td>
|
||||||
|
<td>Yes</td>
|
||||||
|
<td>
|
||||||
|
Applicable only to leaf-queues.<br/><br/>
|
||||||
|
The lists of users and groups are each a comma-separated
|
||||||
|
list of names. The two lists are separated by a blank.
|
||||||
|
Example: <em>user1,user2 group1,group2</em>.
|
||||||
|
If you wish to define only a list of groups, provide
|
||||||
|
a blank at the beginning of the value. Note that an
|
||||||
|
owner of a job can always change the priority or kill
|
||||||
|
his/her own job, irrespective of the ACLs.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="properties_tag"/>properties</td>
|
||||||
|
<td>Child element of a
|
||||||
|
<a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
|
||||||
|
scheduler specific properties.</td>
|
||||||
|
<td>Not applicable</td>
|
||||||
|
<td>The scheduler specific properties are the children of this
|
||||||
|
element specified as a group of &lt;property&gt; tags described
|
||||||
|
below. The JobTracker completely ignores these properties. These
|
||||||
|
can be used as per-queue properties needed by the scheduler
|
||||||
|
being configured. Please look at the scheduler specific
|
||||||
|
documentation as to how these properties are used by that
|
||||||
|
particular scheduler.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="property_tag"/>property</td>
|
||||||
|
<td>Child element of
|
||||||
|
<a href="#properties_tag"><em>&lt;properties&gt;</em></a> for a
|
||||||
|
specific queue.</td>
|
||||||
|
<td>Not applicable</td>
|
||||||
|
<td>A single scheduler specific queue-property. Ignored by
|
||||||
|
the JobTracker and used by the scheduler that is configured.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>key</td>
|
||||||
|
<td>Attribute of a
|
||||||
|
<a href="#property_tag"><em>&lt;property&gt;</em></a> for a
|
||||||
|
specific queue.</td>
|
||||||
|
<td>Scheduler-specific</td>
|
||||||
|
<td>The name of a single scheduler specific queue-property.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td>value</td>
|
||||||
|
<td>Attribute of a
|
||||||
|
<a href="#property_tag"><em>&lt;property&gt;</em></a> for a
|
||||||
|
specific queue.</td>
|
||||||
|
<td>Scheduler-specific</td>
|
||||||
|
<td>The value of a single scheduler specific queue-property.
|
||||||
|
The value can be anything; its interpretation is left to
|
||||||
|
the scheduler that is configured.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
</table>
|
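Reviewer note: a minimal sketch of a concrete conf/mapred-queues.xml that follows the format and tag table above, assuming a three-level hierarchy. The queue names, user and group names, and the "capacity" scheduler property are hypothetical; the ACL tag names follow the format block in this change (acl-submit-job, acl-administer-jobs).

```xml
<?xml version="1.0"?>
<!-- conf/mapred-queues.xml: illustrative hierarchy only -->
<queues aclsEnabled="true">
  <queue>
    <!-- container queue: jobs cannot be submitted here -->
    <name>Queue-A</name>
    <queue>
      <!-- container queue nested inside Queue-A -->
      <name>Queue-B</name>
      <queue>
        <!-- leaf (job) queue: state and ACLs are honoured only here -->
        <name>Queue-C</name>
        <state>running</state>
        <!-- users, then a blank, then groups -->
        <acl-submit-job>user1,user2 group1,group2</acl-submit-job>
        <!-- leading blank: groups only, no individual users -->
        <acl-administer-jobs> admins</acl-administer-jobs>
        <properties>
          <!-- ignored by the JobTracker; key and value are assumptions
               whose meaning depends on the configured scheduler -->
          <property key="capacity" value="50"/>
        </properties>
      </queue>
    </queue>
  </queue>
</queues>
```

Setting the leaf queue's state to stopped and reloading the queue configuration would, per the table above, let running jobs finish while rejecting new submissions.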
||||||
|
|
||||||
|
<p>Once the queues are configured properly and the Map/Reduce
|
||||||
|
system is up and running, from the command line one can
|
||||||
|
<a href="commands_manual.html#QueuesList">get the list
|
||||||
|
of queues</a> and
|
||||||
|
<a href="commands_manual.html#QueuesInfo">obtain
|
||||||
|
information specific to each queue</a>. This information is also
|
||||||
|
available from the web UI. On the web UI, queue information can be
|
||||||
|
seen by going to queueinfo.jsp, linked to from the queues table-cell
|
||||||
|
in the cluster-summary table. The queueinfo.jsp prints the hierarchy
|
||||||
|
of queues as well as the specific information for each queue.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p> Users can submit jobs only to a
|
||||||
|
leaf-level queue by specifying the fully-qualified queue-name for
|
||||||
|
the property name <em>mapreduce.job.queuename</em> in the job
|
||||||
|
configuration. The character ':' is the queue-name delimiter and so,
|
||||||
|
for example, if one wants to submit to a configured job-queue 'Queue-C'
|
||||||
|
which is one of the sub-queues of 'Queue-B' which in-turn is a
|
||||||
|
sub-queue of 'Queue-A', then the job configuration should contain
|
||||||
|
property <em>mapreduce.job.queuename</em> set to the <em>
|
||||||
|
&lt;value&gt;Queue-A:Queue-B:Queue-C&lt;/value&gt;</em>.</p>
|
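Reviewer note: a minimal sketch of the job-side setting described in the paragraph above, assuming the job configuration is supplied as an XML resource. The queue names are the ones used in the paragraph; how the resource is loaded by the job is not shown and is left as an assumption.

```xml
<?xml version="1.0"?>
<!-- job configuration fragment: submits to the leaf queue Queue-C -->
<configuration>
  <property>
    <name>mapreduce.job.queuename</name>
    <!-- fully-qualified name; ':' separates the levels of the hierarchy -->
    <value>Queue-A:Queue-B:Queue-C</value>
  </property>
</configuration>
```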
||||||
|
</section>
|
||||||
<section>
|
<section>
|
||||||
<title>Real-World Cluster Configurations</title>
|
<title>Real-World Cluster Configurations</title>
|
||||||
|
|
||||||
|
@ -383,7 +633,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.reduce.parallel.copies</td>
|
<td>mapreduce.reduce.shuffle.parallelcopies</td>
|
||||||
<td>20</td>
|
<td>20</td>
|
||||||
<td>
|
<td>
|
||||||
Higher number of parallel copies run by reduces to fetch
|
Higher number of parallel copies run by reduces to fetch
|
||||||
|
@ -392,7 +642,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.map.child.java.opts</td>
|
<td>mapreduce.map.java.opts</td>
|
||||||
<td>-Xmx512M</td>
|
<td>-Xmx512M</td>
|
||||||
<td>
|
<td>
|
||||||
Larger heap-size for child jvms of maps.
|
Larger heap-size for child jvms of maps.
|
||||||
|
@ -400,7 +650,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.reduce.child.java.opts</td>
|
<td>mapreduce.reduce.java.opts</td>
|
||||||
<td>-Xmx512M</td>
|
<td>-Xmx512M</td>
|
||||||
<td>
|
<td>
|
||||||
Larger heap-size for child jvms of reduces.
|
Larger heap-size for child jvms of reduces.
|
||||||
|
@ -417,13 +667,13 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/core-site.xml</td>
|
<td>conf/core-site.xml</td>
|
||||||
<td>io.sort.factor</td>
|
<td>mapreduce.task.io.sort.factor</td>
|
||||||
<td>100</td>
|
<td>100</td>
|
||||||
<td>More streams merged at once while sorting files.</td>
|
<td>More streams merged at once while sorting files.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/core-site.xml</td>
|
<td>conf/core-site.xml</td>
|
||||||
<td>io.sort.mb</td>
|
<td>mapreduce.task.io.sort.mb</td>
|
||||||
<td>200</td>
|
<td>200</td>
|
||||||
<td>Higher memory-limit while sorting data.</td>
|
<td>Higher memory-limit while sorting data.</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
@ -448,7 +698,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.job.tracker.handler.count</td>
|
<td>mapreduce.jobtracker.handler.count</td>
|
||||||
<td>60</td>
|
<td>60</td>
|
||||||
<td>
|
<td>
|
||||||
More JobTracker server threads to handle RPCs from large
|
More JobTracker server threads to handle RPCs from large
|
||||||
|
@ -457,13 +707,13 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.reduce.parallel.copies</td>
|
<td>mapreduce.reduce.shuffle.parallelcopies</td>
|
||||||
<td>50</td>
|
<td>50</td>
|
||||||
<td></td>
|
<td></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>tasktracker.http.threads</td>
|
<td>mapreduce.tasktracker.http.threads</td>
|
||||||
<td>50</td>
|
<td>50</td>
|
||||||
<td>
|
<td>
|
||||||
More worker threads for the TaskTracker's http server. The
|
More worker threads for the TaskTracker's http server. The
|
||||||
|
@ -473,7 +723,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.map.child.java.opts</td>
|
<td>mapreduce.map.java.opts</td>
|
||||||
<td>-Xmx512M</td>
|
<td>-Xmx512M</td>
|
||||||
<td>
|
<td>
|
||||||
Larger heap-size for child jvms of maps.
|
Larger heap-size for child jvms of maps.
|
||||||
|
@ -481,7 +731,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>conf/mapred-site.xml</td>
|
<td>conf/mapred-site.xml</td>
|
||||||
<td>mapred.reduce.child.java.opts</td>
|
<td>mapreduce.reduce.java.opts</td>
|
||||||
<td>-Xmx1024M</td>
|
<td>-Xmx1024M</td>
|
||||||
<td>Larger heap-size for child jvms of reduces.</td>
|
<td>Larger heap-size for child jvms of reduces.</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
@ -500,11 +750,11 @@
|
||||||
or equal to the -Xmx passed to JavaVM, else the VM might not start.
|
or equal to the -Xmx passed to JavaVM, else the VM might not start.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>Note: <code>mapred.child.java.opts</code> are used only for
|
<p>Note: <code>mapreduce.{map|reduce}.java.opts</code> are used only for
|
||||||
configuring the launched child tasks from task tracker. Configuring
|
configuring the launched child tasks from task tracker. Configuring
|
||||||
the memory options for daemons is documented under
|
the memory options for daemons is documented in
|
||||||
<a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
|
<a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
|
||||||
Configuring the Environment of the Hadoop Daemons</a>.</p>
|
cluster_setup.html</a>.</p>
|
||||||
|
|
||||||
<p>The memory available to some parts of the framework is also
|
<p>The memory available to some parts of the framework is also
|
||||||
configurable. In map and reduce tasks, performance may be influenced
|
configurable. In map and reduce tasks, performance may be influenced
|
||||||
|
@ -558,7 +808,7 @@
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<tr><th>Name</th><th>Type</th><th>Description</th></tr>
|
<tr><th>Name</th><th>Type</th><th>Description</th></tr>
|
||||||
<tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
|
<tr><td>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</td>
|
||||||
<td>long</td>
|
<td>long</td>
|
||||||
<td>The time interval, in milliseconds, between which the TT
|
<td>The time interval, in milliseconds, between which the TT
|
||||||
checks for any memory violation. The default value is 5000 msec
|
checks for any memory violation. The default value is 5000 msec
|
||||||
|
@ -668,10 +918,11 @@
|
||||||
the tasks. For maximum security, this task controller
|
the tasks. For maximum security, this task controller
|
||||||
sets up restricted permissions and user/group ownership of
|
sets up restricted permissions and user/group ownership of
|
||||||
local files and directories used by the tasks such as the
|
local files and directories used by the tasks such as the
|
||||||
job jar files, intermediate files and task log files. Currently
|
job jar files, intermediate files, task log files and distributed
|
||||||
permissions on distributed cache files are opened up to be
|
cache files. Note that, as a result, no user other than the
|
||||||
accessible by all users. In future, it is expected that stricter
|
job owner and the TaskTracker can access any of the
|
||||||
file permissions are set for these files too.
|
local files/directories, including those localized as part of the
|
||||||
|
distributed cache.
|
||||||
</td>
|
</td>
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
@ -684,7 +935,7 @@
|
||||||
<th>Property</th><th>Value</th><th>Notes</th>
|
<th>Property</th><th>Value</th><th>Notes</th>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.task.tracker.task-controller</td>
|
<td>mapreduce.tasktracker.taskcontroller</td>
|
||||||
<td>Fully qualified class name of the task controller class</td>
|
<td>Fully qualified class name of the task controller class</td>
|
||||||
<td>Currently there are two implementations of task controller
|
<td>Currently there are two implementations of task controller
|
||||||
in the Hadoop system, DefaultTaskController and LinuxTaskController.
|
in the Hadoop system, DefaultTaskController and LinuxTaskController.
|
||||||
|
@ -715,21 +966,35 @@
|
||||||
<p>
|
<p>
|
||||||
The executable must have specific permissions as follows. The
|
The executable must have specific permissions as follows. The
|
||||||
executable should have <em>6050 or --Sr-s---</em> permissions
|
executable should have <em>6050 or --Sr-s---</em> permissions
|
||||||
user-owned by root(super-user) and group-owned by a group
|
user-owned by root(super-user) and group-owned by a special group
|
||||||
of which only the TaskTracker's user is the sole group member.
|
of which the TaskTracker's user is a member and no job
|
||||||
|
submitter is. If any job submitter belongs to this special group,
|
||||||
|
security will be compromised. This special group name should be
|
||||||
|
specified for the configuration property
|
||||||
|
<em>"mapreduce.tasktracker.group"</em> in both mapred-site.xml and
|
||||||
|
<a href="#task-controller.cfg">task-controller.cfg</a>.
|
||||||
For example, let's say that the TaskTracker is run as user
|
For example, let's say that the TaskTracker is run as user
|
||||||
<em>mapred</em> who is part of the groups <em>users</em> and
|
<em>mapred</em> who is part of the groups <em>users</em> and
|
||||||
<em>mapredGroup</em> any of them being the primary group.
|
<em>specialGroup</em> any of them being the primary group.
|
||||||
Suppose also that <em>users</em> has both <em>mapred</em> and
|
Suppose also that <em>users</em> has both <em>mapred</em> and
|
||||||
another user <em>X</em> as its members, while <em>mapredGroup</em>
|
another user (job submitter) <em>X</em> as its members, and X does
|
||||||
has only <em>mapred</em> as its member. Going by the above
|
not belong to <em>specialGroup</em>. Going by the above
|
||||||
description, the setuid/setgid executable should be set
|
description, the setuid/setgid executable should be set
|
||||||
<em>6050 or --Sr-s---</em> with user-owner as <em>mapred</em> and
|
<em>6050 or --Sr-s---</em> with user-owner as <em>mapred</em> and
|
||||||
group-owner as <em>mapredGroup</em> which has
|
group-owner as <em>specialGroup</em> which has
|
||||||
only <em>mapred</em> as its member(and not <em>users</em> which has
|
<em>mapred</em> as its member (and not <em>users</em> which has
|
||||||
<em>X</em> also as its member besides <em>mapred</em>).
|
<em>X</em> also as its member besides <em>mapred</em>).
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The LinuxTaskController requires paths including and leading up
|
||||||
|
to the directories specified in
|
||||||
|
<em>mapreduce.cluster.local.dir</em> and <em>hadoop.log.dir</em> to
|
||||||
|
have 755 permissions.
|
||||||
|
</p>
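The permission rules above can be checked mechanically. The following is a minimal sketch, assuming Python is available on the node; the helper names (permission_string, has_required_mode, paths_to_check) and the example path are hypothetical and not part of Hadoop. It only illustrates how the 6050 mode maps to --Sr-s--- and which paths "including and leading up to" a directory would need inspecting.

```python
import stat
from pathlib import PurePosixPath

# Mode described in the text: setuid + setgid, group read/execute only.
REQUIRED_MODE = 0o6050


def permission_string(mode):
    """Render permission bits the way `ls -l` does, minus the file-type character."""
    return stat.filemode(stat.S_IFREG | stat.S_IMODE(mode))[1:]


def has_required_mode(mode):
    """True when the permission bits (ignoring the file type) are exactly 6050."""
    return stat.S_IMODE(mode) == REQUIRED_MODE


def paths_to_check(directory):
    """Return every ancestor of the directory (root first), then the directory itself."""
    path = PurePosixPath(directory)
    ancestors = [str(p) for p in list(path.parents)[::-1]]
    return ancestors + [str(path)]


# Hypothetical local directory, used only for illustration.
for candidate in paths_to_check("/var/hadoop/local"):
    print(candidate)
print(permission_string(REQUIRED_MODE))
```

On a real node, the mode passed to has_required_mode would come from os.stat(path).st_mode, and ownership would need a separate check.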
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>task-controller.cfg</title>
|
||||||
<p>The executable requires a configuration file called
|
<p>The executable requires a configuration file called
|
||||||
<em>taskcontroller.cfg</em> to be
|
<em>taskcontroller.cfg</em> to be
|
||||||
present in the configuration directory passed to the ant target
|
present in the configuration directory passed to the ant target
|
||||||
|
@ -747,8 +1012,8 @@
|
||||||
</p>
|
</p>
|
||||||
<table><tr><th>Name</th><th>Description</th></tr>
|
<table><tr><th>Name</th><th>Description</th></tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>mapred.local.dir</td>
|
<td>mapreduce.cluster.local.dir</td>
|
||||||
<td>Path to mapred local directories. Should be same as the value
|
<td>Path to the MapReduce local directories. Should be the same as the value
|
||||||
which was provided to key in mapred-site.xml. This is required to
|
which was provided to key in mapred-site.xml. This is required to
|
||||||
validate paths passed to the setuid executable in order to prevent
|
validate paths passed to the setuid executable in order to prevent
|
||||||
arbitrary paths being passed to it.</td>
|
arbitrary paths being passed to it.</td>
|
||||||
|
@ -760,14 +1025,16 @@
|
||||||
permissions on the log files so that they can be written to by the user's
|
permissions on the log files so that they can be written to by the user's
|
||||||
tasks and read by the TaskTracker for serving on the web UI.</td>
|
tasks and read by the TaskTracker for serving on the web UI.</td>
|
||||||
</tr>
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>mapreduce.tasktracker.group</td>
|
||||||
|
<td>Group to which the TaskTracker belongs. The group owner of the
|
||||||
|
taskcontroller binary should be this group. Should be same as
|
||||||
|
the value with which the TaskTracker is configured. This
|
||||||
|
configuration is required for validating the secure access of the
|
||||||
|
task-controller binary.</td>
|
||||||
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
</section>
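As an illustration of the keys listed in the table above, the sketch below renders them as text. It assumes the configuration file uses one key=value pair per line, which this section does not state explicitly; the function name and all directory and group values are hypothetical.

```python
def render_task_controller_cfg(local_dirs, log_dir, tasktracker_group):
    """Render the three documented keys as key=value lines.

    The key=value layout is an assumption; only the key names come from the text.
    """
    entries = [
        ("mapreduce.cluster.local.dir", ",".join(local_dirs)),
        ("hadoop.log.dir", log_dir),
        ("mapreduce.tasktracker.group", tasktracker_group),
    ]
    return "\n".join(f"{key}={value}" for key, value in entries) + "\n"


# Hypothetical values; they must match what the TaskTracker itself is configured with.
cfg_text = render_task_controller_cfg(
    ["/grid/0/mapred/local", "/grid/1/mapred/local"],
    "/var/log/hadoop",
    "specialGroup",
)
print(cfg_text, end="")
```

Generating both this file and mapred-site.xml from one source of values is one way to keep them in agreement, as the table requires.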
|
||||||
<p>
|
|
||||||
The LinuxTaskController requires that paths including and leading up to
|
|
||||||
the directories specified in
|
|
||||||
<em>mapred.local.dir</em> and <em>hadoop.log.dir</em> to be set 755
|
|
||||||
permissions.
|
|
||||||
</p>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
@ -800,7 +1067,7 @@
|
||||||
monitoring script in <em>mapred-site.xml</em>.</p>
|
monitoring script in <em>mapred-site.xml</em>.</p>
|
||||||
<table>
|
<table>
|
||||||
<tr><th>Name</th><th>Description</th></tr>
|
<tr><th>Name</th><th>Description</th></tr>
|
||||||
<tr><td><code>mapred.healthChecker.script.path</code></td>
|
<tr><td><code>mapreduce.tasktracker.healthchecker.script.path</code></td>
|
||||||
<td>Absolute path to the script which is periodically run by the
|
<td>Absolute path to the script which is periodically run by the
|
||||||
TaskTracker to determine if the node is
|
TaskTracker to determine if the node is
|
||||||
healthy or not. The file should be executable by the TaskTracker.
|
healthy or not. The file should be executable by the TaskTracker.
|
||||||
|
@ -809,18 +1076,18 @@
|
||||||
is not started.</td>
|
is not started.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>mapred.healthChecker.interval</code></td>
|
<td><code>mapreduce.tasktracker.healthchecker.interval</code></td>
|
||||||
<td>Frequency at which the node health script is run,
|
<td>Frequency at which the node health script is run,
|
||||||
in milliseconds</td>
|
in milliseconds</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>mapred.healthChecker.script.timeout</code></td>
|
<td><code>mapreduce.tasktracker.healthchecker.script.timeout</code></td>
|
||||||
<td>Time after which the node health script will be killed by
|
<td>Time after which the node health script will be killed by
|
||||||
the TaskTracker if unresponsive.
|
the TaskTracker if unresponsive.
|
||||||
The node is marked unhealthy if the node health script times out.</td>
|
The node is marked unhealthy if the node health script times out.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>mapred.healthChecker.script.args</code></td>
|
<td><code>mapreduce.tasktracker.healthchecker.script.args</code></td>
|
||||||
<td>Extra arguments that can be passed to the node health script
|
<td>Extra arguments that can be passed to the node health script
|
||||||
when launched.
|
when launched.
|
||||||
These should be a comma-separated list of arguments. </td>
|
These should be a comma-separated list of arguments. </td>
|
||||||
|
@ -857,17 +1124,17 @@
|
||||||
<title>History Logging</title>
|
<title>History Logging</title>
|
||||||
|
|
||||||
<p> The job history files are stored in a central location
|
<p> The job history files are stored in a central location
|
||||||
<code> hadoop.job.history.location </code> which can be on DFS also,
|
<code> mapreduce.jobtracker.jobhistory.location </code> which can be on DFS also,
|
||||||
whose default value is <code>${HADOOP_LOG_DIR}/history</code>.
|
whose default value is <code>${HADOOP_LOG_DIR}/history</code>.
|
||||||
The history web UI is accessible from job tracker web UI.</p>
|
The history web UI is accessible from job tracker web UI.</p>
|
||||||
|
|
||||||
<p> The history files are also logged to a user-specified directory
|
<p> The history files are also logged to a user-specified directory
|
||||||
<code>hadoop.job.history.user.location</code>
|
<code>mapreduce.job.userhistorylocation</code>
|
||||||
which defaults to job output directory. The files are stored in
|
which defaults to job output directory. The files are stored in
|
||||||
"_logs/history/" in the specified directory. Hence, by default
|
"_logs/history/" in the specified directory. Hence, by default
|
||||||
they will be in "mapred.output.dir/_logs/history/". User can stop
|
they will be in "mapreduce.output.fileoutputformat.outputdir/_logs/history/". User can stop
|
||||||
logging by giving the value <code>none</code> for
|
logging by giving the value <code>none</code> for
|
||||||
<code>hadoop.job.history.user.location</code> </p>
|
<code>mapreduce.job.userhistorylocation</code> </p>
|
||||||
|
|
||||||
<p> Users can view the history log summary in the specified directory
|
<p> Users can view the history log summary in the specified directory
|
||||||
using the following command <br/>
|
using the following command <br/>
|
||||||
|
@ -880,7 +1147,6 @@
|
||||||
<code>$ bin/hadoop job -history all output-dir</code><br/></p>
|
<code>$ bin/hadoop job -history all output-dir</code><br/></p>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
|
||||||
|
|
||||||
<p>Once all the necessary configuration is complete, distribute the files
|
<p>Once all the necessary configuration is complete, distribute the files
|
||||||
to the <code>HADOOP_CONF_DIR</code> directory on all the machines,
|
to the <code>HADOOP_CONF_DIR</code> directory on all the machines,
|
||||||
|
@ -891,9 +1157,9 @@
|
||||||
<section>
|
<section>
|
||||||
<title>Map/Reduce</title>
|
<title>Map/Reduce</title>
|
||||||
<p>The job tracker restart can recover running jobs if
|
<p>The job tracker restart can recover running jobs if
|
||||||
<code>mapred.jobtracker.restart.recover</code> is set true and
|
<code>mapreduce.jobtracker.restart.recover</code> is set to true and
|
||||||
<a href="#Logging">JobHistory logging</a> is enabled. Also
|
<a href="#Logging">JobHistory logging</a> is enabled. Also
|
||||||
<code>mapred.jobtracker.job.history.block.size</code> value should be
|
<code>mapreduce.jobtracker.jobhistory.block.size</code> value should be
|
||||||
set to an optimal value to dump job history to disk as soon as
|
set to an optimal value to dump job history to disk as soon as
|
||||||
possible; a typical value is 3145728 (3 MB).</p>
|
possible; a typical value is 3145728 (3 MB).</p>
|
||||||
</section>
|
</section>
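The two recovery settings named above could be written to a configuration file as follows. This is a minimal sketch that assumes the usual configuration/property/name/value layout of Hadoop site files; build_configuration is a hypothetical helper, and the block size simply reproduces the 3 MB figure quoted in the text.

```python
import xml.etree.ElementTree as ET


def build_configuration(properties):
    """Serialize a dict of property names and values as a <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


recovery_settings = {
    "mapreduce.jobtracker.restart.recover": "true",
    # 3 * 1024 * 1024 bytes is the 3145728 (3 MB) value mentioned above.
    "mapreduce.jobtracker.jobhistory.block.size": 3 * 1024 * 1024,
}
xml_text = build_configuration(recovery_settings)
print(xml_text)
```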
|
||||||
|
@ -951,7 +1217,7 @@
|
||||||
and starts the <code>TaskTracker</code> daemon on all the listed slaves.
|
and starts the <code>TaskTracker</code> daemon on all the listed slaves.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<title>Hadoop Shutdown</title>
|
<title>Hadoop Shutdown</title>
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,772 @@
|
||||||
|
<?xml version="1.0"?>
|
||||||
|
<!--
|
||||||
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||||
|
contributor license agreements. See the NOTICE file distributed with
|
||||||
|
this work for additional information regarding copyright ownership.
|
||||||
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||||
|
(the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software
|
||||||
|
distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
See the License for the specific language governing permissions and
|
||||||
|
limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
|
||||||
|
<document>
|
||||||
|
<header>
|
||||||
|
<title>Hadoop Commands Guide</title>
|
||||||
|
</header>
|
||||||
|
|
||||||
|
<body>
|
||||||
|
<section>
|
||||||
|
<title>Overview</title>
|
||||||
|
<p>
|
||||||
|
All Hadoop commands are invoked by the bin/hadoop script. Running the Hadoop
|
||||||
|
script without any arguments prints the description for all commands.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]</code>
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Hadoop has an option-parsing framework that handles generic options as well as running classes.
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>--config confdir</code></td>
|
||||||
|
<td>Overrides the default configuration directory. Default is ${HADOOP_HOME}/conf.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>GENERIC_OPTIONS</code></td>
|
||||||
|
<td>The common set of options supported by multiple commands.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>COMMAND</code><br/><code>COMMAND_OPTIONS</code></td>
|
||||||
|
<td>Various commands with their options are described in the following sections. The commands
|
||||||
|
have been grouped into <a href="commands_manual.html#User+Commands">User Commands</a>
|
||||||
|
and <a href="commands_manual.html#Administration+Commands">Administration Commands</a>.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
<section>
|
||||||
|
<title>Generic Options</title>
|
||||||
|
<p>
|
||||||
|
The following options are supported by <a href="commands_manual.html#dfsadmin">dfsadmin</a>,
|
||||||
|
<a href="commands_manual.html#fs">fs</a>, <a href="commands_manual.html#fsck">fsck</a> and
|
||||||
|
<a href="commands_manual.html#job">job</a>.
|
||||||
|
Applications should implement
|
||||||
|
<a href="ext:api/org/apache/hadoop/util/tool">Tool</a> to support
|
||||||
|
<a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
|
||||||
|
GenericOptions</a>.
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> GENERIC_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-conf <configuration file></code></td>
|
||||||
|
<td>Specify an application configuration file.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-D <property=value></code></td>
|
||||||
|
<td>Use value for given property.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-fs <local|namenode:port></code></td>
|
||||||
|
<td>Specify a namenode.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-jt <local|jobtracker:port></code></td>
|
||||||
|
<td>Specify a job tracker. Applies only to <a href="commands_manual.html#job">job</a>.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-files <comma separated list of files></code></td>
|
||||||
|
<td>Specify comma separated files to be copied to the map reduce cluster.
|
||||||
|
Applies only to <a href="commands_manual.html#job">job</a>.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-libjars <comma separated list of jars></code></td>
|
||||||
|
<td>Specify comma separated jar files to include in the classpath.
|
||||||
|
Applies only to <a href="commands_manual.html#job">job</a>.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-archives <comma separated list of archives></code></td>
|
||||||
|
<td>Specify comma separated archives to be unarchived on the compute machines.
|
||||||
|
Applies only to <a href="commands_manual.html#job">job</a>.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
</section>
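The argument order in the usage line above (optional --config, then COMMAND, then GENERIC_OPTIONS, then COMMAND_OPTIONS) can be sketched as a small helper that assembles an argument list. The function name, the configuration directory, and the property in the example are hypothetical; only the ordering and the option names come from this guide.

```python
def build_hadoop_argv(command, generic_options=(), command_options=(), confdir=None):
    """Assemble arguments in the documented order for the bin/hadoop script."""
    argv = ["bin/hadoop"]
    if confdir is not None:
        # --config precedes the command name.
        argv += ["--config", confdir]
    argv.append(command)
    argv += list(generic_options)   # e.g. -conf, -D, -fs, -jt
    argv += list(command_options)   # options specific to the command
    return argv


# Hypothetical invocation: list all jobs with one property overridden.
argv = build_hadoop_argv(
    "job",
    generic_options=["-D", "example.key=example.value"],
    command_options=["-list", "all"],
    confdir="/etc/hadoop/conf",
)
print(" ".join(argv))
```

Such a list could be handed to subprocess.run on a machine where Hadoop is installed; that step is left out here because it depends on the local installation.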
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> User Commands </title>
|
||||||
|
<p>Commands useful for users of a Hadoop cluster.</p>
|
||||||
|
<section>
|
||||||
|
<title> archive </title>
|
||||||
|
<p>
|
||||||
|
Creates a Hadoop archive. For more information, see the <a href="ext:hadoop-archives">Hadoop Archives Guide</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop archive -archiveName NAME <src>* <dest></code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-archiveName NAME</code></td>
|
||||||
|
<td>Name of the archive to be created.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>src</code></td>
|
||||||
|
<td>Filesystem pathnames which work as usual with regular expressions.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>dest</code></td>
|
||||||
|
<td>Destination directory which would contain the archive.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> distcp </title>
|
||||||
|
<p>
|
||||||
|
Copies files or directories recursively. More information can be found in the <a href="ext:distcp">DistCp Guide</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop distcp <srcurl> <desturl></code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>srcurl</code></td>
|
||||||
|
<td>Source URL</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>desturl</code></td>
|
||||||
|
<td>Destination URL</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> fs </title>
|
||||||
|
<p>
|
||||||
|
Runs a generic filesystem user client.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop fs [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
|
||||||
|
[COMMAND_OPTIONS]</code>
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The various COMMAND_OPTIONS can be found at
|
||||||
|
<a href="file_system_shell.html">File System Shell Guide</a>.
|
||||||
|
</p>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> fsck </title>
|
||||||
|
<p>
|
||||||
|
Runs an HDFS filesystem checking utility. See <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Fsck">Fsck</a> for more info.
|
||||||
|
</p>
|
||||||
|
<p><code>Usage: hadoop fsck [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
|
||||||
|
<path> [-move | -delete | -openforwrite] [-files [-blocks
|
||||||
|
[-locations | -racks]]]</code></p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
<tr>
|
||||||
|
<td><code><path></code></td>
|
||||||
|
<td>Start checking from this path.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-move</code></td>
|
||||||
|
<td>Move corrupted files to /lost+found</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-delete</code></td>
|
||||||
|
<td>Delete corrupted files.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-openforwrite</code></td>
|
||||||
|
<td>Print out files opened for write.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-files</code></td>
|
||||||
|
<td>Print out files being checked.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-blocks</code></td>
|
||||||
|
<td>Print out block report.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-locations</code></td>
|
||||||
|
<td>Print out locations for every block.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-racks</code></td>
|
||||||
|
<td>Print out network topology for data-node locations.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
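The bracket nesting in the fsck usage line suggests that -blocks is meant to be used with -files, and -locations or -racks with -blocks. The sketch below checks an option list against that literal reading of the usage string. It is an assumption that the usage line should be read this strictly; the real command may be more lenient, and validate_fsck_options is a hypothetical helper.

```python
def validate_fsck_options(options):
    """Return a list of problems found under a literal reading of the usage line."""
    opts = set(options)
    problems = []

    # [-move | -delete | -openforwrite] is written as a choice of one.
    if len(opts & {"-move", "-delete", "-openforwrite"}) > 1:
        problems.append("choose at most one of -move, -delete, -openforwrite")

    # [-files [-blocks [-locations | -racks]]] is nested.
    if "-blocks" in opts and "-files" not in opts:
        problems.append("-blocks requires -files")
    detail = opts & {"-locations", "-racks"}
    if detail and "-blocks" not in opts:
        problems.append("-locations and -racks require -blocks")
    if len(detail) > 1:
        problems.append("choose at most one of -locations, -racks")
    return problems


print(validate_fsck_options(["-files", "-blocks", "-locations"]))
print(validate_fsck_options(["-blocks"]))
```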
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> jar </title>
|
||||||
|
<p>
|
||||||
|
Runs a jar file. Users can bundle their Map Reduce code in a jar file and execute it using this command.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop jar <jar> [mainClass] args...</code>
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The streaming jobs are run via this command. For examples, see
|
||||||
|
<a href="ext:streaming">Hadoop Streaming</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The WordCount example is also run using the jar command. For examples, see the
|
||||||
|
<a href="ext:mapred-tutorial">MapReduce Tutorial</a>.
|
||||||
|
</p>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> job </title>
|
||||||
|
<p>
|
||||||
|
Command to interact with Map Reduce Jobs.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop job [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
|
||||||
|
[-submit <job-file>] | [-status <job-id>] |
|
||||||
|
[-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] |
|
||||||
|
[-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <historyFile>] |
|
||||||
|
[-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] |
|
||||||
|
[-set-priority <job-id> <priority>]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-submit <job-file></code></td>
|
||||||
|
<td>Submits the job.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-status <job-id></code></td>
|
||||||
|
<td>Prints the map and reduce completion percentage and all job counters.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-counter <job-id> <group-name> <counter-name></code></td>
|
||||||
|
<td>Prints the counter value.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-kill <job-id></code></td>
|
||||||
|
<td>Kills the job.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-events <job-id> <from-event-#> <#-of-events></code></td>
|
||||||
|
<td>Prints the details of the events received by the jobtracker for the given range.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-history [all] <historyFile></code></td>
|
||||||
|
<td>-history <historyFile> prints job details, failed and killed tip details. More details
|
||||||
|
about the job such as successful tasks and task attempts made for each task can be viewed by
|
||||||
|
specifying the [all] option. </td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-list [all]</code></td>
|
||||||
|
<td>-list all displays all jobs. -list displays only jobs which are yet to complete.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-kill-task <task-id></code></td>
|
||||||
|
<td>Kills the task. Killed tasks are NOT counted against failed attempts.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-fail-task <task-id></code></td>
|
||||||
|
<td>Fails the task. Failed tasks are counted against failed attempts.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-set-priority <job-id> <priority></code></td>
|
||||||
|
<td>Changes the priority of the job.
|
||||||
|
Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
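Because -set-priority accepts only the five values listed in the table, a wrapper script might validate the priority before calling the command. A minimal sketch, with a hypothetical helper name and a made-up job id:

```python
# Allowed values as listed in the -set-priority row of the table.
ALLOWED_PRIORITIES = ("VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW")


def set_priority_argv(job_id, priority):
    """Build the argument list for `hadoop job -set-priority`, rejecting unknown priorities."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(
            f"priority must be one of {', '.join(ALLOWED_PRIORITIES)}; got {priority!r}"
        )
    return ["bin/hadoop", "job", "-set-priority", job_id, priority]


# The job id below is illustrative only.
print(set_priority_argv("job_200904211745_0001", "HIGH"))
```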
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> pipes </title>
|
||||||
|
<p>
|
||||||
|
Runs a pipes job.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...]
|
||||||
|
[-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>]
|
||||||
|
[-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>]
|
||||||
|
[-program <executable>] [-reduces <num>] </code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-conf <path></code></td>
|
||||||
|
<td>Configuration for job</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-jobconf <key=value>, <key=value>, ...</code></td>
|
||||||
|
<td>Add/override configuration for job</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-input <path></code></td>
|
||||||
|
<td>Input directory</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-output <path></code></td>
|
||||||
|
<td>Output directory</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-jar <jar file></code></td>
|
||||||
|
<td>Jar filename</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-inputformat <class></code></td>
|
||||||
|
<td>InputFormat class</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-map <class></code></td>
|
||||||
|
<td>Java Map class</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-partitioner <class></code></td>
|
||||||
|
<td>Java Partitioner</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-reduce <class></code></td>
|
||||||
|
<td>Java Reduce class</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-writer <class></code></td>
|
||||||
|
<td>Java RecordWriter</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-program <executable></code></td>
|
||||||
|
<td>Executable URI</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-reduces <num></code></td>
|
||||||
|
<td>Number of reduces</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
<section>
|
||||||
|
<title> queue </title>
|
||||||
|
<p>
|
||||||
|
Command to interact with and view job queue information.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th> COMMAND_OPTION </th><th> Description </th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="QueuesList"/><code>-list</code> </td>
|
||||||
|
<td>Gets the list of job queues configured in the system, along with the scheduling information
|
||||||
|
associated with the job queues.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="QueuesInfo"/><code>-info <job-queue-name> [-showJobs]</code></td>
|
||||||
|
<td>
|
||||||
|
Displays the job queue information and associated scheduling information of a particular
|
||||||
|
job queue. If the -showJobs option is present, a list of jobs submitted to that job
|
||||||
|
queue is displayed.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-showacls</code></td>
|
||||||
|
<td>Displays the queue name and associated queue operations allowed for the current user.
|
||||||
|
The list consists of only those queues to which the user has access.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
<section>
|
||||||
|
<title> version </title>
|
||||||
|
<p>
|
||||||
|
Prints the version.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop version</code>
|
||||||
|
</p>
|
||||||
|
</section>
|
||||||
|
<section>
|
||||||
|
<title> CLASSNAME </title>
|
||||||
|
<p>
|
||||||
|
Hadoop script can be used to invoke any class.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Runs the class named CLASSNAME.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop CLASSNAME</code>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
</section>
|
||||||
|
</section>
|
||||||
|
<section>
|
||||||
|
<title> Administration Commands </title>
|
||||||
|
<p>Commands useful for administrators of a Hadoop cluster.</p>
|
||||||
|
<section>
|
||||||
|
<title> balancer </title>
|
||||||
|
<p>
|
||||||
|
Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the
|
||||||
|
rebalancing process. For more details see
|
||||||
|
<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Rebalancer">Rebalancer</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop balancer [-threshold <threshold>]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-threshold <threshold></code></td>
|
||||||
|
<td>Percentage of disk capacity. This overrides the default threshold.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> daemonlog </title>
|
||||||
|
<p>
|
||||||
|
Get/Set the log level for each daemon.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop daemonlog -getlevel <host:port> <name></code><br/>
|
||||||
|
<code>Usage: hadoop daemonlog -setlevel <host:port> <name> <level></code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-getlevel <host:port> <name></code></td>
|
||||||
|
<td>Prints the log level of the daemon running at <host:port>.
|
||||||
|
This command internally connects to http://<host:port>/logLevel?log=<name></td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-setlevel <host:port> <name> <level></code></td>
|
||||||
|
<td>Sets the log level of the daemon running at <host:port>.
|
||||||
|
This command internally connects to http://<host:port>/logLevel?log=<name></td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
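The table above says that daemonlog connects to http://host:port/logLevel?log=name. The sketch below only builds that URL from its two parts; it makes no request. The function name, host, port, and logger name are hypothetical examples.

```python
from urllib.parse import urlencode


def daemonlog_url(host_port, logger_name):
    """Build the logLevel URL that the text says daemonlog connects to."""
    query = urlencode({"log": logger_name})
    return f"http://{host_port}/logLevel?{query}"


# Hypothetical daemon address and logger name.
url = daemonlog_url("namenode.example.com:50070", "org.apache.hadoop.hdfs.StateChange")
print(url)
```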
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> datanode</title>
|
||||||
|
<p>
|
||||||
|
Runs an HDFS datanode.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop datanode [-rollback]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-rollback</code></td>
|
||||||
|
<td>Rolls back the datanode to the previous version. This should be used after stopping the datanode
|
||||||
|
and distributing the old Hadoop version.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> dfsadmin </title>
|
||||||
|
<p>
|
||||||
|
Runs an HDFS dfsadmin client.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop dfsadmin [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] [-report] [-safemode enter | leave | get | wait] [-refreshNodes]
|
||||||
|
[-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename]
|
||||||
|
[-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>]
|
||||||
|
[-restoreFailedStorage true|false|check]
|
||||||
|
[-help [cmd]]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-report</code></td>
|
||||||
|
<td>Reports basic filesystem information and statistics.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-safemode enter | leave | get | wait</code></td>
|
||||||
|
<td>Safe mode maintenance command.
|
||||||
|
Safe mode is a Namenode state in which it <br/>
|
||||||
|
1. does not accept changes to the name space (read-only) <br/>
|
||||||
|
2. does not replicate or delete blocks. <br/>
|
||||||
|
The Namenode enters safe mode automatically at startup, and
|
||||||
|
leaves safe mode automatically when the configured minimum
|
||||||
|
percentage of blocks satisfies the minimum replication
|
||||||
|
condition. Safe mode can also be entered manually, but then
|
||||||
|
it can only be turned off manually as well.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-refreshNodes</code></td>
|
||||||
|
<td>Re-read the hosts and exclude files to update the set
|
||||||
|
of Datanodes that are allowed to connect to the Namenode
|
||||||
|
and those that should be decommissioned or recommissioned.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-finalizeUpgrade</code></td>
|
||||||
|
<td>Finalize upgrade of HDFS.
|
||||||
|
Datanodes delete their previous version working directories,
|
||||||
|
followed by Namenode doing the same.
|
||||||
|
This completes the upgrade process.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-printTopology</code></td>
|
||||||
|
<td>Print a tree of the rack/datanode topology of the
|
||||||
|
cluster as seen by the NameNode.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-upgradeProgress status | details | force</code></td>
|
||||||
|
<td>Request current distributed upgrade status,
|
||||||
|
a detailed status or force the upgrade to proceed.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-metasave filename</code></td>
|
||||||
|
<td>Save Namenode's primary data structures
|
||||||
|
to <filename> in the directory specified by the hadoop.log.dir property.
|
||||||
|
<filename> will contain one line for each of the following <br/>
|
||||||
|
1. Datanodes heartbeating with the Namenode<br/>
|
||||||
|
2. Blocks waiting to be replicated<br/>
|
||||||
|
3. Blocks currently being replicated<br/>
|
||||||
|
4. Blocks waiting to be deleted</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-setQuota <quota> <dirname>...<dirname></code></td>
|
||||||
|
<td>Set the quota <quota> for each directory <dirname>.
|
||||||
|
The directory quota is a long integer that puts a hard limit on the number of names in the directory tree.<br/>
|
||||||
|
Best effort for the directory, with faults reported if<br/>
|
||||||
|
1. the quota is not a positive integer, or<br/>
|
||||||
|
2. user is not an administrator, or<br/>
|
||||||
|
3. the directory does not exist or is a file, or<br/>
|
||||||
|
4. the directory would immediately exceed the new quota.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-clrQuota <dirname>...<dirname></code></td>
|
||||||
|
<td>Clear the quota for each directory <dirname>.<br/>
|
||||||
|
Best effort for the directory, with faults reported if<br/>
|
||||||
|
1. the directory does not exist or is a file, or<br/>
|
||||||
|
2. user is not an administrator.<br/>
|
||||||
|
It does not fault if the directory has no quota.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-restoreFailedStorage true | false | check</code></td>
|
||||||
|
<td>Turns on or off automatic attempts to restore failed storage replicas.
|
||||||
|
If a failed storage becomes available again the system will attempt to restore
|
||||||
|
edits and/or fsimage during checkpoint. The 'check' option returns the current setting.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-help [cmd]</code></td>
|
||||||
|
<td> Displays help for the given command or all commands if none
|
||||||
|
is specified.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
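<p>The options above can also be driven from a script. The following is a minimal sketch, assuming Python and a <code>hadoop</code> executable on the PATH. The helper names (<code>dfsadmin_safemode</code>, <code>dfsadmin_set_quota</code>, <code>run</code>) are hypothetical and not part of Hadoop; they only assemble the argument lists documented in the table.</p>

```python
import subprocess

# Safe mode actions listed in the usage line above.
SAFEMODE_ACTIONS = ("enter", "leave", "get", "wait")


def dfsadmin_safemode(action):
    """Build the argument list for `hadoop dfsadmin -safemode ACTION`."""
    if action not in SAFEMODE_ACTIONS:
        raise ValueError("unknown safemode action: %r" % (action,))
    return ["hadoop", "dfsadmin", "-safemode", action]


def dfsadmin_set_quota(quota, *dirnames):
    """Build the argument list for `hadoop dfsadmin -setQuota`."""
    # The table lists a non-positive quota as a fault, so reject it early
    # instead of waiting for the command to report it.
    if not (isinstance(quota, int) and quota > 0):
        raise ValueError("quota must be a positive integer")
    if not dirnames:
        raise ValueError("at least one directory is required")
    return ["hadoop", "dfsadmin", "-setQuota", str(quota)] + list(dirnames)


def run(argv):
    """Hypothetical helper: execute a prepared command and return its exit status."""
    return subprocess.call(argv)
```

<p>For example, <code>run(dfsadmin_safemode("wait"))</code> would block until the Namenode has left safe mode, which may be useful before a script starts writing to HDFS.</p>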
|
||||||
|
</section>
|
||||||
|
<section>
|
||||||
|
<title>mradmin</title>
|
||||||
|
<p>Runs the MapReduce admin client.</p>
|
||||||
|
<p><code>Usage: hadoop mradmin [</code>
|
||||||
|
<a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a>
|
||||||
|
<code>] [-refreshServiceAcl] [-refreshQueues] [-refreshNodes] [-help [cmd]] </code></p>
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th> COMMAND_OPTION </th><th> Description </th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-refreshServiceAcl</code></td>
|
||||||
|
<td> Reload the service-level authorization policies. Jobtracker
|
||||||
|
will reload the authorization policy file.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><anchor id="RefreshQueues"/><code>-refreshQueues</code></td>
|
||||||
|
<td><p> Reload the queues' configuration at the JobTracker.
|
||||||
|
Most of the configuration of the queues can be refreshed/reloaded
|
||||||
|
without restarting the Map/Reduce sub-system. Administrators
|
||||||
|
typically own the
|
||||||
|
<a href="cluster_setup.html#mapred-queues.xml">
|
||||||
|
<em>conf/mapred-queues.xml</em></a>
|
||||||
|
file, can edit it while the JobTracker is still running, and can do
|
||||||
|
a reload by running this command.</p>
|
||||||
|
<p>It should be noted that while trying to refresh queues'
|
||||||
|
configuration, one cannot change the hierarchy of queues itself.
|
||||||
|
This means no operation that involves a change in either the
|
||||||
|
hierarchy structure itself or the queues' names will be allowed.
|
||||||
|
Only selected properties of queues can be changed during refresh.
|
||||||
|
For example, new queues cannot be added dynamically, nor can an
|
||||||
|
existing queue be deleted.</p>
|
||||||
|
<p>If during a reload of queue configuration,
|
||||||
|
a syntactic or semantic error is made during the editing of the
|
||||||
|
configuration file, the refresh command fails with an exception that
|
||||||
|
is printed on the standard output of this command, thus informing the
|
||||||
|
requester with any helpful messages of what has gone wrong during
|
||||||
|
the edit/reload. Importantly, the existing queue configuration is
|
||||||
|
untouched and the system is left in a consistent state.
|
||||||
|
</p>
|
||||||
|
<p>As described in the
|
||||||
|
<a href="cluster_setup.html#mapred-queues.xml"><em>
|
||||||
|
conf/mapred-queues.xml</em></a> section, the
|
||||||
|
<a href="cluster_setup.html#properties_tag"><em>
|
||||||
|
<properties></em></a> tag in the queue configuration file can
|
||||||
|
also be used to specify per-queue properties needed by the scheduler.
|
||||||
|
When the framework's queue configuration is reloaded using this
|
||||||
|
command, this scheduler-specific configuration will also be reloaded,
|
||||||
|
provided the scheduler being configured supports this reload.
|
||||||
|
Please see the documentation of the particular scheduler in use.</p>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-refreshNodes</code></td>
|
||||||
|
<td> Refresh the hosts information at the JobTracker.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-help [cmd]</code></td>
|
||||||
|
<td>Displays help for the given command or all commands if none
|
||||||
|
is specified.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
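<p>A reload of the queue configuration can be scripted so that a failed refresh is noticed. The sketch below assumes Python 3.5 or later and a <code>hadoop</code> executable on the PATH. It also assumes that a failed refresh exits with a non-zero status; the text above only states that the exception is printed on standard output, so the captured output is returned as well. The function name <code>refresh_queues</code> and its <code>runner</code> parameter are hypothetical.</p>

```python
import subprocess


def refresh_queues(runner=subprocess.run):
    """Run `hadoop mradmin -refreshQueues` and return (succeeded, output).

    Assumption: a failed refresh exits with a non-zero status. The captured
    output is returned too, so the caller can inspect any printed exception.
    """
    result = runner(
        ["hadoop", "mradmin", "-refreshQueues"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error output into the same stream
        universal_newlines=True,   # decode output as text
    )
    return result.returncode == 0, result.stdout
```

<p>The <code>runner</code> parameter exists only so the function can be exercised without a running cluster. In normal use it is left at its default.</p>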
|
||||||
|
</section>
|
||||||
|
<section>
|
||||||
|
<title> jobtracker </title>
|
||||||
|
<p>
|
||||||
|
Runs the MapReduce JobTracker node.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop jobtracker [-dumpConfiguration]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>COMMAND_OPTION</th><th> Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-dumpConfiguration</code></td>
|
||||||
|
<td> Dumps the configuration used by the JobTracker, along with the queue
|
||||||
|
configuration, in JSON format to standard output,
|
||||||
|
and exits.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> namenode </title>
|
||||||
|
<p>
|
||||||
|
Runs the namenode. For more information about upgrade, rollback and finalize see
|
||||||
|
<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Upgrade+and+Rollback">Upgrade and Rollback</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop namenode [-regular] | [-checkpoint] | [-backup] | [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-regular</code></td>
|
||||||
|
<td>Start namenode in standard, active role rather than as backup or checkpoint node. This is the default role.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-checkpoint</code></td>
|
||||||
|
<td>Start namenode in checkpoint role, creating periodic checkpoints of the active namenode metadata.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-backup</code></td>
|
||||||
|
<td>Start namenode in backup role, maintaining an up-to-date in-memory copy of the namespace and creating periodic checkpoints.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-format</code></td>
|
||||||
|
<td>Formats the namenode. It starts the namenode, formats it and then shuts it down.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-upgrade</code></td>
|
||||||
|
<td>The namenode should be started with the upgrade option after the new Hadoop version has been distributed.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-rollback</code></td>
|
||||||
|
<td>Rolls back the namenode to the previous version. This should be used after stopping the cluster
|
||||||
|
and distributing the old Hadoop version.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-finalize</code></td>
|
||||||
|
<td>Finalize will remove the previous state of the file system. The most recent upgrade will become permanent.
|
||||||
|
The rollback option will no longer be available. After finalization it shuts the namenode down.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-importCheckpoint</code></td>
|
||||||
|
<td>Loads image from a checkpoint directory and saves it into the current one. Checkpoint directory
|
||||||
|
is read from the fs.checkpoint.dir property
|
||||||
|
(see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Import+checkpoint">Import Checkpoint</a>).
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-checkpoint</code></td>
|
||||||
|
<td>Enables checkpointing
|
||||||
|
(see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node">Checkpoint Node</a>).</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-backup</code></td>
|
||||||
|
<td>Enables checkpointing and maintains an in-memory, up-to-date copy of the file system namespace
|
||||||
|
(see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Backup+Node">Backup Node</a>).</td>
|
||||||
|
</tr>
|
||||||
|
</table>
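<p>The usage line presents the namenode options as alternatives, so a start script would pass at most one of them. The following is a minimal sketch, assuming Python. The function name <code>namenode_command</code> is hypothetical; the option names are taken from the table above.</p>

```python
# Startup options taken from the table above.
NAMENODE_OPTIONS = (
    "-regular", "-checkpoint", "-backup", "-format",
    "-upgrade", "-rollback", "-finalize", "-importCheckpoint",
)


def namenode_command(option=None):
    """Build `hadoop namenode [OPTION]`, allowing at most one option."""
    argv = ["hadoop", "namenode"]
    if option is None:
        # No option given: the namenode starts in its default (regular) role.
        return argv
    if option not in NAMENODE_OPTIONS:
        raise ValueError("unknown namenode option: %r" % (option,))
    return argv + [option]
```

<p>For example, <code>namenode_command("-upgrade")</code> yields the command to run after the new Hadoop version has been distributed, and <code>namenode_command("-finalize")</code> the one that makes that upgrade permanent.</p>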
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> secondarynamenode </title>
|
||||||
|
<note>
|
||||||
|
The Secondary NameNode has been deprecated. Instead, consider using the
|
||||||
|
<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node">Checkpoint Node</a> or
|
||||||
|
<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Backup+Node">Backup Node</a>.
|
||||||
|
</note>
|
||||||
|
<p>
|
||||||
|
Runs the HDFS secondary
|
||||||
|
namenode. See <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Secondary+NameNode">Secondary NameNode</a>
|
||||||
|
for more info.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]</code>
|
||||||
|
</p>
|
||||||
|
<table>
|
||||||
|
<tr><th> COMMAND_OPTION </th><th> Description </th></tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><code>-checkpoint [force]</code></td>
|
||||||
|
<td>Checkpoints the Secondary namenode if EditLog size >= fs.checkpoint.size.
|
||||||
|
If force is used, checkpoint irrespective of EditLog size.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-geteditsize</code></td>
|
||||||
|
<td>Prints the EditLog size.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
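<p>The rule described for <code>-checkpoint</code> can be written down as a small model. This is only an illustration of the documented condition, assuming Python; it is not the code the Secondary NameNode runs, and the function name <code>should_checkpoint</code> is hypothetical.</p>

```python
def should_checkpoint(edit_log_size, checkpoint_size, force=False):
    """Model of the -checkpoint rule described in the table above.

    edit_log_size   -- current EditLog size (what -geteditsize prints)
    checkpoint_size -- the configured fs.checkpoint.size
    force           -- True when `force` is passed to -checkpoint
    """
    # A forced checkpoint ignores the size; otherwise the EditLog must have
    # reached the configured threshold.
    return force or edit_log_size >= checkpoint_size
```

<p>With a threshold of 64, an EditLog of size 10 is not checkpointed unless <code>force</code> is given, while an EditLog of size 64 or more always is.</p>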
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title> tasktracker </title>
|
||||||
|
<p>
|
||||||
|
Runs a MapReduce TaskTracker node.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<code>Usage: hadoop tasktracker</code>
|
||||||
|
</p>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
</section>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
</body>
|
||||||
|
</document>
|
|
@ -97,7 +97,7 @@
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section id="Download">
|
||||||
<title>Download</title>
|
<title>Download</title>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
|
|
|
@ -39,10 +39,12 @@ See http://forrest.apache.org/docs/linking.html for more info.
|
||||||
</docs>
|
</docs>
|
||||||
|
|
||||||
<docs label="Guides">
|
<docs label="Guides">
|
||||||
|
<commands_manual label="Hadoop Commands" href="commands_manual.html" />
|
||||||
<fsshell label="File System Shell" href="file_system_shell.html" />
|
<fsshell label="File System Shell" href="file_system_shell.html" />
|
||||||
<SLA label="Service Level Authorization" href="service_level_auth.html"/>
|
<SLA label="Service Level Authorization" href="service_level_auth.html"/>
|
||||||
<native_lib label="Native Libraries" href="native_libraries.html" />
|
<native_lib label="Native Libraries" href="native_libraries.html" />
|
||||||
<superusers label="Superusers Acting On Behalf Of Other Users" href="Superusers.html"/>
|
<superusers label="Superusers Acting On Behalf Of Other Users" href="Superusers.html"/>
|
||||||
|
<hod_scheduler label="Hadoop On Demand" href="hod_scheduler.html"/>
|
||||||
</docs>
|
</docs>
|
||||||
|
|
||||||
<docs label="Miscellaneous">
|
<docs label="Miscellaneous">
|
||||||
|
@ -69,6 +71,15 @@ See http://forrest.apache.org/docs/linking.html for more info.
|
||||||
<hdfs-default href="http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html" />
|
<hdfs-default href="http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html" />
|
||||||
<mapred-default href="http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html" />
|
<mapred-default href="http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html" />
|
||||||
|
|
||||||
|
<mapred-queues href="http://hadoop.apache.org/mapreduce/docs/current/mapred_queues.xml" />
|
||||||
|
<capacity-scheduler href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html" />
|
||||||
|
<mapred-tutorial href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html" >
|
||||||
|
<JobAuthorization href="#Job+Authorization" />
|
||||||
|
</mapred-tutorial>
|
||||||
|
<streaming href="http://hadoop.apache.org/mapreduce/docs/current/streaming.html" />
|
||||||
|
<distcp href="http://hadoop.apache.org/mapreduce/docs/current/distcp.html" />
|
||||||
|
<hadoop-archives href="http://hadoop.apache.org/mapreduce/docs/current/hadoop_archives.html" />
|
||||||
|
|
||||||
<zlib href="http://www.zlib.net/" />
|
<zlib href="http://www.zlib.net/" />
|
||||||
<gzip href="http://www.gzip.org/" />
|
<gzip href="http://www.gzip.org/" />
|
||||||
<bzip href="http://www.bzip.org/" />
|
<bzip href="http://www.bzip.org/" />
|
||||||
|
|