HADOOP-10618: Merging r1596964 from trunk to branch-2.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2@1596965 13f79535-47bb-0310-9956-ffa450edef68
parent 50acb4032f
commit 517eccb0c7

CHANGES.txt
@@ -54,6 +54,9 @@ Release 2.5.0 - UNRELEASED
     HADOOP-10614. CBZip2InputStream is not threadsafe (Xiangrui Meng via
     Sandy Ryza)
 
+    HADOOP-10618. Remove SingleNodeSetup.apt.vm. (Akira Ajisaka via
+    Arpit Agarwal)
+
   OPTIMIZATIONS
 
   BUG FIXES

SingleNodeSetup.apt.vm
@@ -18,210 +18,7 @@

The page body is replaced with a short stub. New content:

Single Node Setup

  This page will be removed in the next major release.

  See {{{./SingleCluster.html}Single Cluster Setup}} to set up and configure a
  single-node Hadoop installation.

Removed content:

Single Node Setup

%{toc|section=1|fromDepth=0}

* Purpose

  This document describes how to set up and configure a single-node
  Hadoop installation so that you can quickly perform simple operations
  using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

* Prerequisites

** Supported Platforms

  * GNU/Linux is supported as a development and production platform.
    Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.

  * Windows is also a supported platform.

** Required Software

  Required software for Linux and Windows includes the following; a quick
  way to verify it is sketched after the list:

  [[1]] Java^TM 1.6.x, preferably from Sun, must be installed.

  [[2]] ssh must be installed and sshd must be running to use the Hadoop
        scripts that manage remote Hadoop daemons.
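
  A quick way to confirm both prerequisites are already present (these
  commands are only illustrative; output and service names vary by
  distribution):

----
# Should report a 1.6.x or later Java version
$ java -version

# Confirm an ssh client is present and sshd is running
$ which ssh
$ ps -e | grep sshd
----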

** Installing Software

  If your cluster doesn't have the requisite software, you will need to
  install it.

  For example, on Ubuntu Linux:

----
$ sudo apt-get install ssh
$ sudo apt-get install rsync
----

* Download

  To get a Hadoop distribution, download a recent stable release from one
  of the Apache Download Mirrors.

* Prepare to Start the Hadoop Cluster

  Unpack the downloaded Hadoop distribution. In the distribution, edit the
  file <<<conf/hadoop-env.sh>>> to define at least <<<JAVA_HOME>>> to be the
  root of your Java installation.
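
  For example, you might add a line like the following to
  <<<conf/hadoop-env.sh>>> (the path shown is only illustrative; point it
  at the root of your own Java installation):

----
# conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun
----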

  Try the following command:

----
$ bin/hadoop
----

  This will display the usage documentation for the hadoop script.

  Now you are ready to start your Hadoop cluster in one of the three
  supported modes:

  * Local (Standalone) Mode

  * Pseudo-Distributed Mode

  * Fully-Distributed Mode

* Standalone Operation

  By default, Hadoop is configured to run in a non-distributed mode, as a
  single Java process. This is useful for debugging.

  The following example copies the unpacked conf directory to use as input
  and then finds and displays every match of the given regular expression.
  Output is written to the given output directory.

----
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
----

* Pseudo-Distributed Operation

  Hadoop can also be run on a single node in a pseudo-distributed mode
  where each Hadoop daemon runs in a separate Java process.

** Configuration

  Use the following:

  conf/core-site.xml:

----
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
----

  conf/hdfs-site.xml:

----
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
----

  conf/mapred-site.xml:

----
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
----

** Set up passphraseless ssh

  Now check that you can ssh to localhost without a passphrase:

----
$ ssh localhost
----

  If you cannot ssh to localhost without a passphrase, execute the
  following commands:

----
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
----

** Execution

  Format a new distributed filesystem:

----
$ bin/hadoop namenode -format
----

  Start the hadoop daemons:

----
$ bin/start-all.sh
----

  The hadoop daemon log output is written to the <<<${HADOOP_LOG_DIR}>>>
  directory (defaults to <<<${HADOOP_PREFIX}/logs>>>).
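
  To watch a daemon's log as it starts up you can, for example, tail it
  (the wildcard below matches the default log file names, which include
  the user, daemon, and host names):

----
$ tail -f ${HADOOP_PREFIX}/logs/hadoop-*-namenode-*.log
----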

  Browse the web interfaces for the NameNode and the JobTracker; by default
  they are available at the addresses below (a quick command-line check
  follows the list):

  * NameNode - <<<http://localhost:50070/>>>

  * JobTracker - <<<http://localhost:50030/>>>
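
  For example, you can confirm from a shell that both interfaces are
  responding (this assumes curl is installed; any HTTP client will do):

----
$ curl -s -I http://localhost:50070/
$ curl -s -I http://localhost:50030/
----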

  Copy the input files into the distributed filesystem:

----
$ bin/hadoop fs -put conf input
----

  Run some of the examples provided:

----
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
----

  Examine the output files:

  Copy the output files from the distributed filesystem to the local
  filesystem and examine them:

----
$ bin/hadoop fs -get output output
$ cat output/*
----

  or

  View the output files on the distributed filesystem:

----
$ bin/hadoop fs -cat output/*
----

  When you're done, stop the daemons with:

----
$ bin/stop-all.sh
----

* Fully-Distributed Operation

  For information on setting up fully-distributed, non-trivial clusters,
  see {{{./ClusterSetup.html}Cluster Setup}}.

  Java and JNI are trademarks or registered trademarks of Sun
  Microsystems, Inc. in the United States and other countries.