Get the value of the name property, or null if no such property exists. Values are processed for variable expansion before being returned.
@param name the property name.
@return the value of the name property, or null if no such property exists.

@return the value of the name property, or null if no such property exists.

Set the value of the name property.
@param name property name.
@param value property value.

Get the value of the name property. If no such property exists, then defaultValue is returned.
@param name property name.
@param defaultValue default value.
@return property value, or defaultValue if the property doesn't exist.

Get the value of the name property as an int. If no such property exists, or if the specified value is not a valid int, then defaultValue is returned.
@param name property name.
@param defaultValue default value.
@return property value as an int, or defaultValue.

Set the value of the name property to an int.
@param name property name.
@param value int value of the property.

Get the value of the name property as a long. If no such property is specified, or if the specified value is not a valid long, then defaultValue is returned.
@param name property name.
@param defaultValue default value.
@return property value as a long, or defaultValue.

Set the value of the name property to a long.
@param name property name.
@param value long value of the property.

Get the value of the name property as a float. If no such property is specified, or if the specified value is not a valid float, then defaultValue is returned.
@param name property name.
@param defaultValue default value.
@return property value as a float, or defaultValue.

Get the value of the name property as a boolean. If no such property is specified, or if the specified value is not a valid boolean, then defaultValue is returned.
@param name property name.
@param defaultValue default value.
@return property value as a boolean, or defaultValue.

Set the value of the name property to a boolean.
@param name property name.
@param value boolean value of the property.

Get the value of the name property as an array of Strings. If no such property is specified then null is returned.
@param name property name.
@return property value as an array of Strings, or null.

Get the value of the name property as an array of Strings. If no such property is specified then the default value is returned.
@param name property name.
@param defaultValue the default value.
@return property value as an array of Strings, or the default value.

Get the value of the name property as a Class. If no such property is specified, then defaultValue is returned.
@param name the class name.
@param defaultValue default value.
@return property value as a Class, or defaultValue.

Get the value of the name property as a Class implementing the interface specified by xface. If no such property is specified, then defaultValue is returned. An exception is thrown if the returned class does not implement the named interface.
@param name the class name.
@param defaultValue default value.
@param xface the interface implemented by the named class.
@return property value as a Class, or defaultValue.

Set the value of the name property to the name of theClass implementing the given interface xface. An exception is thrown if theClass does not implement the interface xface.
@param name property name.
@param theClass property value.
@param xface the interface implemented by the named class.

Pass false to turn it off.
Configurations are specified by resources. A resource contains a set of name/value pairs as XML data. Each resource is named by either a String or by a {@link Path}. If named by a String, then the classpath is examined for a file with that name. If named by a Path, then the local filesystem is examined directly, without referring to the classpath.

Hadoop by default specifies two resources, loaded in-order from the classpath: hadoop-default.xml (read-only defaults) and hadoop-site.xml (site-specific configuration).

Configuration parameters may be declared final. Once a resource declares a value final, no subsequently-loaded resource can alter that value. For example, one might define a final parameter with:

  <property>
    <name>dfs.client.buffer.dir</name>
    <value>/tmp/hadoop/dfs/client</value>
    <final>true</final>
  </property>

Administrators typically define parameters as final in hadoop-site.xml for values that user applications may not alter.

Value strings are first processed for variable expansion. The available properties are other properties defined in this Configuration, and Java System properties.

For example, if a configuration resource contains the following property definitions:

  <property>
    <name>basedir</name>
    <value>/user/${user.name}</value>
  </property>

  <property>
    <name>tempdir</name>
    <value>${basedir}/tmp</value>
  </property>

When conf.get("tempdir") is called, ${basedir} will be resolved to another property in this Configuration, while ${user.name} would then ordinarily be resolved to the value of the System property with that name.
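As a minimal sketch of the expansion behaviour described above (using org.apache.hadoop.conf.Configuration; the basedir/tempdir properties are the ones from the example resource, set programmatically here only for illustration):

  // Minimal sketch: reading variable-expanded values from a Configuration.
  Configuration conf = new Configuration();
  conf.set("basedir", "/user/${user.name}");   // normally loaded from an XML resource
  conf.set("tempdir", "${basedir}/tmp");

  // get() performs variable expansion before returning the value.
  String tempdir = conf.get("tempdir");        // e.g. "/user/alice/tmp" for user "alice"

  // Typed accessors fall back to the supplied default when the property is absent.
  int bufferSize = conf.getInt("io.file.buffer.size", 4096);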
SYNOPSIS

To start:
  bin/start-balancer.sh [-threshold <threshold>]
  Example:
    bin/start-balancer.sh              starts the balancer with a default threshold of 10%
    bin/start-balancer.sh -threshold 5 starts the balancer with a threshold of 5%
To stop:
  bin/stop-balancer.sh

DESCRIPTION

The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if, for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also, for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.

The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10G bytes or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanode information from the namenode.

A system property that limits the balancer's use of bandwidth is defined in the default configuration file:

  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>1048576</value>
    <description>Specifies the maximum bandwidth that each datanode can utilize
    for the balancing purpose in terms of the number of bytes per second.</description>
  </property>

This property determines the maximum speed at which a block will be moved from one datanode to another. The default value is 1MB/s. The higher the bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes. If an administrator changes the value of this property in the configuration file, the change is observed when HDFS is next restarted.

MONITORING BALANCER PROGRESS

After the balancer is started, an output file name where the balancer progress will be recorded is printed on the screen. The administrator can monitor the running of the balancer by reading the output file. The output shows the balancer's status iteration by iteration. In each iteration it prints the starting time, the iteration number, the total number of bytes that have been moved in the previous iterations, the total number of bytes that are left to move in order for the cluster to be balanced, and the number of bytes that are being moved in this iteration. Normally "Bytes Already Moved" is increasing while "Bytes Left To Move" is decreasing.

Running multiple instances of the balancer in an HDFS cluster is prohibited by the tool.

The balancer automatically exits when any of the following five conditions is satisfied:

Upon exit, the balancer returns an exit code and prints one of the following messages to the output file, corresponding to the above exit reasons:

The administrator can interrupt the execution of the balancer at any time by running the command "stop-balancer.sh" on the machine where the balancer is running.
zero in the conf.
@param conf configuration
@throws IOException

size.
@param datanode on which blocks are located
@param size total size of blocks

{@link #syncs}.inc()
The most important difference is that unlike GFS, Hadoop DFS files have strictly one writer at any one time. Bytes are always appended to the end of the writer's stream. There is no notion of "record appends" or "mutations" that are then checked or reordered. Writers simply emit a byte stream. That byte stream is guaranteed to be stored in the order written.

{@link #blocksRead}.inc()

  dfs.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
  dfs.period=10

Note that the metrics are collected regardless of the context used. The context with the update thread is used to average the data periodically.

Name Node Status info is reported in another MBean.
@see org.apache.hadoop.dfs.datanode.metrics.FSDatasetMBean

  dfs.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
  dfs.period=10

Note that the metrics are collected regardless of the context used. The context with the update thread is used to average the data periodically.

Name Node Status info is reported in another MBean.
@see org.apache.hadoop.dfs.namenode.metrics.FSNamesystemMBean
DistributedCache is a facility provided by the Map-Reduce framework to cache files (text, archives, jars etc.) needed by applications.

Applications specify the files to be cached, via urls (hdfs:// or http://), in the {@link JobConf}. The DistributedCache assumes that the files specified via hdfs:// urls are already present on the {@link FileSystem} at the path specified by the url.

The framework will copy the necessary files on to the slave node before any tasks for the job are executed on that node. Its efficiency stems from the fact that the files are only copied once per job, and from the ability to cache archives which are un-archived on the slaves.

DistributedCache can be used to distribute simple, read-only data/text files and/or more complex types such as archives, jars etc. Archives (zip files) are un-archived at the slave nodes. Jars may optionally be added to the classpath of the tasks, a rudimentary software distribution mechanism. Files have execution permissions. Optionally users can also direct it to symlink the distributed cache file(s) into the working directory of the task.

DistributedCache tracks modification timestamps of the cache files. Clearly the cache files should not be modified by the application or externally while the job is executing.

Here is an illustrative example on how to use the DistributedCache:

  // Setting up the cache for the application

  1. Copy the requisite files to the FileSystem:

  $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
  $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
  $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar

  2. Setup the application's JobConf:

  JobConf job = new JobConf();
  DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
  DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
  DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);

  3. Use the cached files in the {@link Mapper} or {@link Reducer}:

  public static class MapClass extends MapReduceBase
      implements Mapper<K, V, K, V> {

    private Path[] localArchives;
    private Path[] localFiles;

    public void configure(JobConf job) {
      // Get the cached archives/files
      localArchives = DistributedCache.getLocalCacheArchives(job);
      localFiles = DistributedCache.getLocalCacheFiles(job);
    }

    public void map(K key, V value,
                    OutputCollector<K, V> output, Reporter reporter)
        throws IOException {
      // Use data from the cached archives/files here
      // ...
      // ...
      output.collect(k, v);
    }
  }

@see JobConf
@see JobClient
Saves its argument, the input stream in, for later use. An internal buffer array of length size is created and stored in buf.
@param in the underlying input stream.
@param size the buffer size.
@exception IllegalArgumentException if size <= 0.
A filename pattern is composed of regular characters and special pattern matching characters, which are:

The local implementation is {@link LocalFileSystem} and the distributed implementation is {@link DistributedFileSystem}.

FilterFileSystem itself simply overrides all methods of FileSystem with versions that pass all requests to the contained file system. Subclasses of FilterFileSystem may further override some of these methods and may also provide additional methods and fields.
Reads the next chunk of data into buf at offset and the checksum into checksum. The method is used for implementing read; therefore, it should be optimized for sequential reading.
@param pos chunkPos
@param buf destination buffer
@param offset offset in buf at which to store data
@param len maximum number of bytes to read
@return number of bytes read
This method implements the general contract of the corresponding {@link InputStream#read(byte[], int, int) read} method of the {@link InputStream} class. As an additional convenience, it attempts to read as many bytes as possible by repeatedly invoking the read method of the underlying stream. This iterated read continues until one of the following conditions becomes true: the requested number of bytes has been read, or the read method of the underlying stream returns -1, indicating end-of-file.

If the first read on the underlying stream returns -1 to indicate end-of-file then this method returns -1. Otherwise this method returns the number of bytes actually read.
@param b destination buffer.
@param off offset at which to start storing bytes.
@param len maximum number of bytes to read.
@return the number of bytes read, or -1 if the end of the stream has been reached.
@exception IOException if an I/O error occurs.
ChecksumException if any checksum error occurs
This method may skip more bytes than are remaining in the backing file. This produces no exception and the number of bytes skipped may include some number of bytes that were beyond the EOF of the backing file. Attempting to read from the stream after skipping past the end will result in -1 indicating the end of the file.

If n is negative, no bytes are skipped.
@param n the number of bytes to be skipped.
@return the actual number of bytes skipped.
@exception IOException if an I/O error occurs.
ChecksumException if the chunk to skip to is corrupted
Reads len bytes from the input stream stm into buf.
@param stm an input stream
@param buf destination buffer
@param offset offset at which to store data
@param len number of bytes to read
@return actual number of bytes read
@throws IOException if there is any IO error
Writes len bytes from the specified byte array starting at offset off and generates a checksum for each data chunk.

This method stores bytes from the given array into this stream's buffer before they get checksummed. The buffer gets checksummed and flushed to the underlying output stream when all data in a checksum chunk is in the buffer. If the buffer is empty and the requested length is at least as large as the size of the next checksum chunk, this method will checksum and write the chunk directly to the underlying output stream, thus avoiding an unnecessary data copy.
@param b the data.
@param off the start offset in the data.
@param len the number of bytes to write.
@exception IOException if an I/O error occurs.
Tests whether or not the specified abstract pathname should be included.

  <property>
    <name>fs.kfs.impl</name>
    <value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
    <description>The FileSystem for kfs: uris.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>kfs://<server:port></value>
  </property>

  <property>
    <name>fs.kfs.metaServerHost</name>
    <value><server></value>
    <description>The location of the KFS meta server.</description>
  </property>

  <property>
    <name>fs.kfs.metaServerPort</name>
    <value><port></value>
    <description>The location of the meta server's port.</description>
  </property>

  export LD_LIBRARY_PATH=<path>
All files in the filesystem are migrated by re-writing the block metadata - no datafiles are touched.

Files are stored in S3 as blocks (represented by {@link org.apache.hadoop.fs.s3.Block}), which have an ID and a length. Block metadata is stored in S3 as a small record (represented by {@link org.apache.hadoop.fs.s3.INode}) using the URL-encoded path string as a key. Inodes record the file type (regular file or directory) and the list of blocks. This design makes it easy to seek to any given position in a file by reading the inode data to compute which block to access, then using S3's support for HTTP Range headers to start streaming from the correct position. Renames are also efficient since only the inode is moved (by a DELETE followed by a PUT since S3 does not support renames).

For a single file /dir1/file1 which takes two blocks of storage, the file structure in S3 would be something like this:

  /
  /dir1
  /dir1/file1
  block-6415776850131549260
  block-3026438247347758425

Inodes start with a leading /, while blocks are prefixed with block-.

Typical usage is something like the following:

  DataInputBuffer buffer = new DataInputBuffer();
  while (... loop condition ...) {
    byte[] data = ... get data ...;
    int dataLength = ... get data length ...;
    buffer.reset(data, dataLength);
    ... read buffer using DataInput methods ...
  }
Typical usage is something like the following:

  DataOutputBuffer buffer = new DataOutputBuffer();
  while (... loop condition ...) {
    buffer.reset();
    ... write buffer using DataOutput methods ...
    byte[] data = buffer.getData();
    int dataLength = buffer.getLength();
    ... write data to its ultimate destination ...
  }
Compared with ObjectWritable, this class is much more effective, because ObjectWritable will append the class declaration as a String into the output file in every key-value pair.

GenericWritable implements the {@link Configurable} interface, so that it will be configured by the framework. The configuration is passed to the wrapped objects implementing the {@link Configurable} interface before deserialization.

How to use it: the getTypes() method defines the classes which will be wrapped in the GenericObject in the application. Note that the classes defined in the getTypes() method must implement the Writable interface.

  public class GenericObject extends GenericWritable {

    private static Class[] CLASSES = {
      ClassType1.class,
      ClassType2.class,
      ClassType3.class,
    };

    protected Class[] getTypes() {
      return CLASSES;
    }
  }

@since Nov 8, 2006
Typical usage is something like the following:

  InputBuffer buffer = new InputBuffer();
  while (... loop condition ...) {
    byte[] data = ... get data ...;
    int dataLength = ... get data length ...;
    buffer.reset(data, dataLength);
    ... read buffer using InputStream methods ...
  }

@see DataInputBuffer
@see DataOutput
A map is a directory containing two files: the data file, containing all keys and values in the map, and a smaller index file, containing a fraction of the keys. The fraction is determined by {@link Writer#getIndexInterval()}.

The index file is read entirely into memory. Thus key implementations should try to keep themselves small.

Map files are created by adding entries in-order. To maintain a large database, perform updates by copying the previous version of a database and merging in a sorted change list, to create a new version of the database in a new file. Sorting large change lists can be done with {@link SequenceFile.Sorter}.
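A minimal sketch of writing and reading a MapFile (assuming the classic org.apache.hadoop.io.MapFile Writer/Reader constructors; the path is hypothetical and imports from org.apache.hadoop.conf/fs/io are omitted for brevity):

  // Minimal sketch: entries must be appended in sorted key order.
  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  String dirName = "/tmp/example.map";   // hypothetical path

  MapFile.Writer writer = new MapFile.Writer(conf, fs, dirName, Text.class, Text.class);
  writer.append(new Text("apple"), new Text("red"));
  writer.append(new Text("banana"), new Text("yellow"));
  writer.close();

  // Look up a key; get() leaves the value untouched and returns null if the key is absent.
  MapFile.Reader reader = new MapFile.Reader(fs, dirName, conf);
  Text value = new Text();
  reader.get(new Text("banana"), value);
  reader.close();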
Read the next key/value pair in the map into key and val. Returns true if such a pair exists and false when at the end of the map.

Return the record matching key exactly, if it exists. Otherwise, return the record that sorts just after.
@return the key that was the closest match or null if eof.
Typical usage is something like the following:

  OutputBuffer buffer = new OutputBuffer();
  while (... loop condition ...) {
    buffer.reset();
    ... write buffer using OutputStream methods ...
    byte[] data = buffer.getData();
    int dataLength = buffer.getLength();
    ... write data to its ultimate destination ...
  }

@see DataOutputBuffer
@see InputBuffer
SequenceFile provides {@link Writer}, {@link Reader} and {@link Sorter} classes for writing, reading and sorting respectively.

There are three SequenceFile Writers based on the {@link CompressionType} used to compress key/value pairs:

  Writer : Uncompressed records.

  RecordCompressWriter : Record-compressed files, only compress values.

  BlockCompressWriter : Block-compressed files, both keys & values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.

The actual compression algorithm used to compress key and/or values can be specified by using the appropriate {@link CompressionCodec}.

The recommended way is to use the static createWriter methods provided by the SequenceFile to choose the preferred format.

The {@link Reader} acts as the bridge and can read any of the above SequenceFile formats.

Essentially there are 3 different formats for SequenceFiles depending on the CompressionType specified. All of them share a common header described below. The header records the CompressionCodec class which is used for compression of keys and/or values (if compression is enabled), and each format writes a sync-marker every few 100 bytes or so.

The compressed blocks of key lengths and value lengths consist of the actual lengths of individual keys/values encoded in ZeroCompressedInteger format.

@see CompressionCodec
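As a rough sketch of the createWriter/Reader usage described above (assuming the classic SequenceFile API with Text keys and values; the path is hypothetical and imports from org.apache.hadoop.conf/fs/io are omitted for brevity):

  // Write a block-compressed SequenceFile.
  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  Path path = new Path("/tmp/example.seq");   // hypothetical path

  SequenceFile.Writer writer = SequenceFile.createWriter(
      fs, conf, path, Text.class, Text.class, SequenceFile.CompressionType.BLOCK);
  writer.append(new Text("key1"), new Text("value1"));
  writer.close();

  // The Reader detects the format from the common header.
  SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
  Text key = new Text();
  Text value = new Text();
  while (reader.next(key, value)) {
    // process key/value
  }
  reader.close();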
Read the next key/value pair in the file into val. Returns true if such a pair exists and false when at end of file.

Return the record for the named key, or null if no match exists.
Finds any occurrence of the search string in the backing buffer, starting at position start. The starting position is measured in bytes and the return value is in terms of byte position in the buffer. The backing buffer is not converted to a string for this operation.
@return byte position of the first occurrence of the search string in the UTF-8 buffer or -1 if not found

Also includes utilities for serializing/deserializing a string, coding/decoding a string, checking if a byte array contains valid UTF8 code, and calculating the length of an encoded string.
Serialize the fields of this object to out.
@param out DataOutput to serialize this object into.
@throws IOException

Deserialize the fields of this object from in.

For efficiency, implementations should attempt to re-use storage in the existing object where possible.

@param in DataInput to deserialize this object from.
@throws IOException
Any key or value type in the Hadoop Map-Reduce framework implements this interface.

Implementations typically implement a static read(DataInput) method which constructs a new instance, calls {@link #readFields(DataInput)} and returns the instance.

Example:

  public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
      out.writeInt(counter);
      out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
      counter = in.readInt();
      timestamp = in.readLong();
    }

    public static MyWritable read(DataInput in) throws IOException {
      MyWritable w = new MyWritable();
      w.readFields(in);
      return w;
    }
  }
WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.

Example:

  public class MyWritableComparable implements WritableComparable {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
      out.writeInt(counter);
      out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
      counter = in.readInt();
      timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable other) {
      int thisValue = this.counter;
      int thatValue = other.counter;
      return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
  }

One may optimize compare-intensive operations by overriding {@link #compare(byte[],int,int,byte[],int,int)}. Static utility methods are provided to assist in optimized implementations of this method.
@return true if a preset dictionary is needed for decompression

Returns false.

Returns false.

Returns false.

Returns false.

The time between attempts is sleepTime multiplied by the number of tries so far.

The time between attempts is sleepTime multiplied by a random number in the range of [0, 2 to the number of retries).
Fail silently for void methods, or by re-throwing the exception for non-void methods.

@return true if the method should be retried, false if the method should not be retried but shouldn't fail with an exception (only for void methods).
@throws Exception The re-thrown exception e indicating that the method failed and should not be retried further.
Typical usage is

  UnreliableImplementation unreliableImpl = new UnreliableImplementation();
  UnreliableInterface unreliable = (UnreliableInterface)
    RetryProxy.create(UnreliableInterface.class, unreliableImpl,
      RetryPolicies.retryUpToMaximumCountWithFixedSleep(4, 10, TimeUnit.SECONDS));
  unreliable.call();

This will retry any method called on unreliable four times - in this case the call() method - sleeping 10 seconds between each retry. There are a number of {@link org.apache.hadoop.io.retry.RetryPolicies retry policies} available, or you can implement a custom one by implementing {@link org.apache.hadoop.io.retry.RetryPolicy}. It is also possible to specify retry policies on a {@link org.apache.hadoop.io.retry.RetryProxy#create(Class, Object, Map) per-method basis}.
If the object t is non-null then this deserializer may set its internal state to the next object read from the input stream. Otherwise, if the object t is null, a new deserialized object will be created.
@return the deserialized object

Deserializers are stateful, but must not buffer the input since other producers may read from the input between calls to {@link #deserialize(Object)}.

One may optimize compare-intensive operations by using a custom implementation of {@link RawComparator} that operates directly on byte representations.

Serializations are found by reading the io.serializations property from conf, which is a comma-delimited list of classnames.

Serialize t to the underlying output stream.

Serializers are stateful, but must not buffer the output since other producers may write to the output between calls to {@link #serialize(Object)}.

To add a new serialization framework write an implementation of {@link org.apache.hadoop.io.serializer.Serialization} and add its name to the "io.serializations" property.
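For instance, a hedged sketch of registering an additional framework (com.example.MySerialization is a hypothetical class implementing Serialization, not part of Hadoop):

  // Hypothetical example: extend the io.serializations list.
  Configuration conf = new Configuration();
  String existing = conf.get("io.serializations",
      "org.apache.hadoop.io.serializer.WritableSerialization");
  conf.set("io.serializations", existing + ",com.example.MySerialization");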
Make a call to the IPC server running at address, returning the value. Throws exceptions if there are network problems or if the remote code threw an exception.

Instantiate and return the wrapped exception if it is a Throwable that has a constructor taking a String as a parameter. Otherwise it returns this.
@return Throwable

a boolean, byte, char, short, int, long, float, double, or void; or

{@link #rpcDiscardedOps}.inc(time)

  rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
  rpc.period=10

Note that the metrics are collected regardless of the context used. The context with the update thread is used to average the data periodically.
the JobTracker.

ClusterStatus provides clients with information such as the size of the cluster, the number of currently running map and reduce tasks, and the state of the JobTracker. Clients can query for the latest ClusterStatus via {@link JobClient#getClusterStatus()}.

Counters represent global counters, defined either by the Map-Reduce framework or applications. Each Counter can be of any {@link Enum} type.

Counters are bunched into {@link Group}s, each comprising counters from a particular Enum class.

Group handles localization of the class name and the counter names.
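As a small illustrative sketch (using the Reporter.incrCounter(Enum, long) pattern that also appears in the Mapper example later in this document; imports from org.apache.hadoop.io/mapred omitted for brevity):

  // Sketch of application-defined counters; each enum class becomes a counter Group.
  public class MyMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {

    static enum RecordCounters { TOTAL_RECORDS, BAD_RECORDS }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      reporter.incrCounter(RecordCounters.TOTAL_RECORDS, 1);
      if (value.getLength() == 0) {
        reporter.incrCounter(RecordCounters.BAD_RECORDS, 1);
        return;
      }
      output.collect(value, new LongWritable(1));
    }
  }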
FileInputFormat implementations can override this and return false to ensure that individual input files are never split-up so that {@link Mapper}s process entire files.
@param fs the file system that the file is on
@param filename the file name to check
@return is this file splitable?

FileInputFormat is the base class for all file-based InputFormats. This provides generic implementations of {@link #validateInput(JobConf)} and {@link #getSplits(JobConf, int)}. Implementations of FileInputFormat can also override the {@link #isSplitable(FileSystem, Path)} method to ensure input-files are not split-up and are processed as a whole by {@link Mapper}s.
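A rough sketch of such an override (assuming the old-API isSplitable(FileSystem, Path) signature; NonSplittingTextInputFormat is a hypothetical name and TextInputFormat is used here only so the sketch stays self-contained):

  // Hedged sketch: force whole-file processing by disabling splits.
  public class NonSplittingTextInputFormat extends TextInputFormat {
    protected boolean isSplitable(FileSystem fs, Path file) {
      // Never split: each Mapper sees one entire input file.
      return false;
    }
  }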
false otherwise.

Some applications need to create/write-to side-files, which differ from the actual job-outputs.

In such cases there could be issues with two instances of the same TIP (running simultaneously e.g. speculative tasks) trying to open/write-to the same file (path) on HDFS. Hence the application-writer will have to pick unique names per task-attempt (e.g. using the taskid, say task_200709221812_0001_m_000000_0), not just per TIP.

To get around this the Map-Reduce framework helps the application-writer out by maintaining a special ${mapred.output.dir}/_temporary/_${taskid} sub-directory for each task-attempt on HDFS where the output of the task-attempt goes. On successful completion of the task-attempt, the files in the ${mapred.output.dir}/_temporary/_${taskid} (only) are promoted to ${mapred.output.dir}. Of course, the framework discards the sub-directory of unsuccessful task-attempts. This is completely transparent to the application.

The application-writer can take advantage of this by creating any side-files required in ${mapred.work.output.dir} during execution of a task i.e. via {@link #getWorkOutputPath(JobConf)}, and the framework will move them out similarly - thus the application-writer doesn't have to pick unique paths per task-attempt.

Note: the value of ${mapred.work.output.dir} during execution of a particular task-attempt is actually ${mapred.output.dir}/_temporary/_{$taskid}, and this value is set by the map-reduce framework. So, just create any side-files in the path returned by {@link #getWorkOutputPath(JobConf)} from the map/reduce task to take advantage of this feature.

The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.

@return the {@link Path} to the task's temporary output directory for the map-reduce job.

Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For e.g. a split could be a <input-file-path, start, offset> tuple.
@param job job configuration.
@param numSplits the desired number of splits, a hint.
@return an array of {@link InputSplit}s for the job.
It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
@param split the {@link InputSplit}
@param job the job that this split belongs to
@return a {@link RecordReader}

The Map-Reduce framework relies on the InputFormat of the job to validate the input-specification of the job, split-up the input file(s) into logical {@link InputSplit}s, and provide a {@link RecordReader} implementation used to glean input records from each InputSplit for processing by the {@link Mapper}.

The default behavior of file-based {@link InputFormat}s, typically sub-classes of {@link FileInputFormat}, is to split the input into logical {@link InputSplit}s based on the total size, in bytes, of the input files. However, the {@link FileSystem} blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size.

Clearly, logical splits based on input-size are insufficient for many applications since record boundaries are to be respected. In such cases, the application has to also implement a {@link RecordReader} on whom lies the responsibility to respect record-boundaries and present a record-oriented view of the logical InputSplit to the individual task.

@see InputSplit
@see RecordReader
@see JobClient
@see FileInputFormat

as an array of Strings.
@throws IOException

Typically, the split presents a byte-oriented view on the input, and it is the responsibility of the {@link RecordReader} of the job to process this and present a record-oriented view.
@see InputFormat
@see RecordReader
JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports/logs, get the Map-Reduce cluster status information etc.

The job submission process involves, among other things, submitting the job to the JobTracker and optionally monitoring its status. Normally the user creates the application, describes various facets of the job via {@link JobConf} and then uses the JobClient to submit the job and monitor its progress.

Here is an example on how to use JobClient:

  // Create a new JobConf
  JobConf job = new JobConf(new Configuration(), MyJob.class);

  // Specify various job-specific parameters
  job.setJobName("myjob");
  job.setInputPath(new Path("in"));
  job.setOutputPath(new Path("out"));
  job.setMapperClass(MyJob.MyMapper.class);
  job.setReducerClass(MyJob.MyReducer.class);

  // Submit the job, then poll for progress until the job is complete
  JobClient.runJob(job);

At times clients would chain map-reduce jobs to accomplish complex tasks which cannot be done via a single map-reduce job. This is fairly easy since the output of the job, typically, goes to the distributed file-system, and that can be used as the input for the next job.

However, this also means that the onus on ensuring jobs are complete (success/failure) lies squarely on the clients. In such situations the various job-control options are:

false otherwise.

false otherwise.
For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed in a single call to the reduce function if K1 and K2 compare as equal.

Since {@link #setOutputKeyComparatorClass(Class)} can be used to control how keys are sorted, this can be used in conjunction to simulate secondary sort on values.

Note: This is not a guarantee of the reduce sort being stable in any sense. (In any case, with the order of available map-outputs to the reduce being non-deterministic, it wouldn't make that much sense.)

@param theClass the comparator class to be used for grouping keys. It should implement RawComparator.
@see #setOutputKeyComparatorClass(Class)
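A hedged sketch of the secondary-sort setup described above (MyKeyComparator and MyGroupingComparator are hypothetical RawComparator implementations, not classes from this codebase):

  // Hypothetical sketch: sort on the full composite key, group only on its primary part.
  JobConf job = new JobConf(new Configuration(), MyJob.class);
  job.setOutputKeyComparatorClass(MyKeyComparator.class);           // full sort order
  job.setOutputValueGroupingComparator(MyGroupingComparator.class); // grouping for reduce()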
The combiner is a task-level aggregation operation which, in some cases, helps to cut down the amount of data transferred from the {@link Mapper} to the {@link Reducer}, leading to better performance.

Typically the combiner is the same as the Reducer for the job i.e. {@link #setReducerClass(Class)}.

true if speculative execution should be used for this job, false otherwise.

false.

true if speculative execution should be used for map tasks for this job, false otherwise.

false.

true if speculative execution should be used for reduce tasks for this job, false otherwise.

false.
The number of maps is usually driven by the total size of the inputs i.e. the total number of blocks of the input files.

The right level of parallelism for maps seems to be around 10-100 maps per-node, although it has been set up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

The default behavior of file-based {@link InputFormat}s is to split the input into logical {@link InputSplit}s based on the total size, in bytes, of the input files. However, the {@link FileSystem} blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size.

Thus, if you expect 10TB of input data and have a blocksize of 128MB, you'll end up with 82,000 maps, unless {@link #setNumMapTasks(int)} is used to set it even higher.

@param n the number of map tasks for this job.
@see InputFormat#getSplits(JobConf, int)
@see FileInputFormat
@see FileSystem#getDefaultBlockSize()
@see FileStatus#getBlockSize()

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative-tasks, failures etc.

It is legal to set the number of reduce-tasks to zero. In this case the output of the map-tasks goes directly to the distributed file-system, to the path set by {@link FileOutputFormat#setOutputPath(JobConf, Path)}. Also, the framework doesn't sort the map-outputs before writing them out to HDFS.

@param n the number of reduce tasks for this job.

Defaults to zero, i.e. any failed map-task results in the job being declared as {@link JobStatus#FAILED}.
@return the maximum percentage of map tasks that can fail without the job being aborted.

Defaults to zero, i.e. any failed reduce-task results in the job being declared as {@link JobStatus#FAILED}.
@return the maximum percentage of reduce tasks that can fail without the job being aborted.
The debug command, run on the node where the map failed, is:

  $script $stdout $stderr $syslog $jobconf

The script file is distributed through {@link DistributedCache} APIs. The script needs to be symlinked.

Here is an example on how to submit a script:

  job.setMapDebugScript("./myscript");
  DistributedCache.createSymlink(job);
  DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");

@param mDbgScript the script name

The debug command, run on the node where the reduce failed, is:

  $script $stdout $stderr $syslog $jobconf

The script file is distributed through {@link DistributedCache} APIs. The script file needs to be symlinked.

Here is an example on how to submit a script:

  job.setReduceDebugScript("./myscript");
  DistributedCache.createSymlink(job);
  DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");

@param rDbgScript the script name
This is typically used by application-writers to implement chaining of Map-Reduce jobs in an asynchronous manner.
@param uri the job end notification uri
@see JobStatus
@see Job Completion and Chaining

Get the job-specific shared directory, ${mapred.local.dir}/taskTracker/jobcache/$jobid/work/. This directory is exposed to the users through job.local.dir, so the tasks can use this space as scratch space and share files among them. This value is also available as a System property.
@return The localized job specific shared directory
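For instance, a minimal sketch of resolving this directory from inside a task (the method would sit in a Mapper/Reducer implementation):

  public void configure(JobConf job) {
    // Job-local scratch directory shared by the job's tasks on this node.
    String jobLocalDir = job.get("job.local.dir");
    // Equivalent lookup via the System property exposed by the framework:
    // String jobLocalDir = System.getProperty("job.local.dir");
  }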
JobConf is the primary interface for a user to describe a map-reduce job to the Hadoop framework for execution. The framework tries to faithfully execute the job as described by JobConf, however:

JobConf typically specifies the {@link Mapper}, combiner (if any), {@link Partitioner}, {@link Reducer}, {@link InputFormat} and {@link OutputFormat} implementations to be used etc.

Optionally JobConf is used to specify other advanced facets of the job such as the Comparators to be used, files to be put in the {@link DistributedCache}, whether or not intermediate and/or job outputs are to be compressed (and how), and debuggability via user-provided scripts ({@link #setMapDebugScript(String)}/{@link #setReduceDebugScript(String)}) for doing post-processing on task logs, task's stdout, stderr, syslog, etc.

Here is an example on how to configure a job via JobConf:

  // Create a new JobConf
  JobConf job = new JobConf(new Configuration(), MyJob.class);

  // Specify various job-specific parameters
  job.setJobName("myjob");

  FileInputFormat.setInputPaths(job, new Path("in"));
  FileOutputFormat.setOutputPath(job, new Path("out"));

  job.setMapperClass(MyJob.MyMapper.class);
  job.setCombinerClass(MyJob.MyReducer.class);
  job.setReducerClass(MyJob.MyReducer.class);

  job.setInputFormat(SequenceFileInputFormat.class);
  job.setOutputFormat(SequenceFileOutputFormat.class);

@see JobClient
@see ClusterStatus
@see Tool
@see DistributedCache

Configuration.
@param in input stream
@param conf configuration
@throws IOException
Applications can use the {@link Reporter} provided to report progress or just indicate that they are alive. In scenarios where the application takes an insignificant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. The other way of avoiding this is to set mapred.task.timeout to a high-enough value (or even zero for no time-outs).

@param key the input key.
@param value the input value.
@param output collects mapped keys and values.
@param reporter facility to report progress.

The Hadoop Map-Reduce framework spawns one map task for each {@link InputSplit} generated by the {@link InputFormat} for the job. Mapper implementations can access the {@link JobConf} for the job via the {@link JobConfigurable#configure(JobConf)} method and initialize themselves. Similarly they can use the {@link Closeable#close()} method for de-initialization.

The framework then calls {@link #map(Object, Object, OutputCollector, Reporter)} for each key/value pair in the InputSplit for that task.

All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a {@link Reducer} to determine the final output. Users can control the grouping by specifying a Comparator via {@link JobConf#setOutputKeyComparatorClass(Class)}.

The grouped Mapper outputs are partitioned per Reducer. Users can control which keys (and hence records) go to which Reducer by implementing a custom {@link Partitioner}.

Users can optionally specify a combiner, via {@link JobConf#setCombinerClass(Class)}, to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.

The intermediate, grouped outputs are always stored in {@link SequenceFile}s. Applications can specify if and how the intermediate outputs are to be compressed and which {@link CompressionCodec}s are to be used via the JobConf.

If the job has zero reduces then the output of the Mapper is directly written to the {@link FileSystem} without grouping by keys.

Example:

  public class MyMapper<K extends WritableComparable, V extends Writable>
      extends MapReduceBase implements Mapper<K, V, K, V> {

    static enum MyCounters { NUM_RECORDS }

    private String mapTaskId;
    private String inputFile;
    private int noRecords = 0;

    public void configure(JobConf job) {
      mapTaskId = job.get("mapred.task.id");
      inputFile = job.get("mapred.input.file");
    }

    public void map(K key, V val,
                    OutputCollector<K, V> output, Reporter reporter)
        throws IOException {
      // Process the <key, value> pair (assume this takes a while)
      // ...
      // ...

      // Let the framework know that we are alive, and kicking!
      // reporter.progress();

      // Process some more
      // ...
      // ...

      // Increment the no. of <key, value> pairs processed
      ++noRecords;

      // Increment counters
      reporter.incrCounter(MyCounters.NUM_RECORDS, 1);

      // Every 100 records update application-level status
      if ((noRecords%100) == 0) {
        reporter.setStatus(mapTaskId + " processed " + noRecords +
                           " from input-file: " + inputFile);
      }

      // Output the result
      output.collect(key, val);
    }
  }

Applications may write a custom {@link MapRunnable} to exert greater control on map processing e.g. multi-threaded Mappers etc.

Mapping of input records to output records is complete when this method returns.
@param input the {@link RecordReader} to read the input records.
@param output the {@link OutputCollector} to collect the output records.
@param reporter {@link Reporter} to report progress, status-updates etc.
@throws IOException

A custom MapRunnable can exert greater control on map processing e.g. multi-threaded, asynchronous mappers etc.
@see Mapper
Constructs RecordReaders for MultiFileSplits.
@see MultiFileSplit

OutputCollector is the generalization of the facility provided by the Map-Reduce framework to collect data output by either the Mapper or the Reducer i.e. intermediate outputs or the output of the job.

The Map-Reduce framework relies on the OutputFormat of the job to:

false otherwise.

key.

Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent to for reduction.

@see Reducer
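A minimal sketch of a custom Partitioner (assuming the classic org.apache.hadoop.mapred.Partitioner interface; this partitions on the key's hash, much like the default HashPartitioner does):

  // Sketch of a hash-based Partitioner for Text keys.
  public class MyPartitioner implements Partitioner<Text, Text> {

    public void configure(JobConf job) {
      // no configuration needed for this sketch
    }

    public int getPartition(Text key, Text value, int numPartitions) {
      // Mask off the sign bit so the partition index is always non-negative.
      return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }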
@return progress, from 0.0 to 1.0.
@throws IOException

RecordReader, typically, converts the byte-oriented view of the input, provided by the InputSplit, and presents a record-oriented view for the {@link Mapper} & {@link Reducer} tasks for processing. It thus assumes the responsibility of processing record boundaries and presenting the tasks with keys and values.

RecordWriter implementations write the job outputs to the {@link FileSystem}.
@see OutputFormat
The framework calls this method for each <key, (list of values)> pair in the grouped inputs. Output values must be of the same type as input values. Input keys must not be altered. Typically all values are combined into zero or one value.

Output pairs are collected with calls to {@link OutputCollector#collect(Object,Object)}.

Applications can use the {@link Reporter} provided to report progress or just indicate that they are alive. In scenarios where the application takes an insignificant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. The other way of avoiding this is to set mapred.task.timeout to a high-enough value (or even zero for no time-outs).

@param key the key.
@param values the list of values to reduce.
@param output to collect keys and combined values.
@param reporter facility to report progress.

The number of Reducers for the job is set by the user via {@link JobConf#setNumReduceTasks(int)}. Reducer implementations can access the {@link JobConf} for the job via the {@link JobConfigurable#configure(JobConf)} method and initialize themselves. Similarly they can use the {@link Closeable#close()} method for de-initialization.

Reducer has 3 primary phases: shuffle, sort and reduce.

Shuffle: Reducer is input the grouped output of a {@link Mapper}. In this phase the framework, for each Reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.

Sort: The framework groups Reducer inputs by keys (since different Mappers may have output the same key) in this stage. The shuffle and sort phases occur simultaneously i.e. while outputs are being fetched they are merged.

If equivalence rules for keys while grouping the intermediates are different from those for grouping keys before reduction, then one may specify a Comparator via {@link JobConf#setOutputValueGroupingComparator(Class)}. Since {@link JobConf#setOutputKeyComparatorClass(Class)} can be used to control how intermediate keys are grouped, these can be used in conjunction to simulate secondary sort on values.

Reduce: In this phase the {@link #reduce(Object, Iterator, OutputCollector, Reporter)} method is called for each <key, (list of values)> pair in the grouped inputs.

The output of the reduce task is typically written to the {@link FileSystem} via {@link OutputCollector#collect(Object, Object)}.

The output of the Reducer is not re-sorted.

Example:

  public class MyReducer<K extends WritableComparable, V extends Writable>
      extends MapReduceBase implements Reducer<K, V, K, V> {

    static enum MyCounters { NUM_RECORDS }

    private String reduceTaskId;
    private int noKeys = 0;

    public void configure(JobConf job) {
      reduceTaskId = job.get("mapred.task.id");
    }

    public void reduce(K key, Iterator<V> values,
                       OutputCollector<K, V> output, Reporter reporter)
        throws IOException {
      // Process
      int noValues = 0;
      while (values.hasNext()) {
        V value = values.next();

        // Increment the no. of values for this key
        ++noValues;

        // Process the <key, value> pair (assume this takes a while)
        // ...
        // ...

        // Let the framework know that we are alive, and kicking!
        if ((noValues%10) == 0) {
          reporter.progress();
        }

        // Process some more
        // ...
        // ...

        // Output the <key, value>
        output.collect(key, value);
      }

      // Increment the no. of <key, list of values> pairs processed
      ++noKeys;

      // Increment counters
      reporter.incrCounter(MyCounters.NUM_RECORDS, 1);

      // Every 100 keys update application-level status
      if ((noKeys%100) == 0) {
        reporter.setStatus(reduceTaskId + " processed " + noKeys);
      }
    }
  }

@see Mapper
@see Partitioner
@see Reporter
@see MapReduceBase
Applications can use the Reporter provided to report progress or just indicate that they are alive. In scenarios where the application takes an insignificant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task. Applications can also update {@link Counters} via the provided Reporter.

false.
@throws IOException

false.
@throws IOException

Clients can get hold of RunningJob via the {@link JobClient} and then query the running-job for details such as name, configuration, progress etc.
A Map-Reduce job usually splits the input data-set into independent chunks which are processed by map tasks in a completely parallel manner, followed by reduce tasks which aggregate their output. Typically both the input and the output of the job are stored in a {@link org.apache.hadoop.fs.FileSystem}. The framework takes care of monitoring tasks and re-executing failed ones. Since, usually, the compute nodes and the storage nodes are the same i.e. Hadoop's Map-Reduce framework and Distributed FileSystem are running on the same set of nodes, tasks are effectively scheduled on the nodes where data is already present, resulting in very high aggregate bandwidth across the cluster.

The Map-Reduce framework operates exclusively on <key, value> pairs i.e. the input to the job is viewed as a set of <key, value> pairs and the output as another, possibly different, set of <key, value> pairs. The keys and values have to be serializable as {@link org.apache.hadoop.io.Writable}s and additionally the keys have to be {@link org.apache.hadoop.io.WritableComparable}s in order to facilitate grouping by the framework.

Data flow:

  (input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)

Applications typically implement the {@link org.apache.hadoop.mapred.Mapper#map(Object, Object, OutputCollector, Reporter)} and {@link org.apache.hadoop.mapred.Reducer#reduce(Object, Iterator, OutputCollector, Reporter)} methods. The application-writer also specifies various facets of the job such as input and output locations, the Partitioner, InputFormat & OutputFormat implementations to be used etc. as a {@link org.apache.hadoop.mapred.JobConf}. The client program, {@link org.apache.hadoop.mapred.JobClient}, then submits the job to the framework and optionally monitors it.

The framework spawns one map task per {@link org.apache.hadoop.mapred.InputSplit} generated by the {@link org.apache.hadoop.mapred.InputFormat} of the job and calls {@link org.apache.hadoop.mapred.Mapper#map(Object, Object, OutputCollector, Reporter)} with each <key, value> pair read by the {@link org.apache.hadoop.mapred.RecordReader} from the InputSplit for the task. The intermediate outputs of the maps are then grouped by keys and optionally aggregated by the combiner. The key space of intermediate outputs is partitioned by the {@link org.apache.hadoop.mapred.Partitioner}, where the number of partitions is exactly the number of reduce tasks for the job.

The reduce tasks fetch the sorted intermediate outputs of the maps, via http, merge the <key, value> pairs and call {@link org.apache.hadoop.mapred.Reducer#reduce(Object, Iterator, OutputCollector, Reporter)} for each <key, list of values> pair. The output of the reduce tasks is stored on the FileSystem by the {@link org.apache.hadoop.mapred.RecordWriter} provided by the {@link org.apache.hadoop.mapred.OutputFormat} of the job.

Map-Reduce application to perform a distributed grep:
  public class Grep extends Configured implements Tool {

    // map: Search for the pattern specified by 'mapred.mapper.regex' &
    //      'mapred.mapper.regex.group'
    class GrepMapper<K, Text> extends MapReduceBase
        implements Mapper<K, Text, Text, LongWritable> {

      private Pattern pattern;
      private int group;

      public void configure(JobConf job) {
        pattern = Pattern.compile(job.get("mapred.mapper.regex"));
        group = job.getInt("mapred.mapper.regex.group", 0);
      }

      public void map(K key, Text value,
                      OutputCollector<Text, LongWritable> output, Reporter reporter)
          throws IOException {
        String text = value.toString();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
          output.collect(new Text(matcher.group(group)), new LongWritable(1));
        }
      }
    }

    // reduce: Count the number of occurrences of the pattern
    class GrepReducer<K> extends MapReduceBase
        implements Reducer<K, LongWritable, K, LongWritable> {

      public void reduce(K key, Iterator<LongWritable> values,
                         OutputCollector<K, LongWritable> output, Reporter reporter)
          throws IOException {
        // sum all values for this key
        long sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }

        // output sum
        output.collect(key, new LongWritable(sum));
      }
    }

    public int run(String[] args) throws Exception {
      if (args.length < 3) {
        System.out.println("Grep <inDir> <outDir> <regex> [<group>]");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
      }

      JobConf grepJob = new JobConf(getConf(), Grep.class);

      grepJob.setJobName("grep");

      grepJob.setInputPath(new Path(args[0]));
      grepJob.setOutputPath(new Path(args[1]));

      grepJob.setMapperClass(GrepMapper.class);
      grepJob.setCombinerClass(GrepReducer.class);
      grepJob.setReducerClass(GrepReducer.class);

      grepJob.set("mapred.mapper.regex", args[2]);
      if (args.length == 4)
        grepJob.set("mapred.mapper.regex.group", args[3]);

      grepJob.setOutputFormat(SequenceFileOutputFormat.class);
      grepJob.setOutputKeyClass(Text.class);
      grepJob.setOutputValueClass(LongWritable.class);

      JobClient.runJob(grepJob);

      return 0;
    }

    public static void main(String[] args) throws Exception {
      int res = ToolRunner.run(new Configuration(), new Grep(), args);
      System.exit(res);
    }
  }
Notice how the data-flow of the above grep job is very similar to doing the same via the unix pipeline:

  cat input/* | grep | sort | uniq -c > out

  input  | map  | shuffle | reduce > out

Hadoop Map-Reduce applications need not be written in Java(TM) only. Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer. Hadoop Pipes is a SWIG-compatible C++ API to implement Map-Reduce applications (non JNI(TM) based).

See Google's original Map/Reduce paper for background information.

Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
The attached code offers the following interface to users of these classes.

  property                   | required | value
  ---------------------------|----------|------------------------------------------------------
  mapred.join.expr           | yes      | Join expression to effect over input data
  mapred.join.keycomparator  | no       | WritableComparator class to use for comparing keys
  mapred.join.define.<ident> | no       | Class mapped to identifier in join expression

The join expression understands the following grammar:

  func ::= <ident>([<func>,]*<func>)
  func ::= tbl(<class>,"<path>")

Operations included in this patch are partitioned into one of two types: join operations emitting tuples and "multi-filter" operations emitting a single value from (but not necessarily included in) a set of input values. For a given key, each operation will consider the cross product of all values for all sources at that node.

Identifiers supported by default:

  identifier | type        | description
  -----------|-------------|-----------------------------------------------------------
  inner      | Join        | Full inner join
  outer      | Join        | Full outer join
  override   | MultiFilter | For a given key, prefer values from the rightmost source

A user of this class must set the InputFormat for the job to CompositeInputFormat and define a join expression accepted by the preceding grammar. For example, both of the following are acceptable:

  inner(tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,
            "hdfs://host:8020/foo/bar"),
        tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,
            "hdfs://host:8020/foo/baz"))

  outer(override(tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,
                     "hdfs://host:8020/foo/bar"),
                 tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,
                     "hdfs://host:8020/foo/baz")),
        tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,
            "hdfs://host:8020/foo/rab"))

CompositeInputFormat includes a handful of convenience methods to aid construction of these verbose statements.

As in the second example, joins may be nested. Users may provide a comparator class in the mapred.join.keycomparator property to specify the ordering of their keys, or accept the default comparator as returned by WritableComparator.get(keyclass).

Users can specify their own join operations, typically by overriding JoinRecordReader or MultiFilterRecordReader and mapping that class to an identifier in the join expression using the mapred.join.define.<ident> property, where <ident> is the identifier appearing in the join expression. Users may elect to emit or modify values passing through their join operation. Consulting the existing operations for guidance is recommended. Adding arguments is considerably more complex (and only partially supported), as one must also add a Node type to the parse tree. One is probably better off extending RecordReader in most cases.
JIRA

Map implementations using this MapRunnable must be thread-safe.

The Map-Reduce job has to be configured to use this MapRunnable class (using the JobConf.setMapRunnerClass method), and the number of threads the thread-pool can use is set with the mapred.map.multithreadedrunner.threads property; its default value is 10 threads.
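For example, a minimal sketch of that configuration (MultithreadedMapRunner lives in org.apache.hadoop.mapred.lib in this API):

  // Sketch: run the job's Mapper with a thread pool of 20 threads per map task.
  JobConf job = new JobConf(new Configuration(), MyJob.class);
  job.setMapRunnerClass(MultithreadedMapRunner.class);
  job.setInt("mapred.map.multithreadedrunner.threads", 20);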
The aggregate framework provides a static function for creating such a map/reduce job:
public static JobConf createValueAggregatorJob(String args[]) throws IOException;
To call this function, the user needs to pass in arguments specifying the input directories, the output directory, the number of reducers, the input data format (textinputformat or sequencefileinputformat), and a file specifying the user plugin class(es) to be loaded by the mapper. A user plugin class is responsible for specifying what aggregators to use and what values go to which aggregators. A plugin class must implement the following interface:
public interface ValueAggregatorDescriptor {
    public ArrayList<Entry> generateKeyValPairs(Object key, Object value);
    public void configure(JobConf job);
}
Function generateKeyValPairs will generate aggregation key/value pairs for the input key/value pair. Each aggregation key encodes two pieces of information: the aggregation type and the aggregation ID. The value is the value to be aggregated onto the aggregation ID according to the aggregation type. Here is a simple example user plugin class for counting the words in the input texts:
public class WordCountAggregatorDescriptor extends ValueAggregatorBaseDescriptor {
    public ArrayList<Entry> generateKeyValPairs(Object key, Object val) {
        String words[] = val.toString().split(" |\t");
        ArrayList<Entry> retv = new ArrayList<Entry>();
        for (int i = 0; i < words.length; i++) {
            retv.add(generateEntry(LONG_VALUE_SUM, words[i], ONE));
        }
        return retv;
    }
    public void configure(JobConf job) {}
}
In the above code, LONG_VALUE_SUM is a string denoting the aggregation type LongValueSum, which sums over long values. ONE denotes the string "1". Function generateEntry(LONG_VALUE_SUM, words[i], ONE) will interpret the first argument as an aggregation type, the second as an aggregation ID, and the third argument as the value to be aggregated. The output will look like "LongValueSum:xxxx", where xxxx is the string value of words[i]. The value of the entry is "1". The mapper will call generateKeyValPairs(Object key, Object val) for each input key/value pair to generate the desired aggregation ID/value pairs. The downstream combiner/reducer will interpret these pairs as adding one to the aggregator xxxx.
Class ValueAggregatorBaseDescriptor is a base class that user plugin classes can extend. Here is the XML fragment specifying the user plugin class:
<property>
    <name>aggregator.descriptor.num</name>
    <value>1</value>
</property>
<property>
    <name>aggregator.descriptor.0</name>
    <value>UserDefined,org.apache.hadoop.mapred.lib.aggregate.examples.WordCountAggregatorDescriptor</value>
</property>
Class ValueAggregatorBaseDescriptor itself provides a default implementation for generateKeyValPairs:
public ArrayList<Entry> generateKeyValPairs(Object key, Object val) {
    ArrayList<Entry> retv = new ArrayList<Entry>();
    String countType = LONG_VALUE_SUM;
    String id = "record_count";
    retv.add(generateEntry(countType, id, ONE));
    return retv;
}
Thus, if no user plugin class is specified, the default behavior of the map/reduce job is to count the number of records (lines) in the input files.
During runtime, the mapper will invoke the generateKeyValPairs function for each input key/value pair, and emit the generated key/value pairs:
public void map(WritableComparable key, Writable value,
                OutputCollector output, Reporter reporter) throws IOException {
    Iterator iter = this.aggregatorDescriptorList.iterator();
    while (iter.hasNext()) {
        ValueAggregatorDescriptor ad = (ValueAggregatorDescriptor) iter.next();
        Iterator<Entry> ens = ad.generateKeyValPairs(key, value).iterator();
        while (ens.hasNext()) {
            Entry en = ens.next();
            output.collect((WritableComparable) en.getKey(), (Writable) en.getValue());
        }
    }
}
The reducer will create an aggregator object for each key/value list pair, and perform the appropriate aggregation. At the end, it will emit the aggregator's results:
public void reduce(WritableComparable key, Iterator values,
                   OutputCollector output, Reporter reporter) throws IOException {
    String keyStr = key.toString();
    int pos = keyStr.indexOf(ValueAggregatorDescriptor.TYPE_SEPARATOR);
    String type = keyStr.substring(0, pos);
    keyStr = keyStr.substring(pos + ValueAggregatorDescriptor.TYPE_SEPARATOR.length());
    ValueAggregator aggregator = ValueAggregatorBaseDescriptor.generateValueAggregator(type);
    while (values.hasNext()) {
        aggregator.addNextValue(values.next());
    }
    String val = aggregator.getReport();
    key = new Text(keyStr);
    output.collect(key, new Text(val));
}
In order to be able to use a combiner, all of the aggregators used must be associative and commutative. The following are the types supported:
LongValueSum: sum over long values
DoubleValueSum: sum over float/double values
UniqValueCount: count the number of distinct values
ValueHistogram: compute the histogram of values
To create and run an application, the user needs to do the following:
1. Implement a user plugin class:
import org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorBaseDescriptor;
import org.apache.hadoop.mapred.JobConf;
public class WordCountAggregatorDescriptor extends ValueAggregatorBaseDescriptor {
    public void map(WritableComparable key, Writable value,
                    OutputCollector output, Reporter reporter) throws IOException {
    }
    public void configure(JobConf job) {
    }
}
2. Create an xml file specifying the user plugin.
3. Compile your java class and create a jar file, say wc.jar.
Finally, run the job:
hadoop jar wc.jar org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob indirs outdir numofreducers textinputformat|sequencefileinputformat spec_file
]]>
bin/hadoop pipes \
  [-conf path] \
  [-input inputDir] \
  [-output outputDir] \
  [-jar applicationJarFile] \
  [-inputformat class] \
  [-map class] \
  [-partitioner class] \
  [-reduce class] \
  [-writer class] \
  [-program program url]
The application programs link against a thin C++ wrapper library that handles the communication with the rest of the Hadoop system. The C++ interface is "swigable" so that interfaces can be generated for Python and other scripting languages. All of the C++ functions and classes are in the HadoopPipes namespace. The job may consist of any combination of Java and C++ RecordReaders, Mappers, Partitioners, Combiners, Reducers, and RecordWriters.
Hadoop Pipes has a generic Java class for handling the mapper and reducer (PipesMapRunner and PipesReducer). They fork off the application program and communicate with it over a socket. The communication is handled by the C++ wrapper library and the PipesMapRunner and PipesReducer.
The application program passes in a factory object that can create the various objects needed by the framework to the runTask function. The framework creates the Mapper or Reducer as appropriate and calls the map or reduce method to invoke the application's code. The JobConf is available to the application.
The Mapper and Reducer objects get all of their inputs, outputs, and context via context objects. The advantage of using the context objects is that their interface can be extended with additional methods without breaking clients. Although this interface is different from the current Java interface, the plan is to migrate the Java interface in this direction.
Although the Java implementation is typed, the C++ interface treats keys and values as plain byte buffers. Since STL strings provide precisely the right functionality and are standard, they will be used. The decision not to use stronger types was made to simplify the interface.
The application can also define combiner functions. The combiner will be run locally by the framework in the application process to avoid the round trip to the Java process and back. Because the compare function is not available in C++, the combiner will use memcmp to sort the inputs to the combiner. This is not as general as the Java equivalent, which uses the user's comparator, but should cover the majority of the use cases. As the map function outputs key/value pairs, they will be buffered. When the buffer is full, it will be sorted and passed to the combiner. The output of the combiner will be sent to the Java process.
The application can also set a partition function to control which key is given to a particular reduce. If a partition function is not defined, the Java one will be used. The partition function will be called by the C++ framework before the key/value pair is sent back to Java.]]>
org.apache.hadoop.metrics.spi.NullContext
, which is a
dummy "no-op" context which will cause all metric data to be discarded.
@param contextName the name of the context
@return the named MetricsContext]]>
hadoop-metrics.properties
exists on the class path. If it
exists, it must be in the format defined by java.util.Properties, and all
the properties in the file are set as attributes on the newly created
ContextFactory instance.
@return the singleton ContextFactory instance]]>
recordName
is not in that set.
@param recordName the name of the record
@throws MetricsException if recordName conflicts with configuration data]]>
update()
to pass the record to the
client library.
Metric data is not immediately sent to the metrics system
each time that update()
is called.
An internal table is maintained, identified by the record name. This
table has columns
corresponding to the tag and the metric names, and rows
corresponding to each unique set of tag values. An update
either modifies an existing row in the table, or adds a new row with a set of
tag values that are different from all the other rows. Note that if there
are no tags, then there can be at most one row in the table.
Once a row is added to the table, its data will be sent to the metrics system
on every timer period, whether or not it has been updated since the previous
timer period. If this is inappropriate, for example if metrics were being
reported by some transient object in an application, the remove()
method can be used to remove the row and thus stop the data from being
sent.
Note that the update()
method is atomic. This means that it is
safe for different threads to be updating the same metric. More precisely,
it is OK for different threads to call update()
on MetricsRecord instances
with the same set of tag names and tag values. Different threads should
not use the same MetricsRecord instance at the same time.]]>
org.apache.hadoop.metrics.spi
org.apache.hadoop.metrics.file
org.apache.hadoop.metrics.ganglia
private ContextFactory contextFactory = ContextFactory.getFactory();

void reportMyMetric(float myMetric) {
    MetricsContext myContext = contextFactory.getContext("myContext");
    MetricsRecord myRecord = myContext.getRecord("myRecord");
    myRecord.setMetric("myMetric", myMetric);
    myRecord.update();
}
In this example there are three names:
private MetricsRecord diskStats =
    contextFactory.getContext("myContext").getRecord("diskStats");

void reportDiskMetrics(String diskName, float diskBusy, float diskUsed) {
    diskStats.setTag("diskName", diskName);
    diskStats.setMetric("diskBusy", diskBusy);
    diskStats.setMetric("diskUsed", diskUsed);
    diskStats.update();
}
MetricsRecord.update()
is called. Instead it is stored in an
internal table, and the contents of the table are sent periodically.
This can be important for two reasons:
registerUpdater()
method. The benefit of this
versus using java.util.Timer
is that the callbacks will be done
immediately before sending the data, making the data as current as possible.
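A sketch of that pattern, assuming the org.apache.hadoop.metrics Updater interface and the registerUpdater() and createRecord() methods of MetricsContext (the context, record and metric names are illustrative):
class RequestMetrics implements Updater {
    private final MetricsRecord record;
    private int requestCount;

    RequestMetrics(MetricsContext context) {
        record = context.createRecord("requests");
        context.registerUpdater(this);
    }

    synchronized void incrRequests() { requestCount++; }

    // Called by the metrics system immediately before each send,
    // so the reported value is as current as possible.
    public synchronized void doUpdates(MetricsContext unused) {
        record.setMetric("requestCount", requestCount);
        record.update();
        requestCount = 0;
    }
}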
ContextFactory factory = ContextFactory.getFactory();
... examine and/or modify factory attributes ...
MetricsContext context = factory.getContext("myContext");
The factory attributes can be examined and modified using the following
ContextFactory
methods:
Object getAttribute(String attributeName)
String[] getAttributeNames()
void setAttribute(String name, Object value)
void removeAttribute(String attributeName)
ContextFactory.getFactory()
initializes the factory attributes by
reading the properties file hadoop-metrics.properties
if it exists
on the class path.
A factory attribute named:
contextName.class
should have as its value the fully qualified name of the class to be instantiated by a call of the
ContextFactory
method
getContext(contextName)
. If this factory attribute is not
specified, the default is to instantiate
org.apache.hadoop.metrics.file.FileContext
.
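The same attribute can also be set programmatically; for example, to select the Ganglia implementation (a sketch; the context name is illustrative):
ContextFactory factory = ContextFactory.getFactory();
factory.setAttribute("myContext.class",
    "org.apache.hadoop.metrics.ganglia.GangliaContext");
MetricsContext context = factory.getContext("myContext");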
Other factory attributes are specific to a particular implementation of this
API and are documented elsewhere. For example, configuration attributes for
the file and Ganglia implementations can be found in the javadoc for
their respective packages.]]>
myContextName.fileName=/tmp/metrics.log
myContextName.period=5]]>
recordName
is not in that set.
@param recordName the name of the record
@throws MetricsException if recordName conflicts with configuration data]]>
emitRecord
method in order to transmit
the data. ]]>
remove()
.]]>
org.apache.hadoop.metrics.ganglia
.
Plugging in an implementation involves writing a concrete subclass of
AbstractMetricsContext
. The subclass should get its
configuration information using the getAttribute(attributeName)
method.]]>
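A minimal sketch of such a subclass (treat the details as illustrative; only the emitRecord hook and the getAttribute lookup follow the pattern described above):
public class StdoutContext extends AbstractMetricsContext {
    private String prefix;

    public void init(String contextName, ContextFactory factory) {
        super.init(contextName, factory);
        // Implementation-specific configuration comes from factory attributes.
        prefix = getAttribute("prefix");
    }

    protected void emitRecord(String contextName, String recordName,
                              OutputRecord record) throws IOException {
        System.out.println(prefix + " " + contextName + "." + recordName
            + " metrics=" + record.getMetricNames());
    }
}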
recfile = *include module *record
include = "include" path
path = (relative-path / absolute-path)
module = "module" module-name
module-name = name *("." name)
record := "class" name "{" 1*(field) "}"
field := type name ";"
name := ALPHA (ALPHA / DIGIT / "_" )*
type := (ptype / ctype)
ptype := ("byte" / "boolean" / "int" |
"long" / "float" / "double"
"ustring" / "buffer")
ctype := (("vector" "<" type ">") /
("map" "<" type "," type ">" ) ) / name)
A DDL file describes one or more record types. It begins with zero or
more include declarations, a single mandatory module declaration
followed by zero or more class declarations. The semantics of each of
these declarations are described below:
module links {
class Link {
ustring URL;
boolean isRelative;
ustring anchorText;
};
}
include "links.jr"
module outlinks {
class OutLinks {
ustring baseURL;
vector outLinks;
};
}
$ rcc -l C++ ...
namespace hadoop {
enum RecFormat { kBinary, kXML, kCSV };
class InStream {
public:
virtual ssize_t read(void *buf, size_t n) = 0;
};
class OutStream {
public:
virtual ssize_t write(const void *buf, size_t n) = 0;
};
class IOError : public runtime_error {
public:
explicit IOError(const std::string& msg);
};
class IArchive;
class OArchive;
class RecordReader {
public:
RecordReader(InStream& in, RecFormat fmt);
virtual ~RecordReader(void);
virtual void read(Record& rec);
};
class RecordWriter {
public:
RecordWriter(OutStream& out, RecFormat fmt);
virtual ~RecordWriter(void);
virtual void write(Record& rec);
};
class Record {
public:
virtual std::string type(void) const = 0;
virtual std::string signature(void) const = 0;
protected:
virtual bool validate(void) const = 0;
virtual void
serialize(OArchive& oa, const std::string& tag) const = 0;
virtual void
deserialize(IArchive& ia, const std::string& tag) = 0;
};
}
namespace links {
class Link : public hadoop::Record {
// ....
};
};
Each field within the record will cause the generation of a private member
declaration of the appropriate type in the class declaration, and one or more
accessor methods. The generated class will implement the serialize and
deserialize methods defined in hadoop::Record. It will also
implement the inspection methods type and signature from
hadoop::Record. A default constructor and virtual destructor will also
be generated. Serialization code will read/write records into streams that
implement the hadoop::InStream and the hadoop::OutStream interfaces.
For each member of a record an accessor method is generated that returns
either the member or a reference to the member. For members that are returned
by value, a setter method is also generated. This is true for primitive
data members of the types byte, int, long, boolean, float and
double. For example, for an int field called MyField the following
code is generated.
...
private:
int32_t mMyField;
...
public:
int32_t getMyField(void) const {
return mMyField;
};
void setMyField(int32_t m) {
mMyField = m;
};
...
For a ustring, buffer, or composite field, the generated code
contains only accessors that return a reference to the field. A const
and a non-const accessor are generated. For example:
...
private:
std::string mMyBuf;
...
public:
std::string& getMyBuf() {
return mMyBuf;
};
const std::string& getMyBuf() const {
return mMyBuf;
};
...
module inclrec {
class RI {
int I32;
double D;
ustring S;
};
}
and the testrec.jr file contains:
include "inclrec.jr"
module testrec {
class R {
vector<float> VF;
RI Rec;
buffer Buf;
};
}
Then the invocation of rcc such as:
$ rcc -l c++ inclrec.jr testrec.jr
will result in generation of four files:
inclrec.jr.{cc,hh} and testrec.jr.{cc,hh}.
The inclrec.jr.hh will contain:
#ifndef _INCLREC_JR_HH_
#define _INCLREC_JR_HH_
#include "recordio.hh"
namespace inclrec {
class RI : public hadoop::Record {
private:
int32_t I32;
double D;
std::string S;
public:
RI(void);
virtual ~RI(void);
virtual bool operator==(const RI& peer) const;
virtual bool operator<(const RI& peer) const;
virtual int32_t getI32(void) const { return I32; }
virtual void setI32(int32_t v) { I32 = v; }
virtual double getD(void) const { return D; }
virtual void setD(double v) { D = v; }
virtual std::string& getS(void) { return S; }
virtual const std::string& getS(void) const { return S; }
virtual std::string type(void) const;
virtual std::string signature(void) const;
protected:
virtual void serialize(hadoop::OArchive& a) const;
virtual void deserialize(hadoop::IArchive& a);
};
} // end namespace inclrec
#endif /* _INCLREC_JR_HH_ */
The testrec.jr.hh file will contain:
#ifndef _TESTREC_JR_HH_
#define _TESTREC_JR_HH_
#include "inclrec.jr.hh"
namespace testrec {
class R : public hadoop::Record {
private:
std::vector<float> VF;
inclrec::RI Rec;
std::string Buf;
public:
R(void);
virtual ~R(void);
virtual bool operator==(const R& peer) const;
virtual bool operator<(const R& peer) const;
virtual std::vector<float>& getVF(void);
virtual const std::vector<float>& getVF(void) const;
virtual std::string& getBuf(void);
virtual const std::string& getBuf(void) const;
virtual inclrec::RI& getRec(void);
virtual const inclrec::RI& getRec(void) const;
virtual void serialize(hadoop::OArchive& a) const;
virtual void deserialize(hadoop::IArchive& a);
virtual std::string type(void) const;
virtual std::string signature(void) const;
};
}; // end namespace testrec
#endif /* _TESTREC_JR_HH_ */
DDL Type C++ Type Java Type
boolean bool boolean
byte int8_t byte
int int32_t int
long int64_t long
float float float
double double double
ustring std::string java.lang.String
buffer std::string org.apache.hadoop.record.Buffer
class type class type class type
vector std::vector java.util.ArrayList
map std::map java.util.TreeMap
record = primitive / struct / vector / map
primitive = boolean / int / long / float / double / ustring / buffer
boolean = "T" / "F"
int = ["-"] 1*DIGIT
long = ";" ["-"] 1*DIGIT
float = ["-"] 1*DIGIT "." 1*DIGIT ["E" / "e" ["-"] 1*DIGIT]
double = ";" ["-"] 1*DIGIT "." 1*DIGIT ["E" / "e" ["-"] 1*DIGIT]
ustring = "'" *(UTF8 char except NULL, LF, % and , / "%00" / "%0a" / "%25" / "%2c" )
buffer = "#" *(BYTE except NULL, LF, % and , / "%00" / "%0a" / "%25" / "%2c" )
struct = "s{" record *("," record) "}"
vector = "v{" [record *("," record)] "}"
map = "m{" [*(record "," record)] "}"
class {
int MY_INT; // value 5
vector<float> MY_VEC; // values 0.1, -0.89, 2.45e4
buffer MY_BUF; // value '\00\n\tabc%'
}
is serialized as
<value>
<struct>
<member>
<name>MY_INT</name>
<value><i4>5</i4></value>
</member>
<member>
<name>MY_VEC</name>
<value>
<array>
<data>
<value><ex:float>0.1</ex:float></value>
<value><ex:float>-0.89</ex:float></value>
<value><ex:float>2.45e4</ex:float></value>
</data>
</array>
</value>
</member>
<member>
<name>MY_BUF</name>
<value><string>%00\n\tabc%25</string></value>
</member>
</struct>
</value>
]]>
The task requires the file
or the nested fileset element to be
specified. Optional attributes are language
(set the output
language, default is "java"),
destdir
(name of the destination directory for generated java/c++
code, default is ".") and failonerror
(specifies error handling
behavior; default is true).
<recordcc destdir="${basedir}/gensrc" language="java"> <fileset include="**\/*.jr" /> </recordcc>]]>
conf
as a property attr
The String starts with the user name followed by the default group names,
and other group names.
@param conf configuration
@param attr property name
@param ugi a UnixUserGroupInformation]]>
attr
as a comma separated string that starts
with the user name followed by group names.
If the property name is not defined, return null.
It's assumed that there is only one UGI per user. If this user already
has a UGI in the ugi map, return the ugi in the map.
Otherwise, construct a UGI from the configuration, store it in the
ugi map and return it.
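For example (a sketch of the save/read round trip, assuming the saveToConf method documented above; the property name and user are illustrative):
Configuration conf = new Configuration();
UnixUserGroupInformation ugi =
    new UnixUserGroupInformation("alice", new String[] {"users", "staff"});
UnixUserGroupInformation.saveToConf(conf, "my.ugi.property", ugi);
// Later, possibly in another component holding the same Configuration:
UnixUserGroupInformation restored =
    UnixUserGroupInformation.readFromConf(conf, "my.ugi.property");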
@param conf configuration
@param attr property name
@return a UnixUGI
@throws LoginException if the stored string is ill-formatted.]]>
]]>
to parse only the generic Hadoop
arguments.
The array of string arguments other than the generic arguments can be
obtained by {@link #getRemainingArgs()}.
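For example (a sketch; args is whatever was passed to main):
Configuration conf = new Configuration();
GenericOptionsParser parser = new GenericOptionsParser(conf, args);
// conf now reflects -fs, -jt, -D and -conf options;
// everything else is left for the application.
String[] toolArgs = parser.getRemainingArgs();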
@param conf the Configuration
to modify.
@param args command-line arguments.]]>
CommandLine
object can be obtained by
{@link #getCommandLine()}.
@param conf the configuration to modify
@param options options built by the caller
@param args User-specified arguments]]>
CommandLine
representing list of arguments
parsed against Options descriptor.]]>
GenericOptionsParser
recognizes several standard command
line arguments, enabling applications to easily specify a namenode, a
jobtracker, additional configuration resources etc.
The supported generic options are:
-conf <configuration file>     specify a configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
The general command line syntax is:
bin/hadoop command [genericOptions] [commandOptions]
Generic command line arguments might modify
Configuration
objects, given to constructors.
The functionality is implemented using Commons CLI.
Examples:
$ bin/hadoop dfs -fs darwin:8020 -ls /data
    list /data directory in dfs with namenode darwin:8020
$ bin/hadoop dfs -D fs.default.name=darwin:8020 -ls /data
    list /data directory in dfs with namenode darwin:8020
$ bin/hadoop dfs -conf hadoop-site.xml -ls /data
    list /data directory in dfs with conf specified in hadoop-site.xml
$ bin/hadoop job -D mapred.job.tracker=darwin:50020 -submit job.xml
    submit a job to job tracker darwin:50020
$ bin/hadoop job -jt darwin:50020 -submit job.xml
    submit a job to job tracker darwin:50020
$ bin/hadoop job -jt local -submit job.xml
    submit a job to local runner
@see Tool
@see ToolRunner]]>
T
.
@param Class<T>
]]>
T[]
.
@param c the Class object of the items in the list
@param list the list to convert]]>
T[]
.
@param list the list to convert
@throws ArrayIndexOutOfBoundsException if the list is empty.
Use {@link #toArray(Class, List)} if the list may be empty.]]>
false
]]>
false
otherwise.]]>
{ o = pq.pop(); o.change(); pq.push(o); }]]>
Progressable
to explicitly report progress to the Hadoop framework. This is especially
important for operations which take a significant amount of time since,
in lieu of the reported progress, the framework has to assume that an error
has occurred and time-out the operation.]]>
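For example, a long-running copy loop might report progress like this (a sketch; the stream copy itself is illustrative):
void copyBytes(InputStream in, OutputStream out, Progressable progress)
    throws IOException {
  byte[] buf = new byte[64 * 1024];
  int n;
  while ((n = in.read(buf)) > 0) {
    out.write(buf, 0, n);
    progress.progress();   // tell the framework the operation is still alive
  }
}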
null
.
@param job job configuration
@return a String[]
with the ulimit command arguments or
null
if we are running on a non *nix platform or
if the limit is unspecified.]]>
du
or
df
. It also offers facilities to gate commands by
time-intervals.]]>
escapeChar
@param str string
@param escapeChar escape char
@param charToEscape the char to be escaped
@return an escaped string]]>
escapeChar
@param str string
@param escapeChar escape char
@param charToEscape the escaped char
@return an unescaped string]]>
Tool
, is the standard for any Map-Reduce tool/application.
The tool/application should delegate the handling of
standard command-line options to {@link ToolRunner#run(Tool, String[])}
and only handle its custom arguments.
Here is how a typical Tool
is implemented:
public class MyApp extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);

        // Process custom command-line options
        Path in = new Path(args[1]);
        Path out = new Path(args[2]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        job.setInputPath(in);
        job.setOutputPath(out);
        job.setMapperClass(MyApp.MyMapper.class);
        job.setReducerClass(MyApp.MyReducer.class);

        // Submit the job, then poll for progress until the job is complete
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}
@see GenericOptionsParser
@see ToolRunner]]>
Configuration
, or builds one if null.
Sets the Tool
's configuration with the possibly modified
version of the conf
.
@param conf Configuration
for the Tool
.
@param tool Tool
to run.
@param args command-line arguments to the tool.
@return exit code of the {@link Tool#run(String[])} method.]]>
Configuration
.
Equivalent to run(tool.getConf(), tool, args)
.
@param tool Tool
to run.
@param args command-line arguments to the tool.
@return exit code of the {@link Tool#run(String[])} method.]]>
ToolRunner
can be used to run classes implementing
Tool
interface. It works in conjunction with
{@link GenericOptionsParser} to parse the
generic hadoop command line arguments and modifies the
Configuration
of the Tool
. The
application-specific options are passed along without being modified.
@see Tool
@see GenericOptionsParser]]>