HADOOP-11558. Fix dead links to doc of hadoop-tools. Contributed by Jean-Pierre Matsumoto.
(cherry picked from commit 79426f3334)
parent e28e2e4f52
commit f9c18fd610
@@ -659,6 +659,9 @@ Release 2.7.0 - UNRELEASED
 HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt
 synchronization. (Sean Busbey via yliu)
 
+HADOOP-11558. Fix dead links to doc of hadoop-tools. (Jean-Pierre
+Matsumoto via ozawa)
+
 Release 2.6.1 - UNRELEASED
 
 INCOMPATIBLE CHANGES
@@ -43,7 +43,7 @@ The Yarn Scheduler Load Simulator (SLS) is such a tool, which can simulate large
 o
 The simulator will exercise the real Yarn `ResourceManager` removing the network factor by simulating `NodeManagers` and `ApplicationMasters` via handling and dispatching `NM`/`AMs` heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.
 
-The size of the cluster and the application load can be loaded from configuration files, which are generated from job history files directly by adopting [Apache Rumen](https://hadoop.apache.org/docs/stable/rumen.html).
+The size of the cluster and the application load can be loaded from configuration files, which are generated from job history files directly by adopting [Apache Rumen](../hadoop-rumen/Rumen.html).
 
 The simulator will produce real time metrics while executing, including:
 
@@ -201,7 +201,7 @@ To specify additional local temp directories use:
 -D mapred.system.dir=/tmp/system
 -D mapred.temp.dir=/tmp/temp
 
-**Note:** For more details on job configuration parameters see: [mapred-default.xml](./mapred-default.xml)
+**Note:** For more details on job configuration parameters see: [mapred-default.xml](../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml)
 
 $H4 Specifying Map-Only Jobs
 
@@ -322,7 +322,7 @@ More Usage Examples
 
 $H3 Hadoop Partitioner Class
 
-Hadoop has a library class, [KeyFieldBasedPartitioner](../../api/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.html), that is useful for many applications. This class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys. For example:
+Hadoop has a library class, [KeyFieldBasedPartitioner](../api/org/apache/hadoop/mapred/lib/KeyFieldBasedPartitioner.html), that is useful for many applications. This class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys. For example:
 
 hadoop jar hadoop-streaming-${project.version}.jar \
 -D stream.map.output.field.separator=. \
@@ -372,7 +372,7 @@ Sorting within each partition for the reducer(all 4 fields used for sorting)
 
 $H3 Hadoop Comparator Class
 
-Hadoop has a library class, [KeyFieldBasedComparator](../../api/org/apache/hadoop/mapreduce/lib/partition/KeyFieldBasedComparator.html), that is useful for many applications. This class provides a subset of features provided by the Unix/GNU Sort. For example:
+Hadoop has a library class, [KeyFieldBasedComparator](../api/org/apache/hadoop/mapreduce/lib/partition/KeyFieldBasedComparator.html), that is useful for many applications. This class provides a subset of features provided by the Unix/GNU Sort. For example:
 
 hadoop jar hadoop-streaming-${project.version}.jar \
 -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
@@ -406,7 +406,7 @@ Sorting output for the reducer (where second field used for sorting)
 
 $H3 Hadoop Aggregate Package
 
-Hadoop has a library package called [Aggregate](../../org/apache/hadoop/mapred/lib/aggregate/package-summary.html). Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as "sum", "max", "min" and so on over a sequence of values. Aggregate allows you to define a mapper plugin class that is expected to generate "aggregatable items" for each input key/value pair of the mappers. The combiner/reducer will aggregate those aggregatable items by invoking the appropriate aggregators.
+Hadoop has a library package called [Aggregate](../api/org/apache/hadoop/mapred/lib/aggregate/package-summary.html). Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as "sum", "max", "min" and so on over a sequence of values. Aggregate allows you to define a mapper plugin class that is expected to generate "aggregatable items" for each input key/value pair of the mappers. The combiner/reducer will aggregate those aggregatable items by invoking the appropriate aggregators.
 
 To use Aggregate, simply specify "-reducer aggregate":
 
@@ -441,7 +441,7 @@ The python program myAggregatorForKeyCount.py looks like:
 
 $H3 Hadoop Field Selection Class
 
-Hadoop has a library class, [FieldSelectionMapReduce](../../api/org/apache/hadoop/mapred/lib/FieldSelectionMapReduce.html), that effectively allows you to process text data like the unix "cut" utility. The map function defined in the class treats each input key/value pair as a list of fields. You can specify the field separator (the default is the tab character). You can select an arbitrary list of fields as the map output key, and an arbitrary list of fields as the map output value. Similarly, the reduce function defined in the class treats each input key/value pair as a list of fields. You can select an arbitrary list of fields as the reduce output key, and an arbitrary list of fields as the reduce output value. For example:
+Hadoop has a library class, [FieldSelectionMapReduce](../api/org/apache/hadoop/mapred/lib/FieldSelectionMapReduce.html), that effectively allows you to process text data like the unix "cut" utility. The map function defined in the class treats each input key/value pair as a list of fields. You can specify the field separator (the default is the tab character). You can select an arbitrary list of fields as the map output key, and an arbitrary list of fields as the map output value. Similarly, the reduce function defined in the class treats each input key/value pair as a list of fields. You can select an arbitrary list of fields as the reduce output key, and an arbitrary list of fields as the reduce output value. For example:
 
 hadoop jar hadoop-streaming-${project.version}.jar \
 -D mapreduce.map.output.key.field.separator=. \
@@ -480,7 +480,7 @@ As an example, consider the problem of zipping (compressing) a set of files acro
 
 $H3 How many reducers should I use?
 
-See MapReduce Tutorial for details: [Reducer](./MapReduceTutorial.html#Reducer)
+See MapReduce Tutorial for details: [Reducer](../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Reducer)
 
 $H3 If I set up an alias in my shell script, will that work after -mapper?
 
@@ -556,4 +556,4 @@ A streaming process can use the stderr to emit status information. To set a stat
 
 $H3 How do I get the Job variables in a streaming job's mapper/reducer?
 
-See [Configured Parameters](./MapReduceTutorial.html#Configured_Parameters). During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( \_ ). For example, mapreduce.job.id becomes mapreduce\_job\_id and mapreduce.job.jar becomes mapreduce\_job\_jar. In your code, use the parameter names with the underscores.
+See [Configured Parameters](../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Configured_Parameters). During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( \_ ). For example, mapreduce.job.id becomes mapreduce\_job\_id and mapreduce.job.jar becomes mapreduce\_job\_jar. In your code, use the parameter names with the underscores.
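
Note on the last hunk: the renaming rule it quotes (dots in parameter names become underscores) can be illustrated with a minimal Python sketch. The helper names `streaming_env_name` and `get_job_param` are hypothetical, and the sketch assumes the transformed names reach the streaming process as environment variables; only the dot-to-underscore rule is taken from the documentation text in the diff.

```python
import os


def streaming_env_name(param):
    """Apply the rule quoted in the doc: dots become underscores."""
    return param.replace(".", "_")


def get_job_param(param, default=None, environ=None):
    """Look up a job parameter by its dotted name.

    Assumption: the streaming process sees the transformed name as an
    environment variable. `environ` can be passed explicitly for testing.
    """
    env = os.environ if environ is None else environ
    return env.get(streaming_env_name(param), default)


if __name__ == "__main__":
    # Inside a streaming mapper or reducer this would read the real environment.
    print(get_job_param("mapreduce.job.id", default="<not set>"))
```

Passing the dotted name and letting the helper translate it keeps mapper code aligned with the names used in `mapred-default.xml`.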