MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort. (tucu)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1443168 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
ab16a37572
commit
17e72be6d8
|
@ -230,6 +230,9 @@ Release 2.0.3-alpha - 2013-02-06
|
||||||
MAPREDUCE-4971. Minor extensibility enhancements to Counters &
|
MAPREDUCE-4971. Minor extensibility enhancements to Counters &
|
||||||
FileOutputFormat. (Arun C Murthy via sseth)
|
FileOutputFormat. (Arun C Murthy via sseth)
|
||||||
|
|
||||||
|
MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort.
|
||||||
|
(tucu)
|
||||||
|
|
||||||
OPTIMIZATIONS
|
OPTIMIZATIONS
|
||||||
|
|
||||||
MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of
|
MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of
|
||||||
|
|
|
@ -0,0 +1,96 @@
|
||||||
|
~~ Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
~~ you may not use this file except in compliance with the License.
|
||||||
|
~~ You may obtain a copy of the License at
|
||||||
|
~~
|
||||||
|
~~ http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
~~
|
||||||
|
~~ Unless required by applicable law or agreed to in writing, software
|
||||||
|
~~ distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
~~ See the License for the specific language governing permissions and
|
||||||
|
~~ limitations under the License. See accompanying LICENSE file.
|
||||||
|
|
||||||
|
---
|
||||||
|
Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
|
||||||
|
---
|
||||||
|
---
|
||||||
|
${maven.build.timestamp}
|
||||||
|
|
||||||
|
Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
|
||||||
|
|
||||||
|
\[ {{{./index.html}Go Back}} \]
|
||||||
|
|
||||||
|
* Introduction
|
||||||
|
|
||||||
|
The pluggable shuffle and pluggable sort capabilities allow replacing the
|
||||||
|
built in shuffle and sort logic with alternate implementations. Example use
|
||||||
|
cases for this are: using a different application protocol other than HTTP
|
||||||
|
such as RDMA for shuffling data from the Map nodes to the Reducer nodes; or
|
||||||
|
replacing the sort logic with custom algorithms that enable Hash aggregation
|
||||||
|
and Limit-N query.
|
||||||
|
|
||||||
|
<<IMPORTANT:>> The pluggable shuffle and pluggable sort capabilities are
|
||||||
|
experimental and unstable. This means the provided APIs may change and break
|
||||||
|
compatibility in future versions of Hadoop.
|
||||||
|
|
||||||
|
* Implementing a Custom Shuffle and a Custom Sort
|
||||||
|
|
||||||
|
A custom shuffle implementation requires a
|
||||||
|
<<<org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService>>>
|
||||||
|
implementation class running in the NodeManagers and a
|
||||||
|
<<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> implementation class
|
||||||
|
running in the Reducer tasks.
|
||||||
|
|
||||||
|
The default implementations provided by Hadoop can be used as references:
|
||||||
|
|
||||||
|
* <<<org.apache.hadoop.mapred.ShuffleHandler>>>
|
||||||
|
|
||||||
|
* <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
|
||||||
|
|
||||||
|
A custom sort implementation requires a <<<org.apache.hadoop.mapred.MapOutputCollector>>>
|
||||||
|
implementation class running in the Mapper tasks and (optionally, depending
|
||||||
|
on the sort implementation) a <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>>
|
||||||
|
implementation class running in the Reducer tasks.
|
||||||
|
|
||||||
|
The default implementations provided by Hadoop can be used as references:
|
||||||
|
|
||||||
|
* <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>>
|
||||||
|
|
||||||
|
* <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
|
||||||
|
|
||||||
|
* Configuration
|
||||||
|
|
||||||
|
Except for the auxiliary service running in the NodeManagers serving the
|
||||||
|
shuffle (by default the <<<ShuffleHandler>>>), all the pluggable components
|
||||||
|
run in the job tasks. This means, they can be configured on per job basis.
|
||||||
|
The auxiliary service servicing the Shuffle must be configured in the
|
||||||
|
NodeManagers configuration.
|
||||||
|
|
||||||
|
** Job Configuration Properties (on per job basis):
|
||||||
|
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
| <<Property>> | <<Default Value>> | <<Explanation>> |
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
| <<<mapreduce.job.reduce.shuffle.consumer.plugin.class>>> | <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>> | The <<<ShuffleConsumerPlugin>>> implementation to use |
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
| <<<mapreduce.job.map.output.collector.class>>> | <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>> | The <<<MapOutputCollector>>> implementation to use |
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
|
||||||
|
These properties can also be set in the <<<mapred-site.xml>>> to change the default values for all jobs.
|
||||||
|
|
||||||
|
** NodeManager Configuration properties, <<<yarn-site.xml>>> in all nodes:
|
||||||
|
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
| <<Property>> | <<Default Value>> | <<Explanation>> |
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
| <<<yarn.nodemanager.aux-services>>> | <<<...,mapreduce.shuffle>>> | The auxiliary service name |
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
| <<<yarn.nodemanager.aux-services.mapreduce.shuffle.class>>> | <<<org.apache.hadoop.mapred.ShuffleHandler>>> | The auxiliary service class to use |
|
||||||
|
*--------------------------------------+---------------------+-----------------+
|
||||||
|
|
||||||
|
<<IMPORTANT:>> If setting an auxiliary service in addition the default
|
||||||
|
<<<mapreduce.shuffle>>> service, then a new service key should be added to the
|
||||||
|
<<<yarn.nodemanager.aux-services>>> property, for example <<<mapred.shufflex>>>.
|
||||||
|
Then the property defining the corresponding class must be
|
||||||
|
<<<yarn.nodemanager.aux-services.mapreduce.shufflex.class>>>.
|
||||||
|
|
|
@ -65,6 +65,7 @@
|
||||||
|
|
||||||
<menu name="MapReduce" inherit="top">
|
<menu name="MapReduce" inherit="top">
|
||||||
<item name="Encrypted Shuffle" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html"/>
|
<item name="Encrypted Shuffle" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html"/>
|
||||||
|
<item name="Pluggable Shuffle/Sort" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html"/>
|
||||||
</menu>
|
</menu>
|
||||||
|
|
||||||
<menu name="YARN" inherit="top">
|
<menu name="YARN" inherit="top">
|
||||||
|
|
Loading…
Reference in New Issue