MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort. (tucu)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1443168 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Alejandro Abdelnur 2013-02-06 19:52:25 +00:00
parent ab16a37572
commit 17e72be6d8
3 changed files with 100 additions and 0 deletions

View File

@ -230,6 +230,9 @@ Release 2.0.3-alpha - 2013-02-06
MAPREDUCE-4971. Minor extensibility enhancements to Counters &
FileOutputFormat. (Arun C Murthy via sseth)
MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort.
(tucu)
OPTIMIZATIONS
MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of

View File

@ -0,0 +1,96 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
---
---
${maven.build.timestamp}
Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
\[ {{{./index.html}Go Back}} \]
* Introduction
The pluggable shuffle and pluggable sort capabilities allow replacing the
built in shuffle and sort logic with alternate implementations. Example use
cases for this are: using a different application protocol other than HTTP
such as RDMA for shuffling data from the Map nodes to the Reducer nodes; or
replacing the sort logic with custom algorithms that enable Hash aggregation
and Limit-N query.
<<IMPORTANT:>> The pluggable shuffle and pluggable sort capabilities are
experimental and unstable. This means the provided APIs may change and break
compatibility in future versions of Hadoop.
* Implementing a Custom Shuffle and a Custom Sort
A custom shuffle implementation requires a
<<<org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService>>>
implementation class running in the NodeManagers and a
<<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> implementation class
running in the Reducer tasks.
The default implementations provided by Hadoop can be used as references:
* <<<org.apache.hadoop.mapred.ShuffleHandler>>>
* <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
A custom sort implementation requires a <<<org.apache.hadoop.mapred.MapOutputCollector>>>
implementation class running in the Mapper tasks and (optionally, depending
on the sort implementation) a <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>>
implementation class running in the Reducer tasks.
The default implementations provided by Hadoop can be used as references:
* <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>>
* <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
* Configuration
Except for the auxiliary service running in the NodeManagers serving the
shuffle (by default the <<<ShuffleHandler>>>), all the pluggable components
run in the job tasks. This means, they can be configured on per job basis.
The auxiliary service servicing the Shuffle must be configured in the
NodeManagers configuration.
** Job Configuration Properties (on per job basis):
*--------------------------------------+---------------------+-----------------+
| <<Property>> | <<Default Value>> | <<Explanation>> |
*--------------------------------------+---------------------+-----------------+
| <<<mapreduce.job.reduce.shuffle.consumer.plugin.class>>> | <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>> | The <<<ShuffleConsumerPlugin>>> implementation to use |
*--------------------------------------+---------------------+-----------------+
| <<<mapreduce.job.map.output.collector.class>>> | <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>> | The <<<MapOutputCollector>>> implementation to use |
*--------------------------------------+---------------------+-----------------+
These properties can also be set in the <<<mapred-site.xml>>> to change the default values for all jobs.
** NodeManager Configuration properties, <<<yarn-site.xml>>> in all nodes:
*--------------------------------------+---------------------+-----------------+
| <<Property>> | <<Default Value>> | <<Explanation>> |
*--------------------------------------+---------------------+-----------------+
| <<<yarn.nodemanager.aux-services>>> | <<<...,mapreduce.shuffle>>> | The auxiliary service name |
*--------------------------------------+---------------------+-----------------+
| <<<yarn.nodemanager.aux-services.mapreduce.shuffle.class>>> | <<<org.apache.hadoop.mapred.ShuffleHandler>>> | The auxiliary service class to use |
*--------------------------------------+---------------------+-----------------+
<<IMPORTANT:>> If setting an auxiliary service in addition the default
<<<mapreduce.shuffle>>> service, then a new service key should be added to the
<<<yarn.nodemanager.aux-services>>> property, for example <<<mapred.shufflex>>>.
Then the property defining the corresponding class must be
<<<yarn.nodemanager.aux-services.mapreduce.shufflex.class>>>.

View File

@ -65,6 +65,7 @@
<menu name="MapReduce" inherit="top">
<item name="Encrypted Shuffle" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html"/>
<item name="Pluggable Shuffle/Sort" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html"/>
</menu>
<menu name="YARN" inherit="top">