MAPREDUCE-5184. Document compatibility for MapReduce applications in hadoop-2 vis-a-vis hadoop-1. Contributed by Zhijie Shen.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1493570 13f79535-47bb-0310-9956-ffa450edef68
Arun Murthy 2013-06-16 19:09:48 +00:00
parent 98d97d316c
commit 7f96128e43
4 changed files with 113 additions and 1 deletions


@ -288,6 +288,9 @@ Release 2.1.0-beta - UNRELEASED
MAPREDUCE-5192. Allow for alternate resolutions of TaskCompletionEvents.
(cdouglas via acmurthy)
MAPREDUCE-5184. Document compatibility for MapReduce applications in
hadoop-2 vis-a-vis hadoop-1. (Zhijie Shen via acmurthy)
OPTIMIZATIONS
MAPREDUCE-4974. Optimising the LineRecordReader initialize() method


@ -0,0 +1,107 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop Map Reduce Next Generation-${project.version} - Backward Compatibility
---
---
${maven.build.timestamp}
Apache Hadoop MapReduce - Migrating from Apache Hadoop 1.x to Apache Hadoop 2.x
\[ {{{../../hadoop-yarn/hadoop-yarn-site/index.html}Go Back}} \]
* {Introduction}
This document provides information for users to migrate their Apache Hadoop
MapReduce applications from Apache Hadoop 1.x to Apache Hadoop 2.x.
In Apache Hadoop 2.x we have spun off resource management capabilities
into Apache Hadoop YARN, a general-purpose, distributed application management
framework, while Apache Hadoop MapReduce (aka MRv2) remains a pure
distributed computation framework.
In general, the previous MapReduce runtime (aka MRv1) has been reused and
no major surgery has been conducted on it. Therefore, MRv2 is able to ensure
satisfactory compatibility with MRv1 applications. However, due to some
improvements and code refactorings, a few APIs have been rendered
backward-incompatible.
The remainder of this page will discuss the scope and the level of backward
compatibility that we support in Apache Hadoop MapReduce 2.x (MRv2).
* {Binary Compatibility}
First, we ensure binary compatibility for applications that use the old
<<mapred>> APIs. This means that applications built against the MRv1
<<mapred>> APIs can run directly on YARN without recompilation, merely by
pointing them to an Apache Hadoop 2.x cluster via configuration.
* {Source Compatibility}
We cannot ensure complete binary compatibility for applications that use the
<<mapreduce>> APIs, as these APIs have evolved a lot since MRv1. However, we
ensure source compatibility for the <<mapreduce>> APIs that break binary
compatibility. In other words, users should recompile applications that
use <<mapreduce>> APIs against the MRv2 jars. One notable source of binary
incompatibility is Counter and CounterGroup.
* {Not Supported}
MRAdmin has been removed in MRv2 because the <<<mradmin>>> commands
no longer exist. They have been replaced by the commands in <<<rmadmin>>>. We
support neither binary compatibility nor source compatibility for
applications that use this class directly.
* {Tradeoffs between MRv1 Users and Early MRv2 Adopters}
Unfortunately, maintaining binary compatibility for MRv1 applications may lead
to binary incompatibility issues for early MRv2 adopters, in particular Hadoop
0.23 users. For <<mapred>> APIs, we have chosen to be compatible with MRv1
applications, which have a larger user base. For <<mapreduce>> APIs, if they
don't significantly break Hadoop 0.23 applications, we still change them to be
compatible with MRv1 applications. Below is the list of MapReduce APIs which
are incompatible with Hadoop 0.23.
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<Problematic Function>> | <<Incompatibility Issue>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.util.ProgramDriver#drive>>> | Return type changes from <<<void>>> to <<<int>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapred.jobcontrol.Job#getMapredJobID>>> | Return type changes from <<<String>>> to <<<JobID>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapred.TaskReport#getTaskId>>> | Return type changes from <<<String>>> to <<<TaskID>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapred.ClusterStatus#UNINITIALIZED_MEMORY_VALUE>>> | Data type changes from <<<long>>> to <<<int>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapreduce.filecache.DistributedCache#getArchiveTimestamps>>> | Return type changes from <<<long[]>>> to <<<String[]>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapreduce.filecache.DistributedCache#getFileTimestamps>>> | Return type changes from <<<long[]>>> to <<<String[]>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapreduce.Job#failTask>>> | Return type changes from <<<void>>> to <<<boolean>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapreduce.Job#killTask>>> | Return type changes from <<<void>>> to <<<boolean>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| <<<org.apache.hadoop.mapreduce.Job#getTaskCompletionEvents>>> | Return type changes from <<<o.a.h.mapred.TaskCompletionEvent[]>>> to <<<o.a.h.mapreduce.TaskCompletionEvent[]>>> |
*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
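Most entries in the table above are return-type changes. As a rough illustration of why such a change breaks binary compatibility even when recompilation may succeed, the sketch below prints JVM method descriptors, which include the return type and are what compiled call sites are linked against. The class name <<<DescriptorDemo>>> and the <<<String>>> parameter are hypothetical stand-ins chosen for this example; they are not the actual Hadoop signatures.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Builds the JVM method descriptor for the given return and parameter types.
    public static String descriptor(Class<?> returnType, Class<?>... paramTypes) {
        return MethodType.methodType(returnType, paramTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Hypothetical method taking one String argument (an assumption for
        // illustration), before and after a void -> boolean return type change.
        String before = descriptor(void.class, String.class);
        String after = descriptor(boolean.class, String.class);

        System.out.println("before: " + before);
        System.out.println("after:  " + after);
        // A class compiled against "before" looks the method up by name plus this
        // descriptor, so it cannot link against a jar that only offers "after".
        System.out.println("same descriptor: " + before.equals(after));
    }
}
```

Because the two descriptors differ, a class file compiled against the old signature typically fails at link time with <<<NoSuchMethodError>>>, whereas source that ignores the return value can usually be recompiled unchanged. Changes such as <<<String>>> to <<<JobID>>> may additionally require source edits wherever the returned value is used.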
* {Malicious}
For users who are going to try <<<hadoop-examples-1.x.x.jar>>> on YARN,
please note that <<<hadoop jar hadoop-examples-1.x.x.jar>>> will still use
<<<hadoop-mapreduce-examples-2.x.x.jar>>>, which is installed together with
the other MRv2 jars. By default, Hadoop framework jars appear before the users'
jars in the classpath, so the classes from the 2.x.x jar will still be
picked. Users should either remove <<<hadoop-mapreduce-examples-2.x.x.jar>>>
from the classpath or set <<<HADOOP_USER_CLASSPATH_FIRST=true>>> and
<<<HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar>>> to run their target
examples jar.


@ -79,6 +79,7 @@
</menu>
<menu name="MapReduce" inherit="top">
<item name="Compatibility between Hadoop 1.x and Hadoop 2.x" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html"/>
<item name="Encrypted Shuffle" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html"/>
<item name="Pluggable Shuffle/Sort" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html"/>
</menu>


@ -53,5 +53,6 @@ MapReduce NextGen aka YARN aka MRv2
* {{{../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html}CLI MiniCluster}}
* {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html}Encrypted Shuffle}}
* {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html}Backward Compatibility between Apache Hadoop 1.x and 2.x for MapReduce}}
* {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html}Encrypted Shuffle}}