HADOOP-13714. Tighten up our compatibility guidelines for Hadoop 3

(cherry picked from commit 7618fa9194)
Daniel Templeton 2017-09-16 09:20:33 +02:00
parent 0704a5b5cb
commit 972250923f
2 changed files with 662 additions and 184 deletions


@ -20,109 +20,276 @@ Apache Hadoop Compatibility
Purpose
-------
This document captures the compatibility goals of the Apache Hadoop project. The different types of compatibility between Hadoop releases that affects Hadoop developers, downstream projects, and end-users are enumerated. For each type of compatibility we:
This document captures the compatibility goals of the Apache Hadoop project.
The different types of compatibility between Hadoop releases that affect
Hadoop developers, downstream projects, and end-users are enumerated. For each
type of compatibility this document will:
* describe the impact on downstream projects or end-users
* where applicable, call out the policy adopted by the Hadoop developers when incompatible changes are permitted.
All Hadoop interfaces are classified according to the intended audience and
stability in order to maintain compatibility with previous releases. See the
[Hadoop Interface Taxonomy](./InterfaceClassification.html) for details
about the classifications.
### Target Audience
This document is intended for consumption by the Hadoop developer community.
This document describes the lens through which changes to the Hadoop project
should be viewed. In order for end users and third party developers to have
confidence about cross-release compatibility, the developer community must
ensure that development efforts adhere to these policies. It is the
responsibility of the project committers to validate that all changes either
maintain compatibility or are explicitly marked as incompatible.
Within a component Hadoop developers are free to use Private and Limited Private
APIs, but when using components from a different module Hadoop developers
should follow the same guidelines as third-party developers: do not
use Private or Limited Private (unless explicitly allowed) interfaces and
prefer instead Stable interfaces to Evolving or Unstable interfaces where
possible. Where not possible, the preferred solution is to expand the audience
of the API rather than introducing or perpetuating an exception to these
compatibility guidelines. When working within a Maven module Hadoop developers
should observe where possible the same level of restraint with regard to
using components located in other Maven modules.
Above all, Hadoop developers must be mindful of the impact of their changes.
Stable interfaces must not change between major releases. Evolving interfaces
must not change between minor releases. New classes and components must be
labeled appropriately for audience and stability. See the
[Hadoop Interface Taxonomy](./InterfaceClassification.html) for details about
when the various labels are appropriate. As a general rule, all new interfaces
and APIs should have the most limited labels (e.g. Private Unstable) that will
not inhibit the intent of the interface or API.
### Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
described in [RFC 2119](http://tools.ietf.org/html/rfc2119).
Deprecation
-----------
The Java API provides a @Deprecated annotation to mark an API element as
flagged for removal. The standard meaning of the annotation is that the
API element should not be used and may be removed in a later version.
In all cases removing an element from an API is an incompatible
change. In the case of [Stable](./InterfaceClassification.html#Stable) APIs,
the change cannot be made between minor releases within the same major
version. In addition, to allow consumers of the API time to adapt to the change,
the API element to be removed should be marked as deprecated for a full major
release before it is removed. For example, if a method is marked as deprecated
in Hadoop 2.8, it cannot be removed until Hadoop 4.0.
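
The deprecation timeline above can be sketched as a small calculation. This is only an illustration: the function name `earliest_removal_major` and its parameters are hypothetical, and it takes the conservative reading (an assumption) that a deprecation made anywhere in major line M does not count M itself as the "full major release".

```python
def earliest_removal_major(deprecated_in: str,
                           introduced_as_deprecated: bool = False) -> int:
    """Return the earliest major version in which a Stable API element
    may be removed, given the release in which it was deprecated.

    Assumption: a deprecation made anywhere in major line M is treated as
    not covering M, so the element must stay through all of M+1.
    """
    major = int(deprecated_in.split(".")[0])
    if introduced_as_deprecated:
        # Elements that were deprecated from the start may be removed
        # in the following major release.
        return major + 1
    return major + 2


# Deprecated in 2.8: must survive all of 3.x, removable in 4.0.
print(earliest_removal_major("2.8"))
```
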
### Policy
[Stable](./InterfaceClassification.html#Stable) API elements MUST NOT be removed
until they have been marked as deprecated (through the @Deprecated annotation or
other appropriate documentation) for a full major release. In the case that an
API element was introduced as deprecated (to indicate that it is a temporary
measure that is intended to be removed) the API element MAY be removed in the
following major release. When modifying a
[Stable](./InterfaceClassification.html#Stable) API, developers SHOULD prefer
introducing a new method or endpoint and deprecating the existing one to making
incompatible changes to the method or endpoint.
Compatibility types
-------------------
### Java API
Hadoop interfaces and classes are annotated to describe the intended audience and stability in order to maintain compatibility with previous releases. See [Hadoop Interface Classification](./InterfaceClassification.html) for details.
Developers SHOULD annotate all Hadoop interfaces and classes with the
@InterfaceAudience and @InterfaceStability annotations to describe the
intended audience and stability. Annotations may be at the package, class, or
member variable or method level. Member variable and method annotations SHALL
override class annotations, and class annotations SHALL override package
annotations. A package, class, or member variable or method that is not
annotated SHALL be interpreted as implicitly
[Private](./InterfaceClassification.html#Private) and
[Unstable](./InterfaceClassification.html#Unstable).
* InterfaceAudience: captures the intended audience, possible values are Public (for end users and external projects), LimitedPrivate (for other Hadoop components, and closely related projects like YARN, MapReduce, HBase etc.), and Private (for intra component use).
* InterfaceStability: describes what types of interface changes are permitted. Possible values are Stable, Evolving, Unstable, and Deprecated.
* @InterfaceAudience captures the intended audience. Possible values are
[Public](./InterfaceClassification.html#Public) (for end users and external
projects), Limited[Private](./InterfaceClassification.html#Private) (for other
Hadoop components, and closely related projects like YARN, MapReduce, HBase
etc.), and [Private](./InterfaceClassification.html#Private)
(for intra component use).
* @InterfaceStability describes what types of interface changes are permitted. Possible values are [Stable](./InterfaceClassification.html#Stable), [Evolving](./InterfaceClassification.html#Evolving), and [Unstable](./InterfaceClassification.html#Unstable).
* @Deprecated notes that the package, class, or member variable or method could potentially be removed in the future and should not be used.
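
As a rough sketch of the override order described above (member over class over package, with Private and Unstable as the fallback), the lookup might be modeled as follows. The function and the dictionary keys are hypothetical, and resolving audience and stability independently of each other is an assumption, not something the text states.

```python
# Fallback classification for anything that carries no annotation at all.
DEFAULTS = {"audience": "Private", "stability": "Unstable"}


def effective_classification(package=None, cls=None, member=None):
    """Resolve the audience and stability that apply to an API element.

    Each argument is a dict of the annotations present at that level,
    e.g. {"audience": "Public"}, or None if the level is not annotated.
    """
    resolved = {}
    for key, default in DEFAULTS.items():
        resolved[key] = default
        # Most specific level wins: member, then class, then package.
        for level in (member, cls, package):
            if level and key in level:
                resolved[key] = level[key]
                break
    return resolved


# A class marked Evolving inside a Public/Stable package.
print(effective_classification(
    package={"audience": "Public", "stability": "Stable"},
    cls={"stability": "Evolving"},
))
```
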
#### Use Cases
* Public-Stable API compatibility is required to ensure end-user programs and downstream projects continue to work without modification.
* LimitedPrivate-Stable API compatibility is required to allow upgrade of individual components across minor releases.
* Private-Stable API compatibility is required for rolling upgrades.
* [Public](./InterfaceClassification.html#Public)-[Stable](./InterfaceClassification.html#Stable) API compatibility is required to ensure end-user programs and downstream projects continue to work without modification.
* [Public](./InterfaceClassification.html#Public)-[Evolving](./InterfaceClassification.html#Evolving) API compatibility is useful to make functionality available for consumption before it is fully baked.
* Limited Private-[Stable](./InterfaceClassification.html#Stable) API compatibility is required to allow upgrade of individual components across minor releases.
* [Private](./InterfaceClassification.html#Private)-[Stable](./InterfaceClassification.html#Stable) API compatibility is required for rolling upgrades.
* [Private](./InterfaceClassification.html#Private)-[Unstable](./InterfaceClassification.html#Unstable) API compatibility allows internal components to evolve rapidly without concern for downstream consumers, and is how most interfaces should be labeled.
#### Policy
* Public-Stable APIs must be deprecated for at least one major release prior to their removal in a major release.
* LimitedPrivate-Stable APIs can change across major releases, but not within a major release.
* Private-Stable APIs can change across major releases, but not within a major release.
* Classes not annotated are implicitly "Private". Class members not annotated inherit the annotations of the enclosing class.
* Note: APIs generated from the proto files need to be compatible for rolling-upgrades. See the section on wire-compatibility for more details. The compatibility policies for APIs and wire-communication need to go hand-in-hand to address this.
The compatibility policy SHALL be determined by the relevant package, class, or
member variable or method annotations.
### Semantic compatibility
Note: APIs generated from the proto files MUST be compatible for rolling
upgrades. See the section on wire protocol compatibility for more details. The
compatibility policies for APIs and wire protocols must therefore go hand
in hand.
Apache Hadoop strives to ensure that the behavior of APIs remains consistent over versions, though changes for correctness may result in changes in behavior. Tests and javadocs specify the API's behavior. The community is in the process of specifying some APIs more rigorously, and enhancing test suites to verify compliance with the specification, effectively creating a formal specification for the subset of behaviors that can be easily tested.
#### Semantic compatibility
Apache Hadoop strives to ensure that the behavior of APIs remains consistent
over versions, though changes for correctness may result in changes in
behavior. API behavior SHALL be specified by the JavaDoc API documentation
where present and complete. When JavaDoc API documentation is not available,
behavior SHALL be specified by the behavior expected by the related unit tests.
In cases with no JavaDoc API documentation or unit test coverage, the expected
behavior is presumed to be obvious and SHOULD be assumed to be the minimum
functionality implied by the interface naming. The community is in the process
of specifying some APIs more rigorously and enhancing test suites to verify
compliance with the specification, effectively creating a formal specification
for the subset of behaviors that can be easily tested.
The behavior of any API MAY be changed to fix incorrect behavior according to
the stability of the API, with such a change to be accompanied by updating
existing documentation and tests and/or adding new documentation or tests.
#### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
Apache Hadoop revisions SHOULD retain binary compatibility such that end-user
applications continue to work without any modifications. Minor Apache Hadoop
revisions within the same major revision MUST retain compatibility such that
existing MapReduce applications (e.g. end-user applications and projects such
as Apache Pig, Apache Hive, et al), existing YARN applications (e.g.
end-user applications and projects such as Apache Spark, Apache Tez et al),
and applications that access HDFS directly (e.g. end-user applications and
projects such as Apache HBase, Apache Flume, et al) work unmodified and without
recompilation when used with any Apache Hadoop cluster within the same major
release as the original build target.
For MapReduce applications in particular, i.e. applications using the
org.apache.hadoop.mapred and/or org.apache.hadoop.mapreduce APIs, the developer
community SHALL support binary compatibility across major releases. The
MapReduce APIs SHALL be supported compatibly across major releases. See
[Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html) for more details.
Some applications may be affected by changes to disk layouts or other internal
changes. See the sections that follow for policies on how incompatible
changes to non-API interfaces are handled.
### Native Dependencies
Hadoop includes several native components, including compression, the
container executor binary, and various native integrations. These native
components introduce a set of native dependencies for Hadoop, both at compile
time and at runtime, such as cmake, gcc, zlib, etc. This set of native
dependencies is part of the Hadoop ABI.
#### Policy
The behavior of API may be changed to fix incorrect behavior, such a change to be accompanied by updating existing buggy tests or adding tests in cases there were none prior to the change.
The minimum required versions of the native components on which Hadoop depends
at compile time and/or runtime SHALL be considered
[Stable](./InterfaceClassification.html#Stable). The minimum required versions
MUST NOT increase between minor releases within a major version.
### Wire compatibility
### Wire Protocols
Wire compatibility concerns data being transmitted over the wire between Hadoop processes. Hadoop uses Protocol Buffers for most RPC communication. Preserving compatibility requires prohibiting modification as described below. Non-RPC communication should be considered as well, for example using HTTP to transfer an HDFS image as part of snapshotting or transferring MapTask output. The potential communications can be categorized as follows:
Wire compatibility concerns data being transmitted "over the wire" between
Hadoop processes. Hadoop uses
[Protocol Buffers](https://developers.google.com/protocol-buffers/) for most
RPC communication. Preserving compatibility requires prohibiting modification
as described below. Non-RPC communication should be considered as well, for
example using HTTP to transfer an HDFS image as part of snapshotting or
transferring MapReduce map task output. The communications can be categorized as
follows:
* Client-Server: communication between Hadoop clients and servers (e.g., the HDFS client to NameNode protocol, or the YARN client to ResourceManager protocol).
* Client-Server (Admin): It is worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g., the HAAdmin protocol) as these protocols only impact administrators who can tolerate changes that end users (which use general Client-Server protocols) can not.
* Client-Server (Admin): It is worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g., the HAAdmin protocol) as these protocols only impact administrators who can tolerate changes that end users (which use general Client-Server protocols) cannot.
* Server-Server: communication between servers (e.g., the protocol between the DataNode and NameNode, or NodeManager and ResourceManager)
#### Use Cases
#### Protocol Dependencies
* Client-Server compatibility is required to allow users to continue using the old clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
* Client-Server compatibility is also required to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.
* Client-Server compatibility is also required to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
* Server-Server compatibility is required to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
The components of Apache Hadoop may have dependencies that include their own
protocols, such as Zookeeper, S3, Kerberos, etc. These protocol dependencies
SHALL be treated as internal protocols and governed by the same policy.
#### Transports
In addition to compatibility of the protocols themselves, maintaining
cross-version communications requires that the transports supported also be
stable. The most likely source of transport changes stems from secure
transports, such as SSL. Upgrading a service from SSLv2 to SSLv3 may break
existing SSLv2 clients. The minimum supported major version of any transport
MUST NOT increase across minor releases within a major version.
Service ports are considered as part of the transport mechanism. Fixed
service port numbers MUST be kept consistent to prevent breaking clients.
#### Policy
* Both Client-Server and Server-Server compatibility is preserved within a major release. (Different policies for different categories are yet to be considered.)
* Compatibility can be broken only at a major release, though breaking compatibility even at major releases has grave consequences and should be discussed in the Hadoop community.
* Hadoop protocols are defined in .proto (ProtocolBuffers) files. Client-Server protocols and Server-Server protocol .proto files are marked as stable. When a .proto file is marked as stable it means that changes should be made in a compatible fashion as described below:
* The following changes are compatible and are allowed at any time:
* Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code.
* Add a new rpc/method to the service
* Add a new optional request to a Message
* Rename a field
* Rename a .proto file
* Change .proto annotations that effect code generation (e.g. name of java package)
* The following changes are incompatible but can be considered only at a major release
* Change the rpc/method name
* Change the rpc/method parameter type or return type
* Remove an rpc/method
* Change the service name
* Change the name of a Message
* Modify a field type in an incompatible way (as defined recursively)
* Change an optional field to required
* Add or delete a required field
* Delete an optional field as long as the optional field has reasonable defaults to allow deletions
* The following changes are incompatible and hence never allowed
* Change a field id
* Reuse an old field that was previously deleted.
* Field numbers are cheap and changing and reusing is not a good idea.
Hadoop wire protocols are defined in .proto (ProtocolBuffers) files.
Client-Server and Server-Server protocols SHALL be classified according to the
audience and stability classifications noted in their .proto files. In cases
where no classifications are present, the protocols SHOULD be assumed to be
[Private](./InterfaceClassification.html#Private) and
[Stable](./InterfaceClassification.html#Stable).
### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
The following changes to a .proto file SHALL be considered compatible:
As Apache Hadoop revisions are upgraded end-users reasonably expect that their applications should continue to work without any modifications. This is fulfilled as a result of supporting API compatibility, Semantic compatibility and Wire compatibility.
* Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code
* Add a new rpc/method to the service
* Add a new optional request to a Message
* Rename a field
* Rename a .proto file
* Change .proto annotations that affect code generation (e.g. name of java package)
However, Apache Hadoop is a very complex, distributed system and services a very wide variety of use-cases. In particular, Apache Hadoop MapReduce is a very, very wide API; in the sense that end-users may make wide-ranging assumptions such as layout of the local disk when their map/reduce tasks are executing, environment variables for their tasks etc. In such cases, it becomes very hard to fully specify, and support, absolute compatibility.
The following changes to a .proto file SHALL be considered incompatible:
#### Use cases
* Change an rpc/method name
* Change an rpc/method parameter type or return type
* Remove an rpc/method
* Change the service name
* Change the name of a Message
* Modify a field type in an incompatible way (as defined recursively)
* Change an optional field to required
* Add or delete a required field
* Delete an optional field as long as the optional field has reasonable defaults to allow deletions
* Existing MapReduce applications, including jars of existing packaged end-user applications and projects such as Apache Pig, Apache Hive, Cascading etc. should work unmodified when pointed to an upgraded Apache Hadoop cluster within a major release.
* Existing YARN applications, including jars of existing packaged end-user applications and projects such as Apache Tez etc. should work unmodified when pointed to an upgraded Apache Hadoop cluster within a major release.
* Existing applications which transfer data in/out of HDFS, including jars of existing packaged end-user applications and frameworks such as Apache Flume, should work unmodified when pointed to an upgraded Apache Hadoop cluster within a major release.
The following changes to a .proto file SHALL be considered incompatible:
#### Policy
* Change a field id
* Reuse an old field that was previously deleted.
* Existing MapReduce, YARN & HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported.
* A very minor fraction of applications maybe affected by changes to disk layouts etc., the developer community will strive to minimize these changes and will not make them within a minor version. In more egregious cases, we will consider strongly reverting these breaking changes and invalidating offending releases if necessary.
* In particular for MapReduce applications, the developer community will try our best to support providing binary compatibility across major releases e.g. applications using org.apache.hadoop.mapred.
* APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See [Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html) for more details.
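
The field-level rules from the .proto change lists earlier in this section can be illustrated with a small checker. This is a hedged sketch, not Hadoop tooling: the `Field` type, the `incompatible_changes` function, and the `retired` set of previously deleted field ids are all hypothetical, and any type change is treated as incompatible (a simplifying assumption, since the text only rules out changes made "in an incompatible way").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    number: int  # protobuf field id
    name: str
    label: str   # "optional" or "required"
    type: str


def incompatible_changes(old, new, retired=frozenset()):
    """List incompatible changes between two versions of one message."""
    old_by_num = {f.number: f for f in old}
    new_by_num = {f.number: f for f in new}
    old_by_name = {f.name: f for f in old}
    problems = []
    moved_from = set()  # old ids whose field reappears under a new id

    for num, f in sorted(new_by_num.items()):
        before = old_by_num.get(num)
        if before is None:
            prev = old_by_name.get(f.name)
            if prev is not None and prev.number not in new_by_num:
                moved_from.add(prev.number)
                problems.append(
                    f"field id changed: {f.name} {prev.number} -> {num}")
            elif num in retired:
                problems.append(f"retired field id reused: {num}")
            elif f.label == "required":
                problems.append(f"required field added: {f.name}")
        else:
            # Same id on both sides: a rename alone is compatible.
            if before.type != f.type:
                problems.append(f"field type changed: {f.name}")
            if before.label == "optional" and f.label == "required":
                problems.append(f"optional changed to required: {f.name}")

    for num, f in sorted(old_by_num.items()):
        if num not in new_by_num and num not in moved_from:
            problems.append(f"{f.label} field deleted: {f.name}")
    return problems
```

Renaming a field or adding a new optional field yields an empty list, while moving an existing field to a different id or reusing an id listed in `retired` is reported.
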
Hadoop wire protocols that are not defined via .proto files SHOULD be considered
to be [Private](./InterfaceClassification.html#Private) and
[Stable](./InterfaceClassification.html#Stable).
In addition to the limitations imposed by being
[Stable](./InterfaceClassification.html#Stable), Hadoop's wire protocols
MUST also be forward compatible across minor releases within a major version
according to the following:
* Client-Server compatibility MUST be maintained so as to allow users to continue using older clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
* Client-Server compatibility MUST be maintained so as to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.
* Client-Server compatibility MUST be maintained so as to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
* Server-Server compatibility MUST be maintained so as to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
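
To make the scope of these guarantees concrete, the sketch below checks whether a client/server version pair falls within the same major release line, which is the range the list above covers. The function name `within_wire_guarantee` is hypothetical, and it assumes version strings start with a numeric major component followed by a dot.

```python
def major_of(version: str) -> int:
    """Extract the major component from a version string such as '2.3.0'."""
    return int(version.split(".")[0])


def within_wire_guarantee(client: str, server: str) -> bool:
    """Return True when both versions share a major release line.

    The order of the two versions does not matter: older clients with
    newer servers and newer clients with older servers are both covered.
    """
    return major_of(client) == major_of(server)


print(within_wire_guarantee("2.1.0", "2.3.0"))  # older client, newer cluster
print(within_wire_guarantee("2.4.0", "2.3.0"))  # newer client, older cluster
```

A pair that spans major versions, such as a 2.x client and a 3.x cluster, falls outside this particular guarantee; the sketch says nothing about whether such a pair happens to work.
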
New transport mechanisms MUST only be introduced with minor or major version
changes. Existing transport mechanisms MUST continue to be supported across
minor versions within a major version. Service port numbers MUST remain
consistent across minor version numbers within a major version.
### REST APIs
REST API compatibility corresponds to both the requests (URLs) and responses to each request (content, which may contain other URLs). Hadoop REST APIs are specifically meant for stable use by clients across releases, even major ones. The following are the exposed REST APIs:
REST API compatibility applies to the REST endpoints (URLs) and response data
format. Hadoop REST APIs are specifically meant for stable use by clients across
releases, even major ones. The following is a non-exhaustive list of the
exposed REST APIs:
* [WebHDFS](../hadoop-hdfs/WebHDFS.html) - Stable
* [WebHDFS](../hadoop-hdfs/WebHDFS.html)
* [ResourceManager](../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html)
* [NodeManager](../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html)
* [MR Application Master](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html)
@ -130,134 +297,390 @@ REST API compatibility corresponds to both the requests (URLs) and responses to
* [Timeline Server v1 REST API](../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html)
* [Timeline Service v2 REST API](../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)
Each API has an API-specific version number. Any incompatible changes MUST
increment the API version number.
#### Policy
The APIs annotated stable in the text above preserve compatibility across at least one major release, and maybe deprecated by a newer version of the REST API in a major release.
The Hadoop REST APIs SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Evolving](./InterfaceClassification.html#Evolving). With respect to API version
numbers, the Hadoop REST APIs SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable), i.e. no incompatible changes
are allowed within an API version number.
### Log Output
The Hadoop daemons and CLIs produce log output via Log4j that is intended to
aid administrators and developers in understanding and troubleshooting cluster
behavior. Log messages are intended for human consumption, though automation
use cases are also supported.
#### Policy
All log output SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Evolving](./InterfaceClassification.html#Evolving).
### Audit Log Output
Several components have audit logging systems that record system information in
a machine readable format. Incompatible changes to that data format may break
existing automation utilities. For the audit log, an incompatible change is any
change that alters the format such that existing parsers can no longer parse
the logs.
#### Policy
All audit log output SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable). Any change to the
data format SHALL be considered an incompatible change.
### Metrics/JMX
While the Metrics API compatibility is governed by Java API compatibility, the actual metrics exposed by Hadoop need to be compatible for users to be able to automate using them (scripts etc.). Adding additional metrics is compatible. Modifying (e.g. changing the unit or measurement) or removing existing metrics breaks compatibility. Similarly, changes to JMX MBean object names also break compatibility.
While the Metrics API compatibility is governed by Java API compatibility, the
Metrics data format exposed by Hadoop MUST be maintained as compatible for
consumers of the data, e.g. for automation tasks.
#### Policy
Metrics should preserve compatibility within the major release.
The data format exposed via Metrics SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable).
### File formats & Metadata
User and system level data (including metadata) is stored in files of different formats. Changes to the metadata or the file formats used to store data/metadata can lead to incompatibilities between versions.
User and system level data (including metadata) is stored in files of various
formats. Changes to the metadata or the file formats used to store
data/metadata can lead to incompatibilities between versions. Each class of file
format is addressed below.
#### User-level file formats
Changes to formats that end-users use to store their data can prevent them from accessing the data in later releases, and hence it is highly important to keep those file-formats compatible. One can always add a "new" format improving upon an existing format. Examples of these formats include har, war, SequenceFileFormat etc.
Changes to formats that end users use to store their data can prevent them from
accessing the data in later releases, so keeping them compatible is important.
Examples of these formats include har, war, SequenceFileFormat, etc.
##### Policy
* Non-forward-compatible user-file format changes are restricted to major releases. When user-file formats change, new releases are expected to read existing formats, but may write data in formats incompatible with prior releases. Also, the community shall prefer to create a new format that programs must opt in to instead of making incompatible changes to existing formats.
User-level file formats SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable). User-level file
format changes SHOULD be made forward compatible across major releases and MUST
be made forward compatible within a major release. The developer community
SHOULD prefer the creation of a new derivative file format to making
incompatible changes to an existing file format. Such new file formats MUST be
created as opt-in, meaning that users must be able to continue using the
existing compatible format until and unless they explicitly opt in to using
the new file format.
#### System-internal file formats
#### System-internal data schemas
Hadoop internal data is also stored in files and again changing these formats can lead to incompatibilities. While such changes are not as devastating as the user-level file formats, a policy on when the compatibility can be broken is important.
Hadoop internal data may also be stored in files or other data stores. Changing
the schemas of these data stores can lead to incompatibilities.
##### MapReduce
MapReduce uses formats like I-File to store MapReduce-specific data.
##### Policy
###### Policy
MapReduce-internal formats like IFile maintain compatibility within a major release. Changes to these formats can cause in-flight jobs to fail and hence we should ensure newer clients can fetch shuffle-data from old servers in a compatible manner.
All MapReduce-internal file formats, such as I-File format or the job history
server's jhist file format, SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Stable](./InterfaceClassification.html#Stable).
##### HDFS Metadata
HDFS persists metadata (the image and edit logs) in a particular format. Incompatible changes to either the format or the metadata prevent subsequent releases from reading older metadata. Such incompatible changes might require an HDFS "upgrade" to convert the metadata to make it accessible. Some changes can require more than one such "upgrades".
HDFS persists metadata (the image and edit logs) in a private file format.
Incompatible changes to either the format or the metadata prevent subsequent
releases from reading older metadata. Incompatible changes MUST include a
process by which existing metadata may be upgraded. Changes SHALL be
allowed to require more than one upgrade. Incompatible changes MUST result in
the metadata version number being incremented.
Depending on the degree of incompatibility in the changes, the following potential scenarios can arise:
Depending on the degree of incompatibility in the changes, the following
potential scenarios can arise:
* Automatic: The image upgrades automatically, no need for an explicit "upgrade".
* Direct: The image is upgradable, but might require one explicit release "upgrade".
* Indirect: The image is upgradable, but might require upgrading to intermediate release(s) first.
* Not upgradeable: The image is not upgradeable.
##### Policy
HDFS data nodes store data in a private directory structure. The schema of that
directory structure must remain stable to retain compatibility.
* A release upgrade must allow a cluster to roll-back to the older version and its older disk format. The rollback needs to restore the original data, but not required to restore the updated data.
* HDFS metadata changes must be upgradeable via any of the upgrade paths - automatic, direct or indirect.
* More detailed policies based on the kind of upgrade are yet to be considered.
###### Policy
The HDFS metadata format SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Evolving](./InterfaceClassification.html#Evolving). Incompatible
changes MUST include a process by which existing metadata may be upgraded. The
upgrade process MUST allow the cluster metadata to be rolled back to the older
version and its older disk format. The rollback MUST restore the original data
but is not REQUIRED to restore the updated data. Any incompatible change
to the format MUST result in the major version number of the schema being
incremented.
The data node directory format SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Evolving](./InterfaceClassification.html#Evolving).
##### AWS S3A Guard Metadata
For each operation in the Hadoop S3 client (s3a) that reads or modifies
file metadata, a shadow copy of that file metadata is stored in a separate
metadata store, which offers HDFS-like consistency for the metadata, and may
also provide faster lookups for things like file status or directory listings.
S3A guard tables are created with a version marker which indicates
compatibility.
###### Policy
The S3A guard metadata schema SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Unstable](./InterfaceClassification.html#Unstable). Any incompatible change
to the schema MUST result in the version number of the schema being incremented.
##### YARN Resource Manager State Store
The YARN resource manager stores information about the cluster state in an
external state store for use in fail over and recovery. If the schema used for
the state store data does not remain compatible, the resource manager will not
be able to recover its state and will fail to start. The state store data
schema includes a version number that indicates compatibility.
###### Policy
The YARN resource manager state store data schema SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Evolving](./InterfaceClassification.html#Evolving). Any incompatible change
to the schema MUST result in the major version number of the schema being
incremented. Any compatible change to the schema MUST result in the minor
version number being incremented.
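
The major/minor rule above can be sketched as a simple version check. This is a minimal illustration, not Hadoop's actual implementation: the `SchemaVersion` and `StateStoreVersionCheck` classes, the `major.minor` string form, and the error message are all assumptions made for the example.

```java
/** Hypothetical major.minor schema version; not an actual Hadoop class. */
final class SchemaVersion {
    final int major;
    final int minor;

    SchemaVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Parses a string such as "1.2" (assumed format) into a SchemaVersion. */
    static SchemaVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected major.minor, got: " + text);
        }
        return new SchemaVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /** Only a difference in the major version signals an incompatible schema. */
    boolean isCompatibleWith(SchemaVersion stored) {
        return this.major == stored.major;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

final class StateStoreVersionCheck {
    /** Returns normally if the stored schema can be loaded; throws otherwise. */
    static void checkLoadable(SchemaVersion current, SchemaVersion stored) {
        if (!current.isCompatibleWith(stored)) {
            throw new IllegalStateException(
                "Stored schema " + stored + " is incompatible with " + current);
        }
    }

    public static void main(String[] args) {
        // A minor version difference is a compatible change, so this passes.
        checkLoadable(SchemaVersion.parse("1.3"), SchemaVersion.parse("1.0"));
        System.out.println("Stored schema 1.0 is loadable by 1.3");
    }
}
```

Under this sketch a daemon would refuse to start when the major versions differ, which mirrors the failure mode described above for an incompatible state store schema.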
##### YARN Node Manager State Store
The YARN node manager stores information about the node state in an
external state store for use in recovery. If the schema used for the state
store data does not remain compatible, the node manager will not
be able to recover its state and will fail to start. The state store data
schema includes a version number that indicates compatibility.
###### Policy
The YARN node manager state store data schema SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Evolving](./InterfaceClassification.html#Evolving). Any incompatible change
to the schema MUST result in the major version number of the schema being
incremented. Any compatible change to the schema MUST result in the minor
version number being incremented.
##### YARN Federation State Store
The YARN resource manager federation service stores information about the
federated clusters, running applications, and routing policies in an
external state store for use in replication and recovery. If the schema used
for the state store data does not remain compatible, the federation service
will fail to initialize. The state store data schema includes a version number
that indicates compatibility.
###### Policy
The YARN federation service state store data schema SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Evolving](./InterfaceClassification.html#Evolving). Any incompatible change
to the schema MUST result in the major version number of the schema being
incremented. Any compatible change to the schema MUST result in the minor
version number being incremented.
### Command Line Interface (CLI)
The Hadoop command line programs may be used either directly via the system shell or via shell scripts. Changing the path of a command, removing or renaming command line options, the order of arguments, or the command return code and output break compatibility and may adversely affect users.
The Hadoop command line programs may be used either directly via the system
shell or via shell scripts. The CLIs include both the user-facing commands, such
as the hdfs command or the yarn command, and the admin-facing commands, such as
the scripts used to start and stop daemons. Changing the path of a command,
removing or renaming command line options, changing the order of arguments, or
changing the command return codes or output breaks compatibility and adversely
affects users.
#### Policy
CLI commands are to be deprecated (warning when used) for one major release before they are removed or incompatibly modified in a subsequent major release.
All Hadoop CLI paths, usage, and output SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable).
Note that the CLI output SHALL be considered distinct from the log output
generated by the Hadoop CLIs. The latter SHALL be governed by the policy on log
output. Note also that for CLI output, all changes SHALL be considered
incompatible changes.
### Web UI
Web UI, particularly the content and layout of web pages, changes could potentially interfere with attempts to screen scrape the web pages for information.
Changes to the Web UI, particularly to the content and layout of web pages,
could potentially interfere with attempts to screen scrape the web pages for
information. The Hadoop Web UI pages, however, are not meant to be scraped, e.g.
for automation purposes. Users are expected to use REST APIs to programmatically
access cluster information.
#### Policy
Web pages are not meant to be scraped and hence incompatible changes to them are allowed at any time. Users are expected to use REST APIs to get any information.
The Hadoop Web UI SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Unstable](./InterfaceClassification.html#Unstable).
### Hadoop Configuration Files
Users use (1) Hadoop-defined properties to configure and provide hints to Hadoop and (2) custom properties to pass information to jobs. Hence, compatibility of config properties is two-fold:
* Modifying key-names, units of values, and default values of Hadoop-defined properties.
* Custom configuration property keys should not conflict with the namespace of Hadoop-defined properties. Typically, users should avoid using prefixes used by Hadoop: hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, file, dfs, mapred, mapreduce, yarn.
Users use Hadoop-defined properties to configure and provide hints to Hadoop and
custom properties to pass information to jobs. Users are encouraged to avoid
using custom configuration property names that conflict with the namespace of
Hadoop-defined properties and should avoid using any prefixes used by Hadoop,
e.g. hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, dfs, mapred,
mapreduce, and yarn.
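
As a rough illustration, a job author could screen custom property names against the prefixes listed above. The sketch below is hypothetical: the `PropertyNameCheck` class is not part of Hadoop, and treating the first dot-separated segment of a name as its prefix is an assumption made for the example. The prefix list is copied from the text and is not exhaustive.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the Hadoop API. */
final class PropertyNameCheck {
    // Prefixes named in the text above. The text gives them as examples,
    // so this list should not be read as complete.
    static final List<String> RESERVED_PREFIXES = Arrays.asList(
        "hadoop", "io", "ipc", "fs", "net", "file", "ftp", "s3", "kfs",
        "ha", "dfs", "mapred", "mapreduce", "yarn");

    /**
     * Returns true if the first dot-separated segment of the property name
     * matches one of the listed prefixes (an assumed matching rule).
     */
    static boolean usesReservedPrefix(String propertyName) {
        int dot = propertyName.indexOf('.');
        String first = dot < 0 ? propertyName : propertyName.substring(0, dot);
        return RESERVED_PREFIXES.contains(first);
    }

    public static void main(String[] args) {
        System.out.println(usesReservedPrefix("dfs.replication"));
        System.out.println(usesReservedPrefix("mycompany.job.input"));
    }
}
```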
#### Policy
* Hadoop-defined properties are to be deprecated at least for one major release before being removed. Modifying units for existing properties is not allowed.
* The default values of Hadoop-defined properties can be changed across minor/major releases, but will remain the same across point releases within a minor release.
* Currently, there is NO explicit policy regarding when new prefixes can be added/removed, and the list of prefixes to be avoided for custom configuration properties. However, as noted above, users should avoid using prefixes used by Hadoop: hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, file, dfs, mapred, mapreduce, yarn.
Hadoop-defined properties (names and meanings) SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable). The units implied by a
Hadoop-defined property MUST NOT change, even
across major versions. Default values of Hadoop-defined properties SHALL be
considered [Public](./InterfaceClassification.html#Public) and
[Evolving](./InterfaceClassification.html#Evolving).
### Log4j Configuration Files
The log output produced by Hadoop daemons and CLIs is governed by a set of
configuration files. These files control the minimum level of log message that
will be output by the various components of Hadoop, as well as where and how
those messages are stored.
#### Policy
All Log4j configurations SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Evolving](./InterfaceClassification.html#Evolving).
### Directory Structure
Source code, artifacts (source and tests), user logs, configuration files, output and job history are all stored on disk either local file system or HDFS. Changing the directory structure of these user-accessible files break compatibility, even in cases where the original path is preserved via symbolic links (if, for example, the path is accessed by a servlet that is configured to not follow symbolic links).
Source code, artifacts (source and tests), user logs, configuration files,
output, and job history are all stored on disk, either on the local file system
or in HDFS.
Changing the directory structure of these user-accessible files can break
compatibility, even in cases where the original path is preserved via symbolic
links (such as when the path is accessed by a servlet that is configured to
not follow symbolic links).
#### Policy
* The layout of source code and build artifacts can change anytime, particularly so across major versions. Within a major version, the developers will attempt (no guarantees) to preserve the directory structure; however, individual files can be added/moved/deleted. The best way to ensure patches stay in sync with the code is to get them committed to the Apache source tree.
* The directory structure of configuration files, user logs, and job history will be preserved across minor and point releases within a major release.
The layout of source code and build artifacts SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Unstable](./InterfaceClassification.html#Unstable). Within a major version,
the developer community SHOULD preserve the
overall directory structure, though individual files MAY be added, moved, or
deleted with no warning.
The directory structure of configuration files, user logs, and job history SHALL
be considered [Public](./InterfaceClassification.html#Public) and
[Evolving](./InterfaceClassification.html#Evolving).
### Java Classpath
User applications built against Hadoop might add all Hadoop jars (including Hadoop's library dependencies) to the application's classpath. Adding new dependencies or updating the version of existing dependencies may interfere with those in applications' classpaths.
Hadoop provides several client artifacts that applications use to interact
with the system. These artifacts typically have their own dependencies on
common libraries. In the cases where these dependencies are exposed to
end user applications or downstream consumers (i.e. not
[shaded](https://stackoverflow.com/questions/13620281/what-is-the-maven-shade-plugin-used-for-and-why-would-you-want-to-relocate-java)),
changes to these dependencies can be disruptive. Developers are strongly
encouraged to avoid exposing dependencies to clients by using techniques
such as
[shading](https://stackoverflow.com/questions/13620281/what-is-the-maven-shade-plugin-used-for-and-why-would-you-want-to-relocate-java).
With regard to dependencies, adding a dependency is an incompatible change,
whereas removing a dependency is a compatible change.
Some user applications built against Hadoop may add all Hadoop JAR files
(including Hadoop's library dependencies) to the application's classpath.
Adding new dependencies or updating the versions of existing dependencies may
interfere with those in applications' classpaths and hence their correct
operation. Users are therefore discouraged from adopting this practice.
#### Policy
Currently, there is NO policy on when Hadoop's dependencies can change.
The set of dependencies exposed by the Hadoop client artifacts SHALL be
considered [Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable). Any dependencies that are not
exposed to clients (either because they are shaded or only exist in non-client
artifacts) SHALL be considered [Private](./InterfaceClassification.html#Private)
and [Unstable](./InterfaceClassification.html#Unstable).
### Environment variables
Users and related projects often utilize the exported environment variables (eg HADOOP\_CONF\_DIR), therefore removing or renaming environment variables is an incompatible change.
Users and related projects often utilize the environment variables exported by
Hadoop (e.g. HADOOP\_CONF\_DIR). Removing or renaming environment variables can
therefore impact end user applications.
#### Policy
Currently, there is NO policy on when the environment variables can change. Developers try to limit changes to major releases.
The environment variables consumed by Hadoop and the environment variables made
accessible to applications through YARN SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Evolving](./InterfaceClassification.html#Evolving).
The developer community SHOULD limit changes to major releases.
### Build artifacts
Hadoop uses maven for project management and changing the artifacts can affect existing user workflows.
Hadoop uses Maven for project management. Changes to the contents of
generated artifacts can impact existing user applications.
#### Policy
* Test artifacts: The test jars generated are strictly for internal use and are not expected to be used outside of Hadoop, similar to APIs annotated @Private, @Unstable.
* Built artifacts: The hadoop-client artifact (maven groupId:artifactId) stays compatible within a major release, while the other artifacts can change in incompatible ways.
The contents of Hadoop test artifacts SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Unstable](./InterfaceClassification.html#Unstable). Test artifacts include
all JAR files generated from test source code and all JAR files that include
"tests" in the file name.
The Hadoop client artifacts SHALL be considered
[Public](./InterfaceClassification.html#Public) and
[Stable](./InterfaceClassification.html#Stable). Client artifacts are the
following:
* hadoop-client
* hadoop-client-api
* hadoop-client-minicluster
* hadoop-client-runtime
* hadoop-hdfs-client
* hadoop-hdfs-native-client
* hadoop-mapreduce-client-app
* hadoop-mapreduce-client-common
* hadoop-mapreduce-client-core
* hadoop-mapreduce-client-hs
* hadoop-mapreduce-client-hs-plugins
* hadoop-mapreduce-client-jobclient
* hadoop-mapreduce-client-nativetask
* hadoop-mapreduce-client-shuffle
* hadoop-yarn-client
All other build artifacts SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Unstable](./InterfaceClassification.html#Unstable).
### Hardware/Software Requirements
To keep up with the latest advances in hardware, operating systems, JVMs, and other software, new Hadoop releases or some of their features might require higher versions of the same. For a specific environment, upgrading Hadoop might require upgrading other dependent software components.
To keep up with the latest advances in hardware, operating systems, JVMs, and
other software, new Hadoop releases may include features that require
newer hardware, operating system releases, or JVM versions than previous
Hadoop releases. For a specific environment, upgrading Hadoop might require
upgrading other dependent software components.
#### Policies
* Hardware
* Architecture: The community has no plans to restrict Hadoop to specific architectures, but can have family-specific optimizations.
* Minimum resources: While there are no guarantees on the minimum resources required by Hadoop daemons, the community attempts to not increase requirements within a minor release.
* Operating Systems: The community will attempt to maintain the same OS requirements (OS kernel versions) within a minor release. Currently GNU/Linux and Microsoft Windows are the OSes officially supported by the community while Apache Hadoop is known to work reasonably well on other OSes such as Apple MacOSX, Solaris etc.
* The JVM requirements will not change across point releases within the same minor release except if the JVM version under question becomes unsupported. Minor/major releases might require later versions of JVM for some/all of the supported operating systems.
* Other software: The community tries to maintain the minimum versions of additional software required by Hadoop. For example, ssh, kerberos etc.
* Minimum resources: While there are no guarantees on the minimum resources required by Hadoop daemons, the developer community SHOULD avoid increasing requirements within a minor release.
* Operating Systems: The community SHOULD maintain the same minimum OS requirements (OS kernel versions) within a minor release. Currently GNU/Linux and Microsoft Windows are the OSes officially supported by the community, while Apache Hadoop is known to work reasonably well on other OSes such as Apple MacOSX, Solaris, etc.
* The JVM requirements SHALL NOT change across minor releases within the same major release unless the JVM version in question becomes unsupported. The JVM version requirement MAY be different for different operating systems or even operating system releases.
* File systems supported by Hadoop, e.g. through the HDFS FileSystem API, SHOULD NOT become unsupported between minor releases within a major version unless a migration path to an alternate client implementation is available.
References
----------


@ -66,54 +66,103 @@ Hadoop uses the following kinds of audience in order of increasing/wider visibil
#### Private
The interface is for internal use within the project (such as HDFS or MapReduce)
and should not be used by applications or by other projects. It is subject to
change at anytime without notice. Most interfaces of a project are Private (also
referred to as project-private).
A Private interface is for internal use within the project (such as HDFS or
MapReduce) and should not be used by applications or by other projects. Most
interfaces of a project are Private (also referred to as project-private).
Unless an interface is intentionally exposed for external consumption, it should
be marked Private.
#### Limited-Private
The interface is used by a specified set of projects or systems (typically
closely related projects). Other projects or systems should not use the
interface. Changes to the interface will be communicated/negotiated with the
A Limited-Private interface is used by a specified set of projects or systems
(typically closely related projects). Other projects or systems should not use
the interface. Changes to the interface will be communicated/negotiated with the
specified projects. For example, in the Hadoop project, some interfaces are
LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and
MapReduce projects.
#### Public
The interface is for general use by any application.
A Public interface is for general use by any application.
### Change Compatibility
Changes to an API fall into two broad categories: compatible and incompatible.
A compatible change is a change that meets the following criteria:
* no existing capabilities are removed,
* no existing capabilities are modified in a way that prevents their use by clients that were constructed to use the interface prior to the change, and
* no capabilities are added that require changes to clients that were constructed to use the interface prior to the change.
Any change that does not meet these three criteria is an incompatible change.
Stated simply, a compatible change will not break existing clients. These
examples are compatible changes:
* adding a method to a Java class,
* adding an optional parameter to a RESTful web service,
* adding a tag to an XML document, or
* making the audience annotation of an interface more broad (e.g. from Private to Public) or the change compatibility annotation more restrictive (e.g. from Evolving to Stable).
These examples are incompatible changes:
* removing a method from a Java class,
* adding a method to a Java interface,
* adding a required parameter to a RESTful web service,
* renaming a field in a JSON document, or
* making the audience annotation of an interface less broad (e.g. from Public to Limited Private) or the change compatibility annotation less restrictive (e.g. from Evolving to Unstable).
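
The interface example deserves a short illustration. Adding an abstract method to a Java interface breaks every existing implementer: the implementer no longer compiles, and an already compiled implementer fails with an `AbstractMethodError` if the new method is called. The sketch below uses hypothetical names (`Greeter`, `LegacyGreeter`) and shows one way to soften such a change, a Java 8 default method. It is an illustration only and not a statement of Hadoop policy.

```java
// Hypothetical Public interface; greet() is the method from the first release.
interface Greeter {
    String greet(String name);

    // Added in a later release. Because it carries a default body, classes
    // written against the first release still compile and run. Declaring it
    // without a body would break LegacyGreeter below.
    default String farewell(String name) {
        return "Goodbye, " + name;
    }
}

// A client written against the first release; it knows nothing of farewell().
final class LegacyGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

final class InterfaceEvolutionDemo {
    public static void main(String[] args) {
        Greeter greeter = new LegacyGreeter();
        System.out.println(greeter.greet("Hadoop"));
        // Falls back to the default body supplied by the interface.
        System.out.println(greeter.farewell("Hadoop"));
    }
}
```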
### Stability
Stability denotes how stable an interface is, as in when incompatible changes to
the interface are allowed. Hadoop APIs have the following levels of stability.
Stability denotes how stable an interface is and when compatible and
incompatible changes to the interface are allowed. Hadoop APIs have the
following levels of stability.
#### Stable
Can evolve while retaining compatibility for minor release boundaries; in other
words, incompatible changes to APIs marked as Stable are allowed only at major
releases (i.e. at m.0).
A Stable interface is exposed as a preferred means of communication. A Stable
interface is expected not to change incompatibly within a major release and
hence serves as a safe development target. A Stable interface may evolve
compatibly between minor releases.
Incompatible changes allowed: major (X.0.0)
Compatible changes allowed: maintenance (x.Y.0)
#### Evolving
Evolving, but incompatible changes are allowed at minor releases (i.e. m .x)
An Evolving interface is typically exposed so that users or external code can
make use of a feature before it has stabilized. The expectation that an
interface should "eventually" stabilize and be promoted to Stable, however,
is not a requirement for the interface to be labeled as Evolving.
Incompatible changes are allowed for Evolving interfaces only at minor releases.
Incompatible changes allowed: minor (x.Y.0)
Compatible changes allowed: maintenance (x.y.Z)
#### Unstable
Incompatible changes to Unstable APIs are allowed at any time. This usually makes
sense for only private interfaces.
An Unstable interface is one for which no compatibility guarantees are made. An
Unstable interface is not necessarily unstable. An Unstable interface is
typically exposed because a user or external code needs to access an interface
that is not intended for consumption. The interface is exposed as an Unstable
interface to state clearly that even though the interface is exposed, it is not
the preferred access path, and no compatibility guarantees are made for it.
However one may call this out for a supposedly public interface to highlight
that it should not be used as an interface; for public interfaces, labeling it
as Not-an-interface is probably more appropriate than "Unstable".
Unless there is a reason to offer a compatibility guarantee on an interface,
whether it is exposed or not, it should be labeled as Unstable. Private
interfaces also should be Unstable in most cases.
Examples of publicly visible interfaces that are unstable
(i.e. not-an-interface): GUI, CLIs whose output format will change.
Incompatible changes to Unstable interfaces are allowed at any time.
Incompatible changes allowed: maintenance (x.y.Z)
Compatible changes allowed: maintenance (x.y.Z)
#### Deprecated
APIs that could potentially be removed in the future and should not be used.
A Deprecated interface could potentially be removed in the future and should
not be used. Even so, a Deprecated interface will continue to function until
it is removed. When a Deprecated interface can be removed depends on whether
it is also Stable, Evolving, or Unstable.
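
The relationship between stability levels and release types can be summarized in a small lookup. The sketch below is illustrative only: the `ReleaseType` and `Stability` enums are hypothetical and do not exist in Hadoop, and the sketch assumes that a larger release may carry any incompatible change that a smaller release may carry.

```java
/** Release types, ordered from largest to smallest. Hypothetical, for illustration. */
enum ReleaseType { MAJOR, MINOR, MAINTENANCE }

/** Stability levels paired with the smallest release that may break them. */
enum Stability {
    STABLE(ReleaseType.MAJOR),          // X.0.0
    EVOLVING(ReleaseType.MINOR),        // x.Y.0
    UNSTABLE(ReleaseType.MAINTENANCE);  // x.y.Z

    private final ReleaseType smallestBreakingRelease;

    Stability(ReleaseType smallestBreakingRelease) {
        this.smallestBreakingRelease = smallestBreakingRelease;
    }

    /**
     * Returns true if an incompatible change to an interface with this
     * stability level may ship in the given release type. Assumes a larger
     * release may include whatever a smaller release may include.
     */
    boolean allowsIncompatibleChangeIn(ReleaseType release) {
        return release.compareTo(smallestBreakingRelease) <= 0;
    }
}

final class StabilityDemo {
    public static void main(String[] args) {
        for (Stability s : Stability.values()) {
            for (ReleaseType r : ReleaseType.values()) {
                System.out.println(s + " in " + r + ": " + s.allowsIncompatibleChangeIn(r));
            }
        }
    }
}
```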
How are the Classifications Recorded?
-------------------------------------
@ -121,95 +170,101 @@ How are the Classifications Recorded?
How will the classification be recorded for Hadoop APIs?
* Each interface or class will have the audience and stability recorded using
annotations in org.apache.hadoop.classification package.
annotations in the org.apache.hadoop.classification package.
* The javadoc generated by the maven target javadoc:javadoc lists only the public API.
* The javadoc generated by the maven target javadoc:javadoc lists only the
public API.
* One can derive the audience of java classes and java interfaces by the
audience of the package in which they are contained. Hence it is useful to
declare the audience of each java package as public or private (along with the
private audience variations).
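
To show what recording a classification with annotations looks like, the sketch below declares local stand-ins for the `InterfaceAudience` and `InterfaceStability` annotations so that it compiles without the Hadoop annotations JAR. In real code these annotations come from the org.apache.hadoop.classification package. The annotated types (`RecordSource`, `InternalBuffer`) and the `describe` helper are hypothetical.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Local stand-ins for the annotations in org.apache.hadoop.classification.
final class InterfaceAudience {
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Private {}
    private InterfaceAudience() {}
}

final class InterfaceStability {
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}
    private InterfaceStability() {}
}

// A hypothetical interface intended for general use.
@InterfaceAudience.Public
@InterfaceStability.Stable
interface RecordSource {
    String next();
}

// A hypothetical implementation detail with no compatibility guarantee.
@InterfaceAudience.Private
@InterfaceStability.Unstable
final class InternalBuffer {}

final class ClassificationDemo {
    /** Reads the audience and stability annotations of a type via reflection. */
    static String describe(Class<?> type) {
        String audience =
            type.isAnnotationPresent(InterfaceAudience.Public.class) ? "Public"
            : type.isAnnotationPresent(InterfaceAudience.Private.class) ? "Private"
            : "unclassified";
        String stability =
            type.isAnnotationPresent(InterfaceStability.Stable.class) ? "Stable"
            : type.isAnnotationPresent(InterfaceStability.Evolving.class) ? "Evolving"
            : type.isAnnotationPresent(InterfaceStability.Unstable.class) ? "Unstable"
            : "unclassified";
        return audience + "/" + stability;
    }

    public static void main(String[] args) {
        System.out.println(describe(RecordSource.class));
        System.out.println(describe(InternalBuffer.class));
    }
}
```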
How will the classification be recorded for other interfaces, such as CLIs?
* See the [Hadoop Compatibility](Compatibility.html) page for details.
FAQ
---
* Why aren't the Java scopes (private, package private and public) good enough?
* Java's scoping is not very complete. One is often forced to make a class
public in order for other internal components to use it. It does not have
friends or sub-package-private like C++.
public in order for other internal components to use it. It also does not
have friends or sub-package-private like C++.
* But I can easily access a private implementation interface if it is Java public.
Where is the protection and control?
* The purpose of this is not providing absolute access control. Its purpose
is to communicate to users and developers. One can access private
implementation functions in libc; however if they change the internal
implementation details, your application will break and you will have
little sympathy from the folks who are supplying libc. If you use a
non-public interface you understand the risks.
* But I can easily access a Private interface if it is Java public. Where is the
protection and control?
* The purpose of this classification scheme is not to provide absolute
access control. Its purpose is to communicate to users and developers.
One can access private implementation functions in libc; however if
they change the internal implementation details, the application will
break and one will receive little sympathy from the folks who are
supplying libc. When using a non-public interface, the risks are
understood.
* Why bother declaring the stability of a private interface?
Aren't private interfaces always unstable?
* Private interfaces are not always unstable. In the cases where they are
stable they capture internal properties of the system and can communicate
* Why bother declaring the stability of a Private interface? Aren't Private
interfaces always Unstable?
* Private interfaces are not always Unstable. In the cases where they are
Stable they capture internal properties of the system and can communicate
these properties to its internal users and to developers of the interface.
* e.g. In HDFS, NN-DN protocol is private but stable and can help
implement rolling upgrades. It communicates that this interface should
not be changed in incompatible ways even though it is private.
* e.g. In HDFS, FSImage stability provides more flexible rollback.
* e.g. In HDFS, NN-DN protocol is Private but Stable and can help
implement rolling upgrades. The stability annotation communicates that
this interface should not be changed in incompatible ways even though
it is Private.
* e.g. In HDFS, the Stable designation of the FSImage provides more flexible
rollback.
* What is the harm in applications using a private interface that is stable? How
is it different than a public stable interface?
* While a private interface marked as stable is targeted to change only at
* What is the harm in applications using a Private interface that is Stable?
How is it different from a Public Stable interface?
* While a Private interface marked as Stable is targeted to change only at
major releases, it may break at other times if the providers of that
interface are willing to change the internal users of that
interface. Further, a public stable interface is less likely to break even
interface also are willing to change the internal consumers of that
interface. Further, a Public Stable interface is less likely to break even
at major releases (even though it is allowed to break compatibility)
because the impact of the change is larger. If you use a private interface
because the impact of the change is larger. If you use a Private interface
(regardless of its stability) you run the risk of incompatibility.
* Why bother with Limited-private? Isn't it giving special treatment to some projects?
That is not fair.
* First, most interfaces should be public or private; actually let us state
it even stronger: make it private unless you really want to expose it to
public for general use.
* Limited-private is for interfaces that are not intended for general
* Why bother with Limited-Private? Isn't it giving special treatment to some
projects? That is not fair.
* Most interfaces should be Public or Private. An interface should be
Private unless it is explicitly intended for general use.
* Limited-Private is for interfaces that are not intended for general
use. They are exposed to related projects that need special hooks. Such a
classification has a cost to both the supplier and consumer of the limited
classification has a cost to both the supplier and consumer of the
interface. Both will have to work together if ever there is a need to
break the interface in the future; for example the supplier and the
consumers will have to work together to get coordinated releases of their
respective projects. This should not be taken lightly; if you can get
away with private then do so; if the interface is really for general use
for all applications then do so. But remember that making an interface
public has huge responsibility. Sometimes Limited-private is just right.
* A good example of a limited-private interface is BlockLocations, This is a
fairly low-level interface that we are willing to expose to MR and perhaps
HBase. We are likely to change it down the road and at that time we will
coordinate release effort with the MR team.
While MR and HDFS are always released in sync today, they may
change down the road.
* If you have a limited-private interface with many projects listed then you
are fooling yourself. It is practically public.
* It might be worth declaring a special audience classification called
Hadoop-Private for the Hadoop family.
respective projects. This contract should not be taken lightly: use
Private if possible; if the interface is really for general use
for all applications then use Public. Always remember that making an
interface Public comes with a large burden of responsibility. Sometimes
Limited-Private is just right.
* A good example of a Limited-Private interface is BlockLocations. This
interface is a fairly low-level interface that is exposed to MapReduce
and HBase. The interface is likely to change down the road, and at that
time the release effort will have to be coordinated with the
MapReduce development team. While MapReduce and HDFS are always released
in sync today, that policy may change down the road.
* If you have a Limited-Private interface with many projects listed then
the interface is probably a good candidate to be made Public.
* Lets treat all private interfaces as Hadoop-private. What is the harm in
projects in the Hadoop family have access to private classes?
* Do we want MR accessing class files that are implementation details inside
HDFS. There used to be many such layer violations in the code that we have
been cleaning up over the last few years. We dont want such layer
violations to creep back in by no separating between the major components
like HDFS and MR.
* Let's treat all Private interfaces as Limited-Private for all of Hadoop. What
is the harm if projects in the Hadoop family have access to private classes?
* There used to be many cases in the code where one project depended on the
internal implementation details of another. A significant effort went
into cleaning up those issues. Opening up all interfaces as
Limited-Private for all of Hadoop would open the door to reintroducing
such coupling issues.
* Aren't all public interfaces stable?
* One may mark a public interface as evolving in its early days. Here one is
* Aren't all Public interfaces Stable?
* One may mark a Public interface as Evolving in its early days. Here one is
promising to make an effort to make compatible changes but may need to
break it at minor releases.
* One example of a public interface that is unstable is where one is
* One example of a Public interface that is Unstable is where one is
providing an implementation of a standards-body based interface that is
still under development. For example, many companies, in an attempt to be
first to market, have provided implementations of a new NFS protocol even
when the protocol was not fully completed by IETF. The implementor cannot
evolve the interface in a fashion that causes least distruption because
evolve the interface in a fashion that causes the least disruption because
the stability is controlled by the standards body. Hence it is appropriate
to label the interface as unstable.
to label the interface as Unstable.
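
As a rough illustration of the points above, the sketch below shows how an audience and stability classification might be attached to a class and inspected. It is not Hadoop code: the `LimitedPrivate` and `Evolving` annotations are local stand-ins modeled on Hadoop's `@InterfaceAudience.LimitedPrivate` and `@InterfaceStability.Evolving` so that the example compiles on its own, and `BlockLocationsLike`, `allowedConsumers`, and `practicallyPublic` (with its threshold) are hypothetical names invented for this example.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClassificationSketch {
  // Hypothetical stand-in for an audience annotation that names the
  // projects allowed to depend on the annotated interface.
  @Retention(RetentionPolicy.RUNTIME)
  @interface LimitedPrivate {
    String[] value();
  }

  // Hypothetical stand-in for a stability annotation.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Evolving {
  }

  // A low-level class exposed only to two named consumers.
  @LimitedPrivate({"HDFS", "MapReduce"})
  @Evolving
  static class BlockLocationsLike {
  }

  // Returns the projects listed in the audience annotation, or an empty
  // list when the type carries no such annotation.
  static List<String> allowedConsumers(Class<?> type) {
    LimitedPrivate audience = type.getAnnotation(LimitedPrivate.class);
    if (audience == null) {
      return Collections.emptyList();
    }
    return Arrays.asList(audience.value());
  }

  // Flags a type whose consumer list is longer than an arbitrary,
  // assumed threshold, hinting that it may be a candidate for Public.
  static boolean practicallyPublic(Class<?> type, int threshold) {
    return allowedConsumers(type).size() > threshold;
  }

  public static void main(String[] args) {
    System.out.println(allowedConsumers(BlockLocationsLike.class));
    System.out.println(practicallyPublic(BlockLocationsLike.class, 3));
  }
}
```

A check of this kind could in principle run as part of a build to surface Limited-Private interfaces whose consumer lists have grown long; the threshold of 3 used in `main` is arbitrary and only for demonstration.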