diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt index e00f5c0f2ed..e0fb93bfcc0 100644 --- a/hadoop-common-project/hadoop-common/CHANGES.txt +++ b/hadoop-common-project/hadoop-common/CHANGES.txt @@ -87,6 +87,8 @@ Release 2.0.5-beta - UNRELEASED helping YARN ResourceManager to reuse code for RM restart. (Jian He via vinodkv) + HADOOP-7391 Document Interface Classification from HADOOP-5073 (sanjay Radia) + OPTIMIZATIONS HADOOP-9150. Avoid unnecessary DNS resolution attempts for logical URIs diff --git a/hadoop-common-project/hadoop-common/src/site/apt/InterfaceClassification.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/InterfaceClassification.apt.vm new file mode 100644 index 00000000000..811cfe54109 --- /dev/null +++ b/hadoop-common-project/hadoop-common/src/site/apt/InterfaceClassification.apt.vm @@ -0,0 +1,241 @@ +~~ Licensed under the Apache License, Version 2.0 (the "License"); +~~ you may not use this file except in compliance with the License. +~~ You may obtain a copy of the License at +~~ +~~ http://www.apache.org/licenses/LICENSE-2.0 +~~ +~~ Unless required by applicable law or agreed to in writing, software +~~ distributed under the License is distributed on an "AS IS" BASIS, +~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +~~ See the License for the specific language governing permissions and +~~ limitations under the License. See accompanying LICENSE file. + + --- + Hadoop Interface Taxonomy: Audience and Stability Classification + --- + --- + ${maven.build.timestamp} + +Hadoop Interface Taxonomy: Audience and Stability Classification + + \[ {{{./index.html}Go Back}} \] + +%{toc|section=1|fromDepth=0} + +* Motivation + + The interface taxonomy classification provided here is for guidance to + developers and users of interfaces. The classification guides a developer + to declare the targeted audience or users of an interface and also its + stability. + + * Benefits to the user of an interface: Knows which interfaces to use or not + use and their stability. + + * Benefits to the developer: to prevent accidental changes of interfaces and + hence accidental impact on users or other components or system. This is + particularly useful in large systems with many developers who may not all + have a shared state/history of the project. + +* Interface Classification + + Hadoop adopts the following interface classification, + this classification was derived from the + {{{http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice}OpenSolaris taxonomy}} + and, to some extent, from taxonomy used inside Yahoo. Interfaces have two main + attributes: Audience and Stability + +** Audience + + Audience denotes the potential consumers of the interface. While many + interfaces are internal/private to the implementation, + other are public/external interfaces are meant for wider consumption by + applications and/or clients. For example, in posix, libc is an external or + public interface, while large parts of the kernel are internal or private + interfaces. Also, some interfaces are targeted towards other specific + subsystems. + + Identifying the audience of an interface helps define the impact of + breaking it. For instance, it might be okay to break the compatibility of + an interface whose audience is a small number of specific subsystems. On + the other hand, it is probably not okay to break a protocol interfaces + that millions of Internet users depend on. + + Hadoop uses the following kinds of audience in order of + increasing/wider visibility: + + * Private: + + * The interface is for internal use within the project (such as HDFS or + MapReduce) and should not be used by applications or by other projects. It + is subject to change at anytime without notice. Most interfaces of a + project are Private (also referred to as project-private). + + * Limited-Private: + + * The interface is used by a specified set of projects or systems + (typically closely related projects). Other projects or systems should not + use the interface. Changes to the interface will be communicated/ + negotiated with the specified projects. For example, in the Hadoop project, + some interfaces are LimitedPrivate\{HDFS, MapReduce\} in that they + are private to the HDFS and MapReduce projects. + + * Public + + * The interface is for general use by any application. + + Hadoop doesn't have a Company-Private classification, + which is meant for APIs which are intended to be used by other projects + within the company, since it doesn't apply to opensource projects. Also, + certain APIs are annotated as @VisibleForTesting (from com.google.common + .annotations.VisibleForTesting) - these are meant to be used strictly for + unit tests and should be treated as "Private" APIs. + +** Stability + + Stability denotes how stable an interface is, as in when incompatible + changes to the interface are allowed. Hadoop APIs have the following + levels of stability. + + * Stable + + * Can evolve while retaining compatibility for minor release boundaries; + in other words, incompatible changes to APIs marked Stable are allowed + only at major releases (i.e. at m.0). + + * Evolving + + * Evolving, but incompatible changes are allowed at minor release (i.e. m + .x) + + * Unstable + + * Incompatible changes to Unstable APIs are allowed any time. This + usually makes sense for only private interfaces. + + * However one may call this out for a supposedly public interface to + highlight that it should not be used as an interface; for public + interfaces, labeling it as Not-an-interface is probably more appropriate + than "Unstable". + + * Examples of publicly visible interfaces that are unstable (i.e. + not-an-interface): GUI, CLIs whose output format will change + + * Deprecated + + * APIs that could potentially removed in the future and should not be + used. + +* How are the Classifications Recorded? + + How will the classification be recorded for Hadoop APIs? + + * Each interface or class will have the audience and stability recorded + using annotations in org.apache.hadoop.classification package. + + * The javadoc generated by the maven target javadoc:javadoc lists only the + public API. + + * One can derive the audience of java classes and java interfaces by the + audience of the package in which they are contained. Hence it is useful to + declare the audience of each java package as public or private (along with + the private audience variations). + +* FAQ + + * Why aren’t the java scopes (private, package private and public) good + enough? + + * Java’s scoping is not very complete. One is often forced to make a class + public in order for other internal components to use it. It does not have + friends or sub-package-private like C++. + + * But I can easily access a private implementation interface if it is Java + public. Where is the protection and control? + + * The purpose of this is not providing absolute access control. Its purpose + is to communicate to users and developers. One can access private + implementation functions in libc; however if they change the internal + implementation details, your application will break and you will have little + sympathy from the folks who are supplying libc. If you use a non-public + interface you understand the risks. + + * Why bother declaring the stability of a private interface? Aren’t private + interfaces always unstable? + + * Private interfaces are not always unstable. In the cases where they are + stable they capture internal properties of the system and can communicate + these properties to its internal users and to developers of the interface. + + * e.g. In HDFS, NN-DN protocol is private but stable and can help + implement rolling upgrades. It communicates that this interface should not + be changed in incompatible ways even though it is private. + + * e.g. In HDFS, FSImage stability can help provide more flexible roll + backs. + + * What is the harm in applications using a private interface that is + stable? How is it different than a public stable interface? + + * While a private interface marked as stable is targeted to change only at + major releases, it may break at other times if the providers of that + interface are willing to changes the internal users of that interface. + Further, a public stable interface is less likely to break even at major + releases (even though it is allowed to break compatibility) because the + impact of the change is larger. If you use a private interface (regardless + of its stability) you run the risk of incompatibility. + + * Why bother with Limited-private? Isn’t it giving special treatment to some + projects? That is not fair. + + * First, most interfaces should be public or private; actually let us state + it even stronger: make it private unless you really want to expose it to + public for general use. + + * Limited-private is for interfaces that are not intended for general use. + They are exposed to related projects that need special hooks. Such a + classification has a cost to both the supplier and consumer of the limited + interface. Both will have to work together if ever there is a need to break + the interface in the future; for example the supplier and the consumers will + have to work together to get coordinated releases of their respective + projects. This should not be taken lightly – if you can get away with + private then do so; if the interface is really for general use for all + applications then do so. But remember that making an interface public has + huge responsibility. Sometimes Limited-private is just right. + + * A good example of a limited-private interface is BlockLocations, This is + fairly low-level interface that we are willing to expose to MR and perhaps + HBase. We are likely to change it down the road and at that time we will + have get a coordinated effort with the MR team to release matching releases. + While MR and HDFS are always released in sync today, they may change down + the road. + + * If you have a limited-private interface with many projects listed then + you are fooling yourself. It is practically public. + + * It might be worth declaring a special audience classification called + Hadoop-Private for the Hadoop family. + + * Lets treat all private interfaces as Hadoop-private. What is the harm in + projects in the Hadoop family have access to private classes? + + * Do we want MR accessing class files that are implementation details + inside HDFS. There used to be many such layer violations in the code that + we have been cleaning up over the last few years. We don’t want such + layer violations to creep back in by no separating between the major + components like HDFS and MR. + + * Aren't all public interfaces stable? + + * One may mark a public interface as evolving in its early days. + Here one is promising to make an effort to make compatible changes but may + need to break it at minor releases. + + * One example of a public interface that is unstable is where one is providing + an implementation of a standards-body based interface that is still under development. + For example, many companies, in an attampt to be first to market, + have provided implementations of a new NFS protocol even when the protocol was not + fully completed by IETF. + The implementor cannot evolve the interface in a fashion that causes least distruption + because the stability is controlled by the standards body. Hence it is appropriate to + label the interface as unstable.