<li><ahref="#Java_Binary_compatibility_for_end-user_applications_i.e._Apache_Hadoop_ABI">Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI</a></li></ul></li>
<p>This document captures the compatibility goals of the Apache Hadoop project. The different types of compatibility between Hadoop releases that affect Hadoop developers, downstream projects, and end-users are enumerated. For each type of compatibility this document will:</p>
<ul>
<li>describe the impact on downstream projects or end-users</li>
<li>where applicable, call out the policy adopted by the Hadoop developers when incompatible changes are permitted.</li>
</ul>
<p>All Hadoop interfaces are classified according to the intended audience and stability in order to maintain compatibility with previous releases. See the <ahref="./InterfaceClassification.html">Hadoop Interface Taxonomy</a> for details about the classifications.</p><section>
<p>This document is intended for consumption by the Hadoop developer community. This document describes the lens through which changes to the Hadoop project should be viewed. In order for end users and third party developers to have confidence about cross-release compatibility, the developer community must ensure that development efforts adhere to these policies. It is the responsibility of the project committers to validate that all changes either maintain compatibility or are explicitly marked as incompatible.</p>
<p>Within a component Hadoop developers are free to use Private and Limited Private APIs, but when using components from a different module Hadoop developers should follow the same guidelines as third-party developers: do not use Private or Limited Private (unless explicitly allowed) interfaces and prefer instead Stable interfaces to Evolving or Unstable interfaces where possible. Where not possible, the preferred solution is to expand the audience of the API rather than introducing or perpetuating an exception to these compatibility guidelines. When working within a Maven module Hadoop developers should observe where possible the same level of restraint with regard to using components located in other Maven modules.</p>
<p>Above all, Hadoop developers must be mindful of the impact of their changes. Stable interfaces must not change between major releases. Evolving interfaces must not change between minor releases. New classes and components must be labeled appropriately for audience and stability. See the <ahref="./InterfaceClassification.html">Hadoop Interface Taxonomy</a> for details about when the various labels are appropriate. As a general rule, all new interfaces and APIs should have the most limited labels (e.g. Private Unstable) that will not inhibit the intent of the interface or API.</p></section><section>
<h3><aname="Structure"></a>Structure</h3>
<p>This document is arranged in sections according to the various compatibility concerns. Within each section an introductory text explains what compatibility means in that section, why it’s important, and what the intent to support compatibility is. The subsequent “Policy” section then sets forth in specific terms what the governing policy is.</p></section><section>
<p>The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in <a class="externalLink" href="http://tools.ietf.org/html/rfc2119">RFC 2119</a>.</p></section></section><section>
<h2><aname="Deprecation"></a>Deprecation</h2>
<p>The Java API provides a @Deprecated annotation to mark an API element as flagged for removal. The standard meaning of the annotation is that the API element should not be used and may be removed in a later version.</p>
<p>In all cases removing an element from an API is an incompatible change. The stability of the element SHALL determine when such a change is permissible. A <ahref="./InterfaceClassification.html#Stable">Stable</a> element MUST be marked as deprecated for a full major release before it can be removed and SHALL NOT be removed in a minor or maintenance release. An <ahref="./InterfaceClassification.html#Evolving">Evolving</a> element MUST be marked as deprecated for a full minor release before it can be removed and SHALL NOT be removed during a maintenance release. An <ahref="./InterfaceClassification.html#Unstable">Unstable</a> element MAY be removed at any time. When possible an <ahref="./InterfaceClassification.html#Unstable">Unstable</a> element SHOULD be marked as deprecated for at least one release before being removed. For example, if a method is marked as deprecated in Hadoop 2.8, it cannot be removed until Hadoop 4.0.</p><section>
<h3><aname="Policy"></a>Policy</h3>
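<p>As an illustration of the deprecation window for Stable elements, the following minimal sketch computes the earliest major release in which removal is permitted. It takes the conservative reading suggested by the Hadoop 2.8 to Hadoop 4.0 example above, and also covers elements that were introduced already deprecated. The class and method names are hypothetical and are not part of Hadoop.</p>

```java
/** Hypothetical helper for illustration only; not a Hadoop class. */
final class DeprecationWindow {

    /**
     * Earliest major release in which a Stable element may be removed.
     *
     * @param deprecatedInMajor      major version of the release that first marked the element deprecated
     * @param introducedAsDeprecated true if the element was deprecated from the day it was added
     */
    static int earliestRemovalMajor(int deprecatedInMajor, boolean introducedAsDeprecated) {
        if (introducedAsDeprecated) {
            // Temporary measures MAY be removed in the following major release.
            return deprecatedInMajor + 1;
        }
        // Assumption: the element must stay deprecated for the whole of the
        // next major release line, so removal waits for the one after that.
        return deprecatedInMajor + 2;
    }

    public static void main(String[] args) {
        // Deprecated in Hadoop 2.8: removable in 4.0 at the earliest.
        System.out.println(earliestRemovalMajor(2, false) + ".0");
    }
}
```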
<p><ahref="./InterfaceClassification.html#Stable">Stable</a> API elements MUST NOT be removed until they have been marked as deprecated (through the @Deprecated annotation or other appropriate documentation) for a full major release. In the case that an API element was introduced as deprecated (to indicate that it is a temporary measure that is intended to be removed) the API element MAY be removed in the following major release. When modifying a <ahref="./InterfaceClassification.html#Stable">Stable</a> API, developers SHOULD prefer introducing a new method or endpoint and deprecating the existing one to making incompatible changes to the method or endpoint.</p></section></section><section>
<p>Developers SHOULD annotate all Hadoop interfaces and classes with the @InterfaceAudience and @InterfaceStability annotations to describe the intended audience and stability.</p>
<ul>
<li>@InterfaceAudience captures the intended audience. Possible values are <ahref="./InterfaceClassification.html#Public">Public</a> (for end users and external projects), Limited<ahref="./InterfaceClassification.html#Private">Private</a> (for other Hadoop components, and closely related projects like YARN, MapReduce, HBase etc.), and <ahref="./InterfaceClassification.html#Private">Private</a> (for intra component use).</li>
<li>@InterfaceStability describes what types of interface changes are permitted. Possible values are <ahref="./InterfaceClassification.html#Stable">Stable</a>, <ahref="./InterfaceClassification.html#Evolving">Evolving</a>, and <ahref="./InterfaceClassification.html#Unstable">Unstable</a>.</li>
<li>@Deprecated notes that the package, class, or member variable or method could potentially be removed in the future and should not be used.</li>
</ul>
<p>Annotations MAY be applied at the package, class, or method level. If a method has no privacy or stability annotation, it SHALL inherit its intended audience or stability level from the class to which it belongs. If a class has no privacy or stability annotation, it SHALL inherit its intended audience or stability level from the package to which it belongs. If a package has no privacy or stability annotation, it SHALL be assumed to be <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Unstable">Unstable</a>, respectively.</p>
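<p>The inheritance rule above, together with the rule in this section that the more restrictive audience wins when a method and its class disagree, can be sketched as follows. This is a simplified model rather than Hadoop code: the enum and method names are hypothetical, <code>null</code> stands for "no annotation", and conflicts between a class and its package are deliberately not modeled.</p>

```java
/** Hypothetical model of audience resolution; not a Hadoop class. */
final class EffectiveAudience {

    /** Declared from least to most restrictive, so ordinal order encodes restrictiveness. */
    enum Audience { PUBLIC, LIMITED_PRIVATE, PRIVATE }

    /** A class without an annotation inherits from its package; an unannotated package is Private. */
    static Audience forClass(Audience pkg, Audience cls) {
        if (cls != null) {
            return cls;
        }
        return pkg != null ? pkg : Audience.PRIVATE;
    }

    /** A method without an annotation inherits from its class; otherwise the stricter of the two applies. */
    static Audience forMethod(Audience pkg, Audience cls, Audience method) {
        Audience classAudience = forClass(pkg, cls);
        if (method == null) {
            return classAudience;
        }
        return method.compareTo(classAudience) >= 0 ? method : classAudience;
    }
}
```

<p>Under this sketch, a Public method in a Private class and a Private method in a Public class both resolve to Private, and a method with no annotations anywhere in its hierarchy also resolves to Private.</p>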
<p>In the event that an element’s audience or stability annotation conflicts with the corresponding annotation of its parent (whether explicit or inherited), the element’s audience or stability (respectively) SHALL be determined by the more restrictive annotation. For example, if a <ahref="./InterfaceClassification.html#Private">Private</a> method is contained in a <ahref="./InterfaceClassification.html#Public">Public</a> class, then the method SHALL be treated as <ahref="./InterfaceClassification.html#Private">Private</a>. If a <ahref="./InterfaceClassification.html#Public">Public</a> method is contained in a <ahref="./InterfaceClassification.html#Private">Private</a> class, the method SHALL be treated as <ahref="./InterfaceClassification.html#Private">Private</a>.</p><section>
<h4><aname="Use_Cases"></a>Use Cases</h4>
<ul>
<li><ahref="./InterfaceClassification.html#Public">Public</a>-<ahref="./InterfaceClassification.html#Stable">Stable</a> API compatibility is required to ensure end-user programs and downstream projects continue to work without modification.</li>
<li><ahref="./InterfaceClassification.html#Public">Public</a>-<ahref="./InterfaceClassification.html#Evolving">Evolving</a> API compatibility is useful to make functionality available for consumption before it is fully baked.</li>
<li>Limited Private-<ahref="./InterfaceClassification.html#Stable">Stable</a> API compatibility is required to allow upgrade of individual components across minor releases.</li>
<li><ahref="./InterfaceClassification.html#Private">Private</a>-<ahref="./InterfaceClassification.html#Stable">Stable</a> API compatibility is required for rolling upgrades.</li>
<li><ahref="./InterfaceClassification.html#Private">Private</a>-<ahref="./InterfaceClassification.html#Unstable">Unstable</a> API compatibility allows internal components to evolve rapidly without concern for downstream consumers, and is how most interfaces should be labeled.</li>
</ul></section><section>
<h4><aname="Policy"></a>Policy</h4>
<p>The compatibility policy SHALL be determined by the relevant package, class, or member variable or method annotations.</p>
<p>Note: APIs generated from the proto files MUST be compatible for rolling upgrades. See the section on wire protocol compatibility for more details. The compatibility policies for APIs and wire protocols must therefore go hand in hand.</p></section><section>
<p>Apache Hadoop strives to ensure that the behavior of APIs remains consistent across releases, though changes for correctness may result in changes in behavior. API behavior SHALL be specified by the JavaDoc API documentation where present and complete. When JavaDoc API documentation is not available, behavior SHALL be specified by the behavior expected by the related unit tests. In cases with no JavaDoc API documentation or unit test coverage, the expected behavior is presumed to be obvious and SHOULD be assumed to be the minimum functionality implied by the interface naming. The community is in the process of specifying some APIs more rigorously and enhancing test suites to verify compliance with the specification, effectively creating a formal specification for the subset of behaviors that can be easily tested.</p>
<p>The behavior of any API MAY be changed to fix incorrect behavior according to the stability of the API, with such a change to be accompanied by updating existing documentation and tests and/or adding new documentation or tests.</p></section><section>
<h4><aname="Java_Binary_compatibility_for_end-user_applications_i.e._Apache_Hadoop_ABI"></a>Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI</h4>
<p>Apache Hadoop revisions SHOULD retain binary compatibility such that end-user applications continue to work without any modifications. Minor Apache Hadoop revisions within the same major revision MUST retain compatibility such that existing MapReduce applications (e.g. end-user applications and projects such as Apache Pig, Apache Hive, et al), existing YARN applications (e.g. end-user applications and projects such as Apache Spark, Apache Tez et al), and applications that access HDFS directly (e.g. end-user applications and projects such as Apache HBase, Apache Flume, et al) work unmodified and without recompilation when used with any Apache Hadoop cluster within the same major release as the original build target.</p>
<p>For MapReduce applications in particular, i.e. applications using the org.apache.hadoop.mapred and/or org.apache.hadoop.mapreduce APIs, the developer community SHALL support binary compatibility across major releases. The MapReduce APIs SHALL be supported compatibly across major releases. See <ahref="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x</a> for more details.</p>
<p>Some applications may be affected by changes to disk layouts or other internal changes. See the sections that follow for policies on how incompatible changes to non-API interfaces are handled.</p></section></section><section>
<p>Hadoop includes several native components, including compression, the container executor binary, and various native integrations. These native components introduce a set of native dependencies for Hadoop, both at compile time and at runtime, such as cmake, gcc, zlib, etc. This set of native dependencies is part of the Hadoop ABI.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>The minimum required versions of the native components on which Hadoop depends at compile time and/or runtime SHALL be considered <ahref="./InterfaceClassification.html#Evolving">Evolving</a>. The minimum required versions SHOULD NOT increase between minor releases within a major version, though updates because of security issues, license issues, or other reasons MAY occur. When the native components on which Hadoop depends must be updated between minor releases within a major release, where possible the changes SHOULD only change the minor versions of the components without changing the major versions.</p></section></section><section>
<p>Wire compatibility concerns data being transmitted “over the wire” between Hadoop processes. Hadoop uses <aclass="externalLink"href="https://developers.google.com/protocol-buffers/">Protocol Buffers</a> for most RPC communication. Preserving compatibility requires prohibiting modification as described below. Non-RPC communication should be considered as well, for example using HTTP to transfer an HDFS image as part of snapshotting or transferring MapReduce map task output. The communications can be categorized as follows:</p>
<ul>
<li>Client-Server: communication between Hadoop clients and servers (e.g., the HDFS client to NameNode protocol, or the YARN client to ResourceManager protocol).</li>
<li>Client-Server (Admin): It is worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g., the HAAdmin protocol) as these protocols only impact administrators who can tolerate changes that end users (which use general Client-Server protocols) cannot.</li>
<li>Server-Server: communication between servers (e.g., the protocol between the DataNode and NameNode, or NodeManager and ResourceManager).</li>
</ul>
<p>The components of Apache Hadoop may have dependencies that include their own protocols, such as Zookeeper, S3, Kerberos, etc. These protocol dependencies SHALL be treated as internal protocols and governed by the same policy.</p></section><section>
<h4><aname="Transports"></a>Transports</h4>
<p>In addition to compatibility of the protocols themselves, maintaining cross-version communications requires that the transports supported also be stable. The most likely source of transport changes stems from secure transports, such as SSL. Upgrading a service from SSLv2 to SSLv3 may break existing SSLv2 clients. The minimum supported major version of any transports SHOULD NOT increase between minor releases within a major version, though updates because of security issues, license issues, or other reasons MAY occur. When a transport must be updated between minor releases within a major release, where possible the changes SHOULD only change the minor versions of the components without changing the major versions.</p>
<p>Service ports are considered as part of the transport mechanism. Default service port numbers must be kept consistent to prevent breaking clients.</p></section><section>
<h4><aname="Policy"></a>Policy</h4>
<p>Hadoop wire protocols are defined in .proto (ProtocolBuffers) files. Client-Server and Server-Server protocols SHALL be classified according to the audience and stability classifications noted in their .proto files. In cases where no classifications are present, the protocols SHOULD be assumed to be <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Stable">Stable</a>.</p>
<p>The following changes to a .proto file SHALL be considered compatible:</p>
<ul>
<li>Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code</li>
<li>Add a new rpc/method to the service</li>
<li>Add a new optional request to a Message</li>
<li>Rename a field</li>
<li>Rename a .proto file</li>
<li>Change .proto annotations that affect code generation (e.g. name of java package)</li>
</ul>
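<p>The first item in the list above expects receiving code to cope with a newly added optional field being absent when the peer runs an older version. A minimal sketch of that pattern follows. The message class is a hand-written stand-in that only imitates the <code>hasX()</code>/<code>getX()</code> accessor style of generated code; the message name, field name, and default value are all hypothetical.</p>

```java
import java.util.OptionalInt;

/** Hypothetical example; neither class exists in Hadoop. */
final class OptionalFieldDemo {

    /** Stand-in for a generated message with one newly added optional field. */
    static final class HeartbeatRequest {
        private final String nodeId;
        private final OptionalInt intervalMs; // empty when sent by an older peer

        HeartbeatRequest(String nodeId, OptionalInt intervalMs) {
            this.nodeId = nodeId;
            this.intervalMs = intervalMs;
        }

        String getNodeId() { return nodeId; }
        boolean hasIntervalMs() { return intervalMs.isPresent(); }
        int getIntervalMs() { return intervalMs.orElse(0); }
    }

    /** Behavior used before the field existed (assumed value for this sketch). */
    static final int DEFAULT_INTERVAL_MS = 3000;

    /** Falls back to the pre-existing behavior when the optional field is missing. */
    static int effectiveIntervalMs(HeartbeatRequest request) {
        return request.hasIntervalMs() ? request.getIntervalMs() : DEFAULT_INTERVAL_MS;
    }
}
```

<p>The point of the design is that a request from an older sender yields exactly the behavior it would have received before the field was introduced.</p>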
<p>The following changes to a .proto file SHALL be considered incompatible:</p>
<ul>
<li>Change an rpc/method name</li>
<li>Change an rpc/method parameter type or return type</li>
<li>Remove an rpc/method</li>
<li>Change the service name</li>
<li>Change the name of a Message</li>
<li>Modify a field type in an incompatible way (as defined recursively)</li>
<li>Change an optional field to required</li>
<li>Add or delete a required field</li>
<li>Delete an optional field as long as the optional field has reasonable defaults to allow deletions</li>
</ul>
<p>The following changes to a .proto file SHALL be considered incompatible:</p>
<ul>
<li>Change a field id</li>
<li>Reuse an old field that was previously deleted.</li>
</ul>
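<p>The two field id rules above lend themselves to an automated check. The sketch below compares field-name-to-id maps for the old and new versions of a message and reports changed or reused ids. It is an illustration under simplifying assumptions: fields are matched by name, so a field that is renamed and renumbered in the same change would go unnoticed, and all names are hypothetical rather than part of Hadoop or Protocol Buffers tooling.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker for illustration only. */
final class FieldIdCheck {

    /**
     * @param oldFields  field name to field id in the previously released message
     * @param retiredIds ids of fields that were deleted in earlier releases
     * @param newFields  field name to field id in the proposed message
     * @return one description per violated rule; empty if none were found
     */
    static List<String> violations(Map<String, Integer> oldFields,
                                   Set<Integer> retiredIds,
                                   Map<String, Integer> newFields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> field : newFields.entrySet()) {
            Integer previousId = oldFields.get(field.getKey());
            if (previousId != null && !previousId.equals(field.getValue())) {
                problems.add("field id changed: " + field.getKey());
            }
            if (retiredIds.contains(field.getValue())) {
                problems.add("deleted field id reused: " + field.getKey());
            }
        }
        return problems;
    }
}
```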
<p>Hadoop wire protocols that are not defined via .proto files SHOULD be considered to be <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Stable">Stable</a>.</p>
<p>In addition to the limitations imposed by being <ahref="./InterfaceClassification.html#Stable">Stable</a>, Hadoop’s wire protocols MUST also be forward compatible across minor releases within a major version according to the following:</p>
<ul>
<li>Client-Server compatibility MUST be maintained so as to allow users to continue using older clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.</li>
<li>Client-Server compatibility MUST be maintained so as to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.</li>
<li>Client-Server compatibility MUST be maintained so as to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.</li>
<li>Server-Server compatibility MUST be maintained so as to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.</li>
</ul>
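<p>The client and server pairings in the examples above share one property: both sides belong to the same major version. A minimal sketch of that check follows, assuming version strings of the form <code>major.minor.maintenance</code>. The class and method names are hypothetical. A <code>false</code> result only means this policy makes no promise for the pairing, not that the pairing is known to fail.</p>

```java
/** Hypothetical helper for illustration only; not a Hadoop class. */
final class WireCompat {

    /** Extracts the major component of a dotted version string such as "2.3.0". */
    static int major(String version) {
        return Integer.parseInt(version.substring(0, version.indexOf('.')));
    }

    /** True when the policy requires the two versions to interoperate on the wire. */
    static boolean mustInteroperate(String clientVersion, String serverVersion) {
        return major(clientVersion) == major(serverVersion);
    }

    public static void main(String[] args) {
        System.out.println(mustInteroperate("2.1.0", "2.3.0")); // older client, newer cluster
        System.out.println(mustInteroperate("2.4.0", "2.3.0")); // newer client, older cluster
    }
}
```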
<p>New transport mechanisms MUST only be introduced with minor or major version changes. Existing transport mechanisms MUST continue to be supported across minor versions within a major version. Default service port numbers SHALL be considered <ahref="./InterfaceClassification.html#Stable">Stable</a>.</p></section></section><section>
<h3><aname="REST_APIs"></a>REST APIs</h3>
<p>REST API compatibility applies to the exposed REST endpoints (URLs) and response data format. Hadoop REST APIs are specifically meant for stable use by clients across releases, even major ones. For purposes of this document, an exposed REST API is one that is documented in the public documentation. The following is a non-exhaustive list of the exposed REST APIs:</p>
<ul>
<li><a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server v1 REST API</a></li>
<li><a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service v2 REST API</a></li>
</ul>
<p>Each API has an API-specific version number. Any incompatible changes MUST increment the API version number.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>The exposed Hadoop REST APIs SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Evolving">Evolving</a>. With respect to API version numbers, the exposed Hadoop REST APIs SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Stable">Stable</a>, i.e. no incompatible changes are allowed within an API version number. A REST API version must be labeled as deprecated for a full major release before it can be removed.</p></section></section><section>
<h3><aname="Log_Output"></a>Log Output</h3>
<p>The Hadoop daemons and CLIs produce log output via Log4j that is intended to aid administrators and developers in understanding and troubleshooting cluster behavior. Log messages are intended for human consumption, though automation use cases are also supported.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>All log output SHALL be considered <ahref="./InterfaceClassification.html#Public">Public</a> and <ahref="./InterfaceClassification.html#Unstable">Unstable</a>. For log output, an incompatible change is one that renders a parser unable to find or recognize a line of log output.</p></section></section><section>
<p>Several components have audit logging systems that record system information in a machine readable format. Incompatible changes to that data format may break existing automation utilities. For the audit log, an incompatible change is any change to the format that leaves existing parsers unable to parse the logs.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>All audit log output SHALL be considered <ahref="./InterfaceClassification.html#Public">Public</a> and <ahref="./InterfaceClassification.html#Stable">Stable</a>. Any change to the data format SHALL be considered an incompatible change.</p></section></section><section>
<h3><aname="Metrics.2FJMX"></a>Metrics/JMX</h3>
<p>While the Metrics API compatibility is governed by Java API compatibility, the Metrics data format exposed by Hadoop MUST be maintained as compatible for consumers of the data, e.g. for automation tasks.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>The data format exposed via Metrics SHALL be considered <ahref="./InterfaceClassification.html#Public">Public</a> and <ahref="./InterfaceClassification.html#Stable">Stable</a>.</p></section></section><section>
<h3><a name="File_formats_.26_Metadata"></a>File formats &amp; Metadata</h3>
<p>User and system level data (including metadata) is stored in files of various formats. Changes to the metadata or the file formats used to store data/metadata can lead to incompatibilities between versions. Each class of file format is addressed below.</p><section>
<p>Changes to formats that end users use to store their data can prevent them from accessing the data in later releases, so it is important that these formats remain compatible. Examples of these formats include har, war, SequenceFileFormat, etc.</p><section>
<h5><aname="Policy"></a>Policy</h5>
<p>User-level file formats SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Stable">Stable</a>. User-level file format changes SHOULD be made forward compatible across major releases and MUST be made forward compatible within a major release. The developer community SHOULD prefer the creation of a new derivative file format to making incompatible changes to an existing file format. Such new file formats MUST be created as opt-in, meaning that users must be able to continue using the existing compatible format until and unless they explicitly opt in to using the new file format.</p></section></section><section>
<h4><aname="System-internal_data_schemas"></a>System-internal data schemas</h4>
<p>Hadoop internal data may also be stored in files or other data stores. Changing the schemas of these data stores can lead to incompatibilities.</p><section>
<h5><aname="MapReduce"></a>MapReduce</h5>
<p>MapReduce uses formats like I-File to store MapReduce-specific data.</p><section>
<h6><aname="Policy"></a>Policy</h6>
<p>All MapReduce-internal file formats, such as I-File format or the job history server’s jhist file format, SHALL be considered <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Stable">Stable</a>.</p></section></section><section>
<h5><aname="HDFS_Metadata"></a>HDFS Metadata</h5>
<p>HDFS persists metadata (the image and edit logs) in a private file format. Incompatible changes to either the format or the metadata prevent subsequent releases from reading older metadata. Incompatible changes must include a process by which existing metadata may be upgraded.</p>
<p>Depending on the degree of incompatibility in the changes, the following potential scenarios can arise:</p>
<ul>
<li>Automatic: The image upgrades automatically, no need for an explicit “upgrade”.</li>
<li>Direct: The image is upgradeable, but might require one explicit release “upgrade”.</li>
<li>Indirect: The image is upgradeable, but might require upgrading to intermediate release(s) first.</li>
<li>Not upgradeable: The image is not upgradeable.</li>
</ul>
<p>HDFS data nodes store data in a private directory structure. Incompatible changes to the directory structure may prevent older releases from accessing stored data. Incompatible changes must include a process by which existing data directories may be upgraded.</p><section>
<h6><aname="Policy"></a>Policy</h6>
<p>The HDFS metadata format SHALL be considered <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Evolving">Evolving</a>. Incompatible changes MUST include a process by which existing metadata may be upgraded. The upgrade process SHALL be allowed to require more than one upgrade. The upgrade process MUST allow the cluster metadata to be rolled back to the older version and its older disk format. The rollback MUST restore the original data but is not REQUIRED to restore the updated data. Any incompatible change to the format MUST result in the major version number of the schema being incremented.</p>
<p>The data node directory format SHALL be considered <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Evolving">Evolving</a>. Incompatible changes MUST include a process by which existing data directories may be upgraded. The upgrade process SHALL be allowed to require more than one upgrade. The upgrade process MUST allow the data directories to be rolled back to the older layout.</p></section></section><section>
<p>The S3Guard metastore used to store metadata in DynamoDB tables; as such it had to maintain a compatibility strategy. Now that S3Guard is removed, the tables are not needed.</p>
<p>Applications configured to use an S3A metadata store other than the “null” store will fail.</p></section><section>
<h5><aname="YARN_Resource_Manager_State_Store"></a>YARN Resource Manager State Store</h5>
<p>The YARN resource manager stores information about the cluster state in an external state store for use in fail over and recovery. If the schema used for the state store data does not remain compatible, the resource manager will not be able to recover its state and will fail to start. The state store data schema includes a version number that indicates compatibility.</p><section>
<h6><aname="Policy"></a>Policy</h6>
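<p>The version number described above can drive a startup check: under the policy in this section, stored data written with the same major schema version is readable, while a different major version is not. The following is a minimal sketch of such a check. The class is hypothetical and written for this illustration; it is not presented as the resource manager's actual implementation.</p>

```java
/** Hypothetical schema version holder for illustration only. */
final class SchemaVersion {
    private final int major;
    private final int minor;

    SchemaVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /**
     * Compatible changes bump only the minor number and incompatible changes
     * bump the major number, so equal major numbers imply the stored state
     * can still be loaded.
     */
    boolean isCompatibleTo(SchemaVersion stored) {
        return this.major == stored.major;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```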
<p>The YARN resource manager state store data schema SHALL be considered <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Evolving">Evolving</a>. Any incompatible change to the schema MUST result in the major version number of the schema being incremented. Any compatible change to the schema MUST result in the minor version number being incremented.</p></section></section><section>
<h5><aname="YARN_Node_Manager_State_Store"></a>YARN Node Manager State Store</h5>
<p>The YARN node manager stores information about the node state in an external state store for use in recovery. If the schema used for the state store data does not remain compatible, the node manager will not be able to recover its state and will fail to start. The state store data schema includes a version number that indicates compatibility.</p><section>
<h6><aname="Policy"></a>Policy</h6>
<p>The YARN node manager state store data schema SHALL be considered <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Evolving">Evolving</a>. Any incompatible change to the schema MUST result in the major version number of the schema being incremented. Any compatible change to the schema MUST result in the minor version number being incremented.</p></section></section><section>
<h5><aname="YARN_Federation_State_Store"></a>YARN Federation State Store</h5>
<p>The YARN resource manager federation service stores information about the federated clusters, running applications, and routing policies in an external state store for use in replication and recovery. If the schema used for the state store data does not remain compatible, the federation service will fail to initialize. The state store data schema includes a version number that indicates compatibility.</p><section>
<h6><aname="Policy"></a>Policy</h6>
<p>The YARN federation service state store data schema SHALL be considered <ahref="./InterfaceClassification.html#Private">Private</a> and <ahref="./InterfaceClassification.html#Evolving">Evolving</a>. Any incompatible change to the schema MUST result in the major version number of the schema being incremented. Any compatible change to the schema MUST result in the minor version number being incremented.</p></section></section></section></section><section>
<h3><aname="Command_Line_Interface_.28CLI.29"></a>Command Line Interface (CLI)</h3>
<p>The Hadoop command line programs may be used either directly via the system shell or via shell scripts. The CLIs include both the user-facing commands, such as the hdfs command or the yarn command, and the admin-facing commands, such as the scripts used to start and stop daemons. Changing the path of a command, removing or renaming command line options, changing the order of arguments, or changing the command return codes or output breaks compatibility and adversely affects users.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>All Hadoop CLI paths, usage, and output SHALL be considered <ahref="./InterfaceClassification.html#Public">Public</a> and <ahref="./InterfaceClassification.html#Stable">Stable</a> unless documented as experimental and subject to change.</p>
<p>Note that the CLI output SHALL be considered distinct from the log output generated by the Hadoop CLIs. The latter SHALL be governed by the policy on log output. Note also that for CLI output, all changes SHALL be considered incompatible changes.</p></section></section><section>
<h3><aname="Web_UI"></a>Web UI</h3>
<p>Changes to the Web UI, particularly to the content and layout of web pages, could potentially interfere with attempts to screen scrape the web pages for information. The Hadoop Web UI pages, however, are not meant to be scraped, e.g. for automation purposes. Users are expected to use REST APIs to programmatically access cluster information.</p><section>
<h4><aname="Policy"></a>Policy</h4>
<p>The Hadoop Web UI SHALL be considered <ahref="./InterfaceClassification.html#Public">Public</a> and <ahref="./InterfaceClassification.html#Unstable">Unstable</a>.</p></section></section><section>
<p>Users depend on the behavior of a Hadoop cluster remaining consistent across releases. Changes which cause unexpectedly different behaviors from the cluster can lead to frustration and long adoption cycles. No new configuration should be added which changes the behavior of an existing cluster, assuming the cluster’s configuration files remain unchanged. For any new settings that are defined, care should be taken to ensure that the new setting does not change the behavior of existing clusters.</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>Changes to existing functionality MUST NOT change the default behavior or the meaning of existing configuration settings between maintenance releases within the same minor version, regardless of whether the changes arise from changes to the system or logic or to internal or external default configuration values.</p>
<p>Changes to existing functionality SHOULD NOT change the default behavior or the meaning of existing configuration settings between minor releases within the same major version, though some changes, such as fixes for correctness or security issues, may require incompatible behavioral changes. Where possible such behavioral changes SHOULD be off by default.</p></section></section><section>
<p>Users use Hadoop-defined properties to configure and provide hints to Hadoop and custom properties to pass information to jobs. Users are encouraged to avoid using custom configuration property names that conflict with the namespace of Hadoop-defined properties and should avoid using any prefixes used by Hadoop, e.g. hadoop, io, ipc, fs, net, file, ftp, kfs, ha, dfs, mapred, mapreduce, and yarn.</p>
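<p>As a rough illustration of the naming advice above, the following sketch checks whether a custom property name starts with one of the prefixes listed in this document. Treating the prefix as the first dot-separated segment of the name is an assumption made for this example, and the class and method names are hypothetical rather than part of any Hadoop API.</p>

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Hadoop.
final class PropertyNameCheck {

    // Prefixes named in this document as used by Hadoop-defined properties.
    static final List<String> RESERVED_PREFIXES = Arrays.asList(
        "hadoop", "io", "ipc", "fs", "net", "file", "ftp", "kfs", "ha",
        "dfs", "mapred", "mapreduce", "yarn");

    // Returns true when the first dot-separated segment of the name is a
    // reserved prefix (assumption: a prefix is a whole segment, so "iops.x"
    // does not count as using the "io" prefix).
    static boolean usesReservedPrefix(String propertyName) {
        int dot = propertyName.indexOf('.');
        String firstSegment = dot < 0 ? propertyName : propertyName.substring(0, dot);
        return RESERVED_PREFIXES.contains(firstSegment);
    }
}
```

<p>Under these assumptions a custom name such as <code>acme.job.input</code> passes the check, while a name beginning with <code>dfs.</code> or <code>yarn.</code> is flagged as falling inside the Hadoop namespace.</p>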
<p>In addition to properties files, Hadoop uses other configuration files to set system behavior, such as the fair scheduler configuration file or the resource profiles configuration file.</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>Hadoop-defined properties (names and meanings) SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Stable">Stable</a>. The units implied by a Hadoop-defined property MUST NOT change, even across major versions. Default values of Hadoop-defined properties SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Evolving">Evolving</a>.</p>
<p>Hadoop configuration files that are not governed by the above rules about Hadoop-defined properties SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Stable">Stable</a>. The definition of an incompatible change depends on the particular configuration file format, but the general rule is that a compatible change will allow a configuration file that was valid before the change to remain valid after the change.</p></section></section><section>
<p>The log output produced by Hadoop daemons and CLIs is governed by a set of configuration files. These files control the minimum level of log message that will be output by the various components of Hadoop, as well as where and how those messages are stored.</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>All Log4j configurations SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Evolving">Evolving</a>.</p></section></section><section>
<p>Source code, artifacts (source and tests), user logs, configuration files, output, and job history are all stored on disk on either the local file system or HDFS. Changing the directory structure of these user-accessible files can break compatibility, even in cases where the original path is preserved via symbolic links (such as when the path is accessed by a servlet that is configured to not follow symbolic links).</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>The layout of source code and build artifacts SHALL be considered <a href="./InterfaceClassification.html#Private">Private</a> and <a href="./InterfaceClassification.html#Unstable">Unstable</a>. Within a major version, the developer community SHOULD preserve the overall directory structure, though individual files MAY be added, moved, or deleted with no warning.</p>
<p>The directory structure of configuration files, user logs, and job history SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Evolving">Evolving</a>.</p></section></section><section>
<p>Hadoop provides several client artifacts that applications use to interact with the system. These artifacts typically have their own dependencies on common libraries. In the cases where these dependencies are exposed to end user applications or downstream consumers (i.e. not <a class="externalLink" href="https://stackoverflow.com/questions/13620281/what-is-the-maven-shade-plugin-used-for-and-why-would-you-want-to-relocate-java">shaded</a>) changes to these dependencies can be disruptive. Developers are strongly encouraged to avoid exposing dependencies to clients by using techniques such as <a class="externalLink" href="https://stackoverflow.com/questions/13620281/what-is-the-maven-shade-plugin-used-for-and-why-would-you-want-to-relocate-java">shading</a>.</p>
<p>With regard to dependencies, adding a dependency is an incompatible change, whereas removing a dependency is a compatible change.</p>
<p>Some user applications built against Hadoop may add all Hadoop JAR files (including Hadoop’s library dependencies) to the application’s classpath. Adding new dependencies or updating the versions of existing dependencies may interfere with those in applications’ classpaths and hence their correct operation. Users are therefore discouraged from adopting this practice.</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>The set of dependencies exposed by the Hadoop client artifacts SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Stable">Stable</a>. Any dependencies that are not exposed to clients (either because they are shaded or only exist in non-client artifacts) SHALL be considered <a href="./InterfaceClassification.html#Private">Private</a> and <a href="./InterfaceClassification.html#Unstable">Unstable</a>.</p></section></section><section>
<p>Users and related projects often utilize the environment variables exported by Hadoop (e.g. HADOOP_CONF_DIR). Removing or renaming environment variables can therefore impact end user applications.</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>The environment variables consumed by Hadoop and the environment variables made accessible to applications through YARN SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Evolving">Evolving</a>. The developer community SHOULD limit changes to major releases.</p></section></section><section>
<p>Hadoop uses Maven for project management. Changes to the contents of generated artifacts can impact existing user applications.</p><section>
<h4><a name="Policy"></a>Policy</h4>
<p>The contents of Hadoop test artifacts SHALL be considered <a href="./InterfaceClassification.html#Private">Private</a> and <a href="./InterfaceClassification.html#Unstable">Unstable</a>. Test artifacts include all JAR files generated from test source code and all JAR files that include “tests” in the file name.</p>
<p>The Hadoop client artifacts SHALL be considered <a href="./InterfaceClassification.html#Public">Public</a> and <a href="./InterfaceClassification.html#Stable">Stable</a>. Client artifacts are the following:</p>
<ul>
<li>hadoop-client</li>
<li>hadoop-client-api</li>
<li>hadoop-client-minicluster</li>
<li>hadoop-client-runtime</li>
<li>hadoop-hdfs-client</li>
<li>hadoop-hdfs-native-client</li>
<li>hadoop-mapreduce-client-app</li>
<li>hadoop-mapreduce-client-common</li>
<li>hadoop-mapreduce-client-core</li>
<li>hadoop-mapreduce-client-hs</li>
<li>hadoop-mapreduce-client-hs-plugins</li>
<li>hadoop-mapreduce-client-jobclient</li>
<li>hadoop-mapreduce-client-nativetask</li>
<li>hadoop-mapreduce-client-shuffle</li>
<li>hadoop-yarn-client</li>
</ul>
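<p>The artifact policy in this section can be read as a simple lookup: test JARs and every artifact outside the client list are Private and Unstable, while the listed client artifacts are Public and Stable. The sketch below encodes that reading. Checking the test-JAR rule first, the string return values, and the class and method names are assumptions made for illustration only.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of Hadoop.
final class ArtifactStability {

    // Client artifacts as listed in this document.
    static final Set<String> CLIENT_ARTIFACTS = new HashSet<>(Arrays.asList(
        "hadoop-client", "hadoop-client-api", "hadoop-client-minicluster",
        "hadoop-client-runtime", "hadoop-hdfs-client", "hadoop-hdfs-native-client",
        "hadoop-mapreduce-client-app", "hadoop-mapreduce-client-common",
        "hadoop-mapreduce-client-core", "hadoop-mapreduce-client-hs",
        "hadoop-mapreduce-client-hs-plugins", "hadoop-mapreduce-client-jobclient",
        "hadoop-mapreduce-client-nativetask", "hadoop-mapreduce-client-shuffle",
        "hadoop-yarn-client"));

    // Classifies an artifact by its Maven artifactId and JAR file name.
    static String classify(String artifactId, String jarFileName) {
        // Assumption: a JAR with "tests" in its file name is a test artifact
        // even when its artifactId is on the client list.
        if (jarFileName.contains("tests")) {
            return "Private/Unstable";
        }
        return CLIENT_ARTIFACTS.contains(artifactId) ? "Public/Stable" : "Private/Unstable";
    }
}
```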
<p>All other build artifacts SHALL be considered <a href="./InterfaceClassification.html#Private">Private</a> and <a href="./InterfaceClassification.html#Unstable">Unstable</a>.</p></section></section><section>
<p>To keep up with the latest advances in hardware, operating systems, JVMs, and other software, new Hadoop releases may include features that require newer hardware, operating system releases, or JVM versions than previous Hadoop releases. For a specific environment, upgrading Hadoop might require upgrading other dependent software components.</p><section>
<h4><a name="Policies"></a>Policies</h4>
<ul>
<li>Hardware
<ul>
<li>Architecture: Intel and AMD are the processor architectures currently supported by the community. The community has no plans to restrict Hadoop to specific architectures, but MAY have family-specific optimizations. Support for any processor architecture SHOULD NOT be dropped without first being documented as deprecated for a full major release and MUST NOT be dropped without first being deprecated for at least a full minor release.</li>
<li>Minimum resources: While there are no guarantees on the minimum resources required by Hadoop daemons, the developer community SHOULD avoid increasing requirements within a minor release.</li>
</ul>
</li>
<li>Operating Systems: The community SHOULD maintain the same minimum OS requirements (OS kernel versions) within a minor release. Currently GNU/Linux and Microsoft Windows are the OSes officially supported by the community, while Apache Hadoop is known to work reasonably well on other OSes such as Apple MacOSX, Solaris, etc. Support for any OS SHOULD NOT be dropped without first being documented as deprecated for a full major release and MUST NOT be dropped without first being deprecated for at least a full minor release.</li>
<li>The JVM requirements SHALL NOT change across minor releases within the same major release unless the JVM version in question becomes unsupported. The JVM version requirement MAY be different for different operating systems or even operating system releases.</li>
<li>File systems supported by Hadoop, e.g. through the FileSystem API, SHOULD NOT become unsupported between minor releases within a major version unless a migration path to an alternate client implementation is available.</li>
</ul></section></section></section><section>
<h2><a name="References"></a>References</h2>
<p>Here are some relevant JIRAs and pages related to the topic:</p>
<ul>
<li>The evolution of this document - <a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-9517">HADOOP-9517</a></li>
<li>Binary compatibility for MapReduce end-user applications between hadoop-1.x and hadoop-2.x - <a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">MapReduce Compatibility between hadoop-1.x and hadoop-2.x</a></li>
<li>Annotations for interfaces as per interface classification schedule - <a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-7391">HADOOP-7391</a> <a href="./InterfaceClassification.html">Hadoop Interface Classification</a></li>
<li>Compatibility for Hadoop 1.x releases - <a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-5071">HADOOP-5071</a></li>
<li>The <a class="externalLink" href="http://wiki.apache.org/hadoop/Roadmap">Hadoop Roadmap</a> page that captures other release policies</li>