HADOOP-13724. Fix a few typos in site markdown documents. Contributed by Ding Fei.
(cherry picked from commit 987ee51141)
(cherry picked from commit 4ed7cf3b36)
parent 9d473b8ddc
commit 15ff590c37
|
@ -35,7 +35,7 @@ Installation
|
||||||
|
|
||||||
Installing a Hadoop cluster typically involves unpacking the software on all the machines in the cluster or installing it via a packaging system as appropriate for your operating system. It is important to divide up the hardware into functions.
|
Installing a Hadoop cluster typically involves unpacking the software on all the machines in the cluster or installing it via a packaging system as appropriate for your operating system. It is important to divide up the hardware into functions.
|
||||||
|
|
||||||
Typically one machine in the cluster is designated as the NameNode and another machine the as ResourceManager, exclusively. These are the masters. Other services (such as Web App Proxy Server and MapReduce Job History server) are usually run either on dedicated hardware or on shared infrastrucutre, depending upon the load.
|
Typically one machine in the cluster is designated as the NameNode and another machine as the ResourceManager, exclusively. These are the masters. Other services (such as Web App Proxy Server and MapReduce Job History server) are usually run either on dedicated hardware or on shared infrastructure, depending upon the load.
|
||||||
|
|
||||||
The rest of the machines in the cluster act as both DataNode and NodeManager. These are the slaves.
|
The rest of the machines in the cluster act as both DataNode and NodeManager. These are the slaves.
|
||||||
|
|
||||||
|
|
|
@ -68,7 +68,7 @@ Wire compatibility concerns data being transmitted over the wire between Hadoop
|
||||||
#### Use Cases
|
#### Use Cases
|
||||||
|
|
||||||
* Client-Server compatibility is required to allow users to continue using the old clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
|
* Client-Server compatibility is required to allow users to continue using the old clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
|
||||||
* Client-Server compatibility is also required to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet deployed to the cluster can expect link exceptions.
|
* Client-Server compatibility is also required to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.
|
||||||
* Client-Server compatibility is also required to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
|
* Client-Server compatibility is also required to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
|
||||||
* Server-Server compatibility is required to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
|
* Server-Server compatibility is required to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
|
||||||
|
|
||||||
|
@ -76,7 +76,7 @@ Wire compatibility concerns data being transmitted over the wire between Hadoop
|
||||||
|
|
||||||
* Both Client-Server and Server-Server compatibility is preserved within a major release. (Different policies for different categories are yet to be considered.)
|
* Both Client-Server and Server-Server compatibility is preserved within a major release. (Different policies for different categories are yet to be considered.)
|
||||||
* Compatibility can be broken only at a major release, though breaking compatibility even at major releases has grave consequences and should be discussed in the Hadoop community.
|
* Compatibility can be broken only at a major release, though breaking compatibility even at major releases has grave consequences and should be discussed in the Hadoop community.
|
||||||
* Hadoop protocols are defined in .proto (ProtocolBuffers) files. Client-Server protocols and Server-protocol .proto files are marked as stable. When a .proto file is marked as stable it means that changes should be made in a compatible fashion as described below:
|
* Hadoop protocols are defined in .proto (ProtocolBuffers) files. Client-Server protocols and Server-Server protocol .proto files are marked as stable. When a .proto file is marked as stable it means that changes should be made in a compatible fashion as described below:
|
||||||
* The following changes are compatible and are allowed at any time:
|
* The following changes are compatible and are allowed at any time:
|
||||||
* Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code.
|
* Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code.
|
||||||
* Add a new rpc/method to the service
|
* Add a new rpc/method to the service
|
||||||
|
@ -101,7 +101,7 @@ Wire compatibility concerns data being transmitted over the wire between Hadoop
|
||||||
|
|
||||||
### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
|
### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
|
||||||
|
|
||||||
As Apache Hadoop revisions are upgraded end-users reasonably expect that their applications should continue to work without any modifications. This is fulfilled as a result of support API compatibility, Semantic compatibility and Wire compatibility.
|
As Apache Hadoop revisions are upgraded end-users reasonably expect that their applications should continue to work without any modifications. This is fulfilled as a result of supporting API compatibility, Semantic compatibility and Wire compatibility.
|
||||||
|
|
||||||
However, Apache Hadoop is a very complex, distributed system and services a very wide variety of use-cases. In particular, Apache Hadoop MapReduce is a very, very wide API; in the sense that end-users may make wide-ranging assumptions such as layout of the local disk when their map/reduce tasks are executing, environment variables for their tasks etc. In such cases, it becomes very hard to fully specify, and support, absolute compatibility.
|
However, Apache Hadoop is a very complex, distributed system and services a very wide variety of use-cases. In particular, Apache Hadoop MapReduce is a very, very wide API; in the sense that end-users may make wide-ranging assumptions such as layout of the local disk when their map/reduce tasks are executing, environment variables for their tasks etc. In such cases, it becomes very hard to fully specify, and support, absolute compatibility.
|
||||||
|
|
||||||
|
@ -115,12 +115,12 @@ However, Apache Hadoop is a very complex, distributed system and services a very
|
||||||
|
|
||||||
* Existing MapReduce, YARN & HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported.
|
* Existing MapReduce, YARN & HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported.
|
||||||
* A very minor fraction of applications maybe affected by changes to disk layouts etc., the developer community will strive to minimize these changes and will not make them within a minor version. In more egregious cases, we will consider strongly reverting these breaking changes and invalidating offending releases if necessary.
|
* A very minor fraction of applications maybe affected by changes to disk layouts etc., the developer community will strive to minimize these changes and will not make them within a minor version. In more egregious cases, we will consider strongly reverting these breaking changes and invalidating offending releases if necessary.
|
||||||
* In particular for MapReduce applications, the developer community will try our best to support provide binary compatibility across major releases e.g. applications using org.apache.hadoop.mapred.
|
* In particular for MapReduce applications, the developer community will try our best to support providing binary compatibility across major releases e.g. applications using org.apache.hadoop.mapred.
|
||||||
* APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See [Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html) for more details.
|
* APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See [Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html) for more details.
|
||||||
|
|
||||||
### REST APIs
|
### REST APIs
|
||||||
|
|
||||||
REST API compatibility corresponds to both the request (URLs) and responses to each request (content, which may contain other URLs). Hadoop REST APIs are specifically meant for stable use by clients across releases, even major releases. The following are the exposed REST APIs:
|
REST API compatibility corresponds to both the requests (URLs) and responses to each request (content, which may contain other URLs). Hadoop REST APIs are specifically meant for stable use by clients across releases, even major ones. The following are the exposed REST APIs:
|
||||||
|
|
||||||
* [WebHDFS](../hadoop-hdfs/WebHDFS.html) - Stable
|
* [WebHDFS](../hadoop-hdfs/WebHDFS.html) - Stable
|
||||||
* [ResourceManager](../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html)
|
* [ResourceManager](../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html)
|
||||||
|
@ -135,7 +135,7 @@ The APIs annotated stable in the text above preserve compatibility across at lea
|
||||||
|
|
||||||
### Metrics/JMX
|
### Metrics/JMX
|
||||||
|
|
||||||
While the Metrics API compatibility is governed by Java API compatibility, the actual metrics exposed by Hadoop need to be compatible for users to be able to automate using them (scripts etc.). Adding additional metrics is compatible. Modifying (eg changing the unit or measurement) or removing existing metrics breaks compatibility. Similarly, changes to JMX MBean object names also break compatibility.
|
While the Metrics API compatibility is governed by Java API compatibility, the actual metrics exposed by Hadoop need to be compatible for users to be able to automate using them (scripts etc.). Adding additional metrics is compatible. Modifying (e.g. changing the unit or measurement) or removing existing metrics breaks compatibility. Similarly, changes to JMX MBean object names also break compatibility.
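
The rule above (adding metrics is compatible; modifying or removing them is not) can be sketched as a small check. This is only an illustration: the catalog format (metric name mapped to unit) and the metric names are hypothetical and are not part of Hadoop.

```python
def metric_changes(old, new):
    """Compare two metric catalogs that map metric name -> unit.

    Returns three sorted lists: (added, removed, modified).
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A metric counts as modified when it exists in both catalogs
    # but its unit differs.
    modified = sorted(name for name in set(old) & set(new)
                      if old[name] != new[name])
    return added, removed, modified


def is_compatible(old, new):
    """Additions are fine; any removal or modification breaks compatibility."""
    _, removed, modified = metric_changes(old, new)
    return not removed and not modified


# Hypothetical catalogs, for illustration only.
old_metrics = {"ExampleBytesWritten": "bytes", "ExampleQueueTime": "ms"}
new_metrics_ok = dict(old_metrics, ExampleBlocksRead="count")
new_metrics_bad = {"ExampleBytesWritten": "kilobytes"}

print(is_compatible(old_metrics, new_metrics_ok))
print(is_compatible(old_metrics, new_metrics_bad))
```

A check like this could run against the metric names a monitoring script depends on before an upgrade, assuming the catalogs can be exported in some comparable form.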
|
||||||
|
|
||||||
#### Policy
|
#### Policy
|
||||||
|
|
||||||
|
@ -147,7 +147,7 @@ User and system level data (including metadata) is stored in files of different
|
||||||
|
|
||||||
#### User-level file formats
|
#### User-level file formats
|
||||||
|
|
||||||
Changes to formats that end-users use to store their data can prevent them for accessing the data in later releases, and hence it is highly important to keep those file-formats compatible. One can always add a "new" format improving upon an existing format. Examples of these formats include har, war, SequenceFileFormat etc.
|
Changes to formats that end-users use to store their data can prevent them from accessing the data in later releases, and hence it is highly important to keep those file-formats compatible. One can always add a "new" format improving upon an existing format. Examples of these formats include har, war, SequenceFileFormat etc.
|
||||||
|
|
||||||
##### Policy
|
##### Policy
|
||||||
|
|
||||||
|
@ -184,7 +184,7 @@ Depending on the degree of incompatibility in the changes, the following potenti
|
||||||
|
|
||||||
### Command Line Interface (CLI)
|
### Command Line Interface (CLI)
|
||||||
|
|
||||||
The Hadoop command line programs may be use either directly via the system shell or via shell scripts. Changing the path of a command, removing or renaming command line options, the order of arguments, or the command return code and output break compatibility and may adversely affect users.
|
The Hadoop command line programs may be used either directly via the system shell or via shell scripts. Changing the path of a command, removing or renaming command line options, the order of arguments, or the command return code and output break compatibility and may adversely affect users.
|
||||||
|
|
||||||
#### Policy
|
#### Policy
|
||||||
|
|
||||||
|
|
|
@ -44,7 +44,7 @@ Interfaces have two main attributes: Audience and Stability
|
||||||
|
|
||||||
Audience denotes the potential consumers of the interface. While many interfaces
|
Audience denotes the potential consumers of the interface. While many interfaces
|
||||||
are internal/private to the implementation, other are public/external interfaces
|
are internal/private to the implementation, other are public/external interfaces
|
||||||
are meant for wider consumption by applications and/or clients. For example, in
|
that are meant for wider consumption by applications and/or clients. For example, in
|
||||||
posix, libc is an external or public interface, while large parts of the kernel
|
posix, libc is an external or public interface, while large parts of the kernel
|
||||||
are internal or private interfaces. Also, some interfaces are targeted towards
|
are internal or private interfaces. Also, some interfaces are targeted towards
|
||||||
other specific subsystems.
|
other specific subsystems.
|
||||||
|
@ -52,7 +52,7 @@ other specific subsystems.
|
||||||
Identifying the audience of an interface helps define the impact of breaking
|
Identifying the audience of an interface helps define the impact of breaking
|
||||||
it. For instance, it might be okay to break the compatibility of an interface
|
it. For instance, it might be okay to break the compatibility of an interface
|
||||||
whose audience is a small number of specific subsystems. On the other hand, it
|
whose audience is a small number of specific subsystems. On the other hand, it
|
||||||
is probably not okay to break a protocol interfaces that millions of Internet
|
is probably not okay to break a protocol interface that millions of Internet
|
||||||
users depend on.
|
users depend on.
|
||||||
|
|
||||||
Hadoop uses the following kinds of audience in order of increasing/wider visibility:
|
Hadoop uses the following kinds of audience in order of increasing/wider visibility:
|
||||||
|
@ -75,7 +75,7 @@ referred to as project-private).
|
||||||
|
|
||||||
The interface is used by a specified set of projects or systems (typically
|
The interface is used by a specified set of projects or systems (typically
|
||||||
closely related projects). Other projects or systems should not use the
|
closely related projects). Other projects or systems should not use the
|
||||||
interface. Changes to the interface will be communicated/ negotiated with the
|
interface. Changes to the interface will be communicated/negotiated with the
|
||||||
specified projects. For example, in the Hadoop project, some interfaces are
|
specified projects. For example, in the Hadoop project, some interfaces are
|
||||||
LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and
|
LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and
|
||||||
MapReduce projects.
|
MapReduce projects.
|
||||||
|
@ -92,16 +92,16 @@ the interface are allowed. Hadoop APIs have the following levels of stability.
|
||||||
#### Stable
|
#### Stable
|
||||||
|
|
||||||
Can evolve while retaining compatibility for minor release boundaries; in other
|
Can evolve while retaining compatibility for minor release boundaries; in other
|
||||||
words, incompatible changes to APIs marked Stable are allowed only at major
|
words, incompatible changes to APIs marked as Stable are allowed only at major
|
||||||
releases (i.e. at m.0).
|
releases (i.e. at m.0).
|
||||||
|
|
||||||
#### Evolving
|
#### Evolving
|
||||||
|
|
||||||
Evolving, but incompatible changes are allowed at minor release (i.e. m .x)
|
Evolving, but incompatible changes are allowed at minor releases (i.e. m .x)
|
||||||
|
|
||||||
#### Unstable
|
#### Unstable
|
||||||
|
|
||||||
Incompatible changes to Unstable APIs are allowed any time. This usually makes
|
Incompatible changes to Unstable APIs are allowed at any time. This usually makes
|
||||||
sense for only private interfaces.
|
sense for only private interfaces.
|
||||||
|
|
||||||
However one may call this out for a supposedly public interface to highlight
|
However one may call this out for a supposedly public interface to highlight
|
||||||
|
@ -109,11 +109,11 @@ that it should not be used as an interface; for public interfaces, labeling it
|
||||||
as Not-an-interface is probably more appropriate than "Unstable".
|
as Not-an-interface is probably more appropriate than "Unstable".
|
||||||
|
|
||||||
Examples of publicly visible interfaces that are unstable
|
Examples of publicly visible interfaces that are unstable
|
||||||
(i.e. not-an-interface): GUI, CLIs whose output format will change
|
(i.e. not-an-interface): GUI, CLIs whose output format will change.
|
||||||
|
|
||||||
#### Deprecated
|
#### Deprecated
|
||||||
|
|
||||||
APIs that could potentially removed in the future and should not be used.
|
APIs that could potentially be removed in the future and should not be used.
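
As a rough illustration of the Stable, Evolving and Unstable levels described above, the following sketch decides whether an incompatible change is permitted between two versions. The function name and the `(major, minor, patch)` tuple representation are assumptions made for this example; treating a patch-only bump as "no minor boundary crossed" is also an assumption, since the text only speaks of major and minor releases.

```python
def incompatible_change_allowed(stability, old_version, new_version):
    """Return True if an incompatible change may ship in new_version.

    Versions are (major, minor, patch) tuples; new_version is assumed
    to be later than old_version.
    """
    if stability == "Unstable":
        # Allowed at any time.
        return True
    if stability == "Evolving":
        # Allowed when a minor (or major) release boundary is crossed.
        return new_version[:2] != old_version[:2]
    if stability == "Stable":
        # Allowed only at a major release (m.0).
        return new_version[0] != old_version[0]
    raise ValueError("unknown stability level: %r" % (stability,))


print(incompatible_change_allowed("Stable", (2, 3, 0), (2, 4, 0)))
print(incompatible_change_allowed("Evolving", (2, 3, 0), (2, 4, 0)))
```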
|
||||||
|
|
||||||
How are the Classifications Recorded?
|
How are the Classifications Recorded?
|
||||||
-------------------------------------
|
-------------------------------------
|
||||||
|
@ -155,13 +155,13 @@ FAQ
|
||||||
* e.g. In HDFS, NN-DN protocol is private but stable and can help
|
* e.g. In HDFS, NN-DN protocol is private but stable and can help
|
||||||
implement rolling upgrades. It communicates that this interface should
|
implement rolling upgrades. It communicates that this interface should
|
||||||
not be changed in incompatible ways even though it is private.
|
not be changed in incompatible ways even though it is private.
|
||||||
* e.g. In HDFS, FSImage stability can help provide more flexible roll backs.
|
* e.g. In HDFS, FSImage stability provides more flexible rollback.
|
||||||
|
|
||||||
* What is the harm in applications using a private interface that is stable? How
|
* What is the harm in applications using a private interface that is stable? How
|
||||||
is it different than a public stable interface?
|
is it different than a public stable interface?
|
||||||
* While a private interface marked as stable is targeted to change only at
|
* While a private interface marked as stable is targeted to change only at
|
||||||
major releases, it may break at other times if the providers of that
|
major releases, it may break at other times if the providers of that
|
||||||
interface are willing to changes the internal users of that
|
interface are willing to change the internal users of that
|
||||||
interface. Further, a public stable interface is less likely to break even
|
interface. Further, a public stable interface is less likely to break even
|
||||||
at major releases (even though it is allowed to break compatibility)
|
at major releases (even though it is allowed to break compatibility)
|
||||||
because the impact of the change is larger. If you use a private interface
|
because the impact of the change is larger. If you use a private interface
|
||||||
|
@ -182,11 +182,11 @@ FAQ
|
||||||
away with private then do so; if the interface is really for general use
|
away with private then do so; if the interface is really for general use
|
||||||
for all applications then do so. But remember that making an interface
|
for all applications then do so. But remember that making an interface
|
||||||
public has huge responsibility. Sometimes Limited-private is just right.
|
public has huge responsibility. Sometimes Limited-private is just right.
|
||||||
* A good example of a limited-private interface is BlockLocations, This is
|
* A good example of a limited-private interface is BlockLocations, This is a
|
||||||
fairly low-level interface that we are willing to expose to MR and perhaps
|
fairly low-level interface that we are willing to expose to MR and perhaps
|
||||||
HBase. We are likely to change it down the road and at that time we will
|
HBase. We are likely to change it down the road and at that time we will
|
||||||
have get a coordinated effort with the MR team to release matching
|
coordinate release effort with the MR team.
|
||||||
releases. While MR and HDFS are always released in sync today, they may
|
While MR and HDFS are always released in sync today, they may
|
||||||
change down the road.
|
change down the road.
|
||||||
* If you have a limited-private interface with many projects listed then you
|
* If you have a limited-private interface with many projects listed then you
|
||||||
are fooling yourself. It is practically public.
|
are fooling yourself. It is practically public.
|
||||||
|
@ -207,7 +207,7 @@ FAQ
|
||||||
break it at minor releases.
|
break it at minor releases.
|
||||||
* One example of a public interface that is unstable is where one is
|
* One example of a public interface that is unstable is where one is
|
||||||
providing an implementation of a standards-body based interface that is
|
providing an implementation of a standards-body based interface that is
|
||||||
still under development. For example, many companies, in an attampt to be
|
still under development. For example, many companies, in an attempt to be
|
||||||
first to market, have provided implementations of a new NFS protocol even
|
first to market, have provided implementations of a new NFS protocol even
|
||||||
when the protocol was not fully completed by IETF. The implementor cannot
|
when the protocol was not fully completed by IETF. The implementor cannot
|
||||||
evolve the interface in a fashion that causes least distruption because
|
evolve the interface in a fashion that causes least distruption because
|
||||||
|
|
|
@ -35,7 +35,7 @@ of the client.
|
||||||
|
|
||||||
**Implementation Note**: the static `FileSystem get(URI uri, Configuration conf) ` method MAY return
|
**Implementation Note**: the static `FileSystem get(URI uri, Configuration conf) ` method MAY return
|
||||||
a pre-existing instance of a filesystem client class—a class that may also be in use in other threads.
|
a pre-existing instance of a filesystem client class—a class that may also be in use in other threads.
|
||||||
The implementations of `FileSystem` which ship with Apache Hadoop
|
The implementations of `FileSystem` shipped with Apache Hadoop
|
||||||
*do not make any attempt to synchronize access to the working directory field*.
|
*do not make any attempt to synchronize access to the working directory field*.
|
||||||
|
|
||||||
## Invariants
|
## Invariants
|
||||||
|
@ -214,7 +214,6 @@ response, then, if a listing `listStatus("/d")` takes place concurrently with th
|
||||||
|
|
||||||
[a, part-0000001, ... , part-9999999]
|
[a, part-0000001, ... , part-9999999]
|
||||||
[part-0000001, ... , part-9999999, z]
|
[part-0000001, ... , part-9999999, z]
|
||||||
|
|
||||||
[a, part-0000001, ... , part-9999999, z]
|
[a, part-0000001, ... , part-9999999, z]
|
||||||
[part-0000001, ... , part-9999999]
|
[part-0000001, ... , part-9999999]
|
||||||
|
|
||||||
|
@ -282,7 +281,7 @@ value is an instance of the `LocatedFileStatus` subclass of a `FileStatus`,
|
||||||
and that rather than return an entire list, an iterator is returned.
|
and that rather than return an entire list, an iterator is returned.
|
||||||
|
|
||||||
This is actually a `protected` method, directly invoked by
|
This is actually a `protected` method, directly invoked by
|
||||||
`listLocatedStatus(Path path):`. Calls to it may be delegated through
|
`listLocatedStatus(Path path)`. Calls to it may be delegated through
|
||||||
layered filesystems, such as `FilterFileSystem`, so its implementation MUST
|
layered filesystems, such as `FilterFileSystem`, so its implementation MUST
|
||||||
be considered mandatory, even if `listLocatedStatus(Path path)` has been
|
be considered mandatory, even if `listLocatedStatus(Path path)` has been
|
||||||
implemented in a different manner. There are open JIRAs proposing
|
implemented in a different manner. There are open JIRAs proposing
|
||||||
|
@ -442,7 +441,7 @@ the convention is generally retained.
|
||||||
|
|
||||||
### `long getDefaultBlockSize()`
|
### `long getDefaultBlockSize()`
|
||||||
|
|
||||||
Get the "default" block size for a filesystem. This often used during
|
Get the "default" block size for a filesystem. This is often used during
|
||||||
split calculations to divide work optimally across a set of worker processes.
|
split calculations to divide work optimally across a set of worker processes.
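
To show how a block size might feed into split calculation, here is a deliberately simplified sketch that cuts a file into block-sized ranges. It is not the logic Hadoop's input formats actually use; `compute_splits` and its parameters are hypothetical names for this example.

```python
def compute_splits(file_length, block_size):
    """Divide a file of file_length bytes into (offset, length) ranges,
    each at most block_size bytes long."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_length < 0:
        raise ValueError("file_length must not be negative")
    splits = []
    offset = 0
    while offset < file_length:
        # The final split may be shorter than a full block.
        length = min(block_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


print(compute_splits(300, 128))
```

Each range could then be handed to one worker process, which is why a sensible default block size matters even for filesystems that do not store data in blocks.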
|
||||||
|
|
||||||
#### Preconditions
|
#### Preconditions
|
||||||
|
@ -604,7 +603,7 @@ This MAY be a bug, as it allows >1 client to create a file with `overwrite==fals
|
||||||
and potentially confuse file/directory logic
|
and potentially confuse file/directory logic
|
||||||
|
|
||||||
* The Local FileSystem raises a `FileNotFoundException` when trying to create a file over
|
* The Local FileSystem raises a `FileNotFoundException` when trying to create a file over
|
||||||
a directory, hence it is is listed as an exception that MAY be raised when
|
a directory, hence it is listed as an exception that MAY be raised when
|
||||||
this precondition fails.
|
this precondition fails.
|
||||||
|
|
||||||
* Not covered: symlinks. The resolved path of the symlink is used as the final path argument to the `create()` operation
|
* Not covered: symlinks. The resolved path of the symlink is used as the final path argument to the `create()` operation
|
||||||
|
@ -898,7 +897,7 @@ Renaming a file where the destination is a directory moves the file as a child
|
||||||
##### Renaming a directory onto a directory
|
##### Renaming a directory onto a directory
|
||||||
|
|
||||||
If `src` is a directory then all its children will then exist under `dest`, while the path
|
If `src` is a directory then all its children will then exist under `dest`, while the path
|
||||||
`src` and its descendants will no longer not exist. The names of the paths under
|
`src` and its descendants will no longer exist. The names of the paths under
|
||||||
`dest` will match those under `src`, as will the contents:
|
`dest` will match those under `src`, as will the contents:
|
||||||
|
|
||||||
if isDir(FS, src) isDir(FS, dest) and src != dest :
|
if isDir(FS, src) isDir(FS, dest) and src != dest :
|
||||||
|
@ -928,7 +927,7 @@ The outcome is no change to FileSystem state, with a return value of false.
|
||||||
*Local Filesystem, S3N*
|
*Local Filesystem, S3N*
|
||||||
|
|
||||||
The outcome is as a normal rename, with the additional (implicit) feature
|
The outcome is as a normal rename, with the additional (implicit) feature
|
||||||
that the parent directores of the destination also exist
|
that the parent directories of the destination also exist.
|
||||||
|
|
||||||
exists(FS', parent(dest))
|
exists(FS', parent(dest))
|
||||||
|
|
||||||
|
@ -1018,9 +1017,9 @@ HDFS: All source files except the final one MUST be a complete block:
|
||||||
|
|
||||||
|
|
||||||
HDFS's restrictions may be an implementation detail of how it implements
|
HDFS's restrictions may be an implementation detail of how it implements
|
||||||
`concat` -by changing the inode references to join them together in
|
`concat` by changing the inode references to join them together in
|
||||||
a sequence. As no other filesystem in the Hadoop core codebase
|
a sequence. As no other filesystem in the Hadoop core codebase
|
||||||
implements this method, there is no way to distinguish implementation detail.
|
implements this method, there is no way to distinguish implementation detail
|
||||||
from specification.
|
from specification.
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -29,7 +29,7 @@ with extensions that add key assumptions to the system.
|
||||||
1. The stream being read references a finite array of bytes.
|
1. The stream being read references a finite array of bytes.
|
||||||
1. The length of the data does not change during the read process.
|
1. The length of the data does not change during the read process.
|
||||||
1. The contents of the data does not change during the process.
|
1. The contents of the data does not change during the process.
|
||||||
1. The source file remains present during the read process
|
1. The source file remains present during the read process.
|
||||||
1. Callers may use `Seekable.seek()` to offsets within the array of bytes, with future
|
1. Callers may use `Seekable.seek()` to offsets within the array of bytes, with future
|
||||||
reads starting at this offset.
|
reads starting at this offset.
|
||||||
1. The cost of forward and backward seeks is low.
|
1. The cost of forward and backward seeks is low.
|
||||||
|
@ -104,7 +104,7 @@ Return the current position. The outcome when a stream is closed is undefined.
|
||||||
|
|
||||||
Return the data at the current position.
|
Return the data at the current position.
|
||||||
|
|
||||||
1. Implementations should fail when a stream is closed
|
1. Implementations should fail when a stream is closed.
|
||||||
1. There is no limit on how long `read()` may take to complete.
|
1. There is no limit on how long `read()` may take to complete.
|
||||||
|
|
||||||
#### Preconditions
|
#### Preconditions
|
||||||
|
@ -124,7 +124,7 @@ Return the data at the current position.
|
||||||
|
|
||||||
Read `length` bytes of data into the destination buffer, starting at offset
|
Read `length` bytes of data into the destination buffer, starting at offset
|
||||||
`offset`. The source of the data is the current position of the stream,
|
`offset`. The source of the data is the current position of the stream,
|
||||||
as implicitly set in `pos`
|
as implicitly set in `pos`.
|
||||||
|
|
||||||
#### Preconditions
|
#### Preconditions
|
||||||
|
|
||||||
|
@ -166,7 +166,7 @@ the stream.
|
||||||
|
|
||||||
That is, rather than `l` being simply defined as `min(length, len(data)-length)`,
|
That is, rather than `l` being simply defined as `min(length, len(data)-length)`,
|
||||||
it strictly is an integer in the range `1..min(length, len(data)-length)`.
|
it strictly is an integer in the range `1..min(length, len(data)-length)`.
|
||||||
While the caller may expect for as much as the buffer as possible to be filled
|
While the caller may expect as much of the buffer as possible to be filled
|
||||||
in, it is within the specification for an implementation to always return
|
in, it is within the specification for an implementation to always return
|
||||||
a smaller number, perhaps only ever 1 byte.
|
a smaller number, perhaps only ever 1 byte.
|
||||||
|
|
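Note on the hunk above: because `read(buffer, offset, length)` is allowed to return fewer bytes than requested, perhaps only one, callers that need a full buffer have to loop. The following is a minimal sketch of such a loop. It uses plain `java.io.InputStream` rather than any Hadoop class, and the names `ReadLoopSketch`, `readFully` and `oneByteAtATime` are hypothetical, introduced only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadLoopSketch {

    /** Keeps calling read() until exactly `length` bytes have arrived, or fails with EOFException. */
    static void readFully(InputStream in, byte[] buffer, int offset, int length)
            throws IOException {
        int done = 0;
        while (done < length) {
            // Each call may return anything from 1 to (length - done) bytes, or -1 at end of stream.
            int n = in.read(buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("stream ended after " + done + " of " + length + " bytes");
            }
            done += n;
        }
    }

    /** Hypothetical worst-case stream: hands back at most one byte per read() call. */
    static InputStream oneByteAtATime(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] out = new byte[5];
        byte[] source = "hello".getBytes(StandardCharsets.US_ASCII);
        readFully(oneByteAtATime(source), out, 0, out.length);
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

The loop never assumes a single call fills the buffer, so it behaves the same against a stream that returns one byte at a time and one that returns everything at once.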
||||||
|
@ -192,7 +192,7 @@ Some filesystems do not perform this check, relying on the `read()` contract
|
||||||
to reject reads on a closed stream (e.g. `RawLocalFileSystem`).
|
to reject reads on a closed stream (e.g. `RawLocalFileSystem`).
|
||||||
|
|
||||||
A `seek(0)` MUST always succeed, as the seek position must be
|
A `seek(0)` MUST always succeed, as the seek position must be
|
||||||
positive and less than the length of the Stream's:
|
positive and less than the length of the Stream:
|
||||||
|
|
||||||
s > 0 and ((s==0) or ((s < len(data)))) else raise [EOFException, IOException]
|
s > 0 and ((s==0) or ((s < len(data)))) else raise [EOFException, IOException]
|
||||||
|
|
||||||
|
@ -222,7 +222,7 @@ data at offset `offset`.
|
||||||
|
|
||||||
#### Preconditions
|
#### Preconditions
|
||||||
|
|
||||||
Not all subclasses implement the operation operation, and instead
|
Not all subclasses implement this operation, and instead
|
||||||
either raise an exception or return `False`.
|
either raise an exception or return `False`.
|
||||||
|
|
||||||
supported(FSDIS, Seekable.seekToNewSource) else raise [UnsupportedOperationException, IOException]
|
supported(FSDIS, Seekable.seekToNewSource) else raise [UnsupportedOperationException, IOException]
|
||||||
|
@ -250,7 +250,7 @@ If the operation is supported and there is a new location for the data:
|
||||||
|
|
||||||
The new data is the original data (or an updated version of it, as covered
|
The new data is the original data (or an updated version of it, as covered
|
||||||
in the Consistency section below), but the block containing the data at `offset`
|
in the Consistency section below), but the block containing the data at `offset`
|
||||||
sourced from a different replica.
|
is sourced from a different replica.
|
||||||
|
|
||||||
If there is no other copy, `FSDIS` is not updated; the response indicates this:
|
If there is no other copy, `FSDIS` is not updated; the response indicates this:
|
||||||
|
|
||||||
|
@ -258,7 +258,7 @@ If there is no other copy, `FSDIS` is not updated; the response indicates this:
|
||||||
|
|
||||||
Outside of test methods, the primary use of this method is in the {{FSInputChecker}}
|
Outside of test methods, the primary use of this method is in the {{FSInputChecker}}
|
||||||
class, which can react to a checksum error in a read by attempting to source
|
class, which can react to a checksum error in a read by attempting to source
|
||||||
the data elsewhere. It a new source can be found it attempts to reread and
|
the data elsewhere. If a new source can be found it attempts to reread and
|
||||||
recheck that portion of the file.
|
recheck that portion of the file.
|
||||||
|
|
||||||
## <a name="PositionedReadable"></a> interface `PositionedReadable`
|
## <a name="PositionedReadable"></a> interface `PositionedReadable`
|
||||||
|
|
|
@ -141,7 +141,7 @@ The failure modes when a user lacks security permissions are not specified.
|
||||||
|
|
||||||
### Networking Assumptions
|
### Networking Assumptions
|
||||||
|
|
||||||
This document assumes this all network operations succeed. All statements
|
This document assumes that all network operations succeed. All statements
|
||||||
can be assumed to be qualified as *"assuming the operation does not fail due
|
can be assumed to be qualified as *"assuming the operation does not fail due
|
||||||
to a network availability problem"*
|
to a network availability problem"*
|
||||||
|
|
||||||
|
@ -303,7 +303,7 @@ does not hold on blob stores]
|
||||||
1. Directory list operations are fast for directories with few entries, but may
|
1. Directory list operations are fast for directories with few entries, but may
|
||||||
incur a cost that is `O(entries)`. Hadoop 2 added iterative listing to
|
incur a cost that is `O(entries)`. Hadoop 2 added iterative listing to
|
||||||
handle the challenge of listing directories with millions of entries without
|
handle the challenge of listing directories with millions of entries without
|
||||||
buffering -at the cost of consistency.
|
buffering at the cost of consistency.
|
||||||
|
|
||||||
1. A `close()` of an `OutputStream` is fast, irrespective of whether or not
|
1. A `close()` of an `OutputStream` is fast, irrespective of whether or not
|
||||||
the file operation has succeeded or not.
|
the file operation has succeeded or not.
|
||||||
|
@ -317,8 +317,8 @@ This specification refers to *Object Stores* in places, often using the
|
||||||
term *Blobstore*. Hadoop does provide FileSystem client classes for some of these
|
term *Blobstore*. Hadoop does provide FileSystem client classes for some of these
|
||||||
even though they violate many of the requirements. This is why, although
|
even though they violate many of the requirements. This is why, although
|
||||||
Hadoop can read and write data in an object store, the two which Hadoop ships
|
Hadoop can read and write data in an object store, the two which Hadoop ships
|
||||||
with direct support for —Amazon S3 and OpenStack Swift&mdash cannot
|
with direct support for — Amazon S3 and OpenStack Swift — cannot
|
||||||
be used as direct replacement for HDFS.
|
be used as direct replacements for HDFS.
|
||||||
|
|
||||||
*What is an Object Store?*
|
*What is an Object Store?*
|
||||||
|
|
||||||
|
@ -358,10 +358,10 @@ are current with respect to the files within that directory.
|
||||||
as are `delete()` operations. Object store FileSystem clients implement these
|
as are `delete()` operations. Object store FileSystem clients implement these
|
||||||
as operations on the individual objects whose names match the directory prefix.
|
as operations on the individual objects whose names match the directory prefix.
|
||||||
As a result, the changes take place a file at a time, and are not atomic. If
|
As a result, the changes take place a file at a time, and are not atomic. If
|
||||||
an operation fails part way through the process, the the state of the object store
|
an operation fails part way through the process, then the state of the object store
|
||||||
reflects the partially completed operation. Note also that client code
|
reflects the partially completed operation. Note also that client code
|
||||||
assumes that these operations are `O(1)` —in an object store they are
|
assumes that these operations are `O(1)` —in an object store they are
|
||||||
more likely to be be `O(child-entries)`.
|
more likely to be `O(child-entries)`.
|
||||||
|
|
||||||
1. **Durability**. Hadoop assumes that `OutputStream` implementations write data
|
1. **Durability**. Hadoop assumes that `OutputStream` implementations write data
|
||||||
to their (persistent) storage on a `flush()` operation. Object store implementations
|
to their (persistent) storage on a `flush()` operation. Object store implementations
|
||||||
|
|
|
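Note on the hunk above: the non-atomic, per-object directory rename it describes can be illustrated with a toy model. This is only a sketch under stated assumptions: the "object store" is an in-memory `TreeMap` keyed by full object name, and `PrefixRenameSketch`, `renameDirectory` and the `failAfter` parameter are hypothetical names, not part of any Hadoop or object store API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class PrefixRenameSketch {

    /** Flat key-to-data map standing in for an object store; there are no real directories. */
    final TreeMap<String, byte[]> objects = new TreeMap<>();

    /**
     * "Renames" a directory by moving every object whose key starts with srcPrefix.
     * failAfter simulates a failure once that many objects have been moved
     * (a negative value means never fail). Returns the number of objects moved.
     */
    int renameDirectory(String srcPrefix, String destPrefix, int failAfter) {
        // Snapshot the matching keys first so the map can be modified safely below.
        List<String> keys = new ArrayList<>();
        for (String key : objects.keySet()) {
            if (key.startsWith(srcPrefix)) {
                keys.add(key);
            }
        }
        int moved = 0;
        for (String key : keys) { // one step per child entry: O(child-entries), not O(1)
            if (moved == failAfter) {
                // Objects already moved stay moved; nothing is rolled back.
                throw new IllegalStateException("simulated failure after " + moved + " objects");
            }
            String newKey = destPrefix + key.substring(srcPrefix.length());
            objects.put(newKey, objects.remove(key));
            moved++;
        }
        return moved;
    }
}
```

If the simulated failure fires part way through, some keys sit under the destination prefix and the rest under the source prefix, which is the partially completed state the text warns about.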
@ -18,7 +18,7 @@
|
||||||
|
|
||||||
## Paths and Path Elements
|
## Paths and Path Elements
|
||||||
|
|
||||||
A Path is a list of Path elements which represents a path to a file, directory of symbolic link
|
A Path is a list of Path elements which represents a path to a file, directory or symbolic link
|
||||||
|
|
||||||
Path elements are non-empty strings. The exact set of valid strings MAY
|
Path elements are non-empty strings. The exact set of valid strings MAY
|
||||||
be specific to a particular FileSystem implementation.
|
be specific to a particular FileSystem implementation.
|
||||||
|
@ -179,7 +179,7 @@ path begins with the path P -that is their parent is P or an ancestor is P
|
||||||
|
|
||||||
### File references
|
### File references
|
||||||
|
|
||||||
A path MAY refer to a file; that it it has data in the filesystem; its path is a key in the data dictionary
|
A path MAY refer to a file that has data in the filesystem; its path is a key in the data dictionary
|
||||||
|
|
||||||
def isFile(FS, p) = p in FS.Files
|
def isFile(FS, p) = p in FS.Files
|
||||||
|
|
||||||
|
@ -206,7 +206,8 @@ process working with the filesystem:
|
||||||
|
|
||||||
The function `getHomeDirectory` returns the home directory for the Filesystem and the current user account.
|
The function `getHomeDirectory` returns the home directory for the Filesystem and the current user account.
|
||||||
For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
|
For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
|
||||||
for HDFS,
|
for HDFS, the username is derived from the credentials used to authenticate the client with HDFS.
|
||||||
|
This may differ from the local user account name.
|
||||||
|
|
||||||
|
|
||||||
### Exclusivity
|
### Exclusivity
|
||||||
|
|
|
@ -130,7 +130,7 @@ Strings are lists of characters represented in double quotes. e.g. `"abc"`
|
||||||
|
|
||||||
All system state declarations are immutable.
|
All system state declarations are immutable.
|
||||||
|
|
||||||
The suffix "'" (single quote) is used as the convention to indicate the state of the system after a operation:
|
The suffix "'" (single quote) is used as the convention to indicate the state of the system after an operation:
|
||||||
|
|
||||||
L' = L + ['d','e']
|
L' = L + ['d','e']
|
||||||
|
|
||||||
|
|
|
@ -28,7 +28,7 @@ remote server providing the filesystem.
|
||||||
|
|
||||||
These filesystem bindings must be defined in an XML configuration file, usually
|
These filesystem bindings must be defined in an XML configuration file, usually
|
||||||
`hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml`.
|
`hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml`.
|
||||||
This file is excluded should not be checked in.
|
This file is excluded and should not be checked in.
|
||||||
|
|
||||||
### ftp://
|
### ftp://
|
||||||
|
|
||||||
|
@ -122,7 +122,7 @@ new contract class, then creating a new non-abstract test class for every test
|
||||||
suite that you wish to test.
|
suite that you wish to test.
|
||||||
|
|
||||||
1. Do not try and add these tests into Hadoop itself. They won't be added to
|
1. Do not try and add these tests into Hadoop itself. They won't be added to
|
||||||
the soutce tree. The tests must live with your own filesystem source.
|
the source tree. The tests must live with your own filesystem source.
|
||||||
1. Create a package in your own test source tree (usually) under `contract`,
|
1. Create a package in your own test source tree (usually) under `contract`,
|
||||||
for the files and tests.
|
for the files and tests.
|
||||||
1. Subclass `AbstractFSContract` for your own contract implementation.
|
1. Subclass `AbstractFSContract` for your own contract implementation.
|
||||||
|
|
|
@ -157,6 +157,6 @@ Hadoop Archives and MapReduce
|
||||||
Using Hadoop Archives in MapReduce is as easy as specifying a different input
|
Using Hadoop Archives in MapReduce is as easy as specifying a different input
|
||||||
filesystem than the default file system. If you have a hadoop archive stored
|
filesystem than the default file system. If you have a hadoop archive stored
|
||||||
in HDFS in /user/zoo/foo.har then for using this archive for MapReduce input,
|
in HDFS in /user/zoo/foo.har then for using this archive for MapReduce input,
|
||||||
all you need to specify the input directory as har:///user/zoo/foo.har. Since
|
all you need is to specify the input directory as har:///user/zoo/foo.har. Since
|
||||||
Hadoop Archives is exposed as a file system MapReduce will be able to use all
|
Hadoop Archives is exposed as a file system MapReduce will be able to use all
|
||||||
the logical input files in Hadoop Archives as input.
|
the logical input files in Hadoop Archives as input.
|
||||||
|
|