HADOOP-15332. Fix typos in hadoop-aws markdown docs. Contributed by Gabor Bota.

Sean Mackrory 2018-03-20 21:11:51 -07:00 committed by Arpit Agarwal
parent 6891da1e75
commit 2b46bd3f45
7 changed files with 81 additions and 81 deletions

View File

@@ -28,7 +28,7 @@ The standard commit algorithms (the `FileOutputCommitter` and its v1 and v2 algo
 rely on directory rename being an `O(1)` atomic operation: callers output their
 work to temporary directories in the destination filesystem, then
 rename these directories to the final destination as way of committing work.
-This is the perfect solution for commiting work against any filesystem with
+This is the perfect solution for committing work against any filesystem with
 consistent listing operations and where the `FileSystem.rename()` command
 is an atomic `O(1)` operation.
@@ -60,7 +60,7 @@ delayed completion of multi-part PUT operations
 That is: tasks write all data as multipart uploads, *but delay the final
 commit action until until the final, single job commit action.* Only that
 data committed in the job commit action will be made visible; work from speculative
-and failed tasks will not be instiantiated. As there is no rename, there is no
+and failed tasks will not be instantiated. As there is no rename, there is no
 delay while data is copied from a temporary directory to the final directory.
 The duration of the commit will be the time needed to determine which commit operations
 to construct, and to execute them.
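The delayed-completion mechanism described in the hunk above can be sketched in the same pseudocode style the architecture document itself uses. Every `s3.*` helper below is a hypothetical stand-in for the store's multipart-upload operations, not a real S3A or AWS SDK API:

```python
# Sketch only: tasks upload parts eagerly but withhold the completion call;
# nothing becomes visible until job commit replays the recorded uploads.
def task_write(s3, dest_key, blocks):
    upload_id = s3.initiate_multipart(dest_key)
    etags = [s3.upload_part(dest_key, upload_id, part, block)
             for part, block in enumerate(blocks, start=1)]
    # Record what *would* be committed; the data itself stays invisible.
    return {"key": dest_key, "uploadId": upload_id, "etags": etags}

def job_commit(s3, pending):
    for p in pending:                      # one completion call per file
        s3.complete_multipart(p["key"], p["uploadId"], p["etags"])

def job_abort(s3, pending):
    for p in pending:                      # discard uncommitted parts
        s3.abort_multipart(p["key"], p["uploadId"])
```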
@@ -109,7 +109,7 @@ This is traditionally implemented via a `FileSystem.rename()` call.
 It is useful to differentiate between a *task-side commit*: an operation performed
 in the task process after its work, and a *driver-side task commit*, in which
-the Job driver perfoms the commit operation. Any task-side commit work will
+the Job driver performs the commit operation. Any task-side commit work will
 be performed across the cluster, and may take place off the critical part for
 job execution. However, unless the commit protocol requires all tasks to await
 a signal from the job driver, task-side commits cannot instantiate their output
@@ -241,7 +241,7 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
 fs.rename(taskAttemptPath, taskCommittedPath)
 ```
-On a genuine fileystem this is an `O(1)` directory rename.
+On a genuine filesystem this is an `O(1)` directory rename.
 On an object store with a mimiced rename, it is `O(data)` for the copy,
 along with overhead for listing and deleting all files (For S3, that's
@@ -257,13 +257,13 @@ def abortTask(fs, jobAttemptPath, taskAttemptPath, dest):
 fs.delete(taskAttemptPath, recursive=True)
 ```
-On a genuine fileystem this is an `O(1)` operation. On an object store,
+On a genuine filesystem this is an `O(1)` operation. On an object store,
 proportional to the time to list and delete files, usually in batches.
 ### Job Commit
-Merge all files/directories in all task commited paths into final destination path.
+Merge all files/directories in all task committed paths into final destination path.
 Optionally; create 0-byte `_SUCCESS` file in destination path.
 ```python
@@ -420,9 +420,9 @@ by renaming the files.
 A a key difference is that the v1 algorithm commits a source directory to
 via a directory rename, which is traditionally an `O(1)` operation.
-In constrast, the v2 algorithm lists all direct children of a source directory
+In contrast, the v2 algorithm lists all direct children of a source directory
 and recursively calls `mergePath()` on them, ultimately renaming the individual
-files. As such, the number of renames it performa equals the number of source
+files. As such, the number of renames it performs equals the number of source
 *files*, rather than the number of source *directories*; the number of directory
 listings being `O(depth(src))` , where `depth(path)` is a function returning the
 depth of directories under the given path.
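The cost difference between the two merge strategies can be sketched roughly as follows; `fs` here is a hypothetical filesystem facade with made-up helper names, not the actual Hadoop `FileSystem` API:

```python
# Rough illustration: v1 renames whole committed task directories
# (O(directories)), while v2 walks the tree and renames every file (O(files)).
def v1_job_commit(fs, committed_task_dirs, dest):
    for task_dir in committed_task_dirs:
        fs.rename(task_dir, dest)                  # one rename per task directory

def v2_merge(fs, src_dir, dest_dir):
    for child in fs.list_children(src_dir):        # O(depth) directory listings
        target = fs.child_path(dest_dir, fs.name(child))
        if fs.is_directory(child):
            v2_merge(fs, child, target)
        else:
            fs.rename(child, target)               # one rename per *file*
```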
@@ -431,7 +431,7 @@ On a normal filesystem, the v2 merge algorithm is potentially more expensive
 than the v1 algorithm. However, as the merging only takes place in task commit,
 it is potentially less of a bottleneck in the entire execution process.
-On an objcct store, it is suboptimal not just from its expectation that `rename()`
+On an object store, it is suboptimal not just from its expectation that `rename()`
 is an `O(1)` operation, but from its expectation that a recursive tree walk is
 an efficient way to enumerate and act on a tree of data. If the algorithm was
 switched to using `FileSystem.listFiles(path, recursive)` for a single call to
@@ -548,7 +548,7 @@ the final destination FS, while `file://` can retain the default
 ### Task Setup
-`Task.initialize()`: read in the configuration, instantate the `JobContextImpl`
+`Task.initialize()`: read in the configuration, instantiate the `JobContextImpl`
 and `TaskAttemptContextImpl` instances bonded to the current job & task.
 ### Task Ccommit
@@ -610,7 +610,7 @@ deleting the previous attempt's data is straightforward. However, for S3 committ
 using Multipart Upload as the means of uploading uncommitted data, it is critical
 to ensure that pending uploads are always aborted. This can be done by
-* Making sure that all task-side failure branvches in `Task.done()` call `committer.abortTask()`.
+* Making sure that all task-side failure branches in `Task.done()` call `committer.abortTask()`.
 * Having job commit & abort cleaning up all pending multipart writes to the same directory
 tree. That is: require that no other jobs are writing to the same tree, and so
 list all pending operations and cancel them.
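The second cleanup step above, cancelling every outstanding multipart write under the destination tree, might look like this sketch; `s3.list_multipart_uploads` and `s3.abort_multipart` are illustrative placeholders rather than real SDK signatures:

```python
# Sketch: enumerate and abort every pending multipart upload below the
# job's destination prefix. Assumes no other job writes to the same tree.
def abort_pending_uploads(s3, dest_prefix):
    aborted = 0
    for upload in s3.list_multipart_uploads(prefix=dest_prefix):
        s3.abort_multipart(upload.key, upload.upload_id)
        aborted += 1
    return aborted   # handy to log how much leftover work was found
```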
@@ -653,7 +653,7 @@ rather than relying on fields initiated from the context passed to the construct
 #### AM: Job setup: `OutputCommitter.setupJob()`
-This is initated in `org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.StartTransition`.
+This is initiated in `org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.StartTransition`.
 It is queued for asynchronous execution in `org.apache.hadoop.mapreduce.v2.app.MRAppMaster.startJobs()`,
 which is invoked when the service is started. Thus: the job is set up when the
 AM is started.
@@ -686,7 +686,7 @@ the job is considered not to have attempted to commit itself yet.
 The presence of `COMMIT_SUCCESS` or `COMMIT_FAIL` are taken as evidence
-that the previous job completed successfully or unsucessfully; the AM
+that the previous job completed successfully or unsuccessfully; the AM
 then completes with a success/failure error code, without attempting to rerun
 the job.
@@ -871,16 +871,16 @@ base directory. As well as translating the write operation, it also supports
 a `getFileStatus()` call on the original path, returning details on the file
 at the final destination. This allows for committing applications to verify
 the creation/existence/size of the written files (in contrast to the magic
-committer covdered below).
+committer covered below).
 The FS targets Openstack Swift, though other object stores are supportable through
 different backends.
 This solution is innovative in that it appears to deliver the same semantics
 (and hence failure modes) as the Spark Direct OutputCommitter, but which
-does not need any changs in either Spark *or* the Hadoop committers. In contrast,
+does not need any change in either Spark *or* the Hadoop committers. In contrast,
 the committers proposed here combines changing the Hadoop MR committers for
-ease of pluggability, and offers a new committer exclusivley for S3, one
+ease of pluggability, and offers a new committer exclusively for S3, one
 strongly dependent upon and tightly integrated with the S3A Filesystem.
 The simplicity of the Stocator committer is something to appreciate.
@@ -922,7 +922,7 @@ The completion operation is apparently `O(1)`; presumably the PUT requests
 have already uploaded the data to the server(s) which will eventually be
 serving up the data for the final path. All that is needed to complete
 the upload is to construct an object by linking together the files in
-the server's local filesystem and udate an entry the index table of the
+the server's local filesystem and update an entry the index table of the
 object store.
 In the S3A client, all PUT calls in the sequence and the final commit are
@@ -941,11 +941,11 @@ number of appealing features
 The final point is not to be underestimated, es not even
 a need for a consistency layer.
-* Overall a simpler design.pecially given the need to
+* Overall a simpler design. Especially given the need to
 be resilient to the various failure modes which may arise.
-The commiter writes task outputs to a temporary directory on the local FS.
+The committer writes task outputs to a temporary directory on the local FS.
 Task outputs are directed to the local FS by `getTaskAttemptPath` and `getWorkPath`.
 On task commit, the committer enumerates files in the task attempt directory (ignoring hidden files).
 Each file is uploaded to S3 using the [multi-part upload API](http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html),
@@ -966,7 +966,7 @@ is a local `file://` reference.
 within a consistent, cluster-wide filesystem. For Netflix, that is HDFS.
 1. The Standard `FileOutputCommitter` (algorithm 1) is used to manage the commit/abort of these
 files. That is: it copies only those lists of files to commit from successful tasks
-into a (transient) job commmit directory.
+into a (transient) job commit directory.
 1. The S3 job committer reads the pending file list for every task committed
 in HDFS, and completes those put requests.
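Condensed into pseudocode, the staging flow described in the two hunks above looks roughly like this; every helper name (`upload_without_completing`, `save_pendingset`, `load_pendingsets`, `complete_upload`, `list_visible_files`) is an illustrative placeholder, not a real class in the committer:

```python
# Sketch of the Netflix/staging flow: data goes up as uncompleted multipart
# uploads, while only the small "pending" manifests pass through the v1
# commit algorithm on the cluster filesystem (HDFS).
def task_commit(local_task_dir, s3_dest, cluster_fs, task_id):
    pending = [upload_without_completing(f, s3_dest)
               for f in list_visible_files(local_task_dir)]  # hidden files ignored
    save_pendingset(cluster_fs, task_id, pending)            # committed via v1

def job_commit(cluster_fs, committed_task_ids):
    for task_id in committed_task_ids:
        for p in load_pendingsets(cluster_fs, task_id):
            complete_upload(p)          # the data only becomes visible here
```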
@@ -1028,7 +1028,7 @@ complete at or near the same time, there may be a peak of bandwidth load
 slowing down the upload.
 Time to commit will be the same, and, given the Netflix committer has already
-implemented the paralellization logic here, a time of `O(files/threads)`.
+implemented the parallelization logic here, a time of `O(files/threads)`.
 ### Resilience
@@ -1105,7 +1105,7 @@ This is done by
 an abort of all successfully read files.
 1. List and abort all pending multipart uploads.
-Because of action #2, action #1 is superflous. It is retained so as to leave
+Because of action #2, action #1 is superfluous. It is retained so as to leave
 open the option of making action #2 a configurable option -which would be
 required to handle the use case of >1 partitioned commit running simultaneously/
@@ -1115,7 +1115,7 @@ Because the local data is managed with the v1 commit algorithm, the
 second attempt of the job will recover all the outstanding commit data
 of the first attempt; those tasks will not be rerun.
-This also ensures that on a job abort, the invidual tasks' .pendingset
+This also ensures that on a job abort, the individual tasks' .pendingset
 files can be read and used to initiate the abort of those uploads.
 That is: a recovered job can clean up the pending writes of the previous job
@@ -1129,7 +1129,7 @@ must be configured to automatically delete the pending request.
 Those uploads already executed by a failed job commit will persist; those
 yet to execute will remain outstanding.
-The committer currently declares itself as non-recoverble, but that
+The committer currently declares itself as non-recoverable, but that
 may not actually hold, as the recovery process could be one of:
 1. Enumerate all job commits from the .pendingset files (*:= Commits*).
@@ -1203,7 +1203,7 @@ that of the final job destination. When the job is committed, the pending
 writes are instantiated.
 With the addition of the Netflix Staging committer, the actual committer
-code now shares common formats for the persistent metadadata and shared routines
+code now shares common formats for the persistent metadata and shared routines
 for parallel committing of work, including all the error handling based on
 the Netflix experience.
@@ -1333,7 +1333,7 @@ during job and task committer initialization.
 The job/task commit protocol is expected to handle this with the task
 only committing work when the job driver tells it to. A network partition
-should trigger the task committer's cancellation of the work (this is a protcol
+should trigger the task committer's cancellation of the work (this is a protocol
 above the committers).
 #### Job Driver failure
@@ -1349,7 +1349,7 @@ when the job driver cleans up it will cancel pending writes under the directory.
 #### Multiple jobs targeting the same destination directory
-This leaves things in an inderminate state.
+This leaves things in an indeterminate state.
 #### Failure during task commit
@@ -1388,7 +1388,7 @@ Two options present themselves
 and test that code as appropriate.
 Fixing the calling code does seem to be the best strategy, as it allows the
-failure to be explictly handled in the commit protocol, rather than hidden
+failure to be explicitly handled in the commit protocol, rather than hidden
 in the committer.::OpenFile
 #### Preemption
@@ -1418,7 +1418,7 @@ with many millions of objects —rather than list all keys searching for those
 with `/__magic/**/*.pending` in their name, work backwards from the active uploads to
 the directories with the data.
-We may also want to consider having a cleanup operationn in the S3 CLI to
+We may also want to consider having a cleanup operation in the S3 CLI to
 do the full tree scan and purge of pending items; give some statistics on
 what was found. This will keep costs down and help us identify problems
 related to cleanup.
@@ -1538,7 +1538,7 @@ The S3A Committer version, would
 In order to support the ubiquitous `FileOutputFormat` and subclasses,
 S3A Committers will need somehow be accepted as a valid committer by the class,
-a class which explicity expects the output committer to be `FileOutputCommitter`
+a class which explicitly expects the output committer to be `FileOutputCommitter`
 ```java
 public Path getDefaultWorkFile(TaskAttemptContext context,
@@ -1555,10 +1555,10 @@ Here are some options which have been considered, explored and discarded
 1. Adding more of a factory mechanism to create `FileOutputCommitter` instances;
 subclass this for S3A output and return it. The complexity of `FileOutputCommitter`
-and of supporting more dynamic consturction makes this dangerous from an implementation
+and of supporting more dynamic construction makes this dangerous from an implementation
 and maintenance perspective.
-1. Add a new commit algorithmm "3", which actually reads in the configured
+1. Add a new commit algorithm "3", which actually reads in the configured
 classname of a committer which it then instantiates and then relays the commit
 operations, passing in context information. Ths new committer interface would
 add methods for methods and attributes. This is viable, but does still change
@@ -1695,7 +1695,7 @@ marker implied the classic `FileOutputCommitter` had been used; if it could be r
 then it provides some details on the commit operation which are then used
 in assertions in the test suite.
-It has since been extended to collet metrics and other values, and has proven
+It has since been extended to collect metrics and other values, and has proven
 equally useful in Spark integration testing.
 ## Integrating the Committers with Apache Spark
@@ -1727,8 +1727,8 @@ tree.
 Alternatively, the fact that Spark tasks provide data to the job committer on their
 completion means that a list of pending PUT commands could be built up, with the commit
-operations being excuted by an S3A-specific implementation of the `FileCommitProtocol`.
+operations being executed by an S3A-specific implementation of the `FileCommitProtocol`.
-As noted earlier, this may permit the reqirement for a consistent list operation
+As noted earlier, this may permit the requirement for a consistent list operation
 to be bypassed. It would still be important to list what was being written, as
 it is needed to aid aborting work in failed tasks, but the list of files
 created by successful tasks could be passed directly from the task to committer,
@@ -1833,7 +1833,7 @@ quotas in local FS, keeping temp dirs on different mounted FS from root.
 The intermediate `.pendingset` files are saved in HDFS under the directory in
 `fs.s3a.committer.staging.tmp.path`; defaulting to `/tmp`. This data can
 disclose the workflow (it contains the destination paths & amount of data
-generated), and if deleted, breaks the job. If malicous code were to edit
+generated), and if deleted, breaks the job. If malicious code were to edit
 the file, by, for example, reordering the ordered etag list, the generated
 data would be committed out of order, creating invalid files. As this is
 the (usually transient) cluster FS, any user in the cluster has the potential
@@ -1848,7 +1848,7 @@ The directory defined by `fs.s3a.buffer.dir` is used to buffer blocks
 before upload, unless the job is configured to buffer the blocks in memory.
 This is as before: no incremental risk. As blocks are deleted from the filesystem
 after upload, the amount of storage needed is determined by the data generation
-bandwidth and the data upload bandwdith.
+bandwidth and the data upload bandwidth.
 No use is made of the cluster filesystem; there are no risks there.
@@ -1946,6 +1946,6 @@ which will made absolute relative to the current user. In filesystems in
 which access under user's home directories are restricted, this final, absolute
 path, will not be visible to untrusted accounts.
-* Maybe: define the for valid characters in a text strings, and a regext for
+* Maybe: define the for valid characters in a text strings, and a regex for
 validating, e,g, `[a-zA-Z0-9 \.\,\(\) \-\+]+` and then validate any free text
 JSON fields on load and save.
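As a purely illustrative example of the validation proposed in that last bullet, using the character whitelist quoted above:

```python
import re

# Illustrative only: reject free-text JSON fields containing characters
# outside the whitelist suggested above, on both load and save.
VALID_TEXT = re.compile(r'^[a-zA-Z0-9 \.\,\(\) \-\+]+$')

def validate_text_field(name, value):
    if not VALID_TEXT.match(value):
        raise ValueError(f"invalid characters in field {name!r}: {value!r}")
```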

View File

@@ -226,14 +226,14 @@ it is committed through the standard "v1" commit algorithm.
 When the Job is committed, the Job Manager reads the lists of pending writes from its
 HDFS Job destination directory and completes those uploads.
-Cancelling a task is straighforward: the local directory is deleted with
+Cancelling a task is straightforward: the local directory is deleted with
 its staged data. Cancelling a job is achieved by reading in the lists of
 pending writes from the HDFS job attempt directory, and aborting those
 uploads. For extra safety, all outstanding multipart writes to the destination directory
 are aborted.
 The staging committer comes in two slightly different forms, with slightly
-diffrent conflict resolution policies:
+different conflict resolution policies:
 * **Directory**: the entire directory tree of data is written or overwritten,
@@ -278,7 +278,7 @@ any with the same name. Reliable use requires unique names for generated files,
 which the committers generate
 by default.
-The difference between the two staging ommitters are as follows:
+The difference between the two staging committers are as follows:
 The Directory Committer uses the entire directory tree for conflict resolution.
 If any file exists at the destination it will fail in job setup; if the resolution
@@ -301,7 +301,7 @@ It's intended for use in Apache Spark Dataset operations, rather
 than Hadoop's original MapReduce engine, and only in jobs
 where adding new data to an existing dataset is the desired goal.
-Preequisites for successful work
+Prerequisites for successful work
 1. The output is written into partitions via `PARTITIONED BY` or `partitionedBy()`
 instructions.
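An illustrative PySpark write meeting that first prerequisite (bucket, path and column names are invented for the example) might be:

```python
from pyspark.sql import SparkSession

# Illustrative only: a partitioned, appending write of the kind the
# Partitioned Staging committer's conflict resolution is designed for.
spark = SparkSession.builder.appName("partitioned-write-example").getOrCreate()
df = spark.createDataFrame(
    [(2018, 3, "a"), (2018, 4, "b")], ["year", "month", "value"])

# Output lands under .../year=YYYY/month=MM/; only those partitions are
# candidates for conflict resolution.
df.write.partitionBy("year", "month").mode("append") \
    .parquet("s3a://example-bucket/tables/events")
```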
@@ -401,7 +401,7 @@ Generated files are initially written to a local directory underneath one of the
 directories listed in `fs.s3a.buffer.dir`.
-The staging commmitter needs a path in the cluster filesystem
+The staging committer needs a path in the cluster filesystem
 (e.g. HDFS). This must be declared in `fs.s3a.committer.staging.tmp.path`.
 Temporary files are saved in HDFS (or other cluster filesystem) under the path
@@ -460,7 +460,7 @@ What the partitioned committer does is, where the tooling permits, allows caller
 to add data to an existing partitioned layout*.
 More specifically, it does this by having a conflict resolution options which
-only act on invididual partitions, rather than across the entire output tree.
+only act on individual partitions, rather than across the entire output tree.
 | `fs.s3a.committer.staging.conflict-mode` | Meaning |
 | -----------------------------------------|---------|
@@ -508,7 +508,7 @@ documentation to see if it is consistent, hence compatible "out of the box".
 <property>
 <name>fs.s3a.committer.magic.enabled</name>
 <description>
-Enable support in the filesystem for the S3 "Magic" committter.
+Enable support in the filesystem for the S3 "Magic" committer.
 </description>
 <value>true</value>
 </property>
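Where the option cannot be set in `core-site.xml`, the same switch can typically be passed through Spark's `spark.hadoop.` prefix; the snippet below is an illustrative PySpark example, not taken from the documentation being patched:

```python
from pyspark.sql import SparkSession

# Illustrative: enable magic-committer support and select the committer
# programmatically instead of via core-site.xml.
spark = (SparkSession.builder
         .appName("magic-committer-example")
         .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
         .config("spark.hadoop.fs.s3a.committer.name", "magic")
         .getOrCreate())
```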
@@ -706,7 +706,7 @@ This message should not appear through the committer itself &mdash;it will
 fail with the error message in the previous section, but may arise
 if other applications are attempting to create files under the path `/__magic/`.
-Make sure the filesytem meets the requirements of the magic committer
+Make sure the filesystem meets the requirements of the magic committer
 (a consistent S3A filesystem through S3Guard or the S3 service itself),
 and set the `fs.s3a.committer.magic.enabled` flag to indicate that magic file
 writes are supported.
@@ -741,7 +741,7 @@ at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.
 While that will not make the problem go away, it will at least make
 the failure happen at the start of a job.
-(Setting this option will not interfer with the Staging Committers' use of HDFS,
+(Setting this option will not interfere with the Staging Committers' use of HDFS,
 as it explicitly sets the algorithm to "2" for that part of its work).
 The other way to check which committer to use is to examine the `_SUCCESS` file.
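A minimal sketch of that `_SUCCESS` check follows; the JSON field name is assumed for illustration, and the essential distinction is a zero-byte marker versus a JSON summary:

```python
import json

# Illustrative: an empty _SUCCESS marker implies the classic
# FileOutputCommitter; a JSON body implies an S3A committer wrote it.
def committer_from_success_marker(marker_bytes):
    if not marker_bytes:
        return "classic FileOutputCommitter (zero-byte marker)"
    summary = json.loads(marker_bytes)
    return summary.get("committer", "S3A committer (field not present)")
```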

View File

@@ -23,7 +23,7 @@
 The S3A filesystem client supports Amazon S3's Server Side Encryption
 for at-rest data encryption.
 You should to read up on the [AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html)
-for S3 Server Side Encryption for up to date information on the encryption mechansims.
+for S3 Server Side Encryption for up to date information on the encryption mechanisms.
@@ -135,7 +135,7 @@ it blank to use the default configured for that region.
 the right to use it, uses it to encrypt the object-specific key.
-When downloading SSE-KMS encrypte data, the sequence is as follows
+When downloading SSE-KMS encrypted data, the sequence is as follows
 1. The S3A client issues an HTTP GET request to read the data.
 1. S3 sees that the data was encrypted with SSE-KMS, and looks up the specific key in the KMS service
@@ -413,8 +413,8 @@ a KMS key hosted in the AWS-KMS service in the same region.
 ```
-Again the approprate bucket policy can be used to guarantee that all callers
-will use SSE-KMS; they can even mandata the name of the key used to encrypt
+Again the appropriate bucket policy can be used to guarantee that all callers
+will use SSE-KMS; they can even mandate the name of the key used to encrypt
 the data, so guaranteeing that access to thee data can be read by everyone
 granted access to that key, and nobody without access to it.

View File

@@ -638,7 +638,7 @@ over that of the `hadoop.security` list (i.e. they are prepended to the common l
 </property>
 ```
-This was added to suppport binding different credential providers on a per
+This was added to support binding different credential providers on a per
 bucket basis, without adding alternative secrets in the credential list.
 However, some applications (e.g Hive) prevent the list of credential providers
 from being dynamically updated by users. As per-bucket secrets are now supported,
@@ -938,7 +938,7 @@ The S3A client makes a best-effort attempt at recovering from network failures;
 this section covers the details of what it does.
 The S3A divides exceptions returned by the AWS SDK into different categories,
-and chooses a differnt retry policy based on their type and whether or
+and chooses a different retry policy based on their type and whether or
 not the failing operation is idempotent.
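A very rough sketch of that decision follows; the category names and policy strings are illustrative, not the real S3A retry policy classes:

```python
# Illustrative only: pick a retry strategy from the failure category and
# whether the failing operation is idempotent.
def choose_retry_policy(category, idempotent):
    if category == "unrecoverable":        # e.g. authentication failures
        return "fail"
    if category == "throttled":
        return "exponential-backoff"
    if category == "connectivity":
        # blind retries are only safe when repeating cannot corrupt state
        return "fixed-interval-retry" if idempotent else "fail"
    return "fixed-interval-retry"
```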
@@ -969,7 +969,7 @@ These failures will be retried with a fixed sleep interval set in
 `fs.s3a.retry.interval`, up to the limit set in `fs.s3a.retry.limit`.
-### Only retrible on idempotent operations
+### Only retriable on idempotent operations
 Some network failures are considered to be retriable if they occur on
 idempotent operations; there's no way to know if they happened
@@ -997,11 +997,11 @@ it's a no-op if reprocessed. As indeed, is `Filesystem.delete()`.
 1. Any filesystem supporting an atomic `FileSystem.create(path, overwrite=false)`
 operation to reject file creation if the path exists MUST NOT consider
 delete to be idempotent, because a `create(path, false)` operation will
-only succeed if the first `delete()` call has already succeded.
+only succeed if the first `delete()` call has already succeeded.
 1. And a second, retried `delete()` call could delete the new data.
-Because S3 is eventially consistent *and* doesn't support an
-atomic create-no-overwrite operation, the choice is more ambigious.
+Because S3 is eventually consistent *and* doesn't support an
+atomic create-no-overwrite operation, the choice is more ambiguous.
 Currently S3A considers delete to be
 idempotent because it is convenient for many workflows, including the
@@ -1045,11 +1045,11 @@ Notes
 1. There is also throttling taking place inside the AWS SDK; this is managed
 by the value `fs.s3a.attempts.maximum`.
 1. Throttling events are tracked in the S3A filesystem metrics and statistics.
-1. Amazon KMS may thottle a customer based on the total rate of uses of
+1. Amazon KMS may throttle a customer based on the total rate of uses of
 KMS *across all user accounts and applications*.
 Throttling of S3 requests is all too common; it is caused by too many clients
-trying to access the same shard of S3 Storage. This generatlly
+trying to access the same shard of S3 Storage. This generally
 happen if there are too many reads, those being the most common in Hadoop
 applications. This problem is exacerbated by Hive's partitioning
 strategy used when storing data, such as partitioning by year and then month.
@@ -1087,7 +1087,7 @@ of data asked for in every GET request, as well as how much data is
 skipped in the existing stream before aborting it and creating a new stream.
 1. If the DynamoDB tables used by S3Guard are being throttled, increase
 the capacity through `hadoop s3guard set-capacity` (and pay more, obviously).
-1. KMS: "consult AWS about increating your capacity".
+1. KMS: "consult AWS about increasing your capacity".
@@ -1173,14 +1173,14 @@ fs.s3a.bucket.nightly.server-side-encryption-algorithm
 ```
 When accessing the bucket `s3a://nightly/`, the per-bucket configuration
-options for that backet will be used, here the access keys and token,
+options for that bucket will be used, here the access keys and token,
 and including the encryption algorithm and key.
 ### <a name="per_bucket_endpoints"></a>Using Per-Bucket Configuration to access data round the world
 S3 Buckets are hosted in different "regions", the default being "US-East".
-The S3A client talks to this region by default, issing HTTP requests
+The S3A client talks to this region by default, issuing HTTP requests
 to the server `s3.amazonaws.com`.
 S3A can work with buckets from any region. Each region has its own
@@ -1331,12 +1331,12 @@ The "fast" output stream
 to the available disk space.
 1. Generates output statistics as metrics on the filesystem, including
 statistics of active and pending block uploads.
-1. Has the time to `close()` set by the amount of remaning data to upload, rather
+1. Has the time to `close()` set by the amount of remaining data to upload, rather
 than the total size of the file.
 Because it starts uploading while data is still being written, it offers
 significant benefits when very large amounts of data are generated.
-The in memory buffering mechanims may also offer speedup when running adjacent to
+The in memory buffering mechanisms may also offer speedup when running adjacent to
 S3 endpoints, as disks are not used for intermediate data storage.
@@ -1400,7 +1400,7 @@ upload operation counts, so identifying when there is a backlog of work/
 a mismatch between data generation rates and network bandwidth. Per-stream
 statistics can also be logged by calling `toString()` on the current stream.
-* Files being written are still invisible untl the write
+* Files being written are still invisible until the write
 completes in the `close()` call, which will block until the upload is completed.
@@ -1526,7 +1526,7 @@ compete with other filesystem operations.
 We recommend a low value of `fs.s3a.fast.upload.active.blocks`; enough
 to start background upload without overloading other parts of the system,
-then experiment to see if higher values deliver more throughtput —especially
+then experiment to see if higher values deliver more throughput —especially
 from VMs running on EC2.
 ```xml
@@ -1569,10 +1569,10 @@ from VMs running on EC2.
 There are two mechanisms for cleaning up after leftover multipart
 uploads:
 - Hadoop s3guard CLI commands for listing and deleting uploads by their
-age. Doumented in the [S3Guard](./s3guard.html) section.
+age. Documented in the [S3Guard](./s3guard.html) section.
 - The configuration parameter `fs.s3a.multipart.purge`, covered below.
-If an large stream writeoperation is interrupted, there may be
+If a large stream write operation is interrupted, there may be
 intermediate partitions uploaded to S3 —data which will be billed for.
 These charges can be reduced by enabling `fs.s3a.multipart.purge`,

View File

@@ -506,7 +506,7 @@ Input seek policy: fs.s3a.experimental.input.fadvise=normal
 Note that other clients may have a S3Guard table set up to store metadata
 on this bucket; the checks are all done from the perspective of the configuration
-setttings of the current client.
+settings of the current client.
 ```bash
 hadoop s3guard bucket-info -guarded -auth s3a://landsat-pds
@@ -798,6 +798,6 @@ The IO load of clients of the (shared) DynamoDB table was exceeded.
 Currently S3Guard doesn't do any throttling and retries here; the way to address
 this is to increase capacity via the AWS console or the `set-capacity` command.
-## Other Topis
+## Other Topics
 For details on how to test S3Guard, see [Testing S3Guard](./testing.html#s3guard)

View File

@@ -28,7 +28,7 @@ be ignored.
 ## <a name="policy"></a> Policy for submitting patches which affect the `hadoop-aws` module.
-The Apache Jenkins infrastucture does not run any S3 integration tests,
+The Apache Jenkins infrastructure does not run any S3 integration tests,
 due to the need to keep credentials secure.
 ### The submitter of any patch is required to run all the integration tests and declare which S3 region/implementation they used.
@@ -319,10 +319,10 @@ mvn verify -Dparallel-tests -Dscale -DtestsThreadCount=8
 The most bandwidth intensive tests (those which upload data) always run
 sequentially; those which are slow due to HTTPS setup costs or server-side
-actionsare included in the set of parallelized tests.
+actions are included in the set of parallelized tests.
-### <a name="tuning_scale"></a> Tuning scale optins from Maven
+### <a name="tuning_scale"></a> Tuning scale options from Maven
 Some of the tests can be tuned from the maven build or from the
@@ -344,7 +344,7 @@ then the configuration value is used. The `unset` option is used to
 Only a few properties can be set this way; more will be added.
-| Property | Meaninging |
+| Property | Meaning |
 |-----------|-------------|
 | `fs.s3a.scale.test.timeout`| Timeout in seconds for scale tests |
 | `fs.s3a.scale.test.huge.filesize`| Size for huge file uploads |
@@ -493,7 +493,7 @@ cases must be disabled:
 <value>false</value>
 </property>
 ```
-These tests reqest a temporary set of credentials from the STS service endpoint.
+These tests request a temporary set of credentials from the STS service endpoint.
 An alternate endpoint may be defined in `test.fs.s3a.sts.endpoint`.
 ```xml
@@ -641,7 +641,7 @@ to support the declaration of a specific large test file on alternate filesystem
 ### Works Over Long-haul Links
-As well as making file size and operation counts scaleable, this includes
+As well as making file size and operation counts scalable, this includes
 making test timeouts adequate. The Scale tests make this configurable; it's
 hard coded to ten minutes in `AbstractS3ATestBase()`; subclasses can
 change this by overriding `getTestTimeoutMillis()`.
@@ -677,7 +677,7 @@ Tests can overrun `createConfiguration()` to add new options to the configuratio
 file for the S3A Filesystem instance used in their tests.
 However, filesystem caching may mean that a test suite may get a cached
-instance created with an differennnt configuration. For tests which don't need
+instance created with an different configuration. For tests which don't need
 specific configurations caching is good: it reduces test setup time.
 For those tests which do need unique options (encryption, magic files),
@@ -888,7 +888,7 @@ s3a://bucket/a/b/c/DELAY_LISTING_ME
 ```
 In real-life S3 inconsistency, however, we expect that all the above paths
-(including `a` and `b`) will be subject to delayed visiblity.
+(including `a` and `b`) will be subject to delayed visibility.
 ### Using the `InconsistentAmazonS3CClient` in downstream integration tests
@@ -952,7 +952,7 @@ When the `s3guard` profile is enabled, following profiles can be specified:
 DynamoDB web service; launch the server and creating the table.
 You won't be charged bills for using DynamoDB in test. As it runs in-JVM,
 the table isn't shared across other tests running in parallel.
-* `non-auth`: treat the S3Guard metadata as authorative.
+* `non-auth`: treat the S3Guard metadata as authoritative.
 ```bash
 mvn -T 1C verify -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Ddynamo -Dauth
@@ -984,7 +984,7 @@ throttling, and compare performance for different implementations. These
 are included in the scale tests executed when `-Dscale` is passed to
 the maven command line.
-The two S3Guard scale testse are `ITestDynamoDBMetadataStoreScale` and
+The two S3Guard scale tests are `ITestDynamoDBMetadataStoreScale` and
 `ITestLocalMetadataStoreScale`. To run the DynamoDB test, you will need to
 define your table name and region in your test configuration. For example,
 the following settings allow us to run `ITestDynamoDBMetadataStoreScale` with

View File

@@ -967,7 +967,7 @@ Again, this is due to the fact that the data is cached locally until the
 `close()` operation. The S3A filesystem cannot be used as a store of data
 if it is required that the data is persisted durably after every
 `Syncable.hflush()` or `Syncable.hsync()` call.
-This includes resilient logging, HBase-style journalling
+This includes resilient logging, HBase-style journaling
 and the like. The standard strategy here is to save to HDFS and then copy to S3.
 ## <a name="encryption"></a> S3 Server Side Encryption