diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md index 3071754836c..30ee7b4e7a3 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md @@ -242,7 +242,7 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest): On a genuine filesystem this is an `O(1)` directory rename. -On an object store with a mimiced rename, it is `O(data)` for the copy, +On an object store with a mimicked rename, it is `O(data)` for the copy, along with overhead for listing and deleting all files (For S3, that's `(1 + files/500)` lists, and the same number of delete calls. @@ -476,7 +476,7 @@ def needsTaskCommit(fs, jobAttemptPath, taskAttemptPath, dest): def commitTask(fs, jobAttemptPath, taskAttemptPath, dest): if fs.exists(taskAttemptPath) : - mergePathsV2(fs. taskAttemptPath, dest) + mergePathsV2(fs, taskAttemptPath, dest) ``` ### v2 Task Abort @@ -903,7 +903,7 @@ not be a problem. IBM's [Stocator](https://github.com/SparkTC/stocator) can transform indirect writes of V1/V2 committers into direct writes to the destination directory. -Hpw does it do this? It's a special Hadoop `FileSystem` implementation which +How does it do this? It's a special Hadoop `FileSystem` implementation which recognizes writes to `_temporary` paths and translate them to writes to the base directory. As well as translating the write operation, it also supports a `getFileStatus()` call on the original path, returning details on the file @@ -969,7 +969,7 @@ It is that fact, that a different process may perform different parts of the upload, which make this algorithm viable. 
-## The Netfix "Staging" committer +## The Netflix "Staging" committer Ryan Blue, of Netflix, has submitted an alternate committer, one which has a number of appealing features @@ -1081,7 +1081,7 @@ output reaches the job commit. Similarly, if a task is aborted, temporary output on the local FS is removed. If a task dies while the committer is running, it is possible for data to be -eft on the local FS or as unfinished parts in S3. +left on the local FS or as unfinished parts in S3. Unfinished upload parts in S3 are not visible to table readers and are cleaned up following the rules in the target bucket's life-cycle policy. diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md index 22b98ed599c..964bda49dd0 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md @@ -159,7 +159,7 @@ the number of files, during which time partial updates may be visible. If the operations are interrupted, the filesystem is left in an intermediate state. -### Warning #2: Directories are mimiced +### Warning #2: Directories are mimicked The S3A clients mimics directories by: @@ -184,7 +184,7 @@ Parts of Hadoop relying on this can have unexpected behaviour. E.g. the performance recursive listings whenever possible. * It is possible to create files under files if the caller tries hard. * The time to rename a directory is proportional to the number of files -underneath it (directory or indirectly) and the size of the files. (The copyis +underneath it (directly or indirectly) and the size of the files. (The copy is executed inside the S3 storage, so the time is independent of the bandwidth from client to S3). 
* Directory renames are not atomic: they can fail partway through, and callers @@ -320,7 +320,7 @@ export AWS_SECRET_ACCESS_KEY=my.secret.key If the environment variable `AWS_SESSION_TOKEN` is set, session authentication using "Temporary Security Credentials" is enabled; the Key ID and secret key -must be set to the credentials for that specific sesssion. +must be set to the credentials for that specific session. ```bash export AWS_SESSION_TOKEN=SECRET-SESSION-TOKEN @@ -534,7 +534,7 @@ This means that the default S3A authentication chain can be defined as to directly authenticate with S3 and DynamoDB services. When S3A Delegation tokens are enabled, depending upon the delegation token binding it may be used - to communicate wih the STS endpoint to request session/role + to communicate with the STS endpoint to request session/role credentials. These are loaded and queried in sequence for a valid set of credentials. @@ -630,13 +630,13 @@ The S3A configuration options with sensitive data and `fs.s3a.server-side-encryption.key`) can have their data saved to a binary file stored, with the values being read in when the S3A filesystem URL is used for data access. The reference to this -credential provider then declareed in the hadoop configuration. +credential provider is then declared in the Hadoop configuration. For additional reading on the Hadoop Credential Provider API see: [Credential Provider API](../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html). -The following configuration options can be storeed in Hadoop Credential Provider +The following configuration options can be stored in Hadoop Credential Provider stores. ``` @@ -725,7 +725,7 @@ of credentials. ### Using secrets from credential providers -Once the provider is set in the Hadoop configuration, hadoop commands +Once the provider is set in the Hadoop configuration, Hadoop commands work exactly as if the secrets were in an XML file. 
```bash @@ -761,7 +761,7 @@ used to change the endpoint, encryption and authentication mechanisms of buckets S3Guard options, various minor options. Here are the S3A properties for use in production. The S3Guard options are -documented in the [S3Guard documenents](./s3guard.html); some testing-related +documented in the [S3Guard documents](./s3guard.html); some testing-related options are covered in [Testing](./testing.md). ```xml