HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep" (#5689)

This 
1. changes the default value of fs.s3a.directory.marker.retention
   to "keep"
2. no longer prints a message when an S3A FS instance is
   instantiated with any option other than "delete".

Switching to marker retention improves performance
on any S3 bucket as there are no needless marker DELETE requests
-leading to a reduction in write IOPS and any delays waiting
for the DELETE call to finish.

There are *very* significant improvements on versioned buckets,
where tombstone markers slow down LIST operations: the more
tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation
of any data protection strategy, so this has tangible benefits
in production.

It is *not* compatible with older hadoop releases; specifically
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0
Incompatible releases have no problems reading data in stores
where markers are retained, but can get confused when deleting
or renaming directories.

If you are still using older versions to write data and cannot
yet upgrade, switch the option back to "delete".

Contributed by Steve Loughran
Steve Loughran 2023-06-08 12:12:29 +01:00 committed by GitHub
parent 0e6bd09ae3
commit 7bb09f1010
7 changed files with 141 additions and 158 deletions


@ -1088,7 +1088,7 @@ public final class Constants {
* Default retention policy: {@value}.
*/
public static final String DEFAULT_DIRECTORY_MARKER_POLICY =
DIRECTORY_MARKER_POLICY_DELETE;
DIRECTORY_MARKER_POLICY_KEEP;
/**


@ -186,11 +186,11 @@ public final class DirectoryPolicyImpl
policy = DELETE;
break;
case DIRECTORY_MARKER_POLICY_KEEP:
LOG.info("Directory markers will be kept");
LOG.debug("Directory markers will be kept");
policy = KEEP;
break;
case DIRECTORY_MARKER_POLICY_AUTHORITATIVE:
LOG.info("Directory markers will be kept on authoritative"
LOG.debug("Directory markers will be kept on authoritative"
+ " paths");
policy = new DirectoryPolicyImpl(MarkerPolicy.Authoritative,
authoritativeness);


@ -792,7 +792,7 @@ Security
Delegation token support is disabled
Directory Markers
The directory marker policy is "delete"
The directory marker policy is "keep"
Available Policies: delete, keep, authoritative
Authoritative paths: fs.s3a.authoritative.path=
```


@ -12,11 +12,11 @@
limitations under the License. See accompanying LICENSE file.
-->
# Experimental: Controlling the S3A Directory Marker Behavior
# Controlling the S3A Directory Marker Behavior
This document discusses an experimental feature of the S3A
connector since Hadoop 3.3.1: the ability to retain directory
marker objects above paths containing files or subdirectories.
This document discusses a performance feature of the S3A
connector: directory markers are not deleted unless the
client is explicitly configured to do so.
## <a name="compatibility"></a> Critical: this is not backwards compatible!
@ -24,23 +24,28 @@ This document shows how the performance of S3 I/O, especially for applications
creating many files (for example Apache Hive) or working with versioned S3 buckets,
can be improved by changing the S3A directory marker retention policy.
Changing the policy from the default value, `"delete"` _is not backwards compatible_.
The default policy in this release of hadoop is "keep",
which _is not backwards compatible_ with hadoop versions
released before 2021.
Versions of Hadoop which are incompatible with other marker retention policies,
as of August 2020.
The compatibility table of older releases is as follows:
| Branch | Compatible Since | Supported |
|------------|------------------|---------------------|
| Hadoop 2.x | n/a | WONTFIX |
| Hadoop 3.0 | check | Read-only |
| Hadoop 3.1 | check | Read-only |
| Hadoop 3.2 | check | Read-only |
| Hadoop 3.3 | 3.3.1 | Done |
| Branch | Compatible Since | Supported | Released |
|------------|------------------|-----------|----------|
| Hadoop 2.x | 2.10.2 | Read-only | 05/2022 |
| Hadoop 3.0 | n/a | WONTFIX | |
| Hadoop 3.1 | n/a | WONTFIX | |
| Hadoop 3.2 | 3.2.2 | Read-only | 01/2022 |
| Hadoop 3.3 | 3.3.1 | Done | 01/2021 |
*WONTFIX*
The Hadoop branch-2 line will *not* be patched.
The Hadoop 3.0 and 3.1 lines will have no further releases, so will
not be upgraded.
The compatibility patch "HADOOP-17199. S3A Directory Marker HADOOP-13230 backport"
is present in both source code branches, for anyone wishing to make a private
release.
*Read-only*
@ -50,7 +55,17 @@ These branches have read-only compatibility.
such directories have child entries.
* They will open files under directories with such markers.
## How to re-enable backwards compatibility
The option `fs.s3a.directory.marker.retention` can be changed to "delete" to re-enable
the original policy.
```xml
<property>
<name>fs.s3a.directory.marker.retention</name>
<value>delete</value>
</property>
```
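The same revert can be scoped to a single bucket through the per-bucket option
mechanism, leaving the faster "keep" policy in force everywhere else. A minimal
sketch in Java (the bucket name `london` is illustrative):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class MarkerPolicyOverride {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Per-bucket options (fs.s3a.bucket.<name>.*) override the base
    // fs.s3a.* settings, so only this bucket reverts to "delete".
    conf.set("fs.s3a.bucket.london.directory.marker.retention", "delete");
    try (FileSystem fs = FileSystem.get(URI.create("s3a://london/"), conf)) {
      System.out.println("Marker policy settings applied to " + fs.getUri());
    }
  }
}
```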
## Verifying read compatibility.
The `s3guard bucket-info` tool [can be used to verify support](#bucket-info).
@ -161,7 +176,7 @@ When a file is created under a path, the directory marker is deleted. And when a
file is deleted, if it was the last file in the directory, the marker is
recreated.
And, historically, When a path is listed, if a marker to that path is found, *it
And, historically, when a path is listed, if a marker to that path is found, *it
has been interpreted as an empty directory.*
It is that little detail which is the cause of the incompatibility issues.
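To make that lifecycle concrete, here is a hedged sketch of the sequence of
store operations as seen through the filesystem API (bucket and paths are
illustrative):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerLifecycle {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf)) {
      Path dir = new Path("/table");
      fs.mkdirs(dir);                              // PUT of marker object "table/"
      fs.create(new Path(dir, "part-0000")).close();
      // "delete" policy: a DELETE of "table/" is issued, as the directory
      // now has a child entry. "keep" policy: no DELETE is issued.
      fs.delete(new Path(dir, "part-0000"), false);
      // last entry gone: the directory is empty again, so a marker is recreated
    }
  }
}
```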
@ -202,7 +217,7 @@ Issue #2 has turned out to cause significant problems on some interactions with
large hive tables:
Because each object listed in a DELETE call is treated as one operation, and
there is (as of summer 2020) a limit of 3500 write requests/second in a directory
there is a limit of 3500 write requests/second in a directory
tree.
When writing many files to a deep directory tree, it is the delete calls which
create throttling problems.
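As a rough worked example: under the "delete" policy, a job writing 1,000 files
ten directories deep can queue up to 10,000 marker delete requests against a
partition limited to 3,500 write requests/second -around three seconds of the
partition's entire write capacity spent on housekeeping alone.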
@ -237,29 +252,19 @@ of backwards compatibility.
There is now an option `fs.s3a.directory.marker.retention` which controls how
markers are managed when new files are created:
*Default* `delete`: a request is issued to delete any parental directory markers
1. `delete`: a request is issued to delete any parental directory markers
whenever a file or directory is created.
*New* `keep`: No delete request is issued.
2. `keep`: No delete request is issued.
Any directory markers which exist are not deleted.
This is *not* backwards compatible
3. `authoritative`: directory markers are deleted _except for files created
in "authoritative" directories_. This is backwards compatible _outside authoritative directories_.
*New* `authoritative`: directory markers are deleted _except for files created
in "authoritative" directories_.
This is backwards compatible _outside authoritative directories_.
The setting, `fs.s3a.directory.marker.retention = delete` is compatible with
every shipping Hadoop release; that of `keep` is compatible with
all releases since 2021.
Until now, the notion of an "authoritative"
directory has only been used as a performance optimization for deployments
where it is known that all applications were using the same S3Guard metastore
when writing and reading data.
In such a deployment, if it is also known that all applications are using a
compatible version of the s3a connector, then they
can switch to the higher-performance mode for those specific directories.
Only the default setting, `fs.s3a.directory.marker.retention = delete` is compatible with
every shipping Hadoop release.
## <a name="s3guard"></a> Directory Markers and S3Guard
## <a name="s3guard"></a> Directory Markers and Authoritative paths
The now-deleted S3Guard feature included the concept of "authoritative paths";
@ -271,9 +276,8 @@ store -potentially being much faster.
In production, authoritative paths were usually only ever for Hive managed
tables, where access was strictly restricted to the Hive services.
Likewise, the directory marker retention can be enabled purely for authoritative
paths.
When S3Guard is configured to treat some directories as [Authoritative](s3guard.html#authoritative)
When the S3A client is configured to treat some directories as "Authoritative"
then an S3A connector with `fs.s3a.directory.marker.retention` set to
`authoritative` will omit deleting markers in authoritative directories.
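A minimal sketch of selecting this policy programmatically (the `/tables` path
mirrors the examples below; the bucket name is illustrative):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class AuthoritativeMarkers {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Keep markers only under the authoritative path; delete them elsewhere.
    conf.set("fs.s3a.directory.marker.retention", "authoritative");
    conf.set("fs.s3a.authoritative.path", "/tables");
    try (FileSystem fs = FileSystem.get(URI.create("s3a://london/"), conf)) {
      System.out.println("Authoritative-path marker retention on " + fs.getUri());
    }
  }
}
```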
@ -293,12 +297,12 @@ The `bucket-info` command has been enhanced to support verification from the com
line of bucket policies via the `-markers` option
| option | verifies |
|--------|--------|
| `-markers aware` | the hadoop release is "aware" of directory markers |
| `-markers delete` | directory markers are deleted |
| `-markers keep` | directory markers are kept (not backwards compatible) |
| `-markers authoritative` | directory markers are kept in authoritative paths |
| option | verifies |
|--------------------------|-------------------------------------------------------|
| `-markers aware` | the hadoop release is "aware" of directory markers |
| `-markers delete` | directory markers are deleted |
| `-markers keep` | directory markers are kept (not backwards compatible) |
| `-markers authoritative` | directory markers are kept in authoritative paths |
All releases of Hadoop which have been updated to be marker aware will support the `-markers aware` option.
@ -311,18 +315,17 @@ Example: `s3guard bucket-info -markers aware` on a compatible release.
```
> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
Filesystem s3a://landsat-pds
Location: us-west-2
...
Security
Delegation token support is disabled
Directory Markers
The directory marker policy is "keep"
Available Policies: delete, keep, authoritative
Authoritative paths: fs.s3a.authoritative.path=
The S3A connector is compatible with buckets where directory markers are not deleted
The directory marker policy is "delete"
The S3A connector is compatible with buckets where directory markers are not deleted
Available Policies: delete, keep, authoritative
```
The same command will fail on older releases, because the `-markers` option
@ -351,20 +354,24 @@ Generic options supported are:
A specific policy check verifies that the connector is configured as desired
```
> hadoop s3guard bucket-info -markers delete s3a://landsat-pds/
> hadoop s3guard bucket-info -markers keep s3a://landsat-pds/
Filesystem s3a://landsat-pds
Location: us-west-2
...
The directory marker policy is "delete"
Directory Markers
The directory marker policy is "keep"
Available Policies: delete, keep, authoritative
Authoritative paths: fs.s3a.authoritative.path=
```
When probing for a specific policy, the error code "46" is returned if the active policy
does not match that requested:
```
> hadoop s3guard bucket-info -markers keep s3a://landsat-pds/
> hadoop s3guard bucket-info -markers delete s3a://landsat-pds/
Filesystem s3a://landsat-pds
Location: us-west-2
@ -397,7 +404,7 @@ Directory Markers
```
## <a name="marker-tool"></a> The marker tool:`hadoop s3guard markers`
## <a name="marker-tool"></a> The marker tool: `hadoop s3guard markers`
The marker tool aims to help migration by scanning/auditing directory trees
for surplus markers, and for optionally deleting them.
@ -415,25 +422,25 @@ markers (-audit | -clean) [-min <count>] [-max <count>] [-out <filename>] [-limi
*Options*
| Option | Meaning |
|-------------------------|-------------------------|
| `-audit` | Audit the path for surplus markers |
| `-clean` | Clean all surplus markers under a path |
| `-min <count>` | Minimum number of markers an audit must find (default: 0) |
| `-max <count>]` | Minimum number of markers an audit must find (default: 0) |
| `-limit <count>]` | Limit the number of objects to scan |
| `-nonauth` | Only consider markers in non-authoritative paths as errors |
| `-out <filename>` | Save a list of all markers found to the nominated file |
| `-verbose` | Verbose output |
| Option            | Meaning                                                     |
|-------------------|-------------------------------------------------------------|
| `-audit`          | Audit the path for surplus markers                          |
| `-clean`          | Clean all surplus markers under a path                      |
| `-min <count>`    | Minimum number of markers an audit must find (default: 0)   |
| `-max <count>`    | Maximum number of markers an audit may find (default: 0)    |
| `-limit <count>`  | Limit the number of objects to scan                         |
| `-nonauth`        | Only consider markers in non-authoritative paths as errors  |
| `-out <filename>` | Save a list of all markers found to the nominated file      |
| `-verbose`        | Verbose output                                              |
*Exit Codes*
| Code | Meaning |
|-------|---------|
| 0 | Success |
| 3 | interrupted -the value of `-limit` was reached |
| 42 | Usage |
| 46 | Markers were found (see HTTP "406", "unacceptable") |
| Code | Meaning |
|------|-----------------------------------------------------|
| 0 | Success |
| 3 | interrupted -the value of `-limit` was reached |
| 42 | Usage |
| 46 | Markers were found (see HTTP "406", "unacceptable") |
All other non-zero status codes also indicate errors of some form or other.
@ -445,7 +452,7 @@ Audit the path and fail if any markers were found.
```
> hadoop s3guard markers -limit 8000 -audit s3a://landsat-pds/
The directory marker policy of s3a://landsat-pds is "Delete"
The directory marker policy of s3a://landsat-pds is "Keep"
2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://landsat-pds/
Scanned 1,000 objects
Scanned 2,000 objects
@ -470,7 +477,6 @@ are found under the path `/tables`.
```
> bin/hadoop s3guard markers -audit s3a://london/
2020-08-05 18:29:16,473 [main] INFO impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(143)) - Directory markers will be kept on authoritative paths
The directory marker policy of s3a://london is "Authoritative"
Authoritative path list is "/tables"
2020-08-05 18:29:19,186 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://london/
@ -500,7 +506,6 @@ The `-nonauth` option does not treat markers under authoritative paths as errors
```
bin/hadoop s3guard markers -nonauth -audit s3a://london/
2020-08-05 18:31:16,255 [main] INFO impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(143)) - Directory markers will be kept on authoritative paths
The directory marker policy of s3a://london is "Authoritative"
Authoritative path list is "/tables"
2020-08-05 18:31:19,210 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://london/
@ -532,44 +537,53 @@ The `markers clean` command will clean the directory tree of all surplus markers
The `-verbose` option prints more detail on the operation as well as some IO statistics
```
> hadoop s3guard markers -clean -verbose s3a://london/
bin/hadoop s3guard markers -clean -verbose s3a://stevel-london/
The directory marker policy of s3a://stevel-london is "Keep"
2023-06-06 17:15:52,110 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://stevel-london/
Directory Marker user/stevel/target/test/data/4so7pZebRx/
Directory Marker user/stevel/target/test/data/OKvfC3oxlD/
Directory Marker user/stevel/target/test/data/VSTQ1O4dMi/
2020-08-05 18:33:25,303 [main] INFO impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(143)) - Directory markers will be kept on authoritative paths
The directory marker policy of s3a://london is "Authoritative"
Authoritative path list is "/tables"
2020-08-05 18:33:28,511 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://london/
Directory Marker tables
Directory Marker tables/tables-2/
Directory Marker tables/tables-3/
Directory Marker tables/tables-4/
Directory Marker tables/tables-4/tables-5/
Directory Marker tables/tables-4/tables-5/06/
Directory Marker tables2/
Directory Marker tables3/
2020-08-05 18:33:31,685 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://london/: duration 0:03.175s
Listed 8 objects under s3a://london/
Listing statistics:
counters=((object_continue_list_request=0) (object_list_request.failures=0) (object_list_request=1) (object_continue_list_request.failures=0));
gauges=();
minimums=((object_list_request.min=540) (object_continue_list_request.min=-1) (object_continue_list_request.failures.min=-1) (object_list_request.failures.min=-1));
maximums=((object_continue_list_request.failures.max=-1) (object_list_request.failures.max=-1) (object_list_request.max=540) (object_continue_list_request.max=-1));
means=((object_list_request.mean=(samples=1, sum=540, mean=540.0000)) (object_continue_list_request.failures.mean=(samples=0, sum=0, mean=0.0000)) (object_list_request.failures.mean=(samples=0, sum=0, mean=0.0000)) (object_continue_list_request.mean=(samples=0, sum=0, mean=0.0000)));
Found 3 surplus directory markers under s3a://london/
s3a://london/tables/
s3a://london/tables/tables-4/
s3a://london/tables/tables-4/tables-5/
Found 5 empty directory 'leaf' markers under s3a://london/
s3a://london/tables/tables-2/
s3a://london/tables/tables-3/
s3a://london/tables/tables-4/tables-5/06/
s3a://london/tables2/
s3a://london/tables3/
2023-06-06 17:15:52,662 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://stevel-london/: duration 0:00.553s
Listed 3 objects under s3a://stevel-london/
No surplus directory markers were found under s3a://stevel-london/
Found 3 empty directory 'leaf' markers under s3a://stevel-london/
s3a://stevel-london/user/stevel/target/test/data/4so7pZebRx/
s3a://stevel-london/user/stevel/target/test/data/OKvfC3oxlD/
s3a://stevel-london/user/stevel/target/test/data/VSTQ1O4dMi/
These are required to indicate empty directories
3 markers to delete in 1 page of 250 keys/page
2020-08-05 18:33:31,688 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: Deleting markers
2020-08-05 18:33:31,812 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - Deleting markers: duration 0:00.124s
0 markers to delete in 0 pages of 250 keys/page
2023-06-06 17:15:52,664 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: Deleting markers
2023-06-06 17:15:52,664 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - Deleting markers: duration 0:00.000s
Storage Statistics for s3a://london
IO Statistics for s3a://stevel-london
op_get_file_status 1
object_delete_requests 1
object_list_requests 2
counters=((audit_request_execution=1)
(audit_span_creation=3)
(object_list_request=1)
(op_get_file_status=1)
(store_io_request=1));
gauges=();
minimums=((object_list_request.min=540)
(op_get_file_status.min=2));
maximums=((object_list_request.max=540)
(op_get_file_status.max=2));
means=((object_list_request.mean=(samples=1, sum=540, mean=540.0000))
(op_get_file_status.mean=(samples=1, sum=2, mean=2.0000)));
```
The `markers -clean` command _does not_ delete markers above empty directories -only those which have
@ -647,7 +661,7 @@ Take a bucket with a retention policy of "authoritative" -only paths under `/tab
<property>
<name>fs.s3a.bucket.london.authoritative.path</name>
<value>/tables</value>
</property>```
</property>
```
With this policy the path capability `fs.s3a.capability.directory.marker.action.keep` will hold under
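Applications can make the same probe through the standard `hasPathCapability()`
call; a minimal sketch, assuming a marker-aware client and the illustrative
bucket above:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ProbeMarkerRetention {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create("s3a://london/"), conf)) {
      // True when markers are retained at this path, whether through the
      // "keep" policy or an "authoritative" path rule covering it.
      boolean keeping = fs.hasPathCapability(new Path("/tables"),
          "fs.s3a.capability.directory.marker.action.keep");
      System.out.println("Markers kept under /tables: " + keeping);
    }
  }
}
```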
@ -732,6 +746,8 @@ and some applications.
* [HADOOP-16823](https://issues.apache.org/jira/browse/HADOOP-16823). _Large DeleteObject requests are their own Thundering Herd_
* [HADOOP-17199](https://issues.apache.org/jira/browse/HADOOP-17199).
Backport HADOOP-13230 list/getFileStatus changes for preserved directory markers
* [Object Versioning](https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html). _Using versioning_
* [Optimizing Performance](https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html). _Best Practices Design Patterns: Optimizing Amazon S3 Performance_


@ -23,11 +23,10 @@
### <a name="directory-marker-compatibility"></a> Directory Marker Compatibility
1. This release can safely list/index/read S3 buckets where "empty directory"
markers are retained.
1. This release can be configured to retain these directory markers at the
expense of being backwards incompatible.
This release does not delete directory markers when creating
files or directories underneath.
This is incompatible with versions of the Hadoop S3A client released
before 2021.
Consult [Controlling the S3A Directory Marker Behavior](directory_markers.html) for
full details.


@ -119,14 +119,7 @@ Without S3Guard, listing performance may be slower. However, Hadoop 3.3.0+ has s
improved listing performance ([HADOOP-17400](https://issues.apache.org/jira/browse/HADOOP-17400)
_Optimize S3A for maximum performance in directory listings_) so this should not be apparent.
We recommend disabling [directory marker deletion](directory_markers.html) to reduce
the number of DELETE operations made when writing files.
This reduces the load on the S3 partition and so the risk of throttling, which can
impact performance.
This is very important when working with versioned S3 buckets, as the tombstone markers
created will slow down subsequent listing operations.
Finally, the S3A [auditing](auditing.html) feature adds information to the S3 server logs
The S3A [auditing](auditing.html) feature adds information to the S3 server logs
about which jobs, users and filesystem operations have been making S3 requests.
This auditing information can be used to identify opportunities to reduce load.
@ -162,7 +155,6 @@ Example
```bash
> hadoop s3guard bucket-info -magic -markers keep s3a://test-london/
2021-11-22 15:21:00,289 [main] INFO impl.DirectoryPolicyImpl (DirectoryPolicyImpl.java:getDirectoryPolicy(189)) - Directory markers will be kept
Filesystem s3a://test-london
Location: eu-west-2


@ -339,11 +339,6 @@ Hadoop supports [different policies for directory marker retention](directory_ma
-essentially the classic "delete" and the higher-performance "keep" options; "authoritative"
is just "keep" restricted to a part of the bucket.
Example: test with `markers=delete`
```
mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete
```
Example: test with `markers=keep`
@ -351,6 +346,14 @@ Example: test with `markers=keep`
mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep
```
This is the default and does not need to be explicitly set.
Example: test with `markers=delete`
```
mvn verify -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=delete
```
Example: test with `markers=authoritative`
```
@ -1268,33 +1271,6 @@ bin/hdfs fetchdt -print secrets.bin
# expect warning "No TokenRenewer defined for token kind S3ADelegationToken/Session"
bin/hdfs fetchdt -renew secrets.bin
# ---------------------------------------------------
# Directory markers
# ---------------------------------------------------
# require success
bin/hadoop s3guard bucket-info -markers aware $BUCKET
# expect failure unless bucket policy is keep
bin/hadoop s3guard bucket-info -markers keep $BUCKET/path
# you may need to set this on a per-bucket basis if you have already been
# playing with options
bin/hadoop s3guard -D fs.s3a.directory.marker.retention=keep bucket-info -markers keep $BUCKET/path
bin/hadoop s3guard -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep bucket-info -markers keep $BUCKET/path
# expect to see "Directory markers will be kept" messages and status code of "46"
bin/hadoop fs -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep -mkdir $BUCKET/p1
bin/hadoop fs -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep -mkdir $BUCKET/p1/p2
bin/hadoop fs -D fs.s3a.bucket.$BUCKETNAME.directory.marker.retention=keep -touchz $BUCKET/p1/p2/file
# expect failure as markers will be found for /p1/ and /p1/p2/
bin/hadoop s3guard markers -audit -verbose $BUCKET
# clean will remove markers
bin/hadoop s3guard markers -clean -verbose $BUCKET
# expect success and exit code of 0
bin/hadoop s3guard markers -audit -verbose $BUCKET
# ---------------------------------------------------
# Copy to from local