266 Commits

Author SHA1 Message Date
Steve Loughran
134539f054
HADOOP-17199. S3A Directory Marker HADOOP-13230 backport #2210)
This backports the listing-side changes of HADOOP-13230.

With this patch in, this branch of Hadoop is compatible with S3A clients
which do not delete directory markers when files are created underneath.

It does not allow this version to disable marker deletion; if the
fs.s3a.marker.retention option is changed to request this, a message
is printed at INFO and the policy remains at "delete"

The s3guard bucket-info command has been extended to support
probing for marker retention, as has the hasPathCapability method on
S3AFileSystem.

Read the documentation!
2020-08-25 22:47:43 +01:00
Steve Loughran
42c71a5790
HADOOP-15691. Add PathCapabilities to FileSystem and FileContext.
Contributed by Steve Loughran.

This complements the StreamCapabilities Interface by allowing applications to probe for a specific path on a specific instance of a FileSystem client
to offer a specific capability.

This is intended to allow applications to determine

* Whether a method is implemented before calling it and dealing with UnsupportedOperationException.
* Whether a specific feature is believed to be available in the remote store.

As well as a common set of capabilities defined in CommonPathCapabilities,
file systems are free to add their own capabilities, prefixed with
 fs. + schema + .

The plan is to identify and document more capabilities -and for file systems which add new features, for a declaration of the availability of the feature to always be available.

Note

* The remote store is not expected to be checked for the feature;
  It is more a check of client API and the client's configuration/knowledge
  of the state of the remote system.
* Permissions are not checked.
2020-08-19 17:15:06 +01:00
Ayush Saxena
d6a9ed8140 HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui. 2020-08-07 22:23:02 +05:30
Masatake Iwasaki
77e69a73da HADOOP-17040. Fix intermittent failure of ITestBlockingThreadPoolExecutorService. (#2020)
(cherry picked from commit 968531463375ebf29ba3186c13b5f8685df10d25)
2020-05-22 21:30:57 +09:00
Masatake Iwasaki
89696b66e7 HADOOP-17025. Fix invalid metastore configuration in S3GuardTool tests. (#1994)
(cherry picked from commit 99840aaba662e7bb6187206b74e35132102a3b38)

 Conflicts:
	hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/AbstractS3GuardToolTestBase.java
2020-05-07 12:11:37 +09:00
Steve Loughran
204d54005a HADOOP-16117. Update AWS SDK to 1.11.563.
Contributed by Steve Loughran.

Change-Id: I7c46ed2a6378e1370f567acf4cdcfeb93e43fa13
2020-04-24 10:46:24 +01:00
Mingliang Liu
d19981fe48
HADOOP-16758. Refine testing.md to tell user better how to use auth-keys.xml (#1753)
Contributed by Mingliang Liu
2019-12-11 11:54:12 -08:00
Mingliang Liu
8a60429e0b
HADOOP-16735. Make it clearer in config default that EnvironmentVariableCredentialsProvider supports AWS_SESSION_TOKEN. Contributed by Mingliang Liu
This closes #1733
2019-12-05 17:50:28 -08:00
Rohith Sharma K S
7d5bb2ebb7 Preparing for 3.2.2-SNAPSHOT development. 2019-09-07 08:52:08 +05:30
Takanobu Asanuma
a9a3450560 HADOOP-16331. Fix ASF License check in pom.xml. Contributed by Akira Ajisaka.
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2019-05-29 17:34:16 +09:00
Akira Ajisaka
855dc997d6
HADOOP-16323. https everywhere in Maven settings. 2019-05-27 15:27:33 +09:00
Rajat Khandelwal
12e0053932
HADOOP-16278. With S3A Filesystem, Long Running services End up Doing lot of GC and eventually die.
Contributed by Rajat Khandelwal

(cherry picked from commit 591ca698230f25217c10c7549aff8097baa11f1e)
2019-05-09 21:14:37 +01:00
Steve Loughran
b6ebe74526
HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)
This is needed to fix up some confusion about caching of job.addCache() handling of S3A paths; all parent dirs -the files are downloaded by the NM without  using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:
* S3AFileStatus Adds "true" to the superclass's encrypted flag during construction.

Tests
* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?
* Verify that the filecache considers paths to not have the permissions which trigger reduce-privilege downloads
* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:
* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible to debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3
(cherry picked from commit 366186d9990ef9059b6ac9a19ad24310d6f36d04)
2019-04-03 21:35:19 +01:00
Steve Loughran
60c9042286
HADOOP-16058. S3A tests to include Terasort.
Contributed by Steve Loughran.

This includes
 - HADOOP-15890. Some S3A committer tests don't match ITest* pattern; don't run in maven
 - MAPREDUCE-7090. BigMapOutput example doesn't work with paths off cluster fs
 - MAPREDUCE-7091. Terasort on S3A to switch to new committers
 - MAPREDUCE-7092. MR examples to work better against cloud stores
2019-03-29 15:25:45 +00:00
Adam Antal
81a6ba1825
HADOOP-16124. Extend documentation in testing.md about S3 endpoint constants.
Contributed by Adam Antal.

(cherry picked from commit c0427c84dddf942529dfdfc5cc7a3e25e3f12c5e)
2019-03-18 19:14:43 +00:00
Ben Roling
43e8ac6097
HADOOP-15625. S3A input stream to use etags/version number to detect changed source files.
Author: Ben Roling <ben.roling@gmail.com>

Initial patch from Brahma Reddy Battula.
2019-03-14 19:46:34 +00:00
Steve Loughran
b6f6c34223
HADOOP-16109. Parquet reading S3AFileSystem causes EOF
Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson
Fixed seek() logic: Steve Loughran
2019-03-11 11:15:25 +00:00
Andrew Olson
36f3e775d4
HADOOP-15281. Distcp to add no-rename copy option.
Contributed by Andrew Olson.

(cherry picked from commit de804e53b9d20a2df75a4c7252bf83ed52011488)
2019-02-07 10:09:13 +00:00
Steve Loughran
bdd17be9ec
HDFS-13713. Add specification of Multipart Upload API to FS specification, with contract tests.
Contributed by Ewan Higgs and Steve Loughran.

(cherry picked from commit c1d24f848345f6d34a2ac2d570d49e9787a0df6a)
2019-02-04 17:10:19 +00:00
Akira Ajisaka
dc12754ab6
HADOOP-16065. -Ddynamodb should be -Ddynamo in AWS SDK testing document.
(cherry picked from commit 3c60303ac59d3b6cc375e7ac10214fc36d330fa4)
2019-01-25 10:28:46 +09:00
Steve Loughran
fa1d4ba7d4
HADOOP-15932. Oozie unable to create sharelib in s3a filesystem.
Contributed by Steve Loughran.

(cherry picked from commit 4c106fca0ca91536e288f11052568406a0b84300)
2018-11-27 20:40:48 +00:00
Akira Ajisaka
8c9681d7f0
HADOOP-15926. Document upgrading the section in NOTICE.txt when upgrading the version of AWS SDK. Contributed by Dinesh Chitlangia.
(cherry picked from commit 66b1335bb3a9a6f3a3db455540c973d4a85bef73)
2018-11-15 16:31:05 +09:00
Sunil G
bde4fd5ed9 Preparing for 3.2.0 release 2018-10-18 17:07:45 +05:30
Steve Loughran
c19864d8c8
HADOOP-15848. ITestS3AContractMultipartUploader#testMultipartUploadEmptyPart test error.
Contributed by Ewan Higgs.
2018-10-16 19:58:58 +01:00
Steve Loughran
70d39c74d9
HADOOP-15837. DynamoDB table Update can fail S3A FS init.
Contributed by Steve Loughran.

(cherry picked from commit ee816f1fd78b029b9efa567e8b1b62391563de14)
2018-10-11 14:58:37 +01:00
Mingliang Liu
c07715e378 HADOOP-15781 S3A assumed role tests failing due to changed error text in AWS exceptions. Contributed by Steve Loughran 2018-09-24 12:53:21 -07:00
Sunil G
d060cbea48 HDFS-13937. Multipart Uploader APIs to be marked as private/unstable in 3.2.0. Contributed by Steve Loughran. 2018-09-24 21:19:47 +05:30
Steve Loughran
26d0c63a1e
HADOOP-15754. s3guard: testDynamoTableTagging should clear existing config.
Contributed by Gabor Bota.
2018-09-17 22:40:08 +01:00
Steve Loughran
d7c0a08a1c
HADOOP-15426 Make S3guard client resilient to DDB throttle events and network failures (Contributed by Steve Loughran) 2018-09-12 21:04:49 -07:00
Aaron Fabbri
d32a8d5d58
HADOOP-14734 add option to tag DDB table(s) created. (Contributed by Gabor Bota and Abe Fine) 2018-09-12 16:36:01 -07:00
Mingliang Liu
1f6c4545cf HADOOP-15750. Remove obsolete S3A test ITestS3ACredentialsInURL. Contributed by Steve Loughran 2018-09-12 10:58:39 -07:00
Sean Mackrory
47b72c87eb HADOOP-15635. s3guard set-capacity command to fail fast if bucket is unguarded.
Contributed by Gabor Bota.
2018-09-12 09:12:38 -06:00
Mingliang Liu
87f63b6479 HADOOP-14833. Remove s3a user:secret authentication. Contributed by Steve Loughran 2018-09-11 17:18:42 -07:00
Gabor Bota
36c7c78260
HADOOP-15709 Move S3Guard LocalMetadataStore constants to org.apache.hadoop.fs.s3a.Constants (Contributed by Gabor Bota) 2018-09-07 10:25:20 -07:00
Steve Loughran
5a0babf765
HADOOP-15107. Stabilize/tune S3A committers; review correctness & docs.
Contributed by Steve Loughran.
2018-08-30 14:49:53 +01:00
Steve Loughran
2e6c1109dc
HADOOP-15667. FileSystemMultipartUploader should verify that UploadHandle has non-0 length.
Contributed by Ewan Higgs
2018-08-30 14:33:16 +01:00
Aaron Fabbri
d7232857d8
HADOOP-14154 Persist isAuthoritative bit in DynamoDBMetaStore (Contributed by Gabor Bota) 2018-08-17 10:15:39 -07:00
Steve Loughran
0e832e7a74
HADOOP-15642. Update aws-sdk version to 1.11.375.
Contributed by Steve Loughran.
2018-08-16 09:58:46 -07:00
Akira Ajisaka
3e3963b035
HADOOP-15552. Move logging APIs over to slf4j in hadoop-tools - Part2. Contributed by Ian Pickering. 2018-08-16 00:31:59 +09:00
Ewan Higgs
a13929ddcb HADOOP-15645. ITestS3GuardToolLocal.testDiffCommand fails if bucket has per-bucket binding to DDB. Contributed by Steve Loughran. 2018-08-13 12:57:45 +02:00
Steve Loughran
da9a39eed1
HADOOP-15583. Stabilize S3A Assumed Role support.
Contributed by Steve Loughran.
2018-08-08 22:57:24 -07:00
Ewan Higgs
2ec97abb2e HADOOP-15576. S3A Multipart Uploader to work with S3Guard and encryption Originally contributed by Ewan Higgs with refinements by Steve Loughran. 2018-08-08 13:50:23 +02:00
Sean Mackrory
7862f1523f HADOOP-15400. Improve S3Guard documentation on Authoritative Mode implementation. (Contributed by Gabor Bota) 2018-08-07 20:13:09 -06:00
Steve Loughran
48673bc2a8
HADOOP-15626. FileContextMainOperationsBaseTest.testBuilderCreateAppendExistingFile fails on filesystems without append.
Contributed by Steve Loughran.
2018-08-03 16:06:00 -07:00
Sean Mackrory
59adeb8d7f HADOOP-15636. Follow-up from HADOOP-14918; restoring test under new name. Contributed by Gabor Bota. 2018-07-27 18:23:29 -06:00
Sean Mackrory
a08812a1b1 HADOOP-15349. S3Guard DDB retryBackoff to be more informative on limits exceeded. Contributed by Gabor Bota. 2018-07-12 17:24:01 +02:00
Sean Mackrory
d503f65b66 HADOOP-15541. [s3a] Shouldn't try to drain stream before aborting
connection in case of timeout.
2018-07-10 17:52:57 +02:00
Aaron Fabbri
93ac01cb59
HADOOP-15215 s3guard set-capacity command to fail on read/write of 0 (Gabor Bota) 2018-07-03 13:50:11 -07:00
Akira Ajisaka
2b2399d623
HADOOP-15495. Upgrade commons-lang version to 3.7 in hadoop-common-project and hadoop-tools. Contributed by Takanobu Asanuma. 2018-06-28 14:37:22 +09:00
Sean Mackrory
c687a6617d HADOOP-15423. Merge fileCache and dirCache into ine single cache in LocalMetadataStore. Contributed by Gabor Bota. 2018-06-25 14:59:41 -06:00