Commit Graph

1440 Commits

Author SHA1 Message Date
Steve Loughran ce7827c82a
HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID. (#2399)
See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs

Contributed by Steve Loughran.

Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
2020-11-18 13:34:51 +00:00
Steve Loughran e3c08f285a
HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2310)
This fixes the S3Guard/Directory Marker Retention integration so that when
fs.s3a.directory.marker.retention=keep, failures during multipart delete
are handled correctly, as are incremental deletes during
directory tree operations.

In both cases, when a directory marker with children is deleted from
S3, the directory entry in S3Guard is not deleted, because it is still
critical to representing the structure of the store.

Contributed by Steve Loughran.

Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f
2020-11-18 12:18:11 +00:00
Jungtaek Lim a7b923c80c
HADOOP-17379. AbstractS3ATokenIdentifier to set issue date == now. (#2466)
Unless you explicitly set it, the issue date of a delegation token identifier is 0, which confuses spark renewal (SPARK-33440). This patch makes sure that all S3A DT identifiers have the current time as issue date, fixing the problem as far as S3A tokens are concerned.

Contributed by Jungtaek Lim.
2020-11-17 14:43:29 +00:00
Doroszlai, Attila dd85a90da6
HADOOP-17376. ITestS3AContractRename failing against stricter tests. (#2462)
Contributed by Attila Doroszlai.
2020-11-16 11:24:00 +00:00
jianghuazhu 375900049c
HDFS-15608.Reset the DistCp#CLEANUP variable definition. (#2351). Contributed by JiangHua Zhu.
Co-authored-by: zhujianghua <zhujianghua@zhujianghuadeMacBook-Pro.local>
2020-11-10 13:02:29 -08:00
Eric E Payne 0461a07c01 YARN-10475: Scale RM-NM heartbeat interval based on node utilization. Contributed by Jim Brennan (Jim_Brennan). 2020-11-02 16:55:06 +00:00
Yiqun Lin 15a5f53673 HDFS-15640. Add diff threshold to FedBalance. Contributed by Jinglun. 2020-10-27 10:41:10 +08:00
Anoop Sam John 7bdf165f62
HADOOP-17308. WASB PageBlobOutputStream.flush succeeds even when flush to storage fails (#2392)
Contributed by Anoop Sam John.
2020-10-23 10:51:19 +01:00
Mukund Thakur 7f8ef76c48
HADOOP-17305. Fix ITestCustomSigner to work with s3 compatible endpoints (#2395)
Contributed by Mukund Thakur
2020-10-21 13:01:13 +01:00
Aryan Gupta d60d5fe43d
HADOOP-17302. Upgrade to jQuery 3.5.1 in hadoop-sls. (#2379)
Co-authored-by: Aryan Gupta
2020-10-19 18:18:46 +05:30
Ayush Saxena 1e3a6efcef
HADOOP-17288. Use shaded guava from thirdparty. (#2342). Contributed by Ayush Saxena. 2020-10-17 12:01:18 +05:30
Adam Antal bd8cf7fd4c YARN-10448. SLS should set default user to handle SYNTH format. Contributed by zhuqi 2020-10-13 17:54:15 +02:00
Sneha Vijayarajan c4fff74cc5
HADOOP-17301. ABFS: read-ahead error reporting breaks buffer management (#2369)
Fixes read-ahead buffer management issues introduced by HADOOP-16852,
 "ABFS: Send error back to client for Read Ahead request failure".

Contributed by Sneha Vijayarajan
2020-10-13 16:30:34 +01:00
Dongjoon Hyun b92f72758b
HADOOP-17258. Magic S3Guard Committer to overwrite existing pendingSet file on task commit (#2371)
Contributed by Dongjoon Hyun and Steve Loughran

Change-Id: Ibaf8082e60eff5298ff4e6513edc386c5bae0274
2020-10-12 13:39:15 +01:00
Steve Loughran f83e07a20f HADOOP-17293. S3A to always probe S3 in S3A getFileStatus on non-auth paths
This reverts changes in HADOOP-13230 to use S3Guard TTL in choosing when
to issue a HEAD request; fixing tests to compensate.

New org.apache.hadoop.fs.s3a.performance.OperationCost cost,
S3GUARD_NONAUTH_FILE_STATUS_PROBE for use in cost tests.

Contributed by Steve Loughran.

Change-Id: I418d55d2d2562a48b2a14ec7dee369db49b4e29e
2020-10-08 15:35:57 +01:00
Mukund Thakur 82522d60fb
HADOOP-17281 Implement FileSystem.listStatusIterator() in S3AFileSystem (#2354)
Contains HADOOP-17300: FileSystem.DirListingIterator.next() call should 
return NoSuchElementException

Contributed by Mukund Thakur
2020-10-07 13:59:06 +01:00
Ikko Ashimine 4347a5c955
HADOOP-17294. Fix typos existance to existence (#2357) 2020-10-06 10:10:44 +09:00
Arpit Agarwal 18fa4397e6
MAPREDUCE-7298. Distcp doesn't close the job after the job is completed. Contributed by Aasha Medhi.
Change-Id: I63d249bbb18ccedaeee9f10123a78e32f9e54ed2
2020-10-02 08:29:55 -07:00
bilaharith 51598d8b1b
HADOOP-17183. ABFS: Enabling checkaccess on ABFS (#2331)
Contributed by Bilahari TH
2020-10-01 21:29:05 +01:00
Sneha Vijayarajan c3a90dd918
HADOOP-17279: ABFS: testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account.
Contributed by Sneha Vijayarajan

Testing:

namespace.enabled=false
auth.type=SharedKey
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
Tests run: 457, Failures: 0, Errors: 0, Skipped: 246
Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=SharedKey
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
Tests run: 457, Failures: 0, Errors: 0, Skipped: 33
Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=OAuth
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
Tests run: 457, Failures: 0, Errors: 0, Skipped: 74
Tests run: 207, Failures: 0, Errors: 0, Skipped: 140
2020-09-23 15:59:00 +00:00
Steve Loughran 7fae4133e0
HADOOP-17261. s3a rename() needs s3:deleteObjectVersion permission (#2303)
Contributed by Steve Loughran.
2020-09-22 17:22:04 +01:00
Mukund Thakur 83c7c2b4c4
HADOOP-17023. Tune S3AFileSystem.listStatus() (#2257)
S3AFileSystem.listStatus() is optimized for invocations
where the path supplied is a non-empty directory.
The number of S3 requests is significantly reduced, saving
time, money, and reducing the risk of S3 throttling.

Contributed by Mukund Thakur.
2020-09-21 17:20:16 +01:00
Sneha Vijayarajan e31a636e92
HADOOP-17215: Support for conditional overwrite.
Contributed by Sneha Vijayarajan

DETAILS:

    This change adds config key "fs.azure.enable.conditional.create.overwrite" with
    a default of true.  When enabled, if create(path, overwrite: true) is invoked
    and the file exists, the ABFS driver will first obtain its etag and then attempt
    to overwrite the file on the condition that the etag matches. The purpose of this
    is to mitigate the non-idempotency of this method.  Specifically, in the event of
    a network error or similar, the client will retry and this can result in the file
    being created more than once which may result in data loss.  In essense this is
    like a poor man's file handle, and will be addressed more thoroughly in the future
    when support for lease is added to ABFS.

TEST RESULTS:

    namespace.enabled=true
    auth.type=SharedKey
    -------------------
    $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
    Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
    Tests run: 457, Failures: 0, Errors: 0, Skipped: 42
    Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

    namespace.enabled=true
    auth.type=OAuth
    -------------------
    $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
    Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
    Tests run: 457, Failures: 0, Errors: 0, Skipped: 74
    Tests run: 207, Failures: 0, Errors: 0, Skipped: 140
2020-09-19 01:28:44 +00:00
ThomasMarquardt 0dc54d0247
HADOOP-17203: Revert HADOOP-17183. ABFS: Enabling checkaccess on ABFS
This reverts commit a2610e21ed.
2020-09-18 17:52:11 -07:00
Steve Loughran 958cab804e
Revert "HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280)"
This reverts commit 9960c01a25.

Change-Id: I820534c3292f2a343693d835f625488c325fb5d6
2020-09-11 18:07:49 +01:00
Steve Loughran 9960c01a25
HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280)
This changes directory tree deletion so that only files are incrementally deleted
from S3Guard after the objects are deleted; the directories are left alone
until metadataStore.deleteSubtree(path) is invoke.

This avoids directory tombstones being added above files/child directories,
which stop the treewalk and delete phase from working.

Also:

* Callback to delete objects splits files and dirs so that
any problems deleting the dirs doesn't trigger s3guard updates
* New statistic to measure #of objects deleted, alongside request count.
* Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers
  to clarify behavior.
* Test enhancements to replicate the failure and verify the fix

Contributed by Steve Loughran
2020-09-10 17:03:52 +01:00
bilaharith 85119267be
HADOOP-17166. ABFS: configure output stream thread pool (#2179)
Adds the options to control the size of the per-output-stream threadpool
when writing data through the abfs connector

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H
2020-09-09 16:41:36 +01:00
Mehakmeet Singh 0d855159f0
HADOOP-17229. No updation of bytes received counter value after response failure occurs in ABFS (#2264)
Contributed by Mehakmeet Singh
2020-09-08 10:14:23 +01:00
Mehakmeet Singh 84ed6adccc
HADOOP-17158. Test timeout for ITestAbfsInputStreamStatistics#testReadAheadCounters (#2272)
Contributed by: Mehakmeet Singh.
2020-09-08 10:11:06 +01:00
Steve Loughran 5346cc3263
HADOOP-17227. S3A Marker Tool tuning (#2254)
Contributed by Steve Loughran.
2020-09-04 14:58:03 +01:00
Mukund Thakur 139a43e98e
HADOOP-17167 ITestS3AEncryptionWithDefaultS3Settings failing (#2187)
Now skips ITestS3AEncryptionWithDefaultS3Settings.testEncryptionOverRename
when server side encryption is not set to sse:kms

Contributed by Mukund Thakur
2020-09-03 19:35:24 +01:00
Mehakmeet Singh d1c60a53f6
HADOOP-17194. Adding Context class for AbfsClient in ABFS (#2216)
Contributed by Mehakmeet Singh.
2020-08-27 11:27:00 +01:00
Mukund Thakur cc641534dc
HADOOP-17074. S3A Listing to be fully asynchronous. (#2207)
Contributed by Mukund Thakur.
2020-08-25 11:29:43 +01:00
bilaharith 64f36b9543
HADOOP-16915. ABFS: Ignoring the test ITestAzureBlobFileSystemRandomRead.testRandomReadPerformance
- Contributed by Bilahari T H
2020-08-24 12:00:55 -07:00
swamirishi 872c2909bd
HADOOP-17122: Preserving Directory Attributes in DistCp with Atomic Copy (#2133)
Contributed by Swaminathan Balachandran
2020-08-22 18:48:21 +01:00
Sneha Vijayarajan b367942fe4
Upgrade store REST API version to 2019-12-12
- Contributed by Sneha Vijayarajan
2020-08-17 10:17:18 -07:00
Steve Loughran 5092ea62ec HADOOP-13230. S3A to optionally retain directory markers.
This adds an option to disable "empty directory" marker deletion,
so avoid throttling and other scale problems.

This feature is *not* backwards compatible.
Consult the documentation and use with care.

Contributed by Steve Loughran.

Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958
2020-08-15 12:51:08 +01:00
Mukund Thakur 4a400d3193
HADOOP-17192. ITestS3AHugeFilesSSECDiskBlock failing (#2221)
Contributed by Mukund Thakur
2020-08-13 14:21:49 +01:00
Ayush Saxena 975b6024dd HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui. 2020-08-07 22:19:17 +05:30
bilaharith a2610e21ed
HADOOP-17183. ABFS: Enabling checkaccess on ABFS
- Contributed by Bilahari T H
2020-08-06 14:52:02 -07:00
bilaharith 3f73facd7b
HADOOP-17149. ABFS: Fixing the testcase ITestGetNameSpaceEnabled
- Contributed by Bilahari T H
2020-08-05 10:01:04 -07:00
bilaharith c566cabd62
HADOOP-17163. ABFS: Adding debug log for rename failures
- Contributed by Bilahari T H
2020-08-05 09:38:13 -07:00
Mukund Thakur ac697571a1
HADOOP-17186. Fixing javadoc in ListingOperationCallbacks (#2196) 2020-08-05 20:40:49 +09:00
Mukund Thakur 8fd4f5490f
HADOOP-17131. Refactor S3A Listing code for better isolation. (#2148)
Contributed by Mukund Thakur.
2020-08-04 16:00:02 +01:00
Akira Ajisaka c40cbc57fa
HADOOP-17091. [JDK11] Fix Javadoc errors (#2098) 2020-08-03 10:46:51 +09:00
bilaharith a7fda2e38f
HADOOP-17137. ABFS: Makes the test cases in ITestAbfsNetworkStatistics agnostic
- Contributed by Bilahari T H
2020-07-31 12:27:57 -07:00
Mehakmeet Singh 48a7c5b6ba
HADOOP-17113. Adding ReadAhead Counters in ABFS (#2154)
Contributed by Mehakmeet Singh
2020-07-22 18:22:30 +01:00
Masatake Iwasaki 1b29c9bfee
HADOOP-17138. Fix spotbugs warnings surfaced after upgrade to 4.0.6. (#2155) 2020-07-22 13:40:20 +09:00
Sneha Vijayarajan d23cc9d85d
Hadoop 17132. ABFS: Fix Rename and Delete Idempotency check trigger
- Contributed by Sneha Vijayarajan
2020-07-21 09:22:38 -07:00
bilaharith b4b23ef0d1
HADOOP-17092. ABFS: Making AzureADAuthenticator.getToken() throw HttpException
- Contributed by Bilahari T H
2020-07-21 09:18:54 -07:00