Commit Graph

1504 Commits

Author SHA1 Message Date
Thomas Marquardt 717b835068
HADOOP-17397: ABFS: SAS Test updates for version and permission update
DETAILS:

    The previous commit for HADOOP-17397 was not the correct fix.  DelegationSASGenerator.getDelegationSAS
    should return sp=p for the set-permission and set-acl operations.  The tests have also been updated as
    follows:

    1. When saoid and suoid are not specified, skoid must have an RBAC role assignment which grants
       Microsoft.Storage/storageAccounts/blobServices/containers/blobs/modifyPermissions/action and sp=p
       to set permissions or set ACL.

    2. When saoid or suiod is specified, same as 1) but furthermore the saoid or suoid must be an owner of
       the file or directory in order for the operation to succeed.

    3. When saoid or suiod is specified, the ownership check is bypassed by also including 'o' (ownership)
       in the SAS permission (for example, sp=op).  Note that 'o' grants the saoid or suoid the ability to
       change the file or directory owner to themself, and they can also change the owning group. Generally
       speaking, if a trusted authorizer would like to give a user the ability to change the permissions or
       ACL, then that user should be the file or directory owner.

TEST RESULTS:

    namespace.enabled=true
    auth.type=SharedKey
    -------------------
    $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
    Tests run: 90, Failures: 0, Errors: 0, Skipped: 0
    Tests run: 462, Failures: 0, Errors: 0, Skipped: 24
    Tests run: 208, Failures: 0, Errors: 0, Skipped: 24

    namespace.enabled=true
    auth.type=OAuth
    -------------------
    $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
    Tests run: 90, Failures: 0, Errors: 0, Skipped: 0
    Tests run: 462, Failures: 0, Errors: 0, Skipped: 70
    Tests run: 208, Failures: 0, Errors: 0, Skipped: 141
2020-12-03 13:11:17 +00:00
Sneha Vijayarajan 142941b96e
HADOOP-17296. ABFS: Force reads to be always of buffer size.
Contributed by Sneha Vijayarajan.
2020-11-27 14:22:34 +00:00
Mukund Thakur 03b4e98971
HADOOP-17398. Skipping network I/O in S3A getFileStatus(/) breaks some tests (#2493)
Follow-on to HADOOP-17323.

Contributed by Mukund Thakur.
2020-11-26 20:25:32 +00:00
Steve Loughran 67dc0928c1
HADOOP-17385. ITestS3ADeleteCost.testDirMarkersFileCreation failure (#2473). Contributed by Steve Loughran
The addition of deprecated S3A configuration options in HADOOP-17318
triggered a reload of default (xml resource) configurations, which breaks
tests which fail if there's a per-bucket setting inconsistent with test
setup.

Creating an S3AFS instance before creating the Configuration() instance
for test runs gets that reload out the way before test setup takes
place.

Along with the fix, extra changes in the failing test suite to fail
fast when marker policy isn't as expected, and to log FS state better.

Rather than create and discard an instance, add a new static method
to S3AFS and invoke it in test setup. This forces the load

Change-Id: Id52b1c46912c6fedd2ae270e2b1eb2222a360329
2020-11-26 13:50:33 +01:00
Sneha Vijayarajan cf43a7eaae
HADOOP-17397. ABFS: SAS Test updates for version and permission update (#2492)
Contributed by Sneha Vijayarajan.
2020-11-26 10:21:01 +00:00
Sneha Vijayarajan 009ce4f02a
HADOOP-17396. ABFS: testRenameFileOverExistingFile fails (#2491)
Contributed by Sneha  Vijayarajan.
2020-11-26 10:11:25 +00:00
Steve Loughran ac7045b75f
HADOOP-17313. FileSystem.get to support slow-to-instantiate FS clients. (#2396)
This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".

This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.

The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.

If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.

Contributed by Steve Loughran.
2020-11-25 14:31:02 +00:00
bilaharith 3193d8c793
HADOOP-17311. ABFS: Logs should redact SAS signature (#2422)
Contributed by bilaharith.
2020-11-25 14:22:10 +00:00
Mukund Thakur 5fee95076b
HADOOP-17323. S3A getFileStatus("/") to skip IO (#2479)
Contributed by Mukund Thakur.
2020-11-24 11:06:56 +00:00
Steve Loughran 9b4faf2b51
HADOOP-17332. S3A MarkerTool -min and -max are inverted. (#2425)
This patch
* fixes the inversion
* adds a precondition check
* if the commands are supplied inverted, swaps them with a warning.
  This is to stop breaking any tests written to cope with the existing
  behavior.

Contributed by Steve Loughran
2020-11-23 20:49:42 +00:00
Steve Loughran 07b7d07388
HADOOP-17325. WASB Test Failures
Contributed by Ayush Saxena and Steve Loughran

Change-Id: I4bb76815bc1d11d1804dc67bafde68b6a995b974
2020-11-23 17:22:13 +00:00
Steve Loughran fb79be932c
HADOOP-17343. Upgrade AWS SDK to 1.11.901 (#2468)
Contributed by Steve Loughran.
2020-11-23 14:08:12 +00:00
Jungtaek Lim f3c629c27e
HADOOP-17388. AbstractS3ATokenIdentifier to issue date in UTC. (#2477)
Followup to HADOOP-17379.

Contributed by Jungtaek Lim.
2020-11-20 10:38:42 +00:00
Ahmed Hussein 07050339e0
HADOOP-17367. Add InetAddress api to ProxyUsers.authorize (#2449). Contributed by Daryn Sharp and Ahmed Hussein 2020-11-19 14:37:14 -06:00
Steve Loughran ce7827c82a
HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID. (#2399)
See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs

Contributed by Steve Loughran.

Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
2020-11-18 13:34:51 +00:00
Steve Loughran e3c08f285a
HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2310)
This fixes the S3Guard/Directory Marker Retention integration so that when
fs.s3a.directory.marker.retention=keep, failures during multipart delete
are handled correctly, as are incremental deletes during
directory tree operations.

In both cases, when a directory marker with children is deleted from
S3, the directory entry in S3Guard is not deleted, because it is still
critical to representing the structure of the store.

Contributed by Steve Loughran.

Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f
2020-11-18 12:18:11 +00:00
Jungtaek Lim a7b923c80c
HADOOP-17379. AbstractS3ATokenIdentifier to set issue date == now. (#2466)
Unless you explicitly set it, the issue date of a delegation token identifier is 0, which confuses spark renewal (SPARK-33440). This patch makes sure that all S3A DT identifiers have the current time as issue date, fixing the problem as far as S3A tokens are concerned.

Contributed by Jungtaek Lim.
2020-11-17 14:43:29 +00:00
Doroszlai, Attila dd85a90da6
HADOOP-17376. ITestS3AContractRename failing against stricter tests. (#2462)
Contributed by Attila Doroszlai.
2020-11-16 11:24:00 +00:00
jianghuazhu 375900049c
HDFS-15608.Reset the DistCp#CLEANUP variable definition. (#2351). Contributed by JiangHua Zhu.
Co-authored-by: zhujianghua <zhujianghua@zhujianghuadeMacBook-Pro.local>
2020-11-10 13:02:29 -08:00
Eric E Payne 0461a07c01 YARN-10475: Scale RM-NM heartbeat interval based on node utilization. Contributed by Jim Brennan (Jim_Brennan). 2020-11-02 16:55:06 +00:00
Yiqun Lin 15a5f53673 HDFS-15640. Add diff threshold to FedBalance. Contributed by Jinglun. 2020-10-27 10:41:10 +08:00
Anoop Sam John 7bdf165f62
HADOOP-17308. WASB PageBlobOutputStream.flush succeeds even when flush to storage fails (#2392)
Contributed by Anoop Sam John.
2020-10-23 10:51:19 +01:00
Mukund Thakur 7f8ef76c48
HADOOP-17305. Fix ITestCustomSigner to work with s3 compatible endpoints (#2395)
Contributed by Mukund Thakur
2020-10-21 13:01:13 +01:00
Aryan Gupta d60d5fe43d
HADOOP-17302. Upgrade to jQuery 3.5.1 in hadoop-sls. (#2379)
Co-authored-by: Aryan Gupta
2020-10-19 18:18:46 +05:30
Ayush Saxena 1e3a6efcef
HADOOP-17288. Use shaded guava from thirdparty. (#2342). Contributed by Ayush Saxena. 2020-10-17 12:01:18 +05:30
Adam Antal bd8cf7fd4c YARN-10448. SLS should set default user to handle SYNTH format. Contributed by zhuqi 2020-10-13 17:54:15 +02:00
Sneha Vijayarajan c4fff74cc5
HADOOP-17301. ABFS: read-ahead error reporting breaks buffer management (#2369)
Fixes read-ahead buffer management issues introduced by HADOOP-16852,
 "ABFS: Send error back to client for Read Ahead request failure".

Contributed by Sneha Vijayarajan
2020-10-13 16:30:34 +01:00
Dongjoon Hyun b92f72758b
HADOOP-17258. Magic S3Guard Committer to overwrite existing pendingSet file on task commit (#2371)
Contributed by Dongjoon Hyun and Steve Loughran

Change-Id: Ibaf8082e60eff5298ff4e6513edc386c5bae0274
2020-10-12 13:39:15 +01:00
Steve Loughran f83e07a20f HADOOP-17293. S3A to always probe S3 in S3A getFileStatus on non-auth paths
This reverts changes in HADOOP-13230 to use S3Guard TTL in choosing when
to issue a HEAD request; fixing tests to compensate.

New org.apache.hadoop.fs.s3a.performance.OperationCost cost,
S3GUARD_NONAUTH_FILE_STATUS_PROBE for use in cost tests.

Contributed by Steve Loughran.

Change-Id: I418d55d2d2562a48b2a14ec7dee369db49b4e29e
2020-10-08 15:35:57 +01:00
Mukund Thakur 82522d60fb
HADOOP-17281 Implement FileSystem.listStatusIterator() in S3AFileSystem (#2354)
Contains HADOOP-17300: FileSystem.DirListingIterator.next() call should 
return NoSuchElementException

Contributed by Mukund Thakur
2020-10-07 13:59:06 +01:00
Ikko Ashimine 4347a5c955
HADOOP-17294. Fix typos existance to existence (#2357) 2020-10-06 10:10:44 +09:00
Arpit Agarwal 18fa4397e6
MAPREDUCE-7298. Distcp doesn't close the job after the job is completed. Contributed by Aasha Medhi.
Change-Id: I63d249bbb18ccedaeee9f10123a78e32f9e54ed2
2020-10-02 08:29:55 -07:00
bilaharith 51598d8b1b
HADOOP-17183. ABFS: Enabling checkaccess on ABFS (#2331)
Contributed by Bilahari TH
2020-10-01 21:29:05 +01:00
Sneha Vijayarajan c3a90dd918
HADOOP-17279: ABFS: testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account.
Contributed by Sneha Vijayarajan

Testing:

namespace.enabled=false
auth.type=SharedKey
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
Tests run: 457, Failures: 0, Errors: 0, Skipped: 246
Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=SharedKey
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
Tests run: 457, Failures: 0, Errors: 0, Skipped: 33
Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=OAuth
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
Tests run: 457, Failures: 0, Errors: 0, Skipped: 74
Tests run: 207, Failures: 0, Errors: 0, Skipped: 140
2020-09-23 15:59:00 +00:00
Steve Loughran 7fae4133e0
HADOOP-17261. s3a rename() needs s3:deleteObjectVersion permission (#2303)
Contributed by Steve Loughran.
2020-09-22 17:22:04 +01:00
Mukund Thakur 83c7c2b4c4
HADOOP-17023. Tune S3AFileSystem.listStatus() (#2257)
S3AFileSystem.listStatus() is optimized for invocations
where the path supplied is a non-empty directory.
The number of S3 requests is significantly reduced, saving
time, money, and reducing the risk of S3 throttling.

Contributed by Mukund Thakur.
2020-09-21 17:20:16 +01:00
Sneha Vijayarajan e31a636e92
HADOOP-17215: Support for conditional overwrite.
Contributed by Sneha Vijayarajan

DETAILS:

    This change adds config key "fs.azure.enable.conditional.create.overwrite" with
    a default of true.  When enabled, if create(path, overwrite: true) is invoked
    and the file exists, the ABFS driver will first obtain its etag and then attempt
    to overwrite the file on the condition that the etag matches. The purpose of this
    is to mitigate the non-idempotency of this method.  Specifically, in the event of
    a network error or similar, the client will retry and this can result in the file
    being created more than once which may result in data loss.  In essense this is
    like a poor man's file handle, and will be addressed more thoroughly in the future
    when support for lease is added to ABFS.

TEST RESULTS:

    namespace.enabled=true
    auth.type=SharedKey
    -------------------
    $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
    Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
    Tests run: 457, Failures: 0, Errors: 0, Skipped: 42
    Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

    namespace.enabled=true
    auth.type=OAuth
    -------------------
    $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
    Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
    Tests run: 457, Failures: 0, Errors: 0, Skipped: 74
    Tests run: 207, Failures: 0, Errors: 0, Skipped: 140
2020-09-19 01:28:44 +00:00
ThomasMarquardt 0dc54d0247
HADOOP-17203: Revert HADOOP-17183. ABFS: Enabling checkaccess on ABFS
This reverts commit a2610e21ed.
2020-09-18 17:52:11 -07:00
Steve Loughran 958cab804e
Revert "HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280)"
This reverts commit 9960c01a25.

Change-Id: I820534c3292f2a343693d835f625488c325fb5d6
2020-09-11 18:07:49 +01:00
Steve Loughran 9960c01a25
HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280)
This changes directory tree deletion so that only files are incrementally deleted
from S3Guard after the objects are deleted; the directories are left alone
until metadataStore.deleteSubtree(path) is invoke.

This avoids directory tombstones being added above files/child directories,
which stop the treewalk and delete phase from working.

Also:

* Callback to delete objects splits files and dirs so that
any problems deleting the dirs doesn't trigger s3guard updates
* New statistic to measure #of objects deleted, alongside request count.
* Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers
  to clarify behavior.
* Test enhancements to replicate the failure and verify the fix

Contributed by Steve Loughran
2020-09-10 17:03:52 +01:00
bilaharith 85119267be
HADOOP-17166. ABFS: configure output stream thread pool (#2179)
Adds the options to control the size of the per-output-stream threadpool
when writing data through the abfs connector

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H
2020-09-09 16:41:36 +01:00
Mehakmeet Singh 0d855159f0
HADOOP-17229. No updation of bytes received counter value after response failure occurs in ABFS (#2264)
Contributed by Mehakmeet Singh
2020-09-08 10:14:23 +01:00
Mehakmeet Singh 84ed6adccc
HADOOP-17158. Test timeout for ITestAbfsInputStreamStatistics#testReadAheadCounters (#2272)
Contributed by: Mehakmeet Singh.
2020-09-08 10:11:06 +01:00
Steve Loughran 5346cc3263
HADOOP-17227. S3A Marker Tool tuning (#2254)
Contributed by Steve Loughran.
2020-09-04 14:58:03 +01:00
Mukund Thakur 139a43e98e
HADOOP-17167 ITestS3AEncryptionWithDefaultS3Settings failing (#2187)
Now skips ITestS3AEncryptionWithDefaultS3Settings.testEncryptionOverRename
when server side encryption is not set to sse:kms

Contributed by Mukund Thakur
2020-09-03 19:35:24 +01:00
Mehakmeet Singh d1c60a53f6
HADOOP-17194. Adding Context class for AbfsClient in ABFS (#2216)
Contributed by Mehakmeet Singh.
2020-08-27 11:27:00 +01:00
Mukund Thakur cc641534dc
HADOOP-17074. S3A Listing to be fully asynchronous. (#2207)
Contributed by Mukund Thakur.
2020-08-25 11:29:43 +01:00
bilaharith 64f36b9543
HADOOP-16915. ABFS: Ignoring the test ITestAzureBlobFileSystemRandomRead.testRandomReadPerformance
- Contributed by Bilahari T H
2020-08-24 12:00:55 -07:00
swamirishi 872c2909bd
HADOOP-17122: Preserving Directory Attributes in DistCp with Atomic Copy (#2133)
Contributed by Swaminathan Balachandran
2020-08-22 18:48:21 +01:00
Sneha Vijayarajan b367942fe4
Upgrade store REST API version to 2019-12-12
- Contributed by Sneha Vijayarajan
2020-08-17 10:17:18 -07:00
Steve Loughran 5092ea62ec HADOOP-13230. S3A to optionally retain directory markers.
This adds an option to disable "empty directory" marker deletion,
so avoid throttling and other scale problems.

This feature is *not* backwards compatible.
Consult the documentation and use with care.

Contributed by Steve Loughran.

Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958
2020-08-15 12:51:08 +01:00
Mukund Thakur 4a400d3193
HADOOP-17192. ITestS3AHugeFilesSSECDiskBlock failing (#2221)
Contributed by Mukund Thakur
2020-08-13 14:21:49 +01:00
Ayush Saxena 975b6024dd HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui. 2020-08-07 22:19:17 +05:30
bilaharith a2610e21ed
HADOOP-17183. ABFS: Enabling checkaccess on ABFS
- Contributed by Bilahari T H
2020-08-06 14:52:02 -07:00
bilaharith 3f73facd7b
HADOOP-17149. ABFS: Fixing the testcase ITestGetNameSpaceEnabled
- Contributed by Bilahari T H
2020-08-05 10:01:04 -07:00
bilaharith c566cabd62
HADOOP-17163. ABFS: Adding debug log for rename failures
- Contributed by Bilahari T H
2020-08-05 09:38:13 -07:00
Mukund Thakur ac697571a1
HADOOP-17186. Fixing javadoc in ListingOperationCallbacks (#2196) 2020-08-05 20:40:49 +09:00
Mukund Thakur 8fd4f5490f
HADOOP-17131. Refactor S3A Listing code for better isolation. (#2148)
Contributed by Mukund Thakur.
2020-08-04 16:00:02 +01:00
Akira Ajisaka c40cbc57fa
HADOOP-17091. [JDK11] Fix Javadoc errors (#2098) 2020-08-03 10:46:51 +09:00
bilaharith a7fda2e38f
HADOOP-17137. ABFS: Makes the test cases in ITestAbfsNetworkStatistics agnostic
- Contributed by Bilahari T H
2020-07-31 12:27:57 -07:00
Mehakmeet Singh 48a7c5b6ba
HADOOP-17113. Adding ReadAhead Counters in ABFS (#2154)
Contributed by Mehakmeet Singh
2020-07-22 18:22:30 +01:00
Masatake Iwasaki 1b29c9bfee
HADOOP-17138. Fix spotbugs warnings surfaced after upgrade to 4.0.6. (#2155) 2020-07-22 13:40:20 +09:00
Sneha Vijayarajan d23cc9d85d
Hadoop 17132. ABFS: Fix Rename and Delete Idempotency check trigger
- Contributed by Sneha Vijayarajan
2020-07-21 09:22:38 -07:00
bilaharith b4b23ef0d1
HADOOP-17092. ABFS: Making AzureADAuthenticator.getToken() throw HttpException
- Contributed by Bilahari T H
2020-07-21 09:18:54 -07:00
Mukund Thakur bb459d4dd6
HADOOP-17136. ITestS3ADirectoryPerformance.testListOperations failing (#2153)
A regression caused by HADOOP-17022: the reduction in LIST calls broken an assertion.

Contributed by Mukund Thakur
2020-07-20 16:58:50 +01:00
Steve Loughran 9f407bcc88
HADOOP-17107. hadoop-azure parallel tests not working on recent JDKs (#2118)
Contributed by Steve Loughran.
2020-07-20 10:51:26 +01:00
bilaharith 99655167f3
HADOOP-16682. ABFS: Removing unnecessary toString() invocations
- Contributed by Bilahari T H
2020-07-18 10:00:18 -07:00
Ayush Saxena 6bcb24d269 HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein. 2020-07-18 14:33:43 +05:30
Mehakmeet Singh 4083fd57b5
HADOOP-17129. Validating storage keys in ABFS correctly (#2141)
Contributed by Mehakmeet Singh
2020-07-16 17:29:37 +01:00
Mukund Thakur 4647a60430
HADOOP-17022. Tune S3AFileSystem.listFiles() API.
Contributed by Mukund Thakur.

Change-Id: I17f5cfdcd25670ce3ddb62c13378c7e2dc06ba52
2020-07-14 15:27:35 +01:00
Anoop Sam John 380e0f4506
HADOOP-16998. WASB : NativeAzureFsOutputStream#close() throwing IllegalArgumentException (#2073)
Contributed by Anoop Sam John.
2020-07-14 14:07:27 +01:00
jimmy-zuber-amzn 806d84b79c
HADOOP-17105. S3AFS - Do not attempt to resolve symlinks in globStatus (#2113)
Contributed by Jimmy Zuber.
2020-07-13 19:07:48 +01:00
Steve Loughran b9fa5e0182
HDFS-13934. Multipart uploaders to be created through FileSystem/FileContext.
Contributed by Steve Loughran.

Change-Id: Iebd34140c1a0aa71f44a3f4d0fee85f6bdf123a3
2020-07-13 13:30:02 +01:00
Sebastian Nagel 5b1ed2113b
HADOOP-17117 Fix typos in hadoop-aws documentation (#2127) 2020-07-09 00:03:15 +09:00
ishaniahuja d20109c171
HADOOP-17058. ABFS: Support for AppendBlob in Hadoop ABFS Driver
- Contributed by Ishani Ahuja
2020-07-04 13:25:14 -07:00
bilaharith e0cededfbd
HADOOP-17086. ABFS: Making the ListStatus response ignore unknown properties. (#2101)
Contributed by Bilahari T H.
2020-07-03 19:00:22 +01:00
Mehakmeet Singh 3b5c9a90c0
HADOOP-16961. ABFS: Adding metrics to AbfsInputStream (#2076)
Contributed by Mehakmeet Singh.
2020-07-03 11:41:35 +01:00
Yiqun Lin ff8bb67200 HDFS-15374. Add documentation for fedbalance tool. Contributed by Jinglun. 2020-07-01 14:18:18 +08:00
Yiqun Lin de2cb86260 HDFS-15410. Add separated config file hdfs-fedbalance-default.xml for fedbalance tool. Contributed by Jinglun. 2020-07-01 14:06:27 +08:00
Steve Loughran 4249c04d45
HADOOP-16798. S3A Committer thread pool shutdown problems. (#1963)
Contributed by Steve Loughran.

Fixes a condition which can cause job commit to fail if a task was
aborted < 60s before the job commit commenced: the task abort
will shut down the thread pool with a hard exit after 60s; the
job commit POST requests would be scheduled through the same pool,
so be interrupted and fail. At present the access is synchronized,
but presumably the executor shutdown code is calling wait() and releasing
locks.

Task abort is triggered from the AM when task attempts succeed but
there are still active speculative task attempts running. Thus it
only surfaces when speculation is enabled and the final tasks are
speculating, which, given they are the stragglers, is not unheard of.

Note: this problem has never been seen in production; it has surfaced
in the hadoop-aws tests on a heavily overloaded desktop
2020-06-30 10:44:51 +01:00
Thomas Marquardt 4b5b54c73f
HADOOP-17089: WASB: Update azure-storage-java SDK
Contributed by Thomas Marquardt

DETAILS: WASB depends on the Azure Storage Java SDK. There is a concurrency
bug in the Azure Storage Java SDK that can cause the results of a list blobs
operation to appear empty. This causes the Filesystem listStatus and similar
APIs to return empty results. This has been seen in Spark work loads when jobs
use more than one executor core.

See Azure/azure-storage-java#546 for details on the bug in the Azure Storage SDK.

TESTS: A new test was added to validate the fix. All tests are passing:

wasb:
mvn -T 1C -Dparallel-tests=wasb -Dscale -DtestsThreadCount=8 clean verify
Tests run: 248, Failures: 0, Errors: 0, Skipped: 11
Tests run: 651, Failures: 0, Errors: 0, Skipped: 65

abfs:
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 64, Failures: 0, Errors: 0, Skipped: 0
Tests run: 437, Failures: 0, Errors: 0, Skipped: 33
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24
2020-06-25 02:32:42 +00:00
Akira Ajisaka 201d734af3
HDFS-15428. Javadocs fails for hadoop-federation-balance. Contributed by Xieming Li. 2020-06-22 19:43:19 +09:00
Mehakmeet Singh 3472c3efc0
HADOOP-17065. Add Network Counters to ABFS (#2056)
Contributed by Mehakmeet Singh.
2020-06-19 14:03:49 +01:00
Yiqun Lin 9cbd76cc77 HDFS-15346. FedBalance tool implementation. Contributed by Jinglun. 2020-06-18 13:33:25 +08:00
Thomas Marquardt caf3995ac2
HADOOP-17076: ABFS: Delegation SAS Generator Updates
Contributed by Thomas Marquardt.

DETAILS:
1) The authentication version in the service has been updated from Dec19 to Feb20, so need to update the client.
2) Add support and test cases for getXattr and setXAttr.
3) Update DelegationSASGenerator and related to use Duration instead of int for time periods.
4) Cleanup DelegationSASGenerator switch/case statement that maps operations to permissions.
5) Cleanup SASGenerator classes to use String.equals instead of ==.

TESTS:
Added tests for getXAttr and setXAttr.

All tests are passing against my account in eastus2euap:

 $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
 Tests run: 76, Failures: 0, Errors: 0, Skipped: 0
 Tests run: 441, Failures: 0, Errors: 0, Skipped: 33
 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24
2020-06-18 02:07:08 +00:00
Steve Loughran ac5d899d40
HADOOP-17050 S3A to support additional token issuers
Contributed by Steve Loughran.

S3A delegation token providers will be asked for any additional
token issuers, an array can be returned,
each one will be asked for tokens when DelegationTokenIssuer collects
all the tokens for a filesystem.
2020-06-09 14:39:06 +01:00
Steve Loughran 40d63e02f0
HADOOP-16568. S3A FullCredentialsTokenBinding fails if local credentials are unset. (#1441)
Contributed by Steve Loughran.

Move the loading to deployUnbonded (where they are required) and add a safety check when a new DT is requested
2020-06-03 17:07:00 +01:00
Mehakmeet Singh 7f486f0258
HADOOP-17016. Adding Common Counters in ABFS (#1991).
Contributed by: Mehakmeet Singh.

Change-Id: Ib84e7a42f28e064df4c6204fcce33e573360bf42
2020-06-02 18:31:35 +01:00
Karthik Amarnath b2200a33a6
HDFS-15168: ABFS enhancement to translate AAD to Linux identities. (#1978) 2020-05-28 19:00:23 -07:00
Sneha Vijayarajan 4c5cd751e3
HADOOP-17053. ABFS: Fix Account-specific OAuth config setting parsing
Contributed by Sneha Vijayarajan
2020-05-27 13:56:09 -07:00
Sneha Vijayarajan 53b993e604
HADOOP-16852: Report read-ahead error back
Contributed by Sneha Vijayarajan
2020-05-27 13:51:42 -07:00
Sneha Vijayarajan 37b1b4799d
HADOOP-17054. ABFS: Fix test AbfsClient authentication instance
Contributed by Sneha Vijayarajan
2020-05-26 15:26:28 -07:00
Masatake Iwasaki 9685314633
HADOOP-17040. Fix intermittent failure of ITestBlockingThreadPoolExecutorService. (#2020) 2020-05-22 18:50:19 +09:00
bilaharith d2f7133c62
HADOOP-17004. Fixing a formatting issue
Contributed by Bilahari T H.
2020-05-20 11:51:48 -07:00
Mukund Thakur 29b19cd592
HADOOP-16900. Very large files can be truncated when written through the S3A FileSystem.
Contributed by Mukund Thakur and Steve Loughran.

This patch ensures that writes to S3A fail when more than 10,000 blocks are
written. That upper bound still exists. To write massive files, make sure
that the value of fs.s3a.multipart.size is set to a size which is large
enough to upload the files in fewer than 10,000 blocks.

Change-Id: Icec604e2a357ffd38d7ae7bc3f887ff55f2d721a
2020-05-20 13:42:25 +01:00
Masatake Iwasaki 0b7799bf6e
HADOOP-16586. ITestS3GuardFsck, others fails when run using a local metastore. (#1950) 2020-05-20 08:47:04 +09:00
Sneha Vijayarajan 8f78aeb250
Hadoop-17015. ABFS: Handling Rename and Delete idempotency
Contributed by Sneha Vijayarajan.
2020-05-19 12:30:07 -07:00
bilaharith bdbd59cfa0
HADOOP-17004. ABFS: Improve the ABFS driver documentation
Contributed by Bilahari T H.
2020-05-18 20:45:54 -07:00
Steve Loughran d08b9e94e3
Revert "HADOOP-14557. Document HADOOP-8143 (Change distcp to have -pb on by default)."
This reverts commit 44350fdf49.

It is related to the rollback of HADOOP-8143.

Change-Id: If48e3dd670c920ada702dc36461ff398fe9d35cc
2020-05-14 19:04:36 +01:00
Steve Loughran 4486220bb2
Revert "HADOOP-8143. Change distcp to have -pb on by default."
This reverts commit dd65eea74b.

Change-Id: I74180cf59d5bbad8c9f66cb331535addcbea863e
2020-05-14 19:03:56 +01:00