24937 Commits

Author SHA1 Message Date
Steve Loughran 1cc83f0f45
MAPREDUCE-7341. Add an intermediate manifest committer for Azure and GCS
This is a mapreduce/spark output committer optimized for
performance and correctness on Azure ADLS Gen 2 storage
(via the abfs connector) and Google Cloud Storage
(via the external gcs connector library).

* It is safe to use with HDFS; however, it has not been optimized
for that use.
* It is *not* safe for use with S3, and will fail if an attempt
is made to do so.
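The idea behind a manifest committer can be pictured with a toy model. This is a sketch under stated assumptions, not the committer's real classes or on-disk format: the in-memory `fs` map stands in for an ABFS or GCS store, and `TaskManifest`, `commitTask`, `commitJob` and the `/_temporary/attempt_N` layout are all hypothetical. Task commit only records which files should move where; job commit then applies every recorded file rename.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ManifestCommitSketch {

    /** Toy in-memory store: path -> contents. Stands in for an ABFS or GCS filesystem. */
    static final Map<String, String> fs = new TreeMap<>();

    /** Hypothetical manifest: the (source, destination) renames one task attempt needs. */
    static final class TaskManifest {
        final List<String[]> renames = new ArrayList<>();
    }

    /** Task commit: list the attempt directory and record renames; nothing is moved yet. */
    static TaskManifest commitTask(String attemptDir, String outputDir) {
        TaskManifest manifest = new TaskManifest();
        for (String path : fs.keySet()) {
            if (path.startsWith(attemptDir + "/")) {
                String relative = path.substring(attemptDir.length() + 1);
                manifest.renames.add(new String[] {path, outputDir + "/" + relative});
            }
        }
        return manifest;
    }

    /** Job commit: apply every rename from every committed task manifest. */
    static void commitJob(List<TaskManifest> manifests) {
        for (TaskManifest manifest : manifests) {
            for (String[] rename : manifest.renames) {
                // Single-file rename: remove from the source path, write to the destination.
                fs.put(rename[1], fs.remove(rename[0]));
            }
        }
    }

    public static void main(String[] args) {
        String attempt0 = "/out/_temporary/attempt_0";
        String attempt1 = "/out/_temporary/attempt_1";
        fs.put(attempt0 + "/part-0000", "a");
        fs.put(attempt1 + "/part-0001", "b");
        List<TaskManifest> manifests = new ArrayList<>();
        manifests.add(commitTask(attempt0, "/out"));
        manifests.add(commitTask(attempt1, "/out"));
        commitJob(manifests);
        System.out.println(fs);
    }
}
```

The point the toy tries to capture is that task commit writes a small list of pending renames instead of renaming a whole directory, so the file moves are gathered into one job-commit pass.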

Contributed by Steve Loughran

Change-Id: I6f3502e79c578b9fd1a8c1485f826784b5421fca
2022-03-17 11:46:41 +00:00
Steve Loughran e06ed88012
HADOOP-18162. hadoop-common support for MAPREDUCE-7341 Manifest Committer
* New statistic names in StoreStatisticNames
  (for joint use with s3a committers)
* Improvements to IOStatistics implementation classes
* RateLimiting wrapper around Guava's RateLimiter
* S3A committer Tasks moved over as TaskPool and
  added support for RemoteIterator
* JsonSerialization.load() now fails fast if the source does not exist

+ tests.
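Hadoop's `RemoteIterator` differs from `java.util.Iterator` in that `hasNext()` and `next()` may throw `IOException`. The sketch below shows, under that assumption, how a task-pool style helper might drain such an iterator and run a task on each item in parallel. `RemoteIter`, `Task`, `fromList` and `runAll` are hypothetical stand-ins, not the actual `TaskPool` API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskPoolSketch {

    /** Same shape as Hadoop's RemoteIterator: both methods may throw IOException. */
    interface RemoteIter<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** A unit of work that may fail with an IOException. */
    interface Task<T> {
        void run(T item) throws IOException;
    }

    /** Adapt an ordinary list to the RemoteIter interface (for the demo only). */
    static <T> RemoteIter<T> fromList(List<T> list) {
        Iterator<T> it = list.iterator();
        return new RemoteIter<T>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public T next() { return it.next(); }
        };
    }

    /** Submit one task per item, wait for all, and rethrow the first failure found. */
    static <T> void runAll(RemoteIter<T> items, Task<T> task, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            while (items.hasNext()) {
                T item = items.next();
                // Callable form (returns a value) so the task's IOException is captured.
                futures.add(pool.submit(() -> {
                    task.run(item);
                    return null;
                }));
            }
            for (Future<?> f : futures) {   // checked in submission order
                try {
                    f.get();
                } catch (ExecutionException e) {
                    throw new IOException("task failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger total = new AtomicInteger();
        runAll(fromList(Arrays.asList(1, 2, 3, 4)), item -> total.addAndGet(item), 2);
        System.out.println(total.get());
    }
}
```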

This commit is a prerequisite for the main MAPREDUCE-7341 Manifest Committer
patch.

Contributed by Steve Loughran

Change-Id: Ia92e2ab5083ac3d8d3d713a4d9cb3e9e0278f654
2022-03-17 11:46:11 +00:00
Viraj Jasani 712d9bece8
HDFS-16502. Reconfigure Block Invalidate limit (#4064)
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
(cherry picked from commit 1c0bc35305)
2022-03-16 09:40:59 +08:00
Owen O'Malley c52b97d084 HDFS-16495: RBF should prepend the client IP rather than append it.
Fixes #4054

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-03-14 10:30:29 -07:00
Takanobu Asanuma 52fb9d7ce2 HADOOP-18014. CallerContext should not include some characters. (#3698)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Reviewed-by: Mingliang Liu <liuml07@apache.org>
Reviewed-by: Hui Fei <ferhui@apache.org>

Cherry-picked from 9c887e5b by Owen O'Malley
2022-03-14 10:29:37 -07:00
litao 496657c63f HDFS-16310. RBF: Add client port to CallerContext for Router (#3635)
Cherry-picked from 5b05068f by Owen O'Malley
2022-03-14 10:29:30 -07:00
litao 0029f22d7d HADOOP-18003. Add a method appendIfAbsent for CallerContext (#3644)
Cherry-picked from 573b358f by Owen O'Malley
2022-03-14 10:29:23 -07:00
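The CallerContext commits in this range (adding items, `appendIfAbsent`, prepending rather than appending the client IP) all revolve around building a delimited context string. A minimal sketch follows; `ContextBuilder` is hypothetical, and the `key:value` pairs joined by `,` are an assumption about the format, not a description of Hadoop's `CallerContext.Builder`.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextBuilderSketch {

    /** Hypothetical builder for a "key:value,key:value" caller-context string. */
    static final class ContextBuilder {
        private final List<String> items = new ArrayList<>();

        ContextBuilder(String baseContext) {
            items.add(baseContext);
        }

        /** Always append the pair, even if the key is already present. */
        ContextBuilder append(String key, String value) {
            items.add(key + ":" + value);
            return this;
        }

        /** Append only if no existing item already starts with "key:". */
        ContextBuilder appendIfAbsent(String key, String value) {
            for (String item : items) {
                if (item.startsWith(key + ":")) {
                    return this;
                }
            }
            return append(key, value);
        }

        /** Insert the pair at the front of the context instead of the end. */
        ContextBuilder prepend(String key, String value) {
            items.add(0, key + ":" + value);
            return this;
        }

        String build() {
            return String.join(",", items);
        }
    }

    public static void main(String[] args) {
        String ctx = new ContextBuilder("job_1")
            .appendIfAbsent("clientPort", "50010")
            .appendIfAbsent("clientPort", "50011")   // ignored: key already present
            .prepend("clientIp", "10.0.0.1")
            .build();
        System.out.println(ctx);  // clientIp:10.0.0.1,job_1,clientPort:50010
    }
}
```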
litao f9d40ed7b7 HDFS-16266. Add remote port information to HDFS audit log (#3538)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Cherry-picked from 359b03c8 by Owen O'Malley
2022-03-14 10:29:11 -07:00
Hui Fei 2479d4ab6c HDFS-15630. RBF: Fix wrong client IP info in CallerContext when requests mount points with multi-destinations. Contributed by Chengwei Wang
Cherry-picked from 264c948e by Owen O'Malley
2022-03-14 10:29:04 -07:00
Hui Fei 8e129e5b8d HDFS-13293. RBF: The RouterRPCServer should transfer client IP via CallerContext to NamenodeRpcServer (#2363)
Cherry-picked from 518a212c by Owen O'Malley
2022-03-14 10:28:55 -07:00
Fei Hui 5a38ed2f22 HADOOP-17276. Extend CallerContext to make it include many items (#2327)
Cherry-picked from d0d10f7e by Owen O'Malley
2022-03-14 10:28:38 -07:00
Wei-Chiu Chuang 743db6e7b4
HADOOP-18155. Refactor tests in TestFileUtil (#4063)
(cherry picked from commit d0fa9b5775)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
	hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileUtil.java

Co-authored-by: Gautham B A <gautham.bangalore@gmail.com>
2022-03-14 09:40:17 +09:00
Thinker313 0801fe450e
HDFS-16428. Source path with storagePolicy causes wrong typeConsumed during rename (#3898). Contributed by lei w.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2022-03-12 19:45:13 +08:00
Mukund Thakur e0619b702a HADOOP-18112: Implement paging during multi object delete. (#4045)
Multi-object delete requests with more than 1000 keys are not supported
by S3 and fail with a MalformedXML error, so requests are now paged to
limit the number of keys in a single request. The page size can be
configured using "fs.s3a.bulk.delete.page.size".
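The paging described above amounts to splitting the key list into batches no larger than the configured page size. A minimal sketch; `pages` is a hypothetical helper, and in S3A each page would correspond to one multi-object delete request.

```java
import java.util.ArrayList;
import java.util.List;

public class DeletePagingSketch {

    /** Split keys into consecutive pages of at most pageSize entries each. */
    static <T> List<List<T>> pages(List<T> keys, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive: " + pageSize);
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += pageSize) {
            int end = Math.min(start + pageSize, keys.size());
            // Copy the sub-list so each page is independent of the source list.
            result.add(new ArrayList<>(keys.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add("dir/file-" + i);
        }
        for (List<String> page : pages(keys, 1000)) {
            // Each page would be sent as one bulk delete request.
            System.out.println(page.size());   // 1000, 1000, then 500
        }
    }
}
```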

 Contributed By: Mukund Thakur
2022-03-11 13:16:51 +05:30
Mehakmeet Singh 909048d87d
HADOOP-18150. Fix ITestAuditManagerDisabled test in S3A. (#4044)
Contributed by Mehakmeet Singh

Change-Id: I25c10844e4ad64b1fd7af9a02018220a611c85e0
2022-03-03 18:46:28 +00:00
Tamas Domok d7c375da40 YARN-11076. Upgrade jQuery version in Yarn UI2. (#4046)
Change-Id: I3cb1677741df5a1978e83029443d4a2d5d7e3d7f
(cherry picked from commit 22fe79cee3)
2022-03-03 23:55:26 +09:00
Chao Sun b174aaed57 Make upstream aware of 3.3.2 release 2022-03-02 19:10:30 -08:00
Szilard Nemeth 856e483592 YARN-11022. Fix the documentation for max-parallel-apps in CS. Contributed by Tamas Domok 2022-03-02 16:13:35 +01:00
Szilard Nemeth 192f53283b YARN-10894. Follow up YARN-10237: fix the new test case in TestRMWebServicesCapacitySched. Contributed by Tamas Domok 2022-03-02 16:04:38 +01:00
Szilard Nemeth 3ef3c5a05b YARN-11033. isAbsoluteResource is not correct for dynamically created queues. Contributed by Tamas Domok 2022-03-02 14:45:31 +01:00
Szilard Nemeth f06f44b1c2 YARN-11014. YARN incorrectly validates maximum capacity resources on the validation API. Contributed by Benjamin Teke 2022-03-02 14:23:00 +01:00
Szilard Nemeth 935619a28c YARN-11075. Explicitly declare serialVersionUID in LogMutation class. Contributed by Benjamin Teke 2022-03-01 18:05:04 +01:00
Steve Loughran 36a50ba3e0
HADOOP-18075. ABFS: Fix failure caused by listFiles() in ITestAbfsRestOperationException (#4040)
Contributed by Sumangala Patki

Change-Id: I245c08dab050d59b90ac6fdcb4c03153db77be0b
2022-03-01 13:48:39 +00:00
sumangala-patki 0ed0375413
HADOOP-17862. ABFS: Fix unchecked cast compiler warning for AbfsListStatusRemoteIterator (#3331)
closes #3331

Contributed by Sumangala Patki

Change-Id: I6cca91c8bcc34052c5233035f14a576f23086067
2022-03-01 13:48:39 +00:00
sumangala-patki 5e109705ef
HADOOP-17765. ABFS: Use Unique File Paths in Tests. (#3153)
Contributed by Sumangala Patki

Change-Id: Ic8f34bf578069504f7a811a7729982b9c9f49729
2022-03-01 12:29:03 +00:00
litao 74f5f90615 HDFS-16397. Reconfig slow disk parameters for datanode (#3828)
(cherry picked from commit 6b07c851f3)
2022-02-26 02:22:28 +09:00
litao 9c0bdc5aea
HDFS-16371. Exclude slow disks when choosing volume (#3753) (#4031) 2022-02-26 02:21:07 +09:00
litao 5601d0848a HDFS-15854. Make some parameters configurable for SlowDiskTracker and SlowPeerTracker (#2718)
Authored-by: tomscut <litao@bigo.sg>
(cherry picked from commit 32353eb38a)
2022-02-25 11:19:27 +09:00
Owen O'Malley 1a3060d41e
HADOOP-18139: Allow configuration of zookeeper server principal.
Fixes #4024

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-02-24 15:15:13 -08:00
Sumangala Patki a1319e2404
HADOOP-18071. ABFS: Set driver global timeout for ITestAzureBlobFileSystemBasics (#3866)
Contributed by Sumangala Patki

Change-Id: I05f0cd1f0bd277b90f06a71345c46bfde48d7e7e
2022-02-23 21:30:39 +00:00
Ayush Saxena fa30224e95
HDFS-11041. Unable to unregister FsDatasetState MBean if DataNode is shutdown twice. Contributed by Wei-Chiu Chuang.
(cherry picked from commit e8cb2ae409)
2022-02-23 11:41:19 +08:00
Viraj Jasani d763c99707
HADOOP-18125. Utility to identify git commit / Jira fixVersion discrepancies for RC preparation (#3991)
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
(cherry picked from commit 697e5d4636)
2022-02-22 11:01:35 +08:00
Steve Loughran 94a0a04113
HADOOP-18136. Verify FileUtils.unTar() handling of missing .tar files.
Contributed by Steve Loughran

Change-Id: I3856afa821dbc8c2e3cb1cbe33793ec1734e2e24
2022-02-21 17:09:36 +00:00
PJ Fanning a302a19b48 HADOOP-18126. update junit 5 version due to build issues (#3993)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 5f6a294fab)
2022-02-17 14:07:57 +09:00
Chentao Yu d14a7c6ee5 HADOOP-18109. Ensure that default permissions of directories under internal ViewFS directories are the same as directories on target filesystems. Contributed by Chentao Yu. (3953)
(cherry picked from commit 19d90e62fb)
2022-02-15 16:48:13 -08:00
litao db67952f9f HDFS-16396. Reconfig slow peer parameters for datanode (#3827)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit 0c194f2157)
2022-02-16 09:45:07 +09:00
Takanobu Asanuma 4c57fb4d6b
HDFS-15745. Make DataNodePeerMetrics#LOW_THRESHOLD_MS and MIN_OUTLIER_DETECTION_NODES configurable. Contributed by Haibin Huang. (#3992)
(cherry picked from commit 1cd96e8dd8)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java

Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
2022-02-16 09:42:43 +09:00
Akira Ajisaka 352656999f
YARN-10788. TestCsiClient fails (#3989)
Create unix domain socket in java.io.tmpdir instead of
test.build.dir to avoid 'File name too long' error.
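The usual cause of a "File name too long" error when binding a Unix domain socket is that the path must fit in the fixed-size `sun_path` field of `sockaddr_un` (108 bytes on Linux, including the terminating NUL), and a deep build directory can exceed that. A sketch of the check, assuming the Linux limit; `fitsInSunPath`, `socketPath` and the example paths are hypothetical.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

public class SocketPathSketch {

    /** Assumed Linux sockaddr_un.sun_path size; one byte is reserved for the trailing NUL. */
    static final int SUN_PATH_SIZE = 108;

    static boolean fitsInSunPath(String path) {
        return path.getBytes(StandardCharsets.UTF_8).length < SUN_PATH_SIZE;
    }

    /** Hypothetical helper: place the socket under java.io.tmpdir instead of a deep build dir. */
    static String socketPath(String fileName) {
        return new File(System.getProperty("java.io.tmpdir"), fileName).getPath();
    }

    public static void main(String[] args) {
        StringBuilder deep = new StringBuilder("/home/jenkins/workspace");
        for (int i = 0; i < 6; i++) {
            deep.append("/hadoop-yarn-project/target");
        }
        String deepPath = deep + "/csi.sock";
        String tmpPath = socketPath("csi.sock");
        System.out.println(deepPath.length() + " bytes, fits=" + fitsInSunPath(deepPath));
        System.out.println(tmpPath + " fits=" + fitsInSunPath(tmpPath));
    }
}
```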

Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit 7fd90cdcbe)
2022-02-15 01:14:31 +09:00
GuoPhilipse 7512714475
HDFS-16449. Fix hadoop web site release notes and changelog not available (#3967)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit b68964336d)
2022-02-14 05:40:16 +09:00
luoyuan3471 752a7b6d49
HADOOP-18044. Hadoop - Upgrade to jQuery 3.6.0 (#3791)
Co-authored-by: luoyuan <luoyuan@shopee.com>
(cherry picked from commit e2d620192a)
2022-02-11 23:18:25 +08:00
daimin 9071c9646c
Fix thread safety of EC decoding during concurrent preads (#3881)
(cherry picked from commit 0e74f1e467)
2022-02-11 10:20:45 +08:00
Ayush Saxena 5b47b9f360
HADOOP-18096. Distcp: Sync moves filtered file to home directory rather than deleting. (#3940). Contributed by Ayush Saxena.
Reviewed-by: Steve Loughran <stevel@apache.org>
Reviewed-by: stack <stack@apache.org>
2022-02-11 02:05:14 +05:30
Steve Loughran 088684ec60
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references (#3930)
Adds a new map type, WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
that more closely resembles a thread-local type, as it is a
map of thread ID to value.

Construct it with a factory method and an optional callback
for notification on loss and regeneration.

  WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
      new WeakReferenceThreadMap<>(
          (k) -> getUnbondedSpan(),
          this::noteSpanReferenceLost);

This is used in ActiveAuditManagerS3A for span tracking.
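The behaviour described above (weakly held values, a factory that regenerates a lost value, an optional callback on loss) can be sketched in a few lines. `WeakValueMap` and `WeakThreadMap` are hypothetical, simplified classes, not Hadoop's `WeakReferenceMap`; among other things they never prune stale entries.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

public class WeakValueMapSketch {

    /** Hypothetical map holding its values through weak references. */
    static class WeakValueMap<K, V> {
        private final Map<K, WeakReference<V>> map = new HashMap<>();
        private final Function<K, V> factory;
        private final Consumer<K> onReferenceLost;   // optional; may be null

        WeakValueMap(Function<K, V> factory, Consumer<K> onReferenceLost) {
            this.factory = factory;
            this.onReferenceLost = onReferenceLost;
        }

        /** Return the live value for key, creating or regenerating it if needed. */
        synchronized V get(K key) {
            WeakReference<V> ref = map.get(key);
            V value = (ref == null) ? null : ref.get();
            if (value == null) {
                if (ref != null && onReferenceLost != null) {
                    // An entry existed but its value has been garbage collected.
                    onReferenceLost.accept(key);
                }
                value = factory.apply(key);
                map.put(key, new WeakReference<>(value));
            }
            return value;
        }
    }

    /** Thread-local style variant: the key is the current thread's ID. */
    static class WeakThreadMap<V> extends WeakValueMap<Long, V> {
        WeakThreadMap(Function<Long, V> factory, Consumer<Long> onReferenceLost) {
            super(factory, onReferenceLost);
        }

        V getForCurrentThread() {
            return get(Thread.currentThread().getId());
        }
    }

    public static void main(String[] args) {
        WeakThreadMap<StringBuilder> spans = new WeakThreadMap<>(
            id -> new StringBuilder("span-for-thread-" + id),
            id -> System.out.println("reference lost for thread " + id));
        StringBuilder first = spans.getForCurrentThread();
        StringBuilder second = spans.getForCurrentThread();
        // Prints true: 'first' keeps the value strongly reachable, so it is not collected.
        System.out.println(first == second);
    }
}
```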

Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.

Contributed by Steve Loughran.

Change-Id: Ibf7bb082fd47298f7ebf46d92f56e80ca9b2aaf8
2022-02-10 12:33:40 +00:00
Joey Krabacher 84de16028d
HADOOP-18114. Documentation correction in assumed_roles.md (#3949)
Fixes typo in hadoop-aws/assumed_roles.md

Contributed by Joey Krabacher

Change-Id: I2b77bd7793ae0433196b77042d5f400d0bcbe745
2022-02-09 10:47:24 +00:00
singer-bin ce7cabb771
HDFS-16437 ReverseXML processor doesn't accept XML files without the … (#3926)
(cherry picked from commit 125e3b6160)
2022-02-06 13:36:57 +08:00
daimin 709e617a84
HDFS-16403. Improve FUSE IO performance by supporting FUSE parameter max_background (#3842)
Reviewed-by: Istvan Fajth <pifta@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
(cherry picked from commit d69938994e)
2022-02-06 13:06:35 +08:00
Abhishek Das 8b03514eaf HADOOP-18100: Change scope of inner classes in InodeTree to make them accessible outside package
Fixes #3950

Signed-off-by: Owen O'Malley <omalley@apache.org>

Cherry-picked from 3684c7f6 by Owen O'Malley
2022-02-04 12:13:10 -08:00
Petre Bogdan Stolojan 87ff57765a
HADOOP-18085. S3 SDK Upgrade causes AccessPoint ARN endpoint mistranslation (#3902)
Part of HADOOP-17198. Support S3 Access Points.

HADOOP-18068. "upgrade AWS SDK to 1.12.132" broke the access point endpoint
translation.

Correct endpoints should start with "s3-accesspoint."; after the SDK upgrade they
started with "s3.accesspoint-", which breaks the tests and the SDK's region detection.
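For reference, AWS documents access point ARNs in the form `arn:aws:s3:REGION:ACCOUNT_ID:accesspoint/NAME` and access point hosts of the form `NAME-ACCOUNT_ID.s3-accesspoint.REGION.amazonaws.com`. The sketch below derives the expected endpoint from an ARN so the `s3-accesspoint.` prefix can be checked. `parseArn`, `endpointFor` and `hostFor` are hypothetical helpers; they ignore partitions other than `aws` as well as dual-stack and FIPS endpoints, and the ARN used is made up.

```java
public class AccessPointEndpointSketch {

    /** Split arn:aws:s3:REGION:ACCOUNT:accesspoint/NAME into {region, account, name}. */
    static String[] parseArn(String arn) {
        String[] parts = arn.split(":", 6);
        if (parts.length != 6 || !parts[0].equals("arn") || !parts[2].equals("s3")
                || !parts[5].startsWith("accesspoint/")) {
            throw new IllegalArgumentException("Not an S3 access point ARN: " + arn);
        }
        String name = parts[5].substring("accesspoint/".length());
        return new String[] {parts[3], parts[4], name};
    }

    /** Regional endpoint: must start with "s3-accesspoint.", not "s3.accesspoint-". */
    static String endpointFor(String arn) {
        String[] p = parseArn(arn);
        return "s3-accesspoint." + p[0] + ".amazonaws.com";
    }

    /** Full host name for the access point: NAME-ACCOUNT.<regional endpoint>. */
    static String hostFor(String arn) {
        String[] p = parseArn(arn);
        return p[2] + "-" + p[1] + "." + endpointFor(arn);
    }

    public static void main(String[] args) {
        String arn = "arn:aws:s3:eu-west-1:123456789012:accesspoint/sample-ap";
        System.out.println(endpointFor(arn));  // s3-accesspoint.eu-west-1.amazonaws.com
        System.out.println(hostFor(arn));      // sample-ap-123456789012.s3-accesspoint.eu-west-1.amazonaws.com
    }
}
```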

Contributed by Bogdan Stolojan

Change-Id: I0c0181628ab803afc39036003777eaec79aa378c
2022-02-04 16:22:24 +00:00
Petre Bogdan Stolojan a8d7acf1a8
HADOOP-17951. Improve S3A checking of S3 Access Point existence (#3516)
Follow-on to HADOOP-17198. Support S3 Access Points

Contributed by Bogdan Stolojan

Change-Id: I0932476c64e1967eb0cb3e0f00060fac5d2bae72
2022-02-04 16:22:04 +00:00
Petre Bogdan Stolojan 664075f35d
HADOOP-17198. Support S3 Access Points (#3260)
Add support for S3 Access Points. This provides extra security, as it
ensures applications are not working with buckets belonging to third parties.

To bind a bucket to an access point, set the access point (ap) ARN,
which must be done for each specific bucket, using the pattern

fs.s3a.bucket.$BUCKET.accesspoint.arn = ARN

* The global/per-bucket option `fs.s3a.accesspoint.required` can be set
to mandate that buckets declare their access point.
* This is not compatible with S3Guard.
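The per-bucket binding can be pictured as a lookup over configuration keys. The sketch below uses a plain `Map` in place of Hadoop's `Configuration`; `bucketOption` and `resolveAccessPoint` are hypothetical, and treating `fs.s3a.bucket.BUCKET.accesspoint.required` as an override of the global `fs.s3a.accesspoint.required` is an assumption based on S3A's general per-bucket configuration convention.

```java
import java.util.HashMap;
import java.util.Map;

public class AccessPointConfigSketch {

    /** Look up a per-bucket override first, then fall back to the global fs.s3a.* key. */
    static String bucketOption(Map<String, String> conf, String bucket, String option) {
        String perBucket = conf.get("fs.s3a.bucket." + bucket + "." + option);
        return perBucket != null ? perBucket : conf.get("fs.s3a." + option);
    }

    /** Return the bucket's access point ARN, or null; fail if one is required but missing. */
    static String resolveAccessPoint(Map<String, String> conf, String bucket) {
        // The ARN itself is only ever set per bucket.
        String arn = conf.get("fs.s3a.bucket." + bucket + ".accesspoint.arn");
        boolean required = Boolean.parseBoolean(bucketOption(conf, bucket, "accesspoint.required"));
        if ((arn == null || arn.isEmpty()) && required) {
            throw new IllegalStateException("Access point required but not set for bucket " + bucket);
        }
        return arn;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.accesspoint.required", "true");
        conf.put("fs.s3a.bucket.sample-bucket.accesspoint.arn",
            "arn:aws:s3:eu-west-1:123456789012:accesspoint/sample-ap");
        System.out.println(resolveAccessPoint(conf, "sample-bucket"));
        try {
            resolveAccessPoint(conf, "other-bucket");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```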

Consult the documentation for further details.

Contributed by Bogdan Stolojan

(this commit contains the changes to TestArnResource from HADOOP-18068,
 "upgrade AWS SDK to 1.12.132" so that it works with the later SDK.)

Change-Id: I3fac213e52ca6ec1c813effb8496c353964b8e1b
2022-02-04 16:21:35 +00:00