Commit Graph

25074 Commits

Author SHA1 Message Date
PJ Fanning bd276092b0 MAPREDUCE-7411: use secure XML parsers in mapreduce modules (#4980)
Lockdown of parsers in hadoop-mapreduce.

Follow-on to HADOOP-18469. Add secure XML parser factories to XMLUtils

Contributed by P J Fanning
2022-10-26 11:04:29 +01:00
Viraj Jasani 36a0e818ec HDFS-16016. BPServiceActor to provide new thread to handle IBR (#2998)
Contributed by Viraj Jasani

(cherry picked from commit c1bf3cb0da)
2022-10-24 15:16:38 +09:00
Takanobu Asanuma 198bc444de
HDFS-16566 Erasure Coding: Recovery may causes excess replicas when busy DN exsits (#4252) (#5059)
(cherry picked from commit 9376b65989)

Co-authored-by: RuinanGu <57645247+RuinanGu@users.noreply.github.com>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Reviewed-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-10-22 13:14:04 +09:00
Steve Loughran 19f8e4f34d
YARN-11330. use secure XML parsers (#4981)
Move construction of XML parsers in YARN
modules to using the locked-down parser factory
of HADOOP-18469.

One exception: GpuDeviceInformationParser still supports DTD resolution;
all other features are disabled.

Contributed by P J Fanning
2022-10-21 14:16:22 +01:00
SevenAddSix 237814a9b3 HDFS-16480. Fix typo: indicies -> indices (#4020)
(cherry picked from commit 5eab9719cb)
2022-10-21 17:32:58 +09:00
Hui Fei 0c2234fd8e HDFS-15803. EC: Remove unnecessary method (getWeight) in StripedReconstructionInfo. Contributed by huhaiyang
(cherry picked from commit 66ecee333e)
2022-10-21 17:31:30 +09:00
FuzzingTeam 1f414ab847
HADOOP-18471. Fixed ArrayIndexOutOfBoundsException in DefaultStringifier (#4957)
Contributed by FuzzingTeam
2022-10-20 18:14:24 +01:00
Steve Loughran 75b04010a2
HDFS-16795. Use secure XML parsers (#4979)
Move construction of XML parsers in HDFS
modules to using the locked-down parser factory
of HADOOP-18469.

Contributed by P J Fanning
2022-10-20 17:48:58 +01:00
Steve Loughran 5b7cbe2075
HADOOP-18156. Address JavaDoc warnings in classes like MarkerTool, S3ObjectAttributes, etc (#4965)
Contributed by Ankit Saurabh
2022-10-20 17:46:46 +01:00
PJ Fanning ea851c5e4a
HADOOP-15983. Use jersey-json that is built to use jackson2 ((#3988)
Moves from com.sun.jersey 1.19 to the artifact
com.github.pjfanning:jersey-json:1.20

This allows jackson 1 to be removed from the classpath.

Contains

* HADOOP-16908. Prune Jackson 1 from the codebase and restrict
   its usage for future
* HADOOP-18219. Fix shaded client test failure

These are needed for the HADOOP-15983 changes to build.

Contributed by PJ Fanning.
2022-10-20 17:37:56 +01:00
Daniel Carl Jones 2778bc8d90
HADOOP-18465. Fix S3A SSE test skip when encryption is disabled (#4925)
Contributed by Daniel Carl Jones
2022-10-19 16:02:36 +01:00
Daniel Carl Jones c30b2f0b8c
HADOOP-18304. Improve user-facing S3A committers documentation (#4478)
Contributed by: Daniel Carl Jones
2022-10-19 13:08:27 +01:00
Steve Loughran 7a18ceb269
HADOOP-18476. Abfs and S3A FileContext bindings to close wrapped filesystems in finalizer (#4966)
This is to try and close the underlying filesystems when the FileContext APIs are used.
Without this, threads may be leaked

Contributed by Steve Loughran
2022-10-18 15:28:55 +01:00
Hexiaoqiao 84c7fd909b
HADOOP-18497. Upgrade commons-text version to 1.10.0 to fix CVE-2022-42889. (#5037).
Contributed by PJ Fanning.
2022-10-18 15:05:08 +01:00
slfan1989 2e3f91bdf5
HADOOP-18360. Update commons-csv from 1.0 to 1.9.0. (#4928). Contributed by fanshilun.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-10-17 10:23:13 +05:30
PJ Fanning 96d4b9e6a7
HADOOP-18493: upgrade jackson-databind to 2.12.7.1 (#5011). Contributed by PJ Fanning.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-10-17 10:04:21 +05:30
Steve Loughran cd856b7195
HADOOP-17563. Upgrade BouncyCastle to 1.68 (#3980) (#5015)
Addresses CVE-2020-15522 and CVE-2020-26939.

This can break builds with older maven shade plugins or
other code using asm.jar which is not aware of recent java bytecodes
and/or multi-release JARs. fix: use a later version of asm.jar

Contributed by PJ Fanning
2022-10-15 15:09:05 +01:00
ahmarsuhail 08760fc4c1
HADOOP-18481. AWS v2 SDK upgrade log to not about standard AWS Credential Providers. (#4973)
The AWS SDKV2 upgrade log no longer warns about instantiation
of the v1 SDK credential providers which are commonly used in
s3a configurations:

* com.amazonaws.auth.EnvironmentVariableCredentialsProvider
* com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper
* com.amazonaws.auth.InstanceProfileCredentialsProvider

When the hadoop-aws module moves to the v2 SDK, references to these
credential providers will be rewritten to their v2 equivalents.

Follow-on to HADOOP-18382. "Upgrade AWS SDK to V2 - Prerequisites"

Contributed by Ahmar Suhail
2022-10-14 11:46:14 +01:00
ahmarsuhail 47c1c8eddc
HADOOP-18382. AWS SDK v2 upgrade prerequisites (#4698)
This patch prepares the hadoop-aws module for a future
migration to using the v2 AWS SDK (HADOOP-18073)

That upgrade will be incompatible; this patch prepares
for it:
-marks some credential providers and other
 classes and methods as @deprecated.
-updates site documentation
-reduces the visibility of the s3 client;
 other than for testing, it is kept private to
 the S3AFileSystem class.
-logs some warnings when deprecated APIs are used.

The warning messages are printed only once
per JVM's life. To disable them, set the
log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade
to ERROR

Contributed by Ahmar Suhail
2022-10-14 11:45:43 +01:00
monthonk 52eca61a3e
HADOOP-18292. Fix s3 select tests when running against unsupported storage class (#4489)
Follow-on from HADOOP-12020.

Contributed by Monthon Klongklaew
2022-10-13 13:37:35 +01:00
belugabehr 6253bf72b6
HADOOP-17779: Lock File System Creator Semaphore Uninterruptibly (#3158)
Contributed by David Mollitor.
2022-10-11 13:07:42 +01:00
Xing Lin 760144f135
HDFS-16628. RBF: Correct target directory when move to trash for kerberos login user. (#4974)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-10-11 16:14:39 +09:00
Ashutosh Gupta 6847ec0647
HADOOP-11245. Update NFS gateway to use Netty4 (#2832) (#4997)
Reviewed-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org>

Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
2022-10-11 05:27:43 +08:00
Mukund Thakur 77cb778a44
HADOOP-18460. checkIfVectoredIOStopped before populating the buffers (#4986)
Contributed by Mukund Thakur
2022-10-10 11:18:22 +01:00
Steve Loughran 80525615e5
HADOOP-18480. Upgrade aws sdk to 1.12.316 (#4972)
Contributed by Steve Loughran
2022-10-10 10:29:41 +01:00
Steve Loughran e360e7620c
HADOOP-18468: Upgrade jettison to 1.5.1 to fix CVE-2022-40149 (#4937)
Contributed by PJ Fanning
2022-10-10 10:05:39 +01:00
Xing Lin 7d7f7a9e9b
HDFS-16024. RBF: Rename data to the Trash should be based on src location (#4962)
(cherry picked from commit e18d806212)

Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-10-10 00:33:48 +09:00
Steve Loughran 61e1603750
HADOOP-18401. No ARM binaries in branch-3.3.x releases. (#4953)
Fix the branch-3.3 docker image and create-release scripts to work on arm 64 and macbook m1

Contributed by Ayush Saxena and Steve Loughran
2022-10-07 15:58:51 +01:00
Steve Loughran c70b8709cc
HADOOP-18442. Remove openstack support (#4855)
The swift:// connector for openstack support has been removed.
The hadoop-openstack jar remains, only now it is empty of code. 
This is to ensure that projects which declare the JAR a dependency
will still have successful builds.

Contributed by Steve Loughran
2022-10-07 12:03:08 +01:00
Steve Loughran 80781306dd
HADOOP-18469. Add secure XML parser factories to XMLUtils (#4940)
Add to XMLUtils a set of methods to create secure XML Parsers/transformers,
locking down DTD, schema, XXE exposure.

Use these wherever XML parsers are created.

Contributed by PJ Fanning
2022-10-07 10:47:55 +01:00
Ashutosh Gupta 725cd90712 MAPREDUCE-7370. Parallelize MultipleOutputs#close call (#4248). Contributed by Ashutosh Gupta.
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit 062c50db6b)
2022-10-06 23:14:38 +00:00
Ashutosh Gupta 1c3bf42ad0
YARN-11303. Upgrade jquery ui to 1.13.2 to mitigate CVE-2022-31160 (#4895)
Contributed by Ashutosh Gupta
2022-10-05 12:09:11 +01:00
Mukund Thakur 0d772b353f HADOOP-18463. Add an integration test to process data asynchronously during vectored read. (#4921)
part of HADOOP-18103.

Contributed by: Mukund Thakur
2022-09-28 15:38:41 -05:00
Mukund Thakur bbe841e601 HADOOP-18347. S3A Vectored IO to use bounded thread pool. (#4918)
part of HADOOP-18103.

Also introducing a config fs.s3a.vectored.active.ranged.reads
to configure the maximum number of number of range reads a
single input stream can have active (downloading, or queued)
to the central FileSystem instance's pool of queued operations.
This stops a single stream overloading the shared thread pool.

Contributed by: Mukund Thakur
 Conflicts:
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
2022-09-28 15:34:31 -05:00
Mehakmeet Singh e5a566c91f
HADOOP-18416. fix ITestS3AIOStatisticsContext test failure (#4931)
Follow on to HADOOP-17461.

Contributed by: Mehakmeet Singh
2022-09-28 14:17:56 +05:30
Ashutosh Gupta dea018ef23
HDFS-16766. XML External Entity (XXE) attacks can occur while processing XML received from an untrusted source (#4886)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit d9f435f6ac)
2022-09-27 15:44:58 +09:00
Ashutosh Gupta 51605f9dcc
HADOOP-18443. Upgrade snakeyaml to 1.32 (#4873)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-09-25 23:50:46 +09:00
Xing Lin f1c1ad52c5 HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash (#4869)
* HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash

Signed-off-by: Xing Lin <xinglin@linkedin.com>
2022-09-23 11:06:23 -07:00
Steve Loughran af0a6d7987
HADOOP-18456. NullPointerException in ObjectListingIterator. (#4909)
This problem surfaced in impala integration tests
   IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
after the change
  HADOOP-17461. Add thread-level IOStatistics Context
The actual GC race condition came with
 HADOOP-18091. S3A auditing leaks memory through ThreadLocal references

The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create()
where a strong reference to the new value is kept in a local variable
*and referred to later* so that the JVM will not GC it.

Along with the fix, extra assertions ensure that if the problem is not fixed,
applications will fail faster/more meaningfully.

Contributed by Steve Loughran.
2022-09-23 09:57:49 +01:00
Kidd5368 ceec19e61a HDFS-16776 Erasure Coding: The length of targets should be checked when DN gets a reconstruction task (#4901)
(cherry picked from commit 9a29075f91)
2022-09-23 12:29:39 +09:00
PJ Fanning d66dea300e
HADOOP-18341: upgrade commons-configuration2 to 2.8.0 and commons-text to 1.9 (#4916) 2022-09-22 10:44:27 +09:00
Ashutosh Gupta 683fa264ee
HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails (#4896)
The patch provides detailed diagnostics of file creation failure in LocalDirAllocator.

Contributed by: Ashutosh Gupta
2022-09-21 11:54:47 +05:30
Ashutosh Gupta 3af155ceeb HADOOP-18400. Fix file split duplicating records from a succeeding split when reading BZip2 text files (#4732)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 30c36ef25a)
2022-09-19 13:45:47 +09:00
Steve Vaughan 357c83db94
HDFS-16686. GetJournalEditServlet fails to authorize valid Kerberos request (#4724) (#4794) 2022-09-13 10:50:23 -07:00
Ashutosh Gupta 2532eca013
YARN-11241. Add uncleaning option for local app log file with log-aggregation enabled (#4703)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 65a027b112)
2022-09-12 23:33:10 +09:00
Mukund Thakur c9d6605a59 HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled. (#4862)
part of HADOOP-18103.

While merging the ranges in CheckSumFs, they are rounded up based on the
value of checksum bytes size which leads to some ranges crossing the EOF
thus they need to be fixed else it will cause EOFException during actual reads.

Contributed By: Mukund Thakur
2022-09-09 11:17:32 -05:00
Sumangala Patki 2e4c5ca88f
HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3699)
Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute

Contributed by Sumangala Patki
2022-09-06 11:34:55 +01:00
sreeb-msft 5f3bc4340e
HADOOP-18408. ABFS: ITestAbfsManifestCommitProtocol fails on nonHNS configuration (#4758)
ITestAbfsManifestCommitProtocol  to set requireRenameResilience to false for nonHNS configuration

Contributed by Sree Bhattacharyya
2022-09-02 12:34:43 +01:00
monthonk 9dffa65021
HADOOP-18339. S3A storage class option only picked up when buffering writes to disk. (#4669)
Follow-up to HADOOP-12020 Support configuration of different S3 storage classes;
S3 storage class is now set when buffering to heap/bytebuffers, and when
creating directory markers

Contributed by Monthon Klongklaew
2022-09-01 18:15:48 +01:00
Steve Vaughan 3a6c8ff8bb
HDFS-16755. TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host resolution (#4833)
Use ".invalid" domain from IETF RFC 2606 to ensure that the host doesn't resolve.

Contributed by Steve Vaughan Jr
2022-09-01 14:01:26 +01:00