
1633 Commits

Author SHA1 Message Date
Viraj Jasani 7c20602b17
HDFS-16522. Set Http and Ipc ports for Datanodes in MiniDFSCluster (#4108)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-06 18:17:02 +09:00
litao 966b773a7c
HDFS-16527. Add global timeout rule for TestRouterDistCpProcedure (#4129)
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-06 14:34:24 +09:00
9uapaw 4b1a6bfb10 YARN-11102. Fix spotbugs error in hadoop-sls module. Contributed by Szilard Nemeth, Andras Gyori. 2022-04-01 18:24:37 +02:00
Szilard Nemeth 94031b729d YARN-11103. SLS cleanup after previously merged SLS refactor jiras. Contributed by Szilard Nemeth 2022-03-31 14:29:59 +02:00
Szilard Nemeth ab8c360620 YARN-10550. Decouple NM runner logic from SLSRunner. Contributed by Szilard Nemeth 2022-03-30 19:53:10 +02:00
9uapaw e386d6a661 YARN-10549. Decouple RM runner logic from SLSRunner. Contributed by Szilard Nemeth. 2022-03-29 09:58:27 +02:00
9uapaw adbaf48082 YARN-11100. Fix StackOverflowError in SLS scheduler event handling. Contributed by Szilard Nemeth. 2022-03-26 21:43:10 +01:00
9uapaw 08a77a765b YARN-10548. Decouple AM runner logic from SLSRunner. Contributed by Szilard Nemeth. 2022-03-25 18:48:56 +01:00
Benjamin Teke ffa0eab488 YARN-11094. Follow up changes for YARN-10547. Contributed by Szilard Nemeth 2022-03-25 12:01:44 +01:00
9uapaw 526142447a YARN-10552. Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler. Contributed by Szilard Nemeth. 2022-03-24 16:24:33 +01:00
9uapaw 077c6c62d6 YARN-10547. Decouple job parsing logic from SLSRunner. Contributed by Szilard Nemeth. 2022-03-24 06:16:26 +01:00
Daniel Carl Jones 9edfe30a60
HADOOP-14661. Add S3 requester pays bucket support to S3A (#3962)
Adds the option fs.s3a.requester.pays.enabled, which, if set to true, allows
the client to access S3 buckets where the requester is billed for the IO.

Contributed by Daniel Carl Jones
2022-03-23 20:00:50 +00:00
Steve Loughran 708a0ce21b
HADOOP-13704. Optimized S3A getContentSummary()
Optimize the scan for s3 by performing a deep tree listing,
inferring directory counts from the paths returned.

Contributed by Ahmar Suhail.

Change-Id: I26ffa8c6f65fd11c68a88d6e2243b0eac6ffd024
2022-03-22 13:21:12 +00:00
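The idea in HADOOP-13704 above, inferring directory counts from the paths a deep listing returns, can be sketched as follows. This is a minimal illustration and not the S3A implementation: the class `DeepListingSummary`, the method `countDirectories`, and the assumption that keys are '/'-separated paths relative to the directory being summarized are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: infer directories from a flat (deep) listing of keys. */
final class DeepListingSummary {

  /**
   * Returns the number of distinct directories implied by the given keys.
   * Each key is assumed to be a '/'-separated path relative to the
   * directory being summarized, for example "a/b/file.txt".
   * The summarized directory itself is not counted.
   */
  static int countDirectories(List<String> keys) {
    Set<String> dirs = new HashSet<>();
    for (String key : keys) {
      int slash = key.lastIndexOf('/');
      // Walk up the ancestors of this key until reaching the root.
      while (slash > 0) {
        String parent = key.substring(0, slash);
        if (!dirs.add(parent)) {
          // Already recorded, so all of its ancestors were recorded too.
          break;
        }
        slash = parent.lastIndexOf('/');
      }
    }
    return dirs.size();
  }
}
```

For the keys `a/b/f1`, `a/b/f2`, `a/c/f3` and `top.txt`, the inferred directories are `a`, `a/b` and `a/c`. A single flat listing replaces one listing call per directory, which is the saving the commit message describes.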
Steve Loughran 8294bd5a37
HADOOP-18163. hadoop-azure support for the Manifest Committer of MAPREDUCE-7341
Follow-on patch to MAPREDUCE-7341, adding ABFS support and tests

* resilient rename
* tests for job commit through the manifest committer.

contains
- HADOOP-17976. ABFS etag extraction inconsistent between LIST and HEAD calls
- HADOOP-16204. ABFS tests to include terasort

Contributed by Steve Loughran.

Change-Id: I0a7d4043bdf19bcb00c033fc389730109b93b77f
2022-03-17 11:24:51 +00:00
Mukund Thakur 672e380c4f
HADOOP-18112: Implement paging during multi object delete. (#4045)
S3 does not support multi-object delete requests of more than 1000 keys;
such requests fail with a MalformedXML error. This change pages the
requests to limit the number of keys in a single request. The page size
can be configured using "fs.s3a.bulk.delete.page.size".

 Contributed By: Mukund Thakur
2022-03-11 13:05:45 +05:30
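The paging described in HADOOP-18112 above amounts to splitting the list of keys into pages no larger than the configured page size and issuing one delete request per page. A minimal sketch of the splitting step follows; the class `BulkDeletePaging` and the method `partition` are hypothetical names, and no S3 client call is made here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split delete keys into pages of bounded size. */
final class BulkDeletePaging {

  /**
   * Splits the keys into consecutive pages of at most pageSize entries.
   * The last page may be shorter; an empty input yields no pages.
   */
  static <T> List<List<T>> partition(List<T> keys, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    List<List<T>> pages = new ArrayList<>();
    for (int start = 0; start < keys.size(); start += pageSize) {
      int end = Math.min(start + pageSize, keys.size());
      // Copy the sublist so each page is independent of the source list.
      pages.add(new ArrayList<>(keys.subList(start, end)));
    }
    return pages;
  }
}
```

With 2500 keys and a page size of 1000, this yields pages of 1000, 1000 and 500 keys, each of which stays within the limit stated in the commit message.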
Viraj Jasani 66b72406bd
HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2022-03-08 17:27:04 +09:00
Mehakmeet Singh 6995374b54
HADOOP-18150. Fix ITestAuditManagerDisabled test in S3A. (#4044)
Contributed by Mehakmeet Singh
2022-03-03 18:44:28 +00:00
Steve Loughran b56af00114
HADOOP-18075. ABFS: Fix failure caused by listFiles() in ITestAbfsRestOperationException (#4040)
Contributed by Sumangala Patki
2022-03-01 11:48:10 +00:00
Sumangala Patki c18b646020
HADOOP-18071. ABFS: Set driver global timeout for ITestAzureBlobFileSystemBasics (#3866)
Contributed by Sumangala Patki
2022-02-23 19:38:10 +00:00
monthonk 1f157f802d
HADOOP-17386. Change default fs.s3a.buffer.dir to be under Yarn container path on yarn applications (#3908)
Co-authored-by: Monthon Klongklaew <monthonk@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-02-22 13:50:27 +09:00
Mohanad Elsafty a4f459097b
HADOOP-18117. Add an option to preserve root directory permissions (#3970) 2022-02-18 19:12:50 +08:00
Ayush Saxena fe583c4b63
HADOOP-18096. Distcp: Sync moves filtered file to home directory rather than deleting. (#3940). Contributed by Ayush Saxena.
Reviewed-by: Steve Loughran <stevel@apache.org>
Reviewed-by: stack <stack@apache.org>
2022-02-11 01:59:40 +05:30
Steve Loughran efdec92cab
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references (#3930)
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.

Construct it with a factory method and optional callback
for notification on loss and regeneration.

 WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
      new WeakReferenceThreadMap<>(
          (k) -> getUnbondedSpan(),
          this::noteSpanReferenceLost);

This is used in ActiveAuditManagerS3A for span tracking.

Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.

Contributed by Steve Loughran.
2022-02-10 12:31:41 +00:00
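The map described in HADOOP-18091 above holds its values through weak references, rebuilds a value with a factory when the reference has been cleared, and notifies a callback about the loss. The sketch below illustrates that pattern only; it is not the Hadoop class. The name `WeakValueMap` and its methods are hypothetical, and the lookup is deliberately not atomic.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch of a map whose values are weakly referenced. */
final class WeakValueMap<K, V> {
  private final Map<K, WeakReference<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> factory;
  private final Consumer<K> referenceLost; // may be null

  WeakValueMap(Function<K, V> factory, Consumer<K> referenceLost) {
    this.factory = factory;
    this.referenceLost = referenceLost;
  }

  /** Stores a value; the map alone will not keep it alive. */
  void put(K key, V value) {
    map.put(key, new WeakReference<>(value));
  }

  /**
   * Returns the live value for the key. If there is no entry, or the
   * referent was garbage collected, a new value is created by the factory.
   * The callback fires only when an existing entry had lost its referent.
   * Not atomic: two racing callers may each create a value.
   */
  V get(K key) {
    WeakReference<V> ref = map.get(key);
    V value = (ref == null) ? null : ref.get();
    if (value != null) {
      return value;
    }
    if (ref != null && referenceLost != null) {
      referenceLost.accept(key);
    }
    value = factory.apply(key);
    map.put(key, new WeakReference<>(value));
    return value;
  }

  /** Thread-keyed lookup, in the spirit of a thread local. */
  static <V> V getForCurrentThread(WeakValueMap<Long, V> threadMap) {
    return threadMap.get(Thread.currentThread().getId());
  }
}
```

Because the map never holds a strong reference, a value stays reachable only while some caller still uses it, which is the property that avoids the leak a plain ThreadLocal can cause on long-lived threads.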
Joey Krabacher a08e69d33e
HADOOP-18114. Documentation correction in assumed_roles.md (#3949)
Fixes typo in hadoop-aws/assumed_roles.md

Contributed by Joey Krabacher
2022-02-09 10:35:11 +00:00
Petre Bogdan Stolojan 5e7ce26e66
HADOOP-18085. S3 SDK Upgrade causes AccessPoint ARN endpoint mistranslation (#3902)
Part of HADOOP-17198. Support S3 Access Points.

HADOOP-18068. "upgrade AWS SDK to 1.12.132" broke the access point endpoint
translation.

Correct endpoints should start with "s3-accesspoint."; after the SDK upgrade they start with
"s3.accesspoint-", which breaks tests and region detection by the SDK.

Contributed by Bogdan Stolojan
2022-02-04 15:37:08 +00:00
Steve Loughran b795f6f9a8
HADOOP-18094. Disable S3A auditing by default.
See HADOOP-18091. S3A auditing leaks memory through ThreadLocal references

* Adds a new option fs.s3a.audit.enabled to control whether or not auditing
is enabled. This is false by default.

* When false, the S3A auditing manager is NoopAuditManagerS3A,
which was formerly only used for unit tests and
during filesystem initialization.

* When true, ActiveAuditManagerS3A is used for managing auditing,
allowing auditing events to be reported.

* updates documentation and tests.

This patch does not fix the underlying leak. When auditing is enabled,
long-lived threads will retain references to the audit managers
of S3A filesystem instances which have already been closed.

Contributed by Steve Loughran.
2022-01-24 13:37:33 +00:00
Anmol Asrani 7c97c0f969
HADOOP-18084. ABFS: Add testfilePath while verifying test contents are read correctly (#3903)
Contributed by: Anmol Asrani
2022-01-19 10:13:13 +00:00
Steve Loughran d8ab84275e
HADOOP-18068. upgrade AWS SDK to 1.12.132 (#3864)
With this update, the versions of key shaded dependencies are

  jackson    2.12.3
  httpclient 4.5.13

Contributed by Steve Loughran
2022-01-18 10:31:28 +00:00
Steve Loughran 14ba19af06
HADOOP-17409. Remove s3guard from S3A module (#3534)
Completely removes S3Guard support from the S3A codebase.

If the connector is configured to use any metastore other than
the null and local stores (i.e. DynamoDB is selected) the s3a client
will raise an exception and refuse to initialize.

This is to ensure that there is no mix of S3Guard enabled and disabled
deployments with the same configuration but different hadoop releases
-it must be turned off completely.

The "hadoop s3guard" command has been retained -but the supported
subcommands have been reduced to those which are not purely S3Guard
related: "bucket-info" and "uploads".

This is a major change in terms of the number of files
changed; before cherry picking subsequent s3a patches into
older releases, this patch will probably need backporting
first.

Goodbye S3Guard, your work is done. Time to die.

Contributed by Steve Loughran.
2022-01-17 18:08:57 +00:00
monthonk b27732c69b
HADOOP-14334. S3 SSEC tests to downgrade when running against a mandatory encryption object store (#3870)
Contributed by Monthon Klongklaew
2022-01-09 18:01:47 +00:00
Viraj Jasani 08c803ea30
MAPREDUCE-7371. DistributedCache alternative APIs should not use DistributedCache APIs internally (#3855) 2022-01-09 00:18:10 +09:00
Ayush Saxena 657a2882e9
HADOOP-18056. DistCp: Filter duplicates in the source paths. (#3825). Contributed by Ayush Saxena.
Reviewed-by: tomscut <litao@bigo.sg>
Reviewed-by: Steve Loughran <stevel@apache.org>
2022-01-05 23:53:07 +05:30
Akira Ajisaka dba139cd0f
HADOOP-18045. Disable TestDynamometerInfra (#3829)
Reviewed-by: Fei Hui <feihui.ustc@gmail.com>
2021-12-28 13:22:12 +09:00
Ashutosh Gupta ebdbe7eb82
HADOOP-18057. Fix typo: validateEncrytionSecrets -> validateEncryptionSecrets (#3826) 2021-12-27 16:51:17 +08:00
Viraj Jasani 04b6b9a87b
HADOOP-16908. Prune Jackson 1 from the codebase and restrict its usage for future (#3789)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2021-12-20 16:01:34 +09:00
Szilard Nemeth a967033a9f
YARN-10427. Duplicate Job IDs in SLS output (#3809). Contributed by Szilard Nemeth 2021-12-17 00:34:16 +01:00
Akira Ajisaka 9b9e2ef87f
HADOOP-18040. Use maven.test.failure.ignore instead of ignoreTestFailure (#3774)
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
2021-12-10 01:36:31 +09:00
GuoPhilipse c65c87f211
HADOOP-18026. Fix default value of Magic committer (#3723)
Contributed by guophilipse
2021-11-29 15:50:30 +00:00
Viraj Jasani 215388beea
HADOOP-18022. Add restrict-imports-enforcer-rule for Guava Preconditions and remove remaining usages (#3712)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2021-11-29 17:37:30 +09:00
Steve Loughran 98fe0d0fc3
HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633)
Contributed by Steve Loughran
2021-11-24 17:33:12 +00:00
Mehakmeet Singh a35f7dec25
HADOOP-18016. Make certain methods LimitedPrivate in S3AUtils.java (#3685)
Contributed By: Mehakmeet Singh
2021-11-24 13:32:59 +05:30
Andrew Chung 5b1b2c8ef6
YARN-11003. Make RMNode aware of all (OContainer inclusive) allocated resources (#3646) 2021-11-23 13:20:08 -08:00
Viraj Jasani c7ec1897c4
HADOOP-18018. unguava: remove Preconditions from hadoop-tools modules (#3688) 2021-11-23 13:34:10 +09:00
Steve Loughran 3391b69692
HADOOP-18002. ABFS rename idempotency broken -remove recovery (#3641)
Cut modtime-based rename recovery as object modification time
is not updated during rename operation.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in the ABFS client?
It would cut the IO capacity in half, which kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Contributed by: Steve Loughran
2021-11-16 16:47:44 +05:30
Steve Loughran 45f164a854 Revert "HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341)"
This reverts commit 82658a22d6.
2021-11-05 14:21:15 +00:00
sumangala-patki e1ac10ceae
HADOOP-17863. ABFS: Fix compiler deprecation warning in TextFileBasedIdentityHandler (#3332)
Closes #3332 

Contributed by Sumangala Patki
2021-11-05 12:53:51 +00:00
sumangala-patki 19644c0cdc
HADOOP-17862. ABFS: Fix unchecked cast compiler warning for AbfsListStatusRemoteIterator (#3331)
closes #3331 

Contributed by Sumangala Patki
2021-11-05 12:50:37 +00:00
sumangala-patki 82658a22d6
HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341)
Addresses transient failures in the following test classes:

* ITestAbfsStreamStatistics: Uses a filesystem level static instance to record read/write statistics, which also tracks these operations in other tests running in parallel. Marked for sequential-only run to avoid transient failure

* ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switch to non-static variable for assertions on retry count

closes #3341

Contributed by Sumangala Patki
2021-11-04 14:40:37 +00:00
Jinhu Wu a9c51ea57d
HADOOP-17374. support listObjectV2 (#3587) 2021-11-03 21:47:41 -07:00
Steve Loughran 6c6d1b64d4
HADOOP-17928. Syncable: S3A to warn and downgrade (#3585)
This switches the default behavior of S3A output streams
to logging a warning when Syncable.hsync() or hflush() is
called; a call is not considered an error unless the defaults
are overridden.

This avoids breaking applications which call the APIs,
at the risk of people trying to use S3 as a safe store
of streamed data (HBase WALs, audit logs etc).

Contributed by Steve Loughran.
2021-11-02 13:26:16 +00:00
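The "warn and downgrade" behavior of HADOOP-17928 above can be pictured as a stream that either rejects or merely warns about Syncable calls, depending on a flag. The sketch below is a hypothetical illustration and not the S3A output stream: the class name, the `rejectSyncable` flag, the choice of `UnsupportedOperationException`, and warning only on the first call are all assumptions made here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: warn about, or reject, hsync()/hflush() calls. */
final class DowngradingSyncableStream {
  private final boolean rejectSyncable; // assumed flag; false = warn only
  private final AtomicBoolean warned = new AtomicBoolean(false);

  DowngradingSyncableStream(boolean rejectSyncable) {
    this.rejectSyncable = rejectSyncable;
  }

  void hsync() {
    handleSyncableInvocation("hsync");
  }

  void hflush() {
    handleSyncableInvocation("hflush");
  }

  /** True once a warning has been printed. */
  boolean hasWarned() {
    return warned.get();
  }

  private void handleSyncableInvocation(String operation) {
    if (rejectSyncable) {
      // Strict mode: the caller is told the data is not durable.
      throw new UnsupportedOperationException(
          operation + "() is not supported by this stream");
    }
    // Downgraded mode: warn once, then treat the call as a no-op.
    if (warned.compareAndSet(false, true)) {
      System.err.println("WARN: " + operation
          + "() called, but data is not persisted until close()");
    }
  }
}
```

The default (warn only) keeps applications that call these APIs working, while the strict mode suits callers that must not mistake the stream for a durable log.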