hadoop


Author	SHA1	Message	Date
Xing Lin	7d7f7a9e9b	HDFS-16024. RBF: Rename data to the Trash should be based on src location (#4962 ) (cherry picked from commit `e18d806212`) Reviewed-by: Dinesh Chitlangia <dineshc@apache.org> Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-10-10 00:33:48 +09:00
Steve Loughran	61e1603750	HADOOP-18401. No ARM binaries in branch-3.3.x releases. (#4953 ) Fix the branch-3.3 docker image and create-release scripts to work on arm 64 and macbook m1 Contributed by Ayush Saxena and Steve Loughran	2022-10-07 15:58:51 +01:00
Steve Loughran	c70b8709cc	HADOOP-18442. Remove openstack support (#4855 ) The swift:// connector for openstack support has been removed. The hadoop-openstack jar remains, only now it is empty of code. This is to ensure that projects which declare the JAR a dependency will still have successful builds. Contributed by Steve Loughran	2022-10-07 12:03:08 +01:00
Steve Loughran	80781306dd	HADOOP-18469. Add secure XML parser factories to XMLUtils (#4940 ) Add to XMLUtils a set of methods to create secure XML Parsers/transformers, locking down DTD, schema, XXE exposure. Use these wherever XML parsers are created. Contributed by PJ Fanning	2022-10-07 10:47:55 +01:00
Ashutosh Gupta	725cd90712	MAPREDUCE-7370. Parallelize MultipleOutputs#close call (#4248 ). Contributed by Ashutosh Gupta. Reviewed-by: Akira Ajisaka <aajisaka@apache.org> Signed-off-by: Chris Nauroth <cnauroth@apache.org> (cherry picked from commit `062c50db6b`)	2022-10-06 23:14:38 +00:00
Ashutosh Gupta	1c3bf42ad0	YARN-11303. Upgrade jquery ui to 1.13.2 to mitigate CVE-2022-31160 (#4895 ) Contributed by Ashutosh Gupta	2022-10-05 12:09:11 +01:00
Mukund Thakur	0d772b353f	HADOOP-18463. Add an integration test to process data asynchronously during vectored read. (#4921 ) part of HADOOP-18103. Contributed by: Mukund Thakur	2022-09-28 15:38:41 -05:00
Mukund Thakur	bbe841e601	HADOOP-18347. S3A Vectored IO to use bounded thread pool. (#4918 ) part of HADOOP-18103. Also introducing a config fs.s3a.vectored.active.ranged.reads to configure the maximum number of range reads a single input stream can have active (downloading, or queued) in the central FileSystem instance's pool of queued operations. This stops a single stream overloading the shared thread pool. Contributed by: Mukund Thakur Conflicts: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java	2022-09-28 15:34:31 -05:00
Mehakmeet Singh	e5a566c91f	HADOOP-18416. fix ITestS3AIOStatisticsContext test failure (#4931 ) Follow on to HADOOP-17461. Contributed by: Mehakmeet Singh	2022-09-28 14:17:56 +05:30
Ashutosh Gupta	dea018ef23	HDFS-16766. XML External Entity (XXE) attacks can occur while processing XML received from an untrusted source (#4886 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `d9f435f6ac`)	2022-09-27 15:44:58 +09:00
Ashutosh Gupta	51605f9dcc	HADOOP-18443. Upgrade snakeyaml to 1.32 (#4873 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-09-25 23:50:46 +09:00
Xing Lin	f1c1ad52c5	HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash (#4869 ) * HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash Signed-off-by: Xing Lin <xinglin@linkedin.com>	2022-09-23 11:06:23 -07:00
Steve Loughran	af0a6d7987	HADOOP-18456. NullPointerException in ObjectListingIterator. (#4909 ) This problem surfaced in impala integration tests IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build after the change HADOOP-17461. Add thread-level IOStatistics Context The actual GC race condition came with HADOOP-18091. S3A auditing leaks memory through ThreadLocal references The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create() where a strong reference to the new value is kept in a local variable and referred to later so that the JVM will not GC it. Along with the fix, extra assertions ensure that if the problem is not fixed, applications will fail faster/more meaningfully. Contributed by Steve Loughran.	2022-09-23 09:57:49 +01:00
Kidd5368	ceec19e61a	HDFS-16776 Erasure Coding: The length of targets should be checked when DN gets a reconstruction task (#4901 ) (cherry picked from commit `9a29075f91`)	2022-09-23 12:29:39 +09:00
PJ Fanning	d66dea300e	HADOOP-18341: upgrade commons-configuration2 to 2.8.0 and commons-text to 1.9 (#4916 )	2022-09-22 10:44:27 +09:00
Ashutosh Gupta	683fa264ee	HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails (#4896 ) The patch provides detailed diagnostics of file creation failure in LocalDirAllocator. Contributed by: Ashutosh Gupta	2022-09-21 11:54:47 +05:30
Ashutosh Gupta	3af155ceeb	HADOOP-18400. Fix file split duplicating records from a succeeding split when reading BZip2 text files (#4732 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `30c36ef25a`)	2022-09-19 13:45:47 +09:00
Steve Vaughan	357c83db94	HDFS-16686. GetJournalEditServlet fails to authorize valid Kerberos request (#4724 ) (#4794 )	2022-09-13 10:50:23 -07:00
Ashutosh Gupta	2532eca013	YARN-11241. Add uncleaning option for local app log file with log-aggregation enabled (#4703 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `65a027b112`)	2022-09-12 23:33:10 +09:00
Mukund Thakur	c9d6605a59	HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled. (#4862 ) part of HADOOP-18103. While merging the ranges in CheckSumFs, they are rounded up based on the checksum bytes size, which can make some ranges extend past EOF; such ranges need to be fixed, otherwise the actual reads fail with an EOFException. Contributed By: Mukund Thakur	2022-09-09 11:17:32 -05:00
Sumangala Patki	2e4c5ca88f	HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3699 ) Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki	2022-09-06 11:34:55 +01:00
sreeb-msft	5f3bc4340e	HADOOP-18408. ABFS: ITestAbfsManifestCommitProtocol fails on nonHNS configuration (#4758 ) ITestAbfsManifestCommitProtocol to set requireRenameResilience to false for nonHNS configuration Contributed by Sree Bhattacharyya	2022-09-02 12:34:43 +01:00
monthonk	9dffa65021	HADOOP-18339. S3A storage class option only picked up when buffering writes to disk. (#4669 ) Follow-up to HADOOP-12020 Support configuration of different S3 storage classes; S3 storage class is now set when buffering to heap/bytebuffers, and when creating directory markers Contributed by Monthon Klongklaew	2022-09-01 18:15:48 +01:00
Steve Vaughan	3a6c8ff8bb	HDFS-16755. TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host resolution (#4833 ) Use ".invalid" domain from IETF RFC 2606 to ensure that the host doesn't resolve. Contributed by Steve Vaughan Jr	2022-09-01 14:01:26 +01:00
Mukund Thakur	6cc5c92a89	HADOOP-18391. Improvements in VectoredReadUtils#readVectored() for direct buffers (#4787 ) part of HADOOP-18103. Contributed By: Mukund Thakur	2022-08-31 11:15:15 -05:00
Mukund Thakur	0a11ce2546	HADOOP-18407. Improve readVectored() api spec (#4760 ) part of HADOOP-18103. Contributed By: Mukund Thakur	2022-08-31 11:15:10 -05:00
Steve Loughran	f6c557d3b3	HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766 ) HADOOP-16202 "Enhance openFile()" added asynchronous draining of the remaining bytes of an S3 HTTP input stream for those operations (unbuffer, seek) where it could avoid blocking the active thread. This patch fixes the asynchronous stream draining so that the stream is returned to the http pool. Without this, whenever unbuffer() or seek() was called on a stream and an asynchronous drain triggered, the connection was not returned; eventually the pool would be empty and subsequent S3 requests would fail with the message "Timeout waiting for connection from pool". The root cause was that even though the fields passed in to drain() were converted to references through the methods, in the lambda expression passed in to submit they were direct references: `operation = client.submit(() -> drain(uri, streamStatistics, false, reason, remaining, object, wrappedStream)); /* here */` Those fields were only read during the async execution, at which point they could have been set to null (or even reassigned by a subsequent read). A new SDKStreamDrainer class performs the draining; this is a Callable and can be submitted directly to the executor pool. The class is used in both the classic and prefetching s3a input streams. Also, calling unbuffer() switches the S3AInputStream from adaptive to random IO mode; that is, it is considered a cue that future IO will not be sequential, whole-file reads. Contributed by Steve Loughran.	2022-08-31 16:52:12 +01:00
Masatake Iwasaki	2a1701151c	HADOOP-18375. Fix failure of shelltest for hadoop_add_ldlibpath. (#4652 ) (cherry picked from commit `22835be63d`)	2022-08-30 10:44:11 +00:00
Steve Vaughan	833fc64558	HDFS-16684. Exclude the current JournalNode (#4786 ) The JournalNodeSyncer will include the local instance in syncing when using a bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude the local instance, but it doesn't recognize the meta-address as a local address. Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log attempts to sync with itself as part of the normal syncing rotation. For an HA configuration running 3 JournalNodes, the "other" list used by the JournalNodeSyncer will include 3 proxies. Exclude bound local addresses, including the use of a wildcard address in the bound host configurations, while still allowing multiple instances on the same host. Allow sync attempts with unresolved addresses, so that sync attempts can drive resolution as servers become available. Backport. Signed-off-by: stack <stack@apache.org>	2022-08-28 11:15:04 -07:00
zhengchenyu	3edddaf9fc	HDFS-16732. [SBN READ] Avoid get location from observer when the block report is delayed (#4756 ) Signed-off-by: Erik Krogen <xkrogen@apache.org> (cherry picked from commit `231a4468cd`)	2022-08-25 10:41:04 -07:00
Simba Dzinamarira	0326b7e935	HADOOP-18406: Adds alignment context to call path for creating RPC proxy with multiple connections per user. Fixes #4748 Signed-off-by: Owen O'Malley <oomalley@linkedin.com>	2022-08-24 16:48:55 -07:00
xuzq	5b2d6684e6	HADOOP-13144. Enhancing IPC client throughput via multiple connections per user (#4542 )	2022-08-24 16:48:35 -07:00
Ayush Saxena	9890a4aea4	Revert "HADOOP-18417. Upgrade to M7 of surefire plugin (#4795 )" This reverts commit `1ff121041c`.	2022-08-25 03:53:34 +05:30
Steve Loughran	1168abc704	MAPREDUCE-7403. manifest-committer dynamic partitioning support. (#4728 ) Declares its compatibility with Spark's dynamic output partitioning by having the stream capability "mapreduce.job.committer.dynamic.partitioning". Requires a Spark release with SPARK-40034, which does the probing before deciding whether to accept or reject instantiation with dynamic partition overwrite set. This feature can be declared as supported by any other PathOutputCommitter implementations whose algorithm and destination filesystem are compatible. None of the S3A committers are compatible. The classic FileOutputCommitter is, but it does not declare itself as such out of our fear of changing that code. The Spark-side code will automatically infer compatibility if the created committer is of that class or a subclass. Contributed by Steve Loughran.	2022-08-24 11:19:05 +01:00
Steve Vaughan	98dd2b534f	HADOOP-18417. Upgrade to M7 of surefire plugin (#4795 ) This addresses an issue where the plugin's default classpath for executing tests fails to include org.junit.platform.launcher.core.LauncherFactory. Contributed by: Steve Vaughan Jr	2022-08-24 11:07:34 +01:00
Wei-Chiu Chuang	c4d94f5623	HADOOP-18333. Upgrade jetty version to 9.4.48.v20220622 (#4600 ) * HADOOP-18001. Upgrade jetty version to 9.4.44 (#3700). Contributed by Yuan Luo. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org> (cherry picked from commit `b85c66a035`) * HADOOP-18333. Upgrade jetty version to 9.4.48.v20220622 (#4553) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> (cherry picked from commit `e664f81ce7`) Conflicts: LICENSE-binary Change-Id: I5a758df2551539c2780e170c3738c5b21eb0c79d Co-authored-by: better3471 <46600375+better3471@users.noreply.github.com> Co-authored-by: Ashutosh Gupta <ashutosh.gupta@st.niituniversity.in>	2022-08-24 08:16:49 +08:00
Simba Dzinamarira	e28dc524f6	HDFS-16669: Enhance client protocol to propagate last seen state IDs for multiple nameservices. Fixes #4584 Signed-off-by: Owen O'Malley <oomalley@linkedin.com>	2022-08-23 11:27:21 -07:00
Steve Vaughan	4138661010	HDFS-16625. Check assumption about PMDK availability (#4788 ) Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>	2022-08-23 19:35:59 +09:00
Steve Vaughan	a73efb2d55	HDFS-16687. RouterFsckServlet replicates code from DfsServlet base class (#4681 ) (#4790 )	2022-08-22 20:26:03 -07:00
Steve Vaughan	1120cc8485	HDFS-4043. Namenode Kerberos Login does not use proper hostname for host qualified hdfs principal name (#4785 ) Use the existing DomainNameResolver to leverage the pluggable resolution framework. This provides a means to perform a reverse lookup if needed. Update default implementation of DNSDomainNameResolver to protect against returning the IP address as a string from a cached value. Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>	2022-08-23 05:34:33 +08:00
jianghuazhu	2123859d60	HDFS-16729. RBF: fix some unreasonably annotated docs. (#4745 ) Reviewed-by: Inigo Goiri <inigoiri@apache.org> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `7f176d080c`)	2022-08-21 07:31:01 +09:00
Steve Vaughan	cfc11d2e5f	HADOOP-18365. Update the remote address when a change is detected (#4692 ) (#4768 ) Back port to branch-3.3, to avoid reconnecting to the old address after detecting that the address has been updated. * Use a stable hashCode to allow safe IP addr changes * Add test that updated address is used Once the address has been updated, it will be used in future calls. Test verifies that a second request succeeds and that it uses the existing updated address instead of having to re-resolve. Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>	2022-08-19 18:56:02 -07:00
Viraj Jasani	51ddd02395	HADOOP-18403. Fix FileSystem leak in ITestS3AAWSCredentialsProvider (#4737 ) Contributed By: Viraj Jasani	2022-08-18 17:45:44 -05:00
Ashutosh Gupta	a5d5d0708a	HADOOP-18385. ITestS3ACannedACLs failure; fixed by adding in a span (#4736 ) Contributed by Ashutosh Gupta	2022-08-18 16:55:46 +01:00
Viraj Jasani	e8a28dc0d7	HADOOP-18371. S3A FS init to log at debug when fs.s3a.create.storage.class is unset (#4730 ) Contributed By: Viraj Jasani	2022-08-16 12:45:59 -05:00
Ashutosh Gupta	3b3bd89084	YARN-11248. Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING (#4721 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `f02ff1afe2`)	2022-08-16 19:07:42 +09:00
kevins-29	eff292bd5f	HADOOP-18383. Codecs with @DoNotPool annotation are not closed causing memory leak (#4739 )	2022-08-15 10:14:02 -07:00
Steve Loughran	97763619c9	HADOOP-18402. S3A committer NPE in spark job abort (#4735 ) JobID.toString() and TaskID.toString() to only be called when the IDs are not null. This doesn't surface in MapReduce, but Spark SQL can trigger in job abort, where it may invoke abortJob() with an incomplete TaskContext. This patch MUST be applied to branches containing HADOOP-17833. "Improve Magic Committer Performance." Contributed by Steve Loughran.	2022-08-15 11:32:06 +01:00
Viraj Jasani	6b7c1329b2	HADOOP-18397. Shutdown AWSSecurityTokenService when its resources are no longer in use (#4722 ) Contributed by Viraj Jasani.	2022-08-12 15:19:51 +01:00
Mukund Thakur	93c4704b33	HADOOP-18392. Propagate vectored s3a input stream stats to file system stats. (#4704 ) part of HADOOP-18103. Contributed By: Mukund Thakur	2022-08-11 15:24:25 -05:00
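The capture bug described in HADOOP-18410 can be reproduced in miniature. The following is a hypothetical Java sketch, not the actual S3AInputStream or SDKStreamDrainer code: the names `DrainCaptureDemo`, `FakeStream` and `Drainer` are invented for illustration, and tasks are run synchronously rather than through an executor so that the ordering is deterministic. A lambda that reads an instance field sees whatever the field holds when it finally runs, whereas a standalone `Callable` built from the values keeps the references it was given.

```java
import java.util.concurrent.Callable;

public class DrainCaptureDemo {

  /** Hypothetical stand-in for the wrapped HTTP stream. */
  static final class FakeStream {
    final String name;

    FakeStream(String name) {
      this.name = name;
    }
  }

  /** Mutable field that the input stream later clears or reassigns. */
  private FakeStream wrappedStream = new FakeStream("stream-1");

  /** Buggy pattern: the lambda reads this.wrappedStream only when it eventually runs. */
  Callable<String> lambdaReadingField() {
    return () -> drain(wrappedStream);
  }

  /** Fixed pattern: a Callable holding its own final copy of the reference. */
  static final class Drainer implements Callable<String> {
    private final FakeStream stream;

    Drainer(FakeStream stream) {
      this.stream = stream;
    }

    @Override
    public String call() {
      return drain(stream);
    }
  }

  Callable<String> snapshotDrainer() {
    return new Drainer(wrappedStream);
  }

  /** Simulates the field being cleared after the drain task was created. */
  void resetStream() {
    wrappedStream = null;
  }

  static String drain(FakeStream s) {
    return s == null ? "nothing to drain" : "drained " + s.name;
  }

  /** Runs a task synchronously; wraps checked exceptions for brevity. */
  static String run(Callable<String> task) {
    try {
      return task.call();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  static String buggyResult() {
    DrainCaptureDemo input = new DrainCaptureDemo();
    Callable<String> task = input.lambdaReadingField();
    input.resetStream(); // field cleared before the task executes
    return run(task);
  }

  static String fixedResult() {
    DrainCaptureDemo input = new DrainCaptureDemo();
    Callable<String> task = input.snapshotDrainer();
    input.resetStream(); // does not affect the reference held by Drainer
    return run(task);
  }

  public static void main(String[] args) {
    System.out.println(buggyResult()); // nothing to drain
    System.out.println(fixedResult()); // drained stream-1
  }
}
```

A dedicated Callable class is easier to reason about than a lambda that closes over `this`, because everything the task needs is fixed when it is constructed.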
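HADOOP-18439 describes vectored-read ranges that, once rounded to checksum-chunk boundaries, can extend past the end of the file. A minimal, hypothetical sketch of the clamping idea follows; it is not the ChecksumFs implementation, and the class name `ChecksumRangeAlign`, the method `align` and its parameters are assumptions made for illustration.

```java
import java.util.Arrays;

public final class ChecksumRangeAlign {

  private ChecksumRangeAlign() {}

  /**
   * Widens [offset, offset + length) to whole checksum chunks and clamps the end to EOF.
   * Returns {alignedStart, alignedEnd}, where alignedEnd is exclusive.
   */
  static long[] align(long offset, long length, int bytesPerChecksum, long fileLength) {
    // Round the start down to the beginning of its checksum chunk.
    long start = (offset / bytesPerChecksum) * bytesPerChecksum;
    long rawEnd = offset + length;
    // Round the end up to the next chunk boundary.
    long roundedEnd = ((rawEnd + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum;
    // The last chunk of a file is usually partial, so rounding up can overshoot EOF;
    // reading up to roundedEnd would then fail with an EOFException.
    long end = Math.min(roundedEnd, fileLength);
    return new long[] {start, end};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(align(0, 100, 512, 1000)));  // [0, 512]
    System.out.println(Arrays.toString(align(900, 50, 512, 1000))); // [512, 1000]
  }
}
```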

25248 Commits