Commit Graph

5735 Commits

Author SHA1 Message Date
Melissa You d33ee67151
Hadoop-18520. Backport HADOOP-18427 and HADOOP-18452 to branch-3.3 (#5118)
* HADOOP-18427. Improve ZKDelegationTokenSecretManager#startThead With recommended methods. (#4812)

* HADOOP-18452. Fix TestKMS#testKMSHAZooKeeperDelegationToken Failed By Hadoop-18427. (#4885)

Co-authored-by: slfan1989 <55643692+slfan1989@users.noreply.github.com>
2022-11-07 18:48:29 -08:00
Melissa You 02aedd7811
Hadoop-18519. Backport HDFS-15383 and HADOOP-17835 to branch-3.3 (#5112)
* HDFS-15383. RBF: Add support for router delegation token without watch (#2047)

Improving router's performance for delegation tokens related operations. It achieves the goal by removing watchers from router on tokens since based on our experience. The huge number of watches inside Zookeeper is degrading Zookeeper's performance pretty hard. The current limit is about 1.2-1.5 million.

* HADOOP-17835. Use CuratorCache implementation instead of PathChildrenCache / TreeCache (#3266)

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
Co-authored-by: lfengnan <lfengnan@uber.com>
Co-authored-by: Viraj Jasani <vjasani@apache.org>
Co-authored-by: Melissa You <myou@myou-mn1.linkedin.biz>
2022-11-07 13:29:50 -08:00
Melissa You 853ffb545a
HADOOP-18515. Backport HADOOP-17612 to branch-3.3(Upgrade Zookeeper to 3.6.3 and Curator to 5.2.0) (#5097)
* HADOOP-17612. Upgrade Zookeeper to 3.6.3 and Curator to 5.2.0 (#3241)

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
Co-authored-by: Viraj Jasani <vjasani@apache.org>
Co-authored-by: Melissa You <myou@myou-mn1.linkedin.biz>
2022-11-05 09:28:24 -07:00
M1eyu2018 cbac2c4875 HDFS-16716. Improve appendToFile command: support appending on file with new block (#4697)
Reviewed-by: xuzq <15040255127@163.com>
Signed-off-by: Tao Li <tomscut@apache.org>
2022-10-27 19:11:51 +08:00
Steve Loughran 19f8e4f34d
YARN-11330. use secure XML parsers (#4981)
Move construction of XML parsers in YARN
modules to using the locked-down parser factory
of HADOOP-18469.

One exception: GpuDeviceInformationParser still supports DTD resolution;
all other features are disabled.

Contributed by P J Fanning
2022-10-21 14:16:22 +01:00
FuzzingTeam 1f414ab847
HADOOP-18471. Fixed ArrayIndexOutOfBoundsException in DefaultStringifier (#4957)
Contributed by FuzzingTeam
2022-10-20 18:14:24 +01:00
PJ Fanning ea851c5e4a
HADOOP-15983. Use jersey-json that is built to use jackson2 ((#3988)
Moves from com.sun.jersey 1.19 to the artifact
com.github.pjfanning:jersey-json:1.20

This allows jackson 1 to be removed from the classpath.

Contains

* HADOOP-16908. Prune Jackson 1 from the codebase and restrict
   its usage for future
* HADOOP-18219. Fix shaded client test failure

These are needed for the HADOOP-15983 changes to build.

Contributed by PJ Fanning.
2022-10-20 17:37:56 +01:00
belugabehr 6253bf72b6
HADOOP-17779: Lock File System Creator Semaphore Uninterruptibly (#3158)
Contributed by David Mollitor.
2022-10-11 13:07:42 +01:00
Ashutosh Gupta 6847ec0647
HADOOP-11245. Update NFS gateway to use Netty4 (#2832) (#4997)
Reviewed-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org>

Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
2022-10-11 05:27:43 +08:00
Steve Loughran 61e1603750
HADOOP-18401. No ARM binaries in branch-3.3.x releases. (#4953)
Fix the branch-3.3 docker image and create-release scripts to work on arm 64 and macbook m1

Contributed by Ayush Saxena and Steve Loughran
2022-10-07 15:58:51 +01:00
Steve Loughran c70b8709cc
HADOOP-18442. Remove openstack support (#4855)
The swift:// connector for openstack support has been removed.
The hadoop-openstack jar remains, only now it is empty of code. 
This is to ensure that projects which declare the JAR a dependency
will still have successful builds.

Contributed by Steve Loughran
2022-10-07 12:03:08 +01:00
Steve Loughran 80781306dd
HADOOP-18469. Add secure XML parser factories to XMLUtils (#4940)
Add to XMLUtils a set of methods to create secure XML Parsers/transformers,
locking down DTD, schema, XXE exposure.

Use these wherever XML parsers are created.

Contributed by PJ Fanning
2022-10-07 10:47:55 +01:00
Mukund Thakur 0d772b353f HADOOP-18463. Add an integration test to process data asynchronously during vectored read. (#4921)
part of HADOOP-18103.

Contributed by: Mukund Thakur
2022-09-28 15:38:41 -05:00
Mukund Thakur bbe841e601 HADOOP-18347. S3A Vectored IO to use bounded thread pool. (#4918)
part of HADOOP-18103.

Also introducing a config fs.s3a.vectored.active.ranged.reads
to configure the maximum number of number of range reads a
single input stream can have active (downloading, or queued)
to the central FileSystem instance's pool of queued operations.
This stops a single stream overloading the shared thread pool.

Contributed by: Mukund Thakur
 Conflicts:
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
2022-09-28 15:34:31 -05:00
Xing Lin f1c1ad52c5 HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash (#4869)
* HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash

Signed-off-by: Xing Lin <xinglin@linkedin.com>
2022-09-23 11:06:23 -07:00
Steve Loughran af0a6d7987
HADOOP-18456. NullPointerException in ObjectListingIterator. (#4909)
This problem surfaced in impala integration tests
   IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
after the change
  HADOOP-17461. Add thread-level IOStatistics Context
The actual GC race condition came with
 HADOOP-18091. S3A auditing leaks memory through ThreadLocal references

The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create()
where a strong reference to the new value is kept in a local variable
*and referred to later* so that the JVM will not GC it.

Along with the fix, extra assertions ensure that if the problem is not fixed,
applications will fail faster/more meaningfully.

Contributed by Steve Loughran.
2022-09-23 09:57:49 +01:00
Ashutosh Gupta 683fa264ee
HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails (#4896)
The patch provides detailed diagnostics of file creation failure in LocalDirAllocator.

Contributed by: Ashutosh Gupta
2022-09-21 11:54:47 +05:30
Ashutosh Gupta 3af155ceeb HADOOP-18400. Fix file split duplicating records from a succeeding split when reading BZip2 text files (#4732)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 30c36ef25a)
2022-09-19 13:45:47 +09:00
Mukund Thakur c9d6605a59 HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled. (#4862)
part of HADOOP-18103.

While merging the ranges in CheckSumFs, they are rounded up based on the
value of checksum bytes size which leads to some ranges crossing the EOF
thus they need to be fixed else it will cause EOFException during actual reads.

Contributed By: Mukund Thakur
2022-09-09 11:17:32 -05:00
Mukund Thakur 6cc5c92a89 HADOOP-18391. Improvements in VectoredReadUtils#readVectored() for direct buffers (#4787)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-08-31 11:15:15 -05:00
Mukund Thakur 0a11ce2546 HADOOP-18407. Improve readVectored() api spec (#4760)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-08-31 11:15:10 -05:00
Masatake Iwasaki 2a1701151c HADOOP-18375. Fix failure of shelltest for hadoop_add_ldlibpath. (#4652)
(cherry picked from commit 22835be63d)
2022-08-30 10:44:11 +00:00
Simba Dzinamarira 0326b7e935
HADOOP-18406: Adds alignment context to call path for creating RPC proxy with multiple connections per user.
Fixes #4748

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-08-24 16:48:55 -07:00
xuzq 5b2d6684e6 HADOOP-13144. Enhancing IPC client throughput via multiple connections per user (#4542) 2022-08-24 16:48:35 -07:00
Wei-Chiu Chuang c4d94f5623
HADOOP-18333. Upgrade jetty version to 9.4.48.v20220622 (#4600)
* HADOOP-18001. Upgrade jetty version to 9.4.44 (#3700). Contributed by Yuan Luo.

Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit b85c66a035)

* HADOOP-18333.Upgrade jetty version to 9.4.48.v20220622 (#4553)

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
(cherry picked from commit e664f81ce7)

 Conflicts:
	LICENSE-binary

Change-Id: I5a758df2551539c2780e170c3738c5b21eb0c79d

Co-authored-by: better3471 <46600375+better3471@users.noreply.github.com>
Co-authored-by: Ashutosh Gupta <ashutosh.gupta@st.niituniversity.in>
2022-08-24 08:16:49 +08:00
Simba Dzinamarira e28dc524f6
HDFS-16669: Enhance client protocol to propagate last seen state IDs for multiple nameservices.
Fixes #4584

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-08-23 11:27:21 -07:00
Steve Vaughan 1120cc8485
HDFS-4043. Namenode Kerberos Login does not use proper hostname for host qualified hdfs principal name (#4785)
Use the existing DomainNameResolver to leverage the pluggable resolution framework.  This provides a means to perform a reverse lookup if needed.

Update default implementation of DNSDomainNameResolver to protect against returning the IP address as a string from a cached value.

Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>
2022-08-23 05:34:33 +08:00
Steve Vaughan cfc11d2e5f
HADOOP-18365. Update the remote address when a change is detected (#4692) (#4768)
Back port to branch-3.3, to avoid reconnecting to the old address after detecting that the address has been updated.

* Use a stable hashCode to allow safe IP addr changes
* Add test that updated address is used

Once the address has been updated, it will be used in future calls.  Test verifies that a second request succeeds and that it uses the existing updated address instead of having to re-resolve.

Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>
2022-08-19 18:56:02 -07:00
kevins-29 eff292bd5f
HADOOP-18383. Codecs with @DoNotPool annotation are not closed causing memory leak (#4739) 2022-08-15 10:14:02 -07:00
Mukund Thakur 93c4704b33 HADOOP-18392. Propagate vectored s3a input stream stats to file system stats. (#4704)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-08-11 15:24:25 -05:00
Mukund Thakur 09c8084191 HADOOP-18355. Update previous index properly while validating overlapping ranges. (#4647)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-08-11 15:24:08 -05:00
Mukund Thakur 147a466c6d HADOOP-18227. Add input stream IOStats for vectored IO api in S3A. (#4636)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-08-11 15:23:57 -05:00
Yubi Lee a0e2ab2974
HADOOP-18398. Prevent AvroRecord*.class from being included non-test jar (#4727)
Contributed by Yubi Lee.
2022-08-11 20:16:52 +01:00
Viraj Jasani 0455769531
HADOOP-18373. IOStatisticsContext tuning (#4705)
The name of the option to enable/disable thread level statistics is
"fs.iostatistics.thread.level.enabled";

There is also an enabled() probe in IOStatisticsContext which can
be used to see if the thread level statistics is active.

Contributed by Viraj Jasani
2022-08-08 14:37:39 +01:00
Ashutosh Gupta 29ea8ceb49 HADOOP-18390. Fix out of sync import for HADOOP-18321 (#4694)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit bd0f9a46e1)
2022-08-07 16:06:09 +09:00
Ashutosh Gupta 3c339a11ec HADOOP-18321.Fix when to read an additional record from a BZip2 text file split (#4521)
* HADOOP-18321.Fix when to read an additional record from a BZip2 text file split

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
(cherry picked from commit a432925f74)
2022-08-06 21:53:48 +09:00
Steve Loughran 9c5228cf6b
HADOOP-18305. Release Hadoop 3.3.4: upstream changelog and jdiff files
Add the r3.3.4 changelog, release notes and jdiff xml files.

Change-Id: I98b0fed54da3b810c3f23fe5b12e673937916257
2022-08-05 14:02:28 +01:00
Mehakmeet Singh 363f8138d2
HADOOP-17461. Collect thread-level IOStatistics. (#4352)
This adds a thread-level collector of IOStatistics, IOStatisticsContext,
which can be:
* Retrieved for a thread and cached for access from other
  threads.
* reset() to record new statistics.
* Queried for live statistics through the
  IOStatisticsSource.getIOStatistics() method.
* Queries for a statistics aggregator for use in instrumented
  classes.
* Asked to create a serializable copy in snapshot()

The goal is to make it possible for applications with multiple
threads performing different work items simultaneously
to be able to collect statistics on the individual threads,
and so generate aggregate reports on the total work performed
for a specific job, query or similar unit of work.

Some changes in IOStatistics-gathering classes are needed for 
this feature
* Caching the active context's aggregator in the object's
  constructor
* Updating it in close()

Slightly more work is needed in multithreaded code,
such as the S3A committers, which collect statistics across
all threads used in task and job commit operations.

Currently the IOStatisticsContext-aware classes are:
* The S3A input stream, output stream and list iterators.
* RawLocalFileSystem's input and output streams.
* The S3A committers.
* The TaskPool class in hadoop-common, which propagates
  the active context into scheduled worker threads.

Collection of statistics in the IOStatisticsContext
is disabled process-wide by default until the feature 
is considered stable.

To enable the collection, set the option
fs.thread.level.iostatistics.enabled
to "true" in core-site.xml;
	
Contributed by Mehakmeet Singh and Steve Loughran
2022-07-27 11:23:06 +01:00
Masatake Iwasaki 0ff544951a Make upstream aware of 3.2.4 release.
(cherry picked from commit 817b8fdd38)
2022-07-22 04:09:00 +00:00
Masatake Iwasaki ff13f9ee8b Make upstream aware of 3.2.4 release.
(cherry picked from commit e1637a57df)
2022-07-22 02:31:34 +00:00
lmccay f015c7f1a7
HADOOP-18074 - Partial/Incomplete groups list can be returned in LDAP. (#4503)
Partial/Incomplete groups list can be returned in LDAP groups lookup.

Backported in #4550; minor tuning of parameters needed.

Contributed by larry mccay
2022-07-14 13:54:15 +01:00
Ashutosh Gupta 3a6d865a0f
HADOOP-18336. Tag FSDataInputStream.getWrappedStream() @Public/@Stable (#4555)
Contributed by: Ashutosh Gupta
2022-07-13 12:57:28 +01:00
HerCath e0abdd8d33
HADOOP-18217. ExitUtil synchronized blocks reduced. #4255
Reduce the ExitUtil synchronized block scopes so System.exit
and Runtime.halt calls aren't within their boundaries,
so ExitUtil wrappers do not block each other.

Enlarged catches to all Throwables (not just Exceptions).

Contributed by Remi Catherinot
2022-07-13 12:37:32 +01:00
hchaverr f4e8a4f36c HDFS-16591. Setup JaasConfiguration in ZKCuratorManager when SASL is enabled
Fixes #4447
Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-07-11 12:55:18 -07:00
Masatake Iwasaki 75c739c458 Revert "HADOOP-17196. Fix C/C++ standard warnings (#2208)"
This reverts commit b4a105a209.
2022-06-30 00:57:52 +00:00
Mukund Thakur c517b086f2 HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445)
part of HADOOP-18103.
Handling memory fragmentation in S3A vectored IO implementation by
allocating smaller user range requested size buffers and directly
filling them from the remote S3 stream and skipping undesired
data in between ranges.
This patch also adds aborting active vectored reads when stream is
closed or unbuffer() is called.

Contributed By: Mukund Thakur

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
2022-06-23 17:34:29 -05:00
Mukund Thakur bfb7d020d1 HADOOP-18105 Implement buffer pooling with weak references (#4263)
part of HADOOP-18103.
Required for vectored IO feature. None of current buffer pool
implementation is complete. ElasticByteBufferPool doesn't use
weak references and could lead to memory leak errors and
DirectBufferPool doesn't support caller preferences of direct
and heap buffers and has only fixed length buffer implementation.

Contributed By: Mukund Thakur
2022-06-23 17:11:13 -05:00
Mukund Thakur bb5a17b177 HADOOP-18107 Adding scale test for vectored reads for large file (#4273)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-06-23 17:11:09 -05:00
Mukund Thakur 9f03f87963 HADOOP-18104: S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads (#3964)
Part of HADOOP-18103.
Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure min seek and max read during a vectored IO operation in S3A connector.
These properties actually define how the ranges will be merged. To completely
disable merging set fs.s3a.max.readsize.vectored.read to 0.

Contributed By: Mukund Thakur
2022-06-23 17:11:04 -05:00
Mukund Thakur 5c348c41ab HADOOP-11867. Add a high-performance vectored read API. (#3904)
part of HADOOP-18103.
Add support for multiple ranged vectored read api in PositionedReadable.
The default iterates through the ranges to read each synchronously,
but the intent is that FSDataInputStream subclasses can make more
efficient readers especially in object stores implementation.

Also added implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranged are
fetched from S3 asynchronously.

Contributed By: Owen O'Malley and Mukund Thakur

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
	pom.xml
2022-06-23 17:09:16 -05:00
Viraj Jasani 4ba463069b
HADOOP-18288. Total requests and total requests per sec served by RPC servers (#4485)
Signed-off-by: Tao Li <tomscut@apache.org>
2022-06-23 17:30:01 +08:00
Steve Loughran 9ca4ac0af0
HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482)
Updating the hadoop version of branch-3.3 to 3.3.9-SNAPSHOT
pending agreement on what number its future release should take.

Using 3.3.9-SNAPSHOT puts space in for other incremental releases,
while avoiding creating JIRA release ordering and autocompletion
confusion the way adding a 3.3.10 or higher version would do.

Contributed by Steve Loughran
2022-06-22 13:09:50 +01:00
Steve Loughran aeb2a2f860
HADOOP-17833. Improve Magic Committer performance (#3289) (#4470)
Speed up the magic committer with key changes being

* Writes under __magic always retain directory markers

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
        created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which all for
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.

Contributed by Steve Loughran.
2022-06-21 10:49:37 +01:00
zhengchenyu d7de378b22
YARN-11172. Fix TestClientRMTokens#testDelegationToken introduced by HDFS-16563. (#4408)
Regression caused by HDFS-16563; the hdfs exception text was changed, but because it was
a YARN test doing the check, Yetus didn't notice.

Contributed by zhengchenyu
2022-06-17 19:51:56 +01:00
Renukaprasad C 0c15daa77a
HDFS-16563. Namenode WebUI prints sensitive information on Token expiry (#4241)
Contributed by Renukaprasad C

Change-Id: I5cd2cec1dd79917f810207821b3bdf4fe1a5d24c
2022-06-06 11:08:57 +01:00
Ashutosh Gupta de4c975710
HADOOP-18238. Fix reentrancy check in SFTPFileSystem.close() (#4330)
Contributed by Ashutosh Gupta

Change-Id: I2742675add74259a93b3762a80c7ab5ee6d08c37
2022-05-30 17:34:45 +01:00
slfan1989 91f19bf8fa
HADOOP-18244. Fix Hadoop-Common JavaDoc Error on branch-3.3 (#4327). Contributed by fanshilun. 2022-05-29 11:31:16 +05:30
Wei-Chiu Chuang ba856bff95
HDFS-16456. EC: Decommission a rack with only on dn will fail when the rack number is equal with replication (#4126) (#4304)
(cherry picked from commit cee8c62498)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java

(cherry picked from commit dd79aee635fdc61648e0c87bea1560dc35aee053)

Co-authored-by: caozhiqiang <lfxy@163.com>
Reviewed-by: Takanobu Asanuma <tasanuma@apache.org>
2022-05-27 00:50:40 +08:00
Steve Loughran fe306ce57e HADOOP-18198. Release 3.3.3: release notes and jdiff files.
* Add the changelog and release notes
* add all jdiff XML files
* update the project pom with the new stable version

Change-Id: Iaea846c3e451bbd446b45de146845a48953d580d
2022-05-17 19:00:09 +01:00
hchaverr 1e043b937a
HADOOP-18222. Prevent DelegationTokenSecretManagerMetrics from registering multiple times
Fixes #4266

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-05-10 14:52:31 -07:00
Ashutosh Gupta 277daca91f
HADOOP-17479. Fix the examples of hadoop config prefix (#4197)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 40a8b9a6a5)
2022-05-08 08:09:47 +09:00
Ashutosh Gupta 62c6a08ffd
HADOOP-16515. Update the link to compatibility guide (#4226)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit ae47846a5b)
2022-05-08 07:39:17 +09:00
Steve Loughran 75950e47e7
HADOOP-16202. Enhanced openFile(): hadoop-common changes. (#2584/1)
This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
2022-04-27 19:23:10 +01:00
Viraj Jasani bb13e228bc
HADOOP-17956. Replace all default Charset usage with UTF-8 (#3529)
Change-Id: I0094a84619ce19acf340d8dd1040cfe9bd88184e
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-27 10:30:07 +01:00
hchaverri 4043ef0485 HADOOP-18167. Add metrics to track delegation token secret manager op… (#4092)
* HADOOP-18167. Add metrics to track delegation token secret manager operations
2022-04-26 14:28:25 -07:00
Ashutosh Gupta 261b11f7db HADOOP-17564. Fix typo in UnixShellGuide.html (#4195)
contributed by Ashutosh Gupta
2022-04-22 18:00:58 +01:00
Boyina, Hemanth Kumar 30061940af
HADOOP-17588. CryptoInputStream#close() should be synchronized.
Contributed by RenukaPrasad C

Change-Id: Ia0a19ccc55a67e5434f0be23a500496bc7682f40
2022-04-21 22:11:37 +01:00
Xing Lin 7ae328949e HADOOP-18172: Changed scope for isRootInternalDir/getRootFallbackLink for InodeTree (#4106)
Co-authored-by: Xing Lin <xinglin@linkedin.com>
(cherry picked from commit 98b9c435f2)
2022-04-20 10:55:48 -07:00
Steve Loughran 44e662272f
HADOOP-18198. Preparing for 3.3.4 development
Change-Id: I2bf19beb541739af22fced38c2545f09c4e1bd53
2022-04-12 14:09:08 +01:00
Viraj Jasani e5516cdfaf HADOOP-18191. Log retry count while handling exceptions in RetryInvocationHandler (#4133)
(cherry picked from commit b69ede7154)
2022-04-11 15:23:55 +09:00
Hanisha Koneru 9da7d80c4e HADOOP-17116. Skip Retry INFO logging on first failover from a proxy
(cherry picked from commit e62d8f8412)
2022-04-11 15:19:18 +09:00
Masatake Iwasaki 160b6d106d
HADOOP-18088. Replace log4j 1.x with reload4j. (#4052)
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
2022-04-07 08:33:13 +09:00
Viraj Jasani a6fb77f7eb HDFS-16481. Provide support to set Http and Rpc ports in MiniJournalCluster (#4028). Contributed by Viraj Jasani.
(cherry picked from commit 278568203b)
2022-04-06 18:40:07 +09:00
Xing Lin 20483f6dc7
HADOOP-18169. getDelegationTokens in ViewFs should also fetch the token from fallback FS (#4094)
Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-03-31 15:17:26 -07:00
Xing Lin ecafd38c09
HADOOP-18144: getTrashRoot in ViewFileSystem should return a path in ViewFS. (#4123)
To get the new behavior, define fs.viewfs.trash.force-inside-mount-point to be true.

If the trash root for path p is in the same mount point as path p,
and one of:
* The mount point isn't at the top of the target fs.
* The resolved path of path is root (eg it is the fallback FS).
* The trash root isn't in user's target fs home directory.
get the corresponding viewFS path for the trash root and return it.
Otherwise, use <mnt>/.Trash/<user>.

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
(cherry picked from commit 8b8158f02d)

Co-authored-by: Xing Lin <xinglin@linkedin.com>
2022-03-31 20:26:09 +00:00
litao 0ecb34f8f6
HDFS-16413. Reconfig dfs usage parameters for datanode (#3863) (#4125) 2022-03-31 19:24:05 +09:00
Owen O'Malley e24bd1c15b HDFS-16517 Distance metric is wrong for non-DN machines in 2.10. Fixed in HADOOP-16161, but
this test case adds value to ensure the two getWeight methods stay in sync.

Fixes #4091

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-03-30 14:58:03 -07:00
zhongjingxiong 1ee93f7947
HADOOP-18145. Fileutil's unzip method causes unzipped files to lose their original permissions (#4036)
Contributed by jingxiong zhong

Change-Id: Ie252e609798719dc658364f9bae48b34dc72a79c
2022-03-30 12:52:52 +01:00
Masatake Iwasaki 419d9718a8 Make upstream aware of 3.2.3 release.
(cherry picked from commit 10876333ac)
2022-03-28 08:03:39 +00:00
Steve Loughran 105e0dbd92
HADOOP-13704. Optimized S3A getContentSummary()
Optimize the scan for s3 by performing a deep tree listing,
inferring directory counts from the paths returned.

Contributed by Ahmar Suhail.

Change-Id: I26ffa8c6f65fd11c68a88d6e2243b0eac6ffd024
2022-03-22 13:31:13 +00:00
Owen O'Malley 1175acbb75 HDFS-13248: Namenode needs to use the actual client IP when going through the
RBF proxy. There is a new configuration knob dfs.namenode.ip-proxy-users that configures
the list of users than can set their client ip address using the client context.

Fixes #4081
2022-03-21 10:24:54 -07:00
Abhishek Das c3a4ce8ee8 HADOOP-18129: Change URI to String in INodeLink to reduce memory footprint of ViewFileSystem
Fixes #3996
2022-03-17 17:27:26 -07:00
Steve Loughran e06ed88012
HADOOP-18162. hadoop-common support for MAPREDUCE-7341 Manifest Committer
* New statistic names in StoreStatisticNames
  (for joint use with s3a committers)
* Improvements to IOStatistics implementation classes
* RateLimiting wrapper to guava RateLimiter
* S3A committer Tasks moved over as TaskPool and
  added support for RemoteIterator
* JsonSerialization.load() to fail fast if source does not exist

+ tests.

This commit is a prerequisite for the main MAPREDUCE-7341 Manifest Committer
patch.

Contributed by Steve Loughran

Change-Id: Ia92e2ab5083ac3d8d3d713a4d9cb3e9e0278f654
2022-03-17 11:46:11 +00:00
Owen O'Malley c52b97d084 HDFS-16495: RBF should prepend the client ip rather than append it.
Fixes #4054

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-03-14 10:30:29 -07:00
Takanobu Asanuma 52fb9d7ce2 HADOOP-18014. CallerContext should not include some characters. (#3698)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Reviewed-by: Mingliang Liu <liuml07@apache.org>
Reviewed-by: Hui Fei <ferhui@apache.org>

Cherry-picked from 9c887e5b by Owen O'Malley
2022-03-14 10:29:37 -07:00
litao 0029f22d7d HADOOP-18003. Add a method appendIfAbsent for CallerContext (#3644)
Cherry-picked from 573b358f by Owen O'Malley
2022-03-14 10:29:23 -07:00
litao f9d40ed7b7 HDFS-16266. Add remote port information to HDFS audit log (#3538)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Cherry-picked from 359b03c8 by Owen O'Malley
2022-03-14 10:29:11 -07:00
Hui Fei 8e129e5b8d HDFS-13293. RBF: The RouterRPCServer should transfer client IP via CallerContext to NamenodeRpcServer (#2363)
Cherry-picked from 518a212c by Owen O'Malley
2022-03-14 10:28:55 -07:00
Fei Hui 5a38ed2f22 HADOOP-17276. Extend CallerContext to make it include many items (#2327)
Cherry-picked from d0d10f7e by Owen O'Malley
2022-03-14 10:28:38 -07:00
Wei-Chiu Chuang 743db6e7b4
HADOOP-18155. Refactor tests in TestFileUtil (#4063)
(cherry picked from commit d0fa9b5775)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
	hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileUtil.java

Co-authored-by: Gautham B A <gautham.bangalore@gmail.com>
2022-03-14 09:40:17 +09:00
Mukund Thakur e0619b702a HADOOP-18112: Implement paging during multi object delete. (#4045)
Multi object delete of size more than 1000 is not supported by S3 and 
fails with MalformedXML error. So implementing paging of requests to 
reduce the number of keys in a single request. Page size can be configured
using "fs.s3a.bulk.delete.page.size" 

 Contributed By: Mukund Thakur
2022-03-11 13:16:51 +05:30
Chao Sun b174aaed57 Make upstream aware of 3.3.2 release 2022-03-02 19:10:30 -08:00
Owen O'Malley 1a3060d41e
HADOOP-18139: Allow configuration of zookeeper server principal.
Fixes #4024

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-02-24 15:15:13 -08:00
Steve Loughran 94a0a04113
HADOOP-18136. Verify FileUtils.unTar() handling of missing .tar files.
Contributed by Steve Loughran

Change-Id: I3856afa821dbc8c2e3cb1cbe33793ec1734e2e24
2022-02-21 17:09:36 +00:00
Chentao Yu d14a7c6ee5 HADOOP-18109. Ensure that default permissions of directories under internal ViewFS directories are the same as directories on target filesystems. Contributed by Chentao Yu. (3953)
(cherry picked from commit 19d90e62fb)
2022-02-15 16:48:13 -08:00
GuoPhilipse 7512714475
HDFS-16449. Fix hadoop web site release notes and changelog not available (#3967)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit b68964336d)
2022-02-14 05:40:16 +09:00
daimin 9071c9646c
Fix thread safety of EC decoding during concurrent preads (#3881)
(cherry picked from commit 0e74f1e467)
2022-02-11 10:20:45 +08:00
Steve Loughran 088684ec60
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references (#3930)
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.

Construct it with a factory method and optional callback
for notification on loss and regeneration.

 WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
      new WeakReferenceThreadMap<>(
          (k) -> getUnbondedSpan(),
          this::noteSpanReferenceLost);

This is used in ActiveAuditManagerS3A for span tracking.

Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.

Contributed by Steve Loughran.

Change-Id: Ibf7bb082fd47298f7ebf46d92f56e80ca9b2aaf8
2022-02-10 12:33:40 +00:00
Abhishek Das 8b03514eaf HADOOP-18100: Change scope of inner classes in InodeTree to make them accessible outside package
Fixes #3950

Signed-off-by: Owen O'Malley <omalley@apache.org>

Cherry-picked from 3684c7f6 by Owen O'Malley
2022-02-04 12:13:10 -08:00
Petre Bogdan Stolojan 664075f35d
HADOOP-17198. Support S3 Access Points (#3260)
Add support for S3 Access Points. This provides extra security as it
ensures applications are not working with buckets belong to third parties.

To bind a bucket to an access point, set the access point (ap) ARN,
which must be done for each specific bucket, using the pattern

fs.s3a.bucket.$BUCKET.accesspoint.arn = ARN

* The global/bucket option `fs.s3a.accesspoint.required` to
mandate that buckets must declare their access point.
* This is not compatible with S3Guard.

Consult the documentation for further details.

Contributed by Bogdan Stolojan

(this commit contains the changes to TestArnResource from HADOOP-18068,
 "upgrade AWS SDK to 1.12.132" so that it works with the later SDK.)

Change-Id: I3fac213e52ca6ec1c813effb8496c353964b8e1b
2022-02-04 16:21:35 +00:00