Commit Graph

1142 Commits

Author SHA1 Message Date
Steve Loughran 134539f054
HADOOP-17199. S3A Directory Marker HADOOP-13230 backport (#2210)
This backports the listing-side changes of HADOOP-13230.

With this patch in, this branch of Hadoop is compatible with S3A clients
which do not delete directory markers when files are created underneath.

It does not allow this version to disable marker deletion; if the
fs.s3a.marker.retention option is changed to request this, a message
is printed at INFO and the policy remains at "delete".

The s3guard bucket-info command has been extended to support
probing for marker retention, as has the hasPathCapability method on
S3AFileSystem.

Read the documentation!
2020-08-25 22:47:43 +01:00
Steve Loughran 42c71a5790
HADOOP-15691. Add PathCapabilities to FileSystem and FileContext.
Contributed by Steve Loughran.

This complements the StreamCapabilities Interface by allowing applications to probe for a specific path on a specific instance of a FileSystem client
to offer a specific capability.

This is intended to allow applications to determine:

* Whether a method is implemented before calling it and dealing with UnsupportedOperationException.
* Whether a specific feature is believed to be available in the remote store.

As well as a common set of capabilities defined in CommonPathCapabilities,
file systems are free to add their own capabilities, prefixed with
"fs." + scheme + "." (for S3A, "fs.s3a.").

The plan is to identify and document more capabilities, and for file systems which add new features to always declare the availability of those features.

Note

* The remote store is not expected to be checked for the feature;
  it is more a check of client API and the client's configuration/knowledge
  of the state of the remote system.
* Permissions are not checked.
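The probe-before-call pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the Hadoop classes themselves: CapabilityProbe is a hypothetical stand-in for the hasPathCapability method (the stand-in omits the IOException a real client call may raise), the client is a toy which declares a fixed set of capabilities, and the capability string is used purely for illustration.

```java
import java.util.Collections;
import java.util.Set;

public class PathCapabilityDemo {

    /** Hypothetical stand-in for the probe; not the Hadoop FileSystem API. */
    public interface CapabilityProbe {
        boolean hasPathCapability(String path, String capability);
    }

    /** Illustrative capability name; treat the exact string as an assumption. */
    static final String APPEND = "fs.capability.paths.append";

    /** Toy client which declares the same fixed capabilities for every path. */
    static CapabilityProbe clientWith(Set<String> declared) {
        return (path, capability) -> declared.contains(capability);
    }

    /** Probe first, so the unsupported operation is never attempted. */
    static String chooseWriteMode(CapabilityProbe fs, String path) {
        return fs.hasPathCapability(path, APPEND) ? "append" : "rewrite";
    }

    public static void main(String[] args) {
        CapabilityProbe appendable = clientWith(Collections.singleton(APPEND));
        CapabilityProbe plain = clientWith(Collections.<String>emptySet());
        System.out.println(chooseWriteMode(appendable, "/data/log"));
        System.out.println(chooseWriteMode(plain, "/data/log"));
    }
}
```

As the notes above state, a positive answer reflects what the client believes about the store; it is not a live check of the remote system, and permissions are not considered.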
2020-08-19 17:15:06 +01:00
Ayush Saxena d6a9ed8140 HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui. 2020-08-07 22:23:02 +05:30
Ayush Saxena 27a97e4f28 HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein. 2020-07-22 18:39:49 +05:30
Masatake Iwasaki 77e69a73da HADOOP-17040. Fix intermittent failure of ITestBlockingThreadPoolExecutorService. (#2020)
(cherry picked from commit 9685314633)
2020-05-22 21:30:57 +09:00
Steve Loughran 168c597ece Revert "HADOOP-14557. Document HADOOP-8143 (Change distcp to have -pb on by default)."
This reverts commit 44350fdf49.

It is related to the rollback of HADOOP-8143.

Change-Id: If48e3dd670c920ada702dc36461ff398fe9d35cc
2020-05-15 13:47:22 +01:00
Steve Loughran 7a8b8c3588 Revert "HADOOP-8143. Change distcp to have -pb on by default."
This reverts commit dd65eea74b.

Change-Id: I74180cf59d5bbad8c9f66cb331535addcbea863e
2020-05-15 13:46:07 +01:00
Masatake Iwasaki 89696b66e7 HADOOP-17025. Fix invalid metastore configuration in S3GuardTool tests. (#1994)
(cherry picked from commit 99840aaba6)

 Conflicts:
	hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/AbstractS3GuardToolTestBase.java
2020-05-07 12:11:37 +09:00
Steve Loughran 204d54005a HADOOP-16117. Update AWS SDK to 1.11.563.
Contributed by Steve Loughran.

Change-Id: I7c46ed2a6378e1370f567acf4cdcfeb93e43fa13
2020-04-24 10:46:24 +01:00
Masatake Iwasaki 83c4f8b9a0 HADOOP-16739. Fix native build failure of hadoop-pipes on CentOS 8. 2020-04-24 15:38:11 +09:00
Weiwei Yang 5fca921fe3 HADOOP-16840. AliyunOSS: getFileStatus throws FileNotFoundException in versioning bucket. Contributed by wujinhu.
(cherry picked from commit 6dfe00c71e)
2020-03-08 21:14:41 -07:00
Mukund Thakur 3937abddbd HDFS-13660. DistCp job fails when new data is appended in the file while the DistCp copy job is running
This uses the length of the file known at the start of the copy to determine the amount of data to copy.

* If a file is appended to during the copy, the original bytes are copied.
* If a file is truncated during a copy, or the attempt to read the data fails with a truncated stream,
  distcp will now fail. Until now these failures were not detected.
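The two cases above can be illustrated with a small sketch using only java.io. This is not the DistCp code; FixedLengthCopy, copyExactly and tryCopy are hypothetical names, and in-memory streams stand in for the source file. The idea is to copy exactly the length recorded before the copy started, ignoring appended bytes and failing on a short stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FixedLengthCopy {

    /**
     * Copy exactly expectedLength bytes: the length recorded before the copy began.
     * Bytes appended afterwards are never read; a short stream raises EOFException.
     */
    static void copyExactly(InputStream in, OutputStream out, long expectedLength)
            throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = expectedLength;
        while (remaining > 0) {
            int toRead = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, toRead);
            if (read < 0) {
                throw new EOFException("Source ended " + remaining
                        + " bytes short of the expected " + expectedLength);
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    /** Demo helper: returns the copied text, or "FAILED" if the source was too short. */
    static String tryCopy(String source, long expectedLength) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = source.getBytes(StandardCharsets.US_ASCII);
        try {
            copyExactly(new ByteArrayInputStream(bytes), out, expectedLength);
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (EOFException e) {
            return "FAILED";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // File grew during the copy: only the original 10 bytes are copied.
        System.out.println(tryCopy("0123456789-appended", 10));
        // File shrank during the copy: the copy is reported as failed.
        System.out.println(tryCopy("01234", 10));
    }
}
```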

Contributed by Mukund Thakur.

Change-Id: I576a49d951fa48d37a45a7e4c82c47488aa8e884
(cherry picked from commit 51c64b357d)
2020-02-27 16:37:03 -08:00
Akira Ajisaka d4f75e2798
HADOOP-16808. Use forkCount and reuseForks parameters instead of forkMode in the config of maven surefire plugin. Contributed by Xieming Li.
(cherry picked from commit f6d20daf40)
2020-01-21 18:03:56 +09:00
Steve Loughran 429d5db3d9
HADOOP-16785. followup to abfs close() fix.
Adds one extra test to the ABFS close logic, to explicitly
verify that the close sequence of FilterOutputStream is
not going to fail.

This is just a due-diligence patch, but it helps ensure
that no regressions creep in in future.

Contributed by Steve Loughran.

Change-Id: Ifd33a8c322d32513411405b15f50a1aebcfa6e48
2020-01-20 16:26:33 +00:00
Steve Loughran e21cb8f96e HADOOP-16785. Improve wasb and abfs resilience on double close() calls.
This hardens the wasb and abfs output streams' resilience to being invoked
in/after close().

wasb:
  Explicitly raise IOEs on operations invoked after close,
  rather than implicitly raise NPEs.
  This ensures that invocations which catch and swallow IOEs will perform as
  expected.
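
A rough sketch of the idea, using only java.io. BufferingStream is a hypothetical class, not the wasb stream: it releases its buffer on close, so an unguarded write after close would hit a null reference. The explicit check turns that into an IOException which callers already handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ClosedStreamDemo {

    /** Hypothetical stream whose buffer is released (set to null) on close. */
    static class BufferingStream extends OutputStream {
        private ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        private void checkOpen() throws IOException {
            if (buffer == null) {
                throw new IOException("Stream is closed");
            }
        }

        @Override
        public void write(int b) throws IOException {
            checkOpen();      // without this check, buffer.write() would raise an NPE
            buffer.write(b);
        }

        @Override
        public void close() {
            buffer = null;    // idempotent: a second close() is harmless
        }
    }

    /** A typical caller which swallows IOExceptions but does not expect NPEs. */
    static boolean writeQuietly(OutputStream out, int b) {
        try {
            out.write(b);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        BufferingStream out = new BufferingStream();
        System.out.println(writeQuietly(out, 1));   // stream is open
        out.close();
        out.close();                                // double close is tolerated
        System.out.println(writeQuietly(out, 2));   // rejected with an IOException
    }
}
```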

abfs:
  When rethrowing an IOException in the close() call, explicitly wrap it
  with a new instance of the same subclass.
  This is needed to handle failures in try-with-resources clauses, where
  any exception raised in close() is added as a suppressed exception to the one
  thrown in the try {} clause
  *and you cannot attach the same exception to itself*
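
The self-suppression problem can be reproduced with the JDK alone. In this sketch FailingResource is a hypothetical resource, not the abfs stream, and it wraps with a plain IOException where the patch describes using the same subclass. When close() rethrows the very instance already thrown from the try block, Throwable.addSuppressed rejects it with an IllegalArgumentException; wrapping in a new instance avoids that.

```java
import java.io.Closeable;
import java.io.IOException;

public class SelfSuppressionDemo {

    /** Hypothetical resource whose close() rethrows an earlier failure. */
    static class FailingResource implements Closeable {
        private final IOException failure;
        private final boolean wrap;

        FailingResource(IOException failure, boolean wrap) {
            this.failure = failure;
            this.wrap = wrap;
        }

        @Override
        public void close() throws IOException {
            if (wrap) {
                // New instance, original kept as the cause.
                throw new IOException(failure.getMessage(), failure);
            }
            throw failure;    // same instance as the one thrown in the try block
        }
    }

    /** Returns a description of what escapes the try-with-resources statement. */
    static String run(boolean wrap) {
        IOException failure = new IOException("write failed");
        try (FailingResource resource = new FailingResource(failure, wrap)) {
            throw failure;
        } catch (IOException e) {
            return "IOException with " + e.getSuppressed().length + " suppressed";
        } catch (IllegalArgumentException e) {
            // Raised by addSuppressed() when asked to suppress an exception in itself.
            return "IllegalArgumentException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Without wrapping, the original IOException is lost and callers see an unrelated IllegalArgumentException; with wrapping, the IOException propagates and carries the close() failure as a suppressed exception.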

Contributed by Steve Loughran.

Change-Id: Ic44b494ff5da332b47d6c198ceb67b965d34dd1b
2020-01-08 12:04:11 +00:00
Steve Loughran 5410732cff
HADOOP-16775. DistCp reuses the same temp file within the task for different files.
Contributed by Amir Shenavandeh.

This avoids overwrite consistency issues with S3 and other stores -though
given S3's copy operation is O(data), you are still best off using -direct
when distcp-ing to it.

Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
2020-01-02 15:37:55 +00:00
Akira Ajisaka 7201384e1b
HADOOP-16771. Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0. Contributed by Andras Bokor.
(cherry picked from commit f777cd398f)
2019-12-20 13:12:02 +09:00
Mingliang Liu d19981fe48
HADOOP-16758. Refine testing.md to tell user better how to use auth-keys.xml (#1753)
Contributed by Mingliang Liu
2019-12-11 11:54:12 -08:00
Sneha Vijayarajan aa9cd0a2d6
HADOOP-16660. ABFS: Make RetryCount in ExponentialRetryPolicy Configurable.
Contributed by Sneha Vijayarajan.
2019-12-08 21:32:13 -08:00
bilaharith c225efe237
HADOOP-16455. ABFS: Implement FileSystem.access() method.
Contributed by Bilahari T H.
2019-12-08 21:32:02 -08:00
Jeetesh Mangwani b1e748f45b
HADOOP-16612. Track Azure Blob File System client-perceived latency
Contributed by Jeetesh Mangwani.

This adds the ability to track the end-to-end performance of ADLS Gen 2 REST APIs by measuring latency in the Hadoop ABFS driver.
The latency information is sent back to the ADLS Gen 2 REST API endpoints in the subsequent requests.
2019-12-08 21:31:51 -08:00
bilaharith ffeb6d8ece
HADOOP-16587. Make ABFS AAD endpoints configurable.
Contributed by Bilahari T H.

This also addresses HADOOP-16498: AzureADAuthenticator cannot authenticate
in China.

Change-Id: I2441dd48b50b59b912b0242f7f5a4418cf94a87c
2019-12-08 21:31:39 -08:00
Sneha Vijayarajan 8b2c7e0c4d
HADOOP-16578 : Avoid FileSystem API calls when FileSystem already exists 2019-12-08 21:31:24 -08:00
Sneha Vijayarajan 546db6428e
HADOOP-16548 : Disable Flush() over config 2019-12-08 21:31:08 -08:00
Mingliang Liu 8a60429e0b
HADOOP-16735. Make it clearer in config default that EnvironmentVariableCredentialsProvider supports AWS_SESSION_TOKEN. Contributed by Mingliang Liu
This closes #1733
2019-12-05 17:50:28 -08:00
Szilard Nemeth 62622ab9c1 YARN-9836. General usability improvements in showSimulationTrace.html. Contributed by Adam Antal 2019-11-19 21:21:17 +01:00
Andras Bokor 89e95370a4 HADOOP-16710. Testing_azure.md documentation is misleading.
Contributed by Andras Bokor.

Change-Id: Icf07a53145936953629c7dace2e9648b7b21588d
2019-11-17 17:06:10 +00:00
Siyao Meng e0cf1735e1 HADOOP-16676. Backport HADOOP-16152 to branch-3.2. Contributed by Siyao Meng.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2019-11-12 11:38:42 -08:00
Da Zhou fe96407451
HADOOP-16640. WASB: Override getCanonicalServiceName() to return URI
(cherry picked from commit 9a8edb0aed)
2019-10-16 14:27:11 -07:00
Rohith Sharma K S 7d5bb2ebb7 Preparing for 3.2.2-SNAPSHOT development. 2019-09-07 08:52:08 +05:30
bilaharith 3b3c0c4b87 HADOOP-16479. ABFS FileStatus.getModificationTime returns localized time instead of UTC.
Contributed by Bilahari T H

Change-Id: I532055baaadfd7c324710e4b25f60cdf0378bdc0
2019-08-27 19:08:38 +00:00
Robert Levas ce23e971b4 HADOOP-16340. ABFS driver continues to retry on IOException responses from REST operations.
Contributed by Robert Levas.

This makes the HttpException constructor protected rather than public, so it is possible
to implement custom subclasses of this exception -exceptions which will not be retried.

Change-Id: Ie8aaa23a707233c2db35948784908b6778ff3a8f
2019-08-27 19:08:29 +00:00
Da Zhou a6d50a9054 HADOOP-16376. ABFS: Override access() to no-op.
Contributed by Da Zhou.

Change-Id: Ia0024bba32250189a87eb6247808b2473c331ed0
2019-08-27 19:04:16 +00:00
Da Zhou dd636127e9 HADOOP-16269. ABFS: add listFileStatus with StartFrom.
Author:    Da Zhou
2019-08-27 19:01:21 +00:00
Da Zhou 006ae258b3 HADOOP-16163. NPE in setup/teardown of ITestAbfsDelegationTokens.
Contributed by Da Zhou.

Signed-off-by: Steve Loughran <stevel@apache.org>
2019-08-27 19:01:21 +00:00
Akira Ajisaka afb3f329fd
YARN-9774. Fix order of arguments for assertEquals in TestSLSUtils. Contributed by Nikhil Navadiya.
(cherry picked from commit 84b1982060)
2019-08-23 14:40:15 +09:00
bibinchundatt 69255fa1b9 YARN-9765. SLS runner crashes when run with metrics turned off. Contributed by Abhishek Modi.
(cherry picked from commit 10ec31d20e)
2019-08-21 13:57:53 +05:30
KAI XIE b3c14d4132 HADOOP-16158. DistCp to support checksum validation when copy blocks in parallel (#919)
* DistCp to support checksum validation when copy blocks in parallel

* address review comments

* add checksums comparison test for combine mode

(cherry picked from commit c765584eb2)
2019-08-18 18:48:21 -07:00
Da Zhou 330e450397
HADOOP-16315. ABFS: transform full UPN for named user in AclStatus
Contributed by Da Zhou

Change-Id: Ibc78322415fcbeff89c06c8586c53f5695550290
2019-08-12 09:41:52 +08:00
Ayush Saxena 35ff1ce42c HADOOP-16440. Distcp can not preserve timestamp with -delete option. Contributed by ludun. 2019-07-20 13:29:45 +05:30
Arun Singh 5f2d07af1b
HADOOP-16404. ABFS default blocksize change (256MB from 512MB)
Contributed by: Arun Singh
2019-07-19 20:34:28 -07:00
Masatake Iwasaki b6718c754a HADOOP-16401. ABFS: port Azure doc to 3.2 branch.
Signed-off-by: Masatake Iwasaki <iwasakims@apache.org>
2019-07-10 17:16:43 +09:00
Takanobu Asanuma 6dffad028e HDFS-12564. Add the documents of swebhdfs configurations on the client side. Contributed by Takanobu Asanuma.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
(cherry picked from commit 98d2065643)
2019-06-20 20:17:45 -07:00
DadanielZ 9c8e40fbdb
HADOOP-16251. ABFS: add FSMainOperationsBaseTest. Re-commit to fix git metadata.
Author: Da Zhou
(cherry picked from commit ff27e8eabd)
2019-06-07 18:09:38 +01:00
Da Zhou bf0bb2470f
HADOOP-16242. ABFS: add bufferpool to AbfsOutputStream.
Contributed by Da Zhou.

(cherry picked from commit 1cef194a28)
2019-06-07 18:09:38 +01:00
Vishwajeet Dusane 907a016142
HADOOP-16182. Update abfs storage back-end with "close" flag when application is done writing to a file.
Contributed by Vishwajeet Dusane.

(cherry picked from commit 1edf1914ac)
2019-06-07 18:09:37 +01:00
Shweta Yakkali 6b115966bc
HADOOP-16157. [Clean-up] Remove NULL check before instanceof in AzureNativeFileSystemStore
(Contributed by Shweta Yakkali via Daniel Templeton)

Change-Id: I6269ae66378e46eed440a76f847ae1af1fa95450
(cherry picked from commit bb8ad096e7)
2019-06-07 18:09:37 +01:00
Shweta Yakkali 57c6060c3a
HADOOP-15860. ABFS: Throw exception when directory / file name ends with a period (.).
Contributed by Shweta Yakkali.

(cherry picked from commit 13f0ee21f2)

Change-Id: Ibd010d2e6adc15f53a9c5357482e57313bf84d2e
2019-06-07 18:09:37 +01:00
Da Zhou 3593b66693
HADOOP-15823. ABFS: Stop requiring client ID and tenant ID for MSI
(Contributed by Da Zhou via Daniel Templeton)

Change-Id: I546ab3a1df1efec635c08c388148e718dc4a9843
(cherry picked from commit e374584479)
2019-06-07 18:09:37 +01:00
Denes Gerencser ede5cbd707
HADOOP-16174. Disable wildfly logs to the console.
Follow-on to HADOOP-15851.

Author:    Denes Gerencser <dgerencser@cloudera.com>
(cherry picked from commit ddede7ae6f)
2019-06-07 18:09:37 +01:00