Commit Graph

312 Commits

Author SHA1 Message Date
bilaharith 85119267be
HADOOP-17166. ABFS: configure output stream thread pool (#2179)
Adds the options to control the size of the per-output-stream threadpool
when writing data through the abfs connector

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H
2020-09-09 16:41:36 +01:00
Mehakmeet Singh 0d855159f0
HADOOP-17229. No updation of bytes received counter value after response failure occurs in ABFS (#2264)
Contributed by Mehakmeet Singh
2020-09-08 10:14:23 +01:00
Mehakmeet Singh 84ed6adccc
HADOOP-17158. Test timeout for ITestAbfsInputStreamStatistics#testReadAheadCounters (#2272)
Contributed by: Mehakmeet Singh.
2020-09-08 10:11:06 +01:00
Mehakmeet Singh d1c60a53f6
HADOOP-17194. Adding Context class for AbfsClient in ABFS (#2216)
Contributed by Mehakmeet Singh.
2020-08-27 11:27:00 +01:00
bilaharith 64f36b9543
HADOOP-16915. ABFS: Ignoring the test ITestAzureBlobFileSystemRandomRead.testRandomReadPerformance
- Contributed by Bilahari T H
2020-08-24 12:00:55 -07:00
Sneha Vijayarajan b367942fe4
Upgrade store REST API version to 2019-12-12
- Contributed by Sneha Vijayarajan
2020-08-17 10:17:18 -07:00
bilaharith a2610e21ed
HADOOP-17183. ABFS: Enabling checkaccess on ABFS
- Contributed by Bilahari T H
2020-08-06 14:52:02 -07:00
bilaharith 3f73facd7b
HADOOP-17149. ABFS: Fixing the testcase ITestGetNameSpaceEnabled
- Contributed by Bilahari T H
2020-08-05 10:01:04 -07:00
bilaharith c566cabd62
HADOOP-17163. ABFS: Adding debug log for rename failures
- Contributed by Bilahari T H
2020-08-05 09:38:13 -07:00
bilaharith a7fda2e38f
HADOOP-17137. ABFS: Makes the test cases in ITestAbfsNetworkStatistics agnostic
- Contributed by Bilahari T H
2020-07-31 12:27:57 -07:00
Mehakmeet Singh 48a7c5b6ba
HADOOP-17113. Adding ReadAhead Counters in ABFS (#2154)
Contributed by Mehakmeet Singh
2020-07-22 18:22:30 +01:00
Sneha Vijayarajan d23cc9d85d
Hadoop 17132. ABFS: Fix Rename and Delete Idempotency check trigger
- Contributed by Sneha Vijayarajan
2020-07-21 09:22:38 -07:00
bilaharith b4b23ef0d1
HADOOP-17092. ABFS: Making AzureADAuthenticator.getToken() throw HttpException
- Contributed by Bilahari T H
2020-07-21 09:18:54 -07:00
Steve Loughran 9f407bcc88
HADOOP-17107. hadoop-azure parallel tests not working on recent JDKs (#2118)
Contributed by Steve Loughran.
2020-07-20 10:51:26 +01:00
bilaharith 99655167f3
HADOOP-16682. ABFS: Removing unnecessary toString() invocations
- Contributed by Bilahari T H
2020-07-18 10:00:18 -07:00
Mehakmeet Singh 4083fd57b5
HADOOP-17129. Validating storage keys in ABFS correctly (#2141)
Contributed by Mehakmeet Singh
2020-07-16 17:29:37 +01:00
Anoop Sam John 380e0f4506
HADOOP-16998. WASB : NativeAzureFsOutputStream#close() throwing IllegalArgumentException (#2073)
Contributed by Anoop Sam John.
2020-07-14 14:07:27 +01:00
ishaniahuja d20109c171
HADOOP-17058. ABFS: Support for AppendBlob in Hadoop ABFS Driver
- Contributed by Ishani Ahuja
2020-07-04 13:25:14 -07:00
bilaharith e0cededfbd
HADOOP-17086. ABFS: Making the ListStatus response ignore unknown properties. (#2101)
Contributed by Bilahari T H.
2020-07-03 19:00:22 +01:00
Mehakmeet Singh 3b5c9a90c0
HADOOP-16961. ABFS: Adding metrics to AbfsInputStream (#2076)
Contributed by Mehakmeet Singh.
2020-07-03 11:41:35 +01:00
Thomas Marquardt 4b5b54c73f
HADOOP-17089: WASB: Update azure-storage-java SDK
Contributed by Thomas Marquardt

DETAILS: WASB depends on the Azure Storage Java SDK. There is a concurrency
bug in the Azure Storage Java SDK that can cause the results of a list blobs
operation to appear empty. This causes the Filesystem listStatus and similar
APIs to return empty results. This has been seen in Spark work loads when jobs
use more than one executor core.

See Azure/azure-storage-java#546 for details on the bug in the Azure Storage SDK.

TESTS: A new test was added to validate the fix. All tests are passing:

wasb:
mvn -T 1C -Dparallel-tests=wasb -Dscale -DtestsThreadCount=8 clean verify
Tests run: 248, Failures: 0, Errors: 0, Skipped: 11
Tests run: 651, Failures: 0, Errors: 0, Skipped: 65

abfs:
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 64, Failures: 0, Errors: 0, Skipped: 0
Tests run: 437, Failures: 0, Errors: 0, Skipped: 33
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24
2020-06-25 02:32:42 +00:00
Mehakmeet Singh 3472c3efc0
HADOOP-17065. Add Network Counters to ABFS (#2056)
Contributed by Mehakmeet Singh.
2020-06-19 14:03:49 +01:00
Thomas Marquardt caf3995ac2
HADOOP-17076: ABFS: Delegation SAS Generator Updates
Contributed by Thomas Marquardt.

DETAILS:
1) The authentication version in the service has been updated from Dec19 to Feb20, so need to update the client.
2) Add support and test cases for getXattr and setXAttr.
3) Update DelegationSASGenerator and related to use Duration instead of int for time periods.
4) Cleanup DelegationSASGenerator switch/case statement that maps operations to permissions.
5) Cleanup SASGenerator classes to use String.equals instead of ==.

TESTS:
Added tests for getXAttr and setXAttr.

All tests are passing against my account in eastus2euap:

 $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
 Tests run: 76, Failures: 0, Errors: 0, Skipped: 0
 Tests run: 441, Failures: 0, Errors: 0, Skipped: 33
 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24
2020-06-18 02:07:08 +00:00
Mehakmeet Singh 7f486f0258
HADOOP-17016. Adding Common Counters in ABFS (#1991).
Contributed by: Mehakmeet Singh.

Change-Id: Ib84e7a42f28e064df4c6204fcce33e573360bf42
2020-06-02 18:31:35 +01:00
Karthik Amarnath b2200a33a6
HDFS-15168: ABFS enhancement to translate AAD to Linux identities. (#1978) 2020-05-28 19:00:23 -07:00
Sneha Vijayarajan 4c5cd751e3
HADOOP-17053. ABFS: Fix Account-specific OAuth config setting parsing
Contributed by Sneha Vijayarajan
2020-05-27 13:56:09 -07:00
Sneha Vijayarajan 53b993e604
HADOOP-16852: Report read-ahead error back
Contributed by Sneha Vijayarajan
2020-05-27 13:51:42 -07:00
Sneha Vijayarajan 37b1b4799d
HADOOP-17054. ABFS: Fix test AbfsClient authentication instance
Contributed by Sneha Vijayarajan
2020-05-26 15:26:28 -07:00
bilaharith d2f7133c62
HADOOP-17004. Fixing a formatting issue
Contributed by Bilahari T H.
2020-05-20 11:51:48 -07:00
Sneha Vijayarajan 8f78aeb250
Hadoop-17015. ABFS: Handling Rename and Delete idempotency
Contributed by Sneha Vijayarajan.
2020-05-19 12:30:07 -07:00
bilaharith bdbd59cfa0
HADOOP-17004. ABFS: Improve the ABFS driver documentation
Contributed by Bilahari T H.
2020-05-18 20:45:54 -07:00
Thomas Marquardt b214bbd2d9
HADOOP-16916: ABFS: Delegation SAS generator for integration with Ranger
Contributed by Thomas Marquardt.

DETAILS:

Previously we had a SASGenerator class which generated Service SAS, but we need to add DelegationSASGenerator.
I separated SASGenerator into a base class and two subclasses ServiceSASGenerator and DelegationSASGenreator.  The
code in ServiceSASGenerator is copied from SASGenerator but the DelegationSASGenrator code is new.  The
DelegationSASGenerator code demonstrates how to use Delegation SAS with minimal permissions, as would be used
by an authorization service such as Apache Ranger.  Adding this to the tests helps us lock in this behavior.

Added a MockDelegationSASTokenProvider for testing User Delegation SAS.

Fixed the ITestAzureBlobFileSystemCheckAccess tests to assume oauth client ID so that they are ignored when that
is not configured.

To improve performance, AbfsInputStream/AbfsOutputStream re-use SAS tokens until the expiry is within 120 seconds.
After this a new SAS will be requested.  The default period of 120 seconds can be changed using the configuration
setting "fs.azure.sas.token.renew.period.for.streams".

The SASTokenProvider operation names were updated to correspond better with the ADLS Gen2 REST API, since these
operations must be provided tokens with appropriate SAS parameters to succeed.

Support for the version 2.0 AAD authentication endpoint was added to AzureADAuthenticator.

The getFileStatus method was mistakenly calling the ADLS Gen2 Get Properties API which requires read permission
while the getFileStatus call only requires execute permission.  ADLS Gen2 Get Status API is supposed to be used
for this purpose, so the underlying AbfsClient.getPathStatus API was updated with a includeProperties
parameter which is set to false for getFileStatus and true for getXAttr.

Added SASTokenProvider support for delete recursive.

Fixed bugs in AzureBlobFileSystem where public methods were not validating the Path by calling makeQualified.  This is
necessary to avoid passing null paths and to convert relative paths into absolute paths.

Canonicalized the path used for root path internally so that root path can be used with SAS tokens, which requires
that the path in the URL and the path in the SAS token match.  Internally the code was using
"//" instead of "/" for the root path, sometimes.  Also related to this, the AzureBlobFileSystemStore.getRelativePath
API was updated so that we no longer remove and then add back a preceding forward / to paths.

To run ITestAzureBlobFileSystemDelegationSAS tests follow the instructions in testing_azure.md under the heading
"To run Delegation SAS test cases".  You also need to set "fs.azure.enable.check.access" to true.

TEST RESULTS:

namespace.enabled=true
auth.type=SharedKey
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 0, Skipped: 41
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=false
auth.type=SharedKey
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 0, Skipped: 244
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=SharedKey
sas.token.provider.type=MockDelegationSASTokenProvider
enable.check.access=true
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 0, Skipped: 33
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=OAuth
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 1, Skipped: 74
Tests run: 206, Failures: 0, Errors: 0, Skipped: 140
2020-05-12 18:35:38 +00:00
Mehakmeet Singh 192cad9ee2
HADOOP-17018. Intermittent failing of ITestAbfsStreamStatistics in ABFS (#1990)
Contributed by: Mehakmeet Singh

In some cases, ABFS-prefetch thread runs in the background which returns some bytes from the buffer and gives an extra readOp. Thus, making readOps values arbitrary and giving intermittent failures in some cases. Hence, readOps values of 2 or 3 are seen in different setups.
2020-05-07 12:15:28 +01:00
bilaharith 30ef8d0f1a
HADOOP-17002. ABFS: Adding config to determine if the account is HNS enabled or not
Contributed by Bilahari T H.
2020-04-23 17:46:18 -07:00
Mehakmeet Singh 459eb2ad6d
HADOOP-16914 Adding Output Stream Counters in ABFS (#1899)
Contributed by Mehakmeet Singh.There
2020-04-23 13:35:39 +01:00
Sneha Vijayarajan 3d69383c26
Hadoop 16857. ABFS: Stop CustomTokenProvider retry logic to depend on AbfsRestOp retry policy
Contributed by Sneha Vijayarajan
2020-04-21 21:39:48 -07:00
bilaharith 264e49c8f2
HADOOP-16922. ABFS: Change User-Agent header (#1938)
Contributed by Bilahari T H.
2020-04-21 17:37:40 +01:00
Mukund Thakur 8031c66295
HADOOP-16965. Refactor abfs stream configuration. (#1956)
Contributed by Mukund Thakur.
2020-04-21 17:27:29 +01:00
bilaharith 0ad0102678
HADOOP-16855. Changing wildfly dependency scope in hadoop-azure to compile
Contributed by Biliharith
2020-04-14 19:17:12 +01:00
Mehakmeet Singh c734d247b1
HADOOP-16910 : ABFS Streams to update FileSystem.Statistics counters on IO. (#1918). Contributed by Mehakmeet Singh. 2020-03-31 14:49:09 +02:00
Brahma Reddy Battula 8914cf9167 Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
Steve Loughran 745a6c1e69
Revert "HADOOP-16818. ABFS: Combine append+flush calls for blockblob & appendblob"
This reverts commit 3612317038.

Change-Id: Ie0d36f25de0b55a937894f4d9963c495bae0576a
2020-03-26 15:24:37 +00:00
Steve Loughran 28afdce009
Revert ""HADOOP-16910. ABFS Streams to update FileSystem.Statistics counters on IO."
This reverts commit e2c7ac71b5.

Change-Id: I5b5a93f5a36cdb0c3d56d1b3f747c318f089de20
2020-03-24 12:11:18 +00:00
Mehakmeet Singh e2c7ac71b5
ABFS Streams to update FileSystem.Statistics counters on IO.
Contributed by Mehakmeet Singh
2020-03-23 13:50:18 +00:00
ishaniahuja 3612317038
HADOOP-16818. ABFS: Combine append+flush calls for blockblob & appendblob
Contributed by Ishani Ahuja.
2020-03-20 10:27:41 +00:00
bilaharith 6ce5f8734f
HADOOP-16920 ABFS: Make list page size configurable.
Contributed by Bilahari T H.

The page limit is set in "fs.azure.list.max.results"; default value is 500. 

There's currently a limit of 5000 in the store -there are no range checks
in the client code so that limit can be changed on the server without
any need to update the abfs connector.
2020-03-18 14:14:18 +00:00
bilaharith 0b931f36ec
Hadoop 16890. Change in expiry calculation for MSI token provider.
Contributed by Bilahari T H
2020-03-11 20:39:10 +00:00
Steve Loughran d4d4c37810
HADOOP-14630 Contract Tests to verify create, mkdirs and rename under a file is forbidden
Contributed by Steve Loughran.

Not all stores do complete validation here; in particular the S3A
Connector does not: checking up the entire directory tree to see if a path matches
is a file significantly slows things down.

This check does take place in S3A mkdirs(), which walks backwards up the list of
parent paths until it finds a directory (success) or a file (failure).
In practice production applications invariably create destination directories
before writing 1+ file into them -restricting check purely to the mkdirs()
call deliver significant speed up while implicitly including the checks.

Change-Id: I2c9df748e92b5655232e7d888d896f1868806eb0
2020-03-09 14:44:28 +00:00
Sneha Vijayarajan 791270a2e5
HADOOP-16730: ABFS: Support for Shared Access Signatures (SAS). Contributed by Sneha Vijayarajan. 2020-02-27 18:27:22 +00:00
Sahil Takiar 42dfd270a1
HADOOP-16859: ABFS: Add unbuffer support to ABFS connector.
Contributed by Sahil Takiar
2020-02-24 16:28:00 +00:00