hadoop

Commit Graph

Author	SHA1	Message	Date
Thomas Marquardt	4b5b54c73f	HADOOP-17089: WASB: Update azure-storage-java SDK Contributed by Thomas Marquardt DETAILS: WASB depends on the Azure Storage Java SDK. There is a concurrency bug in the Azure Storage Java SDK that can cause the results of a list blobs operation to appear empty. This causes the Filesystem listStatus and similar APIs to return empty results. This has been seen in Spark work loads when jobs use more than one executor core. See Azure/azure-storage-java#546 for details on the bug in the Azure Storage SDK. TESTS: A new test was added to validate the fix. All tests are passing: wasb: mvn -T 1C -Dparallel-tests=wasb -Dscale -DtestsThreadCount=8 clean verify Tests run: 248, Failures: 0, Errors: 0, Skipped: 11 Tests run: 651, Failures: 0, Errors: 0, Skipped: 65 abfs: mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 64, Failures: 0, Errors: 0, Skipped: 0 Tests run: 437, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-25 02:32:42 +00:00
Mehakmeet Singh	3472c3efc0	HADOOP-17065. Add Network Counters to ABFS (#2056 ) Contributed by Mehakmeet Singh.	2020-06-19 14:03:49 +01:00
Thomas Marquardt	caf3995ac2	HADOOP-17076: ABFS: Delegation SAS Generator Updates Contributed by Thomas Marquardt. DETAILS: 1) The authentication version in the service has been updated from Dec19 to Feb20, so need to update the client. 2) Add support and test cases for getXattr and setXAttr. 3) Update DelegationSASGenerator and related to use Duration instead of int for time periods. 4) Cleanup DelegationSASGenerator switch/case statement that maps operations to permissions. 5) Cleanup SASGenerator classes to use String.equals instead of ==. TESTS: Added tests for getXAttr and setXAttr. All tests are passing against my account in eastus2euap: $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 76, Failures: 0, Errors: 0, Skipped: 0 Tests run: 441, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-18 02:07:08 +00:00
Mehakmeet Singh	7f486f0258	HADOOP-17016. Adding Common Counters in ABFS (#1991 ). Contributed by: Mehakmeet Singh. Change-Id: Ib84e7a42f28e064df4c6204fcce33e573360bf42	2020-06-02 18:31:35 +01:00
Karthik Amarnath	b2200a33a6	HDFS-15168: ABFS enhancement to translate AAD to Linux identities. (#1978 )	2020-05-28 19:00:23 -07:00
Sneha Vijayarajan	4c5cd751e3	HADOOP-17053. ABFS: Fix Account-specific OAuth config setting parsing Contributed by Sneha Vijayarajan	2020-05-27 13:56:09 -07:00
Sneha Vijayarajan	53b993e604	HADOOP-16852: Report read-ahead error back Contributed by Sneha Vijayarajan	2020-05-27 13:51:42 -07:00
Sneha Vijayarajan	37b1b4799d	HADOOP-17054. ABFS: Fix test AbfsClient authentication instance Contributed by Sneha Vijayarajan	2020-05-26 15:26:28 -07:00
bilaharith	d2f7133c62	HADOOP-17004. Fixing a formatting issue Contributed by Bilahari T H.	2020-05-20 11:51:48 -07:00
Sneha Vijayarajan	8f78aeb250	Hadoop-17015. ABFS: Handling Rename and Delete idempotency Contributed by Sneha Vijayarajan.	2020-05-19 12:30:07 -07:00
bilaharith	bdbd59cfa0	HADOOP-17004. ABFS: Improve the ABFS driver documentation Contributed by Bilahari T H.	2020-05-18 20:45:54 -07:00
Thomas Marquardt	b214bbd2d9	HADOOP-16916: ABFS: Delegation SAS generator for integration with Ranger Contributed by Thomas Marquardt. DETAILS: Previously we had a SASGenerator class which generated Service SAS, but we need to add DelegationSASGenerator. I separated SASGenerator into a base class and two subclasses ServiceSASGenerator and DelegationSASGenreator. The code in ServiceSASGenerator is copied from SASGenerator but the DelegationSASGenrator code is new. The DelegationSASGenerator code demonstrates how to use Delegation SAS with minimal permissions, as would be used by an authorization service such as Apache Ranger. Adding this to the tests helps us lock in this behavior. Added a MockDelegationSASTokenProvider for testing User Delegation SAS. Fixed the ITestAzureBlobFileSystemCheckAccess tests to assume oauth client ID so that they are ignored when that is not configured. To improve performance, AbfsInputStream/AbfsOutputStream re-use SAS tokens until the expiry is within 120 seconds. After this a new SAS will be requested. The default period of 120 seconds can be changed using the configuration setting "fs.azure.sas.token.renew.period.for.streams". The SASTokenProvider operation names were updated to correspond better with the ADLS Gen2 REST API, since these operations must be provided tokens with appropriate SAS parameters to succeed. Support for the version 2.0 AAD authentication endpoint was added to AzureADAuthenticator. The getFileStatus method was mistakenly calling the ADLS Gen2 Get Properties API which requires read permission while the getFileStatus call only requires execute permission. ADLS Gen2 Get Status API is supposed to be used for this purpose, so the underlying AbfsClient.getPathStatus API was updated with a includeProperties parameter which is set to false for getFileStatus and true for getXAttr. Added SASTokenProvider support for delete recursive. Fixed bugs in AzureBlobFileSystem where public methods were not validating the Path by calling makeQualified. This is necessary to avoid passing null paths and to convert relative paths into absolute paths. Canonicalized the path used for root path internally so that root path can be used with SAS tokens, which requires that the path in the URL and the path in the SAS token match. Internally the code was using "//" instead of "/" for the root path, sometimes. Also related to this, the AzureBlobFileSystemStore.getRelativePath API was updated so that we no longer remove and then add back a preceding forward / to paths. To run ITestAzureBlobFileSystemDelegationSAS tests follow the instructions in testing_azure.md under the heading "To run Delegation SAS test cases". You also need to set "fs.azure.enable.check.access" to true. TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 0, Skipped: 41 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=false auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 0, Skipped: 244 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=SharedKey sas.token.provider.type=MockDelegationSASTokenProvider enable.check.access=true ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 1, Skipped: 74 Tests run: 206, Failures: 0, Errors: 0, Skipped: 140	2020-05-12 18:35:38 +00:00
Mehakmeet Singh	192cad9ee2	HADOOP-17018. Intermittent failing of ITestAbfsStreamStatistics in ABFS (#1990 ) Contributed by: Mehakmeet Singh In some cases, ABFS-prefetch thread runs in the background which returns some bytes from the buffer and gives an extra readOp. Thus, making readOps values arbitrary and giving intermittent failures in some cases. Hence, readOps values of 2 or 3 are seen in different setups.	2020-05-07 12:15:28 +01:00
bilaharith	30ef8d0f1a	HADOOP-17002. ABFS: Adding config to determine if the account is HNS enabled or not Contributed by Bilahari T H.	2020-04-23 17:46:18 -07:00
Mehakmeet Singh	459eb2ad6d	HADOOP-16914 Adding Output Stream Counters in ABFS (#1899 ) Contributed by Mehakmeet Singh.There	2020-04-23 13:35:39 +01:00
Sneha Vijayarajan	3d69383c26	Hadoop 16857. ABFS: Stop CustomTokenProvider retry logic to depend on AbfsRestOp retry policy Contributed by Sneha Vijayarajan	2020-04-21 21:39:48 -07:00
bilaharith	264e49c8f2	HADOOP-16922. ABFS: Change User-Agent header (#1938 ) Contributed by Bilahari T H.	2020-04-21 17:37:40 +01:00
Mukund Thakur	8031c66295	HADOOP-16965. Refactor abfs stream configuration. (#1956 ) Contributed by Mukund Thakur.	2020-04-21 17:27:29 +01:00
bilaharith	0ad0102678	HADOOP-16855. Changing wildfly dependency scope in hadoop-azure to compile Contributed by Biliharith	2020-04-14 19:17:12 +01:00
Mehakmeet Singh	c734d247b1	HADOOP-16910 : ABFS Streams to update FileSystem.Statistics counters on IO. (#1918 ). Contributed by Mehakmeet Singh.	2020-03-31 14:49:09 +02:00
Brahma Reddy Battula	8914cf9167	Preparing for 3.4.0 development	2020-03-29 23:24:25 +05:30
Steve Loughran	745a6c1e69	Revert "HADOOP-16818. ABFS: Combine append+flush calls for blockblob & appendblob" This reverts commit `3612317038`. Change-Id: Ie0d36f25de0b55a937894f4d9963c495bae0576a	2020-03-26 15:24:37 +00:00
Steve Loughran	28afdce009	Revert ""HADOOP-16910. ABFS Streams to update FileSystem.Statistics counters on IO." This reverts commit `e2c7ac71b5`. Change-Id: I5b5a93f5a36cdb0c3d56d1b3f747c318f089de20	2020-03-24 12:11:18 +00:00
Mehakmeet Singh	e2c7ac71b5	ABFS Streams to update FileSystem.Statistics counters on IO. Contributed by Mehakmeet Singh	2020-03-23 13:50:18 +00:00
ishaniahuja	3612317038	HADOOP-16818. ABFS: Combine append+flush calls for blockblob & appendblob Contributed by Ishani Ahuja.	2020-03-20 10:27:41 +00:00
bilaharith	6ce5f8734f	HADOOP-16920 ABFS: Make list page size configurable. Contributed by Bilahari T H. The page limit is set in "fs.azure.list.max.results"; default value is 500. There's currently a limit of 5000 in the store -there are no range checks in the client code so that limit can be changed on the server without any need to update the abfs connector.	2020-03-18 14:14:18 +00:00
bilaharith	0b931f36ec	Hadoop 16890. Change in expiry calculation for MSI token provider. Contributed by Bilahari T H	2020-03-11 20:39:10 +00:00
Steve Loughran	d4d4c37810	HADOOP-14630 Contract Tests to verify create, mkdirs and rename under a file is forbidden Contributed by Steve Loughran. Not all stores do complete validation here; in particular the S3A Connector does not: checking up the entire directory tree to see if a path matches is a file significantly slows things down. This check does take place in S3A mkdirs(), which walks backwards up the list of parent paths until it finds a directory (success) or a file (failure). In practice production applications invariably create destination directories before writing 1+ file into them -restricting check purely to the mkdirs() call deliver significant speed up while implicitly including the checks. Change-Id: I2c9df748e92b5655232e7d888d896f1868806eb0	2020-03-09 14:44:28 +00:00
Sneha Vijayarajan	791270a2e5	HADOOP-16730: ABFS: Support for Shared Access Signatures (SAS). Contributed by Sneha Vijayarajan.	2020-02-27 18:27:22 +00:00
Sahil Takiar	42dfd270a1	HADOOP-16859: ABFS: Add unbuffer support to ABFS connector. Contributed by Sahil Takiar	2020-02-24 16:28:00 +00:00
Steve Loughran	e3bba5fa22	HADOOP-16706. ITestClientUrlScheme fails for accounts which don't support HTTP Adds a new service code to recognise accounts without HTTP support; catches that and considers such a responset a successful validation of the ability of the client to switch to http when the test parameters expect that. Contributed by Steve Loughran	2020-02-21 11:13:38 +00:00
bilaharith	5944d28130	HADOOP-16825: ITestAzureBlobFileSystemCheckAccess failing. Contributed by Bilahari T H.	2020-02-06 18:48:00 +00:00
Sneha Vijayarajan	55f2421580	HADOOP-16845: Disable ITestAbfsClient.testContinuationTokenHavingEqualSign due to ADLS Gen2 service bug. Contributed by Sneha Vijayarajan.	2020-02-06 18:41:06 +00:00
Karthick Narendran	978c487672	HADOOP-16826. ABFS: update abfs.md to include config keys for identity transformation Contributed by Karthick Narendran	2020-01-23 20:35:57 -08:00
Sahil Takiar	f206b736f0	HADOOP-16346. Stabilize S3A OpenSSL support. Introduces `openssl` as an option for `fs.s3a.ssl.channel.mode`. The new option is documented and marked as experimental. For details on how to use this, consult the peformance document in the s3a documentation. This patch is the successor to HADOOP-16050 "S3A SSL connections should use OpenSSL" -which was reverted because of incompatibilities between the wildfly OpenSSL client and the AWS HTTPS servers (HADOOP-16347). With the Wildfly release moved up to 1.0.7.Final (HADOOP-16405) everything should now work. Related issues: * HADOOP-15669. ABFS: Improve HTTPS Performance * HADOOP-16050: S3A SSL connections should use OpenSSL * HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8 * HADOOP-16405: Upgrade Wildfly Openssl version to 1.0.7.Final Contributed by Sahil Takiar Change-Id: I80a4bc5051519f186b7383b2c1cea140be42444e	2020-01-21 16:37:51 +00:00
Steve Loughran	6a859d33aa	HADOOP-16785. followup to abfs close() fix. Adds one extra test to the ABFS close logic, to explicitly verify that the close sequence of FilterOutputStream is not going to fail. This is just a due-diligence patch, but it helps ensure that no regressions creep in in future. Contributed by Steve Loughran. Change-Id: Ifd33a8c322d32513411405b15f50a1aebcfa6e48	2020-01-20 16:23:41 +00:00
Clemens Wolff	c36f09deb9	HADOOP-16005. NativeAzureFileSystem does not support setXAttr. Contributed by Clemens Wolff.	2020-01-14 17:28:37 -08:00
Steve Loughran	17aa8f6764	HADOOP-16785. Improve wasb and abfs resilience on double close() calls. This hardens the wasb and abfs output streams' resilience to being invoked in/after close(). wasb: Explicity raise IOEs on operations invoked after close, rather than implicitly raise NPEs. This ensures that invocations which catch and swallow IOEs will perform as expected. abfs: When rethrowing an IOException in the close() call, explicitly wrap it with a new instance of the same subclass. This is needed to handle failures in try-with-resources clauses, where any exception in closed() is added as a suppressed exception to the one thrown in the try {} clause and you cannot attach the same exception to itself Contributed by Steve Loughran. Change-Id: Ic44b494ff5da332b47d6c198ceb67b965d34dd1b	2020-01-08 11:46:54 +00:00
Sneha Vijayarajan	d1f5976c00	HADOOP-16699. Add verbose TRACE logging to ABFS. Contributed by Sneha Vijayarajan, Change-Id: Ic616a10406e6e9f11616c9cc05d8630ebbedaf65	2020-01-07 18:05:47 +00:00
Akira Ajisaka	f777cd398f	HADOOP-16771. Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0. Contributed by Andras Bokor.	2019-12-20 13:10:26 +09:00
Sneha Vijayarajan	82ad9b549f	HADOOP-16660. ABFS: Make RetryCount in ExponentialRetryPolicy Configurable. Contributed by Sneha Vijayarajan.	2019-11-27 15:10:21 -08:00
bilaharith	9e69628f55	HADOOP-16455. ABFS: Implement FileSystem.access() method. Contributed by Bilahari T H.	2019-11-27 15:56:38 +00:00
Sneha Vijayarajan	de38045021	HADOOP-16687. ABFS: Fix testcase added for HADOOP-16138 for namespace enabled account. (#1701 )	2019-11-21 11:24:12 +09:00
Jeetesh Mangwani	b033c681e4	HADOOP-16612. Track Azure Blob File System client-perceived latency Contributed by Jeetesh Mangwani. This add the ability to track the end-to-end performance of ADLS Gen 2 REST APIs by measuring latency in the Hadoop ABFS driver. The latency information is sent back to the ADLS Gen 2 REST API endpoints in the subsequent requests.	2019-11-19 09:00:24 -08:00
Andras Bokor	96c4520f89	HADOOP-16710. Testing_azure.md documentation is misleading. Contributed by Andras Bokor. Change-Id: Icf07a53145936953629c7dace2e9648b7b21588d	2019-11-17 17:04:29 +00:00
Da Zhou	9a8edb0aed	HADOOP-16640. WASB: Override getCanonicalServiceName() to return URI	2019-10-16 13:14:15 -07:00
bilaharith	1a77a15fe4	HADOOP-16587. Make ABFS AAD endpoints configurable. Contributed by Bilahari T H. This also addresses HADOOP-16498: AzureADAuthenticator cannot authenticate in China. Change-Id: I2441dd48b50b59b912b0242f7f5a4418cf94a87c	2019-10-07 13:07:46 +01:00
Sneha Vijayarajan	770adc5d4a	HADOOP-16578 : Avoid FileSystem API calls when FileSystem already exists	2019-10-01 17:38:11 -07:00
Sneha Vijayarajan	c0edc848a8	HADOOP-16548 : Disable Flush() over config	2019-09-28 20:39:42 -07:00
Steve Loughran	e346e3638c	HADOOP-15691 Add PathCapabilities to FileSystem and FileContext. Contributed by Steve Loughran. This complements the StreamCapabilities Interface by allowing applications to probe for a specific path on a specific instance of a FileSystem client to offer a specific capability. This is intended to allow applications to determine * Whether a method is implemented before calling it and dealing with UnsupportedOperationException. * Whether a specific feature is believed to be available in the remote store. As well as a common set of capabilities defined in CommonPathCapabilities, file systems are free to add their own capabilities, prefixed with fs. + schema + . The plan is to identify and document more capabilities -and for file systems which add new features, for a declaration of the availability of the feature to always be available. Note * The remote store is not expected to be checked for the feature; It is more a check of client API and the client's configuration/knowledge of the state of the remote system. * Permissions are not checked. Change-Id: I80bfebe94f4a8bdad8f3ac055495735b824968f5	2019-09-25 12:16:41 +01:00

1 2 3 4 5 ...

292 Commits