From 974f33add21f77fff920caee15d38526ffa5be79 Mon Sep 17 00:00:00 2001
From: Mingliang Liu
Date: Mon, 5 Jun 2017 18:24:40 -0700
Subject: [PATCH] HADOOP-14491. Azure has messed doc structure. Contributed by Mingliang Liu

---
 .../hadoop-azure/src/site/markdown/index.md | 235 +++++++++---------
 1 file changed, 124 insertions(+), 111 deletions(-)

diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/index.md b/hadoop-tools/hadoop-azure/src/site/markdown/index.md
index 1dca3b9d8fa..9c57e60301f 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/index.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/index.md
@@ -14,20 +14,9 @@

 # Hadoop Azure Support: Azure Blob Storage

-* [Introduction](#Introduction)
-* [Features](#Features)
-* [Limitations](#Limitations)
-* [Usage](#Usage)
-    * [Concepts](#Concepts)
-    * [Configuring Credentials](#Configuring_Credentials)
-    * [Page Blob Support and Configuration](#Page_Blob_Support_and_Configuration)
-    * [Atomic Folder Rename](#Atomic_Folder_Rename)
-    * [Accessing wasb URLs](#Accessing_wasb_URLs)
-    * [Append API Support and Configuration](#Append_API_Support_and_Configuration)
-    * [Multithread Support](#Multithread_Support)
-* [Testing the hadoop-azure Module](#Testing_the_hadoop-azure_Module)
+

-## Introduction
+## Introduction

 The hadoop-azure module provides support for integration with [Azure Blob Storage](http://azure.microsoft.com/en-us/documentation/services/storage/).
@@ -38,7 +27,7 @@ on the additional artifacts it requires, notably the
 To make it part of Apache Hadoop's default classpath, simply make sure that
 HADOOP_OPTIONAL_TOOLS in hadoop-env.sh has 'hadoop-azure' in the list.

-## Features
+## Features

 * Read and write data stored in an Azure Blob Storage account.
 * Present a hierarchical file system view by implementing the standard Hadoop
@@ -54,15 +43,15 @@ HADOOP_OPTIONAL_TOOLS in hadoop-env.sh has 'hadoop-azure' in the list.
 * Tested on both Linux and Windows.
 * Tested at scale.
-## Limitations
+## Limitations

 * File owner and group are persisted, but the permissions model is not enforced. Authorization occurs at the level of the entire Azure Blob Storage account.
 * File last access time is not tracked.

-## Usage
+## Usage

-### Concepts
+### Concepts

 The Azure Blob Storage data model presents 3 core concepts:

@@ -76,7 +65,7 @@ The Azure Blob Storage data model presents 3 core concepts:
 The internal implementation also uses blobs to persist the file system hierarchy and other metadata.

-### Configuring Credentials
+### Configuring Credentials

 Usage of Azure Blob Storage requires configuration of credentials. Typically this is set in core-site.xml. The configuration property name is of the form
@@ -87,11 +76,12 @@ untrusted party.**

 For example:

-    <property>
-      <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
-      <value>YOUR ACCESS KEY</value>
-    </property>
-
+```xml
+<property>
+  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
+  <value>YOUR ACCESS KEY</value>
+</property>
+```

 In many Hadoop clusters, the core-site.xml file is world-readable. It is possible to protect the access key within a credential provider as well. This provides an encrypted file format along with protection with file permissions.
@@ -110,14 +100,14 @@ For additional reading on the credential provider API see:

 ###### provision

-```
+```bash
 % hadoop credential create fs.azure.account.key.youraccount.blob.core.windows.net -value 123
     -provider localjceks://file/home/lmccay/wasb.jceks
 ```

 ###### configure core-site.xml or command line system property

-```
+```xml
 <property>
   <name>hadoop.security.credential.provider.path</name>
   <value>localjceks://file/home/lmccay/wasb.jceks</value>
@@ -127,7 +117,7 @@ For additional reading on the credential provider API see:

 ###### distcp

-```
+```bash
 % hadoop distcp
     [-D hadoop.security.credential.provider.path=localjceks://file/home/lmccay/wasb.jceks]
     hdfs://hostname:9001/user/lmccay/007020615 wasb://yourcontainer@youraccount.blob.core.windows.net/testDir/
@@ -145,22 +135,25 @@ specifies an external program to be invoked by Hadoop processes to decrypt the
 key. The encrypted key value is passed to this external program as a command
 line argument:

-    <property>
-      <name>fs.azure.account.keyprovider.youraccount</name>
-      <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.account.keyprovider.youraccount</name>
+  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
+</property>

-    <property>
-      <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
-      <value>YOUR ENCRYPTED ACCESS KEY</value>
-    </property>
+<property>
+  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
+  <value>YOUR ENCRYPTED ACCESS KEY</value>
+</property>

-    <property>
-      <name>fs.azure.shellkeyprovider.script</name>
-      <value>PATH TO DECRYPTION PROGRAM</value>
-    </property>
+<property>
+  <name>fs.azure.shellkeyprovider.script</name>
+  <value>PATH TO DECRYPTION PROGRAM</value>
+</property>

-### Page Blob Support and Configuration
+```
+
+### Page Blob Support and Configuration

 The Azure Blob Storage interface for Hadoop supports two kinds of blobs, [block blobs and page blobs](http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx).
@@ -182,10 +175,12 @@ folder names.

 For example:

-    <property>
-      <name>fs.azure.page.blob.dir</name>
-      <value>/hbase/WALs,/hbase/oldWALs,/data/mypageblobfiles</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.page.blob.dir</name>
+  <value>/hbase/WALs,/hbase/oldWALs,/data/mypageblobfiles</value>
+</property>
+```

 You can set this to simply / to make all files page blobs.
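
As a rough illustration of how a comma-separated folder list such as the `fs.azure.page.blob.dir` value in the hunk above could be applied, the Python sketch below checks whether a file path falls under one of the configured folders. The helper names (`parse_dir_list`, `is_under_configured_dir`) are hypothetical, and the recursive "anywhere below the folder" matching is an assumption made for illustration; it is not taken from the hadoop-azure implementation.

```python
from pathlib import PurePosixPath


def parse_dir_list(value):
    """Split a comma-separated folder list (hypothetical helper)."""
    return [PurePosixPath(item.strip()) for item in value.split(",") if item.strip()]


def is_under_configured_dir(path, configured):
    """Return True if `path` lies below any configured folder.

    Assumption: matching is recursive, so files in nested
    subfolders also count as being under the folder.
    """
    candidate = PurePosixPath(path)
    return any(folder in candidate.parents for folder in configured)


# The value from the example property in the patch.
page_blob_dirs = parse_dir_list("/hbase/WALs,/hbase/oldWALs,/data/mypageblobfiles")

print(is_under_configured_dir("/hbase/WALs/region1/log.1", page_blob_dirs))
print(is_under_configured_dir("/hbase/data/table1", page_blob_dirs))
# A single "/" entry covers every file, matching the note about setting it to "/".
print(is_under_configured_dir("/any/file", parse_dir_list("/")))
```

Comparing path components rather than raw string prefixes avoids treating `/hbase/WALsBackup` as if it were inside `/hbase/WALs`.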
@@ -197,7 +192,7 @@ The configuration option `fs.azure.page.blob.extension.size` is the page blob
 extension size. This defines the amount to extend a page blob if it starts to get full. It must be 128MB or greater, specified as an integer number of bytes.

-### Atomic Folder Rename
+### Atomic Folder Rename

 Azure storage stores files as a flat key/value store without formal support for folders. The hadoop-azure file system layer simulates folders on top
@@ -216,12 +211,14 @@ the intention of the rename operation, to allow redo in event of a failure.

 For example:

-    <property>
-      <name>fs.azure.atomic.rename.dir</name>
-      <value>/hbase,/data</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.atomic.rename.dir</name>
+  <value>/hbase,/data</value>
+</property>
+```

-### Accessing wasb URLs
+### Accessing wasb URLs

 After credentials are configured in core-site.xml, any Hadoop component may reference files in that Azure Blob Storage account by using URLs of the following
@@ -238,28 +235,32 @@ For example, the following commands demonstrate access to a storage account
 named `youraccount` and a container named `yourcontainer`.

-    > hadoop fs -mkdir wasb://yourcontainer@youraccount.blob.core.windows.net/testDir
+```bash
+% hadoop fs -mkdir wasb://yourcontainer@youraccount.blob.core.windows.net/testDir

-    > hadoop fs -put testFile wasb://yourcontainer@youraccount.blob.core.windows.net/testDir/testFile
+% hadoop fs -put testFile wasb://yourcontainer@youraccount.blob.core.windows.net/testDir/testFile

-    > hadoop fs -cat wasbs://yourcontainer@youraccount.blob.core.windows.net/testDir/testFile
-    test file content
+% hadoop fs -cat wasbs://yourcontainer@youraccount.blob.core.windows.net/testDir/testFile
+test file content
+```

 It's also possible to configure `fs.defaultFS` to use a `wasb` or `wasbs` URL. This causes all bare paths, such as `/testDir/testFile` to resolve automatically to that file system.
-### Append API Support and Configuration
+### Append API Support and Configuration

 The Azure Blob Storage interface for Hadoop has optional support for Append API for single writer by setting the configuration `fs.azure.enable.append.support` to true.

 For Example:

-    <property>
-      <name>fs.azure.enable.append.support</name>
-      <value>true</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.enable.append.support</name>
+  <value>true</value>
+</property>
+```

 It must be noted Append support in Azure Blob Storage interface DIFFERS FROM HDFS SEMANTICS. Append support does not enforce single writer internally but requires applications to guarantee this semantic.
@@ -267,25 +268,29 @@ It becomes a responsibility of the application either to ensure single-threaded
 file path, or rely on some external locking mechanism of its own. Failure to do so will result in unexpected behavior.

-### Multithread Support
+### Multithread Support

 Rename and Delete blob operations on directories with large number of files and sub directories currently is very slow as these operations are done one blob at a time serially. These files and sub folders can be deleted or renamed parallel. Following configurations can be used to enable threads to do parallel processing

 To enable 10 threads for Delete operation. Set configuration value to 0 or 1 to disable threads. The default behavior is threads disabled.

-    <property>
-      <name>fs.azure.delete.threads</name>
-      <value>10</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.delete.threads</name>
+  <value>10</value>
+</property>
+```

 To enable 20 threads for Rename operation. Set configuration value to 0 or 1 to disable threads. The default behavior is threads disabled.

-    <property>
-      <name>fs.azure.rename.threads</name>
-      <value>20</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.rename.threads</name>
+  <value>20</value>
+</property>
+```

-### WASB Secure mode and configuration
+### WASB Secure mode and configuration

 WASB can operate in secure mode where the Storage access keys required to communicate with Azure storage does not have to be in the same address space as the process using WASB.
 In this mode all interactions with Azure storage is performed using
@@ -295,30 +300,32 @@ Romote mode, however for testing purposes the local mode can be enabled to gener

 To enable Secure mode following property needs to be set to true.

-```
-    <property>
-      <name>fs.azure.secure.mode</name>
-      <value>true</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.secure.mode</name>
+  <value>true</value>
+</property>
 ```

 To enable SAS key generation locally following property needs to be set to true.
+```xml
+<property>
+  <name>fs.azure.local.sas.key.mode</name>
+  <value>true</value>
+</property>
 ```
-    <property>
-      <name>fs.azure.local.sas.key.mode</name>
-      <value>true</value>
-    </property>
-```
+
 To use the remote SAS key generation mode, an external REST service is expected to provided required SAS keys. Following property can used to provide the end point to use for remote SAS Key generation:
+```xml
+<property>
+  <name>fs.azure.cred.service.url</name>
+  <value>{URL}</value>
+</property>
 ```
-    <property>
-      <name>fs.azure.cred.service.url</name>
-      <value>{URL}</value>
-    </property>
-```
+
 The remote service is expected to provide support for two REST calls ```{URL}/GET_CONTAINER_SAS``` and ```{URL}/GET_RELATIVE_BLOB_SAS```, for generating container and relative blob sas keys. An example requests
@@ -326,7 +333,8 @@ container and relative blob sas keys. An example requests
 ```{URL}/GET_CONTAINER_SAS?storage_account=&container=&relative_path=&sas_expiry=&delegation_token=```
 The service is expected to return a response in JSON format:
-```
+
+```json
 {
    "responseCode" : 0 or non-zero ,
    "responseMessage" : relavant message on failure ,
@@ -334,40 +342,42 @@ The service is expected to return a response in JSON format:
 }
 ```

-## Authorization Support in WASB.
+### Authorization Support in WASB

 Authorization support can be enabled in WASB using the following configuration:

-```
-    <property>
-      <name>fs.azure.authorization</name>
-      <value>true</value>
-    </property>
-```
- The current implementation of authorization relies on the presence of an external service that can enforce
- the authorization. The service is expected to be running on a URL provided by the following config.
-
-```
-    <property>
-      <name>fs.azure.authorization.remote.service.url</name>
-      <value>{URL}</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.authorization</name>
+  <value>true</value>
+</property>
 ```

- The remote service is expected to provide support for the following REST call: ```{URL}/CHECK_AUTHORIZATION```
- An example request:
+The current implementation of authorization relies on the presence of an external service that can enforce
+the authorization. The service is expected to be running on a URL provided by the following config.
+
+```xml
+<property>
+  <name>fs.azure.authorization.remote.service.url</name>
+  <value>{URL}</value>
+</property>
+```
+
+The remote service is expected to provide support for the following REST call: ```{URL}/CHECK_AUTHORIZATION```
+An example request:
 ```{URL}/CHECK_AUTHORIZATION?wasb_absolute_path=&operation_type=&delegation_token=```

- The service is expected to return a response in JSON format:
- ```
- {
+The service is expected to return a response in JSON format:
+
+```json
+{
    "responseCode" : 0 or non-zero ,
    "responseMessage" : relevant message on failure ,
    "authorizationResult" : true/false
- }
- ```
+}
+```

-## Testing the hadoop-azure Module
+## Testing the hadoop-azure Module

 The hadoop-azure module includes a full suite of unit tests. Most of the tests will run without additional configuration by running `mvn test`. This includes
@@ -382,10 +392,12 @@ that runs on a local machine. To use the emulator, install Azure SDK 2.3 and
 start the storage emulator. Then, edit `src/test/resources/azure-test.xml` and
 add the following property:

-    <property>
-      <name>fs.azure.test.emulator</name>
-      <value>true</value>
-    </property>
+```xml
+<property>
+  <name>fs.azure.test.emulator</name>
+  <value>true</value>
+</property>
+```

 There is a known issue when running tests with the emulator. You may see the following failure message:
@@ -399,6 +411,7 @@ file to `src/test/resources/azure-auth-keys.xml` and setting the name of the
 storage account and its access key.

 For example:
+
 ```xml