From e671a0f52b5488b8453e1a3258ea5e6477995648 Mon Sep 17 00:00:00 2001 From: Mingfei Date: Sun, 28 Aug 2016 10:37:52 +0800 Subject: [PATCH] HADOOP-13481. User documents for Aliyun OSS FileSystem. Contributed by Genmao Yu. --- .../hadoop/fs/aliyun/oss/Constants.java | 3 +- .../markdown/tools/hadoop-aliyun/index.md | 299 ++++++++++++++++++ 2 files changed, 300 insertions(+), 2 deletions(-) create mode 100644 hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md diff --git a/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java index 243fdd4c0e1..e0c05ed740f 100644 --- a/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java +++ b/hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/Constants.java @@ -95,8 +95,7 @@ public final class Constants { // Comma separated list of directories public static final String BUFFER_DIR_KEY = "fs.oss.buffer.dir"; - // private | public-read | public-read-write | authenticated-read | - // log-delivery-write | bucket-owner-read | bucket-owner-full-control + // private | public-read | public-read-write public static final String CANNED_ACL_KEY = "fs.oss.acl.default"; public static final String CANNED_ACL_DEFAULT = ""; diff --git a/hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md b/hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md new file mode 100644 index 00000000000..4095e06fa57 --- /dev/null +++ b/hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md @@ -0,0 +1,299 @@ + + +# Hadoop-Aliyun module: Integration with Aliyun Web Services + + + +## Overview + +The `hadoop-aliyun` module provides support for Aliyun integration with +[Aliyun Object Storage Service (Aliyun OSS)](https://www.aliyun.com/product/oss). 
+The generated JAR file, `hadoop-aliyun.jar`, also declares a transitive +dependency on all external artifacts which are needed for this support, enabling +downstream applications to easily use this support. + +To make it part of Apache Hadoop's default classpath, simply make sure +that HADOOP_OPTIONAL_TOOLS in hadoop-env.sh has 'hadoop-aliyun' in the list. + +### Features + +* Read and write data stored in Aliyun OSS. +* Present a hierarchical file system view by implementing the standard Hadoop +[`FileSystem`](../api/org/apache/hadoop/fs/FileSystem.html) interface. +* Can act as a source of data in a MapReduce job, or a sink. + +### Warning #1: Object Stores are not filesystems. + +Aliyun OSS is an example of "an object store". In order to achieve scalability +and especially high availability, Aliyun OSS has relaxed some of the constraints +which classic "POSIX" filesystems promise. + +Specifically: + +1. Atomic operations: `delete()` and `rename()` are implemented by recursive +file-by-file operations. They take time at least proportional to the number of files, +during which time partial updates may be visible. `delete()` and `rename()` +cannot guarantee atomicity. If the operations are interrupted, the filesystem +is left in an intermediate state. +2. File owner and group are persisted, but the permissions model is not enforced. +Authorization occurs at the level of the entire Aliyun account via +[Aliyun Resource Access Management (Aliyun RAM)](https://www.aliyun.com/product/ram). +3. Directory last access time is not tracked. +4. The append operation is not supported. + +### Warning #2: Directory last access time is not tracked + +Features of Hadoop that rely on directory last access time can behave unexpectedly. For example, the +AggregatedLogDeletionService of YARN will not remove the appropriate logfiles. + +### Warning #3: Your Aliyun credentials are valuable + +Your Aliyun credentials not only pay for services, they offer read and write +access to the data. 
Anyone with the account can read your datasets, +and can also delete them. + +Do not inadvertently share these credentials through means such as: + +1. Checking in to SCM any configuration files containing the secrets. +2. Logging them to a console, as they invariably end up being seen. +3. Defining filesystem URIs with the credentials in the URL, such as +`oss://accessKeyId:accessKeySecret@directory/file`. They will end up in +logs and error messages. +4. Including the secrets in bug reports. + +If you do any of these: change your credentials immediately! + +### Warning #4: The Aliyun OSS client provided by Aliyun E-MapReduce is different from this implementation + +Specifically: on Aliyun E-MapReduce, `oss://` is also supported but with +a different implementation. If you are using Aliyun E-MapReduce, +follow these instructions, and be aware that all issues related to Aliyun +OSS integration in E-MapReduce can only be addressed by Aliyun themselves: +please raise your issues with them. + +## OSS + +### Authentication properties + + + fs.oss.accessKeyId + Aliyun access key ID + + + + fs.oss.accessKeySecret + Aliyun access key secret + + + + fs.oss.credentials.provider + + Class name of a credentials provider that implements + com.aliyun.oss.common.auth.CredentialsProvider. Omit if using access/secret keys + or another authentication mechanism. The specified class must provide an + accessible constructor accepting java.net.URI and + org.apache.hadoop.conf.Configuration, or an accessible default constructor. + + + +### Other properties + + + fs.oss.endpoint + Aliyun OSS endpoint to connect to. An up-to-date list is + provided in the Aliyun OSS Documentation. + + + + + fs.oss.proxy.host + Hostname of the (optional) proxy server for Aliyun OSS connection + + + + fs.oss.proxy.port + Proxy server port + + + + fs.oss.proxy.username + Username for authenticating with proxy server + + + + fs.oss.proxy.password + Password for authenticating with proxy server. 
+ + + + fs.oss.proxy.domain + Domain for authenticating with proxy server. + + + + fs.oss.proxy.workstation + Workstation for authenticating with proxy server. + + + + fs.oss.attempts.maximum + 20 + How many times we should retry commands on transient errors. + + + + fs.oss.connection.establish.timeout + 50000 + Connection setup timeout in milliseconds. + + + + fs.oss.connection.timeout + 200000 + Socket connection timeout in milliseconds. + + + + fs.oss.paging.maximum + 500 + How many keys to request from Aliyun OSS at a time when doing directory listings. + + + + + fs.oss.multipart.upload.size + 10485760 + Size of each multipart piece in bytes. + + + + fs.oss.multipart.upload.threshold + 20971520 + Minimum size in bytes before we start a multipart upload or copy. + + + + fs.oss.multipart.download.size + 102400 + Size in bytes of each request from Aliyun OSS. + + + + fs.oss.buffer.dir + Comma separated list of directories to buffer OSS data before uploading to Aliyun OSS + + + + fs.oss.acl.default + + Set a canned ACL for bucket. Value may be private, public-read, public-read-write. + + + + + fs.oss.server-side-encryption-algorithm + + Specify a server-side encryption algorithm for oss: file system. + Unset by default, and the only other currently allowable value is AES256. + + + + + fs.oss.connection.maximum + 32 + Number of simultaneous connections to OSS. + + + + fs.oss.connection.secure.enabled + true + Whether to connect to OSS over SSL, true by default. + + +## Testing the hadoop-aliyun Module + +To test the `oss://` filesystem client, two files which pass in authentication +details to the test runner are needed. + +1. `auth-keys.xml` +2. `core-site.xml` + +Those two configuration files must be put into +`hadoop-tools/hadoop-aliyun/src/test/resources`. 
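
As a rough illustration of the first file listed above, the fragment below sketches what a minimal `auth-keys.xml` might look like. It only uses property names that appear in this document (`test.fs.oss.name`, `fs.oss.accessKeyId`, `fs.oss.accessKeySecret`, `fs.oss.endpoint`); the bucket name and the credential values are hypothetical placeholders, and the endpoint is assumed to match the region of the test bucket.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical bucket reserved for tests; its contents are wiped during a run. -->
  <property>
    <name>test.fs.oss.name</name>
    <value>oss://example-test-bucket</value>
  </property>

  <!-- Placeholder credentials; substitute real values and never commit them. -->
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>YOUR_ACCESS_KEY_SECRET</value>
  </property>

  <!-- Assumed endpoint; use the one for the region that hosts the bucket. -->
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou.aliyuncs.com</value>
  </property>
</configuration>
```

Because this file holds secrets, it is best kept out of source control, in line with Warning #3.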
+ +### `core-site.xml` + +This file pre-exists and sources the configurations created in `auth-keys.xml`. + +For most cases, no modification is needed, unless a specific, non-default property +needs to be set during the testing. + +### `auth-keys.xml` + +This file triggers the testing of the Aliyun OSS module. Without this file, +*none of the tests in this module will be executed*. + +It contains the access key ID/secret and proxy information that are needed to +connect to Aliyun OSS. It must also provide an OSS bucket URL: + +1. `test.fs.oss.name` : the URL of the bucket for Aliyun OSS tests + +The contents of the bucket will be cleaned during the testing process, so +do not use the bucket for any purpose other than testing. + +### Run Hadoop contract tests +Create file `contract-test-options.xml` under `/test/resources`. If the +test path property `fs.contract.test.fs.oss` is not defined, those +tests will be skipped. Credentials are also needed to run any of those +tests; they can be copied from `auth-keys.xml` or pulled in through direct +XInclude inclusion. Here is an example of `contract-test-options.xml`: + + + + + + + + + fs.contract.test.fs.oss + oss://spark-tests + + + + fs.oss.impl + org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem + + + + fs.oss.endpoint + oss-cn-hangzhou.aliyuncs.com + + + + fs.oss.buffer.dir + /tmp/oss + + + + fs.oss.multipart.download.size + 102400 + 