HADOOP-13655. document object store use with fs shell and distcp. Contributed by Steve Loughran

This closes #131
This commit is contained in:
Mingliang Liu 2016-11-21 17:49:05 -08:00
parent 83cc7263af
commit beb70fed4f
3 changed files with 431 additions and 28 deletions


@ -53,10 +53,14 @@ Returns 0 on success and 1 on error.
cat
---
Usage: `hadoop fs -cat URI [URI ...]`
Usage: `hadoop fs -cat [-ignoreCrc] URI [URI ...]`
Copies source paths to stdout.
Options
* The `-ignoreCrc` option disables checksum verification.
Example:
* `hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2`
@ -116,11 +120,16 @@ copyFromLocal
Usage: `hadoop fs -copyFromLocal <localsrc> URI`
Similar to put command, except that the source is restricted to a local file reference.
Similar to the `fs -put` command, except that the source is restricted to a local file reference.
Options:
* The -f option will overwrite the destination if it already exists.
* `-p` : Preserves access and modification times, ownership and the permissions.
(assuming the permissions can be propagated across filesystems)
* `-f` : Overwrites the destination if it already exists.
* `-l` : Allow DataNode to lazily persist the file to disk, forcing a replication
factor of 1. This flag will result in reduced durability. Use with care.
* `-d` : Skip creation of temporary file with the suffix `._COPYING_`.
copyToLocal
-----------
@ -300,7 +309,7 @@ Returns 0 on success and -1 on error.
get
---
Usage: `hadoop fs -get [-ignorecrc] [-crc] <src> <localdst> `
Usage: `hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] <src> <localdst> `
Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.
@ -315,7 +324,11 @@ Returns 0 on success and -1 on error.
Options:
The -f option will overwrite the destination if it already exists.
* `-p` : Preserves access and modification times, ownership and the permissions.
(assuming the permissions can be propagated across filesystems)
* `-f` : Overwrites the destination if it already exists.
* `-ignorecrc` : Skip CRC checks on the file(s) downloaded.
* `-crc` : Write CRC checksums for the files downloaded.
getfacl
-------
@ -483,13 +496,28 @@ Returns 0 on success and -1 on error.
put
---
Usage: `hadoop fs -put <localsrc> ... <dst> `
Usage: `hadoop fs -put [-f] [-p] [-l] [-d] [ - | <localsrc1> .. ] <dst>`
Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.
Copy single src, or multiple srcs from local file system to the destination file system.
Also reads input from stdin and writes to destination file system if the source is set to "-".
Copying fails if the file already exists, unless the -f flag is given.
Options:
* `-p` : Preserves access and modification times, ownership and the permissions.
(assuming the permissions can be propagated across filesystems)
* `-f` : Overwrites the destination if it already exists.
* `-l` : Allow DataNode to lazily persist the file to disk, forcing a replication
factor of 1. This flag will result in reduced durability. Use with care.
* `-d` : Skip creation of temporary file with the suffix `._COPYING_`.
Examples:
* `hadoop fs -put localfile /user/hadoop/hadoopfile`
* `hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir`
* `hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile`
* `hadoop fs -put -f localfile1 localfile2 /user/hadoop/hadoopdir`
* `hadoop fs -put -d localfile hdfs://nn.example.com/hadoop/hadoopfile`
* `hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile` Reads the input from stdin.
Exit Code:
@ -696,7 +724,7 @@ touchz
Usage: `hadoop fs -touchz URI [URI ...]`
Create a file of zero length.
Create a file of zero length. An error is returned if the file exists with non-zero length.
Example:
@ -729,3 +757,279 @@ usage
Usage: `hadoop fs -usage command`
Return the help for an individual command.
<a name="ObjectStores" />Working with Object Storage
====================================================
The Hadoop FileSystem shell works with Object Stores such as Amazon S3,
Azure WASB and OpenStack Swift.
```bash
# Create a directory
hadoop fs -mkdir s3a://bucket/datasets/
# Upload a file from the cluster filesystem
hadoop fs -put /datasets/example.orc s3a://bucket/datasets/
# touch a file
hadoop fs -touchz wasb://yourcontainer@youraccount.blob.core.windows.net/touched
```
Unlike a normal filesystem, renaming files and directories in an object store
usually takes time proportional to the size of the objects being manipulated.
As many of the filesystem shell operations
use renaming as the final stage of an operation, skipping that stage
can avoid long delays.
In particular, the `put` and `copyFromLocal` commands should
both have the `-d` option set for a direct upload.
```bash
# Upload a file from the cluster filesystem
hadoop fs -put -d /datasets/example.orc s3a://bucket/datasets/
# Upload a file from under the user's home directory in the local filesystem.
# Note it is the shell expanding the "~", not the hadoop fs command
hadoop fs -copyFromLocal -d -f ~/datasets/devices.orc s3a://bucket/datasets/
# create a file from stdin
# the special "-" source means "use stdin"
echo "hello" | hadoop fs -put -d -f - wasb://yourcontainer@youraccount.blob.core.windows.net/hello.txt
```
Objects can be downloaded and viewed:
```bash
# copy a directory to the local filesystem
hadoop fs -copyToLocal s3a://bucket/datasets/
# copy a file from the object store to the cluster filesystem.
hadoop fs -get wasb://yourcontainer@youraccount.blob.core.windows.net/hello.txt /examples
# print the object
hadoop fs -cat wasb://yourcontainer@youraccount.blob.core.windows.net/hello.txt
# print the object, unzipping it if necessary
hadoop fs -text wasb://yourcontainer@youraccount.blob.core.windows.net/hello.txt
# download log files into a local file
hadoop fs -getmerge wasb://yourcontainer@youraccount.blob.core.windows.net/logs\* log.txt
```
Commands which list many files tend to be significantly slower than when
working with HDFS or other filesystems.
```bash
hadoop fs -count s3a://bucket/
hadoop fs -du s3a://bucket/
```
Other slow commands include `find`, `mv`, `cp` and `rm`.
**Find**
This can be very slow on a large store with many directories under the path
supplied.
```bash
# enumerate all files in the object store's container.
hadoop fs -find s3a://bucket/ -print
# remember to escape the wildcards to stop the shell trying to expand them first
hadoop fs -find s3a://bucket/datasets/ -name \*.txt -print
```
**Rename**
The time to rename a file depends on its size.
The time to rename a directory depends on the number and size of all files
beneath that directory.
```bash
hadoop fs -mv s3a://bucket/datasets s3a://bucket/historical
```
If the operation is interrupted, the object store will be in an undefined
state.
**Copy**
```bash
hadoop fs -cp s3a://bucket/datasets s3a://bucket/historical
```
The copy operation reads each file and then writes it back to the object store;
the time to complete depends on the amount of data to copy, and the bandwidth
in both directions between the local computer and the object store.
**The further the computer is from the object store, the longer the copy takes**
Deleting objects
----------------
The `rm` command will delete objects and directories full of objects.
If the object store is *eventually consistent*, `fs ls` commands
and other accessors may briefly return the details of the now-deleted objects; this
is an artifact of object stores which cannot be avoided.
If the filesystem client is configured to copy files to a trash directory,
this will be in the bucket; the `rm` operation will then take time proportional
to the size of the data. Furthermore, the deleted files will continue to incur
storage costs.
To avoid this, use the `-skipTrash` option.
```bash
hadoop fs -rm -skipTrash s3a://bucket/dataset
```
Data moved to the `.Trash` directory can be purged using the `expunge` command.
As this command only works with the default filesystem, the client must be
configured to make the target object store the default filesystem.
```bash
hadoop fs -expunge -D fs.defaultFS=s3a://bucket/
```
Overwriting Objects
----------------
If an object store is *eventually consistent*, then any operation which
overwrites existing objects may not be immediately visible to all clients/queries.
That is: later operations which query the same object's status or contents
may get the previous object. This can sometimes surface within the same client,
while reading a single object.
Avoid having a sequence of commands which overwrite objects and then immediately
work on the updated data; there is a risk that the previous data will be used
instead.
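As a sketch of the pattern to avoid (the bucket and file names here are hypothetical),
the second command below may still return the previous version of the object on an
eventually consistent store:

```bash
# Overwrite an existing object, then immediately read it back.
hadoop fs -put -f -d updated.csv s3a://bucket/datasets/data.csv

# On an eventually consistent store this may still print the old contents.
hadoop fs -cat s3a://bucket/datasets/data.csv
```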
Timestamps
----------
Timestamps of objects and directories in Object Stores
may not follow the behavior of files and directories in HDFS.
1. The creation and initial modification times of an object will be the
time it was created on the object store; this will be at the end of the write process,
not the beginning.
1. The timestamp will be taken from the object store infrastructure's clock, not that of
the client.
1. If an object is overwritten, the modification time will be updated.
1. Directories may or may not have valid timestamps. They are unlikely
to have their modification times updated when an object underneath is updated.
1. The `atime` access time feature is not supported by any of the object stores
found in the Apache Hadoop codebase.
Consult the `DistCp` documentation for details on how this may affect the
`distcp -update` operation.
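To see what a particular store reports, the `fs -stat` command can print an object's
modification time; a small sketch, with a hypothetical bucket and object:

```bash
# Print the modification time and name of an object. For a newly uploaded
# object, the time comes from the store's clock, set when the upload completes.
hadoop fs -stat "%y %n" s3a://bucket/datasets/example.orc

# Directory timestamps may be simulated and should not be relied upon.
hadoop fs -stat "%y %n" s3a://bucket/datasets/
```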
Security model and operations
-----------------------------
The security and permissions models of object stores are usually very different
from those of a Unix-style filesystem; operations which query or manipulate
permissions are generally unsupported.
Operations to which this applies include: `chgrp`, `chmod`, `chown`,
`getfacl`, and `setfacl`. The related attribute commands `getfattr` and `setfattr`
are also usually unavailable.
* Filesystem commands which list permission and user/group details usually
simulate these details.
* Operations which try to preserve permissions (for example `fs -put -p`)
do not preserve permissions for this reason. (Special case: `wasb://`, which preserves
permissions but does not enforce them).
When interacting with read-only object stores, the permissions found in "list"
and "stat" commands may indicate that the user has write access, when in fact they do not.
Object stores usually have permissions models of their own;
these models can be manipulated through store-specific tooling.
Be aware that some of the permissions which an object store may provide
(such as write-only paths, or different permissions on the root path) may
be incompatible with the Hadoop filesystem clients, which tend to require full
read and write access to the entire object store bucket/container into which they write data.
As an example of how permissions are mocked, here is a listing of Amazon's public,
read-only bucket of Landsat images:
```bash
$ hadoop fs -ls s3a://landsat-pds/
Found 10 items
drwxrwxrwx - mapred 0 2016-09-26 12:16 s3a://landsat-pds/L8
-rw-rw-rw- 1 mapred 23764 2015-01-28 18:13 s3a://landsat-pds/index.html
drwxrwxrwx - mapred 0 2016-09-26 12:16 s3a://landsat-pds/landsat-pds_stats
-rw-rw-rw- 1 mapred 105 2016-08-19 18:12 s3a://landsat-pds/robots.txt
-rw-rw-rw- 1 mapred 38 2016-09-26 12:16 s3a://landsat-pds/run_info.json
drwxrwxrwx - mapred 0 2016-09-26 12:16 s3a://landsat-pds/runs
-rw-rw-rw- 1 mapred 27458808 2016-09-26 12:16 s3a://landsat-pds/scene_list.gz
drwxrwxrwx - mapred 0 2016-09-26 12:16 s3a://landsat-pds/tarq
drwxrwxrwx - mapred 0 2016-09-26 12:16 s3a://landsat-pds/tarq_corrupt
drwxrwxrwx - mapred 0 2016-09-26 12:16 s3a://landsat-pds/test
```
1. All files are listed as having full read/write permissions.
1. All directories appear to have full `rwx` permissions.
1. The replication count of all files is "1".
1. The owner of all files and directories is declared to be the current user (`mapred`).
1. The timestamp of all directories is actually that of the time the `-ls` operation
was executed. This is because these directories are not actual objects in the store;
they are simulated directories based on the existence of objects under their paths.
When an attempt is made to delete one of the files, the operation fails, despite
the permissions shown by the `ls` command:
```bash
$ hadoop fs -rm s3a://landsat-pds/scene_list.gz
rm: s3a://landsat-pds/scene_list.gz: delete on s3a://landsat-pds/scene_list.gz:
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3;
Status Code: 403; Error Code: AccessDenied; Request ID: 1EF98D5957BCAB3D),
S3 Extended Request ID: wi3veOXFuFqWBUCJgV3Z+NQVj9gWgZVdXlPU4KBbYMsw/gA+hyhRXcaQ+PogOsDgHh31HlTCebQ=
```
This demonstrates that the listed permissions cannot
be taken as evidence of write access; only object manipulation can determine
this.
Note that the Microsoft Azure WASB filesystem does allow permissions to be set and checked;
however, the permissions are not actually enforced. This makes it possible to back up
an HDFS directory tree with DistCp, with its permissions preserved, and to restore
those permissions when copying the directory back into HDFS. For securing access to the data
in the object store, however, Azure's [own model and tools must be used](https://azure.microsoft.com/en-us/documentation/articles/storage-security-guide/).
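A sketch of such a backup, with hypothetical account, container and path names, preserving
user, group and permission metadata via DistCp's `-p` flag:

```bash
# Back up an HDFS directory tree to Azure storage; wasb:// stores the
# user/group/permission metadata but does not enforce it.
hadoop distcp -pugp hdfs://nn1:8020/users/alice \
  wasb://backups@youraccount.blob.core.windows.net/users/alice

# Restore later by reversing the source and destination.
hadoop distcp -pugp wasb://backups@youraccount.blob.core.windows.net/users/alice \
  hdfs://nn1:8020/users/alice
```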
Commands of limited value
---------------------------
Here is the list of shell commands which generally have no effect, and may
actually fail.
| command | limitations |
|-----------|-------------|
| `appendToFile` | generally unsupported |
| `checksum` | the usual checksum is "NONE" |
| `chgrp` | generally unsupported permissions model; no-op |
| `chmod` | generally unsupported permissions model; no-op |
| `chown` | generally unsupported permissions model; no-op |
| `createSnapshot` | generally unsupported |
| `deleteSnapshot` | generally unsupported |
| `df` | default values are normally displayed |
| `getfacl` | may or may not be supported |
| `getfattr` | generally supported |
| `renameSnapshot` | generally unsupported |
| `setfacl` | generally unsupported permissions model |
| `setfattr` | generally unsupported permissions model |
| `setrep` | has no effect |
| `truncate` | generally unsupported |
Different object store clients *may* support these commands: do consult the
documentation and test against the target store.
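One way to verify the behaviour of a specific store is to try the commands against a
disposable path and observe the results; a minimal sketch, assuming a scratch directory
you are free to write to:

```bash
# Probe a store's behaviour using a scratch path (hypothetical bucket).
hadoop fs -mkdir s3a://bucket/tmp/
hadoop fs -touchz s3a://bucket/tmp/probe

hadoop fs -checksum s3a://bucket/tmp/probe   # often reports NONE
hadoop fs -setrep 2 s3a://bucket/tmp/probe   # usually has no effect

# append is generally unsupported and is expected to fail here
hadoop fs -appendToFile /etc/hosts s3a://bucket/tmp/probe

hadoop fs -rm -skipTrash s3a://bucket/tmp/probe
```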


@ -30,10 +30,10 @@ are places where HDFS diverges from the expected behaviour of a POSIX
filesystem.
The behaviour of other Hadoop filesystems is not as rigorously tested.
The bundled S3 FileSystem makes Amazon's S3 Object Store ("blobstore")
The bundled S3N and S3A FileSystem clients make Amazon's S3 Object Store ("blobstore")
accessible through the FileSystem API. The Swift FileSystem driver provides similar
functionality for the OpenStack Swift blobstore. The Azure object storage
FileSystem in branch-1-win talks to Microsoft's Azure equivalent. All of these
FileSystem talks to Microsoft's Azure equivalent. All of these
bind to object stores, which do have different behaviors, especially regarding
consistency guarantees and atomicity of operations.


@ -68,7 +68,7 @@ $H3 Basic Usage
This will expand the namespace under `/foo/bar` on nn1 into a temporary file,
partition its contents among a set of map tasks, and start a copy on each
NodeManager from nn1 to nn2.
NodeManager from `nn1` to `nn2`.
One can also specify multiple source directories on the command line:
@ -110,7 +110,7 @@ $H3 Basic Usage
It's also worth noting that if another client is still writing to a source
file, the copy will likely fail. Attempting to overwrite a file being written
at the destination should also fail on HDFS. If a source file is (re)moved
before it is copied, the copy will fail with a FileNotFoundException.
before it is copied, the copy will fail with a `FileNotFoundException`.
Please refer to the detailed Command Line Reference for information on all
the options available in DistCp.
@ -196,20 +196,20 @@ $H3 raw Namespace Extended Attribute Preservation
This section only applies to HDFS.
If the target and all of the source pathnames are in the /.reserved/raw
If the target and all of the source pathnames are in the `/.reserved/raw`
hierarchy, then 'raw' namespace extended attributes will be preserved.
'raw' xattrs are used by the system for internal functions such as encryption
meta data. They are only visible to users when accessed through the
/.reserved/raw hierarchy.
`/.reserved/raw` hierarchy.
raw xattrs are preserved based solely on whether /.reserved/raw prefixes are
supplied. The -p (preserve, see below) flag does not impact preservation of
raw xattrs.
To prevent raw xattrs from being preserved, simply do not use the
/.reserved/raw prefix on any of the source and target paths.
`/.reserved/raw` prefix on any of the source and target paths.
If the /.reserved/raw prefix is specified on only a subset of the source and
If the `/.reserved/raw` prefix is specified on only a subset of the source and
target paths, an error will be displayed and a non-0 exit code returned.
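For example (a sketch with hypothetical cluster and path names), both paths here sit
under `/.reserved/raw`, so the raw xattrs are preserved:

```bash
# Both source and target are under /.reserved/raw, so 'raw' namespace
# extended attributes (such as encryption metadata) are preserved.
hadoop distcp hdfs://nn1:8020/.reserved/raw/projects \
  hdfs://nn2:8020/.reserved/raw/projects
```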
Command Line Options
@ -288,27 +288,27 @@ $H3 Copy-listing Generator
that need copy into a SequenceFile, for consumption by the DistCp Hadoop
Job. The main classes in this module include:
1. CopyListing: The interface that should be implemented by any
1. `CopyListing`: The interface that should be implemented by any
copy-listing-generator implementation. Also provides the factory method by
which the concrete CopyListing implementation is chosen.
2. SimpleCopyListing: An implementation of CopyListing that accepts multiple
2. `SimpleCopyListing`: An implementation of `CopyListing` that accepts multiple
source paths (files/directories), and recursively lists all the individual
files and directories under each, for copy.
3. GlobbedCopyListing: Another implementation of CopyListing that expands
3. `GlobbedCopyListing`: Another implementation of `CopyListing` that expands
wild-cards in the source paths.
4. FileBasedCopyListing: An implementation of CopyListing that reads the
4. `FileBasedCopyListing`: An implementation of `CopyListing` that reads the
source-path list from a specified file.
Based on whether a source-file-list is specified in the DistCpOptions, the
source-listing is generated in one of the following ways:
1. If there's no source-file-list, the GlobbedCopyListing is used. All
1. If there's no source-file-list, the `GlobbedCopyListing` is used. All
wild-cards are expanded, and all the expansions are forwarded to the
SimpleCopyListing, which in turn constructs the listing (via recursive
descent of each path).
2. If a source-file-list is specified, the FileBasedCopyListing is used.
2. If a source-file-list is specified, the `FileBasedCopyListing` is used.
Source-paths are read from the specified file, and then forwarded to the
GlobbedCopyListing. The listing is then constructed as described above.
`GlobbedCopyListing`. The listing is then constructed as described above.
One may customize the method by which the copy-listing is constructed by
providing a custom implementation of the CopyListing interface. The behaviour
@ -343,13 +343,13 @@ $H3 InputFormats and MapReduce Components
implementation keeps the setup-time low.
* **DynamicInputFormat and DynamicRecordReader:**
The DynamicInputFormat implements org.apache.hadoop.mapreduce.InputFormat,
The DynamicInputFormat implements `org.apache.hadoop.mapreduce.InputFormat`,
and is new to DistCp. The listing-file is split into several "chunk-files",
the exact number of chunk-files being a multiple of the number of maps
requested for in the Hadoop Job. Each map task is "assigned" one of the
chunk-files (by renaming the chunk to the task's id), before the Job is
launched.
Paths are read from each chunk using the DynamicRecordReader, and
Paths are read from each chunk using the `DynamicRecordReader`, and
processed in the CopyMapper. After all the paths in a chunk are processed,
the current chunk is deleted and a new chunk is acquired. The process
continues until no more chunks are available.
@ -403,7 +403,7 @@ $H3 Map sizing
slower nodes. While this distribution isn't uniform, it is fair with regard
to each mapper's capacity.
The dynamic-strategy is implemented by the DynamicInputFormat. It provides
The dynamic-strategy is implemented by the `DynamicInputFormat`. It provides
superior performance under most conditions.
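The strategy is selected on the command line; a short sketch, with hypothetical cluster names:

```bash
# Ask DistCp to use the dynamic strategy rather than the default
# uniform-size partitioning of the copy listing.
hadoop distcp -strategy dynamic \
  hdfs://nn1:8020/datasets hdfs://nn2:8020/datasets
```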
Tuning the number of maps to the size of the source and destination clusters,
@ -432,6 +432,105 @@ $H3 MapReduce and other side-effects
* If `mapreduce.map.speculative` is set final and true, the result of the
copy is undefined.
$H3 DistCp and Object Stores
DistCp works with Object Stores such as Amazon S3, Azure WASB and OpenStack Swift.
Prerequisites
1. The JAR containing the object store implementation is on the classpath,
along with all of its dependencies.
1. Unless the JAR automatically registers its bundled filesystem clients,
the configuration may need to be modified to state the class which
implements the filesystem scheme. All of the ASF's own object store clients
are self-registering.
1. The relevant object store access credentials must be available in the cluster
configuration, or otherwise be available on all cluster hosts; a sketch follows this list.
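As a hedged sketch of that last point, S3A credentials can be supplied per command with
generic `-D` options (the property names assume the S3A connector; the values are placeholders).
Keeping credentials in the cluster configuration or a credential provider is preferable to
putting secrets on the command line.

```bash
# Supply S3A credentials for a single DistCp invocation.
# Placeholder values shown; prefer cluster-side configuration for secrets.
hadoop distcp \
  -D fs.s3a.access.key=YOUR_ACCESS_KEY \
  -D fs.s3a.secret.key=YOUR_SECRET_KEY \
  hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
```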
DistCp can be used to upload data:
```bash
hadoop distcp hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1
```
To download data:
```bash
hadoop distcp s3a://bucket/generated/results hdfs://nn1:8020/results
```
To copy data between object stores:
```bash
hadoop distcp s3a://bucket/generated/results \
wasb://updates@example.blob.core.windows.net
```
And to copy data within an object store:
```bash
hadoop distcp wasb://updates@example.blob.core.windows.net/current \
wasb://updates@example.blob.core.windows.net/old
```
And to use `-update` to copy only changed files:
```bash
hadoop distcp -update -numListstatusThreads 20 \
swift://history.cluster1/2016 \
hdfs://nn1:8020/history/2016
```
Because object stores are slow to list files, consider setting the `-numListstatusThreads` option when performing a `-update` operation
on a large directory tree (the limit is 40 threads).
When `DistCp -update` is used with object stores,
generally only the modification time and length of the individual files are compared,
not any checksums. The fact that most object stores do have valid timestamps
for directories is irrelevant; only the file timestamps are compared.
However, it is important to have the clock of the client computers close
to that of the infrastructure, so that timestamps are consistent between
the client/HDFS cluster and the object store. Otherwise, changed files may be
missed or copied too often.
**Notes**
* The `-atomic` option causes a rename of the temporary data, so significantly
increases the time to commit work at the end of the operation. Furthermore,
as Object Stores other than (optionally) `wasb://` do not offer atomic renames of directories,
the `-atomic` operation doesn't actually deliver what is promised. *Avoid*.
* The `-append` option is not supported.
* The `-diff` and `-rdiff` options are not supported.
* CRC checking will not be performed, irrespective of the value of the `-skipCrc`
flag.
* All `-p` options, including those to preserve permissions, user and group information,
attributes, checksums and replication, are generally ignored. The `wasb://` connector will
preserve the information, but not enforce the permissions.
* Some object store connectors offer an option for in-memory buffering of
output, for example the S3A connector. Using such an option while copying
large files may trigger some form of out-of-memory event,
be it a heap overflow or a YARN container termination.
This is particularly common if the network bandwidth
between the cluster and the object store is limited (such as when working
with remote object stores). It is best to disable/avoid such options and
rely on disk buffering (see the sketch after this list).
* Copy operations within a single object store still take place in the Hadoop cluster,
even when the object store implements a more efficient COPY operation internally.
That is, an operation such as

      hadoop distcp s3a://bucket/datasets/set1 s3a://bucket/datasets/set2

copies each byte down to the Hadoop worker nodes and back to the
bucket. As well as being slow, it means that charges may be incurred.
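As a sketch of the buffering point above (assuming the S3A connector and its
`fs.s3a.fast.upload` option; check the option names for the release in use), leaving
in-memory fast upload disabled keeps output buffered on local disk:

```bash
# Keep S3A output buffered on local disk rather than in memory.
# The option name is an assumption for the S3A connector; verify it against
# the s3a documentation for the Hadoop release in use.
hadoop distcp \
  -D fs.s3a.fast.upload=false \
  hdfs://nn1:8020/datasets s3a://bucket/datasets
```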
Frequently Asked Questions
--------------------------