HADOOP-13309. Document S3A known limitations in file ownership and permission model. Contributed by Chris Nauroth.
parent dbd205762e
commit 309a43925c
@ -373,6 +373,21 @@ a time proportional to the quantity of data to upload, and inversely proportiona
to the network bandwidth. It may also fail —a failure that is better
escalated than ignored.

+1. **Authorization**. Hadoop uses the `FileStatus` class to
+represent core metadata of files and directories, including the owner, group and
+permissions. Object stores might not have a viable way to persist this
+metadata, so they might need to populate `FileStatus` with stub values. Even if
+the object store persists this metadata, it still might not be feasible for the
+object store to enforce file authorization in the same way as a traditional file
+system. If the object store cannot persist this metadata, then the recommended
+convention is:
+* File owner is reported as the current user.
+* File group also is reported as the current user.
+* Directory permissions are reported as 777.
+* File permissions are reported as 666.
+* File system APIs that set ownership and permissions execute successfully
+without error, but they are no-ops.
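The recommended convention above can be modelled in a few lines of Python. This is an illustrative sketch only, not Hadoop code: `StubStatus`, `StubObjectStore`, and all of their method names are hypothetical and do not correspond to any Hadoop class or API.

```python
import getpass
from dataclasses import dataclass

# Stub permission values taken from the convention in the text.
DIR_PERMISSION = 0o777
FILE_PERMISSION = 0o666


@dataclass(frozen=True)
class StubStatus:
    """Hypothetical stand-in for the metadata a FileStatus would carry."""
    path: str
    is_dir: bool
    owner: str
    group: str
    permission: int


class StubObjectStore:
    """Hypothetical client for a store that cannot persist ownership or permissions."""

    def __init__(self, current_user=None):
        # Owner and group are both reported as the current user.
        self.current_user = current_user or getpass.getuser()

    def get_status(self, path, is_dir):
        permission = DIR_PERMISSION if is_dir else FILE_PERMISSION
        return StubStatus(path, is_dir, self.current_user, self.current_user, permission)

    def set_owner(self, path, owner, group):
        # Succeeds without error, but nothing is stored.
        return None

    def set_permission(self, path, permission):
        # Succeeds without error, but nothing is stored.
        return None
```

With this model, calling `set_permission("/data/part-0000", 0o600)` returns normally, and a later `get_status` on the same path still reports 666, which is the behaviour a caller has to be prepared for.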
+
Object stores with these characteristics cannot be used as a direct replacement
for HDFS. In terms of this specification, their implementations of the
specified operations do not match those required. They are considered supported
@ -39,7 +39,7 @@ higher performance.
The specifics of using these filesystems are documented below.

-### Warning #1: Object Stores are not filesystems.
+### Warning #1: Object Stores are not filesystems

Amazon S3 is an example of "an object store". In order to achieve scalability
and especially high availability, S3 has —as many other cloud object stores have
@ -56,14 +56,38 @@ recursive file-by-file operations. They take time at least proportional to
the number of files, during which time partial updates may be visible. If
the operations are interrupted, the filesystem is left in an intermediate state.

-### Warning #2: Because Object stores don't track modification times of directories,
-features of Hadoop relying on this can have unexpected behaviour. E.g. the
+### Warning #2: Object stores don't track modification times of directories
+
+Features of Hadoop relying on this can have unexpected behaviour. E.g. the
AggregatedLogDeletionService of YARN will not remove the appropriate logfiles.

For further discussion on these topics, please consult
[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).

-### Warning #3: your AWS credentials are valuable
+### Warning #3: Object stores have different authorization models

+The object authorization model of S3 is much different from the file
+authorization model of HDFS and traditional file systems. It is not feasible to
+persist file ownership and permissions in S3, so S3A reports stub information
+from APIs that would query this metadata:
+
+* File owner is reported as the current user.
+* File group also is reported as the current user. Prior to Apache Hadoop
+2.8.0, file group was reported as empty (no group associated), which is a
+potential incompatibility problem for scripts that perform positional parsing of
+shell output and other clients that expect to find a well-defined group.
+* Directory permissions are reported as 777.
+* File permissions are reported as 666.
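The positional-parsing problem mentioned for releases before 2.8.0 can be illustrated with a small script that splits a `hadoop fs -ls` style line on whitespace. This is a minimal sketch, assuming the usual column order (permissions, replication, owner, group, size, date, time, path). `parse_ls_line` is a hypothetical helper, and the sample lines in the note below are made up for illustration rather than captured from a real cluster.

```python
def parse_ls_line(line):
    """Parse one file entry, assuming eight whitespace-separated columns."""
    # maxsplit=7 keeps any spaces inside the trailing path column together.
    fields = line.strip().split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}: {line!r}")
    permissions, replication, owner, group, size, date, time, path = fields
    if not size.isdigit():
        # An empty group shifts every later column one position to the left.
        raise ValueError(f"size column is not numeric, columns may have shifted: {line!r}")
    return {
        "permissions": permissions,
        "replication": replication,
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }
```

A line such as `-rw-rw-rw-   1 alice alice       1024 2016-06-27 10:15 s3a://bucket/data/part-0000` parses cleanly. If the group column is blank, the same line yields only seven fields and the helper raises `ValueError`; a script without such a check would silently read the size as the group.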
+
+S3A does not really enforce any authorization checks on these stub permissions.
+Users authenticate to an S3 bucket using AWS credentials. It's possible that
+object ACLs have been defined to enforce authorization at the S3 side, but this
+happens entirely within the S3 service, not within the S3A implementation.
+
+For further discussion on these topics, please consult
+[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).
+
+### Warning #4: Your AWS credentials are valuable
+
Your AWS credentials not only pay for services, they offer read and write
access to the data. Anyone with the credentials can not only read your datasets
@ -78,7 +102,7 @@ Do not inadvertently share these credentials through means such as
If you do any of these: change your credentials immediately!

-### Warning #4: the S3 client provided by Amazon EMR are not from the Apache
+### Warning #5: The S3 clients provided by Amazon EMR are not from the Apache
Software Foundation, and are only supported by Amazon.

Specifically: on Amazon EMR, s3a is not supported, and Amazon recommends