HADOOP-13946. Document how HDFS updates timestamps in the FS spec; compare with object stores. Contributed by Steve Loughran

(cherry picked from commit fd26783aaf)
Mingliang Liu 2017-03-10 00:21:20 -08:00
parent 4218671f3c
commit 9ea6b24c49
1 changed files with 85 additions and 0 deletions

@ -392,3 +392,88 @@ Object stores with these characteristics, can not be used as a direct replacemen
for HDFS. In terms of this specification, their implementations of the
specified operations do not match those required. They are considered supported
by the Hadoop development community, but not to the same extent as HDFS.
#### Timestamps
`FileStatus` entries have a modification time and an access time.
1. The exact behavior as to when these timestamps are set and whether or not they are valid
varies between filesystems, and potentially between individual installations of a filesystem.
1. The granularity of the timestamps is likewise specific to both a filesystem
and potentially individual installations.
HDFS does not update the modification time of a file while it is being written to.
Specifically:
* `FileSystem.create()` creation: a zero-byte file is listed; the modification time is
set to the current time as seen on the NameNode.
* Writes to a file via the output stream returned in the `create()` call: the modification
time *does not change*.
* When `OutputStream.close()` is called, all remaining data is written, the file is closed and
the NameNode is updated with the final size of the file. The modification time is set to
the time the file was closed.
* Opening a file for appends via an `append()` operation does not change the modification
time of the file until the `close()` call is made on the output stream.
* `FileSystem.setTimes()` can be used to explicitly set the time on a file.
* When a file is renamed, its modification time is not changed, but the source
and destination directories have their modification times updated.
* The rarely used operations: `FileSystem.concat()`, `createSnapshot()`,
`createSymlink()` and `truncate()` all update the modification time.
* The access time granularity is set, in milliseconds, by the option `dfs.namenode.access.time.precision`;
the default granularity is 1 hour. If the precision is set to zero, access times
are not recorded.
* If a modification or access time is not set, the value of that `FileStatus`
field is 0.
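
The access time precision rule can be illustrated with a simplified model. This is a sketch, not HDFS code: the class `AccessTimeModel` and its method are hypothetical, and it assumes that a stored access time is only rewritten once it is older than the configured precision.

```java
/** Hypothetical model of access-time recording; not part of Hadoop. */
final class AccessTimeModel {
    /** One hour in milliseconds: the default granularity stated above. */
    static final long DEFAULT_PRECISION_MS = 60L * 60L * 1000L;

    /**
     * Returns the access time to store after a read at {@code now}.
     * Assumption: a precision of 0 disables recording; otherwise the
     * stored value only changes once it is older than the precision.
     */
    static long recordAccess(long storedAtime, long now, long precisionMs) {
        if (precisionMs == 0) {
            return storedAtime;   // access times are not recorded
        }
        if (now > storedAtime + precisionMs) {
            return now;           // stale enough: record the new access
        }
        return storedAtime;       // within the window: keep the old value
    }
}
```

Under this model, two reads of the same file a few minutes apart leave the access time unchanged, which is why access times should be treated as coarse hints rather than an exact record of the last read.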
Other filesystems may have different behaviors. In particular,
* Access times may or may not be supported; even if the underlying FS supports access times,
the option is often disabled for performance reasons.
* The granularity of the timestamps is an implementation-specific detail.
Object stores have an even vaguer view of time, which can be summarized as
"it varies".
* The timestamp granularity is likely to be 1 second, that being the granularity
of timestamps returned in HTTP HEAD and GET requests.
* Access times are likely to be unset. That is, `FileStatus.getAccessTime() == 0`.
* The modification timestamp for a newly created file MAY be that of the
`create()` call, or the actual time at which the PUT request was initiated.
This may be in the `FileSystem.create()` call, in the final
`OutputStream.close()` operation, or at some point in between.
* The modification time may not be updated in the `close()` call.
* The timestamp is likely to be in UTC or the TZ of the object store. If the
client is in a different timezone, the timestamp of objects may be ahead or
behind that of the client.
* Object stores with cached metadata databases (for example: AWS S3 with
an in-memory or a DynamoDB metadata store) may have timestamps generated
from the local system clock, rather than that of the service.
This is an optimization to avoid round-trip calls to the object stores.
* A file's modification time is often the same as its creation time.
* The `FileSystem.setTimes()` operation to set file timestamps *may* be ignored.
* `FileSystem.setPermission()` (the `chmod` operation) may update modification times (example: Azure `wasb://`).
* If `FileSystem.append()` is supported, the changes and modification time
are likely to only become visible after the output stream is closed.
* Out-of-band operations to data in object stores (that is: direct requests
to object stores which bypass the Hadoop FileSystem APIs), may result
in different timestamps being stored and/or returned.
* As the notion of a directory structure is often simulated, the timestamps
of directories *may* be artificially generated, perhaps using the current
system time.
* As `rename()` operations are often implemented as a COPY + DELETE, the
timestamps of renamed objects may become that of the time the rename of an
object was started, rather than the timestamp of the source object.
* The exact timestamp behavior may vary between different object store installations,
even with the same object store client.
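
Code that compares timestamps across stores has to allow for differing granularity and for unset values. The following is a minimal sketch under those assumptions; `TimestampCompare` and its methods are hypothetical helpers, not part of the Hadoop API.

```java
import java.time.Instant;

/** Hypothetical helpers for comparing timestamps from different stores. */
final class TimestampCompare {
    /**
     * True if both times are equal once truncated to the given granularity,
     * for example 1000 ms when one side only stores whole seconds.
     */
    static boolean equalWhenTruncated(long aMs, long bMs, long granularityMs) {
        return Math.floorDiv(aMs, granularityMs) == Math.floorDiv(bMs, granularityMs);
    }

    /** Renders a timestamp in UTC, treating 0 as "unset". */
    static String describe(long timeMs) {
        return timeMs == 0 ? "unset" : Instant.ofEpochMilli(timeMs).toString();
    }
}
```

Truncating both sides, rather than testing for a small difference, matches the case where a file is copied from a store with millisecond timestamps to one that keeps only whole seconds. It does not cover stores that assign a fresh timestamp on upload.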
Finally, note that the Apache Hadoop project cannot make any guarantees about
whether the timestamp behavior of a remote object store will remain consistent
over time: they are third-party services, usually accessed via third-party libraries.
The best strategy here is "experiment with the exact endpoint you intend to work with".
Furthermore, if you intend to use any caching/consistency layer, test with that
feature enabled. Retest after upgrading Hadoop and after updates to the
endpoint object store.
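
As an illustration of such an experiment, the sketch below records the modification time after creation, after a write, and after close. It is an assumption-laden stand-in: it probes the local filesystem through `java.nio.file`, and `TimestampProbe` is a hypothetical class. Against a Hadoop filesystem the same three measurements would be taken around `FileSystem.create()` and `OutputStream.close()`, reading `FileStatus.getModificationTime()` each time.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical probe: samples modification times around create, write and close. */
final class TimestampProbe {
    /** Returns {afterCreate, afterWrite, afterClose} in epoch milliseconds. */
    static long[] probe(Path dir) throws IOException {
        Path file = dir.resolve("timestamp-probe.bin");
        long afterCreate;
        long afterWrite;
        try (OutputStream out = Files.newOutputStream(file)) {
            afterCreate = Files.getLastModifiedTime(file).toMillis();
            out.write(new byte[4096]);
            out.flush();
            afterWrite = Files.getLastModifiedTime(file).toMillis();
        }
        long afterClose = Files.getLastModifiedTime(file).toMillis();
        Files.delete(file);
        return new long[] {afterCreate, afterWrite, afterClose};
    }

    /** Runs the probe in a fresh temporary directory and cleans up afterwards. */
    static long[] probeInTempDir() {
        try {
            Path dir = Files.createTempDirectory("timestamp-probe");
            long[] times = probe(dir);
            Files.delete(dir);
            return times;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] t = probeInTempDir();
        System.out.printf("create=%d write=%d close=%d%n", t[0], t[1], t[2]);
    }
}
```

Which of the three values differ, and by how much, is exactly what varies between filesystems and object stores, so the output should be read as a description of one endpoint rather than a general result.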