HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading in concurrent context. (stack via stevel)

This commit is contained in:
Steve Loughran 2014-11-26 11:43:46 +00:00
parent 0c8a8ba86e
commit 6a01497d7c
2 changed files with 49 additions and 16 deletions

View File

@ -31,8 +31,8 @@ with extensions that add key assumptions to the system.
reads starting at this offset. reads starting at this offset.
1. The cost of forward and backward seeks is low. 1. The cost of forward and backward seeks is low.
1. There is no requirement for the stream implementation to be thread-safe. 1. There is no requirement for the stream implementation to be thread-safe.
Callers MUST assume that instances are not thread-safe. 1. BUT, if a stream implements [PositionedReadable](#PositionedReadable),
"positioned reads" MUST be thread-safe.
Files are opened via `FileSystem.open(p)`, which, if successful, returns: Files are opened via `FileSystem.open(p)`, which, if successful, returns:
@ -84,7 +84,7 @@ unexpectedly.
FSDIS' = ((undefined), (undefined), False) FSDIS' = ((undefined), (undefined), False)
### `Seekable.getPos()` ### <a name="Seekable.getPos"></a>`Seekable.getPos()`
Return the current position. The outcome when a stream is closed is undefined. Return the current position. The outcome when a stream is closed is undefined.
@ -97,7 +97,7 @@ Return the current position. The outcome when a stream is closed is undefined.
result = pos(FSDIS) result = pos(FSDIS)
### `InputStream.read()` ### <a name="InputStream.read"></a> `InputStream.read()`
Return the data at the current position. Return the data at the current position.
@ -117,7 +117,7 @@ Return the data at the current position.
result = -1 result = -1
### `InputStream.read(buffer[], offset, length)` ### <a name="InputStream.read.buffer[]"></a> `InputStream.read(buffer[], offset, length)`
Read `length` bytes of data into the destination buffer, starting at offset Read `length` bytes of data into the destination buffer, starting at offset
`offset` `offset`
@ -151,7 +151,7 @@ Exceptions that may be raised on precondition failure are
FSDIS' = (pos+l, data, true) FSDIS' = (pos+l, data, true)
result = l result = l
### `Seekable.seek(s)` ### <a name="Seekable.seek"></a>`Seekable.seek(s)`
#### Preconditions #### Preconditions
@ -237,14 +237,47 @@ class, which can react to a checksum error in a read by attempting to source
the data elsewhere. It a new source can be found it attempts to reread and the data elsewhere. It a new source can be found it attempts to reread and
recheck that portion of the file. recheck that portion of the file.
## interface `PositionedReadable` ## <a name="PositionedReadable"></a> interface `PositionedReadable`
The `PositionedReadable` operations provide the ability to The `PositionedReadable` operations supply "positioned reads" ("pread").
read data into a buffer from a specific position in They provide the ability to read data into a buffer from a specific
the data stream. position in the data stream. Positioned reads equate to a
[`Seekable.seek`](#Seekable.seek) at a particular offset followed by a
[`InputStream.read(buffer[], offset, length)`](#InputStream.read.buffer[]),
only there is a single method invocation, rather than `seek` then
`read`, and two positioned reads can *optionally* run concurrently
over a single instance of a `FSDataInputStream` stream.
Although the interface declares that it must be thread safe, The interface declares positioned reads thread-safe (some of the
some of the implementations do not follow this guarantee. implementations do not follow this guarantee).
Any positional read run concurrent with a stream operation &mdash; e.g.
[`Seekable.seek`](#Seekable.seek), [`Seekable.getPos()`](#Seekable.getPos),
and [`InputStream.read()`](#InputStream.read) &mdash; MUST run in
isolation; there must not be mutual interference.
Concurrent positional reads and stream operations MUST be serializable;
one may block the other so they run in series but, for better throughput
and 'liveness', they SHOULD run concurrently.
Given two parallel positional reads, one at `pos1` for `len1` into buffer
`dest1`, and another at `pos2` for `len2` into buffer `dest2`, AND given
a concurrent, stream read run after a seek to `pos3`, the resultant
buffers MUST be filled as follows, even if the reads happen to overlap
on the underlying stream:
// Positioned read #1
read(pos1, dest1, ... len1) -> dest1[0..len1 - 1] =
[data(FS, path, pos1), data(FS, path, pos1 + 1) ... data(FS, path, pos1 + len1 - 1]
// Positioned read #2
read(pos2, dest2, ... len2) -> dest2[0..len2 - 1] =
[data(FS, path, pos2), data(FS, path, pos2 + 1) ... data(FS, path, pos2 + len2 - 1]
// Stream read
seek(pos3);
read(dest3, ... len3) -> dest3[0..len3 - 1] =
[data(FS, path, pos3), data(FS, path, pos3 + 1) ... data(FS, path, pos3 + len3 - 1]
#### Implementation preconditions #### Implementation preconditions
@ -265,9 +298,6 @@ of `pos` is unchanged at the end of the operation
pos(FSDIS') == pos(FSDIS) pos(FSDIS') == pos(FSDIS)
There are no guarantees that this holds *during* the operation.
#### Failure states #### Failure states
For any operations that fail, the contents of the destination For any operations that fail, the contents of the destination
@ -326,7 +356,7 @@ The semantics of this are exactly equivalent to
are expected to receive access to the data of `FS.Files[p]` at the time of opening. are expected to receive access to the data of `FS.Files[p]` at the time of opening.
* If the underlying data is changed during the read process, these changes MAY or * If the underlying data is changed during the read process, these changes MAY or
MAY NOT be visible. MAY NOT be visible.
* Such changes are visible MAY be partially visible. * Such changes that are visible MAY be partially visible.
At time t0 At time t0

View File

@ -141,6 +141,9 @@ Release 2.7.0 - UNRELEASED
HDFS-7440. Consolidate snapshot related operations in a single class. HDFS-7440. Consolidate snapshot related operations in a single class.
(wheat9) (wheat9)
HDFS-6803 Document DFSClient#DFSInputStream expectations reading and preading
in concurrent context. (stack via stevel)
OPTIMIZATIONS OPTIMIZATIONS
BUG FIXES BUG FIXES