diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
index de6b861a66c..e656d6ff87d 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
@@ -120,7 +120,8 @@ Return the data at the current position.
### `InputStream.read(buffer[], offset, length)`
Read `length` bytes of data into the destination buffer, starting at offset
-`offset`
+`offset`. The source of the data is the current position of the stream,
+as implicitly set in `pos`.
#### Preconditions
@@ -129,6 +130,7 @@ Read `length` bytes of data into the destination buffer, starting at offset
length >= 0
offset < len(buffer)
length <= len(buffer) - offset
+ pos >= 0 else raise EOFException, IOException
Exceptions that may be raised on precondition failure are
@@ -136,20 +138,39 @@ Exceptions that may be raised on precondition failure are
ArrayIndexOutOfBoundsException
RuntimeException
+Not all filesystems check the `isOpen` state.
+
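+As a sketch only (with `in` standing for any open stream instance), the
+following calls violate these preconditions and, under the `java.io`
+contract, raise an `IndexOutOfBoundsException` or similar `RuntimeException`:
+
+    byte[] buffer = new byte[8];
+    in.read(buffer, 9, 1);    // offset beyond len(buffer)
+    in.read(buffer, 0, -1);   // negative length
+    in.read(buffer, 4, 8);    // length > len(buffer) - offset
+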
#### Postconditions
if length == 0 :
result = 0
- elseif pos > len(data):
- result -1
+ else if pos > len(data):
+ result = -1
else
let l = min(length, len(data)-length) :
- buffer' = buffer where forall i in [0..l-1]:
- buffer'[o+i] = data[pos+i]
- FSDIS' = (pos+l, data, true)
- result = l
+ buffer' = buffer where forall i in [0..l-1]:
+ buffer'[o+i] = data[pos+i]
+ FSDIS' = (pos+l, data, true)
+ result = l
+
+The `java.io` API states that if the amount of data to be read (i.e. `length`)
+is greater than zero, then the call must block until the amount of data
+available is greater than zero, that is, until there is some data. The call
+is not required to return when the buffer is full, or indeed to block until
+there is no data left in the stream.
+
+That is, rather than `l` being simply defined as `min(length, len(data)-length)`,
+it is strictly an integer in the range `1..min(length, len(data)-length)`.
+While the caller may expect as much of the buffer as possible to be filled
+in, it is within the specification for an implementation to always return
+a smaller number, perhaps only ever 1 byte.
+
+What is critical is that unless the requested `length` is 0, the call
+must block until at least one byte is returned (or `-1` if the end of the
+stream has been reached). Thus, for any data source of length greater than
+zero, repeated invocations of this `read()` operation will eventually read
+all the data.
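+
+As an illustration, and not part of the specification itself: a caller that
+needs the buffer filled completely must loop until the requested number of
+bytes has arrived. This sketch assumes only the `java.io.InputStream`
+contract; `readFully` here is a hypothetical helper, not an API of this class:
+
+    static void readFully(InputStream in, byte[] buffer, int offset, int length)
+        throws IOException {
+      int read = 0;
+      while (read < length) {
+        // May return any value in 1..(length - read), or -1 at end of stream.
+        int count = in.read(buffer, offset + read, length - read);
+        if (count < 0) {
+          throw new EOFException("Stream ended before "
+              + length + " bytes could be read");
+        }
+        read += count;
+      }
+    }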
### `Seekable.seek(s)`
diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
index 50f0b4d53be..6bef2bfcf40 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
@@ -973,6 +973,10 @@ public class DFSInputStream extends FSInputStream
@Override
public synchronized int read(@Nonnull final byte buf[], int off, int len)
throws IOException {
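+    // Validate the position and buffer arguments up front; a zero-length
+    // read then returns immediately without advancing the stream.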
+ validatePositionedReadArgs(pos, buf, off, len);
+ if (len == 0) {
+ return 0;
+ }
ReaderStrategy byteArrayReader = new ByteArrayStrategy(buf);
try (TraceScope ignored =
dfsClient.newPathTraceScope("DFSInputStream#byteArrayRead", src)) {