diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
index de6b861a66c..e656d6ff87d 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
@@ -120,7 +120,8 @@ Return the data at the current position.
 
 ### `InputStream.read(buffer[], offset, length)`
 
 Read `length` bytes of data into the destination buffer, starting at offset
-`offset`
+`offset`. The source of the data is the current position of the stream,
+as implicitly set in `pos`.
 
 #### Preconditions
@@ -129,6 +130,7 @@ Read `length` bytes of data into the destination buffer, starting at offset
     length >= 0
     offset < len(buffer)
     length <= len(buffer) - offset
+    pos >= 0 else raise EOFException, IOException
 
 Exceptions that may be raised on precondition failure are
 
@@ -136,20 +138,39 @@ Exceptions that may be raised on precondition failure are
     ArrayIndexOutOfBoundsException
     RuntimeException
 
+Not all filesystems check the `isOpen` state.
+
 #### Postconditions
 
     if length == 0 :
       result = 0
 
-    elseif pos > len(data):
-      result -1
+    else if pos >= len(data):
+      result = -1
 
     else
       let l = min(length, len(data)-pos) :
-      buffer' = buffer where forall i in [0..l-1]:
-      buffer'[o+i] = data[pos+i]
-      FSDIS' = (pos+l, data, true)
-      result = l
+        buffer' = buffer where forall i in [0..l-1]:
+           buffer'[o+i] = data[pos+i]
+        FSDIS' = (pos+l, data, true)
+        result = l
+
+The `java.io` API states that if the amount of data to be read (i.e. `length`)
+is greater than zero, then the call must block until the amount of data
+available is greater than zero —that is, until there is some data. The call
+is not required to return when the buffer is full, or indeed to block until
+there is no data left in the stream.
+
+That is, rather than `l` being simply defined as `min(length, len(data)-pos)`,
+it is strictly an integer in the range `1..min(length, len(data)-pos)`.
+While the caller may expect as much of the buffer as possible to be filled
+in, it is within the specification for an implementation to always return
+a smaller number, perhaps only ever 1 byte.
+
+What is critical is that unless the destination buffer size is 0, the call
+must block until at least one byte is returned. Thus, for any data source
+of length greater than zero, repeated invocations of this `read()` operation
+will eventually read all the data.
 
 ### `Seekable.seek(s)`
 
diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
index 50f0b4d53be..6bef2bfcf40 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
@@ -973,6 +973,10 @@ public class DFSInputStream extends FSInputStream
   @Override
   public synchronized int read(@Nonnull final byte buf[], int off, int len)
       throws IOException {
+    validatePositionedReadArgs(pos, buf, off, len);
+    if (len == 0) {
+      return 0;
+    }
     ReaderStrategy byteArrayReader = new ByteArrayStrategy(buf);
     try (TraceScope ignored = dfsClient.newPathTraceScope(
         "DFSInputStream#byteArrayRead", src)) {
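
The `pos >= 0` precondition added above is what the new `validatePositionedReadArgs`
call in `DFSInputStream#read` enforces before any bytes are read; the method
itself is inherited from `FSInputStream` and is not shown in this patch. As a
rough standalone sketch of the checks the spec's preconditions describe, using
a hypothetical `validate` helper rather than the actual implementation:

    import java.io.EOFException;
    import java.io.IOException;

    public final class ReadPreconditions {
      private ReadPreconditions() {
      }

      /** Hypothetical standalone version of the precondition checks. */
      public static void validate(long pos, byte[] buffer, int offset, int length)
          throws IOException {
        if (pos < 0) {
          // pos >= 0 else raise EOFException, IOException
          throw new EOFException("position is negative: " + pos);
        }
        if (buffer == null) {
          throw new NullPointerException("null destination buffer");
        }
        if (length < 0) {
          // length >= 0
          throw new ArrayIndexOutOfBoundsException("length is negative: " + length);
        }
        if (offset < 0 || offset >= buffer.length) {
          // offset < len(buffer)
          throw new ArrayIndexOutOfBoundsException("offset out of range: " + offset);
        }
        if (length > buffer.length - offset) {
          // length <= len(buffer) - offset
          throw new ArrayIndexOutOfBoundsException(
              "length " + length + " exceeds remaining buffer capacity "
                  + (buffer.length - offset));
        }
      }
    }

Raising `EOFException` for a negative position and `ArrayIndexOutOfBoundsException`
for bad buffer bounds matches the exception choices listed in the preconditions.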
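
Separately, the blocking guarantee spelled out at the end of the specification
text (a read into a non-empty buffer returns at least one byte, or -1 at end of
stream) is what makes the classic drain-the-stream loop terminate. A minimal
sketch, assuming a hypothetical `readFully` helper that is not part of this patch:

    import java.io.IOException;
    import java.io.InputStream;

    public final class ReadFullyExample {
      private ReadFullyExample() {
      }

      /**
       * Read until the buffer is full or the stream is exhausted.
       * Because read() must return at least one byte whenever length > 0
       * and data remains, every iteration makes progress.
       */
      public static int readFully(InputStream in, byte[] buffer)
          throws IOException {
        int total = 0;
        while (total < buffer.length) {
          int n = in.read(buffer, total, buffer.length - total);
          if (n == -1) {
            break; // end of stream reached before the buffer was filled
          }
          total += n; // n >= 1 here, so total strictly increases
        }
        return total;
      }
    }

Short reads are legal at every step, which is why callers must loop rather than
assume a single `read()` fills the buffer.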