HDFS-10277. PositionedReadable test testReadFullyZeroByteFile failing in HDFS. Contributed by Steve Loughran.

(cherry picked from commit a409508b3f)
2016-04-12 14:53:01 +09:00 · 2016-04-12 14:53:01 +09:00 · 8f968171bd
parent d93cfefea5
commit 8f968171bd
2 changed files with 32 additions and 7 deletions
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
@ -120,7 +120,8 @@ Return the data at the current position.
 ### <a name="InputStream.read.buffer[]"></a> `InputStream.read(buffer[], offset, length)`

 Read `length` bytes of data into the destination buffer, starting at offset
-`offset`
+`offset`. The source of the data is the current position of the stream,
+as implicitly set in `pos`

 #### Preconditions

@ -129,6 +130,7 @@ Read `length` bytes of data into the destination buffer, starting at offset
    length >= 0
    offset < len(buffer)
    length <= len(buffer) - offset
+    pos >= 0 else raise EOFException, IOException

 Exceptions that may be raised on precondition failure are

@ -136,20 +138,39 @@ Exceptions that may be raised on precondition failure are
    ArrayIndexOutOfBoundsException
    RuntimeException

+Not all filesystems check the `isOpen` state.
+
 #### Postconditions

    if length == 0 :
      result = 0

-    elseif pos > len(data):
-      result -1
+    else if pos > len(data):
+      result = -1

    else
      let l = min(length, len(data)-length) :
-          buffer' = buffer where forall i in [0..l-1]:
-              buffer'[o+i] = data[pos+i]
-          FSDIS' = (pos+l, data, true)
-          result = l
+        buffer' = buffer where forall i in [0..l-1]:
+           buffer'[o+i] = data[pos+i]
+        FSDIS' = (pos+l, data, true)
+        result = l
+
+The `java.io` API states that if the amount of data to be read (i.e. `length`)
+then the call must block until the amount of data available is greater than
+zero —that is, until there is some data. The call is not required to return
+when the buffer is full, or indeed block until there is no data left in
+the stream.
+
+That is, rather than `l` being simply defined as `min(length, len(data)-length)`,
+it strictly is an integer in the range `1..min(length, len(data)-length)`.
+While the caller may expect for as much as the buffer as possible to be filled
+in, it is within the specification for an implementation to always return
+a smaller number, perhaps only ever 1 byte.
+
+What is critical is that unless the destination buffer size is 0, the call
+must block until at least one byte is returned. Thus, for any data source
+of length greater than zero, repeated invocations of this `read()` operation
+will eventually read all the data.

 ### <a name="Seekable.seek"></a>`Seekable.seek(s)`

--- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
+++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
@ -974,6 +974,10 @@ public class DFSInputStream extends FSInputStream
  @Override
  public synchronized int read(@Nonnull final byte buf[], int off, int len)
      throws IOException {
+    validatePositionedReadArgs(pos, buf, off, len);
+    if (len == 0) {
+      return 0;
+    }
    ReaderStrategy byteArrayReader = new ByteArrayStrategy(buf);
    try (TraceScope scope =
             dfsClient.newReaderTraceScope("DFSInputStream#byteArrayRead",