HDFS-4453. Make a simple doc to describe the usage and design of the shortcircuit read feature. Contributed by Colin Patrick McCabe.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-347@1444963 13f79535-47bb-0310-9956-ffa450edef68
Aaron Myers 2013-02-11 21:10:23 +00:00
parent 1132f51a9c
commit aa92072b54
3 changed files with 72 additions and 0 deletions

@ -43,3 +43,5 @@ HDFS-4440. Avoid annoying log message when dfs.domain.socket.path is not set. (C
HDFS-4473. Don't create domain socket unless we need it. (Colin Patrick McCabe via atm)
HDFS-4485. DN should chmod socket path a+w. (Colin Patrick McCabe via atm)
HDFS-4453. Make a simple doc to describe the usage and design of the shortcircuit read feature. (Colin Patrick McCabe via atm)

@ -0,0 +1,68 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
  ---
  Hadoop Distributed File System-${project.version} - Short-Circuit Local Reads
  ---
  ---
  ${maven.build.timestamp}

HDFS Short-Circuit Local Reads

  \[ {{{./index.html}Go Back}} \]

%{toc|section=1|fromDepth=0}

* {Background}

  In <<<HDFS>>>, reads normally go through the <<<DataNode>>>. When the
  client asks the <<<DataNode>>> to read a file, the <<<DataNode>>> reads that
  file off the disk and sends the data to the client over a TCP socket.
  So-called "short-circuit" reads bypass the <<<DataNode>>>, allowing the client
  to read the file directly. This is only possible when the client is
  co-located with the data. Because the data no longer has to pass through the
  <<<DataNode>>> and over a TCP socket, short-circuit reads provide a
  substantial performance boost to many applications.

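
  The choice between the two read paths described above can be modelled as a
  toy function. This is a minimal Python sketch under stated assumptions, not
  the HDFS client's actual logic. The <<<Replica>>> type and
  <<<choose_read_path>>> function are hypothetical, and "co-located" is
  simplified here to "a replica is stored on the client's own host".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical record of one block replica: the host that stores it."""
    host: str


def choose_read_path(client_host: str, replicas: list[Replica],
                     shortcircuit_enabled: bool) -> str:
    """Toy model of the read-path decision.

    Returns "short-circuit" when short-circuit reads are enabled and a
    replica is stored on the client's own host. Otherwise returns "tcp",
    meaning the data is streamed through a DataNode over a TCP socket.
    """
    local = any(r.host == client_host for r in replicas)
    if shortcircuit_enabled and local:
        return "short-circuit"
    return "tcp"


if __name__ == "__main__":
    replicas = [Replica("node1"), Replica("node2")]
    print(choose_read_path("node1", replicas, True))   # short-circuit
    print(choose_read_path("node3", replicas, True))   # tcp
    print(choose_read_path("node1", replicas, False))  # tcp
```

  The model makes the precondition explicit. Short-circuit reads need both the
  feature to be enabled and a local replica, and every other case falls back
  to the normal <<<DataNode>>> path.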
* {Configuration}

  To configure short-circuit local reads, you will need to enable
  <<<libhadoop.so>>>. See {{{../hadoop-common/NativeLibraries.html}Native
  Libraries}} for details on enabling this library.

  Short-circuit reads make use of a UNIX domain socket. This is a special path
  in the filesystem that allows the client and the <<<DataNode>>> to
  communicate. You will need to set the path of this socket with
  <<<dfs.domain.socket.path>>>. The <<<DataNode>>> needs to be able to create
  this path. However, no user other than the hdfs user or root should be able
  to create it. For this reason, paths under <<</var/run>>> or <<</var/lib>>>
  are often used.
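
  The permission rule above can be checked before the <<<DataNode>>> is
  started. The following Python sketch is illustrative only.
  <<<socket_dir_is_safe>>> is a hypothetical helper. It assumes that the
  <<<DataNode>>> user should own the socket's parent directory and that the
  directory should carry no group or world write bits. A complete check would
  also have to examine every ancestor directory.

```python
import os
import stat


def socket_dir_is_safe(socket_path: str, datanode_uid: int) -> bool:
    """Hypothetical pre-flight check for the socket's parent directory.

    The DataNode user must own the directory and be able to write into it
    (so it can create the socket), while group and others must not have
    write permission (so no other user can create the path first).
    """
    parent = os.path.dirname(os.path.abspath(socket_path))
    st = os.stat(parent)
    if st.st_uid != datanode_uid:
        return False
    # The owner needs write and search permission to create an entry.
    needed = stat.S_IWUSR | stat.S_IXUSR
    if (st.st_mode & needed) != needed:
        return False
    # Group- or world-writable directories would let other users create the path.
    return not (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
```

  For the example path in this document, you might call
  <<<socket_dir_is_safe("/var/lib/hadoop-hdfs/dn_socket", pwd.getpwnam("hdfs").pw_uid)>>>,
  which assumes that the <<<DataNode>>> runs as a local user named <<<hdfs>>>.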

  Short-circuit local reads need to be configured on both the <<<DataNode>>>
  and the client.
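
  To see why both sides must agree on the socket path, consider a toy UNIX
  domain socket exchange written with Python's standard <<<socket>>> module.
  This only illustrates the mechanism. HDFS does not use this code and relies
  instead on the native <<<libhadoop.so>>> library mentioned above. The server
  and client below are hypothetical stand-ins for the <<<DataNode>>> and the
  HDFS client.

```python
import os
import socket
import tempfile
import threading


def _recv_all(sock: socket.socket) -> bytes:
    """Read from a stream socket until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def roundtrip_over_domain_socket(message: bytes) -> bytes:
    """Send `message` from a toy client to a toy server over an AF_UNIX
    socket and return the server's reply (the message upper-cased)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for dfs.domain.socket.path; both sides must use this path.
        path = os.path.join(tmp, "dn_socket")

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)  # creates the socket file at `path`
        server.listen(1)
        server.settimeout(5.0)  # avoid hanging forever if no client connects

        def serve() -> None:
            conn, _ = server.accept()
            with conn:
                conn.sendall(_recv_all(conn).upper())

        worker = threading.Thread(target=serve)
        worker.start()
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)  # the client must be given the same path
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)  # signal end of request
                reply = _recv_all(client)
        finally:
            worker.join()
            server.close()
        return reply
```

  If the client were configured with a different path, <<<connect>>> would
  fail. This is why <<<dfs.domain.socket.path>>> has to match on the
  <<<DataNode>>> and on the client.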
* {Example Configuration}
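
  Here is an example configuration. Because short-circuit reads must be
  configured on both sides, the same properties go into <<<hdfs-site.xml>>>
  on the <<<DataNode>>> and on the client.

----
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
----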
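
  As a quick sanity check, the example configuration can be parsed with
  Python's standard <<<xml.etree.ElementTree>>> module. The helper
  <<<read_shortcircuit_settings>>> is hypothetical. It treats a missing
  <<<dfs.client.read.shortcircuit>>> as disabled, and it rejects an enabled
  configuration that does not set <<<dfs.domain.socket.path>>>, because a
  socket path must be set for short-circuit reads.

```python
import xml.etree.ElementTree as ET


def read_shortcircuit_settings(xml_text: str) -> dict:
    """Hypothetical helper: pull the short-circuit properties out of a
    Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value

    # A missing key is treated as "disabled" in this sketch.
    enabled = props.get("dfs.client.read.shortcircuit", "false").lower() == "true"
    socket_path = props.get("dfs.domain.socket.path", "")
    if enabled and not socket_path:
        raise ValueError("dfs.domain.socket.path must be set when "
                         "dfs.client.read.shortcircuit is true")
    return {"enabled": enabled, "socket_path": socket_path}


EXAMPLE = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # {'enabled': True, 'socket_path': '/var/lib/hadoop-hdfs/dn_socket'}
    print(read_shortcircuit_settings(EXAMPLE))
```

  Running the same check against the client's and the <<<DataNode>>>'s
  <<<hdfs-site.xml>>> is one way to confirm that both sides have the feature
  enabled and use the same socket path.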

@ -61,6 +61,8 @@
<item name="Federation" href="hadoop-project-dist/hadoop-hdfs/Federation.html"/>
<item name="WebHDFS REST API" href="hadoop-project-dist/hadoop-hdfs/WebHDFS.html"/>
<item name="HttpFS Gateway" href="hadoop-hdfs-httpfs/index.html"/>
<item name="Short Circuit Local Reads"
href="hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html"/>
</menu>
<menu name="MapReduce" inherit="top">