HDFS-6007. Update documentation about short-circuit local reads (iwasakims via cmccabe)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1578994 13f79535-47bb-0310-9956-ffa450edef68
parent e52c1535b8
commit bd98fa152d
@@ -255,6 +255,9 @@ Release 2.5.0 - UNRELEASED
 
   IMPROVEMENTS
 
+    HDFS-6007. Update documentation about short-circuit local reads (iwasakims
+    via cmccabe)
+
   OPTIMIZATIONS
 
   BUG FIXES
@@ -1416,17 +1416,6 @@
   </description>
 </property>
 
-<property>
-  <name>dfs.domain.socket.path</name>
-  <value></value>
-  <description>
-    Optional. This is a path to a UNIX domain socket that will be used for
-    communication between the DataNode and local HDFS clients.
-    If the string "_PORT" is present in this path, it will be replaced by the
-    TCP port of the DataNode.
-  </description>
-</property>
-
 <property>
   <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
   <value>10737418240</value> <!-- 10 GB -->
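The "_PORT" substitution described above is literal: the string in the configured value is replaced by the TCP port of the DataNode. A minimal sketch, with an illustrative socket path and port (neither is a shipped default):

----
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<!-- A DataNode whose TCP port is 50010 would create and listen on the
     socket /var/run/hadoop-hdfs/dn.50010; clients on the same host
     resolve the same dfs.domain.socket.path value. -->
----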
@@ -1755,4 +1744,101 @@
   </description>
 </property>
 
+<property>
+  <name>dfs.client.read.shortcircuit</name>
+  <value>false</value>
+  <description>
+    This configuration parameter turns on short-circuit local reads.
+  </description>
+</property>
+
+<property>
+  <name>dfs.domain.socket.path</name>
+  <value></value>
+  <description>
+    Optional. This is a path to a UNIX domain socket that will be used for
+    communication between the DataNode and local HDFS clients.
+    If the string "_PORT" is present in this path, it will be replaced by the
+    TCP port of the DataNode.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.read.shortcircuit.skip.checksum</name>
+  <value>false</value>
+  <description>
+    If this configuration parameter is set, short-circuit local reads will
+    skip checksums. This is normally not recommended, but it may be useful
+    for special setups. You might consider using this if you are doing your
+    own checksumming outside of HDFS.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
+  <value>256</value>
+  <description>
+    The DFSClient maintains a cache of recently opened file descriptors.
+    This parameter controls the size of that cache. Setting this higher
+    will use more file descriptors, but potentially provide better
+    performance on workloads involving lots of seeks.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
+  <value>300000</value>
+  <description>
+    This controls the minimum amount of time that file descriptors need to
+    sit in the client cache context before they can be closed for being
+    inactive for too long.
+  </description>
+</property>
+
+<property>
+  <name>dfs.datanode.shared.file.descriptor.paths</name>
+  <value>/dev/shm,/tmp</value>
+  <description>
+    A comma-separated list of paths to directories in which shared memory
+    segments are created. The client and the DataNode exchange information
+    via these shared memory segments. The DataNode tries the paths in order
+    until creation of a shared memory segment succeeds.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.use.legacy.blockreader.local</name>
+  <value>false</value>
+  <description>
+    The legacy short-circuit reader implementation based on HDFS-2246 is
+    used if this configuration parameter is true. This is for platforms
+    other than Linux, where the new implementation based on HDFS-347 is
+    not available.
+  </description>
+</property>
+
+<property>
+  <name>dfs.block.local-path-access.user</name>
+  <value></value>
+  <description>
+    A comma-separated list of the users allowed to open block files
+    for legacy short-circuit local reads.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.domain.socket.data.traffic</name>
+  <value>false</value>
+  <description>
+    This controls whether we will try to pass normal data traffic over UNIX
+    domain sockets rather than over TCP sockets for node-local data
+    transfer. This is currently experimental and turned off by default.
+  </description>
+</property>
+
 </configuration>
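Of the keys added above, <<<dfs.client.read.shortcircuit>>> and <<<dfs.domain.socket.path>>> are the two that have to be set to use the HDFS-347 style short-circuit reads; the others are tuning or fallback knobs. A minimal sketch of a client-side hdfs-site.xml, with illustrative values (the socket path and cache numbers are examples, not recommendations):

----
<configuration>
  <!-- Turn on short-circuit local reads on the client. -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- Must match the socket path the DataNode was configured with. -->
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn_socket</value>
  </property>
  <!-- Optional tuning for seek-heavy workloads: keep more file
       descriptors cached, and keep them for longer. -->
  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.size</name>
    <value>1000</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
    <value>600000</value>
  </property>
</configuration>
----

The same <<<dfs.domain.socket.path>>> value also has to be present in the DataNode's configuration, since short-circuit reads must be configured on both sides.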
@@ -21,7 +21,9 @@ HDFS Short-Circuit Local Reads
 
 %{toc|section=1|fromDepth=0}
 
-* {Background}
+* {Short-Circuit Local Reads}
+
+** Background
 
   In <<<HDFS>>>, reads normally go through the <<<DataNode>>>. Thus, when the
   client asks the <<<DataNode>>> to read a file, the <<<DataNode>>> reads that
@@ -31,7 +33,7 @@ HDFS Short-Circuit Local Reads
   the client is co-located with the data. Short-circuit reads provide a
   substantial performance boost to many applications.
 
-* {Configuration}
+** Setup
 
   To configure short-circuit local reads, you will need to enable
   <<<libhadoop.so>>>. See
@@ -39,16 +41,19 @@ HDFS Short-Circuit Local Reads
   Libraries}} for details on enabling this library.
 
   Short-circuit reads make use of a UNIX domain socket. This is a special path
-  in the filesystem that allows the client and the DataNodes to communicate.
-  You will need to set a path to this socket. The DataNode needs to be able to
+  in the filesystem that allows the client and the <<<DataNode>>>s to communicate.
+  You will need to set a path to this socket. The <<<DataNode>>> needs to be able to
   create this path. On the other hand, it should not be possible for any user
-  except the hdfs user or root to create this path. For this reason, paths
+  except the HDFS user or root to create this path. For this reason, paths
   under <<</var/run>>> or <<</var/lib>>> are often used.
 
+  The client and the <<<DataNode>>> exchange information via a shared memory segment
+  on <<</dev/shm>>>.
+
   Short-circuit local reads need to be configured on both the <<<DataNode>>>
   and the client.
 
-* {Example Configuration}
+** Example Configuration
 
   Here is an example configuration.
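The shared memory segment mentioned above is created under one of the directories listed in <<<dfs.datanode.shared.file.descriptor.paths>>>, which the hdfs-default.xml change above introduces with the default <<</dev/shm,/tmp>>>. A sketch of a DataNode-side override, assuming a host where <<</dev/shm>>> is not usable and <<</var/run/hadoop-hdfs>>> is a DataNode-writable directory (both the scenario and the path are illustrative):

----
<property>
  <name>dfs.datanode.shared.file.descriptor.paths</name>
  <!-- Tried in order; falls back to /tmp if the first directory fails. -->
  <value>/var/run/hadoop-hdfs,/tmp</value>
</property>
----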
@@ -65,32 +70,43 @@ HDFS Short-Circuit Local Reads
 </configuration>
 ----
 
-* {Configuration Keys}
-
-* dfs.client.read.shortcircuit
-
-  This configuration parameter turns on short-circuit local reads.
-
-* dfs.client.read.shortcircuit.skip.checksum
-
-  If this configuration parameter is set, short-circuit local reads will skip
-  checksums. This is normally not recommended, but it may be useful for
-  special setups. You might consider using this if you are doing your own
-  checksumming outside of HDFS.
-
-* dfs.client.read.shortcircuit.streams.cache.size
-
-  The DFSClient maintains a cache of recently opened file descriptors. This
-  parameter controls the size of that cache. Setting this higher will use more
-  file descriptors, but potentially provide better performance on workloads
-  involving lots of seeks.
-
-* dfs.client.read.shortcircuit.streams.cache.expiry.ms
-
-  This controls the minimum amount of time file descriptors need to sit in the
-  FileInputStreamCache before they can be closed for being inactive for too long.
-
-* dfs.client.domain.socket.data.traffic
-
-  This control whether we will try to pass normal data traffic over UNIX domain
-  sockets.
+* Legacy HDFS Short-Circuit Local Reads
+
+  The legacy implementation of short-circuit local reads,
+  in which the client directly opens the HDFS block files,
+  is still available for platforms other than Linux.
+  Setting the value of <<<dfs.client.use.legacy.blockreader.local>>>,
+  in addition to <<<dfs.client.read.shortcircuit>>>,
+  to true enables this feature.
+
+  You also need to set the value of <<<dfs.datanode.data.dir.perm>>>
+  to <<<750>>> instead of the default <<<700>>> and
+  chmod/chown the directory tree under <<<dfs.datanode.data.dir>>>
+  as readable to the client and the <<<DataNode>>>.
+  You must take caution because this means that
+  the client can read all of the block files, bypassing HDFS permissions.
+
+  Because legacy short-circuit local reads are insecure,
+  access to this feature is limited to the users listed in
+  the value of <<<dfs.block.local-path-access.user>>>.
+
+----
+<configuration>
+  <property>
+    <name>dfs.client.read.shortcircuit</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>dfs.client.use.legacy.blockreader.local</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>dfs.datanode.data.dir.perm</name>
+    <value>750</value>
+  </property>
+  <property>
+    <name>dfs.block.local-path-access.user</name>
+    <value>foo,bar</value>
+  </property>
+</configuration>
+----