HDFS-5347. Add HDFS NFS user guide. Contributed by Brandon Li

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1534377 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Brandon Li 2013-10-21 21:15:20 +00:00
parent fc99f3e8a3
commit 4380e480ce
3 changed files with 261 additions and 0 deletions


@ -422,6 +422,8 @@ Release 2.2.1 - UNRELEASED
and range in error message. (Kousuke Saruta via suresh)
HDFS-5365. Fix libhdfs compile error on FreeBSD9. (Radim Kolar via cnauroth)
HDFS-5347. Add HDFS NFS user guide. (brandonli)
Release 2.2.0 - 2013-10-13


@ -0,0 +1,258 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop Distributed File System-${project.version} - HDFS NFS Gateway
---
---
${maven.build.timestamp}
HDFS NFS Gateway
\[ {{{./index.html}Go Back}} \]
%{toc|section=1|fromDepth=0}
* {Overview}
The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system.
Currently NFS Gateway supports and enables the following usage patterns:
* Users can browse the HDFS file system through their local file system
on NFSv3 client compatible operating systems.
* Users can download files from the HDFS file system onto their
local file system.
* Users can upload files from their local file system directly to the
HDFS file system.
* Users can stream data directly to HDFS through the mount point. File
append is supported but random write is not supported.
The NFS gateway machine needs the same things required to run an HDFS client, such as the Hadoop JAR files and the HADOOP_CONF directory.
The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
* {Configuration}
The NFS gateway can work with its default settings in most cases. However, users are
strongly recommended to update a few configuration properties based on their use
cases. All the related configuration properties can be added or updated in hdfs-site.xml.
* If the client mounts the export with access time updates allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to be restarted after
this property is changed. On some Unix systems, the user can disable access time updates
by mounting the export with "noatime".
----
<property>
<name>dfs.access.time.precision</name>
<value>3600000</value>
<description>The access time for HDFS file is precise upto this value.
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
</description>
</property>
----
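For example, on Linux clients that support it, access time updates can instead be switched
off from the client side by adding "noatime" to the mount options (a sketch only; apart from
"noatime", the options and the $server/$mount_point placeholders match the mount command
shown in the mount section below):

-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock,noatime $server:/ $mount_point
-------------------------------------------------------------------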
* Users are expected to update the file dump directory. The NFS client often
reorders writes, so sequential writes can arrive at the NFS gateway in random
order. This directory is used to temporarily save out-of-order writes
before writing to HDFS. For each file, the out-of-order writes are dumped after
they accumulate in memory beyond a certain threshold (e.g., 1MB).
One needs to make sure the directory has enough
space. For example, if the application uploads 10 files of
100MB each, it is recommended that this directory have roughly 1GB of space in case a
worst-case write reorder happens to every file. Only the NFS gateway needs to be restarted after
this property is updated.
----
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
----
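Before starting the gateway, it is worth checking that the file system holding the dump
directory has enough free space (a simple sketch; the path below is the default value
shown above and should be adjusted if a different directory is configured):

-------------------------
mkdir -p /tmp/.hdfs-nfs
df -h /tmp/.hdfs-nfs
-------------------------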
* By default, the export can be mounted by any client. To better control access,
users can update the following property. The value string contains a machine name and
an access privilege, separated by whitespace
characters. The machine name format can be a single host, wildcards, or IPv4 networks. The
access privilege uses rw or ro to specify read-write or read-only access of the machines to exports. If the access
privilege is not provided, the default is read-only. Entries are separated by ";".
For example: "192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;". Only the NFS gateway needs to be restarted after
this property is updated.
----
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
----
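For instance, the example entry list quoted above can be configured as follows
(the network and host names are placeholders):

----
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;</value>
</property>
----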
* Customize log settings. To get an NFS debug trace, users can edit the log4j.properties file
to add the following. Note that debug trace, especially for ONCRPC, can be very verbose.
To change the logging level:
-----------------------------------------------
log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
-----------------------------------------------
To get more details of ONCRPC requests:
-----------------------------------------------
log4j.logger.org.apache.hadoop.oncrpc=DEBUG
-----------------------------------------------
* {Start and stop NFS gateway service}
Three daemons are required to provide NFS service: rpcbind (or portmap), mountd and nfsd.
The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the
only export. It is recommended to use the portmap included in the NFS gateway package. Even
though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
{{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}. More detailed discussion can
be found in {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.
[[1]] Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-------------------------
service nfs stop
service rpcbind stop
-------------------------
[[2]] Start the package-included portmap (needs root privileges):
-------------------------
hadoop portmap
OR
hadoop-daemon.sh start portmap
-------------------------
[[3]] Start mountd and nfsd.
No root privileges are required for this command. However, ensure that the user starting
the Hadoop cluster and the user starting the NFS gateway are the same.
-------------------------
hadoop nfs3
OR
hadoop-daemon.sh start nfs3
-------------------------
Note that if the NFS gateway is started with the hadoop-daemon.sh script, its log can be found in the Hadoop log folder.
[[4]] Stop NFS gateway services.
-------------------------
hadoop-daemon.sh stop nfs3
hadoop-daemon.sh stop portmap
-------------------------
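After the gateway daemons are started (steps 2 and 3 above), a quick sanity check is to
list the Java processes with jps and look for the portmap and nfs3 processes (a hedged
example; the process names correspond to the gateway's Java main classes and may vary
across releases):

-------------------------
jps
-------------------------

The output should include entries similar to "Portmap" and "Nfs3" (with their process ids),
in addition to any other Hadoop daemons running on the host.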
* {Verify validity of NFS related services}
[[1]] Execute the following command to verify if all the services are up and running:
-------------------------
rpcinfo -p $nfs_server_ip
-------------------------
You should see output similar to the following:
-------------------------
program vers proto port
100005 1 tcp 4242 mountd
100005 2 udp 4242 mountd
100005 2 tcp 4242 mountd
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100005 3 udp 4242 mountd
100005 1 udp 4242 mountd
100003 3 tcp 2049 nfs
100005 3 tcp 4242 mountd
-------------------------
[[2]] Verify if the HDFS namespace is exported and can be mounted.
-------------------------
showmount -e $nfs_server_ip
-------------------------
You should see output similar to the following:
-------------------------
Exports list on $nfs_server_ip :
/ (everyone)
-------------------------
* {Mount the export “/”}
Currently NFS v3 only uses TCP as the transport protocol.
NLM is not supported, so the mount option "nolock" is needed. It's recommended to use a
hard mount. This is because, even after the client sends all data to the
NFS gateway, it may take the NFS gateway some extra time to transfer the data to HDFS
when writes were reordered by the NFS client kernel.
If a soft mount has to be used, the user should give it a relatively
long timeout (at least no less than the default timeout on the host).
The users can mount the HDFS namespace as shown below:
-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point
-------------------------------------------------------------------
Then the users can access HDFS as part of the local file system, except that
hard link and random write are not supported yet.
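Once mounted, ordinary file system commands work against the export. A small illustrative
session (the mount point /hdfs_mnt, the user directory and the file names are hypothetical):

-------------------------------------------------------------------
ls /hdfs_mnt/user                          # browse the HDFS namespace
cp /hdfs_mnt/user/admin/report.csv /tmp/   # download a file from HDFS
cp /tmp/data.log /hdfs_mnt/user/admin/     # upload a file to HDFS
-------------------------------------------------------------------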
* {User authentication and mapping}
The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client
accesses the mount point, the NFS client passes the UID to the NFS gateway.
The NFS gateway does a lookup to find the user name from the UID, and then passes the
user name to HDFS along with the HDFS requests.
For example, if the NFS client's current user is "admin", then when the user accesses
the mounted directory, the NFS gateway will access HDFS as the user "admin". To access HDFS
as the user "hdfs", one needs to switch the current user to "hdfs" on the client system
when accessing the mounted directory.
The system administrator must ensure that the user on the NFS client host has the same
name and UID as that on the NFS gateway host. This is usually not a problem if
the same user management system (e.g., LDAP/NIS) is used to create and deploy users on the
HDFS nodes and the NFS client node. In case the user account is created manually on different hosts, one might need to
modify the UID (e.g., do "usermod -u 123 myusername") on either the NFS client or the NFS gateway host
in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found
in the {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
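A quick way to confirm that a user maps to the same UID on both sides is to compare the
output of id on the two hosts (a sketch; "myusername" is the same illustrative account
name used above):

-------------------------
id -u myusername    # run on the NFS client host
id -u myusername    # run on the NFS gateway host; the two UIDs should match
-------------------------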


@ -80,6 +80,7 @@
<item name="HttpFS Gateway" href="hadoop-hdfs-httpfs/index.html"/>
<item name="Short Circuit Local Reads"
href="hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html"/>
<item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/>
</menu>
<menu name="MapReduce" inherit="top">