HDFS-5347. Add HDFS NFS user guide. Contributed by Brandon Li
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1534377 13f79535-47bb-0310-9956-ffa450edef68
parent fc99f3e8a3
commit 4380e480ce
@@ -422,6 +422,8 @@ Release 2.2.1 - UNRELEASED

and range in error message. (Kousuke Saruta via suresh)

HDFS-5365. Fix libhdfs compile error on FreeBSD9. (Radim Kolar via cnauroth)

HDFS-5347. Add HDFS NFS user guide. (brandonli)

Release 2.2.0 - 2013-10-13

@@ -0,0 +1,258 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

---
Hadoop Distributed File System-${project.version} - HDFS NFS Gateway
---
---
${maven.build.timestamp}

HDFS NFS Gateway

\[ {{{./index.html}Go Back}} \]

%{toc|section=1|fromDepth=0}

* {Overview}

The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system.
Currently, the NFS Gateway supports and enables the following usage patterns:

* Users can browse the HDFS file system through their local file system
on NFSv3 client compatible operating systems.

* Users can download files from the HDFS file system on to their
local file system.

* Users can upload files from their local file system directly to the
HDFS file system.

* Users can stream data directly to HDFS through the mount point. File
append is supported but random write is not supported.

The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and the HADOOP_CONF directory.
The NFS gateway can be on the same host as a DataNode, the NameNode, or any HDFS client.


* {Configuration}

The NFS gateway can work with its default settings in most cases. However, users are
strongly recommended to update a few configuration properties based on their use
cases. All the related configuration properties can be added or updated in hdfs-site.xml.
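As a rough illustration, the properties discussed in this section can be rendered into the usual Hadoop `<configuration>`/`<property>` layout programmatically. The sketch below is a hypothetical helper (`render_hdfs_site` is not part of Hadoop) built on Python's standard `xml.etree.ElementTree` (`ET.indent` needs Python 3.9 or later); the property names and values are the ones used in this guide.

```python
import xml.etree.ElementTree as ET


def render_hdfs_site(properties):
    """Render a {name: value} dict as hdfs-site.xml text.

    Hypothetical helper for illustration only; Hadoop does not ship it.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root, space="  ")  # pretty-print; leaves <name>/<value> text untouched
    # encoding="unicode" makes tostring return str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hdfs_site({
        "dfs.access.time.precision": 3600000,
        "dfs.nfs3.dump.dir": "/tmp/.hdfs-nfs",
        "dfs.nfs.exports.allowed.hosts": "* rw",
    }))
```

The generated `<property>` elements can then be merged into an existing hdfs-site.xml by hand or by a deployment tool.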

* If the client mounts the export with access time updates allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to be restarted after
this property is changed. On some Unix systems, the user can disable access time updates
by mounting the export with "noatime".

----
<property>
<name>dfs.access.time.precision</name>
<value>3600000</value>
<description>The access time for HDFS file is precise up to this value.
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
</description>
</property>
----
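To make the precision semantics concrete, here is a simplified model. It assumes, as a sketch rather than a transcription of the NameNode source, that a read records a new access time only when the current time is more than `dfs.access.time.precision` milliseconds past the stored access time, and that a precision of 0 disables access times entirely. The function name is hypothetical.

```python
HOUR_MS = 60 * 60 * 1000  # default dfs.access.time.precision: 3600000 ms


def should_update_atime(now_ms, stored_atime_ms, precision_ms=HOUR_MS):
    """Return True if a read at now_ms would record a new access time.

    Simplified model: precision 0 (or less) disables access times;
    otherwise the stored value is refreshed only once it is more than
    precision_ms old.
    """
    if precision_ms <= 0:
        return False
    return now_ms > stored_atime_ms + precision_ms
```

Under this model a file that is read many times within one precision window triggers at most one access-time update per window, which limits how often the NameNode has to persist access-time changes.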

* Users are expected to update the file dump directory. The NFS client often
reorders writes, so sequential writes can arrive at the NFS gateway in random
order. This directory is used to temporarily save out-of-order writes
before they are written to HDFS. For each file, the out-of-order writes are dumped after
they accumulate in memory beyond a certain threshold (e.g., 1MB).
One needs to make sure the directory has enough
space. For example, if the application uploads 10 files of
100MB each, it is recommended for this directory to have roughly 1GB of space in case a
worst-case write reorder happens to every file. Only the NFS gateway needs to be restarted after
this property is updated.

----
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
----
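The following is a toy model of the behaviour described above, not the gateway's actual implementation. Writes that arrive at the expected offset are passed straight through, out-of-order writes are held in memory, and once the held bytes exceed a threshold (1MB by default here, matching the example in the text) the model flags that the pending data would be dumped to `dfs.nfs3.dump.dir`. `ReorderBuffer` and `worst_case_dump_bytes` are hypothetical names.

```python
MB = 1024 * 1024


class ReorderBuffer:
    """Toy model of per-file out-of-order write handling (illustrative only)."""

    def __init__(self, dump_threshold=1 * MB):
        self.next_offset = 0      # next byte offset that can be appended to HDFS
        self.pending = {}         # offset -> data for writes that arrived early
        self.pending_bytes = 0
        self.dump_threshold = dump_threshold
        self.dumped = False       # True once pending data would spill to the dump dir
        self.hdfs = bytearray()   # stands in for the file's contents in HDFS

    def write(self, offset, data):
        """Accept one NFS WRITE of `data` at byte `offset`."""
        if offset < self.next_offset:
            # Overwriting bytes already sent to HDFS would be a random write,
            # which this model simply rejects.
            raise ValueError("random write at offset %d not supported" % offset)
        if offset > self.next_offset:
            # Arrived early: hold it until the gap before it is filled.
            old = self.pending.get(offset)
            if old is not None:
                self.pending_bytes -= len(old)  # a retransmission replaces the old copy
            self.pending[offset] = data
            self.pending_bytes += len(data)
            if self.pending_bytes > self.dump_threshold:
                self.dumped = True  # in the described design: spill to dfs.nfs3.dump.dir
            return
        self._append(data)
        # Drain buffered writes that have become contiguous.
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.pending_bytes -= len(chunk)
            self._append(chunk)

    def _append(self, data):
        self.hdfs += data
        self.next_offset += len(data)


def worst_case_dump_bytes(num_files, file_size_bytes):
    """Upper bound on dump-dir usage if every file's writes are fully reordered."""
    return num_files * file_size_bytes
```

`worst_case_dump_bytes(10, 100 * MB)` reproduces the sizing example in the text: ten 100MB files need roughly 1GB of dump space if every write is reordered.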

* By default, the export can be mounted by any client. To better control access,
users can update the following property. The value string contains a machine name and
an access privilege, separated by whitespace
characters. The machine name format can be a single host, a wildcard, or an IPv4 network. The
access privilege uses rw or ro to specify read-write or read-only access of the machines to exports. If the access
privilege is not provided, the default is read-only. Entries are separated by ";".
For example: "192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;". Only the NFS gateway needs to be restarted after
this property is updated.

----
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
----
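A minimal sketch of how a `dfs.nfs.exports.allowed.hosts` value breaks down under the rules above: entries are split on ";", each entry is a host pattern optionally followed by rw or ro, and a missing privilege defaults to read-only. `parse_exports` and `access_for` are hypothetical helpers; the first-match order and the matching rules in `access_for` (shell-style wildcards via `fnmatch`, IPv4 networks via `ipaddress`) are assumptions about the semantics, not the gateway's code.

```python
import fnmatch
import ipaddress


def parse_exports(value):
    """Split an exports string into (host_pattern, privilege) pairs."""
    entries = []
    for raw in value.split(";"):
        parts = raw.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing ";"
        host = parts[0]
        privilege = parts[1].lower() if len(parts) > 1 else "ro"  # default: read-only
        if privilege not in ("rw", "ro"):
            raise ValueError("unknown access privilege: %r" % privilege)
        entries.append((host, privilege))
    return entries


def access_for(entries, hostname, ip):
    """Return the privilege of the first entry matching the client, or None."""
    for pattern, privilege in entries:
        if "/" in pattern:  # IPv4 network such as 192.168.0.0/22
            if ipaddress.ip_address(ip) in ipaddress.ip_network(pattern, strict=False):
                return privilege
        elif fnmatch.fnmatch(hostname, pattern) or pattern == ip:
            return privilege
    return None
```

With the example value from the text, a client at 192.168.3.7 gets rw through the /22 network, `host42.example.com` gets the default ro, and an unlisted host gets no access.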

* Customize log settings. To get an NFS debug trace, users can edit the log4j.properties file
to add the following. Note that debug traces, especially for ONCRPC, can be very verbose.

To change the logging level:

-----------------------------------------------
log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
-----------------------------------------------

To get more details of ONCRPC requests:

-----------------------------------------------
log4j.logger.org.apache.hadoop.oncrpc=DEBUG
-----------------------------------------------


* {Start and stop NFS gateway service}

Three daemons are required to provide NFS service: rpcbind (or portmap), mountd and nfsd.
The NFS gateway process has both nfsd and mountd. It shares the HDFS root "/" as the
only export. It is recommended to use the portmap included in the NFS gateway package. Even
though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
{{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}. More detailed discussions can
be found in {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.

[[1]] Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):

-------------------------
service nfs stop

service rpcbind stop
-------------------------

[[2]] Start the package-included portmap (needs root privileges):

-------------------------
hadoop portmap

OR

hadoop-daemon.sh start portmap
-------------------------

[[3]] Start mountd and nfsd.

No root privileges are required for this command. However, ensure that the user starting
the Hadoop cluster and the user starting the NFS gateway are the same.

-------------------------
hadoop nfs3

OR

hadoop-daemon.sh start nfs3
-------------------------

Note: if the NFS gateway is started with the hadoop-daemon.sh script, its log can be found in the Hadoop log folder.

[[4]] Stop NFS gateway services.

-------------------------
hadoop-daemon.sh stop nfs3

hadoop-daemon.sh stop portmap
-------------------------


* {Verify validity of NFS related services}

[[1]] Execute the following command to verify that all the services are up and running:

-------------------------
rpcinfo -p $nfs_server_ip
-------------------------

You should see output similar to the following:

-------------------------
program vers proto   port
 100005    1   tcp   4242  mountd
 100005    2   udp   4242  mountd
 100005    2   tcp   4242  mountd
 100000    2   tcp    111  portmapper
 100000    2   udp    111  portmapper
 100005    3   udp   4242  mountd
 100005    1   udp   4242  mountd
 100003    3   tcp   2049  nfs
 100005    3   tcp   4242  mountd
-------------------------

[[2]] Verify that the HDFS namespace is exported and can be mounted.

-------------------------
showmount -e $nfs_server_ip
-------------------------

You should see output similar to the following:

-------------------------
Exports list on $nfs_server_ip :
/ (everyone)
-------------------------
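The two checks above can be scripted. The sketch below parses text in the `rpcinfo -p` format shown above and reports which of the required services (portmapper, mountd, nfs) are missing, and it extracts export paths from `showmount -e` output. It only parses text handed to it; running the commands themselves (for example with `subprocess`) is left out, and all function names are hypothetical.

```python
REQUIRED = {"portmapper", "mountd", "nfs"}


def registered_services(rpcinfo_output):
    """Map service name -> set of (version, proto, port) from `rpcinfo -p` text."""
    services = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        _program, vers, proto, port, name = fields[:5]
        services.setdefault(name, set()).add((int(vers), proto, int(port)))
    return services


def missing_services(rpcinfo_output):
    """Required NFS-related services that are not registered with the portmapper."""
    return REQUIRED - set(registered_services(rpcinfo_output))


def export_paths(showmount_output):
    """Return exported paths from `showmount -e` text, skipping the header line."""
    paths = []
    for line in showmount_output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):  # header such as "Exports list on host :"
            continue
        paths.append(line.split()[0])
    return paths
```

For a healthy gateway, `missing_services(...)` should be empty, the `nfs` entry should include version 3 over tcp, and `export_paths(...)` should return `["/"]`.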


* {Mount the export “/”}

Currently, NFSv3 on the gateway only uses TCP as the transport protocol.
NLM is not supported, so the mount option "nolock" is needed. It's recommended to use a
hard mount. This is because, even after the client sends all data to the
NFS gateway, the NFS gateway may take some extra time to transfer data to HDFS
when writes were reordered by the NFS client kernel.

If a soft mount has to be used, the user should give it a relatively
long timeout (no less than the default timeout on the host).

The users can mount the HDFS namespace as shown below:

-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point
-------------------------------------------------------------------

Then the users can access HDFS as part of the local file system, except that
hard links and random writes are not supported yet.
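For scripted mounts, the recommended options can be assembled in one place. This is a hypothetical helper that returns the argument list for the `mount` command shown above, adding the standard Linux NFS `hard` option by default, or `soft` plus a `timeo` value when a soft mount is requested. On Linux, `timeo` is given in tenths of a second (see nfs(5) for your platform); per the advice above, choose a value no shorter than the host's default.

```python
def nfs_mount_command(server, mount_point, soft=False, timeo_deciseconds=None):
    """Build argv for mounting the gateway's "/" export (NFSv3 over TCP, no NLM)."""
    options = ["vers=3", "proto=tcp", "nolock"]
    if soft:
        if timeo_deciseconds is None:
            # The guide advises giving soft mounts a relatively long timeout.
            raise ValueError("a soft mount should be given an explicit, long timeout")
        options += ["soft", "timeo=%d" % timeo_deciseconds]
    else:
        # Recommended: with a hard mount the client keeps retrying while the
        # gateway is still flushing reordered writes to HDFS.
        options.append("hard")
    return ["mount", "-t", "nfs", "-o", ",".join(options), "%s:/" % server, mount_point]
```

The resulting list can be passed to `subprocess.run(..., check=True)` by a user with the privileges `mount` requires; `gateway.example.com` or any other host name used with it is only a placeholder.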

* {User authentication and mapping}

The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client
accesses the mount point, the NFS client passes the UID to the NFS gateway.
The NFS gateway does a lookup to find the user name from the UID, and then passes the
user name to HDFS along with the HDFS requests.
For example, if the current user on the NFS client is "admin", when the user accesses
the mounted directory, the NFS gateway will access HDFS as user "admin". To access HDFS
as the user "hdfs", one needs to switch the current user to "hdfs" on the client system
when accessing the mounted directory.
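To illustrate what the gateway receives, the sketch below encodes and decodes the AUTH_UNIX credential body defined in RFC 1057 (`stamp`, `machinename`, `uid`, `gid`, `gids`) using XDR rules (big-endian 32-bit integers, strings padded to a multiple of 4 bytes), then resolves the UID to a local user name with Python's Unix-only `pwd` module, mirroring the lookup described above. This is a protocol illustration, not Hadoop's ONCRPC code, and the function names are hypothetical.

```python
import pwd
import struct


def _pad(n):
    """Bytes of zero padding XDR adds after n bytes of opaque data."""
    return (4 - n % 4) % 4


def encode_auth_unix(stamp, machinename, uid, gid, gids=()):
    """XDR-encode an RFC 1057 auth_unix credential body."""
    name = machinename.encode("ascii")
    if len(name) > 255 or len(gids) > 16:
        raise ValueError("machinename<255> or gids<16> limit exceeded")
    out = struct.pack(">I", stamp)
    out += struct.pack(">I", len(name)) + name + b"\0" * _pad(len(name))
    out += struct.pack(">II", uid, gid)
    out += struct.pack(">I", len(gids)) + b"".join(struct.pack(">I", g) for g in gids)
    return out


def decode_auth_unix(body):
    """Decode an auth_unix credential body into a dict."""
    stamp, name_len = struct.unpack_from(">II", body, 0)
    pos = 8
    machinename = body[pos:pos + name_len].decode("ascii")
    pos += name_len + _pad(name_len)
    uid, gid, ngids = struct.unpack_from(">III", body, pos)
    pos += 12
    gids = list(struct.unpack_from(">%dI" % ngids, body, pos))
    return {"stamp": stamp, "machinename": machinename, "uid": uid, "gid": gid, "gids": gids}


def user_for_uid(uid):
    """Resolve a UID to a local user name, or None if no account has it."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None
```

Only the UID (and group IDs) travel in the credential, never the user name, which is why the name attached to a UID must agree on the client and the gateway.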

The system administrator must ensure that the user on the NFS client host has the same
name and UID as that on the NFS gateway host. This is usually not a problem if
the same user management system (e.g., LDAP/NIS) is used to create and deploy users on
HDFS nodes and the NFS client node. If the user account is created manually on different hosts, one might need to
modify the UID (e.g., do "usermod -u 123 myusername") on either the NFS client or the NFS gateway host
in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found
in the {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
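Before users start working through the mount point, an administrator might compare accounts across hosts. The sketch below assumes each host's user table has been collected as passwd-format text (for example the output of `getent passwd`, whose colon-separated lines carry the UID in the third field) and reports, for each client user, the name the gateway would resolve that user's UID to whenever it differs. The helper names are hypothetical.

```python
def parse_passwd(text):
    """Parse passwd-format lines (name:x:uid:gid:gecos:home:shell) into {name: uid}."""
    users = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(":")
        users[fields[0]] = int(fields[2])
    return users


def gateway_identity_problems(client_users, gateway_users):
    """Return {client_name: gateway_name_or_None} for users mapped to the wrong HDFS user.

    Models the lookup described above: the client sends only the UID, and the
    gateway turns that UID back into a name using its own user table. If several
    gateway names share a UID, the last one seen wins in this sketch.
    """
    by_uid = {uid: name for name, uid in gateway_users.items()}
    problems = {}
    for name, uid in sorted(client_users.items()):
        resolved = by_uid.get(uid)
        if resolved != name:
            problems[name] = resolved
    return problems
```

For example, if "admin" has UID 1001 on the client but UID 1001 belongs to "hdfs" on the gateway, the client's admin would reach HDFS as hdfs; aligning the UIDs (e.g., with usermod) makes the report empty.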

@@ -80,6 +80,7 @@
<item name="HttpFS Gateway" href="hadoop-hdfs-httpfs/index.html"/>
<item name="Short Circuit Local Reads"
href="hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html"/>
<item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/>
</menu>

<menu name="MapReduce" inherit="top">