HDFS-5347. Add HDFS NFS user guide. Contributed by Brandon Li

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1534377 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Brandon Li 2013-10-21 21:15:20 +00:00
parent fc99f3e8a3
commit 4380e480ce
3 changed files with 261 additions and 0 deletions


@ -422,6 +422,8 @@ Release 2.2.1 - UNRELEASED
and range in error message. (Kousuke Saruta via suresh)
HDFS-5365. Fix libhdfs compile error on FreeBSD9. (Radim Kolar via cnauroth)
HDFS-5347. Add HDFS NFS user guide. (brandonli)
Release 2.2.0 - 2013-10-13


@ -0,0 +1,258 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop Distributed File System-${project.version} - HDFS NFS Gateway
---
---
${maven.build.timestamp}
HDFS NFS Gateway
\[ {{{./index.html}Go Back}} \]
%{toc|section=1|fromDepth=0}
* {Overview}
The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system.
Currently NFS Gateway supports and enables the following usage patterns:
* Users can browse the HDFS file system through their local file system
on NFSv3 client compatible operating systems.
* Users can download files from the HDFS file system onto their
local file system.
* Users can upload files from their local file system directly to the
HDFS file system.
* Users can stream data directly to HDFS through the mount point. File
append is supported but random write is not supported.
The NFS gateway machine needs the same things required to run an HDFS client, such as the Hadoop JAR files and the HADOOP_CONF directory.
The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
* {Configuration}
The NFS gateway can work with its default settings in most cases. However, users are
strongly recommended to update a few configuration properties based on their use
cases. All the related configuration properties can be added or updated in hdfs-site.xml.
* If the client mounts the export with access time updates allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to be restarted after
this property is changed. On some Unix systems, the user can disable access time updates
by mounting the export with "noatime".
----
<property>
<name>dfs.access.time.precision</name>
<value>3600000</value>
<description>The access time for HDFS file is precise upto this value.
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
</description>
</property>
----
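For example, on Linux clients that support it, access time updates can instead be switched
off from the client side by adding "noatime" to the mount options (a sketch only; apart from
"noatime", the options and the $server/$mount_point placeholders match the mount command
shown in the mount section below):

-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock,noatime $server:/ $mount_point
-------------------------------------------------------------------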
* Users are expected to update the file dump directory. The NFS client often
reorders writes, so sequential writes can arrive at the NFS gateway in random
order. This directory is used to temporarily save out-of-order writes
before writing to HDFS. For each file, the out-of-order writes are dumped after
they accumulate in memory beyond a certain threshold (e.g., 1MB).
One needs to make sure the directory has enough
space. For example, if the application uploads 10 files of
100MB each, it is recommended that this directory have roughly 1GB of space in case a
worst-case write reorder happens to every file. Only the NFS gateway needs to be restarted after
this property is updated.
----
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
----
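Before starting the gateway, it is worth checking that the file system holding the dump
directory has enough free space (a simple sketch; the path below is the default value
shown above and should be adjusted if a different directory is configured):

-------------------------
mkdir -p /tmp/.hdfs-nfs
df -h /tmp/.hdfs-nfs
-------------------------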
* By default, the export can be mounted by any client. To better control access,
users can update the following property. The value string contains a machine name and
an access privilege, separated by whitespace
characters. The machine name format can be a single host, wildcards, or IPv4 networks. The
access privilege uses rw or ro to specify read-write or read-only access of the machines to exports. If the access
privilege is not provided, the default is read-only. Entries are separated by ";".
For example: "192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;". Only the NFS gateway needs to be restarted after
this property is updated.
----
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
----
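For instance, the example entry list quoted above can be configured as follows
(the network and host names are placeholders):

----
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;</value>
</property>
----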
* Customize log settings. To get an NFS debug trace, users can edit the log4j.properties file
to add the following. Note that debug trace, especially for ONCRPC, can be very verbose.
To change the logging level:
-----------------------------------------------
log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
-----------------------------------------------
To get more details of ONCRPC requests:
-----------------------------------------------
log4j.logger.org.apache.hadoop.oncrpc=DEBUG
-----------------------------------------------
* {Start and stop NFS gateway service}
Three daemons are required to provide NFS service: rpcbind (or portmap), mountd and nfsd.
The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the
only export. It is recommended to use the portmap included in the NFS gateway package. Even
though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
{{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}. More detailed discussion can
be found in {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.
[[1]] Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-------------------------
service nfs stop
service rpcbind stop
-------------------------
[[2]] Start the package-included portmap (needs root privileges):
-------------------------
hadoop portmap
OR
hadoop-daemon.sh start portmap
-------------------------
[[3]] Start mountd and nfsd.
No root privileges are required for this command. However, ensure that the user starting
the Hadoop cluster and the user starting the NFS gateway are the same.
-------------------------
hadoop nfs3
OR
hadoop-daemon.sh start nfs3
-------------------------
Note that if the NFS gateway is started with the hadoop-daemon.sh script, its log can be found in the Hadoop log folder.
[[4]] Stop NFS gateway services.
-------------------------
hadoop-daemon.sh stop nfs3
hadoop-daemon.sh stop portmap
-------------------------
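After the gateway daemons are started (steps 2 and 3 above), a quick sanity check is to
list the Java processes with jps and look for the portmap and nfs3 processes (a hedged
example; the process names correspond to the gateway's Java main classes and may vary
across releases):

-------------------------
jps
-------------------------

The output should include entries similar to "Portmap" and "Nfs3" (with their process ids),
in addition to any other Hadoop daemons running on the host.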
* {Verify validity of NFS related services}
[[1]] Execute the following command to verify if all the services are up and running:
-------------------------
rpcinfo -p $nfs_server_ip
-------------------------
You should see output similar to the following:
-------------------------
program vers proto port
100005 1 tcp 4242 mountd
100005 2 udp 4242 mountd
100005 2 tcp 4242 mountd
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100005 3 udp 4242 mountd
100005 1 udp 4242 mountd
100003 3 tcp 2049 nfs
100005 3 tcp 4242 mountd
-------------------------
[[2]] Verify if the HDFS namespace is exported and can be mounted.
-------------------------
showmount -e $nfs_server_ip
-------------------------
You should see output similar to the following:
-------------------------
Exports list on $nfs_server_ip :
/ (everyone)
-------------------------
* {Mount the export “/”}
Currently NFS v3 only uses TCP as the transport protocol.
NLM is not supported, so the mount option "nolock" is needed. It's recommended to use a
hard mount. This is because, even after the client sends all data to the
NFS gateway, it may take the NFS gateway some extra time to transfer the data to HDFS
when writes were reordered by the NFS client kernel.
If a soft mount has to be used, the user should give it a relatively
long timeout (at least no less than the default timeout on the host).
The users can mount the HDFS namespace as shown below:
-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point
-------------------------------------------------------------------
Then the users can access HDFS as part of the local file system, except that
hard link and random write are not supported yet.
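Once mounted, ordinary file system commands work against the export. A small illustrative
session (the mount point /hdfs_mnt, the user directory and the file names are hypothetical):

-------------------------------------------------------------------
ls /hdfs_mnt/user                          # browse the HDFS namespace
cp /hdfs_mnt/user/admin/report.csv /tmp/   # download a file from HDFS
cp /tmp/data.log /hdfs_mnt/user/admin/     # upload a file to HDFS
-------------------------------------------------------------------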
* {User authentication and mapping}
The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client
accesses the mount point, the NFS client passes the UID to the NFS gateway.
The NFS gateway does a lookup to find the user name from the UID, and then passes the
user name to HDFS along with the HDFS requests.
For example, if the NFS client's current user is "admin", then when the user accesses
the mounted directory, the NFS gateway will access HDFS as the user "admin". To access HDFS
as the user "hdfs", one needs to switch the current user to "hdfs" on the client system
when accessing the mounted directory.
The system administrator must ensure that the user on the NFS client host has the same
name and UID as that on the NFS gateway host. This is usually not a problem if
the same user management system (e.g., LDAP/NIS) is used to create and deploy users on the
HDFS nodes and the NFS client node. In case the user account is created manually on different hosts, one might need to
modify the UID (e.g., do "usermod -u 123 myusername") on either the NFS client or the NFS gateway host
in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found
in the {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
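A quick way to confirm that a user maps to the same UID on both sides is to compare the
output of id on the two hosts (a sketch; "myusername" is the same illustrative account
name used above):

-------------------------
id -u myusername    # run on the NFS client host
id -u myusername    # run on the NFS gateway host; the two UIDs should match
-------------------------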


@ -80,6 +80,7 @@
<item name="HttpFS Gateway" href="hadoop-hdfs-httpfs/index.html"/>
<item name="Short Circuit Local Reads"
href="hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html"/>
<item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/>
</menu>
<menu name="MapReduce" inherit="top">