diff --git a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm index 4857cc797a7..546db252b90 100644 --- a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm +++ b/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm @@ -571,440 +571,6 @@ $ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_D $ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR ---- -* {Running Hadoop in Secure Mode} - - This section deals with important parameters to be specified in - to run Hadoop in <> with strong, Kerberos-based - authentication. - - * <<>> - - Ensure that HDFS and YARN daemons run as different Unix users, for e.g. - <<>> and <<>>. Also, ensure that the MapReduce JobHistory - server runs as user <<>>. - - It's recommended to have them share a Unix group, for e.g. <<>>. - -*---------------+----------------------------------------------------------------------+ -|| User:Group || Daemons | -*---------------+----------------------------------------------------------------------+ -| hdfs:hadoop | NameNode, Secondary NameNode, Checkpoint Node, Backup Node, DataNode | -*---------------+----------------------------------------------------------------------+ -| yarn:hadoop | ResourceManager, NodeManager | -*---------------+----------------------------------------------------------------------+ -| mapred:hadoop | MapReduce JobHistory Server | -*---------------+----------------------------------------------------------------------+ - - * <<>> - - The following table lists various paths on HDFS and local filesystems (on - all nodes) and recommended permissions: - -*-------------------+-------------------+------------------+------------------+ -|| Filesystem || Path || User:Group || Permissions | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | hdfs:hadoop 
| drwx------ | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | hdfs:hadoop | drwx------ | -*-------------------+-------------------+------------------+------------------+ -| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x | -*-------------------+-------------------+------------------+------------------+ -| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| local | container-executor | root:hadoop | --Sr-s--- | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | root:hadoop | r-------- | -*-------------------+-------------------+------------------+------------------+ -| hdfs | / | hdfs:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt | -*-------------------+-------------------+------------------+------------------+ -| hdfs | /user | hdfs:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| hdfs | <<>> | yarn:hadoop | drwxrwxrwxt | -*-------------------+-------------------+------------------+------------------+ -| hdfs | <<>> | mapred:hadoop | | -| | | | drwxrwxrwxt | -*-------------------+-------------------+------------------+------------------+ -| hdfs | <<>> | mapred:hadoop | | -| | | | drwxr-x--- | -*-------------------+-------------------+------------------+------------------+ - - * Kerberos Keytab files - - * HDFS - - The NameNode keytab file, on the NameNode host, should look like the - following: - ----- -$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nn.service.keytab 
-Keytab name: FILE:/etc/security/keytab/nn.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - The Secondary NameNode keytab file, on that host, should look like the - following: - ----- -$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/sn.service.keytab -Keytab name: FILE:/etc/security/keytab/sn.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - The DataNode keytab file, on each host, should look like the following: - ----- -$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/dn.service.keytab -Keytab name: FILE:/etc/security/keytab/dn.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 
07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - * YARN - - The ResourceManager keytab file, on the ResourceManager host, should look - like the following: - ----- -$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/rm.service.keytab -Keytab name: FILE:/etc/security/keytab/rm.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - The NodeManager keytab file, on each host, should look like the following: - ----- -$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nm.service.keytab -Keytab name: FILE:/etc/security/keytab/nm.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 
96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - * MapReduce JobHistory Server - - The MapReduce JobHistory Server keytab file, on that host, should look - like the following: - ----- -$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/jhs.service.keytab -Keytab name: FILE:/etc/security/keytab/jhs.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - -** Configuration in Secure Mode - - * <<>> - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | is non-secure. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Enable RPC service-level authorization. | -*-------------------------+-------------------------+------------------------+ - - * <<>> - - * Configurations for NameNode: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Enable HDFS block access tokens for secure operations. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | This value is deprecated. 
Use dfs.http.policy | -*-------------------------+-------------------------+------------------------+ -| <<>> | or or | | -| | | HTTPS_ONLY turns off http access | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <50470> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos keytab file for the NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | nn/_HOST@REALM.TLD | | -| | | Kerberos principal name for the NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | host/_HOST@REALM.TLD | | -| | | HTTPS Kerberos principal name for the NameNode. | -*-------------------------+-------------------------+------------------------+ - - * Configurations for Secondary NameNode: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <50470> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | | -| | | Kerberos keytab file for the NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | sn/_HOST@REALM.TLD | | -| | | Kerberos principal name for the Secondary NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | host/_HOST@REALM.TLD | | -| | | HTTPS Kerberos principal name for the Secondary NameNode. 
| -*-------------------------+-------------------------+------------------------+ - - * Configurations for DataNode: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | 700 | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <0.0.0.0:2003> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <0.0.0.0:2005> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos keytab file for the DataNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | dn/_HOST@REALM.TLD | | -| | | Kerberos principal name for the DataNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | host/_HOST@REALM.TLD | | -| | | HTTPS Kerberos principal name for the DataNode. | -*-------------------------+-------------------------+------------------------+ - - * <<>> - - * WebAppProxy - - The <<>> provides a proxy between the web applications - exported by an application and an end user. If security is enabled - it will warn users before accessing a potentially unsafe web application. - Authentication and authorization using the proxy is handled just like - any other privileged web application. - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | <<>> host:port for proxy to AM web apps. 
| | -| | | if this is the same as <<>>| -| | | or it is not defined then the <<>> will run the proxy| -| | | otherwise a standalone proxy server will need to be launched.| -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | | -| | | Kerberos keytab file for the WebAppProxy. | -*-------------------------+-------------------------+------------------------+ -| <<>> | wap/_HOST@REALM.TLD | | -| | | Kerberos principal name for the WebAppProxy. | -*-------------------------+-------------------------+------------------------+ - - * LinuxContainerExecutor - - A <<>> used by YARN framework which define how any - launched and controlled. - - The following are the available in Hadoop YARN: - -*--------------------------------------+--------------------------------------+ -|| ContainerExecutor || Description | -*--------------------------------------+--------------------------------------+ -| <<>> | | -| | The default executor which YARN uses to manage container execution. | -| | The container process has the same Unix user as the NodeManager. | -*--------------------------------------+--------------------------------------+ -| <<>> | | -| | Supported only on GNU/Linux, this executor runs the containers as either the | -| | YARN user who submitted the application (when full security is enabled) or | -| | as a dedicated user (defaults to nobody) when full security is not enabled. | -| | When full security is enabled, this executor requires all user accounts to be | -| | created on the cluster nodes where the containers are launched. It uses | -| | a executable that is included in the Hadoop distribution. | -| | The NodeManager uses this executable to launch and kill containers. | -| | The setuid executable switches to the user who has submitted the | -| | application and launches or kills the containers. 
For maximum security, | -| | this executor sets up restricted permissions and user/group ownership of | -| | local files and directories used by the containers such as the shared | -| | objects, jars, intermediate files, log files etc. Particularly note that, | -| | because of this, except the application owner and NodeManager, no other | -| | user can access any of the local files/directories including those | -| | localized as part of the distributed cache. | -*--------------------------------------+--------------------------------------+ - - To build the LinuxContainerExecutor executable run: - ----- - $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/ ----- - - The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the - path on the cluster nodes where a configuration file for the setuid - executable should be located. The executable should be installed in - $HADOOP_YARN_HOME/bin. - - The executable must have specific permissions: 6050 or --Sr-s--- - permissions user-owned by (super-user) and group-owned by a - special group (e.g. <<>>) of which the NodeManager Unix user is - the group member and no ordinary application user is. If any application - user belongs to this special group, security will be compromised. This - special group name should be specified for the configuration property - <<>> in both - <<>> and <<>>. - - For example, let's say that the NodeManager is run as user who is - part of the groups users and , any of them being the primary group. - Let also be that has both and another user - (application submitter) as its members, and does not - belong to . Going by the above description, the setuid/setgid - executable should be set 6050 or --Sr-s--- with user-owner as and - group-owner as which has as its member (and not - which has also as its member besides ). 
- - The LinuxTaskController requires that paths including and leading up to - the directories specified in <<>> and - <<>> to be set 755 permissions as described - above in the table on permissions on directories. - - * <<>> - - The executable requires a configuration file called - <<>> to be present in the configuration - directory passed to the mvn target mentioned above. - - The configuration file must be owned by the user running NodeManager - (user <<>> in the above example), group-owned by anyone and - should have the permissions 0400 or r--------. - - The executable requires following configuration items to be present - in the <<>> file. The items should be - mentioned as simple key=value pairs, one per-line: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Unix group of the NodeManager. The group owner of the | -| | | binary should be this group. Should be same as the | -| | | value with which the NodeManager is configured. This configuration is | -| | | required for validating the secure access of the | -| | | binary. | -*-------------------------+-------------------------+------------------------+ -| <<>> | hfds,yarn,mapred,bin | Banned users. | -*-------------------------+-------------------------+------------------------+ -| <<>> | foo,bar | Allowed system users. | -*-------------------------+-------------------------+------------------------+ -| <<>> | 1000 | Prevent other super-users. 
| -*-------------------------+-------------------------+------------------------+ - - To re-cap, here are the local file-sysytem permissions required for the - various paths related to the <<>>: - -*-------------------+-------------------+------------------+------------------+ -|| Filesystem || Path || User:Group || Permissions | -*-------------------+-------------------+------------------+------------------+ -| local | container-executor | root:hadoop | --Sr-s--- | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | root:hadoop | r-------- | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ - - * Configurations for ResourceManager: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | | -| | | Kerberos keytab file for the ResourceManager. | -*-------------------------+-------------------------+------------------------+ -| <<>> | rm/_HOST@REALM.TLD | | -| | | Kerberos principal name for the ResourceManager. | -*-------------------------+-------------------------+------------------------+ - - * Configurations for NodeManager: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos keytab file for the NodeManager. | -*-------------------------+-------------------------+------------------------+ -| <<>> | nm/_HOST@REALM.TLD | | -| | | Kerberos principal name for the NodeManager. 
| -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | <<>> | -| | | Use LinuxContainerExecutor. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Unix group of the NodeManager. | -*-------------------------+-------------------------+------------------------+ - - * <<>> - - * Configurations for MapReduce JobHistory Server: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | MapReduce JobHistory Server | Default port is 10020. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | -| | | | -| | | Kerberos keytab file for the MapReduce JobHistory Server. | -*-------------------------+-------------------------+------------------------+ -| <<>> | jhs/_HOST@REALM.TLD | | -| | | Kerberos principal name for the MapReduce JobHistory Server. | -*-------------------------+-------------------------+------------------------+ - * {Operating the Hadoop Cluster} diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm new file mode 100644 index 00000000000..9bd55a67fff --- /dev/null +++ b/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm @@ -0,0 +1,637 @@ +~~ Licensed under the Apache License, Version 2.0 (the "License"); +~~ you may not use this file except in compliance with the License. +~~ You may obtain a copy of the License at +~~ +~~ http://www.apache.org/licenses/LICENSE-2.0 +~~ +~~ Unless required by applicable law or agreed to in writing, software +~~ distributed under the License is distributed on an "AS IS" BASIS, +~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+ ---
+ Hadoop in Secure Mode
+ ---
+ ---
+ ${maven.build.timestamp}
+
+%{toc|section=0|fromDepth=0|toDepth=3}
+
+Hadoop in Secure Mode
+
+* Introduction
+
+ This document describes how to configure authentication for Hadoop in
+ secure mode.
+
+ By default, Hadoop runs in non-secure mode, in which no actual
+ authentication is required.
+ When Hadoop is configured to run in secure mode,
+ each user and service needs to be authenticated by Kerberos
+ in order to use Hadoop services.
+
+ Security features of Hadoop consist of
+ {{{Authentication}authentication}},
+ {{{./ServiceLevelAuth.html}service level authorization}},
+ {{{./HttpAuthentication.html}authentication for Web consoles}}
+ and {{{Data confidentiality}data confidentiality}}.
+
+
+* Authentication
+
+** End User Accounts
+
+ When service level authentication is turned on,
+ end users using Hadoop in secure mode need to be authenticated by Kerberos.
+ The simplest way to authenticate is to use the <<>> command of Kerberos.
+
+** User Accounts for Hadoop Daemons
+
+ Ensure that HDFS and YARN daemons run as different Unix users,
+ e.g. <<>> and <<>>.
+ Also, ensure that the MapReduce JobHistory server runs as
+ a different user such as <<>>.
+
+ It's recommended to have them share a Unix group, e.g. <<>>.
+ See also "{{Mapping from user to group}}" for group management.
+
+*---------------+----------------------------------------------------------------------+
+|| User:Group || Daemons |
+*---------------+----------------------------------------------------------------------+
+| hdfs:hadoop | NameNode, Secondary NameNode, JournalNode, DataNode |
+*---------------+----------------------------------------------------------------------+
+| yarn:hadoop | ResourceManager, NodeManager |
+*---------------+----------------------------------------------------------------------+
+| mapred:hadoop | MapReduce JobHistory Server |
+*---------------+----------------------------------------------------------------------+
+
+** Kerberos principals for Hadoop Daemons and Users
+
+ To run Hadoop service daemons in secure mode,
+ Kerberos principals are required.
+ Each service reads authentication information saved in a keytab file with appropriate permissions.
+
+ HTTP web-consoles should be served by a principal different from the one used for RPC.
+
+ The subsections below show examples of credentials for Hadoop services.
+
+*** HDFS
+
+ The NameNode keytab file, on the NameNode host, should look like the
+ following:
+
+----
+$ klist -e -k -t /etc/security/keytab/nn.service.keytab
+Keytab name: FILE:/etc/security/keytab/nn.service.keytab
+KVNO Timestamp Principal
+ 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+ The Secondary NameNode keytab file, on that host, should look like the
+ following:
+
+----
+$ klist -e -k -t /etc/security/keytab/sn.service.keytab
+Keytab name: FILE:/etc/security/keytab/sn.service.keytab
+KVNO Timestamp Principal
+ 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+ The DataNode keytab file, on each host, should look like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/dn.service.keytab
+Keytab name: FILE:/etc/security/keytab/dn.service.keytab
+KVNO Timestamp Principal
+ 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+*** YARN
+
+ The ResourceManager keytab file, on the ResourceManager host, should look
+ like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/rm.service.keytab
+Keytab name: FILE:/etc/security/keytab/rm.service.keytab
+KVNO Timestamp Principal
+ 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+ The NodeManager keytab file, on each host, should look like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/nm.service.keytab
+Keytab name: FILE:/etc/security/keytab/nm.service.keytab
+KVNO Timestamp Principal
+ 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+*** MapReduce JobHistory Server
+
+ The MapReduce JobHistory Server keytab file, on that host, should look
+ like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/jhs.service.keytab
+Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
+KVNO Timestamp Principal
+ 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+ 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+** Mapping from Kerberos principal to OS user account
+
+ Hadoop maps a Kerberos principal to an OS user account using
+ the rule specified by <<>>,
+ which works in the same way as the <<>> in the
+ {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos configuration file (krb5.conf)}}.
+
+ By default, it picks the first component of the principal name as the user name
+ if the realm matches the <<>> (usually defined in /etc/krb5.conf).
+ For example, <<>> is mapped to <<>>
+ by the default rule.
+
+** Mapping from user to group
+
+ Though files on HDFS are associated with an owner and a group,
+ Hadoop does not define groups by itself.
+ The mapping from user to group is done by the OS or LDAP.
+
+ You can change the mapping by
+ specifying the name of the mapping provider as the value of
+ <<>>.
+ See {{{../hadoop-hdfs/HdfsPermissionsGuide.html}HDFS Permissions Guide}} for details.
+
+ In practice, you need to manage an SSO environment using Kerberos with LDAP
+ for Hadoop in secure mode.
+
+** Proxy user
+
+ Some products such as Apache Oozie, which access the services of Hadoop
+ on behalf of end users, need to be able to impersonate end users.
+ You can configure a proxy user using the properties
+ <<>> and <<>>.
+
+ For example, with the settings below in core-site.xml,
+ the user named <<>>, accessing from any host,
+ can impersonate any user belonging to any group.
+
+----
+ <property>
+ <name>hadoop.proxyuser.oozie.hosts</name>
+ <value>*</value>
+ </property>
+ <property>
+ <name>hadoop.proxyuser.oozie.groups</name>
+ <value>*</value>
+ </property>
+----
+
+** Secure DataNode
+
+ Because the data transfer protocol of the DataNode
+ does not use the RPC framework of Hadoop,
+ the DataNode must authenticate itself by
+ using privileged ports, which are specified by
+ <<>> and <<>>.
+ This authentication is based on the assumption
+ that an attacker won't be able to get root privileges.
+
+ When you execute the <<>> command as root,
+ the server process first binds the privileged ports,
+ then drops privileges and runs as the user account specified by
+ <<>>.
+ This startup process uses jsvc installed in <<>>.
+ You must specify <<>> and <<>>
+ as environment variables at startup (in hadoop-env.sh).
+
+
+* Data confidentiality
+
+** Data Encryption on RPC
+
+ The data transferred between Hadoop services and clients can be encrypted.
+ Setting <<>> to <<<"privacy">>> in the core-site.xml
+ activates data encryption.
+
+** Data Encryption on Block data transfer.
+
+ You need to set <<>> to <<<"true">>> in the hdfs-site.xml
+ in order to activate data encryption for the data transfer protocol of the DataNode.
+
+** Data Encryption on HTTP
+
+ Data transfer between the web console and clients is protected by using SSL (HTTPS).
+ + +* Configuration + +** Permissions for both HDFS and local fileSystem paths + + The following table lists various paths on HDFS and local filesystems (on + all nodes) and recommended permissions: + +*-------------------+-------------------+------------------+------------------+ +|| Filesystem || Path || User:Group || Permissions | +*-------------------+-------------------+------------------+------------------+ +| local | <<>> | hdfs:hadoop | drwx------ | +*-------------------+-------------------+------------------+------------------+ +| local | <<>> | hdfs:hadoop | drwx------ | +*-------------------+-------------------+------------------+------------------+ +| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x | +*-------------------+-------------------+------------------+------------------+ +| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x | +*-------------------+-------------------+------------------+------------------+ +| local | <<>> | yarn:hadoop | drwxr-xr-x | +*-------------------+-------------------+------------------+------------------+ +| local | <<>> | yarn:hadoop | drwxr-xr-x | +*-------------------+-------------------+------------------+------------------+ +| local | container-executor | root:hadoop | --Sr-s--- | +*-------------------+-------------------+------------------+------------------+ +| local | <<>> | root:hadoop | r-------- | +*-------------------+-------------------+------------------+------------------+ +| hdfs | / | hdfs:hadoop | drwxr-xr-x | +*-------------------+-------------------+------------------+------------------+ +| hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt | +*-------------------+-------------------+------------------+------------------+ +| hdfs | /user | hdfs:hadoop | drwxr-xr-x | +*-------------------+-------------------+------------------+------------------+ +| hdfs | <<>> | yarn:hadoop | drwxrwxrwxt | +*-------------------+-------------------+------------------+------------------+ +| hdfs | <<>> | mapred:hadoop | | +| | | 
| drwxrwxrwxt |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | <<>> | mapred:hadoop | |
+| | | | drwxr-x--- |
+*-------------------+-------------------+------------------+------------------+
+
+** Common Configurations
+
+  In order to turn on RPC authentication in Hadoop,
+  set the value of the <<>> property to
+  <<<"kerberos">>>, and set the security-related settings listed below appropriately.
+
+  The following properties should be in the <<>> of all the
+  nodes in the cluster.
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter || Value || Notes |
+*-------------------------+-------------------------+------------------------+
+| <<>> | | |
+| | | <<>> : No authentication. (default) \
+| | | <<>> : Enable authentication by Kerberos. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | | |
+| | | Enable {{{./ServiceLevelAuth.html}RPC service-level authorization}}. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | |
+| | | : authentication only (default) \
+| | | : integrity check in addition to authentication \
+| | | : data encryption in addition to integrity |
+*-------------------------+-------------------------+------------------------+
+| <<>> | | |
+| | <<>>\
+| | <<>>\
+| | <...>\
+| | DEFAULT |
+| | | The value is a string containing newline characters.
+| | | See
+| | | {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos documentation}}
+| | | for the format.
+*-------------------------+-------------------------+------------------------+
+| <<>><<<.hosts>>> | | |
+| | | Comma-separated hosts from which impersonation is allowed. |
+| | | <<<*>>> means wildcard. |
+*-------------------------+-------------------------+------------------------+
+| <<>><<<.groups>>> | | |
+| | | Comma-separated groups to which the impersonated users belong.
| +| | | <<<*>>> means wildcard. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + +** NameNode + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | Enable HDFS block access tokens for secure operations. | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | This value is deprecated. Use dfs.http.policy | +*-------------------------+-------------------------+------------------------+ +| <<>> | or or | | +| | | HTTPS_ONLY turns off http access | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +*-------------------------+-------------------------+------------------------+ +| <<>> | <50470> | | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | Kerberos keytab file for the NameNode. | +*-------------------------+-------------------------+------------------------+ +| <<>> | nn/_HOST@REALM.TLD | | +| | | Kerberos principal name for the NameNode. | +*-------------------------+-------------------------+------------------------+ +| <<>> | host/_HOST@REALM.TLD | | +| | | HTTPS Kerberos principal name for the NameNode. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + +** Secondary NameNode + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +*-------------------------+-------------------------+------------------------+ +| <<>> | <50470> | | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | | +| | | Kerberos keytab file for the NameNode. 
| +*-------------------------+-------------------------+------------------------+ +| <<>> | sn/_HOST@REALM.TLD | | +| | | Kerberos principal name for the Secondary NameNode. | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | host/_HOST@REALM.TLD | | +| | | HTTPS Kerberos principal name for the Secondary NameNode. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + +** DataNode + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | 700 | | +*-------------------------+-------------------------+------------------------+ +| <<>> | <0.0.0.0:1004> | | +| | | Secure DataNode must use privileged port | +| | | in order to assure that the server was started securely. | +| | | This means that the server must be started via jsvc. | +*-------------------------+-------------------------+------------------------+ +| <<>> | <0.0.0.0:1006> | | +| | | Secure DataNode must use privileged port | +| | | in order to assure that the server was started securely. | +| | | This means that the server must be started via jsvc. | +*-------------------------+-------------------------+------------------------+ +| <<>> | <0.0.0.0:50470> | | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | Kerberos keytab file for the DataNode. | +*-------------------------+-------------------------+------------------------+ +| <<>> | dn/_HOST@REALM.TLD | | +| | | Kerberos principal name for the DataNode. | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | host/_HOST@REALM.TLD | | +| | | HTTPS Kerberos principal name for the DataNode. 
| +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | set to <<>> when using data encryption | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + + +** WebHDFS + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | http/_HOST@REALM.TLD | | +| | | Enable security on WebHDFS. | +*-------------------------+-------------------------+------------------------+ +| <<>> | http/_HOST@REALM.TLD | | +| | | Kerberos keytab file for the WebHDFS. | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | Kerberos principal name for WebHDFS. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + + +** ResourceManager + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | | +| | | Kerberos keytab file for the ResourceManager. | +*-------------------------+-------------------------+------------------------+ +| <<>> | rm/_HOST@REALM.TLD | | +| | | Kerberos principal name for the ResourceManager. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + +** NodeManager + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | Kerberos keytab file for the NodeManager. | +*-------------------------+-------------------------+------------------------+ +| <<>> | nm/_HOST@REALM.TLD | | +| | | Kerberos principal name for the NodeManager. 
| +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | <<>> | +| | | Use LinuxContainerExecutor. | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | Unix group of the NodeManager. | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | The path to the executable of Linux container executor. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + +** Configuration for WebAppProxy + + The <<>> provides a proxy between the web applications + exported by an application and an end user. If security is enabled + it will warn users before accessing a potentially unsafe web application. + Authentication and authorization using the proxy is handled just like + any other privileged web application. + +*-------------------------+-------------------------+------------------------+ +|| Parameter || Value || Notes | +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | <<>> host:port for proxy to AM web apps. | | +| | | if this is the same as <<>>| +| | | or it is not defined then the <<>> will run the proxy| +| | | otherwise a standalone proxy server will need to be launched.| +*-------------------------+-------------------------+------------------------+ +| <<>> | | | +| | | | +| | | Kerberos keytab file for the WebAppProxy. | +*-------------------------+-------------------------+------------------------+ +| <<>> | wap/_HOST@REALM.TLD | | +| | | Kerberos principal name for the WebAppProxy. | +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> + +** LinuxContainerExecutor + + A <<>> used by YARN framework which define how any + launched and controlled. 
+ + The following are the available in Hadoop YARN: + +*--------------------------------------+--------------------------------------+ +|| ContainerExecutor || Description | +*--------------------------------------+--------------------------------------+ +| <<>> | | +| | The default executor which YARN uses to manage container execution. | +| | The container process has the same Unix user as the NodeManager. | +*--------------------------------------+--------------------------------------+ +| <<>> | | +| | Supported only on GNU/Linux, this executor runs the containers as either the | +| | YARN user who submitted the application (when full security is enabled) or | +| | as a dedicated user (defaults to nobody) when full security is not enabled. | +| | When full security is enabled, this executor requires all user accounts to be | +| | created on the cluster nodes where the containers are launched. It uses | +| | a executable that is included in the Hadoop distribution. | +| | The NodeManager uses this executable to launch and kill containers. | +| | The setuid executable switches to the user who has submitted the | +| | application and launches or kills the containers. For maximum security, | +| | this executor sets up restricted permissions and user/group ownership of | +| | local files and directories used by the containers such as the shared | +| | objects, jars, intermediate files, log files etc. Particularly note that, | +| | because of this, except the application owner and NodeManager, no other | +| | user can access any of the local files/directories including those | +| | localized as part of the distributed cache. 
| +*--------------------------------------+--------------------------------------+ + + To build the LinuxContainerExecutor executable run: + +---- + $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/ +---- + + The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the + path on the cluster nodes where a configuration file for the setuid + executable should be located. The executable should be installed in + $HADOOP_YARN_HOME/bin. + + The executable must have specific permissions: 6050 or --Sr-s--- + permissions user-owned by (super-user) and group-owned by a + special group (e.g. <<>>) of which the NodeManager Unix user is + the group member and no ordinary application user is. If any application + user belongs to this special group, security will be compromised. This + special group name should be specified for the configuration property + <<>> in both + <<>> and <<>>. + + For example, let's say that the NodeManager is run as user who is + part of the groups users and , any of them being the primary group. + Let also be that has both and another user + (application submitter) as its members, and does not + belong to . Going by the above description, the setuid/setgid + executable should be set 6050 or --Sr-s--- with user-owner as and + group-owner as which has as its member (and not + which has also as its member besides ). + + The LinuxTaskController requires that paths including and leading up to + the directories specified in <<>> and + <<>> to be set 755 permissions as described + above in the table on permissions on directories. + + * <<>> + + The executable requires a configuration file called + <<>> to be present in the configuration + directory passed to the mvn target mentioned above. + + The configuration file must be owned by the user running NodeManager + (user <<>> in the above example), group-owned by anyone and + should have the permissions 0400 or r--------. 
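The two modes quoted above (6050 for the executable, 0400 for its configuration file) can be tried out on scratch copies, as in the sketch below. This is only an illustration under assumptions: the file name <<<container-executor.cfg>>> is assumed because the text above does not show it, the scratch directory stands in for the real install locations, and the ownership change is left as a comment because it needs root.

```shell
#!/bin/sh
# Sketch: apply the documented permission bits to scratch copies of the two files.
# Assumptions: the file name container-executor.cfg and the scratch layout.
set -eu

apply_modes() {
    dir=$1
    chmod 6050 "$dir/container-executor"       # --Sr-s--- : setuid + setgid, group r-x
    chmod 0400 "$dir/container-executor.cfg"   # r-------- : owner read only
}

scratch=$(mktemp -d)
: > "$scratch/container-executor"
: > "$scratch/container-executor.cfg"

apply_modes "$scratch"
ls -l "$scratch"

# On a real node the executable's ownership must be set as well (requires root),
# for example, following the root:hadoop entry in the permissions table:
#   chown root:hadoop "$HADOOP_YARN_HOME/bin/container-executor"
```

Checking the <<<ls -l>>> output against the permission strings in the table is a quick way to catch a missing setuid or setgid bit before starting the NodeManager.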
+
+  The executable requires the following configuration items to be present
+  in the <<>> file. The items should be
+  given as simple key=value pairs, one per line:
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter || Value || Notes |
+*-------------------------+-------------------------+------------------------+
+| <<>> | | |
+| | | Unix group of the NodeManager. The group owner of the |
+| | | binary should be this group. Should be the same as the |
+| | | value with which the NodeManager is configured. This configuration is |
+| | | required for validating the secure access of the |
+| | | binary. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | hdfs,yarn,mapred,bin | Banned users. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | foo,bar | Allowed system users. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | 1000 | Prevent other super-users.
|
+*-------------------------+-------------------------+------------------------+
+Configuration for <<>>
+
+  To recap, here are the local file system permissions required for the
+  various paths related to the <<>>:
+
+*-------------------+-------------------+------------------+------------------+
+|| Filesystem || Path || User:Group || Permissions |
+*-------------------+-------------------+------------------+------------------+
+| local | container-executor | root:hadoop | --Sr-s--- |
+*-------------------+-------------------+------------------+------------------+
+| local | <<>> | root:hadoop | r-------- |
+*-------------------+-------------------+------------------+------------------+
+| local | <<>> | yarn:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+| local | <<>> | yarn:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+
+** MapReduce JobHistory Server
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter || Value || Notes |
+*-------------------------+-------------------------+------------------------+
+| <<>> | | |
+| | MapReduce JobHistory Server | Default port is 10020. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | |
+| | | |
+| | | Kerberos keytab file for the MapReduce JobHistory Server. |
+*-------------------------+-------------------------+------------------------+
+| <<>> | jhs/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the MapReduce JobHistory Server.
| +*-------------------------+-------------------------+------------------------+ +Configuration for <<>> diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt index 02309ca2c56..bb9470265f7 100644 --- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt +++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt @@ -693,6 +693,9 @@ Release 2.3.0 - UNRELEASED HDFS-5677. Need error checking for HA cluster configuration. (Vincent Sheffer via cos) + HADOOP-10086. User document for authentication in secure cluster. + (Masatake Iwasaki via Arpit Agarwal) + OPTIMIZATIONS BUG FIXES diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml index 4e451e8c260..31c105c37de 100644 --- a/hadoop-project/src/site/site.xml +++ b/hadoop-project/src/site/site.xml @@ -59,6 +59,7 @@ +