This page explains how to quickly set up HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication.
~ $ tar xzf httpfs-3.4.0-SNAPSHOT.tar.gz
By default, HttpFS assumes that the Hadoop configuration files (core-site.xml & hdfs-site.xml) are in the HttpFS configuration directory. If this is not the case, add the httpfs.hadoop.config.dir property to the httpfs-site.xml file, set to the location of the Hadoop configuration directory.
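For example, a minimal httpfs-site.xml entry of this kind could look as follows (the /etc/hadoop/conf path is only an assumed example; point it at your actual Hadoop configuration directory):
<property>
  <name>httpfs.hadoop.config.dir</name>
  <value>/etc/hadoop/conf</value>
  <description>Location of the Hadoop configuration directory (example path).</description>
</property>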
Edit Hadoop core-site.xml and define the Unix user that will run the HttpFS server as a proxyuser. For example:
<property>
  <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
  <value>httpfs-host.foo.com</value>
</property>
<property>
  <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
  <value>*</value>
</property>
IMPORTANT: Replace #HTTPFSUSER# with the Unix user that will start the HttpFS server.
You need to restart Hadoop for the proxyuser configuration to become active.
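As a rough sketch, on a simple non-HA cluster managed with the hdfs --daemon scripts, restarting the NameNode so that the new proxyuser settings are picked up could look like this (use whatever restart procedure applies to your deployment):
$ hdfs --daemon stop namenode
$ hdfs --daemon start namenode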
To start/stop HttpFS, use hdfs --daemon start|stop httpfs. For example:
hadoop-3.4.0-SNAPSHOT $ hdfs --daemon start httpfs
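To shut the server down later, the same daemon command is used with stop:
hadoop-3.4.0-SNAPSHOT $ hdfs --daemon stop httpfs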
NOTE: The script httpfs.sh is deprecated. It is now just a wrapper of hdfs httpfs.
Test that HttpFS is working:
$ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
{"Path":"\/user\/hdfs"}
HttpFS preconfigures the HTTP port to 14000.
HttpFS supports the following configuration properties in HttpFS's etc/hadoop/httpfs-site.xml configuration file.
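For instance, a minimal sketch that overrides the bind hostname and port uses the httpfs.http.hostname and httpfs.http.port properties listed in the environment-variable table below (the values shown are assumptions):
<property>
  <name>httpfs.http.hostname</name>
  <value>httpfs-host.foo.com</value>
</property>
<property>
  <name>httpfs.http.port</name>
  <value>14000</value>
</property>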
Enable SSL in etc/hadoop/httpfs-site.xml:
<property>
  <name>httpfs.ssl.enabled</name>
  <value>true</value>
  <description>
    Whether SSL is enabled. Default is false, i.e. disabled.
  </description>
</property>
Configure etc/hadoop/ssl-server.xml with proper values, for example:
<property>
  <name>ssl.server.keystore.location</name>
  <value>${user.home}/.keystore</value>
  <description>Keystore to be used. Must be specified.</description>
</property>
<property>
  <name>ssl.server.keystore.password</name>
  <value></value>
  <description>Must be specified.</description>
</property>
<property>
  <name>ssl.server.keystore.keypassword</name>
  <value></value>
  <description>Must be specified.</description>
</property>
The SSL passwords can be secured by a credential provider. See Credential Provider API.
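As a hedged sketch, the keystore passwords could be stored in a local JCEKS credential store with the hadoop credential command (the provider path is an assumption); the store is then referenced, e.g. via hadoop.security.credential.provider.path, as described in the Credential Provider API documentation:
$ hadoop credential create ssl.server.keystore.password -provider localjceks://file/home/httpfs/httpfs.jceks
$ hadoop credential create ssl.server.keystore.keypassword -provider localjceks://file/home/httpfs/httpfs.jceks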
You need to create an SSL certificate for the HttpFS server. As the httpfs Unix user, use the Java keytool command to create the SSL certificate:
$ keytool -genkey -alias jetty -keyalg RSA
You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named .keystore and located in the httpfs user home directory.
The password you enter for “keystore password” must match the value of the ssl.server.keystore.password property set in the ssl-server.xml file in the configuration directory.
The answer to “What is your first and last name?” (i.e. “CN”) must be the hostname of the machine where the HttpFS Server will be running.
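For reference, a non-interactive variant of the keytool command that encodes both constraints might look like this (the hostname, keystore path, and passwords are assumptions to be replaced with your own values):
$ keytool -genkey -alias jetty -keyalg RSA \
    -dname "CN=httpfs-host.foo.com" \
    -keystore $HOME/.keystore \
    -storepass <keystore-password> -keypass <key-password>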
Start HttpFS. It should work over HTTPS.
Using the Hadoop FileSystem API or the Hadoop FS shell, use the swebhdfs:// scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate. For more information about the client side settings, see SSL Configurations for SWebHDFS.
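For example, a minimal FS shell sketch against a self-signed setup (hostname and path are assumptions; the client truststore is configured as described in the SWebHDFS documentation):
$ hadoop fs -ls swebhdfs://httpfs-host.foo.com:14000/user/hdfs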
NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client.
The following environment variables are deprecated. Set the corresponding configuration properties instead; an example of the migration follows the table.
Environment Variable | Configuration Property | Configuration File |
---|---|---|
HTTPFS_HTTP_HOSTNAME | httpfs.http.hostname | httpfs-site.xml |
HTTPFS_HTTP_PORT | httpfs.http.port | httpfs-site.xml |
HTTPFS_MAX_HTTP_HEADER_SIZE | hadoop.http.max.request.header.size and hadoop.http.max.response.header.size | httpfs-site.xml |
HTTPFS_MAX_THREADS | hadoop.http.max.threads | httpfs-site.xml |
HTTPFS_SSL_ENABLED | httpfs.ssl.enabled | httpfs-site.xml |
HTTPFS_SSL_KEYSTORE_FILE | ssl.server.keystore.location | ssl-server.xml |
HTTPFS_SSL_KEYSTORE_PASS | ssl.server.keystore.password | ssl-server.xml |
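As an illustration of the migration, instead of exporting HTTPFS_MAX_THREADS you would now set the corresponding property in httpfs-site.xml (the value 1000 is only an assumed example):
<property>
  <name>hadoop.http.max.threads</name>
  <value>1000</value>
  <description>Maximum number of HttpFS server threads (example value).</description>
</property>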
HttpFS exposes the following default HTTP services; a sample curl request follows the table.
Name | Description |
---|---|
/conf | Display configuration properties |
/jmx | Java JMX management interface |
/logLevel | Get or set log level per class |
/logs | Display log files |
/stacks | Display JVM stacks |
/static/index.html | The static home page |
/prof | Async Profiler endpoint |
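For example, a quick curl sketch against one of these servlets on the default port (the hostname is an assumption):
$ curl -sS 'http://httpfs-host.foo.com:14000/jmx?user.name=hdfs'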
To control access to the /conf, /jmx, /logLevel, /logs, /stacks and /prof servlets, configure the following properties in httpfs-site.xml:
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
  <description>Is service-level authorization enabled?</description>
</property>
<property>
  <name>hadoop.security.instrumentation.requires.admin</name>
  <value>true</value>
  <description>
    Indicates if administrator ACLs are required to access
    instrumentation servlets (JMX, METRICS, CONF, STACKS, PROF).
  </description>
</property>
<property>
  <name>httpfs.http.administrators</name>
  <value></value>
  <description>ACL for the admins, this configuration is used to control
    who can access the default servlets for HttpFS server. The value
    should be a comma separated list of users and groups. The user list
    comes first and is separated by a space followed by the group list,
    e.g. "user1,user2 group1,group2". Both users and groups are optional,
    so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2"
    are all valid (note the leading space in " group1"). '*' grants access
    to all users and groups, e.g. '*', '* ' and ' *' are all valid.
  </description>
</property>