Hadoop HDFS over HTTP - Server Setup

This page explains how to quickly setup HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication.

Install HttpFS

~ $ tar xzf  httpfs-3.4.0-SNAPSHOT.tar.gz

Configure HttpFS

By default, HttpFS assumes that Hadoop configuration files (core-site.xml & hdfs-site.xml) are in the HttpFS configuration directory.

If this is not the case, add to the httpfs-site.xml file the httpfs.hadoop.config.dir property set to the location of the Hadoop configuration directory.

Configure Hadoop

Edit Hadoop core-site.xml and defined the Unix user that will run the HttpFS server as a proxyuser. For example:

  <property>
    <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
    <value>httpfs-host.foo.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
    <value>*</value>
  </property>

IMPORTANT: Replace #HTTPFSUSER# with the Unix user that will start the HttpFS server.

Restart Hadoop

You need to restart Hadoop for the proxyuser configuration to become active.

Start/Stop HttpFS

To start/stop HttpFS, use hdfs --daemon start|stop httpfs. For example:

hadoop-3.4.0-SNAPSHOT $ hdfs --daemon start httpfs

NOTE: The script httpfs.sh is deprecated. It is now just a wrapper of hdfs httpfs.

Test HttpFS is working

$ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
{"Path":"\/user\/hdfs"}

HttpFS Configuration

HttpFS preconfigures the HTTP port to 14000.

HttpFS supports the following configuration properties in the HttpFS’s etc/hadoop/httpfs-site.xml configuration file.

HttpFS over HTTPS (SSL)

Enable SSL in etc/hadoop/httpfs-site.xml:

  <property>
    <name>httpfs.ssl.enabled</name>
    <value>true</value>
    <description>
      Whether SSL is enabled. Default is false, i.e. disabled.
    </description>
  </property>

Configure etc/hadoop/ssl-server.xml with proper values, for example:

  <property>
    <name>ssl.server.keystore.location</name>
    <value>${user.home}/.keystore</value>
    <description>Keystore to be used. Must be specified.
    </description>
  </property>

  <property>
    <name>ssl.server.keystore.password</name>
    <value></value>
    <description>Must be specified.</description>
  </property>

  <property>
    <name>ssl.server.keystore.keypassword</name>
    <value></value>
    <description>Must be specified.</description>
  </property>

The SSL passwords can be secured by a credential provider. See Credential Provider API.

You need to create an SSL certificate for the HttpFS server. As the httpfs Unix user, using the Java keytool command to create the SSL certificate:

$ keytool -genkey -alias jetty -keyalg RSA

You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named .keystore and located in the httpfs user home directory.

The password you enter for “keystore password” must match the value of the property ssl.server.keystore.password set in the ssl-server.xml in the configuration directory.

The answer to “What is your first and last name?” (i.e. “CN”) must be the hostname of the machine where the HttpFS Server will be running.

Start HttpFS. It should work over HTTPS.

Using the Hadoop FileSystem API or the Hadoop FS shell, use the swebhdfs:// scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate. For more information about the client side settings, see SSL Configurations for SWebHDFS.

NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client.

Deprecated Environment Variables

The following environment variables are deprecated. Set the corresponding configuration properties instead.

Environment Variable Configuration Property Configuration File
HTTPFS_HTTP_HOSTNAME httpfs.http.hostname httpfs-site.xml
HTTPFS_HTTP_PORT httpfs.http.port httpfs-site.xml
HTTPFS_MAX_HTTP_HEADER_SIZE hadoop.http.max.request.header.size and hadoop.http.max.response.header.size httpfs-site.xml
HTTPFS_MAX_THREADS hadoop.http.max.threads httpfs-site.xml
HTTPFS_SSL_ENABLED httpfs.ssl.enabled httpfs-site.xml
HTTPFS_SSL_KEYSTORE_FILE ssl.server.keystore.location ssl-server.xml
HTTPFS_SSL_KEYSTORE_PASS ssl.server.keystore.password ssl-server.xml

HTTP Default Services

Name Description
/conf Display configuration properties
/jmx Java JMX management interface
/logLevel Get or set log level per class
/logs Display log files
/stacks Display JVM stacks
/static/index.html The static home page
/prof Async Profiler endpoint

To control the access to servlet /conf, /jmx, /logLevel, /logs, /stacks and /prof, configure the following properties in httpfs-site.xml:

  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
    <description>Is service-level authorization enabled?</description>
  </property>

  <property>
    <name>hadoop.security.instrumentation.requires.admin</name>
    <value>true</value>
    <description>
      Indicates if administrator ACLs are required to access
      instrumentation servlets (JMX, METRICS, CONF, STACKS, PROF).
    </description>
  </property>

  <property>
    <name>httpfs.http.administrators</name>
    <value></value>
    <description>ACL for the admins, this configuration is used to control
      who can access the default servlets for HttpFS server. The value
      should be a comma separated list of users and groups. The user list
      comes first and is separated by a space followed by the group list,
      e.g. "user1,user2 group1,group2". Both users and groups are optional,
      so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2"
      are all valid (note the leading space in " group1"). '*' grants access
      to all users and groups, e.g. '*', '* ' and ' *' are all valid.
    </description>
  </property>