---
id: tutorial-kerberos-hadoop
title: Configure Apache Druid to use Kerberized Apache Hadoop as deep storage
sidebar_label: Kerberized HDFS deep storage
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
## Hadoop Setup

The following configuration files need to be copied over to the Druid conf folders (see the sketch after this list):

1. For HDFS as deep storage: `hdfs-site.xml`, `core-site.xml`
2. For ingestion: `mapred-site.xml`, `yarn-site.xml`
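A minimal sketch of the copy step, assuming the Hadoop client configuration lives in `/etc/hadoop/conf` and Druid is installed under `/opt/druid` (both paths are assumptions; adjust them to your layout):

```bash
# Copy the Hadoop client configuration onto Druid's common classpath.
# /etc/hadoop/conf and /opt/druid are assumed paths; adjust as needed.
cp /etc/hadoop/conf/hdfs-site.xml   /opt/druid/conf/druid/_common/
cp /etc/hadoop/conf/core-site.xml   /opt/druid/conf/druid/_common/

# Needed only for Hadoop-based ingestion:
cp /etc/hadoop/conf/mapred-site.xml /opt/druid/conf/druid/_common/
cp /etc/hadoop/conf/yarn-site.xml   /opt/druid/conf/druid/_common/
```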
### HDFS Folders and permissions

1. Choose a folder name for the Druid deep storage, for example `druid`.
2. Create the folder in HDFS under the required parent folder. For example,
   `hdfs dfs -mkdir /druid`
   OR
   `hdfs dfs -mkdir /apps/druid`
3. Give the Druid processes the permissions they need to access this folder, so that Druid can create the necessary sub-folders (such as the segment and indexing-log directories configured below) in HDFS. For example, if the Druid processes run as user `root`, then
   `hdfs dfs -chown root:root /apps/druid`
   OR
   `hdfs dfs -chmod 777 /apps/druid`

Druid creates the necessary sub-folders to store data and indexes under this newly created folder; a consolidated sketch of these steps follows.
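Putting the steps above together, a sketch assuming the parent folder `/apps` and Druid processes running as user `root` (both taken from the example above):

```bash
# Create the deep storage folder and hand ownership to the Druid user.
hdfs dfs -mkdir -p /apps/druid
hdfs dfs -chown root:root /apps/druid
```

Note that `hdfs dfs -chmod 777` grants write access to all users; ownership-based permissions like the `chown` above are preferable outside of test setups.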
## Druid Setup

Edit `conf/druid/_common/common.runtime.properties` to include the HDFS properties. The folder locations below are the same as in the example above.
### common.runtime.properties
```properties
# Deep storage
#
# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# OR
# druid.storage.storageDirectory=/apps/druid/segments
#
# Indexing service logs
#
# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
# OR
# druid.indexer.logs.directory=/apps/druid/indexing-logs
```
Note: Comment out the local storage and S3 storage parameters in the file.
Also include the `druid-hdfs-storage` core extension in `conf/druid/_common/common.runtime.properties`:
```properties
#
# Extensions
#
druid.extensions.directory=dist/druid/extensions
druid.extensions.hadoopDependenciesDir=dist/druid/hadoop-dependencies
druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage", "druid-kerberos"]
```
### Hadoop Jars
Ensure that Druid has the jars necessary to support your Hadoop version.
Find the Hadoop version using the command `hadoop version`.
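If the Hadoop client jars bundled with Druid do not match your cluster, Druid's `pull-deps` tool can download a matching set into the Hadoop dependencies directory. A sketch, run from the Druid installation directory and assuming Hadoop 2.8.5 (substitute the version reported by `hadoop version`):

```bash
# Download Hadoop client jars matching the cluster's Hadoop version.
# 2.8.5 is an example coordinate; replace it with your version.
java -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -h "org.apache.hadoop:hadoop-client:2.8.5"
```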
If other software, such as WanDisco, is used with Hadoop, ensure that:

1. the necessary libraries are available, and
2. the requisite extensions are added to `druid.extensions.loadList` in `conf/druid/_common/common.runtime.properties`.
### Kerberos setup
Create a headless keytab that has access to the Druid data and index folders.
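A sketch of creating such a keytab with MIT Kerberos `kadmin`, using the example principal and keytab path shown below (the admin principal is also an assumption; adjust for your KDC):

```bash
# Create the headless principal and export its key to a keytab.
# hdfs-test@EXAMPLE.IO and admin/admin@EXAMPLE.IO are example values.
kadmin -p admin/admin@EXAMPLE.IO -q "addprinc -randkey hdfs-test@EXAMPLE.IO"
kadmin -p admin/admin@EXAMPLE.IO -q "ktadd -k /etc/security/keytabs/hdfs.headless.keytab hdfs-test@EXAMPLE.IO"

# List the keytab entries to confirm the export.
klist -kt /etc/security/keytabs/hdfs.headless.keytab
```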
Edit `conf/druid/_common/common.runtime.properties` and add the following properties:
```properties
druid.hadoop.security.kerberos.principal
druid.hadoop.security.kerberos.keytab
```
For example:
```properties
druid.hadoop.security.kerberos.principal=hdfs-test@EXAMPLE.IO
druid.hadoop.security.kerberos.keytab=/etc/security/keytabs/hdfs.headless.keytab
```
### Restart Druid Services

With the above changes in place, restart Druid so that it picks up the new configuration and works with the Kerberized Hadoop cluster.
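After Druid is back up and an ingestion task has completed, segments and task logs should appear under the configured HDFS folders. A quick check, using the example paths from the configuration above:

```bash
# Segments and indexing logs land under the configured directories.
hdfs dfs -ls /druid/segments
hdfs dfs -ls /druid/indexing-logs
```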