---
id: tutorial-kerberos-hadoop
title: "Configuring Apache Druid to use Kerberized Apache Hadoop as deep storage"
sidebar_label: "Kerberized HDFS deep storage"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->


## Hadoop Setup

Copy the following Hadoop configuration files to the Druid conf folders:

1. For HDFS as deep storage: `hdfs-site.xml`, `core-site.xml`
2. For ingestion: `mapred-site.xml`, `yarn-site.xml`
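
As a rough sketch of this copy step, the shell function below copies the four files listed above from a Hadoop client configuration directory into a Druid conf folder and reports any that are missing. The function name `copy_hadoop_conf` and the example paths in the usage comment (`/etc/hadoop/conf`, `conf/druid/_common`) are assumptions for illustration; use the locations that apply to your cluster.

```shell
# Copy the Hadoop client configuration files that Druid needs.
# Usage (both paths are assumptions; adjust for your cluster):
#   copy_hadoop_conf /etc/hadoop/conf conf/druid/_common
copy_hadoop_conf() {
  src="$1"
  dest="$2"
  missing=0
  for f in hdfs-site.xml core-site.xml mapred-site.xml yarn-site.xml; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dest/$f"
    else
      echo "missing: $src/$f" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any file was not found.
  return "$missing"
}
```

The function exits with a non-zero status when a file is absent, so a deployment script can stop before starting Druid with an incomplete configuration.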

### HDFS Folders and permissions

1. Choose a folder name for the Druid deep storage, for example `druid`.
2. Create the folder in HDFS under the required parent folder. For example:

    `hdfs dfs -mkdir /druid`

    OR

    `hdfs dfs -mkdir /apps/druid`

3. Give the user that runs the Druid processes permission to access this folder, so that Druid can create the sub-folders it needs in HDFS, such as those for segments and indexing logs. For example, if the Druid processes run as user `root`:

    `hdfs dfs -chown root:root /apps/druid`

    OR

    `hdfs dfs -chmod 777 /apps/druid`

    Note that `chmod 777` opens the folder to every HDFS user, so changing the owner is the more restrictive choice.

Druid creates the sub-folders for segments and indexing logs under this newly created folder.
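
The folder setup above can be collected into a small helper. The sketch below only prints the `hdfs dfs` commands for a given folder and owner, so they can be reviewed before being run by a user with sufficient HDFS privileges. The function name `print_hdfs_setup` is hypothetical, and the `-p` flag (create parent folders as needed) and the final `-ls -d` check are additions that do not appear in the steps above.

```shell
# Print (do not run) the HDFS commands that prepare a Druid deep storage folder.
# Usage: print_hdfs_setup /apps/druid root:root
print_hdfs_setup() {
  dir="$1"
  owner="$2"
  # Create the folder, including any missing parent folders.
  printf 'hdfs dfs -mkdir -p %s\n' "$dir"
  # Hand the folder to the user that runs the Druid processes.
  printf 'hdfs dfs -chown %s %s\n' "$owner" "$dir"
  # Show the folder entry itself so owner and mode can be checked.
  printf 'hdfs dfs -ls -d %s\n' "$dir"
}
```

Once the printed commands look right, they can be run one by one, or piped to a shell with `print_hdfs_setup /apps/druid root:root | sh`.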

## Druid Setup

Edit `conf/druid/_common/common.runtime.properties` to include the HDFS properties. The folder locations are the same as the ones used in the examples above.

### common.runtime.properties

```properties
# Deep storage
#
# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# OR
# druid.storage.storageDirectory=/apps/druid/segments

#
# Indexing service logs
#

# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
# OR
# druid.indexer.logs.directory=/apps/druid/indexing-logs
```

Note: Comment out the local storage and S3 storage parameters in the file.

Also add the `druid-hdfs-storage` core extension to the load list in `conf/druid/_common/common.runtime.properties`:

```properties
#
# Extensions
#

druid.extensions.directory=dist/druid/extensions
druid.extensions.hadoopDependenciesDir=dist/druid/hadoop-dependencies
druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage", "druid-kerberos"]
```
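
Before restarting, it can help to confirm that the edited file really selects HDFS and loads the extension. The following is a minimal sketch of such a check, not part of Druid itself: the function name `check_druid_hdfs_props` is hypothetical, and it assumes simple `key=value` lines with no line continuations.

```shell
# Check that a common.runtime.properties file selects HDFS deep storage
# and lists the druid-hdfs-storage extension in the load list.
# Usage: check_druid_hdfs_props conf/druid/_common/common.runtime.properties
check_druid_hdfs_props() {
  props="$1"
  status=0
  # An uncommented druid.storage.type=hdfs line must be present.
  grep -Eq '^druid\.storage\.type=hdfs[[:space:]]*$' "$props" || {
    echo "druid.storage.type is not set to hdfs" >&2
    status=1
  }
  # The load list line must mention the HDFS storage extension.
  grep -E '^druid\.extensions\.loadList=' "$props" | grep -q '"druid-hdfs-storage"' || {
    echo "druid-hdfs-storage is missing from druid.extensions.loadList" >&2
    status=1
  }
  return "$status"
}
```

Because the patterns are anchored at the start of the line, commented-out settings such as `# druid.storage.type=hdfs` do not count as configured.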

### Hadoop Jars

Ensure that Druid has the necessary jars to support your Hadoop version.

Find the Hadoop version with the command `hadoop version`.

If other software is used with Hadoop, like `WanDisco`, ensure that:

1. the necessary libraries are available
2. the requisite extensions are added to `druid.extensions.loadList` in `conf/druid/_common/common.runtime.properties`
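
To compare the cluster's Hadoop version with what Druid has on disk, one option is to list the client versions under the directory named by `druid.extensions.hadoopDependenciesDir`. The sketch below assumes the usual layout in which each bundled client version sits in its own sub-folder under `hadoop-client`; the function name `list_hadoop_client_versions` is hypothetical.

```shell
# List the Hadoop client versions found under a Druid hadoop-dependencies directory.
# Usage: list_hadoop_client_versions dist/druid/hadoop-dependencies
list_hadoop_client_versions() {
  deps_dir="$1"
  if [ -d "$deps_dir/hadoop-client" ]; then
    # One sub-folder per bundled version (layout is an assumption).
    ls -1 "$deps_dir/hadoop-client"
  else
    echo "no hadoop-client directory under $deps_dir" >&2
    return 1
  fi
}
```

If the version reported by `hadoop version` is not in the list, the matching client jars still need to be made available to Druid.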

### Kerberos setup

Create a headless keytab for a principal that has access to the Druid data and indexing folders in HDFS.

Edit `conf/druid/_common/common.runtime.properties` and add the following properties:

```properties
druid.hadoop.security.kerberos.principal
druid.hadoop.security.kerberos.keytab
```

For example:

```properties
druid.hadoop.security.kerberos.principal=hdfs-test@EXAMPLE.IO
druid.hadoop.security.kerberos.keytab=/etc/security/keytabs/hdfs.headless.keytab
```
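
A quick sanity check of these two properties can catch a mistyped path before Druid is restarted. The sketch below reads both values from the properties file, verifies that the keytab file is readable by the current user, and prints a `kinit -kt` command that can be run by hand to confirm the keytab and principal actually authenticate. The helper names `get_prop` and `check_kerberos_props` are hypothetical, and the parsing assumes simple `key=value` lines.

```shell
# Print the value of a key from a simple key=value properties file.
get_prop() {
  awk -F= -v k="$1" '$1 == k { print substr($0, length(k) + 2) }' "$2" | tail -n 1
}

# Verify the Kerberos properties and print a kinit command for a manual test.
# Usage: check_kerberos_props conf/druid/_common/common.runtime.properties
check_kerberos_props() {
  props="$1"
  principal=$(get_prop druid.hadoop.security.kerberos.principal "$props")
  keytab=$(get_prop druid.hadoop.security.kerberos.keytab "$props")
  [ -n "$principal" ] || { echo "principal is not set" >&2; return 1; }
  [ -r "$keytab" ] || { echo "keytab is not readable: $keytab" >&2; return 1; }
  # Running this command as the Druid user should obtain a ticket without a password prompt.
  echo "kinit -kt $keytab $principal"
}
```

Running `klist -kt` on the keytab lists the principals it contains, which is a convenient way to confirm that the principal name in the properties file matches.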

### Restart Druid Services

With the above changes in place, restart the Druid services so that the new configuration takes effect. Druid should then be able to work with Kerberized Hadoop.