Merge pull request #2554 from guobingkun/improve_include_extensions

improve doc on including druid and hadoop extensions
This commit is contained in:
Fangjin Yang 2016-02-26 20:05:57 -08:00
commit 8f97b1e40c
2 changed files with 72 additions and 33 deletions

View File

@ -3,31 +3,47 @@ layout: doc_page
---
# Including Extensions
Druid uses a module system that allows for the addition of extensions at runtime. To instruct Druid to load extensions, follow the steps below.
## Download extensions
Core Druid extensions are already bundled in the Druid release tarball. You can get them by downloading the tarball at [druid.io](http://druid.io/downloads.html).
Unpack the tarball; you will see an ```extensions``` folder that contains all the core extensions, along with a ```hadoop-dependencies``` folder
that contains all the Hadoop dependencies. Each extension has its own folder containing its jars. However, because of licensing,
we didn't package the mysql-metadata-storage extension in the extensions folder. To get it, download it from [druid.io](http://druid.io/downloads.html),
then unpack it and move it into the ```extensions``` directory.
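For instance, a minimal sketch of unpacking the separately downloaded MySQL extension and moving it into place (the archive name and destination path are illustrative):

```
# archive name and destination are illustrative; adjust to what you downloaded
tar -xzf mysql-metadata-storage-0.9.0.tar.gz
mv mysql-metadata-storage /usr/local/druid_tarball/extensions/
```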
Optionally, you can use the `pull-deps` tool to download extensions you want.
See [pull-deps](../operations/pull-deps.html) for a complete example.
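For example, a sketch of a `pull-deps` invocation that fetches an extension by its Maven coordinate (run from the Druid installation directory; the coordinate and version shown are illustrative, and the exact options are documented on the pull-deps page):

```
# fetch the mysql-metadata-storage extension (coordinate/version are illustrative)
java -classpath "lib/*" io.druid.cli.Main tools pull-deps -c io.druid.extensions:mysql-metadata-storage:0.9.0
```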
## Load extensions
There are two ways to let Druid load extensions.
### Load from classpath
If you add your extension jar to the classpath at runtime, Druid will load it into the system. This mechanism is relatively easy to reason about,
but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions for
class loader isolation with this method, so you must make sure that the jars on your classpath are mutually compatible.
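As a rough illustration (the node type, config layout, and paths are hypothetical), starting a Historical node with an extension's jars appended to the classpath could look like this:

```
# extension jars are simply appended to the -cp value
java -cp "config/_common:config/historical:lib/*:/path/to/my-extension/*" io.druid.cli.Main server historical
```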
### Load from extension directory
If you don't want to fiddle with classpath, you can tell Druid to load extensions from an extension directory.
To let Druid load your extensions, follow the steps below.
**Tell Druid where your extensions are**
Specify `druid.extensions.directory` (the root directory that contains Druid extensions); see [Configuration](../configuration/index.html).
The value for this property should be set to the absolute path of the folder that contains all the extensions.
In general, you should simply reuse the release tarball's extensions directory (i.e., ```extensions```).
Example:
Suppose you specify `druid.extensions.directory=/usr/local/druid_tarball/extensions`.
Then underneath ```extensions```, it should look like this:
```
extensions/
├── druid-kafka-eight
│   ├── ...
│   ├── slf4j-log4j12-1.6.1.jar
│   ├── snappy-java-1.1.1.6.jar
│   ├── zkclient-0.3.jar
│   └── zookeeper-3.4.7.jar
└── mysql-metadata-storage
├── jdbi-2.32.jar
├── mysql-connector-java-5.1.34.jar
└── mysql-metadata-storage-0.9.0.jar
```
As you can see, underneath ```extensions``` there are two sub-directories, ```druid-kafka-eight``` and ```mysql-metadata-storage```.
Each sub-directory denotes an extension that Druid could load.
**Tell Druid what extensions to load**
Use `druid.extensions.loadList` (see [Configuration](../configuration/index.html)) to specify the
list of extensions that Druid should load.
For example, `druid.extensions.loadList=["druid-kafka-eight", "mysql-metadata-storage"]` instructs Druid to load the `druid-kafka-eight`
and `mysql-metadata-storage` extensions. Each name in the list must match the name of the corresponding extension folder.
If you specify `druid.extensions.loadList=[]`, Druid won't load any extensions from the file system.
If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under the directory specified by `druid.extensions.directory`.
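Putting the two properties together, the example above corresponds to the following lines in `common.runtime.properties`:

```
druid.extensions.directory=/usr/local/druid_tarball/extensions
druid.extensions.loadList=["druid-kafka-eight", "mysql-metadata-storage"]
```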

View File

@ -3,32 +3,38 @@ layout: doc_page
---
# Working with different versions of Hadoop
## Download Hadoop Dependencies
```hadoop-client:2.3.0``` is already bundled in the Druid release tarball. You can get it by downloading the tarball at [druid.io](http://druid.io/downloads.html).
Unpack the tarball; you will see a ```hadoop-dependencies``` folder that contains all the Hadoop dependencies. Each dependency has its own folder
that contains its Hadoop jars.
You can also use the `pull-deps` tool to download other Hadoop dependencies you want.
See [pull-deps](../operations/pull-deps.html) for a complete example.
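For example, a sketch of fetching an additional `hadoop-client` version with `pull-deps` (the option shown follows the pull-deps docs; the version is illustrative):

```
# download hadoop-client 2.4.0 (version is illustrative)
java -classpath "lib/*" io.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:2.4.0
```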
## Load Hadoop dependencies
There are two different ways to let Druid pick up your Hadoop version; choose the one that fits your needs.
### Load Hadoop dependencies from Hadoop dependencies directory
You can create a Hadoop dependency directory and tell Druid to load your Hadoop dependencies from there.
To make this work, follow the steps below.
**Tell Druid where your Hadoop dependencies are**
Specify `druid.extensions.hadoopDependenciesDir` (the root directory for Hadoop-related dependencies); see [Configuration](../configuration/index.html).
The value for this property should be set to the absolute path of the folder that contains all the Hadoop dependencies.
In general, you should simply reuse the release tarball's ```hadoop-dependencies``` directory.
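In `common.runtime.properties` this is a single line, for example (the path is illustrative):

```
druid.extensions.hadoopDependenciesDir=/usr/local/druid_tarball/hadoop-dependencies
```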
Example:
Suppose you specify `druid.extensions.hadoopDependenciesDir=/usr/local/druid_tarball/hadoop-dependencies`, and you have downloaded
`hadoop-client` 2.3.0 and 2.4.0.
Then underneath ```hadoop-dependencies```, it should look like this:
```
hadoop-dependencies/
└── hadoop-client
    ├── 2.3.0
    │   └── ..... lots of jars
    └── 2.4.0
        └── ..... lots of jars
```
As you can see, under ```hadoop-client``` there are two sub-directories, each of which denotes a version of ```hadoop-client```.
**Tell Druid what version of Hadoop to load**
Use `hadoopDependencyCoordinates` in [Hadoop Index Task](../ingestion/batch-ingestion.html) to specify the Hadoop dependencies you want Druid to load.
For example, suppose your Hadoop Index Task spec file contains
`"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.4.0"]`
This instructs Druid to load hadoop-client 2.4.0 when processing the task. Behind the scenes, Druid first looks for a folder
called ```hadoop-client``` underneath `druid.extensions.hadoopDependenciesDir`, then looks for a folder called ```2.4.0```
underneath ```hadoop-client```. Upon successfully locating these folders, hadoop-client 2.4.0 is loaded.
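To illustrate, here is a trimmed sketch of where the field sits in a Hadoop Index Task spec (the `dataSchema`, `ioConfig`, and `tuningConfig` bodies are elided):

```
{
  "type": "index_hadoop",
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.4.0"],
  "spec": {
    "dataSchema": {},
    "ioConfig": {},
    "tuningConfig": {}
  }
}
```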
### Append your Hadoop jars to the Druid classpath
If you don't like the approach above, just want to use one specific Hadoop version, and don't need Druid to work with multiple Hadoop versions, you can
(1) Set `druid.indexer.task.defaultHadoopCoordinates=[]`. `druid.indexer.task.defaultHadoopCoordinates` specifies the default Hadoop coordinates that Druid uses. Its default value is `["org.apache.hadoop:hadoop-client:2.3.0"]`. If you set it to an empty list, Druid will not load any Hadoop dependencies other than the ones on your classpath.