Merge pull request #2465 from druid-io/more-doc-fix

more doc fixes
Fangjin Yang 2016-02-17 11:00:38 -08:00
commit 083f019a48
9 changed files with 56 additions and 30 deletions

NOTICE
View File

@@ -1,6 +1,7 @@
Druid - a distributed column store.
Copyright 2012-2016 Metamarkets Group Inc.
Copyright 2015-2016 Yahoo! Inc.
Copyright 2015-2016 Imply Data, Inc.
-------------------------------------------------------------------------------

View File

@@ -23,7 +23,7 @@ More information about Druid can be found on <http://www.druid.io>.

### Documentation

You can find the [documentation for the latest Druid release](http://druid.io/docs/latest/) on
the [project website](http://druid.io/docs/latest/).

If you would like to contribute documentation, please do so under

View File

@@ -1,9 +1,14 @@
---
layout: doc_page
---
# About Experimental Features

Experimental features are features we have developed but have not fully tested in a production environment. If you choose to try them out, there will likely be edge cases that we have not covered. We would love feedback on any of these features, whether they are bug reports, suggestions for improvement, or letting us know they work as intended.

<div class="note caution">
APIs for experimental features may change in backwards incompatible ways.
</div>

To enable experimental features, include their artifacts in the `runtime.properties` configuration file. For example:
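A minimal sketch, assuming the 0.9.x-style `druid.extensions.loadList` property and `druid-histogram` as a stand-in artifact (both illustrative, not from this commit):

```
# common.runtime.properties (hypothetical): load an experimental extension
druid.extensions.loadList=["druid-histogram"]
```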

View File

@@ -9,7 +9,7 @@ We welcome any contributions to new formats.

## Formatting the Data

The following are some samples of the data used in the [Wikipedia example](../tutorials/quickstart.html).

_JSON_
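As an illustrative sketch of such a record (the field names `page`, `language`, `user`, `added`, and `deleted` are assumptions modeled on the Wikipedia edit data, not copied from the tutorial):

```json
{"timestamp": "2016-02-17T11:00:00Z", "page": "Druid_(open-source_data_store)", "language": "en", "user": "speculative-editor", "added": 57, "deleted": 12}
```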

View File

@@ -8,8 +8,8 @@ A Druid ingestion spec consists of 3 components:

```json
{
  "dataSchema" : {...},
  "ioConfig" : {...},
  "tuningConfig" : {...}
}
```
@@ -93,7 +93,7 @@ If `type` is not included, the parser defaults to `string`.

### Avro Stream Parser

This is for realtime ingestion. Make sure to include `druid-avro-extensions` as an extension.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
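As a rough sketch of the shape such a parser spec can take (the `avro_stream` type and the `avroBytesDecoder` and `parseSpec` field names are assumptions based on the extension, not spelled out here):

```json
{
  "type" : "avro_stream",
  "avroBytesDecoder" : {...},
  "parseSpec" : {...}
}
```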

View File

@@ -1,9 +1,9 @@
---
layout: doc_page
---
# Working with different versions of Hadoop

## Including Hadoop dependencies

There are two different ways to let Druid pick up your Hadoop version; choose the one that fits your needs.
@@ -13,15 +13,22 @@ You can create a Hadoop dependency directory and tell Druid to load your Hadoop

To make this work, follow the steps below.

(1) Specify `druid.extensions.hadoopDependenciesDir` (the root directory for Hadoop-related dependencies) in your `common.runtime.properties` file. If you don't
specify it, Druid will use a default value. See [Configuration](../configuration/index.html) for more details.

(2) Set up the Hadoop dependency directories under the root Hadoop dependency directory. Under the root directory, you should
create a sub-directory for each Hadoop dependency. Inside each sub-directory, create a sub-sub-directory whose name
is the version of Hadoop it contains, and put the Hadoop jars inside that sub-sub-directory. This file structure is
almost the same as for normal Druid extensions described in [Including Extensions](../operations/including-extensions.html),
except that there is an extra layer of folder that specifies the version of Hadoop. (If you don't want to set up
this directory manually, Druid also provides a [pull-deps](../operations/pull-deps.html) tool that can generate these
directories automatically.)
Example:

Suppose you specify `druid.extensions.hadoopDependenciesDir=/usr/local/druid/hadoop-dependencies`, and you want to prepare both `hadoop-client` 2.3.0 and 2.4.0 for Druid.
Then you can either use [pull-deps](../operations/pull-deps.html) or manually set up the Hadoop dependency directories such that under ```hadoop-dependencies```, it looks like this:

```
hadoop-dependencies/
@@ -44,7 +51,7 @@ hadoop-dependencies/
..... lots of jars
```

As you can see, under ```hadoop-client``` there are two sub-directories, each denoting a version of ```hadoop-client```. At runtime, Druid will look for these directories and load the appropriate ```hadoop-client``` based on the `hadoopDependencyCoordinates` passed to the [Hadoop Index Task](../ingestion/tasks.html).
### Append your Hadoop jars to the Druid classpath

@@ -54,17 +61,17 @@ If you really don't like the way above, and you just want to use one specific Hadoop version

(2) Append your Hadoop jars to the classpath, and Druid will load them into the system. This mechanism is relatively easy to reason about, but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions while using this method to maintain class loader isolation, so you must make sure that the jars on your classpath are mutually compatible.
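As a sketch of what that could look like when launching a node (the Hadoop jar path is hypothetical; `io.druid.cli.Main server historical` is the standard launch entry point):

```
java -classpath "lib/*:/usr/local/hadoop-2.4.0/client-jars/*" io.druid.cli.Main server historical
```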
#### Hadoop 2.x

The default version of Hadoop bundled with Druid is 2.3.

To override the default Hadoop version, both the Hadoop Index Task and the standalone Hadoop indexer support the parameter `hadoopDependencyCoordinates` (see [Index Hadoop Task](../ingestion/tasks.html)). You can pass another set of Hadoop coordinates through this parameter (e.g., you can specify coordinates for Hadoop 2.4.0 as `["org.apache.hadoop:hadoop-client:2.4.0"]`), which will override the default Hadoop coordinates Druid uses.

The Hadoop Index Task takes this parameter as part of the task JSON, and the standalone Hadoop indexer takes it as a command-line argument.
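For instance, a sketch of the task-JSON form (the surrounding fields are elided; `index_hadoop` is the standard Hadoop index task type):

```json
{
  "type" : "index_hadoop",
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.4.0"],
  "spec" : {...}
}
```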
If you are still having problems, include all relevant Hadoop jars at the beginning of the classpath of your indexing or historical nodes.
#### CDH

Members of the community have reported dependency conflicts between the version of Jackson used in CDH and Druid. Currently, our best workaround is to edit Druid's pom.xml dependencies to match the version of Jackson in your Hadoop version and recompile Druid.

View File

@@ -1,6 +1,7 @@
---
layout: doc_page
---
# pull-deps Tool

`pull-deps` is a tool that can pull down dependencies to the local repository and lay dependencies out into the extension directory as needed.
@@ -9,31 +10,31 @@ layout: doc_page

`-c` or `--coordinate` (Can be specified multiple times)

Extension coordinate to pull down, followed by a maven coordinate, e.g. io.druid.extensions:mysql-metadata-storage

`-h` or `--hadoop-coordinate` (Can be specified multiple times)

Hadoop dependency to pull down, followed by a maven coordinate, e.g. org.apache.hadoop:hadoop-client:2.4.0

`--no-default-hadoop`

Don't pull down the default hadoop coordinate, i.e., org.apache.hadoop:hadoop-client:2.3.0. If the `-h` option is supplied, the default hadoop coordinate will not be downloaded.

`--clean`

Remove existing extension and hadoop dependencies directories before pulling down dependencies.

`-l` or `--localRepository`

A local repository that Maven will use to put downloaded files. Then pull-deps will lay these files out into the extensions directory as needed.

`-r` or `--remoteRepository`

Add a remote repository to the default remote repository list, which includes https://repo1.maven.org/maven2/ and https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local

`-d` or `--defaultVersion`

Version to use for an extension coordinate that doesn't have version information. For example, if the extension coordinate is `io.druid.extensions:mysql-metadata-storage` and the default version is `0.9.0`, then this coordinate will be treated as `io.druid.extensions:mysql-metadata-storage:0.9.0`

To run `pull-deps`, you should
@@ -43,9 +44,9 @@ To run `pull-deps`, you should

Example:

Suppose you want to download ```druid-examples```, ```mysql-metadata-storage```, and ```hadoop-client``` (both 2.3.0 and 2.4.0) with a specific version. You can run the `pull-deps` command with `-c io.druid.extensions:druid-examples:0.9.0`, `-c io.druid.extensions:mysql-metadata-storage:0.9.0`, `-h org.apache.hadoop:hadoop-client:2.3.0`, and `-h org.apache.hadoop:hadoop-client:2.4.0`. An example command would be:

```
java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --clean -c io.druid.extensions:mysql-metadata-storage:0.9.0 -c io.druid.extensions:druid-examples:0.9.0 -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0
```

Because `--clean` is supplied, this command will first remove the directories specified at `druid.extensions.directory` and `druid.extensions.hadoopDependenciesDir`, then recreate them and start downloading the extensions there. After the downloads finish, if you go to the extension directories you specified, you will see
@@ -57,14 +58,14 @@ extensions
│   ├── commons-digester-1.8.jar
│   ├── commons-logging-1.1.1.jar
│   ├── commons-validator-1.4.0.jar
│   ├── druid-examples-0.9.0.jar
│   ├── twitter4j-async-3.0.3.jar
│   ├── twitter4j-core-3.0.3.jar
│   └── twitter4j-stream-3.0.3.jar
└── mysql-metadata-storage
    ├── jdbi-2.32.jar
    ├── mysql-connector-java-5.1.34.jar
    └── mysql-metadata-storage-0.9.0.jar
```

@@ -89,6 +90,8 @@ hadoop-dependencies/

```
..... lots of jars
```
Note that if you specify `--defaultVersion`, you don't have to put version information in the coordinate. For example, if you want both `druid-examples` and `mysql-metadata-storage` to use version `0.9.0`, you can change the command above to

```
java -classpath "/my/druid/library/*" io.druid.cli.Main tools pull-deps --defaultVersion 0.9.0 --clean -c io.druid.extensions:mysql-metadata-storage -c io.druid.extensions:druid-examples -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0
```

View File

@@ -396,6 +396,10 @@ or without setting "locale" (in this case, the current value of the default locale

### Lookup DimensionSpecs

<div class="note caution">
Lookups are an <a href="../development/experimental.html">experimental</a> feature.
</div>

Lookup DimensionSpecs can be used to directly define a lookup implementation as a dimension spec.
Generally speaking, there are two different kinds of lookup implementations.
The first kind is passed in at query time, like the `map` implementation.
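A sketch of the query-time kind, assuming a `lookup` dimension spec type with an inline `map` lookup (the dimension and output names are made up for illustration):

```json
{
  "type" : "lookup",
  "dimension" : "channel",
  "outputName" : "channelName",
  "lookup" : {
    "type" : "map",
    "map" : { "#en.wikipedia" : "English Wikipedia" }
  }
}
```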

View File

@@ -3,6 +3,10 @@ layout: doc_page
---
# Lookups

<div class="note caution">
Lookups are an <a href="../development/experimental.html">experimental</a> feature.
</div>

Lookups are a concept in Druid where dimension values are (optionally) replaced with new values.
See [dimension specs](../querying/dimensionspecs.html) for more information. For the purpose of these documents,
a "key" refers to a dimension value to match, and a "value" refers to its replacement.

@@ -61,8 +65,8 @@ described as per the sections on this page. For example:

]
```

Proper functionality of Namespaced lookups requires the following extension to be loaded on the broker, peon, and historical nodes:
`druid-namespace-lookup`
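For example, a sketch using the 0.9.x-style `druid.extensions.loadList` property (the property name is an assumption, not part of this commit) in `common.runtime.properties`:

```
druid.extensions.loadList=["druid-namespace-lookup"]
```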
## Cache Settings
@@ -287,6 +291,8 @@ The following are the handling for kafka consumer properties in `druid.query.ren

To test this setup, you can send key/value pairs to a kafka stream via the following producer console:

```
./bin/kafka-console-producer.sh --property parse.key=true --property key.separator="->" --broker-list localhost:9092 --topic testTopic
```

Renames can then be published as `OLD_VAL->NEW_VAL` followed by a newline (enter or return).
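For instance, typing the line below into that producer console would publish a rename mapping the (made-up) key `#en.wikipedia` to `English Wikipedia`:

```
#en.wikipedia->English Wikipedia
```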