fix docs for 0.6 part 1 of many

fjy 2013-10-07 14:47:04 -07:00
parent de71c14114
commit af1dbe6eab
20 changed files with 335 additions and 452 deletions

@@ -30,4 +30,4 @@ echo "For examples, see: "
 echo " "
 ls -1 examples/*/*sh
 echo " "
-echo "See also https://github.com/metamx/druid/wiki"
+echo "See also http://druid.io/docs/0.6.0/Home.html"

@@ -1,24 +0,0 @@
----
-layout: post
-title: "Welcome to Jekyll!"
-date: 2013-09-16 13:06:49
-categories: jekyll update
----
-You'll find this post in your `_posts` directory - edit this post and re-build (or run with the `-w` switch) to see your changes!
-To add new posts, simply add a file in the `_posts` directory that follows the convention: YYYY-MM-DD-name-of-post.ext.
-Jekyll also offers powerful support for code snippets:
-{% highlight ruby %}
-def print_hi(name)
-  puts "Hi, #{name}"
-end
-print_hi('Tom')
-#=> prints 'Hi, Tom' to STDOUT.
-{% endhighlight %}
-Check out the [Jekyll docs][jekyll] for more info on how to get the most out of Jekyll. File all bugs/feature requests at [Jekyll's GitHub repo][jekyll-gh].
-[jekyll-gh]: https://github.com/mojombo/jekyll
-[jekyll]: http://jekyllrb.com

@@ -4,22 +4,22 @@ layout: doc_page
 Batch Data Ingestion
 ====================
-There are two choices for batch data ingestion to your Druid cluster, you can use the [Indexing service](Indexing-service.html) or you can use the `HadoopDruidIndexerMain`. This page describes how to use the `HadoopDruidIndexerMain`.
+There are two choices for batch data ingestion to your Druid cluster, you can use the [Indexing service](Indexing-service.html) or you can use the `HadoopDruidIndexer`. This page describes how to use the `HadoopDruidIndexer`.
 Which should I use?
 -------------------
 The [Indexing service](Indexing-service.html) is a node that can run as part of your Druid cluster and can accomplish a number of different types of indexing tasks. Even if all you care about is batch indexing, it provides for the encapsulation of things like the Database that is used for segment metadata and other things, so that your indexing tasks do not need to include such information. Long-term, the indexing service is going to be the preferred method of ingesting data.
-The `HadoopDruidIndexerMain` runs hadoop jobs in order to separate and index data segments. It takes advantage of Hadoop as a job scheduling and distributed job execution platform. It is a simple method if you already have Hadoop running and dont want to spend the time configuring and deploying the [Indexing service](Indexing service.html) just yet.
+The `HadoopDruidIndexer` runs hadoop jobs in order to separate and index data segments. It takes advantage of Hadoop as a job scheduling and distributed job execution platform. It is a simple method if you already have Hadoop running and dont want to spend the time configuring and deploying the [Indexing service](Indexing service.html) just yet.
 HadoopDruidIndexer
 ------------------
-Located at `com.metamx.druid.indexer.HadoopDruidIndexerMain` can be run like
+The HadoopDruidIndexer can be run like so:
 ```
-java -cp hadoop_config_path:druid_indexer_selfcontained_jar_path com.metamx.druid.indexer.HadoopDruidIndexerMain <config_file>
+java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath hadoop_config_path:`echo lib/* | tr ' ' ':'` io.druid.cli.Main index hadoop <config_file>
 ```
 The interval is the [ISO8601 interval](http://en.wikipedia.org/wiki/ISO_8601#Time_intervals) of the data you are processing. The config\_file is a path to a file (the "specFile") that contains JSON and an example looks like:
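The batch-ingestion page relies on ISO 8601 for the interval. The `batchConfig.json` in the Loading Your Data changes uses the start/duration form `2010-01-01T01/PT1H`, which the new JSON writes as `2010-01-01T01\/PT1H`; there, `\/` is simply an escaped `/`.

Below is a minimal Python sketch of how such an interval resolves to a start and an end. It makes three assumptions:

- All times are UTC.
- Only the few timestamp and duration shapes listed in the code are supported, not the full ISO 8601 grammar.
- `parse_interval` is a hypothetical helper, not something Druid ships.

```python
import re
from datetime import datetime, timedelta

# Timestamp shapes this sketch accepts (assumed UTC); real ISO 8601 allows many more.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H", "%Y-%m-%d")

# Durations built from days, hours, minutes and seconds only (e.g. PT1H, P1DT2H).
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_instant(text):
    """Parse a naive timestamp in one of the supported shapes."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp: {text!r}")


def parse_duration(text):
    """Convert a simple ISO 8601 duration such as PT1H into a timedelta."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def parse_interval(text):
    """Split 'start/end' or 'start/duration' into a (start, end) pair."""
    start_text, _, rest = text.partition("/")
    start = parse_instant(start_text)
    end = start + parse_duration(rest) if rest.startswith("P") else parse_instant(rest)
    return start, end
```

Read this way, `2010-01-01T01/PT1H` covers exactly one hour, from 01:00 to 02:00 on 2010-01-01. That matches the `"gran": "hour"` setting in the same spec.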

@@ -3,7 +3,7 @@ layout: doc_page
 ---
 # Booting a Single Node Cluster #
-[Loading Your Data](Loading-Your-Data.html) and [Querying Your Data](Querying-Your-Data.html) contain recipes to boot a small druid cluster on localhost. Here we will boot a small cluster on EC2. You can checkout the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.5.51-SNAPSHOT-bin.tar.gz).
+[Loading Your Data](Loading-Your-Data.html) and [Querying Your Data](Querying-Your-Data.html) contain recipes to boot a small druid cluster on localhost. Here we will boot a small cluster on EC2. You can checkout the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.6.0-bin.tar.gz).
 The [ec2 run script](https://github.com/metamx/druid/blob/master/examples/bin/run_ec2.sh), run_ec2.sh, is located at 'examples/bin' if you have checked out the code, or at the root of the project if you've downloaded a tarball. The scripts rely on the [Amazon EC2 API Tools](http://aws.amazon.com/developertools/351), and you will need to set three environment variables:

@@ -104,7 +104,7 @@ The coordinator node exposes several HTTP endpoints for interactions.
 The Coordinator Console
 ------------------
-The Druid coordinator exposes a web GUI for displaying cluster information and rule configuration. After the coordinator starts, the console can be accessed at http://HOST:PORT/static/. There exists a full cluster view, as well as views for individual historical nodes, datasources and segments themselves. Segment information can be displayed in raw JSON form or as part of a sortable and filterable table.
+The Druid coordinator exposes a web GUI for displaying cluster information and rule configuration. After the coordinator starts, the console can be accessed at http://<HOST>:<PORT>. There exists a full cluster view, as well as views for individual historical nodes, datasources and segments themselves. Segment information can be displayed in raw JSON form or as part of a sortable and filterable table.
 The coordinator console also exposes an interface to creating and editing rules. All valid datasources configured in the segment database, along with a default datasource, are available for configuration. Rules of different types can be added, deleted or edited.

@@ -6,7 +6,7 @@ A version may be declared as a release candidate if it has been deployed to a si
 Release Candidate
 -----------------
-There is no release candidate at this time.
+The current release candidate is tagged at version [0.6.0](https://github.com/metamx/druid/tree/druid-0.6.0).
 Stable Release
 --------------

@@ -4,7 +4,7 @@ layout: doc_page
 Examples
 ========
-The examples on this page are setup in order to give you a feel for what Druid does in practice. They are quick demos of Druid based on [RealtimeStandaloneMain](https://github.com/metamx/druid/blob/master/examples/src/main/java/druid/examples/RealtimeStandaloneMain.java). While you wouldnt run it this way in production you should be able to see how ingestion works and the kind of exploratory queries that are possible. Everything that can be done on your box here can be scaled out to 10s of billions of events and terabytes of data per day in a production cluster while still giving the snappy responsive exploratory queries.
+The examples on this page are setup in order to give you a feel for what Druid does in practice. They are quick demos of Druid based on [CliRealtimeExample](https://github.com/metamx/druid/blob/master/services/src/main/java/io/druid/cli/CliRealtimeExample.java). While you wouldnt run it this way in production you should be able to see how ingestion works and the kind of exploratory queries that are possible. Everything that can be done on your box here can be scaled out to 10s of billions of events and terabytes of data per day in a production cluster while still giving the snappy responsive exploratory queries.
 Installing Standalone Druid
 ---------------------------
@@ -19,7 +19,7 @@ Clone Druid and build it:
 git clone https://github.com/metamx/druid.git druid
 cd druid
 git fetch --tags
-git checkout druid-0.4.30
+git checkout druid-0.6.0
 ./build.sh
 ```
@@ -49,7 +49,7 @@ This Example uses a feature of Twitter that allows for sampling of its stream
 ### What youll do
-See [Tutorial](Tutorial.html)
+See [Twitter Tutorial](Twitter-Tutorial.html)
 Rand Example
 ------------
@@ -68,5 +68,4 @@ In another terminal window:
 ./run_example_client.sh # type rand when prompted
 ```
 The result of the client query is in JSON format. The client makes a REST request using the program `curl` which is usually installed on Linux, Unix, and OSX by default.

@@ -31,7 +31,7 @@ For every query that a historical node services, it will log the query and repor
 Running
 -------
+p
 Historical nodes can be run using the `io.druid.cli.Main` class with program arguments "server historical".
 Configuration

@@ -23,3 +23,5 @@ Some great folks have written their own libraries to interact with Druid
 * [madvertise/druid-dumbo](https://github.com/madvertise/druid-dumbo) - Scripts to help generate batch configs for the ingestion of data into Druid
 * [housejester/druid-test-harness](https://github.com/housejester/druid-test-harness) - A set of scripts to simplify standing up some servers and seeing how things work
+* [mingfang/docker-druid](https://github.com/mingfang/docker-druid) - A Dockerfile to run the entire Druid cluster

@@ -1,12 +1,12 @@
 ---
 layout: doc_page
 ---
-Once you have a realtime node working, it is time to load your own data to see how Druid performs.
+Once you have a real-time node working, it is time to load your own data to see how Druid performs.
-Druid can ingest data in three ways: via Kafka and a realtime node, via the indexing service, and via the Hadoop batch loader. Data is ingested in realtime using a [Firehose](Firehose.html).
+Druid can ingest data in three ways: via Kafka and a realtime node, via the indexing service, and via the Hadoop batch loader. Data is ingested in real-time using a [Firehose](Firehose.html).
 ## Create Config Directories ##
-Each type of node needs its own config file and directory, so create them as subdirectories under the druid directory.
+Each type of node needs its own config file and directory, so create them as subdirectories under the druid directory if they not already exist.
 ```bash
 mkdir config
@@ -18,7 +18,7 @@ mkdir config/broker
 ## Loading Data with Kafka ##
-[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/druid-0.5.x/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in realtime without writing any code. To load data to a realtime node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
+[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/druid-0.6.0/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in realtime without writing any code. To load data to a realtime node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
 ### Booting Kafka ###
@@ -59,71 +59,90 @@ Instructions for booting a Zookeeper and then Kafka cluster are available [here]
 1. Create a valid configuration file similar to this called config/realtime/runtime.properties:
 ```properties
-druid.host=0.0.0.0:8080
+druid.host=localhost
+druid.service=example
 druid.port=8080
-com.metamx.emitter.logging=true
-druid.processing.formatString=processing_%s
-druid.processing.numThreads=1
+druid.zk.service.host=localhost
+druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
+druid.s3.secretKey=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
+druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
+druid.db.connector.user=druid
+druid.db.connector.password=diurd
+druid.realtime.specFile=config/realtime/realtime.spec
 druid.processing.buffer.sizeBytes=10000000
-#emitting, opaque marker
-druid.service=example
-druid.request.logging.dir=/tmp/example/log
-druid.realtime.specFile=realtime.spec
-com.metamx.emitter.logging=true
-com.metamx.emitter.logging.level=debug
-# below are dummy values when operating a realtime only node
 druid.processing.numThreads=3
-com.metamx.aws.accessKey=dummy_access_key
-com.metamx.aws.secretKey=dummy_secret_key
-druid.storage.s3.bucket=dummy_s3_bucket
-druid.zk.service.host=localhost
-druid.server.maxSize=300000000000
-druid.zk.paths.base=/druid
-druid.database.segmentTable=prod_segments
-druid.database.user=user
-druid.database.password=diurd
-druid.database.connectURI=
-druid.host=127.0.0.1:8080
 ```
 2. Create a valid realtime configuration file similar to this called realtime.spec:
 ```json
-[{
-  "schema" : { "dataSource":"druidtest",
-    "aggregators":[ {"type":"count", "name":"impressions"},
-      {"type":"doubleSum","name":"wp","fieldName":"wp"}],
-    "indexGranularity":"minute",
-    "shardSpec" : { "type": "none" } },
-  "config" : { "maxRowsInMemory" : 500000,
-    "intermediatePersistPeriod" : "PT10m" },
-  "firehose" : { "type" : "kafka-0.7.2",
-    "consumerProps" : { "zk.connect" : "localhost:2181",
-      "zk.connectiontimeout.ms" : "15000",
-      "zk.sessiontimeout.ms" : "15000",
-      "zk.synctime.ms" : "5000",
-      "groupid" : "topic-pixel-local",
-      "fetch.size" : "1048586",
-      "autooffset.reset" : "largest",
-      "autocommit.enable" : "false" },
-    "feed" : "druidtest",
-    "parser" : { "timestampSpec" : { "column" : "utcdt", "format" : "iso" },
-      "data" : { "format" : "json" },
-      "dimensionExclusions" : ["wp"] } },
-  "plumber" : { "type" : "realtime",
-    "windowPeriod" : "PT10m",
-    "segmentGranularity":"hour",
-    "basePersistDirectory" : "/tmp/realtime/basePersist",
-    "rejectionPolicy": {"type": "messageTime"} }
-}]
+[
+  {
+    "schema": {
+      "dataSource": "druidtest",
+      "aggregators": [
+        {
+          "type": "count",
+          "name": "impressions"
+        },
+        {
+          "type": "doubleSum",
+          "name": "wp",
+          "fieldName": "wp"
+        }
+      ],
+      "indexGranularity": "minute",
+      "shardSpec": {
+        "type": "none"
+      }
+    },
+    "config": {
+      "maxRowsInMemory": 500000,
+      "intermediatePersistPeriod": "PT10m"
+    },
+    "firehose": {
+      "type": "kafka-0.7.2",
+      "consumerProps": {
+        "zk.connect": "localhost:2181",
+        "zk.connectiontimeout.ms": "15000",
+        "zk.sessiontimeout.ms": "15000",
+        "zk.synctime.ms": "5000",
+        "groupid": "topic-pixel-local",
+        "fetch.size": "1048586",
+        "autooffset.reset": "largest",
+        "autocommit.enable": "false"
+      },
+      "feed": "druidtest",
+      "parser": {
+        "timestampSpec": {
+          "column": "utcdt",
+          "format": "iso"
+        },
+        "data": {
+          "format": "json"
+        },
+        "dimensionExclusions": [
+          "wp"
+        ]
+      }
+    },
+    "plumber": {
+      "type": "realtime",
+      "windowPeriod": "PT10m",
+      "segmentGranularity": "hour",
+      "basePersistDirectory": "\/tmp\/realtime\/basePersist",
+      "rejectionPolicy": {
+        "type": "messageTime"
+      }
+    }
+  }
+]
 ```
 3. Launch the realtime node
@@ -131,7 +150,7 @@ Instructions for booting a Zookeeper and then Kafka cluster are available [here]
 ```bash
 java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
   -Ddruid.realtime.specFile=config/realtime/realtime.spec \
-  -classpath lib/*:config/realtime com.metamx.druid.realtime.RealtimeMain
+  -classpath lib/*:config/realtime io.druid.cli.Main server realtime
 ```
 4. Paste data into the Kafka console producer
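The realtime.spec change in this hunk alters layout only. The old compact spec and the new pretty-printed one carry the same fields and values. The new file also writes paths such as `\/tmp\/realtime\/basePersist` with escaped slashes, which any JSON parser reads back as a plain `/`.

The Python sketch below loads a trimmed excerpt of the new spec and pulls out a few fields. Both the excerpt and the `summarize_realtime_spec` helper are illustrative, not part of Druid.

```python
import json

# Trimmed excerpt of the new realtime.spec; the full file also has
# "config" and "firehose" sections. A raw string keeps the \/ escapes for JSON.
SPEC_EXCERPT = r'''
[
  {
    "schema": {
      "dataSource": "druidtest",
      "aggregators": [
        {"type": "count", "name": "impressions"},
        {"type": "doubleSum", "name": "wp", "fieldName": "wp"}
      ],
      "indexGranularity": "minute"
    },
    "plumber": {
      "type": "realtime",
      "segmentGranularity": "hour",
      "basePersistDirectory": "\/tmp\/realtime\/basePersist"
    }
  }
]
'''


def summarize_realtime_spec(text):
    """Return (dataSource, aggregator names, segmentGranularity, persist dir)
    for every entry of a realtime.spec array."""
    summaries = []
    for entry in json.loads(text):
        schema, plumber = entry["schema"], entry["plumber"]
        summaries.append((
            schema["dataSource"],
            [agg["name"] for agg in schema["aggregators"]],
            plumber["segmentGranularity"],
            plumber["basePersistDirectory"],  # "\/" has been decoded to "/"
        ))
    return summaries
```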
@@ -239,46 +258,20 @@ If you've already setup a realtime node, be aware that although you can run mult
 1. Setup a configuration file called config/coordinator/runtime.properties similar to:
 ```properties
-druid.host=0.0.0.0:8081
+druid.host=localhost
+druid.service=coordinator
 druid.port=8081
-com.metamx.emitter.logging=true
-druid.processing.formatString=processing_%s
-druid.processing.numThreads=1
-druid.processing.buffer.sizeBytes=10000000
-# emitting, opaque marker
-druid.service=example
+druid.zk.service.host=localhost
+druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
+druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
+druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
+druid.db.connector.user=druid
+druid.db.connector.password=diurd
 druid.coordinator.startDelay=PT60s
-druid.request.logging.dir=/tmp/example/log
-druid.realtime.specFile=realtime.spec
-com.metamx.emitter.logging=true
-com.metamx.emitter.logging.level=debug
-# below are dummy values when operating a realtime only node
-druid.processing.numThreads=3
-com.metamx.aws.accessKey=dummy_access_key
-com.metamx.aws.secretKey=dummy_secret_key
-druid.storage.s3.bucket=dummy_s3_bucket
-druid.zk.service.host=localhost
-druid.server.maxSize=300000000000
-druid.zk.paths.base=/druid
-druid.database.segmentTable=prod_segments
-druid.database.user=druid
-druid.database.password=diurd
-druid.database.connectURI=jdbc:mysql://localhost:3306/druid
-druid.zk.paths.discoveryPath=/druid/discoveryPath
-druid.database.ruleTable=rules
-druid.database.configTable=config
-# Path on local FS for storage of segments; dir will be created if needed
-druid.paths.indexCache=/tmp/druid/indexCache
-# Path on local FS for storage of segment metadata; dir will be created if needed
-druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
 ```
 2. Launch the [Coordinator](Coordinator.html) node
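The new coordinator properties, like the new realtime ones, write the MySQL URI as `jdbc\:mysql\://localhost\:3306/druid`. In a Java `.properties` file a backslash in a value escapes the character after it, so the value actually read is `jdbc:mysql://localhost:3306/druid`.

The Python sketch below imitates that decoding for the simple `key=value` lines used in these configs. It is an approximation under stated limits, not a complete `java.util.Properties` implementation.

```python
def parse_properties(text):
    """Parse key=value lines roughly the way java.util.Properties does for the
    cases in these configs: '#'/'!' comments, blank lines, and backslash
    escapes such as '\\:' (decoded to ':').

    Minimal sketch: no line continuations, no ':' or whitespace separators,
    and no special handling of \\n, \\t or \\uXXXX escapes.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.lstrip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        key, _, value = line.partition("=")
        props[key.strip()] = unescape(value.lstrip())
    return props


def unescape(value):
    """Drop each backslash and keep the character that follows it literally."""
    out = []
    chars = iter(value)
    for ch in chars:
        out.append(next(chars, "") if ch == "\\" else ch)
    return "".join(out)
```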
@@ -286,7 +279,7 @@ If you've already setup a realtime node, be aware that although you can run mult
 ```bash
 java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
   -classpath lib/*:config/coordinator \
-  com.metamx.druid.http.CoordinatorMain
+  io.druid.Cli.Main server coordinator
 ```
 ### Launch a Historical Node ###
@@ -294,48 +287,21 @@ If you've already setup a realtime node, be aware that although you can run mult
 1. Create a configuration file in config/historical/runtime.properties similar to:
 ```properties
-druid.host=0.0.0.0:8082
+druid.host=localhost
+druid.service=historical
 druid.port=8082
-com.metamx.emitter.logging=true
-druid.processing.formatString=processing_%s
-druid.processing.numThreads=1
+druid.zk.service.host=localhost
+druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
+druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
+druid.server.maxSize=100000000
 druid.processing.buffer.sizeBytes=10000000
-# emitting, opaque marker
-druid.service=example
-druid.request.logging.dir=/tmp/example/log
-druid.realtime.specFile=realtime.spec
-com.metamx.emitter.logging=true
-com.metamx.emitter.logging.level=debug
-# below are dummy values when operating a realtime only node
-druid.processing.numThreads=3
-com.metamx.aws.accessKey=dummy_access_key
-com.metamx.aws.secretKey=dummy_secret_key
-druid.storage.s3.bucket=dummy_s3_bucket
-druid.zk.service.host=localhost
-druid.server.maxSize=300000000000
-druid.zk.paths.base=/druid
-druid.database.segmentTable=prod_segments
-druid.database.user=druid
-druid.database.password=diurd
-druid.database.connectURI=jdbc:mysql://localhost:3306/druid
-druid.zk.paths.discoveryPath=/druid/discoveryPath
-druid.database.ruleTable=rules
-druid.database.configTable=config
-# Path on local FS for storage of segments; dir will be created if needed
-druid.paths.indexCache=/tmp/druid/indexCache
-# Path on local FS for storage of segment metadata; dir will be created if needed
-druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
-# Setup local storage mode
-druid.storage.local.storageDirectory=/tmp/druid/localStorage
-druid.storage.local=true
+druid.segmentCache.infoPath=/tmp/druid/segmentInfoCache
+druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize"\: 100000000}]
 ```
 2. Launch the historical node:
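The new historical config makes three related changes:

- It drops the `druid.paths.indexCache` and `druid.paths.segmentInfoCache` settings.
- It replaces them with `druid.segmentCache.infoPath` and a JSON-valued `druid.segmentCache.locations`.
- It sets `druid.server.maxSize=100000000`, the same figure as the single location's `maxSize`.

The Python sketch below reads that JSON value and compares the total cache capacity with `druid.server.maxSize`. Two parts are assumptions: the rule that capacity should be at least `druid.server.maxSize`, and the helper names, which are hypothetical.

```python
import json


def segment_cache_capacity(locations_value):
    """Sum maxSize over a druid.segmentCache.locations value.

    The raw value in runtime.properties is '[{"path": ..., "maxSize"\\: N}]';
    the backslash before ':' is a .properties escape, so it is dropped before
    the JSON is parsed (only that one escape is handled here).
    """
    locations = json.loads(locations_value.replace("\\:", ":"))
    return sum(int(loc["maxSize"]) for loc in locations)


def cache_can_hold(server_max_size, locations_value):
    """Assumed sanity check: the cache locations together should be able to
    hold at least druid.server.maxSize bytes of segments."""
    return segment_cache_capacity(locations_value) >= int(server_max_size)
```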
@@ -371,31 +337,47 @@ Now its time to run the Hadoop [Batch-ingestion](Batch-ingestion.html) job, Hado
   "timestampFormat": "iso",
   "dataSpec": {
     "format": "json",
-    "dimensions": ["gender", "age"]
+    "dimensions": [
+      "gender",
+      "age"
+    ]
   },
   "granularitySpec": {
-    "type":"uniform",
-    "intervals":["2010-01-01T01/PT1H"],
-    "gran":"hour"
+    "type": "uniform",
+    "intervals": [
+      "2010-01-01T01\/PT1H"
+    ],
+    "gran": "hour"
   },
-  "pathSpec": { "type": "static",
-    "paths": "/Users/rjurney/Software/druid/records.json" },
-  "rollupSpec": { "aggs":[ {"type":"count", "name":"impressions"},
-      {"type":"doubleSum","name":"wp","fieldName":"wp"}
-    ],
-    "rollupGranularity": "minute"},
-  "workingPath": "/tmp/working_path",
-  "segmentOutputPath": "/tmp/segments",
-  "leaveIntermediate": "false",
+  "pathSpec": {
+    "type": "static",
+    "paths": "\/druid\/records.json"
+  },
+  "rollupSpec": {
+    "aggs": [
+      {
+        "type": "count",
+        "name": "impressions"
+      },
+      {
+        "type": "doubleSum",
+        "name": "wp",
+        "fieldName": "wp"
+      }
+    ],
+    "rollupGranularity": "minute"
+  },
+  "workingPath": "\/tmp\/working_path",
+  "segmentOutputPath": "\/tmp\/segments",
   "partitionsSpec": {
     "targetPartitionSize": 5000000
   },
   "updaterJobSpec": {
-    "type":"db",
-    "connectURI":"jdbc:mysql://localhost:3306/druid",
-    "user":"druid",
-    "password":"diurd",
-    "segmentTable":"prod_segments"
+    "type": "db",
+    "connectURI": "jdbc:mysql:\/\/localhost:3306\/druid",
+    "user": "druid",
+    "password": "diurd",
+    "segmentTable": "druid_segments"
   }
 }
 ```
@@ -404,8 +386,8 @@ Now its time to run the Hadoop [Batch-ingestion](Batch-ingestion.html) job, Hado
 ```bash
 java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-  -Ddruid.realtime.specFile=realtime.spec -classpath lib/* \
-  com.metamx.druid.indexer.HadoopDruidIndexerMain batchConfig.json
+  -classpath `echo lib/* | tr ' ' ':'` \
+  io.druid.cli.Main index hadoop batchConfig.json
 ```
 You can now move on to [Querying Your Data](Querying-Your-Data.html)!
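The new batch launch command builds its classpath with ``echo lib/* | tr ' ' ':'``. The shell expands the glob, and `tr` joins the resulting file names with `:`, the Unix classpath separator. The command also switches from the old per-node main classes to the `io.druid.cli.Main` entry point.

The Python sketch below reproduces that expansion. `expand_classpath` and `druid_command` are hypothetical helpers, and the JVM flags are copied from the commands in this commit.

```python
import glob
import os

# JVM flags used by the launch commands in this commit.
JVM_FLAGS = ["-Xmx256m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8"]


def expand_classpath(lib_dir, extra_entries=()):
    """Approximate `echo lib/* | tr ' ' ':'`: glob lib_dir and join with ':'.

    Extra entries (e.g. a Hadoop or node config dir) go first. sorted() uses
    code-point order, which can differ from a locale-aware shell glob, and an
    empty lib_dir yields no entries here where the shell would pass 'lib/*'.
    """
    jars = sorted(glob.glob(os.path.join(lib_dir, "*")))
    return ":".join(list(extra_entries) + jars)


def druid_command(classpath, *cli_args):
    """Build an argv list for io.druid.cli.Main, e.g. ('index', 'hadoop', spec)
    or ('server', 'historical')."""
    return ["java", *JVM_FLAGS, "-classpath", classpath, "io.druid.cli.Main", *cli_args]
```

Passing `druid_command(expand_classpath("lib"), "index", "hadoop", "batchConfig.json")` to `subprocess.run` would start the same batch job from the unpacked distribution directory.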

@@ -6,7 +6,7 @@ MySQL is an external dependency of Druid. We use it to store various metadata ab
 Segments Table
 --------------
-This is dictated by the `druid.database.segmentTable` property (Note that these properties are going to change in the next stable version after 0.4.12).
+This is dictated by the `druid.db.tables.segments` property.
 This table stores metadata about the segments that are available in the system. The table is polled by the [Coordinator](Coordinator.html) to determine the set of segments that should be available for querying in the system. The table has two main functional columns, the other columns are for indexing purposes.
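The segments table named by `druid.db.tables.segments` is what the coordinator polls. The Python sketch below shows the kind of poll involved, with these assumptions:

- `sqlite3` stands in for MySQL, and the schema is cut down to four columns.
- The two functional columns are the `used` flag and the JSON `payload`, as Druid's segments-table documentation describes them.
- The table name `druid_segments` follows the `segmentTable` value in this commit's batch config.
- `used_segments` and `example_db` are hypothetical helpers.

```python
import json
import sqlite3

# Simplified stand-in schema; the real table has further columns
# (created_date, start, end, partitioned, version).
SCHEMA = ("CREATE TABLE druid_segments "
          "(id TEXT PRIMARY KEY, dataSource TEXT, used INTEGER, payload TEXT)")


def used_segments(conn, data_source):
    """Return the decoded payloads of segments still flagged as used, roughly
    the set the coordinator tries to keep available for querying."""
    rows = conn.execute(
        "SELECT payload FROM druid_segments WHERE dataSource = ? AND used = 1 ORDER BY id",
        (data_source,),
    )
    return [json.loads(payload) for (payload,) in rows]


def example_db(rows):
    """Build an in-memory table from (id, dataSource, used, payload_dict) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO druid_segments VALUES (?, ?, ?, ?)",
        [(i, ds, int(used), json.dumps(p)) for i, ds, used, p in rows],
    )
    return conn
```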

@@ -7,101 +7,24 @@ Before we start querying druid, we're going to finish setting up a complete clus
 ## Booting a Broker Node ##
 1. Setup a config file at config/broker/runtime.properties that looks like this:
 ```
-druid.host=0.0.0.0:8083
-druid.port=8083
-com.metamx.emitter.logging=true
-druid.processing.formatString=processing_%s
-druid.processing.numThreads=1
-druid.processing.buffer.sizeBytes=10000000
-#emitting, opaque marker
-druid.service=example
-druid.request.logging.dir=/tmp/example/log
-druid.realtime.specFile=realtime.spec
-com.metamx.emitter.logging=true
-com.metamx.emitter.logging.level=debug
-# below are dummy values when operating a realtime only node
-druid.processing.numThreads=3
-com.metamx.aws.accessKey=dummy_access_key
-com.metamx.aws.secretKey=dummy_secret_key
-druid.storage.s3.bucket=dummy_s3_bucket
+druid.host=localhost
+druid.service=broker
+druid.port=8080
 druid.zk.service.host=localhost
-druid.server.maxSize=300000000000
-druid.zk.paths.base=/druid
-druid.database.segmentTable=prod_segments
-druid.database.user=druid
-druid.database.password=diurd
-druid.database.connectURI=jdbc:mysql://localhost:3306/druid
-druid.zk.paths.discoveryPath=/druid/discoveryPath
-druid.database.ruleTable=rules
-druid.database.configTable=config
-# Path on local FS for storage of segments; dir will be created if needed
-druid.paths.indexCache=/tmp/druid/indexCache
-# Path on local FS for storage of segment metadata; dir will be created if needed
-druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
-druid.storage.local.storageDirectory=/tmp/druid/localStorage
-druid.storage.local=true
-# thread pool size for servicing queries
-druid.client.http.connections=30
 ```
 2. Run the broker node:
 ```bash
-java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-  -Ddruid.realtime.specFile=realtime.spec \
-  -classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/broker \
-  com.metamx.druid.http.BrokerMain
+java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/broker io.druid.cli.Main server broker
 ```
-## Booting a Coordinator Node ##
-1. Setup a config file at config/coordinator/runtime.properties that looks like this: [https://gist.github.com/rjurney/5818870](https://gist.github.com/rjurney/5818870)
-2. Run the coordinator node:
-```bash
-java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-  -classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/coordinator \
-  io.druid.cli.Main server coordinator
-```
-## Booting a Realtime Node ##
-1. Setup a config file at config/realtime/runtime.properties that looks like this: [https://gist.github.com/rjurney/5818774](https://gist.github.com/rjurney/5818774)
-2. Setup a realtime.spec file like this: [https://gist.github.com/rjurney/5818779](https://gist.github.com/rjurney/5818779)
-3. Run the realtime node:
-```bash
-java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-  -Ddruid.realtime.specFile=realtime.spec \
-  -classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/realtime \
-  com.metamx.druid.realtime.RealtimeMain
-```
-## Booting a historical node ##
-1. Setup a config file at config/historical/runtime.properties that looks like this: [https://gist.github.com/rjurney/5818885](https://gist.github.com/rjurney/5818885)
-2. Run the historical node:
-```bash
-java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-  -classpath services/target/druid-services-0.5.50-SNAPSHOT-selfcontained.jar:config/historical \
-  io.druid.cli.Main server historical
-```
+With the Broker node and the other Druid nodes types up and running, you have a fully functional Druid Cluster and are ready to query your data!
 # Querying Your Data #
@@ -109,7 +32,7 @@ Now that we have a complete cluster setup on localhost, we need to load data. To
 ## Querying Different Nodes ##
-As a shared-nothing system, there are three ways to query druid, against the [Realtime](Realtime.html), [Historical](Historical.html) or [Broker](Broker.html) node. Querying a Realtime node returns only realtime data, querying a historical node returns only historical segments. Querying the broker will query both realtime and historical segments and compose an overall result for the query. This is the normal mode of operation for queries in druid.
+As a shared-nothing system, there are three ways to query druid, against the [Realtime](Realtime.html), [Historical](Historical.html) or [Broker](Broker.html) node. Querying a Realtime node returns only realtime data, querying a historical node returns only historical segments. Querying the broker may query both realtime and historical segments and compose an overall result for the query. This is the normal mode of operation for queries in Druid.
 ### Construct a Query ###
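Realtime, historical and broker nodes all answer queries over the same HTTP interface, so moving between them changes only the host and port. The Python sketch below prepares such a request with the standard library. It assumes the `/druid/v2/?pretty` query endpoint, and it uses a `timeBoundary` query on the `druidtest` datasource as a stand-in for the query this page builds. `build_query_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_query_request(host, port, query):
    """Prepare, without sending, a POST of a JSON query to a Druid node.

    The same request works against a realtime, historical or broker node;
    only the port changes (the walkthrough queries the historical node on
    8082 and the broker on 8083).
    """
    return urllib.request.Request(
        f"http://{host}:{port}/druid/v2/?pretty",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example query: timeBoundary returns the earliest and latest timestamps of a datasource.
TIME_BOUNDARY = {"queryType": "timeBoundary", "dataSource": "druidtest"}
```

`urllib.request.urlopen(build_query_request("localhost", 8083, TIME_BOUNDARY))` would then send it to the broker, much like POSTing the query body with `curl`.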
@@ -148,7 +71,7 @@ See our result:
 } ]
 ```
-### Querying the historical node ###
+### Querying the Historical node ###
 Run the query against port 8082:
 ```bash
@@ -165,7 +88,7 @@ And get (similar to):
 } ]
 ```
-### Querying both Nodes via the Broker ###
+### Querying the Broker ###
 Run the query against port 8083:
 ```bash
@ -184,39 +107,72 @@ And get:
Now that we know what nodes can be queried (although you should usually use the broker node), lets learn how to know what queries are available. Now that we know what nodes can be queried (although you should usually use the broker node), lets learn how to know what queries are available.
## Querying Against the realtime.spec ## ## Examining the realtime.spec ##
How are we to know what queries we can run? Although [Querying](Querying.html) is a helpful index, to get a handle on querying our data we need to look at our [Realtime](Realtime.html) node's realtime.spec file: How are we to know what queries we can run? Although [Querying](Querying.html) is a helpful index, to get a handle on querying our data we need to look at our [Realtime](Realtime.html) node's realtime.spec file:
```json ```json
[{ [
"schema" : { "dataSource":"druidtest", {
"aggregators":[ {"type":"count", "name":"impressions"}, "schema": {
{"type":"doubleSum","name":"wp","fieldName":"wp"}], "dataSource": "druidtest",
"indexGranularity":"minute", "aggregators": [
"shardSpec" : { "type": "none" } }, {
"config" : { "maxRowsInMemory" : 500000, "type": "count",
"intermediatePersistPeriod" : "PT10m" }, "name": "impressions"
"firehose" : { "type" : "kafka-0.7.2", },
"consumerProps" : { "zk.connect" : "localhost:2181", {
"zk.connectiontimeout.ms" : "15000", "type": "doubleSum",
"zk.sessiontimeout.ms" : "15000", "name": "wp",
"zk.synctime.ms" : "5000", "fieldName": "wp"
"groupid" : "topic-pixel-local", }
"fetch.size" : "1048586", ],
"autooffset.reset" : "largest", "indexGranularity": "minute",
"autocommit.enable" : "false" }, "shardSpec": {
"feed" : "druidtest", "type": "none"
"parser" : { "timestampSpec" : { "column" : "utcdt", "format" : "iso" }, }
"data" : { "format" : "json" }, },
"dimensionExclusions" : ["wp"] } }, "config": {
"plumber" : { "type" : "realtime", "maxRowsInMemory": 500000,
"windowPeriod" : "PT10m", "intermediatePersistPeriod": "PT10m"
"segmentGranularity":"hour", },
"basePersistDirectory" : "/tmp/realtime/basePersist", "firehose": {
"rejectionPolicy": {"type": "messageTime"} } "type": "kafka-0.7.2",
"consumerProps": {
}] "zk.connect": "localhost:2181",
"zk.connectiontimeout.ms": "15000",
"zk.sessiontimeout.ms": "15000",
"zk.synctime.ms": "5000",
"groupid": "topic-pixel-local",
"fetch.size": "1048586",
"autooffset.reset": "largest",
"autocommit.enable": "false"
},
"feed": "druidtest",
"parser": {
"timestampSpec": {
"column": "utcdt",
"format": "iso"
},
"data": {
"format": "json"
},
"dimensionExclusions": [
"wp"
]
}
},
"plumber": {
"type": "realtime",
"windowPeriod": "PT10m",
"segmentGranularity": "hour",
"basePersistDirectory": "\/tmp\/realtime\/basePersist",
"rejectionPolicy": {
"type": "messageTime"
}
}
}
]
``` ```
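Two of the plumber settings above interact: `segmentGranularity` decides which segment an event lands in, and `windowPeriod` bounds how late an event may arrive. The following is a simplified model, not Druid's plumber code. It assumes the `messageTime` rejection policy accepts an event only if its timestamp is no more than `windowPeriod` older than the newest event timestamp seen so far; `HourlyWindowModel` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical model of two realtime.spec plumber settings:
//   "segmentGranularity": "hour" -> each event is bucketed into an hourly segment
//   "windowPeriod" + "messageTime" -> assumed to reject events older than
//   (newest event timestamp seen so far - windowPeriod)
class HourlyWindowModel {
    private final Duration windowPeriod;
    private Instant maxSeen = Instant.MIN;

    HourlyWindowModel(Duration windowPeriod) {
        this.windowPeriod = windowPeriod;
    }

    // Start of the hourly segment that would hold an event with this timestamp.
    static Instant segmentStart(Instant eventTime) {
        return eventTime.truncatedTo(ChronoUnit.HOURS);
    }

    // Tracks the newest timestamp, then checks the event against the window.
    // maxSeen is updated first, so minus() never runs on Instant.MIN.
    boolean accept(Instant eventTime) {
        if (eventTime.isAfter(maxSeen)) {
            maxSeen = eventTime;
        }
        return !eventTime.isBefore(maxSeen.minus(windowPeriod));
    }
}
```

With `PT10m`, an event stamped 12:25 is still accepted after a 12:30 event has been seen, while one stamped 12:15 is rejected; both would belong to the hourly segment starting at 12:00.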
### dataSource ### ### dataSource ###
@ -330,7 +286,7 @@ Which gets us just people aged 40:
} ] } ]
``` ```
Check out [Filters](Filters.html) for more. Check out [Filters](Filters.html) for more information.
## Learn More ## ## Learn More ##

View File

@ -28,32 +28,64 @@ Realtime nodes take a mix of base server configuration and spec files that descr
The property `druid.realtime.specFile` has the path of a file (absolute or relative path and file name) with realtime specifications in it. This "specFile" should be a JSON Array of JSON objects like the following: The property `druid.realtime.specFile` has the path of a file (absolute or relative path and file name) with realtime specifications in it. This "specFile" should be a JSON Array of JSON objects like the following:
```json ```json
[{ [
"schema" : { "dataSource":"dataSourceName", {
"aggregators":[ {"type":"count", "name":"events"}, "schema": {
{"type":"doubleSum","name":"outColumn","fieldName":"inColumn"} ], "dataSource": "dataSourceName",
"indexGranularity":"minute", "aggregators": [
"shardSpec" : { "type": "none" } }, {
"config" : { "maxRowsInMemory" : 500000, "type": "count",
"intermediatePersistPeriod" : "PT10m" }, "name": "events"
"firehose" : { "type" : "kafka-0.7.2", },
"consumerProps" : { "zk.connect" : "zk_connect_string", {
"zk.connectiontimeout.ms" : "15000", "type": "doubleSum",
"zk.sessiontimeout.ms" : "15000", "name": "outColumn",
"zk.synctime.ms" : "5000", "fieldName": "inColumn"
"groupid" : "consumer-group", }
"fetch.size" : "1048586", ],
"autooffset.reset" : "largest", "indexGranularity": "minute",
"autocommit.enable" : "false" }, "shardSpec": {
"feed" : "your_kafka_topic", "type": "none"
"parser" : { "timestampSpec" : { "column" : "timestamp", "format" : "iso" }, }
"data" : { "format" : "json" }, },
"dimensionExclusions" : ["value"] } }, "config": {
"plumber" : { "type" : "realtime", "maxRowsInMemory": 500000,
"windowPeriod" : "PT10m", "intermediatePersistPeriod": "PT10m"
"segmentGranularity":"hour", },
"basePersistDirectory" : "/tmp/realtime/basePersist" } "firehose": {
}] "type": "kafka-0.7.2",
"consumerProps": {
"zk.connect": "zk_connect_string",
"zk.connectiontimeout.ms": "15000",
"zk.sessiontimeout.ms": "15000",
"zk.synctime.ms": "5000",
"groupid": "consumer-group",
"fetch.size": "1048586",
"autooffset.reset": "largest",
"autocommit.enable": "false"
},
"feed": "your_kafka_topic",
"parser": {
"timestampSpec": {
"column": "timestamp",
"format": "iso"
},
"data": {
"format": "json"
},
"dimensionExclusions": [
"value"
]
}
},
"plumber": {
"type": "realtime",
"windowPeriod": "PT10m",
"segmentGranularity": "hour",
"basePersistDirectory": "\/tmp\/realtime\/basePersist"
}
}
]
``` ```
This is a JSON Array, so you can give more than one realtime stream to a given node. The number you can put in the same process depends on the exact configuration. In general, it is best to think of each realtime stream handler as requiring two threads: one thread for data consumption and aggregation, and one thread for incremental persists and other background tasks. This is a JSON Array, so you can give more than one realtime stream to a given node. The number you can put in the same process depends on the exact configuration. In general, it is best to think of each realtime stream handler as requiring two threads: one thread for data consumption and aggregation, and one thread for incremental persists and other background tasks.
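The two-threads-per-stream rule of thumb turns into simple capacity arithmetic. This is a back-of-the-envelope sketch: `RealtimeThreadBudget` is hypothetical, and how many threads a host can actually spare for realtime streams is an assumption you must supply.

```java
// Hypothetical capacity helper for the "about two threads per realtime stream" rule of thumb.
class RealtimeThreadBudget {
    // One thread for consumption/aggregation, one for persists and background tasks.
    static final int THREADS_PER_STREAM = 2;

    // Threads needed for a spec file containing streamCount entries.
    static int threadsNeeded(int streamCount) {
        return streamCount * THREADS_PER_STREAM;
    }

    // Whole streams that fit in a given thread budget (integer division rounds down).
    static int maxStreams(int availableThreads) {
        return availableThreads / THREADS_PER_STREAM;
    }
}
```

For example, a spec array with three entries needs about six threads, and a budget of seven threads fits three streams.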
@ -116,43 +148,9 @@ Extending the code
Realtime integration is intended to be extended in two ways: Realtime integration is intended to be extended in two ways:
1. Connect to data streams from varied systems ([Firehose](https://github.com/metamx/druid/blob/druid-0.5.x/realtime/src/main/java/com/metamx/druid/realtime/firehose/FirehoseFactory.java)) 1. Connect to data streams from varied systems ([Firehose](https://github.com/metamx/druid/blob/druid-0.6.0/realtime/src/main/java/com/metamx/druid/realtime/firehose/FirehoseFactory.java))
2. Adjust the publishing strategy to match your needs ([Plumber](https://github.com/metamx/druid/blob/druid-0.5.x/realtime/src/main/java/com/metamx/druid/realtime/plumber/PlumberSchool.java)) 2. Adjust the publishing strategy to match your needs ([Plumber](https://github.com/metamx/druid/blob/druid-0.6.0/realtime/src/main/java/com/metamx/druid/realtime/plumber/PlumberSchool.java))
The expectations are that the former will be very common and something that users of Druid will do on a fairly regular basis. Most users will probably never have to deal with the latter form of customization. Indeed, we hope that all potential use cases can be packaged up as part of Druid proper without requiring proprietary customization. The expectations are that the former will be very common and something that users of Druid will do on a fairly regular basis. Most users will probably never have to deal with the latter form of customization. Indeed, we hope that all potential use cases can be packaged up as part of Druid proper without requiring proprietary customization.
Given those expectations, adding a firehose is straightforward and completely encapsulated inside of the interface. Adding a plumber is more involved and requires understanding of how the system works to get right; it's not impossible, but it's not intended that individuals new to Druid will be able to do it immediately. Given those expectations, adding a firehose is straightforward and completely encapsulated inside of the interface. Adding a plumber is more involved and requires understanding of how the system works to get right; it's not impossible, but it's not intended that individuals new to Druid will be able to do it immediately.
We will do our best to accept contributions from the community of new Firehoses and Plumbers, but we also understand the requirement for being able to plug in your own proprietary implementations. The model for doing this is by embedding the druid code in another project and writing your own `main()` method that initializes a RealtimeNode object and registers your proprietary objects with it.
```java
public class MyRealtimeMain
{
private static final Logger log = new Logger(MyRealtimeMain.class);
public static void main(String[] args) throws Exception
{
LogLevelAdjuster.register();
Lifecycle lifecycle = new Lifecycle();
lifecycle.addManagedInstance(
RealtimeNode.builder()
.build()
.registerJacksonSubtype(foo.bar.MyFirehose.class)
);
try {
lifecycle.start();
}
catch (Throwable t) {
log.info(t, "Throwable caught at startup, committing seppuku");
System.exit(2);
}
lifecycle.join();
}
}
```
Pluggable pieces of the system are either handled by a setter on the RealtimeNode object, or they are configuration driven and need to be setup to allow for [Jackson polymorphic deserialization](http://wiki.fasterxml.com/JacksonPolymorphicDeserialization) and registered via the relevant methods on the RealtimeNode object.

View File

@ -1,7 +1,7 @@
--- ---
layout: doc_page layout: doc_page
--- ---
Numerous backend engineers at [Metamarkets](http://www.metamarkets.com) work on Druid full-time. If you have any questions about usage or code, feel free to contact any of us. Numerous backend engineers at [Metamarkets](http://www.metamarkets.com) and other companies work on Druid full-time. If you have any questions about usage or code, feel free to contact any of us.
Google Groups Mailing List Google Groups Mailing List
-------------------------- --------------------------

View File

@ -47,7 +47,7 @@ There are two ways to setup Druid: download a tarball, or [Build From Source](Bu
### Download a Tarball ### Download a Tarball
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.5.54-bin.tar.gz) We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.0-bin.tar.gz)
Download this file to a directory of your choosing. Download this file to a directory of your choosing.
You can extract the awesomeness within by issuing: You can extract the awesomeness within by issuing:
@ -59,7 +59,7 @@ tar -zxvf druid-services-*-bin.tar.gz
Not too lost so far right? That's great! If you cd into the directory: Not too lost so far right? That's great! If you cd into the directory:
``` ```
cd druid-services-0.5.54 cd druid-services-0.6.0
``` ```
You should see a bunch of files: You should see a bunch of files:
@ -82,10 +82,12 @@ Select "wikipedia".
Once the node starts up you will see a bunch of logs about setting up properties and connecting to the data source. If everything was successful, you should see messages of the form shown below. Once the node starts up you will see a bunch of logs about setting up properties and connecting to the data source. If everything was successful, you should see messages of the form shown below.
``` ```
2013-07-19 21:54:05,154 INFO [main] com.metamx.druid.realtime.RealtimeNode - Starting Jetty 2013-09-04 19:33:11,922 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8083
2013-07-19 21:54:05,154 INFO [main] org.mortbay.log - jetty-6.1.x 2013-09-04 19:33:11,946 INFO [ApiDaemon] io.druid.segment.realtime.firehose.IrcFirehoseFactory - irc connection to server [irc.wikimedia.org] established
2013-07-19 21:54:05,171 INFO [chief-wikipedia] com.metamx.druid.realtime.plumber.RealtimePlumberSchool - Expect to run at [2013-07-19T22:03:00.000Z] 2013-09-04 19:33:11,946 INFO [ApiDaemon] io.druid.segment.realtime.firehose.IrcFirehoseFactory - Joining channel #en.wikipedia
2013-07-19 21:54:05,246 INFO [main] org.mortbay.log - Started SelectChannelConnector@0.0.0.0:8083 2013-09-04 19:33:11,946 INFO [ApiDaemon] io.druid.segment.realtime.firehose.IrcFirehoseFactory - Joining channel #fr.wikipedia
2013-09-04 19:33:11,946 INFO [ApiDaemon] io.druid.segment.realtime.firehose.IrcFirehoseFactory - Joining channel #de.wikipedia
2013-09-04 19:33:11,946 INFO [ApiDaemon] io.druid.segment.realtime.firehose.IrcFirehoseFactory - Joining channel #ja.wikipedia
``` ```
The Druid realtime node ingests events into an in-memory buffer. Periodically, these events are persisted to disk. If you are interested in the details of our real-time architecture and why we persist indexes to disk, I suggest you read our [White Paper](http://static.druid.io/docs/druid.pdf). The Druid realtime node ingests events into an in-memory buffer. Periodically, these events are persisted to disk. If you are interested in the details of our real-time architecture and why we persist indexes to disk, I suggest you read our [White Paper](http://static.druid.io/docs/druid.pdf).
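The buffer-then-persist cycle can be modelled in a few lines. This sketch is hypothetical and far simpler than Druid's realtime plumber: it borrows the `maxRowsInMemory` and `intermediatePersistPeriod` names from the realtime spec and assumes a persist is triggered when either the row limit is reached or the period has elapsed since the last persist.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: rows accumulate in memory and are "persisted" (copied into
// a list of batches) when the row limit is hit or the persist period has elapsed.
class InMemoryBuffer {
    private final int maxRowsInMemory;
    private final Duration intermediatePersistPeriod;
    private final List<String> rows = new ArrayList<>();
    private final List<List<String>> persisted = new ArrayList<>();
    private Instant lastPersist;

    InMemoryBuffer(int maxRowsInMemory, Duration intermediatePersistPeriod, Instant start) {
        this.maxRowsInMemory = maxRowsInMemory;
        this.intermediatePersistPeriod = intermediatePersistPeriod;
        this.lastPersist = start;
    }

    // Adds one row, then persists if either trigger condition holds at time `now`.
    void add(String row, Instant now) {
        rows.add(row);
        boolean full = rows.size() >= maxRowsInMemory;
        boolean due = !now.isBefore(lastPersist.plus(intermediatePersistPeriod));
        if (full || due) {
            persist(now);
        }
    }

    // Stands in for writing an index to disk: moves buffered rows into a persisted batch.
    void persist(Instant now) {
        if (!rows.isEmpty()) {
            persisted.add(new ArrayList<>(rows));
            rows.clear();
        }
        lastPersist = now;
    }

    int rowsInMemory() { return rows.size(); }

    int persistCount() { return persisted.size(); }
}
```

The example realtime specs in these docs use `500000` for `maxRowsInMemory` and `PT10m` for `intermediatePersistPeriod`; the tiny values in a test run just make both triggers easy to observe.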

View File

@ -1,7 +1,7 @@
--- ---
layout: doc_page layout: doc_page
--- ---
Welcome back! In our first [tutorial](https://github.com/metamx/druid/wiki/Tutorial%3A-A-First-Look-at-Druid), we introduced you to the most basic Druid setup: a single realtime node. We streamed in some data and queried it. Realtime nodes collect very recent data and periodically hand that data off to the rest of the Druid cluster. Some questions about the architecture must naturally come to mind. What does the rest of the Druid cluster look like? How does Druid load available static data? Welcome back! In our first [tutorial](Tutorial:-A-First-Look-at-Druid), we introduced you to the most basic Druid setup: a single realtime node. We streamed in some data and queried it. Realtime nodes collect very recent data and periodically hand that data off to the rest of the Druid cluster. Some questions about the architecture must naturally come to mind. What does the rest of the Druid cluster look like? How does Druid load available static data?
This tutorial will hopefully answer these questions! This tutorial will hopefully answer these questions!
@ -11,7 +11,7 @@ In this tutorial, we will set up other types of Druid nodes as well as and exter
If you followed the first tutorial, you should already have Druid downloaded. If not, let's go back and do that first. If you followed the first tutorial, you should already have Druid downloaded. If not, let's go back and do that first.
You can download the latest version of Druid [here](http://static.druid.io/artifacts/releases/druid-services-0.5.54-bin.tar.gz) You can download the latest version of Druid [here](http://static.druid.io/artifacts/releases/druid-services-0.6.0-bin.tar.gz)
and untar the contents within by issuing: and untar the contents within by issuing:
@ -26,7 +26,7 @@ You can also [Build From Source](Build-From-Source.html).
Druid requires three external dependencies: a "deep" storage that acts as a backup data repository, a relational database such as MySQL to hold configuration and metadata information, and [Apache Zookeeper](http://zookeeper.apache.org/) for coordination among different pieces of the cluster. Druid requires three external dependencies: a "deep" storage that acts as a backup data repository, a relational database such as MySQL to hold configuration and metadata information, and [Apache Zookeeper](http://zookeeper.apache.org/) for coordination among different pieces of the cluster.
For deep storage, we have made a public S3 bucket (static.druid.io) available where data for this particular tutorial can be downloaded. More on the data [later](https://github.com/metamx/druid/wiki/Tutorial-Part-2#the-data). For deep storage, we have made a public S3 bucket (static.druid.io) available where data for this particular tutorial can be downloaded. More on the data [later](Tutorial-Part-2.html#the-data).
### Setting up MySQL ### ### Setting up MySQL ###
@ -56,7 +56,7 @@ cd ..
## The Data ## ## The Data ##
Similar to the first tutorial, the data we will be loading is based on edits that have occurred on Wikipedia. Every time someone edits a page in Wikipedia, metadata is generated about the editor and edited page. Druid collects each individual event and packages them together in a container known as a [segment](https://github.com/metamx/druid/wiki/Segments). Segments contain data over some span of time. We've prebuilt a segment for this tutorial and will cover making your own segments in other [pages](https://github.com/metamx/druid/wiki/Loading-Your-Data). The segment we are going to work with has the following format: Similar to the first tutorial, the data we will be loading is based on edits that have occurred on Wikipedia. Every time someone edits a page in Wikipedia, metadata is generated about the editor and edited page. Druid collects each individual event and packages them together in a container known as a [segment](https://github.com/metamx/druid/wiki/Segments). Segments contain data over some span of time. We've prebuilt a segment for this tutorial and will cover making your own segments in other [pages](Loading-Your-Data.html). The segment we are going to work with has the following format:
Dimensions (things to filter on): Dimensions (things to filter on):
@ -92,11 +92,12 @@ Let's start up a few nodes and download our data. First things though, let's cre
mkdir config mkdir config
``` ```
If you are interested in learning more about Druid configuration files, check out this [link](https://github.com/metamx/druid/wiki/Configuration). Many aspects of Druid are customizable. For the purposes of this tutorial, we are going to use default values for most things. If you are interested in learning more about Druid configuration files, check out this [link](Configuration.html). Many aspects of Druid are customizable. For the purposes of this tutorial, we are going to use default values for most things.
### Start a Coordinator Node ### ### Start a Coordinator Node ###
Coordinator nodes are in charge of load assignment and distribution. Coordinator nodes monitor the status of the cluster and command historical nodes to assign and drop segments. Coordinator nodes are in charge of load assignment and distribution. Coordinator nodes monitor the status of the cluster and command historical nodes to assign and drop segments.
For more information about coordinator nodes, see [here](Coordinator.html).
To create the coordinator config file: To create the coordinator config file:
@ -104,36 +105,23 @@ To create the coordinator config file:
mkdir config/coordinator mkdir config/coordinator
``` ```
Under the directory we just created, create the file `runtime.properties` with the following contents: Under the directory we just created, create the file `runtime.properties` with the following contents if it does not exist:
``` ```
druid.host=127.0.0.1:8082 druid.host=localhost
druid.port=8082
druid.service=coordinator druid.service=coordinator
druid.port=8082
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# aws (demo user) druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
com.metamx.aws.accessKey=AKIAIMKECRUYKDQGR6YQ druid.s3.secretKey=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
com.metamx.aws.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
# db druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.database.segmentTable=segments druid.db.connector.user=druid
druid.database.user=druid druid.db.connector.password=diurd
druid.database.password=diurd
druid.database.connectURI=jdbc:mysql://localhost:3306/druid
druid.database.ruleTable=rules
druid.database.configTable=config
# coordinator runtime configs druid.coordinator.startDelay=PT60s
druid.coordinator.startDelay=PT60S
``` ```
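To see how this file is interpreted, it can help to load it with `java.util.Properties`: `\:` is a standard properties escape, so the JDBC URI loads as `jdbc:mysql://localhost:3306/druid`, and `druid.coordinator.startDelay` is an ISO-8601 duration. The `CoordinatorProps` class is hypothetical and parses the delay with `java.time.Duration`, which may differ from the time library Druid uses internally; its `PT300s` fallback mirrors the new `@Default` on `DruidCoordinatorConfig` in this commit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper for inspecting a coordinator runtime.properties file.
class CoordinatorProps {
    // Loads properties text; "\:" escapes are decoded to ":" by Properties.load.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads druid.coordinator.startDelay, falling back to PT300s when unset.
    // Duration.parse is case-insensitive, so the lowercase "s" in "PT60s" is accepted.
    static Duration startDelay(Properties props) {
        return Duration.parse(props.getProperty("druid.coordinator.startDelay", "PT300s"));
    }
}
```

Read this way, the tutorial's `PT60s` means the coordinator waits 60 seconds before starting, versus five minutes if the property is left out.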
To start the coordinator node: To start the coordinator node:
@ -144,7 +132,8 @@ java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/
### Start a historical node ### ### Start a historical node ###
historical nodes are the workhorses of a cluster and are in charge of loading historical segments and making them available for queries. Our Wikipedia segment will be downloaded by a historical node. Historical nodes are the workhorses of a cluster and are in charge of loading historical segments and making them available for queries. Our Wikipedia segment will be downloaded by a historical node.
For more information about Historical nodes, see [here](Historical.html).
To create the historical config file: To create the historical config file:
@ -155,34 +144,21 @@ mkdir config/historical
Under the directory we just created, create the file `runtime.properties` with the following contents: Under the directory we just created, create the file `runtime.properties` with the following contents:
``` ```
druid.host=127.0.0.1:8081 druid.host=localhost
druid.port=8081
druid.service=historical druid.service=historical
druid.port=8081
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# processing druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
druid.server.maxSize=100000000
druid.processing.buffer.sizeBytes=10000000 druid.processing.buffer.sizeBytes=10000000
# aws (demo user) druid.segmentCache.infoPath=/tmp/druid/segmentInfoCache
com.metamx.aws.accessKey=AKIAIMKECRUYKDQGR6YQ druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize"\: 100000000}]
com.metamx.aws.secretKey=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
# Path on local FS for storage of segments; dir will be created if needed
druid.paths.indexCache=/tmp/druid/indexCache
# Path on local FS for storage of segment metadata; dir will be created if needed
druid.paths.segmentInfoCache=/tmp/druid/segmentInfoCache
# server
druid.server.maxSize=100000000
``` ```
To start the historical node: To start the historical node:
@ -194,6 +170,7 @@ java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/
### Start a Broker Node ### ### Start a Broker Node ###
Broker nodes are responsible for figuring out which historical and/or realtime nodes correspond to which queries. They also merge partial results from these nodes in a scatter/gather fashion. Broker nodes are responsible for figuring out which historical and/or realtime nodes correspond to which queries. They also merge partial results from these nodes in a scatter/gather fashion.
For more information about Broker nodes, see [here](Broker.html).
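The gather step can be pictured as combining per-node partial results. The sketch below is a toy, not Druid's broker: it assumes each node returns a map from a time bucket to a summed metric (as a `count` or `doubleSum` aggregator would produce) and merges them by adding values that share a bucket. `BrokerMerge` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical scatter/gather merge for sum-style aggregators only.
class BrokerMerge {
    // Combines partial results from several nodes; values in the same bucket are added.
    // TreeMap keeps buckets sorted, which for ISO-8601 strings is chronological order.
    static Map<String, Double> merge(List<Map<String, Double>> partials) {
        Map<String, Double> merged = new TreeMap<>();
        for (Map<String, Double> partial : partials) {
            partial.forEach((bucket, value) -> merged.merge(bucket, value, Double::sum));
        }
        return merged;
    }
}
```

Other aggregators need other combine functions (a max, for example, would use `Double::max`), so a real merge depends on the query's aggregators.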
To create the broker config file: To create the broker config file:
@ -204,27 +181,17 @@ mkdir config/broker
Under the directory we just created, create the file `runtime.properties` with the following contents: Under the directory we just created, create the file `runtime.properties` with the following contents:
``` ```
druid.host=127.0.0.1:8080 druid.host=localhost
druid.port=8080
druid.service=broker druid.service=broker
druid.port=8080
# logging
com.metamx.emitter.logging=true
com.metamx.emitter.logging.level=info
# zk
druid.zk.service.host=localhost druid.zk.service.host=localhost
druid.zk.paths.base=/druid
druid.zk.paths.discoveryPath=/druid/discoveryPath
# thread pool size for servicing queries
druid.client.http.connections=10
``` ```
To start the broker node: To start the broker node:
```bash ```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/broker com.metamx.druid.http.BrokerMain java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/broker io.druid.cli.Main server broker
``` ```
## Loading the Data ## ## Loading the Data ##
@ -251,9 +218,9 @@ When the segment completes downloading and ready for queries, you should see the
2013-08-08 22:48:41,959 INFO [ZkCoordinator-0] com.metamx.druid.coordination.BatchDataSegmentAnnouncer - Announcing segment[wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z] at path[/druid/segments/127.0.0.1:8081/2013-08-08T22:48:41.959Z] 2013-08-08 22:48:41,959 INFO [ZkCoordinator-0] com.metamx.druid.coordination.BatchDataSegmentAnnouncer - Announcing segment[wikipedia_2013-08-01T00:00:00.000Z_2013-08-02T00:00:00.000Z_2013-08-08T21:22:48.989Z] at path[/druid/segments/127.0.0.1:8081/2013-08-08T22:48:41.959Z]
``` ```
At this point, we can query the segment. For more information on querying, see this [link](https://github.com/metamx/druid/wiki/Querying). At this point, we can query the segment. For more information on querying, see this [link](Querying.html).
## Next Steps ## ## Next Steps ##
Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data? Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data?
Check out the [Loading Your Own Data](https://github.com/metamx/druid/wiki/Loading-Your-Data) section for more info! Check out the [Loading Your Own Data](Loading-Your-Data.html) section for more info!

View File

@ -37,7 +37,7 @@ There are two ways to setup Druid: download a tarball, or [Build From Source](Bu
h3. Download a Tarball h3. Download a Tarball
We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.5.50-bin.tar.gz) We've built a tarball that contains everything you'll need. You'll find it [here](http://static.druid.io/artifacts/releases/druid-services-0.6.0-bin.tar.gz)
Download this file to a directory of your choosing. Download this file to a directory of your choosing.
You can extract the awesomeness within by issuing: You can extract the awesomeness within by issuing:
@ -48,7 +48,7 @@ tar zxvf druid-services-*-bin.tar.gz
Not too lost so far right? That's great! If you cd into the directory: Not too lost so far right? That's great! If you cd into the directory:
``` ```
cd druid-services-0.5.50 cd druid-services-0.6.0
``` ```
You should see a bunch of files: You should see a bunch of files:
@ -68,9 +68,8 @@ Select "webstream".
Once the node starts up you will see a bunch of logs about setting up properties and connecting to the data source. If everything was successful, you should see messages of the form shown below. Once the node starts up you will see a bunch of logs about setting up properties and connecting to the data source. If everything was successful, you should see messages of the form shown below.
``` ```
2013-07-19 21:54:05,154 INFO com.metamx.druid.realtime.RealtimeNode~~ Starting Jetty Jul 19, 2013 21:54:05 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
2013-07-19 21:54:05,154 INFO org.mortbay.log - jetty-6.1.x INFO: Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2013-07-19 21:54:05,171 INFO com.metamx.druid.realtime.plumber.RealtimePlumberSchool - Expect to run at
2013-07-19 21:54:05,246 INFO org.mortbay.log - Started SelectChannelConnector@0.0.0.0:8083 2013-07-19 21:54:05,246 INFO org.mortbay.log - Started SelectChannelConnector@0.0.0.0:8083
``` ```

View File

@ -9,16 +9,16 @@ There are two ways to setup Druid: download a tarball, or build it from source.
h3. Download a Tarball h3. Download a Tarball
We've built a tarball that contains everything you'll need. You'll find it "here":http://static.druid.io/data/examples/druid-services-0.4.6.tar.gz. We've built a tarball that contains everything you'll need. You'll find it "here":http://static.druid.io/data/examples/druid-services-0.6.0.tar.gz.
Download this bad boy to a directory of your choosing. Download this bad boy to a directory of your choosing.
You can extract the awesomeness within by issuing: You can extract the awesomeness within by issuing:
pre. tar -zxvf druid-services-0.4.6.tar.gz pre. tar -zxvf druid-services-0.6.0.tar.gz
Not too lost so far right? That's great! If you cd into the directory: Not too lost so far right? That's great! If you cd into the directory:
pre. cd druid-services-0.4.6-SNAPSHOT pre. cd druid-services-0.6.0-SNAPSHOT
You should see a bunch of files: You should see a bunch of files:
* run_example_server.sh * run_example_server.sh
@ -31,7 +31,7 @@ The other way to setup Druid is from source via git. To do so, run these command
<pre><code>git clone git@github.com:metamx/druid.git <pre><code>git clone git@github.com:metamx/druid.git
cd druid cd druid
git checkout druid-0.4.32-branch git checkout druid-0.6.0
./build.sh ./build.sh
</code></pre> </code></pre>

View File

@ -9,4 +9,6 @@ druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid druid.db.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.db.connector.user=druid druid.db.connector.user=druid
druid.db.connector.password=diurd druid.db.connector.password=diurd
druid.coordinator.startDelay=PT60s

View File

@ -31,7 +31,7 @@ public abstract class DruidCoordinatorConfig
public abstract String getHost(); public abstract String getHost();
@Config("druid.coordinator.startDelay") @Config("druid.coordinator.startDelay")
@Default("PT120s") @Default("PT300s")
public abstract Duration getCoordinatorStartDelay(); public abstract Duration getCoordinatorStartDelay();
@Config("druid.coordinator.period") @Config("druid.coordinator.period")