---
layout: doc_page
---

# Tutorial: Load streaming data from Kafka

## Getting started

This tutorial demonstrates how to load data from a Kafka stream using the Druid Kafka indexing service.

For this tutorial, we'll assume you've already downloaded Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine. You
don't need to have loaded any data yet.

## Download and start Kafka

[Apache Kafka](http://kafka.apache.org/) is a high-throughput message bus that works well with
Druid. For this tutorial, we will use Kafka 0.10.2.0. To download Kafka, issue the following
commands in your terminal:

```bash
curl -O https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
tar -xzf kafka_2.11-0.10.2.0.tgz
cd kafka_2.11-0.10.2.0
```
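
If you want to re-run the tutorial without downloading the archive twice, or try another release, the commands above can be wrapped in a small shell function. This is a minimal sketch, not something shipped with Kafka or Druid: `kafka_archive_url` and `fetch_kafka` are hypothetical names, and the function assumes other releases follow the same `kafka/<version>/kafka_<scala version>-<version>.tgz` archive layout as the URL above.

```bash
# Hypothetical helpers that parameterize the download commands above.

# Print the Apache archive URL for a Scala/Kafka version pair, following
# the same layout as the URL used in this tutorial.
kafka_archive_url() {
  local scala_version=$1 kafka_version=$2
  printf 'https://archive.apache.org/dist/kafka/%s/kafka_%s-%s.tgz\n' \
    "$kafka_version" "$scala_version" "$kafka_version"
}

# Download and unpack Kafka into the current directory, skipping any step
# whose result already exists, then print the extracted directory name.
fetch_kafka() {
  local scala_version=${1:-2.11} kafka_version=${2:-0.10.2.0}
  local dir="kafka_${scala_version}-${kafka_version}"
  if [ ! -f "$dir.tgz" ]; then
    # -f makes curl fail on HTTP errors instead of saving an error page.
    curl -fO "$(kafka_archive_url "$scala_version" "$kafka_version")" || return 1
  fi
  if [ ! -d "$dir" ]; then
    tar -xzf "$dir.tgz" || return 1
  fi
  printf '%s\n' "$dir"
}

# Usage: cd "$(fetch_kafka 2.11 0.10.2.0)"
```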
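
Start a Kafka broker by running the following command from the Kafka directory in a new terminal. With the default
`config/server.properties`, the broker connects to ZooKeeper at `localhost:2181` (the instance already running as part of
the Druid quickstart) and listens for clients on port 9092:

```bash
./bin/kafka-server-start.sh config/server.properties
```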
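
The broker can take a few seconds to start, and creating a topic with a replication factor of 1 may fail if no broker has registered yet. As a rough readiness check, you can wait until the broker accepts TCP connections. This is a minimal sketch: `wait_for_port` is a hypothetical helper, it assumes the default port 9092, and it relies on bash's `/dev/tcp` redirection, so it must run under bash.

```bash
# Hypothetical helper: poll host:port once per second until a TCP
# connection succeeds or the attempt budget (default 30) runs out.
# Uses bash's /dev/tcp pseudo-device; in other shells every attempt fails.
wait_for_port() {
  local host=$1 port=$2 attempts=${3:-30}
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Opening the pseudo-device attempts a TCP connection; the subshell
    # closes it again as soon as it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    # Only sleep if another attempt follows.
    [ "$i" -lt "$attempts" ] && sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_for_port localhost 9092 && echo "Kafka broker is up"
```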

In another terminal, from the Kafka directory, run this command to create a Kafka topic called *wikipedia*, to which we'll send data:

```bash
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wikipedia
```
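
To confirm that the topic exists, you can list topics through the same ZooKeeper connection used above; `kafka-topics.sh --list` prints one topic name per line. A small sketch, where `topic_exists` is a hypothetical helper that takes the Kafka directory and a topic name:

```bash
# Hypothetical helper: succeed if the topic name appears as a whole line
# in the output of kafka-topics.sh --list.
topic_exists() {
  local kafka_home=$1 topic=$2
  "$kafka_home/bin/kafka-topics.sh" --list --zookeeper localhost:2181 \
    | grep -qxF -- "$topic"
}

# Usage, from the Kafka directory: topic_exists . wikipedia && echo "topic ready"
```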

## Enable Druid Kafka ingestion

We will use Druid's Kafka indexing service to ingest messages from our newly created *wikipedia* topic. To start ingesting,
submit a supervisor spec to the Druid overlord by running the following command from the Druid package root:

```bash
curl -XPOST -H'Content-Type: application/json' -d @quickstart/tutorial/wikipedia-kafka-supervisor.json http://localhost:8090/druid/indexer/v1/supervisor
```

If the supervisor is created successfully, the overlord responds with the supervisor's ID; in our case, you should see `{"id":"wikipedia-kafka"}`.
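
When scripting these steps, you can pull the ID out of that response and then ask the overlord for the supervisor's status. A minimal sketch: `supervisor_id` and `supervisor_status` are hypothetical helpers, `supervisor_id` uses `sed` rather than a JSON parser (so it only handles a flat response like the one above), and the status request assumes the overlord's `GET /druid/indexer/v1/supervisor/<supervisorId>/status` endpoint on port 8090.

```bash
# Hypothetical helper: print the value of the "id" field from a flat JSON
# response such as {"id":"wikipedia-kafka"}. Prints nothing if absent.
supervisor_id() {
  printf '%s\n' "$1" | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: fetch the status report for a supervisor ID from
# the overlord at localhost:8090.
supervisor_status() {
  curl -sS "http://localhost:8090/druid/indexer/v1/supervisor/$1/status"
}

# Usage: supervisor_status "$(supervisor_id '{"id":"wikipedia-kafka"}')"
```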

For more details about what's going on here, check out the
[Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).

## Load data

Let's launch a console producer for our topic and send some data!

In your Druid directory, run the following commands to decompress the sample data:

```bash
cd quickstart
gunzip -k wikipedia-2015-09-12-sampled.json.gz
```
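
The `-k` (keep input) flag requires gzip 1.6 or later. On systems with an older gzip, you can decompress to standard output instead, which also leaves the `.gz` file in place. A sketch using a hypothetical `gunzip_keep` helper:

```bash
# Hypothetical helper: decompress FILE.gz to FILE while keeping FILE.gz,
# without relying on gunzip's -k flag.
gunzip_keep() {
  local src=$1
  local dest=${src%.gz}
  if [ "$dest" = "$src" ]; then
    echo "not a .gz file: $src" >&2
    return 1
  fi
  # Remove a partially written output file if decompression fails.
  gunzip -c -- "$src" > "$dest" || { rm -f "$dest"; return 1; }
}

# Usage, from the quickstart directory:
#   gunzip_keep wikipedia-2015-09-12-sampled.json.gz
```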

In your Kafka directory, run the following commands, replacing `{PATH_TO_DRUID}` with the path to your Druid directory.
Setting `KAFKA_OPTS` tells the producer's JVM to read the input as UTF-8, because the sample data includes non-ASCII text:

```bash
export KAFKA_OPTS="-Dfile.encoding=UTF-8"
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < {PATH_TO_DRUID}/quickstart/wikipedia-2015-09-12-sampled.json
```
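
The console producer sends each line of its input as a separate message, so counting the lines in the sample file gives a rough idea of how many messages were sent to the topic. A small sketch; `count_events` is a hypothetical helper, and the commented consumer command is one way to read a single message back from the topic.

```bash
# Hypothetical helper: count newline characters in a file. This matches the
# number of messages kafka-console-producer.sh sends for a file whose lines
# all end in a newline (a final unterminated line is not counted).
count_events() {
  wc -l < "$1" | tr -d '[:space:]'
}

# Usage:
#   count_events {PATH_TO_DRUID}/quickstart/wikipedia-2015-09-12-sampled.json
# Read one message back from the topic (run from the Kafka directory):
#   ./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia --from-beginning --max-messages 1
```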

The previous command posted sample events to the *wikipedia* Kafka topic, which the Kafka indexing service then ingested into Druid. You're now ready to run some queries!

## Querying your data

As soon as the Kafka indexing tasks read events from the stream, the data is available for querying.

Please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.
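
For a quick check before the full query tutorial, you can count the ingested rows with Druid SQL over HTTP. This sketch assumes Druid SQL is enabled on the broker and that the broker listens on port 8082; `sql_payload` and `druid_sql` are hypothetical helper names.

```bash
# Hypothetical helper: wrap a single-line SQL string in the JSON body used by
# Druid's SQL HTTP endpoint, escaping backslashes and double quotes.
# Multi-line queries are not handled (raw newlines would produce invalid JSON).
sql_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":"%s"}\n' "$escaped"
}

# Hypothetical helper: POST a SQL query to the broker's /druid/v2/sql endpoint.
druid_sql() {
  curl -sS -XPOST -H'Content-Type: application/json' \
    -d "$(sql_payload "$1")" http://localhost:8082/druid/v2/sql
}

# Usage: druid_sql 'SELECT COUNT(*) AS cnt FROM wikipedia'
```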

## Cleanup

The other ingestion tutorials write to the same "wikipedia" datasource. If you wish to go through any of them, first shut down the cluster and reset its state by removing the contents of the `var` directory in the Druid package root.
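
A minimal sketch of the reset step, assuming all Druid processes have already been stopped and that you pass the path to the Druid package root; `reset_druid_state` is a hypothetical helper. It deletes everything under `var`, including ingested data, so double-check the path before running it.

```bash
# Hypothetical helper: remove the contents of <druid root>/var while keeping
# the directory itself. Stop all Druid processes before calling this.
reset_druid_state() {
  local druid_home=$1
  # Refuse to run unless the var directory exists, so a mistyped path
  # does not delete anything unexpected.
  if [ -z "$druid_home" ] || [ ! -d "$druid_home/var" ]; then
    echo "no var directory under '$druid_home'" >&2
    return 1
  fi
  # Remove regular and hidden entries. Unmatched patterns are passed
  # through literally and ignored by rm -f.
  rm -rf -- "$druid_home"/var/* "$druid_home"/var/.[!.]* "$druid_home"/var/..?*
}

# Usage: reset_druid_state {PATH_TO_DRUID}
```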

## Further reading

For more information on loading data from Kafka streams, please see the [Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).