<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

---
layout: doc_page
title: "Tutorial: Load streaming data from Kafka"
---
# Tutorial: Load streaming data from Kafka

## Getting started

This tutorial demonstrates how to load data from a Kafka stream, using the Druid Kafka indexing service.

For this tutorial, we'll assume you've already downloaded Druid as described in 
the [single-machine quickstart](index.html) and have it running on your local machine. You 
don't need to have loaded any data yet.

## Download and start Kafka

[Apache Kafka](http://kafka.apache.org/) is a high-throughput message bus that works well with
Druid. For this tutorial, we will use Kafka 0.10.2.0. To download Kafka, issue the following
commands in your terminal:

```bash
curl -O https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
tar -xzf kafka_2.11-0.10.2.0.tgz
cd kafka_2.11-0.10.2.0
```

Start a Kafka broker by running the following command in a new terminal:

```bash
./bin/kafka-server-start.sh config/server.properties
```

Run this command to create a Kafka topic called *wikipedia*, to which we'll send data:

```bash
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wikipedia
```
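
To confirm that the topic was created, we can list the topics registered in ZooKeeper and look for an exact match. The following is a minimal sketch and not part of the original tutorial: `topic_exists` is a hypothetical helper name, and the live check only runs when `./bin/kafka-topics.sh` is present in the current directory.

```bash
# Hypothetical helper: succeeds if the topic named in $1 appears as a
# whole line in the newline-separated topic list read from stdin.
topic_exists() {
  grep -Fxq -- "$1"
}

# Run from the Kafka directory; skipped when the Kafka scripts are absent.
if [ -x ./bin/kafka-topics.sh ]; then
  if ./bin/kafka-topics.sh --list --zookeeper localhost:2181 | topic_exists wikipedia; then
    echo "topic wikipedia is ready"
  else
    echo "topic wikipedia was not found" >&2
  fi
fi
```

Matching whole lines avoids false positives from topics whose names merely contain `wikipedia`.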

## Enable Druid Kafka ingestion

We will use Druid's Kafka indexing service to ingest messages from our newly created *wikipedia* topic. To start the
service, we will need to submit a supervisor spec to the Druid overlord by running the following from the Druid package root:

```bash
curl -XPOST -H'Content-Type: application/json' -d @quickstart/tutorial/wikipedia-kafka-supervisor.json http://localhost:8090/druid/indexer/v1/supervisor
```

If the supervisor was successfully created, you will get a response containing the ID of the supervisor; in our case we should see `{"id":"wikipedia-kafka"}`.
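
When scripting this step, it can help to pull the supervisor ID out of the response rather than reading it by eye. The sketch below assumes the response is a single-line JSON object like the one shown above. `supervisor_id` is a hypothetical helper, and `sed` is used only to avoid extra dependencies; a JSON-aware tool would be more robust.

```bash
# Hypothetical helper: print the value of the "id" field from a
# single-line JSON response read on stdin (prints nothing if absent).
supervisor_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example using the response shown in this tutorial.
printf '%s\n' '{"id":"wikipedia-kafka"}' | supervisor_id
```

Against a running cluster, the same helper can be piped onto the end of the `curl` command (with `-s` added to suppress progress output). An empty result then suggests that the supervisor was not created.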

For more details about what's going on here, check out the
[Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).

## Load data

Let's launch a console producer for our topic and send some data!

In your Druid directory, run the following commands:

```bash
cd quickstart/tutorial
gunzip -k wikiticker-2015-09-12-sampled.json.gz
```

In your Kafka directory, run the following commands, replacing `{PATH_TO_DRUID}` with the path to the Druid directory:

```bash
export KAFKA_OPTS="-Dfile.encoding=UTF-8"
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < {PATH_TO_DRUID}/quickstart/tutorial/wikiticker-2015-09-12-sampled.json
```
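
Because the producer reads the sample file through a shell redirect, a wrong path makes the shell fail before any events are sent. As a sketch, the path can be built and checked up front. `sample_file` and the `DRUID_HOME` variable are hypothetical names standing in for `{PATH_TO_DRUID}`, and the `$HOME/druid` default is only a placeholder.

```bash
# Hypothetical helper: print the sample-data path for the Druid
# directory given in $1 (a single trailing slash is tolerated).
sample_file() {
  printf '%s/quickstart/tutorial/wikiticker-2015-09-12-sampled.json\n' "${1%/}"
}

# DRUID_HOME is an assumed variable; set it to your Druid directory.
DRUID_HOME="${DRUID_HOME:-$HOME/druid}"
if [ -r "$(sample_file "$DRUID_HOME")" ]; then
  echo "sample data found: $(sample_file "$DRUID_HOME")"
else
  echo "sample data missing; did you run gunzip -k in quickstart/tutorial?" >&2
fi
```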

The previous command posted sample events to the *wikipedia* Kafka topic, and the Kafka indexing service then ingested them into Druid. You're now ready to run some queries!

## Querying your data

After data is sent to the Kafka stream, it is immediately available for querying.

Please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.

## Cleanup

If you wish to go through any of the other ingestion tutorials, you will need to shut down the cluster and reset the cluster state by removing the contents of the `var` directory under the Druid package. This is necessary because the other tutorials write to the same "wikipedia" datasource.
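
The reset described above can be sketched as a small shell function. This is an illustration rather than an official Druid script: `reset_druid_state` is a hypothetical name, it expects the path to the Druid package as its only argument, and it should only be run after the cluster has been shut down.

```bash
# Hypothetical helper: remove everything inside <druid dir>/var while
# keeping the var directory itself. Refuses to run if var is missing.
reset_druid_state() {
  druid_home="$1"
  if [ -z "$druid_home" ] || [ ! -d "$druid_home/var" ]; then
    echo "no var directory under '$druid_home'" >&2
    return 1
  fi
  # -mindepth 1 spares var itself; -delete removes children depth-first,
  # including hidden files.
  find "$druid_home/var" -mindepth 1 -delete
}

# Usage (destructive, so it is left commented out here):
# reset_druid_state /path/to/druid
```

The guard on the `var` directory is a deliberate choice: it keeps a mistyped or empty path from turning into a delete in the wrong location.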

## Further reading

For more information on loading data from Kafka streams, please see the [Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).