---
layout: doc_page
title: "Apache Druid (incubating) Single-Server Quickstart"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Apache Druid (incubating) Single-Server Quickstart
In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
after completing this initial setup.
Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.html) and the
[ingestion overview](../ingestion/index.html), as the tutorials will refer to concepts discussed on those pages.
## Prerequisites
### Software
You will need:
* Java 8 (8u92+)
* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
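The Java requirement above can be checked before going further. The following is a minimal sketch, assuming the usual Java 8 banner format `1.8.0_<update>`; the function name `java8_ok` is hypothetical and not part of Druid.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a `java -version` banner reports Java 8 update 92 or later.
# java8_ok is an illustrative helper, not something shipped with Druid.

# Usage: java8_ok '<first line of java -version output>'
java8_ok() {
  local banner="$1" update
  # Java 8 reports itself as "1.8.0_<update>", e.g. "1.8.0_181".
  case "$banner" in
    *'"1.8.0_'*) ;;
    *) return 1 ;;
  esac
  update="${banner#*1.8.0_}"   # drop everything up to and including "1.8.0_"
  update="${update%%[!0-9]*}"  # keep only the leading digits (the update number)
  [ -n "$update" ] && [ "$update" -ge 92 ]
}

# Example call (java prints its banner on stderr):
#   java8_ok "$(java -version 2>&1 | head -n 1)" && echo "Java 8u92+ found"
```

Vendors that format the version banner differently would need an adjusted pattern.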
### Hardware
Druid includes several example [single-server configurations](../operations/single-server.html), along with scripts to
start the Druid processes using these configurations.
If you're running on a small machine such as a laptop for a quick evaluation, the `micro-quickstart` configuration is
a good choice, sized for a 4 CPU/16 GB RAM environment.
If you plan to use the single-machine deployment for further evaluation beyond the tutorials, we recommend a larger
configuration than `micro-quickstart`.
## Getting started
[Download](https://www.apache.org/dyn/closer.cgi?path=/incubator/druid/#{DRUIDVERSION}/apache-druid-#{DRUIDVERSION}-bin.tar.gz)
the #{DRUIDVERSION} release.
Extract Druid by running the following commands in your terminal:
```bash
tar -xzf apache-druid-#{DRUIDVERSION}-bin.tar.gz
cd apache-druid-#{DRUIDVERSION}
```
In the package, you should find:
* `DISCLAIMER`, `LICENSE`, and `NOTICE` files
* `bin/*` - scripts useful for this quickstart
* `conf/*` - example configurations for single-server and clustered setup
* `extensions/*` - core Druid extensions
* `hadoop-dependencies/*` - Druid Hadoop dependencies
* `lib/*` - libraries and dependencies for core Druid
* `quickstart/*` - configuration files, sample data, and other files for the quickstart tutorials
## Download ZooKeeper
Druid has a dependency on [Apache ZooKeeper](http://zookeeper.apache.org/) for distributed coordination. You'll
need to download and run ZooKeeper.
In the package root, run the following commands:
```bash
curl https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz -o zookeeper-3.4.11.tar.gz
tar -xzf zookeeper-3.4.11.tar.gz
mv zookeeper-3.4.11 zk
```
The startup scripts for the tutorial expect the contents of the ZooKeeper tarball to be located at `zk` under the
apache-druid-#{DRUIDVERSION} package root.
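Before starting the services, it can help to confirm that the package root has the expected layout. The sketch below checks for the directories listed earlier plus `zk`; the helper name `check_druid_layout` is hypothetical and not part of the Druid distribution.

```bash
#!/usr/bin/env bash
# Sketch: verify that a directory looks like a Druid package root with
# ZooKeeper unpacked at zk/. check_druid_layout is an illustrative helper.

# Usage: check_druid_layout <package-root>
check_druid_layout() {
  local root="$1" missing=0 d
  for d in bin conf extensions hadoop-dependencies lib quickstart zk; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  # Returns 0 when every directory exists, 1 otherwise.
  return "$missing"
}

# Example call from the package root:
#   check_druid_layout . && echo "layout looks complete"
```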
## Start up Druid services
The following commands will assume that you are using the `micro-quickstart` single-machine configuration. If you are
using a different configuration, the `bin` directory has equivalent scripts for each configuration, such as
`bin/start-single-server-small`.
From the apache-druid-#{DRUIDVERSION} package root, run the following command:
```bash
./bin/start-micro-quickstart
```
This will bring up instances of ZooKeeper and the Druid services, all running on the local machine, e.g.:
```bash
$ ./bin/start-micro-quickstart
[Fri May 3 11:40:50 2019] Running command[zk], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk conf
[Fri May 3 11:40:50 2019] Running command[coordinator-overlord], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/coordinator-overlord.log]: bin/run-druid coordinator-overlord conf/druid/single-server/micro-quickstart
[Fri May 3 11:40:50 2019] Running command[broker], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/broker.log]: bin/run-druid broker conf/druid/single-server/micro-quickstart
[Fri May 3 11:40:50 2019] Running command[router], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/router.log]: bin/run-druid router conf/druid/single-server/micro-quickstart
[Fri May 3 11:40:50 2019] Running command[historical], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/historical.log]: bin/run-druid historical conf/druid/single-server/micro-quickstart
[Fri May 3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
```
All persistent state, such as the cluster metadata store and segments for the services, is kept in the `var` directory under the apache-druid-#{DRUIDVERSION} package root. Logs for the services are located at `var/sv`.
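One way to confirm that a service has come up is to poll its HTTP health endpoint. The sketch below rests on assumptions that are not stated on this page: that each Druid process serves `/status/health` and returns `true` when healthy, and that the router listens on port 8888 in this configuration. Check the service logs in `var/sv` if either assumption does not hold for your version. The helper name `druid_healthy` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: ask a Druid process whether it reports itself as healthy.
# Assumptions (not taken from this page): the /status/health endpoint
# and the port used in the example call below.

# Usage: druid_healthy <base-url>
druid_healthy() {
  local base_url="$1" body
  # -s silences progress output, -f makes curl fail on HTTP errors.
  body="$(curl -sf "$base_url/status/health")" || return 1
  [ "$body" = "true" ]
}

# Example call (port 8888 for the router is an assumption):
#   druid_healthy http://localhost:8888 && echo "router is up"
```

If the check fails, the per-service log files under `var/sv` are the first place to look.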
Later on, if you'd like to stop the services, press CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
### Resetting cluster state
If you want a clean start after stopping the services, delete the `var` directory and run the `bin/start-micro-quickstart` script again.
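The reset described above amounts to removing `var` and restarting. A minimal sketch follows, with a guard so that nothing is deleted unless the given directory contains `bin/start-micro-quickstart`; the helper name `reset_druid_state` is hypothetical, and the services should already be stopped before it runs.

```bash
#!/usr/bin/env bash
# Sketch: delete the var/ directory of a Druid package root for a clean start.
# reset_druid_state is an illustrative helper, not part of Druid.

# Usage: reset_druid_state <package-root>
reset_druid_state() {
  local root="$1"
  # Guard: refuse to delete anything unless this looks like the package root.
  if [ ! -e "$root/bin/start-micro-quickstart" ]; then
    echo "not a Druid package root: $root" >&2
    return 1
  fi
  rm -rf "$root/var"
}

# Example call from the package root, after stopping the services:
#   reset_druid_state . && ./bin/start-micro-quickstart
```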
Once every service has started, you are ready to load data.
#### Resetting Kafka
If you completed [Tutorial: Loading stream data from Kafka](./tutorial-kafka.html) and wish to reset the cluster state, you should additionally clear out any Kafka state.
Shut down the Kafka broker with CTRL-C before stopping ZooKeeper and the Druid services, and then delete the Kafka log directory at `/tmp/kafka-logs`:
```bash
rm -rf /tmp/kafka-logs
```
## Loading Data
### Tutorial Dataset
For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz`, relative to the Druid package root. The page edit events are stored as JSON objects in a text file.
The sample data has the following columns, and an example event is shown below:
* added
* channel
* cityName
* comment
* countryIsoCode
* countryName
* deleted
* delta
* isAnonymous
* isMinor
* isNew
* isRobot
* isUnpatrolled
* metroCode
* namespace
* page
* regionIsoCode
* regionName
* user
```json
{
  "timestamp":"2015-09-12T20:03:45.018Z",
  "channel":"#en.wikipedia",
  "namespace":"Main",
  "page":"Spider-Man's powers and equipment",
  "user":"foobar",
  "comment":"/* Artificial web-shooters */",
  "cityName":"New York",
  "regionName":"New York",
  "regionIsoCode":"NY",
  "countryName":"United States",
  "countryIsoCode":"US",
  "isAnonymous":false,
  "isNew":false,
  "isMinor":false,
  "isRobot":false,
  "isUnpatrolled":false,
  "added":99,
  "delta":99,
  "deleted":0
}
```
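To look at a few raw events yourself, the compressed sample file can be streamed through `gunzip` without unpacking it on disk. This is a minimal sketch; the helper name `peek_events` and its default of three lines are illustrative choices.

```bash
#!/usr/bin/env bash
# Sketch: print the first few lines of a gzip-compressed text file.
# peek_events is an illustrative helper, not part of Druid.

# Usage: peek_events <file.json.gz> [line-count]
peek_events() {
  local file="$1" count="${2:-3}"
  # gunzip -c writes the decompressed data to stdout and leaves the file intact.
  gunzip -c "$file" | head -n "$count"
}

# Example call from the Druid package root:
#   peek_events quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz 1
```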
The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use
cases. All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
### [Tutorial: Loading a file](./tutorial-batch.html)
This tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
### [Tutorial: Loading stream data from Apache Kafka](./tutorial-kafka.html)
This tutorial demonstrates how to load streaming data from a Kafka topic.
### [Tutorial: Loading a file using Apache Hadoop](./tutorial-batch-hadoop.html)
This tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
### [Tutorial: Loading data using Tranquility](./tutorial-tranquility.html)
This tutorial demonstrates how to load streaming data by pushing events to Druid using the Tranquility service.
### [Tutorial: Writing your own ingestion spec](./tutorial-ingestion-spec.html)
This tutorial demonstrates how to write a new ingestion spec and use it to load data.