From 75ae4d434d08c928a543af52a2182c670529f2f2 Mon Sep 17 00:00:00 2001
From: Igal Levy
Date: Wed, 19 Feb 2014 15:28:48 -0800
Subject: [PATCH 1/4] added advice about restarting zk and mysql if not already running

---
 docs/content/Tutorial:-All-About-Queries.md        | 2 ++
 docs/content/Tutorial:-Loading-Your-Data-Part-1.md | 2 ++
 docs/content/Tutorial:-Loading-Your-Data-Part-2.md | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/docs/content/Tutorial:-All-About-Queries.md b/docs/content/Tutorial:-All-About-Queries.md
index 2e275bf5131..49f7f413f4c 100644
--- a/docs/content/Tutorial:-All-About-Queries.md
+++ b/docs/content/Tutorial:-All-About-Queries.md
@@ -14,6 +14,8 @@ Before we start digging into how to query Druid, make sure you've gone through t
 
 Let's start up a simple Druid cluster so we can query all the things.
 
+Note: If Zookeeper and MySQL aren't running, you'll have to start them again as described in [The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html).
+
 To start a Coordinator node:
 
 ```bash
diff --git a/docs/content/Tutorial:-Loading-Your-Data-Part-1.md b/docs/content/Tutorial:-Loading-Your-Data-Part-1.md
index 5a9d57b7ecb..122ce70ccc4 100644
--- a/docs/content/Tutorial:-Loading-Your-Data-Part-1.md
+++ b/docs/content/Tutorial:-Loading-Your-Data-Part-1.md
@@ -66,6 +66,8 @@ There are five data points spread across the day of 2013-08-31. Talk about big d
 
 In order to ingest and query this data, we are going to need to run a historical node, a coordinator node, and an indexing service to run the batch ingestion.
 
+Note: If Zookeeper and MySQL aren't running, you'll have to start them again as described in [The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html).
+
 #### Starting a Local Indexing Service
 
 The simplest indexing service we can start up is to run an [overlord](Indexing-Service.html) node in local mode. You can do so by issuing:
diff --git a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
index 94c3d91970a..5d93c306f5b 100644
--- a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
+++ b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
@@ -231,6 +231,8 @@ The following events should exist in the file:
 
 To index the data, we are going to need an indexing service, a historical node, and a coordinator node.
 
+Note: If Zookeeper and MySQL aren't running, you'll have to start them again as described in [The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html).
+
 To start the Indexing Service:
 
 ```bash
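For readers following along, the note above defers to [The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html) for the actual commands. A minimal sketch of what restarting the two services typically looks like, assuming a local Zookeeper download and a service-managed MySQL as in that tutorial (the paths, version, and service names are assumptions that vary by environment):

```bash
# Start Zookeeper from wherever it was unpacked (the directory name is
# an assumption based on a typical 3.4.x download).
cd zookeeper-3.4.5
cp conf/zoo_sample.cfg conf/zoo.cfg   # only needed on first run
./bin/zkServer.sh start

# Start MySQL; the right command depends on how it was installed.
sudo service mysql start     # Linux installs managed as a service
# mysql.server start         # Homebrew on macOS
```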
From 7aadef8e7649dba0e33e54b5b6398607fa596dee Mon Sep 17 00:00:00 2001
From: Igal Levy
Date: Wed, 19 Feb 2014 21:20:07 -0800
Subject: [PATCH 2/4] minor edits for clarity

---
 docs/content/Tutorial:-Loading-Your-Data-Part-2.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
index 5d93c306f5b..5dbe8cce30e 100644
--- a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
+++ b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
@@ -45,9 +45,9 @@ With real-world data, we recommend having a message bus such as [Apache Kafka](h
 
 #### Setting up Kafka
 
-[KafkaFirehoseFactory](https://github.com/metamx/druid/blob/druid-0.6.61/realtime/src/main/java/com/metamx/druid/realtime/firehose/KafkaFirehoseFactory.java) is how druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in real-time without writing any code. To load data to a real-time node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
+[KafkaFirehoseFactory](Firehose.html) is how Druid communicates with Kafka. Using this [Firehose](Firehose.html) with the right configuration, we can import data into Druid in real-time without writing any code. To load data to a real-time node via Kafka, we'll first need to initialize Zookeeper and Kafka, and then configure and initialize a [Realtime](Realtime.html) node.
 
-Instructions for booting a Zookeeper and then Kafka cluster are available [here](http://kafka.apache.org/07/quickstart.html).
+The following quick-start instructions for booting a Zookeeper and then Kafka cluster were taken from the [Kafka website](http://kafka.apache.org/07/quickstart.html).
 
 1. Download Apache Kafka 0.7.2 from [http://kafka.apache.org/downloads.html](http://kafka.apache.org/downloads.html)
 
@@ -227,7 +227,7 @@ The following events should exist in the file:
 {"timestamp": "2013-08-31T12:41:27Z", "page": "Coyote Tango", "language" : "ja", "user" : "stringer", "unpatrolled" : "true", "newPage" : "false", "robot": "true", "anonymous": "false", "namespace":"wikipedia", "continent":"Asia", "country":"Japan", "region":"Kanto", "city":"Tokyo", "added": 1, "deleted": 10, "delta": -9}
 ```
 
-#### Setup a Druid Cluster
+#### Set Up a Druid Cluster
 
 To index the data, we are going to need an indexing service, a historical node, and a coordinator node.

From 723f1427eb5192715229f606a680ed437f2ee459 Mon Sep 17 00:00:00 2001
From: Igal Levy
Date: Thu, 20 Feb 2014 09:50:44 -0800
Subject: [PATCH 3/4] fixed broken url to hadoop setup instructions

---
 docs/content/Tutorial:-Loading-Your-Data-Part-2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
index 5dbe8cce30e..3758d39e0b9 100644
--- a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
+++ b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
@@ -207,7 +207,7 @@ Batch Ingestion
 ---------------
 Druid is designed for large data volumes, and most real-world data sets require batch indexing be done through a Hadoop job.
 
-The setup for a single node, 'standalone' Hadoop cluster is available [here](http://hadoop.apache.org/docs/stable/single_node_setup.html).
+The setup for a single node, 'standalone' Hadoop cluster is available [here](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html).
 
 For the purposes of this tutorial, we are going to use our very small and simple Wikipedia data set. This data can directly be ingested via other means as shown in the previous [tutorial](Tutorial%3A-Loading-Your-Data-Part-1), but we are going to use Hadoop here for demonstration purposes.
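Patch 2 above turns the Kafka pointer into inline quick-start instructions. As a condensed sketch of those boot steps, run from the root of an unpacked Kafka 0.7.2 distribution with each server in its own terminal (the topic name is an illustrative assumption; the tutorial's ingestion spec defines the real one):

```bash
# 1. Start the Zookeeper server bundled with Kafka.
bin/zookeeper-server-start.sh config/zookeeper.properties

# 2. In a second terminal, start a Kafka broker.
bin/kafka-server-start.sh config/server.properties

# 3. In a third terminal, send test events with the console producer
#    (topic name "wikipedia" is an assumption for illustration).
bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic wikipedia
```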
From d3a1041e494e53f8b84735518a07b956db5892eb Mon Sep 17 00:00:00 2001
From: Igal Levy
Date: Thu, 20 Feb 2014 09:52:25 -0800
Subject: [PATCH 4/4] fixed broken url to other tutorial

---
 docs/content/Tutorial:-Loading-Your-Data-Part-2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
index 3758d39e0b9..4fa9d98dcd2 100644
--- a/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
+++ b/docs/content/Tutorial:-Loading-Your-Data-Part-2.md
@@ -209,7 +209,7 @@ Druid is designed for large data volumes, and most real-world data sets require
 
 The setup for a single node, 'standalone' Hadoop cluster is available [here](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html).
 
-For the purposes of this tutorial, we are going to use our very small and simple Wikipedia data set. This data can directly be ingested via other means as shown in the previous [tutorial](Tutorial%3A-Loading-Your-Data-Part-1), but we are going to use Hadoop here for demonstration purposes.
+For the purposes of this tutorial, we are going to use our very small and simple Wikipedia data set. This data can directly be ingested via other means as shown in the previous [tutorial](Tutorial%3A-Loading-Your-Data-Part-1.html), but we are going to use Hadoop here for demonstration purposes.
 
 Our data is located at: