New doc fixes (#6156)

Jonathan Wei 2018-08-13 11:11:32 -07:00 committed by GitHub
parent a7ca4589dd
commit 94a937b5e8
15 changed files with 120 additions and 119 deletions

View File

@@ -7,6 +7,7 @@ layout: doc_page
 This page documents all of the configuration properties for each Druid service type.
 ## Table of Contents
+* [Recommended Configuration File Organization](#recommended-configuration-file-organization)
 * [Common configurations](#common-configurations)
 * [JVM Configuration Best Practices](#jvm-configuration-best-practices)
 * [Extensions](#extensions)

View File

@@ -17,13 +17,14 @@ layout: toc
 * [Tutorial: Loading a file using Hadoop](/docs/VERSION/tutorials/tutorial-batch-hadoop.html)
 * [Tutorial: Loading stream data using HTTP push](/docs/VERSION/tutorials/tutorial-tranquility.html)
 * [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html)
-* [Further tutorials](/docs/VERSION/tutorials/advanced.html)
-* [Tutorial: Rollup](/docs/VERSION/tutorials/rollup.html)
+* Further tutorials
+* [Tutorial: Rollup](/docs/VERSION/tutorials/tutorial-rollup.html)
 * [Tutorial: Configuring retention](/docs/VERSION/tutorials/tutorial-retention.html)
 * [Tutorial: Updating existing data](/docs/VERSION/tutorials/tutorial-update-data.html)
 * [Tutorial: Compacting segments](/docs/VERSION/tutorials/tutorial-compaction.html)
 * [Tutorial: Deleting data](/docs/VERSION/tutorials/tutorial-delete-data.html)
 * [Tutorial: Writing your own ingestion specs](/docs/VERSION/tutorials/tutorial-ingestion-spec.html)
+* [Tutorial: Transforming input data](/docs/VERSION/tutorials/tutorial-transform-spec.html)
 * [Clustering](/docs/VERSION/tutorials/cluster.html)
 ## Data Ingestion
@@ -33,8 +34,8 @@ layout: toc
 * [Schema Design](/docs/VERSION/ingestion/schema-design.html)
 * [Schema Changes](/docs/VERSION/ingestion/schema-changes.html)
 * [Batch File Ingestion](/docs/VERSION/ingestion/batch-ingestion.html)
-* [Native Batch Ingestion](docs/VERSION/ingestion/native_tasks.html)
-* [Hadoop Batch Ingestion](docs/VERSION/ingestion/hadoop.html)
+* [Native Batch Ingestion](/docs/VERSION/ingestion/native_tasks.html)
+* [Hadoop Batch Ingestion](/docs/VERSION/ingestion/hadoop.html)
 * [Stream Ingestion](/docs/VERSION/ingestion/stream-ingestion.html)
 * [Stream Push](/docs/VERSION/ingestion/stream-push.html)
 * [Stream Pull](/docs/VERSION/ingestion/stream-pull.html)

View File

@@ -72,7 +72,7 @@ bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
 This will bring up instances of Zookeeper and the Druid services, all running on the local machine, e.g.:
-```
+```bash
 bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
 [Thu Jul 26 12:16:23 2018] Running command[zk], logging to[/stage/druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
 [Thu Jul 26 12:16:23 2018] Running command[coordinator], logging to[/stage/druid-#{DRUIDVERSION}/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
@@ -121,7 +121,7 @@ The sample data has the following columns, and an example event is shown below:
 * regionName
 * user
-```
+```json
 {
 "timestamp":"2015-09-12T20:03:45.018Z",
 "channel":"#en.wikipedia",
@@ -151,18 +151,18 @@ The following tutorials demonstrate various methods of loading data into Druid,
 This tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-### [Tutorial: Loading stream data from Kafka](../tutorial-kafka.html)
+### [Tutorial: Loading stream data from Kafka](./tutorial-kafka.html)
 This tutorial demonstrates how to load streaming data from a Kafka topic.
-### [Tutorial: Loading a file using Hadoop](../tutorial-batch-hadoop.html)
+### [Tutorial: Loading a file using Hadoop](./tutorial-batch-hadoop.html)
 This tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-### [Tutorial: Loading data using Tranquility](../tutorial-tranquility.html)
+### [Tutorial: Loading data using Tranquility](./tutorial-tranquility.html)
 This tutorial demonstrates how to load streaming data by pushing events to Druid using the Tranquility service.
-### [Tutorial: Writing your own ingestion spec](../tutorial-ingestion-spec.html)
+### [Tutorial: Writing your own ingestion spec](./tutorial-ingestion-spec.html)
 This tutorial demonstrates how to write a new ingestion spec and use it to load data.

View File

@@ -20,9 +20,9 @@ For this tutorial, we've provided a Dockerfile for a Hadoop 2.8.3 cluster, which
 This Dockerfile and related files are located at `quickstart/tutorial/hadoop/docker`.
-From the druid-${DRUIDVERSION} package root, run the following commands to build a Docker image named "druid-hadoop-demo" with version tag "2.8.3":
+From the druid-#{DRUIDVERSION} package root, run the following commands to build a Docker image named "druid-hadoop-demo" with version tag "2.8.3":
-```
+```bash
 cd quickstart/tutorial/hadoop/docker
 docker build -t druid-hadoop-demo:2.8.3 .
 ```
@@ -37,7 +37,7 @@ We'll need a shared folder between the host and the Hadoop container for transfe
 Let's create some folders under `/tmp`, we will use these later when starting the Hadoop container:
-```
+```bash
 mkdir -p /tmp/shared
 mkdir -p /tmp/shared/hadoop_xml
 ```
@@ -54,13 +54,13 @@ On the host machine, add the following entry to `/etc/hosts`:
 Once the `/tmp/shared` folder has been created and the `etc/hosts` entry has been added, run the following command to start the Hadoop container.
-```
+```bash
 docker run -it -h druid-hadoop-demo -p 50010:50010 -p 50020:50020 -p 50075:50075 -p 50090:50090 -p 8020:8020 -p 10020:10020 -p 19888:19888 -p 8030:8030 -p 8031:8031 -p 8032:8032 -p 8033:8033 -p 8040:8040 -p 8042:8042 -p 8088:8088 -p 8443:8443 -p 2049:2049 -p 9000:9000 -p 49707:49707 -p 2122:2122 -p 34455:34455 -v /tmp/shared:/shared druid-hadoop-demo:2.8.3 /etc/bootstrap.sh -bash
 ```
 Once the container is started, your terminal will attach to a bash shell running inside the container:
-```
+```bash
 Starting sshd: [ OK ]
 18/07/26 17:27:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 Starting namenodes on [druid-hadoop-demo]
@@ -80,9 +80,9 @@ The `Unable to load native-hadoop library for your platform... using builtin-jav
 ### Copy input data to the Hadoop container
-From the druid-${DRUIDVERSION} package root on the host, copy the `quickstart/wikiticker-2015-09-12-sampled.json.gz` sample data to the shared folder:
+From the druid-#{DRUIDVERSION} package root on the host, copy the `quickstart/wikiticker-2015-09-12-sampled.json.gz` sample data to the shared folder:
-```
+```bash
 cp quickstart/wikiticker-2015-09-12-sampled.json.gz /tmp/shared/wikiticker-2015-09-12-sampled.json.gz
 ```
@@ -90,7 +90,7 @@ cp quickstart/wikiticker-2015-09-12-sampled.json.gz /tmp/shared/wikiticker-2015-
 In the Hadoop container's shell, run the following commands to setup the HDFS directories needed by this tutorial and copy the input data to HDFS.
-```
+```bash
 cd /usr/local/hadoop/bin
 ./hadoop fs -mkdir /druid
 ./hadoop fs -mkdir /druid/segments
@@ -113,13 +113,13 @@ Some additional steps are needed to configure the Druid cluster for Hadoop batch
 From the Hadoop container's shell, run the following command to copy the Hadoop .xml configuration files to the shared folder:
-```
+```bash
 cp /usr/local/hadoop/etc/hadoop/*.xml /shared/hadoop_xml
 ```
 From the host machine, run the following, where {PATH_TO_DRUID} is replaced by the path to the Druid package.
-```
+```bash
 mkdir -p {PATH_TO_DRUID}/quickstart/tutorial/conf/druid/_common/hadoop-xml
 cp /tmp/shared/hadoop_xml/*.xml {PATH_TO_DRUID}/quickstart/tutorial/conf/druid/_common/hadoop-xml/
 ```
@@ -177,17 +177,17 @@ a task that loads the `wikiticker-2015-09-12-sampled.json.gz` file included in t
 Let's submit the `wikipedia-index-hadoop-.json` task:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/wikipedia-index-hadoop.json
 ```
 ## Querying your data
-After the data load is complete, please follow the [query tutorial](../tutorial/tutorial-query.html) to run some example queries on the newly loaded data.
+After the data load is complete, please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.
 ## Cleanup
-This tutorial is only meant to be used together with the [query tutorial](../tutorial/tutorial-query.html).
+This tutorial is only meant to be used together with the [query tutorial](../tutorials/tutorial-query.html).
 If you wish to go through any of the other tutorials, you will need to:
 * Shut down the cluster and reset the cluster state by removing the contents of the `var` directory under the druid package.

View File

@@ -19,7 +19,7 @@ A data load is initiated by submitting an *ingestion task* spec to the Druid ove
 The Druid package includes the following sample native batch ingestion task spec at `quickstart/wikipedia-index.json`, shown here for convenience,
 which has been configured to read the `quickstart/wikiticker-2015-09-12-sampled.json.gz` input file:
-```
+```json
 {
 "type" : "index",
 "spec" : {
@@ -101,13 +101,13 @@ This script will POST an ingestion task to the Druid overlord and poll Druid unt
 Run the following command from Druid package root:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/wikipedia-index.json
 ```
 You should see output like the following:
-```
+```bash
 Beginning indexing data for wikipedia
 Task started: index_wikipedia_2018-07-27T06:37:44.323Z
 Task log: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-07-27T06:37:44.323Z/log
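The `bin/post-index-task` script described in this hunk is a convenience wrapper around a plain HTTP submission to the Overlord's task endpoint, the same endpoint other tutorials in this commit call with `curl` directly. An equivalent manual submission would look roughly like this sketch:

```bash
# Submit the ingestion spec directly to the Overlord; the response contains the task ID
curl -X 'POST' -H 'Content-Type:application/json' \
  -d @quickstart/tutorial/wikipedia-index.json \
  http://localhost:8090/druid/indexer/v1/task
```

The wrapper additionally polls the task status until the load finishes, which a single `curl` call does not do.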
@@ -121,7 +121,7 @@ wikipedia loading complete! You may now query your data
 ## Querying your data
-Once the data is loaded, please follow the [query tutorial](../tutorial/tutorial-query.html) to run some example queries on the newly loaded data.
+Once the data is loaded, please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.
 ## Cleanup

View File

@@ -11,7 +11,7 @@ Because there is some per-segment memory and processing overhead, it can sometim
 For this tutorial, we'll assume you've already downloaded Druid as described in
 the [single-machine quickstart](index.html) and have it running on your local machine.
-It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
+It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.html) and [Tutorial: Querying data](../tutorials/tutorial-query.html).
 ## Load the initial data
@@ -19,7 +19,7 @@ For this tutorial, we'll be using the Wikipedia edits sample data, with an inges
 The ingestion spec can be found at `quickstart/tutorial/compaction-init-index.json`. Let's submit that spec, which will create a datasource called `compaction-tutorial`:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/compaction-init-index.json
 ```
@@ -31,7 +31,7 @@ There will be 24 segments for this datasource, one segment per hour in the input
 Running a COUNT(*) query on this datasource shows that there are 39,244 rows:
-```
+```bash
 dsql> select count(*) from "compaction-tutorial";
 ┌────────┐
 │ EXPR$0 │
@@ -47,7 +47,7 @@ Let's now combine these 24 segments into one segment.
 We have included a compaction task spec for this tutorial datasource at `quickstart/tutorial/compaction-final-index.json`:
-```
+```json
 {
 "type": "compact",
 "dataSource": "compaction-tutorial",
@@ -69,7 +69,7 @@ In this tutorial example, only one compacted segment will be created, as the 392
 Let's submit this task now:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/compaction-final-index.json
 ```
@@ -85,7 +85,7 @@ The new compacted segment has a more recent version than the original segments,
 Let's try running a COUNT(*) on `compaction-tutorial` again, where the row count should still be 39,244:
-```
+```bash
 dsql> select count(*) from "compaction-tutorial";
 ┌────────┐
 │ EXPR$0 │

View File

@@ -9,7 +9,7 @@ This tutorial demonstrates how to delete existing data.
 For this tutorial, we'll assume you've already downloaded Druid as described in
 the [single-machine quickstart](index.html) and have it running on your local machine.
-Completing [Tutorial: Configuring retention](/docs/VERSION/tutorials/tutorial-retention.html) first is highly recommended, as we will be using retention rules in this tutorial.
+Completing [Tutorial: Configuring retention](../tutorials/tutorial-retention.html) first is highly recommended, as we will be using retention rules in this tutorial.
 ## Load initial data
@@ -17,7 +17,7 @@ In this tutorial, we will use the Wikipedia edits data, with an indexing spec th
 Let's load this initial data:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/deletion-index.json
 ```
@@ -48,9 +48,9 @@ In the `rule #2` box at the bottom, click `Drop` and `Forever`.
 This will cause the first 12 segments of `deletion-tutorial` to be dropped. However, these dropped segments are not removed from deep storage.
-You can see that all 24 segments are still present in deep storage by listing the contents of `druid-{DRUIDVERSION}/var/druid/segments/deletion-tutorial`:
+You can see that all 24 segments are still present in deep storage by listing the contents of `druid-#{DRUIDVERSION}/var/druid/segments/deletion-tutorial`:
-```
+```bash
 $ ls -l1 var/druid/segments/deletion-tutorial/
 2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
 2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
@@ -90,7 +90,7 @@ The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2016-
 Let's disable the hour 14 segment by sending the following DELETE request to the coordinator, where {SEGMENT-ID} is the full segment ID shown in the info box:
-```
+```bash
 curl -XDELETE http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments/{SEGMENT-ID}
 ```
@@ -100,7 +100,7 @@ After that command completes, you should see that the segment for hour 14 has be
 Note that the hour 14 segment is still in deep storage:
-```
+```bash
 $ ls -l1 var/druid/segments/deletion-tutorial/
 2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
 2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
@@ -134,13 +134,13 @@ Now that we have disabled some segments, we can submit a Kill Task, which will d
 A Kill Task spec has been provided at `quickstart/deletion-kill.json`. Submit this task to the Overlord with the following command:
-```
+```bash
 curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/deletion-kill.json http://localhost:8090/druid/indexer/v1/task
 ```
 After this task completes, you can see that the disabled segments have now been removed from deep storage:
-```
+```bash
 $ ls -l1 var/druid/segments/deletion-tutorial/
 2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
 2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z
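For reference, a kill task spec of the kind submitted above is normally just a task type, the target datasource, and the interval whose unused segments should be purged; a sketch (the exact contents of `deletion-kill.json` are an assumption):

```json
{
  "type": "kill",
  "dataSource": "deletion-tutorial",
  "interval": "2015-09-12/2015-09-13"
}
```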

View File

@@ -24,7 +24,7 @@ Suppose we have the following network flow data:
 * `bytes`: number of bytes transmitted
 * `cost`: the cost of sending the traffic
-```
+```json
 {"ts":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":10, "bytes":1000, "cost": 1.4}
 {"ts":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":20, "bytes":2000, "cost": 3.1}
 {"ts":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":30, "bytes":3000, "cost": 0.4}
@@ -74,7 +74,7 @@ A `dataSchema` has a `parser` field, which defines the parser that Druid will us
 Since our input data is represented as JSON strings, we'll use a `string` parser with `json` format:
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -92,7 +92,7 @@ The `parser` needs to know how to extract the main timestamp field from the inpu
 The timestamp column in our input data is named "ts", containing ISO 8601 timestamps, so let's add a `timestampSpec` with that information to the `parseSpec`:
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -128,7 +128,7 @@ For this tutorial, let's enable rollup. This is specified with a `granularitySpe
 Note that the `granularitySpec` lies outside of the `parser`. We will revist the `parser` soon when we define our dimensions and metrics.
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -163,7 +163,7 @@ Let's look at how to define these dimensions and metrics within the ingestion sp
 Dimensions are specified with a `dimensionsSpec` inside the `parseSpec`.
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -255,7 +255,7 @@ Note that we have also defined a `count` aggregator. The count aggregator will t
 If we were not using rollup, all columns would be specified in the `dimensionsSpec`, e.g.:
-```
+```json
 "dimensionsSpec" : {
 "dimensions": [
 "srcIP",
@@ -284,7 +284,7 @@ There are some additional properties we need to set in the `granularitySpec`:
 Segment granularity is configured by the `segmentGranularity` property in the `granularitySpec`. For this tutorial, we'll create hourly segments:
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -326,7 +326,7 @@ Our input data has events from two separate hours, so this task will generate tw
 The query granularity is configured by the `queryGranularity` property in the `granularitySpec`. For this tutorial, let's use minute granularity:
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -365,13 +365,13 @@ The query granularity is configured by the `queryGranularity` property in the `g
 To see the effect of the query granularity, let's look at this row from the raw input data:
-```
+```json
 {"ts":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":60, "bytes":6000, "cost": 4.3}
 ```
 When this row is ingested with minute queryGranularity, Druid will floor the row's timestamp to minute buckets:
-```
+```json
 {"ts":"2018-01-01T01:03:00Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":60, "bytes":6000, "cost": 4.3}
 ```
@@ -381,7 +381,7 @@ For batch tasks, it is necessary to define a time interval. Input rows with time
 The interval is also specified in the `granularitySpec`:
-```
+```json
 "dataSchema" : {
 "dataSource" : "ingestion-tutorial",
 "parser" : {
@@ -425,7 +425,7 @@ We've now finished defining our `dataSchema`. The remaining steps are to place t
 The `dataSchema` is shared across all task types, but each task type has its own specification format. For this tutorial, we will use the native batch ingestion task:
-```
+```json
 {
 "type" : "index",
 "spec" : {
@@ -473,7 +473,7 @@ Now let's define our input source, which is specified in an `ioConfig` object. E
 Now let's define our input source, which is specified in an `ioConfig` object. Each task type has its own type of `ioConfig`. The native batch task uses "firehoses" to read input data, so let's configure a "local" firehose to read the example netflow data we saved earlier:
-```
+```json
 "ioConfig" : {
 "type" : "index",
 "firehose" : {
@@ -484,7 +484,7 @@ Now let's define our input source, which is specified in an `ioConfig` object. E
 }
 ```
-```
+```json
 {
 "type" : "index",
 "spec" : {
@@ -541,7 +541,7 @@ Each ingestion task has a `tuningConfig` section that allows users to tune vario
 As an example, let's add a `tuningConfig` that sets a target segment size for the native batch ingestion task:
-```
+```json
 "tuningConfig" : {
 "type" : "index",
 "targetPartitionSize" : 5000000
@@ -554,7 +554,7 @@ Note that each ingestion task has its own type of `tuningConfig`.
 We've finished defining the ingestion spec, it should now look like the following:
-```
+```json
 {
 "type" : "index",
 "spec" : {
@@ -611,9 +611,9 @@ We've finished defining the ingestion spec, it should now look like the followin
 ## Submit the task and query the data
-From the druid-${DRUIDVERSION} package root, run the following command:
+From the druid-#{DRUIDVERSION} package root, run the following command:
-```
+```bash
 bin/post-index-task --file quickstart/ingestion-tutorial-index.json
 ```
@@ -621,7 +621,7 @@ After the script completes, we will query the data.
 Let's run `bin/dsql` and issue a `select * from "ingestion-tutorial";` query to see what data was ingested.
-```
+```bash
 $ bin/dsql
 Welcome to dsql, the command-line client for Druid SQL.
 Type "\h" for help.

View File

@@ -48,7 +48,7 @@ curl -XPOST -H'Content-Type: application/json' -d @quickstart/tutorial/wikipedia
 If the supervisor was successfully created, you will get a response containing the ID of the supervisor; in our case we should see `{"id":"wikipedia-kafka"}`.
 For more details about what's going on here, check out the
-[Druid Kafka indexing service documentation](http://druid.io/docs/{{druidVersion}}/development/extensions-core/kafka-ingestion.html).
+[Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).
 ## Load data
@@ -56,7 +56,7 @@ Let's launch a console producer for our topic and send some data!
 In your Druid directory, run the following command:
-```
+```bash
 cd quickstart
 gunzip -k wikipedia-2015-09-12-sampled.json.gz
 ```
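The producer invocation itself falls between the visible hunks. With a stock Kafka installation it would be along these lines; a sketch (the Kafka directory and broker address are assumptions):

```bash
# From the Kafka installation directory, pipe the sample file into the wikipedia topic
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < {PATH_TO_DRUID}/quickstart/wikipedia-2015-09-12-sampled.json
```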
@@ -74,7 +74,7 @@ The previous command posted sample events to the *wikipedia* Kafka topic which w
 After data is sent to the Kafka stream, it is immediately available for querying.
-Please follow the [query tutorial](../tutorial/tutorial-query.html) to run some example queries on the newly loaded data.
+Please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.
 ## Cleanup
@@ -82,4 +82,4 @@ If you wish to go through any of the other ingestion tutorials, you will need to
 ## Further reading
-For more information on loading data from Kafka streams, please see the [Druid Kafka indexing service documentation](http://druid.io/docs/{{druidVersion}}/development/extensions-core/kafka-ingestion.html).
+For more information on loading data from Kafka streams, please see the [Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).

View File

@@ -8,10 +8,10 @@ This tutorial will demonstrate how to query data in Druid, with examples for Dru
 The tutorial assumes that you've already completed one of the 4 ingestion tutorials, as we will be querying the sample Wikipedia edits data.
-* [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html)
-* [Tutorial: Loading stream data from Kafka](/docs/VERSION/tutorials/tutorial-kafka.html)
-* [Tutorial: Loading a file using Hadoop](/docs/VERSION/tutorials/tutorial-batch-hadoop.html)
-* [Tutorial: Loading stream data using Tranquility](/docs/VERSION/tutorials/tutorial-tranquility.html)
+* [Tutorial: Loading a file](../tutorials/tutorial-batch.html)
+* [Tutorial: Loading stream data from Kafka](../tutorials/tutorial-kafka.html)
+* [Tutorial: Loading a file using Hadoop](../tutorials/tutorial-batch-hadoop.html)
+* [Tutorial: Loading stream data using Tranquility](../tutorials/tutorial-tranquility.html)
 ## Native JSON queries
@@ -102,7 +102,7 @@ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipe
 The following results should be returned:
-```
+```json
 [
 {
 "page": "Wikipedia:Vandalismusmeldung",
@@ -153,7 +153,7 @@ For convenience, the Druid package includes a SQL command-line client, located a
 Let's now run `bin/dsql`; you should see the following prompt:
-```
+```bash
 Welcome to dsql, the command-line client for Druid SQL.
 Type "\h" for help.
 dsql>
@@ -161,7 +161,7 @@ dsql>
 To submit the query, paste it to the `dsql` prompt and press enter:
-```
+```bash
 dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
 ┌──────────────────────────────────────────────────────────┬───────┐
 │ page │ Edits │
@@ -186,7 +186,7 @@ Retrieved 10 rows in 0.06s.
 `SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);`
-```
+```bash
 dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
 ┌──────────────────────────┬──────────────┐
 │ HourTime │ LinesDeleted │
@@ -223,7 +223,7 @@ Retrieved 24 rows in 0.08s.
 `SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;`
-```
+```bash
 dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;
 ┌───────────────┬─────────┐
 │ channel │ EXPR$1 │
@@ -241,7 +241,7 @@ Retrieved 5 rows in 0.05s.
 ` SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;`
-```
+```bash
 dsql> SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;
 ┌────────────────────────┬────────────────────────────────────────────────────────┐
 │ user │ page │
@@ -263,7 +263,7 @@ Using the TopN query above as an example:
 `EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;`
-```
+```bash
 dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
 ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 │ PLAN │
@@ -275,6 +275,6 @@ Retrieved 1 row in 0.03s.
 ## Further reading
-The [Queries documentation](/docs/VERSION/querying/querying.html) has more information on Druid's native JSON queries.
-The [Druid SQL documentation](/docs/VERSION/querying/sql.html) has more information on using Druid SQL queries.
+The [Queries documentation](../querying/querying.html) has more information on Druid's native JSON queries.
+The [Druid SQL documentation](../querying/sql.html) has more information on using Druid SQL queries.

View File

@@ -9,7 +9,7 @@ This tutorial demonstrates how to configure retention rules on a datasource to s
 For this tutorial, we'll assume you've already downloaded Druid as described in
 the [single-machine quickstart](index.html) and have it running on your local machine.
-It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
+It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.html) and [Tutorial: Querying data](../tutorials/tutorial-query.html).
 ## Load the example data
@@ -17,7 +17,7 @@ For this tutorial, we'll be using the Wikipedia edits sample data, with an inges
 The ingestion spec can be found at `quickstart/retention-index.json`. Let's submit that spec, which will create a datasource called `retention-tutorial`:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/retention-index.json
 ```
@@ -67,13 +67,12 @@ The segments for the first 12 hours of 2015-09-12 are now gone:
 The resulting retention rule chain is the following:
-```
-loadByInterval 2015-09-12T12/2015-09-13 (12 hours)
-dropForever
-loadForever (default rule)
-```
+1. loadByInterval 2015-09-12T12/2015-09-13 (12 hours)
+2. dropForever
+3. loadForever (default rule)
 The rule chain is evaluated from top to bottom, with the default rule chain always added at the bottom.
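Expressed as the JSON rule array that the coordinator's rules API accepts, the first two entries of a chain like this correspond roughly to the following sketch (the interval string and replicant settings are assumptions):

```json
[
  {
    "type" : "loadByInterval",
    "interval" : "2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z",
    "tieredReplicants" : { "_default_tier" : 2 }
  },
  { "type" : "dropForever" }
]
```

The trailing `loadForever` entry is the cluster-wide default rule, so it is not part of the datasource's own rule list.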
@@ -89,4 +88,4 @@ If instead you want to retain data based on how old it is (e.g., retain data tha
 ## Further reading
-* [Load rules](/docs/VERSION/operations/rule-configuration.html)
+* [Load rules](../operations/rule-configuration.html)

View File

@@ -11,13 +11,13 @@ This tutorial will demonstrate the effects of roll-up on an example dataset.
 For this tutorial, we'll assume you've already downloaded Druid as described in
 the [single-machine quickstart](index.html) and have it running on your local machine.
-It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
+It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.html) and [Tutorial: Querying data](../tutorials/tutorial-query.html).
 ## Example data
 For this tutorial, we'll use a small sample of network flow event data, representing packet and byte counts for traffic from a source to a destination IP address that occurred within a particular second.
-```
+```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
@@ -33,7 +33,7 @@ A file containing this sample input data is located at `quickstart/tutorial/roll
 We'll ingest this data using the following ingestion task spec, located at `quickstart/tutorial/rollup-index.json`.
-```
+```json
 {
 "type" : "index",
 "spec" : {
@@ -95,9 +95,9 @@ We will see how these definitions are used after we load this data.
 ## Load the example data
-From the druid-${DRUIDVERSION} package root, run the following command:
+From the druid-#{DRUIDVERSION} package root, run the following command:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/rollup-index.json
 ```
@@ -107,7 +107,7 @@ After the script completes, we will query the data.
 Let's run `bin/dsql` and issue a `select * from "rollup-tutorial";` query to see what data was ingested.
-```
+```bash
 $ bin/dsql
 Welcome to dsql, the command-line client for Druid SQL.
 Type "\h" for help.
@@ -128,7 +128,7 @@ dsql>
 Let's look at the three events in the original input data that occurred during `2018-01-01T01:01`:
-```
+```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
@@ -136,7 +136,7 @@ Let's look at the three events in the original input data that occurred during `
 These three rows have been "rolled up" into the following row:
-```
+```bash
 ┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
 │ __time │ bytes │ count │ dstIP │ packets │ srcIP │
 ├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
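The hunk ends before the data row of that table. With the `count`, `packets`, and `bytes` columns in the header being a row count and sums (as roll-up implies), the three input rows above collapse into a single row with count 3, packets 20 + 255 + 11 = 286, and bytes 9024 + 21133 + 5780 = 35937, with the timestamp floored to the minute bucket. A sketch of that rolled-up row written in the same shape as the raw input (the dsql output renders it as a table instead):

```json
{"timestamp":"2018-01-01T01:01:00Z","srcIP":"1.1.1.1","dstIP":"2.2.2.2","count":3,"packets":286,"bytes":35937}
```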
@@ -150,12 +150,12 @@ Before the grouping occurs, the timestamps of the original input data are bucket
 Likewise, these two events that occurred during `2018-01-01T01:02` have been rolled up:
-```
+```json
 {"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
 {"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
 ```
-```
+```bash
 ┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
 │ __time │ bytes │ count │ dstIP │ packets │ srcIP │
 ├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
@@ -165,11 +165,11 @@ For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no roll-up too
 For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no roll-up took place, because this was the only event that occurred during `2018-01-01T01:03`:
-```
+```json
 {"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
 ```
-```
+```bash
 ┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
 │ __time │ bytes │ count │ dstIP │ packets │ srcIP │
 ├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤

View File

@@ -18,7 +18,7 @@ don't need to have loaded any data yet.
 In the Druid package root, run the following commands:
-```
+```bash
 curl http://static.druid.io/tranquility/releases/tranquility-distribution-0.8.2.tgz -o tranquility-distribution-0.8.2.tgz
 tar -xzf tranquility-distribution-0.8.2.tgz
 mv tranquility-distribution-0.8.2 tranquility
@@ -33,8 +33,8 @@ The startup scripts for the tutorial will expect the contents of the Tranquility
 As part of the output of *supervise* you should see something like:
-```
+```bash
-Running command[tranquility-server], logging to[/stage/druid-{DRUIDVERSION}/var/sv/tranquility-server.log]: tranquility/bin/tranquility server -configFile quickstart/tutorial/conf/tranquility/server.json -Ddruid.extensions.loadList=[]
+Running command[tranquility-server], logging to[/stage/druid-#{DRUIDVERSION}/var/sv/tranquility-server.log]: tranquility/bin/tranquility server -configFile quickstart/tutorial/conf/tranquility/server.json -Ddruid.extensions.loadList=[]
 ```
 You can check the log file in `var/sv/tranquility-server.log` to confirm that the server is starting up properly.
@@ -43,14 +43,14 @@ You can check the log file in `var/sv/tranquility-server.log` to confirm that th
 Let's send the sample Wikipedia edits data to Tranquility:
-```
+```bash
 gunzip -k quickstart/wikiticker-2015-09-12-sampled.json.gz
 curl -XPOST -H'Content-Type: application/json' --data-binary @quickstart/wikiticker-2015-09-12-sampled.json http://localhost:8200/v1/post/wikipedia
 ```
 Which will print something like:
-```
+```json
 {"result":{"received":39244,"sent":39244}}
 ```
@@ -64,13 +64,13 @@ Once the data is sent to Druid, you can immediately query it.
 If you see a `sent` count of 0, retry the send command until the `sent` count also shows 39244:
-```
+```json
 {"result":{"received":39244,"sent":0}}
 ```
 ## Querying your data
-Please follow the [query tutorial](../tutorial/tutorial-query.html) to run some example queries on the newly loaded data.
+Please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.
 ## Cleanup

View File

@@ -9,13 +9,13 @@ This tutorial will demonstrate how to use transform specs to filter and transfor
 For this tutorial, we'll assume you've already downloaded Druid as described in
 the [single-machine quickstart](index.html) and have it running on your local machine.
-It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
+It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.html) and [Tutorial: Querying data](../tutorials/tutorial-query.html).
 ## Sample data
 We've included sample data for this tutorial at `quickstart/tutorial/transform-data.json`, reproduced here for convenience:
-```
+```json
 {"timestamp":"2018-01-01T07:01:35Z","animal":"octopus", "location":1, "number":100}
 {"timestamp":"2018-01-01T05:01:35Z","animal":"mongoose", "location":2,"number":200}
 {"timestamp":"2018-01-01T06:01:35Z","animal":"snake", "location":3, "number":300}
@@ -26,7 +26,7 @@ We've included sample data for this tutorial at `quickstart/tutorial/transform-d
 We will ingest the sample data using the following spec, which demonstrates the use of transform specs:
-```
+```json
 {
 "type" : "index",
 "spec" : {
@@ -115,7 +115,7 @@ This filter selects the first 3 rows, and it will exclude the final "lion" row i
 Let's submit this task now, which has been included at `quickstart/tutorial/transform-index.json`:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/transform-index.json
 ```
@@ -123,7 +123,7 @@ bin/post-index-task --file quickstart/tutorial/transform-index.json
 Let's run `bin/dsql` and issue a `select * from "transform-tutorial";` query to see what was ingested:
-```
+```bash
 dsql> select * from "transform-tutorial";
 ┌──────────────────────────┬────────────────┬───────┬──────────┬────────┬───────────────┐
 │ __time │ animal │ count │ location │ number │ triple-number │

View File

@@ -9,7 +9,7 @@ This tutorial demonstrates how to update existing data, showing both overwrites
 For this tutorial, we'll assume you've already downloaded Druid as described in
 the [single-machine quickstart](index.html) and have it running on your local machine.
-It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html), [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html), and [Tutorial: Rollup](/docs/VERSION/tutorials/tutorial-rollup.html).
+It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.html), [Tutorial: Querying data](../tutorials/tutorial-query.html), and [Tutorial: Rollup](../tutorials/tutorial-rollup.html).
 ## Overwrite
@@ -23,13 +23,13 @@ The spec we'll use for this tutorial is located at `quickstart/tutorial/updates-
 Let's submit that task:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/updates-init-index.json
 ```
 We have three initial rows containing an "animal" dimension and "number" metric:
-```
+```bash
 dsql> select * from "updates-tutorial";
 ┌──────────────────────────┬──────────┬───────┬────────┐
 │ __time │ animal │ count │ number │
@@ -51,13 +51,13 @@ Note that this task reads input from `quickstart/tutorial/updates-data2.json`, a
 Let's submit that task:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/updates-overwrite-index.json
 ```
 When Druid finishes loading the new segment from this overwrite task, the "tiger" row now has the value "lion", the "aardvark" row has a different number, and the "giraffe" row has been replaced. It may take a couple of minutes for the changes to take effect:
-```
+```bash
 dsql> select * from "updates-tutorial";
 ┌──────────────────────────┬──────────┬───────┬────────┐
 │ __time │ animal │ count │ number │
@@ -77,13 +77,13 @@ The `quickstart/tutorial/updates-append-index.json` task spec has been configure
 Let's submit that task:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/updates-append-index.json
 ```
 When Druid finishes loading the new segment from this overwrite task, the new rows will have been added to the datasource. Note that roll-up occurred for the "lion" row:
-```
+```bash
 dsql> select * from "updates-tutorial";
 ┌──────────────────────────┬──────────┬───────┬────────┐
 │ __time │ animal │ count │ number │
@@ -106,13 +106,13 @@ The `quickstart/tutorial/updates-append-index2.json` task spec reads input from 
 Let's submit that task:
-```
+```bash
 bin/post-index-task --file quickstart/tutorial/updates-append-index2.json
 ```
 When the new data is loaded, we can see two additional rows after "octopus". Note that the new "bear" row with number 222 has not been rolled up with the existing bear-111 row, because the new data is held in a separate segment.
-```
+```bash
 dsql> select * from "updates-tutorial";
 ┌──────────────────────────┬──────────┬───────┬────────┐
 │ __time │ animal │ count │ number │
@@ -132,7 +132,7 @@ Retrieved 8 rows in 0.02s.
 If we run a GroupBy query instead of a `select *`, we can see that the "bear" rows will group together at query time:
-```
+```bash
 dsql> select __time, animal, SUM("count"), SUM("number") from "updates-tutorial" group by __time, animal;
 ┌──────────────────────────┬──────────┬────────┬────────┐
 │ __time │ animal │ EXPR$2 │ EXPR$3 │