---
layout: doc_page
---

# Tutorial: All About Queries
Hello! This tutorial is meant to provide a more in-depth look into Druid queries. The tutorial is somewhat incomplete right now but we hope to add more content to it in the near future.

Setup
-----

Before we start digging into how to query Druid, make sure you've gone through the other tutorials and are comfortable with spinning up a local cluster and loading data into Druid.

#### Booting a Druid Cluster

Let's start up a simple Druid cluster so we can query all the things.

Note: If Zookeeper and metadata storage aren't running, you'll have to start them again as described in [The Druid Cluster](../tutorials/tutorial-the-druid-cluster.html).

To start a Coordinator node:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/coordinator:lib/* io.druid.cli.Main server coordinator
```

To start a Historical node:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/historical:lib/* io.druid.cli.Main server historical
```

To start a Broker node:

```bash
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/broker:lib/* io.druid.cli.Main server broker
```

Querying Your Data
------------------

Make sure you've completed [Loading Your Data](../tutorials/tutorial-loading-streaming-data.html) so we have some data to query. Having done that, it's time to query our data! For a complete specification of queries, see [Querying](../querying/querying.html).

#### Construct a Query
```json
{
    "queryType": "groupBy",
    "dataSource": "wikipedia",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "longSum", "name": "edit_count", "fieldName": "count"},
        {"type": "doubleSum", "name": "chars_added", "fieldName": "added"}
    ],
    "intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```
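
If you would rather generate the query file than type it by hand, the following is a minimal sketch in Python. It assumes nothing about Druid beyond the query shown above; the names `QUERY` and `render_query` are our own, and the `query.body` file name is chosen to match the `curl` command used in this tutorial.

```python
import json

# The same groupBy query as above, expressed as a Python dictionary.
QUERY = {
    "queryType": "groupBy",
    "dataSource": "wikipedia",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "longSum", "name": "edit_count", "fieldName": "count"},
        {"type": "doubleSum", "name": "chars_added", "fieldName": "added"},
    ],
    "intervals": ["2010-01-01T00:00/2020-01-01T00"],
}


def render_query(query):
    """Serialize a query dictionary to the JSON text that Druid expects."""
    return json.dumps(query, indent=4)


if __name__ == "__main__":
    # Write the query to the file that curl sends with `-d @query.body`.
    with open("query.body", "w") as f:
        f.write(render_query(QUERY))
```

Keeping the query as a dictionary makes it easy to tweak one field, such as `dimensions`, and regenerate the file.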

#### Query That Data
Save the query above to a file named `query.body`, then run it against your broker:

```bash
curl -X POST "http://localhost:8082/druid/v2/?pretty" -H 'Content-type: application/json' -d @query.body
```

And get:

```json
[ {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : {
    "chars_added" : 1545.0,
    "edit_count" : 5,
    "rows" : 5
  }
} ]
```

This result tells us that the queried interval contains 5 edits (`edit_count`, the sum of the ingested `count` metric) stored in 5 rows of data (`rows`). Across those 5 edits, 1545 characters were added.
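
To work with the result programmatically, we can parse the JSON and read the `event` object. The sketch below uses the response shown above as its input; the `summarize` function and the derived `chars_per_edit` value are our own illustration, not part of Druid's response.

```python
import json

# The response shown above, pasted in as a string.
RESPONSE_TEXT = """
[ {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : {
    "chars_added" : 1545.0,
    "edit_count" : 5,
    "rows" : 5
  }
} ]
"""


def summarize(response_text):
    """Return the aggregated values from the first (and only) result row."""
    results = json.loads(response_text)
    event = results[0]["event"]
    return {
        "rows": event["rows"],
        "edit_count": event["edit_count"],
        "chars_added": event["chars_added"],
        # Derived value: average number of characters added per edit.
        "chars_per_edit": event["chars_added"] / event["edit_count"],
    }


if __name__ == "__main__":
    print(summarize(RESPONSE_TEXT))
```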

#### What can I query for?

How are we to know what queries we can run? Although [Querying](../querying/querying.html) is a helpful index, to get a handle on querying our data we need to look at our ingestion schema. There are a few particular fields we care about in the ingestion schema. All of these fields should be present in both the real-time ingestion schema and the batch ingestion schema.

Datasource:

```json
"dataSource":"wikipedia"
```

The dataSource is the name of the relation/table, or 'source of data'. The name we give our data source at ingestion time must match the dataSource we use in our queries.

Granularity:

```json
"indexGranularity": "none",
```

Druid rolls up data at ingestion time to the index (rollup) granularity; specifying "none" keeps events at their original timestamps. Your query granularity cannot be finer than your index granularity.
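
To build some intuition for roll-up, here is a small simulation in plain Python. It is only a sketch of the idea and not how Druid implements it: the events are hypothetical, and `truncate_to_hour` and `roll_up` are names we made up. Events that share a truncated timestamp and the same dimension value are combined into one row, and their metrics are summed.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical input events: (timestamp, namespace, characters added).
EVENTS = [
    ("2013-08-31T01:02:33Z", "article", 57),
    ("2013-08-31T01:40:10Z", "article", 123),
    ("2013-08-31T03:32:45Z", "wikipedia", 459),
]

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def truncate_to_hour(timestamp):
    """Floor a UTC timestamp string to the start of its hour."""
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    return parsed.replace(minute=0, second=0).strftime(TIME_FORMAT)


def roll_up(events, truncate):
    """Combine events that share a truncated timestamp and dimension value."""
    rolled = defaultdict(lambda: {"count": 0, "added": 0})
    for timestamp, namespace, added in events:
        key = (truncate(timestamp), namespace)
        rolled[key]["count"] += 1
        rolled[key]["added"] += added
    return dict(rolled)


# Hourly granularity: the two "article" events fall into the same hour.
hourly = roll_up(EVENTS, truncate_to_hour)
# No truncation: timestamps are kept as they are, so nothing is combined here.
unrolled = roll_up(EVENTS, lambda timestamp: timestamp)

if __name__ == "__main__":
    print(len(hourly), "rows after hourly roll-up")
    print(len(unrolled), "rows without truncation")
```

In this simulation the rolled-up `count` still adds up to the number of original events, which is why the queries in this tutorial sum the `count` metric to get an edit count.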

Aggregators:

```json
"aggregators" : [{
   "type" : "count",
   "name" : "count"
  }, {
   "type" : "doubleSum",
   "name" : "added",
   "fieldName" : "added"
  }, {
   "type" : "doubleSum",
   "name" : "deleted",
   "fieldName" : "deleted"
  }, {
   "type" : "doubleSum",
   "name" : "delta",
   "fieldName" : "delta"
}]
```

The [Aggregations](../querying/aggregations.html) specified at ingestion time correlate directly to the metrics that can be queried.
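
One way to picture this relationship: the `fieldName` of each query-time aggregation should match the `name` of an aggregator from the ingestion schema. The check below is a simplified sketch under that assumption; `unknown_metrics` is a hypothetical helper of our own, and the `edits` field in the second example is deliberately wrong.

```python
# Aggregators from the ingestion schema above.
INGESTION_AGGREGATORS = [
    {"type": "count", "name": "count"},
    {"type": "doubleSum", "name": "added", "fieldName": "added"},
    {"type": "doubleSum", "name": "deleted", "fieldName": "deleted"},
    {"type": "doubleSum", "name": "delta", "fieldName": "delta"},
]


def unknown_metrics(query_aggregations, ingestion_aggregators):
    """List query fieldNames that match no metric name from ingestion."""
    available = {agg["name"] for agg in ingestion_aggregators}
    return [
        agg["fieldName"]
        for agg in query_aggregations
        if "fieldName" in agg and agg["fieldName"] not in available
    ]


# Aggregations from the groupBy query used in this tutorial.
good = [
    {"type": "count", "name": "rows"},
    {"type": "longSum", "name": "edit_count", "fieldName": "count"},
    {"type": "doubleSum", "name": "chars_added", "fieldName": "added"},
]
# A hypothetical mistake: "edits" was never defined at ingestion time.
bad = [{"type": "longSum", "name": "edit_count", "fieldName": "edits"}]

if __name__ == "__main__":
    print(unknown_metrics(good, INGESTION_AGGREGATORS))
    print(unknown_metrics(bad, INGESTION_AGGREGATORS))
```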

Dimensions:

```json
"dimensions" : ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"]
```

These are the dimensions that we can group and filter our data on. If we add a dimension to our groupBy query, it looks like this:

```json
{
    "queryType": "groupBy",
    "dataSource": "wikipedia",
    "granularity": "all",
    "dimensions": ["namespace"],
    "aggregations": [
        {"type": "longSum", "name": "edit_count", "fieldName": "count"},
        {"type": "doubleSum", "name": "chars_added", "fieldName": "added"}
    ],
    "intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```

Which gets us data grouped over the namespace dimension in return!

```json
[ {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : {
    "chars_added" : 180.0,
    "edit_count" : 2,
    "namespace" : "article"
  }
}, {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : {
    "chars_added" : 1365.0,
    "edit_count" : 3,
    "namespace" : "wikipedia"
  }
} ]
```
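
A quick sanity check on grouped results is that the groups add up to the ungrouped totals. The sketch below parses the response above and indexes it by dimension value; `by_dimension` is a hypothetical helper, not a Druid API.

```python
import json

# The grouped response shown above, pasted in as a string.
GROUPED_TEXT = """
[ {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : { "chars_added" : 180.0, "edit_count" : 2, "namespace" : "article" }
}, {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : { "chars_added" : 1365.0, "edit_count" : 3, "namespace" : "wikipedia" }
} ]
"""


def by_dimension(response_text, dimension):
    """Map each value of the grouped dimension to its event dictionary."""
    return {
        row["event"][dimension]: row["event"]
        for row in json.loads(response_text)
    }


groups = by_dimension(GROUPED_TEXT, "namespace")
total_edits = sum(event["edit_count"] for event in groups.values())
total_chars = sum(event["chars_added"] for event in groups.values())

if __name__ == "__main__":
    print(sorted(groups))
    print(total_edits, total_chars)
```

Here the two groups sum to 5 edits and 1545 characters added, matching the result of the ungrouped query.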

We can also filter our query to narrow down our metric values:

```json
{
    "queryType": "groupBy",
    "dataSource": "wikipedia",
    "granularity": "all",
    "filter": { "type": "selector", "dimension": "namespace", "value": "article" },
    "aggregations": [
        {"type": "longSum", "name": "edit_count", "fieldName": "count"},
        {"type": "doubleSum", "name": "chars_added", "fieldName": "added"}
    ],
    "intervals": ["2010-01-01T00:00/2020-01-01T00"]
}
```

Which gets us metrics about only those edits where the namespace is 'article':

```json
[ {
  "version" : "v1",
  "timestamp" : "2010-01-01T00:00:00.000Z",
  "event" : {
    "chars_added" : 180.0,
    "edit_count" : 2
  }
} ]
```
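
Filters can also be combined. As a sketch, the helpers below build the selector filter used above and wrap several filters in an `and` filter. The helper names are our own, and the `language` value `"en"` is a hypothetical example; `language` itself comes from the dimension list in the ingestion schema.

```python
import json


def selector(dimension, value):
    """Build a selector filter matching rows where dimension equals value."""
    return {"type": "selector", "dimension": dimension, "value": value}


def and_filter(*filters):
    """Build an 'and' filter that matches rows passing every given filter."""
    return {"type": "and", "fields": list(filters)}


# Match edits in the 'article' namespace that are also in English.
# The value "en" is an assumption about what the data contains.
combined = and_filter(
    selector("namespace", "article"),
    selector("language", "en"),
)

if __name__ == "__main__":
    # The printed JSON can be used as the "filter" field of a query.
    print(json.dumps(combined, indent=4))
```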

Check out [Filters](../querying/filters.html) for more information.

What Types of Queries to Use
----------------------------

The type of query you should use depends on your use case. [TimeBoundary queries](../querying/timeboundaryquery.html) are useful for understanding the time range your data covers. [Timeseries queries](../querying/timeseriesquery.html) are useful for aggregates and filters over a time range, and offer significant speed improvements over [GroupBy queries](../querying/groupbyquery.html) when you do not need to group by a dimension. To find the top values for a given dimension, [TopN queries](../querying/topnquery.html) should be used over GroupBy queries as well.
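
For comparison, here is a rough sketch of what each of these query types could look like for our `wikipedia` data source, written as Python dictionaries that serialize to the JSON Druid expects. The builder functions are our own, and the `page` dimension and threshold of 5 in the example are assumptions chosen for illustration; see the linked pages for the full set of fields.

```python
import json

INTERVALS = ["2010-01-01T00:00/2020-01-01T00"]
AGGREGATIONS = [
    {"type": "longSum", "name": "edit_count", "fieldName": "count"},
    {"type": "doubleSum", "name": "chars_added", "fieldName": "added"},
]


def time_boundary(data_source):
    """Ask for the earliest and latest timestamps in the data source."""
    return {"queryType": "timeBoundary", "dataSource": data_source}


def timeseries(data_source, granularity="all"):
    """Aggregate metrics over time without grouping by any dimension."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": granularity,
        "aggregations": AGGREGATIONS,
        "intervals": INTERVALS,
    }


def top_n(data_source, dimension, metric, threshold):
    """Ask for the top `threshold` values of one dimension, ranked by a metric."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        "granularity": "all",
        "aggregations": AGGREGATIONS,
        "intervals": INTERVALS,
    }


if __name__ == "__main__":
    # Hypothetical example: the 5 pages with the most edits.
    print(json.dumps(top_n("wikipedia", "page", "edit_count", 5), indent=4))
```

Each of these can be saved to a file and posted to the broker in the same way as the groupBy query earlier in this tutorial.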


## Learn More ##

You can learn more about querying at [Querying](../querying/querying.html)! If you are ready to evaluate Druid more in depth, check out [Booting a production cluster](../tutorials/booting-a-production-cluster.html)!