2018-12-13 14:47:20 -05:00
---
2019-08-21 00:48:59 -04:00
id: tutorial-query
2018-12-13 14:47:20 -05:00
title: "Tutorial: Querying data"
2019-08-21 00:48:59 -04:00
sidebar_label: "Querying data"
2018-12-13 14:47:20 -05:00
---
2018-11-13 12:38:37 -05:00
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
This tutorial will demonstrate how to query data in Apache Druid (incubating), with examples for Druid SQL and Druid's native query format.
2018-08-09 16:37:52 -04:00
The tutorial assumes that you've already completed one of the 4 ingestion tutorials, as we will be querying the sample Wikipedia edits data.
2019-08-21 00:48:59 -04:00
* [Tutorial: Loading a file ](../tutorials/tutorial-batch.md )
* [Tutorial: Loading stream data from Kafka ](../tutorials/tutorial-kafka.md )
* [Tutorial: Loading a file using Hadoop ](../tutorials/tutorial-batch-hadoop.md )
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
Druid queries are sent over HTTP.
2019-08-21 00:48:59 -04:00
The Druid console includes a view to issue queries to Druid and nicely format the results.
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
## Druid SQL queries
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
Druid supports a dialect of SQL for querying.
2018-08-09 16:37:52 -04:00
This query retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.
2019-06-17 21:00:54 -04:00
```sql
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page ORDER BY Edits DESC
LIMIT 10
2018-08-09 16:37:52 -04:00
```
2019-06-17 21:00:54 -04:00
Let's look at the different ways to issue this query.
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
### Query SQL via the console
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
You can issue the above query from the console.
2019-08-21 00:48:59 -04:00
![Query autocomplete ](../assets/tutorial-query-01.png "Query autocomplete" )
2019-06-17 21:00:54 -04:00
The console query view provides autocomplete together with inline function documentation.
You can also configure extra context flags to be sent with the query from the more options menu.
2019-08-21 00:48:59 -04:00
![Query options ](../assets/tutorial-query-02.png "Query options" )
2019-06-17 21:00:54 -04:00
2019-08-21 00:48:59 -04:00
Note that the console will by default wrap your SQL queries in a limit so that you can issue queries like `SELECT * FROM wikipedia` without much hesitation - you can turn off this behaviour.
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
### Query SQL via dsql
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
For convenience, the Druid package includes a SQL command-line client, located at `bin/dsql` from the Druid package root.
Let's now run `bin/dsql` ; you should see the following prompt:
```bash
Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
2019-08-21 00:48:59 -04:00
dsql>
2018-08-09 16:37:52 -04:00
```
2019-06-17 21:00:54 -04:00
To submit the query, paste it to the `dsql` prompt and press enter:
```bash
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung │ 33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
│ Jeremy Corbyn │ 27 │
│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
│ Flavia Pennetta │ 20 │
│ Total Drama Presents: The Ridonculous Race │ 18 │
│ User talk:Dudeperson176123 │ 18 │
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
│ Wikipedia:In the news/Candidates │ 17 │
│ Wikipedia:Requests for page protection │ 17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.
2018-08-09 16:37:52 -04:00
```
2019-06-17 21:00:54 -04:00
### Query SQL over HTTP
The SQL queries are submitted as JSON over HTTP.
2018-08-09 16:37:52 -04:00
2019-01-30 22:41:07 -05:00
The tutorial package includes an example file that contains the SQL query shown above at `quickstart/tutorial/wikipedia-top-pages-sql.json` . Let's submit that query to the Druid Broker:
2018-08-09 16:37:52 -04:00
```bash
2019-06-17 21:00:54 -04:00
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia -top-pages-sql.json http://localhost:8888/druid/v2/sql
2018-08-09 16:37:52 -04:00
```
The following results should be returned:
2018-08-13 14:11:32 -04:00
```json
2018-08-09 16:37:52 -04:00
[
{
"page": "Wikipedia:Vandalismusmeldung",
"Edits": 33
},
{
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"Edits": 28
},
{
"page": "Jeremy Corbyn",
"Edits": 27
},
{
"page": "Wikipedia:Administrators' noticeboard/Incidents",
"Edits": 21
},
{
"page": "Flavia Pennetta",
"Edits": 20
},
{
"page": "Total Drama Presents: The Ridonculous Race",
"Edits": 18
},
{
"page": "User talk:Dudeperson176123",
"Edits": 18
},
{
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"Edits": 18
},
{
"page": "Wikipedia:In the news/Candidates",
"Edits": 17
},
{
"page": "Wikipedia:Requests for page protection",
"Edits": 17
}
]
```
2019-06-17 21:00:54 -04:00
### More Druid SQL examples
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
Here is a collection of queries to try out:
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
#### Query over time
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
```sql
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1
2018-08-09 16:37:52 -04:00
```
2019-08-21 00:48:59 -04:00
![Query example ](../assets/tutorial-query-03.png "Query example" )
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
#### General group by
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
```sql
SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC
2018-08-09 16:37:52 -04:00
```
2019-08-21 00:48:59 -04:00
![Query example ](../assets/tutorial-query-04.png "Query example" )
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
#### Select raw data
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
```sql
SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5
2018-08-09 16:37:52 -04:00
```
2019-08-21 00:48:59 -04:00
![Query example ](../assets/tutorial-query-05.png "Query example" )
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
### Explain query plan
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
Druid SQL has the ability to explain the query plan for a given query.
In the console this functionality is accessible from the `...` button.
2018-08-09 16:37:52 -04:00
2019-08-21 00:48:59 -04:00
![Explain query ](../assets/tutorial-query-06.png "Explain query" )
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
If you are querying in other ways you can get the plan by prepending `EXPLAIN PLAN FOR ` to a Druid SQL query.
2018-08-09 16:37:52 -04:00
2019-06-17 21:00:54 -04:00
Using a query from an example above:
2018-08-09 16:37:52 -04:00
`EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;`
2018-08-13 14:11:32 -04:00
```bash
2018-08-09 16:37:52 -04:00
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.03s.
```
2019-06-17 21:00:54 -04:00
## Native JSON queries
Druid's native query format is expressed in JSON.
### Native query via the console
You can issue native Druid queries from the console's Query view.
Here is a query that retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.
```json
{
"queryType" : "topN",
"dataSource" : "wikipedia",
"intervals" : ["2015-09-12/2015-09-13"],
"granularity" : "all",
"dimension" : "page",
"metric" : "count",
"threshold" : 10,
"aggregations" : [
{
"type" : "count",
"name" : "count"
}
]
}
```
Simply paste it into the console to switch the editor into JSON mode.
2019-08-21 00:48:59 -04:00
![Native query ](../assets/tutorial-query-07.png "Native query" )
2019-06-17 21:00:54 -04:00
### Native queries over HTTP
We have included a sample native TopN query under `quickstart/tutorial/wikipedia-top-pages.json` :
Let's submit this query to Druid:
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia -top-pages.json http://localhost:8888/druid/v2?pretty
```
You should see the following query results:
```json
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"count" : 33,
"page" : "Wikipedia:Vandalismusmeldung"
}, {
"count" : 28,
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
}, {
"count" : 27,
"page" : "Jeremy Corbyn"
}, {
"count" : 21,
"page" : "Wikipedia:Administrators' noticeboard/Incidents"
}, {
"count" : 20,
"page" : "Flavia Pennetta"
}, {
"count" : 18,
"page" : "Total Drama Presents: The Ridonculous Race"
}, {
"count" : 18,
"page" : "User talk:Dudeperson176123"
}, {
"count" : 18,
"page" : "Wikipédia:Le Bistro/12 septembre 2015"
}, {
"count" : 17,
"page" : "Wikipedia:In the news/Candidates"
}, {
"count" : 17,
"page" : "Wikipedia:Requests for page protection"
} ]
} ]
```
2018-08-09 16:37:52 -04:00
## Further reading
2019-08-21 00:48:59 -04:00
The [Queries documentation ](../querying/querying.md ) has more information on Druid's native JSON queries.
2018-08-09 16:37:52 -04:00
2019-08-21 00:48:59 -04:00
The [Druid SQL documentation ](../querying/sql.md ) has more information on using Druid SQL queries.