mirror of https://github.com/apache/druid.git
280 lines
17 KiB
Markdown
280 lines
17 KiB
Markdown
---
|
|
layout: doc_page
|
|
---
|
|
|
|
# Tutorial: Querying data
|
|
|
|
This tutorial will demonstrate how to query data in Druid, with examples for Druid's native query format and Druid SQL.
|
|
|
|
The tutorial assumes that you've already completed one of the 4 ingestion tutorials, as we will be querying the sample Wikipedia edits data.
|
|
|
|
* [Tutorial: Loading a file](../tutorials/tutorial-batch.html)
|
|
* [Tutorial: Loading stream data from Kafka](../tutorials/tutorial-kafka.html)
|
|
* [Tutorial: Loading a file using Hadoop](../tutorials/tutorial-batch-hadoop.html)
|
|
* [Tutorial: Loading stream data using Tranquility](../tutorials/tutorial-tranquility.html)
|
|
|
|
## Native JSON queries
|
|
|
|
Druid's native query format is expressed in JSON. We have included a sample native TopN query under `quickstart/tutorial/wikipedia-top-pages.json`:
|
|
|
|
```json
|
|
{
|
|
"queryType" : "topN",
|
|
"dataSource" : "wikipedia",
|
|
"intervals" : ["2015-09-12/2015-09-13"],
|
|
"granularity" : "all",
|
|
"dimension" : "page",
|
|
"metric" : "count",
|
|
"threshold" : 10,
|
|
"aggregations" : [
|
|
{
|
|
"type" : "count",
|
|
"name" : "count"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
This query retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.
|
|
|
|
Let's submit this query to the Druid broker:
|
|
|
|
```bash
|
|
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8082/druid/v2?pretty
|
|
```
|
|
|
|
You should see the following query results:
|
|
|
|
```json
|
|
[ {
|
|
"timestamp" : "2015-09-12T00:46:58.771Z",
|
|
"result" : [ {
|
|
"count" : 33,
|
|
"page" : "Wikipedia:Vandalismusmeldung"
|
|
}, {
|
|
"count" : 28,
|
|
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
|
|
}, {
|
|
"count" : 27,
|
|
"page" : "Jeremy Corbyn"
|
|
}, {
|
|
"count" : 21,
|
|
"page" : "Wikipedia:Administrators' noticeboard/Incidents"
|
|
}, {
|
|
"count" : 20,
|
|
"page" : "Flavia Pennetta"
|
|
}, {
|
|
"count" : 18,
|
|
"page" : "Total Drama Presents: The Ridonculous Race"
|
|
}, {
|
|
"count" : 18,
|
|
"page" : "User talk:Dudeperson176123"
|
|
}, {
|
|
"count" : 18,
|
|
"page" : "Wikipédia:Le Bistro/12 septembre 2015"
|
|
}, {
|
|
"count" : 17,
|
|
"page" : "Wikipedia:In the news/Candidates"
|
|
}, {
|
|
"count" : 17,
|
|
"page" : "Wikipedia:Requests for page protection"
|
|
} ]
|
|
} ]
|
|
```
|
|
|
|
## Druid SQL queries
|
|
|
|
Druid also supports a dialect of SQL for querying. Let's run a SQL query that is equivalent to the native JSON query shown above:
|
|
|
|
```
|
|
SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
|
|
```
|
|
|
|
The SQL queries are submitted as JSON over HTTP.
|
|
|
|
### TopN query example
|
|
|
|
The tutorial package includes an example file that contains the SQL query shown above at `quickstart/wikipedia-top-pages-sql.json`. Let's submit that query to the Druid broker:
|
|
|
|
```bash
|
|
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql
|
|
```
|
|
|
|
The following results should be returned:
|
|
|
|
```json
|
|
[
|
|
{
|
|
"page": "Wikipedia:Vandalismusmeldung",
|
|
"Edits": 33
|
|
},
|
|
{
|
|
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
|
|
"Edits": 28
|
|
},
|
|
{
|
|
"page": "Jeremy Corbyn",
|
|
"Edits": 27
|
|
},
|
|
{
|
|
"page": "Wikipedia:Administrators' noticeboard/Incidents",
|
|
"Edits": 21
|
|
},
|
|
{
|
|
"page": "Flavia Pennetta",
|
|
"Edits": 20
|
|
},
|
|
{
|
|
"page": "Total Drama Presents: The Ridonculous Race",
|
|
"Edits": 18
|
|
},
|
|
{
|
|
"page": "User talk:Dudeperson176123",
|
|
"Edits": 18
|
|
},
|
|
{
|
|
"page": "Wikipédia:Le Bistro/12 septembre 2015",
|
|
"Edits": 18
|
|
},
|
|
{
|
|
"page": "Wikipedia:In the news/Candidates",
|
|
"Edits": 17
|
|
},
|
|
{
|
|
"page": "Wikipedia:Requests for page protection",
|
|
"Edits": 17
|
|
}
|
|
]
|
|
```
|
|
|
|
### dsql client
|
|
|
|
For convenience, the Druid package includes a SQL command-line client, located at `bin/dsql` from the Druid package root.
|
|
|
|
Let's now run `bin/dsql`; you should see the following prompt:
|
|
|
|
```bash
|
|
Welcome to dsql, the command-line client for Druid SQL.
|
|
Type "\h" for help.
|
|
dsql>
|
|
```
|
|
|
|
To submit the query, paste it to the `dsql` prompt and press enter:
|
|
|
|
```bash
|
|
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
|
|
┌──────────────────────────────────────────────────────────┬───────┐
|
|
│ page │ Edits │
|
|
├──────────────────────────────────────────────────────────┼───────┤
|
|
│ Wikipedia:Vandalismusmeldung │ 33 │
|
|
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
|
|
│ Jeremy Corbyn │ 27 │
|
|
│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
|
|
│ Flavia Pennetta │ 20 │
|
|
│ Total Drama Presents: The Ridonculous Race │ 18 │
|
|
│ User talk:Dudeperson176123 │ 18 │
|
|
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
|
|
│ Wikipedia:In the news/Candidates │ 17 │
|
|
│ Wikipedia:Requests for page protection │ 17 │
|
|
└──────────────────────────────────────────────────────────┴───────┘
|
|
Retrieved 10 rows in 0.06s.
|
|
```
|
|
|
|
### Additional Druid SQL queries
|
|
|
|
#### Timeseries
|
|
|
|
`SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);`
|
|
|
|
```bash
|
|
dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
|
|
┌──────────────────────────┬──────────────┐
|
|
│ HourTime │ LinesDeleted │
|
|
├──────────────────────────┼──────────────┤
|
|
│ 2015-09-12T00:00:00.000Z │ 1761 │
|
|
│ 2015-09-12T01:00:00.000Z │ 16208 │
|
|
│ 2015-09-12T02:00:00.000Z │ 14543 │
|
|
│ 2015-09-12T03:00:00.000Z │ 13101 │
|
|
│ 2015-09-12T04:00:00.000Z │ 12040 │
|
|
│ 2015-09-12T05:00:00.000Z │ 6399 │
|
|
│ 2015-09-12T06:00:00.000Z │ 9036 │
|
|
│ 2015-09-12T07:00:00.000Z │ 11409 │
|
|
│ 2015-09-12T08:00:00.000Z │ 11616 │
|
|
│ 2015-09-12T09:00:00.000Z │ 17509 │
|
|
│ 2015-09-12T10:00:00.000Z │ 19406 │
|
|
│ 2015-09-12T11:00:00.000Z │ 16284 │
|
|
│ 2015-09-12T12:00:00.000Z │ 18672 │
|
|
│ 2015-09-12T13:00:00.000Z │ 30520 │
|
|
│ 2015-09-12T14:00:00.000Z │ 18025 │
|
|
│ 2015-09-12T15:00:00.000Z │ 26399 │
|
|
│ 2015-09-12T16:00:00.000Z │ 24759 │
|
|
│ 2015-09-12T17:00:00.000Z │ 19634 │
|
|
│ 2015-09-12T18:00:00.000Z │ 17345 │
|
|
│ 2015-09-12T19:00:00.000Z │ 19305 │
|
|
│ 2015-09-12T20:00:00.000Z │ 22265 │
|
|
│ 2015-09-12T21:00:00.000Z │ 16394 │
|
|
│ 2015-09-12T22:00:00.000Z │ 16379 │
|
|
│ 2015-09-12T23:00:00.000Z │ 15289 │
|
|
└──────────────────────────┴──────────────┘
|
|
Retrieved 24 rows in 0.08s.
|
|
```
|
|
|
|
#### GroupBy
|
|
|
|
`SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;`
|
|
|
|
```bash
|
|
dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;
|
|
┌───────────────┬─────────┐
|
|
│ channel │ EXPR$1 │
|
|
├───────────────┼─────────┤
|
|
│ #en.wikipedia │ 3045299 │
|
|
│ #it.wikipedia │ 711011 │
|
|
│ #fr.wikipedia │ 642555 │
|
|
│ #ru.wikipedia │ 640698 │
|
|
│ #es.wikipedia │ 634670 │
|
|
└───────────────┴─────────┘
|
|
Retrieved 5 rows in 0.05s.
|
|
```
|
|
|
|
#### Scan
|
|
|
|
` SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;`
|
|
|
|
```bash
|
|
dsql> SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;
|
|
┌────────────────────────┬────────────────────────────────────────────────────────┐
|
|
│ user │ page │
|
|
├────────────────────────┼────────────────────────────────────────────────────────┤
|
|
│ Thiago89 │ Campeonato Mundial de Voleibol Femenino Sub-20 de 2015 │
|
|
│ 91.34.200.249 │ Friede von Schönbrunn │
|
|
│ TuHan-Bot │ Trĩ vàng │
|
|
│ Lowercase sigmabot III │ User talk:ErrantX │
|
|
│ BattyBot │ Hans W. Jung │
|
|
└────────────────────────┴────────────────────────────────────────────────────────┘
|
|
Retrieved 5 rows in 0.04s.
|
|
```
|
|
|
|
#### EXPLAIN PLAN FOR
|
|
|
|
By prepending `EXPLAIN PLAN FOR ` to a Druid SQL query, it is possible to see what native Druid queries a SQL query will plan into.
|
|
|
|
Using the TopN query above as an example:
|
|
|
|
`EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;`
|
|
|
|
```bash
|
|
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
|
|
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
|
│ PLAN │
|
|
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
|
|
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
|
|
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
|
Retrieved 1 row in 0.03s.
|
|
```
|
|
|
|
## Further reading
|
|
|
|
The [Queries documentation](../querying/querying.html) has more information on Druid's native JSON queries.
|
|
|
|
The [Druid SQL documentation](../querying/sql.html) has more information on using Druid SQL queries. |