---
id: tutorial-query
title: "Tutorial: Querying data"
sidebar_label: "Querying data"
---

This tutorial will demonstrate how to query data in Apache Druid (incubating), with examples for Druid SQL and Druid's native query format.

The tutorial assumes that you've already completed one of the following ingestion tutorials, as we will be querying the sample Wikipedia edits data.

* [Tutorial: Loading a file](../tutorials/tutorial-batch.md)
* [Tutorial: Loading stream data from Kafka](../tutorials/tutorial-kafka.md)
* [Tutorial: Loading a file using Hadoop](../tutorials/tutorial-batch-hadoop.md)

Druid queries are sent over HTTP. The Druid console includes a view to issue queries to Druid and nicely format the results.

## Druid SQL queries

Druid supports a dialect of SQL for querying.

This query retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12:

```sql
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10
```

Let's look at the different ways to issue this query.

### Query SQL via the console

You can issue the above query from the console.

![Query autocomplete](../assets/tutorial-query-01.png "Query autocomplete")

The console query view provides autocomplete together with inline function documentation. You can also configure extra context flags to be sent with the query from the more options menu.

![Query options](../assets/tutorial-query-02.png "Query options")

Note that by default the console wraps your SQL queries in a limit, so that you can issue queries like `SELECT * FROM wikipedia` without much hesitation. You can turn this behavior off if you prefer.

### Query SQL via dsql

For convenience, the Druid package includes a SQL command-line client, located at `bin/dsql` under the Druid package root.

Let's now run `bin/dsql`; you should see the following prompt:

```bash
Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql>
```

To submit the query, paste it to the `dsql` prompt and press enter:

```bash
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌───────────────────────────────────────────────────────────┬───────┐
│ page                                                      │ Edits │
├───────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung                              │    33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage  │    28 │
│ Jeremy Corbyn                                             │    27 │
│ Wikipedia:Administrators' noticeboard/Incidents           │    21 │
│ Flavia Pennetta                                           │    20 │
│ Total Drama Presents: The Ridonculous Race                │    18 │
│ User talk:Dudeperson176123                                │    18 │
│ Wikipédia:Le Bistro/12 septembre 2015                     │    18 │
│ Wikipedia:In the news/Candidates                          │    17 │
│ Wikipedia:Requests for page protection                    │    17 │
└───────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.
```

### Query SQL over HTTP

SQL queries are submitted as JSON over HTTP.

The tutorial package includes an example file containing the SQL query shown above, at `quickstart/tutorial/wikipedia-top-pages-sql.json`.
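The file wraps the SQL text in the JSON request body that the SQL endpoint expects. A minimal sketch of such a body is shown below; the bundled file may also set optional fields, such as a query context, so treat this as an illustration rather than the file's exact contents:

```json
{
  "query": "SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
}
```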
Let's submit the query to Druid:

```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
```

The following results should be returned:

```json
[
  {
    "page": "Wikipedia:Vandalismusmeldung",
    "Edits": 33
  },
  {
    "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
    "Edits": 28
  },
  {
    "page": "Jeremy Corbyn",
    "Edits": 27
  },
  {
    "page": "Wikipedia:Administrators' noticeboard/Incidents",
    "Edits": 21
  },
  {
    "page": "Flavia Pennetta",
    "Edits": 20
  },
  {
    "page": "Total Drama Presents: The Ridonculous Race",
    "Edits": 18
  },
  {
    "page": "User talk:Dudeperson176123",
    "Edits": 18
  },
  {
    "page": "Wikipédia:Le Bistro/12 septembre 2015",
    "Edits": 18
  },
  {
    "page": "Wikipedia:In the news/Candidates",
    "Edits": 17
  },
  {
    "page": "Wikipedia:Requests for page protection",
    "Edits": 17
  }
]
```

### More Druid SQL examples

Here is a collection of queries to try out:

#### Query over time

```sql
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1
```

![Query example](../assets/tutorial-query-03.png "Query example")

#### General group by

```sql
SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page ORDER BY SUM(added) DESC
```

![Query example](../assets/tutorial-query-04.png "Query example")

#### Select raw data

```sql
SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5
```

![Query example](../assets/tutorial-query-05.png "Query example")

### Explain query plan

Druid SQL has the ability to explain the query plan for a given query. In the console this functionality is accessible from the `...` button.

![Explain query](../assets/tutorial-query-06.png "Explain query")

If you are querying in other ways, you can get the plan by prepending `EXPLAIN PLAN FOR` to a Druid SQL query.

Using a query from an example above:

`EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;`

```bash
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ PLAN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.03s.
```
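Notice that the plan shows the SQL query being translated into a native `topN` query; that native format is what the next section covers. Pulled out of the plan above and reformatted for readability (with the null, empty, and default-valued fields omitted), the native query looks roughly like this:

```json
{
  "queryType" : "topN",
  "dataSource" : { "type" : "table", "name" : "wikipedia" },
  "dimension" : { "type" : "default", "dimension" : "page", "outputName" : "d0", "outputType" : "STRING" },
  "metric" : { "type" : "numeric", "metric" : "a0" },
  "threshold" : 10,
  "intervals" : { "type" : "intervals", "intervals" : ["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"] },
  "granularity" : { "type" : "all" },
  "aggregations" : [ { "type" : "count", "name" : "a0" } ]
}
```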
## Native JSON queries

Druid's native query format is expressed in JSON.

### Native query via the console

You can issue native Druid queries from the console's Query view.

Here is a query that retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12:

```json
{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
    {
      "type" : "count",
      "name" : "count"
    }
  ]
}
```

Simply paste it into the console's query editor; the editor will switch into JSON mode.

![Native query](../assets/tutorial-query-07.png "Native query")
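The console accepts other native query types as well. As a further example to paste in, here is a sketch of a native `timeseries` query that roughly corresponds to the "Query over time" SQL example earlier in this tutorial; like the SQL version it sums the `deleted` column per hour, and the output name `LinesDeleted` is simply a label chosen for this sketch:

```json
{
  "queryType" : "timeseries",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "hour",
  "aggregations" : [
    {
      "type" : "longSum",
      "name" : "LinesDeleted",
      "fieldName" : "deleted"
    }
  ]
}
```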
### Native queries over HTTP

We have included a sample native TopN query under `quickstart/tutorial/wikipedia-top-pages.json`.

Let's submit this query to Druid:

```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty
```

You should see the following query results:

```json
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "count" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "count" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "count" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "count" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "count" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "count" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "count" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "count" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "count" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "count" : 17,
    "page" : "Wikipedia:Requests for page protection"
  } ]
} ]
```

## Further reading

The [Queries documentation](../querying/querying.md) has more information on Druid's native JSON queries.

The [Druid SQL documentation](../querying/sql.md) has more information on using Druid SQL queries.