Merge pull request #26 from cwiki-us-docs/feature/getting_started

Query documentation completed
YuCheng Hu 2021-07-31 17:39:43 -04:00 committed by GitHub
commit f5cfe18c9c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 50 additions and 321 deletions


@ -0,0 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="RunConfigurationProducerService">
    <option name="ignoredProducers">
      <set>
        <option value="com.android.tools.idea.compose.preview.runconfiguration.ComposePreviewRunConfigurationProducer" />
      </set>
    </option>
  </component>
</project>


@ -45,25 +45,21 @@ The Druid console provides a view that makes it easier for users to query in Druid
![Query results](../assets/tutorial-query-04.png "Query results")
Notice that, by default, the console limits results to about a hundred rows because of the **Smart query limit**
feature. This helps users avoid inadvertently running queries that return an excessive amount of data, which could
overwhelm their system.
7. Let's edit the query directly and take a look at a few more query-building features in the editor.
Click in the query edit pane and make the following changes:
1. Add a line after the first column, `"page"`, and start typing the name of a new column, `"countryName"`. Notice that the autocomplete menu suggests column names, functions, keywords, and more. Choose "countryName" and
add the new column to the GROUP BY clause as well, either by name or by reference to its position, `2`.
2. For readability, replace the `Count` column name with `Edits`, since the `COUNT()` function actually
returns the number of edits for the page. Make the same column name change in the ORDER BY clause as well.
The `COUNT()` function is one of many functions available for use in Druid SQL queries. You can mouse over a function name
in the autocomplete menu to see a brief description of the function. You can also find more information in the Druid
documentation; for example, the `COUNT()` function is documented in
[Aggregation functions](../querying/sql.md#aggregation-functions).
The query should now be:
```sql
SELECT
@ -75,27 +71,25 @@ returns the number of edits for the page. Make the same column name change in th
ORDER BY "Edits" DESC
```
When you run the query again, notice that we're getting the new dimension, `countryName`, but for most of the rows, its value
is null. Let's show only rows with a `countryName` value.
8. Click the countryName dimension in the left pane and choose the first filtering option. It's not exactly what we want, but
we'll edit it by hand. The new WHERE clause should appear in your query.
9. Modify the WHERE clause to exclude results that do not have a value for countryName:
```sql
WHERE "countryName" IS NOT NULL
```
Run the query again. You should now see the top edits by country:
![Finished query](../assets/tutorial-query-035.png "Finished query")
10. Under the covers, every Druid SQL query is translated into a query in the JSON-based _Druid native query_ format before it runs
on data nodes. You can view the native query for this query by clicking `...` and **Explain SQL Query**.
While you can use Druid SQL for most purposes, familiarity with native queries is useful for composing complex queries and for troubleshooting
performance issues. For more information, see [Native queries](../querying/querying.md).
![Explain query](../assets/tutorial-query-06.png "Explain query")
@ -116,17 +110,18 @@ performance issues. For more information, see [Native queries](../querying/query
from the command line or over HTTP.
11. Finally, click `...` and **Edit context** to see how you can add parameters that control query execution. In the field, enter query context options as JSON key-value pairs, as described in [Context flags](../querying/query-context.md).
That's it! We've built a simple query using some of the query builder features built into the Druid Console. The following
sections provide a few more example queries you can try. Also, see [Other ways to invoke SQL queries](#other-ways-to-invoke-sql-queries) to learn how
to run Druid SQL from the command line or over HTTP.
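Steps 7 through 11 can also be reproduced outside the console. As a minimal sketch, the Python below assembles the query described in those steps into a request body of the form Druid's SQL HTTP API accepts: a `query` string plus an optional `context` object carrying the same key-value pairs you would enter under **Edit context**. The helper name `sql_request_body`, the reuse of the dsql example's `__time` filter, and the `timeout` flag are assumptions for illustration; your console query may differ, and [Context flags](../querying/query-context.md) lists the flags your Druid version supports.

```python
import json
from typing import Optional


# Hypothetical helper: builds a JSON request body for Druid's SQL HTTP API
# (POST /druid/v2/sql). "query" holds the SQL text; "context" holds optional
# query context flags, the same key-value pairs the Edit context dialog takes.
def sql_request_body(query: str, context: Optional[dict] = None) -> str:
    body = {"query": query}
    if context:
        body["context"] = context
    return json.dumps(body)


# The query from steps 7 to 9, assembled by hand. The __time filter is an
# assumption borrowed from the dsql example; your console query may differ.
EDITS_BY_COUNTRY = """
SELECT
  "page",
  "countryName",
  COUNT(*) AS "Edits"
FROM "wikipedia"
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
  AND "countryName" IS NOT NULL
GROUP BY 1, 2
ORDER BY "Edits" DESC
""".strip()

if __name__ == "__main__":
    # "timeout" (in milliseconds) serves only as an example context flag.
    print(sql_request_body(EDITS_BY_COUNTRY, {"timeout": 60000}))
```

Keeping the context separate from the SQL text mirrors the console, where the query and its context are edited in different places.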
## More Druid SQL examples
Here is a collection of queries to try out:
### Query over time
```sql
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
@ -136,7 +131,7 @@ GROUP BY 1
![Query example](../assets/tutorial-query-07.png "Query example")
### General group by
```sql
SELECT channel, page, SUM(added)
@ -148,13 +143,13 @@ ORDER BY SUM(added) DESC
![Query example](../assets/tutorial-query-08.png "Query example")
## Other ways to invoke SQL queries
### Query SQL via dsql
For convenience, the Druid package includes a SQL command-line client, located at `bin/dsql` in the Druid package root.
Let's now run `bin/dsql`; you should see the following prompt:
```bash
Welcome to dsql, the command-line client for Druid SQL.
@ -162,7 +157,7 @@ Type "\h" for help.
dsql>
```
To submit the query, paste it at the `dsql` prompt and press Enter:
```bash
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
@ -184,18 +179,17 @@ Retrieved 10 rows in 0.06s.
```
### Query SQL over HTTP
You can submit queries directly to the Druid Broker over HTTP.
The tutorial package includes an example file, `quickstart/tutorial/wikipedia-top-pages-sql.json`, that contains the SQL query shown above. Let's submit that query to the Druid Broker:
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
```
The following results should be returned:
```json
[
@ -242,284 +236,9 @@ The following results should be returned:
]
```
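The curl command can also be reproduced from a program. The following minimal sketch, assuming the quickstart endpoint `http://localhost:8888/druid/v2/sql` from the curl example and the query from the dsql section, uses only Python's standard library to build the same POST request and to read the response shown above, a JSON array with one object per row. `build_request`, `parse_rows`, and `run_query` are hypothetical names.

```python
import json
import urllib.request
from typing import List, Tuple

# Endpoint used by the curl example above; adjust host and port for your cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# The same query as in the dsql example.
TOP_PAGES_SQL = (
    "SELECT page, COUNT(*) AS Edits FROM wikipedia "
    "WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' "
    "AND TIMESTAMP '2015-09-13 00:00:00' "
    "GROUP BY page ORDER BY Edits DESC LIMIT 10"
)


def build_request(query: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Build the POST request the curl command sends: a JSON body with a "query" field."""
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_rows(payload: bytes) -> List[Tuple[str, int]]:
    """Turn the JSON array of row objects into (page, Edits) pairs."""
    return [(row["page"], row["Edits"]) for row in json.loads(payload)]


def run_query(query: str) -> List[Tuple[str, int]]:
    """Send the query and parse the result. Needs a running Druid cluster."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return parse_rows(resp.read())
```

With the quickstart cluster running, `run_query(TOP_PAGES_SQL)` would be expected to return the same ten `(page, Edits)` pairs shown above. Only `run_query` touches the network, so request construction and response parsing can be checked offline.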
## Further reading
See the [Druid SQL documentation](../querying/sql.md) for more information on using Druid SQL queries.
See the [Queries documentation](../querying/querying.md) for more information on Druid native queries.
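## Querying data
This tutorial demonstrates how to query data in Apache Druid, with examples in both Druid SQL and Druid's native query format.
The tutorial assumes that you've already completed one of the ingestion tutorials, since we will be querying the sample Wikipedia edits data.
* [Loading a local file](tutorial-batch.md)
* [Loading data from Kafka](./chapter-2.md)
* [Loading data from Hadoop](./chapter-3.md)
Druid queries are sent over HTTP. The Druid console includes a view for issuing queries to Druid and nicely formatting the results.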
### Druid SQL queries
Druid supports SQL queries.
This query retrieves the 10 Wikipedia pages that were edited the most on September 12, 2015:
```sql
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10
```
Let's look at a few different ways to issue this query.
#### Query SQL via the console
You can issue the above query from the console:
![](img-3/tutorial-query-01.png)
The console query view provides autocomplete together with inline documentation.
![](img-3/tutorial-query-02.png)
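You can also configure extra context flags to send with the query from the `...` options menu.
Note that, by default, the console wraps your SQL queries in a limit so that queries such as `SELECT * FROM wikipedia` can complete. You can turn off this behavior with the `Smart query limit` toggle.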
![](img-3/tutorial-query-03.png)
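The console's exact rewriting logic is not described here, so the following is only a sketch of the general idea: wrap an arbitrary query in an outer `SELECT ... LIMIT` so that something like `SELECT * FROM wikipedia` returns a bounded number of rows. `wrap_with_limit` and its default of 100 rows are hypothetical, and the sketch assumes your Druid version can plan a query over a subquery in `FROM`.

```python
def wrap_with_limit(query: str, limit: int = 100) -> str:
    """Wrap a SQL query in an outer SELECT ... LIMIT (hypothetical helper).

    This only illustrates the idea of a smart limit; it is not the
    console's actual implementation.
    """
    # A trailing semicolon is not valid inside a subquery, so drop it
    # along with any whitespace around it.
    inner = query.strip().rstrip(";").rstrip()
    return f"SELECT * FROM ({inner})\nLIMIT {limit}"


if __name__ == "__main__":
    print(wrap_with_limit("SELECT * FROM wikipedia;"))
```

A fuller version would probably leave statements that cannot serve as a subquery, such as `EXPLAIN PLAN FOR ...`, untouched.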
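The query view provides contextual actions that can write and modify the query for you.
#### Query SQL via dsql
For convenience, the Druid package includes a SQL command-line client, located at `bin/dsql` in the Druid root directory.
Run `bin/dsql` and you should see the following:
```bash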
Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql>
```
Paste the SQL into `dsql` to submit the query:
```bash
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung │ 33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
│ Jeremy Corbyn │ 27 │
│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
│ Flavia Pennetta │ 20 │
│ Total Drama Presents: The Ridonculous Race │ 18 │
│ User talk:Dudeperson176123 │ 18 │
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
│ Wikipedia:In the news/Candidates │ 17 │
│ Wikipedia:Requests for page protection │ 17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.
```
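#### Query SQL over HTTP
SQL queries are submitted to Druid as JSON over HTTP.
The tutorial includes an example file, `quickstart/tutorial/wikipedia-top-pages-sql.json`, that contains the SQL query shown above. Let's submit this query to the Druid Broker:
```bash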
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
```
The following results are returned:
```json
[
{
"page": "Wikipedia:Vandalismusmeldung",
"Edits": 33
},
{
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"Edits": 28
},
{
"page": "Jeremy Corbyn",
"Edits": 27
},
{
"page": "Wikipedia:Administrators' noticeboard/Incidents",
"Edits": 21
},
{
"page": "Flavia Pennetta",
"Edits": 20
},
{
"page": "Total Drama Presents: The Ridonculous Race",
"Edits": 18
},
{
"page": "User talk:Dudeperson176123",
"Edits": 18
},
{
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"Edits": 18
},
{
"page": "Wikipedia:In the news/Candidates",
"Edits": 17
},
{
"page": "Wikipedia:Requests for page protection",
"Edits": 17
}
]
```
#### More Druid SQL examples
Here is a collection of queries to try out:
**Query over time**
```sql
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1
```
![](img-3/tutorial-query-03.png)
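To make the semantics of `FLOOR(__time TO HOUR)` concrete, the sketch below applies the same bucketing to a few made-up rows in Python: each timestamp is truncated to the start of its hour, and `deleted` is summed per bucket, much as `SUM(deleted) ... GROUP BY 1` does. The rows and function names are hypothetical and are not taken from the Wikipedia sample data.

```python
from collections import defaultdict
from datetime import datetime

# Made-up (__time, deleted) rows standing in for the wikipedia datasource.
ROWS = [
    (datetime(2015, 9, 12, 0, 12, 5), 3),
    (datetime(2015, 9, 12, 0, 47, 30), 10),
    (datetime(2015, 9, 12, 1, 2, 0), 7),
]


def floor_to_hour(ts: datetime) -> datetime:
    """Rough analogue of FLOOR(__time TO HOUR): zero out minutes and below."""
    return ts.replace(minute=0, second=0, microsecond=0)


def lines_deleted_by_hour(rows):
    """Analogue of SUM(deleted) ... GROUP BY 1, keyed by the hour bucket."""
    totals = defaultdict(int)
    for ts, deleted in rows:
        totals[floor_to_hour(ts)] += deleted
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for hour, deleted in lines_deleted_by_hour(ROWS).items():
        print(hour.isoformat(), deleted)
```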
**Aggregation query**
```sql
SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC
```
![](img-3/tutorial-query-04.png)
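Similarly, here is a rough local analogue of the aggregation query above: rows are grouped on the `(channel, page)` pair, `added` is summed per group, and the groups are sorted by that sum in descending order. The rows are invented for illustration and `sum_added` is a hypothetical name.

```python
from collections import Counter

# Made-up (channel, page, added) rows, for illustration only.
ROWS = [
    ("#en.wikipedia", "Page A", 120),
    ("#de.wikipedia", "Page B", 40),
    ("#en.wikipedia", "Page A", 30),
]


def sum_added(rows):
    """Analogue of SUM(added) ... GROUP BY channel, page ORDER BY SUM(added) DESC."""
    totals = Counter()
    for channel, page, added in rows:
        totals[(channel, page)] += added  # one group per (channel, page) pair
    return totals.most_common()  # sorted by total, largest first


if __name__ == "__main__":
    for (channel, page), added in sum_added(ROWS):
        print(channel, page, added)
```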
**Query raw data**
```sql
SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5
```
![](img-3/tutorial-query-05.png)
#### SQL query plans
Druid SQL can explain the query plan for a given query. In the console, this functionality is accessible from the `...` button.
![](img-3/tutorial-query-06.png)
If you are querying in other ways, you can get the query plan by prepending `EXPLAIN PLAN FOR` to a Druid SQL query.
Using one of the examples above:
`EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;`
```bash
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.03s.
```
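The plan above shows the `BETWEEN` filter as the interval `2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z`. Druid intervals exclude their end, while `BETWEEN` includes its upper bound, so the end is pushed one millisecond past `2015-09-13 00:00:00`. The sketch below reproduces that arithmetic; it is an illustration, not Druid's planner code, and `between_to_interval` is a hypothetical name. The first SQL query in this section uses `"__time" < TIMESTAMP '2015-09-13 00:00:00'` instead, which leaves out events stamped exactly at midnight on the 13th.

```python
from datetime import datetime, timedelta, timezone


def _iso_millis(ts: datetime) -> str:
    """Format an aware datetime as UTC ISO-8601 with millisecond precision."""
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def between_to_interval(start: datetime, end: datetime) -> str:
    """Half-open interval covering start <= __time <= end.

    Timestamps are treated as millisecond-precise, so the inclusive upper
    bound of BETWEEN becomes an exclusive bound 1 ms later.
    """
    return f"{_iso_millis(start)}/{_iso_millis(end + timedelta(milliseconds=1))}"


if __name__ == "__main__":
    day = datetime(2015, 9, 12, tzinfo=timezone.utc)
    print(between_to_interval(day, day + timedelta(days=1)))
```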
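### Native JSON queries
Druid's native query format is expressed in JSON.
#### Native queries via the console
You can issue native Druid queries from the console's "Query" view.
Here is a query that retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12: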
```json
{
"queryType" : "topN",
"dataSource" : "wikipedia",
"intervals" : ["2015-09-12/2015-09-13"],
"granularity" : "all",
"dimension" : "page",
"metric" : "count",
"threshold" : 10,
"aggregations" : [
{
"type" : "count",
"name" : "count"
}
]
}
```
Simply paste it into the console to switch the editor into JSON mode.
![](img-3/tutorial-query-07.png)
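If you generate native queries from code rather than typing them, a small builder keeps the structure in one place. The following sketch uses a hypothetical helper, `top_n_query`, that returns a dictionary equal to the topN query above when called with `"wikipedia"`, `"2015-09-12/2015-09-13"`, `"page"` and `10`; serializing it with `json.dumps` gives text you can paste into the console or submit over HTTP.

```python
import json


def top_n_query(datasource: str, interval: str, dimension: str, threshold: int,
                metric: str = "count") -> dict:
    """Hypothetical helper that builds a native topN query like the one above.

    A single "count" aggregator is defined and used as the ranking metric.
    """
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        "aggregations": [{"type": "count", "name": metric}],
    }


if __name__ == "__main__":
    query = top_n_query("wikipedia", "2015-09-12/2015-09-13", "page", 10)
    print(json.dumps(query, indent=2))  # text suitable for the console editor
```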
#### Native queries over HTTP
We have included a sample native TopN query in `quickstart/tutorial/wikipedia-top-pages.json`.
Submit this query to Druid:
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty
```
You should see the following query results:
```json
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"count" : 33,
"page" : "Wikipedia:Vandalismusmeldung"
}, {
"count" : 28,
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
}, {
"count" : 27,
"page" : "Jeremy Corbyn"
}, {
"count" : 21,
"page" : "Wikipedia:Administrators' noticeboard/Incidents"
}, {
"count" : 20,
"page" : "Flavia Pennetta"
}, {
"count" : 18,
"page" : "Total Drama Presents: The Ridonculous Race"
}, {
"count" : 18,
"page" : "User talk:Dudeperson176123"
}, {
"count" : 18,
"page" : "Wikipédia:Le Bistro/12 septembre 2015"
}, {
"count" : 17,
"page" : "Wikipedia:In the news/Candidates"
}, {
"count" : 17,
"page" : "Wikipedia:Requests for page protection"
} ]
} ]
```
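Unlike the SQL API response earlier in this section, which is a flat array of rows, the native topN response wraps the ranked rows in a `result` array per time bucket, each bucket carrying a `timestamp`; with `"granularity": "all"` there is a single bucket. The minimal sketch below flattens that shape into `(timestamp, page, count)` tuples. `flatten_top_n` is a hypothetical name, and it assumes the `page` and `count` field names used in this particular query.

```python
import json
from typing import List, Tuple


def flatten_top_n(payload: str) -> List[Tuple[str, str, int]]:
    """Flatten a native topN response into (timestamp, page, count) tuples.

    The outer array holds one object per time bucket, each with its ranked
    rows under "result". With "granularity": "all" there is one bucket.
    """
    rows = []
    for bucket in json.loads(payload):
        for entry in bucket["result"]:
            rows.append((bucket["timestamp"], entry["page"], entry["count"]))
    return rows
```

You could pass it the response text saved from the curl command above, or the body read from `urllib.request.urlopen`.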
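### Further reading
The [Queries documentation](../querying/makeNativeQueries.md) has more information on Druid native JSON queries.
The [Druid SQL documentation](../querying/druidsql.md) has more information on Druid SQL queries.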