From d512d84286783e3fd42869faf0198a612c215d33 Mon Sep 17 00:00:00 2001
From: YuCheng Hu
Date: Sun, 1 Aug 2021 08:03:13 -0400
Subject: [PATCH] Roll-up: query the aggregated data and add explanatory notes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 tutorials/tutorial-rollup.md | 108 +++++------------------------
 1 file changed, 13 insertions(+), 95 deletions(-)

diff --git a/tutorials/tutorial-rollup.md b/tutorials/tutorial-rollup.md
index 1ba3870..1fdd76c 100644
--- a/tutorials/tutorial-rollup.md
+++ b/tutorials/tutorial-rollup.md
@@ -91,17 +91,17 @@ Roll-up is a first-level aggregation operation over a selected set of columns,
 
 ## Load the example data
 
-From the apache-druid-apache-druid-0.21.1 package root, run the following command:
+From the root directory of the apache-druid-apache-druid-0.21.1 package, run the following command:
 
 ```bash
 bin/post-index-task --file quickstart/tutorial/rollup-index.json --url http://localhost:8081
 ```
 
-After the script completes, we will query the data.
+After the script above completes, we will query the data.
 
 ## Query the example data
 
-Let's run `bin/dsql` and issue a `select * from "rollup-tutorial";` query to see what data was ingested.
+Let's run the `bin/dsql` command-line client, then issue the query `select * from "rollup-tutorial";` to see the data that was ingested into Druid.
 
 ```bash
 $ bin/dsql
 Welcome to dsql, the command-line client for Druid SQL.
 Type "\h" for help.
 dsql> select * from "rollup-tutorial";
 ┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
 │ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
 ├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
 │ 2018-01-01T01:01:00.000Z │ 35937  │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
 │ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
 │ 2018-01-01T01:03:00.000Z │ 10204  │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
 │ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
 │ 2018-01-02T21:35:00.000Z │ 2818   │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
 └──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
 Retrieved 5 rows in 1.18s.
 
 dsql>
 ```
 
-Let's look at the three events in the original input data that occurred during `2018-01-01T01:01`:
+Let's look at the three rows of original input data whose events occurred during `2018-01-01T01:01`:
 
 ```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
 ```
 
-These three rows have been "rolled up" into the following row:
+The three rows above have been "rolled up" into the following single row during ingestion:
 
 ```bash
 ┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
 │ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
 ├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
 │ 2018-01-01T01:01:00.000Z │ 35937  │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
 └──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
 ```
+
+The input rows have been grouped (Group By) on the timestamp column and the dimension columns `{timestamp, srcIP, dstIP}`, with sum aggregations applied to the metric columns `{packets, bytes}`.
+
+Before the grouping occurs, the timestamps of the original input data are bucketed (floored) to the minute, as determined by the `"queryGranularity": "minute"` setting in the ingestion spec.
+
+Therefore, the events that occurred during `2018-01-01T01:02` were likewise aggregated and rolled up.
 
-The input rows have been grouped by the timestamp and dimension columns `{timestamp, srcIP, dstIP}` with sum aggregations on the metric columns `packets` and `bytes`. Before the grouping occurs, the timestamps of the original input data are bucketed/floored by minute, due to the `"queryGranularity":"minute"` setting in the ingestion spec.
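+
+To make the grouping above concrete, the sketch below shows the parts of the ingestion spec that drive this behavior: `rollup` turns the feature on, `queryGranularity` controls the timestamp flooring, `dimensionsSpec` lists the grouping columns, and `metricsSpec` defines the aggregations. This is an abridged reconstruction rather than a verbatim copy, so treat `quickstart/tutorial/rollup-index.json` in your own distribution as authoritative:
+
+```json
+{
+  "spec": {
+    "dataSchema": {
+      "dataSource": "rollup-tutorial",
+      "timestampSpec": { "column": "timestamp", "format": "iso" },
+      "dimensionsSpec": { "dimensions": ["srcIP", "dstIP"] },
+      "metricsSpec": [
+        { "type": "count", "name": "count" },
+        { "type": "longSum", "name": "packets", "fieldName": "packets" },
+        { "type": "longSum", "name": "bytes", "fieldName": "bytes" }
+      ],
+      "granularitySpec": {
+        "queryGranularity": "minute",
+        "rollup": true
+      }
+    }
+  }
+}
+```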
@@ -159,7 +163,7 @@ Likewise, these two events that occurred during `2018-01-01T01:02` have been rol
 └──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
 ```
 
-For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no roll-up took place, because this was the only event that occurred during `2018-01-01T01:03`:
+No roll-up took place for the last event, which records traffic between 1.1.1.1 and 2.2.2.2, because it was the only event that occurred during `2018-01-01T01:03`:
 
 ```json
 {"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
@@ -173,90 +177,4 @@ For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no roll-up too
 └──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
 ```
 
-Note that the `count` metric shows how many rows in the original input data contributed to the final "rolled up" row.
-
-
-
-
-
-### Load the example data
-
-From the Druid root directory, run the following command:
-
-```json
-bin/post-index-task --file quickstart/tutorial/rollup-index.json --url http://localhost:8081
-```
-
-After the script completes, we will query the data.
-
-### Query the example data
-
-Now run `bin/dsql` and issue the query `select * from "rollup-tutorial";` to see the data that has been ingested.
-
-```json
-$ bin/dsql
-Welcome to dsql, the command-line client for Druid SQL.
-Type "\h" for help.
-dsql> select * from "rollup-tutorial";
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │ 35937  │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-│ 2018-01-01T01:03:00.000Z │ 10204  │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-│ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
-│ 2018-01-02T21:35:00.000Z │ 2818   │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-Retrieved 5 rows in 1.18s.
-
-dsql>
-```
-
-Let's look at the three rows of original input data whose events occurred during `2018-01-01T01:01`:
-
-```json
-{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
-{"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
-{"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
-```
-These three rows have been rolled up into the following single row:
-
-```json
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │ 35937  │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
-
-The input rows have been grouped by the timestamp and dimension columns `{timestamp, srcIP, dstIP}`, with sum aggregations on the metric columns `{packets, bytes}`.
-
-Before the grouping occurs, the timestamps of the original input data are bucketed/floored to the minute, due to the `"queryGranularity":"minute"` setting in the ingestion spec.
-Likewise, the two events that occurred during `2018-01-01T01:02` have been rolled up:
-
-```json
-{"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
-{"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
-```
-```json
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
-
-No roll-up took place for the last event recording traffic between 1.1.1.1 and 2.2.2.2, because it was the only event that occurred during `2018-01-01T01:03`:
-
-```json
-{"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
-```
-```json
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:03:00.000Z │ 10204  │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
-
-Note that the `count` metric shows how many rows in the original input data contributed to the final "rolled up" row.
+Note that the `count` metric column shows how many rows of the original input data were rolled up into the final row.
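+
+As a quick sanity check on the `count` metric, you can compare the number of stored (rolled-up) rows with the number of original input rows via the Druid SQL HTTP API. This is a sketch: it assumes the quickstart Router is listening on `localhost:8888`; adjust the host and port for your deployment.
+
+```bash
+# Rolled-up rows vs. original input rows, using the SUM of the "count" metric.
+curl -X POST http://localhost:8888/druid/v2/sql \
+  -H 'Content-Type: application/json' \
+  -d '{"query":"SELECT COUNT(*) AS rolled_up_rows, SUM(\"count\") AS original_rows FROM \"rollup-tutorial\""}'
+```
+
+With the tutorial data this should report 5 rolled-up rows summarizing 9 original input rows (3 + 2 + 1 + 2 + 1).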