Merge pull request #34 from cwiki-us-docs/feature/cluster

Remove unneeded files from the configuration folder
YuCheng Hu 2021-08-04 14:46:56 -04:00 committed by GitHub
commit bcec8eecda
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
54 changed files with 2926 additions and 197 deletions

View File

@ -1,80 +0,0 @@
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<ins class="adsbygoogle"
style="display:block; text-align:center;"
data-ad-layout="in-article"
data-ad-format="fluid"
data-ad-client="ca-pub-8828078415045620"
data-ad-slot="7586680510"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
<!-- toc -->
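## Configuration reference
This section lists all configuration properties for each type of Druid service.
### Recommended configuration file organization
A recommended way of organizing Druid configuration files is to place them in the `conf` directory under the Druid root directory, as shown below: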
```json
$ ls -R conf
druid
conf/druid:
_common broker coordinator historical middleManager overlord
conf/druid/_common:
common.runtime.properties log4j2.xml
conf/druid/broker:
jvm.config runtime.properties
conf/druid/coordinator:
jvm.config runtime.properties
conf/druid/historical:
jvm.config runtime.properties
conf/druid/middleManager:
jvm.config runtime.properties
conf/druid/overlord:
jvm.config runtime.properties
```
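Each directory contains a `runtime.properties` file that holds the configuration properties for a specific Druid process, such as `historical`.
The `jvm.config` file contains the JVM flags for each service, such as heap size settings.
Common properties shared by all processes are located in `_common/common.runtime.properties`.
### Common configuration
The properties in this section are common configuration that should be shared across all Druid services in a cluster.
#### JVM configuration best practices
There are four JVM parameters that we set on all of our processes:
1. `-Duser.timezone=UTC` This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they might also uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see [query granularities](../querying/granularity.md).
2. `-Dfile.encoding=UTF-8` This is similar to the timezone setting: we test assuming UTF-8. Local encodings might work, but they might also result in weird and interesting bugs.
3. `-Djava.io.tmpdir=<a path>` Various parts of the system that interact with the file system do so through temporary files, and these files can get somewhat large. Many production systems are set up with small (but fast) `/tmp` directories, which can be a problem for Druid, so we recommend pointing the JVM's tmp directory to a directory with more space. This directory should not be volatile tmpfs. It should also have good read and write speed, so NFS mounts should be strongly avoided.
4. `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This allows log4j2 to handle logs from non-log4j2 components (such as jetty) that use standard java logging.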
#### 扩展
#### 请求日志
#### SQL兼容的空值处理
### Master
#### Coordinator
#### Overlord
### Data
#### MiddleManager and Peons
##### SegmentWriteOutMediumFactory
#### Indexer
#### Historical
### Query
#### Broker
#### Router

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1 +0,0 @@
<!-- toc -->

View File

@ -1,2 +0,0 @@
<!-- toc -->
### 社区扩展

View File

@ -14,7 +14,7 @@
## 数据格式
Apache Druid可以接收JSON、CSV或TSV等分隔格式或任何自定义格式的非规范化数据。尽管文档中的大多数示例使用JSON格式的数据但将Druid配置为接收任何其他分隔数据并不困难。我们欢迎对新格式的任何贡献。
此页列出了Druid支持的所有默认和核心扩展数据格式。有关社区扩展支持的其他数据格式请参阅我们的 [社区扩展列表](../Configuration/extensions.md#社区扩展)。
此页列出了Druid支持的所有默认和核心扩展数据格式。有关社区扩展支持的其他数据格式请参阅我们的 [社区扩展列表](../configuration/logging.md#社区扩展)。
### 格式化数据

View File

@ -92,7 +92,7 @@ foo_2015-01-03/2015-01-04_v1_2
}
```
压缩任务读取时间间隔 `2017-01-01/2018-01-01` 的*所有分段*,并生成新分段。由于 `segmentGranularity` 为空,压缩后原始的段粒度将保持不变。要控制每个时间块的结果段数,可以设置 [`maxRowsPerSegment`](../Configuration/configuration.md#Coordinator) 或 [`numShards`](native.md#tuningconfig)。请注意您可以同时运行多个压缩任务。例如您可以每月运行12个compactionTasks而不是一整年只运行一个任务。
压缩任务读取时间间隔 `2017-01-01/2018-01-01` 的*所有分段*,并生成新分段。由于 `segmentGranularity` 为空,压缩后原始的段粒度将保持不变。要控制每个时间块的结果段数,可以设置 [`maxRowsPerSegment`](../configuration/human-readable-byte.md#Coordinator) 或 [`numShards`](native.md#tuningconfig)。请注意您可以同时运行多个压缩任务。例如您可以每月运行12个compactionTasks而不是一整年只运行一个任务。
压缩任务在内部生成 `index` 任务规范,用于使用某些固定参数执行的压缩工作。例如,它的 `inputSource` 始终是 [DruidInputSource](native.md#Druid输入源)`dimensionsSpec` 和 `metricsSpec` 默认包含输入段的所有Dimensions和Metrics。

View File

@ -95,7 +95,7 @@ Apache Druid当前支持通过一个Hadoop摄取任务来支持基于Apache Hado
| `hadoopDependencyCoordinates` | Druid使用的Hadoop依赖这些属性会覆盖默认的Hadoop依赖。 如果该值被指定Druid将在 `druid.extensions.hadoopDependenciesDir` 目录下查找指定的Hadoop依赖 | 否 |
| `classpathPrefix` | 为Peon进程准备的类路径。| 否 |
还要注意Druid会自动计算在Hadoop集群中运行的Hadoop作业容器的类路径。但是如果Hadoop和Druid的依赖项之间发生冲突可以通过设置 `druid.extensions.hadoopContainerDruidClasspath`属性。请参阅 [基本druid配置中的扩展配置](../Configuration/configuration.md#扩展) 。
还要注意Druid会自动计算在Hadoop集群中运行的Hadoop作业容器的类路径。但是如果Hadoop和Druid的依赖项之间发生冲突可以通过设置 `druid.extensions.hadoopContainerDruidClasspath`属性。请参阅 [基本druid配置中的扩展配置](../configuration/human-readable-byte.md#扩展) 。
#### `dataSchema`
该字段是必须的。 详情可以查看摄取页中的 [`dataSchema`](ingestion.md#dataschema) 部分来看它应该包括哪些部分。

View File

@ -162,7 +162,7 @@ curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http:/
| 字段 | 类型 | 描述 | 是否必须 |
|-|-|-|-|
| `type` | String | 对于可用选项,可以见 [额外的Peon配置SegmentWriteOutMediumFactory](../Configuration/configuration.md#SegmentWriteOutMediumFactory) | 是 |
| `type` | String | 对于可用选项,可以见 [额外的Peon配置SegmentWriteOutMediumFactory](../configuration/human-readable-byte.md#SegmentWriteOutMediumFactory) | 是 |
#### KafkaSupervisorIOConfig

View File

@ -635,7 +635,7 @@ PartitionsSpec用于描述辅助分区方法。您应该根据需要的rollup模
| 字段 | 类型 | 描述 | 是否必须 |
|-|-|-|-|
| `type` | String | 配置解释和可选项可以参见 [额外的Peon配置SegmentWriteOutMediumFactory](../Configuration/configuration.md#SegmentWriteOutMediumFactory) | 是 |
| `type` | String | 配置解释和可选项可以参见 [额外的Peon配置SegmentWriteOutMediumFactory](../configuration/human-readable-byte.md#SegmentWriteOutMediumFactory) | 是 |
#### 分段推送模式
@ -739,7 +739,7 @@ S3对象
#### 谷歌云存储输入源
> [!WARNING]
> 您需要添加 [`druid-google-extensions`](../Configuration/core-ext/google-cloud-storage.md) 扩展以便使用谷歌云存储输入源。
> 您需要添加 [`druid-google-extensions`](../configuration/core-ext/google-cloud-storage.md) 扩展以便使用谷歌云存储输入源。
谷歌云存储输入源支持直接从谷歌云存储读取对象可以通过谷歌云存储URI字符串列表指定对象。谷歌云存储输入源是可拆分的可以由 [并行任务](#并行任务) 使用,其中 `index_parallel` 的每个worker任务将读取一个或多个对象。
@ -812,7 +812,7 @@ S3对象
#### Azure输入源
> [!WARNING]
> 您需要添加 [`druid-azure-extensions`](../Configuration/core-ext/microsoft-azure.md) 扩展以便使用Azure输入源。
> 您需要添加 [`druid-azure-extensions`](../configuration/core-ext/microsoft-azure.md) 扩展以便使用Azure输入源。
Azure输入源支持直接从Azure读取对象可以通过Azure URI字符串列表指定对象。Azure输入源是可拆分的可以由 [并行任务](#并行任务) 使用,其中 `index_parallel` 的每个worker任务将读取一个或多个对象。
@ -885,7 +885,7 @@ azure对象
#### HDFS输入源
> [!WARNING]
> 您需要添加 [`druid-hdfs-extensions`](../Configuration/core-ext/hdfs.md) 扩展以便使用HDFS输入源。
> 您需要添加 [`druid-hdfs-extensions`](../configuration/core-ext/hdfs.md) 扩展以便使用HDFS输入源。
HDFS输入源支持直接从HDFS存储中读取文件文件路径可以指定为HDFS URI字符串或者HDFS URI字符串列表。HDFS输入源是可拆分的可以由 [并行任务](#并行任务) 使用,其中 `index_parallel` 的每个worker任务将读取一个或多个文件。
@ -956,7 +956,7 @@ HDFS输入源支持直接从HDFS存储中读取文件文件路径可以指定
| `type` | 应该总是 `hdfs` | None | 是 |
| `paths` | HDFS路径。可以是JSON数组或逗号分隔的路径字符串这些路径支持类似*的通配符。给定路径之下的空文件将会被跳过。 | None | 是 |
您还可以使用HDFS输入源从云存储摄取数据。但是如果您想从AWS S3或谷歌云存储读取数据可以考虑使用 [S3输入源](../Configuration/core-ext/s3.md) 或 [谷歌云存储输入源](../Configuration/core-ext/google-cloud-storage.md)。
您还可以使用HDFS输入源从云存储摄取数据。但是如果您想从AWS S3或谷歌云存储读取数据可以考虑使用 [S3输入源](../configuration/core-ext/s3.md) 或 [谷歌云存储输入源](../configuration/core-ext/google-cloud-storage.md)。
#### HTTP输入源

View File

@ -245,7 +245,7 @@ http://<middlemanager-host>:<worker-port>/druid/worker/v1/chat/<task-id>/unparse
要启用段锁定,可能需要在 [task context(任务上下文)](#上下文参数) 中将 `forceTimeChunkLock` 设置为 `false`。一旦 `forceTimeChunkLock` 被取消设置,任务将自动选择正确的锁类型。**请注意**段锁并不总是可用的。使用时间块锁的最常见场景是当覆盖任务更改段粒度时。此外只有本地索引任务和Kafka/kinesis索引任务支持段锁。Hadoop索引任务和索引实时(`index_realtime`)任务(被 [Tranquility](tranquility.md)使用)还不支持它。
任务上下文中的 `forceTimeChunkLock` 仅应用于单个任务。如果要为所有任务取消设置,则需要在 [Overlord配置](../Configuration/configuration.md#overlord) 中设置 `druid.indexer.tasklock.forceTimeChunkLock` 为false。
任务上下文中的 `forceTimeChunkLock` 仅应用于单个任务。如果要为所有任务取消设置,则需要在 [Overlord配置](../configuration/human-readable-byte.md#overlord) 中设置 `druid.indexer.tasklock.forceTimeChunkLock` 为false。
如果两个或多个任务尝试为同一数据源的重叠时间块获取锁,则锁请求可能会相互冲突。**请注意,**锁冲突可能发生在不同的锁类型之间。

View File

@ -31,7 +31,7 @@ Druid源代码包含一个 [示例docker-compose.yml](https://github.com/apache/
#### 配置
Druid Docker容器的配置是通过环境变量完成的环境变量还可以指定到 [标准Druid配置文件](../Configuration/configuration.md) 的路径
Druid Docker容器的配置是通过环境变量完成的环境变量还可以指定到 [标准Druid配置文件](../configuration/human-readable-byte.md) 的路径
特殊环境变量:

View File

@ -31,7 +31,7 @@ Druid包括一组参考配置和用于单机部署的启动脚本
这些示例配置的启动脚本与Druid服务一起运行单个ZK实例,您也可以选择单独部署ZK。
通过[Coordinator配置文档](../Configuration/configuration.md#Coordinator)中描述的可选配置`druid.coordinator.asOverlord.enabled = true`可以在单个进程中同时运行Druid Coordinator和Overlord。
通过[Coordinator配置文档](../configuration/human-readable-byte.md#Coordinator)中描述的可选配置`druid.coordinator.asOverlord.enabled = true`可以在单个进程中同时运行Druid Coordinator和Overlord。
虽然为大型单台计算机提供了示例配置但在更高规模下我们建议在集群部署中运行Druid以实现容错和减少资源争用。

View File

@ -167,7 +167,7 @@ cd apache-druid-0.17.0
在生产部署中我们建议运行专用的元数据存储例如具有复制功能的MySQL或PostgreSQL与Druid服务器分开部署。
[MySQL扩展](../Configuration/core-ext/mysql.md)和[PostgreSQL](../Configuration/core-ext/postgresql.md)扩展文档包含有关扩展配置和初始数据库安装的说明。
[MySQL扩展](../configuration/core-ext/mysql.md)和[PostgreSQL](../configuration/core-ext/postgresql.md)扩展文档包含有关扩展配置和初始数据库安装的说明。
#### 深度存储
@ -202,7 +202,7 @@ druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=your-bucket
druid.indexer.logs.s3Prefix=druid/indexing-logs
```
更多信息可以看[S3扩展](../Configuration/core-ext/s3.md)部分的文档。
更多信息可以看[S3扩展](../configuration/core-ext/s3.md)部分的文档。
##### HDFS
@ -234,7 +234,7 @@ druid.indexer.logs.directory=/druid/indexing-logs
* 需要将Hadoop的配置文件core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml放置在Druid进程的classpath中可以将他们拷贝到`conf/druid/cluster/_common`目录中
更多信息可以看[HDFS扩展](../Configuration/core-ext/hdfs.md)部分的文档。
更多信息可以看[HDFS扩展](../configuration/core-ext/hdfs.md)部分的文档。
### Hadoop连接配置

View File

@ -99,7 +99,7 @@
* [空间过滤器(Spatial Filter)](querying/spatialfilter.md)
* [配置列表]()
* [配置列表](Configuration/configuration.md)
* [配置列表](configuration/human-readable-byte.md)
* [操作指南]()
* [操作指南](operations/index.md)

View File

@ -0,0 +1,158 @@
---
id: human-readable-byte
title: "Human-readable Byte Configuration Reference"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
This page documents configuration properties related to bytes.
These properties can be configured in two ways:
1. a simple number in bytes
2. a number with a unit suffix
## A number in bytes
For example, a cache size of 3G can be configured as follows:
```properties
# 3G bytes = 3_000_000_000 bytes
druid.cache.sizeInBytes=3000000000
```
## A number with a unit suffix
When a property needs a large number, as in the example above, it is easy to add or drop a zero by mistake. Druid therefore also accepts a number with a unit suffix.
For example, given a disk of 1T, the configuration can be:
```properties
druid.segmentCache.locations=[{"path":"/segment-cache-00","maxSize":"1t"},{"path":"/segment-cache-01","maxSize":"1200g"}]
```
Note: in the example above, both `1t` and `1T` are acceptable because the unit suffix is case-insensitive.
Also, only integers are valid as the number part. For example, you can't replace `1200g` with `1.2t`.
### Supported Units
In computing, a unit like `K` is ambiguous: it means 1000 or 1024 depending on the context. For more information, see [Binary prefix](https://en.wikipedia.org/wiki/Binary_prefix).
To remove the ambiguity, Druid defines the base of each unit as follows:
| Unit | Description | Base |
|---|---|---|
| K | Kilo Decimal Byte | 1_000 |
| M | Mega Decimal Byte | 1_000_000 |
| G | Giga Decimal Byte | 1_000_000_000 |
| T | Tera Decimal Byte | 1_000_000_000_000 |
| P | Peta Decimal Byte | 1_000_000_000_000_000 |
| KiB | Kilo Binary Byte | 1024 |
| MiB | Mega Binary Byte | 1024 * 1024 |
| GiB | Giga Binary Byte | 1024 * 1024 * 1024 |
| TiB | Tera Binary Byte | 1024 * 1024 * 1024 * 1024 |
| PiB | Peta Binary Byte | 1024 * 1024 * 1024 * 1024 * 1024 |
Unit is case-insensitive. `k`, `kib`, `KiB`, `kiB` are all acceptable.
Here are two examples
```properties
# 1G bytes = 1_000_000_000 bytes
druid.cache.sizeInBytes=1g
```
```properties
# 256MiB bytes = 256 * 1024 * 1024 bytes
druid.cache.sizeInBytes=256MiB
```
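The unit rules above can be sketched in a few lines of Python. This is only an illustration of the table and of the integer-only, case-insensitive rules. It is not Druid's actual parser, and the function name `parse_human_readable_bytes` is made up for this example.

```python
import re

# Multipliers copied from the "Supported Units" table above (keys lower-cased
# because the unit suffix is case-insensitive).
UNIT_BASE = {
    "": 1,
    "k": 1_000,
    "m": 1_000_000,
    "g": 1_000_000_000,
    "t": 1_000_000_000_000,
    "p": 1_000_000_000_000_000,
    "kib": 1024,
    "mib": 1024 ** 2,
    "gib": 1024 ** 3,
    "tib": 1024 ** 4,
    "pib": 1024 ** 5,
}

# An integer part followed by an optional alphabetic unit suffix.
_SIZE_PATTERN = re.compile(r"([0-9]+)([a-z]*)")


def parse_human_readable_bytes(text):
    """Convert a string such as '1g' or '256MiB' to a number of bytes."""
    match = _SIZE_PATTERN.fullmatch(text.strip().lower())
    if match is None or match.group(2) not in UNIT_BASE:
        raise ValueError(f"not a valid byte size: {text!r}")
    return int(match.group(1)) * UNIT_BASE[match.group(2)]


if __name__ == "__main__":
    for sample in ("3000000000", "1g", "256MiB", "1T"):
        print(sample, "->", parse_human_readable_bytes(sample))
```

Because the number part must be an integer, a value such as `1.2t` is rejected by this sketch, which mirrors the restriction described earlier on this page.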
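## Configuration reference
This section lists all configuration properties for each type of Druid service.
### Recommended configuration file organization
A recommended way of organizing Druid configuration files is to place them in the `conf` directory under the Druid root directory, as shown below: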
```json
$ ls -R conf
druid
conf/druid:
_common broker coordinator historical middleManager overlord
conf/druid/_common:
common.runtime.properties log4j2.xml
conf/druid/broker:
jvm.config runtime.properties
conf/druid/coordinator:
jvm.config runtime.properties
conf/druid/historical:
jvm.config runtime.properties
conf/druid/middleManager:
jvm.config runtime.properties
conf/druid/overlord:
jvm.config runtime.properties
```
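Each directory contains a `runtime.properties` file that holds the configuration properties for a specific Druid process, such as `historical`.
The `jvm.config` file contains the JVM flags for each service, such as heap size settings.
Common properties shared by all processes are located in `_common/common.runtime.properties`.
### Common configuration
The properties in this section are common configuration that should be shared across all Druid services in a cluster.
#### JVM configuration best practices
There are four JVM parameters that we set on all of our processes:
1. `-Duser.timezone=UTC` This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they might also uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see [query granularities](../querying/granularity.md).
2. `-Dfile.encoding=UTF-8` This is similar to the timezone setting: we test assuming UTF-8. Local encodings might work, but they might also result in weird and interesting bugs.
3. `-Djava.io.tmpdir=<a path>` Various parts of the system that interact with the file system do so through temporary files, and these files can get somewhat large. Many production systems are set up with small (but fast) `/tmp` directories, which can be a problem for Druid, so we recommend pointing the JVM's tmp directory to a directory with more space. This directory should not be volatile tmpfs. It should also have good read and write speed, so NFS mounts should be strongly avoided.
4. `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This allows log4j2 to handle logs from non-log4j2 components (such as jetty) that use standard java logging.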
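As a rough aid, the four flags above can be checked against the contents of a `jvm.config` file with a small script. This is a sketch under the assumption that `jvm.config` holds one JVM flag per line. The helper name `missing_jvm_flags` and the sample file contents are invented for this example.

```python
# Flags recommended in the list above. The tmpdir flag is matched by prefix
# because its value is a site-specific path.
RECOMMENDED_PREFIXES = [
    "-Duser.timezone=UTC",
    "-Dfile.encoding=UTF-8",
    "-Djava.io.tmpdir=",
    "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager",
]


def missing_jvm_flags(jvm_config_text):
    """Return the recommended flags that do not appear in the given text."""
    flags = [line.strip() for line in jvm_config_text.splitlines() if line.strip()]
    return [
        prefix
        for prefix in RECOMMENDED_PREFIXES
        if not any(flag.startswith(prefix) for flag in flags)
    ]


if __name__ == "__main__":
    # Hypothetical jvm.config contents, used only for demonstration.
    sample = "\n".join([
        "-server",
        "-Xms1g",
        "-Duser.timezone=UTC",
        "-Dfile.encoding=UTF-8",
        "-Djava.io.tmpdir=var/tmp",
    ])
    print(missing_jvm_flags(sample))
```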
#### 扩展
#### 请求日志
#### SQL兼容的空值处理
### Master
#### Coordinator
#### Overlord
### Data
#### MiddleManager and Peons
##### SegmentWriteOutMediumFactory
#### Indexer
#### Historical
### Query
#### Broker
#### Router

2056
configuration/index.md Normal file

File diff suppressed because it is too large Load Diff

87
configuration/logging.md Normal file
View File

@ -0,0 +1,87 @@
---
id: logging
title: "Logging"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
Apache Druid processes will emit logs that are useful for debugging to the console. Druid processes also emit periodic metrics about their state. For more about metrics, see [Configuration](../configuration/index.md#enabling-metrics). Metric logs are printed to the console by default, and can be disabled with `-Ddruid.emitter.logging.logLevel=debug`.
Druid uses [log4j2](http://logging.apache.org/log4j/2.x/) for logging. Logging can be configured with a log4j2.xml file. Add the path to the directory containing the log4j2.xml file (e.g. the _common/ dir) to your classpath if you want to override default Druid log configuration. Note that this directory should be earlier in the classpath than the druid jars. The easiest way to do this is to prefix the classpath with the config dir.
To enable java logging to go through log4j2, set the `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` server parameter.
An example log4j2.xml ships with Druid under config/_common/log4j2.xml, and a sample file is also shown below:
```
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
<!-- Uncomment to enable logging of all HTTP requests
<Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG">
<AppenderRef ref="Console"/>
</Logger>
-->
</Loggers>
</Configuration>
```
## My logs are really chatty, can I set them to asynchronously write?
Yes, using a `log4j2.xml` similar to the following causes some of the more chatty classes to write asynchronously:
```
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
</Appenders>
<Loggers>
<AsyncLogger name="org.apache.druid.curator.inventory.CuratorInventoryManager" level="debug" additivity="false">
<AppenderRef ref="Console"/>
</AsyncLogger>
<AsyncLogger name="org.apache.druid.client.BatchServerInventoryView" level="debug" additivity="false">
<AppenderRef ref="Console"/>
</AsyncLogger>
<!-- Make extra sure nobody adds logs in a bad way that can hurt performance -->
<AsyncLogger name="org.apache.druid.client.ServerInventoryView" level="debug" additivity="false">
<AppenderRef ref="Console"/>
</AsyncLogger>
<AsyncLogger name ="org.apache.druid.java.util.http.client.pool.ChannelResourceFactory" level="info" additivity="false">
<AppenderRef ref="Console"/>
</AsyncLogger>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
```

View File

@ -13,7 +13,7 @@
## Broker
### 配置
对于Apache Druid Broker的配置请参见 [Broker配置](../Configuration/configuration.md#Broker)
对于Apache Druid Broker的配置请参见 [Broker配置](../configuration/human-readable-byte.md#Broker)
### HTTP
对于Broker的API的列表请参见 [Broker API](../operations/api.md#Broker)

View File

@ -13,7 +13,7 @@
## Coordinator进程
### 配置
对于Apache Druid的Coordinator进程配置详见 [Coordinator配置](../Configuration/configuration.md#Coordinator)
对于Apache Druid的Coordinator进程配置详见 [Coordinator配置](../configuration/human-readable-byte.md#Coordinator)
### HTTP
对于Coordinator的API接口详见 [Coordinator API](../operations/api.md#Coordinator)
@ -45,7 +45,7 @@ org.apache.druid.cli.Main server coordinator
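On each run, the Druid Coordinator compacts segments by merging small segments or splitting large ones. This is useful when your segments are not optimized for segment size, which can degrade query performance. For details, see [Segment size optimization](../operations/segmentSizeOpt.md).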
Coordinator首先根据[段搜索策略](#段搜索策略)查找要压缩的段。找到某些段后,它会发出[压缩任务](../DataIngestion/taskrefer.md#compact)来压缩这些段。运行压缩任务的最大数目为 `min(sum of worker capacity * slotRatio, maxSlots)`。请注意,即使 `min(sum of worker capacity * slotRatio, maxSlots)` = 0如果为数据源启用了压缩则始终会提交至少一个压缩任务。请参阅[压缩配置API](../operations/api.md#Coordinator)和[压缩配置](../Configuration/configuration.md#Coordinator)以启用压缩。
Coordinator首先根据[段搜索策略](#段搜索策略)查找要压缩的段。找到某些段后,它会发出[压缩任务](../DataIngestion/taskrefer.md#compact)来压缩这些段。运行压缩任务的最大数目为 `min(sum of worker capacity * slotRatio, maxSlots)`。请注意,即使 `min(sum of worker capacity * slotRatio, maxSlots)` = 0如果为数据源启用了压缩则始终会提交至少一个压缩任务。请参阅[压缩配置API](../operations/api.md#Coordinator)和[压缩配置](../configuration/human-readable-byte.md#Coordinator)以启用压缩。
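The slot limit `min(sum of worker capacity * slotRatio, maxSlots)` and the "at least one task" rule can be illustrated with a short calculation. This is a sketch, not the Coordinator's code: the function name is hypothetical, and truncating the product to an integer is an assumption made for the example.

```python
def max_compaction_tasks(worker_capacities, slot_ratio, max_slots, compaction_enabled=True):
    """Upper bound on concurrently running compaction tasks.

    worker_capacities: task slots of each worker (hypothetical input format).
    """
    # min(sum of worker capacity * slotRatio, maxSlots); truncation is assumed.
    computed = min(int(sum(worker_capacities) * slot_ratio), max_slots)
    if compaction_enabled:
        # Even when the formula yields 0, at least one task is submitted
        # if compaction is enabled for a datasource.
        return max(computed, 1)
    return computed


if __name__ == "__main__":
    print(max_compaction_tasks([8, 8], 0.5, 5))   # limited by max_slots
    print(max_compaction_tasks([2, 2], 0.1, 10))  # formula gives 0, one task still runs
```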
压缩任务可能由于以下原因而失败:
@ -76,7 +76,7 @@ Coordinator首先根据[段搜索策略](#段搜索策略)查找要压缩的段
如果Coordinator还有足够的用于压缩任务的插槽该策略则继续搜索剩下的段并返回 `bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION``bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION_1`。最后,因为在 `2017-09-01T00:00:00.000Z/2017-10-01T00:00:00.000Z` 时间间隔中只有一个段,所以 `foo_2017-09-01T00:00:00.000Z_2017-10-01T00:00:00.000Z_VERSION` 段也会被选择。
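The starting point of the search can be changed by setting [skipOffsetFromLatest](../Configuration/configuration.md#Coordinator). If it is set, this policy ignores segments that fall into the time chunk of (the end time of the latest segment - `skipOffsetFromLatest`). This option is mainly intended to avoid conflicts between compaction tasks and realtime tasks. Note that realtime tasks have a higher priority than compaction tasks by default. If the time chunks of two tasks overlap, the realtime task revokes the locks of the compaction task, which terminates the compaction task.
The starting point of the search can be changed by setting [skipOffsetFromLatest](../configuration/human-readable-byte.md#Coordinator). If it is set, this policy ignores segments that fall into the time chunk of (the end time of the latest segment - `skipOffsetFromLatest`). This option is mainly intended to avoid conflicts between compaction tasks and realtime tasks. Note that realtime tasks have a higher priority than compaction tasks by default. If the time chunks of two tasks overlap, the realtime task revokes the locks of the compaction task, which terminates the compaction task.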
> [!WARNING]
> 当有很多相同间隔的小段,并且它们的总大小超过 `inputSegmentSizeBytes` 时,此策略当前无法处理这种情况。如果它找到这样的段,它只会跳过它们。

View File

@ -32,12 +32,12 @@ Apache Druid不提供的存储机制深度存储是存储段的地方。深
### S3适配
请看[druid-s3-extensions](../Configuration/core-ext/s3.md)扩展文档
请看[druid-s3-extensions](../configuration/core-ext/s3.md)扩展文档
### HDFS
请看[druid-hdfs-extensions](../Configuration/core-ext/hdfs.md)扩展文档
请看[druid-hdfs-extensions](../configuration/core-ext/hdfs.md)扩展文档
### 其他深度存储
对于另外的深度存储等,可以参见[扩展列表](../Configuration/extensions.md)
对于另外的深度存储等,可以参见[扩展列表](../configuration/logging.md)

View File

@ -13,7 +13,7 @@
## Historical
### 配置
对于Apache Druid Historical的配置请参见 [Historical配置](../Configuration/configuration.md#Historical)
对于Apache Druid Historical的配置请参见 [Historical配置](../configuration/human-readable-byte.md#Historical)
### HTTP
Historical的API列表请参见 [Historical API](../operations/api.md#Historical)

View File

@ -21,7 +21,7 @@ Apache Druid索引器进程是MiddleManager + Peon任务执行系统的另一种
与MiddleManager + Peon系统相比Indexer的设计更易于配置和部署并且能够更好地实现跨任务的资源共享。
### 配置
对于Apache Druid Indexer进程的配置请参见 [Indexer配置](../Configuration/configuration.md#Indexer)
对于Apache Druid Indexer进程的配置请参见 [Indexer配置](../configuration/human-readable-byte.md#Indexer)
### HTTP
Indexer进程与[MiddleManager](../operations/api.md#MiddleManager)共用API
@ -38,7 +38,7 @@ org.apache.druid.cli.Main server indexer
**查询资源**
查询处理线程和缓冲区在所有任务中共享。索引器将为来自所有任务共享的单个端点的查询提供服务。
如果启用了[查询缓存](../Configuration/configuration.md),则查询缓存也将在所有任务中共享。
如果启用了[查询缓存](../configuration/human-readable-byte.md),则查询缓存也将在所有任务中共享。
**服务端HTTP线程**
索引器维护两个大小相等的HTTP线程池。

View File

@ -15,7 +15,7 @@
元数据存储是Apache Druid的一个外部依赖。Druid使用它来存储系统的各种元数据但不存储实际的数据。下面有许多用于各种目的的表。
Derby是Druid的默认元数据存储但是它不适合生产环境。[MySQL](../Configuration/core-ext/mysql.md)和[PostgreSQL](../Configuration/core-ext/postgresql.md)是更适合生产的元数据存储。
Derby是Druid的默认元数据存储但是它不适合生产环境。[MySQL](../configuration/core-ext/mysql.md)和[PostgreSQL](../configuration/core-ext/postgresql.md)是更适合生产的元数据存储。
> [!WARNING]
> 元数据存储存储了Druid集群工作所必需的整个元数据。对于生产集群考虑使用MySQL或PostgreSQL而不是Derby。此外强烈建议设置数据库的高可用因为如果丢失任何元数据将无法恢复。
@ -31,11 +31,11 @@ druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527//opt/var
### MySQL
参见[mysql-metadata-storage](../Configuration/core-ext/mysql.md)扩展文档
参见[mysql-metadata-storage](../configuration/core-ext/mysql.md)扩展文档
### PostgreSQL
参见[postgresql-metadata-storage](../Configuration/core-ext/postgresql.md)扩展文档
参见[postgresql-metadata-storage](../configuration/core-ext/postgresql.md)扩展文档
### 添加自定义的数据库连接池属性

View File

@ -13,7 +13,7 @@
## MiddleManager进程
### 配置
对于Apache Druid MiddleManager配置可以参见[索引服务配置](../Configuration/configuration.md#MiddleManager)
对于Apache Druid MiddleManager配置可以参见[索引服务配置](../configuration/human-readable-byte.md#MiddleManager)
### HTTP
对于MiddleManager的API接口详见 [MiddleManager API](../operations/api.md#MiddleManager)

View File

@ -13,7 +13,7 @@
## Overlord进程
### 配置
对于Apache Druid的Overlord进程配置详见 [Overlord配置](../Configuration/configuration.md#Overlord)
对于Apache Druid的Overlord进程配置详见 [Overlord配置](../configuration/human-readable-byte.md#Overlord)
### HTTP
对于Overlord的API接口详见 [Overlord API](../operations/api.md#Overlord)

View File

@ -13,7 +13,7 @@
## Peons
### 配置
对于Apache Druid Peon配置可以参见 [Peon查询配置](../Configuration/configuration.md) 和 [额外的Peon配置](../Configuration/configuration.md)
对于Apache Druid Peon配置可以参见 [Peon查询配置](../configuration/human-readable-byte.md) 和 [额外的Peon配置](../configuration/human-readable-byte.md)
### HTTP
对于Peon的API接口详见 [Peon API](../operations/api.md#Peon)

View File

@ -108,7 +108,7 @@ Coordinator进程的工作负载往往随着集群中段的数量而增加。Ove
通过设置 `druid.Coordinator.asOverlord.enabled` 属性Coordinator进程和Overlord进程可以作为单个组合进程运行。
有关详细信息,请参阅[Coordinator配置](../Configuration/configuration.md#Coordinator)。
有关详细信息,请参阅[Coordinator配置](../configuration/human-readable-byte.md#Coordinator)。
#### Historical和MiddleManager

View File

@ -24,7 +24,7 @@ Apache Druid Router用于将查询路由到不同的Broker。默认情况下B
### 配置
对于Apache Druid Router的配置请参见 [Router 配置](../Configuration/configuration.md#Router)
对于Apache Druid Router的配置请参见 [Router 配置](../configuration/human-readable-byte.md#Router)
### HTTP

View File

@ -19,7 +19,7 @@ Druid 包含有一组可用的参考配置和用于单机部署的启动脚本
相关的内容请参考 [Coordinator configuration documentation](../configuration/index.md#coordinator-operation) 页面中的内容。
我们虽然为大型单台计算机提供了配置的实例,但是在更加真实和大数据的环境下,我们建议在集群方式下部署 Druid请参考 [clustered deployment](../tutorials/cluster.md) 页面中的内容。
通过集群式的部署,能够更好的增加的 Druid 容错能力和扩展能力。
通过集群式的部署,能够更好的增加的 Druid 容错能力和扩展能力。
### Nano-Quickstart: 1 CPU, 4GiB RAM

View File

@ -313,18 +313,18 @@ Double/Float/Long/String的ANY聚合器不能够使用在摄入规范中
**Apache DataSketches Theta Sketch**
聚合器提供的[DataSketches Theta Sketch扩展](../Configuration/core-ext/datasketches-theta.md) 使用[Apache Datasketches库](https://datasketches.apache.org/) 中的Theta Sketch提供不同的计数估计并支持集合并集、交集和差分后置聚合器。
聚合器提供的[DataSketches Theta Sketch扩展](../configuration/core-ext/datasketches-theta.md) 使用[Apache Datasketches库](https://datasketches.apache.org/) 中的Theta Sketch提供不同的计数估计并支持集合并集、交集和差分后置聚合器。
**Apache DataSketches HLL Sketch**
聚合器提供的[DataSketches HLL Sketch扩展](../Configuration/core-ext/datasketches-hll.md)使用HyperLogLog算法给出不同的计数估计。
聚合器提供的[DataSketches HLL Sketch扩展](../configuration/core-ext/datasketches-hll.md)使用HyperLogLog算法给出不同的计数估计。
与Theta草图相比HLL草图不支持set操作更新和合并速度稍慢但需要的空间要少得多
**Cardinality, hyperUnique**
> [!WARNING]
> 对于新的场景,我们推荐评估使用 [DataSketches Theta Sketch扩展](../Configuration/core-ext/datasketches-theta.md) 和 [DataSketches HLL Sketch扩展](../Configuration/core-ext/datasketches-hll.md) 来替代。 DataSketch聚合器通常情况下比经典的Druid `cardinality``hyperUnique` 聚合器提供更弹性的和更好的精确度。
> 对于新的场景,我们推荐评估使用 [DataSketches Theta Sketch扩展](../configuration/core-ext/datasketches-theta.md) 和 [DataSketches HLL Sketch扩展](../configuration/core-ext/datasketches-hll.md) 来替代。 DataSketch聚合器通常情况下比经典的Druid `cardinality``hyperUnique` 聚合器提供更弹性的和更好的精确度。
Cardinality和HyperUnique聚合器是在Druid中默认提供的较旧的聚合器实现它们还使用HyperLogLog算法提供不同的计数估计。较新的数据集Theta和HLL扩展提供了上述聚合器具有更高的精度和性能因此建议改为使用。

522
querying/dimensionspecs.md Normal file
View File

@ -0,0 +1,522 @@
# 查询维度
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql.md#scalar-functions).
The following JSON fields can be used in a query to operate on dimension values.
## DimensionSpec
`DimensionSpec`s define how dimension values get transformed prior to aggregation.
### Default DimensionSpec
Returns dimension values as is and optionally renames the dimension.
```json
{
"type" : "default",
"dimension" : <dimension>,
"outputName": <output_name>,
"outputType": <"STRING"|"LONG"|"FLOAT">
}
```
When specifying a DimensionSpec on a numeric column, the user should include the type of the column in the `outputType` field. If left unspecified, the `outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.
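As an illustration, a default DimensionSpec with an explicit `outputType` might be embedded in a native groupBy query as follows. The datasource, dimension, output name, and interval are placeholders invented for this example; only the DimensionSpec field names come from the definition above.

```python
import json

# Default DimensionSpec for a numeric column. "user_id" and "user" are
# hypothetical names chosen for this example.
dimension_spec = {
    "type": "default",
    "dimension": "user_id",
    "outputName": "user",
    "outputType": "LONG",  # without this, the output type defaults to STRING
}

# A minimal groupBy query using the spec; dataSource and intervals are placeholders.
query = {
    "queryType": "groupBy",
    "dataSource": "sample_datasource",
    "granularity": "all",
    "intervals": ["2015-01-01/2015-01-02"],
    "dimensions": [dimension_spec],
    "aggregations": [{"type": "count", "name": "rows"}],
}

if __name__ == "__main__":
    print(json.dumps(query, indent=2))
```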
### Extraction DimensionSpec
Returns dimension values transformed using the given [extraction function](#extraction-functions).
```json
{
"type" : "extraction",
"dimension" : <dimension>,
"outputName" : <output_name>,
"outputType": <"STRING"|"LONG"|"FLOAT">,
"extractionFn" : <extraction_function>
}
```
`outputType` may also be specified in an ExtractionDimensionSpec to apply type conversion to results before merging. If left unspecified, the `outputType` defaults to STRING.
Please refer to the [Output Types](#output-types) section for more details.
### Filtered DimensionSpecs
These are only useful for multi-value dimensions. Suppose a row in Apache Druid has a multi-value dimension with values ["v1", "v2", "v3"], and you send a groupBy/topN query that groups by that dimension with a [query filter](filters.md) for value "v1". The response will contain 3 rows, for "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
It happens because the "query filter" is applied internally on the bitmaps and is only used to select the rows that are included in query result processing. With multi-value dimensions, the "query filter" behaves like a contains check, which matches the row with dimension value ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
The groupBy/topN processing pipeline then "explodes" all multi-value dimensions, resulting in 3 rows, one each for "v1", "v2" and "v3".
In addition to "query filter" which efficiently selects the rows to be processed, you can use the filtered dimension spec to filter for specific values within the values of a multi-value dimension. These dimensionSpecs take a delegate DimensionSpec and a filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.
The following filtered dimension spec acts as a whitelist or blacklist for values as per the "isWhitelist" attribute value.
```json
{ "type" : "listFiltered", "delegate" : <dimensionSpec>, "values": <array of strings>, "isWhitelist": <optional attribute for true/false, default is true> }
```
The following filtered dimension spec retains only the values matching a regex. Note that `listFiltered` is faster than this, so use it for whitelist or blacklist use cases.
```json
{ "type" : "regexFiltered", "delegate" : <dimensionSpec>, "pattern": <java regex pattern> }
```
The following filtered dimension spec retains only the values starting with the given prefix.
```json
{ "type" : "prefixFiltered", "delegate" : <dimensionSpec>, "prefix": <prefix string> }
```
For more details and examples, see [multi-value dimensions](multi-value-dimensions.md).
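The interaction between the query filter, the "explode" step, and the filtered dimension specs can be simulated in plain Python. This models the behavior described above; it is not Druid code. The function names are invented, and treating `regexFiltered` as a full match of the value is an assumption.

```python
import re


def explode(row_values, query_filter_value):
    """A row passes the filter if it contains the value; every value then becomes a row."""
    if query_filter_value not in row_values:
        return []
    return list(row_values)


def list_filtered(values, listed, is_whitelist=True):
    """Keep listed values (whitelist) or drop them (blacklist)."""
    listed = set(listed)
    return [v for v in values if (v in listed) == is_whitelist]


def regex_filtered(values, pattern):
    """Keep values matching the pattern (full match assumed for this sketch)."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v)]


def prefix_filtered(values, prefix):
    """Keep values that start with the given prefix."""
    return [v for v in values if v.startswith(prefix)]


if __name__ == "__main__":
    exploded = explode(["v1", "v2", "v3"], "v1")
    print(exploded)                                             # all three values
    print(list_filtered(exploded, ["v1"]))                      # whitelist
    print(list_filtered(exploded, ["v1"], is_whitelist=False))  # blacklist
```

The first call shows why a plain query filter for "v1" still yields rows for "v2" and "v3", and the `list_filtered` calls show how the filtered spec narrows the exploded rows.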
### Lookup DimensionSpecs
> Lookups are an [experimental](../development/experimental.md) feature.
Lookup DimensionSpecs can be used to define a lookup implementation directly as a dimension spec.
Generally speaking, there are two kinds of lookup implementations.
The first kind is passed at query time, like the `map` implementation.
```json
{
"type":"lookup",
"dimension":"dimensionName",
"outputName":"dimensionOutputName",
"replaceMissingValueWith":"missing_value",
"retainMissingValue":false,
"lookup":{"type": "map", "map":{"key":"value"}, "isOneToOne":false}
}
```
The `retainMissingValue` and `replaceMissingValueWith` properties can be specified at query time to hint how to handle missing values. Setting `replaceMissingValueWith` to `""` has the same effect as setting it to `null` or omitting the property.
Setting `retainMissingValue` to true will use the dimension's original value if it is not found in the lookup.
The default values are `replaceMissingValueWith = null` and `retainMissingValue = false` which causes missing values to be treated as missing.
It is illegal to set `retainMissingValue = true` and also specify a `replaceMissingValueWith`.
A property `optimize` can be supplied to allow optimization of lookup based extraction filter (by default `optimize = true`).
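The missing-value rules for a `map` lookup can be summarized with a small model. It is a sketch of the semantics described above, written with an invented function name, and it represents a missing result as `None`.

```python
def apply_map_lookup(value, mapping, retain_missing_value=False,
                     replace_missing_value_with=None):
    """Model of how a map lookup treats values that are absent from the map."""
    # An empty string behaves like null or an omitted property.
    if replace_missing_value_with == "":
        replace_missing_value_with = None
    # Setting both options at once is illegal.
    if retain_missing_value and replace_missing_value_with is not None:
        raise ValueError(
            "retainMissingValue=true cannot be combined with replaceMissingValueWith"
        )
    if value in mapping:
        return mapping[value]
    if retain_missing_value:
        return value  # keep the dimension's original value
    return replace_missing_value_with  # None means "treated as missing"


if __name__ == "__main__":
    lookup = {"key": "value"}
    print(apply_map_lookup("key", lookup))
    print(apply_map_lookup("other", lookup, retain_missing_value=True))
    print(apply_map_lookup("other", lookup, replace_missing_value_with="missing_value"))
```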
The second kind cannot be passed at query time because of its size; it is based on an external lookup table or resource that is already registered via a configuration file and/or the Coordinator.
```json
{
"type":"lookup",
"dimension":"dimensionName",
"outputName":"dimensionOutputName",
"name":"lookupName"
}
```
## Output Types
The dimension specs provide an option to specify the output type of a column's values. This is necessary because a column with a given name can have different value types in different segments; results are converted to the type specified by `outputType` before merging.
Note that not all use cases for DimensionSpec currently support `outputType`. The table below shows which use cases support this option:
|Query Type|Supported?|
|--------|---------|
|GroupBy (v1)|no|
|GroupBy (v2)|yes|
|TopN|yes|
|Search|no|
|Select|no|
|Cardinality Aggregator|no|
## Extraction Functions
Extraction functions define the transformation applied to each dimension value.
Transformations can be applied to both regular (string) dimensions, as well
as the special `__time` dimension, which represents the current time bucket
according to the query [aggregation granularity](../querying/granularities.md).
**Note**: for functions taking string values (such as regular expressions),
`__time` dimension values will be formatted in [ISO-8601 format](https://en.wikipedia.org/wiki/ISO_8601)
before getting passed to the extraction function.
### Regular Expression Extraction Function
Returns the first matching group for the given regular expression.
If there is no match, it returns the dimension value as is.
```json
{
"type" : "regex",
"expr" : <regular_expression>,
"index" : <group to extract, default 1>,
"replaceMissingValue" : true,
"replaceMissingValueWith" : "foobar"
}
```
For example, using `"expr" : "(\\w\\w\\w).*"` will transform
`'Monday'`, `'Tuesday'`, `'Wednesday'` into `'Mon'`, `'Tue'`, `'Wed'`.
If "index" is set, it will control which group from the match to extract. Index zero extracts the string matching the
entire pattern.
If the `replaceMissingValue` property is true, the extraction function will transform dimension values that do not match the regex pattern to a user-specified String. Default value is `false`.
The `replaceMissingValueWith` property sets the String that unmatched dimension values will be replaced with, if `replaceMissingValue` is true. If `replaceMissingValueWith` is not specified, unmatched dimension values will be replaced with nulls.
For example, if `expr` is `"(a\w+)"` in the example JSON above, a regex that matches words starting with the letter `a`, the extraction function will convert a dimension value like `banana` to `foobar`.
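The interplay of `index`, `replaceMissingValue` and `replaceMissingValueWith` can be sketched in Python. This is an illustration of the documented behavior only, not Druid's implementation: the helper name `regex_extract` and its parameter names are hypothetical. Whether the pattern has to match from the start of the value is not stated here, so the sketch assumes search semantics via `re.search`; the values used below are ones where that choice does not matter.

```python
import re

def regex_extract(value, expr, index=1, replace_missing_value=False,
                  replace_missing_value_with=None):
    """Hypothetical helper mirroring the documented "regex" behavior."""
    # Assumption: the pattern may match anywhere in the value.
    match = re.search(expr, value) if value is not None else None
    if match:
        # Index 0 is the whole match, index 1 the first capture group, and so on.
        return match.group(index)
    if replace_missing_value:
        # Unmatched values become the replacement (None when none is given).
        return replace_missing_value_with
    # Default: unmatched values are returned as is.
    return value

days = ["Monday", "Tuesday", "Wednesday"]
abbreviated = [regex_extract(day, r"(\w\w\w).*") for day in days]
unmatched_kept = regex_extract("abc", r"(\d+)")
unmatched_replaced = regex_extract("abc", r"(\d+)", replace_missing_value=True,
                                   replace_missing_value_with="foobar")
```

Here `abbreviated` holds the three-letter day names, `unmatched_kept` is the original value, and `unmatched_replaced` is the user-specified string.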
### Partial Extraction Function
Returns the dimension value unchanged if the regular expression matches, otherwise returns null.
```json
{ "type" : "partial", "expr" : <regular_expression> }
```
### Search query extraction function
Returns the dimension value unchanged if the given [`SearchQuerySpec`](../querying/searchquery.md#searchqueryspec)
matches, otherwise returns null.
```json
{ "type" : "searchQuery", "query" : <search_query_spec> }
```
### Substring Extraction Function
Returns a substring of the dimension value starting from the supplied index and of the desired length. Both index
and length are measured in the number of Unicode code units present in the string as if it were encoded in UTF-16.
Note that some Unicode characters may be represented by two code units. This is the same behavior as the Java String
class's "substring" method.
If the desired length exceeds the length of the dimension value, the remainder of the string starting at index will
be returned. If index is greater than the length of the dimension value, null will be returned.
```json
{ "type" : "substring", "index" : 1, "length" : 4 }
```
If `length` is omitted, substring returns the remainder of the dimension value starting from index,
or null if index is greater than the length of the dimension value.
```json
{ "type" : "substring", "index" : 3 }
```
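Because index and length count UTF-16 code units rather than characters, a rough Python sketch may help show the difference. The helper `substring_utf16` is hypothetical and follows the text above literally (null only when index is greater than the length); the exact behavior when index equals the length is not covered here.

```python
def substring_utf16(value, index, length=None):
    """Hypothetical helper: substring measured in UTF-16 code units."""
    if value is None:
        return None
    units = value.encode("utf-16-le")  # two bytes per UTF-16 code unit
    total = len(units) // 2
    if index > total:
        return None  # index lies past the end of the value
    # A missing or too-large length means "up to the end of the value".
    end = total if length is None else min(index + length, total)
    # surrogatepass keeps decoding from failing if a slice splits a surrogate pair.
    return units[2 * index:2 * end].decode("utf-16-le", errors="surrogatepass")

middle = substring_utf16("hello", 1, 4)
tail = substring_utf16("hello", 3)
emoji = substring_utf16("a\U0001F600b", 1, 2)  # the emoji occupies two code units
```

The last call needs a length of 2 to return the single emoji character, since characters outside the Basic Multilingual Plane take two code units.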
### Strlen Extraction Function
Returns the length of dimension values, as measured in the number of Unicode code units present in the string as if it
were encoded in UTF-16. Note that some Unicode characters may be represented by two code units. This is the same
behavior as the Java String class's "length" method.
Null strings are treated as having zero length.
```json
{ "type" : "strlen" }
```
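As a small illustration of counting code units rather than characters, the following Python sketch computes the same kind of length. The helper name `strlen_utf16` is hypothetical and is not part of Druid.

```python
def strlen_utf16(value):
    """Hypothetical helper: length in UTF-16 code units, with null counted as 0."""
    if value is None:
        return 0
    # Each UTF-16 code unit is two bytes in the little-endian encoding.
    return len(value.encode("utf-16-le")) // 2

plain = strlen_utf16("druid")
with_emoji = strlen_utf16("a\U0001F600")  # one BMP character plus one surrogate pair
```

In this sketch `with_emoji` is 3 even though the string has two visible characters.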
### Time Format Extraction Function
Returns the dimension value formatted according to the given format string, time zone, and locale.
For `__time` dimension values, this formats the time value bucketed by the
[aggregation granularity](../querying/granularities.md).
For a regular dimension, it assumes the string is formatted in
[ISO-8601 date and time format](https://en.wikipedia.org/wiki/ISO_8601).
* `format` : date time format for the resulting dimension value, in [Joda Time DateTimeFormat](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html), or null to use the default ISO8601 format.
* `locale` : locale (language and country) to use, given as an [IETF BCP 47 language tag](http://www.oracle.com/technetwork/java/javase/java8locales-2095355.html#util-text), e.g. `en-US`, `en-GB`, `fr-FR`, `fr-CA`, etc.
* `timeZone` : time zone to use in [IANA tz database format](http://en.wikipedia.org/wiki/List_of_tz_database_time_zones), e.g. `Europe/Berlin` (this can differ from the aggregation time zone)
* `granularity` : [granularity](granularities.md) to apply before formatting, or omit to not apply any granularity.
* `asMillis` : boolean value, set to true to treat input strings as millis rather than ISO8601 strings. Additionally, if `format` is null or not specified, output will be in millis rather than ISO8601.
```json
{ "type" : "timeFormat",
"format" : <output_format> (optional),
"timeZone" : <time_zone> (optional, default UTC),
"locale" : <locale> (optional, default current locale),
"granularity" : <granularity> (optional, default none),
"asMillis" : <true or false> (optional) }
```
For example, the following dimension spec returns the day of the week for Montréal in French:
```json
{
"type" : "extraction",
"dimension" : "__time",
"outputName" : "dayOfWeek",
"extractionFn" : {
"type" : "timeFormat",
"format" : "EEEE",
"timeZone" : "America/Montreal",
"locale" : "fr"
}
}
```
### Time Parsing Extraction Function
Parses dimension values as timestamps using the given input format,
and returns them formatted using the given output format.
Note, if you are working with the `__time` dimension, you should consider using the
[time format extraction function](#time-format-extraction-function) instead,
which works on time values directly as opposed to string values.
If "joda" is true, time formats are described in the [Joda DateTimeFormat documentation](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html).
If "joda" is false (or unspecified) then formats are described in the [SimpleDateFormat documentation](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/SimpleDateFormat.html).
In general, we recommend setting "joda" to true since Joda format strings are more common in Druid APIs and since Joda handles certain edge cases (like weeks and weekyears near
the start and end of calendar years) in a more ISO8601 compliant way.
If a value cannot be parsed using the provided timeFormat, it will be returned as-is.
```json
{ "type" : "time",
"timeFormat" : <input_format>,
"resultFormat" : <output_format>,
"joda" : <true, false> }
```
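The parse-then-reformat behavior, including the fallback for unparseable values, can be sketched in Python as follows. This is only an analogy: the helper `time_parse` is hypothetical, and it uses Python `strptime`/`strftime` directives, which are not the Joda or SimpleDateFormat patterns that Druid expects.

```python
from datetime import datetime

def time_parse(value, time_format, result_format):
    """Hypothetical helper: parse with one format, emit with another."""
    try:
        parsed = datetime.strptime(value, time_format)
    except (TypeError, ValueError):
        # A value that cannot be parsed is returned as is.
        return value
    return parsed.strftime(result_format)

reformatted = time_parse("2012-01-03", "%Y-%m-%d", "%d/%m/%Y")
untouched = time_parse("not a date", "%Y-%m-%d", "%d/%m/%Y")
```

The second call shows the fallback: the input does not match the input format, so it passes through unchanged.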
### JavaScript Extraction Function
Returns the dimension value, as transformed by the given JavaScript function.
For regular dimensions, the input value is passed as a string.
For the `__time` dimension, the input value is passed as a number
representing the number of milliseconds since January 1, 1970 UTC.
Example for a regular dimension:
```json
{
"type" : "javascript",
"function" : "function(str) { return str.substr(0, 3); }"
}
```
```json
{
"type" : "javascript",
"function" : "function(str) { return str + '!!!'; }",
"injective" : true
}
```
The `injective` property specifies whether the JavaScript function preserves uniqueness. The default value is `false`, meaning uniqueness is not preserved.
Example for the `__time` dimension:
```json
{
"type" : "javascript",
"function" : "function(t) { return 'Second ' + Math.floor((t % 60000) / 1000); }"
}
```
> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
### Registered lookup extraction function
Lookups are a concept in Druid where dimension values are (optionally) replaced with new values.
For more documentation on using lookups, please see [Lookups](../querying/lookups.md).
The "registeredLookup" extraction function lets you refer to a lookup that has been registered in the cluster-wide
configuration.
An example:
```json
{
"type":"registeredLookup",
"lookup":"some_lookup_name",
"retainMissingValue":true
}
```
The `retainMissingValue` and `replaceMissingValueWith` properties can be specified at query time to hint how to handle
missing values. Setting `replaceMissingValueWith` to `""` has the same effect as setting it to `null` or omitting the
property. Setting `retainMissingValue` to true will use the dimension's original value if it is not found in the lookup.
The default values are `replaceMissingValueWith = null` and `retainMissingValue = false`, which causes missing values to
be treated as missing.
It is illegal to set `retainMissingValue = true` and also specify a `replaceMissingValueWith`.
The `injective` property can override the lookup's own sense of whether or not it is
[injective](lookups.md#query-execution). If left unspecified, Druid will use the registered cluster-wide lookup
configuration.
The `optimize` property can be supplied to allow optimization of lookup-based extraction filters (by default `optimize = true`).
The optimization layer runs on the Broker and rewrites the extraction filter as a clause of selector filters.
For instance, the following filter
```json
{
"filter": {
"type": "selector",
"dimension": "product",
"value": "bar_1",
"extractionFn": {
"type": "registeredLookup",
"optimize": true,
"lookup": "some_lookup_name"
}
}
}
```
will be rewritten as the following simpler filter, assuming a lookup that maps "product_1" and "product_3" to the value
"bar_1":
```json
{
"filter":{
"type":"or",
"fields":[
{
"filter":{
"type":"selector",
"dimension":"product",
"value":"product_1"
}
},
{
"filter":{
"type":"selector",
"dimension":"product",
"value":"product_3"
}
}
]
}
}
```
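The idea behind this rewrite can be sketched in Python: invert the lookup for the filtered value and emit one plain selector per matching key. This is a simplified illustration under stated assumptions, not the Broker's actual optimization code. The function `rewrite_selector` and the sample `lookup` contents are hypothetical, and the sketch emits the selectors directly rather than the nested wrapper objects shown in the JSON above.

```python
def rewrite_selector(dimension, value, lookup_map):
    """Hypothetical sketch: turn a lookup-based selector into an OR of selectors."""
    # Collect every lookup key that maps to the value being filtered on.
    keys = sorted(key for key, mapped in lookup_map.items() if mapped == value)
    return {
        "type": "or",
        "fields": [
            {"type": "selector", "dimension": dimension, "value": key}
            for key in keys
        ],
    }

# Assumed lookup contents, chosen to match the example in the text.
lookup = {"product_1": "bar_1", "product_2": "bar_2", "product_3": "bar_1"}
rewritten = rewrite_selector("product", "bar_1", lookup)
```

After the rewrite, the filter no longer needs the extraction function, because it compares the raw dimension values directly.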
A null dimension value can be mapped to a specific value by specifying the empty string as the key in your lookup file.
This allows distinguishing between a null dimension and a lookup resulting in a null.
For example, specifying `{"":"bar","bat":"baz"}` with dimension values `[null, "foo", "bat"]` and replacing missing values with `"oof"` will yield results of `["bar", "oof", "baz"]`.
Omitting the empty string key will cause the missing value to take over. For example, specifying `{"bat":"baz"}` with dimension values `[null, "foo", "bat"]` and replacing missing values with `"oof"` will yield results of `["oof", "oof", "baz"]`.
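The rules for missing values and the empty-string key can be checked against a short Python sketch. The helper `apply_lookup` is hypothetical and only models the behavior described in this section; it is not Druid code.

```python
def apply_lookup(value, lookup_map, retain_missing_value=False,
                 replace_missing_value_with=None):
    """Hypothetical model of the documented lookup missing-value rules."""
    # An empty replacement string behaves like null or an omitted property.
    if replace_missing_value_with == "":
        replace_missing_value_with = None
    if retain_missing_value and replace_missing_value_with is not None:
        raise ValueError(
            "retainMissingValue = true cannot be combined with replaceMissingValueWith")
    # A null dimension value is looked up under the empty-string key.
    key = "" if value is None else value
    if key in lookup_map:
        return lookup_map[key]
    # Not found: keep the original value or fall back to the replacement.
    return value if retain_missing_value else replace_missing_value_with

values = [None, "foo", "bat"]
with_empty_key = [apply_lookup(v, {"": "bar", "bat": "baz"},
                               replace_missing_value_with="oof") for v in values]
without_empty_key = [apply_lookup(v, {"bat": "baz"},
                                  replace_missing_value_with="oof") for v in values]
```

The two result lists correspond to the two examples given in the text above.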
### Inline lookup extraction function
Lookups are a concept in Druid where dimension values are (optionally) replaced with new values.
For more documentation on using lookups, please see [Lookups](../querying/lookups.md).
The "lookup" extraction function lets you specify an inline lookup map without registering one in the cluster-wide
configuration.
Examples:
```json
{
"type":"lookup",
"lookup":{
"type":"map",
"map":{"foo":"bar", "baz":"bat"}
},
"retainMissingValue":true,
"injective":true
}
```
```json
{
"type":"lookup",
"lookup":{
"type":"map",
"map":{"foo":"bar", "baz":"bat"}
},
"retainMissingValue":false,
"injective":false,
"replaceMissingValueWith":"MISSING"
}
```
The inline lookup should be of type `map`.
The properties `retainMissingValue`, `replaceMissingValueWith`, `injective`, and `optimize` behave similarly to the
[registered lookup extraction function](#registered-lookup-extraction-function).
### Cascade Extraction Function
Provides chained execution of extraction functions.
The `extractionFns` property contains an array of extraction functions, which are executed in array index order.
The following example chains a [regular expression extraction function](#regular-expression-extraction-function), a [JavaScript extraction function](#javascript-extraction-function), and a [substring extraction function](#substring-extraction-function):
```json
{
"type" : "cascade",
"extractionFns": [
{
"type" : "regex",
"expr" : "/([^/]+)/",
"replaceMissingValue": false,
"replaceMissingValueWith": null
},
{
"type" : "javascript",
"function" : "function(str) { return \"the \".concat(str) }"
},
{
"type" : "substring",
"index" : 0, "length" : 7
}
]
}
```
It transforms dimension values with the specified extraction functions, in the order given.
For example, `'/druid/prod/historical'` is transformed to `'the dru'`: the regular expression extraction function first transforms it to `'druid'`, the JavaScript extraction function then transforms that to `'the druid'`, and finally the substring extraction function transforms it to `'the dru'`.
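The chain can be traced step by step with a small Python sketch. The three step functions are hypothetical stand-ins for the regex, JavaScript and substring functions in the JSON above; only the order of application is the point.

```python
import re

def regex_fn(value):
    # Stand-in for the "regex" step: first capture group of /([^/]+)/.
    match = re.search(r"/([^/]+)/", value)
    return match.group(1) if match else value

def prefix_fn(value):
    # Stand-in for the "javascript" step: prepend "the ".
    return "the " + value

def substring_fn(value):
    # Stand-in for the "substring" step: index 0, length 7.
    return value[0:7]

def cascade(value, extraction_fns):
    """Apply each function to the previous result, in list order."""
    for fn in extraction_fns:
        value = fn(value)
    return value

result = cascade("/druid/prod/historical", [regex_fn, prefix_fn, substring_fn])
```

Reordering the list changes the result, which is why the array index order matters.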
### String Format Extraction Function
Returns the dimension value formatted according to the given format string.
```json
{ "type" : "stringFormat", "format" : <sprintf_expression>, "nullHandling" : <optional attribute for handling null value> }
```
For example, to wrap the dimension value in "[" and "]", specify "[%s]" as the format string. "nullHandling" can be one of `nullString`, `emptyString` or `returnNull`. With the "[%s]" format, a null value results in `[null]`, `[]`, and `null` respectively. The default is `nullString`.
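The three `nullHandling` modes can be compared side by side in a short Python sketch. The helper `string_format` is hypothetical, and it uses Python's `%` operator as a stand-in for the sprintf-style formatting, which matches only for simple `%s` patterns like the one used here.

```python
def string_format(value, fmt, null_handling="nullString"):
    """Hypothetical model of "stringFormat" with the three nullHandling modes."""
    if value is None:
        if null_handling == "returnNull":
            return None
        if null_handling == "emptyString":
            value = ""
        elif null_handling == "nullString":
            value = "null"
        else:
            raise ValueError("unknown nullHandling: " + str(null_handling))
    # Assumption: Python %-formatting stands in for the sprintf expression.
    return fmt % (value,)

regular = string_format("abc", "[%s]")
null_results = [string_format(None, "[%s]", mode)
                for mode in ("nullString", "emptyString", "returnNull")]
```

`null_results` lists what a null dimension value becomes under each mode, in the order the modes are named in the text.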
### Upper and Lower extraction functions
Returns the dimension values as all upper case or lower case.
Optionally, you can specify the language to use for the upper or lower transformation:
```json
{
"type" : "upper",
"locale":"fr"
}
```
or without setting "locale" (in this case, the current default locale of this instance of the Java Virtual Machine is used):
```json
{
"type" : "lower"
}
```
### Bucket Extraction Function
The bucket extraction function buckets numerical values into ranges of the given size by converting each value to the base value of its range. Non-numeric values are converted to null.
* `size` : the size of the buckets (optional, default 1)
* `offset` : the offset for the buckets (optional, default 0)
The following extraction function creates buckets of 5 starting from 2. In this case, values in the range of [2, 7) will be converted to 2, values in [7, 12) will be converted to 7, etc.
```json
{
"type" : "bucket",
"size" : 5,
"offset" : 2
}
```
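The bucketing arithmetic can be written out as a short Python sketch. The helper `bucket` is hypothetical, and the formula is inferred from the ranges given above rather than taken from Druid's source; the type of the returned value in Druid may differ.

```python
import math

def bucket(value, size=1, offset=0):
    """Hypothetical model: map a numeric value to the base of its bucket."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None  # non-numeric values become null
    if not math.isfinite(number):
        return None  # NaN and infinities are treated as non-numeric here
    # Shift by the offset, round down to a multiple of size, then shift back.
    return math.floor((number - offset) / size) * size + offset

bases = [bucket(v, size=5, offset=2) for v in ["2", "6.9", "7", "11.9", "abc"]]
```

With `size` 5 and `offset` 2, the values in [2, 7) map to 2 and the values in [7, 12) map to 7, matching the ranges described above.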

View File

@ -127,7 +127,7 @@ Druid的原生类型系统允许字符串可能有多个值。这些 [多值维
#### NULL
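The `druid.generic.useDefaultValueForNull` [runtime property](../Configuration/configuration.md#SQL兼容的空值处理) controls Druid's NULL handling mode.
The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/human-readable-byte.md#SQL兼容的空值处理) controls Druid's NULL handling mode.
In the default mode (`true`), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs. For example, the expressions `col IS NULL` and `col = ''` are equivalent, and both evaluate to true if `col` contains an empty string. Similarly, the expression `COALESCE(col1, col2)` returns `col2` if `col1` is an empty string. While the `COUNT(*)` aggregator counts all rows, the `COUNT(expr)` aggregator counts the number of rows where expr is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing values are treated as zeroes.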
@ -148,23 +148,23 @@ Druid的原生类型系统允许字符串可能有多个值。这些 [多值维
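| `MAX(expr)` | Takes the maximum of numbers. |
| `AVG(expr)` | Averages numbers. |
| `APPROX_COUNT_DISTINCT(expr)` | Counts distinct values of expr, which can be a regular column or a hyperUnique column. This is always approximate, regardless of the value of "useApproximateCountDistinct". This function uses Druid's built-in "cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT expr)`. |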
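| `APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])` | Counts distinct values of expr, which can be a regular column or an [HLL sketch](../Configuration/core-ext/datasketches-hll.md) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])` | Counts distinct values of expr, which can be a regular column or a [Theta sketch](../Configuration/core-ext/datasketches-theta.md) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `DS_HLL(expr, [lgK, tgtHllType])` | Creates an [`HLL sketch`](../Configuration/core-ext/datasketches-hll.md) on the values of expr, which can be a regular column or a column containing HLL sketches. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `DS_THETA(expr, [size])` | Creates a [`Theta sketch`](../Configuration/core-ext/datasketches-theta.md) on the values of expr, which can be a regular column or a column containing Theta sketches. The `size` parameter is described in the Theta sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `APPROX_QUANTILE(expr, probability, [resolution])` | Computes approximate quantiles on numeric or [approximate histogram](../Configuration/core-ext/approximate-histograms.md) expressions. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation; a higher resolution gives more precise results. The default is 50. The [approximate histogram extension](../Configuration/core-ext/approximate-histograms.md) must be loaded to use this function. |
| `APPROX_QUANTILE_DS(expr, probability, [k])` | Computes approximate quantiles on numeric or [Quantiles sketch](../Configuration/core-ext/datasketches-quantiles.md) expressions. The "probability" should be between 0 and 1 (exclusive). The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])` | Computes approximate quantiles on numeric or [fixed buckets histogram](../Configuration/core-ext/approximate-histograms.md) expressions. The "probability" should be between 0 and 1 (exclusive). The `numBuckets`, `lowerLimit`, `upperLimit` and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. The [approximate histogram extension](../Configuration/core-ext/approximate-histograms.md) must be loaded to use this function. |
| `DS_QUANTILES_SKETCH(expr, [k])` | Creates a [`Quantiles sketch`](../Configuration/core-ext/datasketches-quantiles.md) on the values of expr, which can be a regular column or a column containing Quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `BLOOM_FILTER(expr, numEntries)` | Computes a bloom filter from the values produced by `expr`, where `numEntries` is the maximum number of distinct values before the false positive rate increases. See the [Bloom filter extension](../Configuration/core-ext/bloom-filter.md) for details. |
| `TDIGEST_QUANTILE(expr, quantileFraction, [compression])` | Builds a T-Digest sketch on the values produced by `expr` and returns the value for the quantile. The "compression" parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store the sketch. See the [t-digest extension documentation](../Configuration/core-ext/tdigestsketch-quantiles.md) for more details. |
| `TDIGEST_GENERATE_SKETCH(expr, [compression])` | Builds a T-Digest sketch on the values produced by `expr`. The "compression" parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store the sketch. See the [t-digest extension documentation](../Configuration/core-ext/tdigestsketch-quantiles.md) for more details. |
| `VAR_POP(expr)` | Computes the population variance of `expr`. See the [stats extension documentation](../Configuration/core-ext/stats.md) for additional information. |
| `VAR_SAMP(expr)` | Computes the sample variance of the expression. See the [stats extension documentation](../Configuration/core-ext/stats.md) for additional information. |
| `VARIANCE(expr)` | Computes the sample variance of the expression. See the [stats extension documentation](../Configuration/core-ext/stats.md) for additional information. |
| `STDDEV_POP(expr)` | Computes the population standard deviation of `expr`. See the [stats extension documentation](../Configuration/core-ext/stats.md) for additional information. |
| `STDDEV_SAMP(expr)` | Computes the sample standard deviation of the expression. See the [stats extension documentation](../Configuration/core-ext/stats.md) for additional information. |
| `STDDEV(expr)` | Computes the sample standard deviation of the expression. See the [stats extension documentation](../Configuration/core-ext/stats.md) for additional information. |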
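| `APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])` | Counts distinct values of expr, which can be a regular column or an [HLL sketch](../configuration/core-ext/datasketches-hll.md) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])` | Counts distinct values of expr, which can be a regular column or a [Theta sketch](../configuration/core-ext/datasketches-theta.md) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `DS_HLL(expr, [lgK, tgtHllType])` | Creates an [`HLL sketch`](../configuration/core-ext/datasketches-hll.md) on the values of expr, which can be a regular column or a column containing HLL sketches. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `DS_THETA(expr, [size])` | Creates a [`Theta sketch`](../configuration/core-ext/datasketches-theta.md) on the values of expr, which can be a regular column or a column containing Theta sketches. The `size` parameter is described in the Theta sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `APPROX_QUANTILE(expr, probability, [resolution])` | Computes approximate quantiles on numeric or [approximate histogram](../configuration/core-ext/approximate-histograms.md) expressions. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation; a higher resolution gives more precise results. The default is 50. The [approximate histogram extension](../configuration/core-ext/approximate-histograms.md) must be loaded to use this function. |
| `APPROX_QUANTILE_DS(expr, probability, [k])` | Computes approximate quantiles on numeric or [Quantiles sketch](../configuration/core-ext/datasketches-quantiles.md) expressions. The "probability" should be between 0 and 1 (exclusive). The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])` | Computes approximate quantiles on numeric or [fixed buckets histogram](../configuration/core-ext/approximate-histograms.md) expressions. The "probability" should be between 0 and 1 (exclusive). The `numBuckets`, `lowerLimit`, `upperLimit` and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. The [approximate histogram extension](../configuration/core-ext/approximate-histograms.md) must be loaded to use this function. |
| `DS_QUANTILES_SKETCH(expr, [k])` | Creates a [`Quantiles sketch`](../configuration/core-ext/datasketches-quantiles.md) on the values of expr, which can be a regular column or a column containing Quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use this function. |
| `BLOOM_FILTER(expr, numEntries)` | Computes a bloom filter from the values produced by `expr`, where `numEntries` is the maximum number of distinct values before the false positive rate increases. See the [Bloom filter extension](../configuration/core-ext/bloom-filter.md) for details. |
| `TDIGEST_QUANTILE(expr, quantileFraction, [compression])` | Builds a T-Digest sketch on the values produced by `expr` and returns the value for the quantile. The "compression" parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store the sketch. See the [t-digest extension documentation](../configuration/core-ext/tdigestsketch-quantiles.md) for more details. |
| `TDIGEST_GENERATE_SKETCH(expr, [compression])` | Builds a T-Digest sketch on the values produced by `expr`. The "compression" parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store the sketch. See the [t-digest extension documentation](../configuration/core-ext/tdigestsketch-quantiles.md) for more details. |
| `VAR_POP(expr)` | Computes the population variance of `expr`. See the [stats extension documentation](../configuration/core-ext/stats.md) for additional information. |
| `VAR_SAMP(expr)` | Computes the sample variance of the expression. See the [stats extension documentation](../configuration/core-ext/stats.md) for additional information. |
| `VARIANCE(expr)` | Computes the sample variance of the expression. See the [stats extension documentation](../configuration/core-ext/stats.md) for additional information. |
| `STDDEV_POP(expr)` | Computes the population standard deviation of `expr`. See the [stats extension documentation](../configuration/core-ext/stats.md) for additional information. |
| `STDDEV_SAMP(expr)` | Computes the sample standard deviation of the expression. See the [stats extension documentation](../configuration/core-ext/stats.md) for additional information. |
| `STDDEV(expr)` | Computes the sample standard deviation of the expression. See the [stats extension documentation](../configuration/core-ext/stats.md) for additional information. |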
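| `EARLIEST(expr)` | Returns the earliest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource), then "earliest" is the value first encountered with the minimum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the first value encountered. |
| `EARLIEST(expr, maxBytesPerString)` | Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit are truncated. This parameter should be set as low as possible, since high values lead to wasted memory. |
| `LATEST(expr)` | Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource), then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered. |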
@ -326,7 +326,7 @@ Druid的原生类型系统允许字符串可能有多个值。这些 [多值维
**HLL sketch functions**
The following functions operate on [DataSketches HLL sketches](../Configuration/core-ext/datasketches-hll.md). The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use them.
The following functions operate on [DataSketches HLL sketches](../configuration/core-ext/datasketches-hll.md). The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use them.
| Function | Description |
|-|-|
@ -337,7 +337,7 @@ Druid的原生类型系统允许字符串可能有多个值。这些 [多值维
**Theta sketch functions**
The following functions operate on [theta sketches](../Configuration/core-ext/datasketches-theta.md). The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use them.
The following functions operate on [theta sketches](../configuration/core-ext/datasketches-theta.md). The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use them.
| Function | Description |
|-|-|
@ -349,7 +349,7 @@ Druid的原生类型系统允许字符串可能有多个值。这些 [多值维
**Quantiles sketch functions**
The following functions operate on [quantiles sketches](../Configuration/core-ext/datasketches-quantiles.md). The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use them.
The following functions operate on [quantiles sketches](../configuration/core-ext/datasketches-quantiles.md). The [DataSketches extension](../development/datasketches-extension.md) must be loaded to use them.
| Function | Description |
|-|-|
@ -370,7 +370,7 @@ Druid的原生类型系统允许字符串可能有多个值。这些 [多值维
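| `NULLIF(value1, value2)` | Returns NULL if value1 and value2 match, else returns value1. |
| `COALESCE(value1, value2, ...)` | Returns the first value that is neither NULL nor an empty string. |
| `NVL(expr,expr-for-null)` | Returns `expr-for-null` if 'expr' is null (or an empty string for string types). |
| `BLOOM_FILTER_TEST(<expr>, <serialized-filter>)` | Returns true if the value is contained in a Base64-serialized bloom filter. See the [Bloom filter extension](../Configuration/core-ext/bloom-filter.md) for details. |
| `BLOOM_FILTER_TEST(<expr>, <serialized-filter>)` | Returns true if the value is contained in a Base64-serialized bloom filter. See the [Bloom filter extension](../configuration/core-ext/bloom-filter.md) for details. |
### Multi-value string functions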
@ -435,7 +435,7 @@ DruidJoinQueryRel(condition=[=($1, $3)], joinType=[inner], query=[{"queryType":"
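Here there is a join with two inputs. The way to read this is to consider each line of the EXPLAIN plan output as something that might become a query, or might just become a simple datasource. The `query` field they all have is called a "partial query" and represents the query that would be run on the datasource represented by that line, if that line were run by itself. In some cases, like the "scan" query in the second line of this example, the query does not actually get run, and it ends up being translated to a simple table datasource. See the [Join translation](#连接) section for more details about how this works.
We can see this for ourselves using Druid's [request logging](../Configuration/configuration.md#请求日志) feature. After enabling logging and running this query, we can see that it actually runs as the following native query.
We can see this for ourselves using Druid's [request logging](../configuration/human-readable-byte.md#请求日志) feature. After enabling logging and running this query, we can see that it actually runs as the following native query.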
```json
{
@ -798,9 +798,9 @@ Servers表列出集群中发现的所有服务器
| `plaintext_port` | LONG | Unsecured port of the server, or -1 if plaintext traffic is disabled |
| `tls_port` | LONG | TLS port of the server, or -1 if TLS is disabled |
| `server_type` | STRING | Type of Druid service. Possible values include: COORDINATOR, OVERLORD, BROKER, ROUTER, HISTORICAL, MIDDLE_MANAGER or PEON |
| `tier` | STRING | Distribution tier, see [druid.server.tier](../Configuration/configuration.md#Historical). Only valid for Historicals; null for other types |
| `tier` | STRING | Distribution tier, see [druid.server.tier](../configuration/human-readable-byte.md#Historical). Only valid for Historicals; null for other types |
| `current_size` | LONG | Current size of segments in bytes on this server. Only valid for Historicals; 0 for other types |
| `max_size` | LONG | Maximum size in bytes this server recommends to assign to segments, see [druid.server.maxSize](../Configuration/configuration.md). Only valid for Historicals; 0 for other types |
| `max_size` | LONG | Maximum size in bytes this server recommends to assign to segments, see [druid.server.maxSize](../configuration/human-readable-byte.md). Only valid for Historicals; 0 for other types |
To retrieve information about all servers, use the query:
@ -879,8 +879,8 @@ SELECT * FROM sys.supervisors WHERE healthy=0;
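### Server configuration
Druid SQL planning occurs on the Broker and is configured by [Broker runtime properties](../Configuration/configuration.md#broker).
Druid SQL planning occurs on the Broker and is configured by [Broker runtime properties](../configuration/human-readable-byte.md#broker).
### Security
See [Defining SQL permissions](../Configuration/core-ext/druid-basic-security.md) in the basic security documentation for information on what permissions are needed for making SQL queries.
See [Defining SQL permissions](../configuration/core-ext/druid-basic-security.md) in the basic security documentation for information on what permissions are needed for making SQL queries.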

View File

@ -51,7 +51,7 @@ LongDouble和String类型都是支持的。 如果一个数字包括了小数
| `like` | like(expr, pattern[, escape]) is equivalent to SQL `expr LIKE pattern` |
| `case_searched` | case_searched(expr1, result1, [[expr2, result2, ...], else-result]) |
| `case_simple` | case_simple(expr, value1, result1, [[value2, result2, ...], else-result]) |
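| `bloom_filter_test` | bloom_filter_test(expr, filter) tests the value of 'expr' against 'filter', a bloom filter serialized as a base64 string. See the [bloom filter extension](../Configuration/core-ext/bloom-filter.md) for details. |
| `bloom_filter_test` | bloom_filter_test(expr, filter) tests the value of 'expr' against 'filter', a bloom filter serialized as a base64 string. See the [bloom filter extension](../configuration/core-ext/bloom-filter.md) for details. |
### String functions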

View File

@ -56,24 +56,28 @@ grouping asked for by the query.
}
```
Following are main parts to a groupBy query:
The following table lists the main parts of a groupBy query:
|property|description|required?|
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be "groupBy"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String or Object defining the data source to query, very similar to a table in a relational database. See [DataSource](../querying/datasource.md) for more information.|yes|
|dimensions|A JSON list of dimensions to do the groupBy over; or see [DimensionSpec](../querying/dimensionspecs.md) for ways to extract dimensions. |yes|
|limitSpec|See [LimitSpec](../querying/limitspec.md).|no|
|having|See [Having](../querying/having.md).|no|
|granularity|Defines the granularity of the query. See [Granularities](../querying/granularities.md)|yes|
|filter|See [Filters](../querying/filters.md)|no|
|aggregations|See [Aggregations](../querying/aggregations.md)|no|
|postAggregations|See [Post Aggregations](../querying/post-aggregations.md)|no|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|subtotalsSpec| A JSON array of arrays to return additional result sets for groupings of subsets of top level `dimensions`. It is [described later](groupbyquery.md#more-on-subtotalsspec) in more detail.|no|
|context|An additional JSON Object which can be used to specify certain flags.|no|
|queryType|This string should always be "groupBy". It is the first thing Druid looks at when parsing a query, and Druid uses it to decide how to interpret the query.|yes|
|dataSource|A string or object defining the data source to query, very similar to a table in a relational database. See [DataSource](../querying/datasource.md) for more information.|yes|
|dimensions|A JSON list of the dimensions to group by. See [DimensionSpec](../querying/dimensionspecs.md) for ways to express them. |yes|
|limitSpec|See [LimitSpec](../querying/limitspec.md).|no|
|having|See [Having](../querying/having.md).|no|
|granularity|Defines the granularity of the query. See [Granularities](../querying/granularities.md).|yes|
|filter|See [Filters](../querying/filters.md).|no|
|aggregations|See [Aggregations](../querying/aggregations.md).|no|
|postAggregations|See [Post Aggregations](../querying/post-aggregations.md).|no|
|intervals|A JSON object representing ISO-8601 intervals. This defines the time ranges to run the query over.|yes|
|subtotalsSpec| A JSON array that returns additional result sets for groupings of subsets of the top-level `dimensions`. It is [described later](groupbyquery.md#more-on-subtotalsspec) in more detail.|no|
|context|An additional JSON object that can be used to specify certain flags.|no|
To pull it all together, the above query would return *n\*m* data points, up to a maximum of 5000 points, where n is the cardinality of the `country` dimension, m is the cardinality of the `device` dimension, each day between 2012-01-01 and 2012-01-03, from the `sample_datasource` table. Each data point contains the (long) sum of `total_usage` if the value of the data point is greater than 100, the (double) sum of `data_transfer` and the (double) result of `total_usage` divided by `data_transfer` for the filter set for a particular grouping of `country` and `device`. The output looks like this:
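To pull it all together, the above query against the `sample_datasource` table would return *n\*m* data points, up to a maximum of 5000 points, for each day between 2012-01-01 and 2012-01-03, where n is the cardinality of the `country` dimension and m is the cardinality of the `device` dimension.
Each data point contains the (long) sum of `total_usage` if the value of the data point is greater than 100 and, for a particular grouping of `country` and `device`, the (double) result of `total_usage` divided by `data_transfer`.
The output looks like this: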
```json
[
@ -103,7 +107,7 @@ To pull it all together, the above query would return *n\*m* data points, up to
]
```
## Behavior on multi-value dimensions
## Behavior on multi-value dimensions
groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
from matching rows will be used to generate one group per value. It's possible for a query to return more groups than

View File

@ -15,8 +15,8 @@ Lookups没有历史记录总是使用当前的数据。这意味着如果
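Lookups are generally preloaded in memory on all servers. However, very small lookups (on the order of a few dozen to a few hundred entries) can also be passed inline at native query time using the "map" lookup type. See the [dimension specs](querydimensions.md) documentation for details.
Other lookup types are available as extensions, including:
* Globally cached lookups from local files, remote URIs, or JDBC, through the [lookups-cached-global extension](../Configuration/core-ext/lookups-cached-global.md)
* Globally cached lookups from a Kafka topic, through the [kafka-extraction-namespace extension](../Configuration/core-ext/kafka-extraction-namespace.md)
* Globally cached lookups from local files, remote URIs, or JDBC, through the [lookups-cached-global extension](../configuration/core-ext/lookups-cached-global.md)
* Globally cached lookups from a Kafka topic, through the [kafka-extraction-namespace extension](../configuration/core-ext/kafka-extraction-namespace.md)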
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
@ -354,7 +354,7 @@ map中所有的条目都将会更新没有条目被删除。
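### Configuration
See [Lookups dynamic configuration](../Configuration/configuration.md#coordinator) in the Coordinator configuration.
See [Lookups dynamic configuration](../configuration/human-readable-byte.md#coordinator) in the Coordinator configuration.
Use the following properties to configure a Broker / Router / Historical / Peon to announce itself as part of a lookup tier.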

View File

@ -90,7 +90,7 @@ curl -X DELETE "http://host:port/druid/v2/abc123"
}
```
If the query request fails because it is limited by the [query scheduler laning configuration](../Configuration/configuration.md#broker), the response is an HTTP 429 with the same JSON object schema as an error response, and an `errorMessage` of the form "Total query capacity exceeded" or "query capacity exceeded for lane 'low'".
If the query request fails because it is limited by the [query scheduler laning configuration](../configuration/human-readable-byte.md#broker), the response is an HTTP 429 with the same JSON object schema as an error response, and an `errorMessage` of the form "Total query capacity exceeded" or "query capacity exceeded for lane 'low'".
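
A client may want to tell the two capacity messages apart, for example to back off differently per lane. The sketch below is one way to do that, assuming the response body is a JSON object with an `errorMessage` field in one of the two forms quoted above. The helper name and its return values are hypothetical.

```python
import json


def describe_capacity_error(status_code, body_text):
    """Classify an HTTP 429 capacity rejection; return None for other statuses."""
    if status_code != 429:
        return None
    message = json.loads(body_text).get("errorMessage", "")
    if message.startswith("Total query capacity exceeded"):
        return ("total", None)
    if message.startswith("query capacity exceeded for lane"):
        # The lane name is the quoted text at the end of the message.
        _, _, rest = message.partition("'")
        return ("lane", rest.rstrip("'"))
    # A 429 with a message this sketch does not recognize.
    return ("other", None)
```

For instance, a body whose `errorMessage` is "query capacity exceeded for lane 'low'" is classified as `("lane", "low")`.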
The fields in the response are:


@ -23,9 +23,9 @@
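| Property | Default | Description |
|-|-|-|
| timeout | `druid.server.http.defaultQueryTimeout` | Query timeout in milliseconds, beyond which unfinished queries are cancelled. 0 means `no timeout`. The default timeout can be set in the [Broker configuration](../Configuration/configuration.md#broker) |
| timeout | `druid.server.http.defaultQueryTimeout` | Query timeout in milliseconds, beyond which unfinished queries are cancelled. 0 means `no timeout`. The default timeout can be set in the [Broker configuration](../configuration/human-readable-byte.md#broker) |
| priority | `0` | Query priority. Queries with higher priority get precedence for computational resources |
| lane | `null` | Query lane, used to control usage limits on classes of queries. See the [Broker configuration](../Configuration/configuration.md#broker) for details |
| lane | `null` | Query lane, used to control usage limits on classes of queries. See the [Broker configuration](../configuration/human-readable-byte.md#broker) for details |
| queryId | auto-generated | Unique identifier for this query. If a query ID is set or known, it can be used to cancel the query |
| useCache | `true` | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid uses `druid.broker.cache.useCache` or `druid.historical.cache.useCache` to determine whether to read from the query cache |
| populateCache | `true` | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateCache` or `druid.historical.cache.populateCache` to determine whether to save the results of this query to the query cache |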
@ -33,14 +33,14 @@
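| populateResultLevelCache | `true` | Flag indicating whether to save the results of the query to the result-level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateResultLevelCache` to determine whether to save the results of this query to the result-level query cache |
| bySegment | `false` | Return "by segment" results. Primarily used for debugging; setting it to true returns results associated with the data segment they came from |
| finalize | `true` | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For example, when this flag is set to false, the `hyperUnique` aggregator returns the full HyperLogLog sketch instead of the estimated cardinality |
| maxScatterGatherBytes | `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes, such as Historicals and realtime processes, to execute a query. This parameter can be used to further reduce the `maxScatterGatherBytes` limit at query time. See the [Broker configuration](../Configuration/configuration.md#broker) for more details. |
| maxScatterGatherBytes | `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes, such as Historicals and realtime processes, to execute a query. This parameter can be used to further reduce the `maxScatterGatherBytes` limit at query time. See the [Broker configuration](../configuration/human-readable-byte.md#broker) for more details. |
| maxQueuedBytes | `druid.broker.http.maxQueuedBytes` | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except that this configuration triggers backpressure rather than query failure. 0 means disabled |
| serializeDateTimeAsLong | `false` | If true, DateTime is serialized as long in the results returned by the Broker and in the data transferred between the Broker and compute processes |
| serializeDateTimeAsLongInner | `false` | If true, DateTime is serialized as long in the data transferred between the Broker and compute processes |
| enableParallelMerge | `false` | Enable parallel result merging on the Broker. Note that `druid.processing.merge.useParallelMergePool` must be enabled for this setting to be set to `true`. See the [Broker configuration](../Configuration/configuration.md#broker) for more details |
| parallelMergeParallelism | `druid.processing.merge.pool.parallelism` | Maximum number of parallel threads to use for parallel result merging on the Broker. See the [Broker configuration](../Configuration/configuration.md#broker) for more details |
| parallelMergeInitialYieldRows | `druid.processing.merge.task.initialYieldNumRows` | See the [Broker configuration](../Configuration/configuration.md#broker) for more details |
| parallelMergeSmallBatchRows | `druid.processing.merge.task.smallBatchNumRows` | See the [Broker configuration](../Configuration/configuration.md#broker) for more details |
| enableParallelMerge | `false` | Enable parallel result merging on the Broker. Note that `druid.processing.merge.useParallelMergePool` must be enabled for this setting to be set to `true`. See the [Broker configuration](../configuration/human-readable-byte.md#broker) for more details |
| parallelMergeParallelism | `druid.processing.merge.pool.parallelism` | Maximum number of parallel threads to use for parallel result merging on the Broker. See the [Broker configuration](../configuration/human-readable-byte.md#broker) for more details |
| parallelMergeInitialYieldRows | `druid.processing.merge.task.initialYieldNumRows` | See the [Broker configuration](../configuration/human-readable-byte.md#broker) for more details |
| parallelMergeSmallBatchRows | `druid.processing.merge.task.smallBatchNumRows` | See the [Broker configuration](../configuration/human-readable-byte.md#broker) for more details |
| useFilterCNF | `false` | If true, Druid attempts to convert the query filter to conjunctive normal form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, which often greatly reduces the number of raw rows that need to be scanned. However, this effect only applies to the top-level filter, or to individual clauses of a top-level "and" filter. As a result, filters in CNF are potentially more likely to use a large number of bitmap indexes on string columns during pre-filtering. Use this setting with great caution: it can sometimes have a negative effect on performance, and in some cases computing the CNF of a filter can be very expensive. If possible, we recommend manually tuning your filters to produce an optimal form, or at least verifying through experimentation that this parameter actually improves query performance with no ill effects |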
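
These parameters are set per query in the query's `context` object. The sketch below attaches a context to a native query body. The helper name `with_context` is hypothetical, the parameter values are arbitrary examples, and the timeseries query itself is only a placeholder that reuses the `sample_datasource` name.

```python
import json


def with_context(query, **context):
    """Return a copy of a native query with extra context parameters merged in."""
    merged = dict(query)
    # Later keyword arguments override any context already present on the query.
    merged["context"] = {**query.get("context", {}), **context}
    return merged


# Placeholder native query; only the "context" handling matters here.
query = {
    "queryType": "timeseries",
    "dataSource": "sample_datasource",
    "granularity": "day",
    "intervals": ["2012-01-01/2012-01-03"],
    "aggregations": [{"type": "count", "name": "rows"}],
}

tuned = with_context(
    query,
    timeout=30000,        # milliseconds; 0 would mean no timeout
    priority=10,          # higher priority gets precedence for resources
    useCache=False,       # do not read from the query cache
    populateCache=False,  # do not write results to the query cache
)
body = json.dumps(tuned)
```

The helper copies the query instead of mutating it, so one base query can be reused with different contexts.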
### Query-type-specific parameters


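@ -21,23 +21,23 @@ Apache Druid supports two levels of result caching, namely segment caching and whole
### Using and populating the cache
All caches have a pair of parameters that control how individual queries interact with the cache: a "use" cache parameter and a "populate" cache parameter. These settings must be enabled at the service level via [runtime properties](../Configuration/configuration.md) to utilize caching, but can be controlled per query by setting them in the [query context](query-context.md). The "use" parameter controls whether a query uses cached results, and the "populate" parameter controls whether a query updates cached results. They are separate parameters so that queries for uncommon data (for example, large reports or very old data) do not pollute the cached results that other queries reuse.
All caches have a pair of parameters that control how individual queries interact with the cache: a "use" cache parameter and a "populate" cache parameter. These settings must be enabled at the service level via [runtime properties](../configuration/human-readable-byte.md) to utilize caching, but can be controlled per query by setting them in the [query context](query-context.md). The "use" parameter controls whether a query uses cached results, and the "populate" parameter controls whether a query updates cached results. They are separate parameters so that queries for uncommon data (for example, large reports or very old data) do not pollute the cached results that other queries reuse.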
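
The interaction between the service-level setting and the per-query setting can be sketched as a simple "both must allow it" rule. This is our reading of the paragraph above, not Druid's implementation. The function name is hypothetical, and the context keys `useCache` and `populateCache` are assumed to default to true when absent.

```python
def effective_cache_flags(runtime_use, runtime_populate, context=None):
    """Return (use, populate) for one query on one service.

    runtime_use / runtime_populate stand for the service-level runtime
    properties; context is the query context dict, if any.
    """
    context = context or {}
    # The cache is read only if the service enables it AND the query allows it.
    use = bool(runtime_use) and bool(context.get("useCache", True))
    # The cache is written only if the service enables it AND the query allows it.
    populate = bool(runtime_populate) and bool(context.get("populateCache", True))
    return use, populate


# A large one-off report: read cached results but do not pollute the cache.
report_flags = effective_cache_flags(True, True, {"populateCache": False})
```

Under this reading, a query context can switch caching off for a single query but cannot switch it on for a service that has it disabled.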
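### Query caching on Brokers
Brokers support both segment-level caching and whole-query result-level caching. Segment-level caching is controlled by the parameters `useCache` and `populateCache`. Whole-query result-level caching is controlled by the parameters `useResultLevelCache` and `populateResultLevelCache`, together with the `druid.broker.cache.*` [runtime properties](../Configuration/configuration.md).
Brokers support both segment-level caching and whole-query result-level caching. Segment-level caching is controlled by the parameters `useCache` and `populateCache`. Whole-query result-level caching is controlled by the parameters `useResultLevelCache` and `populateResultLevelCache`, together with the `druid.broker.cache.*` [runtime properties](../configuration/human-readable-byte.md).
For small clusters, enabling segment-level caching on the Broker gives faster results than enabling query caching on Historicals. This is the recommended setup for smaller production clusters (< 5 servers). For large production clusters, populating the segment-level cache on the Broker is **not recommended**: when the property `druid.broker.cache.populateCache` is set to `true` and the query context parameter `populateCache` is not set to `false`, results from Historicals are returned per segment and Historicals cannot do any local result merging, which impairs the Druid cluster's ability to scale.
### Query caching on Historicals
Historicals support only segment-level caching. Segment-level caching is controlled by the query context parameters `useCache` and `populateCache` and the `druid.historical.cache.*` [runtime properties](../Configuration/configuration.md).
Historicals support only segment-level caching. Segment-level caching is controlled by the query context parameters `useCache` and `populateCache` and the `druid.historical.cache.*` [runtime properties](../configuration/human-readable-byte.md).
Larger clusters should enable segment-level cache population on Historicals only (not on Brokers), which avoids merging all query results on the Broker. Enabling cache population on Historicals rather than Brokers lets the Historicals merge results locally and pass less data to the Brokers.
### Query caching on ingestion tasks
Task executor processes, such as the Peon or the experimental Indexer, support only segment-level caching. Segment-level caching is controlled by the query context parameters `useCache` and `populateCache` and the `druid.realtime.cache.*` [runtime properties](../Configuration/configuration.md).
Task executor processes, such as the Peon or the experimental Indexer, support only segment-level caching. Segment-level caching is controlled by the query context parameters `useCache` and `populateCache` and the `druid.realtime.cache.*` [runtime properties](../configuration/human-readable-byte.md).
Larger clusters should enable segment-level cache population on task executor processes only (not on Brokers), which avoids merging all query results on the Broker. Enabling cache population on task executor processes rather than Brokers lets the task executor processes merge results locally and pass less data to the Brokers.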


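@ -49,7 +49,7 @@ Druid's approach to query execution varies by the query's [datasource type](#数据源类型)
[Query datasources] are subqueries. Each subquery is executed as if it were its own query, and its results are returned to the Broker. The Broker then continues processing the rest of the query as if the subquery had been replaced with an inline datasource.
In most cases, subquery results are fully buffered in memory on the Broker before the rest of the query proceeds, which means subqueries execute sequentially. The total number of rows buffered this way across all subqueries of a given query cannot exceed the [druid.server.http.maxSubQueryRows](../Configuration/configuration.md) property.
In most cases, subquery results are fully buffered in memory on the Broker before the rest of the query proceeds, which means subqueries execute sequentially. The total number of rows buffered this way across all subqueries of a given query cannot exceed the [druid.server.http.maxSubQueryRows](../configuration/human-readable-byte.md) property.
There is one exception: if the outer query and all subqueries are of the [groupBy](groupby.md) type, subquery results can be processed in a streaming fashion and the `druid.server.http.maxSubQueryRows` limit does not apply.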
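
The rule and its exception can be written down as a small check. This is a sketch of the rule as stated here, not Broker code. The function names are hypothetical, query types are represented as plain strings, and the limit used in the example is an arbitrary illustrative value.

```python
def subquery_limit_applies(outer_type, subquery_types):
    """The row limit is skipped only when the outer query and every subquery are groupBy."""
    all_group_by = outer_type == "groupBy" and all(
        t == "groupBy" for t in subquery_types
    )
    return not all_group_by


def within_subquery_limit(outer_type, subquery_types, buffered_rows, max_subquery_rows):
    """Check the total rows buffered across all subqueries against the limit.

    buffered_rows holds one row count per subquery.
    """
    if not subquery_limit_applies(outer_type, subquery_types):
        # Results are treated as streamed, so nothing counts against the limit.
        return True
    return sum(buffered_rows) <= max_subquery_rows


# Hypothetical example: two subqueries buffering 60,000 rows each.
ok = within_subquery_limit("timeseries", ["groupBy", "scan"], [60000, 60000], 100000)
```

Here `ok` is false, because the limit applies to the sum across both subqueries rather than to each subquery separately.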


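@ -18,7 +18,7 @@ The Druid source code contains an [example docker-compose.yml](https://github.com/apache/
#### Configuration
Configuration of the Druid Docker container is done via environment variables, which can also specify paths to the [standard Druid configuration files](../Configuration/configuration.md).
Configuration of the Druid Docker container is done via environment variables, which can also specify paths to the [standard Druid configuration files](../configuration/human-readable-byte.md).
Special environment variables: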