cleaning up and fixing links (#10528)

* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
sthetland 2020-12-17 13:37:43 -08:00 committed by GitHub
parent 1884c35698
commit 6ae8059c09
80 changed files with 302 additions and 326 deletions


@ -69,7 +69,7 @@ The properties under this section are common configurations that should be share
There are four JVM parameters that we set on all of our processes:
1. `-Duser.timezone=UTC` This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they also might uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see [query granularities](../querying/granularities.html#period-granularities)
1. `-Duser.timezone=UTC` This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they also might uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see [query granularities](../querying/granularities.md#period-granularities)
2. `-Dfile.encoding=UTF-8` This is similar to the timezone setting: we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs.
3. `-Djava.io.tmpdir=<a path>` Various parts of the system that interact with the file system do so via temporary files, and these files can get somewhat large. Many production systems are set up to have small (but fast) `/tmp` directories, which can be problematic with Druid, so we recommend pointing the JVM's tmp directory to something with a little more room. This directory should not be on volatile tmpfs. It should also have good read and write speed, so NFS mounts should be strongly avoided.
4. `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This allows log4j2 to handle logs for non-log4j2 components (like Jetty) that use standard Java logging.
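For reference, a minimal `jvm.config`-style sketch that applies all four flags might look like the following; the heap sizes and temp-directory path are illustrative placeholders, not recommendations:

```properties
# Sketch of a jvm.config fragment -- heap sizes and tmpdir path are placeholders
-server
-Xms4g
-Xmx4g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/opt/druid/var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
```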
@ -177,9 +177,9 @@ and `druid.tlsPort` properties on each process. Please see `Configuration` secti
#### Jetty Server TLS Configuration
Druid uses Jetty as an embedded web server. To become familiar with TLS/SSL and related concepts such as certificates,
reading this [Jetty documentation](http://www.eclipse.org/jetty/documentation/9.4.x/configuring-ssl.html) might be helpful.
reading this [Jetty documentation](http://www.eclipse.org/jetty/documentation/9.4.32.v20200930/configuring-ssl.html) might be helpful.
For more in-depth knowledge of TLS/SSL support in Java, please refer to this [guide](http://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html).
The documentation [here](http://www.eclipse.org/jetty/documentation/9.4.x/configuring-ssl.html#configuring-sslcontextfactory)
The documentation [here](http://www.eclipse.org/jetty/documentation/9.4.32.v20200930/configuring-ssl.html#configuring-sslcontextfactory)
can help in understanding the TLS/SSL configurations listed below. This [document](http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html) lists all the possible
values for the configs mentioned below, among others provided by the Java implementation.
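As a rough, non-authoritative illustration, enabling the TLS port and pointing a process at a Java keystore might look like the sketch below; the paths, store type, and environment-variable password provider are placeholder assumptions, not a complete configuration:

```properties
# Hedged TLS sketch -- keystore path, type, alias, and password provider are placeholders
druid.enableTlsPort=true
druid.server.https.keyStorePath=/opt/druid/conf/certs/server.p12
druid.server.https.keyStoreType=PKCS12
druid.server.https.certAlias=druid
druid.server.https.keyStorePassword={"type": "environment", "variable": "DRUID_KEYSTORE_PASSWORD"}
```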
@ -312,7 +312,7 @@ For native query, only request logs where query/time is above the threshold are
|--------|-----------|-------|
|`druid.request.logging.queryTimeThresholdMs`|Threshold value for query/time in milliseconds.|0, i.e., no filtering|
|`druid.request.logging.sqlQueryTimeThresholdMs`|Threshold value for sqlQuery/time in milliseconds.|0, i.e., no filtering|
|`druid.request.logging.mutedQueryTypes` | Query requests of these types are not logged. Query types are defined as string objects corresponding to the "queryType" value for the specified query in Druid's [native JSON query API](http://druid.apache.org/docs/latest/querying/querying.html). Misspelled query types will be ignored. For example, to ignore scan and timeBoundary queries: ["scan", "timeBoundary"]| []|
|`druid.request.logging.mutedQueryTypes` | Query requests of these types are not logged. Query types are defined as string objects corresponding to the "queryType" value for the specified query in Druid's [native JSON query API](http://druid.apache.org/docs/latest/querying/querying). Misspelled query types will be ignored. For example, to ignore scan and timeBoundary queries: ["scan", "timeBoundary"]| []|
|`druid.request.logging.delegate.type`|Type of delegate request logger to log requests.|none|
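For example, a hedged runtime-properties sketch that logs only slow queries, mutes scan and timeBoundary queries, and delegates to a file logger could look like the following; the delegate directory property and path are assumptions for illustration:

```properties
# Illustrative filtered request logging sketch -- delegate.dir and the path are assumed for illustration
druid.request.logging.type=filtered
druid.request.logging.queryTimeThresholdMs=1000
druid.request.logging.sqlQueryTimeThresholdMs=1000
druid.request.logging.mutedQueryTypes=["scan", "timeBoundary"]
druid.request.logging.delegate.type=file
druid.request.logging.delegate.dir=/opt/druid/var/log/requests
```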
#### Composite Request Logging
@ -398,7 +398,7 @@ The Druid servers [emit various metrics](../operations/metrics.md) and alerts vi
#### Http Emitter Module TLS Overrides
When emitting events to a TLS-enabled receiver, the Http Emitter will by default use an SSLContext obtained via the
process described at [Druid's internal communication over TLS](../operations/tls-support.html), i.e., the same
process described at [Druid's internal communication over TLS](../operations/tls-support.md), i.e., the same
SSLContext that would be used for internal communications between Druid processes.
In some use cases it may be desirable to have the Http Emitter use its own separate truststore configuration. For example, there may be organizational policies that prevent the TLS-enabled metrics receiver's certificate from being added to the same truststore used by Druid's internal HTTP client.
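A hedged sketch of such a separate truststore override is shown below; it assumes the emitter exposes `druid.emitter.http.ssl.*` override properties, and the truststore path and password are placeholders:

```properties
# Hedged sketch -- assumes druid.emitter.http.ssl.* override properties; path and password are placeholders
druid.emitter.http.ssl.useDefaultJavaContext=false
druid.emitter.http.ssl.trustStorePath=/opt/druid/conf/certs/metrics-truststore.jks
druid.emitter.http.ssl.trustStoreType=jks
druid.emitter.http.ssl.trustStorePassword=changeit
```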
@ -489,10 +489,10 @@ The below table shows some important configurations for S3. See [S3 Deep Storage
|--------|-----------|-------|
|`druid.storage.bucket`|S3 bucket name.|none|
|`druid.storage.baseKey`|S3 object key prefix for storage.|none|
|`druid.storage.disableAcl`|Boolean flag for ACL. If this is set to `false`, full control is granted to the bucket owner, which may require setting additional permissions. See [S3 permissions settings](../development/extensions-core/s3.html#s3-permissions-settings).|false|
|`druid.storage.disableAcl`|Boolean flag for ACL. If this is set to `false`, full control is granted to the bucket owner, which may require setting additional permissions. See [S3 permissions settings](../development/extensions-core/s3.md#s3-permissions-settings).|false|
|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the *archive task*.|none|
|`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none|
|`druid.storage.sse.type`|Server-side encryption type. Should be one of `s3`, `kms`, and `custom`. See the below [Server-side encryption section](../development/extensions-core/s3.html#server-side-encryption) for more details.|None|
|`druid.storage.sse.type`|Server-side encryption type. Should be one of `s3`, `kms`, and `custom`. See the below [Server-side encryption section](../development/extensions-core/s3.md#server-side-encryption) for more details.|None|
|`druid.storage.sse.kms.keyId`|AWS KMS key ID. This is used only when `druid.storage.sse.type` is `kms` and can be empty to use the default key ID.|None|
|`druid.storage.sse.custom.base64EncodedKey`|Base64-encoded key. Should be specified if `druid.storage.sse.type` is `custom`.|None|
|`druid.storage.useS3aSchema`|If true, use the "s3a" filesystem when using Hadoop-based ingestion. If false, the "s3n" filesystem will be used. Only affects Hadoop-based ingestion.|false|
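Putting a few of these together, a hedged sketch of an S3 deep storage configuration with KMS server-side encryption might look like this; the bucket, prefix, and key ID are placeholders:

```properties
# Illustrative S3 deep storage sketch -- bucket, baseKey, and KMS key ID are placeholders
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
druid.storage.disableAcl=true
druid.storage.sse.type=kms
druid.storage.sse.kms.keyId=arn:aws:kms:us-east-1:123456789012:key/example
```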
@ -660,7 +660,7 @@ All Druid components can communicate with each other over HTTP.
## Master Server
This section contains the configuration options for the processes that reside on Master servers (Coordinators and Overlords) in the suggested [three-server configuration](../design/processes.html#server-types).
This section contains the configuration options for the processes that reside on Master servers (Coordinators and Overlords) in the suggested [three-server configuration](../design/processes.md#server-types).
### Coordinator
@ -806,7 +806,7 @@ These configuration options control the behavior of the Lookup dynamic configura
##### Compaction Dynamic Configuration
Compaction configurations can also be set or updated dynamically using
[Coordinator's API](../operations/api-reference.html#compaction-configuration) without restarting Coordinators.
[Coordinator's API](../operations/api-reference.md#compaction-configuration) without restarting Coordinators.
For details about segment compaction, please check [Segment Size Optimization](../operations/segment-optimization.md).
@ -815,12 +815,12 @@ A description of the compaction config is:
|Property|Description|Required|
|--------|-----------|--------|
|`dataSource`|dataSource name to be compacted.|yes|
|`taskPriority`|[Priority](../ingestion/tasks.html#priority) of compaction task.|no (default = 25)|
|`taskPriority`|[Priority](../ingestion/tasks.md#priority) of compaction task.|no (default = 25)|
|`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per compaction task. Since a time chunk must be processed in its entirety, if the segments for a particular time chunk have a total size in bytes greater than this parameter, compaction will not run for that time chunk. Because each compaction task runs with a single thread, setting this value too far above 12GB will result in compaction tasks taking an excessive amount of time.|no (default = 419430400)|
|`maxRowsPerSegment`|Max number of rows per segment after compaction.|no|
|`skipOffsetFromLatest`|The offset for searching segments to be compacted. Strongly recommended to set for realtime dataSources. |no (default = "P1D")|
|`tuningConfig`|Tuning config for compaction tasks. See below [Compaction Task TuningConfig](#compaction-tuningconfig).|no|
|`taskContext`|[Task context](../ingestion/tasks.html#context) for compaction tasks.|no|
|`taskContext`|[Task context](../ingestion/tasks.md#context) for compaction tasks.|no|
An example of compaction config is:
@ -893,7 +893,7 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
|`druid.indexer.queue.restartDelay`|Sleep this long when Overlord queue management throws an exception before trying again.|PT30S|
|`druid.indexer.queue.storageSyncRate`|Sync Overlord state this often with an underlying task persistence mechanism.|PT1M|
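As an illustration, a hedged `overlord/runtime.properties` sketch combining the queue settings above with the remote runner mode discussed next might look like this (the queue values mirror the defaults above and are shown only for shape):

```properties
# Illustrative Overlord sketch -- queue values mirror the defaults above; remote mode is discussed below
druid.indexer.runner.type=remote
druid.indexer.queue.restartDelay=PT30S
druid.indexer.queue.storageSyncRate=PT1M
```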
The following configs only apply if the Overlord is running in remote mode. For a description of local vs. remote mode, please see (../design/overlord.html).
The following configs only apply if the Overlord is running in remote mode. For a description of local vs. remote mode, see [Overlord Process](../design/overlord.md).
|Property|Description|Default|
|--------|-----------|-------|
@ -1153,7 +1153,7 @@ For GCE's properties, please refer to the [gce-extensions](../development/extens
## Data Server
This section contains the configuration options for the processes that reside on Data servers (MiddleManagers/Peons and Historicals) in the suggested [three-server configuration](../design/processes.html#server-types).
This section contains the configuration options for the processes that reside on Data servers (MiddleManagers/Peons and Historicals) in the suggested [three-server configuration](../design/processes.md#server-types).
Configuration options for the experimental [Indexer process](../design/indexer.md) are also provided here.
@ -1323,14 +1323,14 @@ Druid uses Jetty to serve HTTP requests.
|Property|Description|Default|
|--------|-----------|-------|
|`druid.server.http.numThreads`|Number of threads for HTTP requests. Please see the [Indexer Server HTTP threads](../design/indexer.html#server-http-threads) documentation for more details on how the Indexer uses this configuration.|max(10, (Number of cores * 17) / 16 + 2) + 30|
|`druid.server.http.numThreads`|Number of threads for HTTP requests. Please see the [Indexer Server HTTP threads](../design/indexer.md#server-http-threads) documentation for more details on how the Indexer uses this configuration.|max(10, (Number of cores * 17) / 16 + 2) + 30|
|`druid.server.http.queueSize`|Size of the worker queue used by the Jetty server to temporarily store incoming client connections. If this value is set and a request is rejected by Jetty because the queue is full, the client observes a request failure: the TCP connection is closed immediately and the server returns a completely empty response.|Unbounded|
|`druid.server.http.maxIdleTime`|The Jetty max idle time for a connection.|PT5M|
|`druid.server.http.enableRequestLimit`|If enabled, requests are not queued in the Jetty queue; an "HTTP 429 Too Many Requests" error response is sent instead. |false|
|`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000|
|`druid.server.http.gracefulShutdownTimeout`|The maximum amount of time Jetty waits after receiving a shutdown signal. After this timeout, the threads are forcefully shut down. This grace period allows any executing queries to complete.|`PT0S` (do not wait)|
|`druid.server.http.unannouncePropagationDelay`|How long to wait for zookeeper unannouncements to propagate before shutting down Jetty. This is a minimum and `druid.server.http.gracefulShutdownTimeout` does not start counting down until after this period elapses.|`PT0S` (do not wait)|
|`druid.server.http.maxQueryTimeout`|Maximum allowed value (in milliseconds) for `timeout` parameter. See [query-context](../querying/query-context.html) to know more about `timeout`. Query is rejected if the query context `timeout` is greater than this value. |Long.MAX_VALUE|
|`druid.server.http.maxQueryTimeout`|Maximum allowed value (in milliseconds) for `timeout` parameter. See [query-context](../querying/query-context.md) to know more about `timeout`. Query is rejected if the query context `timeout` is greater than this value. |Long.MAX_VALUE|
|`druid.server.http.maxRequestHeaderSize`|Maximum size of a request header in bytes. Larger headers consume more memory and can make a server more vulnerable to denial of service attacks.|8 * 1024|
|`druid.server.http.enableForwardedRequestCustomizer`|If enabled, adds Jetty ForwardedRequestCustomizer which reads X-Forwarded-* request headers to manipulate servlet request object when Druid is used behind a proxy.|false|
|`druid.server.http.allowedHttpMethods`|List of HTTP methods that should be allowed in addition to the ones required by Druid APIs. Druid APIs require GET, PUT, POST, and DELETE, which are always allowed. This option is not useful unless you have installed an extension that needs these additional HTTP methods or that adds functionality related to CORS. None of Druid's bundled extensions require these methods.|[]|
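As an illustration, a runtime-properties sketch tightening a few of these Jetty settings might look like the following; the values are arbitrary examples, not recommendations:

```properties
# Illustrative Jetty/HTTP server sketch -- values are arbitrary examples
druid.server.http.numThreads=60
druid.server.http.maxIdleTime=PT5M
druid.server.http.defaultQueryTimeout=300000
druid.server.http.gracefulShutdownTimeout=PT30S
druid.server.http.unannouncePropagationDelay=PT15S
druid.server.http.maxRequestHeaderSize=16384
```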
@ -1487,7 +1487,7 @@ See [cache configuration](#cache-configuration) for how to configure cache setti
## Query Server
This section contains the configuration options for the processes that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/processes.html#server-types).
This section contains the configuration options for the processes that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/processes.md#server-types).
Configuration options for the experimental [Router process](../design/router.md) are also provided here.
@ -1658,7 +1658,7 @@ The Druid SQL server is configured through the following properties on the Broke
|`druid.sql.planner.maxTopNLimit`|Maximum threshold for a [TopN query](../querying/topnquery.md). Higher limits will be planned as [GroupBy queries](../querying/groupbyquery.md) instead.|100000|
|`druid.sql.planner.metadataRefreshPeriod`|Throttle for metadata refreshes.|PT1M|
|`druid.sql.planner.useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|true|
|`druid.sql.planner.useApproximateTopN`|Whether to use approximate [TopN queries](../querying/topnquery.html) when a SQL query could be expressed as such. If false, exact [GroupBy queries](../querying/groupbyquery.html) will be used instead.|true|
|`druid.sql.planner.useApproximateTopN`|Whether to use approximate [TopN queries](../querying/topnquery.md) when a SQL query could be expressed as such. If false, exact [GroupBy queries](../querying/groupbyquery.md) will be used instead.|true|
|`druid.sql.planner.requireTimeCondition`|Whether to require SQL queries to have filter conditions on the __time column, so that all generated native queries have user-specified intervals. If true, all queries without a filter condition on the __time column will fail.|false|
|`druid.sql.planner.sqlTimeZone`|Sets the default time zone for the server, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|UTC|
|`druid.sql.planner.metadataSegmentCacheEnable`|Whether to keep a cache of published segments in broker. If true, broker polls coordinator in background to get segments from metadata store and maintains a local cache. If false, coordinator's REST API will be invoked when broker needs published segments info.|false|
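For example, a hedged Broker runtime-properties sketch adjusting a few of the planner settings above might look like this; the values are arbitrary examples:

```properties
# Illustrative Druid SQL planner sketch -- values are arbitrary examples
druid.sql.planner.maxTopNLimit=50000
druid.sql.planner.metadataRefreshPeriod=PT2M
druid.sql.planner.useApproximateCountDistinct=true
druid.sql.planner.useApproximateTopN=false
druid.sql.planner.sqlTimeZone=America/Los_Angeles
druid.sql.planner.metadataSegmentCacheEnable=true
```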
@ -1801,7 +1801,7 @@ This section describes configurations that control behavior of Druid's query typ
### Overriding default query context values
Any [Query Context General Parameter](../querying/query-context.html#general-parameters) default value can be
Any [Query Context General Parameter](../querying/query-context.md#general-parameters) default value can be
overridden by setting a runtime property in the format `druid.query.default.context.{query_context_key}`.
The `druid.query.default.context.{query_context_key}` runtime property prefix applies to all current and future
query context keys, in the same way as a query context parameter passed with the query. Note that the runtime property
@ -1833,7 +1833,7 @@ context). If query does have `maxQueuedBytes` in the context, then that value is
|Property|Description|Default|
|--------|-----------|-------|
|`druid.query.topN.minTopNThreshold`|See [TopN Aliasing](../querying/topnquery.html#aliasing) for details.|1000|
|`druid.query.topN.minTopNThreshold`|See [TopN Aliasing](../querying/topnquery.md#aliasing) for details.|1000|
### Search query config
@ -1951,9 +1951,9 @@ Supported query contexts:
|`druid.router.tierToBrokerMap`|Queries for a certain tier of data are routed to their appropriate Broker. This value should be an ordered JSON map of tiers to Broker names. The priority of Brokers is based on the ordering.|{"_default_tier": "<defaultBrokerServiceName>"}|
|`druid.router.defaultRule`|The default rule for all datasources.|"_default"|
|`druid.router.pollPeriod`|How often to poll for new rules.|PT1M|
|`druid.router.strategies`|Please see [Router Strategies](../design/router.html#router-strategies) for details.|[{"type":"timeBoundary"},{"type":"priority"}]|
|`druid.router.avatica.balancer.type`|Class to use for balancing Avatica queries across Brokers. Please see [Avatica Query Balancing](../design/router.html#avatica-query-balancing).|rendezvousHash|
|`druid.router.managementProxy.enabled`|Enables the Router's [management proxy](../design/router.html#router-as-management-proxy) functionality.|false|
|`druid.router.strategies`|Please see [Router Strategies](../design/router.md#router-strategies) for details.|[{"type":"timeBoundary"},{"type":"priority"}]|
|`druid.router.avatica.balancer.type`|Class to use for balancing Avatica queries across Brokers. Please see [Avatica Query Balancing](../design/router.md#avatica-query-balancing).|rendezvousHash|
|`druid.router.managementProxy.enabled`|Enables the Router's [management proxy](../design/router.md#router-as-management-proxy) functionality.|false|
|`druid.router.http.numConnections`|Size of connection pool for the Router to connect to Broker processes. If there are more queries than this number that all need to speak to the same process, then they will queue up.|`20`|
|`druid.router.http.readTimeout`|The timeout for data reads from Broker processes.|`PT15M`|
|`druid.router.http.numMaxThreads`|Maximum number of worker threads to handle HTTP requests and responses|`max(10, ((number of cores * 17) / 16 + 2) + 30)`|
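Putting a few of these together, a hedged Router runtime-properties sketch might look like the following; the tier names and Broker service names are placeholders, and `druid.router.defaultBrokerServiceName` (referenced by the default `tierToBrokerMap` above) is assumed here for completeness:

```properties
# Illustrative Router sketch -- tier and Broker service names are placeholders
druid.router.defaultBrokerServiceName=druid/broker
druid.router.tierToBrokerMap={"hot":"druid/broker-hot","_default_tier":"druid/broker"}
druid.router.pollPeriod=PT1M
druid.router.strategies=[{"type":"timeBoundary"},{"type":"priority"}]
druid.router.managementProxy.enabled=true
druid.router.http.numConnections=50
```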


@ -23,7 +23,7 @@ title: "Logging"
-->
Apache Druid processes will emit logs that are useful for debugging to the console. Druid processes also emit periodic metrics about their state. For more about metrics, see [Configuration](../configuration/index.html#enabling-metrics). Metric logs are printed to the console by default, and can be disabled with `-Ddruid.emitter.logging.logLevel=debug`.
Apache Druid processes will emit logs that are useful for debugging to the console. Druid processes also emit periodic metrics about their state. For more about metrics, see [Configuration](../configuration/index.md#enabling-metrics). Metric logs are printed to the console by default, and can be disabled with `-Ddruid.emitter.logging.logLevel=debug`.
Druid uses [log4j2](http://logging.apache.org/log4j/2.x/) for logging. Logging can be configured with a log4j2.xml file. Add the path to the directory containing the log4j2.xml file (e.g. the _common/ dir) to your classpath if you want to override the default Druid log configuration. Note that this directory should appear earlier in the classpath than the Druid jars. The easiest way to do this is to prefix the classpath with the config directory.


@ -63,7 +63,7 @@ druid.metadata.storage.connector.dbcp.maxConnLifetimeMillis=1200000
druid.metadata.storage.connector.dbcp.defaultQueryTimeout=30000
```
See [BasicDataSource Configuration](https://commons.apache.org/proper/commons-dbcp/configuration.html) for full list.
See [BasicDataSource Configuration](https://commons.apache.org/proper/commons-dbcp/configuration) for full list.
## Metadata storage tables


@ -118,7 +118,7 @@ to the [metadata store](#metadata-storage). This entry is a self-describing bit
things like the schema of the segment, its size, and its location on deep storage. These entries are what the
Coordinator uses to know what data *should* be available on the cluster.
For details on the segment file format, please see [segment files](segments.html).
For details on the segment file format, please see [segment files](segments.md).
For details on modeling your data in Druid, see [schema design](../ingestion/schema-design.md).
@ -229,9 +229,9 @@ publish in an all-or-nothing manner:
that has not yet been published can be rolled back if ingestion tasks fail. In this case, partially-ingested data is
discarded, and Druid will resume ingestion from the last committed set of stream offsets. This ensures exactly-once
publishing behavior.
- [Hadoop-based batch ingestion](../ingestion/hadoop.html). Each task publishes all segment metadata in a single
- [Hadoop-based batch ingestion](../ingestion/hadoop.md). Each task publishes all segment metadata in a single
transaction.
- [Native batch ingestion](../ingestion/native-batch.html). In parallel mode, the supervisor task publishes all segment
- [Native batch ingestion](../ingestion/native-batch.md). In parallel mode, the supervisor task publishes all segment
metadata in a single transaction after the subtasks are finished. In simple (single-task) mode, the single task
publishes all segment metadata in a single transaction after it is complete.
@ -244,11 +244,11 @@ ingestion will not cause duplicate data to be ingested:
- Supervised "seekable-stream" ingestion methods like [Kafka](../development/extensions-core/kafka-ingestion.md) and
[Kinesis](../development/extensions-core/kinesis-ingestion.md) are idempotent due to the fact that stream offsets and
segment metadata are stored together and updated in lock-step.
- [Hadoop-based batch ingestion](../ingestion/hadoop.html) is idempotent unless one of your input sources
- [Hadoop-based batch ingestion](../ingestion/hadoop.md) is idempotent unless one of your input sources
is the same Druid datasource that you are ingesting into. In this case, running the same task twice is non-idempotent,
because you are adding to existing data instead of overwriting it.
- [Native batch ingestion](../ingestion/native-batch.html) is idempotent unless
[`appendToExisting`](../ingestion/native-batch.html) is true, or one of your input sources is the same Druid datasource
- [Native batch ingestion](../ingestion/native-batch.md) is idempotent unless
[`appendToExisting`](../ingestion/native-batch.md) is true, or one of your input sources is the same Druid datasource
that you are ingesting into. In either of these two cases, running the same task twice is non-idempotent, because you
are adding to existing data instead of overwriting it.


@ -25,11 +25,11 @@ title: "Broker"
### Configuration
For Apache Druid Broker Process Configuration, see [Broker Configuration](../configuration/index.html#broker).
For Apache Druid Broker Process Configuration, see [Broker Configuration](../configuration/index.md#broker).
### HTTP endpoints
For a list of API endpoints supported by the Broker, see [Broker API](../operations/api-reference.html#broker).
For a list of API endpoints supported by the Broker, see [Broker API](../operations/api-reference.md#broker).
### Overview


@ -25,11 +25,11 @@ title: "Coordinator Process"
### Configuration
For Apache Druid Coordinator Process Configuration, see [Coordinator Configuration](../configuration/index.html#coordinator).
For Apache Druid Coordinator Process Configuration, see [Coordinator Configuration](../configuration/index.md#coordinator).
### HTTP endpoints
For a list of API endpoints supported by the Coordinator, see [Coordinator API](../operations/api-reference.html#coordinator).
For a list of API endpoints supported by the Coordinator, see [Coordinator API](../operations/api-reference.md#coordinator).
### Overview
@ -89,12 +89,12 @@ Once some segments are found, it issues a [compaction task](../ingestion/tasks.m
The maximum number of running compaction tasks is `min(sum of worker capacity * slotRatio, maxSlots)`.
Note that even if `min(sum of worker capacity * slotRatio, maxSlots)` is 0, at least one compaction task is always submitted
if compaction is enabled for a dataSource.
See [Compaction Configuration API](../operations/api-reference.html#compaction-configuration) and [Compaction Configuration](../configuration/index.html#compaction-dynamic-configuration) to enable the compaction.
See [Compaction Configuration API](../operations/api-reference.md#compaction-configuration) and [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) to enable the compaction.
Compaction tasks might fail for the following reasons.
- If the input segments of a compaction task are removed or overshadowed before it starts, that compaction task fails immediately.
- If a task of a higher priority acquires a [time chunk lock](../ingestion/tasks.html#locking) for an interval overlapping with the interval of a compaction task, the compaction task fails.
- If a task of a higher priority acquires a [time chunk lock](../ingestion/tasks.md#locking) for an interval overlapping with the interval of a compaction task, the compaction task fails.
Once a compaction task fails, the Coordinator simply checks the segments in the interval of the failed task again, and issues another compaction task in the next run.
@ -127,7 +127,7 @@ If the coordinator has enough task slots for compaction, this policy will contin
`bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION` and `bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION_1`.
Finally, `foo_2017-09-01T00:00:00.000Z_2017-10-01T00:00:00.000Z_VERSION` will be picked up even though there is only one segment in the time chunk of `2017-09-01T00:00:00.000Z/2017-10-01T00:00:00.000Z`.
The search start point can be changed by setting [skipOffsetFromLatest](../configuration/index.html#compaction-dynamic-configuration).
The search start point can be changed by setting [skipOffsetFromLatest](../configuration/index.md#compaction-dynamic-configuration).
If this is set, this policy will ignore the segments falling into the time chunk of (the end time of the most recent segment - `skipOffsetFromLatest`).
This is to avoid conflicts between compaction tasks and realtime tasks.
Note that realtime tasks have a higher priority than compaction tasks by default. Realtime tasks will revoke the locks of compaction tasks if their intervals overlap, resulting in the termination of the compaction task.
@ -138,7 +138,7 @@ Note that realtime tasks have a higher priority than compaction tasks by default
### The Coordinator console
The Druid Coordinator exposes a web GUI for displaying cluster information and rule configuration. For more details, please see [coordinator console](../operations/management-uis.html#coordinator-consoles).
The Druid Coordinator exposes a web GUI for displaying cluster information and rule configuration. For more details, please see [coordinator console](../operations/management-uis.md#coordinator-consoles).
### FAQ


@ -25,11 +25,11 @@ title: "Historical Process"
### Configuration
For Apache Druid Historical Process Configuration, see [Historical Configuration](../configuration/index.html#historical).
For Apache Druid Historical Process Configuration, see [Historical Configuration](../configuration/index.md#historical).
### HTTP endpoints
For a list of API endpoints supported by the Historical, please see the [API reference](../operations/api-reference.html#historical).
For a list of API endpoints supported by the Historical, please see the [API reference](../operations/api-reference.md#historical).
### Running


@ -58,7 +58,7 @@ Druid servers fail, the system will automatically route around the damage until
is designed to run 24/7 with no need for planned downtimes for any reason, including configuration changes and software
updates.
6. **Cloud-native, fault-tolerant architecture that won't lose data.** Once Druid has ingested your data, a copy is
stored safely in [deep storage](architecture.html#deep-storage) (typically cloud storage, HDFS, or a shared filesystem).
stored safely in [deep storage](architecture.md#deep-storage) (typically cloud storage, HDFS, or a shared filesystem).
Your data can be recovered from deep storage even if every single Druid server fails. For more limited failures affecting
just a few Druid servers, replication ensures that queries are still possible while the system recovers.
7. **Indexes for quick filtering.** Druid uses [Roaring](https://roaringbitmap.org/) or


@ -22,7 +22,7 @@ title: "Indexer Process"
~ under the License.
-->
> The Indexer is an optional and <a href="../development/experimental.html">experimental</a> feature.
> The Indexer is an optional and [experimental](../development/experimental.md) feature.
> Its memory management system is still under development and will be significantly enhanced in later releases.
The Apache Druid Indexer process is an alternative to the MiddleManager + Peon task execution system. Instead of forking a separate JVM process per task, the Indexer runs tasks as separate threads within a single JVM process.
@ -31,11 +31,11 @@ The Indexer is designed to be easier to configure and deploy compared to the Mid
### Configuration
For Apache Druid Indexer Process Configuration, see [Indexer Configuration](../configuration/index.html#indexer).
For Apache Druid Indexer Process Configuration, see [Indexer Configuration](../configuration/index.md#indexer).
### HTTP endpoints
The Indexer process shares the same HTTP endpoints as the [MiddleManager](../operations/api-reference.html#middlemanager).
The Indexer process shares the same HTTP endpoints as the [MiddleManager](../operations/api-reference.md#middlemanager).
### Running
@ -51,7 +51,7 @@ The following resources are shared across all tasks running inside an Indexer pr
The query processing threads and buffers are shared across all tasks. The Indexer will serve queries from a single endpoint shared by all tasks.
If [query caching](../configuration/index.html#indexer-caching) is enabled, the query cache is also shared across all tasks.
If [query caching](../configuration/index.md#indexer-caching) is enabled, the query cache is also shared across all tasks.
#### Server HTTP threads


@ -30,7 +30,7 @@ Indexing [tasks](../ingestion/tasks.md) create (and sometimes destroy) Druid [se
The indexing service is composed of three main components: a [Peon](../design/peons.md) component that can run a single task, a [Middle Manager](../design/middlemanager.md) component that manages Peons, and an [Overlord](../design/overlord.md) component that manages task distribution to MiddleManagers.
Overlords and MiddleManagers may run on the same process or across multiple processes while MiddleManagers and Peons always run on the same process.
Tasks are managed using API endpoints on the Overlord service. Please see [Overlord Task API](../operations/api-reference.html#tasks) for more information.
Tasks are managed using API endpoints on the Overlord service. Please see [Overlord Task API](../operations/api-reference.md#tasks) for more information.
![Indexing Service](../assets/indexing_service.png "Indexing Service")


@ -25,11 +25,11 @@ title: "MiddleManager Process"
### Configuration
For Apache Druid MiddleManager Process Configuration, see [Indexing Service Configuration](../configuration/index.html#middlemanager-and-peons).
For Apache Druid MiddleManager Process Configuration, see [Indexing Service Configuration](../configuration/index.md#middlemanager-and-peons).
### HTTP endpoints
For a list of API endpoints supported by the MiddleManager, please see the [API reference](../operations/api-reference.html#middlemanager).
For a list of API endpoints supported by the MiddleManager, please see the [API reference](../operations/api-reference.md#middlemanager).
### Overview


@ -25,11 +25,11 @@ title: "Overlord Process"
### Configuration
For Apache Druid Overlord Process Configuration, see [Overlord Configuration](../configuration/index.html#overlord).
For Apache Druid Overlord Process Configuration, see [Overlord Configuration](../configuration/index.md#overlord).
### HTTP endpoints
For a list of API endpoints supported by the Overlord, please see the [API reference](../operations/api-reference.html#overlord).
For a list of API endpoints supported by the Overlord, please see the [API reference](../operations/api-reference.md#overlord).
### Overview
@ -40,7 +40,7 @@ This mode is recommended if you intend to use the indexing service as the single
### Overlord console
The Overlord provides a UI for managing tasks and workers. For more details, please see [overlord console](../operations/management-uis.html#overlord-console).
The Overlord provides a UI for managing tasks and workers. For more details, please see [overlord console](../operations/management-uis.md#overlord-console).
### Blacklisted workers


@ -25,11 +25,11 @@ title: "Peons"
### Configuration
For Apache Druid Peon Configuration, see [Peon Query Configuration](../configuration/index.html#peon-query-configuration) and [Additional Peon Configuration](../configuration/index.html#additional-peon-configuration).
For Apache Druid Peon Configuration, see [Peon Query Configuration](../configuration/index.md#peon-query-configuration) and [Additional Peon Configuration](../configuration/index.md#additional-peon-configuration).
### HTTP endpoints
For a list of API endpoints supported by the Peon, please see the [Peon API reference](../operations/api-reference.html#peon).
For a list of API endpoints supported by the Peon, please see the [Peon API reference](../operations/api-reference.md#peon).
Peons run a single task in a single JVM. The MiddleManager is responsible for creating Peons to run tasks.
Peons should rarely, if ever (perhaps only for testing), be run on their own.


@ -134,7 +134,7 @@ In clusters with very high segment counts, it can make sense to separate the Coo
The Coordinator and Overlord processes can be run as a single combined process by setting the `druid.coordinator.asOverlord.enabled` property.
Please see [Coordinator Configuration: Operation](../configuration/index.html#coordinator-operation) for details.
Please see [Coordinator Configuration: Operation](../configuration/index.md#coordinator-operation) for details.
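A minimal sketch of that combined-process setting follows; the `overlordService` name is an assumed conventional value, not something mandated here:

```properties
# Illustrative sketch -- run the Overlord inside the Coordinator process
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord
```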
### Historicals and MiddleManagers


@ -34,11 +34,11 @@ In addition to query routing, the Router also runs the [Druid Console](../operat
### Configuration
For Apache Druid Router Process Configuration, see [Router Configuration](../configuration/index.html#router).
For Apache Druid Router Process Configuration, see [Router Configuration](../configuration/index.md#router).
### HTTP endpoints
For a list of API endpoints supported by the Router, see [Router API](../operations/api-reference.html#router).
For a list of API endpoints supported by the Router, see [Router API](../operations/api-reference.md#router).
### Running


@ -73,7 +73,7 @@ A sample derivativeDataSource supervisor spec is shown below:
|baseDataSource |The name of the base dataSource. This dataSource's data must already be stored inside Druid, and it is used as the input data.|yes|
|dimensionsSpec |Specifies the dimensions of the data. These dimensions must be a subset of baseDataSource's dimensions.|yes|
|metricsSpec |A list of aggregators. These metrics must be a subset of baseDataSource's metrics. See [aggregations](../../querying/aggregations.md).|yes|
|tuningConfig |TuningConfig must be HadoopTuningConfig. See [Hadoop tuning config](../../ingestion/hadoop.html#tuningconfig).|yes|
|tuningConfig |TuningConfig must be HadoopTuningConfig. See [Hadoop tuning config](../../ingestion/hadoop.md#tuningconfig).|yes|
|dataSource |The name of this derived dataSource. |no (default=baseDataSource-hashCode of supervisor)|
|hadoopDependencyCoordinates |A JSON array of Hadoop dependency coordinates that Druid will use; this property overrides the default Hadoop coordinates. Once specified, Druid will look for those Hadoop dependencies in the location specified by druid.extensions.hadoopDependenciesDir |no|
|classpathPrefix |Classpath that will be prepended for the Peon process. |no|


@ -34,7 +34,7 @@ Moving Average encapsulates the [groupBy query](../../querying/groupbyquery.md)
It runs the query in two main phases:
1. Runs an inner [groupBy](../../querying/groupbyquery.html) or [timeseries](../../querying/timeseriesquery.html) query to compute Aggregators (i.e. daily count of events).
1. Runs an inner [groupBy](../../querying/groupbyquery.md) or [timeseries](../../querying/timeseriesquery.md) query to compute Aggregators (i.e. daily count of events).
2. Passes over aggregated results in Broker, in order to compute Averagers (i.e. moving 7 day average of the daily count).
#### Main enhancements provided by this extension:
@ -70,7 +70,7 @@ There are currently no configuration properties specific to Moving Average.
|dimensions|A JSON list of [DimensionSpec](../../querying/dimensionspecs.md) (Notice that property is optional)|no|
|limitSpec|See [LimitSpec](../../querying/limitspec.md)|no|
|having|See [Having](../../querying/having.md)|no|
|granularity|A period granularity; See [Period Granularities](../../querying/granularities.html#period-granularities)|yes|
|granularity|A period granularity; See [Period Granularities](../../querying/granularities.md#period-granularities)|yes|
|filter|See [Filters](../../querying/filters.md)|no|
|aggregations|Aggregations forms the input to Averagers; See [Aggregations](../../querying/aggregations.md)|yes|
|postAggregations|Supports only aggregations as input; See [Post Aggregations](../../querying/post-aggregations.md)|no|


@ -39,9 +39,9 @@ java -classpath "druid_dir/lib/*" org.apache.druid.cli.Main tools pull-deps -c o
To enable this extension after installation,
1. [include](../../development/extensions.md#loading-extensions) this `druid-redis-cache` extension
2. to enable cache on broker nodes, follow [broker caching docs](../../configuration/index.html#broker-caching) to set related properties
3. to enable cache on historical nodes, follow [historical caching docs](../../configuration/index.html#historical-caching) to set related properties
4. to enable cache on middle manager nodes, follow [peon caching docs](../../configuration/index.html#peon-caching) to set related properties
2. to enable cache on broker nodes, follow [broker caching docs](../../configuration/index.md#broker-caching) to set related properties
3. to enable cache on historical nodes, follow [historical caching docs](../../configuration/index.md#historical-caching) to set related properties
4. to enable cache on middle manager nodes, follow [peon caching docs](../../configuration/index.md#peon-caching) to set related properties
5. set `druid.cache.type` to `redis`
6. add the following properties


@ -47,7 +47,7 @@ All the configuration parameters for the StatsD emitter are under `druid.emitter
|`druid.emitter.statsd.dogstatsd`|Flag to enable [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/) support. Causes dimensions to be included as tags, not as a part of the metric name. `convertRange` fields will be ignored.|no|false|
|`druid.emitter.statsd.dogstatsdConstantTags`|If `druid.emitter.statsd.dogstatsd` is true, the tags in the JSON list of strings will be sent with every event.|no|[]|
|`druid.emitter.statsd.dogstatsdServiceAsTag`|If `druid.emitter.statsd.dogstatsd` and `druid.emitter.statsd.dogstatsdServiceAsTag` are true, druid service (e.g. `druid/broker`, `druid/coordinator`, etc) is reported as a tag (e.g. `druid_service:druid/broker`) instead of being included in metric name (e.g. `druid.broker.query.time`) and `druid` is used as metric prefix (e.g. `druid.query.time`).|no|false|
|`druid.emitter.statsd.dogstatsdEvents`|If `druid.emitter.statsd.dogstatsd` and `druid.emitter.statsd.dogstatsdEvents` are true, [Alert events](../../operations/alerts.html) are reported to DogStatsD.|no|false|
|`druid.emitter.statsd.dogstatsdEvents`|If `druid.emitter.statsd.dogstatsd` and `druid.emitter.statsd.dogstatsdEvents` are true, [Alert events](../../operations/alerts.md) are reported to DogStatsD.|no|false|
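For example, a hedged snippet enabling the DogStatsD flavor of this emitter might look like the following; the hostname, port, and tag value are placeholders:

```properties
# Illustrative StatsD/DogStatsD emitter sketch -- hostname, port, and tag are placeholders
druid.emitter=statsd
druid.emitter.statsd.hostname=dogstatsd.example.internal
druid.emitter.statsd.port=8125
druid.emitter.statsd.dogstatsd=true
druid.emitter.statsd.dogstatsdServiceAsTag=true
druid.emitter.statsd.dogstatsdConstantTags=["cluster:druid-prod"]
```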
### Druid to StatsD Event Converter


@ -76,7 +76,7 @@ This string can then be used in the native or SQL Druid query.
|`type` |Filter Type. Should always be `bloom`|yes|
|`dimension` |The dimension to filter over. | yes |
|`bloomKFilter` |Base64 encoded Binary representation of `org.apache.hive.common.util.BloomKFilter`| yes |
|`extractionFn`|[Extraction function](../../querying/dimensionspecs.html#extraction-functions) to apply to the dimension values |no|
|`extractionFn`|[Extraction function](../../querying/dimensionspecs.md#extraction-functions) to apply to the dimension values |no|
### Serialized Format for BloomKFilter


@ -33,7 +33,7 @@ druid.extensions.loadList=["druid-datasketches"]
The following modules are available:
* [Theta sketch](datasketches-theta.html) - approximate distinct counting with set operations (union, intersection and set difference).
* [Tuple sketch](datasketches-tuple.html) - extension of Theta sketch to support values associated with distinct keys (arrays of numeric values in this specialized implementation).
* [Quantiles sketch](datasketches-quantiles.html) - approximate distribution of comparable values to obtain ranks, quantiles and histograms. This is a specialized implementation for numeric values.
* [HLL sketch](datasketches-hll.html) - approximate distinct counting using very compact HLL sketch.
* [Theta sketch](datasketches-theta.md) - approximate distinct counting with set operations (union, intersection and set difference).
* [Tuple sketch](datasketches-tuple.md) - extension of Theta sketch to support values associated with distinct keys (arrays of numeric values in this specialized implementation).
* [Quantiles sketch](datasketches-quantiles.md) - approximate distribution of comparable values to obtain ranks, quantiles and histograms. This is a specialized implementation for numeric values.
* [HLL sketch](datasketches-hll.md) - approximate distinct counting using very compact HLL sketch.


@ -23,7 +23,7 @@ title: "DataSketches Quantiles Sketch module"
-->
This module provides Apache Druid aggregators based on numeric quantiles DoublesSketch from [Apache DataSketches](https://datasketches.apache.org/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cumulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.apache.org/docs/Quantiles/QuantilesOverview.html).
This module provides Apache Druid aggregators based on numeric quantiles DoublesSketch from [Apache DataSketches](https://datasketches.apache.org/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cumulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.apache.org/docs/Quantiles/QuantilesOverview).
There are three major modes of operation:
@ -55,7 +55,7 @@ The result of the aggregation is a DoublesSketch that is the union of all sketch
|type|This String should always be "quantilesDoublesSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the input field (can contain sketches or raw numeric values).|yes|
|k|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2 from 2 to 32768. See the [Quantiles Accuracy](https://datasketches.apache.org/docs/Quantiles/QuantilesAccuracy.html) for details. |no, defaults to 128|
|k|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2 from 2 to 32768. See the [Quantiles Accuracy](https://datasketches.apache.org/docs/Quantiles/QuantilesAccuracy) for details. |no, defaults to 128|
### Post Aggregators


@ -51,7 +51,7 @@ druid.extensions.loadList=["druid-datasketches"]
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the aggregator used at ingestion time.|yes|
|isInputThetaSketch|This should only be used at indexing time if your input data contains theta sketch objects. This would be the case if you use the DataSketches library outside of Druid, say with Pig/Hive, to produce the data that you are ingesting into Druid. |no, defaults to false|
|size|Must be a power of 2. Internally, size refers to the maximum number of entries the sketch object will retain. A higher size means higher accuracy but more space to store sketches. Note that after you index with a particular size, Druid will persist the sketch in segments, and you must use a size greater than or equal to that at query time. See the [DataSketches site](https://datasketches.apache.org/docs/Theta/ThetaSize.html) for details. In general, we recommend sticking to the default size. |no, defaults to 16384|
|size|Must be a power of 2. Internally, size refers to the maximum number of entries the sketch object will retain. A higher size means higher accuracy but more space to store sketches. Note that after you index with a particular size, Druid will persist the sketch in segments, and you must use a size greater than or equal to that at query time. See the [DataSketches site](https://datasketches.apache.org/docs/Theta/ThetaSize) for details. In general, we recommend sticking to the default size. |no, defaults to 16384|
### Post Aggregators


@ -49,7 +49,7 @@ druid.extensions.loadList=["druid-datasketches"]
|type|This String should always be "arrayOfDoublesSketch"|yes|
|name|A String for the output (result) name of the calculation.|yes|
|fieldName|A String for the name of the input field.|yes|
|nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable.html) for details. |no, defaults to 16384|
|nominalEntries|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2. See the [Theta sketch accuracy](https://datasketches.apache.org/docs/Theta/ThetaErrorTable) for details. |no, defaults to 16384|
|numberOfValues|Number of values associated with each distinct key. |no, defaults to 1|
|metricColumns|If building sketches from raw data, an array of names of the input columns containing numeric values to be associated with each distinct key.|no, defaults to empty array|
@ -118,7 +118,7 @@ Returns a list of variance values from a given ArrayOfDoublesSketch. The result
#### Quantiles sketch from a column
Returns a quantiles DoublesSketch constructed from a given column of values from a given ArrayOfDoublesSketch using optional parameter k that determines the accuracy and size of the quantiles sketch. See [Quantiles Sketch Module](datasketches-quantiles.html)
Returns a quantiles DoublesSketch constructed from a given column of values from a given ArrayOfDoublesSketch using optional parameter k that determines the accuracy and size of the quantiles sketch. See [Quantiles Sketch Module](datasketches-quantiles.md)
* The column number is 1-based and is optional (the default is 1).
* The parameter k is optional (the default is defined in the sketch library).
@ -151,7 +151,7 @@ Returns a result of a specified set operation on the given array of sketches. Su
#### Student's t-test
Performs Student's t-test and returns a list of p-values given two instances of ArrayOfDoublesSketch. The result will be N double values, where N is the number of double values kept in the sketch per key. See [t-test documentation](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/inference/TTest.html).
Performs Student's t-test and returns a list of p-values given two instances of ArrayOfDoublesSketch. The result will be N double values, where N is the number of double values kept in the sketch per key. See [t-test documentation](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/inference/TTest).
```json
{


@ -455,4 +455,3 @@ Please see [Defining permissions](../../operations/security-user-auth.md#definin
##### Cache Load Status
`GET(/druid-ext/basic-security/authorization/loadStatus)`
Return the current load status of the local caches of the authorization Druid metadata store.


@ -25,7 +25,7 @@ title: "Druid pac4j based Security extension"
Apache Druid Extension to enable [OpenID Connect](https://openid.net/connect/) based Authentication for Druid Processes using [pac4j](https://github.com/pac4j/pac4j) as the underlying client library.
This can be used with any authentication server that supports it, e.g. [Okta](https://developer.okta.com/).
This extension should only be used at the Router node to enable a group of users in an existing authentication server to interact with the Druid cluster, using the [Web Console](../../operations/druid-console.html). This extension does not support JDBC client authentication.
This extension should only be used at the Router node to enable a group of users in an existing authentication server to interact with the Druid cluster, using the [Web Console](../../operations/druid-console.md). This extension does not support JDBC client authentication.
## Configuration


@ -106,9 +106,9 @@ GET requires READ permission, while POST and DELETE require WRITE permission.
Queries on Druid datasources require DATASOURCE READ permissions for the specified datasource.
Queries on the [INFORMATION_SCHEMA tables](../../querying/sql.html#information-schema) will return information about datasources that the caller has DATASOURCE READ access to. Other datasources will be omitted.
Queries on the [INFORMATION_SCHEMA tables](../../querying/sql.md#information-schema) will return information about datasources that the caller has DATASOURCE READ access to. Other datasources will be omitted.
Queries on the [system schema tables](../../querying/sql.html#system-schema) require the following permissions:
Queries on the [system schema tables](../../querying/sql.md#system-schema) require the following permissions:
- `segments`: Segments will be filtered based on DATASOURCE READ permissions.
- `servers`: The user requires STATE READ permissions.
- `server_segments`: The user requires STATE READ permissions and segments will be filtered based on DATASOURCE READ permissions.


@ -55,7 +55,7 @@ To use the AWS S3 as the deep storage, you need to configure `druid.storage.stor
|`druid.storage.type`|hdfs| |Must be set.|
|`druid.storage.storageDirectory`|s3a://bucket/example/directory or s3n://bucket/example/directory|Path to the deep storage|Must be set.|
You also need to include the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html), especially the `hadoop-aws.jar` in the Druid classpath.
You also need to include the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/), especially the `hadoop-aws.jar` in the Druid classpath.
Run the below command to install the `hadoop-aws.jar` file under `${DRUID_HOME}/extensions/druid-hdfs-storage` on all nodes.
```bash
@ -64,7 +64,7 @@ cp ${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${H
```
Finally, you need to add the below properties in the `core-site.xml`.
For more configurations, see the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
For more configurations, see the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/).
```xml
<property>
@ -111,7 +111,7 @@ and authentication properties needed for GCS. You may want to copy the below
example properties. Please follow the instructions at
[https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md](https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md)
for more details.
For more configurations, [GCS core default](https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/conf/gcs-core-default.xml)
For more configurations, [GCS core default](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.0.0/gcs/conf/gcs-core-default.xml)
and [GCS core template](https://github.com/GoogleCloudPlatform/bdutil/blob/master/conf/hadoop2/gcs-core-template.xml).
```xml


@ -191,7 +191,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
| `indexSpecForIntermediatePersists`| | Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into the final published segment. See [IndexSpec](#indexspec) for possible values. | no (default = same as indexSpec) |
| `reportParseExceptions` | Boolean | *DEPRECATED*. If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped. Setting `reportParseExceptions` to true will override existing configurations for `maxParseExceptions` and `maxSavedParseExceptions`, setting `maxParseExceptions` to 0 and limiting `maxSavedParseExceptions` to no more than 1. | no (default == false) |
| `handoffConditionTimeout` | Long | Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever. | no (default == 0) |
| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kafka messages that are no longer available (i.e. when OffsetOutOfRangeException is encountered).<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.html#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest offset available in Kafka, based on the value of the `useEarliestOffset` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestOffset` is false) or _DUPLICATED_ (if `useEarliestOffset` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.<br/><br/>This feature behaves similarly to the Kafka `auto.offset.reset` consumer property. | no (default == false) |
| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kafka messages that are no longer available (i.e. when OffsetOutOfRangeException is encountered).<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earliest or latest offset available in Kafka, based on the value of the `useEarliestOffset` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestOffset` is false) or _DUPLICATED_ (if `useEarliestOffset` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.<br/><br/>This feature behaves similarly to the Kafka `auto.offset.reset` consumer property. | no (default == false) |
| `workerThreads` | Integer | The number of threads that the supervisor uses to handle requests/responses for worker tasks, along with any other internal asynchronous operation. | no (default == min(10, taskCount)) |
| `chatThreads` | Integer | The number of threads that will be used for communicating with indexing tasks. | no (default == min(10, taskCount * replicas)) |
| `chatRetries` | Integer | The number of times HTTP requests to indexing tasks will be retried before considering tasks unresponsive. | no (default == 8) |
@ -232,12 +232,12 @@ For Concise bitmaps:
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|`type`|String|See [Additional Peon Configuration: SegmentWriteOutMediumFactory](../../configuration/index.html#segmentwriteoutmediumfactory) for explanation and available options.|yes|
|`type`|String|See [Additional Peon Configuration: SegmentWriteOutMediumFactory](../../configuration/index.md#segmentwriteoutmediumfactory) for explanation and available options.|yes|
## Operations
This section gives descriptions of how some supervisor APIs work specifically in Kafka Indexing Service.
For all supervisor APIs, please check [Supervisor APIs](../../operations/api-reference.html#supervisors).
For all supervisor APIs, please check [Supervisor APIs](../../operations/api-reference.md#supervisors).
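As an illustration, a hedged sketch of using these APIs with `curl`, assuming an Overlord on its default port (8090) and a hypothetical supervisor id `my_kafka_supervisor`:

```bash
# List running supervisors, inspect one supervisor's status, then hard-reset it
# so that ingestion resumes from the offsets chosen by `useEarliestOffset`.
curl "http://OVERLORD_IP:8090/druid/indexer/v1/supervisor"
curl "http://OVERLORD_IP:8090/druid/indexer/v1/supervisor/my_kafka_supervisor/status"
curl -X POST "http://OVERLORD_IP:8090/druid/indexer/v1/supervisor/my_kafka_supervisor/reset"
```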
### Getting Supervisor Status Report

@ -174,7 +174,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
| `indexSpecForIntermediatePersists` | | Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are in use, before they are merged into the final published segment. See [IndexSpec](#indexspec) for possible values. | no (default = same as indexSpec) |
| `reportParseExceptions` | Boolean | If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped. | no (default == false) |
| `handoffConditionTimeout` | Long | Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever. | no (default == 0) |
| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kinesis messages that are no longer available.<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.html#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data. | no (default == false) |
| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kinesis messages that are no longer available.<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earliest or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data. | no (default == false) |
| `skipSequenceNumberAvailabilityCheck` | Boolean | Whether to enable checking if the current sequence number is still available in a particular Kinesis shard. If set to false, the indexing task will attempt to reset the current sequence number (or not), depending on the value of `resetOffsetAutomatically`. | no (default == false) |
| `workerThreads` | Integer | The number of threads that the supervisor uses to handle requests/responses for worker tasks, along with any other internal asynchronous operation. | no (default == min(10, taskCount)) |
| `chatThreads` | Integer | The number of threads that will be used for communicating with indexing tasks. | no (default == min(10, taskCount * replicas)) |
@ -222,12 +222,12 @@ For Concise bitmaps:
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|`type`|String|See [Additional Peon Configuration: SegmentWriteOutMediumFactory](../../configuration/index.html#segmentwriteoutmediumfactory) for explanation and available options.|yes|
|`type`|String|See [Additional Peon Configuration: SegmentWriteOutMediumFactory](../../configuration/index.md#segmentwriteoutmediumfactory) for explanation and available options.|yes|
## Operations
This section gives descriptions of how some supervisor APIs work specifically in Kinesis Indexing Service.
For all supervisor APIs, please check [Supervisor APIs](../../operations/api-reference.html#supervisors).
For all supervisor APIs, please check [Supervisor APIs](../../operations/api-reference.md#supervisors).
### AWS Authentication
To authenticate with AWS, you must provide your AWS access key and AWS secret key via runtime.properties, for example:
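A minimal sketch, assuming the standard `druid.kinesis.accessKey` and `druid.kinesis.secretKey` properties (the values below are placeholders):

```
druid.kinesis.accessKey=AKIAEXAMPLEACCESSKEY
druid.kinesis.secretKey=exampleSecretKey
```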

@ -86,7 +86,7 @@ The parameters are as follows
|--------|-----------|--------|-------|
|`extractionNamespace`|Specifies how to populate the local cache. See below|Yes|-|
|`firstCacheTimeout`|How long to wait (in ms) for the first run of the cache to populate. 0 indicates to not wait|No|`0` (do not wait)|
|`injective`|If the underlying map is [injective](../../querying/lookups.html#query-execution) (keys and values are unique) then optimizations can occur internally by setting this to `true`|No|`false`|
|`injective`|If the underlying map is [injective](../../querying/lookups.md#query-execution) (keys and values are unique) then optimizations can occur internally by setting this to `true`|No|`false`|
If `firstCacheTimeout` is set to a non-zero value, it should be less than `druid.manager.lookups.hostUpdateTimeout`. If `firstCacheTimeout` is NOT set, then management is essentially asynchronous and does not know if a lookup succeeded or failed in starting. In such a case logs from the processes using lookups should be monitored for repeated failures.
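To illustrate the parameters above, a hypothetical `cachedNamespace` lookup that periodically polls a CSV file might look roughly like the following sketch (the URI, columns, and timings are illustrative):

```json
{
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "uri",
    "uri": "file:/tmp/lookups/country_code.csv",
    "namespaceParseSpec": {
      "format": "csv",
      "columns": ["key", "value"]
    },
    "pollPeriod": "PT5M"
  },
  "firstCacheTimeout": 120000,
  "injective": true
}
```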
@ -95,7 +95,7 @@ Proper functionality of globally cached lookups requires the following extension
## Example configuration
In a simple case where only one [tier](../../querying/lookups.html#dynamic-configuration) exists (`realtime_customer2`) with one `cachedNamespace` lookup called `country_code`, the resulting configuration JSON looks similar to the following:
In a simple case where only one [tier](../../querying/lookups.md#dynamic-configuration) exists (`realtime_customer2`) with one `cachedNamespace` lookup called `country_code`, the resulting configuration JSON looks similar to the following:
```json
{

@ -76,7 +76,7 @@ Druid uses the following credentials provider chain to connect to your S3 bucket
|6|ECS container credentials|Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
|7|Instance profile information|Based on the instance profile you may have attached to your druid instance|
You can find more information about authentication method [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials.html)<br/>
You can find more information about authentication methods [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials.html)<br/>
**Note:** *Order is important here, as it indicates the precedence of authentication methods.<br/>
So if you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
@ -118,9 +118,9 @@ As an example, to set the region to 'us-east-1' through system properties:
## Server-side encryption
You can enable [server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html) by setting
You can enable [server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html) by setting
`druid.storage.sse.type` to a supported type of server-side encryption. The current supported types are:
- s3: [Server-side encryption with S3-managed encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html)
- kms: [Server-side encryption with AWS KMS-Managed Keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html)
- custom: [Server-side encryption with Customer-Provided Encryption Keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html)
- s3: [Server-side encryption with S3-managed encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html)
- kms: [Server-side encryption with AWS KMS-Managed Keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html)
- custom: [Server-side encryption with Customer-Provided Encryption Keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html)
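For example, a sketch of enabling KMS-managed keys in runtime.properties (the key id is a placeholder):

```
druid.storage.sse.type=kms
druid.storage.sse.kms.keyId=<your KMS key id or ARN>
```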

@ -23,9 +23,9 @@ title: "Simple SSLContext Provider Module"
-->
This Apache Druid module contains a simple implementation of [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html)
This Apache Druid module contains a simple implementation of [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html)
that will be injected to be used with HttpClient that Druid processes use internally to communicate with each other. To learn more about
Java's SSL support, please refer to [this](http://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html) guide.
Java's SSL support, please refer to [this](http://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html) guide.
|Property|Description|Default|Required|
@ -48,5 +48,5 @@ The following table contains optional parameters for supporting client certifica
|`druid.client.https.keyManagerPassword`|The [Password Provider](../../operations/password-provider.md) or String password for the Key Manager.|none|no|
|`druid.client.https.validateHostnames`|Validate the hostname of the server. This should not be disabled unless you are using [custom TLS certificate checks](../../operations/tls-support.md) and know that standard hostname validation is not needed.|true|no|
This [document](http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html) lists all the possible
This [document](http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html) lists all the possible
values for the above mentioned configs among others provided by Java implementation.
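Putting the common client-side options together, a runtime.properties sketch might look like the following (paths and the password are placeholders):

```
druid.client.https.protocol=TLSv1.2
druid.client.https.trustStoreType=jks
druid.client.https.trustStorePath=/opt/druid/conf/tls/truststore.jks
druid.client.https.trustStorePassword=<truststore password>
```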

@ -30,13 +30,13 @@ This page discusses how to use JavaScript to extend Apache Druid.
JavaScript can be used to extend Druid in a variety of ways:
- [Aggregators](../querying/aggregations.html#javascript-aggregator)
- [Extraction functions](../querying/dimensionspecs.html#javascript-extraction-function)
- [Filters](../querying/filters.html#javascript-filter)
- [Post-aggregators](../querying/post-aggregations.html#javascript-post-aggregator)
- [Input parsers](../ingestion/data-formats.html#javascript-parsespec)
- [Router strategy](../design/router.html#javascript)
- [Worker select strategy](../configuration/index.html#javascript-worker-select-strategy)
- [Aggregators](../querying/aggregations.md#javascript-aggregator)
- [Extraction functions](../querying/dimensionspecs.md#javascript-extraction-function)
- [Filters](../querying/filters.md#javascript-filter)
- [Post-aggregators](../querying/post-aggregations.md#javascript-post-aggregator)
- [Input parsers](../ingestion/data-formats.md#javascript-parsespec)
- [Router strategy](../design/router.md#javascript)
- [Worker select strategy](../configuration/index.md#javascript-worker-select-strategy)
JavaScript can be injected dynamically at runtime, making it convenient to rapidly prototype new functionality
without needing to write and deploy Druid extensions.
@ -48,7 +48,7 @@ Druid uses the Mozilla Rhino engine at optimization level 9 to compile and execu
Druid does not execute JavaScript functions in a sandbox, so they have full access to the machine. JavaScript
functions therefore allow users to execute arbitrary code inside the Druid process, and for this reason JavaScript is disabled by default.
However, on dev/staging environments or secured production environments you can enable those by setting
the [configuration property](../configuration/index.html#javascript)
the [configuration property](../configuration/index.md#javascript)
`druid.javascript.enabled = true`.
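As a small illustration, once JavaScript is enabled, a JavaScript extraction function can be embedded inline in a query's dimension spec; a minimal sketch:

```json
{
  "type": "javascript",
  "function": "function(value) { return value.substring(0, 3); }"
}
```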
## Global variables

@ -72,5 +72,5 @@ At some point in the future, we will likely move the internal UI code out of cor
## Client libraries
We welcome contributions for new client libraries to interact with Druid. See the
[Community and third-party libraries](https://druid.apache.org/libraries.html) page for links to existing client
[Community and third-party libraries](https://druid.apache.org/libraries.html) page for links to existing client
libraries.

@ -329,7 +329,7 @@ and [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.
Consider using the [input format](#input-format) instead for these types of ingestion.
This section lists all default and core extension parsers.
For community extension parsers, please see our [community extensions list](../development/extensions.html#community-extensions).
For community extension parsers, please see our [community extensions list](../development/extensions.md#community-extensions).
### String Parser
@ -423,7 +423,7 @@ the set of ingested dimensions, if missing the discovered fields will make up th
`timeAndDims` parse spec must specify which fields will be extracted as dimensions through the `dimensionSpec`.
[All column types](https://orc.apache.org/docs/types.html) are supported, with the exception of `union` types. Columns of
[All column types](https://orc.apache.org/docs/types.html) are supported, with the exception of `union` types. Columns of
`list` type, if filled with primitives, may be used as a multi-value dimension, or specific elements can be extracted with
`flattenSpec` expressions. Likewise, primitive fields may be extracted from `map` and `struct` types in the same manner.
Auto field discovery will automatically create a string dimension for every (non-timestamp) primitive or `list` of
@ -658,7 +658,7 @@ JSON path expressions for all supported types.
When the time dimension is a [DateType column](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md),
a format should not be supplied. When the format is UTF8 (String), either `auto` or an explicitly defined
[format](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html) is required.
[format](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html) is required.
#### Parquet Hadoop Parser vs Parquet Avro Hadoop Parser
@ -808,7 +808,7 @@ Note that the `int96` Parquet value type is not supported with this parser.
When the time dimension is a [DateType column](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md),
a format should not be supplied. When the format is UTF8 (String), either `auto` or
an explicitly defined [format](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html) is required.
an explicitly defined [format](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html) is required.
#### Example

@ -139,7 +139,7 @@ An example of compaction task is
This compaction task reads _all segments_ of the interval `2017-01-01/2018-01-01` and results in new segments.
Since `segmentGranularity` is null, the original segment granularity will be retained and not changed after compaction.
To control the number of result segments per time chunk, you can set [maxRowsPerSegment](../configuration/index.html#compaction-dynamic-configuration) or [numShards](../ingestion/native-batch.md#tuningconfig).
To control the number of result segments per time chunk, you can set [maxRowsPerSegment](../configuration/index.md#compaction-dynamic-configuration) or [numShards](../ingestion/native-batch.md#tuningconfig).
Please note that you can run multiple compactionTasks at the same time. For example, you can run 12 compactionTasks per month instead of running a single task for the entire year.
A compaction task internally generates an `index` task spec for performing compaction work with some fixed parameters.
@ -159,8 +159,8 @@ In this case, the dimensions of recent segments precede that of old segments in
This is because more recent segments are more likely to have the new desired order and data types. If you want to use
your own ordering and types, you can specify a custom `dimensionsSpec` in the compaction task spec.
- Roll-up: the output segment is rolled up only when `rollup` is set for all input segments.
See [Roll-up](../ingestion/index.html#rollup) for more details.
You can check that your segments are rolled up or not by using [Segment Metadata Queries](../querying/segmentmetadataquery.html#analysistypes).
See [Roll-up](../ingestion/index.md#rollup) for more details.
You can check whether your segments are rolled up by using [Segment Metadata Queries](../querying/segmentmetadataquery.md#analysistypes).
### Compaction IOConfig
@ -243,11 +243,11 @@ scenarios dealing with more than 1GB of data.
## Deleting data
Druid supports permanent deletion of segments that are in an "unused" state (see the
[Segment lifecycle](../design/architecture.html#segment-lifecycle) section of the Architecture page).
[Segment lifecycle](../design/architecture.md#segment-lifecycle) section of the Architecture page).
The Kill Task deletes unused segments within a specified interval from metadata storage and deep storage.
For more information, please see [Kill Task](../ingestion/tasks.html#kill).
For more information, please see [Kill Task](../ingestion/tasks.md#kill).
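A minimal kill task sketch (the datasource and interval are placeholders); it deletes only segments in that interval that have already been marked unused:

```json
{
  "type": "kill",
  "dataSource": "wikipedia",
  "interval": "2016-06-01/2016-06-02"
}
```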
Permanent deletion of a segment in Apache Druid has two steps:
@ -257,7 +257,7 @@ Permanent deletion of a segment in Apache Druid has two steps:
For documentation on retention rules, please see [Data Retention](../operations/rule-configuration.md).
For documentation on disabling segments using the Coordinator API, please see the
[Coordinator Datasources API](../operations/api-reference.html#coordinator-datasources) reference.
[Coordinator Datasources API](../operations/api-reference.md#coordinator-datasources) reference.
A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md)

@ -28,7 +28,7 @@ instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hado
comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion.
To run a Hadoop-based ingestion task, write an ingestion spec as specified below. Then POST it to the
[`/druid/indexer/v1/task`](../operations/api-reference.html#tasks) endpoint on the Overlord, or use the
[`/druid/indexer/v1/task`](../operations/api-reference.md#tasks) endpoint on the Overlord, or use the
`bin/post-index-task` script included with Druid.
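For instance, assuming the spec is saved in a local file and the Overlord listens on its default port, the POST might look like this sketch:

```bash
# Submit the ingestion spec to the Overlord's task endpoint.
curl -X POST -H 'Content-Type: application/json' \
  -d @hadoop-index-spec.json \
  "http://OVERLORD_IP:8090/druid/indexer/v1/task"
```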
## Tutorial
@ -111,7 +111,7 @@ A sample task is shown below:
|hadoopDependencyCoordinates|A JSON array of Hadoop dependency coordinates that Druid will use; this property will override the default Hadoop coordinates. Once specified, Druid will look for those Hadoop dependencies from the location specified by `druid.extensions.hadoopDependenciesDir`|no|
|classpathPrefix|Classpath that will be prepended for the Peon process.|no|
Also note that Druid automatically computes the classpath for Hadoop job containers that run in the Hadoop cluster. But in case of conflicts between Hadoop and Druid's dependencies, you can manually specify the classpath by setting `druid.extensions.hadoopContainerDruidClasspath` property. See the extensions config in [base druid configuration](../configuration/index.html#extensions).
Also note that Druid automatically computes the classpath for Hadoop job containers that run in the Hadoop cluster. But in case of conflicts between Hadoop and Druid's dependencies, you can manually specify the classpath by setting the `druid.extensions.hadoopContainerDruidClasspath` property. See the extensions config in [base druid configuration](../configuration/index.md#extensions).
## `dataSchema`
@ -150,7 +150,7 @@ For example, using the static input paths:
You can also read from cloud storage such as AWS S3 or Google Cloud Storage.
To do so, you need to install the necessary library under Druid's classpath in _all MiddleManager or Indexer processes_.
For S3, you can run the below command to install the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
For S3, you can run the below command to install the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/).
```bash
java -classpath "${DRUID_HOME}lib/*" org.apache.druid.cli.Main tools pull-deps -h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}";
@ -159,7 +159,7 @@ cp ${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${H
Once you install the Hadoop AWS module in all MiddleManager and Indexer processes, you can put
your S3 paths in the inputSpec with the below job properties.
For more configurations, see the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
For more configurations, see the [Hadoop AWS module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/).
```
"paths" : "s3a://billy-bucket/the/data/is/here/data.gz,s3a://billy-bucket/the/data/is/here/moredata.gz,s3a://billy-bucket/the/data/is/here/evenmoredata.gz"
@ -179,7 +179,7 @@ under `${DRUID_HOME}/hadoop-dependencies` in _all MiddleManager or Indexer proce
Once you install the GCS Connector jar in all MiddleManager and Indexer processes, you can put
your Google Cloud Storage paths in the inputSpec with the below job properties.
For more configurations, see the [instructions to configure Hadoop](https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#configure-hadoop),
[GCS core default](https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/conf/gcs-core-default.xml)
[GCS core default](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.0.0/gcs/conf/gcs-core-default.xml)
and [GCS core template](https://github.com/GoogleCloudPlatform/bdutil/blob/master/conf/hadoop2/gcs-core-template.xml).
```
@ -438,7 +438,7 @@ If you are having dependency problems with your version of Hadoop and the versio
If your cluster is running on Amazon Web Services, you can use Elastic MapReduce (EMR) to index data
from S3. To do this:
- Create a persistent, [long-running cluster](http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-longrunning-transient.html).
- Create a persistent, [long-running cluster](http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-longrunning-transient.html).
- When creating your cluster, enter the following configuration. If you're using the wizard, this
should be in advanced mode under "Edit software settings":

@ -81,7 +81,7 @@ use the cluster resource of the existing cluster for batch ingestion.
This table compares the three available options:
| **Method** | [Native batch (parallel)](native-batch.html#parallel-task) | [Hadoop-based](hadoop.html) | [Native batch (simple)](native-batch.html#simple-task) |
| **Method** | [Native batch (parallel)](native-batch.md#parallel-task) | [Hadoop-based](hadoop.md) | [Native batch (simple)](native-batch.md#simple-task) |
|---|-----|--------------|------------|
| **Task type** | `index_parallel` | `index_hadoop` | `index` |
| **Parallel?** | Yes, if `inputFormat` is splittable and `maxNumConcurrentSubTasks` > 1 in `tuningConfig`. See [data format documentation](./data-formats.md) for details. | Yes, always. | No. Each task is single-threaded. |
@ -106,7 +106,7 @@ offers a unique data modeling system that bears similarity to both relational an
Druid schemas must always include a primary timestamp. The primary timestamp is used for
[partitioning and sorting](#partitioning) your data. Druid queries are able to rapidly identify and retrieve data
corresponding to time ranges of the primary timestamp column. Druid is also able to use the primary timestamp column
for time-based [data management operations](data-management.html) such as dropping time chunks, overwriting time chunks,
for time-based [data management operations](data-management.md) such as dropping time chunks, overwriting time chunks,
and time-based retention rules.
The primary timestamp is parsed based on the [`timestampSpec`](#timestampspec). In addition, the
@ -186,7 +186,7 @@ Tips for maximizing rollup:
- Generally, the fewer dimensions you have, and the lower the cardinality of your dimensions, the better rollup ratios
you will achieve.
- Use [sketches](schema-design.html#sketches) to avoid storing high cardinality dimensions, which harm rollup ratios.
- Use [sketches](schema-design.md#sketches) to avoid storing high cardinality dimensions, which harm rollup ratios.
- Adjusting `queryGranularity` at ingestion time (for example, using `PT5M` instead of `PT1M`) increases the
likelihood of two rows in Druid having matching timestamps, and can improve your rollup ratios.
- It can be beneficial to load the same data into more than one Druid datasource. Some users choose to create a "full"
@ -218,8 +218,8 @@ The following table shows how each method handles rollup:
|Method|How it works|
|------|------------|
|[Native batch](native-batch.html)|`index_parallel` and `index` type may be either perfect or best-effort, based on configuration.|
|[Hadoop](hadoop.html)|Always perfect.|
|[Native batch](native-batch.md)|`index_parallel` and `index` type may be either perfect or best-effort, based on configuration.|
|[Hadoop](hadoop.md)|Always perfect.|
|[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)|Always best-effort.|
|[Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md)|Always best-effort.|
@ -258,7 +258,7 @@ storage size decreases - and it also tends to improve query performance as well.
Not all ingestion methods support an explicit partitioning configuration, and not all have equivalent levels of
flexibility. As of current Druid versions, if you are doing initial ingestion through a less-flexible method (like
Kafka) then you can use [reindexing techniques](data-management.html#compaction-and-reindexing) to repartition your data after it
Kafka) then you can use [reindexing techniques](data-management.md#compaction-and-reindexing) to repartition your data after it
is initially ingested. This is a powerful technique: you can use it to ensure that any data older than a certain
threshold is optimally partitioned, even as you continuously add new data from a stream.
@ -266,10 +266,10 @@ The following table shows how each ingestion method handles partitioning:
|Method|How it works|
|------|------------|
|[Native batch](native-batch.html)|Configured using [`partitionsSpec`](native-batch.html#partitionsspec) inside the `tuningConfig`.|
|[Hadoop](hadoop.html)|Configured using [`partitionsSpec`](hadoop.html#partitionsspec) inside the `tuningConfig`.|
|[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)|Partitioning in Druid is guided by how your Kafka topic is partitioned. You can also [reindex](data-management.html#compaction-and-reindexing) to repartition after initial ingestion.|
|[Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md)|Partitioning in Druid is guided by how your Kinesis stream is sharded. You can also [reindex](data-management.html#compaction-and-reindexing) to repartition after initial ingestion.|
|[Native batch](native-batch.md)|Configured using [`partitionsSpec`](native-batch.md#partitionsspec) inside the `tuningConfig`.|
|[Hadoop](hadoop.md)|Configured using [`partitionsSpec`](hadoop.md#partitionsspec) inside the `tuningConfig`.|
|[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)|Partitioning in Druid is guided by how your Kafka topic is partitioned. You can also [reindex](data-management.md#compaction-and-reindexing) to repartition after initial ingestion.|
|[Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md)|Partitioning in Druid is guided by how your Kinesis stream is sharded. You can also [reindex](data-management.md#compaction-and-reindexing) to repartition after initial ingestion.|
> Note that, of course, one way to partition data is to load it into separate datasources. This is a perfectly viable
> approach and works very well when the number of datasources does not lead to excessive per-datasource overheads. If
@ -283,7 +283,7 @@ The following table shows how each ingestion method handles partitioning:
## Ingestion specs
No matter what ingestion method you use, data is loaded into Druid using either one-time [tasks](tasks.html) or
No matter what ingestion method you use, data is loaded into Druid using either one-time [tasks](tasks.md) or
ongoing "supervisors" (which run and supervise a set of tasks over time). In any case, part of the task or supervisor
definition is an _ingestion spec_.
@ -359,7 +359,7 @@ You can also load data visually, without the need to write an ingestion spec, us
available in Druid's [web console](../operations/druid-console.md). Druid's visual data loader supports
[Kafka](../development/extensions-core/kafka-ingestion.md),
[Kinesis](../development/extensions-core/kinesis-ingestion.md), and
[native batch](native-batch.html) mode.
[native batch](native-batch.md) mode.
## `dataSchema`
@ -406,7 +406,7 @@ An example `dataSchema` is:
### `dataSource`
The `dataSource` is located in `dataSchema` → `dataSource` and is simply the name of the
[datasource](../design/architecture.html#datasources-and-segments) that data will be written to. An example
[datasource](../design/architecture.md#datasources-and-segments) that data will be written to. An example
`dataSource` is:
```
@ -526,7 +526,7 @@ An example `metricsSpec` is:
The `granularitySpec` is located in `dataSchema` → `granularitySpec` and is responsible for configuring
the following operations:
1. Partitioning a datasource into [time chunks](../design/architecture.html#datasources-and-segments) (via `segmentGranularity`).
1. Partitioning a datasource into [time chunks](../design/architecture.md#datasources-and-segments) (via `segmentGranularity`).
2. Truncating the timestamp, if desired (via `queryGranularity`).
3. Specifying which time chunks of segments should be created, for batch ingestion (via `intervals`).
4. Specifying whether ingestion-time [rollup](#rollup) should be used or not (via `rollup`).
@ -551,7 +551,7 @@ A `granularitySpec` can have the following components:
| Field | Description | Default |
|-------|-------------|---------|
| type | Either `uniform` or `arbitrary`. In most cases you want to use `uniform`.| `uniform` |
| segmentGranularity | [Time chunking](../design/architecture.html#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br><br>Ignored if `type` is set to `arbitrary`.| `day` |
| segmentGranularity | [Time chunking](../design/architecture.md#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.<br><br>Ignored if `type` is set to `arbitrary`.| `day` |
| queryGranularity | The resolution of timestamp storage within each segment. This must be equal to, or finer, than `segmentGranularity`. This will be the finest granularity that you can query at and still receive sensible results, but note that you can still query at anything coarser than this granularity. E.g., a value of `minute` will mean that records will be stored at minutely granularity, and can be sensibly queried at any multiple of minutes (including minutely, 5-minutely, hourly, etc).<br><br>Any [granularity](../querying/granularities.md) can be provided here. Use `none` to store timestamps as-is, without any truncation. Note that `rollup` will be applied if it is set even when the `queryGranularity` is set to `none`. | `none` |
| rollup | Whether to use ingestion-time [rollup](#rollup) or not. Note that rollup is still effective even when `queryGranularity` is set to `none`. Your data will be rolled up if rows have exactly the same timestamp. | `true` |
| intervals | A list of intervals describing what time chunks of segments should be created. If `type` is set to `uniform`, this list will be broken up and rounded-off based on the `segmentGranularity`. If `type` is set to `arbitrary`, this list will be used as-is.<br><br>If `null` or not provided, batch ingestion tasks will generally determine which time chunks to output based on what timestamps are found in the input data.<br><br>If specified, batch ingestion tasks may be able to skip a determining-partitions phase, which can result in faster ingestion. Batch ingestion tasks may also be able to request all their locks up-front instead of one by one. Batch ingestion tasks will throw away any records with timestamps outside of the specified intervals.<br><br>Ignored for any form of streaming ingestion. | `null` |
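Tying the fields above together, a `uniform` granularitySpec for daily segments might look roughly like this sketch (the interval is illustrative):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "day",
  "queryGranularity": "none",
  "rollup": true,
  "intervals": ["2020-01-01/2021-01-01"]
}
```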

@ -1264,7 +1264,7 @@ Sample spec:
|property|description|required?|
|--------|-----------|---------|
|type|This should be "local".|yes|
|filter|A wildcard filter for files. See [here](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes if `baseDir` is specified|
|filter|A wildcard filter for files. See [here](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes if `baseDir` is specified|
|baseDir|Directory to search recursively for files to be ingested. Empty files under the `baseDir` will be skipped.|At least one of `baseDir` or `files` should be specified|
|files|File paths to ingest. Some files can be ignored to avoid ingesting duplicate files if they are located under the specified `baseDir`. Empty files will be skipped.|At least one of `baseDir` or `files` should be specified|
@ -1570,7 +1570,7 @@ A sample local Firehose spec is shown below:
|property|description|required?|
|--------|-----------|---------|
|type|This should be "local".|yes|
|filter|A wildcard filter for files. See [here](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes|
|filter|A wildcard filter for files. See [here](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes|
|baseDir|directory to search recursively for files to be ingested. |yes|
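A minimal sketch matching the table above (the directory is illustrative):

```json
"firehose": {
  "type": "local",
  "filter": "*.json",
  "baseDir": "examples/indexing/"
}
```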
<a name="http-firehose"></a>

@ -89,7 +89,7 @@ it is a natural choice for storing timeseries data. Its flexible data model allo
non-timeseries data, even in the same datasource.
To achieve best-case compression and query performance in Druid for timeseries data, it is important to partition and
sort by metric name, like timeseries databases often do. See [Partitioning and sorting](index.html#partitioning) for more details.
sort by metric name, like timeseries databases often do. See [Partitioning and sorting](index.md#partitioning) for more details.
Tips for modeling timeseries data in Druid:
@ -98,12 +98,12 @@ for ingestion and aggregation.
- Create a dimension that indicates the name of the series that a data point belongs to. This dimension is often called
"metric" or "name". Do not get the dimension named "metric" confused with the concept of Druid metrics. Place this
first in the list of dimensions in your "dimensionsSpec" for best performance (this helps because it improves locality;
see [partitioning and sorting](index.html#partitioning) below for details).
see [partitioning and sorting](index.md#partitioning) below for details).
- Create other dimensions for attributes attached to your data points. These are often called "tags" in timeseries
database systems.
- Create [metrics](../querying/aggregations.md) corresponding to the types of aggregations that you want to be able
to query. Typically this includes "sum", "min", and "max" (in one of the long, float, or double flavors). If you want to
be able to compute percentiles or quantiles, use Druid's [approximate aggregators](../querying/aggregations.html#approx).
be able to compute percentiles or quantiles, use Druid's [approximate aggregators](../querying/aggregations.md#approx).
- Consider enabling [rollup](#rollup), which will allow Druid to potentially combine multiple points into one
row in your Druid datasource. This can be useful if you want to store data at a different time granularity than it is
naturally emitted. It is also useful if you want to combine timeseries and non-timeseries data in the same datasource.
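As a sketch of the series-name-first layout suggested above (the dimension names are hypothetical):

```json
"dimensionsSpec": {
  "dimensions": ["metric", "host", "datacenter", "service"]
}
```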
@ -167,7 +167,7 @@ so they can be sorted and the quantile can be computed, Druid instead only needs
can reduce data transfer needs to mere kilobytes.
For details about the sketches available in Druid, see the
[approximate aggregators](../querying/aggregations.html#approx) page.
[approximate aggregators](../querying/aggregations.md#approx) page.
If you prefer videos, take a look at [Not exactly!](https://www.youtube.com/watch?v=Hpd3f_MLdXo), a conference talk
about sketches in Druid.
@ -187,7 +187,7 @@ For details about how to configure numeric dimensions, see the [`dimensionsSpec`
### Secondary timestamps
Druid schemas must always include a primary timestamp. The primary timestamp is used for
[partitioning and sorting](index.html#partitioning) your data, so it should be the timestamp that you will most often filter on.
[partitioning and sorting](index.md#partitioning) your data, so it should be the timestamp that you will most often filter on.
Druid is able to rapidly identify and retrieve data corresponding to time ranges of the primary timestamp column.
If your data has more than one timestamp, you can ingest the others as secondary timestamps. The best way to do this
@ -195,7 +195,7 @@ is to ingest them as [long-typed dimensions](index.md#dimensionsspec) in millise
If necessary, you can get them into this format using a [`transformSpec`](index.md#transformspec) and
[expressions](../misc/math-expr.md) like `timestamp_parse`, which returns millisecond timestamps.
At query time, you can query secondary timestamps with [SQL time functions](../querying/sql.html#time-functions)
At query time, you can query secondary timestamps with [SQL time functions](../querying/sql.md#time-functions)
like `MILLIS_TO_TIMESTAMP`, `TIME_FLOOR`, and others. If you're using native Druid queries, you can use
[expressions](../misc/math-expr.md).
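For example, assuming a hypothetical `updated_at` field carrying an ISO8601 string, a transform like the following sketch produces a millisecond-valued column that can then be listed as a long-typed dimension:

```json
"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "updated_at_millis",
      "expression": "timestamp_parse(updated_at)"
    }
  ]
}
```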

@ -24,7 +24,7 @@ title: "Realtime Process"
Older versions of Apache Druid supported a standalone 'Realtime' process to query and index 'stream pull'
modes of real-time ingestion. These processes would periodically build segments for the data they had collected over
some span of time and then set up hand-off to [Historical](../design/historical.html) servers.
some span of time and then set up hand-off to [Historical](../design/historical.md) servers.
These processes could be invoked by
@ -40,5 +40,5 @@ suffered from limitations which made it not possible to achieve exactly once ing
The extensions `druid-kafka-eight`, `druid-kafka-eight-simpleConsumer`, `druid-rabbitmq`, and `druid-rocketmq` were also
removed at this time, since they were built to operate on the realtime nodes.
Please consider using the [Kafka Indexing Service](../development/extensions-core/kafka-ingestion.html) or
Please consider using the [Kafka Indexing Service](../development/extensions-core/kafka-ingestion.md) or
[Kinesis Indexing Service](../development/extensions-core/kinesis-ingestion.md) for stream pull ingestion instead.

@ -250,7 +250,7 @@ Note that the overshadow relation holds only for the same time chunk and the sam
These overshadowed segments are not considered in query processing to filter out stale data.
Each segment has a _major_ version and a _minor_ version. The major version is
represented as a timestamp in the format of [`"yyyy-MM-dd'T'hh:mm:ss"`](https://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html)
represented as a timestamp in the format of [`"yyyy-MM-dd'T'hh:mm:ss"`](https://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html)
while the minor version is an integer number. These major and minor versions
are used to determine the overshadow relation between segments as seen below.
@ -268,7 +268,7 @@ Here are some examples.
## Locking
If you are running two or more [druid tasks](./tasks.html) which generate segments for the same data source and the same time chunk,
If you are running two or more [druid tasks](./tasks.md) which generate segments for the same data source and the same time chunk,
the generated segments could potentially overshadow each other, which could lead to incorrect query results.
To avoid this problem, tasks will attempt to get locks prior to creating any segment in Druid.
@ -297,7 +297,7 @@ Also, the segment locking is supported by only native indexing tasks and Kafka/K
Hadoop indexing tasks and `index_realtime` tasks (used by [Tranquility](tranquility.md)) don't support it yet.
`forceTimeChunkLock` in the task context is only applied to individual tasks.
If you want to unset it for all tasks, you would want to set `druid.indexer.tasklock.forceTimeChunkLock` to false in the [overlord configuration](../configuration/index.html#overlord-operations).
To disable it by default for all tasks, set `druid.indexer.tasklock.forceTimeChunkLock` to false in the [overlord configuration](../configuration/index.md#overlord-operations).
Lock requests can conflict with each other if two or more tasks try to get locks for overlapping time chunks of the same data source.
Note that lock conflicts can happen between different lock types.
@ -348,7 +348,7 @@ The task context is used for various individual task configuration. The followin
|property|default|description|
|--------|-------|-----------|
|taskLockTimeout|300000|task lock timeout in milliseconds. For more details, see [Locking](#locking).|
|forceTimeChunkLock|true|_Setting this to false is still experimental_<br/> Force to always use time chunk lock. If not set, each task automatically chooses a lock type to use. If this set, it will overwrite the `druid.indexer.tasklock.forceTimeChunkLock` [configuration for the overlord](../configuration/index.html#overlord-operations). See [Locking](#locking) for more details.|
|forceTimeChunkLock|true|_Setting this to false is still experimental_<br/> Forces the task to always use a time chunk lock. If not set, each task automatically chooses a lock type to use. If set, it overrides the `druid.indexer.tasklock.forceTimeChunkLock` [configuration for the overlord](../configuration/index.md#overlord-operations). See [Locking](#locking) for more details.|
|priority|Different based on task types. See [Priority](#priority).|Task priority|
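In a task spec, these properties are supplied under the top-level `context` field; a sketch with illustrative values:

```json
"context": {
  "taskLockTimeout": 300000,
  "forceTimeChunkLock": true,
  "priority": 75
}
```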
> When a task acquires a lock, it sends a request via HTTP and awaits until it receives a response containing the lock acquisition result.

@ -50,7 +50,7 @@ Expressions can contain variables. Variable names may contain letters, digits, '
For logical operators, a number is true if and only if it is positive (0 or negative value means false). For string type, it's the evaluation result of 'Boolean.valueOf(string)'.
[Multi-value string dimensions](../querying/multi-value-dimensions.html) are supported and may be treated as either scalar or array typed values. When treated as a scalar type, an expression will automatically be transformed to apply the scalar operation across all values of the multi-valued type, to mimic Druid's native behavior. Values that result in arrays will be coerced back into the native Druid string type for aggregation. Druid aggregations on multi-value string dimensions on the individual values, _not_ the 'array', behaving similar to the `UNNEST` operator available in many SQL dialects. However, by using the `array_to_string` function, aggregations may be done on a stringified version of the complete array, allowing the complete row to be preserved. Using `string_to_array` in an expression post-aggregator, allows transforming the stringified dimension back into the true native array type.
[Multi-value string dimensions](../querying/multi-value-dimensions.md) are supported and may be treated as either scalar or array typed values. When treated as a scalar type, an expression will automatically be transformed to apply the scalar operation across all values of the multi-valued type, to mimic Druid's native behavior. Values that result in arrays will be coerced back into the native Druid string type for aggregation. Druid aggregations on multi-value string dimensions apply to the individual values, _not_ the 'array', behaving similarly to the `UNNEST` operator available in many SQL dialects. However, by using the `array_to_string` function, aggregations may be done on a stringified version of the complete array, allowing the complete row to be preserved. Using `string_to_array` in an expression post-aggregator allows transforming the stringified dimension back into the true native array type.
The following built-in functions are available.
@ -72,7 +72,7 @@ The following built-in functions are available.
|name|description|
|----|-----------|
|concat|concat(expr, expr...) concatenate a list of strings|
|format|format(pattern[, args...]) returns a string formatted in the manner of Java's [String.format](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#format-java.lang.String-java.lang.Object...-).|
|format|format(pattern[, args...]) returns a string formatted in the manner of Java's [String.format](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#format-java.lang.String-java.lang.Object...-).|
|like|like(expr, pattern[, escape]) is equivalent to SQL `expr LIKE pattern`|
|lookup|lookup(expr, lookup-name) looks up expr in a registered [query-time lookup](../querying/lookups.md)|
|parse_long|parse_long(string[, radix]) parses a string as a long with the given radix, or 10 (decimal) if a radix is not provided.|
@ -106,8 +106,8 @@ The following built-in functions are available.
|timestamp_floor|timestamp_floor(expr, period, \[origin, [timezone\]\]) rounds down a timestamp, returning it as a new timestamp. Period can be any ISO8601 period, like P3M (quarters) or PT12H (half-days). The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|
|timestamp_shift|timestamp_shift(expr, period, step, \[timezone\]) shifts a timestamp by a period (step times), returning it as a new timestamp. Period can be any ISO8601 period. Step may be negative. The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|
|timestamp_extract|timestamp_extract(expr, unit, \[timezone\]) extracts a time part from expr, returning it as a number. Unit can be EPOCH (number of seconds since 1970-01-01 00:00:00 UTC), SECOND, MINUTE, HOUR, DAY (day of month), DOW (day of week), DOY (day of year), WEEK (week of [week year](https://en.wikipedia.org/wiki/ISO_week_date)), MONTH (1 through 12), QUARTER (1 through 4), or YEAR. The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00"|
|timestamp_parse|timestamp_parse(string expr, \[pattern, [timezone\]\]) parses a string into a timestamp using a given [Joda DateTimeFormat pattern](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html). If the pattern is not provided, this parses time strings in either ISO8601 or SQL format. The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00", and will be used as the time zone for strings that do not include a time zone offset. Pattern and time zone must be literals. Strings that cannot be parsed as timestamps will be returned as nulls.|
|timestamp_format|timestamp_format(expr, \[pattern, \[timezone\]\]) formats a timestamp as a string with a given [Joda DateTimeFormat pattern](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html), or ISO8601 if the pattern is not provided. The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00". Pattern and time zone must be literals.|
|timestamp_parse|timestamp_parse(string expr, \[pattern, [timezone\]\]) parses a string into a timestamp using a given [Joda DateTimeFormat pattern](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html). If the pattern is not provided, this parses time strings in either ISO8601 or SQL format. The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00", and will be used as the time zone for strings that do not include a time zone offset. Pattern and time zone must be literals. Strings that cannot be parsed as timestamps will be returned as nulls.|
|timestamp_format|timestamp_format(expr, \[pattern, \[timezone\]\]) formats a timestamp as a string with a given [Joda DateTimeFormat pattern](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html), or ISO8601 if the pattern is not provided. The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00". Pattern and time zone must be literals.|
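As one illustration, the timestamp functions above can be used in an expression virtual column of a native query; a sketch (the output name and time zone are illustrative):

```json
{
  "type": "expression",
  "name": "hourOfDay",
  "expression": "timestamp_extract(__time, 'HOUR', 'America/Los_Angeles')",
  "outputType": "LONG"
}
```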
## Math functions

@ -67,7 +67,7 @@ monitoring checks such as AWS load balancer health checks are not able to look a
## Master Server
This section documents the API endpoints for the processes that reside on Master servers (Coordinators and Overlords)
in the suggested [three-server configuration](../design/processes.html#server-types).
in the suggested [three-server configuration](../design/processes.md#server-types).
### Coordinator
@ -461,7 +461,7 @@ will be set for them.
* `/druid/coordinator/v1/config/compaction`
Creates or updates the compaction config for a dataSource.
See [Compaction Configuration](../configuration/index.html#compaction-dynamic-configuration) for configuration details.
See [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for configuration details.
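For example, a hedged sketch of creating a config through this endpoint, assuming a Coordinator on its default port and illustrative payload values:

```bash
curl -X POST -H 'Content-Type: application/json' \
  -d '{"dataSource": "wikipedia", "skipOffsetFromLatest": "P1D"}' \
  "http://COORDINATOR_IP:8081/druid/coordinator/v1/config/compaction"
```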
##### DELETE
@ -584,7 +584,7 @@ Retrieve list of task status objects for list of task id strings in request body
Manually clean up the pending segments table in metadata storage for `datasource`. Returns a JSON object response with
`numDeleted`, the count of rows deleted from the pending segments table. This API is used by the
`druid.coordinator.kill.pendingSegments.on` [coordinator setting](../configuration/index.html#coordinator-operation)
`druid.coordinator.kill.pendingSegments.on` [coordinator setting](../configuration/index.md#coordinator-operation)
which automates this operation to perform periodically.
#### Supervisors
@ -602,8 +602,8 @@ Returns a list of objects of the currently active supervisors.
|Field|Type|Description|
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states:`UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.html#operations) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of specific supervisor for details), e.g. [Kafka](../development/extensions-core/kafka-ingestion.html) or [Kinesis](../development/extensions-core/kinesis-ingestion.html))|
|`state`|String|basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.md#operations) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`spec`|SupervisorSpec|json specification of supervisor (See Supervisor Configuration for details)|
@ -614,8 +614,8 @@ Returns a list of objects of the currently active supervisors and their current
|Field|Type|Description|
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.html#operations) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.html) or [Kinesis](../development/extensions-core/kinesis-ingestion.html))|
|`state`|String|basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.md#operations) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`suspended`|Boolean|true or false indicator of whether the supervisor is in suspended state|
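For illustration, an entry in the returned list might look like the sketch below (the id and state values are hypothetical, and real responses may contain additional fields):

```json
{
  "id": "social_media",
  "state": "RUNNING",
  "detailedState": "RUNNING",
  "healthy": true,
  "suspended": false
}
```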
@ -678,7 +678,7 @@ Shutdown a supervisor.
#### Dynamic configuration
See [Overlord Dynamic Configuration](../configuration/index.html#overlord-dynamic-configuration) for details.
See [Overlord Dynamic Configuration](../configuration/index.md#overlord-dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
@ -711,7 +711,7 @@ Update overlord dynamic worker configuration.
## Data Server
This section documents the API endpoints for the processes that reside on Data servers (MiddleManagers/Peons and Historicals)
in the suggested [three-server configuration](../design/processes.html#server-types).
in the suggested [three-server configuration](../design/processes.md#server-types).
### MiddleManager
@ -802,7 +802,7 @@ in the local cache have been loaded, and 503 SERVICE UNAVAILABLE, if they haven'
## Query Server
This section documents the API endpoints for the processes that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/processes.html#server-types).
This section documents the API endpoints for the processes that reside on Query servers (Brokers) in the suggested [three-server configuration](../design/processes.md#server-types).
### Broker

View File

@ -423,7 +423,7 @@ Additionally, for large JVM heaps, here are a few Garbage Collection efficiency
### Use UTC timezone
We recommend using UTC timezone for all your events and across your hosts, not just for Druid, but for all data infrastructure. This can greatly mitigate potential query problems with inconsistent timezones. To query in a non-UTC timezone see [query granularities](../querying/granularities.html#period-granularities)
We recommend using UTC timezone for all your events and across your hosts, not just for Druid, but for all data infrastructure. This can greatly mitigate potential query problems with inconsistent timezones. To query in a non-UTC timezone, see [query granularities](../querying/granularities.md#period-granularities).
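For example, a native query can express day buckets in a specific time zone using a period granularity; this is a minimal sketch, with the zone chosen purely for illustration:

```json
{
  "type": "period",
  "period": "P1D",
  "timeZone": "America/Los_Angeles"
}
```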
### System configuration

View File

@ -44,14 +44,14 @@ When migrating from Derby, the coordinator processes will still need to be up in
Before migrating, you will need to copy your old segments to the new deep storage.
For information on what path structure to use in the new deep storage, please see [deep storage migration options](../operations/export-metadata.html#deep-storage-migration).
For information on what path structure to use in the new deep storage, please see [deep storage migration options](../operations/export-metadata.md#deep-storage-migration).
## Export segments with rewritten load specs
Druid provides an [Export Metadata Tool](../operations/export-metadata.md) for exporting metadata from Derby into CSV files
which can then be reimported.
By setting [deep storage migration options](../operations/export-metadata.html#deep-storage-migration), the `export-metadata` tool will export CSV files where the segment load specs have been rewritten to load from your new deep storage location.
By setting [deep storage migration options](../operations/export-metadata.md#deep-storage-migration), the `export-metadata` tool will export CSV files where the segment load specs have been rewritten to load from your new deep storage location.
Run the `export-metadata` tool on your existing cluster, using the migration options appropriate for your new deep storage location, and save the CSV files it generates. After a successful export, you can shut down the coordinator.
@ -59,7 +59,7 @@ Run the `export-metadata` tool on your existing cluster, using the migration opt
After generating the CSV exports with the modified segment data, you can reimport the contents of the Druid segments table from the generated CSVs.
Please refer to [import commands](../operations/export-metadata.html#importing-metadata) for examples. Only the `druid_segments` table needs to be imported.
Please refer to [import commands](../operations/export-metadata.md#importing-metadata) for examples. Only the `druid_segments` table needs to be imported.
### Restart cluster

View File

@ -28,7 +28,7 @@ The Druid Console is hosted by the [Router](../design/router.md) process.
The following cluster settings must be enabled, as they are by default:
- the Router's [management proxy](../design/router.html#enabling-the-management-proxy) must be enabled.
- the Router's [management proxy](../design/router.md#enabling-the-management-proxy) must be enabled.
- the Broker processes in the cluster must have [Druid SQL](../querying/sql.md) enabled.
The Druid console can be accessed at:
@ -47,9 +47,9 @@ Below is a description of the high-level features and functionality of the Druid
The home view provides a high level overview of the cluster.
Each card is clickable and links to the appropriate view.
The legacy menu allows you to go to the [legacy coordinator and overlord consoles](./management-uis.html#legacy-consoles) should you need them.
The legacy menu allows you to go to the [legacy coordinator and overlord consoles](./management-uis.md#legacy-consoles) should you need them.
![home-view](../assets/web-console-01-home-view.png)
![home-view](../assets/web-console-01-home-view.png "home view")
## Data loader

View File

@ -27,10 +27,10 @@ Apache ZooKeeper, metadata store, the coordinator, the overlord, and brokers are
- For highly-available ZooKeeper, you will need a cluster of 3 or 5 ZooKeeper nodes.
We recommend either installing ZooKeeper on its own hardware, or running 3 or 5 Master servers (where overlords or coordinators are running)
and configuring ZooKeeper on them appropriately. See the [ZooKeeper admin guide](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html) for more details.
and configuring ZooKeeper on them appropriately. See the [ZooKeeper admin guide](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html) for more details.
- For highly-available metadata storage, we recommend MySQL or PostgreSQL with replication and failover enabled.
See [MySQL HA/Scalability Guide](https://dev.mysql.com/doc/mysql-ha-scalability/en/)
and [PostgreSQL's High Availability, Load Balancing, and Replication](https://www.postgresql.org/docs/9.5/high-availability.html) for MySQL and PostgreSQL, respectively.
and [PostgreSQL's High Availability, Load Balancing, and Replication](https://www.postgresql.org/docs/current/high-availability.html) for MySQL and PostgreSQL, respectively.
- For highly-available Apache Druid Coordinators and Overlords, we recommend running multiple servers.
If they are all configured to use the same ZooKeeper cluster and metadata storage,
then they will automatically failover between each other as necessary.

View File

@ -84,7 +84,7 @@ java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log
### Import metadata
After initializing the tables, please refer to the [import commands](../operations/export-metadata.html#importing-metadata) for your target database.
After initializing the tables, please refer to the [import commands](../operations/export-metadata.md#importing-metadata) for your target database.
### Restart cluster

View File

@ -257,11 +257,11 @@ These metrics are for the Druid Coordinator and are reset each time the Coordina
|`coordinator/time`|Approximate Coordinator duty runtime in milliseconds. The duty dimension is the string alias of the Duty that is being run.|duty.|Varies.|
|`coordinator/global/time`|Approximate runtime of a full coordination cycle in milliseconds. The `dutyGroup` dimension indicates what type of coordination this run was. i.e. Historical Management vs Indexing|`dutyGroup`|Varies.|
If `emitBalancingStats` is set to `true` in the Coordinator [dynamic configuration](
../configuration/index.html#dynamic-configuration), then [log entries](../configuration/logging.md) for class
If `emitBalancingStats` is set to `true` in the Coordinator [dynamic configuration](../configuration/index.md#dynamic-configuration), then [log entries](../configuration/logging.md) for class
`org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics` will have extra information on balancing
decisions.
## General Health
### Historical
@ -292,7 +292,7 @@ These metrics are only available if the JVMMonitor module is included.
|`jvm/mem/used`|Used memory.|memKind.|< max memory|
|`jvm/mem/committed`|Committed memory.|memKind.|close to max memory|
|`jvm/gc/count`|Garbage collection count.|gcName (cms/g1/parallel/etc.), gcGen (old/young)|Varies.|
|`jvm/gc/cpu`|Count of CPU time in Nanoseconds spent on garbage collection. Note: `jvm/gc/cpu` represents the total time over multiple GC cycles; divide by `jvm/gc/count` to get the mean GC time per cycle|gcName, gcGen|Sum of `jvm/gc/cpu` should be within 10-30% of sum of `jvm/cpu/total`, depending on the GC algorithm used (reported by [`JvmCpuMonitor`](../configuration/index.html#enabling-metrics)) |
|`jvm/gc/cpu`|Count of CPU time in nanoseconds spent on garbage collection. Note: `jvm/gc/cpu` represents the total time over multiple GC cycles; divide by `jvm/gc/count` to get the mean GC time per cycle|gcName, gcGen|Sum of `jvm/gc/cpu` should be within 10-30% of sum of `jvm/cpu/total`, depending on the GC algorithm used (reported by [`JvmCpuMonitor`](../configuration/index.md#enabling-metrics)) |
### EventReceiverFirehose

View File

@ -87,7 +87,7 @@ classloader.
1. HDFS deep storage uses jars from `extensions/druid-hdfs-storage/` to read and write Druid data on HDFS.
2. Batch ingestion uses jars from `hadoop-dependencies/` to submit Map/Reduce jobs (location customizable via the
`druid.extensions.hadoopDependenciesDir` runtime property; see [Configuration](../configuration/index.html#extensions)).
`druid.extensions.hadoopDependenciesDir` runtime property; see [Configuration](../configuration/index.md#extensions)).
`hadoop-client:2.8.5` is the default version of the Hadoop client bundled with Druid for both purposes. This works with
many Hadoop distributions (the version does not necessarily need to match), but if you run into issues, you can instead

View File

@ -59,7 +59,7 @@ You may need to consider the followings to optimize your segments.
> you may need to find the optimal settings for your workload.
There are several ways to check whether compaction is necessary. One way
is using the [System Schema](../querying/sql.html#system-schema). The
is using the [System Schema](../querying/sql.md#system-schema). The
system schema provides several tables about the current system status including the `segments` table.
By running the below query, you can get the average number of rows and average size for published segments.
@ -87,11 +87,11 @@ In this case, you may want to see only rows of the max version per interval (pai
Once you find that your segments need compaction, you can consider the two options below:
- Turning on the [automatic compaction of Coordinators](../design/coordinator.html#compacting-segments).
- Turning on the [automatic compaction of Coordinators](../design/coordinator.md#compacting-segments).
The Coordinator periodically submits [compaction tasks](../ingestion/tasks.md#compact) to re-index small segments.
To enable the automatic compaction, you need to configure it for each dataSource via Coordinator's dynamic configuration.
See [Compaction Configuration API](../operations/api-reference.html#compaction-configuration)
and [Compaction Configuration](../configuration/index.html#compaction-dynamic-configuration) for details.
See [Compaction Configuration API](../operations/api-reference.md#compaction-configuration)
and [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for details.
- Running periodic Hadoop batch ingestion jobs and using a `dataSource`
inputSpec to read from the segments generated by the Kafka indexing tasks. This might be helpful if you want to compact a lot of segments in parallel.
Details on how to do this can be found in the [Updating existing data](../ingestion/data-management.md#update) section

View File

@ -40,7 +40,7 @@ The other configurations are intended for general use single-machine deployments
The startup scripts for these example configurations run a single ZK instance along with the Druid services. You can choose to deploy ZK separately as well.
The example configurations run the Druid Coordinator and Overlord together in a single process using the optional configuration `druid.coordinator.asOverlord.enabled=true`, described in the [Coordinator configuration documentation](../configuration/index.html#coordinator-operation).
The example configurations run the Druid Coordinator and Overlord together in a single process using the optional configuration `druid.coordinator.asOverlord.enabled=true`, described in the [Coordinator configuration documentation](../configuration/index.md#coordinator-operation).
While example configurations are provided for very large single machines, at higher scales we recommend running Druid in a [clustered deployment](../tutorials/cluster.md), for fault-tolerance and reduced resource contention.

View File

@ -78,7 +78,7 @@ The following table contains non-mandatory advanced configuration options, use c
## Internal communication over TLS
Whenever possible Druid processes will use HTTPS to talk to each other. To enable this communication Druid's HttpClient needs to
be configured with a proper [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html) that is able
be configured with a proper [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html) that is able
to validate the Server Certificates, otherwise communication will fail.
Since there are various ways to configure SSLContext, by default, Druid looks for an instance of SSLContext Guice binding

View File

@ -379,7 +379,7 @@ As a general guideline for experimentation, the [Moments Sketch paper](https://a
#### Fixed Buckets Histogram
Druid also provides a [simple histogram implementation](../development/extensions-core/approximate-histograms.html#fixed-buckets-histogram) that uses a fixed range and fixed number of buckets with support for quantile estimation, backed by an array of bucket count values.
Druid also provides a [simple histogram implementation](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram) that uses a fixed range and fixed number of buckets with support for quantile estimation, backed by an array of bucket count values.
The fixed buckets histogram can perform well when the distribution of the input data allows a small number of buckets to be used.

View File

@ -71,7 +71,7 @@ enables the Historicals to do their own local result merging and puts less strai
Task executor processes such as the Peon or the experimental Indexer only support segment-level caching. Segment-level
caching is controlled by the query context parameters `useCache` and `populateCache`
and [runtime properties](../configuration/index.html) `druid.realtime.cache.*`.
and [runtime properties](../configuration/index.md) `druid.realtime.cache.*`.
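For example, these flags are passed in the query's `context` object; a minimal sketch:

```json
{
  "useCache": true,
  "populateCache": true
}
```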
Larger production clusters should enable segment-level cache population on task execution processes only
(not on Brokers) to avoid having to use Brokers to merge all query results. Enabling cache population on the
@ -82,17 +82,3 @@ Note that the task executor processes only support caches that keep their data l
This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
remote cache types such as `memcached` will be ignored by task executor processes.
## Unsupported queries
Query caching is not available for following:
- Queries, that involve a `union` datasource, do not support result-level caching. Refer to the
[related issue](https://github.com/apache/druid/issues/8713) for details. Please note that not all union SQL queries are executed using a union datasource. You can use the `explain` operation to see how the union query in sql will be executed.
- Queries, that involve an `Inline` datasource or a `Lookup` datasource, do not support any caching.
- Queries, with a sub-query in them, do not support any caching though the output of sub-queries itself may be cached.
Refer to the [Query execution](query-execution.md#query) page for more details on how sub-queries are executed.
- Join queries do not support any caching on the broker [More details](https://github.com/apache/druid/issues/10444).
- GroupBy v2 queries do not support any caching on broker [More details](https://github.com/apache/druid/issues/3820).
- Data Source Metadata queries are not cached anywhere.
- Queries, that have `bySegment` set in the query context, are not cached on the broker. They are currently cached on
historical but this behavior will potentially be removed in the future.

View File

@ -24,7 +24,7 @@ title: "Datasources"
Datasources in Apache Druid are things that you can query. The most common kind of datasource is a table datasource,
and in many contexts the word "datasource" implicitly refers to table datasources. This is especially true
[during data ingestion](../ingestion/index.html), where ingestion is always creating or writing into a table
[during data ingestion](../ingestion/index.md), where ingestion is always creating or writing into a table
datasource. But at query time, there are many other types of datasources available.
The word "datasource" is generally spelled `dataSource` (with a capital S) when it appears in API requests and
@ -51,10 +51,10 @@ SELECT column1, column2 FROM "druid"."dataSourceName"
<!--END_DOCUSAURUS_CODE_TABS-->
The table datasource is the most common type. This is the kind of datasource you get when you perform
[data ingestion](../ingestion/index.html). They are split up into segments, distributed around the cluster,
[data ingestion](../ingestion/index.md). They are split up into segments, distributed around the cluster,
and queried in parallel.
In [Druid SQL](sql.html#from), table datasources reside in the the `druid` schema. This is the default schema, so table
In [Druid SQL](sql.md#from), table datasources reside in the `druid` schema. This is the default schema, so table
datasources can be referenced as either `druid.dataSourceName` or simply `dataSourceName`.
In native queries, table datasources can be referenced using their names as strings (as in the example above), or by
@ -91,7 +91,7 @@ SELECT k, v FROM lookup.countries
```
<!--END_DOCUSAURUS_CODE_TABS-->
Lookup datasources correspond to Druid's key-value [lookup](lookups.html) objects. In [Druid SQL](sql.html#from),
Lookup datasources correspond to Druid's key-value [lookup](lookups.md) objects. In [Druid SQL](sql.md#from),
they reside in the `lookup` schema. They are preloaded in memory on all servers, so they can be accessed rapidly.
They can be joined onto regular tables using the [join operator](#join).
@ -102,7 +102,7 @@ To see a list of all lookup datasources, use the SQL query
`SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'lookup'`.
> Performance tip: Lookups can be joined with a base table either using an explicit [join](#join), or by using the
> SQL [`LOOKUP` function](sql.html#string-functions).
> SQL [`LOOKUP` function](sql.md#string-functions).
> However, the join operator must evaluate the condition on each row, whereas the
> `LOOKUP` function can defer evaluation until after an aggregation phase. This means that the `LOOKUP` function is
> usually faster than joining to a lookup datasource.
@ -113,16 +113,6 @@ use table datasources.
### `union`
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
```sql
SELECT col1, COUNT(*)
FROM (
SELECT col1, col2, col3 FROM tbl1
UNION ALL
SELECT col1, col2, col3 FROM tbl2
)
GROUP BY col1
```
<!--Native-->
```json
{
@ -144,6 +134,8 @@ another will be treated as if they contained all null values in the tables where
The list of "dataSources" must be nonempty. If you want to query an empty dataset, use an [`inline` datasource](#inline)
instead.
Union datasources are not available in Druid SQL.
Refer to the [Query execution](query-execution.md#union) page for more details on how queries are executed when you
use union datasources.
@ -332,7 +324,7 @@ Native join datasources have the following properties. All are required.
Joins are a feature that can significantly affect performance of your queries. Some performance tips and notes:
1. Joins are especially useful with [lookup datasources](#lookup), but in most cases, the
[`LOOKUP` function](sql.html#string-functions) performs better than a join. Consider using the `LOOKUP` function if
[`LOOKUP` function](sql.md#string-functions) performs better than a join. Consider using the `LOOKUP` function if
it is appropriate for your use case.
2. When using joins in Druid SQL, keep in mind that it can generate subqueries that you did not explicitly include in
your queries. Refer to the [Druid SQL](sql.md#query-translation) documentation for more details about when this happens

View File

@ -71,7 +71,7 @@ Please refer to the [Output Types](#output-types) section for more details.
### Filtered DimensionSpecs
These are only useful for multi-value dimensions. If you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with [query filter](filters.html) for value "v1". In the response you will get 3 rows containing "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.
These are only useful for multi-value dimensions. If you have a row in Apache Druid that has a multi-value dimension with values ["v1", "v2", "v3"] and you send a groupBy/topN query grouping by that dimension with a [query filter](filters.md) for value "v1", the response will contain 3 rows, one each for "v1", "v2", and "v3". This behavior might be unintuitive for some use cases.
This happens because the query filter is applied internally to bitmap indexes and is only used to select which rows are included in query result processing. With multi-value dimensions, a query filter behaves like a "contains" check, which matches the row with dimension values ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.md) for more details.
The groupBy/topN processing pipeline then "explodes" all multi-value dimensions, resulting in one row each for "v1", "v2", and "v3".
@ -96,7 +96,7 @@ Following filtered dimension spec retains only the values starting with the same
{ "type" : "prefixFiltered", "delegate" : <dimensionSpec>, "prefix": <prefix string> }
```
For more details and examples, see [multi-value dimensions](multi-value-dimensions.html).
For more details and examples, see [multi-value dimensions](multi-value-dimensions.md).
### Lookup DimensionSpecs
@ -201,7 +201,7 @@ Returns the dimension value unchanged if the regular expression matches, otherwi
### Search query extraction function
Returns the dimension value unchanged if the given [`SearchQuerySpec`](../querying/searchquery.html#searchqueryspec)
Returns the dimension value unchanged if the given [`SearchQuerySpec`](../querying/searchquery.md#searchqueryspec)
matches, otherwise returns null.
```json
@ -254,7 +254,7 @@ For a regular dimension, it assumes the string is formatted in
* `format` : date time format for the resulting dimension value, in [Joda Time DateTimeFormat](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html), or null to use the default ISO8601 format.
* `locale` : locale (language and country) to use, given as a [IETF BCP 47 language tag](http://www.oracle.com/technetwork/java/javase/java8locales-2095355.html#util-text), e.g. `en-US`, `en-GB`, `fr-FR`, `fr-CA`, etc.
* `timeZone` : time zone to use in [IANA tz database format](http://en.wikipedia.org/wiki/List_of_tz_database_time_zones), e.g. `Europe/Berlin` (this can possibly be different than the aggregation time-zone)
* `granularity` : [granularity](granularities.html) to apply before formatting, or omit to not apply any granularity.
* `granularity` : [granularity](granularities.md) to apply before formatting, or omit to not apply any granularity.
* `asMillis` : boolean value, set to true to treat input strings as millis rather than ISO8601 strings. Additionally, if `format` is null or not specified, output will be in millis rather than ISO8601.
```json
@ -371,7 +371,7 @@ be treated as missing.
It is illegal to set `retainMissingValue = true` and also specify a `replaceMissingValueWith`.
A property of `injective` can override the lookup's own sense of whether or not it is
[injective](lookups.html#query-execution). If left unspecified, Druid will use the registered cluster-wide lookup
[injective](lookups.md#query-execution). If left unspecified, Druid will use the registered cluster-wide lookup
configuration.
A property `optimize` can be supplied to allow optimization of lookup-based extraction filters (by default `optimize = true`).
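As a sketch (the lookup name and replacement value are hypothetical), an extraction function referencing a registered lookup with these properties might look like:

```json
{
  "type": "registeredLookup",
  "lookup": "some_lookup_name",
  "retainMissingValue": false,
  "replaceMissingValueWith": "unknown",
  "injective": false,
  "optimize": true
}
```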

View File

@ -137,7 +137,7 @@ The JavaScript filter supports the use of extraction functions, see [Filtering w
> The extraction filter is now deprecated. The selector filter with an extraction function specified
> provides identical functionality and should be used instead.
Extraction filter matches a dimension using some specific [Extraction function](./dimensionspecs.html#extraction-functions).
Extraction filter matches a dimension using some specific [Extraction function](./dimensionspecs.md#extraction-functions).
The following filter matches the values for which the extraction function has a transformation entry `input_key=output_value` where
`output_value` is equal to the filter `value` and `input_key` is present as a dimension value.
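A sketch of such a filter, using an inline map lookup where `product_1` is the hypothetical `input_key` and `bar_1` the `output_value`:

```json
{
  "type": "extraction",
  "dimension": "product",
  "value": "bar_1",
  "extractionFn": {
    "type": "lookup",
    "lookup": {
      "type": "map",
      "map": { "product_1": "bar_1" }
    }
  }
}
```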
@ -409,7 +409,7 @@ The filter above is equivalent to the following OR of Bound filters:
### Filtering with Extraction Functions
All filters except the "spatial" filter support extraction functions.
An extraction function is defined by setting the "extractionFn" field on a filter.
See [Extraction function](./dimensionspecs.html#extraction-functions) for more details on extraction functions.
See [Extraction function](./dimensionspecs.md#extraction-functions) for more details on extraction functions.
If specified, the extraction function will be used to transform input values before the filter is applied.
The example below shows a selector filter combined with an extraction function. This filter will transform input values
@ -483,7 +483,7 @@ Query filters can also be applied to the timestamp column. The timestamp column
to the timestamp column, use the string `__time` as the dimension name. Like numeric dimensions, timestamp filters
should be specified as if the timestamp values were strings.
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.html#time-format-extraction-function) is useful.
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful.
For example, filtering on a long timestamp value:

View File

@ -93,7 +93,7 @@ Following are main parts to a groupBy query:
|aggregations|See [Aggregations](../querying/aggregations.md)|no|
|postAggregations|See [Post Aggregations](../querying/post-aggregations.md)|no|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|subtotalsSpec| A JSON array of arrays to return additional result sets for groupings of subsets of top level `dimensions`. It is [described later](groupbyquery.html#more-on-subtotalsspec) in more detail.|no|
|subtotalsSpec| A JSON array of arrays to return additional result sets for groupings of subsets of top level `dimensions`. It is [described later](groupbyquery.md#more-on-subtotalsspec) in more detail.|no|
|context|An additional JSON Object which can be used to specify certain flags.|no|
To pull it all together, the above query would return *n\*m* data points, up to a maximum of 5000 points, where n is the cardinality of the `country` dimension, m is the cardinality of the `device` dimension, each day between 2012-01-01 and 2012-01-03, from the `sample_datasource` table. Each data point contains the (long) sum of `total_usage` if the value of the data point is greater than 100, the (double) sum of `data_transfer` and the (double) result of `total_usage` divided by `data_transfer` for the filter set for a particular grouping of `country` and `device`. The output looks like this:
@ -132,10 +132,10 @@ groupBy queries can group on multi-value dimensions. When grouping on a multi-va
from matching rows will be used to generate one group per value. It's possible for a query to return more groups than
there are rows. For example, a groupBy on the dimension `tags` with filter `"t1" AND "t3"` would match only row1, and
generate a result with three groups: `t1`, `t2`, and `t3`. If you only need to include values that match
your filter, you can use a [filtered dimensionSpec](dimensionspecs.html#filtered-dimensionspecs). This can also
your filter, you can use a [filtered dimensionSpec](dimensionspecs.md#filtered-dimensionspecs). This can also
improve performance.
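For instance, a `listFiltered` dimensionSpec along these lines (a sketch; the `tags` dimension and values mirror the example above) keeps only the values you care about:

```json
{
  "type": "listFiltered",
  "delegate": {
    "type": "default",
    "dimension": "tags",
    "outputName": "tags"
  },
  "values": ["t1", "t3"]
}
```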
See [Multi-value dimensions](multi-value-dimensions.html) for more details.
See [Multi-value dimensions](multi-value-dimensions.md) for more details.
## More on subtotalsSpec
@ -299,9 +299,9 @@ will not exceed available memory for the maximum possible concurrent query load
`druid.processing.numMergeBuffers`). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md)
for more details about direct memory usage, organized by Druid process type.
Brokers do not need merge buffers for basic groupBy queries. Queries with subqueries (using a `query` dataSource) require one merge buffer if there is a single subquery, or two merge buffers if there is more than one layer of nested subqueries. Queries with [subtotals](groupbyquery.html#more-on-subtotalsspec) need one merge buffer. These can stack on top of each other: a groupBy query with multiple layers of nested subqueries, and that also uses subtotals, will need three merge buffers.
Brokers do not need merge buffers for basic groupBy queries. Queries with subqueries (using a `query` dataSource) require one merge buffer if there is a single subquery, or two merge buffers if there is more than one layer of nested subqueries. Queries with [subtotals](groupbyquery.md#more-on-subtotalsspec) need one merge buffer. These can stack on top of each other: a groupBy query with multiple layers of nested subqueries, and that also uses subtotals, will need three merge buffers.
Historicals and ingestion tasks need one merge buffer for each groupBy query, unless [parallel combination](groupbyquery.html#parallel-combine) is enabled, in which case they need two merge buffers per query.
Historicals and ingestion tasks need one merge buffer for each groupBy query, unless [parallel combination](groupbyquery.md#parallel-combine) is enabled, in which case they need two merge buffers per query.
When using groupBy v1, all aggregation is done on-heap, and resource limits are done through the parameter
`druid.query.groupBy.maxResults`. This is a cap on the maximum number of results in a result set. Queries that exceed
@ -355,11 +355,11 @@ computing intermediate aggregates from each segment and another for combining in
There are some situations where other query types may be a better choice than groupBy.
- For queries with no "dimensions" (i.e. grouping by time only) the [Timeseries query](timeseriesquery.html) will
- For queries with no "dimensions" (i.e. grouping by time only) the [Timeseries query](timeseriesquery.md) will
generally be faster than groupBy. The major differences are that it is implemented in a fully streaming manner (taking
advantage of the fact that segments are already sorted on time) and does not need to use a hash table for merging.
- For queries with a single "dimensions" element (i.e. grouping by one string dimension), the [TopN query](topnquery.html)
- For queries with a single "dimensions" element (i.e. grouping by one string dimension), the [TopN query](topnquery.md)
will sometimes be faster than groupBy. This is especially true if you are ordering by a metric and find approximate
results acceptable.
@ -373,7 +373,7 @@ strategy perform the outer query on the Broker in a single-threaded fashion.
### Configurations
This section describes the configurations for groupBy queries. You can set the runtime properties in the `runtime.properties` file on Broker, Historical, and MiddleManager processes. You can set the query context parameters through the [query context](query-context.html).
This section describes the configurations for groupBy queries. You can set the runtime properties in the `runtime.properties` file on Broker, Historical, and MiddleManager processes. You can set the query context parameters through the [query context](query-context.md).
#### Configurations for groupBy v2

View File

@ -35,7 +35,7 @@ Apache Druid supports the following types of having clauses.
### Query filters
Query filter HavingSpecs allow all [Druid query filters](filters.html) to be used in the Having part of the query.
Query filter HavingSpecs allow all [Druid query filters](filters.md) to be used in the Having part of the query.
The grammar for a query filter HavingSpec is:

View File

@ -56,7 +56,7 @@ Other lookup types are available as extensions, including:
Query Syntax
------------
In [Druid SQL](sql.html), lookups can be queried using the [`LOOKUP` function](sql.md#string-functions), for example:
In [Druid SQL](sql.md), lookups can be queried using the [`LOOKUP` function](sql.md#string-functions), for example:
```sql
SELECT
@ -78,7 +78,7 @@ FROM
GROUP BY 1
```
In native queries, lookups can be queried with [dimension specs or extraction functions](dimensionspecs.html).
In native queries, lookups can be queried with [dimension specs or extraction functions](dimensionspecs.md).
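For example, a lookup dimensionSpec of roughly this shape (the dimension, output name, and lookup name are illustrative) can be used in place of a plain dimension:

```json
{
  "type": "lookup",
  "dimension": "country_iso_code",
  "outputName": "country_name",
  "name": "countries"
}
```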
Query Execution
---------------

View File

@ -29,8 +29,8 @@ characters).
This document describes the behavior of groupBy (topN has similar behavior) queries on multi-value dimensions when they
are used as a dimension being grouped by. See the section on multi-value columns in
[segments](../design/segments.html#multi-value-columns) for internal representation details. Examples in this document
are in the form of [native Druid queries](querying.html). Refer to the [Druid SQL documentation](sql.html) for details
[segments](../design/segments.md#multi-value-columns) for internal representation details. Examples in this document
are in the form of [native Druid queries](querying.md). Refer to the [Druid SQL documentation](sql.md) for details
about using multi-value string dimensions in SQL.
## Querying multi-value dimensions
@ -47,7 +47,7 @@ called `tags`.
### Filtering
All query types, as well as [filtered aggregators](aggregations.html#filtered-aggregator), can filter on multi-value
All query types, as well as [filtered aggregators](aggregations.md#filtered-aggregator), can filter on multi-value
dimensions. Filters follow these rules on multi-value dimensions:
- Value filters (like "selector", "bound", and "in") match a row if any of the values of a multi-value dimension match
@ -115,12 +115,12 @@ from matching rows will be used to generate one group per value. This can be tho
`UNNEST` operator used on an `ARRAY` type that many SQL dialects support. This means it's possible for a query to return
more groups than there are rows. For example, a topN on the dimension `tags` with filter `"t1" AND "t3"` would match
only row1, and generate a result with three groups: `t1`, `t2`, and `t3`. If you only need to include values that match
your filter, you can use a [filtered dimensionSpec](dimensionspecs.html#filtered-dimensionspecs). This can also
your filter, you can use a [filtered dimensionSpec](dimensionspecs.md#filtered-dimensionspecs). This can also
improve performance.
### Example: GroupBy query with no filtering
See [GroupBy querying](groupbyquery.html) for details.
See [GroupBy querying](groupbyquery.md) for details.
```json
{
@ -208,7 +208,7 @@ notice how original rows are "exploded" into multiple rows and merged.
### Example: GroupBy query with a selector query filter
See [query filters](filters.html) for details of selector query filter.
See [query filters](filters.md) for details of selector query filter.
```json
{
@ -293,7 +293,7 @@ the multiple values matches the query filter.
To solve the problem above and to get only rows for "t3" returned, you would have to use a "filtered dimension spec" as
in the query below.
See section on filtered dimensionSpecs in [dimensionSpecs](dimensionspecs.html#filtered-dimensionspecs) for details.
See section on filtered dimensionSpecs in [dimensionSpecs](dimensionspecs.md#filtered-dimensionspecs) for details.
```json
{
@ -344,6 +344,6 @@ returns the following result.
]
```
Note that, for groupBy queries, you could get similar result with a [having spec](having.html) but using a filtered
Note that, for groupBy queries, you could get a similar result with a [having spec](having.md), but using a filtered
dimensionSpec is much more efficient because it is applied at the lowest level in the query processing pipeline.
Having specs are applied at the outermost level of groupBy query processing.

View File

@ -57,7 +57,7 @@ If your multitenant cluster uses shared datasources, most of your queries will l
dimension. These sorts of queries perform best when data is well-partitioned by tenant. There are a few ways to
accomplish this.
With batch indexing, you can use [single-dimension partitioning](../ingestion/hadoop.html#single-dimension-range-partitioning)
With batch indexing, you can use [single-dimension partitioning](../ingestion/hadoop.md#single-dimension-range-partitioning)
to partition your data by tenant_id. Druid always partitions by time first, but the secondary partition within each
time bucket will be on tenant_id.

View File

@ -39,9 +39,9 @@ These parameters apply to all query types.
|property |default | description |
|-----------------|----------------------------------------|----------------------|
|timeout | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout`. To set the default timeout, see [Broker configuration](../configuration/index.html#broker) |
|timeout | `druid.server.http.defaultQueryTimeout`| Query timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means `no timeout`. To set the default timeout, see [Broker configuration](../configuration/index.md#broker) |
|priority | `0` | Query Priority. Queries with higher priority get precedence for computational resources.|
|lane | `null` | Query lane, used to control usage limits on classes of queries. See [Broker configuration](../configuration/index.html#broker) for more details.|
|lane | `null` | Query lane, used to control usage limits on classes of queries. See [Broker configuration](../configuration/index.md#broker) for more details.|
|queryId | auto-generated | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
|useCache | `true` | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid uses `druid.broker.cache.useCache` or `druid.historical.cache.useCache` to determine whether or not to read from the query cache |
|populateCache | `true` | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateCache` or `druid.historical.cache.populateCache` to determine whether or not to save the results of this query to the query cache |
@ -49,14 +49,14 @@ These parameters apply to all query types.
|populateResultLevelCache | `true` | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses `druid.broker.cache.populateResultLevelCache` to determine whether or not to save the results of this query to the result-level query cache |
|bySegment | `false` | Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
|finalize | `true` | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
|maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.html#broker) for more details.|
|maxScatterGatherBytes| `druid.server.http.maxScatterGatherBytes` | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce `maxScatterGatherBytes` limit at query time. See [Broker configuration](../configuration/index.md#broker) for more details.|
|maxQueuedBytes | `druid.broker.http.maxQueuedBytes` | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to `maxScatterGatherBytes`, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.|
|serializeDateTimeAsLong| `false` | If true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process|
|serializeDateTimeAsLongInner| `false` | If true, DateTime is serialized as long in the data transportation between Broker and compute process|
|enableParallelMerge|`true`|Enable parallel result merging on the Broker. Note that `druid.processing.merge.useParallelMergePool` must be enabled for this setting to be set to `true`. See [Broker configuration](../configuration/index.html#broker) for more details.|
|parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
|parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.html#broker) for more details.|
|parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
|enableParallelMerge|`true`|Enable parallel result merging on the Broker. Note that `druid.processing.merge.useParallelMergePool` must be enabled for this setting to be set to `true`. See [Broker configuration](../configuration/index.md#broker) for more details.|
|parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.md#broker) for more details.|
|parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.md#broker) for more details.|
|parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.md#broker) for more details.|
|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
|secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
@ -98,7 +98,7 @@ include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "
- For GroupBy: No multi-value dimensions.
- For Timeseries: No "descending" order.
- Only immutable segments (not real-time).
- Only [table datasources](datasource.html#table) (not joins, subqueries, lookups, or inline datasources).
- Only [table datasources](datasource.md#table) (not joins, subqueries, lookups, or inline datasources).
Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without
vectorization. These query types will ignore the "vectorize" parameter even if it is set to `"force"`.
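As a sketch of how these parameters are passed (the datasource, interval, and aggregator are hypothetical), a native query carries them in its `context` object:

```json
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "intervals": ["2020-01-01/2020-01-08"],
  "granularity": "day",
  "aggregations": [
    { "type": "longSum", "name": "total", "fieldName": "count" }
  ],
  "context": {
    "timeout": 60000,
    "vectorize": "force"
  }
}
```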

View File

@ -41,9 +41,8 @@ You can also enter them directly in the Druid console's Query view. Simply pasti
![Native query](../assets/native-queries-01.png "Native query")
Druid's native query language is JSON over HTTP, although many members of the community have contributed different
[client libraries](/libraries.html) in other languages to query Druid.
[client libraries](https://druid.apache.org/libraries.html) in other languages to query Druid.
The Content-Type/Accept headers can also take 'application/x-jackson-smile'.

View File

@ -27,7 +27,7 @@ title: "String comparators"
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql.md#scalar-functions).
These sorting orders are used by the [TopNMetricSpec](./topnmetricspec.md), [SearchQuery](./searchquery.md), GroupByQuery's [LimitSpec](./limitspec.md), and [BoundFilter](./filters.html#bound-filter).
These sorting orders are used by the [TopNMetricSpec](./topnmetricspec.md), [SearchQuery](./searchquery.md), GroupByQuery's [LimitSpec](./limitspec.md), and [BoundFilter](./filters.md#bound-filter).
## Lexicographic
Sorts values by converting Strings to their UTF-8 byte array representations and comparing lexicographically, byte-by-byte.

View File

@ -62,23 +62,23 @@ FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }
The FROM clause can refer to any of the following:
- [Table datasources](datasource.html#table) from the `druid` schema. This is the default schema, so Druid table
- [Table datasources](datasource.md#table) from the `druid` schema. This is the default schema, so Druid table
datasources can be referenced as either `druid.dataSourceName` or simply `dataSourceName`.
- [Lookups](datasource.html#lookup) from the `lookup` schema, for example `lookup.countries`. Note that lookups can
- [Lookups](datasource.md#lookup) from the `lookup` schema, for example `lookup.countries`. Note that lookups can
also be queried using the [`LOOKUP` function](#string-functions).
- [Subqueries](datasource.html#query).
- [Joins](datasource.html#join) between anything in this list, except between native datasources (table, lookup,
- [Subqueries](datasource.md#query).
- [Joins](datasource.md#join) between anything in this list, except between native datasources (table, lookup,
query) and system tables. The join condition must be an equality between expressions from the left- and right-hand side
of the join.
- [Metadata tables](#metadata-tables) from the `INFORMATION_SCHEMA` or `sys` schemas. Unlike the other options for the
FROM clause, metadata tables are not considered datasources. They exist only in the SQL layer.
For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.html)
For more information about table, lookup, query, and join datasources, refer to the [Datasources](datasource.md)
documentation.
### WHERE
The WHERE clause refers to columns in the FROM table, and will be translated to [native filters](filters.html). The
The WHERE clause refers to columns in the FROM table, and will be translated to [native filters](filters.md). The
WHERE clause can also reference a subquery, like `WHERE col1 IN (SELECT foo FROM ...)`. Queries like this are executed
as a join on the subquery, described below in the [Query translation](#subqueries) section.
@ -259,14 +259,14 @@ converted to zeroes).
### Multi-value strings
Druid's native type system allows strings to potentially have multiple values. These
[multi-value string dimensions](multi-value-dimensions.html) will be reported in SQL as `VARCHAR` typed, and can be
[multi-value string dimensions](multi-value-dimensions.md) will be reported in SQL as `VARCHAR` typed, and can be
syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions will be
applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
[multi-value string functions](#multi-value-string-functions), which can perform powerful array-aware operations.
Grouping by a multi-value expression will observe the native Druid multi-value aggregation behavior, which is similar to
the `UNNEST` functionality available in some other SQL dialects. Refer to the documentation on
[multi-value string dimensions](multi-value-dimensions.html) for additional details.
[multi-value string dimensions](multi-value-dimensions.md) for additional details.
> Because multi-value dimensions are treated by the SQL planner as `VARCHAR`, there are some inconsistencies between how
> they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
@ -277,7 +277,7 @@ the `UNNEST` functionality available in some other SQL dialects. Refer to the do
### NULL values
The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/index.html#sql-compatible-null-handling)
The `druid.generic.useDefaultValueForNull` [runtime property](../configuration/index.md#sql-compatible-null-handling)
controls Druid's NULL handling mode.
In the default mode (`true`), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL
@ -316,23 +316,23 @@ Only the COUNT aggregation can accept DISTINCT.
|`MAX(expr)`|Takes the maximum of numbers.|
|`AVG(expr)`|Averages numbers.|
|`APPROX_COUNT_DISTINCT(expr)`|Counts distinct values of expr, which can be a regular column or a hyperUnique column. This is always approximate, regardless of the value of "useApproximateCountDistinct". This uses Druid's built-in "cardinality" or "hyperUnique" aggregators. See also `COUNT(DISTINCT expr)`.|
|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct values of expr, which can be a regular column or an [HLL sketch](../development/extensions-core/datasketches-hll.html) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of expr, which can be a regular column or a [Theta sketch](../development/extensions-core/datasketches-theta.html) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL sketch](../development/extensions-core/datasketches-hll.html) on the values of expr, which can be a regular column or a column containing HLL sketches. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`DS_THETA(expr, [size])`|Creates a [Theta sketch](../development/extensions-core/datasketches-theta.html) on the values of expr, which can be a regular column or a column containing Theta sketches. The `size` parameter is described in the Theta sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate quantiles on numeric or [approxHistogram](../development/extensions-core/approximate-histograms.html#approximate-histogram-aggregator) exprs. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation. Higher resolutions will give more precise results but also have higher overhead. If not provided, the default resolution is 50. The [approximate histogram extension](../development/extensions-core/approximate-histograms.html) must be loaded to use this function.|
|`APPROX_QUANTILE_DS(expr, probability, [k])`|Computes approximate quantiles on numeric or [Quantiles sketch](../development/extensions-core/datasketches-quantiles.html) exprs. The "probability" should be between 0 and 1 (exclusive). The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric or [fixed buckets histogram](../development/extensions-core/approximate-histograms.html#fixed-buckets-histogram) exprs. The "probability" should be between 0 and 1 (exclusive). The `numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. The [approximate histogram extension](../development/extensions-core/approximate-histograms.html) must be loaded to use this function.|
|`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.html) on the values of expr, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use this function.|
|`BLOOM_FILTER(expr, numEntries)`|Computes a bloom filter from values produced by `expr`, with `numEntries` maximum number of distinct values before false positive rate increases. See [bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.|
|`TDIGEST_QUANTILE(expr, quantileFraction, [compression])`|Builds a T-Digest sketch on values produced by `expr` and returns the value for the quantile. Compression parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches. See [t-digest extension](../development/extensions-contrib/tdigestsketch-quantiles.html) documentation for additional details.|
|`TDIGEST_GENERATE_SKETCH(expr, [compression])`|Builds a T-Digest sketch on values produced by `expr`. Compression parameter (default value 100) determines the accuracy and size of the sketch Higher compression means higher accuracy but more space to store sketches. See [t-digest extension](../development/extensions-contrib/tdigestsketch-quantiles.html) documentation for additional details.|
|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats extension](../development/extensions-core/stats.html) documentation for additional details.|
|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.html) documentation for additional details.|
|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.html) documentation for additional details.|
|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See [stats extension](../development/extensions-core/stats.html) documentation for additional details.|
|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.html) documentation for additional details.|
|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.html) documentation for additional details.|
|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct values of expr, which can be a regular column or an [HLL sketch](../development/extensions-core/datasketches-hll.md) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|
|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of expr, which can be a regular column or a [Theta sketch](../development/extensions-core/datasketches-theta.md) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|
|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL sketch](../development/extensions-core/datasketches-hll.md) on the values of expr, which can be a regular column or a column containing HLL sketches. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|
|`DS_THETA(expr, [size])`|Creates a [Theta sketch](../development/extensions-core/datasketches-theta.md) on the values of expr, which can be a regular column or a column containing Theta sketches. The `size` parameter is described in the Theta sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|
|`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate quantiles on numeric or [approxHistogram](../development/extensions-core/approximate-histograms.md#approximate-histogram-aggregator) exprs. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation. Higher resolutions will give more precise results but also have higher overhead. If not provided, the default resolution is 50. The [approximate histogram extension](../development/extensions-core/approximate-histograms.md) must be loaded to use this function.|
|`APPROX_QUANTILE_DS(expr, probability, [k])`|Computes approximate quantiles on numeric or [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) exprs. The "probability" should be between 0 and 1 (exclusive). The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|
|`APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])`|Computes approximate quantiles on numeric or [fixed buckets histogram](../development/extensions-core/approximate-histograms.md#fixed-buckets-histogram) exprs. The "probability" should be between 0 and 1 (exclusive). The `numBuckets`, `lowerLimit`, `upperLimit`, and `outlierHandlingMode` parameters are described in the fixed buckets histogram documentation. The [approximate histogram extension](../development/extensions-core/approximate-histograms.md) must be loaded to use this function.|
|`DS_QUANTILES_SKETCH(expr, [k])`|Creates a [Quantiles sketch](../development/extensions-core/datasketches-quantiles.md) on the values of expr, which can be a regular column or a column containing quantiles sketches. The `k` parameter is described in the Quantiles sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|
|`BLOOM_FILTER(expr, numEntries)`|Computes a Bloom filter from values produced by `expr`, where `numEntries` is the maximum number of distinct values before the false positive rate increases. See the [bloom filter extension](../development/extensions-core/bloom-filter.md) documentation for additional details.|
|`TDIGEST_QUANTILE(expr, quantileFraction, [compression])`|Builds a T-Digest sketch on values produced by `expr` and returns the value for the quantile. Compression parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches. See [t-digest extension](../development/extensions-contrib/tdigestsketch-quantiles.md) documentation for additional details.|
|`TDIGEST_GENERATE_SKETCH(expr, [compression])`|Builds a T-Digest sketch on values produced by `expr`. Compression parameter (default value 100) determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches. See [t-digest extension](../development/extensions-contrib/tdigestsketch-quantiles.md) documentation for additional details.|
|`VAR_POP(expr)`|Computes variance population of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|
|`VAR_SAMP(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|
|`VARIANCE(expr)`|Computes variance sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|
|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|
|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|
|`STDDEV(expr)`|Computes standard deviation sample of `expr`. See [stats extension](../development/extensions-core/stats.md) documentation for additional details.|
|`EARLIEST(expr)`|Returns the earliest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "earliest" is the value first encountered with the minimum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the first value encountered.|
|`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
|`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered.|
@ -341,7 +341,7 @@ Only the COUNT aggregation can accept DISTINCT.
|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
|`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy dimension is included in a row, when using `GROUPING SETS`. Refer to [additional documentation](aggregations.md#grouping-aggregator) on how to infer this number.|
For advice on choosing approximate aggregation functions, check out our [approximate aggregations documentation](aggregations.html#approx).
For advice on choosing approximate aggregation functions, check out our [approximate aggregations documentation](aggregations.md#approx).
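As a quick, hedged illustration of how a couple of the approximate aggregators above might be combined, the sketch below assumes a `wikipedia` datasource with `user` and `added` columns; the datasource and column names are examples only, not part of the reference table.

```sql
-- Illustrative only: approximate distinct count plus an approximate quantile.
-- Assumes a "wikipedia" datasource and that the DataSketches extension is loaded.
SELECT
  APPROX_COUNT_DISTINCT_DS_HLL("user") AS approx_editors,
  APPROX_QUANTILE_DS(added, 0.95)      AS p95_added
FROM wikipedia
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY;
```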
## Scalar functions
@ -394,7 +394,7 @@ String functions accept strings, and return a type appropriate to the function.
|`CHAR_LENGTH(expr)`|Synonym for `LENGTH`.|
|`CHARACTER_LENGTH(expr)`|Synonym for `LENGTH`.|
|`STRLEN(expr)`|Synonym for `LENGTH`.|
|`LOOKUP(expr, lookupName)`|Look up expr in a registered [query-time lookup table](lookups.html). Note that lookups can also be queried directly using the [`lookup` schema](#from).|
|`LOOKUP(expr, lookupName)`|Look up expr in a registered [query-time lookup table](lookups.md). Note that lookups can also be queried directly using the [`lookup` schema](#from).|
|`LOWER(expr)`|Returns expr in all lowercase.|
|`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.|
|`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of needle within haystack, with indexes starting from 1. The search will begin at fromIndex, or 1 if fromIndex is not specified. If the needle is not found, returns 0.|
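To make the string functions above a little more concrete, here is a small, hypothetical example; the `wikipedia` datasource, its `channel` column, and the `countries` lookup are assumptions for illustration only.

```sql
-- Illustrative only: a few of the string functions from the table above.
-- The "countries" lookup must be registered for LOOKUP to resolve.
SELECT
  LOWER(channel)               AS channel_lower,
  POSITION('#' IN channel)     AS hash_position,
  LOOKUP(channel, 'countries') AS channel_country
FROM wikipedia
LIMIT 10;
```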
@ -518,8 +518,8 @@ These functions operate on expressions or columns that return sketch objects.
#### HLL sketch functions
The following functions operate on [DataSketches HLL sketches](../development/extensions-core/datasketches-hll.html).
The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use the following functions.
The following functions operate on [DataSketches HLL sketches](../development/extensions-core/datasketches-hll.md).
The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use the following functions.
|Function|Notes|
|--------|-----|
@ -530,8 +530,8 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi
#### Theta sketch functions
The following functions operate on [theta sketches](../development/extensions-core/datasketches-theta.html).
The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use the following functions.
The following functions operate on [theta sketches](../development/extensions-core/datasketches-theta.md).
The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use the following functions.
|Function|Notes|
|--------|-----|
@ -543,8 +543,8 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi
#### Quantiles sketch functions
The following functions operate on [quantiles sketches](../development/extensions-core/datasketches-quantiles.html).
The [DataSketches extension](../development/extensions-core/datasketches-extension.html) must be loaded to use the following functions.
The following functions operate on [quantiles sketches](../development/extensions-core/datasketches-quantiles.md).
The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use the following functions.
|Function|Notes|
|--------|-----|
@ -565,7 +565,7 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi
|`NULLIF(value1, value2)`|Returns NULL if value1 and value2 match, else returns value1.|
|`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.|
|`NVL(expr,expr-for-null)`|Returns 'expr-for-null' if 'expr' is null (or empty string for string type).|
|`BLOOM_FILTER_TEST(<expr>, <serialized-filter>)`|Returns true if the value is contained in a Base64-serialized bloom filter. See the [Bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.|
|`BLOOM_FILTER_TEST(<expr>, <serialized-filter>)`|Returns true if the value is contained in a Base64-serialized bloom filter. See the [Bloom filter extension](../development/extensions-core/bloom-filter.md) documentation for additional details.|
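As a small, hedged example of the null-handling functions above (the `wikipedia` datasource and `comment` column are assumed for illustration):

```sql
-- Illustrative only: COALESCE, NVL, and NULLIF on an assumed "comment" column.
SELECT
  COALESCE(comment, 'n/a') AS comment_or_default,
  NVL(comment, 'n/a')      AS comment_nvl,
  NULLIF(comment, '')      AS comment_null_if_empty
FROM wikipedia
LIMIT 10;
```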
## Multi-value string functions
@ -701,20 +701,20 @@ enabling logging and running this query, we can see that it actually runs as the
Druid SQL uses four different native query types.
- [Scan](scan-query.html) is used for queries that do not aggregate (no GROUP BY, no DISTINCT).
- [Scan](scan-query.md) is used for queries that do not aggregate (no GROUP BY, no DISTINCT).
- [Timeseries](timeseriesquery.html) is used for queries that GROUP BY `FLOOR(__time TO <unit>)` or `TIME_FLOOR(__time,
- [Timeseries](timeseriesquery.md) is used for queries that GROUP BY `FLOOR(__time TO <unit>)` or `TIME_FLOOR(__time,
period)`, have no other grouping expressions, no HAVING or LIMIT clauses, no nesting, and either no ORDER BY, or an
ORDER BY that orders by same expression as present in GROUP BY. It also uses Timeseries for "grand total" queries that
have aggregation functions but no GROUP BY. This query type takes advantage of the fact that Druid segments are sorted
by time.
- [TopN](topnquery.html) is used by default for queries that group by a single expression, do have ORDER BY and LIMIT
- [TopN](topnquery.md) is used by default for queries that group by a single expression, do have ORDER BY and LIMIT
clauses, do not have HAVING clauses, and are not nested. However, the TopN query type will deliver approximate ranking
and results in some cases; if you want to avoid this, set "useApproximateTopN" to "false". TopN results are always
computed in memory. See the TopN documentation for more details.
- [GroupBy](groupbyquery.html) is used for all other aggregations, including any nested aggregation queries. Druid's
- [GroupBy](groupbyquery.md) is used for all other aggregations, including any nested aggregation queries. Druid's
GroupBy is a traditional aggregation engine: it delivers exact results and rankings and supports a wide variety of
features. GroupBy aggregates in memory if it can, but it may spill to disk if it doesn't have enough memory to complete
your query. Results are streamed back from data processes through the Broker if you ORDER BY the same expressions in your
@ -799,9 +799,9 @@ Druid does not support all SQL features. In particular, the following features a
Additionally, some Druid native query features are not supported by the SQL language. Some unsupported Druid features
include:
- [Inline datasources](datasource.html#inline).
- [Spatial filters](../development/geo.html).
- [Query cancellation](querying.html#query-cancellation).
- [Inline datasources](datasource.md#inline).
- [Spatial filters](../development/geo.md).
- [Query cancellation](querying.md#query-cancellation).
- [Multi-value dimensions](#multi-value-strings) are only partially implemented in Druid SQL. There are known
inconsistencies between their behavior in SQL queries and in native queries due to how they are currently treated by
the SQL planner.
@ -898,7 +898,7 @@ will be a list of column names. For the `object` and `objectLines` formats, the
keys are column names, and the values are null.
Errors that occur before the response body is sent will be reported in JSON, with an HTTP 500 status code, in the
same format as [native Druid query errors](../querying/querying.html#query-errors). If an error occurs while the response body is
same format as [native Druid query errors](../querying/querying.md#query-errors). If an error occurs while the response body is
being sent, at that point it is too late to change the HTTP status code or report a JSON error, so the response will
simply end midstream and an error will be logged by the Druid server that was handling your request.
@ -962,7 +962,7 @@ final ResultSet resultSet = statement.executeQuery();
Druid SQL supports setting connection parameters on the client. The parameters in the table below affect SQL planning.
All other context parameters you provide will be attached to Druid queries and can affect how they run. See
[Query context](query-context.html) for details on the possible options.
[Query context](query-context.md) for details on the possible options.
```java
String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/";
@ -987,13 +987,13 @@ Connection context can be specified as JDBC connection properties or as a "conte
|`sqlQueryId`|Unique identifier given to this SQL query. For HTTP client, it will be returned in `X-Druid-SQL-Query-Id` header.|auto-generated|
|`sqlTimeZone`|Sets the time zone for this connection, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|druid.sql.planner.sqlTimeZone on the Broker (default: UTC)|
|`useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|druid.sql.planner.useApproximateCountDistinct on the Broker (default: true)|
|`useApproximateTopN`|Whether to use approximate [TopN queries](topnquery.html) when a SQL query could be expressed as such. If false, exact [GroupBy queries](groupbyquery.html) will be used instead.|druid.sql.planner.useApproximateTopN on the Broker (default: true)|
|`useApproximateTopN`|Whether to use approximate [TopN queries](topnquery.md) when a SQL query could be expressed as such. If false, exact [GroupBy queries](groupbyquery.md) will be used instead.|druid.sql.planner.useApproximateTopN on the Broker (default: true)|
## Metadata tables
Druid Brokers infer table and column metadata for each datasource from segments loaded in the cluster, and use this to
plan SQL queries. This metadata is cached on Broker startup and also updated periodically in the background through
[SegmentMetadata queries](segmentmetadataquery.html). Background metadata refreshing is triggered by
[SegmentMetadata queries](segmentmetadataquery.md). Background metadata refreshing is triggered by
segments entering and exiting the cluster, and can also be throttled through configuration.
Druid exposes system information through special system tables. There are two such schemas available: Information Schema and Sys Schema.
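As a brief, hedged illustration of those two schemas, the queries below list the tables visible to Druid SQL and then describe the columns of an assumed `wikipedia` datasource (the datasource name is an example only):

```sql
-- List tables across the druid, INFORMATION_SCHEMA, and sys schemas.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES;

-- Describe the columns of an assumed "wikipedia" datasource.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'druid' AND TABLE_NAME = 'wikipedia';
```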
@ -1136,9 +1136,9 @@ Servers table lists all discovered servers in the cluster.
|plaintext_port|LONG|Unsecured port of the server, or -1 if plaintext traffic is disabled|
|tls_port|LONG|TLS port of the server, or -1 if TLS is disabled|
|server_type|STRING|Type of Druid service. Possible values include: COORDINATOR, OVERLORD, BROKER, ROUTER, HISTORICAL, MIDDLE_MANAGER or PEON.|
|tier|STRING|Distribution tier see [druid.server.tier](../configuration/index.html#historical-general-configuration). Only valid for HISTORICAL type, for other types it's null|
|tier|STRING|Distribution tier; see [druid.server.tier](../configuration/index.md#historical-general-configuration). Only valid for the HISTORICAL type; null for other types|
|current_size|LONG|Current size of segments in bytes on this server. Only valid for the HISTORICAL type; 0 for other types|
|max_size|LONG|Max size in bytes this server recommends to assign to segments see [druid.server.maxSize](../configuration/index.html#historical-general-configuration). Only valid for HISTORICAL type, for other types it's 0|
|max_size|LONG|Maximum size in bytes this server recommends assigning to segments; see [druid.server.maxSize](../configuration/index.md#historical-general-configuration). Only valid for the HISTORICAL type; 0 for other types|
To retrieve information about all servers, use the query:
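A minimal form of that query simply selects everything from the `sys.servers` table described above:

```sql
-- All discovered servers, with the columns described in the table above.
SELECT * FROM sys.servers;
```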
@ -1171,13 +1171,13 @@ GROUP BY servers.server;
#### TASKS table
The tasks table provides information about active and recently-completed indexing tasks. For more information
check out the documentation for [ingestion tasks](../ingestion/tasks.html).
check out the documentation for [ingestion tasks](../ingestion/tasks.md).
|Column|Type|Notes|
|------|-----|-----|
|task_id|STRING|Unique task identifier|
|group_id|STRING|Task group ID for this task; the value depends on the task `type`. For example, for native index tasks it is the same as `task_id`; for subtasks it is the parent task's ID|
|type|STRING|Task type, for example this value is "index" for indexing tasks. See [tasks-overview](../ingestion/tasks.html)|
|type|STRING|Task type, for example this value is "index" for indexing tasks. See [tasks-overview](../ingestion/tasks.md)|
|datasource|STRING|Datasource name being indexed|
|created_time|STRING|Timestamp in ISO8601 format corresponding to when the ingestion task was created. Note that this value is populated for completed and waiting tasks. For running and pending tasks this value is set to 1970-01-01T00:00:00Z|
|queue_insertion_time|STRING|Timestamp in ISO8601 format corresponding to when this task was added to the queue on the Overlord|
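As a hedged example using only the columns listed above, the query below pulls recently created tasks for an assumed `wikipedia` datasource (the datasource name is an example only):

```sql
-- Illustrative only: recent tasks for an assumed "wikipedia" datasource.
SELECT task_id, group_id, "type", datasource, created_time
FROM sys.tasks
WHERE datasource = 'wikipedia'
ORDER BY created_time DESC;
```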
@ -1203,8 +1203,8 @@ The supervisors table provides information about supervisors.
|Column|Type|Notes|
|------|-----|-----|
|supervisor_id|STRING|Supervisor task identifier|
|state|STRING|Basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.html#operations) for details.|
|detailed_state|STRING|Supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.html) or [Kinesis](../development/extensions-core/kinesis-ingestion.html))|
|state|STRING|Basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-ingestion.md#operations) for details.|
|detailed_state|STRING|Supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
|healthy|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 indicates a healthy supervisor|
|type|STRING|Type of supervisor, e.g. `kafka`, `kinesis` or `materialized_view`|
|source|STRING|Source of the supervisor, e.g. Kafka topic or Kinesis stream|
@ -1220,9 +1220,9 @@ SELECT * FROM sys.supervisors WHERE healthy=0;
## Server configuration
Druid SQL planning occurs on the Broker and is configured by
[Broker runtime properties](../configuration/index.html#sql).
[Broker runtime properties](../configuration/index.md#sql).
## Security
Please see [Defining SQL permissions](../operations/security-user-auth.md#sql-permissions) in the
basic security documentation for information on what permissions are needed for making SQL queries.
basic security documentation for information on permissions needed for making SQL queries.
@ -159,10 +159,10 @@ topN queries can group on multi-value dimensions. When grouping on a multi-value
from matching rows will be used to generate one group per value. It's possible for a query to return more groups than
there are rows. For example, a topN on the dimension `tags` with filter `"t1" AND "t3"` would match only row1, and
generate a result with three groups: `t1`, `t2`, and `t3`. If you only need to include values that match
your filter, you can use a [filtered dimensionSpec](dimensionspecs.html#filtered-dimensionspecs). This can also
your filter, you can use a [filtered dimensionSpec](dimensionspecs.md#filtered-dimensionspecs). This can also
improve performance.
See [Multi-value dimensions](multi-value-dimensions.html) for more details.
See [Multi-value dimensions](multi-value-dimensions.md) for more details.
## Aliasing
@ -161,12 +161,12 @@ cd apache-druid-{{DRUIDVERSION}}
In the package, you should find:
* `LICENSE` and `NOTICE` files
* `bin/*` - scripts related to the [single-machine quickstart](index.html)
* `bin/*` - scripts related to the [single-machine quickstart](index.md)
* `conf/druid/cluster/*` - template configurations for a clustered setup
* `extensions/*` - core Druid extensions
* `hadoop-dependencies/*` - Druid Hadoop dependencies
* `lib/*` - libraries and dependencies for core Druid
* `quickstart/*` - files related to the [single-machine quickstart](index.html)
* `quickstart/*` - files related to the [single-machine quickstart](index.md)
We'll be editing the files in `conf/druid/cluster/` in order to get things running.
@ -27,8 +27,8 @@ sidebar_label: "Load from Apache Hadoop"
This tutorial shows you how to load data files into Apache Druid using a remote Hadoop cluster.
For this tutorial, we'll assume that you've already completed the previous
[batch ingestion tutorial](tutorial-batch.html) using Druid's native batch ingestion system and are using the
`micro-quickstart` single-machine configuration as described in the [quickstart](index.html).
[batch ingestion tutorial](tutorial-batch.md) using Druid's native batch ingestion system and are using the
`micro-quickstart` single-machine configuration as described in the [quickstart](index.md).
## Install Docker
@ -30,7 +30,7 @@ Because there is some per-segment memory and processing overhead, it can sometim
Please check [Segment size optimization](../operations/segment-optimization.md) for details.
For this tutorial, we'll assume you've already downloaded Apache Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md) and [Tutorial: Querying data](../tutorials/tutorial-query.md).
@ -27,7 +27,7 @@ sidebar_label: "Deleting data"
This tutorial demonstrates how to delete existing data.
For this tutorial, we'll assume you've already downloaded Apache Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
## Load initial data
@ -39,7 +39,7 @@ Let's load this initial data:
bin/post-index-task --file quickstart/tutorial/deletion-index.json --url http://localhost:8081
```
When the load finishes, open [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser.
When the load finishes, open [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser.
## How to permanently delete data
@ -27,7 +27,7 @@ sidebar_label: "Writing an ingestion spec"
This tutorial will guide the reader through the process of defining an ingestion spec, pointing out key considerations and guidelines.
For this tutorial, we'll assume you've already downloaded Apache Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md), [Tutorial: Querying data](../tutorials/tutorial-query.md), and [Tutorial: Rollup](../tutorials/tutorial-rollup.md).
@ -29,7 +29,7 @@ sidebar_label: "Load from Apache Kafka"
This tutorial demonstrates how to load data into Apache Druid from a Kafka stream, using Druid's Kafka indexing service.
For this tutorial, we'll assume you've already downloaded Druid as described in
the [quickstart](index.html) using the `micro-quickstart` single-machine configuration and have it
the [quickstart](index.md) using the `micro-quickstart` single-machine configuration and have it
running on your local machine. You don't need to have loaded any data yet.
## Download and start Kafka
@ -254,7 +254,7 @@ If the supervisor was successfully created, you will get a response containing t
For more details about what's going on here, check out the
[Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.md).
You can view the current supervisors and tasks in the Druid Console: [http://localhost:8888/unified-console.html#tasks](http://localhost:8888/unified-console.html#tasks).
You can view the current supervisors and tasks in the Druid Console: [http://localhost:8888/unified-console.html#tasks](http://localhost:8888/unified-console.html#tasks).
## Querying your data
@ -27,7 +27,7 @@ sidebar_label: "Configuring data retention"
This tutorial demonstrates how to configure retention rules on a datasource to set the time intervals of data that will be retained or dropped.
For this tutorial, we'll assume you've already downloaded Apache Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md) and [Tutorial: Querying data](../tutorials/tutorial-query.md).
@ -29,7 +29,7 @@ Apache Druid can summarize raw data at ingestion time using a process we refer t
This tutorial will demonstrate the effects of roll-up on an example dataset.
For this tutorial, we'll assume you've already downloaded Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md) and [Tutorial: Querying data](../tutorials/tutorial-query.md).
@ -27,7 +27,7 @@ sidebar_label: "Transforming input data"
This tutorial will demonstrate how to use transform specs to filter and transform input data during ingestion.
For this tutorial, we'll assume you've already downloaded Apache Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md) and [Tutorial: Querying data](../tutorials/tutorial-query.md).
@ -27,7 +27,7 @@ sidebar_label: "Updating existing data"
This tutorial demonstrates how to update existing data, showing both overwrites and appends.
For this tutorial, we'll assume you've already downloaded Apache Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
the [single-machine quickstart](index.md) and have it running on your local machine.
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md), [Tutorial: Querying data](../tutorials/tutorial-query.md), and [Tutorial: Rollup](../tutorials/tutorial-rollup.md).