diff --git a/README.md b/README.md
index c23f11d2242..5f516e650f3 100644
--- a/README.md
+++ b/README.md
@@ -85,7 +85,7 @@ Use the built-in query workbench to prototype [DruidSQL](https://druid.apache.or
### Documentation
-See the [latest documentation](https://druid.apache.org/docs/latest/) for the documentation for the current official release. If you need information on a previous release, you can browse [previous releases documentation](https://druid.apache.org/docs/).
+See the [latest documentation](https://druid.apache.org/docs/latest/) for the current official release. If you need information on a previous release, you can browse [previous releases documentation](https://druid.apache.org/docs/).
Make documentation and tutorials updates in [`/docs`](https://github.com/apache/druid/tree/master/docs) using [MarkDown](https://www.markdownguide.org/) and contribute them using a pull request.
diff --git a/docs/operations/api-reference.md b/docs/api-reference/api-reference.md
similarity index 92%
rename from docs/operations/api-reference.md
rename to docs/api-reference/api-reference.md
index af390e0774e..9b762c08183 100644
--- a/docs/operations/api-reference.md
+++ b/docs/api-reference/api-reference.md
@@ -1,6 +1,7 @@
---
id: api-reference
-title: "API reference"
+title: HTTP API endpoints reference
+sidebar_label: API endpoints reference
---
-This page documents all of the API endpoints for each Druid service type.
+This topic documents all of the API endpoints for each Druid service type.
## Common
-The following endpoints are supported by all processes.
+All processes support the following endpoints.
### Process information
`GET /status`
-Returns the Druid version, loaded extensions, memory used, total memory and other useful information about the process.
+Returns the Druid version, loaded extensions, memory used, total memory, and other useful information about the process.
`GET /status/health`
-An endpoint that always returns a boolean "true" value with a 200 OK response, useful for automated health checks.
+Always returns a boolean `true` value with a 200 OK response, useful for automated health checks.
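For example, a minimal automated check might look like the following sketch; the `BROKER:8082` host and port are placeholders for whichever process you are probing:

```sh
# --fail makes curl exit non-zero unless the process answers with 200 OK and "true".
curl --fail -s "http://BROKER:8082/status/health"
```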
`GET /status/properties`
@@ -77,7 +78,7 @@ Returns the current leader Coordinator of the cluster.
`GET /druid/coordinator/v1/isLeader`
-Returns a JSON object with field "leader", either true or false, indicating if this server is the current leader
+Returns a JSON object with a `leader` field, either true or false, indicating whether this server is the current leader
Coordinator of the cluster. In addition, returns HTTP 200 if the server is the current leader and HTTP 404 if not.
This is suitable for use as a load balancer status check if you only want the active leader to be considered in-service
at the load balancer.
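A sketch of such a check, with `COORDINATOR:8081` as a placeholder for the Coordinator address:

```sh
# Exits zero on HTTP 200 (current leader) and non-zero on HTTP 404 (not the leader),
# which is what most load balancer health checks key on.
curl --fail -s "http://COORDINATOR:8081/druid/coordinator/v1/isLeader"
```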
@@ -119,11 +120,10 @@ Returns the number of segments to load and drop, as well as the total segment lo
Returns the serialized JSON of segments to load and drop for each Historical process.
-
#### Segment loading by datasource
-Note that all _interval_ query parameters are ISO 8601 strings (e.g., 2016-06-27/2016-06-28).
-Also note that these APIs only guarantees that the segments are available at the time of the call.
+Note that all _interval_ query parameters are ISO 8601 strings—for example, `2016-06-27/2016-06-28`.
+Also note that these APIs only guarantee that the segments are available at the time of the call.
Segments can still become missing because of historical process failures or any other reasons afterward.
`GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}`
@@ -144,7 +144,7 @@ over the given interval (or last 2 weeks if interval is not given). This does no
(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms
of the load on the metadata store but can be necessary to make sure that we verify all the latest segments' load status)
* Setting `forceMetadataRefresh` to false will use the metadata cached on the coordinator from the last force/periodic refresh.
-If no used segments are found for the given inputs, this API returns `204 No Content`
+If no used segments are found for the given inputs, this API returns `204 No Content`
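For example, the following sketch checks load status for a hypothetical `wikipedia` datasource over one day using the cached metadata; the host and datasource are placeholders:

```sh
curl -s "http://COORDINATOR:8081/druid/coordinator/v1/datasources/wikipedia/loadstatus?forceMetadataRefresh=false&interval=2016-06-27/2016-06-28"
```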
`GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?full&forceMetadataRefresh={boolean}&interval={myInterval}`
@@ -216,18 +216,17 @@ segment is unused, or is unknown, a 404 response is returned.
`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments`
-Returns a list of all segments, overlapping with any of given intervals, for a datasource as stored in the metadata store. Request body is array of string IS0 8601 intervals like [interval1, interval2,...] for example ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]
+Returns a list of all segments, overlapping with any of the given intervals, for a datasource as stored in the metadata store. The request body is an array of ISO 8601 interval strings like `[interval1, interval2,...]`—for example, `["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]`.
`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full`
-Returns a list of all segments, overlapping with any of given intervals, for a datasource with the full segment metadata as stored in the metadata store. Request body is array of string ISO 8601 intervals like [interval1, interval2,...] for example ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]
+Returns a list of all segments, overlapping with any of the given intervals, for a datasource with the full segment metadata as stored in the metadata store. The request body is an array of ISO 8601 interval strings like `[interval1, interval2,...]`—for example, `["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]`.
#### Datasources
-Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`—for example, `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/datasources`
@@ -235,7 +234,7 @@ Returns a list of datasource names found in the cluster as seen by the coordinat
`GET /druid/coordinator/v1/datasources?simple`
-Returns a list of JSON objects containing the name and properties of datasources found in the cluster. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime.
+Returns a list of JSON objects containing the name and properties of datasources found in the cluster. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime.
`GET /druid/coordinator/v1/datasources?full`
@@ -247,7 +246,7 @@ Returns a JSON object containing the name and properties of a datasource. Proper
`GET /druid/coordinator/v1/datasources/{dataSourceName}?full`
-Returns full metadata for a datasource .
+Returns full metadata for a datasource.
`GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals`
@@ -294,6 +293,7 @@ Returns full segment metadata for a specific segment in the cluster.
Return the tiers that a datasource exists in.
#### Note for Coordinator's POST and DELETE APIs
+
While segments may be enabled by issuing POST requests for the datasources, the Coordinator may again disable segments if they match any configured [drop rules](../operations/rule-configuration.md#drop-rules). Even if segments are enabled by these APIs, you must configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes. If an indexing or kill task runs at the same time these APIs are invoked, the behavior is undefined. Some segments might be killed and others might be enabled. It's also possible that all segments might be disabled, but the indexing task can still read data from those segments and succeed.
> Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk.
@@ -316,8 +316,8 @@ result of this API call.
Marks segments (un)used for a datasource by interval or set of segment IDs. When marking used, only segments that are not overshadowed will be updated.
-The request payload contains the interval or set of segment Ids to be marked unused.
-Either interval or segment ids should be provided, if both or none are provided in the payload, the API would throw an error (400 BAD REQUEST).
+The request payload contains the interval or set of segment IDs to be marked unused.
+Either the interval or segment IDs should be provided. If both or neither are provided in the payload, the API returns a 400 BAD REQUEST error.
Interval specifies the start and end times as ISO 8601 strings. `interval=(start/end)` where start and end both are inclusive and only the segments completely contained within the specified interval will be disabled; partially overlapping segments will not be affected.
@@ -325,9 +325,8 @@ JSON Request Payload:
|Key|Description|Example|
|----------|-------------|---------|
-|`interval`|The interval for which to mark segments unused|"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"|
-|`segmentIds`|Set of segment Ids to be marked unused|["segmentId1", "segmentId2"]|
-
+|`interval`|The interval for which to mark segments unused|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`|
+|`segmentIds`|Set of segment IDs to be marked unused|`["segmentId1", "segmentId2"]`|
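As a sketch, assuming the mark-unused variant of this endpoint is `POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused`, the interval form of the payload looks like the following; the host and `wikipedia` datasource are placeholders:

```sh
# Provide either "interval" or "segmentIds" in the payload, never both.
curl -X POST -H 'Content-Type: application/json' \
  "http://COORDINATOR:8081/druid/coordinator/v1/datasources/wikipedia/markUnused" \
  -d '{"interval": "2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"}'
```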
`DELETE /druid/coordinator/v1/datasources/{dataSourceName}`
@@ -348,8 +347,7 @@ result of this API call.
#### Retention rules
-Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`, as in `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/rules`
@@ -365,7 +363,7 @@ Returns all rules for a specified datasource and includes default datasource.
`GET /druid/coordinator/v1/rules/history?interval=`
-Returns audit history of rules for all datasources. default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Coordinator runtime.properties
+Returns audit history of rules for all datasources. The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in the Coordinator `runtime.properties`.
`GET /druid/coordinator/v1/rules/history?count=`
@@ -373,7 +371,7 @@ Returns last `n` entries of audit history of rules for all datasources.
`GET /druid/coordinator/v1/rules/{dataSourceName}/history?interval=`
-Returns audit history of rules for a specified datasource. default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Coordinator runtime.properties
+Returns audit history of rules for a specified datasource. The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in the Coordinator `runtime.properties`.
`GET /druid/coordinator/v1/rules/{dataSourceName}/history?count=`
@@ -387,13 +385,12 @@ Optional Header Parameters for auditing the config change can also be specified.
|Header Param Name| Description | Default |
|----------|-------------|---------|
-|`X-Druid-Author`| author making the config change|""|
-|`X-Druid-Comment`| comment describing the change being done|""|
+|`X-Druid-Author`|Author making the config change|`""`|
+|`X-Druid-Comment`|Comment describing the change being done|`""`|
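For instance, a rules update can carry both audit headers as in the sketch below; the host, datasource, and `rules.json` file are placeholders, and the target is assumed to be the rules endpoint described above:

```sh
# rules.json contains the retention rule array for the datasource.
curl -X POST -H 'Content-Type: application/json' \
  -H 'X-Druid-Author: jane' \
  -H 'X-Druid-Comment: tighten retention to 30 days' \
  "http://COORDINATOR:8081/druid/coordinator/v1/rules/wikipedia" \
  -d @rules.json
```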
#### Intervals
-Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`, as in `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/intervals`
@@ -401,22 +398,22 @@ Returns all intervals for all datasources with total size and count.
`GET /druid/coordinator/v1/intervals/{interval}`
-Returns aggregated total size and count for all intervals that intersect given isointerval.
+Returns aggregated total size and count for all intervals that intersect the given ISO interval.
`GET /druid/coordinator/v1/intervals/{interval}?simple`
-Returns total size and count for each interval within given isointerval.
+Returns total size and count for each interval within the given ISO interval.
`GET /druid/coordinator/v1/intervals/{interval}?full`
-Returns total size and count for each datasource for each interval within given isointerval.
+Returns total size and count for each datasource for each interval within the given ISO interval.
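For example, note the `_`-delimited interval in the URL path in this sketch; the host is a placeholder:

```sh
curl -s "http://COORDINATOR:8081/druid/coordinator/v1/intervals/2016-06-27_2016-06-28?simple"
```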
#### Dynamic configuration
See [Coordinator Dynamic Configuration](../configuration/index.md#dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+as in `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/config`
@@ -437,11 +434,10 @@ Update overlord dynamic worker configuration.
Returns the total size of segments awaiting compaction for the given dataSource. The specified dataSource must have [automatic compaction](../data-management/automatic-compaction.md) enabled.
-
-
`GET /druid/coordinator/v1/compaction/status`
Returns the status and statistics from the auto-compaction run of all dataSources which have auto-compaction enabled in the latest run. The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource (which has/had auto-compaction enabled).
+
The `latestStatus` object has the following keys:
* `dataSource`: name of the datasource for this status information
* `scheduleStatus`: auto-compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING ` if the dataSource has an active auto-compaction config submitted. Otherwise, returns `NOT_ENABLED`.
@@ -457,8 +453,8 @@ The `latestStatus` object has the following keys:
`GET /druid/coordinator/v1/compaction/status?dataSource={dataSource}`
-Similar to the API `/druid/coordinator/v1/compaction/status` above but filters response to only return information for the {dataSource} given.
-Note that {dataSource} given must have/had auto-compaction enabled.
+Similar to the `/druid/coordinator/v1/compaction/status` API above, but filters the response to only return information for the given dataSource.
+The dataSource must have auto-compaction enabled.
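For example, to inspect auto-compaction status for a hypothetical `wikipedia` datasource (placeholder host):

```sh
curl -s "http://COORDINATOR:8081/druid/coordinator/v1/compaction/status?dataSource=wikipedia"
```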
#### Automatic compaction configuration
@@ -525,14 +521,14 @@ Returns the current leader Overlord of the cluster. If you have multiple Overlor
`GET /druid/indexer/v1/isLeader`
-This returns a JSON object with field "leader", either true or false. In addition, this call returns HTTP 200 if the
+This returns a JSON object with a `leader` field, either true or false. In addition, this call returns HTTP 200 if the
server is the current leader and HTTP 404 if not. This is suitable for use as a load balancer status check if you
only want the active leader to be considered in-service at the load balancer.
#### Tasks
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+as in `2016-06-27_2016-06-28`.
`GET /druid/indexer/v1/tasks`
@@ -618,9 +614,9 @@ Returns a list of objects of the currently active supervisors.
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states:`UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-supervisor-operations.md) for details.|
-|`detailedState`|String|supervisor specific state. (See documentation of specific supervisor for details), e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
+|`detailedState`|String|supervisor-specific state. See the documentation of the specific supervisor for details: [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md)|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
-|`spec`|SupervisorSpec|json specification of supervisor (See Supervisor Configuration for details)|
+|`spec`|SupervisorSpec|JSON specification of supervisor|
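A sketch of retrieving this list, assuming the default Overlord port and the `?full` variant of the supervisor endpoint; the host is a placeholder:

```sh
curl -s "http://OVERLORD:8090/druid/indexer/v1/supervisor?full"
```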
`GET /druid/indexer/v1/supervisor?state=true`
@@ -630,7 +626,7 @@ Returns a list of objects of the currently active supervisors and their current
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-supervisor-operations.md) for details.|
-|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
+|`detailedState`|String|supervisor-specific state. See the documentation of the specific supervisor for details: [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md)|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`suspended`|Boolean|true or false indicator of whether the supervisor is in suspended state|
@@ -685,7 +681,7 @@ Terminate all supervisors at once.
`POST /druid/indexer/v1/supervisor//shutdown`
> This API is deprecated and will be removed in future releases.
-> Please use the equivalent 'terminate' instead.
+> Please use the equivalent `terminate` instead.
Shutdown a supervisor.
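A sketch of the replacement call; `SUPERVISOR_ID` and the host are placeholders:

```sh
curl -X POST "http://OVERLORD:8090/druid/indexer/v1/supervisor/SUPERVISOR_ID/terminate"
```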
@@ -694,7 +690,7 @@ Shutdown a supervisor.
See [Overlord Dynamic Configuration](../configuration/index.md#overlord-dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+as in `2016-06-27_2016-06-28`.
`GET /druid/indexer/v1/worker`
@@ -735,7 +731,7 @@ and `druid.port` with the boolean state as the value.
`GET /druid/worker/v1/tasks`
-Retrieve a list of active tasks being run on MiddleManager. Returns JSON list of taskid strings. Normal usage should
+Retrieve a list of active tasks being run on the MiddleManager. Returns a JSON list of task ID strings. Normal usage should
prefer to use the `/druid/indexer/v1/tasks` [Overlord API](#overlord) or one of its task-state-specific variants instead.
```json
@@ -810,7 +806,7 @@ This section documents the API endpoints for the processes that reside on Query
#### Datasource information
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
-(e.g., 2016-06-27_2016-06-28).
+as in `2016-06-27_2016-06-28`.
> Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
> [`INFORMATION_SCHEMA.TABLES`](../querying/sql-metadata-tables.md#tables-table),
diff --git a/docs/querying/sql-api.md b/docs/api-reference/sql-api.md
similarity index 90%
rename from docs/querying/sql-api.md
rename to docs/api-reference/sql-api.md
index a425b713a38..54cc3042d06 100644
--- a/docs/querying/sql-api.md
+++ b/docs/api-reference/sql-api.md
@@ -1,7 +1,7 @@
---
id: sql-api
-title: "Druid SQL API"
-sidebar_label: "Druid SQL API"
+title: Druid SQL API
+sidebar_label: Druid SQL
---
-> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
+> Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
> This document describes the SQL language.
-You can submit and cancel [Druid SQL](./sql.md) queries using the Druid SQL API.
+You can submit and cancel [Druid SQL](../querying/sql.md) queries using the Druid SQL API.
The Druid SQL API is available at `https://ROUTER:8888/druid/v2/sql`, where `ROUTER` is the IP address of the Druid Router.
## Submit a query
@@ -50,8 +50,8 @@ Submit your query as the value of a "query" field in the JSON object within the
|`header`|Whether or not to include a header row for the query result. See [Responses](#responses) for details.|`false`|
|`typesHeader`|Whether or not to include type information in the header. Can only be set when `header` is also `true`. See [Responses](#responses) for details.|`false`|
|`sqlTypesHeader`|Whether or not to include SQL type information in the header. Can only be set when `header` is also `true`. See [Responses](#responses) for details.|`false`|
-|`context`|JSON object containing [SQL query context parameters](sql-query-context.md).|`{}` (empty)|
-|`parameters`|List of query parameters for parameterized queries. Each parameter in the list should be a JSON object like `{"type": "VARCHAR", "value": "foo"}`. The type should be a SQL type; see [Data types](sql-data-types.md) for a list of supported SQL types.|`[]` (empty)|
+|`context`|JSON object containing [SQL query context parameters](../querying/sql-query-context.md).|`{}` (empty)|
+|`parameters`|List of query parameters for parameterized queries. Each parameter in the list should be a JSON object like `{"type": "VARCHAR", "value": "foo"}`. The type should be a SQL type; see [Data types](../querying/sql-data-types.md) for a list of supported SQL types.|`[]` (empty)|
You can use _curl_ to send SQL queries from the command-line:
@@ -63,7 +63,7 @@ $ curl -XPOST -H'Content-Type: application/json' http://ROUTER:8888/druid/v2/sql
[{"TheCount":24433}]
```
-There are a variety of [SQL query context parameters](sql-query-context.md) you can provide by adding a "context" map,
+There are a variety of [SQL query context parameters](../querying/sql-query-context.md) you can provide by adding a "context" map,
like:
```json
@@ -87,14 +87,13 @@ Parameterized SQL queries are also supported:
}
```
-Metadata is available over HTTP POST by querying [metadata tables](sql-metadata-tables.md).
+Metadata is available over HTTP POST by querying [metadata tables](../querying/sql-metadata-tables.md).
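For example, a sketch that lists table names through the SQL endpoint, using the Router address documented above:

```sh
curl -X POST -H 'Content-Type: application/json' "http://ROUTER:8888/druid/v2/sql" \
  -d '{"query": "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES"}'
```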
### Responses
#### Result formats
-Druid SQL's HTTP POST API supports a variety of result formats. You can specify these by adding a "resultFormat"
-parameter, like:
+Druid SQL's HTTP POST API supports a variety of result formats. You can specify these by adding a `resultFormat` parameter, like:
```json
{
@@ -105,7 +104,7 @@ parameter, like:
To request a header with information about column names, set `header` to true in your request.
When you set `header` to true, you can optionally include `typesHeader` and `sqlTypesHeader` as well, which gives
-you information about [Druid runtime and SQL types](sql-data-types.md) respectively. You can request all these headers
+you information about [Druid runtime and SQL types](../querying/sql-data-types.md) respectively. You can request all these headers
with a request like:
```json
@@ -128,10 +127,10 @@ The following table shows supported result formats:
|`arrayLines`|Like `array`, but the JSON arrays are separated by newlines instead of being wrapped in a JSON array. This can make it easier to parse the entire response set as a stream, if you do not have ready access to a streaming JSON parser. To make it possible to detect a truncated response, this format includes a trailer of one blank line.|Same as `array`, except the rows are separated by newlines.|text/plain|
|`csv`|Comma-separated values, with one row per line. Individual field values may be escaped by being surrounded in double quotes. If double quotes appear in a field value, they will be escaped by replacing them with double-double-quotes like `""this""`. To make it possible to detect a truncated response, this format includes a trailer of one blank line.|Same as `array`, except the lists are in CSV format.|text/csv|
-If `typesHeader` is set to true, [Druid type](sql-data-types.md) information is included in the response. Complex types,
+If `typesHeader` is set to true, [Druid type](../querying/sql-data-types.md) information is included in the response. Complex types,
like sketches, will be reported as `COMPLEX` if a particular complex type name is known for that field,
or as `COMPLEX` if the particular type name is unknown or mixed. If `sqlTypesHeader` is set to true,
-[SQL type](sql-data-types.md) information is included in the response. It is possible to set both `typesHeader` and
+[SQL type](../querying/sql-data-types.md) information is included in the response. It is possible to set both `typesHeader` and
`sqlTypesHeader` at once. Both parameters require that `header` is also set.
To aid in building clients that are compatible with older Druid versions, Druid returns the HTTP header
@@ -140,7 +139,7 @@ understands the `typesHeader` and `sqlTypesHeader` parameters. This HTTP respons
whether `typesHeader` or `sqlTypesHeader` are set or not.
Druid returns the SQL query identifier in the `X-Druid-SQL-Query-Id` HTTP header.
-This query id will be assigned the value of `sqlQueryId` from the [query context parameters](sql-query-context.md)
+This query id will be assigned the value of `sqlQueryId` from the [query context parameters](../querying/sql-query-context.md)
if specified, else Druid will generate a SQL query id for you.
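As a sketch, `-i` exposes the response headers so you can read `X-Druid-SQL-Query-Id`, and pinning `sqlQueryId` in the context makes the identifier predictable; the id `myQuery01` here is just an example value:

```sh
curl -i -X POST -H 'Content-Type: application/json' "http://ROUTER:8888/druid/v2/sql" \
  -d '{"query": "SELECT 1", "context": {"sqlQueryId": "myQuery01"}}'
```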
#### Errors
@@ -179,7 +178,7 @@ You can cancel the query using the query id `myQuery01` as follows:
curl --request DELETE 'https://ROUTER:8888/druid/v2/sql/myQuery01' \
```
-Cancellation requests require READ permission on all resources used in the sql query.
+Cancellation requests require READ permission on all resources used in the SQL query.
Druid returns an HTTP 202 response for successful deletion requests.
diff --git a/docs/multi-stage-query/api.md b/docs/api-reference/sql-ingestion-api.md
similarity index 97%
rename from docs/multi-stage-query/api.md
rename to docs/api-reference/sql-ingestion-api.md
index 19e1e11c4cc..a9cceb8d4d9 100644
--- a/docs/multi-stage-query/api.md
+++ b/docs/api-reference/sql-ingestion-api.md
@@ -1,7 +1,7 @@
---
-id: api
-title: SQL-based ingestion and multi-stage query task API
-sidebar_label: API
+id: sql-ingestion-api
+title: SQL-based ingestion API
+sidebar_label: SQL-based ingestion
---
-> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
+> Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
> This document describes the SQL language.
-You can make [Druid SQL](./sql.md) queries using the [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/). We recommend using Avatica JDBC driver version 1.17.0 or later. Note that as of the time of this writing, Avatica 1.17.0, the latest version, does not support passing connection string parameters from the URL to Druid, so you must pass them using a `Properties` object. Once you've downloaded the Avatica client jar, add it to your classpath and use the connect string `jdbc:avatica:remote:url=http://BROKER:8082/druid/v2/sql/avatica/`.
+You can make [Druid SQL](../querying/sql.md) queries using the [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/). We recommend using Avatica JDBC driver version 1.17.0 or later. Note that as of the time of this writing, Avatica 1.17.0, the latest version, does not support passing connection string parameters from the URL to Druid, so you must pass them using a `Properties` object. Once you've downloaded the Avatica client jar, add it to your classpath and use the connect string `jdbc:avatica:remote:url=http://BROKER:8082/druid/v2/sql/avatica/`.
When using the JDBC connector for the [examples](#examples) or in general, it's helpful to understand the parts of the connect string stored in the `url` variable:
@@ -60,7 +60,7 @@ try (Connection connection = DriverManager.getConnection(url, connectionProperti
For a runnable example that includes a query that you might run, see [Examples](#examples).
It is also possible to use a protocol buffers JDBC connection with Druid, this offer reduced bloat and potential performance
-improvements for larger result sets. To use it apply the following connection url instead, everything else remains the same
+improvements for larger result sets. To use it, apply the following connection URL instead; everything else remains the same:
```
String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica-protobuf/;serialization=protobuf";
```
@@ -68,7 +68,7 @@ String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica
> The protobuf endpoint is also known to work with the official [Golang Avatica driver](https://github.com/apache/calcite-avatica-go)
Table metadata is available over JDBC using `connection.getMetaData()` or by querying the
-["INFORMATION_SCHEMA" tables](sql-metadata-tables.md). For an example of this, see [Get the metadata for a datasource](#get-the-metadata-for-a-datasource).
+[INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md). For an example of this, see [Get the metadata for a datasource](#get-the-metadata-for-a-datasource).
## Connection stickiness
@@ -82,7 +82,7 @@ Note that the non-JDBC [JSON over HTTP](sql-api.md#submit-a-query) API is statel
## Dynamic parameters
-You can use [parameterized queries](sql.md#dynamic-parameters) in JDBC code, as in this example:
+You can use [parameterized queries](../querying/sql.md#dynamic-parameters) in JDBC code, as in this example:
```java
PreparedStatement statement = connection.prepareStatement("SELECT COUNT(*) AS cnt FROM druid.foo WHERE dim1 = ? OR dim1 = ?");
diff --git a/docs/development/extensions.md b/docs/configuration/extensions.md
similarity index 93%
rename from docs/development/extensions.md
rename to docs/configuration/extensions.md
index 36d3549b195..3a2844221cc 100644
--- a/docs/development/extensions.md
+++ b/docs/configuration/extensions.md
@@ -96,7 +96,7 @@ All of these community extensions can be downloaded using [pull-deps](../operati
|druid-momentsketch|Support for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library|[link](../development/extensions-contrib/momentsketch-quantiles.md)|
|druid-tdigestsketch|Support for approximate sketch aggregators based on [T-Digest](https://github.com/tdunning/t-digest)|[link](../development/extensions-contrib/tdigestsketch-quantiles.md)|
|gce-extensions|GCE Extensions|[link](../development/extensions-contrib/gce-extensions.md)|
-|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)|[link](./extensions-contrib/prometheus.md)|
+|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)|[link](../development/extensions-contrib/prometheus.md)|
|kubernetes-overlord-extensions|Support for launching tasks in k8s without Middle Managers|[link](../development/extensions-contrib/k8s-jobs.md)|
## Promoting community extensions to core extensions
@@ -111,11 +111,11 @@ For information how to create your own extension, please see [here](../developme
### Loading core extensions
-Apache Druid bundles all [core extensions](../development/extensions.md#core-extensions) out of the box.
-See the [list of extensions](../development/extensions.md#core-extensions) for your options. You
+Apache Druid bundles all [core extensions](../configuration/extensions.md#core-extensions) out of the box.
+See the [list of extensions](../configuration/extensions.md#core-extensions) for your options. You
can load bundled extensions by adding their names to your common.runtime.properties
-`druid.extensions.loadList` property. For example, to load the *postgresql-metadata-storage* and
-*druid-hdfs-storage* extensions, use the configuration:
+`druid.extensions.loadList` property. For example, to load the `postgresql-metadata-storage` and
+`druid-hdfs-storage` extensions, use the configuration:
```
druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]
@@ -125,7 +125,7 @@ These extensions are located in the `extensions` directory of the distribution.
> Druid bundles two sets of configurations: one for the [quickstart](../tutorials/index.md) and
> one for a [clustered configuration](../tutorials/cluster.md). Make sure you are updating the correct
-> common.runtime.properties for your setup.
+> `common.runtime.properties` for your setup.
> Because of licensing, the mysql-metadata-storage extension does not include the required MySQL JDBC driver. For instructions
> on how to install this library, see the [MySQL extension page](../development/extensions-core/mysql.md).
@@ -153,11 +153,11 @@ You only have to install the extension once. Then, add `"druid-example-extension
> Please make sure all the Extensions related configuration properties listed [here](../configuration/index.md#extensions) are set correctly.
-> The Maven groupId for almost every [community extension](../development/extensions.md#community-extensions) is org.apache.druid.extensions.contrib. The artifactId is the name
+> The Maven `groupId` for almost every [community extension](../configuration/extensions.md#community-extensions) is `org.apache.druid.extensions.contrib`. The `artifactId` is the name
> of the extension, and the version is the latest Druid stable version.
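A sketch of pulling a community extension with `pull-deps` from the Druid root directory; the coordinate and `{DRUID_VERSION}` below are placeholders, not a real artifact:

```sh
# Downloads the extension and its dependencies into the extensions directory.
java -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -c "org.apache.druid.extensions.contrib:druid-example-extension:{DRUID_VERSION}"
```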
### Loading extensions from the classpath
-If you add your extension jar to the classpath at runtime, Druid will also load it into the system. This mechanism is relatively easy to reason about,
-but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions while using
+If you add your extension jar to the classpath at runtime, Druid will also load it into the system. This mechanism is relatively easy to reason about,
+but it also means that you have to ensure that all dependency jars on the classpath are compatible. That is, Druid makes no provisions while using
this method to maintain class loader isolation so you must make sure that the jars on your classpath are mutually compatible.
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 42542c35ea5..074aa47b957 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -245,7 +245,7 @@ values for the above mentioned configs among others provided by Java implementat
|`druid.auth.unsecuredPaths`| List of Strings|List of paths for which security checks will not be performed. All requests to these paths will be allowed.|[]|no|
|`druid.auth.allowUnauthenticatedHttpOptions`|Boolean|If true, skip authentication checks for HTTP OPTIONS requests. This is needed for certain use cases, such as supporting CORS pre-flight requests. Note that disabling authentication checks for OPTIONS requests will allow unauthenticated users to determine what Druid endpoints are valid (by checking if the OPTIONS request returns a 200 instead of 404), so enabling this option may reveal information about server configuration, including information about what extensions are loaded (if those extensions add endpoints).|false|no|
-For more information, please see [Authentication and Authorization](../design/auth.md).
+For more information, please see [Authentication and Authorization](../operations/auth.md).
For configuration options for specific auth extensions, please refer to the extension documentation.
@@ -581,7 +581,7 @@ This deep storage is used to interface with Cassandra. Note that the `druid-cas
#### HDFS input source
You can set the following property to specify permissible protocols for
-the [HDFS input source](../ingestion/native-batch-input-source.md#hdfs-input-source).
+the [HDFS input source](../ingestion/input-sources.md#hdfs-input-source).
|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
@@ -591,7 +591,7 @@ the [HDFS input source](../ingestion/native-batch-input-source.md#hdfs-input-sou
#### HTTP input source
You can set the following property to specify permissible protocols for
-the [HTTP input source](../ingestion/native-batch-input-source.md#http-input-source).
+the [HTTP input source](../ingestion/input-sources.md#http-input-source).
|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
@@ -603,7 +603,7 @@ the [HTTP input source](../ingestion/native-batch-input-source.md#http-input-sou
#### JDBC Connections to External Databases
You can use the following properties to specify permissible JDBC options for:
-- [SQL input source](../ingestion/native-batch-input-source.md#sql-input-source)
+- [SQL input source](../ingestion/input-sources.md#sql-input-source)
- [globally cached JDBC lookups](../development/extensions-core/lookups-cached-global.md#jdbc-lookup)
- [JDBC Data Fetcher for per-lookup caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).
@@ -998,7 +998,7 @@ These configuration options control Coordinator lookup management. See [dynamic
##### Automatic compaction dynamic configuration
You can set or update [automatic compaction](../data-management/automatic-compaction.md) properties dynamically using the
-[Coordinator API](../operations/api-reference.md#automatic-compaction-configuration) without restarting Coordinators.
+[Coordinator API](../api-reference/api-reference.md#automatic-compaction-configuration) without restarting Coordinators.
For details about segment compaction, see [Segment size optimization](../operations/segment-optimization.md).
@@ -1525,7 +1525,7 @@ Additional peon configs include:
|`druid.indexer.task.gracefulShutdownTimeout`|Wait this long on middleManager restart for restorable tasks to gracefully exit.|PT5M|
|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop tasks.|`/tmp/druid-indexing`|
|`druid.indexer.task.restoreTasksOnRestart`|If true, MiddleManagers will attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
-|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/native-batch-input-source.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
+|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/input-sources.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). If you use the string-based schemaless ingestion and don't specify any dimensions to ingest, you must also set [`includeAllDimensions`](../ingestion/ingestion-spec.md#dimensionsspec) for Druid to store empty columns.
If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.
You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true|
|`druid.indexer.task.tmpStorageBytesPerTask`|Maximum number of bytes per task to be used to store temporary files on disk. This config is generally intended for internal usage. Attempts to set it are very likely to be overwritten by the TaskRunner that executes the task, so be sure of what you expect to happen before directly adjusting this configuration parameter. The config is documented here primarily to provide an understanding of what it means if/when someone sees that it has been set. A value of -1 disables this limit. |-1|
|`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.|0|
@@ -1594,9 +1594,8 @@ then the value from the configuration below is used:
|`druid.indexer.task.gracefulShutdownTimeout`|Wait this long on Indexer restart for restorable tasks to gracefully exit.|PT5M|
|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop tasks.|`/tmp/druid-indexing`|
|`druid.indexer.task.restoreTasksOnRestart`|If true, the Indexer will attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
-|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/native-batch-input-source.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
-|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec).
If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.
You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true|
-|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to communicate with Overlord.|PT5S|
+|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/input-sources.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
+|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec).
If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.
You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true|
|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to communicate with Overlord.|PT5S|
|`druid.peon.taskActionClient.retry.maxWait`|The maximum retry time to communicate with Overlord.|PT1M|
|`druid.peon.taskActionClient.retry.maxRetryCount`|The maximum number of retries to communicate with Overlord.|60|
@@ -2245,7 +2244,7 @@ Supported query contexts:
|Key|Description|Default|
|---|-----------|-------|
-|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../misc/math-expr.md#logical-operator-modes)|false|
+|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|false|
|`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|false|
### Router
diff --git a/docs/data-management/automatic-compaction.md b/docs/data-management/automatic-compaction.md
index 5f5a76eebe4..866ca2407dd 100644
--- a/docs/data-management/automatic-compaction.md
+++ b/docs/data-management/automatic-compaction.md
@@ -40,7 +40,7 @@ This topic guides you through setting up automatic compaction for your Druid clu
## Enable automatic compaction
You can enable automatic compaction for a datasource using the web console or programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../operations/api-reference.md#tasks).
+This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/api-reference.md#tasks).
### web console
@@ -59,10 +59,10 @@ To disable auto-compaction for a datasource, click **Delete** from the **Compact
### Compaction configuration API
-Use the [Coordinator API](../operations/api-reference.md#automatic-compaction-status) to configure automatic compaction.
+Use the [Coordinator API](../api-reference/api-reference.md#automatic-compaction-status) to configure automatic compaction.
To enable auto-compaction for a datasource, create a JSON object with the desired auto-compaction settings.
See [Configure automatic compaction](#configure-automatic-compaction) for the syntax of an auto-compaction spec.
-Send the JSON object as a payload in a [`POST` request](../operations/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction`.
+Send the JSON object as a payload in a [`POST` request](../api-reference/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction`.
The following example configures auto-compaction for the `wikipedia` datasource:
```sh
@@ -76,7 +76,7 @@ curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/confi
}'
```
-To disable auto-compaction for a datasource, send a [`DELETE` request](../operations/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` with the name of the datasource for which to disable auto-compaction. For example:
+To disable auto-compaction for a datasource, send a [`DELETE` request](../api-reference/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` with the name of the datasource for which to disable auto-compaction. For example:
```sh
curl --location --request DELETE 'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
@@ -152,7 +152,7 @@ After the Coordinator has initiated auto-compaction, you can view compaction sta
In the web console, the Datasources view displays auto-compaction statistics. The Tasks view shows the task information for compaction tasks that were triggered by the automatic compaction system.
-To get statistics by API, send a [`GET` request](../operations/api-reference.md#automatic-compaction-status) to `/druid/coordinator/v1/compaction/status`. To filter the results to a particular datasource, pass the datasource name as a query parameter to the requestβfor example, `/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
+To get statistics by API, send a [`GET` request](../api-reference/api-reference.md#automatic-compaction-status) to `/druid/coordinator/v1/compaction/status`. To filter the results to a particular datasource, pass the datasource name as a query parameter to the request. For example, `/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
## Examples
diff --git a/docs/data-management/compaction.md b/docs/data-management/compaction.md
index a4264c160e5..3e833469e8e 100644
--- a/docs/data-management/compaction.md
+++ b/docs/data-management/compaction.md
@@ -136,7 +136,7 @@ To control the number of result segments per time chunk, you can set [`maxRowsPe
> You can run multiple compaction tasks in parallel. For example, if you want to compact the data for a year, you are not limited to running a single task for the entire year. You can run 12 compaction tasks with month-long intervals.
-A compaction task internally generates an `index` or `index_parallel` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [`druid` input source](../ingestion/native-batch-input-source.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default.
+A compaction task internally generates an `index` or `index_parallel` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [`druid` input source](../ingestion/input-sources.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default.
Compaction tasks fetch all [relevant segments](#compaction-io-configuration) prior to launching any subtasks, _unless_ the following items are all set. It is strongly recommended to set all of these items to maximize performance and minimize disk usage of the `compact` task:
diff --git a/docs/data-management/delete.md b/docs/data-management/delete.md
index 361c7873cc5..ebabd69c4dc 100644
--- a/docs/data-management/delete.md
+++ b/docs/data-management/delete.md
@@ -38,7 +38,7 @@ Deletion by time range happens in two steps:
you have a backup.
For documentation on disabling segments using the Coordinator API, see the
-[Coordinator API reference](../operations/api-reference.md#coordinator-datasources).
+[Coordinator API reference](../api-reference/api-reference.md#coordinator-datasources).
A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md).
@@ -65,7 +65,7 @@ For example, to delete records where `userName` is `'bob'` with native batch ind
To delete the same records using SQL, use [REPLACE](../multi-stage-query/concepts.md#replace) with `WHERE userName <> 'bob'`.
To reindex using [native batch](../ingestion/native-batch.md), use the [`druid` input
-source](../ingestion/native-batch-input-source.md#druid-input-source). If needed,
+source](../ingestion/input-sources.md#druid-input-source). If needed,
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the
reindexing job. To reindex with SQL, use [`REPLACE OVERWRITE`](../multi-stage-query/reference.md#replace)
with `SELECT ... FROM `. (Druid does not have `UPDATE` or `ALTER TABLE` statements.) Any SQL SELECT query can be
diff --git a/docs/data-management/update.md b/docs/data-management/update.md
index 070aaf3489a..74508d0acfc 100644
--- a/docs/data-management/update.md
+++ b/docs/data-management/update.md
@@ -52,7 +52,7 @@ is used to perform schema changes, repartition data, filter out unwanted data, e
behaves just like any other [overwrite](#overwrite) with regard to atomic updates and locking.
With [native batch](../ingestion/native-batch.md), use the [`druid` input
-source](../ingestion/native-batch-input-source.md#druid-input-source). If needed,
+source](../ingestion/input-sources.md#druid-input-source). If needed,
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the
reindexing job.
diff --git a/docs/design/architecture.md b/docs/design/architecture.md
index 21f69663d24..0362ca3c1d2 100644
--- a/docs/design/architecture.md
+++ b/docs/design/architecture.md
@@ -80,7 +80,7 @@ both in deep storage and across your Historical servers for the data you plan to
Deep storage is an important part of Druid's elastic, fault-tolerant design. Druid bootstraps from deep storage even
if every single data server is lost and re-provisioned.
-For more details, please see the [Deep storage](../dependencies/deep-storage.md) page.
+For more details, please see the [Deep storage](../design/deep-storage.md) page.
### Metadata storage
@@ -88,13 +88,13 @@ The metadata storage holds various shared system metadata such as segment usage
clustered deployment, this is typically a traditional RDBMS like PostgreSQL or MySQL. In a single-server
deployment, it is typically a locally-stored Apache Derby database.
-For more details, please see the [Metadata storage](../dependencies/metadata-storage.md) page.
+For more details, please see the [Metadata storage](../design/metadata-storage.md) page.
### ZooKeeper
Used for internal service discovery, coordination, and leader election.
-For more details, please see the [ZooKeeper](../dependencies/zookeeper.md) page.
+For more details, please see the [ZooKeeper](zookeeper.md) page.
## Storage design
@@ -203,7 +203,7 @@ new segments. Then it drops the old segments a few minutes later.
Each segment has a lifecycle that involves the following three major areas:
1. **Metadata store:** Segment metadata (a small JSON payload generally no more than a few KB) is stored in the
-[metadata store](../dependencies/metadata-storage.md) once a segment is done being constructed. The act of inserting
+[metadata store](../design/metadata-storage.md) once a segment is done being constructed. The act of inserting
a record for a segment into the metadata store is called _publishing_. These metadata records have a boolean flag
named `used`, which controls whether the segment is intended to be queryable or not. Segments created by realtime tasks will be
available before they are published, since they are only published when the segment is complete and will not accept
diff --git a/docs/design/broker.md b/docs/design/broker.md
index 795f70faca7..1c8c3be7b63 100644
--- a/docs/design/broker.md
+++ b/docs/design/broker.md
@@ -31,7 +31,7 @@ For basic tuning guidance for the Broker process, see [Basic cluster tuning](../
### HTTP endpoints
-For a list of API endpoints supported by the Broker, see [Broker API](../operations/api-reference.md#broker).
+For a list of API endpoints supported by the Broker, see [Broker API](../api-reference/api-reference.md#broker).
### Overview
diff --git a/docs/design/coordinator.md b/docs/design/coordinator.md
index 52f5f159e48..f0a162fe66c 100644
--- a/docs/design/coordinator.md
+++ b/docs/design/coordinator.md
@@ -31,7 +31,7 @@ For basic tuning guidance for the Coordinator process, see [Basic cluster tuning
### HTTP endpoints
-For a list of API endpoints supported by the Coordinator, see [Coordinator API](../operations/api-reference.md#coordinator).
+For a list of API endpoints supported by the Coordinator, see [Coordinator API](../api-reference/api-reference.md#coordinator).
### Overview
@@ -92,7 +92,7 @@ Once some segments are found, it issues a [compaction task](../ingestion/tasks.m
The maximum number of running compaction tasks is `min(sum of worker capacity * slotRatio, maxSlots)`.
Note that even if `min(sum of worker capacity * slotRatio, maxSlots) = 0`, at least one compaction task is always submitted
if the compaction is enabled for a dataSource.
-See [Automatic compaction configuration API](../operations/api-reference.md#automatic-compaction-configuration) and [Automatic compaction configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) to enable and configure automatic compaction.
+See [Automatic compaction configuration API](../api-reference/api-reference.md#automatic-compaction-configuration) and [Automatic compaction configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) to enable and configure automatic compaction.
Compaction tasks might fail due to the following reasons:
diff --git a/docs/dependencies/deep-storage.md b/docs/design/deep-storage.md
similarity index 98%
rename from docs/dependencies/deep-storage.md
rename to docs/design/deep-storage.md
index b63f968bf54..f5adf35c6aa 100644
--- a/docs/dependencies/deep-storage.md
+++ b/docs/design/deep-storage.md
@@ -73,4 +73,4 @@ See [druid-hdfs-storage extension documentation](../development/extensions-core/
## Additional options
-For additional deep storage options, please see our [extensions list](../development/extensions.md).
+For additional deep storage options, please see our [extensions list](../configuration/extensions.md).
diff --git a/docs/design/extensions-contrib/dropwizard.md b/docs/design/extensions-contrib/dropwizard.md
index a2a8c34d6ea..fa1967cf056 100644
--- a/docs/design/extensions-contrib/dropwizard.md
+++ b/docs/design/extensions-contrib/dropwizard.md
@@ -24,7 +24,7 @@ title: "Dropwizard metrics emitter"
# Dropwizard Emitter
-To use this extension, make sure to [include](../../development/extensions.md#loading-extensions) `dropwizard-emitter` in the extensions load list.
+To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `dropwizard-emitter` in the extensions load list.
## Introduction
diff --git a/docs/design/historical.md b/docs/design/historical.md
index f3580f5dac9..a2fb3032de3 100644
--- a/docs/design/historical.md
+++ b/docs/design/historical.md
@@ -31,7 +31,7 @@ For basic tuning guidance for the Historical process, see [Basic cluster tuning]
### HTTP endpoints
-For a list of API endpoints supported by the Historical, please see the [API reference](../operations/api-reference.md#historical).
+For a list of API endpoints supported by the Historical, please see the [API reference](../api-reference/api-reference.md#historical).
### Running
diff --git a/docs/design/indexer.md b/docs/design/indexer.md
index fa42912e760..eedf0fc775e 100644
--- a/docs/design/indexer.md
+++ b/docs/design/indexer.md
@@ -35,7 +35,7 @@ For Apache Druid Indexer Process Configuration, see [Indexer Configuration](../c
### HTTP endpoints
-The Indexer process shares the same HTTP endpoints as the [MiddleManager](../operations/api-reference.md#middlemanager).
+The Indexer process shares the same HTTP endpoints as the [MiddleManager](../api-reference/api-reference.md#middlemanager).
### Running
diff --git a/docs/design/indexing-service.md b/docs/design/indexing-service.md
index acbf5f9eb03..793c31e81b0 100644
--- a/docs/design/indexing-service.md
+++ b/docs/design/indexing-service.md
@@ -30,7 +30,7 @@ Indexing [tasks](../ingestion/tasks.md) are responsible for creating and [killin
The indexing service is composed of three main components: [Peons](../design/peons.md) that can run a single task, [MiddleManagers](../design/middlemanager.md) that manage Peons, and an [Overlord](../design/overlord.md) that manages task distribution to MiddleManagers.
Overlords and MiddleManagers may run on the same process or across multiple processes, while MiddleManagers and Peons always run on the same process.
-Tasks are managed using API endpoints on the Overlord service. See [Overlord Task API](../operations/api-reference.md#tasks) for more information.
+Tasks are managed using API endpoints on the Overlord service. See [Overlord Task API](../api-reference/api-reference.md#tasks) for more information.
![Indexing Service](../assets/indexing_service.png "Indexing Service")
diff --git a/docs/dependencies/metadata-storage.md b/docs/design/metadata-storage.md
similarity index 100%
rename from docs/dependencies/metadata-storage.md
rename to docs/design/metadata-storage.md
diff --git a/docs/design/middlemanager.md b/docs/design/middlemanager.md
index 5cfc29b7072..e0096c6b292 100644
--- a/docs/design/middlemanager.md
+++ b/docs/design/middlemanager.md
@@ -31,7 +31,7 @@ For basic tuning guidance for the MiddleManager process, see [Basic cluster tuni
### HTTP endpoints
-For a list of API endpoints supported by the MiddleManager, please see the [API reference](../operations/api-reference.md#middlemanager).
+For a list of API endpoints supported by the MiddleManager, please see the [API reference](../api-reference/api-reference.md#middlemanager).
### Overview
diff --git a/docs/design/overlord.md b/docs/design/overlord.md
index 74c09dd5903..7c0ce9ce87a 100644
--- a/docs/design/overlord.md
+++ b/docs/design/overlord.md
@@ -31,7 +31,7 @@ For basic tuning guidance for the Overlord process, see [Basic cluster tuning](.
### HTTP endpoints
-For a list of API endpoints supported by the Overlord, please see the [API reference](../operations/api-reference.md#overlord).
+For a list of API endpoints supported by the Overlord, please see the [API reference](../api-reference/api-reference.md#overlord).
### Overview
diff --git a/docs/design/peons.md b/docs/design/peons.md
index 5b2953915f1..d413dcb2503 100644
--- a/docs/design/peons.md
+++ b/docs/design/peons.md
@@ -31,7 +31,7 @@ For basic tuning guidance for MiddleManager tasks, see [Basic cluster tuning](..
### HTTP endpoints
-For a list of API endpoints supported by the Peon, please see the [Peon API reference](../operations/api-reference.md#peon).
+For a list of API endpoints supported by the Peon, please see the [Peon API reference](../api-reference/api-reference.md#peon).
Peons run a single task in a single JVM. MiddleManager is responsible for creating Peons for running tasks.
Peons should rarely, if ever, be run on their own (perhaps only for testing purposes).
diff --git a/docs/design/router.md b/docs/design/router.md
index 582e424e6d4..726f6831f16 100644
--- a/docs/design/router.md
+++ b/docs/design/router.md
@@ -36,7 +36,7 @@ For basic tuning guidance for the Router process, see [Basic cluster tuning](../
### HTTP endpoints
-For a list of API endpoints supported by the Router, see [Router API](../operations/api-reference.md#router).
+For a list of API endpoints supported by the Router, see [Router API](../api-reference/api-reference.md#router).
### Running
diff --git a/docs/dependencies/zookeeper.md b/docs/design/zookeeper.md
similarity index 100%
rename from docs/dependencies/zookeeper.md
rename to docs/design/zookeeper.md
diff --git a/docs/development/experimental-features.md b/docs/development/experimental-features.md
index 30d8c2f77c2..d33f634a4b6 100644
--- a/docs/development/experimental-features.md
+++ b/docs/development/experimental-features.md
@@ -32,7 +32,7 @@ Note that this document does not track the status of contrib extensions, all of
- [SQL-based ingestion](../multi-stage-query/index.md)
- [SQL-based ingestion concepts](../multi-stage-query/concepts.md)
-- [SQL-based ingestion and multi-stage query task API](../multi-stage-query/api.md)
+- [SQL-based ingestion and multi-stage query task API](../api-reference/sql-ingestion-api.md)
## Indexer process
diff --git a/docs/development/extensions-contrib/aliyun-oss-extensions.md b/docs/development/extensions-contrib/aliyun-oss-extensions.md
index f9b0e0e349a..ab0573bdc44 100644
--- a/docs/development/extensions-contrib/aliyun-oss-extensions.md
+++ b/docs/development/extensions-contrib/aliyun-oss-extensions.md
@@ -27,7 +27,7 @@ This document describes how to use OSS as Druid deep storage.
## Installation
-Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install the `aliyun-oss-extensions` extension, as described [here](../../development/extensions.md#community-extensions) on middle manager and historical nodes.
+Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install the `aliyun-oss-extensions` extension on middle manager and historical nodes, as described [here](../../configuration/extensions.md#community-extensions).
```bash
java -classpath "{YOUR_DRUID_DIR}/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:aliyun-oss-extensions:{YOUR_DRUID_VERSION}
diff --git a/docs/development/extensions-contrib/ambari-metrics-emitter.md b/docs/development/extensions-contrib/ambari-metrics-emitter.md
index 079d5e84ae2..ee82ca6d781 100644
--- a/docs/development/extensions-contrib/ambari-metrics-emitter.md
+++ b/docs/development/extensions-contrib/ambari-metrics-emitter.md
@@ -23,7 +23,7 @@ title: "Ambari Metrics Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `ambari-metrics-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `ambari-metrics-emitter` in the extensions load list.
## Introduction
diff --git a/docs/development/extensions-contrib/cassandra.md b/docs/development/extensions-contrib/cassandra.md
index 980857f75f3..916bacb917e 100644
--- a/docs/development/extensions-contrib/cassandra.md
+++ b/docs/development/extensions-contrib/cassandra.md
@@ -23,7 +23,7 @@ title: "Apache Cassandra"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-cassandra-storage` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-cassandra-storage` in the extensions load list.
[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also
be leveraged for deep storage. This requires some additional Druid configuration as well as setting up the necessary
diff --git a/docs/development/extensions-contrib/cloudfiles.md b/docs/development/extensions-contrib/cloudfiles.md
index 8addd242490..83a1d0c7e10 100644
--- a/docs/development/extensions-contrib/cloudfiles.md
+++ b/docs/development/extensions-contrib/cloudfiles.md
@@ -23,7 +23,7 @@ title: "Rackspace Cloud Files"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-cloudfiles-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-cloudfiles-extensions` in the extensions load list.
## Deep Storage
diff --git a/docs/development/extensions-contrib/compressed-big-decimal.md b/docs/development/extensions-contrib/compressed-big-decimal.md
index 5c96527493a..187d7e45fbf 100644
--- a/docs/development/extensions-contrib/compressed-big-decimal.md
+++ b/docs/development/extensions-contrib/compressed-big-decimal.md
@@ -34,7 +34,7 @@ Compressed big decimal is an absolute number based complex type based on big dec
2. Accuracy: Provides greater level of accuracy in decimal arithmetic
## Operations
+To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `compressed-big-decimal` in the extensions load list.
+To use this extension, make sure to [load](../../configuration/extensions.md#loading-extensions) `compressed-big-decimal` to your config file.
## Configuration
There are currently no configuration properties specific to Compressed Big Decimal.
diff --git a/docs/development/extensions-contrib/distinctcount.md b/docs/development/extensions-contrib/distinctcount.md
index 17954fa4bed..38f8e5efbab 100644
--- a/docs/development/extensions-contrib/distinctcount.md
+++ b/docs/development/extensions-contrib/distinctcount.md
@@ -23,7 +23,7 @@ title: "DistinctCount Aggregator"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) the `druid-distinctcount` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-distinctcount` in the extensions load list.
Additionally, follow these steps:
diff --git a/docs/development/extensions-contrib/gce-extensions.md b/docs/development/extensions-contrib/gce-extensions.md
index 26e7bd4fbd4..17a69c72f23 100644
--- a/docs/development/extensions-contrib/gce-extensions.md
+++ b/docs/development/extensions-contrib/gce-extensions.md
@@ -23,7 +23,7 @@ title: "GCE Extensions"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `gce-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `gce-extensions` in the extensions load list.
At the moment, this extension only enables Druid to autoscale instances in GCE.
diff --git a/docs/development/extensions-contrib/graphite.md b/docs/development/extensions-contrib/graphite.md
index d7a024db1c4..a6e04e9b004 100644
--- a/docs/development/extensions-contrib/graphite.md
+++ b/docs/development/extensions-contrib/graphite.md
@@ -23,7 +23,7 @@ title: "Graphite Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `graphite-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `graphite-emitter` in the extensions load list.
## Introduction
diff --git a/docs/development/extensions-contrib/influx.md b/docs/development/extensions-contrib/influx.md
index d0dc6841f0b..eec9fb555ec 100644
--- a/docs/development/extensions-contrib/influx.md
+++ b/docs/development/extensions-contrib/influx.md
@@ -23,7 +23,7 @@ title: "InfluxDB Line Protocol Parser"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-influx-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-influx-extensions` in the extensions load list.
This extension enables Druid to parse the [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v1.5/write_protocols/line_protocol_tutorial/), a popular text-based timeseries metric serialization format.
diff --git a/docs/development/extensions-contrib/influxdb-emitter.md b/docs/development/extensions-contrib/influxdb-emitter.md
index 039b9d185ae..1086a5121e4 100644
--- a/docs/development/extensions-contrib/influxdb-emitter.md
+++ b/docs/development/extensions-contrib/influxdb-emitter.md
@@ -23,7 +23,7 @@ title: "InfluxDB Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-influxdb-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-influxdb-emitter` in the extensions load list.
## Introduction
diff --git a/docs/development/extensions-contrib/k8s-jobs.md b/docs/development/extensions-contrib/k8s-jobs.md
index 5cbf4c507be..f3e8d53bb35 100644
--- a/docs/development/extensions-contrib/k8s-jobs.md
+++ b/docs/development/extensions-contrib/k8s-jobs.md
@@ -47,7 +47,7 @@ Task specific pod templates must be specified as the runtime property `druid.ind
## Configuration
-To use this extension please make sure to [include](../extensions.md#loading-extensions)`druid-kubernetes-overlord-extensions` in the extensions load list for your overlord process.
+To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-kubernetes-overlord-extensions` in the extensions load list for your Overlord process.
The extension uses the task queue to limit how many concurrent tasks (K8s jobs) are in flight, so you must set a reasonable value for `druid.indexer.queue.maxSize`. Additionally, set `druid.indexer.runner.namespace` to the namespace in which you are running Druid.
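As a rough sketch, the Overlord runtime properties for this extension might look like the following. The namespace and queue size are placeholders, and `druid.indexer.runner.type=k8s` is an assumption about how this task runner is selected; confirm the exact property set against the extension's configuration reference.

```
druid.extensions.loadList=["druid-kubernetes-overlord-extensions"]
# Assumption: selects the Kubernetes task runner provided by this extension.
druid.indexer.runner.type=k8s
# Placeholder: the namespace your Druid pods run in.
druid.indexer.runner.namespace=druid
# Keep this at a value your cluster can realistically run concurrently.
druid.indexer.queue.maxSize=16
```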
diff --git a/docs/development/extensions-contrib/kafka-emitter.md b/docs/development/extensions-contrib/kafka-emitter.md
index 85b8f10a7e1..3457c249c71 100644
--- a/docs/development/extensions-contrib/kafka-emitter.md
+++ b/docs/development/extensions-contrib/kafka-emitter.md
@@ -23,7 +23,7 @@ title: "Kafka Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `kafka-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `kafka-emitter` in the extensions load list.
## Introduction
diff --git a/docs/development/extensions-contrib/momentsketch-quantiles.md b/docs/development/extensions-contrib/momentsketch-quantiles.md
index df7deb0d926..eaad48f69c5 100644
--- a/docs/development/extensions-contrib/momentsketch-quantiles.md
+++ b/docs/development/extensions-contrib/momentsketch-quantiles.md
@@ -26,7 +26,7 @@ title: "Moment Sketches for Approximate Quantiles module"
This module provides aggregators for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library.
The momentsketch provides coarse quantile estimates with less space and aggregation time overheads than traditional sketches, approaching the performance of counts and sums by reconstructing distributions from computed statistics.
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) it in the extensions load list.
### Aggregator
diff --git a/docs/development/extensions-contrib/moving-average-query.md b/docs/development/extensions-contrib/moving-average-query.md
index aa7fdb80b5e..54bf2f32588 100644
--- a/docs/development/extensions-contrib/moving-average-query.md
+++ b/docs/development/extensions-contrib/moving-average-query.md
@@ -52,7 +52,7 @@ It runs the query in two main phases:
## Operations
### Installation
-Use [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../development/extensions.md#community-extensions) on all Druid broker and router nodes.
+Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../configuration/extensions.md#community-extensions) on all Druid broker and router nodes.
```bash
java -classpath "/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:druid-moving-average-query:{VERSION}
diff --git a/docs/development/extensions-contrib/opentsdb-emitter.md b/docs/development/extensions-contrib/opentsdb-emitter.md
index 8d102baad8e..e13cd5b55fa 100644
--- a/docs/development/extensions-contrib/opentsdb-emitter.md
+++ b/docs/development/extensions-contrib/opentsdb-emitter.md
@@ -23,7 +23,7 @@ title: "OpenTSDB Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `opentsdb-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `opentsdb-emitter` in the extensions load list.
## Introduction
diff --git a/docs/development/extensions-contrib/prometheus.md b/docs/development/extensions-contrib/prometheus.md
index e5625f160b1..2612921505c 100644
--- a/docs/development/extensions-contrib/prometheus.md
+++ b/docs/development/extensions-contrib/prometheus.md
@@ -23,7 +23,7 @@ title: "Prometheus Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `prometheus-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `prometheus-emitter` in the extensions load list.
## Introduction
diff --git a/docs/development/extensions-contrib/redis-cache.md b/docs/development/extensions-contrib/redis-cache.md
index 4bd85e9cc50..63e0b9e509c 100644
--- a/docs/development/extensions-contrib/redis-cache.md
+++ b/docs/development/extensions-contrib/redis-cache.md
@@ -28,7 +28,7 @@ Below are guidance and configuration options known to this module.
## Installation
-Use [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../development/extensions.md#community-extensions) on broker, historical and middle manager nodes.
+Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../configuration/extensions.md#community-extensions) on broker, historical, and middle manager nodes.
```bash
java -classpath "druid_dir/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:druid-redis-cache:{VERSION}
@@ -38,7 +38,7 @@ java -classpath "druid_dir/lib/*" org.apache.druid.cli.Main tools pull-deps -c o
To enable this extension after installation,
-1. [include](../../development/extensions.md#loading-extensions) this `druid-redis-cache` extension
+1. [include](../../configuration/extensions.md#loading-extensions) this `druid-redis-cache` extension
2. to enable cache on broker nodes, follow [broker caching docs](../../configuration/index.md#broker-caching) to set related properties (see the sketch after this list)
3. to enable cache on historical nodes, follow [historical caching docs](../../configuration/index.md#historical-caching) to set related properties
4. to enable cache on middle manager nodes, follow [peon caching docs](../../configuration/index.md#peon-caching) to set related properties
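As a rough sketch, the Broker-side portion of the steps above might look like the following. The `druid.cache.*` names and values are assumptions based on this extension's typical configuration; verify them against the configuration options described in this topic.

```
druid.extensions.loadList=["druid-redis-cache"]
# Broker caching, per the broker caching docs referenced above.
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# Assumption: cache type and Redis connection properties provided by this extension.
druid.cache.type=redis
druid.cache.host=localhost
druid.cache.port=6379
```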
diff --git a/docs/development/extensions-contrib/sqlserver.md b/docs/development/extensions-contrib/sqlserver.md
index 482715176c6..0f2e8de24ef 100644
--- a/docs/development/extensions-contrib/sqlserver.md
+++ b/docs/development/extensions-contrib/sqlserver.md
@@ -23,7 +23,7 @@ title: "Microsoft SQLServer"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `sqlserver-metadata-storage` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `sqlserver-metadata-storage` in the extensions load list.
## Setting up SQLServer
diff --git a/docs/development/extensions-contrib/statsd.md b/docs/development/extensions-contrib/statsd.md
index 61ff45f09cd..5ad705a31f2 100644
--- a/docs/development/extensions-contrib/statsd.md
+++ b/docs/development/extensions-contrib/statsd.md
@@ -23,7 +23,7 @@ title: "StatsD Emitter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `statsd-emitter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `statsd-emitter` in the extensions load list.
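For illustration only, a minimal runtime-properties sketch might look like the following. The host and port are placeholders, and the `druid.emitter.statsd.*` property names are assumptions to confirm against this emitter's configuration reference.

```
druid.extensions.loadList=["statsd-emitter"]
druid.emitter=statsd
# Placeholder StatsD server address.
druid.emitter.statsd.hostname=127.0.0.1
druid.emitter.statsd.port=8125
```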
## Introduction
diff --git a/docs/development/extensions-contrib/tdigestsketch-quantiles.md b/docs/development/extensions-contrib/tdigestsketch-quantiles.md
index 705bbc2edb1..59b5a851c10 100644
--- a/docs/development/extensions-contrib/tdigestsketch-quantiles.md
+++ b/docs/development/extensions-contrib/tdigestsketch-quantiles.md
@@ -35,7 +35,7 @@ to generate sketches during ingestion time itself and then combining them during
The module also provides a postAggregator, quantilesFromTDigestSketch, that can be used to compute approximate
quantiles from T-Digest sketches generated by the tDigestSketch aggregator.
-To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
+To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-tdigestsketch"]
diff --git a/docs/development/extensions-contrib/thrift.md b/docs/development/extensions-contrib/thrift.md
index 70dbd4e3e8d..31489827090 100644
--- a/docs/development/extensions-contrib/thrift.md
+++ b/docs/development/extensions-contrib/thrift.md
@@ -23,7 +23,7 @@ title: "Thrift"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-thrift-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-thrift-extensions` in the extensions load list.
This extension enables Druid to ingest thrift compact data online (`ByteBuffer`) and offline (SequenceFile of type `` or LzoThriftBlock File).
diff --git a/docs/development/extensions-contrib/time-min-max.md b/docs/development/extensions-contrib/time-min-max.md
index 7d5588a0bb3..f83667baea2 100644
--- a/docs/development/extensions-contrib/time-min-max.md
+++ b/docs/development/extensions-contrib/time-min-max.md
@@ -23,7 +23,7 @@ title: "Timestamp Min/Max aggregators"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-time-min-max` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-time-min-max` in the extensions load list.
These aggregators enable more precise calculation of the min and max time of events than the `__time` column, whose granularity is sparse (the same as the query granularity).
To use this feature, a "timeMin" or "timeMax" aggregator must be included at indexing time.
diff --git a/docs/development/extensions-core/approximate-histograms.md b/docs/development/extensions-core/approximate-histograms.md
index 08dd753353d..7e24f958d46 100644
--- a/docs/development/extensions-core/approximate-histograms.md
+++ b/docs/development/extensions-core/approximate-histograms.md
@@ -23,7 +23,7 @@ title: "Approximate Histogram aggregators"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-histogram` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-histogram` in the extensions load list.
The `druid-histogram` extension provides an approximate histogram aggregator and a fixed buckets histogram aggregator.
diff --git a/docs/development/extensions-core/avro.md b/docs/development/extensions-core/avro.md
index ac1b7ef51ca..7db7530b07d 100644
--- a/docs/development/extensions-core/avro.md
+++ b/docs/development/extensions-core/avro.md
@@ -31,7 +31,7 @@ The [Avro Stream Parser](../../ingestion/data-formats.md#avro-stream-parser) is
## Load the Avro extension
-To use the Avro extension, add the `druid-avro-extensions` to the list of loaded extensions. See [Loading extensions](../../development/extensions.md#loading-extensions) for more information.
+To use the Avro extension, add the `druid-avro-extensions` to the list of loaded extensions. See [Loading extensions](../../configuration/extensions.md#loading-extensions) for more information.
## Avro types
diff --git a/docs/development/extensions-core/azure.md b/docs/development/extensions-core/azure.md
index d63e74d8655..c6a1c397905 100644
--- a/docs/development/extensions-core/azure.md
+++ b/docs/development/extensions-core/azure.md
@@ -23,7 +23,7 @@ title: "Microsoft Azure"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list.
## Deep Storage
diff --git a/docs/development/extensions-core/bloom-filter.md b/docs/development/extensions-core/bloom-filter.md
index 0befa1418fe..30cebeef6c8 100644
--- a/docs/development/extensions-core/bloom-filter.md
+++ b/docs/development/extensions-core/bloom-filter.md
@@ -23,7 +23,7 @@ title: "Bloom Filter"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-bloom-filter` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-bloom-filter` in the extensions load list.
This extension adds the ability to both construct bloom filters from query results, and filter query results by testing
against a bloom filter. A Bloom filter is a probabilistic data structure for performing a set membership check. A bloom
@@ -98,7 +98,7 @@ SELECT COUNT(*) FROM druid.foo WHERE bloom_filter_test(, ' If using JDBC, you will need to add your database's client JAR files to the extension's directory.
> For Postgres, the connector JAR is already included.
diff --git a/docs/development/extensions-core/druid-ranger-security.md b/docs/development/extensions-core/druid-ranger-security.md
index 481fb56adfe..8c2b3b36535 100644
--- a/docs/development/extensions-core/druid-ranger-security.md
+++ b/docs/development/extensions-core/druid-ranger-security.md
@@ -22,9 +22,9 @@ title: "Apache Ranger Security"
~ under the License.
-->
-This Apache Druid extension adds an Authorizer which implements access control for Druid, backed by [Apache Ranger](https://ranger.apache.org/). Please see [Authentication and Authorization](../../design/auth.md) for more information on the basic facilities this extension provides.
+This Apache Druid extension adds an Authorizer which implements access control for Druid, backed by [Apache Ranger](https://ranger.apache.org/). Please see [Authentication and Authorization](../../operations/auth.md) for more information on the basic facilities this extension provides.
-Make sure to [include](../../development/extensions.md#loading-extensions) `druid-ranger-security` in the extensions load list.
+Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-ranger-security` in the extensions load list.
> The latest release of Apache Ranger is, at the time of writing, version 2.0. This version has a dependency on `log4j 1.2.17`, which has a vulnerability if you configure it to use a `SocketServer` (CVE-2019-17571). In addition, it includes Kafka 2.0.0, which has two known vulnerabilities (CVE-2019-12399, CVE-2018-17196). Kafka can be used by the audit component in Ranger, but is not required.
@@ -98,7 +98,7 @@ When installing a new Druid service in Apache Ranger for the first time, Ranger
### HTTP methods
-For information on what HTTP methods are supported for a particular request endpoint, please refer to the [API documentation](../../operations/api-reference.md).
+For information on what HTTP methods are supported for a particular request endpoint, please refer to the [API documentation](../../api-reference/api-reference.md).
GET requires READ permission, while POST and DELETE require WRITE permission.
diff --git a/docs/development/extensions-core/google.md b/docs/development/extensions-core/google.md
index 813f9827e90..6df933f2da6 100644
--- a/docs/development/extensions-core/google.md
+++ b/docs/development/extensions-core/google.md
@@ -28,7 +28,7 @@ This extension allows you to do 2 things:
* [Ingest data](#reading-data-from-google-cloud-storage) from files stored in Google Cloud Storage.
* Write segments to [deep storage](#deep-storage) in GCS.
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-google-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-google-extensions` in the extensions load list.
### Required Configuration
@@ -36,7 +36,7 @@ To configure connectivity to google cloud, run druid processes with `GOOGLE_APPL
### Reading data from Google Cloud Storage
-The [Google Cloud Storage input source](../../ingestion/native-batch-input-source.md) is supported by the [Parallel task](../../ingestion/native-batch.md)
+The [Google Cloud Storage input source](../../ingestion/input-sources.md) is supported by the [Parallel task](../../ingestion/native-batch.md)
to read objects directly from Google Cloud Storage. If you use the [Hadoop task](../../ingestion/hadoop.md),
you can read data from Google Cloud Storage by specifying the paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
diff --git a/docs/development/extensions-core/hdfs.md b/docs/development/extensions-core/hdfs.md
index a49041b2453..edc3fdb04cf 100644
--- a/docs/development/extensions-core/hdfs.md
+++ b/docs/development/extensions-core/hdfs.md
@@ -23,7 +23,7 @@ title: "HDFS"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-hdfs-storage` in the extensions load list and run druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` in the environment.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-hdfs-storage` in the extensions load list and run druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` in the environment.
## Deep Storage
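Assuming the standard `druid.storage.*` properties, a minimal HDFS deep storage sketch might look like the following; the segment directory is a placeholder.

```
druid.extensions.loadList=["druid-hdfs-storage"]
druid.storage.type=hdfs
# Placeholder: the HDFS directory where Druid should write segments.
druid.storage.storageDirectory=hdfs://namenode:8020/druid/segments
```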
@@ -153,12 +153,12 @@ Tested with Druid 0.17.0, Hadoop 2.8.5 and gcs-connector jar 2.0.0-hadoop2.
### Native batch ingestion
-The [HDFS input source](../../ingestion/native-batch-input-source.md#hdfs-input-source) is supported by the [Parallel task](../../ingestion/native-batch.md)
+The [HDFS input source](../../ingestion/input-sources.md#hdfs-input-source) is supported by the [Parallel task](../../ingestion/native-batch.md)
to read files directly from HDFS storage. You may be able to read objects from cloud storage
with the HDFS input source, but we highly recommend using a proper
-[Input Source](../../ingestion/native-batch-input-source.md) instead if possible because
-it is simple to set up. For now, only the [S3 input source](../../ingestion/native-batch-input-source.md#s3-input-source)
-and the [Google Cloud Storage input source](../../ingestion/native-batch-input-source.md#google-cloud-storage-input-source)
+[Input Source](../../ingestion/input-sources.md) instead if possible because
+it is simple to set up. For now, only the [S3 input source](../../ingestion/input-sources.md#s3-input-source)
+and the [Google Cloud Storage input source](../../ingestion/input-sources.md#google-cloud-storage-input-source)
are supported for cloud storage types, and so you may still want to use the HDFS input source
to read from cloud storage other than those two.
diff --git a/docs/development/extensions-core/kafka-extraction-namespace.md b/docs/development/extensions-core/kafka-extraction-namespace.md
index 0efbf7b8155..2d841dfc943 100644
--- a/docs/development/extensions-core/kafka-extraction-namespace.md
+++ b/docs/development/extensions-core/kafka-extraction-namespace.md
@@ -22,7 +22,7 @@ title: "Apache Kafka Lookups"
~ under the License.
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.
If you need lookup updates to propagate as promptly as possible, you can plug in a Kafka topic as a `LookupExtractorFactory`, where the topic's key is the old value and its message is the desired new value (both in UTF-8).
diff --git a/docs/development/extensions-core/kafka-ingestion.md b/docs/development/extensions-core/kafka-ingestion.md
index fd7f7b58cee..7a4b49f1732 100644
--- a/docs/development/extensions-core/kafka-ingestion.md
+++ b/docs/development/extensions-core/kafka-ingestion.md
@@ -49,7 +49,7 @@ If your Kafka cluster enables consumer-group based ACLs, you can set `group.id`
## Load the Kafka indexing service
-To use the Kafka indexing service, load the `druid-kafka-indexing-service` extension on both the Overlord and the MiddleManagers. See [Loading extensions](../extensions.md#loading-extensions) for instructions on how to configure extensions.
+To use the Kafka indexing service, load the `druid-kafka-indexing-service` extension on both the Overlord and the MiddleManagers. See [Loading extensions](../../configuration/extensions.md) for instructions on how to configure extensions.
## Define a supervisor spec
diff --git a/docs/development/extensions-core/kafka-supervisor-operations.md b/docs/development/extensions-core/kafka-supervisor-operations.md
index fe8d1f562b6..dbfa05174fb 100644
--- a/docs/development/extensions-core/kafka-supervisor-operations.md
+++ b/docs/development/extensions-core/kafka-supervisor-operations.md
@@ -25,7 +25,7 @@ description: "Reference topic for running and maintaining Apache Kafka superviso
-->
This topic contains operations reference information to run and maintain Apache Kafka supervisors for Apache Druid. It includes descriptions of how some supervisor APIs work within the Kafka Indexing Service.
-For all supervisor APIs, see [Supervisor APIs](../../operations/api-reference.md#supervisors).
+For all supervisor APIs, see [Supervisor APIs](../../api-reference/api-reference.md#supervisors).
## Getting Supervisor Status Report
diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md b/docs/development/extensions-core/kafka-supervisor-reference.md
index b410d6f5b25..cf44be7bfd3 100644
--- a/docs/development/extensions-core/kafka-supervisor-reference.md
+++ b/docs/development/extensions-core/kafka-supervisor-reference.md
@@ -205,7 +205,7 @@ The `tuningConfig` is optional and default parameters will be used if no `tuning
| `indexSpecForIntermediatePersists`| | Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](#indexspec) for possible values. | no (default = same as `indexSpec`) |
| `reportParseExceptions` | Boolean | *DEPRECATED*. If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped. Setting `reportParseExceptions` to true will override existing configurations for `maxParseExceptions` and `maxSavedParseExceptions`, setting `maxParseExceptions` to 0 and limiting `maxSavedParseExceptions` to no more than 1. | no (default == false) |
| `handoffConditionTimeout` | Long | Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever. | no (default == 0) |
-| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kafka messages that are no longer available (i.e. when `OffsetOutOfRangeException` is encountered). If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion. If true, Druid will automatically reset to the earlier or latest offset available in Kafka, based on the value of the `useEarliestOffset` property (earliest if true, latest if false). Note that this can lead to data being _DROPPED_ (if `useEarliestOffset` is false) or _DUPLICATED_ (if `useEarliestOffset` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data. This feature behaves similarly to the Kafka `auto.offset.reset` consumer property. | no (default == false) |
+| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kafka messages that are no longer available (i.e. when `OffsetOutOfRangeException` is encountered). If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation, potentially using the [Reset Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion. If true, Druid will automatically reset to the earliest or latest offset available in Kafka, based on the value of the `useEarliestOffset` property (earliest if true, latest if false). Note that this can lead to data being _DROPPED_ (if `useEarliestOffset` is false) or _DUPLICATED_ (if `useEarliestOffset` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data. This feature behaves similarly to the Kafka `auto.offset.reset` consumer property. | no (default == false) |
| `workerThreads` | Integer | The number of threads that the supervisor uses to handle requests/responses for worker tasks, along with any other internal asynchronous operation. | no (default == min(10, taskCount)) |
| `chatAsync` | Boolean | If true, use asynchronous communication with indexing tasks, and ignore the `chatThreads` parameter. If false, use synchronous communication in a thread pool of size `chatThreads`. | no (default == true) |
| `chatThreads` | Integer | The number of threads that will be used for communicating with indexing tasks. Ignored if `chatAsync` is `true` (the default). | no (default == min(10, taskCount * replicas)) |
@@ -217,7 +217,7 @@ The `tuningConfig` is optional and default parameters will be used if no `tuning
| `intermediateHandoffPeriod` | ISO8601 Period | How often the tasks should hand off segments. Handoff will happen either if `maxRowsPerSegment` or `maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens earlier. | no (default == P2147483647D) |
| `logParseExceptions` | Boolean | If true, log an error message when a parsing exception occurs, containing information about the row where the error occurred. | no, default == false |
| `maxParseExceptions` | Integer | The maximum number of parse exceptions that can occur before the task halts ingestion and fails. Overridden if `reportParseExceptions` is set. | no, unlimited default |
-| `maxSavedParseExceptions` | Integer | When a parse exception occurs, Druid can keep track of the most recent parse exceptions. `maxSavedParseExceptions` limits how many exception instances will be saved. These saved exceptions will be made available after the task finishes in the [task completion report](../../ingestion/tasks.md#reports). Overridden if `reportParseExceptions` is set. | no, default == 0 |
+| `maxSavedParseExceptions` | Integer | When a parse exception occurs, Druid can keep track of the most recent parse exceptions. `maxSavedParseExceptions` limits how many exception instances will be saved. These saved exceptions will be made available after the task finishes in the [task completion report](../../ingestion/tasks.md#task-reports). Overridden if `reportParseExceptions` is set. | no, default == 0 |
#### IndexSpec
diff --git a/docs/development/extensions-core/kinesis-ingestion.md b/docs/development/extensions-core/kinesis-ingestion.md
index 57457992c3f..046ffd2ad61 100644
--- a/docs/development/extensions-core/kinesis-ingestion.md
+++ b/docs/development/extensions-core/kinesis-ingestion.md
@@ -30,7 +30,7 @@ When you enable the Kinesis indexing service, you can configure *supervisors* on
To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` core Apache Druid extension (see
-[Including Extensions](../../development/extensions.md#loading-extensions)).
+[Including Extensions](../../configuration/extensions.md#loading-extensions)).
> Before you deploy the Kinesis extension to production, read the [Kinesis known issues](#kinesis-known-issues).
@@ -284,7 +284,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is specified, default param
|`indexSpecForIntermediatePersists`|Object|Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](#indexspec) for possible values.| no (default = same as `indexSpec`)|
|`reportParseExceptions`|Boolean|If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped.|no (default == false)|
|`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read Kinesis messages that are no longer available. If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion. If true, Druid will automatically reset to the earlier or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read Kinesis messages that are no longer available. If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation, potentially using the [Reset Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion. If true, Druid will automatically reset to the earliest or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|
|`skipSequenceNumberAvailabilityCheck`|Boolean|Whether to enable checking if the current sequence number is still available in a particular Kinesis shard. If set to false, the indexing task will attempt to reset the current sequence number (or not), depending on the value of `resetOffsetAutomatically`.|no (default == false)|
|`workerThreads`|Integer|The number of threads that the supervisor uses to handle requests/responses for worker tasks, along with any other internal asynchronous operation.|no (default == min(10, taskCount))|
|`chatAsync`|Boolean| If true, use asynchronous communication with indexing tasks, and ignore the `chatThreads` parameter. If false, use synchronous communication in a thread pool of size `chatThreads`. | no (default == true) |
@@ -338,7 +338,7 @@ For Concise bitmaps:
## Operations
This section describes how some supervisor APIs work in the Kinesis Indexing Service.
-For all supervisor APIs, check [Supervisor APIs](../../operations/api-reference.md#supervisors).
+For all supervisor APIs, check [Supervisor APIs](../../api-reference/api-reference.md#supervisors).
### AWS Authentication
diff --git a/docs/development/extensions-core/kubernetes.md b/docs/development/extensions-core/kubernetes.md
index c789a423d90..600c3ada21b 100644
--- a/docs/development/extensions-core/kubernetes.md
+++ b/docs/development/extensions-core/kubernetes.md
@@ -29,7 +29,7 @@ Apache Druid Extension to enable using Kubernetes API Server for node discovery
## Configuration
-To use this extension please make sure to [include](../../development/extensions.md#loading-extensions) `druid-kubernetes-extensions` in the extensions load list.
+To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-kubernetes-extensions` in the extensions load list.
This extension works together with HTTP-based segment and task management in Druid. Consequently, the following configurations must be set on all Druid nodes.
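A sketch of what that common configuration might look like follows. Treat the discovery type and the HTTP-based management settings as assumptions to verify against this extension's documentation.

```
druid.extensions.loadList=["druid-kubernetes-extensions"]
# Assumption: use the Kubernetes API server for service discovery.
druid.discovery.type=k8s
# HTTP-based segment and task management, as noted above.
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
```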
diff --git a/docs/development/extensions-core/lookups-cached-global.md b/docs/development/extensions-core/lookups-cached-global.md
index 5842d3dea0a..7e9d80d7ec2 100644
--- a/docs/development/extensions-core/lookups-cached-global.md
+++ b/docs/development/extensions-core/lookups-cached-global.md
@@ -22,7 +22,7 @@ title: "Globally Cached Lookups"
~ under the License.
-->
-To use this Apache Druid extension, [include](../extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.
## Configuration
> Static configuration is no longer supported. Lookups can be configured through
@@ -168,7 +168,7 @@ It's highly recommended that `druid.lookup.namespace.numBufferedEntries` is set
## Supported lookups
-For additional lookups, please see our [extensions list](../extensions.md).
+For additional lookups, please see our [extensions list](../../configuration/extensions.md).
### URI lookup
diff --git a/docs/development/extensions-core/mysql.md b/docs/development/extensions-core/mysql.md
index f7c300c16ac..5e08c7f5f3c 100644
--- a/docs/development/extensions-core/mysql.md
+++ b/docs/development/extensions-core/mysql.md
@@ -23,7 +23,7 @@ title: "MySQL Metadata Store"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.
> The MySQL extension requires the MySQL Connector/J library or MariaDB Connector/J library, neither of which is included in the Druid distribution.
> Refer to the following section for instructions on how to install this library.
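With the connector JAR in place, a typical metadata-store configuration might look like the following sketch; the host, database name, and credentials are placeholders.

```
druid.extensions.loadList=["mysql-metadata-storage"]
druid.metadata.storage.type=mysql
# Placeholders: point at your MySQL host and Druid database.
druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=<your-password>
```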
diff --git a/docs/development/extensions-core/orc.md b/docs/development/extensions-core/orc.md
index e358dc89d45..4be58674099 100644
--- a/docs/development/extensions-core/orc.md
+++ b/docs/development/extensions-core/orc.md
@@ -30,7 +30,7 @@ The extension provides the [ORC input format](../../ingestion/data-formats.md#or
for [native batch ingestion](../../ingestion/native-batch.md) and [Hadoop batch ingestion](../../ingestion/hadoop.md), respectively.
Please see corresponding docs for details.
-To use this extension, make sure to [include](../../development/extensions.md#loading-extensions) `druid-orc-extensions` in the extensions load list.
+To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-orc-extensions` in the extensions load list.
### Migration from 'contrib' extension
This extension, first available in version 0.15.0, replaces the previous 'contrib' extension which was available until
diff --git a/docs/development/extensions-core/parquet.md b/docs/development/extensions-core/parquet.md
index 614e5dcd232..a655c8989c8 100644
--- a/docs/development/extensions-core/parquet.md
+++ b/docs/development/extensions-core/parquet.md
@@ -27,7 +27,7 @@ This Apache Druid module extends [Druid Hadoop based indexing](../../ingestion/h
Apache Parquet files.
Note: If using the `parquet-avro` parser for Apache Hadoop based indexing, `druid-parquet-extensions` depends on the `druid-avro-extensions` module, so be sure to
- [include both](../../development/extensions.md#loading-extensions).
+ [include both](../../configuration/extensions.md#loading-extensions).
The `druid-parquet-extensions` extension provides the [Parquet input format](../../ingestion/data-formats.md#parquet), the [Parquet Hadoop parser](../../ingestion/data-formats.md#parquet-hadoop-parser),
and the [Parquet Avro Hadoop Parser](../../ingestion/data-formats.md#parquet-avro-hadoop-parser) with `druid-avro-extensions`.
diff --git a/docs/development/extensions-core/postgresql.md b/docs/development/extensions-core/postgresql.md
index 07e17d1f292..cd88b22a43c 100644
--- a/docs/development/extensions-core/postgresql.md
+++ b/docs/development/extensions-core/postgresql.md
@@ -23,7 +23,7 @@ title: "PostgreSQL Metadata Store"
-->
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `postgresql-metadata-storage` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `postgresql-metadata-storage` in the extensions load list.
## Setting up PostgreSQL
@@ -87,7 +87,7 @@ In most cases, the configuration options map directly to the [postgres JDBC conn
### PostgreSQL Firehose
-The PostgreSQL extension provides an implementation of an [SQL input source](../../ingestion/native-batch-input-source.md) which can be used to ingest data into Druid from a PostgreSQL database.
+The PostgreSQL extension provides an implementation of an [SQL input source](../../ingestion/input-sources.md) which can be used to ingest data into Druid from a PostgreSQL database.
```json
{
diff --git a/docs/development/extensions-core/protobuf.md b/docs/development/extensions-core/protobuf.md
index d6080eca942..3c87809f72b 100644
--- a/docs/development/extensions-core/protobuf.md
+++ b/docs/development/extensions-core/protobuf.md
@@ -23,7 +23,7 @@ title: "Protobuf"
-->
-This Apache Druid extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../development/extensions.md#loading-extensions) `druid-protobuf-extensions` in the extensions load list.
+This Apache Druid extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-protobuf-extensions` in the extensions load list.
The `druid-protobuf-extensions` extension provides the [Protobuf Parser](../../ingestion/data-formats.md#protobuf-parser)
for [stream ingestion](../../ingestion/index.md#streaming). See corresponding docs for details.
diff --git a/docs/development/extensions-core/s3.md b/docs/development/extensions-core/s3.md
index c8fa755dfb2..20bd1682f24 100644
--- a/docs/development/extensions-core/s3.md
+++ b/docs/development/extensions-core/s3.md
@@ -28,11 +28,11 @@ This extension allows you to do 2 things:
* [Ingest data](#reading-data-from-s3) from files stored in S3.
* Write segments to [deep storage](#deep-storage) in S3.
-To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-s3-extensions` in the extensions load list.
+To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-s3-extensions` in the extensions load list.
### Reading data from S3
-Use a native batch [Parallel task](../../ingestion/native-batch.md) with an [S3 input source](../../ingestion/native-batch-input-source.md#s3-input-source) to read objects directly from S3.
+Use a native batch [Parallel task](../../ingestion/native-batch.md) with an [S3 input source](../../ingestion/input-sources.md#s3-input-source) to read objects directly from S3.
Alternatively, use a [Hadoop task](../../ingestion/hadoop.md),
and specify S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
@@ -79,9 +79,9 @@ The configuration options are listed in order of precedence. For example, if yo
For more information, refer to the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
-Alternatively, you can bypass this chain by specifying an access key and secret key using a [Properties Object](../../ingestion/native-batch-input-source.md#s3-input-source) inside your ingestion specification.
+Alternatively, you can bypass this chain by specifying an access key and secret key using a [Properties Object](../../ingestion/input-sources.md#s3-input-source) inside your ingestion specification.
-Use the property [`druid.startup.logging.maskProperties`](../../configuration/index.md#startup-logging) to mask credentials information in Druid logs. For example, `["password", "secretKey", "awsSecretAccessKey"]`.
+Use the property [`druid.startup.logging.maskProperties`](../../configuration/index.md#startup-logging) to mask credentials information in Druid logs. For example, `["password", "secretKey", "awsSecretAccessKey"]`.
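For instance, a sketch that combines static credentials with log masking might look like this; the key values are placeholders, and masking only affects what Druid writes to its logs.

```
druid.extensions.loadList=["druid-s3-extensions"]
# Placeholders: prefer the credential provider chain over static keys where possible.
druid.s3.accessKey=<your-access-key>
druid.s3.secretKey=<your-secret-key>
druid.startup.logging.maskProperties=["password", "secretKey", "awsSecretAccessKey"]
```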
### S3 permissions settings
diff --git a/docs/development/extensions-core/stats.md b/docs/development/extensions-core/stats.md
index bae91e8b72a..917d3dcdd9f 100644
--- a/docs/development/extensions-core/stats.md
+++ b/docs/development/extensions-core/stats.md
@@ -23,7 +23,7 @@ title: "Stats aggregator"
-->
-This Apache Druid extension includes stat-related aggregators, including variance and standard deviations, etc. Make sure to [include](../../development/extensions.md#loading-extensions) `druid-stats` in the extensions load list.
+This Apache Druid extension includes statistics-related aggregators, such as variance and standard deviation. Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-stats` in the extensions load list.
## Variance aggregator
diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md
index 0bcef5777dc..f3ac4d90bd3 100644
--- a/docs/ingestion/data-formats.md
+++ b/docs/ingestion/data-formats.md
@@ -1,6 +1,7 @@
---
id: data-formats
-title: "Data formats"
+title: Source input formats
+sidebar_label: Source input formats
---
-> Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](./migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
+> Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](../operations/migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
There are several firehoses readily available in Druid, some are meant for examples, others can be used directly in a production environment.
diff --git a/docs/ingestion/native-batch-simple-task.md b/docs/ingestion/native-batch-simple-task.md
index a7c0ef2e4e3..105fdb65cbe 100644
--- a/docs/ingestion/native-batch-simple-task.md
+++ b/docs/ingestion/native-batch-simple-task.md
@@ -1,7 +1,7 @@
---
id: native-batch-simple-task
-title: "Native batch simple task indexing"
-sidebar_label: "Native batch (simple)"
+title: "JSON-based batch simple task indexing"
+sidebar_label: "JSON-based batch (simple)"
---
-> This page describes native batch ingestion using [ingestion specs](ingestion-spec.md). Refer to the [ingestion
-> methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
+> This page describes JSON-based batch ingestion using [ingestion specs](ingestion-spec.md). For SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) extension, see [SQL-based ingestion](../multi-stage-query/index.md). Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
Apache Druid supports the following types of native batch indexing tasks:
- Parallel task indexing (`index_parallel`) that can run multiple indexing tasks concurrently. Parallel task works well for production ingestion tasks.
@@ -35,14 +34,14 @@ This topic covers the configuration for `index_parallel` ingestion specs.
For related information on batch indexing, see:
- [Batch ingestion method comparison table](./index.md#batch) for a comparison of batch ingestion methods.
- [Tutorial: Loading a file](../tutorials/tutorial-batch.md) for a tutorial on native batch ingestion.
-- [Input sources](./native-batch-input-source.md) for possible input sources.
-- [Input formats](./data-formats.md#input-format) for possible input formats.
+- [Input sources](./input-sources.md) for possible input sources.
+- [Source input formats](./data-formats.md#input-format) for possible input formats.
## Submit an indexing task
To run either kind of native batch indexing task you can:
- Use the **Load Data** UI in the web console to define and submit an ingestion spec.
-- Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Indexer API endpoint](../operations/api-reference.md#tasks),
+- Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Indexer API endpoint](../api-reference/api-reference.md#tasks),
`/druid/indexer/v1/task`, the Overlord service. Alternatively you can use the indexing script included with Druid at `bin/post-index-task`.
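As a rough sketch of the payload you POST to `/druid/indexer/v1/task`, the task is a single JSON object whose `type` selects the indexing task; the datasource name, columns, and local file path below are placeholders.

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "example_datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "language"] },
      "granularitySpec": { "segmentGranularity": "day", "queryGranularity": "none", "rollup": false }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/tmp/druid-input", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

Submitting it is then a single POST of this JSON object to the Overlord.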
## Parallel task indexing
@@ -196,7 +195,7 @@ The following table defines the primary sections of the input spec:
|type|The task type. For parallel task indexing, set the value to `index_parallel`.|yes|
|id|The task ID. If omitted, Druid generates the task ID using the task type, data source name, interval, and date-time stamp. |no|
|spec|The ingestion spec that defines the [data schema](#dataschema), [IO config](#ioconfig), and [tuning config](#tuningconfig).|yes|
-|context|Context to specify various task configuration parameters. See [Task context parameters](tasks.md#context-parameters) for more details.|no|
+|context|Context to specify various task configuration parameters. See [Task context parameters](../ingestion/tasks.md#context-parameters) for more details.|no|
### `dataSchema`
@@ -263,7 +262,7 @@ The size-based split hint spec affects all splittable input sources except for t
#### Segments Split Hint Spec
-The segments split hint spec is used only for [`DruidInputSource`](./native-batch-input-source.md).
+The segments split hint spec is used only for [`DruidInputSource`](./input-sources.md).
|property|description|default|required?|
|--------|-----------|-------|---------|
@@ -707,17 +706,17 @@ by assigning more task slots to them.
Use the `inputSource` object to define the location where your index can read data. Only the native parallel task and simple task support the input source.
For details on available input sources see:
-- [S3 input source](./native-batch-input-source.md#s3-input-source) (`s3`) reads data from AWS S3 storage.
-- [Google Cloud Storage input source](./native-batch-input-source.md#google-cloud-storage-input-source) (`gs`) reads data from Google Cloud Storage.
-- [Azure input source](./native-batch-input-source.md#azure-input-source) (`azure`) reads data from Azure Blob Storage and Azure Data Lake.
-- [HDFS input source](./native-batch-input-source.md#hdfs-input-source) (`hdfs`) reads data from HDFS storage.
-- [HTTP input Source](./native-batch-input-source.md#http-input-source) (`http`) reads data from HTTP servers.
-- [Inline input Source](./native-batch-input-source.md#inline-input-source) reads data you paste into the web console.
-- [Local input Source](./native-batch-input-source.md#local-input-source) (`local`) reads data from local storage.
-- [Druid input Source](./native-batch-input-source.md#druid-input-source) (`druid`) reads data from a Druid datasource.
-- [SQL input Source](./native-batch-input-source.md#sql-input-source) (`sql`) reads data from a RDBMS source.
+- [S3 input source](./input-sources.md#s3-input-source) (`s3`) reads data from AWS S3 storage.
+- [Google Cloud Storage input source](./input-sources.md#google-cloud-storage-input-source) (`gs`) reads data from Google Cloud Storage.
+- [Azure input source](./input-sources.md#azure-input-source) (`azure`) reads data from Azure Blob Storage and Azure Data Lake.
+- [HDFS input source](./input-sources.md#hdfs-input-source) (`hdfs`) reads data from HDFS storage.
+- [HTTP input source](./input-sources.md#http-input-source) (`http`) reads data from HTTP servers.
+- [Inline input source](./input-sources.md#inline-input-source) reads data you paste into the web console.
+- [Local input source](./input-sources.md#local-input-source) (`local`) reads data from local storage.
+- [Druid input source](./input-sources.md#druid-input-source) (`druid`) reads data from a Druid datasource.
+- [SQL input source](./input-sources.md#sql-input-source) (`sql`) reads data from an RDBMS source.
-For information on how to combine input sources, see [Combining input source](./native-batch-input-source.md#combining-input-source).
+For information on how to combine input sources, see [Combining input source](./input-sources.md#combining-input-source).
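A hedged sketch of a combining input source that merges a local directory with an HTTP endpoint; the directory, filter, and URL are hypothetical.

```json
{
  "type": "combining",
  "delegates": [
    { "type": "local", "baseDir": "/data/batch-2023", "filter": "*.json" },
    { "type": "http", "uris": ["https://example.com/exports/events.json"] }
  ]
}
```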
### `segmentWriteOutMediumFactory`
diff --git a/docs/ingestion/rollup.md b/docs/ingestion/rollup.md
index 08cdfba3783..241ffba367e 100644
--- a/docs/ingestion/rollup.md
+++ b/docs/ingestion/rollup.md
@@ -1,7 +1,7 @@
---
id: rollup
title: "Data rollup"
-sidebar_label: Data rollup
+sidebar_label: Rollup
description: Introduces rollup as a concept. Provides suggestions to maximize the benefits of rollup. Differentiates between perfect and best-effort rollup.
---
@@ -26,7 +26,7 @@ description: Introduces rollup as a concept. Provides suggestions to maximize th
Druid can roll up data at ingestion time to reduce the amount of raw data to store on disk. Rollup is a form of summarization or pre-aggregation. Rolling up data can dramatically reduce the size of data to be stored and reduce row counts by potentially orders of magnitude. As a trade-off for the efficiency of rollup, you lose the ability to query individual events.
-At ingestion time, you control rollup with the `rollup` setting in the [`granularitySpec`](./ingestion-spec.md#granularityspec). Rollup is enabled by default. This means Druid combines into a single row any rows that have identical [dimension](./data-model.md#dimensions) values and [timestamp](./data-model.md#primary-timestamp) values after [`queryGranularity`-based truncation](./ingestion-spec.md#granularityspec).
+At ingestion time, you control rollup with the `rollup` setting in the [`granularitySpec`](./ingestion-spec.md#granularityspec). Rollup is enabled by default. This means Druid combines into a single row any rows that have identical [dimension](./schema-model.md#dimensions) values and [timestamp](./schema-model.md#primary-timestamp) values after [`queryGranularity`-based truncation](./ingestion-spec.md#granularityspec).
When you disable rollup, Druid loads each row as-is without doing any form of pre-aggregation. This mode is similar to databases that do not support a rollup feature. Set `rollup` to `false` if you want Druid to store each record as-is, without any rollup summarization.
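For example, a `granularitySpec` that keeps rollup enabled and truncates timestamps to the hour might look like the following sketch:

```json
{
  "segmentGranularity": "day",
  "queryGranularity": "hour",
  "rollup": true
}
```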
diff --git a/docs/ingestion/schema-design.md b/docs/ingestion/schema-design.md
index eaada3651b9..6d385c7b60e 100644
--- a/docs/ingestion/schema-design.md
+++ b/docs/ingestion/schema-design.md
@@ -24,17 +24,17 @@ title: "Schema design tips"
## Druid's data model
-For general information, check out the documentation on [Druid's data model](./data-model.md) on the main
+For general information, check out the documentation on the [Druid schema model](./schema-model.md) on the main
ingestion overview page. The rest of this page discusses tips for users coming from other kinds of systems, as well as
general tips and common practices.
-* Druid data is stored in [datasources](./data-model.md), which are similar to tables in a traditional RDBMS.
+* Druid data is stored in [datasources](./schema-model.md), which are similar to tables in a traditional RDBMS.
* Druid datasources can be ingested with or without [rollup](./rollup.md). With rollup enabled, Druid partially aggregates your data during ingestion, potentially reducing its row count, decreasing storage footprint, and improving query performance. With rollup disabled, Druid stores one row for each row in your input data, without any pre-aggregation.
* Every row in Druid must have a timestamp. Data is always partitioned by time, and every query has a time filter. Query results can also be broken down by time buckets like minutes, hours, days, and so on.
* All columns in Druid datasources, other than the timestamp column, are either dimensions or metrics. This follows the [standard naming convention](https://en.wikipedia.org/wiki/Online_analytical_processing#Overview_of_OLAP_systems) of OLAP data.
* Typical production datasources have tens to hundreds of columns.
-* [Dimension columns](./data-model.md#dimensions) are stored as-is, so they can be filtered on, grouped by, or aggregated at query time. They are always single Strings, [arrays of Strings](../querying/multi-value-dimensions.md), single Longs, single Doubles or single Floats.
-* [Metric columns](./data-model.md#metrics) are stored [pre-aggregated](../querying/aggregations.md), so they can only be aggregated at query time (not filtered or grouped by). They are often stored as numbers (integers or floats) but can also be stored as complex objects like [HyperLogLog sketches or approximate quantile sketches](../querying/aggregations.md#approximate-aggregations). Metrics can be configured at ingestion time even when rollup is disabled, but are most useful when rollup is enabled.
+* [Dimension columns](./schema-model.md#dimensions) are stored as-is, so they can be filtered on, grouped by, or aggregated at query time. They are always single Strings, [arrays of Strings](../querying/multi-value-dimensions.md), single Longs, single Doubles or single Floats.
+* [Metric columns](./schema-model.md#metrics) are stored [pre-aggregated](../querying/aggregations.md), so they can only be aggregated at query time (not filtered or grouped by). They are often stored as numbers (integers or floats) but can also be stored as complex objects like [HyperLogLog sketches or approximate quantile sketches](../querying/aggregations.md#approximate-aggregations). Metrics can be configured at ingestion time even when rollup is disabled, but are most useful when rollup is enabled.
## If you're coming from a
@@ -188,11 +188,11 @@ Druid is able to rapidly identify and retrieve data corresponding to time ranges
If your data has more than one timestamp, you can ingest the others as secondary timestamps. The best way to do this
is to ingest them as [long-typed dimensions](./ingestion-spec.md#dimensionsspec) in milliseconds format.
If necessary, you can get them into this format using a [`transformSpec`](./ingestion-spec.md#transformspec) and
-[expressions](../misc/math-expr.md) like `timestamp_parse`, which returns millisecond timestamps.
+[expressions](../querying/math-expr.md) like `timestamp_parse`, which returns millisecond timestamps.
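As an illustrative sketch (the `updated_at` input column and `updatedAtMillis` output name are hypothetical), a secondary timestamp can be parsed to milliseconds in a `transformSpec` and ingested as a long-typed dimension:

```json
{
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "updatedAtMillis",
        "expression": "timestamp_parse(\"updated_at\")"
      }
    ]
  },
  "dimensionsSpec": {
    "dimensions": [
      { "type": "long", "name": "updatedAtMillis" }
    ]
  }
}
```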
At query time, you can query secondary timestamps with [SQL time functions](../querying/sql-scalar.md#date-and-time-functions)
like `MILLIS_TO_TIMESTAMP`, `TIME_FLOOR`, and others. If you're using native Druid queries, you can use
-[expressions](../misc/math-expr.md).
+[expressions](../querying/math-expr.md).
### Nested dimensions
diff --git a/docs/ingestion/data-model.md b/docs/ingestion/schema-model.md
similarity index 98%
rename from docs/ingestion/data-model.md
rename to docs/ingestion/schema-model.md
index 8a5a126a8df..9d7358001d7 100644
--- a/docs/ingestion/data-model.md
+++ b/docs/ingestion/schema-model.md
@@ -1,7 +1,7 @@
---
-id: data-model
-title: "Druid data model"
-sidebar_label: Data model
+id: schema-model
+title: Druid schema model
+sidebar_label: Schema model
description: Introduces concepts of datasources, primary timestamp, dimensions, and metrics.
---
diff --git a/docs/ingestion/tasks.md b/docs/ingestion/tasks.md
index 95e61f88dc2..6f6c2c010a4 100644
--- a/docs/ingestion/tasks.md
+++ b/docs/ingestion/tasks.md
@@ -1,6 +1,7 @@
---
id: tasks
-title: "Task reference"
+title: Task reference
+sidebar_label: Task reference
---
-Apache Druid relies on [metadata storage](../dependencies/metadata-storage.md) to track information on data storage, operations, and system configuration.
+Apache Druid relies on [metadata storage](../design/metadata-storage.md) to track information on data storage, operations, and system configuration.
The metadata store includes the following:
- Segment records
@@ -230,5 +230,5 @@ druid.coordinator.kill.datasource.durationToRetain=P4D
## Learn more
See the following topics for more information:
- [Metadata management](../configuration/index.md#metadata-management) for metadata store configuration reference.
-- [Metadata storage](../dependencies/metadata-storage.md) for an overview of the metadata storage database.
+- [Metadata storage](../design/metadata-storage.md) for an overview of the metadata storage database.
diff --git a/docs/operations/getting-started.md b/docs/operations/getting-started.md
index 773ade20318..8509d6baa1e 100644
--- a/docs/operations/getting-started.md
+++ b/docs/operations/getting-started.md
@@ -39,7 +39,7 @@ If you wish to jump straight to deploying Druid as a cluster, or if you have an
The [configuration reference](../configuration/index.md) describes all of Druid's configuration properties.
-The [API reference](../operations/api-reference.md) describes the APIs available on each Druid process.
+The [API reference](../api-reference/api-reference.md) describes the APIs available on each Druid process.
The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) is an introductory guide for tuning your Druid cluster.
diff --git a/docs/ingestion/migrate-from-firehose-ingestion.md b/docs/operations/migrate-from-firehose-ingestion.md
similarity index 92%
rename from docs/ingestion/migrate-from-firehose-ingestion.md
rename to docs/operations/migrate-from-firehose-ingestion.md
index fa4d1ad5270..f470324b7f4 100644
--- a/docs/ingestion/migrate-from-firehose-ingestion.md
+++ b/docs/operations/migrate-from-firehose-ingestion.md
@@ -1,6 +1,6 @@
---
id: migrate-from-firehose
-title: "Migrate from firehose to input source ingestion"
+title: "Migrate from firehose to input source ingestion (legacy)"
sidebar_label: "Migrate from firehose"
---
@@ -43,11 +43,11 @@ If you're unable to use the console or you have problems with the console method
### Update your ingestion spec manually
-To update your ingestion spec manually, copy your existing spec into a new file. Refer to [Native batch ingestion with firehose (Deprecated)](./native-batch-firehose.md) for a description of firehose properties.
+To update your ingestion spec manually, copy your existing spec into a new file. Refer to [Native batch ingestion with firehose (Deprecated)](../ingestion/native-batch-firehose.md) for a description of firehose properties.
Edit the new file as follows:
-1. In the `ioConfig` component, replace the `firehose` definition with an `inputSource` definition for your chosen input source. See [Native batch input sources](./native-batch-input-source.md) for details.
+1. In the `ioConfig` component, replace the `firehose` definition with an `inputSource` definition for your chosen input source. See [Native batch input sources](../ingestion/input-sources.md) for details.
2. Move the `timeStampSpec` definition from `parser.parseSpec` to the `dataSchema` component.
3. Move the `dimensionsSpec` definition from `parser.parseSpec` to the `dataSchema` component.
4. Move the `format` definition from `parser.parseSpec` to an `inputFormat` definition in `ioConfig`.
@@ -204,6 +204,6 @@ The following example illustrates the result of migrating the [example firehose
For more information, see the following pages:
-- [Ingestion](./index.md): Overview of the Druid ingestion process.
-- [Native batch ingestion](./native-batch.md): Description of the supported native batch indexing tasks.
-- [Ingestion spec reference](./ingestion-spec.md): Description of the components and properties in the ingestion spec.
+- [Ingestion](../ingestion/index.md): Overview of the Druid ingestion process.
+- [Native batch ingestion](../ingestion/native-batch.md): Description of the supported native batch indexing tasks.
+- [Ingestion spec reference](../ingestion/ingestion-spec.md): Description of the components and properties in the ingestion spec.
diff --git a/docs/operations/pull-deps.md b/docs/operations/pull-deps.md
index ab2d5546b41..2e375f925c2 100644
--- a/docs/operations/pull-deps.md
+++ b/docs/operations/pull-deps.md
@@ -136,4 +136,4 @@ java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools pull-deps --de
> Please note to use the pull-deps tool you must know the Maven groupId, artifactId, and version of your extension.
>
-> For Druid community extensions listed [here](../development/extensions.md), the groupId is "org.apache.druid.extensions.contrib" and the artifactId is the name of the extension.
+> For Druid community extensions listed [here](../configuration/extensions.md), the groupId is "org.apache.druid.extensions.contrib" and the artifactId is the name of the extension.
diff --git a/docs/operations/rule-configuration.md b/docs/operations/rule-configuration.md
index f527db19c1b..9719c877cc0 100644
--- a/docs/operations/rule-configuration.md
+++ b/docs/operations/rule-configuration.md
@@ -34,11 +34,11 @@ You can specify the data to retain or drop in the following ways:
- Period: segment data specified as an offset from the present time.
- Interval: a fixed time range.
-Retention rules are persistent: they remain in effect until you change them. Druid stores retention rules in its [metadata store](../dependencies/metadata-storage.md).
+Retention rules are persistent: they remain in effect until you change them. Druid stores retention rules in its [metadata store](../design/metadata-storage.md).
## Set retention rules
-You can use the Druid [web console](./web-console.md) or the [Coordinator API](./api-reference.md#coordinator) to create and manage retention rules.
+You can use the Druid [web console](./web-console.md) or the [Coordinator API](../api-reference/api-reference.md#coordinator) to create and manage retention rules.
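As a sketch, a rule chain submitted through the Coordinator API could keep one month of data and drop everything older; the period and replicant count below are placeholders, not recommendations.

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": { "_default_tier": 2 }
  },
  {
    "type": "dropForever"
  }
]
```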
### Use the web console
diff --git a/docs/operations/security-overview.md b/docs/operations/security-overview.md
index 5bfda4d6ee8..2fa4b45f388 100644
--- a/docs/operations/security-overview.md
+++ b/docs/operations/security-overview.md
@@ -173,7 +173,7 @@ The following takes you through sample configuration steps for enabling basic au
See the following topics for more information:
-* [Authentication and Authorization](../design/auth.md) for more information about the Authenticator,
+* [Authentication and Authorization](../operations/auth.md) for more information about the Authenticator,
Escalator, and Authorizer.
* [Basic Security](../development/extensions-core/druid-basic-security.md) for more information about
the extension used in the examples above.
diff --git a/docs/operations/security-user-auth.md b/docs/operations/security-user-auth.md
index da87386f628..faefca1a7ec 100644
--- a/docs/operations/security-user-auth.md
+++ b/docs/operations/security-user-auth.md
@@ -39,7 +39,7 @@ Druid uses the following resource types:
* STATE – Cluster-wide state resources.
* SYSTEM_TABLE – when the Broker property `druid.sql.planner.authorizeSystemTablesDirectly` is true, then Druid uses this resource type to authorize the system tables in the `sys` schema in SQL.
-For specific resources associated with the resource types, see [Defining permissions](#defining-permissions) and the corresponding endpoint descriptions in [API reference](./api-reference.md).
+For specific resources associated with the resource types, see [Defining permissions](#defining-permissions) and the corresponding endpoint descriptions in [API reference](../api-reference/api-reference.md).
### Actions
@@ -141,7 +141,7 @@ There is only one possible resource name for the "STATE" config resource type, "
Resource names for this type are system schema table names in the `sys` schema in SQL, for example `sys.segments` and `sys.server_segments`. Druid only enforces authorization for `SYSTEM_TABLE` resources when the Broker property `druid.sql.planner.authorizeSystemTablesDirectly` is true.
### HTTP methods
-For information on what HTTP methods are supported on a particular request endpoint, refer to [API reference](./api-reference.md).
+For information on what HTTP methods are supported on a particular request endpoint, refer to [API reference](../api-reference/api-reference.md).
`GET` requests require READ permissions, while `POST` and `DELETE` requests require WRITE permissions.
diff --git a/docs/operations/tls-support.md b/docs/operations/tls-support.md
index 7189af9f2f9..b5db993eeeb 100644
--- a/docs/operations/tls-support.md
+++ b/docs/operations/tls-support.md
@@ -83,9 +83,9 @@ be configured with a proper [SSLContext](http://docs.oracle.com/javase/8/docs/ap
to validate the Server Certificates, otherwise communication will fail.
Since, there are various ways to configure SSLContext, by default, Druid looks for an instance of SSLContext Guice binding
-while creating the HttpClient. This binding can be achieved writing a [Druid extension](../development/extensions.md)
+while creating the HttpClient. This binding can be achieved by writing a [Druid extension](../configuration/extensions.md)
which can provide an instance of SSLContext. Druid comes with a simple extension present [here](../development/extensions-core/simple-client-sslcontext.md)
-which should be useful enough for most simple cases, see [this](../development/extensions.md#loading-extensions) for how to include extensions.
+which should be useful enough for most simple cases; see [this](../configuration/extensions.md#loading-extensions) for how to include extensions.
If this extension does not satisfy the requirements then please follow the extension [implementation](https://github.com/apache/druid/tree/master/extensions-core/simple-client-sslcontext)
to create your own extension.
diff --git a/docs/querying/caching.md b/docs/querying/caching.md
index e8f3fcaedf6..26fe063e68f 100644
--- a/docs/querying/caching.md
+++ b/docs/querying/caching.md
@@ -53,19 +53,19 @@ Druid invalidates any cache the moment any underlying data change to avoid retur
The primary form of caching in Druid is a *per-segment results cache*. This cache stores partial query results on a per-segment basis and is enabled on Historical services by default.
-The per-segment results cache allows Druid to maintain a low-eviction-rate cache for segments that do not change, especially important for those segments that [historical](../design/historical.md) processes pull into their local _segment cache_ from [deep storage](../dependencies/deep-storage.md). Real-time segments, on the other hand, continue to have results computed at query time.
+The per-segment results cache allows Druid to maintain a low-eviction-rate cache for segments that do not change, especially important for those segments that [historical](../design/historical.md) processes pull into their local _segment cache_ from [deep storage](../design/deep-storage.md). Real-time segments, on the other hand, continue to have results computed at query time.
Druid may potentially merge per-segment cached results with the results of later queries that use a similar basic shape with similar filters, aggregations, etc. For example, if the query is identical except that it covers a different time period.
Per-segment caching is controlled by the parameters `useCache` and `populateCache`.
-Use per-segment caching with real-time data. For example, your queries request data actively arriving from Kafka alongside intervals in segments that are loaded on Historicals. Druid can merge cached results from Historical segments with real-time results from the stream. [Whole-query caching](#whole-query-caching), on the other hand, is not helpful in this scenario because new data from real-time ingestion will continually invalidate the entire cached result.
+Use per-segment caching with real-time data. For example, your queries request data actively arriving from Kafka alongside intervals in segments that are loaded on Historicals. Druid can merge cached results from Historical segments with real-time results from the stream. [Whole-query caching](#whole-query-caching), on the other hand, is not helpful in this scenario because new data from real-time ingestion will continually invalidate the entire cached result.
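For instance, a native query can opt in to the per-segment cache through its query context; the datasource, interval, and granularity below are hypothetical.

```json
{
  "queryType": "timeseries",
  "dataSource": "example_datasource",
  "intervals": ["2023-01-01/2023-02-01"],
  "granularity": "hour",
  "aggregations": [{ "type": "count", "name": "rows" }],
  "context": {
    "useCache": true,
    "populateCache": true
  }
}
```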
### Whole-query caching
With *whole-query caching*, Druid caches the entire results of individual queries, meaning the Broker no longer needs to merge per-segment results from data processes.
-Use *whole-query caching* on the Broker to increase query efficiency when there is little risk of ingestion invalidating the cache at a segment level. This applies particularly, for example, when _not_ using real-time ingestion. Perhaps your queries tend to use batch-ingested data, in which case per-segment caching would be less efficient since the underlying segments hardly ever change, yet Druid would continue to acquire per-segment results for each query.
+Use *whole-query caching* on the Broker to increase query efficiency when there is little risk of ingestion invalidating the cache at a segment level. This applies particularly, for example, when _not_ using real-time ingestion. Perhaps your queries tend to use batch-ingested data, in which case per-segment caching would be less efficient since the underlying segments hardly ever change, yet Druid would continue to acquire per-segment results for each query.
## Where to enable caching
@@ -79,7 +79,7 @@ Use *whole-query caching* on the Broker to increase query efficiency when there
- On Brokers for small production clusters with less than five servers.
-Avoid using per-segment cache at the Broker for large production clusters. When the Broker cache is enabled (`druid.broker.cache.populateCache` is `true`) and `populateCache` _is not_ `false` in the [query context](../querying/query-context.md), individual Historicals will _not_ merge individual segment-level results, and instead pass these back to the lead Broker. The Broker must then carry out a large merge from _all_ segments on its own.
+Avoid using per-segment cache at the Broker for large production clusters. When the Broker cache is enabled (`druid.broker.cache.populateCache` is `true`) and `populateCache` _is not_ `false` in the [query context](../querying/query-context.md), individual Historicals will _not_ merge individual segment-level results, and instead pass these back to the lead Broker. The Broker must then carry out a large merge from _all_ segments on its own.
**Whole-query cache** is available exclusively on Brokers.
diff --git a/docs/querying/datasource.md b/docs/querying/datasource.md
index 211f58bd8c1..e348bc81c66 100644
--- a/docs/querying/datasource.md
+++ b/docs/querying/datasource.md
@@ -333,7 +333,7 @@ Native join datasources have the following properties. All are required.
|`left`|Left-hand datasource. Must be of type `table`, `join`, `lookup`, `query`, or `inline`. Placing another join as the left datasource allows you to join arbitrarily many datasources.|
|`right`|Right-hand datasource. Must be of type `lookup`, `query`, or `inline`. Note that this is more rigid than what Druid SQL requires.|
|`rightPrefix`|String prefix that will be applied to all columns from the right-hand datasource, to prevent them from colliding with columns from the left-hand datasource. Can be any string, so long as it is nonempty and is not be a prefix of the string `__time`. Any columns from the left-hand side that start with your `rightPrefix` will be shadowed. It is up to you to provide a prefix that will not shadow any important columns from the left side.|
-|`condition`|[Expression](../misc/math-expr.md) that must be an equality where one side is an expression of the left-hand side, and the other side is a simple column reference to the right-hand side. Note that this is more rigid than what Druid SQL requires: here, the right-hand reference must be a simple column reference; in SQL it can be an expression.|
+|`condition`|[Expression](math-expr.md) that must be an equality where one side is an expression of the left-hand side, and the other side is a simple column reference to the right-hand side. Note that this is more rigid than what Druid SQL requires: here, the right-hand reference must be a simple column reference; in SQL it can be an expression.|
|`joinType`|`INNER` or `LEFT`.|
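A minimal sketch of a join datasource, assuming a hypothetical `sales` table and a `store_to_region` lookup; note how the `condition` refers to the right-hand column through the `rightPrefix`:

```json
{
  "type": "join",
  "left": "sales",
  "right": {
    "type": "lookup",
    "lookup": "store_to_region"
  },
  "rightPrefix": "r.",
  "condition": "store_id == \"r.k\"",
  "joinType": "INNER"
}
```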
#### Join performance
diff --git a/docs/querying/filters.md b/docs/querying/filters.md
index f243ebb4117..82fdb811688 100644
--- a/docs/querying/filters.md
+++ b/docs/querying/filters.md
@@ -550,4 +550,4 @@ This filter allows for more flexibility, but it might be less performant than a
}
```
-See the [Druid expression system](../misc/math-expr.md) for more details.
+See the [Druid expression system](math-expr.md) for more details.
diff --git a/docs/development/geo.md b/docs/querying/geo.md
similarity index 100%
rename from docs/development/geo.md
rename to docs/querying/geo.md
diff --git a/docs/misc/math-expr.md b/docs/querying/math-expr.md
similarity index 100%
rename from docs/misc/math-expr.md
rename to docs/querying/math-expr.md
diff --git a/docs/querying/nested-columns.md b/docs/querying/nested-columns.md
index d0809ad8c21..8f13372fdb4 100644
--- a/docs/querying/nested-columns.md
+++ b/docs/querying/nested-columns.md
@@ -25,7 +25,7 @@ sidebar_label: Nested columns
Apache Druid supports directly storing nested data structures in `COMPLEX` columns. `COMPLEX` columns store a copy of the structured data in JSON format and specialized internal columns and indexes for nested literal values—STRING, LONG, and DOUBLE types. An optimized [virtual column](./virtual-columns.md#nested-field-virtual-column) allows Druid to read and filter these values at speeds consistent with standard Druid LONG, DOUBLE, and STRING columns.
-Druid [SQL JSON functions](./sql-json-functions.md) allow you to extract, transform, and create `COMPLEX` values in SQL queries, using the specialized virtual columns where appropriate. You can use the [JSON nested columns functions](../misc/math-expr.md#json-functions) in [native queries](./querying.md) using [expression virtual columns](./virtual-columns.md#expression-virtual-column), and in native ingestion with a [`transformSpec`](../ingestion/ingestion-spec.md#transformspec).
+Druid [SQL JSON functions](./sql-json-functions.md) allow you to extract, transform, and create `COMPLEX` values in SQL queries, using the specialized virtual columns where appropriate. You can use the [JSON nested columns functions](math-expr.md#json-functions) in [native queries](./querying.md) using [expression virtual columns](./virtual-columns.md#expression-virtual-column), and in native ingestion with a [`transformSpec`](../ingestion/ingestion-spec.md#transformspec).
You can use the JSON functions in INSERT and REPLACE statements in SQL-based ingestion, or in a `transformSpec` in native ingestion as an alternative to using a [`flattenSpec`](../ingestion/data-formats.md#flattenspec) object to "flatten" nested data for ingestion.
diff --git a/docs/querying/post-aggregations.md b/docs/querying/post-aggregations.md
index 935ca8fbce1..e42b1d333ff 100644
--- a/docs/querying/post-aggregations.md
+++ b/docs/querying/post-aggregations.md
@@ -92,7 +92,7 @@ The constant post-aggregator always returns the specified value.
### Expression post-aggregator
-The expression post-aggregator is defined using a Druid [expression](../misc/math-expr.md).
+The expression post-aggregator is defined using a Druid [expression](math-expr.md).
```json
{
diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md
index 0d6bd350ba4..326753970fb 100644
--- a/docs/querying/query-context.md
+++ b/docs/querying/query-context.md
@@ -26,7 +26,7 @@ sidebar_label: "Query context"
The query context is used for various query configuration parameters. Query context parameters can be specified in
the following ways:
-- For [Druid SQL](sql-api.md), context parameters are provided either in a JSON object named `context` to the
+- For [Druid SQL](../api-reference/sql-api.md), context parameters are provided either in a JSON object named `context` to the
HTTP POST API, or as properties to the JDBC connection.
- For [native queries](querying.md), context parameters are provided in a JSON object named `context`.
@@ -108,12 +108,12 @@ batches of rows at a time. Not all queries can be vectorized. In particular, vec
requirements:
- All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These
-include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
+include `selector`, `bound`, `in`, `like`, `regex`, `search`, `and`, `or`, and `not`.
- All filters in filtered aggregators must offer vectorized row-matchers.
-- All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
- "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
- "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input).
-- All virtual columns must offer vectorized implementations. Currently for expression virtual columns, support for vectorization is decided on a per expression basis, depending on the type of input and the functions used by the expression. See the currently supported list in the [expression documentation](../misc/math-expr.md#vectorization-support).
+- All aggregators must offer vectorized implementations. These include `count`, `doubleSum`, `floatSum`, `longSum`, `longMin`,
+ `longMax`, `doubleMin`, `doubleMax`, `floatMin`, `floatMax`, `longAny`, `doubleAny`, `floatAny`, `stringAny`,
+ `hyperUnique`, `filtered`, `approxHistogram`, `approxHistogramFold`, and `fixedBucketsHistogram` (with numerical input).
+- All virtual columns must offer vectorized implementations. Currently for expression virtual columns, support for vectorization is decided on a per expression basis, depending on the type of input and the functions used by the expression. See the currently supported list in the [expression documentation](math-expr.md#vectorization-support).
- For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
- For GroupBy: No multi-value dimensions.
- For Timeseries: No "descending" order.
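When a query meets these requirements, vectorized execution is governed by query context parameters. A hedged sketch of a context object that requests vectorization (the values shown are illustrative):

```json
{
  "vectorize": "force",
  "vectorSize": 512
}
```

Setting `vectorize` to `"force"` causes the query to fail if it cannot be vectorized, which can be useful when testing whether a query qualifies.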
diff --git a/docs/querying/querying.md b/docs/querying/querying.md
index 14885267d15..e957e7a527d 100644
--- a/docs/querying/querying.md
+++ b/docs/querying/querying.md
@@ -108,7 +108,7 @@ curl -X DELETE "http://host:port/druid/v2/abc123"
### Authentication and authorization failures
-For [secured](../design/auth.md) Druid clusters, query requests respond with an HTTP 401 response code in case of an authentication failure. For authorization failures, an HTTP 403 response code is returned.
+For [secured](../operations/auth.md) Druid clusters, query requests respond with an HTTP 401 response code in case of an authentication failure. For authorization failures, an HTTP 403 response code is returned.
### Query execution failures
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 4e6286d032d..a98fca4a855 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -158,7 +158,7 @@ runtime property controls Druid's boolean logic mode. For the most SQL compliant
When `druid.expressions.useStrictBooleans = false` (the default mode), Druid uses two-valued logic.
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued logic for
-[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual columns or `expression` filters.
+[expressions](math-expr.md) evaluation, such as `expression` virtual columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other than `expression`.
## Nested columns
diff --git a/docs/querying/sql-query-context.md b/docs/querying/sql-query-context.md
index caab4772ab6..e469fa390a7 100644
--- a/docs/querying/sql-query-context.md
+++ b/docs/querying/sql-query-context.md
@@ -41,12 +41,12 @@ Configure Druid SQL query planning using the parameters in the table below.
|`useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|`druid.sql.planner.useApproximateCountDistinct` on the Broker (default: true)|
|`useGroupingSetForExactDistinct`|Whether to use grouping sets to execute queries with multiple exact distinct aggregations.|`druid.sql.planner.useGroupingSetForExactDistinct` on the Broker (default: false)|
|`useApproximateTopN`|Whether to use approximate [TopN queries](topnquery.md) when a SQL query could be expressed as such. If false, exact [GroupBy queries](groupbyquery.md) will be used instead.|`druid.sql.planner.useApproximateTopN` on the Broker (default: true)|
-|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on __time column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
+|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculations on the `__time` column in a datasource.|`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.
This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.
This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
## Setting the query context
-The query context parameters can be specified as a "context" object in the [JSON API](sql-api.md) or as a [JDBC connection properties object](sql-jdbc.md).
+The query context parameters can be specified as a "context" object in the [JSON API](../api-reference/sql-api.md) or as a [JDBC connection properties object](../api-reference/sql-jdbc.md).
See examples for each option below.
### Example using JSON API
diff --git a/docs/querying/sql-translation.md b/docs/querying/sql-translation.md
index 18a2886354e..4b0b2d8fbc8 100644
--- a/docs/querying/sql-translation.md
+++ b/docs/querying/sql-translation.md
@@ -375,7 +375,7 @@ Additionally, some Druid native query features are not supported by the SQL lang
include:
- [Inline datasources](datasource.md#inline).
-- [Spatial filters](../development/geo.md).
+- [Spatial filters](geo.md).
- [Multi-value dimensions](sql-data-types.md#multi-value-strings) are only partially implemented in Druid SQL. There are known
inconsistencies between their behavior in SQL queries and in native queries due to how they are currently treated by
the SQL planner.
diff --git a/docs/querying/sql.md b/docs/querying/sql.md
index 58889896128..c68ce28c845 100644
--- a/docs/querying/sql.md
+++ b/docs/querying/sql.md
@@ -26,7 +26,7 @@ sidebar_label: "Overview and syntax"
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
-You can query data in Druid datasources using [Druid SQL](./sql.md). Druid translates SQL queries into its [native query language](./querying.md). To learn about translation and how to get the best performance from Druid SQL, see [SQL query translation](./sql-translation.md).
+You can query data in Druid datasources using Druid SQL. Druid translates SQL queries into its [native query language](querying.md). To learn about translation and how to get the best performance from Druid SQL, see [SQL query translation](sql-translation.md).
Druid SQL planning occurs on the Broker.
Set [Broker runtime properties](../configuration/index.md#sql) to configure the query plan and JDBC querying.
@@ -42,8 +42,8 @@ For more information and SQL querying options see:
- [Query translation](./sql-translation.md) for information about how Druid translates SQL queries to native queries before running them.
For information about APIs, see:
-- [Druid SQL API](./sql-api.md) for information on the HTTP API.
-- [SQL JDBC driver API](./sql-jdbc.md) for information about the JDBC driver API.
+- [Druid SQL API](../api-reference/sql-api.md) for information on the HTTP API.
+- [SQL JDBC driver API](../api-reference/sql-jdbc.md) for information about the JDBC driver API.
- [SQL query context](./sql-query-context.md) for information about the query context parameters that affect SQL planning.
## Syntax
@@ -270,7 +270,7 @@ written like `INTERVAL '1' HOUR`, `INTERVAL '1 02:03' DAY TO MINUTE`, `INTERVAL
Druid SQL supports dynamic parameters using question mark (`?`) syntax, where parameters are bound to `?` placeholders
at execution time. To use dynamic parameters, replace any literal in the query with a `?` character and provide a
corresponding parameter value when you execute the query. Parameters are bound to the placeholders in the order in
-which they are passed. Parameters are supported in both the [HTTP POST](sql-api.md) and [JDBC](sql-jdbc.md) APIs.
+which they are passed. Parameters are supported in both the [HTTP POST](../api-reference/sql-api.md) and [JDBC](../api-reference/sql-jdbc.md) APIs.
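For example, an HTTP POST body for the SQL API might bind one `VARCHAR` parameter to the single `?` placeholder; the datasource and filter value below are hypothetical.

```json
{
  "query": "SELECT COUNT(*) AS num_rows FROM example_datasource WHERE channel = ?",
  "parameters": [
    { "type": "VARCHAR", "value": "#en.wikipedia" }
  ]
}
```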
In certain cases, using dynamic parameters in expressions can cause type inference issues which cause your query to fail, for example:
diff --git a/docs/querying/using-caching.md b/docs/querying/using-caching.md
index d920b7bb06b..12e8b5bbe2f 100644
--- a/docs/querying/using-caching.md
+++ b/docs/querying/using-caching.md
@@ -83,7 +83,7 @@ As long as the service is set to populate the cache, you can set cache options f
}
}
```
-In this example the user has set `populateCache` to `false` to avoid filling the result cache with results for segments that are over a year old. For more information, see [Druid SQL client APIs](./sql-api.md).
+In this example, the user has set `populateCache` to `false` to avoid filling the result cache with results for segments that are over a year old. For more information, see [Druid SQL client APIs](../api-reference/sql-api.md).
diff --git a/docs/querying/virtual-columns.md b/docs/querying/virtual-columns.md
index b5ccf80f423..6a7e8604c4a 100644
--- a/docs/querying/virtual-columns.md
+++ b/docs/querying/virtual-columns.md
@@ -65,7 +65,7 @@ Each Apache Druid query can accept a list of virtual columns as a parameter. The
### Expression virtual column
-Expression virtual columns use Druid's native [expression](../misc/math-expr.md) system to allow defining query time
+Expression virtual columns use Druid's native [expression](math-expr.md) system to allow defining query time
transforms of inputs from one or more columns.
The expression virtual column has the following syntax:
@@ -83,7 +83,7 @@ The expression virtual column has the following syntax:
|--------|-----------|---------|
|type|Must be `"expression"` to indicate that this is an expression virtual column.|yes|
|name|The name of the virtual column.|yes|
-|expression|An [expression](../misc/math-expr.md) that takes a row as input and outputs a value for the virtual column.|yes|
+|expression|An [expression](math-expr.md) that takes a row as input and outputs a value for the virtual column.|yes|
|outputType|The expression's output will be coerced to this type. Can be LONG, FLOAT, DOUBLE, STRING, ARRAY types, or COMPLEX types.|no, default is FLOAT|
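A short sketch of an expression virtual column that concatenates two hypothetical string columns:

```json
{
  "type": "expression",
  "name": "fullName",
  "expression": "concat(\"firstName\", ' ', \"lastName\")",
  "outputType": "STRING"
}
```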
### Nested field virtual column
diff --git a/docs/tutorials/cluster.md b/docs/tutorials/cluster.md
index aeb47dff271..83b9fc2c975 100644
--- a/docs/tutorials/cluster.md
+++ b/docs/tutorials/cluster.md
@@ -1,6 +1,7 @@
---
id: cluster
-title: "Clustered deployment"
+title: Clustered deployment
+sidebar_label: Clustered deployment
---
-Redirecting you to the JDBC connector examples...
+Redirecting you to the JDBC driver API...
-
+
-Click here if you are not redirected.
+Click here if you are not redirected.
diff --git a/docs/tutorials/tutorial-jupyter-index.md b/docs/tutorials/tutorial-jupyter-index.md
index d7f401cae5b..19382f9e4f8 100644
--- a/docs/tutorials/tutorial-jupyter-index.md
+++ b/docs/tutorials/tutorial-jupyter-index.md
@@ -1,6 +1,7 @@
---
id: tutorial-jupyter-index
-title: "Jupyter Notebook tutorials"
+title: Jupyter Notebook tutorials
+sidebar_label: Jupyter Notebook tutorials
---
-Apache Druid can summarize raw data at ingestion time using a process we refer to as "roll-up". Roll-up is a first-level aggregation operation over a selected set of columns that reduces the size of stored data.
+Apache Druid can summarize raw data at ingestion time using a process we refer to as "rollup". Rollup is a first-level aggregation operation over a selected set of columns that reduces the size of stored data.
-This tutorial will demonstrate the effects of roll-up on an example dataset.
+This tutorial will demonstrate the effects of rollup on an example dataset.
For this tutorial, we'll assume you've already downloaded Druid as described in
the [single-machine quickstart](index.md) and have it running on your local machine.
-It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.md) and [Tutorial: Querying data](../tutorials/tutorial-query.md).
+It will also be helpful to have finished the [Load a file](../tutorials/tutorial-batch.md) and [Query data](../tutorials/tutorial-query.md) tutorials.
## Example data
@@ -105,7 +105,7 @@ We'll ingest this data using the following ingestion task spec, located at `quic
}
```
-Roll-up has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric is defined for the `packets` and `bytes` columns, and the `queryGranularity` has been defined as `minute`.
@@ -181,7 +181,7 @@ Likewise, these two events that occurred during `2018-01-01T01:02` have been rol
└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
```
-For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no roll-up took place, because this was the only event that occurred during `2018-01-01T01:03`:
+For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no rollup took place, because this was the only event that occurred during `2018-01-01T01:03`:
```json
{"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
diff --git a/docs/tutorials/tutorial-sql-query-view.md b/docs/tutorials/tutorial-sql-query-view.md
index da47de684c1..beeb08e15d4 100644
--- a/docs/tutorials/tutorial-sql-query-view.md
+++ b/docs/tutorials/tutorial-sql-query-view.md
@@ -1,7 +1,7 @@
---
id: tutorial-sql-query-view
-title: "Tutorial: Get to know Query view"
-sidebar_label: "Get to know Query view"
+title: Get to know Query view
+sidebar_label: Get to know Query view
---