Update Ingestion section (#14023)

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
Katya Macedo 2023-05-19 11:42:27 -05:00 committed by GitHub
parent 7f66fd049b
commit 269137c682
150 changed files with 528 additions and 581 deletions

View File

@ -1,6 +1,7 @@
---
id: api-reference
title: "API reference"
title: HTTP API endpoints reference
sidebar_label: API endpoints reference
---
<!--
@ -23,21 +24,21 @@ title: "API reference"
-->
This page documents all of the API endpoints for each Druid service type.
This topic documents all of the API endpoints for each Druid service type.
## Common
The following endpoints are supported by all processes.
All processes support the following endpoints.
### Process information
`GET /status`
Returns the Druid version, loaded extensions, memory used, total memory and other useful information about the process.
Returns the Druid version, loaded extensions, memory used, total memory, and other useful information about the process.
`GET /status/health`
An endpoint that always returns a boolean "true" value with a 200 OK response, useful for automated health checks.
Always returns a boolean `true` value with a 200 OK response, useful for automated health checks.
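For example, a minimal probe against a local process looks like the following. The host and port are placeholders; use the address of whichever process you want to check.
```sh
# Illustrative health check; substitute the host and port of the target process.
curl http://localhost:8081/status/health
# A healthy process responds with:
# true
```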
`GET /status/properties`
@ -77,7 +78,7 @@ Returns the current leader Coordinator of the cluster.
`GET /druid/coordinator/v1/isLeader`
Returns a JSON object with field "leader", either true or false, indicating if this server is the current leader
Returns a JSON object with a `leader` parameter, either true or false, indicating whether this server is the current leader
Coordinator of the cluster. In addition, returns HTTP 200 if the server is the current leader and HTTP 404 if not.
This is suitable for use as a load balancer status check if you only want the active leader to be considered in-service
at the load balancer.
@ -119,10 +120,9 @@ Returns the number of segments to load and drop, as well as the total segment lo
Returns the serialized JSON of segments to load and drop for each Historical process.
#### Segment loading by datasource
Note that all _interval_ query parameters are ISO 8601 strings (e.g., 2016-06-27/2016-06-28).
Note that all _interval_ query parameters are ISO 8601 strings&mdash;for example, 2016-06-27/2016-06-28.
Also note that these APIs only guarantee that the segments are available at the time of the call.
Segments can still become missing afterward because of Historical process failures or other reasons.
@ -216,18 +216,17 @@ segment is unused, or is unknown, a 404 response is returned.
`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments`
Returns a list of all segments, overlapping with any of given intervals, for a datasource as stored in the metadata store. Request body is array of string IS0 8601 intervals like [interval1, interval2,...] for example ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]
Returns a list of all segments, overlapping with any of the given intervals, for a datasource as stored in the metadata store. The request body is an array of string ISO 8601 intervals like `[interval1, interval2,...]`&mdash;for example, `["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]`.
`GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full`
Returns a list of all segments, overlapping with any of given intervals, for a datasource with the full segment metadata as stored in the metadata store. Request body is array of string ISO 8601 intervals like [interval1, interval2,...] for example ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]
Returns a list of all segments, overlapping with any of the given intervals, for a datasource with the full segment metadata as stored in the metadata store. The request body is an array of string ISO 8601 intervals like `[interval1, interval2,...]`&mdash;for example, `["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"]`.
<a name="coordinator-datasources"></a>
#### Datasources
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`&mdash;for example, `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/datasources`
@ -294,6 +293,7 @@ Returns full segment metadata for a specific segment in the cluster.
Returns the tiers that a datasource exists in.
#### Note for Coordinator's POST and DELETE APIs
While segments may be enabled by issuing POST requests for the datasources, the Coordinator may again disable segments if they match any configured [drop rules](../operations/rule-configuration.md#drop-rules). Even if segments are enabled by these APIs, you must configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes. If an indexing or kill task runs at the same time these APIs are invoked, the behavior is undefined. Some segments might be killed and others might be enabled. It's also possible that all segments might be disabled, but the indexing task can still read data from those segments and succeed.
> Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk.
@ -316,8 +316,8 @@ result of this API call.
Marks segments (un)used for a datasource by interval or set of segment IDs. When marking used, only segments that are not overshadowed will be updated.
The request payload contains the interval or set of segment Ids to be marked unused.
Either interval or segment ids should be provided, if both or none are provided in the payload, the API would throw an error (400 BAD REQUEST).
The request payload contains the interval or set of segment IDs to be marked unused.
Either the interval or the segment IDs should be provided; if both or neither are provided in the payload, the API throws an error (400 BAD REQUEST).
The interval specifies the start and end times as ISO 8601 strings: `interval=(start/end)`, where both start and end are inclusive. Only the segments completely contained within the specified interval will be disabled; partially overlapping segments will not be affected.
@ -325,9 +325,8 @@ JSON Request Payload:
|Key|Description|Example|
|----------|-------------|---------|
|`interval`|The interval for which to mark segments unused|"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"|
|`segmentIds`|Set of segment Ids to be marked unused|["segmentId1", "segmentId2"]|
|`interval`|The interval for which to mark segments unused|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`|
|`segmentIds`|Set of segment IDs to be marked unused|`["segmentId1", "segmentId2"]`|
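As a sketch of how this payload might be sent, assuming the `markUnused` endpoint path shown in the full reference and a placeholder `wikipedia` datasource:
```sh
# Illustrative request: mark all segments fully contained in the interval as unused.
# The endpoint path, host, port, and datasource name are placeholders; adjust for your cluster.
curl --request POST 'http://localhost:8081/druid/coordinator/v1/datasources/wikipedia/markUnused' \
  --header 'Content-Type: application/json' \
  --data '{"interval": "2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"}'
```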
`DELETE /druid/coordinator/v1/datasources/{dataSourceName}`
@ -348,8 +347,7 @@ result of this API call.
#### Retention rules
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/` as in `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/rules`
@ -365,7 +363,7 @@ Returns all rules for a specified datasource and includes default datasource.
`GET /druid/coordinator/v1/rules/history?interval=<interval>`
Returns audit history of rules for all datasources. default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Coordinator runtime.properties
Returns audit history of rules for all datasources. The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Coordinator `runtime.properties`.
`GET /druid/coordinator/v1/rules/history?count=<n>`
@ -373,7 +371,7 @@ Returns last `n` entries of audit history of rules for all datasources.
`GET /druid/coordinator/v1/rules/{dataSourceName}/history?interval=<interval>`
Returns audit history of rules for a specified datasource. default value of interval can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Coordinator runtime.properties
Returns audit history of rules for a specified datasource. The default value of `interval` can be specified by setting `druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Coordinator `runtime.properties`.
`GET /druid/coordinator/v1/rules/{dataSourceName}/history?count=<n>`
@ -387,13 +385,12 @@ Optional Header Parameters for auditing the config change can also be specified.
|Header Param Name| Description | Default |
|----------|-------------|---------|
|`X-Druid-Author`| author making the config change|""|
|`X-Druid-Comment`| comment describing the change being done|""|
|`X-Druid-Author`| Author making the config change|`""`|
|`X-Druid-Comment`| Comment describing the change being done|`""`|
#### Intervals
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/` as in `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/intervals`
@ -401,22 +398,22 @@ Returns all intervals for all datasources with total size and count.
`GET /druid/coordinator/v1/intervals/{interval}`
Returns aggregated total size and count for all intervals that intersect given isointerval.
Returns aggregated total size and count for all intervals that intersect the given ISO interval.
`GET /druid/coordinator/v1/intervals/{interval}?simple`
Returns total size and count for each interval within given isointerval.
Returns total size and count for each interval within the given ISO interval.
`GET /druid/coordinator/v1/intervals/{interval}?full`
Returns total size and count for each datasource for each interval within given isointerval.
Returns total size and count for each datasource for each interval within the given ISO interval.
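For example, the `?simple` variant can be called as follows. The host, port, and interval are placeholders, and the interval in the URL path is underscore-delimited:
```sh
# Illustrative request: total size and count for each interval within 2016-06-27/2016-06-28.
curl 'http://localhost:8081/druid/coordinator/v1/intervals/2016-06-27_2016-06-28?simple'
```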
#### Dynamic configuration
See [Coordinator Dynamic Configuration](../configuration/index.md#dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
as in `2016-06-27_2016-06-28`.
`GET /druid/coordinator/v1/config`
@ -437,11 +434,10 @@ Update overlord dynamic worker configuration.
Returns the total size of segments awaiting compaction for the given dataSource. The specified dataSource must have [automatic compaction](../data-management/automatic-compaction.md) enabled.
`GET /druid/coordinator/v1/compaction/status`
Returns the status and statistics from the auto-compaction run of all dataSources which have auto-compaction enabled in the latest run. The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource (which has/had auto-compaction enabled).
The `latestStatus` object has the following keys:
* `dataSource`: name of the datasource for this status information
* `scheduleStatus`: auto-compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING ` if the dataSource has an active auto-compaction config submitted. Otherwise, returns `NOT_ENABLED`.
@ -457,8 +453,8 @@ The `latestStatus` object has the following keys:
`GET /druid/coordinator/v1/compaction/status?dataSource={dataSource}`
Similar to the API `/druid/coordinator/v1/compaction/status` above but filters response to only return information for the {dataSource} given.
Note that {dataSource} given must have/had auto-compaction enabled.
Similar to the API `/druid/coordinator/v1/compaction/status` above, but filters the response to only return information for the given dataSource.
The dataSource must have auto-compaction enabled.
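For example, a filtered status request might look like this, where the Coordinator address and the `wikipedia` datasource are placeholders:
```sh
# Illustrative request: auto-compaction status for a single datasource.
curl 'http://localhost:8081/druid/coordinator/v1/compaction/status?dataSource=wikipedia'
```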
#### Automatic compaction configuration
@ -525,14 +521,14 @@ Returns the current leader Overlord of the cluster. If you have multiple Overlor
`GET /druid/indexer/v1/isLeader`
This returns a JSON object with field "leader", either true or false. In addition, this call returns HTTP 200 if the
This returns a JSON object with field `leader`, either true or false. In addition, this call returns HTTP 200 if the
server is the current leader and HTTP 404 if not. This is suitable for use as a load balancer status check if you
only want the active leader to be considered in-service at the load balancer.
#### Tasks
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
as in `2016-06-27_2016-06-28`.
`GET /druid/indexer/v1/tasks`
@ -618,9 +614,9 @@ Returns a list of objects of the currently active supervisors.
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states:`UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-supervisor-operations.md) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of specific supervisor for details), e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
|`detailedState`|String|supervisor-specific state. See the documentation for the specific supervisor for details: [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md)|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`spec`|SupervisorSpec|json specification of supervisor (See Supervisor Configuration for details)|
|`spec`|SupervisorSpec|JSON specification of supervisor|
`GET /druid/indexer/v1/supervisor?state=true`
@ -630,7 +626,7 @@ Returns a list of objects of the currently active supervisors and their current
|---|---|---|
|`id`|String|supervisor unique identifier|
|`state`|String|basic state of the supervisor. Available states: `UNHEALTHY_SUPERVISOR`, `UNHEALTHY_TASKS`, `PENDING`, `RUNNING`, `SUSPENDED`, `STOPPING`. Check [Kafka Docs](../development/extensions-core/kafka-supervisor-operations.md) for details.|
|`detailedState`|String|supervisor specific state. (See documentation of the specific supervisor for details, e.g. [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md))|
|`detailedState`|String|supervisor-specific state. See the documentation for the specific supervisor for details: [Kafka](../development/extensions-core/kafka-ingestion.md) or [Kinesis](../development/extensions-core/kinesis-ingestion.md)|
|`healthy`|Boolean|true or false indicator of overall supervisor health|
|`suspended`|Boolean|true or false indicator of whether the supervisor is in suspended state|
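A quick way to see these fields is to list the supervisors directly. The Overlord address below is a placeholder:
```sh
# Illustrative request: list active supervisors with their basic state and health.
curl 'http://localhost:8090/druid/indexer/v1/supervisor?state=true'
```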
@ -685,7 +681,7 @@ Terminate all supervisors at once.
`POST /druid/indexer/v1/supervisor/<supervisorId>/shutdown`
> This API is deprecated and will be removed in future releases.
> Please use the equivalent 'terminate' instead.
> Please use the equivalent `terminate` instead.
Shutdown a supervisor.
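As a sketch, the equivalent call with the non-deprecated endpoint looks like this, where the Overlord address and supervisor ID are placeholders:
```sh
# Illustrative request: terminate a supervisor instead of using the deprecated shutdown endpoint.
curl --request POST 'http://localhost:8090/druid/indexer/v1/supervisor/my_supervisor/terminate'
```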
@ -694,7 +690,7 @@ Shutdown a supervisor.
See [Overlord Dynamic Configuration](../configuration/index.md#overlord-dynamic-configuration) for details.
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
as in `2016-06-27_2016-06-28`.
`GET /druid/indexer/v1/worker`
@ -810,7 +806,7 @@ This section documents the API endpoints for the processes that reside on Query
#### Datasource information
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
(e.g., 2016-06-27_2016-06-28).
as in `2016-06-27_2016-06-28`.
> Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
> [`INFORMATION_SCHEMA.TABLES`](../querying/sql-metadata-tables.md#tables-table),

View File

@ -1,7 +1,7 @@
---
id: sql-api
title: "Druid SQL API"
sidebar_label: "Druid SQL API"
title: Druid SQL API
sidebar_label: Druid SQL
---
<!--
@ -23,10 +23,10 @@ sidebar_label: "Druid SQL API"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
> This document describes the SQL language.
You can submit and cancel [Druid SQL](./sql.md) queries using the Druid SQL API.
You can submit and cancel [Druid SQL](../querying/sql.md) queries using the Druid SQL API.
The Druid SQL API is available at `https://ROUTER:8888/druid/v2/sql`, where `ROUTER` is the IP address of the Druid Router.
## Submit a query
@ -50,8 +50,8 @@ Submit your query as the value of a "query" field in the JSON object within the
|`header`|Whether or not to include a header row for the query result. See [Responses](#responses) for details.|`false`|
|`typesHeader`|Whether or not to include type information in the header. Can only be set when `header` is also `true`. See [Responses](#responses) for details.|`false`|
|`sqlTypesHeader`|Whether or not to include SQL type information in the header. Can only be set when `header` is also `true`. See [Responses](#responses) for details.|`false`|
|`context`|JSON object containing [SQL query context parameters](sql-query-context.md).|`{}` (empty)|
|`parameters`|List of query parameters for parameterized queries. Each parameter in the list should be a JSON object like `{"type": "VARCHAR", "value": "foo"}`. The type should be a SQL type; see [Data types](sql-data-types.md) for a list of supported SQL types.|`[]` (empty)|
|`context`|JSON object containing [SQL query context parameters](../querying/sql-query-context.md).|`{}` (empty)|
|`parameters`|List of query parameters for parameterized queries. Each parameter in the list should be a JSON object like `{"type": "VARCHAR", "value": "foo"}`. The type should be a SQL type; see [Data types](../querying/sql-data-types.md) for a list of supported SQL types.|`[]` (empty)|
You can use _curl_ to send SQL queries from the command-line:
@ -63,7 +63,7 @@ $ curl -XPOST -H'Content-Type: application/json' http://ROUTER:8888/druid/v2/sql
[{"TheCount":24433}]
```
There are a variety of [SQL query context parameters](sql-query-context.md) you can provide by adding a "context" map,
There are a variety of [SQL query context parameters](../querying/sql-query-context.md) you can provide by adding a "context" map,
like:
```json
@ -87,14 +87,13 @@ Parameterized SQL queries are also supported:
}
```
Metadata is available over HTTP POST by querying [metadata tables](sql-metadata-tables.md).
Metadata is available over HTTP POST by querying [metadata tables](../querying/sql-metadata-tables.md).
### Responses
#### Result formats
Druid SQL's HTTP POST API supports a variety of result formats. You can specify these by adding a "resultFormat"
parameter, like:
Druid SQL's HTTP POST API supports a variety of result formats. You can specify these by adding a `resultFormat` parameter, like:
```json
{
@ -105,7 +104,7 @@ parameter, like:
To request a header with information about column names, set `header` to true in your request.
When you set `header` to true, you can optionally include `typesHeader` and `sqlTypesHeader` as well, which gives
you information about [Druid runtime and SQL types](sql-data-types.md) respectively. You can request all these headers
you information about [Druid runtime and SQL types](../querying/sql-data-types.md) respectively. You can request all these headers
with a request like:
```json
@ -128,10 +127,10 @@ The following table shows supported result formats:
|`arrayLines`|Like `array`, but the JSON arrays are separated by newlines instead of being wrapped in a JSON array. This can make it easier to parse the entire response set as a stream, if you do not have ready access to a streaming JSON parser. To make it possible to detect a truncated response, this format includes a trailer of one blank line.|Same as `array`, except the rows are separated by newlines.|text/plain|
|`csv`|Comma-separated values, with one row per line. Individual field values may be escaped by being surrounded in double quotes. If double quotes appear in a field value, they will be escaped by replacing them with double-double-quotes like `""this""`. To make it possible to detect a truncated response, this format includes a trailer of one blank line.|Same as `array`, except the lists are in CSV format.|text/csv|
If `typesHeader` is set to true, [Druid type](sql-data-types.md) information is included in the response. Complex types,
If `typesHeader` is set to true, [Druid type](../querying/sql-data-types.md) information is included in the response. Complex types,
like sketches, will be reported as `COMPLEX<typeName>` if a particular complex type name is known for that field,
or as `COMPLEX` if the particular type name is unknown or mixed. If `sqlTypesHeader` is set to true,
[SQL type](sql-data-types.md) information is included in the response. It is possible to set both `typesHeader` and
[SQL type](../querying/sql-data-types.md) information is included in the response. It is possible to set both `typesHeader` and
`sqlTypesHeader` at once. Both parameters require that `header` is also set.
To aid in building clients that are compatible with older Druid versions, Druid returns the HTTP header
@ -140,7 +139,7 @@ understands the `typesHeader` and `sqlTypesHeader` parameters. This HTTP respons
whether `typesHeader` or `sqlTypesHeader` are set or not.
Druid returns the SQL query identifier in the `X-Druid-SQL-Query-Id` HTTP header.
This query id will be assigned the value of `sqlQueryId` from the [query context parameters](sql-query-context.md)
This query id will be assigned the value of `sqlQueryId` from the [query context parameters](../querying/sql-query-context.md)
if specified, else Druid will generate a SQL query id for you.
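For example, you can pin the query ID yourself and read it back from the response headers. The Router address and the trivial query are placeholders:
```sh
# Illustrative request: set sqlQueryId in the context and inspect the X-Druid-SQL-Query-Id header.
curl -i -X POST -H 'Content-Type: application/json' http://ROUTER:8888/druid/v2/sql \
  --data '{"query": "SELECT 1", "context": {"sqlQueryId": "myQuery01"}}'
```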
#### Errors
@ -179,7 +178,7 @@ You can cancel the query using the query id `myQuery01` as follows:
curl --request DELETE 'https://ROUTER:8888/druid/v2/sql/myQuery01' \
```
Cancellation requests require READ permission on all resources used in the sql query.
Cancellation requests require READ permission on all resources used in the SQL query.
Druid returns an HTTP 202 response for successful deletion requests.

View File

@ -1,7 +1,7 @@
---
id: api
title: SQL-based ingestion and multi-stage query task API
sidebar_label: API
id: sql-ingestion-api
title: SQL-based ingestion API
sidebar_label: SQL-based ingestion
---
<!--
@ -34,7 +34,7 @@ interface.
When using the API for the MSQ task engine, the action you want to take determines the endpoint you use:
- `/druid/v2/sql/task` endpoint: Submit a query for ingestion.
- `/druid/indexer/v1/task` endpoint: Interact with a query, including getting its status, getting its details, or canceling it. This page describes a few of the Overlord Task APIs that you can use with the MSQ task engine. For information about Druid APIs, see the [API reference for Druid](../operations/api-reference.md#tasks).
- `/druid/indexer/v1/task` endpoint: Interact with a query, including getting its status, getting its details, or canceling it. This page describes a few of the Overlord Task APIs that you can use with the MSQ task engine. For information about Druid APIs, see the [API reference for Druid](../ingestion/tasks.md).
## Submit a query
@ -42,11 +42,11 @@ You submit queries to the MSQ task engine using the `POST /druid/v2/sql/task/` e
#### Request
The SQL task endpoint accepts [SQL requests in the JSON-over-HTTP form](../querying/sql-api.md#request-body) using the
The SQL task endpoint accepts [SQL requests in the JSON-over-HTTP form](sql-api.md#request-body) using the
`query`, `context`, and `parameters` fields, but ignoring the `resultFormat`, `header`, `typesHeader`, and
`sqlTypesHeader` fields.
This endpoint accepts [INSERT](reference.md#insert) and [REPLACE](reference.md#replace) statements.
This endpoint accepts [INSERT](../multi-stage-query/reference.md#insert) and [REPLACE](../multi-stage-query/reference.md#replace) statements.
As an experimental feature, this endpoint also accepts SELECT queries. SELECT query results are collected from workers
by the controller, and written into the [task report](#get-the-report-for-a-query-task) as an array of arrays. The
@ -123,7 +123,7 @@ print(response.text)
| Field | Description |
|---|---|
| `taskId` | Controller task ID. You can use Druid's standard [task APIs](../operations/api-reference.md#overlord) to interact with this controller task. |
| `taskId` | Controller task ID. You can use Druid's standard [task APIs](api-reference.md#overlord) to interact with this controller task. |
| `state` | Initial state for the query, which is "RUNNING". |
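To tie the request and response together, a submission might look like the following sketch. The target table, source table, and context values are placeholders, not part of the reference above:
```sh
# Illustrative request: submit an INSERT statement to the MSQ task engine.
curl --request POST 'https://ROUTER:8888/druid/v2/sql/task' \
  --header 'Content-Type: application/json' \
  --data '{
    "query": "INSERT INTO my_table SELECT * FROM existing_table PARTITIONED BY DAY",
    "context": {"maxNumTasks": 3}
  }'
```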
## Get the status for a query task
@ -564,8 +564,8 @@ The following table describes the response fields when you retrieve a report for
| `multiStageQuery.payload.status.errorReport.taskId` | The task that reported the error, if known. May be a controller task or a worker task. |
| `multiStageQuery.payload.status.errorReport.host` | The hostname and port of the task that reported the error, if known. |
| `multiStageQuery.payload.status.errorReport.stageNumber` | The stage number that reported the error, if it happened during execution of a specific stage. |
| `multiStageQuery.payload.status.errorReport.error` | Error object. Contains `errorCode` at a minimum, and may contain other fields as described in the [error code table](./reference.md#error-codes). Always present if there is an error. |
| `multiStageQuery.payload.status.errorReport.error.errorCode` | One of the error codes from the [error code table](./reference.md#error-codes). Always present if there is an error. |
| `multiStageQuery.payload.status.errorReport.error` | Error object. Contains `errorCode` at a minimum, and may contain other fields as described in the [error code table](../multi-stage-query/reference.md#error-codes). Always present if there is an error. |
| `multiStageQuery.payload.status.errorReport.error.errorCode` | One of the error codes from the [error code table](../multi-stage-query/reference.md#error-codes). Always present if there is an error. |
| `multiStageQuery.payload.status.errorReport.error.errorMessage` | User-friendly error message. Not always present, even if there is an error. |
| `multiStageQuery.payload.status.errorReport.exceptionStackTrace` | Java stack trace in string form, if the error was due to a server-side exception. |
| `multiStageQuery.payload.stages` | Array of query stages. |

View File

@ -1,7 +1,7 @@
---
id: sql-jdbc
title: "SQL JDBC driver API"
sidebar_label: "JDBC driver API"
title: SQL JDBC driver API
sidebar_label: SQL JDBC driver
---
<!--
@ -23,11 +23,11 @@ sidebar_label: "JDBC driver API"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
> This document describes the SQL language.
You can make [Druid SQL](./sql.md) queries using the [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/). We recommend using Avatica JDBC driver version 1.17.0 or later. Note that as of the time of this writing, Avatica 1.17.0, the latest version, does not support passing connection string parameters from the URL to Druid, so you must pass them using a `Properties` object. Once you've downloaded the Avatica client jar, add it to your classpath and use the connect string `jdbc:avatica:remote:url=http://BROKER:8082/druid/v2/sql/avatica/`.
You can make [Druid SQL](../querying/sql.md) queries using the [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/). We recommend using Avatica JDBC driver version 1.17.0 or later. Note that as of the time of this writing, Avatica 1.17.0, the latest version, does not support passing connection string parameters from the URL to Druid, so you must pass them using a `Properties` object. Once you've downloaded the Avatica client jar, add it to your classpath and use the connect string `jdbc:avatica:remote:url=http://BROKER:8082/druid/v2/sql/avatica/`.
When using the JDBC connector for the [examples](#examples) or in general, it's helpful to understand the parts of the connect string stored in the `url` variable:
@ -60,7 +60,7 @@ try (Connection connection = DriverManager.getConnection(url, connectionProperti
For a runnable example that includes a query that you might run, see [Examples](#examples).
It is also possible to use a protocol buffers JDBC connection with Druid; this offers reduced bloat and potential performance
improvements for larger result sets. To use it apply the following connection url instead, everything else remains the same
improvements for larger result sets. To use it, apply the following connection URL instead; everything else remains the same:
```
String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica-protobuf/;serialization=protobuf";
```
@ -68,7 +68,7 @@ String url = "jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica
> The protobuf endpoint is also known to work with the official [Golang Avatica driver](https://github.com/apache/calcite-avatica-go)
Table metadata is available over JDBC using `connection.getMetaData()` or by querying the
["INFORMATION_SCHEMA" tables](sql-metadata-tables.md). For an example of this, see [Get the metadata for a datasource](#get-the-metadata-for-a-datasource).
[INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md). For an example of this, see [Get the metadata for a datasource](#get-the-metadata-for-a-datasource).
## Connection stickiness
@ -82,7 +82,7 @@ Note that the non-JDBC [JSON over HTTP](sql-api.md#submit-a-query) API is statel
## Dynamic parameters
You can use [parameterized queries](sql.md#dynamic-parameters) in JDBC code, as in this example:
You can use [parameterized queries](../querying/sql.md#dynamic-parameters) in JDBC code, as in this example:
```java
PreparedStatement statement = connection.prepareStatement("SELECT COUNT(*) AS cnt FROM druid.foo WHERE dim1 = ? OR dim1 = ?");

View File

@ -96,7 +96,7 @@ All of these community extensions can be downloaded using [pull-deps](../operati
|druid-momentsketch|Support for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library|[link](../development/extensions-contrib/momentsketch-quantiles.md)|
|druid-tdigestsketch|Support for approximate sketch aggregators based on [T-Digest](https://github.com/tdunning/t-digest)|[link](../development/extensions-contrib/tdigestsketch-quantiles.md)|
|gce-extensions|GCE Extensions|[link](../development/extensions-contrib/gce-extensions.md)|
|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)|[link](./extensions-contrib/prometheus.md)|
|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)|[link](../development/extensions-contrib/prometheus.md)|
|kubernetes-overlord-extensions|Support for launching tasks in k8s without Middle Managers|[link](../development/extensions-contrib/k8s-jobs.md)|
## Promoting community extensions to core extensions
@ -111,11 +111,11 @@ For information how to create your own extension, please see [here](../developme
### Loading core extensions
Apache Druid bundles all [core extensions](../development/extensions.md#core-extensions) out of the box.
See the [list of extensions](../development/extensions.md#core-extensions) for your options. You
Apache Druid bundles all [core extensions](../configuration/extensions.md#core-extensions) out of the box.
See the [list of extensions](../configuration/extensions.md#core-extensions) for your options. You
can load bundled extensions by adding their names to your common.runtime.properties
`druid.extensions.loadList` property. For example, to load the *postgresql-metadata-storage* and
*druid-hdfs-storage* extensions, use the configuration:
`druid.extensions.loadList` property. For example, to load the postgresql-metadata-storage and
druid-hdfs-storage extensions, use the configuration:
```
druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]
@ -125,7 +125,7 @@ These extensions are located in the `extensions` directory of the distribution.
> Druid bundles two sets of configurations: one for the [quickstart](../tutorials/index.md) and
> one for a [clustered configuration](../tutorials/cluster.md). Make sure you are updating the correct
> common.runtime.properties for your setup.
> `common.runtime.properties` for your setup.
> Because of licensing, the mysql-metadata-storage extension does not include the required MySQL JDBC driver. For instructions
> on how to install this library, see the [MySQL extension page](../development/extensions-core/mysql.md).
@ -153,7 +153,7 @@ You only have to install the extension once. Then, add `"druid-example-extension
> Please make sure all the Extensions related configuration properties listed [here](../configuration/index.md#extensions) are set correctly.
> The Maven groupId for almost every [community extension](../development/extensions.md#community-extensions) is org.apache.druid.extensions.contrib. The artifactId is the name
> The Maven `groupId` for almost every [community extension](../configuration/extensions.md#community-extensions) is `org.apache.druid.extensions.contrib`. The `artifactId` is the name
> of the extension, and the version is the latest Druid stable version.
### Loading extensions from the classpath

View File

@ -245,7 +245,7 @@ values for the above mentioned configs among others provided by Java implementat
|`druid.auth.unsecuredPaths`| List of Strings|List of paths for which security checks will not be performed. All requests to these paths will be allowed.|[]|no|
|`druid.auth.allowUnauthenticatedHttpOptions`|Boolean|If true, skip authentication checks for HTTP OPTIONS requests. This is needed for certain use cases, such as supporting CORS pre-flight requests. Note that disabling authentication checks for OPTIONS requests will allow unauthenticated users to determine what Druid endpoints are valid (by checking if the OPTIONS request returns a 200 instead of 404), so enabling this option may reveal information about server configuration, including information about what extensions are loaded (if those extensions add endpoints).|false|no|
For more information, please see [Authentication and Authorization](../design/auth.md).
For more information, please see [Authentication and Authorization](../operations/auth.md).
For configuration options for specific auth extensions, please refer to the extension documentation.
@ -581,7 +581,7 @@ This deep storage is used to interface with Cassandra. Note that the `druid-cas
#### HDFS input source
You can set the following property to specify permissible protocols for
the [HDFS input source](../ingestion/native-batch-input-source.md#hdfs-input-source).
the [HDFS input source](../ingestion/input-sources.md#hdfs-input-source).
|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
@ -591,7 +591,7 @@ the [HDFS input source](../ingestion/native-batch-input-source.md#hdfs-input-sou
#### HTTP input source
You can set the following property to specify permissible protocols for
the [HTTP input source](../ingestion/native-batch-input-source.md#http-input-source).
the [HTTP input source](../ingestion/input-sources.md#http-input-source).
|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
@ -603,7 +603,7 @@ the [HTTP input source](../ingestion/native-batch-input-source.md#http-input-sou
#### JDBC Connections to External Databases
You can use the following properties to specify permissible JDBC options for:
- [SQL input source](../ingestion/native-batch-input-source.md#sql-input-source)
- [SQL input source](../ingestion/input-sources.md#sql-input-source)
- [globally cached JDBC lookups](../development/extensions-core/lookups-cached-global.md#jdbc-lookup)
- [JDBC Data Fetcher for per-lookup caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).
@ -998,7 +998,7 @@ These configuration options control Coordinator lookup management. See [dynamic
##### Automatic compaction dynamic configuration
You can set or update [automatic compaction](../data-management/automatic-compaction.md) properties dynamically using the
[Coordinator API](../operations/api-reference.md#automatic-compaction-configuration) without restarting Coordinators.
[Coordinator API](../api-reference/api-reference.md#automatic-compaction-configuration) without restarting Coordinators.
For details about segment compaction, see [Segment size optimization](../operations/segment-optimization.md).
@ -1525,7 +1525,7 @@ Additional peon configs include:
|`druid.indexer.task.gracefulShutdownTimeout`|Wait this long on middleManager restart for restorable tasks to gracefully exit.|PT5M|
|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop tasks.|`/tmp/druid-indexing`|
|`druid.indexer.task.restoreTasksOnRestart`|If true, MiddleManagers will attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/native-batch-input-source.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/input-sources.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). If you use the string-based schemaless ingestion and don't specify any dimensions to ingest, you must also set [`includeAllDimensions`](../ingestion/ingestion-spec.md#dimensionsspec) for Druid to store empty columns.<br/><br/>If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.<br/><br/>You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true|
|`druid.indexer.task.tmpStorageBytesPerTask`|Maximum number of bytes per task to be used to store temporary files on disk. This config is generally intended for internal usage. Attempts to set it are very likely to be overwritten by the TaskRunner that executes the task, so be sure of what you expect to happen before directly adjusting this configuration parameter. The config is documented here primarily to provide an understanding of what it means if/when someone sees that it has been set. A value of -1 disables this limit. |-1|
|`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.|0|
@ -1594,9 +1594,8 @@ then the value from the configuration below is used:
|`druid.indexer.task.gracefulShutdownTimeout`|Wait this long on Indexer restart for restorable tasks to gracefully exit.|PT5M|
|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop tasks.|`/tmp/druid-indexing`|
|`druid.indexer.task.restoreTasksOnRestart`|If true, the Indexer will attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/native-batch-input-source.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). <br/><br/>If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.<br/><br/>You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true|
|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to communicate with Overlord.|PT5S|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/input-sources.md) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the [`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). <br/><br/>If you set `storeEmptyColumns` to false, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest placeholder data for empty columns or else not query on empty columns.<br/><br/>You can overwrite this configuration by setting `storeEmptyColumns` in the [task context](../ingestion/tasks.md#context-parameters).|true|
|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to communicate with Overlord.|PT5S|
|`druid.peon.taskActionClient.retry.maxWait`|The maximum retry time to communicate with Overlord.|PT1M|
|`druid.peon.taskActionClient.retry.maxRetryCount`|The maximum number of retries to communicate with Overlord.|60|
@ -2245,7 +2244,7 @@ Supported query contexts:
|Key|Description|Default|
|---|-----------|-------|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../misc/math-expr.md#logical-operator-modes)|false|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|false|
|`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|false|
### Router

View File

@ -40,7 +40,7 @@ This topic guides you through setting up automatic compaction for your Druid clu
## Enable automatic compaction
You can enable automatic compaction for a datasource using the web console or programmatically via an API.
This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../operations/api-reference.md#tasks).
This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/api-reference.md#tasks).
### Web console
@ -59,10 +59,10 @@ To disable auto-compaction for a datasource, click **Delete** from the **Compact
### Compaction configuration API
Use the [Coordinator API](../operations/api-reference.md#automatic-compaction-status) to configure automatic compaction.
Use the [Coordinator API](../api-reference/api-reference.md#automatic-compaction-status) to configure automatic compaction.
To enable auto-compaction for a datasource, create a JSON object with the desired auto-compaction settings.
See [Configure automatic compaction](#configure-automatic-compaction) for the syntax of an auto-compaction spec.
Send the JSON object as a payload in a [`POST` request](../operations/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction`.
Send the JSON object as a payload in a [`POST` request](../api-reference/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction`.
The following example configures auto-compaction for the `wikipedia` datasource:
```sh
@ -76,7 +76,7 @@ curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/confi
}'
```
To disable auto-compaction for a datasource, send a [`DELETE` request](../operations/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` with the name of the datasource for which to disable auto-compaction. For example:
To disable auto-compaction for a datasource, send a [`DELETE` request](../api-reference/api-reference.md#automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` with the name of the datasource for which to disable auto-compaction. For example:
```sh
curl --location --request DELETE 'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
@ -152,7 +152,7 @@ After the Coordinator has initiated auto-compaction, you can view compaction sta
In the web console, the Datasources view displays auto-compaction statistics. The Tasks view shows the task information for compaction tasks that were triggered by the automatic compaction system.
To get statistics by API, send a [`GET` request](../operations/api-reference.md#automatic-compaction-status) to `/druid/coordinator/v1/compaction/status`. To filter the results to a particular datasource, pass the datasource name as a query parameter to the request—for example, `/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
To get statistics by API, send a [`GET` request](../api-reference/api-reference.md#automatic-compaction-status) to `/druid/coordinator/v1/compaction/status`. To filter the results to a particular datasource, pass the datasource name as a query parameter to the request—for example, `/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
## Examples

View File

@ -136,7 +136,7 @@ To control the number of result segments per time chunk, you can set [`maxRowsPe
> You can run multiple compaction tasks in parallel. For example, if you want to compact the data for a year, you are not limited to running a single task for the entire year. You can run 12 compaction tasks with month-long intervals.
A compaction task internally generates an `index` or `index_parallel` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [`druid` input source](../ingestion/native-batch-input-source.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default.
A compaction task internally generates an `index` or `index_parallel` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [`druid` input source](../ingestion/input-sources.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default.
Compaction tasks fetch all [relevant segments](#compaction-io-configuration) prior to launching any subtasks, _unless_ the following items are all set. It is strongly recommended to set all of these items to maximize performance and minimize disk usage of the `compact` task:

View File

@ -38,7 +38,7 @@ Deletion by time range happens in two steps:
you have a backup.
For documentation on disabling segments using the Coordinator API, see the
[Coordinator API reference](../operations/api-reference.md#coordinator-datasources).
[Coordinator API reference](../api-reference/api-reference.md#coordinator-datasources).
A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md).
@ -65,7 +65,7 @@ For example, to delete records where `userName` is `'bob'` with native batch ind
To delete the same records using SQL, use [REPLACE](../multi-stage-query/concepts.md#replace) with `WHERE userName <> 'bob'`.
To reindex using [native batch](../ingestion/native-batch.md), use the [`druid` input
source](../ingestion/native-batch-input-source.md#druid-input-source). If needed,
source](../ingestion/input-sources.md#druid-input-source). If needed,
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the
reindexing job. To reindex with SQL, use [`REPLACE <table> OVERWRITE`](../multi-stage-query/reference.md#replace)
with `SELECT ... FROM <table>`. (Druid does not have `UPDATE` or `ALTER TABLE` statements.) Any SQL SELECT query can be

View File

@ -52,7 +52,7 @@ is used to perform schema changes, repartition data, filter out unwanted data, e
behaves just like any other [overwrite](#overwrite) with regard to atomic updates and locking.
With [native batch](../ingestion/native-batch.md), use the [`druid` input
source](../ingestion/native-batch-input-source.md#druid-input-source). If needed,
source](../ingestion/input-sources.md#druid-input-source). If needed,
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the
reindexing job.

View File

@ -80,7 +80,7 @@ both in deep storage and across your Historical servers for the data you plan to
Deep storage is an important part of Druid's elastic, fault-tolerant design. Druid bootstraps from deep storage even
if every single data server is lost and re-provisioned.
For more details, please see the [Deep storage](../dependencies/deep-storage.md) page.
For more details, please see the [Deep storage](../design/deep-storage.md) page.
### Metadata storage
@ -88,13 +88,13 @@ The metadata storage holds various shared system metadata such as segment usage
clustered deployment, this is typically a traditional RDBMS like PostgreSQL or MySQL. In a single-server
deployment, it is typically a locally-stored Apache Derby database.
For more details, please see the [Metadata storage](../dependencies/metadata-storage.md) page.
For more details, please see the [Metadata storage](../design/metadata-storage.md) page.
### ZooKeeper
Used for internal service discovery, coordination, and leader election.
For more details, please see the [ZooKeeper](../dependencies/zookeeper.md) page.
For more details, please see the [ZooKeeper](zookeeper.md) page.
## Storage design
@ -203,7 +203,7 @@ new segments. Then it drops the old segments a few minutes later.
Each segment has a lifecycle that involves the following three major areas:
1. **Metadata store:** Segment metadata (a small JSON payload generally no more than a few KB) is stored in the
[metadata store](../dependencies/metadata-storage.md) once a segment is done being constructed. The act of inserting
[metadata store](../design/metadata-storage.md) once a segment is done being constructed. The act of inserting
a record for a segment into the metadata store is called _publishing_. These metadata records have a boolean flag
named `used`, which controls whether the segment is intended to be queryable or not. Segments created by realtime tasks will be
available before they are published, since they are only published when the segment is complete and will not accept

View File

@ -31,7 +31,7 @@ For basic tuning guidance for the Broker process, see [Basic cluster tuning](../
### HTTP endpoints
For a list of API endpoints supported by the Broker, see [Broker API](../operations/api-reference.md#broker).
For a list of API endpoints supported by the Broker, see [Broker API](../api-reference/api-reference.md#broker).
### Overview

View File

@ -31,7 +31,7 @@ For basic tuning guidance for the Coordinator process, see [Basic cluster tuning
### HTTP endpoints
For a list of API endpoints supported by the Coordinator, see [Coordinator API](../operations/api-reference.md#coordinator).
For a list of API endpoints supported by the Coordinator, see [Coordinator API](../api-reference/api-reference.md#coordinator).
### Overview
@ -92,7 +92,7 @@ Once some segments are found, it issues a [compaction task](../ingestion/tasks.m
The maximum number of running compaction tasks is `min(sum of worker capacity * slotRatio, maxSlots)`.
Note that even if `min(sum of worker capacity * slotRatio, maxSlots) = 0`, at least one compaction task is always submitted
if the compaction is enabled for a dataSource.
See [Automatic compaction configuration API](../operations/api-reference.md#automatic-compaction-configuration) and [Automatic compaction configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) to enable and configure automatic compaction.
See [Automatic compaction configuration API](../api-reference/api-reference.md#automatic-compaction-configuration) and [Automatic compaction configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) to enable and configure automatic compaction.
Compaction tasks might fail due to the following reasons:

View File

@ -73,4 +73,4 @@ See [druid-hdfs-storage extension documentation](../development/extensions-core/
## Additional options
For additional deep storage options, please see our [extensions list](../development/extensions.md).
For additional deep storage options, please see our [extensions list](../configuration/extensions.md).

View File

@ -24,7 +24,7 @@ title: "Dropwizard metrics emitter"
# Dropwizard Emitter
To use this extension, make sure to [include](../../development/extensions.md#loading-extensions) `dropwizard-emitter` in the extensions load list.
To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `dropwizard-emitter` in the extensions load list.
## Introduction

View File

@ -31,7 +31,7 @@ For basic tuning guidance for the Historical process, see [Basic cluster tuning]
### HTTP endpoints
For a list of API endpoints supported by the Historical, please see the [API reference](../operations/api-reference.md#historical).
For a list of API endpoints supported by the Historical, please see the [API reference](../api-reference/api-reference.md#historical).
### Running

View File

@ -35,7 +35,7 @@ For Apache Druid Indexer Process Configuration, see [Indexer Configuration](../c
### HTTP endpoints
The Indexer process shares the same HTTP endpoints as the [MiddleManager](../operations/api-reference.md#middlemanager).
The Indexer process shares the same HTTP endpoints as the [MiddleManager](../api-reference/api-reference.md#middlemanager).
### Running

View File

@ -30,7 +30,7 @@ Indexing [tasks](../ingestion/tasks.md) are responsible for creating and [killin
The indexing service is composed of three main components: [Peons](../design/peons.md) that can run a single task, [MiddleManagers](../design/middlemanager.md) that manage Peons, and an [Overlord](../design/overlord.md) that manages task distribution to MiddleManagers.
Overlords and MiddleManagers may run on the same process or across multiple processes, while MiddleManagers and Peons always run on the same process.
Tasks are managed using API endpoints on the Overlord service. See [Overlord Task API](../operations/api-reference.md#tasks) for more information.
Tasks are managed using API endpoints on the Overlord service. Please see [Overlord Task API](../api-reference/api-reference.md#tasks) for more information.
![Indexing Service](../assets/indexing_service.png "Indexing Service")

View File

@ -31,7 +31,7 @@ For basic tuning guidance for the MiddleManager process, see [Basic cluster tuni
### HTTP endpoints
For a list of API endpoints supported by the MiddleManager, please see the [API reference](../operations/api-reference.md#middlemanager).
For a list of API endpoints supported by the MiddleManager, please see the [API reference](../api-reference/api-reference.md#middlemanager).
### Overview

View File

@ -31,7 +31,7 @@ For basic tuning guidance for the Overlord process, see [Basic cluster tuning](.
### HTTP endpoints
For a list of API endpoints supported by the Overlord, please see the [API reference](../operations/api-reference.md#overlord).
For a list of API endpoints supported by the Overlord, please see the [API reference](../api-reference/api-reference.md#overlord).
### Overview

View File

@ -31,7 +31,7 @@ For basic tuning guidance for MiddleManager tasks, see [Basic cluster tuning](..
### HTTP endpoints
For a list of API endpoints supported by the Peon, please see the [Peon API reference](../operations/api-reference.md#peon).
For a list of API endpoints supported by the Peon, please see the [Peon API reference](../api-reference/api-reference.md#peon).
Peons run a single task in a single JVM. The MiddleManager is responsible for creating Peons to run tasks.
Peons should rarely, if ever, be run on their own, and only for testing purposes.

View File

@ -36,7 +36,7 @@ For basic tuning guidance for the Router process, see [Basic cluster tuning](../
### HTTP endpoints
For a list of API endpoints supported by the Router, see [Router API](../operations/api-reference.md#router).
For a list of API endpoints supported by the Router, see [Router API](../api-reference/api-reference.md#router).
### Running

View File

@ -32,7 +32,7 @@ Note that this document does not track the status of contrib extensions, all of
- [SQL-based ingestion](../multi-stage-query/index.md)
- [SQL-based ingestion concepts](../multi-stage-query/concepts.md)
- [SQL-based ingestion and multi-stage query task API](../multi-stage-query/api.md)
- [SQL-based ingestion and multi-stage query task API](../api-reference/sql-ingestion-api.md)
## Indexer process

View File

@ -27,7 +27,7 @@ This document describes how to use OSS as Druid deep storage.
## Installation
Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install the `aliyun-oss-extensions` extension, as described [here](../../development/extensions.md#community-extensions) on middle manager and historical nodes.
Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install the `aliyun-oss-extensions` extension, as described [here](../../configuration/extensions.md#community-extensions) on middle manager and historical nodes.
```bash
java -classpath "{YOUR_DRUID_DIR}/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:aliyun-oss-extensions:{YOUR_DRUID_VERSION}

View File

@ -23,7 +23,7 @@ title: "Ambari Metrics Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `ambari-metrics-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `ambari-metrics-emitter` in the extensions load list.
## Introduction

View File

@ -23,7 +23,7 @@ title: "Apache Cassandra"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-cassandra-storage` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-cassandra-storage` in the extensions load list.
[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also
be leveraged for deep storage. This requires some additional Druid configuration as well as setting up the necessary

View File

@ -23,7 +23,7 @@ title: "Rackspace Cloud Files"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-cloudfiles-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-cloudfiles-extensions` in the extensions load list.
## Deep Storage

View File

@ -34,7 +34,7 @@ Compressed big decimal is an absolute number based complex type based on big dec
2. Accuracy: Provides a greater level of accuracy in decimal arithmetic
## Operations
To use this extension, make sure to [load](../../development/extensions.md#loading-extensions) `compressed-big-decimal` to your config file.
To use this extension, make sure to [load](../../configuration/extensions.md#loading-extensions) `compressed-big-decimal` to your config file.
## Configuration
There are currently no configuration properties specific to Compressed Big Decimal

View File

@ -23,7 +23,7 @@ title: "DistinctCount Aggregator"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) the `druid-distinctcount` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) the `druid-distinctcount` in the extensions load list.
Additionally, follow these steps:

View File

@ -23,7 +23,7 @@ title: "GCE Extensions"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `gce-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `gce-extensions` in the extensions load list.
At the moment, this extension only enables Druid to autoscale instances in GCE.

View File

@ -23,7 +23,7 @@ title: "Graphite Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `graphite-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `graphite-emitter` in the extensions load list.
## Introduction

View File

@ -23,7 +23,7 @@ title: "InfluxDB Line Protocol Parser"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-influx-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-influx-extensions` in the extensions load list.
This extension enables Druid to parse the [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v1.5/write_protocols/line_protocol_tutorial/), a popular text-based timeseries metric serialization format.
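For reference, a single line in this format carries a measurement name, optional tags, one or more fields, and a timestamp. The line below is a hypothetical illustration (the measurement, tag, and field names are made up), not output from any particular system:

```
cpu,host=server01,region=us-west usage_user=4.2,usage_idle=92.6 1556813561098000000
```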

View File

@ -23,7 +23,7 @@ title: "InfluxDB Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-influxdb-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-influxdb-emitter` in the extensions load list.
## Introduction

View File

@ -47,7 +47,7 @@ Task specific pod templates must be specified as the runtime property `druid.ind
## Configuration
To use this extension please make sure to [include](../extensions.md#loading-extensions)`druid-kubernetes-overlord-extensions` in the extensions load list for your overlord process.
To use this extension, please make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-kubernetes-overlord-extensions` in the extensions load list for your Overlord process.
The extension uses the task queue to limit how many concurrent tasks (K8s jobs) are in flight, so you must set a reasonable value for `druid.indexer.queue.maxSize`. Additionally, set `druid.indexer.runner.namespace` to the namespace in which you are running Druid.
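As a rough sketch, the relevant Overlord runtime properties might look like the following; the queue size and namespace values are placeholders, not recommendations:

```
# Placeholder values; adjust the queue size and namespace for your deployment.
druid.extensions.loadList=["druid-kubernetes-overlord-extensions"]
druid.indexer.queue.maxSize=256
druid.indexer.runner.namespace=druid
```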

View File

@ -23,7 +23,7 @@ title: "Kafka Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `kafka-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `kafka-emitter` in the extensions load list.
## Introduction

View File

@ -26,7 +26,7 @@ title: "Moment Sketches for Approximate Quantiles module"
This module provides aggregators for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library.
The momentsketch provides coarse quantile estimates with less space and aggregation time overheads than traditional sketches, approaching the performance of counts and sums by reconstructing distributions from computed statistics.
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-momentsketch` in the extensions load list.
### Aggregator

View File

@ -52,7 +52,7 @@ It runs the query in two main phases:
## Operations
### Installation
Use [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../development/extensions.md#community-extensions) on all Druid broker and router nodes.
Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../configuration/extensions.md#community-extensions) on all Druid broker and router nodes.
```bash
java -classpath "<your_druid_dir>/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:druid-moving-average-query:{VERSION}

View File

@ -23,7 +23,7 @@ title: "OpenTSDB Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `opentsdb-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `opentsdb-emitter` in the extensions load list.
## Introduction

View File

@ -23,7 +23,7 @@ title: "Prometheus Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `prometheus-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `prometheus-emitter` in the extensions load list.
## Introduction

View File

@ -28,7 +28,7 @@ Below are guidance and configuration options known to this module.
## Installation
Use [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../development/extensions.md#community-extensions) on broker, historical and middle manager nodes.
Use the [pull-deps](../../operations/pull-deps.md) tool shipped with Druid to install this [extension](../../configuration/extensions.md#community-extensions) on broker, historical, and middle manager nodes.
```bash
java -classpath "druid_dir/lib/*" org.apache.druid.cli.Main tools pull-deps -c org.apache.druid.extensions.contrib:druid-redis-cache:{VERSION}
@ -38,7 +38,7 @@ java -classpath "druid_dir/lib/*" org.apache.druid.cli.Main tools pull-deps -c o
To enable this extension after installation,
1. [include](../../development/extensions.md#loading-extensions) this `druid-redis-cache` extension
1. [include](../../configuration/extensions.md#loading-extensions) this `druid-redis-cache` extension
2. to enable cache on broker nodes, follow [broker caching docs](../../configuration/index.md#broker-caching) to set related properties
3. to enable cache on historical nodes, follow [historical caching docs](../../configuration/index.md#historical-caching) to set related properties
4. to enable cache on middle manager nodes, follow [peon caching docs](../../configuration/index.md#peon-caching) to set related properties

View File

@ -23,7 +23,7 @@ title: "Microsoft SQLServer"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `sqlserver-metadata-storage` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `sqlserver-metadata-storage` in the extensions load list.
## Setting up SQLServer

View File

@ -23,7 +23,7 @@ title: "StatsD Emitter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `statsd-emitter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `statsd-emitter` in the extensions load list.
## Introduction

View File

@ -35,7 +35,7 @@ to generate sketches during ingestion time itself and then combining them during
The module also provides a postAggregator, quantilesFromTDigestSketch, that can be used to compute approximate
quantiles from T-Digest sketches generated by the tDigestSketch aggregator.
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-tdigestsketch"]

View File

@ -23,7 +23,7 @@ title: "Thrift"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-thrift-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-thrift-extensions` in the extensions load list.
This extension enables Druid to ingest thrift compact data online (`ByteBuffer`) and offline (SequenceFile of type `<Writable, BytesWritable>` or LzoThriftBlock File).

View File

@ -23,7 +23,7 @@ title: "Timestamp Min/Max aggregators"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-time-min-max` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-time-min-max` in the extensions load list.
These aggregators enable more precise calculation of the minimum and maximum time of given events than the `__time` column, whose granularity is sparse (the same as the query granularity).
To use this feature, a "timeMin" or "timeMax" aggregator must be included at indexing time.
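As an illustrative sketch only (the output name `firstEventTime` is hypothetical, `fieldName` points at whichever timestamp column you want to aggregate, and the layout follows the common Druid aggregator pattern of `type`/`name`/`fieldName`), a metric-spec entry might look like:

```json
{ "type": "timeMin", "name": "firstEventTime", "fieldName": "__time" }
```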

View File

@ -23,7 +23,7 @@ title: "Approximate Histogram aggregators"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-histogram` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-histogram` in the extensions load list.
The `druid-histogram` extension provides an approximate histogram aggregator and a fixed buckets histogram aggregator.

View File

@ -31,7 +31,7 @@ The [Avro Stream Parser](../../ingestion/data-formats.md#avro-stream-parser) is
## Load the Avro extension
To use the Avro extension, add the `druid-avro-extensions` to the list of loaded extensions. See [Loading extensions](../../development/extensions.md#loading-extensions) for more information.
To use the Avro extension, add the `druid-avro-extensions` to the list of loaded extensions. See [Loading extensions](../../configuration/extensions.md#loading-extensions) for more information.
## Avro types

View File

@ -23,7 +23,7 @@ title: "Microsoft Azure"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list.
## Deep Storage

View File

@ -23,7 +23,7 @@ title: "Bloom Filter"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-bloom-filter` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-bloom-filter` in the extensions load list.
This extension adds the ability to both construct bloom filters from query results, and filter query results by testing
against a bloom filter. A Bloom filter is a probabilistic data structure for performing a set membership check. A bloom
@ -98,7 +98,7 @@ SELECT COUNT(*) FROM druid.foo WHERE bloom_filter_test(<expr>, '<serialized_byte
### Expression and Virtual Column Support
The bloom filter extension also adds a bloom filter [Druid expression](../../misc/math-expr.md) which shares syntax
The bloom filter extension also adds a bloom filter [Druid expression](../../querying/math-expr.md) which shares syntax
with the SQL operator.
```sql

View File

@ -25,7 +25,7 @@ title: "DataSketches extension"
Apache Druid aggregators based on [Apache DataSketches](https://datasketches.apache.org/) library. Sketches are data structures implementing approximate streaming mergeable algorithms. Sketches can be ingested from the outside of Druid or built from raw data at ingestion time. Sketches can be stored in Druid segments as additive metrics.
To use the datasketches aggregators, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use the datasketches aggregators, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-datasketches"]

View File

@ -27,7 +27,7 @@ This module provides Apache Druid aggregators for distinct counting based on HLL
the estimate of the number of distinct values presented to the sketch. You can also use post aggregators to produce a union of sketch columns in the same row.
You can use the HLL sketch aggregator on any column to estimate its cardinality.
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-datasketches"]

View File

@ -31,7 +31,7 @@ There are three major modes of operation:
2. Building sketches from raw data during ingestion
3. Building sketches from raw data at query time
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-datasketches"]

View File

@ -31,7 +31,7 @@ There are three major modes of operation:
2. Building sketches from raw data during ingestion
3. Building sketches from raw data at query time
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-datasketches"]

View File

@ -30,7 +30,7 @@ At ingestion time, the Theta sketch aggregator creates Theta sketch objects whic
Note that you can use the `thetaSketch` aggregator on columns that were not ingested using the same aggregator; it returns the estimated cardinality of the column. It is recommended to use it at ingestion time as well to make querying faster.
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-datasketches"]

View File

@ -25,7 +25,7 @@ title: "DataSketches Tuple Sketch module"
This module provides Apache Druid aggregators based on Tuple sketch from [Apache DataSketches](https://datasketches.apache.org/) library. ArrayOfDoublesSketch sketches extend the functionality of the count-distinct Theta sketches by adding arrays of double values associated with unique keys.
To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
To use this aggregator, make sure you [include](../../configuration/extensions.md#loading-extensions) the extension in your config file:
```
druid.extensions.loadList=["druid-datasketches"]

View File

@ -31,7 +31,7 @@ title: "Druid AWS RDS Module"
Before using this password provider, please make sure that you have completed the setup required for the database user to connect using a token.
See [AWS Guide](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/UsingWithRDS.IAMDBAuth.html).
To use this extension, make sure you [include](../../development/extensions.md#loading-extensions) it in your config file along with other extensions e.g.
To use this extension, make sure you [include](../../configuration/extensions.md#loading-extensions) it in your config file along with other extensions, for example:
```
druid.extensions.loadList=["druid-aws-rds-extensions", "postgresql-metadata-storage", ...]

View File

@ -29,7 +29,7 @@ The Basic Security extension for Apache Druid adds:
- an Escalator which determines the authentication scheme for internal Druid processes.
- an Authorizer which implements basic role-based access control for Druid metadata store or LDAP users and groups.
To load the extension, [include](../../development/extensions.md#loading-extensions) `druid-basic-security` in the `druid.extensions.loadList` in your `common.runtime.properties`. For example:
To load the extension, [include](../../configuration/extensions.md#loading-extensions) `druid-basic-security` in the `druid.extensions.loadList` in your `common.runtime.properties`. For example:
```
druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage", "druid-basic-security"]
```
@ -37,7 +37,7 @@ druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage",
To enable basic auth, configure the basic Authenticator, Escalator, and Authorizer in `common.runtime.properties`.
See [Security overview](../../operations/security-overview.md#enable-an-authenticator) for an example configuration for HTTP basic authentication.
Visit [Authentication and Authorization](../../design/auth.md) for more information on the implemented extension interfaces and for an example configuration.
Visit [Authentication and Authorization](../../operations/auth.md) for more information on the implemented extension interfaces and for an example configuration.
## Configuration

View File

@ -25,7 +25,7 @@ title: "Kerberos"
Apache Druid Extension to enable Authentication for Druid Processes using Kerberos.
This extension adds an Authenticator which is used to protect HTTP Endpoints using the simple and protected GSSAPI negotiation mechanism [SPNEGO](https://en.wikipedia.org/wiki/SPNEGO).
Make sure to [include](../../development/extensions.md#loading-extensions) `druid-kerberos` in the extensions load list.
Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-kerberos` in the extensions load list.
## Configuration
@ -61,7 +61,7 @@ The special string _HOST will be replaced automatically with the value of config
### `druid.auth.authenticator.kerberos.excludedPaths`
In older releases, the Kerberos authenticator had an `excludedPaths` property that allowed the user to specify a list of paths where authentication checks should be skipped. This property has been removed from the Kerberos authenticator because the path exclusion functionality is now handled across all authenticators/authorizers by setting `druid.auth.unsecuredPaths`, as described in the [main auth documentation](../../design/auth.md).
In older releases, the Kerberos authenticator had an `excludedPaths` property that allowed the user to specify a list of paths where authentication checks should be skipped. This property has been removed from the Kerberos authenticator because the path exclusion functionality is now handled across all authenticators/authorizers by setting `druid.auth.unsecuredPaths`, as described in the [main auth documentation](../../operations/auth.md).
### Auth to Local Syntax
`druid.auth.authenticator.kerberos.authToLocal` allows you to set general rules for mapping principal names to local user names.

View File

@ -28,7 +28,7 @@ The main goal of this cache is to speed up the access to a high latency lookup s
Thus, users can define various caching strategies and implementations per lookup, even if the source is the same.
This module can be used side by side with other lookup modules, such as the global cached lookup module.
To use this Apache Druid extension, [include](../extensions.md#loading-extensions) `druid-lookups-cached-single` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-single` in the extensions load list.
> If using JDBC, you will need to add your database's client JAR files to the extension's directory.
> For Postgres, the connector JAR is already included.

View File

@ -22,9 +22,9 @@ title: "Apache Ranger Security"
~ under the License.
-->
This Apache Druid extension adds an Authorizer which implements access control for Druid, backed by [Apache Ranger](https://ranger.apache.org/). Please see [Authentication and Authorization](../../design/auth.md) for more information on the basic facilities this extension provides.
This Apache Druid extension adds an Authorizer which implements access control for Druid, backed by [Apache Ranger](https://ranger.apache.org/). Please see [Authentication and Authorization](../../operations/auth.md) for more information on the basic facilities this extension provides.
Make sure to [include](../../development/extensions.md#loading-extensions) `druid-ranger-security` in the extensions load list.
Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-ranger-security` in the extensions load list.
> At the time of writing, the latest release of Apache Ranger is version 2.0. This version has a dependency on `log4j 1.2.17`, which has a vulnerability if you configure it to use a `SocketServer` (CVE-2019-17571). In addition, it includes Kafka 2.0.0, which has two known vulnerabilities (CVE-2019-12399, CVE-2018-17196). Kafka can be used by the audit component in Ranger, but it is not required.
@ -98,7 +98,7 @@ When installing a new Druid service in Apache Ranger for the first time, Ranger
### HTTP methods
For information on what HTTP methods are supported for a particular request endpoint, please refer to the [API documentation](../../operations/api-reference.md).
For information on what HTTP methods are supported for a particular request endpoint, please refer to the [API documentation](../../api-reference/api-reference.md).
GET requires READ permission, while POST and DELETE require WRITE permission.

View File

@ -28,7 +28,7 @@ This extension allows you to do 2 things:
* [Ingest data](#reading-data-from-google-cloud-storage) from files stored in Google Cloud Storage.
* Write segments to [deep storage](#deep-storage) in GCS.
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-google-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-google-extensions` in the extensions load list.
### Required Configuration
@ -36,7 +36,7 @@ To configure connectivity to google cloud, run druid processes with `GOOGLE_APPL
### Reading data from Google Cloud Storage
The [Google Cloud Storage input source](../../ingestion/native-batch-input-source.md) is supported by the [Parallel task](../../ingestion/native-batch.md)
The [Google Cloud Storage input source](../../ingestion/input-sources.md) is supported by the [Parallel task](../../ingestion/native-batch.md)
to read objects directly from Google Cloud Storage. If you use the [Hadoop task](../../ingestion/hadoop.md),
you can read data from Google Cloud Storage by specifying the paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).

View File

@ -23,7 +23,7 @@ title: "HDFS"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-hdfs-storage` in the extensions load list and run druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` in the environment.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-hdfs-storage` in the extensions load list and run druid processes with `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_keyfile` in the environment.
## Deep Storage
@ -153,12 +153,12 @@ Tested with Druid 0.17.0, Hadoop 2.8.5 and gcs-connector jar 2.0.0-hadoop2.
### Native batch ingestion
The [HDFS input source](../../ingestion/native-batch-input-source.md#hdfs-input-source) is supported by the [Parallel task](../../ingestion/native-batch.md)
The [HDFS input source](../../ingestion/input-sources.md#hdfs-input-source) is supported by the [Parallel task](../../ingestion/native-batch.md)
to read files directly from HDFS storage. You may be able to read objects from cloud storage
with the HDFS input source, but we highly recommend using a proper
[Input Source](../../ingestion/native-batch-input-source.md) instead if possible because
it is simple to set up. For now, only the [S3 input source](../../ingestion/native-batch-input-source.md#s3-input-source)
and the [Google Cloud Storage input source](../../ingestion/native-batch-input-source.md#google-cloud-storage-input-source)
[Input Source](../../ingestion/input-sources.md) instead if possible because
it is simple to set up. For now, only the [S3 input source](../../ingestion/input-sources.md#s3-input-source)
and the [Google Cloud Storage input source](../../ingestion/input-sources.md#google-cloud-storage-input-source)
are supported for cloud storage types, and so you may still want to use the HDFS input source
to read from cloud storage other than those two.

View File

@ -22,7 +22,7 @@ title: "Apache Kafka Lookups"
~ under the License.
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` in the extensions load list.
If you need updates to populate as promptly as possible, it is possible to plug into a Kafka topic whose key is the old value and whose message is the desired new value (both in UTF-8) as a LookupExtractorFactory.

View File

@ -49,7 +49,7 @@ If your Kafka cluster enables consumer-group based ACLs, you can set `group.id`
## Load the Kafka indexing service
To use the Kafka indexing service, load the `druid-kafka-indexing-service` extension on both the Overlord and the MiddleManagers. See [Loading extensions](../extensions.md#loading-extensions) for instructions on how to configure extensions.
To use the Kafka indexing service, load the `druid-kafka-indexing-service` extension on both the Overlord and the MiddleManagers. See [Loading extensions](../../configuration/extensions.md) for instructions on how to configure extensions.
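Because both services need the extension, the entry typically goes in the shared `common.runtime.properties`. A minimal sketch (your load list will likely contain other extensions as well):

```
druid.extensions.loadList=["druid-kafka-indexing-service"]
```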
## Define a supervisor spec

View File

@ -25,7 +25,7 @@ description: "Reference topic for running and maintaining Apache Kafka superviso
-->
This topic contains operations reference information to run and maintain Apache Kafka supervisors for Apache Druid. It includes descriptions of how some supervisor APIs work within Kafka Indexing Service.
For all supervisor APIs, see [Supervisor APIs](../../operations/api-reference.md#supervisors).
For all supervisor APIs, see [Supervisor APIs](../../api-reference/api-reference.md#supervisors).
## Getting Supervisor Status Report

View File

@ -205,7 +205,7 @@ The `tuningConfig` is optional and default parameters will be used if no `tuning
| `indexSpecForIntermediatePersists`| | Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](#indexspec) for possible values. | no (default = same as `indexSpec`) |
| `reportParseExceptions` | Boolean | *DEPRECATED*. If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped. Setting `reportParseExceptions` to true will override existing configurations for `maxParseExceptions` and `maxSavedParseExceptions`, setting `maxParseExceptions` to 0 and limiting `maxSavedParseExceptions` to no more than 1. | no (default == false) |
| `handoffConditionTimeout` | Long | Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever. | no (default == 0) |
| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kafka messages that are no longer available (i.e. when `OffsetOutOfRangeException` is encountered).<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest offset available in Kafka, based on the value of the `useEarliestOffset` property (earliest if true, latest if false). Note that this can lead to data being _DROPPED_ (if `useEarliestOffset` is false) or _DUPLICATED_ (if `useEarliestOffset` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.<br/><br/>This feature behaves similarly to the Kafka `auto.offset.reset` consumer property. | no (default == false) |
| `resetOffsetAutomatically` | Boolean | Controls behavior when Druid needs to read Kafka messages that are no longer available (i.e. when `OffsetOutOfRangeException` is encountered).<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest offset available in Kafka, based on the value of the `useEarliestOffset` property (earliest if true, latest if false). Note that this can lead to data being _DROPPED_ (if `useEarliestOffset` is false) or _DUPLICATED_ (if `useEarliestOffset` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.<br/><br/>This feature behaves similarly to the Kafka `auto.offset.reset` consumer property. | no (default == false) |
| `workerThreads` | Integer | The number of threads that the supervisor uses to handle requests/responses for worker tasks, along with any other internal asynchronous operation. | no (default == min(10, taskCount)) |
| `chatAsync` | Boolean | If true, use asynchronous communication with indexing tasks, and ignore the `chatThreads` parameter. If false, use synchronous communication in a thread pool of size `chatThreads`. | no (default == true) |
| `chatThreads` | Integer | The number of threads that will be used for communicating with indexing tasks. Ignored if `chatAsync` is `true` (the default). | no (default == min(10, taskCount * replicas)) |
@ -217,7 +217,7 @@ The `tuningConfig` is optional and default parameters will be used if no `tuning
| `intermediateHandoffPeriod` | ISO8601 Period | How often the tasks should hand off segments. Handoff will happen either if `maxRowsPerSegment` or `maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens earlier. | no (default == P2147483647D) |
| `logParseExceptions` | Boolean | If true, log an error message when a parsing exception occurs, containing information about the row where the error occurred. | no, default == false |
| `maxParseExceptions` | Integer | The maximum number of parse exceptions that can occur before the task halts ingestion and fails. Overridden if `reportParseExceptions` is set. | no, unlimited default |
| `maxSavedParseExceptions` | Integer | When a parse exception occurs, Druid can keep track of the most recent parse exceptions. `maxSavedParseExceptions` limits how many exception instances will be saved. These saved exceptions will be made available after the task finishes in the [task completion report](../../ingestion/tasks.md#reports). Overridden if `reportParseExceptions` is set. | no, default == 0 |
| `maxSavedParseExceptions` | Integer | When a parse exception occurs, Druid can keep track of the most recent parse exceptions. `maxSavedParseExceptions` limits how many exception instances will be saved. These saved exceptions will be made available after the task finishes in the [task completion report](../../ingestion/tasks.md#task-reports). Overridden if `reportParseExceptions` is set. | no, default == 0 |
#### IndexSpec

View File

@ -30,7 +30,7 @@ When you enable the Kinesis indexing service, you can configure *supervisors* on
To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` core Apache Druid extension (see
[Including Extensions](../../development/extensions.md#loading-extensions)).
[Including Extensions](../../configuration/extensions.md#loading-extensions)).
> Before you deploy the Kinesis extension to production, read the [Kinesis known issues](#kinesis-known-issues).
@ -284,7 +284,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is specified, default param
|`indexSpecForIntermediatePersists`|Object|Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. This can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. However, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](#indexspec) for possible values.| no (default = same as `indexSpec`)|
|`reportParseExceptions`|Boolean|If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped.|no (default == false)|
|`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It must be >= 0, where 0 means to wait forever.| no (default == 0)|
|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read Kinesis messages that are no longer available.<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../operations/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|
|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read Kinesis messages that are no longer available.<br/><br/>If false, the exception will bubble up, which will cause your tasks to fail and ingestion to halt. If this occurs, manual intervention is required to correct the situation; potentially using the [Reset Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is useful for production, since it will make you aware of issues with ingestion.<br/><br/>If true, Druid will automatically reset to the earlier or latest sequence number available in Kinesis, based on the value of the `useEarliestSequenceNumber` property (earliest if true, latest if false). Please note that this can lead to data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ (if `useEarliestSequenceNumber` is true) without your knowledge. Messages will be logged indicating that a reset has occurred, but ingestion will continue. This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|
|`skipSequenceNumberAvailabilityCheck`|Boolean|Whether to enable checking if the current sequence number is still available in a particular Kinesis shard. If set to false, the indexing task will attempt to reset the current sequence number (or not), depending on the value of `resetOffsetAutomatically`.|no (default == false)|
|`workerThreads`|Integer|The number of threads that the supervisor uses to handle requests/responses for worker tasks, along with any other internal asynchronous operation.|no (default == min(10, taskCount))|
|`chatAsync`|Boolean| If true, use asynchronous communication with indexing tasks, and ignore the `chatThreads` parameter. If false, use synchronous communication in a thread pool of size `chatThreads`. | no (default == true) |
@ -338,7 +338,7 @@ For Concise bitmaps:
## Operations
This section describes how some supervisor APIs work in Kinesis Indexing Service.
For all supervisor APIs, check [Supervisor APIs](../../operations/api-reference.md#supervisors).
For all supervisor APIs, check [Supervisor APIs](../../api-reference/api-reference.md#supervisors).
### AWS Authentication

View File

@ -29,7 +29,7 @@ Apache Druid Extension to enable using Kubernetes API Server for node discovery
## Configuration
To use this extension please make sure to [include](../../development/extensions.md#loading-extensions) `druid-kubernetes-extensions` in the extensions load list.
To use this extension please make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-kubernetes-extensions` in the extensions load list.
This extension works together with HTTP-based segment and task management in Druid. Consequently, the following configurations must be set on all Druid nodes.

View File

@ -22,7 +22,7 @@ title: "Globally Cached Lookups"
~ under the License.
-->
To use this Apache Druid extension, [include](../extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.
## Configuration
> Static configuration is no longer supported. Lookups can be configured through
@ -168,7 +168,7 @@ It's highly recommended that `druid.lookup.namespace.numBufferedEntries` is set
## Supported lookups
For additional lookups, please see our [extensions list](../extensions.md).
For additional lookups, please see our [extensions list](../../configuration/extensions.md).
### URI lookup

View File

@ -23,7 +23,7 @@ title: "MySQL Metadata Store"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.
> The MySQL extension requires the MySQL Connector/J library or MariaDB Connector/J library, neither of which is included in the Druid distribution.
> Refer to the following section for instructions on how to install this library.

View File

@ -30,7 +30,7 @@ The extension provides the [ORC input format](../../ingestion/data-formats.md#or
for [native batch ingestion](../../ingestion/native-batch.md) and [Hadoop batch ingestion](../../ingestion/hadoop.md), respectively.
Please see corresponding docs for details.
To use this extension, make sure to [include](../../development/extensions.md#loading-extensions) `druid-orc-extensions` in the extensions load list.
To use this extension, make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-orc-extensions` in the extensions load list.
### Migration from 'contrib' extension
This extension, first available in version 0.15.0, replaces the previous 'contrib' extension which was available until

View File

@ -27,7 +27,7 @@ This Apache Druid module extends [Druid Hadoop based indexing](../../ingestion/h
Apache Parquet files.
Note: If using the `parquet-avro` parser for Apache Hadoop based indexing, `druid-parquet-extensions` depends on the `druid-avro-extensions` module, so be sure to
[include both](../../development/extensions.md#loading-extensions).
[include both](../../configuration/extensions.md#loading-extensions).
The `druid-parquet-extensions` provides the [Parquet input format](../../ingestion/data-formats.md#parquet), the [Parquet Hadoop parser](../../ingestion/data-formats.md#parquet-hadoop-parser),
and the [Parquet Avro Hadoop Parser](../../ingestion/data-formats.md#parquet-avro-hadoop-parser) with `druid-avro-extensions`.

View File

@ -23,7 +23,7 @@ title: "PostgreSQL Metadata Store"
-->
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `postgresql-metadata-storage` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `postgresql-metadata-storage` in the extensions load list.
## Setting up PostgreSQL
@ -87,7 +87,7 @@ In most cases, the configuration options map directly to the [postgres JDBC conn
### PostgreSQL Firehose
The PostgreSQL extension provides an implementation of an [SQL input source](../../ingestion/native-batch-input-source.md) which can be used to ingest data into Druid from a PostgreSQL database.
The PostgreSQL extension provides an implementation of an [SQL input source](../../ingestion/input-sources.md) which can be used to ingest data into Druid from a PostgreSQL database.
```json
{

View File

@ -23,7 +23,7 @@ title: "Protobuf"
-->
This Apache Druid extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../development/extensions.md#loading-extensions) `druid-protobuf-extensions` in the extensions load list.
This Apache Druid extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-protobuf-extensions` in the extensions load list.
The `druid-protobuf-extensions` provides the [Protobuf Parser](../../ingestion/data-formats.md#protobuf-parser)
for [stream ingestion](../../ingestion/index.md#streaming). See corresponding docs for details.

View File

@ -28,11 +28,11 @@ This extension allows you to do 2 things:
* [Ingest data](#reading-data-from-s3) from files stored in S3.
* Write segments to [deep storage](#deep-storage) in S3.
To use this Apache Druid extension, [include](../../development/extensions.md#loading-extensions) `druid-s3-extensions` in the extensions load list.
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-s3-extensions` in the extensions load list.
### Reading data from S3
Use a native batch [Parallel task](../../ingestion/native-batch.md) with an [S3 input source](../../ingestion/native-batch-input-source.md#s3-input-source) to read objects directly from S3.
Use a native batch [Parallel task](../../ingestion/native-batch.md) with an [S3 input source](../../ingestion/input-sources.md#s3-input-source) to read objects directly from S3.
Alternatively, use a [Hadoop task](../../ingestion/hadoop.md),
and specify S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
@ -79,7 +79,7 @@ The configuration options are listed in order of precedence. For example, if yo
For more information, refer to the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
Alternatively, you can bypass this chain by specifying an access key and secret key using a [Properties Object](../../ingestion/native-batch-input-source.md#s3-input-source) inside your ingestion specification.
Alternatively, you can bypass this chain by specifying an access key and secret key using a [Properties Object](../../ingestion/input-sources.md#s3-input-source) inside your ingestion specification.
Use the property [`druid.startup.logging.maskProperties`](../../configuration/index.md#startup-logging) to mask credentials information in Druid logs. For example, `["password", "secretKey", "awsSecretAccessKey"]`.
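For example, the masking shown in the sentence above corresponds to the following entry in `common.runtime.properties`:

```
druid.startup.logging.maskProperties=["password", "secretKey", "awsSecretAccessKey"]
```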

View File

@ -23,7 +23,7 @@ title: "Stats aggregator"
-->
This Apache Druid extension includes stat-related aggregators, including variance and standard deviations, etc. Make sure to [include](../../development/extensions.md#loading-extensions) `druid-stats` in the extensions load list.
This Apache Druid extension includes stat-related aggregators, such as variance and standard deviation. Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-stats` in the extensions load list.
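As a hedged illustration, assuming the Druid SQL layer is enabled and using a hypothetical numeric column `duration` in a datasource named `events`, a query that exercises these aggregators might look like:

```sql
-- Hypothetical datasource and column names.
SELECT
  STDDEV_POP("duration") AS duration_stddev,
  VAR_POP("duration")    AS duration_variance
FROM "events"
```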
## Variance aggregator

View File

@ -1,6 +1,7 @@
---
id: data-formats
title: "Data formats"
title: Source input formats
sidebar_label: Source input formats
---
<!--
@ -27,7 +28,7 @@ We welcome any contributions to new formats.
This page lists all default and core extension data formats supported by Druid.
For additional data formats supported with community extensions,
please see our [community extensions list](../development/extensions.md#community-extensions).
please see our [community extensions list](../configuration/extensions.md#community-extensions).
## Formatting data
@ -690,7 +691,7 @@ and [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.
Consider using the [input format](#input-format) instead for these types of ingestion.
This section lists all default and core extension parsers.
For community extension parsers, please see our [community extensions list](../development/extensions.md#community-extensions).
For community extension parsers, please see our [community extensions list](../configuration/extensions.md#community-extensions).
### String Parser

View File

@ -33,7 +33,7 @@ If the number of ingested events seem correct, make sure your query is correctly
## Where do my Druid segments end up after ingestion?
Depending on what `druid.storage.type` is set to, Druid will upload segments to some [Deep Storage](../dependencies/deep-storage.md). Local disk is used as the default deep storage.
Depending on what `druid.storage.type` is set to, Druid will upload segments to some [Deep Storage](../design/deep-storage.md). Local disk is used as the default deep storage.
## My stream ingest is not handing segments off
@ -51,21 +51,21 @@ Other common reasons that hand-off fails are as follows:
## How do I get HDFS to work?
Make sure to include the `druid-hdfs-storage` and all the hadoop configuration, dependencies (that can be obtained by running command `hadoop classpath` on a machine where hadoop has been setup) in the classpath. And, provide necessary HDFS settings as described in [deep storage](../dependencies/deep-storage.md) .
Make sure to include `druid-hdfs-storage` and all the Hadoop configuration and dependencies (which you can obtain by running the command `hadoop classpath` on a machine where Hadoop has been set up) in the classpath. Also, provide the necessary HDFS settings as described in [deep storage](../design/deep-storage.md).
## How do I know when I can query Druid after submitting a batch ingestion task?
You can verify whether segments created by a recent ingestion task are loaded onto Historicals and available for querying using the following workflow.
1. Submit your ingestion task.
2. Repeatedly poll the [Overlord's tasks API](../operations/api-reference.md#tasks) ( `/druid/indexer/v1/task/{taskId}/status`) until your task is shown to be successfully completed.
3. Poll the [Segment Loading by Datasource API](../operations/api-reference.md#segment-loading-by-datasource) (`/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus`) with
2. Repeatedly poll the [Overlord's tasks API](../api-reference/api-reference.md#tasks) ( `/druid/indexer/v1/task/{taskId}/status`) until your task is shown to be successfully completed.
3. Poll the [Segment Loading by Datasource API](../api-reference/api-reference.md#segment-loading-by-datasource) (`/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus`) with
`forceMetadataRefresh=true` and `interval=<INTERVAL_OF_INGESTED_DATA>` once.
(Note: `forceMetadataRefresh=true` refreshes Coordinator's metadata cache of all datasources. This can be a heavy operation in terms of the load on the metadata store but is necessary to make sure that we verify all the latest segments' load status)
If there are segments not yet loaded, continue to step 4, otherwise you can now query the data.
4. Repeatedly poll the [Segment Loading by Datasource API](../operations/api-reference.md#segment-loading-by-datasource) (`/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus`) with
4. Repeatedly poll the [Segment Loading by Datasource API](../api-reference/api-reference.md#segment-loading-by-datasource) (`/druid/coordinator/v1/datasources/{dataSourceName}/loadstatus`) with
`forceMetadataRefresh=false` and `interval=<INTERVAL_OF_INGESTED_DATA>`.
Continue polling until all segments are loaded. Once all segments are loaded you can now query the data.
Note that this workflow only guarantees that the segments are available at the time of the [Segment Loading by Datasource API](../operations/api-reference.md#segment-loading-by-datasource) call. Segments can still become missing because of historical process failures or any other reasons afterward.
Note that this workflow only guarantees that the segments are available at the time of the [Segment Loading by Datasource API](../api-reference/api-reference.md#segment-loading-by-datasource) call. Segments can still become missing because of historical process failures or any other reasons afterward.
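As a sketch of what this polling can look like with `curl`, assuming the APIs are reached through a Router at `localhost:8888` and using hypothetical task ID, datasource, and interval values:

```bash
# Step 2: poll task status until the task reports success (hypothetical task ID).
curl "http://localhost:8888/druid/indexer/v1/task/index_parallel_wikipedia_abc123/status"

# Step 3: one-time check with a full metadata refresh (hypothetical datasource and interval).
curl "http://localhost:8888/druid/coordinator/v1/datasources/wikipedia/loadstatus?forceMetadataRefresh=true&interval=2016-06-27/2016-06-28"

# Step 4: repeat with forceMetadataRefresh=false until all segments report as loaded.
curl "http://localhost:8888/druid/coordinator/v1/datasources/wikipedia/loadstatus?forceMetadataRefresh=false&interval=2016-06-27/2016-06-28"
```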
## I don't see my Druid segments on my Historical processes

View File

@ -28,7 +28,7 @@ instance of a Druid [Overlord](../design/overlord.md). Please refer to our [Hado
comparisons between Hadoop-based, native batch (simple), and native batch (parallel) ingestion.
To run a Hadoop-based ingestion task, write an ingestion spec as specified below. Then POST it to the
[`/druid/indexer/v1/task`](../operations/api-reference.md#tasks) endpoint on the Overlord, or use the
[`/druid/indexer/v1/task`](../api-reference/api-reference.md#tasks) endpoint on the Overlord, or use the
`bin/post-index-task` script included with Druid.
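For example, a minimal `curl` sketch (assuming an Overlord listening at `localhost:8090` and a hypothetical spec file name) might look like:

```bash
# Submit the Hadoop ingestion spec to the Overlord's task endpoint.
curl -X POST -H 'Content-Type: application/json' \
  -d @my-hadoop-index-spec.json \
  "http://localhost:8090/druid/indexer/v1/task"
```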
## Tutorial

View File

@ -1,6 +1,7 @@
---
id: index
title: "Ingestion"
title: Ingestion overview
sidebar_label: Overview
---
<!--
@ -30,11 +31,11 @@ For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md
[Indexer](../design/indexer.md) processes load your source data. The sole exception is Hadoop-based ingestion, which
uses a Hadoop MapReduce job on YARN.
During ingestion, Druid creates segments and stores them in [deep storage](../dependencies/deep-storage.md). Historical nodes load the segments into memory to respond to queries. For streaming ingestion, the Middle Managers and indexers can respond to queries in real-time with arriving data. See the [Storage design](../design/architecture.md#storage-design) section of the Druid design documentation for more information.
During ingestion, Druid creates segments and stores them in [deep storage](../design/deep-storage.md). Historical nodes load the segments into memory to respond to queries. For streaming ingestion, the Middle Managers and indexers can respond to queries in real-time with arriving data. See the [Storage design](../design/architecture.md#storage-design) section of the Druid design documentation for more information.
This topic introduces streaming and batch ingestion methods. The following topics describe ingestion concepts and information that apply to all [ingestion methods](#ingestion-methods):
- [Druid data model](./data-model.md) introduces concepts of datasources, primary timestamp, dimensions, and metrics.
- [Druid schema model](./schema-model.md) introduces concepts of datasources, primary timestamp, dimensions, and metrics.
- [Data rollup](./rollup.md) describes rollup as a concept and provides suggestions to maximize the benefits of rollup.
- [Partitioning](./partitioning.md) describes time chunk and secondary partitioning in Druid.
- [Ingestion spec reference](./ingestion-spec.md) provides a reference for the configuration options in the ingestion spec.
@ -68,13 +69,13 @@ runs for the duration of the job.
| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based](hadoop.md) |
|---|-----|--------------|------------|
| **Controller task type** | `index_parallel` | `query_controller` | `index_hadoop` |
| **How you submit it** | Send an `index_parallel` spec to the [task API](../operations/api-reference.md#tasks). | Send an [INSERT](../multi-stage-query/concepts.md#insert) or [REPLACE](../multi-stage-query/concepts.md#replace) statement to the [SQL task API](../multi-stage-query/api.md#submit-a-query). | Send an `index_hadoop` spec to the [task API](../operations/api-reference.md#tasks). |
| **How you submit it** | Send an `index_parallel` spec to the [task API](../api-reference/api-reference.md#tasks). | Send an [INSERT](../multi-stage-query/concepts.md#insert) or [REPLACE](../multi-stage-query/concepts.md#replace) statement to the [SQL task API](../api-reference/sql-ingestion-api.md#submit-a-query). | Send an `index_hadoop` spec to the [task API](../api-reference/api-reference.md#tasks). |
| **Parallelism** | Using subtasks, if [`maxNumConcurrentSubTasks`](native-batch.md#tuningconfig) is greater than 1. | Using `query_worker` subtasks. | Using YARN. |
| **Fault tolerance** | Workers automatically relaunched upon failure. Controller task failure leads to job failure. | Controller or worker task failure leads to job failure. | YARN containers automatically relaunched upon failure. Controller task failure leads to job failure. |
| **Can append?** | Yes. | Yes (INSERT). | No. |
| **Can overwrite?** | Yes. | Yes (REPLACE). | Yes. |
| **External dependencies** | None. | None. | Hadoop cluster. |
| **Input sources** | Any [`inputSource`](./native-batch-input-source.md). | Any [`inputSource`](./native-batch-input-source.md) (using [EXTERN](../multi-stage-query/concepts.md#extern)) or Druid datasource (using FROM). | Any Hadoop FileSystem or Druid datasource. |
| **Input sources** | Any [`inputSource`](./input-sources.md). | Any [`inputSource`](./input-sources.md) (using [EXTERN](../multi-stage-query/concepts.md#extern)) or Druid datasource (using FROM). | Any Hadoop FileSystem or Druid datasource. |
| **Input formats** | Any [`inputFormat`](./data-formats.md#input-format). | Any [`inputFormat`](./data-formats.md#input-format). | Any Hadoop InputFormat. |
| **Secondary partitioning options** | Dynamic, hash-based, and range-based partitioning methods are available. See [partitionsSpec](./native-batch.md#partitionsspec) for details.| Range partitioning ([CLUSTERED BY](../multi-stage-query/concepts.md#clustering)). | Hash-based or range-based partitioning via [`partitionsSpec`](hadoop.md#partitionsspec). |
| **[Rollup modes](./rollup.md#perfect-rollup-vs-best-effort-rollup)** | Perfect if `forceGuaranteedRollup` = true in the [`tuningConfig`](native-batch.md#tuningconfig). | Always perfect. | Always perfect. |
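For illustration, the following is a sketch of the kind of SQL-based batch ingestion statement referenced in the table above. The datasource and column names are hypothetical.
```sql
-- Rewrite an existing datasource at hourly granularity, partitioned by day.
REPLACE INTO "wikipedia_rollup"
OVERWRITE ALL
SELECT
  TIME_FLOOR("__time", 'PT1H') AS "__time",
  "channel",
  COUNT(*) AS "cnt"
FROM "wikipedia"
GROUP BY 1, 2
PARTITIONED BY DAY
```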

View File

@ -1,7 +1,7 @@
---
id: ingestion-spec
title: Ingestion spec reference
sidebar_label: Ingestion spec
sidebar_label: Ingestion spec reference
description: Reference for the configuration options in the ingestion spec.
---
@ -157,7 +157,7 @@ The `dataSource` is located in `dataSchema` → `dataSource` and is simply the n
### `timestampSpec`
The `timestampSpec` is located in `dataSchema` → `timestampSpec` and is responsible for
configuring the [primary timestamp](./data-model.md#primary-timestamp). An example `timestampSpec` is:
configuring the [primary timestamp](./schema-model.md#primary-timestamp). An example `timestampSpec` is:
```
"timestampSpec": {
@ -186,7 +186,7 @@ Treat `__time` as a millisecond timestamp: the number of milliseconds since Jan
### `dimensionsSpec`
The `dimensionsSpec` is located in `dataSchema` → `dimensionsSpec` and is responsible for
configuring [dimensions](./data-model.md#dimensions).
configuring [dimensions](./schema-model.md#dimensions). An example `dimensionsSpec` is:
You can either manually specify the dimensions or take advantage of schema auto-discovery, which lets Druid infer all or some of the schema for your data. This means that you don't have to explicitly specify your dimensions and their types.
@ -223,8 +223,8 @@ A `dimensionsSpec` can have the following components:
|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| `dimensions` | A list of [dimension names or objects](#dimension-objects). You cannot include the same column in both `dimensions` and `dimensionExclusions`.<br /><br />If `dimensions` and `spatialDimensions` are both null or empty arrays, Druid treats all columns other than timestamp or metrics that do not appear in `dimensionExclusions` as String-typed dimension columns. See [inclusions and exclusions](#inclusions-and-exclusions) for details.<br /><br />As a best practice, put the most frequently filtered dimensions at the beginning of the dimensions list. In this case, it would also be good to consider [`partitioning`](partitioning.md) by those same dimensions. | `[]` |
| `dimensionExclusions` | The names of dimensions to exclude from ingestion. Only names are supported here, not objects.<br /><br />This list is only used if the `dimensions` and `spatialDimensions` lists are both null or empty arrays; otherwise it is ignored. See [inclusions and exclusions](#inclusions-and-exclusions) below for details. | `[]` |
| `spatialDimensions` | An array of [spatial dimensions](../development/geo.md). | `[]` |
| `includeAllDimensions` | Note that this field only applies to string-based schema discovery where Druid ingests dimensions it discovers as strings. This is different from schema auto-discovery where Druid infers the type for data. You can set `includeAllDimensions` to true to ingest both explicit dimensions in the `dimensions` field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions will appear first in the order that you specify them, and the dimensions dynamically discovered will come after. This flag can be useful especially with auto schema discovery using [`flattenSpec`](./data-formats.html#flattenspec). If this is not set and the `dimensions` field is not empty, Druid will ingest only explicit dimensions. If this is not set and the `dimensions` field is empty, all discovered dimensions will be ingested. | false |
| `spatialDimensions` | An array of [spatial dimensions](../querying/geo.md). | `[]` |
| `includeAllDimensions` | Note that this field only applies to string-based schema discovery where Druid ingests dimensions it discovers as strings. This is different from schema auto-discovery where Druid infers the type for data. You can set `includeAllDimensions` to true to ingest both explicit dimensions in the `dimensions` field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions will appear first in the order that you specify them, and the dimensions dynamically discovered will come after. This flag can be useful especially with auto schema discovery using [`flattenSpec`](./data-formats.md#flattenspec). If this is not set and the `dimensions` field is not empty, Druid will ingest only explicit dimensions. If this is not set and the `dimensions` field is empty, all discovered dimensions will be ingested. | false |
| `useSchemaDiscovery` | Configure Druid to use schema auto-discovery to discover some or all of the dimensions and types for your data. For any dimensions that aren't a uniform type, Druid ingests them as JSON. You can use this for native batch or streaming ingestion. | false |
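For illustration, a minimal `dimensionsSpec` sketch is shown below. The column names are hypothetical; the long-typed dimension uses the object form described in the table above.
```json
"dimensionsSpec": {
  "dimensions": [
    "page",
    "userRegion",
    { "type": "long", "name": "userId" }
  ]
}
```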
@ -297,7 +297,7 @@ the following operations:
3. Specifying which time chunks of segments should be created, for batch ingestion (via `intervals`).
4. Specifying whether ingestion-time [rollup](./rollup.md) should be used or not (via `rollup`).
Other than `rollup`, these operations are all based on the [primary timestamp](./data-model.md#primary-timestamp).
Other than `rollup`, these operations are all based on the [primary timestamp](./schema-model.md#primary-timestamp).
An example `granularitySpec` is:
@ -367,7 +367,7 @@ Druid currently includes one kind of built-in transform, the expression transfor
}
```
The `expression` is a [Druid query expression](../misc/math-expr.md).
The `expression` is a [Druid query expression](../querying/math-expr.md).
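As a sketch, an expression transform might look like the following; the input column `country` and the output name are hypothetical.
```json
"transformSpec": {
  "transforms": [
    { "type": "expression", "name": "countryUpper", "expression": "upper(country)" }
  ]
}
```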
> Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
@ -397,8 +397,8 @@ For details about `inputFormat` and supported `parser` types, see the ["Data for
For details about major components of the `parseSpec`, refer to their subsections:
- [`timestampSpec`](#timestampspec), responsible for configuring the [primary timestamp](./data-model.md#primary-timestamp).
- [`dimensionsSpec`](#dimensionsspec), responsible for configuring [dimensions](./data-model.md#dimensions).
- [`timestampSpec`](#timestampspec), responsible for configuring the [primary timestamp](./schema-model.md#primary-timestamp).
- [`dimensionsSpec`](#dimensionsspec), responsible for configuring [dimensions](./schema-model.md#dimensions).
- [`flattenSpec`](#flattenspec), responsible for flattening nested data formats.
An example `parser` is:

View File

@ -1,7 +1,7 @@
---
id: native-batch-input-sources
title: "Native batch input sources"
sidebar_label: "Native batch: input sources"
id: input-sources
title: "Input sources"
sidebar_label: "Input sources"
---
<!--

View File

@ -1,6 +1,6 @@
---
id: native-batch-firehose
title: "Native batch ingestion with firehose (Deprecated)"
title: "JSON-based batch ingestion with firehose (Deprecated)"
sidebar_label: "Firehose (deprecated)"
---
@ -23,7 +23,7 @@ sidebar_label: "Firehose (deprecated)"
~ under the License.
-->
> Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](./migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
> Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](../operations/migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
There are several firehoses readily available in Druid. Some are meant for examples, while others can be used directly in a production environment.

View File

@ -1,7 +1,7 @@
---
id: native-batch-simple-task
title: "Native batch simple task indexing"
sidebar_label: "Native batch (simple)"
title: "JSON-based batch simple task indexing"
sidebar_label: "JSON-based batch (simple)"
---
<!--

View File

@ -1,7 +1,7 @@
---
id: native-batch
title: "Native batch ingestion"
sidebar_label: "Native batch"
title: JSON-based batch
sidebar_label: JSON-based batch
---
<!--
@ -23,8 +23,7 @@ sidebar_label: "Native batch"
~ under the License.
-->
> This page describes native batch ingestion using [ingestion specs](ingestion-spec.md). Refer to the [ingestion
> methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
> This page describes JSON-based batch ingestion using [ingestion specs](ingestion-spec.md). For SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) extension, see [SQL-based ingestion](../multi-stage-query/index.md). Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
Apache Druid supports the following types of native batch indexing tasks:
- Parallel task indexing (`index_parallel`), which can run multiple indexing tasks concurrently. Parallel tasks work well for production ingestion.
@ -35,14 +34,14 @@ This topic covers the configuration for `index_parallel` ingestion specs.
For related information on batch indexing, see:
- [Batch ingestion method comparison table](./index.md#batch) for a comparison of batch ingestion methods.
- [Tutorial: Loading a file](../tutorials/tutorial-batch.md) for a tutorial on native batch ingestion.
- [Input sources](./native-batch-input-source.md) for possible input sources.
- [Input formats](./data-formats.md#input-format) for possible input formats.
- [Input sources](./input-sources.md) for possible input sources.
- [Source input formats](./data-formats.md#input-format) for possible input formats.
## Submit an indexing task
To run either kind of native batch indexing task, you can:
- Use the **Load Data** UI in the web console to define and submit an ingestion spec.
- Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Indexer API endpoint](../operations/api-reference.md#tasks),
- Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Indexer API endpoint](../api-reference/api-reference.md#tasks),
`/druid/indexer/v1/task`, on the Overlord service. Alternatively, you can use the indexing script included with Druid at `bin/post-index-task` (see the example spec sketch below).
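The following is a minimal sketch of an `index_parallel` spec, not a complete reference. The datasource name, file paths, and filter are assumptions.
```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "example_datasource",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": { "dimensions": [] },
      "granularitySpec": { "segmentGranularity": "day", "queryGranularity": "none", "rollup": false }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/tmp/data", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": { "type": "index_parallel", "maxNumConcurrentSubTasks": 2 }
  }
}
```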
## Parallel task indexing
@ -196,7 +195,7 @@ The following table defines the primary sections of the input spec:
|type|The task type. For parallel task indexing, set the value to `index_parallel`.|yes|
|id|The task ID. If omitted, Druid generates the task ID using the task type, data source name, interval, and date-time stamp. |no|
|spec|The ingestion spec that defines the [data schema](#dataschema), [IO config](#ioconfig), and [tuning config](#tuningconfig).|yes|
|context|Context to specify various task configuration parameters. See [Task context parameters](tasks.md#context-parameters) for more details.|no|
|context|Context to specify various task configuration parameters. See [Task context parameters](../ingestion/tasks.md#context-parameters) for more details.|no|
### `dataSchema`
@ -263,7 +262,7 @@ The size-based split hint spec affects all splittable input sources except for t
#### Segments Split Hint Spec
The segments split hint spec is used only for [`DruidInputSource`](./native-batch-input-source.md).
The segments split hint spec is used only for [`DruidInputSource`](./input-sources.md).
|property|description|default|required?|
|--------|-----------|-------|---------|
@ -707,17 +706,17 @@ by assigning more task slots to them.
Use the `inputSource` object to define the location from which your indexing task reads data. Only the native parallel task and simple task support the input source.
For details on available input sources, see:
- [S3 input source](./native-batch-input-source.md#s3-input-source) (`s3`) reads data from AWS S3 storage.
- [Google Cloud Storage input source](./native-batch-input-source.md#google-cloud-storage-input-source) (`gs`) reads data from Google Cloud Storage.
- [Azure input source](./native-batch-input-source.md#azure-input-source) (`azure`) reads data from Azure Blob Storage and Azure Data Lake.
- [HDFS input source](./native-batch-input-source.md#hdfs-input-source) (`hdfs`) reads data from HDFS storage.
- [HTTP input Source](./native-batch-input-source.md#http-input-source) (`http`) reads data from HTTP servers.
- [Inline input Source](./native-batch-input-source.md#inline-input-source) reads data you paste into the web console.
- [Local input Source](./native-batch-input-source.md#local-input-source) (`local`) reads data from local storage.
- [Druid input Source](./native-batch-input-source.md#druid-input-source) (`druid`) reads data from a Druid datasource.
- [SQL input Source](./native-batch-input-source.md#sql-input-source) (`sql`) reads data from a RDBMS source.
- [S3 input source](./input-sources.md#s3-input-source) (`s3`) reads data from AWS S3 storage.
- [Google Cloud Storage input source](./input-sources.md#google-cloud-storage-input-source) (`gs`) reads data from Google Cloud Storage.
- [Azure input source](./input-sources.md#azure-input-source) (`azure`) reads data from Azure Blob Storage and Azure Data Lake.
- [HDFS input source](./input-sources.md#hdfs-input-source) (`hdfs`) reads data from HDFS storage.
- [HTTP input source](./input-sources.md#http-input-source) (`http`) reads data from HTTP servers.
- [Inline input source](./input-sources.md#inline-input-source) reads data you paste into the web console.
- [Local input source](./input-sources.md#local-input-source) (`local`) reads data from local storage.
- [Druid input source](./input-sources.md#druid-input-source) (`druid`) reads data from a Druid datasource.
- [SQL input source](./input-sources.md#sql-input-source) (`sql`) reads data from an RDBMS source.
For information on how to combine input sources, see [Combining input source](./native-batch-input-source.md#combining-input-source).
For information on how to combine input sources, see [Combining input source](./input-sources.md#combining-input-source).
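For example, an `ioConfig` using the HTTP input source from the list above might look like the following sketch; the URI is hypothetical.
```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "http",
    "uris": ["https://example.com/data/events-2016-06-27.json.gz"]
  },
  "inputFormat": { "type": "json" }
}
```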
### `segmentWriteOutMediumFactory`

View File

@ -1,7 +1,7 @@
---
id: rollup
title: "Data rollup"
sidebar_label: Data rollup
sidebar_label: Rollup
description: Introduces rollup as a concept. Provides suggestions to maximize the benefits of rollup. Differentiates between perfect and best-effort rollup.
---
@ -26,7 +26,7 @@ description: Introduces rollup as a concept. Provides suggestions to maximize th
Druid can roll up data at ingestion time to reduce the amount of raw data to store on disk. Rollup is a form of summarization or pre-aggregation. Rolling up data can dramatically reduce the size of data to be stored and reduce row counts by potentially orders of magnitude. As a trade-off for the efficiency of rollup, you lose the ability to query individual events.
At ingestion time, you control rollup with the `rollup` setting in the [`granularitySpec`](./ingestion-spec.md#granularityspec). Rollup is enabled by default. This means Druid combines into a single row any rows that have identical [dimension](./data-model.md#dimensions) values and [timestamp](./data-model.md#primary-timestamp) values after [`queryGranularity`-based truncation](./ingestion-spec.md#granularityspec).
At ingestion time, you control rollup with the `rollup` setting in the [`granularitySpec`](./ingestion-spec.md#granularityspec). Rollup is enabled by default. This means Druid combines into a single row any rows that have identical [dimension](./schema-model.md#dimensions) values and [timestamp](./schema-model.md#primary-timestamp) values after [`queryGranularity`-based truncation](./ingestion-spec.md#granularityspec).
When you disable rollup, Druid loads each row as-is without doing any form of pre-aggregation. This mode is similar to databases that do not support a rollup feature. Set `rollup` to `false` if you want Druid to store each record as-is, without any rollup summarization.
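For illustration, a `granularitySpec` that enables rollup with hourly query granularity might look like the following sketch. With these settings, Druid combines rows whose dimension values match after their timestamps are truncated to the hour.
```json
"granularitySpec": {
  "segmentGranularity": "day",
  "queryGranularity": "hour",
  "rollup": true
}
```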

View File

@ -24,17 +24,17 @@ title: "Schema design tips"
## Druid's data model
For general information, check out the documentation on [Druid's data model](./data-model.md) on the main
For general information, check out the documentation on the [Druid schema model](./schema-model.md) on the main
ingestion overview page. The rest of this page discusses tips for users coming from other kinds of systems, as well as
general tips and common practices.
* Druid data is stored in [datasources](./data-model.md), which are similar to tables in a traditional RDBMS.
* Druid data is stored in [datasources](./schema-model.md), which are similar to tables in a traditional RDBMS.
* Druid datasources can be ingested with or without [rollup](./rollup.md). With rollup enabled, Druid partially aggregates your data during ingestion, potentially reducing its row count, decreasing storage footprint, and improving query performance. With rollup disabled, Druid stores one row for each row in your input data, without any pre-aggregation.
* Every row in Druid must have a timestamp. Data is always partitioned by time, and every query has a time filter. Query results can also be broken down by time buckets like minutes, hours, days, and so on.
* All columns in Druid datasources, other than the timestamp column, are either dimensions or metrics. This follows the [standard naming convention](https://en.wikipedia.org/wiki/Online_analytical_processing#Overview_of_OLAP_systems) of OLAP data.
* Typical production datasources have tens to hundreds of columns.
* [Dimension columns](./data-model.md#dimensions) are stored as-is, so they can be filtered on, grouped by, or aggregated at query time. They are always single Strings, [arrays of Strings](../querying/multi-value-dimensions.md), single Longs, single Doubles or single Floats.
* [Metric columns](./data-model.md#metrics) are stored [pre-aggregated](../querying/aggregations.md), so they can only be aggregated at query time (not filtered or grouped by). They are often stored as numbers (integers or floats) but can also be stored as complex objects like [HyperLogLog sketches or approximate quantile sketches](../querying/aggregations.md#approximate-aggregations). Metrics can be configured at ingestion time even when rollup is disabled, but are most useful when rollup is enabled.
* [Dimension columns](./schema-model.md#dimensions) are stored as-is, so they can be filtered on, grouped by, or aggregated at query time. They are always single Strings, [arrays of Strings](../querying/multi-value-dimensions.md), single Longs, single Doubles or single Floats.
* [Metric columns](./schema-model.md#metrics) are stored [pre-aggregated](../querying/aggregations.md), so they can only be aggregated at query time (not filtered or grouped by). They are often stored as numbers (integers or floats) but can also be stored as complex objects like [HyperLogLog sketches or approximate quantile sketches](../querying/aggregations.md#approximate-aggregations). Metrics can be configured at ingestion time even when rollup is disabled, but are most useful when rollup is enabled.
## If you're coming from a
@ -188,11 +188,11 @@ Druid is able to rapidly identify and retrieve data corresponding to time ranges
If your data has more than one timestamp, you can ingest the others as secondary timestamps. The best way to do this
is to ingest them as [long-typed dimensions](./ingestion-spec.md#dimensionsspec) in milliseconds format.
If necessary, you can get them into this format using a [`transformSpec`](./ingestion-spec.md#transformspec) and
[expressions](../misc/math-expr.md) like `timestamp_parse`, which returns millisecond timestamps.
[expressions](../querying/math-expr.md) like `timestamp_parse`, which returns millisecond timestamps.
At query time, you can query secondary timestamps with [SQL time functions](../querying/sql-scalar.md#date-and-time-functions)
like `MILLIS_TO_TIMESTAMP`, `TIME_FLOOR`, and others. If you're using native Druid queries, you can use
[expressions](../misc/math-expr.md).
[expressions](../querying/math-expr.md).
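As a sketch, assuming a hypothetical input column named `updated_at`, a secondary timestamp could be ingested as a long-typed dimension as follows. Both snippets sit inside the `dataSchema`.
```json
"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "updatedAtMillis",
      "expression": "timestamp_parse(\"updated_at\")"
    }
  ]
},
"dimensionsSpec": {
  "dimensions": [
    { "type": "long", "name": "updatedAtMillis" }
  ]
}
```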
### Nested dimensions

View File

@ -1,7 +1,7 @@
---
id: data-model
title: "Druid data model"
sidebar_label: Data model
id: schema-model
title: Druid schema model
sidebar_label: Schema model
description: Introduces concepts of datasources, primary timestamp, dimensions, and metrics.
---

View File

@ -1,6 +1,7 @@
---
id: tasks
title: "Task reference"
title: Task reference
sidebar_label: Task reference
---
<!--
@ -25,7 +26,7 @@ title: "Task reference"
Tasks do all [ingestion](index.md)-related work in Druid.
For batch ingestion, you will generally submit tasks directly to Druid using the
[Task APIs](../operations/api-reference.md#tasks). For streaming ingestion, tasks are generally submitted for you by a
[Task APIs](../api-reference/api-reference.md#tasks). For streaming ingestion, tasks are generally submitted for you by a
supervisor.
## Task API
@ -33,7 +34,7 @@ supervisor.
Task APIs are available in two main places:
- The [Overlord](../design/overlord.md) process offers HTTP APIs to submit tasks, cancel tasks, check their status,
review logs and reports, and more. Refer to the [Tasks API reference page](../operations/api-reference.md#tasks) for a
review logs and reports, and more. Refer to the [Tasks API reference page](../api-reference/api-reference.md#tasks) for a
full list.
- Druid SQL includes a [`sys.tasks`](../querying/sql-metadata-tables.md#tasks-table) table that provides information about currently
running tasks. This table is read-only, and has a limited (but useful!) subset of the full information available through
@ -45,7 +46,7 @@ the Overlord APIs.
A report containing information about the number of rows ingested and any parse exceptions that occurred is available for both completed and running tasks.
The reporting feature is supported by [native batch tasks](../ingestion/native-batch.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks.
The reporting feature is supported by [native batch tasks](native-batch.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks.
### Completion report
@ -176,7 +177,7 @@ the `rowStats` map contains information about row counts. There is one entry for
- `processed`: Number of rows successfully ingested without parsing errors
- `processedBytes`: Total number of uncompressed bytes processed by the task. This includes the byte size of all rows, even those counted in `processedWithError`, `unparseable`, or `thrownAway`.
- `processedWithError`: Number of rows that were ingested, but contained a parsing error within one or more columns. This typically occurs where input rows have a parseable structure but invalid types for columns, such as passing in a non-numeric String value for a numeric column.
- `thrownAway`: Number of rows skipped. This includes rows with timestamps that were outside of the ingestion task's defined time interval and rows that were filtered out with a [`transformSpec`](./ingestion-spec.md#transformspec), but doesn't include the rows skipped by explicit user configurations. For example, the rows skipped by `skipHeaderRows` or `hasHeaderRow` in the CSV format are not counted.
- `thrownAway`: Number of rows skipped. This includes rows with timestamps that were outside of the ingestion task's defined time interval and rows that were filtered out with a [`transformSpec`](ingestion-spec.md#transformspec), but doesn't include the rows skipped by explicit user configurations. For example, the rows skipped by `skipHeaderRows` or `hasHeaderRow` in the CSV format are not counted.
- `unparseable`: Number of rows that could not be parsed at all and were discarded. This tracks input rows without a parseable structure, such as passing in non-JSON data when using a JSON parser.
The `errorMsg` field shows a message describing the error that caused a task to fail. It will be null if the task was successful.
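For example, a completion report can be fetched from the Overlord with a request like the following sketch; the host, port, and task ID are assumptions.
```bash
curl http://localhost:8090/druid/indexer/v1/task/<task-id>/reports
```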
@ -185,7 +186,7 @@ The `errorMsg` field shows a message describing the error that caused a task to
### Row stats
The [native batch task](./native-batch.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks support retrieval of row stats while the task is running.
The [native batch task](native-batch.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks support retrieval of row stats while the task is running.
The live report can be accessed with a GET to the following URL on a Peon running a task:
@ -249,7 +250,7 @@ http://<middlemanager-host>:<worker-port>/druid/worker/v1/chat/<task-id>/unparse
```
Note that this functionality is not supported by all task types. Currently, it is only supported by the
non-parallel [native batch task](../ingestion/native-batch.md) (type `index`) and the tasks created by the Kafka
non-parallel [native batch task](native-batch.md) (type `index`) and the tasks created by the Kafka
and Kinesis indexing services.
<a name="locks"></a>
@ -404,7 +405,7 @@ The task then starts creating logs in a local directory of the middle manager (o
When the task completes, whether it succeeds or fails, the middle manager (or indexer) pushes the task log file to the location specified in [`druid.indexer.logs`](../configuration/index.md#task-logging).
Task logs on the Druid web console are retrieved via an [API](../operations/api-reference.md#overlord) on the Overlord. It automatically detects where the log file is, either in the middleManager / indexer or in long-term storage, and passes it back.
Task logs on the Druid web console are retrieved via an [API](../api-reference/api-reference.md#overlord) on the Overlord. It automatically detects where the log file is, either in the middleManager / indexer or in long-term storage, and passes it back.
If you don't see the log file in long-term storage, it means either:

View File

@ -38,7 +38,7 @@ and at least one worker task. As an experimental feature, the MSQ task engine al
batch tasks. The behavior and result format of plain SELECT (without INSERT or REPLACE) is subject to change.
You can execute SQL statements using the MSQ task engine through the **Query** view in the [web
console](../operations/web-console.md) or through the [`/druid/v2/sql/task` API](api.md).
console](../operations/web-console.md) or through the [`/druid/v2/sql/task` API](../api-reference/sql-ingestion-api.md).
For more details on how SQL queries are executed using the MSQ task engine, see [multi-stage query
tasks](#multi-stage-query-tasks).
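As a sketch, submitting an ingestion query to the SQL task API might look like the following; the Router address and the table names in the query are assumptions.
```bash
curl -X POST -H 'Content-Type: application/json' \
  -d '{"query": "INSERT INTO target_table SELECT * FROM source_table PARTITIONED BY DAY"}' \
  http://localhost:8888/druid/v2/sql/task
```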
@ -52,7 +52,7 @@ To support ingestion, additional SQL functionality is available through the MSQ
### Read external data with `EXTERN`
Query tasks can access external data through the `EXTERN` function, using any native batch [input
source](../ingestion/native-batch-input-source.md) and [input format](../ingestion/data-formats.md#input-format).
source](../ingestion/input-sources.md) and [input format](../ingestion/data-formats.md#input-format).
`EXTERN` can read multiple files in parallel across different worker tasks. However, `EXTERN` does not split individual
files across multiple worker tasks. If you have a small number of very large input files, you can increase query
@ -126,7 +126,7 @@ The `__time` column is used for [partitioning by time](#partitioning-by-time). I
column in your `INSERT` statement. However, Druid still creates a `__time` column in your Druid table and sets all
timestamps to 1970-01-01 00:00:00.
For more information, see [Primary timestamp](../ingestion/data-model.md#primary-timestamp).
For more information, see [Primary timestamp](../ingestion/schema-model.md#primary-timestamp).
<a name="partitioning"></a>
@ -215,7 +215,7 @@ For an example, see [INSERT with rollup example](examples.md#insert-with-rollup)
### Execution flow
When you execute a SQL statement using the task endpoint [`/druid/v2/sql/task`](api.md#submit-a-query), the following
When you execute a SQL statement using the task endpoint [`/druid/v2/sql/task`](../api-reference/sql-ingestion-api.md#submit-a-query), the following
happens:
1. The Broker plans your SQL query into a native query, as usual.

View File

@ -1,7 +1,7 @@
---
id: index
title: SQL-based ingestion
sidebar_label: Overview
sidebar_label: SQL-based ingestion
description: Introduces multi-stage query architecture and its task engine
---
@ -62,7 +62,7 @@ transformation: creating new tables based on queries of other tables.
To add the extension to an existing cluster, add `druid-multi-stage-query` to `druid.extensions.loadList` in your
`common.runtime.properties` file.
For more information about how to load an extension, see [Loading extensions](../development/extensions.md#loading-extensions).
For more information about how to load an extension, see [Loading extensions](../configuration/extensions.md#loading-extensions).
To use [EXTERN](reference.md#extern-function), you need READ permission on the resource named "EXTERNAL" of the resource type
"EXTERNAL". If you encounter a 403 error when trying to use `EXTERN`, verify that you have the correct permissions.

View File

@ -61,7 +61,7 @@ FROM TABLE(
`EXTERN` consists of the following parts:
1. Any [Druid input source](../ingestion/native-batch-input-source.md) as a JSON-encoded string.
1. Any [Druid input source](../ingestion/input-sources.md) as a JSON-encoded string.
2. Any [Druid input format](../ingestion/data-formats.md) as a JSON-encoded string.
3. A row signature, as a JSON-encoded array of column descriptors. Each column descriptor must have a
`name` and a `type`. The type can be `string`, `long`, `double`, or `float`. This row signature is

View File

@ -43,8 +43,7 @@ To submit a query:
Once a query is submitted, it executes as a [`query_controller`](concepts.md#execution-flow) task. Query tasks that
users submit to the MSQ task engine are Overlord tasks, so they follow the Overlord's security model. This means that
users with access to the Overlord API can perform some actions even if they didn't submit the query, including
retrieving status or canceling a query. For more information about the Overlord API and the task API, see [APIs for
SQL-based ingestion](./api.md).
retrieving status or canceling a query. For more information about the Overlord API and the task API, see [APIs for SQL-based ingestion](../api-reference/sql-ingestion-api.md).
To interact with a query through the Overlord API, users need the following permissions:

View File

@ -24,7 +24,7 @@ description: "Defines a strategy to maintain Druid metadata store performance by
~ under the License.
-->
Apache Druid relies on [metadata storage](../dependencies/metadata-storage.md) to track information on data storage, operations, and system configuration.
Apache Druid relies on [metadata storage](../design/metadata-storage.md) to track information on data storage, operations, and system configuration.
The metadata store includes the following:
- Segment records
@ -230,5 +230,5 @@ druid.coordinator.kill.datasource.durationToRetain=P4D
## Learn more
See the following topics for more information:
- [Metadata management](../configuration/index.md#metadata-management) for metadata store configuration reference.
- [Metadata storage](../dependencies/metadata-storage.md) for an overview of the metadata storage database.
- [Metadata storage](../design/metadata-storage.md) for an overview of the metadata storage database.

View File

@ -39,7 +39,7 @@ If you wish to jump straight to deploying Druid as a cluster, or if you have an
The [configuration reference](../configuration/index.md) describes all of Druid's configuration properties.
The [API reference](../operations/api-reference.md) describes the APIs available on each Druid process.
The [API reference](../api-reference/api-reference.md) describes the APIs available on each Druid process.
The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) is an introductory guide for tuning your Druid cluster.

View File

@ -1,6 +1,6 @@
---
id: migrate-from-firehose
title: "Migrate from firehose to input source ingestion"
title: "Migrate from firehose to input source ingestion (legacy)"
sidebar_label: "Migrate from firehose"
---
@ -43,11 +43,11 @@ If you're unable to use the console or you have problems with the console method
### Update your ingestion spec manually
To update your ingestion spec manually, copy your existing spec into a new file. Refer to [Native batch ingestion with firehose (Deprecated)](./native-batch-firehose.md) for a description of firehose properties.
To update your ingestion spec manually, copy your existing spec into a new file. Refer to [Native batch ingestion with firehose (Deprecated)](../ingestion/native-batch-firehose.md) for a description of firehose properties.
Edit the new file as follows:
1. In the `ioConfig` component, replace the `firehose` definition with an `inputSource` definition for your chosen input source. See [Native batch input sources](./native-batch-input-source.md) for details.
1. In the `ioConfig` component, replace the `firehose` definition with an `inputSource` definition for your chosen input source. See [Native batch input sources](../ingestion/input-sources.md) for details.
2. Move the `timeStampSpec` definition from `parser.parseSpec` to the `dataSchema` component.
3. Move the `dimensionsSpec` definition from `parser.parseSpec` to the `dataSchema` component.
4. Move the `format` definition from `parser.parseSpec` to an `inputFormat` definition in `ioConfig`.
@ -204,6 +204,6 @@ The following example illustrates the result of migrating the [example firehose
For more information, see the following pages:
- [Ingestion](./index.md): Overview of the Druid ingestion process.
- [Native batch ingestion](./native-batch.md): Description of the supported native batch indexing tasks.
- [Ingestion spec reference](./ingestion-spec.md): Description of the components and properties in the ingestion spec.
- [Ingestion](../ingestion/index.md): Overview of the Druid ingestion process.
- [Native batch ingestion](../ingestion/native-batch.md): Description of the supported native batch indexing tasks.
- [Ingestion spec reference](../ingestion/ingestion-spec.md): Description of the components and properties in the ingestion spec.

View File

@ -136,4 +136,4 @@ java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools pull-deps --de
> Please note to use the pull-deps tool you must know the Maven groupId, artifactId, and version of your extension.
>
> For Druid community extensions listed [here](../development/extensions.md), the groupId is "org.apache.druid.extensions.contrib" and the artifactId is the name of the extension.
> For Druid community extensions listed [here](../configuration/extensions.md), the groupId is "org.apache.druid.extensions.contrib" and the artifactId is the name of the extension.
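As a sketch, a pull-deps invocation for a community extension might look like the following; the extension coordinate and version are hypothetical.
```bash
java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools pull-deps \
  -c "org.apache.druid.extensions.contrib:druid-example-extension:25.0.0"
```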

Some files were not shown because too many files have changed in this diff.