diff --git a/.travis.yml b/.travis.yml index 6c597cd3e71..894fd3f7b77 100644 --- a/.travis.yml +++ b/.travis.yml @@ -394,7 +394,7 @@ jobs: - name: "docs" stage: Tests - phase 1 - install: ./check_test_suite.py && travis_terminate 0 || (cd website && npm install) + install: ./check_test_suite.py && travis_terminate 0 || (cd website && nvm install 12.22.12 && npm install) script: |- (cd website && npm run lint && npm run spellcheck) || { echo " diff --git a/docs/configuration/index.md b/docs/configuration/index.md index c30952080de..2e11eefe129 100644 --- a/docs/configuration/index.md +++ b/docs/configuration/index.md @@ -895,7 +895,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti The Coordinator has dynamic configuration to change certain behavior on the fly. -It is recommended that you use the [web console](../operations/druid-console.md) to configure these parameters. +It is recommended that you use the [web console](../operations/web-console.md) to configure these parameters. However, if you need to do it via HTTP, the JSON object can be submitted to the Coordinator via a POST request at: ``` @@ -983,7 +983,7 @@ These configuration options control the behavior of the Lookup dynamic configura ##### Automatic compaction dynamic configuration -You can set or update [automatic compaction](../ingestion/automatic-compaction.md) properties dynamically using the +You can set or update [automatic compaction](../data-management/automatic-compaction.md) properties dynamically using the [Coordinator API](../operations/api-reference.md#automatic-compaction-configuration) without restarting Coordinators. For details about segment compaction, see [Segment size optimization](../operations/segment-optimization.md). @@ -995,7 +995,7 @@ You can configure automatic compaction through the following properties: |`dataSource`|dataSource name to be compacted.|yes| |`taskPriority`|[Priority](../ingestion/tasks.md#priority) of compaction task.|no (default = 25)| |`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per compaction task. Since a time chunk must be processed in its entirety, if the segments for a particular time chunk have a total size in bytes greater than this parameter, compaction will not run for that time chunk. Because each compaction task runs with a single thread, setting this value too far above 1–2GB will result in compaction tasks taking an excessive amount of time.|no (default = 100,000,000,000,000 i.e. 100TB)| -|`skipOffsetFromLatest`|The offset for searching segments to be compacted in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly recommended to set for realtime dataSources. See [Data handling with compaction](../ingestion/compaction.md#data-handling-with-compaction).|no (default = "P1D")| +|`skipOffsetFromLatest`|The offset for searching segments to be compacted in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly recommended to set for realtime dataSources. See [Data handling with compaction](../data-management/compaction.md#data-handling-with-compaction).|no (default = "P1D")| |`tuningConfig`|Tuning config for compaction tasks. See below [Automatic compaction tuningConfig](#automatic-compaction-tuningconfig).|no| |`taskContext`|[Task context](../ingestion/tasks.md#context) for compaction tasks.|no| |`granularitySpec`|Custom `granularitySpec`. 
See [Automatic compaction granularitySpec](#automatic-compaction-granularityspec).|No| @@ -1020,7 +1020,7 @@ You may see this issue with streaming ingestion from Kafka and Kinesis, which in To mitigate this problem, set `skipOffsetFromLatest` to a value large enough so that arriving data tends to fall outside the offset value from the current time. This way you can avoid conflicts between compaction tasks and realtime ingestion tasks. For example, if you want to skip over segments from thirty days prior to the end time of the most recent segment, assign `"skipOffsetFromLatest": "P30D"`. -For more information, see [Avoid conflicts with ingestion](../ingestion/automatic-compaction.md#avoid-conflicts-with-ingestion). +For more information, see [Avoid conflicts with ingestion](../data-management/automatic-compaction.md#avoid-conflicts-with-ingestion). ###### Automatic compaction tuningConfig @@ -1038,7 +1038,7 @@ The below is a list of the supported configurations for auto-compaction. |`indexSpecForIntermediatePersists`|Defines segment storage format options to be used at indexing time for intermediate persisted temporary segments. this can be used to disable dimension/metric compression on intermediate segments to reduce memory required for final merging. however, disabling compression on intermediate segments might increase page cache use while they are used before getting merged into final segment published, see [IndexSpec](../ingestion/ingestion-spec.md#indexspec) for possible values.|no| |`maxPendingPersists`|Maximum number of persists that can be pending but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with `maxRowsInMemory` * (2 + `maxPendingPersists`).|no (default = 0, meaning one persist can be running concurrently with ingestion, and none can be queued up)| |`pushTimeout`|Milliseconds to wait for pushing segments. It must be >= 0, where 0 means to wait forever.|no (default = 0)| -|`segmentWriteOutMediumFactory`|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](../ingestion/native-batch-simple-task.md#segmentwriteoutmediumfactory).|no (default is the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` is used)| +|`segmentWriteOutMediumFactory`|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](../ingestion/native-batch.md#segmentwriteoutmediumfactory).|no (default is the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` is used)| |`maxNumConcurrentSubTasks`|Maximum number of worker tasks which can be run in parallel at the same time. The supervisor task would spawn worker tasks up to `maxNumConcurrentSubTasks` regardless of the current available task slots. If this value is set to 1, the supervisor task processes data ingestion on its own instead of spawning worker tasks. If this value is set to too large, too many worker tasks can be created which might block other ingestion. Check [Capacity Planning](../ingestion/native-batch.md#capacity-planning) for more details.|no (default = 1)| |`maxRetry`|Maximum number of retries on task failures.|no (default = 3)| |`maxNumSegmentsToMerge`|Max limit for the number of segments that a single task can merge at the same time in the second phase. Used only with `hashed` or `single_dim` partitionsSpec.|no (default = 100)| @@ -1801,7 +1801,7 @@ Druid uses Jetty to serve HTTP requests. 
Each query being processed consumes a s |`druid.server.http.enableRequestLimit`|If enabled, no requests would be queued in jetty queue and "HTTP 429 Too Many Requests" error response would be sent. |false| |`druid.server.http.defaultQueryTimeout`|Query timeout in millis, beyond which unfinished queries will be cancelled|300000| |`druid.server.http.maxScatterGatherBytes`|Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. Queries that exceed this limit will fail. This is an advance configuration that allows to protect in case Broker is under heavy load and not utilizing the data gathered in memory fast enough and leading to OOMs. This limit can be further reduced at query time using `maxScatterGatherBytes` in the context. Note that having large limit is not necessarily bad if broker is never under heavy concurrent load in which case data gathered is processed quickly and freeing up the memory used. Human-readable format is supported, see [here](human-readable-byte.md). |Long.MAX_VALUE| -|`druid.server.http.maxSubqueryRows`|Maximum number of rows from all subqueries per query. Druid stores the subquery rows in temporary tables that live in the Java heap. `druid.server.http.maxSubqueryRows` is a guardrail to prevent the system from exhausting available heap. When a subquery exceeds the row limit, Druid throws a resource limit exceeded exception: "Subquery generated results beyond maximum."

It is a good practice to avoid large subqueries in Druid. However, if you choose to raise the subquery row limit, you must also increase the heap size of all Brokers, Historicals, and task Peons that process data for the subqueries to accommodate the subquery results.

There is no formula to calculate the correct value. Trial and error is the best approach.|100000| +|`druid.server.http.maxSubqueryRows`|Maximum number of rows from all subqueries per query. Druid stores the subquery rows in temporary tables that live in the Java heap. `druid.server.http.maxSubqueryRows` is a guardrail to prevent the system from exhausting available heap. When a subquery exceeds the row limit, Druid throws a resource limit exceeded exception: "Subquery generated results beyond maximum."

It is a good practice to avoid large subqueries in Druid. However, if you choose to raise the subquery row limit, you must also increase the heap size of all Brokers, Historicals, and task Peons that process data for the subqueries to accommodate the subquery results.

There is no formula to calculate the correct value. Trial and error is the best approach.|100000| |`druid.server.http.gracefulShutdownTimeout`|The maximum amount of time Jetty waits after receiving shutdown signal. After this timeout the threads will be forcefully shutdown. This allows any queries that are executing to complete(Only values greater than zero are valid).|`PT30S`| |`druid.server.http.unannouncePropagationDelay`|How long to wait for ZooKeeper unannouncements to propagate before shutting down Jetty. This is a minimum and `druid.server.http.gracefulShutdownTimeout` does not start counting down until after this period elapses.|`PT0S` (do not wait)| |`druid.server.http.maxQueryTimeout`|Maximum allowed value (in milliseconds) for `timeout` parameter. See [query-context](../querying/query-context.md) to know more about `timeout`. Query is rejected if the query context `timeout` is greater than this value. |Long.MAX_VALUE| @@ -1901,7 +1901,7 @@ The Druid SQL server is configured through the following properties on the Broke |`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false| |`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|true| |`druid.sql.planner.maxNumericInFilters`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [OR](../querying/filters.md#or) of [Bound filter](../querying/filters.md#bound-filter). By default, Druid does not restrict the amount of numeric Bound Filters on String columns, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`. If this value is disabled, `maxNumericInFilters` set through query context is ignored.|`-1` (disabled)| -|`druid.sql.approxCountDistinct.function`|Implementation to use for the [`APPROX_COUNT_DISTINCT` function](../querying/sql-aggregations.md). Without extensions loaded, the only valid value is `APPROX_COUNT_DISTINCT_BUILTIN` (a HyperLogLog, or HLL, based implementation). If the [DataSketches extension](../development/extensions-core/datasketches-extension.md) is loaded, this can also be `APPROX_COUNT_DISTINCT_DS_HLL` (alternative HLL implementation) or `APPROX_COUNT_DISTINCT_DS_THETA`.

Theta sketches use significantly more memory than HLL sketches, so you should prefer one of the two HLL implementations.|APPROX_COUNT_DISTINCT_BUILTIN| +|`druid.sql.approxCountDistinct.function`|Implementation to use for the [`APPROX_COUNT_DISTINCT` function](../querying/sql-aggregations.md). Without extensions loaded, the only valid value is `APPROX_COUNT_DISTINCT_BUILTIN` (a HyperLogLog, or HLL, based implementation). If the [DataSketches extension](../development/extensions-core/datasketches-extension.md) is loaded, this can also be `APPROX_COUNT_DISTINCT_DS_HLL` (alternative HLL implementation) or `APPROX_COUNT_DISTINCT_DS_THETA`.

Theta sketches use significantly more memory than HLL sketches, so you should prefer one of the two HLL implementations.|APPROX_COUNT_DISTINCT_BUILTIN| > Previous versions of Druid had properties named `druid.sql.planner.maxQueryCount` and `druid.sql.planner.maxSemiJoinRowsInMemory`. > These properties are no longer available. Since Druid 0.18.0, you can use `druid.server.http.maxSubqueryRows` to control the maximum diff --git a/docs/ingestion/automatic-compaction.md b/docs/data-management/automatic-compaction.md similarity index 94% rename from docs/ingestion/automatic-compaction.md rename to docs/data-management/automatic-compaction.md index ec2b2f5d62f..5ae63a3a4cb 100644 --- a/docs/ingestion/automatic-compaction.md +++ b/docs/data-management/automatic-compaction.md @@ -39,12 +39,12 @@ This topic guides you through setting up automatic compaction for your Druid clu ## Enable automatic compaction -You can enable automatic compaction for a datasource using the Druid console or programmatically via an API. -This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the Druid console](../operations/druid-console.md) or the [Tasks API](../operations/api-reference.md#post-5). +You can enable automatic compaction for a datasource using the web console or programmatically via an API. +This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../operations/api-reference.md#post-5). -### Druid console +### web console -Use the Druid console to enable automatic compaction for a datasource as follows. +Use the web console to enable automatic compaction for a datasource as follows. 1. Click **Datasources** in the top-level navigation. 2. In the **Compaction** column, click the edit icon for the datasource to compact. @@ -142,13 +142,13 @@ druid.coordinator.compaction.period=PT60S After the Coordinator has initiated auto-compaction, you can view compaction statistics for the datasource, including the number of bytes, segments, and intervals already compacted and those awaiting compaction. The Coordinator also reports the total bytes, segments, and intervals not eligible for compaction in accordance with its [segment search policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction). -In the Druid console, the Datasources view displays auto-compaction statistics. The Tasks view shows the task information for compaction tasks that were triggered by the automatic compaction system. +In the web console, the Datasources view displays auto-compaction statistics. The Tasks view shows the task information for compaction tasks that were triggered by the automatic compaction system. To get statistics by API, send a [`GET` request](../operations/api-reference.md#get-10) to `/druid/coordinator/v1/compaction/status`. To filter the results to a particular datasource, pass the datasource name as a query parameter to the request—for example, `/druid/coordinator/v1/compaction/status?dataSource=wikipedia`. ## Examples -The following examples demonstrate potential use cases in which auto-compaction may improve your Druid performance. See more details in [Compaction strategies](../ingestion/compaction.md#compaction-strategies). The examples in this section do not change the underlying data. +The following examples demonstrate potential use cases in which auto-compaction may improve your Druid performance. 
See more details in [Compaction strategies](../data-management/compaction.md#compaction-strategies). The examples in this section do not change the underlying data. ### Change segment granularity diff --git a/docs/ingestion/compaction.md b/docs/data-management/compaction.md similarity index 90% rename from docs/ingestion/compaction.md rename to docs/data-management/compaction.md index dbd71ac7b6a..130cdedb4f9 100644 --- a/docs/ingestion/compaction.md +++ b/docs/data-management/compaction.md @@ -29,7 +29,7 @@ Query performance in Apache Druid depends on optimally sized segments. Compactio There are several cases to consider compaction for segment optimization: - With streaming ingestion, data can arrive out of chronological order creating many small segments. -- If you append data using `appendToExisting` for [native batch](native-batch.md) ingestion creating suboptimal segments. +- If you append data using `appendToExisting` for [native batch](../ingestion/native-batch.md) ingestion creating suboptimal segments. - When you use `index_parallel` for parallel batch indexing and the parallel ingestion tasks create many small segments. - When a misconfigured ingestion task creates oversized segments. @@ -39,7 +39,7 @@ By default, compaction does not modify the underlying data of the segments. Howe - If you don't need fine-grained granularity for older data, you can use compaction to change older segments to a coarser query granularity. For example, from `minute` to `hour` or `hour` to `day`. This reduces the storage space required for older data. - You can change the dimension order to improve sorting and reduce segment size. - You can remove unused columns in compaction or implement an aggregation metric for older data. -- You can change segment rollup from dynamic partitioning with best-effort rollup to hash or range partitioning with perfect rollup. For more information on rollup, see [perfect vs best-effort rollup](./rollup.md#perfect-rollup-vs-best-effort-rollup). +- You can change segment rollup from dynamic partitioning with best-effort rollup to hash or range partitioning with perfect rollup. For more information on rollup, see [perfect vs best-effort rollup](../ingestion/rollup.md#perfect-rollup-vs-best-effort-rollup). Compaction does not improve performance in all situations. For example, if you rewrite your data with each ingestion task, you don't need to use compaction. See [Segment optimization](../operations/segment-optimization.md) for additional guidance to determine if compaction will help in your environment. @@ -47,7 +47,7 @@ Compaction does not improve performance in all situations. For example, if you r You can configure the Druid Coordinator to perform automatic compaction, also called auto-compaction, for a datasource. Using its [segment search policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction), the Coordinator periodically identifies segments for compaction starting from newest to oldest. When the Coordinator discovers segments that have not been compacted or segments that were compacted with a different or changed spec, it submits compaction tasks for the time interval covering those segments. -Automatic compaction works in most use cases and should be your first option. To learn more, see [Automatic compaction](../ingestion/automatic-compaction.md). +Automatic compaction works in most use cases and should be your first option. To learn more, see [Automatic compaction](../data-management/automatic-compaction.md). 
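For illustration, the per-datasource configuration that drives auto-compaction is a small JSON object built from the automatic compaction dynamic configuration properties described earlier and submitted through the Coordinator's automatic compaction configuration API. A minimal sketch, with a placeholder datasource name and placeholder values, might look like this:

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  },
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```

A generous `skipOffsetFromLatest` keeps auto-compaction away from the time chunks that realtime ingestion is still writing, which reduces the chance of lock conflicts between compaction tasks and realtime tasks.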
In cases where you require more control over compaction, you can manually submit compaction tasks. For example: @@ -60,12 +60,12 @@ See [Setting up a manual compaction task](#setting-up-manual-compaction) for mor During compaction, Druid overwrites the original set of segments with the compacted set. Druid also locks the segments for the time interval being compacted to ensure data consistency. By default, compaction tasks do not modify the underlying data. You can configure the compaction task to change the query granularity or add or remove dimensions in the compaction task. This means that the only changes to query results should be the result of intentional, not automatic, changes. -You can set `dropExisting` in `ioConfig` to "true" in the compaction task to configure Druid to replace all existing segments fully contained by the interval. See the suggestion for reindexing with finer granularity under [Implementation considerations](native-batch.md#implementation-considerations) for an example. +You can set `dropExisting` in `ioConfig` to "true" in the compaction task to configure Druid to replace all existing segments fully contained by the interval. See the suggestion for reindexing with finer granularity under [Implementation considerations](../ingestion/native-batch.md#implementation-considerations) for an example. > WARNING: `dropExisting` in `ioConfig` is a beta feature. If an ingestion task needs to write data to a segment for a time interval locked for compaction, by default the ingestion task supersedes the compaction task and the compaction task fails without finishing. For manual compaction tasks, you can adjust the input spec interval to avoid conflicts between ingestion and compaction. For automatic compaction, you can set the `skipOffsetFromLatest` key to adjust the auto-compaction starting point from the current time to reduce the chance of conflicts between ingestion and compaction. Another option is to set the compaction task to higher priority than the ingestion task. -For more information, see [Avoid conflicts with ingestion](../ingestion/automatic-compaction.md#avoid-conflicts-with-ingestion). +For more information, see [Avoid conflicts with ingestion](../data-management/automatic-compaction.md#avoid-conflicts-with-ingestion). ### Segment granularity handling @@ -126,21 +126,21 @@ To perform a manual compaction, you submit a compaction task. Compaction tasks m |`transformSpec`|When set, the compaction task uses the specified `transformSpec` rather than using `null`. See [Compaction transformSpec](#compaction-transform-spec) for details.|No| |`metricsSpec`|When set, the compaction task uses the specified `metricsSpec` rather than generating one.|No| |`segmentGranularity`|Deprecated. Use `granularitySpec`.|No| -|`tuningConfig`|[Tuning configuration](native-batch.md#tuningconfig) for parallel indexing. `awaitSegmentAvailabilityTimeoutMillis` value is not supported for compaction tasks. Leave this parameter at the default value, 0.|No| +|`tuningConfig`|[Tuning configuration](../ingestion/native-batch.md#tuningconfig) for parallel indexing. `awaitSegmentAvailabilityTimeoutMillis` value is not supported for compaction tasks. Leave this parameter at the default value, 0.|No| |`granularitySpec`|When set, the compaction task uses the specified `granularitySpec` rather than generating one. 
See [Compaction `granularitySpec`](#compaction-granularity-spec) for details.|No| -|`context`|[Task context](./tasks.md#context)|No| +|`context`|[Task context](../ingestion/tasks.md#context)|No| > Note: Use `granularitySpec` over `segmentGranularity` and only set one of these values. If you specify different values for these in the same compaction spec, the task fails. -To control the number of result segments per time chunk, you can set [`maxRowsPerSegment`](./native-batch.md#partitionsspec) or [`numShards`](../ingestion/native-batch.md#tuningconfig). +To control the number of result segments per time chunk, you can set [`maxRowsPerSegment`](../ingestion/native-batch.md#partitionsspec) or [`numShards`](../ingestion/../ingestion/native-batch.md#tuningconfig). > You can run multiple compaction tasks in parallel. For example, if you want to compact the data for a year, you are not limited to running a single task for the entire year. You can run 12 compaction tasks with month-long intervals. -A compaction task internally generates an `index` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [DruidInputSource](./native-batch-input-source.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default. +A compaction task internally generates an `index` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [`druid` input source](../ingestion/native-batch-input-source.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default. Compaction tasks exit without doing anything and issue a failure status code in either of the following cases: -- If the interval you specify has no data segments loaded
+- If the interval you specify has no data segments loaded
- If the interval you specify is empty. Note that the metadata between input segments and the resulting compacted segments may differ if the metadata among the input segments differs as well. If all input segments have the same metadata, however, the resulting output segment will have the same metadata as all input segments. @@ -179,7 +179,7 @@ The compaction `ioConfig` requires specifying `inputSpec` as follows: |-----|-----------|-------|--------| |`type`|Task type. Set the value to `compact`.|none|Yes| |`inputSpec`|Specification of the target [intervals](#interval-inputspec) or [segments](#segments-inputspec).|none|Yes| -|`dropExisting`|If `true`, the task replaces all existing segments fully contained by either of the following:
- the `interval` in the `interval` type `inputSpec`.
- the umbrella interval of the `segments` in the `segment` type `inputSpec`.
If compaction fails, Druid does not change any of the existing segments.
**WARNING**: `dropExisting` in `ioConfig` is a beta feature. |false|No| +|`dropExisting`|If `true`, the task replaces all existing segments fully contained by either of the following:
- the `interval` in the `interval` type `inputSpec`.
- the umbrella interval of the `segments` in the `segment` type `inputSpec`.
If compaction fails, Druid does not change any of the existing segments.
**WARNING**: `dropExisting` in `ioConfig` is a beta feature. |false|No| Druid supports two supported `inputSpec` formats: @@ -223,5 +223,5 @@ Druid supports two supported `inputSpec` formats: See the following topics for more information: - [Segment optimization](../operations/segment-optimization.md) for guidance to determine if compaction will help in your case. -- [Automatic compaction](../ingestion/automatic-compaction.md) for how to enable and configure automatic compaction. +- [Automatic compaction](automatic-compaction.md) for how to enable and configure automatic compaction. diff --git a/docs/data-management/delete.md b/docs/data-management/delete.md new file mode 100644 index 00000000000..361c7873cc5 --- /dev/null +++ b/docs/data-management/delete.md @@ -0,0 +1,103 @@ +--- +id: delete +title: "Data deletion" +--- + + + +## By time range, manually + +Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) and supports +deleting data for time chunks by dropping segments. This is a fast, metadata-only operation. + +Deletion by time range happens in two steps: + +1. Segments to be deleted must first be marked as ["unused"](../design/architecture.md#segment-lifecycle). This can + happen when a segment is dropped by a [drop rule](../operations/rule-configuration.md) or when you manually mark a + segment unused through the Coordinator API or web console. This is a soft delete: the data is not available for + querying, but the segment files remains in deep storage, and the segment records remains in the metadata store. +2. Once a segment is marked "unused", you can use a [`kill` task](#kill-task) to permanently delete the segment file from + deep storage and remove its record from the metadata store. This is a hard delete: the data is unrecoverable unless + you have a backup. + +For documentation on disabling segments using the Coordinator API, see the +[Coordinator API reference](../operations/api-reference.md#coordinator-datasources). + +A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md). + +## By time range, automatically + +Druid supports [load and drop rules](../operations/rule-configuration.md), which are used to define intervals of time +where data should be preserved, and intervals where data should be discarded. Data that falls under a drop rule is +marked unused, in the same manner as if you [manually mark that time range unused](#by-time-range-manually). This is a +fast, metadata-only operation. + +Data that is dropped in this way is marked unused, but remains in deep storage. To permanently delete it, use a +[`kill` task](#kill-task). + +## Specific records + +Druid supports deleting specific records using [reindexing](update.md#reindex) with a filter. The filter specifies which +data remains after reindexing, so it must be the inverse of the data you want to delete. Because segments must be +rewritten to delete data in this way, it can be a time-consuming operation. + +For example, to delete records where `userName` is `'bob'` with native batch indexing, use a +[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) with filter `{"type": "not", "field": {"type": +"selector", "dimension": "userName", "value": "bob"}}`. + +To delete the same records using SQL, use [REPLACE](../multi-stage-query/concepts.md#replace) with `WHERE userName <> 'bob'`. 
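For illustration only, the filter above sits inside the reindexing spec's `transformSpec`; this is a minimal sketch with the surrounding `dataSchema` and `ioConfig` omitted:

```json
{
  "transformSpec": {
    "filter": {
      "type": "not",
      "field": {
        "type": "selector",
        "dimension": "userName",
        "value": "bob"
      }
    }
  }
}
```

Because the filter describes the rows to keep, every row except those where `userName` is `'bob'` is written back, which effectively deletes the matching records.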
+ +To reindex using [native batch](../ingestion/native-batch.md), use the [`druid` input +source](../ingestion/native-batch-input-source.md#druid-input-source). If needed, +[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the +reindexing job. To reindex with SQL, use [`REPLACE OVERWRITE`](../multi-stage-query/reference.md#replace) +with `SELECT ... FROM
`. (Druid does not have `UPDATE` or `ALTER TABLE` statements.) Any SQL SELECT query can be +used to filter, modify, or enrich the data during the reindexing job. + +Data that is deleted in this way is marked unused, but remains in deep storage. To permanently delete it, use a [`kill` +task](#kill-task). + +## Entire table + +Deleting an entire table works the same way as [deleting part of a table by time range](#by-time-range-manually). First, +mark all segments unused using the Coordinator API or web console. Then, optionally, delete it permanently using a +[`kill` task](#kill-task). + + + +## Permanently (`kill` task) + +Data that has been overwritten or soft-deleted still remains as segments that have been marked unused. You can use a +`kill` task to permanently delete this data. + +The available grammar is: + +```json +{ + "type": "kill", + "id": , + "dataSource": , + "interval" : , + "context": +} +``` + +**WARNING:** The `kill` task permanently removes all information about the affected segments from the metadata store and +deep storage. This operation cannot be undone. diff --git a/docs/tutorials/tutorial-msq-connect-extern.md b/docs/data-management/index.md similarity index 51% rename from docs/tutorials/tutorial-msq-connect-extern.md rename to docs/data-management/index.md index 379601b90e2..410cb0c3fde 100644 --- a/docs/tutorials/tutorial-msq-connect-extern.md +++ b/docs/data-management/index.md @@ -1,7 +1,7 @@ --- -id: tutorial-msq-external-data -title: "Loading files with SQL" -sidebar_label: "Loading files with SQL" +id: index +title: "Data management" +sidebar_label: "Overview" --- - - - - - - - - About the Druid documentation - - - If you are not redirected automatically, follow this - link. - - \ No newline at end of file +Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) in immutable +files called [segments](../design/segments.md). Data management operations involving replacing, or deleting, +these segments include: + +- [Updates](update.md) to existing data. +- [Deletion](delete.md) of existing data. +- [Schema changes](schema-changes.md) for new and existing data. +- [Compaction](compaction.md) and [automatic compaction](automatic-compaction.md), which reindex existing data to + optimize storage footprint and performance. diff --git a/docs/data-management/schema-changes.md b/docs/data-management/schema-changes.md new file mode 100644 index 00000000000..2dc535e3bb1 --- /dev/null +++ b/docs/data-management/schema-changes.md @@ -0,0 +1,39 @@ +--- +id: schema-changes +title: "Schema changes" +--- + + + + +## For new data + +Apache Druid allows you to provide a new schema for new data without the need to update the schema of any existing data. +It is sufficient to update your supervisor spec, if using [streaming ingestion](../ingestion/index.md#streaming), or to +provide the new schema the next time you do a [batch ingestion](../ingestion/index.md#batch). This is made possible by +the fact that each [segment](../design/architecture.md#datasources-and-segments), at the time it is created, stores a +copy of its own schema. Druid reconciles all of these individual segment schemas automatically at query time. + +## For existing data + +Schema changes are sometimes necessary for existing data. For example, you may want to change the type of a column in +previously-ingested data, or drop a column entirely. Druid handles this using [reindexing](update.md), the same method +it uses to handle updates of existing data. 
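As a sketch of the idea, with hypothetical column names, a reindexing job that changes `userId` from a string to a long and drops an unwanted column simply declares the desired schema in its `dimensionsSpec`, listing only the columns to keep:

```json
{
  "dimensionsSpec": {
    "dimensions": [
      { "type": "long", "name": "userId" },
      { "type": "string", "name": "country" }
    ]
  }
}
```

Columns omitted from an explicit list like this are not carried into the rewritten segments.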
Reindexing involves rewriting all affected segments and can be a +time-consuming operation. diff --git a/docs/data-management/update.md b/docs/data-management/update.md new file mode 100644 index 00000000000..4eb31f8242c --- /dev/null +++ b/docs/data-management/update.md @@ -0,0 +1,76 @@ +--- +id: update +title: "Data updates" +--- + + + +## Overwrite + +Apache Druid stores data [partitioned by time chunk](../design/architecture.md#datasources-and-segments) and supports +overwriting existing data using time ranges. Data outside the replacement time range is not touched. Overwriting of +existing data is done using the same mechanisms as [batch ingestion](../ingestion/index.md#batch). + +For example: + +- [Native batch](../ingestion/native-batch.md) with `appendToExisting: false`, and `intervals` set to a specific + time range, overwrites data for that time range. +- [SQL `REPLACE
OVERWRITE [ALL | WHERE ...]`](../multi-stage-query/reference.md#replace) overwrites data for + the entire table or for a specified time range. + +In both cases, Druid's atomic update mechanism ensures that queries will flip seamlessly from the old data to the new +data on a time-chunk-by-time-chunk basis. + +Ingestion and overwriting cannot run concurrently for the same time range of the same datasource. While an overwrite job +is ongoing for a particular time range of a datasource, new ingestions for that time range are queued up. Ingestions for +other time ranges proceed as normal. Read-only queries also proceed as normal, using the pre-existing version of the +data. + +Druid does not support single-record updates by primary key. + +## Reindex + +Reindexing is an [overwrite of existing data](#overwrite) where the source of new data is the existing data itself. It +is used to perform schema changes, repartition data, filter out unwanted data, enrich existing data, and so on. This +behaves just like any other [overwrite](#overwrite) with regard to atomic updates and locking. + +With [native batch](../ingestion/native-batch.md), use the [`druid` input +source](../ingestion/native-batch-input-source.md#druid-input-source). If needed, +[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the +reindexing job. + +With SQL, use [`REPLACE
<table> OVERWRITE`](../multi-stage-query/reference.md#replace) with `SELECT ... FROM +<table>
`. (Druid does not have `UPDATE` or `ALTER TABLE` statements.) Any SQL SELECT query can be used to filter, +modify, or enrich the data during the reindexing job. + +## Rolled-up datasources + +Rolled-up datasources can be effectively updated using appends, without rewrites. When you append a row that has an +identical set of dimensions to an existing row, queries that use aggregation operators automatically combine those two +rows together at query time. + +[Compaction](compaction.md) or [automatic compaction](automatic-compaction.md) can be used to physically combine these +matching rows together later on, by rewriting segments in the background. + +## Lookups + +If you have a dimension where values need to be updated frequently, try first using [lookups](../querying/lookups.md). A +classic use case of lookups is when you have an ID dimension stored in a Druid segment, and want to map the ID dimension to a +human-readable string that may need to be updated periodically. diff --git a/docs/design/architecture.md b/docs/design/architecture.md index cc6cf4c1a22..21f69663d24 100644 --- a/docs/design/architecture.md +++ b/docs/design/architecture.md @@ -44,9 +44,9 @@ Druid has several types of services: * [**Historical**](../design/historical.md) services store queryable data. * [**MiddleManager**](../design/middlemanager.md) services ingest data. -You can view services in the **Services** tab in the Druid console: +You can view services in the **Services** tab in the web console: -![Druid services](../assets/services-overview.png "Services in the Druid console") +![Druid services](../assets/services-overview.png "Services in the web console") ## Druid servers diff --git a/docs/design/coordinator.md b/docs/design/coordinator.md index c2e2f26f7ee..52f5f159e48 100644 --- a/docs/design/coordinator.md +++ b/docs/design/coordinator.md @@ -83,7 +83,7 @@ To ensure an even distribution of segments across Historical processes in the cl ### Automatic compaction -The Druid Coordinator manages the [automatic compaction system](../ingestion/automatic-compaction.md). +The Druid Coordinator manages the [automatic compaction system](../data-management/automatic-compaction.md). Each run, the Coordinator compacts segments by merging small segments or splitting a large one. This is useful when the size of your segments is not optimized which may degrade query performance. See [Segment size optimization](../operations/segment-optimization.md) for details. @@ -139,7 +139,7 @@ The search start point can be changed by setting `skipOffsetFromLatest`. If this is set, this policy will ignore the segments falling into the time chunk of (the end time of the most recent segment - `skipOffsetFromLatest`). This is to avoid conflicts between compaction tasks and realtime tasks. Note that realtime tasks have a higher priority than compaction tasks by default. Realtime tasks will revoke the locks of compaction tasks if their intervals overlap, resulting in the termination of the compaction task. -For more information, see [Avoid conflicts with ingestion](../ingestion/automatic-compaction.md#avoid-conflicts-with-ingestion). +For more information, see [Avoid conflicts with ingestion](../data-management/automatic-compaction.md#avoid-conflicts-with-ingestion). > This policy currently cannot handle the situation when there are a lot of small segments which have the same interval, > and their total size exceeds [`inputSegmentSizeBytes`](../configuration/index.md#automatic-compaction-dynamic-configuration). 
diff --git a/docs/design/indexer.md b/docs/design/indexer.md index 802882f5d2c..fa42912e760 100644 --- a/docs/design/indexer.md +++ b/docs/design/indexer.md @@ -91,6 +91,4 @@ Separate task logs are not currently supported when using the Indexer; all task The Indexer currently imposes an identical memory limit on each task. In later releases, the per-task memory limit will be removed and only the global limit will apply. The limit on concurrent merges will also be removed. -The Indexer does not work properly with [`index_realtime`](../ingestion/tasks.md#index_realtime) task types. Therefore, it is not compatible with [Tranquility](../ingestion/tranquility.md). If you are using Tranquility, consider migrating to Druid's builtin [Apache Kafka](../development/extensions-core/kafka-ingestion.md) or [Amazon Kinesis](../development/extensions-core/kinesis-ingestion.md) ingestion options. - In later releases, per-task memory usage will be dynamically managed. Please see https://github.com/apache/druid/issues/7900 for details on future enhancements to the Indexer. diff --git a/docs/design/processes.md b/docs/design/processes.md index 46f3748c461..a2c314ff49c 100644 --- a/docs/design/processes.md +++ b/docs/design/processes.md @@ -78,7 +78,7 @@ caller. End users typically query Brokers rather than querying Historicals or Mi Overlords, and Coordinators. They are optional since you can also simply contact the Druid Brokers, Overlords, and Coordinators directly. -The Router also runs the [Druid console](../operations/druid-console.md), a management UI for datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console. +The Router also runs the [web console](../operations/web-console.md), a management UI for datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console. ### Data server diff --git a/docs/design/router.md b/docs/design/router.md index d493a9a76a7..582e424e6d4 100644 --- a/docs/design/router.md +++ b/docs/design/router.md @@ -26,7 +26,7 @@ The Apache Druid Router process can be used to route queries to different Broker For query routing purposes, you should only ever need the Router process if you have a Druid cluster well into the terabyte range. -In addition to query routing, the Router also runs the [Druid console](../operations/druid-console.md), a management UI for datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console. +In addition to query routing, the Router also runs the [web console](../operations/web-console.md), a management UI for datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console. ### Configuration diff --git a/docs/design/segments.md b/docs/design/segments.md index 048cf396960..5dbc8ba97b3 100644 --- a/docs/design/segments.md +++ b/docs/design/segments.md @@ -23,7 +23,7 @@ title: "Segments" --> -Apache Druid stores its data and indexes in *segment files* partitioned by time. Druid creates a segment for each segment interval that contains data. If an interval is empty—that is, containing no rows—no segment exists for that time interval. 
Druid may create multiple segments for the same interval if you ingest data for that period via different ingestion jobs. [Compaction](../ingestion/compaction.md) is the Druid process that attempts to combine these segments into a single segment per interval for optimal performance. +Apache Druid stores its data and indexes in *segment files* partitioned by time. Druid creates a segment for each segment interval that contains data. If an interval is empty—that is, containing no rows—no segment exists for that time interval. Druid may create multiple segments for the same interval if you ingest data for that period via different ingestion jobs. [Compaction](../data-management/compaction.md) is the Druid process that attempts to combine these segments into a single segment per interval for optimal performance. The time interval is configurable in the `segmentGranularity` parameter of the [`granularitySpec`](../ingestion/ingestion-spec.md#granularityspec). diff --git a/docs/development/extensions-contrib/kafka-emitter.md b/docs/development/extensions-contrib/kafka-emitter.md index a6dcabffc71..85b8f10a7e1 100644 --- a/docs/development/extensions-contrib/kafka-emitter.md +++ b/docs/development/extensions-contrib/kafka-emitter.md @@ -27,7 +27,7 @@ To use this Apache Druid extension, [include](../../development/extensions.md#lo ## Introduction -This extension emits Druid metrics to [Apache Kafka](https://kafka.apache.org) directly with JSON format.
+This extension emits Druid metrics to [Apache Kafka](https://kafka.apache.org) directly with JSON format.
Currently, Kafka has not only their nice ecosystem but also consumer API readily available. So, If you currently use Kafka, It's easy to integrate various tool or UI to monitor the status of your Druid cluster with this extension. diff --git a/docs/development/extensions-core/druid-pac4j.md b/docs/development/extensions-core/druid-pac4j.md index d06daaaf681..4ee7f38e2ec 100644 --- a/docs/development/extensions-core/druid-pac4j.md +++ b/docs/development/extensions-core/druid-pac4j.md @@ -25,7 +25,7 @@ title: "Druid pac4j based Security extension" Apache Druid Extension to enable [OpenID Connect](https://openid.net/connect/) based Authentication for Druid Processes using [pac4j](https://github.com/pac4j/pac4j) as the underlying client library. This can be used with any authentication server that supports same e.g. [Okta](https://developer.okta.com/). -This extension should only be used at the router node to enable a group of users in existing authentication server to interact with Druid cluster, using the [Druid console](../../operations/druid-console.md). This extension does not support JDBC client authentication. +This extension should only be used at the router node to enable a group of users in existing authentication server to interact with Druid cluster, using the [web console](../../operations/web-console.md). This extension does not support JDBC client authentication. ## Configuration diff --git a/docs/development/extensions-core/kafka-ingestion.md b/docs/development/extensions-core/kafka-ingestion.md index f6f1b00a79b..58636e7f16f 100644 --- a/docs/development/extensions-core/kafka-ingestion.md +++ b/docs/development/extensions-core/kafka-ingestion.md @@ -254,7 +254,7 @@ development wiki-edit 1636399229823 For more information, see [`kafka` data format](../../ingestion/data-formats.md#kafka). ## Submit a supervisor spec -Druid starts a supervisor for a dataSource when you submit a supervisor spec. You can use the data loader in the Druid console or you can submit a supervisor spec to the following endpoint: +Druid starts a supervisor for a dataSource when you submit a supervisor spec. You can use the data loader in the web console or you can submit a supervisor spec to the following endpoint: `http://:/druid/indexer/v1/supervisor` diff --git a/docs/development/extensions-core/kafka-supervisor-reference.md b/docs/development/extensions-core/kafka-supervisor-reference.md index 1e8d1b40cae..01a61fe403d 100644 --- a/docs/development/extensions-core/kafka-supervisor-reference.md +++ b/docs/development/extensions-core/kafka-supervisor-reference.md @@ -151,7 +151,7 @@ export SSL_TRUSTSTORE_PASSWORD=mysecrettruststorepassword } } ``` -Verify that you've changed the values for all configurations to match your own environment. You can use the environment variable config provider syntax in the **Consumer properties** field on the **Connect tab** in the **Load Data** UI in the Druid console. When connecting to Kafka, Druid replaces the environment variables with their corresponding values. +Verify that you've changed the values for all configurations to match your own environment. You can use the environment variable config provider syntax in the **Consumer properties** field on the **Connect tab** in the **Load Data** UI in the web console. When connecting to Kafka, Druid replaces the environment variables with their corresponding values. 
Note: You can provide SSL connections with [Password Provider](../../operations/password-provider.md) interface to define the `keystore`, `truststore`, and `key`, but this feature is deprecated. diff --git a/docs/development/modules.md b/docs/development/modules.md index bf2f1783143..a0d2335194d 100644 --- a/docs/development/modules.md +++ b/docs/development/modules.md @@ -144,7 +144,7 @@ T00:00:00.000Z/2015-04-14T02:41:09.484Z/0/index.zip] to [/opt/druid/zk_druid/dde * DataSegmentKiller -The easiest way of testing the segment killing is marking a segment as not used and then starting a killing task in the [web console](../operations/druid-console.md). +The easiest way of testing the segment killing is marking a segment as not used and then starting a killing task in the [web console](../operations/web-console.md). To mark a segment as not used, you need to connect to your metadata storage and update the `used` column to `false` on the segment table rows. diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md index ee2b8226714..780f6bbf2ff 100644 --- a/docs/ingestion/data-formats.md +++ b/docs/ingestion/data-formats.md @@ -592,7 +592,7 @@ Configure your `flattenSpec` as follows: | Field | Description | Default | |-------|-------------|---------| -| useFieldDiscovery | If true, interpret all root-level fields as available fields for usage by [`timestampSpec`](./ingestion-spec.md#timestampspec), [`transformSpec`](./ingestion-spec.md#transformspec), [`dimensionsSpec`](./ingestion-spec.md#dimensionsspec), and [`metricsSpec`](./ingestion-spec.md#metricsspec).

If false, only explicitly specified fields (see `fields`) will be available for use. | `true` | +| useFieldDiscovery | If true, interpret all root-level fields as available fields for usage by [`timestampSpec`](./ingestion-spec.md#timestampspec), [`transformSpec`](./ingestion-spec.md#transformspec), [`dimensionsSpec`](./ingestion-spec.md#dimensionsspec), and [`metricsSpec`](./ingestion-spec.md#metricsspec).

If false, only explicitly specified fields (see `fields`) will be available for use. | `true` | | fields | Specifies the fields of interest and how they are accessed. See [Field flattening specifications](#field-flattening-specifications) for more detail. | `[]` | For example: @@ -616,7 +616,7 @@ Each entry in the `fields` list can have the following components: | Field | Description | Default | |-------|-------------|---------| -| type | Options are as follows:

| none (required) | +| type | Options are as follows:

| none (required) | | name | Name of the field after flattening. This name can be referred to by the [`timestampSpec`](./ingestion-spec.md#timestampspec), [`transformSpec`](./ingestion-spec.md#transformspec), [`dimensionsSpec`](./ingestion-spec.md#dimensionsspec), and [`metricsSpec`](./ingestion-spec.md#metricsspec).| none (required) | | expr | Expression for accessing the field while flattening. For type `path`, this should be [JsonPath](https://github.com/jayway/JsonPath). For type `jq`, this should be [jackson-jq](https://github.com/eiiches/jackson-jq) notation. For other types, this parameter is ignored. | none (required for types `path` and `jq`) | diff --git a/docs/ingestion/data-management.md b/docs/ingestion/data-management.md deleted file mode 100644 index 3326525c22d..00000000000 --- a/docs/ingestion/data-management.md +++ /dev/null @@ -1,130 +0,0 @@ ---- -id: data-management -title: "Data management" ---- - - -Within the context of this topic data management refers to Apache Druid's data maintenance capabilities for existing datasources. There are several options to help you keep your data relevant and to help your Druid cluster remain performant. For example updating, reingesting, adding lookups, reindexing, or deleting data. - -In addition to the tasks covered on this page, you can also use segment compaction to improve the layout of your existing data. Refer to [Segment optimization](../operations/segment-optimization.md) to see if compaction will help in your environment. For an overview and steps to configure manual compaction tasks, see [Compaction](./compaction.md). - -## Adding new data to existing datasources - -Druid can insert new data to an existing datasource by appending new segments to existing segment sets. It can also add new data by merging an existing set of segments with new data and overwriting the original set. - -Druid does not support single-record updates by primary key. - - - -## Updating existing data - -Once you ingest some data in a dataSource for an interval and create Apache Druid segments, you might want to make changes to -the ingested data. There are several ways this can be done. - -### Using lookups - -If you have a dimension where values need to be updated frequently, try first using [lookups](../querying/lookups.md). A -classic use case of lookups is when you have an ID dimension stored in a Druid segment, and want to map the ID dimension to a -human-readable String value that may need to be updated periodically. - -### Reingesting data - -If lookup-based techniques are not sufficient, you will need to reingest data into Druid for the time chunks that you -want to update. This can be done using one of the [batch ingestion methods](index.md#batch) in overwrite mode (the -default mode). It can also be done using [streaming ingestion](index.md#streaming), provided you drop data for the -relevant time chunks first. - -If you do the reingestion in batch mode, Druid's atomic update mechanism means that queries will flip seamlessly from -the old data to the new data. - -We recommend keeping a copy of your raw data around in case you ever need to reingest it. - -### With Hadoop-based ingestion - -This section assumes you understand how to do batch ingestion using Hadoop. See -[Hadoop batch ingestion](./hadoop.md) for more information. Hadoop batch-ingestion can be used for reindexing and delta ingestion. - -Druid uses an `inputSpec` in the `ioConfig` to know where the data to be ingested is located and how to read it. 
-For simple Hadoop batch ingestion, `static` or `granularity` spec types allow you to read data stored in deep storage. - -There are other types of `inputSpec` to enable reindexing and delta ingestion. - -### Reindexing with Native Batch Ingestion - -This section assumes you understand how to do batch ingestion without Hadoop using [native batch indexing](../ingestion/native-batch.md). Native batch indexing uses an `inputSource` to know where and how to read the input data. You can use the [`DruidInputSource`](./native-batch-input-source.md) to read data from segments inside Druid. You can use Parallel task (`index_parallel`) for all native batch reindexing tasks. Increase the `maxNumConcurrentSubTasks` to accommodate the amount of data your are reindexing. See [Capacity planning](native-batch.md#capacity-planning). - - - -## Deleting data - -Druid supports permanent deletion of segments that are in an "unused" state (see the -[Segment lifecycle](../design/architecture.md#segment-lifecycle) section of the Architecture page). - -The Kill Task deletes unused segments within a specified interval from metadata storage and deep storage. - -For more information, please see [Kill Task](../ingestion/tasks.md#kill). - -Permanent deletion of a segment in Apache Druid has two steps: - -1. The segment must first be marked as "unused". This occurs when a segment is dropped by retention rules, and when a user manually disables a segment through the Coordinator API. -2. After segments have been marked as "unused", a Kill Task will delete any "unused" segments from Druid's metadata store as well as deep storage. - -For documentation on retention rules, please see [Data Retention](../operations/rule-configuration.md). - -For documentation on disabling segments using the Coordinator API, please see the -[Coordinator Datasources API](../operations/api-reference.md#coordinator-datasources) reference. - -A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/tutorial-delete-data.md) - -## Kill Task - -The kill task deletes all information about segments and removes them from deep storage. Segments to kill must be unused (used==0) in the Druid segment table. - -The available grammar is: - -```json -{ - "type": "kill", - "id": , - "dataSource": , - "interval" : , - "markAsUnused": , - "context": -} -``` - -If `markAsUnused` is true (default is false), the kill task will first mark any segments within the specified interval as unused, before deleting the unused segments within the interval. - -**WARNING!** The kill task permanently removes all information about the affected segments from the metadata store and deep storage. These segments cannot be recovered after the kill task runs, this operation cannot be undone. - -## Retention - -Druid supports retention rules, which are used to define intervals of time where data should be preserved, and intervals where data should be discarded. - -Druid also supports separating Historical processes into tiers, and the retention rules can be configured to assign data for specific intervals to specific tiers. - -These features are useful for performance/cost management; a common use case is separating Historical processes into a "hot" tier and a "cold" tier. - -For more information, please see [Load rules](../operations/rule-configuration.md). - -## Learn more -See the following topics for more information: -- [Compaction](./compaction.md) for an overview and steps to configure manual compaction tasks. 
-- [Segments](../design/segments.md) for information on how Druid handles segment versioning. diff --git a/docs/ingestion/data-model.md b/docs/ingestion/data-model.md index 53141ba0be6..8a5a126a8df 100644 --- a/docs/ingestion/data-model.md +++ b/docs/ingestion/data-model.md @@ -29,7 +29,7 @@ Druid stores data in datasources, which are similar to tables in a traditional r ## Primary timestamp Druid schemas must always include a primary timestamp. Druid uses the primary timestamp to [partition and sort](./partitioning.md) your data. Druid uses the primary timestamp to rapidly identify and retrieve data within the time range of queries. Druid also uses the primary timestamp column -for time-based [data management operations](./data-management.md) such as dropping time chunks, overwriting time chunks, and time-based retention rules. +for time-based [data management operations](../data-management/index.md) such as dropping time chunks, overwriting time chunks, and time-based retention rules. Druid parses the primary timestamp based on the [`timestampSpec`](./ingestion-spec.md#timestampspec) configuration at ingestion time. Regardless of the source field for the primary timestamp, Druid always stores the timestamp in the `__time` column in your Druid datasource. diff --git a/docs/ingestion/faq.md b/docs/ingestion/faq.md index 38e99aa0d18..edd38aac9ea 100644 --- a/docs/ingestion/faq.md +++ b/docs/ingestion/faq.md @@ -27,14 +27,10 @@ sidebar_label: "Troubleshooting FAQ" If you are trying to batch load historical data but no events are being loaded, make sure the interval of your ingestion spec actually encapsulates the interval of your data. Events outside this interval are dropped. -## Druid ingested my events but I they are not in my query results +## Druid ingested my events but they are not in my query results If the number of ingested events seem correct, make sure your query is correctly formed. If you included a `count` aggregator in your ingestion spec, you will need to query for the results of this aggregate with a `longSum` aggregator. Issuing a query with a count aggregator will count the number of Druid rows, which includes [roll-up](../design/index.md). -## What types of data does Druid support? - -Druid can ingest JSON, CSV, TSV and other delimited data out of the box. Druid supports single dimension values, or multiple dimension values (an array of strings). Druid supports long, float, and double numeric columns. - ## Where do my Druid segments end up after ingestion? Depending on what `druid.storage.type` is set to, Druid will upload segments to some [Deep Storage](../dependencies/deep-storage.md). Local disk is used as the default deep storage. @@ -73,7 +69,7 @@ Note that this workflow only guarantees that the segments are available at the t ## I don't see my Druid segments on my Historical processes -You can check the [web console](../operations/druid-console.md) to make sure that your segments have actually loaded on [Historical processes](../design/historical.md). If your segments are not present, check the Coordinator logs for messages about capacity of replication errors. One reason that segments are not downloaded is because Historical processes have maxSizes that are too small, making them incapable of downloading more data. You can change that with (for example): +You can check the [web console](../operations/web-console.md) to make sure that your segments have actually loaded on [Historical processes](../design/historical.md). 
If your segments are not present, check the Coordinator logs for messages about capacity of replication errors. One reason that segments are not downloaded is because Historical processes have maxSizes that are too small, making them incapable of downloading more data. You can change that with (for example): ``` -Ddruid.segmentCache.locations=[{"path":"/tmp/druid/storageLocation","maxSize":"500000000000"}] @@ -83,26 +79,6 @@ You can check the [web console](../operations/druid-console.md) to make sure tha You can use a [segment metadata query](../querying/segmentmetadataquery.md) for the dimensions and metrics that have been created for your datasource. Make sure that the name of the aggregators you use in your query match one of these metrics. Also make sure that the query interval you specify match a valid time range where data exists. -## How can I Reindex existing data in Druid with schema changes? - -You can use DruidInputSource with the [Parallel task](../ingestion/native-batch.md) to ingest existing druid segments using a new schema and change the name, dimensions, metrics, rollup, etc. of the segment. -See [DruidInputSource](./native-batch-input-source.md) for more details. -Or, if you use hadoop based ingestion, then you can use "dataSource" input spec to do reindexing. - -See the [Update existing data](../ingestion/data-management.md#update) section of the data management page for more details. - -## How can I change the query granularity of existing data in Druid? - -In a lot of situations you may want coarser granularity for older data. Example, any data older than 1 month has only hour level granularity but newer data has minute level granularity. This use case is same as re-indexing. - -To do this use the [DruidInputSource](./native-batch-input-source.md) and run a [Parallel task](../ingestion/native-batch.md). The DruidInputSource will allow you to take in existing segments from Druid and aggregate them and feed them back into Druid. It will also allow you to filter the data in those segments while feeding it back in. This means if there are rows you want to delete, you can just filter them away during re-ingestion. -Typically the above will be run as a batch job to say everyday feed in a chunk of data and aggregate it. -Or, if you use hadoop based ingestion, then you can use "dataSource" input spec to do reindexing. - -See the [Update existing data](../ingestion/data-management.md#update) section of the data management page for more details. - -You can also change the query granularity using compaction. See [Query granularity handling](../ingestion/compaction.md#query-granularity-handling). - ## Real-time ingestion seems to be stuck There are a few ways this can occur. Druid will throttle ingestion to prevent out of memory problems if the intermediate persists are taking too long or if hand-off is taking too long. If your process logs indicate certain columns are taking a very long time to build (for example, if your segment granularity is hourly, but creating a single column takes 30 minutes), you should re-evaluate your configuration or scale up your real-time ingestion. diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md index 755a89e0db4..38de8328f14 100644 --- a/docs/ingestion/index.md +++ b/docs/ingestion/index.md @@ -22,10 +22,13 @@ title: "Ingestion" ~ under the License. --> -Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from your source system and stores it in data files called _segments_. 
In general, segment files contain a few million rows. +Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from +your source system and stores it in data files called [_segments_](../design/architecture.md#datasources-and-segments). +In general, segment files contain a few million rows each. -For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the [Indexer](../design/indexer.md) processes load your source data. One exception is -Hadoop-based ingestion, which uses a Hadoop MapReduce job on YARN MiddleManager or Indexer processes to start and monitor Hadoop jobs. +For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the +[Indexer](../design/indexer.md) processes load your source data. The sole exception is Hadoop-based ingestion, which +uses a Hadoop MapReduce job on YARN. During ingestion Druid creates segments and stores them in [deep storage](../dependencies/deep-storage.md). Historical nodes load the segments into memory to respond to queries. For streaming ingestion, the Middle Managers and indexers can respond to queries in real-time with arriving data. See the [Storage design](../design/architecture.md#storage-design) section of the Druid design documentation for more information. @@ -46,41 +49,32 @@ page. ### Streaming -The most recommended, and most popular, method of streaming ingestion is the -[Kafka indexing service](../development/extensions-core/kafka-ingestion.md) that reads directly from Kafka. Alternatively, the Kinesis -indexing service works with Amazon Kinesis Data Streams. - -Streaming ingestion uses an ongoing process called a supervisor that reads from the data stream to ingest data into Druid. - -This table compares the options: +There are two available options for streaming ingestion. Streaming ingestion is controlled by a continuously-running +supervisor. | **Method** | [Kafka](../development/extensions-core/kafka-ingestion.md) | [Kinesis](../development/extensions-core/kinesis-ingestion.md) | |---|-----|--------------| | **Supervisor type** | `kafka` | `kinesis`| | **How it works** | Druid reads directly from Apache Kafka. | Druid reads directly from Amazon Kinesis.| -| **Can ingest late data?** | Yes | Yes | -| **Exactly-once guarantees?** | Yes | Yes | +| **Can ingest late data?** | Yes. | Yes. | +| **Exactly-once guarantees?** | Yes. | Yes. | ### Batch -When doing batch loads from files, you should use one-time [tasks](tasks.md), and you have three options: `index_parallel` (native batch; parallel), `index_hadoop` (Hadoop-based), -or `index` (native batch; single-task). +There are three available options for batch ingestion. Batch ingestion jobs are associated with a controller task that +runs for the duration of the job. -In general, we recommend native batch whenever it meets your needs, since the setup is simpler (it does not depend on -an external Hadoop cluster). However, there are still scenarios where Hadoop-based batch ingestion might be a better choice, -for example when you already have a running Hadoop cluster and want to -use the cluster resource of the existing cluster for batch ingestion. 
- -This table compares the three available options: - -| **Method** | [Native batch (parallel)](./native-batch.md) | [Hadoop-based](hadoop.md) | [Native batch (simple)](./native-batch-simple-task.md) | +| **Method** | [Native batch](./native-batch.md) | [SQL](../multi-stage-query/index.md) | [Hadoop-based](hadoop.md) | |---|-----|--------------|------------| -| **Task type** | `index_parallel` | `index_hadoop` | `index` | -| **Parallel?** | Yes, if `inputFormat` is splittable and `maxNumConcurrentSubTasks` > 1 in `tuningConfig`. See [data format documentation](./data-formats.md) for details. | Yes, always. | No. Each task is single-threaded. | -| **Can append or overwrite?** | Yes, both. | Overwrite only. | Yes, both. | -| **External dependencies** | None. | Hadoop cluster (Druid submits Map/Reduce jobs). | None. | -| **Input locations** | Any [`inputSource`](./native-batch-input-source.md). | Any Hadoop FileSystem or Druid datasource. | Any [`inputSource`](./native-batch-input-source.md). | -| **File formats** | Any [`inputFormat`](./data-formats.md#input-format). | Any Hadoop InputFormat. | Any [`inputFormat`](./data-formats.md#input-format). | -| **[Rollup modes](./rollup.md)** | Perfect if `forceGuaranteedRollup` = true in the [`tuningConfig`](native-batch.md#tuningconfig). | Always perfect. | Perfect if `forceGuaranteedRollup` = true in the [`tuningConfig`](native-batch.md#tuningconfig). | -| **Partitioning options** | Dynamic, hash-based, and range-based partitioning methods are available. See [partitionsSpec](./native-batch.md#partitionsspec) for details.| Hash-based or range-based partitioning via [`partitionsSpec`](hadoop.md#partitionsspec). | Dynamic and hash-based partitioning methods are available. See [partitionsSpec](./native-batch.md#partitionsspec) for details. | +| **Controller task type** | `index_parallel` | `query_controller` | `index_hadoop` | +| **How you submit it** | Send an `index_parallel` spec to the [task API](../operations/api-reference.md#task-submit). | Send an [INSERT](../multi-stage-query/concepts.md#insert) or [REPLACE](../multi-stage-query/concepts.md#replace) statement to the [SQL task API](../multi-stage-query/api.md#submit-a-query). | Send an `index_hadoop` spec to the [task API](../operations/api-reference.md#task-submit). | +| **Parallelism** | Using subtasks, if [`maxNumConcurrentSubTasks`](native-batch.md#tuningconfig) is greater than 1. | Using `query_worker` subtasks. | Using YARN. | +| **Fault tolerance** | Workers automatically relaunched upon failure. Controller task failure leads to job failure. | Controller or worker task failure leads to job failure. | YARN containers automatically relaunched upon failure. Controller task failure leads to job failure. | +| **Can append?** | Yes. | Yes (INSERT). | No. | +| **Can overwrite?** | Yes. | Yes (REPLACE). | Yes. | +| **External dependencies** | None. | None. | Hadoop cluster. | +| **Input sources** | Any [`inputSource`](./native-batch-input-source.md). | Any [`inputSource`](./native-batch-input-source.md) (using [EXTERN](../multi-stage-query/concepts.md#extern)) or Druid datasource (using FROM). | Any Hadoop FileSystem or Druid datasource. | +| **Input formats** | Any [`inputFormat`](./data-formats.md#input-format). | Any [`inputFormat`](./data-formats.md#input-format). | Any Hadoop InputFormat. | +| **Secondary partitioning options** | Dynamic, hash-based, and range-based partitioning methods are available. 
See [partitionsSpec](./native-batch.md#partitionsspec) for details.| Range partitioning ([CLUSTERED BY](../multi-stage-query/concepts.md#clustering)). | Hash-based or range-based partitioning via [`partitionsSpec`](hadoop.md#partitionsspec). | +| **[Rollup modes](./rollup.md#perfect-rollup-vs-best-effort-rollup)** | Perfect if `forceGuaranteedRollup` = true in the [`tuningConfig`](native-batch.md#tuningconfig). | Always perfect. | Always perfect. | diff --git a/docs/ingestion/ingestion-spec.md b/docs/ingestion/ingestion-spec.md index 3684052d837..acfe73c49fa 100644 --- a/docs/ingestion/ingestion-spec.md +++ b/docs/ingestion/ingestion-spec.md @@ -95,7 +95,7 @@ The specific options supported by these sections will depend on the [ingestion m For more examples, refer to the documentation for each ingestion method. You can also load data visually, without the need to write an ingestion spec, using the "Load data" functionality -available in Druid's [web console](../operations/druid-console.md). Druid's visual data loader supports +available in Druid's [web console](../operations/web-console.md). Druid's visual data loader supports [Kafka](../development/extensions-core/kafka-ingestion.md), [Kinesis](../development/extensions-core/kinesis-ingestion.md), and [native batch](native-batch.md) mode. @@ -175,7 +175,7 @@ A `timestampSpec` can have the following components: |Field|Description|Default| |-----|-----------|-------| -|column|Input row field to read the primary timestamp from.

Regardless of the name of this input field, the primary timestamp will always be stored as a column named `__time` in your Druid datasource.|timestamp| +|column|Input row field to read the primary timestamp from.

Regardless of the name of this input field, the primary timestamp will always be stored as a column named `__time` in your Druid datasource.|timestamp| |format|Timestamp format. Options are:
  • `iso`: ISO8601 with 'T' separator, like "2000-01-01T01:02:03.456"
  • `posix`: seconds since epoch
  • `millis`: milliseconds since epoch
  • `micro`: microseconds since epoch
  • `nano`: nanoseconds since epoch
  • `auto`: automatically detects ISO (either 'T' or space separator) or millis format
  • any [Joda DateTimeFormat string](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html)
|auto| |missingValue|Timestamp to use for input records that have a null or missing timestamp `column`. Should be in ISO8601 format, like `"2000-01-01T01:02:03.456"`, even if you have specified something else for `format`. Since Druid requires a primary timestamp, this setting can be useful for ingesting datasets that do not have any per-record timestamps at all. |none| @@ -209,8 +209,8 @@ A `dimensionsSpec` can have the following components: | Field | Description | Default | |------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------| -| `dimensions` | A list of [dimension names or objects](#dimension-objects). You cannot include the same column in both `dimensions` and `dimensionExclusions`.

If `dimensions` and `spatialDimensions` are both null or empty arrays, Druid treats all columns other than timestamp or metrics that do not appear in `dimensionExclusions` as String-typed dimension columns. See [inclusions and exclusions](#inclusions-and-exclusions) for details.

As a best practice, put the most frequently filtered dimensions at the beginning of the dimensions list. In this case, it would also be good to consider [`partitioning`](partitioning.md) by those same dimensions. | `[]` | -| `dimensionExclusions` | The names of dimensions to exclude from ingestion. Only names are supported here, not objects.

This list is only used if the `dimensions` and `spatialDimensions` lists are both null or empty arrays; otherwise it is ignored. See [inclusions and exclusions](#inclusions-and-exclusions) below for details. | `[]` | +| `dimensions` | A list of [dimension names or objects](#dimension-objects). You cannot include the same column in both `dimensions` and `dimensionExclusions`.

If `dimensions` and `spatialDimensions` are both null or empty arrays, Druid treats all columns other than timestamp or metrics that do not appear in `dimensionExclusions` as String-typed dimension columns. See [inclusions and exclusions](#inclusions-and-exclusions) for details.

As a best practice, put the most frequently filtered dimensions at the beginning of the dimensions list. In this case, it would also be good to consider [`partitioning`](partitioning.md) by those same dimensions. | `[]` | +| `dimensionExclusions` | The names of dimensions to exclude from ingestion. Only names are supported here, not objects.

This list is only used if the `dimensions` and `spatialDimensions` lists are both null or empty arrays; otherwise it is ignored. See [inclusions and exclusions](#inclusions-and-exclusions) below for details. | `[]` | | `spatialDimensions` | An array of [spatial dimensions](../development/geo.md). | `[]` | | `includeAllDimensions` | You can set `includeAllDimensions` to true to ingest both explicit dimensions in the `dimensions` field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions will appear first in order that you specify them and the dimensions dynamically discovered will come after. This flag can be useful especially with auto schema discovery using [`flattenSpec`](./data-formats.html#flattenspec). If this is not set and the `dimensions` field is not empty, Druid will ingest only explicit dimensions. If this is not set and the `dimensions` field is empty, all discovered dimensions will be ingested. | false | @@ -224,7 +224,7 @@ Dimension objects can have the following components: | Field | Description | Default | |-------|-------------|---------| | type | Either `string`, `long`, `float`, `double`, or `json`. | `string` | -| name | The name of the dimension. This will be used as the field name to read from input records, as well as the column name stored in generated segments.

Note that you can use a [`transformSpec`](#transformspec) if you want to rename columns during ingestion time. | none (required) | +| name | The name of the dimension. This will be used as the field name to read from input records, as well as the column name stored in generated segments.

Note that you can use a [`transformSpec`](#transformspec) if you want to rename columns during ingestion time. | none (required) | | createBitmapIndex | For `string` typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for `string` typed dimensions. | `true` | | multiValueHandling | Specify the type of handling for [multi-value fields](../querying/multi-value-dimensions.md). Possible values are `sorted_array`, `sorted_set`, and `array`. `sorted_array` and `sorted_set` order the array upon ingestion. `sorted_set` removes duplicates. `array` ingests data as-is | `sorted_array` | @@ -300,9 +300,9 @@ A `granularitySpec` can have the following components: |-------|-------------|---------| | type |`uniform`| `uniform` | | segmentGranularity | [Time chunking](../design/architecture.md#datasources-and-segments) granularity for this datasource. Multiple segments can be created per time chunk. For example, when set to `day`, the events of the same day fall into the same time chunk which can be optionally further partitioned into multiple segments based on other configurations and input size. Any [granularity](../querying/granularities.md) can be provided here. Note that all segments in the same time chunk should have the same segment granularity.| `day` | -| queryGranularity | The resolution of timestamp storage within each segment. This must be equal to, or finer, than `segmentGranularity`. This will be the finest granularity that you can query at and still receive sensible results, but note that you can still query at anything coarser than this granularity. E.g., a value of `minute` will mean that records will be stored at minutely granularity, and can be sensibly queried at any multiple of minutes (including minutely, 5-minutely, hourly, etc).

Any [granularity](../querying/granularities.md) can be provided here. Use `none` to store timestamps as-is, without any truncation. Note that `rollup` will be applied if it is set even when the `queryGranularity` is set to `none`. | `none` | +| queryGranularity | The resolution of timestamp storage within each segment. This must be equal to, or finer, than `segmentGranularity`. This will be the finest granularity that you can query at and still receive sensible results, but note that you can still query at anything coarser than this granularity. E.g., a value of `minute` will mean that records will be stored at minutely granularity, and can be sensibly queried at any multiple of minutes (including minutely, 5-minutely, hourly, etc).

Any [granularity](../querying/granularities.md) can be provided here. Use `none` to store timestamps as-is, without any truncation. Note that `rollup` will be applied if it is set even when the `queryGranularity` is set to `none`. | `none` | | rollup | Whether to use ingestion-time [rollup](./rollup.md) or not. Note that rollup is still effective even when `queryGranularity` is set to `none`. Your data will be rolled up if rows have exactly the same timestamp. | `true` | -| intervals | A list of intervals defining time chunks for segments. Specify interval values using ISO8601 format. For example, `["2021-12-06T21:27:10+00:00/2021-12-07T00:00:00+00:00"]`. If you omit the time, the time defaults to "00:00:00".

Druid breaks the list up and rounds off the list values based on the `segmentGranularity`.

If `null` or not provided, batch ingestion tasks generally determine which time chunks to output based on the timestamps found in the input data.

If specified, batch ingestion tasks may be able to skip a determining-partitions phase, which can result in faster ingestion. Batch ingestion tasks may also be able to request all their locks up-front instead of one by one. Batch ingestion tasks throw away any records with timestamps outside of the specified intervals.

Ignored for any form of streaming ingestion. | `null` | +| intervals | A list of intervals defining time chunks for segments. Specify interval values using ISO8601 format. For example, `["2021-12-06T21:27:10+00:00/2021-12-07T00:00:00+00:00"]`. If you omit the time, the time defaults to "00:00:00".

Druid breaks the list up and rounds off the list values based on the `segmentGranularity`.

If `null` or not provided, batch ingestion tasks generally determine which time chunks to output based on the timestamps found in the input data.

If specified, batch ingestion tasks may be able to skip a determining-partitions phase, which can result in faster ingestion. Batch ingestion tasks may also be able to request all their locks up-front instead of one by one. Batch ingestion tasks throw away any records with timestamps outside of the specified intervals.

Ignored for any form of streaming ingestion. | `null` | ### `transformSpec` diff --git a/docs/ingestion/native-batch-firehose.md b/docs/ingestion/native-batch-firehose.md index a6c76b53ccc..4e2cad97fc1 100644 --- a/docs/ingestion/native-batch-firehose.md +++ b/docs/ingestion/native-batch-firehose.md @@ -1,7 +1,7 @@ --- id: native-batch-firehose title: "Native batch ingestion with firehose" -sidebar_label: "Firehose" +sidebar_label: "Firehose (deprecated)" --- -The simple task (type `index`) is designed to ingest small data sets into Apache Druid. The task executes within the indexing service. For general information on native batch indexing and parallel task indexing, see [Native batch ingestion](./native-batch.md). +> This page describes native batch ingestion using [ingestion specs](ingestion-spec.md). Refer to the [ingestion +> methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you. + +The simple task ([task type](tasks.md) `index`) executes single-threaded as a single task within the indexing service. For parallel, scalable options consider using [`index_parallel` tasks](./native-batch.md) or [SQL-based batch ingestion](../multi-stage-query/index.md). ## Simple task example @@ -143,7 +146,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon |forceGuaranteedRollup|Forces guaranteeing the [perfect rollup](rollup.md). The perfect rollup optimizes the total size of generated segments and querying time while indexing time will be increased. If this is set to true, the index task will read the entire input data twice: one for finding the optimal number of partitions per time chunk and one for generating segments. Note that the result segments would be hash-partitioned. This flag cannot be used with `appendToExisting` of IOConfig. For more details, see the below __Segment pushing modes__ section.|false|no| |reportParseExceptions|DEPRECATED. If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped. Setting `reportParseExceptions` to true will override existing configurations for `maxParseExceptions` and `maxSavedParseExceptions`, setting `maxParseExceptions` to 0 and limiting `maxSavedParseExceptions` to no more than 1.|false|no| |pushTimeout|Milliseconds to wait for pushing segments. It must be >= 0, where 0 means to wait forever.|0|no| -|segmentWriteOutMediumFactory|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](#segmentwriteoutmediumfactory).|Not specified, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` is used|no| +|segmentWriteOutMediumFactory|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](native-batch.md#segmentwriteoutmediumfactory).|Not specified, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` is used|no| |logParseExceptions|If true, log an error message when a parsing exception occurs, containing information about the row where the error occurred.|false|no| |maxParseExceptions|The maximum number of parse exceptions that can occur before the task halts ingestion and fails. Overridden if `reportParseExceptions` is set.|unlimited|no| |maxSavedParseExceptions|When a parse exception occurs, Druid can keep track of the most recent parse exceptions. "maxSavedParseExceptions" limits how many exception instances will be saved. 
These saved exceptions will be made available after the task finishes in the [task completion report](tasks.md#task-reports). Overridden if `reportParseExceptions` is set.|0|no| @@ -170,12 +173,6 @@ For best-effort rollup, you should use `dynamic`. |maxRowsPerSegment|Used in sharding. Determines how many rows are in each segment.|5000000|no| |maxTotalRows|Total number of rows in segments waiting for being pushed.|20000000|no| -### `segmentWriteOutMediumFactory` - -|Field|Type|Description|Required| -|-----|----|-----------|--------| -|type|String|See [Additional Peon Configuration: SegmentWriteOutMediumFactory](../configuration/index.md#segmentwriteoutmediumfactory) for explanation and available options.|yes| - ## Segment pushing modes While ingesting data using the simple task indexing, Druid creates segments from the input data and pushes them. For segment pushing, diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md index a0ea39c3e57..7106a9a000a 100644 --- a/docs/ingestion/native-batch.md +++ b/docs/ingestion/native-batch.md @@ -23,6 +23,8 @@ sidebar_label: "Native batch" ~ under the License. --> +> This page describes native batch ingestion using [ingestion specs](ingestion-spec.md). Refer to the [ingestion +> methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you. Apache Druid supports the following types of native batch indexing tasks: - Parallel task indexing (`index_parallel`) that can run multiple indexing tasks concurrently. Parallel task works well for production ingestion tasks. @@ -31,15 +33,15 @@ Apache Druid supports the following types of native batch indexing tasks: This topic covers the configuration for `index_parallel` ingestion specs. For related information on batch indexing, see: -- [Simple task indexing](./native-batch-simple-task.md) for `index` task configuration. -- [Native batch input sources](./native-batch-input-source.md) for a reference for `inputSource` configuration. -- [Hadoop-based vs. native batch comparison table](./index.md#batch) for a comparison of batch ingestion methods. -- [Loading a file](../tutorials/tutorial-batch.md) for a tutorial on native batch ingestion. +- [Batch ingestion method comparison table](./index.md#batch) for a comparison of batch ingestion methods. +- [Tutorial: Loading a file](../tutorials/tutorial-batch.md) for a tutorial on native batch ingestion. +- [Input sources](./native-batch-input-source.md) for possible input sources. +- [Input formats](./data-formats.md#input-format) for possible input formats. ## Submit an indexing task To run either kind of native batch indexing task you can: -- Use the **Load Data** UI in the Druid console to define and submit an ingestion spec. +- Use the **Load Data** UI in the web console to define and submit an ingestion spec. - Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Indexer API endpoint](../operations/api-reference.md#tasks), `/druid/indexer/v1/task`, the Overlord service. Alternatively you can use the indexing script included with Druid at `bin/post-index-task`. @@ -86,7 +88,7 @@ You can set `dropExisting` flag in the `ioConfig` to true if you want the ingest The following examples demonstrate when to set the `dropExisting` property to true in the `ioConfig`: -Consider an existing segment with an interval of 2020-01-01 to 2021-01-01 and `YEAR` `segmentGranularity`. 
You want to overwrite the whole interval of 2020-01-01 to 2021-01-01 with new data using the finer segmentGranularity of `MONTH`. If the replacement data does not have a record within every months from 2020-01-01 to 2021-01-01 Druid cannot drop the original `YEAR` segment even if it does include all the replacement data. Set `dropExisting` to true in this case to replace the original segment at `YEAR` `segmentGranularity` since you no longer need it.

+Consider an existing segment with an interval of 2020-01-01 to 2021-01-01 and `YEAR` `segmentGranularity`. You want to overwrite the whole interval of 2020-01-01 to 2021-01-01 with new data using the finer `segmentGranularity` of `MONTH`. If the replacement data does not have a record within every month from 2020-01-01 to 2021-01-01, Druid cannot drop the original `YEAR` segment, even though its interval covers all the replacement data. Set `dropExisting` to true in this case to replace the original segment at `YEAR` `segmentGranularity` since you no longer need it.
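For the `YEAR` to `MONTH` example above, the relevant part of the `ioConfig` might look like the following sketch. The input source and input format shown here are illustrative placeholders rather than part of the original example:

```json
{
  "type": "index_parallel",
  "inputSource": {
    "type": "local",
    "baseDir": "/tmp/replacement-data",
    "filter": "*.json"
  },
  "inputFormat": {
    "type": "json"
  },
  "appendToExisting": false,
  "dropExisting": true
}
```

Note that `dropExisting` only takes effect when `appendToExisting` is false, which is why both flags appear together here.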

Imagine you want to re-ingest or overwrite a datasource and the new data does not contain some time intervals that exist in the datasource. For example, a datasource contains the following data at `MONTH` segmentGranularity: - **January**: 1 record - **February**: 10 records @@ -235,7 +237,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon |forceGuaranteedRollup|Forces [perfect rollup](rollup.md). The perfect rollup optimizes the total size of generated segments and querying time but increases indexing time. If true, specify `intervals` in the `granularitySpec` and use either `hashed` or `single_dim` for the `partitionsSpec`. You cannot use this flag in conjunction with `appendToExisting` of IOConfig. For more details, see [Segment pushing modes](#segment-pushing-modes).|false|no| |reportParseExceptions|If true, Druid throws exceptions encountered during parsing and halts ingestion. If false, Druid skips unparseable rows and fields.|false|no| |pushTimeout|Milliseconds to wait to push segments. Must be >= 0, where 0 means to wait forever.|0|no| -|segmentWriteOutMediumFactory|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](./native-batch-simple-task.md#segmentwriteoutmediumfactory).|If not specified, uses the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` |no| +|segmentWriteOutMediumFactory|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](#segmentwriteoutmediumfactory).|If not specified, uses the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` |no| |maxNumConcurrentSubTasks|Maximum number of worker tasks that can be run in parallel at the same time. The supervisor task spawns worker tasks up to `maxNumConcurrentSubTasks` regardless of the current available task slots. If this value is 1, the supervisor task processes data ingestion on its own instead of spawning worker tasks. If this value is set to too large, the supervisor may create too many worker tasks that block other ingestion tasks. See [Capacity planning](#capacity-planning) for more details.|1|no| |maxRetry|Maximum number of retries on task failures.|3|no| |maxNumSegmentsToMerge|Max limit for the number of segments that a single task can merge at the same time in the second phase. Used only when `forceGuaranteedRollup` is true.|100|no| @@ -711,11 +713,15 @@ For details on available input sources see: - [Azure input source](./native-batch-input-source.md#azure-input-source) (`azure`) reads data from Azure Blob Storage and Azure Data Lake. - [HDFS input source](./native-batch-input-source.md#hdfs-input-source) (`hdfs`) reads data from HDFS storage. - [HTTP input Source](./native-batch-input-source.md#http-input-source) (`http`) reads data from HTTP servers. -- [Inline input Source](./native-batch-input-source.md#inline-input-source) reads data you paste into the Druid console. +- [Inline input Source](./native-batch-input-source.md#inline-input-source) reads data you paste into the web console. - [Local input Source](./native-batch-input-source.md#local-input-source) (`local`) reads data from local storage. - [Druid input Source](./native-batch-input-source.md#druid-input-source) (`druid`) reads data from a Druid datasource. - [SQL input Source](./native-batch-input-source.md#sql-input-source) (`sql`) reads data from a RDBMS source. For information on how to combine input sources, see [Combining input source](./native-batch-input-source.md#combining-input-source). 
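For example, a combining input source that reads local files together with a remote file might look like the following sketch. The paths and URL are placeholders:

```json
{
  "inputSource": {
    "type": "combining",
    "delegates": [
      {
        "type": "local",
        "baseDir": "/tmp/wikipedia-batch",
        "filter": "*.json"
      },
      {
        "type": "http",
        "uris": ["https://example.com/wikipedia-extra.json"]
      }
    ]
  },
  "inputFormat": {
    "type": "json"
  }
}
```

Both delegates are parsed with the single `inputFormat` declared alongside the input source in the `ioConfig`.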
+### `segmentWriteOutMediumFactory` +|Field|Type|Description|Required| +|-----|----|-----------|--------| +|type|String|See [Additional Peon Configuration: SegmentWriteOutMediumFactory](../configuration/index.md#segmentwriteoutmediumfactory) for explanation and available options.|yes| diff --git a/docs/ingestion/partitioning.md b/docs/ingestion/partitioning.md index ed3318eaa84..0ffd0156003 100644 --- a/docs/ingestion/partitioning.md +++ b/docs/ingestion/partitioning.md @@ -34,6 +34,16 @@ This topic describes how to set up partitions within a single datasource. It doe Druid always partitions datasources by time into _time chunks_. Each time chunk contains one or more segments. This partitioning happens for all ingestion methods based on the `segmentGranularity` parameter in your ingestion spec `dataSchema` object. +Partitioning by time is important for three reasons: + +1. Queries that filter by `__time` (SQL) or `intervals` (native) are able to use time partitioning to prune the set of segments to consider. +2. Certain data management operations, such as overwriting and compacting existing data, acquire exclusive write locks on time partitions. +3. Each segment file is wholly contained within a time partition. Too-fine-grained partitioning may cause a large number + of small segments, which leads to poor performance. + +The most common choices to balance these considerations are `hour` and `day`. For streaming ingestion, `hour` is especially +common, because it allows compaction to follow ingestion with less of a time delay. + ## Secondary partitioning Druid can partition segments within a particular time chunk further depending upon options that vary based on the ingestion type you have chosen. In general, secondary partitioning on a particular dimension improves locality. This means that rows with the same value for that dimension are stored together, decreasing access time. @@ -45,25 +55,26 @@ dimension that you often use as a filter when possible. Such partitioning often Partitioning and sorting work well together. If you do have a "natural" partitioning dimension, consider placing it first in the `dimensions` list of your `dimensionsSpec`. This way Druid sorts rows within each segment by that column. This sorting configuration frequently improves compression more than using partitioning alone. -> Note that Druid always sorts rows within a segment by timestamp first, even before the first dimension listed in your `dimensionsSpec`. This sorting can preclude the efficacy of dimension sorting. To work around this limitation if necessary, set your `queryGranularity` equal to `segmentGranularity` in your [`granularitySpec`](./ingestion-spec.md#granularityspec). Druid will set all timestamps within the segment to the same value, letting you identify a [secondary timestamp](schema-design.md#secondary-timestamps) as the "real" timestamp. +Note that Druid always sorts rows within a segment by timestamp first, even before the first dimension listed in your `dimensionsSpec`. This sorting can preclude the efficacy of dimension sorting. To work around this limitation if necessary, set your `queryGranularity` equal to `segmentGranularity` in your [`granularitySpec`](./ingestion-spec.md#granularityspec). Druid will set all timestamps within the segment to the same value, letting you identify a [secondary timestamp](schema-design.md#secondary-timestamps) as the "real" timestamp. 
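For example, a `granularitySpec` fragment that applies this workaround at day granularity might look like the following sketch; the granularity values are illustrative:

```json
{
  "type": "uniform",
  "segmentGranularity": "day",
  "queryGranularity": "day"
}
```

With every `__time` value truncated to the start of its day, the within-segment sort order is effectively determined by the first dimension in your `dimensionsSpec`, and a secondary timestamp column can carry the original event time.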
## How to configure partitioning Not all ingestion methods support an explicit partitioning configuration, and not all have equivalent levels of flexibility. If you are doing initial ingestion through a less-flexible method like -Kafka), you can use [reindexing](data-management.md#reingesting-data) or [compaction](compaction.md) to repartition your data after initial ingestion. This is a powerful technique you can use to optimally partition any data older than a certain time threshold while you continuously add new data from a stream. +Kafka), you can use [reindexing](../data-management/update.md#reindex) or [compaction](../data-management/compaction.md) to repartition your data after initial ingestion. This is a powerful technique you can use to optimally partition any data older than a certain time threshold while you continuously add new data from a stream. The following table shows how each ingestion method handles partitioning: |Method|How it works| |------|------------| |[Native batch](native-batch.md)|Configured using [`partitionsSpec`](native-batch.md#partitionsspec) inside the `tuningConfig`.| +|[SQL](../multi-stage-query/index.md)|Configured using [`PARTITIONED BY`](../multi-stage-query/concepts.md#partitioning) and [`CLUSTERED BY`](../multi-stage-query/concepts.md#clustering).| |[Hadoop](hadoop.md)|Configured using [`partitionsSpec`](hadoop.md#partitionsspec) inside the `tuningConfig`.| -|[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)|Kafka topic partitioning defines how Druid partitions the datasource. You can also [reindex](data-management.md#reingesting-data) or [compact](compaction.md) to repartition after initial ingestion.| -|[Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md)|Kinesis stream sharding defines how Druid partitions the datasource. You can also [reindex](data-management.md#reingesting-data) or [compact](compaction.md) to repartition after initial ingestion.| +|[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)|Kafka topic partitioning defines how Druid partitions the datasource. You can also [reindex](../data-management/update.md#reindex) or [compact](../data-management/compaction.md) to repartition after initial ingestion.| +|[Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md)|Kinesis stream sharding defines how Druid partitions the datasource. You can also [reindex](../data-management/update.md#reindex) or [compact](../data-management/compaction.md) to repartition after initial ingestion.| ## Learn more See the following topics for more information: * [`partitionsSpec`](native-batch.md#partitionsspec) for more detail on partitioning with Native Batch ingestion. -* [Reindexing](data-management.md#reingesting-data) and [Compaction](compaction.md) for information on how to repartition existing data in Druid. +* [Reindexing](../data-management/update.md#reindex) and [Compaction](../data-management/compaction.md) for information on how to repartition existing data in Druid. diff --git a/docs/ingestion/rollup.md b/docs/ingestion/rollup.md index 682fe83d029..411dd5ead94 100644 --- a/docs/ingestion/rollup.md +++ b/docs/ingestion/rollup.md @@ -60,7 +60,7 @@ Tips for maximizing rollup: When queries only involve dimensions in the "abbreviated" set, use the second datasource to reduce query times. Often, this method only requires a small increase in storage footprint because abbreviated datasources tend to be substantially smaller. 
- If you use a [best-effort rollup](#perfect-rollup-vs-best-effort-rollup) ingestion configuration that does not guarantee perfect rollup, try one of the following: - Switch to a guaranteed perfect rollup option. - - [Reindex](data-management.md#reingesting-data) or [compact](compaction.md) your data in the background after initial ingestion. + - [Reindex](../data-management/update.md#reindex) or [compact](../data-management/compaction.md) your data in the background after initial ingestion. ## Perfect rollup vs best-effort rollup @@ -80,6 +80,7 @@ The following table shows how each method handles rollup: |Method|How it works| |------|------------| |[Native batch](native-batch.md)|`index_parallel` and `index` type may be either perfect or best-effort, based on configuration.| +|[SQL-based batch](../multi-stage-query/index.md)|Always perfect.| |[Hadoop](hadoop.md)|Always perfect.| |[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)|Always best-effort.| |[Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md)|Always best-effort.| diff --git a/docs/ingestion/tasks.md b/docs/ingestion/tasks.md index e429a2da395..dbb4f2b65c7 100644 --- a/docs/ingestion/tasks.md +++ b/docs/ingestion/tasks.md @@ -45,11 +45,11 @@ the Overlord APIs. A report containing information about the number of rows ingested, and any parse exceptions that occurred is available for both completed tasks and running tasks. -The reporting feature is supported by the [simple native batch task](../ingestion/native-batch-simple-task.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks. +The reporting feature is supported by [native batch tasks](../ingestion/native-batch.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks. ### Completion report -After a task completes, a completion report can be retrieved at: +After a task completes, if it supports reports, its report can be retrieved at: ``` http://:/druid/indexer/v1/task//reports @@ -104,12 +104,6 @@ When a task is running, a live report containing ingestion state, unparseable ev http://:/druid/indexer/v1/task//reports ``` -and - -``` -http://:/druid/worker/v1/chat//liveReports -``` - An example output is shown below: ```json @@ -184,7 +178,7 @@ The `errorMsg` field shows a message describing the error that caused a task to ### Row stats -The non-parallel [simple native batch task](./native-batch-simple-task.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks support retrieval of row stats while the task is running. +The [native batch task](./native-batch.md), the Hadoop batch task, and Kafka and Kinesis ingestion tasks support retrieval of row stats while the task is running. The live report can be accessed with a GET to the following URL on a Peon running a task: @@ -356,7 +350,7 @@ You can override the task priority by setting your priority in the task context The task context is used for various individual task configuration. Specify task context configurations in the `context` field of the ingestion spec. -When configuring [automatic compaction](../ingestion/automatic-compaction.md), set the task context configurations in `taskContext` rather than in `context`. +When configuring [automatic compaction](../data-management/automatic-compaction.md), set the task context configurations in `taskContext` rather than in `context`. The settings get passed into the `context` field of the compaction tasks issued to MiddleManagers. The following parameters apply to all task types. 
@@ -398,18 +392,10 @@ You can configure retention periods for logs in milliseconds by setting `druid.i ## All task types -### `index` - -See [Native batch ingestion (simple task)](./native-batch-simple-task.md). - ### `index_parallel` See [Native batch ingestion (parallel task)](native-batch.md). -### `index_sub` - -Submitted automatically, on your behalf, by an [`index_parallel`](#index_parallel) task. - ### `index_hadoop` See [Hadoop-based ingestion](hadoop.md). @@ -424,16 +410,12 @@ Submitted automatically, on your behalf, by a Submitted automatically, on your behalf, by a [Kinesis-based ingestion supervisor](../development/extensions-core/kinesis-ingestion.md). -### `index_realtime` - -Submitted automatically, on your behalf, by [Tranquility](tranquility.md). - ### `compact` Compaction tasks merge all segments of the given interval. See the documentation on -[compaction](compaction.md) for details. +[compaction](../data-management/compaction.md) for details. ### `kill` Kill tasks delete all metadata about certain segments and removes them from deep storage. -See the documentation on [deleting data](../ingestion/data-management.md#delete) for details. +See the documentation on [deleting data](../data-management/delete.md) for details. diff --git a/docs/multi-stage-query/msq-api.md b/docs/multi-stage-query/api.md similarity index 94% rename from docs/multi-stage-query/msq-api.md rename to docs/multi-stage-query/api.md index 89e568063a1..8532ea6817c 100644 --- a/docs/multi-stage-query/msq-api.md +++ b/docs/multi-stage-query/api.md @@ -1,6 +1,6 @@ --- id: api -title: SQL-based ingestion APIs +title: SQL-based ingestion and multi-stage query task API sidebar_label: API --- @@ -23,9 +23,13 @@ sidebar_label: API ~ under the License. --> -> SQL-based ingestion using the multi-stage query task engine is our recommended solution starting in Druid 24.0. Alternative ingestion solutions, such as native batch and Hadoop-based ingestion systems, will still be supported. We recommend you read all [known issues](./msq-known-issues.md) and test the feature in a development environment before rolling it out in production. Using the multi-stage query task engine with `SELECT` statements that do not write to a datasource is experimental. +> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) +> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which +> ingestion method is right for you. -The **Query** view in the Druid console provides the most stable experience for the multi-stage query task engine (MSQ task engine) and multi-stage query architecture. Use the UI if you do not need a programmatic interface. +The **Query** view in the web console provides a friendly experience for the multi-stage query task engine (MSQ task +engine) and multi-stage query architecture. We recommend using the web console if you do not need a programmatic +interface. When using the API for the MSQ task engine, the action you want to take determines the endpoint you use: @@ -36,16 +40,17 @@ When using the API for the MSQ task engine, the action you want to take determin You submit queries to the MSQ task engine using the `POST /druid/v2/sql/task/` endpoint. -### Request +#### Request Currently, the MSQ task engine ignores the provided values of `resultFormat`, `header`, `typesHeader`, and `sqlTypesHeader`. 
SQL SELECT queries write out their results into the task report (in the `multiStageQuery.payload.results.results` key) formatted as if `resultFormat` is an `array`. -For task queries similar to the [example queries](./msq-example-queries.md), you need to escape characters such as quotation marks (") if you use something like `curl`. +For task queries similar to the [example queries](./examples.md), you need to escape characters such as quotation marks (") if you use something like `curl`. You don't need to escape characters if you use a method that can parse JSON seamlessly, such as Python. The Python example in this topic escapes quotation marks although it's not required. -The following example is the same query that you submit when you complete [Convert a JSON ingestion spec](./msq-tutorial-convert-ingest-spec.md) where you insert data into a table named `wikipedia`. +The following example is the same query that you submit when you complete [Convert a JSON ingestion +spec](../tutorials/tutorial-msq-convert-spec.md) where you insert data into a table named `wikipedia`. @@ -106,7 +111,7 @@ print(response.text) -### Response +#### Response ```json { @@ -127,7 +132,7 @@ print(response.text) You can retrieve status of a query to see if it is still running, completed successfully, failed, or got canceled. -### Request +#### Request @@ -162,7 +167,7 @@ print(response.text) -### Response +#### Response ``` { @@ -200,7 +205,7 @@ Keep the following in mind when using the task API to view reports: For an explanation of the fields in a report, see [Report response fields](#report-response-fields). -### Request +#### Request @@ -236,7 +241,7 @@ print(response.text) -### Response +#### Response The response shows an example report for a query. @@ -535,7 +540,9 @@ The response shows an example report for a query. } ``` -### Report response fields + + + The following table describes the response fields when you retrieve a report for a MSQ task engine using the `/druid/indexer/v1/task//reports` endpoint: @@ -550,8 +557,8 @@ The following table describes the response fields when you retrieve a report for |multiStageQuery.payload.status.errorReport.taskId|The task that reported the error, if known. May be a controller task or a worker task.| |multiStageQuery.payload.status.errorReport.host|The hostname and port of the task that reported the error, if known.| |multiStageQuery.payload.status.errorReport.stageNumber|The stage number that reported the error, if it happened during execution of a specific stage.| -|multiStageQuery.payload.status.errorReport.error|Error object. Contains `errorCode` at a minimum, and may contain other fields as described in the [error code table](./msq-concepts.md#error-codes). Always present if there is an error.| -|multiStageQuery.payload.status.errorReport.error.errorCode|One of the error codes from the [error code table](./msq-concepts.md#error-codes). Always present if there is an error.| +|multiStageQuery.payload.status.errorReport.error|Error object. Contains `errorCode` at a minimum, and may contain other fields as described in the [error code table](./reference.md#error-codes). Always present if there is an error.| +|multiStageQuery.payload.status.errorReport.error.errorCode|One of the error codes from the [error code table](./reference.md#error-codes). Always present if there is an error.| |multiStageQuery.payload.status.errorReport.error.errorMessage|User-friendly error message. 
Not always present, even if there is an error.| |multiStageQuery.payload.status.errorReport.exceptionStackTrace|Java stack trace in string form, if the error was due to a server-side exception.| |multiStageQuery.payload.stages|Array of query stages.| @@ -571,7 +578,7 @@ The following table describes the response fields when you retrieve a report for ## Cancel a query task -### Request +#### Request @@ -606,7 +613,7 @@ print(response.text) -### Response +#### Response ``` { diff --git a/docs/multi-stage-query/concepts.md b/docs/multi-stage-query/concepts.md new file mode 100644 index 00000000000..ea65fd76de7 --- /dev/null +++ b/docs/multi-stage-query/concepts.md @@ -0,0 +1,281 @@ +--- +id: concepts +title: "SQL-based ingestion concepts" +sidebar_label: "Key concepts" +--- + + + +> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) +> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which +> ingestion method is right for you. + +## SQL task engine + +The `druid-multi-stage-query` extension adds a multi-stage query (MSQ) task engine that executes SQL SELECT, +[INSERT](reference.md#insert), and [REPLACE](reference.md#replace) statements as batch tasks in the indexing service, +which execute on [Middle Managers](../design/architecture.md#druid-services). INSERT and REPLACE tasks publish +[segments](../design/architecture.md#datasources-and-segments) just like [all other forms of batch +ingestion](../ingestion/index.md#batch). Each query occupies at least two task slots while running: one controller task, +and at least one worker task. + +You can execute queries using the MSQ task engine through the **Query** view in the [web +console](../operations/web-console.md) or through the [`/druid/v2/sql/task` API](api.md). + +For more details on how SQL queries are executed using the MSQ task engine, see [multi-stage query +tasks](#multi-stage-query-tasks). + +## SQL extensions + +To support ingestion, additional SQL functionality is available through the MSQ task engine. + + + +### Read external data with EXTERN + +Query tasks can access external data through the EXTERN function, using any native batch [input +source](../ingestion/native-batch-input-source.md) and [input format](../ingestion/data-formats.md#input-format). + +EXTERN can read multiple files in parallel across different worker tasks. However, EXTERN does not split individual +files across multiple worker tasks. If you have a small number of very large input files, you can increase query +parallelism by splitting up your input files. + +For more information about the syntax, see [EXTERN](./reference.md#extern). + + + +### Load data with INSERT + +INSERT statements can create a new datasource or append to an existing datasource. In Druid SQL, unlike standard SQL, +there is no syntactical difference between creating a table and appending data to a table. Druid does not include a +`CREATE TABLE` statement. + +Nearly all SELECT capabilities are available for `INSERT ... SELECT` queries. Certain exceptions are listed on the [Known +issues](./known-issues.md#select) page. + +INSERT statements acquire a shared lock to the target datasource. Multiple INSERT statements can run at the same time, +for the same datasource, if your cluster has enough task slots. + +Like all other forms of [batch ingestion](../ingestion/index.md#batch), each INSERT statement generates new segments and +publishes them at the end of its run. 
For this reason, it is best suited to loading data in larger batches. Do not use
+INSERT statements to load data in a sequence of microbatches; for that, use [streaming ingestion](../ingestion/index.md#streaming) instead.
+
+For more information about the syntax, see [INSERT](./reference.md#insert).
+
+
+### Overwrite data with REPLACE
+
+REPLACE statements can create a new datasource or overwrite data in an existing datasource. In Druid SQL, unlike standard SQL, there is no syntactical difference between creating a table and overwriting data in a table. Druid does not include a `CREATE TABLE` statement.
+
+REPLACE uses an [OVERWRITE clause](reference.md#replace-specific-time-ranges) to determine which data to overwrite. You can overwrite an entire table, or a specific time range of a table. When you overwrite a specific time range, that time range must align with the granularity specified in the PARTITIONED BY clause.
+
+REPLACE statements acquire an exclusive write lock to the target time range of the target datasource. No other ingestion or compaction operations may proceed for that time range while the task is running. However, ingestion and compaction operations may proceed for other time ranges.
+
+Nearly all SELECT capabilities are available for `REPLACE ... SELECT` queries. Certain exceptions are listed on the [Known issues](./known-issues.md#select) page.
+
+For more information about the syntax, see [REPLACE](./reference.md#replace).
+
+### Primary timestamp
+
+Druid tables always include a primary timestamp named `__time`.
+
+It is common to set a primary timestamp by using [date and time functions](../querying/sql-scalar.md#date-and-time-functions); for example: `TIME_PARSE("timestamp", 'yyyy-MM-dd HH:mm:ss') AS __time`.
+
+The `__time` column is used for [partitioning by time](#partitioning-by-time). If you use `PARTITIONED BY ALL` or `PARTITIONED BY ALL TIME`, partitioning by time is disabled. In these cases, you do not need to include a `__time` column in your INSERT statement. However, Druid still creates a `__time` column in your Druid table and sets all timestamps to 1970-01-01 00:00:00.
+
+For more information, see [Primary timestamp](../ingestion/data-model.md#primary-timestamp).
+
+
+### Partitioning by time
+
+INSERT and REPLACE statements require the PARTITIONED BY clause, which determines how time-based partitioning is done. In Druid, data is split into one or more segments per time chunk, defined by the PARTITIONED BY granularity.
+
+Partitioning by time is important for three reasons:
+
+1. Queries that filter by `__time` (SQL) or `intervals` (native) are able to use time partitioning to prune the set of segments to consider.
+2. Certain data management operations, such as overwriting and compacting existing data, acquire exclusive write locks on time partitions. Finer-grained partitioning allows finer-grained exclusive write locks.
+3. Each segment file is wholly contained within a time partition. Too-fine-grained partitioning may cause a large number of small segments, which leads to poor performance.
+
+`PARTITIONED BY HOUR` and `PARTITIONED BY DAY` are the most common choices to balance these considerations. `PARTITIONED BY ALL` is suitable if your dataset does not have a [primary timestamp](#primary-timestamp).
+
+For more information about the syntax, see [PARTITIONED BY](./reference.md#partitioned-by).
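To make this concrete, the following sketch shows an ingestion query that parses a string timestamp into `__time` and partitions the results by hour. The table name `wikipedia_hourly` is hypothetical; the input is the Wikipedia sample file used in other examples in these docs.

```sql
-- Illustrative sketch only: "wikipedia_hourly" is a hypothetical table name.
INSERT INTO "wikipedia_hourly"
SELECT
  TIME_PARSE("timestamp") AS __time,  -- primary timestamp parsed from a string column
  "page",
  "user"
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "user", "type": "string"}]'
  )
)
PARTITIONED BY HOUR  -- each hour becomes its own time chunk, containing one or more segments
```

Switching `PARTITIONED BY HOUR` to `PARTITIONED BY DAY` or a coarser grain changes only how time chunks are laid out; the rest of the query stays the same.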
+ +### Clustering + +Within each time chunk defined by [time partitioning](#partitioning-by-time), data can be further split by the optional +[CLUSTERED BY](reference.md#clustered-by) clause. + +For example, suppose you ingest 100 million rows per hour using `PARTITIONED BY HOUR` and `CLUSTERED BY hostName`. The +ingestion task will generate segments of roughly 3 million rows — the default value of +[`rowsPerSegment`](reference.md#context-parameters) — with lexicographic ranges of `hostName`s grouped into segments. + +Clustering is important for two reasons: + +1. Lower storage footprint due to improved locality, and therefore improved compressibility. +2. Better query performance due to dimension-based segment pruning, which removes segments from consideration when they + cannot possibly contain data matching a query's filter. This speeds up filters like `x = 'foo'` and `x IN ('foo', + 'bar')`. + +To activate dimension-based pruning, these requirements must be met: + +- Segments were generated by a REPLACE statement, not an INSERT statement. +- All CLUSTERED BY columns are single-valued string columns. + +If these requirements are _not_ met, Druid still clusters data during ingestion, but will not be able to perform +dimension-based segment pruning at query time. You can tell if dimension-based segment pruning is possible by using the +`sys.segments` table to inspect the `shard_spec` for the segments generated by an ingestion query. If they are of type +`range` or `single`, then dimension-based segment pruning is possible. Otherwise, it is not. The shard spec type is also +available in the **Segments** view under the **Partitioning** column. + +For more information about syntax, see [CLUSTERED BY](./reference.md#clustered-by). + +### Rollup + +[Rollup](../ingestion/rollup.md) is a technique that pre-aggregates data during ingestion to reduce the amount of data +stored. Intermediate aggregations are stored in the generated segments, and further aggregation is done at query time. +This reduces storage footprint and improves performance, often dramatically. + +To perform ingestion with rollup: + +1. Use GROUP BY. The columns in the GROUP BY clause become dimensions, and aggregation functions become metrics. +2. Set [`finalizeAggregations: false`](reference.md#context-parameters) in your context. This causes aggregation + functions to write their internal state to the generated segments, instead of the finalized end result, and enables + further aggregation at query time. +3. Wrap all multi-value strings in `MV_TO_ARRAY(...)` and set [`groupByEnableMultiValueUnnesting: + false`](reference.md#context-parameters) in your context. This ensures that multi-value strings are left alone and + remain lists, instead of being [automatically unnested](../querying/sql-data-types.md#multi-value-strings) by the + GROUP BY operator. + +When you do all of these things, Druid understands that you intend to do an ingestion with rollup, and it writes +rollup-related metadata into the generated segments. Other applications can then use [`segmentMetadata` +queries](../querying/segmentmetadataquery.md) to retrieve rollup-related information. + +If you see the error "Encountered multi-value dimension `x` that cannot be processed with +groupByEnableMultiValueUnnesting set to false", then wrap that column in `MV_TO_ARRAY(x) AS x`. 
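As a minimal sketch of the three steps above, the following hypothetical query performs an hourly rollup while ingesting. It assumes the query context sets `finalizeAggregations: false` and `groupByEnableMultiValueUnnesting: false`, and the table name, column names, and input URL are placeholders. A multi-value string column would additionally be wrapped in `MV_TO_ARRAY`.

```sql
-- Illustrative sketch only; assumes finalizeAggregations: false and
-- groupByEnableMultiValueUnnesting: false in the query context.
INSERT INTO "kttm_rollup"
SELECT
  TIME_FLOOR(TIME_PARSE("timestamp"), 'PT1H') AS __time,     -- truncate to the hourly rollup grain
  "agent_type",                                              -- GROUP BY column becomes a dimension
  COUNT(*) AS "cnt",                                         -- switch to SUM("cnt") at query time
  APPROX_COUNT_DISTINCT_DS_HLL("session") AS "session_hll"   -- stored as a non-finalized sketch
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/kttm.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "agent_type", "type": "string"}, {"name": "session", "type": "string"}]'
  )
)
GROUP BY 1, 2
PARTITIONED BY DAY
```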
+ +The following [aggregation functions](../querying/sql-aggregations.md) are supported for rollup at ingestion time: +`COUNT` (but switch to `SUM` at query time), `SUM`, `MIN`, `MAX`, `EARLIEST` ([string only](known-issues.md#select)), +`LATEST` ([string only](known-issues.md#select)), `APPROX_COUNT_DISTINCT`, `APPROX_COUNT_DISTINCT_BUILTIN`, +`APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, and `DS_QUANTILES_SKETCH` (but switch to +`APPROX_QUANTILE_DS` at query time). Do not use `AVG`; instead, use `SUM` and `COUNT` at ingest time and compute the +quotient at query time. + +For an example, see [INSERT with rollup example](examples.md#insert-with-rollup). + +## Multi-stage query tasks + +### Execution flow + +When you execute a SQL statement using the task endpoint [`/druid/v2/sql/task`](api.md#submit-a-query), the following +happens: + +1. The Broker plans your SQL query into a native query, as usual. + +2. The Broker wraps the native query into a task of type `query_controller` + and submits it to the indexing service. + +3. The Broker returns the task ID to you and exits. + +4. The controller task launches some number of worker tasks determined by + the `maxNumTasks` and `taskAssignment` [context parameters](./reference.md#context-parameters). You can set these settings individually for each query. + +5. Worker tasks of type `query_worker` execute the query. + +6. If the query is a SELECT query, the worker tasks send the results + back to the controller task, which writes them into its task report. + If the query is an INSERT or REPLACE query, the worker tasks generate and + publish new Druid segments to the provided datasource. + +### Parallelism + +The [`maxNumTasks`](./reference.md#context-parameters) query parameter determines the maximum number of tasks your +query will use, including the one `query_controller` task. Generally, queries perform better with more workers. The +lowest possible value of `maxNumTasks` is two (one worker and one controller). Do not set this higher than the number of +free slots available in your cluster; doing so will result in a [TaskStartTimeout](reference.md#error-codes) error. + +When [reading external data](#extern), EXTERN can read multiple files in parallel across +different worker tasks. However, EXTERN does not split individual files across multiple worker tasks. If you have a +small number of very large input files, you can increase query parallelism by splitting up your input files. + +The `druid.worker.capacity` server property on each [Middle Manager](../design/architecture.md#druid-services) +determines the maximum number of worker tasks that can run on each server at once. Worker tasks run single-threaded, +which also determines the maximum number of processors on the server that can contribute towards multi-stage queries. + +### Memory usage + +Increasing the amount of available memory can improve performance in certain cases: + +- Segment generation becomes more efficient when data doesn't spill to disk as often. +- Sorting stage output data becomes more efficient since available memory affects the + number of required sorting passes. + +Worker tasks use both JVM heap memory and off-heap ("direct") memory. + +On Peons launched by Middle Managers, the bulk of the JVM heap (75%) is split up into two bundles of equal size: one +processor bundle and one worker bundle. Each one comprises 37.5% of the available JVM heap. + +The processor memory bundle is used for query processing and segment generation. 
Each processor bundle must also
+provide space to buffer I/O between stages. Specifically, each downstream stage requires 1 MB of buffer space for each upstream worker. For example, if you have 100 workers running in stage 0, and stage 1 reads from stage 0, then each worker in stage 1 requires 1 MB * 100 = 100 MB of memory for frame buffers.
+
+The worker memory bundle is used for sorting stage output data prior to shuffle. Workers can sort more data than fits in memory; in this case, they will switch to using disk.
+
+Worker tasks also use off-heap ("direct") memory. Set the amount of direct memory available (`-XX:MaxDirectMemorySize`) to at least `(druid.processing.numThreads + 1) * druid.processing.buffer.sizeBytes`. Increasing the amount of direct memory available beyond the minimum does not speed up processing.
+
+### Disk usage
+
+Worker tasks use local disk for four purposes:
+
+- Temporary copies of input data. Each temporary file is deleted before the next one is read. You only need enough temporary disk space to store one input file at a time per task.
+- Temporary data related to segment generation. You only need enough temporary disk space to store one segment's worth of data at a time per task. This is generally less than 2 GB per task.
+- External sort of data prior to shuffle. Requires enough space to store a compressed copy of the entire output dataset for a task.
+- Storing stage output data during a shuffle. Requires enough space to store a compressed copy of the entire output dataset for a task.
+
+Workers use the task working directory, given by [`druid.indexer.task.baseDir`](../configuration/index.md#additional-peon-configuration), for these items. It is important that this directory has enough space available for these purposes.
diff --git a/docs/multi-stage-query/msq-example-queries.md b/docs/multi-stage-query/examples.md
similarity index 95%
rename from docs/multi-stage-query/msq-example-queries.md
rename to docs/multi-stage-query/examples.md
index d810d536cc7..eed42fec4b6 100644
--- a/docs/multi-stage-query/msq-example-queries.md
+++ b/docs/multi-stage-query/examples.md
@@ -23,9 +23,11 @@ sidebar_label: Examples
 ~ under the License.
 -->
 
-> SQL-based ingestion using the multi-stage query task engine is our recommended solution starting in Druid 24.0. Alternative ingestion solutions, such as native batch and Hadoop-based ingestion systems, will still be supported. We recommend you read all [known issues](./msq-known-issues.md) and test the feature in a development environment before rolling it out in production. Using the multi-stage query task engine with `SELECT` statements that do not write to a datasource is experimental.
+> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
 
-These example queries show you some of the things you can do when modifying queries for your use case. Copy the example queries into the **Query** view of the Druid console and run them to see what they do.
+These example queries show you some of the things you can do when modifying queries for your use case. Copy the example queries into the **Query** view of the web console and run them to see what they do.
## INSERT with no rollup
@@ -75,7 +77,7 @@ CLUSTERED BY channel
 ## INSERT with rollup
-This example inserts data into a table named `kttm_data` and performs data rollup. This example implements the recommendations described in [multi-value dimensions](./index.md#multi-value-dimensions).
+This example inserts data into a table named `kttm_data` and performs data rollup. It implements the recommendations described in [Rollup](./concepts.md#rollup).
Show the query @@ -472,8 +474,3 @@ LIMIT 1000 ```
- -## Next steps - -* [Read Multi-stage queries](./msq-example-queries.md) to learn more about how multi-stage queries work. -* [Explore the Query view](../operations/druid-console.md) to learn about the UI tools to help you get started. diff --git a/docs/multi-stage-query/index.md b/docs/multi-stage-query/index.md index 07c943fbe7c..d97de6dd633 100644 --- a/docs/multi-stage-query/index.md +++ b/docs/multi-stage-query/index.md @@ -1,7 +1,7 @@ --- id: index -title: SQL-based ingestion overview and syntax -sidebar_label: Overview and syntax +title: SQL-based ingestion +sidebar_label: Overview description: Introduces multi-stage query architecture and its task engine --- @@ -24,319 +24,50 @@ description: Introduces multi-stage query architecture and its task engine ~ under the License. --> -> SQL-based ingestion using the multi-stage query task engine is our recommended solution starting in Druid 24.0. Alternative ingestion solutions, such as native batch and Hadoop-based ingestion systems, will still be supported. We recommend you read all [known issues](./msq-known-issues.md) and test the feature in a development environment before rolling it out in production. Using the multi-stage query task engine with `SELECT` statements that do not write to a datasource is experimental. +> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) +> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which +> ingestion method is right for you. -SQL-based ingestion for Apache Druid uses a distributed multi-stage query architecture, which includes a query engine called the multi-stage query task engine (MSQ task engine). The MSQ task engine extends Druid's query capabilities, so you can write queries that reference [external data](#read-external-data) as well as perform ingestion with SQL [INSERT](#insert-data) and [REPLACE](#replace-data). Essentially, you can perform SQL-based ingestion instead of using JSON ingestion specs that Druid's native ingestion uses. +Apache Druid supports SQL-based ingestion using the bundled [`druid-multi-stage-query` extension](#load-the-extension). +This extension adds a [multi-stage query task engine for SQL](concepts.md#sql-task-engine) that allows running SQL +[INSERT](concepts.md#insert) and [REPLACE](concepts.md#replace) statements as batch tasks. -The MSQ task engine excels at executing queries that can get bottlenecked at the Broker when using Druid's native SQL engine. When you submit queries, the MSQ task engine splits them into stages and automatically exchanges data between stages. Each stage is parallelized to run across multiple data servers at once, simplifying performance. +Nearly all SELECT capabilities are available for `INSERT ... SELECT` and `REPLACE ... SELECT` queries, with certain +exceptions listed on the [Known issues](./known-issues.md#select) page. This allows great flexibility to apply +transformations, filters, JOINs, aggregations, and so on while ingesting data. This also allows in-database +transformation: creating new tables based on queries of other tables. +## Vocabulary -## MSQ task engine features +- **Controller**: An indexing service task of type `query_controller` that manages + the execution of a query. There is one controller task per query. -In its current state, the MSQ task engine enables you to do the following: +- **Worker**: Indexing service tasks of type `query_worker` that execute a + query. 
There can be multiple worker tasks per query. Internally, + the tasks process items in parallel using their processing pools (up to `druid.processing.numThreads` of execution parallelism + within a worker task). -- Read external data at query time using EXTERN. -- Execute batch ingestion jobs by writing SQL queries using INSERT and REPLACE. You no longer need to generate a JSON-based ingestion spec. -- Transform and rewrite existing tables using SQL. -- Perform multi-dimension range partitioning reliably, which leads to more evenly distributed segment sizes and better performance. +- **Stage**: A stage of query execution that is parallelized across + worker tasks. Workers exchange data with each other between stages. -The MSQ task engine has additional features that can be used as part of a proof of concept or demo, but don't use or rely on the following features for any meaningful use cases, especially production ones: - -- Execute heavy-weight queries and return large numbers of rows. -- Execute queries that exchange large amounts of data between servers, like exact count distinct of high-cardinality fields. +- **Partition**: A slice of data output by worker tasks. In INSERT or REPLACE + queries, the partitions of the final stage become Druid segments. +- **Shuffle**: Workers exchange data between themselves on a per-partition basis in a process called + shuffling. During a shuffle, each output partition is sorted by a clustering key. ## Load the extension -For new clusters that use 24.0 or later, the multi-stage query extension is loaded by default. If you want to add the extension to an existing cluster, add the extension `druid-multi-stage-query` to `druid.extensions.loadlist` in your `common.runtime.properties` file. +To add the extension to an existing cluster, add `druid-multi-stage-query` to `druid.extensions.loadlist` in your +`common.runtime.properties` file. For more information about how to load an extension, see [Loading extensions](../development/extensions.md#loading-extensions). -To use EXTERN, you need READ permission on the resource named "EXTERNAL" of the resource type "EXTERNAL". If you encounter a 403 error when trying to use EXTERN, verify that you have the correct permissions. - -## MSQ task engine query syntax - -You can submit queries to the MSQ task engine through the **Query** view in the Druid console or through the API. The Druid console is a good place to start because you can preview a query before you run it. You can also experiment with many of the [context parameters](./msq-reference.md#context-parameters) through the UI. Once you're comfortable with submitting queries through the Druid console, [explore using the API to submit a query](./msq-api.md#submit-a-query). - -If you encounter an issue after you submit a query, you can learn more about what an error means from the [limits](./msq-concepts.md#limits) and [errors](./msq-concepts.md#error-codes). - -Queries for the MSQ task engine involve three primary functions: - -- EXTERN to query external data -- INSERT INTO ... SELECT to insert data, such as data from an external source -- REPLACE to replace existing datasources, partially or fully, with query results - -For information about the syntax for queries, see [SQL syntax](./msq-reference.md#sql-syntax). - -### Read external data - -Query tasks can access external data through the EXTERN function. When using EXTERN, keep in mind that large files do not get split across different worker tasks. 
If you have fewer input files than worker tasks, you can increase query parallelism by splitting up your input files such that you have at least one input file per worker task. - -You can use the EXTERN function anywhere a table is expected in the following form: `TABLE(EXTERN(...))`. You can use external data with SELECT, INSERT, and REPLACE queries. - -The following query reads external data: - -```sql -SELECT - * -FROM TABLE( - EXTERN( - '{"type": "http", "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]}', - '{"type": "json"}', - '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "user", "type": "string"}]' - ) -) -LIMIT 100 -``` - -For more information about the syntax, see [EXTERN](./msq-reference.md#extern). - -### Insert data - -With the MSQ task engine, Druid can use the results of a query task to create a new datasource or to append to an existing datasource. Syntactically, there is no difference between the two. These operations use the INSERT INTO ... SELECT syntax. - -All SELECT capabilities are available for INSERT queries. However, the MSQ task engine does not include all the existing SQL query features of Druid. See [Known issues](./msq-known-issues.md) for a list of capabilities that aren't available. - -The following example query inserts data from an external source into a table named `w000` and partitions it by day: - -```sql -INSERT INTO w000 -SELECT - TIME_PARSE("timestamp") AS __time, - "page", - "user" -FROM TABLE( - EXTERN( - '{"type": "http", "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]}', - '{"type": "json"}', - '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "user", "type": "string"}]' - ) -) -PARTITIONED BY DAY -``` - -For more information about the syntax, see [INSERT](./msq-reference.md#insert). - -### Replace data - -The syntax for REPLACE is similar to INSERT. All SELECT functionality is available for REPLACE queries. -Note that the MSQ task engine does not yet implement all native Druid query features. -For details, see [Known issues](./msq-known-issues.md). - -When working with REPLACE queries, keep the following in mind: - -- The intervals generated as a result of the OVERWRITE WHERE query must align with the granularity specified in the PARTITIONED BY clause. -- OVERWRITE WHERE queries only support the `__time` column. - -For more information about the syntax, see [REPLACE](./msq-reference.md#replace). - -The following examples show how to replace data in a table. - -#### REPLACE all data - -You can replace all the data in a table by using REPLACE INTO ... OVERWRITE ALL SELECT: - -```sql -REPLACE INTO w000 -OVERWRITE ALL -SELECT - TIME_PARSE("timestamp") AS __time, - "page", - "user" -FROM TABLE( - EXTERN( - '{"type": "http", "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]}', - '{"type": "json"}', - '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "user", "type": "string"}]' - ) -) -PARTITIONED BY DAY -``` - -#### REPLACE some data - -You can replace some of the data in a table by using REPLACE INTO ... OVERWRITE WHERE ... 
SELECT: - -```sql -REPLACE INTO w000 -OVERWRITE WHERE __time >= TIMESTAMP '2019-08-25' AND __time < TIMESTAMP '2019-08-28' -SELECT - TIME_PARSE("timestamp") AS __time, - "page", - "user" -FROM TABLE( - EXTERN( - '{"type": "http", "uris": ["https://druid.apache.org/data/wikipedia.json.gz"]}', - '{"type": "json"}', - '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "user", "type": "string"}]' - ) -) -PARTITIONED BY DAY -``` - -## Adjust query behavior - -In addition to the basic functions, you can further modify your query behavior to control how your queries run or what your results look like. You can control how your queries behave by changing the following: - -### Primary timestamp - -Druid tables always include a primary timestamp named `__time`, so your ingestion query should generally include a column named `__time`. - -The following formats are supported for `__time` in the source data: -- ISO 8601 with 'T' separator, such as "2000-01-01T01:02:03.456" -- Milliseconds since Unix epoch (00:00:00 UTC on January 1, 1970) - -The `__time` column is used for time-based partitioning, such as `PARTITIONED BY DAY`. - -If you use `PARTITIONED BY ALL` or `PARTITIONED BY ALL TIME`, time-based -partitioning is disabled. In these cases, your ingestion query doesn't need -to include a `__time` column. However, Druid still creates a `__time` column -in your Druid table and sets all timestamps to 1970-01-01 00:00:00. - -For more information, see [Primary timestamp](../ingestion/data-model.md#primary-timestamp). - -### PARTITIONED BY - -INSERT and REPLACE queries require the PARTITIONED BY clause, which determines how time-based partitioning is done. In Druid, data is split into segments, one or more per time chunk defined by the PARTITIONED BY granularity. A good general rule is to adjust the granularity so that each segment contains about five million rows. Choose a granularity based on your ingestion rate. For example, if you ingest a million rows per day, PARTITIONED BY DAY is good. If you ingest a million rows an hour, choose PARTITION BY HOUR instead. - -Using the clause provides the following benefits: - -- Better query performance due to time-based segment pruning, which removes segments from - consideration when they do not contain any data for a query's time filter. -- More efficient data management, as data can be rewritten for each time partition individually - rather than the whole table. - -You can use the following arguments for PARTITIONED BY: - -- Time unit: `HOUR`, `DAY`, `MONTH`, or `YEAR`. Equivalent to `FLOOR(__time TO TimeUnit)`. -- `TIME_FLOOR(__time, 'granularity_string')`, where granularity_string is an ISO 8601 period like - 'PT1H'. The first argument must be `__time`. -- `FLOOR(__time TO TimeUnit)`, where `TimeUnit` is any unit supported by the [FLOOR function](../querying/sql-scalar.md#date-and-time-functions). The first - argument must be `__time`. -- `ALL` or `ALL TIME`, which effectively disables time partitioning by placing all data in a single - time chunk. To use LIMIT or OFFSET at the outer level of your INSERT or REPLACE query, you must set PARTITIONED BY to ALL or ALL TIME. - -You can use the following ISO 8601 periods for `TIME_FLOOR`: - -- PT1S -- PT1M -- PT5M -- PT10M -- PT15M -- PT30M -- PT1H -- PT6H -- P1D -- P1W -- P1M -- P3M -- P1Y - - -### CLUSTERED BY - -Data is first divided by the PARTITIONED BY clause. Data can be further split by the CLUSTERED BY clause. 
For example, suppose you ingest 100 M rows per hour and use `PARTITIONED BY HOUR` as your time partition. You then divide up the data further by adding `CLUSTERED BY hostName`. The result is segments of about 5 million rows, with like `hostName`s grouped within the same segment. - -Using CLUSTERED BY has the following benefits: - -- Lower storage footprint due to combining similar data into the same segments, which improves - compressibility. -- Better query performance due to dimension-based segment pruning, which removes segments from - consideration when they cannot possibly contain data matching a query's filter. - -For dimension-based segment pruning to be effective, your queries should meet the following conditions: - -- All CLUSTERED BY columns are single-valued string columns -- Use a REPLACE query for ingestion - -Druid still clusters data during ingestion if these conditions aren't met but won't perform dimension-based segment pruning at query time. That means if you use an INSERT query for ingestion or have numeric columns or multi-valued string columns, dimension-based segment pruning doesn't occur at query time. - -You can tell if dimension-based segment pruning is possible by using the `sys.segments` table to -inspect the `shard_spec` for the segments generated by an ingestion query. If they are of type -`range` or `single`, then dimension-based segment pruning is possible. Otherwise, it is not. The -shard spec type is also available in the **Segments** view under the **Partitioning** -column. - -You can use the following filters for dimension-based segment pruning: - -- Equality to string literals, like `x = 'foo'` or `x IN ('foo', 'bar')`. -- Comparison to string literals, like `x < 'foo'` or other comparisons involving `<`, `>`, `<=`, or `>=`. - -This differs from multi-dimension range based partitioning in classic batch ingestion where both -string and numeric columns support Broker-level pruning. With SQL-based batch ingestion, -only string columns support Broker-level pruning. - -It is okay to mix time partitioning with secondary partitioning. For example, you can -combine `PARTITIONED BY HOUR` with `CLUSTERED BY channel` to perform -time partitioning by hour and secondary partitioning by channel within each hour. - -### GROUP BY - -A query's GROUP BY clause determines how data is rolled up. The expressions in the GROUP BY clause become -dimensions, and aggregation functions become metrics. - -### Ingest-time aggregations - -When performing rollup using aggregations, it is important to use aggregators -that return nonfinalized state. This allows you to perform further rollups -at query time. To achieve this, set `finalizeAggregations: false` in your -ingestion query context. - -Check out the [INSERT with rollup example query](./msq-example-queries.md#insert-with-rollup) to see this feature in -action. - -Druid needs information for aggregating measures of different segments to compact. For example, to aggregate `count("col") as example_measure`, Druid needs to sum the value of `example_measure` -across the segments. This information is stored inside the metadata of the segment. For the SQL-based ingestion, Druid only populates the -aggregator information of a column in the segment metadata when: - -- The INSERT or REPLACE query has an outer GROUP BY clause. 
-- The following context parameters are set for the query context: `finalizeAggregations: false` and `groupByEnableMultiValueUnnesting: false` - -The following table lists query-time aggregations for SQL-based ingestion: - -|Query-time aggregation|Notes| -|----------------------|-----| -|SUM|Use unchanged at ingest time.| -|MIN|Use unchanged at ingest time.| -|MAX|Use unchanged at ingest time.| -|AVG|Use SUM and COUNT at ingest time. Switch to quotient of SUM at query time.| -|COUNT|Use unchanged at ingest time, but switch to SUM at query time.| -|COUNT(DISTINCT expr)|If approximate, use APPROX_COUNT_DISTINCT at ingest time.

If exact, you cannot use an ingest-time aggregation. Instead, `expr` must be stored as-is. Add it to the SELECT and GROUP BY lists.| -|EARLIEST(expr)

(numeric form)|Not supported.| -|EARLIEST(expr, maxBytes)

(string form)|Use unchanged at ingest time.| -|LATEST(expr)

(numeric form)|Not supported.| -|LATEST(expr, maxBytes)

(string form)|Use unchanged at ingest time.| -|APPROX_COUNT_DISTINCT|Use unchanged at ingest time.| -|APPROX_COUNT_DISTINCT_BUILTIN|Use unchanged at ingest time.| -|APPROX_COUNT_DISTINCT_DS_HLL|Use unchanged at ingest time.| -|APPROX_COUNT_DISTINCT_DS_THETA|Use unchanged at ingest time.| -|APPROX_QUANTILE|Not supported. Deprecated; use APPROX_QUANTILE_DS instead.| -|APPROX_QUANTILE_DS|Use DS_QUANTILES_SKETCH at ingest time. Continue using APPROX_QUANTILE_DS at query time.| -|APPROX_QUANTILE_FIXED_BUCKETS|Not supported.| - -### Multi-value dimensions - -By default, multi-value dimensions are not ingested as expected when rollup is enabled because the -GROUP BY operator unnests them instead of leaving them as arrays. This is [standard behavior](../querying/sql-data-types.md#multi-value-strings) for GROUP BY but it is generally not desirable behavior for ingestion. - -To address this: - -- When using GROUP BY with data from EXTERN, wrap any string type fields from EXTERN that may be - multi-valued in `MV_TO_ARRAY`. -- Set `groupByEnableMultiValueUnnesting: false` in your query context to ensure that all multi-value - strings are properly converted to arrays using `MV_TO_ARRAY`. If any strings aren't - wrapped in `MV_TO_ARRAY`, the query reports an error that includes the message "Encountered - multi-value dimension x that cannot be processed with groupByEnableMultiValueUnnesting set to false." - -For an example, see [INSERT with rollup example query](./msq-example-queries.md#insert-with-rollup). - -### Context parameters - -Context parameters can control things such as how many tasks get launched or what happens if there's a malformed record. - -For a full list of context parameters and how they affect a query, see [Context parameters](./msq-reference.md#context-parameters). +To use [EXTERN](reference.md#extern), you need READ permission on the resource named "EXTERNAL" of the resource type +"EXTERNAL". If you encounter a 403 error when trying to use EXTERN, verify that you have the correct permissions. ## Next steps -* [Understand how the multi-stage query architecture works](./msq-concepts.md) by reading about the concepts behind it and its processes. -* [Explore the Query view](../operations/druid-console.md) to learn about the UI tools that can help you get started. \ No newline at end of file +* [Read about key concepts](./concepts.md) to learn more about how SQL-based ingestion and multi-stage queries work. +* [Check out the examples](./examples.md) to see SQL-based ingestion in action. +* [Explore the Query view](../operations/web-console.md) to get started in the web console. diff --git a/docs/multi-stage-query/msq-known-issues.md b/docs/multi-stage-query/known-issues.md similarity index 62% rename from docs/multi-stage-query/msq-known-issues.md rename to docs/multi-stage-query/known-issues.md index b8b5747ac1b..8021e1e305b 100644 --- a/docs/multi-stage-query/msq-known-issues.md +++ b/docs/multi-stage-query/known-issues.md @@ -23,41 +23,46 @@ sidebar_label: Known issues ~ under the License. --> -> SQL-based ingestion using the multi-stage query task engine is our recommended solution starting in Druid 24.0. -> Alternative ingestion solutions, such as native batch and Hadoop-based ingestion systems, will still be supported. -> We recommend you read all [known issues](./msq-known-issues.md) and test the feature in a development environment -> before rolling it out in production. 
Using the multi-stage query task engine with `SELECT` statements that do not -> write to a datasource is experimental. +> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) +> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which +> ingestion method is right for you. ## Multi-stage query task runtime - Fault tolerance is not implemented. If any task fails, the entire query fails. -- SELECT from a Druid datasource does not include unpublished real-time data. - -- GROUPING SETS is not implemented. Queries that use GROUPING SETS fail. - - Worker task stage outputs are stored in the working directory given by `druid.indexer.task.baseDir`. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with -an [UnknownError](./msq-reference.md#error-codes) with a message including "No space left on device". +an [UnknownError](./reference.md#error-codes) with a message including "No space left on device". + +## SELECT + +- SELECT from a Druid datasource does not include unpublished real-time data. + +- GROUPING SETS and UNION ALL are not implemented. Queries using these features return a + [QueryNotSupported](reference.md#error-codes) error. - The numeric varieties of the EARLIEST and LATEST aggregators do not work properly. Attempting to use the numeric varieties of these aggregators lead to an error like `java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`. The string varieties, however, do work properly. -## INSERT and REPLACE +## INSERT and REPLACE -- INSERT with column lists, like `INSERT INTO tbl (a, b, c) SELECT ...`, is not implemented. +- INSERT and REPLACE with column lists, like `INSERT INTO tbl (a, b, c) SELECT ...`, is not implemented. -- `INSERT ... SELECT` inserts columns from the SELECT statement based on column name. This differs from SQL standard -behavior, where columns are inserted based on position. +- `INSERT ... SELECT` and `REPLACE ... SELECT` insert columns from the SELECT statement based on column name. This +differs from SQL standard behavior, where columns are inserted based on position. + +- INSERT and REPLACE do not support all options available in [ingestion specs](../ingestion/ingestion-spec.md), +including the `createBitmapIndex` and `multiValueHandling` [dimension](../ingestion/ingestion-spec.md#dimension-objects) +properties, and the `indexSpec` [`tuningConfig`](../ingestion/ingestion-spec.md#tuningconfig) property. ## EXTERN - The [schemaless dimensions](../ingestion/ingestion-spec.md#inclusions-and-exclusions) feature is not available. All columns and their types must be specified explicitly using the `signature` parameter - of the [EXTERN function](msq-reference.md#extern). + of the [EXTERN function](reference.md#extern). - EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task. diff --git a/docs/multi-stage-query/msq-concepts.md b/docs/multi-stage-query/msq-concepts.md deleted file mode 100644 index a7d9cf893ac..00000000000 --- a/docs/multi-stage-query/msq-concepts.md +++ /dev/null @@ -1,168 +0,0 @@ ---- -id: concepts -title: SQL-based ingestion concepts -sidebar_label: Key concepts ---- - - - -> SQL-based ingestion using the multi-stage query task engine is our recommended solution starting in Druid 24.0. 
Alternative ingestion solutions, such as native batch and Hadoop-based ingestion systems, will still be supported. We recommend you read all [known issues](./msq-known-issues.md) and test the feature in a development environment before rolling it out in production. Using the multi-stage query task engine with `SELECT` statements that do not write to a datasource is experimental. - -This topic covers the main concepts and terminology of the multi-stage query architecture. - -## Vocabulary - -You might see the following terms in the documentation or while you're using the multi-stage query architecture and task engine, such as when you view the report for a query: - -- **Controller**: An indexing service task of type `query_controller` that manages - the execution of a query. There is one controller task per query. - -- **Worker**: Indexing service tasks of type `query_worker` that execute a - query. There can be multiple worker tasks per query. Internally, - the tasks process items in parallel using their processing pools (up to `druid.processing.numThreads` of execution parallelism - within a worker task). - -- **Stage**: A stage of query execution that is parallelized across - worker tasks. Workers exchange data with each other between stages. - -- **Partition**: A slice of data output by worker tasks. In INSERT or REPLACE - queries, the partitions of the final stage become Druid segments. - -- **Shuffle**: Workers exchange data between themselves on a per-partition basis in a process called - shuffling. During a shuffle, each output partition is sorted by a clustering key. - -## How the MSQ task engine works - -Query tasks, specifically queries for INSERT, REPLACE, and SELECT, execute using indexing service tasks. Every query occupies at least two task slots while running. - -When you submit a query task to the MSQ task engine, the following happens: - -1. The Broker plans your SQL query into a native query, as usual. - -2. The Broker wraps the native query into a task of type `query_controller` - and submits it to the indexing service. - -3. The Broker returns the task ID to you and exits. - -4. The controller task launches some number of worker tasks determined by - the `maxNumTasks` and `taskAssignment` [context parameters](./msq-reference.md#context-parameters). You can set these settings individually for each query. - -5. The worker tasks execute the query. - -6. If the query is a SELECT query, the worker tasks send the results - back to the controller task, which writes them into its task report. - If the query is an INSERT or REPLACE query, the worker tasks generate and - publish new Druid segments to the provided datasource. - - -## Parallelism - -Parallelism affects performance. - -The [`maxNumTasks`](./msq-reference.md#context-parameters) query parameter determines the maximum number of tasks (workers and one controller) your query will use. Generally, queries perform better with more workers. The lowest possible value of `maxNumTasks` is two (one worker and one controller), and the highest possible value is equal to the number of free task slots in your cluster. - -The `druid.worker.capacity` server property on each Middle Manager determines the maximum number -of worker tasks that can run on each server at once. Worker tasks run single-threaded, which -also determines the maximum number of processors on the server that can contribute towards -multi-stage queries. 
Since data servers are shared between Historicals and -Middle Managers, the default setting for `druid.worker.capacity` is lower than the number of -processors on the server. Advanced users may consider enhancing parallelism by increasing this -value to one less than the number of processors on the server. In most cases, this increase must -be accompanied by an adjustment of the memory allotment of the Historical process, -Middle-Manager-launched tasks, or both, to avoid memory overcommitment and server instability. If -you are not comfortable tuning these memory usage parameters to avoid overcommitment, it is best -to stick with the default `druid.worker.capacity`. - -## Memory usage - -Increasing the amount of available memory can improve performance as follows: - -- Segment generation becomes more efficient when data doesn't spill to disk as often. -- Sorting stage output data becomes more efficient since available memory affects the - number of required sorting passes. - -Worker tasks use both JVM heap memory and off-heap ("direct") memory. - -On Peons launched by Middle Managers, the bulk of the JVM heap (75%) is split up into two bundles of equal size: one processor bundle and one worker bundle. Each one comprises 37.5% of the available JVM heap. - -The processor memory bundle is used for query processing and segment generation. Each processor bundle must -also provides space to buffer I/O between stages. Specifically, each downstream stage requires 1 MB of buffer space for -each upstream worker. For example, if you have 100 workers running in stage 0, and stage 1 reads from stage 0, -then each worker in stage 1 requires 1M * 100 = 100 MB of memory for frame buffers. - -The worker memory bundle is used for sorting stage output data prior to shuffle. Workers can sort -more data than fits in memory; in this case, they will switch to using disk. - -Worker tasks also use off-heap ("direct") memory. Set the amount of direct -memory available (`-XX:MaxDirectMemorySize`) to at least -`(druid.processing.numThreads + 1) * druid.processing.buffer.sizeBytes`. Increasing the -amount of direct memory available beyond the minimum does not speed up processing. - -It may be necessary to override one or more memory-related parameters if you run into one of the [known issues](./msq-known-issues.md) around memory usage. - -## Limits - -Knowing the limits for the MSQ task engine can help you troubleshoot any [errors](#error-codes) that you encounter. Many of the errors occur as a result of reaching a limit. - -The following table lists query limits: - -|Limit|Value|Error if exceeded| -|-----|-----|-----------------| -| Size of an individual row written to a frame. Row size when written to a frame may differ from the original row size. | 1 MB | `RowTooLarge` | -| Number of segment-granular time chunks encountered during ingestion. | 5,000 | `TooManyBuckets` | -| Number of input files/segments per worker. | 10,000 | `TooManyInputFiles` | -| Number of output partitions for any one stage. Number of segments generated during ingestion. |25,000 | `TooManyPartitions` | -| Number of output columns for any one stage. | 2,000 | `TooManyColumns` | -| Number of workers for any one stage. | Hard limit is 1,000. Memory-dependent soft limit may be lower. | `TooManyWorkers` | -| Maximum memory occupied by broadcasted tables. | 30% of each [processor memory bundle](#memory-usage). 
| `BroadcastTablesTooLarge` | - -## Error codes - -The following table describes error codes you may encounter in the `multiStageQuery.payload.status.errorReport.error.errorCode` field: - -|Code|Meaning|Additional fields| -|----|-----------|----| -| BroadcastTablesTooLarge | The size of the broadcast tables, used in right hand side of the joins, exceeded the memory reserved for them in a worker task. | `maxBroadcastTablesSize`: Memory reserved for the broadcast tables, measured in bytes. | -| Canceled | The query was canceled. Common reasons for cancellation:

  • User-initiated shutdown of the controller task via the `/druid/indexer/v1/task/{taskId}/shutdown` API.
  • Restart or failure of the server process that was running the controller task.
| | -| CannotParseExternalData | A worker task could not parse data from an external datasource. | | -| ColumnNameRestricted| The query uses a restricted column name. | | -| ColumnTypeNotSupported| Support for writing or reading from a particular column type is not supported. | | -| ColumnTypeNotSupported | The query attempted to use a column type that is not supported by the frame format. This occurs with ARRAY types, which are not yet implemented for frames. | `columnName`

`columnType` | -| InsertCannotAllocateSegment | The controller task could not allocate a new segment ID due to conflict with existing segments or pending segments. Common reasons for such conflicts:

  • Attempting to mix different granularities in the same intervals of the same datasource.
  • Prior ingestions that used non-extendable shard specs.
| `dataSource`

`interval`: The interval for the attempted new segment allocation. | -| InsertCannotBeEmpty | An INSERT or REPLACE query did not generate any output rows in a situation where output rows are required for success. This can happen for INSERT or REPLACE queries with `PARTITIONED BY` set to something other than `ALL` or `ALL TIME`. | `dataSource` | -| InsertCannotOrderByDescending | An INSERT query contained a `CLUSTERED BY` expression in descending order. Druid's segment generation code only supports ascending order. | `columnName` | -| InsertCannotReplaceExistingSegment | A REPLACE query cannot proceed because an existing segment partially overlaps those bounds, and the portion within the bounds is not fully overshadowed by query results.

There are two ways to address this without modifying your query:
  • Shrink the OVERLAP filter to match the query results.
  • Expand the OVERLAP filter to fully contain the existing segment.
| `segmentId`: The existing segment
-| InsertLockPreempted | An INSERT or REPLACE query was canceled by a higher-priority ingestion job, such as a real-time ingestion task. | | -| InsertTimeNull | An INSERT or REPLACE query encountered a null timestamp in the `__time` field.

This can happen due to using an expression like `TIME_PARSE(timestamp) AS __time` with a timestamp that cannot be parsed. (TIME_PARSE returns null when it cannot parse a timestamp.) In this case, try parsing your timestamps using a different function or pattern.

If your timestamps may genuinely be null, consider using COALESCE to provide a default value. One option is CURRENT_TIMESTAMP, which represents the start time of the job. | -| InsertTimeOutOfBounds | A REPLACE query generated a timestamp outside the bounds of the TIMESTAMP parameter for your OVERWRITE WHERE clause.

To avoid this error, verify that the you specified is valid. | `interval`: time chunk interval corresponding to the out-of-bounds timestamp | -| InvalidNullByte | A string column included a null byte. Null bytes in strings are not permitted. | `column`: The column that included the null byte | -| QueryNotSupported | QueryKit could not translate the provided native query to a multi-stage query.

This can happen if the query uses features that aren't supported, like GROUPING SETS. | | -| RowTooLarge | The query tried to process a row that was too large to write to a single frame. See the [Limits](#limits) table for the specific limit on frame size. Note that the effective maximum row size is smaller than the maximum frame size due to alignment considerations during frame writing. | `maxFrameSize`: The limit on the frame size. | -| TaskStartTimeout | Unable to launch all the worker tasks in time.

There might be insufficient available slots to start all the worker tasks simultaneously.

Try splitting up the query into smaller chunks with lesser `maxNumTasks` number. Another option is to increase capacity. | | -| TooManyBuckets | Exceeded the number of partition buckets for a stage. Partition buckets are only used for `segmentGranularity` during INSERT queries. The most common reason for this error is that your `segmentGranularity` is too narrow relative to the data. See the [Limits](./msq-concepts.md#limits) table for the specific limit. | `maxBuckets`: The limit on buckets. | -| TooManyInputFiles | Exceeded the number of input files/segments per worker. See the [Limits](./msq-concepts.md#limits) table for the specific limit. | `umInputFiles`: The total number of input files/segments for the stage.

`maxInputFiles`: The maximum number of input files/segments per worker per stage.

`minNumWorker`: The minimum number of workers required for a successful run. | -| TooManyPartitions | Exceeded the number of partitions for a stage. The most common reason for this is that the final stage of an INSERT or REPLACE query generated too many segments. See the [Limits](./msq-concepts.md#limits) table for the specific limit. | `maxPartitions`: The limit on partitions which was exceeded | -| TooManyColumns | Exceeded the number of columns for a stage. See the [Limits](#limits) table for the specific limit. | `maxColumns`: The limit on columns which was exceeded. | -| TooManyWarnings | Exceeded the allowed number of warnings of a particular type. | `rootErrorCode`: The error code corresponding to the exception that exceeded the required limit.

`maxWarnings`: Maximum number of warnings that are allowed for the corresponding `rootErrorCode`. | -| TooManyWorkers | Exceeded the supported number of workers running simultaneously. See the [Limits](#limits) table for the specific limit. | `workers`: The number of simultaneously running workers that exceeded a hard or soft limit. This may be larger than the number of workers in any one stage if multiple stages are running simultaneously.

`maxWorkers`: The hard or soft limit on workers that was exceeded. | -| NotEnoughMemory | Insufficient memory to launch a stage. | `serverMemory`: The amount of memory available to a single process.

`serverWorkers`: The number of workers running in a single process.

`serverThreads`: The number of threads in a single process. | -| WorkerFailed | A worker task failed unexpectedly. | `workerTaskId`: The ID of the worker task. | -| WorkerRpcFailed | A remote procedure call to a worker task failed and could not recover. | `workerTaskId`: the id of the worker task | -| UnknownError | All other errors. | | \ No newline at end of file diff --git a/docs/multi-stage-query/msq-reference.md b/docs/multi-stage-query/msq-reference.md deleted file mode 100644 index 10af0422712..00000000000 --- a/docs/multi-stage-query/msq-reference.md +++ /dev/null @@ -1,169 +0,0 @@ ---- -id: reference -title: SQL-based ingestion reference -sidebar_label: Reference ---- - - - -> SQL-based ingestion using the multi-stage query task engine is our recommended solution starting in Druid 24.0. Alternative ingestion solutions, such as native batch and Hadoop-based ingestion systems, will still be supported. We recommend you read all [known issues](./msq-known-issues.md) and test the feature in a development environment before rolling it out in production. Using the multi-stage query task engine with `SELECT` statements that do not write to a datasource is experimental. - -This topic is a reference guide for the multi-stage query architecture in Apache Druid. - -## Context parameters - -In addition to the Druid SQL [context parameters](../querying/sql-query-context.md), the multi-stage query task engine accepts certain context parameters that are specific to it. - -Use context parameters alongside your queries to customize the behavior of the query. If you're using the API, include the context parameters in the query context when you submit a query: - -```json -{ - "query": "SELECT 1 + 1", - "context": { - "": "", - "maxNumTasks": 3 - } -} -``` - -If you're using the Druid console, you can specify the context parameters through various UI options. - -The following table lists the context parameters for the MSQ task engine: - -|Parameter|Description|Default value| -|---------|-----------|-------------| -| maxNumTasks | SELECT, INSERT, REPLACE

The maximum total number of tasks to launch, including the controller task. The lowest possible value for this setting is 2: one controller and one worker. All tasks must be able to launch simultaneously. If they cannot, the query returns a `TaskStartTimeout` error code after approximately 10 minutes.

May also be provided as `numTasks`. If both are present, `maxNumTasks` takes priority.| 2 | -| taskAssignment | SELECT, INSERT, REPLACE

Determines how many tasks to use. Possible values include:
  • `max`: Uses as many tasks as possible, up to `maxNumTasks`.
  • `auto`: When file sizes can be determined through directory listing (for example: local files, S3, GCS, HDFS) uses as few tasks as possible without exceeding 10 GiB or 10,000 files per task, unless exceeding these limits is necessary to stay within `maxNumTasks`. When file sizes cannot be determined through directory listing (for example: http), behaves the same as `max`.
  • | `max` | -| finalizeAggregations | SELECT, INSERT, REPLACE

    Determines the type of aggregation to return. If true, Druid finalizes the results of complex aggregations that directly appear in query results. If false, Druid returns the aggregation's intermediate type rather than finalized type. This parameter is useful during ingestion, where it enables storing sketches directly in Druid tables. For more information about aggregations, see [SQL aggregation functions](../querying/sql-aggregations.md). | true | -| rowsInMemory | INSERT or REPLACE

    Maximum number of rows to store in memory at once before flushing to disk during the segment generation process. Ignored for non-INSERT queries. In most cases, use the default value. You may need to override the default if you run into one of the [known issues](./msq-known-issues.md) around memory usage. | 100,000 | -| segmentSortOrder | INSERT or REPLACE

    Normally, Druid sorts rows in individual segments using `__time` first, followed by the [CLUSTERED BY](./index.md#clustered-by) clause. When you set `segmentSortOrder`, Druid sorts rows in segments using this column list first, followed by the CLUSTERED BY order.

    You provide the column list as comma-separated values or as a JSON array in string form. If your query includes `__time`, then this list must begin with `__time`. For example, consider an INSERT query that uses `CLUSTERED BY country` and has `segmentSortOrder` set to `__time,city`. Within each time chunk, Druid assigns rows to segments based on `country`, and then within each of those segments, Druid sorts those rows by `__time` first, then `city`, then `country`. | empty list | -| maxParseExceptions| SELECT, INSERT, REPLACE

    Maximum number of parse exceptions that are ignored while executing the query before it stops with `TooManyWarningsFault`. To ignore all the parse exceptions, set the value to -1.| 0 | -| rowsPerSegment | INSERT or REPLACE

    The number of rows per segment to target. The actual number of rows per segment may be somewhat higher or lower than this number. In most cases, use the default. For general information about sizing rows per segment, see [Segment Size Optimization](../operations/segment-optimization.md). | 3,000,000 | -| sqlTimeZone | Sets the time zone for this connection, which affects how time functions and timestamp literals behave. Use a time zone name like "America/Los_Angeles" or offset like "-08:00".| `druid.sql.planner.sqlTimeZone` on the Broker (default: UTC)| -| useApproximateCountDistinct | Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.| `druid.sql.planner.useApproximateCountDistinct` on the Broker (default: true)| - -## Error codes - -Error codes have corresponding human-readable messages that explain the error. For more information about the error codes, see [Error codes](./msq-concepts.md#error-codes). - -## SQL syntax - -The MSQ task engine has three primary SQL functions: - -- EXTERN -- INSERT -- REPLACE - -For information about using these functions and their corresponding examples, see [MSQ task engine query syntax](./index.md#msq-task-engine-query-syntax). For information about adjusting the shape of your data, see [Adjust query behavior](./index.md#adjust-query-behavior). - -### EXTERN - -Use the EXTERN function to read external data. - -Function format: - -```sql -SELECT - -FROM TABLE( - EXTERN( - '', - '', - '' - ) -) -``` - -EXTERN consists of the following parts: - -1. Any [Druid input source](../ingestion/native-batch-input-source.md) as a JSON-encoded string. -2. Any [Druid input format](../ingestion/data-formats.md) as a JSON-encoded string. -3. A row signature, as a JSON-encoded array of column descriptors. Each column descriptor must have a `name` and a `type`. The type can be `string`, `long`, `double`, or `float`. This row signature is used to map the external data into the SQL layer. - -### INSERT - -Use the INSERT function to insert data. - -Unlike standard SQL, INSERT inserts data according to column name and not positionally. This means that it is important for the output column names of subsequent INSERT queries to be the same as the table. Do not rely on their positions within the SELECT clause. - -Function format: - -```sql -INSERT INTO
-SELECT - -FROM
-PARTITIONED BY
+< SELECT query > +PARTITIONED BY