---
id: index
title: SQL-based ingestion
sidebar_label: SQL-based ingestion
description: Introduces multi-stage query architecture and its task engine
---

:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
:::

Apache Druid supports SQL-based ingestion using the bundled [`druid-multi-stage-query` extension](#load-the-extension). This extension adds a [multi-stage query task engine for SQL](concepts.md#multi-stage-query-task-engine) that allows running SQL [INSERT](concepts.md#insert) and [REPLACE](concepts.md#replace) statements as batch tasks. As an experimental feature, the task engine also supports running `SELECT` queries as batch tasks.

Nearly all `SELECT` capabilities are available in the multi-stage query (MSQ) task engine, with certain exceptions listed on the [Known issues](./known-issues.md#select-statement) page. As a result, you can apply transformations, filters, JOINs, aggregations, and so on as part of `INSERT ... SELECT` and `REPLACE ... SELECT` statements. This also allows in-database transformation: creating new tables based on queries of other tables.

## Vocabulary

- **Controller**: An indexing service task of type `query_controller` that manages the execution of a query. There is one controller task per query.

- **Worker**: Indexing service tasks of type `query_worker` that execute a query. There can be multiple worker tasks per query. Internally, each worker task processes items in parallel using its processing pool, with up to `druid.processing.numThreads` of execution parallelism within a worker task.

- **Stage**: A stage of query execution that is parallelized across worker tasks. Workers exchange data with each other between stages.

- **Partition**: A slice of data output by worker tasks. In INSERT or REPLACE queries, the partitions of the final stage become Druid segments.

- **Shuffle**: Workers exchange data between themselves on a per-partition basis in a process called shuffling. During a shuffle, each output partition is sorted by a clustering key.

## Load the extension

To add the extension to an existing cluster, add `druid-multi-stage-query` to `druid.extensions.loadList` in your `common.runtime.properties` file.

For more information about how to load an extension, see [Loading extensions](../configuration/extensions.md#loading-extensions).

To use [EXTERN](reference.md#extern-function), you need READ permission on the resource named "EXTERNAL" of the resource type "EXTERNAL". If you encounter a 403 error when trying to use `EXTERN`, verify that you have the correct permissions. The same applies to the input-source-specific table functions, such as `S3` or `LOCALFILES`.

## Next steps

- [Read about key concepts](./concepts.md) to learn more about how SQL-based ingestion and multi-stage queries work.
- [Check out the examples](./examples.md) to see SQL-based ingestion in action.
- [Explore the Query view](../operations/web-console.md) to get started in the web console.
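
As a quick reference for the step described under "Load the extension", the entry in `common.runtime.properties` might look like the following. This is a minimal sketch: the other extension names in the list are placeholders for whatever your cluster already loads, and only `druid-multi-stage-query` comes from this page.

```properties
# common.runtime.properties
# Append druid-multi-stage-query to the extensions your cluster already loads.
# The first two entries are illustrative placeholders, not requirements.
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-multi-stage-query"]
```

Keep the existing entries in the list when you add the new one. A changed load list typically takes effect only after the Druid services are restarted.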