---
id: known-issues
title: SQL-based ingestion known issues
sidebar_label: Known issues
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
## Multi-stage query task runtime
- Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.
- Worker task stage outputs are stored in the working directory given by `druid.indexer.task.baseDir`. Stages that
  generate a large amount of output data may exhaust all available disk space. In this case, the query fails with
  an [UnknownError](./reference.md#error_UnknownError) with a message including "No space left on device".
## `SELECT` Statement
- `SELECT` from a Druid datasource does not include unpublished real-time data.
- `GROUPING SETS` and `UNION ALL` are not implemented. Queries using these features return a
  [QueryNotSupported](reference.md#error_QueryNotSupported) error.
- For some `COUNT DISTINCT` queries, you'll encounter a [QueryNotSupported](reference.md#error_QueryNotSupported) error
  that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use
  `GROUPING SETS`, which are not implemented.
- The numeric varieties of the `EARLIEST` and `LATEST` aggregators do not work properly. Attempting to use the numeric
  varieties of these aggregators leads to an error like
  `java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`.
  The string varieties, however, do work properly.
## `INSERT` and `REPLACE` Statements
- `INSERT` and `REPLACE` statements with column lists, like `INSERT INTO tbl (a, b, c) SELECT ...`, are not implemented.
- `INSERT ... SELECT` and `REPLACE ... SELECT` insert columns from the `SELECT` statement based on column name. This
  differs from SQL standard behavior, where columns are inserted based on position.
- `INSERT` and `REPLACE` do not support all options available in [ingestion specs](../ingestion/ingestion-spec.md),
  including the `createBitmapIndex` and `multiValueHandling` [dimension](../ingestion/ingestion-spec.md#dimension-objects)
  properties, and the `indexSpec` [`tuningConfig`](../ingestion/ingestion-spec.md#tuningconfig) property.
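
Because column lists are not implemented and columns are matched by name, one way to control which target column each value lands in is to alias every expression in the `SELECT`. The following is a minimal sketch, not a definitive recipe. The datasource names `tbl` and `src` and the columns `x`, `y`, `a`, and `b` are hypothetical.

```sql
-- Hypothetical names: "tbl" is the target datasource, "src" is an existing datasource.
-- Not implemented: INSERT INTO tbl (a, b) SELECT x, y FROM src
-- Columns are matched by name, so alias each expression to its target column.
INSERT INTO tbl
SELECT
  __time,   -- the primary timestamp keeps its name
  x AS a,   -- written to column "a" regardless of its position in the SELECT list
  y AS b    -- written to column "b"
FROM src
PARTITIONED BY DAY
```

Reordering the aliased expressions in the `SELECT` list should not change which column receives which value, since matching is by name rather than position.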
## `EXTERN` Function
- The [schemaless dimensions](../ingestion/ingestion-spec.md#inclusions-and-exclusions)
  feature is not available. All columns and their types must be specified explicitly using the `signature` parameter
  of the [`EXTERN` function](reference.md#extern-function).
- `EXTERN` with input sources that match large numbers of files may exhaust available memory on the controller task.
- `EXTERN` refers to external files. Use `FROM` to access `druid` input sources.
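
As an illustration of specifying every column explicitly, the sketch below passes an input source, an input format, and a full `signature` to `EXTERN`. The URI and the column names (`timestamp`, `page`, `added`) are placeholders chosen for this example, not values from this page.

```sql
-- Hypothetical input: the URI and column names below are placeholders.
SELECT
  TIME_PARSE("timestamp") AS __time,
  page,
  added
FROM TABLE(
  EXTERN(
    -- input source
    '{"type": "http", "uris": ["https://example.com/data.json"]}',
    -- input format
    '{"type": "json"}',
    -- signature: every column to read, with its type, listed explicitly
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
```

Columns present in the input but absent from the signature are not read, so a new field in the source data needs a matching entry added to the signature.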