---
id: known-issues
title: SQL-based ingestion known issues
sidebar_label: Known issues
---

> This page describes SQL-based batch ingestion using the `druid-multi-stage-query` extension, new in Druid 24.0. Refer to the ingestion methods table to determine which ingestion method is right for you.

## Multi-stage query task runtime

- Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.

- Worker task stage outputs are stored in the working directory given by `druid.indexer.task.baseDir`. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an `UnknownError` with a message including "No space left on device".

## SELECT

- `SELECT` from a Druid datasource does not include unpublished real-time data.

- `GROUPING SETS` and `UNION ALL` are not implemented. Queries that use these features return a `QueryNotSupported` error.

- For some `COUNT DISTINCT` queries, you'll encounter a `QueryNotSupported` error that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use `GROUPING SETS`, which are not implemented.

- The numeric varieties of the `EARLIEST` and `LATEST` aggregators do not work properly. Attempting to use the numeric varieties of these aggregators leads to an error like `java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`. The string varieties, however, do work properly. See the sketch after this list for an example.
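For illustration, the following queries sketch the difference between the numeric and string varieties of these aggregators. The datasource `wikipedia` and its columns `added` (numeric) and `page` (string) are hypothetical, and the exact error text can vary.

```sql
-- Fails: the numeric variety of EARLIEST is not supported and produces an
-- error like the ClassCastException mentioned above.
SELECT EARLIEST(added) FROM wikipedia;

-- Works: the string variety (which takes a maxBytesPerString argument)
-- behaves as expected.
SELECT EARLIEST(page, 1024) FROM wikipedia;
```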

## INSERT and REPLACE

- `INSERT` and `REPLACE` with column lists, like `INSERT INTO tbl (a, b, c) SELECT ...`, are not implemented.

- `INSERT ... SELECT` and `REPLACE ... SELECT` insert columns from the `SELECT` statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position. To control which target column a value lands in, alias the `SELECT` output columns to the target column names, as shown in the sketch after this list.

- `INSERT` and `REPLACE` do not support all options available in ingestion specs, including the `createBitmapIndex` and `multiValueHandling` dimension properties, and the `indexSpec` `tuningConfig` property.
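The following sketch shows the name-based matching described above. The target table `tbl` (with columns `__time`, `a`, and `b`), the source datasource `source_table`, and its columns `ts`, `col_x`, and `col_y` are hypothetical.

```sql
-- Not implemented: an explicit column list on the target table.
-- INSERT INTO tbl (a, b, c) SELECT ...

-- Instead, alias each SELECT output column to the name of the target column
-- it should populate. PARTITIONED BY is required for INSERT and REPLACE.
INSERT INTO tbl
SELECT
  TIME_PARSE(ts) AS __time,
  col_x AS a,
  col_y AS b
FROM source_table
PARTITIONED BY DAY
```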

## EXTERN

- The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the `signature` parameter of the `EXTERN` function, as in the sketch after this list.

- `EXTERN` with input sources that match large numbers of files may exhaust available memory on the controller task.

- `EXTERN` does not accept `druid` input sources. Use `FROM` instead.
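For reference, here is a sketch of `EXTERN` with an explicit `signature`, followed by the `FROM` form for reading an existing Druid datasource. The input URI, the datasource name `my_datasource`, and the column names are hypothetical.

```sql
-- Read external data: the third EXTERN argument is the signature, which
-- lists every column and its type explicitly.
SELECT
  TIME_PARSE("timestamp") AS __time,
  page,
  added
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/data.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
);

-- Read an existing Druid datasource with FROM rather than a druid input source.
SELECT page, added FROM my_datasource;
```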