* When an ArrayList RAC creates a child RAC, the start and end offsets need to have the offset of parent's start offset
* Defaults the 2nd window bound to CURRENT ROW when only a single bound is specified
* Removes the windowingStrictValidation warning and throws a hard exception when Order By alongside RANGE clause is not provided with UNBOUNDED or CURRENT ROW as both bounds
Description:
Compaction operations issued by the Coordinator currently run using the native query engine.
As majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative
that we support compaction on MSQ to make Compaction more robust and possibly faster.
For instance, we have seen OOM errors in native compaction that MSQ could have handled by its
auto-calculation of tuning parameters.
This commit enables compaction on MSQ to remove the dependency on native engine.
Main changes:
* `DataSourceCompactionConfig` now has an additional field `engine` that can be one of
`[native, msq]` with `native` being the default.
* if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the
launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which
could happen in case a fraction of the tasks were allotted and they eventually fell short of the number
of tasks required by the MSQ engine to run the compaction.
* `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field.
* `CompactionTask` now has `CompactionRunner` interface instance with its implementations
`NativeCompactinRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension.
The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the
`CompactionRunner` instance that is mapped to the specified type [`native`, `msq`].
* `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks.
* `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in
the segment schema. If present, the task is created with a native group-by query; if not, the task is
issued with a scan query. The `storeCompactionState` flag is set in the context.
* Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the
final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask`
since otherwise, the workers will be unable to determine the controller task's location for communication
(as they haven't been launched via the overlord).
temp fix until CALCITE-6435 gets fixed (released&upgraded to)
added a custom rule (FixIncorrectInExpansionTypes) to fix-up types of the affected literals
added a testcase which will alert on upgrade
* first pass
* more changes
* fix tests and formatting
* fix kinesis failing tests
* fix kafka tests
* add dimension name to float parse errors
* double and convertToType handling of dimensionName can report parse errors with dimension name
* fix checkstyle issue
* fix tests
* more cases to have better parse exception messages
* fix test
* fix tests
* partially address comments
* annotate method parameter with nullable
* address comments
* fix tests
* let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils
* fix compilation error and clean up formatting
* clean up whitespace
* address feedback. undo change, pass down report parse exception for convertToType
* fix test
index_realtime tasks were removed from the documentation in #13107. Even
at that time, they weren't really documented per se— just mentioned. They
existed solely to support Tranquility, which is an obsolete ingestion
method that predates migration of Druid to ASF and is no longer being
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in #11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.
index_realtime_appenderator tasks existed in the code base, but were
never documented, nor as far as I am aware were they used for any purpose.
This patch removes both task types completely, as well as removes all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to be firmer that it is not compatible. (Previously,
the stub doc said it wasn't recommended, and pointed out that it is
built against an ancient 0.9.2 version of Druid.)
ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion.
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
* Turn invalid periods into user-facing exception providing more context.
The current exception is targeting the ADMIN persona. Catch that and turn
it into a USER persona instead. Also, provide more context in the error
message.
* Review comment: pass the wrapping expression and stringify.
* Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
---------
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* contains Make a full copy of the parser and apply our modifications to it #16503
* some minor api changes pair/entry
* some unnecessary aggregation was removed from a set of queries in `CalciteSubqueryTest`
* `AliasedOperatorConversion` was detecting `CHAR_LENGTH` as not a function ; I've removed the check
* the field it was using doesn't look maintained that much
* the `kind` is passed for the created `SqlFunction` so I don't think this check is actually needed
* some decoupled test cases become broken - will be fixed later
* some aggregate related changes: due to the fact that SUM() and COUNT() of no inputs are different
* upgrade avatica to 1.25.0
* `CalciteQueryTest#testExactCountDistinctWithFilter` is now executable
Closeapache/druid#16503
* fix NestedDataColumnIndexerV4 to not report cardinality
changes:
* fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem
* fix some issues with sql benchmarks
This fixes an issue where in some cases, a SQL syntax error encountered when parsing / planning a query results in an error returned to the user with persona a `admin` when it should instead be `user`.
* Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr.
This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable
operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr.
In addition:
- Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition
to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well.
- In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes
it easier to identify tests that can now vectorize.
- Fix a null-matcher bug in StringObjectVectorValueMatcher.
* Fix tests.
* Fixes.
* Fix tests.
* Fix test.
* Fix test.
* Simplify serialized form of JsonInputFormat.
Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and
useJsonNodeReader. Because the default value of keepNullColumns is
variable, we store the original configured value rather than the
derived value, and include if the original value is nonnull.
* Fix test.
UnnestStorageAdapter and its cursors did not return capabilities correctly
for the output column. This patch fixes two problems:
1) UnnestStorageAdapter returned the capabilities of the unnest virtual
column prior to unnesting. It should return the post-unnest capabilities.
2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from
the unnest virtual column. This is incorrect, because the dimension selector
created by this class never has a dictionary. This is the cause of #16543.
* Add interface method for returning canonical lookup name
* Address review comment
* Add test in LookupReferencesManagerTest for coverage check
* Add test in LookupSerdeModuleTest for coverage check