Custom Calcite rule mimicking `AggregateProjectMergeRule`, extended to support expressions.
The stock Calcite rule returns null in such cases.
In addition, this removes redundant references.
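A minimal sketch of the distinction, assuming Calcite's planner API; the class and helper below are illustrative, not the actual Druid rule:

```java
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.util.ImmutableBitSet;

public class AggregateProjectMergeSketch
{
  /**
   * Returns the group keys that refer to Project expressions rather than
   * plain input references. Calcite's AggregateProjectMergeRule gives up
   * (returns null) as soon as this set is non-empty; a custom rule can
   * instead pull these expressions into the merged node's group keys.
   */
  static ImmutableBitSet expressionGroupKeys(Aggregate aggregate, Project project)
  {
    final ImmutableBitSet.Builder builder = ImmutableBitSet.builder();
    for (int key : aggregate.getGroupSet()) {
      final RexNode expr = project.getProjects().get(key);
      if (!(expr instanceof RexInputRef)) {
        builder.set(key);
      }
    }
    return builder.build();
  }
}
```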
* change to using measure name
* Implement order by delta
* less parsing, stricter types
* safeDivide0
* fix no query
* new DTQ allows parsing JSON_VALUE(...RETURNING...)
MSQ sorts columns in a highly specialized manner, using byte comparisons over specially serialized values. This works well for primitive types and primitive arrays; complex types, however, cannot be serialized this way.
This PR adds support for sorting complex columns by deserializing the value from the field and comparing it via the type strategy. This is much slower than byte comparison, but it is the only way to support sorting on complex columns, whose serialization can be arbitrary and is not optimized for MSQ.
Primitives and arrays are still compared via byte comparison, so this does not affect the performance of queries supported before the patch. If a sorting key mixes complex and primitive/primitive-array types, for example `longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC`, the comparison proceeds as follows (see the sketch after this list):
- longCol1, longCol2 (ASC): compared together via byte comparison, since both are byte-comparable and sorted in ascending order
- complexCol1 (DESC): compared via deserialization; cannot be grouped with any other field
- complexCol2 (DESC): compared via deserialization; cannot be grouped with any other field, even though the prior field was a complex column with the same order
- stringCol1, longCol3 (DESC): compared together via byte comparison, since both are byte-comparable and sorted in descending order
- longCol4 (ASC): compared via byte comparison; could not be coalesced with the previous fields since the direction differs
This way, we only deserialize fields where required.
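A hedged sketch of the grouping described above; the types and method are illustrative stand-ins, not MSQ's actual field-comparison code:

```java
import java.util.ArrayList;
import java.util.List;

public class SortRunSketch
{
  record SortField(String name, boolean descending, boolean byteComparable) {}

  // Group consecutive byte-comparable fields with the same direction into a
  // single byte-comparison run; every complex (non-byte-comparable) field
  // forms a run of its own and is compared via deserialization.
  static List<List<SortField>> comparisonRuns(List<SortField> key)
  {
    final List<List<SortField>> runs = new ArrayList<>();
    for (SortField field : key) {
      final List<SortField> last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
      final boolean extendsLast =
          field.byteComparable()
          && last != null
          && last.get(0).byteComparable()
          && last.get(0).descending() == field.descending();
      if (extendsLast) {
        last.add(field);
      } else {
        final List<SortField> run = new ArrayList<>();
        run.add(field);
        runs.add(run);
      }
    }
    return runs;
  }
}
```

Applied to the example key above, this yields exactly the five runs listed: {longCol1, longCol2}, {complexCol1}, {complexCol2}, {stringCol1, longCol3}, {longCol4}.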
* Altered `QueryTestBuilder` to be able to switch to a backing quidem test
* added a small CRC to ensure that the shadow testcase does not deviate from the original one
* Packaged all decoupled-related things into a single `DecoupledExtension` to reduce copy-paste
* `DecoupledTestConfig#quidemReason` must describe why it's being used
* `DecoupledTestConfig#separateDefaultModeTest` can be used to make multiple case files based on `NullHandling` state
* fixed a cosmetic bug during decoupled join translation
* enhanced `!druidPlan` to report the final logical plan in non-decoupled mode as well
* add check to ensure that only supported params are present in a `druidtest` URI
* enabled shadow testcases for previously disabled testcases
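A hypothetical illustration of how the `DecoupledTestConfig` options above might be used on a testcase; the attribute usage mirrors the bullet points, but the exact shape of the annotation and the test body are assumptions, not verified API:

```java
// Illustrative only: attribute values are examples; the testcase is assumed
// to be backed by a quidem case file as described above.
@DecoupledTestConfig(
    quidemReason = "validates the final logical plan of the join translation",
    separateDefaultModeTest = true // produce one case file per NullHandling state
)
@Test
public void testJoinTranslation()
{
  // testcase body; the quidem shadow test is generated from this case,
  // with the CRC guarding against divergence from the original
}
```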
Changes:
- Remove `SegmentLockReleaseAction` as it is not used anywhere.
It is not even registered as a known sub-type of `TaskAction`.
- Minor refactor in `TaskLockbox`. No functional change.
- Remove `ExpectedException` from `TaskLockboxTest`
Changes:
- Remove deprecated `markAsUnused` parameter from `KillUnusedSegmentsTask`
- Allow `kill` task to use `REPLACE` lock when `useConcurrentLocks` is true
- Use `EXCLUSIVE` lock by default
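A minimal sketch of the selection rule, assuming Druid's `TaskLockType` enum (which defines `REPLACE` and `EXCLUSIVE`); the helper itself is illustrative, not the actual `KillUnusedSegmentsTask` code:

```java
import org.apache.druid.indexing.common.TaskLockType;

public class KillLockSelectionSketch
{
  // REPLACE when concurrent locks are enabled, EXCLUSIVE otherwise.
  static TaskLockType killTaskLockType(boolean useConcurrentLocks)
  {
    return useConcurrentLocks ? TaskLockType.REPLACE : TaskLockType.EXCLUSIVE;
  }
}
```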
* enable quidem uri support for `druidtest:///?ComponentSupplier=Nested` and similar
* changes the way `SqlTestFrameworkConfig` is applied; every option now has its own annotation (it's essentially impossible to detect whether an annotation attribute was set explicitly or left at its default)
* enables hierarchical processing of config annotations (needed to enable class-level supplier annotations)
* moves the URI-to-config parsing logic into `SqlTestFrameworkConfig`
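A hedged example of the per-option annotation style; the annotation and supplier class names below are assumptions modeled on the `druidtest:///?ComponentSupplier=Nested` URI above, not verified API:

```java
// Hypothetical: each config option gets its own annotation, and a class-level
// annotation is picked up hierarchically by all test methods beneath it.
@SqlTestFrameworkConfig.ComponentSupplier(NestedComponentSupplier.class)
public class NestedDataQueriesTest
{
  @Test
  public void testSomething()
  {
    // runs with the Nested component supplier from the class-level annotation;
    // a method-level annotation could override other options
  }
}
```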
With this PR's changes, MSQ tasks (`MSQControllerTask` and `MSQWorkerTask`) only load the lookups required during querying and ingestion, based on the value of the `CTX_LOOKUPS_TO_LOAD` key in the query context.
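A sketch of what setting this context key might look like; the string key `"lookupsToLoad"` and the value shape are assumptions, only the constant name `CTX_LOOKUPS_TO_LOAD` comes from the description above:

```java
import java.util.List;
import java.util.Map;

public class LookupContextExample
{
  public static void main(String[] args)
  {
    // Hypothetical query context: ask the MSQ task to load only these lookups.
    Map<String, Object> queryContext = Map.of(
        "lookupsToLoad", List.of("countryLookup", "currencyLookup") // assumed key name and value shape
    );
    System.out.println(queryContext);
  }
}
```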
Remove software.amazon.ion:ion-java from the licenses as it is no longer a transitive dependency of aws-java-sdk-core.
Verified that aws-java-sdk-core no longer has ion-java as a dependency as of version 1.12.638.
Fixes a few minor issues with scripts.
- Add additional information, since it was confusing and not clear that the number was the ID from GitHub and not just the major version number.
- Fix an issue where the milestone displayed in an output message was the milestone supplied as an argument, instead of the milestone the PR is already tagged with in GitHub (taken from the response to the sent request).
Add validation for reindex with realtime sources.
With the addition of concurrent compaction, it is possible for an MSQ job to ingest into a datasource while querying realtime sources of that same datasource. This could potentially lead to issues if the interval being ingested into is replaced by an MSQ job that has queried only some of the data from the realtime task.
This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.
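A hedged sketch of the check, with illustrative names (not the actual MSQ validation code):

```java
import java.util.Set;

public class RealtimeReindexValidation
{
  // Reject an ingestion whose destination datasource is also being read
  // from realtime sources by the same query.
  static void validate(String destinationDataSource, Set<String> realtimeSources)
  {
    if (realtimeSources.contains(destinationDataSource)) {
      throw new IllegalArgumentException(
          "Cannot ingest into datasource[" + destinationDataSource
          + "] while querying realtime sources of the same datasource"
      );
    }
  }
}
```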
* specify node type so that the log filename can get resolved
* Update distribution/docker/druid.sh
Co-authored-by: Benedict Jin <asdf2014@apache.org>
This parameter has been removed for a while now, as of Druid 0.23.0:
https://github.com/apache/druid/pull/12187.
The code was only used in tests to verify that serialization works.
Now remove all references to avoid any confusion.
* Add native filter conversion for SCALAR_IN_ARRAY.
Main changes:
1) Add an implementation of "toDruidFilter" in ScalarInArrayOperatorConversion.
2) Split up Expressions.literalToDruidExpression into two functions, so the first
half (literalToExprEval) can be used by ScalarInArrayOperatorConversion to more
efficiently create the list of match values (see the sketch after this commit list).
* Fix type in time arithmetic conversion.
* Test updates.
* Update test cases to use null instead of '' in default-value mode.
* Switch test from msqIncompatible to compatible with a different result.
* Update one more test.
* Fix test.
* Update tests.
* Use ExprEvalWrapper to differentiate between empty string and null.
* Fix tests some more.
* Fix test.
* Additional comment.
* Style adjustment.
* Fix tests.
* trueValue -> actualValue.
* Use different approach, DruidLiteral instead of ExprEvalWrapper.
* Revert changes in ArrayOfDoublesSketchSqlAggregatorTest.
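A hedged sketch of the refactor described in change (2); the method names mirror the description, but the signatures are simplified stand-ins for the real Calcite/Druid types:

```java
import java.util.ArrayList;
import java.util.List;

public class ScalarInArraySketch
{
  // Stand-in for the extracted first half of literalToDruidExpression:
  // evaluate one literal into a constant value (an ExprEval in the real code,
  // converted from a RexLiteral; simplified here).
  static Object literalToExprEval(Object literal)
  {
    return literal;
  }

  // Build the match-value list for SCALAR_IN_ARRAY(x, ARRAY[a, b, ...])
  // without constructing a full DruidExpression per element.
  static List<Object> matchValues(List<Object> arrayLiterals)
  {
    final List<Object> values = new ArrayList<>(arrayLiterals.size());
    for (Object literal : arrayLiterals) {
      values.add(literalToExprEval(literal));
    }
    return values;
  }
}
```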
* QueryableIndex: Close columns after failed vector cursor setup.
If anything fails while setting up a vector cursor, the prior code in
QueryableIndex would not close its ColumnCache and would therefore leak
columns. Columns often contain references to buffers that must be closed.
* Fix style.
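A minimal sketch of the pattern, with illustrative names rather than the actual `QueryableIndex` code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Callable;

public class CloseOnFailure
{
  // If setup throws partway through, close the column cache so that
  // already-acquired columns (and their buffers) are released, not leaked.
  static <T> T setupOrClose(Closeable columnCache, Callable<T> setup) throws Exception
  {
    try {
      return setup.call();
    }
    catch (Exception e) {
      try {
        columnCache.close();
      }
      catch (IOException suppressed) {
        e.addSuppressed(suppressed);
      }
      throw e;
    }
  }
}
```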
* add another catalog clustering columns unit test
* disallow clusterKeys with descending order
* make it clearer that clustering is re-written into the ingest node, whether a catalog table or not
* when partitionedBy is stored in the catalog, the user shouldn't need to specify it in order to specify clustering
* fix intellij inspection failure
update dependencies to address a new batch of CVEs:
- Azure POM from 1.2.19 to 1.2.23 to update transitive dependency nimbus-jose-jwt to address: CVE-2023-52428
- commons-configuration2 from 2.8.0 to 2.10.1 to address: CVE-2024-29131 CVE-2024-29133
- bcpkix-jdk18on from 1.76 to 1.78.1 to address: CVE-2024-30172 CVE-2024-30171 CVE-2024-29857
* add rate and stats
* better tabs
* detail
* add recent errors
* update tests
* don't let people hide the actions column (there's no reason to)
* don't sort on actions
* better way to aggregate
* add timeouts
* show error only once
* fix tests and Explain showing up
* only consider active tasks
* refresh
* fix tests
* better formatting
Description:
All the streaming ingestion tasks for a given datasource share the same lock for a given interval.
Changing lock types in the supervisor can lead to segment allocation errors due to lock conflicts
for the new tasks while the older tasks are still running.
Fix:
Allow locks of different types (EXCLUSIVE, SHARED, APPEND, REPLACE) to co-exist if they have
the same interval and the same task group.
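A hedged sketch of the compatibility rule, with illustrative types (not the actual `TaskLockbox` logic):

```java
import org.joda.time.Interval; // Druid uses Joda-Time intervals

public class LockCoexistenceSketch
{
  enum LockType { EXCLUSIVE, SHARED, APPEND, REPLACE }

  record HeldLock(String taskGroup, Interval interval, LockType type) {}

  // Locks of different types may co-exist as long as they cover the same
  // interval and belong to the same task group.
  static boolean canCoexist(HeldLock a, HeldLock b)
  {
    return a.interval().equals(b.interval()) && a.taskGroup().equals(b.taskGroup());
  }
}
```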