Upstream release: https://github.com/delta-io/delta/releases/tag/v3.2.0
- Upgrade kernel dependency to 3.2.0
- Notable breaking changes introduced in upstream that affects the Druid extension:
- Rename TableClient -> Engine
- Rename DefaultTableClient -> DefaultEngine
- Exceptions moved to a separate package
- Table.getPath() doesn't throw TableNotFoundException. Instead the exception is thrown
when getting snapshot info from the Table object
This commit aims to enable the re-ordering of window operators in order to optimise
the sort and partition operators.
Example :
```
SELECT m1, m2,
SUM(m1) OVER(PARTITION BY m2) as sum1,
SUM(m2) OVER() as sum2
from numFoo
GROUP BY m1,m2
```
In order to compute this query, we can order the operators as to first compute the operators
corresponding to sum2 and then place the operators corresponding to sum1 which would
help us in reducing one sort operator if we order our operators by sum1 and then sum2.
There are a few issues with using Jackson serialization in sending datasketches between controller and worker in MSQ. This caused a blowup due to holding multiple copies of the sketch being stored.
This PR aims to resolve this by switching to deserializing the sketch payload without Jackson.
The PR adds a new query parameter used during communication between controller and worker while fetching sketches, "sketchEncoding".
If the value of this parameter is OCTET, the sketch is returned as a binary encoding, done by ClusterByStatisticsSnapshotSerde.
If the value is not the above, the sketch is encoded by Jackson as before.
* fix issue with auto column grouping
changes:
* fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings
* fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping
* fix test
* Altered `QueryTestBuilder` to be able to switch to a backing quidem test
* added a small crc to ensure that the shadow testcase does not deviate from the original one
* Packaged all decoupled related things into a a single `DecoupledExtension` to reduce copy-paste
* `DecoupledTestConfig#quidemReason` must describe why its being used
* `DecoupledTestConfig#separateDefaultModeTest` can be used to make multiple case files based on `NullHandling` state
* fixed a cosmetic bug during decoupled join translation
* enhanced `!druidPlan` to report the final logical plan in non-decoupled mode as well
* add check to ensure that only supported params are present in a druidtest uri
* enabled shadow testcases for previously disabled testcases
This brings them in line with the behavior of other numeric aggregations.
It is important because otherwise ClassCastExceptions can arise if comparing
different numeric types that may arise from deserialization.