Commit Graph

3093 Commits

Author SHA1 Message Date
Clint Wylie aa230642dd
use PeekableIntIterator for OR filter "partial index" value matchers (#16300) 2024-04-17 08:27:21 -07:00
Gian Merlino ccc1ffb032
Additional short circuiting knowledge in filter bundles. (#16292)
* Additional short circuiting knowledge in filter bundles.

Three updates:

1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed
   "applyRowCount", and redefined as an upper bound on rows remaining
   after short-circuiting (rather than number of rows selected so far).
   This definition works better for OR filters, which pass through the
   FALSE set rather than the TRUE set to the next subfilter.

2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather
   than using selectionRowCount for the first subfilter and indexIntersectionSize
   for each filter thereafter. This improves accuracy when the incoming
   applyRowCount is smaller than the row count from the first few indexes.

3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather
   than applyRowCount for subfilters. This allows an OR filter to pass
   information about short-circuiting to its subfilters.

To help write tests for this, the patch also moves the sampled
wikiticker data file from sql to processing.

* Forbidden APIs.

* Forbidden APIs.

* Better comments.

* Fix inspection.

* Adjustments to tests.
2024-04-16 22:42:28 -07:00
Gian Merlino cf841b8e67
Fix incorrect class in BaseMacroFunctionExpr.equals. (#16294)
The equals method cast to the wrong class, potentially leading to
ClassCastException.
2024-04-16 09:40:46 -07:00
Adarsh Sanjeev 3df00aef9d
Add manifest file for MSQ export (#15953)
Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines
2024-04-15 11:37:31 +05:30
Kashif Faraz 81d7b6ebe1
Fix OverlordClient to read reports as a concrete `ReportMap` (#16226)
Follow up to #16217 

Changes:
- Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap`
- Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module
  - `TaskReport`
  - `KillTaskReport`
  - `IngestionStatsAndErrorsTaskReport`
  - `TaskContextReport`
  - `TaskReportFileWriter`
  - `SingleFileTaskReportFileWriter`
  - `TaskReportSerdeTest`
- Remove `MsqOverlordResourceTestClient` as it had only one method
which is already present in `OverlordResourceTestClient` itself
2024-04-15 08:00:59 +05:30
Gian Merlino b0c5184f9d
Fix ORDER BY on certain GROUPING SETS. (#16268)
* Fix ORDER BY on certain GROUPING SETS.

DefaultLimitSpec (part of native groupBy) had a bug where it would assume
that results are naturally ordered by dimensions even when subtotalsSpec
is present. However, this is not necessarily the case. For certain
combinations of ORDER BY and GROUPING SETS, this would cause the ORDER BY
to be ignored.

* Fix test testGroupByWithSubtotalsSpecWithOrderLimitForcePushdown. Resorting was necessary.
2024-04-12 12:06:47 -07:00
Sree Charan Manamala f65c166327
Windowed aggregates should update the aggregation value based on final compute (#16244) 2024-04-12 08:28:33 +02:00
Pranav fc2600b8e2
Adding jvmVersion dimension in JVM Monitor (#16262) 2024-04-11 15:44:56 -07:00
Gian Merlino 9f358f5f4a
SQL tests: avoid mixing skip and cannot vectorize. (#16251)
* SQL tests: avoid mixing skip and cannot vectorize.

skipVectorize switches off vectorization tests completely, and
cannotVectorize turns vectorization tests into negative tests. It doesn't
make sense to use them together, so this patch makes it an error to do so,
and cleans up cases where both are mentioned.

This patch also has the effect of changing various tests from skipVectorize
to cannotVectorize, because in the past when both were mentioned,
skipVectorize would take priority.

* Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannt.

* Fix tests.
2024-04-11 15:06:11 -07:00
Vishesh Garg 3d595cfab1
Add storeCompactionState flag support to msq (#15965)
Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted for future cases such as REPLACE and compaction done via MSQ.

Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored with flag "storeCompactionState": true in the query context.
2024-04-09 16:47:47 +05:30
Vishesh Garg 9a4fb58543
Record column name for exceptions while writing frames in RowBasedFrameWriter (#16130)
Current Runtime Exceptions generated while writing frames only include the exception itself without including the name of the column they were encountered in. This patch introduces the further information in the error and makes it non-retryable.
2024-04-09 15:39:10 +05:30
Gian Merlino a319b44545
Allow typedIn to run in replace-with-default mode. (#16233)
* Allow typedIn to run in replace-with-default mode.

Useful when data servers, like Historicals, are running in replace-with-default
mode and the Broker is running in SQL-compatible mode, which can happen during
a rolling update that is applying a mode change.
2024-04-04 15:45:42 -07:00
Gian Merlino b0ca06f8cd
Fix name of combining filtered aggregator factory. (#16224)
The name of the combining filtered aggregator factory should be the same
as the name of the original factory. However, it wasn't the same in the
case where the original factory's name and the original delegate aggregator
were inconsistently named. In this scenario, we should use the name of
the original filtered aggregator, not the name of the original delegate
aggregator.
2024-04-02 12:59:48 -07:00
Sree Charan Manamala 26f9b174de
Handling nil selector column in vector math processors (#16128) 2024-04-02 02:06:57 -07:00
Pranav 20de7fd95a
Geo spatial interfaces (#16029)
This PR creates an interface for ImmutableRTree and moved the existing implementation to new class which represent 32 bit implementation (stores coordinate as floats). This PR makes the ImmutableRTree extendable to create higher precision implementation as well (64 bit).
In all spatial bound filters, we accept float as input which might not be accurate in the case of high precision implementation of ImmutableRTree. This PR changed the bound filters to accepts the query bounds as double instead of float and it is backward compatible change as it compares double to existing float values in RTree. Previously it was comparing input float to RTree floats which can cause precision loss, now it is little better as it compares double to float which is still not 100% accurate.
There are no changes in the way that we query spatial dimension today except input bound parsing. There is little improvement in string filter predicate which now parse double strings instead of float and compares double to double which is 100% accurate but string predicate is only called when we dont have spatial index.
With allowing the interface to extend ImmutableRTree, we allow to create high precision (HP) implementation and defines new search strategies to perform HP search Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound);
With possible HP implementations, Radius bound filter can not really focus on accuracy, it is calculating Euclidean distance in comparing. As EARTH 🌍 is round and not flat, Euclidean distances are not accurate in geo system. This PR adds new param called 'radiusUnit' which allows you to specify units like meters, km, miles etc. It uses https://en.wikipedia.org/wiki/Haversine_formula to check if given geo point falls inside circle or not. Added a test that generates set of points inside and outside in RadiusBoundTest.
2024-04-01 14:58:03 +05:30
Soumyava 524842a3bb
Window function on msq (#15470)
This PR aims to introduce Window functions on MSQ by doing the following:

    Introduce a Window querykit for handling window queries along with its factory and a processor for window queries
    If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage
    In presence of empty OVER() clause lets all operators loose on a single rac
    In presence of no empty OVER() clause, breaks down each window into individual stages
    Associated machinery to handle window functions in MSQ
    Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators
    Guardrails around materialization
    Comprehensive UTs
2024-03-28 14:58:34 +05:30
Abhishek Radhakrishnan cf9a3bdc14
Fix up error handling in unusedSegments API. (#16206)
Changes:
- Handle exceptions in the API and map them to a `Response` object with the appropriate error code.
- Replace `AuthorizationUtils.filterAuthorizedResources()` with `DatasourceResourceFilter`.
The endpoint is annotated consistent with other usages.
- Update `DatasourceResourceFilter` to remove the lambda and update javadocs.
The usages information is self-evident with an IDE.
- Adjust the invalid interval exception message.
- Break up the large unit test `testGetUnusedSegmentsInDataSource()` into smaller unit tests
for each test case. Also, validate the error codes.
2024-03-27 12:31:21 +05:30
Gian Merlino 58a8a23243
Avoid conversion to String in JsonReader, JsonNodeReader. (#15693)
* Avoid conversion to String in JsonReader, JsonNodeReader.

These readers were running UTF-8 decode on the provided entity to
convert it to a String, then parsing the String as JSON. The patch
changes them to parse the provided entity's input stream directly.

In order to preserve the nice error messages that include parse errors,
the readers now need to open the entity again on the error path, to
re-read the data. To make this possible, the InputEntity#open contract
is tightened to require the ability to re-open entities, and existing
InputEntity implementations are updated to allow re-opening.

This patch also renames JsonLineReaderBenchmark to JsonInputFormatBenchmark,
updates it to benchmark all three JSON readers, and adds a case that reads
fields out of the parsed row (not just creates it).

* Fixes for static analysis.

* Implement intermediateRowAsString in JsonReader.

* Enhanced JsonInputFormatBenchmark.

Renames JsonLineReaderBenchmark to JsonInputFormatBenchmark, and enhances it to
test various readers (JsonReader, JsonLineReader, JsonNodeReader) as well as
to test with/without field discovery.
2024-03-26 08:16:05 -07:00
Sree Charan Manamala f29c8ac368
Allow non literal rhs in MV_FILTER_ONLY and MV_FILTER_NONE (#16113)
This commit allows to use the MV_FILTER_ONLY & MV_FILTER_NONE functions
with a non literal argument. 
Currently `select mv_filter_only('mvd_dim', 'array_dim') from 'table'`
returns a `Unhandled Query Planning Failure`

This is being tackled and also considered for the cases where the `array_dim`
having null & empty values.

Changed classes:
 * `MultiValueStringOperatorConversions`
 * `ApplyFunction`
 * `CalciteMultiValueStringQueryTest`
2024-03-26 12:31:09 +05:30
Kashif Faraz 323d67a0ac
Add errorCode to failure type `InternalServerError` (#16186)
Changes:
- Use error code `internalServerError` for failures of this type
- Remove the error code argument from `InternalServerError.exception()` methods
thus fixing a bug in the callers.
2024-03-24 04:24:09 +05:30
Clint Wylie b0a9c318d6
add new typed in filter (#16039)
changes:
* adds TypedInFilter which preserves matching sets in the native match value type
* SQL planner uses new TypedInFilter when druid.generic.useDefaultValueForNull=false (the default)
2024-03-22 12:45:08 -07:00
AmatyaAvadhanula 488d376209
Optimize isOvershadowed when there is a unique minor version for an interval (#15952)
* Optimize isOvershadowed for intervals with timechunk locking
2024-03-20 19:30:00 +05:30
Clint Wylie 48b8d42698
fix regexp_like, contains_string, icontains_string to return null instead of false for null inputs in sql compatible mode (#15963) 2024-03-19 22:12:47 -07:00
Gian Merlino c96b215dd6
SortMerge join support for IS NOT DISTINCT FROM. (#16003)
* SortMerge join support for IS NOT DISTINCT FROM.

The patch adds a "requiredNonNullKeyParts" field to the sortMerge
processor, which has the list of key parts that must be nonnull for
an equijoin condition to match. Conditions with SQL "=" are present in
the list; conditions with SQL "IS NOT DISTINCT FROM" are absent from
the list.

* Fix test.

* Update javadoc.
2024-03-19 12:02:13 -07:00
Laksh Singla e3b75ac11f
Fix FloatFirstVectorAggregationTest (#16165) 2024-03-19 17:42:29 +05:30
Zoltan Haindrich 0a42342cef
Update Calcite*Test to use junit5 (#16106)
* Update Calcite*Test to use junit5

* change the way temp dirs are handled
* add openrewrite workflow to safeguard upgrade
* replace junitparamrunner with standard junit5 parametered tests
* update a few rules to junit5 api
* lots of boring changes

* cleanup QueryLogHook

* cleanup

* fix compile error: ARRAYS_DATASOURCE

* fix test

* remove enclosed

* empty

+TEST:TDigestSketchSqlAggregatorTest,HllSketchSqlAggregatorTest,DoublesSketchSqlAggregatorTest,ThetaSketchSqlAggregatorTest,ArrayOfDoublesSketchSqlAggregatorTest,BloomFilterSqlAggregatorTest,BloomDimFilterSqlTest,CatalogIngestionTest,CatalogQueryTest,FixedBucketsHistogramQuantileSqlAggregatorTest,QuantileSqlAggregatorTest,MSQArraysTest,MSQDataSketchesTest,MSQExportTest,MSQFaultsTest,MSQInsertTest,MSQLoadedSegmentTests,MSQParseExceptionsTest,MSQReplaceTest,MSQSelectTest,InsertLockPreemptedFaultTest,MSQWarningsTest,SqlMSQStatementResourcePostTest,SqlStatementResourceTest,CalciteSelectJoinQueryMSQTest,CalciteSelectQueryMSQTest,CalciteUnionQueryMSQTest,MSQTestBase,VarianceSqlAggregatorTest,SleepSqlTest,SqlRowTransformerTest,DruidAvaticaHandlerTest,DruidStatementTest,BaseCalciteQueryTest,CalciteArraysQueryTest,CalciteCorrelatedQueryTest,CalciteExplainQueryTest,CalciteExportTest,CalciteIngestionDmlTest,CalciteInsertDmlTest,CalciteJoinQueryTest,CalciteLookupFunctionQueryTest,CalciteMultiValueStringQueryTest,CalciteNestedDataQueryTest,CalciteParameterQueryTest,CalciteQueryTest,CalciteReplaceDmlTest,CalciteScanSignatureTest,CalciteSelectQueryTest,CalciteSimpleQueryTest,CalciteSubqueryTest,CalciteSysQueryTest,CalciteTableAppendTest,CalciteTimeBoundaryQueryTest,CalciteUnionQueryTest,CalciteWindowQueryTest,DecoupledPlanningCalciteJoinQueryTest,DecoupledPlanningCalciteQueryTest,DecoupledPlanningCalciteUnionQueryTest,DrillWindowQueryTest,DruidPlannerResourceAnalyzeTest,IngestTableFunctionTest,QueryTestRunner,SqlTestFrameworkConfig,SqlAggregationModuleTest,ExpressionsTest,GreatestExpressionTest,IPv4AddressMatchExpressionTest,IPv4AddressParseExpressionTest,IPv4AddressStringifyExpressionTest,LeastExpressionTest,TimeFormatOperatorConversionTest,CombineAndSimplifyBoundsTest,FiltrationTest,SqlQueryTest,CalcitePlannerModuleTest,CalcitesTest,DruidCalciteSchemaModuleTest,DruidSchemaNoDataInitTest,InformationSchemaTest,NamedDruidSchemaTest,NamedLookupSchemaTest,NamedSystemSchemaTest,RootSchemaProviderTest,SystemSchemaTest,CalciteTestBase,SqlResourceTest

* use @Nested

* add rule to remove enclosed; upgrade surefire

* remove enclosed

* cleanup

* add comment about surefire exclude
2024-03-19 04:05:12 -07:00
Laksh Singla f5c5573da8 Revert "Fix FloatFirstVectorAggregationTest"
This reverts commit c40bb2c8f0.
2024-03-19 14:39:09 +05:30
Laksh Singla c40bb2c8f0 Fix FloatFirstVectorAggregationTest 2024-03-19 14:31:13 +05:30
Soumyava c7823bca98
Adding null check to earliest and latest aggs (#15972)
* Adding null check to earliest and latest aggs

* Native tests for null inPairs
2024-03-19 09:44:14 +05:30
Kashif Faraz 466057c61b
Remove deprecated DruidException, EntryExistsException (#14448)
Changes:
- Remove deprecated `DruidException` (old one) and `EntryExistsException`
- Use newly added comprehensive `DruidException` instead
- Update error message in `SqlMetadataStorageActionHandler` when max packet limit is violated.
- Factor out common code from several faults into `BaseFault`.
- Slightly update javadoc in `DruidException` to render it correctly
- Remove unused classes `SegmentToMove`, `SegmentToDrop`
- Move `ServletResourceUtils` from module `druid-processing` to `druid-server`
- Add utility method to build error Response from `DruidException`.
2024-03-15 21:29:11 +05:30
Clint Wylie dd9bc3749a
fix issues with array_contains and array_overlap with null left side arguments (#15974)
changes:
* fix issues with array_contains and array_overlap with null left side arguments
* modify singleThreaded stuff to allow optimizing Function similar to how we do for ExprMacro - removed SingleThreadSpecializable in favor of default impl of asSingleThreaded on Expr with clear javadocs that most callers shouldn't be calling it directly and should be using Expr.singleThreaded static method which uses a shuttle and delegates to asSingleThreaded instead
* add optimized 'singleThreaded' versions of array_contains and array_overlap
* add mv_harmonize_nulls native expression to use with MV_CONTAINS and MV_OVERLAP to allow them to behave consistently with filter rewrites, coercing null and [] into [null]
* fix bug with casting rhs argument for native array_contains and array_overlap expressions
2024-03-13 18:16:10 -07:00
Zoltan Haindrich 818cc9eedf
Fix toString for SingleThreadSpecializable ConstantExprs (#16084) 2024-03-13 03:48:52 -07:00
Clint Wylie aa2959b2bd
reset keySerde when closing groupers to clear out heap dictionaries (#16114)
ConcurrentGrouper kind of misuses ThreadLocal to hold a SpillingGrouper, and never calls remove() on it, which can result in large amounts of heap being retained as weak references even after grouping is finished. This PR calls keySerde.reset() on all of the Grouper.close() implementations that have a KeySerde and should free up a bunch of space that is no longer needed.
2024-03-13 15:09:54 +05:30
Soumyava 85ee775390
Handling latest_by and earliest_by on numeric columns correctly (#15939)
* Handling latest_by and earliest_by on numeric columns correctly

* Adding test
2024-03-11 13:49:21 -07:00
Clint Wylie 313da98879
decouple column serializer compression closers from SegmentWriteoutMedium to optionally allow serializers to release direct memory allocated for compression earlier than when segment is completed (#16076) 2024-03-11 12:28:04 -07:00
Vishesh Garg b1c1937e94
Change last update timestamp granularity of GCS objects from seconds to milliseconds (#16083)
The previously used GCS API client library returned last update time for objects directly in milliseconds. The new library returns it in OffsetDateTime format which was being converted to seconds and stored against the object. This fix converts the time back to ms before storing it.
2024-03-09 07:54:33 +05:30
Kashif Faraz 5f203725dd
Clean up SqlSegmentsMetadataManager and corresponding tests (#16044)
Changes:

Improve `SqlSegmentsMetadataManager`
- Break the loop in `populateUsedStatusLastUpdated` before going to sleep if there are no more segments to update
- Add comments and clean up logs

Refactor `SqlSegmentsMetadataManagerTest`
- Merge `SqlSegmentsMetadataManagerEmptyTest` into this test
- Add method `testPollEmpty`
- Shave a few seconds off of the tests by reducing poll duration
- Simplify creation of test segments
- Some renames here and there
- Remove unused methods
- Move `TestDerbyConnector.allowLastUsedFlagToBeNull` to this class

Other minor changes
- Add javadoc to `NoneShardSpec`
- Use lambda in `SqlSegmentMetadataPublisher`
2024-03-08 07:34:51 +05:30
Laksh Singla 5f588fa45c
Fix bug while materializing scan's result to frames (#15987)
While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value.

Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames.

The PR modifies the batching logic to not accumulate the results in an intermediary list
2024-03-07 17:11:44 +05:30
Zoltan Haindrich 27d7c30c38
Only use ExprEval in ConstantExpr if its known that it will be safe (#15694)
* `Expr#singleThreaded` which creates a singleThreaded version of the actual expression (caching ExprEval is allowed)
* `Expr#makeSingleThreaded` to make a whole subtree of expressions 'singleThreaded' - uses `Shuttle` to create the new expression tree
* `ConstantExpr#singleThreaded` creates a specialized `ConstantExpr` which does cache the `ExprEval`
* some `@Immutable` annotations were added to make it more likely to notice that there might be something off if a similar change will be made around here for some reason
2024-03-05 12:53:09 -08:00
Gian Merlino e13ed7b878
Improve javadoc for ReadableFieldPointer. (#16043)
Since #15175, the javadoc for ReadableFieldPointer is somewhat out of date. It says that
the pointer only points to the beginning of the field, but this is no longer true. This
patch updates the javadoc to be more accurate.
2024-03-05 11:52:49 -08:00
Zoltan Haindrich bb882727c0
Fix Windowing/scanAndSort query issues on top of Joins. (#15996)
allow a hashjoin result to be converted to RowsAndColumns
added StorageAdapterRowsAndColumns
fix incorrect isConcrete() return values during early phase of planning
2024-03-05 15:05:31 +05:30
Clint Wylie 6e3b33d8c2
fix bug with like filter on missing columns not correctly considering extractionFn (#16037) 2024-03-04 23:27:50 -08:00
Zoltan Haindrich e469b7ed34
Make setting QUERY_CONTEXT_DEFAULT explicit in tests (#16010) 2024-03-05 10:54:16 +05:30
Gian Merlino 930655ff18
Move retries into DataSegmentPusher implementations. (#15938)
* Move retries into DataSegmentPusher implementations.

The individual implementations know better when they should and should
not retry. They can also generate better error messages.

The inspiration for this patch was a situation where EntityTooLarge was
generated by the S3DataSegmentPusher, and retried uselessly by the
retry harness in PartialSegmentMergeTask.

* Fix missing var.

* Adjust imports.

* Tests, comments, style.

* Remove unused import.
2024-03-04 10:36:21 -08:00
Gian Merlino 376a41f1e9
Rows.objectToNumber: Accept decimals with output type LONG. (#15999)
* Rows.objectToNumber: Accept decimals with output type LONG.

PR #15615 added an optimization to avoid parsing numbers twice in cases
where we know that they should definitely be longs or
definitely be doubles. Rather than try parsing as long first, and then
try parsing as double, it would use only the parsing routine specific to
the requested outputType.

This caused a bug: previously, we would accept decimals like "1.0" or
"1.23" as longs, by truncating them to "1". After that patch, we would
treat such decimals as nulls when the outputType is set to LONG.

This patch retains the short-circuit for doubles: if outputType is
DOUBLE, we only parse the string as a double. But for outputType LONG,
this patch restores the old behavior: try to parse as long first,
then double.
2024-03-04 22:00:27 +05:30
Zoltan Haindrich bf0995f846
Introduce dynamic table append (#15897) 2024-03-01 04:31:57 -05:00
Clint Wylie 101176590c
adaptive filter partitioning (#15838)
* cooler cursor filter processing allowing much smart utilization of indexes by feeding selectivity forward, with implementations for range and predicate based filters
* added new method Filter.makeFilterBundle which cursors use to get indexes and matchers for building offsets
* AND filter partitioning is now pushed all the way down, even to nested AND filters
* vector engine now uses same indexed base value matcher strategy for OR filters which partially support indexes
2024-02-29 15:38:12 -08:00
Jan Werner baaa4a6808
update common-compress to address CVE-2024-25710 CVE-2024-26308 (#16009)
* Update common-compress to 1.26.0 to address CVEs CVE-2024-25710 CVE-2024-26308
* Add commons-codec as a runtime dependency required by common-compress 1.26.0

---------

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2024-02-29 14:05:31 -08:00
Sensor e0bce0ef90
Add pre-check for heavy debug logs (#15706)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-02-29 12:58:14 +05:30
Laksh Singla 17e4f3ac60
Refactor GroupBy and TopN code to relax the constraint of dimensions being comparable (#15559)
The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together.
This works well for the primitive dimensions, because they are Comparable, however falls apart when the dimensions can be arrays (or in future scenarios complex columns). In cases when the dimensions are not comparable, Druid resorts to having a wrapper type ComparableStringArray and ComparableList, which is a Comparable, based on the list comparator.
2024-02-27 11:39:29 +05:30