3318 Commits

Author SHA1 Message Date
Akshat Jain ca8f24edd3
Upgrade Guice to 5.1.0 (#17578)
* Move Guice to 5.1.0 and fix tests
* Fix checkstyle
* Revert overrideCurrentGuiceModules() and related changes
* Fix the tests
* Try using maven:3-openjdk-17-slim
* Try enabling debugging for mvn command
* Use maven:3.9 image
* Address review comment: Fix formatting
* Address review comment: Add brief javadoc for ExceptionMatcher
---------
Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2024-12-19 09:08:20 +05:30
Akshat Jain 6fad11fe57
Revert "Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570)" (#17584)
This reverts commit cd6083fb94.
2024-12-18 08:29:10 -08:00
Rishabh Singh d5eb94d0e0
Restore Sink Metric Emission Behaviour: Emit them per-Sink instead of per-FireHydrant (#17170)
* Emit aggregate segment processing metrics per sink instead of firehydrant

* add docs

* minor change

* checkstyle

* Fix DefaultQueryMetricsTest

* Minor changes in SinkMetricsEmittingQueryRunner

* spotbugs

* Address review comments

* Use ImmutableSet and ImmutableMap

* Create a helper class for saving state of StubServiceEmitter

* Add SinkQuerySegmentWalkerBenchmark

* Create SegmentMetrics class for tracking segment metrics

---------

Co-authored-by: Akshat Jain <akjn11@gmail.com>
2024-12-18 14:17:14 +05:30
Clint Wylie a44ab109d5
remove druid.expressions.useStrictBooleans in favor of always being true (#17568) 2024-12-17 18:49:16 -08:00
Karan Kumar 8b81c91979
Remove unused fields. (#17579) 2024-12-17 13:34:50 -08:00
Adarsh Sanjeev bb4416a17b
Join context hints (#17541)
* join hints draft

* join algo

* propagate join hints

* review comments

* Use direct hints instead

* Add tests

* Pass preferred algo through pre join clause

* Refactors

* Fix tests

* Revert test changes

* Fix serialization

* Fix tests

* Fix test

* Fix test

* Fix test for sql compat mode

* Increase coverage

* Refactored hint class

---------

Co-authored-by: sreemanamala <sree.manamala@imply.io>
2024-12-17 22:25:22 +05:30
Clint Wylie de9da37384
topn with granularity regression fixes (#17565)
* topn with granularity regression fixes

changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actually able to use the pooled algorithm, and incorrectly used the pooled algorithm in cases where it must use the heap algorithm, a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533

* move defensive check outside of loop

* more test

* extra layer of safety

* move check outside of loop

* fix spelling

* add query context parameter to allow using pooled algorithm for topN when multiple passes are required even when query granularity is not all

* add comment, revert IT context changes and add new context flag
2024-12-17 21:21:24 +05:30
Akshat Jain 98b960c6ac
Refactor: Replace explicit type arguments with diamond operator (#17567)
Since we aren't supporting Java 8 anymore, we can switch to diamond operators
without specifying explicit type arguments.
2024-12-17 14:37:45 +05:30
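As a minimal illustration of the kind of change described in #17567 (not code from the patch itself), the diamond operator lets the compiler infer the type arguments on the constructor call. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; not taken from the Druid code base.
class DiamondExample
{
  // Before: explicit type arguments repeated on each constructor call.
  static Map<String, List<Integer>> explicitTypes()
  {
    Map<String, List<Integer>> map = new HashMap<String, List<Integer>>();
    map.put("a", new ArrayList<Integer>());
    return map;
  }

  // After: the diamond operator infers the same type arguments.
  static Map<String, List<Integer>> diamond()
  {
    Map<String, List<Integer>> map = new HashMap<>();
    map.put("a", new ArrayList<>());
    return map;
  }
}
```

Both methods build the same map; only the amount of repeated type information differs.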
Kashif Faraz e80a05c38e
Add test and comments for RetryUtils.nextSleep (#17556) 2024-12-17 13:10:13 +05:30
Akshat Jain cd6083fb94
Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570)
* Add back UnnecessaryFullyQualifiedName rule in pmd ruleset

* Fix checkstyle
2024-12-17 12:43:12 +05:30
Akshat Jain fed36844f1
Re-visit previously disabled spotbugs patterns and enable them (#17560) 2024-12-13 15:24:40 +01:00
Akshat Jain a26e4c0e06
Cleanup unreachable Java 8 code flows (#17559) 2024-12-13 15:24:21 +01:00
Kashif Faraz 24e5d8a9e8
Refactor: Minor cleanup of segment allocation flow (#17524)
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
2024-12-13 07:46:57 +05:30
Zoltan Haindrich 1a38434d8d
Restore usage of filtered SUM (#17378) 2024-12-12 10:30:42 +01:00
Clint Wylie 80d2cd3632
snapshot column capabilities for realtime cursors (#17386)
* snapshot column capabilities for realtime cursors

changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)
* fixed a mistake where expression selectors for realtime queries with no null values could not use dictionary encoded selectors

* hmm

* test changes

* cleanup

* add test coverage

* fix test

* fixes

* cleanup
2024-12-09 08:44:54 -08:00
Abhishek Radhakrishnan 3a2220c68d
Refactor: Move some classes from `sql` to `processing` & `server` for reusability (#17542)
This PR contains non-functional / refactoring changes of the following classes in the sql module:

1. Move ExplainPlan and ExplainAttributes from sql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself which provides all the clients as well.

This is done so that we can reuse the said classes in #17353 without bringing in Calcite and other dependencies to the Overlord.
2024-12-06 09:32:03 -08:00
Zoltan Haindrich c1ef38b052
Minor fixes and enhancements in UnionQuery handling (#17483)
* plan consistently with either UnionDataSource or UnionQuery for decoupled mode
* expose errors
* move decoupled related setting from PlannerConfig to QueryContexts
2024-11-28 10:05:12 +01:00
Akshat Jain dd46c7722d
Remove pre-java-11 profile (#17511)
We have removed support for Java 8 in #17466. This PR removes an unused profile, pre-java-11, which was activated for JDK < 11.
2024-11-26 08:43:20 +01:00
Clint Wylie ede9e4077a
add support for aggregate only projections (#17484) 2024-11-25 09:22:46 -08:00
Rishabh Singh 74422b58f5
Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360)
This change emits the following metrics as part of the GroupByStatsMonitor monitor:
mergeBuffer/used -> Number of merge buffers used.
mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer.
mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers.
groupBy/spilledQueries -> Number of queries that spilled onto the disk.
groupBy/spilledBytes-> Spilled bytes on the disk.
groupBy/mergeDictionarySize -> Size of the merging dictionary.
2024-11-22 14:22:03 +05:30
Adarsh Sanjeev df649c0bbd
Refactors (#17498)
Follow-up PR to #17493 to address pending unaddressed comments.
2024-11-22 09:22:38 +05:30
Adarsh Sanjeev 2726c6f388
Minor refactors to processing
Some refactors across Druid to clean up the code and add utility functions where required.
2024-11-21 15:37:55 +05:30
Akshat Jain 17215cd677
Remove support for Java 8 (#17466)
All JDK 8 based CI checks have been removed.
Images used in Dockerfile(s) have been updated to Java 17 based images.
Documentation has been updated accordingly.
2024-11-21 15:33:08 +05:30
Clint Wylie 24a1fafaa7
projection segment merge fixes (#17460)
changes:
* fix issue when merging projections from multiple incremental persists, which assumed that some 'dim conversion' buffers were not closed, but they already were (by the merging iterator). the fix selectively persists these conversion buffers to temp files in the segment write out directory, maps them, and ties them to the segment level closer so that they are available after the lifetime of the parent merger
* modify auto column serializers to use segment write out directory for temp files instead of java.io.tmpdir
* fix queryable index projection to not put the time-like column as a dimension, instead only adding it as __time
* use smoosh for temp files so that any Serializer can be safely written to a temp smoosh
2024-11-15 16:46:04 -08:00
Zoltan Haindrich f296102f05
ScanQuery should not ignore columnTypes in equals/hashCode (#17463)
* ScanQuery: equals/hashCode/toString
* DruidQuery: changes of Align ScanQuery column order with its desired signature #17457
* ScanQueryTest: add EqualsVerifier test
2024-11-12 14:26:59 +05:30
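A minimal sketch of the contract that #17463 restores, assuming a hypothetical `ScanSpec` class rather than the actual `ScanQuery`: every field that distinguishes two instances, including the column types, takes part in both `equals` and `hashCode`.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a query object; not the actual ScanQuery class.
final class ScanSpec
{
  private final List<String> columns;
  private final List<String> columnTypes;

  ScanSpec(List<String> columns, List<String> columnTypes)
  {
    this.columns = columns;
    this.columnTypes = columnTypes;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ScanSpec)) {
      return false;
    }
    ScanSpec that = (ScanSpec) o;
    // Both fields participate, so specs that differ only in types are unequal.
    return Objects.equals(columns, that.columns)
           && Objects.equals(columnTypes, that.columnTypes);
  }

  @Override
  public int hashCode()
  {
    // Hash over the same fields that equals compares.
    return Objects.hash(columns, columnTypes);
  }
}
```

With this shape, two specs that select the same columns but declare different types no longer compare as equal.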
Virushade 8278a1f7df
Fix Javadocs in ColumnCapablities.java (#17462) 2024-11-12 11:30:33 +05:30
Vivek Dhiman 0dcc2bc469
Fixed NPE in `array_overlap` and `array_contains`. (#17465) 2024-11-08 20:39:14 -08:00
Gian Merlino 9c25226e06
QueryableIndexSegment: Re-use time boundary inspector. (#17397)
This patch re-uses timeBoundaryInspector for each cursor holder, which
enables caching of minDataTimestamp and maxDataTimestamp.

Fixes a performance regression introduced in #16533, where these fields
stopped being cached across cursors. Prior to that patch, they were
cached in the QueryableIndexStorageAdapter.
2024-11-06 09:27:59 -08:00
Zoltan Haindrich 2eac8318f8
Support Union in Decoupled planning (#17354)
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic`, which could later help reduce some complexity in `ClientQuerySegmentWalker`
* to run the subqueries of `UnionQuery` there was a need to access the `conglomerate` from the `Runner`; to enable that some refactors were done
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` has taken the place of `QueryToolChestWarehouse`, which shaves off some unnecessary things here and there
* small cleanup/refactors
2024-11-05 16:58:57 +01:00
Clint Wylie 10208baab2
use big endian for compressed complex column values to fit object strategy expectations (#17422) 2024-10-29 10:21:09 -07:00
Adarsh Sanjeev b7c661b801
Make tempStorageDirectory configuration optional and rely on task dir instead (#17015)
Currently, durable storage and export both require configuring a temporary directory via druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.

Tasks on middle manager already have a configured temporary directory. This PR aims to reduce the configuration required by using the task directory as a default if it is not explicitly configured, thus reducing the number of configs that a user has to set.

Please note that preference is given to the user-configured druid.*.storage.temp*Dir on the tasks. If that is not configured, the task's temporary directory is used instead.

Overlord and brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries respectively), but do not have a default temporary task directory. The configuration is still required for these services.
2024-10-29 13:36:59 +05:30
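The fallback order described in #17015 can be sketched roughly as follows. This is an illustrative helper under stated assumptions, not the code from the patch: the class name, the method name, and the `tmp` subdirectory are all hypothetical.

```java
import java.io.File;

// Hypothetical helper; names are illustrative and not from the Druid code base.
class TempDirResolver
{
  /**
   * Returns the explicitly configured directory when present,
   * otherwise a subdirectory of the task's own directory.
   */
  static File resolve(File configuredTempDir, File taskDir)
  {
    if (configuredTempDir != null) {
      // An explicit user setting always wins.
      return configuredTempDir;
    }
    if (taskDir == null) {
      // Services without a task directory still need an explicit setting.
      throw new IllegalStateException("A temporary directory must be configured");
    }
    // Assumed layout: a "tmp" folder under the task directory.
    return new File(taskDir, "tmp");
  }
}
```

The exception branch models services such as the Overlord and Brokers, which have no task directory to fall back on.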
Gian Merlino 446a8f466f
Update errorprone, mockito, jacoco, checkerframework. (#17414)
* Update errorprone, mockito, jacoco, checkerframework.

This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.

* Update licenses, tests.

* Remove assertEquals.

* Repair two tests.

* Update some more tests.
2024-10-28 11:34:03 -07:00
Clint Wylie 73675d0671
clean up some thread pools in tests (#17421) 2024-10-28 09:05:15 -07:00
Suraj Goel 7306d280cc
Migrate jaxb bind dependency to jakarta (#17370)
- Migrated from javax.xml.bind 2.3.1 to jakarta.xml.bind 2.3.3.
- Minor version is modified to avoid any breaking changes.
2024-10-26 21:24:17 -07:00
Gian Merlino 7e8671caa9
GroupByQueryConfig: Skip unnecessary toString. (#17396)
Calling toString on newConfig is unnecessary, because it will be done
automatically by the logger. This saves some effort under log levels
higher than DEBUG.
2024-10-23 19:57:22 +05:30
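The effect described in #17396 can be illustrated with `java.util.logging` as a stand-in for Druid's own logger (an assumption made only to keep the example self-contained). Passing the object as a log parameter defers `toString` until the record is actually formatted, whereas building the message eagerly pays the cost even when the level is disabled. All names below are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example; java.util.logging stands in for Druid's logger.
class LazyToStringExample
{
  static int toStringCalls = 0;

  static class Config
  {
    @Override
    public String toString()
    {
      toStringCalls++;
      return "Config{}";
    }
  }

  private static Logger infoLevelLogger()
  {
    Logger logger = Logger.getLogger("lazy-to-string-example");
    logger.setUseParentHandlers(false);
    logger.setLevel(Level.INFO); // FINE (debug-like) messages are disabled
    return logger;
  }

  // Eager: toString runs before the logger can check the level.
  static int eager()
  {
    toStringCalls = 0;
    infoLevelLogger().log(Level.FINE, "Using config: " + new Config().toString());
    return toStringCalls;
  }

  // Lazy: the object is passed as a parameter and never formatted here,
  // because the logger rejects FINE records up front.
  static int lazy()
  {
    toStringCalls = 0;
    infoLevelLogger().log(Level.FINE, "Using config: {0}", new Config());
    return toStringCalls;
  }
}
```

The two methods differ only in how the argument is handed to the logger: `eager()` triggers one `toString` call, `lazy()` none.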
Clint Wylie 1157ecdec3
abstract common base of SQL micro-benchmarks to reduce boilerplate and standardize parameters (#17383)
changes:
* adds `SqlBenchmarkDatasets` which contains commonly used benchmark data generator schemas
* adds `SqlBaseBenchmark` which contains common benchmark segment generation methods for any benchmark using `SqlBenchmarkDatasets`
* adds `SqlBaseQueryBenchmark` and `SqlBasePlanBenchmark` for benchmarks measuring queries and planning respectively
* migrate all existing SQL jmh benchmarks to extend `SqlBaseQueryBenchmark`, quite dramatically reducing the boilerplate needed to create benchmarks, and allowing the use of multiple datasources within a benchmark file
* adjustments to data generator stuff to allow passing in an ObjectMapper so that the same mapper can be used for both benchmark queries and segment generation, avoiding the need to register stuff with both mappers for benchmarks
* adds `SqlProjectionsBenchmark` and `SqlComplexMetricsColumnsBenchmark` for measuring projections and measuring complex metric compression respectively
2024-10-22 19:37:17 -07:00
Laksh Singla 5b09329479
Fixes an issue with AppendableMemory that can cause MSQ jobs to fail (#17369) 2024-10-18 09:05:53 +05:30
Akshat Jain 450fb0147b
Add GlueingPartitioningOperator + Corresponding changes in window function layer to consume it for MSQ (#17038)
* GlueingPartitioningOperator: It continuously receives data, and outputs batches of partitioned RACs. It maintains a last-partitioning-boundary of the last-pushed-RAC, and attempts to glue it with the next RAC it receives, ensuring that partitions are handled correctly, even across multiple RACs. You can check GlueingPartitioningOperatorTest for some good examples of the "glueing" work.
* PartitionSortOperator: It sorts rows inside partitioned RACs, on the sort columns. The input RACs it receives are expected to be "complete / separate" partitions of data.
2024-10-17 10:54:52 +05:30
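The "glueing" idea from #17038 can be sketched with a deliberately simplified model. This is not the operator from the patch: here each row is reduced to its partition key (an `int`), batches stand in for RACs, and rows with equal keys are assumed to arrive adjacent to each other. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of glueing partitions across batches.
class GlueingSketch
{
  // Rows of the partition that may still continue in the next batch.
  private List<Integer> pending = new ArrayList<>();

  /** Accepts one batch and returns the partitions known to be complete. */
  List<List<Integer>> push(List<Integer> batch)
  {
    List<List<Integer>> complete = new ArrayList<>();
    for (Integer key : batch) {
      if (!pending.isEmpty() && !pending.get(0).equals(key)) {
        // Key changed: the pending partition cannot grow any further.
        complete.add(pending);
        pending = new ArrayList<>();
      }
      pending.add(key);
    }
    return complete;
  }

  /** Flushes the trailing partition once no more input will arrive. */
  List<List<Integer>> finish()
  {
    List<List<Integer>> complete = new ArrayList<>();
    if (!pending.isEmpty()) {
      complete.add(pending);
      pending = new ArrayList<>();
    }
    return complete;
  }
}
```

The trailing partition of each batch is held back because the next batch may start with rows of the same key; it is emitted only when a different key appears or the input ends.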
Hardik Bajaj 32ce341a6c
Fix RejectExecutionHandler of Blocking Single Threaded executor (#17146)
Throw RejectedExecutionException when submitting tasks to executor that has been shut down.
2024-10-15 22:02:34 +05:30
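The behaviour that #17146 asks for matches what the JDK's standard executors already do after shutdown. The sketch below uses `Executors.newSingleThreadExecutor()` as a stand-in (an assumption; it is not Druid's blocking single-threaded executor) to show the expected contract. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical example using a plain JDK executor as a stand-in.
class ShutdownRejectionExample
{
  /** Returns true if a task submitted after shutdown is rejected. */
  static boolean submitAfterShutdownIsRejected()
  {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.shutdown();
    try {
      // No new work may be accepted once shutdown has been requested.
      executor.execute(() -> {});
      return false;
    }
    catch (RejectedExecutionException e) {
      return true;
    }
  }
}
```

Failing fast with an exception lets the caller notice that the task will never run, rather than blocking or silently dropping it.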
Clint Wylie c2149d59a7
remove stale comment in QueryableIndexCursorHolder (#17333) 2024-10-11 16:23:59 -07:00
Gian Merlino b287b219a8
MSQ: Include stageId, workerNumber in processing thread names. (#17324)
* MSQ: Include stageId, workerNumber in processing thread names.

Helps identify which query was running in a thread dump.

* s/dart/msq/
2024-10-11 08:37:15 -07:00
Shivam Garg 6898a5a359
Removed Microsecond from Extract function (#17247) 2024-10-11 05:32:26 +02:00
Clint Wylie a6236c3d15
add substituteCombiningFactory implementations for datasketches aggs (#17314)
Follow up to #17214, adds implementations for substituteCombiningFactory so that more
datasketches aggs can match projections, along with some projections tests for datasketches.
2024-10-10 16:14:06 +05:30
Gian Merlino 1d95ef34f0
Logger: Log context of DruidExceptions. (#17316)
* Logger: Log context of DruidExceptions.

There is often interesting and unique information available in the
"context" of a DruidException. This information is additive to both
the message and the cause, and was previously omitted from logs. This
patch adds the DruidException context to log messages whenever stack
traces are enabled.

* Only log nonempty contexts.
2024-10-10 01:44:50 -07:00
Gian Merlino 4fbb129027
Improve javadocs for SegmentDescriptor. (#17274)
The javadoc for SegmentDescriptor discusses differences between it and
SegmentId, but misses the most important difference: SegmentDescriptor
can have a narrower interval than the segment being referenced.
2024-10-08 00:59:55 -07:00
Clint Wylie ab0d6eb620
Fix string array grouping comparator (#17183) 2024-10-08 09:47:28 +05:30
Karan Kumar 6a4352f466
When removeNullBytes is set, length calculations did not take into account null bytes. (#17232)
* When removeNullBytes is set, length calculations did not take into account null bytes.
2024-10-07 18:02:52 +05:30
Adarsh Sanjeev c9201ad658
Minor refactors to processing module (#17136)
Refactors a few things.

- Adds SemanticUtils maps to columns.
- Add some addAll functions to reduce duplication, and for future reuse.
- Refactor VariantColumnAndIndexSupplier to only take a SmooshedFileMapper instead.
- Refactor LongColumnSerializerV2 to have separate functions for serializing a value and null.
2024-10-07 13:18:35 +05:30
Clint Wylie 0bd13bcd51
Projections prototype (#17214) 2024-10-05 04:38:57 -07:00
Clint Wylie 04fe56835d
add druid.expressions.allowVectorizeFallback and default to false (#17248)
changes:

* adds ExpressionProcessing.allowVectorizeFallback() and ExpressionProcessingConfig.allowVectorizeFallback(), defaulting to false until a few remaining bugs can be fixed (mostly complex types and some odd interactions with mixed types)
* adds cannotVectorizeUnlessFallback functions to make it easy to toggle the default of this config, and easy to know what to delete when we remove it in the future
2024-10-05 12:42:42 +05:30