14799 Commits

Author SHA1 Message Date
Zoltan Haindrich
8d46ff5c70 update apidoc 2024-12-19 07:34:12 +00:00
Zoltan Haindrich
db8d4225d1 rename test 2024-12-19 07:30:38 +00:00
Zoltan Haindrich
c518b488b6 fix compile; add doc 2024-12-19 07:27:37 +00:00
Zoltan Haindrich
d1e8ae160c up 2024-12-19 07:22:25 +00:00
Zoltan Haindrich
7e5658e6b7 add apidoc 2024-12-19 07:18:28 +00:00
Zoltan Haindrich
cc7e0b4ac0 Merge remote-tracking branch 'apache/master' into unnest-relfieldtrimmer-unnestfieldtype 2024-12-19 04:17:15 +00:00
Akshat Jain
ca8f24edd3
Upgrade Guice to 5.1.0 (#17578)
* Move Guice to 5.1.0 and fix tests
* Fix checkstyle
* Revert overrideCurrentGuiceModules() and related changes
* Fix the tests
* Try using maven:3-openjdk-17-slim
* Try enabling debugging for mvn command
* Use maven:3.9 image
* Address review comment: Fix formatting
* Address review comment: Add brief javadoc for ExceptionMatcher
---------
Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2024-12-19 09:08:20 +05:30
Kashif Faraz
d9a58a7bbd
Move segment update APIs from Coordinator to Overlord (#17545)
Summary of changes
---------------------
- Add `OverlordDataSourcesResource` with APIs to mark segments used/unused
- Add corresponding methods to `OverlordClient`
- Deprecate Coordinator APIs to update segments
- Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally
call the corresponding Overlord APIs
- If the API call fails, fall back to updating the metadata store directly
- Audit these actions only on the Overlord

Other minor changes
---------------------
- Do not perform null check on `OverlordClient` on the coordinator side `DataSourcesResource`.
`OverlordClient` is always non-null in production.
- Add new tests, fix existing ones
- Complete the implementation of `TestSegmentsMetadataManager`

New Overlord APIs
------------------
- Mark all segments of a datasource as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all (non-overshadowed) segments of a datasource as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple (non-overshadowed) segments as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple segments as unused:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
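
As an illustration of the single-segment endpoints listed above, here is a minimal sketch that builds (but does not send) the corresponding HTTP requests with the JDK's `java.net.http` API. The class name `OverlordSegmentRequests`, the method `markSegment`, and the base URL are hypothetical and are not part of this commit; authentication and error handling are left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Druid: builds requests for the
// single-segment endpoints described in the commit message.
class OverlordSegmentRequests
{
  static HttpRequest markSegment(String baseUrl, String dataSource, String segmentId, boolean used)
  {
    // URLEncoder performs form-style encoding, which is enough for this sketch
    // (for example, ':' in a segment ID becomes %3A).
    String path = "/druid/indexer/v1/datasources/"
                  + URLEncoder.encode(dataSource, StandardCharsets.UTF_8)
                  + "/segments/"
                  + URLEncoder.encode(segmentId, StandardCharsets.UTF_8);
    HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl + path));
    // POST marks the segment as used, DELETE marks it as unused.
    return used
           ? builder.POST(HttpRequest.BodyPublishers.noBody()).build()
           : builder.DELETE().build();
  }

  public static void main(String[] args)
  {
    // The base URL is an assumption; substitute the address of your Overlord.
    HttpRequest request = markSegment("http://localhost:8090", "wikipedia", "segment_1", false);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

The request can then be passed to `HttpClient.send` or `HttpClient.sendAsync` once a client is configured.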
2024-12-19 09:05:00 +05:30
Ashwin Tumma
f7c2c0acdd
Remove duplicate context from Request Logging (#17582)
* Remove duplicate context from Request Logging
* Update Unit Tests to read context
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-12-19 09:03:28 +05:30
Akshat Jain
6fad11fe57
Revert "Add back UnnecessaryFullyQualifiedName rule in pmd ruleset (#17570)" (#17584)
This reverts commit cd6083fb9423260e46876bfb3f4527c3581a9cb9.
2024-12-18 08:29:10 -08:00
Rishabh Singh
d5eb94d0e0
Restore Sink Metric Emission Behaviour: Emit them per-Sink instead of per-FireHydrant (#17170)
* Emit aggregate segment processing metrics per sink instead of firehydrant

* add docs

* minor change

* checkstyle

* Fix DefaultQueryMetricsTest

* Minor changes in SinkMetricsEmittingQueryRunner

* spotbugs

* Address review comments

* Use ImmutableSet and ImmutableMap

* Create a helper class for saving state of StubServiceEmitter

* Add SinkQuerySegmentWalkerBenchmark

* Create SegmentMetrics class for tracking segment metrics

---------

Co-authored-by: Akshat Jain <akjn11@gmail.com>
2024-12-18 14:17:14 +05:30
George Shiqi Wu
9ff11731c8
Parallelize supervisor stop logic to make it run faster (#17535)
- Add new method `Supervisor.stopAsync`
- Implement `SeekableStreamSupervisor.stopAsync()` to use a shutdown executor
- Call `stopAsync` from `SupervisorManager`
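
The general idea of stopping supervisors in parallel through a shutdown executor can be sketched as follows. This is an illustration only: the interface `StoppableSupervisor`, the class `ParallelStopSketch`, and the use of `CompletableFuture` are assumptions made for the example and do not reflect the actual signatures of `Supervisor.stopAsync` or `SeekableStreamSupervisor`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelStopSketch
{
  // Hypothetical stand-in for a supervisor; not the Druid interface.
  interface StoppableSupervisor
  {
    void stop();

    // Runs the blocking stop() on the given executor and returns immediately.
    default CompletableFuture<Void> stopAsync(ExecutorService shutdownExec)
    {
      return CompletableFuture.runAsync(this::stop, shutdownExec);
    }
  }

  // Submits every stop in parallel, then waits for all of them to finish.
  static void stopAll(List<StoppableSupervisor> supervisors, ExecutorService shutdownExec)
  {
    CompletableFuture<?>[] futures = supervisors.stream()
                                                .map(s -> s.stopAsync(shutdownExec))
                                                .toArray(CompletableFuture[]::new);
    CompletableFuture.allOf(futures).join();
  }

  // Creates n dummy supervisors, stops them all, and returns how many were stopped.
  static int demo(int n)
  {
    AtomicInteger stopped = new AtomicInteger();
    ExecutorService exec = Executors.newFixedThreadPool(4);
    try {
      List<StoppableSupervisor> supervisors = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        supervisors.add(stopped::incrementAndGet);
      }
      stopAll(supervisors, exec);
      return stopped.get();
    }
    finally {
      exec.shutdown();
    }
  }

  public static void main(String[] args)
  {
    System.out.println(demo(5));
  }
}
```

With a sequential loop the total stop time is the sum of the individual stop times; with this pattern it is bounded by the slowest stop, given enough executor threads.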
2024-12-18 09:19:24 +05:30
Clint Wylie
a44ab109d5
remove druid.expressions.useStrictBooleans in favor of always being true (#17568) 2024-12-17 18:49:16 -08:00
Karan Kumar
8b81c91979
Remove unused fields. (#17579) 2024-12-17 13:34:50 -08:00
Adarsh Sanjeev
bb4416a17b
Join context hints (#17541)
* join hints draft

* join algo

* propagate join hints

* review comments

* Use direct hints instead

* Add tests

* Pass preferred algo through pre join clause

* Refactors

* Fix tests

* Revert test changes

* Fix serialization

* Fix tests

* Fix test

* Fix test

* Fix test for sql compat mode

* Increase coverage

* Refactored hint class

---------

Co-authored-by: sreemanamala <sree.manamala@imply.io>
2024-12-17 22:25:22 +05:30
Clint Wylie
de9da37384
topn with granularity regression fixes (#17565)
* topn with granularity regression fixes

changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actually able to use the pooled algorithm, and incorrectly used the pooled algorithm in cases where it must use the heap algorithm, a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533

* move defensive check outside of loop

* more test

* extra layer of safety

* move check outside of loop

* fix spelling

* add query context parameter to allow using pooled algorithm for topN when multiple passes are required even when query granularity is not ALL

* add comment, revert IT context changes and add new context flag
2024-12-17 21:21:24 +05:30
Akshat Jain
98b960c6ac
Refactor: Replace explicit type arguments with diamond operator (#17567)
Since we aren't supporting Java 8 anymore, we can switch to diamond operators
without specifying explicit type arguments.
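
A small sketch of the kind of change described above. The class `DiamondExample` is hypothetical and is not taken from the Druid code base. Note that the diamond operator on plain constructor calls has been available since Java 7, while using it with anonymous classes requires Java 9 or later, which may be the case this refactor benefits from.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiamondExample
{
  // Before: type arguments are repeated on the right-hand side.
  static Map<String, List<Integer>> explicit()
  {
    Map<String, List<Integer>> map = new HashMap<String, List<Integer>>();
    map.put("a", new ArrayList<Integer>());
    return map;
  }

  // After: the compiler infers the type arguments from the target type.
  static Map<String, List<Integer>> diamond()
  {
    Map<String, List<Integer>> map = new HashMap<>();
    map.put("a", new ArrayList<>());
    return map;
  }

  // Diamond with an anonymous class compiles only on Java 9 and later.
  static Comparator<String> byLength()
  {
    return new Comparator<>()
    {
      @Override
      public int compare(String a, String b)
      {
        return Integer.compare(a.length(), b.length());
      }
    };
  }

  public static void main(String[] args)
  {
    System.out.println(explicit().equals(diamond()));
    System.out.println(byLength().compare("a", "bb") < 0);
  }
}
```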
2024-12-17 14:37:45 +05:30
Kashif Faraz
e80a05c38e
Add test and comments for RetryUtils.nextSleep (#17556) 2024-12-17 13:10:13 +05:30
Akshat Jain
cd6083fb94
Add back UnnecessaryFullyQualifiedName rule in pmd ruleset (#17570)
* Add back UnnecessaryFullyQualifiedName rule in pmd ruleset

* Fix checkstyle
2024-12-17 12:43:12 +05:30
Zoltan Haindrich
9bdb3d205c
Upgrade maven commit-id plugin (#17571) 2024-12-17 12:43:01 +05:30
Clint Wylie
622a6a6f89
remove sql_compatibility from build matrix and only test sql compatible mode (#17557) 2024-12-16 15:51:12 -08:00
Kashif Faraz
0335bdd90f
Reduce coordinator logs (#17566) 2024-12-16 16:48:39 +05:30
Atul Mohan
29ab12ccd7
Add missing docs for lookup based task context properties (#17562)
* Add missing docs for lookup based task context properties

* Fix text based on comments
2024-12-16 11:05:01 +05:30
Zeyu-Chen-SFDC
12eed753f7
fix the order in getNativeQueryLine (#17326) 2024-12-13 21:59:56 +05:30
Zoltan Haindrich
3477592133 up 2024-12-13 16:16:04 +00:00
Zoltan Haindrich
e9891122d4 Merge remote-tracking branch 'apache/master' into unnest-relfieldtrimmer-unnestfieldtype 2024-12-13 15:59:36 +00:00
Zoltan Haindrich
c8d23927f8 add missing override 2024-12-13 14:26:00 +00:00
Akshat Jain
fed36844f1
Re-visit previously disabled spotbugs patterns and enable them (#17560) 2024-12-13 15:24:40 +01:00
Akshat Jain
a26e4c0e06
Cleanup unreachable Java 8 code flows (#17559) 2024-12-13 15:24:21 +01:00
Zoltan Haindrich
dde6e06596 remove boolean 2024-12-13 10:49:06 +00:00
Zoltan Haindrich
a161b6c6d4 retain for old 2024-12-13 10:47:03 +00:00
Zoltan Haindrich
c51607e1b1 fix 2024-12-13 10:09:37 +00:00
Zoltan Haindrich
19609943aa trial of accepting empty project 2024-12-13 10:06:24 +00:00
Kashif Faraz
24e5d8a9e8
Refactor: Minor cleanup of segment allocation flow (#17524)
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
2024-12-13 07:46:57 +05:30
Katya Macedo
b86ea4d5c4
[Docs] Improve druid.coordinator.kill.on description (#17538)
* Docs: improve druid.coordinator.kill.on description

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update description for durationToRetain

* Update docs/configuration/index.md

* Update after review

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-12-12 16:58:38 -08:00
George Shiqi Wu
aca56d6bb8
reject publishing actions with a retriable error code if an earlier task is still publishing (#17509)
* Working queuing of publishing

* fix style

* Add unit tests

* add tests

* retry within the connector

* fix unit tests

* Update indexing-service/src/main/java/org/apache/druid/indexing/common/actions/LocalTaskActionClient.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Add comment

* fix style

* Fix unit tests

* style fix

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-12-12 10:37:53 -05:00
Zoltan Haindrich
ef12701816 up 2024-12-12 14:13:43 +00:00
Zoltan Haindrich
b8726c3e72 fix for sqlcompat 2024-12-12 14:12:20 +00:00
Zoltan Haindrich
1a38434d8d
Restore usage of filtered SUM (#17378) 2024-12-12 10:30:42 +01:00
Ashwin Tumma
05c3cbce08
Docs: Update SQL metrics documentation to include dimension engine (#17554)
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-12-12 08:56:42 +05:30
Peter Marshall
ccadfd071d
Docs: Update partitioning.md to fix a typo (#17555)
Quick fix to point the links to `dimensionsSpec` to the correct section of the ingestion spec doc.
2024-12-12 08:56:05 +05:30
Clint Wylie
3c1b488cb7
remove druid.sql.planner.serializeComplexValues config in favor of always serializing complex values (#17549) 2024-12-11 13:07:56 -08:00
Andy Tsai
f3d7f1aa96
Adding 3 sets of SQL tests in quidem (#17548)
Description

Migrate the initial 3 sets of SQL tests to quidem.  These 3 sets cover numeric, string, and datetime scalar functions.
These tests use the existing kttm dataset.  They aim to exercise SQL queries in a more comprehensive way:

Each scalar function is exercised in 3 different query shapes:
  simple query
  subquery
  group by query
Each query covers all operators in its predicates.
All queries are select count(*) queries.  They are designed to all return the same result for easy maintenance and debugging.

These are the initial sets of tests.  More tests to cover the rest of the scalar and aggregation functions will come later.
2024-12-11 12:57:37 -08:00
Zoltan Haindrich
86bec3bdc0 update 2024-12-11 13:17:00 +00:00
Katya Macedo
a51061fa43
[Docs] Improve Bloom filter topic (#17547)
* [Docs] Improve Bloom filter topic

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update spelling file

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-12-10 11:43:56 -08:00
Jill Osborne
61d986a179
Filters doc fix (#17553) 2024-12-10 09:34:43 -08:00
Akshat Jain
7705694481
Increase heap size for integration-tests (#17551) 2024-12-10 09:24:58 +05:30
Clint Wylie
80d2cd3632
snapshot column capabilities for realtime cursors (#17386)
* snapshot column capabilities for realtime cursors

changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)
* fixed a mistake that expression selectors for realtime queries with no null values could not use dictionary encoded selectors

* hmm

* test changes

* cleanup

* add test coverage

* fix test

* fixes

* cleanup
2024-12-09 08:44:54 -08:00
Rohan Garg
ae4ea51352
Rewrite S3StorageConnectorTest using testcontainers and MinIO (#17539) 2024-12-09 09:48:38 -05:00
Zoltan Haindrich
5c5ca2716c cleanup 2024-12-07 19:14:32 +00:00