Commit Graph

14799 Commits

Author SHA1 Message Date
Zoltan Haindrich 8d46ff5c70 update apidoc 2024-12-19 07:34:12 +00:00
Zoltan Haindrich db8d4225d1 rename test 2024-12-19 07:30:38 +00:00
Zoltan Haindrich c518b488b6 fix compile; add doc 2024-12-19 07:27:37 +00:00
Zoltan Haindrich d1e8ae160c up 2024-12-19 07:22:25 +00:00
Zoltan Haindrich 7e5658e6b7 add apidoc 2024-12-19 07:18:28 +00:00
Zoltan Haindrich cc7e0b4ac0 Merge remote-tracking branch 'apache/master' into unnest-relfieldtrimmer-unnestfieldtype 2024-12-19 04:17:15 +00:00
Akshat Jain ca8f24edd3
Upgrade Guice to 5.1.0 (#17578)
* Move Guice to 5.1.0 and fix tests
* Fix checkstyle
* Revert overrideCurrentGuiceModules() and related changes
* Fix the tests
* Try using maven:3-openjdk-17-slim
* Try enabling debugging for mvn command
* Use maven:3.9 image
* Address review comment: Fix formatting
* Address review comment: Add brief javadoc for ExceptionMatcher
---------
Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2024-12-19 09:08:20 +05:30
Kashif Faraz d9a58a7bbd
Move segment update APIs from Coordinator to Overlord (#17545)
Summary of changes
---------------------
- Add `OverlordDataSourcesResource` with APIs to mark segments used/unused
- Add corresponding methods to `OverlordClient`
- Deprecate Coordinator APIs to update segments
- Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally
call the corresponding Overlord APIs
- If the API call fails, fall back to updating the metadata store directly
- Audit these actions only on the Overlord

Other minor changes
---------------------
- Do not null-check `OverlordClient` in the Coordinator-side `DataSourcesResource`.
`OverlordClient` is always non-null in production.
- Add new tests, fix existing ones
- Complete the implementation of `TestSegmentsMetadataManager`

New Overlord APIs
------------------
- Mark all (non-overshadowed) segments of a datasource as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all segments of a datasource as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple segments as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple (non-overshadowed) segments as unused:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
2024-12-19 09:05:00 +05:30
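For illustration, a minimal Java sketch of building requests for the single-segment and bulk `markUnused` Overlord endpoints listed above, using `java.net.http.HttpRequest`. The Overlord address (`http://localhost:8090`), the datasource name, the segment ID and the helper names are hypothetical. The JSON payload for `markUnused` is not described in this commit message, so the sketch leaves it to the caller. It only builds requests; sending them would go through `java.net.http.HttpClient` against a real cluster.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class OverlordSegmentRequests
{
  // Assumption: Overlord reachable at this address (8090 is the usual default Overlord port).
  static final String OVERLORD_URL = "http://localhost:8090";
  static final String DATASOURCES_PATH = "/druid/indexer/v1/datasources/";

  // URL-encode path parts: segment IDs contain ':' from ISO timestamps.
  // URLEncoder does form encoding (spaces become '+'), adequate for typical names and IDs.
  private static String encode(String value)
  {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  private static URI segmentUri(String dataSource, String segmentId)
  {
    return URI.create(OVERLORD_URL + DATASOURCES_PATH + encode(dataSource) + "/segments/" + encode(segmentId));
  }

  // POST .../segments/{segmentId}: mark a single segment as used
  static HttpRequest markSegmentUsed(String dataSource, String segmentId)
  {
    return HttpRequest.newBuilder(segmentUri(dataSource, segmentId))
                      .POST(HttpRequest.BodyPublishers.noBody())
                      .build();
  }

  // DELETE .../segments/{segmentId}: mark a single segment as unused
  static HttpRequest markSegmentUnused(String dataSource, String segmentId)
  {
    return HttpRequest.newBuilder(segmentUri(dataSource, segmentId)).DELETE().build();
  }

  // POST .../markUnused with a caller-supplied JSON body (payload shape is an assumption left open here)
  static HttpRequest markSegmentsUnused(String dataSource, String jsonPayload)
  {
    URI uri = URI.create(OVERLORD_URL + DATASOURCES_PATH + encode(dataSource) + "/markUnused");
    return HttpRequest.newBuilder(uri)
                      .header("Content-Type", "application/json")
                      .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                      .build();
  }

  public static void main(String[] args)
  {
    // Hypothetical datasource and segment ID, for illustration only.
    HttpRequest request = markSegmentUnused(
        "wikipedia",
        "wikipedia_2024-01-01T00:00:00.000Z_2024-01-02T00:00:00.000Z_v1"
    );
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Encoding the segment ID keeps its colons from being misread as part of the URI structure; JAX-RS resources normally decode path parameters before use.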
Ashwin Tumma f7c2c0acdd
Remove duplicate context from Request Logging (#17582)
* Remove duplicate context from Request Logging
* Update Unit Tests to read context
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-12-19 09:03:28 +05:30
Akshat Jain 6fad11fe57
Revert "Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570)" (#17584)
This reverts commit cd6083fb94.
2024-12-18 08:29:10 -08:00
Rishabh Singh d5eb94d0e0
Restore Sink Metric Emission Behaviour: Emit them per-Sink instead of per-FireHydrant (#17170)
* Emit aggregate segment processing metrics per sink instead of firehydrant

* add docs

* minor change

* checkstyle

* Fix DefaultQueryMetricsTest

* Minor changes in SinkMetricsEmittingQueryRunner

* spotbugs

* Address review comments

* Use ImmutableSet and ImmutableMap

* Create a helper class for saving state of StubServiceEmitter

* Add SinkQuerySegmentWalkerBenchmark

* Create SegmentMetrics class for tracking segment metrics

---------

Co-authored-by: Akshat Jain <akjn11@gmail.com>
2024-12-18 14:17:14 +05:30
George Shiqi Wu 9ff11731c8
Parallelize supervisor stop logic to make it run faster (#17535)
- Add new method `Supervisor.stopAsync`
- Implement `SeekableStreamSupervisor.stopAsync()` to use a shutdown executor
- Call `stopAsync` from `SupervisorManager`
2024-12-18 09:19:24 +05:30
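A rough sketch of the fan-out idea behind `stopAsync`: start every stop on a shutdown executor first, then wait for all of them, so total stop time tracks the slowest supervisor rather than the sum of all of them. `StoppableSupervisor` and `ParallelSupervisorStop` are hypothetical simplifications for illustration, not Druid's actual `Supervisor`, `SeekableStreamSupervisor` or `SupervisorManager` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a supervisor; not Druid's Supervisor interface.
interface StoppableSupervisor
{
  // Blocking, possibly slow stop.
  void stop();

  // Async variant: run the blocking stop on a caller-supplied shutdown executor.
  default CompletableFuture<Void> stopAsync(ExecutorService shutdownExec)
  {
    return CompletableFuture.runAsync(this::stop, shutdownExec);
  }
}

class ParallelSupervisorStop
{
  // Kick off every stop, then wait for all of them to finish.
  static void stopAll(List<? extends StoppableSupervisor> supervisors)
  {
    ExecutorService shutdownExec = Executors.newFixedThreadPool(Math.max(1, supervisors.size()));
    try {
      List<CompletableFuture<Void>> futures = new ArrayList<>();
      for (StoppableSupervisor supervisor : supervisors) {
        futures.add(supervisor.stopAsync(shutdownExec));
      }
      // join() rethrows (wrapped) if any stop failed
      CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }
    finally {
      shutdownExec.shutdown();
    }
  }

  public static void main(String[] args)
  {
    AtomicInteger stopped = new AtomicInteger();
    List<StoppableSupervisor> supervisors = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      supervisors.add(() -> {
        sleepQuietly(200); // simulate a slow stop
        stopped.incrementAndGet();
      });
    }
    long start = System.nanoTime();
    stopAll(supervisors);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("stopped " + stopped.get() + " supervisors in ~" + elapsedMs + " ms");
  }

  private static void sleepQuietly(long millis)
  {
    try {
      Thread.sleep(millis);
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With three simulated 200 ms stops, the elapsed time should land near 200 ms rather than 600 ms, though exact timings vary by machine.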
Clint Wylie a44ab109d5
remove druid.expressions.useStrictBooleans in favor of always being true (#17568) 2024-12-17 18:49:16 -08:00
Karan Kumar 8b81c91979
Remove unused fields. (#17579) 2024-12-17 13:34:50 -08:00
Adarsh Sanjeev bb4416a17b
Join context hints (#17541)
* join hints draft

* join algo

* propagate join hints

* review comments

* Use direct hints instead

* Add tests

* Pass preferred algo through pre join clause

* Refactors

* Fix tests

* Revert test changes

* Fix serialization

* Fix tests

* Fix test

* Fix test

* Fix test for sql compat mode

* Increase coverage

* Refactored hint class

---------

Co-authored-by: sreemanamala <sree.manamala@imply.io>
2024-12-17 22:25:22 +05:30
Clint Wylie de9da37384
topn with granularity regression fixes (#17565)
* topn with granularity regression fixes

changes:
* fix issue where topN with a query granularity other than ALL would use the heap algorithm when it was actually able to use the pooled algorithm, and would incorrectly use the pooled algorithm in cases where it must use the heap algorithm, a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533

* move defensive check outside of loop

* more test

* extra layer of safety

* move check outside of loop

* fix spelling

* add query context parameter to allow using the pooled algorithm for topN when multiple passes are required, even when query granularity is not ALL

* add comment, revert IT context changes and add new context flag
2024-12-17 21:21:24 +05:30
Akshat Jain 98b960c6ac
Refactor: Replace explicit type arguments with diamond operator (#17567)
Since we aren't supporting Java 8 anymore, we can switch to diamond operators
without specifying explicit type arguments.
2024-12-17 14:37:45 +05:30
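The diamond operator itself dates from Java 7; what moving past Java 8 adds is diamond on anonymous inner classes (Java 9, JEP 213). A minimal illustration of both forms, not code taken from the PR; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DiamondExample
{
  // Java 8: anonymous classes could not use <>, so the type argument had to be repeated.
  static final Comparator<String> BY_LENGTH_EXPLICIT = new Comparator<String>()
  {
    @Override
    public int compare(String a, String b)
    {
      return Integer.compare(a.length(), b.length());
    }
  };

  // Java 9+: the diamond operator is also allowed on anonymous classes.
  static final Comparator<String> BY_LENGTH = new Comparator<>()
  {
    @Override
    public int compare(String a, String b)
    {
      return Integer.compare(a.length(), b.length());
    }
  };

  static List<String> sortedByLength(List<String> input)
  {
    // Plain diamond on a constructor call has worked since Java 7.
    List<String> copy = new ArrayList<>(input);
    copy.sort(BY_LENGTH);
    return copy;
  }

  public static void main(String[] args)
  {
    System.out.println(sortedByLength(List.of("ccc", "a", "bb")));
  }
}
```

Running `main` prints `[a, bb, ccc]`; both comparators behave identically, only the declaration syntax differs.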
Kashif Faraz e80a05c38e
Add test and comments for RetryUtils.nextSleep (#17556) 2024-12-17 13:10:13 +05:30
Akshat Jain cd6083fb94
Add back `UnnecessaryFullyQualifiedName` rule in pmd ruleset (#17570)
* Add back UnnecessaryFullyQualifiedName rule in pmd ruleset

* Fix checkstyle
2024-12-17 12:43:12 +05:30
Zoltan Haindrich 9bdb3d205c
Upgrade maven commit-id plugin (#17571) 2024-12-17 12:43:01 +05:30
Clint Wylie 622a6a6f89
remove sql_compatibility from build matrix and only test sql compatible mode (#17557) 2024-12-16 15:51:12 -08:00
Kashif Faraz 0335bdd90f
Reduce coordinator logs (#17566) 2024-12-16 16:48:39 +05:30
Atul Mohan 29ab12ccd7
Add missing docs for lookup based task context properties (#17562)
* Add missing docs for lookup based task context properties

* Fix text based on comments
2024-12-16 11:05:01 +05:30
Zeyu-Chen-SFDC 12eed753f7
fix the order in getNativeQueryLine (#17326) 2024-12-13 21:59:56 +05:30
Zoltan Haindrich 3477592133 up 2024-12-13 16:16:04 +00:00
Zoltan Haindrich e9891122d4 Merge remote-tracking branch 'apache/master' into unnest-relfieldtrimmer-unnestfieldtype 2024-12-13 15:59:36 +00:00
Zoltan Haindrich c8d23927f8 add missing override 2024-12-13 14:26:00 +00:00
Akshat Jain fed36844f1
Re-visit previously disabled spotbugs patterns and enable them (#17560) 2024-12-13 15:24:40 +01:00
Akshat Jain a26e4c0e06
Cleanup unreachable Java 8 code flows (#17559) 2024-12-13 15:24:21 +01:00
Zoltan Haindrich dde6e06596 remove boolean 2024-12-13 10:49:06 +00:00
Zoltan Haindrich a161b6c6d4 retain for old 2024-12-13 10:47:03 +00:00
Zoltan Haindrich c51607e1b1 fix 2024-12-13 10:09:37 +00:00
Zoltan Haindrich 19609943aa trial of accepting empty project 2024-12-13 10:06:24 +00:00
Kashif Faraz 24e5d8a9e8
Refactor: Minor cleanup of segment allocation flow (#17524)
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
2024-12-13 07:46:57 +05:30
Katya Macedo b86ea4d5c4
[Docs] Improve druid.coordinator.kill.on description (#17538)
* Docs: improve druid.coordinator.kill.on description

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update description for durationToRetain

* Update docs/configuration/index.md

* Update after review

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-12-12 16:58:38 -08:00
George Shiqi Wu aca56d6bb8
reject publishing actions with a retriable error code if an earlier task is still publishing (#17509)
* Working queuing of publishing

* fix style

* Add unit tests

* add tests

* retry within the connector

* fix unit tests

* Update indexing-service/src/main/java/org/apache/druid/indexing/common/actions/LocalTaskActionClient.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Add comment

* fix style

* Fix unit tests

* style fix

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-12-12 10:37:53 -05:00
Zoltan Haindrich ef12701816 up 2024-12-12 14:13:43 +00:00
Zoltan Haindrich b8726c3e72 fix for sqlcompat 2024-12-12 14:12:20 +00:00
Zoltan Haindrich 1a38434d8d
Restore usage of filtered SUM (#17378) 2024-12-12 10:30:42 +01:00
Ashwin Tumma 05c3cbce08
Docs: Update SQL metrics documentation to include dimension engine (#17554)
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
2024-12-12 08:56:42 +05:30
Peter Marshall ccadfd071d
Docs: Update partitioning.md to fix a typo (#17555)
Quick fix to point the links to `dimensionsSpec` to the correct section of the ingestion spec doc.
2024-12-12 08:56:05 +05:30
Clint Wylie 3c1b488cb7
remove druid.sql.planner.serializeComplexValues config in favor of always serializing complex values (#17549) 2024-12-11 13:07:56 -08:00
Andy Tsai f3d7f1aa96
Adding 3 sets of SQL tests in quidem (#17548)
Description

Migrate the initial 3 sets of SQL tests to quidem.  These 3 sets cover numeric, string, and datetime scalar functions.
These tests use the existing kttm dataset.  They aim to exercise SQL queries in a more comprehensive way:

Each scalar function is exercised in 3 different query shapes:
- simple query
- subquery
- group by query
Each query covers all operators in its predicates.
All queries are select count(*) queries.  They are designed to all return the same result for easy maintenance and debugging.

These are the initial sets of tests.  More tests to cover the rest of the scalar and aggregation functions will come later.
2024-12-11 12:57:37 -08:00
Zoltan Haindrich 86bec3bdc0 update 2024-12-11 13:17:00 +00:00
Katya Macedo a51061fa43
[Docs] Improve Bloom filter topic (#17547)
* [Docs] Improve Bloom filter topic

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update spelling file

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-12-10 11:43:56 -08:00
Jill Osborne 61d986a179
Filters doc fix (#17553) 2024-12-10 09:34:43 -08:00
Akshat Jain 7705694481
Increase heap size for integration-tests (#17551) 2024-12-10 09:24:58 +05:30
Clint Wylie 80d2cd3632
snapshot column capabilities for realtime cursors (#17386)
* snapshot column capabilities for realtime cursors

changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, so that expression selectors on realtime queries no longer need to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued; this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single-element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)
* fixed a mistake where expression selectors for realtime queries with no null values could not use dictionary-encoded selectors

* hmm

* test changes

* cleanup

* add test coverage

* fix test

* fixes

* cleanup
2024-12-09 08:44:54 -08:00
Rohan Garg ae4ea51352
Rewrite S3StorageConnectorTest using testcontainers and MinIO (#17539) 2024-12-09 09:48:38 -05:00
Zoltan Haindrich 5c5ca2716c cleanup 2024-12-07 19:14:32 +00:00