This is the second part of PR https://github.com/apache/druid/pull/17548. It migrates the remaining 6 sets of SQL tests to quidem. These 6 sets cover aggregation, array, json, mv, reduction, and other scalar functions.
These tests use the existing kttm dataset. They aim to exercise SQL queries in a more comprehensive way:
Aggregation functions:
- Each aggregation function is exercised in one query shape:
  - group by query
- Each query covers all operators in the predicates of the `HAVING` clause.
- All queries are designed to return 7 rows.

Scalar functions:
- Each scalar function is exercised in 3 different query shapes:
  - simple query
  - subquery
  - group by query
- Each query covers all operators in the predicates of the `WHERE` clause.
- All queries are `SELECT COUNT(*)` queries, designed to return the same result for easy maintenance and debugging. (A representative aggregation shape is sketched below.)
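To make the shapes concrete, here is a rough sketch of an aggregation-style query issued through Druid's SQL HTTP endpoint. The `kttm` datasource is the one named above, but the `language` and `session_length` columns, the specific `HAVING` operators, and the Router at `localhost:8888` are illustrative assumptions rather than the exact queries added by this PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AggregationShapeExample
{
  // Aggregation shape: GROUP BY query with the aggregation function under test
  // repeated across the comparison operators of the HAVING clause.
  static final String GROUP_BY_HAVING =
      "SELECT language, SUM(session_length) AS total_time "
      + "FROM kttm "
      + "GROUP BY language "
      + "HAVING SUM(session_length) > 0 AND SUM(session_length) <> 1 AND SUM(session_length) < 9999999999";

  public static void main(String[] args) throws Exception
  {
    // Post the query to the Druid SQL endpoint (Router assumed at localhost:8888).
    String body = "{\"query\": \"" + GROUP_BY_HAVING + "\"}";
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8888/druid/v2/sql"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

The scalar-function shapes follow the same pattern, with the function under test moved into the `WHERE` clause of a `SELECT COUNT(*)` query.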
Update array and MV tests to use the `language` column
This PR updates the array and MV tests to use the MV column `language` instead of constructing the data in the queries.
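For illustration only, queries against the `language` column can use Druid's multi-value string functions directly; the literal `'en'` and the exact predicates below are assumptions, not the precise test queries.

```java
public class MvLanguageExamples
{
  // Filter on the multi-value `language` column instead of constructing an array literal in the query.
  static final String MV_FILTER_SHAPE =
      "SELECT COUNT(*) FROM kttm WHERE MV_CONTAINS(language, 'en')";

  // Group by the same column; multi-value rows are exploded into one group per value.
  static final String MV_GROUP_SHAPE =
      "SELECT language, COUNT(*) FROM kttm WHERE MV_LENGTH(language) >= 1 GROUP BY language";
}
```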
Summary of changes
---------------------
- Add `OverlordDataSourcesResource` with APIs to mark segments used/unused
- Add corresponding methods to `OverlordClient`
- Deprecate Coordinator APIs to update segments
- Use `OverlordClient` in `DataSourcesResource` so that Coordinator APIs internally
call the corresponding Overlord APIs
- If the API call fails, fall back to updating the metadata store directly
- Audit these actions only on the Overlord
Other minor changes
---------------------
- Do not perform a null check on `OverlordClient` in the Coordinator-side `DataSourcesResource`, since
`OverlordClient` is always non-null in production.
- Add new tests, fix existing ones
- Complete the implementation of `TestSegmentsMetadataManager`
New Overlord APIs
------------------
- Mark all (non-overshadowed) segments of a datasource as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all segments of a datasource as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple segments as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple segments as unused:
`POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used:
`POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused:
`DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
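As a usage sketch (not part of this PR), the new endpoints are plain Overlord HTTP APIs. The example below marks all segments of a hypothetical `wikipedia` datasource as unused, assuming an Overlord at `localhost:8090` with no authentication.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MarkSegmentsUnusedExample
{
  public static void main(String[] args) throws Exception
  {
    // DELETE /druid/indexer/v1/datasources/{dataSourceName} marks all segments of the datasource as unused.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8090/druid/indexer/v1/datasources/wikipedia"))
        .DELETE()
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // Print the status code and whatever summary the Overlord returns.
    System.out.println(response.statusCode() + ": " + response.body());
  }
}
```

The other endpoints follow the same pattern; the `markUsed`/`markUnused` variants additionally take a request body identifying the segments to update.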
* Remove duplicate context from Request Logging
* Update Unit Tests to read context
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
* Emit aggregate segment processing metrics per sink instead of per FireHydrant
* add docs
* minor change
* checkstyle
* Fix DefaultQueryMetricsTest
* Minor changes in SinkMetricsEmittingQueryRunner
* spotbugs
* Address review comments
* Use ImmutableSet and ImmutableMap
* Create a helper class for saving state of StubServiceEmitter
* Add SinkQuerySegmentWalkerBenchmark
* Create SegmentMetrics class for tracking segment metrics
---------
Co-authored-by: Akshat Jain <akjn11@gmail.com>
- Add new method `Supervisor.stopAsync`
- Implement `SeekableStreamSupervisor.stopAsync()` to use a shutdown executor
- Call `stopAsync` from `SupervisorManager`
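A minimal sketch of the pattern, not the actual Druid classes: the stop call is handed off to a shutdown executor so the manager does not block on slow supervisors. The interface shape, the `stop(boolean)` signature, and the pool size are simplifying assumptions.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified stand-ins for the real Supervisor / SupervisorManager classes.
interface Supervisor
{
  void stop(boolean stopGracefully);

  // Schedules the stop on the given executor instead of blocking the caller.
  default Future<?> stopAsync(ExecutorService shutdownExec)
  {
    return shutdownExec.submit(() -> stop(false));
  }
}

class SupervisorManagerSketch
{
  private final ExecutorService shutdownExec = Executors.newFixedThreadPool(4);

  void stopAll(List<Supervisor> supervisors)
  {
    // Kick off all stops in parallel, then let the executor drain.
    supervisors.forEach(s -> s.stopAsync(shutdownExec));
    shutdownExec.shutdown();
  }
}
```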
* topN with granularity regression fixes
changes:
* fix issue where topN with query granularity other than ALL would use the heap algorithm when it was actually able to use the pooled algorithm, and incorrectly used the pooled algorithm in cases where it had to use the heap algorithm; a regression from #16533
* fix issue where topN with query granularity other than ALL could incorrectly process values in the wrong time bucket, another regression from #16533
* move defensive check outside of loop
* more test
* extra layer of safety
* move check outside of loop
* fix spelling
* add query context parameter to allow using the pooled algorithm for topN when multiple passes are required, even when query granularity is not ALL
* add comment, revert IT context changes and add new context flag
Changes
--------
- Simplify the arguments of IndexerMetadataStorageCoordinator.allocatePendingSegment
- Remove field SegmentCreateRequest.upgradedFromSegmentId as it was always null
- Miscellaneous cleanup
* Docs: improve druid.coordinator.kill.on description
* Update docs/configuration/index.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update description for durationToRetain
* Update docs/configuration/index.md
* Update after review
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Description
Migrate the initial 3 sets of SQL tests to quidem. These 3 sets cover numeric, string, and datetime scalar functions.
These tests use the existing kttm dataset. They aim to exercise SQL queries in a more comprehensive way:
- Each scalar function is exercised in 3 different query shapes (sketched below):
  - simple query
  - subquery
  - group by query
- Each query covers all operators in its predicates.
- All queries are `SELECT COUNT(*)` queries, designed to return the same result for easy maintenance and debugging.
These are the initial sets of tests. More tests to cover the rest of the scalar and aggregation functions will come later.
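For example, the three shapes for one scalar function could look roughly like this. `kttm` is the dataset named above, while the `browser` column and the specific predicates are illustrative assumptions, not the exact generated tests.

```java
public class ScalarQueryShapes
{
  // Shape 1: simple query, with the scalar function under test in the WHERE predicates.
  static final String SIMPLE =
      "SELECT COUNT(*) FROM kttm WHERE CHAR_LENGTH(browser) > 0 AND CHAR_LENGTH(browser) < 1000";

  // Shape 2: the same predicate pushed into a subquery.
  static final String SUBQUERY =
      "SELECT COUNT(*) FROM (SELECT browser FROM kttm WHERE CHAR_LENGTH(browser) > 0) AS sub";

  // Shape 3: group by query over the function's result.
  static final String GROUP_BY =
      "SELECT CHAR_LENGTH(browser), COUNT(*) FROM kttm "
      + "WHERE CHAR_LENGTH(browser) > 0 GROUP BY CHAR_LENGTH(browser)";
}
```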
* [Docs] Improve Bloom filter topic
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Update spelling file
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* snapshot column capabilities for realtime cursors
changes:
* adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of physical columns required from a segment. If null, all columns are assumed to be required (e.g. a full scan)
* `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` use the physical columns from the cursor build spec to decide which dimensions to 'snapshot' capabilities for. Expression selectors on realtime queries therefore no longer have to treat every selector from `StringDimensionIndexer` as multi-valued unless it truly is multi-valued. This fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single-element array from the column being handled as multi-valued
* `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set
* `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded and that name lookup is possible in advance (since setting sparse adds the null value, so the cardinality is correct)
* fixed a mistake where expression selectors for realtime queries with no null values could not use dictionary-encoded selectors
* hmm
* test changes
* cleanup
* add test coverage
* fix test
* fixes
* cleanup
This PR contains non-functional / refactoring changes of the following classes in the sql module:
1. Move ExplainPlan and ExplainAttributes from sql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule, which provided the BrokerClient. That functionality is now contained in ServiceClientModule in the server package, which provides all of the clients.
This is done so that we can reuse these classes in #17353 without bringing Calcite and other dependencies into the Overlord.
* plan consistently with either UnionDataSource or UnionQuery for decoupled mode
* expose errors
* move decoupled-related setting from PlannerConfig to QueryContexts