druid

Commit Graph

Author	SHA1	Message	Date
George Shiqi Wu	f7013e012c	Add new test for handoff API (#16492 ) * Add new test for handoff API * Add new method * fix test * Update test	2024-05-28 12:57:51 -07:00
Zoltan Haindrich	44ea4e1c51	Fix cds-coordinator-metadata-query-disabled (#16488 ) fixes the issue with the newly enabled `cds-coordiantor-metadata-query-disabled` [split](https://github.com/apache/druid/pull/16468) * configures to use `prepopulated-data` environment things to configure `S3` for access * this is needed because these tests use a [dataset which is loaded from s3](https://github.com/apache/druid/blob/master/integration-tests/docker/test-data/cds-coordinator-metadata-query-disabled-sample-data.sql) * also undoes the previous [fix](https://github.com/apache/druid/pull/16469) of setting the aws region explicitly as this is a more complete solution - and configuring `prepopulated-data` also sets the region; so that's not needed anymore	2024-05-22 20:42:11 +02:00
Zoltan Haindrich	c948201507	Fix cds-task-schema-publish-disabled (#16469 ) set AWS_REGION=us-west-2 to avoid retries	2024-05-21 12:18:30 +05:30
zachjsh	dd5dc500ce	Catalog integration tests (#16424 ) * * add new catalog IT with failure to ensure that it is run in CI * * actually add failing test referred to and fix checkstyle * * add some tests * * fix checkstyle * * add test descriptions * * add more tests	2024-05-17 11:49:09 -04:00
Akshat Jain	bacdb4c48d	Update integration tests related documentation for better clarity (#16313 )	2024-05-13 11:27:21 +05:30
Alberic Liu	92fb0ff718	upgrade mysql:mysql-connector-java to 8.2.0 (#16024 ) * upgrade mysql:mysql-connector-java to 8.2.0 * fix the check errors * remove unused comment	2024-05-06 21:58:37 +08:00
Rishabh Singh	c61c3785a0	Followup changes to 15817 (Segment schema publishing and polling) (#16368 ) * Fix build * Nit changes in KillUnreferencedSegmentSchema * Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache * nitpicks * Remove reference to smq abbreviation from integration-tests * Remove reference to smq abbreviation from integration-tests * minor change * Update index.md * Add delimiter while computing schema fingerprint hash	2024-05-03 19:13:52 +05:30
Kashif Faraz	e5b40b0b8c	Miscellaneous cleanup of load queue references (#16367 ) Changes: - Rename `DataSegmentChangeRequestAndStatus` to `DataSegmentChangeResponse` - Rename `SegmentLoadDropHandler.Status` to `SegmentChangeStatus` - Remove method `CoordinatorRunStats.getSnapshotAndReset()` as it was used only in load queue peon implementations. Using an atomic reference is much simpler. - Remove `ServerTestHelper.MAPPER`. Use existing `TestHelper.makeJsonMapper()` instead.	2024-05-02 15:59:50 +05:30
Gian Merlino	5d1950d451	MSQ controller: Support in-memory shuffles; towards JVM reuse. (#16168 ) * MSQ controller: Support in-memory shuffles; towards JVM reuse. This patch contains two controller changes that make progress towards a lower-latency MSQ. First, support for in-memory shuffles. The main feature of in-memory shuffles, as far as the controller is concerned, is that they are not fully buffered. That means that whenever a producer stage uses in-memory output, its consumer must run concurrently. The controller determines which stages run concurrently, and when they start and stop. "Leapfrogging" allows any chain of sort-based stages to use in-memory shuffles even if we can only run two stages at once. For example, in a linear chain of stages 0 -> 1 -> 2 where all do sort-based shuffles, we can use in-memory shuffling for each one while only running two at once. (When stage 1 is done reading input and about to start writing its output, we can stop 0 and start 2.) 1) New OutputChannelMode enum attached to WorkOrders that tells workers whether stage output should be in memory (MEMORY), or use local or durable storage. 2) New logic in the ControllerQueryKernel to determine which stages can use in-memory shuffling (ControllerUtils#computeStageGroups) and to launch them at the appropriate time (ControllerQueryKernel#createNewKernels). 3) New "doneReadingInput" method on Controller (passed down to the stage kernels) which allows stages to transition to POST_READING even if they are not gathering statistics. This is important because it enables "leapfrogging" for HASH_LOCAL_SORT shuffles, and for GLOBAL_SORT shuffles with 1 partition. 4) Moved result-reading from ControllerContext#writeReports to new QueryListener interface, which ControllerImpl feeds results to row-by-row while the query is still running. Important so we can read query results from the final stage using an in-memory channel. 5) New class ControllerQueryKernelConfig holds configs that control kernel behavior (such as whether to pipeline, maximum number of concurrent stages, etc). Generated by the ControllerContext. Second, a refactor towards running workers in persistent JVMs that are able to cache data across queries. This is helpful because I believe we'll want to reuse JVMs and cached data for latency reasons. 1) Move creation of WorkerManager and TableInputSpecSlicer to the ControllerContext, rather than ControllerImpl. This allows managing workers and work assignment differently when JVMs are reusable. 2) Lift the Controller Jersey resource out from ControllerChatHandler to a reusable resource. 3) Move memory introspection to a MemoryIntrospector interface, and introduce ControllerMemoryParameters that uses it. This makes it easier to run MSQ in process types other than Indexer and Peon. Both of these areas will have follow-ups that make similar changes on the worker side. * Address static checks. * Address static checks. * Fixes. * Report writer tests. * Adjustments. * Fix reports. * Review updates. * Adjust name. * Small changes.	2024-04-30 21:30:27 -07:00
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Kashif Faraz	81d7b6ebe1	Fix OverlordClient to read reports as a concrete `ReportMap` (#16226 ) Follow up to #16217 Changes: - Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap` - Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module - `TaskReport` - `KillTaskReport` - `IngestionStatsAndErrorsTaskReport` - `TaskContextReport` - `TaskReportFileWriter` - `SingleFileTaskReportFileWriter` - `TaskReportSerdeTest` - Remove `MsqOverlordResourceTestClient` as it had only one method which is already present in `OverlordResourceTestClient` itself	2024-04-15 08:00:59 +05:30
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Zoltan Haindrich	1df41db46d	Migrate to use docker compose v2 (#16232 ) https://github.com/actions/runner-images/issues/9557	2024-04-03 12:32:55 +02:00
Kashif Faraz	4df4896674	Refactor: Add common method in AbstractBatchIndexTask to create ingestion stats report (#16202 ) Changes - No functional changes - Add method `AbstractBatchIndexTask.buildIngestionStatsReport()` used in several batch tasks - Add utility method `AbstractBatchIndexTask.addBuildSegmentStatsToReport()` - Use boolean argument to represent a full report instead of the String `full` in internal methods. (REST API remains unchanged.) - Rename `IngestionStatsAndErrorsTaskReportData` to `IngestionStatsAndErrors` - Clean up some of the methods	2024-03-28 23:07:00 +05:30
Adarsh Sanjeev	86a24012a6	Add security ITs for sending tasks to overlord (#16131 ) * Add security ITs for sending tasks to overlord * Add security ITs for sending tasks to overlord * Resolve test flakiness	2024-03-18 09:33:40 +05:30
Kashif Faraz	1682d4570d	Increase delay to allow propagation of credentials (#16143 )	2024-03-17 14:47:42 +05:30
Adithya Chakilam	564c44ed85	Add stats segmentsRead and segmentsPublished to compaction task reports (#15947 ) Changes: - Add visibility into number of segments read/published by each parallel compaction - Add new fields `segmentsRead`, `segmentsPublished` to `IngestionStatsAndErrorsTaskReportData` - Update `ParallelIndexSupervisorTask` to populate the new stats	2024-03-07 09:37:23 +05:30
Adithya Chakilam	ec52f686c0	Fix compaction tasks reports getting overwritten (#15981 ) * Fix compaction tasks reports geting overwrittened * only skip for compactiont task * address comments * fix boolean * move boolean flag to task rather than spec * rename variable * add docs, fix missing case * Update docs/ingestion/tasks.md * rename var * add task report decode check in IT * change assert	2024-03-04 10:10:17 -05:00
Sensor	e0bce0ef90	Add pre-check for heavy debug logs (#15706 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-02-29 12:58:14 +05:30
Abhishek Agarwal	ddfc31d7ed	Reduce the size of distribution docker image (#15968 ) This PR creates symlinks when there are duplicate jars present in the extension. Docker image includes contrib extensions, too, and the size of the image has bloated up quite a lot of late. This change also fixes "ITNestedQueryPushDownTest integration test"	2024-02-26 21:18:55 +05:30
Adarsh Sanjeev	9eaaeb5c16	Add security ITs to the revised integration tests (#15885 ) * Add IT for security * Add admin client * Clean up code * Clean up code * Address review comments	2024-02-20 11:32:08 +05:30
Abhishek Radhakrishnan	a7918be268	Temporarily bump up the delay in auth IT from 5s to 10s. (#15765 ) A more ideal/permanent fix would be to have status checks exposed by the services, but that'll require more code changes. So temporarily bump it to unblock CI now.	2024-01-26 11:52:27 -05:00
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Kashif Faraz	f0c552b2f9	Fix basic auth integration test (#15679 ) * Add some retries * Add a delay to allow creds to propagate * Checkstyle and stuff	2024-01-14 08:59:15 -08:00
Rishabh Singh	71f5307277	Eliminate Periodic Realtime Segment Metadata Queries: Task Now Publish Schema for Seamless Coordinator Updates (#15475 ) The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments. This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.	2024-01-10 08:55:56 +05:30
Kashif Faraz	cce539495d	[Flaky test] Fix basic auth integration test (#15561 ) Database slowness while doing audits seems to be causing flakiness in auth ITs. The failing test is almost always `ITBasicAuthConfigurationTest.test_avaticaQuery_datasourceAndContextParamsUser` but in some rare cases, other tests fail too. Alternately, this failing test has been seen to pass too. It is most likely because the auth changes are not able to propagate in time from the coordinator to other services. Fix: Just log the audits rather than persisting them to database. Most audits have been newly added and it is okay to not have them persisted. Moreover, logging audits can also be more beneficial while debugging an IT.	2023-12-23 12:11:12 +05:30
Kashif Faraz	9f568858ef	Add logging implementation for AuditManager and audit more endpoints (#15480 ) Changes - Add `log` implementation for `AuditManager` alongwith `SQLAuditManager` - `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for all `fetchAuditHistory` calls. - Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default) - Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`. This gets activated only if `type` is `log`. - Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs - Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of audit payload and other utility methods.	2023-12-19 13:14:04 +05:30
Jan Werner	fa2c8edb5d	unpin snakeyaml, add suppressions and licenses (#15549 ) * unpin snakeyaml globally, add suppressions and licenses * pin snakeyaml in the specific modules that require version 1.x, update licenses and owasp suppression This removes the pin of the Snakeyaml introduced in: https://github.com/apache/druid/pull/14519 After the updates of io.kubernetes.java-client and io.confluent.kafka-clients, the only uses of the Snakeyaml 1.x are: - in test scope, transitive dependency of jackson-dataformat-yaml:jar:2.12.7 - in compile scope in contrib extension druid-cassandra-storage - in compile scope in it-tests. With the dependency version un-pinned, io.kubernetes.java-client and io.confluent.kafka-clients bring Snakeyaml versions 2.0 and 2.2, consequently allowing to build a Druid distribution without the contrib-extension and free of vulnerable Snakeyaml versions.	2023-12-15 10:33:14 -08:00
Ankit Kothari	8735d023a1	Add experimental support for first/last for double/float/long #10702 (#14462 ) Add experimental support for doubleLast, doubleFirst, FloatLast, FloatFirst, longLast and longFirst.	2023-12-12 11:36:51 +05:30
Abhishek Radhakrishnan	96be82a3e6	Clean up duty for non-overlapping eternity tombstones (#15281 ) * Add initial draft of MarkDanglingTombstonesAsUnused duty. * Use overshadowed segments instead of all used segments. * Add unit test for MarkDanglingSegmentsAsUnused duty. * Add mock call * Simplify code. * Docs * shorter lines formatting * metric doc * More tests, refactor and fix up some logic. * update javadocs; other review comments. * Make numCorePartitions as 0 in the TombstoneShardSpec. * fix up test * Add tombstone core partition tests * Update docs/design/coordinator.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * review comment * Minor cleanup * Only consider tombstones with 0 core partitions * Need to register the test shard type to make jackson happy * test comments * checkstyle * fixup misc typos in comments * Update logic to use overshadowed segments * minor cleanup * Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone. * Address review feedback. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-12-11 08:57:15 -08:00
Rishabh Singh	77b929f494	Fix CentralizedDatasourceSchema IT (#15493 )	2023-12-05 20:05:13 +05:30
Rishabh Singh	d968bb3f43	Rename config for enabling CentralizedDatasourceSchema feature (#15476 ) * Rename property to druid.centralizedDatasourceSchema.enabled * Update config name in docker-compose	2023-12-05 16:57:25 +05:30
Jan Werner	ee6ad36fab	update confluent's dependencies to common, supported version (#15441 ) * update confluent's dependencies to common, supported version Update io.confluent.* dependencies to common, updated version 6.2.12 currently used versions are EOL * move version definition to the top level pom	2023-11-28 21:35:22 -08:00
Rishabh Singh	db95c375a6	Increase historical heap for standard IT (#15337 ) Lately, Query IT has been failing due to historical server running out of memory (OOM). We are investigating the historical heap dump from the test. Until the issue is resolved, we are increasing the heap size of historical server.	2023-11-08 15:21:30 +05:30
Abhishek Agarwal	4b64a5693b	Move service specific JVM parameters to the right in tests (#15325 ) Historical OOMs were not getting dumped into /shared/logs because common JVM flags will override service-specific JVM flags. This PR fixes that and also removes unnecessary overrides in historical.	2023-11-06 15:45:59 +05:30
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Xavier Léauté	352702bb25	run some integration tests with Java 21 (#15104 ) * use setup-java everywhere for consistency * add Java 21 to integration test matrix * simplify docker build containers script + add Java 21 * fix for Java versions reporting 21-ea	2023-10-20 11:18:13 +08:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Rishabh Singh	ebb9724c26	Pass jvm option to write heap dump on out of memory (#15053 )	2023-09-29 17:54:53 +05:30
Zoltan Haindrich	5f3b310115	Build reliablity fixes (#15048 ) * disable parallel builds; enable batch mode to get rid of transfer progress * restore .m2 from setup-java if not found * some change to sql * add ws * fix quote * fix quote * undo querytest change * nullhandling in mvtest * init more * skip commitid plugin * add-back 1.0C to build ; remove redundant skip-s from copy-resources; add comment	2023-09-28 12:27:52 -07:00
Soumyava	8088a763a6	Vectorize earliest aggregator for both numeric and string types (#14408 ) * Vectorizing earliest for numeric * Vectorizing earliest string aggregator * checkstyle fix * Removing unnecessary exceptions * Ignoring tests in MSQ as earliest is not supported for numeric there * Fixing benchmarks * Updating tests as MSQ does not support earliest for some cases * Addressing review comments by adding the following: 1. Checking capabilities first before creating selectors 2. Removing mockito in tests for numeric first aggs 3. Removing unnecessary tests * Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string * Adding a flag for multi value dimension selector * Addressing comments * 1 more change * Handling review comments part 1 * Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order * Updating numeric first vector agg * Revert "Updating numeric first vector agg" This reverts commit `4291709901`. * Updating code for correctness issues * fixing an issue with latest agg * Adding more comments and removing an unnecessary check * Addressing null checks for tie selector and only vectorize false for quantile sketches	2023-09-05 08:41:42 -07:00
Clint Wylie	5d1412949e	enable sql compatible null handling mode by default (#14792 ) * enable sql compatible null handling mode by default * fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false	2023-08-21 20:07:13 -07:00
Kashif Faraz	097b645005	Clean up after add kill bufferPeriod (#14868 ) Follow up changes to #12599 Changes: - Rename column `used_flag_last_updated` to `used_status_last_updated` - Remove new CLI tool `UpdateTables`. - We already have a `CreateTables` with similar functionality, which should be able to handle update cases too. - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables` which should create tables at the latest version. - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway. - Remove `upgrade-prep.md` and include that info in `metadata-init.md`. - Fix log messages to adhere to Druid style - Use lambdas	2023-08-19 00:00:04 +05:30
Lucas Capistrant	9c124f2cde	Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599 ) * Add new configurable buffer period to create gap between mark unused and kill of segment * Changes after testing * fixes and improvements * changes after initial self review * self review changes * update sql statement that was lacking last_used * shore up some code in SqlMetadataConnector after self review * fix derby compatibility and improve testing/docs * fix checkstyle violations * Fixes post merge with master * add some unit tests to improve coverage * ignore test coverage on new UpdateTools cli tool * another attempt to ignore UpdateTables in coverage check * change column name to used_flag_last_updated * fix a method signature after column name switch * update docs spelling * Update spelling dictionary * Fixing up docs/spelling and integrating altering tasks table with my alteration code * Update NULL values for used_flag_last_updated in the background * Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod * remove unneeded things now that the new column is automatically updated * Test new background row updater method * fix broken tests * fix create table statement * cleanup DDL formatting * Revert adding columns to entry table by default * fix compilation issues after merge with master * discovered and fixed metastore inserts that were breaking integration tests * fixup forgotten insert by using pattern of sharing now timestamp across columns * fix issue introduced by merge * fixup after merge with master * add some directions to docs in the case of segment table validation issues	2023-08-17 19:32:51 -05:00
Rishabh Singh	0dc305f9e4	Upgrade hibernate validator version to fix CVE-2019-10219 (#14757 )	2023-08-14 11:50:51 +05:30
Tejaswini Bandlamudi	a45b25fa1d	Removes support for Hadoop 2 (#14763 ) Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop	2023-08-09 17:47:52 +05:30
Kashif Faraz	2d8e0f28f3	Refactor: Cleanup coordinator duties for metadata cleanup (#14631 ) Changes - Add abstract class `MetadataCleanupDuty` - Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` - Improve log and error messages - Cleanup tests - No functional change	2023-08-05 13:08:23 +05:30
AmatyaAvadhanula	5a52f7a457	Fix IT failure due to query interval (#14738 )	2023-08-02 11:29:35 -07:00


610 Commits