druid

Commit Graph

Author	SHA1	Message	Date
zachjsh	3f2dd46ede	Catalog table should not need explicit segment granularity set (#16278 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * fix and update test * * address review comments * * address test review comments * * fix checkstyle * * fix checkstyle * * fix failing test	2024-04-17 11:46:24 -04:00
Clint Wylie	aa230642dd	use PeekableIntIterator for OR filter "partial index" value matchers (#16300 )	2024-04-17 08:27:21 -07:00
zachjsh	2351f038eb	Kafka with topicPattern can ignore old offsets spuriously (#16190 ) * * fix * * simplify * * simplify tests * * update matches function definition for Kafka Datasource Metadata * * add matchesOld * * override matches and plus for kafka based metadata / sequence numbers * * implement minus * add tests * * fix failing tests * * remove TODO comments * * simplify and add comments * * remove unused variable in tests * * remove unneeded function * * add serde tests * * more stuff * * address review comments * * remove unneeded code.	2024-04-17 10:00:17 -04:00
Hardik Bajaj	0bf5e7745d	Add configurable parameters for statsd client (#16283 ) The statsd client sometimes drops metrics when its queue of unprocessed messages (queueSize) is completely full. This causes some high-cardinality metrics, such as per-partition lag, to be dropped. Several statsd client parameters can be initialized to increase the capacity of the client so that it drops metrics less frequently. Properties like queueSize, poolSize, processorWorkers and senderWorkers will now be configurable at runtime	2024-04-17 18:35:31 +05:30
Adithya Chakilam	34237bc112	Consider max lag for kinesis while autoscaling (#16284 ) * Consider max lag for kinesis while autoscaling * add test for coverage * test folder	2024-04-17 15:05:05 +05:30
Gian Merlino	ccc1ffb032	Additional short circuiting knowledge in filter bundles. (#16292 ) * Additional short circuiting knowledge in filter bundles. Three updates: 1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed "applyRowCount", and redefined as an upper bound on rows remaining after short-circuiting (rather than number of rows selected so far). This definition works better for OR filters, which pass through the FALSE set rather than the TRUE set to the next subfilter. 2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather than using selectionRowCount for the first subfilter and indexIntersectionSize for each filter thereafter. This improves accuracy when the incoming applyRowCount is smaller than the row count from the first few indexes. 3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather than applyRowCount for subfilters. This allows an OR filter to pass information about short-circuiting to its subfilters. To help write tests for this, the patch also moves the sampled wikiticker data file from sql to processing. * Forbidden APIs. * Forbidden APIs. * Better comments. * Fix inspection. * Adjustments to tests.	2024-04-16 22:42:28 -07:00
aho135	4fa377c7fd	Improve logging for lookups (#16287 )	2024-04-17 10:20:09 +05:30
AmatyaAvadhanula	f3d69f30e6	Associate pending segments with the tasks that requested them (#16144 ) Changes: - Add column `task_allocator_id` to `pendingSegments` metadata table. - Add column `upgraded_from_segment_id` to `pendingSegments` metadata table. - Add interface `PendingSegmentAllocatingTask` and implement it by all tasks which can allocate pending segments. - Use `taskAllocatorId` to identify the task (and its sub-tasks or replicas) to which a pending segment has been allocated. - Perform active cleanup of pending segments in `TaskLockbox` once there are no active tasks for the corresponding task allocator id. - When committing APPEND segments, also commit all upgraded pending segments corresponding to that task allocator id. - When committing REPLACE segments, upgrade all overlapping pending segments in the same transaction.	2024-04-17 09:06:31 +05:30
zachjsh	a5428e75ff	INSERT/REPLACE complex target column types are validated against source input expressions (#16223 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * address review comments * * address test review comments * * fix checkstyle	2024-04-16 17:20:35 -04:00
Gian Merlino	cf841b8e67	Fix incorrect class in BaseMacroFunctionExpr.equals. (#16294 ) The equals method cast to the wrong class, potentially leading to ClassCastException.	2024-04-16 09:40:46 -07:00
AmatyaAvadhanula	ad6bd62140	Handle task location fetch from overlord during rolling upgrades (#16227 ) Bug: #15724 introduced a bug where a rolling upgrade would cause all task locations returned by the Overlord on an older version to be unknown. Fix: If the new API fails, fall back to single task status API which always returns a valid task location.	2024-04-16 21:01:37 +05:30
Jan Werner	c45da431fb	update netty and zookeeper dependencies to address CVEs (#16267 ) Update dependencies to address CVEs: - Update netty from 4.1.107.Final to 4.1.108.Final to address: CVE-2024-29025 - Update zookeeper from 3.8.3 to 3.8.4 to address: CVE-2024-23944 Release notes: - Update netty from 4.1.107.Final to 4.1.108.Final to address: CVE-2024-29025 - Update zookeeper from 3.8.3 to 3.8.4 to address: CVE-2024-23944	2024-04-15 20:40:50 -07:00
YongGang	6964297b53	Remove the unused Controller context reference from Worker (#16285 )	2024-04-16 08:34:24 +05:30
Nikhil Rao	a805c5612e	Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277 ) * Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page * Updates intervals in Native Query to remove excess Time part in timestamp * Moves Druid SQL section above Native query because sql used more often by users * removes old Druid SQL sections * Adds TopN Druid SQL query using ORDER BY and LIMIT * Adds table for Druid SQL variance and standard deviation functions * Update docs/development/extensions-core/stats.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-15 08:05:34 -07:00
Sree Charan Manamala	5247059d2f	Allow Double & null values in sql type array through dynamic params (#16274 )	2024-04-15 10:44:42 +02:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Kashif Faraz	81d7b6ebe1	Fix OverlordClient to read reports as a concrete `ReportMap` (#16226 ) Follow up to #16217 Changes: - Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap` - Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module - `TaskReport` - `KillTaskReport` - `IngestionStatsAndErrorsTaskReport` - `TaskContextReport` - `TaskReportFileWriter` - `SingleFileTaskReportFileWriter` - `TaskReportSerdeTest` - Remove `MsqOverlordResourceTestClient` as it had only one method which is already present in `OverlordResourceTestClient` itself	2024-04-15 08:00:59 +05:30
Abhishek Radhakrishnan	041d0bff5e	Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247 ) The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior. The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots. * Remove stale comment and inline canDutyRun() * druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set. - Remove the default P1D value for druid.coordinator.kill.period. Instead default druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set to if the former config isn't specified. - If druid.coordinator.kill.period is set, the value will take precedence over druid.coordinator.period.indexingPeriod * Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java * Fix checkstyle error * Clarify comment * Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java * Put back canDutyRun() * Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots) * Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS * Remove unused test method. * Update default value of killTaskSlotsRatio in docs and web-console default mock * Move initDuty() after params and config setup.	2024-04-14 18:56:17 -07:00
Gian Merlino	b0c5184f9d	Fix ORDER BY on certain GROUPING SETS. (#16268 ) * Fix ORDER BY on certain GROUPING SETS. DefaultLimitSpec (part of native groupBy) had a bug where it would assume that results are naturally ordered by dimensions even when subtotalsSpec is present. However, this is not necessarily the case. For certain combinations of ORDER BY and GROUPING SETS, this would cause the ORDER BY to be ignored. * Fix test testGroupByWithSubtotalsSpecWithOrderLimitForcePushdown. Resorting was necessary.	2024-04-12 12:06:47 -07:00
Katya Macedo	7f06a53cb1	[Docs] Fix API placeholder formatting (#16240 )	2024-04-12 09:19:13 -07:00
Sree Charan Manamala	3340b200db	Fix window function drill tests failures falling under RESULT_MISMATCH & RESULT_COUNT_MISMATCH (#16264 ) * Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach. * Corrected the queries where date time values are directly provided * marked 2 cases failing with resultset casting issues	2024-04-12 13:54:48 +02:00
Laksh Singla	cce2d0f127	Upload openrewrite patch via GHA (#16270 ) This patch adds a step to the openrewrite action, such that it uploads the correcting patch, in case it fails.	2024-04-12 15:31:07 +05:30
Sree Charan Manamala	f65c166327	Windowed aggregates should update the aggregation value based on final compute (#16244 )	2024-04-12 08:28:33 +02:00
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Pranav	fc2600b8e2	Adding jvmVersion dimension in JVM Monitor (#16262 )	2024-04-11 15:44:56 -07:00
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannot. * Fix tests.	2024-04-11 15:06:11 -07:00
317brian	df9e1bb97b	Docs: Fix typo in tutorial (#16254 )	2024-04-10 08:59:52 +05:30
Katya Macedo	cd69f145b7	docs: Add upgrade notes for Druid 29.0.1 (#16123 )	2024-04-09 13:56:57 -07:00
Vishesh Garg	3d595cfab1	Add storeCompactionState flag support to msq (#15965 ) Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted for future cases such as REPLACE and compaction done via MSQ. Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored with flag "storeCompactionState": true in the query context.	2024-04-09 16:47:47 +05:30
Vishesh Garg	9a4fb58543	Record column name for exceptions while writing frames in RowBasedFrameWriter (#16130 ) Current Runtime Exceptions generated while writing frames only include the exception itself without including the name of the column they were encountered in. This patch introduces the further information in the error and makes it non-retryable.	2024-04-09 15:39:10 +05:30
Adarsh Sanjeev	e2e0cb905c	Add reasoning for choosing shardSpec to the MSQ report (#16175 ) This PR logs the segment type and reason chosen. It also adds it to the query report, to be displayed in the UI. This PR adds a new section to the reports, segmentReport. This contains the segment type created, if the query is an ingestion, and null otherwise.	2024-04-09 11:32:02 +05:30
Gian Merlino	5e5cf9af99	Reduce upload buffer size in GoogleTaskLogs. (#16236 ) * Reduce upload buffer size in GoogleTaskLogs. Use a 1MB upload buffer, rather than the default of 15 MB in the API client. This is mainly because MMs may upload logs in parallel, and typically have small heaps. The default-sized 15 MB buffers add up quickly and can cause a MM to run out of memory. * Make bufferSize a nullable Integer. Add tests.	2024-04-08 12:54:31 -07:00
Vadim Ogievetsky	4ff7e2c6c9	Web console: Better manual capabilities detection indication (#16191 ) * Better forced mode indication * more robust * Update web-console/src/components/header-bar/header-bar.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/header-bar.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/__snapshots__/restricted-mode.spec.tsx.snap Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/__snapshots__/restricted-mode.spec.tsx.snap Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx Co-authored-by: Charles Smith <techdocsmith@gmail.com> * reformat * forced => manual capability detection * typo * typo2 --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-04-08 10:07:21 -07:00
sullis	f4649fece9	Bump openrewrite plugin + recipes (#16238 )	2024-04-08 15:13:57 +05:30
Vishesh Garg	af24cc88ce	Fix CVE errors (#16147 ) * Fix CVE errors * Update pac4j * Update nimbus.jose.jwt.version * Change pac4j version to 5.7.3 * Change pac4j version to 5.3.1 * Revert pac4j version change * Update pac4j comment	2024-04-05 17:53:09 +05:30
Parag Jain	f55c9e58a8	add google as external storage for msq export (#16051 ) Support for exporting msq results to gcs bucket. This is essentially copying the logic of s3 export for gs, originally done by @adarshsanjeev in this PR - #15689	2024-04-05 12:10:10 +05:30
Vadim Ogievetsky	3ba878f21b	don't send lookups to sampler (#16234 )	2024-04-04 21:17:42 -07:00
Gian Merlino	a319b44545	Allow typedIn to run in replace-with-default mode. (#16233 ) * Allow typedIn to run in replace-with-default mode. Useful when data servers, like Historicals, are running in replace-with-default mode and the Broker is running in SQL-compatible mode, which can happen during a rolling update that is applying a mode change.	2024-04-04 15:45:42 -07:00
Sergio Ferragut	64433eb2ff	Update kubernetes-overlord-extension extension name in docs (#16239 )	2024-04-04 14:38:28 -07:00
Vadim Ogievetsky	9658e1ad7f	Web console: fix query timer issues (#16235 ) * fix timer issues * wording	2024-04-04 13:13:31 -07:00
Soumyava	7759f25095	Moving bitwise_or to use native calcite operator (#16237 )	2024-04-04 12:49:29 -07:00
Soumyava	972937659d	Fixing return type for IPV4 (#15916 ) * Fixing return type for IPV4 * Update ipv4match	2024-04-04 08:49:50 -07:00
Abhishek Radhakrishnan	75fb57ed6e	Update error messages when supervisor's checkpoint state is invalid (#16208 ) * Update error message when topic messages. Suggest resetting the supervisor when the topic changes instead of changing the supervisor name which is actually making a new supervisor. * Update server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Cleanup * Remove log and include oldCommitMetadataFromDb * Fix test --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-04-03 10:34:17 -07:00
Zoltan Haindrich	1df41db46d	Migrate to use docker compose v2 (#16232 ) https://github.com/actions/runner-images/issues/9557	2024-04-03 12:32:55 +02:00
Soumyava	4bea865697	Restore context flag for window functions (#16229 )	2024-04-03 13:57:13 +05:30
AmatyaAvadhanula	218513ad55	Use created time from metadata store in list tasks (#16228 )	2024-04-03 09:03:32 +05:30
Gian Merlino	b0ca06f8cd	Fix name of combining filtered aggregator factory. (#16224 ) The name of the combining filtered aggregator factory should be the same as the name of the original factory. However, it wasn't the same in the case where the original factory's name and the original delegate aggregator were inconsistently named. In this scenario, we should use the name of the original filtered aggregator, not the name of the original delegate aggregator.	2024-04-02 12:59:48 -07:00
zachjsh	9b52c909e0	fix complex types returning UNKNOWN as their SQL type inference (#16216 ) * * fix * * fix * * address review comments	2024-04-02 14:36:01 -04:00
Sree Charan Manamala	26f9b174de	Handling nil selector column in vector math processors (#16128 )	2024-04-02 02:06:57 -07:00
Vadim Ogievetsky	06268bf060	only pick kafka input format by default when needed (#16180 )	2024-04-01 13:47:49 -07:00
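
The row-count bounds described in commit ccc1ffb032 (#16292) can be illustrated with a small sketch. This is only the arithmetic from the commit message written in Python, not Druid's Java implementation; the function names are hypothetical, and the parameter names mirror the terms used in the message (applyRowCount, indexIntersectionSize, totalRowCount, indexUnionSize).

```python
def and_subfilter_row_bound(apply_row_count: int, index_intersection_size: int) -> int:
    """Bound passed to an AND subfilter: min(applyRowCount, indexIntersectionSize).

    Hypothetical helper; it only restates the formula from the commit message.
    """
    return min(apply_row_count, index_intersection_size)


def or_subfilter_row_bound(
    apply_row_count: int, total_row_count: int, index_union_size: int
) -> int:
    """Bound passed to an OR subfilter: min(applyRowCount, totalRowCount - indexUnionSize).

    An OR filter passes the FALSE set to the next subfilter, so the rows still
    in play are those not already covered by the union of the indexes.
    """
    return min(apply_row_count, total_row_count - index_union_size)


if __name__ == "__main__":
    # Incoming bound of 500 rows, indexes intersect on 120 rows.
    print(and_subfilter_row_bound(500, 120))
    # 1000 rows in total, 700 already matched by the index union.
    print(or_subfilter_row_bound(500, 1000, 700))
```

In both cases the incoming bound is only ever tightened, which matches the commit's description of applyRowCount as an upper bound on rows remaining after short-circuiting.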

14028 Commits
