druid

mirror of https://github.com/apache/druid.git synced 2025-02-10 12:05:00 +00:00

Author	SHA1	Message	Date
Charles Smith	ed48cb82e9	[Docs] Remove avro_ocf support from Kafka & Kinesis streaming sources (Revert changes from #11865 ) (#16807 )	2024-07-26 13:06:22 -07:00
Abhishek Radhakrishnan	3c493dc3ed	CircularList round-robin iterator for the KillUnusedSegments duty (#16719 ) * Round-robin iterator for datasources to kill. Currently there's a fairness problem in the KillUnusedSegments duty where the duty consistently selects the same set of datasources as discovered from the metadata store or dynamic config params. This is a problem especially when there are multiple unused. In a medium to large cluster, while we can increase the task slots to increase the likelihood of broader coverage. This patch adds a simple round-robin iterator to select datasources and has the following properties: 1. Starts with an initial random cursor position in an ordered list of candidates. 2. Consecutive {@code next()} iterations from {@link #getIterator()} are guaranteed to be deterministic unless the set of candidates change when {@link #updateCandidates(Set)} is called. 3. Guarantees that no duplicate candidates are returned in two consecutive {@code next()} iterations. * Renames in RoundRobinIteratorTest. * Address review comments. 1. Clarify javadocs on the ordered list. Also flesh out the details a bit more. 2. Rename the test hooks to make intent clearer and fix typo. 3. Add NotThreadSafe annotation. 4. Remove one potentially noisy log that's in the path of iteration. * Add null check to input candidates. * More commentary. * Address review feedback: downgrade some new info logs to debug; invert condition. Remove redundant comments. Remove redundant variable tracking. * CircularList adjustments. * Updates to CircularList and cleanup RoundRobinIterator. * One more case and add more tests. * Make advanceCursor private for now. * Review comments.	2024-07-26 12:20:49 -07:00
Sree Charan Manamala	9b76d13ff8	Check for Aggregation inside a window clause when syntax used as - WINDOW W AS DEF (#16801 )	2024-07-26 11:18:35 +02:00
Laksh Singla	725d442355	Faster dimension deserialization on the brokers (#16740 ) Speedier dimension deserialization on the brokers.	2024-07-26 14:36:11 +05:30
Clint Wylie	71725b41b5	ignore dependencies for github stale action (#16797 )	2024-07-25 10:32:43 -07:00
Gian Merlino	b2a88da200	Attempt to coerce COMPLEX to number in numeric aggregators. (#16564 ) * Coerce COMPLEX to number in numeric aggregators. PR #15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex. * Fix tests. * Remove the special ColumnValueSelector. * Add test.	2024-07-25 08:45:29 -07:00
Rohan Garg	b5f117bca2	Check for tombstones in wrapping storage adapters (#16791 )	2024-07-25 06:55:40 -04:00
Clint Wylie	14954c7eb9	serialize legacy as false for scan query for rolling downgrade/upgrade (#16793 ) Fixes rolling downgrades/upgrades after #16659 by hard coding scan query "legacy":false since it is a required property during deserialization.	2024-07-25 14:51:58 +05:30
Gian Merlino	c1875e7c1d	HashJoinEngine: Check for interruptions while walking left cursor. (#16773 ) * HashJoinEngine: Check for interruptions while walking left cursor. Previously, the engine only checked for interruptions between emitting joined rows. In scenarios where large numbers of left rows are skipped completely (such as a highly selective INNER JOIN) this led to the join cursor being insufficiently responsive to cancellation. * Coverage.	2024-07-25 15:10:50 +08:00
Clint Wylie	5da69a01cb	change arrayIngestMode default to array (#16789 ) * change arrayIngestMode default to array * remove arrayIngestMode flag option none * fix space * fix test	2024-07-25 15:09:40 +08:00
Zoltan Haindrich	7e3fab5bf9	Make WindowFrames more specific (#16741 ) Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods	2024-07-25 04:57:36 +02:00
Edgar Melendrez	ca787885c9	[docs] batch02 of updating functions (#16761 ) * applying changes * ensuring batch is updated * Update docs/querying/sql-functions.md * raise -> raises * addressing review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-07-24 15:28:57 -07:00
John Gozde	6ff0cbfa54	Prune date-fns locales, bump sass TODO (#16792 )	2024-07-24 10:50:53 -07:00
Akshat Jain	a0437b6c93	MSQ window functions: Fix partition boundary issues for arrays (#16780 ) * MSQ window functions: Fix partition boundary issues for arrays * Address review comments * Cache type strategies * Trigger Build * Convert typeStrategies from list to array	2024-07-24 18:47:04 +05:30
Clint Wylie	302739aa58	more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts, and query cancellation participation (#16748 ) * more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts * wire parallel merge into query cancellation system * oops * style * adjust metrics initialization * fix timeout, fix cleanup to not block * javadocs to clarify why cancellation future and gizmo are split * cancelled -> canceled, simplify QueuePusher since it always takes a ResultBatch, non-static terminal marker to make stuff stop complaining about types, specialize tryOffer to be tryOfferTerminal so it wont be misused, add comments to clarify reason for non-blocking offers that might fail	2024-07-24 14:58:34 +08:00
Vadim Ogievetsky	4f0b80bef5	Web console: change to use @fontsource/open-sans (#16786 ) * change to use @fontsource/open-sans * import locale directly * update check license	2024-07-23 21:28:59 -07:00
Sree Charan Manamala	3f4d66c399	Check for Unsupported Aggregation with Distinct when useApproxCountDistinct is enabled (#16770 ) * init * add NativelySupportsDistinct * refactor * javadoc * refactor * fix tests * fix drill tests * comments * Update sql/src/test/java/org/apache/druid/sql/calcite/DrillWindowQueryTest.java --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-24 11:13:22 +08:00
Sébastien	aeb2ee59a2	Added an option to hide the workbench-view toolbar (#16785 )	2024-07-23 15:36:54 -07:00
317brian	704962ec8e	doc: minor fixes to migration guides (#16784 )	2024-07-23 13:09:51 -07:00
George Shiqi Wu	a64e9a1746	Add annotation for pod template (#16772 ) * Add annotation for pod template * pr comments * add test cases * add tests	2024-07-23 07:25:15 -07:00
Laksh Singla	11bb40981e	Deduce type from the aggregators when materializing subquery results (#16703 ) For aggregators like StringFirst/Last, whose intermediate type isn't the same as the final type, using them in GroupBy, TopN or Timeseries subqueries causes a fallback when maxSubqueryBytes is set. This is because we assume that the finalization is not known, due to which the row signature cannot determine whether to use the intermediate or the final type, and it puts it as null. This PR figures out the finalization from the query context and uses the intermediate or the final type appropriately.	2024-07-23 11:52:39 +05:30
Akshat Jain	c45d4fdbca	MSQ window functions: Minor cleanup for empty over clause related flows + Exhaustive tests (#16754 ) * MSQ window functions: Revamp logic to create separate window stages when empty over() clause is present * Fix tests * Revert changes of creating separate stages for empty over clause * Address review comments	2024-07-23 11:37:34 +05:30
Gian Merlino	8b8ca0d7fc	DimFilterUtils: Exit filterShards early when filter is null. (#16774 ) When the filter is null, there is no need to run the converter on all the input objects.	2024-07-22 21:17:11 -07:00
Clint Wylie	b645d09c5d	move long and double nested field serialization to later phase of serialization (#16769 ) changes: * moves value column serializer initialization, call to `writeValue` method to `GlobalDictionaryEncodedFieldColumnWriter.writeTo` instead of during `GlobalDictionaryEncodedFieldColumnWriter.addValue`. This shift means these numeric value columns are now done in the per field section that happens after serializing the nested column raw data, so only a single compression buffer and temp file will be needed at a time instead of the total number of nested literal fields present in the column. This should be especially helpful for complicated nested structures with thousands of columns as even those 64k compression buffers can add up pretty quickly to a sizeable chunk of direct memory.	2024-07-22 21:14:30 -07:00
Edgar Melendrez	934c10b1cd	docs: Adding admonition box to warn about MVD (#16712 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-22 17:32:23 -07:00
Clint Wylie	02b8738c00	remove batchProcessingMode from task config, remove AppenderatorImpl (#16765 ) changes: * removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS` which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed so `CLOSED_SEGMENTS` was the default (using `AppenderatorImpl`), however MSQ has been exclusively using `BatchAppenderator` with no problems so it seems safe to just roll it out as the only option for batch ingestion everywhere. * with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed * simplify `Appenderator` construction since there are only separate stream and batch versions now * simplify tests since `batchProcessingMode` is gone	2024-07-22 13:56:44 -07:00
Akshat Jain	6a2348b78b	Preemptive restriction for queries with approximate count distinct on complex columns of unsupported type (#16682 ) This PR aims to check if the complex column being queried aligns with the supported types in the aggregator and aggregator factories, and throws a user-friendly error message if they don't.	2024-07-22 21:34:06 +05:30
Sree Charan Manamala	149d7c5207	Throw exceptions in SqlValidator when DISTINCT used over WINDOW (#16738 ) * Throw exception if DISTINCT used with window functions aggregate call * Improve error message when unsupported aggregations are used with window functions	2024-07-22 16:29:46 +02:00
Sree Charan Manamala	c9aae9d8e6	Enable WINDOW_LEAF_OPERATOR for native engine to support queries without group by (#16753 )	2024-07-22 12:31:55 +02:00
dave-mccowan	7f7e6ca1e5	Fix excessive logging from druid-basic-security (#16767 ) Fixes #16766 Change log level from INFO to DEBUG when processing an empty user map during polling. An empty user map is a normal situation for some authenticators (e.g. LDAP) and polling is frequent (1 minute by default.)	2024-07-22 08:33:00 +05:30
Vadim Ogievetsky	72eeeec024	fix NPE in number formatting (#16760 )	2024-07-19 15:20:44 -07:00
Clint Wylie	a34a06e192	remove Firehose and FirehoseFactory (#16758 ) changes: * removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602 * Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader` * Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes * Moved anything remaining in a 'firehose' package somewhere else * Clean up docs on firehose stuff	2024-07-19 14:37:21 -07:00
Charles Smith	1881880714	[Docs] Adds a migration guide SQL compatible null handling (#16704 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-07-19 09:25:05 -07:00
Sébastien	e286be9427	Exposes hooks to customize the workbench-view (#16749 ) * Exposes hooks to customize the workbench-view * addressed PR feedback * naming * auto -> formatInteger(maxNum)	2024-07-19 08:53:34 -07:00
Kashif Faraz	b1edf4a5b4	Refactor: Clean up Overlord guice dependencies (#16752 ) Description: Overlord guice dependencies are currently a little difficult to plug into. This was encountered while working on a separate PR where a class needed to depend on `TaskMaster.getTaskQueue()` to query some task related info but this class itself needs to be a dependency of `TaskMaster` so that it can be registered to the leader lifecycle. The approach taken here is to simply decouple the leadership lifecycle of the overlord from manipulation or querying of its state. Changes: - No functional change - Add new class `DruidOverlord` to contain leadership logic after the model of `DruidCoordinator` - The new class `DruidOverlord` should not be a dependency of any class with the exception of REST endpoint `*Resource` classes. - All classes that need to listen to leadership changes must be a dependency of `DruidOverlord` so that they can be registered to the leadership lifecycle. - Move all querying logic from `OverlordResource` to `TaskQueryTool` so that other classes can leverage this logic too (required for follow up PR). - Update tests	2024-07-19 17:30:23 +05:30
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Vadim Ogievetsky	0a274d56a1	Web console: upgrade to Blueprint5 (#16756 ) * pre upgrade * did the upgrade * update snapshots * fix BP5 issues * update licenses * fix more deprecation warnings * use segmented control * update snapshots * convert to fake local time * preload icons before tests * update e2e tests * Update web-console/src/components/segment-timeline/segment-timeline.tsx Co-authored-by: John Gozde <john@gozde.ca> * Update web-console/src/components/segment-timeline/segment-timeline.tsx Co-authored-by: John Gozde <john@gozde.ca> * update e2e test selector * direct import date-fns --------- Co-authored-by: John Gozde <john@gozde.ca>	2024-07-18 20:47:44 -07:00
Edgar Melendrez	721a65046f	docs: add examples for SQL functions (#16745 ) * updating first batch of numeric functions * First batch of functions * addressing first few comments * alphabetize list * draft with suggestions applied * minor discrepancy expr -> <NUMERIC> * changed raises to calculates * Update docs/querying/sql-functions.md * switch to underscore * changed to exp(1) to match slack message * adding html text for trademark symbol to .spelling * fixed discrepancy between description and example --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 17:06:22 -07:00
Alberic Liu	0eaa810e89	Fix the maven warning during build (#16746 )	2024-07-18 14:56:15 +08:00
Akshat Jain	b53c26f5c5	Fix issues with partitioning boundaries for MSQ window functions (#16729 ) * Fix issues with partitioning boundaries for MSQ window functions * Address review comments * Address review comments * Add test for coverage check failure * Address review comment * Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest * Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java * Address review comments * Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory * Address review comment * Fix checkstyle --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 10:05:09 +08:00
Vadim Ogievetsky	44b3f8e588	Web console: fix a few console bugs (#16735 ) * remove __time from min max query shortcut * fix scrolling in retention rules dialog * actions menus should have titles * change term * correctly name sort/shuffle	2024-07-17 14:51:17 -07:00
Kashif Faraz	89066b72cf	Fix bug in TaskStorageQueryAdapter (#16750 ) Changes: - Do not hold a reference to `TaskQueue` in `TaskStorageQueryAdapter` - Use `TaskStorage` instead of `TaskStorageQueryAdapter` in `IndexerMetadataStorageAdapter` - Rename `TaskStorageQueryAdapter` to `TaskQueryTool` - Fix newly added task actions `RetrieveUpgradedFromSegmentIds` and `RetrieveUpgradedToSegmentIds` by removing `isAudited` method.	2024-07-17 23:17:41 +05:30
Sree Charan Manamala	40ef9fc4ec	Bug fix for array type selector causing array aggregation over window frame fail (#16653 )	2024-07-17 14:09:56 +02:00
Kashif Faraz	9f6ce6ddc0	Remove task action audit logging and druid_taskLog metadata table (#16309 ) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368. As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.	2024-07-17 17:09:00 +05:30
trompa	ebf216829d	#16717 defer provider instantiation in Kubernetes Module (#16726 ) * #16717 defer provider instantiation * add license header * fix style, ignore new class in jacoco as it is still initialization code --------- Co-authored-by: Alberto Lago Alvarado <albl@sitecore.net>	2024-07-16 13:05:28 -07:00
Kashif Faraz	01d67ae543	Allow CompactionSegmentIterator to have custom priority (#16737 ) Changes: - Break `NewestSegmentFirstIterator` into two parts - `DatasourceCompactibleSegmentIterator` - this contains all the code from `NewestSegmentFirstIterator` but now handles a single datasource and allows a priority to be specified - `PriorityBasedCompactionSegmentIterator` - contains separate iterator for each datasource and combines the results into a single queue to be used by a compaction search policy - Update `NewestSegmentFirstPolicy` to use the above new classes - Cleanup `CompactionStatistics` and `AutoCompactionSnapshot` - Cleanup `CompactSegments` - Remove unused methods from `Tasks` - Remove unneeded `TasksTest` - Move tests from `NewestSegmentFirstIteratorTest` to `CompactionStatusTest` and `DatasourceCompactibleSegmentIteratorTest`	2024-07-16 19:54:49 +05:30
Adithya Chakilam	6cf6838eb9	kubernetes-overlord-extension: Fix tasks not being shutdown (#16711 )	2024-07-15 14:35:11 -07:00
AmatyaAvadhanula	6891866c43	Process retrieval of parent and child segment ids in batches (#16734 )	2024-07-15 18:24:23 +05:30
Sree Charan Manamala	78a4a09d01	Window Function offset correction for RAC (#16718 ) * When an ArrayList RAC creates a child RAC, the start and end offsets need to have the offset of parent's start offset * Defaults the 2nd window bound to CURRENT ROW when only a single bound is specified * Removes the windowingStrictValidation warning and throws a hard exception when Order By alongside RANGE clause is not provided with UNBOUNDED or CURRENT ROW as both bounds	2024-07-15 12:43:27 +02:00
Rishabh Singh	64104533ac	Enable querying entirely cold datasources (#16676 ) Add ability to query entirely cold datasources.	2024-07-15 15:02:59 +05:30

14226 Commits