druid

Commit Graph

Author	SHA1	Message	Date
AmatyaAvadhanula	d6c760f7ce	Do not kill segments with referenced load specs from deep storage (#16667 )	2024-07-15 14:07:53 +05:30
Kashif Faraz	656667ee89	Tests: Add utility class TuningConfigBuilder to make IndexTask tests more readable and concise (#16732 ) Changes: - No functional change - Add class `TuningConfigBuilder` to build `IndexTuningConfig`, `CompactionTuningConfig` - Remove old class `ParallelIndexTestingFactory.TuningConfigBuilder` - Remove some unused fields and methods	2024-07-15 10:13:06 +05:30
Kashif Faraz	a618c5dd0d	Refactor: Miscellaneous batch task cleanup (#16730 ) Changes - No functional change - Remove unused method `IndexTuningConfig.withPartitionsSpec()` - Remove unused method `ParallelIndexTuningConfig.withPartitionsSpec()` - Remove redundant method `CompactTask.emitIngestionModeMetrics()` - Remove Clock argument from `CompactionTask.createDataSchemasForInterval()` as it was only needed for one test, which was just verifying the value passed by the test itself. The code now uses a `Stopwatch` instead, and the test simply verifies that the metric has been emitted. - Other minor cleanup changes	2024-07-13 08:12:51 +05:30
Laksh Singla	3a1b437056	Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679 ) Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes: a. We don't touch the subquery sequence till we know that we can materialize the result as frames	2024-07-12 21:49:12 +05:30
Vishesh Garg	197c54f673	Auto-Compaction using Multi-Stage Query Engine (#16291 ) Description: Compaction operations issued by the Coordinator currently run using the native query engine. As the majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative that we support compaction on MSQ to make compaction more robust and possibly faster. For instance, we have seen OOM errors in native compaction that MSQ could have handled by its auto-calculation of tuning parameters. This commit enables compaction on MSQ to remove the dependency on the native engine. Main changes: * `DataSourceCompactionConfig` now has an additional field `engine` that can be one of `[native, msq]`, with `native` being the default. * If the engine is MSQ, the `CompactSegments` duty assigns all available compaction task slots to the launched `CompactionTask` to ensure full capacity is available to MSQ. This avoids stalling, which could happen if only a fraction of the task slots were allotted and they eventually fell short of the number of tasks required by the MSQ engine to run the compaction. * `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field. * `CompactionTask` now has a `CompactionRunner` interface instance, with implementations `NativeCompactionRunner` and `MSQCompactionRunner` (the latter in the `druid-multi-stage-query` extension). The `ObjectMapper` deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the `CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. * `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks. * `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in the segment schema. If present, the task is created with a native group-by query; if not, the task is issued with a scan query. The `storeCompactionState` flag is set in the context. * Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask` since otherwise, the workers will be unable to determine the controller task's location for communication (as they haven't been launched via the overlord).	2024-07-12 16:40:20 +05:30
Sree Charan Manamala	eb981d855f	Correct aggregators violating names (#16615 ) For a few aggregators, for example BloomSqlAggregator, BaseVarianceSqlAggregator, etc., the aggName is updated from a0 to a0:agg, breaching the contract that the aggName is the name that is passed in. This causes a mismatch when creating a column accessor. This commit corrects those violating SQL aggregators.	2024-07-12 09:18:09 +02:00
Clint Wylie	dca31d466c	minor adjustments for performance (#16714 ) changes: * stop using String.format in some places * switch some streams to classic loops	2024-07-11 16:57:15 -07:00
Vadim Ogievetsky	307b8849de	Web console: better sql data loader reset (#16696 ) * better sql data loader reset * snapshot * fix destination pane sizing * clean doc links * update doc links * more doc links * extract getClusterCapacity * update snapshots * allow submit suspended * some renaming * diff with current * Do delta	2024-07-11 14:45:04 -07:00
Clint Wylie	b3c238457f	fix unnest bugs (#16723 ) changes: * fixes a bug with the unnest storage adapter not preserving the underlying column's dictionary uniqueness when allowing a dimension selector cursor * fixes a bug with unnest on realtime segments with empty rows incorrectly specifying index 0 as the row dictionary value	2024-07-11 13:48:15 -07:00
Sree Charan Manamala	760d70312f	Window Drill tests coverage improvement (#16722 )	2024-07-11 19:11:36 +05:30
Clint Wylie	d6c07270a5	fix issues with join filter pushdown and virtual column resolution (#16702 )	2024-07-11 04:26:07 -07:00
YongGang	4b293fc2a9	Docs: Fix k8s dynamic config URL (#16720 )	2024-07-11 10:05:47 +05:30
Kashif Faraz	616ae631c6	Fix NPE in CompactSegments (#16713 )	2024-07-10 14:51:52 +08:00
Adarsh Sanjeev	7c625356c5	Add logging for sketches on workers (#16697 ) Improve the logging of sketches on workers.	2024-07-09 14:37:43 +05:30
Adarsh Sanjeev	af5399cd9d	Fixes a bug when running queries with a limit clause (#16643 ) Add a shuffle based on the resultShuffleSpecFactory after a limit processor, depending on the query destination. LimitFrameProcessors currently do not update the partition boosting column, so we also add the boost column to the previous stage, if one is required.	2024-07-09 14:29:12 +05:30
Zoltan Haindrich	a9bd0eea2a	Fix queries filtering for the same condition with both an IN and EQUALS to not return empty results (#16597 ) Temporary fix until CALCITE-6435 is fixed (released and upgraded to): added a custom rule (FixIncorrectInExpansionTypes) to fix up the types of the affected literals, and added a test case which will alert on upgrade.	2024-07-09 12:28:21 +05:30
Clint Wylie	09e0eefdc3	modify equality and typed in filter behavior for numeric match values on string columns (#16593 ) * fix equality and typed in filter behavior for numeric match values on string columns changes: * EqualityFilter and TypedInFilter numeric match values against string columns will now cast strings to numeric values instead of converting the numeric values directly to string for pure string equality, which is consistent with the casts which are eaten in the SQL layer, as well as classic Druid behavior * added tests to cover numeric equality matching. Double match values in particular would fail to match the string values since `1.0` would become `'1.0'`, which does not match `'1'`.	2024-07-08 10:58:05 -07:00
Kashif Faraz	7c6f2b1e20	Minor log cleanup in K8sDruidNodeDiscoveryProvider (#16701 )	2024-07-08 18:32:39 +05:30
Abhishek Radhakrishnan	bf2be938a9	Refactor `SegmentLoadDropHandler` code (#16685 ) Motivation: - Improve code hygiene - Make `SegmentLoadDropHandler` easily extensible Changes: - Add `SegmentBootstrapper` - Move code for bootstrapping segments already cached on disk and fetched from coordinator to `SegmentBootstrapper`. - No functional change - Use separate executor service in `SegmentBootstrapper` - Bind `SegmentBootstrapper` to `ManageLifecycle` explicitly in `CliBroker`, `CliHistorical` etc.	2024-07-08 09:29:55 +05:30
Alberic Liu	c6c2652c89	unified the code format in NestedDataOperatorConversions (#16695 )	2024-07-08 10:06:24 +08:00
Lars Francke	586c713d12	Updates build documentation to not mention explicit Java version as it was out of sync with the dedicated Java page. (#16674 ) This means there is one less place to keep information in sync.	2024-07-03 20:53:15 +05:30
Virushade	f290cf083a	Update examples/bin/dsql scripts to accept Python 3 (#16677 ) * Update examples/bin/dsql scripts to accept Python 3 Remove redundant urllib import Translating to Python3: Changing xrange to range Translating to Python3: Changing long to int Translating to Python3: Change urllib2 methods, and fix encoding/decoding issues Remove unnecessary import Add option for Python2 Rename files * Update examples/bin/dsql Co-authored-by: Benedict Jin <asdf2014@apache.org> * Resolve PR comments Add comment in files indicating updates need to be made in both places Update examples/bin/dsql Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update error output when using Python 2. Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-07-03 15:52:57 +08:00
Kashif Faraz	6c87b1637b	Revert "Downgrade the version of Apache Curator from 5.5.0 to 5.3.0 to avoid a bug in the new version (#16425 )" (#16688 ) This reverts commit `cb7c2c1e37`.	2024-07-03 11:18:50 +05:30
Abhishek Radhakrishnan	35b970935f	Better error handling when retrieving Avro schemas from registry (#16684 ) * Handle RestClientException separately, instead of returning a generic error. - Add tests - Clean up the tests; remove the legacy expected exception pattern - Better test assertions * Rename tests * checkstyle fixes	2024-07-02 16:48:34 -07:00
317brian	d65e015c94	docs: nit for link format (#16687 )	2024-07-02 16:45:09 -07:00
Victoria Lim	adde024e11	docs: Subtitle updates in migration guide overview (#16683 )	2024-07-02 12:56:05 -07:00
zachjsh	5e05858ff7	Catalog granularity accepts query format (#16680 ) Previously, the segment granularity for tables in the catalog had to be defined in period format, i.e. `'PT1H'`, `'P1D'`, etc. This prevented a user from defining a segment granularity of `'ALL'` for a table in the catalog, which may be a valid use case. This change allows a user to define the segment granularity of a table in the catalog as any string that results in a valid granularity using either the `Granularity.fromString(str)` method or `new PeriodGranularity(new Period(value), null, null)`, provided that the granularity maps to a standard supported granularity, where `GranularityType.isStandard(granularity)` returns true. As a result, a user who wants to set a catalog table's segment granularity to hourly may set the table's segment granularity property to either `PT1H` or `HOUR`. These are the same formats accepted at query time.	2024-07-02 12:14:28 -04:00
Jill Osborne	bd49ecfd29	Addition to subquery limit migration guide (#16671 ) Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-07-01 14:22:47 -07:00
Akshat Jain	34c80ee3de	Add MSQ engine support for window function drill tests (#16665 ) * Add MSQ engine support for window function drill tests * Address review comments * Revert formatting changes in TestDataBuilder	2024-06-28 11:14:17 +05:30
Rishabh Singh	c96e783750	Fix schema backfill count metric (#16536 ) * Fix build * Fix backfill metric * Address review comment	2024-06-28 11:07:28 +05:30
Rishabh Singh	b9c7664ac3	Fix empty datasource schema on the Broker when metadata query is disabled (#16645 ) * Fix build * Fix empty datasource schema on the broker * review comment * Remove unused import	2024-06-28 11:06:56 +05:30
Clint Wylie	45c020060c	better javadoc for ColumnIndexSupplier (#16663 ) Updated javadoc for `ColumnIndexSupplier.as` to elaborate on the types of indexes callers might want to ask for from the method, as well as help implementors know what kinds of indexes they should implement to participate in filtering	2024-06-27 17:53:20 -07:00
Clint Wylie	d86f25c74a	fix vector grouping expression deferred evaluation to only consider dictionary encoded strings as fixed width (#16666 )	2024-06-27 16:19:16 -07:00
317brian	4401c9d138	docs: add redirect for kafka lookups (#16668 )	2024-06-27 10:56:51 -07:00
Rishabh Singh	f51c7b346f	Add druid parquet extensions to example quickstarts (#16664 ) This change adds druid-parquet-extensions to all example quickstarts	2024-06-27 14:41:58 +05:30
Hugh Evans	920d9020c0	Docs: Fix default value for globalIngestionHeapLimitBytes (#16654 ) Use the new default value added in #8255	2024-06-27 07:01:56 +05:30
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Clint Wylie	d4f2636325	fix greatest/least function non-vectorized processing to ignore null argument types (#16649 )	2024-06-26 12:59:42 -07:00
Andreas Maechler	ab76d851ad	Update docs contribution with correct script (#16581 ) * Spacing * Fix ordering * npm run start	2024-06-26 10:30:52 -07:00
Abhishek Radhakrishnan	82117e8101	Add MSQ query context `maxNumSegments` (#16637 ) * Add MSQ query context maxNumSegments. - Default is MAX_INT (unbounded). - When set, if a time chunk contains more segments than the value set in the query context, the MSQ task will fail with a TooManySegments fault. * Fixup hashCode(). * Rename and checkpoint. * Add some insert and replace happy and sad path tests. * Update error msg. * Commentary * Adjust the default to be null (meaning no max bound on number of segments). Also fix formatter. * Fix CodeQL warnings and minor cleanup. * Assert on maxNumSegments tuning config. * Minor test cleanup. * Use null default for the MultiStageQueryContext as well * Review feedback * Review feedback * Move logic to common function getPartitionsByBucket shared by INSERT and REPLACE. * Rename to validateNumSegmentsPerBucketOrThrow() for consistency. * Add segmentGranularity to error message.	2024-06-26 09:29:51 -07:00
Rahul Bansal	b772277d3b	Update intellij-setup.md (#16655 ) fix typos	2024-06-26 17:38:37 +05:30
Laksh Singla	71b3b5ab5d	Add query context parameter to remove null bytes when writing frames (#16579 ) MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.	2024-06-26 15:00:30 +05:30
Kashif Faraz	d9bd02256a	Refactor: Rename UsedSegmentChecker and cleanup task actions (#16644 ) Changes: - Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever` - Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction` as it is now unused and has been deprecated since #1988 - Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`	2024-06-26 10:48:59 +05:30
Tom	52c9929019	Column name in parse exceptions (#16529 ) * first pass * more changes * fix tests and formatting * fix kinesis failing tests * fix kafka tests * add dimension name to float parse errors * double and convertToType handling of dimensionName can report parse errors with dimension name * fix checkstyle issue * fix tests * more cases to have better parse exception messages * fix test * fix tests * partially address comments * annotate method parameter with nullable * address comments * fix tests * let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils * fix compilation error and clean up formatting * clean up whitespace * address feedback. undo change, pass down report parse exception for convertToType * fix test	2024-06-25 13:42:52 -07:00
Abhishek Radhakrishnan	e01f155209	Add missing `delta-storage` dependency and class loader workaround to Delta table ingestion (#16648 ) * Workaround for ingesting from Delta table in 3.2.0. With the upgrade to Kernel 3.2.0, the Druid Delta connector extension isn't able to ingest from Delta tables successfully. Please see https://github.com/delta-io/delta/issues/3299 The underlying problem seems to be coming from https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99 This patch works around the problem by setting the thread class loader explicitly. The Kernel community may consider a fix in the next release, as the issue has affected another connector as well. * Address review comment: clear the CL after the Thread CL is set.	2024-06-25 09:16:13 -07:00
Edgar Melendrez	b43f4063c5	Docs: update link and title of quickstart (#16638 ) * update link and title * Discard changes to website/package.json * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-06-25 09:07:00 -07:00
Abhishek Radhakrishnan	2979f73e89	Fix Intellij inspection (#16651 )	2024-06-25 04:32:43 -07:00
Kashif Faraz	f1043d20bc	Support csv input format in Kafka ingestion with header (#16630 ) * Support ListBasedInputRow in Kafka ingestion with header * Fix up buildBlendedEventMap * Add new test for KafkaInputFormat with csv value and headers * Do not use forbidden APIs * Move utility method to TestUtils	2024-06-25 11:50:01 +05:30
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se, just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates the migration of Druid to the ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor, as far as I am aware, were they used for any purpose. This patch removes both task types completely, as well as all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to state more firmly that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
317brian	2131917f16	docs: added front-coded dictionaries to upgrade notes (#16647 ) * docs: add front-coded dictionaries to upgrade notes * add it to release notes template	2024-06-24 10:52:26 -07:00
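The numeric-match change in #16593 can be illustrated with a minimal, standalone Java sketch. It does not use Druid's `EqualityFilter` or `TypedInFilter`; the class and method names below are hypothetical and only contrast the two comparison strategies the commit message describes: converting the numeric match value to a string versus casting the string column value to a number. Treating null or non-numeric strings as non-matching is an assumption of the sketch.

```java
// Hypothetical illustration, not Druid code: contrasts string-based and
// numeric comparison of a numeric match value against a string column value.
public class NumericMatchOnStringColumn {

  // Prior behavior (as described in #16593): the numeric match value is
  // converted to a string and compared with the column value as text.
  static boolean matchesAsString(String columnValue, double matchValue) {
    return String.valueOf(matchValue).equals(columnValue);
  }

  // New behavior (as described in #16593): the string column value is cast to
  // a number and compared numerically. In this sketch, null or non-numeric
  // strings simply do not match.
  static boolean matchesAsNumber(String columnValue, double matchValue) {
    if (columnValue == null) {
      return false;
    }
    try {
      return Double.parseDouble(columnValue) == matchValue;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // String.valueOf(1.0) is "1.0", which is not equal to "1".
    System.out.println(matchesAsString("1", 1.0));   // false
    System.out.println(matchesAsNumber("1", 1.0));   // true
    System.out.println(matchesAsNumber("abc", 1.0)); // false
  }
}
```

Under the string-based strategy a stored value of `'1'` never matches a match value of `1.0`, which is the failure the commit message calls out; the numeric strategy matches it.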
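The `maxNumSegments` query context parameter added in #16637 fails the MSQ task when a time chunk would contain more segments than the configured value, with `null` meaning no bound. A minimal Java sketch of that check follows; the class name `SegmentCountCheck`, the `validate` method, the map of per-time-chunk counts, and the use of `IllegalStateException` are all hypothetical stand-ins, not the actual `validateNumSegmentsPerBucketOrThrow()` implementation or the `TooManySegments` fault.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a per-time-chunk segment-count limit (not MSQ code).
public class SegmentCountCheck {

  /**
   * Throws if any time chunk would contain more segments than maxNumSegments.
   * A null limit means there is no upper bound.
   */
  static void validate(Map<String, Integer> segmentsPerTimeChunk, Integer maxNumSegments) {
    if (maxNumSegments == null) {
      return; // unbounded: nothing to check
    }
    for (Map.Entry<String, Integer> entry : segmentsPerTimeChunk.entrySet()) {
      if (entry.getValue() > maxNumSegments) {
        throw new IllegalStateException(
            "Time chunk " + entry.getKey() + " would have " + entry.getValue()
                + " segments, exceeding maxNumSegments=" + maxNumSegments);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    counts.put("2024-06-01/2024-06-02", 3);
    counts.put("2024-06-02/2024-06-03", 5);

    validate(counts, null); // passes: no bound
    validate(counts, 5);    // passes: 5 is not more than 5
    try {
      validate(counts, 4);  // fails on the second time chunk
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Reporting the offending time chunk in the message mirrors the commit's note about adding `segmentGranularity` to the error message, so users can see which bucket exceeded the limit.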
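The `removeNullBytes` context parameter from #16579 sanitizes string fields by removing null bytes (character code 0), replacing the earlier workaround of calling the SQL `REPLACE` function. A minimal Java sketch of the sanitization step, assuming a plain `String` input; the class and method names are hypothetical and this is not the frame-writing code.

```java
// Hypothetical sketch: strips NUL characters (code 0) from a string value,
// similar in spirit to what the removeNullBytes context parameter describes.
public class NullByteSanitizer {

  static String removeNullBytes(String value) {
    if (value == null || value.indexOf('\0') < 0) {
      return value; // nothing to strip; avoid allocating a new string
    }
    return value.replace("\0", "");
  }

  public static void main(String[] args) {
    String raw = "abc\0def\0";
    System.out.println(removeNullBytes(raw));          // abcdef
    System.out.println(removeNullBytes(raw).length()); // 6
  }
}
```

Checking `indexOf` first keeps the common case (no null bytes) allocation-free, which matters when the check runs on every string field.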

14175 Commits