druid

Commit Graph

Author	SHA1	Message	Date
Edgar Melendrez	83cf4dc554	[docs] fixes to sql-scalar.md (#16826 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-08-06 17:12:57 -07:00
zachjsh	c324f09108	Kinesis input format docs (#16840) * SQL syntax error should target USER persona * revert change to queryHandler and related tests, based on review comments * add test * Docs for Kinesis input format * remove reference to kafka * fix spellcheck error * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-08-06 18:53:10 -04:00
Gian Merlino	eaa09937bc	SuperSorter: direct merging, increased parallelism. (#16775 ) Two performance enhancements: 1) Direct merging of input frames to output channels, without any temporary files, if all input frames fit in memory. 2) When doing multi-level merging (now called "external mode"), improve parallelism by boosting up the number of mergers in the penultimate level. To support direct merging, FrameChannelMerger is enhanced such that the output partition min/max values are used to filter input frames. This is necessary because all direct mergers read all input frames, but only rows corresponding to a single output partition.	2024-08-06 15:00:39 -07:00
Edgar Melendrez	ebea34a814	[Docs] Batch06: starting string functions (#16838) * batch06, starting string functions * adding space after Syntax * quick change * correcting spelling * Update docs/querying/sql-functions.md * Update sql-functions.md * applying suggestions * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-08-06 11:32:26 -07:00
Adarsh Sanjeev	739068469c	General Druid refactors (#16708) Some general refactors across Druid: switch to DruidExceptions; add javadocs; fix a bug in IntArrayColumns; add a class for LongArrayColumns; remove wireTransferable since it would never be called; refactor DictionaryWriter to return the index written as a return value from write.	2024-08-06 11:47:08 -05:00
Adarsh Sanjeev	2b81c18fd7	Refactor SemanticCreator (#16700 ) Refactors the SemanticCreator annotation. Moves the interface to the semantic package. Create a SemanticUtils to hold logic for storing semantic maps. Add FrameMaker interface.	2024-08-06 11:29:38 -05:00
Vishesh Garg	593c3b2150	Do not support non-idempotent aggregator in MSQ compaction (#16846) This PR adds checks for verification of DataSourceCompactionConfig and CompactionTask with msq engine to ensure that each aggregator in metricsSpec is idempotent and that metricsSpec is non-null when rollup is set to true. Unit tests and existing compaction ITs have been updated accordingly.	2024-08-06 20:58:08 +05:30
Kashif Faraz	aa49be61ea	Do not create ZK paths if not needed (#16816 ) Background: ZK-based segment loading has been completely disabled in #15705 . ZK `servedSegmentsPath` has been deprecated since Druid 0.7.1, #1182 . This legacy path has been replaced by the `liveSegmentsPath` and is not used in the code anymore. Changes: - Never create ZK loadQueuePath as it is never used. - Never create ZK servedSegmentsPath as it is never used. - Do not create ZK liveSegmentsPath if announcement on ZK is disabled - Fix up tests	2024-08-06 19:29:13 +05:30
Gian Merlino	de40d81b29	SQL: Add ProjectableFilterableTable to SegmentsTable. (#16841 ) * SQL: Add ProjectableFilterableTable to SegmentsTable. This allows us to skip serialization of expensive fields such as shard_spec, dimensions, metrics, and last_compaction_state, if those fields are not actually being queried. * Restructure logic to avoid unnecessary toString() as well.	2024-08-06 06:40:21 -07:00
Akshat Jain	c3aa033e14	MSQ window functions: Fix query correctness issues when using multiple workers (#16804) This PR fixes query correctness issues for MSQ window functions when using more than 1 worker (that is, maxNumTasks > 2). Previously, the shuffle spec of the previous stage was kept when the window stage had no partition columns. This PR changes it to override the shuffle spec of the previous stage to MixShuffleSpec (if we have a window function with empty over clause) so that the window stage gets a single partition to work on. A test has been added for a query which returned incorrect results prior to this change when using more than 1 worker.	2024-08-06 16:11:18 +05:30
Sree Charan Manamala	ed6b547481	Handle default bounds correctly in WINDOW clause (#16833) When a window is defined as WINDOW W AS <DEF> using the syntax (PARTITION BY col1 ORDER BY col2 ROWS x PRECEDING), the other bound needs to default to CURRENT ROW. This was already implemented earlier, but when the window is defined as WINDOW W AS <DEF>, Calcite takes a different route to validate the window.	2024-08-06 09:58:44 +02:00
Vadim Ogievetsky	aeace28ccb	Web console: Add columnMapping information to the Explain dialog (#16598 ) * Add columnMapping information in the Explain dialog * use arrow char	2024-08-05 13:21:51 -07:00
Alberic Liu	461727de12	Fix Druid Console cannot open submit supervisor dialog (#16736 )	2024-08-05 10:44:11 -07:00
Zoltan Haindrich	26e3c44f4b	Quidem record (#16624) * enables launching a fake broker based on test resources (druidtest uri) * can record queries into new test files during usage * instead of repurposing Calcite's Hook, migrates to DruidHook, to which further keys can be added * added a quidem-ut module, which could be the place for tests that interact with modules, etc.	2024-08-05 14:58:32 +02:00
Akshat Jain	08f9ec1cae	Memoize the redundant calls to overlord in sql statements endpoint (#16839 )	2024-08-05 16:52:56 +05:30
Rushikesh Bankar	c8323d1a7c	Add indexer task success and failure metrics (#16829) This PR adds indexer-level task metrics: "indexer/task/failed/count" and "indexer/task/success/count". The current "worker/task/completed/count" metric shows all completed tasks irrespective of success or failure status, so these metrics give more visibility into the status of completed tasks.	2024-08-05 16:21:27 +05:30
Laksh Singla	c84e689eb8	Don't use ComplexMetricExtractor to fetch the class of the object in field readers (#16825 ) This patch fixes queries like `SELECT COUNT(DISTINCT json_col) FROM foo`	2024-08-05 14:13:56 +05:30
Laksh Singla	0411c4e67e	Add metrics for number of rows/bytes materialized while running subqueries (#16835 ) subquery/rows and subquery/bytes metrics have been added, which indicate the size of the results materialized on the heap.	2024-08-05 14:13:20 +05:30
Sree Charan Manamala	c7eacd079e	fallback SQL IN filter to expression filter when VirtualColumnRegistry is null (#16836 )	2024-08-05 11:27:51 +05:30
Abhishek Radhakrishnan	31b43753fb	Add `druid.indexing.formats.stringMultiValueHandlingMode` system config (#16822 ) This patch introduces an optional cluster configuration, druid.indexing.formats.stringMultiValueHandlingMode, allowing operators to override the default mode SORTED_SET for string dimensions. The possible values for the config are SORTED_SET, SORTED_ARRAY, or ARRAY (SORTED_SET is the default). Case insensitive values are allowed. While this cluster property allows users to manage the multi-value handling mode for string dimension types, it's recommended to migrate to using real array types instead of MVDs. This fixes a long-standing issue where compaction will honor the configured cluster wide property instead of rewriting it as the default SORTED_ARRAY always, even if the data was originally ingested with ARRAY or SORTED_SET.	2024-08-03 10:23:44 -07:00
Kashif Faraz	9dc2569f22	Track and emit segment loading rate for HttpLoadQueuePeon on Coordinator (#16691 ) Design: The loading rate is computed as a moving average of at least the last 10 GiB of successful segment loads. To account for multiple loading threads on a server, we use the concept of a batch to track load times. A batch is a set of segments added by the coordinator to the load queue of a server in one go. Computation: batchDurationMillis = t(load queue becomes empty) - t(first load request in batch is sent to server) batchBytes = total bytes successfully loaded in batch avg loading rate in batch (kbps) = (8 * batchBytes) / batchDurationMillis overall avg loading rate (kbps) = (8 * sumOverWindow(batchBytes)) / sumOverWindow(batchDurationMillis) Changes: - Add `LoadingRateTracker` which computes a moving average load rate based on the last few GBs of successful segment loads. - Emit metric `segment/loading/rateKbps` from the Coordinator. In the future, we may also consider emitting this metric from the historicals themselves. - Add `expectedLoadTimeMillis` to response of API `/druid/coordinator/v1/loadQueue?simple`	2024-08-03 13:14:21 +05:30
Abhishek Radhakrishnan	fe6772a101	Rename test builder `MSQTester.setExpectedSegment` (#16837 ) * Rename setExpectedSegment to setExpectedSegments in MSQTestBase. * Add expected segments for max num segments test cases.	2024-08-02 10:01:55 -07:00
zachjsh	9b731e8f0a	Kinesis Input Format for timestamp, and payload parsing (#16813) * SQL syntax error should target USER persona * revert change to queryHandler and related tests, based on review comments * add test * Introduce KinesisRecordEntity to support Kinesis headers in InputFormats * add kinesisInputFormat and Reader, and tests * bind KinesisInputFormat class to module * improve test coverage * remove references to kafka * resolve review comments * remove comment * fix grammar of comment * fix comment again * fix comment again * more review comments * add partitionKey * add check for same timestamp and partitionKey column name * fix intellij inspection	2024-08-02 08:48:44 -04:00
Akshat Jain	63ba5a4113	Fix issues with fetching task reports in SQL statements endpoint for middlemanager (#16832 )	2024-08-01 23:37:15 -04:00
Vadim Ogievetsky	8c170f7d0e	Web console: use stages, counters, and warnings from the new detailed status API (#16809) * stages and counters can be seen on the status response * warnings are exposed also * mark as msq when attached * update snapshots * download CSV/TSV null as empty cell	2024-08-01 02:30:30 -07:00
Akshat Jain	bb4d6cc001	Add task report fields in response of SQL statements endpoint (#16808 ) If the optional query parameter detail is supplied, then the response also includes the following: * A stages object that summarizes information about the different stages being used for query execution, such as stage number, phase, start time, duration, input and output information, processing methods, and partitioning. * A counters object that provides details on the rows, bytes, and files processed at various stages for each worker across different channels, along with sort progress. * A warnings object that provides details about any warnings.	2024-08-01 10:26:04 +05:30
Gian Merlino	01f6cfcbf5	MSQ worker: Support in-memory shuffles. (#16790) * MSQ worker: Support in-memory shuffles. This patch is a follow-up to #16168, adding worker-side support for in-memory shuffles. Changes include: 1) Worker-side code now respects the same context parameter "maxConcurrentStages" that was added to the controller in #16168. The parameter remains undocumented for now, to give us a chance to more fully develop and test this functionality. 2) WorkerImpl is broken up into WorkerImpl, RunWorkOrder, and RunWorkOrderListener to improve readability. 3) WorkerImpl has a new StageOutputHolder + StageOutputReader concept, which abstracts over memory-based or file-based stage results. 4) RunWorkOrder is updated to create in-memory stage output channels when instructed to. 5) ControllerResource is updated to add /doneReadingInput/, so the controller can tell when workers that sort, but do not gather statistics, are done reading their inputs. 6) WorkerMemoryParameters is updated to consider maxConcurrentStages. Additionally, WorkerChatHandler is split into WorkerResource, so as to match ControllerChatHandler and ControllerResource. * Updates for static checks, test coverage. * Fixes. * Remove exception. * Changes from review. * Address static check. * Changes from review. * Improvements to docs and method names. * Update comments, add test. * Additional javadocs. * Fix throws. * Fix worker stopping in tests. * Fix stuck test.	2024-07-30 18:41:24 -07:00
Edgar Melendrez	3bb6d40285	[docs] batch 5 updating functions (#16812 ) * batch 5 * Update docs/querying/sql-functions.md * applying suggestions --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-30 17:30:01 -07:00
Edgar Melendrez	85a8a1d805	[Docs]Batch04 - Bitwise numeric functions (#16805 ) * Batch04 - Bitwise numeric functions * Batch04 - Bitwise numeric functions * minor fixes * rewording bitwise_shift functions * rewording bitwise_shift functions * Update docs/querying/sql-functions.md * applying suggestions --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-30 10:53:59 -07:00
Kashif Faraz	954aaafe0c	Refactor: Clean up compaction config classes (#16810 ) Changes: - Rename `CoordinatorCompactionConfig` to `DruidCompactionConfig` - Rename `CompactionConfigUpdateRequest` to `ClusterCompactionConfig` - Refactor methods in `DruidCompactionConfig` - Clean up `DataSourceCompactionConfigHistory` and its tests - Clean up tests and add new tests - Change API path `/druid/coordinator/v1/config/global` to `/druid/coordinator/v1/config/cluster`	2024-07-30 12:17:25 +05:30
AmatyaAvadhanula	92a40d8169	Add API to fetch conflicting task locks (#16799 ) * Add API to fetch conflicting active locks	2024-07-30 11:40:48 +05:30
Vishesh Garg	e9ea243d97	Enable compaction ITs on MSQ engine (#16778 ) Follow-up to #16291, this commit enables a subset of existing native compaction ITs on the MSQ engine. In the process, the following changes have been introduced in the MSQ compaction flow: - Populate `metricsSpec` in `CompactionState` from `querySpec` in `MSQControllerTask` instead of `dataSchema` - Add check for pre-rolled-up segments having `AggregatorFactory` with different input and output column names - Fix passing missing cluster-by clause in scan queries - Add annotation of `CompactionState` to tombstone segments	2024-07-30 09:34:46 +05:30
Zoltan Haindrich	c7cde31a89	HAVING clauses may not contain window functions (#16742 ) Rejects having clauses if they contain windowed expressions. Also added a check to produce a more descriptive error if an OVER expression reaches the filter translation layer. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-29 04:11:36 -04:00
dependabot[bot]	f5527dc3e7	Bump io.grpc:grpc-netty-shaded from 1.57.2 to 1.65.1 (#16731 ) Bumps [io.grpc:grpc-netty-shaded](https://github.com/grpc/grpc-java) from 1.57.2 to 1.65.1. - [Release notes](https://github.com/grpc/grpc-java/releases) - [Commits](https://github.com/grpc/grpc-java/compare/v1.57.2...v1.65.1) --- updated-dependencies: - dependency-name: io.grpc:grpc-netty-shaded dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-07-29 14:51:39 +08:00
dependabot[bot]	cbca0dc969	Bump jclouds.version from 2.5.0 to 2.6.0 (#16796 ) Bumps `jclouds.version` from 2.5.0 to 2.6.0. Updates `org.apache.jclouds:jclouds-core` from 2.5.0 to 2.6.0 Updates `org.apache.jclouds.api:openstack-swift` from 2.5.0 to 2.6.0 Updates `org.apache.jclouds.driver:jclouds-slf4j` from 2.5.0 to 2.6.0 Updates `org.apache.jclouds.api:openstack-keystone` from 2.5.0 to 2.6.0 Updates `org.apache.jclouds.api:rackspace-cloudfiles` from 2.5.0 to 2.6.0 Updates `org.apache.jclouds.provider:rackspace-cloudfiles-us` from 2.5.0 to 2.6.0 Updates `org.apache.jclouds.provider:rackspace-cloudfiles-uk` from 2.5.0 to 2.6.0 --- updated-dependencies: - dependency-name: org.apache.jclouds:jclouds-core dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.jclouds.api:openstack-swift dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.jclouds.driver:jclouds-slf4j dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.jclouds.api:openstack-keystone dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.jclouds.api:rackspace-cloudfiles dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.jclouds.provider:rackspace-cloudfiles-us dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.jclouds.provider:rackspace-cloudfiles-uk dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-07-29 14:49:26 +08:00
Kashif Faraz	caedeb66cd	Add API to update compaction engine (#16803 ) Changes: - Add API `/druid/coordinator/v1/config/compaction/global` to update cluster level compaction config - Add class `CompactionConfigUpdateRequest` - Fix bug in `CoordinatorCompactionConfig` which caused compaction engine to not be persisted. Use json field name `engine` instead of `compactionEngine` because JSON field names must align with the getter name. - Update MSQ validation error messages - Complete overhaul of `CoordinatorCompactionConfigResourceTest` to remove unnecessary mocking and add more meaningful tests. - Add `TuningConfigBuilder` to easily build tuning configs for tests. - Add `DatasourceCompactionConfigBuilder`	2024-07-27 09:14:51 +05:30
Edgar Melendrez	c07aeedbec	[docs] Updating Rollup tutorial (#16762 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-07-26 15:43:31 -07:00
Edgar Melendrez	028ee23a1e	[Docs] batch 03 - trig functions (#16795 ) * batch 03 - trig functions * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * applying suggestions and corrections --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-07-26 13:11:17 -07:00
Charles Smith	ed48cb82e9	[Docs] Remove avro_ocf support from Kafka & Kinesis streaming sources (Revert changes from #11865) (#16807)	2024-07-26 13:06:22 -07:00
Abhishek Radhakrishnan	3c493dc3ed	CircularList round-robin iterator for the KillUnusedSegments duty (#16719) * Round-robin iterator for datasources to kill. Currently there's a fairness problem in the KillUnusedSegments duty where the duty consistently selects the same set of datasources as discovered from the metadata store or dynamic config params. This is a problem especially when there are multiple unused. In a medium to large cluster, while we can increase the task slots to increase the likelihood of broader coverage. This patch adds a simple round-robin iterator to select datasources and has the following properties: 1. Starts with an initial random cursor position in an ordered list of candidates. 2. Consecutive {@code next()} iterations from {@link #getIterator()} are guaranteed to be deterministic unless the set of candidates change when {@link #updateCandidates(Set)} is called. 3. Guarantees that no duplicate candidates are returned in two consecutive {@code next()} iterations. * Renames in RoundRobinIteratorTest. * Address review comments. 1. Clarify javadocs on the ordered list. Also flesh out the details a bit more. 2. Rename the test hooks to make intent clearer and fix typo. 3. Add NotThreadSafe annotation. 4. Remove one potentially noisy log that's in the path of iteration. * Add null check to input candidates. * More commentary. * Address review feedback: downgrade some new info logs to debug; invert condition. Remove redundant comments. Remove redundant variable tracking. * CircularList adjustments. * Updates to CircularList and cleanup RoundRobinIterator. * One more case and add more tests. * Make advanceCursor private for now. * Review comments.	2024-07-26 12:20:49 -07:00
Sree Charan Manamala	9b76d13ff8	Check for Aggregation inside a window clause when syntax used as - WINDOW W AS DEF (#16801 )	2024-07-26 11:18:35 +02:00
Laksh Singla	725d442355	Faster dimension deserialization on the brokers (#16740 ) Speedier dimension deserialization on the brokers.	2024-07-26 14:36:11 +05:30
Clint Wylie	71725b41b5	ignore dependencies for github stale action (#16797 )	2024-07-25 10:32:43 -07:00
Gian Merlino	b2a88da200	Attempt to coerce COMPLEX to number in numeric aggregators. (#16564 ) * Coerce COMPLEX to number in numeric aggregators. PR #15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex. * Fix tests. * Remove the special ColumnValueSelector. * Add test.	2024-07-25 08:45:29 -07:00
Rohan Garg	b5f117bca2	Check for tombstones in wrapping storage adapters (#16791 )	2024-07-25 06:55:40 -04:00
Clint Wylie	14954c7eb9	serialize legacy as false for scan query for rolling downgrade/upgrade (#16793 ) Fixes rolling downgrades/upgrades after #16659 by hard coding scan query "legacy":false since it is a required property during deserialization.	2024-07-25 14:51:58 +05:30
Gian Merlino	c1875e7c1d	HashJoinEngine: Check for interruptions while walking left cursor. (#16773 ) * HashJoinEngine: Check for interruptions while walking left cursor. Previously, the engine only checked for interruptions between emitting joined rows. In scenarios where large numbers of left rows are skipped completely (such as a highly selective INNER JOIN) this led to the join cursor being insufficiently responsive to cancellation. * Coverage.	2024-07-25 15:10:50 +08:00
Clint Wylie	5da69a01cb	change arrayIngestMode default to array (#16789 ) * change arrayIngestMode default to array * remove arrayIngestMode flag option none * fix space * fix test	2024-07-25 15:09:40 +08:00
Zoltan Haindrich	7e3fab5bf9	Make WindowFrames more specific (#16741 ) Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods	2024-07-25 04:57:36 +02:00
Edgar Melendrez	ca787885c9	[docs] batch02 of updating functions (#16761 ) * applying changes * ensuring batch is updated * Update docs/querying/sql-functions.md * raise -> raises * addressing review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-07-24 15:28:57 -07:00
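
The loading-rate computation described in commit 9dc2569f22 can be illustrated with a small sketch. This is not Druid's `LoadingRateTracker`; the class name `LoadRateWindow`, its methods, and the eviction rule are hypothetical and only follow the formulas quoted in the commit message (rate in kbps = 8 * bytes / milliseconds, averaged over a window of at least the last N bytes of successful loads).

```python
from collections import deque


class LoadRateWindow:
    """Hypothetical moving-average tracker; not the Druid implementation."""

    def __init__(self, min_window_bytes=10 * 1024**3):
        # Keep at least this many bytes of recent batches (10 GiB by default).
        self.min_window_bytes = min_window_bytes
        self.batches = deque()  # entries are (batch_bytes, batch_duration_millis)
        self.total_bytes = 0
        self.total_millis = 0

    def add_batch(self, batch_bytes, batch_duration_millis):
        """Record one completed batch of successful segment loads."""
        self.batches.append((batch_bytes, batch_duration_millis))
        self.total_bytes += batch_bytes
        self.total_millis += batch_duration_millis
        # Drop the oldest batches while the window still holds enough bytes without them.
        while self.batches and self.total_bytes - self.batches[0][0] >= self.min_window_bytes:
            old_bytes, old_millis = self.batches.popleft()
            self.total_bytes -= old_bytes
            self.total_millis -= old_millis

    def rate_kbps(self):
        """Return 8 * sum(batchBytes) / sum(batchDurationMillis), or None if empty."""
        if self.total_millis == 0:
            return None
        return 8 * self.total_bytes / self.total_millis


window = LoadRateWindow(min_window_bytes=1000)
window.add_batch(500, 100)
print(window.rate_kbps())  # 8 * 500 / 100 = 40.0
```

Bytes times 8 gives bits, and bits per millisecond equals kilobits per second, which is why no further unit conversion appears in the formula.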

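The three properties listed for the round-robin iterator in commit 3c493dc3ed can be sketched as follows. This is a minimal illustration, not the Druid class: `RoundRobinCandidates`, `update_candidates`, and `next` are hypothetical names, and the handling of a single candidate (it is simply returned again) is an assumption the commit message does not specify.

```python
import random


class RoundRobinCandidates:
    """Hypothetical round-robin selector over an ordered candidate list."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._candidates = []
        self._cursor = 0
        self._previous = None

    def update_candidates(self, candidates):
        if candidates is None:
            raise ValueError("candidates must not be None")
        ordered = sorted(candidates)
        if ordered == self._candidates:
            return  # unchanged set: keep the cursor so iteration stays deterministic
        self._candidates = ordered
        # Property 1: start from a random cursor position in the ordered list.
        self._cursor = self._rng.randrange(len(ordered)) if ordered else 0

    def next(self):
        n = len(self._candidates)
        if n == 0:
            return None
        choice = self._candidates[self._cursor]
        # Property 3: skip a candidate that would repeat the previous result
        # (only possible when more than one candidate exists).
        if choice == self._previous and n > 1:
            self._cursor = (self._cursor + 1) % n
            choice = self._candidates[self._cursor]
        # Property 2: advance the cursor by one so the order is deterministic.
        self._cursor = (self._cursor + 1) % n
        self._previous = choice
        return choice


selector = RoundRobinCandidates(random.Random(0))
selector.update_candidates({"wiki", "koalas", "logs"})
print([selector.next() for _ in range(6)])  # each datasource appears twice, in a fixed cycle
```
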

14264 Commits, All Branches