druid

Commit Graph

Author	SHA1	Message	Date
Adarsh Sanjeev	f5cc823d0f	Handle nulls in DruidCoordinator.getReplicationFactor (#14447 )	2023-06-20 15:25:57 +05:30
Rohan Garg	09d6c5a45e	Decouple logical planning and native query generation in SQL planning (#14232 ) Add a new planning strategy that explicitly decouples the DAG from building the native query. With this mode, it is Calcite's job to generate a "logical DAG" which is all of the various DruidProject, DruidFilter, etc. nodes. We then take those nodes and use them to build a native query. The current commit doesn't pass all tests, but it does work for some things and is a decent starting baseline.	2023-06-19 16:00:40 -07:00
Kashif Faraz	50461c3bd5	Enable smartSegmentLoading on the Coordinator (#13197 ) This commit does a complete revamp of the coordinator to address problem areas: - Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items - Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats` - Configuration: Add dynamic config `smartSegmentLoading` to automatically set optimal values for all segment loading configs such as `maxSegmentsToMove`, `replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`. Changed classes: - Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move - Add `SegmentAction` to distinguish between load, replicate, drop and move operations - Add `SegmentReplicationStatus` to capture current state of replication of all used segments - Add `SegmentLoadingConfig` to contain recomputed dynamic config values - Simplify classes `LoadRule`, `BroadcastRule` - Simplify the `BalancerStrategy` and `CostBalancerStrategy` - Add several new methods to `ServerHolder` to track loaded and queued segments - Refactor `DruidCoordinator` Impact: - Enable `smartSegmentLoading` by default. With this enabled, none of the following dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`, `maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`, `emitBalancingStats` and `replicantLifetime`. - Coordinator reports richer metrics and produces cleaner and more informative logs - Coordinator uses an unlimited load queue for all servers, and makes better assignment decisions	2023-06-19 14:27:35 +05:30
imply-cheddar	cfd07a95b7	Errors take 3 (#14004 ) Introduce DruidException, an exception whose goal in life is to be delivered to a user. DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used. This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.	2023-06-19 01:11:13 -07:00
Gian Merlino	2b676ac7f8	Quieter KafkaSupervisors in all bundled log4j2.xml. (#14444 ) Follow-up to #13392, which added this to a single log4j2.xml.	2023-06-19 12:04:11 +05:30
Adarsh Sanjeev	128133fadc	Add column replication_factor column to sys.segments table (#14403 ) Description: Druid allows a configuration of load rules that may cause a used segment to not be loaded on any historical. This status is not tracked in the sys.segments table on the broker, which makes it difficult to determine if the unavailability of a segment is expected and if we should not wait for it to be loaded on a server after ingestion has finished. Changes: - Track replication factor in `SegmentReplicantLookup` during evaluation of load rules - Update API `/druid/coordinator/v1metadata/segments` to return replication factor - Add column `replication_factor` to the sys.segments virtual table and populate it in `MetadataSegmentView` - If this column is 0, the segment is not assigned to any historical and will not be loaded.	2023-06-18 10:02:21 +05:30
George Shiqi Wu	bd07c3dd43	Don't need to double synchronize on simple map operations (#14435 ) * Don't need to double synchronize on simple map operations * remove lock	2023-06-17 17:30:37 -07:00
Abhishek Radhakrishnan	04fb75719e	Fail query planning if a `CLUSTERED BY` column contains descending order (#14436 ) * Throw ValidationException if CLUSTERED BY column descending order is specified. - Fails query planning * Some more tests. * fixup existing comment * Update comment * checkstyle fix: remove unused imports * Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme. * move deprecated field to the bottom	2023-06-16 18:10:12 -04:00
George Shiqi Wu	64af9bfe5b	Add groupId to metrics (#14402 ) * Add group id as a dimension * Revert changes * Add to forking task runner * Add missing metrics * Fix indenting * revert metrics * Fix indentation	2023-06-16 09:28:16 -07:00
Clint Wylie	359bd63cc9	allow expression "best effort" type determination to better handle mixed type arrays (#14438 )	2023-06-16 00:02:43 -07:00
Gian Merlino	85656a467c	MSQ: Load broadcast tables on workers. (#14437 ) They were not previously loaded because supportsQueries was false. This patch sets supportsQueries to true, and clarifies in Task javadocs that supportsQueries can be true for tasks that aren't directly queryable over HTTP.	2023-06-16 12:02:20 +05:30
Maytas Monsereenusorn	5d76d0ea74	Fix segment/deleted/count metric not being emitted (#14433 ) * Fix segment/deleted/count metric * Fix segment/deleted/count metric * Fix segment/deleted/count metric	2023-06-15 14:08:19 -07:00
Laksh Singla	4935f2470a	Limit results generated by SELECT queries in MSQ (#14370 ) * Limit select results in MSQ * reduce number of files in test * add truncated flag * avoid materializing select results to list, use iterable instead * javadocs	2023-06-15 13:13:11 +05:30
Clint Wylie	ff5ae4db6c	fix kafka input format reader schema discovery and partial schema discovery (#14421 ) * fix kafka input format reader schema discovery and partial schema discovery to actually work right, by re-using dimension filtering logic of MapInputRowParser	2023-06-15 00:11:04 -07:00
Clint Wylie	ca116cf886	adjust broker parallel merge to help managed blocking be more well behaved (#14427 )	2023-06-15 00:10:31 -07:00
Pranav	5314db9f85	Adding the file mapper to handle v2 buffer deserialization (#14429 )	2023-06-14 19:41:44 -07:00
Pranav	e426d370ea	Start with solo accumulator and empty partition (#14426 ) * Starting parallel merge with solo accumulator and empty partitions * shut down pool in test	2023-06-14 16:20:48 -07:00
Alexander Saydakov	f6169d437b	use the latest datasketches-java-4.1.0 (#14430 ) Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2023-06-14 16:03:56 -07:00
George Shiqi Wu	76e70654ac	Fix issues when startup timeout is hit (#14425 )	2023-06-14 11:49:55 -07:00
Vadim Ogievetsky	6fd28fc185	Web console: split the Ingestion view into two views: Supervisors and Tasks (#14395 ) * init split * don't crash if unable to get running tasks * update snapshots * push down state into call * googies * simplify * update e2e tests * feedback fixes * update e2e tests * better icons * fix test * adjust colors	2023-06-14 10:42:30 -07:00
Clint Wylie	8454cc619a	auto columns fixes (#14422 ) changes: * auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR) * fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs * fix bug with bound filter when upper is null equivalent but is strict	2023-06-14 08:57:06 -07:00
Abhishek Radhakrishnan	be5a6593a9	Reset `RuntimeInfo` to fix flaky test `ParametrizedUriEmitterConfigTest`. (#14405 ) * Add injector so JVM settings are correctly set up and bound for the test. * Add VisibleForTesting IDE annotation. * spacing	2023-06-13 18:07:51 -07:00
Abhishek Radhakrishnan	b8495d45a1	Expose Druid functions in `INFORMATION_SCHEMA.ROUTINES` table. (#14378 ) * Add INFORMATION_SCHEMA.ROUTINES to expose Druid operators and functions. * checkstyle * remove IS_DETERMISITIC. * test * cleanup test * remove logs and simplify * fixup unit test * Add docs for INFORMATION_SCHEMA.ROUTINES table. * Update test and add another SQL query. * add stuff to .spelling and checkstyle fix. * Add more tests for custom operators. * checkstyle and comment. * Some naming cleanup. * Add FUNCTION_ID * The different Calcite function syntax enums get translated to FUNCTION * Update docs. * Cleanup markdown table. * fixup test. * fixup intellij inspection * Review comment: nullable column; add a function to determine function syntax. * More tests; add non-function syntax operators. * More unit tests. Also add a separate test for DruidOperatorTable. * actually just validate non-zero count. * switch up the order * checkstyle fixes.	2023-06-13 15:44:04 -04:00
Clint Wylie	61120dc49a	fix Kafka input format to throw ParseException if timestamp is missing (#14413 )	2023-06-13 09:00:11 -07:00
Rishabh Singh	66c3cc1391	Handle unparseable SupervisorSpec in metadata store (#14382 ) Changes: - Skip a supervisor spec entry which cannot be deserialised into a `SupervisorSpec` object. - Log an error for the unparseable spec	2023-06-13 08:02:01 +05:30
Abhishek Radhakrishnan	1c76ebad3b	Minor doc updates. (#14409 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-06-12 15:24:48 -07:00
Abhishek Radhakrishnan	326f2c5020	Add more statement attributes to explain plan result. (#14391 ) This PR adds the following to the ATTRIBUTES column in the explain plan output: - partitionedBy - clusteredBy - replaceTimeChunks This PR leverages the work done in #14074, which added a new column ATTRIBUTES to encapsulate all the statement-related attributes.	2023-06-12 19:18:02 +05:30
Rishabh Singh	8b212e73d7	Add method to authorize native query using authentication result (#14376 )	2023-06-12 11:06:00 +05:30
Clint Wylie	b5f45832b1	Add 'Flaky test' issue template (#14394 ) * Add 'Flaky test' issue template * Update flaky_test.md	2023-06-11 19:02:38 -07:00
Adarsh Sanjeev	267cbac6ff	Add logs for deleting files using storage connector (#14350 ) * Add logs for deleting files using storage connector * Address review comments * Update log message format	2023-06-11 21:24:30 +05:30
Kashif Faraz	6e158704cb	Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271 ) Changes - Add a `DruidException` which contains a user-facing error message, HTTP response code - Make `EntryExistsException` extend `DruidException` - If metadata store max_allowed_packet limit is violated while inserting a new task, throw `DruidException` with response code 400 (bad request) to prevent retries - Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL	2023-06-10 12:15:44 +05:30
Abhishek Radhakrishnan	31c386ee1b	Fixup typo and java code snippets in JDBC docs. (#14399 )	2023-06-09 12:39:21 -07:00
John Gozde	4d146ca87d	Upgrades the React dependency to v18 (#14380 ) * Use react 18 * Remove deprecated usage of Toaster * Make AppToaster lazy * Update testing-library, snapshots * Licenses * Document lazy-init, add license header	2023-06-09 12:09:13 -07:00
Abhishek Radhakrishnan	23c2dcaf8d	Add NullHandling module initialization for `LookupDimensionSpecTest` (#14393 )	2023-06-09 09:07:32 +05:30
Benedict Jin	5eb2556566	Add workflow links to README to jump into detailed pages (#14383 )	2023-06-09 08:31:50 +05:30
imply-cheddar	87149d5975	Remove AbstractIndex (#14388 ) The class apparently only exists to add a toString() method to Indexes, which basically just crashes any debugger on any meaningfully sized index. It's a pointless abstract class that basically only causes pain.	2023-06-08 19:52:16 -07:00
317brian	ff577a69a5	doc: escape tags in markdown in preparation for docusaurus2 (#14379 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-06-08 11:26:18 -07:00
Kashif Faraz	12e8fa5c97	Prevent coordinator from getting stuck if leadership changes during coordinator run (#14385 ) Changes: - Add a timeout of 1 minute to resultFuture.get() in `CostBalancerStrategy.chooseBestServer`. 1 minute is the typical time for a full coordinator run and is more than enough time for cost computations of a single segment. - Raise an alert if an exception is encountered while computing costs and if the executor has not been shutdown. This is because a shutdown is intentional and does not require an alert.	2023-06-08 15:29:20 +05:30
Atul Mohan	6a4cbab4b8	Upgrade parquet-mr version (#14070 ) * Upgrade parquet version * Move parquet version to hadoop3 * Fix license * Exclude audience annotations	2023-06-07 08:54:54 -07:00
Gian Merlino	6370769cbf	Fix documentation for druid.query.scheduler.numThreads. (#14381 ) * Fix documentation for druid.query.scheduler.numThreads.	2023-06-07 14:48:08 +05:30
Soumyava	01b22ca022	Hll Sketch and Theta sketch estimate can now be used as an expression (#14312 ) * Hll Sketch estimate can now be used as an expression * Theta sketch estimate now can be used as an expression	2023-06-06 20:14:25 -07:00
Abhishek Radhakrishnan	2d258a95ad	Fix `EARLIEST_BY`/`LATEST_BY` signature and include function name in signature. (#14352 ) * Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures. * Single quote function signatures, space between args and remove \n. * fixup UT assertion	2023-06-06 09:41:05 -07:00
Laksh Singla	5da601c47e	fix npe (#14369 )	2023-06-06 17:01:42 +05:30
John Gozde	cfc2a8d286	Switch to @blueprint/datetime2 (#14371 ) * Bump blueprint packages * Switch to datetime2 components * Update licenses * Update snapshots	2023-06-05 22:18:05 -07:00
Gian Merlino	a0d49baad6	MSQ: Fix issue with rollup ingestion and aggregators with multiple names. (#14367 ) The same aggregator can have two output names for a SQL like: INSERT INTO foo SELECT x, COUNT() AS y, COUNT() AS z FROM t GROUP BY 1 PARTITIONED BY ALL In this case, the SQL planner will create a query with a single "count" aggregator mapped to output names "y" and "z". The prior MSQ code did not properly handle this case, instead throwing an error like: Expected single output for query column[a0] but got [[1, 2]]	2023-06-06 10:28:41 +05:30
John Gozde	c14e54cf93	Remove context params from class component ctors (#14366 )	2023-06-05 11:15:28 -07:00
317brian	49c056af17	docs: add basic contributor guide for docs (#14365 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-06-05 10:53:17 -07:00
Tejaswini Bandlamudi	8e4f003f02	Fix flaky Revised ITs failures on GHA runners (#14348 ) * Fix read timed out failures and remove containers before test * remove containers before loading images * add labels to IT docker containers, download stable minio docker image release instead of latest	2023-06-05 18:58:54 +05:30
Abhishek Agarwal	139156cf6b	Reduce the spam in broker logs (#14368 )	2023-06-05 18:56:34 +05:30
Katya Macedo	7fd215b2e7	Document storeCompactionState (#14354 )	2023-06-02 11:09:04 -07:00

13032 Commits
