druid

Commit Graph

Author	SHA1	Message	Date
George Shiqi Wu	d5a25a94b8	Docs: Clarify that all supervisors can support early handoff (#16588 )	2024-06-13 08:43:22 +05:30
YongGang	46dbc74053	Support Dynamic Peon Pod Template Selection in K8s extension (#16510 ) * initial commit * add Javadocs * refine JSON input config * more test and fix build * extract existing behavior as default strategy * change template mapping fallback * add docs * update doc * fix doc * address comments * define Matcher interface * fix test coverage * use lower case for endpoint path * update Json name * add more tests * refactoring Selector class	2024-06-12 15:27:10 -07:00
Andreas Maechler	fec48432d4	docs: Correct some outdated module names (#16584 ) * Fix module names * Better spacing * Some spacing * Suggestions from code review Thanks Abhishek. * More links * Roll-up time * Remove logs * More spelling	2024-06-11 14:17:40 -07:00
Andreas Maechler	24056b90b5	Bring back missing property in indexer documentation (#16582 ) * Bring back druid.peon.taskActionClient.retry.minWait * Update docs/configuration/index.md * Consistent italics Thanks Abhishek. * Update docs/configuration/index.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Consistent list style * Remove extra space --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-06-10 16:52:54 -07:00
Kashif Faraz	e4fdf1055b	Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578 ) Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0. Thus, a segment allocation request is processed immediately unless there are already some requests queued before this one. While in queue, a segment allocation request may get clubbed together with other similar requests into a batch to reduce load on the metadata store.	2024-06-10 20:07:23 +05:30
317brian	8e11adfc6f	docs: remove outdated druidversion var from a page (#16570 ) Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-10 15:30:36 +08:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Katya Macedo	7aecc09230	Docs: Remove circular link (#16553 )	2024-06-05 11:07:36 -07:00
Charles Smith	c100ae0ecc	Add a tutorial for LATEST_BY to get most recent data (#16515 ) Co-authored-by: Will Xu <2bethere@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-04 17:00:25 -07:00
Jill Osborne	8b5802d4cd	docs: add maxSubqueryBytes limit to migration guide landing page (#16547 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-04 12:52:06 -07:00
Amit	540d3e6af5	Added new use cases and description of the use case - 5/14/24 (#16451 ) Thanks for your contribution @amit-git-account * Added new use cases and description of the use case - 5/14/24 The use case listing is not changed in a long time. While speaking with users, I came across several other use cases not listed here in the index. So I added new use cases and also added description against the use cases. * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update spelling file * Update docs/design/index.md --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-06-04 09:47:49 -07:00
Charles Smith	8f78c901e7	docs: add lookups to the sidebar (#16530 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-03 16:04:15 -07:00
Charles Smith	b1568fb95b	docs: Adds a redirect for flatten-json which was removed (#16263 )	2024-05-31 16:16:12 -07:00
Katya Macedo	f70ef1f434	Update front coding text (#16491 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-31 15:13:10 -07:00
Katya Macedo	92e660dd21	Add Druid 30.0.0 upgrade notes (#16522 )	2024-05-31 13:23:22 -07:00
Atul Mohan	b53d75758f	IcebergInputSource : Add option to toggle case sensitivity while reading columns from iceberg catalog (#16496 ) * Toggle case sensitivity while reading columns from iceberg * Fix tests * Drop case check and set unconditionally	2024-05-31 10:18:52 -07:00
George Shiqi Wu	0936798122	Add limit to task payload size (#16512 ) * Add limit to task payload size * Change to a warning * Remove test * Fix unit tests * Optionally throw alert * PR comments * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * PR comments * Reject large payloads * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-05-31 09:17:36 -07:00
Jill Osborne	3c72ec8413	docs: Migration guide for subquery limit (#16519 ) Adds a migration guide for Druid 30 to help users understand the new byte-based subquery limit property maxSubqueryBytes	2024-05-31 09:26:07 +05:30
Charles Smith	92e565e3b8	Adds a migration guide overview page to the release-info section (#16506 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Katya Macedo <katya.macedo@imply.io>	2024-05-30 09:50:30 -07:00
Adithya Chakilam	a9044ac235	Add cgroup cpu/mem/disk usage metrics (#16472 ) * Add cgroup cpu/mem usage metrics * checks * comments * docs fix * add disk metrics * fapi check * checkstyle * issues * spelling * change asserts * checks * use proc builder instead of runtime * specify charset * spotbug	2024-05-29 12:44:37 -07:00
George Shiqi Wu	b3b62ac431	Update azure input source docs (#16508 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-05-29 10:00:46 -07:00
Vadim Ogievetsky	10ea88e5bf	Web console: more robust durable storage setting detection (#16493 ) * more robust durable storage setting * add test	2024-05-22 15:47:20 -07:00
Vadim Ogievetsky	a124c6cbbd	fix typo in extension name (#16466 )	2024-05-20 09:47:22 +08:00
George Shiqi Wu	ed9881df88	Cleanup logic from handoff API (#16457 ) * Cleanup logic from handoff API * Fix test * Fix checkstyle * Update docs	2024-05-16 08:42:44 -07:00
Gian Merlino	72432c2e78	Speed up SQL IN using SCALAR_IN_ARRAY. (#16388 ) * Speed up SQL IN using SCALAR_IN_ARRAY. Main changes: 1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of the IN is above inFunctionThreshold. The default value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular expression, when the size of the SEARCH is above inFunctionExprThreshold. The default value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is greater than inFunctionThreshold. * Revert test. * Additional coverage. * Update docs/querying/sql-query-context.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * New test. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-05-14 08:09:27 -07:00
George Shiqi Wu	c1bf4fed90	API for stopping streaming tasks early (#16310 ) * Try stopping task early * Fix checkstyle * Add unit test * Add a couple more tests * PR changes * Use notice * fix checkstyle * PR changes * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Change payload * Remove quotes --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-05-14 06:39:50 -07:00
Alberic Liu	811dcd1726	update protobuf.md (#16434 )	2024-05-11 17:52:54 +08:00
Charles Smith	2d0b4e5f1e	Update sidebar to organize tutorials + other minor improvements (#16184 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-09 08:57:43 -07:00
Adarsh Sanjeev	269e035e76	Add validation for reindex with realtime sources (#16390 ) Add validation for reindex with realtime sources. With the addition of concurrent compaction, it is possible to ingest data while querying from realtime sources with MSQ into the same datasource. This could potentially lead to issues if the interval that is ingested into is replaced by an MSQ job, which has queried only some of the data from the realtime task. This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.	2024-05-07 10:32:15 +05:30
Misha	b5958b6b07	Feature configurable calcite bloat (#16248 ) * Configurable bloat for calcite ProjectMergeRule implemented * Comment added * Default bloat value increased to 1000 * Implemented bloat configuration from QueryContext * Code refactored, docs updated --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-05-06 20:43:39 +05:30
Alberic Liu	92fb0ff718	upgrade mysql:mysql-connector-java to 8.2.0 (#16024 ) * upgrade mysql:mysql-connector-java to 8.2.0 * fix the check errors * remove unused comment	2024-05-06 21:58:37 +08:00
Rishabh Singh	c61c3785a0	Followup changes to 15817 (Segment schema publishing and polling) (#16368 ) * Fix build * Nit changes in KillUnreferencedSegmentSchema * Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache * nitpicks * Remove reference to smq abbreviation from integration-tests * Remove reference to smq abbreviation from integration-tests * minor change * Update index.md * Add delimiter while computing schema fingerprint hash	2024-05-03 19:13:52 +05:30
Kashif Faraz	51104e8bb3	Docs: Remove references to Zk-based segment loading (#16360 ) Follow up to #15705 Changes: - Remove references to ZK-based segment loading in the docs - Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`	2024-05-01 08:06:00 +05:30
Abhishek Radhakrishnan	1d7595f3f7	Support for filters in the Druid Delta Lake connector (#16288 ) * Delta Lake support for filters. * Updates * cleanup comments * Docs * Remove Enclosed runner * Rename * Cleanup test * Serde test for the Delta input source and fix jackson annotation. * Updates and docs. * Update error messages to be clearer * Fixes * Handle NumberFormatException to provide a nicer error message. * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Doc fixes based on feedback * Yes -> yes in docs; reword slightly. * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Documentation, javadoc and more updates. * Not with an or expression end-to-end test. * Break up =, >, >=, <, <= into its own types instead of sub-classing. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2024-04-29 11:31:36 -07:00
Adithya Chakilam	f8015eb02a	Add config lagAggregate to LagBasedAutoScalerConfig (#16334 ) Changes: - Add new config `lagAggregate` to `LagBasedAutoScalerConfig` - Add field `aggregateForScaling` to `LagStats` - Use the new field/config to determine which aggregate to use to compute lag - Remove method `Supervisor.computeLagForAutoScaler()`	2024-04-29 22:20:41 +05:30
Gian Merlino	db82adcdfd	SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. (#16311 ) * Four changes to scalar_in_array as follow-ups to #16306: 1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly. * Fix spelling. * Adjustments from review.	2024-04-26 16:01:17 -07:00
Atul Mohan	77333e56fa	Docs: Add missing kafka emitter config (#16332 )	2024-04-25 10:37:14 +05:30
Katya Macedo	ceb6646dec	Add supervisor actions (#16276 ) * Add supervisor actions * Update text * Update text * Update after review * Update after review	2024-04-24 13:14:01 -07:00
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Charles Smith	65412f80ab	remove additional column marks (#16319 )	2024-04-22 19:41:54 -07:00
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Gian Merlino	4285a5e2c6	Update documentation for exceptions to subquery limit. (#16295 ) The true exception for groupBy is somewhat more narrow than the docs suggest.	2024-04-17 21:04:43 -07:00
Hardik Bajaj	0bf5e7745d	Add configurable parameters for statsd client (#16283 ) Statsd client sometimes drops metrics when this queueSize of statsd client with max unprocessed messages is completely full. This causes some high cardinality metrics like per partition lag being dropped. There are multiple parameters of statsdclient that can be initialized and can help increase the load/capacity of client to not to drop metrics more frequently. Properties like queueSize, poolSize, processorWorkers and senderWorkers will now be configurable at runtime	2024-04-17 18:35:31 +05:30
Nikhil Rao	a805c5612e	Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277 ) * Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page * Updates intervals in Native Query to remove excess Time part in timestamp * Moves Druid SQL section above Native query because sql used more often by users * removes old Druid SQL sections * Adds TopN Druid SQL query using ORDER BY and LIMIT * Adds table for Druid SQL variance and standard deviation functions * Update docs/development/extensions-core/stats.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-15 08:05:34 -07:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Abhishek Radhakrishnan	041d0bff5e	Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247 ) The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior. The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots. * Remove stale comment and inline canDutyRun() * druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set. - Remove the default P1D value for druid.coordinator.kill.period. Instead default druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set to if the former config isn't specified. - If druid.coordinator.kill.period is set, the value will take precedence over druid.coordinator.period.indexingPeriod * Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java * Fix checkstyle error * Clarify comment * Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java * Put back canDutyRun() * Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots) * Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS * Remove unused test method. * Update default value of killTaskSlotsRatio in docs and web-console default mock * Move initDuty() after params and config setup.	2024-04-14 18:56:17 -07:00
Katya Macedo	7f06a53cb1	[Docs] Fix API placeholder formatting (#16240 )	2024-04-12 09:19:13 -07:00
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Pranav	fc2600b8e2	Adding jvmVersion dimension in JVM Monitor (#16262 )	2024-04-11 15:44:56 -07:00
317brian	df9e1bb97b	Docs: Fix typo in tutorial (#16254 )	2024-04-10 08:59:52 +05:30

3162 Commits