Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0.
Thus, a segment allocation request is processed immediately unless other requests are already queued ahead of it. While in the queue, a segment allocation request may be batched together with similar requests to reduce load on the metadata store.
* Simplify serialized form of JsonInputFormat.
Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and
useJsonNodeReader. Because the default value of keepNullColumns is
variable, we store the originally configured value rather than the
derived value, and include it in the serialized form only if it is non-null.
* Fix test.
Thanks for your contribution @amit-git-account
* Added new use cases and descriptions of the use cases - 5/14/24
The use case listing has not changed in a long time. While speaking with users, I came across several other use cases not listed in the index. So I added the new use cases and also added descriptions for them.
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
* update spelling file
* Update docs/design/index.md
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Speed up SQL IN using SCALAR_IN_ARRAY.
Main changes:
1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY when the size of
the IN is above inFunctionThreshold (see the example after this list). The default
value of inFunctionThreshold is 100. Users can restore the prior behavior by setting
it to Integer.MAX_VALUE.
2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular
expression, if the size of the SEARCH is above inFunctionExprThreshold. The default
value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting
it to Integer.MAX_VALUE.
3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is
greater than inFunctionThreshold.
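A hedged sketch of the rewrite in (1), using the Druid `wikipedia` example datasource
(only three values are shown; imagine a list longer than inFunctionThreshold):

    -- As written by the user:
    SELECT COUNT(*)
    FROM wikipedia
    WHERE countryName IN ('Austria', 'Belgium', 'Canada')

    -- Planned as if it had been written as:
    SELECT COUNT(*)
    FROM wikipedia
    WHERE SCALAR_IN_ARRAY(countryName, ARRAY['Austria', 'Belgium', 'Canada'])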
* Revert test.
* Additional coverage.
* Update docs/querying/sql-query-context.md
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* New test.
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Add validation for reindex with realtime sources.
With the addition of concurrent compaction, it is possible to use MSQ to ingest data into a datasource while simultaneously querying realtime sources of the same datasource. This could lead to issues if the interval being ingested into is replaced by an MSQ job that has queried only some of the data from the realtime task.
This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.
Follow up to #15705
Changes:
- Remove references to ZK-based segment loading in the docs
- Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`
* Delta Lake support for filters.
* Updates
* cleanup comments
* Docs
* Remove Enclosed runner
* Rename
* Cleanup test
* Serde test for the Delta input source and fix jackson annotation.
* Updates and docs.
* Update error messages to be clearer
* Fixes
* Handle NumberFormatException to provide a nicer error message.
* Apply suggestions from code review
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* Doc fixes based on feedback
* Yes -> yes in docs; reword slightly.
* Update docs/ingestion/input-sources.md
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
* Update docs/ingestion/input-sources.md
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
* Documentation, javadoc and more updates.
* NOT with an OR expression end-to-end test.
* Break up =, >, >=, <, <= into their own types instead of sub-classing.
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Changes:
- Add new config `lagAggregate` to `LagBasedAutoScalerConfig`
- Add field `aggregateForScaling` to `LagStats`
- Use the new field/config to determine which aggregate to use to compute lag
- Remove method `Supervisor.computeLagForAutoScaler()`
* Four changes to scalar_in_array as follow-ups to #16306:
1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`.
2) Rename the class to more closely match the function name.
3) Add a specialization for constant arrays, where we build a `HashSet`.
4) Use `castForEqualityComparison` to properly handle cross-type comparisons.
Additional tests verify that comparisons between LONG and DOUBLE are now
handled properly (examples below).
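A hedged sketch of the semantics in (1) and (4), written as standalone Druid SQL
expressions (Druid SQL permits SELECT without FROM; the CAST is only there to give
the NULL literal a type):

    -- NULL scalar, array contains NULL: returns TRUE
    SELECT SCALAR_IN_ARRAY(CAST(NULL AS VARCHAR), ARRAY['a', NULL])

    -- NULL scalar, array does not contain NULL: returns NULL
    SELECT SCALAR_IN_ARRAY(CAST(NULL AS VARCHAR), ARRAY['a', 'b'])

    -- Cross-type comparison: the LONG scalar 1 now matches DOUBLE 1.0
    SELECT SCALAR_IN_ARRAY(1, ARRAY[1.0, 2.0])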
* Fix spelling.
* Adjustments from review.
Issue: #14989
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schemas for realtime segments (#15475). The next goal is to eliminate the requirement of regularly executing queries to obtain segment schema information.
This is the final change: it involves publishing segment schemas for finalized segments from tasks and periodically polling them in the Coordinator.
The StatsD client sometimes drops metrics when its queue of unprocessed messages is completely full. This causes some high-cardinality metrics, such as per-partition lag, to be dropped.
The StatsD client has multiple parameters that can be tuned to increase its load capacity so that it drops metrics less frequently.
The properties queueSize, poolSize, processorWorkers, and senderWorkers will now be configurable at runtime.
* Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page
* Updates intervals in the native queries to remove the excess time part in timestamps
* Moves the Druid SQL section above the native query section because SQL is used more often by users
* removes old Druid SQL sections
* Adds TopN Druid SQL query using ORDER BY and LIMIT
* Adds table for Druid SQL variance and standard deviation functions
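For example, a hedged sketch of the kind of Druid SQL now shown on the page, assuming
the `wikipedia` example datasource and the druid-stats extension:

    -- Variance and standard deviation of rows added, per channel,
    -- with TopN-style ORDER BY and LIMIT
    SELECT
      channel,
      VAR_POP("added") AS var_added,
      STDDEV("added") AS stddev_added
    FROM wikipedia
    GROUP BY channel
    ORDER BY stddev_added DESC
    LIMIT 10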
* Update docs/development/extensions-core/stats.md
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
---------
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
Currently, export creates the files at the provided destination. Adding a manifest file provides a list of the files created as part of the export. This will allow easier consumption of the data exported from Druid, especially by automated data pipelines.
The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior.
The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that kill tasks take up only 1 task slot right out of the box instead of taking up all the task slots.
* Remove stale comment and inline canDutyRun()
* druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set.
- Remove the default P1D value for druid.coordinator.kill.period. Instead default
druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set
to if the former config isn't specified.
- If druid.coordinator.kill.period is set, the value will take precedence over
druid.coordinator.period.indexingPeriod
* Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java
* Fix checkstyle error
* Clarify comment
* Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java
* Put back canDutyRun()
* Default killTaskSlotRatio to 0.1 instead of 1.0 (all slots)
* Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS
* Remove unused test method.
* Update default value of killTaskSlotRatio in docs and web-console default mock
* Move initDuty() after params and config setup.
Changes:
- Add `TaskContextEnricher` interface to improve task management and monitoring
- Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord
- Add `TaskContextReport` to write out task context information in reports
Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted at future cases such as REPLACE and compaction done via MSQ.
Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored only when the flag "storeCompactionState": true is set in the query context (a sketch follows).
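For instance, a hedged sketch of such a replace (the datasource name is illustrative;
the flag is set in the query context, not in the SQL text):

    -- Submitted with query context: {"storeCompactionState": true}
    REPLACE INTO "wikipedia" OVERWRITE ALL
    SELECT *
    FROM "wikipedia"
    PARTITIONED BY DAY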
Currently, runtime exceptions generated while writing frames include only the exception itself, without the name of the column in which they were encountered. This patch adds that information to the error and makes it non-retryable.
This PR logs the segment type chosen and the reason for choosing it. It also adds this information to the query report, to be displayed in the UI.
It adds a new section to the reports, segmentReport, which contains the type of segment created if the query is an ingestion, and null otherwise.
Support for exporting MSQ results to a GCS bucket. This essentially copies the logic of the S3 export for GCS, originally done by @adarshsanjeev in this PR - #15689
This PR creates an interface for ImmutableRTree and moves the existing implementation to a new class representing the 32-bit implementation (which stores coordinates as floats). This makes ImmutableRTree extendable, so that a higher-precision (64-bit) implementation can be created as well.
All spatial bound filters currently accept floats as input, which might not be accurate for a high-precision implementation of ImmutableRTree. This PR changes the bound filters to accept the query bounds as doubles instead of floats. This is a backward-compatible change, since the doubles are compared to the existing float values in the RTree. Previously an input float was compared to the RTree floats, which can cause precision loss; now a double is compared to a float, which is still not 100% accurate but a little better.
There are no changes to the way we query spatial dimensions today, except for input bound parsing. There is a small improvement in the string filter predicate, which now parses double strings instead of floats and compares double to double, which is 100% accurate; however, the string predicate is only called when we don't have a spatial index.
Allowing the interface to be extended makes it possible to create a high-precision (HP) implementation and to define new search strategies to perform HP search: Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound);
With possible HP implementations, the radius bound filter cannot really focus on accuracy, since it computes Euclidean distance for its comparisons. As the Earth 🌍 is round and not flat, Euclidean distances are not accurate in a geo system. This PR adds a new param called 'radiusUnit' which allows you to specify units like meters, km, miles, etc. It uses the haversine formula (https://en.wikipedia.org/wiki/Haversine_formula) to check whether a given geo point falls inside the circle. Added a test that generates a set of points inside and outside the circle in RadiusBoundTest.
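For reference, the haversine great-circle distance between points (φ1, λ1) and (φ2, λ2)
on a sphere of radius r, per the linked article, is:

    d = 2r \arcsin\left( \sqrt{ \sin^2\left( \frac{\varphi_2 - \varphi_1}{2} \right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right)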
This PR aims to introduce window functions on MSQ by doing the following (a sketch of a window query appears after this list):
- Introduce a window QueryKit for handling window queries, along with its factory and a processor for window queries
- If a window operator is present with a PARTITION BY clause, push the partition down as the shuffle spec of the previous stage
- In the presence of an empty OVER() clause, let all operators loose on a single rac (RowsAndColumns)
- In the presence of a non-empty OVER() clause, break down each window into individual stages
- Associated machinery to handle window functions in MSQ
- Introduce a separate hidden engine feature, WINDOW_LEAF_OPERATOR, set only for the MSQ engine. In the presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. For native queries this is set to false and the planner generates the leaf operators
- Guardrails around materialization
- Comprehensive UTs
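A hedged sketch of a window query that MSQ can now plan, assuming the `wikipedia`
example datasource (the PARTITION BY channel clause becomes the shuffle spec of the
previous stage):

    SELECT
      channel,
      __time,
      SUM("added") OVER (PARTITION BY channel ORDER BY __time) AS running_added
    FROM wikipedia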
Changes:
Add the following indexer level task metrics:
- `worker/task/running/count`
- `worker/task/assigned/count`
- `worker/task/completed/count`
These metrics will provide more visibility into the task distribution across indexers.
(We often see task skew across indexers, and with these metrics it would be easier
to catch the imbalance.)
* Mark used and unused APIs by versions.
* remove the conditional invocations.
* isValid() and test updates.
* isValid() and tests.
* Remove warning logs for invalid user requests. Also, downgrade visibility.
* Update resp message, etc.
* tests and some cleanup.
* Docs draft
* Clarify docs
* Update server/src/main/java/org/apache/druid/server/http/DataSourcesResource.java
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Review comments
* Remove default interface methods only used in tests and update docs.
* Clarify javadocs and @Nullable.
* Add more tests.
* Parameterized versions.
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* MSQ: Validate that strings and string arrays are not mixed.
When multi-value strings and string arrays coexist in the same column,
it causes problems with "classic MVD" style queries such as:
select * from wikipedia -- fails at runtime
select count(*) from wikipedia where flags = 'B' -- fails at planning time
select flags, count(*) from wikipedia group by 1 -- fails at runtime
To avoid these problems, this patch adds type verification for INSERT
and REPLACE. It is targeted: the only type changes that are blocked are
string-to-array and array-to-string. There is also a way to exclude
certain columns from the type checks, if the user really knows what
they're doing.
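A hedged sketch of a type change that the verification now blocks (datasource and
column names are illustrative):

    -- Rejected if "flags" was previously ingested as a plain string column,
    -- since this REPLACE would change it from VARCHAR to VARCHAR ARRAY:
    REPLACE INTO "wikipedia" OVERWRITE ALL
    SELECT __time, ARRAY["flags"] AS "flags"
    FROM "wikipedia"
    PARTITIONED BY DAY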
* Fixes.
* Tests and docs and error messages.
* More docs.
* Adjustments.
* Adjust message.
* Fix tests.
* Fix test in DV mode.
* docs: clarify description of uri/uripath
* Apply suggestions from code review
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Kill task version support.
Kill tasks by default kill all versions of unused segments in the specified
interval. Users wanting to delete specific versions (for example, for data compliance
reasons) and keep the rest of the versions can specify the optional versions in the
kill task payload.
* Formatting changes.
* Multi version tests in RetrieveSegmentsActionsTest
Sort of like method-level parameterized tests.
* Address review feedback
* Accept a list of versions instead of a single version.
Support multiple versions.
* Tests for multiple versions.
* Update docs
* Cleanup
* Address review comments.
Retain the old interface method, make it default, and route it to
the variant with nullable versions. Update usages to use the
default method where versions don't matter.
* Remove versions from retrieve used segments action.
* Some updates.
* Apply suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* /s/actual/observed/g
* minor test cleanup
* WIP: Test fixes and updates. Also add test for kill by version with used load spec.
Checkpoint.
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Changes:
- Add visibility into number of segments read/published by each parallel compaction
- Add new fields `segmentsRead`, `segmentsPublished` to `IngestionStatsAndErrorsTaskReportData`
- Update `ParallelIndexSupervisorTask` to populate the new stats
* updated description of rowsPerPage in export operations
* Update docs/multi-stage-query/reference.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Add support for AzureDNSZone-enabled storage accounts used for deep storage
Added a new config to AzureAccountConfig,
`storageAccountEndpointSuffix`,
which allows the user to specify a storage account endpoint suffix where the underlying
storage account is enabled for AzureDNSZone. The previous config, `endpointSuffix`, did not
support such accounts and has been deprecated in favor of this new config. Also
fixed an issue where `managedIdentityClientId` was not being set properly.
* Address review comments
* Add back Azure Government link and docs
* Move retries into DataSegmentPusher implementations.
The individual implementations know better when they should and should
not retry. They can also generate better error messages.
The inspiration for this patch was a situation where EntityTooLarge was
generated by the S3DataSegmentPusher, and retried uselessly by the
retry harness in PartialSegmentMergeTask.
* Fix missing var.
* Adjust imports.
* Tests, comments, style.
* Remove unused import.
* docs: add mermaid diagram support
* fix crash in the data loader when parsing data that cannot be parsed (#15983)
* update jetty to address CVE (#16000)
* Concurrent replace should work with supervisors using concurrent locks (#15995)
* Concurrent replace should work with supervisors using concurrent locks
* Ignore supervisors with useConcurrentLocks set to false
* Apply feedback
* Add pre-check for heavy debug logs (#15706)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Remove helm paths from CodeQL config (#16006)
* docs: mention acid-compliance for metadb
---------
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: Jan Werner <105367074+janjwerner-confluent@users.noreply.github.com>
Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
Co-authored-by: Sensor <fectrain@outlook.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update basic-cluster-tuning.md
The sentence "When free system memory is greater than or equal to druid.segmentCache.locations, the more segment data the Historical can be held in the memory-mapped segment cache" didn't read well. Updated to clarify it.
* Update docs/operations/basic-cluster-tuning.md
* Update docs/operations/basic-cluster-tuning.md
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* All segments stored in the same batch have the same created_date entry.
In the absence of a group_id column, this metadata would allow us to easily
reason about and troubleshoot ingestion-related issues.
* Rename metric name and code references to eligibleUnusedSegments.
Address review comment from https://github.com/apache/druid/pull/15941#discussion_r1503631992
* Kill duty and test improvements.
Initial commit with:
- Bug fixes - auto-kill can throw NPE when there are no datasources present and defaults mismatch.
- Add new stat for candidate segment intervals killed.
- Move a couple of debug logs to info logs for improved visibility (should only log once per kill period).
- Remove redundant checks for code readability.
- Updated tests to stop using mocks (the mocks also weren't using the last updated timestamp) and
  added more test coverage for different config parameters.
- Add a couple of unit tests that are ignored for the eternity case to prove that
the kill duty doesn't clean up segments with ALL grain or that end in DateTimes.MAX.
- Migrate Druid exception from user to operator persona.
* Address review comments.
* Remove unused methods.
* fix up format specifier and validate bad config tests.
* Consolidate the helpers a bit more and add another test.
* Update test names. Add javadoc placeholders for slightly involved tests.
* Add docs for metric kill/candidateUnusedSegments/count.
Also, rename to disambiguate.
* Comments.
* Apply logging suggestions from code review
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Review comments
- Clarify docs on eligibility.
- Add test for multiple segments in the same interval. Clarify comment.
- Remove log line from test.
- Remove lastUpdatedDate = now.plus(10) from test.
* minor cleanup.
* Clarify javadocs for getUnusedSegmentIntervals().
---------
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* Fix up typos, inaccuracies and clean up code related to PARTITIONED BY.
* Remove wrapper function and update tests to use DruidExceptionMatcher.
* Checkstyle and Intellij inspection fixes.
Changes:
- Add visibility into number of records processed by each streaming task per partition
- Add field `recordsProcessed` to `IngestionStatsAndErrorsTaskReportData`
- Populate number of records processed per partition in `SeekableStreamIndexTaskRunner`