druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	5da69a01cb	change arrayIngestMode default to array (#16789 ) * change arrayIngestMode default to array * remove arrayIngestMode flag option none * fix space * fix test	2024-07-25 15:09:40 +08:00
Zoltan Haindrich	7e3fab5bf9	Make WindowFrames more specific (#16741 ) Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods	2024-07-25 04:57:36 +02:00
Edgar Melendrez	ca787885c9	[docs] batch02 of updating functions (#16761 ) * applying changes * ensuring batch is updated * Update docs/querying/sql-functions.md * raise -> raises * addressing review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-07-24 15:28:57 -07:00
317brian	704962ec8e	doc: minor fixes to migration guides (#16784 )	2024-07-23 13:09:51 -07:00
Edgar Melendrez	934c10b1cd	docs: Adding admonition box to warn about MVD (#16712 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-22 17:32:23 -07:00
Clint Wylie	02b8738c00	remove batchProcessingMode from task config, remove AppenderatorImpl (#16765 ) changes: * removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS` which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed so `CLOSED_SEGMENTS` was the default (using `AppenderatorImpl`), however MSQ has been exclusively using `BatchAppenderator` with no problems so it seems safe to just roll it out as the only option for batch ingestion everywhere. * with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed * implify `Appenderator` construction since there are only separate stream and batch versions now * simplify tests since `batchProcessingMode` is gone	2024-07-22 13:56:44 -07:00
Clint Wylie	a34a06e192	remove Firehose and FirehoseFactory (#16758 ) changes: * removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602 * Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader` * Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes * Moved anything remaining in a 'firehose' package somewhere else * Clean up docs on firehose stuff	2024-07-19 14:37:21 -07:00
Charles Smith	1881880714	[Docs] Adds a migration guide SQL compatible null handling (#16704 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-07-19 09:25:05 -07:00
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Edgar Melendrez	721a65046f	docs: add examples for SQL functions (#16745 ) * updating first batch of numeric functions * First batch of functions * addressing first few comments * alphabetize list * draft with suggestions applied * minor discrepency expr -> <NUMERIC> * changed raises to calculates * Update docs/querying/sql-functions.md * switch to underscore * changed to exp(1) to match slack message * adding html text for trademark symbol to .spelling * fixed discrepancy between description and example --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 17:06:22 -07:00
Kashif Faraz	9f6ce6ddc0	Remove task action audit logging and druid_taskLog metadata table (#16309 ) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368. As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.	2024-07-17 17:09:00 +05:30
Vadim Ogievetsky	307b8849de	Web console: better sql data loader reset (#16696 ) * better sql data loader reset * snapshot * fix destination pane sizing * clean doc links * update doc links * more doc links * extract getClusterCapacity * update snapsohts * allow submit suspended * some renaming * diff with current * Do delta	2024-07-11 14:45:04 -07:00
YongGang	4b293fc2a9	Docs: Fix k8s dynamic config URL (#16720 )	2024-07-11 10:05:47 +05:30
Lars Francke	586c713d12	Updates build documentation to not mention explicit Java version as it was out of sync with the dedicated Java page. (#16674 ) This means there is one less place to keep information in sync.	2024-07-03 20:53:15 +05:30
317brian	d65e015c94	docs: nit for link format (#16687 )	2024-07-02 16:45:09 -07:00
Victoria Lim	adde024e11	docs: Subtitle updates in migration guide overview (#16683 )	2024-07-02 12:56:05 -07:00
Jill Osborne	bd49ecfd29	Addition to subquery limit migration guide (#16671 ) Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-07-01 14:22:47 -07:00
Hugh Evans	920d9020c0	Docs: Fix default value for globalIngestionHeapLimitBytes (#16654 ) Use the new default value added in #8255	2024-06-27 07:01:56 +05:30
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Andreas Maechler	ab76d851ad	Update docs contribution with correct script (#16581 ) * Spacing * Fix ordering * npm run start	2024-06-26 10:30:52 -07:00
Laksh Singla	71b3b5ab5d	Add query context parameter to remove null bytes when writing frames (#16579 ) MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.	2024-06-26 15:00:30 +05:30
Edgar Melendrez	b43f4063c5	Docs: update link and title of quickstart (#16638 ) * update link and title * Discard changes to website/package.json * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-06-25 09:07:00 -07:00
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se— just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates migration of Druid to ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor as far as I am aware were they used for any purpose. This patch removes both task types completely, as well as removes all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to be firmer that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
317brian	2131917f16	docs: added front-coded dictionaries to upgrade notes (#16647 ) * docs: add front-coded dictionareis to upgrade notes * add it to release notes template	2024-06-24 10:52:26 -07:00
Abhishek Radhakrishnan	7463589b07	Support for bootstrap segments (#16609 ) * Initial support for bootstrap segments. - Adds a new API in the coordinator. - All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it. - Then load the segments onto the segment cache as part of startup. - This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting. This patch also lays the foundation to speed up upgrades. * Fail open by default if there are any errors talking to the coordinator. * Add test for failure scenario and cleanup logs. * Cleanup and add debug log * Assert the events so we know the list exactly. * Revert RunRules test. The rules aren't evaluated if there are no clusters. * Revert RunRulesTest too. * Remove debug info. * Make the API POST and update log. * Fix up UTs. * Throw 503 from MetadataResource; clean up exception handling and DruidException. * Remove unused logger, add verification of metrics and docs. * Update error message * Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Adjust test metric expectations with the rename. * Add BootstrapSegmentResponse container in the response for future extensibility. * Rename to BootstrapSegmentsInfo for internal consistency. * Remove unused log. * Use a member variable for broadcast segments instead of segmentAssigner. * Minor cleanup * Add test for loadable bootstrap segments and clarify comment. * Review suggestions. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-06-24 09:27:17 -07:00
Suneet Saldanha	4e0ea7823b	Update docs for K8s TaskRunner Dynamic Config (#16600 ) * Update docs for K8s TaskRunner Dynamic Config * touchups * code review * npe * oopsies	2024-06-21 06:01:59 -07:00
Akshat Jain	cd438b1918	Emit metrics for S3UploadThreadPool (#16616 ) * Emit metrics for S3UploadThreadPool * Address review comments * Revert unnecessary formatting change * Revert unnecessary formatting change in metrics.md file * Address review comments * Add metric for task duration * Minor fix in metrics.md * Add s3Key and uploadId in the log message * Address review comments * Create new instance of ServiceMetricEvent.Builder for thread safety * Address review comments * Address review comments	2024-06-21 11:36:47 +05:30
Andreas Maechler	ae70e18bc8	docs: Update Azure extension (#16585 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-06-20 09:31:29 -07:00
Jill Osborne	aec1d5ddd6	Link fix (#16596 ) * Link fix * Update docs/operations/auth.md Co-authored-by: Andreas Maechler <amaechler@gmail.com> --------- Co-authored-by: Andreas Maechler <amaechler@gmail.com>	2024-06-14 11:40:53 -07:00
317brian	e1926e2549	docs: fix redirect (#16548 ) * doc: cleanup unnecessary redirect (cherry picked from commit d86aaadbc78cc51345f768ee66c9a8b2cbf13f27) * restore redirect file entry. delete md file	2024-06-14 09:54:16 +08:00
Alberic Liu	ea2de517b2	Update the youtube link for druid presentations page (#16601 ) * Update the link to lambda architectures with Druid * update the youtube link	2024-06-14 09:47:46 +08:00
Victoria Lim	836cdb48a5	docs: Migration guide for MVDs to arrays (#16516 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-13 13:05:58 -07:00
George Shiqi Wu	d5a25a94b8	Docs: Clarify that all supervisors can support early handoff (#16588 )	2024-06-13 08:43:22 +05:30
YongGang	46dbc74053	Support Dynamic Peon Pod Template Selection in K8s extension (#16510 ) * initial commit * add Javadocs * refine JSON input config * more test and fix build * extract existing behavior as default strategy * change template mapping fallback * add docs * update doc * fix doc * address comments * define Matcher interface * fix test coverage * use lower case for endpoint path * update Json name * add more tests * refactoring Selector class	2024-06-12 15:27:10 -07:00
Andreas Maechler	fec48432d4	docs: Correct some outdated module names (#16584 ) * Fix module names * Better spacing * Some spacing * Suggestions from code review Thanks Abhishek. * More links * Roll-up time * Remove logs * More spelling	2024-06-11 14:17:40 -07:00
Andreas Maechler	24056b90b5	Bring back missing property in indexer documentation (#16582 ) * Bring back druid.peon.taskActionClient.retry.minWait * Update docs/configuration/index.md * Consistent italics Thanks Abhishek. * Update docs/configuration/index.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Consistent list style * Remove extra space --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-06-10 16:52:54 -07:00
Kashif Faraz	e4fdf1055b	Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578 ) Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0. Thus, a segment allocation request is processed immediately unless there are already some requests queued before this one. While in queue, a segment allocation request may get clubbed together with other similar requests into a batch to reduce load on the metadata store.	2024-06-10 20:07:23 +05:30
317brian	8e11adfc6f	docs: remove outdated druidversion var from a page (#16570 ) Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-10 15:30:36 +08:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Katya Macedo	7aecc09230	Docs: Remove circular link (#16553 )	2024-06-05 11:07:36 -07:00
Charles Smith	c100ae0ecc	Add a tutorial for LATEST_BY to get most recent data (#16515 ) Co-authored-by: Will Xu <2bethere@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-04 17:00:25 -07:00
Jill Osborne	8b5802d4cd	docs: add maxSubqueryBytes limit to migration guide landing page (#16547 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-04 12:52:06 -07:00
Amit	540d3e6af5	Added new use cases and description of the use case - 5/14/24 (#16451 ) Thanks for your contribution @amit-git-account * Added new use cases and description of the use case - 5/14/24 The use case listing is not changed in a long time. While speaking with users, I came across several other use cases not listed here in the index. So I added new use cases and also added description against the use cases. * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update spelling file * Update docs/design/index.md --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-06-04 09:47:49 -07:00
Charles Smith	8f78c901e7	docs: add lookups to the sidebar (#16530 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-03 16:04:15 -07:00
Charles Smith	b1568fb95b	docs: Adds a redirect for flatten-json which was removed (#16263 )	2024-05-31 16:16:12 -07:00
Katya Macedo	f70ef1f434	Update front coding text (#16491 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-31 15:13:10 -07:00
Katya Macedo	92e660dd21	Add Druid 30.0.0 upgrade notes (#16522 )	2024-05-31 13:23:22 -07:00
Atul Mohan	b53d75758f	IcebergInputSource : Add option to toggle case sensitivity while reading columns from iceberg catalog (#16496 ) * Toggle case sensitivity while reading columns from iceberg * Fix tests * Drop case check and set unconditionally	2024-05-31 10:18:52 -07:00
George Shiqi Wu	0936798122	Add limit to task payload size (#16512 ) * Add limit to task payload size * Change to a warning * Remove test * Fix unit tests * Optionally throw alert * PR comments * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * PR comments * Reject large payloads * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-05-31 09:17:36 -07:00
Jill Osborne	3c72ec8413	docs: Migration guide for subquery limit (#16519 ) Adds a migration guide for Druid 30 to help users understand the new byte-based subquery limit property maxSubqueryBytes	2024-05-31 09:26:07 +05:30
Charles Smith	92e565e3b8	Adds a migration guide overview page to the release-info section (#16506 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Katya Macedo <katya.macedo@imply.io>	2024-05-30 09:50:30 -07:00
Adithya Chakilam	a9044ac235	Add cgroup cpu/mem/disk usage metrics (#16472 ) * Add cgroup cpu/mem usage metrics * checks * comments * docs fix * add disk metrics * fapi check * checkstyle * issues * spelling * change asserts * checks * use proc builder instead of runtime * specify charset * spotbug	2024-05-29 12:44:37 -07:00
George Shiqi Wu	b3b62ac431	Update azure input source docs (#16508 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-05-29 10:00:46 -07:00
Vadim Ogievetsky	10ea88e5bf	Web console: more robust durable storage setting detection (#16493 ) * more robust durable storage setting * add test	2024-05-22 15:47:20 -07:00
Vadim Ogievetsky	a124c6cbbd	fix typo in extension name (#16466 )	2024-05-20 09:47:22 +08:00
George Shiqi Wu	ed9881df88	Cleanup logic from handoff API (#16457 ) * Cleanup logic from handoff API * Fix test * Fix checkstyle * Update docs	2024-05-16 08:42:44 -07:00
Gian Merlino	72432c2e78	Speed up SQL IN using SCALAR_IN_ARRAY. (#16388 ) * Speed up SQL IN using SCALAR_IN_ARRAY. Main changes: 1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of the IN is above inFunctionThreshold. The default value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular expression, when the size of the SEARCH is above inFunctionExprThreshold. The default value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is greater than inFunctionThreshold. * Revert test. * Additional coverage. * Update docs/querying/sql-query-context.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * New test. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-05-14 08:09:27 -07:00
George Shiqi Wu	c1bf4fed90	API for stopping streaming tasks early (#16310 ) * Try stopping task early * Fix checkstyle * Add unit test * Add a couple more tests * PR changes * Use notice * fix checkstyle * PR changes * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Change payload * Remove quotes --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-05-14 06:39:50 -07:00
Alberic Liu	811dcd1726	update protobuf.md (#16434 )	2024-05-11 17:52:54 +08:00
Charles Smith	2d0b4e5f1e	Update sidebar to organize tutorials + other minor improvements (#16184 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-09 08:57:43 -07:00
Adarsh Sanjeev	269e035e76	Add validation for reindex with realtime sources (#16390 ) Add validation for reindex with realtime sources. With the addition of concurrent compaction, it is possible to ingest data while querying from realtime sources with MSQ into the same datasource. This could potentially lead to issues if the interval that is ingested into is replaced by an MSQ job, which has queried only some of the data from the realtime task. This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.	2024-05-07 10:32:15 +05:30
Misha	b5958b6b07	Feature configurable calcite bloat (#16248 ) * Configurable bloat for calcite ProjectMergeRule implemented * Comment added * Default bloat value increased to 1000 * Implemented bloat configuration from QueryContext * Code refactored, docs updated --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-05-06 20:43:39 +05:30
Alberic Liu	92fb0ff718	upgrade mysql:mysql-connector-java to 8.2.0 (#16024 ) * upgrade mysql:mysql-connector-java to 8.2.0 * fix the check errors * remove unused comment	2024-05-06 21:58:37 +08:00
Rishabh Singh	c61c3785a0	Followup changes to 15817 (Segment schema publishing and polling) (#16368 ) * Fix build * Nit changes in KillUnreferencedSegmentSchema * Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache * nitpicks * Remove reference to smq abbreviation from integration-tests * Remove reference to smq abbreviation from integration-tests * minor change * Update index.md * Add delimiter while computing schema fingerprint hash	2024-05-03 19:13:52 +05:30
Kashif Faraz	51104e8bb3	Docs: Remove references to Zk-based segment loading (#16360 ) Follow up to #15705 Changes: - Remove references to ZK-based segment loading in the docs - Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`	2024-05-01 08:06:00 +05:30
Abhishek Radhakrishnan	1d7595f3f7	Support for filters in the Druid Delta Lake connector (#16288 ) * Delta Lake support for filters. * Updates * cleanup comments * Docs * Remmove Enclosed runner * Rename * Cleanup test * Serde test for the Delta input source and fix jackson annotation. * Updates and docs. * Update error messages to be clearer * Fixes * Handle NumberFormatException to provide a nicer error message. * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Doc fixes based on feedback * Yes -> yes in docs; reword slightly. * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Documentation, javadoc and more updates. * Not with an or expression end-to-end test. * Break up =, >, >=, <, <= into its own types instead of sub-classing. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2024-04-29 11:31:36 -07:00
Adithya Chakilam	f8015eb02a	Add config lagAggregate to LagBasedAutoScalerConfig (#16334 ) Changes: - Add new config `lagAggregate` to `LagBasedAutoScalerConfig` - Add field `aggregateForScaling` to `LagStats` - Use the new field/config to determine which aggregate to use to compute lag - Remove method `Supervisor.computeLagForAutoScaler()`	2024-04-29 22:20:41 +05:30
Gian Merlino	db82adcdfd	SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. (#16311 ) * Four changes to scalar_in_array as follow-ups to #16306: 1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly. * Fix spelling. * Adjustments from review.	2024-04-26 16:01:17 -07:00
Atul Mohan	77333e56fa	Docs: Add missing kafka emitter config (#16332 )	2024-04-25 10:37:14 +05:30
Katya Macedo	ceb6646dec	Add supervisor actions (#16276 ) * Add supervisor actions * Update text * Update text * Update after review * Update after review	2024-04-24 13:14:01 -07:00
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Charles Smith	65412f80ab	remove additional column marks (#16319 )	2024-04-22 19:41:54 -07:00
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Gian Merlino	4285a5e2c6	Update documentation for exceptions to subquery limit. (#16295 ) The true exception for groupBy is somewhat more narrow than the docs suggest.	2024-04-17 21:04:43 -07:00
Hardik Bajaj	0bf5e7745d	Add configurable parameters for statsd client (#16283 ) Statsd client sometimes drops metrics when this queueSize of statsd client with max unprocessed messages is completely full. This causes some high cardinality metrics like per partition lag being droppped. There are multiple parameters of statsdclient that can be initialized and can help increase the load/capacity of client to not to drop metrics more frequently. Properties like queueSize, poolSize, processorWorkers and senderWorkers will now be configurable at runtime	2024-04-17 18:35:31 +05:30
Nikhil Rao	a805c5612e	Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277 ) * Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page * Updates intervals in Native Query to remove excess Time part in timestamp * Moves Druid SQL section above Native query because sql used more often by users * removes old Druid SQL sections * Adds TopN Druid SQL query using ORDER BY and LIMIT * Adds table for Druid SQL variance and standard deviation functions * Update docs/development/extensions-core/stats.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-15 08:05:34 -07:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Abhishek Radhakrishnan	041d0bff5e	Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247 ) The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior. The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots. * Remove stale comment and inline canDutyRun() * druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set. - Remove the default P1D value for druid.coordinator.kill.period. Instead default druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set to if the former config isn't specified. - If druid.coordinator.kill.period is set, the value will take precedence over druid.coordinator.period.indexingPeriod * Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java * Fix checkstyle error * Clarify comment * Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java * Put back canDutyRun() * Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots) * Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS * Remove unused test method. * Update default value of killTaskSlotsRatio in docs and web-console default mock * Move initDuty() after params and config setup.	2024-04-14 18:56:17 -07:00
Katya Macedo	7f06a53cb1	[Docs] Fix API placeholder formatting (#16240 )	2024-04-12 09:19:13 -07:00
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Pranav	fc2600b8e2	Adding jvmVersion dimension in JVM Monitor (#16262 )	2024-04-11 15:44:56 -07:00
317brian	df9e1bb97b	Docs: Fix typo in tutorial (#16254 )	2024-04-10 08:59:52 +05:30
Katya Macedo	cd69f145b7	docs: Add upgrade notes for Druid 29.0.1 (#16123 )	2024-04-09 13:56:57 -07:00
Vishesh Garg	3d595cfab1	Add storeCompactionState flag support to msq (#15965 ) Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted for future cases such as REPLACE and compaction done via MSQ. Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored with flag "storeCompactionState": true in the query context.	2024-04-09 16:47:47 +05:30
Vishesh Garg	9a4fb58543	Record column name for exceptions while writing frames in RowBasedFrameWriter (#16130 ) Current Runtime Exceptions generated while writing frames only include the exception itself without including the name of the column they were encountered in. This patch introduces the further information in the error and makes it non-retryable.	2024-04-09 15:39:10 +05:30
Adarsh Sanjeev	e2e0cb905c	Add reasoning for choosing shardSpec to the MSQ report (#16175 ) This PR logs the segment type and reason chosen. It also adds it to the query report, to be displayed in the UI. This PR adds a new section to the reports, segmentReport. This contains the segment type created, if the query is an ingestion, and null otherwise.	2024-04-09 11:32:02 +05:30
Parag Jain	f55c9e58a8	add google as external storage for msq export (#16051 ) Support for exporting msq results to gcs bucket. This is essentially copying the logic of s3 export for gs, originally done by @adarshsanjeev in this PR - #15689	2024-04-05 12:10:10 +05:30
Sergio Ferragut	64433eb2ff	Update kubernetes-overloard-extension extension name in docs (#16239 )	2024-04-04 14:38:28 -07:00
Zoltan Haindrich	1df41db46d	Migrate to use docker compose v2 (#16232 ) https://github.com/actions/runner-images/issues/9557	2024-04-03 12:32:55 +02:00
Charles Smith	1aa6808b9a	docs: add tutorial with examples of sql null handling (#16185 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-04-01 11:03:42 -07:00
Pranav	20de7fd95a	Geo spatial interfaces (#16029 ) This PR creates an interface for ImmutableRTree and moved the existing implementation to new class which represent 32 bit implementation (stores coordinate as floats). This PR makes the ImmutableRTree extendable to create higher precision implementation as well (64 bit). In all spatial bound filters, we accept float as input which might not be accurate in the case of high precision implementation of ImmutableRTree. This PR changed the bound filters to accepts the query bounds as double instead of float and it is backward compatible change as it compares double to existing float values in RTree. Previously it was comparing input float to RTree floats which can cause precision loss, now it is little better as it compares double to float which is still not 100% accurate. There are no changes in the way that we query spatial dimension today except input bound parsing. There is little improvement in string filter predicate which now parse double strings instead of float and compares double to double which is 100% accurate but string predicate is only called when we dont have spatial index. With allowing the interface to extend ImmutableRTree, we allow to create high precision (HP) implementation and defines new search strategies to perform HP search Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound); With possible HP implementations, Radius bound filter can not really focus on accuracy, it is calculating Euclidean distance in comparing. As EARTH 🌍 is round and not flat, Euclidean distances are not accurate in geo system. This PR adds new param called 'radiusUnit' which allows you to specify units like meters, km, miles etc. It uses https://en.wikipedia.org/wiki/Haversine_formula to check if given geo point falls inside circle or not. Added a test that generates set of points inside and outside in RadiusBoundTest.	2024-04-01 14:58:03 +05:30
Adithya Chakilam	463010bb29	Populate segment stats for non-parallel compaction jobs (#16171 ) * Populate segment stats for non-parallel compaction jobs * fix * add-tests * comments * update-test * comments	2024-03-29 09:40:55 -04:00
Soumyava	524842a3bb	Window function on msq (#15470 ) This PR aims to introduce Window functions on MSQ by doing the following: Introduce a Window querykit for handling window queries along with its factory and a processor for window queries If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage In presence of empty OVER() clause lets all operators loose on a single rac In presence of no empty OVER() clause, breaks down each window into individual stages Associated machinery to handle window functions in MSQ Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators Guardrails around materialization Comprehensive UTs	2024-03-28 14:58:34 +05:30
Adithya Chakilam	a65b2d4f41	Visibility into LagBased AutoScaler desired task count (#16199 ) * Visibility into skipped scale notices * comments * change to emit always instead of just skips * fix failing test * comments * Add couple more tests	2024-03-27 13:08:00 -04:00
Rushikesh Bankar	3d8b0ffae8	Add indexer level task metrics to provide more visibility in the task distribution (#15991 ) Changes: Add the following indexer level task metrics: - `worker/task/running/count` - `worker/task/assigned/count` - `worker/task/completed/count` These metrics will provide more visibility into the tasks distribution across indexers (We often see a task skew issue across indexers and with this issue it would be easier to catch the imbalance)	2024-03-21 11:08:01 +05:30
Abhishek Radhakrishnan	fa8e511492	Add versions to `markUsed` and `markUnused` APIs (#16141 ) * Mark used and unused APIs by versions. * remove the conditional invocations. * isValid() and test updates. * isValid() and tests. * Remove warning logs for invalid user requests. Also, downgrade visibility. * Update resp message, etc. * tests and some cleanup. * Docs draft * Clarify docs * Update server/src/main/java/org/apache/druid/server/http/DataSourcesResource.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Review comments * Remove default interface methods only used in tests and update docs. * Clarify javadocs and @Nullable. * Add more tests. * Parameterized versions. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-03-19 09:22:25 -07:00
Katya Macedo	da6158c166	[Docs] Improve the "Update existing data" tutorial (#16081 ) * Modify update data tutorial * Update after review * Add append topic * Update after review * Add upsert to spelling	2024-03-14 16:31:33 -07:00
Gian Merlino	256160aba6	MSQ: Validate that strings and string arrays are not mixed. (#15920 ) * MSQ: Validate that strings and string arrays are not mixed. When multi-value strings and string arrays coexist in the same column, it causes problems with "classic MVD" style queries such as: select * from wikipedia -- fails at runtime select count() from wikipedia where flags = 'B' -- fails at planning time select flags, count() from wikipedia group by 1 -- fails at runtime To avoid these problems, this patch adds type verification for INSERT and REPLACE. It is targeted: the only type changes that are blocked are string-to-array and array-to-string. There is also a way to exclude certain columns from the type checks, if the user really knows what they're doing. * Fixes. * Tests and docs and error messages. * More docs. * Adjustments. * Adjust message. * Fix tests. * Fix test in DV mode.	2024-03-13 15:37:27 -07:00
317brian	03c191f701	docs: clarify description of uri/uriprefix (#16110 ) * docs: clarify description of uri/uripath * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-03-13 11:52:01 -07:00
Abhishek Radhakrishnan	fb7bb0953d	Kill segments by versions (#15994 ) * Kill task version support. Kill tasks by default kill all versions of unused segments in the specified interval. Users wanting to delete specific versions (for example, data compliance reasons) and keep rest of the versions can specify the optional version in the kill task payload. * Formatting changes. * Multi version tests in RetrieveSegmentsActionsTest Sort of like method-level parameterized tests. * Address review feedback * Accept a list of versions instead of a single version. Support multiple versions. * Tests for multiple versions. * Update docs * Cleanup * Address review comments. Retain the old interface method and make it default and route it to the method with nullable versions variant. Update usages to use the default method where versions doesn't matter. * Remove versions from retreive used segments action. * Some updates. * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * /s/actual/observed/g * minor test cleanup * WIP: Test fixes and updates. Also add test for kill by version with used load spec. Checkpoint. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-03-13 09:37:30 +05:30

1 2 3 4 5 ...

3244 Commits