druid

Commit Graph

Author	SHA1	Message	Date
Sree Charan Manamala	1f6d2c41d2	Update doc for dynamic parameters supporting array (#16660 ) Update dynamic parameter docs to describe how they can be used to replace an array	2024-08-07 12:33:37 +05:30
Edgar Melendrez	83cf4dc554	[docs] fixes to sql-scalar.md (#16826 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-08-06 17:12:57 -07:00
zachjsh	c324f09108	Kinesis input format docs (#16840 ) * SQL syntax error should target USER persona * * revert change to queryHandler and related tests, based on review comments * * add test * Docs for Kinesis input format * * remove reference to kafka * * fix spellcheck error * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-08-06 18:53:10 -04:00
Edgar Melendrez	ebea34a814	[Docs] Batch06: starting string functions (#16838 ) * batch06, starting string functions * adding space after Syntax * quick change * correcting spelling * Update docs/querying/sql-functions.md * Update sql-functions.md * applying suggestions * Update docs/querying/sql-functions.md * Update docs/querying/sql-functions.md --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-08-06 11:32:26 -07:00
Kashif Faraz	aa49be61ea	Do not create ZK paths if not needed (#16816 ) Background: ZK-based segment loading has been completely disabled in #15705 . ZK `servedSegmentsPath` has been deprecated since Druid 0.7.1, #1182 . This legacy path has been replaced by the `liveSegmentsPath` and is not used in the code anymore. Changes: - Never create ZK loadQueuePath as it is never used. - Never create ZK servedSegmentsPath as it is never used. - Do not create ZK liveSegmentsPath if announcement on ZK is disabled - Fix up tests	2024-08-06 19:29:13 +05:30
Rushikesh Bankar	c8323d1a7c	Add indexer task success and failure metrics (#16829 ) This PR adds indexer-level task metrics- "indexer/task/failed/count" "indexer/task/success/count" the current "worker/task/completed/count" metric shows all the tasks completed irrespective of success or failure status so these metrics would help us get more visibility into the status of the completed tasks	2024-08-05 16:21:27 +05:30
Laksh Singla	0411c4e67e	Add metrics for number of rows/bytes materialized while running subqueries (#16835 ) subquery/rows and subquery/bytes metrics have been added, which indicate the size of the results materialized on the heap.	2024-08-05 14:13:20 +05:30
Kashif Faraz	9dc2569f22	Track and emit segment loading rate for HttpLoadQueuePeon on Coordinator (#16691 ) Design: The loading rate is computed as a moving average of at least the last 10 GiB of successful segment loads. To account for multiple loading threads on a server, we use the concept of a batch to track load times. A batch is a set of segments added by the coordinator to the load queue of a server in one go. Computation: batchDurationMillis = t(load queue becomes empty) - t(first load request in batch is sent to server) batchBytes = total bytes successfully loaded in batch avg loading rate in batch (kbps) = (8 * batchBytes) / batchDurationMillis overall avg loading rate (kbps) = (8 * sumOverWindow(batchBytes)) / sumOverWindow(batchDurationMillis) Changes: - Add `LoadingRateTracker` which computes a moving average load rate based on the last few GBs of successful segment loads. - Emit metric `segment/loading/rateKbps` from the Coordinator. In the future, we may also consider emitting this metric from the historicals themselves. - Add `expectedLoadTimeMillis` to response of API `/druid/coordinator/v1/loadQueue?simple`	2024-08-03 13:14:21 +05:30
Akshat Jain	bb4d6cc001	Add task report fields in response of SQL statements endpoint (#16808 ) If the optional query parameter detail is supplied, then the response also includes the following: * A stages object that summarizes information about the different stages being used for query execution, such as stage number, phase, start time, duration, input and output information, processing methods, and partitioning. * A counters object that provides details on the rows, bytes, and files processed at various stages for each worker across different channels, along with sort progress. * A warnings object that provides details about any warnings.	2024-08-01 10:26:04 +05:30
Edgar Melendrez	3bb6d40285	[docs] batch 5 updating functions (#16812 ) * batch 5 * Update docs/querying/sql-functions.md * applying suggestions --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-30 17:30:01 -07:00
Edgar Melendrez	85a8a1d805	[Docs]Batch04 - Bitwise numeric functions (#16805 ) * Batch04 - Bitwise numeric functions * Batch04 - Bitwise numeric functions * minor fixes * rewording bitwise_shift functions * rewording bitwise_shift functions * Update docs/querying/sql-functions.md * applying suggestions --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-30 10:53:59 -07:00
Edgar Melendrez	c07aeedbec	[docs] Updating Rollup tutorial (#16762 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-07-26 15:43:31 -07:00
Edgar Melendrez	028ee23a1e	[Docs] batch 03 - trig functions (#16795 ) * batch 03 - trig functions * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * applying suggestions and corrections --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-07-26 13:11:17 -07:00
Charles Smith	ed48cb82e9	[Docs] Remove avro_ocf support from Kafka & Kinesis streaming sources (Revert changes from #11865 ) (#16807 )	2024-07-26 13:06:22 -07:00
Clint Wylie	5da69a01cb	change arrayIngestMode default to array (#16789 ) * change arrayIngestMode default to array * remove arrayIngestMode flag option none * fix space * fix test	2024-07-25 15:09:40 +08:00
Zoltan Haindrich	7e3fab5bf9	Make WindowFrames more specific (#16741 ) Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods	2024-07-25 04:57:36 +02:00
Edgar Melendrez	ca787885c9	[docs] batch02 of updating functions (#16761 ) * applying changes * ensuring batch is updated * Update docs/querying/sql-functions.md * raise -> raises * addressing review * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-07-24 15:28:57 -07:00
317brian	704962ec8e	doc: minor fixes to migration guides (#16784 )	2024-07-23 13:09:51 -07:00
Edgar Melendrez	934c10b1cd	docs: Adding admonition box to warn about MVD (#16712 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-22 17:32:23 -07:00
Clint Wylie	02b8738c00	remove batchProcessingMode from task config, remove AppenderatorImpl (#16765 ) changes: * removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS` which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed so `CLOSED_SEGMENTS` was the default (using `AppenderatorImpl`), however MSQ has been exclusively using `BatchAppenderator` with no problems so it seems safe to just roll it out as the only option for batch ingestion everywhere. * with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed * simplify `Appenderator` construction since there are only separate stream and batch versions now * simplify tests since `batchProcessingMode` is gone	2024-07-22 13:56:44 -07:00
Clint Wylie	a34a06e192	remove Firehose and FirehoseFactory (#16758 ) changes: * removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602 * Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader` * Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes * Moved anything remaining in a 'firehose' package somewhere else * Clean up docs on firehose stuff	2024-07-19 14:37:21 -07:00
Charles Smith	1881880714	[Docs] Adds a migration guide SQL compatible null handling (#16704 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-07-19 09:25:05 -07:00
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Edgar Melendrez	721a65046f	docs: add examples for SQL functions (#16745 ) * updating first batch of numeric functions * First batch of functions * addressing first few comments * alphabetize list * draft with suggestions applied * minor discrepancy expr -> <NUMERIC> * changed raises to calculates * Update docs/querying/sql-functions.md * switch to underscore * changed to exp(1) to match slack message * adding html text for trademark symbol to .spelling * fixed discrepancy between description and example --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 17:06:22 -07:00
Kashif Faraz	9f6ce6ddc0	Remove task action audit logging and druid_taskLog metadata table (#16309 ) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368. As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.	2024-07-17 17:09:00 +05:30
Vadim Ogievetsky	307b8849de	Web console: better sql data loader reset (#16696 ) * better sql data loader reset * snapshot * fix destination pane sizing * clean doc links * update doc links * more doc links * extract getClusterCapacity * update snapshots * allow submit suspended * some renaming * diff with current * Do delta	2024-07-11 14:45:04 -07:00
YongGang	4b293fc2a9	Docs: Fix k8s dynamic config URL (#16720 )	2024-07-11 10:05:47 +05:30
Lars Francke	586c713d12	Updates build documentation to not mention explicit Java version as it was out of sync with the dedicated Java page. (#16674 ) This means there is one less place to keep information in sync.	2024-07-03 20:53:15 +05:30
317brian	d65e015c94	docs: nit for link format (#16687 )	2024-07-02 16:45:09 -07:00
Victoria Lim	adde024e11	docs: Subtitle updates in migration guide overview (#16683 )	2024-07-02 12:56:05 -07:00
Jill Osborne	bd49ecfd29	Addition to subquery limit migration guide (#16671 ) Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-07-01 14:22:47 -07:00
Hugh Evans	920d9020c0	Docs: Fix default value for globalIngestionHeapLimitBytes (#16654 ) Use the new default value added in #8255	2024-06-27 07:01:56 +05:30
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Andreas Maechler	ab76d851ad	Update docs contribution with correct script (#16581 ) * Spacing * Fix ordering * npm run start	2024-06-26 10:30:52 -07:00
Laksh Singla	71b3b5ab5d	Add query context parameter to remove null bytes when writing frames (#16579 ) MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.	2024-06-26 15:00:30 +05:30
Edgar Melendrez	b43f4063c5	Docs: update link and title of quickstart (#16638 ) * update link and title * Discard changes to website/package.json * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-06-25 09:07:00 -07:00
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se— just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates migration of Druid to ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor as far as I am aware were they used for any purpose. This patch removes both task types completely, as well as removes all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to be firmer that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
317brian	2131917f16	docs: added front-coded dictionaries to upgrade notes (#16647 ) * docs: add front-coded dictionaries to upgrade notes * add it to release notes template	2024-06-24 10:52:26 -07:00
Abhishek Radhakrishnan	7463589b07	Support for bootstrap segments (#16609 ) * Initial support for bootstrap segments. - Adds a new API in the coordinator. - All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it. - Then load the segments onto the segment cache as part of startup. - This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting. This patch also lays the foundation to speed up upgrades. * Fail open by default if there are any errors talking to the coordinator. * Add test for failure scenario and cleanup logs. * Cleanup and add debug log * Assert the events so we know the list exactly. * Revert RunRules test. The rules aren't evaluated if there are no clusters. * Revert RunRulesTest too. * Remove debug info. * Make the API POST and update log. * Fix up UTs. * Throw 503 from MetadataResource; clean up exception handling and DruidException. * Remove unused logger, add verification of metrics and docs. * Update error message * Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Adjust test metric expectations with the rename. * Add BootstrapSegmentResponse container in the response for future extensibility. * Rename to BootstrapSegmentsInfo for internal consistency. * Remove unused log. * Use a member variable for broadcast segments instead of segmentAssigner. * Minor cleanup * Add test for loadable bootstrap segments and clarify comment. * Review suggestions. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-06-24 09:27:17 -07:00
Suneet Saldanha	4e0ea7823b	Update docs for K8s TaskRunner Dynamic Config (#16600 ) * Update docs for K8s TaskRunner Dynamic Config * touchups * code review * npe * oopsies	2024-06-21 06:01:59 -07:00
Akshat Jain	cd438b1918	Emit metrics for S3UploadThreadPool (#16616 ) * Emit metrics for S3UploadThreadPool * Address review comments * Revert unnecessary formatting change * Revert unnecessary formatting change in metrics.md file * Address review comments * Add metric for task duration * Minor fix in metrics.md * Add s3Key and uploadId in the log message * Address review comments * Create new instance of ServiceMetricEvent.Builder for thread safety * Address review comments * Address review comments	2024-06-21 11:36:47 +05:30
Andreas Maechler	ae70e18bc8	docs: Update Azure extension (#16585 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-06-20 09:31:29 -07:00
Jill Osborne	aec1d5ddd6	Link fix (#16596 ) * Link fix * Update docs/operations/auth.md Co-authored-by: Andreas Maechler <amaechler@gmail.com> --------- Co-authored-by: Andreas Maechler <amaechler@gmail.com>	2024-06-14 11:40:53 -07:00
317brian	e1926e2549	docs: fix redirect (#16548 ) * doc: cleanup unnecessary redirect (cherry picked from commit d86aaadbc78cc51345f768ee66c9a8b2cbf13f27) * restore redirect file entry. delete md file	2024-06-14 09:54:16 +08:00
Alberic Liu	ea2de517b2	Update the youtube link for druid presentations page (#16601 ) * Update the link to lambda architectures with Druid * update the youtube link	2024-06-14 09:47:46 +08:00
Victoria Lim	836cdb48a5	docs: Migration guide for MVDs to arrays (#16516 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-13 13:05:58 -07:00
George Shiqi Wu	d5a25a94b8	Docs: Clarify that all supervisors can support early handoff (#16588 )	2024-06-13 08:43:22 +05:30
YongGang	46dbc74053	Support Dynamic Peon Pod Template Selection in K8s extension (#16510 ) * initial commit * add Javadocs * refine JSON input config * more test and fix build * extract existing behavior as default strategy * change template mapping fallback * add docs * update doc * fix doc * address comments * define Matcher interface * fix test coverage * use lower case for endpoint path * update Json name * add more tests * refactoring Selector class	2024-06-12 15:27:10 -07:00
Andreas Maechler	fec48432d4	docs: Correct some outdated module names (#16584 ) * Fix module names * Better spacing * Some spacing * Suggestions from code review Thanks Abhishek. * More links * Roll-up time * Remove logs * More spelling	2024-06-11 14:17:40 -07:00
Andreas Maechler	24056b90b5	Bring back missing property in indexer documentation (#16582 ) * Bring back druid.peon.taskActionClient.retry.minWait * Update docs/configuration/index.md * Consistent italics Thanks Abhishek. * Update docs/configuration/index.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Consistent list style * Remove extra space --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-06-10 16:52:54 -07:00
Kashif Faraz	e4fdf1055b	Update default value of `druid.indexer.tasklock.batchAllocationWaitTime` to zero (#16578 ) Update default value of druid.indexer.tasklock.batchAllocationWaitTime to 0. Thus, a segment allocation request is processed immediately unless there are already some requests queued before this one. While in queue, a segment allocation request may get clubbed together with other similar requests into a batch to reduce load on the metadata store.	2024-06-10 20:07:23 +05:30
317brian	8e11adfc6f	docs: remove outdated druidversion var from a page (#16570 ) Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-10 15:30:36 +08:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Katya Macedo	7aecc09230	Docs: Remove circular link (#16553 )	2024-06-05 11:07:36 -07:00
Charles Smith	c100ae0ecc	Add a tutorial for LATEST_BY to get most recent data (#16515 ) Co-authored-by: Will Xu <2bethere@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-04 17:00:25 -07:00
Jill Osborne	8b5802d4cd	docs: add maxSubqueryBytes limit to migration guide landing page (#16547 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-04 12:52:06 -07:00
Amit	540d3e6af5	Added new use cases and description of the use case - 5/14/24 (#16451 ) Thanks for your contribution @amit-git-account * Added new use cases and description of the use case - 5/14/24 The use case listing is not changed in a long time. While speaking with users, I came across several other use cases not listed here in the index. So I added new use cases and also added description against the use cases. * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update spelling file * Update docs/design/index.md --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-06-04 09:47:49 -07:00
Charles Smith	8f78c901e7	docs: add lookups to the sidebar (#16530 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-03 16:04:15 -07:00
Charles Smith	b1568fb95b	docs: Adds a redirect for flatten-json which was removed (#16263 )	2024-05-31 16:16:12 -07:00
Katya Macedo	f70ef1f434	Update front coding text (#16491 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-31 15:13:10 -07:00
Katya Macedo	92e660dd21	Add Druid 30.0.0 upgrade notes (#16522 )	2024-05-31 13:23:22 -07:00
Atul Mohan	b53d75758f	IcebergInputSource : Add option to toggle case sensitivity while reading columns from iceberg catalog (#16496 ) * Toggle case sensitivity while reading columns from iceberg * Fix tests * Drop case check and set unconditionally	2024-05-31 10:18:52 -07:00
George Shiqi Wu	0936798122	Add limit to task payload size (#16512 ) * Add limit to task payload size * Change to a warning * Remove test * Fix unit tests * Optionally throw alert * PR comments * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * PR comments * Reject large payloads * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-05-31 09:17:36 -07:00
Jill Osborne	3c72ec8413	docs: Migration guide for subquery limit (#16519 ) Adds a migration guide for Druid 30 to help users understand the new byte-based subquery limit property maxSubqueryBytes	2024-05-31 09:26:07 +05:30
Charles Smith	92e565e3b8	Adds a migration guide overview page to the release-info section (#16506 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Katya Macedo <katya.macedo@imply.io>	2024-05-30 09:50:30 -07:00
Adithya Chakilam	a9044ac235	Add cgroup cpu/mem/disk usage metrics (#16472 ) * Add cgroup cpu/mem usage metrics * checks * comments * docs fix * add disk metrics * fapi check * checkstyle * issues * spelling * change asserts * checks * use proc builder instead of runtime * specify charset * spotbug	2024-05-29 12:44:37 -07:00
George Shiqi Wu	b3b62ac431	Update azure input source docs (#16508 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-05-29 10:00:46 -07:00
Vadim Ogievetsky	10ea88e5bf	Web console: more robust durable storage setting detection (#16493 ) * more robust durable storage setting * add test	2024-05-22 15:47:20 -07:00
Vadim Ogievetsky	a124c6cbbd	fix typo in extension name (#16466 )	2024-05-20 09:47:22 +08:00
George Shiqi Wu	ed9881df88	Cleanup logic from handoff API (#16457 ) * Cleanup logic from handoff API * Fix test * Fix checkstyle * Update docs	2024-05-16 08:42:44 -07:00
Gian Merlino	72432c2e78	Speed up SQL IN using SCALAR_IN_ARRAY. (#16388 ) * Speed up SQL IN using SCALAR_IN_ARRAY. Main changes: 1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of the IN is above inFunctionThreshold. The default value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular expression, when the size of the SEARCH is above inFunctionExprThreshold. The default value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is greater than inFunctionThreshold. * Revert test. * Additional coverage. * Update docs/querying/sql-query-context.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * New test. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-05-14 08:09:27 -07:00
George Shiqi Wu	c1bf4fed90	API for stopping streaming tasks early (#16310 ) * Try stopping task early * Fix checkstyle * Add unit test * Add a couple more tests * PR changes * Use notice * fix checkstyle * PR changes * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Change payload * Remove quotes --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-05-14 06:39:50 -07:00
Alberic Liu	811dcd1726	update protobuf.md (#16434 )	2024-05-11 17:52:54 +08:00
Charles Smith	2d0b4e5f1e	Update sidebar to organize tutorials + other minor improvements (#16184 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-05-09 08:57:43 -07:00
Adarsh Sanjeev	269e035e76	Add validation for reindex with realtime sources (#16390 ) Add validation for reindex with realtime sources. With the addition of concurrent compaction, it is possible to ingest data while querying from realtime sources with MSQ into the same datasource. This could potentially lead to issues if the interval that is ingested into is replaced by an MSQ job, which has queried only some of the data from the realtime task. This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.	2024-05-07 10:32:15 +05:30
Misha	b5958b6b07	Feature configurable calcite bloat (#16248 ) * Configurable bloat for calcite ProjectMergeRule implemented * Comment added * Default bloat value increased to 1000 * Implemented bloat configuration from QueryContext * Code refactored, docs updated --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-05-06 20:43:39 +05:30
Alberic Liu	92fb0ff718	upgrade mysql:mysql-connector-java to 8.2.0 (#16024 ) * upgrade mysql:mysql-connector-java to 8.2.0 * fix the check errors * remove unused comment	2024-05-06 21:58:37 +08:00
Rishabh Singh	c61c3785a0	Followup changes to 15817 (Segment schema publishing and polling) (#16368 ) * Fix build * Nit changes in KillUnreferencedSegmentSchema * Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache * nitpicks * Remove reference to smq abbreviation from integration-tests * Remove reference to smq abbreviation from integration-tests * minor change * Update index.md * Add delimiter while computing schema fingerprint hash	2024-05-03 19:13:52 +05:30
Kashif Faraz	51104e8bb3	Docs: Remove references to Zk-based segment loading (#16360 ) Follow up to #15705 Changes: - Remove references to ZK-based segment loading in the docs - Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`	2024-05-01 08:06:00 +05:30
Abhishek Radhakrishnan	1d7595f3f7	Support for filters in the Druid Delta Lake connector (#16288 ) * Delta Lake support for filters. * Updates * cleanup comments * Docs * Remove Enclosed runner * Rename * Cleanup test * Serde test for the Delta input source and fix jackson annotation. * Updates and docs. * Update error messages to be clearer * Fixes * Handle NumberFormatException to provide a nicer error message. * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Doc fixes based on feedback * Yes -> yes in docs; reword slightly. * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Update docs/ingestion/input-sources.md Co-authored-by: Laksh Singla <lakshsingla@gmail.com> * Documentation, javadoc and more updates. * Not with an or expression end-to-end test. * Break up =, >, >=, <, <= into its own types instead of sub-classing. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2024-04-29 11:31:36 -07:00
Adithya Chakilam	f8015eb02a	Add config lagAggregate to LagBasedAutoScalerConfig (#16334 ) Changes: - Add new config `lagAggregate` to `LagBasedAutoScalerConfig` - Add field `aggregateForScaling` to `LagStats` - Use the new field/config to determine which aggregate to use to compute lag - Remove method `Supervisor.computeLagForAutoScaler()`	2024-04-29 22:20:41 +05:30
Gian Merlino	db82adcdfd	SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. (#16311 ) * Four changes to scalar_in_array as follow-ups to #16306: 1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly. * Fix spelling. * Adjustments from review.	2024-04-26 16:01:17 -07:00
Atul Mohan	77333e56fa	Docs: Add missing kafka emitter config (#16332 )	2024-04-25 10:37:14 +05:30
Katya Macedo	ceb6646dec	Add supervisor actions (#16276 ) * Add supervisor actions * Update text * Update text * Update after review * Update after review	2024-04-24 13:14:01 -07:00
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Charles Smith	65412f80ab	remove additional column marks (#16319 )	2024-04-22 19:41:54 -07:00
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Gian Merlino	4285a5e2c6	Update documentation for exceptions to subquery limit. (#16295 ) The true exception for groupBy is somewhat more narrow than the docs suggest.	2024-04-17 21:04:43 -07:00
Hardik Bajaj	0bf5e7745d	Add configurable parameters for statsd client (#16283 ) Statsd client sometimes drops metrics when the queue of the statsd client, which holds at most queueSize unprocessed messages, is completely full. This causes some high cardinality metrics like per partition lag to be dropped. There are multiple parameters of statsdclient that can be initialized and can help increase the load/capacity of the client so that it drops metrics less frequently. Properties like queueSize, poolSize, processorWorkers and senderWorkers will now be configurable at runtime
Nikhil Rao	a805c5612e	Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277 ) * Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page * Updates intervals in Native Query to remove excess Time part in timestamp * Moves Druid SQL section above Native query because sql used more often by users * removes old Druid SQL sections * Adds TopN Druid SQL query using ORDER BY and LIMIT * Adds table for Druid SQL variance and standard deviation functions * Update docs/development/extensions-core/stats.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-15 08:05:34 -07:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Abhishek Radhakrishnan	041d0bff5e	Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247 ) The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior. The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots. * Remove stale comment and inline canDutyRun() * druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set. - Remove the default P1D value for druid.coordinator.kill.period. Instead default druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set to if the former config isn't specified. - If druid.coordinator.kill.period is set, the value will take precedence over druid.coordinator.period.indexingPeriod * Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java * Fix checkstyle error * Clarify comment * Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java * Put back canDutyRun() * Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots) * Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS * Remove unused test method. * Update default value of killTaskSlotsRatio in docs and web-console default mock * Move initDuty() after params and config setup.	2024-04-14 18:56:17 -07:00
Katya Macedo	7f06a53cb1	[Docs] Fix API placeholder formatting (#16240 )	2024-04-12 09:19:13 -07:00
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Pranav	fc2600b8e2	Adding jvmVersion dimension in JVM Monitor (#16262 )	2024-04-11 15:44:56 -07:00
317brian	df9e1bb97b	Docs: Fix typo in tutorial (#16254 )	2024-04-10 08:59:52 +05:30
Katya Macedo	cd69f145b7	docs: Add upgrade notes for Druid 29.0.1 (#16123 )	2024-04-09 13:56:57 -07:00
Vishesh Garg	3d595cfab1	Add storeCompactionState flag support to msq (#15965 ) Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted for future cases such as REPLACE and compaction done via MSQ. Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored with flag "storeCompactionState": true in the query context.	2024-04-09 16:47:47 +05:30
Vishesh Garg	9a4fb58543	Record column name for exceptions while writing frames in RowBasedFrameWriter (#16130 ) Current Runtime Exceptions generated while writing frames only include the exception itself without including the name of the column they were encountered in. This patch introduces the further information in the error and makes it non-retryable.	2024-04-09 15:39:10 +05:30
Adarsh Sanjeev	e2e0cb905c	Add reasoning for choosing shardSpec to the MSQ report (#16175 ) This PR logs the segment type and reason chosen. It also adds it to the query report, to be displayed in the UI. This PR adds a new section to the reports, segmentReport. This contains the segment type created, if the query is an ingestion, and null otherwise.	2024-04-09 11:32:02 +05:30

3258 Commits