druid

Commit Graph

Author	SHA1	Message	Date
zachjsh	720f1e834a	Add support for AzureDNSZone enabled storage accounts used for deep storage (#16016 ) * Add support for AzureDNSZone enabled storage accounts used for deep storage Added a new config to AzureAccountConfig `storageAccountEndpointSuffix` which allows the user to specify a storage account endpoint suffix where the underlying storage account is enabled for AzureDNSZone. The previous config `endpointSuffix`, did not allow support for such accounts. The previous config has been deprecated in favor of this new config. Also fixed an issue where `managedIdentityClientId` was not being set properly * * address review comments * * add back azure government link and docs	2024-03-04 16:13:28 -05:00
Sensor	4e9b758661	Support CPU resource configurable for Kubernates job under MoK Mode (#16008 ) * support CPU resource configurable for Kubernates job * update property doc * fix test name * refine doc format	2024-03-04 10:12:09 -05:00
317brian	3df161f73c	docs: update security doc for hashing (#15970 ) * docs: add mermaid diagram support * docs: update druid-basic-security doc to mention caching * Update docs/development/extensions-core/druid-basic-security.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-02-28 09:48:37 +08:00
George Shiqi Wu	59bb72a926	Fix parsing of env variables when properties have underscores (#15919 ) * Fix parsing of env variables when properties have underscores * Add documentation * Use a % sign instead	2024-02-21 13:18:21 -05:00
317brian	c98d54f3c4	docs: delete unused file that causes confusion (#15910 )	2024-02-14 16:42:02 -08:00
Katya Macedo	0f29ece6a9	[Docs] Refactor streaming ingestion section (#15591 ) Merging the work so far. @ektravel , @vogievetsky if there are additional improvements, let's track them & make another pr. * Refactor streaming ingestion docs * Update property definition * Update after review * Update known issues * Move kinesis and kafka topics to ingestion, add redirects * Saving changes * Saving * Add input format text * Update after review * Minor text edit * Update example syntax * Revert back to colon * Fix merge conflicts * Fix broken links * Fix spelling error	2024-02-12 13:52:42 -08:00
Tom	11a8624ef1	allow for kafka-emitter to have extra dimensions be set for each event it emits (#15845 ) * allow for kafka-emitter to have extra dimensions be set for each event it emits * fix checktsyle issue in kafkaemitterconfig * make changes to fix docs, and cleanup copy paste error in #toString() * undo formatting to markdown table * add more branches so test passes * fix checkstyle issue	2024-02-08 22:55:24 -08:00
Abhishek Radhakrishnan	1a5b57df84	Update `groupId` for delta-lake and iceberg extensions (#15843 ) * Update the group id to org.apache.druid.extensions.contrib for contrib exts. * Note iceberg and delta lake extensions in extensions.md * properties and shell backticks * Update groupId in distribution/pom.xml * remove delta-lake from dist. * Add note on downloading extension.	2024-02-07 23:54:06 -08:00
Pramod Immaneni	59bca0951a	Parallelize storage of incremental segments (#13982 ) During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period etc). In the case where there are a lot of dimension and metrics (1000+) it was observed that the creation/serialization of incremental segment file format for persistence and persisting the file took a while and it was blocking ingestion of new data. This affected the real-time ingestion. This serialization and persistence can be parallelized across the different time chunks. This update aims to do that. The patch adds a simple configuration parameter to the ingestion tuning configuration to specify number of persistence threads. The default value is 1 if it not specified which makes it the same as it is today.	2024-02-07 10:43:05 +05:30
Abhishek Radhakrishnan	9f95a691f7	Extension to read and ingest Delta Lake tables (#15755 ) * something * test commit * compilation fix * more compilation fixes (fixme placeholders) * Comment out druid-kereberos build since it conflicts with newly added transitive deps from delta-lake Will need to sort out the dependencies later. * checkpoint * remove snapshot schema since we can get schema from the row * iterator bug fix * json json json * sampler flow * empty impls for read(InputStats) and sample() * conversion? * conversion, without timestamp * Web console changes to show Delta Lake * Asset bug fix and tile load * Add missing pieces to input source info, etc. * fix stuff * Use a different delta lake asset * Delta lake extension dependencies * Cleanup * Add InputSource, module init and helper code to process delta files. * Test init * Checkpoint changes * Test resources and updates * some fixes * move to the correct package * More tests * Test cleanup * TODOs * Test updates * requirements and javadocs * Adjust dependencies * Update readme * Bump up version * fixup typo in deps * forbidden api and checkstyle checks * Trim down dependencies * new lines * Fixup Intellij inspections. * Add equals() and hashCode() * chain splits, intellij inspections * review comments and todo placeholder * fix up some docs * null table path and test dependencies. Fixup broken link. * run prettify * Different test; fixes * Upgrade pyspark and delta-spark to latest (3.5.0 and 3.0.0) and regenerate tests * yank the old test resource. * add a couple of sad path tests * Updates to readme based on latest. * Version support * Extract Delta DateTime converstions to DeltaTimeUtils class and add test * More comprehensive split tests. * Some test renames. * Cleanup and update instructions. * add pruneSchema() optimization for table scans. * Oops, missed the parquet files. * Update default table and rename schema constants. * Test setup and misc changes. * Add class loader logic as the context class loader is unaware about extension classes * change some table client creation logic. * Add hadoop-aws, hadoop-common and related exclusions. * Remove org.apache.hadoop:hadoop-common * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Add entry to .spelling to fix docs static check --------- Co-authored-by: abhishekagarwal87 <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-30 21:53:50 -08:00
Hiroshi Fukada	3fe3a65344	New: Add DDSketch in extensions-contrib (#15049 ) * New: Add DDSketch-Druid extension - Based off of http://www.vldb.org/pvldb/vol12/p2195-masson.pdf and uses the corresponding https://github.com/DataDog/sketches-java library - contains tests for post building and using aggregation/post aggregation. - New aggregator: `ddSketch` - New post aggregators: `quantileFromDDSketch` and `quantilesFromDDSketch` * Fixing easy CodeQL warnings/errors * Fixing docs, and dependencies Also moved aggregator ids to AggregatorUtil and PostAggregatorIds * Adding more Docs and better null/empty handling for aggregators * Fixing docs, and pom version * DDSketch documentation format and wording	2024-01-23 20:17:07 +05:30
zachjsh	9d4e8053a4	Kinesis adaptive memory management (#15360 ) ### Description Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings There is a problem with the logic that this pr fixes: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance. Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes. We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html ): This pr: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified Fixed issue that when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for queue to stop being full. If the queue remains full, we’ll end up right back waiting for it after the restarted fetch. There was also a call to `newQ::offer` without check in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. Now checking return value here, and failing if false. ### Release Note Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified	2024-01-19 14:30:21 -05:00
Ben Sykes	e49a7bb3cd	Add SpectatorHistogram extension (#15340 ) * Add SpectatorHistogram extension * Clarify documentation Cleanup comments * Use ColumnValueSelector directly so that we support being queried as a Number using longSum or doubleSum aggregators as well as a histogram. When queried as a Number, we're returning the count of entries in the histogram. * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Fix references * Fix spelling * Update docs/development/extensions-contrib/spectator-histogram.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-14 09:52:30 -08:00
Gian Merlino	cccf13ea82	Reverse, pull up lookups in the SQL planner. (#15626 ) * Reverse, pull up lookups in the SQL planner. Adds two new rules: 1) ReverseLookupRule, which eliminates calls to LOOKUP by doing reverse lookups. 2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above GROUP BY, when the lookup is injective. Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether these rules fire. Both are enabled by default. To minimize the chance of performance problems due to many keys mapping to the same value, ReverseLookupRule refrains from reversing a lookup if there are more keys than `inSubQueryThreshold`. The rationale for using this setting is that reversal works by generating an IN, and the `inSubQueryThreshold` describes the largest IN the user wants the planner to create. * Add additional line. * Style. * Remove commented-out lines. * Fix tests. * Add test. * Fix doc link. * Fix docs. * Add one more test. * Fix tests. * Logic, test updates. * - Make FilterDecomposeConcatRule more flexible. - Make CalciteRulesManager apply reduction rules til fixpoint. * Additional tests, simplify code.	2024-01-12 00:06:31 -08:00
Misha	ea6ba40ce1	Add support for Azure Goverment storage (#15523 ) Added support for Azure Government storage in Druid Azure-Extensions. This enhancement allows the Azure-Extensions to be compatible with different Azure storage types by updating the endpoint suffix from a hardcoded value to a configurable one.	2024-01-09 22:33:32 +05:30
Victoria Lim	52313c51ac	docs: Anchor link checker (#15624 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-08 15:19:05 -08:00
Charles Smith	d8830b64fc	add style for table formatting to docs contribution (#15612 ) Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-04 14:02:32 -08:00
George Shiqi Wu	8e95cea8e5	Azure client upgrade to allow identity options (#15287 ) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * Remove container client logic * Add some more testing * Update comments * Add a comment explaining client reuse * Move logic to factory class * use bom for dependency management * fix license versions --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-03 18:36:05 -05:00
Abhishek Radhakrishnan	f0f428274a	Prometheus config property doc fixup (#15613 ) * Minor fixes * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-01-02 16:28:42 -08:00
Bartosz Mikulski	4670a7650f	Optional removal of metrics from Prometheus PushGateway on shutdown (#14935 ) * Optional removal of metrics from Prometheus PushGateway on shutdown * Make pushGatewayDeleteOnShutdown property nullable * Add waitForShutdownDelay property * Fix unit test * Address PR comments * Address PR comments * Add explanation on why it is useful to have deletePushGatewayMetricsOnShutdown * Fix spelling error * Fix spelling error	2023-12-13 11:58:53 -05:00
Katya Macedo	355c800108	Revamp design page (#15486 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-12-08 11:40:24 -08:00
Jan Werner	f4856bc1c1	ranger-security: exclude jackson-jaxrs from + fix outdated documentation (#15481 ) * Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172 * remove the reference to outdated ranger 2.0 from the docs --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 08:24:37 -08:00
Jill Osborne	3fa856b3ff	Update Kinesis resharding doc (#15401 )	2023-11-20 15:40:59 -08:00
317brian	dfc52994d4	docs: fix code tabs (#15403 )	2023-11-20 11:16:10 -08:00
YongGang	3a3d37ef40	Fix for segment/count Metric Not Emitting with Statsd-emitter (#15347 ) * fix segment/count metric in Statsd-emitter * update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/statsd.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-11-10 08:08:58 -08:00
Charles Smith	e7d0429f5b	docs: suggest metadata store with instant ADD COLUMN semantics (#15334 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-09 12:56:30 -08:00
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806 )	2023-11-07 10:07:28 -08:00
Charles Smith	de557a62ad	Suggest adoption of Google Style guide (#14905 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-01 13:31:03 -07:00
YongGang	7a25ee4fd9	Ability to send task types to k8s or worker task runner (#15196 ) * Ability to send task types to k8s or worker task runner * add more tests * use runnerStrategy to determine task runner * minor refine * refine runner strategy config * move workerType config to upper level * validate config when application start	2023-10-25 09:55:56 -07:00
Pranav	c7d0615af3	Fix the build for #15013.: Lookup jitter upstream build fix (#15103 ) Fix the build for #15013.	2023-10-09 09:35:39 +05:30
George Shiqi Wu	f773d83914	Mixed task runner for migration to mm-less ingestion (#14918 ) * save work * Working * Fix runner constructor * Working runner * extra log lines * try using lifecycle for everything * clean up configs * cleanup /workers call * Use a single config * Allow selecting runner * debug changes * Work on composite task runner * Unit tests running * Add documentation * Add some javadocs * Fix spelling * Use standard libraries * code review * fix * fix * use taskRunner as string * checkstyl --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-09-11 18:09:46 -07:00
John Gerassimou	d201ea0ece	prometheus-emitter: add extraLabels parameter (#14728 ) * prometheus-emitter: add extraLabels parameter * prometheus-emitter: update readme to include the extraLabels parameter * prometheus-emitter: remove nullable and surface label name issues * remove import to make linter happy	2023-08-29 12:02:22 -07:00
Abhishek Agarwal	3c7b237c22	Add docs for ingesting Kafka topic name (#14894 ) Add documentation on how to extract the Kafka topic name and ingest it into the data.	2023-08-24 19:19:59 +05:30
Clint Wylie	194a9c9abc	set druid.expressions.useStrictBooleans to true by default (#14734 )	2023-08-22 00:19:56 -07:00
Katya Macedo	5f74ef56f1	Clean up Kafka supervisor topic (#14651 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-21 11:55:38 -07:00
Abhishek Radhakrishnan	37db5d9b81	Reset offsets supervisor API (#14772 ) * Add supervisor /resetOffsets API. - Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets which accepts DataSourceMetadata as a body parameter. - Update logs, unit tests and docs. * Add a new interface method for backwards compatibility. * Rename * Adjust tests and javadocs. * Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules * UT fix * Doc updates. * remove extraneous debugging logs. * Remove the boolean setting; only ResetHandle() and resetInternal() * Relax constraints and add a new ResetOffsetsNotice; cleanup old logic. * A separate ResetOffsetsNotice and some cleanup. * Minor cleanup * Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers * Add unit tests for the no op implementations for test coverage * CodeQL fix * checkstyle from merge conflict * Doc changes * DOCUSAURUS code tabs fix. Thanks, Brian!	2023-08-17 14:13:10 -07:00
Abhishek Agarwal	b97cc45d81	Add clarification to the docs for multi-topic Kafka ingestion (#14847 ) Follow-up to #14828. Added some more clarification about how topicPattern is used.	2023-08-17 12:52:06 +05:30
317brian	6b4dda964d	Docusaurus2 upgrade for master (#14411 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-16 19:01:21 -07:00
YongGang	3954685aae	Report more metrics to monitor K8s task runner (#14771 ) * Report pod running metrics to monitor K8s task runner * refine method definition * fix checkstyle * implement task metrics * more comment * address comments * update doc for the new metrics reported * fix checkstyle * refine method definition * minor refine	2023-08-16 14:03:53 -04:00
Abhishek Agarwal	7911a04064	Refactoring of multi-topic kafka ingestion docs (#14828 ) In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.	2023-08-16 18:00:11 +05:30
Abhishek Agarwal	30b5dd4ca7	Add support to read from multiple kafka topics in same supervisor (#14424 ) This PR adds support to read from multiple Kafka topics in the same supervisor. A multi-topic ingestion can be useful in scenarios where a cluster admin has no control over input streams. Different teams in an org may create different input topics that they can write the data to. However, the cluster admin wants all this data to be queryable in one data source.	2023-08-14 22:24:49 +05:30
Tejaswini Bandlamudi	a45b25fa1d	Removes support for Hadoop 2 (#14763 ) Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop	2023-08-09 17:47:52 +05:30
Abhishek Radhakrishnan	bff8f9e12e	Update kinesis docs (#14768 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-08-07 17:08:34 -07:00
Kashif Faraz	2d8e0f28f3	Refactor: Cleanup coordinator duties for metadata cleanup (#14631 ) Changes - Add abstract class `MetadataCleanupDuty` - Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` - Improve log and error messages - Cleanup tests - No functional change	2023-08-05 13:08:23 +05:30
George Shiqi Wu	174053f4fd	Add readme for kubernetes-overlord-extensions and update docs (#14674 ) * Add readme for kubernetes task scheduler * clean up uneeded stuff * Update extensions-contrib/kubernetes-overlord-extensions/README.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Move documentation into main page * indentation * cleanup spellcheck errors * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update extensions-contrib/kubernetes-overlord-extensions/README.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * PR comments * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-08-01 13:29:44 -07:00
Gian Merlino	5387f1bac0	Remove chatAsync parameter, so chat is always async. (#14692 ) * Remove chatAsync parameter, so chat is always async. chatAsync has been made default in Druid 26. I have seen good battle-testing of it in production, and am comfortable removing the older sync client. This was the last remaining usage of IndexTaskClient, so this patch deletes all that stuff too. * Remove unthrown exception. * Remove unthrown exception. * No more TimeoutException.	2023-07-31 19:42:51 -07:00
Katya Macedo	4804630c78	Clean up Kinesis doc (#14529 )	2023-07-25 19:24:36 -07:00
Jaehui Lee	1f4ee5e21b	Docs: Change default value of "maxRowsInMemory" in tuningConfig (#14618 ) Reflecting fixes from https://github.com/apache/druid/pull/13939	2023-07-19 23:14:15 +05:30
Atul Mohan	03d6d395a0	Extension to read and ingest iceberg data files (#14329 ) This adds a new contrib extension: druid-iceberg-extensions which can be used to ingest data stored in Apache Iceberg format. It adds a new input source of type iceberg that connects to a catalog and retrieves the data files associated with an iceberg table and provides these data file paths to either an S3 or HDFS input source depending on the warehouse location. Two important dependencies associated with Apache Iceberg tables are: Catalog : This extension supports reading from either a Hive Metastore catalog or a Local file-based catalog. Support for AWS Glue is not available yet. Warehouse : This extension supports reading data files from either HDFS or S3. Adapters for other cloud object locations should be easy to add by extending the AbstractInputSourceAdapter.	2023-07-18 08:59:57 +05:30
Gian Merlino	95ca43034f	Change default handoffConditionTimeout to 15 minutes. (#14539 ) * Change default handoffConditionTimeout to 15 minutes. Most of the time, when handoff is taking this long, it's because something is preventing Historicals from loading new data. In this case, we have two choices: 1) Stop making progress on ingestion, wait for Historicals to load stuff, and keep the waiting-for-handoff segments available on realtime tasks. (handoffConditionTimeout = 0, the current default) 2) Continue making progress on ingestion, by exiting the realtime tasks that were waiting for handoff. Once the Historicals get their act together, the segments will be loaded, as they are still there on deep storage. They will just not be continuously available. (handoffConditionTimeout > 0) I believe most users would prefer [2], because [1] risks ingestion falling behind the stream, which causes many other problems. It can cause data loss if the stream ages-out data before we have a chance to ingest it. Due to the way tuningConfigs are serialized -- defaults are baked into the serialized form that is written to the database -- this default change will not change anyone's existing supervisors. It will take effect for newly created supervisors. * Fix tests. * Update docs/development/extensions-core/kafka-supervisor-reference.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-07-13 13:17:14 -07:00

1 2 3 4 5 ...

262 Commits