druid

Commit Graph

Author	SHA1	Message	Date
PANKAJ KUMAR	65857dc0e7	pac4j: fix incompatible dependencies + authorization regression (#15753 ) - After upgrading the pac4j version in: https://github.com/apache/druid/pull/15522. We were not able to access the druid ui. - Upgraded the Nimbus libraries version to a compatible version to pac4j. - In the older pac4j version, when we return RedirectAction there we also update the webcontext Response status code and add the authentication URL to the header. But in the newer pac4j version, we just simply return the RedirectAction. So that's why it was not getting redirected to the generated authentication URL. - To fix the above, I have updated the NOOP_HTTP_ACTION_ADAPTER to JEE_HTTP_ACTION_ADAPTER and it updates the HTTP Response in context as per the HTTP Action.	2024-02-01 09:35:23 -08:00
George Shiqi Wu	5edfa9429f	Batch kill in azure (#15770 ) * Multi kill * add some unit tests * Fix param * Fix deleteBatchFiles * Fix unit tests * Add tests * Save work on batch kill * add tests * Fix unit tests * Update extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Fix unit tests * Update extensions-core/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageTest.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * fix test * fix test * Add test --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-01-31 13:41:15 -05:00
George Shiqi Wu	dbcfb2bb8b	Allow null values for account when injecting (#15777 )	2024-01-30 16:55:45 -05:00
Abhishek Radhakrishnan	dbdfae3011	Fix up typo </br /> -> <br /> and adjust interpolated exception msg in InvalidNullByteFault. (#15804 )	2024-01-30 12:44:51 -08:00
AmatyaAvadhanula	54d0e482dc	Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction (#15699 ) Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction	2024-01-29 19:18:43 +05:30
George Shiqi Wu	3e512249e3	Azure multi read options (#15630 ) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Add config for multiple storage accounts * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * save work * Fix unit tests * Revert old azure input type * Separate input source * save work * Add support for app registrations * Fix unit tests * clean up spacing * Add coverage * fixes from testing * cleanup some caching behavior * Add docs * Fix spelling issues * fix more spelling errors' * Fix intellij inspections * add simple changes from pr * save work on fixing bug * Fix unit tests * Add more testing * Fix unit test * Add tests * Add annotation for azureStorage * Fix up docs * Add comment for list method * Fix tests * Remove uneeded toString * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * PR changes * fix injection of StorageConnector * Fix checkstyle * clean up unit tests * More pr fixes --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-25 13:29:16 -05:00
Parth Agrawal	ed6df26a91	update salt size (#15758 ) As part of becoming FIPS compliance, we are seeing this error: salt must be at least 128 bits when we run the Druid code against FIPS Compliant cryptographic security providers. This PR fixes the salt size used in Pac4jSessionStore.java	2024-01-25 17:05:53 +05:30
George Shiqi Wu	ef0232290c	Fix AzureStorage.batchDeleteFiles (#15730 ) * Fix param * Fix deleteBatchFiles * Fix unit tests * Add tests	2024-01-24 14:37:58 -05:00
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Zoltan Haindrich	d6a12c4389	Add ability to enable ResultCache in tests (#15465 )	2024-01-22 09:02:59 -05:00
zachjsh	9d4e8053a4	Kinesis adaptive memory management (#15360 ) ### Description Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings There is a problem with the logic that this pr fixes: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance. Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes. We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html ): This pr: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified Fixed issue that when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for queue to stop being full. If the queue remains full, we’ll end up right back waiting for it after the restarted fetch. There was also a call to `newQ::offer` without check in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. Now checking return value here, and failing if false. ### Release Note Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified	2024-01-19 14:30:21 -05:00
George Shiqi Wu	dd03f7061a	Implement task payload management with azure storage (#15695 ) * Implement task payload manageent with azure storage * Remove unnecessary parameter	2024-01-18 13:21:36 -05:00
Gian Merlino	764f41d959	Clear "lineSplittable" for JSON when using KafkaInputFormat. (#15692 ) * Clear "lineSplittable" for JSON when using KafkaInputFormat. JsonInputFormat has a "withLineSplittable" method that can be used to control whether JSON is read line-by-line, or as a whole. The intent is that in streaming ingestion, "lineSplittable" is false (although it can be overridden by "assumeNewlineDelimited"), and in batch ingestion, lineSplittable is true. When a "json" format is wrapped by a "kafka" format, this isn't set properly. This patch updates KafkaInputFormat to set this on an underlying "json" format. The tests for KafkaInputFormat were overriding the "lineSplittable" parameter explicitly, which wasn't really fair, because that made them unrealistic to what happens in production. Now they omit the parameter and get the production behavior. * Add test. * Fix test coverage.	2024-01-18 03:22:41 -08:00
Gian Merlino	d3d0c1c91e	Faster parsing: reduce String usage, list-based input rows. (#15681 ) * Faster parsing: reduce String usage, list-based input rows. Three changes: 1) Reworked FastLineIterator to optionally avoid generating Strings entirely, and reduce copying somewhat. Benefits the line-oriented JSON, CSV, delimited (TSV), and regex formats. 2) In the delimited (TSV) format, when the delimiter is a single byte, split on UTF-8 bytes directly. 3) In CSV and delimited (TSV) formats, use list-based input rows when the column list is provided upfront by the user. * Fix style. * Fix inspections. * Restore validation. * Remove fastutil-extra. * Exception type. * Fixes for error messages. * Fixes for null handling.	2024-01-18 19:18:46 +08:00
Maytas Monsereenusorn	55acf2e2ff	Fix incorrect scale when reading decimal from parquet (#15715 ) * Fix incorrect scale when reading decimal from parquet * add comments * fix test	2024-01-18 02:10:27 -08:00
Maytas Monsereenusorn	a3b32fbd26	Fix comparator and remove deprecated methods from spectatorHistogram extension (#15698 ) * Remove deprecated methods from SpectatorHistogram * Remove deprecated methods from SpectatorHistogram * Remove deprecated methods from SpectatorHistogram	2024-01-17 21:59:53 -08:00
Abhishek Radhakrishnan	c27f5bf52f	Report zero values instead of unknown for empty ingest queries (#15674 ) MSQ now allows empty ingest queries by default. For such queries that don't generate any output rows, the query counters in the async status result object/task report don't contain numTotalRows and totalSizeInBytes. These properties when not set/undefined can be confusing to API clients. For example, the web-console treats it as unknown values. This patch fixes the counters by explicitly reporting them as 0 instead of null for empty ingest queries.	2024-01-17 16:26:10 +05:30
Zoltan Haindrich	8a43db9395	Range support in window expressions (support them as groups) (#15365 ) * support groups windowing mode; which is a close relative of ranges (but not in the standard) * all windows with range expressions will be executed wit it groups * it will be 100% correct in case for both bounds its true that: isCurrentRow() \|\| isUnBounded() * this covers OVER ( ORDER BY COL ) * for other cases it will have some chances of getting correct results...	2024-01-17 00:05:21 -06:00
Laksh Singla	8ba06cf723	account for null values in the stddev post aggregator (#15660 )	2024-01-16 19:57:33 +05:30
AmatyaAvadhanula	6b951b94c0	Add new context parameter for using concurrent locks (#15684 ) Changes: - Add new task context flag useConcurrentLocks. - This can be set for an individual task or at a cluster level using `druid.indexer.task.default.context`. - When set to true, any appending task would use an APPEND lock and any other ingestion task would use a REPLACE lock when using time chunk locking. - If false (default), we fall back on the context flag taskLockType and then useSharedLock.	2024-01-16 12:43:39 +05:30
Kashif Faraz	18d2a8957f	Refactor: Cleanup test impls of ServiceEmitter (#15683 )	2024-01-15 17:37:00 +05:30
Gian Merlino	500681d0cb	Add ImmutableLookupMap for static lookups. (#15675 ) * Add ImmutableLookupMap for static lookups. This patch adds a new ImmutableLookupMap, which comes with an ImmutableLookupExtractor. It uses a fastutil open hashmap plus two lists to store its data in such a way that forward and reverse lookups can both be done quickly. I also observed footprint to be somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million row lookup. The main advantage, though, is that reverse lookups can be done much more quickly than MapLookupExtractor (which iterates the entire map for each call to unapplyAll). This speeds up the recently added ReverseLookupRule (#15626) during SQL planning with very large lookups. * Use in one more test. * Fix benchmark. * Object2ObjectOpenHashMap * Fixes, and LookupExtractor interface update to have asMap. * Remove commented-out code. * Fix style. * Fix import order. * Add fastutil. * Avoid storing Map entries.	2024-01-13 13:14:01 -08:00
Gian Merlino	2231cb30a4	Faster k-way merging using tournament trees, 8-byte key strides. (#15661 ) * Faster k-way merging using tournament trees, 8-byte key strides. Two speedups for FrameChannelMerger (which does k-way merging in MSQ): 1) Replace the priority queue with a tournament tree, which does fewer comparisons. 2) Compare keys using 8-byte strides, rather than 1 byte at a time. * Adjust comments. * Fix style. * Adjust benchmark and test. * Add eight-list test (power of two).	2024-01-11 08:36:22 -08:00
Laksh Singla	87fbe42218	"Partition boost" the group by queries in MSQ for better splits (#15474 ) "Partition boost" the group by queries in MSQ for better splits	2024-01-11 12:46:27 +05:30
Kashif Faraz	d623756c66	Add cache for password hash in druid-basic-security (#15648 ) Add class PasswordHashGenerator. Move hashing logic from BasicAuthUtils to this new class. Add cache in the hash generator to contain the computed hash of passwords and boost validator performance Cache has max size 1000 and expiry 1 hour Key of the cache is an SHA-256 hash of the (password + random salt generated on service startup)	2024-01-10 17:46:09 +05:30
Ankit Kothari	355c2f5da0	Add sql + ingestion compatibility for first/last on numeric values (#15607 ) SQL compatibility for numeric last and first column types. Ingestion UI now provides option for first and last aggregation as well.	2024-01-10 12:59:38 +05:30
PANKAJ KUMAR	047c7340ab	Adding retries to update the metadata store instead of failure (#15141 ) Currently, If 2 tasks are consuming from the same partitions, try to publish the segment and update the metadata, the second task can fail because the end offset stored in the metadata store doesn't match with the start offset of the second task. We can fix this by retrying instead of failing. AFAIK apart from the above issue, the metadata mismatch can happen in 2 scenarios: - when we update the input topic name for the data source - when we run 2 replicas of ingestion tasks(1 replica will publish and 1 will fail as the first replica has already updated the metadata). Implemented the comparable function to compare the last committed end offset and new Sequence start offset. And return a specific error msg for this. Add retry logic on indexers to retry for this specific error msg. Updated the existing test case.	2024-01-10 12:30:54 +05:30
Misha	ea6ba40ce1	Add support for Azure Goverment storage (#15523 ) Added support for Azure Government storage in Druid Azure-Extensions. This enhancement allows the Azure-Extensions to be compatible with different Azure storage types by updating the endpoint suffix from a hardcoded value to a configurable one.	2024-01-09 22:33:32 +05:30
Jonathan Wei	5d1e66b8f9	Allow broker to use catalog for datasource schemas for SQL queries (#15469 ) * Allow broker to use catalog for datasource schemas * More PR comments * PR comments	2024-01-08 13:46:08 -06:00
Clint Wylie	c221a2634b	overhaul DruidPredicateFactory to better handle 3VL (#15629 ) * overhaul DruidPredicateFactory to better handle 3VL fixes some bugs caused by some limitations of the original design of how DruidPredicateFactory interacts with 3-value logic. The primary impacted area was with how filters on values transformed with expressions or extractionFn which turn non-null values into nulls, which were not possible to be modelled with the 'isNullInputUnknown' method changes: * adds DruidObjectPredicate to specialize string, array, and object based predicates instead of using guava Predicate * DruidPredicateFactory now uses DruidObjectPredicate * introduces DruidPredicateMatch enum, which all predicates returned from DruidPredicateFactory now use instead of booleans to indicate match. This means DruidLongPredicate, DruidFloatPredicate, DruidDoublePredicate, and the newly added DruidObjectPredicate apply methods all now return DruidPredicateMatch. This allows matchers and indexes * isNullInputUnknown has been removed from DruidPredicateFactory * rename, fix test * adjust * style * npe * more test * fix default value mode to not match new test	2024-01-05 19:08:02 -08:00
Zoltan Haindrich	b9679d0884	Run filter-into-join rule early for subqueries and disable project-filter rule (#15511 ) FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however if the query starts highly underdefined (join conditions in the where clauses) that generic query could give a lot of room for the other rules to play around with only enabled it for when the join uses subqueries for its inputs. PROJECT_FILTER rule is not that useful. and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in https://github.com/apache/druid/pull/15302	2024-01-04 15:33:45 +05:30
George Shiqi Wu	8e95cea8e5	Azure client upgrade to allow identity options (#15287 ) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * Remove container client logic * Add some more testing * Update comments * Add a comment explaining client reuse * Move logic to factory class * use bom for dependency management * fix license versions --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-03 18:36:05 -05:00
Gian Merlino	e40b96e026	Reverse lookup fixes and enhancements. (#15611 ) * Reverse lookup fixes and enhancements. 1) Add a "mayIncludeUnknown" parameter to DimFilter#optimize. This is important because otherwise the reverse-lookup optimization is done improperly when the "in" filter appears under a "not", and the lookup extractionFn may return null for some possible values of the filtered column. The "includeUnknown" test cases in InDimFilterTest illustrate the difference in behavior. 2) Enhance InDimFilter#optimizeLookup to handle "mayIncludeUnknown", and to be able to do a reverse lookup in a wider variety of cases. 3) Make "unapply" protected in LookupExtractor, and move callers to "unapplyAll". The main reason is that MapLookupExtractor, a common implementation, lacks a reverse mapping and therefore does a scan of the map for each call to "unapply". For performance sake these calls need to be batched. * Remove optimize call from BloomDimFilter. * Follow the law. * Fix tests. * Fix imports. * Switch function. * Fix tests. * More tests.	2024-01-03 13:28:44 -08:00
Abhishek Radhakrishnan	9c7d7fc777	Allow empty inserts and replaces in MSQ. (#15495 ) * Allow empty inserts and replace. - Introduce a new query context failOnEmptyInsert which defaults to false. - When this context is false (default), MSQE will now allow empty inserts and replaces. - When this context is true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault. - For REPLACE ALL over an ALL grain segment, the query will generate a tombstone spanning eternity which will be removed eventually be the coordinator. - Add unit tests in MSQInsertTest, MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false) - Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true) * Ignore test to see if it's the culprit for OOM * Add heap dump config * Bump up -Xmx from 1500 MB to 2048 MB * Add steps to tarball and collect hprof dump to GHA action * put back mx to 1500MB to trigger the failure * add the step to reusable unit test workflow as well * Revert the temp heap dump & @Ignore changes since max heap size is increased * Minor updates * Review comments 1. Doc suggestions 2. Add tests for empty insert and replace queries with ALL grain and limit in the default failOnEmptyInsert mode (=false). Add similar tests to MSQFaultsTest with failOnEmptyInsert = true, so the query does fail with an InsertCannotBeEmpty fault. 3. Nullable annotation and javadocs * Add comment replace_limit.patch	2024-01-02 13:05:51 -08:00
Kashif Faraz	9f568858ef	Add logging implementation for AuditManager and audit more endpoints (#15480 ) Changes - Add `log` implementation for `AuditManager` alongwith `SQLAuditManager` - `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for all `fetchAuditHistory` calls. - Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default) - Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`. This gets activated only if `type` is `log`. - Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs - Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of audit payload and other utility methods.	2023-12-19 13:14:04 +05:30
Vishesh Garg	e43bb74c3a	Add MSQ Durable Storage Connector for Google Cloud Storage and change current Google Cloud Storage client library (#15398 ) The PR addresses 2 things: Add MSQ durable storage connector for GCS Change GCS client library from the old Google API Client Library to the recommended Google Cloud Client Library. Ref: https://cloud.google.com/apis/docs/client-libraries-explained	2023-12-14 07:34:49 +05:30
Keerthana Srikanth	f32dbd4131	Upgrade pac4j-oidc to 4.5.7 to address CVE-2021-44878 (#15522 ) * Upgrade org.pac4j:pac4j-oidc to 4.5.5 to address CVE-2021-44878 * add CVE suppression and notes, since vulnerability scan still shows this CVE * Add tests to improve coverage	2023-12-13 10:44:05 -08:00
AmatyaAvadhanula	48a96f5d06	Better automatic offset reset for Kinesis ingestion (#15338 ) Better automatic offset reset for Kinesis ingestion	2023-12-13 12:03:17 +05:30
Jan Werner	3c7dec56ca	update kubernetes java client to 19.0.0 and docker-java to 3.3.4 (#15449 ) Update of direct dependencies: * kubernetes java-client to 19.0.0 * docker-java-bom to 3.3.4 In order to update transitive dependencies: * okio to 3.6.0 * bcjava to 1.76 To address CVES: - CVE-2023-3635 in okio - CVE-2023-33201 in bcjava --------- Co-authored-by: Xavier Léauté <xvrl@apache.org>	2023-12-12 14:27:57 -08:00
Adarsh Sanjeev	254a8eb7e0	Add null checks for HllSketchHolder (#15502 ) Fixes a potential NPE which could occur while folding the HllSketchAggregator. If the sketch is null, druid could return a null HllSketchHolder object. Adding a null check here could help here Resolves a null pointer exception in HllSketchAggregatorFactory	2023-12-08 11:43:04 +05:30
Jan Werner	ff0e838d30	add gson to dependencyManagement (#15488 ) This change completes the change introduced in #15461 and unifies the version of gson dependency used between all the modules. gson is used by kubernetes-extension, avro-extensions, ranger-security, and as a test dependency in several core modules. --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 11:50:32 -08:00
Jan Werner	f4856bc1c1	ranger-security: exclude jackson-jaxrs from + fix outdated documentation (#15481 ) * Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172 * remove the reference to outdated ranger 2.0 from the docs --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 08:24:37 -08:00
Adarsh Sanjeev	ddd2299272	Add null check for VarianceAggregatorCollector	2023-12-04 22:26:44 +05:30
Jan Werner	b854058491	remove unnecessary elasticsearch dependencies to fix CVE regressions (#15443 ) Recent upgrade of ranger introduced CVE regressions due to outdated elasticsearch components. Druid-ranger-plugin does not elasticsearch components , and they have been explicitly removed. Update woodstox-core to 6.4.0 to address GHSA-3f7h-mf4q-vrm4	2023-12-03 20:56:40 +05:30
AmatyaAvadhanula	4a594bb9f6	Use task actions to fetch used segments in MSQ (#15284 ) * Use task actions to fetch used segments in MSQ * Fix tests * Fixing tests. * Revert "Fix tests" This reverts commit `95ab6494` * Removing conditional check in tests. * Pulling in latest changes. --------- Co-authored-by: cryptoe <karankumar1100@gmail.com>	2023-12-01 15:29:33 +05:30
Abhishek Agarwal	0a56c87e93	SQL: Plan non-equijoin conditions as cross join followed by filter (#15302 ) This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.	2023-11-29 13:46:11 +05:30
Jan Werner	ee6ad36fab	update confluent's dependencies to common, supported version (#15441 ) * update confluent's dependencies to common, supported version Update io.confluent.* dependencies to common, updated version 6.2.12 currently used versions are EOL * move version definition to the top level pom	2023-11-28 21:35:22 -08:00
Clint Wylie	97623b408c	add optional 'castToType' parameter to 'auto' column schema (#15417 ) * auto but.. with an expected type	2023-11-28 17:19:23 -08:00
Kashif Faraz	67c7b6248c	Fix log typos, clean up some kill messages in SeekableStreamSupervisor (#15424 ) Changes: - Fix log `Got end of partition marker for partition [%s] from task [%s] in discoverTasks` by fixing order of args - Simplify in-line classes by using lambda - Update kill task message from `Task [%s] failed to respond to [set end offsets] in a timely manner, killing task` to `Failed to set end offsets, killing task` - Clean up tests	2023-11-24 16:09:10 +05:30
Vivek Dhiman	c14cfc2a86	Patched security vulnerability by updating Ranger libraries to the ne… (#15363 ) Patched security vulnerability by updating Ranger libraries to the newest available version.	2023-11-22 15:47:18 +05:30
Kashif Faraz	4ba3cf5221	Add test to verify sequence name of Kafka task (#15397 ) * Add test to verify sequence name of Kafka and Kinesis tasks	2023-11-21 10:17:32 +05:30
Magnus Henoch	67f45fa7bf	Fix histograms for sketches where min and max are equal (#15381 ) There is a problem with Quantiles sketches and KLL Quantiles sketches. Queries using the histogram post-aggregator fail if: - the sketch contains at least one value, and - the values in the sketch are all equal, and - the splitPoints argument is not passed to the post-aggregator, and - the numBins argument is greater than 2 (or not specified, which leads to the default of 10 being used) In that case, the query fails and returns this error: { "error": "Unknown exception", "errorClass": "org.apache.datasketches.common.SketchesArgumentException", "host": null, "errorCode": "legacyQueryException", "persona": "OPERATOR", "category": "RUNTIME_FAILURE", "errorMessage": "Values must be unique, monotonically increasing and not NaN.", "context": { "host": null, "errorClass": "org.apache.datasketches.common.SketchesArgumentException", "legacyErrorCode": "Unknown exception" } } This behaviour is undesirable, since the caller doesn't necessarily know in advance whether the sketch has values that are diverse enough. With this change, the post-aggregators return [N, 0, 0...] instead of crashing, where N is the number of values in the sketch, and the length of the list is equal to numBins. That is what they already returned for numBins = 2. Here is an example of a query that would fail: {"queryType":"timeseries", "dataSource": { "type": "inline", "columnNames": ["foo", "bar"], "rows": [ ["abc", 42.0], ["def", 42.0] ] }, "intervals":["0000/3000"], "granularity":"all", "aggregations":[ {"name":"the_sketch", "fieldName":"bar", "type":"quantilesDoublesSketch"}], "postAggregations":[ {"name":"the_histogram", "type":"quantilesDoublesSketchToHistogram", "field":{"type":"fieldAccess","fieldName":"the_sketch"}, "numBins": 3}]} I believe this also fixes issue #10585.	2023-11-16 12:31:22 +05:30
Krishna Anandan	53797b9e49	Fixed a flaky test in `S3DataSegmentPusherConfigTest#testSerialization` by changing string to key:value pair (#15207 ) * Fix capacity response in mm-less ingestion (#14888) Changes: - Fix capacity response in mm-less ingestion. - Add field usedClusterCapacity to the GET /totalWorkerCapacity response. This API should be used to get the total ingestion capacity on the overlord. - Remove method `isK8sTaskRunner` from interface `TaskRunner` * Using Map to perform comparison * Minor Change --------- Co-authored-by: George Shiqi Wu <george.wu@imply.io>	2023-11-15 09:05:55 -08:00
Krishna Anandan	4ca5acdc33	Fixed 2 Flaky Tests (#15376 )	2023-11-15 18:40:09 +05:30
Karan Kumar	a70a3d5d48	Fix cancellation bug in MSQ. (#15368 ) Saw bug where MSQ controller task would continue to hold the task slot even after cancel was issued. This was due to a deadlock created on work launch. The main thread was waiting for tasks to spawn and the cancel thread was waiting for tasks to finish. The fix was to instruct the MSQWorkerTaskLauncher thread to stop creating new tasks which would enable the main thread to unblock and release the slot. Also short circuited the taskRetriable condition. Now the check is run in the MSQWorkerTaskLauncher thread as opposed to the main event thread loop. This will result in faster task failure in case the task is deemed to be non retriable.	2023-11-15 18:22:51 +05:30
Abhishek Radhakrishnan	2e79fd56a7	MSQ generates tombstones honoring granularity specified in a `REPLACE` query. (#15243 ) * MSQ generates tombstones honoring the query's granularity. This change tweaks to only account for the infinite-interval tombstones. For finite-interval tombstones, the MSQ query granualrity will be used which is consistent with how MSQ works. * more tests and some cleanup. * checkstyle * comment edits * Throw TooManyBuckets fault based on review; add more tests. * Add javadocs for both methods on reconciling the methods. * review: Move testReplaceTombstonesWithTooManyBucketsThrowsException to MsqFaultsTest * remove unused imports. * Move TooManyBucketsException to indexing package for shared exception handling. * lower max bucket for tests and fixup count * Advance and count the iterator. * checkstyle	2023-11-14 23:35:36 -08:00
Krishna Anandan	06744d3827	Changing a string to key:value pair to fix flakiness in `testSerializationWithDefaults` (#15317 ) * + Fix for Flaky Test * + Replacing TreeMap with LinkedHashMap * + Changing data structure from LinkedHashMap to HashMap * Fixed flaky test in S3DataSegmentPusherConfigTest.testSerializationValidatingMaxListingLength * Minor Changes	2023-11-14 15:59:07 -08:00
Krishna Anandan	5edeac28df	+ Switching Comparison from String to JSON (#15364 )	2023-11-14 08:07:19 -08:00
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806 )	2023-11-07 10:07:28 -08:00
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Gian Merlino	98f1eb8ede	Use filters for pruning properly for hash-joins. (#15299 ) * Use filters for pruning properly for hash-joins. Native used them too aggressively: it might use filters for the RHS to prune the LHS. MSQ used them not at all. Now, both use them properly, pruning based on base (LHS) columns only. * Fix tests. * Fix style. * Clear filterFields too. * Update.	2023-11-03 07:29:16 -07:00
Adarsh Sanjeev	9576fd3141	HllSketch Merge Aggregator optimizations (#15162 ) * Null byte serde for empty sketches * Cache for HllSketchMerge * Check for empty sketches * Address review comments * Revert changes to HllSketchHolder * Handle null sketch holders instead of null sketches * Add unit test for MSQ HllSketch * Add comments * Fix style	2023-11-03 11:01:22 +08:00
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Adarsh Sanjeev	22443ab87e	Fix an issue with passing order by and limit to realtime tasks (#15301 ) While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns. If the query specifies a non time column, the query is planned as it is supported by MSQ. However, this throws an exception when passed to real time tasks once as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks. Fixes a bug with MSQ while reading data from real time tasks with non time ordering	2023-11-02 11:38:26 +05:30
Laksh Singla	b82ad59dfe	Better logging in ServiceClientImpl (#15269 ) ServiceClientImpl logs the cause of every retry, even though we are retrying the connection attempt. This leads to slight pollution in the logs because a lot of the time, the reason for retrying is the same. This is seen primarily in MSQ, when the worker task hasn't launched yet however controller attempts to connect to the worker task, which can lead to scary-looking messages (with INFO log level), even though they are normal. This PR changes the logging logic to log every 10 (arbitrary number) retries instead of every retry, to reduce the pollution of the logs. Note: If there are no retries left, the client returns an exception, which would get thrown up by the caller, and therefore this change doesn't hide any important information.	2023-11-02 11:32:49 +05:30
Laksh Singla	2ea7177f15	Allow casted literal values in SQL functions accepting literals (#15282 ) Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.	2023-11-01 10:38:48 +05:30
Vishesh Garg	039b05585c	Add worker status and duration metrics in live and task reports (#15180 ) Add worker status and duration metrics in live and task reports for tracking.	2023-10-30 09:43:22 +05:30
Alexander Saydakov	f1132d20c5	use datasketches-java 4.2.0 (#15257 ) * use datasketches-java 4.2.0 * use exclusive mode * fixed issues raised by CodeQL * fixed issue raised by spotbugs * fixed issues raised by intellij * added missing import * Update QuantilesSketchKeyCollector search mode and adjust tests. * Update sizeOf functions and add unit tests * Add unit tests --------- Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com> Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>	2023-10-26 16:28:33 -07:00
Adarsh Sanjeev	c5fa649ea5	Rename segment load wait parameter (#15251 )	2023-10-25 18:08:37 +05:30
Zoltan Haindrich	6784e9c507	Fix summary row issues in case postaggregations are happening (#15232 ) * fix-1/2 * add message v1 * extend test to cover for IOB issue * move stuff around * change message * fix testcase string * compute postaggs (thank you Clint!) * enable feature for test * ignore tests in msq --------- Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>	2023-10-24 20:33:59 -07:00
Abhishek Radhakrishnan	63e3e9531d	Update S3 retry logic to account for the underlying cause in case of `IOException` (#15238 ) * Update S3 retry logic based on the underlying cause in case of IOException. 4xx and other errors wrapped in IOException for instance aren't retriable. * Fix CI	2023-10-24 15:04:42 -07:00
Zoltan Haindrich	9fb0dbfc9f	Fix json inputs for drill windowing tests (#15148 ) This PR: adds a flag to JsonToParquet to do the fix during conversion updates the json files to more correct conents some resultset mismatches were fixed by this updates parquet to 1.13.1	2023-10-19 14:02:41 +05:30
Laksh Singla	fa311dd0b6	Cast the values read in the EXTERN function to the type in the row signature (#15183 )	2023-10-19 10:24:23 +05:30
Karan Kumar	953ce79439	Add undocumented taskLockType to MSQ. (#15168 ) Patch adds an undocumented parameter taskLockType to MSQ so that we can start enabling this feature for users who are interested in testing the new lock types.	2023-10-17 21:44:04 +05:30
Laksh Singla	dc8d2192c3	Introduce natural comparator for types that don't have a StringComparator (#15145 ) Fixes a bug when executing queries with the ordering of arrays	2023-10-16 10:37:32 +05:30
Karan Kumar	f0a70fe3c4	Fixing the flaky tests. (#15142 )	2023-10-13 16:20:05 +05:30
Adarsh Sanjeev	4deeb7e936	Fix issue with checking segment load status (#15147 ) This PR addresses a bug with waiting for segments to be loaded. In the case of append, segments would be created with the same version. This caused the number of segments returned to be incorrect. This PR changes this to keep track of the range of partition numbers as well for each version, which lets the task wait for the correct set of segments. The partition numbers are expected to be continuous since the task obtains the lock for the segment while running.	2023-10-13 16:06:13 +05:30
Karan Kumar	61ea9e07c5	Limit pages size to a configurable limit (#14994 ) Adding the ability to limit the pages sizes of select queries. We piggyback on the same machinery that is used to control the numRowsPerSegment. This patch introduces a new context parameter rowsPerPage for which the default value is set to 100000 rows. This patch also optimizes adding the last selectResults stage only when the previous stages have sorted outputs. Currently for each select query with selectDestination=durableStorage, we used to add this extra selectResults stage.	2023-10-12 14:01:46 +05:30
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL complaint behavior	2023-10-12 00:06:23 -07:00
Zoltan Haindrich	ae88f2c0b6	Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086 ) * fixes * check for latest rewrite place * Revert "check for latest rewrite place" This reverts commit `5cf1e2c1ca`. * some stuff (cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f) * update test output * updates to test ouptuts * some stuff * move validator * cleanup * fix * change test slightly * add apidoc cleanup warnings * cleanup/etc * instead of telling the story; add a fail with some reason whats the issue * lead-lag fix * add test * remove unnecessary throw * druidexception-trial * Revert "druidexception-trial" This reverts commit `8fa06644bc`. * undo changes to no_grouping; add no_grouping2 * add missing assert on resultcount * rename method; update * introduce enum/etc * make resultmatchmode accessible from TestBuilder#expectedResults * fix dump results to use log * fix * handle null correctly * disable feature type based things for MSQ * fix varianssqlaggtest * use eps in other test * fix intellij error * add final * addrss review * update test/string/etc * write concat in 3 lines :D	2023-10-11 12:34:31 -07:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Abhishek Agarwal	90a1458ac9	Parse passwords containing colon correctly (#15109 )	2023-10-09 20:45:10 +05:30
Laksh Singla	b0edbc3d91	MSQ writes out string arrays instead of MVDs by default (#15093 ) MSQ uses the string dimension schema for ARRAY<STRING> typed columns, which creates MVDs instead of string arrays as required. Therefore someone trying to ingest columns of type ARRAY<STRING> from an external data source or another data source would get STRING columns in the newly generated segments. This patch changes the following: - Use auto dimension schema to ingest the ARRAY<STRING> columns, which will create columns with the desired type. - Add an undocumented flag ingestStringArraysAsMVDs to preserve the legacy behavior. Legacy behaviour is turned on by default. - Create MSQArraysInsertTest and refactor some of the tests in MSQInsertTest.	2023-10-09 20:31:07 +05:30
Laksh Singla	36edbce036	Fix compilation failure in master (#15111 ) Merging since it's a dev blocker.	2023-10-09 20:05:48 +05:30
Clint Wylie	1fc8fb1b20	add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101 ) * add a bunch of tests with array typed columns to CalciteArraysQueryTest * fix a bug with unnest filter pushdown when filtering on unnested array columns	2023-10-09 06:16:06 -07:00
Laksh Singla	549ef56288	UNION ALLs in MSQ (#14981 ) MSQ now supports UNION ALL with UnionDataSource	2023-10-09 18:18:15 +05:30
Adarsh Sanjeev	7a35ce886d	Add ability for MSQ tasks to query realtime tasks (#15024 ) This PR aims to add the capabilities to: 1. Fetch the realtime segment metadata from the coordinator server view, 2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.	2023-10-09 15:14:03 +05:30
Pranav	c7d0615af3	Fix the build for #15013.: Lookup jitter upstream build fix (#15103 ) Fix the build for #15013.	2023-10-09 09:35:39 +05:30
Zoltan Haindrich	b5a87fd89b	Support constant args in window functions (#15071 ) Instead of passing the constants around in a new parameter; InputAccessor was introduced to take care of transparently handling the constants - this new class started picking up some copy-paste debris around field accesses; and made them a little bit more readble.	2023-10-08 12:14:25 +05:30
Zoltan Haindrich	7b869fd37a	Change type of AVG aggregates to double (#15089 ) The sql standard is not very restrictive regarding this: If AVG is specified and DT is exact numeric, then the declared type of the result is an implemen- tation-defined exact numeric type with precision not less than the precision of DT and scale not less than the scale of DT. so; using the same type is also ok (without patch); however the avg of 0 and 1 is 0 right now because of the retention of the integer typ Postgres,MySql and Oracle and Drill seem to increase precision ; mssql returns 0 http://sqlfiddle.com/#!9/6f7248/1 I think we should also increase precision as its already calculated more precisely	2023-10-07 18:01:09 +05:30
Pranav	06c5527c85	Allow aliasing of Macros and add new alias for complex decode 64 (#15034 ) * Add AliasExprMacro to allow aliasing of native expression macros * Add decode_base64_complex alias for complex_decode_base64	2023-10-05 16:24:36 -07:00
Adarsh Sanjeev	7e987e3d69	Add query context parameter for segment load wait (#15076 ) Add segmentLoadWait as a query context parameter. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query.	2023-10-05 18:26:34 +05:30
Laksh Singla	30cf76db99	Field writers for numerical arrays (#14900 ) Row-based frames, and by extension, MSQ now supports numeric array types. This means that all queries consuming or producing arrays would also work with MSQ. Numeric arrays can also be ingested via MSQ. Post this patch, queries like, SELECT [1, 2] would work with MSQ since they consume a numeric array, instead of failing with an unsupported column type exception.	2023-10-04 23:16:47 +05:30
Gian Merlino	2ed4fd1ae3	Compute broadcast-join segmentMapFn only once per worker. (#15007 ) This patch introduces "processor managers" to processor factories, as a replacement for the sequence of processors. Processor managers can use the results of earlier processors to influence the creation of later processors, which provides us with the building block we need to ensure that broadcast join data is only read once. In particular, when broadcast join is happening, the BaseFrameProcessorFactory now uses a ChainedProcessorManager to first run BroadcastJoinSegmentMapFnProcessor (in a single thread), and then run all of the regular processors (possibly multithreaded).	2023-10-04 11:47:00 +05:30
Xavier Léauté	adef2069b1	Make unit tests pass with Java 21 (#15014 ) This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21 As a result all unit tests now pass with Java 21. * update maven-shade-plugin to 3.5.0 and follow-up to #15042 * explain why we need to override configuration when specifying outputFile * remove configuration from dependency management in favor of explicit overrides in each module. * update to mockito to 5.5.0 for Java 21 support when running with Java 11+ * continue using latest mockito 4.x (4.11.0) when running with Java 8 * remove need to mock private fields * exclude incorrectly declared mockito dependency from pac4j-oidc * remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21 * add JVM options workaround for system-rules junit plugin not supporting Java 18+ * exclude older versions of byte-buddy from assertj-core * fix for Java 19 changes in floating point string representation * fix missing InitializedNullHandlingTest * update easymock to 5.2.0 for Java 21 compatibility * update animal-sniffer-plugin to 1.23 * update nl.jqno.equalsverifier to 3.15.1 * update exec-maven-plugin to 3.1.0	2023-10-03 22:41:21 -07:00
Soumyava	cb050282a0	Intervals are updated properly for Unnest queries (#15020 ) Fixes a bug where the unnest queries were not updated with the correct intervals.	2023-10-04 02:52:10 +05:30
George Shiqi Wu	64754b6799	Allow users to pass task payload via deep storage instead of environment variable (#14887 ) This change is meant to fix a issue where passing too large of a task payload to the mm-less task runner will cause the peon to fail to startup because the payload is passed (compressed) as a environment variable (TASK_JSON). In linux systems the limit for a environment variable is commonly 128KB, for windows systems less than this. Setting a env variable longer than this results in a bunch of "Argument list too long" errors.	2023-10-03 14:08:59 +05:30
Pranav	07c28f17ca	Fix missing format strings in calls to DruidException.build (#15056 ) * Fix the NPE bug in nonStrictFormat * using non null format string * using Assert.assertThrows	2023-09-29 17:00:36 -07:00
Yuanli Han	9a4433bbad	Fix invalid segment path when using hdfs as the intermediate deepstore (#14984 ) This PR fixes the invalid segment path when enabling druid_processing_intermediaryData_storage_type: "deepstore" and using hdfs as the deep store.	2023-09-29 12:53:46 +05:30

1 2 3 4 5 ...

1450 Commits