druid

Commit Graph

Author	SHA1	Message	Date
Abhishek Radhakrishnan	3c493dc3ed	CircularList round-robin iterator for the KillUnusedSegments duty (#16719 ) * Round-robin iterator for datasources to kill. Currently there's a fairness problem in the KillUnusedSegments duty where the duty consistently selects the same set of datasources as discovered from the metadata store or dynamic config params. This is a problem especially when there are multiple unused. In a medium to large cluster, while we can increase the task slots to increase the likelihood of broader coverage. This patch adds a simple round-robin iterator to select datasources and has the following properties: 1. Starts with an initial random cursor position in an ordered list of candidates. 2. Consecutive {@code next()} iterations from {@link #getIterator()} are guaranteed to be deterministic unless the set of candidates change when {@link #updateCandidates(Set)} is called. 3. Guarantees that no duplicate candidates are returned in two consecutive {@code next()} iterations. * Renames in RoundRobinIteratorTest. * Address review comments. 1. Clarify javadocs on the ordered list. Also flesh out the details a bit more. 2. Rename the test hooks to make intent clearer and fix typo. 3. Add NotThreadSafe annotation. 4. Remove one potentially noisy log that's in the path of iteration. * Add null check to input candidates. * More commentary. * Addres review feedback: downgrade some new info logs to debug; invert condition. Remove redundant comments. Remove rendundant variable tracking. * CircularList adjustments. * Updates to CircularList and cleanup RoundRobinInterator. * One more case and add more tests. * Make advanceCursor private for now. * Review comments.	2024-07-26 12:20:49 -07:00
Laksh Singla	725d442355	Faster dimension deserialization on the brokers (#16740 ) Speedier dimension deserialization on the brokers.	2024-07-26 14:36:11 +05:30
Gian Merlino	b2a88da200	Attempt to coerce COMPLEX to number in numeric aggregators. (#16564 ) * Coerce COMPLEX to number in numeric aggregators. PR #15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex. * Fix tests. * Remove the special ColumnValueSelector. * Add test.	2024-07-25 08:45:29 -07:00
Rohan Garg	b5f117bca2	Check for tombstones in wrapping storage adapters (#16791 )	2024-07-25 06:55:40 -04:00
Clint Wylie	14954c7eb9	serialize legacy as false for scan query for rolling downgrade/upgrade (#16793 ) Fixes rolling downgrades/upgrades after #16659 by hard coding scan query "legacy":false since it is a required property during deserialization.	2024-07-25 14:51:58 +05:30
Gian Merlino	c1875e7c1d	HashJoinEngine: Check for interruptions while walking left cursor. (#16773 ) * HashJoinEngine: Check for interruptions while walking left cursor. Previously, the engine only checked for interruptions between emitting joined rows. In scenarios where large numbers of left rows are skipped completely (such as a highly selective INNER JOIN) this led to the join cursor being insufficiently responsive to cancellation. * Coverage.	2024-07-25 15:10:50 +08:00
Zoltan Haindrich	7e3fab5bf9	Make WindowFrames more specific (#16741 ) Changes the WindowFrame internals / representation a bit; introduces dedicated frametypes for rows and groups which corresponds to the implemented processing methods	2024-07-25 04:57:36 +02:00
Clint Wylie	302739aa58	more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts, and query cancellation participation (#16748 ) * more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts * wire parallel merge into query cancellation system * oops * style * adjust metrics initialization * fix timeout, fix cleanup to not block * javadocs to clarify why cancellation future and gizmo are split * cancelled -> canceled, simplify QueuePusher since it always takes a ResultBatch, non-static terminal marker to make stuff stop complaining about types, specialize tryOffer to be tryOfferTerminal so it wont be misused, add comments to clarify reason for non-blocking offers that might fail	2024-07-24 14:58:34 +08:00
Laksh Singla	11bb40981e	Deduce type from the aggregators when materializing subquery results (#16703 ) For aggregators like StringFirst/Last, whose intermediate type isn't the same as the final type, using them in GroupBy, TopN or Timeseries subqueries causes a fallback when maxSubqueryBytes is set. This is because we assume that the finalization is not known, due to which the row signature cannot determine whether to use the intermediate or the final type, and it puts it as null. This PR figures out the finalization from the query context and uses the intermediate or the final type appropriately.	2024-07-23 11:52:39 +05:30
Gian Merlino	8b8ca0d7fc	DimFilterUtils: Exit filterShards early when filter is null. (#16774 ) When the filter is null, there is no need to run the converter on all the input objects.	2024-07-22 21:17:11 -07:00
Clint Wylie	b645d09c5d	move long and double nested field serialization to later phase of serialization (#16769 ) changes: * moves value column serializer initialization, call to `writeValue` method to `GlobalDictionaryEncodedFieldColumnWriter.writeTo` instead of during `GlobalDictionaryEncodedFieldColumnWriter.addValue`. This shift means these numeric value columns are now done in the per field section that happens after serializing the nested column raw data, so only a single compression buffer and temp file will be needed at a time instead of the total number of nested literal fields present in the column. This should be especially helpful for complicated nested structures with thousands of columns as even those 64k compression buffers can add up pretty quickly to a sizeable chunk of direct memory.	2024-07-22 21:14:30 -07:00
Clint Wylie	02b8738c00	remove batchProcessingMode from task config, remove AppenderatorImpl (#16765 ) changes: * removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS` which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed so `CLOSED_SEGMENTS` was the default (using `AppenderatorImpl`), however MSQ has been exclusively using `BatchAppenderator` with no problems so it seems safe to just roll it out as the only option for batch ingestion everywhere. * with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed * implify `Appenderator` construction since there are only separate stream and batch versions now * simplify tests since `batchProcessingMode` is gone	2024-07-22 13:56:44 -07:00
Akshat Jain	6a2348b78b	Preemptive restriction for queries with approximate count distinct on complex columns of unsupported type (#16682 ) This PR aims to check if the complex column being queried aligns with the supported types in the aggregator and aggregator factories, and throws a user-friendly error message if they don't.	2024-07-22 21:34:06 +05:30
Sree Charan Manamala	149d7c5207	Throw exceptions in SqlValidator when DISTINCT used over WINDOW (#16738 ) * Throw exception if DISTINCT used with window functions aggregate call * Improve error message when unsupported aggregations are used with window functions	2024-07-22 16:29:46 +02:00
Clint Wylie	a34a06e192	remove Firehose and FirehoseFactory (#16758 ) changes: * removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602 * Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader` * Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes * Moved anything remaining in a 'firehose' package somewhere else * Clean up docs on firehose stuff	2024-07-19 14:37:21 -07:00
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Akshat Jain	b53c26f5c5	Fix issues with partitioning boundaries for MSQ window functions (#16729 ) * Fix issues with partitioning boundaries for MSQ window functions * Address review comments * Address review comments * Add test for coverage check failure * Address review comment * Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest * Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java * Address review comments * Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory * Address review comment * Fix checkstyle --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 10:05:09 +08:00
Kashif Faraz	89066b72cf	Fix bug in TaskStorageQueryAdapter (#16750 ) Changes: - Do not hold a reference to `TaskQueue` in `TaskStorageQueryAdapter` - Use `TaskStorage` instead of `TaskStorageQueryAdapter` in `IndexerMetadataStorageAdapter` - Rename `TaskStorageQueryAdapter` to `TaskQueryTool` - Fix newly added task actions `RetrieveUpgradedFromSegmentIds` and `RetrieveUpgradedToSegmentIds` by removing `isAudited` method.	2024-07-17 23:17:41 +05:30
Sree Charan Manamala	40ef9fc4ec	Bug fix for array type selector causing array aggregation over window frame fail (#16653 )	2024-07-17 14:09:56 +02:00
Kashif Faraz	9f6ce6ddc0	Remove task action audit logging and druid_taskLog metadata table (#16309 ) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368. As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.	2024-07-17 17:09:00 +05:30
Sree Charan Manamala	78a4a09d01	Window Function offset correction for RAC (#16718 ) * When an ArrayList RAC creates a child RAC, the start and end offsets need to have the offset of parent's start offset * Defaults the 2nd window bound to CURRENT ROW when only a single bound is specified * Removes the windowingStrictValidation warning and throws a hard exception when Order By alongside RANGE clause is not provided with UNBOUNDED or CURRENT ROW as both bounds	2024-07-15 12:43:27 +02:00
Laksh Singla	209f8a9546	Deserialize complex dimensions in group by queries to their respective types when reading from spilled files and cached results (#16620 ) Like #16511, but for keys that have been spilled or cached during the grouping process	2024-07-15 15:00:17 +05:30
Laksh Singla	3a1b437056	Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679 ) Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes: a. We don't touch the subquery sequence till we know that we can materialize the result as frames	2024-07-12 21:49:12 +05:30
Vishesh Garg	197c54f673	Auto-Compaction using Multi-Stage Query Engine (#16291 ) Description: Compaction operations issued by the Coordinator currently run using the native query engine. As majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative that we support compaction on MSQ to make Compaction more robust and possibly faster. For instance, we have seen OOM errors in native compaction that MSQ could have handled by its auto-calculation of tuning parameters. This commit enables compaction on MSQ to remove the dependency on native engine. Main changes: * `DataSourceCompactionConfig` now has an additional field `engine` that can be one of `[native, msq]` with `native` being the default. * if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which could happen in case a fraction of the tasks were allotted and they eventually fell short of the number of tasks required by the MSQ engine to run the compaction. * `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field. * `CompactionTask` now has `CompactionRunner` interface instance with its implementations `NativeCompactinRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension. The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the `CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. * `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks. * `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in the segment schema. If present, the task is created with a native group-by query; if not, the task is issued with a scan query. The `storeCompactionState` flag is set in the context. * Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask` since otherwise, the workers will be unable to determine the controller task's location for communication (as they haven't been launched via the overlord).	2024-07-12 16:40:20 +05:30
Clint Wylie	dca31d466c	minor adjustments for performance (#16714 ) changes: * switch to stop using some string.format * switch some streams to classic loops	2024-07-11 16:57:15 -07:00
Clint Wylie	b3c238457f	fix unnest bugs (#16723 ) changes: * fixes a bug with unnest storage adapter not preserving underlying columns dictionary uniqueness when allowing dimension selector cursor * fixes a bug with unnest on realtime segments with empty rows incorrectly specifying index 0 as the row dictionary value	2024-07-11 13:48:15 -07:00
Clint Wylie	d6c07270a5	fix issues with join filter pushdown and virtual column resolution (#16702 )	2024-07-11 04:26:07 -07:00
Clint Wylie	09e0eefdc3	modify equality and typed in filter behavior for numeric match values on string columns (#16593 ) * fix equality and typed in filter behavior for numeric match values on string columns changes: * EqualityFilter and TypedInfilter numeric match values against string columns will now cast strings to numeric values instead of converting the numeric values directly to string for pure string equality, which is consistent with the casts which are eaten in the SQL layer, as well as classic druid behavior * added tests to cover numeric equality matching. Double match values in particular would fail to match the string values since `1.0` would become `'1.0'` which does not match `'1'`.	2024-07-08 10:58:05 -07:00
Clint Wylie	45c020060c	better javadoc for ColumnIndexSupplier (#16663 ) Updated javadoc for `ColumnIndexSupplier.as` to elaborate on the types of indexes callers might want to ask for from the method, as well as help implementors know what kinds of indexes they should implement to participate in filtering	2024-06-27 17:53:20 -07:00
Clint Wylie	d86f25c74a	fix vector grouping expression deferred evaluation to only consider dictionary encoded strings as fixed width (#16666 )	2024-06-27 16:19:16 -07:00
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Clint Wylie	d4f2636325	fix greatest/least function non-vectorized processing to ignore null argument types (#16649 )	2024-06-26 12:59:42 -07:00
Laksh Singla	71b3b5ab5d	Add query context parameter to remove null bytes when writing frames (#16579 ) MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.	2024-06-26 15:00:30 +05:30
Kashif Faraz	d9bd02256a	Refactor: Rename UsedSegmentChecker and cleanup task actions (#16644 ) Changes: - Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever` - Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction` as it is now unused and has been deprecated since #1988 - Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`	2024-06-26 10:48:59 +05:30
Tom	52c9929019	Column name in parse exceptions (#16529 ) * first pass * more changes * fix tests and formatting * fix kinesis failing tests * fix kafka tests * add dimension name to float parse errors * double and convertToType handling of dimensionName can report parse errors with dimension name * fix checkstyle issue * fix tests * more cases to have better parse exception messages * fix test * fix tests * partially address comments * annotate method parameter with nullable * address comments * fix tests * let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils * fix compilation error and clean up formatting * clean up whitespace * address feedback. undo change, pass down report parse exception for convertToType * fix test	2024-06-25 13:42:52 -07:00
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se— just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates migration of Druid to ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor as far as I am aware were they used for any purpose. This patch removes both task types completely, as well as removes all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to be firmer that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
Abhishek Radhakrishnan	7463589b07	Support for bootstrap segments (#16609 ) * Initial support for bootstrap segments. - Adds a new API in the coordinator. - All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it. - Then load the segments onto the segment cache as part of startup. - This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting. This patch also lays the foundation to speed up upgrades. * Fail open by default if there are any errors talking to the coordinator. * Add test for failure scenario and cleanup logs. * Cleanup and add debug log * Assert the events so we know the list exactly. * Revert RunRules test. The rules aren't evaluated if there are no clusters. * Revert RunRulesTest too. * Remove debug info. * Make the API POST and update log. * Fix up UTs. * Throw 503 from MetadataResource; clean up exception handling and DruidException. * Remove unused logger, add verification of metrics and docs. * Update error message * Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Adjust test metric expectations with the rename. * Add BootstrapSegmentResponse container in the response for future extensibility. * Rename to BootstrapSegmentsInfo for internal consistency. * Remove unused log. * Use a member variable for broadcast segments instead of segmentAssigner. * Minor cleanup * Add test for loadable bootstrap segments and clarify comment. * Review suggestions. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-06-24 09:27:17 -07:00
Sree Charan Manamala	990fd5f5fb	Make use group iterator for all window frames & support for same bound kinds (#16603 ) Fixes apache/druid#15739	2024-06-24 15:52:41 +02:00
Laksh Singla	00c96432af	Materialize scan results correctly when columns are not present in the segments (#16619 ) Fixes a bug causing maxSubqueryBytes not to work when segments have missing columns.	2024-06-23 23:15:45 +05:30
Akshat Jain	cd438b1918	Emit metrics for S3UploadThreadPool (#16616 ) * Emit metrics for S3UploadThreadPool * Address review comments * Revert unnecessary formatting change * Revert unnecessary formatting change in metrics.md file * Address review comments * Add metric for task duration * Minor fix in metrics.md * Add s3Key and uploadId in the log message * Address review comments * Create new instance of ServiceMetricEvent.Builder for thread safety * Address review comments * Address review comments	2024-06-21 11:36:47 +05:30
Adithya Chakilam	35709de549	CgroupCpuSetMonitor: Initialize the cgroup discoverer (#16621 )	2024-06-20 10:23:59 -07:00
Abhishek Radhakrishnan	b20c3dbadf	Fix malformed period throwing `ADMIN` persona error (#16626 ) * Turn invalid periods into user-facing exception providing more context. The current exception is targeting the ADMIN persona. Catch that and turn it into a USER persona instead. Also, provide more context in the error message. * Review comment: pass the wrapping expression and stringify. * Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2024-06-20 08:40:28 -07:00
Sree Charan Manamala	7ac0862287	Grouping Engine fix when a limit spec with different order by columns is applied (#16534 )	2024-06-20 11:35:58 +02:00
Sam Rash	a10310388f	Add Conditional Helpers to DruidException / InvalidInput (#16470 ) Adds versions of DruidException.defensive(String, Object...) InvalidInput.exception(String, Object...) InvalidInput.exception(Throwable, String, Object...) the versions add a boolean as the first arg and only create and throw an exception if it's false. It can be used similar to Preconditions.checkState/checkArgument	2024-06-18 14:05:43 +05:30
Virushade	eb842d3dda	Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed (#16595 ) * Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed * Rollback mistake	2024-06-14 15:21:07 +05:30
Laksh Singla	da1e293a57	Deserialize dimensions in group by queries to their respective types when reading from their serialized format (#16511 ) * init * tests, pair groupable * framework change * tests * update benchmarks * comments * add javadoc for the jsonMapper * remove extra deserialization * add special serde for map based result rows * revert unnecessary change --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-14 16:27:47 +08:00
Zoltan Haindrich	ac19b148c2	Upgrade calcite to 1.37.0 (#16504 ) * contains Make a full copy of the parser and apply our modifications to it #16503 * some minor api changes pair/entry * some unnecessary aggregation was removed from a set of queries in `CalciteSubqueryTest` * `AliasedOperatorConversion` was detecting `CHAR_LENGTH` as not a function ; I've removed the check * the field it was using doesn't look maintained that much * the `kind` is passed for the created `SqlFunction` so I don't think this check is actually needed * some decoupled test cases become broken - will be fixed later * some aggregate related changes: due to the fact that SUM() and COUNT() of no inputs are different * upgrade avatica to 1.25.0 * `CalciteQueryTest#testExactCountDistinctWithFilter` is now executable Close apache/druid#16503	2024-06-13 08:47:50 +02:00
Clint Wylie	fee509df2e	fix NestedDataColumnIndexerV4 to not report cardinality (#16507 ) * fix NestedDataColumnIndexerV4 to not report cardinality changes: * fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem * fix some issues with sql benchmarks	2024-06-11 20:58:12 -07:00
Clint Wylie	3fb6ba22e8	fix expression column capabilities to not report dictionary encoded unless input is string (#16577 )	2024-06-08 13:05:19 -07:00
Akshat Jain	03a38be446	Optimize S3 storage writing for MSQ durable storage (#16481 ) * Optimise S3 storage writing for MSQ durable storage * Get rid of static ConcurrentHashMap * Fix static checks * Fix tests * Remove unused constructor parameter chunkValidation + relevant cleanup * Assert etags as String instead of Integer * Fix flaky test * Inject executor service * Make threadpool size dynamic based on number of cores * Fix S3StorageDruidModuleTest * Fix S3StorageConnectorProviderTest * Fix injection issues * Add S3UploadConfig to manage maximum number of concurrent chunks dynamically based on chunk size * Address the minor review comments * Refactor S3UploadConfig + ExecutorService into S3UploadManager * Address review comments * Make updateChunkSizeIfGreater() synchronized instead of recomputeMaxConcurrentNumChunks() * Address the minor review comments * Fix intellij-inspections check * Refactor code to use futures for maxNumConcurrentChunks. Also use executor service with blocking queue for backpressure semantics. * Update javadoc * Get rid of cyclic dependency injection between S3UploadManager and S3OutputConfig * Fix RetryableS3OutputStreamTest * Remove unnecessary synchronization parts from RetryableS3OutputStream * Update javadoc * Add S3UploadManagerTest * Revert back to S3StorageConnectorProvider extends S3OutputConfig * Address Karan's review comments * Address Kashif's review comments * Change a log message to debug * Address review comments * Fix intellij-inspections check * Fix checkstyle --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-07 11:33:16 +05:30

1 2 3 4 5 ...

2757 Commits