druid

Commit Graph

Author	SHA1	Message	Date
Adarsh Sanjeev	269e035e76	Add validation for reindex with realtime sources (#16390 ) Add validation for reindex with realtime sources. With the addition of concurrent compaction, it is possible to ingest data while querying from realtime sources with MSQ into the same datasource. This could potentially lead to issues if the interval that is ingested into is replaced by an MSQ job, which has queried only some of the data from the realtime task. This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.	2024-05-07 10:32:15 +05:30
Alberic Liu	92fb0ff718	upgrade mysql:mysql-connector-java to 8.2.0 (#16024 ) * upgrade mysql:mysql-connector-java to 8.2.0 * fix the check errors * remove unused comment	2024-05-06 21:58:37 +08:00
zachjsh	fb7c84fb5d	Catalog clustering keys fixes (#16351 ) * * add another catalog clustering columns unit test * * dissallow clusterKeys with descending order * * make more clear that clustering is re-written into ingest node whether a catalog table or not * * when partitionedBy is stored in catalog, user shouldnt need to specify it in order to specify clustering * * fix intellij inspection failure	2024-05-03 14:02:56 -04:00
Jan Werner	b16401323b	update dependencies to address CVEs (#16374 ) update dependencies to address new batch of CVEs: - Azure POM from 1.2.19 to 1.2.23 to update transitive dependency nimbus-jose-jwt to address: CVE-2023-52428 - commons-configuration2 from 2.8.0 to 2.10.1 to address: CVE-2024-29131 CVE-2024-29133 - bcpkix-jdk18on from 1.76 to 1.78.1 to address: CVE-2024-30172 CVE-2024-30171 CVE-2024-29857	2024-05-02 21:35:21 -07:00
Gian Merlino	5d1950d451	MSQ controller: Support in-memory shuffles; towards JVM reuse. (#16168 ) * MSQ controller: Support in-memory shuffles; towards JVM reuse. This patch contains two controller changes that make progress towards a lower-latency MSQ. First, support for in-memory shuffles. The main feature of in-memory shuffles, as far as the controller is concerned, is that they are not fully buffered. That means that whenever a producer stage uses in-memory output, its consumer must run concurrently. The controller determines which stages run concurrently, and when they start and stop. "Leapfrogging" allows any chain of sort-based stages to use in-memory shuffles even if we can only run two stages at once. For example, in a linear chain of stages 0 -> 1 -> 2 where all do sort-based shuffles, we can use in-memory shuffling for each one while only running two at once. (When stage 1 is done reading input and about to start writing its output, we can stop 0 and start 2.) 1) New OutputChannelMode enum attached to WorkOrders that tells workers whether stage output should be in memory (MEMORY), or use local or durable storage. 2) New logic in the ControllerQueryKernel to determine which stages can use in-memory shuffling (ControllerUtils#computeStageGroups) and to launch them at the appropriate time (ControllerQueryKernel#createNewKernels). 3) New "doneReadingInput" method on Controller (passed down to the stage kernels) which allows stages to transition to POST_READING even if they are not gathering statistics. This is important because it enables "leapfrogging" for HASH_LOCAL_SORT shuffles, and for GLOBAL_SORT shuffles with 1 partition. 4) Moved result-reading from ControllerContext#writeReports to new QueryListener interface, which ControllerImpl feeds results to row-by-row while the query is still running. Important so we can read query results from the final stage using an in-memory channel. 5) New class ControllerQueryKernelConfig holds configs that control kernel behavior (such as whether to pipeline, maximum number of concurrent stages, etc). Generated by the ControllerContext. Second, a refactor towards running workers in persistent JVMs that are able to cache data across queries. This is helpful because I believe we'll want to reuse JVMs and cached data for latency reasons. 1) Move creation of WorkerManager and TableInputSpecSlicer to the ControllerContext, rather than ControllerImpl. This allows managing workers and work assignment differently when JVMs are reusable. 2) Lift the Controller Jersey resource out from ControllerChatHandler to a reusable resource. 3) Move memory introspection to a MemoryIntrospector interface, and introduce ControllerMemoryParameters that uses it. This makes it easier to run MSQ in process types other than Indexer and Peon. Both of these areas will have follow-ups that make similar changes on the worker side. * Address static checks. * Address static checks. * Fixes. * Report writer tests. * Adjustments. * Fix reports. * Review updates. * Adjust name. * Small changes.	2024-04-30 21:30:27 -07:00
Adarsh Sanjeev	fb63520de9	Add tests for ProcessorManager (#16327 ) * Add tests for ProcessorManager	2024-04-30 09:35:26 +05:30
Adithya Chakilam	f8015eb02a	Add config lagAggregate to LagBasedAutoScalerConfig (#16334 ) Changes: - Add new config `lagAggregate` to `LagBasedAutoScalerConfig` - Add field `aggregateForScaling` to `LagStats` - Use the new field/config to determine which aggregate to use to compute lag - Remove method `Supervisor.computeLagForAutoScaler()`	2024-04-29 22:20:41 +05:30
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Arun Ramani	126a0c219a	Surface lock revocation exceptions in task status (#16325 )	2024-04-26 08:39:44 +05:30
AmatyaAvadhanula	31eee7d51e	Check for handoff of upgraded segments (#16162 ) Changes: 1) Check for handoff of upgraded realtime segments. 2) Drop sink only when all associated realtime segments have been abandoned. 3) Delete pending segments upon commit to prevent unnecessary upgrades and partition space exhaustion when a concurrent replace happens. This also prevents potential data duplication. 4) Register pending segment upgrade only on those tasks to which the segment is associated.	2024-04-25 22:03:38 +05:30
Jakub Matyszewski	5061507541	pacj4: add UserProfile attributes to AuthenticationResult context (#16109 ) I'm adding OIDC context to the AuthenticationResult returned by pac4j extension. I wanted to use this context as input in OpenPolicyAgent authorization. Since AuthenticationResult already accepts context as a parameter it felt okay to pass the profile attributes there.	2024-04-25 12:10:12 +05:30
Zoltan Haindrich	9c0bd56f5b	Make QueryComponentSupliers independent from test classes (#16275 )	2024-04-25 02:12:07 -04:00
Gian Merlino	8a5cc976a9	ArrayOfDoublesSketchBuildAggregator: Fix NPE in get() for empty sketch. (#16330 ) Fixes a bug introduced in #16296, where the sketch might not be initialized if get() is called without calling aggregate(). Also adds a test for this case.	2024-04-25 00:59:59 -04:00
Laksh Singla	6bca406d31	Grouping on complex columns aka unifying GroupBy strategies (#16068 ) Users can pass complex types as dimensions to the group by queries. For example: SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1	2024-04-24 23:00:14 +05:30
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Sree Charan Manamala	080476f9ea	WINDOWING - Fix 2 nodes with same digest causing mapping issue (#16301 ) Fixes the mapping issue in window fucntions where 2 nodes get the same reference.	2024-04-24 16:45:02 +05:30
Gian Merlino	274ccbfd85	Reset buffer aggregators when resetting Groupers. (#16296 ) Buffer aggregators can contain some cached objects within them, such as Memory references or HLL Unions. Prior to this patch, various Grouper implementations were not releasing this state when resetting their own internal state, which could lead to excessive memory use. This patch renames AggregatorAdapater#close to "reset", and updates Grouper implementations to call this reset method whenever they reset their internal state. The base method on BufferAggregator and VectorAggregator remains named "close", for compatibility with existing extensions, but the contract is adjusted to say that the aggregator may be reused after the method is called. All existing implementations in core already adhere to this new contract, except for the ArrayOfDoubles build flavors, which are updated in this patch to adhere. Additionally, this patch harmonizes buffer sketch helpers to call their clear method "clear" rather than a mix of "clear" and "close". (Others were already using "clear".)	2024-04-24 05:39:24 -04:00
Parth Agrawal	f1d24c868f	[CVE Fixes] Update version of Nimbus.jose.jwt (#16320 ) * Update version of nimbus.jose.jwt.version * update licenses.yaml	2024-04-23 15:11:54 +05:30
Vishesh Garg	173a206829	Fix incorrect check of InvalidFieldException to InvalidFieldFault while generating MSQ Error Report (#16273 ) InvalidFieldFault is incorrectly checked as InvalidFieldException in mapQueryColumnNameToOutputColumnName. This fixes the bug.	2024-04-22 15:18:49 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Adithya Chakilam	cff5d1e369	Add method Supervisor.computeLagForAutoScaler (#16314 ) Tries to address the comments made on #16284 after merged. Changes: - Remove method `Supervisor.getLagMetric()` - Add method `Supervisor.computeLagForAutoScaler()` - Remove classes `LagMetric` and `LagMetricTest`	2024-04-20 07:57:50 +05:30
Akshat Jain	79e48c6b45	Fix NPE while loading lookups from empty JDBC source (#16307 )	2024-04-18 21:52:02 +05:30
zachjsh	3f2dd46ede	Catalog table should not need explicit segment granularity set (#16278 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * fix and update test * * address review comments * * address test review comments * * fix checkstyle * * fix checkstyle * * fix failing test	2024-04-17 11:46:24 -04:00
zachjsh	2351f038eb	Kafka with topicPattern can ignore old offsets spuriously (#16190 ) * * fix * * simplify * * simplify tests * * update matches function definition for Kafka Datasource Metadata * * add matchesOld * * override matches and plus for kafka based metadata / sequence numbers * * implement minus * add tests * * fix failing tests * * remove TODO comments * * simplfy and add comments * * remove unused variable in tests * * remove unneeded function * * add serde tests * * more stuff * * address review comments * * remove unneeded code.	2024-04-17 10:00:17 -04:00
Adithya Chakilam	34237bc112	Consider max lag for kinesis while autoscaling (#16284 ) * Consider max lag for kinesis while autoscaling * add test for coverage * test folder	2024-04-17 15:05:05 +05:30
AmatyaAvadhanula	f3d69f30e6	Associate pending segments with the tasks that requested them (#16144 ) Changes: - Add column `task_allocator_id` to `pendingSegments` metadata table. - Add column `upgraded_from_segment_id` to `pendingSegments` metadata table. - Add interface `PendingSegmentAllocatingTask` and implement it by all tasks which can allocate pending segments. - Use `taskAllocatorId` to identify the task (and its sub-tasks or replicas) to which a pending segment has been allocated. - Perform active cleanup of pending segments in `TaskLockbox` once there are no active tasks for the corresponding task allocator id. - When committing APPEND segments, also commit all upgraded pending segments corresponding to that task allocator id. - When committing REPLACE segments, upgrade all overlapping pending segments in the same transaction.	2024-04-17 09:06:31 +05:30
zachjsh	a5428e75ff	INSERT/REPLACE complex target column types are validated against source input expressions (#16223 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * address review comments * * address test review comments * * fix checkstyle	2024-04-16 17:20:35 -04:00
AmatyaAvadhanula	ad6bd62140	Handle task location fetch from overlord during rolling upgrades (#16227 ) Bug: #15724 introduced a bug where a rolling upgrade would cause all task locations returned by the Overlord on an older version to be unknown. Fix: If the new API fails, fall back to single task status API which always returns a valid task location.	2024-04-16 21:01:37 +05:30
YongGang	6964297b53	Remove the unused Controller context reference from Worker (#16285 )	2024-04-16 08:34:24 +05:30
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Kashif Faraz	81d7b6ebe1	Fix OverlordClient to read reports as a concrete `ReportMap` (#16226 ) Follow up to #16217 Changes: - Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap` - Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module - `TaskReport` - `KillTaskReport` - `IngestionStatsAndErrorsTaskReport` - `TaskContextReport` - `TaskReportFileWriter` - `SingleFileTaskReportFileWriter` - `TaskReportSerdeTest` - Remove `MsqOverlordResourceTestClient` as it had only one method which is already present in `OverlordResourceTestClient` itself	2024-04-15 08:00:59 +05:30
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannt. * Fix tests.	2024-04-11 15:06:11 -07:00
Vishesh Garg	3d595cfab1	Add storeCompactionState flag support to msq (#15965 ) Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted for future cases such as REPLACE and compaction done via MSQ. Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored with flag "storeCompactionState": true in the query context.	2024-04-09 16:47:47 +05:30
Vishesh Garg	9a4fb58543	Record column name for exceptions while writing frames in RowBasedFrameWriter (#16130 ) Current Runtime Exceptions generated while writing frames only include the exception itself without including the name of the column they were encountered in. This patch introduces the further information in the error and makes it non-retryable.	2024-04-09 15:39:10 +05:30
Adarsh Sanjeev	e2e0cb905c	Add reasoning for choosing shardSpec to the MSQ report (#16175 ) This PR logs the segment type and reason chosen. It also adds it to the query report, to be displayed in the UI. This PR adds a new section to the reports, segmentReport. This contains the segment type created, if the query is an ingestion, and null otherwise.	2024-04-09 11:32:02 +05:30
Gian Merlino	5e5cf9af99	Reduce upload buffer size in GoogleTaskLogs. (#16236 ) * Reduce upload buffer size in GoogleTaskLogs. Use a 1MB upload buffer, rather than the default of 15 MB in the API client. This is mainly because MMs may upload logs in parallel, and typically have small heaps. The default-sized 15 MB buffers add up quickly and can cause a MM to run out of memory. * Make bufferSize a nullable Integer. Add tests.	2024-04-08 12:54:31 -07:00
Parag Jain	f55c9e58a8	add google as external storage for msq export (#16051 ) Support for exporting msq results to gcs bucket. This is essentially copying the logic of s3 export for gs, originally done by @adarshsanjeev in this PR - #15689	2024-04-05 12:10:10 +05:30
Soumyava	4bea865697	Restore context flag for window functions (#16229 )	2024-04-03 13:57:13 +05:30
zachjsh	9b52c909e0	fix complex types returning UNKNOWN as their SQL type inference (#16216 ) * * fix * * fix * * address review comments	2024-04-02 14:36:01 -04:00
Kashif Faraz	0de44d91f1	Cleanup serialiazation of TaskReportMap (#16217 ) * Build task reports in AbstractBatchIndexTask * Minor cleanup * Apply suggestions from code review by @abhishekrb Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Cleanup IndexTaskTest * Fix formatting * Fix coverage * Cleanup serialization of TaskReport map * Replace occurrences of Map<String, TaskReport> * Return TaskReport.ReportMap for live reports, fix test comparisons * Address test failures --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-01 11:53:24 -07:00
Kashif Faraz	4df4896674	Refactor: Add common method in AbstractBatchIndexTask to create ingestion stats report (#16202 ) Changes - No functional changes - Add method `AbstractBatchIndexTask.buildIngestionStatsReport()` used in several batch tasks - Add utility method `AbstractBatchIndexTask.addBuildSegmentStatsToReport()` - Use boolean argument to represent a full report instead of the String `full` in internal methods. (REST API remains unchanged.) - Rename `IngestionStatsAndErrorsTaskReportData` to `IngestionStatsAndErrors` - Clean up some of the methods	2024-03-28 23:07:00 +05:30
Soumyava	524842a3bb	Window function on msq (#15470 ) This PR aims to introduce Window functions on MSQ by doing the following: Introduce a Window querykit for handling window queries along with its factory and a processor for window queries If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage In presence of empty OVER() clause lets all operators loose on a single rac In presence of no empty OVER() clause, breaks down each window into individual stages Associated machinery to handle window functions in MSQ Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators Guardrails around materialization Comprehensive UTs	2024-03-28 14:58:34 +05:30
Gian Merlino	7649957710	MSQ: Fix issue where AUTO assignment would not respect maxWorkerCount. (#16214 ) WorkerAssignmentStrategy.AUTO was missing a check for maxWorkerCount in the case where the inputs to a stage are not dynamically sliceable. A common case here is when the inputs to a stage are other stages.	2024-03-28 14:40:31 +05:30
zachjsh	8370db106c	INSERT/REPLACE dimension target column types are validated against source input expressions (#15962 ) * * address remaining comments from https://github.com/apache/druid/pull/15836 * * address remaining comments from https://github.com/apache/druid/pull/15908 * * add test that exposes relational algebra issue * * simplify test exposing issue * * fix * * add tests for sealed / non-sealed * * update test descriptions * * fix test failure when -Ddruid.generic.useDefaultValueForNull=true * * check type assignment based on natice Druid types * * add tests that cover missing jacoco coverage * * add replace tests * * add more tests and comments about column ordering * * simplify tests * * review comments * * remove commented line * * STRING family types should be validated as non-null	2024-03-25 12:34:07 -04:00
Aru Raghuwanshi	6e19ce5e69	Handle null values in `KafkaStringHeaderReader` (#16192 )	2024-03-23 13:05:55 +05:30
Gian Merlino	2b23d0b5b5	MSQ: Controller checker should check for "closed" only. (#16161 ) * MSQ: Controller checker should check for "closed" only. Currently, the worker's controller checker will exit the worker if the controller location is "closed" (no longer running) or if its location is empty (i.e. location unknown). This patch changes to only exit on "closed". We shouldn't exit on empty location, because that may happen if the Overlord is slow to acknowledge the location of a task. * Fix test.	2024-03-19 19:25:48 -07:00
Gian Merlino	c96b215dd6	SortMerge join support for IS NOT DISTINCT FROM. (#16003 ) * SortMerge join support for IS NOT DISTINCT FROM. The patch adds a "requiredNonNullKeyParts" field to the sortMerge processor, which has the list of key parts that must be nonnull for an equijoin condition to match. Conditions with SQL "=" are present in the list; conditions with SQL "IS NOT DISTINCT FROM" are absent from the list. * Fix test. * Update javadoc.	2024-03-19 12:02:13 -07:00
Zoltan Haindrich	1ad489a2ae	Fix build: newTempFolder (#16170 )	2024-03-19 08:53:56 -07:00
Zoltan Haindrich	0a42342cef	Update CalciteTest to use junit5 (#16106 ) Update CalciteTest to use junit5 change the way temp dirs are handled * add openrewrite workflow to safeguard upgrade * replace junitparamrunner with standard junit5 parametered tests * update a few rules to junit5 api * lots of boring changes * cleanup QueryLogHook * cleanup * fix compile error: ARRAYS_DATASOURCE * fix test * remove enclosed * empty +TEST:TDigestSketchSqlAggregatorTest,HllSketchSqlAggregatorTest,DoublesSketchSqlAggregatorTest,ThetaSketchSqlAggregatorTest,ArrayOfDoublesSketchSqlAggregatorTest,BloomFilterSqlAggregatorTest,BloomDimFilterSqlTest,CatalogIngestionTest,CatalogQueryTest,FixedBucketsHistogramQuantileSqlAggregatorTest,QuantileSqlAggregatorTest,MSQArraysTest,MSQDataSketchesTest,MSQExportTest,MSQFaultsTest,MSQInsertTest,MSQLoadedSegmentTests,MSQParseExceptionsTest,MSQReplaceTest,MSQSelectTest,InsertLockPreemptedFaultTest,MSQWarningsTest,SqlMSQStatementResourcePostTest,SqlStatementResourceTest,CalciteSelectJoinQueryMSQTest,CalciteSelectQueryMSQTest,CalciteUnionQueryMSQTest,MSQTestBase,VarianceSqlAggregatorTest,SleepSqlTest,SqlRowTransformerTest,DruidAvaticaHandlerTest,DruidStatementTest,BaseCalciteQueryTest,CalciteArraysQueryTest,CalciteCorrelatedQueryTest,CalciteExplainQueryTest,CalciteExportTest,CalciteIngestionDmlTest,CalciteInsertDmlTest,CalciteJoinQueryTest,CalciteLookupFunctionQueryTest,CalciteMultiValueStringQueryTest,CalciteNestedDataQueryTest,CalciteParameterQueryTest,CalciteQueryTest,CalciteReplaceDmlTest,CalciteScanSignatureTest,CalciteSelectQueryTest,CalciteSimpleQueryTest,CalciteSubqueryTest,CalciteSysQueryTest,CalciteTableAppendTest,CalciteTimeBoundaryQueryTest,CalciteUnionQueryTest,CalciteWindowQueryTest,DecoupledPlanningCalciteJoinQueryTest,DecoupledPlanningCalciteQueryTest,DecoupledPlanningCalciteUnionQueryTest,DrillWindowQueryTest,DruidPlannerResourceAnalyzeTest,IngestTableFunctionTest,QueryTestRunner,SqlTestFrameworkConfig,SqlAggregationModuleTest,ExpressionsTest,GreatestExpressionTest,IPv4AddressMatchExpressionTest,IPv4AddressParseExpressionTest,IPv4AddressStringifyExpressionTest,LeastExpressionTest,TimeFormatOperatorConversionTest,CombineAndSimplifyBoundsTest,FiltrationTest,SqlQueryTest,CalcitePlannerModuleTest,CalcitesTest,DruidCalciteSchemaModuleTest,DruidSchemaNoDataInitTest,InformationSchemaTest,NamedDruidSchemaTest,NamedLookupSchemaTest,NamedSystemSchemaTest,RootSchemaProviderTest,SystemSchemaTest,CalciteTestBase,SqlResourceTest * use @Nested * add rule to remove enclosed; upgrade surefire * remove enclosed * cleanup * add comment about surefire exclude	2024-03-19 04:05:12 -07:00
Adarsh Sanjeev	a151bcfd12	Fix incorrect header names for certain export queries (#16096 ) * Fix incorrect header names for certain queries * Fix incorrect header names for certain queries * Maintain upgrade compatibility * Fix tests * Change null handling	2024-03-19 15:11:04 +05:30
Gian Merlino	55c47fbcfd	MSQ: Fix NPE in getWorkerStats(). (#16159 ) TaskTracker's status is null when TaskTrackers are first set up, and stay null until the first status call comes back. This patch handles that case and sets the status code to null in the WorkerStats object in live reports.	2024-03-19 14:22:49 +05:30
Gian Merlino	8ee324c7e7	MSQ: Cancel workers more quickly. (#16158 ) Prior to this patch, when canceled, workers would keep trying to contact the controller: they would attempt to report an error, and if they were in the midst of some other call (like a counters push) they would keep trying it. This can cause cancellation to be delayed, because the controller shuts down its HTTP server before it cancels workers. Workers are then stuck retrying calls to the controller that will never succeed. The retry loops are broken when the controller gives up on them (one minute later) and exits for real. Then, the controller failure detection logic on the worker detects that the controller has failed, and the worker finally shuts down. This patch speeds up worker cancellation by bypassing communication with the controller. There is no real need for it. If the controller canceled the workers, it isn't interested in further communications from them. If the workers were canceled out-of-band, the controller can detect this through worker monitoring and report it as a WorkerFailed error.	2024-03-19 14:21:22 +05:30
Gian Merlino	36bc94c798	MSQ: Remove unnecessary snapshot deserialization code. (#16116 ) Since #13205, a special deserializer module has no longer been necessary to read key collector snapshots. This patch removes the unnecessary code.	2024-03-18 10:12:27 -07:00
Kashif Faraz	466057c61b	Remove deprecated DruidException, EntryExistsException (#14448 ) Changes: - Remove deprecated `DruidException` (old one) and `EntryExistsException` - Use newly added comprehensive `DruidException` instead - Update error message in `SqlMetadataStorageActionHandler` when max packet limit is violated. - Factor out common code from several faults into `BaseFault`. - Slightly update javadoc in `DruidException` to render it correctly - Remove unused classes `SegmentToMove`, `SegmentToDrop` - Move `ServletResourceUtils` from module `druid-processing` to `druid-server` - Add utility method to build error Response from `DruidException`.	2024-03-15 21:29:11 +05:30
AlbericByte	33bb99cd0d	remove use log of log4j v1 (#15984 )	2024-03-15 15:43:48 +05:30
Karan Kumar	5e603ac5ff	Adding more logging for s3 RetryableS3OutputStream (#16117 ) Adding more logging for s3 RetryableS3OutputStream which would help us determine if the chunk size needs to be adjusted.	2024-03-14 11:35:57 +05:30
Gian Merlino	256160aba6	MSQ: Validate that strings and string arrays are not mixed. (#15920 ) * MSQ: Validate that strings and string arrays are not mixed. When multi-value strings and string arrays coexist in the same column, it causes problems with "classic MVD" style queries such as: select * from wikipedia -- fails at runtime select count() from wikipedia where flags = 'B' -- fails at planning time select flags, count() from wikipedia group by 1 -- fails at runtime To avoid these problems, this patch adds type verification for INSERT and REPLACE. It is targeted: the only type changes that are blocked are string-to-array and array-to-string. There is also a way to exclude certain columns from the type checks, if the user really knows what they're doing. * Fixes. * Tests and docs and error messages. * More docs. * Adjustments. * Adjust message. * Fix tests. * Fix test in DV mode.	2024-03-13 15:37:27 -07:00
Gian Merlino	910124d4de	MSQ: Plan without implicit sorting. (#16073 ) * MSQ: Plan without implicit sorting. This patch adds an EngineFeature "GROUPBY_IMPLICITLY_SORTS" and sets it true for native, false for MSQ. It's useful for two reasons: 1) In the future we'll likely want MSQ to hash-partition for GROUP BY instead of using a global sort, which would mean MSQ would not implicitly ORDER BY when there is a GROUP BY. 2) When doing REPLACE with MSQ, CLUSTERED BY is transformed to ORDER BY. We should retain that ORDER BY, as it may be a subset of the GROUP BY, and it is important to remember which fields the user wanted to include in range shard specs. * Fix tests. * Fix tests for real. * Fix test.	2024-03-13 08:27:39 -07:00
Karan Kumar	84c5098473	Fix data race in getting results from MSQ select tasks. (#16107 ) * Fix data race in getting results from MSQ select tasks. * Add better logging * Handling number overflow.	2024-03-13 08:58:18 +05:30
Zoltan Haindrich	8252d72e2a	Pull up literals in InputAccessor (#16033 ) * Pull up literals in InputAccessor * pull up literals in `InputAccessor` * remove the need to pass `constants` of `Window` operator Fixes #15353 * update test * enable relax_nulls	2024-03-12 09:14:31 -07:00
Vishesh Garg	2dd8b16467	Correct the API used to fetch the version for a GCS object (#16097 ) Current API used to fetch the version for a GCS object is incorrect. This PR fixes that API.	2024-03-11 18:30:34 +05:30
Zoltan Haindrich	2eb7d7a89b	Calcite tests remove expected exception (#16046 ) * Calcite tests remove expected exception * update testcases using `expectedException` to utilize `assertThrows` instead * remove `BaseCalciteQueryTest#expectedException` * fixes `cannotVectorize` so it doesn't anymore stops further processing * `msqIncompatible` is not anymore toggles a boolean - its an `Assume` instead Fixes #15423 * cleanup * move msqIncompat * update test * cleanup * remove comment * empty-commit * empty-commit	2024-03-11 13:23:57 +05:30
Vishesh Garg	b1c1937e94	Change last update timestamp granularity of GCS objects from seconds to milliseconds (#16083 ) The previously used GCS API client library returned last update time for objects directly in milliseconds. The new library returns it in OffsetDateTime format which was being converted to seconds and stored against the object. This fix converts the time back to ms before storing it.	2024-03-09 07:54:33 +05:30
George Shiqi Wu	40ebaf83c9	Fix bug with mmless ingestion and compaction tasks on azure (#16065 ) * Update azure behavior to match s3 * Add test * Cleanup logic * fix checkstyle * Add comment	2024-03-08 15:42:44 -05:00
Jan Werner	a7b2747e56	remove aws-sdk from ranger-extension (#16011 ) Fixes # size blowup regression introduced in https://github.com/apache/druid/pull/15443 This PR removes the transitive dependency of ranger-plugins-audit to reduce the size of the compiled artifacts * add aws-logs-sdk to ensure that all the transitive dependencies are satisfied * replace aws-bundle-sdk with aws-logs-sdk * add additional guidance on ranger update, add dependency ignore to satisfy dependency analyzer * add aws-sdk-logs to list of ignored dependencies to satisfy the maven plugin * align aws-sdk versions	2024-03-08 07:35:29 -08:00
AmatyaAvadhanula	5871b81a78	Fix race in BaseNodeRoleWatcher tests (#16064 ) * Fix race in BaseNodeRoleWatcher tests * Make non static	2024-03-07 13:41:16 -08:00
Vishesh Garg	bed5d9c3b2	Remove exception on failure response from GCS delete API (#16047 ) * Throw 404 Exception on failure response from GCS delete API * Replace String.format * Apply suggestions from code review Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Remove exception for file not found and fix tests * Add warn log and fix intellij inspection errors * More intellij inspection fixes * * Change to debug log * change runtime exception class for code coverage * Add file paths for batch delete failures * Move failedPaths computation to inside isDebugEnabled flag * Correct handling of StorageException * Address review comments * Remove unused exceptions * Address code coverage and review comments * Minor corrections --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-03-07 17:57:17 +05:30
AmatyaAvadhanula	c2841425f4	Handle uninitialized cache in Node role watchers (#15726 ) BaseNodeRoleWatcher counts down cacheInitialized after a timeout, but also sets some flag that it was a timed-out initialization. and call nodeViewInitializationTimedOut (new method on listeners) instead of nodeViewInitialized. Then listeners can do what is most appropriate with this information.	2024-03-06 16:00:24 +05:30
Adarsh Sanjeev	ddd9da2e09	Make servedSegments nullable to maintain compatibility (#16034 ) * Make servedSegments nullable to maintain compatibility	2024-03-06 11:39:24 +05:30
zachjsh	720f1e834a	Add support for AzureDNSZone enabled storage accounts used for deep storage (#16016 ) * Add support for AzureDNSZone enabled storage accounts used for deep storage Added a new config to AzureAccountConfig `storageAccountEndpointSuffix` which allows the user to specify a storage account endpoint suffix where the underlying storage account is enabled for AzureDNSZone. The previous config `endpointSuffix`, did not allow support for such accounts. The previous config has been deprecated in favor of this new config. Also fixed an issue where `managedIdentityClientId` was not being set properly * * address review comments * * add back azure government link and docs	2024-03-04 16:13:28 -05:00
Gian Merlino	930655ff18	Move retries into DataSegmentPusher implementations. (#15938 ) * Move retries into DataSegmentPusher implementations. The individual implementations know better when they should and should not retry. They can also generate better error messages. The inspiration for this patch was a situation where EntityTooLarge was generated by the S3DataSegmentPusher, and retried uselessly by the retry harness in PartialSegmentMergeTask. * Fix missing var. * Adjust imports. * Tests, comments, style. * Remove unused import.	2024-03-04 10:36:21 -08:00
Adarsh Sanjeev	93eeb05eaf	Revert explain attributes change to old behaviour. (#16004 ) * Revert explain attributes change * Fix tests * Fix tests * Rename function	2024-03-04 15:56:02 +05:30
Gian Merlino	8d3ed31015	MSQ: Nicer error when sortMerge join falls back to broadcast. (#16002 ) * MSQ: Nicer error when sortMerge join falls back to broadcast. In certain cases, joins run as broadcast even when the user hinted that they wanted sortMerge. This happens when the sortMerge algorithm is unable to process the join, because it isn't a direct comparison between two fields on the LHS and RHS. When this happens, the error message from BroadcastTablesTooLargeFault is quite confusing, since it mentions that you should try sortMerge to fix it. But the user may have already configured sortMerge. This patch fixes it by having two error messages, based on whether broadcast join was used as a primary selection or as a fallback selection. * Style. * Better message.	2024-03-01 13:16:39 -08:00
Kashif Faraz	f757231420	Use cache for password hash while validating LDAP password (#15993 )	2024-02-28 18:33:33 +05:30
Adarsh Sanjeev	d2c2036ea2	Optimize MSQ realtime queries (#15399 ) Currently, while reading results from realtime tasks, requests are sent on a segment level. This is slightly wasteful, as when contacting a data servers, it is possible to transfer results for all segments which it is hosting, instead of only one segment at a time. One change this PR makes is to group the segments on the basis of servers. This reduces the number of queries to data servers made. Since we don't have access to the number of rows for realtime segments, the grouping is done with a fixed estimated number of rows for each realtime segment.	2024-02-28 11:32:14 +05:30
Karan Kumar	5bb5b41b18	Adding task pending time in MSQ reports (#15966 ) Added a new field pendingMs in MSQ task reports. This helps in figuring out the exact run time of the MSQ worker tasks. Fixed data races.	2024-02-27 14:41:28 +05:30
Laksh Singla	17e4f3ac60	Refactor GroupBy and TopN code to relax the constraint of dimensions being comparable (#15559 ) The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together. This works well for the primitive dimensions, because they are Comparable, however falls apart when the dimensions can be arrays (or in future scenarios complex columns). In cases when the dimensions are not comparable, Druid resorts to having a wrapper type ComparableStringArray and ComparableList, which is a Comparable, based on the list comparator.	2024-02-27 11:39:29 +05:30
AmatyaAvadhanula	e2b7289dea	Try to fetch the task status for an active from memory (#15724 ) * Reduce metadata calls to fetch the status for an active task	2024-02-26 13:53:05 +05:30
Zoltan Haindrich	06deda9415	ScanAndSort query fails with NPE for simple queries (#15914 ) * some stuff * add dummy fields * draft-fix * rename test * cleanup * add null * cleanup * cleanup * add test * updates * move check tp constructore * cleanup * updates/etc * fix some more * add rowSignatureMode * checkstyle/etc * override * missing msqIncompat * fix test * fixes * undo * updates * remove param	2024-02-24 15:33:50 -08:00
Adithya Chakilam	1f443d218c	Enable partition stats on streaming task completion report (#15930 ) Changes: - Add visibility into number of records processed by each streaming task per partition - Add field `recordsProcessed` to `IngestionStatsAndErrorsTaskReportData` - Populate number of records processed per partition in `SeekableStreamIndexTaskRunner`	2024-02-23 16:29:03 +05:30
zachjsh	8ebf237576	Move INSERT & REPLACE validation to the Calcite validator (#15908 ) This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, Refactoring the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node. The SqlInsert node is then validated in the calcite validator. The validation that is implemented as part of this pr, is only that for the source node, and some of the validation that was previously done in the ingest handlers. As part of this change, the partitionedBy clause can be supplied by the table catalog metadata if it exists, and can be omitted from the ingest time query in this case.	2024-02-22 14:01:59 -05:00
Clint Wylie	fe2ba8cc28	fix return type inference of parse_long, which can also be null if string is not parseable into a long (#15909 ) * fix return type inference of parse_long, which can also be null if string is not parseable into a long * fix msq test	2024-02-15 08:45:34 -08:00
Parth Agrawal	495e66f2e7	CVE Fix: Update json-path version (#15772 ) Apache Druid brings the dependency json-path which is affected by CVE-2023-51074. Its latest version 2.9.0 fixes the above CVE. Append function has been added to json-path and so the unit test to check for the append function not present has been updated. --------- Co-authored-by: Xavier Léauté <xvrl@apache.org>	2024-02-14 20:58:27 -08:00
YongGang	19ed5c863f	Enhance rolling Supervisor restarts at taskDuration (#15859 )	2024-02-14 15:44:34 -08:00
Gian Merlino	0f6a895372	Rework ExprMacro base classes to simplify implementations. (#15622 ) * Rework ExprMacro base classes to simplify implementations. This patch removes BaseScalarUnivariateMacroFunctionExpr, adds BaseMacroFunctionExpr at the top of the hierarchy (a suitable base class for ExprMacros that take either arrays or scalars), and adds an implementation for "visit" to BaseMacroFunctionExpr. The effect on implementations is generally cleaner code: - Exprs no longer need to implement "visit". - Exprs no longer need to implement "stringify", even if they don't use all of their args at runtime, because BaseMacroFunctionExpr has access to even unused args. - Exprs that accept arrays can extend BaseMacroFunctionExpr and inherit a bunch of useful methods. The only one they need to implement themselves that scalar exprs don't is "supplyAnalyzeInputs". * Make StringDecodeBase64UTFExpression a static class. * Remove unused import. * Formatting, annotation changes.	2024-02-12 15:50:45 -08:00
Lasse Mammen	4255711b3e	fix: handle BOOKMARK events in kubernetes pod discovery (#15819 )	2024-02-09 18:50:04 +05:30
Gian Merlino	21a97f4c61	Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. (#15860 ) * Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. The prior code from #15162 was reading only the low-order byte of an int representing the size of a coupon set. As a result, it would erroneously believe that a coupon set with a multiple of 256 elements was empty.	2024-02-08 08:14:28 +05:30
Adarsh Sanjeev	514b3b4d01	Add export capabilities to MSQ with SQL syntax (#15689 ) * Add test * Parser changes to support export statements * Fix builds * Address comments * Add frame processor * Address review comments * Fix builds * Update syntax * Webconsole workaround * Refactor * Refactor * Change export file path * Update docs * Remove webconsole changes * Fix spelling mistake * Parser changes, add tests * Parser changes, resolve build warnings * Fix failing test * Fix failing test * Fix IT tests * Add tests * Cleanup * Fix unparse * Fix forbidden API * Update docs * Update docs * Address review comments * Address review comments * Fix tests * Address review comments * Fix insert unparse * Add external write resource action * Fix tests * Add resource check to overlord resource * Fix tests * Add IT * Update syntax * Update tests * Update permission * Address review comments * Address review comments * Address review comments * Add tests * Add check for runtime parameter for bucket and path * Add check for runtime parameter for bucket and path * Add tests * Update docs * Fix NPE * Update docs, remove deadcode * Fix formatting	2024-02-07 22:08:50 +05:30
Pramod Immaneni	59bca0951a	Parallelize storage of incremental segments (#13982 ) During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period etc). In the case where there are a lot of dimension and metrics (1000+) it was observed that the creation/serialization of incremental segment file format for persistence and persisting the file took a while and it was blocking ingestion of new data. This affected the real-time ingestion. This serialization and persistence can be parallelized across the different time chunks. This update aims to do that. The patch adds a simple configuration parameter to the ingestion tuning configuration to specify number of persistence threads. The default value is 1 if it not specified which makes it the same as it is today.	2024-02-07 10:43:05 +05:30
Laksh Singla	ee78a0367d	Fix serialization bug in PassthroughAggregatorFactory (#15830 ) PassthroughAggregatorFactory overrides a deprecated method in the AggregatorFactory, on which it relies on for serializing one of its fields complexTypeName. This was accidentally removed, leading to a bug in the factory, where the type name doesn't get serialized properly, and places null in the type name. This PR revives that method with a different name and adds tests for the same.	2024-02-05 15:11:10 +05:30
PANKAJ KUMAR	65857dc0e7	pac4j: fix incompatible dependencies + authorization regression (#15753 ) - After upgrading the pac4j version in: https://github.com/apache/druid/pull/15522. We were not able to access the druid ui. - Upgraded the Nimbus libraries version to a compatible version to pac4j. - In the older pac4j version, when we return RedirectAction there we also update the webcontext Response status code and add the authentication URL to the header. But in the newer pac4j version, we just simply return the RedirectAction. So that's why it was not getting redirected to the generated authentication URL. - To fix the above, I have updated the NOOP_HTTP_ACTION_ADAPTER to JEE_HTTP_ACTION_ADAPTER and it updates the HTTP Response in context as per the HTTP Action.	2024-02-01 09:35:23 -08:00
George Shiqi Wu	5edfa9429f	Batch kill in azure (#15770 ) * Multi kill * add some unit tests * Fix param * Fix deleteBatchFiles * Fix unit tests * Add tests * Save work on batch kill * add tests * Fix unit tests * Update extensions-core/azure-extensions/src/main/java/org/apache/druid/storage/azure/AzureDataSegmentKiller.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Fix unit tests * Update extensions-core/azure-extensions/src/test/java/org/apache/druid/storage/azure/AzureStorageTest.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * fix test * fix test * Add test --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2024-01-31 13:41:15 -05:00
George Shiqi Wu	dbcfb2bb8b	Allow null values for account when injecting (#15777 )	2024-01-30 16:55:45 -05:00
Abhishek Radhakrishnan	dbdfae3011	Fix up typo </br /> -> <br /> and adjust interpolated exception msg in InvalidNullByteFault. (#15804 )	2024-01-30 12:44:51 -08:00
AmatyaAvadhanula	54d0e482dc	Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction (#15699 ) Consolidate RetrieveSegmentsToReplaceAction into RetrieveUsedSegmentsAction	2024-01-29 19:18:43 +05:30
George Shiqi Wu	3e512249e3	Azure multi read options (#15630 ) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Add config for multiple storage accounts * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * save work * Fix unit tests * Revert old azure input type * Separate input source * save work * Add support for app registrations * Fix unit tests * clean up spacing * Add coverage * fixes from testing * cleanup some caching behavior * Add docs * Fix spelling issues * fix more spelling errors' * Fix intellij inspections * add simple changes from pr * save work on fixing bug * Fix unit tests * Add more testing * Fix unit test * Add tests * Add annotation for azureStorage * Fix up docs * Add comment for list method * Fix tests * Remove uneeded toString * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/input-sources.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * PR changes * fix injection of StorageConnector * Fix checkstyle * clean up unit tests * More pr fixes --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-25 13:29:16 -05:00
Parth Agrawal	ed6df26a91	update salt size (#15758 ) As part of becoming FIPS compliance, we are seeing this error: salt must be at least 128 bits when we run the Druid code against FIPS Compliant cryptographic security providers. This PR fixes the salt size used in Pac4jSessionStore.java	2024-01-25 17:05:53 +05:30
George Shiqi Wu	ef0232290c	Fix AzureStorage.batchDeleteFiles (#15730 ) * Fix param * Fix deleteBatchFiles * Fix unit tests * Add tests	2024-01-24 14:37:58 -05:00
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Zoltan Haindrich	d6a12c4389	Add ability to enable ResultCache in tests (#15465 )	2024-01-22 09:02:59 -05:00
zachjsh	9d4e8053a4	Kinesis adaptive memory management (#15360 ) ### Description Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings There is a problem with the logic that this pr fixes: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance. Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes. We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html ): This pr: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified Fixed issue that when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for queue to stop being full. If the queue remains full, we’ll end up right back waiting for it after the restarted fetch. There was also a call to `newQ::offer` without check in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. Now checking return value here, and failing if false. ### Release Note Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified	2024-01-19 14:30:21 -05:00
George Shiqi Wu	dd03f7061a	Implement task payload management with azure storage (#15695 ) * Implement task payload manageent with azure storage * Remove unnecessary parameter	2024-01-18 13:21:36 -05:00
Gian Merlino	764f41d959	Clear "lineSplittable" for JSON when using KafkaInputFormat. (#15692 ) * Clear "lineSplittable" for JSON when using KafkaInputFormat. JsonInputFormat has a "withLineSplittable" method that can be used to control whether JSON is read line-by-line, or as a whole. The intent is that in streaming ingestion, "lineSplittable" is false (although it can be overridden by "assumeNewlineDelimited"), and in batch ingestion, lineSplittable is true. When a "json" format is wrapped by a "kafka" format, this isn't set properly. This patch updates KafkaInputFormat to set this on an underlying "json" format. The tests for KafkaInputFormat were overriding the "lineSplittable" parameter explicitly, which wasn't really fair, because that made them unrealistic to what happens in production. Now they omit the parameter and get the production behavior. * Add test. * Fix test coverage.	2024-01-18 03:22:41 -08:00
Gian Merlino	d3d0c1c91e	Faster parsing: reduce String usage, list-based input rows. (#15681 ) * Faster parsing: reduce String usage, list-based input rows. Three changes: 1) Reworked FastLineIterator to optionally avoid generating Strings entirely, and reduce copying somewhat. Benefits the line-oriented JSON, CSV, delimited (TSV), and regex formats. 2) In the delimited (TSV) format, when the delimiter is a single byte, split on UTF-8 bytes directly. 3) In CSV and delimited (TSV) formats, use list-based input rows when the column list is provided upfront by the user. * Fix style. * Fix inspections. * Restore validation. * Remove fastutil-extra. * Exception type. * Fixes for error messages. * Fixes for null handling.	2024-01-18 19:18:46 +08:00
Maytas Monsereenusorn	55acf2e2ff	Fix incorrect scale when reading decimal from parquet (#15715 ) * Fix incorrect scale when reading decimal from parquet * add comments * fix test	2024-01-18 02:10:27 -08:00
Maytas Monsereenusorn	a3b32fbd26	Fix comparator and remove deprecated methods from spectatorHistogram extension (#15698 ) * Remove deprecated methods from SpectatorHistogram * Remove deprecated methods from SpectatorHistogram * Remove deprecated methods from SpectatorHistogram	2024-01-17 21:59:53 -08:00
Abhishek Radhakrishnan	c27f5bf52f	Report zero values instead of unknown for empty ingest queries (#15674 ) MSQ now allows empty ingest queries by default. For such queries that don't generate any output rows, the query counters in the async status result object/task report don't contain numTotalRows and totalSizeInBytes. These properties when not set/undefined can be confusing to API clients. For example, the web-console treats it as unknown values. This patch fixes the counters by explicitly reporting them as 0 instead of null for empty ingest queries.	2024-01-17 16:26:10 +05:30
Zoltan Haindrich	8a43db9395	Range support in window expressions (support them as groups) (#15365 ) * support groups windowing mode; which is a close relative of ranges (but not in the standard) * all windows with range expressions will be executed wit it groups * it will be 100% correct in case for both bounds its true that: isCurrentRow() \|\| isUnBounded() * this covers OVER ( ORDER BY COL ) * for other cases it will have some chances of getting correct results...	2024-01-17 00:05:21 -06:00
Laksh Singla	8ba06cf723	account for null values in the stddev post aggregator (#15660 )	2024-01-16 19:57:33 +05:30
AmatyaAvadhanula	6b951b94c0	Add new context parameter for using concurrent locks (#15684 ) Changes: - Add new task context flag useConcurrentLocks. - This can be set for an individual task or at a cluster level using `druid.indexer.task.default.context`. - When set to true, any appending task would use an APPEND lock and any other ingestion task would use a REPLACE lock when using time chunk locking. - If false (default), we fall back on the context flag taskLockType and then useSharedLock.	2024-01-16 12:43:39 +05:30
Kashif Faraz	18d2a8957f	Refactor: Cleanup test impls of ServiceEmitter (#15683 )	2024-01-15 17:37:00 +05:30
Gian Merlino	500681d0cb	Add ImmutableLookupMap for static lookups. (#15675 ) * Add ImmutableLookupMap for static lookups. This patch adds a new ImmutableLookupMap, which comes with an ImmutableLookupExtractor. It uses a fastutil open hashmap plus two lists to store its data in such a way that forward and reverse lookups can both be done quickly. I also observed footprint to be somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million row lookup. The main advantage, though, is that reverse lookups can be done much more quickly than MapLookupExtractor (which iterates the entire map for each call to unapplyAll). This speeds up the recently added ReverseLookupRule (#15626) during SQL planning with very large lookups. * Use in one more test. * Fix benchmark. * Object2ObjectOpenHashMap * Fixes, and LookupExtractor interface update to have asMap. * Remove commented-out code. * Fix style. * Fix import order. * Add fastutil. * Avoid storing Map entries.	2024-01-13 13:14:01 -08:00
Gian Merlino	2231cb30a4	Faster k-way merging using tournament trees, 8-byte key strides. (#15661 ) * Faster k-way merging using tournament trees, 8-byte key strides. Two speedups for FrameChannelMerger (which does k-way merging in MSQ): 1) Replace the priority queue with a tournament tree, which does fewer comparisons. 2) Compare keys using 8-byte strides, rather than 1 byte at a time. * Adjust comments. * Fix style. * Adjust benchmark and test. * Add eight-list test (power of two).	2024-01-11 08:36:22 -08:00
Laksh Singla	87fbe42218	"Partition boost" the group by queries in MSQ for better splits (#15474 ) "Partition boost" the group by queries in MSQ for better splits	2024-01-11 12:46:27 +05:30
Kashif Faraz	d623756c66	Add cache for password hash in druid-basic-security (#15648 ) Add class PasswordHashGenerator. Move hashing logic from BasicAuthUtils to this new class. Add cache in the hash generator to contain the computed hash of passwords and boost validator performance Cache has max size 1000 and expiry 1 hour Key of the cache is an SHA-256 hash of the (password + random salt generated on service startup)	2024-01-10 17:46:09 +05:30
Ankit Kothari	355c2f5da0	Add sql + ingestion compatibility for first/last on numeric values (#15607 ) SQL compatibility for numeric last and first column types. Ingestion UI now provides option for first and last aggregation as well.	2024-01-10 12:59:38 +05:30
PANKAJ KUMAR	047c7340ab	Adding retries to update the metadata store instead of failure (#15141 ) Currently, If 2 tasks are consuming from the same partitions, try to publish the segment and update the metadata, the second task can fail because the end offset stored in the metadata store doesn't match with the start offset of the second task. We can fix this by retrying instead of failing. AFAIK apart from the above issue, the metadata mismatch can happen in 2 scenarios: - when we update the input topic name for the data source - when we run 2 replicas of ingestion tasks(1 replica will publish and 1 will fail as the first replica has already updated the metadata). Implemented the comparable function to compare the last committed end offset and new Sequence start offset. And return a specific error msg for this. Add retry logic on indexers to retry for this specific error msg. Updated the existing test case.	2024-01-10 12:30:54 +05:30
Misha	ea6ba40ce1	Add support for Azure Goverment storage (#15523 ) Added support for Azure Government storage in Druid Azure-Extensions. This enhancement allows the Azure-Extensions to be compatible with different Azure storage types by updating the endpoint suffix from a hardcoded value to a configurable one.	2024-01-09 22:33:32 +05:30
Jonathan Wei	5d1e66b8f9	Allow broker to use catalog for datasource schemas for SQL queries (#15469 ) * Allow broker to use catalog for datasource schemas * More PR comments * PR comments	2024-01-08 13:46:08 -06:00
Clint Wylie	c221a2634b	overhaul DruidPredicateFactory to better handle 3VL (#15629 ) * overhaul DruidPredicateFactory to better handle 3VL fixes some bugs caused by some limitations of the original design of how DruidPredicateFactory interacts with 3-value logic. The primary impacted area was with how filters on values transformed with expressions or extractionFn which turn non-null values into nulls, which were not possible to be modelled with the 'isNullInputUnknown' method changes: * adds DruidObjectPredicate to specialize string, array, and object based predicates instead of using guava Predicate * DruidPredicateFactory now uses DruidObjectPredicate * introduces DruidPredicateMatch enum, which all predicates returned from DruidPredicateFactory now use instead of booleans to indicate match. This means DruidLongPredicate, DruidFloatPredicate, DruidDoublePredicate, and the newly added DruidObjectPredicate apply methods all now return DruidPredicateMatch. This allows matchers and indexes * isNullInputUnknown has been removed from DruidPredicateFactory * rename, fix test * adjust * style * npe * more test * fix default value mode to not match new test	2024-01-05 19:08:02 -08:00
Zoltan Haindrich	b9679d0884	Run filter-into-join rule early for subqueries and disable project-filter rule (#15511 ) FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however if the query starts highly underdefined (join conditions in the where clauses) that generic query could give a lot of room for the other rules to play around with only enabled it for when the join uses subqueries for its inputs. PROJECT_FILTER rule is not that useful. and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in https://github.com/apache/druid/pull/15302	2024-01-04 15:33:45 +05:30
George Shiqi Wu	8e95cea8e5	Azure client upgrade to allow identity options (#15287 ) * Include new dependencies * Mostly implemented * More azure fixes * Tests passing * Unit tests running * Test running after removing storage exception * Happy with coverage now * Add more tests * fix client factory * cleanup from testing * Remove old client * update docs * Exclude from spellcheck * Add licenses * Fix identity version * Save work * Add azure clients * add licenses * typos * Add dependencies * Exception is not thrown * Fix intellij check * Don't need to override * specify length * urldecode * encode path * Fix checks * Revert urlencode changes * Urlencode with azure library * Update docs/development/extensions-core/azure.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * PR changes * Update docs/development/extensions-core/azure.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Deprecate AzureTaskLogsConfig.maxRetries * Clean up azure retry block * logic update to reuse clients * fix comments * Create container conditionally * Fix key auth * Remove container client logic * Add some more testing * Update comments * Add a comment explaining client reuse * Move logic to factory class * use bom for dependency management * fix license versions --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-03 18:36:05 -05:00
Gian Merlino	e40b96e026	Reverse lookup fixes and enhancements. (#15611 ) * Reverse lookup fixes and enhancements. 1) Add a "mayIncludeUnknown" parameter to DimFilter#optimize. This is important because otherwise the reverse-lookup optimization is done improperly when the "in" filter appears under a "not", and the lookup extractionFn may return null for some possible values of the filtered column. The "includeUnknown" test cases in InDimFilterTest illustrate the difference in behavior. 2) Enhance InDimFilter#optimizeLookup to handle "mayIncludeUnknown", and to be able to do a reverse lookup in a wider variety of cases. 3) Make "unapply" protected in LookupExtractor, and move callers to "unapplyAll". The main reason is that MapLookupExtractor, a common implementation, lacks a reverse mapping and therefore does a scan of the map for each call to "unapply". For performance sake these calls need to be batched. * Remove optimize call from BloomDimFilter. * Follow the law. * Fix tests. * Fix imports. * Switch function. * Fix tests. * More tests.	2024-01-03 13:28:44 -08:00
Abhishek Radhakrishnan	9c7d7fc777	Allow empty inserts and replaces in MSQ. (#15495 ) * Allow empty inserts and replace. - Introduce a new query context failOnEmptyInsert which defaults to false. - When this context is false (default), MSQE will now allow empty inserts and replaces. - When this context is true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault. - For REPLACE ALL over an ALL grain segment, the query will generate a tombstone spanning eternity which will be removed eventually be the coordinator. - Add unit tests in MSQInsertTest, MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false) - Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true) * Ignore test to see if it's the culprit for OOM * Add heap dump config * Bump up -Xmx from 1500 MB to 2048 MB * Add steps to tarball and collect hprof dump to GHA action * put back mx to 1500MB to trigger the failure * add the step to reusable unit test workflow as well * Revert the temp heap dump & @Ignore changes since max heap size is increased * Minor updates * Review comments 1. Doc suggestions 2. Add tests for empty insert and replace queries with ALL grain and limit in the default failOnEmptyInsert mode (=false). Add similar tests to MSQFaultsTest with failOnEmptyInsert = true, so the query does fail with an InsertCannotBeEmpty fault. 3. Nullable annotation and javadocs * Add comment replace_limit.patch	2024-01-02 13:05:51 -08:00
Kashif Faraz	9f568858ef	Add logging implementation for AuditManager and audit more endpoints (#15480 ) Changes - Add `log` implementation for `AuditManager` alongwith `SQLAuditManager` - `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for all `fetchAuditHistory` calls. - Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default) - Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`. This gets activated only if `type` is `log`. - Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs - Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of audit payload and other utility methods.	2023-12-19 13:14:04 +05:30
Vishesh Garg	e43bb74c3a	Add MSQ Durable Storage Connector for Google Cloud Storage and change current Google Cloud Storage client library (#15398 ) The PR addresses 2 things: Add MSQ durable storage connector for GCS Change GCS client library from the old Google API Client Library to the recommended Google Cloud Client Library. Ref: https://cloud.google.com/apis/docs/client-libraries-explained	2023-12-14 07:34:49 +05:30
Keerthana Srikanth	f32dbd4131	Upgrade pac4j-oidc to 4.5.7 to address CVE-2021-44878 (#15522 ) * Upgrade org.pac4j:pac4j-oidc to 4.5.5 to address CVE-2021-44878 * add CVE suppression and notes, since vulnerability scan still shows this CVE * Add tests to improve coverage	2023-12-13 10:44:05 -08:00
AmatyaAvadhanula	48a96f5d06	Better automatic offset reset for Kinesis ingestion (#15338 ) Better automatic offset reset for Kinesis ingestion	2023-12-13 12:03:17 +05:30
Jan Werner	3c7dec56ca	update kubernetes java client to 19.0.0 and docker-java to 3.3.4 (#15449 ) Update of direct dependencies: * kubernetes java-client to 19.0.0 * docker-java-bom to 3.3.4 In order to update transitive dependencies: * okio to 3.6.0 * bcjava to 1.76 To address CVES: - CVE-2023-3635 in okio - CVE-2023-33201 in bcjava --------- Co-authored-by: Xavier Léauté <xvrl@apache.org>	2023-12-12 14:27:57 -08:00
Adarsh Sanjeev	254a8eb7e0	Add null checks for HllSketchHolder (#15502 ) Fixes a potential NPE which could occur while folding the HllSketchAggregator. If the sketch is null, druid could return a null HllSketchHolder object. Adding a null check here could help here Resolves a null pointer exception in HllSketchAggregatorFactory	2023-12-08 11:43:04 +05:30
Jan Werner	ff0e838d30	add gson to dependencyManagement (#15488 ) This change completes the change introduced in #15461 and unifies the version of gson dependency used between all the modules. gson is used by kubernetes-extension, avro-extensions, ranger-security, and as a test dependency in several core modules. --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 11:50:32 -08:00
Jan Werner	f4856bc1c1	ranger-security: exclude jackson-jaxrs from + fix outdated documentation (#15481 ) * Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172 * remove the reference to outdated ranger 2.0 from the docs --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 08:24:37 -08:00
Adarsh Sanjeev	ddd2299272	Add null check for VarianceAggregatorCollector	2023-12-04 22:26:44 +05:30
Jan Werner	b854058491	remove unnecessary elasticsearch dependencies to fix CVE regressions (#15443 ) Recent upgrade of ranger introduced CVE regressions due to outdated elasticsearch components. Druid-ranger-plugin does not elasticsearch components , and they have been explicitly removed. Update woodstox-core to 6.4.0 to address GHSA-3f7h-mf4q-vrm4	2023-12-03 20:56:40 +05:30
AmatyaAvadhanula	4a594bb9f6	Use task actions to fetch used segments in MSQ (#15284 ) * Use task actions to fetch used segments in MSQ * Fix tests * Fixing tests. * Revert "Fix tests" This reverts commit `95ab6494` * Removing conditional check in tests. * Pulling in latest changes. --------- Co-authored-by: cryptoe <karankumar1100@gmail.com>	2023-12-01 15:29:33 +05:30
Abhishek Agarwal	0a56c87e93	SQL: Plan non-equijoin conditions as cross join followed by filter (#15302 ) This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.	2023-11-29 13:46:11 +05:30
Jan Werner	ee6ad36fab	update confluent's dependencies to common, supported version (#15441 ) * update confluent's dependencies to common, supported version Update io.confluent.* dependencies to common, updated version 6.2.12 currently used versions are EOL * move version definition to the top level pom	2023-11-28 21:35:22 -08:00
Clint Wylie	97623b408c	add optional 'castToType' parameter to 'auto' column schema (#15417 ) * auto but.. with an expected type	2023-11-28 17:19:23 -08:00
Kashif Faraz	67c7b6248c	Fix log typos, clean up some kill messages in SeekableStreamSupervisor (#15424 ) Changes: - Fix log `Got end of partition marker for partition [%s] from task [%s] in discoverTasks` by fixing order of args - Simplify in-line classes by using lambda - Update kill task message from `Task [%s] failed to respond to [set end offsets] in a timely manner, killing task` to `Failed to set end offsets, killing task` - Clean up tests	2023-11-24 16:09:10 +05:30
Vivek Dhiman	c14cfc2a86	Patched security vulnerability by updating Ranger libraries to the ne… (#15363 ) Patched security vulnerability by updating Ranger libraries to the newest available version.	2023-11-22 15:47:18 +05:30
Kashif Faraz	4ba3cf5221	Add test to verify sequence name of Kafka task (#15397 ) * Add test to verify sequence name of Kafka and Kinesis tasks	2023-11-21 10:17:32 +05:30
Magnus Henoch	67f45fa7bf	Fix histograms for sketches where min and max are equal (#15381 ) There is a problem with Quantiles sketches and KLL Quantiles sketches. Queries using the histogram post-aggregator fail if: - the sketch contains at least one value, and - the values in the sketch are all equal, and - the splitPoints argument is not passed to the post-aggregator, and - the numBins argument is greater than 2 (or not specified, which leads to the default of 10 being used) In that case, the query fails and returns this error: { "error": "Unknown exception", "errorClass": "org.apache.datasketches.common.SketchesArgumentException", "host": null, "errorCode": "legacyQueryException", "persona": "OPERATOR", "category": "RUNTIME_FAILURE", "errorMessage": "Values must be unique, monotonically increasing and not NaN.", "context": { "host": null, "errorClass": "org.apache.datasketches.common.SketchesArgumentException", "legacyErrorCode": "Unknown exception" } } This behaviour is undesirable, since the caller doesn't necessarily know in advance whether the sketch has values that are diverse enough. With this change, the post-aggregators return [N, 0, 0...] instead of crashing, where N is the number of values in the sketch, and the length of the list is equal to numBins. That is what they already returned for numBins = 2. Here is an example of a query that would fail: {"queryType":"timeseries", "dataSource": { "type": "inline", "columnNames": ["foo", "bar"], "rows": [ ["abc", 42.0], ["def", 42.0] ] }, "intervals":["0000/3000"], "granularity":"all", "aggregations":[ {"name":"the_sketch", "fieldName":"bar", "type":"quantilesDoublesSketch"}], "postAggregations":[ {"name":"the_histogram", "type":"quantilesDoublesSketchToHistogram", "field":{"type":"fieldAccess","fieldName":"the_sketch"}, "numBins": 3}]} I believe this also fixes issue #10585.	2023-11-16 12:31:22 +05:30
Krishna Anandan	53797b9e49	Fixed a flaky test in `S3DataSegmentPusherConfigTest#testSerialization` by changing string to key:value pair (#15207 ) * Fix capacity response in mm-less ingestion (#14888) Changes: - Fix capacity response in mm-less ingestion. - Add field usedClusterCapacity to the GET /totalWorkerCapacity response. This API should be used to get the total ingestion capacity on the overlord. - Remove method `isK8sTaskRunner` from interface `TaskRunner` * Using Map to perform comparison * Minor Change --------- Co-authored-by: George Shiqi Wu <george.wu@imply.io>	2023-11-15 09:05:55 -08:00
Krishna Anandan	4ca5acdc33	Fixed 2 Flaky Tests (#15376 )	2023-11-15 18:40:09 +05:30
Karan Kumar	a70a3d5d48	Fix cancellation bug in MSQ. (#15368 ) Saw bug where MSQ controller task would continue to hold the task slot even after cancel was issued. This was due to a deadlock created on work launch. The main thread was waiting for tasks to spawn and the cancel thread was waiting for tasks to finish. The fix was to instruct the MSQWorkerTaskLauncher thread to stop creating new tasks which would enable the main thread to unblock and release the slot. Also short circuited the taskRetriable condition. Now the check is run in the MSQWorkerTaskLauncher thread as opposed to the main event thread loop. This will result in faster task failure in case the task is deemed to be non retriable.	2023-11-15 18:22:51 +05:30
Abhishek Radhakrishnan	2e79fd56a7	MSQ generates tombstones honoring granularity specified in a `REPLACE` query. (#15243 ) * MSQ generates tombstones honoring the query's granularity. This change tweaks to only account for the infinite-interval tombstones. For finite-interval tombstones, the MSQ query granualrity will be used which is consistent with how MSQ works. * more tests and some cleanup. * checkstyle * comment edits * Throw TooManyBuckets fault based on review; add more tests. * Add javadocs for both methods on reconciling the methods. * review: Move testReplaceTombstonesWithTooManyBucketsThrowsException to MsqFaultsTest * remove unused imports. * Move TooManyBucketsException to indexing package for shared exception handling. * lower max bucket for tests and fixup count * Advance and count the iterator. * checkstyle	2023-11-14 23:35:36 -08:00
Krishna Anandan	06744d3827	Changing a string to key:value pair to fix flakiness in `testSerializationWithDefaults` (#15317 ) * + Fix for Flaky Test * + Replacing TreeMap with LinkedHashMap * + Changing data structure from LinkedHashMap to HashMap * Fixed flaky test in S3DataSegmentPusherConfigTest.testSerializationValidatingMaxListingLength * Minor Changes	2023-11-14 15:59:07 -08:00
Krishna Anandan	5edeac28df	+ Switching Comparison from String to JSON (#15364 )	2023-11-14 08:07:19 -08:00
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806 )	2023-11-07 10:07:28 -08:00

1 2 3 4 5 ...

1591 Commits