druid

Commit Graph

Author	SHA1	Message	Date
Alberic Liu	0eaa810e89	Fix the maven warning during build (#16746 )	2024-07-18 14:56:15 +08:00
Akshat Jain	b53c26f5c5	Fix issues with partitioning boundaries for MSQ window functions (#16729 ) * Fix issues with partitioning boundaries for MSQ window functions * Address review comments * Address review comments * Add test for coverage check failure * Address review comment * Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest * Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java * Address review comments * Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory * Address review comment * Fix checkstyle --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 10:05:09 +08:00
Kashif Faraz	89066b72cf	Fix bug in TaskStorageQueryAdapter (#16750 ) Changes: - Do not hold a reference to `TaskQueue` in `TaskStorageQueryAdapter` - Use `TaskStorage` instead of `TaskStorageQueryAdapter` in `IndexerMetadataStorageAdapter` - Rename `TaskStorageQueryAdapter` to `TaskQueryTool` - Fix newly added task actions `RetrieveUpgradedFromSegmentIds` and `RetrieveUpgradedToSegmentIds` by removing `isAudited` method.	2024-07-17 23:17:41 +05:30
Sree Charan Manamala	40ef9fc4ec	Bug fix for array type selector causing array aggregation over window frame fail (#16653 )	2024-07-17 14:09:56 +02:00
Kashif Faraz	9f6ce6ddc0	Remove task action audit logging and druid_taskLog metadata table (#16309 ) Description: Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368. As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. - Only usage of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments` which returns the list of segments created by a task. - The use case is really narrow and no prod clusters really use this information. - There can be better ways of obtaining this information, such as the metric `segment/added/bytes` which reports both the segment ID and task ID when a segment is committed by a task. We could also include committed segment IDs in task reports. - A task persisting several segments would bloat up the audit logs table putting unnecessary strain on metadata storage. Changes: - Remove `TaskAuditLogConfig` - Remove method `TaskAction.isAudited()`. No task action is audited anymore. - Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction` is the new incarnation which has been in use for a while. - Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore but need to be retained for backward compatibility of extensions. - Do not create `druid_taskLog` metadata table anymore.	2024-07-17 17:09:00 +05:30
Sree Charan Manamala	78a4a09d01	Window Function offset correction for RAC (#16718 ) * When an ArrayList RAC creates a child RAC, the start and end offsets need to have the offset of parent's start offset * Defaults the 2nd window bound to CURRENT ROW when only a single bound is specified * Removes the windowingStrictValidation warning and throws a hard exception when Order By alongside RANGE clause is not provided with UNBOUNDED or CURRENT ROW as both bounds	2024-07-15 12:43:27 +02:00
Laksh Singla	209f8a9546	Deserialize complex dimensions in group by queries to their respective types when reading from spilled files and cached results (#16620 ) Like #16511, but for keys that have been spilled or cached during the grouping process	2024-07-15 15:00:17 +05:30
Laksh Singla	3a1b437056	Improve the fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes (#16679 ) Better fallback strategy when the broker is unable to materialize the subquery's results as frames for estimating the bytes: a. We don't touch the subquery sequence till we know that we can materialize the result as frames	2024-07-12 21:49:12 +05:30
Vishesh Garg	197c54f673	Auto-Compaction using Multi-Stage Query Engine (#16291 ) Description: Compaction operations issued by the Coordinator currently run using the native query engine. As majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative that we support compaction on MSQ to make Compaction more robust and possibly faster. For instance, we have seen OOM errors in native compaction that MSQ could have handled by its auto-calculation of tuning parameters. This commit enables compaction on MSQ to remove the dependency on native engine. Main changes: * `DataSourceCompactionConfig` now has an additional field `engine` that can be one of `[native, msq]` with `native` being the default. * if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which could happen in case a fraction of the tasks were allotted and they eventually fell short of the number of tasks required by the MSQ engine to run the compaction. * `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field. * `CompactionTask` now has `CompactionRunner` interface instance with its implementations `NativeCompactionRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension. The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the `CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. * `CompactionTask` uses the `CompactionRunner` instance it receives to create the indexing tasks. * `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in the segment schema. If present, the task is created with a native group-by query; if not, the task is issued with a scan query. The `storeCompactionState` flag is set in the context. * Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask` since otherwise, the workers will be unable to determine the controller task's location for communication (as they haven't been launched via the overlord).	2024-07-12 16:40:20 +05:30
Clint Wylie	dca31d466c	minor adjustments for performance (#16714 ) changes: * switch to stop using some string.format * switch some streams to classic loops	2024-07-11 16:57:15 -07:00
Clint Wylie	b3c238457f	fix unnest bugs (#16723 ) changes: * fixes a bug with unnest storage adapter not preserving underlying columns dictionary uniqueness when allowing dimension selector cursor * fixes a bug with unnest on realtime segments with empty rows incorrectly specifying index 0 as the row dictionary value	2024-07-11 13:48:15 -07:00
Clint Wylie	d6c07270a5	fix issues with join filter pushdown and virtual column resolution (#16702 )	2024-07-11 04:26:07 -07:00
Clint Wylie	09e0eefdc3	modify equality and typed in filter behavior for numeric match values on string columns (#16593 ) * fix equality and typed in filter behavior for numeric match values on string columns changes: * EqualityFilter and TypedInfilter numeric match values against string columns will now cast strings to numeric values instead of converting the numeric values directly to string for pure string equality, which is consistent with the casts which are eaten in the SQL layer, as well as classic druid behavior * added tests to cover numeric equality matching. Double match values in particular would fail to match the string values since `1.0` would become `'1.0'` which does not match `'1'`.	2024-07-08 10:58:05 -07:00
Clint Wylie	45c020060c	better javadoc for ColumnIndexSupplier (#16663 ) Updated javadoc for `ColumnIndexSupplier.as` to elaborate on the types of indexes callers might want to ask for from the method, as well as help implementors know what kinds of indexes they should implement to participate in filtering	2024-06-27 17:53:20 -07:00
Clint Wylie	d86f25c74a	fix vector grouping expression deferred evaluation to only consider dictionary encoded strings as fixed width (#16666 )	2024-06-27 16:19:16 -07:00
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Clint Wylie	d4f2636325	fix greatest/least function non-vectorized processing to ignore null argument types (#16649 )	2024-06-26 12:59:42 -07:00
Laksh Singla	71b3b5ab5d	Add query context parameter to remove null bytes when writing frames (#16579 ) MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.	2024-06-26 15:00:30 +05:30
Kashif Faraz	d9bd02256a	Refactor: Rename UsedSegmentChecker and cleanup task actions (#16644 ) Changes: - Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever` - Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction` as it is now unused and has been deprecated since #1988 - Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`	2024-06-26 10:48:59 +05:30
Tom	52c9929019	Column name in parse exceptions (#16529 ) * first pass * more changes * fix tests and formatting * fix kinesis failing tests * fix kafka tests * add dimension name to float parse errors * double and convertToType handling of dimensionName can report parse errors with dimension name * fix checkstyle issue * fix tests * more cases to have better parse exception messages * fix test * fix tests * partially address comments * annotate method parameter with nullable * address comments * fix tests * let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils * fix compilation error and clean up formatting * clean up whitespace * address feedback. undo change, pass down report parse exception for convertToType * fix test	2024-06-25 13:42:52 -07:00
Clint Wylie	37a50e6803	Remove index_realtime and index_realtime_appenderator tasks (#16602 ) index_realtime tasks were removed from the documentation in #13107. Even at that time, they weren't really documented per se— just mentioned. They existed solely to support Tranquility, which is an obsolete ingestion method that predates migration of Druid to ASF and is no longer being maintained. Tranquility docs were also de-linked from the sidebars and the other doc pages in #11134. Only a stub remains, so people with links to the page can see that it's no longer recommended. index_realtime_appenderator tasks existed in the code base, but were never documented, nor as far as I am aware were they used for any purpose. This patch removes both task types completely, as well as removes all supporting code that was otherwise unused. It also updates the stub doc for Tranquility to be firmer that it is not compatible. (Previously, the stub doc said it wasn't recommended, and pointed out that it is built against an ancient 0.9.2 version of Druid.) ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2024-06-24 20:13:33 -07:00
Abhishek Radhakrishnan	7463589b07	Support for bootstrap segments (#16609 ) * Initial support for bootstrap segments. - Adds a new API in the coordinator. - All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it. - Then load the segments onto the segment cache as part of startup. - This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting. This patch also lays the foundation to speed up upgrades. * Fail open by default if there are any errors talking to the coordinator. * Add test for failure scenario and cleanup logs. * Cleanup and add debug log * Assert the events so we know the list exactly. * Revert RunRules test. The rules aren't evaluated if there are no clusters. * Revert RunRulesTest too. * Remove debug info. * Make the API POST and update log. * Fix up UTs. * Throw 503 from MetadataResource; clean up exception handling and DruidException. * Remove unused logger, add verification of metrics and docs. * Update error message * Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Adjust test metric expectations with the rename. * Add BootstrapSegmentResponse container in the response for future extensibility. * Rename to BootstrapSegmentsInfo for internal consistency. * Remove unused log. * Use a member variable for broadcast segments instead of segmentAssigner. * Minor cleanup * Add test for loadable bootstrap segments and clarify comment. * Review suggestions. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-06-24 09:27:17 -07:00
Sree Charan Manamala	990fd5f5fb	Make use group iterator for all window frames & support for same bound kinds (#16603 ) Fixes apache/druid#15739	2024-06-24 15:52:41 +02:00
Laksh Singla	00c96432af	Materialize scan results correctly when columns are not present in the segments (#16619 ) Fixes a bug causing maxSubqueryBytes not to work when segments have missing columns.	2024-06-23 23:15:45 +05:30
Akshat Jain	cd438b1918	Emit metrics for S3UploadThreadPool (#16616 ) * Emit metrics for S3UploadThreadPool * Address review comments * Revert unnecessary formatting change * Revert unnecessary formatting change in metrics.md file * Address review comments * Add metric for task duration * Minor fix in metrics.md * Add s3Key and uploadId in the log message * Address review comments * Create new instance of ServiceMetricEvent.Builder for thread safety * Address review comments * Address review comments	2024-06-21 11:36:47 +05:30
Adithya Chakilam	35709de549	CgroupCpuSetMonitor: Initialize the cgroup discoverer (#16621 )	2024-06-20 10:23:59 -07:00
Abhishek Radhakrishnan	b20c3dbadf	Fix malformed period throwing `ADMIN` persona error (#16626 ) * Turn invalid periods into user-facing exception providing more context. The current exception is targeting the ADMIN persona. Catch that and turn it into a USER persona instead. Also, provide more context in the error message. * Review comment: pass the wrapping expression and stringify. * Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2024-06-20 08:40:28 -07:00
Sree Charan Manamala	7ac0862287	Grouping Engine fix when a limit spec with different order by columns is applied (#16534 )	2024-06-20 11:35:58 +02:00
Sam Rash	a10310388f	Add Conditional Helpers to DruidException / InvalidInput (#16470 ) Adds versions of DruidException.defensive(String, Object...) InvalidInput.exception(String, Object...) InvalidInput.exception(Throwable, String, Object...) the versions add a boolean as the first arg and only create and throw an exception if it's false. It can be used similar to Preconditions.checkState/checkArgument	2024-06-18 14:05:43 +05:30
Virushade	eb842d3dda	Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed (#16595 ) * Remove redundant check on optional in BlockingQueueFrameChannel.Writable#isClosed * Rollback mistake	2024-06-14 15:21:07 +05:30
Laksh Singla	da1e293a57	Deserialize dimensions in group by queries to their respective types when reading from their serialized format (#16511 ) * init * tests, pair groupable * framework change * tests * update benchmarks * comments * add javadoc for the jsonMapper * remove extra deserialization * add special serde for map based result rows * revert unnecessary change --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-14 16:27:47 +08:00
Zoltan Haindrich	ac19b148c2	Upgrade calcite to 1.37.0 (#16504 ) * contains Make a full copy of the parser and apply our modifications to it #16503 * some minor api changes pair/entry * some unnecessary aggregation was removed from a set of queries in `CalciteSubqueryTest` * `AliasedOperatorConversion` was detecting `CHAR_LENGTH` as not a function ; I've removed the check * the field it was using doesn't look maintained that much * the `kind` is passed for the created `SqlFunction` so I don't think this check is actually needed * some decoupled test cases become broken - will be fixed later * some aggregate related changes: due to the fact that SUM() and COUNT() of no inputs are different * upgrade avatica to 1.25.0 * `CalciteQueryTest#testExactCountDistinctWithFilter` is now executable Close apache/druid#16503	2024-06-13 08:47:50 +02:00
Clint Wylie	fee509df2e	fix NestedDataColumnIndexerV4 to not report cardinality (#16507 ) * fix NestedDataColumnIndexerV4 to not report cardinality changes: * fix issue similar to #16489 but for NestedDataColumnIndexerV4, which can report STRING type if it only processes a single type of values. this should be less common than the auto indexer problem * fix some issues with sql benchmarks	2024-06-11 20:58:12 -07:00
Clint Wylie	3fb6ba22e8	fix expression column capabilities to not report dictionary encoded unless input is string (#16577 )	2024-06-08 13:05:19 -07:00
Akshat Jain	03a38be446	Optimize S3 storage writing for MSQ durable storage (#16481 ) * Optimise S3 storage writing for MSQ durable storage * Get rid of static ConcurrentHashMap * Fix static checks * Fix tests * Remove unused constructor parameter chunkValidation + relevant cleanup * Assert etags as String instead of Integer * Fix flaky test * Inject executor service * Make threadpool size dynamic based on number of cores * Fix S3StorageDruidModuleTest * Fix S3StorageConnectorProviderTest * Fix injection issues * Add S3UploadConfig to manage maximum number of concurrent chunks dynamically based on chunk size * Address the minor review comments * Refactor S3UploadConfig + ExecutorService into S3UploadManager * Address review comments * Make updateChunkSizeIfGreater() synchronized instead of recomputeMaxConcurrentNumChunks() * Address the minor review comments * Fix intellij-inspections check * Refactor code to use futures for maxNumConcurrentChunks. Also use executor service with blocking queue for backpressure semantics. * Update javadoc * Get rid of cyclic dependency injection between S3UploadManager and S3OutputConfig * Fix RetryableS3OutputStreamTest * Remove unnecessary synchronization parts from RetryableS3OutputStream * Update javadoc * Add S3UploadManagerTest * Revert back to S3StorageConnectorProvider extends S3OutputConfig * Address Karan's review comments * Address Kashif's review comments * Change a log message to debug * Address review comments * Fix intellij-inspections check * Fix checkstyle --------- Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-07 11:33:16 +05:30
Gian Merlino	277006446d	Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. (#16366 ) * Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr. This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr. In addition: - Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well. - In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes it easier to identify tests that can now vectorize. - Fix a null-matcher bug in StringObjectVectorValueMatcher. * Fix tests. * Fixes. * Fix tests. * Fix test. * Fix test.	2024-06-05 20:03:02 -07:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Gian Merlino	1040a29bc5	Fix capabilities reported by UnnestStorageAdapter. (#16551 ) UnnestStorageAdapter and its cursors did not return capabilities correctly for the output column. This patch fixes two problems: 1) UnnestStorageAdapter returned the capabilities of the unnest virtual column prior to unnesting. It should return the post-unnest capabilities. 2) UnnestColumnValueSelectorCursor passed through isDictionaryEncoded from the unnest virtual column. This is incorrect, because the dimension selector created by this class never has a dictionary. This is the cause of #16543.	2024-06-05 15:19:42 -07:00
Akshat Jain	6d7d2ffa63	Add interface method for returning canonical lookup name (#16557 ) * Add interface method for returning canonical lookup name * Address review comment * Add test in LookupReferencesManagerTest for coverage check * Add test in LookupSerdeModuleTest for coverage check	2024-06-05 14:33:18 -07:00
Abhishek Radhakrishnan	b9ba286423	Fix task bootstrapping & simplify segment load/drop flows (#16475 ) * Fix task bootstrap locations. * Remove dependency of SegmentCacheManager from SegmentLoadDropHandler. - The load drop handler code talks to the local cache manager via SegmentManager. * Clean up unused imports and stuff. * Test fixes. * Intellij inspections and test bind. * Clean up dependencies some more * Extract test load spec and factory to its own class. * Cleanup test util * Pull SegmentForTesting out to TestSegmentUtils. * Fix up. * Minor changes to infoDir * Replace server announcer mock and verify that. * Add tests. * Update javadocs. * Address review comments. * Separate methods for download and bootstrap load * Clean up return types and exception handling. * No callback for loadSegment(). * Minor cleanup * Pull out the test helpers into its own static class so it can have better state control. * LocalCacheManager stuff * Fix build. * Fix build. * Address some CI warnings. * Minor updates to javadocs and test code. * Address some CodeQL test warnings and checkstyle fix. * Pass a Consumer<DataSegment> instead of boolean & rename variables. * Small updates * Remove one test constructor. * Remove the other constructor that wasn't initializing fully and update usages. * Cleanup withInfoDir() builder and unnecessary test hooks. * Remove mocks and elaborate on comments. * Commentary * Fix a few Intellij inspection warnings. * Suppress corePoolSize intellij-inspect warning. The intellij-inspect tool doesn't seem to correctly inspect lambda usages. See ScheduledExecutors. * Update docs and add more tests. * Use hamcrest for asserting order on expectation. * Shutdown bootstrap exec. * Fix checkstyle	2024-06-04 10:44:46 -07:00
Adithya Chakilam	a9044ac235	Add cgroup cpu/mem/disk usage metrics (#16472 ) * Add cgroup cpu/mem usage metrics * checks * comments * docs fix * add disk metrics * fapi check * checkstyle * issues * spelling * change asserts * checks * use proc builder instead of runtime * specify charset * spotbug	2024-05-29 12:44:37 -07:00
Adarsh Sanjeev	21f725f33e	Add octet streaming of sketches in MSQ (#16269 ) There are a few issues with using Jackson serialization in sending datasketches between controller and worker in MSQ. This caused a blowup due to multiple copies of the sketch being held in memory. This PR aims to resolve this by switching to deserializing the sketch payload without Jackson. The PR adds a new query parameter used during communication between controller and worker while fetching sketches, "sketchEncoding". If the value of this parameter is OCTET, the sketch is returned as a binary encoding, done by ClusterByStatisticsSnapshotSerde. Otherwise, the sketch is encoded by Jackson as before.	2024-05-28 18:12:38 +05:30
Kashif Faraz	9d77ef04f4	Cleanup usages of stopwatch (#16478 ) Changes: - Remove synchronized methods from `Stopwatch` - Access stopwatch methods in `ChangeRequestHttpSyncer` inside a lock	2024-05-27 23:08:46 +05:30
Clint Wylie	4e1de50e30	fix issue with auto column grouping (#16489 ) * fix issue with auto column grouping changes: * fixes bug where AutoTypeColumnIndexer reports incorrect cardinality, allowing it to incorrectly use array grouper algorithm for realtime queries producing incorrect results for strings * fixes bug where auto LONG and DOUBLE type columns incorrectly report not having null values, resulting in incorrect null handling when grouping * fix test	2024-05-27 11:18:17 +05:30
zachjsh	b0cc1ee84b	Add ability to turn off Druid Catalog specific validation done on catalog defined tables in Druid (#16465 ) * * add property to enable / disable catalog validation and add tests * * add integration tests for catalog validation disabled * * add integration tests * * remove debugging logs * * fix forbidden api call	2024-05-23 13:19:51 -04:00
Pranav	204a25d3e6	Moving object contains to Bound for string/object matchers (#16241 )	2024-05-23 16:56:04 +02:00
Gian Merlino	eb410f712d	Use typecasting comparator for numeric "any" aggregations. (#16494 ) This brings them in line with the behavior of other numeric aggregations. It is important because otherwise ClassCastExceptions can arise if comparing different numeric types that may arise from deserialization.	2024-05-22 12:38:51 -07:00
Gian Merlino	0fb09445a5	Fix ExpressionPredicateIndexSupplier numeric replace-with-default behavior. (#16448 ) * Fix ExpressionPredicateIndexSupplier numeric replace-with-default behavior. In replace-with-default mode, null numeric values from the index should be interpreted as zeroes by expressions. This makes the index supplier more consistent with the behavior of the selectors created by the expression virtual column. * Fix test case.	2024-05-15 15:11:47 +05:30
Gian Merlino	72432c2e78	Speed up SQL IN using SCALAR_IN_ARRAY. (#16388 ) * Speed up SQL IN using SCALAR_IN_ARRAY. Main changes: 1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of the IN is above inFunctionThreshold. The default value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular expression, when the size of the SEARCH is above inFunctionExprThreshold. The default value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is greater than inFunctionThreshold. * Revert test. * Additional coverage. * Update docs/querying/sql-query-context.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * New test. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-05-14 08:09:27 -07:00
Sree Charan Manamala	b8dd7478d0	Custom Calcite Rule to remove redundant references (#16402 ) Custom calcite rule mimicking AggregateProjectMergeRule to extend support to expressions. The current calcite rule returns null in such cases. In addition, this removes the redundant references.	2024-05-14 06:38:05 +02:00


3169 Commits