druid

Commit Graph

Author	SHA1	Message	Date
AmatyaAvadhanula	488d376209	Optimize isOvershadowed when there is a unique minor version for an interval (#15952 ) * Optimize isOvershadowed for intervals with timechunk locking	2024-03-20 19:30:00 +05:30
Clint Wylie	48b8d42698	fix regexp_like, contains_string, icontains_string to return null instead of false for null inputs in sql compatible mode (#15963 )	2024-03-19 22:12:47 -07:00
Gian Merlino	c96b215dd6	SortMerge join support for IS NOT DISTINCT FROM. (#16003 ) * SortMerge join support for IS NOT DISTINCT FROM. The patch adds a "requiredNonNullKeyParts" field to the sortMerge processor, which has the list of key parts that must be nonnull for an equijoin condition to match. Conditions with SQL "=" are present in the list; conditions with SQL "IS NOT DISTINCT FROM" are absent from the list. * Fix test. * Update javadoc.	2024-03-19 12:02:13 -07:00
Laksh Singla	e3b75ac11f	Fix FloatFirstVectorAggregationTest (#16165 )	2024-03-19 17:42:29 +05:30
Zoltan Haindrich	0a42342cef	Update CalciteTest to use junit5 (#16106 ) Update CalciteTest to use junit5 change the way temp dirs are handled * add openrewrite workflow to safeguard upgrade * replace junitparamrunner with standard junit5 parametered tests * update a few rules to junit5 api * lots of boring changes * cleanup QueryLogHook * cleanup * fix compile error: ARRAYS_DATASOURCE * fix test * remove enclosed * empty +TEST:TDigestSketchSqlAggregatorTest,HllSketchSqlAggregatorTest,DoublesSketchSqlAggregatorTest,ThetaSketchSqlAggregatorTest,ArrayOfDoublesSketchSqlAggregatorTest,BloomFilterSqlAggregatorTest,BloomDimFilterSqlTest,CatalogIngestionTest,CatalogQueryTest,FixedBucketsHistogramQuantileSqlAggregatorTest,QuantileSqlAggregatorTest,MSQArraysTest,MSQDataSketchesTest,MSQExportTest,MSQFaultsTest,MSQInsertTest,MSQLoadedSegmentTests,MSQParseExceptionsTest,MSQReplaceTest,MSQSelectTest,InsertLockPreemptedFaultTest,MSQWarningsTest,SqlMSQStatementResourcePostTest,SqlStatementResourceTest,CalciteSelectJoinQueryMSQTest,CalciteSelectQueryMSQTest,CalciteUnionQueryMSQTest,MSQTestBase,VarianceSqlAggregatorTest,SleepSqlTest,SqlRowTransformerTest,DruidAvaticaHandlerTest,DruidStatementTest,BaseCalciteQueryTest,CalciteArraysQueryTest,CalciteCorrelatedQueryTest,CalciteExplainQueryTest,CalciteExportTest,CalciteIngestionDmlTest,CalciteInsertDmlTest,CalciteJoinQueryTest,CalciteLookupFunctionQueryTest,CalciteMultiValueStringQueryTest,CalciteNestedDataQueryTest,CalciteParameterQueryTest,CalciteQueryTest,CalciteReplaceDmlTest,CalciteScanSignatureTest,CalciteSelectQueryTest,CalciteSimpleQueryTest,CalciteSubqueryTest,CalciteSysQueryTest,CalciteTableAppendTest,CalciteTimeBoundaryQueryTest,CalciteUnionQueryTest,CalciteWindowQueryTest,DecoupledPlanningCalciteJoinQueryTest,DecoupledPlanningCalciteQueryTest,DecoupledPlanningCalciteUnionQueryTest,DrillWindowQueryTest,DruidPlannerResourceAnalyzeTest,IngestTableFunctionTest,QueryTestRunner,SqlTestFrameworkConfig,SqlAggregationModuleTest,ExpressionsTest,GreatestExpressionTest,IPv4AddressMatchExpressionTest,IPv4AddressParseExpressionTest,IPv4AddressStringifyExpressionTest,LeastExpressionTest,TimeFormatOperatorConversionTest,CombineAndSimplifyBoundsTest,FiltrationTest,SqlQueryTest,CalcitePlannerModuleTest,CalcitesTest,DruidCalciteSchemaModuleTest,DruidSchemaNoDataInitTest,InformationSchemaTest,NamedDruidSchemaTest,NamedLookupSchemaTest,NamedSystemSchemaTest,RootSchemaProviderTest,SystemSchemaTest,CalciteTestBase,SqlResourceTest * use @Nested * add rule to remove enclosed; upgrade surefire * remove enclosed * cleanup * add comment about surefire exclude	2024-03-19 04:05:12 -07:00
Laksh Singla	f5c5573da8	Revert "Fix FloatFirstVectorAggregationTest" This reverts commit `c40bb2c8f0`.	2024-03-19 14:39:09 +05:30
Laksh Singla	c40bb2c8f0	Fix FloatFirstVectorAggregationTest	2024-03-19 14:31:13 +05:30
Soumyava	c7823bca98	Adding null check to earliest and latest aggs (#15972 ) * Adding null check to earliest and latest aggs * Native tests for null inPairs	2024-03-19 09:44:14 +05:30
Kashif Faraz	466057c61b	Remove deprecated DruidException, EntryExistsException (#14448 ) Changes: - Remove deprecated `DruidException` (old one) and `EntryExistsException` - Use newly added comprehensive `DruidException` instead - Update error message in `SqlMetadataStorageActionHandler` when max packet limit is violated. - Factor out common code from several faults into `BaseFault`. - Slightly update javadoc in `DruidException` to render it correctly - Remove unused classes `SegmentToMove`, `SegmentToDrop` - Move `ServletResourceUtils` from module `druid-processing` to `druid-server` - Add utility method to build error Response from `DruidException`.	2024-03-15 21:29:11 +05:30
Clint Wylie	dd9bc3749a	fix issues with array_contains and array_overlap with null left side arguments (#15974 ) changes: * fix issues with array_contains and array_overlap with null left side arguments * modify singleThreaded stuff to allow optimizing Function similar to how we do for ExprMacro - removed SingleThreadSpecializable in favor of default impl of asSingleThreaded on Expr with clear javadocs that most callers shouldn't be calling it directly and should be using Expr.singleThreaded static method which uses a shuttle and delegates to asSingleThreaded instead * add optimized 'singleThreaded' versions of array_contains and array_overlap * add mv_harmonize_nulls native expression to use with MV_CONTAINS and MV_OVERLAP to allow them to behave consistently with filter rewrites, coercing null and [] into [null] * fix bug with casting rhs argument for native array_contains and array_overlap expressions	2024-03-13 18:16:10 -07:00
Zoltan Haindrich	818cc9eedf	Fix toString for SingleThreadSpecializable ConstantExprs (#16084 )	2024-03-13 03:48:52 -07:00
Clint Wylie	aa2959b2bd	reset keySerde when closing groupers to clear out heap dictionaries (#16114 ) ConcurrentGrouper kind of misuses ThreadLocal to hold a SpillingGrouper, and never calls remove() on it, which can result in large amounts of heap being retained as weak references even after grouping is finished. This PR calls keySerde.reset() on all of the Grouper.close() implementations that have a KeySerde and should free up a bunch of space that is no longer needed.	2024-03-13 15:09:54 +05:30
Soumyava	85ee775390	Handling latest_by and earliest_by on numeric columns correctly (#15939 ) * Handling latest_by and earliest_by on numeric columns correctly * Adding test	2024-03-11 13:49:21 -07:00
Clint Wylie	313da98879	decouple column serializer compression closers from SegmentWriteoutMedium to optionally allow serializers to release direct memory allocated for compression earlier than when segment is completed (#16076 )	2024-03-11 12:28:04 -07:00
Vishesh Garg	b1c1937e94	Change last update timestamp granularity of GCS objects from seconds to milliseconds (#16083 ) The previously used GCS API client library returned last update time for objects directly in milliseconds. The new library returns it in OffsetDateTime format which was being converted to seconds and stored against the object. This fix converts the time back to ms before storing it.	2024-03-09 07:54:33 +05:30
Kashif Faraz	5f203725dd	Clean up SqlSegmentsMetadataManager and corresponding tests (#16044 ) Changes: Improve `SqlSegmentsMetadataManager` - Break the loop in `populateUsedStatusLastUpdated` before going to sleep if there are no more segments to update - Add comments and clean up logs Refactor `SqlSegmentsMetadataManagerTest` - Merge `SqlSegmentsMetadataManagerEmptyTest` into this test - Add method `testPollEmpty` - Shave a few seconds off of the tests by reducing poll duration - Simplify creation of test segments - Some renames here and there - Remove unused methods - Move `TestDerbyConnector.allowLastUsedFlagToBeNull` to this class Other minor changes - Add javadoc to `NoneShardSpec` - Use lambda in `SqlSegmentMetadataPublisher`	2024-03-08 07:34:51 +05:30
Laksh Singla	5f588fa45c	Fix bug while materializing scan's result to frames (#15987 ) While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value. Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames. The PR modifies the batching logic to not accumulate the results in an intermediary list	2024-03-07 17:11:44 +05:30
Zoltan Haindrich	27d7c30c38	Only use ExprEval in ConstantExpr if its known that it will be safe (#15694 ) * `Expr#singleThreaded` which creates a singleThreaded version of the actual expression (caching ExprEval is allowed) * `Expr#makeSingleThreaded` to make a whole subtree of expressions 'singleThreaded' - uses `Shuttle` to create the new expression tree * `ConstantExpr#singleThreaded` creates a specialized `ConstantExpr` which does cache the `ExprEval` * some `@Immutable` annotations were added to make it more likely to notice that there might be something off if a similar change will be made around here for some reason	2024-03-05 12:53:09 -08:00
Gian Merlino	e13ed7b878	Improve javadoc for ReadableFieldPointer. (#16043 ) Since #15175, the javadoc for ReadableFieldPointer is somewhat out of date. It says that the pointer only points to the beginning of the field, but this is no longer true. This patch updates the javadoc to be more accurate.	2024-03-05 11:52:49 -08:00
Zoltan Haindrich	bb882727c0	Fix Windowing/scanAndSort query issues on top of Joins. (#15996 ) allow a hashjoin result to be converted to RowsAndColumns added StorageAdapterRowsAndColumns fix incorrect isConcrete() return values during early phase of planning	2024-03-05 15:05:31 +05:30
Clint Wylie	6e3b33d8c2	fix bug with like filter on missing columns not correctly considering extractionFn (#16037 )	2024-03-04 23:27:50 -08:00
Zoltan Haindrich	e469b7ed34	Make setting QUERY_CONTEXT_DEFAULT explicit in tests (#16010 )	2024-03-05 10:54:16 +05:30
Gian Merlino	930655ff18	Move retries into DataSegmentPusher implementations. (#15938 ) * Move retries into DataSegmentPusher implementations. The individual implementations know better when they should and should not retry. They can also generate better error messages. The inspiration for this patch was a situation where EntityTooLarge was generated by the S3DataSegmentPusher, and retried uselessly by the retry harness in PartialSegmentMergeTask. * Fix missing var. * Adjust imports. * Tests, comments, style. * Remove unused import.	2024-03-04 10:36:21 -08:00
Gian Merlino	376a41f1e9	Rows.objectToNumber: Accept decimals with output type LONG. (#15999 ) * Rows.objectToNumber: Accept decimals with output type LONG. PR #15615 added an optimization to avoid parsing numbers twice in cases where we know that they should definitely be longs or definitely be doubles. Rather than try parsing as long first, and then try parsing as double, it would use only the parsing routine specific to the requested outputType. This caused a bug: previously, we would accept decimals like "1.0" or "1.23" as longs, by truncating them to "1". After that patch, we would treat such decimals as nulls when the outputType is set to LONG. This patch retains the short-circuit for doubles: if outputType is DOUBLE, we only parse the string as a double. But for outputType LONG, this patch restores the old behavior: try to parse as long first, then double.	2024-03-04 22:00:27 +05:30
Zoltan Haindrich	bf0995f846	Introduce dynamic table append (#15897 )	2024-03-01 04:31:57 -05:00
Clint Wylie	101176590c	adaptive filter partitioning (#15838 ) * cooler cursor filter processing allowing much smart utilization of indexes by feeding selectivity forward, with implementations for range and predicate based filters * added new method Filter.makeFilterBundle which cursors use to get indexes and matchers for building offsets * AND filter partitioning is now pushed all the way down, even to nested AND filters * vector engine now uses same indexed base value matcher strategy for OR filters which partially support indexes	2024-02-29 15:38:12 -08:00
Jan Werner	baaa4a6808	update common-compress to address CVE-2024-25710 CVE-2024-26308 (#16009 ) * Update common-compress to 1.26.0 to address CVEs CVE-2024-25710 CVE-2024-26308 * Add commons-codec as a runtime dependency required by common-compress 1.26.0 --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2024-02-29 14:05:31 -08:00
Sensor	e0bce0ef90	Add pre-check for heavy debug logs (#15706 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-02-29 12:58:14 +05:30
Laksh Singla	17e4f3ac60	Refactor GroupBy and TopN code to relax the constraint of dimensions being comparable (#15559 ) The code in the groupBy engine and the topN engine assume that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together. This works well for the primitive dimensions, because they are Comparable, however falls apart when the dimensions can be arrays (or in future scenarios complex columns). In cases when the dimensions are not comparable, Druid resorts to having a wrapper type ComparableStringArray and ComparableList, which is a Comparable, based on the list comparator.	2024-02-27 11:39:29 +05:30
Zoltan Haindrich	06deda9415	ScanAndSort query fails with NPE for simple queries (#15914 ) * some stuff * add dummy fields * draft-fix * rename test * cleanup * add null * cleanup * cleanup * add test * updates * move check tp constructore * cleanup * updates/etc * fix some more * add rowSignatureMode * checkstyle/etc * override * missing msqIncompat * fix test * fixes * undo * updates * remove param	2024-02-24 15:33:50 -08:00
Clint Wylie	6145c8dd01	fix bug with expression virtual column indexes on missing columns for expressions that turn null values into not null values (#15959 )	2024-02-23 15:07:32 -08:00
Clint Wylie	cc5964fbcb	fix NestedCommonFormatColumnHandler to use nullable comparator when castToType is set (#15921 ) Fixes a bug when the undocumented castToType parameter is set on 'auto' column schema, which should have been using the 'nullable' comparator to allow null values to be present when merging columns, but wasn't which would lead to null pointer exceptions. Also fixes an issue I noticed while adding tests that if 'FLOAT' type was specified for the castToType parameter it would be an exception because that type is not expected to be present, since 'auto' uses the native expressions to determine the input types and expressions don't have direct support for floats, only doubles. In the future I should probably split this functionality out of the 'auto' schema (maybe even have a simpler version of the auto indexer dedicated to handling non-nested data) but still have the same results of writing out the newer 'nested common format' columns used by 'auto', but I haven't taken that on in this PR.	2024-02-22 21:35:50 +05:30
Zoltan Haindrich	bcce0806d7	Support Union in decoupled mode (#15870 )	2024-02-21 10:54:50 -05:00
Laksh Singla	a1b2c7326e	Numeric array support for columnar frames (#15917 ) Columnar frames used in subquery materialization and window functions now support numeric arrays.	2024-02-21 11:32:33 +05:30
Gian Merlino	9c41827dba	Globally disable AUTO_CLOSE_JSON_CONTENT. (#15880 ) * Globally disable AUTO_CLOSE_JSON_CONTENT. This JsonGenerator feature is on by default. It causes problems with code like this: try (JsonGenerator jg = ...) { jg.writeStartArray(); for (x : xs) { jg.writeObject(x); } jg.writeEndArray(); } If a jg.writeObject call fails due to some problem with the data it's reading, the JsonGenerator will write the end array marker automatically when closed as part of the try-with-resources. If the generator is writing to a stream where the reader does not have some other mechanism to realize that an exception was thrown, this leads the reader to believe that the array is complete when it actually isn't. Prior to this patch, we disabled AUTO_CLOSE_JSON_CONTENT for JSON-wrapped SQL result formats in #11685, which fixed an issue where such results could be erroneously interpreted as complete. This patch fixes a similar issue with task reports, and all similar issues that may exist elsewhere, by disabling the feature globally. * Update test.	2024-02-16 08:52:48 -08:00
Gian Merlino	0f6a895372	Rework ExprMacro base classes to simplify implementations. (#15622 ) * Rework ExprMacro base classes to simplify implementations. This patch removes BaseScalarUnivariateMacroFunctionExpr, adds BaseMacroFunctionExpr at the top of the hierarchy (a suitable base class for ExprMacros that take either arrays or scalars), and adds an implementation for "visit" to BaseMacroFunctionExpr. The effect on implementations is generally cleaner code: - Exprs no longer need to implement "visit". - Exprs no longer need to implement "stringify", even if they don't use all of their args at runtime, because BaseMacroFunctionExpr has access to even unused args. - Exprs that accept arrays can extend BaseMacroFunctionExpr and inherit a bunch of useful methods. The only one they need to implement themselves that scalar exprs don't is "supplyAnalyzeInputs". * Make StringDecodeBase64UTFExpression a static class. * Remove unused import. * Formatting, annotation changes.	2024-02-12 15:50:45 -08:00
Sree Charan Manamala	57e12df352	Sql Single Value Aggregator for scalar queries (#15700 ) Executing single value correlated queries will throw an exception today since single_value function is not available in druid. With these added classes, this provides druid, the capability to plan and run such queries.	2024-02-08 19:20:30 +05:30
Soumyava	f3996b96ff	Fixes for safe_divide with vectorize and datatypes (#15839 ) * Fix for save_divide with vectorize * More fixes * Update to use expr.eval(null) for both cases when denominator is 0	2024-02-08 14:40:42 +05:30
Adarsh Sanjeev	514b3b4d01	Add export capabilities to MSQ with SQL syntax (#15689 ) * Add test * Parser changes to support export statements * Fix builds * Address comments * Add frame processor * Address review comments * Fix builds * Update syntax * Webconsole workaround * Refactor * Refactor * Change export file path * Update docs * Remove webconsole changes * Fix spelling mistake * Parser changes, add tests * Parser changes, resolve build warnings * Fix failing test * Fix failing test * Fix IT tests * Add tests * Cleanup * Fix unparse * Fix forbidden API * Update docs * Update docs * Address review comments * Address review comments * Fix tests * Address review comments * Fix insert unparse * Add external write resource action * Fix tests * Add resource check to overlord resource * Fix tests * Add IT * Update syntax * Update tests * Update permission * Address review comments * Address review comments * Address review comments * Add tests * Add check for runtime parameter for bucket and path * Add check for runtime parameter for bucket and path * Add tests * Update docs * Fix NPE * Update docs, remove deadcode * Fix formatting	2024-02-07 22:08:50 +05:30
Pramod Immaneni	59bca0951a	Parallelize storage of incremental segments (#13982 ) During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period etc). In the case where there are a lot of dimension and metrics (1000+) it was observed that the creation/serialization of incremental segment file format for persistence and persisting the file took a while and it was blocking ingestion of new data. This affected the real-time ingestion. This serialization and persistence can be parallelized across the different time chunks. This update aims to do that. The patch adds a simple configuration parameter to the ingestion tuning configuration to specify number of persistence threads. The default value is 1 if it not specified which makes it the same as it is today.	2024-02-07 10:43:05 +05:30
Soumyava	b86f31f2c0	Addressing shapeshifting issues with window functions (#15807 ) Addressing shapeshifting issues with window functions	2024-02-06 11:12:20 +05:30
Clint Wylie	358892e5b0	add nested array index support, fix some bugs (#15752 ) This PR wires up ValueIndexes and ArrayElementIndexes for nested arrays, ValueIndexes for nested long and double columns, and fixes a handful of bugs I found after adding nested columns to the filter test gauntlet.	2024-02-05 15:12:09 +05:30
Zoltan Haindrich	8f5b7522c7	Strict window frame checks (#15746 ) introduce checks to ensure that window frame is supported added check to ensure that no expressions are set as bounds added logic to detect following/following like cases - described in Window function fails to demarcate if 2 following are used #15739 currently RANGE frames are only supported correctly if both endpoints are unbounded or current row Offset based window range support #15767 added windowingStrictValidation context key to provide a way to override the check	2024-02-02 16:21:53 +05:30
Vishesh Garg	37d1650ccf	Benchmark for query planning time for IN queries (#15688 ) Adds a set of benchmark queries for measuring the planning time with the IN operator. Current results indicate that with the recent optimizations, the IN planning time with 100K expressions in the IN clause is just 3s and with 1M is 46s. For IN clause paired with OR <col>=<val> expr, the numbers are 10s and 155s for 100K and 1M, resp.	2024-01-31 15:40:31 +05:30
Zoltan Haindrich	f701197224	Enable ArrayListRowsAndColumns to StorageAdapter conversion (#15735 )	2024-01-31 02:36:58 -05:00
Clint Wylie	01fa5c7ea6	add null value index wiring for nested column to speed up is null/is not null (#15687 ) Nested columns maintain a null value bitmap for which rows are nulls, however I forgot to wire up a ColumnIndexSupplier to nested columns when filtering the 'raw' data itself, so these were not able to be used. This PR fixes that by adding a supplier that can return NullValueIndex to be used by the NullFilter, which should speed up is null and is not null filters on json columns. I haven't spent the time to measure the difference yet, but I imagine it should be a significant speed increase. Note that I only wired this up if druid.generic.useDefaultValueForNull=false (sql compatible mode), the reason being that the SQL planner still uses selector filter, which is unable to properly handle any arrays or complex types (including json, even checking for nulls). The reason for this is so that the behavior is consistent between using the index and using the value matcher, otherwise we get into a situation where using the index has correct behavior but using the value matcher does not, which I was trying to avoid.	2024-01-29 12:34:50 +05:30
Abhishek Agarwal	989a8f7874	Better error message for date_trunc operators (#15759 ) IAEs are not bubbled up and show up as a runtime failure to the user which are not helpful. See https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1706185796975109 for one such example. This change will fix that.	2024-01-27 11:22:39 +05:30
Abhishek Radhakrishnan	f58fd5b75f	Remove TestObjectMapper in favor of DefaultObjectMapper. (#15769 ) Remove dilemma on what object mapper class to use in tests since the DefaultObjectMapper class provides all the same settings and goodies.	2024-01-26 16:35:12 -08:00
Gian Merlino	00cb0a2900	Fix extractionFns on number-wrapping dimension selectors. (#15761 ) When an ExtractionFn is used on top of a numeric column, it wasn't applied to nulls (nulls are returned as-is). This patch fixes it.	2024-01-26 19:56:13 +05:30
Zoltan Haindrich	2eba20d724	Fix minor build issues and stabilize intellij-inspections runs (#15747 ) * Possibly stabilize intellij-inspections * remove `integration-tests-ex/cases` from excluded projects from initial build * enable ErrorProne's `CheckedExceptionNotThrown` to get earlier errors than intellij-inspections * fix ddsketch pom.xml * fix spellcheck	2024-01-24 15:17:33 +05:30
Hiroshi Fukada	3fe3a65344	New: Add DDSketch in extensions-contrib (#15049 ) * New: Add DDSketch-Druid extension - Based off of http://www.vldb.org/pvldb/vol12/p2195-masson.pdf and uses the corresponding https://github.com/DataDog/sketches-java library - contains tests for post building and using aggregation/post aggregation. - New aggregator: `ddSketch` - New post aggregators: `quantileFromDDSketch` and `quantilesFromDDSketch` * Fixing easy CodeQL warnings/errors * Fixing docs, and dependencies Also moved aggregator ids to AggregatorUtil and PostAggregatorIds * Adding more Docs and better null/empty handling for aggregators * Fixing docs, and pom version * DDSketch documentation format and wording	2024-01-23 20:17:07 +05:30
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Pranav	45b30dc07d	Revert "Change default inSubQueryThreshold (#15336 )" (#15722 ) A low value of inSubQueryThreshold can cause queries with IN filter to plan as joins more commonly. However, some of these join queries may not get planned as IN filter on data nodes and causes significant perf regression.	2024-01-22 11:34:39 +05:30
zachjsh	9d4e8053a4	Kinesis adaptive memory management (#15360 ) ### Description Our Kinesis consumer works by using the [GetRecords API](https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html) in some number of `fetchThreads`, each fetching some number of records (`recordsPerFetch`) and each inserting into a shared buffer that can hold a `recordBufferSize` number of records. The logic is described in our documentation at: https://druid.apache.org/docs/27.0.0/development/extensions-core/kinesis-ingestion/#determine-fetch-settings There is a problem with the logic that this pr fixes: the memory limits rely on a hard-coded “estimated record size” that is `10 KB` if `deaggregate: false` and `1 MB` if `deaggregate: true`. There have been cases where a supervisor had `deaggregate: true` set even though it wasn’t needed, leading to under-utilization of memory and poor ingestion performance. Users don’t always know if their records are aggregated or not. Also, even if they could figure it out, it’s better to not have to. So we’d like to eliminate the `deaggregate` parameter, which means we need to do memory management more adaptively based on the actual record sizes. We take advantage of the fact that GetRecords doesn’t return more than 10MB (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html ): This pr: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified Fixed issue that when the record buffer is full, the fetchRecords logic throws away the rest of the GetRecords result after `recordBufferOfferTimeout` and starts a new shard iterator. This seems excessively churny. Instead, wait an unbounded amount of time for queue to stop being full. If the queue remains full, we’ll end up right back waiting for it after the restarted fetch. There was also a call to `newQ::offer` without check in `filterBufferAndResetBackgroundFetch`, which seemed like it could cause data loss. Now checking return value here, and failing if false. ### Release Note Kinesis ingestion memory tuning config has been greatly simplified, and a more adaptive approach is now taken for the configuration. Here is a summary of the changes made: eliminates `recordsPerFetch`, always use the max limit of 10000 records (the default limit if not set) eliminate `deaggregate`, always have it true cap `fetchThreads` to ensure that if each fetch returns the max (`10MB`) then we don't exceed our budget (`100MB` or `5% of heap`). In practice this means `fetchThreads` will never be more than `10`. Tasks usually don't have that many processors available to them anyway, so in practice I don't think this will change the number of threads for too many deployments add `recordBufferSizeBytes` as a bytes-based limit rather than records-based limit for the shared queue. We do know the byte size of kinesis records by at this point. Default should be `100MB` or `10% of heap`, whichever is smaller. add `maxBytesPerPoll` as a bytes-based limit for how much data we poll from shared buffer at a time. Default is `1000000` bytes. deprecate `recordBufferSize`, use `recordBufferSizeBytes` instead. Warning is logged if `recordBufferSize` is specified deprecate `maxRecordsPerPoll`, use `maxBytesPerPoll` instead. Warning is logged if maxRecordsPerPoll` is specified	2024-01-19 14:30:21 -05:00
Benedict Jin	96b4abc8e9	Add @VisibleForTesting annotation for the backingArray() method (#15690 )	2024-01-18 19:30:10 -08:00
Gian Merlino	792e5c58e4	IncrementalIndex#add is no longer thread-safe. (#15697 ) * IncrementalIndex#add is no longer thread-safe. Following #14866, there is no longer a reason for IncrementalIndex#add to be thread-safe. It turns out it already was not using its selectors in a thread-safe way, as exposed by #15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript` in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new: Strings have been stored in the dimension selectors for some time, but we didn't have a test that checked for that case; we only have this test that checks for concurrent adds involving numeric selectors. At any rate, this patch changes OnheapIncrementalIndex to no longer try to offer a thread-safe "add" method. It also improves performance a bit by adding a row ID supplier to the selectors it uses to read InputRows, meaning that it can get the benefit of caching values inside the selectors. This patch also: 1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator, which the similar datasketches versions already have. This is done to help them adhere to the contract of Aggregator: concurrent calls to "aggregate" and "get" must be thread-safe. 2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the druid-benchmarks module. * Spelling. * Changes from static analysis. * Fix javadoc.	2024-01-18 03:45:22 -08:00
Gian Merlino	764f41d959	Clear "lineSplittable" for JSON when using KafkaInputFormat. (#15692 ) * Clear "lineSplittable" for JSON when using KafkaInputFormat. JsonInputFormat has a "withLineSplittable" method that can be used to control whether JSON is read line-by-line, or as a whole. The intent is that in streaming ingestion, "lineSplittable" is false (although it can be overridden by "assumeNewlineDelimited"), and in batch ingestion, lineSplittable is true. When a "json" format is wrapped by a "kafka" format, this isn't set properly. This patch updates KafkaInputFormat to set this on an underlying "json" format. The tests for KafkaInputFormat were overriding the "lineSplittable" parameter explicitly, which wasn't really fair, because that made them unrealistic to what happens in production. Now they omit the parameter and get the production behavior. * Add test. * Fix test coverage.	2024-01-18 03:22:41 -08:00
Gian Merlino	d3d0c1c91e	Faster parsing: reduce String usage, list-based input rows. (#15681 ) * Faster parsing: reduce String usage, list-based input rows. Three changes: 1) Reworked FastLineIterator to optionally avoid generating Strings entirely, and reduce copying somewhat. Benefits the line-oriented JSON, CSV, delimited (TSV), and regex formats. 2) In the delimited (TSV) format, when the delimiter is a single byte, split on UTF-8 bytes directly. 3) In CSV and delimited (TSV) formats, use list-based input rows when the column list is provided upfront by the user. * Fix style. * Fix inspections. * Restore validation. * Remove fastutil-extra. * Exception type. * Fixes for error messages. * Fixes for null handling.	2024-01-18 19:18:46 +08:00
Laksh Singla	fc06f2d075	Fix summary iterator in grouping engine(#15658 ) This PR fixes the summary iterator to add aggregators in the correct position. The summary iterator is used when dims are not present, therefore the new change is identical to the old one, but seems more correct while reading.	2024-01-17 20:43:45 +05:30
Zoltan Haindrich	8a43db9395	Range support in window expressions (support them as groups) (#15365 ) * support groups windowing mode; which is a close relative of ranges (but not in the standard) * all windows with range expressions will be executed wit it groups * it will be 100% correct in case for both bounds its true that: isCurrentRow() \|\| isUnBounded() * this covers OVER ( ORDER BY COL ) * for other cases it will have some chances of getting correct results...	2024-01-17 00:05:21 -06:00
AmatyaAvadhanula	11dbfb6e3f	Better error message when partition space is exhausted (#15685 ) * Better error message when partition space is exhausted	2024-01-16 12:32:40 +05:30
Sam Rash	072b16c6df	Fix SQL Innterval.of() error message (#15454 ) Better error message for poorly constructed intervals	2024-01-15 22:34:35 -06:00
Gian Merlino	d359fb3d68	Cache value selectors in RowBasedColumnSelectorFactory. (#15615 ) * Cache value selectors in RowBasedColumnSelectorFactory. There was already caching for dimension selectors. This patch adds caching for value (object and number) selectors. It's helpful when the same field is read multiple times during processing of a single row (for example, by being an input to both MIN and MAX aggregations). * Fix typing. * Fix logic.	2024-01-15 18:03:27 -08:00
Kashif Faraz	18d2a8957f	Refactor: Cleanup test impls of ServiceEmitter (#15683 )	2024-01-15 17:37:00 +05:30
Ben Sykes	e49a7bb3cd	Add SpectatorHistogram extension (#15340 ) * Add SpectatorHistogram extension * Clarify documentation Cleanup comments * Use ColumnValueSelector directly so that we support being queried as a Number using longSum or doubleSum aggregators as well as a histogram. When queried as a Number, we're returning the count of entries in the histogram. * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Fix references * Fix spelling * Update docs/development/extensions-contrib/spectator-histogram.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-01-14 09:52:30 -08:00
Gian Merlino	500681d0cb	Add ImmutableLookupMap for static lookups. (#15675 ) * Add ImmutableLookupMap for static lookups. This patch adds a new ImmutableLookupMap, which comes with an ImmutableLookupExtractor. It uses a fastutil open hashmap plus two lists to store its data in such a way that forward and reverse lookups can both be done quickly. I also observed footprint to be somewhat smaller than Java HashMap + MapLookupExtractor for a 1 million row lookup. The main advantage, though, is that reverse lookups can be done much more quickly than MapLookupExtractor (which iterates the entire map for each call to unapplyAll). This speeds up the recently added ReverseLookupRule (#15626) during SQL planning with very large lookups. * Use in one more test. * Fix benchmark. * Object2ObjectOpenHashMap * Fixes, and LookupExtractor interface update to have asMap. * Remove commented-out code. * Fix style. * Fix import order. * Add fastutil. * Avoid storing Map entries.	2024-01-13 13:14:01 -08:00
Gian Merlino	cccf13ea82	Reverse, pull up lookups in the SQL planner. (#15626 ) * Reverse, pull up lookups in the SQL planner. Adds two new rules: 1) ReverseLookupRule, which eliminates calls to LOOKUP by doing reverse lookups. 2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above GROUP BY, when the lookup is injective. Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether these rules fire. Both are enabled by default. To minimize the chance of performance problems due to many keys mapping to the same value, ReverseLookupRule refrains from reversing a lookup if there are more keys than `inSubQueryThreshold`. The rationale for using this setting is that reversal works by generating an IN, and the `inSubQueryThreshold` describes the largest IN the user wants the planner to create. * Add additional line. * Style. * Remove commented-out lines. * Fix tests. * Add test. * Fix doc link. * Fix docs. * Add one more test. * Fix tests. * Logic, test updates. * - Make FilterDecomposeConcatRule more flexible. - Make CalciteRulesManager apply reduction rules til fixpoint. * Additional tests, simplify code.	2024-01-12 00:06:31 -08:00
Gian Merlino	2231cb30a4	Faster k-way merging using tournament trees, 8-byte key strides. (#15661 ) * Faster k-way merging using tournament trees, 8-byte key strides. Two speedups for FrameChannelMerger (which does k-way merging in MSQ): 1) Replace the priority queue with a tournament tree, which does fewer comparisons. 2) Compare keys using 8-byte strides, rather than 1 byte at a time. * Adjust comments. * Fix style. * Adjust benchmark and test. * Add eight-list test (power of two).	2024-01-11 08:36:22 -08:00
Clint Wylie	2118258b54	tidy up group by engines after removal of v1 (#15665 )	2024-01-11 00:52:52 -08:00
Laksh Singla	0b91cc4db2	Fix incorrect tests in Sting first/last serde's null handling (#15657 ) Fixes a couple of incorrect test cases, that got merged accidentally	2024-01-10 19:16:12 +05:30
Zoltan Haindrich	fefa763722	Resultcache fetch should deserialize aggregates when they are real results (#15654 ) Fixes #15538	2024-01-10 06:42:33 -05:00
Laksh Singla	4149f98934	Fixes a bug with long string pair serde where null and empty strings are treated equivalently (#15525 ) This PR fixes a bug with the long string pair serde where null and empty strings are treated equivalently, and the return value is always null. When 'useDefaultValueForNull' was set to true by default, this wasn't a commonly seen issue, because nulls were equivalent to empty strings. However, since the default has changed to false, this can create incorrect results when the long string pairs are serded, where the empty strings are incorrectly converted to nulls.	2024-01-10 14:17:57 +05:30
Clint Wylie	2938b8de53	fix issue with NestedPathArrayElement not correctly handling negative index for Object[] like it has for List (#15650 )	2024-01-10 09:46:08 +05:30
Clint Wylie	cafc748f7e	skip expression virtual column indexes when mvd is used as array (#15644 )	2024-01-08 21:22:37 -08:00
Clint Wylie	911941b4a6	fix issue with nested virtual column index supplier for partial paths when processing from raw (#15643 )	2024-01-09 07:55:08 +05:30
Clint Wylie	df5bcd1367	fix bugs with expression virtual column indexes for expression virtual columns which refer to other virtual columns (#15633 ) changes: * ColumnIndexSelector now extends ColumnSelector. The only real implementation of ColumnIndexSelector, ColumnSelectorColumnIndexSelector, already has a ColumnSelector, so this isn't very disruptive * removed getColumnNames from ColumnSelector since it was not used * VirtualColumns and VirtualColumn getIndexSupplier method now needs argument of ColumnIndexSelector instead of ColumnSelector, which allows expression virtual columns to correctly recognize other virtual columns, fixing an issue which would incorrectly handle other virtual columns as non-existent columns instead * fixed a bug with sql planner incorrectly not using expression filter for equality filters on columns with extractionFn and no virtual column registry	2024-01-08 13:10:11 -08:00
Clint Wylie	c221a2634b	overhaul DruidPredicateFactory to better handle 3VL (#15629 ) * overhaul DruidPredicateFactory to better handle 3VL fixes some bugs caused by some limitations of the original design of how DruidPredicateFactory interacts with 3-value logic. The primary impacted area was with how filters on values transformed with expressions or extractionFn which turn non-null values into nulls, which were not possible to be modelled with the 'isNullInputUnknown' method changes: * adds DruidObjectPredicate to specialize string, array, and object based predicates instead of using guava Predicate * DruidPredicateFactory now uses DruidObjectPredicate * introduces DruidPredicateMatch enum, which all predicates returned from DruidPredicateFactory now use instead of booleans to indicate match. This means DruidLongPredicate, DruidFloatPredicate, DruidDoublePredicate, and the newly added DruidObjectPredicate apply methods all now return DruidPredicateMatch. This allows matchers and indexes * isNullInputUnknown has been removed from DruidPredicateFactory * rename, fix test * adjust * style * npe * more test * fix default value mode to not match new test	2024-01-05 19:08:02 -08:00
AmatyaAvadhanula	c41e99e10c	Do not allocate week granular segments unless requested (#15589 ) * Do not allocate week granular segments unless explicitly requested	2024-01-05 12:14:52 +05:30
Clint Wylie	f19ece146f	expression virtual column indexes (#15585 ) * ExpressionVirtualColumn + indexes = bff. Expression virtual columns can now use indexes of the underlying columns similar to how expression filters	2024-01-03 21:00:39 -08:00
Gian Merlino	e40b96e026	Reverse lookup fixes and enhancements. (#15611 ) * Reverse lookup fixes and enhancements. 1) Add a "mayIncludeUnknown" parameter to DimFilter#optimize. This is important because otherwise the reverse-lookup optimization is done improperly when the "in" filter appears under a "not", and the lookup extractionFn may return null for some possible values of the filtered column. The "includeUnknown" test cases in InDimFilterTest illustrate the difference in behavior. 2) Enhance InDimFilter#optimizeLookup to handle "mayIncludeUnknown", and to be able to do a reverse lookup in a wider variety of cases. 3) Make "unapply" protected in LookupExtractor, and move callers to "unapplyAll". The main reason is that MapLookupExtractor, a common implementation, lacks a reverse mapping and therefore does a scan of the map for each call to "unapply". For performance sake these calls need to be batched. * Remove optimize call from BloomDimFilter. * Follow the law. * Fix tests. * Fix imports. * Switch function. * Fix tests. * More tests.	2024-01-03 13:28:44 -08:00
Gian Merlino	01eec4a55e	New handling for COALESCE, SEARCH, and filter optimization. (#15609 ) * New handling for COALESCE, SEARCH, and filter optimization. COALESCE is converted by Calcite's parser to CASE, which is largely counterproductive for us, because it ends up duplicating expressions. In the current code we end up un-doing it in our CaseOperatorConversion. This patch has a different approach: 1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before the Volcano planner runs, using CaseToCoalesceRule. 2) Add FilterDecomposeCoalesceRule to decompose calls like "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))". This helps use indexes when available on x and y. 3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP. 4) Add a native "coalesce" function so we can convert 3+ arg COALESCE. The advantage of this approach is that by un-doing the CASE to COALESCE conversion earlier, we have flexibility to do more stuff with COALESCE (like decomposition and pushing into LOOKUP). SEARCH is an operator used internally by Calcite to represent matching an argument against some set of ranges. This patch improves our handling of SEARCH in two ways: 1) Expand NOT points (point "holes" in the range set) from SEARCH as `!(a \|\| b)` rather than `!a && !b`, which makes it possible to convert them to a "not" of "in" filter later. 2) Generate those nice conversions for NOT points even if the SEARCH is not composed of 100% NOT points. Without this change, a SEARCH for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')". One of the steps we take when generating Druid queries from Calcite plans is to optimize native filters. This patch improves this step: 1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can convert "(a && x = 'b') \|\| (a && x = 'c')" into "a && x IN ('b', 'c')". 2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on ORs with lots of children by adjusting the logic to avoid calling "indexOf" and "remove" on an ArrayList. 3) Refactor ConvertSelectorsToIns to reduce duplicated code between the handling for "selector" and "equals" filters. * Not so final. * Fixes. * Fix test. * Fix test.	2024-01-03 08:56:22 -08:00
Gian Merlino	b0e52c99bb	Fix ColumnSelectorColumnIndexSelector#getColumnCapabilities. (#15614 ) * Fix ColumnSelectorColumnIndexSelector#getColumnCapabilities. It was using virtualColumns.getColumnCapabilities, which only returns capabilities for virtual columns, not regular columns. The effect of this is that expression filters (and in some cases, arrayContainsElement filters) would build value matchers rather than use indexes. I think this has been like this since #12315, which added the getColumnCapabilities method to BitmapIndexSelector, and included the same implementation as exists in the code today. This error is easy to make due to the design of virtualColumns.getColumnCapabilities, so to help avoid it in the future, this patch renames the method to getColumnCapabilitiesWithoutFallback to emphasize that it does not return capabilities for regular columns. * Make getColumnCapabilitiesWithoutFallback package-private. * Fix expression filter bitmap usage.	2024-01-02 21:09:18 -08:00
Parth Agrawal	8505e8a909	Provide default implementation for RowFunction evalDimension method (#15452 ) The PR: #13947 introduced a function evalDimension() in the interface RowFunction. There was no default implementation added for this interface which causes all the implementations and custom transforms to fail and require to implement their own version of evalDimension method. This PR adds a default implementation in the interface which allows the evalDimension to return value as a Singleton array of eval result.	2024-01-02 11:14:23 +05:30
AlbericByte	a2e65e6a89	Support to pass dynamic values to timestamp Extract function (#15586 ) Fixes #15072 Before this modification , the third parameter (timezone) require to be a Literal, it will throw a error when this parameter is column Identifier.	2023-12-21 11:57:52 +05:30
Clint Wylie	8a45efbf65	fix some null handling bugs with vector expression processors (#15587 )	2023-12-19 08:14:17 -08:00
Kashif Faraz	9f568858ef	Add logging implementation for AuditManager and audit more endpoints (#15480 ) Changes - Add `log` implementation for `AuditManager` alongwith `SQLAuditManager` - `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for all `fetchAuditHistory` calls. - Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default) - Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`. This gets activated only if `type` is `log`. - Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs - Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of audit payload and other utility methods.	2023-12-19 13:14:04 +05:30
Clint Wylie	e373f62692	fix expression post aggregator array handling when grouping wrapper types leak (#15543 ) * fix expression post aggregator array handling when grouping wrapper types leak * more consistent expression function error messaging	2023-12-15 21:43:27 -08:00
Tom	901ebbb744	Allow for kafka emitter producer secrets to be masked in logs (#15485 ) * Allow for kafka emitter producer secrets to be masked in logs instead of being visible This change will allow for kafka producer config values that should be secrets to not show up in the logs. This will enhance the security of the people who use the kafka emitter to use this if they want to. This is opt in and will not affect prior configs for this emitter * fix checkstyle issue * change property name	2023-12-15 12:21:21 -05:00
Zoltan Haindrich	7552dc49fb	Reduce amount of expression objects created during evaluations (#15552 ) I was looking into a query which was performing a bit poorly because the case_searched was touching more than 1 columns (if there is only 1 column there is a cache based evaluator). While I was doing that I've noticed that there are a few simple things which could help a bit: use a static TRUE/FALSE instead of creating a new object every time create the ExprEval early for ConstantExpr -s (except the one for BigInteger which seem to have some odd contract) return early from type autodetection these changes mostly reduce the amount of garbage the query creates during case_searched evaluation; although ExpressionSelectorBenchmark shows some improvements ~15% - but my manual trials on the taxi dataset with 60M rows showed more improvements - probably due to the fact that these changes mostly only reduce gc pressure.	2023-12-15 16:11:59 +05:30
Soumyava	3e15522d6b	Round works correctly on system metadata columns (#15554 )	2023-12-13 17:23:14 -08:00
Clint Wylie	e55f6b6202	remove search auto strategy, estimateSelectivity of BitmapColumnIndex (#15550 ) * remove search auto strategy, estimateSelectivity of BitmapColumnIndex * more cleanup	2023-12-13 16:30:01 -08:00
zachjsh	857693f5cf	Decorate sampling response with system fields if specified (#15536 ) * * decorate sampling response with system fields if specified * * add unit test	2023-12-13 12:16:59 -08:00
Ankit Kothari	8735d023a1	Add experimental support for first/last for double/float/long #10702 (#14462 ) Add experimental support for doubleLast, doubleFirst, FloatLast, FloatFirst, longLast and longFirst.	2023-12-12 11:36:51 +05:30
Abhishek Radhakrishnan	96be82a3e6	Clean up duty for non-overlapping eternity tombstones (#15281 ) * Add initial draft of MarkDanglingTombstonesAsUnused duty. * Use overshadowed segments instead of all used segments. * Add unit test for MarkDanglingSegmentsAsUnused duty. * Add mock call * Simplify code. * Docs * shorter lines formatting * metric doc * More tests, refactor and fix up some logic. * update javadocs; other review comments. * Make numCorePartitions as 0 in the TombstoneShardSpec. * fix up test * Add tombstone core partition tests * Update docs/design/coordinator.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * review comment * Minor cleanup * Only consider tombstones with 0 core partitions * Need to register the test shard type to make jackson happy * test comments * checkstyle * fixup misc typos in comments * Update logic to use overshadowed segments * minor cleanup * Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone. * Address review feedback. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-12-11 08:57:15 -08:00
Clint Wylie	42f2496b7d	fix bug with nested empty array fields (#15532 )	2023-12-09 12:20:21 -08:00
Rishabh Singh	54df235026	Lazily build Filter in FilteredAggregatorFactory to avoid parsing exceptions in Router (#15526 ) Query with lookups in FilteredAggregator fails with this exception in router, Cannot construct instance of `org.apache.druid.query.aggregation.FilteredAggregatorFactory`, problem: Lookup [campaigns_lookup[campaignId][is_sold][autodsp]] not found at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 913] (through reference chain: org.apache.druid.query.groupby.GroupByQuery["aggregations"]->java.util.ArrayList[1]) T he problem is that constructor of FilteredAggregatorFactory is actually validating if the lookup exists in this statement dimFilter.toFilter(). This is failing on the router, which is to be expected, because, the router isn’t assigned any lookups. The fix is to move to a lazy initialisation of the filter object in the constructor.	2023-12-09 12:18:37 +05:30
Clint Wylie	e7c8f2e208	lift restriction of array_to_mv to only support direct column access (#15528 )	2023-12-08 16:27:17 -08:00
Clint Wylie	e64b92eb35	add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521 )	2023-12-08 05:28:46 -08:00
Zoltan Haindrich	c353ccfdef	Windowed min aggregates null-s as 0 (#15371 )	2023-12-08 01:41:16 -08:00
Clint Wylie	1eafe983ec	fix array presenting columns to not match single element arrays to scalars for equality (#15503 ) * fix array presenting columns to not match single element arrays to scalars for equality * update docs to clarify usage model of mixed type columns	2023-12-08 01:22:07 -08:00

1 2 3 4 5 ...

3122 Commits