druid

Commit Graph

Author	SHA1	Message	Date
Ankit Kothari	8735d023a1	Add experimental support for first/last for double/float/long #10702 (#14462 ) Add experimental support for doubleLast, doubleFirst, FloatLast, FloatFirst, longLast and longFirst.	2023-12-12 11:36:51 +05:30
Abhishek Radhakrishnan	96be82a3e6	Clean up duty for non-overlapping eternity tombstones (#15281 ) * Add initial draft of MarkDanglingTombstonesAsUnused duty. * Use overshadowed segments instead of all used segments. * Add unit test for MarkDanglingSegmentsAsUnused duty. * Add mock call * Simplify code. * Docs * shorter lines formatting * metric doc * More tests, refactor and fix up some logic. * update javadocs; other review comments. * Make numCorePartitions as 0 in the TombstoneShardSpec. * fix up test * Add tombstone core partition tests * Update docs/design/coordinator.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * review comment * Minor cleanup * Only consider tombstones with 0 core partitions * Need to register the test shard type to make jackson happy * test comments * checkstyle * fixup misc typos in comments * Update logic to use overshadowed segments * minor cleanup * Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone. * Address review feedback. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-12-11 08:57:15 -08:00
Clint Wylie	42f2496b7d	fix bug with nested empty array fields (#15532 )	2023-12-09 12:20:21 -08:00
Rishabh Singh	54df235026	Lazily build Filter in FilteredAggregatorFactory to avoid parsing exceptions in Router (#15526 ) Query with lookups in FilteredAggregator fails with this exception in router, Cannot construct instance of `org.apache.druid.query.aggregation.FilteredAggregatorFactory`, problem: Lookup [campaigns_lookup[campaignId][is_sold][autodsp]] not found at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 913] (through reference chain: org.apache.druid.query.groupby.GroupByQuery["aggregations"]->java.util.ArrayList[1]) The problem is that constructor of FilteredAggregatorFactory is actually validating if the lookup exists in this statement dimFilter.toFilter(). This is failing on the router, which is to be expected, because, the router isn’t assigned any lookups. The fix is to move to a lazy initialisation of the filter object in the constructor.	2023-12-09 12:18:37 +05:30
Clint Wylie	e7c8f2e208	lift restriction of array_to_mv to only support direct column access (#15528 )	2023-12-08 16:27:17 -08:00
Clint Wylie	e64b92eb35	add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521 )	2023-12-08 05:28:46 -08:00
Zoltan Haindrich	c353ccfdef	Windowed min aggregates null-s as 0 (#15371 )	2023-12-08 01:41:16 -08:00
Clint Wylie	1eafe983ec	fix array presenting columns to not match single element arrays to scalars for equality (#15503 ) * fix array presenting columns to not match single element arrays to scalars for equality * update docs to clarify usage model of mixed type columns	2023-12-08 01:22:07 -08:00
sb89594	5fda8613ad	Feature: Add IPv6 Match Function (#15212 )	2023-12-07 23:09:06 -08:00
AlbericByte	935aa187a0	add Assert function to verify in the DataGeneratorTest (#15504 ) * add Assert function to verify in the DataGeneratorTest * remove unused log in DataGeneratorTest * add comment for DataGeneratorTest	2023-12-08 09:12:17 +08:00
Clint Wylie	c241c6980c	store auto columns with only empty or null containing arrays as ARRAY<LONG> instead of COMPLEX<json> (#15505 )	2023-12-07 03:31:43 -08:00
Clint Wylie	557f3f6f57	add array column type support to EXTEND operator (#15458 )	2023-12-06 23:21:35 -08:00
Gian Merlino	6f51155ccb	Fix NullFilter getDimensionRangeSet. (#15500 ) It wasn't checking the column name, so it would return a domain regardless of the input column. This means that null filters on data sources with range partitioning would lead to excessive pruning of segments, and therefore missing results.	2023-12-06 15:09:59 +05:30
Clint Wylie	0516d0dae4	simplify IncrementalIndex since group-by v1 has been removed (#15448 )	2023-11-29 14:46:16 -08:00
Pranav	93cd638645	Enabling aggregateMultipleValues in all StringAnyAggregators (#15434 ) * Enabling aggregateMultipleValues in all StringAnyAggregators * Adding more tests * More validation * fix warning * updating asserts in decoupled mode * fix intellij inspection * Addressing comments * Addressing comments * Adding early validations and make aggregate consistent across all * fixing tests * fixing tests * Update docs/querying/sql-aggregations.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * fixing static check --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-11-29 14:32:49 -08:00
Clint Wylie	64fcb32bcf	add native 'array contains element' filter (#15366 ) * add native arrayContainsElement filter to use array column element indexes	2023-11-29 03:33:00 -08:00
Clint Wylie	97623b408c	add optional 'castToType' parameter to 'auto' column schema (#15417 ) * auto but.. with an expected type	2023-11-28 17:19:23 -08:00
Zoltan Haindrich	eb056e23b5	Fix dictionarySize overrides in tests (#15354 ) I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit. Not forwarding the return value at that point may lead to the normal continuation here even though something was not added to the dictionary, like here	2023-11-28 18:49:09 +05:30
Kashif Faraz	58a724c7e4	Use StubServiceEmitter in tests (#15426 ) * Use StubServiceEmitter in tests * Remove unthrown exception from declaration	2023-11-28 09:43:09 +05:30
Zoltan Haindrich	dff5bcb0a6	Fix resultcache multiple postaggregation restore (#15402 ) Fixes https://github.com/apache/druid/issues/15393	2023-11-21 15:58:20 +05:30
Abhishek Radhakrishnan	470c8ed7b0	Make `numCorePartitions` as 0 for tombstones (#15379 ) * Make numCorePartitions as 0 in the TombstoneShardSpec. * fix up test * Add tombstone core partition tests * review comment * Need to register the test shard type to make jackson happy	2023-11-20 09:42:51 -08:00
Clint Wylie	a95c22ce70	support non-constant expressions for path arguments for json_value and json_query (#15320 ) * support dynamic expressions for path arguments for json_value and json_query	2023-11-17 01:12:05 -08:00
Yashdeep Thorat	7b5790c72c	Fix flaky tests in ParserTest.java (#15318 ) Fixed the following flaky tests: org.apache.druid.math.expr.ParserTest#testApplyFunctions org.apache.druid.math.expr.ParserTest#testSimpleMultiplicativeOp1 org.apache.druid.math.expr.ParserTest#testFunctions org.apache.druid.math.expr.ParserTest#testSimpleLogicalOps1 org.apache.druid.math.expr.ParserTest#testSimpleAdditivityOp1 org.apache.druid.math.expr.ParserTest#testSimpleAdditivityOp2 The above mentioned tests have been reported as flaky (tests assuming deterministic implementation of a non-deterministic specification) when run against the NonDex tool. The tests contain assertions (Assertion 1 & Assertion 2) that compare an ArrayList created from a HashSet using the ArrayList() constructor with another List. However, HashSet does not guarantee the ordering of elements, which makes tests that assume a deterministic HashSet order flaky. Thus, when the NonDex tool shuffles the HashSet elements, it results in the test failures: Co-authored-by: ythorat2 <ythorat2@illinois.edu>	2023-11-17 12:29:23 +05:30
Abhishek Radhakrishnan	2e79fd56a7	MSQ generates tombstones honoring granularity specified in a `REPLACE` query. (#15243 ) * MSQ generates tombstones honoring the query's granularity. This change tweaks to only account for the infinite-interval tombstones. For finite-interval tombstones, the MSQ query granularity will be used which is consistent with how MSQ works. * more tests and some cleanup. * checkstyle * comment edits * Throw TooManyBuckets fault based on review; add more tests. * Add javadocs for both methods on reconciling the methods. * review: Move testReplaceTombstonesWithTooManyBucketsThrowsException to MsqFaultsTest * remove unused imports. * Move TooManyBucketsException to indexing package for shared exception handling. * lower max bucket for tests and fixup count * Advance and count the iterator. * checkstyle	2023-11-14 23:35:36 -08:00
Adarsh Sanjeev	a134cc30a6	Change default inSubQueryThreshold (#15336 )	2023-11-14 14:08:12 +05:30
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806 )	2023-11-07 10:07:28 -08:00
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Clint Wylie	5d39b94149	allow compaction to work with spatial dimensions (#15321 )	2023-11-03 11:27:50 -07:00
Gian Merlino	98f1eb8ede	Use filters for pruning properly for hash-joins. (#15299 ) * Use filters for pruning properly for hash-joins. Native used them too aggressively: it might use filters for the RHS to prune the LHS. MSQ used them not at all. Now, both use them properly, pruning based on base (LHS) columns only. * Fix tests. * Fix style. * Clear filterFields too. * Update.	2023-11-03 07:29:16 -07:00
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Gian Merlino	37e158c2c4	Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300 ) * Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. Prior to this patch, columnar frames would always write multi-valued columns if the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE causes expressions to assume that string inputs must be treated as arrays. We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness is encountered, and flipping it to TRUE if multi-valuedness is encountered. * Add regression test case.	2023-11-01 22:05:53 -07:00
Vishesh Garg	a27598a487	Segregate advance and advanceUninterruptibly flow in postJoinCursor to allow for interrupts in advance (#15222 ) Currently advance function in postJoinCursor calls advanceUninterruptibly which in turn keeps calling baseCursor.advanceUninterruptibly until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled. With this change, the call flow of advance and advanceUninterruptibly is separated out so that they call baseCursor.advance and baseCursor.advanceUninterruptibly in them, respectively, giving a chance for interrupts in the former case between successive calls to baseCursor.advance.	2023-10-30 14:39:15 +05:30
Ben Sykes	275c1ec64c	Fix error assuming a Complex Type that is a Number is a double (#15272 ) * Fix error assuming a Complex Type that is a Number is a double In the case where a complex type is a number, it may not be castable to double. It can safely be cast to Number first to get to the doubleValue.	2023-10-30 09:52:52 +05:30
Zoltan Haindrich	f4a74710e6	Process pure ordering changes with windowing operators (#15241 ) - adds a new query build path: DruidQuery#toScanAndSortQuery which: - builds a ScanQuery without considering the current ordering - builds an operator to execute the sort - fixes a null string to "null" literal string conversion in the frame serializer code - fixes some DrillWindowQueryTest cases - fix NPE in NaiveSortOperator in case there was no input - enables back CoreRules.AGGREGATE_REMOVE - adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts - earlier window expressions on top of a subquery with an offset may have ignored the offset	2023-10-29 16:40:49 +05:30
Simon Hofbauer	e9b7e4a0eb	fix JSON flaky tests (#15261 ) Co-authored-by: simonh5 <simonh5@illinois.edu>	2023-10-26 20:27:09 -07:00
Zoltan Haindrich	f48263bbb3	Report function name for unknown exceptions during execution (#14987 ) * provide function name when unknown exceptions are encountered * fix keywords/etc * fix keyword order - regex exercise * add test * add check&fix keywords * decoupledIgnore * Revert "decoupledIgnore" This reverts commit `e922c820a7`. * unpatch Function * move to a different location * checkstyle	2023-10-25 13:37:30 -07:00
Zoltan Haindrich	6784e9c507	Fix summary row issues in case postaggregations are happening (#15232 ) * fix-1/2 * add message v1 * extend test to cover for IOB issue * move stuff around * change message * fix testcase string * compute postaggs (thank you Clint!) * enable feature for test * ignore tests in msq --------- Co-authored-by: Soumyava Das <soumyava@users.noreply.github.com>	2023-10-24 20:33:59 -07:00
Clint Wylie	4149c9422c	cleanup temp files for nested column serializer (#15236 ) * cleanup temp files for nested column serializer * fix style * fix tests in default value mode	2023-10-24 15:30:00 -07:00
Zoltan Haindrich	b95035f183	Fix VirtualColumn related issues in window expressions (#15119 ) for some exotic queries like: SELECT '_'||dim1, MIN(cast(0 as double)) OVER (), MIN(cast((cnt||cnt) as bigint)) OVER () FROM foo the compilation resulted in NPEs, mostly because VirtualColumns were not handled properly	2023-10-23 14:05:59 +05:30
Clint Wylie	c8e458452d	Fix native is boolean filter cache key tests to test the right thing (#15216 )	2023-10-23 11:24:46 +05:30
Clint Wylie	5c14b42e72	fix incorrect unnest dimension cursor value matcher implementation (#15192 )	2023-10-18 16:43:06 -07:00
Clint Wylie	061cfee224	add native filters for "(filter) is true" and "(filter) is false" (#15182 ) * add native filters for "(filter) is true" and "(filter) is false" changes: * add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE` * add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false * fix test incorrectly adjusted to wrong answer in #15058 * add tests for default value mode	2023-10-18 13:07:35 -07:00
Clint Wylie	22034a1630	preserve Rows.objectToStrings behavior of translating null into "null" inside of lists and arrays (#15190 )	2023-10-17 19:49:36 -07:00
Laksh Singla	b4540ed5d4	Optimize the reading of numerical frame arrays in MSQ (#15175 )	2023-10-18 02:33:42 +05:30
Laksh Singla	dc8d2192c3	Introduce natural comparator for types that don't have a StringComparator (#15145 ) Fixes a bug when executing queries with the ordering of arrays	2023-10-16 10:37:32 +05:30
Pranav	4b0d1b3488	Fix expression result writing of arrays in Hadoop Ingestion (#15127 )	2023-10-13 13:41:41 -07:00
Zoltan Haindrich	6d62c75866	Fix columns with null values in windowing expressions (#15131 )	2023-10-13 10:42:45 -04:00
AmatyaAvadhanula	d25caaefa4	Add support for streaming ingestion with concurrent replace (#15039 ) Add support for streaming ingestion with concurrent replace --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-10-13 09:09:03 +05:30
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL compliant behavior	2023-10-12 00:06:23 -07:00
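
Commit 54df235026 above describes moving filter construction out of the FilteredAggregatorFactory constructor so that a node without lookups (the Router) can still deserialize the query. The following is a minimal sketch of that lazy-initialisation idea, not the actual Druid change: `LookupFilter`, `Lazy`, `FilteredAggregatorSketch`, and the map-based lookup registry are all hypothetical stand-ins.

```java
import java.util.Map;
import java.util.function.Supplier;

public class LazyFilterSketch {
    /** Hypothetical filter that validates its lookup at construction time. */
    static final class LookupFilter {
        final String lookupName;

        LookupFilter(String lookupName, Map<String, Map<String, String>> registry) {
            if (!registry.containsKey(lookupName)) {
                throw new IllegalStateException("Lookup [" + lookupName + "] not found");
            }
            this.lookupName = lookupName;
        }
    }

    /** Memoizing supplier: runs the delegate on the first get() and caches the result. */
    static final class Lazy<T> implements Supplier<T> {
        private Supplier<T> delegate;
        private T value;
        private boolean initialized;

        Lazy(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T get() {
            if (!initialized) {
                value = delegate.get();
                initialized = true;
                delegate = null; // release the captured arguments
            }
            return value;
        }
    }

    /** Hypothetical factory: the constructor no longer touches the lookup registry. */
    static final class FilteredAggregatorSketch {
        private final Supplier<LookupFilter> filter;

        FilteredAggregatorSketch(String lookupName, Map<String, Map<String, String>> registry) {
            // Deferred: validation happens only when the filter is first requested.
            this.filter = new Lazy<>(() -> new LookupFilter(lookupName, registry));
        }

        LookupFilter getFilter() {
            return filter.get();
        }
    }

    public static void main(String[] args) {
        // A node with no lookups can still construct the factory.
        FilteredAggregatorSketch onRouter = new FilteredAggregatorSketch("campaigns", Map.of());
        System.out.println("constructed without resolving the lookup");

        // The failure surfaces only if the filter is actually requested there.
        try {
            onRouter.getFilter();
        } catch (IllegalStateException e) {
            System.out.println("deferred failure: " + e.getMessage());
        }
    }
}
```

The trade-off in this sketch is that a missing lookup is reported at first use rather than at deserialization time, which is acceptable on a node that only forwards the query.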

2980 Commits
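
Commit 7b5790c72c above attributes the ParserTest flakiness to assertions that compare a List built from a HashSet against a fixed-order List. One common way to make such a comparison independent of iteration order is sketched below; it is an illustration only and not necessarily how the commit fixed the tests. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrderInsensitiveCheck {
    /** Returns a sorted copy so the comparison no longer depends on iteration order. */
    static List<String> sortedCopy(Collection<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    /** True when both collections hold the same elements, regardless of order. */
    static boolean sameElements(Collection<String> expected, Collection<String> actual) {
        return sortedCopy(expected).equals(sortedCopy(actual));
    }

    public static void main(String[] args) {
        Set<String> identifiers = new HashSet<>(Arrays.asList("x", "y", "z"));
        List<String> expected = Arrays.asList("z", "x", "y");

        // Fragile form: new ArrayList<>(identifiers).equals(expected) relies on
        // HashSet iteration order, which the specification does not guarantee.
        // Order-insensitive form:
        System.out.println(sameElements(expected, identifiers));
    }
}
```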