druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	5c14b42e72	fix incorrect unnest dimension cursor value matcher implementation (#15192 )	2023-10-18 16:43:06 -07:00
Clint Wylie	061cfee224	add native filters for "(filter) is true" and "(filter) is false" (#15182 ) * add native filters for "(filter) is true" and "(filter) is false" changes: * add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE` * add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false * fix test incorrectly adjusted to wrong answer in #15058 * add tests for default value mode	2023-10-18 13:07:35 -07:00
Clint Wylie	22034a1630	preserve Rows.objectToStrings behavior of translating null into "null" inside of lists and arrays (#15190 )	2023-10-17 19:49:36 -07:00
Laksh Singla	b4540ed5d4	Optimize the reading of numerical frame arrays in MSQ (#15175 )	2023-10-18 02:33:42 +05:30
Laksh Singla	dc8d2192c3	Introduce natural comparator for types that don't have a StringComparator (#15145 ) Fixes a bug when executing queries with the ordering of arrays	2023-10-16 10:37:32 +05:30
Pranav	4b0d1b3488	Fix expression result writing of arrays in Hadoop Ingestion (#15127 )	2023-10-13 13:41:41 -07:00
Zoltan Haindrich	6d62c75866	Fix columns with null values in windowing expressions (#15131 )	2023-10-13 10:42:45 -04:00
AmatyaAvadhanula	d25caaefa4	Add support for streaming ingestion with concurrent replace (#15039 ) Add support for streaming ingestion with concurrent replace --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-10-13 09:09:03 +05:30
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL compliant behavior	2023-10-12 00:06:23 -07:00
Zoltan Haindrich	ae88f2c0b6	Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086 ) * fixes * check for latest rewrite place * Revert "check for latest rewrite place" This reverts commit `5cf1e2c1ca`. * some stuff (cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f) * update test output * updates to test ouptuts * some stuff * move validator * cleanup * fix * change test slightly * add apidoc cleanup warnings * cleanup/etc * instead of telling the story; add a fail with some reason whats the issue * lead-lag fix * add test * remove unnecessary throw * druidexception-trial * Revert "druidexception-trial" This reverts commit `8fa06644bc`. * undo changes to no_grouping; add no_grouping2 * add missing assert on resultcount * rename method; update * introduce enum/etc * make resultmatchmode accessible from TestBuilder#expectedResults * fix dump results to use log * fix * handle null correctly * disable feature type based things for MSQ * fix varianssqlaggtest * use eps in other test * fix intellij error * add final * addrss review * update test/string/etc * write concat in 3 lines :D	2023-10-11 12:34:31 -07:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Karan Kumar	48f35b3fdd	Add query id to processing pool thread name. (#15059 ) This patch changes the thread name of the processing pool of the indexers/peons/historicals from query.getType() + "_" + query.getDataSource() + "_" + query.getIntervals() to query.getId()	2023-10-10 05:59:03 +05:30
Laksh Singla	95bf331c08	Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108 ) The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.	2023-10-10 02:03:29 +05:30
Clint Wylie	1fc8fb1b20	add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101 ) * add a bunch of tests with array typed columns to CalciteArraysQueryTest * fix a bug with unnest filter pushdown when filtering on unnested array columns	2023-10-09 06:16:06 -07:00
Laksh Singla	549ef56288	UNION ALLs in MSQ (#14981 ) MSQ now supports UNION ALL with UnionDataSource	2023-10-09 18:18:15 +05:30
Adarsh Sanjeev	7a35ce886d	Add ability for MSQ tasks to query realtime tasks (#15024 ) This PR aims to add the capabilities to: 1. Fetch the realtime segment metadata from the coordinator server view, 2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.	2023-10-09 15:14:03 +05:30
kaisun2000	e2cc1c4ad1	Add metric -- count of queries waiting for merge buffers (#15025 ) Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.	2023-10-09 12:56:23 +05:30
Gian Merlino	c483cb863d	Fix IndexerWorkerClient#fetchChannelData when response has data and error. (#15084 ) * Fix IndexerWorkerClient#fetchChannelData when response has data and error. When a channel data response from a worker includes some data and then some I/O error, then when the call is retried, we will re-read the set of data that was read by the previous connection and add it to the local channel again. This causes the local channel to become corrupted. The patch fixes this case by skipping data that has already been read.	2023-10-09 11:12:28 +05:30
Soumyava	57ab8e13dc	Updating plans when using joins with unnest on the left (#15075 ) * Updating plans when using joins with unnest on the left * Correcting segment map function for hashJoin * The changes done here are not reflected into MSQ yet so these tests might not run in MSQ * native tests * Self joins with unnest data source * Making this pass * Addressing comments by adding explanation and new test	2023-10-06 19:23:12 -07:00
Pranav	06c5527c85	Allow aliasing of Macros and add new alias for complex decode 64 (#15034 ) * Add AliasExprMacro to allow aliasing of native expression macros * Add decode_base64_complex alias for complex_decode_base64	2023-10-05 16:24:36 -07:00
Laksh Singla	2c286d6f42	Fix monomorphic processing code running on JDK8 since it references a non-existing method (#15092 ) Code relying on monomorphic processing on JDK8 doesn't work correctly, since it tries to reference getArrayLength using method handles, which might have been accidentally removed here since it seems unused. This PR adds the method back as is.	2023-10-05 11:05:38 +05:30
Clint Wylie	b4bc9b6950	fix issue with auto columns with mix of scalar values and empty arrays (#15083 )	2023-10-05 10:15:45 +05:30
Laksh Singla	b8d03d36b0	Free up the resources when materializing the results as Frames (#15032 ) Refactor the code to clean up the result sequences when materializing the results as Frames	2023-10-05 10:14:27 +05:30
Clint Wylie	3afe09a19d	urlencode nested serializer temp file names so they dont explode stuff (#15068 ) Fixes a bug caused by #14919, which was just using the column name as part of a temp file name, which.. isn't very cool, my bad. Switched to use StringUtils.urlEncode so that ugly chars don't explode stuff. The modified test fails without the changes in this PR.	2023-10-05 10:13:45 +05:30
Laksh Singla	30cf76db99	Field writers for numerical arrays (#14900 ) Row-based frames, and by extension, MSQ now supports numeric array types. This means that all queries consuming or producing arrays would also work with MSQ. Numeric arrays can also be ingested via MSQ. Post this patch, queries like, SELECT [1, 2] would work with MSQ since they consume a numeric array, instead of failing with an unsupported column type exception.	2023-10-04 23:16:47 +05:30
Gian Merlino	a9021e4cd7	Fix NPE with lenient aggregators merging in segmentMetadata. (#15078 ) When merging analyses, lenient merging sets unmergeable aggregators to null. Merging such a null aggregator record into a nonnull record would potentially lead to NPE in getMergingFactory. The new code only calls getMergingFactory if both the old and new aggregators are nonnull; else, if either is null, then the merged aggregator is also set to null.	2023-10-04 02:41:41 -07:00
Clint Wylie	632811b285	fix json compat layer to not rewrite v4 into v5 after segment merging (#14997 )	2023-10-04 00:18:18 -07:00
Gian Merlino	2ed4fd1ae3	Compute broadcast-join segmentMapFn only once per worker. (#15007 ) This patch introduces "processor managers" to processor factories, as a replacement for the sequence of processors. Processor managers can use the results of earlier processors to influence the creation of later processors, which provides us with the building block we need to ensure that broadcast join data is only read once. In particular, when broadcast join is happening, the BaseFrameProcessorFactory now uses a ChainedProcessorManager to first run BroadcastJoinSegmentMapFnProcessor (in a single thread), and then run all of the regular processors (possibly multithreaded).	2023-10-04 11:47:00 +05:30
Vishesh Garg	7e8f3e69ef	Avoid intermediate offsets in bucketStart calculation logic to handle DST transition (#15038 ) When moving timestamps by an offset using org.joda.time.chrono.ISOChronology library, if the new timestamp falls in Daylight Savings Time (DST) transition period, the library rounds it off to the nearest valid time. This can lead to incorrect final timestamp when calculated using intermediate offsets landing in DST transition, for e.g. +21D arrived at using +14D and +7D offset, where +14D lands in DST transition period. Since bucketStart values are calculated using this library, this behaviour can lead to incorrect bucketStart times.	2023-10-04 11:32:29 +05:30
Xavier Léauté	adef2069b1	Make unit tests pass with Java 21 (#15014 ) This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21 As a result all unit tests now pass with Java 21. * update maven-shade-plugin to 3.5.0 and follow-up to #15042 * explain why we need to override configuration when specifying outputFile * remove configuration from dependency management in favor of explicit overrides in each module. * update to mockito to 5.5.0 for Java 21 support when running with Java 11+ * continue using latest mockito 4.x (4.11.0) when running with Java 8 * remove need to mock private fields * exclude incorrectly declared mockito dependency from pac4j-oidc * remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21 * add JVM options workaround for system-rules junit plugin not supporting Java 18+ * exclude older versions of byte-buddy from assertj-core * fix for Java 19 changes in floating point string representation * fix missing InitializedNullHandlingTest * update easymock to 5.2.0 for Java 21 compatibility * update animal-sniffer-plugin to 1.23 * update nl.jqno.equalsverifier to 3.15.1 * update exec-maven-plugin to 3.1.0	2023-10-03 22:41:21 -07:00
George Shiqi Wu	64754b6799	Allow users to pass task payload via deep storage instead of environment variable (#14887 ) This change is meant to fix an issue where passing too large a task payload to the mm-less task runner causes the peon to fail to start up, because the payload is passed (compressed) as an environment variable (TASK_JSON). On Linux systems the limit for an environment variable is commonly 128KB; on Windows systems it is less. Setting an env variable longer than this results in a bunch of "Argument list too long" errors.	2023-10-03 14:08:59 +05:30
Pranav	f1edd671fb	Exposing optional replaceMissingValueWith in lookup function and macros (#14956 ) * Exposing optional replaceMissingValueWith in lookup function and macros * args range validation * Updating docs * Addressing comments * Update docs/querying/sql-scalar.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/querying/sql-functions.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Addressing comments --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-10-02 17:09:23 -07:00
Pranav	07c28f17ca	Fix missing format strings in calls to DruidException.build (#15056 ) * Fix the NPE bug in nonStrictFormat * using non null format string * using Assert.assertThrows	2023-09-29 17:00:36 -07:00
Karan Kumar	2f1bcd6717	Adding "segment/scan/active" metric for processing thread pool. (#15060 )	2023-09-29 12:34:28 -07:00
Zoltan Haindrich	022950a0c5	MV_FILTER_ONLY may run into Exceptions in case duplicate values were processed (#15012 )	2023-09-27 19:19:42 +05:30
Gian Merlino	3dabfead05	Fix getResultType for HLL, quantiles aggregators. (#15043 ) The aggregators had incorrect types for getResultType when shouldFinalize is false. They had the finalized type, but they should have had the intermediate type. Also includes a refactor of how ExprMacroTable is handled in tests, to make it easier to add tests for this to the MSQ module. The bug was originally noticed because the incorrect result types caused MSQ queries with DS_HLL to behave erratically.	2023-09-27 08:51:14 +05:30
Gian Merlino	0850e615b2	Remove istrue, isfalse vectorized impls. (#14991 ) These were added in #14977, but the implementations are incorrect, because they return null when the input arg is null. They should return false when the input is null. Remove them for now, rather than fixing them, since they're so new that they might as well never have existed.	2023-09-25 11:34:24 +05:30
AmatyaAvadhanula	c62193c4d7	Add support for concurrent batch Append and Replace (#14407 ) Changes: - Add task context parameter `taskLockType`. This determines the type of lock used by a batch task. - Add new task actions for transactional replace and append of segments - Add methods StorageCoordinator.commitAppendSegments and commitReplaceSegments - Upgrade segments to appropriate versions when performing replace and append - Add new metadata table `upgradeSegments` to track segments that need to be upgraded - Add tests	2023-09-25 07:06:37 +05:30
Pranav	883c2692d2	Adding new function decode_base64_utf8 and expr macro (#14943 ) * Adding new function decode_base64_utf8 and expr macro * using BaseScalarUnivariateMacroFunctionExpr * Print stack trace in case of debug in ChainedExecutionQueryRunner * fix static check	2023-09-20 17:06:34 -07:00
Xavier Léauté	22abc10f24	update RoaringBitmap to 0.9.49 (#15006 ) * update RoaringBitmap to 0.9.49 update RoaringBitmap from 0.9.0 to 0.9.49 Many optimizations and improvements have gone into recent releases of RoaringBitmap. It seems worthwhile to incorporate those. * implement workaround for BatchIterator interface change * add test case for BatchIteratorAdapter.advanceIfNeeded	2023-09-20 15:52:27 -07:00
Gian Merlino	823f620ede	Add IS [NOT] DISTINCT FROM to SQL and join matchers. (#14976 ) * Add IS [NOT] DISTINCT FROM to SQL and join matchers. Changes: 1) Add "isdistinctfrom" and "notdistinctfrom" native expressions. 2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions when generating expressions, and is treated the same as equals and not-equals when generating native filters on literals. 3) Update join matchers to have an "includeNull" parameter that determines whether we are operating in "equals" mode or "is not distinct from" mode. * Main changes: - Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom". - Include null in pushed-down filters when using "notdistinctfrom" in a join. Other changes: - Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets, relying less on remembering to get it right to avoid copies. * Remove unused "wrap" method. * Fixes. * Remove methods we do not need. * Fix bug with INPUT_REF.	2023-09-20 10:44:32 -07:00
Zoltan Haindrich	79f882f48c	Fix exception cause logging in QueryResultPusher (#14975 )	2023-09-20 15:44:02 +05:30
Rohan Garg	39d95955f5	Do not eagerly close inner iterators in CloseableIterator#flatMap (#14986 )	2023-09-15 15:14:20 +05:30
Soumyava	279b3818f0	Make Unnest work with nullif operator (#14993 ) This is due to the recursive filter creation in unnest storage adapter not performing correctly in case of an empty children. This PR addresses the issue	2023-09-15 09:54:14 +05:30
Gian Merlino	3ae5e97801	Add IS [NOT] TRUE, IS [NOT] FALSE native functions. (#14977 ) They are not quite the same as "x == true", "x != true", etc. These functions never return null, even when "x" itself is null.	2023-09-14 09:19:09 -07:00
Soumyava	5c42ac8c4d	Fix for latest agg to handle nulls in time column. Also adding optimi… (#14911 ) * Fix for latest agg to handle nulls in time column. Also adding optimization for dictionary encoded string columns * One minor fix * Adding more tests for the new class * Changing the init to a putInt	2023-09-13 17:37:26 -07:00
Soumyava	bf99d2c7b2	Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (#14924 ) * Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly * Fixing a failed test * Updating numericNilAgg * Moving to use default values in case of nil agg * Adding the same for first agg * Fixing a test * fixing vectorized string agg for last/first with cast if numeric * Updating tests to remove mockito and cover the case of string first/last on non string columns * Updating a test to vectorize * Addressing review comments: Name change to NilVectorAggregator and using static variables now * fixing intellij inspections	2023-09-13 13:15:14 -07:00
Clint Wylie	23b78c0f95	use mmap for nested column value to dictionary id lookup for more chill heap usage during serialization (#14919 )	2023-09-12 21:01:18 -07:00
Kashif Faraz	286eecad7c	Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891 ) Changes: - Move following configs from `CliCoordinator` to `DruidCoordinatorConfig`: - `druid.coordinator.kill.on` - `druid.coordinator.kill.pendingSegments.on` - `druid.coordinator.kill.supervisors.on` - `druid.coordinator.kill.rules.on` - `druid.coordinator.kill.audit.on` - `druid.coordinator.kill.datasource.on` - `druid.coordinator.kill.compaction.on` - In the Coordinator style used by historical management duties, always instantiate all the metadata cleanup duties but execute only if enabled. In the existing code, they are instantiated only when enabled by using optional binding with Guice. - Add a wrapper `MetadataManager` which contains handles to all the different metadata managers for rules, supervisors, segments, etc. - Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs - Remove persistence related methods from `CoordinatorCompactionConfig` and `CoordinatorDynamicConfig` as these are config classes. - Remove annotations `@CoordinatorIndexingServiceDuty`, `@CoordinatorMetadataStoreManagementDuty`	2023-09-13 09:06:57 +05:30
Clint Wylie	891f0a3fe9	longer compatibility window for nested column format v4 (#14955 ) changes: * add back nested column v4 serializers * 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs * add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'	2023-09-12 14:07:53 -07:00


2939 Commits