Commit Graph

13494 Commits

Author SHA1 Message Date
Sébastien 9ca10c7bd7
Added concurrent compaction switches (#15114)
* Added concurrent compaction switches
2023-10-13 21:03:39 +05:30
Zoltan Haindrich 6d62c75866
Fix columns with null values in windowing expressions (#15131) 2023-10-13 10:42:45 -04:00
Karan Kumar f0a70fe3c4
Fixing the flaky tests. (#15142) 2023-10-13 16:20:05 +05:30
Adarsh Sanjeev 4deeb7e936
Fix issue with checking segment load status (#15147)
This PR addresses a bug with waiting for segments to be loaded. In the case of append, segments would be created with the same version. This caused the number of segments returned to be incorrect.

This PR changes this to keep track of the range of partition numbers as well for each version, which lets the task wait for the correct set of segments. The partition numbers are expected to be continuous since the task obtains the lock for the segment while running.
2023-10-13 16:06:13 +05:30
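The bookkeeping described in #15147 can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not Druid's Java implementation: `PartitionRange`, `record_segment`, and `all_loaded` are hypothetical names, and segments are reduced to `(version, partition)` pairs.

```python
from dataclasses import dataclass


@dataclass
class PartitionRange:
    """Inclusive range of partition numbers created under one segment version."""
    low: int
    high: int

    def extend(self, partition: int) -> None:
        self.low = min(self.low, partition)
        self.high = max(self.high, partition)


def record_segment(ranges: dict, version: str, partition: int) -> None:
    """Track the partition number of a newly created segment for its version."""
    if version in ranges:
        ranges[version].extend(partition)
    else:
        ranges[version] = PartitionRange(partition, partition)


def all_loaded(ranges: dict, loaded: set) -> bool:
    """True when every (version, partition) in each tracked range is loaded.

    Assumes partition numbers within a version are contiguous, as the commit
    message states.
    """
    return all(
        (version, p) in loaded
        for version, r in ranges.items()
        for p in range(r.low, r.high + 1)
    )


ranges = {}
# An append reuses the existing version "v1" and adds partitions 3 and 4.
record_segment(ranges, "v1", 3)
record_segment(ranges, "v1", 4)

# Partitions 0..2 come from an earlier ingest; only one new partition is loaded.
loaded = {("v1", 0), ("v1", 1), ("v1", 2), ("v1", 3)}
still_waiting = not all_loaded(ranges, loaded)

loaded.add(("v1", 4))
done = all_loaded(ranges, loaded)
```

Counting segments by version alone cannot tell the three older partitions from the two new ones; checking the tracked range can.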
AmatyaAvadhanula d25caaefa4
Add support for streaming ingestion with concurrent replace (#15039)
Add support for streaming ingestion with concurrent replace

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-10-13 09:09:03 +05:30
Tejaswini Bandlamudi 0a6f78c0bb
Fix GHA workflow bugs (#15138) 2023-10-12 21:25:57 +05:30
Karan Kumar 61ea9e07c5
Limit pages size to a configurable limit (#14994)
Adding the ability to limit the page sizes of select queries.

    We piggyback on the same machinery that is used to control the numRowsPerSegment.
    This patch introduces a new context parameter rowsPerPage for which the default value is set to 100000 rows.
    This patch also adds the final selectResults stage only when the previous stages have sorted outputs. Previously, every select query with selectDestination=durableStorage got this extra selectResults stage.
2023-10-12 14:01:46 +05:30
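As a rough illustration of what a per-page row limit means, the sketch below splits a row stream into pages of at most `rows_per_page` rows. Only the 100000 default comes from the commit message; `paginate` is a hypothetical helper and does not reflect how MSQ actually partitions its result pages.

```python
from itertools import islice

DEFAULT_ROWS_PER_PAGE = 100_000  # default quoted in the commit message


def paginate(rows, rows_per_page=DEFAULT_ROWS_PER_PAGE):
    """Yield lists of at most rows_per_page rows each."""
    if rows_per_page <= 0:
        raise ValueError("rows_per_page must be positive")
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, rows_per_page))
        if not page:
            return
        yield page


# 250,000 rows with the default limit produce two full pages and one partial page.
page_sizes = [len(page) for page in paginate(range(250_000))]
small_pages = list(paginate([1, 2, 3], rows_per_page=2))
```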
Clint Wylie a0fd9ec55c
fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135) 2023-10-12 01:23:24 -07:00
Clint Wylie d0f64608eb
sql compatible three-valued logic native filters (#15058)
* sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true
* log.warn if non-default configurations are used to guide operators towards SQL compliant behavior
2023-10-12 00:06:23 -07:00
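For readers unfamiliar with SQL three-valued logic, here is a minimal sketch of the truth tables, using Python's `None` for UNKNOWN. It models only the logic itself; the function names are illustrative and this is not Druid's native filter code.

```python
def and3(a, b):
    """SQL AND over True/False/None, where None stands for UNKNOWN."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """SQL OR over True/False/None."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """SQL NOT: UNKNOWN stays UNKNOWN."""
    return None if a is None else not a


def where(rows, predicate):
    """Keep only rows whose predicate is TRUE, as a SQL WHERE clause does."""
    return [row for row in rows if predicate(row) is True]


def x_equals_1(row):
    # Comparing NULL with anything yields UNKNOWN.
    return None if row["x"] is None else row["x"] == 1


rows = [{"x": 1}, {"x": None}, {"x": 3}]
matches = where(rows, x_equals_1)
non_matches = where(rows, lambda row: not3(x_equals_1(row)))
```

The row with a null `x` appears in neither result. Under two-valued logic, negating the filter would have matched it.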
317brian 265c811963
docs: remove experimental note from query from deep storage docs (#15132) 2023-10-12 11:51:02 +05:30
Katya Macedo 10aab7506e
Dynamic configuration API documentation refactor (#15098)
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
2023-10-11 14:45:05 -07:00
Zoltan Haindrich ae88f2c0b6
Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086)
* fixes

* check for latest rewrite place

* Revert "check for latest rewrite place"

This reverts commit 5cf1e2c1ca.

* some stuff

(cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f)

* update test output

* updates to test outputs

* some stuff

* move validator

* cleanup

* fix

* change test slightly

* add apidoc cleanup warnings

* cleanup/etc

* instead of telling the story; add a fail with some reason whats the issue

* lead-lag fix

* add test

* remove unnecessary throw

* druidexception-trial

* Revert "druidexception-trial"

This reverts commit 8fa06644bc.

* undo changes to no_grouping; add no_grouping2

* add missing assert on resultcount

* rename method; update

* introduce enum/etc

* make resultmatchmode accessible from TestBuilder#expectedResults

* fix dump results to use log

* fix

* handle null correctly

* disable feature type based things for MSQ

* fix varianssqlaggtest

* use eps in other test

* fix intellij error

* add final

* address review

* update test/string/etc

* write concat in 3 lines :D
2023-10-11 12:34:31 -07:00
Vishesh Garg c6ca990f1f
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095)
EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming, such as in the case of nested queries, resulting in column not found errors.

This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
2023-10-11 19:48:36 +05:30
Tejaswini Bandlamudi 52d94b09a7
update jetty & netty4 dependencies (#15129)
Update jetty dependencies version to 9.4.53.v20231009
Update netty4 dependencies version to 4.1.100.Final to resolve CVE-2023-4586 (Netty-handler does not validate host names by default)
2023-10-11 18:16:28 +05:30
Sébastien dba0246aca
Added UI support for waitTillSegmentsLoad (#15110)
This relies on the work done in #14322 and #15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI.
2023-10-11 16:18:42 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
317brian 263e106714
docs: remove experimental note from unnest docs (#15123)
* docs: remove experimental note from unnest docs

* remove flag needed to use unnest
2023-10-10 16:52:51 -07:00
Zoltan Haindrich 23605c1edd
Enable resultset validation of Drill tests (#15096)
- introduces a test_X method for every testcase (995 testcases)
- added a resultset parser which reads the expected resultset based on the result schema
- loaded a few more datasets
- added a testcase to ensure that all files have a corresponding testcase
- renamed DecoupledIgnore to NegativeTest
- categorized the failing 268 tests
2023-10-10 14:40:50 +05:30
Karan Kumar 48f35b3fdd
Add query id to processing pool thread name. (#15059)
This patch changes the thread name of the processing pool of the indexers/peons/historicals from query.getType() + "_" + query.getDataSource() + "_" + query.getIntervals() to query.getId()
2023-10-10 05:59:03 +05:30
Clint Wylie fda8d2b7f3
fix debugging and running with intellij runConfiguration (#15115) 2023-10-09 17:03:06 -07:00
Laksh Singla 95bf331c08
Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108)
The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.
2023-10-10 02:03:29 +05:30
Abhishek Agarwal 90a1458ac9
Parse passwords containing colon correctly (#15109) 2023-10-09 20:45:10 +05:30
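The commit title does not say where the credentials are parsed, so the following is only a sketch of the general pitfall, assuming a `user:password` string. `split_credentials` is a hypothetical helper, not a Druid function.

```python
def split_credentials(raw: str):
    """Split 'user:password' on the first colon only."""
    user, sep, password = raw.partition(":")
    if not sep:
        raise ValueError("expected 'user:password'")
    return user, password


# Splitting on every colon breaks a password that itself contains colons.
naive = "admin:pa:ss:word".split(":")
parsed = split_credentials("admin:pa:ss:word")
```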
Laksh Singla b0edbc3d91
MSQ writes out string arrays instead of MVDs by default (#15093)
MSQ uses the string dimension schema for ARRAY<STRING> typed columns, which creates MVDs instead of string arrays as required. Therefore someone trying to ingest columns of type ARRAY<STRING> from an external data source or another data source would get STRING columns in the newly generated segments.

This patch changes the following:

- Use auto dimension schema to ingest the ARRAY<STRING> columns, which will create columns with the desired type.
- Add an undocumented flag ingestStringArraysAsMVDs to preserve the legacy behavior. Legacy behavior is turned on by default.
- Create MSQArraysInsertTest and refactor some of the tests in MSQInsertTest.
2023-10-09 20:31:07 +05:30
Laksh Singla 36edbce036
Fix compilation failure in master (#15111)
Merging since it's a dev blocker.
2023-10-09 20:05:48 +05:30
Clint Wylie 1fc8fb1b20
add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101)
* add a bunch of tests with array typed columns to CalciteArraysQueryTest
* fix a bug with unnest filter pushdown when filtering on unnested array columns
2023-10-09 06:16:06 -07:00
Laksh Singla 549ef56288
UNION ALLs in MSQ (#14981)
MSQ now supports UNION ALL with UnionDataSource
2023-10-09 18:18:15 +05:30
AmatyaAvadhanula 40a6dc4631
Optimize used segment fetching in Kill tasks (#15107)
* Optimize used segment fetching in Kill tasks
2023-10-09 17:54:13 +05:30
Adarsh Sanjeev 7a35ce886d
Add ability for MSQ tasks to query realtime tasks (#15024)
This PR aims to add the capabilities to:
1. Fetch the realtime segment metadata from the coordinator server view,
2. Allow workers to query indexers, similar to how brokers do for native queries.
2023-10-09 15:14:03 +05:30
kaisun2000 e2cc1c4ad1
Add metric -- count of queries waiting for merge buffers (#15025)
Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.
2023-10-09 12:56:23 +05:30
Gian Merlino c483cb863d
Fix IndexerWorkerClient#fetchChannelData when response has data and error. (#15084)
* Fix IndexerWorkerClient#fetchChannelData when response has data and error.

When a channel data response from a worker includes some data and then
some I/O error, then when the call is retried, we will re-read the set
of data that was read by the previous connection and add it to the
local channel again. This causes the local channel to become corrupted.
The patch fixes this case by skipping data that has already been read.
2023-10-09 11:12:28 +05:30
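The failure mode and the fix can be modeled with a small sketch. Everything here is hypothetical scaffolding (`FlakySource`, `fetch_channel_data`, the single-byte chunks); it only illustrates the idea of skipping bytes that a failed attempt already delivered, not the actual IndexerWorkerClient code.

```python
class FlakySource:
    """Hypothetical stand-in for a worker: the first call fails partway through."""

    def __init__(self, payload: bytes, fail_after: int):
        self.payload = payload
        self.fail_after = fail_after
        self.calls = 0

    def read(self, offset: int):
        """Yield single-byte chunks starting at offset."""
        self.calls += 1
        for i in range(offset, len(self.payload)):
            if self.calls == 1 and i - offset == self.fail_after:
                raise IOError("connection dropped")
            yield self.payload[i:i + 1]


def fetch_channel_data(source, offset: int, max_attempts: int = 3) -> bytes:
    """Read everything from offset, skipping bytes a failed attempt delivered."""
    channel = bytearray()
    for _ in range(max_attempts):
        to_skip = len(channel)  # bytes already added by earlier attempts
        try:
            for chunk in source.read(offset):
                if to_skip >= len(chunk):
                    to_skip -= len(chunk)
                    continue
                channel.extend(chunk[to_skip:])
                to_skip = 0
            return bytes(channel)
        except IOError:
            continue
    raise IOError("gave up after retries")


source = FlakySource(b"abcdef", fail_after=3)
result = fetch_channel_data(source, offset=0)
```

Without the `to_skip` bookkeeping, the retry would append `abc` a second time and the local channel would hold `abcabcdef`.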
Pranav c7d0615af3
Fix the build for #15013.: Lookup jitter upstream build fix (#15103)
Fix the build for #15013.
2023-10-09 09:35:39 +05:30
Zoltan Haindrich b5a87fd89b
Support constant args in window functions (#15071)
Instead of passing the constants around in a new parameter, InputAccessor was introduced to handle the constants transparently. This new class also absorbed some copy-paste debris around field accesses and made them a little more readable.
2023-10-08 12:14:25 +05:30
Zoltan Haindrich 7b869fd37a
Change type of AVG aggregates to double (#15089)
The sql standard is not very restrictive regarding this:

If AVG is specified and DT is exact numeric, then the declared type of the result is an implementation-defined exact numeric type with precision not less than the precision of DT and scale not less than the scale of DT.

So using the same type is also OK (without the patch); however, the avg of 0 and 1 is currently 0 because the integer type is retained.

Postgres, MySQL, Oracle, and Drill seem to increase precision; MSSQL returns 0.
http://sqlfiddle.com/#!9/6f7248/1

I think we should also increase precision, as it's already calculated more precisely.
2023-10-07 18:01:09 +05:30
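The arithmetic behind this change is easy to reproduce. The snippet below is plain Python, not Druid SQL; it only contrasts an average that keeps the integer type with one computed as a double.

```python
values = [0, 1]

# Keeping the integer type truncates the result.
integer_avg = sum(values) // len(values)

# Widening to a double preserves the fractional part.
double_avg = sum(values) / len(values)
```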
Soumyava 57ab8e13dc
Updating plans when using joins with unnest on the left (#15075)
* Updating plans when using joins with unnest on the left

* Correcting segment map function for hashJoin

* The changes done here are not reflected into MSQ yet so these tests might not run in MSQ

* native tests

* Self joins with unnest data source

* Making this pass

* Addressing comments by adding explanation and new test
2023-10-06 19:23:12 -07:00
Xavier Léauté f9439970c9
run build and unit tests using Java 21 (#15088)
* run build and unit test using Java 21

* run static checks with Java 21

* use setup-java for unit tests, since Java 21 is not built-in

* skip maven cache from setup-java

* add comments to explain cache behavior
2023-10-06 12:45:07 -07:00
Soumyava 1a06ef5a24
Fixing old function used (#15099) 2023-10-05 17:25:00 -07:00
Pranav 06c5527c85
Allow aliasing of Macros and add new alias for complex decode 64 (#15034)
* Add AliasExprMacro to allow aliasing of native expression macros
* Add decode_base64_complex alias for complex_decode_base64
2023-10-05 16:24:36 -07:00
Zoltan Haindrich 36d7b3cc65
Add CalciteSysQueryTest to enable some testing of bindable plans. (#15070) 2023-10-05 11:37:49 -07:00
317brian 2164dafb99
docs: update unnest to use crossjoin instead of comma (#15074) 2023-10-05 09:01:08 -07:00
Adarsh Sanjeev 7e987e3d69
Add query context parameter for segment load wait (#15076)
Add segmentLoadWait as a query context parameter. If this is true, the controller queries the broker and waits till the segments created (if any) have been loaded by the load rules. The controller also provides this information in the live reports and task reports. If this is false, the controller exits immediately after finishing the query.
2023-10-05 18:26:34 +05:30
Laksh Singla 2c286d6f42
Fix monomorphic processing code running on JDK8 since it references a non-existing method (#15092)
Code relying on monomorphic processing on JDK8 doesn't work correctly, since it tries to reference getArrayLength using method handles, which might have been accidentally removed here since it seems unused. This PR adds the method back as is.
2023-10-05 11:05:38 +05:30
Clint Wylie b4bc9b6950
fix issue with auto columns with mix of scalar values and empty arrays (#15083) 2023-10-05 10:15:45 +05:30
Laksh Singla b8d03d36b0
Free up the resources when materializing the results as Frames (#15032)
Refactor the code to clean up the result sequences when materializing the results as Frames
2023-10-05 10:14:27 +05:30
Clint Wylie 3afe09a19d
urlencode nested serializer temp file names so they dont explode stuff (#15068)
Fixes a bug caused by #14919, which was just using the column name as part of a temp file name, which.. isn't very cool, my bad. Switched to use StringUtils.urlEncode so that ugly chars don't explode stuff. The modified test fails without the changes in this PR.
2023-10-05 10:13:45 +05:30
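A Python analogue of the idea, for illustration only: percent-encode the column name before using it in a file name so that path separators and other awkward characters cannot leak into the path. `temp_file_name` is a hypothetical helper, and `urllib.parse.quote` stands in for the `StringUtils.urlEncode` call mentioned above; the two may differ in exactly which characters they encode.

```python
import os
from urllib.parse import quote


def temp_file_name(column_name: str, suffix: str = ".tmp") -> str:
    """Build a file name that is safe even for unusual column names."""
    # safe="" also encodes "/", which quote() leaves alone by default.
    return quote(column_name, safe="") + suffix


name = temp_file_name("a/b c")
has_separator = os.sep in name or "/" in name
```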
Laksh Singla 30cf76db99
Field writers for numerical arrays (#14900)
Row-based frames, and by extension MSQ, now support numeric array types. This means that all queries consuming or producing arrays also work with MSQ. Numeric arrays can also be ingested via MSQ. After this patch, queries like SELECT [1, 2] work with MSQ since they consume a numeric array, instead of failing with an unsupported column type exception.
2023-10-04 23:16:47 +05:30
317brian 88476e0e83
docs: add note about transparent_reconnection for Avatica (#15066)
* add note about transparent_reconnection

* Update docs/api-reference/sql-jdbc.md
2023-10-04 09:52:48 -07:00
Zoltan Haindrich 90e4b25620
Fix lead/lag to be usable without offset (#15057) 2023-10-04 17:38:46 +05:30
Tejaswini Bandlamudi c888ac5d61
fix path of druid service IT logs (#15082) 2023-10-04 15:38:38 +05:30
Gian Merlino a9021e4cd7
Fix NPE with lenient aggregators merging in segmentMetadata. (#15078)
When merging analyses, lenient merging sets unmergeable aggregators
to null. Merging such a null aggregator record into a nonnull record
would potentially lead to NPE in getMergingFactory.

The new code only calls getMergingFactory if both the old and new
aggregators are nonnull; else, if either is null, then the merged
aggregator is also set to null.
2023-10-04 02:41:41 -07:00
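The null-handling rule described above can be sketched as follows. This is a simplified Python model under assumptions: aggregators are plain dicts, `merge_pair` is a hypothetical stand-in for getMergingFactory, and names present on only one side are assumed to be carried over unchanged.

```python
class Unmergeable(Exception):
    """Raised by the hypothetical pairwise merge for incompatible aggregators."""


def merge_pair(a: dict, b: dict) -> dict:
    """Stand-in for getMergingFactory: only same-typed aggregators combine."""
    if a["type"] != b["type"]:
        raise Unmergeable(f"{a['type']} vs {b['type']}")
    return a


def merge_lenient(old: dict, new: dict) -> dict:
    """Merge aggregator maps; unmergeable or already-null entries become None."""
    merged = dict(old)
    for name, agg in new.items():
        if name not in merged:
            merged[name] = agg
            continue
        existing = merged[name]
        if existing is None or agg is None:
            # Either side is already null: keep null and skip merge_pair.
            merged[name] = None
            continue
        try:
            merged[name] = merge_pair(existing, agg)
        except Unmergeable:
            merged[name] = None
    return merged


# The first merge hits incompatible types, so "m" becomes None.
first = merge_lenient({"m": {"type": "longSum"}}, {"m": {"type": "doubleSum"}})
# Merging that None record with a non-null one must not call merge_pair.
second = merge_lenient(first, {"m": {"type": "longSum"}, "n": {"type": "count"}})
```

Without the `None` guard, the second merge would pass `None` to `merge_pair` and fail, which is the Python counterpart of the NPE described above.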
Clint Wylie 632811b285
fix json compat layer to not rewrite v4 into v5 after segment merging (#14997) 2023-10-04 00:18:18 -07:00