Commit Graph

13619 Commits

Author SHA1 Message Date
Abhishek Radhakrishnan 63e3e9531d
Update S3 retry logic to account for the underlying cause in case of `IOException` (#15238)
* Update S3 retry logic based on the underlying cause in case of IOException.

4xx and other errors wrapped in IOException for instance aren't retriable.

* Fix CI
2023-10-24 15:04:42 -07:00
AmatyaAvadhanula 65b69cded4
Filter pending segments upgraded with transactional replace (#15169)
* Filter pending segments upgraded with transactional replace

* Push sequence name filter to metadata query
2023-10-23 21:18:47 +05:30
Zoltan Haindrich 2e31cb2901
DrillWindowQueryTest: use proper way to decide if the query is ordered (#15118) 2023-10-23 10:54:28 -04:00
Zoltan Haindrich b95035f183
Fix VirtualColumn related issues in window expressions (#15119)
for some exotic queries like:

  SELECT
  	'_'||dim1,
    MIN(cast(0 as double)) OVER (),
    MIN(cast((cnt||cnt) as bigint)) OVER ()
  FROM foo
the compilation have resulted in NPE -s mostly because VirtualColumn -s were not handled properly
2023-10-23 14:05:59 +05:30
Clint Wylie c8e458452d
Fix native is boolean filter cache key tests to test the right thing (#15216) 2023-10-23 11:24:46 +05:30
AmatyaAvadhanula 33fdd770f7
Consider only supervisors with append lock for concurrent transactional replace (#15220)
A SegmentTransactionReplaceAction must only update the mapping of tasks with append locks that are running concurrently. To ensure this, we return the supervisor id only if it has the taskLockType as APPEND in its context.
2023-10-22 14:12:36 +05:30
Zoltan Haindrich fbbb9c7730
Allow DESC ordering in window expressions (#15195) 2023-10-20 07:55:28 -04:00
Xavier Léauté e03f863cf6
update core Apache Kafka dependencies to 3.6.0 (#15214)
Release notes: https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html
https://kafka.apache.org/blog#apache_kafka_360_release_announcement
2023-10-19 20:27:09 -07:00
Xavier Léauté 352702bb25
run some integration tests with Java 21 (#15104)
* use setup-java everywhere for consistency

* add Java 21 to integration test matrix

* simplify docker build containers script + add Java 21

* fix for Java versions reporting 21-ea
2023-10-20 11:18:13 +08:00
Sébastien 5752a1a383
Proper default for taskLockType in streaming ingestion (#15213)
* Proper value for taskLockType in streaming ingestion with concurrent compaction
2023-10-19 21:01:31 +05:30
Tejaswini Bandlamudi 1f39c054a7
Fix GHA workflow bugs (#15209) 2023-10-19 17:11:36 +05:30
Zoltan Haindrich 9fb0dbfc9f
Fix json inputs for drill windowing tests (#15148)
This PR:

adds a flag to JsonToParquet to do the fix during conversion
updates the json files to more correct conents
some resultset mismatches were fixed by this
updates parquet to 1.13.1
2023-10-19 14:02:41 +05:30
AmatyaAvadhanula a8febd457c
A Replacing task must read segments created before it acquired its lock (#15085)
* Replacing tasks must read segments created before they acquired their locks
2023-10-19 11:13:07 +05:30
Laksh Singla fa311dd0b6
Cast the values read in the EXTERN function to the type in the row signature (#15183) 2023-10-19 10:24:23 +05:30
Atul Mohan 780207869b
Attach user identity to router request logs (#15126)
* Attach user identity to router request logs

* Add test

* More tests
2023-10-18 19:40:58 -07:00
Clint Wylie 5c14b42e72
fix incorrect unnest dimension cursor value matcher implementation (#15192) 2023-10-18 16:43:06 -07:00
Clint Wylie 061cfee224
add native filters for "(filter) is true" and "(filter) is false" (#15182)
* add native filters for "(filter) is true" and "(filter) is false"

changes:
* add IsTrueDimFilter, IsFalseDimFilter, and abstract IsBooleanDimFilter for native json filter implementations of `(filter) IS TRUE` and `(filter) IS FALSE`
* add IsBooleanFilter for actual filtering logic for these filters, which ignore includeUnknown to always use matches with false for true and !matches with true for false
* fix test incorrectly adjusted to wrong answer in #15058
* add tests for default value mode
2023-10-18 13:07:35 -07:00
Zoltan Haindrich c58b7f40ee
Rename windowing option (#15184) 2023-10-18 10:54:20 +05:30
Clint Wylie 22034a1630
preserve Rows.objectToStrings behavior of translating null into "null" inside of lists and arrays (#15190) 2023-10-17 19:49:36 -07:00
Laksh Singla b4540ed5d4
Optimize the reading of numerical frame arrays in MSQ (#15175) 2023-10-18 02:33:42 +05:30
Karan Kumar 953ce79439
Add undocumented taskLockType to MSQ. (#15168)
Patch adds an undocumented parameter taskLockType to MSQ so that we can start enabling this feature for users who are interested in testing the new lock types.
2023-10-17 21:44:04 +05:30
George Shiqi Wu dc0b163e19
Separate task lifecycle from kubernetes/location lifecycle (#15133)
* Separate k8s and druid task lifecycles

* Remove extra log lines

* Fix unit tests

* fix unit tests

* Fix unit tests

* notify listeners on task completion

* Fix unit test

* unused var

* PR changes

* Fix unit tests

* Fix checkstyle

* PR changes
2023-10-17 08:17:43 -07:00
Pranav 0a27a7a7ca
Update eirslett frontend (#15154) 2023-10-16 20:16:32 +05:30
Laksh Singla dc8d2192c3
Introduce natural comparator for types that don't have a StringComparator (#15145)
Fixes a bug when executing queries with the ordering of arrays
2023-10-16 10:37:32 +05:30
Pranav 4b0d1b3488
Fix expression result writing of arrays in Hadoop Ingestion (#15127) 2023-10-13 13:41:41 -07:00
Sébastien 9ca10c7bd7
Added concurrent compaction switches (#15114)
* Added concurrent compaction switches
2023-10-13 21:03:39 +05:30
Zoltan Haindrich 6d62c75866
Fix columns with null values in windowing expressions (#15131) 2023-10-13 10:42:45 -04:00
Karan Kumar f0a70fe3c4
Fixing the flaky tests. (#15142) 2023-10-13 16:20:05 +05:30
Adarsh Sanjeev 4deeb7e936
Fix issue with checking segment load status (#15147)
This PR addresses a bug with waiting for segments to be loaded. In the case of append, segments would be created with the same version. This caused the number of segments returned to be incorrect.

This PR changes this to keep track of the range of partition numbers as well for each version, which lets the task wait for the correct set of segments. The partition numbers are expected to be continuous since the task obtains the lock for the segment while running.
2023-10-13 16:06:13 +05:30
AmatyaAvadhanula d25caaefa4
Add support for streaming ingestion with concurrent replace (#15039)
Add support for streaming ingestion with concurrent replace

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-10-13 09:09:03 +05:30
Tejaswini Bandlamudi 0a6f78c0bb
Fix GHA workflow bugs (#15138) 2023-10-12 21:25:57 +05:30
Karan Kumar 61ea9e07c5
Limit pages size to a configurable limit (#14994)
Adding the ability to limit the pages sizes of select queries.

    We piggyback on the same machinery that is used to control the numRowsPerSegment.
    This patch introduces a new context parameter rowsPerPage for which the default value is set to 100000 rows.
    This patch also optimizes adding the last selectResults stage only when the previous stages have sorted outputs. Currently for each select query with selectDestination=durableStorage, we used to add this extra selectResults stage.
2023-10-12 14:01:46 +05:30
Clint Wylie a0fd9ec55c
fix issue with SQL boolean constants not respecting nulls when strict booleans and sql compatible null handling are enabled (#15135) 2023-10-12 01:23:24 -07:00
Clint Wylie d0f64608eb
sql compatible three-valued logic native filters (#15058)
* sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true
* log.warn if non-default configurations are used to guide operators towards SQL complaint behavior
2023-10-12 00:06:23 -07:00
317brian 265c811963
docs: remove experimental note from query from deep storage docs (#15132) 2023-10-12 11:51:02 +05:30
Katya Macedo 10aab7506e
Dynamic configuration API documentation refactor (#15098)
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
2023-10-11 14:45:05 -07:00
Zoltan Haindrich ae88f2c0b6
Fix non-sqlcompat validation in CalciteWindowQueryTest (#15086)
* fixes

* check for latest rewrite place

* Revert "check for latest rewrite place"

This reverts commit 5cf1e2c1ca.

* some stuff

(cherry picked from commit ab346d4373ea888eb8ef6115e018e7fb0d27407f)

* update test output

* updates to test ouptuts

* some stuff

* move validator

* cleanup

* fix

* change test slightly

* add apidoc cleanup warnings

* cleanup/etc

* instead of telling the story; add a fail with some reason whats the issue

* lead-lag fix

* add test

* remove unnecessary throw

* druidexception-trial

* Revert "druidexception-trial"

This reverts commit 8fa06644bc.

* undo changes to no_grouping; add no_grouping2

* add missing assert on resultcount

* rename method; update

* introduce enum/etc

* make resultmatchmode accessible from TestBuilder#expectedResults

* fix dump results to use log

* fix

* handle null correctly

* disable feature type based things for MSQ

* fix varianssqlaggtest

* use eps in other test

* fix intellij error

* add final

* addrss review

* update test/string/etc

* write concat in 3 lines :D
2023-10-11 12:34:31 -07:00
Vishesh Garg c6ca990f1f
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY (#15095)
EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors.

This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
2023-10-11 19:48:36 +05:30
Tejaswini Bandlamudi 52d94b09a7
update jetty & netty4 dependencies (#15129)
Update jetty dependencies version to 9.4.53.v20231009
Update netty4 dependencies version to 4.1.100.Final to resolve CVE-2023-4586 (Netty-handler does not validate host names by default)
2023-10-11 18:16:28 +05:30
Sébastien dba0246aca
Added UI support for waitTillSegmentsLoad (#15110)
This relies on the work done in #14322 and #15076. It allows the user to set waitTillSegmentsLoad in the query context (if they want, else it defaults to true) and shows the results in the UI :
2023-10-11 16:18:42 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
317brian 263e106714
docs: remove experimental note from unnest docs (#15123)
* docs: remove experimental note from unnest docs

* remove flag needed to use unnest
2023-10-10 16:52:51 -07:00
Zoltan Haindrich 23605c1edd
Enable resultset validation of Drill tests (#15096)
- introduces a test_X method for every testcase (995 testcases)
- added a resultset parser which reads the expected resultset based on the result schema
- loaded a few more datasets
- added a testcase to ensure that all files have a corresponding testcase
- renamed DecoupledIgnore to NegativeTest
- categorized the failing 268 tests
2023-10-10 14:40:50 +05:30
Karan Kumar 48f35b3fdd
Add query id to processing pool thread name. (#15059)
This patch changes the thread name of the processing pool of the indexers/peons/historicals from query.getType() + "_" + query.getDataSource() + "_" + query.getIntervals() to query.getId()
2023-10-10 05:59:03 +05:30
Clint Wylie fda8d2b7f3
fix debugging and running with intellij runConfiguration (#15115) 2023-10-09 17:03:06 -07:00
Laksh Singla 95bf331c08
Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108)
The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.
2023-10-10 02:03:29 +05:30
Abhishek Agarwal 90a1458ac9
Parse passwords containing colon correctly (#15109) 2023-10-09 20:45:10 +05:30
Laksh Singla b0edbc3d91
MSQ writes out string arrays instead of MVDs by default (#15093)
MSQ uses the string dimension schema for ARRAY<STRING> typed columns, which creates MVDs instead of string arrays as required. Therefore someone trying to ingest columns of type ARRAY<STRING> from an external data source or another data source would get STRING columns in the newly generated segments.

This patch changes the following:

- Use auto dimension schema to ingest the ARRAY<STRING> columns, which will create columns with the desired type.
- Add an undocumented flag ingestStringArraysAsMVDs to preserve the legacy behavior. Legacy behaviour is turned on by default. 
- Create MSQArraysInsertTest and refactor some of the tests in MSQInsertTest.
2023-10-09 20:31:07 +05:30
Laksh Singla 36edbce036
Fix compilation failure in master (#15111)
Merging since it's a dev blocker.
2023-10-09 20:05:48 +05:30
Clint Wylie 1fc8fb1b20
add a bunch of tests with array typed columns to CalciteArraysQueryTest (#15101)
* add a bunch of tests with array typed columns to CalciteArraysQueryTest
* fix a bug with unnest filter pushdown when filtering on unnested array columns
2023-10-09 06:16:06 -07:00