12966 Commits

Author SHA1 Message Date
Clint Wylie
9875090bee
fix segment metadata queries for auto ingested columns that had all null values (#14262) 2023-05-11 20:58:06 -07:00
Kashif Faraz
47a70d03e8
Docs: Minor rephrase in indexing-service.md (#14231)
* Fix language in indexing-service

* Apply suggestions from code review

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-05-12 08:22:02 +05:30
317brian
cc37987dff
docs: copyedits for MSQ join algos (#14012) 2023-05-11 14:21:09 -07:00
Soumyava
f128b9b666
Updates to filter processing for inner query in Joins (#14237) 2023-05-11 17:21:41 +05:30
Clint Wylie
a58cebe491
add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236) 2023-05-11 04:43:28 -07:00
Kashif Faraz
64e6283eca
Do not allow retention rules to be null (#14223)
Changes:
- Do not allow retention rules for any datasource or cluster to be null
- Allow empty rules at the datasource level but not at the cluster level
- Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly
- Minor style refactors
2023-05-11 14:33:56 +05:30
AmatyaAvadhanula
47e48ee657
Remove incorrect optimization (#14246) 2023-05-11 00:54:41 -07:00
Clint Wylie
e833a4700d
suppress hadoop3 cve that seem not applicable to us (#14252) 2023-05-10 23:08:05 -07:00
Abhishek Agarwal
f3ff36a004
Move the stale bot to a GHA action (#14238)
Move the stale bot to a GHA action
2023-05-11 11:31:28 +05:30
Clint Wylie
aaaff74740
fix npe regression in json_value when filtering non-existent paths (#14250)
* fix npe regression in json_value when filtering non-existent paths

* more coverage
2023-05-10 22:39:22 -07:00
Clint Wylie
6db11bfc60
suppress some cves and fix javadoc build when using java 17 (#14241) 2023-05-10 15:47:10 -07:00
Clint Wylie
625c4745b1
add context flag "useAutoColumnSchemas" to use new auto types for MSQ segment generation (#14175) 2023-05-10 15:37:14 -07:00
George Shiqi Wu
161d12eb44
Fix unit tests for java 17 (#14207)
Fix a unit test that fails in java 17
2023-05-09 20:02:31 +05:30
Kashif Faraz
bd0080c4ce
Update default values in docs (#14233) 2023-05-09 19:13:51 +05:30
Shingo Kitagawa
152e9375e2
update documentation about multiValueHandling (#14197)
* update documentation about multiValueHandling

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/ingestion/ingestion-spec.md

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>

* fix spelling

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2023-05-08 16:16:54 -07:00
Clint Wylie
8805d8d7db
fix issues with filtering nulls on values coerced to numeric types (#14139)
* fix issues with filtering nulls on values coerced to numeric types
* fix issues with 'auto' type numeric columns in default value mode
* optimize variant typed columns without nested data
* more tests for 'auto' type column ingestion
2023-05-08 13:19:02 -07:00
Vadim Ogievetsky
0a3889b192
account for auto allowing for leading and trailing spaces (#14224) 2023-05-08 13:18:31 -07:00
minseok
3c62c00d4c
Fix Typos in DruidToGraphiteEventConverter (#14219) 2023-05-08 17:46:32 +05:30
Clint Wylie
a7a4bfd331
modify QueryScheduler to lazily acquire lanes when executing queries to avoid leaks (#14184)
This PR fixes an issue that could occur if druid.query.scheduler.numThreads is configured and any exception occurs after QueryScheduler.run has been called to create a Sequence. This would result in total and/or lane-specific locks being acquired, but because the sequence was not actually being evaluated, the "baggage" which typically releases these locks was not being executed. An example of how this can happen is if a group-by having filter, which wraps and transforms this sequence, happens to explode while wrapping the sequence. The end result is that the locks are acquired but never released, eventually halting the ability to execute any queries.
2023-05-08 11:42:05 +05:30
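The lane leak described in #14184 above can be illustrated with a minimal sketch. It is not Druid's QueryScheduler code: the class LaneSketch, its methods runEager and runLazy, and the use of java.util.concurrent.Semaphore and Supplier in place of lanes and Sequence are all assumptions made for illustration. The eager variant takes the permit when the lazy result is created, so a permit leaks if the result is never evaluated. The lazy variant takes it only at evaluation time and releases it in a finally block.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical stand-in for a query lane; not Druid's actual API.
final class LaneSketch {
    private final Semaphore lane;

    LaneSketch(int permits) {
        this.lane = new Semaphore(permits);
    }

    int availablePermits() {
        return lane.availablePermits();
    }

    // Eager: the permit is taken as soon as the lazy result is created.
    // If the caller fails before evaluating it, the permit is never released.
    <T> Supplier<T> runEager(Supplier<T> query) {
        if (!lane.tryAcquire()) {
            throw new IllegalStateException("lane is full");
        }
        return () -> {
            try {
                return query.get();
            } finally {
                lane.release();
            }
        };
    }

    // Lazy: the permit is taken only when the result is actually evaluated,
    // so acquisition and release always happen as a pair.
    <T> Supplier<T> runLazy(Supplier<T> query) {
        return () -> {
            if (!lane.tryAcquire()) {
                throw new IllegalStateException("lane is full");
            }
            try {
                return query.get();
            } finally {
                lane.release();
            }
        };
    }
}
```

With a single permit, calling runEager and discarding the returned Supplier leaves zero permits available, while doing the same with runLazy leaves the permit untouched.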
Rohan Garg
4d8feeb279
Fix planning in CASE expressions with complex WHEN and ELSE expressions (#14220) 2023-05-08 11:35:04 +05:30
George Shiqi Wu
eed5f4f291
Add labels to k8s jobs for the PodTemplateTaskAdapter (#14205)
* Add labels

* Add prefix

* remove newline

* fix syntax

* Update prefix
2023-05-08 10:56:52 +08:00
Adarsh Sanjeev
fb38085ddb
Add wait for worker shutdown to MSQ task cancel (#14198)
* Add wait for worker shutdown to MSQ task cancel

* Fix checkstyle
2023-05-05 16:29:59 -07:00
Churro
123c4908c8
Ephemeral storage is respected from the overlord for peon tasks (#14201) 2023-05-05 16:27:29 -07:00
Vadim Ogievetsky
4c15e978f1
Web console: misc bug fixes (#14216)
* fixing little things

* clear edit columns when switching to SQL tab

* updated snapshots
2023-05-05 15:45:19 -07:00
Abhishek Radhakrishnan
6ca3fb9b08
Remove the redundant ISO-8601 text in the readme. (#14210) 2023-05-05 11:27:29 -07:00
Abhishek Radhakrishnan
46dabab36d
Fix NPE in test parse exception report. Add more tests with different thresholds. (#14209) 2023-05-05 10:05:41 -07:00
Clint Wylie
01e88848ce
restore .idea/misc.xml to see if it fixes intellij inspection ci (#14208) 2023-05-05 11:47:16 +05:30
zachjsh
48cde236c4
Add columnMappings to explain plan output (#14187)
* Add columnMappings to explain plan output

* fix checkstyle
* add tests

* improve test coverage

* temporarily remove unit-test need to run ITs

* depend on build

* temporarily lower unit test threshold

* add back dependency on unit-tests

* add license headers

* fix header order

* review comments

* fix intellij inspection errors

* revert code coverage change
2023-05-04 10:36:28 -07:00
Abhishek Agarwal
edfd46ed45
Better actionable error message when druid services are not running (#14202)
We have seen that first-time users often don't know the next steps if Druid services are unresponsive for some reason. This PR makes some of those messages a bit clearer.
2023-05-04 18:03:59 +05:30
Abhishek Radhakrishnan
68f908e511
Fix uncaught ParseException when reading Avro from Kafka (#14183)
In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this function call by the delegate AvroStreamReader#intermediateRowIterator.
A ParseException can be thrown if an Avro stream has corrupt data, data that doesn't conform to the specified schema, or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.
2023-05-04 12:35:36 +05:30
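The failure mode described in #14183 above, where one undecodable record aborts the whole read, can be sketched as follows. This is an illustration under assumptions, not the code from the PR: ParseSketch, RecordParseException, decode, parseAll and the maxUnparseable threshold are hypothetical names, and integer parsing stands in for Avro decoding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Druid.
final class ParseSketch {
    // Stand-in for a parse exception raised by a record decoder.
    static final class RecordParseException extends RuntimeException {
        RecordParseException(String message) {
            super(message);
        }
    }

    // Parsed rows plus a count of records that could not be decoded.
    static final class Result {
        final List<Integer> rows = new ArrayList<>();
        int unparseable = 0;
    }

    // Stand-in decoder: throws on anything that is not an integer.
    static int decode(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new RecordParseException("cannot decode record: " + raw);
        }
    }

    // Catches decode failures per record, counts them, and rethrows only
    // once more than maxUnparseable records have failed.
    static Result parseAll(List<String> records, int maxUnparseable) {
        Result result = new Result();
        for (String raw : records) {
            try {
                result.rows.add(decode(raw));
            } catch (RecordParseException e) {
                result.unparseable++;
                if (result.unparseable > maxUnparseable) {
                    throw e;
                }
            }
        }
        return result;
    }
}
```

Handling the exception at the point where each record is read lets the caller decide between skipping, counting, or failing, instead of letting a single corrupt record end the task.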
Abhishek Radhakrishnan
954f3917ef
Add check for required avroBytesDecoder property that otherwise causes NPE. (#14177) 2023-05-03 09:53:58 -07:00
AmatyaAvadhanula
ac7181bbda
Persist supervisor spec only after successful start (#14150)
* Persist spec after successful start

* Fix checkstyle.

* checkstyle after mvn install
2023-05-03 18:27:39 +05:30
Vadim Ogievetsky
ad93635e45
Web console: allow stringly schemas in the data loader (#14189)
* allow stringly schemas

* fix copy

* feedback fixes

* feedback

* fix copy

* add warning

* indicate submitting

* Update web-console/src/views/load-data-view/load-data-view.tsx

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* feedback fix

* copy fix

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2023-05-02 23:13:21 -07:00
Karan Kumar
6f0cdd0c3f
TaskStartTimeoutFault now depends on the last successful worker launch time. (#14172)
* `TaskStartTimeoutFault` now depends on the last successful worker launch time.
2023-05-03 00:05:15 +05:30
Laksh Singla
387e682fbc
Fix memory calculations for WorkerMemoryParameters for machines with relatively less heap space (#14117)
* update worker memory parameters
2023-05-02 09:24:56 +05:30
Karan Kumar
078d5ac590
Preference to first worker error in case job fails with TooManyAttemptsForWorker (#14170) 2023-05-01 14:47:11 +05:30
Clint Wylie
90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
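For readers unfamiliar with the "delta based bit-packing" mentioned in #14186 above, the general idea can be sketched as below. This is a generic illustration, not Druid's VSizeLongSerde or its on-disk format: the class DeltaPackSketch and its layout (64-bit words, little-endian bit order within a word) are assumptions. Each value is stored as an offset from the minimum, using only as many bits as the largest offset needs.

```java
// Hypothetical sketch of delta-based bit-packing; not Druid's format.
final class DeltaPackSketch {
    final long base;   // minimum value; every entry is stored as (value - base)
    final int bits;    // bits per stored delta, between 1 and 63
    final int size;    // number of packed values
    private final long[] words;

    DeltaPackSketch(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long range = max - min;
        if (range < 0) {
            // Overflow: the range does not fit in 63 bits.
            throw new IllegalArgumentException("value range too wide for this sketch");
        }
        base = min;
        size = values.length;
        bits = Math.max(1, 64 - Long.numberOfLeadingZeros(range));
        words = new long[(int) (((long) size * bits + 63) / 64)];
        for (int i = 0; i < size; i++) {
            long delta = values[i] - base;
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            words[word] |= delta << offset;
            if (offset + bits > 64) {
                // The delta straddles two words; spill the high bits.
                words[word + 1] |= delta >>> (64 - offset);
            }
        }
    }

    long get(int index) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long raw = words[word] >>> offset;
        if (offset + bits > 64) {
            raw |= words[word + 1] << (64 - offset);
        }
        long mask = (1L << bits) - 1;
        return base + (raw & mask);
    }
}
```

A reader for such a layout has to get the word index, bit offset, and mask right for every bit width, which is the kind of logic where a per-width error like the one described in the PR can hide.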
Vadim Ogievetsky
32af570fb2
fix API doc formatting (#14167) 2023-04-29 09:29:41 -07:00
Vadim Ogievetsky
f976837eaa
allow marking segments as used when the whole datasource is unused (#14185) 2023-04-28 19:45:50 -07:00
George Shiqi Wu
d0654e2174
Register emitter (#14180) 2023-04-27 18:32:50 -07:00
Vadim Ogievetsky
98db960794
fix task query error decode (#14174) 2023-04-27 15:26:07 -07:00
Suneet Saldanha
84c11df980
Make LoggingEmitter more useful by using Markers (#14121)
* Make LoggingEmitter more useful

* Skip code coverage for facade classes

* fix spellcheck

* code review

* fix dependency

* logging.md

* fix checkstyle

* Add back jacoco version to main pom
2023-04-27 15:06:06 -07:00
Jill Osborne
d4e478c909
NVL function docs update (#14169)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-04-27 11:17:21 -07:00
Vadim Ogievetsky
fceb505833
Web console: allow __time in MSQ (#14165)
* works in MSQ

* fix spec conversion
2023-04-27 09:02:22 -07:00
Nicholas Lippis
6579c1c5b6
remove unneeded TaskLogStreamer binding override (#14176) 2023-04-27 19:39:24 +05:30
Adarsh Sanjeev
63268a5023
Relaunch track of failed workers without work orders (#14166)
* If a worker dies after it has finished generating results, MSQ decides not to retry it as it has no active work orders. However, since we don't keep track of it further, if it is required for a future stage, the controller hangs waiting for the worker to be ready. This PR keeps track of any workers the controller decides not to restart immediately and, while starting workers for the next stage, queues these workers for retry.
2023-04-27 19:38:05 +05:30
Adarsh Sanjeev
5aa119dfda
Add retry to opening retrying stream (#14126)
* Add retry to opening retrying stream
* Add retry to S3Entity for network issues

* Fix tests and clean up code
2023-04-27 16:52:22 +05:30
Gian Merlino
42c8c84eb6
TimeBoundary: Use cursor when datasource is not a regular table. (#14151)
* TimeBoundary: Use cursor when datasource is not a regular table.

Fixes a bug where TimeBoundary could return incorrect results with
INNER Join or inline data.

* Addl Javadocs.
2023-04-26 17:00:13 -07:00
TSFenwick
6c99fbea92
fix typo in s3 docs. add readme to s3 module. (#14135)
* fix typo in s3 docs. add readme to s3 module.

* Update extensions-core/s3-extensions/README.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* cleanup readme for s3 extension and link to repo markdown doc instead of web docs

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-04-26 14:03:11 -07:00
robo220
5db7396c78
fix(avro-json-path-expressions): allow more complex jsonpath expressions (#14149) 2023-04-26 14:58:11 +05:30