Commit Graph

12960 Commits

Author SHA1 Message Date
Kashif Faraz 0cde3a8b52
Fix regression in batch segment allocation (#14337)
* Improve batch segment allocation logs

* Fix batch seg alloc regression

* Fix logs

* Fix logs

* Fix tests and logs
2023-05-25 22:34:54 -07:00
Vadim Ogievetsky 1873fca6c7
Web console: update DQT to latest version and fix bigint crash (#14318)
* update dqt

* don't crash on bigint values

* better submit experiance

* bump to an even version
2023-05-24 17:40:45 -07:00
Charles Smith 88831b1dd0
Docs: Updates docker compose to turn off kraft which causes errors (#14335) 2023-05-24 09:33:32 -07:00
Clint Wylie 4096f51f0b
add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319)
This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior.

This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.
2023-05-24 20:32:51 +05:30
Soumyava 22ba457d29
Expr getCacheKey now delegates to children (#14287)
* Expr getCacheKey now delegates to children

* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it

* Adding an unit test

* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-23 14:49:38 -07:00
Abhishek Radhakrishnan 338bdb35ea
Return `RESOURCES` in `EXPLAIN PLAN` as an ordered collection (#14323)
* Make resources an ordered collection so it's deterministic.

* test cleanup

* fixup docs.

* Replace deprecated ObjectNode#put() calls with ObjectNode#set().
2023-05-23 00:55:00 -05:00
Abhishek Radhakrishnan a5e04d95a4
Add `TYPE_NAME` to the complex serde classes and replace the hardcoded names. (#14317)
* Add TYPE_NAME to the serde classes and reuse them instead of hardcoded strings.

* Static check fixes.
2023-05-23 00:54:47 -05:00
Victoria Lim 6b3a6113c4
Doc: List supported values for Kafka `headerFormat` (#14316) 2023-05-22 15:41:07 -07:00
Nhi Pham 3f6610aaf1
fixed wording in OSS query laning doc (#14324)
Co-authored-by: Nhi Pham <nhipham@Nhi-Pham.local>
2023-05-22 11:58:17 -07:00
George Shiqi Wu cb65135b99
Fix log streaming (#14285)
* Fix log streaming

* Add watch log

* Add unit tests

* long running client

* singleton client

* Remove accidental close
2023-05-22 11:19:53 -07:00
Tejaswini Bandlamudi 36a084e021
Fix GHA workflows naming & Run ITs if UTs fail on coverage (#14158)
Currently, there is no way to run ITs if unit-tests fail on coverage. This PR allows Revised, Standard ITs to run even when unit-tests fail on coverage errors, still failing the workflow. This PR also fixes existing GHA workflow naming.
2023-05-22 11:44:34 +05:30
317brian 9faf9ecf20
docs: add line about write datasource perm for overlord api (#14114)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2023-05-19 14:56:24 -07:00
Katya Macedo 269137c682
Update Ingestion section (#14023)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-05-19 09:42:27 -07:00
Vadim Ogievetsky 7f66fd049b
don't show merged stats until needed (#14311) 2023-05-18 20:32:58 -07:00
imply-cheddar e9fed1445f
Revert PreResponseAuthorizationCheckFilter (#13813)
Make it permissive like it used to be again so that we
ensure that validation errors make it out.
2023-05-18 18:16:43 -07:00
George Shiqi Wu 51f722b7f1
Fix labels (#14282)
* Fix labels

* move to a util function

* style

* PR comments

* rename class
2023-05-18 11:51:58 -07:00
Victoria Lim 058eb99a8b
Docs: Update Docker profile and fix method call in `druidapi` tutorial (#14308) 2023-05-18 07:29:02 -07:00
Abhishek Radhakrishnan c546df3866
Add `examples/` to CI UT/IT ignore (#14306)
* Skip UT/IT on examples only changes.
2023-05-17 17:46:25 -07:00
Abhishek Radhakrishnan 7400ed3c93
Fixup data deletion tutorial docs (#14283)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-05-17 17:05:35 -07:00
Charles Smith c84c174caa
update tutorials to use clarify druid host location for Docker Compose + Druid version (#14295) 2023-05-17 15:41:02 -07:00
Clint Wylie cb10bb9783
add website to java ci ignore (#14303) 2023-05-17 14:50:52 -07:00
Clint Wylie 26ff01a0fd
streamline release process docs (#14268)
remove release:prepare without skipping tests because there is no good reason to run tests locally in this step inline with creating a tag.
2023-05-17 13:57:37 -07:00
Clint Wylie 1d1454b22c
update NOTICE year, update kafka notice in licenses.yaml (#14299) 2023-05-17 04:32:19 -07:00
Clint Wylie d92b9fbfac
more resilient segment metadata, dont parallel merge internal segment metadata queries (#14296) 2023-05-17 04:12:55 -07:00
Vadim Ogievetsky 1dd20773ae
remove website node-scss dep (#14275) 2023-05-17 04:10:46 -07:00
317brian ceda1e98b9
docs: add docs for schema auto-discovery (#14065)
* wip schemaless

* wip

* more cleanup

* update tuningconfig example

* updates based on feedback from clint

* remove errant comma

* update dimension object to include auto

* update to include string schemaless way

* fix spelling errors

* updates for type-aware and string-based changes

* Update docs/ingestion/schema-design.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* update spelling file

* Update docs/ingestion/schema-design.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* copyedits

* fix anchor

---------

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-17 01:36:02 -07:00
Clint Wylie b038a11280
fix issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297) 2023-05-17 01:33:44 -07:00
Tejaswini Bandlamudi bbbb031057
Do not cancel old GHA workflows triggered on branch commits (#14279)
* group and limit workflows only on PRs and not on branch commits

* also apply to Static Checks CI
2023-05-16 12:13:08 +05:30
Soumyava 96a3c00754
Fixing an issue with filtering on a single dimension by converting In… (#14277)
* Fixing an issue with filtering on a single dimension by converting In filter to a selector filter as needed with Filters.toFilter

* Adding a test so that any future refactoring does not break this behavior

* Made comment a bit more meaningful
2023-05-15 20:10:36 -07:00
Adarsh Sanjeev e8ef31fe92
Fix condition for timeout in worker task launcher (#14270)
* Fix condition for timeout in worker task launcher
2023-05-16 08:30:00 +05:30
Victoria Lim 66d4ea014c
Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984) 2023-05-15 15:20:52 -07:00
Peter Marshall c4aa98953b
202304-docs-removeDF (#14132) 2023-05-15 15:08:57 -07:00
Paul Rogers 3c0983c8e9
Extend the IT framework to allow tests in extensions (#13877)
The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.
2023-05-15 20:29:51 +05:30
Adarsh Sanjeev 10bce22e68
Configure maxBytesPerWorker directly instead of using StageDefinition (#14257)
* Configure maxBytesPerWorker directly instead of using StageDefinition
2023-05-15 16:51:57 +05:30
AmatyaAvadhanula e9913abbbf
Add new lock types: APPEND and REPLACE (#14258)
* Add new lock types: APPEND and REPLACE
2023-05-14 22:38:32 -07:00
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
317brian 8bda7297e1
doc: fix unnest datasource syntax (#14272) 2023-05-12 13:05:27 -07:00
Tejaswini Bandlamudi 9e0708f5e6
update heap size of coordinator, overlord services in docker IT environment (#14214) 2023-05-12 23:19:48 +05:30
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
317brian 6254658f61
docs: fix links (#14111) 2023-05-12 09:59:16 -07:00
Nicholas Lippis 58dcbf9399
queue tasks in kubernetes task runner if capacity is fully utilized (#14156)
* queue tasks if all slots in use

* Declare hamcrest-core dependency

* Use AtomicBoolean for shutdown requested

* Use AtomicReference for peon lifecycle state

* fix uninitialized read error

* fix indentations

* Make tasks protected

* fix KubernetesTaskRunnerConfig deserialization

* ensure k8s task runner max capacity is Integer.MAX_VALUE

* set job duration as task status duration

* Address pr comments

---------

Co-authored-by: George Shiqi Wu <george.wu@imply.io>
2023-05-12 09:41:44 -06:00
Abhishek Agarwal 9eebeead44
Tune stale bot to pick older issues first (#14267) 2023-05-12 11:45:29 +05:30
Tejaswini Bandlamudi 8ef99f091a
Fix jdk setup in GHA (#14091)
Instead of downloading jdk everytime we run CI, we're using inbuilt temurin jdk distributions 8, 11, 17 by settiing JAVA_HOME variable. This is not working as expected since we were not setting this as global environment variable as a result all CI builds are running on jdk11. This PR fixes the issue.
2023-05-12 10:36:59 +05:30
Clint Wylie eae9e07ea9
suppress CVE-2021-40331 since it applies to ranger-hive-plugin which afaict we do not use (#14261) 2023-05-11 21:58:47 -07:00
Clint Wylie 9875090bee
fix segment metadata queries for auto ingested columns that had all null values (#14262) 2023-05-11 20:58:06 -07:00
Kashif Faraz 47a70d03e8
Docs: Minor rephrase in indexing-service.md (#14231)
* Fix language in indexing-service

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2023-05-12 08:22:02 +05:30
317brian cc37987dff
docs: copyedits for MSQ join algos (#14012) 2023-05-11 14:21:09 -07:00
Soumyava f128b9b666
Updates to filter processing for inner query in Joins (#14237) 2023-05-11 17:21:41 +05:30
Clint Wylie a58cebe491
add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236) 2023-05-11 04:43:28 -07:00
Kashif Faraz 64e6283eca
Do not allow retention rules to be null (#14223)
Changes:
- Do not allow retention rules for any datasource or cluster to be null
- Allow empty rules at the datasource level but not at the cluster level
- Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly
- Minor style refactors
2023-05-11 14:33:56 +05:30