13071 Commits

Author SHA1 Message Date
panhongan c244c3de53
fix hdfs initialization issue (#14276)
* fix hdfs initialization issue

* add PR

* remove conf settings

* Improve comments

* move hdfs storage validation to start handler

* restore exception
2023-05-30 12:41:54 -07:00
Nhi Pham 70c06fc0e1
Advise against using WEEK granularity for Native Batch and MSQ (#14341)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-05-30 11:40:12 -07:00
Rishabh Singh 2086ff88bc
Add logging for task stop operations (#14192)
Log more details when task cannot be stopped for various reasons
2023-05-30 18:50:52 +05:30
Pramod Immaneni 1ac5544da7
Updated default value of maxTotalRows to reflect the value in the code (#14298)
2023-05-30 14:41:06 +05:30
Abhishek Radhakrishnan 5fd3e01ef0
More specific exclusions in the `examples` folder. (#14347)
This PR changes how we skip java UT and ITs with changes in the examples folder. After this change, any Markdown files within the examples folder and jupyter-notebooks directory will be excluded. The rationale behind these more specific exclusions is that some ITs use json files checked in examples, so we want to trigger the full workflow for all other changes.
2023-05-30 12:01:45 +05:30
Kashif Faraz d4cacebf79
Add tests for CostBalancerStrategy (#14230)
Changes:
- `CostBalancerStrategyTest`
  - Focus on verification of cost computations rather than choosing servers in this test
  - Add new tests `testComputeCost` and `testJointSegmentsCost`
  - Add tests to demonstrate that with a long enough interval gap, all costs become negligible
  - Retain `testIntervalCost` and `testIntervalCostAdditivity`
  - Remove redundant tests such as `testStrategyMultiThreaded` and `testStrategySingleThreaded`, as verification of this behaviour is better suited to `BalancingStrategiesTest`.
- `CostBalancerStrategyBenchmark`
  - Remove usage of static method from `CostBalancerStrategyTest`
  - Explicitly set up cluster and segments to use for benchmarking
2023-05-30 08:52:56 +05:30
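The two properties named in the commit above (additivity of interval costs, and costs becoming negligible across a long gap) can be illustrated with a toy model. This is a minimal sketch that assumes an exponential-decay kernel `exp(-(y - x))` integrated over two non-overlapping intervals; the function name `interval_cost` and the unitless time axis are assumptions of this example, not Druid's `CostBalancerStrategy` code.

```python
import math


def interval_cost(x0, x1, y0, y1):
    """Toy joint cost of intervals [x0, x1) and [y0, y1), with x1 <= y0.

    Closed form of the double integral of exp(-(y - x)) over both
    intervals. Time is measured in arbitrary decay units (an assumption).
    """
    if not (x0 <= x1 <= y0 <= y1):
        raise ValueError("expected x0 <= x1 <= y0 <= y1")
    gap = y0 - x1
    # Factored form avoids overflow for large timestamps.
    return (
        math.exp(-gap)
        * (1.0 - math.exp(-(x1 - x0)))
        * (1.0 - math.exp(-(y1 - y0)))
    )


# Additivity: splitting [0, 2) into [0, 1) and [1, 2) leaves the total unchanged.
split_total = interval_cost(0, 1, 3, 4) + interval_cost(1, 2, 3, 4)
whole = interval_cost(0, 2, 3, 4)

# A long gap drives the cost towards zero.
far_apart = interval_cost(0, 1, 51, 52)
```

In this model the cost shrinks by a factor of `e` for every unit of gap, so segments that are far apart in time contribute almost nothing to each other's cost.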
Kashif Faraz 8091c6a547
Update default values in CoordinatorDynamicConfig (#14269)
The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated.

1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100)
2. `replicationThrottleLimit = 500` (previous = 10)
Rationale: With round-robin segment assignment now being the default assignment technique,
the Coordinator can assign a large number of under-replicated/unavailable segments very quickly,
without getting stuck in `RunRules` duty due to very slow strategy-based cost computations.

3. `maxSegmentsToMove = 100` (previous = 5)
Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments
to balance. A very large value can cause excessive moves, which has these disadvantages:
- Load of moving segments competing with load of unavailable/under-replicated segments
- Unnecessary network costs due to constant download and delete of segments

These defaults will be revisited after #13197 is merged.
2023-05-30 08:51:33 +05:30
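For operators who want to compare the old and new defaults, or pin an old value explicitly, a small helper along these lines may be useful. It is a sketch only: the three keys and their values come from the commit message above, while the helper names `changed_defaults` and `pinned_payload` and the flat JSON shape of the payload are assumptions of this example.

```python
import json

# Values taken from the commit message above.
PREVIOUS_DEFAULTS = {
    "maxSegmentsInNodeLoadingQueue": 100,
    "replicationThrottleLimit": 10,
    "maxSegmentsToMove": 5,
}
NEW_DEFAULTS = {
    "maxSegmentsInNodeLoadingQueue": 500,
    "replicationThrottleLimit": 500,
    "maxSegmentsToMove": 100,
}


def changed_defaults(old, new):
    """Return {key: (old_value, new_value)} for every key whose default changed."""
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


def pinned_payload(overrides):
    """Serialize the new defaults with selected values overridden (hypothetical helper)."""
    unknown = set(overrides) - set(NEW_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return json.dumps({**NEW_DEFAULTS, **overrides}, sort_keys=True)


# Example: keep the conservative pre-change value for segment moves only.
payload = pinned_payload({"maxSegmentsToMove": 5})
```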
Tejaswini Bandlamudi 0e51c2702a
update operations per run (#14325)
2023-05-29 14:05:11 +05:30
Tejaswini Bandlamudi 914c006b8e
increase middlemanager heap server size in tests (#14345)
2023-05-29 10:45:34 +05:30
Alexander Saydakov 4131c0df13
use the latest datasketches-java-4.0.0 (#14334)
* use the latest datasketches-java-4.0.0

* updated versions of datasketches

* adjusted expectation

* fixed the expectations

---------

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2023-05-27 22:19:18 -07:00
Karan Kumar 8d256e35b4
MSQ ignores tombstone segments for downloads. (#14342)
2023-05-27 14:21:52 +05:30
Kashif Faraz 0cde3a8b52
Fix regression in batch segment allocation (#14337)
* Improve batch segment allocation logs

* Fix batch seg alloc regression

* Fix logs

* Fix logs

* Fix tests and logs
2023-05-25 22:34:54 -07:00
Vadim Ogievetsky 1873fca6c7
Web console: update DQT to latest version and fix bigint crash (#14318)
* update dqt

* don't crash on bigint values

* better submit experience

* bump to an even version
2023-05-24 17:40:45 -07:00
Charles Smith 88831b1dd0
Docs: Updates docker compose to turn off kraft which causes errors (#14335)
2023-05-24 09:33:32 -07:00
Clint Wylie 4096f51f0b
add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319)
This PR adds a new interface that controls how SegmentMetadataCache chooses a ColumnType for computed SQL schemas when segments disagree. The policy is exposed as druid.sql.planner.metadataColumnTypeMergePolicy. The PR also adds a new 'least restrictive type' mode, which chooses the type that data across all segments can best be coerced into, and sets this as the default behavior.

This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.
2023-05-24 20:32:51 +05:30
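The difference between the two policies described above can be sketched with a toy type model. Everything here is an assumption made for illustration: the type names are plain strings, the widening order `LONG < DOUBLE < STRING` and the rule that an array absorbs a scalar are simplifications, and the function names are hypothetical rather than Druid's API.

```python
from functools import reduce

# Hypothetical widening order for scalar types (an assumption of this sketch).
_SCALAR_RANK = {"LONG": 0, "DOUBLE": 1, "STRING": 2}


def _split(type_name):
    """Return (is_array, element_type) for names like 'LONG' or 'ARRAY<LONG>'."""
    if type_name.startswith("ARRAY<") and type_name.endswith(">"):
        return True, type_name[len("ARRAY<"):-1]
    return False, type_name


def _widen(a, b):
    """Pick a type that values of both a and b can be coerced into."""
    a_is_array, a_element = _split(a)
    b_is_array, b_element = _split(b)
    element = max(a_element, b_element, key=_SCALAR_RANK.__getitem__)
    return f"ARRAY<{element}>" if (a_is_array or b_is_array) else element


def latest_interval(types_oldest_first):
    """The newest segment's type wins."""
    return types_oldest_first[-1]


def least_restrictive(types_oldest_first):
    """The widest type across all segments wins."""
    return reduce(_widen, types_oldest_first)


# Old segments hold arrays; the newest job wrote plain numbers.
segment_types = ["ARRAY<LONG>", "ARRAY<LONG>", "LONG"]
newest_wins = latest_interval(segment_types)
widest_wins = least_restrictive(segment_types)
```

With this input, `latest_interval` switches the column to `LONG` as soon as the newest segment is published, while `least_restrictive` keeps `ARRAY<LONG>` until the older segments are reindexed.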
Soumyava 22ba457d29
Expr getCacheKey now delegates to children (#14287)
* Expr getCacheKey now delegates to children

* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it

* Adding a unit test

* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-23 14:49:38 -07:00
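The idea of a cache key that delegates to child expressions can be sketched as a recursive digest over an expression tree. This is an illustrative model only: the `Expr` class below, its `op` and `children` fields, and the use of SHA-256 are assumptions of this example and do not reproduce the code in `Expr.java`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy expression node: an operator or literal name plus ordered children."""

    op: str
    children: tuple = ()

    def cache_key(self) -> bytes:
        digest = hashlib.sha256()
        op_bytes = self.op.encode("utf-8")
        # Length prefixes keep adjacent fields from running together.
        digest.update(len(op_bytes).to_bytes(4, "big"))
        digest.update(op_bytes)
        digest.update(len(self.children).to_bytes(4, "big"))
        for child in self.children:
            # Delegate to each child, so any change below this node changes the key.
            digest.update(child.cache_key())
        return digest.digest()


x_plus_1 = Expr("+", (Expr("x"), Expr("1")))
x_plus_1_again = Expr("+", (Expr("x"), Expr("1")))
x_plus_2 = Expr("+", (Expr("x"), Expr("2")))
```

Two structurally identical trees produce the same key, and a change in any descendant produces a different one, which is the property a cache needs.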
Abhishek Radhakrishnan 338bdb35ea
Return `RESOURCES` in `EXPLAIN PLAN` as an ordered collection (#14323)
* Make resources an ordered collection so it's deterministic.

* test cleanup

* fixup docs.

* Replace deprecated ObjectNode#put() calls with ObjectNode#set().
2023-05-23 00:55:00 -05:00
Abhishek Radhakrishnan a5e04d95a4
Add `TYPE_NAME` to the complex serde classes and replace the hardcoded names. (#14317)
* Add TYPE_NAME to the serde classes and reuse them instead of hardcoded strings.

* Static check fixes.
2023-05-23 00:54:47 -05:00
Victoria Lim 6b3a6113c4
Doc: List supported values for Kafka `headerFormat` (#14316)
2023-05-22 15:41:07 -07:00
Nhi Pham 3f6610aaf1
fixed wording in OSS query laning doc (#14324)
Co-authored-by: Nhi Pham <nhipham@Nhi-Pham.local>
2023-05-22 11:58:17 -07:00
George Shiqi Wu cb65135b99
Fix log streaming (#14285)
* Fix log streaming

* Add watch log

* Add unit tests

* long running client

* singleton client

* Remove accidental close
2023-05-22 11:19:53 -07:00
Tejaswini Bandlamudi 36a084e021
Fix GHA workflows naming & Run ITs if UTs fail on coverage (#14158)
Currently, there is no way to run ITs if unit-tests fail on coverage. This PR allows Revised, Standard ITs to run even when unit-tests fail on coverage errors, still failing the workflow. This PR also fixes existing GHA workflow naming.
2023-05-22 11:44:34 +05:30
317brian 9faf9ecf20
docs: add line about write datasource perm for overlord api (#14114)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2023-05-19 14:56:24 -07:00
Katya Macedo 269137c682
Update Ingestion section (#14023)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-05-19 09:42:27 -07:00
Vadim Ogievetsky 7f66fd049b
don't show merged stats until needed (#14311)
2023-05-18 20:32:58 -07:00
imply-cheddar e9fed1445f
Revert PreResponseAuthorizationCheckFilter (#13813)
Make it permissive like it used to be again so that we
ensure that validation errors make it out.
2023-05-18 18:16:43 -07:00
George Shiqi Wu 51f722b7f1
Fix labels (#14282)
* Fix labels

* move to a util function

* style

* PR comments

* rename class
2023-05-18 11:51:58 -07:00
Victoria Lim 058eb99a8b
Docs: Update Docker profile and fix method call in `druidapi` tutorial (#14308)
2023-05-18 07:29:02 -07:00
Abhishek Radhakrishnan c546df3866
Add `examples/` to CI UT/IT ignore (#14306)
* Skip UT/IT on examples only changes.
2023-05-17 17:46:25 -07:00
Abhishek Radhakrishnan 7400ed3c93
Fixup data deletion tutorial docs (#14283)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-05-17 17:05:35 -07:00
Charles Smith c84c174caa
update tutorials to clarify druid host location for Docker Compose + Druid version (#14295)
2023-05-17 15:41:02 -07:00
Clint Wylie cb10bb9783
add website to java ci ignore (#14303)
2023-05-17 14:50:52 -07:00
Clint Wylie 26ff01a0fd
streamline release process docs (#14268)
remove release:prepare without skipping tests because there is no good reason to run tests locally in this step in line with creating a tag.
2023-05-17 13:57:37 -07:00
Clint Wylie 1d1454b22c
update NOTICE year, update kafka notice in licenses.yaml (#14299)
2023-05-17 04:32:19 -07:00
Clint Wylie d92b9fbfac
more resilient segment metadata, don't parallel merge internal segment metadata queries (#14296)
2023-05-17 04:12:55 -07:00
Vadim Ogievetsky 1dd20773ae
remove website node-scss dep (#14275)
2023-05-17 04:10:46 -07:00
317brian ceda1e98b9
docs: add docs for schema auto-discovery (#14065)
* wip schemaless

* wip

* more cleanup

* update tuningconfig example

* updates based on feedback from clint

* remove errant comma

* update dimension object to include auto

* update to include string schemaless way

* fix spelling errors

* updates for type-aware and string-based changes

* Update docs/ingestion/schema-design.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* update spelling file

* Update docs/ingestion/schema-design.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* copyedits

* fix anchor

---------

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-17 01:36:02 -07:00
Clint Wylie b038a11280
fix issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297)
2023-05-17 01:33:44 -07:00
Tejaswini Bandlamudi bbbb031057
Do not cancel old GHA workflows triggered on branch commits (#14279)
* group and limit workflows only on PRs and not on branch commits

* also apply to Static Checks CI
2023-05-16 12:13:08 +05:30
Soumyava 96a3c00754
Fixing an issue with filtering on a single dimension by converting In… (#14277)
* Fixing an issue with filtering on a single dimension by converting In filter to a selector filter as needed with Filters.toFilter

* Adding a test so that any future refactoring does not break this behavior

* Made comment a bit more meaningful
2023-05-15 20:10:36 -07:00
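The conversion described in the commit above, where an In filter with a single value becomes a selector filter, can be sketched as follows. The classes and the `to_filter` function here are hypothetical stand-ins written for this example; they mirror the idea, not the signature of Druid's `Filters.toFilter`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectorFilter:
    """Matches rows where the dimension equals exactly one value."""

    dimension: str
    value: str


@dataclass(frozen=True)
class InFilter:
    """Matches rows where the dimension equals any of several values."""

    dimension: str
    values: frozenset


def to_filter(dimension, values):
    """Build the simplest filter that is equivalent to `dimension IN values`."""
    unique = frozenset(values)
    if len(unique) == 1:
        # A one-element IN is just an equality check.
        (only,) = unique
        return SelectorFilter(dimension, only)
    return InFilter(dimension, unique)


single = to_filter("country", ["US"])
several = to_filter("country", ["US", "CA"])
```

Duplicate values are collapsed first, so `to_filter("country", ["US", "US"])` also yields a selector filter in this sketch.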
Adarsh Sanjeev e8ef31fe92
Fix condition for timeout in worker task launcher (#14270)
* Fix condition for timeout in worker task launcher
2023-05-16 08:30:00 +05:30
Victoria Lim 66d4ea014c
Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984)
2023-05-15 15:20:52 -07:00
Peter Marshall c4aa98953b
202304-docs-removeDF (#14132)
2023-05-15 15:08:57 -07:00
Paul Rogers 3c0983c8e9
Extend the IT framework to allow tests in extensions (#13877)
The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.
2023-05-15 20:29:51 +05:30
Adarsh Sanjeev 10bce22e68
Configure maxBytesPerWorker directly instead of using StageDefinition (#14257)
* Configure maxBytesPerWorker directly instead of using StageDefinition
2023-05-15 16:51:57 +05:30
AmatyaAvadhanula e9913abbbf
Add new lock types: APPEND and REPLACE (#14258)
* Add new lock types: APPEND and REPLACE
2023-05-14 22:38:32 -07:00
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
317brian 8bda7297e1
doc: fix unnest datasource syntax (#14272)
2023-05-12 13:05:27 -07:00
Tejaswini Bandlamudi 9e0708f5e6
update heap size of coordinator, overlord services in docker IT environment (#14214)
2023-05-12 23:19:48 +05:30
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
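The duty shape described in the commit above, a `run()` method plus a `Schedule` with an initial delay and a period, can be sketched as below. This is a Python approximation for illustration: the field names, the `LogCleanerDuty` class, and the `simulate` helper are assumptions of this example, and the real interface is Java code in Druid.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """When a duty first runs and how often it repeats (milliseconds)."""

    initial_delay_ms: int
    period_ms: int


class OverlordDuty(ABC):
    """A periodic task: one unit of work plus the schedule it runs on."""

    @abstractmethod
    def run(self) -> None: ...

    @abstractmethod
    def schedule(self) -> Schedule: ...


class LogCleanerDuty(OverlordDuty):
    """Hypothetical duty that only counts how often it ran."""

    def __init__(self):
        self.runs = 0

    def run(self) -> None:
        self.runs += 1

    def schedule(self) -> Schedule:
        return Schedule(initial_delay_ms=1000, period_ms=5000)


def simulate(duty, until_ms):
    """Invoke the duty at each scheduled time up to until_ms; return those times."""
    plan = duty.schedule()
    fire_times = list(range(plan.initial_delay_ms, until_ms + 1, plan.period_ms))
    for _ in fire_times:
        duty.run()
    return fire_times


cleaner = LogCleanerDuty()
fired_at = simulate(cleaner, until_ms=12000)
```

Keeping the schedule on the duty itself means a scheduler needs no per-duty configuration: it can ask each duty when to run it.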