druid

Commit Graph

Author	SHA1	Message	Date
Abhishek Radhakrishnan	5fd3e01ef0	More specific exclusions in the `examples` folder. (#14347 ) This PR changes how we skip java UT and ITs with changes in the examples folder. After this change, any Markdown files within the examples folder and jupyter-notebooks directory will be excluded. The rationale behind these more specific exclusions is that some ITs use json files checked in examples, so we want to trigger the full workflow for all other changes.	2023-05-30 12:01:45 +05:30
Kashif Faraz	d4cacebf79	Add tests for CostBalancerStrategy (#14230 ) Changes: - `CostBalancerStrategyTest` - Focus on verification of cost computations rather than choosing servers in this test - Add new tests `testComputeCost` and `testJointSegmentsCost` - Add tests to demonstrate that with a long enough interval gap, all costs become negligible - Retain `testIntervalCost` and `testIntervalCostAdditivity` - Remove redundant tests such as `testStrategyMultiThreaded`, `testStrategySingleThreaded` as verification of this behaviour is better suited to `BalancingStrategiesTest`. - `CostBalancerStrategyBenchmark` - Remove usage of static method from `CostBalancerStrategyTest` - Explicitly setup cluster and segments to use for benchmarking	2023-05-30 08:52:56 +05:30
Kashif Faraz	8091c6a547	Update default values in CoordinatorDynamicConfig (#14269 ) The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated. 1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100) 2. `replicationThrottleLimit = 500` (previous = 10) Rationale: With round-robin segment assignment now being the default assignment technique, the Coordinator can assign a large number of under-replicated/unavailable segments very quickly, without getting stuck in `RunRules` duty due to very slow strategy-based cost computations. 3. `maxSegmentsToMove = 100` (previous = 5) Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments to balance. A very large value can cause excessive moves, which has these disadvantages: - Load of moving segments competing with load of unavailable/under-replicated segments - Unnecessary network costs due to constant download and delete of segments These defaults will be revisited after #13197 is merged.	2023-05-30 08:51:33 +05:30
Tejaswini Bandlamudi	0e51c2702a	update operations per run (#14325 )	2023-05-29 14:05:11 +05:30
Tejaswini Bandlamudi	914c006b8e	increase middlemanager heap server size in tests (#14345 )	2023-05-29 10:45:34 +05:30
Alexander Saydakov	4131c0df13	use the latest datasketches-java-4.0.0 (#14334 ) * use the latest datasketches-java-4.0.0 * updated versions of datasketches * adjusted expectation * fixed the expectations --------- Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2023-05-27 22:19:18 -07:00
Karan Kumar	8d256e35b4	MSQ ignores tombstone segments for downloads. (#14342 )	2023-05-27 14:21:52 +05:30
Kashif Faraz	0cde3a8b52	Fix regression in batch segment allocation (#14337 ) * Improve batch segment allocation logs * Fix batch seg alloc regression * Fix logs * Fix logs * Fix tests and logs	2023-05-25 22:34:54 -07:00
Vadim Ogievetsky	1873fca6c7	Web console: update DQT to latest version and fix bigint crash (#14318 ) * update dqt * don't crash on bigint values * better submit experience * bump to an even version	2023-05-24 17:40:45 -07:00
Charles Smith	88831b1dd0	Docs: Updates docker compose to turn off kraft which causes errors (#14335 )	2023-05-24 09:33:32 -07:00
Clint Wylie	4096f51f0b	add configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319 ) This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior. This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.	2023-05-24 20:32:51 +05:30
Soumyava	22ba457d29	Expr getCacheKey now delegates to children (#14287 ) * Expr getCacheKey now delegates to children * Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it * Adding a unit test * Update processing/src/main/java/org/apache/druid/math/expr/Expr.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-05-23 14:49:38 -07:00
Abhishek Radhakrishnan	338bdb35ea	Return `RESOURCES` in `EXPLAIN PLAN` as an ordered collection (#14323 ) * Make resources an ordered collection so it's deterministic. * test cleanup * fixup docs. * Replace deprecated ObjectNode#put() calls with ObjectNode#set().	2023-05-23 00:55:00 -05:00
Abhishek Radhakrishnan	a5e04d95a4	Add `TYPE_NAME` to the complex serde classes and replace the hardcoded names. (#14317 ) * Add TYPE_NAME to the serde classes and reuse them instead of hardcoded strings. * Static check fixes.	2023-05-23 00:54:47 -05:00
Victoria Lim	6b3a6113c4	Doc: List supported values for Kafka `headerFormat` (#14316 )	2023-05-22 15:41:07 -07:00
Nhi Pham	3f6610aaf1	fixed wording in OSS query laning doc (#14324 ) Co-authored-by: Nhi Pham <nhipham@Nhi-Pham.local>	2023-05-22 11:58:17 -07:00
George Shiqi Wu	cb65135b99	Fix log streaming (#14285 ) * Fix log streaming * Add watch log * Add unit tests * long running client * singleton client * Remove accidental close	2023-05-22 11:19:53 -07:00
Tejaswini Bandlamudi	36a084e021	Fix GHA workflows naming & Run ITs if UTs fail on coverage (#14158 ) Currently, there is no way to run ITs if unit-tests fail on coverage. This PR allows Revised, Standard ITs to run even when unit-tests fail on coverage errors, still failing the workflow. This PR also fixes existing GHA workflow naming.	2023-05-22 11:44:34 +05:30
317brian	9faf9ecf20	docs: add line about write datasource perm for overlord api (#14114 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-05-19 14:56:24 -07:00
Katya Macedo	269137c682	Update Ingestion section (#14023 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2023-05-19 09:42:27 -07:00
Vadim Ogievetsky	7f66fd049b	don't show merged stats until needed (#14311 )	2023-05-18 20:32:58 -07:00
imply-cheddar	e9fed1445f	Revert PreResponseAuthorizationCheckFilter (#13813 ) Make it permissive like it used to be again so that we ensure that validation errors make it out.	2023-05-18 18:16:43 -07:00
George Shiqi Wu	51f722b7f1	Fix labels (#14282 ) * Fix labels * move to a util function * style * PR comments * rename class	2023-05-18 11:51:58 -07:00
Victoria Lim	058eb99a8b	Docs: Update Docker profile and fix method call in `druidapi` tutorial (#14308 )	2023-05-18 07:29:02 -07:00
Abhishek Radhakrishnan	c546df3866	Add `examples/` to CI UT/IT ignore (#14306 ) * Skip UT/IT on examples only changes.	2023-05-17 17:46:25 -07:00
Abhishek Radhakrishnan	7400ed3c93	Fixup data deletion tutorial docs (#14283 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-05-17 17:05:35 -07:00
Charles Smith	c84c174caa	update tutorials to clarify druid host location for Docker Compose + Druid version (#14295 )	2023-05-17 15:41:02 -07:00
Clint Wylie	cb10bb9783	add website to java ci ignore (#14303 )	2023-05-17 14:50:52 -07:00
Clint Wylie	26ff01a0fd	streamline release process docs (#14268 ) remove release:prepare without skipping tests because there is no good reason to run tests locally in this step in line with creating a tag.	2023-05-17 13:57:37 -07:00
Clint Wylie	1d1454b22c	update NOTICE year, update kafka notice in licenses.yaml (#14299 )	2023-05-17 04:32:19 -07:00
Clint Wylie	d92b9fbfac	more resilient segment metadata, don't parallel merge internal segment metadata queries (#14296 )	2023-05-17 04:12:55 -07:00
Vadim Ogievetsky	1dd20773ae	remove website node-scss dep (#14275 )	2023-05-17 04:10:46 -07:00
317brian	ceda1e98b9	docs: add docs for schema auto-discovery (#14065 ) * wip schemaless * wip * more cleanup * update tuningconfig example * updates based on feedback from clint * remove errant comma * update dimension object to include auto * update to include string schemaless way * fix spelling errors * updates for type-aware and string-based changes * Update docs/ingestion/schema-design.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update spelling file * Update docs/ingestion/schema-design.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * copyedits * fix anchor --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-05-17 01:36:02 -07:00
Clint Wylie	b038a11280	fix issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297 )	2023-05-17 01:33:44 -07:00
Tejaswini Bandlamudi	bbbb031057	Do not cancel old GHA workflows triggered on branch commits (#14279 ) * group and limit workflows only on PRs and not on branch commits * also apply to Static Checks CI	2023-05-16 12:13:08 +05:30
Soumyava	96a3c00754	Fixing an issue with filtering on a single dimension by converting In… (#14277 ) * Fixing an issue with filtering on a single dimension by converting In filter to a selector filter as needed with Filters.toFilter * Adding a test so that any future refactoring does not break this behavior * Made comment a bit more meaningful	2023-05-15 20:10:36 -07:00
Adarsh Sanjeev	e8ef31fe92	Fix condition for timeout in worker task launcher (#14270 ) * Fix condition for timeout in worker task launcher	2023-05-16 08:30:00 +05:30
Victoria Lim	66d4ea014c	Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984 )	2023-05-15 15:20:52 -07:00
Peter Marshall	c4aa98953b	202304-docs-removeDF (#14132 )	2023-05-15 15:08:57 -07:00
Paul Rogers	3c0983c8e9	Extend the IT framework to allow tests in extensions (#13877 ) The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.	2023-05-15 20:29:51 +05:30
Adarsh Sanjeev	10bce22e68	Configure maxBytesPerWorker directly instead of using StageDefinition (#14257 ) * Configure maxBytesPerWorker directly instead of using StageDefinition	2023-05-15 16:51:57 +05:30
AmatyaAvadhanula	e9913abbbf	Add new lock types: APPEND and REPLACE (#14258 ) * Add new lock types: APPEND and REPLACE	2023-05-14 22:38:32 -07:00
imply-cheddar	f9861808bc	Be able to load segments on Peons (#14239 ) * Be able to load segments on Peons This change introduces a new config on WorkerConfig that indicates how many bytes of each storage location to use for storage of a task. Said config is divided up amongst the locations and slots and then used to set TaskConfig.tmpStorageBytesPerTask. The Peons use their local task dir and tmpStorageBytesPerTask as their StorageLocations for the SegmentManager such that they can accept broadcast segments.	2023-05-12 16:51:00 -07:00
317brian	8bda7297e1	doc: fix unnest datasource syntax (#14272 )	2023-05-12 13:05:27 -07:00
Tejaswini Bandlamudi	9e0708f5e6	update heap size of coordinator, overlord services in docker IT environment (#14214 )	2023-05-12 23:19:48 +05:30
Kashif Faraz	ba11b3d462	Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235 ) Changes: - Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty` - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period. - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner` - Add utility class `Configs` - Update log, error messages and javadocs - Other minor style improvements	2023-05-12 22:39:56 +05:30
317brian	6254658f61	docs: fix links (#14111 )	2023-05-12 09:59:16 -07:00
Nicholas Lippis	58dcbf9399	queue tasks in kubernetes task runner if capacity is fully utilized (#14156 ) * queue tasks if all slots in use * Declare hamcrest-core dependency * Use AtomicBoolean for shutdown requested * Use AtomicReference for peon lifecycle state * fix uninitialized read error * fix indentations * Make tasks protected * fix KubernetesTaskRunnerConfig deserialization * ensure k8s task runner max capacity is Integer.MAX_VALUE * set job duration as task status duration * Address pr comments --------- Co-authored-by: George Shiqi Wu <george.wu@imply.io>	2023-05-12 09:41:44 -06:00
Abhishek Agarwal	9eebeead44	Tune stale bot to pick older issues first (#14267 )	2023-05-12 11:45:29 +05:30
Tejaswini Bandlamudi	8ef99f091a	Fix jdk setup in GHA (#14091 ) Instead of downloading jdk every time we run CI, we're using inbuilt temurin jdk distributions 8, 11, 17 by setting the JAVA_HOME variable. This is not working as expected since we were not setting this as a global environment variable; as a result, all CI builds are running on jdk11. This PR fixes the issue.	2023-05-12 10:36:59 +05:30

12767 Commits, All Branches
