648 Commits

Author SHA1 Message Date
Zoltan Haindrich
d75dcea4dc undo integration-tests module changes 2024-06-10 08:44:08 +00:00
Zoltan Haindrich
fb4b32e5b7 updates 2024-05-29 16:10:38 +00:00
Zoltan Haindrich
d1b4587eae uu 2024-05-28 16:38:47 +00:00
Zoltan Haindrich
ff7abeb0f1 small cleanup 2024-05-28 15:54:29 +00:00
Zoltan Haindrich
295c09a03c launcher-x 2024-05-28 15:44:22 +00:00
Zoltan Haindrich
ff31c14dba cleanup 2024-05-28 15:27:18 +00:00
Zoltan Haindrich
6c264f2977 move stuff 2024-05-28 15:22:42 +00:00
Zoltan Haindrich
ee195719b7 inline class 2024-05-28 15:17:07 +00:00
Zoltan Haindrich
a1b7f981fb cleanup 2024-05-28 15:09:42 +00:00
Zoltan Haindrich
b20ee99371 clean 2024-05-28 15:07:09 +00:00
Zoltan Haindrich
9ae80f05de Merge remote-tracking branch 'kgyrtkirk/quidem-runner-extension-submit' into quidem-record 2024-05-27 10:52:01 +00:00
Zoltan Haindrich
ec2ecde235 updates 2024-05-27 10:49:38 +00:00
Zoltan Haindrich
44ea4e1c51
Fix cds-coordinator-metadata-query-disabled (#16488)
Fixes the issue with the newly enabled `cds-coordinator-metadata-query-disabled` [split](https://github.com/apache/druid/pull/16468)
* uses the `prepopulated-data` environment configuration to set up `S3` access
* this is needed because these tests use a [dataset which is loaded from s3](https://github.com/apache/druid/blob/master/integration-tests/docker/test-data/cds-coordinator-metadata-query-disabled-sample-data.sql)
* also undoes the previous [fix](https://github.com/apache/druid/pull/16469) that set the AWS region explicitly; this is a more complete solution, and configuring `prepopulated-data` also sets the region, so that fix is no longer needed
2024-05-22 20:42:11 +02:00
Zoltan Haindrich
4595b0c128 update 2024-05-21 14:18:55 +00:00
Zoltan Haindrich
ba09e7d1de add 2024-05-21 14:00:02 +00:00
Zoltan Haindrich
08b73d1969 Revert "some stuff"
This reverts commit 52598d3bca3c63a9563dba2e5b2fa775cc2e9cbd.
2024-05-21 12:02:46 +00:00
Zoltan Haindrich
52598d3bca some stuff 2024-05-21 12:02:44 +00:00
Zoltan Haindrich
c948201507
Fix cds-task-schema-publish-disabled (#16469)
set AWS_REGION=us-west-2 to avoid retries
2024-05-21 12:18:30 +05:30
zachjsh
dd5dc500ce
Catalog integration tests (#16424)
* add new catalog IT with failure to ensure that it is run in CI

* actually add failing test referred to and fix checkstyle

* add some tests

* fix checkstyle

* add test descriptions

* add more tests
2024-05-17 11:49:09 -04:00
Zoltan Haindrich
e7e119b559 reduce copypaste 2024-05-16 13:33:27 +00:00
Zoltan Haindrich
fc9a6c7740 move/etc 2024-05-16 13:23:45 +00:00
Zoltan Haindrich
cabf2a31c3 fix 2024-05-16 13:19:30 +00:00
Zoltan Haindrich
1d2a79f5be cleanup 2024-05-16 13:01:53 +00:00
Zoltan Haindrich
1fb9fac159 remove cl 2024-05-16 12:59:20 +00:00
Zoltan Haindrich
76ffbfb7cf cl 2024-05-16 12:50:38 +00:00
Zoltan Haindrich
e2986ae612 cleanup 2024-05-16 12:49:10 +00:00
Zoltan Haindrich
bec1f38a0e move sqlmodule down 2024-05-16 11:17:05 +00:00
Zoltan Haindrich
93892b6524 undo some 2024-05-16 11:11:03 +00:00
Zoltan Haindrich
b63a80e5b7 passes basic test 2024-05-16 11:01:39 +00:00
Zoltan Haindrich
118eb61939 there - with 1 boot 2024-05-16 10:31:38 +00:00
Zoltan Haindrich
28ea884e19 almost ready? 2024-05-16 10:01:22 +00:00
Zoltan Haindrich
27735f2621 move disco 2024-05-16 09:50:10 +00:00
Zoltan Haindrich
cab3d945be up 2024-05-16 09:48:18 +00:00
Zoltan Haindrich
c9638b7836 update 2024-05-16 09:44:16 +00:00
Zoltan Haindrich
5f552a2997 c 2024-05-16 09:30:41 +00:00
Zoltan Haindrich
074161dfde add some service crap 2024-05-16 05:53:42 +00:00
Zoltan Haindrich
55b2051f9d working stuff 2024-05-15 16:23:11 +00:00
Zoltan Haindrich
8ee41f58d0 it does work 2024-05-15 15:14:43 +00:00
Zoltan Haindrich
d4b052a579 stuff 2024-05-15 11:57:13 +00:00
Zoltan Haindrich
73011267af triaks 2024-05-15 10:34:48 +00:00
Zoltan Haindrich
43fd8af63c Revert "add"
This reverts commit 3fbb3cb853456bebccfbf8fc16ba7f30a810c26c.
2024-05-14 09:39:04 +00:00
Zoltan Haindrich
3fbb3cb853 add 2024-05-14 09:39:02 +00:00
Akshat Jain
bacdb4c48d
Update integration tests related documentation for better clarity (#16313) 2024-05-13 11:27:21 +05:30
Alberic Liu
92fb0ff718
upgrade mysql:mysql-connector-java to 8.2.0 (#16024)
* upgrade mysql:mysql-connector-java to 8.2.0

* fix the check errors

* remove unused comment
2024-05-06 21:58:37 +08:00
Rishabh Singh
c61c3785a0
Followup changes to 15817 (Segment schema publishing and polling) (#16368)
* Fix build

* Nit changes in KillUnreferencedSegmentSchema

* Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache

* nitpicks

* Remove reference to smq abbreviation from integration-tests

* Remove reference to smq abbreviation from integration-tests

* minor change

* Update index.md

* Add delimiter while computing schema fingerprint hash
2024-05-03 19:13:52 +05:30
Kashif Faraz
e5b40b0b8c
Miscellaneous cleanup of load queue references (#16367)
Changes:
- Rename `DataSegmentChangeRequestAndStatus` to `DataSegmentChangeResponse`
- Rename `SegmentLoadDropHandler.Status` to `SegmentChangeStatus`
- Remove method `CoordinatorRunStats.getSnapshotAndReset()` as it was used only in
load queue peon implementations. Using an atomic reference is much simpler.
- Remove `ServerTestHelper.MAPPER`. Use existing `TestHelper.makeJsonMapper()` instead.
2024-05-02 15:59:50 +05:30
Gian Merlino
5d1950d451
MSQ controller: Support in-memory shuffles; towards JVM reuse. (#16168)
* MSQ controller: Support in-memory shuffles; towards JVM reuse.

This patch contains two controller changes that make progress towards a
lower-latency MSQ.

First, support for in-memory shuffles. The main feature of in-memory shuffles,
as far as the controller is concerned, is that they are not fully buffered. That
means that whenever a producer stage uses in-memory output, its consumer must run
concurrently. The controller determines which stages run concurrently, and when
they start and stop.

"Leapfrogging" allows any chain of sort-based stages to use in-memory shuffles
even if we can only run two stages at once. For example, in a linear chain of
stages 0 -> 1 -> 2 where all do sort-based shuffles, we can use in-memory shuffling
for each one while only running two at once. (When stage 1 is done reading input
and about to start writing its output, we can stop 0 and start 2.)

1) New OutputChannelMode enum attached to WorkOrders that tells workers
   whether stage output should be in memory (MEMORY), or use local or durable
   storage.

2) New logic in the ControllerQueryKernel to determine which stages can use
   in-memory shuffling (ControllerUtils#computeStageGroups) and to launch them
   at the appropriate time (ControllerQueryKernel#createNewKernels).

3) New "doneReadingInput" method on Controller (passed down to the stage kernels)
   which allows stages to transition to POST_READING even if they are not
   gathering statistics. This is important because it enables "leapfrogging"
   for HASH_LOCAL_SORT shuffles, and for GLOBAL_SORT shuffles with 1 partition.

4) Moved result-reading from ControllerContext#writeReports to new QueryListener
   interface, which ControllerImpl feeds results to row-by-row while the query
   is still running. Important so we can read query results from the final
   stage using an in-memory channel.

5) New class ControllerQueryKernelConfig holds configs that control kernel
   behavior (such as whether to pipeline, maximum number of concurrent stages,
   etc). Generated by the ControllerContext.

Second, a refactor towards running workers in persistent JVMs that are able to
cache data across queries. This is helpful because I believe we'll want to reuse
JVMs and cached data for latency reasons.

1) Move creation of WorkerManager and TableInputSpecSlicer to the
   ControllerContext, rather than ControllerImpl. This allows managing workers and
   work assignment differently when JVMs are reusable.

2) Lift the Controller Jersey resource out from ControllerChatHandler to a
   reusable resource.

3) Move memory introspection to a MemoryIntrospector interface, and introduce
   ControllerMemoryParameters that uses it. This makes it easier to run MSQ in
   process types other than Indexer and Peon.

Both of these areas will have follow-ups that make similar changes on the
worker side.

* Address static checks.

* Address static checks.

* Fixes.

* Report writer tests.

* Adjustments.

* Fix reports.

* Review updates.

* Adjust name.

* Small changes.
2024-04-30 21:30:27 -07:00
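The "leapfrogging" described in the MSQ controller commit above (#16168) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Druid's controller code: the names `leapfrog_schedule` and `max_concurrent` are hypothetical, the stage chain is assumed to be linear, and a producer stage is assumed safe to stop once its consumer has finished reading input.

```python
from typing import List, Tuple

# An event is ("start", stage_number) or ("stop", stage_number).
Event = Tuple[str, int]


def leapfrog_schedule(num_stages: int) -> List[Event]:
    """Order start/stop events for a linear chain 0 -> 1 -> ... so that
    at most two stages run at once (hypothetical helper for illustration)."""
    events: List[Event] = []
    running: List[int] = []  # stages currently running, oldest first
    for stage in range(num_stages):
        if len(running) == 2:
            # Assumption: the newer running stage is done reading input,
            # so its producer (the oldest running stage) can be stopped.
            events.append(("stop", running.pop(0)))
        events.append(("start", stage))
        running.append(stage)
    # Once the chain is exhausted, stop whatever is still running.
    for stage in running:
        events.append(("stop", stage))
    return events


def max_concurrent(events: List[Event]) -> int:
    """Return the peak number of simultaneously running stages."""
    running = 0
    peak = 0
    for kind, _stage in events:
        running += 1 if kind == "start" else -1
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    for event in leapfrog_schedule(3):
        print(event)
```

For the three-stage chain from the commit message, the schedule starts stages 0 and 1, then stops 0 before starting 2, which matches the "stop 0 and start 2" step described there.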
Adarsh Sanjeev
9a2d7c28bc
Prepare master branch for 31.0.0 release (#16333) 2024-04-26 09:22:43 +05:30
Rishabh Singh
e30790e013
Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817)
Issue: #14989

The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information.

This is the final change, which involves publishing segment schema for finalized segments from tasks and periodically polling them in the Coordinator.
2024-04-24 22:22:53 +05:30
Laksh Singla
b9bbde5c0a
Fix deadlock that can occur while merging group by results (#15420)
This PR prevents the deadlock by acquiring the merge buffers in a single place and passing them down to the runners that might need them.
2024-04-22 14:10:44 +05:30
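The idea behind the deadlock fix above (#15420), acquiring all merge buffers in one place and handing them down, can be sketched as follows. This is an illustrative model only, not the Druid implementation: `BufferPool`, `take_batch`, `merge_level`, and `run_query` are hypothetical names, and buffers are represented by plain integers.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, List


class BufferPool:
    """A fixed pool of buffer ids (hypothetical stand-in for merge buffers)."""

    def __init__(self, size: int) -> None:
        self._free: List[int] = list(range(size))
        self._cond = threading.Condition()

    @contextmanager
    def take_batch(self, count: int, timeout: float = 1.0) -> Iterator[List[int]]:
        # Take every buffer the query needs in one atomic step, so a query
        # never holds some buffers while waiting for more.
        with self._cond:
            enough = self._cond.wait_for(lambda: len(self._free) >= count, timeout)
            if not enough:
                raise TimeoutError(f"could not acquire {count} buffers")
            batch = [self._free.pop() for _ in range(count)]
        try:
            yield batch
        finally:
            with self._cond:
                self._free.extend(batch)
                self._cond.notify_all()

    def available(self) -> int:
        with self._cond:
            return len(self._free)


def merge_level(level: int, buffer_id: int) -> str:
    # A nested runner receives its buffer as an argument and never
    # touches the pool itself.
    return f"level {level} merged with buffer {buffer_id}"


def run_query(pool: BufferPool, levels: int, timeout: float = 1.0) -> List[str]:
    # The single acquisition point: all buffers are taken here and passed down.
    with pool.take_batch(levels, timeout) as buffers:
        return [merge_level(i, buf) for i, buf in enumerate(buffers)]
```

Because nested levels never acquire buffers on their own, two concurrent queries cannot each hold part of the pool while waiting on the other; a query either gets everything it needs or waits (here, times out) holding nothing.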