Commit Graph

12 Commits

Author SHA1 Message Date
Rishabh Singh c61c3785a0
Followup changes to 15817 (Segment schema publishing and polling) (#16368)
* Fix build

* Nit changes in KillUnreferencedSegmentSchema

* Replace reference to the abbreviation SMQ with Metadata Query, rename inTransit maps in schema cache

* nitpicks

* Remove reference to smq abbreviation from integration-tests

* Remove reference to smq abbreviation from integration-tests

* minor change

* Update index.md

* Add delimiter while computing schema fingerprint hash
2024-05-03 19:13:52 +05:30
Rishabh Singh e30790e013
Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817)
Issue: #14989

The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information.

This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.
2024-04-24 22:22:53 +05:30
Rishabh Singh d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Rishabh Singh 8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Jihoon Son 4a74c5adcc
Use Druid's extension loading for integration test instead of maven (#12095)
* Use Druid's extension loading for integration test instead of maven

* fix maven command

* override config path

* load input format extensions and kafka by default; add prepopulated-data group

* all docker-composes are overridable

* fix s3 configs

* override config for all

* fix docker_compose_args

* fix security tests

* turn off debug logs for overlord api calls

* clean up stuff

* revert docker-compose.yml

* fix override config for query error test; fix circular dependency in docker compose

* add back some dependencies in docker compose

* new maven profile for integration test

* example file filter
2022-01-05 23:33:04 -08:00
Maytas Monsereenusorn ce4dd48bb8
Support custom coordinator duties (#11601)
* impl

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add test

* add test

* add test

* add integration tests

* add integration tests

* add more docs

* address comments

* address comments

* address comments

* add test

* fix checkstyle

* fix test
2021-08-19 11:54:11 +07:00
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
Jihoon Son a6a2758095
More unit tests for JsonParserIterator; Integration tests for query errors (#11091)
* unit tests for timeout exception in init

* integration tests

* run integraion test on travis

* fix inspection
2021-04-12 15:08:50 -07:00
zhangyue19921010 de691808ce
[Bug]Kinesis-data-format IT can not work (#11071)
* start schema-resgity and replace json template

* add docs

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-04-08 15:50:04 -07:00
Clint Wylie 96889cdebc
add avro + kafka + schema registry integration test (#10929)
* add avro + schema registry integration test

* style

* retry init

* maybe this

* oops heh

* this will fix it

* review stuffs

* fix comment
2021-03-08 08:12:12 -08:00
zachjsh 553f5c8570
Ldap integration tests (#10901)
* Add integration tests for ldap extension

* * refactor

* * add ldap-security integration test to travis

* * fix license error

* * Fix failing other integration test

* * break up large tests
* refactor
* address review comments

* * fix intellij inspections failure

* * remove dead code
2021-02-23 13:29:57 -08:00
Clint Wylie da0eabaa01
integration test for coordinator and overlord leadership client (#10680)
* integration test for coordinator and overlord leadership, added sys.servers is_leader column

* docs

* remove not needed

* fix comments

* fix compile heh

* oof

* revert unintended

* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster

* fixes

* style

* dang

* heh

* scripts are hard

* fix spelling

* fix thing that must not matter since was already wrong ip, log when test fails

* needs more heap

* fix merge

* less aggro
2020-12-17 22:50:12 -08:00