* Use Druid's extension loading for integration test instead of maven
* fix maven command
* override config path
* load input format extensions and kafka by default; add prepopulated-data group
* all docker-composes are overridable
* fix s3 configs
* override config for all
* fix docker_compose_args
* fix security tests
* turn off debug logs for overlord api calls
* clean up stuff
* revert docker-compose.yml
* fix override config for query error test; fix circular dependency in docker compose
* add back some dependencies in docker compose
* new maven profile for integration test
* example file filter
* Support routing data through an HTTP proxy
* Support routing data through an HTTP proxy
This adds the ability for the HttpClient to connect through an HTTP proxy. We
augment the channel factory to check if it is supposed to be proxied and, if so,
we connect to the proxy host first, issue a CONNECT command through to the final
recipient host and *then* give the channel to the normal http client for usage.
* add docs
* address comments
Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager.
SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects.
SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.
* Doc update for new input source and input format.
- The input source and input format are promoted in all docs under docs/ingestion
- All input sources including core extension ones are located in docs/ingestion/native-batch.md
- All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md
- New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md
* parquet
* add warning for range partitioning with sequential mode
* hdfs + s3, gs
* add fs impl for gs
* address comments
* address comments
* gcs