druid/docs
Didip Kerabat 6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, you sometimes have no control over the input folder layout: upstream systems can write to whatever folders they want. In that situation, S3InputSource.java is unusable.

Most people, myself included, have worked around this by using Airflow to fetch the full list of Parquet files and pass it to Druid. But doing that explodes the JSON spec: in one case a single JSON spec grew to 16 MB, which is simply too much for the Overlord.

This patch allows users to pass {"filter": "*.parquet"} and let Druid perform the filtering of the input files.
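As a rough illustration, the new property might sit in an S3 input source spec like this (the bucket and prefix below are placeholders; only the "filter" key comes from this patch):

```json
{
  "inputSource": {
    "type": "s3",
    "prefixes": ["s3://example-bucket/upstream-folder/"],
    "filter": "*.parquet"
  }
}
```

With a spec like this, the folder listing stays server-side and the spec no longer needs to enumerate every file.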

I am using glob notation to be consistent with the LocalFirehose syntax.
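The intended matching behavior can be sketched with Python's fnmatch as a stand-in for glob matching (the object keys below are hypothetical; this is an illustration of the filtering idea, not Druid's actual implementation):

```python
from fnmatch import fnmatch

# Hypothetical object keys listed under an S3 prefix.
keys = [
    "data/2022-06-01/part-0000.parquet",
    "data/2022-06-01/_SUCCESS",
    "data/2022-06-01/part-0001.parquet",
    "data/2022-06-01/notes.txt",
]

# Keep only objects whose filename matches the glob, mirroring
# a {"filter": "*.parquet"} setting on the input source.
pattern = "*.parquet"
matched = [k for k in keys if fnmatch(k.rsplit("/", 1)[-1], pattern)]
print(matched)
```

Only the two `.parquet` objects survive the filter; bookkeeping files such as `_SUCCESS` are dropped before ingestion.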
2022-06-24 11:40:08 +05:30
_bin           De-incubation cleanup in code, docs, packaging (#9108)     2020-01-03 12:33:19 -05:00
assets         Update screenshots for Druid console doc (#12593)          2022-06-15 16:42:20 -07:00
comparisons    Update druid-vs-kudu.md (#11470)                           2021-07-21 22:58:14 +08:00
configuration  Disable autokill of segments by default. (#12693)          2022-06-23 17:17:11 -07:00
dependencies   Doc updates for metadata cleanup and storage (#12190)      2022-01-27 11:40:54 -08:00
design         Segments doc update (#12344)                               2022-06-16 13:25:17 -07:00
development    Update screenshots for Druid console doc (#12593)          2022-06-15 16:42:20 -07:00
ingestion      Able to filter Cloud objects with glob notation. (#12659)  2022-06-24 11:40:08 +05:30
misc           Docs – expressions link back and timestamp hint (#11674)   2022-03-29 09:12:30 -07:00
operations     Update screenshots for Druid console doc (#12593)          2022-06-15 16:42:20 -07:00
querying       Add TIME_IN_INTERVAL SQL operator. (#12662)                2022-06-21 13:05:37 -07:00
tutorials      Add TIME_IN_INTERVAL SQL operator. (#12662)                2022-06-21 13:05:37 -07:00