druid/docs/ingestion
Didip Kerabat 6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, you sometimes have no control over the input folder: upstream can write to any folder it wants. In this situation, S3InputSource.java is unusable.

Most people, like me, worked around this by using Airflow to fetch the full list of Parquet files and pass it to Druid. But doing this bloats the JSON spec. In one case, a single JSON spec was 16 MB, which is simply too much for the Overlord.

This patch allows users to pass {"filter": "*.parquet"} and lets Druid perform the filtering of the input files.

I am using glob notation to be consistent with the LocalFirehose syntax.
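
To illustrate what a glob filter such as "*.parquet" does to a listing of cloud objects, here is a minimal sketch in Python. It is not Druid's implementation: the `filter_keys` helper and the object keys are hypothetical, and it assumes the pattern is matched against the full object key using Python's standard `fnmatch` rules, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase


def filter_keys(keys, pattern):
    """Return the object keys that match the glob pattern (case-sensitive)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Hypothetical listing of objects under an upstream-controlled prefix.
keys = [
    "upstream/2022-06-24/part-0000.parquet",
    "upstream/2022-06-24/_SUCCESS",
    "upstream/tmp/part-0001.parquet.crc",
    "upstream/other/part-0002.parquet",
]

# Only keys ending in ".parquet" survive; markers and checksum files are dropped.
selected = filter_keys(keys, "*.parquet")
print(selected)
```

Filtering this way keeps the spec small: it carries one prefix and one pattern instead of an explicit list of every file.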
2022-06-24 11:40:08 +05:30
automatic-compaction.md Docs for automatic compaction (#12569) 2022-06-09 14:55:12 -07:00
compaction.md Segments doc update (#12344) 2022-06-16 13:25:17 -07:00
data-formats.md Document data format and example for featureSpec (#12394) 2022-04-06 15:17:15 -07:00
data-management.md Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983) 2021-12-03 16:37:14 +05:30
data-model.md Docs refactor of ingestion. Carries #11541 (#11576) 2021-08-13 08:42:03 -07:00
faq.md Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983) 2021-12-03 16:37:14 +05:30
hadoop.md clarify hadoop input paths (#11781) 2021-10-07 20:22:51 -07:00
index.md Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983) 2021-12-03 16:37:14 +05:30
ingestion-spec.md remove arbitrary granularity spec from docs (#12460) 2022-04-28 16:36:54 -07:00
native-batch-firehose.md Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983) 2021-12-03 16:37:14 +05:30
native-batch-input-source.md Able to filter Cloud objects with glob notation. (#12659) 2022-06-24 11:40:08 +05:30
native-batch-simple-task.md Document config for ingesting null columns (#12389) 2022-04-05 09:15:42 -07:00
native-batch.md Improved docs for range partitioning. (#12350) 2022-05-16 09:42:31 -07:00
partitioning.md Docs refactor of ingestion. Carries #11541 (#11576) 2021-08-13 08:42:03 -07:00
rollup.md Docs: Fix column name in ingestion rollup doc (#12036) 2022-05-10 17:35:59 +05:30
schema-design.md Refactor SQL docs (#12239) 2022-02-11 14:43:30 -08:00
standalone-realtime.md Reduce visibility of Tranquility documentation (#11134) 2021-05-03 16:48:24 -07:00
tasks.md Docs for automatic compaction (#12569) 2022-06-09 14:55:12 -07:00
tranquility.md Reduce visibility of Tranquility documentation (#11134) 2021-05-03 16:48:24 -07:00