mirror of https://github.com/apache/druid.git
Commit 6ddb828c7a
In a heterogeneous environment, you sometimes don't have control over the input folder; upstream can place files in any folder it wants. In this situation S3InputSource.java is unusable. Many users, myself included, worked around it by using Airflow to fetch the full list of Parquet files and pass it to Druid, but doing so explodes the JSON spec. We had a situation where a single JSON spec was 16 MB, which is simply too much for the Overlord. This patch allows users to pass {"filter": "*.parquet"} and let Druid perform the filtering of the input files. I am using glob notation to be consistent with the LocalFirehose syntax.
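For illustration, here is a minimal sketch of how the filter might appear inside a parallel-index ingestion spec. Only the "filter": "*.parquet" property comes from this patch; the bucket name, prefix, and surrounding spec fields are hypothetical placeholders:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "prefixes": ["s3://example-bucket/upstream-drop/"],
        "filter": "*.parquet"
      },
      "inputFormat": {
        "type": "parquet"
      }
    }
  }
}
```

With the filter in place, Druid lists the objects under the given prefixes and ingests only the files whose names match the glob, so the spec no longer has to enumerate every file individually.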
aliyun-oss-extensions
ambari-metrics-emitter
cassandra-storage
cloudfiles-extensions
distinctcount
dropwizard-emitter
gce-extensions
graphite-emitter
influx-extensions
influxdb-emitter
kafka-emitter
materialized-view-maintenance
materialized-view-selection
momentsketch
moving-average-query
opentelemetry-emitter
opentsdb-emitter
prometheus-emitter
redis-cache
sqlserver-metadata-storage
statsd-emitter
tdigestsketch
thrift-extensions
time-min-max
virtual-columns
README.md
README.md
Community Extensions
Please contribute all community extensions in this directory and include a doc describing how your extension can be used under docs/development/extensions-contrib/.
Please note that community extensions are maintained by their original contributors and are not packaged with the core Druid distribution. If you'd like to take on maintenance for a community extension, please post on dev@druid.apache.org to let us know!