druid/extensions-contrib
Gian Merlino 319f99db05
Always use file sizes when determining batch ingest splits (#13955)
* Always use file sizes when determining batch ingest splits.

Main changes:

1) Update CloudObjectInputSource and its subclasses (S3, GCS,
   Azure, Aliyun OSS) to use SplitHintSpecs in all cases. Previously, they
   were only used for prefixes, not uris or objects.

2) Update ExternalInputSpecSlicer (MSQ) to consider file size. Previously,
   file size was ignored; all files were treated as equal weight when
   determining splits.
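The difference between count-based and size-based splitting can be shown with a short sketch. This is a minimal Python illustration under stated assumptions, not Druid's Java implementation: the functions `split_by_count` and `split_by_size` and the parameters `max_split_size` and `max_num_files` are hypothetical, chosen only to resemble the limits a size-based split hint applies.

```python
from typing import List, Tuple

# Hypothetical representation of an input file: (path, size_in_bytes).
FileInfo = Tuple[str, int]


def split_by_count(files: List[FileInfo], files_per_split: int) -> List[List[FileInfo]]:
    """Count-based grouping (sketch): every file has equal weight, regardless of size."""
    return [files[i:i + files_per_split] for i in range(0, len(files), files_per_split)]


def split_by_size(
    files: List[FileInfo], max_split_size: int, max_num_files: int
) -> List[List[FileInfo]]:
    """Size-based grouping (sketch): greedily pack files in order, closing the
    current split when the next file would exceed max_split_size bytes or the
    split already holds max_num_files files. A single file larger than
    max_split_size still gets a split of its own."""
    splits: List[List[FileInfo]] = []
    current: List[FileInfo] = []
    current_size = 0
    for path, size in files:
        if current and (
            current_size + size > max_split_size or len(current) >= max_num_files
        ):
            splits.append(current)
            current, current_size = [], 0
        current.append((path, size))
        current_size += size
    if current:
        splits.append(current)
    return splits


files = [("a", 900), ("b", 10), ("c", 10), ("d", 10), ("e", 800)]
print(split_by_count(files, 2))
print(split_by_size(files, max_split_size=1000, max_num_files=10))
```

With these illustrative inputs, count-based grouping produces splits of 910, 20, and 800 bytes, while size-based packing produces splits of 930 and 800 bytes, which is why knowing file sizes matters for balanced splits.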

A side effect of these changes is that we'll make additional network
calls to find the sizes of objects when users specify URIs or objects
as opposed to prefixes. IMO, this is worth it because it's the only way
to respect the user's split hint and task assignment settings.

Secondary changes:

1) S3, Aliyun OSS: Use getObjectMetadata instead of listObjects to get
   metadata for a single object. This is a simpler call that is also
   expected to be less expensive.

2) Azure: Fix a bug where getBlobLength did not populate blob
   reference attributes, and therefore would not actually retrieve the
   blob length.

3) MSQ: Align dynamic slicing logic between ExternalInputSpecSlicer and
   TableInputSpecSlicer.

4) MSQ: Adjust WorkerInputs to ensure there is always at least one
   worker, even if it has a nil slice.
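Items 3 and 4 can be illustrated together with a sketch of size-aware dynamic slicing that never yields zero workers. This is a hypothetical Python model, not the actual MSQ slicer: `assign_to_workers` and its limits (`max_workers`, `max_bytes_per_worker`, `max_files_per_worker`) are assumptions, and the largest-first, least-loaded assignment is a standard greedy heuristic chosen here for illustration. An empty list stands in for a worker with a nil slice.

```python
import heapq
import math
from typing import List, Tuple

# Hypothetical representation of an input file: (path, size_in_bytes).
FileInfo = Tuple[str, int]


def assign_to_workers(
    files: List[FileInfo],
    max_workers: int,
    max_bytes_per_worker: int,
    max_files_per_worker: int,
) -> List[List[FileInfo]]:
    """Sketch: pick a worker count from total bytes and file count, then
    balance bytes across workers. Assumes both per-worker limits are positive.
    Always returns at least one slice, possibly empty."""
    total_bytes = sum(size for _, size in files)
    needed = max(
        math.ceil(total_bytes / max_bytes_per_worker),
        math.ceil(len(files) / max_files_per_worker),
    )
    # Clamp to [1, max_workers]; never zero workers, even with no input files.
    num_workers = max(1, min(needed, max_workers))

    slices: List[List[FileInfo]] = [[] for _ in range(num_workers)]
    # Min-heap of (bytes_assigned, worker_index); popping yields the least-loaded worker.
    heap = [(0, i) for i in range(num_workers)]
    # Assign largest files first so byte totals stay roughly balanced.
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        load, idx = heapq.heappop(heap)
        slices[idx].append((path, size))
        heapq.heappush(heap, (load + size, idx))
    return slices
```

For example, files of 900, 10, 10, 10, and 800 bytes with a 1000-byte per-worker limit produce two workers carrying 900 and 830 bytes, and an empty file list still yields one worker with an empty slice.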

* Add msqCompatible to testGroupByWithImpossibleTimeFilter.

* Fix tests.

* Add additional tests.

* Remove unused stuff.

* Remove more unused stuff.

* Adjust thresholds.

* Remove irrelevant test.

* Fix comments.

* Fix bug.

* Updates.
2023-04-05 08:54:01 -07:00
| Name | Last commit | Date |
|---|---|---|
| aliyun-oss-extensions | Always use file sizes when determining batch ingest splits (#13955) | 2023-04-05 08:54:01 -07:00 |
| ambari-metrics-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| cassandra-storage | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| cloudfiles-extensions | Removes FiniteFirehoseFactory and its implementations (#12852) | 2023-03-02 18:07:17 +05:30 |
| compressed-bigdecimal | Avoid creating new RelDataTypeFactory during SQL planning. (#13904) | 2023-03-08 21:55:49 -08:00 |
| distinctcount | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| dropwizard-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| gce-extensions | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| graphite-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| influx-extensions | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| influxdb-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| kafka-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| kubernetes-overlord-extensions | Fix issues with null pointers on jobResponse (#14010) | 2023-04-04 17:48:18 -07:00 |
| materialized-view-maintenance | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| materialized-view-selection | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| momentsketch | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| moving-average-query | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| opentelemetry-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| opentsdb-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| prometheus-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| redis-cache | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| sqlserver-metadata-storage | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| statsd-emitter | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| tdigestsketch | Avoid creating new RelDataTypeFactory during SQL planning. (#13904) | 2023-03-08 21:55:49 -08:00 |
| thrift-extensions | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| time-min-max | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| virtual-columns | merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698) | 2023-02-17 14:27:41 -08:00 |
| README.md | fix broken links (#9537) | 2020-03-22 17:41:18 -07:00 |

Community Extensions

Please contribute all community extensions to this directory, and include documentation describing how to use your extension under docs/development/extensions-contrib/.

Please note that community extensions are maintained by their original contributors and are not packaged with the core Druid distribution. If you'd like to take on maintenance for a community extension, please post on dev@druid.apache.org to let us know!