druid/extensions.md at 9a10f8352bd0c8c0fd5199226a4ff3a4cf2e6d46


id	title
extensions	Extensions

Druid implements an extension system that allows for adding functionality at runtime. Extensions are commonly used to add support for deep storages (like HDFS and S3), metadata stores (like MySQL and PostgreSQL), new aggregators, new input formats, and so on.

Production clusters generally use at least two extensions: one for deep storage and one for a metadata store. Many clusters also use additional extensions.

Core extensions

Core extensions are maintained by Druid committers.

Name	Description	Docs
druid-avro-extensions	Support for data in Apache Avro data format.	link
druid-azure-extensions	Microsoft Azure deep storage.	link
druid-basic-security	Support for Basic HTTP authentication and role-based access control.	link
druid-bloom-filter	Support for providing Bloom filters in Druid queries.	link
druid-datasketches	Support for approximate counts and set operations with DataSketches.	link
druid-google-extensions	Google Cloud Storage deep storage.	link
druid-hdfs-storage	HDFS deep storage.	link
druid-histogram	Approximate histograms and quantiles aggregator. Deprecated; use the DataSketches quantiles aggregator from the `druid-datasketches` extension instead.	link
druid-kafka-extraction-namespace	Kafka-based namespaced lookup. Requires namespace lookup extension.	link
druid-kafka-indexing-service	Supervised exactly-once Kafka ingestion for the indexing service.	link
druid-kinesis-indexing-service	Supervised exactly-once Kinesis ingestion for the indexing service.	link
druid-kerberos	Kerberos authentication for Druid processes.	link
druid-lookups-cached-global	A module providing JVM-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.	link
druid-lookups-cached-single	Per-lookup caching module for use cases where a lookup needs to be isolated from the global pool of lookups.	link
druid-orc-extensions	Support for data in Apache ORC data format.	link
druid-parquet-extensions	Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.	link
druid-protobuf-extensions	Support for data in Protobuf data format.	link
druid-ranger-security	Support for access control through Apache Ranger.	link
druid-s3-extensions	Interfacing with data in AWS S3, and using S3 as deep storage.	link
druid-ec2-extensions	Interfacing with AWS EC2 for autoscaling middle managers.	UNDOCUMENTED
druid-stats	Statistics related module including variance and standard deviation.	link
mysql-metadata-storage	MySQL metadata store.	link
postgresql-metadata-storage	PostgreSQL metadata store.	link
simple-client-sslcontext	Simple SSLContext provider module to be used by Druid's internal HttpClient when talking to other Druid processes over HTTPS.	link
druid-pac4j	OpenID Connect authentication for Druid processes.	link

Community extensions

Community extensions are not maintained by Druid committers, although we accept patches from community members using these extensions. They may not have been as extensively tested as the core extensions.

A number of community members have contributed their own extensions to Druid that are not packaged with the default Druid tarball. If you'd like to take on maintenance for a community extension, please post on dev@druid.apache.org to let us know!

All of these community extensions can be downloaded using pull-deps while specifying a -c coordinate option to pull org.apache.druid.extensions.contrib:{EXTENSION_NAME}:{DRUID_VERSION}.

Name	Description	Docs
ambari-metrics-emitter	Ambari Metrics Emitter	link
druid-cassandra-storage	Apache Cassandra deep storage.	link
druid-cloudfiles-extensions	Rackspace Cloudfiles deep storage and firehose.	link
druid-distinctcount	DistinctCount aggregator	link
druid-redis-cache	A cache implementation for Druid based on Redis.	link
druid-time-min-max	Min/Max aggregator for timestamp.	link
sqlserver-metadata-storage	Microsoft SQL Server metadata store.	link
graphite-emitter	Graphite metrics emitter	link
statsd-emitter	StatsD metrics emitter	link
kafka-emitter	Kafka metrics emitter	link
druid-thrift-extensions	Support for Thrift ingestion.	link
druid-opentsdb-emitter	OpenTSDB metrics emitter	link
materialized-view-selection, materialized-view-maintenance	Materialized View	link
druid-moving-average-query	Support for Moving Average and other Aggregate Window Functions in Druid queries.	link
druid-influxdb-emitter	InfluxDB metrics emitter	link
druid-momentsketch	Support for approximate quantile queries using the momentsketch library	link
druid-tdigestsketch	Support for approximate sketch aggregators based on T-Digest	link
gce-extensions	GCE Extensions	link

Promoting community extensions to core extensions

Please post on dev@druid.apache.org if you'd like an extension to be promoted to core. If we see a community extension actively supported by the community, we can promote it to core based on community feedback.

For information on how to create your own extension, please see here.

Loading extensions

Loading core extensions

Apache Druid bundles all core extensions out of the box. See the list of extensions for your options. You can load bundled extensions by adding their names to the druid.extensions.loadList property in your common.runtime.properties file. For example, to load the postgresql-metadata-storage and druid-hdfs-storage extensions, use the configuration:

druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]

These extensions are located in the extensions directory of the distribution.
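As a rough sanity check, the load list can be compared against the contents of the extensions directory. The sketch below is not part of Druid; the function names are hypothetical, and it assumes that each extension lives in a subdirectory of the extensions directory named after the extension and that the load list is written on a single line, as in the example above.

```shell
# Sketch only: these helpers are hypothetical and not shipped with Druid.
# Assumption: druid.extensions.loadList is a one-line JSON array of names.

# Print each extension name in druid.extensions.loadList, one per line.
list_loaded_extensions() {
  local props_file="$1"
  grep -E '^druid\.extensions\.loadList=' "$props_file" \
    | tail -n 1 \
    | sed -e 's/^[^=]*=//' -e 's/[][" ]//g' \
    | tr ',' '\n' \
    | sed '/^$/d'
}

# Print the listed extensions that have no matching subdirectory.
# Assumption: each extension is installed as <extensions dir>/<name>/.
missing_extensions() {
  local props_file="$1" ext_dir="$2" name
  list_loaded_extensions "$props_file" | while IFS= read -r name; do
    [ -d "$ext_dir/$name" ] || printf '%s\n' "$name"
  done
}

# Example (paths are placeholders for your own setup):
#   missing_extensions path/to/common.runtime.properties extensions
```

An empty result suggests that every listed extension has a directory to load from; any name printed is worth checking for a typo or a missing install.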

Druid bundles two sets of configurations: one for the quickstart and one for a clustered configuration. Make sure you are updating the correct common.runtime.properties for your setup.

Because of licensing, the mysql-metadata-storage extension does not include the required MySQL JDBC driver. For instructions on how to install this library, see the MySQL extension page.

Loading community extensions

You can also load community and third-party extensions not already bundled with Druid. To do this, first download the extension and then install it into your extensions directory. You can download extensions from their distributors directly, or if they are available from Maven, the included pull-deps can download them for you. To use pull-deps, specify the full Maven coordinate of the extension in the form groupId:artifactId:version. For example, for the (hypothetical) extension com.example:druid-example-extension:1.0.0, run:

java \
  -cp "lib/*" \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" \
  org.apache.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -c "com.example:druid-example-extension:1.0.0"

You only have to install the extension once. Then, add "druid-example-extension" to druid.extensions.loadList in common.runtime.properties to instruct Druid to load the extension.

Please make sure all the extension-related configuration properties listed here are set correctly.

The Maven groupId for almost every community extension is org.apache.druid.extensions.contrib. The artifactId is the name of the extension, and the version is the latest Druid stable version.
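To make the coordinate rule concrete, here is a minimal sketch that assembles the coordinate for a community extension and prints the matching pull-deps invocation, reusing the flags from the example above. The function names and the DRUID_VERSION variable are hypothetical helpers, not part of Druid, and the sketch assumes the extension really uses the org.apache.druid.extensions.contrib groupId, which holds for most but not all community extensions.

```shell
# Sketch only: helper names are hypothetical and not part of Druid.

# Build the Maven coordinate for a community extension.
# Assumption: the extension uses the usual contrib groupId.
contrib_coordinate() {
  local extension_name="$1" druid_version="$2"
  printf 'org.apache.druid.extensions.contrib:%s:%s\n' \
    "$extension_name" "$druid_version"
}

# Print (without running) the pull-deps command for that coordinate.
print_pull_deps_command() {
  local coordinate
  coordinate="$(contrib_coordinate "$1" "$2")"
  cat <<EOF
java \\
  -cp "lib/*" \\
  -Ddruid.extensions.directory="extensions" \\
  -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" \\
  org.apache.druid.cli.Main tools pull-deps \\
  --no-default-hadoop \\
  -c "$coordinate"
EOF
}

# Example: set DRUID_VERSION to the version of your installation, then
#   print_pull_deps_command druid-redis-cache "$DRUID_VERSION"
```

Printing the command instead of executing it lets you review the coordinate before running it from the root of the Druid distribution.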

Loading extensions from the classpath

If you add your extension jar to the classpath at runtime, Druid will also load it into the system. This mechanism is relatively easy to reason about, but Druid does not maintain class loader isolation for extensions loaded this way, so you must make sure that all jars on your classpath, including dependency jars, are mutually compatible.
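One way to spot likely incompatibilities before starting Druid is to look for the same artifact appearing with more than one version among the jars you plan to put on the classpath. The sketch below is a heuristic and not a Druid tool: the function name is hypothetical, and it assumes jar files are named artifact-version.jar with the version starting with a digit.

```shell
# Sketch only: a heuristic check, not part of Druid.
# Assumption: jars are named <artifact>-<version>.jar and the version
# starts with a digit (for example guava-16.0.1.jar).

# Print artifact names that occur with more than one version
# among the jar paths given as arguments.
find_conflicting_jars() {
  local jar base
  for jar in "$@"; do
    base="$(basename "$jar" .jar)"
    # Turn "artifact-1.2.3" into "artifact 1.2.3".
    printf '%s\n' "$base" | sed -E 's/^(.*)-([0-9].*)$/\1 \2/'
  done \
    | sort -u \
    | awk '{ count[$1]++ } END { for (n in count) if (count[n] > 1) print n }' \
    | sort
}

# Example (directories are placeholders for your own layout):
#   find_conflicting_jars lib/*.jar path/to/my-extension/*.jar
```

A clean result does not prove the jars are compatible; it only rules out the most common case of two versions of one library sharing a class loader.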
