mirror of https://github.com/apache/druid.git synced 2025-02-06 18:18:17 +00:00

Joshua Sun 7c7997e8a1 Add Kinesis Indexing Service to core Druid (#6431 )


2018-12-21 12:49:24 -07:00

7.8 KiB


layout	title
doc_page	Druid extensions

Druid extensions

Druid implements an extension system that allows for adding functionality at runtime. Extensions are commonly used to add support for deep storages (like HDFS and S3), metadata stores (like MySQL and PostgreSQL), new aggregators, new input formats, and so on.

Production clusters will generally use at least two extensions: one for deep storage and one for a metadata store. Many clusters will also use additional extensions.
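As a minimal sketch, this assumes extensions are selected through the `druid.extensions.loadList` property in `common.runtime.properties`, whose value is a JSON array of extension names. The Python below assembles that line for a cluster that uses one deep-storage extension and one metadata-store extension from the core list. The helper `build_load_list` is hypothetical and is not part of Druid.

```python
import json


def build_load_list(deep_storage, metadata_store, extras=()):
    """Return a druid.extensions.loadList line (hypothetical helper).

    The value is a JSON array of extension names. Duplicates are dropped
    and the order in which extensions were given is preserved.
    """
    names = []
    for name in (deep_storage, metadata_store, *extras):
        if name not in names:
            names.append(name)
    return "druid.extensions.loadList=" + json.dumps(names)


# A typical production setup: HDFS deep storage plus a MySQL metadata store,
# with one optional aggregator extension on top.
print(build_load_list("druid-hdfs-storage", "mysql-metadata-storage",
                      extras=["druid-histogram"]))
```

Keeping the deep-storage and metadata-store arguments positional and required mirrors the guidance that a production cluster generally needs both.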

Including extensions

Please see here.

Core extensions

Core extensions are maintained by Druid committers.

Name	Description	Docs
druid-avro-extensions	Support for data in Apache Avro data format.	link
druid-basic-security	Support for Basic HTTP authentication and role-based access control.	link
druid-bloom-filter	Support for providing Bloom filters in Druid queries.	link
druid-caffeine-cache	A local cache implementation backed by Caffeine.	link
druid-datasketches	Support for approximate counts and set operations with DataSketches.	link
druid-hdfs-storage	HDFS deep storage.	link
druid-histogram	Approximate histograms and quantiles aggregator.	link
druid-kafka-eight	Kafka ingest firehose (high-level consumer) for realtime nodes.	link
druid-kafka-extraction-namespace	Kafka-based namespaced lookup. Requires namespace lookup extension.	link
druid-kafka-indexing-service	Supervised exactly-once Kafka ingestion for the indexing service.	link
druid-kinesis-indexing-service	Supervised exactly-once Kinesis ingestion for the indexing service.	link
druid-kerberos	Kerberos authentication for Druid nodes.	link
druid-lookups-cached-global	A module for lookups providing JVM-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.	link
druid-lookups-cached-single	Per-lookup caching module to support use cases where a lookup needs to be isolated from the global pool of lookups.	link
druid-parquet-extensions	Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.	link
druid-protobuf-extensions	Support for data in Protobuf data format.	link
druid-s3-extensions	Interfacing with data in AWS S3, and using S3 as deep storage.	link
druid-stats	Statistics-related module including variance and standard deviation.	link
mysql-metadata-storage	MySQL metadata store.	link
postgresql-metadata-storage	PostgreSQL metadata store.	link
simple-client-sslcontext	Simple SSLContext provider module to be used by internal HttpClient talking to other nodes over HTTPS.	link
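
Some entries in the core table depend on other extensions being loaded. The Python below is a hypothetical pre-flight check for a load list; it is a sketch, not a Druid tool. It encodes only the Parquet-to-Avro requirement, because that is the only row that names the required extension explicitly. The Kafka namespaced-lookup requirement is left out, since the table does not give that extension's exact name.

```python
# Requirements taken from the core extensions table. Only rows that name
# the required extension explicitly are encoded here.
REQUIRES = {
    "druid-parquet-extensions": ["druid-avro-extensions"],
}


def missing_dependencies(load_list):
    """Return (extension, missing_requirement) pairs for a load list."""
    loaded = set(load_list)
    problems = []
    for ext in load_list:
        for required in REQUIRES.get(ext, []):
            if required not in loaded:
                problems.append((ext, required))
    return problems


print(missing_dependencies(["druid-s3-extensions", "druid-parquet-extensions"]))
# [('druid-parquet-extensions', 'druid-avro-extensions')]
```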

Community extensions

Community extensions are not maintained by Druid committers, although we accept patches from community members using these extensions. They may not have been as extensively tested as the core extensions.

A number of community members have contributed their own extensions to Druid that are not packaged with the default Druid tarball. If you'd like to take on maintenance for a community extension, please post on dev@druid.apache.org to let us know!

All of these community extensions can be downloaded using pull-deps with the coordinate org.apache.druid.extensions.contrib:EXTENSION_NAME:LATEST_DRUID_STABLE_VERSION.
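The coordinate pattern above can be filled in mechanically. The sketch below builds it in Python and also assembles a pull-deps command line. The command shape (`java -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps --no-default-hadoop -c COORDINATE`, run from the Druid installation directory) is an assumption to check against the pull-deps documentation for your release. The version string `0.13.0-incubating` is only an example, and the helper names are hypothetical.

```python
import shlex


def contrib_coordinate(extension_name, druid_version):
    """Maven coordinate for a community (contrib) extension."""
    return f"org.apache.druid.extensions.contrib:{extension_name}:{druid_version}"


def pull_deps_args(extension_name, druid_version):
    """Assumed pull-deps argument list, run from the Druid install directory."""
    return [
        "java", "-classpath", "lib/*",
        "org.apache.druid.cli.Main", "tools", "pull-deps",
        # Skip downloading the default Hadoop client coordinate.
        "--no-default-hadoop",
        "-c", contrib_coordinate(extension_name, druid_version),
    ]


# Example only: substitute the Druid version your cluster actually runs.
print(shlex.join(pull_deps_args("druid-redis-cache", "0.13.0-incubating")))
```

Returning an argument list rather than a single string means the command can be passed straight to `subprocess.run` without shell quoting. `shlex.join` is used here only to display it.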

Name	Description	Docs
ambari-metrics-emitter	Ambari Metrics Emitter	link
druid-azure-extensions	Microsoft Azure deep storage.	link
druid-cassandra-storage	Apache Cassandra deep storage.	link
druid-cloudfiles-extensions	Rackspace Cloudfiles deep storage and firehose.	link
druid-distinctcount	DistinctCount aggregator	link
druid-kafka-eight-simpleConsumer	Kafka ingest firehose (low-level consumer).	link
druid-orc-extensions	Support for data in Apache ORC data format.	link
druid-rabbitmq	RabbitMQ firehose.	link
druid-redis-cache	A cache implementation for Druid based on Redis.	link
druid-rocketmq	RocketMQ firehose.	link
druid-time-min-max	Min/Max aggregator for timestamp.	link
druid-google-extensions	Google Cloud Storage deep storage.	link
sqlserver-metadata-storage	Microsoft SQL Server metadata store.	link
graphite-emitter	Graphite metrics emitter	link
statsd-emitter	StatsD metrics emitter	link
kafka-emitter	Kafka metrics emitter	link
druid-thrift-extensions	Support for Thrift ingestion.	link
druid-opentsdb-emitter	OpenTSDB metrics emitter	link

Promoting community extensions to core extensions

Please post on dev@druid.apache.org if you'd like an extension to be promoted to core. If we see a community extension actively supported by the community, we can promote it to core based on community feedback.

Creating your own extensions

For information on how to create your own extension, please see here.
