Apache Druid: a high-performance, real-time analytics database.
Latest commit: Refactor DruidSchema & DruidTable (#12835) by Paul Rogers (8ad8582dc8, 2022-08-10)
Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog.

As we add the catalog, we’ll want to combine physical segment metadata information with “hints” provided by the catalog. This is best done if we tidy up the existing code to more clearly separate responsibilities.

This PR is purely a refactoring: no functionality changes, and no difference to user-visible behavior or external APIs. Functional changes will come later as we add the catalog itself.

DruidSchema
In the present code, DruidSchema does three tasks:

- Holds the segment metadata cache
- Interfaces with an external schema manager
- Acts as a schema to Calcite

This PR splits those responsibilities.

- DruidSchema holds the Calcite schema for the druid namespace, combining information from the segment metadata cache, from the external schema manager, and (later) from the catalog.
- SegmentMetadataCache holds the segment metadata cache formerly embedded in DruidSchema.
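
To make the division concrete, here is a minimal sketch of the cache side, assuming illustrative names and signatures (SegmentMetadataCacheSketch, update(), getDatasource()) rather than the actual Druid code: the cache owns the per-datasource metadata, and DruidSchema only consults it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, trimmed-down sketch; not the actual Druid class.
class SegmentMetadataCacheSketch
{
  // Stand-in for the DatasourceMetadata cache entry described below.
  static class DatasourceMetadata
  {
  }

  private final ConcurrentMap<String, DatasourceMetadata> datasources = new ConcurrentHashMap<>();

  // Invoked by the background refresh as segment metadata arrives.
  void update(String datasource, DatasourceMetadata metadata)
  {
    datasources.put(datasource, metadata);
  }

  // DruidSchema resolves a single datasource through the cache.
  DatasourceMetadata getDatasource(String datasource)
  {
    return datasources.get(datasource);
  }

  Set<String> getDatasourceNames()
  {
    return datasources.keySet();
  }
}
```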
DruidTable
The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables that Druid supports and uses if-statements to handle behavior that differs between types, yet any given DruidTable instance handles only one such table type. To model the actual table types more clearly, we split DruidTable into several classes:

- DruidTable becomes an abstract base class that holds the Druid-specific methods.
- DatasourceTable represents a datasource.
- ExternalTable represents an external table, such as from EXTERN or (later) from the catalog.
- InlineTable represents the internal case in which we attach data directly to a table.
- LookupTable represents Druid's lookup table mechanism.
The new subclasses are more focused: because each represents a single table type, it can be selective about the data it holds and the predicates it implements. This will be important because catalog information differs by table type, and the new structure makes adding that logic cleaner.
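
A simplified model of the hierarchy, with hypothetical stand-in types (a plain column map in place of the Calcite row type the real classes expose), might look like this:

```java
import java.util.Map;

// Hypothetical, simplified model of the new hierarchy; not the actual code.
abstract class DruidTableSketch
{
  private final Map<String, String> columns; // column name -> type

  protected DruidTableSketch(Map<String, String> columns)
  {
    this.columns = columns;
  }

  Map<String, String> columns()
  {
    return columns;
  }

  // Per-type predicates replace the old if-statements on a table-kind field.
  abstract boolean isJoinable();
}

class DatasourceTableSketch extends DruidTableSketch
{
  DatasourceTableSketch(Map<String, String> columns)
  {
    super(columns);
  }

  @Override
  boolean isJoinable()
  {
    return false;
  }
}

class LookupTableSketch extends DruidTableSketch
{
  LookupTableSketch(Map<String, String> columns)
  {
    super(columns);
  }

  @Override
  boolean isJoinable()
  {
    return true; // lookup tables can appear on the right side of a join
  }
}
```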

DatasourceMetadata
Previously, the DruidSchema segment cache worked with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To provide it, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry, holding only the "physical" segment metadata; combining it with catalog information will be up to DruidTable in a later PR.
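
A minimal sketch of such a cache entry, again with illustrative names and a plain column map standing in for the real metadata types:

```java
import java.util.Collections;
import java.util.Map;

// Sketch only: the entry holds just what segment metadata provides; merging
// in catalog "hints" is left to the table layer in a later PR.
class DatasourceMetadataSketch
{
  private final String datasourceName;
  private final Map<String, String> physicalColumns; // learned from segments

  DatasourceMetadataSketch(String datasourceName, Map<String, String> physicalColumns)
  {
    this.datasourceName = datasourceName;
    this.physicalColumns = Collections.unmodifiableMap(physicalColumns);
  }

  String datasourceName()
  {
    return datasourceName;
  }

  Map<String, String> physicalColumns()
  {
    return physicalColumns;
  }
}
```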

More Efficient Table Resolution
Calcite provides a convenient base class for schema objects: AbstractSchema. However, this class is a bit too convenient: all we have to do is provide a map of tables, and Calcite does the rest. This means that, to resolve any single datasource, say foo, we must cache segment metadata, external schema information, and catalog information for all tables, just so Calcite can do a map lookup.

There is nothing special about AbstractSchema: we can handle table lookups ourselves, which is what the new AbstractTableSchema does. In fact, all the rest of Calcite needs is to resolve individual tables by name and, for commands we don't use, to provide a list of table names.

DruidSchema now extends AbstractTableSchema, and SegmentMetadataCache resolves individual tables (and provides table names).
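
The contrast, in a simplified sketch (plain stand-in types rather than Calcite's actual Table and Schema interfaces):

```java
import java.util.Map;
import java.util.Set;

// Simplified stand-in for Calcite's Table; not the actual API.
interface TableSketch
{
}

// Old pattern (AbstractSchema): every lookup goes through a fully built map.
abstract class MapBackedSchemaSketch
{
  // Must build and cache metadata for ALL tables, even to resolve one.
  protected abstract Map<String, TableSketch> getTableMap();

  TableSketch getTable(String name)
  {
    return getTableMap().get(name);
  }
}

// New pattern (AbstractTableSchema): resolve only the table asked for.
abstract class NameBackedSchemaSketch
{
  abstract TableSketch getTable(String name);

  // Needed only for commands that enumerate table names.
  abstract Set<String> getTableNames();
}
```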

DruidSchemaManager
DruidSchemaManager provides a way to specify table schemas externally. In this sense it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: implementations provide a map of tables. This PR adds new optional methods for the table lookup and table name operations. The default implementations work the same way that AbstractSchema works: we get the entire map and pick out the information we need. Extensions that use this API should be revised to support the individual operations instead, since Druid code no longer calls the original getTables() method.

The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, getTables() now returns a Map rather than a ConcurrentMap.
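
A sketch of the resulting contract, with Object standing in for DruidTable:

```java
import java.util.Map;
import java.util.Set;

// Illustrative contract only; not the actual Druid interface.
interface SchemaManagerSketch
{
  // Legacy whole-map operation. Read-only to the rest of Druid, hence a
  // plain Map rather than a ConcurrentMap (the breaking change noted above).
  Map<String, Object> getTables();

  // New optional operations; the defaults fall back to the legacy call, so
  // existing extensions keep working, but they should override these.
  default Object getTable(String name)
  {
    return getTables().get(name);
  }

  default Set<String> getTableNames()
  {
    return getTables().keySet();
  }
}
```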

README.md



Website | Documentation | Developer Mailing List | User Mailing List | Slack | Twitter | Download


Apache Druid

Druid is a high-performance, real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid using our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
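
For example, here is a minimal Java sketch of the HTTP SQL API, assuming a quickstart router on localhost:8888 and the tutorial wikipedia datasource (both are assumptions from the quickstart setup, not requirements):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidSqlHttpExample
{
  public static void main(String[] args) throws Exception
  {
    // SQL statement wrapped in the JSON envelope the SQL endpoint expects.
    String body = "{\"query\": \"SELECT COUNT(*) AS cnt FROM wikipedia\"}";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8888/druid/v2/sql/"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // Results arrive as JSON rows.
    System.out.println(response.body());
  }
}
```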

Load data

(Screenshot: data loader, Kafka ingestion)

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

(Screenshot: cluster management view)

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. All powered by SQL systems tables, allowing you to see the underlying query for each view.
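
As a sketch, you can query the same system tables yourself over Druid's Avatica-based JDBC endpoint; this assumes a quickstart router at localhost:8888 and the Avatica client driver on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidSysTablesExample
{
  public static void main(String[] args) throws Exception
  {
    // Druid's documented JDBC connect string goes through Avatica.
    String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/";

    try (Connection connection = DriverManager.getConnection(url);
         Statement statement = connection.createStatement();
         ResultSet rs = statement.executeQuery(
             "SELECT datasource, COUNT(*) AS num_segments FROM sys.segments GROUP BY 1")) {
      while (rs.next()) {
        // Print one line per datasource with its segment count.
        System.out.println(rs.getString(1) + ": " + rs.getLong(2));
      }
    }
  }
}
```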

Issue queries

(Screenshot: query workbench)

Use the built-in query workbench to prototype DruidSQL and native queries or connect one of the many tools that help you make the most out of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real-time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.

See the latest build guide for instructions on building Apache Druid from source.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0