8ad8582dc8
Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog. As we add the catalog, we will want to combine physical segment metadata with "hints" provided by the catalog. That is easiest if we first tidy up the existing code to separate responsibilities more clearly. This PR is purely a refactoring: no functionality changes, and there is no difference in user-visible behavior or external APIs. Functional changes will come later as we add the catalog itself.
DruidSchema
In the present code, DruidSchema does three jobs:
- Holds the segment metadata cache
- Interfaces with an external schema manager
- Acts as a schema for Calcite
This PR splits those responsibilities:
- DruidSchema holds the Calcite schema for the druid namespace, combining information from the segment metadata cache, from the external schema manager, and (later) from the catalog.
- SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema.
DruidTable
The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports and uses if-statements to handle behavior that differs between types, even though any given DruidTable instance handles only one such table type. To model the actual table types more clearly, this PR splits DruidTable into several classes:
- DruidTable becomes an abstract base class that holds Druid-specific methods.
- DatasourceTable represents a datasource.
- ExternalTable represents an external table, such as from EXTERN or (later) from the catalog.
- InlineTable represents the internal case in which we attach data directly to a table.
- LookupTable represents Druid's lookup table mechanism.
The new subclasses are more focused: because each represents exactly one table type, it can be selective about the data it holds and the predicates it implements. This matters because catalog information will differ by table type, and the new structure makes adding that logic cleaner.
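A rough sketch of the resulting hierarchy follows. Only the class names come from this PR; the predicates and their return values are illustrative placeholders, not Druid's actual API.
```java
// Sketch only: class names are from the PR description; members and return
// values are placeholders, not the real Druid code.

/** Abstract base class holding Druid-specific table behavior. */
abstract class DruidTable
{
  // Predicates the old, catch-all DruidTable answered with if-statements
  // become type-specific overrides.
  public abstract boolean isJoinable();

  public abstract boolean isBroadcast();
}

/** Represents a datasource. */
class DatasourceTable extends DruidTable
{
  @Override public boolean isJoinable() { return false; }   // placeholder
  @Override public boolean isBroadcast() { return false; }  // placeholder
}

/** Represents an external table, e.g. from EXTERN or (later) the catalog. */
class ExternalTable extends DruidTable
{
  @Override public boolean isJoinable() { return false; }   // placeholder
  @Override public boolean isBroadcast() { return false; }  // placeholder
}

/** Represents data attached directly to the table definition. */
class InlineTable extends DruidTable
{
  @Override public boolean isJoinable() { return false; }   // placeholder
  @Override public boolean isBroadcast() { return false; }  // placeholder
}

/** Represents Druid's lookup table mechanism. */
class LookupTable extends DruidTable
{
  @Override public boolean isJoinable() { return true; }    // lookups are joinable; placeholder otherwise
  @Override public boolean isBroadcast() { return true; }   // placeholder
}
```
Each subclass can then hold only the state its table type needs, which is where the per-type catalog information will attach in a later PR.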
DatasourceMetadata
Previously, the DruidSchema segment cache worked directly with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To provide it, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry, holding only the "physical" segment metadata; it will be up to the DruidTable to combine this with catalog information in a later PR.
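To illustrate the layering, here is a minimal value-object sketch. DatasourceMetadata is the name used in this PR, but the fields below are stand-ins for the physical segment metadata rather than the class's real members.
```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Sketch only: the field set is illustrative; the real cache entry holds the
// physical segment metadata (row signature and related details).
final class DatasourceMetadata
{
  private final String datasourceName;
  private final List<String> columnNames; // stand-in for a full row signature
  private final long segmentCount;

  DatasourceMetadata(final String datasourceName, final List<String> columnNames, final long segmentCount)
  {
    this.datasourceName = Objects.requireNonNull(datasourceName, "datasourceName");
    this.columnNames = Collections.unmodifiableList(columnNames);
    this.segmentCount = segmentCount;
  }

  public String getDatasourceName()
  {
    return datasourceName;
  }

  public List<String> getColumnNames()
  {
    return columnNames;
  }

  public long getSegmentCount()
  {
    return segmentCount;
  }
}
```
The table presented to Calcite (DatasourceTable) wraps one of these entries; in a later PR it will merge this physical view with catalog hints.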
More Efficient Table Resolution
Calcite provides a convenient base class for schema objects: AbstractSchema. However, that class is a bit too convenient: all we have to do is provide a map of tables, and Calcite does the rest. This means that, to resolve any single datasource, say foo, we must cache segment metadata, external schema information, and catalog information for every table, just so Calcite can do a map lookup. There is nothing special about AbstractSchema; we can handle table lookups ourselves, which is what the new AbstractTableSchema does. All the rest of Calcite needs is to resolve individual tables by name and, for commands we don't use, to provide a list of table names. DruidSchema now extends AbstractTableSchema, and SegmentMetadataCache resolves individual tables (and provides table names).
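The difference is easiest to see in a sketch. The types below stand in for Calcite's Table and Schema abstractions, and the cache method names are assumptions for illustration, not the PR's exact signatures.
```java
import java.util.Set;

// Stand-in for Calcite's Table; used only to keep this sketch self-contained.
interface TableLike
{
}

/**
 * Unlike Calcite's AbstractSchema, which asks for a map of every table, this
 * base class asks only for what Calcite actually needs: resolve one table by
 * name, and list table names.
 */
abstract class AbstractTableSchema
{
  public abstract TableLike getTable(String name);

  public abstract Set<String> getTableNames();
}

/** Stand-in for the new cache class; the method names here are illustrative. */
interface SegmentMetadataCache
{
  TableLike getDatasource(String name);

  Set<String> getDatasourceNames();
}

/** The Druid schema resolves tables one at a time from the metadata cache. */
class DruidSchema extends AbstractTableSchema
{
  private final SegmentMetadataCache cache;

  DruidSchema(final SegmentMetadataCache cache)
  {
    this.cache = cache;
  }

  @Override
  public TableLike getTable(final String name)
  {
    // Look up just the requested datasource instead of materializing a map
    // of every table Druid knows about.
    return cache.getDatasource(name);
  }

  @Override
  public Set<String> getTableNames()
  {
    return cache.getDatasourceNames();
  }
}
```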
DruidSchemaManager
DruidSchemaManager provides a way to specify table schemas externally. In this sense it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: it provided a map of all tables. This PR adds new optional methods for the table-lookup and table-names operations. The default implementations work the way AbstractSchema works: fetch the entire map and pick out the information we need. Extensions that use this API should be revised to implement the individual operations instead; Druid code no longer calls the original getTables() method. The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, it now returns a Map rather than a ConcurrentMap.
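A hedged sketch of what the optional operations might look like, using default methods that fall back to the bulk getTables() call. DruidSchemaManager and getTables() are names from this PR; the exact signatures and the TableLike stand-in are illustrative.
```java
import java.util.Map;
import java.util.Set;

// Sketch only: signatures are assumptions, not the actual Druid interface.
interface DruidSchemaManager
{
  /**
   * Legacy bulk operation: the full map of externally defined tables. Per the
   * breaking change described above, this is a plain Map rather than a
   * ConcurrentMap, since it is read-only to the rest of Druid.
   */
  Map<String, TableLike> getTables();

  /**
   * New optional per-table lookup. The default mimics the old behavior:
   * fetch the whole map, then pick out the single entry we need.
   */
  default TableLike getTable(String name)
  {
    return getTables().get(name);
  }

  /** New optional name listing, again defaulting to the bulk map. */
  default Set<String> getTableNames()
  {
    return getTables().keySet();
  }
}

// Stand-in for Calcite's Table; used only to keep this sketch self-contained.
interface TableLike
{
}
```
Extensions can override the per-table methods to avoid building the full map on every lookup.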
README.md
Website | Documentation | Developer Mailing List | User Mailing List | Slack | Twitter | Download
Apache Druid
Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.
Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.
Getting started
You can get started with Druid with our local or Docker quickstart.
Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
Load data
Load streaming and batch data using a point-and-click wizard to guide you through ingestion setup. Monitor one-off tasks and ingestion supervisors.
Manage the cluster
Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. All powered by SQL systems tables, allowing you to see the underlying query for each view.
Issue queries
Use the built-in query workbench to prototype Druid SQL and native queries or connect one of the many tools that help you make the most out of Druid.
Documentation
You can find the documentation for the latest Druid release on the project website.
If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.
Community
Community support is available on the druid-user mailing list, which is hosted at Google Groups.
Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.
Chat with Druid committers and users in real-time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.
Building from source
Please note that JDK 8 or JDK 11 is required to build Druid.
See the latest build guide for instructions on building Apache Druid from source.
Contributing
Please follow the community guidelines for contributing.
For instructions on setting up IntelliJ, see dev/intellij-setup.md.