8ad8582dc8
Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog. As we add the catalog, we will want to combine physical segment metadata with "hints" provided by the catalog. That is easiest if we first tidy up the existing code to separate responsibilities more clearly. This PR is purely a refactoring: no functionality changes, and there is no difference in user-visible behavior or external APIs. Functional changes will come later as we add the catalog itself.
DruidSchema
In the present code, DruidSchema does three jobs:
- Holds the segment metadata cache
- Interfaces with an external schema manager
- Acts as a schema for Calcite
This PR splits those responsibilities:
- DruidSchema holds the Calcite schema for the druid namespace, combining information from the segment metadata cache, from the external schema manager, and (later) from the catalog.
- SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema.
DruidTable
The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports and uses if-statements to handle behavior that differs between types, even though any given DruidTable instance handles only one such table type. To model the actual table types more clearly, this PR splits DruidTable into several classes:
- DruidTable becomes an abstract base class that holds Druid-specific methods.
- DatasourceTable represents a datasource.
- ExternalTable represents an external table, such as from EXTERN or (later) from the catalog.
- InlineTable represents the internal case in which we attach data directly to a table.
- LookupTable represents Druid's lookup table mechanism.
The new subclasses are more focused: because each represents exactly one table type, it can be selective about the data it holds and the predicates it implements. This matters because catalog information will differ by table type, and the new structure makes adding that logic cleaner.
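A rough sketch of the resulting hierarchy follows. Only the class names come from this PR; the predicates and their return values are illustrative placeholders, not Druid's actual API.
```java
// Sketch only: class names are from the PR description; members and return
// values are placeholders, not the real Druid code.

/** Abstract base class holding Druid-specific table behavior. */
abstract class DruidTable
{
  // Predicates the old, catch-all DruidTable answered with if-statements
  // become type-specific overrides.
  public abstract boolean isJoinable();

  public abstract boolean isBroadcast();
}

/** Represents a datasource. */
class DatasourceTable extends DruidTable
{
  @Override public boolean isJoinable() { return false; }   // placeholder
  @Override public boolean isBroadcast() { return false; }  // placeholder
}

/** Represents an external table, e.g. from EXTERN or (later) the catalog. */
class ExternalTable extends DruidTable
{
  @Override public boolean isJoinable() { return false; }   // placeholder
  @Override public boolean isBroadcast() { return false; }  // placeholder
}

/** Represents data attached directly to the table definition. */
class InlineTable extends DruidTable
{
  @Override public boolean isJoinable() { return false; }   // placeholder
  @Override public boolean isBroadcast() { return false; }  // placeholder
}

/** Represents Druid's lookup table mechanism. */
class LookupTable extends DruidTable
{
  @Override public boolean isJoinable() { return true; }    // lookups are joinable; placeholder otherwise
  @Override public boolean isBroadcast() { return true; }   // placeholder
}
```
Each subclass can then hold only the state its table type needs, which is where the per-type catalog information will attach in a later PR.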
DatasourceMetadata
Previously, the DruidSchema segment cache worked directly with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To provide it, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry, holding only the "physical" segment metadata; it will be up to the DruidTable to combine this with catalog information in a later PR.
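To illustrate the layering, here is a minimal value-object sketch. DatasourceMetadata is the name used in this PR, but the fields below are stand-ins for the physical segment metadata rather than the class's real members.
```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Sketch only: the field set is illustrative; the real cache entry holds the
// physical segment metadata (row signature and related details).
final class DatasourceMetadata
{
  private final String datasourceName;
  private final List<String> columnNames; // stand-in for a full row signature
  private final long segmentCount;

  DatasourceMetadata(final String datasourceName, final List<String> columnNames, final long segmentCount)
  {
    this.datasourceName = Objects.requireNonNull(datasourceName, "datasourceName");
    this.columnNames = Collections.unmodifiableList(columnNames);
    this.segmentCount = segmentCount;
  }

  public String getDatasourceName()
  {
    return datasourceName;
  }

  public List<String> getColumnNames()
  {
    return columnNames;
  }

  public long getSegmentCount()
  {
    return segmentCount;
  }
}
```
The table presented to Calcite (DatasourceTable) wraps one of these entries; in a later PR it will merge this physical view with catalog hints.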
More Efficient Table Resolution
Calcite provides a convenient base class for schema objects: AbstractSchema. However, that class is a bit too convenient: all we have to do is provide a map of tables, and Calcite does the rest. This means that, to resolve any single datasource, say foo, we must cache segment metadata, external schema information, and catalog information for every table, just so Calcite can do a map lookup. There is nothing special about AbstractSchema; we can handle table lookups ourselves, which is what the new AbstractTableSchema does. All the rest of Calcite needs is to resolve individual tables by name and, for commands we don't use, to provide a list of table names. DruidSchema now extends AbstractTableSchema, and SegmentMetadataCache resolves individual tables (and provides table names).
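The difference is easiest to see in a sketch. The types below stand in for Calcite's Table and Schema abstractions, and the cache method names are assumptions for illustration, not the PR's exact signatures.
```java
import java.util.Set;

// Stand-in for Calcite's Table; used only to keep this sketch self-contained.
interface TableLike
{
}

/**
 * Unlike Calcite's AbstractSchema, which asks for a map of every table, this
 * base class asks only for what Calcite actually needs: resolve one table by
 * name, and list table names.
 */
abstract class AbstractTableSchema
{
  public abstract TableLike getTable(String name);

  public abstract Set<String> getTableNames();
}

/** Stand-in for the new cache class; the method names here are illustrative. */
interface SegmentMetadataCache
{
  TableLike getDatasource(String name);

  Set<String> getDatasourceNames();
}

/** The Druid schema resolves tables one at a time from the metadata cache. */
class DruidSchema extends AbstractTableSchema
{
  private final SegmentMetadataCache cache;

  DruidSchema(final SegmentMetadataCache cache)
  {
    this.cache = cache;
  }

  @Override
  public TableLike getTable(final String name)
  {
    // Look up just the requested datasource instead of materializing a map
    // of every table Druid knows about.
    return cache.getDatasource(name);
  }

  @Override
  public Set<String> getTableNames()
  {
    return cache.getDatasourceNames();
  }
}
```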
DruidSchemaManager
DruidSchemaManager provides a way to specify table schemas externally. In this sense it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: it provided a map of all tables. This PR adds new optional methods for the table-lookup and table-names operations. The default implementations work the way AbstractSchema works: fetch the entire map and pick out the information we need. Extensions that use this API should be revised to implement the individual operations instead; Druid code no longer calls the original getTables() method. The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, it now returns a Map rather than a ConcurrentMap.
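A hedged sketch of what the optional operations might look like, using default methods that fall back to the bulk getTables() call. DruidSchemaManager and getTables() are names from this PR; the exact signatures and the TableLike stand-in are illustrative.
```java
import java.util.Map;
import java.util.Set;

// Sketch only: signatures are assumptions, not the actual Druid interface.
interface DruidSchemaManager
{
  /**
   * Legacy bulk operation: the full map of externally defined tables. Per the
   * breaking change described above, this is a plain Map rather than a
   * ConcurrentMap, since it is read-only to the rest of Druid.
   */
  Map<String, TableLike> getTables();

  /**
   * New optional per-table lookup. The default mimics the old behavior:
   * fetch the whole map, then pick out the single entry we need.
   */
  default TableLike getTable(String name)
  {
    return getTables().get(name);
  }

  /** New optional name listing, again defaulting to the bulk map. */
  default Set<String> getTableNames()
  {
    return getTables().keySet();
  }
}

// Stand-in for Calcite's Table; used only to keep this sketch self-contained.
interface TableLike
{
}
```
Extensions can override the per-table methods to avoid building the full map on every lookup.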
README.md
Website | Documentation | Developer Mailing List | User Mailing List | Slack | Twitter | Download
Apache Druid
Druid is a high performance real-time analytics database. Druid's main value add is to reduce time to insight and action.
Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.
Getting started
You can get started with Druid with our local or Docker quickstart.
Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
Load data
Load streaming and batch data using a point-and-click wizard to guide you through ingestion setup. Monitor one-off tasks and ingestion supervisors.
Manage the cluster
Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. All powered by SQL systems tables, allowing you to see the underlying query for each view.
Issue queries
Use the built-in query workbench to prototype Druid SQL and native queries or connect one of the many tools that help you make the most out of Druid.
Documentation
You can find the documentation for the latest Druid release on the project website.
If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.
Community
Community support is available on the druid-user mailing list, which is hosted at Google Groups.
Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.
Chat with Druid committers and users in real-time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.
Building from source
Please note that JDK 8 or JDK 11 is required to build Druid.
See the latest build guide for instructions on building Apache Druid from source.
Contributing
Please follow the community guidelines for contributing.
For instructions on setting up IntelliJ, see dev/intellij-setup.md.