---
id: overview
title: "Developing on Apache Druid"
sidebar_label: "Developing on Druid"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

Druid's codebase consists of several major components. For developers interested in learning the code, this document provides
a high-level overview of the main components that make up Druid and the classes that make good starting points for each.

## Storage format

Data in Druid is stored in a custom column format known as a [segment](../design/segments.md). Segments are composed of
different types of columns. `Column.java` and the classes that extend it are a good place to start looking into the storage format.

## Segment creation

Raw data is ingested in `IncrementalIndex.java`, and segments are created in `IndexMerger.java`.

## Storage engine

Druid segments are memory-mapped in `IndexIO.java` to be exposed for querying.
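
As a rough illustration of what memory mapping means here, the sketch below writes a few `long` values to a temporary file and then reads them back through a read-only `MappedByteBuffer`. It uses only the standard `java.nio` API; the class name `MmapSketch`, the helper methods, and the flat file layout are assumptions made for this example and do not reflect how `IndexIO.java` or the real segment format is organized.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not part of Druid.
public class MmapSketch {
    // Writes the values as consecutive big-endian longs, standing in for a column persisted to disk.
    static void writeColumn(Path file, long[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        Files.write(file, buf.array());
    }

    // Maps the file read-only and sums the values directly from the mapped buffer.
    static long sumColumn(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.remaining() >= Long.BYTES) {
                sum += mapped.getLong();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("column", ".bin");
        file.toFile().deleteOnExit();
        writeColumn(file, new long[] {1, 2, 3});
        System.out.println(sumColumn(file)); // prints 6
    }
}
```

The point of the mapping is that the operating system pages file contents in on demand, so a reader can address the data like an in-memory buffer without first copying the whole file onto the heap.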

## Query engine

Most of the logic related to Druid queries can be found in the `Query*` classes. Druid uses query runners to run queries.
Query runners often embed other query runners, and each query runner adds a layer of logic. A good starting point for tracing
the query logic is `QueryResource.java`.
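
The layering idea can be sketched as a chain of decorators. The example below is a simplified illustration only: `SimpleRunner`, `baseRunner`, `sorting`, and `logging` are hypothetical names invented for this sketch and are not Druid's actual `QueryRunner` interface or any of its implementations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class; not part of Druid.
public class LayeredRunnerSketch {
    // Simplified stand-in for a query runner: takes a query string, returns rows.
    interface SimpleRunner {
        List<String> run(String query);
    }

    // Innermost runner: pretends to read unsorted rows from a segment.
    static SimpleRunner baseRunner() {
        return query -> new ArrayList<>(List.of("b", "a", "c"));
    }

    // Layer that sorts whatever its delegate returns.
    static SimpleRunner sorting(SimpleRunner delegate) {
        return query -> {
            List<String> rows = new ArrayList<>(delegate.run(query));
            Collections.sort(rows);
            return rows;
        };
    }

    // Layer that records each query before passing it to its delegate.
    static SimpleRunner logging(SimpleRunner delegate, List<String> log) {
        return query -> {
            log.add("running: " + query);
            return delegate.run(query);
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Each wrapper embeds the runner below it and adds one layer of logic.
        SimpleRunner runner = logging(sorting(baseRunner()), log);
        System.out.println(runner.run("select-all")); // prints [a, b, c]
        System.out.println(log);                      // prints [running: select-all]
    }
}
```

Because every layer exposes the same interface, layers can be added, removed, or reordered without changing the code that calls the outermost runner.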

## Coordination

Most of the coordination logic for Historical processes is on the Druid Coordinator. The starting point here is `DruidCoordinator.java`.
Most of the coordination logic for (real-time) ingestion is in the Druid indexing service. The starting point here is `OverlordResource.java`.

## Real-time Ingestion

Druid streaming tasks are based on the 'seekable stream' classes such as `SeekableStreamSupervisor.java`,
`SeekableStreamIndexTask.java`, and `SeekableStreamIndexTaskRunner.java`. The data processing happens through
`StreamAppenderator.java`, and the persist and hand-off logic is in `StreamAppenderatorDriver.java`.

## Native Batch Ingestion

The main task types for Druid native batch ingestion are based on `AbstractBatchTask.java` and `AbstractBatchSubtask.java`.
Parallel processing uses `ParallelIndexSupervisorTask.java`, which spawns subtasks to perform various operations, such
as data analysis and partitioning, depending on the task specification. Segment generation happens in
`SinglePhaseSubTask.java`, `PartialHashSegmentGenerateTask.java`, or `PartialRangeSegmentGenerateTask.java` through
`BatchAppenderator.java`, and the persist and hand-off logic is in `BatchAppenderatorDriver.java`.

## Hadoop-based Batch Ingestion

The two main Hadoop indexing classes are `HadoopDruidDetermineConfigurationJob.java`, which determines how many Druid
segments to create, and `HadoopDruidIndexerJob.java`, which creates Druid segments.

At some point in the future, we may move the Hadoop ingestion code out of core Druid.

## Internal UIs

Druid currently has two internal UIs. One is for the Coordinator and one is for the Overlord.

At some point in the future, we will likely move the internal UI code out of core Druid.

## Client libraries

We welcome contributions for new client libraries to interact with Druid. See the
[Community and third-party libraries](https://druid.apache.org/libraries.html) page for links to existing client
libraries.