Merge pull request #2186 from druid-io/dev-docs2

Add intro developer docs
2025-03-02 15:29:10 +00:00 · 2016-01-03 11:36:41 -05:00 · 2016-01-03 11:36:41 -05:00 · 5a63c3dd63
commit 5a63c3dd63
parent a0ab65d169 06a8e14820
2 changed files with 57 additions and 0 deletions
--- a/docs/content/development/overview.md
+++ b/docs/content/development/overview.md
@@ -0,0 +1,56 @@
+---
+layout: doc_page
+---
+
+# Developing on Druid
+
+Druid's codebase consists of several major components. For developers interested in learning the code, this document provides
+a high-level overview of those components and the relevant classes to start from.
+
+## Storage Format
+
+Data in Druid is stored in a custom column format known as a [segment](../design/segments.html). Segments are composed of 
+different types of columns. `Column.java` and the classes that extend it are a good place to start exploring the storage format.
+
+## Segment Creation
+
+Raw data is ingested in `IncrementalIndex.java`, and segments are created in `IndexMerger.java`.
+
+## Storage Engine
+
+Druid segments are memory mapped in `IndexIO.java` to be exposed for querying.
+
+## Query Engine
+
+Most of the logic related to Druid queries can be found in the Query* classes. Druid leverages query runners to run queries. 
+Query runners often embed other query runners, and each query runner adds a layer of logic. A good starting point for tracing
+the query logic is `QueryResource.java`.
+
+## Coordination
+
+Most of the coordination logic for historical nodes is on the Druid coordinator. The starting point here is `DruidCoordinator.java`.  
+Most of the coordination logic for (real-time) ingestion is in the Druid indexing service. The starting point here is `OverlordResource.java`.
+
+## Real-time Ingestion
+
+Druid loads data through `FirehoseFactory.java` classes. Firehoses often wrap other firehoses and, as with
+query runners, each firehose adds a layer of logic. Much of the core management logic is in `RealtimeManager.java`, and the
+persist logic is in `RealtimePlumber.java`.
+
+## Hadoop-based Batch Ingestion
+
+The two main Hadoop indexing classes are `HadoopDruidDetermineConfigurationJob.java`, which determines how many Druid
+segments to create, and `HadoopDruidIndexerJob.java`, which creates those segments.
+
+At some point in the future, we may move the Hadoop ingestion code out of core Druid.
+
+## Internal UIs
+
+Druid currently has two internal UIs. One is for the Coordinator and one is for the Overlord.
+
+At some point in the future, we will likely move the internal UI code out of core Druid.
+
+## Client Libraries
+
+We welcome contributions of new client libraries for interacting with Druid. See
+[client libraries](../development/libraries.html) for the existing ones.
--- a/docs/content/toc.textile
+++ b/docs/content/toc.textile
@@ -81,6 +81,7 @@ h2. Configuration
 * "Production Zookeeper Configuration":../configuration/zookeeper.html

 h2. Development
+* "Overview":../development/overview.html
 * "Libraries":../development/libraries.html
 * "Extending Druid":../development/modules.html
 * "Build From Source":../development/build.html