From 06a8e1482081b940d388bdedff6fcf2aefa1252a Mon Sep 17 00:00:00 2001
From: fjy
Date: Sat, 2 Jan 2016 14:24:52 -0800
Subject: [PATCH] Add intro developer docs

---
 docs/content/development/overview.md | 56 ++++++++++++++++++++++++++++
 docs/content/toc.textile             |  1 +
 2 files changed, 57 insertions(+)
 create mode 100644 docs/content/development/overview.md

diff --git a/docs/content/development/overview.md b/docs/content/development/overview.md
new file mode 100644
index 00000000000..abdb5fe942a
--- /dev/null
+++ b/docs/content/development/overview.md
@@ -0,0 +1,56 @@
+---
+layout: doc_page
+---
+
+# Developing on Druid
+
+Druid's codebase consists of several major components. For developers interested in learning the code, this document provides
+a high-level overview of the main components that make up Druid and the relevant classes to start from to learn the code.
+
+## Storage Format
+
+Data in Druid is stored in a custom column format known as a [segment](../design/segments.html). Segments are composed of
+different types of columns. `Column.java` and the classes that extend it are a great place to start looking into the storage format.
+
+## Segment Creation
+
+Raw data is ingested in `IncrementalIndex.java`, and segments are created in `IndexMerger.java`.
+
+## Storage Engine
+
+Druid segments are memory-mapped in `IndexIO.java` to be exposed for querying.
+
+## Query Engine
+
+Most of the logic related to Druid queries can be found in the `Query*` classes. Druid leverages query runners to run queries.
+Query runners often embed other query runners, and each query runner adds a layer of logic. A good starting point to trace
+the query logic is `QueryResource.java`.
+
+## Coordination
+
+Most of the coordination logic for historical nodes is on the Druid coordinator. The starting point here is `DruidCoordinator.java`.
+Most of the coordination logic for (real-time) ingestion is in the Druid indexing service. The starting point here is `OverlordResource.java`.
+
+## Real-time Ingestion
+
+Druid loads data through `FirehoseFactory.java` classes. Firehoses can often wrap other firehoses, where, similar to the
+query runners, each firehose adds a layer of logic. Much of the core management logic is in `RealtimeManager.java` and the
+persist logic is in `RealtimePlumber.java`.
+
+## Hadoop-based Batch Ingestion
+
+The two main Hadoop indexing classes are `HadoopDruidDetermineConfigurationJob.java` for the job to determine how many Druid
+segments to create, and `HadoopDruidIndexerJob.java`, which creates Druid segments.
+
+At some point in the future, we may move the Hadoop ingestion code out of core Druid.
+
+## Internal UIs
+
+Druid currently has two internal UIs. One is for the Coordinator and one is for the Overlord.
+
+At some point in the future, we will likely move the internal UI code out of core Druid.
+
+## Client Libraries
+
+We welcome contributions for new client libraries to interact with Druid. See the
+[libraries page](../development/libraries.html) for existing client libraries.
diff --git a/docs/content/toc.textile b/docs/content/toc.textile
index a4d1efc44f4..c445141344c 100644
--- a/docs/content/toc.textile
+++ b/docs/content/toc.textile
@@ -81,6 +81,7 @@ h2. Configuration
 * "Production Zookeeper Configuration":../configuration/zookeeper.html
 
 h2. Development
+* "Overview":../development/overview.html
 * "Libraries":../development/libraries.html
 * "Extending Druid":../development/modules.html
 * "Build From Source":../development/build.html
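
To illustrate the layering that the Query Engine section of `overview.md` describes, here is a minimal Java sketch of runners that wrap other runners, each adding one layer of logic. The `Runner` interface and the `BaseRunner`, `CountingRunner`, `UpperCaseRunner`, and `LayeredRunnerDemo` classes are hypothetical names invented for this example; they are not Druid's actual classes or signatures. The Real-time Ingestion section describes the same wrapping shape for firehoses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a query runner; NOT Druid's actual interface.
interface Runner {
    List<String> run(String query);
}

// Innermost layer: pretends to read one row for the query.
final class BaseRunner implements Runner {
    @Override
    public List<String> run(String query) {
        List<String> rows = new ArrayList<>();
        rows.add("row-for-" + query);
        return rows;
    }
}

// Wrapping layer: delegates to the inner runner, then counts the rows it returned.
final class CountingRunner implements Runner {
    private final Runner delegate;
    private int rowsSeen = 0;

    CountingRunner(Runner delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> run(String query) {
        List<String> rows = delegate.run(query);
        rowsSeen += rows.size();
        return rows;
    }

    int rowsSeen() {
        return rowsSeen;
    }
}

// Wrapping layer: delegates to the inner runner, then upper-cases every row.
final class UpperCaseRunner implements Runner {
    private final Runner delegate;

    UpperCaseRunner(Runner delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> run(String query) {
        List<String> out = new ArrayList<>();
        for (String row : delegate.run(query)) {
            out.add(row.toUpperCase(Locale.ROOT));
        }
        return out;
    }
}

final class LayeredRunnerDemo {
    public static void main(String[] args) {
        // Compose the layers from the inside out.
        CountingRunner counting = new CountingRunner(new BaseRunner());
        Runner chain = new UpperCaseRunner(counting);
        System.out.println(chain.run("q1"));
    }
}
```

Running `LayeredRunnerDemo` prints `[ROW-FOR-Q1]`. Because every layer implements the same interface, layers can be added, removed, or reordered without changing the code that calls the outermost runner.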