From c96abe3a5963b3555d60d5ff2e0a1ac96a057b99 Mon Sep 17 00:00:00 2001 From: fjy Date: Fri, 18 Oct 2013 10:29:58 -0700 Subject: [PATCH] add data flow docs --- docs/content/Data-Flow.md | 30 ++++++++++++++++++++++++++++++ docs/content/toc.textile | 1 + 2 files changed, 31 insertions(+) diff --git a/docs/content/Data-Flow.md b/docs/content/Data-Flow.md index afe2ad8339f..8abcc8f9532 100644 --- a/docs/content/Data-Flow.md +++ b/docs/content/Data-Flow.md @@ -4,4 +4,34 @@ layout: doc_page # Data Flow +The diagram below illustrates how different Druid nodes download data and respond to queries: + + +### Real-time Nodes + +Real-time nodes ingest streaming data and announce themselves and the segments they are serving in Zookeeper on start up. During the segment hand-off stage, real-time nodes create a segment metadata entry in MySQL for the segment to hand-off. This segment is uploaded to Deep Storage. Real-time nodes use Zookeeper to monitor when historical nodes complete downloading the segment (indicating hand-off completion) so that it can forget about it. Real-time nodes also respond to query requests from broker nodes and also return query results to the broker nodes. + +### Deep Storage + +Batch indexed segments and segments created by real-time nodes are uploaded to deep storage. Historical nodes download these segments to serve for queries. + +### MySQL + +Real-time nodes and batch indexing create new segment metadata entries for the new segments they've created. Coordinator nodes read this metadata table to determine what segments should be loaded in the cluster. + +### Coordinator Nodes + +Coordinator nodes read segment metadata information from MySQL to determine what segments should be loaded in the cluster. Coordinator nodes user Zookeeper to determine what historical nodes exist, and also create Zookeeper entries to tell historical nodes to load and drop new segments. + +### Zookeeper + +Real-time nodes announce themselves and the segments they are serving in Zookeeper and also use Zookeeper to monitor segment hand-off. Coordinator nodes use Zookeeper to determine what historical nodes exist in the cluster and create new entries to communicate to historical nodes to load or drop new data. Historical nodes announce themselves and the segments they serve in Zookeeper. Historical nodes also monitor Zookeeper for new load or drop requests. Broker nodes use Zookeeper to determine what historical and real-time nodes exist in the cluster. + +### Historical Nodes + +Historical nodes announce themselves and the segments they are serving in Zookeeper. Historical nodes also use Zookeeper to monitor for signals to load or drop new segments. Historical nodes download segments from deep storage, respond to the queries from broker nodes about these segments, and return results to the broker nodes. + +### Broker Nodes + +Broker nodes receive queries from external clients and forward those queries down to real-time and historical nodes. When the individual nodes return their results, broker nodes merge these results and returns them to the caller. Broker nodes use Zookeeper to determine what real-time and historical nodes exist. diff --git a/docs/content/toc.textile b/docs/content/toc.textile index 35935305d66..f01271705e0 100644 --- a/docs/content/toc.textile +++ b/docs/content/toc.textile @@ -47,6 +47,7 @@ h2. Querying h2. Architecture * "Design":./Design.html +** "Data Flow":./Data-Flow.html * "Segments":./Segments.html * Node Types ** "Historical":./Historical.html