diff --git a/docs/_graphics/druid-dataflow.svg b/docs/_graphics/druid-dataflow.svg index 7e70ac70265..a26eae87aaa 100644 --- a/docs/_graphics/druid-dataflow.svg +++ b/docs/_graphics/druid-dataflow.svg @@ -8,9 +8,9 @@ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" version="1.1" - viewBox="40 36 470.92646 255.28501" - width="588.65808" - height="319.10626" + viewBox="40 36 466.7041 318.99965" + width="583.38013" + height="398.74957" id="svg2" inkscape:version="0.48.2 r9819" sodipodi:docname="druid-dataflow.svg" @@ -34,9 +34,9 @@ fit-margin-left="0" fit-margin-right="0" fit-margin-bottom="0" - inkscape:zoom="0.9451452" - inkscape:cx="251.09952" - inkscape:cy="43.169182" + inkscape:zoom="1.3366372" + inkscape:cx="248.71225" + inkscape:cy="133.85839" inkscape:window-x="0" inkscape:window-y="0" inkscape:window-maximized="0" @@ -224,397 +224,502 @@ + stdDeviation="1.8" + id="feGaussianBlur5036" /> + stdDeviation="1.8" + id="feGaussianBlur5040" /> + stdDeviation="1.8" + id="feGaussianBlur5044" /> + stdDeviation="1.8" + id="feGaussianBlur5048" /> + stdDeviation="1.8" + id="feGaussianBlur5052" /> + stdDeviation="1.8" + id="feGaussianBlur5056" /> + + + + + + - - + + - streaming - data - - - client - - - - batch - data - - - - - - - - - - - - - - - streaming + data + + + client + + + + batch + data + + + + + + + + + + + + + + + + - realtime - nodes - - - - realtime + nodes + + + + + - historical - nodes - - - - historical + nodes + + + + + - MySQL - - - - MySQL + + + + + - coordinator - nodes - - - - coordinator + nodes + + + + + - deep - storage - - - - deep + storage + + + + + - Zoo - Keeper - - - - - Zoo + Keeper + + + + + + - broker - nodes - - + x="409.95114" + id="tspan4481" + sodipodi:role="line">broker + nodes + + + + + external dependencies + + + + + druid components + diff --git a/docs/content/Data-Flow.md b/docs/content/Data-Flow.md index 5d2dbf6503a..8abcc8f9532 100644 --- 
a/docs/content/Data-Flow.md +++ b/docs/content/Data-Flow.md @@ -4,5 +4,34 @@ layout: doc_page # Data Flow - +The diagram below illustrates how different Druid nodes download data and respond to queries: + + +### Real-time Nodes + +Real-time nodes ingest streaming data and announce themselves and the segments they are serving in Zookeeper on startup. During the segment hand-off stage, real-time nodes create a segment metadata entry in MySQL for the segment being handed off. This segment is uploaded to Deep Storage. Real-time nodes use Zookeeper to monitor when historical nodes complete downloading the segment (indicating hand-off completion) so that they can forget about the segment. Real-time nodes also respond to query requests from broker nodes and return query results to the broker nodes. + +### Deep Storage + +Batch indexed segments and segments created by real-time nodes are uploaded to deep storage. Historical nodes download these segments to serve queries. + +### MySQL + +Real-time nodes and batch indexing jobs create new segment metadata entries for the segments they've created. Coordinator nodes read this metadata table to determine what segments should be loaded in the cluster. + +### Coordinator Nodes + +Coordinator nodes read segment metadata information from MySQL to determine what segments should be loaded in the cluster. Coordinator nodes use Zookeeper to determine what historical nodes exist, and also create Zookeeper entries to tell historical nodes to load and drop new segments. + +### Zookeeper + +Real-time nodes announce themselves and the segments they are serving in Zookeeper and also use Zookeeper to monitor segment hand-off. Coordinator nodes use Zookeeper to determine what historical nodes exist in the cluster and create new entries to tell historical nodes to load or drop data. Historical nodes announce themselves and the segments they serve in Zookeeper. Historical nodes also monitor Zookeeper for new load or drop requests. 
Broker nodes use Zookeeper to determine what historical and real-time nodes exist in the cluster. + +### Historical Nodes + +Historical nodes announce themselves and the segments they are serving in Zookeeper. Historical nodes also use Zookeeper to monitor for signals to load or drop new segments. Historical nodes download segments from deep storage, respond to queries from broker nodes about these segments, and return results to the broker nodes. + +### Broker Nodes + +Broker nodes receive queries from external clients and forward those queries down to real-time and historical nodes. When the individual nodes return their results, broker nodes merge these results and return them to the caller. Broker nodes use Zookeeper to determine what real-time and historical nodes exist. diff --git a/docs/content/Home.md b/docs/content/Home.md index 63dcd997e3c..6daf59b66ca 100644 --- a/docs/content/Home.md +++ b/docs/content/Home.md @@ -47,7 +47,7 @@ The data store world is vast, confusing and constantly in flux. This page is mea Key Features ------------ -- **Designed for Analytics** - Druid is built for exploratory analytics for OLAP workflows (streamalytics). It supports a variety of filters, aggregators and query types and provides a framework for plugging in new functionality. Users have leveraged Druid’s infrastructure to develop features such as top K queries and histograms. +- **Designed for Analytics** - Druid is built for exploratory analytics for OLAP workflows. It supports a variety of filters, aggregators and query types and provides a framework for plugging in new functionality. Users have leveraged Druid’s infrastructure to develop features such as top K queries and histograms. - **Interactive Queries** - Druid’s low latency data ingestion architecture allows events to be queried milliseconds after they are created. Druid’s query latency is optimized by only reading and scanning exactly what is needed. 
Aggregate and filter on data without sitting around waiting for results. - **Highly Available** - Druid is used to back SaaS implementations that need to be up all the time. Your data is still available and queryable during system updates. Scale up or down without data loss. - **Scalable** - Existing Druid deployments handle billions of events and terabytes of data per day. Druid is designed to be petabyte scale. diff --git a/docs/content/Tutorial:-A-First-Look-at-Druid.md b/docs/content/Tutorial:-A-First-Look-at-Druid.md index f6ea780ee4c..71cb52739d5 100644 --- a/docs/content/Tutorial:-A-First-Look-at-Druid.md +++ b/docs/content/Tutorial:-A-First-Look-at-Druid.md @@ -320,7 +320,7 @@ Feel free to tweak other query parameters to answer other questions you may have Next Steps ---------- -What to know even more information about the Druid Cluster? Check out [Tutorial%3A The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html) +Want to know even more about the Druid Cluster? Check out [The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html) Druid is even more fun if you load your own data into it! To learn how to load your data, see [Loading Your Data](Tutorial%3A-Loading-Your-Data-Part-1.html). diff --git a/docs/content/Tutorial:-The-Druid-Cluster.md b/docs/content/Tutorial:-The-Druid-Cluster.md index 03d848b00a2..31a87cb0bb2 100644 --- a/docs/content/Tutorial:-The-Druid-Cluster.md +++ b/docs/content/Tutorial:-The-Druid-Cluster.md @@ -244,6 +244,5 @@ druid.processing.buffer.sizeBytes=10000000 Next Steps ---------- - -Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data? +If you are interested in how data flows through the different Druid components, check out the Druid [Data Flow](Data-Flow.html). Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data? Check out the next [tutorial](Tutorial%3A-Loading-Your-Data-Part-1.html) section for more info! 
\ No newline at end of file diff --git a/docs/content/toc.textile b/docs/content/toc.textile index 35935305d66..f01271705e0 100644 --- a/docs/content/toc.textile +++ b/docs/content/toc.textile @@ -47,6 +47,7 @@ h2. Querying h2. Architecture * "Design":./Design.html +** "Data Flow":./Data-Flow.html * "Segments":./Segments.html * Node Types ** "Historical":./Historical.html diff --git a/docs/img/druid-dataflow-2x.png b/docs/img/druid-dataflow-2x.png index 746b91e13b4..5ce306098d7 100644 Binary files a/docs/img/druid-dataflow-2x.png and b/docs/img/druid-dataflow-2x.png differ
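
The Data-Flow page added above describes broker nodes fanning a query out to real-time and historical nodes, then merging the partial results for the caller. The toy Python sketch below illustrates that scatter-gather shape only; all names (`query_node`, `broker_query`, the `"edits"` metric) are hypothetical and bear no relation to Druid's actual APIs.

```python
def query_node(segments, metric):
    # Each node (real-time or historical) aggregates the metric
    # across only the segments it is serving.
    return sum(row[metric] for segment in segments for row in segment)

def broker_query(nodes, metric):
    # The broker fans the query out to every node it found via
    # Zookeeper, then merges the partial results into one answer.
    return sum(query_node(segments, metric) for segments in nodes)

# One real-time node serving an in-memory segment of recent events,
# two historical nodes serving segments downloaded from deep storage.
realtime_node = [[{"edits": 3}, {"edits": 1}]]
historical_a = [[{"edits": 10}]]
historical_b = [[{"edits": 5}], [{"edits": 2}]]

total = broker_query([realtime_node, historical_a, historical_b], "edits")
print(total)  # 21
```

The key property the docs describe survives even in this toy: the caller talks only to the broker, and whether a segment lives on a real-time or a historical node is invisible to the query result.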