~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
Druid has a multi-process, distributed architecture that is designed to be cloud-friendly and easy to operate. Each
Druid process type can be configured and scaled independently, giving you maximum flexibility over your cluster. This
design also provides enhanced fault tolerance: an outage of one component will not immediately affect other components.
## Processes and Servers
Druid has several process types, briefly described below:
* [**Coordinator**](../design/coordinator.md) processes manage data availability on the cluster.
* [**Overlord**](../design/overlord.md) processes control the assignment of data ingestion workloads.
* [**Broker**](../design/broker.md) processes handle queries from external clients.
* [**Router**](../design/router.md) processes are optional processes that can route requests to Brokers, Coordinators, and Overlords.
* [**Historical**](../design/historical.md) processes store queryable data.
* [**MiddleManager**](../design/middlemanager.md) processes are responsible for ingesting data.
Druid processes can be deployed any way you like, but for ease of deployment we suggest organizing them into three server types: Master, Query, and Data.
* **Master**: Runs Coordinator and Overlord processes, manages data availability and ingestion.
* **Query**: Runs Broker and optional Router processes, handles queries from external clients.
* **Data**: Runs Historical and MiddleManager processes, executes ingestion workloads and stores all queryable data.
You may be wondering what the "version number" described in the previous section is for. Or, you might not be, in which
case good for you and you can skip this section!
It's there to support batch-mode overwriting. In Druid, if all you ever do is append data, then there will be just a
single version for each time chunk. But when you overwrite data, what happens behind the scenes is that a new set of
segments is created with the same datasource, same time interval, but a higher version number. This is a signal to the
rest of the Druid system that the older version should be removed from the cluster, and the new version should replace
it.
The switch appears to happen instantaneously to a user, because Druid handles this by first loading the new data (but
not allowing it to be queried), and then, as soon as the new data is all loaded, switching all new queries to use those
new segments. Then it drops the old segments a few minutes later.
### Segment lifecycle
Each segment has a lifecycle that involves the following three major areas:
1.**Metadata store:** Segment metadata (a small JSON payload generally no more than a few KB) is stored in the
[metadata store](../dependencies/metadata-storage.md) once a segment is done being constructed. The act of inserting
a record for a segment into the metadata store is called _publishing_. These metadata records have a boolean flag
named `used`, which controls whether the segment is intended to be queryable or not. Segments created by realtime tasks will be
available before they are published, since they are only published when the segment is complete and will not accept
any additional rows of data.
2.**Deep storage:** Segment data files are pushed to deep storage once a segment is done being constructed. This
happens immediately before publishing metadata to the metadata store.
3.**Availability for querying:** Segments are available for querying on some Druid data server, like a realtime task
or a Historical process.
You can inspect the state of currently active segments using the Druid SQL
[`sys.segments` table](../querying/sql.md#segments-table). It includes the following flags:
-`is_published`: True if segment metadata has been published to the metadata stored and `used` is true.
-`is_available`: True if the segment is currently available for querying, either on a realtime task or Historical
process.
-`is_realtime`: True if the segment is _only_ available on realtime tasks. For datasources that use realtime ingestion,
this will generally start off `true` and then become `false` as the segment is published and handed off.
-`is_overshadowed`: True if the segment is published (with `used` set to true) and is fully overshadowed by some other
published segments. Generally this is a transient state, and segments in this state will soon have their `used` flag
automatically set to false.
## Query processing
Queries first enter the [Broker](../design/broker.md), where the Broker will identify which segments have data that may pertain to that query.
The list of segments is always pruned by time, and may also be pruned by other attributes depending on how your
datasource is partitioned. The Broker will then identify which [Historicals](../design/historical.md) and
[MiddleManagers](../design/middlemanager.md) are serving those segments and send a rewritten subquery to each of those processes. The Historical/MiddleManager processes will take in the
queries, process them and return results. The Broker receives results and merges them together to get the final answer,
which it returns to the original caller.
Broker pruning is an important way that Druid limits the amount of data that must be scanned for each query, but it is
not the only way. For filters at a more granular level than what the Broker can use for pruning, indexing structures
inside each segment allow Druid to figure out which (if any) rows match the filter set before looking at any row of
data. Once Druid knows which rows match a particular query, it only accesses the specific columns it needs for that
query. Within those columns, Druid can skip from row to row, avoiding reading data that doesn't match the query filter.
So Druid uses three different techniques to maximize query performance:
- Pruning which segments are accessed for each query.
- Within each segment, using indexes to identify which rows must be accessed.
- Within each segment, only reading the specific rows and columns that are relevant to a particular query.