Real-time Node

For Real-time Node Configuration, see Realtime Configuration.

For Real-time Ingestion, see Realtime Ingestion.

Realtime nodes provide a realtime index. Data indexed via these nodes is immediately available for querying. Realtime nodes periodically build segments representing the data they've collected over some span of time and transfer these segments to Historical nodes. They use ZooKeeper to monitor the transfer and MySQL to store metadata about the transferred segment. Once transferred, segments are forgotten by the Realtime nodes.
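
The lifecycle described above (index rows, serve them immediately, periodically hand off a finished span of time, then forget it) can be illustrated with a toy model. This is a minimal sketch and not Druid's implementation: the class `ToyRealtimeNode`, its methods, and the fixed-width time buckets are all hypothetical, and the hand-off simply returns the rows rather than involving Historical nodes, ZooKeeper, or MySQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a realtime index. Hypothetical names; not Druid code. */
class ToyRealtimeNode {
    // Span of time covered by one segment, in milliseconds (assumed fixed width).
    private final long segmentMillis;
    // Open buckets, keyed by bucket start time, holding the rows indexed so far.
    private final TreeMap<Long, List<String>> open = new TreeMap<>();

    ToyRealtimeNode(long segmentMillis) {
        this.segmentMillis = segmentMillis;
    }

    /** Index a row; it counts as queryable as soon as this method returns. */
    void index(long timestamp, String row) {
        long bucketStart = Math.floorDiv(timestamp, segmentMillis) * segmentMillis;
        open.computeIfAbsent(bucketStart, k -> new ArrayList<>()).add(row);
    }

    /** Number of rows this node can still answer queries for. */
    int queryableRows() {
        int count = 0;
        for (List<String> rows : open.values()) {
            count += rows.size();
        }
        return count;
    }

    /**
     * Hand off every bucket whose time span ended at or before `now`,
     * then forget it locally. Returns the handed-off buckets.
     */
    Map<Long, List<String>> handOff(long now) {
        // A bucket has ended when bucketStart + segmentMillis <= now.
        Map<Long, List<String>> closed = open.headMap(now - segmentMillis, true);
        Map<Long, List<String>> handedOff = new TreeMap<>(closed);
        closed.clear(); // clearing the view removes the entries from `open`
        return handedOff;
    }
}

class ToyRealtimeNodeDemo {
    public static void main(String[] args) {
        ToyRealtimeNode node = new ToyRealtimeNode(1000L);
        node.index(100L, "a");
        node.index(900L, "b");
        node.index(1500L, "c");
        System.out.println("queryable before hand-off: " + node.queryableRows());
        System.out.println("handed off: " + node.handOff(1000L));
        System.out.println("queryable after hand-off: " + node.queryableRows());
    }
}
```

In this sketch, rows "a" and "b" fall into the bucket starting at 0 and row "c" into the bucket starting at 1000, so a hand-off at time 1000 moves only the first bucket and leaves "c" queryable on the node.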

Running

io.druid.cli.Main server realtime

Segment Propagation

The segment propagation diagram for real-time data ingestion can be seen below:

[Figure: Segment Propagation]

You can read about the various components shown in this diagram under the Architecture section (see the menu on the left).

Firehose

See Firehose.

Plumber

See Plumber.

Extending the code

Realtime integration is intended to be extended in two ways:

  1. Connect to data streams from varied systems (Firehose)
  2. Adjust the publishing strategy to match your needs (Plumber)

We expect the former to be very common, something that users of Druid will do on a fairly regular basis. Most users will probably never have to deal with the latter form of customization. Indeed, we hope that all potential use cases can be packaged up as part of Druid proper without requiring proprietary customization.

Given those expectations, adding a firehose is straightforward and completely encapsulated inside the interface. Adding a plumber is more involved and requires an understanding of how the system works to get right. It's not impossible, but individuals new to Druid are not expected to be able to do it immediately.
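
To give a feel for why a firehose can stay encapsulated behind one interface, here is a minimal sketch of a firehose-style reader over an in-memory queue. The interface `SimpleFirehose` is a simplified, hypothetical stand-in written for this example; its method names and the commit behavior are assumptions, and the actual contract is the one documented on the Firehose page. Rows are plain strings here purely for illustration.

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.NoSuchElementException;
import java.util.Queue;

/** Hypothetical, simplified firehose contract used only for this sketch. */
interface SimpleFirehose extends Closeable {
    boolean hasMore();   // is another row available?
    String nextRow();    // return the next row from the stream
    Runnable commit();   // returns a callback that marks rows read so far as durable
}

/** Example firehose that reads rows from an in-memory queue. */
class QueueFirehose implements SimpleFirehose {
    private final Queue<String> pending;
    private int readCount = 0;       // rows handed to the caller so far
    private int committedCount = 0;  // rows the caller has confirmed as persisted
    private boolean closed = false;

    QueueFirehose(Collection<String> rows) {
        this.pending = new ArrayDeque<>(rows);
    }

    @Override
    public boolean hasMore() {
        return !closed && !pending.isEmpty();
    }

    @Override
    public String nextRow() {
        if (!hasMore()) {
            throw new NoSuchElementException("no more rows");
        }
        readCount++;
        return pending.remove();
    }

    @Override
    public Runnable commit() {
        // Capture the read position now; the caller runs the callback later,
        // once the rows read up to this point have been persisted.
        final int snapshot = readCount;
        return () -> { committedCount = Math.max(committedCount, snapshot); };
    }

    int committed() {
        return committedCount;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class QueueFirehoseDemo {
    public static void main(String[] args) {
        QueueFirehose firehose = new QueueFirehose(Arrays.asList("r1", "r2", "r3"));
        while (firehose.hasMore()) {
            System.out.println("read " + firehose.nextRow());
        }
        firehose.commit().run();
        System.out.println("committed rows: " + firehose.committed());
        firehose.close();
    }
}
```

Connecting a different data stream would, under these assumptions, mean writing another class like `QueueFirehose` without touching the code that consumes it. A plumber has no equivalent sketch here because it depends on how segments are built and published, which this page does not describe.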