---
layout: doc_page
title: "Real-time Process"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Real-time Process
<div class="note info">
NOTE: Realtime processes are deprecated. Please use the <a href="../development/extensions-core/kafka-ingestion.html">Kafka Indexing Service</a> for stream pull use cases instead.
</div>

For Real-time Process Configuration, see [Realtime Configuration](../configuration/realtime.html).

For Real-time Ingestion, see [Realtime Ingestion](../ingestion/stream-ingestion.html).
Realtime processes provide a realtime index. Data indexed via these processes is immediately available for querying. Realtime processes periodically build segments representing the data they've collected over some span of time and transfer these segments to [Historical](../design/historical.html) processes. They use ZooKeeper to monitor the transfer and the metadata storage to store metadata about the transferred segment. Once a segment has been transferred, the Realtime process forgets it.
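The segment lifecycle described above can be illustrated with a small, self-contained Java sketch. It is not Druid code: the `SegmentBucketSketch` class, its methods, and the example one-hour bucket size and ten-minute grace window for late rows are hypothetical choices made for illustration. Rows are grouped into fixed time buckets, and a bucket becomes eligible for hand-off once the current time passes the bucket's end plus the grace window. Persisting, publishing metadata, and the ZooKeeper-coordinated transfer to Historical processes are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical illustration (not Druid's API) of how a real-time process could
 * group rows into time-bucketed segments and decide when to hand them off.
 */
public class SegmentBucketSketch {
    private final long granularityMillis; // length of each segment's time span
    private final long windowMillis;      // grace period for late-arriving rows
    // Bucket start time -> timestamps of the rows collected for that bucket.
    private final TreeMap<Long, List<Long>> buckets = new TreeMap<>();

    public SegmentBucketSketch(long granularityMillis, long windowMillis) {
        this.granularityMillis = granularityMillis;
        this.windowMillis = windowMillis;
    }

    /** Start of the bucket containing the given timestamp. */
    public long bucketStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, granularityMillis) * granularityMillis;
    }

    /** Adds a row to its bucket; in the real system it would be queryable right away. */
    public void add(long timestampMillis) {
        buckets.computeIfAbsent(bucketStart(timestampMillis), k -> new ArrayList<>())
                .add(timestampMillis);
    }

    /**
     * Removes and returns the start times of all buckets whose end plus the
     * grace window is at or before nowMillis. A real system would persist,
     * publish, and transfer each such segment instead of just dropping it.
     */
    public List<Long> handOff(long nowMillis) {
        List<Long> handedOff = new ArrayList<>();
        while (!buckets.isEmpty()) {
            long start = buckets.firstKey();
            if (start + granularityMillis + windowMillis > nowMillis) {
                break; // buckets are sorted, so later ones are not ready either
            }
            buckets.pollFirstEntry();
            handedOff.add(start);
        }
        return handedOff;
    }

    /** Number of buckets still held in memory. */
    public int openBuckets() {
        return buckets.size();
    }
}
```

For example, with one-hour buckets (3,600,000 ms) and a ten-minute window (600,000 ms), a row at timestamp 1,000 ms lands in the bucket starting at 0, and that bucket is handed off only once `handOff` is called with a time of at least 4,200,000 ms.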
### Running
```
org.apache.druid.cli.Main server realtime
```
## Segment Propagation
The segment propagation diagram for real-time data ingestion is shown below:

![Segment Propagation](../../img/segmentPropagation.png "Segment Propagation")

You can read about the various components shown in this diagram under the Architecture section (see the menu on the right). Note that some of the names in the diagram are now outdated.
### Firehose
See [Firehose](../ingestion/firehose.html).
### Plumber
See [Plumber](../design/plumber.html).

## Extending the code
Realtime integration is intended to be extended in two ways:
1. Connect to data streams from varied systems ([Firehose](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FirehoseFactory.java))
2. Adjust the publishing strategy to match your needs ([Plumber](https://github.com/apache/incubator-druid/blob/master/server/src/main/java/org/apache/druid/segment/realtime/plumber/PlumberSchool.java))

We expect the former to be very common: users of Druid will connect new data sources on a fairly regular basis. Most users will probably never need the latter form of customization. Indeed, we hope that all potential use cases can be packaged as part of Druid proper without requiring proprietary customization.

Given those expectations, adding a firehose is straightforward and fully encapsulated inside the interface. Adding a plumber is more involved and requires understanding how the system works to get right. It is not impossible, but individuals new to Druid are not expected to be able to do it immediately.
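To illustrate why a firehose is well encapsulated, here is a hypothetical Java sketch of the kind of contract a firehose fulfills: the ingestion loop pulls rows while more are available and later commits what it has read. The `SimpleFirehose` interface, the `InMemoryFirehose` class, and the use of plain strings as rows are invented for this example; they are not Druid's `Firehose` or `FirehoseFactory` API, whose real signatures are in the linked source.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a firehose-style source. The nested interface is a
 * simplified stand-in for illustration, NOT Druid's Firehose/FirehoseFactory API.
 */
public class FirehoseSketch {

    /** Simplified firehose contract: pull rows, then commit what was read. */
    public interface SimpleFirehose extends AutoCloseable {
        boolean hasMore();   // true while rows are available
        String nextRow();    // next raw row (e.g. one JSON event); call only if hasMore()
        Runnable commit();   // returns an action that acknowledges rows read so far
        @Override
        void close();        // release the underlying connection, if any
    }

    /** Toy implementation that replays rows from an in-memory list. */
    public static class InMemoryFirehose implements SimpleFirehose {
        private final List<String> rows;
        private int position = 0;   // index of the next row to return
        private int committed = 0;  // number of rows acknowledged so far

        public InMemoryFirehose(List<String> rows) {
            this.rows = new ArrayList<>(rows);
        }

        @Override
        public boolean hasMore() {
            return position < rows.size();
        }

        @Override
        public String nextRow() {
            return rows.get(position++);
        }

        @Override
        public Runnable commit() {
            final int upTo = position;     // snapshot the read offset now
            return () -> committed = upTo; // record it only when the caller runs it
        }

        public int committedCount() {
            return committed;
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory source.
        }
    }
}
```

In this sketch, `commit()` only snapshots the read position; the returned `Runnable` records it when run, so a caller can defer acknowledgement until the rows it has indexed are safely persisted.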
## HTTP Endpoints
The real-time process exposes several HTTP endpoints for interactions.
### GET
* `/status`

Returns the Druid version, loaded extensions, memory used, total memory, and other useful information about the process.
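As a hedged example, the following Java 11+ sketch calls `GET /status` with the JDK's `java.net.http.HttpClient` and returns the raw response body. The `StatusClient` class is hypothetical, and the host and port are placeholders; use the values configured for your Realtime process. The sketch does not parse individual fields, since the exact response layout is not described here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: fetch /status from a running Druid process (Java 11+). */
public class StatusClient {

    /** Builds the /status URL; host and port are supplied by the caller. */
    public static URI statusUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/status");
    }

    /** Issues GET /status and returns the raw response body as a string. */
    public static String fetchStatus(String host, int port)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(statusUri(host, port))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

For example, `StatusClient.fetchStatus("localhost", 8084)` returns the body as a string; the host and port in that call are only placeholders.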