---
layout: doc_page
---

## Stream Push

Druid can connect to any streaming data source through
[Tranquility](https://github.com/druid-io/tranquility/blob/master/README.md), a package for pushing
streams to Druid in real-time. Druid does not come bundled with Tranquility, and you will have to download the distribution.

<div class="note info">
If you've never loaded streaming data into Druid, we recommend trying out the
<a href="../tutorials/tutorial-streams.html">stream loading tutorial</a> first and then coming back to this page.
</div>

Note that with all streaming ingestion options, you must ensure that incoming data is recent
enough (within a [configurable windowPeriod](#segmentgranularity-and-windowperiod) of the current
time). Older messages will not be processed in real-time. Historical data is best processed with
[batch ingestion](../ingestion/batch-ingestion.html).

### Server

Druid can use [Tranquility Server](https://github.com/druid-io/tranquility/blob/master/docs/server.md), which
lets you send data to Druid without developing a JVM app. You can run Tranquility Server colocated with Druid middleManagers
and historical processes.

Tranquility Server is started by issuing:

```bash
bin/tranquility server -configFile <path_to_config_file>/server.json
```

To customize Tranquility Server:

- In `server.json`, customize the `properties` and `dataSources`.
- If you have servers already running Tranquility, stop them (CTRL-C) and start
  them up again.

For tips on customizing `server.json`, see the
*[Loading your own streams](../tutorials/tutorial-streams.html)* tutorial and the
[Tranquility Server documentation](https://github.com/druid-io/tranquility/blob/master/docs/server.md).
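
Once Tranquility Server is running, applications send events to it over HTTP. The following Python sketch builds such a request without sending it. It assumes the server listens on `localhost:8200` and accepts newline-delimited JSON at `/v1/post/<dataSource>`, as described in the Tranquility Server documentation; the dataSource name `pageviews`, the event fields, and the helper name `build_post_request` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: adjust host, port, and path to match your server.json.
TRANQUILITY_URL = "http://localhost:8200/v1/post/{data_source}"


def build_post_request(data_source, events):
    """Build (but do not send) an HTTP POST carrying newline-delimited JSON events."""
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return urllib.request.Request(
        TRANQUILITY_URL.format(data_source=data_source),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event; its timestamp must fall within the windowPeriod when sent.
request = build_post_request(
    "pageviews",
    [{"time": "2016-02-04T14:53:09Z", "url": "/foo/bar", "user": "alice"}],
)
print(request.get_method(), request.full_url)
# To actually send the events: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before pointing the code at a live server.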

### Kafka

[Tranquility Kafka](https://github.com/druid-io/tranquility/blob/master/docs/kafka.md)
lets you load data from Kafka into Druid without writing any code. You only need a configuration
file.

Tranquility Kafka is started by issuing:

```bash
bin/tranquility kafka -configFile <path_to_config_file>/kafka.json
```

To customize Tranquility Kafka in the single-machine quickstart configuration:

- In `kafka.json`, customize the `properties` and `dataSources`.
- If you have Tranquility already running, stop it (CTRL-C) and start it up again.

For tips on customizing `kafka.json`, see the
[Tranquility Kafka documentation](https://github.com/druid-io/tranquility/blob/master/docs/kafka.md).

### JVM apps and stream processors

Tranquility can also be embedded in JVM-based applications as a library. You can do this directly
in your own program using the
[Core API](https://github.com/druid-io/tranquility/blob/master/docs/core.md), or you can use
the connectors bundled in Tranquility for popular JVM-based stream processors such as
[Storm](https://github.com/druid-io/tranquility/blob/master/docs/storm.md),
[Samza](https://github.com/druid-io/tranquility/blob/master/docs/samza.md),
[Spark Streaming](https://github.com/druid-io/tranquility/blob/master/docs/spark.md), and
[Flink](https://github.com/druid-io/tranquility/blob/master/docs/flink.md).

## Concepts

### Task creation

Tranquility automates creation of Druid realtime indexing tasks, handling partitioning, replication,
service discovery, and schema rollover for you, seamlessly and without downtime. You never have to
write code to deal with individual tasks directly. However, it can be helpful to understand how
Tranquility creates tasks.

Tranquility periodically spawns relatively short-lived tasks, and each one handles a small number of
[Druid segments](../design/segments.html). Tranquility coordinates all task
creation through ZooKeeper. You can start up as many Tranquility instances as you like with the same
configuration, even on different machines, and they will send to the same set of tasks.

See the [Tranquility overview](https://github.com/druid-io/tranquility/blob/master/docs/overview.md)
for more details about how Tranquility manages tasks.

### segmentGranularity and windowPeriod

The segmentGranularity is the time period covered by the segments produced by each task. For
example, a segmentGranularity of "hour" will spawn tasks that create segments covering one hour
each.

The windowPeriod is the slack time permitted for events. For example, a windowPeriod of ten minutes
(the default) means that any events with a timestamp more than ten minutes in the past, or more
than ten minutes in the future, will be dropped.
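
The windowPeriod rule can be sketched as a simple check on an event's timestamp. This is a simplified illustration, not Druid's actual acceptance logic: the function name `within_window` is hypothetical, and treating an event exactly ten minutes old as accepted is an assumption.

```python
from datetime import datetime, timedelta, timezone


def within_window(event_time, now, window_period=timedelta(minutes=10)):
    """Return True if event_time is within window_period of now, in either direction."""
    return abs(now - event_time) <= window_period


now = datetime(2016, 2, 4, 14, 53, 9, tzinfo=timezone.utc)
print(within_window(now - timedelta(minutes=5), now))   # True: recent enough
print(within_window(now - timedelta(minutes=11), now))  # False: too old
print(within_window(now + timedelta(minutes=11), now))  # False: too far in the future
```

Events for which this check fails are the ones that need to be loaded through batch ingestion instead.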

These are important configurations because they influence how long tasks stay alive and how
long data stays in the realtime system before being handed off to the historical nodes. For example,
if your configuration has segmentGranularity "hour" and windowPeriod ten minutes, tasks will stay
around listening for events for an hour and ten minutes. For this reason, to prevent excessive
buildup of tasks, it is recommended that your windowPeriod be less than your segmentGranularity.
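
The arithmetic above is small enough to write down directly. The sketch below computes how long a task listens for events and checks the recommendation that windowPeriod be shorter than segmentGranularity. Both function names are hypothetical, and the calculation covers only the listening time described here, not any additional time a task may spend handing off its segments.

```python
from datetime import timedelta


def task_listen_duration(segment_granularity, window_period):
    """Time a task spends listening for events: one segment period plus the window."""
    return segment_granularity + window_period


def follows_recommendation(segment_granularity, window_period):
    """True if windowPeriod is shorter than segmentGranularity, as recommended."""
    return window_period < segment_granularity


hour = timedelta(hours=1)
ten_minutes = timedelta(minutes=10)

print(task_listen_duration(hour, ten_minutes))    # 1:10:00
print(follows_recommendation(hour, ten_minutes))  # True
print(follows_recommendation(hour, timedelta(hours=2)))  # False
```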

### Append only

Druid streaming ingestion is *append-only*, meaning you cannot use streaming ingestion to update or
delete individual records after they are inserted. If you need to update or delete individual
records, you need to use a batch reindexing process. See the *[batch ingest](batch-ingestion.html)*
page for more details.

Druid does support efficient deletion of entire time ranges without resorting to batch reindexing.
This can be done automatically by setting up retention policies.

### Guarantees

Tranquility operates under a best-effort design. It tries reasonably hard to preserve your data, by allowing you to set
up replicas and by retrying failed pushes for a period of time, but it does not guarantee that your events will be
processed exactly once. In some conditions, it can drop or duplicate events:

- Events with timestamps outside your configured windowPeriod will be dropped.
- If you suffer more Druid Middle Manager failures than your configured replica count, some
  partially indexed data may be lost.
- If there is a persistent issue that prevents communication with the Druid indexing service, and
  retry policies are exhausted during that period, or the period lasts longer than your windowPeriod,
  some events will be dropped.
- If there is an issue that prevents Tranquility from receiving an acknowledgement from the indexing
  service, it will retry the batch, which can lead to duplicated events.
- If you are using Tranquility inside Storm or Samza, various parts of both architectures have an
  at-least-once design and can lead to duplicated events.
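
To see why a lost acknowledgement leads to duplicates, consider the following toy simulation. It is not Tranquility code: `FlakyIndexingService` and `send_with_retries` are hypothetical names, and the model assumes the service stores a batch before the acknowledgement is lost.

```python
class FlakyIndexingService:
    """Hypothetical stand-in for an indexing task that stores data but may lose the ack."""

    def __init__(self, lose_acks_for_attempts):
        self.stored = []
        self.attempts = 0
        self.lose_acks_for_attempts = lose_acks_for_attempts

    def push(self, batch):
        self.attempts += 1
        self.stored.extend(batch)  # the batch is indexed...
        if self.attempts <= self.lose_acks_for_attempts:
            # ...but the sender never hears back
            raise TimeoutError("acknowledgement lost")


def send_with_retries(service, batch, max_attempts=3):
    """Retry the whole batch until it is acknowledged or attempts run out."""
    for _ in range(max_attempts):
        try:
            service.push(batch)
            return True
        except TimeoutError:
            continue
    return False


service = FlakyIndexingService(lose_acks_for_attempts=1)
acknowledged = send_with_retries(service, ["event-1", "event-2"])
print(acknowledged, service.stored)
```

The sender cannot tell a lost acknowledgement from a lost batch, so it resends, and the service ends up holding each event twice. Giving up instead of retrying would trade duplicates for possible data loss.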

Under normal operation, these risks are minimal. But if you need absolute 100% fidelity for
historical data, we recommend a [hybrid batch/streaming](../tutorials/ingestion.html#hybrid-batch-streaming)
architecture.

### Deployment Notes

Stream ingestion may generate a large number of small segments because it's difficult to optimize the segment size at
ingestion time. The number of segments will increase over time, and this might cause query performance issues.

Details on how to optimize the segment size can be found in [Segment size optimization](../../operations/segment-optimization.html).

## Documentation

Tranquility documentation can be found [here](https://github.com/druid-io/tranquility/blob/master/README.md).

## Configuration

Tranquility configuration can be found [here](https://github.com/druid-io/tranquility/blob/master/docs/configuration.md).

Tranquility's tuningConfig can be found [here](http://static.druid.io/tranquility/api/latest/#com.metamx.tranquility.druid.DruidTuning).