


---
id: standalone-realtime
layout: doc_page
title: "Realtime Process"
---

Older versions of Apache Druid supported a standalone Realtime process that queried and indexed data using the "stream pull" mode of real-time ingestion. These processes periodically built segments from the data they had collected over a span of time and then handed those segments off to Historical servers.

This process could be invoked with:

```
org.apache.druid.cli.Main server realtime
```

This model of stream pull ingestion was deprecated for both operational and architectural reasons, and was removed completely in Druid 0.16.0. Operationally, Realtime nodes were difficult to configure, deploy, and scale because each node required a unique configuration. Architecturally, the stream pull ingestion system used by Realtime nodes had design limitations that made exactly-once ingestion impossible.

The extensions `druid-kafka-eight`, `druid-kafka-eight-simpleConsumer`, `druid-rabbitmq`, and `druid-rocketmq` were removed at the same time, because they were built to run on Realtime nodes.

Please consider using the Kafka Indexing Service or Kinesis Indexing Service for stream pull ingestion instead.
