added firehose and plumber sections, which were being referenced but were missing

Igal Levy 2014-03-27 16:18:56 -07:00
parent 2245764b8f
commit 0f4e5cb125
1 changed files with 21 additions and 0 deletions


@ -1,6 +1,8 @@
---
layout: doc_page
---
Realtime Data Ingestion
=======================
For general Real-time Node information, see [here](Realtime.html).
@ -11,6 +13,7 @@ For writing your own plugins to the real-time node, see [Firehose](Firehose.html
Much of the configuration governing Realtime nodes and the ingestion of data is set in the Realtime spec file, discussed on this page.
<a id="realtime-specfile"></a>
## Realtime "specFile"
@ -81,6 +84,7 @@ This is a JSON Array so you can give more than one realtime stream to a given no
A realtime stream specification has four parts: `schema`, `config`, `firehose`, and `plumber`. Each is described below.
### Schema
This describes the data schema for the output Druid segment. More information about concepts in Druid and querying can be found at [Concepts-and-Terminology](Concepts-and-Terminology.html) and [Querying](Querying.html).
@ -92,6 +96,7 @@ This describes the data schema for the output Druid segment. More information ab
|indexGranularity|String|The granularity of the data inside the segment. For example, a value of "minute" means that data is aggregated at minute granularity: if rows collide on the tuple (minute(timestamp), dimensions), their values are combined using the aggregators instead of being stored as individual rows.|yes|
|shardSpec|Object|This describes the shard that is represented by this server. This must be specified properly in order to have multiple realtime nodes indexing the same data stream in a sharded fashion.|no|
### Config
This provides configuration for the data processing portion of the realtime stream processor.
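
To make the `indexGranularity` description concrete, here is a minimal Python sketch of rollup at minute granularity. It is not Druid code: the `roll_up` function, the event tuples, and the single summed `count` metric are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(events):
    """Sum `count` for events that share (minute(timestamp), dimensions).

    Hypothetical helper: each event is a (timestamp, dimensions, count) tuple.
    """
    totals = defaultdict(int)
    for ts, dims, count in events:
        # Truncate the timestamp to the minute, mirroring "minute" granularity.
        minute = ts.replace(second=0, microsecond=0)
        # Dimensions are sorted so that equal dicts produce equal keys.
        key = (minute, tuple(sorted(dims.items())))
        totals[key] += count
    return dict(totals)


events = [
    (datetime(2014, 3, 27, 16, 18, 5), {"page": "home"}, 1),
    (datetime(2014, 3, 27, 16, 18, 56), {"page": "home"}, 1),
    (datetime(2014, 3, 27, 16, 19, 2), {"page": "home"}, 1),
]
rolled = roll_up(events)
```

The first two events fall in the same minute with the same dimensions, so they collapse into one aggregated row. The third event lands in the next minute and stays separate.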
@ -101,6 +106,22 @@ This provides configuration for the data processing portion of the realtime stre
|intermediatePersistPeriod|ISO8601 Period String|The period that determines the rate at which intermediate persists occur. These persists determine how often commits happen against the incoming realtime stream. If the realtime data loading process is interrupted at time T, it should be restarted to re-read data that arrived at T minus this period.|yes|
|maxRowsInMemory|Number|The number of rows to aggregate before persisting. This number counts post-aggregation rows, so it is not the number of input events but the number of aggregated rows that those events produce. This is used to manage the required JVM heap size.|yes|
### Firehose
Firehoses describe the data stream source. See [Firehose](Firehose.html) for more information on firehose configuration.
### Plumber
The Plumber handles generated segments both while they are being generated and when they are "done". The configuration parameters in the example are:
* `type` specifies which plumber implementation, and therefore which configuration schema, is used. The Plumber configuration in the example is for the often-used RealtimePlumber.
* `windowPeriod` is how late an event may arrive and still be accepted. The example configures a 10-minute window, meaning that any event with a timestamp more than 10 minutes old is discarded and not included in the segment generated by the realtime server.
* `segmentGranularity` specifies the granularity of the segment, that is, the amount of time a segment represents.
* `basePersistDirectory` is the directory where data that must be persisted is stored. The plumber performs the actual intermediate persists, and this setting tells it where to write them.
See [Plumber](Plumber.html) for a fuller discussion of Plumber configuration.
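
The effect of `windowPeriod` can be sketched in a few lines of Python. This is a simplification based only on the description above, not the plumber's actual logic: `within_window` is a hypothetical helper, and it compares the event timestamp against a caller-supplied "now".

```python
from datetime import datetime, timedelta


def within_window(event_time, now, window=timedelta(minutes=10)):
    """Return True if the event is no older than `window` relative to `now`."""
    return now - event_time <= window


now = datetime(2014, 3, 27, 16, 18, 56)
recent = within_window(datetime(2014, 3, 27, 16, 12, 0), now)  # about 7 minutes old
stale = within_window(datetime(2014, 3, 27, 16, 5, 0), now)    # about 14 minutes old
```

Under this sketch, the first event would be kept and the second would be dropped from the generated segment.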
Constraints
-----------