---
layout: doc_page
---
# Druid Firehoses
Firehoses describe the data stream source. They are pluggable, so the configuration schema varies with the `type` of the firehose.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | Specifies the type of firehose. Each value has its own configuration schema. The firehoses packaged with Druid are described below. | yes |

The fields below describe the configuration of the [Kafka firehose example](Realtime-ingestion.html#realtime-specfile). Other firehose types available in Druid are listed under Available Firehoses.
- `consumerProps` is a map of properties for the Kafka consumer. The JSON object is converted into a Properties object and passed along to the Kafka consumer.
- `feed` is the feed that the Kafka consumer should read from.
- `parser` describes how to convert String representations into the `InputRow` representation that Druid uses. It is a reusable piece found in many of the firehoses that are based on text streams. The spec in the example describes a JSON feed (newline-delimited objects) with a timestamp column called "timestamp" in ISO8601 format, and it excludes the dimension "value" when processing. The available parsers are described under Parsing Data below.

Available Firehoses
-------------------
Several firehoses ship with Druid. Some are meant for examples; others can be used directly in a production environment.
#### KafkaFirehose
This firehose acts as a Kafka consumer and ingests data from Kafka.
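As a rough sketch, a Kafka firehose spec that combines the `consumerProps`, `feed`, and `parser` fields described earlier might look like the following. The `type` value and the consumer property names depend on the Druid and Kafka versions in use, so the ones shown here are assumptions, as are the feed name, the consumer group name, the `iso` format string, and the `dimensionExclusions` key. Check them against the linked realtime spec file before relying on them.

```json
{
  "type": "kafka-0.7.2",
  "consumerProps": {
    "zk.connect": "localhost:2181",
    "groupid": "example-consumer-group"
  },
  "feed": "example-feed",
  "parser": {
    "timestampSpec": {
      "column": "timestamp",
      "format": "iso"
    },
    "data": {
      "format": "json"
    },
    "dimensionExclusions": ["value"]
  }
}
```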
#### StaticS3Firehose
This firehose ingests events from a predefined list of S3 objects.
#### TwitterSpritzerFirehose
See [Examples](Examples.html). This firehose connects directly to the Twitter Spritzer data stream.
#### RandomFirehose
See [Examples](Examples.html). This firehose creates a stream of random numbers.
#### RabbitMqFirehose
This firehose ingests events from a defined RabbitMQ queue.
#### LocalFirehose
This firehose reads data from files on local disk.
It is useful for proofs of concept (POCs) that ingest data already on disk.
A sample local firehose spec is shown below:
```json
{
  "type" : "local",
  "filter" : "*.csv",
  "parser" : {
    "timestampSpec": {
      "column": "mytimestamp",
      "format": "yyyy-MM-dd HH:mm:ss"
    },
    "data": {
      "format": "csv",
      "columns": [...],
      "dimensions": [...]
    }
  }
}
```
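For illustration only, here is the same spec with the `columns` and `dimensions` lists filled in. The column names (`language`, `page`, `count`) are hypothetical; they assume CSV rows shaped like `2013-09-13 18:20:39,en,Druid,1`, where the first field matches the `timestampSpec` format.

```json
{
  "type": "local",
  "filter": "*.csv",
  "parser": {
    "timestampSpec": {
      "column": "mytimestamp",
      "format": "yyyy-MM-dd HH:mm:ss"
    },
    "data": {
      "format": "csv",
      "columns": ["mytimestamp", "language", "page", "count"],
      "dimensions": ["language", "page"]
    }
  }
}
```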
|property|description|required?|
|--------|-----------|---------|
|type|This should be "local".|yes|
|filter|A wildcard filter for files. See [here](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes|
|data|A data spec similar to what is used for batch ingestion.|yes|
#### IngestSegmentFirehose
This firehose reads data from existing Druid segments.
It can be used to ingest existing Druid segments with a new schema and to change the name, dimensions, metrics, rollup, etc. of the segment.
A sample ingest firehose spec is shown below:
```json
{
  "type" : "ingestSegment",
  "dataSource" : "wikipedia",
  "interval" : "2013-01-01/2013-01-02"
}
```
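The properties in the table below can narrow what is read from the segments. As a sketch, the following spec selects two dimensions and one metric and applies a selector filter. The dimension and metric names (`page`, `language`, `count`) and the filter value are hypothetical, and the selector filter shape is assumed to follow the [Filters](Filters.html) page.

```json
{
  "type": "ingestSegment",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02",
  "dimensions": ["page", "language"],
  "metrics": ["count"],
  "filter": {
    "type": "selector",
    "dimension": "language",
    "value": "en"
  }
}
```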
|property|description|required?|
|--------|-----------|---------|
|type|This should be "ingestSegment".|yes|
|dataSource|A String defining the data source to fetch rows from. It is very similar to a table in a relational database.|yes|
|interval|A String representing an ISO-8601 interval. This defines the time range to fetch the data over.|yes|
|dimensions|The list of dimensions to select. If left empty, no dimensions are returned. If left null or not defined, all dimensions are returned.|no|
|metrics|The list of metrics to select. If left empty, no metrics are returned. If left null or not defined, all metrics are returned.|no|
|filter|See [Filters](Filters.html).|yes|

Parsing Data
------------
There are several ways to parse data.
#### StringInputRowParser
This parser converts Strings into `InputRow`s.
#### MapInputRowParser
This parser converts flat, key/value pair maps.