mirror of https://github.com/apache/druid.git
287 lines
9.4 KiB
Markdown
287 lines
9.4 KiB
Markdown
---
|
|
layout: doc_page
|
|
---
|
|
|
|
# Druid Firehoses
|
|
Firehoses describe the data stream source. They are pluggable and thus the configuration schema can and will vary based on the `type` of the firehose.
|
|
|
|
| Field | Type | Description | Required |
|
|
|-------|------|-------------|----------|
|
|
| type | String | Specifies the type of firehose. Each value will have its own configuration schema, firehoses packaged with Druid are described below. | yes |
|
|
|
|
We describe the configuration of the [Kafka firehose example](realtime-ingestion.html#realtime-specfile), but there are other types available in Druid (see below).
|
|
|
|
- `consumerProps` is a map of properties for the Kafka consumer. The JSON object is converted into a Properties object and passed along to the Kafka consumer.
|
|
- `feed` is the feed that the Kafka consumer should read from.
|
|
|
|
Available Firehoses
|
|
-------------------
|
|
|
|
There are several firehoses readily available in Druid, some are meant for examples, others can be used directly in a production environment.
|
|
|
|
#### KafkaEightFirehose
|
|
|
|
Please note that the [druid-kafka-eight module](../operations/including-extensions.html) is required for this firehose. This firehose acts as a Kafka 0.8.x consumer and ingests data from Kafka.
|
|
|
|
Sample spec:
|
|
|
|
```json
|
|
"firehose": {
|
|
"type": "kafka-0.8",
|
|
"consumerProps": {
|
|
"zookeeper.connect": "localhost:2181",
|
|
"zookeeper.connection.timeout.ms" : "15000",
|
|
"zookeeper.session.timeout.ms" : "15000",
|
|
"zookeeper.sync.time.ms" : "5000",
|
|
"group.id": "druid-example",
|
|
"fetch.message.max.bytes" : "1048586",
|
|
"auto.offset.reset": "largest",
|
|
"auto.commit.enable": "false"
|
|
},
|
|
"feed": "wikipedia"
|
|
}
|
|
```
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|type|This should be "kafka-0.8"|yes|
|
|
|consumerProps|The full list of consumer configs can be [here](https://kafka.apache.org/08/configuration.html).|yes|
|
|
|feed|Kafka maintains feeds of messages in categories called topics. This is the topic name.|yes|
|
|
|
|
#### StaticS3Firehose
|
|
|
|
This firehose ingests events from a predefined list of S3 objects.
|
|
|
|
Sample spec:
|
|
|
|
```json
|
|
"firehose" : {
|
|
"type" : "static-s3",
|
|
"uris": ["s3://foo/bar/file.gz", "s3://bar/foo/file2.gz"]
|
|
}
|
|
```
|
|
|
|
|property|description|default|required?|
|
|
|--------|-----------|-------|---------|
|
|
|type|This should be "static-s3"|N/A|yes|
|
|
|uris|JSON array of URIs where s3 files to be ingested are located.|N/A|yes|
|
|
|
|
#### StaticAzureBlobStoreFirehose
|
|
|
|
This firehose ingests events, similar to the StaticS3Firehose, but from an Azure Blob Store.
|
|
|
|
Data is newline delimited, with one JSON object per line and parsed as per the `InputRowParser` configuration.
|
|
|
|
The storage account is shared with the one used for Azure deep storage functionality, but blobs can be in a different container.
|
|
|
|
As with the S3 blobstore, it is assumed to be gzipped if the extension ends in .gz
|
|
|
|
Sample spec:
|
|
|
|
```json
|
|
"firehose" : {
|
|
"type" : "static-azure-blobstore",
|
|
"blobs": [
|
|
{
|
|
"container": "container",
|
|
"path": "/path/to/your/file.json"
|
|
},
|
|
{
|
|
"container": "anothercontainer",
|
|
"path": "/another/path.json"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
|property|description|default|required?|
|
|
|--------|-----------|-------|---------|
|
|
|type|This should be "static-azure-blobstore".|N/A|yes|
|
|
|blobs|JSON array of [Azure blobs](https://msdn.microsoft.com/en-us/library/azure/ee691964.aspx).|N/A|yes|
|
|
|
|
Azure Blobs:
|
|
|
|
|property|description|default|required?|
|
|
|--------|-----------|-------|---------|
|
|
|container|Name of the azure [container](https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/#create-a-container)|N/A|yes|
|
|
|path|The path where data is located.|N/A|yes|
|
|
|
|
#### TwitterSpritzerFirehose
|
|
|
|
This firehose connects directly to the twitter spritzer data stream.
|
|
|
|
Sample spec:
|
|
|
|
```json
|
|
"firehose" : {
|
|
"type" : "twitzer",
|
|
"maxEventCount": -1,
|
|
"maxRunMinutes": 0
|
|
}
|
|
```
|
|
|
|
|property|description|default|required?|
|
|
|--------|-----------|-------|---------|
|
|
|type|This should be "twitzer"|N/A|yes|
|
|
|maxEventCount|max events to receive, -1 is infinite, 0 means nothing is delivered; use this to prevent infinite space consumption or to prevent getting throttled at an inconvenient time.|N/A|yes|
|
|
|maxRunMinutes|maximum number of minutes to fetch Twitter events. Use this to prevent getting throttled at an inconvenient time. If zero or less, no time limit for run.|N/A|yes|
|
|
|
|
#### RabbitMQFirehose
|
|
|
|
This firehose ingests events from a define rabbit-mq queue.
|
|
|
|
**Note:** Add **amqp-client-3.2.1.jar** to lib directory of druid to use this firehose.
|
|
|
|
A sample spec for rabbitmq firehose:
|
|
|
|
```json
|
|
"firehose" : {
|
|
"type" : "rabbitmq",
|
|
"connection" : {
|
|
"host": "localhost",
|
|
"port": "5672",
|
|
"username": "test-dude",
|
|
"password": "test-word",
|
|
"virtualHost": "test-vhost",
|
|
"uri": "amqp://mqserver:1234/vhost"
|
|
},
|
|
"config" : {
|
|
"exchange": "test-exchange",
|
|
"queue" : "druidtest",
|
|
"routingKey": "#",
|
|
"durable": "true",
|
|
"exclusive": "false",
|
|
"autoDelete": "false",
|
|
"maxRetries": "10",
|
|
"retryIntervalSeconds": "1",
|
|
"maxDurationSeconds": "300"
|
|
}
|
|
}
|
|
```
|
|
|
|
|property|description|default|required?|
|
|
|--------|-----------|-------|---------|
|
|
|type|This should be "rabbitmq"|N/A|yes|
|
|
|host|The hostname of the RabbitMQ broker to connect to|localhost|no|
|
|
|port|The port number to connect to on the RabbitMQ broker|5672|no|
|
|
|username|The username to use to connect to RabbitMQ|guest|no|
|
|
|password|The password to use to connect to RabbitMQ|guest|no|
|
|
|virtualHost|The virtual host to connect to|/|no|
|
|
|uri|The URI string to use to connect to RabbitMQ| |no|
|
|
|exchange|The exchange to connect to| |yes|
|
|
|queue|The queue to connect to or create| |yes|
|
|
|routingKey|The routing key to use to bind the queue to the exchange| |yes|
|
|
|durable|Whether the queue should be durable|false|no|
|
|
|exclusive|Whether the queue should be exclusive|false|no|
|
|
|autoDelete|Whether the queue should auto-delete on disconnect|false|no|
|
|
|maxRetries|The max number of reconnection retry attempts| |yes|
|
|
|retryIntervalSeconds|The reconnection interval| |yes|
|
|
|maxDurationSeconds|The max duration of trying to reconnect| |yes|
|
|
|
|
#### LocalFirehose
|
|
|
|
This Firehose can be used to read the data from files on local disk.
|
|
It can be used for POCs to ingest data on disk.
|
|
A sample local firehose spec is shown below:
|
|
|
|
```json
|
|
{
|
|
"type" : "local",
|
|
"filter" : "*.csv",
|
|
"baseDir" : "/data/directory"
|
|
}
|
|
```
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|type|This should be "local".|yes|
|
|
|filter|A wildcard filter for files. See [here](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes|
|
|
|baseDir|directory to search recursively for files to be ingested. |yes|
|
|
|
|
#### IngestSegmentFirehose
|
|
|
|
This Firehose can be used to read the data from existing druid segments.
|
|
It can be used ingest existing druid segments using a new schema and change the name, dimensions, metrics, rollup, etc. of the segment.
|
|
A sample ingest firehose spec is shown below -
|
|
|
|
```json
|
|
{
|
|
"type" : "ingestSegment",
|
|
"dataSource" : "wikipedia",
|
|
"interval" : "2013-01-01/2013-01-02"
|
|
}
|
|
```
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|type|This should be "ingestSegment".|yes|
|
|
|dataSource|A String defining the data source to fetch rows from, very similar to a table in a relational database|yes|
|
|
|interval|A String representing ISO-8601 Interval. This defines the time range to fetch the data over.|yes|
|
|
|dimensions|The list of dimensions to select. If left empty, no dimensions are returned. If left null or not defined, all dimensions are returned. |no|
|
|
|metrics|The list of metrics to select. If left empty, no metrics are returned. If left null or not defined, all metrics are selected.|no|
|
|
|filter| See [Filters](../querying/filters.html)|yes|
|
|
|
|
#### CombiningFirehose
|
|
|
|
This firehose can be used to combine and merge data from a list of different firehoses.
|
|
This can be used to merge data from more than one firehose.
|
|
|
|
```json
|
|
{
|
|
"type" : "combining",
|
|
"delegates" : [ { firehose1 }, { firehose2 }, ..... ]
|
|
}
|
|
```
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|type|This should be "combining"|yes|
|
|
|delegates|list of firehoses to combine data from|yes|
|
|
|
|
|
|
#### EventReceiverFirehose
|
|
|
|
EventReceiverFirehoseFactory can be used to ingest events using an http endpoint.
|
|
|
|
```json
|
|
{
|
|
"type": "receiver",
|
|
"serviceName": "eventReceiverServiceName",
|
|
"bufferSize": 10000
|
|
}
|
|
```
|
|
When using this firehose, events can be sent by submitting a POST request to the http endpoint:
|
|
|
|
`http://<peonHost>:<port>/druid/worker/v1/chat/<eventReceiverServiceName>/push-events/`
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|type|This should be "receiver"|yes|
|
|
|serviceName|name used to announce the event receiver service endpoint|yes|
|
|
|bufferSize| size of buffer used by firehose to store events|no default(100000)|
|
|
|
|
#### TimedShutoffFirehose
|
|
|
|
This can be used to start a firehose that will shut down at a specified time.
|
|
An example is shown below:
|
|
|
|
```json
|
|
{
|
|
"type" : "timed",
|
|
"shutoffTime": "2015-08-25T01:26:05.119Z",
|
|
"delegate": {
|
|
"type": "receiver",
|
|
"serviceName": "eventReceiverServiceName",
|
|
"bufferSize": 100000
|
|
}
|
|
}
|
|
```
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|type|This should be "timed"|yes|
|
|
|shutoffTime|time at which the firehose should shut down, in ISO8601 format|yes|
|
|
|delegate|firehose to use|yes|
|
|
=======
|
|
|