---
layout: doc_page
title: "Apache Druid (incubating) Firehoses"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Apache Druid (incubating) Firehoses

Firehoses are used in [native batch ingestion tasks](../ingestion/native_tasks.html) and in stream push tasks automatically created by [Tranquility](../ingestion/stream-push.html).

They are pluggable, so the configuration schema varies with the `type` of the Firehose.

| Field | Type | Description | Required |
|-------|------|-------------|----------|
| type | String | Specifies the type of Firehose. Each value has its own configuration schema. Firehoses packaged with Druid are described below. | yes |
## Additional Firehoses

There are several Firehoses readily available in Druid. Some are meant for examples, and others can be used directly in a production environment.

For additional Firehoses, please see our [extensions list](../development/extensions.html).
### LocalFirehose

This Firehose reads data from files on local disk. It is mainly intended for proof-of-concept testing and works with `string` typed parsers.
This Firehose is _splittable_ and can be used by [native parallel index tasks](./native_tasks.html#parallel-index-task).
Since each split represents a file in this Firehose, each worker task of `index_parallel` will read a file.
A sample local Firehose spec is shown below:

```json
{
  "type": "local",
  "filter": "*.csv",
  "baseDir": "/data/directory"
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This should be "local".|yes|
|filter|A wildcard filter for files. See the [WildcardFileFilter documentation](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) for more information.|yes|
|baseDir|Directory to search recursively for files to be ingested.|yes|
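
For context, a Firehose spec is nested inside the `ioConfig` of an ingestion task. The fragment below is a minimal sketch, assuming a native parallel index task (`index_parallel`); the other parts of the task spec (such as `dataSchema` and `tuningConfig`) are omitted, and the directory is the same illustrative path used above.

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "firehose": {
      "type": "local",
      "filter": "*.csv",
      "baseDir": "/data/directory"
    }
  }
}
```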
### HttpFirehose

This Firehose reads data from remote sites via HTTP and works with `string` typed parsers.
This Firehose is _splittable_ and can be used by [native parallel index tasks](./native_tasks.html#parallel-index-task).
Since each split represents a file in this Firehose, each worker task of `index_parallel` will read a file.
A sample HTTP Firehose spec is shown below:

```json
{
  "type": "http",
  "uris": ["http://example.com/uri1", "http://example2.com/uri2"]
}
```

The configurations below can optionally be used if the URIs specified in the spec require a Basic Authentication Header.
Omitting these fields from your spec will result in HTTP requests with no Basic Authentication Header.

|property|description|default|
|--------|-----------|-------|
|httpAuthenticationUsername|Username to use for authentication with specified URIs|None|
|httpAuthenticationPassword|PasswordProvider to use with specified URIs|None|

Example with authentication fields using the DefaultPasswordProvider (this requires the password to be in the ingestion spec):
```json
{
  "type": "http",
  "uris": ["http://example.com/uri1", "http://example2.com/uri2"],
  "httpAuthenticationUsername": "username",
  "httpAuthenticationPassword": "password123"
}
```
You can also use the other existing Druid PasswordProviders. Here is an example using the EnvironmentVariablePasswordProvider:
```json
{
  "type": "http",
  "uris": ["http://example.com/uri1", "http://example2.com/uri2"],
  "httpAuthenticationUsername": "username",
  "httpAuthenticationPassword": {
    "type": "environment",
    "variable": "HTTP_FIREHOSE_PW"
  }
}
```

The configurations below can optionally be used for tuning the Firehose performance.

|property|description|default|
|--------|-----------|-------|
|maxCacheCapacityBytes|Maximum size of the cache space in bytes. 0 means disabling cache. Cached files are not removed until the ingestion task completes.|1073741824|
|maxFetchCapacityBytes|Maximum size of the fetch space in bytes. 0 means disabling prefetch. Prefetched files are removed immediately once they are read.|1073741824|
|prefetchTriggerBytes|Threshold to trigger prefetching HTTP objects.|maxFetchCapacityBytes / 2|
|fetchTimeout|Timeout for fetching an HTTP object.|60000|
|maxFetchRetry|Maximum retries for fetching an HTTP object.|3|
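
As an illustration of how the tuning properties above fit into a spec, the sketch below disables the cache, halves the fetch space, and raises the timeout and retry count. The specific values are assumptions chosen for the example, not recommendations.

```json
{
  "type": "http",
  "uris": ["http://example.com/uri1", "http://example2.com/uri2"],
  "maxCacheCapacityBytes": 0,
  "maxFetchCapacityBytes": 536870912,
  "prefetchTriggerBytes": 268435456,
  "fetchTimeout": 120000,
  "maxFetchRetry": 5
}
```

Here `prefetchTriggerBytes` is set explicitly to half of `maxFetchCapacityBytes`, which matches the documented default relationship.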
### IngestSegmentFirehose

This Firehose reads data from existing Druid segments, potentially using a new schema and changing the name, dimensions, metrics, rollup, etc. of the segment.
This Firehose is _splittable_ and can be used by [native parallel index tasks](./native_tasks.html#parallel-index-task).
This Firehose will accept any type of parser, but will only utilize the list of dimensions and the timestamp specification.
A sample ingest Firehose spec is shown below:

```json
{
  "type": "ingestSegment",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02"
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This should be "ingestSegment".|yes|
|dataSource|A String defining the data source to fetch rows from, very similar to a table in a relational database.|yes|
|interval|A String representing the ISO-8601 interval. This defines the time range to fetch the data over.|yes|
|dimensions|The list of dimensions to select. If left empty, no dimensions are returned. If left null or not defined, all dimensions are returned.|no|
|metrics|The list of metrics to select. If left empty, no metrics are returned. If left null or not defined, all metrics are selected.|no|
|filter|See [Filters](../querying/filters.html).|no|
|maxInputSegmentBytesPerTask|When used with the native parallel index task, the maximum number of bytes of input segments to process in a single task. If a single segment is larger than this number, it will be processed by itself in a single task (input segments are never split across tasks). Defaults to 150MB.|no|
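
The optional properties can be combined to re-ingest only part of a data source. The following is a sketch under stated assumptions: the dimension names (`page`, `language`), metric names (`count`, `added`), and filter value are hypothetical and would need to match columns that actually exist in the segments, and the byte limit is an arbitrary illustrative value.

```json
{
  "type": "ingestSegment",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02",
  "dimensions": ["page", "language"],
  "metrics": ["count", "added"],
  "filter": {
    "type": "selector",
    "dimension": "language",
    "value": "en"
  },
  "maxInputSegmentBytesPerTask": 104857600
}
```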
### SqlFirehose

This Firehose can be used to ingest events residing in an RDBMS. The database connection information is provided as part of the ingestion spec.
For each query, the results are fetched locally and indexed.
If there are multiple queries from which data needs to be indexed, queries are prefetched in the background, up to `maxFetchCapacityBytes` bytes.
This Firehose will accept any type of parser, but will only utilize the list of dimensions and the timestamp specification. See the extension documentation for more detailed ingestion examples.

Requires one of the following extensions:

* [MySQL Metadata Store](../development/extensions-core/mysql.html).
* [PostgreSQL Metadata Store](../development/extensions-core/postgresql.html).

```json
{
  "type": "sql",
  "database": {
    "type": "mysql",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "sqls": ["SELECT * FROM table1", "SELECT * FROM table2"]
}
```
|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be "sql".||Yes|
|database|Specifies the database connection details.||Yes|
|maxCacheCapacityBytes|Maximum size of the cache space in bytes. 0 means disabling cache. Cached files are not removed until the ingestion task completes.|1073741824|No|
|maxFetchCapacityBytes|Maximum size of the fetch space in bytes. 0 means disabling prefetch. Prefetched files are removed immediately once they are read.|1073741824|No|
|prefetchTriggerBytes|Threshold to trigger prefetching SQL result objects.|maxFetchCapacityBytes / 2|No|
|fetchTimeout|Timeout for fetching the result set.|60000|No|
|foldCase|Toggle case folding of database column names. This may be enabled in cases where the database returns case insensitive column names in query results.|false|No|
|sqls|List of SQL queries where each SQL query would retrieve the data to be indexed.||Yes|

#### Database

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|The type of database to query. Valid values are `mysql` and `postgresql`.||Yes|
|connectorConfig|Specifies the database connection properties via `connectURI`, `user` and `password`.||Yes|
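
For comparison with the MySQL sample, the sketch below shows what a PostgreSQL variant might look like with case folding enabled. It assumes the PostgreSQL extension is loaded; the `host:port/schema` placeholder, credentials, and table name are illustrative only.

```json
{
  "type": "sql",
  "database": {
    "type": "postgresql",
    "connectorConfig": {
      "connectURI": "jdbc:postgresql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "foldCase": true,
  "sqls": ["SELECT * FROM table1"]
}
```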
### InlineFirehose

This Firehose reads data inlined in its own spec.
It can be used for demos or for quickly testing out parsing and schema, and works with `string` typed parsers.
A sample inline Firehose spec is shown below:

```json
{
  "type": "inline",
  "data": "0,values,formatted\n1,as,CSV"
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This should be "inline".|yes|
|data|Inlined data to ingest.|yes|
### CombiningFirehose

This Firehose can be used to combine and merge data from a list of different Firehoses.

```json
{
  "type": "combining",
  "delegates": [ { firehose1 }, { firehose2 }, ... ]
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This should be "combining".|yes|
|delegates|List of Firehoses to combine data from.|yes|
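
To make the placeholder delegates concrete, here is a sketch that combines rows from existing segments with rows from new local files. It reuses the sample `ingestSegment` and `local` specs from this page; the data source, interval, and directory are illustrative assumptions.

```json
{
  "type": "combining",
  "delegates": [
    {
      "type": "ingestSegment",
      "dataSource": "wikipedia",
      "interval": "2013-01-01/2013-01-02"
    },
    {
      "type": "local",
      "filter": "*.csv",
      "baseDir": "/data/directory"
    }
  ]
}
```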
### Streaming Firehoses

The EventReceiverFirehose is used in tasks automatically generated by
[Tranquility stream push](../ingestion/stream-push.html). These Firehoses are not suitable for batch ingestion.
#### EventReceiverFirehose

This Firehose can be used to ingest events using an HTTP endpoint, and works with `string` typed parsers.

```json
{
  "type": "receiver",
  "serviceName": "eventReceiverServiceName",
  "bufferSize": 10000
}
```

When using this Firehose, events can be sent by submitting a POST request to the HTTP endpoint:

`http://<peonHost>:<port>/druid/worker/v1/chat/<eventReceiverServiceName>/push-events/`

|property|description|required?|
|--------|-----------|---------|
|type|This should be "receiver".|yes|
|serviceName|Name used to announce the event receiver service endpoint.|yes|
|maxIdleTime|A Firehose is automatically shut down after not receiving any events for this period of time, in milliseconds. If not specified, a Firehose is never shut down due to being idle. Zero and negative values have the same effect.|no|
|bufferSize|Size of buffer used by Firehose to store events.|no, default is 100000|

The shutdown time for an EventReceiverFirehose can be specified by submitting a POST request to:

`http://<peonHost>:<port>/druid/worker/v1/chat/<eventReceiverServiceName>/shutdown?shutoffTime=<shutoffTime>`

If `shutoffTime` is not specified, the Firehose shuts off immediately.
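
As an alternative to an explicit shutdown request, the optional `maxIdleTime` property from the table above lets the Firehose stop on its own once events stop arriving. The sketch below assumes a ten-minute idle limit (600000 milliseconds); the service name and the value are illustrative.

```json
{
  "type": "receiver",
  "serviceName": "eventReceiverServiceName",
  "maxIdleTime": 600000,
  "bufferSize": 100000
}
```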
#### TimedShutoffFirehose

This can be used to start a Firehose that will shut down at a specified time.
An example is shown below:

```json
{
  "type": "timed",
  "shutoffTime": "2015-08-25T01:26:05.119Z",
  "delegate": {
    "type": "receiver",
    "serviceName": "eventReceiverServiceName",
    "bufferSize": 100000
  }
}
```

|property|description|required?|
|--------|-----------|---------|
|type|This should be "timed".|yes|
|shutoffTime|Time at which the Firehose should shut down, in ISO-8601 format.|yes|
|delegate|Firehose to use.|yes|