mirror of https://github.com/apache/druid.git
Firehose migration doc (#12981)
* Firehose migration doc * Update migrate-from-firehose-ingestion.md * Updated with review comments and suggestions * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md
This commit is contained in:
parent
133054bf27
commit
68018a808f
|
@ -0,0 +1,209 @@
|
|||
---
|
||||
id: migrate-from-firehose
|
||||
title: "Migrate from firehose to input source ingestion"
|
||||
sidebar_label: "Migrate from firehose"
|
||||
---
|
||||
|
||||
<!--
|
||||
~ Licensed to the Apache Software Foundation (ASF) under one
|
||||
~ or more contributor license agreements. See the NOTICE file
|
||||
~ distributed with this work for additional information
|
||||
~ regarding copyright ownership. The ASF licenses this file
|
||||
~ to you under the Apache License, Version 2.0 (the
|
||||
~ "License"); you may not use this file except in compliance
|
||||
~ with the License. You may obtain a copy of the License at
|
||||
~
|
||||
~ http://www.apache.org/licenses/LICENSE-2.0
|
||||
~
|
||||
~ Unless required by applicable law or agreed to in writing,
|
||||
~ software distributed under the License is distributed on an
|
||||
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
~ KIND, either express or implied. See the License for the
|
||||
~ specific language governing permissions and limitations
|
||||
~ under the License.
|
||||
-->
|
||||
|
||||
Apache deprecated support for Druid firehoses in version 0.17. Support for firehose ingestion was removed in version 24.0.
|
||||
|
||||
If you're using a firehose for batch ingestion, we strongly recommend that you follow the instructions on this page to transition to using native batch ingestion input sources as soon as possible.
|
||||
|
||||
Firehose ingestion doesn't work with newer Druid versions, so you must be using an ingestion spec with a defined input source before you upgrade.
|
||||
|
||||
## Migrate from firehose ingestion to an input source
|
||||
|
||||
To migrate from firehose ingestion, you can use the Druid console to update your ingestion spec, or you can update it manually.
|
||||
|
||||
### Use the Druid console
|
||||
|
||||
To update your ingestion spec using the Druid console, open the console and copy your spec into the **Edit spec** stage of the data loader.
|
||||
|
||||
Druid converts the spec into one with a defined input source. For example, it converts the [example firehose ingestion spec](#example-firehose-ingestion-spec) below into the [example ingestion spec after migration](#example-ingestion-spec-after-migration).
|
||||
|
||||
If you're unable to use the console or you have problems with the console method, the alternative is to update your ingestion spec manually.
|
||||
|
||||
### Update your ingestion spec manually
|
||||
|
||||
To update your ingestion spec manually, copy your existing spec into a new file. Refer to [Native batch ingestion with firehose (Deprecated)](./native-batch-firehose.md) for a description of firehose properties.
|
||||
|
||||
Edit the new file as follows:
|
||||
|
||||
1. In the `ioConfig` component, replace the `firehose` definition with an `inputSource` definition for your chosen input source. See [Native batch input sources](./native-batch-input-source.md) for details.
|
||||
2. Move the `timeStampSpec` definition from `parser.parseSpec` to the `dataSchema` component.
|
||||
3. Move the `dimensionsSpec` definition from `parser.parseSpec` to the `dataSchema` component.
|
||||
4. Move the `format` definition from `parser.parseSpec` to an `inputFormat` definition in `ioConfig`.
|
||||
5. Delete the `parser` definition.
|
||||
6. Save the file.
|
||||
<br>You can check the format of your new ingestion file against the [migrated example](#example-ingestion-spec-after-migration) below.
|
||||
7. Test the new ingestion spec with a temporary data source.
|
||||
8. Once you've successfully ingested sample data with the new spec, stop firehose ingestion and switch to the new spec.
|
||||
|
||||
When the transition is complete, you can upgrade Druid to the latest version. See the [Druid release notes](https://druid.apache.org/downloads.html) for upgrade instructions.
|
||||
|
||||
### Example firehose ingestion spec
|
||||
|
||||
An example firehose ingestion spec is as follows:
|
||||
|
||||
```json
|
||||
{
|
||||
"type" : "index",
|
||||
"spec" : {
|
||||
"dataSchema" : {
|
||||
"dataSource" : "wikipedia",
|
||||
"metricsSpec" : [
|
||||
{
|
||||
"type" : "count",
|
||||
"name" : "count"
|
||||
},
|
||||
{
|
||||
"type" : "doubleSum",
|
||||
"name" : "added",
|
||||
"fieldName" : "added"
|
||||
},
|
||||
{
|
||||
"type" : "doubleSum",
|
||||
"name" : "deleted",
|
||||
"fieldName" : "deleted"
|
||||
},
|
||||
{
|
||||
"type" : "doubleSum",
|
||||
"name" : "delta",
|
||||
"fieldName" : "delta"
|
||||
}
|
||||
],
|
||||
"granularitySpec" : {
|
||||
"type" : "uniform",
|
||||
"segmentGranularity" : "DAY",
|
||||
"queryGranularity" : "NONE",
|
||||
"intervals" : [ "2013-08-31/2013-09-01" ]
|
||||
},
|
||||
"parser": {
|
||||
"type": "string",
|
||||
"parseSpec": {
|
||||
"format": "json",
|
||||
"timestampSpec" : {
|
||||
"column" : "timestamp",
|
||||
"format" : "auto"
|
||||
},
|
||||
"dimensionsSpec" : {
|
||||
"dimensions": ["country", "page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","region","city"],
|
||||
"dimensionExclusions" : []
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"ioConfig" : {
|
||||
"type" : "index",
|
||||
"firehose" : {
|
||||
"type" : "local",
|
||||
"baseDir" : "examples/indexing/",
|
||||
"filter" : "wikipedia_data.json"
|
||||
}
|
||||
},
|
||||
"tuningConfig" : {
|
||||
"type" : "index",
|
||||
"partitionsSpec": {
|
||||
"type": "single_dim",
|
||||
"partitionDimension": "country",
|
||||
"targetRowsPerSegment": 5000000
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Example ingestion spec after migration
|
||||
|
||||
The following example illustrates the result of migrating the [example firehose ingestion spec](#example-firehose-ingestion-spec) to a spec with an input source:
|
||||
|
||||
```json
|
||||
{
|
||||
"type" : "index",
|
||||
"spec" : {
|
||||
"dataSchema" : {
|
||||
"dataSource" : "wikipedia",
|
||||
"timestampSpec" : {
|
||||
"column" : "timestamp",
|
||||
"format" : "auto"
|
||||
},
|
||||
"dimensionsSpec" : {
|
||||
"dimensions": ["country", "page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","region","city"],
|
||||
"dimensionExclusions" : []
|
||||
},
|
||||
"metricsSpec" : [
|
||||
{
|
||||
"type" : "count",
|
||||
"name" : "count"
|
||||
},
|
||||
{
|
||||
"type" : "doubleSum",
|
||||
"name" : "added",
|
||||
"fieldName" : "added"
|
||||
},
|
||||
{
|
||||
"type" : "doubleSum",
|
||||
"name" : "deleted",
|
||||
"fieldName" : "deleted"
|
||||
},
|
||||
{
|
||||
"type" : "doubleSum",
|
||||
"name" : "delta",
|
||||
"fieldName" : "delta"
|
||||
}
|
||||
],
|
||||
"granularitySpec" : {
|
||||
"type" : "uniform",
|
||||
"segmentGranularity" : "DAY",
|
||||
"queryGranularity" : "NONE",
|
||||
"intervals" : [ "2013-08-31/2013-09-01" ]
|
||||
}
|
||||
},
|
||||
"ioConfig" : {
|
||||
"type" : "index",
|
||||
"inputSource" : {
|
||||
"type" : "local",
|
||||
"baseDir" : "examples/indexing/",
|
||||
"filter" : "wikipedia_data.json"
|
||||
},
|
||||
"inputFormat": {
|
||||
"type": "json"
|
||||
}
|
||||
},
|
||||
"tuningConfig" : {
|
||||
"type" : "index",
|
||||
"partitionsSpec": {
|
||||
"type": "single_dim",
|
||||
"partitionDimension": "country",
|
||||
"targetRowsPerSegment": 5000000
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Learn more
|
||||
|
||||
For more information, see the following pages:
|
||||
|
||||
- [Ingestion](./index.md): Overview of the Druid ingestion process.
|
||||
- [Native batch ingestion](./native-batch.md): Description of the supported native batch indexing tasks.
|
||||
- [Ingestion spec reference](./ingestion-spec.md): Description of the components and properties in the ingestion spec.
|
|
@ -1,6 +1,6 @@
|
|||
---
|
||||
id: native-batch-firehose
|
||||
title: "Native batch ingestion with firehose"
|
||||
title: "Native batch ingestion with firehose (Deprecated)"
|
||||
sidebar_label: "Firehose (deprecated)"
|
||||
---
|
||||
|
||||
|
@ -23,14 +23,13 @@ sidebar_label: "Firehose (deprecated)"
|
|||
~ under the License.
|
||||
-->
|
||||
|
||||
|
||||
Firehoses are deprecated in 0.17.0. It's highly recommended to use the [Native batch ingestion input sources](./native-batch-input-source.md) instead.
|
||||
> Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](./migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
|
||||
|
||||
There are several firehoses readily available in Druid, some are meant for examples, others can be used directly in a production environment.
|
||||
|
||||
## StaticS3Firehose
|
||||
|
||||
> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the StaticS3Firehose.
|
||||
You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the StaticS3Firehose.
|
||||
|
||||
This firehose ingests events from a predefined list of S3 objects.
|
||||
This firehose is _splittable_ and can be used by the [Parallel task](./native-batch.md).
|
||||
|
@ -62,7 +61,7 @@ Note that prefetching or caching isn't that useful in the Parallel task.
|
|||
|
||||
## StaticGoogleBlobStoreFirehose
|
||||
|
||||
> You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the StaticGoogleBlobStoreFirehose.
|
||||
You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the StaticGoogleBlobStoreFirehose.
|
||||
|
||||
This firehose ingests events, similar to the StaticS3Firehose, but from an Google Cloud Store.
|
||||
|
||||
|
@ -112,7 +111,7 @@ Google Blobs:
|
|||
|
||||
## HDFSFirehose
|
||||
|
||||
> You need to include the [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension to use the HDFSFirehose.
|
||||
You need to include the [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension to use the HDFSFirehose.
|
||||
|
||||
This firehose ingests events from a predefined list of files from the HDFS storage.
|
||||
This firehose is _splittable_ and can be used by the [Parallel task](./native-batch.md).
|
||||
|
|
|
@ -57,6 +57,8 @@
|
|||
"ids": [
|
||||
"ingestion/native-batch",
|
||||
"ingestion/native-batch-input-sources",
|
||||
"ingestion/migrate-from-firehose",
|
||||
"ingestion/native-batch-firehose",
|
||||
"ingestion/hadoop"
|
||||
]
|
||||
},
|
||||
|
|
Loading…
Reference in New Issue