mirror of https://github.com/apache/druid.git
Add doc for Hadoop-based ingestion vs Native batch ingestion (#7044)
* Add doc for Hadoop-based ingestion vs Native batch ingestion * add links * add links
parent 1701fbcad3
commit 970308463d
@ -0,0 +1,43 @
---
layout: doc_page
title: "Hadoop-based Batch Ingestion VS Native Batch Ingestion"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

# Comparison of Batch Ingestion Methods

Druid supports three types of batch ingestion: Hadoop-based
batch ingestion, native parallel batch ingestion, and native local batch
ingestion. The table below shows which features each ingestion method
supports.

|   |Hadoop-based ingestion|Native parallel ingestion|Native local ingestion|
|---|----------------------|-------------------------|----------------------|
| Parallel indexing | Always parallel | Parallel if firehose is splittable | Always sequential |
| Supported indexing modes | Replacing mode | Both appending and replacing modes | Both appending and replacing modes |
| External dependency | Hadoop (it internally submits Hadoop jobs) | No dependency | No dependency |
| Supported [rollup modes](http://druid.io/docs/latest/ingestion/index.html#roll-up-modes) | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
| Supported partitioning methods | [Both Hash-based and range partitioning](http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification) | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) |
| Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented [firehoses](./firehose.html) | All implemented [firehoses](./firehose.html) |
| Supported file formats | All implemented Hadoop InputFormats | Currently only text file format (CSV, TSV, JSON) | Currently only text file format (CSV, TSV, JSON) |
| Saving parse exceptions in ingestion report | Currently not supported | Currently not supported | Supported |
| Custom segment version | Supported, but this is NOT recommended | N/A | N/A |

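As a reading aid, a few rows of the comparison table can be written out as a lookup structure. The sketch below is illustrative only: the dictionary keys, the `candidate_methods` function, and its parameters are hypothetical names chosen for this example and are not part of Druid. It covers only the parallelism, indexing mode, external dependency, and rollup rows. The table lists each feature independently, so whether a particular combination is allowed should still be checked against the task documentation.

```python
# Feature matrix transcribed from four rows of the comparison table.
# All names here are hypothetical; none of them are Druid identifiers.
FEATURES = {
    "hadoop": {
        "parallel": True,
        "modes": {"replace"},
        "needs_hadoop": True,
        "rollup": {"perfect"},
    },
    "native_parallel": {
        "parallel": True,  # per the table: only if the firehose is splittable
        "modes": {"append", "replace"},
        "needs_hadoop": False,
        "rollup": {"best_effort"},
    },
    "native_local": {
        "parallel": False,
        "modes": {"append", "replace"},
        "needs_hadoop": False,
        "rollup": {"perfect", "best_effort"},
    },
}


def candidate_methods(mode, rollup, hadoop_available, need_parallel=False):
    """Return the methods whose table rows satisfy every stated requirement."""
    result = []
    for name, row in FEATURES.items():
        if mode not in row["modes"]:
            continue
        if rollup not in row["rollup"]:
            continue
        if row["needs_hadoop"] and not hadoop_available:
            continue
        if need_parallel and not row["parallel"]:
            continue
        result.append(name)
    return result


# Appending with best-effort rollup and no Hadoop cluster available.
print(candidate_methods("append", "best_effort", hadoop_available=False))
```

Each requirement is checked against a single table row, so the function only narrows the choice. It does not validate a task spec.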
@ -25,7 +25,9 @@ title: "Hadoop-based Batch Ingestion"
# Hadoop-based Batch Ingestion

Hadoop-based batch ingestion in Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running
instance of a Druid [Overlord](../design/overlord.html).

See [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for the differences between native batch ingestion and Hadoop-based ingestion.

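Posting a task to the Overlord amounts to an HTTP POST of the task JSON. The following is a minimal sketch, not a complete client: it assumes the Overlord's task submission endpoint is `/druid/indexer/v1/task`, that the Overlord listens on `localhost:8090`, and that the task type is `index_hadoop`. The helper name `build_task_request` is hypothetical, and the spec body is an empty placeholder rather than a valid ingestion spec.

```python
import json
import urllib.request


def build_task_request(overlord_url, task_spec):
    """Build (but do not send) a POST request that submits a task spec."""
    body = json.dumps(task_spec).encode("utf-8")
    return urllib.request.Request(
        url=overlord_url.rstrip("/") + "/druid/indexer/v1/task",  # assumed endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder spec: a real Hadoop ingestion task needs a full "spec" section.
spec = {"type": "index_hadoop", "spec": {}}

# Assumed Overlord address; adjust host and port to your deployment.
req = build_task_request("http://localhost:8090", spec)
print(req.get_method(), req.full_url)
```

The request is only constructed here. Passing `req` to `urllib.request.urlopen` would send it to a running Overlord.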
## Command Line Hadoop Indexer
@ -28,6 +28,8 @@ Druid currently has two types of native batch indexing tasks, `index_parallel` w
in parallel on multiple MiddleManager nodes, and `index` which will run a single indexing task locally on a single
MiddleManager.

See [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for the differences between native batch ingestion and Hadoop-based ingestion.

Parallel Index Task
--------------------------------

@ -41,6 +41,10 @@ See [batch ingestion](../ingestion/hadoop.html).
Druid provides a native index task that does not depend on any other system.
See [native index tasks](./native_tasks.html) for more details.

<div class="note info">
See [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for the differences between native batch ingestion and Hadoop-based ingestion.
</div>

### Kafka Indexing Tasks

Kafka Indexing tasks are automatically created by a Kafka Supervisor and are responsible for pulling data from Kafka streams. These tasks are not meant to be created or submitted directly by users. See [Kafka Indexing Service](../development/extensions-core/kafka-ingestion.html) for more details.