layout	title
doc_page	Hadoop-based Batch Ingestion VS Native Batch Ingestion

Comparison of Batch Ingestion Methods

Apache Druid (incubating) supports three types of batch ingestion: Apache Hadoop-based batch ingestion, native parallel batch ingestion, and native local batch ingestion. The table below shows which features each ingestion method supports.

	Hadoop-based ingestion	Native parallel ingestion	Native local ingestion
Parallel indexing	Always parallel	Parallel if the firehose is splittable and `maxNumConcurrentSubTasks` > 1 in `tuningConfig`	Always sequential
Supported indexing modes	Overwriting mode	Both appending and overwriting modes	Both appending and overwriting modes
External dependency	Hadoop (it internally submits Hadoop jobs)	No dependency	No dependency
Supported rollup modes	Perfect rollup	Both perfect and best-effort rollup	Both perfect and best-effort rollup
Supported partitioning methods	Both hash-based and range partitioning	Hash-based partitioning (when `forceGuaranteedRollup` = true)	Hash-based partitioning (when `forceGuaranteedRollup` = true)
Supported input locations	All locations accessible via HDFS client or Druid dataSource	All implemented firehoses	All implemented firehoses
Supported file formats	All implemented Hadoop InputFormats	Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing `FiniteFirehoseFactory`	Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing `FiniteFirehoseFactory`
Saving parse exceptions in ingestion report	Currently not supported	Currently not supported	Supported
Custom segment version	Supported, but this is NOT recommended	N/A	N/A
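
The "Parallel indexing" row can be read as a small decision rule. The sketch below restates that row in Python for illustration only. The function name and the method labels (`hadoop`, `native_parallel`, `native_local`) are hypothetical and are not Druid identifiers; only `maxNumConcurrentSubTasks` comes from the table above.

```python
def runs_in_parallel(method: str,
                     firehose_splittable: bool = False,
                     max_num_concurrent_sub_tasks: int = 1) -> bool:
    """Restate the 'Parallel indexing' row of the comparison table.

    `method` uses hypothetical labels for the three ingestion methods;
    `max_num_concurrent_sub_tasks` mirrors maxNumConcurrentSubTasks
    in tuningConfig.
    """
    if method == "hadoop":
        # Hadoop-based ingestion is always parallel.
        return True
    if method == "native_parallel":
        # Parallel only if the firehose is splittable and more than
        # one concurrent sub task is allowed.
        return firehose_splittable and max_num_concurrent_sub_tasks > 1
    if method == "native_local":
        # Native local ingestion is always sequential.
        return False
    raise ValueError(f"unknown ingestion method: {method}")
```

For example, under these assumptions `runs_in_parallel("native_parallel", firehose_splittable=True, max_num_concurrent_sub_tasks=1)` evaluates to `False`: a splittable firehose alone is not enough, because the sub task limit must also exceed 1.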
