layout	title
doc_page	Hadoop-based Batch Ingestion VS Native Batch Ingestion

Comparison of Batch Ingestion Methods

Druid supports three types of batch ingestion: Hadoop-based batch ingestion, native parallel batch ingestion, and native local batch ingestion. The table below shows which features each ingestion method supports.

Feature	Hadoop-based ingestion	Native parallel ingestion	Native local ingestion
Parallel indexing	Always parallel	Parallel if firehose is splittable	Always sequential
Supported indexing modes	Replacing mode	Both appending and replacing modes	Both appending and replacing modes
External dependency	Hadoop (it internally submits Hadoop jobs)	No dependency	No dependency
Supported rollup modes	Perfect rollup	Best-effort rollup	Both perfect and best-effort rollup
Supported partitioning methods	Both hash-based and range partitioning	N/A	Hash-based partitioning (when `forceGuaranteedRollup` = true)
Supported input locations	All locations accessible via HDFS client or Druid dataSource	All implemented firehoses	All implemented firehoses
Supported file formats	All implemented Hadoop InputFormats	Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing `FiniteFirehoseFactory`	Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing `FiniteFirehoseFactory`
Saving parse exceptions in ingestion report	Currently not supported	Currently not supported	Supported
Custom segment version	Supported, but this is NOT recommended	N/A	N/A
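
As an illustration of the native local ingestion column, the following is a minimal sketch of building a native local batch task spec that asks for perfect rollup with hash-based partitioning. It is not a reference configuration. The field names (`index`, `firehose`, `appendToExisting`, `numShards`, and so on) are assumed from the native batch task spec of Druid releases contemporary with this page and should be checked against the version in use. The helper `build_local_index_task`, the dataSource name, the path, and the dimension names are hypothetical.

```python
import json


def build_local_index_task(data_source, base_dir, file_filter, dimensions,
                           perfect_rollup=True, num_shards=2):
    """Return a native local batch ("index") task spec as a plain dict.

    Hypothetical helper; field names are assumptions to verify against
    the Druid version in use.
    """
    tuning = {"type": "index", "forceGuaranteedRollup": perfect_rollup}
    if perfect_rollup:
        # Per the table, hash-based partitioning applies only when
        # forceGuaranteedRollup is true, so numShards is set only then.
        tuning["numShards"] = num_shards
    return {
        "type": "index",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "json",  # one of the default text formats
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": list(dimensions)},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "hour",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "type": "index",
                "firehose": {"type": "local", "baseDir": base_dir, "filter": file_filter},
                "appendToExisting": False,  # replacing mode
            },
            "tuningConfig": tuning,
        },
    }


# Hypothetical dataSource, directory, and dimensions, for illustration only.
task = build_local_index_task("wikipedia", "/tmp/data", "*.json", ["page", "user"])
print(json.dumps(task, indent=2))
```

Passing `perfect_rollup=False` yields a best-effort rollup spec without `numShards`, matching the table's note that hash-based partitioning is tied to `forceGuaranteedRollup`.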