layout | title |
---|---|
doc_page | Hadoop-based Batch Ingestion VS Native Batch Ingestion |
Comparison of Batch Ingestion Methods
Apache Druid (incubating) supports three types of batch ingestion: Apache Hadoop-based batch ingestion, native parallel batch ingestion, and native local batch ingestion. The table below shows which features each ingestion method supports.
Feature | Hadoop-based ingestion | Native parallel ingestion | Native local ingestion |
---|---|---|---|
Parallel indexing | Always parallel | Parallel if firehose is splittable | Always sequential |
Supported indexing modes | Replacing mode | Both appending and replacing modes | Both appending and replacing modes |
External dependency | Hadoop (it internally submits Hadoop jobs) | No dependency | No dependency |
Supported rollup modes | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
Supported partitioning methods | Both hash-based and range partitioning | N/A | Hash-based partitioning (when forceGuaranteedRollup = true) |
Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented firehoses | All implemented firehoses |
Supported file formats | All implemented Hadoop InputFormats | Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing FiniteFirehoseFactory | Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing FiniteFirehoseFactory |
Saving parse exceptions in ingestion report | Currently not supported | Currently not supported | Supported |
Custom segment version | Supported, but this is NOT recommended | N/A | N/A |
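
As the table notes, native local ingestion supports hash-based partitioning only when forceGuaranteedRollup is true, and it is the only method that saves parse exceptions in the ingestion report. The following is a minimal sketch of how a tuningConfig fragment for such a task might be assembled. Only forceGuaranteedRollup comes from the table above. The helper name `local_index_tuning_config`, the task type value `index`, and the field names `numShards` and `maxSavedParseExceptions` are assumptions here and should be checked against the native batch ingestion documentation for the Druid version in use.

```python
import json


def local_index_tuning_config(num_shards, max_saved_parse_exceptions=0):
    """Build a tuningConfig fragment for a native local batch task.

    Hypothetical helper: every field name except forceGuaranteedRollup
    is an assumption, not something confirmed by the comparison table.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if max_saved_parse_exceptions < 0:
        raise ValueError("max_saved_parse_exceptions must not be negative")
    return {
        "type": "index",  # assumed type name for native local ingestion
        # Perfect rollup; per the table, this enables hash-based partitioning.
        "forceGuaranteedRollup": True,
        # Assumed field: number of hash partitions per time chunk.
        "numShards": num_shards,
        # Assumed field: how many parse exceptions to keep in the report.
        "maxSavedParseExceptions": max_saved_parse_exceptions,
    }


if __name__ == "__main__":
    # Print the fragment as JSON so it can be pasted into a task spec.
    print(json.dumps(local_index_tuning_config(4, 10), indent=2))
```

Building the fragment in code rather than by hand keeps the rollup flag and the partition count together, which is where the two settings interact according to the table.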