diff --git a/docs/content/ingestion/hadoop-vs-native-batch.md b/docs/content/ingestion/hadoop-vs-native-batch.md
new file mode 100644
index 00000000000..ce2c97e603b
--- /dev/null
+++ b/docs/content/ingestion/hadoop-vs-native-batch.md
@@ -0,0 +1,43 @@
+---
+layout: doc_page
+title: "Hadoop-based Batch Ingestion VS Native Batch Ingestion"
+---
+
+
+
+# Comparison of Batch Ingestion Methods
+
+Druid basically supports three types of batch ingestion: Hadoop-based
+batch ingestion, native parallel batch ingestion, and native local batch
+ingestion. The below table shows what features are supported by each
+ingestion method.
+
+
+| |Hadoop-based ingestion|Native parallel ingestion|Native local ingestion|
+|---|----------------------|-------------------------|----------------------|
+| Parallel indexing | Always parallel | Parallel if firehose is splittable | Always sequential |
+| Supported indexing modes | Replacing mode | Both appending and replacing modes | Both appending and replacing modes |
+| External dependency | Hadoop (it internally submits Hadoop jobs) | No dependency | No dependency |
+| Supported [rollup modes](http://druid.io/docs/latest/ingestion/index.html#roll-up-modes) | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
+| Supported partitioning methods | [Both Hash-based and range partitioning](http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification) | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) |
+| Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented [firehoses](./firehose.html) | All implemented [firehoses](./firehose.html) |
+| Supported file formats | All implemented Hadoop InputFormats | Currently only text file format (CSV, TSV, JSON) | Currently only text file format (CSV, TSV, JSON) |
+| Saving parse exceptions in ingestion report | Currently not supported | Currently not supported | Supported |
+| Custom segment version | Supported, but this is NOT recommended | N/A | N/A |
diff --git a/docs/content/ingestion/hadoop.md b/docs/content/ingestion/hadoop.md
index 4f8174c40a9..c824fd0809c 100644
--- a/docs/content/ingestion/hadoop.md
+++ b/docs/content/ingestion/hadoop.md
@@ -25,7 +25,9 @@ title: "Hadoop-based Batch Ingestion"
 # Hadoop-based Batch Ingestion
 
 Hadoop-based batch ingestion in Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running
-instance of a Druid [Overlord](../design/overlord.html).
+instance of a Druid [Overlord](../design/overlord.html). 
+
+Please check [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for differences between native batch ingestion and Hadoop-based ingestion.
 
 ## Command Line Hadoop Indexer
 
diff --git a/docs/content/ingestion/native_tasks.md b/docs/content/ingestion/native_tasks.md
index 5f7298363a9..963adeae21d 100644
--- a/docs/content/ingestion/native_tasks.md
+++ b/docs/content/ingestion/native_tasks.md
@@ -28,6 +28,8 @@ Druid currently has two types of native batch indexing tasks, `index_parallel` w
 in parallel on multiple MiddleManager nodes, and `index` which will run a single indexing task locally on a single
 MiddleManager.
 
+Please check [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for differences between native batch ingestion and Hadoop-based ingestion.
+
 Parallel Index Task
 --------------------------------
 
diff --git a/docs/content/ingestion/tasks.md b/docs/content/ingestion/tasks.md
index 41f7b52444b..4653d6ba2ed 100644
--- a/docs/content/ingestion/tasks.md
+++ b/docs/content/ingestion/tasks.md
@@ -41,6 +41,10 @@ See [batch ingestion](../ingestion/hadoop.html).
 Druid provides a native index task which doesn't need any dependencies on other systems.
 See [native index tasks](./native_tasks.html) for more details.
 
+