Mirror of https://github.com/apache/druid.git (synced 2025-02-06 18:18:17 +00:00)
| id | title |
|---|---|
| parquet | Apache Parquet Extension |
This Apache Druid module extends Druid Hadoop-based indexing to ingest data directly from offline Apache Parquet files.

> Note: If you are using the `parquet-avro` parser for Apache Hadoop-based indexing, `druid-parquet-extensions` depends on the `druid-avro-extensions` module, so be sure to include both.
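As a hedged illustration of the note above, loading both extensions together in your Druid configuration might look like the following sketch. The `druid.extensions.loadList` property is the documented way to enable extensions; the exact file location and the rest of your load list depend on your deployment.

```properties
# common.runtime.properties (illustrative fragment):
# load the Parquet extension together with its Avro dependency
druid.extensions.loadList=["druid-parquet-extensions", "druid-avro-extensions"]
```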
The `druid-parquet-extensions` module provides the Parquet input format, the Parquet Hadoop parser, and (together with `druid-avro-extensions`) the Parquet Avro Hadoop parser. The Parquet input format is available for native batch ingestion, while the other two parsers are for Hadoop batch ingestion. Please see the corresponding docs for details.
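To make the native-batch path concrete, here is a minimal sketch of an `index_parallel` ingestion spec that reads Parquet files via the Parquet input format. The datasource name, base directory, timestamp column, and dimension names are illustrative placeholders, not values from this doc; consult the native batch ingestion docs for the full set of options.

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/tmp/druid/parquet-data",
        "filter": "*.parquet"
      },
      "inputFormat": {
        "type": "parquet"
      }
    },
    "dataSchema": {
      "dataSource": "example_parquet_datasource",
      "timestampSpec": {
        "column": "ts",
        "format": "auto"
      },
      "dimensionsSpec": {
        "dimensions": ["page", "user"]
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    }
  }
}
```

Setting `"inputFormat": {"type": "parquet"}` is what selects this extension's Parquet reader for a native batch task; the Hadoop parsers are configured differently, inside a Hadoop indexing task's `parser` section.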