
---
id: parquet
title: "Apache Parquet Extension"
---

This Apache Druid module extends Druid Hadoop-based indexing to ingest data directly from offline Apache Parquet files.

Note: If using the `parquet-avro` parser for Apache Hadoop-based indexing, `druid-parquet-extensions` depends on the `druid-avro-extensions` module, so be sure to include both.
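For example, loading both extensions via the `druid.extensions.loadList` property in `common.runtime.properties` might look like the following (a minimal sketch; adjust the list to match whatever other extensions your cluster loads):

```properties
# Load the Parquet extension together with the Avro extension it depends on
druid.extensions.loadList=["druid-parquet-extensions", "druid-avro-extensions"]
```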

The `druid-parquet-extensions` module provides the Parquet input format, the Parquet Hadoop parser, and the Parquet Avro Hadoop parser (the latter together with `druid-avro-extensions`). The Parquet input format is available for native batch ingestion, while the other two parsers are for Hadoop batch ingestion. Please see the corresponding docs for details.
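As an illustration, an `ioConfig` for a native batch (parallel) ingestion task reading Parquet files might look like the sketch below. The `inputSource` here is a hypothetical local-directory example; `binaryAsString` is an optional flag of the Parquet input format and defaults to `false`:

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "local",
      "baseDir": "/path/to/files/",
      "filter": "*.parquet"
    },
    "inputFormat": {
      "type": "parquet",
      "binaryAsString": false
    }
  }
}
```

Other input sources (such as HDFS or cloud storage) can be swapped in for the `local` input source; the `inputFormat` block stays the same.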