druid/docs/development/extensions-core/avro.md

id: avro
title: Apache Avro

Avro extension

This Apache Druid extension enables Druid to ingest and understand the Apache Avro data format. It provides two Avro parsers, one for stream ingestion and one for Hadoop batch ingestion. See Avro Hadoop Parser and Avro Stream Parser for details on how to use them in an ingestion spec.

Additionally, it provides an InputFormat for reading Avro OCF files when using native batch indexing. See Avro OCF for details on how to ingest OCF files.
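As a rough illustration, the sketch below builds the inputFormat portion of a native batch ingestion spec for an OCF file and prints it as JSON. The avro_ocf type name and the binaryAsString field are the ones used by Druid's Avro OCF input format; the helper function and its default value are hypothetical and exist only for this example.

```python
import json


def build_avro_ocf_input_format(binary_as_string=False):
    """Return an illustrative `inputFormat` dict for Avro OCF files.

    The helper is hypothetical; only the JSON keys it emits are meant to
    mirror Druid's Avro OCF input format.
    """
    return {
        "type": "avro_ocf",
        "binaryAsString": binary_as_string,
    }


input_format = build_avro_ocf_input_format()
print(json.dumps(input_format, indent=2))
```

The printed object would sit under the ioConfig of a native batch ingestion spec, next to the inputSource that points at the OCF files.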

Make sure to include druid-avro-extensions in the extensions load list (druid.extensions.loadList).

Avro Types

Druid supports most Avro types natively. The exceptions are detailed below.

union types that are not of the form [null, otherType] are not currently supported.
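To check a schema ahead of ingestion, a pre-flight test along these lines may help. It is only a sketch, not Druid's own validation logic: the function name is hypothetical, it assumes the union is given as the JSON list from the Avro schema, it assumes null is written as the plain string "null", and it treats branch order as irrelevant.

```python
def is_supported_union(union_branches):
    """Return True for a two-branch union with exactly one "null" branch.

    Assumption: branch order does not matter, and null is spelled as the
    plain string "null" rather than {"type": "null"}.
    """
    if len(union_branches) != 2:
        return False
    null_count = sum(1 for branch in union_branches if branch == "null")
    return null_count == 1


print(is_supported_union(["null", "string"]))          # True
print(is_supported_union(["null", "string", "long"]))  # False
print(is_supported_union(["string", "long"]))          # False
```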

By default, bytes and fixed Avro types are returned as base64-encoded strings. Enabling the binaryAsString option on the Avro parser decodes them as UTF-8 strings instead.
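The difference between the two representations can be sketched in plain Python. This does not call Druid; it only shows what base64 encoding and UTF-8 decoding do to the same bytes, and the sample payload is made up.

```python
import base64

raw = b"hello"  # stand-in for the contents of an Avro bytes or fixed field

# Default behaviour described above: the bytes become a base64 string.
default_value = base64.b64encode(raw).decode("ascii")

# With binaryAsString enabled: the bytes are decoded as UTF-8 text.
as_string_value = raw.decode("utf-8")

print(default_value)    # aGVsbG8=
print(as_string_value)  # hello
```

Because the second form interprets the bytes as text, binaryAsString is likely only a good fit for fields that actually hold UTF-8 encoded strings.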

enum types are returned as the string of the enum symbol.

record and map types representing nested data can be ingested using flattenSpec on the parser.
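A minimal sketch of what such a flattenSpec might look like, together with a toy resolver that shows the effect on one nested row. The useFieldDiscovery and fields keys and the path field type come from Druid's flattenSpec; the field names, paths, sample row, and the resolve_simple_path helper are assumptions made for this example, and the helper handles only simple dotted paths, not full JSONPath.

```python
import json

# Illustrative flattenSpec; the field names and paths are made up.
flatten_spec = {
    "useFieldDiscovery": True,
    "fields": [
        {"type": "path", "name": "user_id", "expr": "$.user.id"},
        {"type": "path", "name": "tag_env", "expr": "$.tags.env"},
    ],
}


def resolve_simple_path(row, expr):
    """Toy resolver for dotted paths such as `$.user.id` (not full JSONPath)."""
    if not expr.startswith("$."):
        raise ValueError("expected a path starting with '$.'")
    value = row
    for key in expr[2:].split("."):
        value = value[key]
    return value


# A nested row, as a record field ("user") and a map field ("tags") might decode.
row = {"user": {"id": 42, "name": "ada"}, "tags": {"env": "prod"}}

flat = {f["name"]: resolve_simple_path(row, f["expr"]) for f in flatten_spec["fields"]}

print(json.dumps(flatten_spec, indent=2))
print(flat)  # {'user_id': 42, 'tag_env': 'prod'}
```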

Druid doesn't currently support Avro logical types. They are ignored, and fields are handled according to the underlying primitive type.
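For example, a field declared as a long with the timestamp-millis logical type would be treated as a plain long. The sketch below illustrates that reading of a schema fragment; the function name is hypothetical and the code does not reflect Druid's internals.

```python
def effective_type(field_schema):
    """Return the primitive type seen by a reader that ignores logical types."""
    # The "logicalType" annotation is deliberately not consulted.
    return field_schema["type"]


ts_field = {"type": "long", "logicalType": "timestamp-millis"}
date_field = {"type": "int", "logicalType": "date"}

print(effective_type(ts_field))    # long
print(effective_type(date_field))  # int
```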