id: avro
title: Apache Avro

This Apache Druid extension enables Druid to ingest and parse the Apache Avro data format.

The Avro Stream Parser is deprecated.

Load the Avro extension

To use the Avro extension, add the druid-avro-extensions to the list of loaded extensions. See Loading extensions for more information.
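As a rough illustration, the snippet below renders the property line that adds the extension to the load list. It assumes the list is configured through the druid.extensions.loadList property in your common runtime properties; the helper name load_list_property is hypothetical and exists only for this sketch.

```python
import json


def load_list_property(extensions):
    """Render a druid.extensions.loadList line for the given extension names.

    Hypothetical helper: it only formats the property text and does not
    touch any Druid configuration file.
    """
    return "druid.extensions.loadList=" + json.dumps(extensions)


line = load_list_property(["druid-avro-extensions"])
print(line)
```

If you already load other extensions, include them in the same list rather than defining the property twice.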

Avro types

Druid supports most Avro types natively. This section describes some exceptions.

Unions

Druid has two modes for supporting union types.

The default mode treats unions as a single value regardless of the type of data populating the union.

If you want to operate on individual members of a union, set extractUnionsByType to true on the Avro parser. This configuration expands union values into nested objects according to the following rules:

  • Primitive types and unnamed complex types are keyed by their type name, such as int and string.
  • Named complex types (record, fixed, and enum) are keyed by their names.
  • The Avro null type is elided as its value can only ever be null.

This is safe because an Avro union can contain only a single member of each unnamed type, and duplicates of the same named type are not allowed. For example, a union can contain only one array, but it can contain multiple records (or other named types) as long as each has a unique name.
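The keying rules can be sketched in a few lines of Python. This is an illustration of the rules as described here, not the extension's implementation: the function names are hypothetical, branch schemas are modeled as plain dicts or strings in Avro's JSON schema form, and representing an elided null as an empty object is an assumption of the sketch.

```python
NAMED_TYPES = {"record", "fixed", "enum"}


def union_key(branch):
    """Return the key under which a union member would be nested."""
    if isinstance(branch, str):
        # Primitive types (or references to named types) are given as strings.
        return branch
    if branch["type"] in NAMED_TYPES:
        # Named complex types are keyed by their own name.
        return branch["name"]
    # Unnamed complex types (array, map) are keyed by their type name.
    return branch["type"]


def expand_union(value, branch):
    """Wrap a union value in a nested object keyed by its member type."""
    if branch == "null":
        # Assumption for this sketch: the elided null member yields no key.
        return {}
    return {union_key(branch): value}


address = {
    "type": "record",
    "name": "Address",
    "fields": [{"name": "city", "type": "string"}],
}

print(expand_union(42, "int"))
print(expand_union({"city": "Oslo"}, address))
print(expand_union([1, 2], {"type": "array", "items": "int"}))
```

Because each unnamed type can appear only once and named types must have unique names, two members of the same union never map to the same key.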

You can then access the members of the union with a flattenSpec, as you would for other nested types.
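For instance, a flattenSpec for an expanded union might look like the following, built here as a Python dict and printed as JSON. The field name someUnion, the record name Address, and the output column names are hypothetical; only the path-style field definitions follow the usual flattenSpec shape.

```python
import json

# Hypothetical union field "someUnion" whose members include an int and a
# record named "Address" with a "city" field.
flatten_spec = {
    "useFieldDiscovery": True,
    "fields": [
        # Reaches the int member, which is keyed by its type name.
        {"type": "path", "name": "someUnionInt", "expr": "$.someUnion.int"},
        # Reaches a field inside the record member, which is keyed by the record name.
        {"type": "path", "name": "someUnionCity", "expr": "$.someUnion.Address.city"},
    ],
}

print(json.dumps(flatten_spec, indent=2))
```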

Binary types

The extension returns bytes and fixed Avro types as base64-encoded strings by default. To decode these types as UTF-8 strings, enable the binaryAsString option on the Avro parser.
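The difference between the two representations can be shown with Python's standard library. This is only a sketch of the two encodings; render_binary and its binary_as_string argument are hypothetical names that mirror the option, not part of Druid.

```python
import base64


def render_binary(raw, binary_as_string=False):
    """Return the string form of a bytes or fixed value under each mode."""
    if binary_as_string:
        # Interpret the raw bytes as UTF-8 text.
        return raw.decode("utf-8")
    # Default: base64-encode the raw bytes.
    return base64.b64encode(raw).decode("ascii")


raw = "druid".encode("utf-8")
print(render_binary(raw))
print(render_binary(raw, binary_as_string=True))
```

Decoding as UTF-8 only makes sense when the field actually holds text; arbitrary binary payloads are generally safer left in the base64 form.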

Enums

The extension returns enum types as a string containing the enum symbol.

Complex types

You can ingest record and map types representing nested data with a flattenSpec on the parser.
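To make the effect of flattening concrete, here is a minimal sketch that resolves simple path expressions against a decoded row. It is a simplified stand-in for path evaluation that handles only dotted `$.a.b` expressions; the row contents, field names, and the resolve_path helper are all hypothetical.

```python
def resolve_path(row, expr):
    """Resolve a simple '$.a.b' path against nested dicts; None if missing."""
    if not expr.startswith("$."):
        raise ValueError("only simple '$.a.b' paths are handled in this sketch")
    current = row
    for key in expr[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# A decoded row with a nested record ("user") and a map ("attributes").
row = {
    "user": {"name": "alice", "address": {"city": "Oslo"}},
    "attributes": {"color": "red"},
}

fields = [
    {"type": "path", "name": "city", "expr": "$.user.address.city"},
    {"type": "path", "name": "color", "expr": "$.attributes.color"},
]

# Each path field becomes a top-level column in the flattened row.
flat = {f["name"]: resolve_path(row, f["expr"]) for f in fields}
print(flat)
```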

Logical types

Druid does not currently support Avro logical types. It ignores them and handles fields according to the underlying primitive type.
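As an illustration, the sketch below shows which type a field with a logical type annotation would be handled as under this behavior. The schemas use Avro's JSON schema form; effective_type is a hypothetical helper written for this example.

```python
def effective_type(field_schema):
    """Return the underlying Avro type, ignoring any logicalType annotation."""
    if isinstance(field_schema, dict):
        return field_schema["type"]
    return field_schema


timestamp_schema = {"type": "long", "logicalType": "timestamp-millis"}
date_schema = {"type": "int", "logicalType": "date"}

print(effective_type(timestamp_schema))
print(effective_type(date_schema))
```

In practice this means a timestamp-millis field arrives as a plain long, so any interpretation of the value as a time has to happen elsewhere in the ingestion spec.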