---
id: avro
title: "Apache Avro"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

## Avro extension

This Apache Druid extension enables Druid to ingest and understand the Apache Avro data format. It provides
two Avro parsers: one for stream ingestion and one for Hadoop batch ingestion.
See [Avro Hadoop Parser](../../ingestion/data-formats.md#avro-hadoop-parser) and [Avro Stream Parser](../../ingestion/data-formats.md#avro-stream-parser)
for more details about how to use them in an ingestion spec.

Additionally, it provides an InputFormat for reading Avro Object Container Files (OCF) when using
[native batch indexing](../../ingestion/native-batch.md). See [Avro OCF](../../ingestion/data-formats.md#avro-ocf)
for details on how to ingest OCF files.
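Because the OCF input format expects complete container files rather than bare Avro records, a quick pre-flight check can catch misplaced inputs. The Avro specification states that an object container file header begins with the four bytes `O`, `b`, `j`, followed by the byte `1`. The sketch below checks only those magic bytes; `looks_like_avro_ocf` is a hypothetical helper, not part of Druid, and it does not validate the schema or data blocks that follow the magic.

```python
import os
import tempfile

# Per the Avro specification, an object container file starts with "Obj" followed by the byte 1.
OCF_MAGIC = b"Obj\x01"


def looks_like_avro_ocf(path: str) -> bool:
    """Cheap pre-check: does the file start with the Avro OCF magic bytes?"""
    with open(path, "rb") as f:
        # Files shorter than four bytes return fewer bytes and therefore never match.
        return f.read(len(OCF_MAGIC)) == OCF_MAGIC


# Self-check against two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    ocf_path = os.path.join(tmp, "events.avro")
    json_path = os.path.join(tmp, "events.json")
    with open(ocf_path, "wb") as f:
        f.write(OCF_MAGIC + b"<rest of header and data blocks>")
    with open(json_path, "wb") as f:
        f.write(b'{"id": 1}')
    ocf_result = looks_like_avro_ocf(ocf_path)
    json_result = looks_like_avro_ocf(json_path)

print(ocf_result, json_result)  # True False
```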

Make sure to [include](../../development/extensions.md#loading-extensions) `druid-avro-extensions` in the extensions load list.

### Avro Types

Druid supports most Avro types natively. The exceptions are detailed below.

#### Unions
Druid has two modes for supporting `union` types.

By default, Druid treats a union as a single value, regardless of which member type it is populated with.

If you want to operate on each member of a union separately, set `extractUnionsByType` on the Avro parser. Unions are then expanded into nested objects according to the following rules:
* Primitive types and unnamed complex types are keyed by their type name, e.g. `int`, `string`, or `array`.
* Named complex types (`record`, `fixed`, and `enum`) are keyed by their names.
* The Avro `null` type is elided, since its value can only ever be null.

This is safe because an Avro union can contain at most one member of each unnamed type, and duplicates of the same named type are not allowed.
For example, only a single array is allowed, but multiple records (or other named types) are allowed as long as each has a unique name.
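The keying rules above can be sketched in Python as follows. `union_key` and `extract_union` are hypothetical helpers written for illustration, not Druid's implementation. They take the Avro JSON schema of the branch that was actually populated; in Avro's binary encoding a union value is preceded by the index of its branch, which is how a real decoder knows it. The sketch assumes named types are keyed by the `name` attribute exactly as written in the schema, and it does not handle branches that refer to a previously defined named type by a bare name string.

```python
from typing import Any, Dict, Optional, Union

# An Avro schema in JSON form: either a bare type name or a schema object.
Schema = Union[str, Dict[str, Any]]

# Primitive types (other than null) plus the two unnamed complex types.
UNNAMED_TYPES = {"boolean", "int", "long", "float", "double", "bytes", "string", "array", "map"}
# Complex types that carry a "name" attribute.
NAMED_TYPES = {"record", "fixed", "enum"}


def union_key(branch: Schema) -> Optional[str]:
    """Return the key a populated union branch is stored under, or None if it is elided."""
    type_name = branch if isinstance(branch, str) else branch["type"]
    if type_name == "null":
        return None  # null carries no information, so it is dropped
    if type_name in UNNAMED_TYPES:
        return type_name  # keyed by type name, e.g. "int" or "array"
    if type_name in NAMED_TYPES:
        return branch["name"]  # keyed by the schema's own name
    # Bare references to previously defined named types are not handled in this sketch.
    raise ValueError(f"unsupported union branch: {branch!r}")


def extract_union(branch: Schema, value: Any) -> Dict[str, Any]:
    """Wrap a decoded union value in a one-entry object keyed by its branch."""
    key = union_key(branch)
    return {} if key is None else {key: value}


address = {"type": "record", "name": "Address", "fields": [{"name": "city", "type": "string"}]}
print(extract_union("int", 42))                   # {'int': 42}
print(extract_union(address, {"city": "Paris"}))  # {'Address': {'city': 'Paris'}}
print(extract_union("null", None))                # {}
```

With a union such as `["null", "int", "string", Address]`, each populated member lands under its own distinct key, which is why the expanded objects cannot collide.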

The members can then be accessed using a [flattenSpec](../../ingestion/data-formats.md#flattenspec), similar to other nested types.
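As an illustration, the Python sketch below builds an input format that enables `extractUnionsByType` and uses `path` fields in a flattenSpec to pull out individual union members. The field `payload`, the record name `Address`, and the output column names are hypothetical. The sketch assumes the `avro_ocf` input format accepts `extractUnionsByType` alongside `flattenSpec`, so check the [Avro OCF](../../ingestion/data-formats.md#avro-ocf) reference before relying on it.

```python
import json

# Hypothetical schema: "payload" is a union of ["null", "int", "string", Address],
# where Address is a record with a "city" field.
input_format = {
    "type": "avro_ocf",
    "extractUnionsByType": True,  # expand unions into {"int": ...}, {"Address": {...}}, etc.
    "flattenSpec": {
        "useFieldDiscovery": True,
        "fields": [
            # Primitive members are keyed by type name.
            {"type": "path", "name": "payload_int", "expr": "$.payload.int"},
            {"type": "path", "name": "payload_string", "expr": "$.payload.string"},
            # Named members are keyed by their schema name.
            {"type": "path", "name": "payload_city", "expr": "$.payload.Address.city"},
        ],
    },
}

print(json.dumps(input_format, indent=2))
```

At most one of these columns is non-null for any given row, since a union holds exactly one member at a time.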

#### Binary types
By default, Avro `bytes` and `fixed` types are returned as base64-encoded strings.
If the `binaryAsString` option is enabled on the Avro parser, these types are instead decoded as UTF-8 strings.
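A minimal Python sketch of the two behaviors; `render_binary` is a hypothetical helper that mirrors the description above, not Druid code. Python's strict UTF-8 decoding raises an error on invalid byte sequences; how Druid treats such bytes is not covered here, so `binaryAsString` is best reserved for fields that actually hold text.

```python
import base64


def render_binary(raw: bytes, binary_as_string: bool = False) -> str:
    """Sketch of how an Avro `bytes` or `fixed` value is surfaced as a string."""
    if binary_as_string:
        # binaryAsString enabled: interpret the raw bytes as UTF-8 text.
        return raw.decode("utf-8")
    # Default: base64-encode the raw bytes.
    return base64.b64encode(raw).decode("ascii")


print(render_binary(b"hello"))        # aGVsbG8=
print(render_binary(b"hello", True))  # hello
```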

#### Enums
`enum` types are returned as a `string` containing the enum symbol.

#### Complex types
`record` and `map` types representing nested data can be ingested using a [flattenSpec](../../ingestion/data-formats.md#flattenspec) on the parser.
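Once decoded, a `record` and a `map` both look like nested objects, so a flattenSpec `path` expression reaches into either one the same way. The toy resolver below illustrates this for simple dotted expressions such as `$.address.city`. `resolve_path` and the sample row are hypothetical, and this is not Druid's JSONPath engine, which supports a much richer syntax.

```python
from typing import Any


def resolve_path(data: dict, expr: str) -> Any:
    """Toy resolver for dotted expressions like "$.address.city" (only "$" plus ".key" steps)."""
    if not expr.startswith("$"):
        raise ValueError(f"expression must start with '$': {expr!r}")
    current: Any = data
    # "$.address.city"[1:] -> ".address.city" -> ["", "address", "city"]; drop the empty piece.
    for key in filter(None, expr[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None  # this sketch returns None for any missing step
        current = current[key]
    return current


# A decoded row: "address" came from a record, "labels" from a map<string, string>.
row = {"id": 7, "address": {"city": "Paris", "zip": "75001"}, "labels": {"env": "prod"}}
print(resolve_path(row, "$.address.city"))   # Paris
print(resolve_path(row, "$.labels.env"))     # prod
print(resolve_path(row, "$.labels.region"))  # None
```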

#### Logical types
Druid doesn't currently support Avro logical types. They are ignored, and fields are handled according to their underlying primitive type.
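To make the fallback concrete, the sketch below strips the `logicalType` annotation from a field schema to find the type the field is handled as. `underlying_type` and the sample schemas are hypothetical. In practice this means a `timestamp-millis` field arrives as a plain `long` of milliseconds since the Unix epoch (which a timestampSpec with format `millis` can parse if it is your primary timestamp), and a `bytes`-backed `decimal` arrives as raw bytes, subject to the binary handling described above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Union

Schema = Union[str, Dict[str, Any]]


def underlying_type(field_schema: Schema) -> str:
    """Return the type a field is handled as once its logicalType annotation is ignored."""
    if isinstance(field_schema, str):
        return field_schema  # a bare type name has no logical type
    # logicalType is an extra attribute on an ordinary schema; the base type is in "type".
    return field_schema["type"]


ts_schema = {"type": "long", "logicalType": "timestamp-millis"}
dec_schema = {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 2}
print(underlying_type(ts_schema))   # long
print(underlying_type(dec_schema))  # bytes

# The ingested value is just the raw long: milliseconds since the Unix epoch.
raw_millis = 1609459200000
as_datetime = datetime.fromtimestamp(raw_millis / 1000, tz=timezone.utc)
print(as_datetime.isoformat())  # 2021-01-01T00:00:00+00:00
```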