mirror of https://github.com/apache/druid.git
clarify avro support & general style improvements (#11975)
* clarify avro support & general style improvements
* Update docs/development/extensions-core/avro.md
* Update avro.md remove redundancy

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
parent 93aeaf4801
commit f536f31229
@@ -22,47 +22,43 @@ title: "Apache Avro"
 ~ under the License.
 -->

 ## Avro extension
+This Apache Druid extension enables Druid to ingest and parse the Apache Avro data format as follows:
+- [Avro stream input format](../../ingestion/data-formats.md#avro-stream) for Kafka and Kinesis.
+- [Avro OCF input format](../../ingestion/data-formats.md#avro-ocf) for native batch ingestion.
+- [Avro Hadoop Parser](../../ingestion/data-formats.md#avro-hadoop-parser).

-This Apache Druid extension enables Druid to ingest and understand the Apache Avro data format. This extension provides
-two Avro Parsers for stream ingestion and Hadoop batch ingestion.
-See [Avro Hadoop Parser](../../ingestion/data-formats.md#avro-hadoop-parser) and [Avro Stream Parser](../../ingestion/data-formats.md#avro-stream-parser)
-for more details about how to use these in an ingestion spec.
+The [Avro Stream Parser](../../ingestion/data-formats.md#avro-stream-parser) is deprecated.

-Additionally, it provides an InputFormat for reading Avro OCF files when using
-[native batch indexing](../../ingestion/native-batch.md), see [Avro OCF](../../ingestion/data-formats.md#avro-ocf)
-for details on how to ingest OCF files.
+## Load the Avro extension

-Make sure to [include](../../development/extensions.md#loading-extensions) `druid-avro-extensions` in the extensions load list.
+To use the Avro extension, add the `druid-avro-extensions` to the list of loaded extensions. See [Loading extensions](../../development/extensions.md#loading-extensions) for more information.
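
As a rough illustration of the load-list change described above, the following Python sketch renders a `druid.extensions.loadList` line of the kind found in `common.runtime.properties`. The helper name `render_load_list` is hypothetical and exists only for this example; how the property is managed in a given deployment may differ.

```python
import json


def render_load_list(extensions):
    """Render a druid.extensions.loadList property line (hypothetical helper)."""
    # The property value is a JSON array of extension names.
    return "druid.extensions.loadList=" + json.dumps(list(extensions))


line = render_load_list(["druid-avro-extensions"])
print(line)
```

If other extensions are already loaded, they would go in the same list alongside `druid-avro-extensions`.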

-### Avro Types
+## Avro types

-Druid supports most Avro types natively, there are however some exceptions which are detailed here.
+Druid supports most Avro types natively. This section describes some exceptions.

-#### Unions
+### Unions
 Druid has two modes for supporting `union` types.

-The default mode will treat unions as a single value regardless of the type it is populated with.
+The default mode treats unions as a single value regardless of the type of data populating the union.

-If you wish to operate on each different member of a union however you can set `extractUnionsByType` on the Avro parser in which case unions will be expanded into nested objects according to the following rules:
-* Primitive types and unnamed complex types are keyed their type name. i.e `int`, `string`
-* Complex named types are keyed by their names, this includes `record`, `fixed` and `enum`.
-* The Avro null type is elided as its value can only ever be null
+If you want to operate on individual members of a union, set `extractUnionsByType` on the Avro parser. This configuration expands union values into nested objects according to the following rules:
+- Primitive types and unnamed complex types are keyed by their type name, such as `int` and `string`.
+- Complex named types are keyed by their names, this includes `record`, `fixed`, and `enum`.
+- The Avro null type is elided as its value can only ever be null.

-This is safe because an Avro union can only contain a single member of each unnamed type and duplicates of the same named type are not allowed.
-i.e only a single array is allowed, multiple records (or other named types) are allowed as long as each has a unique name.
+This is safe because an Avro union can only contain a single member of each unnamed type and duplicates of the same named type are not allowed. For example, only a single array is allowed, multiple records (or other named types) are allowed as long as each has a unique name.
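
The keying rules above can be sketched in a few lines of Python. This is an illustration of the rules as stated, not the extension's implementation: the function names are hypothetical, namespaces on named types are ignored, and representing the elided null branch as an empty object is an assumption made for the example.

```python
NAMED_TYPES = ("record", "fixed", "enum")


def union_member_key(branch_schema):
    """Return the key for one union branch, or None for the null branch."""
    if isinstance(branch_schema, dict):
        if branch_schema["type"] in NAMED_TYPES:
            # Named complex types are keyed by their own name.
            return branch_schema["name"]
        # Unnamed complex types (array, map) are keyed by their type name.
        return branch_schema["type"]
    if branch_schema == "null":
        # The null branch is elided.
        return None
    # Primitive types are keyed by their type name.
    return branch_schema


def expand_union(branch_schema, value):
    """Wrap a union value in a nested object keyed by its branch."""
    key = union_member_key(branch_schema)
    return {} if key is None else {key: value}


# A union schema written as parsed Avro JSON; "Address" is a made-up record.
union = [
    "null",
    "int",
    "string",
    {"type": "array", "items": "long"},
    {"type": "record", "name": "Address", "fields": []},
]
keys = [union_member_key(branch) for branch in union]
print(keys)
print(expand_union("int", 7))
```

Because a union cannot hold two branches of the same unnamed type or two named types with the same name, the keys produced for any one union never collide.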

-The members can then be accessed using a [flattenSpec](../../ingestion/data-formats.md#flattenspec) similar other nested types.
+You can then access the members of the union with a [flattenSpec](../../ingestion/data-formats.md#flattenspec) like you would for other nested types.
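
As a sketch of how this might look in an ingestion spec, the Python snippet below builds an input format fragment and prints it as JSON. The union field name `payload` and the output column names are hypothetical, and placing `extractUnionsByType` next to `flattenSpec` on an `avro_ocf` input format is an assumption here; check the data formats page for the exact options your Druid version accepts.

```python
import json

# Hypothetical fragment: "payload" is assumed to be a union of int and string.
input_format = {
    "type": "avro_ocf",
    "extractUnionsByType": True,
    "flattenSpec": {
        "useFieldDiscovery": True,
        "fields": [
            # Each union member is addressed by its type-name key.
            {"type": "path", "name": "payload_string", "expr": "$.payload.string"},
            {"type": "path", "name": "payload_int", "expr": "$.payload.int"},
        ],
    },
}

rendered = json.dumps(input_format, indent=2)
print(rendered)
```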

-#### Binary types
-`bytes` and `fixed` Avro types will be returned by default as base64 encoded strings unless the `binaryAsString` option is enabled on the Avro parser.
-This setting will decode these types as UTF-8 strings.
+### Binary types
+The extension returns `bytes` and `fixed` Avro types as base64 encoded strings by default. To decode these types as UTF-8 strings, enable the `binaryAsString` option on the Avro parser.
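
To make the difference between the two modes concrete, here is a small Python sketch of the behavior described above. The function `render_binary` and its `binary_as_string` argument are stand-ins for illustration only; they mirror the documented option but are not part of Druid.

```python
import base64


def render_binary(raw: bytes, binary_as_string: bool = False) -> str:
    """Mimic how a bytes/fixed value would surface under each setting."""
    if binary_as_string:
        # With the option enabled, the bytes are decoded as UTF-8 text.
        return raw.decode("utf-8")
    # Default: the bytes are returned as a base64 encoded string.
    return base64.b64encode(raw).decode("ascii")


raw = b"druid"
default_value = render_binary(raw)
string_value = render_binary(raw, binary_as_string=True)
print(default_value, string_value)
```

In this sketch, bytes that are not valid UTF-8 raise an error when decoded as a string, which is one reason base64 is the safer default for arbitrary binary data.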

-#### Enums
-`enum` types will be returned as `string` of the enum symbol.
+### Enums
+The extension returns `enum` types as `string` of the enum symbol.

-#### Complex types
-`record` and `map` types representing nested data can be ingested using [flattenSpec](../../ingestion/data-formats.md#flattenspec) on the parser.
+### Complex types
+You can ingest `record` and `map` types representing nested data with a [flattenSpec](../../ingestion/data-formats.md#flattenspec) on the parser.

-#### Logical types
-Druid doesn't currently support Avro logical types, they will be ignored and fields will be handled according to the underlying primitive type.
+### Logical types
+Druid does not currently support Avro logical types. It ignores them and handles fields according to the underlying primitive type.
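
The effect of ignoring a logical type can be sketched as follows. The function `effective_type` is hypothetical and only models the statement above: a field annotated with a logical type is treated as its underlying primitive. The schemas are written as parsed Avro JSON.

```python
def effective_type(field_schema):
    """Return the type seen by a reader that ignores logical types."""
    if isinstance(field_schema, dict) and "logicalType" in field_schema:
        # The logicalType annotation is dropped; the underlying type remains.
        return field_schema["type"]
    return field_schema


timestamp_schema = {"type": "long", "logicalType": "timestamp-millis"}
date_schema = {"type": "int", "logicalType": "date"}

print(effective_type(timestamp_schema))
print(effective_type(date_schema))
```

Under this model, a `timestamp-millis` field arrives as a plain `long`, so any interpretation of it as a time has to be configured separately in the ingestion spec.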