diff --git a/docs/content/development/extensions-core/avro.md b/docs/content/development/extensions-core/avro.md
index 9eba92de679..232020b654d 100644
--- a/docs/content/development/extensions-core/avro.md
+++ b/docs/content/development/extensions-core/avro.md
@@ -1,6 +1,6 @@
 ---
 layout: doc_page
-title: "Avro"
+title: "Apache Avro"
 ---
 
-# Avro
+# Apache Avro
 
 This Apache Druid (incubating) extension enables Druid to ingest and understand the Apache Avro data format. Make sure to [include](../../operations/including-extensions.html) `druid-avro-extensions` as an extension.
@@ -37,7 +37,7 @@ This is for streaming/realtime ingestion.
 | parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Should be an "avro" parseSpec. | yes |
 
 An Avro parseSpec can contain a [flattenSpec](../../ingestion/flatten-json.html) using either the "root" or "path"
-field types, which can be used to read nested Avro records. The "jq" field type is not currently supported for Avro.
+field types, which can be used to read nested Avro records. The "jq" field type is currently not supported for Avro.
 
 For example, using Avro stream parser with schema repo Avro bytes decoder:
@@ -187,13 +187,12 @@ Details can be found in Schema Registry [documentation](http://docs.confluent.io
 
 ### Avro Hadoop Parser
 
-This is for batch ingestion using the HadoopDruidIndexer. The `inputFormat` of `inputSpec` in `ioConfig` must be set to `"org.apache.druid.data.input.avro.AvroValueInputFormat"`. You may want to set Avro reader's schema in `jobProperties` in `tuningConfig`, eg: `"avro.schema.input.value.path": "/path/to/your/schema.avsc"` or `"avro.schema.input.value": "your_schema_JSON_object"`, if reader's schema is not set, the schema in Avro object container file will be used, see [Avro specification](http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution). Make sure to include "org.apache.druid.extensions:druid-avro-extensions" as an extension.
+This is for batch ingestion using the `HadoopDruidIndexer`. The `inputFormat` of `inputSpec` in `ioConfig` must be set to `"org.apache.druid.data.input.avro.AvroValueInputFormat"`. You may want to set the Avro reader's schema in `jobProperties` in `tuningConfig`, e.g. `"avro.schema.input.value.path": "/path/to/your/schema.avsc"` or `"avro.schema.input.value": "your_schema_JSON_object"`. If the reader's schema is not set, the schema in the Avro object container file will be used; see the [Avro specification](http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution). Make sure to include "org.apache.druid.extensions:druid-avro-extensions" as an extension.
 
 | Field | Type | Description | Required |
 |-------|------|-------------|----------|
 | type | String | This should say `avro_hadoop`. | no |
 | parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Should be an "avro" parseSpec. | yes |
-| fromPigAvroStorage | Boolean | Specifies whether the data file is stored using AvroStorage. | no(default == false) |
 
 An Avro parseSpec can contain a [flattenSpec](../../ingestion/flatten-json.html) using either the "root" or "path"
 field types, which can be used to read nested Avro records. The "jq" field type is not currently supported for Avro.
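
The Avro Hadoop Parser paragraph touched in the last hunk compresses the `ioConfig`, `tuningConfig`, and parser wiring into one sentence. As a minimal sketch of how those pieces fit together in a Hadoop batch ingestion spec (the datasource name, paths, and timestamp column below are placeholders, not values taken from this patch, and `granularitySpec`/`metricsSpec` are omitted for brevity):

```json
{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "example_datasource",
      "parser" : {
        "type" : "avro_hadoop",
        "parseSpec" : {
          "format" : "avro",
          "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
          "dimensionsSpec" : { "dimensions" : [] }
        }
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "inputFormat" : "org.apache.druid.data.input.avro.AvroValueInputFormat",
        "paths" : "/path/to/your/data.avro"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "jobProperties" : {
        "avro.schema.input.value.path" : "/path/to/your/schema.avsc"
      }
    }
  }
}
```

If `avro.schema.input.value.path` (or `avro.schema.input.value`) is left out of `jobProperties`, the schema embedded in the Avro object container file is used, as the paragraph notes.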
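For the flattenSpec sentence retained at the end of the diff, a sketch of an "avro" parseSpec that reads nested Avro records with "root" and "path" field types (field names and the path expression are illustrative only):

```json
"parseSpec" : {
  "format" : "avro",
  "flattenSpec" : {
    "useFieldDiscovery" : true,
    "fields" : [
      { "type" : "root", "name" : "topLevelField" },
      { "type" : "path", "name" : "nestedField", "expr" : "$.someRecord.subInt" }
    ]
  },
  "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
  "dimensionsSpec" : { "dimensions" : ["topLevelField", "nestedField"] }
}
```

The "jq" field type is not an option here; as the doc text says, only "root" and "path" are supported for Avro.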