--- id: orc title: "ORC Extension" --- ## ORC extension This Apache Druid extension enables Druid to ingest and understand the Apache ORC data format. The extension provides the [ORC input format](../../ingestion/data-formats.md#orc) and the [ORC Hadoop parser](../../ingestion/data-formats.md#orc-hadoop-parser) for [native batch ingestion](../../ingestion/native-batch.md) and [Hadoop batch ingestion](../../ingestion/hadoop.md), respectively. Please see corresponding docs for details. To use this extension, make sure to [include](../../development/extensions.md#loading-extensions) `druid-orc-extensions` in the extensions load list. ### Migration from 'contrib' extension This extension, first available in version 0.15.0, replaces the previous 'contrib' extension which was available until 0.14.0-incubating. While this extension can index any data the 'contrib' extension could, the JSON spec for the ingestion task is *incompatible*, and will need modified to work with the newer 'core' extension. To migrate to 0.15.0+: * In `inputSpec` of `ioConfig`, `inputFormat` must be changed from `"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"` to `"org.apache.orc.mapreduce.OrcInputFormat"` * The 'contrib' extension supported a `typeString` property, which provided the schema of the ORC file, of which was essentially required to have the types correct, but notably _not_ the column names, which facilitated column renaming. In the 'core' extension, column renaming can be achieved with [`flattenSpec`](../../ingestion/ingestion-spec.md#flattenspec). For example, `"typeString":"struct"` with the actual schema `struct<_col0:string,_col1:string>`, to preserve Druid schema would need replaced with: ```json "flattenSpec": { "fields": [ { "type": "path", "name": "time", "expr": "$._col0" }, { "type": "path", "name": "name", "expr": "$._col1" } ] ... } ``` * The 'contrib' extension supported a `mapFieldNameFormat` property, which provided a way to specify a dimension to flatten `OrcMap` columns with primitive types. This functionality has also been replaced with [`flattenSpec`](../../ingestion/ingestion-spec.md#flattenspec). For example: `"mapFieldNameFormat": "_"` for a dimension `nestedData_dim1`, to preserve Druid schema could be replaced with ```json "flattenSpec": { "fields": [ { "type": "path", "name": "nestedData_dim1", "expr": "$.nestedData.dim1" } ] ... } ```