---
layout: doc_page
---

# Orc

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-orc-extensions`.

This extension enables Druid to ingest and understand the Apache ORC data format in offline (batch) ingestion.

## Orc Hadoop Parser

This is for batch ingestion using the HadoopDruidIndexer. The `inputFormat` of `inputSpec` in `ioConfig` must be set to `"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"`.

|Field | Type | Description | Required|
|----------|-------------|----------------------------------------------------------------------------------------|---------|
|type | String | This should say `orc`. | yes|
|parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Any parse spec that extends ParseSpec can be used, but only its TimestampSpec and DimensionsSpec are used. | yes|
|typeString| String | String representation of the ORC struct type info. If not specified, it is auto-constructed from the parseSpec, but all metric columns are dropped. | no|

For example, a `typeString` for a string column `col1` and a string-array column `col2` is `"struct<col1:string,col2:array<string>>"`.

Currently, only Java primitive types and arrays of Java primitive types are supported, which means that among the compound [ORC types](https://orc.apache.org/docs/types.html) only `list` is supported (a list of lists is not supported).

An example Hadoop indexing task:

```json
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat",
        "paths": "/data/path/in/HDFS/"
      },
      "metadataUpdateSpec": {
        "type": "postgresql",
        "connectURI": "jdbc:postgresql://localhost/druid",
        "user": "druid",
        "password": "asdf",
        "segmentTable": "druid_segments"
      },
      "segmentOutputPath": "tmp/segments"
    },
    "dataSchema": {
      "dataSource": "no_metrics",
      "parser": {
        "type": "orc",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "time",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "name"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        },
        "typeString": "struct<time:string,name:string>"
      },
      "metricsSpec": [{
        "type": "count",
        "name": "count"
      }],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "ALL",
        "intervals": ["2015-12-31/2016-01-02"]
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "workingPath": "tmp/working_path",
      "partitionsSpec": {
        "targetPartitionSize": 5000000
      },
      "jobProperties": {},
      "leaveIntermediate": true
    }
  }
}
```

Almost all of the fields listed above are required, including `inputFormat` and `metadataUpdateSpec` (`type`, `connectURI`, `user`, `password`, `segmentTable`). Set `jobProperties` to make the HDFS paths timezone-independent, as shown below.
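As a minimal sketch of such a `jobProperties` setting, assuming a standard Hadoop MapReduce setup, you can pass `-Duser.timezone=UTC` to the map and reduce JVMs so that time-based path components do not depend on the cluster's local timezone. Adjust the option strings to match any JVM options your cluster already sets.

```json
"jobProperties": {
  "mapreduce.map.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
  "mapreduce.reduce.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8"
}
```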