Remove the metadataUpdateSpec from specfile (#3973)

Get rid of the metadataUpdateSpec section in the json example to ingest parquet into druid. When this element is present, it will fail start an indexing job.
2017-03-01 23:24:36 +01:00 · 2017-03-01 23:24:36 +01:00 · add17fa7db
parent 8316b4f48f
commit add17fa7db
1 changed files with 75 additions and 16 deletions
--- a/docs/content/development/extensions-contrib/parquet.md
+++ b/docs/content/development/extensions-contrib/parquet.md
@ -2,15 +2,15 @@
 layout: doc_page
 ---
-# Parquet
+# Ingestion using Parquet format
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-avro-extensions` and `druid-parquet-extensions`.
+To use this extension, make sure to [include](../../operations/including-extensions.html) both `druid-avro-extensions` and `druid-parquet-extensions`.
 This extension enables Druid to ingest and understand the Apache Parquet data format offline.
 ## Parquet Hadoop Parser
-This is for batch ingestion using the HadoopDruidIndexer. The inputFormat of inputSpec in ioConfig must be set to `"io.druid.data.input.parquet.DruidParquetInputFormat"`. Make sure also to include "io.druid.extensions:druid-avro-extensions" as an extension.
+This is for batch ingestion using the HadoopDruidIndexer. The inputFormat of `inputSpec` in `ioConfig` must be set to `"io.druid.data.input.parquet.DruidParquetInputFormat"`.
 |Field     | Type        | Description                                                                            | Required|
 |----------|-------------|----------------------------------------------------------------------------------------|---------|
@ -18,7 +18,77 @@ This is for batch ingestion using the HadoopDruidIndexer. The inputFormat of inp
 | parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Should be a timeAndDims parseSpec. | yes |
 | binaryAsString | Boolean | Specifies if the bytes parquet column should be converted to strings. | no(default == false) |
-For example:
+### Example json for overlord
 When posting the index job to the overlord, setting the correct `inputFormat` is required to switch to parquet ingestion. Make sure to set `jobProperties` to make hdfs path timezone unrelated:
 ```json
 {
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
        "paths": "no_metrics"
      }
    },
    "dataSchema": {
      "dataSource": "no_metrics",
      "parser": {
        "type": "parquet",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "time",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "name"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [{
        "type": "count",
        "name": "count"
      }],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "ALL",
        "intervals": ["2015-12-31/2016-01-02"]
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "targetPartitionSize": 5000000
      },
      "jobProperties" : {},
      "leaveIntermediate": true
    }
  }
 }
 ```
 ### Example json for standalone jvm
 When using a standalone JVM instead, additional configuration fields are required. You can just fire a hadoop job with your local compiled jars like:
 ```bash
 HADOOP_CLASS_PATH=`hadoop classpath | sed s/*.jar/*/g`
 java -Xmx32m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath config/overlord:config/_common:lib/*:$HADOOP_CLASS_PATH:extensions/druid-avro-extensions/*  \
  io.druid.cli.Main index hadoop \
  wikipedia_hadoop_parquet_job.json
 ```
 An example index json when using the standalone JVM:
 ```json
 {
@ -83,15 +153,4 @@ For example:
 }
 ```
-Almost all the fields listed above are required, including `inputFormat`, `metadataUpdateSpec`(`type`, `connectURI`, `user`, `password`, `segmentTable`). Set `jobProperties` to make hdfs path timezone unrelated.
+Almost all the fields listed above are required, including `inputFormat`, `metadataUpdateSpec`(`type`, `connectURI`, `user`, `password`, `segmentTable`).
 It is no need to make your cluster to update to SNAPSHOT, you can just fire a hadoop job with your local compiled jars like:
 ```bash
 HADOOP_CLASS_PATH=`hadoop classpath | sed s/*.jar/*/g`
 java -Xmx32m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath config/overlord:config/_common:lib/*:$HADOOP_CLASS_PATH:extensions/druid-avro-extensions/*  \
  io.druid.cli.Main index hadoop \
  wikipedia_hadoop_parquet_job.json
 ```