---
layout: doc_page
---
# Tasks
Tasks are run on middle managers and always operate on a single data source. Tasks are submitted using [POST requests](Indexing-Service.html).
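
As a rough illustration of submitting a task, the Python sketch below wraps a task definition in a POST request. The host and port are assumptions for a local overlord, and the helper names `build_task_request` and `submit_task` are hypothetical; check the Indexing Service page for the endpoint your deployment exposes.

```python
import json
import urllib.request

# Assumed overlord address; adjust host, port, and path for your cluster.
OVERLORD_TASK_URL = "http://localhost:8090/druid/indexer/v1/task"


def build_task_request(task, url=OVERLORD_TASK_URL):
    """Build (but do not send) a POST request carrying the task as JSON."""
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(task, url=OVERLORD_TASK_URL):
    """Send the task to the overlord and return the decoded JSON response."""
    with urllib.request.urlopen(build_task_request(task, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the payload before anything reaches the cluster.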

There are several different types of tasks.

Segment Creation Tasks
----------------------

### Index Task

The Index Task is a simpler variation of the Index Hadoop task that is designed to be used for smaller data sets. The task executes within the indexing service and does not require an external Hadoop setup to use. The grammar of the index task is as follows:

```json
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "type" : "doubleSum",
          "name" : "added",
          "fieldName" : "added"
        },
        {
          "type" : "doubleSum",
          "name" : "deleted",
          "fieldName" : "deleted"
        },
        {
          "type" : "doubleSum",
          "name" : "delta",
          "fieldName" : "delta"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2013-08-31/2013-09-01" ]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "examples/indexing/",
        "filter" : "wikipedia_data.json"
       }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : -1,
      "rowFlushBoundary" : 0,
      "numShards": 1
    }
  }
}
```

#### Task Properties

|property|description|required?|
|--------|-----------|---------|
|type|The task type. This should always be "index".|yes|
|id|The task ID. If this is not explicitly specified, Druid generates the task ID using the name of the task file and date-time stamp. |no|
|spec|The ingestion spec. See below for more details. |yes|

#### DataSchema

This field is required.

See [Ingestion](Ingestion.html)

#### IOConfig

This field is required. You can specify a type of [Firehose](Firehose.html) here.

#### TuningConfig

The tuningConfig is optional and default parameters will be used if no tuningConfig is specified. See below for more details.

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|The task type. This should always be "index".|None.|yes|
|targetPartitionSize|Used in sharding. Determines how many rows are in each segment. Set this to -1 to use numShards instead for sharding.|5000000|no|
|rowFlushBoundary|Used in determining when intermediate persist should occur to disk.|500000|no|
|numShards|Directly specify the number of shards to create. You can skip the intermediate persist step if you specify the number of shards you want and set targetPartitionSize=-1.|null|no|
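
The two sharding modes in the table are mutually exclusive in practice: either let `targetPartitionSize` decide how many rows go into each segment, or set it to -1 and give `numShards` directly. A minimal Python sketch of that choice follows; the helper name `index_tuning_config` and its arguments are hypothetical, and omitted properties are left for Druid to default.

```python
def index_tuning_config(num_shards=None, target_partition_size=None,
                        row_flush_boundary=None):
    """Return a tuningConfig dict for the index task.

    Pass num_shards to fix the shard count (targetPartitionSize becomes -1),
    or target_partition_size to shard by row count, but not both.
    """
    if num_shards is not None and target_partition_size is not None:
        raise ValueError("choose either num_shards or target_partition_size")

    config = {"type": "index"}
    if num_shards is not None:
        # -1 tells the task to use numShards instead of a row target.
        config["targetPartitionSize"] = -1
        config["numShards"] = num_shards
    elif target_partition_size is not None:
        config["targetPartitionSize"] = target_partition_size
    if row_flush_boundary is not None:
        config["rowFlushBoundary"] = row_flush_boundary
    return config
```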

### Index Hadoop Task

The Hadoop Index Task is used to index larger data sets that require the parallelization and processing power of a Hadoop cluster.

```
{
  "type" : "index_hadoop",
  "spec": <Hadoop index spec>
}
```

|property|description|required?|
|--------|-----------|---------|
|type|The task type. This should always be "index_hadoop".|yes|
|spec|A Hadoop Index Spec. See [Batch Ingestion](Batch-ingestion.html)|yes|
|hadoopCoordinates|The Maven \<groupId\>:\<artifactId\>:\<version\> of Hadoop to use. The default is "org.apache.hadoop:hadoop-client:2.3.0".|no|


The Hadoop Index Config submitted as part of a Hadoop Index Task is identical to the Hadoop Index Config used by the `HadoopBatchIndexer`, except that three fields must be omitted: `segmentOutputPath`, `workingPath`, and `updaterJobSpec`. The Indexing Service sets these fields internally.
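
If you already have a config written for the `HadoopBatchIndexer`, one way to reuse it is to strip the three fields before wrapping it in a task. The Python sketch below does this recursively so that it makes no assumption about where in the config the fields sit; the function names are hypothetical.

```python
# Fields the Indexing Service sets itself, as listed above.
SERVICE_MANAGED_FIELDS = {"segmentOutputPath", "workingPath", "updaterJobSpec"}


def strip_service_managed_fields(node):
    """Return a copy of a JSON-like structure without the managed fields."""
    if isinstance(node, dict):
        return {
            key: strip_service_managed_fields(value)
            for key, value in node.items()
            if key not in SERVICE_MANAGED_FIELDS
        }
    if isinstance(node, list):
        return [strip_service_managed_fields(item) for item in node]
    return node


def hadoop_index_task(hadoop_spec):
    """Wrap a cleaned Hadoop index spec in an index_hadoop task."""
    return {
        "type": "index_hadoop",
        "spec": strip_service_managed_fields(hadoop_spec),
    }
```

The input is not modified, so the same file can still be used with the standalone indexer.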

#### Using your own Hadoop distribution

Druid is compiled against Apache hadoop-client 2.3.0. However, if you use a different flavor of Hadoop that is API compatible with hadoop-client 2.3.0, you should only have to change the hadoopCoordinates property to point to the Maven artifact used by your distribution. For versions that are not API compatible, see [Other Hadoop](Other-Hadoop.html).

#### Resolving dependency conflicts running HadoopIndexTask

Currently, the HadoopIndexTask creates a single classpath to run the HadoopDruidIndexerJob, which can lead to version conflicts between various dependencies of Druid, extension modules, and Hadoop's own dependencies.

The Hadoop index task puts Druid's dependencies first on the classpath, followed by any extension dependencies, with Hadoop's dependencies last.

If you are having trouble with an extension in HadoopIndexTask, Druid or one of its dependencies may depend on a different version of a library than your extension does, and Druid's version overrides the one in your extension. In that case you probably want to build your own Druid version and override the offending library by adding an explicit dependency to the pom.xml of each Druid sub-module that depends on it.

### Realtime Index Task

The indexing service can also run real-time tasks. These tasks effectively transform a middle manager into a real-time node. We introduced real-time tasks as a way to programmatically add new real-time data sources without needing to manually add nodes. We recommend you use the library [tranquility](https://github.com/metamx/tranquility) to programmatically manage generating real-time index tasks. The grammar for the real-time task is as follows:

```json
{
  "type": "index_realtime",
  "id": "example",
  "resource": {
    "availabilityGroup": "someGroup",
    "requiredCapacity": 1
  },
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "iso"
          },
          "dimensionsSpec": {
            "dimensions": [
              "page",
              "language",
              "user",
              "unpatrolled",
              "newPage",
              "robot",
              "anonymous",
              "namespace",
              "continent",
              "country",
              "region",
              "city"
            ],
            "dimensionExclusions": [

            ],
            "spatialDimensions": [

            ]
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "doubleSum",
          "name": "added",
          "fieldName": "added"
        },
        {
          "type": "doubleSum",
          "name": "deleted",
          "fieldName": "deleted"
        },
        {
          "type": "doubleSum",
          "name": "delta",
          "fieldName": "delta"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE"
      }
    },
    "ioConfig": {
      "type": "realtime",
      "firehose": {
        "type": "kafka-0.7.2",
        "consumerProps": {
          "zk.connect": "zk_connect_string",
          "zk.connectiontimeout.ms": "15000",
          "zk.sessiontimeout.ms": "15000",
          "zk.synctime.ms": "5000",
          "groupid": "consumer-group",
          "fetch.size": "1048586",
          "autooffset.reset": "largest",
          "autocommit.enable": "false"
        },
        "feed": "your_kafka_topic"
      },
      "plumber": {
        "type": "realtime"
      }
    },
    "tuningConfig": {
      "type": "realtime",
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m",
      "windowPeriod": "PT10m",
      "basePersistDirectory": "\/tmp\/realtime\/basePersist",
      "rejectionPolicy": {
        "type": "serverTime"
      }
    }
  }
}
```

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|id|String|The ID of the task.|no|
|resource|JSON object|Used for high availability purposes.|no|
|availabilityGroup|String|A unique identifier for the task. Tasks with the same availability group will always run on different middle managers. Used mainly for replication.|yes|
|requiredCapacity|Integer|How much middle manager capacity this task will take.|yes|

For schema, windowPeriod, segmentGranularity, and other configuration information, see [Realtime Ingestion](Realtime-ingestion.html). For firehose configuration, see [Firehose](Firehose.html).
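
Because tasks that share an availability group never land on the same middle manager, replicas of a real-time task can be expressed as copies that differ only in `id`. The following Python sketch illustrates the idea; the function name and the ID naming scheme are hypothetical, and in practice tranquility manages this for you.

```python
import copy


def replica_tasks(base_task, availability_group, replicas=2):
    """Return copies of a real-time task that share one availability group."""
    tasks = []
    for n in range(replicas):
        task = copy.deepcopy(base_task)
        # Hypothetical naming scheme; any unique ID per replica would do.
        task["id"] = "{}_replica{}".format(availability_group, n)
        task["resource"] = {
            "availabilityGroup": availability_group,
            "requiredCapacity": 1,
        }
        tasks.append(task)
    return tasks
```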


Segment Merging Tasks
---------------------

### Append Task

Append tasks append a list of segments together into a single segment (one after the other). The grammar is:

```json
{
    "type": "append",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "segments": <JSON list of DataSegment objects to append>
}
```

### Merge Task

Merge tasks merge a list of segments together. Any common timestamps are merged. The grammar is:

```json
{
    "type": "merge",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "segments": <JSON list of DataSegment objects to merge>
}
```

Segment Destroying Tasks
------------------------

### Delete Task

Delete tasks create empty segments with no data. The grammar is:

```json
{
    "type": "delete",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "segments": <JSON list of DataSegment objects to delete>
}
```

### Kill Task

Kill tasks delete all information about a segment and remove it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:

```json
{
    "type": "kill",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "interval" : <all_segments_in_this_interval_will_die!>
}
```
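
As a concrete instance of the grammar above, this Python sketch builds a kill task for a date range, using the same `start/end` interval notation as the index task example. The helper name is hypothetical, and treating `id` as optional is an assumption carried over from the index task.

```python
from datetime import date


def kill_task(data_source, start, end, task_id=None):
    """Build a kill task covering the interval from start to end."""
    if not start < end:
        raise ValueError("interval start must be before its end")
    task = {
        "type": "kill",
        "dataSource": data_source,
        # ISO 8601 interval, e.g. "2013-08-31/2013-09-01"
        "interval": "{}/{}".format(start.isoformat(), end.isoformat()),
    }
    if task_id is not None:
        task["id"] = task_id
    return task
```

Since a kill task removes segments from deep storage, it is worth printing the generated interval and checking it before submitting.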

Misc. Tasks
-----------

### Version Converter Task

These tasks convert segments from an existing older index version to the latest index version. The available grammar is:

```json
{
    "type": "version_converter",
    "id": <task_id>,
    "groupId" : <task_group_id>,
    "dataSource": <task_datasource>,
    "interval" : <segment_interval>,
    "segment": <JSON DataSegment object to convert>
}
```

### Noop Task

These tasks start, sleep for a time and are used only for testing. The available grammar is:

```json
{
    "type": "noop",
    "id": <optional_task_id>,
    "interval" : <optional_segment_interval>,
    "runTime" : <optional_millis_to_sleep>,
    "firehose": <optional_firehose_to_test_connect>
}
```
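
Since every noop field is optional, a small builder can simply leave out whatever is not given. A minimal Python sketch, with a hypothetical helper name:

```python
def noop_task(task_id=None, interval=None, run_time_ms=None, firehose=None):
    """Build a noop task, including only the optional fields that are set."""
    optional = {
        "id": task_id,
        "interval": interval,
        "runTime": run_time_ms,  # milliseconds to sleep
        "firehose": firehose,
    }
    task = {"type": "noop"}
    task.update({key: value for key, value in optional.items() if value is not None})
    return task
```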

Locking
-------
Once an overlord node accepts a task, a lock is created for the data source and interval specified in the task. Tasks do not need to explicitly release locks; they are released upon task completion. Tasks may release locks early if they choose. Task IDs are made unique by including a UUID or the timestamp at which the task was created. Tasks are also part of a "task group", which is a set of tasks that can share interval locks.
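
To make the locking rule concrete, the Python sketch below models when two tasks would contend for a lock: same data source, overlapping intervals, and different task groups. This is a simplified illustration of the description above, not Druid's actual lock implementation; the dict keys and function names are assumptions.

```python
from datetime import datetime


def parse_interval(interval):
    """Split an ISO 8601 'start/end' interval into two datetimes."""
    start, end = interval.split("/")
    return datetime.fromisoformat(start), datetime.fromisoformat(end)


def locks_conflict(task_a, task_b):
    """Return True if the two tasks would need the same lock.

    Each task is a dict with 'dataSource', 'interval', and 'groupId'.
    """
    if task_a["dataSource"] != task_b["dataSource"]:
        return False
    if task_a["groupId"] == task_b["groupId"]:
        # Tasks in one task group can share interval locks.
        return False
    a_start, a_end = parse_interval(task_a["interval"])
    b_start, b_end = parse_interval(task_b["interval"])
    # Half-open intervals overlap when each starts before the other ends.
    return a_start < b_end and b_start < a_end
```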