druid/docs/content/ingestion/tasks.md

---
layout: doc_page
---
# Tasks
Tasks are run on middle managers and always operate on a single data source. Tasks are submitted using [POST requests](../design/indexing-service.html).

There are several different types of tasks.

Segment Creation Tasks
----------------------

### Hadoop Index Task

See [batch ingestion](../ingestion/batch-ingestion.html).

### Index Task

The Index Task is a simpler variation of the Index Hadoop task that is designed to be used for smaller data sets. The task executes within the indexing service and does not require an external Hadoop setup to use. The grammar of the index task is as follows:

```json
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "type" : "doubleSum",
          "name" : "added",
          "fieldName" : "added"
        },
        {
          "type" : "doubleSum",
          "name" : "deleted",
          "fieldName" : "deleted"
        },
        {
          "type" : "doubleSum",
          "name" : "delta",
          "fieldName" : "delta"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2013-08-31/2013-09-01" ]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "examples/indexing/",
        "filter" : "wikipedia_data.json"
       }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 5000000,
      "maxRowsInMemory" : 75000
    }
  }
}
```

#### Task Properties

|property|description|required?|
|--------|-----------|---------|
|type|The task type, this should always be "index".|yes|
|id|The task ID. If this is not explicitly specified, Druid generates the task ID using the name of the task file and date-time stamp. |no|
|spec|The ingestion spec. See below for more details. |yes|

#### DataSchema

This field is required.

See [Ingestion](../ingestion/index.html)

#### IOConfig

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|The task type, this should always be "index".|none|yes|
|firehose|Specify a [Firehose](../ingestion/firehose.html) here.|none|yes|
|appendToExisting|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. This will only work if the existing segment set has extendable-type shardSpecs (which can be forced by setting 'forceExtendableShardSpecs' in the tuning config).|false|no|
|skipFirehoseCaching|By default the IndexTask will fully read the supplied firehose to disk before processing the data. This prevents the task from doing multiple remote fetches and enforces determinism if more than one pass through the data is required. It also allows the task to retry fetching the data if the firehose throws an exception during reading. This requires sufficient disk space for the temporary cache.|false|no|

#### TuningConfig

The tuningConfig is optional and default parameters will be used if no tuningConfig is specified. See below for more details.

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|The task type, this should always be "index".|none|yes|
|targetPartitionSize|Used in sharding. Determines how many rows are in each segment.|5000000|no|
|maxRowsInMemory|Used in determining when intermediate persists to disk should occur.|75000|no|
|numShards|Directly specify the number of shards to create. If this is specified and 'intervals' is specified in the granularitySpec, the index task can skip the determine intervals/partitions pass through the data. numShards cannot be specified if targetPartitionSize is set.|null|no|
|indexSpec|defines segment storage format options to be used at indexing time, see [IndexSpec](#indexspec)|null|no|
|maxPendingPersists|Maximum number of persists that can be pending but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).|0 (meaning one persist can be running concurrently with ingestion, and none can be queued up)|no|
|buildV9Directly|Whether to build a v9 index directly instead of first building a v8 index and then converting it to v9 format.|true|no|
|forceExtendableShardSpecs|Forces use of extendable shardSpecs. Experimental feature intended for use with the [Kafka indexing service extension](../development/extensions-core/kafka-ingestion.html).|false|no|
|reportParseExceptions|If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped.|false|no|

#### IndexSpec

The indexSpec defines segment storage format options to be used at indexing time, such as bitmap type and column
compression formats. The indexSpec is optional and default parameters will be used if not specified.

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|bitmap|Object|Compression format for bitmap indexes. Should be a JSON object; see below for options.|no (defaults to Concise)|
|dimensionCompression|String|Compression format for dimension columns. Choose from `LZ4`, `LZF`, or `uncompressed`.|no (default == `LZ4`)|
|metricCompression|String|Compression format for metric columns. Choose from `LZ4`, `LZF`, `uncompressed`, or `none`.|no (default == `LZ4`)|
|longEncoding|String|Encoding format for metric and dimension columns with type long. Choose from `auto` or `longs`. `auto` encodes the values using offset or lookup table depending on column cardinality, and store them with variable size. `longs` stores the value as is with 8 bytes each.|no (default == `longs`)|

##### Bitmap types

For Concise bitmaps:

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|type|String|Must be `concise`.|yes|

For Roaring bitmaps:

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|type|String|Must be `roaring`.|yes|
|compressRunOnSerialization|Boolean|Use a run-length encoding where it is estimated as more space efficient.|no (default == `true`)|

Segment Merging Tasks
---------------------

### Append Task

Append tasks append a list of segments together into a single segment (one after the other). The grammar is:

```json
{
    "type": "append",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "segments": <JSON list of DataSegment objects to append>,
    "buildV9Directly": <true or false, default true>,
    "aggregations": <optional list of aggregators>
}
```

### Merge Task

Merge tasks merge a list of segments together. Any common timestamps are merged.
If rollup is disabled as part of ingestion, common timestamps are not merged and rows are reordered by their timestamp.

The grammar is:

```json
{
    "type": "merge",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "aggregations": <list of aggregators>,
    "rollup": <whether or not to rollup data during a merge>,
    "buildV9Directly": <true or false, default true>,
    "segments": <JSON list of DataSegment objects to merge>
}
```

Segment Destroying Tasks
------------------------

### Kill Task

Kill tasks delete all information about a segment and removes it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:

```json
{
    "type": "kill",
    "id": <task_id>,
    "dataSource": <task_datasource>,
    "interval" : <all_segments_in_this_interval_will_die!>
}
```

Misc. Tasks
-----------

### Version Converter Task
The convert task suite takes active segments and will recompress them using a new IndexSpec. This is handy when doing activities like migrating from Concise to Roaring, or adding dimension compression to old segments.

Upon success the new segments will have the same version as the old segment with `_converted` appended. A convert task may be run against the same interval for the same datasource multiple times. Each execution will append another `_converted` to the version for the segments

There are two types of conversion tasks. One is the Hadoop convert task, and the other is the indexing service convert task. The Hadoop convert task runs on a hadoop cluster, and simply leaves a task monitor on the indexing service (similar to the hadoop batch task). The indexing service convert task runs the actual conversion on the indexing service.

#### Hadoop Convert Segment Task
```json
{
  "type": "hadoop_convert_segment",
  "dataSource":"some_datasource",
  "interval":"2013/2015",
  "indexSpec":{"bitmap":{"type":"concise"},"dimensionCompression":"lz4","metricCompression":"lz4"},
  "force": true,
  "validate": false,
  "distributedSuccessCache":"hdfs://some-hdfs-nn:9000/user/jobrunner/cache",
  "jobPriority":"VERY_LOW",
  "segmentOutputPath":"s3n://somebucket/somekeyprefix"
}
```

The values are described below.

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|`type`|String|Convert task identifier|Yes: `hadoop_convert_segment`|
|`dataSource`|String|The datasource to search for segments|Yes|
|`interval`|Interval string|The interval in the datasource to look for segments|Yes|
|`indexSpec`|json|The compression specification for the index|Yes|
|`force`|boolean|Forces the convert task to continue even if binary versions indicate it has been updated recently (you probably want to do this)|No|
|`validate`|boolean|Runs validation between the old and new segment before reporting task success|No|
|`distributedSuccessCache`|URI|A location where hadoop should put intermediary files.|Yes|
|`jobPriority`|`org.apache.hadoop.mapred.JobPriority` as String|The priority to set for the hadoop job|No|
|`segmentOutputPath`|URI|A base uri for the segment to be placed. Same format as other places a segment output path is needed|Yes|


#### Indexing Service Convert Segment Task
```json
{
  "type": "convert_segment",
  "dataSource":"some_datasource",
  "interval":"2013/2015",
  "indexSpec":{"bitmap":{"type":"concise"},"dimensionCompression":"lz4","metricCompression":"lz4"},
  "force": true,
  "validate": false
}
```

|Field|Type|Description|Required (default)|
|-----|----|-----------|--------|
|`type`|String|Convert task identifier|Yes: `convert_segment`|
|`dataSource`|String|The datasource to search for segments|Yes|
|`interval`|Interval string|The interval in the datasource to look for segments|Yes|
|`indexSpec`|json|The compression specification for the index|Yes|
|`force`|boolean|Forces the convert task to continue even if binary versions indicate it has been updated recently (you probably want to do this)|No (false)|
|`validate`|boolean|Runs validation between the old and new segment before reporting task success|No (true)|

Unlike the hadoop convert task, the indexing service task draws its output path from the indexing service's configuration.

### Noop Task

These tasks start, sleep for a time and are used only for testing. The available grammar is:

```json
{
    "type": "noop",
    "id": <optional_task_id>,
    "interval" : <optional_segment_interval>,
    "runTime" : <optional_millis_to_sleep>,
    "firehose": <optional_firehose_to_test_connect>
}
```

Locking
-------

Once an overlord node accepts a task, a lock is created for the data source and interval specified in the task. 
Tasks do not need to explicitly release locks, they are released upon task completion. Tasks may potentially release 
locks early if they desire. Tasks ids are unique by naming them using UUIDs or the timestamp in which the task was created. 
Tasks are also part of a "task group", which is a set of tasks that can share interval locks.
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`---`
			`layout: doc_page`
			`---`
			`# Tasks`
			`Tasks are run on middle managers and always operate on a single data source. Tasks are submitted using [POST requests](../design/indexing-service.html).`

			`There are several different types of tasks.`

			`Segment Creation Tasks`
			`----------------------`

new quickstart 2016-01-06 00:27:52 -05:00			`### Hadoop Index Task`

			`See [batch ingestion](../ingestion/batch-ingestion.html).`

renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`### Index Task`

			`The Index Task is a simpler variation of the Index Hadoop task that is designed to be used for smaller data sets. The task executes within the indexing service and does not require an external Hadoop setup to use. The grammar of the index task is as follows:`

			```json
			`{`
			`"type" : "index",`
			`"spec" : {`
			`"dataSchema" : {`
			`"dataSource" : "wikipedia",`
			`"parser" : {`
			`"type" : "string",`
			`"parseSpec" : {`
			`"format" : "json",`
			`"timestampSpec" : {`
			`"column" : "timestamp",`
			`"format" : "auto"`
			`},`
			`"dimensionsSpec" : {`
			`"dimensions": ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"],`
			`"dimensionExclusions" : [],`
			`"spatialDimensions" : []`
			`}`
			`}`
			`},`
			`"metricsSpec" : [`
			`{`
			`"type" : "count",`
			`"name" : "count"`
			`},`
			`{`
			`"type" : "doubleSum",`
			`"name" : "added",`
			`"fieldName" : "added"`
			`},`
			`{`
			`"type" : "doubleSum",`
			`"name" : "deleted",`
			`"fieldName" : "deleted"`
			`},`
			`{`
			`"type" : "doubleSum",`
			`"name" : "delta",`
			`"fieldName" : "delta"`
			`}`
			`],`
			`"granularitySpec" : {`
			`"type" : "uniform",`
			`"segmentGranularity" : "DAY",`
			`"queryGranularity" : "NONE",`
			`"intervals" : [ "2013-08-31/2013-09-01" ]`
			`}`
			`},`
			`"ioConfig" : {`
			`"type" : "index",`
			`"firehose" : {`
			`"type" : "local",`
			`"baseDir" : "examples/indexing/",`
			`"filter" : "wikipedia_data.json"`
			`}`
			`},`
			`"tuningConfig" : {`
			`"type" : "index",`
IndexTask improvements (#3611) * index task improvements * code review changes * add null check 2017-01-18 17:24:37 -05:00			`"targetPartitionSize" : 5000000,`
			`"maxRowsInMemory" : 75000`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`}`
			`}`
			`}`
			```

			`#### Task Properties`

			`\|property\|description\|required?\|`
			`\|--------\|-----------\|---------\|`
			`\|type\|The task type, this should always be "index".\|yes\|`
			`\|id\|The task ID. If this is not explicitly specified, Druid generates the task ID using the name of the task file and date-time stamp. \|no\|`
			`\|spec\|The ingestion spec. See below for more details. \|yes\|`

			`#### DataSchema`

			`This field is required.`

shorten links and file names * remove redundant parts in file names * delete unsupported "Druid-Personal-Demo-Cluster" 2015-05-28 20:10:34 -04:00			`See [Ingestion](../ingestion/index.html)`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
			`#### IOConfig`

IndexTask improvements (#3611) * index task improvements * code review changes * add null check 2017-01-18 17:24:37 -05:00			`\|property\|description\|default\|required?\|`
			`\|--------\|-----------\|-------\|---------\|`
			`\|type\|The task type, this should always be "index".\|none\|yes\|`
			`\|firehose\|Specify a [Firehose](../ingestion/firehose.html) here.\|none\|yes\|`
			`\|appendToExisting\|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. This will only work if the existing segment set has extendable-type shardSpecs (which can be forced by setting 'forceExtendableShardSpecs' in the tuning config).\|false\|no\|`
			`\|skipFirehoseCaching\|By default the IndexTask will fully read the supplied firehose to disk before processing the data. This prevents the task from doing multiple remote fetches and enforces determinism if more than one pass through the data is required. It also allows the task to retry fetching the data if the firehose throws an exception during reading. This requires sufficient disk space for the temporary cache.\|false\|no\|`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
			`#### TuningConfig`

			`The tuningConfig is optional and default parameters will be used if no tuningConfig is specified. See below for more details.`

			`\|property\|description\|default\|required?\|`
			`\|--------\|-----------\|-------\|---------\|`
IndexTask improvements (#3611) * index task improvements * code review changes * add null check 2017-01-18 17:24:37 -05:00			`\|type\|The task type, this should always be "index".\|none\|yes\|`
			`\|targetPartitionSize\|Used in sharding. Determines how many rows are in each segment.\|5000000\|no\|`
			`\|maxRowsInMemory\|Used in determining when intermediate persists to disk should occur.\|75000\|no\|`
			`\|numShards\|Directly specify the number of shards to create. If this is specified and 'intervals' is specified in the granularitySpec, the index task can skip the determine intervals/partitions pass through the data. numShards cannot be specified if targetPartitionSize is set.\|null\|no\|`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`\|indexSpec\|defines segment storage format options to be used at indexing time, see [IndexSpec](#indexspec)\|null\|no\|`
IndexTask improvements (#3611) * index task improvements * code review changes * add null check 2017-01-18 17:24:37 -05:00			`\|maxPendingPersists\|Maximum number of persists that can be pending but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).\|0 (meaning one persist can be running concurrently with ingestion, and none can be queued up)\|no\|`
Make buildV9Directly the default. (#3688) 2016-11-14 12:29:32 -05:00			`\|buildV9Directly\|Whether to build a v9 index directly instead of first building a v8 index and then converting it to v9 format.\|true\|no\|`
Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473) Fixes #3241. 2016-09-21 15:40:04 -04:00			`\|forceExtendableShardSpecs\|Forces use of extendable shardSpecs. Experimental feature intended for use with the [Kafka indexing service extension](../development/extensions-core/kafka-ingestion.html).\|false\|no\|`
IndexTask improvements (#3611) * index task improvements * code review changes * add null check 2017-01-18 17:24:37 -05:00			`\|reportParseExceptions\|If true, exceptions encountered during parsing will be thrown and will halt ingestion; if false, unparseable rows and fields will be skipped.\|false\|no\|`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
			`#### IndexSpec`

Configurable compressRunOnSerialization for Roaring bitmaps. (#3228) Defaults to true, which is a change in behavior (this used to be false and unconfigurable). 2016-07-08 00:54:19 -04:00			`The indexSpec defines segment storage format options to be used at indexing time, such as bitmap type and column`
			`compression formats. The indexSpec is optional and default parameters will be used if not specified.`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
Configurable compressRunOnSerialization for Roaring bitmaps. (#3228) Defaults to true, which is a change in behavior (this used to be false and unconfigurable). 2016-07-08 00:54:19 -04:00			`\|Field\|Type\|Description\|Required\|`
			`\|-----\|----\|-----------\|--------\|`
			`\|bitmap\|Object\|Compression format for bitmap indexes. Should be a JSON object; see below for options.\|no (defaults to Concise)\|`
			\|dimensionCompression\|String\|Compression format for dimension columns. Choose from `LZ4`, `LZF`, or `uncompressed`.\|no (default == `LZ4`)\|
Adds long compression methods (#3148) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2 2016-08-30 19:17:46 -04:00			\|metricCompression\|String\|Compression format for metric columns. Choose from `LZ4`, `LZF`, `uncompressed`, or `none`.\|no (default == `LZ4`)\|
			\|longEncoding\|String\|Encoding format for metric and dimension columns with type long. Choose from `auto` or `longs`. `auto` encodes the values using offset or lookup table depending on column cardinality, and store them with variable size. `longs` stores the value as is with 8 bytes each.\|no (default == `longs`)\|
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
Configurable compressRunOnSerialization for Roaring bitmaps. (#3228) Defaults to true, which is a change in behavior (this used to be false and unconfigurable). 2016-07-08 00:54:19 -04:00			`##### Bitmap types`

			`For Concise bitmaps:`

			`\|Field\|Type\|Description\|Required\|`
			`\|-----\|----\|-----------\|--------\|`
			\|type\|String\|Must be `concise`.\|yes\|

			`For Roaring bitmaps:`

			`\|Field\|Type\|Description\|Required\|`
			`\|-----\|----\|-----------\|--------\|`
			\|type\|String\|Must be `roaring`.\|yes\|
			\|compressRunOnSerialization\|Boolean\|Use a run-length encoding where it is estimated as more space efficient.\|no (default == `true`)\|
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
			`Segment Merging Tasks`
			`---------------------`

			`### Append Task`

			`Append tasks append a list of segments together into a single segment (one after the other). The grammar is:`

			```json
			`{`
			`"type": "append",`
			`"id": <task_id>,`
			`"dataSource": <task_datasource>,`
adding aggregators to segment metadata 2015-11-05 23:52:10 -05:00			`"segments": <JSON list of DataSegment objects to append>,`
buildV9Directly in MergeTask and AppendTask (#3976) * buildV9Directly in MergeTask and AppendTask * add doc 2017-02-28 13:04:32 -05:00			`"buildV9Directly": <true or false, default true>,`
adding aggregators to segment metadata 2015-11-05 23:52:10 -05:00			`"aggregations": <optional list of aggregators>`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`}`
			```

			`### Merge Task`

ability to not rollup at index time, make pre aggregation an option (#3020) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug 2016-08-02 14:13:05 -04:00			`Merge tasks merge a list of segments together. Any common timestamps are merged.`
			`If rollup is disabled as part of ingestion, common timestamps are not merged and rows are reordered by their timestamp.`

			`The grammar is:`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
			```json
			`{`
			`"type": "merge",`
			`"id": <task_id>,`
			`"dataSource": <task_datasource>,`
update doc about aggregation field in merge task and a null check 2015-09-24 23:25:07 -04:00			`"aggregations": <list of aggregators>,`
ability to not rollup at index time, make pre aggregation an option (#3020) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug 2016-08-02 14:13:05 -04:00			`"rollup": <whether or not to rollup data during a merge>,`
buildV9Directly in MergeTask and AppendTask (#3976) * buildV9Directly in MergeTask and AppendTask * add doc 2017-02-28 13:04:32 -05:00			`"buildV9Directly": <true or false, default true>,`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`"segments": <JSON list of DataSegment objects to merge>`
			`}`
			```

			`Segment Destroying Tasks`
			`------------------------`

			`### Kill Task`

			`Kill tasks delete all information about a segment and removes it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:`

			```json
			`{`
			`"type": "kill",`
			`"id": <task_id>,`
			`"dataSource": <task_datasource>,`
			`"interval" : <all_segments_in_this_interval_will_die!>`
			`}`
			```

			`Misc. Tasks`
			`-----------`

			`### Version Converter Task`
Add convert.md to document conversion task 2015-07-24 19:07:15 -04:00			`The convert task suite takes active segments and will recompress them using a new IndexSpec. This is handy when doing activities like migrating from Concise to Roaring, or adding dimension compression to old segments.`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
Add convert.md to document conversion task 2015-07-24 19:07:15 -04:00			Upon success the new segments will have the same version as the old segment with `_converted` appended. A convert task may be run against the same interval for the same datasource multiple times. Each execution will append another `_converted` to the version for the segments
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00
Add convert.md to document conversion task 2015-07-24 19:07:15 -04:00			`There are two types of conversion tasks. One is the Hadoop convert task, and the other is the indexing service convert task. The Hadoop convert task runs on a hadoop cluster, and simply leaves a task monitor on the indexing service (similar to the hadoop batch task). The indexing service convert task runs the actual conversion on the indexing service.`
new quickstart 2016-01-06 00:27:52 -05:00
			`#### Hadoop Convert Segment Task`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			```json
			`{`
Add convert.md to document conversion task 2015-07-24 19:07:15 -04:00			`"type": "hadoop_convert_segment",`
			`"dataSource":"some_datasource",`
			`"interval":"2013/2015",`
			`"indexSpec":{"bitmap":{"type":"concise"},"dimensionCompression":"lz4","metricCompression":"lz4"},`
			`"force": true,`
			`"validate": false,`
			`"distributedSuccessCache":"hdfs://some-hdfs-nn:9000/user/jobrunner/cache",`
			`"jobPriority":"VERY_LOW",`
			`"segmentOutputPath":"s3n://somebucket/somekeyprefix"`
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`}`
			```

Add convert.md to document conversion task 2015-07-24 19:07:15 -04:00			`The values are described below.`

			`\|Field\|Type\|Description\|Required\|`
			`\|-----\|----\|-----------\|--------\|`
			\|`type`\|String\|Convert task identifier\|Yes: `hadoop_convert_segment`\|
			\|`dataSource`\|String\|The datasource to search for segments\|Yes\|
			\|`interval`\|Interval string\|The interval in the datasource to look for segments\|Yes\|
			\|`indexSpec`\|json\|The compression specification for the index\|Yes\|
			\|`force`\|boolean\|Forces the convert task to continue even if binary versions indicate it has been updated recently (you probably want to do this)\|No\|
			\|`validate`\|boolean\|Runs validation between the old and new segment before reporting task success\|No\|
			\|`distributedSuccessCache`\|URI\|A location where hadoop should put intermediary files.\|Yes\|
			\|`jobPriority`\|`org.apache.hadoop.mapred.JobPriority` as String\|The priority to set for the hadoop job\|No\|
			\|`segmentOutputPath`\|URI\|A base uri for the segment to be placed. Same format as other places a segment output path is needed\|Yes\|


new quickstart 2016-01-06 00:27:52 -05:00			`#### Indexing Service Convert Segment Task`
Add convert.md to document conversion task 2015-07-24 19:07:15 -04:00			```json
			`{`
			`"type": "convert_segment",`
			`"dataSource":"some_datasource",`
			`"interval":"2013/2015",`
			`"indexSpec":{"bitmap":{"type":"concise"},"dimensionCompression":"lz4","metricCompression":"lz4"},`
			`"force": true,`
			`"validate": false`
			`}`
			```

			`\|Field\|Type\|Description\|Required (default)\|`
			`\|-----\|----\|-----------\|--------\|`
			\|`type`\|String\|Convert task identifier\|Yes: `convert_segment`\|
			\|`dataSource`\|String\|The datasource to search for segments\|Yes\|
			\|`interval`\|Interval string\|The interval in the datasource to look for segments\|Yes\|
			\|`indexSpec`\|json\|The compression specification for the index\|Yes\|
			\|`force`\|boolean\|Forces the convert task to continue even if binary versions indicate it has been updated recently (you probably want to do this)\|No (false)\|
			\|`validate`\|boolean\|Runs validation between the old and new segment before reporting task success\|No (true)\|

			`Unlike the hadoop convert task, the indexing service task draws its output path from the indexing service's configuration.`
new quickstart 2016-01-06 00:27:52 -05:00
renaming all *.md filenames to only have lowercase and dashes so that they are editable on case-insensitive os as well 2015-05-05 17:07:32 -04:00			`### Noop Task`

			`These tasks start, sleep for a time and are used only for testing. The available grammar is:`

			```json
			`{`
			`"type": "noop",`
			`"id": <optional_task_id>,`
			`"interval" : <optional_segment_interval>,`
			`"runTime" : <optional_millis_to_sleep>,`
			`"firehose": <optional_firehose_to_test_connect>`
			`}`
			```

			`Locking`
			`-------`
new quickstart 2016-01-06 00:27:52 -05:00
			`Once an overlord node accepts a task, a lock is created for the data source and interval specified in the task.`
			`Tasks do not need to explicitly release locks, they are released upon task completion. Tasks may potentially release`
			`locks early if they desire. Tasks ids are unique by naming them using UUIDs or the timestamp in which the task was created.`
			`Tasks are also part of a "task group", which is a set of tasks that can share interval locks.`