The Index Task is a simpler variation of the Index Hadoop task, designed for smaller data sets. The task executes within the indexing service and does not require an external Hadoop setup. The grammar of the index task is as follows:
```
{
  "type" : "index",
  "dataSource" : "example",
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "DAY",
    "intervals" : [ "2010/2020" ]
  },
  "aggregators" : [ {
    "type" : "count",
    "name" : "count"
  }, {
    "type" : "doubleSum",
    "name" : "value",
    "fieldName" : "value"
  } ],
  "firehose" : {
    "type" : "local",
    "baseDir" : "/tmp/data/json",
    "filter" : "sample_data.json",
    "parser" : {
      "timestampSpec" : {
        "column" : "timestamp"
      },
      "data" : {
        "format" : "json",
        "dimensions" : [ "dim1", "dim2", "dim3" ]
      }
    }
  }
}
```
|property|description|required?|
|--------|-----------|---------|
|type|The task type. This should always be "index".|yes|
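
As a rough illustration, a spec shaped like the example above can be assembled in code before it is submitted. This is a minimal sketch: the helper name `build_index_task` and its parameters are hypothetical and not part of Druid, and the field names are copied from the example rather than from a complete schema.

```python
import json


def build_index_task(data_source, intervals, base_dir, file_filter, dimensions):
    """Assemble an index task spec shaped like the example above.

    Hypothetical helper for illustration only; not part of Druid.
    """
    return {
        "type": "index",  # the task type should always be "index"
        "dataSource": data_source,
        "granularitySpec": {
            "type": "uniform",
            "gran": "DAY",
            "intervals": list(intervals),
        },
        "aggregators": [
            {"type": "count", "name": "count"},
            {"type": "doubleSum", "name": "value", "fieldName": "value"},
        ],
        "firehose": {
            "type": "local",
            "baseDir": base_dir,
            "filter": file_filter,
            "parser": {
                "timestampSpec": {"column": "timestamp"},
                "data": {"format": "json", "dimensions": list(dimensions)},
            },
        },
    }


task = build_index_task(
    "example",
    ["2010/2020"],
    "/tmp/data/json",
    "sample_data.json",
    ["dim1", "dim2", "dim3"],
)
payload = json.dumps(task, indent=2)  # JSON text of the task spec
```

Building the spec from a function keeps the fixed parts (task type, aggregators) in one place while the data source, interval, and input files vary per run.
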
The indexing service can also run real-time tasks. These tasks effectively transform a middle manager into a real-time node. We introduced real-time tasks as a way to programmatically add new real-time data sources without needing to manually add nodes. The grammar for the real-time task is as follows:
```
{
  "type" : "index_realtime",
  "id": "example",
  "resource": {
    "availabilityGroup" : "someGroup",
    "requiredCapacity" : 1
  },
  "schema": {
    "dataSource": "dataSourceName",
    "aggregators": [
      {
        "type": "count",
        "name": "events"
      },
      {
        "type": "doubleSum",
        "name": "outColumn",
        "fieldName": "inColumn"
      }
    ],
    "indexGranularity": "minute",
    "shardSpec": {
      "type": "none"
    }
  },
  "firehose": {
    "type": "kafka-0.7.2",
    "consumerProps": {
      "zk.connect": "zk_connect_string",
      "zk.connectiontimeout.ms": "15000",
      "zk.sessiontimeout.ms": "15000",
      "zk.synctime.ms": "5000",
      "groupid": "consumer-group",
      "fetch.size": "1048586",
      "autooffset.reset": "largest",
      "autocommit.enable": "false"
    },
    "feed": "your_kafka_topic",
    "parser": {
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "data": {
        "format": "json"
      },
      "dimensionExclusions": [
        "value"
      ]
    }
  },
  "fireDepartmentConfig": {
    "maxRowsInMemory": 500000,
    "intermediatePersistPeriod": "PT10m"
  },
  "windowPeriod": "PT10m",
  "segmentGranularity": "hour",
  "rejectionPolicy": {
    "type": "messageTime"
  }
}
```
Id:
The ID of the task. Not required.
Resource:
A JSON object used for high availability purposes. Not required.
|Field|Type|Description|Required|
|-----|----|-----------|--------|
|availabilityGroup|String|A unique identifier for the task. Tasks with the same availability group will always run on different middle managers. Used mainly for replication.|yes|
|requiredCapacity|Integer|How much middle manager capacity this task will take.|yes|
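
The two `resource` fields can be read as placement constraints. The sketch below is a toy model of those rules, not Druid's actual assignment logic: the function `pick_middle_manager` and the `managers` data structure are hypothetical names introduced only for illustration.

```python
def pick_middle_manager(task_resource, managers):
    """Return the name of a middle manager that can take the task, or None.

    `managers` maps a manager name to {"free": int, "groups": set}.
    This data model is an assumption made for the example.
    """
    group = task_resource["availabilityGroup"]
    needed = task_resource["requiredCapacity"]
    for name, state in managers.items():
        if group in state["groups"]:
            continue  # tasks in the same availability group never share a manager
        if state["free"] < needed:
            continue  # not enough remaining capacity for this task
        state["groups"].add(group)
        state["free"] -= needed
        return name
    return None


managers = {
    "mm1": {"free": 2, "groups": set()},
    "mm2": {"free": 1, "groups": set()},
}
resource = {"availabilityGroup": "someGroup", "requiredCapacity": 1}

first = pick_middle_manager(resource, managers)   # lands on mm1
second = pick_middle_manager(resource, managers)  # replica must go to mm2
third = pick_middle_manager(resource, managers)   # no manager left without the group
```

In this model, two replicas of the same task end up on different middle managers, and a third replica cannot be placed until another manager becomes available.
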
Kill tasks delete all information about a segment and remove it from deep storage. Killable segments must be disabled (used==0) in the Druid segment table. The available grammar is:
Once an overlord node accepts a task, a lock is created for the data source and interval specified in the task. Tasks do not need to explicitly release locks; locks are released upon task completion. Tasks may also release locks early if they choose to. Task IDs are made unique by naming them using UUIDs or the timestamp at which the task was created. Tasks are also part of a "task group", which is a set of tasks that can share interval locks.
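
The locking rules described here can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the overlord's implementation: the class `LockBox`, the function `new_task_id`, and the ID format are all hypothetical, intervals are given as ISO date strings so that plain string comparison orders them, and locks are tracked per task group for simplicity.

```python
import uuid
from datetime import datetime, timezone


def new_task_id(task_type, data_source):
    """Build an ID from a creation timestamp plus a UUID fragment (hypothetical format)."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{task_type}_{data_source}_{created}_{uuid.uuid4().hex[:8]}"


class LockBox:
    """Toy lock table keyed by data source."""

    def __init__(self):
        self._locks = {}  # data source -> list of (start, end, group_id)

    def try_lock(self, group_id, data_source, start, end):
        """Grant the lock unless another task group holds an overlapping interval."""
        for held_start, held_end, held_group in self._locks.get(data_source, []):
            overlaps = start < held_end and held_start < end
            if overlaps and held_group != group_id:
                return False
        self._locks.setdefault(data_source, []).append((start, end, group_id))
        return True

    def release_all(self, group_id):
        """Drop every lock held by a task group, for example on completion."""
        for data_source, held in self._locks.items():
            self._locks[data_source] = [lock for lock in held if lock[2] != group_id]


locks = LockBox()
a = locks.try_lock("group-a", "example", "2013-01-01", "2013-01-02")  # granted
b = locks.try_lock("group-b", "example", "2013-01-01", "2013-01-02")  # blocked by group-a
c = locks.try_lock("group-a", "example", "2013-01-01", "2013-01-02")  # same group shares the lock
locks.release_all("group-a")
d = locks.try_lock("group-b", "example", "2013-01-01", "2013-01-02")  # free after release
```

The point of the model is the sharing rule: a second task in the same group can lock the same interval, while a task from another group has to wait until the locks are released.
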