Merge pull request #505 from metamx/doc-for-numShards

add doc for numShards
This commit is contained in:
fjy 2014-04-23 23:29:00 -06:00
commit 12292f3276
1 changed files with 14 additions and 2 deletions


@@ -167,9 +167,20 @@ For example, data for a day may be split by the dimension "last\_name" into two
With the hashed partition type, the number of partitions is determined from the targetPartitionSize and the cardinality of the input set, and rows are assigned to partitions based on the hash code of the row.
It is recommended to use hashed partitioning, as it is more efficient than singleDimension partitioning since it does not need to determine a dimension for creating partitions.
Hashing also gives a better distribution of data, resulting in equally sized partitions and improved query performance.
To have Druid automatically determine optimal partitions, the indexer must be given a target partition size. It can then find a good set of partition ranges on its own.
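As a sketch, a hashed partitionsSpec that lets Druid determine the number of partitions might look like the following (the targetPartitionSize value here is illustrative, not a recommendation):

```json
"partitionsSpec": {
    "type": "hashed",
    "targetPartitionSize": 5000000
}
```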
#### Configuration for disabling auto-sharding and creating a fixed number of partitions
Druid can be configured to skip the determine-partitions step and create a fixed number of shards by specifying numShards in the hashed partitionsSpec.
For example, this configuration will skip determining optimal partitions and always create 4 shards for every segment granularity interval:
```json
"partitionsSpec": {
    "type": "hashed",
    "numShards": 4
}
```
|property|description|required?|
|--------|-----------|---------|
@@ -177,6 +188,7 @@ To use this option, the indexer must be given a target partition size. It can th
|targetPartitionSize|target number of rows to include in a partition, should be a number that targets segments of 700MB\~1GB.|yes|
|partitionDimension|the dimension to partition on. Leave blank to select a dimension automatically.|no|
|assumeGrouped|assume input data has already been grouped on time and dimensions. This is faster, but can choose suboptimal partitions if the assumption is violated.|no|
|numShards|provides a way to manually override Druid's automatic sharding and specify the number of shards to create for each segment granularity interval. It is only supported by the hashed partitionsSpec, and targetPartitionSize must be set to -1.|no|
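Putting the table's numShards constraint together: when overriding auto-sharding, targetPartitionSize must be set to -1. A combined spec might look like the following sketch (the shard count of 4 is illustrative):

```json
"partitionsSpec": {
    "type": "hashed",
    "targetPartitionSize": -1,
    "numShards": 4
}
```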
### Updater job spec