diff --git a/docs/content/Batch-ingestion.md b/docs/content/Batch-ingestion.md
index f976841d468..3d718edde3d 100644
--- a/docs/content/Batch-ingestion.md
+++ b/docs/content/Batch-ingestion.md
@@ -167,9 +167,20 @@ For example, data for a day may be split by the dimension "last\_name" into two
 
 In hashed partition type, the number of partitions is determined based on the targetPartitionSize and cardinality of input set and the data is partitioned based on the hashcode of the row. It is recommended to use Hashed partition as it is more efficient than singleDimension since it does not need to determine the dimension for creating partitions.
-Hashing also gives better distribution of data resulting in equal sized partitons and improving query performance
+Hashing also gives a better distribution of data, resulting in equally sized partitions and improved query performance.
 
-To use this option, the indexer must be given a target partition size. It can then find a good set of partition ranges on its own.
+To have Druid automatically determine optimal partitions, the indexer must be given a target partition size. It can then find a good set of partition ranges on its own.
+
+#### Configuration for disabling auto-sharding and creating a fixed number of partitions
+Druid can be configured to skip the determine-partitions step and create a fixed number of shards by specifying numShards in the hashed partitionsSpec. For example, the following configuration will skip determining optimal partitions and always create 4 shards for every segment granular interval:
+
+```json
+  "partitionsSpec": {
+    "type": "hashed",
+    "numShards": 4
+  }
+```
 
 |property|description|required?|
 |--------|-----------|---------|
@@ -177,6 +188,7 @@ To use this option, the indexer must be given a target partition size. It can th
 |targetPartitionSize|target number of rows to include in a partition, should be a number that targets segments of 700MB\~1GB.|yes|
 |partitionDimension|the dimension to partition on. Leave blank to select a dimension automatically.|no|
 |assumeGrouped|assume input data has already been grouped on time and dimensions. This is faster, but can choose suboptimal partitions if the assumption is violated.|no|
+|numShards|provides a way to manually override Druid's automatic sharding and specify the number of shards to create for each segment granular interval. It is only supported by the hashed partitionsSpec, and targetPartitionSize must be set to -1.|no|
 
 ### Updater job spec
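Reviewer note: since the table above states that numShards is only honored when targetPartitionSize is set to -1, it may be worth showing both properties together in the docs. A sketch of what a complete hashed spec using the override might look like (the value 4 is illustrative):

```json
  "partitionsSpec": {
    "type": "hashed",
    "targetPartitionSize": -1,
    "numShards": 4
  }
```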