From e86273a9ae555c915faa2e89d1381e07eb9e2d31 Mon Sep 17 00:00:00 2001
From: Igal Levy <igal.levy@metamarkets.com>
Date: Thu, 15 Jan 2015 21:19:53 -0800
Subject: [PATCH] New section to provide detail on how and why to set up
 sharding

* Different types of sharding strategies
* How and what to configure
* Redundancy and scale-out
---
 docs/content/Realtime-ingestion.md | 86 +++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 8 deletions(-)
diff --git a/docs/content/Realtime-ingestion.md b/docs/content/Realtime-ingestion.md
index 23afede2759..769c6c774e3 100644
--- a/docs/content/Realtime-ingestion.md
+++ b/docs/content/Realtime-ingestion.md
@@ -141,7 +141,7 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
 |versioningPolicy|Object|How to version segments.|no (default == based on segment start time)|
 |rejectionPolicy|Object|Controls how data sets the data acceptance policy for creating and handing off segments. More on this below.|no (default=='serverTime')|
 |maxPendingPersists|Integer|Maximum number of persists that can be pending, but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).|no (default == 0; meaning one persist can be running concurrently with ingestion, and none can be queued up)|
-|shardSpec|Object|This describes the shard that is represented by this server. This must be specified properly in order to have multiple realtime nodes indexing the same data stream in a sharded fashion.|no (default == 'NoneShardSpec'|
+|shardSpec|Object|This describes the shard that is represented by this server. This must be specified properly in order to have multiple realtime nodes indexing the same data stream in a [sharded fashion](#sharding).|no (default == 'NoneShardSpec'|
 
 #### Rejection Policy
 
@@ -150,18 +150,88 @@ The following policies are available:
     * `messageTime` &ndash; Can be used for non-"current time" as long as that data is relatively in sequence. Events are rejected if they are less than `windowPeriod` from the event with the latest timestamp. Hand off only occurs if an event is seen after the segmentGranularity and `windowPeriod`.
     * `none` &ndash; Never hands off data unless shutdown() is called on the configured firehose.
 
-#### Sharding Real-time Ingestion
 
-The `shardSpec` can be used to shard real-time ingestion. Different shardSpecs are required for different real-time nodes for the same dataSource. For example:
+####<a id="sharding"></a> Sharding
+Druid uses shards, or segments with partition numbers, to more efficiently handle large amounts of incoming data. In Druid, shards represent the segments that together cover a time interval based on the value of `segmentGranularity`. If, for example, `segmentGranularity` is set to "hour", then a number of shards may be used to store the data for that hour. Sharding along dimensions may also occur to optimize efficiency.
+
+Segments are identified by datasource, time interval, and version. With sharding, a segment is also identified by a partition number. Typically, each shard will have the same version but a different partition number to uniquely identify it.
+
+In small-data scenarios, sharding is unnecessary and can be set to none (the default):
 
 ```json
-"shardSpec" : {
-  "type" : "linear",
-  "partitionNumber" : 0
-}
+    "shardSpec": {"type": "none"}
 ```
 
-Other real-time nodes for the same dataSource will have monotonically increasing shardSpec numbers.
+However, in scenarios with multiple realtime nodes, `none` is less useful as it cannot help with scaling data volume (see below). Note that for the batch indexing service, no explicit configuration is required; sharding is provided automatically.
+
+Druid uses sharding based on the `shardSpec` setting you configure. The recommended choices, `linear` and `numbered`, are discussed below; other types have been useful for internal Druid development but are not appropriate for production setups.
+ 
+##### Linear
+This strategy provides following advantages:
+
+* There is no need to update the fileSpec configurations of existing nodes when adding new nodes.
+* All unique shards are queried, regardless of whether the partition numbering is sequential or not (it allows querying of partitions 0 and 2, even if partition 1 is missing).
+
+Configure `linear` under `schema`:
+
+```json
+    "shardSpec": {
+        "type": "linear",
+        "partitionNum": 0
+    }
+```
+            
+
+##### Numbered
+This strategy is similar to `linear` except that it does not tolerate non-sequential partition numbering (it will *not* allow querying of partitions 0 and 2 if partition 1 is missing). It also requires explicitly setting the total number of partitions.
+
+Configure `numbered` under `schema`:
+
+```json
+    "shardSpec": {
+        "type": "numbered",
+        "partitionNum": 0,
+        "partitions": 2
+    }
+```
+     
+
+##### Scale and Redundancy
+The `shardSpec` configuration can be used to create redundancy by having the same `partitionNum` values on different nodes.
+
+For example, if RealTimeNode1 has:
+
+```json
+    "shardSpec": {
+        "type": "linear",
+        "partitionNum": 0
+    }
+```
+            
+and RealTimeNode2 has:
+
+```json
+    "shardSpec": {
+        "type": "linear",
+        "partitionNum": 0
+    }
+```
+
+then two realtime nodes can store segments with the same datasource, version, time interval, and partition number. Brokers that query for data in such segments will assume that they hold the same data, and the query will target only one of the segments.
+
+`shardSpec` can also help achieve scale. For this, add nodes with a different `partionNum`. Continuing with the example, if RealTimeNode3 has:
+
+```json
+    "shardSpec": {
+        "type": "linear",
+        "partitionNum": 1
+    }
+```
+
+then it can store segments with the same datasource, time interval, and version as in the first two nodes, but with a different partition number. Brokers that query for data in such segments will assume that a segment from RealTimeNode3 holds *different* data, and the query will target it along with a segment from the first two nodes.
+
+You can use type `numbered` similarly. Note that type `none` is essentially type `linear` with all shards having a fixed `partitionNum` of 0.
+
 
 ## Realtime Ingestion using the Indexing Service