= Rule-based Replica Placement
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

When Solr needs to assign nodes to collections, it can either automatically assign them randomly or the user can specify a set of nodes where it should create the replicas.

With very large clusters, it is hard to specify exact node names and it still does not give you fine-grained control over how nodes are chosen for a shard. The user should be in complete control of where the nodes are allocated for each collection, shard and replica. This helps to optimally allocate hardware resources across the cluster.

Rule-based replica assignment allows the creation of rules to determine the placement of replicas in the cluster. In the future, this feature will help to automatically add or remove replicas when systems go down, or when higher throughput is required. This enables a more hands-off approach to administration of the cluster.

This feature is used in the following instances:

* Collection creation
* Shard creation
* Replica creation
* Shard splitting

== Common Use Cases

There are several situations where this functionality may be used. A few of the rules that could be implemented are listed below:

* Don’t assign more than 1 replica of this collection to a host.
* Assign all replicas to nodes with more than 100GB of free disk space or, alternatively, assign replicas to the nodes with the most free disk space.
* Do not assign any replica on a given host because I want to run an overseer there.
* Assign only one replica of a shard in a rack.
* Assign replicas to nodes hosting fewer than 5 cores.
* Assign replicas to the nodes hosting the fewest cores.

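For orientation, a few of these use cases map directly onto the rule syntax described in the sections below; each of these forms is explained, and used in the placement examples, later in this page:

[source,text]
----
// at most 1 replica of this collection on any node
replica:<2,node:*
// only use nodes with more than 100GB of free disk space
freedisk:>100
// never place a replica on this host
host:!192.45.67.3
----
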
== Rule Conditions

A rule is a set of conditions that a node must satisfy before a replica core can be created there.

There are three possible conditions; an example that combines all three follows the list.

* *shard*: this is the name of a shard or a wildcard (`*` means all shards). If shard is not specified, then the rule applies to the entire collection.
* *replica*: this can be a number or a wildcard (`*` means any number from zero to infinity).
* *tag*: this is an attribute of a node in the cluster that can be used in a rule, e.g., “freedisk”, “cores”, “rack”, “dc”, etc. The tag name can be a custom string. If creating a custom tag, a snitch is responsible for providing tags and values. The section <<Snitches>> below describes how to add a custom tag, and lists the tags provided by the default snitch (including cores, freedisk, host, port, node, and sysprop).

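For instance, a rule that uses all three conditions might look like the following (a minimal sketch, following the same pattern as the placement examples later in this page):

[source,text]
----
// for shard1, keep fewer than 2 replicas on any node
shard:shard1,replica:<2,node:*
----
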
=== Rule Operators

A condition can have one of the following operators to set the parameters for the rule; an example of each is shown after the list.

* *equals (no operator required)*: `tag:x` means the tag value must be equal to ‘x’.
* *greater than (>)*: `tag:>x` means the tag value must be greater than ‘x’. x must be a number.
* *less than (<)*: `tag:<x` means the tag value must be less than ‘x’. x must be a number.
* *not equal (!)*: `tag:!x` means the tag value MUST NOT be equal to ‘x’. The equals check is performed on the String value.

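For illustration, here is each operator applied to one of the pre-defined tags described under <<Snitches>> below (a sketch only; the specific values are illustrative):

[source,text]
----
// equals: the node must be listening on port 8983
port:8983
// greater than: more than 100 (GB) of free disk space
freedisk:>100
// less than: fewer than 5 cores already hosted on the node
cores:<5
// not equal: any host except 192.45.67.3
host:!192.45.67.3
----
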
=== Fuzzy Operator (~)

This can be used as a suffix to any condition. This would first try to satisfy the rule strictly. If Solr can’t find enough nodes to match the criterion, it tries to find the next best match which may not satisfy the criterion. For example, if we have a rule such as, `freedisk:>200~`, Solr will try to assign replicas of this collection on nodes with more than 200GB of free disk space. If that is not possible, the node which has the most free disk space will be chosen instead.

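Since `~` can be suffixed to any condition, rule strings using it look like these (a sketch only; the `rack` tag assumes a custom snitch as described under <<Snitches>>):

[source,text]
----
// prefer nodes with more than 200GB free disk, otherwise pick the nodes with the most free disk
freedisk:>200~
// prefer rack 730 for shard1, but fall back to other racks if the rule cannot be satisfied
shard:shard1,rack:730~
----
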
=== Choosing Among Equals

The rules are also used to sort the nodes that satisfy them. This ensures that even if many nodes match the rules, the best nodes are picked for the assignment. For example, if there is a rule such as `freedisk:>20`, nodes are sorted by free disk space in descending order and the node with the most free disk space is picked first. Or, if the rule is `cores:<5`, nodes are sorted by the number of cores in ascending order and the node with the fewest cores is picked first.

== Rules for New Shards

The rules are persisted along with the collection state, so when a new replica is created, the system will assign replicas satisfying the rules. When a new shard is created as a result of using the Collection API's <<collections-api.adoc#createshard,CREATESHARD command>>, ensure that you have created rules specific for that shard name. Rules can be altered using the <<collections-api.adoc#modifycollection,MODIFYCOLLECTION command>>. However, it is not required to do so if the rules do not specify explicit shard names. For example, a rule such as `shard:shard1,replica:*,ip_3:168` will not apply to any new shard created. But, if your rule is `replica:*,ip_3:168`, then it will apply to any new shard created.

The same is applicable to shard splitting. Shard splitting is treated exactly the same way as shard creation. Even though `shard1_1` and `shard1_2` may be created from `shard1`, the rules treat them as distinct, unrelated shards.

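For example, after splitting `shard1`, shard-specific rules for the new shards could be added with MODIFYCOLLECTION along these lines (a hypothetical sketch: the collection name `mycollection`, the rack values, and the `rack` tag from a custom snitch are all assumptions, and the exact behavior when updating an existing rule set should be verified against the Collections API documentation):

[source,text]
----
/admin/collections?action=MODIFYCOLLECTION&collection=mycollection&rule=shard:shard1_1,replica:*,rack:730&rule=shard:shard1_2,replica:*,rack:731
----
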
== Snitches

Tag values come from a plugin called a Snitch. If there is a tag named ‘rack’ in a rule, there must be a Snitch which provides the value for ‘rack’ for each node in the cluster. A snitch implements the Snitch interface. Solr ships with a default snitch which provides the following tags:

* *cores*: Number of cores in the node
* *freedisk*: Disk space available in the node
* *host*: host name of the node
* *port*: port of the node
* *node*: node name
* *role*: The role of the node. The only supported role is 'overseer'
* *ip_1, ip_2, ip_3, ip_4*: These are IP fragments for each node. For example, in a host with IP `192.168.1.2`, `ip_1 = 2`, `ip_2 = 1`, `ip_3 = 168` and `ip_4 = 192` (see the example after this list)
* *sysprop.\{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`

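For example, the IP fragment tags can keep all replicas of a collection inside one subnet, and a system property tag can pin replicas to nodes started with a particular `-D` flag (a sketch only; the subnet and the `zone` property name are assumptions):

[source,text]
----
// only nodes whose IP address starts with 192.168.
ip_4:192,ip_3:168
// only nodes started with -Dzone=east
sysprop.zone:east,shard:*
----
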
=== How Snitches are Configured

It is possible to use one or more snitches for a set of rules. If the rules only need tags from the default snitch, it need not be explicitly configured. For example:

[source,text]
----
snitch=class:fqn.ClassName,key1:val1,key2:val2,key3:val3
----

*How Tag Values are Collected*

. Identify the set of tags specified in the rules.
. Create instances of the specified Snitches. The default snitch is always created.
. Ask each Snitch if it can provide values for any of the tags. If even one tag does not have a snitch, the assignment fails.
. The identified Snitches then provide the tag values for each node in the cluster.
. If the value for a tag is not obtained for a given node, it cannot participate in the assignment.

== Replica Placement Examples

=== Keep less than 2 replicas (at most 1 replica) of this collection on any node

For this rule, we define the `replica` condition with operators for "less than 2", and use a pre-defined tag named `node` to define nodes with any name.

[source,text]
----
replica:<2,node:*
// this is equivalent to replica:<2,node:*,shard:**. We can omit shard:** because ** is the default value of shard
----

=== For a given shard, keep less than 2 replicas on any node

For this rule, we use the `shard` condition to define any shard, the `replica` condition with operators for "less than 2", and finally a pre-defined tag named `node` to define nodes with any name.

[source,text]
----
shard:*,replica:<2,node:*
----

=== Assign all replicas in shard1 to rack 730

This rule limits the `shard` condition to 'shard1', but allows any number of replicas. We're also referencing a custom tag named `rack`. Before defining this rule, we will need to configure a custom Snitch which provides values for the tag `rack`.

[source,text]
----
shard:shard1,replica:*,rack:730
----

In this case, the default value of `replica` is `*`, or all replicas. It can be omitted and the rule will be reduced to:

[source,text]
----
shard:shard1,rack:730
----

=== Create replicas in nodes with less than 5 cores only

This rule uses the `replica` condition to define any number of replicas, but adds a pre-defined tag named `cores` and uses operators for "less than 5".

[source,text]
----
replica:*,cores:<5
----

Again, we can simplify this to use the default value for `replica`, like so:

[source,text]
----
cores:<5
----

=== Do not create any replicas in host 192.45.67.3

This rule uses only the pre-defined tag `host` to define an IP address where replicas should not be placed.

[source,text]
----
host:!192.45.67.3
----

== Defining Rules

Rules are specified per collection during collection creation as request parameters. It is possible to specify multiple ‘rule’ and ‘snitch’ parameters as in this example:

[source,text]
----
snitch=class:EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3
----
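
Putting it together, a complete collection-creation request that applies placement rules might look like the following (a sketch; the collection name, shard and replica counts, and the rule values are assumptions):

[source,text]
----
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&rule=cores:<5&rule=freedisk:>100~
----
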
These rules are persisted in `clusterstate.json` in ZooKeeper and are available throughout the lifetime of the collection. This enables the system to perform any future node allocation without direct user interaction. The rules added during collection creation can be modified later using the <<collections-api.adoc#modifycollection,MODIFYCOLLECTION>> API.