OpenSearch/docs/reference/mapping/fields/routing-field.asciidoc
Scott Somerville 372812da98 Allow an index to be partitioned with custom routing (#22274)
This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.

This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:
- The big tenants cannot fit into a single shard or there are so many of them that they will likely
end up on the same shard
- Tenants often have a surge in write traffic and a single shard cannot process it fast enough

Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches break down for high cardinality fields.

A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.

Closes #21585
2017-01-18 08:51:23 +01:00


[[mapping-routing-field]]
=== `_routing` field
A document is routed to a particular shard in an index using the following
formula:

    shard_num = hash(_routing) % num_primary_shards

The default value used for `_routing` is the document's <<mapping-id-field,`_id`>>
or the document's <<mapping-parent-field,`_parent`>> ID, if present.
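The routing formula can be sketched in a few lines of Python. This is only an illustration: `toy_hash` (CRC32 from the standard library) is a stand-in, not the hash function the engine actually uses, so the shard numbers it prints will not match a real cluster. The names `toy_hash` and `shard_num` are hypothetical.

```python
import zlib


def toy_hash(value: str) -> int:
    """Stand-in hash (CRC32). An assumption for illustration only."""
    return zlib.crc32(value.encode("utf-8"))


def shard_num(routing: str, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_routing) % num_primary_shards."""
    return toy_hash(routing) % num_primary_shards


# Without custom routing, the document's _id acts as the routing value.
print(shard_num("1", 5))

# With custom routing, every document indexed with "user1" maps to one shard.
print(shard_num("user1", 5))
```

Because the result depends on `num_primary_shards`, the same routing value would generally map to a different shard if the number of primary shards were different.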
Custom routing patterns can be implemented by specifying a custom `routing`
value per document. For instance:
[source,js]
------------------------------
PUT my_index/my_type/1?routing=user1&refresh=true <1>
{
  "title": "This is a document"
}

GET my_index/my_type/1?routing=user1 <2>
------------------------------
// CONSOLE
// TESTSETUP
<1> This document uses `user1` as its routing value, instead of its ID.
<2> The same `routing` value needs to be provided when
<<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
the document.
The value of the `_routing` field is accessible in queries:
[source,js]
--------------------------
GET my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ] <1>
    }
  }
}
--------------------------
// CONSOLE
<1> Querying on the `_routing` field (also see the <<query-dsl-ids-query,`ids` query>>)
==== Searching with custom routing
Custom routing can reduce the cost of searches. Instead of fanning a search
request out to all the shards in an index, the request can be sent to just the
shards that match the specific routing value (or values):
[source,js]
------------------------------
GET my_index/_search?routing=user1,user2 <1>
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}
------------------------------
// CONSOLE
<1> This search request will only be executed on the shards associated with the `user1` and `user2` routing values.
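To make the reduced fan-out concrete, the following Python sketch computes which shards a comma-separated `routing` parameter would select under the basic routing formula. It is a simplified model, not the engine's code: `toy_hash` (CRC32) is a stand-in hash and `shards_to_search` is a hypothetical helper.

```python
import zlib


def toy_hash(value: str) -> int:
    """Stand-in hash (CRC32). An assumption for illustration only."""
    return zlib.crc32(value.encode("utf-8"))


def shards_to_search(routing_param: str, num_primary_shards: int) -> set:
    """Return the shard numbers selected by a comma-separated routing list."""
    routing_values = [v for v in routing_param.split(",") if v]
    return {toy_hash(v) % num_primary_shards for v in routing_values}


# Two routing values select at most two of the five shards.
targets = shards_to_search("user1,user2", 5)
print(sorted(targets))
```

In this model a search with two routing values touches at most two shards, no matter how many primary shards the index has.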
==== Making a routing value required
When using custom routing, it is important to provide the routing value
whenever <<docs-index_,indexing>>, <<docs-get,getting>>,
<<docs-delete,deleting>>, or <<docs-update,updating>> a document.
Forgetting the routing value can lead to a document being indexed on more than
one shard. As a safeguard, the `_routing` field can be configured to make a
custom `routing` value required for all CRUD operations:
[source,js]
------------------------------
PUT my_index2
{
  "mappings": {
    "my_type": {
      "_routing": {
        "required": true <1>
      }
    }
  }
}

PUT my_index2/my_type/1 <2>
{
  "text": "No routing value provided"
}
------------------------------
// CONSOLE
// TEST[catch:request]
<1> Routing is required for `my_type` documents.
<2> This index request throws a `routing_missing_exception`.
==== Unique IDs with custom routing
When indexing documents specifying a custom `_routing`, the uniqueness of the
`_id` is not guaranteed across all of the shards in the index. In fact,
documents with the same `_id` might end up on different shards if indexed with
different `_routing` values.
It is up to the user to ensure that IDs are unique across the index.
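The following Python sketch models why `_id` uniqueness can break. It uses a hypothetical in-memory index in which each shard is a dictionary that enforces unique IDs only within itself. `toy_hash` (CRC32) is a stand-in hash, and all names are assumptions made for this example.

```python
import zlib

NUM_SHARDS = 5
# Hypothetical model: one dict of _id -> document per shard.
shards = [dict() for _ in range(NUM_SHARDS)]


def toy_hash(value: str) -> int:
    """Stand-in hash (CRC32). An assumption for illustration only."""
    return zlib.crc32(value.encode("utf-8"))


def index_doc(doc_id: str, routing: str, body: dict) -> int:
    """Store the document on the shard chosen by its routing value."""
    shard = toy_hash(routing) % NUM_SHARDS
    shards[shard][doc_id] = body  # uniqueness is enforced per shard only
    return shard


def copies_of(doc_id: str) -> int:
    """Count how many shards hold a document with this _id."""
    return sum(1 for shard in shards if doc_id in shard)


first_shard = index_doc("1", "user1", {"title": "first copy"})

# Pick another routing value that maps to a different shard in this model.
other_routing = next(
    r for r in (f"user{i}" for i in range(2, 100))
    if toy_hash(r) % NUM_SHARDS != first_shard
)
index_doc("1", other_routing, {"title": "second copy"})

print(copies_of("1"))  # the same _id now exists on two shards
```

Indexing `_id` `1` again with the original routing value would overwrite the first copy in place, which is why always supplying the same routing value keeps the ID unique.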
[[routing-index-partition]]
==== Routing to an index partition
An index can be configured such that custom routing values will go to a subset of the shards rather
than a single shard. This helps mitigate the risk of ending up with an imbalanced cluster while still
reducing the impact of searches.
This is done by providing the index level setting <<routing-partition-size,`index.routing_partition_size`>> at index creation.
As the partition size increases, the data becomes more evenly distributed, at the
expense of having to search more shards per request.
When this setting is present, the formula for calculating the shard becomes:

    shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards

That is, the `_routing` field is used to calculate a set of shards within the index and then the
`_id` is used to pick a shard within that set.
To enable this feature, set `index.routing_partition_size` to a value greater than 1 and
less than `index.number_of_shards`.
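As a rough illustration of the partitioned formula and the allowed range for the setting, the Python sketch below spreads documents that share one routing value over a small set of shards. It is a simplified model: `toy_hash` (CRC32) is a stand-in for the real hash function, and `partitioned_shard_num` is a hypothetical helper.

```python
import zlib


def toy_hash(value: str) -> int:
    """Stand-in hash (CRC32). An assumption for illustration only."""
    return zlib.crc32(value.encode("utf-8"))


def partitioned_shard_num(routing: str, doc_id: str,
                          routing_partition_size: int,
                          num_primary_shards: int) -> int:
    """Apply (hash(_routing) + hash(_id) % partition_size) % num_primary_shards."""
    if not 1 < routing_partition_size < num_primary_shards:
        raise ValueError(
            "routing_partition_size must be greater than 1 and "
            "less than the number of primary shards"
        )
    # The _id selects an offset inside the partition chosen by _routing.
    offset = toy_hash(doc_id) % routing_partition_size
    return (toy_hash(routing) + offset) % num_primary_shards


# 1000 documents sharing one routing value, partition size 3, 10 shards.
used = {partitioned_shard_num("user1", str(i), 3, 10) for i in range(1000)}
print(sorted(used))  # never more than 3 distinct shards
```

The offset is always between 0 and `routing_partition_size - 1`, so one routing value can reach at most `routing_partition_size` shards. This is also the number of shards a search for that routing value has to visit.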
Once enabled, the partitioned index will have the following limitations:
* Mappings with parent-child relationships cannot be created within it.
* All mappings within the index must have the `_routing` field marked as required.