14 KiB

Raw Blame History

layout

title

nav_order

has_children

parent

datatable

redirect_from

default

Segment replication

true

Availability and recovery

true

/opensearch/segment-replication/

/opensearch/segment-replication/index/

Segment replication

Segment replication involves copying segment files across shards instead of indexing documents on each shard copy. This approach enhances indexing throughput and reduces resource utilization but increases network utilization. Segment replication is the first feature in a series of features designed to decouple reads and writes in order to lower compute costs.

When the primary shard sends a checkpoint to replica shards on a refresh, a new segment replication event is triggered on replica shards. This happens:

When a new replica shard is added to a cluster.
When there are segment file changes on a primary shard refresh.
During peer recovery, such as replica shard recovery and shard relocation (explicit allocation using the move allocation command or automatic shard rebalancing).

Use cases

Segment replication can be applied in a variety of scenarios, including:

High write loads without high search requirements and with longer refresh times.
When experiencing very high loads, you want to add new nodes but don't want to index all data immediately.
OpenSearch cluster deployments with low replica counts, such as those used for log analytics.

Segment replication configuration

Setting the default replication type for a cluster affects all newly created indexes. You can, however, specify a different replication type when creating an index. Index-level settings override cluster-level settings.

Creating an index with segment replication

To use segment replication as the replication strategy for an index, create the index with the replication.type parameter set to SEGMENT as follows:

PUT /my-index1
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT" 
    }
  }
}

{% include copy-curl.html %}

In segment replication, the primary shard is usually generating more network traffic than the replicas because it copies segment files to the replicas. Thus, it's beneficial to distribute primary shards equally between the nodes. To ensure balanced primary shard distribution, set the dynamic cluster.routing.allocation.balance.prefer_primary setting to true. For more information, see Cluster settings.

For the best performance, it is recommended that you enable the following settings:

Segment replication backpressure
Balanced primary shard allocation, using the following command:

curl -X PUT "$host/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
  {
    "persistent": {
    "cluster.routing.allocation.balance.prefer_primary": true,
    "segrep.pressure.enabled": true
   }
  }

{% include copy-curl.html %}

Setting the replication type for a cluster

You can set the default replication type for newly created cluster indexes in the opensearch.yml file as follows:

cluster.indices.replication.strategy: 'SEGMENT'

{% include copy.html %}

This cluster-level setting cannot be enabled through the REST API. This setting is not applied to system indexes and hidden indexes. By default, all system and hidden indexes in OpenSearch use document replication, even if this setting is enabled. {: .note}

Creating an index with document replication

Even when the default replication type is set to segment replication, you can create an index that uses document replication by setting replication.type to DOCUMENT as follows:

PUT /my-index1
{
  "settings": {
    "index": {
      "replication.type": "DOCUMENT" 
    }
  }
}

{% include copy-curl.html %}

Considerations

When using segment replication, consider the following:

Enabling segment replication for an existing index requires reindexing.
Cross-cluster replication does not currently use segment replication to copy between clusters.
Segment replication is not compatible with document-level monitors, which are used with the Alerting and Security Analytics plugins. The plugins also use the latest available data on replica shards when using the immediate refresh policy, and segment replication can delay the policy's availability, resulting in stale replica shards.
Segment replication leads to increased network congestion on primary shards. See Issue - Optimize network bandwidth on primary shards.
Integration with remote-backed storage as the source of replication is currently not supported.
Read-after-write guarantees: Segment replication does not currently support setting the refresh policy to wait_for. If you set the refresh query parameter to wait_for and then ingest documents, you'll get a response only after the primary node has refreshed and made those documents searchable. Replica shards will respond only after having written to their local translog. We are exploring other mechanisms for providing read-after-write guarantees. For more information, see the corresponding GitHub issue.
System indexes will continue to use document replication internally until read-after-write guarantees are available. In this case, document replication does not hinder the overall performance because there are few system indexes.

Benchmarks

During initial benchmarks, segment replication users reported 40% higher throughput than when using document replication with the same cluster setup.

The following benchmarks were collected with OpenSearch-benchmark using the stackoverflow and nyc_taxi datasets.

The benchmarks demonstrate the effect of the following configurations on segment replication:

The workload size
The number of primary shards
The number of replicas

Your results may vary based on the cluster topology, hardware used, shard count, and merge settings. {: .note }

Increasing the workload size

The following table lists benchmarking results for the nyc_taxi dataset with the following configuration:

10 m5.xlarge data nodes
40 primary shards, 1 replica each (80 shards total)
4 primary shards and 4 replica shards per node

		40 GB primary shard, 80 GB total			240 GB primary shard, 480 GB total
		Document Replication	Segment Replication	Percent difference	Document Replication	Segment Replication	Percent difference
Store size		85.2781	91.2268	N/A	515.726	558.039	N/A
Index throughput (number of requests per second)	Minimum	148,134	185,092	24.95%	100,140	168,335	68.10%
	Median	160,110	189,799	18.54%	106,642	170,573	59.95%
	Maximum	175,196	190,757	8.88%	108,583	172,507	58.87%
Error rate		0.00%	0.00%	0.00%	0.00%	0.00%	0.00%

As the size of the workload increases, the benefits of segment replication are amplified because the replicas are not required to index the larger dataset. In general, segment replication leads to higher throughput at lower resource costs than document replication in all cluster configurations, not accounting for replication lag.

Increasing the number of primary shards

The following table lists benchmarking results for the nyc_taxi dataset for 40 and 100 primary shards.

{::nomarkdown}

		40 primary shards, 1 replica			100 primary shards, 1 replica
		Document Replication	Segment Replication	Percent difference	Document Replication	Segment Replication	Percent difference
Index throughput (number of requests per second)	Minimum	148,134	185,092	24.95%	151,404	167,391	9.55%
	Median	160,110	189,799	18.54%	154,796	172,995	10.52%
	Maximum	175,196	190,757	8.88%	166,173	174,655	4.86%
Error rate		0.00%	0.00%	0.00%	0.00%	0.00%	0.00%

{:/}

As the number of primary shards increases, the benefits of segment replication over document replication decrease. While segment replication is still beneficial with a larger number of primary shards, the difference in performance becomes less pronounced because there are more primary shards per node that must copy segment files across the cluster.

Increasing the number of replicas

The following table lists benchmarking results for the stackoverflow dataset for 1 and 9 replicas.

{::nomarkdown}

		10 primary shards, 1 replica			10 primary shards, 9 replicas
		Document Replication	Segment Replication	Percent difference	Document Replication	Segment Replication	Percent difference
Index throughput (number of requests per second)	Median	72,598.10	90,776.10	25.04%	16,537.00	14,429.80	−12.74%
Index throughput (number of requests per second)	Maximum	86,130.80	96,471.00	12.01%	21,472.40	38,235.00	78.07%
CPU usage (%)	p50	17	18.857	10.92%	69.857	8.833	−87.36%
	p90	76	82.133	8.07%	99	86.4	−12.73%
	p99	100	100	0%	100	100	0%
	p100	100	100	0%	100	100	0%
Memory usage (%)	p50	35	23	−34.29%	42	40	−4.76%
	p90	59	57	−3.39%	59	63	6.78%
	p99	69	61	−11.59%	66	70	6.06%
	p100	72	62	−13.89%	69	72	4.35%
Error rate		0.00%	0.00%	0.00%	0.00%	2.30%	2.30%

{:/}

As the number of replicas increases, the amount of time required for primary shards to keep replicas up to date (known as the replication lag) also increases. This is because segment replication copies the segment files directly from primary shards to replicas.

The benchmarking results show a non-zero error rate as the number of replicas increases. The error rate indicates that the segment replication backpressure mechanism is initiated when replicas cannot keep up with the primary shard. However, the error rate is offset by the significant CPU and memory gains that segment replication provides.

Next steps

Track future enhancements to segment replication.
Read this blog post about segment replication.

14 KiB Raw Blame History Unescape Escape

Segment replication

Use cases

Segment replication configuration

Creating an index with segment replication

Setting the replication type for a cluster

Creating an index with document replication

Considerations

Benchmarks

Increasing the workload size

Increasing the number of primary shards

Increasing the number of replicas

Next steps

14 KiB

Raw Blame History