Add documentation for segment replication GA release (#3461)

* Add documentation for segment replication GA release

Signed-off-by: ariamarble <armarble@amazon.com>

* SegRep doc updates GA

Signed-off-by: ariamarble <armarble@amazon.com>

* content updates

Signed-off-by: ariamarble <armarble@amazon.com>

* Apply suggestions from doc review

Co-authored-by: Melissa Vagi <105296784+vagimeli@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* additional doc review changes

Signed-off-by: ariamarble <armarble@amazon.com>

* small changes

Signed-off-by: ariamarble <armarble@amazon.com>

* Apply suggestions from editorial review

Co-authored-by: Nathan Bower <nbower@amazon.com>

* Update _tuning-your-cluster/availability-and-recovery/segment-replication/index.md

* Add primary balance information

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* adding backpressure page

Signed-off-by: ariamarble <armarble@amazon.com>

* Implemented tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* final changes

Signed-off-by: ariamarble <armarble@amazon.com>

---------

Signed-off-by: ariamarble <armarble@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Melissa Vagi <105296784+vagimeli@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
This commit is contained in:
Aria Marble 2023-04-18 10:19:13 -07:00 committed by GitHub
parent c6d2e3ea4e
commit f831c33111
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 237 additions and 91 deletions

View File

@ -0,0 +1,43 @@
---
layout: default
title: Segment replication back-pressure
nav_order: 75
parent: Segment replication
has_children: false
grand_parent: Availability and Recovery
---
## Segment replication back-pressure
Segment replication back-pressure is a per-shard level rejection mechanism that dynamically rejects indexing requests when the number of replica shards in your cluster are falling behind the number of primary shards. With Segment replication back-pressure, indexing requests are rejected when more than half of the replication group is stale, which is defined by the `MAX_ALLOWED_STALE_SHARDS` field. A replica is considered stale if it is behind by more than the defined `MAX_INDEXING_CHECKPOINTS` field, and its current replication lag is over the defined `MAX_REPLICATION_TIME_SETTING` field.
Replica shards are also monitored to determine whether the shards are stuck or are lagging for an extended period of time. When replica shards are stuck or lagging for more than double the amount of time defined by the `MAX_REPLICATION_TIME_SETTING` field, the shards are removed and then replaced with new replica shards.
## Request fields
Segment replication back-pressure is enabled by default. The following are dynamic cluster settings, and can be enabled or disabled using the [cluster settings]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/) API endpoint.
Field | Data type | Description
:--- | :--- | :---
SEGMENT_REPLICATION_INDEXING_PRESSURE_ENABLED | Boolean | Enables the segment replication back-pressure mechanism. Default is `true`.
MAX_REPLICATION_TIME_SETTING | Time unit | The maximum time that a replica shard can take to copy from primary. Once `MAX_REPLICATION_TIME_SETTING` is breached along with `MAX_INDEXING_CHECKPOINTS`, the segment replication back-pressure mechanism is triggered. Default is `5 minutes`.
MAX_INDEXING_CHECKPOINTS | Integer | The maximum number of indexing checkpoints that a replica shard can fall behind when copying from primary. Once `MAX_INDEXING_CHECKPOINTS` is breached along with `MAX_REPLICATION_TIME_SETTING`, the segment replication back-pressure mechanism is triggered. Default is `4` checkpoints.
MAX_ALLOWED_STALE_SHARDS | Floating point | The maximum number of stale replica shards that can exist in a replication group. Once `MAX_ALLOWED_STALE_SHARDS` is breached, the segment replication back-pressure mechanism is triggered. Default is `.5`, which is 50% of a replication group.
## Path and HTTP methods
You can use the segment replication API endpoint to retrieve segment replication back-pressure metrics.
```bash
GET _cat/segment_replication
```
#### Example response
```json
shardId target_node target_host checkpoints_behind bytes_behind current_lag last_completed_lag rejected_requests
[index-1][0] runTask-1 127.0.0.1 0 0b 0s 7ms 0
```
- `checkpoints_behind` and `current_lag` directly correlate with `MAX_INDEXING_CHECKPOINTS` and `MAX_REPLICATION_TIME_SETTING`.
- `checkpoints_behind` and `current_lag` metrics are taken into consideration when triggering segment replication back-pressure.

View File

@ -1,84 +0,0 @@
---
layout: default
title: Segment replication configuration
nav_order: 12
parent: Segment replication
grand_parent: Availability and Recovery
---
# Segment replication configuration
Segment replication is an experimental feature. Therefore, we do not recommend the use of segment replication in a production environment. For updates on the progress of segment replication or if you want to leave feedback that could help improve the feature, see the [Segment replication issue](https://github.com/opensearch-project/OpenSearch/issues/2194).
{: .warning }
To enable the segment replication type, reference the steps below.
## Enabling the feature flag
There are several methods for enabling segment replication, depending on the install type. You will also need to set the replication strategy to `SEGMENT` when creating the index.
### Enable on a node using a tarball install
The flag is toggled using a new jvm parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in config/jvm.options.
1. Option 1: Update config/jvm.options by adding the following line:
````json
-Dopensearch.experimental.feature.replication_type.enabled=true
````
1. Option 2: Use the `OPENSEARCH_JAVA_OPTS` environment variable:
````json
export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true"
````
1. Option 3: For developers using Gradle, update run.gradle by adding the following lines:
````json
testClusters {
runTask {
testDistribution = 'archive'
if (numZones > 1) numberOfZones = numZones
if (numNodes > 1) numberOfNodes = numNodes
systemProperty 'opensearch.experimental.feature.replication_type.enabled', 'true'
}
}
````
### Enable with Docker containers
If you're running Docker, add the following line to docker-compose.yml underneath the `opensearch-node` and `environment` section:
````json
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true" # Enables segment replication
````
### Setting the replication strategy on the index
To set the replication strategy to segment replication, create an index with replication.type set to `SEGMENT`:
````json
PUT /my-index1
{
"settings": {
"index": {
"replication.type": "SEGMENT"
}
}
}
````
## Known limitations
1. Enabling segment replication for an existing index requires [reindexing](https://github.com/opensearch-project/OpenSearch/issues/3685).
1. Rolling upgrades are currently not supported. Full cluster restarts are required when upgrading indexes using segment replication. [Issue 3881](https://github.com/opensearch-project/OpenSearch/issues/3881).
1. [Cross-cluster replication](https://github.com/opensearch-project/OpenSearch/issues/4090) does not currently use segment replication to copy between clusters.
1. Increased network congestion on primary shards. [Issue - Optimize network bandwidth on primary shards](https://github.com/opensearch-project/OpenSearch/issues/4245).
1. Shard allocation algorithms have not been updated to evenly spread primary shards across nodes.
1. Integration with remote-backed storage as the source of replication is [currently unsupported](https://github.com/opensearch-project/OpenSearch/issues/4448).
### Further resources regarding segment replication
1. [Known issues](https://github.com/opensearch-project/OpenSearch/issues/2194).
1. Steps for testing (link coming soon).
1. Segment replication blog post (link coming soon).

View File

@ -11,17 +11,204 @@ redirect_from:
# Segment replication
Segment replication is an experimental feature with OpenSearch 2.3. Therefore, we do not recommend the use of segment replication in a production environment. For updates on the progress of segment replication or if you want leave feedback that could help improve the feature, see the [Segment replication git issue](https://github.com/opensearch-project/OpenSearch/issues/2194).
{: .warning}
With segment replication, segment files are copied across shards instead of documents being indexed on each shard copy. This improves indexing throughput and lowers resource utilization at the expense of increased network utilization.
As an experimental feature, segment replication will be behind a feature flag and must be enabled on **each node** of a cluster and pass a new setting during index creation.
{: .note }
When the primary shard sends a checkpoint to replica shards on a refresh, a new segment replication event is triggered on replica shards. This happens:
### Potential use cases
- When a new replica shard is added to a cluster.
- When there are segment file changes on a primary shard refresh.
- During peer recovery, such as replica shard recovery and shard relocation (explicit allocation using the `move` allocation command or automatic shard rebalancing).
Segment replication is the first feature in a series of features designed to decouple reads and writes in order to lower compute costs.
## Use cases
- Users who have high write loads but do not have high search requirements and are comfortable with longer refresh times.
- Users with very high loads who want to add new nodes, as you do not need to index all nodes when adding a new node to the cluster.
- OpenSearch cluster deployments with low replica counts, such as those used for log analytics.
This is the first step in a series of features designed to decouple reads and writes in order to lower compute costs.
## Segment replication configuration
To set segment replication as the replication strategy, create an index with replication.type set to `SEGMENT`:
````json
PUT /my-index1
{
"settings": {
"index": {
"replication.type": "SEGMENT"
}
}
}
````
In segment replication, the primary shard is usually generating more network traffic than the replicas because it copies segment files to the replicas. Thus, it's beneficial to distribute primary shards equally between the nodes. To ensure balanced primary shard distribution, set the dynamic `cluster.routing.allocation.balance.prefer_primary` setting to `true`. For more information, see [Cluster settings]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/).
## Comparing replication benchmarks
During initial benchmarks, segment replication users reported 40% higher throughput than when using document replication with the same cluster setup.
The following benchmarks were collected with [OpenSearch-benchmark](https://github.com/opensearch-project/opensearch-benchmark) using the [`stackoverflow`](https://www.kaggle.com/datasets/stackoverflow/stackoverflow) and [`nyc_taxi`](https://github.com/topics/nyc-taxi-dataset) datasets.
Both test runs were performed on a 10-node (m5.xlarge) cluster with 10 shards and 5 replicas. Each shard was about 3.2GBs in size. The tests were run with the following settings:
- `indices.recovery.max_bytes_per_sec`: 10gb
- `indices.recovery.max_concurrent_file_chunks`: 5
The benchmarking results are listed in the following table.
<table>
<tr>
<td></td>
<td></td>
<td>Document Replication</td>
<td>Segment Replication</td>
<td>Percent difference</td>
</tr>
<tr>
<td>Test execution time (minutes)</td>
<td></td>
<td>40.00</td>
<td>22.00</td>
<td></td>
</tr>
<tr>
<td rowspan="3">Throughput (number of requests per second)</td>
<td>p0</td>
<td>17553.90</td>
<td>28584.30</td>
<td>63%</td>
</tr>
<tr>
<td>p50</td>
<td>20647.20</td>
<td>32790.20</td>
<td>59%</td>
</tr>
<tr>
<td>p100</td>
<td>23209.00</td>
<td>34286.00</td>
<td>48%</td>
</tr>
<tr>
<td rowspan="4">CPU (%)</td>
<td>p50</td>
<td>65.00</td>
<td>30.00</td>
<td>-54%</td>
</tr>
<tr>
<td>p90</td>
<td>79.00</td>
<td>35.00</td>
<td>-56%</td>
</tr>
<tr>
<td>p99</td>
<td>98.00</td>
<td>45.08</td>
<td>-54%</td>
</tr>
<tr>
<td>p100</td>
<td>98.00</td>
<td>59.00</td>
<td>-40%</td>
</tr>
<tr>
<td rowspan="4">Memory (%)</td>
<td>p50</td>
<td>48.20</td>
<td>39.00</td>
<td>-19%</td>
</tr>
<tr>
<td>p90</td>
<td>62.00</td>
<td>61.00</td>
<td>-2%</td>
</tr>
<tr>
<td>p99</td>
<td>66.21</td>
<td>68.00</td>
<td>3%</td>
</tr>
<tr>
<td>p100</td>
<td>71.00</td>
<td>69.00</td>
<td>-3%</td>
</tr>
<tr>
<td rowspan="4">IOPS</td>
<td>p50</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>p90</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>p99</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>p100</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td rowspan="4">Latency</td>
<td>p50</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>p90</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>p99</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>p100</td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
Your results may vary based on the cluster topology, hardware used, shard count, and merge settings.
{: .note }
## Other considerations
When using segment replication, consider the following:
1. Enabling segment replication for an existing index requires [reindexing](https://github.com/opensearch-project/OpenSearch/issues/3685).
1. Rolling upgrades are not currently supported. Full cluster restarts are required when upgrading indexes using segment replication. See [Issue 3881](https://github.com/opensearch-project/OpenSearch/issues/3881).
1. [Cross-cluster replication](https://github.com/opensearch-project/OpenSearch/issues/4090) does not currently use segment replication to copy between clusters.
1. Increased network congestion on primary shards. See [Issue - Optimize network bandwidth on primary shards](https://github.com/opensearch-project/OpenSearch/issues/4245).
1. Integration with remote-backed storage as the source of replication is [currently unsupported](https://github.com/opensearch-project/OpenSearch/issues/4448).
1. Read-after-write guarantees: The `wait_until` refresh policy is not compatible with segment replication. If you use the `wait_until` refresh policy while ingesting documents, you'll get a response only after the primary node has refreshed and made those documents searchable. Replica shards will respond only after having written to their local translog. We are exploring other mechanisms for providing read-after-write guarantees. For more information, see the corresponding [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/6046).
1. System indexes will continue to use document replication internally until read-after-write guarantees are available. In this case, document replication does not hinder the overall performance because there are few system indexes.
## Next steps
1. Track [future enhancements to segment replication](https://github.com/orgs/opensearch-project/projects/99).
1. Read [this blog post about segment replication](https://opensearch.org/blog).