From f5ec6b536ef30dcdb39215b2b2db8652e390a2f8 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Tue, 3 Oct 2023 08:49:02 -0500
Subject: [PATCH] Add post 2.10 remote store tweaks (#5118)

* Add post 2.10 remote store tweaks.

Signed-off-by: Naarcha-AWS

* Fix comment about node types.

Signed-off-by: Naarcha-AWS

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 .../remote-store/index.md                     | 18 +++++++++---------
 .../remote-store/snapshot-interoperability.md | 19 +++++++++++--------
 .../segment-replication/index.md              |  2 +-
 3 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
index 1dd2ebea..f6cf6405 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/index.md
@@ -123,7 +123,7 @@ Your results may vary based on your cluster topology, hardware, shard count, and

 For these benchmarks, we used the following cluster, shard, and test configuration:

-* Node types: Three nodes---one data, one ingest, and one cluster manager node
+* Nodes: Three nodes, each using the data, ingest, and cluster manager roles
 * Node instance: Amazon EC2 r6g.xlarge
 * OpenSearch Benchmark host: Single Amazon EC2 m5.2xlarge instance
 * Shard configuration: Three shards with one replica
@@ -133,10 +133,10 @@ For these benchmarks, we used the following cluster, shard, and test configurati

 The following table lists the benchmarking results for the `so` workload with a remote translog buffer interval of 250 ms.

-| | |8 bulk indexing clients (Default) |16 bulk indexing clients |24 bulk indexing clients |
-|--- |--- |--- |--- |--- |
-| | | Document replication | Remote enabled |Percent difference | Document replication | Remote enabled | Percent difference |Document replication | Remote enabled | Percent difference |
-|Indexing throughput |Mean |29582.5 |40667.4 |37.47 |31154.9 |47862.3 |53.63 |31777.2 |51123.2 |60.88 |
+| | | 8 bulk indexing clients (Default) | | | 16 bulk indexing clients | | | 24 bulk indexing clients | | |
+|--- |--- |--- |--- |--- | --- | --- | --- | --- | --- | --- |
+| | | Document replication | Remote enabled | Percent difference | Document replication | Remote enabled | Percent difference | Document replication | Remote enabled | Percent difference |
+|Indexing throughput |Mean |29582.5 |40667.4 |37.47 |31154.9 |47862.3 |53.63 |31777.2 |51123.2 |60.88 |
 |P50 |28915.4 |40343.4 |39.52 |30406.4 |47472.5 |56.13 |30852.1 |50547.2 |63.84 |
 |Indexing latency |P90 |1716.34 |1469.5 |-14.38 |3709.77 |2799.82 |-24.53 |5768.68 |3794.13 |-34.23 |

@@ -144,8 +144,8 @@ The following table lists the benchmarking results for the `so` workload with a

 The following table lists the benchmarking results for the `http_logs` workload with a remote translog buffer interval of 200 ms.

-| | |8 bulk indexing clients (Default) |16 bulk indexing clients |24 bulk indexing clients |
-|--- |--- |--- |--- |--- |
+| | | 8 bulk indexing clients (Default) | | | 16 bulk indexing clients | | | 24 bulk indexing clients | | |
+|--- |--- |--- |--- |--- | --- | --- | --- | --- | --- | --- |
 | | | Document replication | Remote enabled |Percent difference | Document replication | Remote enabled | Percent difference |Document replication | Remote enabled | Percent difference |
 |Indexing throughput |Mean |149062 |82198.7 |-44.86 |134696 |148749 |10.43 |133050 |197239 |48.24 |
 |P50 |148123 |81656.1 |-44.87 |133591 |148859 |11.43 |132872 |197455 |48.61 |
@@ -155,8 +155,8 @@ The following table lists the benchmarking results for the `http_logs` workload

 The following table lists the benchmarking results for the `http_logs` workload with a remote translog buffer interval of 250 ms.

-| | |8 bulk indexing clients (Default) |16 bulk indexing clients |24 bulk indexing clients |
-|--- |--- |--- |--- |--- |
+| | | 8 bulk indexing clients (Default) | | | 16 bulk indexing clients | | | 24 bulk indexing clients | | |
+|--- |--- |--- |--- |--- | --- | --- | --- | --- | --- | --- |
 | | | Document replication | Remote enabled |Percent difference | Document replication | Remote enabled | Percent difference |Document replication | Remote enabled | Percent difference |
 |Indexing throughput |Mean |93383.9 |94186.1 |0.86 |91624.8 |125770 |37.27 |93627.7 |132006 |40.99 |
 |P50 |91645.1 |93906.7 |2.47 |89659.8 |125443 |39.91 |91120.3 |132166 |45.05 |
diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
index a57aa123..0415af65 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability.md
@@ -8,19 +8,22 @@ grand_parent: Availability and recovery

 # Shallow snapshots

-Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than using normal snapshots because segment data is not stored in the snapshot repository.
+Shallow copy snapshots allow you to reference data from an entire remote-backed repository instead of storing all of the data from the segment in a snapshot repository. This makes accessing segment data faster than using normal snapshots because segment data is not stored in the snapshot repository.

 ## Enabling shallow snapshots

-Use the [Cluster Settings API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/) to enable the `remote_store_index_shallow_copy` repository setting, as shown in the following example:
+Use the [Snapshot API]({{site.url}}{{site.baseurl}}/api-reference/snapshots/create-repository/) and set the `remote_store_index_shallow_copy` repository setting to `true` to enable shallow snapshot copies, as shown in the following example:

 ```bash
-PUT _cluster/settings
+PUT /_snapshot/snap_repo
 {
-  "persistent":{
-    "remote_store_index_shallow_copy": true
-  }
-}
+  "type": "s3",
+  "settings": {
+    "bucket": "test-bucket",
+    "base_path": "daily-snaps",
+    "remote_store_index_shallow_copy": true
+  }
+}
 ```
 {% include copy-curl.html %}

@@ -32,5 +35,5 @@ Consider the following before using shallow copy snapshots:

 - Shallow copy snapshots only work for remote-backed indexes.
 - All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots.
-- There is no difference in file size between standard shards and shallow copy snapshot shards because no segment data is stored in the snapshot itself.
+- The `incremental` file count and size between the current snapshot and the last snapshot are `0` when using shallow copy snapshots.
 - Searchable snapshots are not supported inside shallow copy snapshots.
diff --git a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md
index 6c389125..85201576 100644
--- a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md
+++ b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md
@@ -110,7 +110,7 @@ When using segment replication, consider the following:
 1. [Cross-cluster replication](https://github.com/opensearch-project/OpenSearch/issues/4090) does not currently use segment replication to copy between clusters.
 1. Segment replication is not compatible with [document-level monitors]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/#document-level-monitors), which are used with the [Alerting]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) and [Security Analytics]({{site.url}}{{site.baseurl}}/security-analytics/index/) plugins. The plugins also use the latest available data on replica shards when using the `immediate` refresh policy, and segment replication can delay the policy's availability, resulting in stale replica shards.
 1. Segment replication leads to increased network congestion on primary shards using node-to-node replication because replica shards fetch updates from the primary shard. With remote-backed storage, the primary shard can upload segments to, and the replicas can fetch updates from, the remote-backed storage. This helps offload responsibilities from the primary shard to the remote-backed storage.
-Read-after-write guarantees: Segment replication does not currently support setting the refresh policy to `wait_for`. If you set the `refresh` query parameter to `wait_for` and then ingest documents, you'll get a response only after the primary node has refreshed and made those documents searchable. Replica shards will respond only after having written to their local translog. If real-time reads are needed, consider using the [`get`]({{site.url}}{{site.baseurl}}/api-reference/document-apis/get-documents/) or [`mget`]({{site.url}}{{site.baseurl}}/api-reference/document-apis/multi-get/) API operations.
+1. Read-after-write guarantees: Segment replication does not currently support setting the refresh policy to `wait_for` or `true`. If you set the `refresh` query parameter to `wait_for` or `true` and then ingest documents, you'll get a response only after the primary node has refreshed and made those documents searchable. Replica shards will respond only after having written to their local translog. If real-time reads are needed, consider using the [`get`]({{site.url}}{{site.baseurl}}/api-reference/document-apis/get-documents/) or [`mget`]({{site.url}}{{site.baseurl}}/api-reference/document-apis/multi-get/) API operations.
 1. As of OpenSearch 2.10, system indexes support segment replication.
 1. Get, MultiGet, TermVector, and MultiTermVector requests serve strong reads by routing requests to the primary shards.
 Routing more requests to the primary shards may degrade performance as compared to distributing requests across primary and replica shards. To improve performance in read-heavy clusters, we recommend setting the `realtime` parameter in these requests to `false`. For more information, see [Issue #8700](https://github.com/opensearch-project/OpenSearch/issues/8700).
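
As a quick sanity check of the shallow copy behavior this patch documents, you can take a snapshot in a repository that has `remote_store_index_shallow_copy` enabled and confirm that the snapshot status reports `0` incremental files. The following is a minimal sketch, assuming an OpenSearch 2.10+ cluster with remote-backed storage configured and the `snap_repo` repository registered as in the example above; the snapshot name `daily-snap-1` and index name `my-remote-index` are hypothetical.

```bash
# Take a snapshot of a remote-backed index in the shallow-copy-enabled repository.
# The repository (snap_repo), snapshot (daily-snap-1), and index (my-remote-index)
# names are illustrative placeholders.
curl -XPUT "http://localhost:9200/_snapshot/snap_repo/daily-snap-1?wait_for_completion=true" \
  -H 'Content-Type: application/json' \
  -d'{"indices": "my-remote-index"}'

# Check the snapshot status. For a shallow copy snapshot, the incremental
# file count and size in the stats should be 0 because segment data stays in
# the remote store rather than being copied into the snapshot repository.
curl -XGET "http://localhost:9200/_snapshot/snap_repo/daily-snap-1/_status"
```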