Add cluster awareness and decommission docs (#2438)

* Add cluster awareness and decommission docs

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>

* Edit technical feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add new cluster awareness examples

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add technical feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com>

* Add Caroline's feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add one more tweak

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Update _ml-commons-plugin/cluster-settings.md

Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>

* Update _ml-commons-plugin/cluster-settings.md

Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _ml-commons-plugin/cluster-settings.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _ml-commons-plugin/cluster-settings.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-decommission.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-awareness.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Update _api-reference/cluster-decommission.md

Co-authored-by: Nate Bower <nbower@amazon.com>

* Add editoiral feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Fix typos

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Final editorial note

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com>
Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM>
Co-authored-by: Nate Bower <nbower@amazon.com>
This commit is contained in:
Naarcha-AWS 2023-01-23 17:13:07 -06:00 committed by GitHub
parent c2e423ff71
commit 1589201a9e
6 changed files with 245 additions and 9 deletions

View File

@ -0,0 +1,121 @@
---
layout: default
title: Cluster routing and awareness
nav_order: 16
---
# Cluster routing and awareness
You can use weights per awareness attribute to control the distribution of search or HTTP traffic across zones. This is commonly used for zonal deployments, heterogeneous instances, and routing traffic away from zones during zonal failure.
## HTTP and path methods
```
PUT /_cluster/routing/awareness/<attribute>/weights
GET /_cluster/routing/awareness/<attribute>/weights?local
GET /_cluster/routing/awareness/<attribute>/weights
DELETE /_cluster/routing/awareness/<attribute>/weights
```
## Path parameters
Parameter | Type | Description
:--- | :--- | :---
attribute | String | The name of the awareness attribute, usually `zone`. The attribute name must match the values listed in the request body when assigning weights to zones.
## Request body parameters
Parameter | Type | Description
:--- | :--- | :---
weights | JSON object | Assigns weights to attributes within the request body of the PUT request. Weights can be set in any ratio, for example, 2:3:5. In a 2:3:5 ratio with 3 zones, for every 100 requests sent to the cluster, each zone would receive either 20, 30, or 50 search requests in a random order. When assigned a weight of `0`, the zone does not receive any search traffic.
_version | String | Implements optimistic concurrency control (OCC) through versioning. The parameter uses simple versioning, such as `1`, and increments upward based on each subsequent modification. This allows any servers from which a request originates to validate whether or not a zone has been modified.
In the following example request body, `zone_1` and `zone_2` each receive 50 of every 100 requests, whereas `zone_3` receives none:
```
{
"weights":
{
"zone_1": "5",
"zone_2": "5",
"zone_3": "0"
},
"_version" : 1
}
```
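To make the ratio arithmetic concrete, the following Python sketch converts a weights object into the expected number of search requests per zone. The function name `expected_share` is illustrative and not part of OpenSearch. The sketch assumes traffic is split in proportion to the weights, as the parameter table describes, and it rejects all-zero weights only because the ratio is then undefined, not because of any documented OpenSearch behavior.

```python
def expected_share(weights, total_requests=100):
    """Return the expected number of requests per zone for a weights object."""
    # Weights arrive as strings in the request body, so convert them first.
    numeric = {zone: float(w) for zone, w in weights.items()}
    total_weight = sum(numeric.values())
    if total_weight == 0:
        # Assumption of this sketch: an all-zero ratio has no meaningful split.
        raise ValueError("at least one zone needs a nonzero weight")
    return {zone: total_requests * w / total_weight for zone, w in numeric.items()}


print(expected_share({"zone_1": "5", "zone_2": "5", "zone_3": "0"}))
print(expected_share({"zone_1": "2", "zone_2": "3", "zone_3": "5"}))
```

The first call mirrors the request body above, and the second mirrors the 2:3:5 ratio from the parameter table.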
## Example: Weighted round robin search
The following example request creates a round robin shard allocation for search traffic by weighting `zone_1` and `zone_2` equally and excluding `zone_3`:
### Request
```
PUT /_cluster/routing/awareness/zone/weights
{
"weights":
{
"zone_1": "1",
"zone_2": "1",
"zone_3": "0"
},
"_version" : 1
}
```
### Response
```
{
"acknowledged": true
}
```
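Hand-written request bodies are easy to get subtly wrong, for example by dropping a comma between `weights` and `_version`. As a minimal sketch, the following Python helper builds the path and body for the request above and leaves the JSON syntax to the standard library. The helper name `build_weights_request` is hypothetical, and the caller is assumed to know which `_version` value to send.

```python
import json


def build_weights_request(attribute, weights, version):
    """Build the path and JSON body for a PUT weights request (illustrative helper)."""
    path = f"/_cluster/routing/awareness/{attribute}/weights"
    body = {
        # The examples on this page pass weights as strings.
        "weights": {zone: str(weight) for zone, weight in weights.items()},
        "_version": version,
    }
    return path, json.dumps(body, indent=2)


path, body = build_weights_request("zone", {"zone_1": 1, "zone_2": 1, "zone_3": 0}, version=1)
print("PUT", path)
print(body)
```

The sketch only prints the request; sending it is left to whichever HTTP client you already use.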
## Example: Getting weights for all zones
The following example request gets weights for all zones.
### Request
```
GET /_cluster/routing/awareness/zone/weights
```
### Response
OpenSearch responds with the weight of each zone:
```json
{
"weights":
{
"zone_1": "1.0",
"zone_2": "1.0",
"zone_3": "0.0"
},
"_version":1
}
```
## Example: Deleting weights
You can remove your weight ratio for each zone using the `DELETE` method.
### Request
```
DELETE /_cluster/routing/awareness/zone/weights
```
### Response
```json
{
"_version":1
}
```
## Next steps
- For more information about zone decommissioning, see [Cluster decommission]({{site.url}}{{site.baseurl}}/api-reference/cluster-decommission/).
- For more information about allocation awareness, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/#advanced-step-6-configure-shard-allocation-awareness-or-forced-awareness).

View File

@ -0,0 +1,80 @@
---
layout: default
title: Cluster decommission
nav_order: 20
---
# Cluster decommission
The cluster decommission operation adds support for decommissioning based on awareness. It greatly benefits multi-zone deployments, where awareness attributes, such as `zone`, can aid in applying new upgrades to a cluster in a controlled fashion. This is especially useful during outages, when you can decommission the unhealthy zone to prevent replication requests from stalling and to keep your request backlog from becoming too large.
For more information about allocation awareness, see [Shard allocation awareness]({{site.url}}{{site.baseurl}}/opensearch/cluster/#shard-allocation-awareness).
## HTTP and path methods
```
PUT /_cluster/decommission/awareness/{awareness_attribute_name}/{awareness_attribute_value}
GET /_cluster/decommission/awareness/{awareness_attribute_name}/_status
DELETE /_cluster/decommission/awareness
```
## URL parameters
Parameter | Type | Description
:--- | :--- | :---
awareness_attribute_name | String | The name of the awareness attribute, usually `zone`.
awareness_attribute_value | String | The value of the awareness attribute. For example, if you have shards allocated in two different zones, you can give each zone a value of `zone-a` or `zone-b`. The cluster decommission operation decommissions the zone specified in the request path.
## Example: Decommissioning and recommissioning a zone
You can use the following example requests to decommission and recommission a zone:
### Request
The following example request decommissions `zone-a`:
```
PUT /_cluster/decommission/awareness/zone/zone-a
```
If you want to recommission a decommissioned zone, you can use the `DELETE` method:
```
DELETE /_cluster/decommission/awareness
```
### Response
```json
{
"acknowledged": true
}
```
## Example: Getting zone decommission status
The following example request returns the decommission status of all zones.
### Request
```
GET /_cluster/decommission/awareness/zone/_status
```
### Response
```json
{
"zone-1": "INIT | DRAINING | IN_PROGRESS | SUCCESSFUL | FAILED"
}
```
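A client that polls this endpoint needs to decide when to stop. The following Python sketch classifies each zone in a status response using the five status values listed above. The function name `summarize_status` is illustrative, and treating `INIT`, `DRAINING`, and `IN_PROGRESS` as still pending is an assumption based on the status names rather than on documented behavior.

```python
# Assumption: these three statuses mean the decommission has not finished yet.
PENDING = {"INIT", "DRAINING", "IN_PROGRESS"}


def summarize_status(response):
    """Map each zone in a status response to 'pending', 'done', or 'failed'."""
    summary = {}
    for zone, status in response.items():
        if status in PENDING:
            summary[zone] = "pending"
        elif status == "SUCCESSFUL":
            summary[zone] = "done"
        elif status == "FAILED":
            summary[zone] = "failed"
        else:
            raise ValueError(f"unexpected status {status!r} for {zone}")
    return summary


print(summarize_status({"zone-1": "DRAINING"}))
```

A polling loop could keep requesting the status while any zone is reported as pending.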
## Next steps
- For more information about zone awareness and weight, see [Cluster awareness]({{site.url}}{{site.baseurl}}/api-reference/cluster-awareness/).
- For more information about allocation awareness, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/#advanced-step-6-configure-shard-allocation-awareness-or-forced-awareness).

View File

@ -1,7 +1,7 @@
---
layout: default
title: Cluster health
nav_order: 16
nav_order: 17
---
# Cluster health
@ -47,6 +47,7 @@ wait_for_events | Enum | Wait until all currently queued events with the given p
wait_for_no_relocating_shards | Boolean | Whether to wait until there are no relocating shards in the cluster. Default is false.
wait_for_no_initializing_shards | Boolean | Whether to wait until there are no initializing shards in the cluster. Default is false.
wait_for_status | Enum | Wait until the cluster health reaches the specified status or better. Supported values are `green`, `yellow`, and `red`.
weights | JSON object | Assigns weights to attributes within the request body of the PUT request. Weights can be set in any ratio, for example, 2:3:5. In a 2:3:5 ratio with three zones, for every 100 requests sent to the cluster, each zone would receive either 20, 30, or 50 search requests in a random order. When assigned a weight of `0`, the zone does not receive any search traffic.
#### Sample request

View File

@ -1,7 +1,7 @@
---
layout: default
title: Cluster settings
nav_order: 17
nav_order: 18
---
# Cluster settings

View File

@ -1,7 +1,7 @@
---
layout: default
title: Count
nav_order: 20
nav_order: 21
---
# Count

View File

@ -12,7 +12,7 @@ To enhance and customize your OpenSearch cluster for machine learning (ML), you
## Run tasks and models on ML nodes only
If `true`, ML Commons tasks and models run machine learning (ML) tasks on ML nodes only. If `false`, tasks and models run on ML nodes first. If no ML nodes exist, tasks and models run on data nodes. Don't set as `false` on a production cluster.
If `true`, ML Commons tasks and models run machine learning (ML) tasks on ML nodes only. If `false`, tasks and models run on ML nodes first. If no ML nodes exist, tasks and models run on data nodes. We recommend that you do not set this value to "false" on production clusters.
### Setting
@ -27,7 +27,7 @@ plugins.ml_commons.only_run_on_ml_node: true
## Dispatch tasks to ML node
`round_robin` dispatches ML tasks to ML nodes using round robin routing. `least_load` gathers all ML nodes' runtime information, such as JVM heap memory usage and running tasks, then dispatches tasks to the ML node with the least load.
`round_robin` dispatches ML tasks to ML nodes using round robin routing. `least_load` gathers runtime information from all ML nodes, like JVM heap memory usage and running tasks, and then dispatches the tasks to the ML node with the lowest load.
### Setting
@ -43,7 +43,9 @@ plugins.ml_commons.task_dispatch_policy: round_robin
- Value range: `round_robin` or `least_load`
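The difference between the two policies can be sketched in a few lines of Python. This is not the ML Commons implementation. The node names, the `running_tasks` and `heap_used_percent` fields, and the rule of comparing running tasks first and heap usage second are all assumptions made for illustration.

```python
from itertools import cycle


def round_robin(nodes):
    """Return an iterator that yields node names in a repeating fixed order."""
    return cycle(nodes)


def least_load(node_stats):
    """Pick the node with the fewest running tasks, breaking ties by heap usage."""
    return min(
        node_stats,
        key=lambda name: (
            node_stats[name]["running_tasks"],
            node_stats[name]["heap_used_percent"],
        ),
    )


picker = round_robin(["ml-1", "ml-2", "ml-3"])
print([next(picker) for _ in range(4)])  # cycles back to ml-1 on the fourth task

stats = {
    "ml-1": {"running_tasks": 3, "heap_used_percent": 40},
    "ml-2": {"running_tasks": 1, "heap_used_percent": 70},
    "ml-3": {"running_tasks": 1, "heap_used_percent": 55},
}
print(least_load(stats))
```

Round robin needs no runtime information, whereas the least-load choice depends on statistics gathered from every ML node before each dispatch.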
## Set sync up job intervals
## Set sync job intervals
When returning runtime information with the [profile API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#profile), ML Commons will run a regular job to sync newly loaded or unloaded models on each node. When set to `0`, ML Commons immediately stops sync up jobs.
When returning runtime information with the [profile API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#profile), ML Commons will run a regular sync up job to sync up newly loaded or unloaded models on each node. When set to `0`, ML Commons immediately stops sync up jobs.
@ -60,7 +62,7 @@ plugins.ml_commons.sync_up_job_interval_in_seconds: 10
## Predict monitoring requests
Controls how many predict requests are monitored on one node. If set to `0`, OpenSearch clears all monitoring predict requests in the node's cache, and does not monitor predict requests from that point forward.
Controls how many upload model tasks can run in parallel on one node. If set to `0`, you cannot upload models to any node.
### Setting
@ -92,7 +94,7 @@ plugins.ml_commons.max_upload_model_tasks_per_node: 10
## Load model tasks per node
Controls how many load model tasks can run in parallel on one node. If set to `0`, you cannot load models to any node.
Controls how many load model tasks can run in parallel on one node. If set to 0, you cannot load models to any node.
### Setting
@ -107,7 +109,7 @@ plugins.ml_commons.max_load_model_tasks_per_node: 10
## Add trusted URL
The default value allows uploading a model file from any `http`, `https`, `ftp`, or local file. You can change this value to restrict trusted model URL.
The default value allows you to upload a model file from any http/https/ftp/local file. You can change this value to restrict trusted model URLs.
### Setting
@ -120,3 +122,35 @@ plugins.ml_commons.trusted_url_regex: ^(https?\|ftp\|file)://[-a-zA-Z0-9+&@#/%?=
- Default value: `^(https?\|ftp\|file)://[-a-zA-Z0-9+&@#/%?=~_\|!:,.;]*[-a-zA-Z0-9+&@#/%=~_\|]`
- Value range: Java regular expression (regex) string
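To see which URLs the default pattern accepts, you can try it against a few samples. The following Python sketch assumes that the `\|` sequences in the default value are Markdown escapes for `|` and that the whole URL must match the pattern; neither assumption is confirmed here. The setting takes a Java regex, but this pattern uses only constructs that behave the same way in Python's `re` module. The sample URLs are made up.

```python
import re

# Default pattern from this page with the Markdown pipe escapes (\|) removed.
# Assumption: the backslashes exist only for Markdown rendering.
TRUSTED_URL = re.compile(
    r"^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"
)


def is_trusted(url):
    """Return True when the whole URL matches the trusted-URL pattern."""
    return TRUSTED_URL.fullmatch(url) is not None


for sample in (
    "https://example.com/model.zip",
    "file:///tmp/model.zip",
    "s3://bucket/model.zip",
):
    print(sample, is_trusted(sample))
```

Restricting the setting amounts to narrowing this pattern, for example by allowing only `https` and a single host name.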
## Assign task timeout
Assigns how long in seconds an ML task will live. After the timeout, the task will fail.
### Setting
```
plugins.ml_commons.ml_task_timeout_in_seconds: 600
```
### Values
- Default value: 600
- Value range: [1, 86400]
## Set native memory threshold
Sets a circuit breaker that checks all system memory usage before running an ML task. If the native memory exceeds the threshold, OpenSearch throws an exception and stops running any ML task.
Values are based on the percentage of memory available. When set to `0`, no ML tasks will run. When set to `100`, the circuit breaker closes and no threshold exists.
### Setting
```
plugins.ml_commons.native_memory_threshold: 90
```
### Values
- Default value: 90
- Value range: [0, 100]
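Both settings above have bounded value ranges, so it can help to check a value before applying it. The following Python sketch is a hypothetical client-side check, not part of ML Commons. It copies the ranges from the "Values" lists and assumes both bounds are inclusive.

```python
# Ranges copied from the "Values" lists on this page (assumed inclusive).
RANGES = {
    "plugins.ml_commons.ml_task_timeout_in_seconds": (1, 86400),
    "plugins.ml_commons.native_memory_threshold": (0, 100),
}


def check_setting(name, value):
    """Raise ValueError if value falls outside the documented range."""
    low, high = RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


print(check_setting("plugins.ml_commons.native_memory_threshold", 90))
```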