opensearch-docs-cn/_ml-commons-plugin/cluster-settings.md

249 lines
6.4 KiB
Markdown
Raw Normal View History

---
layout: default
title: ML Commons cluster settings
has_children: false
nav_order: 160
---
# ML Commons cluster settings
To enhance and customize your OpenSearch cluster for machine learning (ML), you can add and modify several configuration settings for the ML Commons plugin in your 'opensearch.yml' file.
## Run tasks and models on ML nodes only
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
If `true`, ML Commons tasks and models run machine learning (ML) tasks on ML nodes only. If `false`, tasks and models run on ML nodes first. If no ML nodes exist, tasks and models run on data nodes. We recommend that you do not set this value to "false" on production clusters.
### Setting
```
Add GPU acceleration documentation (#2384) * Add GPU acceleration documentation Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Address tech feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Address technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Adjust model size sentence Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add optional to neuron step Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add Jeff's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add copy and customize for Inferntia examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/gpu-acceleration.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/gpu-acceleration.md Co-authored-by: Nate Bower <nbower@amazon.com> * Apply suggestions from code review Co-authored-by: Nate Bower <nbower@amazon.com> * Fix link Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Apply suggestions from code review Co-authored-by: Caroline <113052567+carolxob@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Caroline <113052567+carolxob@users.noreply.github.com> * Update _ml-commons-plugin/gpu-acceleration.md Co-authored-by: Caroline <113052567+carolxob@users.noreply.github.com> * Update _ml-commons-plugin/gpu-acceleration.md Co-authored-by: Caroline <113052567+carolxob@users.noreply.github.com> * Update _ml-commons-plugin/gpu-acceleration.md Co-authored-by: Caroline <113052567+carolxob@users.noreply.github.com> * Fix numbering in final section Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add final tech feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * A couple more suggestion Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Apply suggestions from code review Co-authored-by: Yaliang Wu <ylwu@amazon.com> * Fix Neural Search link Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add experimental warning Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/gpu-acceleration.md Co-authored-by: Yaliang Wu <ylwu@amazon.com> * Final tech feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Move OpenSearch to step 2. Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Nate Bower <nbower@amazon.com> Co-authored-by: Caroline <113052567+carolxob@users.noreply.github.com> Co-authored-by: Yaliang Wu <ylwu@amazon.com>
2023-01-18 14:31:52 -05:00
plugins.ml_commons.only_run_on_ml_node: true
```
### Values
- Default value: `true`
- Value range: `true` or `false`
## Dispatch tasks to ML node
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
`round_robin` dispatches ML tasks to ML nodes using round robin routing. `least_load` gathers runtime information from all ML nodes, like JVM heap memory usage and running tasks, and then dispatches the tasks to the ML node with the lowest load.
### Setting
```
plugins.ml_commons.task_dispatch_policy: round_robin
```
### Values
- Dafault value: `round_robin`
- Value range: `round_robin` or `least_load`
## Set number of ML tasks per node
Sets the number of ML tasks that can run on each ML node. When set to `0`, no ML tasks run on any nodes.
### Setting
```
plugins.ml_commons.max_ml_task_per_node: 10
```
### Values
- Default value: `10`
- Value range: [0, 10,000]
## Set number of ML models per node
Add ML fault tolerance (#3803) * Add ML fault tolerance Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Rework Profile API sentence Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix link Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add review feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback for ML. Change API names Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add final ML node setting Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add more technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update cluster-settings.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-05-01 09:16:21 -04:00
Sets the number of ML models that can be deployed to each ML node. When set to `0`, no ML models can deploy on any node.
### Setting
```
plugins.ml_commons.max_model_on_node: 10
```
### Values
- Default value: `10`
- Value range: [0, 10,000]
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
## Set sync job intervals
Add style workflow (#3909) * Tweak rules Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Rule changes Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Remove the following and simple Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Revised rules and added tests Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Soft rollout with only spelling and terms Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Add Vale to readme Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Only lint modified and added files Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Remove run on workflow dispatch Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Removed only added and modified files Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Added please Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Changed min alert level to warning Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Don't fail on error and minor changes Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Remove fail on error Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Attempt to have vale not fail Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Fixed links Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Update README.md Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Update README.md Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-05-03 12:34:05 -04:00
When returning runtime information with the [Profile API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#returning-model-profile-information), ML Commons will run a regular job to sync newly deployed or undeployed models on each node. When set to `0`, ML Commons immediately stops sync-up jobs.
### Setting
```
plugins.ml_commons.sync_up_job_interval_in_seconds: 3
```
### Values
- Default value: `3`
- Value range: [0, 86,400]
## Predict monitoring requests
Controls how many predict requests are monitored on one node. If set to `0`, OpenSearch clears all monitoring predict requests in cache and does not monitor for new predict requests.
### Setting
```
plugins.ml_commons.monitoring_request_count: 100
```
### Value range
- Default value: `100`
- Value range: [0, 10,000,000]
## Upload model tasks per node
Controls how many upload model tasks can run in parallel on one node. If set to `0`, you cannot upload models to any node.
### Setting
```
plugins.ml_commons.max_upload_model_tasks_per_node: 10
```
### Values
- Default value: `10`
- Value range: [0, 10]
## Load model tasks per node
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
Controls how many load model tasks can run in parallel on one node. If set to 0, you cannot load models to any node.
### Setting
```
plugins.ml_commons.max_load_model_tasks_per_node: 10
```
### Values
- Default value: `10`
- Value range: [0, 10]
## Add trusted URL
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
The default value allows you to upload a model file from any http/https/ftp/local file. You can change this value to restrict trusted model URLs.
### Setting
The default URL value for this trusted URL setting is not secure. To ensure the security, please use you own regex string to the trusted repository that contains your models, for example `https://github.com/opensearch-project/ml-commons/blob/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/*`.
{: .warning }
```
plugins.ml_commons.trusted_url_regex: <model-repository-url>
```
### Values
- Default value: `"^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"`
- Value range: Java regular expression (regex) string
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
## Assign task timeout
Assigns how long in seconds an ML task will live. After the timeout, the task will fail.
### Setting
```
plugins.ml_commons.ml_task_timeout_in_seconds: 600
```
### Values
- Default value: 600
- Value range: [1, 86,400]
Add cluster awareness and decommission docs (#2438) * Add cluster awareness and decommission docs Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> * Edit technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add new cluster awareness examples Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> * Add Caroline's feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add one more tweak Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _ml-commons-plugin/cluster-settings.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-awareness.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _api-reference/cluster-decommission.md Co-authored-by: Nate Bower <nbower@amazon.com> * Add editoiral feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix typos Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Final editorial note Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com> Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com> Co-authored-by: Heather Halter <HDHALTER@AMAZON.COM> Co-authored-by: Nate Bower <nbower@amazon.com>
2023-01-23 18:13:07 -05:00
## Set native memory threshold
Sets a circuit breaker that checks all system memory usage before running an ML task. If the native memory exceeds the threshold, OpenSearch throws an exception and stops running any ML task.
Values are based on the percentage of memory available. When set to `0`, no ML tasks will run. When set to `100`, the circuit breaker closes and no threshold exists.
### Setting
```
plugins.ml_commons.native_memory_threshold: 90
```
### Values
- Default value: 90
- Value range: [0, 100]
Add ML fault tolerance (#3803) * Add ML fault tolerance Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Rework Profile API sentence Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Fix link Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add review feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add technical feedback for ML. Change API names Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add final ML node setting Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Add more technical feedback Signed-off-by: Naarcha-AWS <naarcha@amazon.com> * Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update cluster-settings.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update _ml-commons-plugin/api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update api.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Naarcha-AWS <naarcha@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-05-01 09:16:21 -04:00
## Allow custom deployment plans
When enabled, this setting grants users the ability to deploy models to specific ML nodes according to that user's permissions.
### Setting
```
plugins.ml_commons.allow_custom_deployment_plan: false
```
### Values
- Default value: false
- Value range: [false, true]
## Enable auto redeploy
This setting automatically redeploys deployed or partially deployed models upon cluster failure. If all ML nodes inside a cluster crash, the model switches to the `DEPLOYED_FAILED` state, and the model must be deployed manually.
### Setting
```
plugins.ml_commons.model_auto_redeploy.enable: false
```
### Values
- Default value: false
- Value range: [false, true]
## Set retires for auto redeploy
This setting sets the limit for the number of times a deployed or partially deployed model will try and redeploy when ML nodes in a cluster fail or new ML nodes join the cluster.
### Setting
```
plugins.ml_commons.model_auto_redeploy.lifetime_retry_times: 3
```
### Values
- Default value: 3
- Value range: [0, 100]
## Set auto redeploy success ratio
This setting sets the ratio of success for the auto-redeployment of a model based on the available ML nodes in a cluster. For example, if ML nodes crash inside a cluster, the auto redeploy protocol adds another node or retires a crashed node. If the ratio is `0.7` and 70% of all ML nodes successfully redeploy the model on auto-redeploy activation, the redeployment is a success. If the model redeploys on fewer than 70% of available ML nodes, the auto-redeploy retries until the redeployment succeeds or OpenSearch reaches [the maximum number of retries](#set-retires-for-auto-redeploy).
### Setting
```
plugins.ml_commons.model_auto_redeploy_success_ratio: 0.8
```
### Values
- Default value: 0.8
- Value range: [0, 1]