Add review feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Naarcha-AWS 2022-05-24 09:41:26 -05:00
parent 8042de2098
commit fb1a0181e9
3 changed files with 18 additions and 18 deletions

@@ -7,7 +7,7 @@ nav_order: 100
 # Supported Algorithms
-ML Commons supports various algorithms to help train and predict ML models or test data-driven predictions without a model. This page outlines the algorithms supported by the ML Commons plugin and the API actions they support
+ML Commons supports various algorithms to help train and predict ML models or test data-driven predictions without a model. This page outlines the algorithms supported by the ML Commons plugin and the API actions they support.
 ## Common limitation
@@ -15,7 +15,7 @@ Except for the Localization algorithm, all of the following algorithms can only
 ## K-Means
-K-Means is a simple and popular unsupervised clustering ML algorithm. K-Means will randomly choose centroids, then calculate iteratively to optimize the position of the centroids until each observation belongs to the cluster with the nearest mean.
+K-Means is a simple and popular unsupervised clustering ML algorithm, built on top of [Tribuo](https://tribuo.org/) library. K-Means will randomly choose centroids, then calculate iteratively to optimize the position of the centroids until each observation belongs to the cluster with the nearest mean.
 ### Parameters
@@ -27,9 +27,9 @@ distance_type | enum, such as `EUCLIDEAN`, `COSINE`, or `L1` | Type of measureme
 ### APIs
-* [Train](https://opensearch.org/docs/latest/ml-commons-plugin/api/#train-model)
-* [Predict](https://opensearch.org/docs/latest/ml-commons-plugin/api/#predict)
-* [Train and predict](https://opensearch.org/docs/latest/ml-commons-plugin/api/#train-and-predict)
+* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model)
+* [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict)
+* [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict)
 ### Example
@@ -77,12 +77,12 @@ optimizerType | OptimizerType | The optimizer used in the model | SIMPLE_SGD
 ### APIs
-* [Train](https://opensearch.org/docs/latest/ml-commons-plugin/api/#train-model)
-* [Predict](https://opensearch.org/docs/latest/ml-commons-plugin/api/#predict)
+* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model)
+* [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict)
 ### Example
-The following example predicts based on the previously trained linear regression model.
+The following example creates a new prediction based on the previously trained linear regression model.
 **Request**
@@ -155,8 +155,8 @@ ML Commons only supports the linear Stochastic gradient trainer or optimizer, wh
 [Random Cut Forest](https://github.com/aws/random-cut-forest-by-aws) (RCF) is a probabilistic data structure used primarily for unsupervised anomaly detection. Its use also extends to density estimation and forecasting. OpenSearch leverages RCF for anomaly detection. ML Commons supports two new variants of RCF for different use cases:
-* Batch RCF: Detect anomalies in non-time-series data
-* Fixed in time (FIT) RCF: Detect anomalies in time-series data
+* Batch RCF: Detects anomalies in non-time series data.
+* Fixed in time (FIT) RCF: Detects anomalies in time series data.
 ### Parameters
@@ -181,24 +181,24 @@ output_after | integer | The number of points required by stream samplers before
 time_decay | double | The decay factor used by stream samplers in the forest | 0.0001
 anomaly_rate | double | The anomaly rate | 0.005
 time_field | string | (**Required**) The time filed for RCF to use as time-series data | N/A
-date_format | string | The date and time formatting for the time_field field | "yyyy-MM-ddHH:mm:ss"
+date_format | string | The date and time format for the time_field field | "yyyy-MM-ddHH:mm:ss"
 time_zone | string | The time zone for the time_field field | "UTC"
 ### APIs
-* [Train](https://opensearch.org/docs/latest/ml-commons-plugin/api/#train-model)
-* [Predict](https://opensearch.org/docs/latest/ml-commons-plugin/api/#predict)
-* [Train and predict](https://opensearch.org/docs/latest/ml-commons-plugin/api/#train-and-predict)
+* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model)
+* [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict)
+* [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict)
 ### Limitations
-For FIT RCF, you can train the model with historical data, and store the trained model in your index. The model will be deserialized and predict new data points when using the Predict API. But the model in the index will not be refreshed with new data, because the model is "Fixed In Time".
+For FIT RCF, you can train the model with historical data, and store the trained model in your index. The model will be deserialized and predict new data points when using the Predict API. However, the model in the index will not be refreshed with new data, because the model is "Fixed In Time".
 ## Localization
-Finding subset level information for aggregate data (for example, aggregated over time) that demonstrates the activity of interest (spikes, drops, changes, anomalies) is a critical insight. Localization can be applied in different scenarios, such as data exploration, root cause analysis, etc., to expose the contributors driving the activity of interest in the aggregate data.
+The Localization algorithm finds subset level information for aggregate data (for example, aggregated over time) that demonstrates the activity of interest, such as spikes, drops, changes or anomalies. Localization can be applied in different scenarios, such as data exploration or root cause analysis, to expose the contributors driving the activity of interest in the aggregate data.
 ### Parameters

@@ -232,7 +232,7 @@ The API returns the following:
 ## Predict
-ML commons can predict new data with your trained model either from indexed data or a data frame. The model_id is required to use the Predict API.
+ML Commons can predict new data with your trained model either from indexed data or a data frame. The model_id is required to use the Predict API.
```json ```json
POST /_plugins/_ml/_predict/<algorithm_name>/<model_id> POST /_plugins/_ml/_predict/<algorithm_name>/<model_id>

@@ -10,7 +10,7 @@ has_toc: false
 ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features.
-Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands.
+Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands.
 Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely.
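
Note on the K-Means wording touched by this commit: the documented procedure (choose centroids at random, then iterate until each observation belongs to the cluster with the nearest mean) can be sketched in a few lines of plain Python. This is only an illustration of the idea under simple assumptions (Euclidean distance, initial centroids drawn from the data). It is not the Tribuo or ML Commons implementation, and the function name `kmeans` and its parameters are hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups; illustrative only."""
    rng = random.Random(seed)
    # Step 1: pick k distinct observations as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Step 2: assign every observation to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Stop once no observation changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


# Two well-separated groups of two points each (made-up sample data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, k=2)
```

Which cluster receives index 0 depends on the random initial pick, so callers should compare labels to each other rather than rely on specific label values.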