Add review feedback
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
parent
fb1a0181e9
commit
adbf630735
|
@ -7,7 +7,7 @@ nav_order: 100
|
||||||
|
|
||||||
# Supported Algorithms
|
# Supported Algorithms
|
||||||
|
|
||||||
ML Commons supports various algorithms to help train and predict ML models or test data-driven predictions without a model. This page outlines the algorithms supported by the ML Commons plugin and the API actions they support.
|
ML Commons supports various algorithms to help train and predict machine learning (ML) models or test data-driven predictions without a model. This page outlines the algorithms supported by the ML Commons plugin and the API operations they support.
|
||||||
|
|
||||||
## Common limitation
|
## Common limitation
|
||||||
|
|
||||||
|
@ -15,15 +15,15 @@ Except for the Localization algorithm, all of the following algorithms can only
|
||||||
|
|
||||||
## K-Means
|
## K-Means
|
||||||
|
|
||||||
K-Means is a simple and popular unsupervised clustering ML algorithm, built on top of [Tribuo](https://tribuo.org/) library. K-Means will randomly choose centroids, then calculate iteratively to optimize the position of the centroids until each observation belongs to the cluster with the nearest mean.
|
K-Means is a simple and popular unsupervised clustering ML algorithm built on top of the [Tribuo](https://tribuo.org/) library. K-Means randomly chooses centroids, then iteratively optimizes the position of the centroids until each observation belongs to the cluster with the nearest mean.
|
||||||
|
|
||||||
### Parameters
|
### Parameters
|
||||||
|
|
||||||
Parameter | Type | Description | Default Value
|
Parameter | Type | Description | Default Value
|
||||||
:--- |:--- | :--- | :---
|
:--- |:--- | :--- | :---
|
||||||
centroids | integer | The number of clusters to group the generated data | `2`
|
centroids | integer | The number of clusters in which to group the generated data | `2`
|
||||||
iterations | integer | The number of iterations to perform against the data until a mean generates | `10`
|
iterations | integer | The number of iterations to perform against the data until a mean generates | `10`
|
||||||
distance_type | enum, such as `EUCLIDEAN`, `COSINE`, or `L1` | Type of measurement from which to measure the distance between centroids | `EUCLIDEAN`
|
distance_type | enum, such as `EUCLIDEAN`, `COSINE`, or `L1` | The type of measurement from which to measure the distance between centroids | `EUCLIDEAN`
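To make the three parameters above concrete, here is a minimal pure-Python sketch of the K-Means loop. It is not the Tribuo implementation: the `kmeans` and `distance` helpers, the `seed` argument, and the sample points are all hypothetical and only mirror the `centroids`, `iterations`, and `distance_type` parameters.

```python
import math
import random


def distance(a, b, distance_type="EUCLIDEAN"):
    """Distance between two points for the three documented enum values."""
    if distance_type == "EUCLIDEAN":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if distance_type == "L1":
        return sum(abs(x - y) for x, y in zip(a, b))
    if distance_type == "COSINE":
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm if norm else 1.0
    raise ValueError(f"unsupported distance_type: {distance_type}")


def kmeans(points, centroids=2, iterations=10, distance_type="EUCLIDEAN", seed=0):
    """Illustrative K-Means: random initial centers, then assign and update."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    centers = rng.sample(points, centroids)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(centroids), key=lambda c: distance(p, centers[c], distance_type))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for c in range(centroids):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, centroids=2, iterations=10)
print(labels)
```

With two well-separated groups like these, the first three points end up in one cluster and the last three in the other, regardless of which points are picked as initial centers.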
|
||||||
|
|
||||||
### APIs
|
### APIs
|
||||||
|
|
||||||
|
@ -55,23 +55,23 @@ POST /_plugins/_ml/_train/kmeans
|
||||||
|
|
||||||
### Limitations
|
### Limitations
|
||||||
|
|
||||||
The training process supports multi-threads, but the thread number is less than half of CPUs.
|
The training process supports multithreading, but the number of threads should be less than half of the number of CPUs.
|
||||||
|
|
||||||
## Linear Regression
|
## Linear Regression
|
||||||
|
|
||||||
Linear Regression maps the linear relationship between inputs and outputs. In ml-common, the linear regression algorithm is adopted from the public machine learning library [Tribuo](https://tribuo.org/), which offers multidimensional linear regression models. The model supports the linear optimizer in training, including popular approaches like Linear Decay, SQRT_DECAY, [ADA](http://chrome-extension//gphandlahdpffmccakmbngmbjnjiiahp/https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), [ADAM](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/Adam.html), and [RMS_DROP](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/RMSProp.html).
|
Linear Regression maps the linear relationship between inputs and outputs. In ML Commons, the linear regression algorithm is adopted from the public machine learning library [Tribuo](https://tribuo.org/), which offers multidimensional linear regression models. The model supports the linear optimizer in training, including popular approaches like Linear Decay, SQRT_DECAY, [ADA](https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), [ADAM](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/Adam.html), and [RMS_PROP](https://tribuo.org/learn/4.1/javadoc/org/tribuo/math/optimisers/RMSProp.html).
|
||||||
|
|
||||||
### Parameters
|
### Parameters
|
||||||
|
|
||||||
Parameter | Type | Description | Default Value
|
Parameter | Type | Description | Default Value
|
||||||
:--- |:--- | :--- | :---
|
:--- |:--- | :--- | :---
|
||||||
learningRate | Double | The rate of speed that the gradient moves during descent | 0.01
|
learningRate | Double | The rate of speed at which the gradient moves during descent | 0.01
|
||||||
momentumFactor | Double | The medium-term from which the regressor rises or falls | 0
|
momentumFactor | Double | The medium-term from which the regressor rises or falls | 0
|
||||||
epsilon | Double | The criteria in which a linear model is identified | 1.00E-06
|
epsilon | Double | The criteria used to identify a linear model | 1.00E-06
|
||||||
beta1 | Double | The estimated exponential decay for the moment | 0.9
|
beta1 | Double | The estimated exponential decay for the moment | 0.9
|
||||||
beta2 | Double | The estimated exponential decay for the moment | 0.99
|
beta2 | Double | The estimated exponential decay for the moment | 0.99
|
||||||
decayRate | Double | The rate at which the model decays exponentially | 0.9
|
decayRate | Double | The rate at which the model decays exponentially | 0.9
|
||||||
momentumType | MomentumType | The momentum with SDG to help accelerate gradients vectors in the right directions, leading to a faster convergence | STANDARD
|
momentumType | MomentumType | The defined stochastic gradient descent (SGD) momentum type that helps accelerate gradient vectors in the right directions, leading to a fast convergence | STANDARD
|
||||||
optimizerType | OptimizerType | The optimizer used in the model | SIMPLE_SGD
|
optimizerType | OptimizerType | The optimizer used in the model | SIMPLE_SGD
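As a rough illustration of how `learningRate` and `momentumFactor` interact, the sketch below fits a one-dimensional linear model with plain SGD and standard momentum. It is not the Tribuo trainer: the function name, the `epochs` argument, and the sample data are hypothetical, and the update rule shown is the textbook one.

```python
def sgd_linear_regression(xs, ys, learning_rate=0.01, momentum_factor=0.0, epochs=2000):
    """Fit y = w * x + b with per-sample SGD and standard momentum (illustrative)."""
    w, b = 0.0, 0.0
    vw, vb = 0.0, 0.0  # velocity terms carried between steps by the momentum
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w, grad_b = error * x, error  # gradient of 0.5 * error ** 2
            vw = momentum_factor * vw - learning_rate * grad_w
            vb = momentum_factor * vb - learning_rate * grad_b
            w += vw
            b += vb
    return w, b


# Noise-free sample data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

A `momentum_factor` of `0` reduces this to simple SGD, which corresponds to the documented defaults; larger values reuse more of the previous step.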
|
||||||
|
|
||||||
|
|
||||||
|
@ -149,7 +149,7 @@ POST _plugins/_ml/_predict/LINEAR_REGRESSION/ROZs-38Br5eVE0lTsoD9
|
||||||
|
|
||||||
### Limitations
|
### Limitations
|
||||||
|
|
||||||
ML Commons only supports the linear Stochastic gradient trainer or optimizer, which can not effectively map the non-linear relationships in trained data. When used with complicated data sets, the linear Stochastic trainer might cause some convergence problems and inaccurate results.
|
ML Commons only supports the linear stochastic gradient trainer or optimizer, which cannot effectively map the nonlinear relationships in trained data. When used with complicated datasets, the linear stochastic trainer might cause convergence problems and inaccurate results.
|
||||||
|
|
||||||
## RCF
|
## RCF
|
||||||
|
|
||||||
|
@ -164,23 +164,23 @@ ML Commons only supports the linear Stochastic gradient trainer or optimizer, wh
|
||||||
|
|
||||||
Parameter | Type | Description | Default Value
|
Parameter | Type | Description | Default Value
|
||||||
:--- |:--- | :--- | :---
|
:--- |:--- | :--- | :---
|
||||||
number_of_trees | integer | Number of trees in the forest | 30
|
number_of_trees | integer | The number of trees in the forest | 30
|
||||||
sample_size | integer | The sample size used by the stream samplers in the forest | 256
|
sample_size | integer | The sample size used by the stream samplers in the forest | 256
|
||||||
output_after | integer | The number of points required by stream samplers before results return | 32
|
output_after | integer | The number of points required by stream samplers before results return | 32
|
||||||
training_data_size | integer | The size of your training data | Data set size
|
training_data_size | integer | The size of your training data | Dataset size
|
||||||
anamoly_score_threshold | double | The threshold of the anomaly score | 1.0
|
anomaly_score_threshold | double | The threshold of the anomaly score | 1.0
|
||||||
|
|
||||||
#### Fit RCF
|
#### Fit RCF
|
||||||
|
|
||||||
Parameter | Type | Description | Default Value
|
Parameter | Type | Description | Default Value
|
||||||
:--- |:--- | :--- | :---
|
:--- |:--- | :--- | :---
|
||||||
number_of_trees | integer | Number of trees in the forest | 30
|
number_of_trees | integer | The number of trees in the forest | 30
|
||||||
shingle_size | integer | A shingle, or consecutive sequence of the most recent records | 8
|
shingle_size | integer | A shingle, or a consecutive sequence of the most recent records | 8
|
||||||
sample_size | integer | The sample size used by stream samplers in the forest | 256
|
sample_size | integer | The sample size used by stream samplers in the forest | 256
|
||||||
output_after | integer | The number of points required by stream samplers before results return | 32
|
output_after | integer | The number of points required by stream samplers before results return | 32
|
||||||
time_decay | double | The decay factor used by stream samplers in the forest | 0.0001
|
time_decay | double | The decay factor used by stream samplers in the forest | 0.0001
|
||||||
anomaly_rate | double | The anomaly rate | 0.005
|
anomaly_rate | double | The anomaly rate | 0.005
|
||||||
time_field | string | (**Required**) The time filed for RCF to use as time-series data | N/A
|
time_field | string | (**Required**) The time field for RCF to use as time series data | N/A
|
||||||
date_format | string | The date and time format for the time_field field | "yyyy-MM-ddHH:mm:ss"
|
date_format | string | The date and time format for the time_field field | "yyyy-MM-ddHH:mm:ss"
|
||||||
time_zone | string | The time zone for the time_field field | "UTC"
|
time_zone | string | The time zone for the time_field field | "UTC"
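The `shingle_size` row describes a shingle as a consecutive sequence of the most recent records. A minimal sketch of that idea follows, assuming a plain list of values. The `shingles` helper is hypothetical and does not reflect how RCF builds shingles internally.

```python
def shingles(values, shingle_size=8):
    """Return every window of `shingle_size` consecutive records, oldest first."""
    return [
        values[i:i + shingle_size]
        for i in range(len(values) - shingle_size + 1)
    ]


readings = [1, 2, 3, 4, 5]
windows = shingles(readings, shingle_size=3)
print(windows)
```

Each window overlaps the previous one by all but one record, so a series shorter than `shingle_size` yields no windows at all.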
|
||||||
|
|
||||||
|
@ -194,27 +194,122 @@ time_zone | string | The time zone for the time_field field | "UTC"
|
||||||
|
|
||||||
### Limitations
|
### Limitations
|
||||||
|
|
||||||
For FIT RCF, you can train the model with historical data, and store the trained model in your index. The model will be deserialized and predict new data points when using the Predict API. However, the model in the index will not be refreshed with new data, because the model is "Fixed In Time".
|
For FIT RCF, you can train the model with historical data and store the trained model in your index. The model will be deserialized and predict new data points when using the Predict API. However, the model in the index will not be refreshed with new data, because the model is fixed in time.
|
||||||
|
|
||||||
## Localization
|
## Anomaly Localization
|
||||||
|
|
||||||
The Localization algorithm finds subset level information for aggregate data (for example, aggregated over time) that demonstrates the activity of interest, such as spikes, drops, changes or anomalies. Localization can be applied in different scenarios, such as data exploration or root cause analysis, to expose the contributors driving the activity of interest in the aggregate data.
|
The Anomaly Localization algorithm finds subset-level information for aggregate data (for example, aggregated over time) that demonstrates the activity of interest, such as spikes, drops, changes, or anomalies. Localization can be applied in different scenarios, such as data exploration or root cause analysis, to expose the contributors driving the activity of interest in the aggregate data.
|
||||||
|
|
||||||
### Parameters
|
### Parameters
|
||||||
|
|
||||||
Parameter | Type | Description | Default Value
|
Parameter | Type | Description | Default Value
|
||||||
:--- | :--- | :--- | :---
|
:--- | :--- | :--- | :---
|
||||||
index_name | String | The data collection to analyze | N/A
|
index_name | String | The data collection to analyze | N/A
|
||||||
attribute_field_names | List<String> | The fields for entity kets | N/A
|
attribute_field_names | List<String> | The fields for entity keys | N/A
|
||||||
aggregations | List<AggregationBuilder> | The fields and aggregation for values | N/A
|
aggregations | List<AggregationBuilder> | The fields and aggregation for values | N/A
|
||||||
time_field_name | String | The timestamp field | null
|
time_field_name | String | The timestamp field | null
|
||||||
start_time | Long | The beginning of the time range | 0
|
start_time | Long | The beginning of the time range | 0
|
||||||
end_time | Long | The end of time range | 0
|
end_time | Long | The end of the time range | 0
|
||||||
min_time_interval | Long | The minimal time interval/scale for analysis | 0
|
min_time_interval | Long | The minimum time interval/scale for analysis | 0
|
||||||
num_outputs | integer | The maximum number of values from localization/slicing | 0
|
num_outputs | integer | The maximum number of values from localization/slicing | 0
|
||||||
filter_query | Long | (Optional) Reduces the collection of data for analysis | Optional.empty()
|
filter_query | Long | (Optional) Reduces the collection of data for analysis | Optional.empty()
|
||||||
anomaly_star | QueryBuilder | (Optional) The time from which after the data will be analyzed | Optional.empty()
|
anomaly_star | QueryBuilder | (Optional) The time after which the data will be analyzed | Optional.empty()
|
||||||
|
|
||||||
|
### Example
|
||||||
|
|
||||||
|
The following example executes Anomaly Localization against an RCA index. The API responds with 10 aggregations and gives the sum of the contribution and base values per aggregation, every time the algorithm executes in the specified time interval.
|
||||||
|
|
||||||
|
**Request**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POST /_plugins/_ml/_execute/anomaly_localization
|
||||||
|
{
|
||||||
|
"index_name": "rca-index",
|
||||||
|
"attribute_field_names": [
|
||||||
|
"attribute"
|
||||||
|
],
|
||||||
|
"aggregations": [
|
||||||
|
{
|
||||||
|
"sum": {
|
||||||
|
"sum": {
|
||||||
|
"field": "value"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"time_field_name": "timestamp",
|
||||||
|
"start_time": 1620630000000,
|
||||||
|
"end_time": 1621234800000,
|
||||||
|
"min_time_interval": 86400000,
|
||||||
|
"num_outputs": 10
|
||||||
|
}
|
||||||
|
```
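The request above covers a seven-day range (`start_time` to `end_time`) at a one-day `min_time_interval`. As a quick check of that arithmetic, the sketch below splits the range into consecutive intervals. The `time_buckets` helper is hypothetical, and the way the plugin actually aligns its buckets is an assumption.

```python
def time_buckets(start_time, end_time, min_time_interval):
    """Split [start_time, end_time) into consecutive intervals (epoch milliseconds)."""
    buckets = []
    t = start_time
    while t < end_time:
        buckets.append((t, min(t + min_time_interval, end_time)))
        t += min_time_interval
    return buckets


buckets = time_buckets(1620630000000, 1621234800000, 86400000)
print(len(buckets))
print(buckets[0])
```

Under this assumption the range yields seven one-day intervals, the first running from `1620630000000` to `1620716400000`.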
|
||||||
|
|
||||||
|
**Response**
|
||||||
|
|
||||||
|
The API responds with the sum of the contribution and base values per aggregation, every time the algorithm executes in the specified time interval.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"results" : [
|
||||||
|
{
|
||||||
|
"name" : "sum",
|
||||||
|
"result" : {
|
||||||
|
"buckets" : [
|
||||||
|
{
|
||||||
|
"start_time" : 1620630000000,
|
||||||
|
"end_time" : 1620716400000,
|
||||||
|
"overall_aggregate_value" : 65.0
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"start_time" : 1620716400000,
|
||||||
|
"end_time" : 1620802800000,
|
||||||
|
"overall_aggregate_value" : 75.0,
|
||||||
|
"entities" : [
|
||||||
|
{
|
||||||
|
"key" : [
|
||||||
|
"attr0"
|
||||||
|
],
|
||||||
|
"contribution_value" : 1.0,
|
||||||
|
"base_value" : 2.0,
|
||||||
|
"new_value" : 3.0
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"key" : [
|
||||||
|
"attr1"
|
||||||
|
],
|
||||||
|
"contribution_value" : 1.0,
|
||||||
|
"base_value" : 3.0,
|
||||||
|
"new_value" : 4.0
|
||||||
|
},
|
||||||
|
{
|
||||||
|
...
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"key" : [
|
||||||
|
"attr8"
|
||||||
|
],
|
||||||
|
"contribution_value" : 6.0,
|
||||||
|
"base_value" : 10.0,
|
||||||
|
"new_value" : 16.0
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"key" : [
|
||||||
|
"attr9"
|
||||||
|
],
|
||||||
|
"contribution_value" : 6.0,
|
||||||
|
"base_value" : 11.0,
|
||||||
|
"new_value" : 17.0
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
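In the sample response above, every `contribution_value` shown equals `new_value` minus `base_value`. The sketch below checks that relationship for the four entities that are visible. Treat it as an observation about this sample only. Whether the algorithm always computes contributions this way is an assumption, and the `contribution` helper is hypothetical.

```python
# Entities copied from the sample response; the elided entries are omitted.
entities = [
    {"key": ["attr0"], "contribution_value": 1.0, "base_value": 2.0, "new_value": 3.0},
    {"key": ["attr1"], "contribution_value": 1.0, "base_value": 3.0, "new_value": 4.0},
    {"key": ["attr8"], "contribution_value": 6.0, "base_value": 10.0, "new_value": 16.0},
    {"key": ["attr9"], "contribution_value": 6.0, "base_value": 11.0, "new_value": 17.0},
]


def contribution(entity):
    """Change of an entity's value relative to its base value."""
    return entity["new_value"] - entity["base_value"]


matches = [contribution(e) == e["contribution_value"] for e in entities]
print(all(matches))
```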
|
||||||
|
|
||||||
### Limitations
|
### Limitations
|
||||||
|
|
||||||
The localization algorithm can only be executed directly. Therefore, it cannot be used with the ML Common's train and predicts APIs.
|
The Localization algorithm can only be executed directly. Therefore, it cannot be used with the ML Commons Train and Predict APIs.
|
|
@ -232,7 +232,7 @@ The API returns the following:
|
||||||
|
|
||||||
## Predict
|
## Predict
|
||||||
|
|
||||||
ML Commons can predict new data with your trained model either from indexed data or a data frame. The model_id is required to use the Predict API.
|
ML Commons can predict new data with your trained model either from indexed data or a data frame. To use the Predict API, the model_id is required.
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST /_plugins/_ml/_predict/<algorithm_name>/<model_id>
|
POST /_plugins/_ml/_predict/<algorithm_name>/<model_id>
|
||||||
|
|
|
@ -10,7 +10,7 @@ has_toc: false
|
||||||
|
|
||||||
ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features.
|
ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features.
|
||||||
|
|
||||||
Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands.
|
Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) Piped Processing Language (PPL) commands.
|
||||||
|
|
||||||
Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely.
|
Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely.
|
||||||
|
|
||||||
|
|