From f78b5f612d14845567aab4d1350fca8fbf4c52c9 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Thu, 17 Mar 2022 12:02:29 -0500 Subject: [PATCH 01/15] Add ML commons Plugin section Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 19 +++++++ _ml-commons-plugin/index.md | 31 ++++++++++ _observability-plugin/ppl/commands.md | 81 +++++++++++++++++++++++++++ 3 files changed, 131 insertions(+) create mode 100644 _ml-commons-plugin/api.md create mode 100644 _ml-commons-plugin/index.md diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md new file mode 100644 index 00000000..65a74a03 --- /dev/null +++ b/_ml-commons-plugin/api.md @@ -0,0 +1,19 @@ +--- +layout: default +title: API +has_children: false +nav_order: 90 +--- + +The Machine Learning (ML) commons API lets you create, train, and store machine learning algorithms both synchronously and asynchronously. + +In order to train tasks through the API, three inputs are required. + +- Algorithm name: Usually `FunctionaName`. This determines what algorithm the ML Engine runs. +- Model hyper parameters: Adjust these parameters to make the model train better. You can also implement `MLAgoParamas` to build custom parameters for each model. +- Input data: The data input that teaches the ML model. To input data, query against your index or use data frame. + +## Train model + + + diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md new file mode 100644 index 00000000..bd945232 --- /dev/null +++ b/_ml-commons-plugin/index.md @@ -0,0 +1,31 @@ +--- +layout: default +title: About ML Commons +nav_order: 38 +has_children: false +--- + +# ML Commons plugin + +ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. 
This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features. + +Models trained through the ML Commons plugin support two types of algorithms. + +- Model-based algorithms such Kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported +- No-model based algorithm such as RCA. These algorithms can be executed directly through an `Executable` interface. + +Interaction with the ML commons plugin occurs through either the [REST API] or AD and Kmeans PPL commands. + +## Permissions + +There are two user roles that can make use of the ML commons plugin. + +- `ml_full_access`: Full access to all ML features, including starting new jobs and reading or deleting models. +- `ml_readonly_access`: Can only read trained models and statistics relevant to the model's cluster. Cannot start jobs or delete models. + +## Quickstart + + + + + diff --git a/_observability-plugin/ppl/commands.md b/_observability-plugin/ppl/commands.md index 7b5725b8..7d6d2553 100644 --- a/_observability-plugin/ppl/commands.md +++ b/_observability-plugin/ppl/commands.md @@ -732,3 +732,84 @@ PPL query: ```ppl search source=my_index | where match(message, "this is a test", operator=and, zero_terms_query=all) ``` + +## ad + +The `ad` command applies Random Cut Forest (RCF) algorithm in ml-commons plugin on the search result returned by a PPL command.Based on the input, two types of RCF algorithms will be utilized: fixed in time RCF for processing time-series data, batch RCF for processing non-time-series data. + +### Fixed In Time RCF For Time-series Data Command Syntax + +```sql +ad \ \ \ +``` + +Field | Description | Required +:--- | :--- |:--- +`shingle\_size` | A consecutive sequence of the most recent records. The default value is 8. | No +`time\_decay` | Specifies how much of the recent past to consider when computing an anomaly score. 
The default value is 0.001. | No +`time\_field` | Specifies the time filed for RCF to use as time-series data. | Yes + +### Batch RCF for Non-time-series Data Command Syntax + +```sql +ad \ \ +``` + +Field | Description | Required +:--- | :--- |:--- +`shingle\_size` | A consecutive sequence of the most recent records. The default value is 8. | No +`time\_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No + +*Example 1*: Detecting events in New York City from taxi ridership data with time-series data + +The example trains a RCF model and use the model to detect anomalies in the time-series ridership data. + +PPL query: + +```sql +os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0' +``` + +value | timestamp | score | anomaly_grade +:--- | :--- |:--- | :--- +10844.0 | 1404172800000 | 0.0 | 0.0 + +*Example 2*: Detecting events in New York City from taxi ridership data with non-time-series data + +PPL query: + +```sql +os> source=nyc_taxi | fields value | AD | where value=10844.0' +``` + +value | score | anomalous +:--- | :--- |:--- +| 10844.0 | 0.0 | false + +## kmeans + +The kmeans command applies kmeans algorithm in ml-commons plugin on the search result returned by a PPL command. + +## Syntax + +```sql +kmeans +``` + +For `cluster-number`, enter the number of clusters you want to group your data points into. + +*Example* + +The example shows how to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals. 
+ +PPL query: + +```sql +os> source=iris_data | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3 +``` + +sepal_length_in_cm | sepal_width_in_cm | petal_length_in_cm | petal_width_in_cm | ClusterID +:--- | :--- |:--- | :--- | :--- +| 5.1 | 3.5 | 1.4 | 0.2 | 1 +| 5.6 | 3.0 | 4.1 | 1.3 | 0 +| 6.7 | 2.5 | 5.8 | 1.8 | 2 From 3b79db7383fb08a056afb4ecb98e5856000cdd1b Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Thu, 17 Mar 2022 17:18:15 -0500 Subject: [PATCH 02/15] Add API endpoints Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 463 +++++++++++++++++++++++++++++++++++++- 1 file changed, 462 insertions(+), 1 deletion(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 65a74a03..a472fdaa 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -2,9 +2,11 @@ layout: default title: API has_children: false -nav_order: 90 +nav_order: 99 --- +# ML Commons API + The Machine Learning (ML) commons API lets you create, train, and store machine learning algorithms both synchronously and asynchronously. In order to train tasks through the API, three inputs are required. @@ -15,5 +17,464 @@ In order to train tasks through the API, three inputs are required. ## Train model +Training can occur both synchronously and asynchronously. + +### Request + +The following examples use the kmeans algorithm to train index data. 
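The two kmeans training requests that follow differ only in the `async=true` query parameter. The sketch below builds the same request with Python's standard-library `urllib` but does not send it. It assumes a cluster reachable at `http://localhost:9200` with no authentication. `build_train_request` is a hypothetical helper and is not part of ML Commons. The body mirrors the `parameters`, `input_query`, and `input_index` fields used in this section.

```python
import json
import urllib.request


def build_train_request(base_url, algorithm, parameters, input_index,
                        source_fields, size=10000, async_mode=False):
    """Build (without sending) a POST request for the ML Commons train endpoint.

    Hypothetical helper; the body mirrors the kmeans examples in this section.
    """
    url = f"{base_url}/_plugins/_ml/_train/{algorithm}"
    if async_mode:
        # Asynchronous training responds with a task_id instead of a model_id.
        url += "?async=true"
    body = {
        "parameters": parameters,
        "input_query": {"_source": source_fields, "size": size},
        "input_index": input_index,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_train_request(
    "http://localhost:9200",  # assumed cluster address, no auth configured
    "kmeans",
    {"centroids": 3, "iterations": 10, "distance_type": "COSINE"},
    ["iris_data"],
    ["petal_length_in_cm", "petal_width_in_cm"],
    async_mode=True,
)
print(req.get_method(), req.full_url)
```

You can pass the request to `urllib.request.urlopen(req)` against a running cluster. Any credentials or TLS settings your cluster requires must be added, because this sketch omits them. A synchronous call should return a `model_id`, and an asynchronous call should return a `task_id`, as the Response subsection describes.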
+ +**Train with kmeans synchronously** + +```json +POST /_plugins/_ml/_train/kmeans +{ + "parameters": { + "centroids": 3, + "iterations": 10, + "distance_type": "COSINE" + }, + "input_query": { + "_source": ["petal_length_in_cm", "petal_width_in_cm"], + "size": 10000 + }, + "input_index": [ + "iris_data" + ] +} +``` + +**Train with kmeans asynchronously** + +```json +POST /_plugins/_ml/_train/kmeans?async=true +{ + "parameters": { + "centroids": 3, + "iterations": 10, + "distance_type": "COSINE" + }, + "input_query": { + "_source": ["petal_length_in_cm", "petal_width_in_cm"], + "size": 10000 + }, + "input_index": [ + "iris_data" + ] +} +``` + +### Response + +**Synchronously** + +For synchronous responses, the API returns the model_id, which can be used to get info or modify the model. + +```json +{ + "model_id" : "lblVmX8BO5w8y8RaYYvN", + "status" : "COMPLETED" +} +``` + +**Asynchronously** + +For asynchronous responses, the API returns the task_id, which can be used to get info or modify a task. + +```json +{ + "task_id" : "lrlamX8BO5w8y8Ra2otd", + "status" : "CREATED" +} +``` + +## Get model information + +You can retrieve information on your model using the model_id. + +### Request + +```json +GET /_plugins/_ml/models/ +``` + +### Response + +The API returns information on the model, the algorithm used, and the content found within the model. + +```json +{ + "name" : "KMEANS", + "algorithm" : "KMEANS", + "version" : 1, + "content" : "" +} +``` + +## Get task information + +You can retrieve information about a task using the task_id. + +### Request + +```json +GET /_plugins/_ml/tasks/ +``` + +### Response + +The response includes information about the task. 
+ +```json +{ + "model_id" : "l7lamX8BO5w8y8Ra2oty", + "task_type" : "TRAINING", + "function_name" : "KMEANS", + "state" : "COMPLETED", + "input_type" : "SEARCH_QUERY", + "worker_node" : "54xOe0w8Qjyze00UuLDfdA", + "create_time" : 1647545342556, + "last_update_time" : 1647545342587, + "is_async" : true +} +``` + +## Predict + +Should you trained a synchronous, ML commons can predict new data with your trained model either from indexed data or a data frame. + +```json +POST /_plugins/_ml/_predict// +``` + +### Request + +```json +POST /_plugins/_ml/_predict/kmeans/eQlomX8Br-2Nu7fWjUu3 +{ + "input_query": { + "_source": ["petal_length_in_cm", "petal_width_in_cm"], + "size": 10000 + }, + "input_index": [ + "iris_data" + ] +} +``` + +### Response + +```json +{ + "status" : "COMPLETED", + "prediction_result" : { + "column_metas" : [ + { + "name" : "ClusterID", + "column_type" : "INTEGER" + } + ], + "rows" : [ + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 1 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 1 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + } + ] + } +``` + + + +## Search model + +```json +POST /_plugins/_ml/models/_search +{query} +``` + + +### Example 1: Query all models + +```json +POST /_plugins/_ml/models/_search +{ + "query": { + "match_all": {} + }, + "size": 1000 +} +``` + +### Example 2: query models with algorithm "BATCh_RCF" + +```json +POST /_plugins/_ml/models/_search +{ + "query": { + "term": { + "algorithm": { + "value": "BATCH_RCF" + } + } + } +} +``` + +### Response + +```json +{ + "took" : 8, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + 
"total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : 2.4159138, + "hits" : [ + { + "_index" : ".plugins-ml-model", + "_type" : "_doc", + "_id" : "-QkKJX8BvytMh9aUeuLD", + "_version" : 1, + "_seq_no" : 12, + "_primary_term" : 15, + "_score" : 2.4159138, + "_source" : { + "name" : "FIT_RCF", + "version" : 1, + "content" : "xxx", + "algorithm" : "FIT_RCF" + } + }, + { + "_index" : ".plugins-ml-model", + "_type" : "_doc", + "_id" : "OxkvHn8BNJ65KnIpck8x", + "_version" : 1, + "_seq_no" : 2, + "_primary_term" : 8, + "_score" : 2.4159138, + "_source" : { + "name" : "FIT_RCF", + "version" : 1, + "content" : "xxx", + "algorithm" : "FIT_RCF" + } + } + ] + } + } +``` + +## Delete task + +```json +DELETE /_plugins/_ml/tasks/{task_id} +``` + +### Response + +```json +{ + "_index" : ".plugins-ml-task", + "_type" : "_doc", + "_id" : "xQRYLX8BydmmU1x6nuD3", + "_version" : 4, + "result" : "deleted", + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 42, + "_primary_term" : 7 +} +``` + +## Search task + +```json +GET /_plugins/_ml/tasks/_search +{query body} +``` + + +### Example: search task which "function_name" is "KMEANS" + +```json +GET /_plugins/_ml/tasks/_search +{ + "query": { + "bool": { + "filter": [ + { + "term": { + "function_name": "KMEANS" + } + } + ] + } + } +} +``` + +```json +{ + "took" : 12, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : 0.0, + "hits" : [ + { + "_index" : ".plugins-ml-task", + "_type" : "_doc", + "_id" : "_wnLJ38BvytMh9aUi-Ia", + "_version" : 4, + "_seq_no" : 29, + "_primary_term" : 4, + "_score" : 0.0, + "_source" : { + "last_update_time" : 1645640125267, + "create_time" : 1645640125209, + "is_async" : true, + "function_name" : "KMEANS", + "input_type" : "SEARCH_QUERY", + "worker_node" : "jjqFrlW7QWmni1tRnb_7Dg", + "state" : "COMPLETED", + "model_id" : 
"AAnLJ38BvytMh9aUi-M2", + "task_type" : "TRAINING" + } + }, + { + "_index" : ".plugins-ml-task", + "_type" : "_doc", + "_id" : "wwRRLX8BydmmU1x6I-AI", + "_version" : 3, + "_seq_no" : 38, + "_primary_term" : 7, + "_score" : 0.0, + "_source" : { + "last_update_time" : 1645732766656, + "create_time" : 1645732766472, + "is_async" : true, + "function_name" : "KMEANS", + "input_type" : "SEARCH_QUERY", + "worker_node" : "A_IiqoloTDK01uZvCjREaA", + "state" : "COMPLETED", + "model_id" : "xARRLX8BydmmU1x6I-CG", + "task_type" : "TRAINING" + } + } + ] + } +} +``` + +Stats + +```json +GET /_plugins/_ml/stats +GET /_plugins/_ml//stats/ +GET /_plugins/_ml//stats/ +GET /_plugins/_ml/stats/ +``` + + +### Example1: get all stats + +```json +GET /_plugins/_ml/stats +``` + +### Response + +```json +{ + "zbduvgCCSOeu6cfbQhTpnQ" : { + "ml_executing_task_count" : 0 + }, + "54xOe0w8Qjyze00UuLDfdA" : { + "ml_executing_task_count" : 0 + }, + "UJiykI7bTKiCpR-rqLYHyw" : { + "ml_executing_task_count" : 0 + }, + "zj2_NgIbTP-StNlGZJlxdg" : { + "ml_executing_task_count" : 0 + }, + "jjqFrlW7QWmni1tRnb_7Dg" : { + "ml_executing_task_count" : 0 + }, + "3pSSjl5PSVqzv5-hBdFqyA" : { + "ml_executing_task_count" : 0 + }, + "A_IiqoloTDK01uZvCjREaA" : { + "ml_executing_task_count" : 0 + } +} +``` + + From 7dd37280ae6a19c934474db2198b4ba1b17441e9 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Fri, 18 Mar 2022 11:25:19 -0500 Subject: [PATCH 03/15] Add all ML commons API endpoints and responses Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 503 +++++++++++++++++++++++++++++++++++++- 1 file changed, 493 insertions(+), 10 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index a472fdaa..ca20799e 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -91,8 +91,6 @@ For asynchronous responses, the API returns the task_id, which can be used to ge You can retrieve information on your model using the model_id. 
-### Request - ```json GET /_plugins/_ml/models/ ``` @@ -114,8 +112,6 @@ The API returns information on the model, the algorithm used, and the content fo You can retrieve information about a task using the task_id. -### Request - ```json GET /_plugins/_ml/tasks/ ``` @@ -140,7 +136,7 @@ The response includes information about the task. ## Predict -Should you trained a synchronous, ML commons can predict new data with your trained model either from indexed data or a data frame. +ML commons can predict new data with your trained model either from indexed data or a data frame. ```json POST /_plugins/_ml/_predict// @@ -149,7 +145,7 @@ POST /_plugins/_ml/_predict// ### Request ```json -POST /_plugins/_ml/_predict/kmeans/eQlomX8Br-2Nu7fWjUu3 +POST /_plugins/_ml/_predict/kmeans/ { "input_query": { "_source": ["petal_length_in_cm", "petal_width_in_cm"], @@ -227,9 +223,473 @@ POST /_plugins/_ml/_predict/kmeans/eQlomX8Br-2Nu7fWjUu3 ``` +## Train and Predict + +Use to train and then immediately predict against the same training data set. Can only be used with synchronous models and the kmeans algorithm. 
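The train-and-predict examples that follow accept either an index query or an inline data frame. Writing the data frame by hand is verbose. The sketch below converts plain Python rows into the `column_metas`/`rows` shape of the `input_data` field. It also reads a named column, such as `ClusterID`, back out of a `prediction_result`. `to_data_frame` and `column_values` are hypothetical helpers. Every input column is assumed to be `DOUBLE`, because that is the only type the direct-data example uses. The sample result is the first three rows of the Predict response shown earlier in this patch.

```python
import json


def to_data_frame(column_names, rows):
    """Convert plain numeric rows into the `input_data` data-frame shape.

    Hypothetical helper: assumes every column is DOUBLE.
    """
    return {
        "column_metas": [
            {"name": name, "column_type": "DOUBLE"} for name in column_names
        ],
        "rows": [
            {"values": [{"column_type": "DOUBLE", "value": float(v)} for v in row]}
            for row in rows
        ],
    }


def column_values(prediction_result, column_name):
    """Return every value of one named column from a prediction_result."""
    names = [meta["name"] for meta in prediction_result["column_metas"]]
    index = names.index(column_name)  # raises ValueError if the column is absent
    return [row["values"][index]["value"] for row in prediction_result["rows"]]


# The six points from the "Train and predict with data directly" example.
frame = to_data_frame(
    ["k1", "k2"],
    [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]],
)
request_body = {
    "parameters": {"centroids": 2, "iterations": 1, "distance_type": "EUCLIDEAN"},
    "input_data": frame,
}
payload = json.dumps(request_body)  # body for POST /_plugins/_ml/_train_predict/kmeans

# First three rows of the Predict response shown earlier.
sample_result = {
    "column_metas": [{"name": "ClusterID", "column_type": "INTEGER"}],
    "rows": [
        {"values": [{"column_type": "INTEGER", "value": 1}]},
        {"values": [{"column_type": "INTEGER", "value": 1}]},
        {"values": [{"column_type": "INTEGER", "value": 0}]},
    ],
}
print(column_values(sample_result, "ClusterID"))
```

The helper looks up the column by name instead of assuming position 0. This keeps it usable when a response carries several columns, such as `score`, `anomaly_grade`, and `timestamp` for RCF results.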
+ +### Example: Train and predict with Indexed data + + +```json +POST /_plugins/_ml/_train_predict/kmeans +{ + "parameters": { + "centroids": 2, + "iterations": 10, + "distance_type": "COSINE" + }, + "input_query": { + "query": { + "bool": { + "filter": [ + { + "range": { + "k1": { + "gte": 0 + } + } + } + ] + } + }, + "size": 10 + }, + "input_index": [ + "test_data" + ] +} +``` + +### Example: Train and predict with data directly + +```json +POST /_plugins/_ml/_train_predict/kmeans +{ + "parameters": { + "centroids": 2, + "iterations": 1, + "distance_type": "EUCLIDEAN" + }, + "input_data": { + "column_metas": [ + { + "name": "k1", + "column_type": "DOUBLE" + }, + { + "name": "k2", + "column_type": "DOUBLE" + } + ], + "rows": [ + { + "values": [ + { + "column_type": "DOUBLE", + "value": 1.00 + }, + { + "column_type": "DOUBLE", + "value": 2.00 + } + ] + }, + { + "values": [ + { + "column_type": "DOUBLE", + "value": 1.00 + }, + { + "column_type": "DOUBLE", + "value": 4.00 + } + ] + }, + { + "values": [ + { + "column_type": "DOUBLE", + "value": 1.00 + }, + { + "column_type": "DOUBLE", + "value": 0.00 + } + ] + }, + { + "values": [ + { + "column_type": "DOUBLE", + "value": 10.00 + }, + { + "column_type": "DOUBLE", + "value": 2.00 + } + ] + }, + { + "values": [ + { + "column_type": "DOUBLE", + "value": 10.00 + }, + { + "column_type": "DOUBLE", + "value": 4.00 + } + ] + }, + { + "values": [ + { + "column_type": "DOUBLE", + "value": 10.00 + }, + { + "column_type": "DOUBLE", + "value": 0.00 + } + ] + } + ] + } +} +``` + + +### Response + +**Response for index data** + +```json +{ + "status" : "COMPLETED", + "prediction_result" : { + "column_metas" : [ + { + "name" : "ClusterID", + "column_type" : "INTEGER" + } + ], + "rows" : [ + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + 
"values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + }, + { + "values" : [ + { + "column_type" : "INTEGER", + "value" : 0 + } + ] + } + ] + } +} +``` + +**Response for data input directly** + +```json +{ + "status" : "COMPLETED", + "prediction_result" : { + "column_metas" : [ + { + "name" : "score", + "column_type" : "DOUBLE" + }, + { + "name" : "anomaly_grade", + "column_type" : "DOUBLE" + }, + { + "name" : "timestamp", + "column_type" : "LONG" + } + ], + "rows" : [ + { + "values" : [ + { + "column_type" : "DOUBLE", + "value" : 0.0 + }, + { + "column_type" : "DOUBLE", + "value" : 0.0 + }, + { + "column_type" : "LONG", + "value" : 1404187200000 + } + ] + }, + ... + ] + } +} +``` + +## Execute + +Use the Execute API to run no-model-based algorithms. You do not need to train a model in order to receive results from your chosen algorithm. + +```json +POST _plugins/_ml/_execute/ +``` + +### Example: Execute sample calculator, supported "operation": max/min/sum + +```json +POST _plugins/_ml/_execute/local_sample_calculator +{ + "operation": "max", + "input_data": [1.0, 2.0, 3.0] +} +``` + + +### Example: Execute anomaly localization + +```json +POST /_plugins/_ml/_execute/anomaly_localization +{ + "index_name": "rca-index", + "attribute_field_names": [ + "attribute" + ], + "aggregations": [ + { + "sum": { + "sum": { + "field": "value" + } + } + } + ], + "time_field_name": "timestamp", + "start_time": 1620630000000, + "end_time": 1621234800000, + "min_time_interval": 86400000, + "num_outputs": 2 +} +``` + +### Response + +**Sample calculator response** + +```json +{ + "sample_result" : 3.0 +} +``` + +**Sample anomaly response** + +```json +{ + "results" : [ + { + "name" : "sum", + "result" : { + "buckets" : [ + { + "start_time" : 1620630000000, + "end_time" : 1620716400000, + "overall_aggregate_value" : 65.0 + }, + { + "start_time" : 1620716400000, + "end_time" : 
1620802800000, + "overall_aggregate_value" : 75.0, + "entities" : [ + { + "key" : [ + "attr0" + ], + "contribution_value" : 1.0, + "base_value" : 2.0, + "new_value" : 3.0 + }, + { + "key" : [ + "attr1" + ], + "contribution_value" : 1.0, + "base_value" : 3.0, + "new_value" : 4.0 + } + ] + }, + { + "start_time" : 1620802800000, + "end_time" : 1620889200000, + "overall_aggregate_value" : 85.0, + "entities" : [ + { + "key" : [ + "attr0" + ], + "contribution_value" : 2.0, + "base_value" : 2.0, + "new_value" : 4.0 + }, + { + "key" : [ + "attr1" + ], + "contribution_value" : 2.0, + "base_value" : 3.0, + "new_value" : 5.0 + } + ] + }, + { + "start_time" : 1620889200000, + "end_time" : 1620975600000, + "overall_aggregate_value" : 95.0, + "entities" : [ + { + "key" : [ + "attr0" + ], + "contribution_value" : 3.0, + "base_value" : 2.0, + "new_value" : 5.0 + }, + { + "key" : [ + "attr1" + ], + "contribution_value" : 3.0, + "base_value" : 3.0, + "new_value" : 6.0 + } + ] + }, + { + "start_time" : 1620975600000, + "end_time" : 1621062000000, + "overall_aggregate_value" : 105.0, + "entities" : [ + { + "key" : [ + "attr0" + ], + "contribution_value" : 4.0, + "base_value" : 2.0, + "new_value" : 6.0 + }, + { + "key" : [ + "attr1" + ], + "contribution_value" : 4.0, + "base_value" : 3.0, + "new_value" : 7.0 + } + ] + }, + { + "start_time" : 1621062000000, + "end_time" : 1621148400000, + "overall_aggregate_value" : 115.0, + "entities" : [ + { + "key" : [ + "attr0" + ], + "contribution_value" : 5.0, + "base_value" : 2.0, + "new_value" : 7.0 + }, + { + "key" : [ + "attr1" + ], + "contribution_value" : 5.0, + "base_value" : 3.0, + "new_value" : 8.0 + } + ] + }, + { + "start_time" : 1621148400000, + "end_time" : 1621234800000, + "overall_aggregate_value" : 125.0, + "entities" : [ + { + "key" : [ + "attr0" + ], + "contribution_value" : 6.0, + "base_value" : 2.0, + "new_value" : 8.0 + }, + { + "key" : [ + "attr1" + ], + "contribution_value" : 6.0, + "base_value" : 3.0, + "new_value" : 9.0 + 
} + ] + } + ] + } + } + ] +} +``` ## Search model +Use this command to search models you're already created. + + ```json POST /_plugins/_ml/models/_search {query} @@ -248,7 +708,7 @@ POST /_plugins/_ml/models/_search } ``` -### Example 2: query models with algorithm "BATCh_RCF" +### Example 2: Query models with algorithm "BATCh_RCF" ```json POST /_plugins/_ml/models/_search @@ -319,6 +779,8 @@ POST /_plugins/_ml/models/_search ## Delete task +Delete a task based on the task_id. + ```json DELETE /_plugins/_ml/tasks/{task_id} ``` @@ -344,13 +806,15 @@ DELETE /_plugins/_ml/tasks/{task_id} ## Search task +Search tasks based on parameters indicated in the request body. + ```json GET /_plugins/_ml/tasks/_search {query body} ``` -### Example: search task which "function_name" is "KMEANS" +### Example: Search task which "function_name" is "KMEANS" ```json GET /_plugins/_ml/tasks/_search @@ -431,17 +895,36 @@ GET /_plugins/_ml/tasks/_search } ``` -Stats +## Stats + +Get statistics related to the number of tasks. + +To receive all stats, use: ```json GET /_plugins/_ml/stats +``` + +To receive stats for a specific node, use: + +```json GET /_plugins/_ml//stats/ +``` + +To receive starts for a specific node and return a specified stat, use: + +```json GET /_plugins/_ml//stats/ +``` + +To receive information on a specific stat from all nodes, use: + +```json GET /_plugins/_ml/stats/ ``` -### Example1: get all stats +### Example: Get all stats ```json GET /_plugins/_ml/stats From aa7356a787e56e2080f7f4103ee034f026549ede Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Fri, 18 Mar 2022 11:45:21 -0500 Subject: [PATCH 04/15] Add links to API and PPl. Add to site. 
Signed-off-by: Naarcha-AWS --- _config.yml | 3 +++ _ml-commons-plugin/api.md | 9 +++++++++ _ml-commons-plugin/index.md | 11 +++++++---- index.md | 2 ++ 4 files changed, 21 insertions(+), 4 deletions(-) diff --git a/_config.yml b/_config.yml index 2d609a75..ddf65095 100644 --- a/_config.yml +++ b/_config.yml @@ -94,6 +94,9 @@ just_the_docs: observability-plugin: name: Observability plugin nav_fold: true + ml-commons-plugin: + name: ML Commons plugin + nav_fold: true monitoring-plugins: name: Monitoring plugins nav_fold: true diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index ca20799e..2db6d6b6 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -7,6 +7,15 @@ nav_order: 99 # ML Commons API +--- + +#### Table of contents +- TOC +{:toc} + + +--- + The Machine Learning (ML) commons API lets you create, train, and store machine learning algorithms both synchronously and asynchronously. In order to train tasks through the API, three inputs are required. diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index bd945232..d2c11a6d 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -3,18 +3,19 @@ layout: default title: About ML Commons nav_order: 38 has_children: false +has_toc: false --- # ML Commons plugin ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features. -Models trained through the ML Commons plugin support two types of algorithms. +Models trained through the ML Commons plugin support two types of algorithms: -- Model-based algorithms such Kmeans or Linear Regression. 
To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported +- Model-based algorithms such kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported - No-model based algorithm such as RCA. These algorithms can be executed directly through an `Executable` interface. -Interaction with the ML commons plugin occurs through either the [REST API] or AD and Kmeans PPL commands. +Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands. ## Permissions @@ -23,7 +24,9 @@ There are two user roles that can make use of the ML commons plugin. - `ml_full_access`: Full access to all ML features, including starting new jobs and reading or deleting models. - `ml_readonly_access`: Can only read trained models and statistics relevant to the model's cluster. Cannot start jobs or delete models. 
-## Quickstart + + + diff --git a/index.md b/index.md index 836ea22d..9ffe6cf2 100755 --- a/index.md +++ b/index.md @@ -35,9 +35,11 @@ Component | Purpose [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) | Find “nearest neighbors” in your vector data [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) | Monitor and optimize your cluster [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) | Identify atypical data and receive automatic notifications +[ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/) | Train and execute machine-learning models [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) | Run search requests in the background [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) | Replicate your data across multiple OpenSearch clusters + Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface. For specifics around the project, see the [FAQ](https://opensearch.org/faq/). 
From ebbab2dea06a0b82dd2493990985727d42bb0cd4 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Fri, 18 Mar 2022 12:33:10 -0500 Subject: [PATCH 05/15] Fix link in Index.md Signed-off-by: Naarcha-AWS --- _config.yml | 3 +++ _ml-commons-plugin/api.md | 4 ---- _ml-commons-plugin/index.md | 4 ++-- index.md | 2 +- 4 files changed, 6 insertions(+), 7 deletions(-) diff --git a/_config.yml b/_config.yml index ddf65095..524e659c 100644 --- a/_config.yml +++ b/_config.yml @@ -52,6 +52,9 @@ collections: observability-plugin: permalink: /:collection/:path/ output: true + ml-commons-plugin: + permalink: /:collection/:path/ + output: true monitoring-plugins: permalink: /:collection/:path/ output: true diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 2db6d6b6..90a31f2a 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -104,8 +104,6 @@ You can retrieve information on your model using the model_id. GET /_plugins/_ml/models/ ``` -### Response - The API returns information on the model, the algorithm used, and the content found within the model. ```json @@ -125,8 +123,6 @@ You can retrieve information about a task using the task_id. GET /_plugins/_ml/tasks/ ``` -### Response - The response includes information about the task. 
```json diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index d2c11a6d..ceb9dc85 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -1,7 +1,7 @@ --- layout: default -title: About ML Commons -nav_order: 38 +title: About ML Commons +nav_order: 1 has_children: false has_toc: false --- diff --git a/index.md b/index.md index 9ffe6cf2..68510abb 100755 --- a/index.md +++ b/index.md @@ -35,7 +35,7 @@ Component | Purpose [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) | Find “nearest neighbors” in your vector data [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) | Monitor and optimize your cluster [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) | Identify atypical data and receive automatic notifications -[ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/) | Train and execute machine-learning models +[ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) | Train and execute machine-learning models [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) | Run search requests in the background [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) | Replicate your data across multiple OpenSearch clusters From 7b01859ee917bd72dd3fa09742f26efc69558638 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Sat, 19 Mar 2022 12:37:01 -0500 Subject: [PATCH 06/15] Add review feedback Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 280 ++------------------------ _ml-commons-plugin/index.md | 6 +- _observability-plugin/ppl/commands.md | 18 +- 3 files changed, 23 insertions(+), 281 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 90a31f2a..8a9ab582 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -16,13 +16,13 @@ nav_order: 99 --- -The Machine Learning (ML) commons API lets you create, train, and store machine learning algorithms 
both synchronously and asynchronously. +The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, and then store that model in an ML model index. In order to train tasks through the API, three inputs are required. - Algorithm name: Usually `FunctionaName`. This determines what algorithm the ML Engine runs. -- Model hyper parameters: Adjust these parameters to make the model train better. You can also implement `MLAgoParamas` to build custom parameters for each model. -- Input data: The data input that teaches the ML model. To input data, query against your index or use data frame. +- Model hyper parameters: Adjust these parameters to make the model train better. +- Input data: The data input that trains the ML model, or applies the ML models to predictions. To input data, query against your index or use data frame. ## Train model @@ -76,7 +76,7 @@ POST /_plugins/_ml/_train/kmeans?async=true **Synchronously** -For synchronous responses, the API returns the model_id, which can be used to get info or modify the model. +For synchronous responses, the API returns the model_id, which can be used to get info on the model or modify the model. ```json { @@ -141,7 +141,7 @@ The response includes information about the task. ## Predict -ML commons can predict new data with your trained model either from indexed data or a data frame. +ML commons can predict new data with your trained model either from indexed data or a data frame. The model_id is required to use the Predict API. ```json POST /_plugins/_ml/_predict// @@ -230,7 +230,11 @@ POST /_plugins/_ml/_predict/kmeans/ ## Train and Predict -Use to train and then immediately predict against the same training data set. Can only be used with synchronous models and the kmeans algorithm. +Use to train and then immediately predict against the same training data set. 
Can only be used with synchronous models and the following algorithms: + +- BATCH_RCF +- FIT_RCF +- kmeans ### Example: Train and predict with Indexed data @@ -364,11 +368,8 @@ POST /_plugins/_ml/_train_predict/kmeans } ``` - ### Response -**Response for index data** - ```json { "status" : "COMPLETED", @@ -433,263 +434,6 @@ POST /_plugins/_ml/_train_predict/kmeans } ``` -**Response for data input directly** - -```json -{ - "status" : "COMPLETED", - "prediction_result" : { - "column_metas" : [ - { - "name" : "score", - "column_type" : "DOUBLE" - }, - { - "name" : "anomaly_grade", - "column_type" : "DOUBLE" - }, - { - "name" : "timestamp", - "column_type" : "LONG" - } - ], - "rows" : [ - { - "values" : [ - { - "column_type" : "DOUBLE", - "value" : 0.0 - }, - { - "column_type" : "DOUBLE", - "value" : 0.0 - }, - { - "column_type" : "LONG", - "value" : 1404187200000 - } - ] - }, - ... - ] - } -} -``` - -## Execute - -Use the Execute API to run no-model-based algorithms. You do not need to train a model in order to receive results from your chosen algorithm. 
- -```json -POST _plugins/_ml/_execute/ -``` - -### Example: Execute sample calculator, supported "operation": max/min/sum - -```json -POST _plugins/_ml/_execute/local_sample_calculator -{ - "operation": "max", - "input_data": [1.0, 2.0, 3.0] -} -``` - - -### Example: Execute anomaly localization - -```json -POST /_plugins/_ml/_execute/anomaly_localization -{ - "index_name": "rca-index", - "attribute_field_names": [ - "attribute" - ], - "aggregations": [ - { - "sum": { - "sum": { - "field": "value" - } - } - } - ], - "time_field_name": "timestamp", - "start_time": 1620630000000, - "end_time": 1621234800000, - "min_time_interval": 86400000, - "num_outputs": 2 -} -``` - -### Response - -**Sample calculator response** - -```json -{ - "sample_result" : 3.0 -} -``` - -**Sample anomaly response** - -```json -{ - "results" : [ - { - "name" : "sum", - "result" : { - "buckets" : [ - { - "start_time" : 1620630000000, - "end_time" : 1620716400000, - "overall_aggregate_value" : 65.0 - }, - { - "start_time" : 1620716400000, - "end_time" : 1620802800000, - "overall_aggregate_value" : 75.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 1.0, - "base_value" : 2.0, - "new_value" : 3.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 1.0, - "base_value" : 3.0, - "new_value" : 4.0 - } - ] - }, - { - "start_time" : 1620802800000, - "end_time" : 1620889200000, - "overall_aggregate_value" : 85.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 2.0, - "base_value" : 2.0, - "new_value" : 4.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 2.0, - "base_value" : 3.0, - "new_value" : 5.0 - } - ] - }, - { - "start_time" : 1620889200000, - "end_time" : 1620975600000, - "overall_aggregate_value" : 95.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 3.0, - "base_value" : 2.0, - "new_value" : 5.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 3.0, - "base_value" : 3.0, - 
"new_value" : 6.0 - } - ] - }, - { - "start_time" : 1620975600000, - "end_time" : 1621062000000, - "overall_aggregate_value" : 105.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 4.0, - "base_value" : 2.0, - "new_value" : 6.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 4.0, - "base_value" : 3.0, - "new_value" : 7.0 - } - ] - }, - { - "start_time" : 1621062000000, - "end_time" : 1621148400000, - "overall_aggregate_value" : 115.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 5.0, - "base_value" : 2.0, - "new_value" : 7.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 5.0, - "base_value" : 3.0, - "new_value" : 8.0 - } - ] - }, - { - "start_time" : 1621148400000, - "end_time" : 1621234800000, - "overall_aggregate_value" : 125.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 6.0, - "base_value" : 2.0, - "new_value" : 8.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 6.0, - "base_value" : 3.0, - "new_value" : 9.0 - } - ] - } - ] - } - } - ] -} -``` - ## Search model Use this command to search models you're already created. @@ -713,7 +457,7 @@ POST /_plugins/_ml/models/_search } ``` -### Example 2: Query models with algorithm "BATCh_RCF" +### Example 2: Query models with algorithm "FIT_RCF" ```json POST /_plugins/_ml/models/_search @@ -721,7 +465,7 @@ POST /_plugins/_ml/models/_search "query": { "term": { "algorithm": { - "value": "BATCH_RCF" + "value": "FIT_RCF" } } } diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index ceb9dc85..140e2c7c 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -10,16 +10,14 @@ has_toc: false ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. 
Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features. -Models trained through the ML Commons plugin support two types of algorithms: +Models trained through the ML Commons plugin support model-based algorithms such as kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported for synchronous models. -- Model-based algorithms such kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported -- No-model based algorithm such as RCA. These algorithms can be executed directly through an `Executable` interface. Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands. ## Permissions -There are two user roles that can make use of the ML commons plugin. +There are two reserved user roles that can use of the ML commons plugin. - `ml_full_access`: Full access to all ML features, including starting new jobs and reading or deleting models. - `ml_readonly_access`: Can only read trained models and statistics relevant to the model's cluster. Cannot start jobs or delete models. 
diff --git a/_observability-plugin/ppl/commands.md b/_observability-plugin/ppl/commands.md index 6c8f82db..577d8884 100644 --- a/_observability-plugin/ppl/commands.md +++ b/_observability-plugin/ppl/commands.md @@ -834,30 +834,30 @@ search source=my_index | where match(message, "this is a test", operator=and, ze ## ad -The `ad` command applies Random Cut Forest (RCF) algorithm in ml-commons plugin on the search result returned by a PPL command.Based on the input, two types of RCF algorithms will be utilized: fixed in time RCF for processing time-series data, batch RCF for processing non-time-series data. +The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin on the search result returned by a PPL command. Based on the input, the plugin uses two types of RCF algorithms: fixed in time RCF for processing time-series data and batch RCF for processing non-time-series data. ### Fixed In Time RCF For Time-series Data Command Syntax ```sql -ad \ \ \ +ad \ \ \ ``` Field | Description | Required :--- | :--- |:--- -`shingle\_size` | A consecutive sequence of the most recent records. The default value is 8. | No -`time\_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No -`time\_field` | Specifies the time filed for RCF to use as time-series data. | Yes +`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No +`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No +`time_field` | Specifies the time filed for RCF to use as time-series data. Must be either a long value, such as the timestamp in miliseconds, or a string value in yyyy-MM-dd HH:mm:ss.| Yes ### Batch RCF for Non-time-series Data Command Syntax ```sql -ad \ \ +ad \ \ ``` Field | Description | Required :--- | :--- |:--- -`shingle\_size` | A consecutive sequence of the most recent records. 
The default value is 8. | No -`time\_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No +`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No +`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No *Example 1*: Detecting events in New York City from taxi ridership data with time-series data @@ -887,7 +887,7 @@ value | score | anomalous ## kmeans -The kmeans command applies kmeans algorithm in ml-commons plugin on the search result returned by a PPL command. +The kmeans command applies the ML Commons plugin's kmeans algorithm to the provided PPL command's search results. ## Syntax From 228dd73a603988643da061e9a676a6dfe1abf733 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 10:13:15 -0500 Subject: [PATCH 07/15] Small fix Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/index.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index 140e2c7c..22d73b29 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -12,7 +12,6 @@ ML Commons for OpenSearch eases the development of machine learning features by Models trained through the ML Commons plugin support model-based algorithms such as kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported for synchronous models. - Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands. 
## Permissions From bbbcdc32ef2779f92b6a6e4ef9c8ae5c2ae5ff9a Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 15:20:31 -0500 Subject: [PATCH 08/15] Incorporate final technical feedback Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 93 ++++++++++++++++++--------- _ml-commons-plugin/index.md | 6 +- _observability-plugin/ppl/commands.md | 6 +- 3 files changed, 67 insertions(+), 38 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 8a9ab582..b5358dab 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -16,11 +16,11 @@ nav_order: 99 --- -The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, and then store that model in an ML model index. +The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, make predictions with that trained model, train and predict with the same data set, and then store that model in an ML model index. In order to train tasks through the API, three inputs are required. -- Algorithm name: Usually `FunctionaName`. This determines what algorithm the ML Engine runs. +- Algorithm name: Must be one of a [FunctionaName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. - Model hyper parameters: Adjust these parameters to make the model train better. - Input data: The data input that trains the ML model, or applies the ML models to predictions. To input data, query against your index or use data frame. 
@@ -385,7 +385,7 @@ POST /_plugins/_ml/_train_predict/kmeans "values" : [ { "column_type" : "INTEGER", - "value" : 0 + "value" : 1 } ] }, @@ -393,7 +393,7 @@ POST /_plugins/_ml/_train_predict/kmeans "values" : [ { "column_type" : "INTEGER", - "value" : 0 + "value" : 1 } ] }, @@ -401,7 +401,7 @@ POST /_plugins/_ml/_train_predict/kmeans "values" : [ { "column_type" : "INTEGER", - "value" : 0 + "value" : 1 } ] }, @@ -526,33 +526,6 @@ POST /_plugins/_ml/models/_search } ``` -## Delete task - -Delete a task based on the task_id. - -```json -DELETE /_plugins/_ml/tasks/{task_id} -``` - -### Response - -```json -{ - "_index" : ".plugins-ml-task", - "_type" : "_doc", - "_id" : "xQRYLX8BydmmU1x6nuD3", - "_version" : 4, - "result" : "deleted", - "_shards" : { - "total" : 2, - "successful" : 2, - "failed" : 0 - }, - "_seq_no" : 42, - "_primary_term" : 7 -} -``` - ## Search task Search tasks based on parameters indicated in the request body. @@ -707,6 +680,62 @@ GET /_plugins/_ml/stats } ``` +## Delete task + +Delete a task based on the task_id. 
+ +```json +DELETE /_plugins/_ml/tasks/{task_id} +``` + +The API returns the following: + +```json +{ + "_index" : ".plugins-ml-task", + "_type" : "_doc", + "_id" : "xQRYLX8BydmmU1x6nuD3", + "_version" : 4, + "result" : "deleted", + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 42, + "_primary_term" : 7 +} +``` + +## Delete model + +Deletes a model based on the model_id + +```json +DELETE /_plugins/_ml/models/ +``` + +The API returns the following: + +```json +{ + "_index" : ".plugins-ml-model", + "_type" : "_doc", + "_id" : "MzcIJX8BA7mbufL6DOwl", + "_version" : 2, + "result" : "deleted", + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 27, + "_primary_term" : 18 +} +``` + + + diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index 22d73b29..35809673 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -10,7 +10,7 @@ has_toc: false ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features. -Models trained through the ML Commons plugin support model-based algorithms such as kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported for synchronous models. +Models trained through the ML Commons plugin support model-based algorithms such as kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. 
Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands. @@ -18,8 +18,8 @@ Interaction with the ML commons plugin occurs through either the [REST API]({{si There are two reserved user roles that can use of the ML commons plugin. -- `ml_full_access`: Full access to all ML features, including starting new jobs and reading or deleting models. -- `ml_readonly_access`: Can only read trained models and statistics relevant to the model's cluster. Cannot start jobs or delete models. +- `ml_full_access`: Full access to all ML features, including starting new ML tasks and reading or deleting models. +- `ml_readonly_access`: Can only read ML tasks, trained models and statistics relevant to the model's cluster. Cannot start nor delete ML tasks or models. diff --git a/_observability-plugin/ppl/commands.md b/_observability-plugin/ppl/commands.md index 577d8884..142b5279 100644 --- a/_observability-plugin/ppl/commands.md +++ b/_observability-plugin/ppl/commands.md @@ -846,7 +846,7 @@ Field | Description | Required :--- | :--- |:--- `shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No `time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No -`time_field` | Specifies the time filed for RCF to use as time-series data. Must be either a long value, such as the timestamp in miliseconds, or a string value in yyyy-MM-dd HH:mm:ss.| Yes +`time_field` | Specifies the time filed for RCF to use as time-series data. 
Must be either a long value, such as the timestamp in miliseconds, or a string value in "yyyy-MM-dd HH:mm:ss".| Yes ### Batch RCF for Non-time-series Data Command Syntax @@ -866,7 +866,7 @@ The example trains a RCF model and use the model to detect anomalies in the time PPL query: ```sql -os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0' +os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0 ``` value | timestamp | score | anomaly_grade @@ -878,7 +878,7 @@ value | timestamp | score | anomaly_grade PPL query: ```sql -os> source=nyc_taxi | fields value | AD | where value=10844.0' +os> source=nyc_taxi | fields value | AD | where value=10844.0 ``` value | score | anomalous From 2a637ed31574f4ddbada7d5179c453a69e921b68 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 15:27:34 -0500 Subject: [PATCH 09/15] Remove forwardslash from PPL commands Signed-off-by: Naarcha-AWS --- _observability-plugin/ppl/commands.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_observability-plugin/ppl/commands.md b/_observability-plugin/ppl/commands.md index 142b5279..98e05887 100644 --- a/_observability-plugin/ppl/commands.md +++ b/_observability-plugin/ppl/commands.md @@ -839,7 +839,7 @@ The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons ### Fixed In Time RCF For Time-series Data Command Syntax ```sql -ad \ \ \ +ad ``` Field | Description | Required @@ -851,7 +851,7 @@ Field | Description | Required ### Batch RCF for Non-time-series Data Command Syntax ```sql -ad \ \ +ad ``` Field | Description | Required From d039da1b18bc1315d5fa5c9486ead80aff2977e6 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 15:28:37 -0500 Subject: [PATCH 10/15] Reword API inputs Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ml-commons-plugin/api.md 
b/_ml-commons-plugin/api.md index b5358dab..1e7f5968 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -22,7 +22,7 @@ In order to train tasks through the API, three inputs are required. - Algorithm name: Must be one of a [FunctionaName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. - Model hyper parameters: Adjust these parameters to make the model train better. -- Input data: The data input that trains the ML model, or applies the ML models to predictions. To input data, query against your index or use data frame. +- Input data: The data input that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use data frame. ## Train model From 421a6444fbeb5171a8b935796482ce2e007aef45 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 16:22:48 -0500 Subject: [PATCH 11/15] A couple more language tweaks Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 1e7f5968..0cf94f88 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -16,11 +16,11 @@ nav_order: 99 --- -The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, make predictions with that trained model, train and predict with the same data set, and then store that model in an ML model index. +The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, make predictions with that trained model, and train and predict with the same data set. In order to train tasks through the API, three inputs are required. 
-- Algorithm name: Must be one of a [FunctionaName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. +- Algorithm name: Must be one of a [FunctionaName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. To add a new function, see [How To Add a New Function](https://github.com/opensearch-project/ml-commons/blob/main/docs/how-to-add-new-function.md). - Model hyper parameters: Adjust these parameters to make the model train better. - Input data: The data input that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use data frame. From 8bcad52409b6742b05799ab8790b4028aac1de32 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 18:52:03 -0500 Subject: [PATCH 12/15] Final changes Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 0cf94f88..73900a72 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -76,7 +76,7 @@ POST /_plugins/_ml/_train/kmeans?async=true **Synchronously** -For synchronous responses, the API returns the model_id, which can be used to get info on the model or modify the model. +For synchronous responses, the API returns the model_id, which can be used to get or delete a task. ```json { @@ -87,7 +87,7 @@ For synchronous responses, the API returns the model_id, which can be used to ge **Asynchronously** -For asynchronous responses, the API returns the task_id, which can be used to get info or modify a task. +For asynchronous responses, the API returns the task_id, which can be used to get or delete a tasks. 
```json { @@ -230,7 +230,7 @@ POST /_plugins/_ml/_predict/kmeans/ ## Train and Predict -Use to train and then immediately predict against the same training data set. Can only be used with synchronous models and the following algorithms: +Use to train and then immediately predict against the same training data set. Can only be used with unsupervised learning models and the following algorithms: - BATCH_RCF - FIT_RCF @@ -633,7 +633,7 @@ To receive stats for a specific node, use: GET /_plugins/_ml//stats/ ``` -To receive starts for a specific node and return a specified stat, use: +To receive stats for a specific node and return a specified stat, use: ```json GET /_plugins/_ml//stats/ From 07e7fc44788ab7b6a2bde1c318da03c6dd833e1f Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Mon, 21 Mar 2022 18:59:35 -0500 Subject: [PATCH 13/15] Regroup APIs Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 295 +++++++++++++++++++------------------- 1 file changed, 147 insertions(+), 148 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 73900a72..ff7e210e 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -115,27 +115,121 @@ The API returns information on the model, the algorithm used, and the content fo } ``` -## Get task information +## Search model + +Use this command to search models you're already created. -You can retrieve information about a task using the task_id. ```json -GET /_plugins/_ml/tasks/ +POST /_plugins/_ml/models/_search +{query} ``` -The response includes information about the task. 
+### Example 1: Query all models + +```json +POST /_plugins/_ml/models/_search +{ + "query": { + "match_all": {} + }, + "size": 1000 +} +``` + +### Example 2: Query models with algorithm "FIT_RCF" + +```json +POST /_plugins/_ml/models/_search +{ + "query": { + "term": { + "algorithm": { + "value": "FIT_RCF" + } + } + } +} +``` + +### Response ```json { - "model_id" : "l7lamX8BO5w8y8Ra2oty", - "task_type" : "TRAINING", - "function_name" : "KMEANS", - "state" : "COMPLETED", - "input_type" : "SEARCH_QUERY", - "worker_node" : "54xOe0w8Qjyze00UuLDfdA", - "create_time" : 1647545342556, - "last_update_time" : 1647545342587, - "is_async" : true + "took" : 8, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : 2.4159138, + "hits" : [ + { + "_index" : ".plugins-ml-model", + "_type" : "_doc", + "_id" : "-QkKJX8BvytMh9aUeuLD", + "_version" : 1, + "_seq_no" : 12, + "_primary_term" : 15, + "_score" : 2.4159138, + "_source" : { + "name" : "FIT_RCF", + "version" : 1, + "content" : "xxx", + "algorithm" : "FIT_RCF" + } + }, + { + "_index" : ".plugins-ml-model", + "_type" : "_doc", + "_id" : "OxkvHn8BNJ65KnIpck8x", + "_version" : 1, + "_seq_no" : 2, + "_primary_term" : 8, + "_score" : 2.4159138, + "_source" : { + "name" : "FIT_RCF", + "version" : 1, + "content" : "xxx", + "algorithm" : "FIT_RCF" + } + } + ] + } + } +``` + +## Delete model + +Deletes a model based on the model_id + +```json +DELETE /_plugins/_ml/models/ +``` + +The API returns the following: + +```json +{ + "_index" : ".plugins-ml-model", + "_type" : "_doc", + "_id" : "MzcIJX8BA7mbufL6DOwl", + "_version" : 2, + "result" : "deleted", + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 27, + "_primary_term" : 18 } ``` @@ -434,98 +528,30 @@ POST /_plugins/_ml/_train_predict/kmeans } ``` -## Search model - -Use this command to search models you're 
already created. +## Get task information +You can retrieve information about a task using the task_id. ```json -POST /_plugins/_ml/models/_search -{query} +GET /_plugins/_ml/tasks/ ``` - -### Example 1: Query all models +The response includes information about the task. ```json -POST /_plugins/_ml/models/_search { - "query": { - "match_all": {} - }, - "size": 1000 + "model_id" : "l7lamX8BO5w8y8Ra2oty", + "task_type" : "TRAINING", + "function_name" : "KMEANS", + "state" : "COMPLETED", + "input_type" : "SEARCH_QUERY", + "worker_node" : "54xOe0w8Qjyze00UuLDfdA", + "create_time" : 1647545342556, + "last_update_time" : 1647545342587, + "is_async" : true } ``` -### Example 2: Query models with algorithm "FIT_RCF" - -```json -POST /_plugins/_ml/models/_search -{ - "query": { - "term": { - "algorithm": { - "value": "FIT_RCF" - } - } - } -} -``` - -### Response - -```json -{ - "took" : 8, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 2, - "relation" : "eq" - }, - "max_score" : 2.4159138, - "hits" : [ - { - "_index" : ".plugins-ml-model", - "_type" : "_doc", - "_id" : "-QkKJX8BvytMh9aUeuLD", - "_version" : 1, - "_seq_no" : 12, - "_primary_term" : 15, - "_score" : 2.4159138, - "_source" : { - "name" : "FIT_RCF", - "version" : 1, - "content" : "xxx", - "algorithm" : "FIT_RCF" - } - }, - { - "_index" : ".plugins-ml-model", - "_type" : "_doc", - "_id" : "OxkvHn8BNJ65KnIpck8x", - "_version" : 1, - "_seq_no" : 2, - "_primary_term" : 8, - "_score" : 2.4159138, - "_source" : { - "name" : "FIT_RCF", - "version" : 1, - "content" : "xxx", - "algorithm" : "FIT_RCF" - } - } - ] - } - } -``` - ## Search task Search tasks based on parameters indicated in the request body. @@ -617,6 +643,33 @@ GET /_plugins/_ml/tasks/_search } ``` +## Delete task + +Delete a task based on the task_id. 
+ +```json +DELETE /_plugins/_ml/tasks/{task_id} +``` + +The API returns the following: + +```json +{ + "_index" : ".plugins-ml-task", + "_type" : "_doc", + "_id" : "xQRYLX8BydmmU1x6nuD3", + "_version" : 4, + "result" : "deleted", + "_shards" : { + "total" : 2, + "successful" : 2, + "failed" : 0 + }, + "_seq_no" : 42, + "_primary_term" : 7 +} +``` + ## Stats Get statistics related to the number of tasks. @@ -680,60 +733,6 @@ GET /_plugins/_ml/stats } ``` -## Delete task - -Delete a task based on the task_id. - -```json -DELETE /_plugins/_ml/tasks/{task_id} -``` - -The API returns the following: - -```json -{ - "_index" : ".plugins-ml-task", - "_type" : "_doc", - "_id" : "xQRYLX8BydmmU1x6nuD3", - "_version" : 4, - "result" : "deleted", - "_shards" : { - "total" : 2, - "successful" : 2, - "failed" : 0 - }, - "_seq_no" : 42, - "_primary_term" : 7 -} -``` - -## Delete model - -Deletes a model based on the model_id - -```json -DELETE /_plugins/_ml/models/ -``` - -The API returns the following: - -```json -{ - "_index" : ".plugins-ml-model", - "_type" : "_doc", - "_id" : "MzcIJX8BA7mbufL6DOwl", - "_version" : 2, - "result" : "deleted", - "_shards" : { - "total" : 2, - "successful" : 2, - "failed" : 0 - }, - "_seq_no" : 27, - "_primary_term" : 18 -} -``` - From a6379975430a714491e1b275ffb572780e0e33f6 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Tue, 22 Mar 2022 10:13:32 -0500 Subject: [PATCH 14/15] Add additional context to intro Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 4 ++-- _ml-commons-plugin/index.md | 7 +++++-- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index ff7e210e..d7c18ca8 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -76,7 +76,7 @@ POST /_plugins/_ml/_train/kmeans?async=true **Synchronously** -For synchronous responses, the API returns the model_id, which can be used to get or delete a task. 
+For synchronous responses, the API returns the model_id, which can be used to get or delete a model. ```json { @@ -87,7 +87,7 @@ For synchronous responses, the API returns the model_id, which can be used to ge **Asynchronously** -For asynchronous responses, the API returns the task_id, which can be used to get or delete a tasks. +For asynchronous responses, the API returns the task_id, which can be used to get or delete a task. ```json { diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index 35809673..55d35544 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -10,10 +10,13 @@ has_toc: false ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features. -Models trained through the ML Commons plugin support model-based algorithms such as kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. - Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands. +Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely. 
+ +Should you not want to use a model, you can use the [Train and Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-and-predict) API to test your model without having to evaluate the model's performance. + + ## Permissions There are two reserved user roles that can use of the ML commons plugin. From 0417e971b808fb9706d1afafb5a372e1d2ad31c1 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS Date: Tue, 22 Mar 2022 12:52:10 -0500 Subject: [PATCH 15/15] Add more consistency to headers Signed-off-by: Naarcha-AWS --- _ml-commons-plugin/api.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index d7c18ca8..d55a121b 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -125,7 +125,7 @@ POST /_plugins/_ml/models/_search {query} ``` -### Example 1: Query all models +### Example: Query all models ```json POST /_plugins/_ml/models/_search @@ -137,7 +137,7 @@ POST /_plugins/_ml/models/_search } ``` -### Example 2: Query models with algorithm "FIT_RCF" +### Example: Query models with algorithm "FIT_RCF" ```json POST /_plugins/_ml/models/_search @@ -322,7 +322,7 @@ POST /_plugins/_ml/_predict/kmeans/ ``` -## Train and Predict +## Train and predict Use to train and then immediately predict against the same training data set. Can only be used with unsupervised learning models and the following algorithms: @@ -330,7 +330,7 @@ Use to train and then immediately predict against the same training data set. Ca - FIT_RCF - kmeans -### Example: Train and predict with Indexed data +### Example: Train and predict with indexed data ```json @@ -581,6 +581,8 @@ GET /_plugins/_ml/tasks/_search } ``` +### Response + ```json { "took" : 12,