diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 90a31f2a..8a9ab582 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -16,13 +16,13 @@ nav_order: 99 --- -The Machine Learning (ML) commons API lets you create, train, and store machine learning algorithms both synchronously and asynchronously. +The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, and then store the trained model in an ML model index. In order to train tasks through the API, three inputs are required. - Algorithm name: Usually `FunctionaName`. This determines what algorithm the ML Engine runs. -- Model hyper parameters: Adjust these parameters to make the model train better. You can also implement `MLAgoParamas` to build custom parameters for each model. -- Input data: The data input that teaches the ML model. To input data, query against your index or use data frame. +- Model hyper parameters: Adjust these parameters to make the model train better. +- Input data: The data that trains the ML model or that the trained model uses to make predictions. To input data, query against your index or use a data frame. ## Train model @@ -76,7 +76,7 @@ POST /_plugins/_ml/_train/kmeans?async=true **Synchronously** -For synchronous responses, the API returns the model_id, which can be used to get info or modify the model. +For synchronous responses, the API returns the `model_id`, which you can use to get information about the model or to modify it. ```json { @@ -141,7 +141,7 @@ The response includes information about the task. ## Predict -ML commons can predict new data with your trained model either from indexed data or a data frame. +ML commons can predict new data with your trained model either from indexed data or a data frame. The `model_id` is required to use the Predict API. 
```json POST /_plugins/_ml/_predict// @@ -230,7 +230,11 @@ POST /_plugins/_ml/_predict/kmeans/ ## Train and Predict -Use to train and then immediately predict against the same training data set. Can only be used with synchronous models and the kmeans algorithm. +Use to train and then immediately predict against the same training data set. Can only be used with synchronous models and the following algorithms: + +- BATCH_RCF +- FIT_RCF +- kmeans ### Example: Train and predict with Indexed data @@ -364,11 +368,8 @@ POST /_plugins/_ml/_train_predict/kmeans } ``` - ### Response -**Response for index data** - ```json { "status" : "COMPLETED", @@ -433,263 +434,6 @@ POST /_plugins/_ml/_train_predict/kmeans } ``` -**Response for data input directly** - -```json -{ - "status" : "COMPLETED", - "prediction_result" : { - "column_metas" : [ - { - "name" : "score", - "column_type" : "DOUBLE" - }, - { - "name" : "anomaly_grade", - "column_type" : "DOUBLE" - }, - { - "name" : "timestamp", - "column_type" : "LONG" - } - ], - "rows" : [ - { - "values" : [ - { - "column_type" : "DOUBLE", - "value" : 0.0 - }, - { - "column_type" : "DOUBLE", - "value" : 0.0 - }, - { - "column_type" : "LONG", - "value" : 1404187200000 - } - ] - }, - ... - ] - } -} -``` - -## Execute - -Use the Execute API to run no-model-based algorithms. You do not need to train a model in order to receive results from your chosen algorithm. 
- -```json -POST _plugins/_ml/_execute/ -``` - -### Example: Execute sample calculator, supported "operation": max/min/sum - -```json -POST _plugins/_ml/_execute/local_sample_calculator -{ - "operation": "max", - "input_data": [1.0, 2.0, 3.0] -} -``` - - -### Example: Execute anomaly localization - -```json -POST /_plugins/_ml/_execute/anomaly_localization -{ - "index_name": "rca-index", - "attribute_field_names": [ - "attribute" - ], - "aggregations": [ - { - "sum": { - "sum": { - "field": "value" - } - } - } - ], - "time_field_name": "timestamp", - "start_time": 1620630000000, - "end_time": 1621234800000, - "min_time_interval": 86400000, - "num_outputs": 2 -} -``` - -### Response - -**Sample calculator response** - -```json -{ - "sample_result" : 3.0 -} -``` - -**Sample anomaly response** - -```json -{ - "results" : [ - { - "name" : "sum", - "result" : { - "buckets" : [ - { - "start_time" : 1620630000000, - "end_time" : 1620716400000, - "overall_aggregate_value" : 65.0 - }, - { - "start_time" : 1620716400000, - "end_time" : 1620802800000, - "overall_aggregate_value" : 75.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 1.0, - "base_value" : 2.0, - "new_value" : 3.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 1.0, - "base_value" : 3.0, - "new_value" : 4.0 - } - ] - }, - { - "start_time" : 1620802800000, - "end_time" : 1620889200000, - "overall_aggregate_value" : 85.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 2.0, - "base_value" : 2.0, - "new_value" : 4.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 2.0, - "base_value" : 3.0, - "new_value" : 5.0 - } - ] - }, - { - "start_time" : 1620889200000, - "end_time" : 1620975600000, - "overall_aggregate_value" : 95.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 3.0, - "base_value" : 2.0, - "new_value" : 5.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 3.0, - "base_value" : 3.0, - 
"new_value" : 6.0 - } - ] - }, - { - "start_time" : 1620975600000, - "end_time" : 1621062000000, - "overall_aggregate_value" : 105.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 4.0, - "base_value" : 2.0, - "new_value" : 6.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 4.0, - "base_value" : 3.0, - "new_value" : 7.0 - } - ] - }, - { - "start_time" : 1621062000000, - "end_time" : 1621148400000, - "overall_aggregate_value" : 115.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 5.0, - "base_value" : 2.0, - "new_value" : 7.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 5.0, - "base_value" : 3.0, - "new_value" : 8.0 - } - ] - }, - { - "start_time" : 1621148400000, - "end_time" : 1621234800000, - "overall_aggregate_value" : 125.0, - "entities" : [ - { - "key" : [ - "attr0" - ], - "contribution_value" : 6.0, - "base_value" : 2.0, - "new_value" : 8.0 - }, - { - "key" : [ - "attr1" - ], - "contribution_value" : 6.0, - "base_value" : 3.0, - "new_value" : 9.0 - } - ] - } - ] - } - } - ] -} -``` - ## Search model Use this command to search models you're already created. @@ -713,7 +457,7 @@ POST /_plugins/_ml/models/_search } ``` -### Example 2: Query models with algorithm "BATCh_RCF" +### Example 2: Query models with algorithm "FIT_RCF" ```json POST /_plugins/_ml/models/_search @@ -721,7 +465,7 @@ POST /_plugins/_ml/models/_search "query": { "term": { "algorithm": { - "value": "BATCH_RCF" + "value": "FIT_RCF" } } } diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index ceb9dc85..140e2c7c 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -10,16 +10,14 @@ has_toc: false ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. 
Those calls choose the right nodes and resources for each ML request and monitors ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features. -Models trained through the ML Commons plugin support two types of algorithms: +Models trained through the ML Commons plugin support model-based algorithms such as kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported for synchronous models. -- Model-based algorithms such kmeans or Linear Regression. To get the best results, make sure you train your model first, then use the model to apply predictions. Linear Regression is only supported -- No-model based algorithm such as RCA. These algorithms can be executed directly through an `Executable` interface. Interaction with the ML commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) PPL commands. ## Permissions -There are two user roles that can make use of the ML commons plugin. +There are two reserved user roles that can use the ML Commons plugin. - `ml_full_access`: Full access to all ML features, including starting new jobs and reading or deleting models. - `ml_readonly_access`: Can only read trained models and statistics relevant to the model's cluster. Cannot start jobs or delete models. 
diff --git a/_observability-plugin/ppl/commands.md b/_observability-plugin/ppl/commands.md index 6c8f82db..577d8884 100644 --- a/_observability-plugin/ppl/commands.md +++ b/_observability-plugin/ppl/commands.md @@ -834,30 +834,30 @@ search source=my_index | where match(message, "this is a test", operator=and, ze ## ad -The `ad` command applies Random Cut Forest (RCF) algorithm in ml-commons plugin on the search result returned by a PPL command.Based on the input, two types of RCF algorithms will be utilized: fixed in time RCF for processing time-series data, batch RCF for processing non-time-series data. +The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin to the search result returned by a PPL command. Based on the input, the plugin uses one of two RCF algorithms: fixed in time RCF for time-series data or batch RCF for non-time-series data. ### Fixed In Time RCF For Time-series Data Command Syntax ```sql -ad \ \ \ +ad \ \ \ ``` Field | Description | Required :--- | :--- |:--- -`shingle\_size` | A consecutive sequence of the most recent records. The default value is 8. | No -`time\_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No -`time\_field` | Specifies the time filed for RCF to use as time-series data. | Yes +`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No +`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No +`time_field` | Specifies the time field for RCF to use as time-series data. Must be either a long value, such as a timestamp in milliseconds, or a string value in `yyyy-MM-dd HH:mm:ss` format. | Yes ### Batch RCF for Non-time-series Data Command Syntax ```sql -ad \ \ +ad \ \ ``` Field | Description | Required :--- | :--- |:--- -`shingle\_size` | A consecutive sequence of the most recent records. 
The default value is 8. | No -`time\_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No +`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No +`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No *Example 1*: Detecting events in New York City from taxi ridership data with time-series data @@ -887,7 +887,7 @@ value | score | anomalous ## kmeans -The kmeans command applies kmeans algorithm in ml-commons plugin on the search result returned by a PPL command. +The kmeans command applies the ML Commons plugin's kmeans algorithm to the provided PPL command's search results. ## Syntax