diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md
new file mode 100644
index 00000000..65a74a03
--- /dev/null
+++ b/_ml-commons-plugin/api.md
@@ -0,0 +1,19 @@
+---
+layout: default
+title: API
+has_children: false
+nav_order: 90
+---
+
+The Machine Learning (ML) Commons API lets you create, train, and store machine learning algorithms both synchronously and asynchronously.
+
+To run a training task through the API, you must provide three inputs.
+
+- Algorithm name: Usually `FunctionName`. This determines which algorithm the ML engine runs.
+- Model hyperparameters: Adjust these parameters to improve how the model trains. You can also implement `MLAlgoParams` to build custom parameters for each model.
+- Input data: The data used to train the ML model. To input data, query your index or use a data frame.
+
+## Train model
+
+
+
diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md
new file mode 100644
index 00000000..bd945232
--- /dev/null
+++ b/_ml-commons-plugin/index.md
@@ -0,0 +1,31 @@
+---
+layout: default
+title: About ML Commons
+nav_order: 38
+has_children: false
+---
+
+# ML Commons plugin
+
+ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Those calls choose the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features.
+
+Models trained through the ML Commons plugin support two types of algorithms.
+
+- Model-based algorithms, such as Kmeans or Linear Regression. To get the best results, train your model first, then use the model to make predictions. Linear Regression is only supported
+- No-model-based algorithms, such as RCA. These algorithms can be executed directly through an `Executable` interface.
+
+You interact with the ML Commons plugin through either the [REST API] or the `ad` and `kmeans` PPL commands.
+
+## Permissions
+
+Two user roles can use the ML Commons plugin.
+
+- `ml_full_access`: Full access to all ML features, including starting new jobs and reading or deleting models.
+- `ml_readonly_access`: Can only read trained models and statistics relevant to the model's cluster. Cannot start jobs or delete models.
+
+## Quickstart
+
+
+
+
+
diff --git a/_observability-plugin/ppl/commands.md b/_observability-plugin/ppl/commands.md
index 7b5725b8..7d6d2553 100644
--- a/_observability-plugin/ppl/commands.md
+++ b/_observability-plugin/ppl/commands.md
@@ -732,3 +732,84 @@ PPL query:
 ```ppl
 search source=my_index | where match(message, "this is a test", operator=and, zero_terms_query=all)
 ```
+
+## ad
+
+The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin to the search result returned by a PPL command. Based on the input, one of two RCF algorithms is used: fixed in time RCF for time-series data or batch RCF for non-time-series data.
+
+### Fixed In Time RCF for Time-series Data Command Syntax
+
+```sql
+ad \ \ \
+```
+
+Field | Description | Required
+:--- | :--- |:---
+`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No
+`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
+`time_field` | Specifies the time field for RCF to use as time-series data. | Yes
+
+### Batch RCF for Non-time-series Data Command Syntax
+
+```sql
+ad \ \
+```
+
+Field | Description | Required
+:--- | :--- |:---
+`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No
+`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
+
+*Example 1*: Detecting events in New York City from taxi ridership data with time-series data
+
+This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
+
+PPL query:
+
+```sql
+os> source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0
+```
+
+value | timestamp | score | anomaly_grade
+:--- | :--- |:--- | :---
+10844.0 | 1404172800000 | 0.0 | 0.0
+
+*Example 2*: Detecting events in New York City from taxi ridership data with non-time-series data
+
+PPL query:
+
+```sql
+os> source=nyc_taxi | fields value | AD | where value=10844.0
+```
+
+value | score | anomalous
+:--- | :--- |:---
+10844.0 | 0.0 | false
+
+## kmeans
+
+The `kmeans` command applies the kmeans algorithm in the ML Commons plugin to the search result returned by a PPL command.
+
+### Syntax
+
+```sql
+kmeans
+```
+
+For `cluster-number`, enter the number of clusters you want to group your data points into.
+
+*Example*
+
+This example shows how to classify three Iris species (Iris setosa, Iris virginica, and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
+
+PPL query:
+
+```sql
+os> source=iris_data | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3
+```
+
+sepal_length_in_cm | sepal_width_in_cm | petal_length_in_cm | petal_width_in_cm | ClusterID
+:--- | :--- |:--- | :--- | :---
+5.1 | 3.5 | 1.4 | 0.2 | 1
+5.6 | 3.0 | 4.1 | 1.3 | 0
+6.7 | 2.5 | 5.8 | 1.8 | 2
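
The `ad` and `kmeans` queries documented in this change all follow the same `source | fields | command` shape, so they can be assembled programmatically. The following is a minimal Python sketch, not part of the plugin: `build_ppl_query` is a hypothetical helper, and the `key='value'` rendering of options is an assumption modeled on the `time_field='timestamp'` example in the diff. Positional arguments cover the `kmeans 3` form.

```python
def build_ppl_query(index, fields, command, *args, **options):
    """Compose a PPL query string of the form: source | fields | command.

    Hypothetical helper for illustration only; it builds text and does not
    contact OpenSearch or validate the command against the plugin.
    """
    stages = [f"source={index}"]
    if fields:
        stages.append("fields " + ", ".join(fields))

    # Positional arguments follow the command name, as in `kmeans 3`.
    tokens = [command] + [str(arg) for arg in args]

    # Assumption: string options are single-quoted and numbers are bare,
    # mirroring `time_field='timestamp'` in the documented example.
    for key, value in options.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        tokens.append(f"{key}={rendered}")

    stages.append(" ".join(tokens))
    return " | ".join(stages)


# Time-series anomaly detection query from Example 1 (without the `where` stage).
ad_query = build_ppl_query(
    "nyc_taxi", ["value", "timestamp"], "AD", time_field="timestamp"
)

# Clustering query from the kmeans example.
kmeans_query = build_ppl_query(
    "iris_data",
    [
        "sepal_length_in_cm",
        "sepal_width_in_cm",
        "petal_length_in_cm",
        "petal_width_in_cm",
    ],
    "kmeans",
    3,
)

print(ad_query)
print(kmeans_query)
```

Keeping the query builder separate from whatever client sends the query makes it straightforward to check the generated text against the documented examples before running it on a cluster.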