--
:api: evaluate-data-frame
:request: EvaluateDataFrameRequest
:response: EvaluateDataFrameResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Evaluate {dfanalytics} API
experimental::[]
Evaluates the {dfanalytics} for an annotated index.

The API accepts an +{request}+ object and returns an +{response}+.

[id="{upid}-{api}-request"]
==== Evaluate {dfanalytics} request
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> Constructing a new evaluation request
<2> Reference to an existing index
<3> The query with which to select data from indices
<4> Evaluation to be performed
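
Illustratively, such a request might be assembled as in the following sketch. The index name and term query are hypothetical, `evaluation` is one of the evaluation objects described in the next section, and the exact constructor signature may differ between client versions.

["source","java"]
--------------------------------------------------
// Sketch only: "my-annotated-index" and the term query are made-up examples.
EvaluateDataFrameRequest request = new EvaluateDataFrameRequest(
    "my-annotated-index",                                        // index containing the annotated data
    new QueryConfig(QueryBuilders.termQuery("dataset", "test")), // query selecting the documents to evaluate
    evaluation);                                                 // the evaluation to perform
--------------------------------------------------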
==== Evaluation

Evaluation to be performed.
Currently, the supported evaluations are +OutlierDetection+, +Classification+, and +Regression+.

===== Outlier detection
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-evaluation-outlierdetection]
--------------------------------------------------
<1> Constructing a new evaluation
<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) label for an example. Must be either true or false.
<3> Name of the field in the index. Its value denotes the probability (as per some ML algorithm) of the example being classified as positive.
<4> The remaining parameters are the metrics to be calculated based on the two fields described above
<5> {wikipedia}/Precision_and_recall#Precision[Precision] calculated at thresholds: 0.4, 0.5 and 0.6
<6> {wikipedia}/Precision_and_recall#Recall[Recall] calculated at thresholds: 0.5 and 0.7
<7> {wikipedia}/Confusion_matrix[Confusion matrix] calculated at threshold 0.5
<8> {wikipedia}/Receiver_operating_characteristic#Area_under_the_curve[AUC ROC] calculated and the curve points returned
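
Illustratively, an outlier detection evaluation along the lines of the callouts above might be built like this sketch; the field names are hypothetical, and the metric classes are assumed to come from the client's `outlierdetection` evaluation package.

["source","java"]
--------------------------------------------------
// Sketch only: "label" and "p" are made-up field names.
Evaluation evaluation = new OutlierDetection(
    "label",                            // actual (ground truth) field, values true/false
    "p",                                // predicted probability field
    PrecisionMetric.at(0.4, 0.5, 0.6),  // precision at thresholds 0.4, 0.5 and 0.6
    RecallMetric.at(0.5, 0.7),          // recall at thresholds 0.5 and 0.7
    ConfusionMatrixMetric.at(0.5),      // confusion matrix at threshold 0.5
    AucRocMetric.withCurve());          // AUC ROC, including the curve points
--------------------------------------------------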
===== Classification
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-evaluation-classification]
--------------------------------------------------
<1> Constructing a new evaluation
<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) class the example belongs to.
<3> Name of the field in the index. Its value denotes the predicted (as per some ML algorithm) class of the example.
<4> Name of the field in the index. Its value denotes the array of top classes. Must be a nested field.
<5> The remaining parameters are the metrics to be calculated based on the fields described above
<6> Accuracy
<7> Precision
<8> Recall
<9> Multiclass confusion matrix of size 3
<10> {wikipedia}/Receiver_operating_characteristic#Area_under_the_curve[AUC ROC] calculated for class "cat" treated as positive and the rest as negative
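
A corresponding sketch for a classification evaluation, assuming the metric classes from the client's `classification` evaluation package; the field names are hypothetical.

["source","java"]
--------------------------------------------------
// Sketch only: the field names are made-up examples.
Evaluation evaluation = new Classification(
    "actual_class",                          // actual (ground truth) class field
    "predicted_class",                       // predicted class field
    "ml.top_classes",                        // nested field holding the array of top classes
    new AccuracyMetric(),
    new PrecisionMetric(),
    new RecallMetric(),
    new MulticlassConfusionMatrixMetric(3),  // confusion matrix limited to 3 classes
    AucRocMetric.forClass("cat"));           // AUC ROC with "cat" as the positive class
--------------------------------------------------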
===== Regression
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-evaluation-regression]
--------------------------------------------------
<1> Constructing a new evaluation
<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) value for an example.
<3> Name of the field in the index. Its value denotes the predicted (as per some ML algorithm) value for the example.
<4> The remaining parameters are the metrics to be calculated based on the two fields described above
<5> {wikipedia}/Mean_squared_error[Mean squared error]
<6> Mean squared logarithmic error
<7> {wikipedia}/Huber_loss#Pseudo-Huber_loss_function[Pseudo Huber loss]
<8> {wikipedia}/Coefficient_of_determination[R squared]
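
And a sketch for a regression evaluation, assuming the metric classes from the client's `regression` evaluation package; the field names and metric parameters are hypothetical.

["source","java"]
--------------------------------------------------
// Sketch only: the field names are made-up examples.
Evaluation evaluation = new Regression(
    "actual_value",                              // actual (ground truth) value field
    "predicted_value",                           // predicted value field
    new MeanSquaredErrorMetric(),
    new MeanSquaredLogarithmicErrorMetric(1.0),  // offset added to the values before taking logarithms
    new HuberMetric(1.0),                        // delta parameter of the pseudo Huber loss
    new RSquaredMetric());
--------------------------------------------------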
include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the requested evaluation metrics.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
<1> Fetching all the calculated metrics results
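
As a rough sketch, iterating over all returned metric results might look like this, assuming the response exposes them as a list of `EvaluationMetric.Result` objects.

["source","java"]
--------------------------------------------------
// Sketch only: print the name of every metric that was calculated.
for (EvaluationMetric.Result metricResult : response.getMetrics()) {
    System.out.println(metricResult.getMetricName());
}
--------------------------------------------------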
==== Results
===== Outlier detection
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-results-outlierdetection]
--------------------------------------------------
<1> Fetching precision metric by name
<2> Fetching precision at a given (0.4) threshold
<3> Fetching confusion matrix metric by name
<4> Fetching confusion matrix at a given (0.5) threshold
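
A sketch of the lookups described above, using the metric classes from the `outlierdetection` evaluation package; the `getScoreByThreshold` getters are assumptions inferred from the callouts.

["source","java"]
--------------------------------------------------
// Sketch only: look results up by the metric's registered name, then by threshold.
PrecisionMetric.Result precisionResult = response.getMetricByName(PrecisionMetric.NAME);
double precisionAt04 = precisionResult.getScoreByThreshold("0.4");

ConfusionMatrixMetric.Result confusionMatrixResult = response.getMetricByName(ConfusionMatrixMetric.NAME);
ConfusionMatrixMetric.ConfusionMatrix confusionMatrixAt05 = confusionMatrixResult.getScoreByThreshold("0.5");
--------------------------------------------------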
===== Classification
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-results-classification]
--------------------------------------------------
<1> Fetching accuracy metric by name
<2> Fetching the actual accuracy value
<3> Fetching precision metric by name
<4> Fetching the actual precision value
<5> Fetching recall metric by name
<6> Fetching the actual recall value
<7> Fetching multiclass confusion matrix metric by name
<8> Fetching the contents of the confusion matrix
<9> Fetching the number of classes that were not included in the matrix
<10> Fetching AucRoc metric by name
<11> Fetching the actual AucRoc score
<12> Fetching the number of documents that were used to calculate the AucRoc score
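
A sketch of the classification result lookups; the getter names here are assumptions inferred from the callouts above, not verified client API.

["source","java"]
--------------------------------------------------
// Sketch only: getter names are assumptions inferred from the callouts.
AccuracyMetric.Result accuracyResult = response.getMetricByName(AccuracyMetric.NAME);
double accuracy = accuracyResult.getOverallAccuracy();

MulticlassConfusionMatrixMetric.Result matrixResult =
    response.getMetricByName(MulticlassConfusionMatrixMetric.NAME);
List<MulticlassConfusionMatrixMetric.ActualClass> confusionMatrix = matrixResult.getConfusionMatrix();
long otherClassesCount = matrixResult.getOtherActualClassCount();

AucRocResult aucRocResult = response.getMetricByName(AucRocMetric.NAME);
double aucRocScore = aucRocResult.getValue();
Long aucRocDocCount = aucRocResult.getDocCount();
--------------------------------------------------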
===== Regression
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-results-regression]
--------------------------------------------------
<1> Fetching mean squared error metric by name
<2> Fetching the actual mean squared error value
<3> Fetching mean squared logarithmic error metric by name
<4> Fetching the actual mean squared logarithmic error value
<5> Fetching pseudo Huber loss metric by name
<6> Fetching the actual pseudo Huber loss value
<7> Fetching R squared metric by name
<8> Fetching the actual R squared value
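
And a sketch of the regression result lookups; again, the getter names are assumptions inferred from the callouts.

["source","java"]
--------------------------------------------------
// Sketch only: getter names are assumptions inferred from the callouts.
MeanSquaredErrorMetric.Result mseResult = response.getMetricByName(MeanSquaredErrorMetric.NAME);
double meanSquaredError = mseResult.getValue();

RSquaredMetric.Result rSquaredResult = response.getMetricByName(RSquaredMetric.NAME);
double rSquared = rSquaredResult.getValue();
--------------------------------------------------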