[role="xpack"] [testenv="platinum"] [[evaluate-dfanalytics]] === Evaluate {dfanalytics} API [subs="attributes"] ++++ Evaluate {dfanalytics} ++++ Evaluates the {dfanalytics} for an annotated index. experimental[] [[ml-evaluate-dfanalytics-request]] ==== {api-request-title} `POST _ml/data_frame/_evaluate` [[ml-evaluate-dfanalytics-prereq]] ==== {api-prereq-title} * You must have `monitor_ml` privilege to use this API. For more information, see <> and <>. [[ml-evaluate-dfanalytics-desc]] ==== {api-description-title} The API packages together commonly used evaluation metrics for various types of machine learning features. This has been designed for use on indexes created by {dfanalytics}. Evaluation requires both a ground truth field and an analytics result field to be present. [[ml-evaluate-dfanalytics-request-body]] ==== {api-request-body-title} `index`:: (Required, object) Defines the `index` in which the evaluation will be performed. `query`:: (Optional, object) A query clause that retrieves a subset of data from the source index. See <>. `evaluation`:: (Required, object) Defines the type of evaluation you want to perform. See <>. + -- Available evaluation types: * `binary_soft_classification` * `regression` * `classification` -- //// [[ml-evaluate-dfanalytics-results]] ==== {api-response-body-title} `binary_soft_classification`:: (object) If you chose to do binary soft classification, the API returns the following evaluation metrics: `auc_roc`::: TBD `confusion_matrix`::: TBD `precision`::: TBD `recall`::: TBD //// [[ml-evaluate-dfanalytics-example]] ==== {api-examples-title} ===== Binary soft classification [source,console] -------------------------------------------------- POST _ml/data_frame/_evaluate { "index": "my_analytics_dest_index", "evaluation": { "binary_soft_classification": { "actual_field": "is_outlier", "predicted_probability_field": "ml.outlier_score" } } } -------------------------------------------------- // TEST[skip:TBD] The API returns the following results: [source,console-result] ---- { "binary_soft_classification": { "auc_roc": { "score": 0.92584757746414444 }, "confusion_matrix": { "0.25": { "tp": 5, "fp": 9, "tn": 204, "fn": 5 }, "0.5": { "tp": 1, "fp": 5, "tn": 208, "fn": 9 }, "0.75": { "tp": 0, "fp": 4, "tn": 209, "fn": 10 } }, "precision": { "0.25": 0.35714285714285715, "0.5": 0.16666666666666666, "0.75": 0 }, "recall": { "0.25": 0.5, "0.5": 0.1, "0.75": 0 } } } ---- ===== {regression-cap} [source,console] -------------------------------------------------- POST _ml/data_frame/_evaluate { "index": "house_price_predictions", <1> "query": { "bool": { "filter": [ { "term": { "ml.is_training": false } } <2> ] } }, "evaluation": { "regression": { "actual_field": "price", <3> "predicted_field": "ml.price_prediction", <4> "metrics": { "r_squared": {}, "mean_squared_error": {} } } } } -------------------------------------------------- // TEST[skip:TBD] <1> The output destination index from a {dfanalytics} {reganalysis}. <2> In this example, a test/train split (`training_percent`) was defined for the {reganalysis}. This query limits evaluation to be performed on the test split only. <3> The ground truth value for the actual house price. This is required in order to evaluate results. <4> The predicted value for house price calculated by the {reganalysis}. The following example calculates the training error: [source,console] -------------------------------------------------- POST _ml/data_frame/_evaluate { "index": "student_performance_mathematics_reg", "query": { "term": { "ml.is_training": { "value": true <1> } } }, "evaluation": { "regression": { "actual_field": "G3", <2> "predicted_field": "ml.G3_prediction", <3> "metrics": { "r_squared": {}, "mean_squared_error": {} } } } } -------------------------------------------------- // TEST[skip:TBD] <1> In this example, a test/train split (`training_percent`) was defined for the {reganalysis}. This query limits evaluation to be performed on the train split only. It means that a training error will be calculated. <2> The field that contains the ground truth value for the actual student performance. This is required in order to evaluate results. <3> The field that contains the predicted value for student performance calculated by the {reganalysis}. The next example calculates the testing error. The only difference compared with the previous example is that `ml.is_training` is set to `false` this time, so the query excludes the train split from the evaluation. [source,console] -------------------------------------------------- POST _ml/data_frame/_evaluate { "index": "student_performance_mathematics_reg", "query": { "term": { "ml.is_training": { "value": false <1> } } }, "evaluation": { "regression": { "actual_field": "G3", <2> "predicted_field": "ml.G3_prediction", <3> "metrics": { "r_squared": {}, "mean_squared_error": {} } } } } -------------------------------------------------- // TEST[skip:TBD] <1> In this example, a test/train split (`training_percent`) was defined for the {reganalysis}. This query limits evaluation to be performed on the test split only. It means that a testing error will be calculated. <2> The field that contains the ground truth value for the actual student performance. This is required in order to evaluate results. <3> The field that contains the predicted value for student performance calculated by the {reganalysis}. ===== {classification-cap} [source,console] -------------------------------------------------- POST _ml/data_frame/_evaluate { "index": "animal_classification", "evaluation": { "classification": { <1> "actual_field": "animal_class", <2> "predicted_field": "ml.animal_class_prediction.keyword", <3> "metrics": { "multiclass_confusion_matrix" : {} <4> } } } } -------------------------------------------------- // TEST[skip:TBD] <1> The evaluation type. <2> The field that contains the ground truth value for the actual animal classification. This is required in order to evaluate results. <3> The field that contains the predicted value for animal classification by the {classanalysis}. Since the field storing predicted class is dynamically mapped as text and keyword, you need to add the `.keyword` suffix to the name. <4> Specifies the metric for the evaluation. The API returns the following result: [source,console-result] -------------------------------------------------- { "classification" : { "multiclass_confusion_matrix" : { "confusion_matrix" : [ { "actual_class" : "cat", <1> "actual_class_doc_count" : 12, <2> "predicted_classes" : [ <3> { "predicted_class" : "cat", "count" : 12 <4> }, { "predicted_class" : "dog", "count" : 0 <5> } ], "other_predicted_class_doc_count" : 0 <6> }, { "actual_class" : "dog", "actual_class_doc_count" : 11, "predicted_classes" : [ { "predicted_class" : "dog", "count" : 11 }, { "predicted_class" : "cat", "count" : 4 } ], "other_predicted_class_doc_count" : 4 } ], "other_actual_class_count" : 0 } } } -------------------------------------------------- <1> The name of the actual class that the analysis tried to predict. <2> The number of documents in the index that belong to the `actual_class`. <3> This object contains the list of the predicted classes and the number of predictions associated with the class. <4> The number of cats in the dataset that are correctly identified as cats. <5> The number of cats in the dataset that are incorrectly classified as dogs. <6> The number of documents that are classified as a class that is not listed as a `predicted_class`.