[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]

[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`


[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and
{ml-docs-setup-privileges}.


[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. It is designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`

--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.

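As a rough illustration of how the three request body fields fit together, the following Python sketch assembles a body for `POST _ml/data_frame/_evaluate`. The helper name `build_evaluate_body` and its parameters are hypothetical and are not part of any {es} client; the index and field names are taken from the {oldetection} example on this page.

```python
import json

# Evaluation types listed in the request body description.
EVALUATION_TYPES = {"outlier_detection", "regression", "classification"}


def build_evaluate_body(index, evaluation_type, evaluation_params, query=None):
    """Assemble a request body (hypothetical helper, not a client API)."""
    if evaluation_type not in EVALUATION_TYPES:
        raise ValueError(f"unknown evaluation type: {evaluation_type!r}")
    body = {"index": index, "evaluation": {evaluation_type: evaluation_params}}
    if query is not None:
        # `query` is optional; it narrows the evaluation to a subset of documents.
        body["query"] = query
    return body


body = build_evaluate_body(
    index="my_analytics_dest_index",
    evaluation_type="outlier_detection",
    evaluation_params={
        "actual_field": "is_outlier",
        "predicted_probability_field": "ml.outlier_score",
    },
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the request with any HTTP client.
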
[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluation evaluates the results of an {oldetection} analysis,
which outputs the probability that each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
`{"include_curve": false}`.

`confusion_matrix`:::
(Optional, object) Sets the thresholds of the {olscore} at which the metrics
(`tp` - true positive, `fp` - false positive, `tn` - true negative,
`fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.

`precision`:::
(Optional, object) Sets the thresholds of the {olscore} at which the metric
is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the thresholds of the {olscore} at which the metric
is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

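The threshold-based metrics can be sketched in a few lines of Python. This is a minimal illustration with made-up scores, not the way {es} computes the metrics internally. It assumes that a document counts as a predicted outlier when its score is greater than or equal to the threshold, and that precision and recall are reported as `0` when their denominator is zero.

```python
def confusion_at(actual, scores, threshold):
    """Count tp/fp/tn/fn, treating score >= threshold as 'outlier' (assumption)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for is_outlier, score in zip(actual, scores):
        predicted_outlier = score >= threshold
        if predicted_outlier and is_outlier:
            counts["tp"] += 1
        elif predicted_outlier:
            counts["fp"] += 1
        elif is_outlier:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision(counts):
    """tp / (tp + fp); 0.0 when nothing was predicted positive (assumption)."""
    denominator = counts["tp"] + counts["fp"]
    return counts["tp"] / denominator if denominator else 0.0


def recall(counts):
    """tp / (tp + fn); 0.0 when there are no actual positives (assumption)."""
    denominator = counts["tp"] + counts["fn"]
    return counts["tp"] / denominator if denominator else 0.0


# Hypothetical ground truth and outlier scores for six documents.
actual = [True, False, False, True, False, True]
scores = [0.9, 0.3, 0.1, 0.6, 0.8, 0.2]

for threshold in (0.25, 0.5, 0.75):
    counts = confusion_at(actual, scores, threshold)
    print(threshold, counts, precision(counts), recall(counts))
```

Raising the threshold generally trades recall for precision, which is why the metrics are reported at several thresholds.
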
[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. For
more information on `mse`, `msle`, and `huber`, consult
https://github.com/elastic/examples/tree/master/Machine%20Learning/Regression%20Loss%20Functions[the Jupyter notebook on regression loss functions].
Available metrics:

`mse`:::
(Optional, object) Average squared difference between the predicted values
and the actual (`ground truth`) value. For more information, read
{wikipedia}/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (`ground truth`) value.

`offset`::::
(Optional, double) Defines the transition point at which you switch from
minimizing quadratic error to minimizing quadratic log error. Defaults to
`1`.

`huber`:::
(Optional, object) Pseudo Huber loss function. For more information, read
{wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article].

`delta`::::
(Optional, double) Approximates 1/2 (prediction - actual)^2^ for values
much less than delta and approximates a straight line with slope delta for
values much larger than delta. Defaults to `1`. Delta needs to be greater
than `0`.

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that
is predictable from the independent variables. For more information, read
{wikipedia}/Coefficient_of_determination[this wiki article].

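The four regression metrics can be written out with their textbook definitions. The sketch below is illustrative only: the formulas for `msle` (logarithm of value plus `offset`) and `huber` (pseudo-Huber loss averaged over documents) are assumptions based on the descriptions above, and the sample values are made up.

```python
import math


def mse(actual, predicted):
    """Mean of squared differences."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def msle(actual, predicted, offset=1.0):
    """Mean squared difference of log(value + offset) (assumed formula)."""
    return sum(
        (math.log(p + offset) - math.log(a + offset)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual)


def pseudo_huber(actual, predicted, delta=1.0):
    """Mean pseudo-Huber loss: delta^2 * (sqrt(1 + (error / delta)^2) - 1)."""
    return sum(
        delta ** 2 * (math.sqrt(1 + ((p - a) / delta) ** 2) - 1)
        for a, p in zip(actual, predicted)
    ) / len(actual)


def r_squared(actual, predicted):
    """1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical ground truth and predictions.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

print("mse:", mse(actual, predicted))
print("msle:", msle(actual, predicted, offset=1.0))
print("huber:", pseudo_huber(actual, predicted, delta=1.0))
print("r_squared:", r_squared(actual, predicted))
```

For errors much smaller than `delta`, the pseudo-Huber term behaves like half the squared error; for errors much larger than `delta`, it grows roughly linearly, which makes it less sensitive to outliers than `mse`.
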
[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which
outputs a prediction that identifies to which of the classes each document
belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be categorical.

`predicted_field`::
(Optional, string) The field in the `index` which contains the predicted value,
in other words the results of the {classanalysis}.

`top_classes_field`::
(Optional, string) The field of the `index` which is an array of documents
of the form `{ "class_name": XXX, "class_probability": YYY }`.
This field must be defined as `nested` in the mappings.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve.
It is calculated for a specific class (provided as `class_name`) that is
treated as positive.

`class_name`::::
(Required, string) Name of the only class that is treated as positive
during AUC ROC calculation. Other classes are treated as negative
("one-vs-all" strategy). All the evaluated documents must have
`class_name` in the list of their top classes.

`include_curve`::::
(Optional, boolean) Whether or not the curve should be returned in
addition to the score. Default value is `false`.

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).

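To make the "one-vs-all" strategy concrete, the following Python sketch computes an exact AUC ROC score by comparing every positive document with every negative one. It is a textbook formulation over made-up documents, not a description of how {es} calculates the score. The input shape (pairs of actual class and a list of `class_name`/`class_probability` objects) mirrors `top_classes_field`; the function name is hypothetical.

```python
def auc_roc_one_vs_all(docs, class_name):
    """Exact AUC ROC with `class_name` as the positive class.

    `docs` is a list of (actual_class, top_classes) pairs, where top_classes
    is a list of {"class_name": ..., "class_probability": ...} objects.
    """
    positives, negatives = [], []
    for actual_class, top_classes in docs:
        probabilities = [
            c["class_probability"] for c in top_classes
            if c["class_name"] == class_name
        ]
        if not probabilities:
            # Every evaluated document must list `class_name` in its top classes.
            raise ValueError(f"{class_name!r} missing from top classes")
        target = positives if actual_class == class_name else negatives
        target.append(probabilities[0])

    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def top(dog_probability):
    """Build a two-class top-classes list (hypothetical sample data)."""
    return [
        {"class_name": "dog", "class_probability": dog_probability},
        {"class_name": "cat", "class_probability": 1 - dog_probability},
    ]


docs = [("dog", top(0.9)), ("dog", top(0.6)), ("cat", top(0.7)), ("cat", top(0.2))]
print(auc_roc_one_vs_all(docs, "dog"))
```

The pairwise comparison is quadratic in the number of documents, so it is only suitable for small samples.
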
////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
(object) If you chose to do outlier detection, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}


[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----

[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the test split
only.
<3> The ground truth value for the actual house price. This is required in order
to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the train split
only. It means that a training error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared with
the previous example is that `ml.is_training` is set to `false` this time, so
the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the test split
only. It means that a testing error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
=== {classification-cap}


[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix" : {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}.
<4> Specifies the metric for the evaluation.


The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------
<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed as
a `predicted_class`.

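The counts in the confusion matrix above are enough to derive overall accuracy and per-class precision and recall by hand. The Python sketch below does this with the textbook definitions; it assumes the `other_*` counts are zero, as they are in this response, and it is not a statement of how the API's own `accuracy`, `precision`, and `recall` metrics are computed. The function name `summarize` is hypothetical.

```python
# Confusion matrix copied from the example response above.
confusion_matrix = [
    {"actual_class": "cat", "actual_class_doc_count": 12,
     "predicted_classes": [{"predicted_class": "cat", "count": 12},
                           {"predicted_class": "dog", "count": 0}]},
    {"actual_class": "dog", "actual_class_doc_count": 11,
     "predicted_classes": [{"predicted_class": "dog", "count": 7},
                           {"predicted_class": "cat", "count": 4}]},
]


def summarize(matrix):
    """Return (overall accuracy, per-class precision, per-class recall)."""
    correct = {}          # correctly classified documents, keyed by class
    actual_total = {}     # documents whose true class is the key
    predicted_total = {}  # documents that were predicted as the key
    for row in matrix:
        actual = row["actual_class"]
        actual_total[actual] = row["actual_class_doc_count"]
        for cell in row["predicted_classes"]:
            predicted = cell["predicted_class"]
            predicted_total[predicted] = (
                predicted_total.get(predicted, 0) + cell["count"]
            )
            if predicted == actual:
                correct[actual] = cell["count"]
    accuracy = sum(correct.values()) / sum(actual_total.values())
    precision = {c: correct.get(c, 0) / n for c, n in predicted_total.items() if n}
    recall = {c: correct.get(c, 0) / n for c, n in actual_total.items() if n}
    return accuracy, precision, recall


accuracy, precision, recall = summarize(confusion_matrix)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

Here 19 of the 23 documents are classified correctly. Every cat is recognized, but 4 of the 16 documents predicted as `cat` are dogs, so precision for `cat` is lower than recall for `cat`.
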
The following example calculates the AUC ROC score with `dog` as the positive
class:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "metrics": {
        "auc_roc" : { <3>
          "class_name": "dog" <4>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> Specifies the metric for the evaluation.
<4> Specifies the class name that is treated as positive during the evaluation.
All the other classes are treated as negative.


The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "auc_roc" : {
      "score" : 0.8941788639536681
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]