--
:api: evaluate-data-frame
:request: EvaluateDataFrameRequest
:response: EvaluateDataFrameResponse
--
[id="{upid}-{api}"]
=== Evaluate Data Frame API
The Evaluate Data Frame API is used to evaluate an ML algorithm that ran on a {dataframe}.
The API accepts an +{request}+ object and returns an +{response}+.
[id="{upid}-{api}-request"]
==== Evaluate Data Frame Request
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> Constructing a new evaluation request
<2> Reference to an existing index
<3> Kind of evaluation to perform
<4> Name of the field in the index. Its value denotes the actual (i.e. ground truth) label for an example. Must be either `true` or `false`.
<5> Name of the field in the index. Its value denotes the probability (as computed by the ML algorithm) of the example being classified as positive.
<6> The remaining parameters are the metrics to be calculated based on the two fields described above.
<7> https://en.wikipedia.org/wiki/Precision_and_recall[Precision] calculated at thresholds: 0.4, 0.5 and 0.6
<8> https://en.wikipedia.org/wiki/Precision_and_recall[Recall] calculated at thresholds: 0.5 and 0.7
<9> https://en.wikipedia.org/wiki/Confusion_matrix[Confusion matrix] calculated at threshold 0.5
<10> https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve[AUC ROC] calculated and the curve points returned
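
For orientation, a hedged sketch of what the tagged snippet above amounts to is shown below.
The index and field names (`my-index`, `actual`, `probability`) are placeholders, and the
evaluation classes (`BinarySoftClassification`, `PrecisionMetric`, `RecallMetric`,
`ConfusionMatrixMetric`, `AucRocMetric`) reflect the client at the time this API was
introduced, so treat the exact signatures as assumptions rather than a definitive reference.

["source","java"]
--------------------------------------------------
import org.elasticsearch.client.ml.EvaluateDataFrameRequest;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.AucRocMetric;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.BinarySoftClassification;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.ConfusionMatrixMetric;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.PrecisionMetric;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.RecallMetric;

// Evaluate binary soft classification results stored in "my-index" (placeholder name).
// "actual" holds the ground truth label; "probability" holds the predicted probability.
EvaluateDataFrameRequest request = new EvaluateDataFrameRequest(
    "my-index",
    new BinarySoftClassification(
        "actual",                            // ground truth field, true or false
        "probability",                       // probability of the positive class
        PrecisionMetric.at(0.4, 0.5, 0.6),   // precision at three thresholds
        RecallMetric.at(0.5, 0.7),           // recall at two thresholds
        ConfusionMatrixMetric.at(0.5),       // confusion matrix at a single threshold
        AucRocMetric.withCurve()));          // AUC ROC, including the curve points
--------------------------------------------------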
include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the requested evaluation metrics.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
<1> Fetching all the calculated metrics results
<2> Fetching precision metric by name
<3> Fetching precision at a given (0.4) threshold
<4> Fetching confusion matrix metric by name
<5> Fetching confusion matrix at a given (0.5) threshold
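
A sketch of consuming the response follows. The accessor names (`getMetrics`,
`getMetricByName`, `getScoreByThreshold`) mirror the callouts above but are assumptions
about the client's exact API, as is the `EvaluationMetric.Result` type.

["source","java"]
--------------------------------------------------
import java.util.List;

import org.elasticsearch.client.ml.EvaluateDataFrameResponse;
import org.elasticsearch.client.ml.dataframe.evaluation.EvaluationMetric;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.ConfusionMatrixMetric;
import org.elasticsearch.client.ml.dataframe.evaluation.softclassification.PrecisionMetric;

// "response" is the EvaluateDataFrameResponse returned by executing the request above.

// All metric results that were requested
List<EvaluationMetric.Result> metrics = response.getMetrics();

// Fetch a single metric's result by its name...
PrecisionMetric.Result precisionResult = response.getMetricByName(PrecisionMetric.NAME);
// ...and read the score calculated at a particular threshold
double precisionAt04 = precisionResult.getScoreByThreshold("0.4");

// The confusion matrix is accessed the same way: by name, then by threshold
ConfusionMatrixMetric.Result confusionMatrixResult =
    response.getMetricByName(ConfusionMatrixMetric.NAME);
ConfusionMatrixMetric.ConfusionMatrix confusionMatrixAt05 =
    confusionMatrixResult.getScoreByThreshold("0.5");
--------------------------------------------------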