mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-08 14:05:27 +00:00
This merges the initial work that adds a framework for performing machine learning analytics on data frames. The feature is currently experimental and requires a platinum license. Note that the original commits can be found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics jobs. Configuration allows specifying different types of analysis to be performed on a data frame. At first there is support for outlier detection. The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started. The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer C++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics that provide evaluation of various analyses:

- POST _ml/data_frame/_evaluate
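
The job-level routes listed in the commit message can be captured in a small helper. This is a minimal sketch that uses only the JDK; `DataFrameAnalysisEndpoints`, `Action`, and `pathFor` are hypothetical names, not part of any client library, and the sketch only shows how a job id slots into each route.

```java
import java.util.Objects;

/** Hypothetical helper (not part of any client library): formats the job-level routes. */
final class DataFrameAnalysisEndpoints {

    /** One entry per job-level endpoint; the suffix is appended after the job id. */
    enum Action {
        PUT("PUT", ""),
        GET("GET", ""),
        STATS("GET", "/_stats"),
        START("POST", "/_start"),
        STOP("POST", "/_stop"),
        DELETE("DELETE", "");

        final String httpMethod;
        final String suffix;

        Action(String httpMethod, String suffix) {
            this.httpMethod = httpMethod;
            this.suffix = suffix;
        }
    }

    private static final String BASE = "_ml/data_frame/analysis/";

    private DataFrameAnalysisEndpoints() {
    }

    /** Returns "METHOD path" for the given action and job id. */
    static String pathFor(Action action, String id) {
        Objects.requireNonNull(action, "action");
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("job id must not be empty");
        }
        return action.httpMethod + " " + BASE + id + action.suffix;
    }

    public static void main(String[] args) {
        // Print every route for an example job id.
        for (Action action : Action.values()) {
            System.out.println(pathFor(action, "my-analysis"));
        }
    }
}
```

The evaluation route (`POST _ml/data_frame/_evaluate`) takes no job id, so it is left out of the helper.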
115 lines
4.0 KiB
Plaintext
--
:api: put-data-frame-analytics
:request: PutDataFrameAnalyticsRequest
:response: PutDataFrameAnalyticsResponse
--
[id="{upid}-{api}"]
=== Put Data Frame Analytics API

The Put Data Frame Analytics API is used to create a new {dataframe-analytics-config}.
The API accepts a +{request}+ object as a request and returns a +{response}+.

[id="{upid}-{api}-request"]
==== Put Data Frame Analytics Request

A +{request}+ requires the following argument:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> The configuration of the {dataframe-job} to create

[id="{upid}-{api}-config"]
==== Data Frame Analytics Configuration

The `DataFrameAnalyticsConfig` object holds all the details of the {dataframe-job}
configuration and takes the following arguments:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config]
--------------------------------------------------
<1> The {dataframe-analytics-config} id
<2> The source index and the query used to gather data
<3> The destination index
<4> The analysis to be performed
<5> The fields to be included in or excluded from the analysis
<6> The memory limit for the model created as part of the analysis process

[id="{upid}-{api}-query-config"]
==== SourceConfig

The index and the query from which to collect data.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsSource
<2> The source index
<3> The query from which to gather the data. If query is not set, a `match_all` query is used by default.
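
To illustrate the default described in callout <3>, here is a minimal JDK-only sketch. `SourceSketch` is a hypothetical stand-in, not the client's `DataFrameAnalyticsSource`, and queries are represented as raw JSON strings purely for illustration. It only shows the fallback to `match_all` when no query is set.

```java
import java.util.Objects;

/** Hypothetical stand-in for a source configuration; not the client's DataFrameAnalyticsSource. */
final class SourceSketch {

    /** JSON form of a match_all query, used when the caller sets no query. */
    static final String MATCH_ALL = "{\"match_all\":{}}";

    private final String index;
    private final String queryJson; // null means "no query was set"

    SourceSketch(String index, String queryJson) {
        this.index = Objects.requireNonNull(index, "index");
        this.queryJson = queryJson;
    }

    String index() {
        return index;
    }

    /** Returns the configured query, or match_all if none was provided. */
    String effectiveQuery() {
        return queryJson != null ? queryJson : MATCH_ALL;
    }

    public static void main(String[] args) {
        // No query set: falls back to match_all.
        System.out.println(new SourceSketch("my-source-index", null).effectiveQuery());
        // Explicit query: returned unchanged.
        System.out.println(
            new SourceSketch("my-source-index", "{\"term\":{\"status\":\"active\"}}").effectiveQuery());
    }
}
```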

===== QueryConfig

The query with which to select data from the source.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-config]
--------------------------------------------------

==== DestinationConfig

The index to which data should be written by the {dataframe-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-dest-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsDest
<2> The destination index

==== Analysis

The analysis to be performed.
Currently, only one analysis is supported: +OutlierDetection+.

An +OutlierDetection+ analysis can be created in one of two ways:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analysis-default]
--------------------------------------------------
<1> Constructing a new OutlierDetection object with the default strategy to determine outliers

or

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analysis-customized]
--------------------------------------------------
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> The number of neighbors taken into account during analysis
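
The difference between the two construction styles can be sketched with plain JDK types. `OutlierDetectionSketch` is a hypothetical stand-in, not the client's `OutlierDetection` class. The method names in the enum are illustrative, and treating an unset value as "use the default strategy" is an assumption of this sketch.

```java
import java.util.Optional;
import java.util.OptionalInt;

/** Hypothetical stand-in for an outlier detection analysis; not the client's OutlierDetection. */
final class OutlierDetectionSketch {

    /** Illustrative method names; assumed for this sketch, not taken from this page. */
    enum Method { LOF, LDOF, DISTANCE_KTH_NN, DISTANCE_KNN }

    private final Method method;      // null: leave the choice to the default strategy
    private final Integer nNeighbors; // null: leave the choice to the default strategy

    private OutlierDetectionSketch(Method method, Integer nNeighbors) {
        if (nNeighbors != null && nNeighbors <= 0) {
            throw new IllegalArgumentException("nNeighbors must be positive");
        }
        this.method = method;
        this.nNeighbors = nNeighbors;
    }

    /** Default style: nothing is set explicitly. */
    static OutlierDetectionSketch createDefault() {
        return new OutlierDetectionSketch(null, null);
    }

    /** Customized style: the caller picks the method and the neighbor count. */
    static OutlierDetectionSketch customized(Method method, int nNeighbors) {
        return new OutlierDetectionSketch(method, nNeighbors);
    }

    Optional<Method> method() {
        return Optional.ofNullable(method);
    }

    OptionalInt nNeighbors() {
        return nNeighbors == null ? OptionalInt.empty() : OptionalInt.of(nNeighbors);
    }

    public static void main(String[] args) {
        OutlierDetectionSketch defaults = createDefault();
        OutlierDetectionSketch custom = customized(Method.DISTANCE_KNN, 5);
        System.out.println("default sets method: " + defaults.method().isPresent());
        System.out.println("custom sets method: " + custom.method().isPresent());
    }
}
```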

==== Analyzed fields

A `FetchSourceContext` object containing the fields to be included in / excluded from the analysis.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analyzed-fields]
--------------------------------------------------
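
The include / exclude idea can be sketched with JDK collections. `AnalyzedFieldsSketch` is a hypothetical stand-in, not the client class. It matches exact field names only, and it assumes that an empty include list means "all fields" and that an exclusion wins over an inclusion.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for an include/exclude field filter; exact-name matching only. */
final class AnalyzedFieldsSketch {

    private final Set<String> includes;
    private final Set<String> excludes;

    AnalyzedFieldsSketch(List<String> includes, List<String> excludes) {
        this.includes = new LinkedHashSet<>(includes);
        this.excludes = new LinkedHashSet<>(excludes);
    }

    /** A field is kept if it is included (or no includes are given) and not excluded. */
    boolean isAnalyzed(String field) {
        boolean included = includes.isEmpty() || includes.contains(field);
        return included && !excludes.contains(field);
    }

    /** Filters the candidate fields, preserving their order. */
    List<String> select(List<String> candidateFields) {
        return candidateFields.stream().filter(this::isAnalyzed).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AnalyzedFieldsSketch fields = new AnalyzedFieldsSketch(
            Arrays.asList("included_field_1", "included_field_2"),
            Arrays.asList("excluded_field"));
        System.out.println(
            fields.select(Arrays.asList("included_field_1", "excluded_field", "other_field")));
    }
}
```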

include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the newly created {dataframe-analytics-config}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------