--
:api: put-data-frame-analytics
:request: PutDataFrameAnalyticsRequest
:response: PutDataFrameAnalyticsResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Put {dfanalytics-jobs} API

experimental::[]

Creates a new {dfanalytics-job}.
The API accepts a +{request}+ object as a request and returns a +{response}+.

[id="{upid}-{api}-request"]
==== Put {dfanalytics-jobs} request

A +{request}+ requires the following argument:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> The configuration of the {dfanalytics-job} to create

[id="{upid}-{api}-config"]
==== {dfanalytics-cap} configuration

The `DataFrameAnalyticsConfig` object contains all the details about the
{dfanalytics-job} configuration and takes the following arguments:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config]
--------------------------------------------------
<1> The {dfanalytics-job} ID
<2> The source index and query from which to gather data
<3> The destination index
<4> The analysis to be performed
<5> The fields to be included in / excluded from the analysis
<6> The memory limit for the model created as part of the analysis process
<7> Optionally, a human-readable description
<8> The maximum number of threads to be used by the analysis. Defaults to 1.

[id="{upid}-{api}-query-config"]
==== SourceConfig

The index and the query from which to collect data.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsSource
<2> The source index
<3> The query from which to gather the data. If query is not set, a `match_all` query is used by default.
<4> Source filtering to select which fields will exist in the destination index.

===== QueryConfig

The query with which to select data from the source.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-config]
--------------------------------------------------

==== DestinationConfig

The index to which data should be written by the {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-dest-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsDest
<2> The destination index

==== Analysis

The analysis to be performed.
Currently, the supported analyses include: +OutlierDetection+, +Classification+, +Regression+.
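To make the moving parts concrete before looking at each analysis type, here is a
minimal, hypothetical sketch that assembles a source, a destination, and a default
+OutlierDetection+ analysis into a +{request}+. The index names and the job ID are
placeholders, not values taken from this page's snippets.

["source","java"]
--------------------------------------------------
import org.elasticsearch.client.ml.PutDataFrameAnalyticsRequest;
import org.elasticsearch.client.ml.dataframe.DataFrameAnalyticsConfig;
import org.elasticsearch.client.ml.dataframe.DataFrameAnalyticsDest;
import org.elasticsearch.client.ml.dataframe.DataFrameAnalyticsSource;
import org.elasticsearch.client.ml.dataframe.OutlierDetection;
import org.elasticsearch.client.ml.dataframe.QueryConfig;
import org.elasticsearch.index.query.QueryBuilders;

// Hypothetical index names and job ID: replace with your own values.
DataFrameAnalyticsSource source = DataFrameAnalyticsSource.builder()
    .setIndex("my-source-index")
    .setQueryConfig(new QueryConfig(QueryBuilders.matchAllQuery()))
    .build();

DataFrameAnalyticsDest dest = DataFrameAnalyticsDest.builder()
    .setIndex("my-dest-index")
    .build();

DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setId("my-analytics-job")
    .setSource(source)
    .setDest(dest)
    .setAnalysis(OutlierDetection.createDefault()) // default outlier detection
    .build();

PutDataFrameAnalyticsRequest request = new PutDataFrameAnalyticsRequest(config);
--------------------------------------------------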
===== Outlier detection

+OutlierDetection+ analysis can be created in one of two ways:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-default]
--------------------------------------------------
<1> Constructing a new OutlierDetection object with the default strategy to determine outliers

or

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
--------------------------------------------------
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> The number of neighbors taken into account during analysis
<4> The minimum `outlier_score` required to compute feature influence
<5> Whether to compute feature influence
<6> The proportion of the data set that is assumed to be outlying prior to outlier detection
<7> Whether to apply standardization to feature values

===== Classification

+Classification+ analysis requires setting the +dependent_variable+ and supports a
number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-classification]
--------------------------------------------------
<1> Constructing a new Classification builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The optimization objective to target when assigning class labels. Defaults to `maximize_minimum_recall`.
<12> The number of top classes (or -1 which denotes all classes) to be reported in the results. Defaults to 2.
<13> Custom feature processors that will create new features for analysis from the included document fields. Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.

===== Regression

+Regression+ analysis requires setting the +dependent_variable+ and supports a number
of other optional parameters (see the sketch after the parameter list below):

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-regression]
--------------------------------------------------
<1> Constructing a new Regression builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The loss function used for regression. Defaults to `mse`.
<12> An optional parameter to the loss function.
<13> Custom feature processors that will create new features for analysis from the included document fields. Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
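As a minimal illustration of the builder pattern above, the following hypothetical
+Regression+ sketch sets only a handful of the optional parameters. The dependent
variable name, field names, and parameter values are placeholders; only the
dependent variable is required.

["source","java"]
--------------------------------------------------
import org.elasticsearch.client.ml.dataframe.Regression;

// Hypothetical dependent variable and values: only the dependent
// variable is required, every other parameter is optional.
Regression regression = Regression.builder("price")
    .setLambda(1.0)                              // lambda regularization
    .setNumTopFeatureImportanceValues(2)         // report importance for top 2 features
    .setTrainingPercent(80.0)                    // train on 80% of eligible rows
    .setPredictionFieldName("price_prediction")  // where results are written
    .build();
--------------------------------------------------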
==== Analyzed fields

A `FetchSourceContext` object containing the fields to be included in / excluded from the analysis.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analyzed-fields]
--------------------------------------------------

include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the newly created {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
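For completeness, here is a short sketch of executing the request synchronously and
reading back the created configuration. It assumes an already-initialized
`RestHighLevelClient` named `client` and the `request` object built earlier on this page.

["source","java"]
--------------------------------------------------
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.ml.PutDataFrameAnalyticsResponse;
import org.elasticsearch.client.ml.dataframe.DataFrameAnalyticsConfig;

// Execute the request synchronously; assumes "client" and "request" exist.
PutDataFrameAnalyticsResponse response =
    client.machineLearning().putDataFrameAnalytics(request, RequestOptions.DEFAULT);

// The response echoes back the created job, including server-side defaults.
DataFrameAnalyticsConfig createdConfig = response.getConfig();
--------------------------------------------------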