[role="xpack"]
[testenv="platinum"]
[[put-dfanalytics]]
=== Create {dfanalytics-jobs} API
[subs="attributes"]
++++
<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
++++

Instantiates a {dfanalytics-job}.

experimental[]

[[ml-put-dfanalytics-request]]
==== {api-request-title}

`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`


[[ml-put-dfanalytics-prereq]]
==== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following built-in roles and privileges:

* `machine_learning_admin`
* `kibana_admin` (UI only)


* source indices: `read`, `view_index_metadata`
* destination index: `read`, `create_index`, `manage` and `index`
* cluster: `monitor` (UI only)
  
For more information, see <<security-privileges>> and <<built-in-roles>>.

NOTE: The {dfanalytics-job} remembers which roles the user who created it had at
the time of creation. When you start the job, it performs the analysis using
those same roles. If you provide
<<http-clients-secondary-authorization,secondary authorization headers>>, 
those credentials are used instead.

[[ml-put-dfanalytics-desc]]
==== {api-description-title}

This API creates a {dfanalytics-job} that performs an analysis on the source 
indices and stores the outcome in a destination index.

If the destination index does not exist, it is created automatically when you
start the job. See <<start-dfanalytics>>.

[[ml-hyperparam-optimization]]
If you supply only a subset of the {regression} or {classification} parameters,
_hyperparameter optimization_ occurs. It determines a value for each of the
undefined parameters.
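
As a sketch of such a partial specification, the following request sets `eta`
and `max_trees` explicitly and leaves the other {regression} hyperparameters
undefined, so their values would be determined by hyperparameter optimization.
The job ID, the index names, the dependent variable, and the parameter values
are illustrative assumptions, not recommendations.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/example_partial_hyperparams
{
  "source": {
    "index": "example-source-index"
  },
  "dest": {
    "index": "example-dest-index"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "eta": 0.05,
      "max_trees": 500
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]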

////
The starting point is calculated for data dependent parameters by examining the loss
on the training data. Subject to the size constraint, this operation provides an
upper bound on the improvement in validation loss.

The optimization starts with a random search, followed by Bayesian optimization
that targets maximum expected improvement. If you override any parameters by
explicitly setting them, the optimization calculates the values of the
remaining parameters accordingly and uses the values you provided for the
overridden parameters. The number of rounds is reduced accordingly. The
validation error is estimated in each round by using 4-fold cross validation.
////

[[ml-put-dfanalytics-path-params]]
==== {api-path-parms-title}

`<data_frame_analytics_id>`::
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]

[role="child_attributes"]
[[ml-put-dfanalytics-request-body]]
==== {api-request-body-title}

`allow_lazy_start`::
(Optional, boolean) 
Specifies whether this job can start when there is insufficient {ml} node 
capacity for it to be immediately assigned to a node. The default is `false`; if 
a {ml} node with capacity to run the job cannot immediately be found, the
<<start-dfanalytics>> API returns an error. However, this is also subject to the
cluster-wide `xpack.ml.max_lazy_ml_nodes` setting. See <<advanced-ml-settings>>.
If this option is set to `true`, the API does not return an error and the job
waits in the `starting` state until sufficient {ml} node capacity is available.
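+
A minimal sketch of a request that opts in to this behavior follows. The job
ID, the index names, and the dependent variable are hypothetical placeholders.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/example_lazy_start
{
  "source": {
    "index": "example-source-index"
  },
  "dest": {
    "index": "example-dest-index"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  },
  "allow_lazy_start": true
}
--------------------------------------------------
// TEST[skip:TBD]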

//Begin analysis
`analysis`::
(Required, object)
The analysis configuration, which contains the information necessary to perform
one of the following types of analysis: {classification}, {oldetection}, or
{regression}.
+
.Properties of `analysis`
[%collapsible%open]
====
//Begin classification
`classification`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-classification.html[{classification}].
+
TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set 
automatically by hyperparameter optimization to give the minimum validation
error. It is highly recommended to use the default values unless you fully
understand the function of these parameters.
+
.Properties of `classification`
[%collapsible%open]
=====
`class_assignment_objective`::::
(Optional, string)
Defines the objective to optimize when assigning class labels:
`maximize_accuracy` or `maximize_minimum_recall`. When maximizing accuracy,
class labels are chosen to maximize the number of correct predictions. When
maximizing minimum recall, labels are chosen to maximize the minimum recall
for any class. Defaults to `maximize_minimum_recall`.

`dependent_variable`::::
(Required, string)
+
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
categorical (`ip` or `keyword`), or boolean. There must be no more than 30
different values in this field.

`eta`::::
(Optional, double) 
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::::
(Optional, double) 
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`gamma`::::
(Optional, double) 
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::::
(Optional, double) 
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]

`max_trees`::::
(Optional, integer) 
include::{docdir}/ml/ml-shared.asciidoc[tag=max-trees]

`num_top_classes`::::
(Optional, integer)
Defines the number of categories for which the predicted probabilities are
reported. It must be non-negative. If it is greater than the total number of
categories, the API reports all category probabilities. Defaults to 2.

`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return. 
By default, it is zero and no {feat-imp} calculation occurs.

`prediction_field_name`::::
(Optional, string) 
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`randomize_seed`::::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]

`training_percent`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
//End classification
=====
//Begin outlier_detection
`outlier_detection`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-outlier-detection.html[{oldetection}]:
+
.Properties of `outlier_detection`
[%collapsible%open]
=====
`compute_feature_influence`::::
(Optional, boolean)
If `true`, the feature influence calculation is enabled. Defaults to `true`.
  
`feature_influence_threshold`:::: 
(Optional, double)
The minimum {olscore} that a document needs to have in order to calculate its 
{fiscore}. Value range: 0-1 (`0.1` by default).

`method`::::
(Optional, string)
Sets the method that {oldetection} uses. If the method is not set, {oldetection}
uses an ensemble of different methods, then normalizes and combines their
individual {olscores} to obtain the overall {olscore}. We recommend using the
ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`, and
`distance_knn`.
  
`n_neighbors`::::
(Optional, integer)
Defines the value for how many nearest neighbors each method of 
{oldetection} will use to calculate its {olscore}. When the value is not set, 
different values will be used for different ensemble members. This helps 
improve diversity in the ensemble. Therefore, only override this if you are 
confident that the value you choose is appropriate for the data set.
  
`outlier_fraction`::::
(Optional, double)
Sets the proportion of the data set that is assumed to be outlying prior to 
{oldetection}. For example, 0.05 means it is assumed that 5% of values are real 
outliers and 95% are inliers.
  
`standardization_enabled`::::
(Optional, boolean)
If `true`, then the following operation is performed on the columns before 
computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to `true`. For 
more information, see 
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
//End outlier_detection
=====
//Begin regression
`regression`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-regression.html[{regression}].
+
TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set 
automatically by hyperparameter optimization to give minimum validation error.
It is highly recommended to use the default values unless you fully understand
the function of these parameters.
+
.Properties of `regression`
[%collapsible%open]
=====
`dependent_variable`::::
(Required, string)
+
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric.

`eta`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`gamma`::::
(Optional, double) 
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::::
(Optional, double) 
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]

`max_trees`::::
(Optional, integer) 
include::{docdir}/ml/ml-shared.asciidoc[tag=max-trees]

`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return. 
By default, it is zero and no {feat-imp} calculation occurs.

`prediction_field_name`::::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`randomize_seed`::::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]

`training_percent`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
=====
//End regression
====
//End analysis

//Begin analyzed_fields
`analyzed_fields`::
(Optional, object)
Specify `includes` and/or `excludes` patterns to select which fields will be 
included in the analysis. The patterns specified in `excludes` are applied last, 
therefore `excludes` takes precedence. In other words, if the same field is 
specified in both `includes` and `excludes`, then the field will not be included 
in the analysis.
+
--
[[dfa-supported-fields]]
The supported fields for each type of analysis are as follows:

* {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
don't support missing values, therefore fields that have data types other than
numeric or boolean are ignored. Documents where included fields contain missing
values, null values, or an array are also ignored. Therefore, the `dest` index
may contain documents that don't have an {olscore}.
* {regression-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also
ignored. Documents in the `dest` index that don't contain a results field are
not included in the {reganalysis}.
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also ignored.
Documents in the `dest` index that don't contain a results field are not
included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
ordinal variable values to a single number. For example, in the case of age
ranges, you can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so
on.

If `analyzed_fields` is not set, only the relevant fields are included; for
example, all the numeric fields for {oldetection}. For more information about
field selection, see <<explain-dfanalytics>>.
--
+
.Properties of `analyzed_fields`
[%collapsible%open]
====
`excludes`:::
(Optional, array)
An array of strings that defines the fields that will be excluded from the
analysis. You do not need to add fields with unsupported data types to
`excludes`; these fields are excluded from the analysis automatically.

`includes`:::
(Optional, array)
An array of strings that defines the fields that will be included in the 
analysis.
//End analyzed_fields
====

`description`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=description-dfa]

`dest`::
(Required, object)
include::{docdir}/ml/ml-shared.asciidoc[tag=dest]
  
`model_memory_limit`::
(Optional, string)
The approximate maximum amount of memory resources that are permitted for 
analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If 
your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit` 
setting, an error occurs when you try to create {dfanalytics-jobs} that have 
`model_memory_limit` values greater than that setting. For more information, see 
<<ml-settings>>.
  
`source`::
(object)
The configuration of how to source the analysis data. It requires an `index`.
Optionally, `query` and `_source` may be specified.
+
.Properties of `source`
[%collapsible%open]
====
`index`:::
(Required, string or array) Index or indices on which to perform the analysis.
It can be a single index or index pattern as well as an array of indices or
patterns.
+
WARNING: If your source indices contain documents with the same IDs, only the 
document that is indexed last appears in the destination index.

`query`:::
(Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
This value corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is passed
verbatim to {es}. By default, this property has the following value:
`{"match_all": {}}`.

`_source`:::
(Optional, object) Specify `includes` and/or `excludes` patterns to select which
fields will be present in the destination. Fields that are excluded cannot be
included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
=====
`includes`::::
(array) An array of strings that defines the fields that will be included in the
destination.
        
`excludes`::::
(array) An array of strings that defines the fields that will be excluded from
the destination.
=====
====


[[ml-put-dfanalytics-example]]
==== {api-examples-title}


[[ml-put-dfanalytics-example-preprocess]]
===== Preprocessing actions example

The following example shows how to limit the scope of the analysis to certain 
fields, specify excluded fields in the destination index, and use a query to 
filter your data before analysis.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/model-flight-delays-pre
{
  "source": {
    "index": [
      "kibana_sample_data_flights" <1>
    ],
    "query": { <2>
      "range": {
        "DistanceKilometers": { 
          "gt": 0
        }
      }
    },
    "_source": { <3>
      "includes": [],
      "excludes": [
        "FlightDelay",
        "FlightDelayType"
      ]
    }
  },
  "dest": { <4>
    "index": "df-flight-delays",
    "results_field": "ml-results"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin",
      "training_percent": 90
    }
  },
  "analyzed_fields": { <5>
    "includes": [],
    "excludes": [   
      "FlightNum"
    ]
  },
  "model_memory_limit": "100mb"
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> The source index to analyze.
<2> This query filters out entire documents that will not be present in the 
destination index.
<3> The `_source` object defines fields in the dataset that will be included or 
excluded in the destination index. In this case, `includes` does not specify any 
fields, so the default behavior takes place: all the fields of the source index 
will be included except the ones that are explicitly specified in `excludes`.
<4> Defines the destination index that contains the results of the analysis and 
the fields of the source index specified in the `_source` object. Also defines 
the name of the `results_field`.
<5> Specifies fields to be included in or excluded from the analysis. This does 
not affect whether the fields will be present in the destination index; it only
affects whether they are used in the analysis.

In this example, all the fields of the source index are included in the
destination index except `FlightDelay` and `FlightDelayType`, because these are
defined as excluded fields by the `excludes` parameter of the `_source` object.
The `FlightNum` field is included in the destination index; however, it is not
included in the analysis because it is explicitly specified as an excluded field
by the `excludes` parameter of the `analyzed_fields` object.


[[ml-put-dfanalytics-example-od]]
===== {oldetection-cap} example

The following example creates the `loganalytics` {dfanalytics-job}; the analysis
type is `outlier_detection`:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics
{
  "description": "Outlier detection on log data",
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[setup:setup_logdata]


The API returns the following result:

[source,console-result]
----
{
    "id": "loganalytics",
    "description": "Outlier detection on log data",
    "source": {
        "index": ["logdata"],
        "query": {
            "match_all": {}
        }
    },
    "dest": {
        "index": "logdata_out",
        "results_field": "ml"
    },
    "analysis": {
        "outlier_detection": {
            "compute_feature_influence": true,
            "outlier_fraction": 0.05,
            "standardization_enabled": true
        }
    },
    "model_memory_limit": "1gb",
    "create_time" : 1562265491319,
    "version" : "7.6.0",
    "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1562265491319/$body.$_path/]
// TESTRESPONSE[s/"version" : "7.6.0"/"version" : $body.version/]
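
The `method`, `n_neighbors`, and `feature_influence_threshold` options
described in the request body section can also be set explicitly, although the
ensemble default is the recommended choice. The following sketch reuses the
`logdata` index from the previous example; the job ID, the destination index
name, and the parameter values are illustrative assumptions.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics_lof
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out_lof"
  },
  "analysis": {
    "outlier_detection": {
      "method": "lof",
      "n_neighbors": 20,
      "feature_influence_threshold": 0.1
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]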


[[ml-put-dfanalytics-example-r]]
===== {regression-cap} examples

The following example creates the `house_price_regression_analysis`
{dfanalytics-job}; the analysis type is `regression`:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": 
    {
      "regression": {
        "dependent_variable": "price"
      }
    }
}
--------------------------------------------------
// TEST[skip:TBD]


The API returns the following result:

[source,console-result]
----
{
  "id" : "house_price_regression_analysis",
  "source" : {
    "index" : [
      "houses_sold_last_10_yrs"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "house_price_predictions",
    "results_field" : "ml"
  },
  "analysis" : {
    "regression" : {
      "dependent_variable" : "price",
      "training_percent" : 100
    }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1567168659127,
  "version" : "8.0.0",
  "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1567168659127/$body.$_path/]
// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]


The following example creates a job and specifies a training percent:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
{
 "source": {
   "index": "student_performance_mathematics"
 },
 "dest": {
   "index":"student_performance_mathematics_reg"
 },
 "analysis":
   {
     "regression": {
       "dependent_variable": "G3",
       "training_percent": 70,  <1>
       "randomize_seed": 19673948271  <2>
     }
   }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The `training_percent` defines the percentage of the data set that will be 
used for training the model.
<2> The `randomize_seed` is the seed used to randomly pick which data is used 
for training.


[[ml-put-dfanalytics-example-c]]
===== {classification-cap} example

The following example creates the `loan_classification` {dfanalytics-job}; the
analysis type is `classification`:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification
{
  "source" : {
    "index": "loan-applicants"
  },
  "dest" : {
    "index": "loan-applicants-classified"
  },
  "analysis" : {
    "classification": {
      "dependent_variable": "label",
      "training_percent": 75,
      "num_top_classes": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
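
The `class_assignment_objective` and `num_top_feature_importance_values`
options described in the request body section can be combined with the settings
above. The following sketch reuses the `loan-applicants` index and the `label`
field from the previous request; the job ID, the destination index name, and
the parameter values are illustrative assumptions.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification_accuracy
{
  "source": {
    "index": "loan-applicants"
  },
  "dest": {
    "index": "loan-applicants-accuracy"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "label",
      "class_assignment_objective": "maximize_accuracy",
      "num_top_classes": 3,
      "num_top_feature_importance_values": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]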
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[role="xpack"]
 								[testenv="platinum"]
 								[[put-dfanalytics]]
 								=== Create {dfanalytics-jobs} API
 								[subs="attributes"]
 								++++
 								<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
 								++++
 								Instantiates a {dfanalytics-job}.
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								experimental[]
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-request]]
 								==== {api-request-title}
 								`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-prereq]]
 								==== {api-prereq-title}
-												[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
											
										
										
											2020-01-09 10:44:07 +01:00
+								If the {es} {security-features} are enabled, you must have the following built-in roles and privileges:
 								* `machine_learning_admin`
-												[DOCS] Changes kibana_user to kibana_admin in DFA API prerequisites. (#54806)


											
										
										
											2020-04-06 15:45:08 +02:00
+								* `kibana_admin` (UI only)
-												[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
											
										
										
											2020-01-09 10:44:07 +01:00
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								* source indices: `read`, `view_index_metadata`
-												[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
											
										
										
											2020-01-09 10:44:07 +01:00
+								* destination index: `read`, `create_index`, `manage` and `index`
 								* cluster: `monitor` (UI only)
 								For more information, see <<security-privileges>> and <<built-in-roles>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] Adds documentation for secondary authorization headers (#55365) (#55986)


											
										
										
											2020-04-29 16:29:38 -07:00
+								NOTE: The {dfanalytics-job} remembers which roles the user who created it had at
 								the time of creation. When you start the job, it performs the analysis using
 								those same roles. If you provide
 								<<http-clients-secondary-authorization,secondary authorization headers>>,
 								those credentials are used instead.
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-desc]]
 								==== {api-description-title}
 								This API creates a {dfanalytics-job} that performs an analysis on the source
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								indices and stores the outcome in a destination index.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								If the destination index does not exist, it is created automatically when you
 								start the job. See <<start-dfanalytics>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								[[ml-hyperparam-optimization]]
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								If you supply only a subset of the {regression} or {classification} parameters,
 								_hyperparameter optimization_ occurs. It determines a value for each of the
 								undefined parameters.
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								////
 								The starting point is calculated for data dependent parameters by examining the loss
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								on the training data. Subject to the size constraint, this operation provides an
 								upper bound on the improvement in validation loss.
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								The optimization starts with random search, then
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								Bayesian optimization is performed that is targeting maximum expected
 								improvement. If you override any parameters by explicitely setting it, the
 								optimization calculates the value of the remaining parameters accordingly and
 								uses the value you provided for the overridden parameter. The number of rounds
 								are reduced respectively. The validation error is estimated in each round by
 								using 4-fold cross validation.
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								////
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								[[ml-put-dfanalytics-path-params]]
 								==== {api-path-parms-title}
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								`<data_frame_analytics_id>`::
 								(Required, string)
 								include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
-												[DOCS] Collapses nested objects in data frame analytics APIs (#54472) (#54526)


											
										
										
											2020-03-31 12:51:04 -07:00
+								[role="child_attributes"]
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								[[ml-put-dfanalytics-request-body]]
 								==== {api-request-body-title}
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 16:21:35 +01:00
+								`allow_lazy_start`::
 								(Optional, boolean)
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 10:43:52 -07:00
+								Specifies whether this job can start when there is insufficient {ml} node
 								capacity for it to be immediately assigned to a node. The default is `false`; if
 								a {ml} node with capacity to run the job cannot immediately be found, the
 								<<start-dfanalytics>> API returns an error. However, this is also subject to the
 								cluster-wide `xpack.ml.max_lazy_ml_nodes` setting. See <<advanced-ml-settings>>.
 								If this option is set to `true`, the API does not return an error and the job
 								waits in the `starting` state until sufficient {ml} node capacity is available.
//Begin analysis
`analysis`::
(Required, object)
The analysis configuration, which contains the information necessary to perform
one of the following types of analysis: {classification}, {oldetection}, or
{regression}.
+
.Properties of `analysis`
[%collapsible%open]
====
//Begin classification
`classification`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-classification.html[{classification}].
+
TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set
automatically by hyperparameter optimization to give the minimum validation
error. It is highly recommended to use the default values unless you fully
understand the function of these parameters.
+
.Properties of `classification`
[%collapsible%open]
=====
`class_assignment_objective`::::
(Optional, string)
Defines the objective to optimize when assigning class labels:
`maximize_accuracy` or `maximize_minimum_recall`. When maximizing accuracy,
class labels are chosen to maximize the number of correct predictions. When
maximizing minimum recall, labels are chosen to maximize the minimum recall
for any class. Defaults to `maximize_minimum_recall`.
`dependent_variable`::::
(Required, string)
+
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
categorical (`ip` or `keyword`), or boolean. There must be no more than 30
different values in this field.
`eta`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
`feature_bag_fraction`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
`gamma`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
`lambda`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
`max_trees`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=max-trees]
`num_top_classes`::::
(Optional, integer)
Defines the number of categories for which the predicted probabilities are
reported. It must be non-negative. If it is greater than the total number of
categories, the API reports all category probabilities. Defaults to 2.
`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return.
By default, it is zero and no {feat-imp} calculation occurs.
`prediction_field_name`::::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
`randomize_seed`::::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
`training_percent`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
//End classification
=====
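+
The following sketch shows how these properties might be combined in a
{classanalysis} configuration. The job ID, the index names, and the `is_churned`
field are hypothetical, and the values are illustrative rather than recommended.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/churn-classification
{
  "source": {
    "index": "customers"
  },
  "dest": {
    "index": "customers-classified"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "is_churned",
      "class_assignment_objective": "maximize_accuracy",
      "num_top_classes": 2,
      "num_top_feature_importance_values": 3,
      "training_percent": 80
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative job ID, index, and field names]
+
Because `eta`, `gamma`, `lambda`, and the other advanced parameters are omitted
in this sketch, their values are left to hyperparameter optimization.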
//Begin outlier_detection
`outlier_detection`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-outlier-detection.html[{oldetection}]:
+
.Properties of `outlier_detection`
[%collapsible%open]
=====
`compute_feature_influence`::::
(Optional, boolean)
If `true`, the feature influence calculation is enabled. Defaults to `true`.
`feature_influence_threshold`::::
(Optional, double)
The minimum {olscore} that a document needs to have in order to calculate its
{fiscore}. Value range: 0-1 (`0.1` by default).
`method`::::
(Optional, string)
Sets the method that {oldetection} uses. If the method is not set, {oldetection}
uses an ensemble of different methods, then normalizes and combines their
individual {olscores} to obtain the overall {olscore}. We recommend using the
ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`, and
`distance_knn`.
`n_neighbors`::::
(Optional, integer)
Defines how many nearest neighbors each method of {oldetection} uses to
calculate its {olscore}. When the value is not set, different values are used
for different ensemble members, which helps improve diversity in the ensemble.
Therefore, only override this value if you are confident that it is appropriate
for the data set.
`outlier_fraction`::::
(Optional, double)
Sets the proportion of the data set that is assumed to be outlying prior to
{oldetection}. For example, 0.05 means it is assumed that 5% of values are real
outliers and 95% are inliers.
`standardization_enabled`::::
(Optional, boolean)
If `true`, the following operation is performed on the columns before
computing outlier scores: `(x_i - mean(x_i)) / sd(x_i)`. Defaults to `true`. For
more information, see
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
//End outlier_detection
=====
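+
As an illustration of the syntax only, the following sketch pins {oldetection}
to a single method with a fixed neighbor count instead of the recommended
ensemble. The job ID and index names are hypothetical, and the values are not
tuning advice.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/weblog-outliers
{
  "source": {
    "index": "weblogs"
  },
  "dest": {
    "index": "weblogs-outliers"
  },
  "analysis": {
    "outlier_detection": {
      "method": "lof",
      "n_neighbors": 20,
      "compute_feature_influence": true,
      "feature_influence_threshold": 0.1,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative job ID and index names]
+
To use the ensemble instead, omit `method` and `n_neighbors`, for example by
passing an empty `outlier_detection` object.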
//Begin regression
`regression`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-regression.html[{regression}].
+
TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set
automatically by hyperparameter optimization to give the minimum validation
error. It is highly recommended to use the default values unless you fully
understand the function of these parameters.
+
.Properties of `regression`
[%collapsible%open]
=====
`dependent_variable`::::
(Required, string)
+
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric.
`eta`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
`feature_bag_fraction`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
`gamma`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
`lambda`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
`max_trees`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=max-trees]
`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return.
By default, it is zero and no {feat-imp} calculation occurs.
`prediction_field_name`::::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
`randomize_seed`::::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
`training_percent`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
=====
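+
The following sketch shows one way a {reganalysis} configuration might look. The
job ID, the index names, and the `price` and `price_prediction` field names are
hypothetical; the values are illustrative only.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house-price-regression
{
  "source": {
    "index": "houses"
  },
  "dest": {
    "index": "houses-predicted"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "num_top_feature_importance_values": 2,
      "prediction_field_name": "price_prediction",
      "training_percent": 75
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative job ID, index, and field names]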
//End regression
====
//End analysis
//Begin analyzed_fields
`analyzed_fields`::
(Optional, object)
Specify `includes` and/or `excludes` patterns to select which fields will be
included in the analysis. The patterns specified in `excludes` are applied last,
so `excludes` takes precedence. In other words, if the same field is specified
in both `includes` and `excludes`, the field is not included in the analysis.
+
--
[[dfa-supported-fields]]
The supported fields for each type of analysis are as follows:

* {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
don't support missing values; therefore, fields that have data types other than
numeric or boolean are ignored. Documents where included fields contain missing
values, null values, or an array are also ignored. Therefore, the `dest` index
may contain documents that don't have an {olscore}.
* {regression-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also
ignored. Documents in the `dest` index that don't contain a results field are
not included in the {reganalysis}.
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also ignored.
Documents in the `dest` index that don't contain a results field are not
included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
ordinal variable values to a single number. For example, in case of age ranges,
you can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.

If `analyzed_fields` is not set, only the relevant fields are included. For
example, all the numeric fields are included for {oldetection}. For more
information about field selection, see <<explain-dfanalytics>>.
--
+
.Properties of `analyzed_fields`
[%collapsible%open]
====
`excludes`:::
(Optional, array)
An array of strings that defines the fields that will be excluded from the
analysis. You do not need to add fields with unsupported data types to
`excludes`; these fields are excluded from the analysis automatically.
`includes`:::
(Optional, array)
An array of strings that defines the fields that will be included in the
analysis.
//End analyzed_fields
====
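+
To illustrate the precedence rule, the following sketch lists the same field in
both arrays. The job ID, index names, and field names are hypothetical.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/weblog-outliers-selected
{
  "source": {
    "index": "weblogs"
  },
  "dest": {
    "index": "weblogs-outliers-selected"
  },
  "analysis": {
    "outlier_detection": {}
  },
  "analyzed_fields": {
    "includes": [ "bytes", "response_time", "status_code" ],
    "excludes": [ "status_code" ]
  }
}
--------------------------------------------------
// TEST[skip:illustrative job ID, index, and field names]
+
Because `excludes` is applied last, `status_code` is left out of the analysis.
Only `bytes` and `response_time` are analyzed, provided that they have
supported data types.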
`description`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=description-dfa]
`dest`::
(Required, object)
include::{docdir}/ml/ml-shared.asciidoc[tag=dest]
`model_memory_limit`::
(Optional, string)
The approximate maximum amount of memory resources that are permitted for
analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
setting, an error occurs when you try to create {dfanalytics-jobs} that have
`model_memory_limit` values greater than that setting. For more information, see
<<ml-settings>>.
`source`::
(Required, object)
The configuration of how to source the analysis data. It requires an `index`.
Optionally, `query` and `_source` may be specified.
+
.Properties of `source`
[%collapsible%open]
====
`index`:::
(Required, string or array) Index or indices on which to perform the analysis.
It can be a single index or index pattern as well as an array of indices or
patterns.
+
WARNING: If your source indices contain documents with the same IDs, only the
document that is indexed last appears in the destination index.

`query`:::
(Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
This value corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is passed
verbatim to {es}. By default, this property has the following value:
`{"match_all": {}}`.
`_source`:::
(Optional, object) Specify `includes` and/or `excludes` patterns to select which
fields will be present in the destination. Fields that are excluded cannot be
included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
=====
`includes`::::
(Optional, array) An array of strings that defines the fields that will be
included in the destination.
`excludes`::::
(Optional, array) An array of strings that defines the fields that will be
excluded from the destination.
=====
====
+								[[ml-put-dfanalytics-example]]
 								==== {api-examples-title}
+								[[ml-put-dfanalytics-example-preprocess]]
 								===== Preprocessing actions example
 								The following example shows how to limit the scope of the analysis to certain
 								fields, specify excluded fields in the destination index, and use a query to
 								filter your data before analysis.
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/model-flight-delays-pre
 								{
 								  "source": {
 								    "index": [
 								      "kibana_sample_data_flights" <1>
 								    ],
 								    "query": { <2>
 								      "range": {
 								        "DistanceKilometers": {
 								          "gt": 0
 								        }
 								      }
 								    },
 								    "_source": { <3>
 								      "includes": [],
 								      "excludes": [
 								        "FlightDelay",
 								        "FlightDelayType"
 								      ]
 								    }
 								  },
 								  "dest": { <4>
 								    "index": "df-flight-delays",
 								    "results_field": "ml-results"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "FlightDelayMin",
 								      "training_percent": 90
 								    }
 								  },
 								  "analyzed_fields": { <5>
 								    "includes": [],
 								    "excludes": [
 								      "FlightNum"
 								    ]
 								  },
 								  "model_memory_limit": "100mb"
 								}
 								--------------------------------------------------
 								// TEST[skip:setup kibana sample data]
 								<1> The source index to analyze.
 								<2> The query filters the source data. Documents that do not match it are
 								excluded from the analysis and do not appear in the destination index.
 								<3> The `_source` object defines which fields of the data set are included in or
 								excluded from the destination index. In this case, `includes` does not specify
 								any fields, so the default behavior applies: all the fields of the source index
 								are included except the ones that are explicitly specified in `excludes`.
 								<4> Defines the destination index that contains the results of the analysis and
 								the fields of the source index specified in the `_source` object. Also defines
 								the name of the `results_field`.
 								<5> Specifies the fields to be included in or excluded from the analysis. This
 								does not affect whether the fields are present in the destination index; it
 								only affects whether they are used in the analysis.

 								In this example, all the fields of the source index are included in the
 								destination index except `FlightDelay` and `FlightDelayType`, because they are
 								listed in the `excludes` parameter of the `_source` object. The `FlightNum`
 								field is included in the destination index, but it is not used in the analysis
 								because it is listed in the `excludes` parameter of the `analyzed_fields`
 								object.
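The two filters act at different stages, which can be modeled as two successive selections. The following Python sketch is illustrative only and is not how {es} implements filtering: the helper `select_fields` is hypothetical, it treats an empty `includes` list as "all fields" as described above, and it compares names exactly instead of emulating pattern matching. The field list contains only the fields named in this example.

```python
def select_fields(fields, includes, excludes):
    """Return the fields kept by an includes/excludes pair.

    Simplified model: an empty `includes` means "all fields", and names are
    compared exactly (no wildcard patterns).
    """
    kept = [f for f in fields if not includes or f in includes]
    return [f for f in kept if f not in excludes]


# Only the fields mentioned in the example above; the real index has more.
source_fields = ["FlightNum", "FlightDelay", "FlightDelayType",
                 "FlightDelayMin", "DistanceKilometers"]

# `_source` decides which fields reach the destination index.
dest_fields = select_fields(source_fields, [], ["FlightDelay", "FlightDelayType"])

# `analyzed_fields` then narrows what the analysis uses.
analysis_fields = select_fields(dest_fields, [], ["FlightNum"])

print(dest_fields)
print(analysis_fields)
```

In this model `FlightNum` survives the first selection but not the second, which matches the behavior described for the example.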
+								[[ml-put-dfanalytics-example-od]]
 								===== {oldetection-cap} example
 								The following example creates the `loganalytics` {dfanalytics-job} with the
 								`outlier_detection` analysis type:
+								[source,console]
+								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loganalytics
 								{
+								  "description": "Outlier detection on log data",
+								  "source": {
 								    "index": "logdata"
 								  },
 								  "dest": {
 								    "index": "logdata_out"
 								  },
 								  "analysis": {
 								    "outlier_detection": {
+								      "compute_feature_influence": true,
 								      "outlier_fraction": 0.05,
 								      "standardization_enabled": true
+								    }
 								  }
 								}
 								--------------------------------------------------
+								// TEST[setup:setup_logdata]
+								The API returns the following result:
+								[source,console-result]
+								----
 								{
+								    "id": "loganalytics",
 								    "description": "Outlier detection on log data",
 								    "source": {
 								        "index": ["logdata"],
 								        "query": {
 								            "match_all": {}
 								        }
 								    },
 								    "dest": {
 								        "index": "logdata_out",
 								        "results_field": "ml"
 								    },
 								    "analysis": {
 								        "outlier_detection": {
 								            "compute_feature_influence": true,
 								            "outlier_fraction": 0.05,
 								            "standardization_enabled": true
 								        }
 								    },
 								    "model_memory_limit": "1gb",
 								    "create_time" : 1562265491319,
 								    "version" : "7.6.0",
 								    "allow_lazy_start" : false
+								}
 								----
+								// TESTRESPONSE[s/1562265491319/$body.$_path/]
+								// TESTRESPONSE[s/"version" : "7.6.0"/"version" : $body.version/]
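Comparing the request with the response shows which values were filled in: the single index name comes back as an array, and `query`, `results_field`, `model_memory_limit`, and `allow_lazy_start` appear with default values. The Python sketch below reproduces only what is visible in this example. The helper `with_defaults` is hypothetical, the default values are read off the response above rather than taken from the server implementation, and the server-generated `create_time` and `version` fields are left out.

```python
import copy


def with_defaults(job_id, body):
    """Mimic the defaults visible in the example response (illustrative only)."""
    config = copy.deepcopy(body)  # leave the caller's request untouched
    config["id"] = job_id
    source = config["source"]
    # A single index name comes back as a one-element array.
    if isinstance(source["index"], str):
        source["index"] = [source["index"]]
    source.setdefault("query", {"match_all": {}})
    config["dest"].setdefault("results_field", "ml")
    config.setdefault("model_memory_limit", "1gb")
    config.setdefault("allow_lazy_start", False)
    return config


request = {
    "description": "Outlier detection on log data",
    "source": {"index": "logdata"},
    "dest": {"index": "logdata_out"},
    "analysis": {"outlier_detection": {"outlier_fraction": 0.05}},
}
stored = with_defaults("loganalytics", request)
print(stored["source"])
print(stored["dest"])
```

Values that the request sets explicitly, such as `outlier_fraction`, pass through unchanged in this model.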
 								[[ml-put-dfanalytics-example-r]]
+								===== {regression-cap} examples
 								The following example creates the `house_price_regression_analysis`
 								{dfanalytics-job} with the `regression` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/house_price_regression_analysis
 								{
 								  "source": {
 								    "index": "houses_sold_last_10_yrs"
 								  },
 								  "dest": {
 								    "index": "house_price_predictions"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "price"
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id" : "house_price_regression_analysis",
 								  "source" : {
 								    "index" : [
 								      "houses_sold_last_10_yrs"
 								    ],
 								    "query" : {
 								      "match_all" : { }
 								    }
 								  },
 								  "dest" : {
 								    "index" : "house_price_predictions",
 								    "results_field" : "ml"
 								  },
 								  "analysis" : {
 								    "regression" : {
 								      "dependent_variable" : "price",
 								      "training_percent" : 100
 								    }
 								  },
 								  "model_memory_limit" : "1gb",
 								  "create_time" : 1567168659127,
+								  "version" : "8.0.0",
 								  "allow_lazy_start" : false
+								}
 								----
 								// TESTRESPONSE[s/1567168659127/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]
 								The following example creates a job and specifies a training percent:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
 								{
 								 "source": {
 								   "index": "student_performance_mathematics"
 								 },
 								 "dest": {
 								   "index":"student_performance_mathematics_reg"
 								 },
 								 "analysis":
 								   {
 								     "regression": {
 								       "dependent_variable": "G3",
+								       "training_percent": 70,  <1>
 								       "randomize_seed": 19673948271  <2>
+								     }
 								   }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
+								<1> The `training_percent` defines the percentage of the data set that will be
 								used for training the model.
 								<2> The `randomize_seed` is the seed used to randomly pick which data is used
 								for training.
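The reason to fix `randomize_seed` is reproducibility: with the same seed and the same data, the same documents are picked for training. The Python sketch below illustrates that idea with the standard `random` module. It is not the sampling procedure that {es} uses, and the helper `pick_training_docs` and the 100 integer document IDs are assumptions made for the illustration.

```python
import random


def pick_training_docs(doc_ids, training_percent, randomize_seed):
    """Pick `training_percent` percent of the IDs, reproducibly for a given seed."""
    rng = random.Random(randomize_seed)  # private generator, seeded explicitly
    count = len(doc_ids) * training_percent // 100
    return sorted(rng.sample(doc_ids, count))


doc_ids = list(range(100))  # stand-in document IDs
first = pick_training_docs(doc_ids, 70, 19673948271)
second = pick_training_docs(doc_ids, 70, 19673948271)

print(len(first), first == second)  # prints "70 True"
```

The remaining 30 IDs in this sketch play the role of the documents that are not used for training.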
 								[[ml-put-dfanalytics-example-c]]
 								===== {classification-cap} example
 								The following example creates the `loan_classification` {dfanalytics-job} with
 								the `classification` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loan_classification
 								{
 								  "source" : {
 								    "index": "loan-applicants"
 								  },
 								  "dest" : {
 								    "index": "loan-applicants-classified"
 								  },
 								  "analysis" : {
 								    "classification": {
 								      "dependent_variable": "label",
 								      "training_percent": 75,
 								      "num_top_classes": 2
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]