
[role="xpack"]
[testenv="platinum"]
[[put-dfanalytics]]
= Create {dfanalytics-jobs} API
[subs="attributes"]
++++
<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
++++

Instantiates a {dfanalytics-job}.

experimental[]

[[ml-put-dfanalytics-request]]
== {api-request-title}

`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`


[[ml-put-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following built-in roles and privileges:

* `machine_learning_admin`
* source indices: `read`, `view_index_metadata`
* destination index: `read`, `create_index`, `manage` and `index`
  
For more information, see <<built-in-roles>>, <<security-privileges>>, and
{ml-docs-setup-privileges}.


NOTE: The {dfanalytics-job} remembers which roles the user who created it had at
the time of creation. When you start the job, it performs the analysis using
those same roles. If you provide
<<http-clients-secondary-authorization,secondary authorization headers>>, 
those credentials are used instead.

[[ml-put-dfanalytics-desc]]
== {api-description-title}

This API creates a {dfanalytics-job} that performs an analysis on the source 
indices and stores the outcome in a destination index.

If the destination index does not exist, it is created automatically when you
start the job. See <<start-dfanalytics>>.

If you supply only a subset of the {regression} or {classification} parameters,
{ml-docs}/hyperparameters.html[hyperparameter optimization] occurs. It 
determines a value for each of the undefined parameters.
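
As a minimal sketch of this behavior, the following request sets only `eta` and
`max_trees` for a {regression} job, so the remaining hyperparameters are left to
hyperparameter optimization. The job ID, index names, field name, and parameter
values are hypothetical and chosen only for illustration.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/example-partial-hyperparameters
{
  "source": {
    "index": "example-source-index"
  },
  "dest": {
    "index": "example-dest-index"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "example_numeric_field",
      "eta": 0.1,
      "max_trees": 100
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]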

[[ml-put-dfanalytics-path-params]]
== {api-path-parms-title}

`<data_frame_analytics_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]

[role="child_attributes"]
[[ml-put-dfanalytics-request-body]]
== {api-request-body-title}

`allow_lazy_start`::
(Optional, boolean) 
Specifies whether this job can start when there is insufficient {ml} node 
capacity for it to be immediately assigned to a node. The default is `false`; if 
a {ml} node with capacity to run the job cannot immediately be found, the
<<start-dfanalytics>> API returns an error. However, this is also subject to the
cluster-wide `xpack.ml.max_lazy_ml_nodes` setting. See <<advanced-ml-settings>>.
If this option is set to `true`, the API does not return an error and the job
waits in the `starting` state until sufficient {ml} node capacity is available.

//Begin analysis
`analysis`::
(Required, object)
The analysis configuration, which contains the information necessary to perform
one of the following types of analysis: {classification}, {oldetection}, or
{regression}.
+
.Properties of `analysis`
[%collapsible%open]
====
//Begin classification
`classification`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-classification.html[{classification}].
+
TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set 
automatically by hyperparameter optimization to give the minimum validation
error. It is highly recommended to use the default values unless you fully
understand the function of these parameters.
+
.Properties of `classification`
[%collapsible%open]
=====
`class_assignment_objective`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=class-assignment-objective]

`dependent_variable`::::
(Required, string)
+
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
categorical (`ip` or `keyword`), or boolean. There must be no more than 30
different values in this field.

`eta`::::
(Optional, double) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::::
(Optional, double) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`gamma`::::
(Optional, double) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::::
(Optional, double) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=lambda]

`max_trees`::::
(Optional, integer) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=max-trees]

`num_top_classes`::::
(Optional, integer)
Defines the number of categories for which the predicted probabilities are
reported. It must be non-negative. If it is greater than the total number of
categories, the API reports all category probabilities. Defaults to 2.

`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return. 
By default, it is zero and no {feat-imp} calculation occurs.

`prediction_field_name`::::
(Optional, string) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`randomize_seed`::::
(Optional, long)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=randomize-seed]

`training_percent`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=training-percent]
//End classification
=====
//Begin outlier_detection
`outlier_detection`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-outlier-detection.html[{oldetection}].
+
.Properties of `outlier_detection`
[%collapsible%open]
=====
`compute_feature_influence`::::
(Optional, boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
  
`feature_influence_threshold`:::: 
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]

`method`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=method]
  
`n_neighbors`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=n-neighbors]
  
`outlier_fraction`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
  
`standardization_enabled`::::
(Optional, boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
//End outlier_detection
=====
//Begin regression
`regression`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-regression.html[{regression}].
+
TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set 
automatically by hyperparameter optimization to give the minimum validation error.
It is highly recommended to use the default values unless you fully understand
the function of these parameters.
+
.Properties of `regression`
[%collapsible%open]
=====
`dependent_variable`::::
(Required, string)
+
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric.

`eta`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`gamma`::::
(Optional, double) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::::
(Optional, double) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=lambda]

`loss_function`::::
(Optional, string)
The loss function used during {regression}. Available options are `mse` (mean 
squared error), `msle` (mean squared logarithmic error), and `huber`
(Pseudo-Huber loss). Defaults to `mse`. Refer to 
{ml-docs}/dfa-regression.html#dfa-regression-lossfunction[Loss functions for {regression} analyses] 
to learn more.

`loss_function_parameter`::::
(Optional, double)
A positive number that is used as a parameter to the `loss_function`.

`max_trees`::::
(Optional, integer) 
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=max-trees]

`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return. 
By default, it is zero and no {feat-imp} calculation occurs.

`prediction_field_name`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`randomize_seed`::::
(Optional, long)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=randomize-seed]

`training_percent`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=training-percent]
=====
//End regression
====
//End analysis

//Begin analyzed_fields
`analyzed_fields`::
(Optional, object)
Specify `includes` and/or `excludes` patterns to select which fields are
included in the analysis. The patterns specified in `excludes` are applied last,
so `excludes` takes precedence. In other words, if the same field is specified
in both `includes` and `excludes`, the field is not included in the analysis.
+
--
[[dfa-supported-fields]]
The supported fields for each type of analysis are as follows:

* {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
don't support missing values, and fields that have data types other than
numeric or boolean are ignored. Documents where included fields contain missing
values, null values, or an array are also ignored. Therefore, the `dest` index
may contain documents that don't have an {olscore}.
* {regression-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Supported fields
are included in the analysis; other fields are ignored. Documents where
included fields contain an array with two or more values are also ignored.
Documents in the `dest` index that don't contain a results field are not
included in the {reganalysis}.
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Supported fields
are included in the analysis; other fields are ignored. Documents where
included fields contain an array with two or more values are also ignored.
Documents in the `dest` index that don't contain a results field are not
included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
ordinal variable values to a single number. For example, for age ranges, you
can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.

If `analyzed_fields` is not set, only the relevant fields are included; for
example, all the numeric fields for {oldetection}. For more information about
field selection, see <<explain-dfanalytics>>.
--
+
.Properties of `analyzed_fields`
[%collapsible%open]
====
`excludes`:::
(Optional, array)
An array of strings that defines the fields that will be excluded from the
analysis. You do not need to add fields with unsupported data types to
`excludes`; these fields are excluded from the analysis automatically.

`includes`:::
(Optional, array)
An array of strings that defines the fields that will be included in the 
analysis.
//End analyzed_fields
====

`description`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=description-dfa]

`dest`::
(Required, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dest]

`max_num_threads`::
(Optional, integer)
The maximum number of threads to be used by the analysis.
The default value is `1`. Using more threads may decrease the time
necessary to complete the analysis at the cost of using more CPU.
Note that the process may use additional threads for operational
functionality other than the analysis itself.
  
`model_memory_limit`::
(Optional, string)
The approximate maximum amount of memory resources that are permitted for 
analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If 
your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit` 
setting, an error occurs when you try to create {dfanalytics-jobs} that have 
`model_memory_limit` values greater than that setting. For more information, see 
<<ml-settings>>.
  
`source`::
(Required, object)
The configuration of how to source the analysis data. It requires an `index`.
Optionally, `query` and `_source` may be specified.
+
.Properties of `source`
[%collapsible%open]
====
`index`:::
(Required, string or array) Index or indices on which to perform the analysis.
It can be a single index or index pattern as well as an array of indices or
patterns.
+
WARNING: If your source indices contain documents with the same IDs, only the 
document that is indexed last appears in the destination index.

`query`:::
(Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
This value corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is passed
verbatim to {es}. By default, this property has the following value:
`{"match_all": {}}`.

`_source`:::
(Optional, object) Specify `includes` and/or `excludes` patterns to select which
fields will be present in the destination. Fields that are excluded cannot be
included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
=====
`includes`::::
(array) An array of strings that defines the fields that will be included in the
destination.
        
`excludes`::::
(array) An array of strings that defines the fields that will be excluded from
the destination.
=====
====


[[ml-put-dfanalytics-example]]
== {api-examples-title}


[[ml-put-dfanalytics-example-preprocess]]
=== Preprocessing actions example

The following example shows how to limit the scope of the analysis to certain 
fields, specify excluded fields in the destination index, and use a query to 
filter your data before analysis.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/model-flight-delays-pre
{
  "source": {
    "index": [
      "kibana_sample_data_flights" <1>
    ],
    "query": { <2>
      "range": {
        "DistanceKilometers": { 
          "gt": 0
        }
      }
    },
    "_source": { <3>
      "includes": [],
      "excludes": [
        "FlightDelay",
        "FlightDelayType"
      ]
    }
  },
  "dest": { <4>
    "index": "df-flight-delays",
    "results_field": "ml-results"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin",
      "training_percent": 90
    }
  },
  "analyzed_fields": { <5>
    "includes": [],
    "excludes": [   
      "FlightNum"
    ]
  },
  "model_memory_limit": "100mb"
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> Source index to analyze.
<2> This query filters out entire documents that will not be present in the 
destination index.
<3> The `_source` object defines fields in the dataset that will be included or 
excluded in the destination index. 
<4> Defines the destination index that contains the results of the analysis and 
the fields of the source index specified in the `_source` object. Also defines 
the name of the `results_field`.
<5> Specifies fields to be included in or excluded from the analysis. This does 
not affect whether the fields are present in the destination index; it only
affects whether they are used in the analysis.

In this example, all the fields of the source index are included in the
destination index except `FlightDelay` and `FlightDelayType`, because the
`excludes` parameter of the `_source` object lists them. The `FlightNum` field
is included in the destination index, but it is not included in the analysis
because the `excludes` parameter of the `analyzed_fields` object lists it
explicitly.
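
Because the patterns in `excludes` are applied last, a field that appears in
both lists of `analyzed_fields` is left out of the analysis. The following
sketch illustrates that rule with the same sample flight data. The job ID and
the destination index name are hypothetical, and the selection of fields is
only an illustration.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/example-flight-delays-fields
{
  "source": {
    "index": "kibana_sample_data_flights"
  },
  "dest": {
    "index": "example-flight-delays-fields-out"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin"
    }
  },
  "analyzed_fields": {
    "includes": [
      "FlightDelayMin",
      "DistanceKilometers",
      "FlightTimeMin",
      "FlightNum" <1>
    ],
    "excludes": [
      "FlightNum" <2>
    ]
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> `FlightNum` is listed in `includes`.
<2> `FlightNum` is also listed in `excludes`, which takes precedence, so the
field is not used in the analysis.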


[[ml-put-dfanalytics-example-od]]
=== {oldetection-cap} example

The following example creates the `loganalytics` {dfanalytics-job} with the
`outlier_detection` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics
{
  "description": "Outlier detection on log data",
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[setup:setup_logdata]


The API returns the following result:

[source,console-result]
----
{
  "id": "loganalytics",
  "description": "Outlier detection on log data",
  "source": {
    "index": ["logdata"],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "logdata_out",
    "results_field": "ml"
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  },
  "model_memory_limit": "1gb",
  "create_time" : 1562265491319,
  "version" : "7.6.0",
  "allow_lazy_start" : false,
  "max_num_threads": 1
}
----
// TESTRESPONSE[s/1562265491319/$body.$_path/]
// TESTRESPONSE[s/"version" : "7.6.0"/"version" : $body.version/]
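
The resource-related options described in the request body can be combined with
the same analysis. The following sketch is a variation of the request above that
allows the job to wait for {ml} node capacity, permits a second analysis thread,
and lowers the memory limit. The job ID, the destination index name, and the
chosen values are hypothetical.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics-lazy
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out_lazy"
  },
  "analysis": {
    "outlier_detection": {
      "outlier_fraction": 0.05
    }
  },
  "allow_lazy_start": true, <1>
  "max_num_threads": 2, <2>
  "model_memory_limit": "500mb" <3>
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The job waits in the `starting` state instead of returning an error when no
{ml} node has capacity.
<2> Allows up to two threads for the analysis, at the cost of more CPU usage.
<3> Must not exceed `xpack.ml.max_model_memory_limit` if that setting is
defined.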


[[ml-put-dfanalytics-example-r]]
=== {regression-cap} examples

The following example creates the `house_price_regression_analysis`
{dfanalytics-job} with the `regression` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": 
    {
      "regression": {
        "dependent_variable": "price"
      }
    }
}
--------------------------------------------------
// TEST[skip:TBD]


The API returns the following result:

[source,console-result]
----
{
  "id" : "house_price_regression_analysis",
  "source" : {
    "index" : [
      "houses_sold_last_10_yrs"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "house_price_predictions",
    "results_field" : "ml"
  },
  "analysis" : {
    "regression" : {
      "dependent_variable" : "price",
      "training_percent" : 100
    }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1567168659127,
  "version" : "8.0.0",
  "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1567168659127/$body.$_path/]
// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]
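
The loss function can also be set explicitly. The following sketch is a
variation of the request above that selects `msle` instead of the default `mse`
and passes a positive `loss_function_parameter`. The job ID, the destination
index name, and the parameter value are hypothetical.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_msle
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions_msle"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "loss_function": "msle",
      "loss_function_parameter": 1.0
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]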


The following example creates a job and specifies a training percent:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
{
 "source": {
   "index": "student_performance_mathematics"
 },
 "dest": {
   "index": "student_performance_mathematics_reg"
 },
 "analysis":
   {
     "regression": {
       "dependent_variable": "G3",
       "training_percent": 70,  <1>
       "randomize_seed": 19673948271  <2>
     }
   }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The percentage of the data set that is used for training the model.
<2> The seed that is used to randomly pick which data is used for training.


[[ml-put-dfanalytics-example-c]]
=== {classification-cap} example

The following example creates the `loan_classification` {dfanalytics-job} with
the `classification` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification
{
  "source" : {
    "index": "loan-applicants"
  },
  "dest" : {
    "index": "loan-applicants-classified"
  },
  "analysis" : {
    "classification": {
      "dependent_variable": "label",
      "training_percent": 75,
      "num_top_classes": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
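
The optional {classification} parameters can be added to the same kind of
request. The following sketch is a variation of the request above that sets a
class assignment objective and asks for {feat-imp} values. The job ID and the
destination index name are hypothetical, and `maximize_minimum_recall` is
assumed to be one of the objectives accepted by `class_assignment_objective`.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification_importance
{
  "source": {
    "index": "loan-applicants"
  },
  "dest": {
    "index": "loan-applicants-classified-importance"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "label",
      "training_percent": 75,
      "num_top_classes": 2,
      "class_assignment_objective": "maximize_minimum_recall",
      "num_top_feature_importance_values": 3 <1>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Returns at most three {feat-imp} values per document. The default of zero
disables the {feat-imp} calculation.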
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
+								[role="xpack"]
 								[testenv="platinum"]
 								[[put-dfanalytics]]
-												[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923)


											
										
										
											2020-07-20 16:06:29 -04:00
+								= Create {dfanalytics-jobs} API
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
+								[subs="attributes"]
 								++++
 								<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
 								++++
 								Instantiates a {dfanalytics-job}.
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 11:26:31 -04:00
+								experimental[]
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
+								[[ml-put-dfanalytics-request]]
-												[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923)


											
										
										
											2020-07-20 16:06:29 -04:00
+								== {api-request-title}
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
 								`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 08:38:14 -04:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
+								[[ml-put-dfanalytics-prereq]]
-												[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923)


											
										
										
											2020-07-20 16:06:29 -04:00
+								== {api-prereq-title}
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
-												[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
											
										
										
											2020-01-09 04:44:07 -05:00
+								If the {es} {security-features} are enabled, you must have the following built-in roles and privileges:
 								* `machine_learning_admin`
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 13:43:52 -04:00
+								* source indices: `read`, `view_index_metadata`
-												[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
											
										
										
											2020-01-09 04:44:07 -05:00
+								* destination index: `read`, `create_index`, `manage` and `index`
-												[DOCS] Fix security links in machine learning APIs (#60098) (#60152)


											
										
										
											2020-07-23 19:43:10 -04:00
+								For more information, see <<built-in-roles>>, <<security-privileges>>, and
 								{ml-docs-setup-privileges}.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
-												[DOCS] Adds documentation for secondary authorization headers (#55365) (#55986)


											
										
										
											2020-04-29 19:29:38 -04:00
+								NOTE: The {dfanalytics-job} remembers which roles the user who created it had at
 								the time of creation. When you start the job, it performs the analysis using
 								those same roles. If you provide
 								<<http-clients-secondary-authorization,secondary authorization headers>>,
 								those credentials are used instead.
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 08:38:14 -04:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
+								[[ml-put-dfanalytics-desc]]
-												[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923)


											
										
										
											2020-07-20 16:06:29 -04:00
+								== {api-description-title}
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
 								This API creates a {dfanalytics-job} that performs an analysis on the source
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 13:43:52 -04:00
+								indices and stores the outcome in a destination index.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 13:43:52 -04:00
+								If the destination index does not exist, it is created automatically when you
 								start the job. See <<start-dfanalytics>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 07:34:05 -04:00
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 13:43:52 -04:00
+								If you supply only a subset of the {regression} or {classification} parameters,
-												[DOCS] Synchs and links hyperparameter descriptions (#56131)


											
										
										
											2020-05-04 13:37:26 -04:00
+								{ml-docs}/hyperparameters.html[hyperparameter optimization] occurs. It
 								determines a value for each of the undefined parameters.
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 06:34:39 -04:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 10:21:35 -05:00
+								[[ml-put-dfanalytics-path-params]]
-												[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923)


											
										
										
											2020-07-20 16:06:29 -04:00
+								== {api-path-parms-title}
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 06:34:39 -04:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 10:21:35 -05:00
+								`<data_frame_analytics_id>`::
 								(Required, string)
-												[DOCS] Replaces docdir attributes in ML APIs (#57390) (#57467)


											
										
										
											2020-06-01 16:46:15 -04:00
+								include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 06:34:39 -04:00
-												[DOCS] Collapses nested objects in data frame analytics APIs (#54472) (#54526)


											
										
										
											2020-03-31 15:51:04 -04:00
+								[role="child_attributes"]
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 10:21:35 -05:00
+								[[ml-put-dfanalytics-request-body]]
-												[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923)


											
										
										
											2020-07-20 16:06:29 -04:00
+								== {api-request-body-title}
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 08:38:14 -04:00
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 10:21:35 -05:00
+								`allow_lazy_start`::
 								(Optional, boolean)
-												[DOCS] Edits create data frame analytics job API (#54751)

											
										
										
											2020-04-13 13:43:52 -04:00
+								Specifies whether this job can start when there is insufficient {ml} node
 								capacity for it to be immediately assigned to a node. The default is `false`; if
 								a {ml} node with capacity to run the job cannot immediately be found, the
 								<<start-dfanalytics>> API returns an error. However, this is also subject to the
 								cluster-wide `xpack.ml.max_lazy_ml_nodes` setting. See <<advanced-ml-settings>>.
 								If this option is set to `true`, the API does not return an error and the job
 								waits in the `starting` state until sufficient {ml} node capacity is available.
-												[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241)


											
										
										
											2019-11-06 07:40:27 -05:00
-												[DOCS] Collapses nested objects in data frame analytics APIs (#54472) (#54526)


											
										
										

//Begin analysis
`analysis`::
(Required, object)
The analysis configuration, which contains the information necessary to perform
one of the following types of analysis: {classification}, {oldetection}, or
{regression}.
+
.Properties of `analysis`
[%collapsible%open]
====
//Begin classification
`classification`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-classification.html[{classification}].
+
TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set
automatically by hyperparameter optimization to give the minimum validation
error. It is highly recommended to use the default values unless you fully
understand the function of these parameters.
+
.Properties of `classification`
[%collapsible%open]
=====
`class_assignment_objective`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=class-assignment-objective]

`dependent_variable`::::
(Required, string)
+
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
categorical (`ip` or `keyword`), or boolean. There must be no more than 30
different values in this field.

`eta`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`gamma`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=lambda]

`max_trees`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=max-trees]

`num_top_classes`::::
(Optional, integer)
Defines the number of categories for which the predicted probabilities are
reported. It must be non-negative. If it is greater than the total number of
categories, the API reports all category probabilities. Defaults to 2.

`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return.
By default, it is zero and no {feat-imp} calculation occurs.

`prediction_field_name`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`randomize_seed`::::
(Optional, long)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=randomize-seed]

`training_percent`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=training-percent]
//End classification
=====
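+
The properties above could be combined roughly as follows. This is a sketch
under stated assumptions, not a recommended configuration: the job ID, index
names, and the `my_label` field are hypothetical, and the advanced parameters
are deliberately left out so that hyperparameter optimization chooses them.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/classification-example
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-dest-index"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "my_label",
      "training_percent": 80,
      "num_top_classes": 2,
      "prediction_field_name": "my_label_prediction"
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch with hypothetical indices]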
//Begin outlier_detection
`outlier_detection`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-outlier-detection.html[{oldetection}]:
+
.Properties of `outlier_detection`
[%collapsible%open]
=====
`compute_feature_influence`::::
(Optional, boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]

`feature_influence_threshold`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]

`method`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=method]

`n_neighbors`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=n-neighbors]

`outlier_fraction`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=outlier-fraction]

`standardization_enabled`::::
(Optional, boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
//End outlier_detection
=====
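+
For orientation, an `outlier_detection` object that sets every property listed
above might look like the following sketch. The job ID and index names are
hypothetical, the numeric values are arbitrary placeholders, and `lof` is
assumed here to be one of the accepted `method` values; check the parameter
descriptions for the valid options and ranges.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/outlier-detection-example
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-dest-index"
  },
  "analysis": {
    "outlier_detection": {
      "method": "lof",
      "n_neighbors": 20,
      "compute_feature_influence": true,
      "feature_influence_threshold": 0.1,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch with hypothetical indices]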
//Begin regression
`regression`:::
(Required^*^, object)
The configuration information necessary to perform
{ml-docs}/dfa-regression.html[{regression}].
+
TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set
automatically by hyperparameter optimization to give the minimum validation
error. It is highly recommended to use the default values unless you fully
understand the function of these parameters.
+
.Properties of `regression`
[%collapsible%open]
=====
`dependent_variable`::::
(Required, string)
+
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
The data type of the field must be numeric.

`eta`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`gamma`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::::
(Optional, double)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=lambda]

`loss_function`::::
(Optional, string)
The loss function used during {regression}. Available options are `mse` (mean
squared error), `msle` (mean squared logarithmic error), and `huber`
(Pseudo-Huber loss). Defaults to `mse`. Refer to
{ml-docs}/dfa-regression.html#dfa-regression-lossfunction[Loss functions for {regression} analyses]
to learn more.

`loss_function_parameter`::::
(Optional, double)
A positive number that is used as a parameter to the `loss_function`.

`max_trees`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=max-trees]

`num_top_feature_importance_values`::::
(Optional, integer)
Advanced configuration option. Specifies the maximum number of
{ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return.
By default, it is zero and no {feat-imp} calculation occurs.

`prediction_field_name`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`randomize_seed`::::
(Optional, long)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=randomize-seed]

`training_percent`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=training-percent]
=====
//End regression
====
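+
As one possible shape for a {regression} job, the following sketch sets the
dependent variable, a non-default loss function, and a {feat-imp} limit. The
job ID, index names, and the `price` field are hypothetical, and the value of
`loss_function_parameter` is an arbitrary positive number chosen only for
illustration.
+
[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/regression-example
{
  "source": {
    "index": "my-source-index"
  },
  "dest": {
    "index": "my-dest-index"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "training_percent": 75,
      "loss_function": "huber",
      "loss_function_parameter": 1.0,
      "num_top_feature_importance_values": 3
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch with hypothetical indices]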
//End analysis

//Begin analyzed_fields
`analyzed_fields`::
(Optional, object)
Specify `includes` and/or `excludes` patterns to select which fields will be
included in the analysis. The patterns specified in `excludes` are applied last;
therefore, `excludes` takes precedence. In other words, if the same field is
specified in both `includes` and `excludes`, then the field will not be included
in the analysis.
+
--
[[dfa-supported-fields]]
The supported fields for each type of analysis are as follows:

* {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
don't support missing values; therefore, fields that have data types other than
numeric or boolean are ignored. Documents where included fields contain missing
values, null values, or an array are also ignored. Therefore, the `dest` index
may contain documents that don't have an {olscore}.
* {regression-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also
ignored. Documents in the `dest` index that don't contain a results field are
not included in the {reganalysis}.
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also ignored.
Documents in the `dest` index that don't contain a results field are not
included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
ordinal variable values to a single number. For example, in the case of age
ranges, you can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and
so on.

If `analyzed_fields` is not set, only the relevant fields are included; for
example, all the numeric fields for {oldetection}. For more information about
field selection, see <<explain-dfanalytics>>.
--
											2020-03-31 15:51:04 -04:00
+								+
 								.Properties of `analyzed_fields`
 								[%collapsible%open]
 								====
 								`excludes`:::
-												[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793)



											
										
										
											2020-01-09 10:21:35 -05:00
+								(Optional, array)
-												[DOCS] Adds link points to the data frame analytics supported fields (#55004)

Co-authored-by: lcawl <lcawley@elastic.co>

											
										
										
											2020-04-09 14:16:13 -04:00
+								An array of strings that defines the fields that will be excluded from the
 								analysis. You do not need to add fields with unsupported data types to
 								`excludes`, these fields are excluded from the analysis automatically.
-												[7.x][DOCS] Moves data frame analytics job resource definitions into APIs (#50165)

* [7.x][DOCS] Moves data frame analytics job resource definitions into APIs.
											
										
										
											2019-12-13 05:48:21 -05:00
-												[DOCS] Collapses nested objects in data frame analytics APIs (#54472) (#54526)


											
										
										
											2020-03-31 15:51:04 -04:00
+								`includes`:::
 								(Optional, array)
-												[DOCS] Adds link points to the data frame analytics supported fields (#55004)

Co-authored-by: lcawl <lcawley@elastic.co>

											
										
										
											2020-04-09 14:16:13 -04:00
+								An array of strings that defines the fields that will be included in the
 								analysis.
-												[DOCS] Collapses nested objects in data frame analytics APIs (#54472) (#54526)


											
										
										
											2020-03-31 15:51:04 -04:00
+								//End analyzed_fields
 								====
 								`description`::
 								(Optional, string)
 								include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=description-dfa]
 								`dest`::
 								(Required, object)
 								include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dest]
 								`max_num_threads`::
 								(Optional, integer)
 								The maximum number of threads to be used by the analysis.
 								The default value is `1`. Using more threads may decrease the time
 								necessary to complete the analysis at the cost of using more CPU.
 								Note that the process may use additional threads for operational
 								functionality other than the analysis itself.
 								`model_memory_limit`::
 								(Optional, string)
 								The approximate maximum amount of memory resources that are permitted for
 								analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
 								your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
 								setting, an error occurs when you try to create {dfanalytics-jobs} that have
 								`model_memory_limit` values greater than that setting. For more information, see
 								<<ml-settings>>.
 								`source`::
 								(object)
 								The configuration of how to source the analysis data. It requires an `index`.
 								Optionally, `query` and `_source` may be specified.
 								+
 								.Properties of `source`
 								[%collapsible%open]
 								====
 								`index`:::
 								(Required, string or array) Index or indices on which to perform the analysis.
 								It can be a single index or index pattern as well as an array of indices or
 								patterns.
 								+
 								WARNING: If your source indices contain documents with the same IDs, only the
 								document that is indexed last appears in the destination index.
 								`query`:::
 								(Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
 								This value corresponds to the query object in an {es} search POST body. All the
 								options that are supported by {es} can be used, as this object is passed
 								verbatim to {es}. By default, this property has the following value:
 								`{"match_all": {}}`.
 								`_source`:::
 								(Optional, object) Specify `includes` and/or `excludes` patterns to select which
 								fields will be present in the destination. Fields that are excluded cannot be
 								included in the analysis.
 								+
 								.Properties of `_source`
 								[%collapsible%open]
 								=====
 								`includes`::::
 								(array) An array of strings that defines the fields that will be included in the
 								destination.
 								`excludes`::::
 								(array) An array of strings that defines the fields that will be excluded from
 								the destination.
 								=====
 								====
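
As a minimal sketch of how the `source` options and the resource settings
described above can be combined, the following request reads from an index
pattern and an additional index, filters documents with a query, copies only
selected fields to the destination, and raises `max_num_threads`. The job ID,
index names, and field names (`sketch-source-options`, `my-logs-2020-*`,
`my-logs-archive`, `my-logs-outliers`, `bytes`, `response_time`) are
hypothetical, and the limits are illustrative values rather than
recommendations.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/sketch-source-options
{
  "source": {
    "index": [
      "my-logs-2020-*",
      "my-logs-archive"
    ],
    "query": {
      "range": {
        "bytes": {
          "gt": 0
        }
      }
    },
    "_source": {
      "includes": [
        "bytes",
        "response_time"
      ]
    }
  },
  "dest": {
    "index": "my-logs-outliers"
  },
  "analysis": {
    "outlier_detection": {}
  },
  "max_num_threads": 2,
  "model_memory_limit": "200mb"
}
--------------------------------------------------
// TEST[skip:hypothetical indices]

Because fields filtered out by `_source` are not present in the destination,
they are also unavailable to the analysis. If the listed indices contain
documents with the same IDs, only the document that is indexed last appears in
the destination index.
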
 								[[ml-put-dfanalytics-example]]
 								== {api-examples-title}
 								[[ml-put-dfanalytics-example-preprocess]]
 								=== Preprocessing actions example
 								The following example shows how to limit the scope of the analysis to certain
 								fields, specify excluded fields in the destination index, and use a query to
 								filter your data before analysis.
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/model-flight-delays-pre
 								{
 								  "source": {
 								    "index": [
 								      "kibana_sample_data_flights" <1>
 								    ],
 								    "query": { <2>
 								      "range": {
 								        "DistanceKilometers": {
 								          "gt": 0
 								        }
 								      }
 								    },
 								    "_source": { <3>
 								      "includes": [],
 								      "excludes": [
 								        "FlightDelay",
 								        "FlightDelayType"
 								      ]
 								    }
 								  },
 								  "dest": { <4>
 								    "index": "df-flight-delays",
 								    "results_field": "ml-results"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "FlightDelayMin",
 								      "training_percent": 90
 								    }
 								  },
 								  "analyzed_fields": { <5>
 								    "includes": [],
 								    "excludes": [
 								      "FlightNum"
 								    ]
 								  },
 								  "model_memory_limit": "100mb"
 								}
 								--------------------------------------------------
 								// TEST[skip:setup kibana sample data]
 								<1> Source index to analyze.
 								<2> This query filters out entire documents that will not be present in the
 								destination index.
 								<3> The `_source` object defines fields in the dataset that will be included or
 								excluded in the destination index.
 								<4> Defines the destination index that contains the results of the analysis and
 								the fields of the source index specified in the `_source` object. Also defines
 								the name of the `results_field`.
 								<5> Specifies fields to be included in or excluded from the analysis. This does
 								not affect whether the fields are present in the destination index; it only
 								affects whether they are used in the analysis.

 								In this example, all the fields of the source index are included in the
 								destination index except `FlightDelay` and `FlightDelayType`, because these
 								are defined as excluded fields by the `excludes` parameter of the `_source`
 								object. The `FlightNum` field is included in the destination index. However,
 								it is not included in the analysis, because it is explicitly specified as an
 								excluded field by the `excludes` parameter of the `analyzed_fields` object.
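
To check the effect of these settings before creating the job, the same
`source`, `analysis`, and `analyzed_fields` objects can be sent to the explain
API described in <<explain-dfanalytics>>. The following is a sketch that assumes
the `kibana_sample_data_flights` index exists; the response (not shown here)
reports the field selection for the given configuration.

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/_explain
{
  "source": {
    "index": "kibana_sample_data_flights",
    "query": {
      "range": {
        "DistanceKilometers": {
          "gt": 0
        }
      }
    },
    "_source": {
      "excludes": [
        "FlightDelay",
        "FlightDelayType"
      ]
    }
  },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin",
      "training_percent": 90
    }
  },
  "analyzed_fields": {
    "excludes": [
      "FlightNum"
    ]
  }
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]
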
 								[[ml-put-dfanalytics-example-od]]
 								=== {oldetection-cap} example
 								The following example creates the `loganalytics` {dfanalytics-job} with the
 								`outlier_detection` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loganalytics
 								{
 								  "description": "Outlier detection on log data",
 								  "source": {
 								    "index": "logdata"
 								  },
 								  "dest": {
 								    "index": "logdata_out"
 								  },
 								  "analysis": {
 								    "outlier_detection": {
 								      "compute_feature_influence": true,
 								      "outlier_fraction": 0.05,
 								      "standardization_enabled": true
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[setup:setup_logdata]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id": "loganalytics",
 								  "description": "Outlier detection on log data",
 								  "source": {
 								    "index": ["logdata"],
 								    "query": {
 								      "match_all": {}
 								    }
 								  },
 								  "dest": {
 								    "index": "logdata_out",
 								    "results_field": "ml"
 								  },
 								  "analysis": {
 								    "outlier_detection": {
 								      "compute_feature_influence": true,
 								      "outlier_fraction": 0.05,
 								      "standardization_enabled": true
 								    }
 								  },
 								  "model_memory_limit": "1gb",
 								  "create_time" : 1562265491319,
 								  "version" : "7.6.0",
 								  "allow_lazy_start" : false,
 								  "max_num_threads": 1
 								}
 								----
 								// TESTRESPONSE[s/1562265491319/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "7.6.0"/"version" : $body.version/]
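
Creating the job does not run it. As a brief sketch of the next step, the
`loganalytics` job can be started with the start API (see <<start-dfanalytics>>)
and its progress checked with the stats API. If the `logdata_out` destination
index does not exist, it is created automatically when the job starts.

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/loganalytics/_start

GET _ml/data_frame/analytics/loganalytics/_stats
--------------------------------------------------
// TEST[skip:illustrative follow-up requests]
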
 								[[ml-put-dfanalytics-example-r]]
 								=== {regression-cap} examples
 								The following example creates the `house_price_regression_analysis`
 								{dfanalytics-job} with the `regression` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/house_price_regression_analysis
 								{
 								  "source": {
 								    "index": "houses_sold_last_10_yrs"
 								  },
 								  "dest": {
 								    "index": "house_price_predictions"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "price"
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id" : "house_price_regression_analysis",
 								  "source" : {
 								    "index" : [
 								      "houses_sold_last_10_yrs"
 								    ],
 								    "query" : {
 								      "match_all" : { }
 								    }
 								  },
 								  "dest" : {
 								    "index" : "house_price_predictions",
 								    "results_field" : "ml"
 								  },
 								  "analysis" : {
 								    "regression" : {
 								      "dependent_variable" : "price",
 								      "training_percent" : 100
 								    }
 								  },
 								  "model_memory_limit" : "1gb",
 								  "create_time" : 1567168659127,
 								  "version" : "8.0.0",
 								  "allow_lazy_start" : false
 								}
 								----
 								// TESTRESPONSE[s/1567168659127/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]
 								The following example creates a job and specifies a training percent:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
 								{
 								  "source": {
 								    "index": "student_performance_mathematics"
 								  },
 								  "dest": {
 								    "index": "student_performance_mathematics_reg"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "G3",
 								      "training_percent": 70,  <1>
 								      "randomize_seed": 19673948271  <2>
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								<1> The percentage of the data set that is used for training the model.
 								<2> The seed that is used to randomly pick which data is used for training.
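
Reusing the same `randomize_seed` in a similar job is intended to pick the same
documents for training, which makes two configurations easier to compare. The
following sketch assumes that behavior. The job ID, the destination index, and
the `lambda` value are hypothetical and chosen only for illustration.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/student_performance_mathematics_0.3_b
{
  "source": {
    "index": "student_performance_mathematics"
  },
  "dest": {
    "index": "student_performance_mathematics_reg_b"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "G3",
      "training_percent": 70,
      "randomize_seed": 19673948271,
      "lambda": 1.0
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
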
 								[[ml-put-dfanalytics-example-c]]
 								=== {classification-cap} example
 								The following example creates the `loan_classification` {dfanalytics-job} with
 								the `classification` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loan_classification
 								{
 								  "source" : {
 								    "index": "loan-applicants"
 								  },
 								  "dest" : {
 								    "index": "loan-applicants-classified"
 								  },
 								  "analysis" : {
 								    "classification": {
 								      "dependent_variable": "label",
 								      "training_percent": 75,
 								      "num_top_classes": 2
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]