OpenSearch/docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

[role="xpack"]
[testenv="platinum"]
[[put-dfanalytics]]
=== Create {dfanalytics-jobs} API
[subs="attributes"]
++++
<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
++++

Instantiates a {dfanalytics-job}.

experimental[]

[[ml-put-dfanalytics-request]]
==== {api-request-title}

`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`


[[ml-put-dfanalytics-prereq]]
==== {api-prereq-title}

* You must have the `machine_learning_admin` built-in role to use this API. You 
must also have `read` and `view_index_metadata` privileges on the source index 
and `read`, `create_index`, and `index` privileges on the destination index. For 
more information, see <<security-privileges>> and <<built-in-roles>>.


[[ml-put-dfanalytics-desc]]
==== {api-description-title}

This API creates a {dfanalytics-job} that performs an analysis on the source 
index and stores the outcome in a destination index.

The destination index is created automatically if it does not exist. The 
`index.number_of_shards` and `index.number_of_replicas` settings of the source 
index are copied to the destination index. When the source index matches 
multiple indices, these settings are set to the maximum values found in the 
source indices.

The API also attempts to copy the mappings of the source indices to the 
destination index. However, if the mappings of any field don't match among the 
source indices, the attempt fails with an error message.

If the destination index already exists, then it will be used as is. This makes 
it possible to set up the destination index in advance with custom settings 
and mappings.
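
As a minimal sketch of that approach, the following request creates a 
destination index before the {dfanalytics-job} is created. The index name 
`logdata_out` matches the {oldetection} example later on this page; the 
`response_time` field and the setting values are hypothetical and only 
illustrate the idea:

[source,console]
--------------------------------------------------
PUT logdata_out
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "response_time": {
        "type": "double"
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]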

[[ml-put-dfanalytics-supported-fields]]
===== Supported fields

====== {oldetection-cap}

{oldetection-cap} requires numeric or boolean data to analyze. The algorithms 
don't support missing values; therefore, fields that have data types other than 
numeric or boolean are ignored. Documents where included fields contain missing 
values, null values, or an array are also ignored. Therefore, the `dest` index 
may contain documents that don't have an {olscore}.


====== {regression-cap}

{regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`, 
and `ip`. It is also tolerant of missing values. Fields that are supported are 
included in the analysis; other fields are ignored. Documents where included
fields contain an array with two or more values are also ignored. Documents in
the `dest` index that don't contain a results field are not included in the
{reganalysis}.


====== {classification-cap}

{classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are 
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also ignored. 
Documents in the `dest` index that don't contain a results field are not
included in the {classanalysis}.

{classanalysis-cap} can be improved by mapping ordinal variable values to a 
single number. For example, in the case of age ranges, you can model the values 
as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.


[[ml-put-dfanalytics-path-params]]
==== {api-path-parms-title}

`<data_frame_analytics_id>`::
  (Required, string) A character string that uniquely identifies the 
  {dfanalytics-job}. This identifier can contain lowercase alphanumeric
  characters (a-z and 0-9), hyphens, and underscores. It must start and end with
  alphanumeric characters.


[[ml-put-dfanalytics-request-body]]
==== {api-request-body-title}

`analysis`::
  (Required, object) Defines the type of {dfanalytics} you want to perform on 
  your source index. For example: `outlier_detection`. See 
  <<dfanalytics-types>>.
  
`analyzed_fields`::
  (Optional, object) You can specify `includes` patterns, `excludes` patterns, 
  or both. If `analyzed_fields` is not set, only the relevant fields will be 
  included. For example, all the numeric fields for {oldetection}. For the 
  supported field types, see <<ml-put-dfanalytics-supported-fields>>. If you 
  specify fields, either in `includes` or in `excludes`, that have a data type 
  that is not supported, an error occurs.
  
  `includes`:::
    (Optional, array) An array of strings that defines the fields that will be 
    included in the analysis.
    
  `excludes`:::
    (Optional, array) An array of strings that defines the fields that will be 
    excluded from the analysis. You do not need to add fields with unsupported 
    data types to `excludes`; these fields are excluded from the analysis 
    automatically.

`description`::
  (Optional, string) A description of the job.

`dest`::
  (Required, object) The destination configuration, consisting of `index` and 
  optionally `results_field` (`ml` by default).
  
    `index`:::
      (Required, string) Defines the _destination index_ to store the results of 
      the {dfanalytics-job}.
    
    `results_field`:::
      (Optional, string) Defines the name of the field in which to store the 
      results of the analysis. Defaults to `ml`.
  
`model_memory_limit`::
  (Optional, string) The approximate maximum amount of memory resources that are 
  permitted for analytical processing. The default value for {dfanalytics-jobs} 
  is `1gb`. If your `elasticsearch.yml` file contains an 
  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to 
  create {dfanalytics-jobs} that have `model_memory_limit` values greater than 
  that setting. For more information, see <<ml-settings>>.
  
`source`::
  (Required, object) The source configuration, consisting of `index` and 
  optionally a `query`.
  
    `index`:::
      (Required, string or array) Index or indices on which to perform the 
      analysis. It can be a single index or index pattern as well as an array of 
      indices or patterns.
  
    `query`:::
      (Optional, object) The {es} query domain-specific language 
      (<<query-dsl,DSL>>). This value corresponds to the query object in an {es} 
      search POST body. All the options that are supported by {es} can be used, 
      as this object is passed verbatim to {es}. By default, this property has 
      the following value: `{"match_all": {}}`.

`allow_lazy_start`::
  (Optional, boolean) Whether this job should be allowed to start when there
  is insufficient {ml} node capacity for it to be immediately assigned to a node.
  The default is `false`, which means that the <<start-dfanalytics>>
  will return an error if a {ml} node with capacity to run the
  job cannot immediately be found. (However, this is also subject to
  the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting - see
  <<advanced-ml-settings>>.) If this option is set to `true` then
  the <<start-dfanalytics>> will not return an error, and the job will
  wait in the `starting` state until sufficient {ml} node capacity
  is available.
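
The optional parameters described above can be combined in a single request. 
The following sketch is illustrative only: the job ID, the destination index, 
the `outliers` results field, and the `response_time` and `bytes_sent` field 
names are hypothetical, and it assumes that `logdata` contains an `@timestamp` 
date field:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics_filtered
{
  "source": {
    "index": ["logdata"],
    "query": {
      "range": {
        "@timestamp": {
          "gte": "now-30d"
        }
      }
    }
  },
  "dest": {
    "index": "logdata_filtered_out",
    "results_field": "outliers"
  },
  "analyzed_fields": {
    "includes": ["response_time", "bytes_sent"]
  },
  "analysis": {
    "outlier_detection": {}
  },
  "model_memory_limit": "500mb",
  "allow_lazy_start": true
}
--------------------------------------------------
// TEST[skip:TBD]

With `allow_lazy_start` set to `true`, starting this job when no {ml} node has 
capacity leaves it in the `starting` state instead of returning an error.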


[[ml-put-dfanalytics-example]]
==== {api-examples-title}


[[ml-put-dfanalytics-example-od]]
===== {oldetection-cap} example

The following example creates the `loganalytics` {dfanalytics-job} with the 
`outlier_detection` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics
{
  "description": "Outlier detection on log data",
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[setup:setup_logdata]


The API returns the following result:

[source,console-result]
----
{
  "id" : "loganalytics",
  "description": "Outlier detection on log data",
  "source" : {
    "index" : [
      "logdata"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "logdata_out",
    "results_field" : "ml"
  },
  "analysis": {
      "outlier_detection": {
          "compute_feature_influence": true,
          "outlier_fraction": 0.05,
          "standardization_enabled": true
      }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1562351429434,
  "version" : "7.3.0",
  "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1562351429434/$body.$_path/]
// TESTRESPONSE[s/"version" : "7.3.0"/"version" : $body.version/]


[[ml-put-dfanalytics-example-r]]
===== {regression-cap} examples

The following example creates the `house_price_regression_analysis` 
{dfanalytics-job} with the `regression` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]


The API returns the following result:

[source,console-result]
----
{
  "id" : "house_price_regression_analysis",
  "source" : {
    "index" : [
      "houses_sold_last_10_yrs"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "house_price_predictions",
    "results_field" : "ml"
  },
  "analysis" : {
    "regression" : {
      "dependent_variable" : "price",
      "training_percent" : 100
    }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1567168659127,
  "version" : "8.0.0",
  "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1567168659127/$body.$_path/]
// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]


The following example creates a job and specifies a training percent:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
{
  "source": {
    "index": "student_performance_mathematics"
  },
  "dest": {
    "index": "student_performance_mathematics_reg"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "G3",
      "training_percent": 70  <1>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The `training_percent` defines the percentage of the data set that will be used 
for training the model.


[[ml-put-dfanalytics-example-c]]
===== {classification-cap} example

The following example creates the `loan_classification` {dfanalytics-job} with 
the `classification` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification
{
  "source" : {
    "index": "loan-applicants"
  },
  "dest" : {
    "index": "loan-applicants-classified"
  },
  "analysis" : {
    "classification": {
      "dependent_variable": "label",
      "training_percent": 75,
      "num_top_classes": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[role="xpack"]
 								[testenv="platinum"]
 								[[put-dfanalytics]]
 								=== Create {dfanalytics-jobs} API
 								[subs="attributes"]
 								++++
 								<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
 								++++
 								Instantiates a {dfanalytics-job}.
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								experimental[]
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-request]]
 								==== {api-request-title}
 								`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-prereq]]
 								==== {api-prereq-title}
 								* You must have `machine_learning_admin` built-in role to use this API. You must
 								also have `read` and `view_index_metadata` privileges on the source index and
 								`read`, `create_index`, and `index` privileges on the destination index. For
-												[DOCS] Cleans up links to security content (#47610) (#47703)


											
										
										
											2019-10-07 15:23:19 -07:00
+								more information, see <<security-privileges>> and <<built-in-roles>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-desc]]
 								==== {api-description-title}
 								This API creates a {dfanalytics-job} that performs an analysis on the source
 								index and stores the outcome in a destination index.
 								The destination index will be automatically created if it does not exist. The
 								`index.number_of_shards` and `index.number_of_replicas` settings of the source
 								index will be copied over the destination index. When the source index matches
 								multiple indices, these settings will be set to the maximum values found in the
 								source indices.
 								The mappings of the source indices are also attempted to be copied over
 								to the destination index, however, if the mappings of any of the fields don't
 								match among the source indices, the attempt will fail with an error message.
 								If the destination index already exists, then it will be use as is. This makes
 								it possible to set up the destination index in advance with custom settings
 								and mappings.
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
+								[[ml-put-dfanalytics-supported-fields]]
 								===== Supported fields
 								====== {oldetection-cap}
 								{oldetection-cap} requires numeric or boolean data to analyze. The algorithms
 								don't support missing values therefore fields that have data types other than
 								numeric or boolean are ignored. Documents where included fields contain missing
 								values, null values, or an array are also ignored. Therefore the `dest` index
 								may contain documents that don't have an {olscore}.
 								====== {regression-cap}
-												[DOCS] Fixes data type formatting

											
										
										
											2019-11-26 08:21:39 -08:00
+								{regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
 								and `ip`. It is also tolerant of missing values. Fields that are supported are
 								included in the analysis, other fields are ignored. Documents where included
 								fields contain  an array with two or more values are also ignored. Documents in
 								the `dest` index  that don’t contain a results field are not included in the
 								 {reganalysis}.
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241)


											
										
										
											2019-11-06 07:40:27 -05:00
+								====== {classification-cap}
-												[DOCS] Fixes data type formatting

											
										
										
											2019-11-26 08:21:39 -08:00
+								{classification-cap} supports fields that are numeric, `boolean`, `text`,
 								`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
 								supported are included in the analysis, other fields are ignored. Documents
 								where included fields contain an array with two or more values are also ignored.
 								Documents in the `dest` index that don’t contain a results field are not
 								included in the {classanalysis}.
-												[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241)


											
										
										
											2019-11-06 07:40:27 -05:00
 								{classanalysis-cap} can be improved by mapping ordinal variable values to a
 								single number. For example, in case of age ranges, you can model the values as
 								"0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-path-params]]
 								==== {api-path-parms-title}
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`<data_frame_analytics_id>`::
 								  (Required, string) A numerical character string that uniquely identifies the
 								  {dfanalytics-job}. This identifier can contain lowercase alphanumeric
 								  characters (a-z and 0-9), hyphens, and underscores. It must start and end with
 								  alphanumeric characters.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-request-body]]
 								==== {api-request-body-title}
-												[DOCS] Adds data frame analytics API and evaluate API resource documentation (#43972)

This PR adds the resource documentation of the data frame analytics APIs and the evaluate API to the ML API doc pool.
											
										
										
											2019-07-11 18:05:05 +02:00
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`analysis`::
-												[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791)



											
										
										
											2019-10-09 18:13:33 +02:00
+								  (Required, object) Defines the type of {dfanalytics} you want to perform on
 								  your source index. For example: `outlier_detection`. See
 								  <<dfanalytics-types>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`analyzed_fields`::
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								  (Optional, object) You can specify both `includes` and/or `excludes` patterns.
 								  If `analyzed_fields` is not set, only the relevant fields will be included.
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
+								  For example, all the numeric fields for {oldetection}. For the supported field
-												[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307)


											
										
										
											2019-11-11 09:53:59 -05:00
+								  types, see <<ml-put-dfanalytics-supported-fields>>. If you specify fields –
 								  either in `includes` or in `excludes` – that have a data type that is not
 								  supported, an error occurs.
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791)



											
										
										
											2019-10-09 18:13:33 +02:00
+								  `includes`:::
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								    (Optional, array) An array of strings that defines the fields that will be
 								    included in the analysis.
-												[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791)



											
										
										
											2019-10-09 18:13:33 +02:00
+								  `excludes`:::
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								    (Optional, array) An array of strings that defines the fields that will be
-												[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307)


											
										
										
											2019-11-11 09:53:59 -05:00
+								    excluded from the analysis. You do not need to add fields with unsupported
 								    data types to `excludes`, these fields are excluded from the analysis
 								    automatically.
-												[ML] Add description to DF analytics (#45774) (#46019)


											
										
										
											2019-08-27 15:48:59 +03:00
 								`description`::
 								  (Optional, string) A description of the job.
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`dest`::
-												[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)

This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
											
										
										
											2019-07-26 11:39:59 +02:00
+								  (Required, object) The destination configuration, consisting of `index` and
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								  optionally `results_field` (`ml` by default).
 								    `index`:::
 								      (Required, string) Defines the _destination index_ to store the results of
 								      the {dfanalytics-job}.
 								    `results_field`:::
 								      (Optional, string) Defines the name of the field in which to store the
 								      results of the analysis. Default to `ml`.
-												[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)

This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
											
										
										
											2019-07-26 11:39:59 +02:00
 								`model_memory_limit`::
 								  (Optional, string) The approximate maximum amount of memory resources that are
 								  permitted for analytical processing. The default value for {dfanalytics-jobs}
 								  is `1gb`. If your `elasticsearch.yml` file contains an
 								  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
 								  create {dfanalytics-jobs} that have `model_memory_limit` values greater than
 								  that setting. For more information, see <<ml-settings>>.
-												[DOCS] Fixes formatting in data frame analytics API

											
										
										
											2019-07-10 17:58:17 -07:00
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`source`::
-												[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)

This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
											
										
										
											2019-07-26 11:39:59 +02:00
+								  (Required, object) The source configuration, consisting of `index` and
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								  optionally a `query`.
 								    `index`:::
 								      (Required, string or array) Index or indices on which to perform the
 								      analysis. It can be a single index or index pattern as well as an array of
 								      indices or patterns.
 								    `query`:::
 								      (Optional, object) The {es} query domain-specific language
 								      (<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
 								      search POST body. All the options that are supported by {es} can be used,
 								      as this object is passed verbatim to {es}. By default, this property has
 								      the following value: `{"match_all": {}}`.
-												[ML][7.x] Add lazy assignment job config option (#47993)

This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
											
										
										
											2019-10-15 06:55:11 +01:00
+								`allow_lazy_start`::
 								  (Optional, boolean) Whether this job should be allowed to start when there
 								  is insufficient {ml} node capacity for it to be immediately assigned to a node.
 								  The default is `false`, which means that the <<start-dfanalytics>>
 								  will return an error if a {ml} node with capacity to run the
 								  job cannot immediately be found. (However, this is also subject to
 								  the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting - see
 								  <<advanced-ml-settings>>.) If this option is set to `true` then
 								  the <<start-dfanalytics>> will not return an error, and the job will
 								  wait in the `starting` state until sufficient {ml} node capacity
 								  is available.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
 								[[ml-put-dfanalytics-example]]
 								==== {api-examples-title}
-												[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241)


											
										
										
											2019-11-06 07:40:27 -05:00
-												[DOCS] Adds regression analytics resources and examples to the data frame analytics APIs and the evaluation API (#46176)

* [DOCS] Adds regression analytics resources and examples to the data frame analytics APIs.
Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>

											
										
										
											2019-09-19 09:10:11 +02:00
+								[[ml-put-dfanalytics-example-od]]
 								===== {oldetection-cap} example
 								The following example creates the `loganalytics` {dfanalytics-job} with the
 								`outlier_detection` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loganalytics
 								{
 								  "description": "Outlier detection on log data",
 								  "source": {
 								    "index": "logdata"
 								  },
 								  "dest": {
 								    "index": "logdata_out"
 								  },
 								  "analysis": {
 								    "outlier_detection": {
 								      "compute_feature_influence": true,
 								      "outlier_fraction": 0.05,
 								      "standardization_enabled": true
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[setup:setup_logdata]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id" : "loganalytics",
 								  "description": "Outlier detection on log data",
 								  "source" : {
 								    "index" : [
 								      "logdata"
 								    ],
 								    "query" : {
 								      "match_all" : { }
 								    }
 								  },
 								  "dest" : {
 								    "index" : "logdata_out",
 								    "results_field" : "ml"
 								  },
 								  "analysis" : {
 								    "outlier_detection" : {
 								      "compute_feature_influence" : true,
 								      "outlier_fraction" : 0.05,
 								      "standardization_enabled" : true
 								    }
 								  },
 								  "model_memory_limit" : "1gb",
 								  "create_time" : 1562351429434,
 								  "version" : "7.3.0",
 								  "allow_lazy_start" : false
 								}
 								----
 								// TESTRESPONSE[s/1562351429434/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "7.3.0"/"version" : $body.version/]
 								[[ml-put-dfanalytics-example-r]]
 								===== {regression-cap} examples
 								The following example creates the `house_price_regression_analysis`
 								{dfanalytics-job} with the `regression` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/house_price_regression_analysis
 								{
 								  "source": {
 								    "index": "houses_sold_last_10_yrs"
 								  },
 								  "dest": {
 								    "index": "house_price_predictions"
 								  },
 								  "analysis":
 								    {
 								      "regression": {
 								        "dependent_variable": "price"
 								      }
 								    }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id" : "house_price_regression_analysis",
 								  "source" : {
 								    "index" : [
 								      "houses_sold_last_10_yrs"
 								    ],
 								    "query" : {
 								      "match_all" : { }
 								    }
 								  },
 								  "dest" : {
 								    "index" : "house_price_predictions",
 								    "results_field" : "ml"
 								  },
 								  "analysis" : {
 								    "regression" : {
 								      "dependent_variable" : "price",
 								      "training_percent" : 100
 								    }
 								  },
 								  "model_memory_limit" : "1gb",
 								  "create_time" : 1567168659127,
 								  "version" : "8.0.0",
 								  "allow_lazy_start" : false
 								}
 								----
 								// TESTRESPONSE[s/1567168659127/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]
 								The following example creates a job and specifies a training percent:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
 								{
 								 "source": {
 								   "index": "student_performance_mathematics"
 								 },
 								 "dest": {
 								   "index":"student_performance_mathematics_reg"
 								 },
 								 "analysis":
 								   {
 								     "regression": {
 								       "dependent_variable": "G3",
 								       "training_percent": 70  <1>
 								     }
 								   }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								<1> The `training_percent` defines the percentage of the data set that will be used
 								for training the model.
 								[[ml-put-dfanalytics-example-c]]
 								===== {classification-cap} example
 								The following example creates the `loan_classification` {dfanalytics-job} with
 								the `classification` analysis type:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loan_classification
 								{
 								  "source" : {
 								    "index": "loan-applicants"
 								  },
 								  "dest" : {
 								    "index": "loan-applicants-classified"
 								  },
 								  "analysis" : {
 								    "classification": {
 								      "dependent_variable": "label",
 								      "training_percent": 75,
 								      "num_top_classes": 2
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
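 								===== `allow_lazy_start` example
 								The following request is a minimal sketch of the `allow_lazy_start` option
 								described earlier on this page. It reuses the `logdata` source index from the
 								`loganalytics` example. The `loganalytics_lazy` job ID and the
 								`logdata_out_lazy` destination index are hypothetical names chosen for
 								illustration only:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loganalytics_lazy
 								{
 								  "source": {
 								    "index": "logdata"
 								  },
 								  "dest": {
 								    "index": "logdata_out_lazy"
 								  },
 								  "analysis": {
 								    "outlier_detection": {}
 								  },
 								  "allow_lazy_start": true
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								With this setting, the <<start-dfanalytics>> should not return an error when no
 								{ml} node has capacity for the job. Instead, the job waits in the `starting`
 								state until capacity is available, subject to the cluster-wide
 								`xpack.ml.max_lazy_ml_nodes` setting.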