OpenSearch/docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

[role="xpack"]
[testenv="platinum"]
[[put-dfanalytics]]
=== Create {dfanalytics-jobs} API
[subs="attributes"]
++++
<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
++++

Instantiates a {dfanalytics-job}.

experimental[]

[[ml-put-dfanalytics-request]]
==== {api-request-title}

`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`


[[ml-put-dfanalytics-prereq]]
==== {api-prereq-title}

* You must have the `machine_learning_admin` built-in role to use this API. You
must also have `read` and `view_index_metadata` privileges on the source index
and `read`, `create_index`, and `index` privileges on the destination index. For
more information, see <<security-privileges>> and <<built-in-roles>>.


[[ml-put-dfanalytics-desc]]
==== {api-description-title}

This API creates a {dfanalytics-job} that performs an analysis on the source 
index and stores the outcome in a destination index.

The destination index is created automatically if it does not exist. The
`index.number_of_shards` and `index.number_of_replicas` settings of the source
index are copied to the destination index. When the source index pattern
matches multiple indices, these settings are set to the maximum values found in
the source indices.

The API also attempts to copy the mappings of the source indices to the
destination index. However, if the mappings of any field don't match among the
source indices, the attempt fails with an error message.

If the destination index already exists, it is used as is. This makes it
possible to set up the destination index in advance with custom settings and
mappings.
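
As a minimal sketch of setting up the destination index in advance, the
following request creates an index before the {dfanalytics-job} exists. The
index name `df-flight-delays-custom` and the setting values are assumptions
chosen for illustration. A job whose `dest.index` names this index would then
use it as is.

[source,console]
--------------------------------------------------
PUT df-flight-delays-custom
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]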

[[ml-put-dfanalytics-supported-fields]]
===== Supported fields

====== {oldetection-cap}

{oldetection-cap} requires numeric or boolean data to analyze. The algorithms
don't support missing values; therefore, fields that have data types other than
numeric or boolean are ignored. Documents where included fields contain missing
values, null values, or an array are also ignored. As a result, the `dest` index
may contain documents that don't have an {olscore}.
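
One possible way to keep such documents out of the destination index is to
filter them with the `source.query` option described in the request body
section. The following sketch assumes a hypothetical numeric field named
`response_time` and a hypothetical job and destination index name. It only
checks that the field exists, so documents where the field holds an array would
still be copied and left without a score.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics-complete
{
  "source": {
    "index": "logdata",
    "query": {
      "bool": {
        "filter": [
          { "exists": { "field": "response_time" } }
        ]
      }
    }
  },
  "dest": {
    "index": "logdata_complete_out"
  },
  "analysis": {
    "outlier_detection": {}
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]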


====== {regression-cap}

{regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
and `ip`. It is also tolerant of missing values. Supported fields are included
in the analysis; other fields are ignored. Documents where included fields
contain an array with two or more values are also ignored. Documents in the
`dest` index that don't contain a results field are not included in the
{reganalysis}.


====== {classification-cap}

{classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Supported fields
are included in the analysis; other fields are ignored. Documents where
included fields contain an array with two or more values are also ignored.
Documents in the `dest` index that don't contain a results field are not
included in the {classanalysis}.

{classanalysis-cap} can be improved by mapping ordinal variable values to a
single number. For example, in the case of age ranges, you can model the values
as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.

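The ordinal mapping described above could be applied at index time, for
instance with an ingest pipeline. This is only a sketch of one option, not a
required step: the pipeline name and the field names `age_range` and
`age_ordinal` are hypothetical, and the script leaves documents with an
unrecognized range unchanged.

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/age-range-to-ordinal
{
  "description": "Maps age range labels to ordinal numbers (illustrative)",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (params.mapping.containsKey(ctx.age_range)) { ctx.age_ordinal = params.mapping[ctx.age_range]; }",
        "params": {
          "mapping": {
            "0-14": 0,
            "15-24": 1,
            "25-34": 2
          }
        }
      }
    }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative sketch]
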

[[ml-put-dfanalytics-path-params]]
==== {api-path-parms-title}

`<data_frame_analytics_id>`::
  (Required, string) A string that uniquely identifies the {dfanalytics-job}.
  This identifier can contain lowercase alphanumeric characters (a-z and 0-9),
  hyphens, and underscores. It must start and end with alphanumeric characters.


[[ml-put-dfanalytics-request-body]]
==== {api-request-body-title}

`analysis`::
  (Required, object) Defines the type of {dfanalytics} you want to perform on 
  your source index. For example: `outlier_detection`. See 
  <<dfanalytics-types>>.
  
`analyzed_fields`::
  (Optional, object) Specify `includes` and/or `excludes` patterns to select
  which fields will be included in the analysis. If `analyzed_fields` is not
  set, only the relevant fields will be included, for example, all the numeric
  fields for {oldetection}. For the supported field types, see
  <<ml-put-dfanalytics-supported-fields>>. Also see <<explain-dfanalytics>>,
  which helps you understand field selection.

  `includes`:::
    (Optional, array) An array of strings that defines the fields that will be 
    included in the analysis.
    
  `excludes`:::
    (Optional, array) An array of strings that defines the fields that will be
    excluded from the analysis. You do not need to add fields with unsupported
    data types to `excludes`; these fields are excluded from the analysis
    automatically.

`description`::
  (Optional, string) A description of the job.

`dest`::
  (Required, object) The destination configuration, consisting of `index` and 
  optionally `results_field` (`ml` by default).
  
    `index`:::
      (Required, string) Defines the _destination index_ to store the results of 
      the {dfanalytics-job}.
    
    `results_field`:::
      (Optional, string) Defines the name of the field in which to store the
      results of the analysis. Defaults to `ml`.
  
`model_memory_limit`::
  (Optional, string) The approximate maximum amount of memory resources that are 
  permitted for analytical processing. The default value for {dfanalytics-jobs} 
  is `1gb`. If your `elasticsearch.yml` file contains an 
  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to 
  create {dfanalytics-jobs} that have `model_memory_limit` values greater than 
  that setting. For more information, see <<ml-settings>>.
  
`source`::
  (Required, object) The configuration of how to source the analysis data. It
  requires an `index`. Optionally, `query` and `_source` may be specified.
  
  `index`:::
    (Required, string or array) Index or indices on which to perform the 
    analysis. It can be a single index or index pattern as well as an array of 
    indices or patterns.
    
  `query`:::
    (Optional, object) The {es} query domain-specific language 
    (<<query-dsl,DSL>>). This value corresponds to the query object in an {es} 
    search POST body. All the options that are supported by {es} can be used, 
    as this object is passed verbatim to {es}. By default, this property has 
    the following value: `{"match_all": {}}`.

  `_source`:::
    (Optional, object) Specify `includes` and/or `excludes` patterns to select
    which fields will be present in the destination. Fields that are excluded
    cannot be included in the analysis.
        
      `includes`::::
        (Optional, array) An array of strings that defines the fields that will
        be included in the destination.

      `excludes`::::
        (Optional, array) An array of strings that defines the fields that will
        be excluded from the destination.

`allow_lazy_start`::
  (Optional, boolean) Whether this job should be allowed to start when there
  is insufficient {ml} node capacity for it to be immediately assigned to a node.
  The default is `false`, which means that the <<start-dfanalytics>>
  will return an error if a {ml} node with capacity to run the
  job cannot immediately be found. This behavior is also subject to
  the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting; see
  <<advanced-ml-settings>>. If this option is set to `true`, the
  <<start-dfanalytics>> will not return an error, and the job will
  wait in the `starting` state until sufficient {ml} node capacity
  is available.
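
Before creating a job, the explain API referenced in the `analyzed_fields`
description can show which fields would be selected for a given configuration.
The following is a sketch under assumptions: it reuses the index and dependent
variable from the regression example on this page, and the excluded field
`listing_id` is hypothetical.

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/_explain
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  },
  "analyzed_fields": {
    "excludes": [
      "listing_id"
    ]
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]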


[[ml-put-dfanalytics-example]]
==== {api-examples-title}


[[ml-put-dfanalytics-example-preprocess]]
===== Preprocessing actions example

The following example shows how to limit the scope of the analysis to certain 
fields, specify excluded fields in the destination index, and use a query to 
filter your data before analysis.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/model-flight-delays-pre
{
  "source": {
    "index": [
      "kibana_sample_data_flights" <1>
    ],
    "query": { <2>
      "range": {
        "DistanceKilometers": { 
          "gt": 0
        }
      }
    },
    "_source": { <3>
      "includes": [],
      "excludes": [
        "FlightDelay",
        "FlightDelayType"
      ]
    }
  },
  "dest": { <4>
    "index": "df-flight-delays",
    "results_field": "ml-results"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin",
      "training_percent": 90
    }
  },
  "analyzed_fields": { <5>
    "includes": [],
    "excludes": [   
      "FlightNum"
    ]
  },
  "model_memory_limit": "100mb"
}
--------------------------------------------------
// TEST[skip:setup kibana sample data]

<1> The source index to analyze.
<2> This query filters out entire documents that will not be present in the 
destination index.
<3> The `_source` object defines fields in the dataset that will be included or 
excluded in the destination index. In this case, `includes` does not specify any 
fields, so the default behavior takes place: all the fields of the source index
will be included except the ones that are explicitly specified in `excludes`.
<4> Defines the destination index that contains the results of the analysis and 
the fields of the source index specified in the `_source` object. Also defines 
the name of the `results_field`.
<5> Specifies fields to be included in or excluded from the analysis. This does 
not affect whether the fields will be present in the destination index; it only
affects whether they are used in the analysis.

In this example, all the fields of the source index are included in the
destination index except `FlightDelay` and `FlightDelayType`, because these are
defined as excluded fields by the `excludes` parameter of the `_source` object.
The `FlightNum` field is included in the destination index. However, it is not
included in the analysis because it is explicitly specified as an excluded
field by the `excludes` parameter of the `analyzed_fields` object.


[[ml-put-dfanalytics-example-od]]
===== {oldetection-cap} example

The following example creates the `loganalytics` {dfanalytics-job} with the
`outlier_detection` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics
{
  "description": "Outlier detection on log data",
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[setup:setup_logdata]


The API returns the following result:

[source,console-result]
----
{
  "id" : "loganalytics",
  "description": "Outlier detection on log data",
  "source" : {
    "index" : [
      "logdata"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "logdata_out",
    "results_field" : "ml"
  },
  "analysis": {
      "outlier_detection": {
          "compute_feature_influence": true,
          "outlier_fraction": 0.05,
          "standardization_enabled": true
      }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1562351429434,
  "version" : "7.3.0",
  "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1562351429434/$body.$_path/]
// TESTRESPONSE[s/"version" : "7.3.0"/"version" : $body.version/]


[[ml-put-dfanalytics-example-r]]
===== {regression-cap} examples

The following example creates the `house_price_regression_analysis`
{dfanalytics-job} with the `regression` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]


The API returns the following result:

[source,console-result]
----
{
  "id" : "house_price_regression_analysis",
  "source" : {
    "index" : [
      "houses_sold_last_10_yrs"
    ],
    "query" : {
      "match_all" : { }
    }
  },
  "dest" : {
    "index" : "house_price_predictions",
    "results_field" : "ml"
  },
  "analysis" : {
    "regression" : {
      "dependent_variable" : "price",
      "training_percent" : 100
    }
  },
  "model_memory_limit" : "1gb",
  "create_time" : 1567168659127,
  "version" : "8.0.0",
  "allow_lazy_start" : false
}
----
// TESTRESPONSE[s/1567168659127/$body.$_path/]
// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]


The following example creates a job and specifies a training percent:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
{
  "source": {
    "index": "student_performance_mathematics"
  },
  "dest": {
    "index": "student_performance_mathematics_reg"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "G3",
      "training_percent": 70  <1>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The `training_percent` defines the percentage of the data set that will be used 
for training the model.


[[ml-put-dfanalytics-example-c]]
===== {classification-cap} example

The following example creates the `loan_classification` {dfanalytics-job} with
the `classification` analysis type:

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification
{
  "source" : {
    "index": "loan-applicants"
  },
  "dest" : {
    "index": "loan-applicants-classified"
  },
  "analysis" : {
    "classification": {
      "dependent_variable": "label",
      "training_percent": 75,
      "num_top_classes": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
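
The optional `model_memory_limit` and `allow_lazy_start` settings from the
request body section can be combined with any analysis type. The following
sketch repeats the classification job with both settings. The job ID, the
destination index name, and the `200mb` value are assumptions for illustration,
not recommended values.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loan_classification_lazy
{
  "source": {
    "index": "loan-applicants"
  },
  "dest": {
    "index": "loan-applicants-classified-lazy"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "label",
      "training_percent": 75,
      "num_top_classes": 2
    }
  },
  "model_memory_limit": "200mb",
  "allow_lazy_start": true
}
--------------------------------------------------
// TEST[skip:illustrative sketch]

With `allow_lazy_start` set to `true`, starting this job when no {ml} node has
capacity would leave it waiting in the `starting` state instead of returning an
error, as described for that parameter.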
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[role="xpack"]
 								[testenv="platinum"]
 								[[put-dfanalytics]]
 								=== Create {dfanalytics-jobs} API
 								[subs="attributes"]
 								++++
 								<titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
 								++++
 								Instantiates a {dfanalytics-job}.
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								experimental[]
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-request]]
 								==== {api-request-title}
 								`PUT _ml/data_frame/analytics/<data_frame_analytics_id>`
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-prereq]]
 								==== {api-prereq-title}
 								* You must have `machine_learning_admin` built-in role to use this API. You must
 								also have `read` and `view_index_metadata` privileges on the source index and
 								`read`, `create_index`, and `index` privileges on the destination index. For
-												[DOCS] Cleans up links to security content (#47610) (#47703)


											
										
										
											2019-10-07 15:23:19 -07:00
+								more information, see <<security-privileges>> and <<built-in-roles>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-desc]]
 								==== {api-description-title}
 								This API creates a {dfanalytics-job} that performs an analysis on the source
 								index and stores the outcome in a destination index.
 								The destination index will be automatically created if it does not exist. The
 								`index.number_of_shards` and `index.number_of_replicas` settings of the source
 								index will be copied over the destination index. When the source index matches
 								multiple indices, these settings will be set to the maximum values found in the
 								source indices.
 								The mappings of the source indices are also attempted to be copied over
 								to the destination index, however, if the mappings of any of the fields don't
 								match among the source indices, the attempt will fail with an error message.
 								If the destination index already exists, then it will be use as is. This makes
 								it possible to set up the destination index in advance with custom settings
 								and mappings.
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
+								[[ml-put-dfanalytics-supported-fields]]
 								===== Supported fields
 								====== {oldetection-cap}
 								{oldetection-cap} requires numeric or boolean data to analyze. The algorithms
 								don't support missing values therefore fields that have data types other than
 								numeric or boolean are ignored. Documents where included fields contain missing
 								values, null values, or an array are also ignored. Therefore the `dest` index
 								may contain documents that don't have an {olscore}.
 								====== {regression-cap}
-												[DOCS] Fixes data type formatting

											
										
										
											2019-11-26 08:21:39 -08:00
+								{regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
 								and `ip`. It is also tolerant of missing values. Fields that are supported are
 								included in the analysis, other fields are ignored. Documents where included
 								fields contain  an array with two or more values are also ignored. Documents in
 								the `dest` index  that don’t contain a results field are not included in the
 								 {reganalysis}.
-												[DOCS] Adds supported fields section to the PUT DFA API description (#47842)



											
										
										
											2019-10-10 12:34:39 +02:00
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241)


											
										
										
											2019-11-06 07:40:27 -05:00
+								====== {classification-cap}
-												[DOCS] Fixes data type formatting

											
										
										
											2019-11-26 08:21:39 -08:00
+								{classification-cap} supports fields that are numeric, `boolean`, `text`,
 								`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
 								supported are included in the analysis, other fields are ignored. Documents
 								where included fields contain an array with two or more values are also ignored.
 								Documents in the `dest` index that don’t contain a results field are not
 								included in the {classanalysis}.
-												[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241)


											
										
										
											2019-11-06 07:40:27 -05:00
 								{classanalysis-cap} can be improved by mapping ordinal variable values to a
 								single number. For example, in case of age ranges, you can model the values as
 								"0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-path-params]]
 								==== {api-path-parms-title}
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`<data_frame_analytics_id>`::
 								  (Required, string) A numerical character string that uniquely identifies the
 								  {dfanalytics-job}. This identifier can contain lowercase alphanumeric
 								  characters (a-z and 0-9), hyphens, and underscores. It must start and end with
 								  alphanumeric characters.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
+								[[ml-put-dfanalytics-request-body]]
 								==== {api-request-body-title}
-												[DOCS] Adds data frame analytics API and evaluate API resource documentation (#43972)

This PR adds the resource documentation of the data frame analytics APIs and the evaluate API to the ML API doc pool.
											
										
										
											2019-07-11 18:05:05 +02:00
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`analysis`::
-												[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791)



											
										
										
											2019-10-09 18:13:33 +02:00
+								  (Required, object) Defines the type of {dfanalytics} you want to perform on
 								  your source index. For example: `outlier_detection`. See
 								  <<dfanalytics-types>>.
-												[DOCS] Adds data frame analytics APIs to the ML APIs (#43875)

This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
											
										
										
											2019-07-05 13:34:05 +02:00
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`analyzed_fields`::
-												[7.x][ML] Add optional source filtering during data frame reindexing (#49690) (#49718)

This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531

Backport of #49690
											
										
										
											2019-11-29 16:10:44 +02:00
+								  (Optional, object) Specify `includes` and/or `excludes` patterns to select
-												[DOCS] Adds an example of preprocessing actions to the PUT DFA API docs (#49831)



											
										
										
											2019-12-05 14:15:19 +01:00
+								  which fields will be included in the analysis. If `analyzed_fields` is not
 								  set, only the relevant fields will be included. For example, all the numeric
 								  fields for {oldetection}. For the supported field types, see
 								  <<ml-put-dfanalytics-supported-fields>>. Also see the <<explain-dfanalytics>>
 								  which helps understand field selection.
-												[7.x][ML] Add optional source filtering during data frame reindexing (#49690) (#49718)

This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531

Backport of #49690
											
										
										
											2019-11-29 16:10:44 +02:00
-												[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791)



											
										
										
											2019-10-09 18:13:33 +02:00
+								  `includes`:::
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								    (Optional, array) An array of strings that defines the fields that will be
 								    included in the analysis.
-												[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791)



											
										
										
											2019-10-09 18:13:33 +02:00
+								  `excludes`:::
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								    (Optional, array) An array of strings that defines the fields that will be
-												[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307)


											
										
										
											2019-11-11 09:53:59 -05:00
+								    excluded from the analysis. You do not need to add fields with unsupported
 								    data types to `excludes`, these fields are excluded from the analysis
 								    automatically.
-												[ML] Add description to DF analytics (#45774) (#46019)


											
										
										
											2019-08-27 15:48:59 +03:00
 								`description`::
 								  (Optional, string) A description of the job.
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`dest`::
-												[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)

This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
											
										
										
											2019-07-26 11:39:59 +02:00
+								  (Required, object) The destination configuration, consisting of `index` and
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
+								  optionally `results_field` (`ml` by default).
 								    `index`:::
 								      (Required, string) Defines the _destination index_ to store the results of
 								      the {dfanalytics-job}.
 								    `results_field`:::
 								      (Optional, string) Defines the name of the field in which to store the
 								      results of the analysis. Default to `ml`.
-												[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)

This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
											
										
										
											2019-07-26 11:39:59 +02:00
 								`model_memory_limit`::
 								  (Optional, string) The approximate maximum amount of memory resources that are
 								  permitted for analytical processing. The default value for {dfanalytics-jobs}
 								  is `1gb`. If your `elasticsearch.yml` file contains an
 								  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
 								  create {dfanalytics-jobs} that have `model_memory_limit` values greater than
 								  that setting. For more information, see <<ml-settings>>.
-												[DOCS] Fixes formatting in data frame analytics API

											
										
										
											2019-07-10 17:58:17 -07:00
-												[DOCS] Reformats API parameter details (#44194)


											
										
										
											2019-07-12 08:26:31 -07:00
+								`source`::
-												[DOCS] Adds an example of preprocessing actions to the PUT DFA API docs (#49831)



											
										
										
											2019-12-05 14:15:19 +01:00
+								  (object) The configuration of how to source the analysis data. It requires an
 								  `index`. Optionally, `query` and `_source` may be specified.
-												[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)

* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.

											
										
										
											2019-08-29 14:38:14 +02:00
-												[7.x][ML] Add optional source filtering during data frame reindexing (#49690) (#49718)

This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531

Backport of #49690
											
										
										
											2019-11-29 16:10:44 +02:00
+								  `index`:::
 								    (Required, string or array) Index or indices on which to perform the
 								    analysis. It can be a single index or index pattern as well as an array of
 								    indices or patterns.
 								  `query`:::
 								    (Optional, object) The {es} query domain-specific language
 								    (<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
 								    search POST body. All the options that are supported by {es} can be used,
 								    as this object is passed verbatim to {es}. By default, this property has
 								    the following value: `{"match_all": {}}`.
 								  `_source`:::
 								    (Optional, object) Specify `includes` and/or `excludes` patterns to select
 								    which fields will be present in the destination. Fields that are excluded
 								    cannot be included in the analysis.
 								      `includes`::::
 								        (array) An array of strings that defines the fields that will be
 								        included in the destination.
 								      `excludes`::::
 								        (array) An array of strings that defines the fields that will be
 								        excluded from the destination.
 								`allow_lazy_start`::
 								  (Optional, boolean) Whether this job should be allowed to start when there
 								  is insufficient {ml} node capacity for it to be immediately assigned to a node.
 								  The default is `false`, which means that the <<start-dfanalytics>>
 								  will return an error if a {ml} node with capacity to run the
 								  job cannot immediately be found. (However, this is also subject to
 								  the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting - see
 								  <<advanced-ml-settings>>.) If this option is set to `true` then
 								  the <<start-dfanalytics>> will not return an error, and the job will
 								  wait in the `starting` state until sufficient {ml} node capacity
 								  is available.
 								[[ml-put-dfanalytics-example]]
 								==== {api-examples-title}
 								[[ml-put-dfanalytics-example-preprocess]]
 								===== Preprocessing actions example
 								The following example shows how to limit the scope of the analysis to certain
 								fields, specify excluded fields in the destination index, and use a query to
 								filter your data before analysis.
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/model-flight-delays-pre
 								{
 								  "source": {
 								    "index": [
 								      "kibana_sample_data_flights" <1>
 								    ],
 								    "query": { <2>
 								      "range": {
 								        "DistanceKilometers": {
 								          "gt": 0
 								        }
 								      }
 								    },
 								    "_source": { <3>
 								      "includes": [],
 								      "excludes": [
 								        "FlightDelay",
 								        "FlightDelayType"
 								      ]
 								    }
 								  },
 								  "dest": { <4>
 								    "index": "df-flight-delays",
 								    "results_field": "ml-results"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "FlightDelayMin",
 								      "training_percent": 90
 								    }
 								  },
 								  "analyzed_fields": { <5>
 								    "includes": [],
 								    "excludes": [
 								      "FlightNum"
 								    ]
 								  },
 								  "model_memory_limit": "100mb"
 								}
 								--------------------------------------------------
 								// TEST[skip:setup kibana sample data]
 								<1> The source index to analyze.
 								<2> This query filters out entire documents that will not be present in the
 								destination index.
 								<3> The `_source` object defines fields in the dataset that will be included or
 								excluded in the destination index. In this case, `includes` does not specify any
 								fields, so the default behavior takes place: all the fields of the source index
 								will be included except the ones that are explicitly specified in `excludes`.
 								<4> Defines the destination index that contains the results of the analysis and
 								the fields of the source index specified in the `_source` object. Also defines
 								the name of the `results_field`.
 								<5> Specifies fields to be included in or excluded from the analysis. This does
 								not affect whether the fields will be present in the destination index; it only
 								affects whether they are used in the analysis.
 								In this example, all the fields of the source index are included in the
 								destination index except `FlightDelay` and `FlightDelayType`, because the
 								`excludes` parameter of the `_source` object defines them as excluded fields.
 								The `FlightNum` field is included in the destination index; however, it is not
 								included in the analysis because the `excludes` parameter of the
 								`analyzed_fields` object explicitly specifies it as an excluded field.
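
 								As a hedged variation on the request above, the following sketch selects fields
 								with `includes` instead of `excludes`, so that only the listed fields are
 								present in the destination index. The job ID `model-flight-delays-includes` and
 								the destination index `df-flight-delays-includes` are hypothetical names chosen
 								for illustration, and the field list assumes the Kibana sample flight data.

 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/model-flight-delays-includes
 								{
 								  "source": {
 								    "index": "kibana_sample_data_flights",
 								    "_source": {
 								      "includes": [
 								        "FlightDelayMin",
 								        "DistanceKilometers",
 								        "FlightTimeMin",
 								        "Carrier"
 								      ]
 								    }
 								  },
 								  "dest": {
 								    "index": "df-flight-delays-includes"
 								  },
 								  "analysis": {
 								    "regression": {
 								      "dependent_variable": "FlightDelayMin"
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:setup kibana sample data]

 								Because fields that are excluded from the destination cannot be included in the
 								analysis, the dependent variable `FlightDelayMin` is listed in `includes`.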
 								[[ml-put-dfanalytics-example-od]]
 								===== {oldetection-cap} example
 								The following example creates the `loganalytics` {dfanalytics-job}. The analysis
 								type is `outlier_detection`:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loganalytics
 								{
 								  "description": "Outlier detection on log data",
 								  "source": {
 								    "index": "logdata"
 								  },
 								  "dest": {
 								    "index": "logdata_out"
 								  },
 								  "analysis": {
 								    "outlier_detection": {
 								      "compute_feature_influence": true,
 								      "outlier_fraction": 0.05,
 								      "standardization_enabled": true
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[setup:setup_logdata]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id" : "loganalytics",
 								  "description": "Outlier detection on log data",
 								  "source" : {
 								    "index" : [
 								      "logdata"
 								    ],
 								    "query" : {
 								      "match_all" : { }
 								    }
 								  },
 								  "dest" : {
 								    "index" : "logdata_out",
 								    "results_field" : "ml"
 								  },
 								  "analysis": {
 								    "outlier_detection": {
 								      "compute_feature_influence": true,
 								      "outlier_fraction": 0.05,
 								      "standardization_enabled": true
 								    }
 								  },
 								  "model_memory_limit" : "1gb",
 								  "create_time" : 1562351429434,
 								  "version" : "7.3.0",
 								  "allow_lazy_start" : false
 								}
 								----
 								// TESTRESPONSE[s/1562351429434/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "7.3.0"/"version" : $body.version/]
 								[[ml-put-dfanalytics-example-r]]
 								===== {regression-cap} examples
 								The following example creates the `house_price_regression_analysis`
 								{dfanalytics-job}. The analysis type is `regression`:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/house_price_regression_analysis
 								{
 								  "source": {
 								    "index": "houses_sold_last_10_yrs"
 								  },
 								  "dest": {
 								    "index": "house_price_predictions"
 								  },
 								  "analysis":
 								    {
 								      "regression": {
 								        "dependent_variable": "price"
 								      }
 								    }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								The API returns the following result:
 								[source,console-result]
 								----
 								{
 								  "id" : "house_price_regression_analysis",
 								  "source" : {
 								    "index" : [
 								      "houses_sold_last_10_yrs"
 								    ],
 								    "query" : {
 								      "match_all" : { }
 								    }
 								  },
 								  "dest" : {
 								    "index" : "house_price_predictions",
 								    "results_field" : "ml"
 								  },
 								  "analysis" : {
 								    "regression" : {
 								      "dependent_variable" : "price",
 								      "training_percent" : 100
 								    }
 								  },
 								  "model_memory_limit" : "1gb",
 								  "create_time" : 1567168659127,
 								  "version" : "8.0.0",
 								  "allow_lazy_start" : false
 								}
 								----
 								// TESTRESPONSE[s/1567168659127/$body.$_path/]
 								// TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]
 								The following example creates a job and specifies a training percent:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
 								{
 								 "source": {
 								   "index": "student_performance_mathematics"
 								 },
 								 "dest": {
 								   "index":"student_performance_mathematics_reg"
 								 },
 								 "analysis":
 								   {
 								     "regression": {
 								       "dependent_variable": "G3",
 								       "training_percent": 70  <1>
 								     }
 								   }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
 								<1> The `training_percent` defines the percentage of the data set that will be used
 								for training the model.
 								[[ml-put-dfanalytics-example-c]]
 								===== {classification-cap} example
 								The following example creates the `loan_classification` {dfanalytics-job}. The
 								analysis type is `classification`:
 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loan_classification
 								{
 								  "source" : {
 								    "index": "loan-applicants"
 								  },
 								  "dest" : {
 								    "index": "loan-applicants-classified"
 								  },
 								  "analysis" : {
 								    "classification": {
 								      "dependent_variable": "label",
 								      "training_percent": 75,
 								      "num_top_classes": 2
 								    }
 								  }
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]
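
 								The request above can also be combined with the `allow_lazy_start` and
 								`model_memory_limit` parameters. The following is a minimal sketch under that
 								assumption; the job ID `loan_classification_lazy`, the destination index
 								`loan-applicants-classified-lazy`, and the `2gb` limit are hypothetical values
 								chosen for illustration.

 								[source,console]
 								--------------------------------------------------
 								PUT _ml/data_frame/analytics/loan_classification_lazy
 								{
 								  "source" : {
 								    "index": "loan-applicants"
 								  },
 								  "dest" : {
 								    "index": "loan-applicants-classified-lazy"
 								  },
 								  "analysis" : {
 								    "classification": {
 								      "dependent_variable": "label",
 								      "training_percent": 75,
 								      "num_top_classes": 2
 								    }
 								  },
 								  "model_memory_limit": "2gb",
 								  "allow_lazy_start": true
 								}
 								--------------------------------------------------
 								// TEST[skip:TBD]

 								With `allow_lazy_start` set to `true`, starting this job does not return an
 								error when no {ml} node has enough capacity; the job waits in the `starting`
 								state instead. Creating the job fails if `2gb` exceeds an
 								`xpack.ml.max_model_memory_limit` setting in `elasticsearch.yml`.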