[DOCS] Adds data frame analytics API and evaluate API resource documentation (#43972)

This PR adds the resource documentation of the data frame analytics APIs and the evaluate API to the ML API doc pool.
This commit is contained in:
István Zoltán Szabó 2019-07-11 18:05:05 +02:00
parent 5f22370b6b
commit 2171b6b47f
6 changed files with 223 additions and 17 deletions

View File

@ -0,0 +1,108 @@
[role="xpack"]
[testenv="platinum"]
[[ml-dfanalytics-resources]]
=== {dfanalytics-cap} job resources
{dfanalytics-cap} resources relate to APIs such as <<put-dfanalytics>> and
<<get-dfanalytics>>.
[discrete]
[[ml-dfanalytics-properties]]
==== {api-definitions-title}
`analysis`::
(object) The type of analysis that is performed on the `source`. For example:
`outlier_detection`. For more information, see <<dfanalytics-types>>.
`analyzed_fields`::
(object) You can specify both `includes` and/or `excludes` patterns. If
`analyzed_fields` is not set, only the relevant fields will be included. For
example all the numeric fields for {oldetection}.
`dest`::
(object) The destination configuration of the analysis. For more information,
see <<dfanalytics-dest-resources>>.
`id`::
(string) The unique identifier for the {dfanalytics-job}. This identifier can
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
underscores. It must start and end with alphanumeric characters. This property
is informational; you cannot change the identifier for existing jobs.
`model_memory_limit`::
(string) The approximate maximum amount of memory resources that are
permitted for analytical processing. The default value for {dfanalytics-jobs}
is `1gb`. If your `elasticsearch.yml` file contains an
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
create {dfanalytics-jobs} that have `model_memory_limit` values greater than
that setting. For more information, see <<ml-settings>>.
`source`::
(object) The source configuration, consisting of `index` and optionally a
`query`. For more information, see <<dfanalytics-source-resources>>.
[[dfanalytics-types]]
==== Analysis objects
{dfanalytics-cap} resources contain `analysis` objects. For example, when you
create a {dfanalytics-job}, you must define the type of analysis it performs.
[discrete]
[[oldetection-resources]]
===== {oldetection-cap} configuration objects
An {oldetection} configuration object has the following properties:
[discrete]
[[oldetection-properties]]
==== {api-definitions-title}
`n_neighbors`::
(integer) Defines the value for how many nearest neighbors each method of
{oldetection} will use to calculate its {olscore}. When the value is
not set, the system will dynamically detect an appropriate value.
`method`::
(string) Sets the method that {oldetection} uses. If the method is not set
{oldetection} uses an ensemble of different methods and normalises and
combines their individual {olscores} to obtain the overall {olscore}.
Available methods are `lof`, `ldof`, `distance_kth_nn`, `distance_knn`.
`feature_influence_threshold`::
(double) The minimum {olscore} that a document needs to have in order to
calculate its {fiscore}.
Value range: 0-1 (`0.1` by default).
[[dfanalytics-dest-resources]]
==== Dest configuration objects
{dfanalytics-cap} resources contain `dest` objects. For example, when you
create a {dfanalytics-job}, you must define its destination.
[discrete]
[[dfanalytics-dest-properties]]
==== {api-definitions-title}
`index`::
(string) The name of the index in which to store the results of the
{dfanalytics-job}.
`results_field`::
(string) The name of the field in which to store the results of the analysis.
The default value is `ml`.
[[dfanalytics-source-resources]]
==== Source configuration objects
The `source` configuration object has the following properties:
`index`::
(array) An array of index names on which to perform the analysis. It can be a
single index or index pattern as well as an array of indices or patterns.
`query`::
(object) The {es} query domain-specific language (DSL). This value
corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is
passed verbatim to {es}. By default, this property has the following
value: `{"match_all": {"boost": 1}}`.

View File

@ -8,15 +8,9 @@
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++
experimental[]
Evaluates the {dfanalytics} for an annotated index.
Evaluates the executed analysis on an index that is already annotated with a
field that contains the results of the analytics (the `ground truth`) for each
{dataframe} row. Evaluation is typically done via calculating a set of metrics
that capture various aspects of the quality of the results over the data for
which we have the `ground truth`. For different types of analyses different
metrics are suitable. This API packages together commonly used metrics for
various analyses.
experimental[]
[[ml-evaluate-dfanalytics-request]]
==== {api-request-title}
@ -30,6 +24,19 @@ various analyses.
information, see {stack-ov}/security-privileges.html[Security privileges] and
{stack-ov}/built-in-roles.html[Built-in roles].
[[ml-evaluate-dfanalytics-desc]]
==== {api-description-title}
This API evaluates the executed analysis on an index that is already annotated
with a field that contains the results of the analytics (the `ground truth`)
for each {dataframe} row.
Evaluation is typically done by calculating a set of metrics that capture various aspects of the quality of the results over the data for which you have the
`ground truth`.
For different types of analyses different metrics are suitable. This API
packages together commonly used metrics for various analyses.
[[ml-evaluate-dfanalytics-request-body]]
==== {api-request-body-title}
@ -38,8 +45,22 @@ information, see {stack-ov}/security-privileges.html[Security privileges] and
`evaluation` (Required)::
(object) Defines the type of evaluation you want to perform. For example:
`binary_soft_classification`.
See Evaluate API resources.
`binary_soft_classification`. See <<ml-evaluate-dfanalytics-resources>>.
[[ml-evaluate-dfanalytics-results]]
==== {api-response-body-title}
`binary_soft_classification`::
(object) If you chose to do binary soft classification, the API returns the
following evaluation metrics:
`auc_roc`::: TBD
`confusion_matrix`::: TBD
`precision`::: TBD
`recall`::: TBD
[[ml-evaluate-dfanalytics-example]]
==== {api-examples-title}

View File

@ -0,0 +1,63 @@
[role="xpack"]
[testenv="platinum"]
[[ml-evaluate-dfanalytics-resources]]
=== {dfanalytics-cap} evaluation resources
Evaluation configuration objects relate to the <<evaluate-dfanalytics>>.
[discrete]
[[ml-evaluate-dfanalytics-properties]]
==== {api-definitions-title}
`evaluation`::
(object) Defines the type of evaluation you want to perform. The value of this
object can be different depending on the type of evaluation you want to
perform. For example, it can contain <<binary-sc-resources>>.
[[binary-sc-resources]]
==== Binary soft classification configuration objects
Binary soft classification evaluates the results of an analysis which outputs
the probability that each {dataframe} row belongs to a certain class. For
example, in the context of outlier detection, the analysis outputs the
probability whether each row is an outlier.
[discrete]
[[binary-sc-resources-properties]]
===== {api-definitions-title}
`actual_field`::
(string) The field of the `index` which contains the `ground
truth`. The data type of this field can be boolean or integer. If the data
type is integer, the value has to be either `0` (false) or `1` (true).
`predicted_probability_field`::
(string) The field of the `index` that defines the probability of whether the
item belongs to the class in question or not. It's the field that contains the
results of the analysis.
`metrics`::
(object) Specifies the metrics that are used for the evaluation. Available
metrics:
`auc_roc`::
(object) The AUC ROC (area under the curve of the receiver operating
characteristic) score and optionally the curve.
Default value is {"includes_curve": false}.
`precision`::
(object) Set the different thresholds of the {olscore} at where the metric
is calculated.
Default value is {"at": [0.25, 0.50, 0.75]}.
`recall`::
(object) Set the different thresholds of the {olscore} at where the metric
is calculated.
Default value is {"at": [0.25, 0.50, 0.75]}.
`confusion_matrix`::
(object) Set the different thresholds of the {olscore} at where the metrics
(`tp` - true positive, `fp` - false positive, `tn` - true negative, `fn` -
false negative) are calculated.
Default value is {"at": [0.25, 0.50, 0.75]}.

View File

@ -45,6 +45,10 @@ You can get information for all {dfanalytics-jobs} by using _all, by specifying
(string) Identifier for the {dfanalytics-job}. If you do not specify one of
these options, the API returns information for the first hundred
{dfanalytics-jobs}.
`allow_no_match` (Optional)::
(boolean) If `false` and the `data_frame_analytics_id` does not match any
{dfanalytics-job} an error will be returned. The default value is `true`.
[[ml-get-dfanalytics-query-params]]
==== {api-query-parms-title}
@ -60,6 +64,13 @@ You can get information for all {dfanalytics-jobs} by using _all, by specifying
`size` (Optional)::
(integer) Specifies the maximum number of {dfanalytics-jobs} to obtain. The
default value is `100`.
[[ml-get-dfanalytics-results]]
==== {api-response-body-title}
`data_frame_analytics`::
(array) An array of {dfanalytics-job} resources. For more information, see
<<ml-dfanalytics-resources>>.
[[ml-get-dfanalytics-example]]
==== {api-examples-title}

View File

@ -56,24 +56,23 @@ and mappings.
[[ml-put-dfanalytics-request-body]]
==== {api-request-body-title}
`analysis` (Required)::
(object) Defines the type of {dfanalytics} you want to perform on your source
index. For example: `outlier_detection`.
See {oldetection} resources.
index. For example: `outlier_detection`. See <<dfanalytics-types>>.
`analyzed_fields` (Optional)::
(object) You can specify both `includes` and/or `excludes` patterns. If
`analyzed_fields` is not set, only the relevant fileds will be included. For
example all the numeric fields for {oldetection}.
`analyzed_fields` is not set, only the relevant fields will be included. For
example, all the numeric fields for {oldetection}.
`dest` (Required)::
(object) The destination configuration, consisting of `index` and optionally
`results_field` (`ml` by default).
`results_field` (`ml` by default). See <<dfanalytics-dest-resources>>.
`source` (Required)::
(object) The source configuration, consisting of `index` and optionally a
`query`.
`query`. See <<dfanalytics-source-resources>>.
[[ml-put-dfanalytics-example]]
==== {api-examples-title}

View File

@ -8,7 +8,9 @@ These resource definitions are used in APIs related to {ml-features} and
* <<ml-calendar-resource,Calendars>>
* <<ml-datafeed-resource,{dfeeds-cap}>>
* <<ml-datafeed-counts,{dfeed-cap} counts>>
* <<ml-dfanalytics-resources,{dfanalytics-cap}>>
* <<data-frame-transform-resource,{dataframe-transforms-cap}>>
* <<ml-evaluate-dfanalytics-resources,Evaluate {dfanalytics}>>
* <<ml-filter-resource,Filters>>
* <<ml-job-resource,Jobs>>
* <<ml-jobstats,Job statistics>>
@ -19,7 +21,9 @@ These resource definitions are used in APIs related to {ml-features} and
include::{es-repo-dir}/ml/apis/calendarresource.asciidoc[]
include::{es-repo-dir}/ml/apis/datafeedresource.asciidoc[]
include::{es-repo-dir}/ml/apis/dfanalyticsresources.asciidoc[]
include::{es-repo-dir}/data-frames/apis/transformresource.asciidoc[]
include::{es-repo-dir}/ml/apis/evaluateresources.asciidoc[]
include::{es-repo-dir}/ml/apis/filterresource.asciidoc[]
include::{es-repo-dir}/ml/apis/jobresource.asciidoc[]
include::{es-repo-dir}/ml/apis/jobcounts.asciidoc[]