[DOCS] Removes unshared sections from ml-shared.asciidoc (#55192)

Lisa Cawley 2020-04-14 18:47:09 -07:00 committed by GitHub
parent f49354b7d7
commit 2910d01179
5 changed files with 257 additions and 311 deletions


@ -71,11 +71,50 @@ The API returns a response that contains the following:
`field_selection`::
(array)
include::{docdir}/ml/ml-shared.asciidoc[tag=field-selection]
An array of objects that explain selection for each field, sorted by
the field names.
+
.Properties of `field_selection` objects
[%collapsible%open]
====
`is_included`:::
(boolean) Whether the field is selected to be included in the analysis.
`is_required`:::
(boolean) Whether the field is required.
`feature_type`:::
(string) The feature type of this field for the analysis. May be `categorical`
or `numerical`.
`mapping_types`:::
(string) The mapping types of the field.
`name`:::
(string) The field name.
`reason`:::
(string) The reason a field is not selected to be included in the analysis.
====
`memory_estimation`::
(object)
include::{docdir}/ml/ml-shared.asciidoc[tag=memory-estimation]
(object)
An object containing the memory estimates.
+
.Properties of `memory_estimation`
[%collapsible%open]
====
`expected_memory_with_disk`:::
(string) Estimated memory usage under the assumption that overflowing to disk
is allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller
than `expected_memory_without_disk` because using disk allows limiting the main
memory needed to perform {dfanalytics}.
`expected_memory_without_disk`:::
(string) Estimated memory usage under the assumption that the whole
{dfanalytics} should happen in memory (i.e. without overflowing to disk).
====
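
For orientation, an abridged response with both objects might look like the
following sketch. The field names, reasons, and memory values are hypothetical
placeholders; a real response contains one `field_selection` entry per field
covered by the analysis.

[source,js]
----
{
  "field_selection": [
    {
      "name": "number_of_rooms",
      "is_included": true,
      "is_required": false,
      "feature_type": "numerical"
    },
    {
      "name": "description",
      "is_included": false,
      "is_required": false,
      "reason": "unsupported type"
    }
  ],
  "memory_estimation": {
    "expected_memory_with_disk": "128mb",
    "expected_memory_without_disk": "256mb"
  }
}
----
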
[[ml-explain-dfanalytics-example]]


@ -75,8 +75,93 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=size]
==== {api-response-body-title}
`data_frame_analytics`::
(array)
include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics]
(array)
An array of {dfanalytics-job} resources, which are sorted by the `id` value in
ascending order.
+
.Properties of {dfanalytics-job} resources
[%collapsible%open]
====
`analysis`:::
(object) The type of analysis that is performed on the `source`.
//Begin analyzed_fields
`analyzed_fields`:::
(object) Contains `includes` and/or `excludes` patterns that select which fields
are included in the analysis.
+
.Properties of `analyzed_fields`
[%collapsible%open]
=====
`excludes`:::
(Optional, array) An array of strings that defines the fields that are excluded
from the analysis.
`includes`:::
(Optional, array) An array of strings that defines the fields that are included
in the analysis.
=====
//End analyzed_fields
//Begin dest
`dest`:::
(object) The destination configuration of the analysis.
+
.Properties of `dest`
[%collapsible%open]
=====
`index`:::
(string) The _destination index_ that stores the results of the
{dfanalytics-job}.
`results_field`:::
(string) The name of the field that stores the results of the analysis. Defaults
to `ml`.
=====
//End dest
`id`:::
(string) The unique identifier of the {dfanalytics-job}.
`model_memory_limit`:::
(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.
`source`:::
(object) The configuration of how the analysis data is sourced. It has an
`index` parameter and optionally a `query` and a `_source`.
+
.Properties of `source`
[%collapsible%open]
=====
`index`:::
(array) Index or indices on which to perform the analysis. It can be a single
index or index pattern as well as an array of indices or patterns.
`query`:::
(object) The query that has been specified for the {dfanalytics-job}, written
in the {es} query domain-specific language (<<query-dsl,DSL>>). This value
corresponds to the query object in an {es} search POST body. By default, this
property has the following value: `{"match_all": {}}`.
`_source`:::
(object) Contains the specified `includes` and/or `excludes` patterns that
select which fields are present in the destination. Fields that are excluded
cannot be included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
======
`excludes`:::
(array) An array of strings that defines the fields that are excluded from the
destination.
`includes`:::
(array) An array of strings that defines the fields that are included in the
destination.
======
//End of _source
=====
//End source
====
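
As a sketch of the overall shape, a single {dfanalytics-job} resource in the
response might look like the following. The job ID, index names, and analysis
type are hypothetical placeholders, and several fields are omitted for brevity.

[source,js]
----
{
  "count": 1,
  "data_frame_analytics": [
    {
      "id": "my-outlier-job",
      "source": {
        "index": ["my-source-index"],
        "query": { "match_all": {} }
      },
      "dest": {
        "index": "my-dest-index",
        "results_field": "ml"
      },
      "analysis": {
        "outlier_detection": {}
      },
      "analyzed_fields": {
        "includes": [],
        "excludes": []
      },
      "model_memory_limit": "1gb"
    }
  ]
}
----
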
[[ml-get-dfanalytics-response-codes]]


@ -60,7 +60,8 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
`decompress_definition`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=decompress-definition]
Specifies whether the included model definition should be returned as a JSON map
(`true`) or in a custom compressed format (`false`). Defaults to `true`.
`from`::
(Optional, integer)
@ -68,7 +69,9 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=from]
`include_model_definition`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=include-model-definition]
Specifies whether the model definition should be returned in the response.
Defaults to `false`. When `true`, only a single model must match the ID
patterns provided; otherwise, a bad request is returned.
`size`::
(Optional, integer)
@ -83,8 +86,60 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=tags]
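
As an illustrative sketch of how the query parameters above combine, the
following request asks for the full, decompressed definition of a single model;
the model ID is a hypothetical placeholder.

[source,console]
----
GET _ml/inference/my-trained-model?include_model_definition=true&decompress_definition=true
----
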
==== {api-response-body-title}
`trained_model_configs`::
(array)
include::{docdir}/ml/ml-shared.asciidoc[tag=trained-model-configs]
(array)
An array of trained model resources, which are sorted by the `model_id` value in
ascending order.
+
.Properties of trained model resources
[%collapsible%open]
====
`created_by`:::
(string)
Information on the creator of the trained model.
`create_time`:::
(<<time-units,time units>>)
The time when the trained model was created.
`default_field_map` :::
(object)
A string to string object that contains the default field map to use
when inferring against the model. For example, data frame analytics
may train the model on a specific multi-field `foo.keyword`.
The analytics job would then supply a default field map entry for
`"foo" : "foo.keyword"`.
+
Any field map described in the inference configuration takes precedence.
`estimated_heap_memory_usage_bytes`:::
(integer)
The estimated heap usage in bytes to keep the trained model in memory.
`estimated_operations`:::
(integer)
The estimated number of operations to use the trained model.
`license_level`:::
(string)
The license level of the trained model.
`metadata`:::
(object)
An object containing metadata about the trained model. For example, models
created by {dfanalytics} contain `analysis_config` and `input` objects.
`model_id`:::
(string)
Identifier for the trained model.
`tags`:::
(string)
A comma-delimited string of tags. An {infer} model can have many tags, or none.
`version`:::
(string)
The {es} version number in which the trained model was created.
====
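
An abridged trained model resource might look like the sketch below. The model
ID, creator, version, field map, and size estimates are hypothetical
placeholders, and models created by {dfanalytics} carry additional `metadata`.

[source,js]
----
{
  "model_id": "my-trained-model",
  "created_by": "data-frame-analytics",
  "version": "7.7.0",
  "default_field_map": {
    "airline": "airline.keyword"
  },
  "estimated_heap_memory_usage_bytes": 1053992,
  "estimated_operations": 39629,
  "license_level": "platinum",
  "metadata": {}
}
----
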
[[ml-get-inference-response-codes]]
==== {api-response-codes-title}


@ -182,28 +182,42 @@ The configuration information necessary to perform
[%collapsible%open]
=====
`compute_feature_influence`::::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
(Optional, boolean)
If `true`, the feature influence calculation is enabled. Defaults to `true`.
`feature_influence_threshold`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
(Optional, double)
The minimum {olscore} that a document needs to have in order to calculate its
{fiscore}. Value range: 0-1 (`0.1` by default).
`method`::::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=method]
Sets the method that {oldetection} uses. If the method is not set,
{oldetection} uses an ensemble of different methods and normalizes and combines
their individual {olscores} to obtain the overall {olscore}. We recommend using
the ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`,
and `distance_knn`.
`n_neighbors`::::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
Defines how many nearest neighbors each method of {oldetection} uses to
calculate its {olscore}. When the value is not set, different values are used
for different ensemble members, which helps improve diversity in the ensemble.
Therefore, only override this default if you are confident that the value you
choose is appropriate for the data set.
`outlier_fraction`::::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
(Optional, double)
Sets the proportion of the data set that is assumed to be outlying prior to
{oldetection}. For example, 0.05 means it is assumed that 5% of values are real
outliers and 95% are inliers.
`standardization_enabled`::::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
(Optional, boolean)
If `true`, then the following operation is performed on the columns before
computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to `true`. For
more information, see
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
//End outlier_detection
=====
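
Taken together, an `outlier_detection` analysis object might look like the
following sketch. All values shown are illustrative; every property can be
omitted to accept its default.

[source,js]
----
"analysis": {
  "outlier_detection": {
    "compute_feature_influence": true,
    "feature_influence_threshold": 0.1,
    "method": "lof",
    "n_neighbors": 20,
    "outlier_fraction": 0.05,
    "standardization_enabled": true
  }
}
----
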
//Begin regression
@ -334,11 +348,54 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=dest]
`model_memory_limit`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]
The approximate maximum amount of memory resources that are permitted for
analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
setting, an error occurs when you try to create {dfanalytics-jobs} that have
`model_memory_limit` values greater than that setting. For more information, see
<<ml-settings>>.
`source`::
(object)
include::{docdir}/ml/ml-shared.asciidoc[tag=source-put-dfa]
The configuration of how to source the analysis data. It requires an `index`.
Optionally, `query` and `_source` may be specified.
+
.Properties of `source`
[%collapsible%open]
====
`index`:::
(Required, string or array) Index or indices on which to perform the analysis.
It can be a single index or index pattern as well as an array of indices or
patterns.
+
WARNING: If your source indices contain documents with the same IDs, only the
document that is indexed last appears in the destination index.
`query`:::
(Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
This value corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is passed
verbatim to {es}. By default, this property has the following value:
`{"match_all": {}}`.
`_source`:::
(Optional, object) Specify `includes` and/or `excludes` patterns to select which
fields will be present in the destination. Fields that are excluded cannot be
included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
=====
`includes`::::
(array) An array of strings that defines the fields that will be included in the
destination.
`excludes`::::
(array) An array of strings that defines the fields that will be excluded from
the destination.
=====
====
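
The following abridged request sketches how `source`, `dest`,
`model_memory_limit`, and an analysis object fit together. The job ID, index
names, and field patterns are hypothetical placeholders.

[source,console]
----
PUT _ml/data_frame/analytics/my-outlier-job
{
  "source": {
    "index": ["my-source-index"],
    "query": { "match_all": {} },
    "_source": {
      "includes": ["responsetime", "bytes_*"],
      "excludes": ["raw_message"]
    }
  },
  "dest": {
    "index": "my-dest-index"
  },
  "analysis": {
    "outlier_detection": {}
  },
  "model_memory_limit": "1gb"
}
----
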
[[ml-put-dfanalytics-example]]


@ -278,10 +278,6 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-span]
====
end::chunking-config[]
tag::compute-feature-influence[]
If `true`, the feature influence calculation is enabled. Defaults to `true`.
end::compute-feature-influence[]
tag::custom-rules[]
An array of custom rule objects, which enable you to customize the way detectors
operate. For example, a rule may dictate to the detector conditions under which
@ -375,95 +371,6 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
====
end::data-description[]
tag::data-frame-analytics[]
An array of {dfanalytics-job} resources, which are sorted by the `id` value in
ascending order.
+
.Properties of {dfanalytics-job} resources
[%collapsible%open]
====
`analysis`:::
(object) The type of analysis that is performed on the `source`.
//Begin analyzed_fields
`analyzed_fields`:::
(object) Contains `includes` and/or `excludes` patterns that select which fields
are included in the analysis.
+
.Properties of `analyzed_fields`
[%collapsible%open]
=====
`excludes`:::
(Optional, array) An array of strings that defines the fields that are excluded
from the analysis.
`includes`:::
(Optional, array) An array of strings that defines the fields that are included
in the analysis.
=====
//End analyzed_fields
//Begin dest
`dest`:::
(object) The destination configuration of the analysis.
+
.Properties of `dest`
[%collapsible%open]
=====
`index`:::
(string) The _destination index_ that stores the results of the
{dfanalytics-job}.
`results_field`:::
(string) The name of the field that stores the results of the analysis. Defaults
to `ml`.
=====
//End dest
`id`:::
(string) The unique identifier of the {dfanalytics-job}.
`model_memory_limit`:::
(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.
`source`:::
(object) The configuration of how the analysis data is sourced. It has an
`index` parameter and optionally a `query` and a `_source`.
+
.Properties of `source`
[%collapsible%open]
=====
`index`:::
(array) Index or indices on which to perform the analysis. It can be a single
index or index pattern as well as an array of indices or patterns.
`query`:::
(object) The query that has been specified for the {dfanalytics-job}, written
in the {es} query domain-specific language (<<query-dsl,DSL>>). This value
corresponds to the query object in an {es} search POST body. By default, this
property has the following value: `{"match_all": {}}`.
`_source`:::
(object) Contains the specified `includes` and/or `excludes` patterns that
select which fields are present in the destination. Fields that are excluded
cannot be included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
======
`excludes`:::
(array) An array of strings that defines the fields that are excluded from the
destination.
`includes`:::
(array) An array of strings that defines the fields that are included in the
destination.
======
//End of _source
=====
//End source
====
end::data-frame-analytics[]
tag::data-frame-analytics-stats[]
An array of statistics objects for {dfanalytics-jobs}, which are
sorted by the `id` value in ascending order.
@ -906,11 +813,6 @@ category. (Dead categories are a side effect of the way categorization has no
prior training.)
end::dead-category-count[]
tag::decompress-definition[]
Specifies whether the included model definition should be returned as a JSON map
(`true`) or in a custom compressed format (`false`). Defaults to `true`.
end::decompress-definition[]
tag::delayed-data-check-config[]
Specifies whether the {dfeed} checks for missing data and the size of the
window. For example: `{"enabled": true, "check_window": "1h"}`.
@ -1029,39 +931,6 @@ Advanced configuration option. Defines the fraction of features that will be
used when selecting a random bag for each candidate split.
end::feature-bag-fraction[]
tag::feature-influence-threshold[]
The minimum {olscore} that a document needs to have in order to calculate its
{fiscore}. Value range: 0-1 (`0.1` by default).
end::feature-influence-threshold[]
tag::field-selection[]
An array of objects that explain selection for each field, sorted by
the field names.
+
.Properties of `field_selection` objects
[%collapsible%open]
====
`is_included`:::
(boolean) Whether the field is selected to be included in the analysis.
`is_required`:::
(boolean) Whether the field is required.
`feature_type`:::
(string) The feature type of this field for the analysis. May be `categorical`
or `numerical`.
`mapping_types`:::
(string) The mapping types of the field.
`name`:::
(string) The field name.
`reason`:::
(string) The reason a field is not selected to be included in the analysis.
====
end::field-selection[]
tag::filter[]
One or more <<analysis-tokenfilters,token filters>>. In addition to the built-in
token filters, other plugins can provide more token filters. This property is
@ -1114,12 +983,6 @@ tag::groups[]
A list of job groups. A job can belong to no groups or many.
end::groups[]
tag::include-model-definition[]
Specifies whether the model definition should be returned in the response.
Defaults to `false`. When `true`, only a single model must match the ID
patterns provided; otherwise, a bad request is returned.
end::include-model-definition[]
tag::indices[]
An array of index names. Wildcards are supported. For example:
`["it_ops_metrics", "server*"]`.
@ -1319,32 +1182,6 @@ Advanced configuration option. Defines the maximum number of trees the forest is
allowed to contain. The maximum value is 2000.
end::max-trees[]
tag::memory-estimation[]
An object containing the memory estimates.
+
.Properties of `memory_estimation`
[%collapsible%open]
====
`expected_memory_with_disk`:::
(string) Estimated memory usage under the assumption that overflowing to disk
is allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller
than `expected_memory_without_disk` because using disk allows limiting the main
memory needed to perform {dfanalytics}.
`expected_memory_without_disk`:::
(string) Estimated memory usage under the assumption that the whole
{dfanalytics} should happen in memory (i.e. without overflowing to disk).
====
end::memory-estimation[]
tag::method[]
Sets the method that {oldetection} uses. If the method is not set,
{oldetection} uses an ensemble of different methods and normalizes and combines
their individual {olscores} to obtain the overall {olscore}. We recommend using
the ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`,
and `distance_knn`.
end::method[]
tag::missing-field-count[]
The number of input documents that are missing a field that the {anomaly-job} is
configured to analyze. Input documents with missing fields are still processed
@ -1411,15 +1248,6 @@ tag::model-memory-limit-anomaly-jobs[]
The upper limit for model memory usage, checked on increasing values.
end::model-memory-limit-anomaly-jobs[]
tag::model-memory-limit-dfa[]
The approximate maximum amount of memory resources that are permitted for
analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
setting, an error occurs when you try to create {dfanalytics-jobs} that have
`model_memory_limit` values greater than that setting. For more information, see
<<ml-settings>>.
end::model-memory-limit-dfa[]
tag::model-memory-status[]
The status of the mathematical models, which can have one of the following
values:
@ -1496,14 +1324,6 @@ NOTE: To use the `multivariate_by_fields` property, you must also specify
--
end::multivariate-by-fields[]
tag::n-neighbors[]
Defines how many nearest neighbors each method of {oldetection} uses to
calculate its {olscore}. When the value is not set, different values are used
for different ensemble members, which helps improve diversity in the ensemble.
Therefore, only override this default if you are confident that the value you
choose is appropriate for the data set.
end::n-neighbors[]
tag::node-address[]
The network address of the node.
end::node-address[]
@ -1538,12 +1358,6 @@ order documents are discarded, since jobs require time series data to be in
ascending chronological order.
end::out-of-order-timestamp-count[]
tag::outlier-fraction[]
Sets the proportion of the data set that is assumed to be outlying prior to
{oldetection}. For example, 0.05 means it is assumed that 5% of values are real
outliers and 95% are inliers.
end::outlier-fraction[]
tag::over-field-name[]
The field used to split the data. In particular, this property is used for
analyzing the splits with respect to the history of all splits. It is used for
@ -1666,60 +1480,12 @@ tag::snapshot-id[]
A numerical character string that uniquely identifies the model snapshot.
end::snapshot-id[]
tag::source-put-dfa[]
The configuration of how to source the analysis data. It requires an `index`.
Optionally, `query` and `_source` may be specified.
+
.Properties of `source`
[%collapsible%open]
====
`index`:::
(Required, string or array) Index or indices on which to perform the analysis.
It can be a single index or index pattern as well as an array of indices or
patterns.
+
WARNING: If your source indices contain documents with the same IDs, only the
document that is indexed last appears in the destination index.
`query`:::
(Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
This value corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is passed
verbatim to {es}. By default, this property has the following value:
`{"match_all": {}}`.
`_source`:::
(Optional, object) Specify `includes` and/or `excludes` patterns to select which
fields will be present in the destination. Fields that are excluded cannot be
included in the analysis.
+
.Properties of `_source`
[%collapsible%open]
=====
`includes`::::
(array) An array of strings that defines the fields that will be included in the
destination.
`excludes`::::
(array) An array of strings that defines the fields that will be excluded from
the destination.
=====
====
end::source-put-dfa[]
tag::sparse-bucket-count[]
The number of buckets that contained few data points compared to the expected
number of data points. If your data contains many sparse buckets, consider using
a longer `bucket_span`.
end::sparse-bucket-count[]
tag::standardization-enabled[]
If `true`, then the following operation is performed on the columns before
computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to `true`. For
more information, see
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
end::standardization-enabled[]
tag::state-anomaly-job[]
The status of the {anomaly-job}, which can be one of the following values:
+
@ -1833,62 +1599,6 @@ The number of `partition` field values that were analyzed by the models. This
value is cumulative for all detectors in the job.
end::total-partition-field-count[]
tag::trained-model-configs[]
An array of trained model resources, which are sorted by the `model_id` value in
ascending order.
+
.Properties of trained model resources
[%collapsible%open]
====
`created_by`:::
(string)
Information on the creator of the trained model.
`create_time`:::
(<<time-units,time units>>)
The time when the trained model was created.
`default_field_map` :::
(object)
A string to string object that contains the default field map to use
when inferring against the model. For example, data frame analytics
may train the model on a specific multi-field `foo.keyword`.
The analytics job would then supply a default field map entry for
`"foo" : "foo.keyword"`.
+
Any field map described in the inference configuration takes precedence.
`estimated_heap_memory_usage_bytes`:::
(integer)
The estimated heap usage in bytes to keep the trained model in memory.
`estimated_operations`:::
(integer)
The estimated number of operations to use the trained model.
`license_level`:::
(string)
The license level of the trained model.
`metadata`:::
(object)
An object containing metadata about the trained model. For example, models
created by {dfanalytics} contain `analysis_config` and `input` objects.
`model_id`:::
(string)
Identifier for the trained model.
`tags`:::
(string)
A comma-delimited string of tags. An {infer} model can have many tags, or none.
`version`:::
(string)
The {es} version number in which the trained model was created.
====
end::trained-model-configs[]
tag::training-percent[]
Defines what percentage of the eligible documents will
be used for training. Documents that are ignored by the analysis (for example