2019-07-11 12:05:05 -04:00
|
|
|
[role="xpack"]
|
|
|
|
[testenv="platinum"]
|
|
|
|
[[ml-dfanalytics-resources]]
|
|
|
|
=== {dfanalytics-cap} job resources
|
|
|
|
|
|
|
|
{dfanalytics-cap} resources relate to APIs such as <<put-dfanalytics>> and
|
|
|
|
<<get-dfanalytics>>.
|
|
|
|
|
|
|
|
[discrete]
|
|
|
|
[[ml-dfanalytics-properties]]
|
|
|
|
==== {api-definitions-title}
|
|
|
|
|
|
|
|
`analysis`::
|
|
|
|
(object) The type of analysis that is performed on the `source`. For example:
|
|
|
|
`outlier_detection`. For more information, see <<dfanalytics-types>>.
|
|
|
|
|
|
|
|
`analyzed_fields`::
|
|
|
|
(object) You can specify both `includes` and/or `excludes` patterns. If
|
|
|
|
`analyzed_fields` is not set, only the relevant fields will be included. For
|
|
|
|
example all the numeric fields for {oldetection}.
|
2019-07-26 05:39:59 -04:00
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
PUT _ml/data_frame/analytics/loganalytics
|
|
|
|
{
|
|
|
|
"source": {
|
|
|
|
"index": "logdata"
|
|
|
|
},
|
|
|
|
"dest": {
|
|
|
|
"index": "logdata_out"
|
|
|
|
},
|
|
|
|
"analysis": {
|
|
|
|
"outlier_detection": {
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"analyzed_fields": {
|
|
|
|
"includes": [ "request.bytes", "response.counts.error" ],
|
|
|
|
"excludes": [ "source.geo" ]
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
|
|
|
// TEST[setup:setup_logdata]
|
2019-07-11 12:05:05 -04:00
|
|
|
|
|
|
|
`dest`::
|
2019-07-26 05:39:59 -04:00
|
|
|
(object) The destination configuration of the analysis. The `index` property
|
|
|
|
(string) is the name of the index in which to store the results of the
|
|
|
|
{dfanalytics-job}. The `results_field` (string) property defines the name of
|
|
|
|
the field in which to store the results of the analysis. The default value is
|
|
|
|
`ml`.
|
2019-07-11 12:05:05 -04:00
|
|
|
|
|
|
|
`id`::
|
|
|
|
(string) The unique identifier for the {dfanalytics-job}. This identifier can
|
|
|
|
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
|
|
|
|
underscores. It must start and end with alphanumeric characters. This property
|
|
|
|
is informational; you cannot change the identifier for existing jobs.
|
|
|
|
|
|
|
|
`model_memory_limit`::
|
|
|
|
(string) The approximate maximum amount of memory resources that are
|
|
|
|
permitted for analytical processing. The default value for {dfanalytics-jobs}
|
|
|
|
is `1gb`. If your `elasticsearch.yml` file contains an
|
|
|
|
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
|
|
|
|
create {dfanalytics-jobs} that have `model_memory_limit` values greater than
|
|
|
|
that setting. For more information, see <<ml-settings>>.
|
|
|
|
|
|
|
|
`source`::
|
2019-07-26 05:39:59 -04:00
|
|
|
(object) The source configuration, consisting of `index` (array) which is an
|
|
|
|
array of index names on which to perform the analysis. It can be a single
|
|
|
|
index or index pattern as well as an array of indices or patterns. Optionally,
|
|
|
|
`source` can have a `query` (object) property. The {es} query domain-specific
|
|
|
|
language (DSL). This value corresponds to the query object in an {es} search
|
|
|
|
POST body. All the options that are supported by {es} can be used, as this
|
|
|
|
object is passed verbatim to {es}. By default, this property has the following
|
|
|
|
value: `{"match_all": {}}`.
|
2019-07-11 12:05:05 -04:00
|
|
|
|
|
|
|
[[dfanalytics-types]]
|
|
|
|
==== Analysis objects
|
|
|
|
|
|
|
|
{dfanalytics-cap} resources contain `analysis` objects. For example, when you
|
2019-07-26 05:39:59 -04:00
|
|
|
create a {dfanalytics-job}, you must define the type of analysis it performs.
|
|
|
|
Currently, `outlier_detection` is the only available type of analysis, however,
|
|
|
|
other types will be added, for example `regression`.
|
2019-07-11 12:05:05 -04:00
|
|
|
|
|
|
|
[discrete]
|
|
|
|
[[oldetection-resources]]
|
2019-07-26 05:39:59 -04:00
|
|
|
==== {oldetection-cap} configuration objects
|
2019-07-11 12:05:05 -04:00
|
|
|
|
|
|
|
An {oldetection} configuration object has the following properties:
|
|
|
|
|
|
|
|
`n_neighbors`::
|
|
|
|
(integer) Defines the value for how many nearest neighbors each method of
|
|
|
|
{oldetection} will use to calculate its {olscore}. When the value is
|
|
|
|
not set, the system will dynamically detect an appropriate value.
|
|
|
|
|
|
|
|
`method`::
|
|
|
|
(string) Sets the method that {oldetection} uses. If the method is not set
|
|
|
|
{oldetection} uses an ensemble of different methods and normalises and
|
2019-07-26 05:39:59 -04:00
|
|
|
combines their individual {olscores} to obtain the overall {olscore}. We
|
|
|
|
recommend to use the ensemble method. Available methods are `lof`, `ldof`,
|
|
|
|
`distance_kth_nn`, `distance_knn`.
|
2019-07-11 12:05:05 -04:00
|
|
|
|
|
|
|
`feature_influence_threshold`::
|
|
|
|
(double) The minimum {olscore} that a document needs to have in order to
|
|
|
|
calculate its {fiscore}.
|
2019-07-26 05:39:59 -04:00
|
|
|
Value range: 0-1 (`0.1` by default).
|