tag::allow-lazy-open[]
Advanced configuration option. Specifies whether this job can open when there is
insufficient {ml} node capacity for it to be immediately assigned to a node. The
default value is `false`; if a {ml} node with capacity to run the job cannot
immediately be found, the <<ml-open-job,open {anomaly-jobs} API>> returns an
error. However, this is also subject to the cluster-wide
`xpack.ml.max_lazy_ml_nodes` setting; see <<advanced-ml-settings>>. If this
option is set to `true`, the <<ml-open-job,open {anomaly-jobs} API>> does not
return an error and the job waits in the `opening` state until sufficient {ml}
node capacity is available.
end::allow-lazy-open[]
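
For illustration only, a create {anomaly-job} request that opts in to lazy
opening might look like the following sketch; the job ID, detector, and field
name are hypothetical:

[source,console]
----
PUT _ml/anomaly_detectors/example-lazy-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "timestamp" },
  "allow_lazy_open": true
}
----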

tag::allow-lazy-start[]
Specifies whether this job can start when there is insufficient {ml} node
capacity for it to be immediately assigned to a node. The default value is
`false`; if a {ml} node with capacity to run the job cannot immediately be
found, the <<start-dfanalytics>> returns an error. However, this is also subject
to the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting; see
<<advanced-ml-settings>>. If this option is set to `true`, the
<<start-dfanalytics>> does not return an error and the job waits in the
`starting` state until sufficient {ml} node capacity is available.
end::allow-lazy-start[]

tag::allow-no-jobs[]
Specifies what to do when the request:
+
--
* Contains wildcard expressions and there are no jobs that match.
* Contains the `_all` string or no identifiers and there are no matches.
* Contains wildcard expressions and there are only partial matches.

The default value is `true`, which returns an empty `jobs` array
when there are no matches and the subset of results when there are partial
matches. If this parameter is `false`, the request returns a `404` status code
when there are no matches or only partial matches.
--
end::allow-no-jobs[]

tag::allow-no-match[]
Specifies what to do when the request:
+
--
* Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
* Contains the `_all` string or no identifiers and there are no matches.
* Contains wildcard expressions and there are only partial matches.

The default value is `true`, which returns an empty `data_frame_analytics` array
when there are no matches and the subset of results when there are partial
matches. If this parameter is `false`, the request returns a `404` status code
when there are no matches or only partial matches.
--
end::allow-no-match[]

tag::analysis[]
Defines the type of {dfanalytics} you want to perform on your source index. For
example: `outlier_detection`. See <<ml-dfa-analysis-objects>>.
end::analysis[]

tag::analysis-config[]
The analysis configuration, which specifies how to analyze the data.
After you create a job, you cannot change the analysis configuration; all
the properties are informational. An analysis configuration object has the
following properties:

`bucket_span`:::
(<<time-units,time units>>)
include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span]

`categorization_field_name`:::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-field-name]

`categorization_filters`:::
(array of strings)
include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-filters]

`categorization_analyzer`:::
(object or string)
include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-analyzer]

`detectors`:::
(array) An array of detector configuration objects. Detector configuration
objects specify which data fields a job analyzes. They also specify which
analytical functions are used. You can specify multiple detectors for a job.
include::{docdir}/ml/ml-shared.asciidoc[tag=detector]
+
--
NOTE: If the `detectors` array does not contain at least one detector,
no analysis can occur and an error is returned.

--

`influencers`:::
(array of strings)
include::{docdir}/ml/ml-shared.asciidoc[tag=influencers]

`latency`:::
(time units)
include::{docdir}/ml/ml-shared.asciidoc[tag=latency]

`multivariate_by_fields`:::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=multivariate-by-fields]

`summary_count_field_name`:::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=summary-count-field-name]

end::analysis-config[]
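
As a sketch only (the field names and values are hypothetical), an
`analysis_config` object with a single detector might look like this:

[source,js]
----
"analysis_config": {
  "bucket_span": "15m",
  "detectors": [
    {
      "function": "sum",
      "field_name": "bytes",
      "by_field_name": "clientip",
      "detector_description": "Sum of bytes per client IP"
    }
  ],
  "influencers": [ "clientip" ]
}
----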

tag::analysis-limits[]
Limits can be applied for the resources required to hold the mathematical models
in memory. These limits are approximate and can be set per job. They do not
control the memory used by other processes, for example, the {es} Java
processes. If necessary, you can increase the limits after the job is created.
The `analysis_limits` object has the following properties:

`categorization_examples_limit`:::
(long)
include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-examples-limit]

`model_memory_limit`:::
(long or string)
include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit]
end::analysis-limits[]

tag::analyzed-fields[]
Specify `includes` and/or `excludes` patterns to select which fields will be
included in the analysis. If `analyzed_fields` is not set, only the relevant
fields will be included. For example, all the numeric fields for {oldetection}.
For the supported field types, see <<ml-put-dfanalytics-supported-fields>>. Also
see the <<explain-dfanalytics>>, which helps you understand field selection.

`includes`:::
(Optional, array) An array of strings that defines the fields that will be
included in the analysis.

`excludes`:::
(Optional, array) An array of strings that defines the fields that will be
excluded from the analysis. You do not need to add fields with unsupported
data types to `excludes`; these fields are excluded from the analysis
automatically.
end::analyzed-fields[]
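
A minimal sketch of an `analyzed_fields` object (the field names are
hypothetical):

[source,js]
----
"analyzed_fields": {
  "includes": [ "response_time", "error_count" ],
  "excludes": [ "hostname" ]
}
----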

tag::background-persist-interval[]
Advanced configuration option. The time between each periodic persistence of the
model. The default value is a randomized value between 3 and 4 hours, which
avoids all jobs persisting at exactly the same time. The smallest allowed value
is 1 hour.
+
--
TIP: For very large models (several GB), persistence could take 10-20 minutes,
so do not set the `background_persist_interval` value too low.

--
end::background-persist-interval[]

tag::bucket-span[]
The size of the interval that the analysis is aggregated into, typically between
`5m` and `1h`. The default value is `5m`. For more information about time units,
see <<time-units>>.
end::bucket-span[]

tag::by-field-name[]
The field used to split the data. In particular, this property is used for
analyzing the splits with respect to their own history. It is used for finding
unusual values in the context of the split.
end::by-field-name[]

tag::categorization-analyzer[]
If `categorization_field_name` is specified, you can also define the analyzer
that is used to interpret the categorization field. This property cannot be used
at the same time as `categorization_filters`. The categorization analyzer
specifies how the `categorization_field` is interpreted by the categorization
process. The syntax is very similar to that used to define the `analyzer` in the
<<indices-analyze,Analyze endpoint>>. For more information, see
{stack-ov}/ml-configuring-categories.html[Categorizing log messages].
+
--
The `categorization_analyzer` field can be specified either as a string or as an
object. If it is a string, it must refer to a
<<analysis-analyzers,built-in analyzer>> or one added by another plugin. If it
is an object, it has the following properties:
--

`char_filter`::::
(array of strings or objects)
include::{docdir}/ml/ml-shared.asciidoc[tag=char-filter]

`tokenizer`::::
(string or object)
include::{docdir}/ml/ml-shared.asciidoc[tag=tokenizer]

`filter`::::
(array of strings or objects)
include::{docdir}/ml/ml-shared.asciidoc[tag=filter]
end::categorization-analyzer[]
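
For illustration, a `categorization_analyzer` object that keeps the `ml_classic`
tokenizer but adds custom filtering might be sketched as follows; the pattern
and stop words are hypothetical:

[source,js]
----
"categorization_analyzer": {
  "char_filter": [
    { "type": "pattern_replace", "pattern": "\\[statement:.*\\]" }
  ],
  "tokenizer": "ml_classic",
  "filter": [
    { "type": "stop", "stopwords": [ "monday", "tuesday" ] }
  ]
}
----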

tag::categorization-examples-limit[]
The maximum number of examples stored per category in memory and in the results
data store. The default value is 4. If you increase this value, more examples
are available; however, it requires that you have more storage available. If you
set this value to `0`, no examples are stored.
+
--
NOTE: The `categorization_examples_limit` applies only to analysis that uses
categorization. For more information, see
{stack-ov}/ml-configuring-categories.html[Categorizing log messages].

--
end::categorization-examples-limit[]

tag::categorization-field-name[]
If this property is specified, the values of the specified field will be
categorized. The resulting categories must be used in a detector by setting
`by_field_name`, `over_field_name`, or `partition_field_name` to the keyword
`mlcategory`. For more information, see
{stack-ov}/ml-configuring-categories.html[Categorizing log messages].
end::categorization-field-name[]

tag::categorization-filters[]
If `categorization_field_name` is specified, you can also define optional
filters. This property expects an array of regular expressions. The expressions
are used to filter out matching sequences from the categorization field values.
You can use this functionality to fine-tune the categorization by excluding
sequences from consideration when categories are defined. For example, you can
exclude SQL statements that appear in your log files. For more information, see
{stack-ov}/ml-configuring-categories.html[Categorizing log messages]. This
property cannot be used at the same time as `categorization_analyzer`. If you
only want to define simple regular expression filters that are applied prior to
tokenization, setting this property is the easiest method. If you also want to
customize the tokenizer or post-tokenization filtering, use the
`categorization_analyzer` property instead and include the filters as
`pattern_replace` character filters. The effect is exactly the same.
end::categorization-filters[]

tag::char-filter[]
One or more <<analysis-charfilters,character filters>>. In addition to the
built-in character filters, other plugins can provide more character filters.
This property is optional. If it is not specified, no character filters are
applied prior to categorization. If you are customizing some other aspect of the
analyzer and you need to achieve the equivalent of `categorization_filters`
(which are not permitted when some other aspect of the analyzer is customized),
add them here as
<<analysis-pattern-replace-charfilter,pattern replace character filters>>.
end::char-filter[]

tag::compute-feature-influence[]
If `true`, the feature influence calculation is enabled. Defaults to `true`.
end::compute-feature-influence[]

tag::custom-rules[]
An array of custom rule objects, which enable you to customize the way detectors
operate. For example, a rule may dictate the conditions under which the detector
skips results. For more examples, see
{stack-ov}/ml-configuring-detector-custom-rules.html[Configuring detector custom rules].
A custom rule has the following properties:
+
--
`actions`::
(array) The set of actions to be triggered when the rule applies. If
more than one action is specified, the effects of all actions are combined. The
available actions include:

* `skip_result`: The result will not be created. This is the default value.
Unless you also specify `skip_model_update`, the model will be updated as usual
with the corresponding series value.
* `skip_model_update`: The value for that series will not be used to update the
model. Unless you also specify `skip_result`, the results will be created as
usual. This action is suitable when certain values are expected to be
consistently anomalous and they affect the model in a way that negatively
impacts the rest of the results.

`scope`::
(object) An optional scope of series where the rule applies. A rule must either
have a non-empty scope or at least one condition. By default, the scope includes
all series. Scoping is allowed for any of the fields that are also specified in
`by_field_name`, `over_field_name`, or `partition_field_name`. To add a scope
for a field, add the field name as a key in the scope object and set its value
to an object with the following properties:

`filter_id`:::
(string) The ID of the filter to be used.

`filter_type`:::
(string) Either `include` (the rule applies for values in the filter) or
`exclude` (the rule applies for values not in the filter). Defaults to
`include`.

`conditions`::
(array) An optional array of numeric conditions when the rule applies. A rule
must either have a non-empty scope or at least one condition. Multiple
conditions are combined together with a logical `AND`. A condition has the
following properties:

`applies_to`:::
(string) Specifies the result property to which the condition applies. The
available options are `actual`, `typical`, `diff_from_typical`, and `time`.

`operator`:::
(string) Specifies the condition operator. The available options are `gt`
(greater than), `gte` (greater than or equal to), `lt` (less than), and `lte`
(less than or equal to).

`value`:::
(double) The value that is compared against the `applies_to` field using the
`operator`.
--
+
--
NOTE: If your detector uses `lat_long`, `metric`, `rare`, or `freq_rare`
functions, you can only specify `conditions` that apply to `time`.

--
end::custom-rules[]
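
As an illustrative sketch (the filter ID, field name, and threshold are
hypothetical), a rule that skips results for a scoped set of values below a
threshold could look like this:

[source,js]
----
"custom_rules": [
  {
    "actions": [ "skip_result" ],
    "scope": {
      "clientip": { "filter_id": "safe_ips", "filter_type": "include" }
    },
    "conditions": [
      { "applies_to": "actual", "operator": "lt", "value": 10.0 }
    ]
  }
]
----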

tag::custom-settings[]
Advanced configuration option. Contains custom metadata about the job. For
example, it can contain custom URL information as shown in
{stack-ov}/ml-configuring-url.html[Adding custom URLs to {ml} results].
end::custom-settings[]

tag::data-description[]
The data description defines the format of the input data when you send data to
the job by using the <<ml-post-data,post data>> API. Note that when you
configure a {dfeed}, these properties are automatically set.
+
--
When data is received via the <<ml-post-data,post data>> API, it is not stored
in {es}. Only the results for {anomaly-detect} are retained.

A data description object has the following properties:

`format`:::
(string) Only `JSON` format is supported at this time.

`time_field`:::
(string) The name of the field that contains the timestamp.
The default value is `time`.

`time_format`:::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
--
end::data-description[]
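
A minimal sketch of a `data_description` object (the field name is
hypothetical):

[source,js]
----
"data_description": {
  "time_field": "timestamp",
  "time_format": "epoch_ms"
}
----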

tag::data-frame-analytics[]
An array of {dfanalytics-job} resources, which are sorted by the `id` value in
ascending order.

`id`:::
(string) The unique identifier of the {dfanalytics-job}.

`source`:::
(object) The configuration of how the analysis data is sourced. It has an
`index` parameter and optionally a `query` and a `_source`.

`index`::::
(array) Index or indices on which to perform the analysis. It can be a single
index or index pattern as well as an array of indices or patterns.

`query`::::
(object) The query that has been specified for the {dfanalytics-job}, written in
the {es} query domain-specific language (<<query-dsl,DSL>>). This value
corresponds to the query object in an {es} search POST body. By default, this
property has the following value: `{"match_all": {}}`.

`_source`::::
(object) Contains the specified `includes` and/or `excludes` patterns that
select which fields are present in the destination. Fields that are excluded
cannot be included in the analysis.

`includes`:::::
(array) An array of strings that defines the fields that are included in the
destination.

`excludes`:::::
(array) An array of strings that defines the fields that are excluded from the
destination.

`dest`:::
(object) The destination configuration of the analysis.

`index`::::
(string) The _destination index_ that stores the results of the
{dfanalytics-job}.

`results_field`::::
(string) The name of the field that stores the results of the analysis. Defaults
to `ml`.

`analysis`:::
(object) The type of analysis that is performed on the `source`.

`analyzed_fields`:::
(object) Contains `includes` and/or `excludes` patterns that select which fields
are included in the analysis.

`includes`::::
(Optional, array) An array of strings that defines the fields that are included
in the analysis.

`excludes`::::
(Optional, array) An array of strings that defines the fields that are excluded
from the analysis.

`model_memory_limit`:::
(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.
end::data-frame-analytics[]

tag::data-frame-analytics-stats[]
An array of statistics objects for {dfanalytics-jobs}, which are
sorted by the `id` value in ascending order.

`id`:::
(string) The unique identifier of the {dfanalytics-job}.

`state`:::
(string) Current state of the {dfanalytics-job}.

`progress`:::
(array) The progress report of the {dfanalytics-job} by phase.

`phase`:::
(string) Defines the phase of the {dfanalytics-job}. Possible phases:
`reindexing`, `loading_data`, `analyzing`, and `writing_results`.

`progress_percent`:::
(integer) The progress that the {dfanalytics-job} has made, expressed as a
percentage.
end::data-frame-analytics-stats[]

tag::dependent-variable[]
Defines which field of the document is to be predicted.
This parameter is supplied by field name and must match one of the fields in
the index being used to train. If this field is missing from a document, then
that document will not be used for training, but a prediction with the trained
model will be generated for it. It is also known as the continuous target
variable.
end::dependent-variable[]

tag::description-dfa[]
A description of the job.
end::description-dfa[]

tag::dest[]
The destination configuration, consisting of `index` and
optionally `results_field` (`ml` by default).

`index`:::
(Required, string) Defines the _destination index_ to store the results of
the {dfanalytics-job}.

`results_field`:::
(Optional, string) Defines the name of the field in which to store the
results of the analysis. Defaults to `ml`.
end::dest[]
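
As an illustrative sketch only (the job ID and index names are hypothetical),
the `source`, `dest`, and `analysis` objects appear in a create
{dfanalytics-job} request like this:

[source,console]
----
PUT _ml/data_frame/analytics/example-outliers
{
  "source": { "index": "my-source-index" },
  "dest": {
    "index": "my-dest-index",
    "results_field": "ml"
  },
  "analysis": { "outlier_detection": {} }
}
----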

tag::detector-description[]
A description of the detector. For example, `Low event rate`.
end::detector-description[]

tag::detector-field-name[]
The field that the detector uses in the function. If you use an event rate
function such as `count` or `rare`, do not specify this field.
+
--
NOTE: The `field_name` cannot contain double quotes or backslashes.

--
end::detector-field-name[]

tag::detector-index[]
A unique identifier for the detector. This identifier is based on the order of
the detectors in the `analysis_config`, starting at zero. You can use this
identifier when you want to update a specific detector.
end::detector-index[]

tag::detector[]
A detector has the following properties:

`by_field_name`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=by-field-name]

`custom_rules`::::
(array)
include::{docdir}/ml/ml-shared.asciidoc[tag=custom-rules]

`detector_description`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=detector-description]

`detector_index`::::
(integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=detector-index]

`exclude_frequent`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-frequent]

`field_name`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=detector-field-name]

`function`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=function]

`over_field_name`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=over-field-name]

`partition_field_name`::::
(string)
include::{docdir}/ml/ml-shared.asciidoc[tag=partition-field-name]

`use_null`::::
(boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=use-null]

end::detector[]
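
For illustration (the field names are hypothetical), a detector object that
combines several of these properties might look like the following sketch:

[source,js]
----
{
  "function": "high_sum",
  "field_name": "bytes_sent",
  "over_field_name": "clientip",
  "partition_field_name": "datacenter",
  "detector_description": "Unusually high bytes_sent per client IP"
}
----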

tag::eta[]
The shrinkage applied to the weights. Smaller values result
in larger forests, which have better generalization error. However, the smaller
the value, the longer the training takes. For more information, see
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
about shrinkage.
end::eta[]

tag::exclude-frequent[]
Contains one of the following values: `all`, `none`, `by`, or `over`. If set,
frequent entities are excluded from influencing the anomaly results. Entities
can be considered frequent over time or frequent in a population. If you are
working with both over and by fields, then you can set `exclude_frequent` to
`all` for both fields, or to `by` or `over` for those specific fields.
end::exclude-frequent[]

tag::feature-bag-fraction[]
Defines the fraction of features that are used when
selecting a random bag for each candidate split.
end::feature-bag-fraction[]

tag::feature-influence-threshold[]
The minimum {olscore} that a document needs to have in order to calculate its
{fiscore}. Value range: 0-1 (`0.1` by default).
end::feature-influence-threshold[]

tag::field-selection[]
An array of objects that explain selection for each field, sorted by
the field names. Each object in the array has the following properties:

`name`:::
(string) The field name.

`mapping_types`:::
(string) The mapping types of the field.

`is_included`:::
(boolean) Whether the field is selected to be included in the analysis.

`is_required`:::
(boolean) Whether the field is required.

`feature_type`:::
(string) The feature type of this field for the analysis. May be `categorical`
or `numerical`.

`reason`:::
(string) The reason a field is not selected to be included in the analysis.
end::field-selection[]

tag::filter[]
One or more <<analysis-tokenfilters,token filters>>. In addition to the built-in
token filters, other plugins can provide more token filters. This property is
optional. If it is not specified, no token filters are applied prior to
categorization.
end::filter[]

tag::from[]
Skips the specified number of {dfanalytics-jobs}. The default value is `0`.
end::from[]

tag::function[]
The analysis function that is used. For example, `count`, `rare`, `mean`, `min`,
`max`, and `sum`. For more information, see
{stack-ov}/ml-functions.html[Function reference].
end::function[]

tag::gamma[]
Regularization parameter to prevent overfitting on the
training dataset. Multiplies a linear penalty associated with the size of
individual trees in the forest. The higher the value, the more training
prefers smaller trees. The smaller this parameter, the larger individual trees
will be and the longer training takes.
end::gamma[]

tag::groups[]
A list of job groups. A job can belong to no groups or many.
end::groups[]

tag::influencers[]
A comma-separated list of influencer field names. Typically these can be the by,
over, or partition fields that are used in the detector configuration. You might
also want to use a field name that is not specifically named in a detector, but
is available as part of the input data. When you use multiple detectors, the use
of influencers is recommended as it aggregates results for each influencer
entity.
end::influencers[]

tag::job-id-anomaly-detection[]
Identifier for the {anomaly-job}.
end::job-id-anomaly-detection[]

tag::job-id-data-frame-analytics[]
Identifier for the {dfanalytics-job}.
end::job-id-data-frame-analytics[]

tag::job-id-anomaly-detection-default[]
Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a
wildcard expression. If you do not specify one of these options, the API returns
information for all {anomaly-jobs}.
end::job-id-anomaly-detection-default[]

tag::job-id-data-frame-analytics-default[]
Identifier for the {dfanalytics-job}. If you do not specify this option, the API
returns information for the first hundred {dfanalytics-jobs}.
end::job-id-data-frame-analytics-default[]

tag::job-id-anomaly-detection-list[]
An identifier for the {anomaly-jobs}. It can be a job
identifier, a group name, or a comma-separated list of jobs or groups.
end::job-id-anomaly-detection-list[]

tag::job-id-anomaly-detection-wildcard[]
Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a
wildcard expression.
end::job-id-anomaly-detection-wildcard[]

tag::job-id-anomaly-detection-wildcard-list[]
Identifier for the {anomaly-job}. It can be a job identifier, a group name, a
comma-separated list of jobs or groups, or a wildcard expression.
end::job-id-anomaly-detection-wildcard-list[]

tag::job-id-anomaly-detection-define[]
Identifier for the {anomaly-job}. This identifier can contain lowercase
alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
and end with alphanumeric characters.
end::job-id-anomaly-detection-define[]

tag::job-id-data-frame-analytics-define[]
Identifier for the {dfanalytics-job}. This identifier can contain lowercase
alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
and end with alphanumeric characters.
end::job-id-data-frame-analytics-define[]

tag::jobs-stats-anomaly-detection[]
An array of {anomaly-job} statistics objects.
For more information, see <<ml-jobstats>>.
end::jobs-stats-anomaly-detection[]

tag::lambda[]
Regularization parameter to prevent overfitting on the
training dataset. Multiplies an L2 regularization term which applies to leaf
weights of the individual trees in the forest. The higher the value, the more
training attempts to keep leaf weights small. This makes the prediction
function smoother at the expense of potentially not being able to capture
relevant relationships between the features and the {depvar}. The smaller this
parameter, the larger individual trees will be and the longer training takes.
end::lambda[]

tag::latency[]
The size of the window in which to expect data that is out of time order. The
default value is 0 (no latency). If you specify a non-zero value, it must be
greater than or equal to one second. For more information about time units, see
<<time-units>>.
+
--
NOTE: Latency is only applicable when you send data by using
the <<ml-post-data,post data>> API.

--
end::latency[]

tag::maximum-number-trees[]
Defines the maximum number of trees the forest is allowed
to contain. The maximum value is 2000.
end::maximum-number-trees[]

tag::memory-estimation[]
An object containing the memory estimates. The object has the
following properties:

`expected_memory_without_disk`:::
(string) Estimated memory usage under the assumption that the whole
{dfanalytics} should happen in memory (that is, without overflowing to disk).

`expected_memory_with_disk`:::
(string) Estimated memory usage under the assumption that overflowing to disk is
allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller
than `expected_memory_without_disk` because using disk allows limiting the
amount of main memory needed to perform {dfanalytics}.
end::memory-estimation[]

tag::method[]
Sets the method that {oldetection} uses. If the method is not set, {oldetection}
uses an ensemble of different methods and normalizes and combines their
individual {olscores} to obtain the overall {olscore}. We recommend using the
ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`, and
`distance_knn`.
end::method[]

tag::model-memory-limit[]
The approximate maximum amount of memory resources that are required for
analytical processing. Once this limit is approached, data pruning becomes
more aggressive. Upon exceeding this limit, new entities are not modeled. The
default value for jobs created in version 6.1 and later is `1024mb`.
This value will need to be increased for jobs that are expected to analyze
high-cardinality fields, but the default is set to a relatively small size to
ensure that high resource usage is a conscious decision. The default value for
jobs created in versions earlier than 6.1 is `4096mb`.
+
--
If you specify a number instead of a string, the units are assumed to be MiB.
Specifying a string is recommended for clarity. If you specify a byte size unit
of `b` or `kb` and the number does not equate to a discrete number of megabytes,
it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you
specify a value less than 1 MiB, an error occurs. For more information about
supported byte size units, see <<byte-units>>.

If your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
setting, an error occurs when you try to create jobs that have
`model_memory_limit` values greater than that setting. For more information,
see <<ml-settings>>.
--
end::model-memory-limit[]
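
A sketch of an `analysis_limits` object that raises the model memory limit (the
values are illustrative):

[source,js]
----
"analysis_limits": {
  "model_memory_limit": "2048mb",
  "categorization_examples_limit": 4
}
----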

tag::model-memory-limit-dfa[]
The approximate maximum amount of memory resources that are permitted for
analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
setting, an error occurs when you try to create {dfanalytics-jobs} that have
`model_memory_limit` values greater than that setting. For more information, see
<<ml-settings>>.
end::model-memory-limit-dfa[]

tag::model-plot-config[]
This advanced configuration option stores model information along with the
results. It provides a more detailed view into {anomaly-detect}.
+
--
WARNING: If you enable model plot, it can add considerable overhead to the
performance of the system; it is not feasible for jobs with many entities.

Model plot provides a simplified and indicative view of the model and its
bounds. It does not display complex features such as multivariate correlations
or multimodal data. As such, anomalies may occasionally be reported which cannot
be seen in the model plot.

Model plot config can be configured when the job is created or updated later. It
must be disabled if performance issues are experienced.

The `model_plot_config` object has the following properties:

`enabled`:::
(boolean) If `true`, enables calculation and storage of the model bounds for
each entity that is being analyzed. By default, this is not enabled.

`terms`:::
experimental[] (string) Limits data collection to this comma-separated list of
partition or by field values. If terms are not specified or it is an empty
string, no filtering is applied. For example, "CPU,NetworkIn,DiskWrites".
Wildcards are not supported. Only the specified `terms` can be viewed when
using the Single Metric Viewer.
--
end::model-plot-config[]
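
For illustration (the terms are hypothetical), a `model_plot_config` object
might look like this:

[source,js]
----
"model_plot_config": {
  "enabled": true,
  "terms": "CPU,NetworkIn,DiskWrites"
}
----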

tag::model-snapshot-id[]
A numerical character string that uniquely identifies the model snapshot. For
example, `1491007364`. For more information about model snapshots, see
<<ml-snapshot-resource>>.
end::model-snapshot-id[]

tag::model-snapshot-retention-days[]
The time in days that model snapshots are retained for the job. Older snapshots
are deleted. The default value is `1`, which means snapshots are retained for
one day (twenty-four hours).
end::model-snapshot-retention-days[]

tag::multivariate-by-fields[]
This functionality is reserved for internal use. It is not supported for use in
customer environments and is not subject to the support SLA of official GA
features.
+
--
If set to `true`, the analysis will automatically find correlations between
metrics for a given `by` field value and report anomalies when those
correlations cease to hold. For example, suppose CPU and memory usage on host A
is usually highly correlated with the same metrics on host B. Perhaps this
correlation occurs because they are running a load-balanced application.
If you enable this property, then anomalies will be reported when, for example,
CPU usage on host A is high and the value of CPU usage on host B is low. That
is to say, you'll see an anomaly when the CPU of host A is unusual given
the CPU of host B.

NOTE: To use the `multivariate_by_fields` property, you must also specify
`by_field_name` in your detector.

--
end::multivariate-by-fields[]

tag::n-neighbors[]
Defines how many nearest neighbors each method of
{oldetection} uses to calculate its {olscore}. When the value is not set,
different values are used for different ensemble members. This helps
improve diversity in the ensemble. Therefore, only override this if you are
confident that the value you choose is appropriate for the data set.
end::n-neighbors[]

tag::num-top-classes[]
Defines the number of categories for which the predicted
probabilities are reported. It must be non-negative. If it is greater than the
total number of categories to predict (in version {version} of the {stack},
that total is two), all category probabilities are reported. Defaults to `2`.
end::num-top-classes[]

tag::over-field-name[]
The field used to split the data. In particular, this property is used for
analyzing the splits with respect to the history of all splits. It is used for
finding unusual values in the population of all splits. For more information,
see {stack-ov}/ml-configuring-pop.html[Performing population analysis].
end::over-field-name[]

tag::outlier-fraction[]
Sets the proportion of the data set that is assumed to be outlying prior to
{oldetection}. For example, 0.05 means it is assumed that 5% of values are real
outliers and 95% are inliers.
end::outlier-fraction[]

tag::partition-field-name[]
The field used to segment the analysis. When you use this property, you have
completely independent baselines for each value of this field.
end::partition-field-name[]

tag::prediction-field-name[]
Defines the name of the prediction field in the results.
Defaults to `<dependent_variable>_prediction`.
end::prediction-field-name[]

tag::randomize-seed[]
Defines the seed for the random generator that is used to pick
which documents are used for training. By default, it is randomly generated.
Set it to a specific value to ensure that the same documents are used for
training, assuming that other related parameters (for example, `source` and
`analyzed_fields`) are the same.
end::randomize-seed[]

tag::renormalization-window-days[]
Advanced configuration option. The period over which adjustments to the score
are applied, as new data is seen. The default value is the longer of 30 days or
100 `bucket_spans`.
end::renormalization-window-days[]

tag::results-index-name[]
A text string that affects the name of the {ml} results index. The default value
is `shared`, which generates an index named `.ml-anomalies-shared`.
end::results-index-name[]

tag::results-retention-days[]
Advanced configuration option. The number of days for which job results are
retained. Once per day at 00:30 (server time), results older than this period
are deleted from {es}. The default value is null, which means results are
retained.
end::results-retention-days[]

tag::size[]
Specifies the maximum number of {dfanalytics-jobs} to obtain. The default value
is `100`.
end::size[]

tag::source-put-dfa[]
The configuration of how to source the analysis data. It requires an
`index`. Optionally, `query` and `_source` may be specified.

`index`:::
(Required, string or array) Index or indices on which to perform the
analysis. It can be a single index or index pattern as well as an array of
indices or patterns.

`query`:::
(Optional, object) The {es} query domain-specific language
(<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
search POST body. All the options that are supported by {es} can be used,
as this object is passed verbatim to {es}. By default, this property has
the following value: `{"match_all": {}}`.

`_source`:::
(Optional, object) Specify `includes` and/or `excludes` patterns to select
which fields will be present in the destination. Fields that are excluded
cannot be included in the analysis.

`includes`::::
(array) An array of strings that defines the fields that will be
included in the destination.

`excludes`::::
(array) An array of strings that defines the fields that will be
excluded from the destination.
end::source-put-dfa[]
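
As an illustrative sketch (the index, query, and field names are hypothetical),
a `source` object that uses all three properties could look like this:

[source,js]
----
"source": {
  "index": [ "my-index-1", "my-index-2" ],
  "query": {
    "term": { "status": "complete" }
  },
  "_source": {
    "includes": [ "metrics.*" ],
    "excludes": [ "metrics.debug" ]
  }
}
----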

tag::standardization-enabled[]
If `true`, then the following operation is performed on the columns before
computing outlier scores: `(x_i - mean(x_i)) / sd(x_i)`. Defaults to `true`. For
more information, see
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
end::standardization-enabled[]

tag::summary-count-field-name[]
If this property is specified, the data that is fed to the job is expected to be
pre-summarized. This property value is the name of the field that contains the
count of raw data points that have been summarized. The same
`summary_count_field_name` applies to all detectors in the job.
+
--
NOTE: The `summary_count_field_name` property cannot be used with the `metric`
function.

--
end::summary-count-field-name[]

tag::timeout-start[]
Controls the amount of time to wait until the {dfanalytics-job} starts. Defaults
to 20 seconds.
end::timeout-start[]

tag::timeout-stop[]
Controls the amount of time to wait until the {dfanalytics-job} stops. Defaults
to 20 seconds.
end::timeout-stop[]

tag::time-format[]
The time format, which can be `epoch`, `epoch_ms`, or a custom pattern. The
default value is `epoch`, which refers to UNIX or Epoch time (the number of
seconds since 1 Jan 1970). The value `epoch_ms` indicates that time is measured
in milliseconds since the epoch. The `epoch` and `epoch_ms` time formats accept
either integer or real values. +
+
--
NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class.
When you use date-time formatting patterns, it is recommended that you provide
the full date, time, and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`.
If the pattern that you specify is not sufficient to produce a complete
timestamp, job creation fails.

--
end::time-format[]

tag::tokenizer[]
The name or definition of the <<analysis-tokenizers,tokenizer>> to use after
character filters are applied. This property is compulsory if
`categorization_analyzer` is specified as an object. Machine learning provides a
tokenizer called `ml_classic` that tokenizes in the same way as the
non-customizable tokenizer in older versions of the product. If you want to use
that tokenizer but change the character or token filters, specify
`"tokenizer": "ml_classic"` in your `categorization_analyzer`.
end::tokenizer[]

tag::training-percent[]
Defines what percentage of the eligible documents are
used for training. Documents that are ignored by the analysis (for example,
those that contain arrays) are not included in the calculation of this
percentage. Defaults to `100`.
end::training-percent[]

tag::use-null[]
Defines whether a new series is used as the null series when there is no value
for the by or partition fields. The default value is `false`.
end::use-null[]