[[ml-limitations]]
== Machine Learning Limitations

The following limitations and known problems apply to the {version} release of
{xpack}:

[float]
=== Categorization uses English tokenization rules and dictionary words
//See x-pack-elasticsearch/#3021
Categorization identifies static parts of unstructured logs and groups similar
messages together. This is currently supported only for English language log
messages.

[float]
=== Pop-ups must be enabled in browsers
//See x-pack-elasticsearch/#844

The {xpackml} features in {kib} use pop-ups. You must either configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.

[float]
=== {xpackml} features do not yet support cross cluster search

At this time, you cannot use cross cluster search in either the {ml} APIs or the
{ml} features in {kib}.

For more information about cross cluster search,
see {ref}/modules-cross-cluster-search.html[Cross Cluster Search].

[float]
=== {xpackml} features are not supported on tribe nodes

You cannot use {ml} features on tribe nodes. For more information about that
type of node, see {ref}/modules-tribe.html[Tribe node].

[float]
=== Anomaly Explorer omissions and limitations
//See x-pack-elasticsearch/#844 and x-pack-kibana/#1461

In {kib}, Anomaly Explorer charts are not displayed for anomalies
that were due to categorization, `time_of_day` functions, or `time_of_week`
functions. Those particular results do not display well as time series
charts.

The charts are also not displayed for detectors that use script fields. In that
case, the original source data cannot be easily searched because it has been
somewhat transformed by the script.

The Anomaly Explorer charts can also look odd in circumstances where there
is very little data to plot. For example, if there is only one data point, it is
represented as a single dot. If there are only two data points, they are joined
by a line.

[float]
=== Jobs close on the {dfeed} end date
//See x-pack-elasticsearch/#1037

If you start a {dfeed} and specify an end date, it closes the job when
the {dfeed} stops. This behavior avoids having numerous open one-time jobs.

If you do not specify an end date when you start a {dfeed}, the job
remains open when you stop the {dfeed}. This behavior avoids the overhead
of closing and re-opening large jobs when there are pauses in the {dfeed}.
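
For example, the following request starts a {dfeed} with an explicit end time,
after which the {dfeed} stops and its job closes. This is a minimal sketch; the
{dfeed} name and the time values are placeholders for your own configuration:

[source,js]
----
POST _xpack/ml/datafeeds/datafeed-total-requests/_start <1>
{
  "start": "2017-01-01T00:00:00Z",
  "end": "2017-02-01T00:00:00Z"
}
----
<1> Substitute the identifier of your own {dfeed}.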

[float]
=== Post data API requires JSON format

The post data API enables you to send data to a job for analysis. The data that
you send to the job must use the JSON format.

For more information about this API, see
{ref}/ml-post-data.html[Post Data to Jobs].
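
For example, the following sketch sends two JSON documents to a hypothetical
job named `it-ops-kpi`; the job must be open and the field names must match
your job configuration:

[source,js]
----
POST _xpack/ml/anomaly_detectors/it-ops-kpi/_data
{"time":"2017-01-01T00:00:00Z","events_per_min":22}
{"time":"2017-01-01T00:05:00Z","events_per_min":35}
----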

[float]
=== Misleading high missing field counts
//See x-pack-elasticsearch/#684

One of the counts associated with a {ml} job is `missing_field_count`,
which indicates the number of records that are missing a configured field.
//This information is most useful when your job analyzes CSV data. In this case,
//missing fields indicate data is not being analyzed and you might receive poor results.

Since jobs analyze JSON data, the `missing_field_count` might be misleading.
Missing fields might be expected due to the structure of the data and therefore
do not generate poor results.

For more information about `missing_field_count`,
see {ref}/ml-jobstats.html#ml-datacounts[Data Counts Objects].
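
You can check this count with the get job statistics API. The following sketch
assumes a job named `it-ops-kpi`; the `missing_field_count` value appears in
the `data_counts` object of the response:

[source,js]
----
GET _xpack/ml/anomaly_detectors/it-ops-kpi/_stats
----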

[float]
=== Terms aggregation size affects data analysis
//See x-pack-elasticsearch/#601

By default, the `terms` aggregation returns the buckets for the top ten terms.
You can change this default behavior by setting the `size` parameter.

If you send pre-aggregated data to a job for analysis, you must ensure
that the `size` is configured correctly. Otherwise, some data might not be
analyzed.
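
For example, the following {dfeed} sketch summarizes data with a `terms`
aggregation whose `size` is raised to 100. The job, index, and field names are
hypothetical, the job must already exist, and the exact aggregation structure
that your version requires is described in the {dfeed} documentation:

[source,js]
----
PUT _xpack/ml/datafeeds/datafeed-farequote
{
  "job_id": "farequote",
  "indices": ["farequote"],
  "types": ["response"],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "time",
        "interval": "360s",
        "time_zone": "UTC"
      },
      "aggregations": {
        "time": {
          "max": { "field": "time" }
        },
        "airline": {
          "terms": {
            "field": "airline",
            "size": 100
          },
          "aggregations": {
            "responsetime": {
              "avg": { "field": "responsetime" }
            }
          }
        }
      }
    }
  }
}
----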

[float]
=== Time-based index patterns are not supported
//See x-pack-elasticsearch/#1910

It is not possible to create an {xpackml} analysis job that uses time-based
index patterns, for example `[logstash-]YYYY.MM.DD`.
This applies to the single metric or multi metric job creation wizards in {kib}.

[float]
=== Fields named "by", "count", or "over" cannot be used to split data
//See x-pack-elasticsearch/#858

You cannot use the following field names in the `by_field_name` or
`over_field_name` properties in a job: `by`; `count`; `over`. This limitation
also applies to those properties when you create advanced jobs in {kib}.
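
For example, the following job sketch splits the analysis with
`by_field_name`. The job and field names are hypothetical; `status` is a valid
choice for the split field, whereas `by`, `count`, and `over` cannot be used:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/example-status-counts
{
  "analysis_config": {
    "detectors": [
      {
        "function": "count",
        "by_field_name": "status"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
----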

[float]
=== Jobs created in {kib} use model plot config and pre-aggregated data
//See x-pack-elasticsearch/#844

If you create single or multi-metric jobs in {kib}, it can enable some
options under the covers that you might want to reconsider for large or
long-running jobs.

For example, when you create a single metric job in {kib}, it generally
enables the `model_plot_config` advanced configuration option. That configuration
option causes model information to be stored along with the results and provides
a more detailed view into anomaly detection. It is specifically used by the
**Single Metric Viewer** in {kib}. When this option is enabled, however, it can
add considerable overhead to the performance of the system. If you have jobs
with many entities, for example data from tens of thousands of servers, storing
this additional model information for every bucket might be problematic. If you
are not certain that you need this option or if you experience performance
issues, edit your job configuration to disable this option.
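
If you decide to disable it, a sketch of the change looks like the following.
The job name is a placeholder, and this assumes that your version allows
`model_plot_config` to be changed through the update job API:

[source,js]
----
POST _xpack/ml/anomaly_detectors/example-job/_update
{
  "model_plot_config": {
    "enabled": false
  }
}
----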

For more information, see
{ref}/ml-job-resource.html#ml-apimodelplotconfig[Model Plot Config].

Likewise, when you create a single or multi-metric job in {kib}, in some cases
it uses aggregations on the data that it retrieves from {es}. One of the
benefits of summarizing data this way is that {es} automatically distributes
these calculations across your cluster. This summarized data is then fed into
{xpackml} instead of the raw documents, which reduces the volume of data that
must be considered while detecting anomalies. However, if you have two jobs,
one that uses pre-aggregated data and one that does not, their results might
differ. This difference is due to the difference in precision of the input data.
The {ml} analytics are designed to be aggregation-aware, and the likely increase
in performance that is gained by pre-aggregating the data makes the potentially
poorer precision worthwhile. If you want to view or change the aggregations
that are used in your job, refer to the `aggregations` property in your {dfeed}.
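
For example, you can retrieve the current {dfeed} configuration, including any
`aggregations`, with the get {dfeed} API; the {dfeed} name below is a
placeholder:

[source,js]
----
GET _xpack/ml/datafeeds/datafeed-farequote
----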

For more information, see {ref}/ml-datafeed-resource.html[Datafeed Resources].

[float]
=== Security integration

When {security} is enabled, a {dfeed} stores the roles of the user who created
or updated the {dfeed} **at that time**. If those roles are subsequently
modified, for example to add or remove privileges, the {dfeed} runs with the new
permissions that are associated with those roles. However, if the user is
assigned different roles after the {dfeed} was created or updated, the {dfeed}
continues to run with the permissions that are associated with the roles it
originally stored.

A way to update the roles stored within the {dfeed} without changing any other
settings is to submit an empty JSON document (`{}`) to the
{ref}/ml-update-datafeed.html[update {dfeed} API].
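
For example, the following sketch updates the stored roles for a hypothetical
{dfeed} named `datafeed-total-requests` without changing any of its settings:

[source,js]
----
POST _xpack/ml/datafeeds/datafeed-total-requests/_update
{}
----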