[[ml-limitations]]
== Machine Learning Limitations

The following limitations and known problems apply to the {version} release of
{xpack}:

[float]
=== Pop-ups must be enabled in browsers
//See x-pack-elasticsearch/#844

The {xpackml} features in {kib} use pop-ups. You must configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.

[float]
=== Jobs must be re-created at GA
//See x-pack-elasticsearch/#844

The models that you create in the {xpackml} beta cannot be upgraded.
After the {xpackml} features become generally available, you must
re-create your jobs. If you have data sets and job configurations that
you work with extensively in the beta, make note of all the details so
that you can re-create them successfully.
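
If you want an exact record rather than handwritten notes, one option is to save each job and {dfeed} configuration as JSON. The following minimal Python sketch writes such configurations to a file and reads them back. The job and {dfeed} values are hypothetical and only follow the general shape of a job configuration (`job_id`, `analysis_config`, `data_description`). The sketch does not call any {es} API.

```python
import json
import os
import tempfile


def save_ml_configs(jobs, datafeeds, path):
    """Write job and datafeed configurations to a JSON file for later re-creation."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"jobs": jobs, "datafeeds": datafeeds}, f, indent=2, sort_keys=True)


def load_ml_configs(path):
    """Read back the configurations saved by save_ml_configs."""
    with open(path, encoding="utf-8") as f:
        saved = json.load(f)
    return saved["jobs"], saved["datafeeds"]


# Hypothetical beta job and datafeed configurations.
job = {
    "job_id": "it-ops-kpi",
    "analysis_config": {
        "bucket_span": "5m",
        "detectors": [{"function": "sum", "field_name": "events_per_min"}],
    },
    "data_description": {"time_field": "@timestamp"},
}
datafeed = {
    "datafeed_id": "datafeed-it-ops-kpi",
    "job_id": "it-ops-kpi",
    "indices": ["it_ops_metrics"],
}

# Save to a temporary location, then reload to confirm nothing was lost.
path = os.path.join(tempfile.mkdtemp(), "ml-beta-configs.json")
save_ml_configs([job], [datafeed], path)
jobs, datafeeds = load_ml_configs(path)
print(jobs[0]["job_id"], datafeeds[0]["datafeed_id"])
```

Keeping the job and its {dfeed} in the same file makes it easier to re-create them together after the upgrade.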

[float]
=== Anomaly Explorer omissions and limitations
//See x-pack-elasticsearch/#844

In {kib}, Anomaly Explorer charts are not displayed for anomalies
that result from categorization or from the `time_of_day` or `time_of_week`
functions. Those particular results do not display well as time series
charts.

The Anomaly Explorer charts can also look odd when there
is very little data to plot. For example, if there is only one data point, it is
represented as a single dot. If there are only two data points, they are joined
by a line.

[float]
=== Jobs close on the {dfeed} end date
//See x-pack-elasticsearch/#1037

If you start a {dfeed} and specify an end date, the job is closed when
the {dfeed} stops. This behavior avoids having numerous open one-time jobs.

If you do not specify an end date when you start a {dfeed}, the job
remains open when you stop the {dfeed}. This behavior avoids the overhead
of closing and re-opening large jobs when there are pauses in the {dfeed}.
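
The two rules above can be summarized as a small decision function. This Python sketch only models the documented behavior; it does not call the start or stop {dfeed} APIs. The `DatafeedRun` class and `job_state_after_datafeed_stops` function are hypothetical names, the `start` and `end` fields mirror the start {dfeed} parameters of the same names, and `opened` and `closed` are the job states described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatafeedRun:
    """Hypothetical record of how a datafeed was started."""
    start: str                 # mirrors the start datafeed `start` parameter
    end: Optional[str] = None  # mirrors the `end` parameter; None means no end date


def job_state_after_datafeed_stops(run: DatafeedRun) -> str:
    """Return the job state that the rules above predict once the datafeed stops."""
    if run.end is not None:
        # One-time analysis: the job is closed so that one-off jobs do not pile up.
        return "closed"
    # Open-ended analysis: the job stays open to avoid closing and re-opening it.
    return "opened"


print(job_state_after_datafeed_stops(DatafeedRun("2017-04-01T00:00:00Z", "2017-05-01T00:00:00Z")))
print(job_state_after_datafeed_stops(DatafeedRun("2017-04-01T00:00:00Z")))
```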

[float]
=== Post data API requires JSON format

The post data API enables you to send data to a job for analysis. The data that
you send to the job must use the JSON format.
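
If your source data is in another format, such as CSV, you must convert it before you send it. The following Python sketch converts CSV rows to JSON documents, one document per line. It assumes that the post data API accepts a series of JSON documents in the request body; the field names are hypothetical, and sending the request is not shown.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text, numeric_fields):
    """Convert CSV text to newline-delimited JSON documents, one per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Convert numeric columns so that the job receives numbers, not strings.
        doc = {key: float(value) if key in numeric_fields else value
               for key, value in row.items()}
        lines.append(json.dumps(doc, sort_keys=True))
    return "\n".join(lines) + "\n"


csv_text = (
    "time,host,responsetime\n"
    "2017-04-01T00:00:00Z,web-1,132.5\n"
    "2017-04-01T00:05:00Z,web-2,98.0\n"
)
payload = csv_to_json_lines(csv_text, {"responsetime"})
print(payload)
```

Converting numeric columns explicitly matters because every CSV value is read as a string.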

For more information about this API, see <<ml-post-data>>.

[float]
=== Misleadingly high missing field counts
//See x-pack-elasticsearch/#684

One of the counts associated with a {ml} job is `missing_field_count`,
which indicates the number of records that are missing a configured field.
//This information is most useful when your job analyzes CSV data. In this case,
//missing fields indicate data is not being analyzed and you might receive poor results.

Because jobs analyze JSON data, the `missing_field_count` might be misleading.
Fields can be legitimately absent from some JSON documents because of the
structure of the data, in which case the missing fields do not lead to poor results.
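
To see why a high count is not necessarily a problem, consider a job that analyzes two fields that never appear in the same document. Following the definition above, every record is counted as missing a configured field, even though all of the data is analyzed. This Python sketch is a hypothetical approximation of the counter, not the {xpackml} implementation; the field names are illustrative.

```python
def count_records_missing_fields(records, configured_fields):
    """Count records that lack at least one of the configured fields."""
    return sum(
        1 for record in records
        if any(field not in record for field in configured_fields)
    )


# Web log documents carry `responsetime`; system metric documents carry `cpu_pct`.
records = [
    {"@timestamp": "2017-04-01T00:00:00Z", "responsetime": 120.0},
    {"@timestamp": "2017-04-01T00:00:00Z", "cpu_pct": 0.42},
    {"@timestamp": "2017-04-01T00:05:00Z", "responsetime": 95.0},
]
missing = count_records_missing_fields(records, ["responsetime", "cpu_pct"])
print(f"{missing} of {len(records)} records are missing a configured field")
```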

For more information about `missing_field_count`,
see <<ml-datacounts,Data Counts Objects>>.

[float]
=== Terms aggregation size affects data analysis
//See x-pack-elasticsearch/#601

By default, the `terms` aggregation returns the buckets for the top ten terms.
You can change this default behavior by setting the `size` parameter.

If you send pre-aggregated data to a job for analysis, you must ensure
that `size` is large enough to return a bucket for every term that the job
needs. Otherwise, data for the omitted terms is not analyzed.
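
The following Python sketch illustrates the effect. It mimics a `terms` aggregation by keeping only the `size` most frequent terms, then builds a {dfeed} `aggregations` object whose `size` covers every expected term. The nesting (a `date_histogram` that contains a `max` on the time field and a `terms` aggregation) is an assumed, illustrative shape, and the field names are hypothetical; check the {dfeed} documentation for the structure that your release requires.

```python
from collections import Counter


def simulate_terms_buckets(values, size=10):
    """Mimic a terms aggregation: return only the `size` most frequent terms."""
    return dict(Counter(values).most_common(size))


def datafeed_aggregations(term_field, metric_field, size, interval="5m"):
    """Build an illustrative datafeed aggregation with an explicit terms size."""
    return {
        "buckets": {
            "date_histogram": {"field": "time", "interval": interval},
            "aggregations": {
                "time": {"max": {"field": "time"}},
                term_field: {
                    "terms": {"field": term_field, "size": size},
                    "aggregations": {metric_field: {"avg": {"field": metric_field}}},
                },
            },
        }
    }


# 25 hypothetical hosts, each reporting once in the bucket.
hosts = [f"host-{i:02d}" for i in range(25)]

default_buckets = simulate_terms_buckets(hosts)           # default size of 10
full_buckets = simulate_terms_buckets(hosts, len(hosts))  # size covers every host
print(len(default_buckets), "hosts analyzed with the default size")
print(len(full_buckets), "hosts analyzed when size covers every host")

aggs = datafeed_aggregations("host", "responsetime", size=len(hosts))
```

With the default size, 15 of the 25 hosts would never reach the job.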

[float]
=== Jobs created in {kib} use model plot config and pre-aggregated data
//See x-pack-elasticsearch/#844

When you create single or multi-metric jobs in {kib}, {kib} might enable some
options behind the scenes that you should reconsider for large or
long-running jobs.

For example, when you create a single metric job in {kib}, it generally
enables the `model_plot_config` advanced configuration option. That configuration
option causes model information to be stored along with the results and provides
a more detailed view into anomaly detection. It is specifically used by the
**Single Metric Viewer** in {kib}. When this option is enabled, however, it can
add considerable performance overhead. If you have jobs
with many entities, for example data from tens of thousands of servers, storing
this additional model information for every bucket might be problematic. If you
are not certain that you need this option or if you experience performance
issues, edit your job configuration to disable this option.
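
A rough calculation shows why model plot storage grows quickly. The Python sketch below assumes, as a simplification, that model plot stores about one document per entity per bucket; the real number also depends on your detectors and split fields. It also shows the `model_plot_config` setting with `enabled` set to `false`; how you apply that change to an existing job might depend on your release.

```python
SECONDS_PER_DAY = 86_400


def model_plot_docs_per_day(entities, bucket_span_seconds):
    """Approximate model plot documents per day, assuming one per entity per bucket."""
    buckets_per_day = SECONDS_PER_DAY // bucket_span_seconds
    return entities * buckets_per_day


# 20,000 servers analyzed with a 15-minute (900-second) bucket span.
docs = model_plot_docs_per_day(entities=20_000, bucket_span_seconds=900)
print(f"about {docs:,} model plot documents per day")  # about 1,920,000

# Job configuration fragment that turns model plot off.
job_update = {"model_plot_config": {"enabled": False}}
```

At 96 buckets per day, each additional thousand entities adds roughly 96,000 documents per day under this assumption.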

For more information, see <<ml-apimodelplotconfig,Model Plot Config>>.

Likewise, when you create a single or multi-metric job in {kib}, in some cases
it uses aggregations on the data that it retrieves from {es}. One of the
benefits of summarizing data this way is that {es} automatically distributes
these calculations across your cluster. This summarized data is then fed into
{xpackml} instead of raw data, which reduces the volume of data that must
be considered while detecting anomalies. However, if you have two jobs, one of
which uses pre-aggregated data and another that does not, their results might
differ. This difference is due to the lower precision of the aggregated input data.
The {ml} analytics are designed to be aggregation-aware, and the likely increase
in performance that is gained by pre-aggregating the data makes the potentially
poorer precision worthwhile. If you want to view or change the aggregations
that are used in your job, refer to the `aggregations` property in your {dfeed}.
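
The precision difference can be seen with a small example. In this Python sketch, a metric is sampled once a minute and then summarized into 10-minute averages, similar to what an `avg` aggregation in a {dfeed} would produce. A single spike that is ten times the normal value in the raw data becomes less than twice the normal value after aggregation. The data is hypothetical, and the sketch does not model how the {ml} analytics compensate for aggregation.

```python
def aggregate_means(values, points_per_bucket):
    """Average consecutive groups of raw values, like a per-bucket avg aggregation."""
    means = []
    for i in range(0, len(values), points_per_bucket):
        bucket = values[i:i + points_per_bucket]
        means.append(sum(bucket) / len(bucket))
    return means


# Twenty one-minute readings: steady at 10.0, with one spike of 100.0 in the second ten minutes.
raw = [10.0] * 10 + [10.0] * 9 + [100.0]
summarized = aggregate_means(raw, points_per_bucket=10)

print("raw peak:", max(raw))                # 100.0, ten times the baseline
print("aggregated peak:", max(summarized))  # 19.0, under twice the baseline
```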

For more information, see <<ml-datafeed-resource>>.