[[ml-limitations]]
== Machine Learning Limitations

The following limitations and known problems apply to the {version} release of
{xpack}:

[float]
=== Categorization uses English dictionary words
//See x-pack-elasticsearch/#3021
Categorization identifies static parts of unstructured logs and groups similar
messages together. The default categorization tokenizer assumes English language
log messages. For other languages you must define a different
`categorization_analyzer` for your job. For more information, see
<<ml-configuring-categories>>.

Additionally, a dictionary used to influence the categorization process contains
only English words. This means categorization might work better in English than
in other languages. The ability to customize the dictionary will be added in a
future release.
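
For example, for non-English logs you might define a `categorization_analyzer`
that replaces the English day-name stop words used by the default analyzer with
equivalents for your language. The following request is a minimal sketch; the
job name, field names, and stop words are illustrative placeholders:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/non_english_logs <1>
{
  "analysis_config": {
    "bucket_span": "30m",
    "categorization_field_name": "message",
    "categorization_analyzer": {
      "tokenizer": "ml_classic",
      "filter": [
        { "type": "stop", "stopwords": [ "lundi", "mardi", "mercredi" ] } <2>
      ]
    },
    "detectors": [
      {
        "function": "count",
        "by_field_name": "mlcategory"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
----
<1> The job name and field names are placeholders; substitute your own.
<2> Replace these example stop words with a list that suits your language and
log format.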

[float]
=== Pop-ups must be enabled in browsers
//See x-pack-elasticsearch/#844

The {xpackml} features in {kib} use pop-ups. You must configure your
web browser so that it does not block pop-up windows, or create an
exception for your {kib} URL.

[float]
=== Anomaly Explorer omissions and limitations
//See x-pack-elasticsearch/#844 and x-pack-kibana/#1461

In {kib}, Anomaly Explorer charts are not displayed for anomalies
that were due to categorization, `time_of_day` functions, or `time_of_week`
functions. Those particular results do not display well as time series
charts.

The charts are also not displayed for detectors that use script fields. In that
case, the original source data cannot be easily searched because it has been
somewhat transformed by the script.

The Anomaly Explorer charts can also look odd in circumstances where there
is very little data to plot. For example, if there is only one data point, it is
represented as a single dot. If there are only two data points, they are joined
by a line.

[float]
=== Jobs close on the {dfeed} end date
//See x-pack-elasticsearch/#1037

If you start a {dfeed} and specify an end date, the job closes when
the {dfeed} stops. This behavior avoids having numerous open one-time jobs.

If you do not specify an end date when you start a {dfeed}, the job
remains open when you stop the {dfeed}. This behavior avoids the overhead
of closing and re-opening large jobs when there are pauses in the {dfeed}.
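
For example, the following request starts a {dfeed} with an explicit end date;
when the {dfeed} reaches that date and stops, the associated job closes. The
{dfeed} name and dates are placeholders:

[source,js]
----
POST _xpack/ml/datafeeds/datafeed-total-requests/_start <1>
{
  "start": "2018-01-01T00:00:00Z",
  "end": "2018-02-01T00:00:00Z"
}
----
<1> Substitute the identifier of your own {dfeed}.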

[float]
=== Jobs created in {kib} must use {dfeeds}

If you create jobs in {kib}, you must use {dfeeds}. If the data that you want to
analyze is not stored in {es}, you cannot use {dfeeds} and therefore you cannot
create your jobs in {kib}. You can, however, use the {ml} APIs to create jobs
and to send batches of data directly to the jobs. For more information, see
<<ml-dfeeds>> and <<ml-api-quickref>>.
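
For example, you can create and open a simple count job with the APIs alone; no
{dfeed} is involved. The job name, `bucket_span`, and field names below are
placeholders, and the request is a sketch rather than a complete configuration:

[source,js]
----
PUT _xpack/ml/anomaly_detectors/event_rate_job <1>
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      { "function": "count" }
    ]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}

POST _xpack/ml/anomaly_detectors/event_rate_job/_open
----
<1> A hypothetical job name; use your own.

You can then send batches of data directly to the open job with the post data
API, as described in the next section.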

[float]
=== Post data API requires JSON format

The post data API enables you to send data to a job for analysis. The data that
you send to the job must use the JSON format.

For more information about this API, see
{ref}/ml-post-data.html[Post Data to Jobs].
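
For example, the following request sends two newline-delimited JSON documents to
a hypothetical open job named `event_rate_job`; the field names are
placeholders:

[source,js]
----
POST _xpack/ml/anomaly_detectors/event_rate_job/_data <1>
{"timestamp": 1518000000000, "response_time": 100}
{"timestamp": 1518000060000, "response_time": 110}
----
<1> The job name and document fields are illustrative only.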

[float]
=== Misleading high missing field counts
//See x-pack-elasticsearch/#684

One of the counts associated with a {ml} job is `missing_field_count`,
which indicates the number of records that are missing a configured field.
//This information is most useful when your job analyzes CSV data. In this case,
//missing fields indicate data is not being analyzed and you might receive poor results.

Since jobs analyze JSON data, the `missing_field_count` might be misleading.
Missing fields might be expected due to the structure of the data and therefore
do not generate poor results.

For more information about `missing_field_count`,
see {ref}/ml-jobstats.html#ml-datacounts[Data Counts Objects].
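
You can check this count with the get job statistics API; the job name below is
a placeholder:

[source,js]
----
GET _xpack/ml/anomaly_detectors/event_rate_job/_stats <1>
----
<1> Substitute your own job identifier.

In the response, the `missing_field_count` value inside the `data_counts`
object reports how many records were missing a configured field.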

[float]
=== Terms aggregation size affects data analysis
//See x-pack-elasticsearch/#601

By default, the `terms` aggregation returns the buckets for the top ten terms.
You can change this default behavior by setting the `size` parameter.

If you send pre-aggregated data to a job for analysis, you must ensure
that the `size` is configured correctly. Otherwise, some data might not be
analyzed.
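
For example, a {dfeed} that pre-aggregates data per host might use a `terms`
aggregation. The sketch below uses placeholder index, field, and job names; if
the data contains more distinct hosts than the `size` value allows, the excess
hosts are silently excluded from the analysis:

[source,js]
----
PUT _xpack/ml/datafeeds/datafeed-host-metrics <1>
{
  "job_id": "host_metrics_job",
  "indices": [ "server-metrics" ],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "10m"
      },
      "aggregations": {
        "timestamp": {
          "max": { "field": "timestamp" }
        },
        "hosts": {
          "terms": {
            "field": "host",
            "size": 100 <2>
          },
          "aggregations": {
            "mean_cpu": {
              "avg": { "field": "cpu" }
            }
          }
        }
      }
    }
  }
}
----
<1> The {dfeed}, job, index, and field names are placeholders.
<2> Set `size` large enough to cover the number of distinct terms in your data.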

[float]
=== Time-based index patterns are not supported
//See x-pack-elasticsearch/#1910

It is not possible to create an {xpackml} analysis job that uses time-based
index patterns, for example `[logstash-]YYYY.MM.DD`.
This applies to the single metric or multi metric job creation wizards in {kib}.

[float]
=== Fields named "by", "count", or "over" cannot be used to split data
//See x-pack-elasticsearch/#858

You cannot use the following field names in the `by_field_name` or
`over_field_name` properties in a job: `by`; `count`; `over`. This limitation
also applies to those properties when you create advanced jobs in {kib}.

[float]
=== Jobs created in {kib} use model plot config and pre-aggregated data
//See x-pack-elasticsearch/#844

If you create single or multi-metric jobs in {kib}, it might enable some
options behind the scenes that you might want to reconsider for large or
long-running jobs.

For example, when you create a single metric job in {kib}, it generally
enables the `model_plot_config` advanced configuration option. That configuration
option causes model information to be stored along with the results and provides
a more detailed view into anomaly detection. It is specifically used by the
**Single Metric Viewer** in {kib}. When this option is enabled, however, it can
add considerable overhead to the performance of the system. If you have jobs
with many entities, for example data from tens of thousands of servers, storing
this additional model information for every bucket might be problematic. If you
are not certain that you need this option or if you experience performance
issues, edit your job configuration to disable this option.

For more information, see
{ref}/ml-job-resource.html#ml-apimodelplotconfig[Model Plot Config].
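
As a sketch, you can typically disable model plots on an existing job with the
update job API; the job name below is a placeholder:

[source,js]
----
POST _xpack/ml/anomaly_detectors/total-requests/_update <1>
{
  "model_plot_config": {
    "enabled": false
  }
}
----
<1> A hypothetical job name; substitute your own.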

Likewise, when you create a single or multi-metric job in {kib}, in some cases
it uses aggregations on the data that it retrieves from {es}. One of the
benefits of summarizing data this way is that {es} automatically distributes
these calculations across your cluster. This summarized data is then fed into
{xpackml} instead of raw results, which reduces the volume of data that must
be considered while detecting anomalies. However, if you have two jobs, one of
which uses pre-aggregated data and another that does not, their results might
differ. This difference is due to the difference in precision of the input data.
The {ml} analytics are designed to be aggregation-aware and the likely increase
in performance that is gained by pre-aggregating the data makes the potentially
poorer precision worthwhile. If you want to view or change the aggregations
that are used in your job, refer to the `aggregations` property in your {dfeed}.

For more information, see {ref}/ml-datafeed-resource.html[Datafeed Resources].
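
To check whether a job created in {kib} uses pre-aggregated data, you can
retrieve its {dfeed} and inspect the `aggregations` property. The {dfeed} name
below is a placeholder:

[source,js]
----
GET _xpack/ml/datafeeds/datafeed-total-requests <1>
----
<1> Substitute the identifier of your own {dfeed}.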

[float]
=== Security Integration

When {security} is enabled, a {dfeed} stores the roles of the user who created
or updated the {dfeed} **at that time**. If the definitions of those roles are
later updated, the {dfeed} subsequently runs with the new permissions that are
associated with the roles. However, if the user is assigned a different set of
roles after the {dfeed} is created or updated, the {dfeed} continues to run
with the permissions that are associated with the roles it originally stored.
For more information, see <<ml-dfeeds>>.

[float]
=== Forecasts cannot be created for population jobs

If you use an `over_field_name` property in your job (that is, it is a
_population job_), you cannot create a forecast. If you try to create a forecast
for this type of job, an error occurs. For more information about forecasts,
see <<ml-forecasting>>.
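
Forecasts are requested with the forecast API; the sketch below uses a
placeholder job name. For a population job, this call returns an error instead
of starting a forecast:

[source,js]
----
POST _xpack/ml/anomaly_detectors/population-job/_forecast <1>
{
  "duration": "3d"
}
----
<1> A hypothetical population job name.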

[float]
=== Forecasts cannot be created for jobs that use geographic, rare, or time functions

If you use any of the following analytical functions in your job, you cannot
create a forecast:

* `lat_long`
* `rare` and `freq_rare`
* `time_of_day` and `time_of_week`

If you try to create a forecast for this type of job, an error occurs. For more
information about any of these functions, see <<ml-functions>>.

[float]
=== Jobs must be stopped before upgrades

You must stop any {ml} jobs that are running before you start the upgrade
process. For more information, see <<stopping-ml>> and
{stack-ref}/upgrading-elastic-stack.html[Upgrading the Elastic Stack].
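
Stopping a job typically means stopping its {dfeed} and then closing the job;
the names below are placeholders and <<stopping-ml>> describes the full
procedure:

[source,js]
----
POST _xpack/ml/datafeeds/datafeed-total-requests/_stop <1>

POST _xpack/ml/anomaly_detectors/total-requests/_close
----
<1> Substitute your own {dfeed} and job identifiers.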