[role="xpack"]
[[ml-settings]]
=== Machine learning settings in Elasticsearch
++++
<titleabbrev>Machine learning settings</titleabbrev>
++++

You do not need to configure any settings to use {ml}. It is enabled by default.

IMPORTANT: {ml-cap} uses SSE4.2 instructions, so it works only on machines whose
CPUs https://en.wikipedia.org/wiki/SSE4#Supporting_CPUs[support] SSE4.2. If you
run {es} on older hardware, you must disable {ml} (by setting `xpack.ml.enabled`
to `false`).

All of these settings can be added to the `elasticsearch.yml` configuration file.
The dynamic settings can also be updated across a cluster with the
<<cluster-update-settings,cluster update settings API>>.

TIP: Dynamic settings that you set with the cluster update settings API take
precedence over settings in the `elasticsearch.yml` file.

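
As a rough illustration of the difference between static and dynamic settings, the following Python sketch builds a JSON request body for the cluster update settings API (`PUT _cluster/settings`). The helper `cluster_settings_body` is hypothetical; its set of dynamic settings is copied from the settings marked *Dynamic* on this page, and it rejects any other setting, which must go in `elasticsearch.yml` instead.

```python
import json

# Settings marked "Dynamic" on this page. The others (for example `node.ml`
# and `xpack.ml.enabled`) can only be set in elasticsearch.yml.
DYNAMIC_ML_SETTINGS = {
    "xpack.ml.max_inference_processors",
    "xpack.ml.max_machine_memory_percent",
    "xpack.ml.max_model_memory_limit",
    "xpack.ml.max_open_jobs",
    "xpack.ml.node_concurrent_job_allocations",
    "xpack.ml.enable_config_migration",
    "xpack.ml.max_anomaly_records",
    "xpack.ml.max_lazy_ml_nodes",
    "xpack.ml.process_connect_timeout",
    "breaker.model_inference.limit",
    "breaker.model_inference.overhead",
}


def cluster_settings_body(settings, persistent=True):
    """Build a body for PUT _cluster/settings (hypothetical helper)."""
    static = sorted(name for name in settings if name not in DYNAMIC_ML_SETTINGS)
    if static:
        raise ValueError(f"not dynamic, set these in elasticsearch.yml: {static}")
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: settings}, sort_keys=True)


print(cluster_settings_body({"xpack.ml.max_open_jobs": 30}))
# {"persistent": {"xpack.ml.max_open_jobs": 30}}
```

A `persistent` setting survives a full cluster restart, whereas a `transient` one does not; the sketch defaults to `persistent`.
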
[float]
[[general-ml-settings]]
==== General machine learning settings

`node.ml`::
Set to `true` (default) to identify the node as a _machine learning node_.
+
If set to `false` in `elasticsearch.yml`, the node cannot run jobs. If set to
`true` but `xpack.ml.enabled` is set to `false`, the `node.ml` setting is
ignored and the node cannot run jobs. If you want to run jobs, there must be at
least one {ml} node in your cluster.
+
IMPORTANT: On dedicated coordinating nodes or dedicated master nodes, set
`node.ml` to `false`.

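
The interaction between `node.ml` and `xpack.ml.enabled` can be summarized as a small decision function. This is a sketch, not {es} code: `can_run_ml_jobs`, `cluster_can_run_jobs`, and the dictionary shape used for each node are hypothetical, and missing keys fall back to the documented defaults of `true`.

```python
def can_run_ml_jobs(node_ml=True, ml_enabled=True):
    """Hypothetical helper mirroring the rules described for `node.ml`."""
    if not ml_enabled:
        return False  # `node.ml` is ignored when xpack.ml.enabled is false
    return node_ml


def cluster_can_run_jobs(nodes):
    """At least one node must be able to run jobs (sketch)."""
    return any(
        can_run_ml_jobs(n.get("node.ml", True), n.get("xpack.ml.enabled", True))
        for n in nodes
    )


nodes = [
    {"name": "master-1", "node.ml": False},  # dedicated master node
    {"name": "ml-1"},                        # defaults: both settings true
]
print(cluster_can_run_jobs(nodes))  # True
```
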
`xpack.ml.enabled`::
Set to `true` (default) to enable {ml} on the node.
+
If set to `false`, the {ml} APIs are disabled on the node. As a result, the node
cannot open jobs, start {dfeeds}, or receive transport (internal) communication
requests related to {ml} APIs. If the node is a coordinating node, {ml} requests
from clients (including {kib}) also fail. For more information about disabling
{ml} in specific {kib} instances, see
{kibana-ref}/ml-settings-kb.html[{kib} {ml} settings].
+
IMPORTANT: If you want to use {ml-features} in your cluster, we recommend that
you set `xpack.ml.enabled` to `true` on all nodes. This is the default
behavior. At a minimum, it must be enabled on all master-eligible nodes. If you
want to use {ml-features} in clients or {kib}, it must also be enabled on all
coordinating nodes.

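
The recommendation above can be checked against an inventory of nodes. In this minimal sketch, the function `ml_enabled_problems` and the per-node keys `master_eligible` and `coordinating` are hypothetical; it reports nodes where {ml} is disabled but must be enabled.

```python
def ml_enabled_problems(nodes, clients_use_ml=True):
    """Return names of nodes that break the recommendation above (sketch).

    Each node is a hypothetical dict such as
    {"name": "m1", "master_eligible": True, "coordinating": False,
     "xpack.ml.enabled": True}; missing keys fall back to defaults.
    """
    problems = []
    for node in nodes:
        if node.get("xpack.ml.enabled", True):  # true by default
            continue
        if node.get("master_eligible", False):
            problems.append(node["name"])  # required on master-eligible nodes
        elif clients_use_ml and node.get("coordinating", False):
            problems.append(node["name"])  # required for clients and Kibana
    return problems


nodes = [
    {"name": "m1", "master_eligible": True, "xpack.ml.enabled": False},
    {"name": "c1", "coordinating": True, "xpack.ml.enabled": False},
    {"name": "d1"},
]
print(ml_enabled_problems(nodes))                        # ['m1', 'c1']
print(ml_enabled_problems(nodes, clients_use_ml=False))  # ['m1']
```
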
`xpack.ml.inference_model.cache_size`::
The maximum size of the inference cache. The inference cache exists in the JVM
heap on each ingest node. The cache affords faster processing times for the
`inference` processor. The value can be a static byte-sized value (for example,
`2gb`) or a percentage of total allocated heap. Defaults to `40%`.
See also <<model-inference-circuit-breaker>>.

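
To make the two accepted forms concrete, this sketch resolves a `cache_size` value to bytes for a given heap size. The helper `resolve_cache_size` is hypothetical; it assumes binary multipliers (1kb = 1024 bytes), as {es} byte-size values use, and handles only whole numbers and a subset of units.

```python
import re

# Assumption: binary multipliers, as Elasticsearch byte-size values use.
# Only whole numbers and a subset of units are handled in this sketch.
_BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def resolve_cache_size(value, heap_bytes):
    """Resolve "2gb" or "40%" to a number of bytes (hypothetical helper)."""
    value = value.strip().lower()
    if value.endswith("%"):
        # Percentage of the total allocated JVM heap.
        return int(heap_bytes * float(value[:-1]) / 100)
    match = re.fullmatch(r"(\d+)\s*(b|kb|mb|gb|tb)", value)
    if match is None:
        raise ValueError(f"unrecognized byte size: {value!r}")
    return int(match.group(1)) * _BYTE_UNITS[match.group(2)]


heap = 4 * 1024**3  # a 4 GiB heap
print(resolve_cache_size("40%", heap))  # 1717986918
print(resolve_cache_size("2gb", heap))  # 2147483648
```
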
`xpack.ml.inference_model.time_to_live`::
The time to live (TTL) for models in the inference model cache. The TTL is
calculated from last access. The `inference` processor attempts to load the
model from cache. If the `inference` processor does not receive any documents
for the duration of the TTL, the referenced model is flagged for eviction from
the cache. If a document is processed later, the model is loaded into the
cache again. Defaults to `5m`.

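
The access-based TTL behavior can be sketched as a small cache whose timer restarts on every lookup. This is not the {es} implementation: the class `TtlModelCache`, the `loader` callback, and the injectable clock are hypothetical, and the sketch evicts lazily on the next lookup, which simplifies "flagged for eviction".

```python
import time


class TtlModelCache:
    """Access-based TTL cache sketch; not the Elasticsearch implementation."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self.loader = loader            # hypothetical: loads a model by id
        self.ttl_seconds = ttl_seconds  # 300 s matches the `5m` default
        self.clock = clock
        self._entries = {}              # model_id -> (model, last_access)

    def _evict_expired(self, now):
        # Models not accessed for a full TTL are evicted.
        expired = [mid for mid, (_, last) in self._entries.items()
                   if now - last >= self.ttl_seconds]
        for mid in expired:
            del self._entries[mid]

    def get(self, model_id):
        now = self.clock()
        self._evict_expired(now)
        if model_id in self._entries:
            model = self._entries[model_id][0]
        else:
            model = self.loader(model_id)  # (re)load after a miss or eviction
        self._entries[model_id] = (model, now)  # TTL restarts on each access
        return model


now = [0.0]
loads = []


def load_model(model_id):
    loads.append(model_id)
    return f"model:{model_id}"


cache = TtlModelCache(load_model, clock=lambda: now[0])
cache.get("m1")   # loaded
now[0] = 200.0
cache.get("m1")   # cache hit, TTL restarts at 200 s
now[0] = 550.0
cache.get("m1")   # idle for 350 s, so evicted and reloaded
print(loads)      # ['m1', 'm1']
```
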
`xpack.ml.max_inference_processors` (<<cluster-update-settings,Dynamic>>)::
The total number of `inference` type processors allowed across all ingest
pipelines. Once the limit is reached, adding an `inference` processor to
a pipeline is disallowed. Defaults to `50`.

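
The limit applies to the total across all pipelines, not per pipeline. The sketch below counts `inference` processors in pipeline definitions shaped like ingest pipeline API bodies. The helpers `count_inference_processors` and `can_add_inference_processor` are hypothetical, and for simplicity they count top-level processors only, ignoring any nested `on_failure` handlers.

```python
def count_inference_processors(pipelines):
    """Count `inference` processors across pipelines (sketch).

    `pipelines` maps pipeline ids to bodies such as
    {"processors": [{"inference": {...}}, ...]}. Only top-level
    processors are counted.
    """
    return sum(
        1
        for body in pipelines.values()
        for processor in body.get("processors", [])
        if "inference" in processor
    )


def can_add_inference_processor(pipelines, limit=50):
    """True while the cluster-wide total is below the limit."""
    return count_inference_processors(pipelines) < limit


pipelines = {
    "p1": {"processors": [{"inference": {"model_id": "m"}},
                          {"set": {"field": "a", "value": 1}}]},
    "p2": {"processors": [{"inference": {"model_id": "m"}}]},
}
print(count_inference_processors(pipelines))            # 2
print(can_add_inference_processor(pipelines, limit=2))  # False
```
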
`xpack.ml.max_machine_memory_percent` (<<cluster-update-settings,Dynamic>>)::
The maximum percentage of the machine's memory that {ml} may use for running
analytics processes. (These processes are separate from the {es} JVM.) Defaults
to `30` percent. The limit is based on the total memory of the machine, not
current free memory. Jobs are not allocated to a node if doing so would cause
the estimated memory use of {ml} jobs to exceed the limit.

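
As a worked example of the rule above, this sketch computes the {ml} memory limit from total machine memory and checks whether one more job fits. The helpers `ml_memory_limit_mb` and `can_allocate` are hypothetical, all sizes are in megabytes, and real estimates may include overheads that this sketch does not model.

```python
def ml_memory_limit_mb(total_machine_mb, max_machine_memory_percent=30):
    """Memory available to ML processes; based on total, not free, memory."""
    return total_machine_mb * max_machine_memory_percent // 100


def can_allocate(job_estimate_mb, assigned_estimates_mb, total_machine_mb,
                 max_machine_memory_percent=30):
    """True if adding the job keeps estimated ML memory within the limit."""
    limit = ml_memory_limit_mb(total_machine_mb, max_machine_memory_percent)
    return sum(assigned_estimates_mb) + job_estimate_mb <= limit


print(ml_memory_limit_mb(16384))                # 4915
print(can_allocate(1000, [2000, 1500], 16384))  # True  (4500 <= 4915)
print(can_allocate(1000, [2000, 2000], 16384))  # False (5000 > 4915)
```
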
`xpack.ml.max_model_memory_limit` (<<cluster-update-settings,Dynamic>>)::
The maximum `model_memory_limit` property value that can be set for any job in
the cluster. If you try to create a job with a `model_memory_limit` property
value that is greater than this setting value, an error occurs. Existing jobs
are not affected when you update this setting. For more information about the
`model_memory_limit` property, see <<put-analysislimits>>.

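
The check happens only when a job is created, which is why existing jobs are unaffected by later changes to the setting. In this sketch, `check_new_job` and `ModelMemoryLimitError` are hypothetical names, limits are plain megabyte integers rather than {es} byte-size strings, and the error message is illustrative rather than the real {es} message.

```python
class ModelMemoryLimitError(ValueError):
    """Hypothetical error type for this sketch."""


def check_new_job(model_memory_limit_mb, max_model_memory_limit_mb):
    """Validate a new job's model_memory_limit against the maximum.

    Called only at job creation; existing jobs are not re-checked
    when xpack.ml.max_model_memory_limit changes.
    """
    if model_memory_limit_mb > max_model_memory_limit_mb:
        raise ModelMemoryLimitError(
            f"model_memory_limit {model_memory_limit_mb}mb exceeds "
            f"xpack.ml.max_model_memory_limit {max_model_memory_limit_mb}mb"
        )


check_new_job(512, 1024)       # accepted
try:
    check_new_job(2048, 1024)  # rejected
except ModelMemoryLimitError as err:
    print(err)
```
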
[[xpack.ml.max_open_jobs]]
`xpack.ml.max_open_jobs` (<<cluster-update-settings,Dynamic>>)::
The maximum number of jobs that can run simultaneously on a node. Defaults to
`20`. In this context, jobs include both {anomaly-jobs} and {dfanalytics-jobs}.
The maximum number of jobs is also constrained by memory usage. If the
estimated memory usage of the jobs would be higher than allowed, fewer jobs
run on a node. Prior to version 7.1, this setting was a per-node non-dynamic
setting. It became a cluster-wide dynamic setting in version 7.1. As a result,
changes to its value after node startup are used only after every node in the
cluster is running version 7.1 or higher. The maximum permitted value is `512`.

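
The interplay between the job count limit and the memory limit can be sketched as follows. The helper `jobs_that_fit` and its `memory_limit_mb` input (for example, a value derived from `xpack.ml.max_machine_memory_percent`) are hypothetical; the sketch fills a single node in order and stops at whichever limit is reached first, which simplifies how {es} actually assigns jobs.

```python
MAX_PERMITTED_OPEN_JOBS = 512


def jobs_that_fit(job_estimates_mb, max_open_jobs=20, memory_limit_mb=None):
    """Count how many queued jobs one node could run (simplified sketch).

    Anomaly detection and data frame analytics jobs both count toward
    max_open_jobs. memory_limit_mb is a hypothetical input.
    """
    if max_open_jobs > MAX_PERMITTED_OPEN_JOBS:
        raise ValueError("xpack.ml.max_open_jobs cannot exceed 512")
    used_mb = 0
    count = 0
    for estimate_mb in job_estimates_mb:
        if count >= max_open_jobs:
            break  # count limit reached
        if memory_limit_mb is not None and used_mb + estimate_mb > memory_limit_mb:
            break  # memory limit reached first
        used_mb += estimate_mb
        count += 1
    return count


print(jobs_that_fit([100] * 30))                        # 20
print(jobs_that_fit([500] * 30, memory_limit_mb=4915))  # 9
```
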
`xpack.ml.node_concurrent_job_allocations` (<<cluster-update-settings,Dynamic>>)::
The maximum number of jobs that can concurrently be in the `opening` state on
each node. Typically, jobs spend a small amount of time in this state before
they move to the `open` state. Jobs that must restore large models when they
are opening spend more time in the `opening` state. Defaults to `2`.

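
A node can be thought of as having a fixed number of "opening" slots. This minimal sketch, with the hypothetical helper `jobs_allowed_to_start_opening`, returns the waiting jobs that may enter the `opening` state given how many are already opening on the node.

```python
def jobs_allowed_to_start_opening(waiting_jobs, opening_count,
                                  node_concurrent_job_allocations=2):
    """Return the waiting jobs that may enter the `opening` state now."""
    free_slots = max(0, node_concurrent_job_allocations - opening_count)
    return waiting_jobs[:free_slots]


print(jobs_allowed_to_start_opening(["a", "b", "c"], opening_count=0))  # ['a', 'b']
print(jobs_allowed_to_start_opening(["a", "b", "c"], opening_count=1))  # ['a']
print(jobs_allowed_to_start_opening(["a", "b", "c"], opening_count=2))  # []
```
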
[float]
[[advanced-ml-settings]]
==== Advanced machine learning settings

These settings are for advanced use cases; the default values are generally
sufficient:

`xpack.ml.enable_config_migration` (<<cluster-update-settings,Dynamic>>)::
Reserved.

`xpack.ml.max_anomaly_records` (<<cluster-update-settings,Dynamic>>)::
The maximum number of records that are output per bucket. Defaults to `500`.

`xpack.ml.max_lazy_ml_nodes` (<<cluster-update-settings,Dynamic>>)::
The number of lazily spun up {ml} nodes. This setting is useful in situations
where {ml} nodes are not desired until the first {ml} job is opened. It defaults
to `0` and has a maximum acceptable value of `3`. If the current number of {ml}
nodes is greater than or equal to this setting, it is assumed that there are no
more lazy nodes available, because the desired number of nodes has already
been provisioned. If a job is opened while this setting is greater than `0` and
there are no nodes that can accept the job, the job stays in the `opening`
state until a new {ml} node is added to the cluster and the job is assigned to
run on that node.
+
IMPORTANT: This setting assumes some external process is capable of adding {ml}
nodes to the cluster. This setting is only useful when used in conjunction with
such an external process.

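
The decision described above can be written as a short function. In this sketch, `open_job_outcome` and its return labels are hypothetical; in particular, `"unassignable"` only marks the case where no node can take the job and no lazy nodes remain, whose exact API response this page does not describe.

```python
def open_job_outcome(ml_node_count, nodes_with_capacity, max_lazy_ml_nodes=0):
    """Sketch of what happens when a job is opened (hypothetical helper).

    Returns "assigned", "opening" (waiting for a lazily added ML node),
    or "unassignable".
    """
    if not 0 <= max_lazy_ml_nodes <= 3:
        raise ValueError("xpack.ml.max_lazy_ml_nodes must be between 0 and 3")
    if nodes_with_capacity > 0:
        return "assigned"
    if ml_node_count < max_lazy_ml_nodes:
        # An external process may still add ML nodes, so the job waits.
        return "opening"
    # The desired number of ML nodes already exists: no lazy nodes remain.
    return "unassignable"


print(open_job_outcome(ml_node_count=0, nodes_with_capacity=0, max_lazy_ml_nodes=1))  # opening
print(open_job_outcome(ml_node_count=1, nodes_with_capacity=0, max_lazy_ml_nodes=1))  # unassignable
```
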
`xpack.ml.process_connect_timeout` (<<cluster-update-settings,Dynamic>>)::
The connection timeout for {ml} processes that run separately from the {es} JVM.
Defaults to `10s`. Some {ml} processing is done by processes that run separately
from the {es} JVM. When such processes are started, they must connect to the
{es} JVM. If such a process does not connect within the time period specified
by this setting, the process is assumed to have failed. The minimum value for
this setting is `5s`.

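
To illustrate the default and the minimum, this sketch parses a time value and rejects anything shorter than five seconds. The helper `parse_connect_timeout` is hypothetical and understands only a few whole-number units (`ms`, `s`, `m`, `h`), a subset of the time units {es} accepts.

```python
import re

_SECONDS_PER_UNIT = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
MIN_CONNECT_TIMEOUT_S = 5


def parse_connect_timeout(value="10s"):
    """Parse a time value such as "10s" and enforce the 5s minimum (sketch)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    seconds = int(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]
    if seconds < MIN_CONNECT_TIMEOUT_S:
        raise ValueError("xpack.ml.process_connect_timeout must be at least 5s")
    return seconds


print(parse_connect_timeout())      # 10
print(parse_connect_timeout("1m"))  # 60
```
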
[[model-inference-circuit-breaker]]
==== {ml-cap} circuit breaker settings

`breaker.model_inference.limit` (<<cluster-update-settings,Dynamic>>)::
The limit for the model inference breaker. Defaults to 50% of the JVM heap. If
the parent circuit breaker is less than 50% of the JVM heap, it is bound to
that limit instead. See <<circuit-breaker>>.

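
Put another way, the effective default is the smaller of half the heap and the parent breaker's limit. The helper `model_inference_breaker_limit` in this sketch is hypothetical, and the parent limits in the usage lines are arbitrary illustrative values.

```python
def model_inference_breaker_limit(heap_bytes, parent_limit_bytes, fraction=0.5):
    """Default limit: 50% of the JVM heap, capped at the parent breaker limit."""
    return min(int(heap_bytes * fraction), parent_limit_bytes)


heap = 1024 * 1024**2  # a 1 GiB heap
print(model_inference_breaker_limit(heap, parent_limit_bytes=int(heap * 0.95)))  # 536870912
print(model_inference_breaker_limit(heap, parent_limit_bytes=int(heap * 0.40)))  # 429496729
```
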
`breaker.model_inference.overhead` (<<cluster-update-settings,Dynamic>>)::
A constant that all accounting estimations are multiplied by to determine a
final estimation. Defaults to `1`. See <<circuit-breaker>>.

`breaker.model_inference.type`::
The underlying type of the circuit breaker. There are two valid options:
`noop` and `memory`. With `noop`, the circuit breaker does nothing to prevent
excessive memory usage. With `memory`, the circuit breaker tracks the memory
used by inference models and can trip to prevent out-of-memory errors.
Defaults to `memory`.

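
The combined effect of the `overhead` and `type` settings can be sketched as a tiny breaker. This is a simplified model, not the {es} implementation: the classes `ModelInferenceBreaker` and `CircuitBreakingError` are hypothetical, and the assumption that the overhead multiplies the running total before comparing it against the limit is labeled as such in the code.

```python
class CircuitBreakingError(Exception):
    """Hypothetical error type standing in for a tripped breaker."""


class ModelInferenceBreaker:
    """Simplified sketch of the breaker `type` and `overhead` settings."""

    def __init__(self, limit_bytes, overhead=1.0, breaker_type="memory"):
        if breaker_type not in ("memory", "noop"):
            raise ValueError("breaker type must be 'memory' or 'noop'")
        self.limit_bytes = limit_bytes
        self.overhead = overhead  # multiplies every estimate
        self.breaker_type = breaker_type
        self.used_bytes = 0

    def add_estimate(self, num_bytes):
        if self.breaker_type == "noop":
            return  # noop: never prevents memory usage
        new_used = self.used_bytes + num_bytes
        # Assumption: the overhead applies to the running total.
        estimated = new_used * self.overhead
        if estimated > self.limit_bytes:
            raise CircuitBreakingError(
                f"estimated {estimated:.0f} bytes exceeds limit {self.limit_bytes}")
        self.used_bytes = new_used


breaker = ModelInferenceBreaker(limit_bytes=100, overhead=2.0)
breaker.add_estimate(40)      # 80 <= 100, accepted
try:
    breaker.add_estimate(20)  # (40 + 20) * 2 = 120 > 100, trips
except CircuitBreakingError as err:
    print(err)                # estimated 120 bytes exceeds limit 100
```
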