diff --git a/docs/reference/settings/ml-settings.asciidoc b/docs/reference/settings/ml-settings.asciidoc
index f03ed317a27..db8e0c4dac9 100644
--- a/docs/reference/settings/ml-settings.asciidoc
+++ b/docs/reference/settings/ml-settings.asciidoc
@@ -10,15 +10,9 @@
 // tag::ml-settings-description-tag[]
 You do not need to configure any settings to use {ml}. It is enabled by default.
 
-IMPORTANT: {ml-cap} uses SSE4.2 instructions, so will only work on machines whose
-CPUs {wikipedia}/SSE4#Supporting_CPUs[support] SSE4.2. If you
-run {es} on older hardware you must disable {ml} (by setting `xpack.ml.enabled`
-to `false`).
-
-All of these settings can be added to the `elasticsearch.yml` configuration
-file. The dynamic settings can also be updated across a cluster with the
-<>. Dynamic settings take
-precedence over settings in the `elasticsearch.yml` file.
+IMPORTANT: {ml-cap} uses SSE4.2 instructions, so it works only on machines whose
+CPUs {wikipedia}/SSE4#Supporting_CPUs[support] SSE4.2. If you run {es} on older
+hardware, you must disable {ml} (by setting `xpack.ml.enabled` to `false`).
 
 // end::ml-settings-description-tag[]
 
@@ -27,8 +21,9 @@ precedence over settings in the `elasticsearch.yml` file.
 ==== General machine learning settings
 
 `node.roles: [ ml ]`::
-Set `node.roles` to contain `ml` to identify the node as a _{ml} node_ that is
-capable of running jobs. Every node is a {ml} node by default.+
+(<>) Set `node.roles` to contain `ml` to identify
+the node as a _{ml} node_ that is capable of running jobs. Every node is a {ml}
+node by default.
 +
 If you use the `node.roles` setting, then all required roles must be explicitly
 set. Consult <> to learn more.
@@ -38,7 +33,8 @@ the `ml` role.
 
 
 `xpack.ml.enabled`::
-Set to `true` (default) to enable {ml} APIs on the node.
+(<>) Set to `true` (default) to enable {ml} APIs
+on the node.
 +
 If set to `false`, the {ml} APIs are disabled on the node. Therefore the node
 cannot open jobs, start {dfeeds}, or receive transport (internal) communication
@@ -54,58 +50,62 @@ want to use {ml-features} in clients or {kib}, it must also be enabled on all
 coordinating nodes.
 
 `xpack.ml.inference_model.cache_size`::
-The maximum inference cache size allowed. The inference cache exists in the JVM
-heap on each ingest node. The cache affords faster processing times for the
-`inference` processor. The value can be a static byte sized value (i.e. "2gb")
-or a percentage of total allocated heap. The default is "40%".
-See also <>.
+(<>) The maximum inference cache size allowed.
+The inference cache exists in the JVM heap on each ingest node. The cache
+affords faster processing times for the `inference` processor. The value can be
+a static byte sized value (i.e. "2gb") or a percentage of total allocated heap.
+The default is "40%". See also <>.
 
 [[xpack-interference-model-ttl]]
 // tag::interference-model-ttl-tag[]
 `xpack.ml.inference_model.time_to_live` {ess-icon}::
-The time to live (TTL) for models in the inference model cache. The TTL is
-calculated from last access. The `inference` processor attempts to load the
-model from cache. If the `inference` processor does not receive any documents
-for the duration of the TTL, the referenced model is flagged for eviction from
-the cache. If a document is processed later, the model is again loaded into the
-cache. Defaults to `5m`.
+(<>) The time to live (TTL) for models in the
+inference model cache. The TTL is calculated from last access. The `inference`
+processor attempts to load the model from cache. If the `inference` processor
+does not receive any documents for the duration of the TTL, the referenced model
+is flagged for eviction from the cache. If a document is processed later, the
+model is again loaded into the cache. Defaults to `5m`.
 // end::interference-model-ttl-tag[]
 
-`xpack.ml.max_inference_processors` (<>)::
-The total number of `inference` type processors allowed across all ingest
-pipelines. Once the limit is reached, adding an `inference` processor to
-a pipeline is disallowed. Defaults to `50`.
+`xpack.ml.max_inference_processors`::
+(<>) The total number of `inference` type
+processors allowed across all ingest pipelines. Once the limit is reached,
+adding an `inference` processor to a pipeline is disallowed. Defaults to `50`.
 
-`xpack.ml.max_machine_memory_percent` (<>)::
-The maximum percentage of the machine's memory that {ml} may use for running
-analytics processes. (These processes are separate to the {es} JVM.) Defaults to
-`30` percent. The limit is based on the total memory of the machine, not current
-free memory. Jobs will not be allocated to a node if doing so would cause the
-estimated memory use of {ml} jobs to exceed the limit.
+`xpack.ml.max_machine_memory_percent`::
+(<>) The maximum percentage of the machine's
+memory that {ml} may use for running analytics processes. (These processes are
+separate to the {es} JVM.) Defaults to `30` percent. The limit is based on the
+total memory of the machine, not current free memory. Jobs are not allocated to
+a node if doing so would cause the estimated memory use of {ml} jobs to exceed
+the limit.
 
-`xpack.ml.max_model_memory_limit` (<>)::
-The maximum `model_memory_limit` property value that can be set for any job on
-this node. If you try to create a job with a `model_memory_limit` property value
-that is greater than this setting value, an error occurs. Existing jobs are not
-affected when you update this setting. For more information about the
-`model_memory_limit` property, see <>.
+`xpack.ml.max_model_memory_limit`::
+(<>) The maximum `model_memory_limit` property
+value that can be set for any job on this node. If you try to create a job with
+a `model_memory_limit` property value that is greater than this setting value,
+an error occurs. Existing jobs are not affected when you update this setting.
+For more information about the `model_memory_limit` property, see
+<>.
 
 [[xpack.ml.max_open_jobs]]
-`xpack.ml.max_open_jobs` (<>)::
-The maximum number of jobs that can run simultaneously on a node. Defaults to
-`20`. In this context, jobs include both {anomaly-jobs} and {dfanalytics-jobs}.
-The maximum number of jobs is also constrained by memory usage. Thus if the
-estimated memory usage of the jobs would be higher than allowed, fewer jobs will
-run on a node. Prior to version 7.1, this setting was a per-node non-dynamic
-setting. It became a cluster-wide dynamic setting in version 7.1. As a result,
-changes to its value after node startup are used only after every node in the
-cluster is running version 7.1 or higher. The maximum permitted value is `512`.
+`xpack.ml.max_open_jobs`::
+(<>) The maximum number of jobs that can run
+simultaneously on a node. Defaults to `20`. In this context, jobs include both
+{anomaly-jobs} and {dfanalytics-jobs}. The maximum number of jobs is also
+constrained by memory usage. Thus if the estimated memory usage of the jobs
+would be higher than allowed, fewer jobs will run on a node. Prior to version
+7.1, this setting was a per-node non-dynamic setting. It became a cluster-wide
+dynamic setting in version 7.1. As a result, changes to its value after node
+startup are used only after every node in the cluster is running version 7.1 or
+higher. The maximum permitted value is `512`.
 
-`xpack.ml.node_concurrent_job_allocations` (<>)::
-The maximum number of jobs that can concurrently be in the `opening` state on
-each node. Typically, jobs spend a small amount of time in this state before
-they move to `open` state. Jobs that must restore large models when they are
-opening spend more time in the `opening` state. Defaults to `2`.
+`xpack.ml.node_concurrent_job_allocations`::
+(<>) The maximum number of jobs that can
+concurrently be in the `opening` state on each node. Typically, jobs spend a
+small amount of time in this state before they move to `open` state. Jobs that
+must restore large models when they are opening spend more time in the `opening`
+state. Defaults to `2`.
 
 [discrete]
 [[advanced-ml-settings]]
@@ -114,52 +114,55 @@ opening spend more time in the `opening` state. Defaults to `2`.
 These settings are for advanced use cases; the default values are generally
 sufficient:
 
-`xpack.ml.enable_config_migration` (<>)::
-Reserved.
+`xpack.ml.enable_config_migration`::
+(<>) Reserved.
 
-`xpack.ml.max_anomaly_records` (<>)::
-The maximum number of records that are output per bucket. The default value is
-`500`.
+`xpack.ml.max_anomaly_records`::
+(<>) The maximum number of records that are
+output per bucket. The default value is `500`.
 
-`xpack.ml.max_lazy_ml_nodes` (<>)::
-The number of lazily spun up Machine Learning nodes. Useful in situations
-where ML nodes are not desired until the first Machine Learning Job
-is opened. It defaults to `0` and has a maximum acceptable value of `3`.
-If the current number of ML nodes is `>=` than this setting, then it is
+`xpack.ml.max_lazy_ml_nodes`::
+(<>) The number of lazily spun up {ml} nodes.
+Useful in situations where {ml} nodes are not desired until the first {ml} job
+opens. It defaults to `0` and has a maximum acceptable value of `3`. If the
+current number of {ml} nodes is greater than or equal to this setting, it is
 assumed that there are no more lazy nodes available as the desired number
-of nodes have already been provisioned. When a job is opened with this
-setting set at `>0` and there are no nodes that can accept the job, then
-the job will stay in the `OPENING` state until a new ML node is added to the
-cluster and the job is assigned to run on that node.
+of nodes have already been provisioned. If a job is opened and this setting has
+a value greater than zero and there are no nodes that can accept the job, the
+job stays in the `OPENING` state until a new {ml} node is added to the cluster
+and the job is assigned to run on that node.
 +
-IMPORTANT: This setting assumes some external process is capable of adding ML nodes
-to the cluster. This setting is only useful when used in conjunction with
+IMPORTANT: This setting assumes some external process is capable of adding {ml}
+nodes to the cluster. This setting is only useful when used in conjunction with
 such an external process.
 
-`xpack.ml.process_connect_timeout` (<>)::
-The connection timeout for {ml} processes that run separately from the {es} JVM.
-Defaults to `10s`. Some {ml} processing is done by processes that run separately
-to the {es} JVM. When such processes are started they must connect to the {es}
-JVM. If such a process does not connect within the time period specified by this
-setting then the process is assumed to have failed. Defaults to `10s`. The minimum
-value for this setting is `5s`.
+`xpack.ml.process_connect_timeout`::
+(<>) The connection timeout for {ml} processes
+that run separately from the {es} JVM. Defaults to `10s`. Some {ml} processing
+is done by processes that run separately to the {es} JVM. When such processes
+are started they must connect to the {es} JVM. If such a process does not
+connect within the time period specified by this setting then the process is
+assumed to have failed. Defaults to `10s`. The minimum value for this setting is
+`5s`.
 
 [discrete]
 [[model-inference-circuit-breaker]]
 ==== {ml-cap} circuit breaker settings
 
-`breaker.model_inference.limit` (<>)::
-Limit for the model inference breaker, which defaults to 50% of the JVM heap.
-If the parent circuit breaker is less than 50% of the JVM heap, it is bound
-to that limit instead. See <>.
+`breaker.model_inference.limit`::
+(<>) Limit for the model inference breaker,
+which defaults to 50% of the JVM heap. If the parent circuit breaker is less
+than 50% of the JVM heap, it is bound to that limit instead. See
+<>.
 
-`breaker.model_inference.overhead` (<>)::
-A constant that all accounting estimations are multiplied by to determine
-a final estimation. Defaults to 1. See <>.
+`breaker.model_inference.overhead`::
+(<>) A constant that all accounting estimations
+are multiplied by to determine a final estimation. Defaults to 1. See
+<>.
 
 `breaker.model_inference.type`::
-The underlying type of the circuit breaker. There are two valid options: `noop`
-and `memory`. `noop` means the circuit breaker does nothing to prevent too much
-memory usage. `memory` means the circuit breaker tracks the memory used by
-inference models and can potentially break and prevent OutOfMemory errors. The
-default is `memory`.
+(<>) The underlying type of the circuit breaker.
+There are two valid options: `noop` and `memory`. `noop` means the circuit
+breaker does nothing to prevent too much memory usage. `memory` means the
+circuit breaker tracks the memory used by inference models and can potentially
+break and prevent `OutOfMemory` errors. The default is `memory`.
diff --git a/docs/reference/settings/transform-settings.asciidoc b/docs/reference/settings/transform-settings.asciidoc
index ba74da48682..fc9dc95f991 100644
--- a/docs/reference/settings/transform-settings.asciidoc
+++ b/docs/reference/settings/transform-settings.asciidoc
@@ -10,11 +10,6 @@
 You do not need to configure any settings to use {transforms}. It is enabled by
 default.
 
-All of these settings can be added to the `elasticsearch.yml` configuration file.
-The dynamic settings can also be updated across a cluster with the
-<>. Dynamic settings take
-precedence over settings in the `elasticsearch.yml` file.
-
 [discrete]
 [[general-transform-settings]]
 ==== General {transforms} settings