From 6e80de0e340c6613051607d835e67debbfe1e49e Mon Sep 17 00:00:00 2001
From: Andrei Dan
Date: Wed, 7 Oct 2020 17:31:15 +0100
Subject: [PATCH] DOCS: general overview of data tiers and roles (#63086)
 (#63422)

This adds general overview documentation for data tiers, the data tiers
specific node roles, and their application in ILM.

Co-authored-by: Lee Hinman
Co-authored-by: debadair
(cherry picked from commit d588cab74722bfb1d3ca0fea15d10c66af937306)

Signed-off-by: Andrei Dan
---
 docs/reference/datatiers.asciidoc             | 99 +++++++++++++++++++
 .../ilm/actions/ilm-migrate.asciidoc          | 95 ++++++++++++++++++
 docs/reference/ilm/ilm-actions.asciidoc       |  5 +
 .../index-modules/allocation.asciidoc         |  4 +-
 .../allocation/data_tier_allocation.asciidoc  | 51 ++++++++++
 .../allocation/filtering.asciidoc             | 10 +-
 docs/reference/index.asciidoc                 |  2 +
 .../cluster/allocation_filtering.asciidoc     |  8 +-
 docs/reference/modules/node.asciidoc          | 58 ++++++++++-
 9 files changed, 326 insertions(+), 6 deletions(-)
 create mode 100644 docs/reference/datatiers.asciidoc
 create mode 100644 docs/reference/ilm/actions/ilm-migrate.asciidoc
 create mode 100644 docs/reference/index-modules/allocation/data_tier_allocation.asciidoc

diff --git a/docs/reference/datatiers.asciidoc b/docs/reference/datatiers.asciidoc
new file mode 100644
index 00000000000..55d455123b2
--- /dev/null
+++ b/docs/reference/datatiers.asciidoc
@@ -0,0 +1,99 @@
+[role="xpack"]
+[[data-tiers]]
+=== Data tiers
+
+Common data lifecycle management patterns revolve around transitioning indices
+through multiple collections of nodes with different hardware characteristics in order
+to fulfill evolving CRUD, search, and aggregation needs as indices age. The concept
+of a tiered hardware architecture is not new in {es}.
+<<index-lifecycle-management, Index lifecycle management ({ilm-init})>> is instrumental in
+implementing tiered architectures by automating the management of indices according to
+performance, resiliency, and data retention requirements.
+Hot/warm/cold architectures are common
+for timeseries data such as logging and metrics.
+
+A data tier is a collection of nodes with the same role. Data tiers are an integrated
+solution offering better support for optimizing cost and improving performance.
+Formalized data tiers in {es} allow configuration of the lifecycle and location of data
+in a hot/warm/cold topology without requiring the use of custom node attributes.
+Each tier formalizes specific characteristics and data behaviors.
+
+The node roles that can currently define data tiers are:
+
+* <<data-content-node, data_content>>
+* <<data-hot-node, data_hot>>
+* <<data-warm-node, data_warm>>
+* <<data-cold-node, data_cold>>
+
+The more generic <<data-node, data role>> is not a data tier role, but
+it is the default node role if no roles are configured. If a node has the
+generic `data` role, we treat the node as if it has all of the tier
+roles assigned.
+
+[[content-tier]]
+==== Content tier
+
+The content tier is made of one or more nodes that have the <<data-content-node, data_content>>
+role. A content tier is designed to store and search user-created content. Non-timeseries data
+doesn't necessarily follow the hot-warm-cold path, and its hardware profile is quite different from
+the <<hot-tier, hot tier>>: user-created content prioritizes high CPU to support complex
+queries and aggregations in a timely manner, whereas the <<hot-tier, hot tier>>
+prioritizes high IO.
+Content data typically has long retention requirements, and from a resiliency perspective
+the indices in this tier should be configured to use one or more replicas.
+
+NOTE: New indices that are not part of a <<data-streams, data stream>> are automatically
+allocated to the <<content-tier, content tier>>.
+
+[[hot-tier]]
+==== Hot tier
+
+The hot tier is made of one or more nodes that have the <<data-hot-node, data_hot>> role.
+It is the {es} entry point for timeseries data. This tier needs to be fast for both reads
+and writes, requiring more hardware resources such as SSD drives. The hot tier usually
+hosts the data from the most recent days. From a resiliency perspective the indices in this
+tier should be configured to use one or more replicas.
+
+NOTE: New indices that are part of a <<data-streams, data stream>> are automatically
+allocated to the <<hot-tier, hot tier>>.
+
+[[warm-tier]]
+==== Warm tier
+
+The warm tier is made of one or more nodes that have the <<data-warm-node, data_warm>> role.
+This tier is where data goes once it is not queried as frequently as in the <<hot-tier, hot tier>>.
+It is a medium-fast tier that still allows data updates. The warm tier usually
+hosts the data from recent weeks. From a resiliency perspective the indices in this
+tier should be configured to use one or more replicas.
+
+[[cold-tier]]
+==== Cold tier
+
+The cold tier is made of one or more nodes that have the <<data-cold-node, data_cold>> role.
+Once the data in the <<warm-tier, warm tier>> is no longer being updated, it can transition to the
+cold tier. The cold tier is still a responsive query tier, but as the data transitions into this
+tier it can be compressed, shrunken, or configured to have zero replicas and be backed by
+a <<searchable-snapshots, searchable snapshot>>. The cold tier usually hosts the data from recent
+months or years.
+
+[discrete]
+[[data-tier-allocation]]
+=== Data tier index allocation
+
+When an index is created, {es} automatically allocates it to the <<content-tier, content tier>>
+if the index is not part of a <<data-streams, data stream>>, or to the <<hot-tier, hot tier>> if it
+is part of a <<data-streams, data stream>>.
+{es} configures the <<tier-preference-allocation-filter, index.routing.allocation.include._tier_preference>>
+setting to `data_content` or `data_hot` respectively.
+
+These heuristics can be overridden by specifying any <<shard-allocation-filtering, shard allocation filtering>>
+settings in the create index request or in an index template that matches the new index.
+Specifying any configuration, including `null`, for `index.routing.allocation.include._tier_preference` will
+also opt out of the automatic new index allocation to tiers.
+
+[discrete]
+[[data-tier-migration]]
+=== Data tier index migration
+
+<<index-lifecycle-management, {ilm-init}>> automates the transition of managed
+indices through the available data tiers using the <<ilm-migrate, migrate>> action, which is injected
+in the warm and cold phases unless it is manually specified in the phase or an
+<<ilm-allocate, allocate>> action modifying the allocation rules is manually configured.
diff --git a/docs/reference/ilm/actions/ilm-migrate.asciidoc b/docs/reference/ilm/actions/ilm-migrate.asciidoc
new file mode 100644
index 00000000000..de409bedc94
--- /dev/null
+++ b/docs/reference/ilm/actions/ilm-migrate.asciidoc
@@ -0,0 +1,95 @@
+[role="xpack"]
+[[ilm-migrate]]
+=== Migrate
+
+Phases allowed: warm, cold.
+
+Moves the index to the <<data-tiers, data tier>> that corresponds
+to the current phase by updating the <<tier-preference-allocation-filter, index.routing.allocation.include._tier_preference>>
+index setting.
+{ilm-init} automatically injects the migrate action in the warm and cold
+phases if no allocation options are specified with the <<ilm-allocate, allocate>> action.
+If you specify an allocate action that only modifies the number of index
+replicas, {ilm-init} reduces the number of replicas before migrating the index.
+To prevent automatic migration without specifying allocation options,
+you can explicitly include the migrate action and set the `enabled` option to `false`.
+
+In the warm phase, the `migrate` action sets <<tier-preference-allocation-filter, index.routing.allocation.include._tier_preference>>
+to `data_warm,data_hot`. This moves the index to nodes in the
+<<warm-tier, warm tier>>. If there are no nodes in the warm tier, it falls back to the
+<<hot-tier, hot tier>>.
+
+In the cold phase, the `migrate` action sets
+<<tier-preference-allocation-filter, index.routing.allocation.include._tier_preference>>
+to `data_cold,data_warm,data_hot`. This moves the index to nodes in the
+<<cold-tier, cold tier>>. If there are no nodes in the cold tier, it falls back to the
+<<warm-tier, warm>> tier, or the <<hot-tier, hot>> tier if there are no warm nodes available.
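+
+As a rough illustration (the index name `my-index-000001` below is only a placeholder), the warm
+phase migration is roughly equivalent to applying the following settings update yourself:
+
+[source,console]
+--------------------------------------------------
+# manual equivalent of the warm phase migrate action for a hypothetical index
+PUT /my-index-000001/_settings
+{
+  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
+}
+--------------------------------------------------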
+
+The migrate action is not allowed in the hot phase.
+The initial index allocation is performed <<data-tier-allocation, automatically>>,
+and can be configured manually or via <<index-templates, index templates>>.
+
+[[ilm-migrate-options]]
+==== Options
+
+`enabled`::
+(Optional, boolean)
+Controls whether {ilm-init} automatically migrates the index during this phase.
+Defaults to `true`.
+
+[[ilm-enabled-migrate-ex]]
+==== Example
+
+In the following policy, the allocate action is specified to reduce the number of replicas
+before {ilm-init} migrates the index to warm nodes.
+
+NOTE: Explicitly specifying the migrate action is not required--{ilm-init} automatically
+performs the migrate action unless you specify allocation options or disable migration.
+
+[source,console]
+--------------------------------------------------
+PUT _ilm/policy/my_policy
+{
+  "policy": {
+    "phases": {
+      "warm": {
+        "actions": {
+          "migrate" : {
+          },
+          "allocate": {
+            "number_of_replicas": 1
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+[[ilm-disable-migrate-ex]]
+==== Disable automatic migration
+
+The migrate action in the following policy is disabled and
+the allocate action assigns the index to nodes that have a
+`rack_id` of _one_ or _two_.
+
+NOTE: Explicitly disabling the migrate action is not required--{ilm-init} does not
+inject the migrate action if you specify allocation options.
+
+[source,console]
+--------------------------------------------------
+PUT _ilm/policy/my_policy
+{
+  "policy": {
+    "phases": {
+      "warm": {
+        "actions": {
+          "migrate" : {
+            "enabled": false
+          },
+          "allocate": {
+            "include" : {
+              "rack_id": "one,two"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
diff --git a/docs/reference/ilm/ilm-actions.asciidoc b/docs/reference/ilm/ilm-actions.asciidoc
index e8484dfdaf9..eb39ebf8b12 100644
--- a/docs/reference/ilm/ilm-actions.asciidoc
+++ b/docs/reference/ilm/ilm-actions.asciidoc
@@ -18,6 +18,10 @@ Makes the index read-only.
 [[ilm-freeze-action]]<<ilm-freeze,Freeze>>::
 Freeze the index to minimize its memory footprint.
 
+[[ilm-migrate-action]]<<ilm-migrate,Migrate>>::
+Move the index shards to the <<data-tiers, data tier>> that corresponds
+to the current {ilm-init} phase.
+
 [[ilm-readonly-action]]<<ilm-readonly,Read only>>::
 Block write operations to the index.
 
@@ -54,6 +58,7 @@ include::actions/ilm-allocate.asciidoc[]
 include::actions/ilm-delete.asciidoc[]
 include::actions/ilm-forcemerge.asciidoc[]
 include::actions/ilm-freeze.asciidoc[]
+include::actions/ilm-migrate.asciidoc[]
 include::actions/ilm-readonly.asciidoc[]
 include::actions/ilm-rollover.asciidoc[]
 ifdef::permanently-unreleased-branch[]
diff --git a/docs/reference/index-modules/allocation.asciidoc b/docs/reference/index-modules/allocation.asciidoc
index 66e41230687..709f66b7f35 100644
--- a/docs/reference/index-modules/allocation.asciidoc
+++ b/docs/reference/index-modules/allocation.asciidoc
@@ -7,6 +7,7 @@ nodes:
 
 * <<shard-allocation-filtering,Shard allocation filtering>>: Controlling which shards are allocated to which nodes.
 * <<delayed-allocation,Delayed allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
 * <<allocation-total-shards,Total shards per node>>: A hard limit on the number of shards from the same index per node.
+* <<data-tier-shard-filtering,Data tier allocation>>: Controlling the allocation of indices to <<data-tiers, data tiers>>.
 
 include::allocation/filtering.asciidoc[]
 
@@ -16,5 +17,4 @@ include::allocation/prioritization.asciidoc[]
 
 include::allocation/total_shards.asciidoc[]
 
-
-
+include::allocation/data_tier_allocation.asciidoc[]
diff --git a/docs/reference/index-modules/allocation/data_tier_allocation.asciidoc b/docs/reference/index-modules/allocation/data_tier_allocation.asciidoc
new file mode 100644
index 00000000000..7b4e24057c6
--- /dev/null
+++ b/docs/reference/index-modules/allocation/data_tier_allocation.asciidoc
@@ -0,0 +1,51 @@
+[role="xpack"]
+[[data-tier-shard-filtering]]
+=== Index-level data tier allocation filtering
+
+You can use index-level allocation settings to control which <<data-tiers, data tier>>
+the index is allocated to. The data tier allocator is a
+<<shard-allocation-filtering, shard allocation filter>> that uses two built-in
+node attributes: `_tier` and `_tier_preference`.
+
+These tier attributes are set using the data node roles:
+
+* <<data-content-node, data_content>>
+* <<data-hot-node, data_hot>>
+* <<data-warm-node, data_warm>>
+* <<data-cold-node, data_cold>>
+
+NOTE: The generic <<data-node, data>> role is not a valid data tier and cannot be used
+for data tier filtering.
+
+[discrete]
+[[data-tier-allocation-filters]]
+==== Data tier allocation settings
+
+`index.routing.allocation.include._tier`::
+
+    Assign the index to a node whose `node.roles` configuration has at
+    least one of the comma-separated values.
+
+`index.routing.allocation.require._tier`::
+
+    Assign the index to a node whose `node.roles` configuration has _all_
+    of the comma-separated values.
+
+`index.routing.allocation.exclude._tier`::
+
+    Assign the index to a node whose `node.roles` configuration has _none_ of the
+    comma-separated values.
+
+[[tier-preference-allocation-filter]]
+`index.routing.allocation.include._tier_preference`::
+
+    Assign the index to the first tier in the list that has an available node.
+    This prevents indices from remaining unallocated if no nodes are available
+    in the preferred tier.
+
+    For example, if you set `index.routing.allocation.include._tier_preference`
+    to `data_warm,data_hot`, the index is allocated to the warm tier if there
+    are nodes with the `data_warm` role. If there are no nodes in the warm tier,
+    but there are nodes with the `data_hot` role, the index is allocated to
+    the hot tier.
diff --git a/docs/reference/index-modules/allocation/filtering.asciidoc b/docs/reference/index-modules/allocation/filtering.asciidoc
index 02103b7cc5f..27da952a36f 100644
--- a/docs/reference/index-modules/allocation/filtering.asciidoc
+++ b/docs/reference/index-modules/allocation/filtering.asciidoc
@@ -7,8 +7,8 @@ a particular index. These per-index filters are applied in conjunction with
 <<cluster-shard-allocation-filtering, cluster-wide allocation filtering>>.
 
 Shard allocation filters can be based on custom node attributes or the built-in
-`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host` and `_id` attributes.
-<<index-lifecycle-management, Index lifecycle management>> uses filters based
+`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
+attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
 on custom node attributes to determine how to reallocate shards when moving
 between phases.
 
@@ -102,6 +102,12 @@ The index allocation settings support the following built-in attributes:
 `_ip`::      Match either `_host_ip` or `_publish_ip`
 `_host`::    Match nodes by hostname
 `_id`::      Match nodes by node id
+`_tier`::    Match nodes by the node's <<data-tiers, data tier>> role.
+             For more details see <<data-tier-shard-filtering, index-level data tier allocation filtering>>.
+
+NOTE: `_tier` filtering is based on <<node-roles, node>> roles. Only
+a subset of roles are <<data-tiers, data tier>> roles, and the generic
+<<data-node, data role>> will match any tier filtering.
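+
+As an illustrative sketch (assuming an existing index named `test`), the following request
+restricts the index to nodes that have the `data_warm` role by requiring the `_tier` attribute:
+
+[source,console]
+--------------------------------------------------
+# assumes an existing index named test
+PUT test/_settings
+{
+  "index.routing.allocation.require._tier": "data_warm"
+}
+--------------------------------------------------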
 
 You can use wildcards when specifying attribute values, for example:
 
diff --git a/docs/reference/index.asciidoc b/docs/reference/index.asciidoc
index fb75e5d1f38..9bf8470e049 100644
--- a/docs/reference/index.asciidoc
+++ b/docs/reference/index.asciidoc
@@ -30,6 +30,8 @@ include::indices/index-templates.asciidoc[]
 
 include::data-streams/data-streams.asciidoc[]
 
+include::datatiers.asciidoc[]
+
 include::ingest.asciidoc[]
 
 include::search/search-your-data/search-your-data.asciidoc[]
diff --git a/docs/reference/modules/cluster/allocation_filtering.asciidoc b/docs/reference/modules/cluster/allocation_filtering.asciidoc
index 3d21f8483a2..8aad7a97855 100644
--- a/docs/reference/modules/cluster/allocation_filtering.asciidoc
+++ b/docs/reference/modules/cluster/allocation_filtering.asciidoc
@@ -7,7 +7,7 @@ conjunction with <<shard-allocation-filtering, per-index allocation filtering>> and <<shard-allocation-awareness, allocation awareness>>.
 
 Shard allocation filters can be based on custom node attributes or the built-in
-`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host` and `_id` attributes.
+`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.
 The `cluster.routing.allocation` settings are <<dynamic-cluster-setting, dynamic>>, enabling live indices to be
 moved from one set of nodes to another. Shards are only relocated if it is
 
@@ -55,7 +55,13 @@ The cluster allocation settings support the following built-in attributes:
 `_ip`::      Match either `_host_ip` or `_publish_ip`
 `_host`::    Match nodes by hostname
 `_id`::      Match nodes by node id
+`_tier`::    Match nodes by the node's <<data-tiers, data tier>> role
+
+NOTE: `_tier` filtering is based on <<node-roles, node>> roles. Only
+a subset of roles are <<data-tiers, data tier>> roles, and the generic
+<<data-node, data role>> will match any tier filtering.
 
 You can use wildcards when specifying attribute values, for example:
diff --git a/docs/reference/modules/node.asciidoc b/docs/reference/modules/node.asciidoc
index f5e5bdcadc3..90d8c8ed764 100644
--- a/docs/reference/modules/node.asciidoc
+++ b/docs/reference/modules/node.asciidoc
@@ -30,6 +30,10 @@ configure this setting, then the node has the following roles by default:
 
 * `master`
 * `data`
+* `data_content`
+* `data_hot`
+* `data_warm`
+* `data_cold`
 * `ingest`
 * `ml`
 * `remote_cluster_client`
@@ -44,7 +48,7 @@ A node that has the `master` role (default), which makes it eligible to be
 <<data-node,Data node>>::
 A node that has the `data` role (default). Data nodes hold data and perform data
-related operations such as CRUD, search, and aggregations.
+related operations such as CRUD, search, and aggregations. A node with the `data` role can fill any of the specialized data node roles.
 
 <<node-ingest-node,Ingest node>>::
 
@@ -206,6 +210,58 @@ To create a dedicated data node, set:
 node.roles: [ data ]
 ----
 
+In a multi-tier deployment architecture, you use specialized data roles to assign data nodes to specific tiers: `data_content`, `data_hot`,
+`data_warm`, or `data_cold`. A node can belong to multiple tiers, but a node that has one of the specialized data roles cannot have the
+generic `data` role.
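+
+For example, a single node can serve both the warm and cold tiers. A minimal sketch of such a
+configuration follows (any other roles the node needs, such as `ingest`, would be listed as well):
+
+[source,yaml]
+----
+# one node assigned to two data tiers; the generic data role must not be listed alongside them
+node.roles: [ data_warm, data_cold ]
+----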
+
+[[data-content-node]]
+==== [x-pack]#Content data node#
+
+Content data nodes accommodate user-created content. They enable operations such as CRUD,
+search, and aggregations.
+
+To create a dedicated content node, set:
+[source,yaml]
+----
+node.roles: [ data_content ]
+----
+
+[[data-hot-node]]
+==== [x-pack]#Hot data node#
+
+Hot data nodes store time series data as it enters {es}. The hot tier must be fast for
+both reads and writes, and requires more hardware resources (such as SSD drives).
+
+To create a dedicated hot node, set:
+[source,yaml]
+----
+node.roles: [ data_hot ]
+----
+
+[[data-warm-node]]
+==== [x-pack]#Warm data node#
+
+Warm data nodes store indices that are no longer being regularly updated, but are still being
+queried. Query volume is usually lower than it was while the index was in the hot tier.
+Less performant hardware can usually be used for nodes in this tier.
+
+To create a dedicated warm node, set:
+[source,yaml]
+----
+node.roles: [ data_warm ]
+----
+
+[[data-cold-node]]
+==== [x-pack]#Cold data node#
+
+Cold data nodes store read-only indices that are accessed less frequently. This tier uses less
+performant hardware and may leverage snapshot-backed indices to minimize the resources required.
+
+To create a dedicated cold node, set:
+[source,yaml]
+----
+node.roles: [ data_cold ]
+----
+
 [[node-ingest-node]]
 ==== Ingest node