DOCS: general overview of data tiers and roles (#63086) (#63422)

This adds general overview documentation for data tiers,
the data tiers specific node roles, and their application in
ILM.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: debadair <debadair@elastic.co>
(cherry picked from commit d588cab74722bfb1d3ca0fea15d10c66af937306)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Andrei Dan 2020-10-07 17:31:15 +01:00 committed by GitHub
parent c4726a2cec
commit 6e80de0e34
9 changed files with 326 additions and 6 deletions


@ -0,0 +1,99 @@
[role="xpack"]
[[data-tiers]]
=== Data tiers
Common data lifecycle management patterns transition indices
through multiple collections of nodes with different hardware characteristics
to meet evolving CRUD, search, and aggregation needs as the indices age. The concept
of a tiered hardware architecture is not new in {es}.
<<index-lifecycle-management, Index Lifecycle Management>> is instrumental in
implementing tiered architectures by automating the management of indices according to
performance, resiliency, and data retention requirements.
<<overview-index-lifecycle-management, Hot/warm/cold>> architectures are common
for time series data such as logs and metrics.
A data tier is a collection of nodes with the same data role. Data tiers are an integrated
solution that offers better support for optimizing cost and improving performance.
Formalized data tiers in {es} let you configure the lifecycle and location of data
in a hot/warm/cold topology without using custom node attributes.
Each tier formalizes specific characteristics and data behaviors.
The node roles that can currently define data tiers are:
* <<data-content-node, data_content>>
* <<data-hot-node, data_hot>>
* <<data-warm-node, data_warm>>
* <<data-cold-node, data_cold>>
The more generic <<data-node, data role>> is not a data tier role, but
it is the default node role if no roles are configured. If a node has the
<<data-node, data>> role, {es} treats the node as if it had all of the tier
roles assigned.
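To check which roles, and therefore which tiers, each node in a cluster has, one option is the cat nodes API. The request below is only a minimal sketch; the choice of columns is an illustration, not a requirement.

[source,console]
--------------------------------------------------
# List each node's name together with its abbreviated roles
GET _cat/nodes?v=true&h=name,node.role
--------------------------------------------------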
[[content-tier]]
==== Content tier
The content tier is made of one or more nodes that have the <<data-content-node, data_content>>
role. The content tier is designed to store and search user-created content. Non-time-series data
doesn't necessarily follow the hot-warm-cold path, and the hardware profile of this tier differs from
that of the <<hot-tier, hot tier>>: user-created content prioritizes high CPU to support complex
queries and aggregations in a timely manner, whereas the <<hot-tier, hot tier>>
prioritizes high I/O.
Content data typically has very long retention requirements. For resiliency,
indices in this tier should be configured to use one or more replicas.
NOTE: New indices that are not part of a <<data-streams, data stream>> are automatically allocated to the
<<content-tier>>.
[[hot-tier]]
==== Hot tier
The hot tier is made of one or more nodes that have the <<data-hot-node, data_hot>> role.
It is the {es} entry point for time series data. This tier needs to be fast for both reads
and writes, which requires more hardware resources such as SSD drives. The hot tier usually
holds the data from recent days. For resiliency, indices in this
tier should be configured to use one or more replicas.
NOTE: New indices that are part of a <<data-streams, data stream>> are automatically allocated to the
<<hot-tier>>.
[[warm-tier]]
==== Warm tier
The warm tier is made of one or more nodes that have the <<data-warm-node, data_warm>> role.
Data moves to this tier once it is queried less frequently than data in the <<hot-tier, hot tier>>.
It is a medium-fast tier that still allows data updates. The warm tier usually
holds the data from recent weeks. For resiliency, indices in this
tier should be configured to use one or more replicas.
[[cold-tier]]
==== Cold tier
The cold tier is made of one or more nodes that have the <<data-cold-node, data_cold>> role.
Once the data in the <<warm-tier, warm tier>> is no longer updated, it can transition to the
cold tier. The cold tier is still a responsive query tier, but as data transitions into this
tier it can be compressed, shrunk, or configured to have zero replicas and be backed by
a <<ilm-searchable-snapshot, snapshot>>. The cold tier usually holds the data from recent
months or years.
[discrete]
[[data-tier-allocation]]
=== Data tier index allocation
When an index is created, {es} automatically allocates it to the <<content-tier, content tier>>
if the index is not part of a <<data-streams, data stream>>, or to the <<hot-tier, hot tier>> if the index
is part of a <<data-streams, data stream>>.
To do this, {es} sets <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_content` or `data_hot`, respectively.
You can override this default by specifying any <<shard-allocation-filtering, shard allocation filtering>>
settings in the create index request or in an index template that matches the new index.
Specifying any value, including `null`, for `index.routing.allocation.include._tier_preference`
also opts out of the automatic allocation of new indices to tiers.
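As a minimal sketch of overriding the default, the create index request below sets the tier preference explicitly, and the second request reads the setting back. The index name `my-index-000001` and the chosen tier list are assumptions made for illustration only.

[source,console]
--------------------------------------------------
# my-index-000001 is a hypothetical index name
# An explicit tier preference replaces the automatic data_content/data_hot default
PUT my-index-000001
{
  "settings": {
    "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
  }
}

# Inspect the tier preference that the index ended up with
GET my-index-000001/_settings/index.routing.allocation.include._tier_preference
--------------------------------------------------

Setting the value to `null` instead of a tier list would opt out of tier-based allocation for the new index, as described above.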
[discrete]
[[data-tier-migration]]
=== Data tier index migration
<<index-lifecycle-management, Index Lifecycle Management>> automates the transition of managed
indices through the available data tiers using the `migrate` action. {ilm-init} injects this action
in the warm and cold phases unless the phase specifies it explicitly or configures an
<<ilm-allocate-action, allocate action>> that modifies the allocation rules.
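The following policy is a minimal sketch of relying on the automatic migration: neither phase specifies `migrate` or `allocate`, so the index should move to the tier that matches each phase. The policy name, index name, `min_age` values, and priorities are assumptions chosen for illustration.

[source,console]
--------------------------------------------------
# my_policy, the min_age values, and the priorities are hypothetical
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {
            "priority": 0
          }
        }
      }
    }
  }
}

# Check which phase, action, and step a managed index is currently in
GET my-index-000001/_ilm/explain
--------------------------------------------------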


@ -0,0 +1,95 @@
[role="xpack"]
[[ilm-migrate]]
=== Migrate
Phases allowed: warm, cold.
Moves the index to the <<data-tiers, data tier>> that corresponds
to the current phase by updating the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
index setting.
{ilm-init} automatically injects the migrate action in the warm and cold
phases if no allocation options are specified with the <<ilm-allocate, allocate>> action.
If you specify an allocate action that only modifies the number of index
replicas, {ilm-init} reduces the number of replicas before migrating the index.
To prevent automatic migration without specifying allocation options,
you can explicitly include the migrate action and set the enabled option to `false`.
In the warm phase, the `migrate` action sets <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_warm,data_hot`. This moves the index to nodes in the
<<warm-tier, warm tier>>. If there are no nodes in the warm tier, it falls back to the
<<hot-tier, hot tier>>.
In the cold phase, the `migrate` action sets
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_cold,data_warm,data_hot`. This moves the index to nodes in the
<<cold-tier, cold tier>>. If there are no nodes in the cold tier, it falls back to the
<<warm-tier, warm>> tier, or the <<hot-tier, hot>> tier if there are no warm nodes available.
The migrate action is not allowed in the hot phase.
The initial index allocation is performed <<data-tier-allocation, automatically>>,
and can be configured manually or via <<indices-templates, index templates>>.
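For an index that is not managed by {ilm-init}, a roughly equivalent manual step is to update the tier preference yourself. The request below is a sketch of what the cold-phase migration amounts to; `my-index-000001` is a hypothetical index name.

[source,console]
--------------------------------------------------
# my-index-000001 is a hypothetical index name
# Prefer the cold tier, then fall back to warm, then hot
PUT my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_cold,data_warm,data_hot"
}
--------------------------------------------------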
[[ilm-migrate-options]]
==== Options
`enabled`::
(Optional, boolean)
Controls whether {ilm-init} automatically migrates the index during this phase.
Defaults to `true`.
[[ilm-enabled-migrate-ex]]
==== Example
In the following policy, the allocate action is specified to reduce the number of replicas before {ilm-init} migrates the index to warm nodes.
NOTE: Explicitly specifying the migrate action is not required--{ilm-init} automatically performs the migrate action unless you specify allocation options or disable migration.
[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {},
          "allocate": {
            "number_of_replicas": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
[[ilm-disable-migrate-ex]]
==== Disable automatic migration
The migrate action in the following policy is disabled and
the allocate action assigns the index to nodes that have a
`rack_id` of _one_ or _two_.
NOTE: Explicitly disabling the migrate action is not required--{ilm-init} does not inject the migrate action if you specify allocation options.
[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          },
          "allocate": {
            "include": {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------


@ -18,6 +18,10 @@ Makes the index read-only.
[[ilm-freeze-action]]<<ilm-freeze,Freeze>>::
Freeze the index to minimize its memory footprint.
[[ilm-migrate-action]]<<ilm-migrate,Migrate>>::
Move the index shards to the <<data-tiers, data tier>> that corresponds
to the current {ilm-init} phase.
[[ilm-readonly-action]]<<ilm-readonly,Read only>>::
Block write operations to the index.
@ -54,6 +58,7 @@ include::actions/ilm-allocate.asciidoc[]
include::actions/ilm-delete.asciidoc[]
include::actions/ilm-forcemerge.asciidoc[]
include::actions/ilm-freeze.asciidoc[]
include::actions/ilm-migrate.asciidoc[]
include::actions/ilm-readonly.asciidoc[]
include::actions/ilm-rollover.asciidoc[]
ifdef::permanently-unreleased-branch[]


@ -7,6 +7,7 @@ nodes:
* <<shard-allocation-filtering,Shard allocation filtering>>: Controlling which shards are allocated to which nodes.
* <<delayed-allocation,Delayed allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
* <<allocation-total-shards,Total shards per node>>: A hard limit on the number of shards from the same index per node.
* <<data-tier-shard-filtering, Data tier allocation>>: Controls the allocation of indices to <<data-tiers, data tiers>>.
include::allocation/filtering.asciidoc[]
@ -16,5 +17,4 @@ include::allocation/prioritization.asciidoc[]
include::allocation/total_shards.asciidoc[]
include::allocation/data_tier_allocation.asciidoc[]


@ -0,0 +1,51 @@
[role="xpack"]
[[data-tier-shard-filtering]]
=== Index-level data tier allocation filtering
You can use index-level allocation settings to control which <<data-tiers, data tier>>
the index is allocated to. The data tier allocator is a
<<shard-allocation-filtering, shard allocation filter>> that uses two built-in
node attributes: `_tier` and `_tier_preference`.
These tier attributes are set using the data node roles:
* <<data-content-node, data_content>>
* <<data-hot-node, data_hot>>
* <<data-warm-node, data_warm>>
* <<data-cold-node, data_cold>>
NOTE: The <<data-node, data>> role is not a valid data tier and cannot be used
for data tier filtering.
[discrete]
[[data-tier-allocation-filters]]
==== Data tier allocation settings
`index.routing.allocation.include._tier`::
Assign the index to a node whose `node.roles` configuration has at
least one of the comma-separated values.
`index.routing.allocation.require._tier`::
Assign the index to a node whose `node.roles` configuration has _all_
of the comma-separated values.
`index.routing.allocation.exclude._tier`::
Assign the index to a node whose `node.roles` configuration has _none_ of the
comma-separated values.
[[tier-preference-allocation-filter]]
`index.routing.allocation.include._tier_preference`::
Assign the index to the first tier in the list that has an available node.
This prevents indices from remaining unallocated if no nodes are available
in the preferred tier.
For example, if you set `index.routing.allocation.include._tier_preference`
to `data_warm,data_hot`, the index is allocated to the warm tier if there
are nodes with the `data_warm` role. If there are no nodes in the warm tier,
but there are nodes with the `data_hot` role, the index is allocated to
the hot tier.
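As a minimal sketch of the `_tier` filters, the request below creates an index that may be allocated to nodes in either the warm or the cold tier. The index name `my-index-000002` and the tier list are assumptions for illustration. Because an allocation filter is specified at creation time, the automatic tier preference is not applied.

[source,console]
--------------------------------------------------
# my-index-000002 is a hypothetical index name
# Allow allocation to nodes that have the data_warm or data_cold role
PUT my-index-000002
{
  "settings": {
    "index.routing.allocation.include._tier": "data_warm,data_cold"
  }
}
--------------------------------------------------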


@ -7,8 +7,8 @@ a particular index. These per-index filters are applied in conjunction with
<<shard-allocation-awareness, allocation awareness>>.
Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host` and `_id` attributes.
<<index-lifecycle-management, Index lifecycle management>> uses filters based
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
between phases.
@ -102,6 +102,12 @@ The index allocation settings support the following built-in attributes:
`_ip`:: Match either `_host_ip` or `_publish_ip`
`_host`:: Match nodes by hostname
`_id`:: Match nodes by node id
`_tier`:: Match nodes by the node's <<data-tiers, data tier>> role.
For more details, see <<data-tier-shard-filtering, data tier allocation filtering>>.
NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node, data role>> will match any tier filtering.
You can use wildcards when specifying attribute values, for example:


@ -30,6 +30,8 @@ include::indices/index-templates.asciidoc[]
include::data-streams/data-streams.asciidoc[]
include::datatiers.asciidoc[]
include::ingest.asciidoc[]
include::search/search-your-data/search-your-data.asciidoc[]


@ -7,7 +7,7 @@ conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<shard-allocation-awareness, allocation awareness>>.
Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host` and `_id` attributes.
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.
The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
be moved from one set of nodes to another. Shards are only relocated if it is
@ -55,7 +55,13 @@ The cluster allocation settings support the following built-in attributes:
`_ip`:: Match either `_host_ip` or `_publish_ip`
`_host`:: Match nodes by hostname
`_id`:: Match nodes by node id
`_tier`:: Match nodes by the node's <<data-tiers, data tier>> role
NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node, data role>> will match any tier filtering.
You can use wildcards when specifying attribute values, for example:


@ -30,6 +30,10 @@ configure this setting, then the node has the following roles by default:
* `master`
* `data`
* `data_content`
* `data_hot`
* `data_warm`
* `data_cold`
* `ingest`
* `ml`
* `remote_cluster_client`
@ -44,7 +48,7 @@ A node that has the `master` role (default), which makes it eligible to be
<<data-node,Data node>>::
A node that has the `data` role (default). Data nodes hold data and perform data
related operations such as CRUD, search, and aggregations.
related operations such as CRUD, search, and aggregations. A node with the `data` role can fill any of the specialised data node roles.
<<node-ingest-node,Ingest node>>::
@ -206,6 +210,58 @@ To create a dedicated data node, set:
node.roles: [ data ]
----
In a multi-tier deployment architecture, you use specialised data roles to assign data nodes to specific tiers: `data_content`, `data_hot`,
`data_warm`, or `data_cold`. A node can belong to multiple tiers, but a node that has one of the specialised data roles cannot have the
generic `data` role.
[[data-content-node]]
==== [x-pack]#Content data node#
Content data nodes accommodate user-created content. They enable operations like CRUD,
search and aggregations.
To create a dedicated content node, set:
[source,yaml]
----
node.roles: [ data_content ]
----
[[data-hot-node]]
==== [x-pack]#Hot data node#
Hot data nodes store time series data as it enters {es}. The hot tier must be fast for
both reads and writes, and requires more hardware resources (such as SSD drives).
To create a dedicated hot node, set:
[source,yaml]
----
node.roles: [ data_hot ]
----
[[data-warm-node]]
==== [x-pack]#Warm data node#
Warm data nodes store indices that are no longer being regularly updated, but are still being
queried. Queries are usually less frequent than they were while the index was in the hot tier.
Less performant hardware can usually be used for nodes in this tier.
To create a dedicated warm node, set:
[source,yaml]
----
node.roles: [ data_warm ]
----
[[data-cold-node]]
==== [x-pack]#Cold data node#
Cold data nodes store read-only indices that are accessed less frequently. This tier uses less performant hardware and may leverage snapshot-backed indices to minimize the resources required.
To create a dedicated cold node, set:
[source,yaml]
----
node.roles: [ data_cold ]
----
[[node-ingest-node]]
==== Ingest node