OpenSearch/docs/reference/datatiers.asciidoc

[role="xpack"]
[[data-tiers]]
=== Data tiers

Common data lifecycle management patterns revolve around transitioning indices
through multiple collections of nodes with different hardware characteristics in order
to fulfil evolving CRUD, search, and aggregation needs as indices age. The concept
of a tiered hardware architecture is not new in {es}.
<<index-lifecycle-management, Index Lifecycle Management>> is instrumental in
implementing tiered architectures by automating the managemnt of indices according to
performance, resiliency and data retention requirements.
<<overview-index-lifecycle-management, Hot/warm/cold>> architectures are common
for timeseries data such as logging and metrics.

A data tier is a collection of nodes with the same role. Data tiers are an integrated
solution offering better support for optimising cost and improving performance.
Formalized data tiers in ES allow configuration of the lifecycle and location of data
in a hot/warm/cold topology without requiring the use of custom node attributes.
Each tier formalises specific characteristics and data behaviours.

The node roles that can currently define data tiers are:

* <<data-content-node, data_content>>
* <<data-hot-node, data_hot>>
* <<data-warm-node, data_warm>>
* <<data-cold-node, data_cold>>

The more generic <<data-node, data role>> is not a data tier role, but
it is the default node role if no roles are configured. If a node has the
<<data-node, data>> role we treat the node as if it has all of the tier
roles assigned.

[[content-tier]]
==== Content tier

The content tier is made of one or more nodes that have the <<data-content-node, data_content>>
role. A content tier is designed to store and search user created content. Non-timeseries data
doesn't necessarily follow the hot-warm-cold path. The hardware profiles are quite different to
the <<hot-tier, hot tier>>. User created content prioritises high CPU to support complex
queries and aggregations in a timely manner, as opposed to the <<hot-tier, hot tier>> which
prioritises high IO.
The content data has very long data retention characteristics and from a resiliency perspective
the indices in this tier should be configured to use one or more replicas.

NOTE: new indices that are not part of <<data-streams, data streams>> will be automatically allocated to the
<<content-tier>>

[[hot-tier]]
==== Hot tier

The hot tier is made of one or more nodes that have the <<data-hot-node, data_hot>> role.
It is the {es} entry point for timeseries data. This tier needs to be fast both for reads
and writes, requiring more hardware resources such as SSD drives. The hot tier is usually
hosting the data from recent days. From a resiliency perspective the indices in this
tier should be configured to use one or more replicas.

NOTE: new indices that are part of a <<data-streams, data stream>> will be automatically allocated to the
<<hot-tier>>

[[warm-tier]]
==== Warm tier

The warm tier is made of one or more nodes that have the <<data-warm-node, data_warm>> role.
This tier is where data goes once it is not queried as frequently as in the <<hot-tier, hot tier>>.
It is a medium-fast tier that still allows data updates. The warm tier is usually
hosting the data from recent weeks. From a resiliency perspective the indices in this
tier should be configured to use one or more replicas.

[[cold-tier]]
==== Cold tier

The cold tier is made of one or more nodes that have the <<data-cold-node, data_cold>> role.
Once the data in the <<warm-tier, warm tier>> is not updated anymore it can transition to the
cold tier. The cold tier is still a responsive query tier but as the data transitions into this
tier it can be compressed, shrunken, or configured to have zero replicas and be backed by
a <<ilm-searchable-snapshot, snapshot>>. The cold tier is usually hosting the data from recent
months or years.

[discrete]
[[data-tier-allocation]]
=== Data tier index allocation

When an index is created {es} will automatically allocate the index to the <<content-tier, Content tier>>
if the index is not part of a <<data-streams, data stream>> or to the <<hot-tier, Hot tier>> if the index
is part of a <<data-streams, data stream>>.
{es} will configure the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_content` or `data_hot` respectively.

These heuristics can be overridden by specifying any <<shard-allocation-filtering, shard allocation filtering>>
settings in the create index request or index template that matches the new index.
Specifying any configuration, including `null`, for `index.routing.allocation.include._tier_preference` will
also opt out of the automatic new index allocation to tiers.
[discrete]
[[data-tier-migration]]
=== Data tier index migration

<<index-lifecycle-management, Index Lifecycle Management>> automates the transition of managed
indices through the available data tiers using the `migrate` action which is injected
in every phase, unless it's manually specified in the phase or an
<<ilm-allocate-action, allocate action>> modifying the allocation rules is manually configured.
DOCS: general overview of data tiers and roles (#63086) (#63422) This adds general overview documentation for data tiers, the data tiers specific node roles, and their application in ILM. Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: debadair <debadair@elastic.co> (cherry picked from commit d588cab74722bfb1d3ca0fea15d10c66af937306) Signed-off-by: Andrei Dan <andrei.dan@elastic.co> 2020-10-07 12:31:15 -04:00			`[role="xpack"]`
			`[[data-tiers]]`
			`=== Data tiers`

			`Common data lifecycle management patterns revolve around transitioning indices`
			`through multiple collections of nodes with different hardware characteristics in order`
			`to fulfil evolving CRUD, search, and aggregation needs as indices age. The concept`
			`of a tiered hardware architecture is not new in {es}.`
			`<<index-lifecycle-management, Index Lifecycle Management>> is instrumental in`
			`implementing tiered architectures by automating the managemnt of indices according to`
			`performance, resiliency and data retention requirements.`
			`<<overview-index-lifecycle-management, Hot/warm/cold>> architectures are common`
			`for timeseries data such as logging and metrics.`

			`A data tier is a collection of nodes with the same role. Data tiers are an integrated`
			`solution offering better support for optimising cost and improving performance.`
			`Formalized data tiers in ES allow configuration of the lifecycle and location of data`
			`in a hot/warm/cold topology without requiring the use of custom node attributes.`
			`Each tier formalises specific characteristics and data behaviours.`

			`The node roles that can currently define data tiers are:`

			`* <<data-content-node, data_content>>`
			`* <<data-hot-node, data_hot>>`
			`* <<data-warm-node, data_warm>>`
			`* <<data-cold-node, data_cold>>`

			`The more generic <<data-node, data role>> is not a data tier role, but`
			`it is the default node role if no roles are configured. If a node has the`
			`<<data-node, data>> role we treat the node as if it has all of the tier`
			`roles assigned.`

			`[[content-tier]]`
			`==== Content tier`

			`The content tier is made of one or more nodes that have the <<data-content-node, data_content>>`
			`role. A content tier is designed to store and search user created content. Non-timeseries data`
			`doesn't necessarily follow the hot-warm-cold path. The hardware profiles are quite different to`
			`the <<hot-tier, hot tier>>. User created content prioritises high CPU to support complex`
			`queries and aggregations in a timely manner, as opposed to the <<hot-tier, hot tier>> which`
			`prioritises high IO.`
			`The content data has very long data retention characteristics and from a resiliency perspective`
			`the indices in this tier should be configured to use one or more replicas.`

			`NOTE: new indices that are not part of <<data-streams, data streams>> will be automatically allocated to the`
			`<<content-tier>>`

			`[[hot-tier]]`
			`==== Hot tier`

			`The hot tier is made of one or more nodes that have the <<data-hot-node, data_hot>> role.`
			`It is the {es} entry point for timeseries data. This tier needs to be fast both for reads`
			`and writes, requiring more hardware resources such as SSD drives. The hot tier is usually`
			`hosting the data from recent days. From a resiliency perspective the indices in this`
			`tier should be configured to use one or more replicas.`

			`NOTE: new indices that are part of a <<data-streams, data stream>> will be automatically allocated to the`
			`<<hot-tier>>`

			`[[warm-tier]]`
			`==== Warm tier`

			`The warm tier is made of one or more nodes that have the <<data-warm-node, data_warm>> role.`
			`This tier is where data goes once it is not queried as frequently as in the <<hot-tier, hot tier>>.`
			`It is a medium-fast tier that still allows data updates. The warm tier is usually`
			`hosting the data from recent weeks. From a resiliency perspective the indices in this`
			`tier should be configured to use one or more replicas.`

			`[[cold-tier]]`
			`==== Cold tier`

			`The cold tier is made of one or more nodes that have the <<data-cold-node, data_cold>> role.`
			`Once the data in the <<warm-tier, warm tier>> is not updated anymore it can transition to the`
			`cold tier. The cold tier is still a responsive query tier but as the data transitions into this`
[DOCS] Move searchable snapshots to beta (#63436) (#63478) 2020-10-08 09:27:52 -04:00			`tier it can be compressed, shrunken, or configured to have zero replicas and be backed by`
			`a <<ilm-searchable-snapshot, snapshot>>. The cold tier is usually hosting the data from recent`
DOCS: general overview of data tiers and roles (#63086) (#63422) This adds general overview documentation for data tiers, the data tiers specific node roles, and their application in ILM. Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: debadair <debadair@elastic.co> (cherry picked from commit d588cab74722bfb1d3ca0fea15d10c66af937306) Signed-off-by: Andrei Dan <andrei.dan@elastic.co> 2020-10-07 12:31:15 -04:00			`months or years.`
[DOCS] Add props for ILM searchable snapshot links (#63431) 2020-10-07 13:23:50 -04:00
DOCS: general overview of data tiers and roles (#63086) (#63422) This adds general overview documentation for data tiers, the data tiers specific node roles, and their application in ILM. Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: debadair <debadair@elastic.co> (cherry picked from commit d588cab74722bfb1d3ca0fea15d10c66af937306) Signed-off-by: Andrei Dan <andrei.dan@elastic.co> 2020-10-07 12:31:15 -04:00			`[discrete]`
			`[[data-tier-allocation]]`
			`=== Data tier index allocation`

			`When an index is created {es} will automatically allocate the index to the <<content-tier, Content tier>>`
			`if the index is not part of a <<data-streams, data stream>> or to the <<hot-tier, Hot tier>> if the index`
			`is part of a <<data-streams, data stream>>.`
			{es} will configure the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
			to `data_content` or `data_hot` respectively.

			`These heuristics can be overridden by specifying any <<shard-allocation-filtering, shard allocation filtering>>`
			`settings in the create index request or index template that matches the new index.`
			Specifying any configuration, including `null`, for `index.routing.allocation.include._tier_preference` will
			`also opt out of the automatic new index allocation to tiers.`
			`[discrete]`
			`[[data-tier-migration]]`
			`=== Data tier index migration`

			`<<index-lifecycle-management, Index Lifecycle Management>> automates the transition of managed`
			indices through the available data tiers using the `migrate` action which is injected
			`in every phase, unless it's manually specified in the phase or an`
			`<<ilm-allocate-action, allocate action>> modifying the allocation rules is manually configured.`