This adds general overview documentation for data tiers, the data tier-specific node roles, and their application in ILM.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: debadair <debadair@elastic.co>
(cherry picked from commit d588cab74722bfb1d3ca0fea15d10c66af937306)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

Parent: c4726a2cec
Commit: 6e80de0e34

@@ -0,0 +1,99 @@
[role="xpack"]
[[data-tiers]]
=== Data tiers

Common data lifecycle management patterns revolve around transitioning indices
through multiple collections of nodes with different hardware characteristics in order
to fulfill evolving CRUD, search, and aggregation needs as indices age. The concept
of a tiered hardware architecture is not new in {es}.
<<index-lifecycle-management, Index Lifecycle Management>> is instrumental in
implementing tiered architectures by automating the management of indices according to
performance, resiliency, and data retention requirements.
<<overview-index-lifecycle-management, Hot/warm/cold>> architectures are common
for time series data such as logging and metrics.

A data tier is a collection of nodes that share the same data role. Data tiers are an integrated
solution that makes it easier to optimize cost and improve performance.
Formalized data tiers in {es} let you configure the lifecycle and location of data
in a hot/warm/cold topology without using custom node attributes.
Each tier formalizes specific characteristics and data behaviors.

The node roles that can currently define data tiers are:

* <<data-content-node, data_content>>
* <<data-hot-node, data_hot>>
* <<data-warm-node, data_warm>>
* <<data-cold-node, data_cold>>

The more generic <<data-node, data role>> is not a data tier role, but
it is the default node role if no roles are configured. If a node has the
<<data-node, data>> role, {es} treats the node as if it has all of the tier
roles assigned.

[[content-tier]]
==== Content tier

The content tier is made of one or more nodes that have the <<data-content-node, data_content>>
role. The content tier is designed to store and search user-created content. Unlike time series
data, this content doesn't necessarily follow the hot-warm-cold path, and its hardware profile
differs from that of the <<hot-tier, hot tier>>. User-created content prioritizes CPU to support
complex queries and aggregations in a timely manner, as opposed to the <<hot-tier, hot tier>>,
which prioritizes fast I/O.
Content data has very long data retention requirements, and from a resiliency perspective
the indices in this tier should be configured to use one or more replicas.

NOTE: New indices that are not part of a <<data-streams, data stream>> are automatically allocated to the
<<content-tier>>.

[[hot-tier]]
==== Hot tier

The hot tier is made of one or more nodes that have the <<data-hot-node, data_hot>> role.
It is the {es} entry point for time series data. This tier needs to be fast for both reads
and writes, which requires more hardware resources such as SSD drives. The hot tier usually
holds data from recent days. From a resiliency perspective, the indices in this
tier should be configured to use one or more replicas.

NOTE: New indices that are part of a <<data-streams, data stream>> are automatically allocated to the
<<hot-tier>>.
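
As a sketch of this behavior, the following requests create a hypothetical data stream, `my-data-stream`, from a matching index template and then read back the tier preference of its backing indices. The template and stream names are placeholders, and the last request assumes that the get index settings API resolves a data stream name to its backing indices in your version; the expected value is `data_hot`.

[source,console]
----
# An index template with data_stream enabled turns matching names into data streams
PUT _index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {}
}

# Create the data stream explicitly
PUT _data_stream/my-data-stream

# Read the tier preference that was set on the backing indices
GET my-data-stream/_settings/index.routing.allocation.include._tier_preference
----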

[[warm-tier]]
==== Warm tier

The warm tier is made of one or more nodes that have the <<data-warm-node, data_warm>> role.
Data moves to this tier once it is queried less frequently than in the <<hot-tier, hot tier>>.
It is a medium-fast tier that still allows data updates. The warm tier usually
holds data from recent weeks. From a resiliency perspective, the indices in this
tier should be configured to use one or more replicas.

[[cold-tier]]
==== Cold tier

The cold tier is made of one or more nodes that have the <<data-cold-node, data_cold>> role.
Once data in the <<warm-tier, warm tier>> is no longer updated, it can transition to the
cold tier. The cold tier is still a responsive query tier, but as data transitions into this
tier it can be compressed, shrunk, or configured to have zero replicas and be backed by
a <<ilm-searchable-snapshot, snapshot>>. The cold tier usually holds data from recent
months or years.

[discrete]
[[data-tier-allocation]]
=== Data tier index allocation

When you create an index, {es} automatically allocates it to the <<content-tier, content tier>>
if the index is not part of a <<data-streams, data stream>>, or to the <<hot-tier, hot tier>> if the index
is part of a <<data-streams, data stream>>.
To do this, {es} sets <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_content` or `data_hot`, respectively.
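
The following requests sketch how you might confirm this behavior on a cluster that uses data tier roles. The index name `my-index-000001` is a hypothetical placeholder; because the index is not part of a data stream, the returned setting is expected to be `data_content`.

[source,console]
----
# Create a regular index (not part of a data stream)
PUT my-index-000001

# Retrieve only the tier preference that Elasticsearch set automatically
GET my-index-000001/_settings/index.routing.allocation.include._tier_preference
----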

You can override this behavior by specifying any <<shard-allocation-filtering, shard allocation filtering>>
settings in the create index request or in an index template that matches the new index.
Specifying any value for `index.routing.allocation.include._tier_preference`, including `null`,
also opts the index out of automatic allocation to a data tier.
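
For example, the following create index request is a minimal sketch that opts a hypothetical index, `my-index-000002`, out of automatic tier allocation by setting the tier preference explicitly to `null`, as described above.

[source,console]
----
# An explicit null tier preference disables the automatic data_content default
PUT my-index-000002
{
  "settings": {
    "index.routing.allocation.include._tier_preference": null
  }
}
----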

[discrete]
[[data-tier-migration]]
=== Data tier index migration

<<index-lifecycle-management, Index Lifecycle Management>> automates the transition of managed
indices through the available data tiers using the <<ilm-migrate, `migrate`>> action. {ilm-init}
automatically injects this action into the warm and cold phases, unless the phase already specifies
a `migrate` action or an <<ilm-allocate-action, allocate action>> that modifies the allocation rules.
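
As a sketch, the following policy defines warm and cold phases without any `migrate` or `allocate` actions, so {ilm-init} is expected to inject the `migrate` action into both phases. The policy name, the `min_age` values, and the `rollover`, `forcemerge`, and `freeze` actions are illustrative assumptions, not requirements.

[source,console]
----
# No migrate or allocate actions are listed, so ILM is expected to
# inject the migrate action into the warm and cold phases
PUT _ilm/policy/my-tiered-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "freeze": {}
        }
      }
    }
  }
}
----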

@@ -0,0 +1,95 @@
[role="xpack"]
[[ilm-migrate]]
=== Migrate

Phases allowed: warm, cold.

Moves the index to the <<data-tiers, data tier>> that corresponds
to the current phase by updating the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
index setting.
{ilm-init} automatically injects the migrate action into the warm and cold
phases if no allocation options are specified with the <<ilm-allocate, allocate>> action.
If you specify an allocate action that only modifies the number of index
replicas, {ilm-init} reduces the number of replicas before migrating the index.
To prevent automatic migration without specifying allocation options,
you can explicitly include the migrate action and set the `enabled` option to `false`.

In the warm phase, the `migrate` action sets <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_warm,data_hot`. This moves the index to nodes in the
<<warm-tier, warm tier>>. If there are no nodes in the warm tier, it falls back to the
<<hot-tier, hot tier>>.

In the cold phase, the `migrate` action sets
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
to `data_cold,data_warm,data_hot`. This moves the index to nodes in the
<<cold-tier, cold tier>>. If there are no nodes in the cold tier, it falls back to the
<<warm-tier, warm>> tier, or the <<hot-tier, hot>> tier if there are no warm nodes available.
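
To check where a managed index currently stands, you could retrieve its tier preference together with its {ilm-init} status, as in the following sketch. `my-index-000001` is a hypothetical index name; after the warm phase's `migrate` action has run, the setting would be expected to read `data_warm,data_hot`.

[source,console]
----
# Tier preference currently set on the index
GET my-index-000001/_settings/index.routing.allocation.include._tier_preference

# Current ILM phase, action, and step for the index
GET my-index-000001/_ilm/explain
----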

The migrate action is not allowed in the hot phase.
The initial index allocation is performed <<data-tier-allocation, automatically>>
and can be configured manually or through <<indices-templates, index templates>>.
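
For instance, an index template could set the initial tier preference for every matching index. The following sketch uses a hypothetical template name and index pattern; because the template specifies `index.routing.allocation.include._tier_preference`, new matching indices would be allocated to the warm tier when warm nodes exist, falling back to the hot tier, instead of receiving the automatic default.

[source,console]
----
# Matching indices start on warm nodes, or on hot nodes if no warm nodes exist
PUT _index_template/my-warm-template
{
  "index_patterns": ["my-archive-*"],
  "template": {
    "settings": {
      "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
    }
  }
}
----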

[[ilm-migrate-options]]
==== Options

`enabled`::
(Optional, boolean)
Controls whether {ilm-init} automatically migrates the index during this phase.
Defaults to `true`.

[[ilm-enabled-migrate-ex]]
==== Example

In the following policy, the allocate action is specified to reduce the number of replicas before {ilm-init} migrates the index to warm nodes.

NOTE: Explicitly specifying the migrate action is not required; {ilm-init} automatically performs the migrate action unless you specify allocation options or disable migration.

[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate" : {
          },
          "allocate": {
            "number_of_replicas": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------

[[ilm-disable-migrate-ex]]
==== Disable automatic migration

The migrate action in the following policy is disabled and
the allocate action assigns the index to nodes that have a
`rack_id` of _one_ or _two_.

NOTE: Explicitly disabling the migrate action is not required; {ilm-init} does not inject the migrate action if you specify allocation options.

[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate" : {
            "enabled": false
          },
          "allocate": {
            "include" : {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

@@ -18,6 +18,10 @@ Makes the index read-only.
[[ilm-freeze-action]]<<ilm-freeze,Freeze>>::
Freeze the index to minimize its memory footprint.

[[ilm-migrate-action]]<<ilm-migrate,Migrate>>::
Move the index shards to the <<data-tiers, data tier>> that corresponds
to the current {ilm-init} phase.

[[ilm-readonly-action]]<<ilm-readonly,Read only>>::
Block write operations to the index.

@@ -54,6 +58,7 @@ include::actions/ilm-allocate.asciidoc[]
include::actions/ilm-delete.asciidoc[]
include::actions/ilm-forcemerge.asciidoc[]
include::actions/ilm-freeze.asciidoc[]
include::actions/ilm-migrate.asciidoc[]
include::actions/ilm-readonly.asciidoc[]
include::actions/ilm-rollover.asciidoc[]
ifdef::permanently-unreleased-branch[]

@@ -7,6 +7,7 @@ nodes:
* <<shard-allocation-filtering,Shard allocation filtering>>: Controlling which shards are allocated to which nodes.
* <<delayed-allocation,Delayed allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
* <<allocation-total-shards,Total shards per node>>: A hard limit on the number of shards from the same index per node.
* <<data-tier-shard-filtering, Data tier allocation>>: Controlling the allocation of indices to <<data-tiers, data tiers>>.

include::allocation/filtering.asciidoc[]

@@ -16,5 +17,4 @@ include::allocation/prioritization.asciidoc[]

include::allocation/total_shards.asciidoc[]

include::allocation/data_tier_allocation.asciidoc[]

@@ -0,0 +1,51 @@
[role="xpack"]
[[data-tier-shard-filtering]]
=== Index-level data tier allocation filtering

You can use index-level allocation settings to control which <<data-tiers, data tier>>
the index is allocated to. The data tier allocator is a
<<shard-allocation-filtering, shard allocation filter>> that uses two built-in
node attributes: `_tier` and `_tier_preference`.

These tier attributes are set using the data node roles:

* <<data-content-node, data_content>>
* <<data-hot-node, data_hot>>
* <<data-warm-node, data_warm>>
* <<data-cold-node, data_cold>>

NOTE: The <<data-node, data>> role is not a valid data tier and cannot be used
for data tier filtering.

[discrete]
[[data-tier-allocation-filters]]
==== Data tier allocation settings

`index.routing.allocation.include._tier`::

Assign the index to a node whose `node.roles` configuration has at
least one of the comma-separated values.

`index.routing.allocation.require._tier`::

Assign the index to a node whose `node.roles` configuration has _all_
of the comma-separated values.

`index.routing.allocation.exclude._tier`::

Assign the index to a node whose `node.roles` configuration has _none_ of the
comma-separated values.
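
For example, the following request is a sketch that restricts a hypothetical index, `my-index-000001`, to nodes that have at least one of the `data_warm` or `data_cold` roles. It assumes the setting can be updated dynamically on an existing index; nodes with the generic `data` role are expected to match as well, because that role matches any tier filter.

[source,console]
----
# Allow the index only on nodes with the data_warm or data_cold role
PUT my-index-000001/_settings
{
  "index.routing.allocation.include._tier": "data_warm,data_cold"
}
----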

[[tier-preference-allocation-filter]]
`index.routing.allocation.include._tier_preference`::

Assign the index to the first tier in the list that has an available node.
This prevents indices from remaining unallocated if no nodes are available
in the preferred tier.

For example, if you set `index.routing.allocation.include._tier_preference`
to `data_warm,data_hot`, the index is allocated to the warm tier if there
are nodes with the `data_warm` role. If there are no nodes in the warm tier,
but there are nodes with the `data_hot` role, the index is allocated to
the hot tier.
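
The `data_warm,data_hot` example corresponds to a settings update like the following sketch, where `my-index-000001` is a hypothetical index name. If a shard stays unassigned, the cluster allocation explain API can report which allocation deciders, including the data tier filter, are preventing its allocation.

[source,console]
----
# Prefer the warm tier, falling back to the hot tier
PUT my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}

# Explain the allocation of the index's first primary shard
GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}
----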

@@ -7,8 +7,8 @@ a particular index. These per-index filters are applied in conjunction with
<<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
between phases.

@@ -102,6 +102,12 @@ The index allocation settings support the following built-in attributes:
`_ip`:: Match either `_host_ip` or `_publish_ip`
`_host`:: Match nodes by hostname
`_id`:: Match nodes by node id
`_tier`:: Match nodes by the node's <<data-tiers, data tier>> role.
For more details, see <<data-tier-shard-filtering, data tier allocation filtering>>.

NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node, data role>> matches any tier filter.

You can use wildcards when specifying attribute values, for example:

@@ -30,6 +30,8 @@ include::indices/index-templates.asciidoc[]

include::data-streams/data-streams.asciidoc[]

include::datatiers.asciidoc[]

include::ingest.asciidoc[]

include::search/search-your-data/search-your-data.asciidoc[]

@@ -7,7 +7,7 @@ conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.

The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
be moved from one set of nodes to another. Shards are only relocated if it is

@@ -55,7 +55,13 @@ The cluster allocation settings support the following built-in attributes:
`_ip`:: Match either `_host_ip` or `_publish_ip`
`_host`:: Match nodes by hostname
`_id`:: Match nodes by node id
`_tier`:: Match nodes by the node's <<data-tiers, data tier>> role

NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
a subset of roles are <<data-tiers, data tier>> roles, and the generic
<<data-node, data role>> matches any tier filter.

You can use wildcards when specifying attribute values, for example:

@@ -30,6 +30,10 @@ configure this setting, then the node has the following roles by default:

* `master`
* `data`
* `data_content`
* `data_hot`
* `data_warm`
* `data_cold`
* `ingest`
* `ml`
* `remote_cluster_client`

@@ -44,7 +48,7 @@ A node that has the `master` role (default), which makes it eligible to be
<<data-node,Data node>>::

A node that has the `data` role (default). Data nodes hold data and perform data-related
operations such as CRUD, search, and aggregations. A node with the `data` role can fill any of the specialized data node roles.

<<node-ingest-node,Ingest node>>::

@@ -206,6 +210,58 @@ To create a dedicated data node, set:
node.roles: [ data ]
----

In a multi-tier deployment architecture, you use specialized data roles to assign data nodes to specific tiers: `data_content`, `data_hot`,
`data_warm`, or `data_cold`. A node can belong to multiple tiers, but a node that has one of the specialized data roles cannot have the
generic `data` role.
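
To check which data roles each node in a cluster actually has, you could use the cat nodes API, as in the following sketch. The `node.role` column lists each node's roles as abbreviated letters; consult the cat nodes API reference for the letter that corresponds to each role.

[source,console]
----
# List each node's name and abbreviated roles
GET _cat/nodes?v=true&h=name,node.role
----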

[[data-content-node]]
==== [x-pack]#Content data node#

Content data nodes accommodate user-created content. They enable operations like CRUD,
search, and aggregations.

To create a dedicated content node, set:
[source,yaml]
----
node.roles: [ data_content ]
----

[[data-hot-node]]
==== [x-pack]#Hot data node#

Hot data nodes store time series data as it enters {es}. The hot tier must be fast for
both reads and writes, and requires more hardware resources (such as SSD drives).

To create a dedicated hot node, set:
[source,yaml]
----
node.roles: [ data_hot ]
----

[[data-warm-node]]
==== [x-pack]#Warm data node#

Warm data nodes store indices that are no longer being regularly updated, but are still being
queried. Query volume is usually lower than it was while the index was in the hot tier.
Less performant hardware can usually be used for nodes in this tier.

To create a dedicated warm node, set:
[source,yaml]
----
node.roles: [ data_warm ]
----

[[data-cold-node]]
==== [x-pack]#Cold data node#

Cold data nodes store read-only indices that are accessed less frequently. This tier uses less performant hardware and may leverage snapshot-backed indices to minimize the resources required.

To create a dedicated cold node, set:
[source,yaml]
----
node.roles: [ data_cold ]
----

[[node-ingest-node]]
==== Ingest node