Cold tier time-range should not be specified (#65546)

Whether the cold tier can handle years of data depends heavily on the use
case and on factors such as our BWC guarantees. That would need to be part
of a specific sizing exercise, so in the spirit of not over-promising, the
description of the cold tier has been changed to not mention years.
Henning Andersen 2020-11-30 15:04:41 +01:00 committed by Henning Andersen
parent aa8ebeb918
commit 9564a8b1e0
1 changed files with 33 additions and 33 deletions

@@ -2,24 +2,24 @@
 [[data-tiers]]
 == Data tiers
 A _data tier_ is a collection of nodes with the same data role that
 typically share the same hardware profile:
 * <<content-tier, Content tier>> nodes handle the indexing and query load for content such as a product catalog.
 * <<hot-tier, Hot tier>> nodes handle the indexing load for time series data such as logs or metrics
 and hold your most recent, most-frequently-accessed data.
 * <<warm-tier, Warm tier>> nodes hold time series data that is accessed less-frequently
 and rarely needs to be updated.
 * <<cold-tier, Cold tier>> nodes hold time series data that is accessed occasionally and not normally updated.
 When you index documents directly to a specific index, they remain on content tier nodes indefinitely.
 When you index documents to a data stream, they initially reside on hot tier nodes.
 You can configure <<index-lifecycle-management, {ilm}>> ({ilm-init}) policies
 to automatically transition your time series data through the hot, warm, and cold tiers
 according to your performance, resiliency and data retention requirements.
 A node's <<data-node, data role>> is configured in `elasticsearch.yml`.
 For example, the highest-performance nodes in a cluster might be assigned to both the hot and content tiers:
 [source,yaml]
@@ -33,9 +33,9 @@ node.roles: ["data_hot", "data_content"]
 Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
 Unlike time series data, the value of the content remains relatively constant over time,
 so it doesn't make sense to move it to a tier with different performance characteristics as it ages.
 Content data typically has long data retention requirements, and you want to be able to retrieve
 items quickly regardless of how old they are.
 Content tier nodes are usually optimized for query performance--they prioritize processing power over IO throughput
 so they can process complex searches and aggregations and return results quickly.
@@ -49,10 +49,10 @@ New indices are automatically allocated to the <<content-tier>> unless they are
 [[hot-tier]]
 === Hot tier
 The hot tier is the {es} entry point for time series data and holds your most-recent,
 most-frequently-searched time series data.
 Nodes in the hot tier need to be fast for both reads and writes,
 which requires more hardware resources and faster storage (SSDs).
 For resiliency, indices in the hot tier should be configured to use one or more replicas.
 New indices that are part of a <<data-streams, data stream>> are automatically allocated to the
@@ -62,43 +62,43 @@ hot tier.
 [[warm-tier]]
 === Warm tier
 Time series data can move to the warm tier once it is being queried less frequently
 than the recently-indexed data in the hot tier.
 The warm tier typically holds data from recent weeks.
 Updates are still allowed, but likely infrequent.
 Nodes in the warm tier generally don't need to be as fast as those in the hot tier.
 For resiliency, indices in the warm tier should be configured to use one or more replicas.
 [discrete]
 [[cold-tier]]
 === Cold tier
-Once data in the warm tier is no longer being updated, it can move to the cold tier.
-The cold tier typically holds the data from recent months or years.
+Once data is no longer being updated, it can move from the warm tier to the cold tier where it
+stays for the rest of its life.
 The cold tier is still a responsive query tier, but data in the cold tier is not normally updated.
 As data transitions into the cold tier it can be compressed and shrunken.
 For resiliency, indices in the cold tier can rely on
 <<ilm-searchable-snapshot, searchable snapshots>>, eliminating the need for replicas.
 [discrete]
 [[data-tier-allocation]]
 === Data tier index allocation
 When you create an index, by default {es} sets
 <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
 to `data_content` to automatically allocate the index shards to the content tier.
 When {es} creates an index as part of a <<data-streams, data stream>>,
 by default {es} sets
 <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
 to `data_hot` to automatically allocate the index shards to the hot tier.
 You can override the automatic tier-based allocation by specifying
 <<shard-allocation-filtering, shard allocation filtering>>
 settings in the create index request or index template that matches the new index.
 You can also explicitly set `index.routing.allocation.include._tier_preference`
 to opt out of the default tier-based allocation.
 If you set the tier preference to `null`, {es} ignores the data tier roles during allocation.
 [discrete]
@@ -106,7 +106,7 @@ If you set the tier preference to `null`, {es} ignores the data tier roles durin
 === Automatic data tier migration
 {ilm-init} automatically transitions managed
 indices through the available data tiers using the <<ilm-migrate, migrate>> action.
 By default, this action is automatically injected in every phase.
 You can explicitly specify the migrate action to override the default behavior,
 or use the <<ilm-allocate, allocate action>> to manually specify allocation rules.
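
Note on the tier preference setting: the "Data tier index allocation" text in the diff describes setting `index.routing.allocation.include._tier_preference` explicitly, or to `null` to opt out. The sketch below is only an illustration of what such a settings body might look like, not part of this commit. The helper name `tier_preference_settings` is hypothetical, the `data_warm` and `data_cold` role names are assumed to follow the pattern of the `data_hot` and `data_content` roles shown in the diff, and the comma-separated form for several tiers is likewise an assumption.

```python
import json

# Setting name quoted from the documentation text in the diff above.
TIER_PREFERENCE = "index.routing.allocation.include._tier_preference"

# data_hot and data_content appear in the diff; data_warm and data_cold
# are assumed to follow the same naming pattern.
KNOWN_TIERS = ("data_content", "data_hot", "data_warm", "data_cold")


def tier_preference_settings(*tiers):
    """Hypothetical helper: build an index settings body for a tier preference.

    With no tiers given, the preference is None (JSON null), which per the
    documentation makes Elasticsearch ignore data tier roles during allocation.
    """
    for tier in tiers:
        if tier not in KNOWN_TIERS:
            raise ValueError(f"unknown data tier role: {tier!r}")
    # Assumption: several tiers are written as one comma-separated string.
    value = ",".join(tiers) if tiers else None
    return {"settings": {TIER_PREFERENCE: value}}


if __name__ == "__main__":
    # Body that pins a new index to the hot tier.
    print(json.dumps(tier_preference_settings("data_hot"), indent=2))
    # Body that opts out of tier-based allocation.
    print(json.dumps(tier_preference_settings(), indent=2))
```

The resulting dictionary could be sent as the body of a create index request or placed in an index template, which are the two places the documentation names for overriding the default allocation.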