[DOCS] Update snapshot/restore and SLM docs for data streams (#58513) (#59403)

Updates the existing snapshot/restore and SLM docs to make them
aware of data streams.
James Rodewig 2020-07-13 09:26:51 -04:00 committed by GitHub
parent bd01fd107c
commit 27a87c9d0c
11 changed files with 175 additions and 81 deletions


@ -320,6 +320,25 @@ For example, you might opt to shrink an index once it is no longer the write ind
See the {ref}/indices-shrink-index.html[shrink index API].
// end::shrink-def[]
[[glossary-snapshot]] snapshot ::
// tag::snapshot-def[]
A backup taken from a running {es} cluster.
A snapshot can include backups of an entire cluster or only data streams and
indices you specify.
// end::snapshot-def[]
[[glossary-snapshot-lifecycle-policy]] snapshot lifecycle policy ::
// tag::snapshot-lifecycle-policy-def[]
Specifies how frequently to perform automatic backups of a cluster and
how long to retain the resulting snapshots.
// end::snapshot-lifecycle-policy-def[]
[[glossary-snapshot-repository]] snapshot repository ::
// tag::snapshot-repository-def[]
Specifies where snapshots are to be stored.
Snapshots can be written to a shared filesystem or to a remote repository.
// end::snapshot-repository-def[]
[[glossary-source_field]] source field ::
By default, the JSON document that you index will be stored in the


@ -1,9 +1,9 @@
[role="xpack"]
[testenv="basic"]
[[index-lifecycle-and-snapshots]]
== Restore a managed index
== Restore a managed data stream or index
When you restore a snapshot that contains managed indices,
When you restore a managed index or a data stream with managed backing indices,
{ilm-init} automatically resumes executing the restored indices' policies.
A restored index's `min_age` is relative to when it was originally created or rolled over,
not its restoration time.
@ -12,8 +12,8 @@ an index has been restored from a snapshot.
If you restore an index that was accidentally deleted halfway through its month-long lifecycle,
it proceeds normally through the last two weeks of its lifecycle.
In some cases, you might want to restore a managed index and
prevent {ilm-init} from immediately executing its policy.
In some cases, you might want to prevent {ilm-init} from immediately executing
its policy on a restored index.
For example, if you are restoring an older snapshot you might want to
prevent it from rapidly progressing through all of its lifecycle phases.
You might want to add or update documents before it's marked read-only or shrunk,


@ -4,12 +4,12 @@
== Manage the snapshot lifecycle
You can set up snapshot lifecycle policies to automate the timing, frequency, and retention of snapshots.
Snapshot policies can apply to multiple indices.
Snapshot policies can apply to multiple data streams and indices.
The snapshot lifecycle management (SLM) <<snapshot-lifecycle-management-api, CRUD APIs>> provide
the building blocks for the snapshot policy features that are part of the Management application in {kib}.
The Snapshot and Restore UI makes it easy to set up policies, register snapshot repositories,
view and manage snapshots, and restore indices.
The Snapshot and Restore UI makes it easy to set up policies, register snapshot repositories,
view and manage snapshots, and restore data streams or indices.
You can stop and restart SLM to temporarily pause automatic backups while performing
upgrades or other maintenance.


@ -56,7 +56,7 @@ Configuration for each snapshot created by the policy.
====
`ignore_unavailable`::
(Optional, boolean)
If `true`, missing indices do *not* cause snapshot creation to fail and return
If `true`, missing data streams or indices do *not* cause snapshot creation to fail and return
an error. Defaults to `false`.
`include_global_state`::
@ -65,14 +65,15 @@ If `true`, cluster states are included in snapshots. Defaults to `false`.
`indices`::
(Optional, array of strings)
Array of index names or wildcard pattern of index names included in snapshots. It
supports <<date-math-index-names,date math>> expressions.
Array of data streams and indices to include in snapshots.
<<date-math-index-names,Date math>> and wildcard (`*`) expressions are
supported.
====
`name`::
(Required, string)
Name automatically assigned to each snapshot created by the policy. This value
supports the same <<date-math-index-names,date math>> supported in index names.
Name automatically assigned to each snapshot created by the policy.
<<date-math-index-names,Date math>> is supported.
To prevent conflicting snapshot names, a UUID is automatically appended to each
snapshot name.
@ -141,7 +142,7 @@ PUT /_slm/policy/daily-snapshots
<2> The name each snapshot should be given
<3> Which repository to take the snapshot in
<4> Any extra snapshot configuration
<5> Which indices the snapshot should contain
<5> Data streams and indices the snapshot should contain
<6> Optional retention configuration
<7> Keep snapshots for 30 days
<8> Always keep at least 5 successful snapshots, even if they're more than 30 days old


@ -25,7 +25,7 @@ cluster privilege to use this API. For more information, see
Halts all {slm} ({slm-init}) operations and stops the {slm-init} plugin.
This is useful when you are performing maintenance on a cluster and need to
prevent {slm-init} from performing any actions on your indices.
prevent {slm-init} from performing any actions on your data streams or indices.
Stopping {slm-init} does not stop any snapshots that are in progress.
You can manually trigger snapshots with the <<slm-api-execute-lifecycle>> even if {slm-init} is stopped.


@ -3,8 +3,8 @@
[[getting-started-snapshot-lifecycle-management]]
=== Tutorial: Automate backups with {slm-init}
This tutorial demonstrates how to automate daily backups of {es} indices using an {slm-init} policy.
The policy takes <<modules-snapshots, snapshots>> of all indices in the cluster
This tutorial demonstrates how to automate daily backups of {es} data streams and indices using an {slm-init} policy.
The policy takes <<modules-snapshots, snapshots>> of all data streams and indices in the cluster
and stores them in a local repository.
It also defines a retention policy and automatically deletes snapshots
when they are no longer needed.
@ -47,7 +47,7 @@ PUT /_snapshot/my_repository
Once you have a repository in place,
you can define an {slm-init} policy to take snapshots automatically.
The policy defines when to take snapshots, which indices should be included,
The policy defines when to take snapshots, which data streams or indices should be included,
and what to name the snapshots.
A policy can also specify a <<slm-retention,retention policy>> and
automatically delete snapshots when they are no longer needed.
@ -58,7 +58,7 @@ Snapshots are incremental and make efficient use of storage.
You can define and manage policies through {kib} Management or with the put policy API.
For example, you could define a `nightly-snapshots` policy
to back up all of your indices daily at 2:30AM UTC.
to back up all of your data streams and indices daily at 2:30AM UTC.
A put policy request defines the policy configuration in JSON:
@ -86,13 +86,13 @@ PUT /_slm/policy/nightly-snapshots
<<date-math-index-names,date math>> to include the current date in the snapshot name
<3> Where to store the snapshot
<4> The configuration to be used for the snapshot requests (see below)
<5> Which indices to include in the snapshot: all indices
<5> Which data streams or indices to include in the snapshot: all data streams and indices
<6> Optional retention policy: keep snapshots for 30 days,
retaining at least 5 and no more than 50 snapshots regardless of age
You can specify additional snapshot configuration options to customize how snapshots are taken.
For example, you could configure the policy to fail the snapshot
if one of the specified indices is missing.
if one of the specified data streams or indices is missing.
For more information about snapshot options, see <<snapshots-take-snapshot,snapshot requests>>.
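As a rough sketch of the policy described above, the following Python builds the JSON body for a `nightly-snapshots` put policy request. The helper name `build_slm_policy`, the cron expression, and the `<nightly-snap-{now/d}>` name pattern are assumptions made for illustration; the repository name, the `*` wildcard, and the retention values follow the callouts above.

```python
import json

def build_slm_policy(repository, indices, ignore_unavailable=False,
                     include_global_state=False):
    """Return a body for PUT /_slm/policy/<policy-id> (hypothetical helper)."""
    return {
        "schedule": "0 30 2 * * ?",        # assumed cron: daily at 2:30AM UTC
        "name": "<nightly-snap-{now/d}>",  # assumed date math name pattern
        "repository": repository,          # where to store the snapshots
        "config": {
            # Data streams and indices to include; "*" matches all of them.
            "indices": list(indices),
            # False means the snapshot fails if a listed target is missing.
            "ignore_unavailable": ignore_unavailable,
            "include_global_state": include_global_state,
        },
        "retention": {
            "expire_after": "30d",  # keep snapshots for 30 days
            "min_count": 5,         # always retain at least 5
            "max_count": 50,        # never retain more than 50
        },
    }

policy = build_slm_policy("my_repository", ["*"])
print(json.dumps(policy, indent=2))
```

Leaving `ignore_unavailable` at `False` corresponds to the option mentioned above: the snapshot fails if one of the specified data streams or indices is missing.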
[discrete]


@ -4,12 +4,12 @@
== {slm-init}: Manage the snapshot lifecycle
You can set up snapshot lifecycle policies to automate the timing, frequency, and retention of snapshots.
Snapshot policies can apply to multiple indices.
Snapshot policies can apply to multiple data streams and indices.
The {slm} ({slm-init}) <<snapshot-lifecycle-management-api, CRUD APIs>> provide
the building blocks for the snapshot policy features that are part of {kib} Management.
{kibana-ref}/snapshot-repositories.html[Snapshot and Restore] makes it easy to
set up policies, register snapshot repositories, view and manage snapshots, and restore indices.
set up policies, register snapshot repositories, view and manage snapshots, and restore data streams or indices.
You can stop and restart {slm-init} to temporarily pause automatic backups while performing
upgrades or other maintenance.


@ -5,25 +5,28 @@
--
// tag::snapshot-intro[]
A _snapshot_ is a backup taken from a running {es} cluster.
You can take snapshots of individual indices or of the entire cluster.
Snapshots can be stored in either local or remote repositories.
Remote repositories can reside on S3, HDFS, Azure, Google Cloud Storage,
A _snapshot_ is a backup taken from a running {es} cluster.
You can take snapshots of an entire cluster, including all its data streams and
indices. You can also take snapshots of only specific data streams or indices in
the cluster.
Snapshots can be stored in either local or remote repositories.
Remote repositories can reside on S3, HDFS, Azure, Google Cloud Storage,
and other platforms supported by a repository plugin.
Snapshots are incremental: each snapshot of an index only stores data that
is not part of an earlier snapshot.
Snapshots are incremental: each snapshot only stores data that
is not part of an earlier snapshot.
This enables you to take frequent snapshots with minimal overhead.
// end::snapshot-intro[]
// end::snapshot-intro[]
// tag::restore-intro[]
You can restore snapshots to a running cluster with the <<snapshots-restore-snapshot,restore API>>.
By default, all indices in the snapshot are restored.
Alternatively, you can restore specific indices or restore the cluster state from a snapshot.
When restoring indices, you can modify the index name and selected index settings.
You can restore snapshots to a running cluster with the <<snapshots-restore-snapshot,restore API>>.
By default, all data streams and indices in the snapshot are restored.
However, you can choose to restore only the cluster state or specific data
streams or indices from a snapshot.
// end::restore-intro[]
You must <<snapshots-register-repository, register a snapshot repository>>
You must <<snapshots-register-repository, register a snapshot repository>>
before you can <<snapshots-take-snapshot, take snapshots>>.
You can use <<getting-started-snapshot-lifecycle-management, snapshot lifecycle management>>
@ -50,7 +53,7 @@ compatibility. Follow the <<setup-upgrade,Upgrade documentation>>
when migrating between versions.
A snapshot contains a copy of the on-disk data structures that make up an
index. This means that snapshots can only be restored to versions of
index or a data stream's backing indices. This means that snapshots can only be restored to versions of
{es} that can read the indices:
* A snapshot of an index created in 6.x can be restored to 7.x.
@ -67,20 +70,21 @@ We do not recommend restoring snapshots from later {es} versions in earlier
versions. In some cases, the snapshots cannot be restored. For example, a
snapshot taken in 7.6.0 cannot be restored to 7.5.0.
Each snapshot can contain indices created in various versions of {es},
and when restoring a snapshot it must be possible to restore all of the indices
into the target cluster. If any indices in a snapshot were created in an
incompatible version, you will not be able restore the snapshot.
Each snapshot can contain indices created in various versions of {es}. This
includes backing indices created for data streams. When restoring a snapshot, it
must be possible to restore all of these indices into the target cluster. If any
indices in a snapshot were created in an incompatible version, you will not be
able to restore the snapshot.
IMPORTANT: When backing up your data prior to an upgrade, keep in mind that you
won't be able to restore snapshots after you upgrade if they contain indices
created in a version that's incompatible with the upgrade version.
If you end up in a situation where you need to restore a snapshot of an index
If you end up in a situation where you need to restore a snapshot of a data stream or index
that is incompatible with the version of the cluster you are currently running,
you can restore it on the latest compatible version and use
<<reindex-from-remote,reindex-from-remote>> to rebuild the index on the current
version. Reindexing from remote is only possible if the original index has
<<reindex-from-remote,reindex-from-remote>> to rebuild the data stream or index on the current
version. Reindexing from remote is only possible if the original data stream or index has
source enabled. Retrieving and reindexing the data can take significantly
longer than simply restoring a snapshot. If you have a large amount of data, we
recommend testing the reindex from remote process with a subset of your data to


@ -76,15 +76,15 @@ DELETE /_snapshot/my_backup/snapshot_1
// TEST[continued]
The restore operation uses the standard shard recovery mechanism. Therefore, any currently running restore operation can
be canceled by deleting indices that are being restored. Please note that data for all deleted indices will be removed
be canceled by deleting data streams and indices that are being restored. Please note that data for all deleted data streams and indices will be removed
from the cluster as a result of this operation.
[float]
=== Effect of cluster blocks on snapshot and restore
Many snapshot and restore operations are affected by cluster and index blocks. For example, registering and unregistering
repositories require write global metadata access. The snapshot operation requires that all indices and their metadata as
well as the global metadata were readable. The restore operation requires the global metadata to be writable, however
repositories require write access to the global metadata. The snapshot operation requires that all indices, backing indices, and their metadata as
well as the global metadata be readable. The restore operation requires the global metadata to be writable; however,
the index level blocks are ignored during restore because indices are essentially recreated during restore.
Please note that a repository's contents are not part of the cluster and therefore cluster blocks don't affect internal
repository operations such as listing or deleting snapshots from an already registered repository.


@ -1,9 +1,5 @@
[[snapshots-restore-snapshot]]
== Restore indices from a snapshot
++++
<titleabbrev>Restore a snapshot</titleabbrev>
++++
== Restore a snapshot
////
[source,console]
@ -29,15 +25,48 @@ A snapshot can be restored using the following command:
POST /_snapshot/my_backup/snapshot_1/_restore
-----------------------------------
By default, all indices in the snapshot are restored, and the cluster state is
*not* restored. It's possible to select indices that should be restored as well
By default, all data streams and indices in the snapshot are restored, but the cluster state is
*not* restored. It's possible to select specific data streams or indices that should be restored as well
as to restore the global cluster state by using the `indices` and
`include_global_state` options in the restore request body. The list of indices
supports <<multi-index,multi-target syntax>>. The `rename_pattern`
and `rename_replacement` options can be also used to rename indices on restore
`include_global_state` options in the restore request body. The list
supports <<multi-index,multi-target syntax>>.
[WARNING]
====
Each data stream requires a matching
<<create-a-data-stream-template,index template>>. The stream uses this
template to create new backing indices.
When restoring a data stream, ensure a matching template exists for the stream.
You can do this using one of the following methods:
* Check for existing templates that match the stream. If no matching template
exists, <<create-a-data-stream-template,create one>>.
* Restore a global cluster state that includes a matching template for the
stream.
If no index template matches a data stream, the stream cannot
<<manually-roll-over-a-data-stream,roll over>> or create new backing indices.
====
The `rename_pattern`
and `rename_replacement` options can also be used to rename data streams and indices on restore
using a regular expression that supports referencing the original text, as
explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here].
If you rename a restored data stream, its backing indices are also
renamed. For example, if you rename the `logs` data stream to `restored-logs`,
the backing index `.ds-logs-000005` is renamed to `.ds-restored-logs-000005`.
[WARNING]
====
If you rename a restored stream, ensure an index template matches the new stream
name. If no index template matches the stream, it cannot
<<manually-roll-over-a-data-stream,roll over>> or create new backing indices.
====
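The renaming behavior described above can be illustrated with a small Python sketch. It is not how {es} implements the rename: `rename_on_restore` and `backing_index_name` are hypothetical helpers, and the `.ds-<stream>-<generation>` naming with a six-digit generation is assumed from the `.ds-logs-000005` example. Note that Python writes group references as `\1`, while the Java syntax linked above uses `$1`.

```python
import re

def rename_on_restore(name, rename_pattern, rename_replacement):
    """Apply a rename pattern to a data stream or index name (illustration only)."""
    return re.sub(rename_pattern, rename_replacement, name)

def backing_index_name(stream, generation):
    """Assumed naming scheme, taken from the `.ds-logs-000005` example."""
    return f".ds-{stream}-{generation:06d}"

old_stream = "logs"
# Capture the whole original name and prefix it with "restored-".
new_stream = rename_on_restore(old_stream, r"(.+)", r"restored-\1")

print(backing_index_name(old_stream, 5), "->", backing_index_name(new_stream, 5))
# prints: .ds-logs-000005 -> .ds-restored-logs-000005
```

After a rename like this, an index template still has to match `restored-logs`, as the warning above explains.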
Set `include_aliases` to `false` to prevent aliases from being restored together
with associated indices.
@ -45,7 +74,7 @@ with associated indices
-----------------------------------
POST /_snapshot/my_backup/snapshot_1/_restore
{
"indices": "index_1,index_2",
"indices": "data_stream_1,index_1,index_2",
"ignore_unavailable": true,
"include_global_state": false, <1>
"rename_pattern": "index_(.+)",
@ -69,10 +98,22 @@ has the same number of shards as the index in the snapshot. The restore
operation automatically opens restored indices if they were closed and creates
new indices if they didn't exist in the cluster.
If a data stream is restored, its backing indices are also restored. The restore
operation automatically opens restored backing indices if they were closed.
NOTE: You cannot restore a data stream if a stream with the same name already
exists.
In addition to entire data streams, you can restore only specific backing
indices from a snapshot. However, restored backing indices are not automatically
added to any existing data streams. For example, if only the `.ds-logs-000003`
backing index is restored from a snapshot, it is not automatically added to the
existing `logs` data stream.
[float]
=== Partial restore
By default, the entire restore operation will fail if one or more indices participating in the operation don't have
By default, the entire restore operation will fail if one or more indices or backing indices participating in the operation don't have
snapshots of all shards available. This can occur, for example, if some shards failed to snapshot. It is still possible to
restore such indices by setting `partial` to `true`. Note that only successfully snapshotted shards will be
restored in this case and all missing shards will be recreated empty.
@ -102,6 +143,21 @@ POST /_snapshot/my_backup/snapshot_1/_restore
Note that some settings, such as `index.number_of_shards`, cannot be changed during a restore operation.
For data streams, these index settings are applied to the restored backing
indices.
[IMPORTANT]
====
The `index_settings` and `ignore_index_settings` parameters affect
restored backing indices only. New backing indices created for a stream use the index
settings specified in the stream's matching
<<create-a-data-stream-template,index template>>.
If you change index settings during a restore, we recommend you make similar
changes in the stream's matching index template. This ensures new backing
indices created for the stream use the same index settings.
====
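A minimal sketch of a restore request body that changes index settings, written in Python. The helper `build_restore_body` is hypothetical, and the specific settings (`index.number_of_replicas`, `index.refresh_interval`) are example values chosen for illustration; only the `index_settings` and `ignore_index_settings` parameter names come from the text above.

```python
import json

def build_restore_body(targets, index_settings=None,
                       ignore_index_settings=None, include_global_state=False):
    """Return a body for POST /_snapshot/<repo>/<snapshot>/_restore (hypothetical helper)."""
    body = {
        # Data streams and indices to restore, as a comma-separated list.
        "indices": ",".join(targets),
        "include_global_state": include_global_state,
    }
    if index_settings:
        # Settings to override on the restored (backing) indices.
        body["index_settings"] = dict(index_settings)
    if ignore_index_settings:
        # Settings from the snapshot that should not be restored.
        body["ignore_index_settings"] = list(ignore_index_settings)
    return body

restore_body = build_restore_body(
    ["data_stream_1", "index_1"],
    index_settings={"index.number_of_replicas": 0},
    ignore_index_settings=["index.refresh_interval"],
)
print(json.dumps(restore_body, indent=2))
```

For a data stream, these overrides reach only the restored backing indices, so the same change would also need to go into the stream's matching index template.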
[float]
=== Restoring to a different cluster
@ -111,11 +167,11 @@ containing the snapshot in the new cluster and starting the restore process. The
same size or topology. However, the version of the new cluster should be the same or newer (only 1 major version newer) than the cluster that was used to create the snapshot. For example, you can restore a 1.x snapshot to a 2.x cluster, but not a 1.x snapshot to a 5.x cluster.
If the new cluster is smaller, additional considerations apply. First of all, it's necessary to make sure
that new cluster have enough capacity to store all indices in the snapshot. It's possible to change indices settings
that the new cluster has enough capacity to store all data streams and indices in the snapshot. It's possible to change index settings
during restore to reduce the number of replicas, which can help with restoring snapshots into a smaller cluster. It's also
possible to select only subset of the indices using the `indices` parameter.
possible to select only a subset of the data streams or indices using the `indices` parameter.
If indices in the original cluster were assigned to particular nodes using
If indices or backing indices in the original cluster were assigned to particular nodes using
<<shard-allocation-filtering,shard allocation filtering>>, the same rules will be enforced in the new cluster. Therefore,
if the new cluster doesn't contain nodes with appropriate attributes that a restored index can be allocated on, that
index will not be successfully restored unless these index allocation settings are changed during the restore operation.


@ -1,9 +1,5 @@
[[snapshots-take-snapshot]]
== Take a snapshot of one or more indices
++++
<titleabbrev>Take a snapshot</titleabbrev>
++++
== Take a snapshot
A repository can contain multiple snapshots of the same cluster. Snapshots are identified by unique names within the
cluster. A snapshot with the name `snapshot_1` in the repository `my_backup` can be created by executing the following
@ -33,14 +29,14 @@ initialization (default) or wait for snapshot completion. During snapshot initia
previous snapshots is loaded into memory, which means that in large repositories it may take several seconds (or
even minutes) for this command to return even if the `wait_for_completion` parameter is set to `false`.
By default a snapshot of all open and started indices in the cluster is created. This behavior can be changed by
specifying the list of indices in the body of the snapshot request.
By default, a snapshot backs up all data streams and open indices in the cluster. This behavior can be changed by
specifying the list of data streams and indices in the body of the snapshot request.
[source,console]
-----------------------------------
PUT /_snapshot/my_backup/snapshot_2?wait_for_completion=true
{
"indices": "index_1,index_2",
"indices": "data_stream_1,index_1,index_2",
"ignore_unavailable": true,
"include_global_state": false,
"metadata": {
@ -51,13 +47,31 @@ PUT /_snapshot/my_backup/snapshot_2?wait_for_completion=true
-----------------------------------
// TEST[skip:cannot complete subsequent snapshot]
The list of indices that should be included into the snapshot can be specified using the `indices` parameter that
The list of data streams and indices that should be included in the snapshot can be specified using the `indices` parameter that
supports <<multi-index,multi-target syntax>>, although the options which control the behavior of multi-target syntax
must be supplied in the body of the request, rather than as request parameters. The snapshot request also supports the
`ignore_unavailable` option. Setting it to `true` will cause indices that do not exist to be ignored during snapshot
creation. By default, when `ignore_unavailable` option is not set and an index is missing the snapshot request will fail.
must be supplied in the body of the request, rather than as request parameters.
Data stream backups include the stream's backing indices and metadata, such as
the current <<data-streams-generation,generation>> and timestamp field.
You can also choose to include only specific backing indices in a snapshot.
However, these backups do not include the associated data stream's
metadata or its other backing indices.
The snapshot request also supports the
`ignore_unavailable` option. Setting it to `true` will cause data streams and indices that do not exist to be ignored during snapshot
creation. By default, when the `ignore_unavailable` option is not set and a data stream or index is missing, the snapshot request will fail.
By setting `include_global_state` to `false`, it's possible to prevent the cluster global state from being stored as part of
the snapshot. By default, the entire snapshot will fail if one or more indices participating in the snapshot don't have
the snapshot.
IMPORTANT: The global cluster state includes the cluster's index
templates, such as those <<create-a-data-stream-template,matching a data
stream>>. If your snapshot includes data streams, we recommend storing the
cluster state as part of the snapshot. This lets you later restore any
templates required for a data stream.
By default, the entire snapshot will fail if one or more indices participating in the snapshot don't have
all primary shards available. This behaviour can be changed by setting `partial` to `true`. The `expand_wildcards`
option can be used to control whether hidden and closed indices will be included in the snapshot, and defaults to `all`.
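The options discussed above can be combined into a single snapshot request body. The Python below is a sketch under stated assumptions: `build_snapshot_body` is a hypothetical helper, and the defaults it uses simply mirror the recommendations in this section (fail on missing targets, store the global state so templates for data streams can be restored later).

```python
import json

def build_snapshot_body(targets, ignore_unavailable=False,
                        include_global_state=True, partial=False):
    """Return a body for PUT /_snapshot/<repo>/<snapshot> (hypothetical helper)."""
    return {
        # Data streams and indices to back up, as a comma-separated list.
        "indices": ",".join(targets),
        # True: skip targets that do not exist instead of failing the request.
        "ignore_unavailable": ignore_unavailable,
        # True: store the global cluster state, which includes index templates.
        "include_global_state": include_global_state,
        # True: do not fail the whole snapshot if some primary shards are unavailable.
        "partial": partial,
    }

snapshot_body = build_snapshot_body(["data_stream_1", "index_1", "index_2"])
print(json.dumps(snapshot_body, indent=2))
```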
@ -65,7 +79,7 @@ The `metadata` field can be used to attach arbitrary metadata to the snapshot. T
why it was taken, or any other data that might be useful.
Snapshot names can be automatically derived using <<date-math-index-names,date math expressions>>, similar to when creating
new indices. Note that special characters need to be URI encoded.
new data streams or indices. Note that special characters need to be URI encoded.
For example, creating a snapshot with the current day in the name, like `snapshot-2018.05.11`, can be achieved with
the following command:
@ -78,18 +92,18 @@ PUT /_snapshot/my_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
// TEST[continued]
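To see where the encoded path in the request above comes from, the following Python sketch percent-encodes a date math snapshot name. The helper name `encode_snapshot_name` is an assumption for illustration; `urllib.parse.quote` is part of the Python standard library.

```python
from urllib.parse import quote

def encode_snapshot_name(name):
    """Percent-encode every reserved character, including '/'."""
    return quote(name, safe="")

path = "/_snapshot/my_backup/" + encode_snapshot_name("<snapshot-{now/d}>")
print(path)
# prints: /_snapshot/my_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
```

Passing `safe=""` matters here: by default `quote` leaves `/` unencoded, which would split the date math expression across two path segments.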
The index snapshot process is incremental. In the process of making the index snapshot Elasticsearch analyses
the list of the index files that are already stored in the repository and copies only files that were created or
The snapshot process is incremental. In the process of making the snapshot, {es} analyses
the list of the data stream and index files that are already stored in the repository and copies only files that were created or
changed since the last snapshot. That allows multiple snapshots to be preserved in the repository in a compact form.
The snapshotting process is executed in a non-blocking fashion. All indexing and searching operations can continue to be
executed against the index that is being snapshotted. However, a snapshot represents the point-in-time view of the index
at the moment when snapshot was created, so no records that were added to the index after the snapshot process was started
executed against the data stream or index that is being snapshotted. However, a snapshot represents a point-in-time view
at the moment when the snapshot was created, so no records that were added to the data stream or index after the snapshot process was started
will be present in the snapshot. The snapshot process starts immediately for the primary shards that have been started
and are not relocating at the moment. Before version 1.2.0, the snapshot operation fails if the cluster has any relocating or
initializing primaries of indices participating in the snapshot. Starting with version 1.2.0, Elasticsearch waits for
relocation or initialization of shards to complete before snapshotting them.
Besides creating a copy of each index the snapshot process can also store global cluster metadata, which includes persistent
Besides creating a copy of each data stream and index, the snapshot process can also store global cluster metadata, which includes persistent
cluster settings and templates. The transient settings and registered snapshot repositories are not stored as part of
the snapshot.
@ -107,7 +121,7 @@ GET /_snapshot/my_backup/snapshot_1
// TEST[continued]
This command returns basic information about the snapshot including start and end time, version of
Elasticsearch that created the snapshot, the list of included indices, the current state of the
Elasticsearch that created the snapshot, the list of included data streams and indices, the current state of the
snapshot and the list of failures that occurred during the snapshot. The snapshot `state` can be
[horizontal]
@ -150,7 +164,7 @@ return all snapshots that are currently available.
Getting all snapshots in the repository can be costly on cloud-based repositories,
both from a cost and performance perspective. If the only information required is
the snapshot names/uuids in the repository and the indices in each snapshot, then
the snapshot names/uuids in the repository and the data streams and indices in each snapshot, then
the optional boolean parameter `verbose` can be set to `false` to execute a more
performant and cost-effective retrieval of the snapshots in the repository. Note
that setting `verbose` to `false` will omit all other information about the snapshot