[7.x] [DOCS] Update snapshot/restore docs to align with API changes (#59730) (#59803)

* [DOCS] Updating snapshot/restore pages to align with API changes (#59730)

* Updating snapshot/restore pages to align with API changes.

* Fixing texts in delete snapshot page.

* Removing duplicate code sample and making editorial changes.

* Change "deleted" to "delete"

* Incorporating review feedback and making minor editorial changes.

* Remove titleabbrev

* Add paragraph break

* Remove titleabbrev from restore page

* Remove titleabbrev from create page

* Change "Create" to lowercase

* Change API names to lowercase

* Remove extraneous delimiters

* Change "Delete" to lowercase

* Single-sourcing warning and clarifying warning text.

* Fixing tests and removing erroneous example.
Adam Locke 2020-07-17 14:33:18 -04:00 committed by GitHub
parent 95e6e4a452
commit 29ff05cbac
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 230 additions and 269 deletions


@@ -88,9 +88,7 @@ Use the get snapshot status API to retrieve detailed information about snapshots
If you specify both the repository name and snapshot name, the request retrieves detailed status information for the given snapshot, even if it is not currently running.
WARNING: Using this API to return any status results other than the currently running snapshots (`_current`) can be very expensive. Each request to retrieve snapshot status results in file reads from every shard in a snapshot, for each snapshot.
+
For example, if you have 100 snapshots with 1,000 shards each, the API request will result in 100,000 file reads (100 snapshots * 1,000 shards). Depending on the latency of your file storage, the request can take extremely long to retrieve results.
include::{es-ref-dir}/snapshot-restore/monitor-snapshot-restore.asciidoc[tag=get-snapshot-status-warning]
[[get-snapshot-status-api-path-params]]
==== {api-path-parms-title}


@@ -0,0 +1,47 @@
[[delete-snapshots]]
== Delete a snapshot
////
[source,console]
-----------------------------------
PUT /_snapshot/my_backup
{
"type": "fs",
"settings": {
"location": "my_backup_location"
}
}
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
PUT /_snapshot/my_backup/snapshot_2?wait_for_completion=true
PUT /_snapshot/my_backup/snapshot_3?wait_for_completion=true
-----------------------------------
// TESTSETUP
////
Use the <<delete-snapshot-api,delete snapshot API>> to delete a snapshot
from the repository:
[source,console]
----
DELETE /_snapshot/my_backup/snapshot_1
----
When a snapshot is deleted from a repository, {es} deletes all files associated with the
snapshot that are not in use by other snapshots.
If the delete snapshot operation starts while the snapshot is being
created, the snapshot process halts and all files created as part of the snapshotting process are
removed. Use the <<delete-snapshot-api,delete snapshot API>> to cancel long-running snapshot operations that were
started by mistake.
To delete multiple snapshots from a repository, separate snapshot names by commas or use wildcards:
[source,console]
-----------------------------------
DELETE /_snapshot/my_backup/snapshot_2,snapshot_3
DELETE /_snapshot/my_backup/snap*
-----------------------------------
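When snapshot deletion is scripted rather than typed into the console, the request is an HTTP `DELETE` whose path lists the snapshot names separated by commas. The following Python sketch uses only the standard library to build, but not send, such a request. The `http://localhost:9200` address and the `build_delete_request` helper are assumptions for illustration, not part of any {es} client.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed cluster address; replace with your own endpoint.
ES_URL = "http://localhost:9200"


def build_delete_request(repository, snapshots, base_url=ES_URL):
    """Build (but do not send) a delete snapshot request.

    `snapshots` is a list of snapshot names or wildcard patterns. They are
    joined with commas, as in DELETE /_snapshot/my_backup/snapshot_2,snapshot_3.
    """
    # Percent-encode each name but keep "*" literal so wildcards still work.
    names = ",".join(quote(name, safe="*") for name in snapshots)
    return Request(f"{base_url}/_snapshot/{quote(repository, safe='')}/{names}",
                   method="DELETE")


req = build_delete_request("my_backup", ["snapshot_2", "snapshot_3"])
print(req.get_method(), req.full_url)
# DELETE http://localhost:9200/_snapshot/my_backup/snapshot_2,snapshot_3
```

Passing the request to `urllib.request.urlopen` would send it. Add whatever authentication your cluster requires.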


@@ -11,7 +11,8 @@ indices. You can also take snapshots of only specific data streams or indices in
the cluster.
Snapshots can be stored in either local or remote repositories.
Remote repositories can reside on S3, HDFS, Azure, Google Cloud Storage,
Remote repositories can reside on Amazon S3, HDFS, Microsoft Azure,
Google Cloud Storage,
and other platforms supported by a repository plugin.
Snapshots are incremental: each snapshot only stores data that
@@ -96,5 +97,5 @@ include::register-repository.asciidoc[]
include::take-snapshot.asciidoc[]
include::restore-snapshot.asciidoc[]
include::monitor-snapshot-restore.asciidoc[]
include::delete-snapshot.asciidoc[]
include::../slm/index.asciidoc[]


@@ -1,13 +1,15 @@
[[snapshots-monitor-snapshot-restore]]
== Monitor snapshot and restore progress
++++
<titleabbrev>Monitor snapshot and restore</titleabbrev>
++++
There are several ways to monitor the progress of the snapshot and restore processes while they are running. Both
operations support `wait_for_completion` parameter that would block client until the operation is completed. This is
the simplest method that can be used to get notified about operation completion.
Use the <<get-snapshot-api,get snapshot API>> or the
<<get-snapshot-status-api,get snapshot status API>> to monitor the
progress of snapshot operations. The create snapshot and restore snapshot APIs support the
`wait_for_completion` parameter, which blocks the client until the
operation finishes and is the simplest method of being notified
about operation completion.
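If blocking the client is not an option, a script can instead poll the <<get-snapshot-api,get snapshot API>> until the snapshot leaves the `IN_PROGRESS` state. The Python sketch below assumes a caller-supplied `fetch(path)` function that performs the `GET` request and returns the decoded JSON body. This keeps the polling logic independent of any particular HTTP client. The `fetch` function, the `wait_for_snapshot` helper, and the interval and timeout defaults are all hypothetical. The sketch reads only the `state` field of the first entry in `snapshots`.

```python
import time


def wait_for_snapshot(fetch, repository, snapshot, interval=1.0, timeout=60.0):
    """Poll GET /_snapshot/<repository>/<snapshot> until it stops running.

    Returns the final state (for example SUCCESS, FAILED, or PARTIAL) or
    raises TimeoutError if the snapshot is still IN_PROGRESS at the deadline.
    """
    path = f"/_snapshot/{repository}/{snapshot}"
    deadline = time.monotonic() + timeout
    while True:
        state = fetch(path)["snapshots"][0]["state"]
        if state != "IN_PROGRESS":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{snapshot} still IN_PROGRESS after {timeout}s")
        time.sleep(interval)


# Usage with canned responses standing in for a real cluster.
replies = iter([
    {"snapshots": [{"snapshot": "snapshot_1", "state": "IN_PROGRESS"}]},
    {"snapshots": [{"snapshot": "snapshot_1", "state": "SUCCESS"}]},
])
print(wait_for_snapshot(lambda path: next(replies), "my_backup", "snapshot_1",
                        interval=0))  # SUCCESS
```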
////
[source,console]
@@ -20,71 +22,155 @@ PUT /_snapshot/my_backup
}
}
PUT /_snapshot/my_fs_backup
{
"type": "fs",
"settings": {
"location": "my_other_backup_location"
}
}
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
PUT /_snapshot/my_backup/some_other_snapshot?wait_for_completion=true
-----------------------------------
// TESTSETUP
////
The snapshot operation can be also monitored by periodic calls to the snapshot info:
Use the `_current` parameter to retrieve all currently running
snapshots in the repository:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/_current
-----------------------------------
Including a snapshot name in the request retrieves information about a single snapshot:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_1
-----------------------------------
Please note that snapshot info operation uses the same resources and thread pool as the snapshot operation. So,
executing a snapshot info operation while large shards are being snapshotted can cause the snapshot info operation to wait
for available resources before returning the result. On very large shards the wait time can be significant.
This request retrieves basic information about the snapshot, including start and end time, version of
{es} that created the snapshot, the list of included data streams and indices, the current state of the
snapshot and the list of failures that occurred during the snapshot.
To get more immediate and complete information about snapshots the snapshot status command can be used instead:
Similar to repositories, you can retrieve information about multiple snapshots in a single request, and wildcards are supported:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_*,some_other_snapshot
-----------------------------------
Separate repository names with commas or use wildcards to retrieve snapshots from multiple repositories:
[source,console]
-----------------------------------
GET /_snapshot/_all
GET /_snapshot/my_backup,my_fs_backup
GET /_snapshot/my*
-----------------------------------
Add the `_all` parameter to the request to list all snapshots currently stored in the repository:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/_all
-----------------------------------
This request fails if some of the snapshots are unavailable. Use the boolean parameter `ignore_unavailable` to
return all snapshots that are currently available.
Retrieving all snapshots in the repository can be costly on cloud-based repositories,
in terms of both cost and performance. If you only need
the snapshot names or UUIDs in the repository and the data streams and indices in each snapshot,
set the optional boolean parameter `verbose` to `false` for a more
performant and cost-effective retrieval of the snapshots in the repository.
NOTE: Setting `verbose` to `false` omits additional information
about the snapshot, such as metadata, start and end time, number of shards included in the snapshot, and error messages. The default value of the `verbose` parameter is `true`.
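Both options are ordinary query parameters on the request that lists `_all` snapshots. One detail trips up scripted clients: Python renders booleans as `True` and `False`, while the requests on this page use lowercase `true` and `false`. The sketch below uses a hypothetical `snapshot_list_url` helper and an assumed `localhost:9200` address, and converts the values explicitly.

```python
from urllib.parse import urlencode


def snapshot_list_url(repository, verbose=True, ignore_unavailable=False,
                      base_url="http://localhost:9200"):
    """Build the URL for listing every snapshot in `repository`."""
    params = {
        # str(False) is "False"; lower-case it to match the documented values.
        "verbose": str(verbose).lower(),
        "ignore_unavailable": str(ignore_unavailable).lower(),
    }
    return f"{base_url}/_snapshot/{repository}/_all?{urlencode(params)}"


print(snapshot_list_url("my_backup", verbose=False, ignore_unavailable=True))
# http://localhost:9200/_snapshot/my_backup/_all?verbose=false&ignore_unavailable=true
```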
[discrete]
[[get-snapshot-detailed-status]]
=== Retrieving snapshot status
To retrieve more detailed information about snapshots, use the <<get-snapshot-status-api,get snapshot status API>>. While the get snapshot API returns only basic information about a snapshot in progress, the get snapshot status API returns a
complete breakdown of the current state of each shard participating in the snapshot.
// tag::get-snapshot-status-warning[]
[WARNING]
====
Using the get snapshot status API to return any status results other than the currently running snapshots (`_current`) can be very expensive. Each request to retrieve snapshot status results in file reads from every shard in a snapshot, for each snapshot. Such requests are taxing to machine resources and can also incur high processing costs when running in the cloud.
For example, if you have 100 snapshots with 1,000 shards each, the API request results in 100,000 file reads (100 snapshots * 1,000 shards). Depending on the latency of your file storage, the request can take an extremely long time to retrieve results.
====
// end::get-snapshot-status-warning[]
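The cost model in the warning is simple arithmetic: one file read per shard, per snapshot. The quick Python sketch below applies that back-of-the-envelope estimate, which can help before you request status for many snapshots at once. The `status_file_reads` name is hypothetical, and real I/O cost also depends on your storage.

```python
def status_file_reads(shard_counts):
    """Estimate file reads for one get snapshot status request.

    `shard_counts` holds the number of shards in each requested snapshot.
    Following the warning above, every shard of every snapshot is read.
    """
    return sum(shard_counts)


# 100 snapshots with 1,000 shards each, as in the warning.
print(status_file_reads([1_000] * 100))  # 100000
```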
The following request retrieves all currently running snapshots with
detailed status information:
[source,console]
-----------------------------------
GET /_snapshot/_status
-----------------------------------
To limit the results to a particular repository,
specify the repository name:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/_status
-----------------------------------
If you specify both the repository name and the snapshot name, the request
returns detailed status information for the given snapshot, even
if it is not currently running:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_1/_status
-----------------------------------
// TEST[continued]
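The status response reports per-snapshot counters such as `shards_stats` (shards `done` out of `total`). It also includes a `stats` object with two relevant sections: `incremental` counts the files that actually had to be copied, and `total` counts every file the snapshot references. The Python sketch below turns one decoded response into a progress summary. The `summarize_status` helper is hypothetical, and the input is a trimmed example response.

```python
def summarize_status(response):
    """Summarize each snapshot in a decoded get snapshot status response."""
    summary = {}
    for snap in response["snapshots"]:
        shards = snap["shards_stats"]
        stats = snap["stats"]
        done_pct = 100.0 * shards["done"] / shards["total"] if shards["total"] else 0.0
        summary[snap["snapshot"]] = {
            "state": snap["state"],
            "shards_done_pct": done_pct,
            # Bytes copied for this snapshot vs. bytes it references in total.
            "incremental_bytes": stats["incremental"]["size_in_bytes"],
            "total_bytes": stats["total"]["size_in_bytes"],
        }
    return summary


# Trimmed example response.
example = {
    "snapshots": [{
        "snapshot": "snapshot_1",
        "state": "SUCCESS",
        "shards_stats": {"done": 5, "total": 5},
        "stats": {
            "incremental": {"file_count": 8, "size_in_bytes": 4704},
            "total": {"file_count": 8, "size_in_bytes": 4704},
        },
    }]
}
print(summarize_status(example))
```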
While snapshot info method returns only basic information about the snapshot in progress, the snapshot status returns
complete breakdown of the current state for each shard participating in the snapshot.
[discrete]
=== Monitoring restore operations
The restore process piggybacks on the standard recovery mechanism of the
Elasticsearch. As a result, standard recovery monitoring services can be used
to monitor the state of restore. When the restore operation is executed the
The restore process piggybacks on the standard recovery mechanism of
{es}. As a result, you can use standard recovery monitoring services
to monitor the state of the restore. When the restore operation starts, the
cluster typically goes into the `yellow` state because the restore operation works
by recovering primary shards of the restored indices. Once the recovery of the
primary shards is completed Elasticsearch switches to the standard replication
process that creates the required number of replicas. Once all required
by recovering primary shards of the restored indices. After the recovery of the
primary shards is completed, {es} switches to the standard replication
process that creates the required number of replicas. When all required
replicas are created, the cluster switches to the `green` state.
The cluster health API provides only a high-level status of the restore process. For more
detailed insight into the current state of the recovery process, use the <<indices-recovery,index recovery>> and
<<cat-recovery,cat recovery>> APIs.
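The three checks mentioned above map to plain HTTP requests. The Python sketch below only assembles their URLs for a restored index. The `my_index` name, the `localhost:9200` address, the 50-second timeout, and the `restore_monitoring_urls` helper are illustrative assumptions. `wait_for_status` and `timeout` are cluster health API parameters, and `active_only` limits the index recovery API to recoveries that are still running.

```python
from urllib.parse import quote, urlencode

ES_URL = "http://localhost:9200"  # assumed cluster address


def restore_monitoring_urls(index, base_url=ES_URL):
    """Build (not send) URLs for watching the restore of `index`."""
    target = quote(index, safe="")
    health_params = urlencode({"wait_for_status": "green", "timeout": "50s"})
    return {
        # Returns once the index is green or after 50 seconds, whichever is first.
        "health": f"{base_url}/_cluster/health/{target}?{health_params}",
        # Per-shard recovery progress; active_only skips finished recoveries.
        "recovery": f"{base_url}/{target}/_recovery?active_only=true",
        # The same recovery information as a human-readable table.
        "cat_recovery": f"{base_url}/_cat/recovery/{target}?v=true",
    }


for name, url in restore_monitoring_urls("my_index").items():
    print(name, url)
```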
[float]
[discrete]
[[get-snapshot-stop-snapshot]]
=== Stop snapshot and restore operations
The snapshot and restore framework allows running only one snapshot or one restore operation at a time. If a currently
running snapshot was executed by mistake, or takes unusually long, it can be terminated using the snapshot delete operation.
The snapshot delete operation checks if the deleted snapshot is currently running and if it does, the delete operation stops
running snapshot was started by mistake or is taking unusually long, you can stop it using the <<delete-snapshot-api,delete snapshot API>>.
This operation checks whether the deleted snapshot is currently running. If it is, the delete snapshot operation stops
that snapshot before deleting the snapshot data from the repository.
[source,console]
-----------------------------------
DELETE /_snapshot/my_backup/snapshot_1
-----------------------------------
// TEST[continued]
The restore operation uses the standard shard recovery mechanism. Therefore, any currently running restore operation can
be canceled by deleting data streams and indices that are being restored. Please note that data for all deleted data streams and indices will be removed
be canceled by deleting data streams and indices that are being restored. Data for all deleted data streams and indices will be removed
from the cluster as a result of this operation.
[float]
[discrete]
[[get-snapshot-cluster-blocks]]
=== Effect of cluster blocks on snapshot and restore
Many snapshot and restore operations are affected by cluster and index blocks. For example, registering and unregistering
repositories require write global metadata access. The snapshot operation requires that all indices, backing indices, and their metadata as
well as the global metadata be readable. The restore operation requires the global metadata to be writable, however
repositories require global metadata write access. The snapshot operation requires that all indices, backing indices, and their metadata (including
global metadata) be readable. The restore operation requires the global metadata to be writable. However,
the index-level blocks are ignored during restore because indices are essentially recreated during restore.
Please note that a repository content is not part of the cluster and therefore cluster blocks don't affect internal
Repository contents are not part of the cluster, so cluster blocks do not affect internal
repository operations such as listing or deleting snapshots from an already registered repository.


@@ -179,88 +179,3 @@ index will not be successfully restored unless these index allocation settings a
The restore operation also checks that restored persistent settings are compatible with the current cluster to avoid accidentally
restoring incompatible settings. If you need to restore a snapshot with incompatible persistent settings, try restoring it without
the global cluster state.
[float]
=== Snapshot status
A list of currently running snapshots with their detailed status information can be obtained using the following command:
[source,console]
-----------------------------------
GET /_snapshot/_status
-----------------------------------
// TEST[continued]
In this format, the command will return information about all currently running snapshots. By specifying a repository name, it's possible
to limit the results to a particular repository:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/_status
-----------------------------------
// TEST[continued]
If both repository name and snapshot id are specified, this command will return detailed status information for the given snapshot even
if it's not currently running:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_1/_status
-----------------------------------
// TEST[continued]
The output looks similar to the following:
[source,console-result]
--------------------------------------------------
{
"snapshots": [
{
"snapshot": "snapshot_1",
"repository": "my_backup",
"uuid": "XuBo4l4ISYiVg0nYUen9zg",
"state": "SUCCESS",
"include_global_state": true,
"shards_stats": {
"initializing": 0,
"started": 0,
"finalizing": 0,
"done": 5,
"failed": 0,
"total": 5
},
"stats": {
"incremental": {
"file_count": 8,
"size_in_bytes": 4704
},
"processed": {
"file_count": 7,
"size_in_bytes": 4254
},
"total": {
"file_count": 8,
"size_in_bytes": 4704
},
"start_time_in_millis": 1526280280355,
"time_in_millis": 358
}
}
]
}
--------------------------------------------------
// TESTRESPONSE[skip: No snapshot status to validate.]
The output is composed of different sections. The `stats` sub-object provides details on the number and size of files that were
snapshotted. As snapshots are incremental, copying only the Lucene segments that are not already in the repository,
the `stats` object contains a `total` section for all the files that are referenced by the snapshot, as well as an `incremental` section
for those files that actually needed to be copied over as part of the incremental snapshotting. In case of a snapshot that's still
in progress, there's also a `processed` section that contains information about the files that are in the process of being copied.
Multiple ids are also supported:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_1,snapshot_2/_status
-----------------------------------
// TEST[skip: no snapshot_2 to get]


@@ -1,9 +1,12 @@
[[snapshots-take-snapshot]]
== Take a snapshot
== Create a snapshot
A repository can contain multiple snapshots of the same cluster. Snapshots are identified by unique names within the
cluster. A snapshot with the name `snapshot_1` in the repository `my_backup` can be created by executing the following
command:
cluster.
Use the <<put-snapshot-repo-api,put snapshot repository API>> to register or update a snapshot repository, and then use the <<create-snapshot-api,create snapshot API>> to create a snapshot in a repository.
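When scripted, those two steps are two `PUT` requests. The Python sketch below builds them with the standard library, reusing the `fs` repository type and `my_backup_location` path from the test setup used in these pages. The helper names and the `localhost:9200` address are hypothetical.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed cluster address


def register_fs_repository(name, location, base_url=ES_URL):
    """Put snapshot repository request for a shared file system repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return Request(f"{base_url}/_snapshot/{name}",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="PUT")


def create_snapshot(repository, snapshot, wait_for_completion=True, base_url=ES_URL):
    """Create snapshot request; the query flag controls whether it blocks."""
    flag = "true" if wait_for_completion else "false"
    return Request(f"{base_url}/_snapshot/{repository}/{snapshot}"
                   f"?wait_for_completion={flag}", method="PUT")


repo_req = register_fs_repository("my_backup", "my_backup_location")
snap_req = create_snapshot("my_backup", "snapshot_1")
print(snap_req.get_method(), snap_req.full_url)
```

To run the sequence, send the repository request first with `urllib.request.urlopen`, then send the snapshot request.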
The following request creates a snapshot with the name `snapshot_1` in the repository `my_backup`:
////
[source,console]
@@ -26,11 +29,11 @@ PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
The `wait_for_completion` parameter specifies whether or not the request should return immediately after snapshot
initialization (default) or wait for snapshot completion. During snapshot initialization, information about all
previous snapshots is loaded into the memory, which means that in large repositories it may take several seconds (or
even minutes) for this command to return even if the `wait_for_completion` parameter is set to `false`.
previous snapshots is loaded into memory, which means that in large repositories it may take several seconds (or
even minutes) for this request to return even if the `wait_for_completion` parameter is set to `false`.
By default a snapshot backs up all data streams and open indices in the cluster. This behavior can be changed by
specifying the list of data streams and indices in the body of the snapshot request.
By default, a snapshot backs up all data streams and open indices in the cluster. You can change this behavior by
specifying the list of data streams and indices in the body of the snapshot request:
[source,console]
-----------------------------------
@@ -47,8 +50,8 @@ PUT /_snapshot/my_backup/snapshot_2?wait_for_completion=true
-----------------------------------
// TEST[skip:cannot complete subsequent snapshot]
The list of data streams and indices that should be included into the snapshot can be specified using the `indices` parameter that
supports <<multi-index,multi-target syntax>>, although the options which control the behavior of multi index syntax
Use the `indices` parameter to list the data streams and indices that should be included in the snapshot. This parameter supports
<<multi-index,multi-target syntax>>, although the options that control the behavior of multi-index syntax
must be supplied in the body of the request, rather than as request parameters.
Data stream backups include the stream's backing indices and metadata, such as
@@ -58,156 +61,67 @@ You can also choose to include only specific backing indices in a snapshot.
However, these backups do not include the associated data stream's
metadata or its other backing indices.
The snapshot request also supports the
`ignore_unavailable` option. Setting it to `true` will cause data streams and indices that do not exist to be ignored during snapshot
creation. By default, when the `ignore_unavailable` option is not set and a data stream or index is missing, the snapshot request will fail.
By setting `include_global_state` to false it's possible to prevent the cluster global state to be stored as part of
the snapshot.
IMPORTANT: The global cluster state includes the cluster's index
templates, such as those <<create-a-data-stream-template,matching a data
streams>>. If your snapshot includes data streams, we recommend storing the
cluster state as part of the snapshot. This lets you later restored any
templates required for a data stream.
By default, the entire snapshot will fail if one or more indices participating in the snapshot don't have
all primary shards available. This behaviour can be changed by setting `partial` to `true`. The `expand_wildcards`
option can be used to control whether hidden and closed indices will be included in the snapshot, and defaults to `all`.
The `metadata` field can be used to attach arbitrary metadata to the snapshot. This may be a record of who took the snapshot,
why it was taken, or any other data that might be useful.
Snapshot names can be automatically derived using <<date-math-index-names,date math expressions>>, similarly as when creating
new data streams or indices. Note that special characters need to be URI encoded.
For example, creating a snapshot with the current day in the name, like `snapshot-2018.05.11`, can be achieved with
the following command:
[source,console]
-----------------------------------
# PUT /_snapshot/my_backup/<snapshot-{now/d}>
PUT /_snapshot/my_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
-----------------------------------
// TEST[continued]
[discrete]
[[create-snapshot-process-details]]
=== Snapshot process details
The snapshot process is incremental. In the process of making the snapshot, {es} analyzes
the list of the data stream and index files that are already stored in the repository and copies only files that were created or
changed since the last snapshot. That allows multiple snapshots to be preserved in the repository in a compact form.
Snapshotting process is executed in non-blocking fashion. All indexing and searching operation can continue to be
executed against the data stream or index that is being snapshotted. However, a snapshot represents a point-in-time view
changed since the last snapshot. This process allows multiple snapshots to be preserved in the repository in a compact form.
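To make the incremental behavior concrete, here is a deliberately simplified Python model. It compares a shard's files against what the repository already holds, and only new or changed files are copied. This illustrates the idea only and is not {es}'s actual implementation. The file names and the use of a checksum to detect changes are assumptions.

```python
def files_to_copy(shard_files, repository_files):
    """Return the shard files a new snapshot would still need to copy.

    Both arguments map file name -> checksum (a stand-in for however the
    real implementation identifies identical files). A file is skipped only
    if the repository already holds it with the same checksum.
    """
    return sorted(
        name for name, checksum in shard_files.items()
        if repository_files.get(name) != checksum
    )


repo = {"_0.cfs": "a1", "_1.cfs": "b2"}                 # already in the repository
shard = {"_0.cfs": "a1", "_1.cfs": "b3", "_2.cfs": "c4"}  # current shard files
print(files_to_copy(shard, repo))  # ['_1.cfs', '_2.cfs']
```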
The snapshot process is executed in a non-blocking fashion. All indexing and searching operations can continue to run against the data stream or index
that is being snapshotted. However, a snapshot represents a point-in-time view
at the moment when the snapshot was created, so no records that were added to the data stream or index after the snapshot process was started
will be present in the snapshot. The snapshot process starts immediately for the primary shards that has been started
and are not relocating at the moment. Before version 1.2.0, the snapshot operation fails if the cluster has any relocating or
initializing primaries of indices participating in the snapshot. Starting with version 1.2.0, Elasticsearch waits for
will be included in the snapshot.
The snapshot process starts immediately for the primary shards that have been started and are not relocating at the moment. {es} waits for
relocation or initialization of shards to complete before snapshotting them.
Besides creating a copy of each data stream and index, the snapshot process can also store global cluster metadata, which includes persistent
cluster settings and templates. The transient settings and registered snapshot repositories are not stored as part of
the snapshot.
Only one snapshot process can be executed in the cluster at any time. While snapshot of a particular shard is being
created this shard cannot be moved to another node, which can interfere with rebalancing process and allocation
filtering. Elasticsearch will only be able to move a shard to another node (according to the current allocation
filtering settings and rebalancing algorithm) once the snapshot is finished.
Only one snapshot process can be started in the cluster at any time. While a
snapshot of a particular shard is being
created, this shard cannot be moved to another node, which can interfere with rebalancing and allocation
filtering. {es} can only move a shard to another node (according to the current allocation
filtering settings and rebalancing algorithm) after the snapshot process
is finished.
Once a snapshot is created information about this snapshot can be obtained using the following command:
After a snapshot is created, use the <<get-snapshot-api,get snapshot API>> to retrieve information about a snapshot. See <<snapshots-monitor-snapshot-restore,Monitor snapshot and restore progress>> to learn more about retrieving snapshot status.
[discrete]
[[create-snapshot-options]]
=== Options for creating a snapshot
The create snapshot request supports the
`ignore_unavailable` option. Setting it to `true` causes data streams and indices that do not exist to be ignored during snapshot
creation. By default, when the `ignore_unavailable` option is not set and a data stream or index is missing, the snapshot request fails.
Set `include_global_state` to `false` to prevent the cluster global state from being stored as part of
the snapshot.
IMPORTANT: The global cluster state includes the cluster's index
templates, such as those <<create-a-data-stream-template,matching a data
stream>>. If your snapshot includes data streams, we recommend storing the
cluster state as part of the snapshot. This lets you later restore any
templates required for a data stream.
By default, the entire snapshot fails if one or more indices participating in the snapshot do not have
all primary shards available. You can change this behavior by setting `partial` to `true`. Use the `expand_wildcards`
option to control whether hidden and closed indices are included in the snapshot. This option defaults to `all`.
Use the `metadata` field to attach arbitrary metadata to the snapshot,
such as who took the snapshot,
why it was taken, or any other data that might be useful.
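The options above all go in the JSON body of the create snapshot request. The following Python sketch assembles such a body. The `snapshot_body` helper, the data stream and index names, and the metadata keys are hypothetical, while the option names come from this section. `json.dumps` writes Python booleans as JSON `true` and `false`.

```python
import json


def snapshot_body(indices, *, ignore_unavailable=False, include_global_state=True,
                  partial=False, metadata=None):
    """Serialize a create snapshot request body from the options above."""
    body = {
        # Multi-target syntax: one comma-separated string of names or patterns.
        "indices": ",".join(indices),
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
        "partial": partial,
    }
    if metadata:
        body["metadata"] = metadata  # arbitrary data stored with the snapshot
    return json.dumps(body)


print(snapshot_body(["my-data-stream", "my-index-*"], ignore_unavailable=True,
                    metadata={"taken_by": "ops-team", "reason": "pre-upgrade backup"}))
```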
Snapshot names can be automatically derived using <<date-math-index-names,date math expressions>>, similar to when creating
new data streams or indices. Special characters must be URI encoded.
For example, use the <<create-snapshot-api,create snapshot API>> to create
a snapshot with the current day in the name, such as `snapshot-2020.07.11`:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_1
# PUT /_snapshot/my_backup/<snapshot-{now/d}>
PUT /_snapshot/my_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
-----------------------------------
// TEST[continued]
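Rather than encoding a date math name by hand, a client can percent-encode it. In Python, `urllib.parse.quote` with `safe=""` also encodes the `/` inside `{now/d}`, reproducing the encoded path segment used in the example above. The `encode_snapshot_name` wrapper is a hypothetical helper.

```python
from urllib.parse import quote


def encode_snapshot_name(name):
    """Percent-encode a snapshot name for use as one URL path segment.

    safe="" tells quote() to encode "/" as well, which date math
    expressions such as {now/d} contain.
    """
    return quote(name, safe="")


print(encode_snapshot_name("<snapshot-{now/d}>"))  # %3Csnapshot-%7Bnow%2Fd%7D%3E
```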
This command returns basic information about the snapshot including start and end time, version of
Elasticsearch that created the snapshot, the list of included data streams and indices, the current state of the
snapshot and the list of failures that occurred during the snapshot. The snapshot `state` can be
[horizontal]
`IN_PROGRESS`::
The snapshot is currently running.
`SUCCESS`::
The snapshot finished and all shards were stored successfully.
`FAILED`::
The snapshot finished with an error and failed to store any data.
`PARTIAL`::
The global cluster state was stored, but data of at least one shard was not stored successfully.
The `failures` section of the response contains more detailed information about shards
that were not processed correctly.
`INCOMPATIBLE`::
The snapshot was created with an old version of {es} and is incompatible with
the current version of the cluster.
Similar as for repositories, information about multiple snapshots can be queried in a single request, supporting wildcards as well:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/snapshot_*,some_other_snapshot
-----------------------------------
// TEST[continued]
All snapshots currently stored in the repository can be listed using the following command:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/_all
-----------------------------------
// TEST[continued]
The command fails if some of the snapshots are unavailable. The boolean parameter `ignore_unavailable` can be used to
return all snapshots that are currently available.
Getting all snapshots in the repository can be costly on cloud-based repositories,
both from a cost and performance perspective. If the only information required is
the snapshot names/uuids in the repository and the data streams and indices in each snapshot, then
the optional boolean parameter `verbose` can be set to `false` to execute a more
performant and cost-effective retrieval of the snapshots in the repository. Note
that setting `verbose` to `false` will omit all other information about the snapshot
such as status information, the number of snapshotted shards, etc. The default
value of the `verbose` parameter is `true`.
It is also possible to retrieve snapshots from multiple repositories in one go, for example:
[source,console]
-----------------------------------
GET /_snapshot/_all
GET /_snapshot/my_backup,my_fs_backup
GET /_snapshot/my*/snap*
-----------------------------------
// TEST[skip:no my_fs_backup]
A currently running snapshot can be retrieved using the following command:
[source,console]
-----------------------------------
GET /_snapshot/my_backup/_current
-----------------------------------
// TEST[continued]
A snapshot can be deleted from the repository using the following command:
[source,console]
-----------------------------------
DELETE /_snapshot/my_backup/snapshot_2
-----------------------------------
// TEST[continued]
When a snapshot is deleted from a repository, Elasticsearch deletes all files that are associated with the deleted
snapshot and not used by any other snapshots. If the deleted snapshot operation is executed while the snapshot is being
created the snapshotting process will be aborted and all files created as part of the snapshotting process will be
cleaned. Therefore, the delete snapshot operation can be used to cancel long running snapshot operations that were
started by mistake.
It is also possible to delete multiple snapshots from a repository in one go, for example:
[source,console]
-----------------------------------
DELETE /_snapshot/my_backup/my_backup,my_fs_backup
DELETE /_snapshot/my_backup/snap*
-----------------------------------
// TEST[skip:no my_fs_backup]