[role="xpack"]
[testenv="basic"]
[[getting-started-snapshot-lifecycle-management]]
== Getting started with snapshot lifecycle management

Let's get started with snapshot lifecycle management (SLM) by working through a
hands-on scenario. The goal of this example is to automatically back up {es}
indices using the <> every day at a particular time. Once these snapshots have
been created, they are kept for a configured amount of time and then deleted
according to a configured retention policy.

[float]
[[slm-and-security]]
=== Security and SLM

Before starting, it's important to understand the privileges that are needed
when configuring SLM if you are using the security plugin. There are two
built-in cluster privileges that can help: `manage_slm` and `read_slm`. Note
also that the `cluster:admin/snapshot/*` permission allows taking and deleting
snapshots even for indices the role may not have access to.

An example of configuring an administrator role for SLM follows:

[source,console]
-----------------------------------
POST /_security/role/slm-admin
{
  "cluster": ["manage_slm", "cluster:admin/snapshot/*"],
  "indices": [
    {
      "names": [".slm-history-*"],
      "privileges": ["all"]
    }
  ]
}
-----------------------------------
// TEST[skip:security is not enabled here]

Or, for a read-only role that can retrieve policies (but not update, execute,
or delete them) and can only view the history index:

[source,console]
-----------------------------------
POST /_security/role/slm-read-only
{
  "cluster": ["read_slm"],
  "indices": [
    {
      "names": [".slm-history-*"],
      "privileges": ["read"]
    }
  ]
}
-----------------------------------
// TEST[skip:security is not enabled here]

[float]
[[slm-gs-create-policy]]
=== Setting up a repository

Before we can set up an SLM policy, we'll need to set up a <> where the
snapshots will be stored. Repositories can use
{plugins}/repository.html[many different backends], including cloud storage
providers. You'll probably want to use one of these in production, but for this
example we'll use a shared file system repository:

[source,console]
-----------------------------------
PUT /_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}
-----------------------------------
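
If you want to double-check that the repository is accessible from every node
before relying on it, you can optionally verify it first. This extra step is a
sketch using the verify repository API against the `my_repository` repository
registered above:

[source,console]
-----------------------------------
POST /_snapshot/my_repository/_verify
-----------------------------------
// TEST[continued]

The response lists the nodes that were able to access the repository location.
If verification fails for a shared file system repository, make sure the path
is registered in the `path.repo` setting on each node.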
[float]
=== Setting up a policy

Now that we have a repository in place, we can create a policy to automatically
take snapshots. Policies are written in JSON and define when to take snapshots,
what the snapshots should be named, and which indices should be included, among
other things. We'll use the <> API to create the policy.

When configuring a policy, retention can also optionally be configured. See the
<> documentation for the full details of how retention works.

[source,console]
--------------------------------------------------
PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?", <1>
  "name": "<nightly-snap-{now/d}>", <2>
  "repository": "my_repository", <3>
  "config": { <4>
    "indices": ["*"] <5>
  },
  "retention": { <6>
    "expire_after": "30d", <7>
    "min_count": 5, <8>
    "max_count": 50 <9>
  }
}
--------------------------------------------------
// TEST[continued]
<1> when the snapshot should be taken, using <>, in this case at 1:30AM each day
<2> the name each snapshot should be given, using <> to include the current date in the name of the snapshot
<3> the repository the snapshot should be stored in
<4> the configuration to be used for the snapshot requests (see below)
<5> which indices should be included in the snapshot, in this case, every index
<6> Optional retention configuration
<7> Keep snapshots for 30 days
<8> Always keep at least 5 successful snapshots
<9> Keep no more than 50 successful snapshots, even if they're less than 30 days old

This policy will take a snapshot of every index each day at 1:30AM UTC.
Snapshots are incremental, allowing frequent snapshots to be stored efficiently,
so don't be afraid to configure a policy to take frequent snapshots.

In addition to specifying the indices that should be included in the snapshot,
the `config` field can be used to customize other aspects of the snapshot. You
can use any option allowed in <>, so you can specify, for example, whether the
snapshot should fail in special cases, such as if one of the specified indices
cannot be found.
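
For instance, a variation of the policy above that skips missing indices
instead of failing, and leaves the cluster state out of the snapshot, might
look like the following sketch. The `ignore_unavailable` and
`include_global_state` options here are the standard snapshot request options
rather than anything SLM-specific; the rest of the policy is unchanged:

[source,console]
--------------------------------------------------
PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": true,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
--------------------------------------------------
// TEST[skip:this would overwrite the policy used in the rest of this example]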
[float]
=== Making sure the policy works

While snapshots taken by SLM policies can be viewed through the standard
snapshot API, SLM also keeps track of policy successes and failures in ways
that make it easier to confirm the policy is working. Once a policy has
executed at least once, viewing the policy using the <> will return metadata
indicating whether the snapshot was successfully initiated or not.

Instead of waiting for our policy to run at 1:30AM, let's tell SLM to take a
snapshot using the configuration from our policy right now:

[source,console]
--------------------------------------------------
POST /_slm/policy/nightly-snapshots/_execute
--------------------------------------------------
// TEST[skip:we can't easily handle snapshots from docs tests]

This request will kick off a snapshot for our policy right now, regardless of
the schedule in the policy. This is useful for taking snapshots before making a
configuration change, upgrading, or for our purposes, making sure our policy is
going to work successfully. The policy will continue to run on its configured
schedule after this execution of the policy.

[source,console]
--------------------------------------------------
GET /_slm/policy/nightly-snapshots?human
--------------------------------------------------
// TEST[continued]

This request will return a response that includes the policy, information about
the last time the policy succeeded and failed, and the next time the policy
will be executed.

[source,console-result]
--------------------------------------------------
{
  "nightly-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<nightly-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["*"]
      },
      "retention": {
        "expire_after": "30d",
        "min_count": 5,
        "max_count": 50
      }
    },
    "last_success": { <1>
      "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <2>
      "time_string": "2019-04-24T16:43:49.316Z",
      "time": 1556124229316
    },
    "last_failure": { <3>
      "snapshot_name": "nightly-snap-2019.04.02-lohisb5ith2n8hxacaq3mw",
      "time_string": "2019-04-02T01:30:00.000Z",
      "time": 1556042030000,
      "details": "{\"type\":\"index_not_found_exception\",\"reason\":\"no such index [important]\",\"resource.type\":\"index_or_alias\",\"resource.id\":\"important\",\"index_uuid\":\"_na_\",\"index\":\"important\",\"stack_trace\":\"[important] IndexNotFoundException[no such index [important]]\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.indexNotFoundException(IndexNameExpressionResolver.java:762)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:714)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:670)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:163)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:142)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:102)\\n\\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:280)\\n\\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\\n\\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)\\n\\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)\\n\\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)\\n\\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\\n\\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:834)\\n\"}"
    },
    "next_execution": "2019-04-24T01:30:00.000Z", <4>
    "next_execution_millis": 1556048160000
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:the presence of last_failure and last_success is asynchronous and will be present for users, but is untestable]

<1> information about the last time the policy successfully initiated a snapshot
<2> the name of the snapshot that was successfully initiated
<3> information about the last time the policy failed to initiate a snapshot
<4> the next time the policy will execute

NOTE: This metadata only indicates whether the request to initiate the snapshot
was made successfully or not - after the snapshot has been successfully
started, it is possible for the snapshot to fail if, for example, the
connection to a remote repository is lost while copying files.

If you're following along, the returned SLM policy shouldn't have a
`last_failure` field - it's included above only as an example. You should,
however, see a `last_success` field and a snapshot name. If you do, you've
successfully taken your first snapshot using SLM!

While only the most recent success and failure are available through the Get
Policy API, all policy executions are recorded to a history index, which may be
queried by searching the index pattern `.slm-history*`.
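
For example, to list the recorded runs of the `nightly-snapshots` policy, you
could search that pattern. The following is only a sketch - the field names
used here (`policy` and `@timestamp`) are assumptions about the history
document format and may differ between versions:

[source,console]
--------------------------------------------------
GET /.slm-history*/_search
{
  "query": {
    "match": { "policy": "nightly-snapshots" }
  },
  "sort": [
    { "@timestamp": { "order": "desc" } }
  ]
}
--------------------------------------------------
// TEST[skip:the history index may not exist yet]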

That's it! We have our first SLM policy set up to periodically take snapshots
so that our backups are always up to date. You can read more details in the <>
and the <>.