[role="xpack"]
[testenv="basic"]
[[getting-started-snapshot-lifecycle-management]]
== Getting started with snapshot lifecycle management

Let's get started with snapshot lifecycle management (SLM) by working through a
hands-on scenario. The goal of this example is to automatically back up {es}
indices using <<modules-snapshots,snapshots>> every day at a particular
time. Once these snapshots have been created, they are kept for a configured
amount of time and then deleted per a configured retention policy.

[float]
[[slm-and-security]]
=== Security and SLM
Before starting, it's important to understand the privileges that are needed
when configuring SLM if you are using the security plugin. Two built-in cluster
privileges help here: `manage_slm` and `read_slm`. Note also that the
`cluster:admin/snapshot/*` permission allows taking and deleting snapshots even
for indices the role may not have access to.

An example of configuring an administrator role for SLM follows:

[source,console]
-----------------------------------
POST /_security/role/slm-admin
{
  "cluster": ["manage_slm", "cluster:admin/snapshot/*"],
  "indices": [
    {
      "names": [".slm-history-*"],
      "privileges": ["all"]
    }
  ]
}
-----------------------------------
// TEST[skip:security is not enabled here]

A read-only role, which can retrieve policies (but not update, execute, or
delete them) and can only read the history index, looks like this:

[source,console]
-----------------------------------
POST /_security/role/slm-read-only
{
  "cluster": ["read_slm"],
  "indices": [
    {
      "names": [".slm-history-*"],
      "privileges": ["read"]
    }
  ]
}
-----------------------------------
// TEST[skip:security is not enabled here]

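A role only takes effect once it is assigned to a user. As a minimal sketch,
assuming users are managed in the native realm, the following request creates a
user that holds the `slm-admin` role. The user name `slm-user` and the password
are placeholders invented for this example.

[source,console]
-----------------------------------
POST /_security/user/slm-user
{
  "password": "replace-with-a-strong-password",
  "roles": ["slm-admin"]
}
-----------------------------------
// TEST[skip:security is not enabled here]
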
[float]
[[slm-gs-create-policy]]
=== Setting up a repository

Before we can set up an SLM policy, we'll need to set up a
<<snapshots-repositories,snapshot repository>> where the snapshots will be
stored. Repositories can use {plugins}/repository.html[many different backends],
including cloud storage providers. You'll probably want to use one of these in
production, but for this example we'll use a shared file system repository:

[source,console]
-----------------------------------
PUT /_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}
-----------------------------------

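A shared file system repository can only be registered if its location falls
under a path listed in the `path.repo` setting of the nodes. As an optional
extra step that is not part of the scenario itself, the verify repository API
is one way to check that the nodes can reach the repository before a policy
depends on it:

[source,console]
-----------------------------------
POST /_snapshot/my_repository/_verify
-----------------------------------
// TEST[skip:illustrative example]
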
[float]
=== Setting up a policy

Now that we have a repository in place, we can create a policy to automatically
take snapshots. Policies are written in JSON and define when to take
snapshots, what the snapshots should be named, and which indices should be
included, among other things. We'll use the <<slm-api-put,Put Policy>> API
to create the policy.

When configuring a policy, you can optionally configure retention as well. See
the <<slm-retention,SLM retention>> documentation for full details of
how retention works.

[source,console]
--------------------------------------------------
PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?", <1>
  "name": "<nightly-snap-{now/d}>", <2>
  "repository": "my_repository", <3>
  "config": { <4>
    "indices": ["*"] <5>
  },
  "retention": { <6>
    "expire_after": "30d", <7>
    "min_count": 5, <8>
    "max_count": 50 <9>
  }
}
--------------------------------------------------
// TEST[continued]
<1> when the snapshot should be taken, using
<<schedule-cron,Cron syntax>>, in this
case at 1:30AM each day
<2> the name each snapshot should be given, using
<<date-math-index-names,date math>> to include the current date in the name
of the snapshot
<3> the repository the snapshot should be stored in
<4> the configuration to be used for the snapshot requests (see below)
<5> which indices should be included in the snapshot, in this case, every index
<6> optional retention configuration
<7> keep snapshots for 30 days
<8> always keep at least 5 successful snapshots
<9> keep no more than 50 successful snapshots, even if they're less than 30 days old

This policy will take a snapshot of every index each day at 1:30AM UTC.
Snapshots are incremental, allowing frequent snapshots to be stored efficiently,
so don't be afraid to configure a policy to take frequent snapshots.

In addition to specifying the indices that should be included in the snapshot,
the `config` field can be used to customize other aspects of the snapshot. You
can use any option allowed in
<<snapshots-take-snapshot,a regular snapshot request>>, so you can specify, for
example, whether the snapshot should fail in special cases, such as if one of
the specified indices cannot be found.

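
For instance, a `config` block along the following lines would skip a missing
index instead of failing, and would leave the cluster state out of the
snapshot. This is a sketch rather than a recommendation: the policy name
`nightly-important-snapshots` and the `important` index are hypothetical, while
`ignore_unavailable` and `include_global_state` are options of the regular
snapshot request.

[source,console]
--------------------------------------------------
PUT /_slm/policy/nightly-important-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": ["important"],
    "ignore_unavailable": true,
    "include_global_state": false
  }
}
--------------------------------------------------
// TEST[skip:illustrative example]
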

[float]
=== Making sure the policy works

While snapshots taken by SLM policies can be viewed through the standard snapshot
API, SLM also keeps track of policy successes and failures in a form that makes
it easier to check that the policy is working. Once a policy has executed at
least once, viewing it with the <<slm-api-get,Get Policy API>> returns
metadata indicating whether the snapshot was successfully initiated.

Instead of waiting for our policy to run at 1:30AM, let's tell SLM to take a
snapshot right now using the configuration from our policy.

[source,console]
--------------------------------------------------
POST /_slm/policy/nightly-snapshots/_execute
--------------------------------------------------
// TEST[skip:we can't easily handle snapshots from docs tests]

This request will kick off a snapshot for our policy right now, regardless of
the schedule in the policy. This is useful for taking snapshots before making
a configuration change, before upgrading, or, for our purposes, for making sure
our policy is going to work. The policy will continue to run on its configured
schedule after this execution. We can now retrieve the policy to check the
result:

[source,console]
--------------------------------------------------
GET /_slm/policy/nightly-snapshots?human
--------------------------------------------------
// TEST[continued]

This request will return a response that includes the policy, information about
the last time the policy succeeded and failed, and the next time the policy
will be executed.

[source,console-result]
--------------------------------------------------
{
  "nightly-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<nightly-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["*"]
      },
      "retention": {
        "expire_after": "30d",
        "min_count": 5,
        "max_count": 50
      }
    },
    "last_success": { <1>
      "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <2>
      "time_string": "2019-04-24T16:43:49.316Z",
      "time": 1556124229316
    },
    "last_failure": { <3>
      "snapshot_name": "nightly-snap-2019.04.02-lohisb5ith2n8hxacaq3mw",
      "time_string": "2019-04-02T01:30:00.000Z",
      "time": 1556042030000,
      "details": "{\"type\":\"index_not_found_exception\",\"reason\":\"no such index [important]\",\"resource.type\":\"index_or_alias\",\"resource.id\":\"important\",\"index_uuid\":\"_na_\",\"index\":\"important\",\"stack_trace\":\"[important] IndexNotFoundException[no such index [important]]\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.indexNotFoundException(IndexNameExpressionResolver.java:762)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:714)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:670)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:163)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:142)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:102)\\n\\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:280)\\n\\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\\n\\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)\\n\\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)\\n\\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)\\n\\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\\n\\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:834)\\n\"}"
    },
    "next_execution": "2019-04-24T01:30:00.000Z", <4>
    "next_execution_millis": 1556048160000
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:the presence of last_failure and last_success is asynchronous and will be present for users, but is untestable]

<1> information about the last time the policy successfully initiated a snapshot
<2> the name of the snapshot that was successfully initiated
<3> information about the last time the policy failed to initiate a snapshot
<4> the next time the policy will execute

NOTE: This metadata only indicates whether the request to initiate the snapshot was
made successfully or not. After the snapshot has been successfully started, it
can still fail if, for example, the connection to a remote repository is lost
while copying files.

If you're following along, the returned SLM policy shouldn't have a `last_failure`
field; it's included above only as an example. You should, however, see a
`last_success` field and a snapshot name. If you do, you've successfully taken
your first snapshot using SLM!

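
Because the policy metadata only reports whether the snapshot was initiated,
one way to confirm that the snapshot also completed is to look it up with the
get snapshot API. The wildcard below is an assumption based on the
`nightly-snap-` prefix used in our policy's `name`. In the response, the
`state` field of each snapshot shows how it ended, for example `SUCCESS`.

[source,console]
--------------------------------------------------
GET /_snapshot/my_repository/nightly-snap-*
--------------------------------------------------
// TEST[skip:we can't easily handle snapshots from docs tests]
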
While only the most recent success and failure are available through the Get Policy
API, all policy executions are recorded to a history index, which may be queried
by searching the index pattern `.slm-history*`.

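
As a rough sketch, a search such as the following returns the five most recent
history entries. It assumes that the history documents carry an `@timestamp`
field to sort on; check the mapping of the history index on your cluster before
relying on specific field names.

[source,console]
--------------------------------------------------
GET /.slm-history*/_search
{
  "size": 5,
  "sort": [
    { "@timestamp": { "order": "desc" } }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative example]
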
That's it! We have our first SLM policy set up to periodically take snapshots
so that our backups are always up to date. You can read more details in the
<<snapshot-lifecycle-management-api,SLM API documentation>> and the
<<modules-snapshots,general snapshot documentation>>.