[role="xpack"]
[testenv="basic"]
[[start-stop-ilm]]
== Start and stop {ilm}

ILM continues to execute the policies of all indices that it manages.
Sometimes this is not desired for certain indices, or even for every
index in a cluster. For example, a scheduled maintenance window might
involve cluster topology changes that could interfere with running ILM
actions. For this reason, ILM has two ways to disable operations.

Stopping ILM also stops snapshot lifecycle management operations:
no scheduled snapshots are created, although snapshots that are already
in progress are unaffected.

ILM is running by default.
To see its current operating mode, use the <<ilm-get-status,Get Status API>>:

////
[source,console]
--------------------------------------------------
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "10d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

PUT my_index
{
  "settings": {
    "index.lifecycle.name": "my_policy"
  }
}
--------------------------------------------------
////

[source,console]
--------------------------------------------------
GET _ilm/status
--------------------------------------------------

If the request does not encounter errors, you receive the following result:

[source,console-result]
--------------------------------------------------
{
  "operation_mode": "RUNNING"
}
--------------------------------------------------

ILM has the following operating modes:

[[ilm-operating-modes]]
.ILM Operating Modes
[options="header"]
|===
|Name |Description
|RUNNING |Normal operation; all policies are executed
|STOPPING|ILM has received a request to stop but is still processing some policies
|STOPPED |No policies are executed
|===

[float]
=== Stopping ILM

To pause the ILM service so that no further steps are executed,
use the <<ilm-stop,Stop API>>.

[source,console]
--------------------------------------------------
POST _ilm/stop
--------------------------------------------------
// TEST[continued]

After a stop request, ILM halts all further policy actions. The Status API
reflects this:

////
[source,console]
--------------------------------------------------
GET _ilm/status
--------------------------------------------------
// TEST[continued]
////

[source,console-result]
--------------------------------------------------
{
  "operation_mode": "STOPPING"
}
--------------------------------------------------

The ILM service then asynchronously runs each policy to a point
where it is safe to stop. Once ILM verifies that it is safe, it
moves to the `STOPPED` mode.

////
[source,console]
--------------------------------------------------
PUT trigger_ilm_cs_action

GET _ilm/status
--------------------------------------------------
// TEST[continued]
////

[source,console-result]
--------------------------------------------------
{
  "operation_mode": "STOPPED"
}
--------------------------------------------------
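
Because the move from `STOPPING` to `STOPPED` is asynchronous, a client that needs ILM fully stopped has to poll the Status API. The following is a minimal Python sketch of that loop, not an official client. The function names, the `base_url` default, and the timeout values are assumptions made for illustration, and the HTTP helper assumes an unsecured cluster.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_operation_mode(base_url: str = "http://localhost:9200") -> str:
    """Return the operation_mode reported by GET _ilm/status.

    base_url is an assumption; adjust it (and add authentication) for a
    real cluster.
    """
    with urllib.request.urlopen(f"{base_url}/_ilm/status") as resp:
        return json.load(resp)["operation_mode"]


def wait_until_stopped(
    get_mode: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_mode until it returns STOPPED.

    Returns True once STOPPED is seen, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_mode() == "STOPPED":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

After sending `POST _ilm/stop`, a caller could run `wait_until_stopped(fetch_operation_mode)` and begin maintenance only if it returns `True`. Passing the status lookup in as a callable keeps the polling logic independent of any particular HTTP client.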

[float]
=== Starting ILM

To start ILM and continue executing policies, use the <<ilm-start, Start API>>.

[source,console]
--------------------------------------------------
POST _ilm/start
--------------------------------------------------
// TEST[continued]

////
[source,console]
--------------------------------------------------
GET _ilm/status
--------------------------------------------------
// TEST[continued]
////

The Start API asks the ILM service to resume normal operations
immediately, and the operation mode returns to `RUNNING`:

[source,console-result]
--------------------------------------------------
{
  "operation_mode": "RUNNING"
}
--------------------------------------------------
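
For a maintenance window, the stop and start calls are usually paired. One way to make sure ILM is restarted even when the maintenance step fails is to wrap the two requests in a context manager. This is only a sketch: `ilm_paused` and the `request` callable (taking an HTTP method and a path) are hypothetical names, and the sketch does not wait for the `STOPPED` mode before yielding.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ilm_paused(request: Callable[[str, str], dict]) -> Iterator[None]:
    """Send POST _ilm/stop on entry and POST _ilm/start on exit.

    request is a hypothetical callable that sends one HTTP request to the
    cluster and returns the decoded JSON response.
    """
    request("POST", "_ilm/stop")
    try:
        # The caller's maintenance work runs here.
        yield
    finally:
        # Runs even if the maintenance work raised an exception.
        request("POST", "_ilm/start")
```

A caller would write `with ilm_paused(send): ...` around the maintenance work, where `send` is whatever function issues requests to the cluster.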