[role="xpack"]
[testenv="basic"]
[[index-lifecycle-error-handling]]
== Resolve lifecycle policy execution errors

When {ilm-init} executes a lifecycle policy, it's possible for errors to occur
while performing the necessary index operations for a step.
When this happens, {ilm-init} moves the index to an `ERROR` step.
If {ilm-init} cannot resolve the error automatically, execution is halted
until you resolve the underlying issues with the policy, index, or cluster.

For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
is at least five days old:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

Nothing prevents you from applying the `shrink-index` policy to a new
index that has only two shards:

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]

After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails
and {ilm-init} moves `my-index-000001` to the `ERROR` step.
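
A mismatch like this can be caught before a policy is applied to an index.
The following Python sketch is illustrative only and is not part of {ilm-init}:
`check_shrink_target` is a hypothetical helper that encodes the rule described above
(a shrink cannot increase the shard count), together with the shrink API's requirement
that the target shard count be a factor of the source shard count.

```python
from typing import List


def check_shrink_target(source_shards: int, target_shards: int) -> List[str]:
    """Return the reasons a shrink would be rejected (empty list if none found).

    Hypothetical helper: it only mirrors the shard-count rules and does not
    contact a cluster or check any other shrink prerequisites.
    """
    problems = []
    if target_shards < 1:
        problems.append("target shard count must be at least 1")
    elif target_shards > source_shards:
        # This is the case from the example: 2 source shards, 4 target shards.
        problems.append(
            f"target shards [{target_shards}] exceed source shards [{source_shards}]"
        )
    elif source_shards % target_shards != 0:
        problems.append(
            f"target shards [{target_shards}] are not a factor of "
            f"source shards [{source_shards}]"
        )
    return problems


# The policy/index combination from the example is flagged; a target of 1 is not.
print(check_shrink_target(source_shards=2, target_shards=4))
print(check_shrink_target(source_shards=2, target_shards=1))
```

An empty list means that none of the shard-count rules checked here are violated.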

You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
what went wrong:

[source,console]
--------------------------------------------------
GET /my-index-000001/_ilm/explain
--------------------------------------------------
// TEST[continued]

The request returns the following information:

[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index", <1>
      "lifecycle_date_millis" : 1541717265865,
      "age" : "5.1d", <2>
      "phase" : "warm", <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink", <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR", <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink", <6>
      "step_info" : {
        "type" : "illegal_argument_exception", <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : { <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]

<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error.
<8> The definition of the current phase from the `shrink-index` policy
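
When many indices are managed, it can help to pull only the failure details out of an explain response.
The following Python sketch shows one way to do that, assuming the response has already been fetched
and parsed into a dictionary. `find_failed_indices` is a hypothetical helper, and `sample_response`
is a trimmed copy of the response above.

```python
from typing import Dict, List


def find_failed_indices(explain_response: Dict) -> List[Dict]:
    """Collect the error details for every index that is in the ERROR step."""
    failures = []
    for name, info in explain_response.get("indices", {}).items():
        if info.get("step") != "ERROR":
            continue  # this index is not halted on an error
        step_info = info.get("step_info", {})
        failures.append({
            "index": name,
            "policy": info.get("policy"),
            "failed_step": info.get("failed_step"),
            "error_type": step_info.get("type"),
            "reason": step_info.get("reason"),
        })
    return failures


# Trimmed copy of the explain response for my-index-000001.
sample_response = {
    "indices": {
        "my-index-000001": {
            "index": "my-index-000001",
            "managed": True,
            "policy": "shrink-index",
            "phase": "warm",
            "action": "shrink",
            "step": "ERROR",
            "failed_step": "shrink",
            "step_info": {
                "type": "illegal_argument_exception",
                "reason": "the number of target shards [4] must be less "
                          "that the number of source shards [2]",
            },
        }
    }
}

for failure in find_failed_indices(sample_response):
    print(f"{failure['index']}: step '{failure['failed_step']}' "
          f"failed with {failure['error_type']}")
```

The helper reads only fields that appear in the response above (`step`, `failed_step`, `policy`, and `step_info`).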

To resolve the error, you could update the policy to shrink the index to a single shard after five days:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

[discrete]
=== Retrying failed lifecycle policy steps

Once you fix the problem that put an index in the `ERROR` step,
you might need to explicitly tell {ilm-init} to retry the step:

[source,console]
--------------------------------------------------
POST /my-index-000001/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]

{ilm-init} subsequently attempts to re-run the step that failed.
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.
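
Monitoring after a retry can be scripted as a simple polling loop.
The Python sketch below is one possible approach, not a prescribed one. `wait_until_not_error` is a hypothetical helper,
and `fetch_explain` is an assumed caller-supplied function that issues `GET /<index>/_ilm/explain`
and returns the parsed response. How that HTTP request is made is left to the caller.

```python
import time
from typing import Callable, Dict


def wait_until_not_error(
    fetch_explain: Callable[[], Dict],
    index: str,
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> bool:
    """Poll the explain response until the index leaves the ERROR step.

    Returns True as soon as the index reports any step other than ERROR,
    and False if it is still in ERROR after the final attempt.
    """
    for attempt in range(attempts):
        indices = fetch_explain().get("indices", {})
        if index not in indices:
            raise KeyError(f"index [{index}] is missing from the explain response")
        if indices[index].get("step") != "ERROR":
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before asking again
    return False


# Stand-in for a real cluster call: the first response is still in ERROR,
# the second shows the index on a regular step again.
canned = iter([
    {"indices": {"my-index-000001": {"step": "ERROR"}}},
    {"indices": {"my-index-000001": {"step": "shrink"}}},
])
recovered = wait_until_not_error(
    lambda: next(canned), "my-index-000001", attempts=3, delay_seconds=0
)
print(recovered)
```

If the helper returns `False`, the index is still in the `ERROR` step and the `step_info` field of the explain response describes the most recent failure.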