[role="xpack"]
[testenv="basic"]
[[index-lifecycle-error-handling]]
== Resolve lifecycle policy execution errors

When {ilm-init} executes a lifecycle policy, errors can occur
while it performs the index operations required by a step.
When this happens, {ilm-init} moves the index to an `ERROR` step.
If {ilm-init} cannot resolve the error automatically,
execution is halted
until you resolve the underlying issues with the policy, index, or cluster.

For example, you might have a `shrink-index` policy that shrinks an index to four shards
once it is at least five days old:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

Nothing prevents you from applying the `shrink-index` policy to a new
index that has only two shards:

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]

After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails
and {ilm-init} moves `my-index-000001` to the `ERROR` step.
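One way to catch this mismatch before attaching a policy is a small pre-flight check on the shard counts.
The following is a minimal sketch, not part of {ilm-init}: the helper `shrink_target_is_valid` is hypothetical,
and it assumes the rule from the shrink index API documentation that the target primary shard count
must not exceed the source count and must divide it evenly.

```python
def shrink_target_is_valid(source_shards: int, target_shards: int) -> bool:
    """Hypothetical pre-flight check for a shrink target.

    Assumes the shrink rule: the target primary shard count must be at
    least 1, must not exceed the source count, and must be a factor of it.
    """
    if source_shards < 1 or target_shards < 1:
        return False
    if target_shards > source_shards:
        # Shrinking can never increase the number of primary shards.
        return False
    # The target must divide the source shard count evenly.
    return source_shards % target_shards == 0


# The scenario above: a 2-shard index managed by a policy that shrinks to 4.
print(shrink_target_is_valid(2, 4))  # False
# Shrinking the same index to a single shard passes the check.
print(shrink_target_is_valid(2, 1))  # True
```

A check like this only covers the shard arithmetic; other shrink prerequisites enforced by the cluster are not modeled here.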
You can use the <> to get information about what went wrong:

[source,console]
--------------------------------------------------
GET /my-index-000001/_ilm/explain
--------------------------------------------------
// TEST[continued]

The response includes the following information:

[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index",                <1>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",                            <2>
      "phase" : "warm",                         <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",                      <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",                         <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",                 <6>
      "step_info" : {
        "type" : "illegal_argument_exception",  <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {                  <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]

<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error.
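When many indices are managed, it can help to summarize the explain output programmatically.
The following is a minimal sketch that assumes you have already fetched the explain response as JSON;
the helper `failed_indices` is hypothetical, and it reads only the `step`, `failed_step`, and `step_info`
fields shown in the response above.

```python
import json

# A trimmed copy of the explain response shown above; only the fields this
# sketch reads are kept.
EXPLAIN_RESPONSE = """
{
  "indices": {
    "my-index-000001": {
      "index": "my-index-000001",
      "managed": true,
      "policy": "shrink-index",
      "phase": "warm",
      "action": "shrink",
      "step": "ERROR",
      "failed_step": "shrink",
      "step_info": {
        "type": "illegal_argument_exception",
        "reason": "the number of target shards [4] must be less that the number of source shards [2]"
      }
    }
  }
}
"""


def failed_indices(explain: dict) -> list:
    """Hypothetical helper: (index, failed_step, error type, reason) per index in ERROR."""
    failures = []
    for name, info in explain.get("indices", {}).items():
        # Indices without a "step" field (for example, unmanaged ones) are skipped.
        if info.get("step") != "ERROR":
            continue
        step_info = info.get("step_info", {})
        failures.append((
            name,
            info.get("failed_step", "?"),
            step_info.get("type", "?"),
            step_info.get("reason", "?"),
        ))
    return failures


for index, step, err_type, reason in failed_indices(json.loads(EXPLAIN_RESPONSE)):
    print(f"{index}: step [{step}] failed with {err_type}: {reason}")
```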
<8> The definition of the current phase from the `shrink-index` policy

To resolve this, you could update the policy to shrink the index to a single shard after 5 days:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

[discrete]
=== Retrying failed lifecycle policy steps

Once you fix the problem that put an index in the `ERROR` step,
you might need to explicitly tell {ilm-init} to retry the step:

[source,console]
--------------------------------------------------
POST /my-index-000001/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]

{ilm-init} subsequently attempts to re-run the step that failed.
You can use the <> to monitor the progress.
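Because a retry is only meaningful for an index that is sitting in the `ERROR` step,
you might want to derive the retry requests from an explain response rather than typing them by hand.
The following is a minimal sketch: the helper `retry_requests` and the sample response are hypothetical,
and the code only formats request lines in the same style as the console snippet above; it does not send anything to a cluster.
Fix the underlying problem first, as described above, before issuing any of these requests.

```python
def retry_requests(explain: dict) -> list:
    """Hypothetical helper: one retry request line per index in the ERROR step."""
    return [
        f"POST /{name}/_ilm/retry"
        for name, info in sorted(explain.get("indices", {}).items())
        if info.get("step") == "ERROR"
    ]


# A hypothetical explain response covering two indices: one failed, one healthy.
explain = {
    "indices": {
        "my-index-000001": {"managed": True, "step": "ERROR", "failed_step": "shrink"},
        "my-index-000002": {"managed": True, "phase": "hot", "step": "check-rollover-ready"},
    }
}
for line in retry_requests(explain):
    print(line)  # POST /my-index-000001/_ilm/retry
```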