[role="xpack"]
[testenv="basic"]
[[index-lifecycle-error-handling]]
== Resolve lifecycle policy execution errors

When {ilm-init} executes a lifecycle policy, it's possible for errors to occur
while performing the necessary index operations for a step. 
When this happens, {ilm-init} moves the index to an `ERROR` step. 
If {ilm-init] cannot resolve the error automatically, execution is halted  
until you resolve the underlying issues with the policy, index, or cluster.

For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
is at least five days old: 

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

There is nothing that prevents you from applying the `shrink-index` policy to a new
index that has only two shards:

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]

After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails 
and {ilm-init} moves `my-index-000001` to the `ERROR` step. 

You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
what went wrong: 

[source,console]
--------------------------------------------------
GET /my-index-000001/_ilm/explain
--------------------------------------------------
// TEST[continued]

Which returns the following information:

[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,                         
      "policy" : "shrink-index",                <1>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",                            <2>
      "phase" : "warm",                         <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",                      <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",                         <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",                 <6>
      "step_info" : {
        "type" : "illegal_argument_exception",  <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {                  <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]

<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error.
<8> The definition of the current phase from the `shrink-index` policy

To resolve this, you could update the policy to shrink the index to a single shard after 5 days:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

[discrete]
=== Retrying failed lifecycle policy steps

Once you fix the problem that put an index in the `ERROR` step, 
you might need to explicitly tell {ilm-init} to retry the step:

[source,console]
--------------------------------------------------
POST /my-index-000001/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]

{ilm-init} subsequently attempts to re-run the step that failed. 
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.