[role="xpack"]
[testenv="basic"]
[[index-lifecycle-error-handling]]
== Resolve lifecycle policy execution errors
When {ilm-init} executes a lifecycle policy, errors can occur
while it performs the index operations that a step requires.
When this happens, {ilm-init} moves the index to an `ERROR` step.
If {ilm-init} cannot resolve the error automatically, execution is halted
until you resolve the underlying issues with the policy, index, or cluster.

For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
is at least five days old:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

Nothing prevents you from applying the `shrink-index` policy to a new
index that has only two shards:

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]
After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails
and {ilm-init} moves `my-index-000001` to the `ERROR` step.
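
A pre-flight check along the following lines could catch this mismatch before a policy is attached to an index.
This is a minimal sketch in Python and is not part of {ilm-init}: the function name `check_shrink_target` is hypothetical,
and the factor rule is an assumption based on the shrink index API's requirement that the target shard count be a factor of the source shard count.

[source,python]
--------------------------------------------------
from typing import Optional


def check_shrink_target(source_shards: int, target_shards: int) -> Optional[str]:
    """Return a description of the problem, or None if the shrink looks valid."""
    if target_shards < 1:
        return "target shard count must be at least 1"
    # The shrink action can only reduce the number of shards.
    if target_shards > source_shards:
        return (
            f"cannot shrink from {source_shards} to {target_shards} shards: "
            "the target exceeds the source"
        )
    # Assumption: the target must divide the source evenly.
    if source_shards % target_shards != 0:
        return f"{target_shards} is not a factor of {source_shards}"
    return None


# The failing combination from this example, then a valid one.
print(check_shrink_target(2, 4))
print(check_shrink_target(2, 1))
--------------------------------------------------

The first call reports a problem because four target shards exceed the two source shards; the second returns `None`.
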
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
what went wrong:
[source,console]
--------------------------------------------------
GET /my-index-000001/_ilm/explain
--------------------------------------------------
// TEST[continued]
The request returns the following information:
[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index", <1>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d", <2>
      "phase" : "warm", <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink", <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR", <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink", <6>
      "step_info" : {
        "type" : "illegal_argument_exception", <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : { <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]
<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error.
<8> The definition of the current phase from the `shrink-index` policy
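
When many indices are managed, it can help to pull the failure details out of the explain response programmatically.
The following Python sketch assumes the response has already been parsed into a dictionary;
the function name `find_failed_indices` and the shape of the summary it returns are hypothetical, chosen only for illustration.

[source,python]
--------------------------------------------------
from typing import Any, Dict, List


def find_failed_indices(explain: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect one summary per index whose current step is ERROR."""
    failures = []
    for name, info in explain.get("indices", {}).items():
        if info.get("step") != "ERROR":
            continue
        step_info = info.get("step_info", {})
        failures.append({
            "index": name,
            "policy": info.get("policy"),
            "phase": info.get("phase"),
            "failed_step": info.get("failed_step"),
            "error_type": step_info.get("type"),
            "reason": step_info.get("reason"),
        })
    return failures


# Trimmed copy of the explain response for my-index-000001.
sample = {
    "indices": {
        "my-index-000001": {
            "policy": "shrink-index",
            "phase": "warm",
            "step": "ERROR",
            "failed_step": "shrink",
            "step_info": {
                "type": "illegal_argument_exception",
                "reason": "the number of target shards [4] must be less "
                          "that the number of source shards [2]",
            },
        }
    }
}

for failure in find_failed_indices(sample):
    print(f"{failure['index']}: step {failure['failed_step']} failed "
          f"({failure['error_type']})")
--------------------------------------------------

Indices that are not in the `ERROR` step are skipped, so the same function can be applied to an explain response that covers several indices.
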

To resolve this, you could update the policy to shrink the index to a single shard after five days:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
[discrete]
=== Retrying failed lifecycle policy steps
Once you fix the problem that put an index in the `ERROR` step,
you might need to explicitly tell {ilm-init} to retry the step:
[source,console]
--------------------------------------------------
POST /my-index-000001/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]
{ilm-init} subsequently attempts to re-run the step that failed.
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.
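
If several indices are stuck in the `ERROR` step, the retry requests could be driven from a parsed explain response.
The following Python sketch is one possible approach, not a built-in {ilm-init} feature:
`retry_failed_indices` and the caller-supplied `send` function are hypothetical, and `my-index-000002` is an invented index used only to show that healthy indices are left alone.

[source,python]
--------------------------------------------------
from typing import Any, Callable, Dict, List


def retry_failed_indices(
    explain: Dict[str, Any],
    send: Callable[[str, str], Any],
) -> List[str]:
    """Issue a retry request for every index in the ERROR step.

    `send` is a hypothetical caller-supplied function that performs an
    HTTP request given a method and a path.
    """
    retried = []
    for name, info in explain.get("indices", {}).items():
        if info.get("step") == "ERROR":
            send("POST", f"/{name}/_ilm/retry")
            retried.append(name)
    return retried


# Record the requests instead of sending them, to show what would be issued.
calls = []
explain = {
    "indices": {
        "my-index-000001": {"step": "ERROR", "failed_step": "shrink"},
        "my-index-000002": {"step": "complete"},
    }
}
retried = retry_failed_indices(
    explain, lambda method, path: calls.append((method, path))
)
print(retried)
print(calls)
--------------------------------------------------

Passing the HTTP call in as a function keeps the sketch independent of any particular client library; in practice `send` would wrap whichever client you already use.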