[role="xpack"]
[testenv="basic"]
[[index-lifecycle-error-handling]]
== Index lifecycle error handling

While index lifecycle management (ILM) executes the policy for an index, a
step can encounter an error. When this happens, ILM moves the index into an
"error" step. This halts further execution of the policy and gives an
administrator the chance to address any issues with the policy, index, or
cluster.

For example, imagine that a user has created the following policy:

[source,js]
--------------------------------------------------
PUT _ilm/policy/shrink-the-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST

This policy waits until the index is at least 5 days old, and then shrinks
the index to 4 shards.

Now imagine that a user creates a new index "myindex" with two primary shards,
telling it to use the policy they have created:

[source,js]
--------------------------------------------------
PUT /myindex
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-the-index"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

After five days have passed, ILM will attempt to shrink this index from 2
shards to 4. This is invalid, because the shrink action cannot increase the
number of shards. When this occurs, ILM moves the index to the "error" step.
Once an index is in this step, information about the reason for the error can
be retrieved from the <<ilm-explain-lifecycle,ILM Explain API>>:

[source,js]
--------------------------------------------------
GET /myindex/_ilm/explain
--------------------------------------------------
// CONSOLE
// TEST[continued]

This returns the following information:

[source,js]
--------------------------------------------------
{
  "indices" : {
    "myindex" : {
      "index" : "myindex",
      "managed" : true, <1>
      "policy" : "shrink-the-index", <2>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d", <3>
      "phase" : "warm", <4>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink", <5>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR", <6>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink", <7>
      "step_info" : {
        "type" : "illegal_argument_exception", <8>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]" <9>
      },
      "phase_execution" : {
        "policy" : "shrink-the-index",
        "phase_definition" : { <10>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TESTRESPONSE[skip:no way to know if we will get this response immediately]
<1> this index is managed by ILM
<2> the policy in question, in this case, "shrink-the-index"
<3> the current age of the index
<4> what phase the index is currently in
<5> what action the index is currently on
<6> what step the index is currently on; because there is an error, the index is in the "ERROR" step
<7> the name of the step that failed to execute, in this case "shrink"
<8> the error class that occurred during this step
<9> the error message that occurred during the execution failure
<10> the definition of the phase (in this case, the "warm" phase) that the index is currently in

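When many indices are managed, it can be useful to pick the failing ones out of
an explain response programmatically. The following is a minimal Python sketch
that works on an already-parsed response body. The function name
`find_ilm_errors` and the shape of its return value are assumptions made for
illustration, not part of any Elasticsearch client; only the field names are
taken from the response shown above.

```python
from typing import Any, Dict, List


def find_ilm_errors(explain_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary per index that is currently in the ERROR step."""
    errors = []
    for name, info in explain_response.get("indices", {}).items():
        # Unmanaged indices carry no step information to inspect.
        if not info.get("managed"):
            continue
        if info.get("step") != "ERROR":
            continue
        step_info = info.get("step_info", {})
        errors.append({
            "index": name,
            "policy": info.get("policy"),
            "phase": info.get("phase"),
            "action": info.get("action"),
            "failed_step": info.get("failed_step"),
            "type": step_info.get("type"),
            "reason": step_info.get("reason"),
        })
    return errors


# Trimmed copy of the explain response for "myindex".
sample = {
    "indices": {
        "myindex": {
            "index": "myindex",
            "managed": True,
            "policy": "shrink-the-index",
            "phase": "warm",
            "action": "shrink",
            "step": "ERROR",
            "failed_step": "shrink",
            "step_info": {
                "type": "illegal_argument_exception",
                "reason": "the number of target shards [4] must be less "
                          "that the number of source shards [2]",
            },
        }
    }
}

for err in find_ilm_errors(sample):
    print(f"{err['index']}: {err['failed_step']} failed ({err['type']})")
```

How the response is fetched is left open here; any HTTP client that returns the
parsed JSON body of `GET /myindex/_ilm/explain` would do.
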
The index here has been moved to the error step because the shrink definition
in the policy specifies an invalid number of shards. To fix this, update the
existing policy so that the shrink action targets one shard instead of four:

[source,js]
--------------------------------------------------
PUT _ilm/policy/shrink-the-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

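A shrink target can be sanity-checked before a policy is written or updated.
The Python sketch below is a hypothetical helper, not something ILM exposes. It
encodes the rule described on this page (shrink cannot increase the number of
shards) and one further rule taken from the shrink index API documentation
rather than from this page: the target shard count must be a factor of the
source shard count. Treat it as an approximation of the server-side checks, not
a replacement for them.

```python
from typing import Optional


def shrink_target_problem(source_shards: int, target_shards: int) -> Optional[str]:
    """Return a description of why the target is unusable, or None if it passes."""
    if target_shards < 1:
        return "target shard count must be at least 1"
    if target_shards > source_shards:
        # The case from the example: 2 source shards, 4 target shards.
        return "shrink cannot increase the number of shards"
    if source_shards % target_shards != 0:
        # Assumption based on the shrink index API documentation.
        return "target shard count must be a factor of the source shard count"
    return None


print(shrink_target_problem(2, 4))  # the original, failing policy
print(shrink_target_problem(2, 1))  # the corrected policy
```

For the index in this example, the original target of 4 is rejected and the
corrected target of 1 passes.
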
[float]
=== Retrying failed index lifecycle management steps

Once the underlying issue that caused an index to move to the error step has
been corrected, index lifecycle management must be told to retry the step to see
if it can progress further. This is accomplished by invoking the retry API:

[source,js]
--------------------------------------------------
POST /myindex/_ilm/retry
--------------------------------------------------
// CONSOLE
// TEST[skip:we can't be sure the index is ready to be retried at this point]

Once this has been issued, index lifecycle management will asynchronously pick
up the step that is in a failed state and attempt to re-run it. The
<<ilm-explain-lifecycle,ILM Explain API>> can again be used to monitor the status of
re-running the step.

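Because the retry is asynchronous, monitoring usually means polling the explain
output for a while. The following Python sketch shows one way to do that. The
function `wait_for_retry`, its parameters, and the `fetch_explain` callable are
all hypothetical: `fetch_explain` stands in for whatever code issues
`GET /myindex/_ilm/explain` and returns the parsed response body.

```python
import time
from typing import Any, Callable, Dict


def wait_for_retry(
    fetch_explain: Callable[[], Dict[str, Any]],
    index: str,
    attempts: int = 10,
    interval_seconds: float = 5.0,
) -> str:
    """Poll explain output until the index leaves the ERROR step.

    Returns the last observed step name. The result is still "ERROR" if the
    index never left the error step within the allowed attempts.
    """
    step = "ERROR"
    for attempt in range(attempts):
        info = fetch_explain()["indices"][index]
        step = info.get("step", "")
        if step != "ERROR":
            break
        # Sleep between polls, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(interval_seconds)
    return step


# Canned responses in place of real HTTP calls.
responses = iter([
    {"indices": {"myindex": {"step": "ERROR"}}},
    {"indices": {"myindex": {"step": "shrink"}}},
])
print(wait_for_retry(lambda: next(responses), "myindex", interval_seconds=0))
```

If the function returns "ERROR", the step may have failed again, and the
`step_info` section of the explain output is the place to look for the reason.
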