Commit Graph

154 Commits

Author SHA1 Message Date
Gordon Brown f6ac0e4bbc
[ILM] Fix Move To Step API causing ILM to hang (#34618)
The Move To Step API now checks to see if the target step is an
AsyncActionStep, and if so, runs it.

Previously, AsyncActionSteps would only be run when they are entered by
executing the previous step, so if an AsyncActionStep was entered via
the Move To Step API, ILM would never touch that index again.
2018-10-29 11:18:12 -06:00
Tal Levy f6ce935444
fix `GET _ilm` response with uninitialized ILM metadata (#34881)
ILM would return a resource-not-found exception when requesting policies
while the IndexLifecycleMetaData is not initialized. The behavior here
should not be as extreme since it is not the user's fault.

This commit changes the behavior so that it succeeds and returns no policies
when no policy names are explicitly specified; otherwise it keeps the same behavior
of throwing an exception.
2018-10-25 16:00:44 -07:00
Tal Levy 41eaa586e8
remove index.lifecycle.skip setting (#34823)
With the introduction of _ilm/stop and _ilm/start APIs, the
use cases where one would only target a select group
of indices to start/stop have been reduced. Since there is no
strong use-case for skipping specific indices, it is best to
remove this functionality and only add it back if later desired, in the
hope of keeping things simpler.
2018-10-25 07:27:04 -07:00
Tal Levy 21b9b024c7
fix PolicyStatsTests mutateInstance (#34835)
through randomization, there is a chance that the mutateInstance
for PolicyStatsTests does not actually mutate the original object.
This PR aims to fix this
2018-10-25 07:24:54 -07:00
Colin Goodheart-Smithe 0b26f8b14c
Fixes NPE in multi node qa test 2018-10-25 10:45:30 +01:00
Colin Goodheart-Smithe e7fddb5c93
Adds usage data for ILM (#33377)
* Adds usage data for ILM

* Adds tests for IndexLifecycleFeatureSetUsage and friends

* Adds tests for IndexLifecycleFeatureSet

* Fixes merge errors

* Add number of indices managed to usage stats

Also adds more tests

* Addresses Review comments
2018-10-24 18:28:46 +01:00
Colin Goodheart-Smithe c7fe87e43f
Removes Set Policy API in favour of setting index.lifecycle.name directly (#34304)
* Removes Set Policy API in favour of setting index.lifecycle.name
directly

* Reinstates matcher that will still be used

* Cleans up code after rebase

* Adds test to check changing policy with index settings works

* Fixes TimeseriesLifecycleActionsIT after API removal

* Fixes docs tests

* Fixes case on close where lifecycle service was never created
2018-10-24 16:14:59 +01:00
Lee Hinman c5a264e77f
Ensure phase_time is set when in the "new" phase (#34280)
Since there's no transition into the "new" phase it wasn't set until the "hot"
phase, so now we initialize it when initializing the policy context.

Resolves #34277
2018-10-23 15:20:41 -06:00
Gordon Brown 9cb0bb8b9f
Rework ILM build to separate integration tests (#34617)
Having integration tests separated from the unit tests in the qa
directory works much more smoothly with our testing infrastructure,
matches what other plugins do, and tests in a more "real" deployment
scenario by having all plugins installed.
2018-10-18 13:33:33 -06:00
Tal Levy fdb850735a fix setting version on deleting unmanaged indices with wildcard 2018-10-16 23:39:48 -07:00
Tal Levy 3a555da34d update version on ILM setting updates 2018-10-16 15:43:10 -07:00
Jack Conradson 80474e138f
HLRC: Add remove index lifecycle policy (#34204)
This change adds the command RemoveIndexLifecyclePolicy to the HLRC. This uses the 
new TimeRequest as a base class for RemoveIndexLifecyclePolicyRequest on the client side.
2018-10-16 08:12:06 -07:00
Lee Hinman 9ad2a7fa77
Fix expected next step being incorrect when executing async action (#34313)
This fixes an issue where an incorrect expected next step is used when checking
to execute `AsyncActionStep`s after a cluster state step.

It fixes this scenario:

- `ExecuteStepsUpdateTask` executes a `ClusterStateWaitStep` or
  `ClusterStateActionStep` successfully
- The next step is also a `ClusterStateWaitStep`, so it loops
- The `ClusterStateWaitStep` has a next stepkey (which gets set to the
  `nextStepKey` in the code)
- The `ClusterStateWaitStep` fails the condition, meaning that it will have to
  wait longer
- The `nextStepKey` is now incorrect though, because we did not advance the
  index's step, and it's not `null` (which is another safe value if there is no
  step after the `ClusterStateWaitStep`)

This fixes the problem by resetting the nextStepKey to null if the condition is
not met, since we are not going to advance the step metadata in this
case (thereby skipping the `maybeRunAsyncAction` invocation).

This commit also tightens up and enhances much of the ILM logging. A lot of
logging was missing the index name (making it hard to debug in the presence of
multiple indices) and a lot was using the wrong logging level (DEBUG is now
actually readable without being a wall of text).

Resolves #34297
2018-10-08 11:25:18 -06:00
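The scenario fixed by the commit above can be sketched in a few lines. This is a simplified Python model, not the actual Java `ExecuteStepsUpdateTask`; the function name, the `steps` dictionary layout, and the `condition_met` callback are all hypothetical and exist only for illustration.

```python
# Simplified model of looping over cluster-state steps (all names hypothetical).
# steps maps a step key to (kind, next_key).
CLUSTER_STATE_KINDS = ("cluster_state_action", "cluster_state_wait")


def execute_cluster_state_steps(steps, start_key, condition_met):
    """Return (current_key, next_step_key) after running cluster-state steps."""
    current = start_key
    next_step_key = None
    while current is not None:
        kind, nxt = steps[current]
        if kind not in CLUSTER_STATE_KINDS:
            # Not a cluster-state step: stop looping and let the caller decide.
            break
        next_step_key = nxt
        if kind == "cluster_state_wait" and not condition_met(current):
            # The index stays on this step, so the recorded next step key
            # would be stale. Reset it to None, as the commit describes.
            next_step_key = None
            break
        current = nxt
    return current, next_step_key


steps = {
    "a": ("cluster_state_action", "b"),
    "b": ("cluster_state_wait", "c"),
    "c": ("async_action", None),
}
waiting = execute_cluster_state_steps(steps, "a", lambda key: False)
advanced = execute_cluster_state_steps(steps, "a", lambda key: True)
```

In this model a caller would only consider running an async action when the returned next step key is not `None`, which mirrors skipping the `maybeRunAsyncAction` invocation when the wait condition is not met.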
Gordon Brown 13d89295c8
Provide useful error when a policy doesn't exist (#34206)
When an index is configured to use a lifecycle policy that does not
exist, this will now be noted in the step_info for that policy.
2018-10-04 08:21:55 -06:00
Tal Levy f10735aa9a ILM integration test with full policy (#33402)
- this adds an integration test that runs through a policy
with all the actions defined.
- adds a test specific to a policy having just a rollover action
- bumps the node count to 4
2018-10-03 12:20:43 -06:00
Lee Hinman 388f754a8e
Change step execution flow to be deliberate about type (#34126)
This commit changes the way that step execution flows. Rather than have any step
run when the cluster state changes or the periodic scheduler fires, this now
runs the different types of steps at different times.

- `AsyncWaitStep` is run periodically, i.e., every 10 minutes by default.
- `ClusterStateActionStep` and `ClusterStateWaitStep` are run every time the
  cluster state changes.
- `AsyncActionStep` is now run only after the cluster state has been transitioned
  into a new step. This prevents these non-idempotent steps from running at the
  same time. In addition to being run when transitioned into, this is also run
  when a node is newly elected master (only if set as the current step) so that
  master failover does not fail to run the step.

This also changes the `RolloverStep` from an `AsyncActionStep` to an
`AsyncWaitStep` so that it can run periodically.

Relates to #29823
2018-10-02 20:02:50 -06:00
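The type-based execution flow described in the commit above can be summarized as a lookup table. This is a minimal Python sketch of the rules as the commit message states them, not the real ILM implementation; the enum names, `RUNS_ON`, and `should_run` are hypothetical.

```python
from enum import Enum, auto


class StepType(Enum):
    ASYNC_WAIT = auto()
    CLUSTER_STATE_ACTION = auto()
    CLUSTER_STATE_WAIT = auto()
    ASYNC_ACTION = auto()


class Trigger(Enum):
    PERIODIC = auto()              # the scheduler fires (every 10 minutes by default)
    CLUSTER_STATE_CHANGE = auto()  # any cluster state update
    STEP_TRANSITION = auto()       # the index has just moved into a new step
    MASTER_ELECTED = auto()        # this node was newly elected master


# Which step types each trigger may run, following the commit message.
RUNS_ON = {
    Trigger.PERIODIC: {StepType.ASYNC_WAIT},
    Trigger.CLUSTER_STATE_CHANGE: {
        StepType.CLUSTER_STATE_ACTION,
        StepType.CLUSTER_STATE_WAIT,
    },
    Trigger.STEP_TRANSITION: {StepType.ASYNC_ACTION},
    Trigger.MASTER_ELECTED: {StepType.ASYNC_ACTION},
}


def should_run(step_type, trigger):
    """Return True if a step of this type is eligible to run for this trigger."""
    return step_type in RUNS_ON[trigger]
```

Keeping `ASYNC_ACTION` out of the periodic and cluster-state-change rows is what prevents a non-idempotent step from being started twice in this model.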
Lee Hinman 2d9cb21490 Merge remote-tracking branch 'origin/master' into index-lifecycle 2018-10-01 14:10:09 -06:00
Lee Hinman a49d59802a
Use more descriptive task names for ILM cluster state updates (#34161)
Rather than using "ILM" for everything, we should use more descriptive names so
debugging from logs is easier to do.

Resolves #34118
2018-10-01 13:45:26 -06:00
Gordon Brown c0bfc07f53
Only make indexes read-only on Shrink and ForceMerge actions (#33907)
ILM now only forces indices to become read only in the case of Shrink
and Force Merge actions, as these are most useful in cases where the
index is no longer being written to.
2018-09-25 10:16:01 -06:00
Gordon Brown 90de436e55
Use custom index metadata for ILM state (#33783)
Using index settings for ILM state is fragile and exposes too much
information that doesn't need to be exposed. Using custom index metadata
is more resilient and allows more controlled access to internal
information.

As part of these changes, moves away from using defaults for ILM-related
values, in favor of using null values to clearly indicate that the value is not
present.
2018-09-19 14:50:48 -06:00
Lee Hinman 27dd25857b
Rebuild step on PolicyStepsRegistry.getStep (#33780)
This moves away from caching a list of steps for a current phase, instead
rebuilding the necessary step from the phase JSON stored in the index's
metadata.

Relates to #29823
2018-09-18 17:07:57 -06:00
Lee Hinman 11a55d2307 [TEST] Handle an IndexLifecycleService that has not started up 2018-09-18 14:02:09 -06:00
Tal Levy 94a66c556d
add phase execution info to ILM Explain API (#33488)
adds a section for phase execution to the Explain API.

This contains

- phase definition
- policy name
- policy version
- modified date
2018-09-17 17:00:00 -07:00
Lee Hinman 1f048d3d3f
Remove unneeded listener on MoveToNextStepUpdateTask (#33725)
There was a listener that re-runs the policy with the new state when the cluster
state is processed by the `MoveToNextStepUpdateTask`. This removes this listener
as we will execute the policy through the `IndexLifecycleService` cluster state
listener.
2018-09-14 14:38:23 -06:00
Lee Hinman b7649fce0c
Rename "after" to "minimum_age" in lifecycle definition (#33530)
This renames the "after" field to better reflect what the meaning is.

Supersedes #32624
2018-09-08 21:40:55 -06:00
Lee Hinman 8fa8dea138
Encapsulate Client as class variable for PolicyStepsRegistry (#33529)
Rather than pass in the client on the `update` step, this makes it passed in to
the constructor so it's not required on every update.
2018-09-07 16:32:25 -06:00
Colin Goodheart-Smithe f83641346f
Adds checks to ensure index metadata exists when we try to use it (#33455)
* Adds checks to ensure index metadata exists when we try to use it

* Fixes failing test
2018-09-07 13:06:51 +01:00
Tal Levy 21bb4720a2
add notion of version and modified_date to LifecyclePolicyMetadata (#33450)
It is useful to keep track of which version of a policy is currently
being executed by a specific index. For management purposes, it would
also be useful to know at which time the latest version was inserted
so that an audit trail is left for reconciling changes happening in ILM.
2018-09-06 13:32:24 -07:00
Lee Hinman b335487ca6 Fix qa build.gradle so gradle assemble works correctly
There is a new way to disable assembling from certain subdirectories
2018-09-06 11:22:27 -06:00
Lee Hinman 96d515e3f5
Replace PhaseAfterStep with PhaseCompleteStep (#33398)
This removes `PhaseAfterStep` in favor of a new `PhaseCompleteStep`. This step
is only a marker that the `LifecyclePolicyRunner` needs to halt until the time
indicated for entering the next phase.

This also fixes a bug where phase times were encapsulated into the policy
instead of dynamically adjusting to policy changes.

Supersedes #33140
Relates to #29823
2018-09-05 16:37:45 -06:00
Tal Levy 023e1bf889 fix test 2018-09-05 13:21:50 -07:00
Tal Levy 0f8bc10bcf
add new phase definition setting used for retrieving phase to execute (#33289)
Since policies can be updated independent of execution plans for the current
phase being executed, it would be nice to know what the phase that is executing
looks like in JSON. This PR does just that, while also using that index setting
to reconstruct the phase steps to execute (for consistency)
2018-09-05 11:35:20 -07:00
Colin Goodheart-Smithe ada3e710f6
Renames XPackField.INDEX_LIFECYCLE value to "ilm" (#33270)
This brings the name in line with everywhere else and means that name
seen on the feature usage and `GET _xpack` APIs will match the plugin
name.

This change also removes `IndexLifecycle.NAME` since this was only used
to name the scheduler job and that can be done using
`XPackField.INDEX_LIFECYCLE` instead
2018-08-31 08:29:44 +01:00
Tal Levy cfe0acc83c
separate out IndexLifecycleService cluster-state change concerns (#33033)
Changes to the IndexLifecycleService were necessary since relying on
ClusterChangedEvents for a full picture of the cluster state's settings was
a mistake. It is not necessary that these events hold all settings, especially ones
that are set at node start-up.

Changes to main include:

- move poll interval updates to a SettingsUpdateConsumer
- move scheduler start/stop to a localMasterNodeListener
- keep triggerPolicies in clusterChanged

Changes to tests include:

- removal of some low-level state transition checks in the Service that no longer make sense
  since the changes are unconditionally specified in the appropriate listeners
- add integration tests for poll-interval updates
- add integration test assertions for verifying scheduler is started up correctly
2018-08-27 14:25:27 -07:00
Lee Hinman 52aa738d84
Remove canSetPolicy, canUpdatePolicy, and canRemovePolicy (#33037)
* Remove canSetPolicy, canUpdatePolicy and canRemovePolicy

Since we now store a pre-compiled list of steps for an index's phase in the
`PolicyStepsRegistry`, we no longer need to worry about updating policies as any
updates won't affect the current phase, and will only be picked up on phase
transitions.

This also removes the tests that test these methods

Relates to #29823
2018-08-23 15:37:02 -06:00
Gordon Brown 650f12af1e Duplicate Protocol classes into Core
This is needed as with recent changes to master (see #32952), protocol
is no longer accessible from core, so these classes need to be
duplicated in both places.
2018-08-23 13:50:15 -06:00
Gordon Brown 191bd7c031 Fix Gradle configuration
This change was made to master, this commit brings it over to
index-lifecycle.

See #32409
2018-08-23 12:08:16 -06:00
Colin Goodheart-Smithe fd88ab8c75
Fixes shrink action to remove single node allocation (#33091)
This change fixes the shrink action so when the shrink is performed we
remove the single node allocation from the shard allocation filtering
settings. Without this fix replicas cannot be allocated after we have
performed the shrink and we cannot make progress with the rest of the
shrink action.

This change also fixes a bug in the explain API where the master node
timeout was being set to null if it wasn't provided instead of using
its default value, causing an NPE
2018-08-23 16:36:57 +01:00
Tal Levy dfc70ddcc7
rename pre-phase/pre-action to new/init (#32996)
this will keep things more consistent with the initial PhaseAfterStep, which has a phase name of `new`
2018-08-22 11:41:09 -07:00
Tal Levy 55cb08a352
move ESLoggerFactory usage to LogManager (#33043)
Work done in #32513 has deprecated the old constructor in favor of
log4j2's LogManager
2018-08-21 17:59:32 -07:00
Tal Levy 96869b253c
conditionally update CS only if StepInfo changes (#33004)
If we are waiting on a condition to be met, and the reason
it is not completed is unchanged, we find ourselves updating
cluster state over and over again and kicking off the ILM listeners
to re-check. This is overkill and can generate way too many
cluster state updates
2018-08-21 12:29:41 -07:00
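The idea in the commit above, skipping the cluster state update when the step info is unchanged, can be sketched as follows. This is a minimal Python illustration only; `maybe_update_step_info` and the dictionary-based index state are hypothetical stand-ins, not the actual ILM classes.

```python
def maybe_update_step_info(index_state, new_step_info):
    """Return (state, changed); produce a new state only if step info differs."""
    if index_state.get("step_info") == new_step_info:
        # Same reason as before: return the same object so no update is published.
        return index_state, False
    updated = dict(index_state)
    updated["step_info"] = new_step_info
    return updated, True


state = {"index": "logs-000001", "step_info": {"message": "waiting for replicas"}}
same, changed_same = maybe_update_step_info(state, {"message": "waiting for replicas"})
new, changed_new = maybe_update_step_info(state, {"message": "waiting for merge"})
```

Returning the original object untouched when nothing changed is the part that avoids repeated updates and the listener re-checks they would trigger.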
Tal Levy 6780ab9d5c
add user authentication test for ILM (#32826) 2018-08-21 12:27:53 -07:00
Colin Goodheart-Smithe 75cae4560c
fix compile error after merge 2018-08-21 14:47:54 +01:00
Tal Levy 5ce082cd0a
copy LifecyclePolicy to protocol.xpack (#32915)
This is the final PR for copying over the necessary components for
clients to parse/render LifecyclePolicy. Changes include:

- move of named-x-content server objects away from client
- move validation into the client copy of LifecyclePolicy
- move LifecycleAction into an interface with `getName`
2018-08-20 08:32:22 -07:00
Lee Hinman 77016add19
Store phase steps for index in PolicyStepsRegistry (#32926)
* Store phase steps for index in PolicyStepsRegistry

This changes the way that steps are retrieved from `PolicyStepsRegistry` to
store the steps on a per-index basis (in memory for now, though that will change
in subsequent PRs). These steps are rebuilt as the index changes phases.

This also fixes a bug where an action with the same phase and name was not being
considered changed (and thus updated) in the compiled steps list. These are now
correctly considered as "upsert" diffs.

Relates to #29823
2018-08-17 22:45:15 -06:00
Tal Levy 33522d4fb4
update cluster-state task execution to halt on new phase (#32886)
As we migrate to a per-phase execution model, we need to prepare our cluster-state-step execution model to be aligned. It is the case that the final iteration into the next "currentStep" from the next phase would not be available in the registry yet. This change exits the execution loop early so as not to jump into executing the next phase's steps before the registry is properly updated.
2018-08-15 19:30:21 -07:00
Tal Levy 4baa721459
remove `type` config from LifecyclePolicy JSON (#32660)
Since there is only one production policy, Timeseries, there
is no reason to expose the `type` argument to the user.
2018-08-15 14:47:22 -07:00
Tal Levy b218b1c68d
introduce random timeseries lifecycle policy util method (#32852)
It is useful to have a random TimeseriesLifecycleType-backed LifecyclePolicy
for testing. This PR exposes a helper method to create one and use it for serialization tests
in LifecyclePolicyTests
2018-08-14 12:57:43 -07:00
Tal Levy 41e6d98af8
move qa yaml tests to inside the ILM plugin (#32693)
The qa tests with security haven't actually gone as far as testing security roles yet, so this is a start in the hopes of both bringing the tests into the ilm plugin
2018-08-09 16:09:20 -07:00
Colin Goodheart-Smithe 8750d622fc
Adds REST client support for starting and stopping ILM (#32609)
* Adds REST client support for PutOperationMode in ILM

* Corrects licence headers

* iter

* add request converter test

* Fixes tests

* Creates start and stop actions for controlling ILM operation

* Addresses review comments
2018-08-09 20:39:06 +01:00
Colin Goodheart-Smithe 5ff4f9347f
Adds explain lifecycle API to the Rest Client (#32606) 2018-08-09 10:18:45 +01:00
Tal Levy 2fc3f1d04c
move replicas action functionality into AllocateAction (#32523)
Since replica counts and allocation rules are set separately, it is not always clear how many replicas are to be allocated in the allocate action. Moving the replicas action to occur at the same time as the allocate action resolves this confusion, which could otherwise end in an undesired state. This means that the ReplicasAction is removed, and a new optional replicas parameter is added to AllocateAction.
2018-08-08 11:43:29 -07:00
Tal Levy 0ad252d502
change default indices.lifecycle.poll_interval to something sane (#32521)
This was originally set to a few seconds while prototyping things.
This interval is for the scheduled trigger of policies. Policies
have this extra trigger beyond just on cluster-state changes because
cluster-state changes may not be happening in a cluster for
whatever reason, and we need to continue making progress. Updating
this value to be larger is reasonable since not all operations
are expected to be completed in the span of seconds, but instead in
minutes and hours. 10 minutes is sane.
2018-08-06 14:41:27 -07:00
Jason Tedor 5de236e1e7
Rename ILM, ILM endpoints and drop _xpack (#32564)
This commit does the following:
 - renames index-lifecycle plugin to ilm
 - modifies the endpoints to ilm instead of index_lifecycle
 - drops _xpack from the endpoints
 - drops a few duplicate endpoints
2018-08-02 13:05:11 -04:00