OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-09 14:35:04 +00:00

Author	SHA1	Message	Date
Tal Levy	bae656dcea	Preserve ILM operation mode when creating new lifecycles (#38134 ) There was a bug where creating a new policy would start the ILM service, even if it was stopped. This change ensures that there is no change to the existing operation mode	2019-02-01 13:16:34 -08:00
Tal Levy	7c738fd241	Skip Shrink when numberOfShards not changed (#37953 ) Previously, ShrinkAction would fail if it was executed on an index that had the same number of shards as the target shrunken number. This PR introduced a new BranchingStep that is used inside of ShrinkAction to branch which step to move to next, depending on the shard values. So no shrink will occur if the shard count is unchanged.	2019-01-30 15:09:17 -08:00
Lee Hinman	427bc7f940	Use ILM for Watcher history deletion (#37443 ) * Use ILM for Watcher history deletion This commit adds an index lifecycle policy for the `.watch-history-*` indices. This policy is automatically used for all new watch history indices. This does not yet remove the automatic cleanup that the monitoring plugin does for the .watch-history indices, and it does not touch the `xpack.watcher.history.cleaner_service.enabled` setting. Relates to #32041	2019-01-23 10:18:08 -07:00
Lee Hinman	647e225698	Retry ILM steps that fail due to SnapshotInProgressException (#37624 ) Some steps, such as steps that delete, close, or freeze an index, may fail due to a currently running snapshot of the index. In those cases, rather than move to the ERROR step, we should retry the step when the snapshot has completed. This change adds an abstract step (`AsyncRetryDuringSnapshotActionStep`) that certain steps (like the ones I mentioned above) can extend that will automatically handle a situation where a snapshot is taking place. When a `SnapshotInProgressException` is received by the listener wrapper, a `ClusterStateObserver` listener is registered to wait until the snapshot has completed, re-running the ILM action when no snapshot is occurring. This also adds integration tests for these scenarios (thanks to @talevy in #37552). Resolves #37541	2019-01-23 09:46:31 -07:00
Martijn van Groningen	a3030c51e2	[ILM] Add unfollow action (#36970 ) This change adds the unfollow action for CCR follower indices. This is needed for the shrink action in case an index is a follower index. This will give the follower index the opportunity to fully catch up with the leader index, pause index following and unfollow the leader index. After this the shrink action can safely perform the ILM shrink. The unfollow action needs to be added to the hot phase and acts as a barrier for going to the next phase (warm or delete phases), so that follower indices are being unfollowed properly before indices are expected to go in read-only mode. This allows the force merge action to execute its steps safely. The unfollow action has the following steps: * `wait-for-indexing-complete` step: waits for the index in question to have the `index.lifecycle.indexing_complete` setting set to `true` * `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks for the index being handled to report that the leader shard global checkpoint is equal to the follower shard global checkpoint. * `pause-follower-index` step: Pauses index following, necessary to unfollow * `close-follower-index` step: Closes the index, necessary to unfollow * `unfollow-follower-index` step: Actually unfollows the index using the CCR Unfollow API * `open-follower-index` step: Reopens the index now that it is a normal index * `wait-for-yellow` step: Waits for primary shards to be allocated after reopening the index to ensure the index is ready for the next step In the case of the last two steps, if the index being handled is a regular index, then the steps act as no-ops. Relates to #34648 Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Gordon Brown <gordon.brown@elastic.co>	2019-01-18 13:05:03 -07:00
Jake Landis	587034dfa7	Add set_priority action to ILM (#37397 ) This commit adds a set_priority action to the hot, warm, and cold phases for an ILM policy. This action sets the `index.priority` on the managed index to allow different priorities between the hot, warm, and cold recoveries. This commit also includes the HLRC and documentation changes. closes #36905	2019-01-17 09:55:36 -06:00
Alexander Reelsen	b2e8437424	Tests: Add ElasticsearchAssertions.awaitLatch method (#36777 ) * Tests: Add ElasticsearchAssertions.awaitLatch method Some tests are using assertTrue(latch.await(...)) in their code. This leads to an assertion error without any error message. This adds a method which has a nicer error message and can be used in tests. * fix forbidden apis * fix spaces	2019-01-10 09:25:36 +01:00
Tal Levy	eaeccd8401	[ILM] Add Freeze Action (#36910 ) This commit adds a new ILM Action for freezing indices in the cold phase. Closes #34630.	2019-01-03 15:00:40 -08:00
Gordon Brown	d39956c65c	Remove `indexing_complete` when removing policy (#36620 ) Leaving `index.lifecycle.indexing_complete` in place when removing the lifecycle policy from an index can cause confusion: if a new policy is later associated with the index, rollover will be silently skipped. Removing that setting when removing the policy from an index makes associating a new policy with the index more involved, but allows ILM to fail loudly, rather than silently skipping operations which the user may assume are being performed. * Adjust order of checks in WaitForRolloverReadyStep This allows ILM to error out properly for indices that have a valid alias, but are not the write index, while still handling `indexing_complete` on old-style aliases and rollover (that is, those which only point to a single index at a time with no explicit write index)	2018-12-19 12:11:30 -07:00
Gordon Brown	6a824322fc	Improve error message for deleting in-use policy (#36457 ) The error message used when attempting to delete a lifecycle policy that is in use previously only included one index which was using the policy. It now includes all indices using that policy.	2018-12-12 14:57:48 -07:00
Gordon Brown	6481f2e380	Add setting to bypass Rollover action (#36235 ) Adds a setting that indicates that an index is done indexing, set by ILM when the Rollover action completes. This indicates that the Rollover action should be skipped in any future invocations, as long as the index is no longer the write index for its alias. This enables 1) an index with a policy that involves the Rollover action to have the policy removed and switched to another one without use of the move-to-step API, and 2) integrations with Beats and CCR.	2018-12-11 08:53:05 -07:00
Tal Levy	ed7afd1a9e	[ILM] TEST: fix long overflow in TimeValueScheduleTests (#36384 ) Closes #35948.	2018-12-10 09:28:17 -08:00
Lee Hinman	8ea999e489	Include stack trace with ILM error in explain output (#35512 ) This changes the ILM explain output to include the stack trace with the error when the index is on an ERROR step. Before: ```json { "indices" : { "foo" : { "index" : "foo", "managed" : true, "policy" : "bad", "lifecycle_date_millis" : 1542131670601, "phase" : "warm", "phase_time_millis" : 1542131676335, "action" : "shrink", "action_time_millis" : 1542131676335, "step" : "ERROR", "step_time_millis" : 1542131676451, "failed_step" : "shrink", "step_info" : { "type" : "illegal_argument_exception", "reason" : "the number of target shards [13] must be less that the number of source shards [2]" }, "phase_execution" : { "policy" : "bad", "phase_definition" : { "min_age" : "5s", "actions" : { "shrink" : { "number_of_shards" : 13 } } }, "version" : 1, "modified_date_in_millis" : 1542131669839 } } } } ``` After: ```json { "indices" : { "foo" : { "index" : "foo", "managed" : true, "policy" : "bad", "lifecycle_date_millis" : 1542131670601, "phase" : "warm", "phase_time_millis" : 1542131676335, "action" : "shrink", "action_time_millis" : 1542131676335, "step" : "ERROR", "step_time_millis" : 1542131676451, "failed_step" : "shrink", "step_info" : { "type" : "illegal_argument_exception", "reason" : "the number of target shards [13] must be less that the number of source shards [2]", "stack_trace" : "java.lang.IllegalArgumentException: the number of target shards [13] must be less that the number of source shards [2]\n\tat org.elasticsearch.cluster.metadata.IndexMetaData.selectShrinkShards(IndexMetaData.java:1509)\n\tat org.elasticsearch.action.admin.indices.shrink.TransportResizeAction.prepareCreateIndexRequest(TransportResizeAction.java:146)\n\tat org.elasticsearch.action.admin.indices.shrink.TransportResizeAction$1.onResponse(TransportResizeAction.java:104)\n\tat org.elasticsearch.action.admin.indices.shrink.TransportResizeAction$1.onResponse(TransportResizeAction.java:101)\n\tat 
org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:60)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onCompletion(TransportBroadcastByNodeAction.java:383)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onNodeResponse(TransportBroadcastByNodeAction.java:352)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:324)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:314)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1117)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1198)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1178)\n\tat org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:417)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:391)\n\tat org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat 
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309)\n\tat org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63)\n\tat org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:714)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:726)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n" }, "phase_execution" : { "policy" : "bad", "phase_definition" : { "min_age" : "5s", "actions" : { "shrink" : { "number_of_shards" : 13 } } }, "version" : 1, "modified_date_in_millis" : 1542131669839 } } } } ``` Resolves #35498	2018-11-14 14:40:05 -07:00
Tal Levy	16cbbab7b7	[ILM] fix retry so it picks up latest policy and executes async action (#35406 ) Before, moving to a failed step would only change the step info to be that of the failed step. This means two things. 1. Async Steps would never be triggered to execute 2. If there are inherent problems with the action definition that can be fixed with a policy update, these changes were not being reflected by the new execution info. Changes now 1. Async steps are executed after the move to the failed step in cluster state 2. the lifecycle execution info's phase definition is updated from the current latest policy definition, even though the index isn't moving to a new phase. Closes #35397.	2018-11-12 11:32:59 -08:00
Gordon Brown	67f9e8fa23	Enforce limitations on ILM policy names (#35104 ) Enforces restrictions on ILM policy names to ensure we don't accept policy names the system can't handle, or may reserve for future use.	2018-11-09 10:11:26 -07:00
Alpar Torok	8a85b2eada	Remove build qualifier from server's Version (#35172 ) With this change, `Version` no longer carries information about the qualifier, but we still need a way to show the "display version" that does have both qualifier and snapshot. This is now stored by the build and read from `META-INF`.	2018-11-07 14:01:05 +02:00
Tal Levy	a85b4f42ca	[ILM] change remove-policy-from-index http method from DELETE to POST (#35268 ) The remove-ilm-from-index API was using the DELETE http method to signify that something is being removed. Although, metadata about ILM for the index is being deleted, no entity/resource is being deleted during this operation. POST is more in line with what this API is actually doing, it is modifying the metadata for an index. As part of this change, `remove` is also appended to the path to be more explicit about its actions.	2018-11-06 07:46:25 -08:00
Alexander Reelsen	409050e8de	Refactor: Remove settings from transport action CTOR (#35208 ) As settings are not used in the transport action constructor, this removes the passing of the settings in all the transport actions.	2018-11-05 13:08:18 +01:00
Gordon Brown	b3da3eae08	[ILM] Fix race condition in test (#35143 ) Previously, testRunStateChangePolicyWithNextStep asserted that the ClusterState before and after running the steps were equal. The test only passed due to a race condition: The latch would be triggered by the step execution, but the cluster state update thread would continue running before committing the change to the cluster state. This allowed the test to read the old cluster state and pass the equality check about 99.99% of the time. The test now waits for the new cluster state to be committed before checking that it is _not_ equal to the old cluster state.	2018-11-02 11:09:48 -06:00
Tal Levy	6b312a500d	uninherit from AbstractComponent in IndexLifecycleService	2018-11-01 10:22:55 -07:00
Tal Levy	5f4b23f8c1	cleanup ILM qa structure (#35110 ) This commit does a few things - moves ILM-specific rest yaml tests into plugin/ilm/qa, and creates special :plugin:ilm:qa:rest module to test them - removes the with-security tests of the yaml tests since they are covered in the rest tests now - moves ChangePolicyforIndexIT into the qa/multi-node project since that test is not currently running in main ilm since integTest is disabled	2018-10-31 11:49:29 -07:00
Tal Levy	a294a7c6b5	fix IndexLifecycleService setting member the settings variable was previously created by the AbstractComponent class inherited by IndexLifecycleService. this is no more.	2018-10-31 11:17:16 -07:00
Tal Levy	5141084048	rename CRUD api REST path prefix _ilm to _ilm/policy (#35056 ) This PR renames the CRUD APIs for ILM: GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy; PUT _ilm/<policy> -> _ilm/policy/<policy>; DELETE _ilm/<policy> -> _ilm/policy/<policy>. Closes #34929.	2018-10-30 16:19:05 -07:00
Gordon Brown	6ecb8ff344	Move to Error step if ClusterState* steps throw (#35069 ) Previously, if ClusterStateActionSteps or ClusterStateWaitSteps threw an exception executing, the exception would only be caught and logged by the generic ClusterStateUpdateTask machinery and the index would become stuck on that step. Now, exceptions thrown in these steps will be caught and the index will be moved to the Error step.	2018-10-30 13:33:32 -06:00
Gordon Brown	f6ac0e4bbc	[ILM] Fix Move To Step API causing ILM to hang (#34618 ) The Move To Step API now checks to see if the target step is an AsyncActionStep, and if so, runs it. Previously, AsyncActionSteps would only be run when they are entered by executing the previous step, so if an AsyncActionStep was entered via the Move To Step API, ILM would never touch that index again.	2018-10-29 11:18:12 -06:00
Tal Levy	f6ce935444	fix `GET _ilm` response with uninitialized ILM metadata (#34881 ) ILM would return a resource-not-found exception when requesting policies while the IndexLifecycleMetaData is not initialized. The behavior here should not be as extreme since it is not the user's fault. This commit changes the behavior so that it succeeds and returns no policies when no policy names are explicitly specified; otherwise it keeps the same behavior of throwing an exception	2018-10-25 16:00:44 -07:00
Tal Levy	41eaa586e8	remove index.lifecycle.skip setting (#34823 ) With the introduction of _ilm/stop and _ilm/start APIs, the use cases where one would only target a select group of indices to start/stop have been reduced. Since there is no strong use-case for skipping specific indices, it is best to remove this functionality and add it back later only if desired, with the hope of keeping things simpler.	2018-10-25 07:27:04 -07:00
Tal Levy	21b9b024c7	fix PolicyStatsTests mutateInstance (#34835 ) through randomization, there is a chance that the mutateInstance for PolicyStatsTests does not actually mutate the original object. This PR aims to fix this	2018-10-25 07:24:54 -07:00
Colin Goodheart-Smithe	e7fddb5c93	Adds usage data for ILM (#33377 ) * Adds usage data for ILM * Adds tests for IndexLifecycleFeatureSetUsage and friends * Adds tests for IndexLifecycleFeatureSet * Fixes merge errors * Add number of indices managed to usage stats Also adds more tests * Addresses Review comments	2018-10-24 18:28:46 +01:00
Colin Goodheart-Smithe	c7fe87e43f	Removes Set Policy API in favour of setting index.lifecycle.name directly (#34304 ) * Removes Set Policy API in favour of setting index.lifecycle.name directly * Reinstates matcher that will still be used * Cleans up code after rebase * Adds test to check changing policy with index settings works * Fixes TimeseriesLifecycleActionsIT after API removal * Fixes docs tests * Fixes case on close where lifecycle service was never created	2018-10-24 16:14:59 +01:00
Lee Hinman	c5a264e77f	Ensure phase_time is set when in the "new" phase (#34280 ) Since there's no transition into the "new" phase it wasn't set until the "hot" phase, so now we initialize it when initializing the policy context. Resolves #34277	2018-10-23 15:20:41 -06:00
Gordon Brown	9cb0bb8b9f	Rework ILM build to separate integration tests (#34617 ) Having integration tests separated from the unit tests in the qa directory works much more smoothly with our testing infrastructure, matches what other plugins do, and tests in a more "real" deployment scenario by having all plugins installed.	2018-10-18 13:33:33 -06:00
Tal Levy	fdb850735a	fix setting version on deleting unmanaged indices with wildcard	2018-10-16 23:39:48 -07:00
Tal Levy	3a555da34d	update version on ILM setting updates	2018-10-16 15:43:10 -07:00
Jack Conradson	80474e138f	HLRC: Add remove index lifecycle policy (#34204 ) This change adds the command RemoveIndexLifecyclePolicy to the HLRC. This uses the new TimeRequest as a base class for RemoveIndexLifecyclePolicyRequest on the client side.	2018-10-16 08:12:06 -07:00
Lee Hinman	9ad2a7fa77	Fix expected next step being incorrect when executing async action (#34313 ) This fixes an issue where an incorrect expected next step is used when checking to execute `AsyncActionStep`s after a cluster state step. It fixes this scenario: - `ExecuteStepsUpdateTask` executes a `ClusterStateWaitStep` or `ClusterStateActionStep` successfully - The next step is also a `ClusterStateWaitStep`, so it loops - The `ClusterStateWaitStep` has a next stepkey (which gets set to the `nextStepKey` in the code) - The `ClusterStateWaitStep` fails the condition, meaning that it will have to wait longer - The `nextStepKey` is now incorrect though, because we did not advance the index's step, and it's not `null` (which is another safe value if there is no step after the `ClusterStateWaitStep`) This fixes the problem by resetting the nextStepKey to null if the condition is not met, since we are not going to advance the step metadata in this case (thereby skipping the `maybeRunAsyncAction` invocation). This commit also tightens up and enhances much of the ILM logging. A lot of logging was missing the index name (making it hard to debug in the presence of multiple indices) and a lot was using the wrong logging level (DEBUG is now actually readable without being a wall of text). Resolves #34297	2018-10-08 11:25:18 -06:00
Gordon Brown	13d89295c8	Provide useful error when a policy doesn't exist (#34206 ) When an index is configured to use a lifecycle policy that does not exist, this will now be noted in the step_info for that policy.	2018-10-04 08:21:55 -06:00
Tal Levy	f10735aa9a	ILM integration test with full policy (#33402 ) - this adds an integration test that runs through a policy with all the actions defined. - adds a test specific to a policy having just a rollover action - bumps the node count to 4	2018-10-03 12:20:43 -06:00
Lee Hinman	388f754a8e	Change step execution flow to be deliberate about type (#34126 ) This commit changes the way that step execution flows. Rather than have any step run when the cluster state changes or the periodic scheduler fires, this now runs the different types of steps at different times. `AsyncWaitStep` is run periodically, i.e., every 10 minutes by default. `ClusterStateActionStep` and `ClusterStateWaitStep` are run every time the cluster state changes. `AsyncActionStep` is now run only after the cluster state has been transitioned into a new step. This prevents these non-idempotent steps from running at the same time. In addition to being run when transitioned into, this is also run when a node is newly elected master (only if set as the current step) so that master failover does not fail to run the step. This also changes the `RolloverStep` from an `AsyncActionStep` to an `AsyncWaitStep` so that it can run periodically. Relates to #29823	2018-10-02 20:02:50 -06:00
Lee Hinman	2d9cb21490	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-01 14:10:09 -06:00
Lee Hinman	a49d59802a	Use more descriptive task names for ILM cluster state updates (#34161 ) Rather than using "ILM" for everything, we should use more descriptive names so debugging from logs is easier to do. Resolves #34118	2018-10-01 13:45:26 -06:00
Gordon Brown	c0bfc07f53	Only make indexes read-only on Shrink and ForceMerge actions (#33907 ) ILM now only forces indices to become read only in the case of Shrink and Force Merge actions, as these are most useful in cases where the index is no longer being written to.	2018-09-25 10:16:01 -06:00
Gordon Brown	90de436e55	Use custom index metadata for ILM state (#33783 ) Using index settings for ILM state is fragile and exposes too much information that doesn't need to be exposed. Using custom index metadata is more resilient and allows more controlled access to internal information. As part of these changes, moves away from using defaults for ILM-related values, in favor of using null values to clearly indicate that the value is not present.	2018-09-19 14:50:48 -06:00
Lee Hinman	27dd25857b	Rebuild step on PolicyStepsRegistry.getStep (#33780 ) This moves away from caching a list of steps for a current phase, instead rebuilding the necessary step from the phase JSON stored in the index's metadata. Relates to #29823	2018-09-18 17:07:57 -06:00
Lee Hinman	11a55d2307	[TEST] Handle an IndexLifecycleService that has not started up	2018-09-18 14:02:09 -06:00
Tal Levy	94a66c556d	add phase execution info to ILM Explain API (#33488 ) adds a section for phase execution to the Explain API. This contains - phase definition - policy name - policy version - modified date	2018-09-17 17:00:00 -07:00
Lee Hinman	1f048d3d3f	Remove unneeded listener on MoveToNextStepUpdateTask (#33725 ) There was a listener that re-runs the policy with the new state when the cluster state is processed by the `MoveToNextStepUpdateTask`. This removes this listener as we will execute the policy through the `IndexLifecycleService` cluster state listener.	2018-09-14 14:38:23 -06:00
Lee Hinman	b7649fce0c	Rename "after" to "minimum_age" in lifecycle definition (#33530 ) This renames the "after" field to better reflect what the meaning is. Supersedes #32624	2018-09-08 21:40:55 -06:00
Lee Hinman	8fa8dea138	Encapsulate Client as class variable for PolicyStepsRegistry (#33529 ) Rather than pass in the client on the `update` step, this makes it passed in to the constructor so it's not required on every update.	2018-09-07 16:32:25 -06:00
Colin Goodheart-Smithe	f83641346f	Adds checks to ensure index metadata exists when we try to use it (#33455 ) * Adds checks to ensure index metadata exists when we try to use it * Fixes failing test	2018-09-07 13:06:51 +01:00
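
Note on commit 388f754a8e (step execution flow): the rule of "which step type runs on which trigger" described in that message can be sketched as a small lookup table. This is an illustrative Python sketch only, not the Elasticsearch implementation (which is Java). The `Trigger` enum, the `RUNS_ON` table, and `should_run` are hypothetical names introduced here; only the four step type names come from the commit message. The sketch also assumes the caller has already checked that the step is the index's current step, which the commit message requires for the master-election case.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical names for the events mentioned in the commit message."""
    PERIODIC = "periodic"                    # scheduler fires (every 10 minutes by default)
    CLUSTER_STATE_CHANGE = "cluster_state"   # any cluster state update
    STEP_TRANSITION = "step_transition"      # index was just moved into this step
    MASTER_ELECTED = "master_elected"        # this node was newly elected master


# Which triggers cause each step type to run, per the commit description.
RUNS_ON = {
    "AsyncWaitStep": {Trigger.PERIODIC},
    "ClusterStateActionStep": {Trigger.CLUSTER_STATE_CHANGE},
    "ClusterStateWaitStep": {Trigger.CLUSTER_STATE_CHANGE},
    # Non-idempotent: run once on entry, and again after master failover.
    "AsyncActionStep": {Trigger.STEP_TRANSITION, Trigger.MASTER_ELECTED},
}


def should_run(step_type: str, trigger: Trigger) -> bool:
    """Return True if a step of this type should execute for this trigger."""
    return trigger in RUNS_ON.get(step_type, set())
```

Keeping the mapping in one table makes the intent of the change visible: an `AsyncActionStep` never appears under the periodic or cluster-state triggers, so it cannot be started twice by unrelated events.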

74 Commits
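
Note on commit 7c738fd241 (skip Shrink when the shard count is unchanged): the branching idea in that message can be sketched as a step that picks one of two follow-up steps from a predicate on the index's current shard count. This is a rough Python illustration under stated assumptions, not the actual `BranchingStep` class. The field names, the `shrink_branch` helper, and the step keys `"branch-check"`, `"shrink"`, and `"skip-shrink"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BranchingStep:
    """Hypothetical stand-in for a step that chooses which step runs next."""
    key: str
    next_key_on_false: str
    next_key_on_true: str
    predicate: Callable[[int], bool]  # receives the index's current shard count

    def next_step_key(self, current_shards: int) -> str:
        # Evaluate the predicate and branch; no other work happens in this step.
        if self.predicate(current_shards):
            return self.next_key_on_true
        return self.next_key_on_false


def shrink_branch(target_shards: int) -> BranchingStep:
    """Build a branch that bypasses the shrink when the shard count already matches."""
    return BranchingStep(
        key="branch-check",               # hypothetical step key
        next_key_on_false="shrink",       # counts differ: perform the shrink
        next_key_on_true="skip-shrink",   # counts equal: nothing to shrink
        predicate=lambda current: current == target_shards,
    )
```

For example, `shrink_branch(1).next_step_key(5)` selects the shrink path, while `shrink_branch(1).next_step_key(1)` selects the path that skips it, which matches the behavior the commit message describes.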