OpenSearch

Commit Graph

Author	SHA1	Message	Date
Luca Cavanna	a7046e001c	Remove support for maxRetryTimeout from low-level REST client (#38085 ) We have had various reports of problems caused by the maxRetryTimeout setting in the low-level REST client. This setting was initially added in an attempt to stop requests from being retried if the request had already taken longer than the provided timeout. The implementation was problematic, though: the timeout could also expire during the first request attempt (see #31834), would leave the request executing after expiration, causing memory leaks (see #33342), and did not take into account the http client's internal queuing (see #25951). Given all these issues, this custom timeout mechanism seems to give little benefit while causing a lot of harm. We should instead rely on the connect and socket timeouts exposed by the underlying http client and accept that a request can overall take longer than the configured timeout, which is the case even with a single retry anyway. This commit removes the `maxRetryTimeout` setting and all of its usages.	2019-02-06 08:43:47 +01:00
Julie Tibshirani	3ce7d2c9b6	Make sure to reject mappings with type _doc when include_type_name is false. (#38270 ) `CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when deserializing index creation requests, accidentally accepts mappings that are nested twice under the type key (as described in the bug report #38266). This in turn causes us to be too lenient in parsing typeless mappings. In particular, we accept the following index creation request, even though it should not contain the type key `_doc`: `PUT index?include_type_name=false { "mappings": { "_doc": { "properties": { ... } } } }`. There is a similar issue for both 'put templates' and 'put mappings' requests as well. This PR makes the minimal changes to detect and reject these typed mappings in requests. It does not address #38266 generally, or attempt a larger refactor around types in these server-side requests, as I think this should be done at a later time.	2019-02-05 10:52:32 -08:00
Gordon Brown	b866417650	Mute testCannotShrinkLeaderIndex (#38374 ) This test should not pass until CCR finishes integrating shard history retention leases. It currently sometimes passes (which is a bug in the test), but cannot pass reliably until the linked issue is resolved.	2019-02-04 16:06:19 -07:00
Gordon Brown	7a1e89c7ed	Ensure ILM policies run safely on leader indices (#38140 ) Adds a Step to the Shrink and Delete actions which prevents those actions from running on a leader index - all follower indices must first unfollow the leader index before these actions can run. This prevents the loss of history before follower indices are ready, which might otherwise result in the loss of data.	2019-02-01 20:46:12 -07:00
Tal Levy	7c738fd241	Skip Shrink when numberOfShards not changed (#37953 ) Previously, ShrinkAction would fail if it was executed on an index that already had the target number of shards. This PR introduces a new BranchingStep that is used inside ShrinkAction to choose which step to move to next, depending on the shard counts. As a result, no shrink occurs if the shard count is unchanged.	2019-01-30 15:09:17 -08:00
Tim Brooks	00ace369af	Use `CcrRepository` to init follower index (#35719 ) This commit modifies the put follow index action to use a CcrRepository when creating a follower index. It routes the logic through the snapshot/restore process. A wait_for_active_shards parameter can be used to configure how long to wait before returning the response.	2019-01-29 11:47:29 -07:00
Gordon Brown	49bd8715ff	Inject Unfollow before Rollover and Shrink (#37625 ) We inject an Unfollow action before Shrink because the Shrink action cannot be safely used on a following index, as it may not be fully caught up with the leader index before the "original" following index is deleted and replaced with a non-following Shrunken index. The Unfollow action will verify that 1) the index is marked as "complete", and 2) all operations up to this point have been replicated from the leader to the follower before explicitly disconnecting the follower from the leader. Injecting an Unfollow action before the Rollover action is done mainly as a convenience: this allows users to use the same lifecycle policy on both the leader and follower clusters without having to explicitly modify the policy to unfollow the index, while doing what we expect users to want in most cases.	2019-01-28 14:09:12 -07:00
Lee Hinman	647e225698	Retry ILM steps that fail due to SnapshotInProgressException (#37624 ) Some steps, such as steps that delete, close, or freeze an index, may fail due to a currently running snapshot of the index. In those cases, rather than move to the ERROR step, we should retry the step when the snapshot has completed. This change adds an abstract step (`AsyncRetryDuringSnapshotActionStep`) that certain steps (like the ones I mentioned above) can extend that will automatically handle a situation where a snapshot is taking place. When a `SnapshotInProgressException` is received by the listener wrapper, a `ClusterStateObserver` listener is registered to wait until the snapshot has completed, re-running the ILM action when no snapshot is occurring. This also adds integration tests for these scenarios (thanks to @talevy in #37552). Resolves #37541	2019-01-23 09:46:31 -07:00
Ryan Ernst	9a34b20233	Simplify integ test distribution types (#37618 ) The integ tests currently use the raw zip project name as the distribution type. This commit simplifies this specification to be "default" or "oss". Whether zip or tar is used should be an internal implementation detail of the integ test setup, which can (in the future) be platform specific.	2019-01-21 12:37:17 -08:00
Martijn van Groningen	a3030c51e2	[ILM] Add unfollow action (#36970 ) This change adds the unfollow action for CCR follower indices. This is needed for the shrink action in case an index is a follower index. This will give the follower index the opportunity to fully catch up with the leader index, pause index following and unfollow the leader index. After this the shrink action can safely perform the ilm shrink. The unfollow action needs to be added to the hot phase and acts as a barrier for going to the next phase (warm or delete phases), so that follower indices are unfollowed properly before indices are expected to go into read-only mode. This allows the force merge action to execute its steps safely. The unfollow action has seven steps: * `wait-for-indexing-complete` step: waits for the index in question to have the `index.lifecycle.indexing_complete` setting set to `true` * `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks for the index being handled to report that the leader shard global checkpoint is equal to the follower shard global checkpoint. * `pause-follower-index` step: Pauses index following, necessary to unfollow * `close-follower-index` step: Closes the index, necessary to unfollow * `unfollow-follower-index` step: Actually unfollows the index using the CCR Unfollow API * `open-follower-index` step: Reopens the index now that it is a normal index * `wait-for-yellow` step: Waits for primary shards to be allocated after reopening the index to ensure the index is ready for the next step In the case of the last two steps, if the index being handled is a regular index then the steps act as a no-op. Relates to #34648 Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Gordon Brown <gordon.brown@elastic.co>	2019-01-18 13:05:03 -07:00
Jake Landis	587034dfa7	Add set_priority action to ILM (#37397 ) This commit adds a set_priority action to the hot, warm, and cold phases for an ILM policy. This action sets the `index.priority` on the managed index to allow different priorities between the hot, warm, and cold recoveries. This commit also includes the HLRC and documentation changes. closes #36905	2019-01-17 09:55:36 -06:00
Tal Levy	eaeccd8401	[ILM] Add Freeze Action (#36910 ) This commit adds a new ILM Action for freezing indices in the cold phase. Closes #34630.	2019-01-03 15:00:40 -08:00
Tal Levy	f6c1e3f14f	[ILM][TEST] increase assertBusy timeout (#36864 ) The testFullPolicy and testMoveToRolloverStep tests are very important, but they sometimes time out before the shrink occurs, exceeding the default 10-second wait. This commit increases one of the assertBusy timeouts to 20 seconds.	2018-12-20 08:55:02 -08:00
Gordon Brown	d39956c65c	Remove `indexing_complete` when removing policy (#36620 ) Leaving `index.lifecycle.indexing_complete` in place when removing the lifecycle policy from an index can cause confusion: if a new policy is then associated with the index, rollover will be silently skipped. Removing that setting when removing the policy from an index makes associating a new policy with the index more involved, but allows ILM to fail loudly rather than silently skipping operations which the user may assume are being performed. * Adjust order of checks in WaitForRolloverReadyStep This allows ILM to error out properly for indices that have a valid alias but are not the write index, while still handling `indexing_complete` on old-style aliases and rollover (that is, those which only point to a single index at a time with no explicit write index)	2018-12-19 12:11:30 -07:00
Alpar Torok	e9ef5bdce8	Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311 ) - Create a separate unitTest task instead of replacing Gradle's built-in one - convert all configuration to use the new task - the built-in task is now disabled	2018-12-19 08:25:20 +02:00
Tal Levy	06dfd4aadc	[TEST] fix flaky ILM tests (#36612 ) * WaitForRolloverReadyStepTests#mutateInstance sometimes did not mutate the instance correctly * 40_explain_lifecycle#"Test new phase still has phase_time" is not really a necessary integration test. In addition to this, it is flaky due to the asynchronous nature of ILM metadata population	2018-12-14 11:36:18 -08:00
Tal Levy	e3cf642299	Add ILM-specific security privileges (#36493 ) * add read_ilm cluster privilege Although managing ILM policies is best done using the "manage" cluster privilege, it is useful to have read-only views. * adds `read_ilm` cluster privilege for viewing policies and status * adds Explain API to the `view_index_metadata` index privilege * add manage_ilm privileges	2018-12-13 08:11:33 -08:00
Gordon Brown	6a824322fc	Improve error message for deleting in-use policy (#36457 ) The error message used when attempting to delete a lifecycle policy that is in use previously only included one index which was using the policy. It now includes all indices using that policy.	2018-12-12 14:57:48 -07:00
Gordon Brown	6481f2e380	Add setting to bypass Rollover action (#36235 ) Adds a setting that indicates that an index is done indexing, set by ILM when the Rollover action completes. This indicates that the Rollover action should be skipped in any future invocations, as long as the index is no longer the write index for its alias. This enables 1) an index with a policy that involves the Rollover action to have the policy removed and switched to another one without use of the move-to-step API, and 2) integrations with Beats and CCR.	2018-12-11 08:53:05 -07:00
Alpar Torok	8659af68e0	Auto skip license headers on no source (#35640 ) * Unmute BuildExamplePluginsIT * Skip licenseHeaders when there are no sources	2018-11-20 13:02:33 +02:00
Gordon Brown	cce9648f9d	Align RolloverStep's name with other step names (#35655 ) RolloverStep previously had a name of "attempt_rollover", which was inconsistent with all other step names due to its use of an underscore instead of a dash.	2018-11-16 17:42:48 -07:00
Gordon Brown	3883e9bf4c	Split RolloverStep into Wait and Action steps (#35524 ) RolloverAction will now periodically check the rollover conditions using the Rollover API with the dry_run option as an AsyncWaitStep, then run the rollover itself by calling the Rollover API with no conditions, which will always roll over, as an AsyncActionStep. This will resolve race condition issues in policies using RolloverAction.	2018-11-15 17:11:31 -07:00
Tal Levy	16cbbab7b7	[ILM] fix retry so it picks up latest policy and executes async action (#35406 ) Before, moving to a failed step would only change the step info to be that of the failed step. This means two things. 1. Async Steps would never be triggered to execute 2. If there are inherent problems with the action definition that can be fixed with a policy update, these changes were not being reflected by the new execution info. Changes now 1. Async steps are executed after the move to the failed step in cluster state 2. the lifecycle execution info's phase definition is updated from the current latest policy definition, even though the index isn't moving to a new phase. Closes #35397.	2018-11-12 11:32:59 -08:00
Gordon Brown	67f9e8fa23	Enforce limitations on ILM policy names (#35104 ) Enforces restrictions on ILM policy names to ensure we don't accept policy names the system can't handle, or may reserve for future use.	2018-11-09 10:11:26 -07:00
Tal Levy	2bf843e768	[TEST] Mute ChangePolicyForIndexIT#testChangePolicyForIndex	2018-11-06 06:09:49 -08:00
Nik Everett	f72ef9b5fd	Build: Pull "skip assemble on qa" to common build (#35214 ) Pull all of the logic that we use to skip the `assemble` and `dependenciesInfo` tasks on `qa` projects into one spot in our root build file.	2018-11-05 16:16:00 -05:00
Gordon Brown	0fbb8a16bc	Skip Rollover step if next index already exists (#35168 ) If the Rollover step would fail due to the next index in sequence already existing, just skip to the next step instead of going to the Error step. This prevents spurious `ResourceAlreadyExistsException`s created by simultaneous RolloverStep executions from causing ILM to error out unnecessarily.	2018-11-05 09:20:43 -07:00
Lee Hinman	3473217563	Remove Joda usage from ILM (#35220 ) This commit removes the Joda time usage from ILM and the HLRC components of ILM. It also fixes an issue where using the `?human=true` flag could have caused the parser not to work. These millisecond fields now follow the standard we use elsewhere in the code, with additional fields added iff the `human` flag is specified. This is a breaking change for ILM, but since ILM has not yet been released, no compatibility shim is needed.	2018-11-05 08:17:15 -07:00
Tal Levy	f8e23f6400	update ILM integ test cluster poll interval to 1s (#35113 )	2018-10-31 17:09:35 -07:00
Tal Levy	5f4b23f8c1	cleanup ILM qa structure (#35110 ) This commit does a few things - moves ILM-specific rest yaml tests into plugin/ilm/qa, and creates a special :plugin:ilm:qa:rest module to test them - removes the with-security variants of the yaml tests since they are covered in the rest tests now - moves ChangePolicyForIndexIT into the qa/multi-node project since that test is not currently running in main ilm, as integTest is disabled	2018-10-31 11:49:29 -07:00
Tal Levy	5141084048	rename CRUD api REST path prefix _ilm to _ilm/policy (#35056 ) This PR renames the CRUD APIs for ILM: GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy; PUT _ilm/<policy> -> _ilm/policy/<policy>; DELETE _ilm/<policy> -> _ilm/policy/<policy>. closes #34929.	2018-10-30 16:19:05 -07:00
Gordon Brown	f6ac0e4bbc	[ILM] Fix Move To Step API causing ILM to hang (#34618 ) The Move To Step API now checks to see if the target step is an AsyncActionStep, and if so, runs it. Previously, AsyncActionSteps would only be run when they are entered by executing the previous step, so if an AsyncActionStep was entered via the Move To Step API, ILM would never touch that index again.	2018-10-29 11:18:12 -06:00
Colin Goodheart-Smithe	0b26f8b14c	Fixes NPE in multi node qa test	2018-10-25 10:45:30 +01:00
Colin Goodheart-Smithe	c7fe87e43f	Removes Set Policy API in favour of setting index.lifecycle.name directly (#34304 ) * Removes Set Policy API in favour of setting index.lifecycle.name directly * Reinstates matcher that will still be used * Cleans up code after rebase * Adds test to check that changing policy with index settings works * Fixes TimeseriesLifecycleActionsIT after API removal * Fixes docs tests * Fixes case on close where lifecycle service was never created	2018-10-24 16:14:59 +01:00
Gordon Brown	9cb0bb8b9f	Rework ILM build to separate integration tests (#34617 ) Having integration tests separated from the unit tests in the qa directory works much more smoothly with our testing infrastructure, matches what other plugins do, and tests in a more "real" deployment scenario by having all plugins installed.	2018-10-18 13:33:33 -06:00
Gordon Brown	c0bfc07f53	Only make indexes read-only on Shrink and ForceMerge actions (#33907 ) ILM now only forces indices to become read only in the case of Shrink and Force Merge actions, as these are most useful in cases where the index is no longer being written to.	2018-09-25 10:16:01 -06:00
Lee Hinman	b335487ca6	Fix qa build.gradle so gradle assemble works correctly. There is a new way to disable assembling from certain subdirectories	2018-09-06 11:22:27 -06:00
Tal Levy	6780ab9d5c	add user authentication test for ILM (#32826 )	2018-08-21 12:27:53 -07:00
Tal Levy	41e6d98af8	move qa yaml tests to inside the ILM plugin (#32693 ) The qa tests with security haven't actually gone as far as testing security roles yet, so this is a start in the hopes of both bringing the tests into the ilm plugin	2018-08-09 16:09:20 -07:00


139 Commits