OpenSearch

Commit Graph

Author	SHA1	Message	Date
Andrei Dan	4506b37ed5	ILM: Skip rolling indexes that are already rolled (#47324 ) (#47592 ) An index with an ILM policy that has a rollover action in one of the phases was rolled over when the ILM conditions dictated, regardless of whether it had already been rolled over (e.g. manually after modifying an index template in order to force the creation of a new index that uses the new mappings). This changes this behaviour and has ILM check that the index it's about to roll has not been rolled over in the meantime. (cherry picked from commit 37d6106feeb9f9369519117c88a9e7e30f3ac797) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-10-07 07:47:47 +01:00
Lee Hinman	2e3eb4b24e	Add API to execute SLM retention on-demand (#47405 ) (#47463 ) * Add API to execute SLM retention on-demand (#47405) This is a backport of #47405 This commit adds the `/_slm/_execute_retention` API endpoint. This endpoint kicks off SLM retention and then returns immediately. This in particular allows us to run retention without scheduling it (for entirely manual invocation) or perform a one-off cleanup. This commit also includes HLRC for the new API, and fixes an issue in SLMSnapshotBlockingIntegTests where retention invoked prior to the test completing could resurrect an index the internal test cluster cleanup had already deleted. Resolves #46508 Relates to #43663	2019-10-02 12:29:04 -06:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expected templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants. The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Andrei Dan	4c909438dd	Fix OriginationDate parsing tests. (#47170 ) (#47200 ) Drop the usage of `SimpleDateFormat` and use the `DateFormatter` instead (cherry picked from commit 7cf509a7a11ecf6c40c44c18e8f03b8e81fcd1c2) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-09-27 13:16:45 +01:00
Gordon Brown	7ac647c365	Add support for POST requests to SLM Execute API (#47061 ) This commit adds support for POST requests to the SLM `_execute` API, because POST is a more appropriate HTTP verb for this action as it is not idempotent. The docs are also changed to favor POST over PUT, although PUT is not removed or officially deprecated.	2019-09-25 16:15:10 -06:00
Andrei Dan	27520cac3b	ILM: parse origination date from index name (#46755 ) (#47124 ) * ILM: parse origination date from index name (#46755) Introduce the `index.lifecycle.parse_origination_date` setting that indicates if the origination date should be parsed from the index name. If set to true, an index which doesn't match the expected format (namely `indexName-{dateFormat}-optional_digits`) will fail before being created. The origination date will be parsed when initialising a lifecycle for an index and it will be set as the `index.lifecycle.origination_date` for that index. A user-set value for `index.lifecycle.origination_date` will always override a possible parsable date from the index name. (cherry picked from commit c363d27f0210733dad0c307d54fa224a92ddb569) Signed-off-by: Andrei Dan <andrei.dan@elastic.co> * Drop usage of Map.of to be java 8 compliant	2019-09-25 21:44:16 +01:00
Lee Hinman	a267df30fa	Wait for snapshot completion in SLM snapshot invocation (#47051 ) * Wait for snapshot completion in SLM snapshot invocation This changes the snapshots internally invoked by SLM to wait for completion. This allows us to capture more snapshotting failure scenarios. For example, previously a snapshot would be created and then registered as a "success", however, the snapshot may have been aborted, or it may have had a subset of its shards fail. These cases are now handled by inspecting the response to the `CreateSnapshotRequest` and ensuring that there are no failures. If any failures are present, the history store now stores the action as a failure instead of a success. Relates to #38461 and #43663	2019-09-25 14:25:22 -06:00
Gordon Brown	a46eef9634	Change SLM stats format (#46991 ) Using arrays of objects with embedded IDs is preferred for new APIs over using entity IDs as JSON keys. This commit changes the SLM stats API to use the preferred format.	2019-09-25 11:32:08 -06:00
Lee Hinman	5ca37db60c	Mute SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress Relates to #46508	2019-09-23 17:08:09 -06:00
Lee Hinman	b85468d6ea	Add node setting for disabling SLM (#46794 ) (#46796 ) This adds the `xpack.slm.enabled` setting to allow disabling of SLM functionality as well as its HTTP API endpoints. Relates to #38461	2019-09-17 17:39:41 -06:00
Andrei Dan	c57cca98b2	[ILM] Add date setting to calculate index age (#46561 ) (#46697 ) * [ILM] Add date setting to calculate index age Add the `index.lifecycle.origination_date` setting to allow users to configure a custom date that'll be used to calculate the index age for the phase transitions (as opposed to the default index creation date). This could be useful for users to create an index with an "older" origination date when indexing old data. Relates to #42449. * [ILM] Don't override creation date on policy init The initial approach we took was to override the lifecycle creation date if the `index.lifecycle.origination_date` setting was set. This had the disadvantage of the user not being able to update the `origination_date` anymore once set. This commit changes the way we make use of the `index.lifecycle.origination_date` setting by checking its value when we calculate the index age (i.e. at "read time") and, in case it's not set, defaulting to the index creation date. * Make origination date setting index scope dynamic * Document origination date setting in ilm settings (cherry picked from commit d5bd2bb77ee28c1978ab6679f941d7c02e389d32) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-09-16 08:50:28 +01:00
Lee Hinman	52d7b03b49	Wait for no snapshots in state in testRetentionWhileSnapshotIn… (#46573 ) This commit adds a wait/check for all running snapshots to be cleared before taking another snapshot. The previous snapshot was successful but had not yet been cleared from the cluster state, so the second snapshot failed due to a `ConcurrentSnapshotException`. Resolves #46508	2019-09-11 09:47:01 -06:00
Lee Hinman	cdc3a260af	Add retention to Snapshot Lifecycle Management (backport of #4… (#46506 ) * Add retention to Snapshot Lifecycle Management (#46407) This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria. An example policy would look like: ``` PUT /_slm/policy/snapshot-every-day { "schedule": "0 30 2 * * ?", "name": "<production-snap-{now/d}>", "repository": "my-s3-repository", "config": { "indices": ["foo-", "important"] }, // Newly configured retention options "retention": { // Snapshots should be deleted after 14 days "expire_after": "14d", // Keep a maximum of thirty snapshots "max_count": 30, // Keep a minimum of the four most recent snapshots "min_count": 4 } } ``` SLM Retention is run on a schedule configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour. Included in this work is a new SLM stats API endpoint available through ``` json GET /_slm/stats ``` That returns statistics about snapshots taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API. Add base framework for snapshot retention (#43605) * Add base framework for snapshot retention This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask` to start as the basis for SLM's retention implementation.
Relates to #38461 * Remove extraneous 'public' * Use a local var instead of reading class var repeatedly * Add SnapshotRetentionConfiguration for retention configuration (#43777) * Add SnapshotRetentionConfiguration for retention configuration This commit adds the `SnapshotRetentionConfiguration` class and its HLRC counterpart to encapsulate the configuration for SLM retention. Currently only a single parameter is supported as an example (we still need to discuss the different options we want to support and their names) to keep the size of the PR down. It also does not yet include version serialization checks since the original SLM branch has not yet been merged. Relates to #43663 * Fix REST tests * Fix more documentation * Use Objects.equals to avoid NPE * Put `randomSnapshotLifecyclePolicy` in only one place * Occasionally return retention with no configuration * Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764) * Implement SnapshotRetentionTask's snapshot filtering and deletion This commit implements the snapshot filtering and deletion for `SnapshotRetentionTask`. Currently only the expire-after age is used for determining whether a snapshot is eligible for deletion. Relates to #43663 * Fix deletes running on the wrong thread * Handle missing or null policy in snap metadata differently * Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>> * Use the `OriginSettingClient` to work with security, enhance logging * Prevent NPE in test by mocking Client * Allow empty/missing SLM retention configuration (#45018) Semi-related to #44465, this allows the `"retention"` configuration map to be missing. Relates to #43663 * Add min_count and max_count as SLM retention predicates (#44926) This adds the configuration options for `min_count` and `max_count` as well as the logic for determining whether a snapshot meets this criteria to SLM's retention feature. 
These options are optional and one, two, or all three can be specified in an SLM policy. Relates to #43663 * Time-bound deletion of snapshots in retention delete function (#45065) * Time-bound deletion of snapshots in retention delete function With a cluster that has a large number of snapshots, it's possible that snapshot deletion can take a very long time (especially since deletes currently have to happen in a serial fashion). To prevent snapshot deletion from taking forever in a cluster and blocking other operations, this commit adds a setting to allow configuring a maximum time to spend deleting snapshots during retention. This dynamic setting defaults to 1 hour and is best-effort, meaning that it doesn't hard stop a deletion at an hour mark, but ensures that once the time has passed, all subsequent deletions are deferred until the next retention cycle. Relates to #43663 * Wow snapshots suuuure can take a long time. * Use a LongSupplier instead of actually sleeping * Remove TestLogging annotation * Remove rate limiting * Add SLM metrics gathering and endpoint (#45362) * Add SLM metrics gathering and endpoint This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The stats stored include the number of snapshots taken, failed, deleted, the number of retention runs, as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount of time spent deleting snapshots from SLM retention.
This commit also adds an endpoint for retrieving all stats (further commits will expose this in the SLM get-policy API) that looks like: ``` GET /_slm/stats { "retention_runs" : 13, "retention_failed" : 0, "retention_timed_out" : 0, "retention_deletion_time" : "1.4s", "retention_deletion_time_millis" : 1404, "policy_metrics" : { "daily-snapshots2" : { "snapshots_taken" : 7, "snapshots_failed" : 0, "snapshots_deleted" : 6, "snapshot_deletion_failures" : 0 }, "daily-snapshots" : { "snapshots_taken" : 12, "snapshots_failed" : 0, "snapshots_deleted" : 12, "snapshot_deletion_failures" : 6 } }, "total_snapshots_taken" : 19, "total_snapshots_failed" : 0, "total_snapshots_deleted" : 18, "total_snapshot_deletion_failures" : 6 } ``` This does not yet include HLRC for this, as this commit is quite large on its own. That will be added in a subsequent commit. Relates to #43663 * Version qualify serialization * Initialize counters outside constructor * Use computeIfAbsent instead of being too verbose * Move part of XContent generation into subclass * Fix REST action for master merge * Unused import * Record history of SLM retention actions (#45513) This commit records the deletion of snapshots by the retention component of SLM into the SLM history index for the purposes of reviewing operations taken by SLM and alerting. * Retry SLM retention after currently running snapshot completes (#45802) * Retry SLM retention after currently running snapshot completes This commit adds a ClusterStateObserver to wait until the currently running snapshot is complete before proceeding with snapshot deletion. SLM retention waits for the maximum allowed deletion time for the snapshot to complete, however, the waiting time is not factored into the limit on actual deletions. 
Relates to #43663 * Increase timeout waiting for snapshot completion * Apply patch From `2374316f0d`.patch * Rename test variables * [TEST] Be less strict for stats checking * Skip SLM retention if ILM is STOPPING or STOPPED (#45869) This adds a check to ensure we take no action during SLM retention if ILM is currently stopped or in the process of stopping. Relates to #43663 * Check all actions preventing snapshot delete during retention (#45992) * Check all actions preventing snapshot delete during retention run Previously we only checked to see if a snapshot was currently running, but it turns out that more things can block snapshot deletion. This changes the check to be a check for: - a snapshot currently running - a deletion already in progress - a repo cleanup in progress - a restore currently running This was found by CI where a third party delete in a test caused SLM retention deletion to throw an exception. Relates to #43663 * Add unit test for okayToDeleteSnapshots * Fix bug where SLM retention task would be scheduled on every node * Enhance test logging * Ignore if snapshot is already deleted * Missing import * Fix SnapshotRetentionServiceTests * Expose SLM policy stats in get SLM policy API (#45989) This also adds support for the SLM stats endpoint to the high level rest client. 
Retrieving a policy now looks like: ```json { "daily-snapshots" : { "version": 1, "modified_date": "2019-04-23T01:30:00.000Z", "modified_date_millis": 1556048137314, "policy" : { "schedule": "0 30 1 * * ?", "name": "<daily-snap-{now/d}>", "repository": "my_repository", "config": { "indices": ["data-", "important"], "ignore_unavailable": false, "include_global_state": false }, "retention": {} }, "stats": { "snapshots_taken": 0, "snapshots_failed": 0, "snapshots_deleted": 0, "snapshot_deletion_failures": 0 }, "next_execution": "2019-04-24T01:30:00.000Z", "next_execution_millis": 1556048160000 } } ``` Relates to #43663 Rewrite SnapshotLifecycleIT as an ESIntegTestCase (#46356) * Rewrite SnapshotLifecycleIT as an ESIntegTestCase This commit splits `SnapshotLifecycleIT` into two different tests. `SnapshotLifecycleRestIT` which includes the tests that do not require slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an integration test using `MockRepository` to simulate a snapshot being in progress. Relates to #43663 Resolves #46205 * Add error logging when exceptions are thrown * Update serialization versions * Fix type inference * Use non-Cancellable HLRC return value * Fix Client mocking in test * Fix SLMSnapshotBlockingIntegTests for 7.x branch * Update SnapshotRetentionTask for non-multi-repo snapshot retrieval * Add serialization guards for SnapshotLifecyclePolicy	2019-09-10 09:08:09 -06:00
Lee Hinman	3d4b8e01c7	Validate SLM policy ids strictly (#45998 ) (#46145 ) This uses strict validation for SLM policy ids, similar to what we use for index names. Resolves #45997	2019-09-03 09:20:02 -06:00
Gordon Brown	47bbd9d9a9	[7.x] Fix rollover alias in SLM history index template (#46001 ) This commit adds the `rollover_alias` setting required for ILM to work correctly to the SLM history index template and adds assertions to the SLM integration tests to ensure that it works correctly.	2019-08-28 14:50:22 -07:00
Gordon Brown	47b1e2b3d0	[7.x] Use rollover for SLM's history indices (#45686 ) Following our own guidelines, SLM should use rollover instead of purely time-based indices to keep shard counts low. This commit implements lazy index creation for SLM's history indices, indexing via an alias, and rollover in the built-in ILM policy.	2019-08-21 13:42:11 -06:00
Armin Braun	a01bd6c5a3	Stop Executing SLM Policy Transport Action on Snapshot Pool (#45727 ) (#45748 ) * Executing SLM policies on the snapshot thread will block until a snapshot finishes if the pool is completely busy executing that snapshot * Fixes #45594	2019-08-20 19:15:36 +02:00
Gordon Brown	ecb3ebd796	Clean SLM and ongoing snapshots in test framework (#45564 ) Adjusts the cluster cleanup routine in ESRestTestCase to clean up SLM test cases, and optionally wait for all snapshots to be deleted. Waiting for all snapshots to be deleted, rather than failing if any are in progress, is necessary for tests which use SLM policies because SLM policies may be in the process of executing when the test ends.	2019-08-16 14:17:34 -06:00
Gordon Brown	3f5dab99c3	Properly set origin for SLM history store client (#45515 ) The origin was not set properly for the SnapshotHistoryStore client, resulting in errors when SLM was used when security was enabled.	2019-08-13 18:23:20 -06:00
Armin Braun	a9e1402189	Remove Settings from BaseRestRequest Constructor (#45418 ) (#45429 ) * Resolving the todo, cleaning up the unused `settings` parameter * Cleaning up some other minor dead code in affected classes	2019-08-12 05:14:45 +02:00
Alpar Torok	634a070430	Restrict which tasks can use testclusters (#45198 ) * Restrict which tasks can use testclusters This PR fixes a problem in the interaction between test-clusters and the build cache. Before this, any task could have used a cluster without tracking it as input. With this change, a new interface is introduced to track the tasks that can use clusters, and the cluster is considered an input for all of them.	2019-08-09 13:38:01 +03:00
Lee Hinman	c7ec0b8431	Include in-progress snapshot for a policy with get SLM policy… (#45245 ) This commit adds the "in_progress" key to the SLM get policy API, returning a policy that looks like: ```json { "daily-snapshots" : { "version" : 1, "modified_date" : "2019-08-05T18:41:48.778Z", "modified_date_millis" : 1565030508778, "policy" : { "name" : "<production-snap-{now/d}>", "schedule" : "0 30 1 * * ?", "repository" : "repo", "config" : { "indices" : [ "foo-*", "important" ], "ignore_unavailable" : true, "include_global_state" : false }, "retention" : { "expire_after" : "10m" } }, "last_success" : { "snapshot_name" : "production-snap-2019.08.05-oxctmnobqye3luim4uejhg", "time_string" : "2019-08-05T18:42:23.257Z", "time" : 1565030543257 }, "next_execution" : "2019-08-06T01:30:00.000Z", "next_execution_millis" : 1565055000000, "in_progress" : { "name" : "production-snap-2019.08.05-oxctmnobqye3luim4uejhg", "uuid" : "t8Idqt6JQxiZrzp0Vt7z6g", "state" : "STARTED", "start_time" : "2019-08-05T18:42:22.998Z", "start_time_millis" : 1565030542998 } } } ``` These are only visible while the snapshot is being taken (or failed), since it reads from the cluster state rather than from the repository itself.	2019-08-07 08:29:49 -06:00
Mark Vieira	c13285a382	Remove unnecessary plugin application and project configuration (#45100 )	2019-08-01 14:18:24 -07:00
David Kyle	e18e9fa8c5	Mute SnapshotLifecycleServiceTests#testPolicyCRUD Relates to https://github.com/elastic/elasticsearch/issues/44997	2019-07-30 10:36:27 +01:00
Lee Hinman	598c4e72f9	[7.x] Rename indexlifecycle to ilm and snapshotlifecycle to sl… (#44977 ) * Rename indexlifecycle to ilm and snapshotlifecycle to slm (#44917) As a followup to #44725 and #44608, which renamed the packages within the x-pack project, this renames the packages within the core x-pack project. It also renames 'snapshotlifecycle' within the HLRC to slm. * Fix one more import	2019-07-29 15:51:14 -06:00
Gordon Brown	d4b2d21339	Add option to filter ILM explain response (#44777 ) In order to make it easier to interpret the output of the ILM Explain API, this commit adds two request parameters to that API: - `only_managed`, which causes the response to only contain indices which have `index.lifecycle.name` set - `only_errors`, which causes the response to contain only indices in an ILM error state "Error state" is defined as either being in the `ERROR` step or having `index.lifecycle.name` set to a policy that does not exist.	2019-07-26 11:57:38 -04:00
Jason Tedor	e2c8f8dfa3	Rename ILM package to ilm (#44725 ) This commit renames the ILM package from indexlifecycle to ilm. We have all come to know index lifecycle management as ILM, the APIs and settings use ilm, and it would be nice if the package did too. This commit makes that change.	2019-07-23 16:46:38 +09:00
Jason Tedor	5878bde8dc	Rename SLM package to slm (#44608 ) This commit renames the SLM package from snapshotlifecycle to slm. We have all come to know index lifecycle management as ILM, the APIs and settings use ilm, and it would be nice if the package did too. For SLM, let's use slm for all of these including the package name from the beginning.	2019-07-23 07:35:06 +09:00
Lee Hinman	3001f7941f	Allow empty configuration for SLM policies (#44465 ) * Allow empty configuration for SLM policies When putting or updating a snapshot lifecycle policy it was not possible to elide the `config` map. This commit makes the configuration optional, the same way that it is when taking a snapshot. Relates to #38461 * Add Objects.requireNonNull for required parts of the policy	2019-07-18 16:20:31 -06:00
Lee Hinman	fe2ef66e45	Expose index age in ILM explain output (#44457 ) * Expose index age in ILM explain output This adds the index's age to the ILM explain output, for example: ``` { "indices" : { "ilm-000001" : { "index" : "ilm-000001", "managed" : true, "policy" : "full-lifecycle", "lifecycle_date" : "2019-07-16T19:48:22.294Z", "lifecycle_date_millis" : 1563306502294, "age" : "1.34m", "phase" : "hot", "phase_time" : "2019-07-16T19:48:22.487Z", ... etc ... } } } ``` This age can be used to tell when ILM will transition the index to the next phase, based on that phase's `min_age`. Resolves #38988 * Expose age in getters and in HLRC	2019-07-18 15:33:45 -06:00
Ryan Ernst	2a2686e6e7	Convert remaining ActionTypes to writeable in xpack core (#44467 ) (#44525 ) This commit converts all remaining ActionType response classes to writeable in xpack core. It also converts a few from server which were used by xpack core. relates #34389	2019-07-17 18:01:45 -07:00
Jason Tedor	39c5f98de7	Introduce test issue logging (#44477 ) Today we have an annotation for controlling logging levels in tests. This annotation serves two purposes, one is to control the logging level used in tests, when such control is needed to impact and assert the behavior of loggers in tests. The other use is when a test is failing and additional logging is needed. This commit separates these two concerns into separate annotations. The primary motivation for this is that we have a history of leaving behind the annotation for the purpose of investigating test failures long after the test failure is resolved. The accumulation of these stale logging annotations has led to excessive disk consumption. Having recently cleaned this up, we would like to avoid falling into this state again. To do this, we are adding a link to the test failure under investigation to the annotation when used for the purpose of investigating test failures. We will add tooling to inspect these annotations, in the same way that we have tooling on awaits fix annotations. This will enable us to report on the use of these annotations, and report when stale uses of the annotation exist.	2019-07-18 05:33:33 +09:00
Ryan Ernst	0755a13c9f	Convert AcknowledgedRequest to Writeable.Reader (#44412 ) (#44454 ) This commit adds constructors to AcknowledgedRequest subclasses to implement Writeable.Reader, and ensures all future subclasses implement the same. relates #34389	2019-07-17 11:17:36 -07:00
Yannick Welsch	d98b3e4760	Move frozen indices to x-pack module (#44490 ) Backport of #44408 and #44286.	2019-07-17 16:53:10 +02:00
Lee Hinman	fb0461ac76	[7.x] Add Snapshot Lifecycle Management (#44382 ) * Add Snapshot Lifecycle Management (#43934) * Add SnapshotLifecycleService and related CRUD APIs This commit adds `SnapshotLifecycleService` as a new service under the ilm plugin. This service handles snapshot lifecycle policies by scheduling based on the policies defined schedule. This also includes the get, put, and delete APIs for these policies Relates to #38461 * Make scheduledJobIds return an immutable set * Use Object.equals for SnapshotLifecyclePolicy * Remove unneeded TODO * Implement ToXContentFragment on SnapshotLifecyclePolicyItem * Copy contents of the scheduledJobIds * Handle snapshot lifecycle policy updates and deletions (#40062) (Note this is a PR against the `snapshot-lifecycle-management` feature branch) This adds logic to `SnapshotLifecycleService` to handle updates and deletes for snapshot policies. Policies with incremented versions have the old policy cancelled and the new one scheduled. Deleted policies have their schedules cancelled when they are no longer present in the cluster state metadata. Relates to #38461 * Take a snapshot for the policy when the SLM policy is triggered (#40383) (This is a PR for the `snapshot-lifecycle-management` branch) This commit fills in `SnapshotLifecycleTask` to actually perform the snapshotting when the policy is triggered. Currently there is no handling of the results (other than logging) as that will be added in subsequent work. This also adds unit tests and an integration test that schedules a policy and ensures that a snapshot is correctly taken. Relates to #38461 * Record most recent snapshot policy success/failure (#40619) Keeping a record of the results of the successes and failures will aid troubleshooting of policies and make users more confident that their snapshots are being taken as expected. This is the first step toward writing history in a more permanent fashion. 
* Validate snapshot lifecycle policies (#40654) (This is a PR against the `snapshot-lifecycle-management` branch) With this commit, we now validate the content of snapshot lifecycle policies when the policy is being created or updated. This checks for the validity of the id, name, schedule, and repository. Additionally, cluster state is checked to ensure that the repository exists prior to the lifecycle being added to the cluster state. Part of #38461 * Hook SLM into ILM's start and stop APIs (#40871) (This pull request is for the `snapshot-lifecycle-management` branch) This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are cancelled. Relates to #38461 * Add tests for SnapshotLifecyclePolicyItem (#40912) Adds serialization tests for SnapshotLifecyclePolicyItem. * Fix improper import in build.gradle after master merge * Add human readable version of modified date for snapshot lifecycle policy (#41035) * Add human readable version of modified date for snapshot lifecycle policy This small change changes it from: ``` ... "modified_date": 1554843903242, ... ``` To ``` ... "modified_date" : "2019-04-09T21:05:03.242Z", "modified_date_millis" : 1554843903242, ... ``` Including the `"modified_date"` field when the `?human` field is used. Relates to #38461 * Fix test * Add API to execute SLM policy on demand (#41038) This commit adds the ability to perform a snapshot on demand for a policy. This can be useful to take a snapshot immediately prior to performing some sort of maintenance. ```json PUT /_ilm/snapshot/<policy>/_execute ``` And it returns the response with the generated snapshot name: ```json { "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug" } ``` Note that this does not allow waiting for the snapshot, and the snapshot could still fail. It does record this information into the cluster state similar to a regularly triggered SLM job.
Relates to #38461 * Add next_execution to SLM policy metadata (#41221) * Add next_execution to SLM policy metadata This adds the next time a snapshot lifecycle policy will be executed when retrieving a policy's metadata, for example: ```json GET /_ilm/snapshot?human { "production" : { "version" : 1, "modified_date" : "2019-04-15T21:16:21.865Z", "modified_date_millis" : 1555362981865, "policy" : { "name" : "<production-snap-{now/d}>", "schedule" : "/30 * * * ?", "repository" : "repo", "config" : { "indices" : [ "foo-", "important" ], "ignore_unavailable" : true, "include_global_state" : false } }, "next_execution" : "2019-04-15T21:16:30.000Z", "next_execution_millis" : 1555362990000 }, "other" : { "version" : 1, "modified_date" : "2019-04-15T21:12:19.959Z", "modified_date_millis" : 1555362739959, "policy" : { "name" : "<other-snap-{now/d}>", "schedule" : "0 30 2 * ?", "repository" : "repo", "config" : { "indices" : [ "other" ], "ignore_unavailable" : false, "include_global_state" : true } }, "next_execution" : "2019-04-16T02:30:00.000Z", "next_execution_millis" : 1555381800000 } } ``` Relates to #38461 * Fix and enhance tests * Figured out how to Cron * Change SLM endpoint from /_ilm/* to /_slm/* (#41320) This commit changes the endpoint for snapshot lifecycle management from: ``` GET /_ilm/snapshot/<policy> ``` to: ``` GET /_slm/policy/<policy> ``` It mimics the ILM path only using `slm` instead of `ilm`. Relates to #38461 * Add initial documentation for SLM (#41510) * Add initial documentation for SLM This adds the initial documentation for snapshot lifecycle management. It also includes the REST spec API json files since they're sort of documentation. Relates to #38461 * Add `manage_slm` and `read_slm` roles (#41607) * Add `manage_slm` and `read_slm` roles This adds two more built in roles - `manage_slm` which has permission to perform any of the SLM actions, as well as stopping, starting, and retrieving the operation status of ILM.
`read_slm` which has permission to retrieve snapshot lifecycle policies as well as retrieving the operation status of ILM. Relates to #38461 * Add execute to the test * Fix ilm -> slm typo in test * Record SLM history into an index (#41707) It is useful to have a record of the actions that Snapshot Lifecycle Management takes, especially for the purposes of alerting when a snapshot fails or has not been taken successfully for a certain amount of time. This adds the infrastructure to record SLM actions into an index that can be queried at leisure, along with a lifecycle policy so that this history does not grow without bound. Additionally, SLM automatically setting up an index + lifecycle policy leads to `index_lifecycle` custom metadata in the cluster state, which some of the ML tests don't know how to deal with due to setting up custom `NamedXContentRegistry`s. Watcher would cause the same problem, but it is already disabled (for the same reason). * High Level Rest Client support for SLM (#41767) * High Level Rest Client support for SLM This commit adds HLRC support for SLM. Relates to #38461 * Fill out documentation tests with tags * Add more callouts and asciidoc for HLRC * Update javadoc links to real locations * Add security test testing SLM cluster privileges (#42678) * Add security test testing SLM cluster privileges This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm` cluster privileges. Relates to #38461 * Don't redefine vars * Add Getting Started Guide for SLM (#42878) This commit adds a basic Getting Started Guide for SLM. * Include SLM policy name in Snapshot metadata (#43132) Keep track of the SLM policy name in the metadata field of the snapshots taken by SLM. This allows users to more easily understand where the snapshot came from, and will enable future SLM features such as retention policies.
* Fix compilation after master merge * [TEST] Move exception wrapping for devious exception throwing Fixes an issue where an exception was created from one line and thrown in another. * Fix SLM for the change to AcknowledgedResponse * Add Snapshot Lifecycle Management Package Docs (#43535) * Fix compilation for transport actions now that task is required * Add a note mentioning the privileges needed for SLM (#43708) * Add a note mentioning the privileges needed for SLM This adds a note to the top of the "getting started with SLM" documentation mentioning that there are two built-in privileges to assist with creating roles for SLM users and administrators. Relates to #38461 * Mention that you can create snapshots for indices you can't read * Fix REST tests for new number of cluster privileges * Mute testThatNonExistingTemplatesAreAddedImmediately (#43951) * Fix SnapshotHistoryStoreTests after merge * Remove overridden newResponse functions that have been removed * Fix compilation for backport * Fix get snapshot output parsing in test * [DOCS] Add redirects for removed autogen anchors (#44380) * Switch <tt>...</tt> in javadocs for {@code ...}	2019-07-16 07:37:13 -06:00
Ryan Ernst	7e06888bae	Convert testclusters to use distro download plugin (#44253 ) (#44362 ) Test clusters currently has its own set of logic for dealing with finding different versions of Elasticsearch, downloading them, and extracting them. This commit converts testclusters to use the DistributionDownloadPlugin.	2019-07-15 17:53:05 -07:00
Ryan Ernst	59658daef9	Separate streamable based master node actions (#44313 ) This commit creates new base classes for master node actions whose response types still implement Streamable. This simplifies both finding remaining classes to convert, as well as creating new master node actions that use Writeable for their responses. relates #34389	2019-07-15 09:20:20 -07:00
Jake Landis	6e9ccda2c5	ilm test - allow more time for policy completion (#43844 )	2019-07-02 22:05:18 -05:00
Jake Landis	0a79f4ca70	Extend timeout for TimeSeriesLifecycleActionsIT> testFullPolicy (#43891 )	2019-07-02 22:05:04 -05:00
Ryan Ernst	3a2c698ce0	Rename Action to ActionType (#43778 ) Action is a class that encapsulates meta information about an action that allows it to be called remotely, specifically the action name and response type. With recent refactoring, the action class can now be constructed as a static constant, instead of needing to create a subclass. This makes the old pattern of creating a singleton INSTANCE both misnamed and lacking a common placement. This commit renames Action to ActionType, thus allowing the old INSTANCE naming pattern to be TYPE on the transport action itself. ActionType also conveys that this class is also not the action itself, although this change does not rename any concrete classes as those will be removed organically as they are converted to TYPE constants. relates #34389	2019-06-30 22:00:17 -07:00
Martijn van Groningen	101cf384ba	Replace Streamable w/ Writable in AcknowledgedResponse and subclasses (backport 7.x) (#43525 ) This commit replaces usages of Streamable with Writeable for the AcknowledgedResponse and its subclasses, plus associated actions. Note that where possible response fields were made final and default constructors were removed. This is a large PR, but the change is mostly mechanical. Relates to #34389 Backport of #43414	2019-06-24 13:47:37 +02:00
Lee Hinman	c2bf628a6d	[7.x] Narrow period of Shrink action in which ILM prevents stopping (#43254 ) (#43393 ) * Narrow period of Shrink action in which ILM prevents stopping Prior to this change, we would prevent stopping of ILM if the index was anywhere in the shrink action. This commit changes `IndexLifecycleService` to allow stopping when in any of the innocuous steps during shrink. This changes ILM only to prevent stopping if absolutely necessary. Resolves #43253 * Rename variable for ignore actions -> ignore steps * Fix comment * Factor test out to test all stoppable steps	2019-06-19 16:37:41 -06:00
Alpar Torok	167e51335d	Convert ILM tests to use testclusters (#43076 ) Also improve the error message when bin scripts are not found	2019-06-13 12:24:48 +03:00
Ryan Ernst	172cd4dbfa	Remove description from xpack feature sets (#43065 ) The description field of xpack featuresets is optionally part of the xpack info api, when using the verbose flag. However, this information is unnecessary, as it is better left for documentation (and the existing descriptions do not describe anything meaningful). This commit removes the description field from feature sets.	2019-06-11 09:22:58 -07:00
Jason Tedor	117df87b2b	Replicate aliases in cross-cluster replication (#42875 ) This commit adds functionality so that aliases that are manipulated on leader indices are replicated by the shard follow tasks to the follower indices. Note that we ignore write indices. This is due to the fact that follower indices do not receive direct writes so the concept is not useful. Relates #41815	2019-06-04 20:36:24 -04:00
Mark Vieira	e44b8b1e2e	[Backport] Remove dependency substitutions 7.x (#42866 ) * Remove unnecessary usage of Gradle dependency substitution rules (#42773) (cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)	2019-06-04 13:50:23 -07:00
Ryan Ernst	6fd8924c5a	Switch run task to use real distro (#41590 ) The run task is supposed to run elasticsearch with the given plugin or module. However, for modules, this is most realistic if using the full distribution. This commit changes the run setup to use the default or oss as appropriate.	2019-05-06 12:34:07 -07:00
Daniel Mitterdorfer	8580053818	Mute PermissionsIT#testWhen[...]ByILMPolicy (#41859 ) Relates #41440 Relates #41858	2019-05-06 16:15:37 +02:00
Christoph Büscher	52495843cc	[Docs] Fix common word repetitions (#39703 )	2019-04-25 20:47:47 +02:00
Yogesh Gaikwad	0d1178fca6	put mapping authorization for alias with write-index and multiple read indices (#40834 ) (#41287 ) When the same alias points to multiple indices we can write to only one index with `is_write_index` value `true`. The special handling in case of the put mapping request(to resolve authorized indices) has a check on indices size for a concrete index. If multiple indices existed then it marked the request as unauthorized. The check has been modified to consider write index flag and only when the requested index matches with the one with write index alias, the alias is considered for authorization. Closes #40831	2019-04-17 14:25:33 +10:00
Gordon Brown	ec8709e831	Check allocation rules are cleared after ILM Shrink (#41170 ) Adds some checks to make sure that the allocation rules that ILM adds before a shrink are cleared after the shrink is complete	2019-04-16 09:25:51 -06:00
Gordon Brown	7e59794ced	Log every use of ILM Move to Step API (#41171 ) Usage of the ILM Move to Step API can result in some very odd situations, and for diagnosing problems arising from these situations it would be nice to have a record of when this API was called with what parameters. Also, adds a dedicated logger for TransportMoveToStepAction, rather than using the (deprecated) inherited one.	2019-04-15 16:20:37 -06:00
Mark Vieira	1287c7d91f	[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978 ) (#40993 ) * Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions. (cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6) * Fix forking JVM runner * Don't bump shadow plugin version	2019-04-09 11:52:50 -07:00
Gordon Brown	5347dec55e	Allow ILM to stop if indices have nonexistent policies (#40820 ) Prior to this PR, there is a bug in ILM which does not allow ILM to stop if one or more indices have an index.lifecycle.name which refers to a policy that does not exist - the operation_mode will be stuck as STOPPING until either the policy is created or the nonexistent policy is removed from those indices. This change allows ILM to stop in this case and makes the logging more clear as to why ILM is not stopping.	2019-04-04 11:46:21 -06:00
Lee Hinman	2fd01cc0b7	Fix testRunStateChangePolicyWithAsyncActionNextStep race condition (#40707 ) Previously we only set the latch countdown with `nextStep.setLatch` after the cluster state change has already been counted down. However, it's possible execution could have already started, causing the latch to be missed when the `MockAsyncActionStep` is being executed. This moves the latch setting to be before the call to `runPolicyAfterStateChange`, which means it is always available when the `MockAsyncActionStep` is executed. I was able to reproduce the failure every 30-40 runs before this change. With this change, running 2000+ times the test passes. Resolves #40018	2019-04-02 10:56:44 -06:00
Gordon Brown	db7f00098e	Correct ILM metadata minimum compatibility version (#40569 ) The ILM metadata minimum compatibility version was not set correctly, which can cause issues in mixed-version clusters.	2019-03-28 10:53:44 -06:00
Lee Hinman	8ec456b5df	Maintain step order for ILM trace logging (#39522 ) When trace logging is enabled we log the computed steps for a policy. This commit makes sure that the steps that are logged are in the same order they will be run when the policy executes. This makes it much easier to reason about the policy if the move-to-step API is ever required in the future.	2019-03-07 11:37:58 -07:00
Gordon Brown	f4c5abe4d4	Handle failure to release retention leases in ILM (#39281 ) (#39417 ) It is possible that the Unfollow API may fail to release shard history retention leases when unfollowing, so this needs to be handled by the ILM Unfollow action. There's nothing much that can be done automatically about it from the follower side, so this change makes the ILM unfollow action simply ignore those failures.	2019-02-26 16:58:30 -07:00
Gordon Brown	2ad1e6aedc	Fix testCannotShrinkLeaderIndex (#38529 ) This test should no longer pass when the functionality it is intended to test is broken, as it now indexes a number of documents and verifies that the index is staying on the same step until after indexing and replication of those documents is finished. This prevents the test from passing if the leader index progresses in its lifecycle during that time.	2019-02-22 08:03:36 -07:00
Jay Modi	697911c31d	Fixed missed stopping of SchedulerEngine (#39193 ) The SchedulerEngine is used in several places in our code and not all of these usages properly stopped the SchedulerEngine, which could lead to test failures due to leaked threads from the SchedulerEngine. This change adds stopping to these usages in order to avoid the thread leaks that cause CI failures and noise. Closes #38875	2019-02-21 14:31:33 -07:00
Jason Tedor	09ea3ccd16	Remove retention leases when unfollowing (#39088 ) This commit attempts to remove the retention leases on the leader shards when unfollowing an index. This is best effort, since the leader might not be available.	2019-02-20 07:06:49 -05:00
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Jason Tedor	a5ce1e0bec	Integrate retention leases to recovery from remote (#38829 ) This commit is the first step in integrating shard history retention leases with CCR. In this commit we integrate shard history retention leases with recovery from remote. Before we start transferring files, we take out a retention lease on the primary. Then during the file copy phase, we repeatedly renew the retention lease. Finally, when recovery from remote is complete, we disable the background renewing of the retention lease.	2019-02-16 15:37:52 -05:00
Luca Cavanna	a7046e001c	Remove support for maxRetryTimeout from low-level REST client (#38085 ) We have had various reports of problems caused by the maxRetryTimeout setting in the low-level REST client. This setting was initially added in an attempt to keep requests from going through retries if the request had already taken longer than the provided timeout. The implementation was problematic though, as the timeout would also expire in the first request attempt (see #31834), would leave the request executing after expiration causing memory leaks (see #33342), and would not take into account the http client internal queuing (see #25951). Given all these issues, it seems that this custom timeout mechanism gives little benefit while causing a lot of harm. We should rather rely on the connect and socket timeouts exposed by the underlying http client and accept that a request can overall take longer than the configured timeout, which is the case even with a single retry anyway. This commit removes the `maxRetryTimeout` setting and all of its usages.	2019-02-06 08:43:47 +01:00
Julie Tibshirani	3ce7d2c9b6	Make sure to reject mappings with type _doc when include_type_name is false. (#38270 ) `CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when deserializing index creation requests, accidentally accepts mappings that are nested twice under the type key (as described in the bug report #38266). This in turn causes us to be too lenient in parsing typeless mappings. In particular, we accept the following index creation request, even though it should not contain the type key `_doc`: ``` PUT index?include_type_name=false { "mappings": { "_doc": { "properties": { ... } } } } ``` There is a similar issue for both 'put templates' and 'put mappings' requests as well. This PR makes the minimal changes to detect and reject these typed mappings in requests. It does not address #38266 generally, or attempt a larger refactor around types in these server-side requests, as I think this should be done at a later time.	2019-02-05 10:52:32 -08:00
Gordon Brown	b866417650	Mute testCannotShrinkLeaderIndex (#38374 ) This test should not pass until CCR finishes integrating shard history retention leases. It currently sometimes passes (which is a bug in the test), but cannot pass reliably until the linked issue is resolved.	2019-02-04 16:06:19 -07:00
Gordon Brown	7a1e89c7ed	Ensure ILM policies run safely on leader indices (#38140 ) Adds a Step to the Shrink and Delete actions which prevents those actions from running on a leader index - all follower indices must first unfollow the leader index before these actions can run. This prevents the loss of history before follower indices are ready, which might otherwise result in the loss of data.	2019-02-01 20:46:12 -07:00
Tal Levy	bae656dcea	Preserve ILM operation mode when creating new lifecycles (#38134 ) There was a bug where creating a new policy would start the ILM service, even if it was stopped. This change ensures that there is no change to the existing operation mode	2019-02-01 13:16:34 -08:00
Tal Levy	7c738fd241	Skip Shrink when numberOfShards not changed (#37953 ) Previously, ShrinkAction would fail if it was executed on an index that had the same number of shards as the target shrunken number. This PR introduced a new BranchingStep that is used inside of ShrinkAction to branch which step to move to next, depending on the shard values. So no shrink will occur if the shard count is unchanged.	2019-01-30 15:09:17 -08:00
Tim Brooks	00ace369af	Use `CcrRepository` to init follower index (#35719 ) This commit modifies the put follow index action to use a CcrRepository when creating a follower index. It routes the logic through the snapshot/restore process. A wait_for_active_shards parameter can be used to configure how long to wait before returning the response.	2019-01-29 11:47:29 -07:00
Gordon Brown	49bd8715ff	Inject Unfollow before Rollover and Shrink (#37625 ) We inject an Unfollow action before Shrink because the Shrink action cannot be safely used on a following index, as it may not be fully caught up with the leader index before the "original" following index is deleted and replaced with a non-following Shrunken index. The Unfollow action will verify that 1) the index is marked as "complete", and 2) all operations up to this point have been replicated from the leader to the follower before explicitly disconnecting the follower from the leader. Injecting an Unfollow action before the Rollover action is done mainly as a convenience: This allows users to use the same lifecycle policy on both the leader and follower cluster without having to explicitly modify the policy to unfollow the index, while doing what we expect users to want in most cases.	2019-01-28 14:09:12 -07:00
Lee Hinman	427bc7f940	Use ILM for Watcher history deletion (#37443 ) * Use ILM for Watcher history deletion This commit adds an index lifecycle policy for the `.watch-history-*` indices. This policy is automatically used for all new watch history indices. This does not yet remove the automatic cleanup that the monitoring plugin does for the .watch-history indices, and it does not touch the `xpack.watcher.history.cleaner_service.enabled` setting. Relates to #32041	2019-01-23 10:18:08 -07:00
Lee Hinman	647e225698	Retry ILM steps that fail due to SnapshotInProgressException (#37624 ) Some steps, such as steps that delete, close, or freeze an index, may fail due to a currently running snapshot of the index. In those cases, rather than move to the ERROR step, we should retry the step when the snapshot has completed. This change adds an abstract step (`AsyncRetryDuringSnapshotActionStep`) that certain steps (like the ones I mentioned above) can extend that will automatically handle a situation where a snapshot is taking place. When a `SnapshotInProgressException` is received by the listener wrapper, a `ClusterStateObserver` listener is registered to wait until the snapshot has completed, re-running the ILM action when no snapshot is occurring. This also adds integration tests for these scenarios (thanks to @talevy in #37552). Resolves #37541	2019-01-23 09:46:31 -07:00
Ryan Ernst	9a34b20233	Simplify integ test distribution types (#37618 ) The integ tests currently use the raw zip project name as the distribution type. This commit simplifies this specification to be "default" or "oss". Whether zip or tar is used should be an internal implementation detail of the integ test setup, which can (in the future) be platform specific.	2019-01-21 12:37:17 -08:00
Martijn van Groningen	a3030c51e2	[ILM] Add unfollow action (#36970 ) This change adds the unfollow action for CCR follower indices. This is needed for the shrink action in case an index is a follower index. This will give the follower index the opportunity to fully catch up with the leader index, pause index following and unfollow the leader index. After this the shrink action can safely perform the ilm shrink. The unfollow action needs to be added to the hot phase and acts as a barrier for going to the next phase (warm or delete phases), so that follower indices are being unfollowed properly before indices are expected to go in read-only mode. This allows the force merge action to execute its steps safely. The unfollow action has the following steps: * `wait-for-indexing-complete` step: waits for the index in question to have the `index.lifecycle.indexing_complete` setting set to `true` * `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks for the index being handled to report that the leader shard global checkpoint is equal to the follower shard global checkpoint. * `pause-follower-index` step: Pauses index following, necessary to unfollow * `close-follower-index` step: Closes the index, necessary to unfollow * `unfollow-follower-index` step: Actually unfollows the index using the CCR Unfollow API * `open-follower-index` step: Reopens the index now that it is a normal index * `wait-for-yellow` step: Waits for primary shards to be allocated after reopening the index to ensure the index is ready for the next step In the case of the last two steps, if the index being handled is a regular index then the steps act as no-ops. Relates to #34648 Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Gordon Brown <gordon.brown@elastic.co>	2019-01-18 13:05:03 -07:00
Jake Landis	587034dfa7	Add set_priority action to ILM (#37397 ) This commit adds a set_priority action to the hot, warm, and cold phases for an ILM policy. This action sets the `index.priority` on the managed index to allow different priorities between the hot, warm, and cold recoveries. This commit also includes the HLRC and documentation changes. closes #36905	2019-01-17 09:55:36 -06:00
Alexander Reelsen	b2e8437424	Tests: Add ElasticsearchAssertions.awaitLatch method (#36777 ) * Tests: Add ElasticsearchAssertions.awaitLatch method Some tests are using assertTrue(latch.await(...)) in their code. This leads to an assertion error without any error message. This adds a method which has a nicer error message and can be used in tests. * fix forbidden apis * fix spaces	2019-01-10 09:25:36 +01:00
Tal Levy	eaeccd8401	[ILM] Add Freeze Action (#36910 ) This commit adds a new ILM Action for freezing indices in the cold phase. Closes #34630.	2019-01-03 15:00:40 -08:00
Tal Levy	f6c1e3f14f	[ILM][TEST] increase assertBusy timeout (#36864 ) The testFullPolicy and testMoveToRolloverStep tests are very important, but they sometimes time out beyond the default 10-second wait for shrink to occur. This commit increases one of the assertBusy timeouts to 20 seconds.	2018-12-20 08:55:02 -08:00
Gordon Brown	d39956c65c	Remove `indexing_complete` when removing policy (#36620 ) Leaving `index.lifecycle.indexing_complete` in place when removing the lifecycle policy from an index can cause confusion, because if a new policy is associated with the index, rollover will be silently skipped. Removing that setting when removing the policy from an index makes associating a new policy with the index more involved, but allows ILM to fail loudly, rather than silently skipping operations which the user may assume are being performed. * Adjust order of checks in WaitForRolloverReadyStep This allows ILM to error out properly for indices that have a valid alias, but are not the write index, while still handling `indexing_complete` on old-style aliases and rollover (that is, those which only point to a single index at a time with no explicit write index)	2018-12-19 12:11:30 -07:00
Alpar Torok	e9ef5bdce8	Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311 ) - Create a separate unitTest task instead of Gradle's built in - convert all configuration to use the new task - the built in task is now disabled	2018-12-19 08:25:20 +02:00
Tal Levy	06dfd4aadc	[TEST] fix flaky ILM tests (#36612 ) * WaitForRolloverReadyStepTests#mutateInstance sometimes did not mutate the instance correctly * 40_explain_lifecycle#"Test new phase still has phase_time" is not really a necessary integration test. In addition to this, it is flaky due to the asynchronous nature of ILM metadata population	2018-12-14 11:36:18 -08:00
Tal Levy	e3cf642299	Add ILM-specific security privileges (#36493 ) * add read_ilm cluster privilege Although managing ILM policies is best done using the "manage" cluster privilege, it is useful to have read-only views. * adds `read_ilm` cluster privilege for viewing policies and status * adds Explain API to the `view_index_metadata` index privilege * add manage_ilm privileges	2018-12-13 08:11:33 -08:00
Gordon Brown	6a824322fc	Improve error message for deleting in-use policy (#36457 ) The error message used when attempting to delete a lifecycle policy that is in use previously only included one index which was using the policy. It now includes all indices using that policy.	2018-12-12 14:57:48 -07:00
Gordon Brown	6481f2e380	Add setting to bypass Rollover action (#36235 ) Adds a setting that indicates that an index is done indexing, set by ILM when the Rollover action completes. This indicates that the Rollover action should be skipped in any future invocations, as long as the index is no longer the write index for its alias. This enables 1) an index with a policy that involves the Rollover action to have the policy removed and switched to another one without use of the move-to-step API, and 2) integrations with Beats and CCR.	2018-12-11 08:53:05 -07:00
Tal Levy	ed7afd1a9e	[ILM] TEST: fix long overflow in TimeValueScheduleTests (#36384 ) Closes #35948.	2018-12-10 09:28:17 -08:00
Alpar Torok	8659af68e0	Auto skip license headers on no source (#35640 ) * Unmute BuildExamplePluginsIT * Skip licenseHeaders when there are no sources	2018-11-20 13:02:33 +02:00
Gordon Brown	cce9648f9d	Align RolloverStep's name with other step names (#35655 ) RolloverStep previously had a name of "attempt_rollover", which was inconsistent with all other step names due to its use of an underscore instead of a dash.	2018-11-16 17:42:48 -07:00
Gordon Brown	3883e9bf4c	Split RolloverStep into Wait and Action steps (#35524 ) RolloverAction will now periodically check the rollover conditions using the Rollover API with the dry_run option as an AsyncWaitStep, then run the rollover itself by calling the Rollover API with no conditions, which will always roll over, as an AsyncActionStep. This will resolve race condition issues in policies using RolloverAction.	2018-11-15 17:11:31 -07:00
Lee Hinman	8ea999e489	Include stack trace with ILM error in explain output (#35512 ) This changes the ILM explain output to include the stack trace with the error when the index is on an ERROR step. Before: ```json { "indices" : { "foo" : { "index" : "foo", "managed" : true, "policy" : "bad", "lifecycle_date_millis" : 1542131670601, "phase" : "warm", "phase_time_millis" : 1542131676335, "action" : "shrink", "action_time_millis" : 1542131676335, "step" : "ERROR", "step_time_millis" : 1542131676451, "failed_step" : "shrink", "step_info" : { "type" : "illegal_argument_exception", "reason" : "the number of target shards [13] must be less that the number of source shards [2]" }, "phase_execution" : { "policy" : "bad", "phase_definition" : { "min_age" : "5s", "actions" : { "shrink" : { "number_of_shards" : 13 } } }, "version" : 1, "modified_date_in_millis" : 1542131669839 } } } } ``` After: ``` { "indices" : { "foo" : { "index" : "foo", "managed" : true, "policy" : "bad", "lifecycle_date_millis" : 1542131670601, "phase" : "warm", "phase_time_millis" : 1542131676335, "action" : "shrink", "action_time_millis" : 1542131676335, "step" : "ERROR", "step_time_millis" : 1542131676451, "failed_step" : "shrink", "step_info" : { "type" : "illegal_argument_exception", "reason" : "the number of target shards [13] must be less that the number of source shards [2]", "stack_trace" : "java.lang.IllegalArgumentException: the number of target shards [13] must be less that the number of source shards [2]\n\tat org.elasticsearch.cluster.metadata.IndexMetaData.selectShrinkShards(IndexMetaData.java:1509)\n\tat org.elasticsearch.action.admin.indices.shrink.TransportResizeAction.prepareCreateIndexRequest(TransportResizeAction.java:146)\n\tat org.elasticsearch.action.admin.indices.shrink.TransportResizeAction$1.onResponse(TransportResizeAction.java:104)\n\tat org.elasticsearch.action.admin.indices.shrink.TransportResizeAction$1.onResponse(TransportResizeAction.java:101)\n\tat 
org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:60)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onCompletion(TransportBroadcastByNodeAction.java:383)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onNodeResponse(TransportBroadcastByNodeAction.java:352)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:324)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:314)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1117)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1198)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1178)\n\tat org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:417)\n\tat org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:391)\n\tat org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat 
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309)\n\tat org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63)\n\tat org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:714)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:726)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n" }, "phase_execution" : { "policy" : "bad", "phase_definition" : { "min_age" : "5s", "actions" : { "shrink" : { "number_of_shards" : 13 } } }, "version" : 1, "modified_date_in_millis" : 1542131669839 } } } } ``` Resolves #35498	2018-11-14 14:40:05 -07:00
Tal Levy	16cbbab7b7	[ILM] fix retry so it picks up latest policy and executes async action (#35406 ) Before, moving to a failed step would only change the step info to be that of the failed step. This means two things. 1. Async Steps would never be triggered to execute 2. If there are inherent problems with the action definition that can be fixed with a policy update, these changes were not being reflected by the new execution info. Changes now 1. Async steps are executed after the move to the failed step in cluster state 2. the lifecycle execution info's phase definition is updated from the current latest policy definition, even though the index isn't moving to a new phase. Closes #35397.	2018-11-12 11:32:59 -08:00
Gordon Brown	67f9e8fa23	Enforce limitations on ILM policy names (#35104 ) Enforces restrictions on ILM policy names to ensure we don't accept policy names the system can't handle, or may reserve for future use.	2018-11-09 10:11:26 -07:00
Alpar Torok	8a85b2eada	Remove build qualifier from server's Version (#35172 ) With this change, `Version` no longer carries information about the qualifier, but we still need a way to show the "display version" that does have both qualifier and snapshot. This is now stored by the build and read from `META-INF`.	2018-11-07 14:01:05 +02:00
Tal Levy	a85b4f42ca	[ILM] change remove-policy-from-index http method from DELETE to POST (#35268 ) The remove-ilm-from-index API was using the DELETE http method to signify that something is being removed. Although metadata about ILM for the index is being deleted, no entity/resource is being deleted during this operation. POST is more in line with what this API is actually doing: it is modifying the metadata for an index. As part of this change, `remove` is also appended to the path to be more explicit about its actions.	2018-11-06 07:46:25 -08:00
Tal Levy	2bf843e768	[TEST] Mute ChangePolicyForIndexIT#testChangePolicyForIndex	2018-11-06 06:09:49 -08:00
Nik Everett	f72ef9b5fd	Build: Pull "skip assemble on qa" to common build (#35214 ) Pull all of the logic that we use to skip the `assemble` and `dependenciesInfo` tasks on `qa` projects into one spot in our root build file.	2018-11-05 16:16:00 -05:00
Gordon Brown	0fbb8a16bc	Skip Rollover step if next index already exists (#35168 ) If the Rollover step would fail due to the next index in sequence already existing, just skip to the next step instead of going to the Error step. This prevents spurious `ResourceAlreadyExistsException`s created by simultaneous RolloverStep executions from causing ILM to error out unnecessarily.	2018-11-05 09:20:43 -07:00
Lee Hinman	3473217563	Remove Joda usage from ILM (#35220 ) This commit removes the Joda time usage from ILM and the HLRC components of ILM. It also fixes an issue where using the `?human=true` flag could have caused the parser not to work. These millisecond fields now follow the standard we use elsewhere in the code, with additional fields added iff the `human` flag is specified. This is a breaking change for ILM, but since ILM has not yet been released, no compatibility shim is needed.	2018-11-05 08:17:15 -07:00
Alexander Reelsen	409050e8de	Refactor: Remove settings from transport action CTOR (#35208 ) As settings are not used in the transport action constructor, this removes the passing of the settings in all the transport actions.	2018-11-05 13:08:18 +01:00
Gordon Brown	b3da3eae08	[ILM] Fix race condition in test (#35143 ) Previously, testRunStateChangePolicyWithNextStep asserted that the ClusterState before and after running the steps were equal. The test only passed due to a race condition: The latch would be triggered by the step execution, but the cluster state update thread would continue running before committing the change to the cluster state. This allowed the test to read the old cluster state and pass the equality check about 99.99% of the time. The test now waits for the new cluster state to be committed before checking that it is _not_ equal to the old cluster state.	2018-11-02 11:09:48 -06:00
Jason Tedor	1e241190eb	Disable assemble task from ILM qa projects This commit disables the assemble tasks from all ILM qa projects. These projects do not have an assemble task to execute.	2018-11-02 11:16:34 -04:00
Tal Levy	6b312a500d	uninherit from AbstractComponent in IndexLifecycleService	2018-11-01 10:22:55 -07:00
Tal Levy	f8e23f6400	update ILM integ test cluster poll interval to 1s (#35113 )	2018-10-31 17:09:35 -07:00
Tal Levy	5f4b23f8c1	cleanup ILM qa structure (#35110 ) This commit does a few things - moves ILM-specific rest yaml tests into plugin/ilm/qa, and creates special :plugin:ilm:qa:rest module to test them - removes the with-security tests of the yaml tests since they are covered in the rest tests now - moves ChangePolicyforIndexIT into the qa/multi-node project since that test is not currently running in main ilm since integTest is disabled	2018-10-31 11:49:29 -07:00
Tal Levy	a294a7c6b5	fix IndexLifecycleService setting member the settings variable was previously created by the AbstractComponent class inherited by IndexLifecycleService. this is no more.	2018-10-31 11:17:16 -07:00
Tal Levy	5141084048	rename CRUD api REST path prefix _ilm to _ilm/policy (#35056 ) This PR renames the CRUD APIS for ILM GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy PUT _ilm/<policy> -> _ilm/policy/<policy> DELETE _ilm/<policy> -> _ilm/policy/<policy> closes #34929.	2018-10-30 16:19:05 -07:00
Gordon Brown	6ecb8ff344	Move to Error step if ClusterState* steps throw (#35069 ) Previously, if ClusterStateActionSteps or ClusterStateWaitSteps threw an exception executing, the exception would only be caught and logged by the generic ClusterStateUpdateTask machinery and the index would become stuck on that step. Now, exceptions thrown in these steps will be caught and the index will be moved to the Error step.	2018-10-30 13:33:32 -06:00
Gordon Brown	f6ac0e4bbc	[ILM] Fix Move To Step API causing ILM to hang (#34618 ) The Move To Step API now checks to see if the target step is an AsyncActionStep, and if so, runs it. Previously, AsyncActionSteps would only be run when they are entered by executing the previous step, so if an AsyncActionStep was entered via the Move To Step API, ILM would never touch that index again.	2018-10-29 11:18:12 -06:00
Tal Levy	f6ce935444	fix `GET _ilm` response with uninitialized ILM metadata (#34881 ) ILM would return a resource-not-found exception when requesting policies while the IndexLifecycleMetaData is not initialized. The behavior here should not be as extreme since it is not the user's fault. This commit changes the behavior so that it succeeds and returns no policies when no policy names are explicitly specified; otherwise it keeps the same behavior of throwing an exception	2018-10-25 16:00:44 -07:00
Tal Levy	41eaa586e8	remove index.lifecycle.skip setting (#34823 ) With the introduction of _ilm/stop and _ilm/start APIs, the use cases where one would only target a select group of indices to start/stop has been reduced. Since there is no strong use-case for skipping specific indices, it is best to remove this functionality and only add it back if later desired, with the hopes of keeping things more simple.	2018-10-25 07:27:04 -07:00
Tal Levy	21b9b024c7	fix PolicyStatsTests mutateInstance (#34835 ) through randomization, there is a chance that the mutateInstance for PolicyStatsTests does not actually mutate the original object. This PR aims to fix this	2018-10-25 07:24:54 -07:00
Colin Goodheart-Smithe	0b26f8b14c	Fixes NPE in multi node qa test	2018-10-25 10:45:30 +01:00
Colin Goodheart-Smithe	e7fddb5c93	Adds usage data for ILM (#33377 ) * Adds usage data for ILM * Adds tests for IndexLifecycleFeatureSetUsage and friends * Adds tests for IndexLifecycleFeatureSet * Fixes merge errors * Add number of indices managed to usage stats Also adds more tests * Addresses Review comments	2018-10-24 18:28:46 +01:00
Colin Goodheart-Smithe	c7fe87e43f	Removes Set Policy API in favour of setting index.lifecycle.name directly (#34304 ) * Removes Set Policy API in favour of setting index.lifecycle.name directly * Reinstates matcher that will still be used * Cleans up code after rebase * Adds test to check changing policy with index settings works * Fixes TimeseriesLifecycleActionsIT after API removal * Fixes docs tests * Fixes case on close where lifecycle service was never created	2018-10-24 16:14:59 +01:00
Lee Hinman	c5a264e77f	Ensure phase_time is set when in the "new" phase (#34280 ) Since there's no transition into the "new" phase it wasn't set until the "hot" phase, so now we initialize it when initializing the policy context. Resolves #34277	2018-10-23 15:20:41 -06:00
Gordon Brown	9cb0bb8b9f	Rework ILM build to separate integration tests (#34617 ) Having integration tests separated from the unit tests in the qa directory works much more smoothly with our testing infrastructure, matches what other plugins do, and tests in a more "real" deployment scenario by having all plugins installed.	2018-10-18 13:33:33 -06:00
Tal Levy	fdb850735a	fix setting version on deleting unmanaged indices with wildcard	2018-10-16 23:39:48 -07:00
Tal Levy	3a555da34d	update version on ILM setting updates	2018-10-16 15:43:10 -07:00
Jack Conradson	80474e138f	HLRC: Add remove index lifecycle policy (#34204 ) This change adds the command RemoveIndexLifecyclePolicy to the HLRC. This uses the new TimeRequest as a base class for RemoveIndexLifecyclePolicyRequest on the client side.	2018-10-16 08:12:06 -07:00
Lee Hinman	9ad2a7fa77	Fix expected next step being incorrect when executing async action (#34313 ) This fixes an issue where an incorrect expected next step is used when checking to execute `AsyncActionStep`s after a cluster state step. It fixes this scenario: - `ExecuteStepsUpdateTask` executes a `ClusterStateWaitStep` or `ClusterStateActionStep` successfully - The next step is also a `ClusterStateWaitStep`, so it loops - The `ClusterStateWaitStep` has a next stepkey (which gets set to the `nextStepKey` in the code) - The `ClusterStateWaitStep` fails the condition, meaning that it will have to wait longer - The `nextStepKey` is now incorrect though, because we did not advance the index's step, and it's not `null` (which is another safe value if there is no step after the `ClusterStateWaitStep`) This fixes the problem by resetting the nextStepKey to null if the condition is not met, since we are not going to advance the step metadata in this case (thereby skipping the `maybeRunAsyncAction` invocation). This commit also tightens up and enhances much of the ILM logging. A lot of logging was missing the index name (making it hard to debug in the presence of multiple indices) and a lot was using the wrong logging level (DEBUG is now actually readable without being a wall of text). Resolves #34297	2018-10-08 11:25:18 -06:00
Gordon Brown	13d89295c8	Provide useful error when a policy doesn't exist (#34206 ) When an index is configured to use a lifecycle policy that does not exist, this will now be noted in the step_info for that policy.	2018-10-04 08:21:55 -06:00
Tal Levy	f10735aa9a	ILM integration test with full policy (#33402 ) - this adds an integration test that runs through a policy with all the actions defined. - adds a test specific to a policy having just a rollover action - bumps the node count to 4	2018-10-03 12:20:43 -06:00
Lee Hinman	388f754a8e	Change step execution flow to be deliberate about type (#34126 ) This commit changes the way that step execution flows. Rather than have any step run when the cluster state changes or the periodic scheduler fires, this now runs the different types of steps at different times. `AsyncWaitStep` is run periodically, i.e., every 10 minutes by default. `ClusterStateActionStep` and `ClusterStateWaitStep` are run every time the cluster state changes. `AsyncActionStep` is now run only after the cluster state has been transitioned into a new step. This prevents these non-idempotent steps from running at the same time. In addition to being run when transitioned into, this is also run when a node is newly elected master (only if set as the current step) so that master failover does not fail to run the step. This also changes the `RolloverStep` from an `AsyncActionStep` to an `AsyncWaitStep` so that it can run periodically. Relates to #29823	2018-10-02 20:02:50 -06:00
Lee Hinman	2d9cb21490	Merge remote-tracking branch 'origin/master' into index-lifecycle	2018-10-01 14:10:09 -06:00
Lee Hinman	a49d59802a	Use more descriptive task names for ILM cluster state updates (#34161 ) Rather than using "ILM" for everything, we should use more descriptive names so debugging from logs is easier to do. Resolves #34118	2018-10-01 13:45:26 -06:00
Gordon Brown	c0bfc07f53	Only make indexes read-only on Shrink and ForceMerge actions (#33907 ) ILM now only forces indices to become read only in the case of Shrink and Force Merge actions, as these are most useful in cases where the index is no longer being written to.	2018-09-25 10:16:01 -06:00
Gordon Brown	90de436e55	Use custom index metadata for ILM state (#33783 ) Using index settings for ILM state is fragile and exposes too much information that doesn't need to be exposed. Using custom index metadata is more resilient and allows more controlled access to internal information. As part of these changes, moves away from using defaults for ILM-related values, in favor of using null values to clearly indicate that the value is not present.	2018-09-19 14:50:48 -06:00
Lee Hinman	27dd25857b	Rebuild step on PolicyStepsRegistry.getStep (#33780 ) This moves away from caching a list of steps for a current phase, instead rebuilding the necessary step from the phase JSON stored in the index's metadata. Relates to #29823	2018-09-18 17:07:57 -06:00
Lee Hinman	11a55d2307	[TEST] Handle an IndexLifecycleService that has not started up	2018-09-18 14:02:09 -06:00
Tal Levy	94a66c556d	add phase execution info to ILM Explain API (#33488 ) adds a section for phase execution to the Explain API. This contains - phase definition - policy name - policy version - modified date	2018-09-17 17:00:00 -07:00
Lee Hinman	1f048d3d3f	Remove unneeded listener on MoveToNextStepUpdateTask (#33725 ) There was a listener that re-runs the policy with the new state when the cluster state is processed by the `MoveToNextStepUpdateTask`. This removes this listener as we will execute the policy through the `IndexLifecycleService` cluster state listener.	2018-09-14 14:38:23 -06:00
Lee Hinman	b7649fce0c	Rename "after" to "minimum_age" in lifecycle definition (#33530 ) This renames the "after" field to better reflect what the meaning is. Supersedes #32624	2018-09-08 21:40:55 -06:00
Lee Hinman	8fa8dea138	Encapsulate Client as class variable for PolicyStepsRegistry (#33529 ) Rather than pass in the client on the `update` step, this makes it passed in to the constructor so it's not required on every update.	2018-09-07 16:32:25 -06:00
Colin Goodheart-Smithe	f83641346f	Adds checks to ensure index metadata exists when we try to use it (#33455 ) * Adds checks to ensure index metadata exists when we try to use it * Fixes failing test	2018-09-07 13:06:51 +01:00
Tal Levy	21bb4720a2	add notion of version and modified_date to LifecyclePolicyMetadata (#33450 ) It is useful to keep track of which version of a policy is currently being executed by a specific index. For management purposes, it would also be useful to know at which time the latest version was inserted so that an audit trail is left for reconciling changes happening in ILM.	2018-09-06 13:32:24 -07:00
Lee Hinman	b335487ca6	Fix qa build.gradle so gradle assemble works correctly There is a new way to disable assembling from certain subdirectories	2018-09-06 11:22:27 -06:00
Lee Hinman	96d515e3f5	Replace PhaseAfterStep with PhaseCompleteStep (#33398 ) This removes `PhaseAfterStep` in favor of a new `PhaseCompleteStep`. This step is only a marker that the `LifecyclePolicyRunner` needs to halt until the time indicated for entering the next phase. This also fixes a bug where phase times were encapsulated into the policy instead of dynamically adjusting to policy changes. Supersedes #33140, which it replaces Relates to #29823	2018-09-05 16:37:45 -06:00
Tal Levy	023e1bf889	fix test	2018-09-05 13:21:50 -07:00
Tal Levy	0f8bc10bcf	add new phase definition setting used for retrieving phase to execute (#33289 ) Since policies can be updated independent of execution plans for the current phase being executed, it would be nice to know what the phase that is executing looks like in JSON. This PR does just that, while also using that index setting to reconstruct the phase steps to execute (for consistency)	2018-09-05 11:35:20 -07:00
Colin Goodheart-Smithe	ada3e710f6	Renames XPackField.INDEX_LIFECYCLE value to "ilm" (#33270 ) This brings the name in line with everywhere else and means that name seen on the feature usage and `GET _xpack` APIs will match the plugin name. This change also removes `IndexLifecycle.NAME` since this was only used to name the scheduler job and that can be done using `XPackField.INDEX_LIFECYCLE` instead	2018-08-31 08:29:44 +01:00
Tal Levy	cfe0acc83c	separate out IndexLifecycleService cluster-state change concerns (#33033 ) Changes to the IndexLifecycleService were necessary since relying on ClusterChangedEvents for a full picture of the cluster state's settings was a mistake. It is not necessary that these events hold all settings, especially ones that are set at node start-up. Changes to main include: - move poll interval updates to a SettingsUpdateConsumer - move scheduler start/stop to a localMasterNodeListener - keep triggerPolicies in clusterChanged Changes to tests include: - removal of some low-level state transition checks in the Service that no longer make sense since the changes are unconditionally specified in the appropriate listeners - add integration tests for poll-interval updates - add integration test assertions for verifying scheduler is started up correctly	2018-08-27 14:25:27 -07:00
Lee Hinman	52aa738d84	Remove canSetPolicy, canUpdatePolicy, and canRemovePolicy (#33037 ) * Remove canSetPolicy, canUpdatePolicy and canRemovePolicy Since we now store a pre-compiled list of steps for an index's phase in the `PolicyStepsRegistry`, we no longer need to worry about updating policies as any updates won't affect the current phase, and will only be picked up on phase transitions. This also removes the tests that test these methods Relates to #29823	2018-08-23 15:37:02 -06:00
Gordon Brown	650f12af1e	Duplicate Protocol classes into Core This is needed as with recent changes to master (see #32952), protocol is no longer accessible from core, so these classes need to be duplicated in both places.	2018-08-23 13:50:15 -06:00
Gordon Brown	191bd7c031	Fix Gradle configuration This change was made to master, this commit brings it over to index-lifecycle. See #32409	2018-08-23 12:08:16 -06:00
Colin Goodheart-Smithe	fd88ab8c75	Fixes shrink action to remove single node allocation (#33091 ) This change fixes the shrink action so when the shrink is performed we remove the single node allocation from the shard allocation filtering settings. Without this fix replicas cannot be allocated after we have performed the shrink and we cannot make progress with the rest of the shrink action. This change also fixes a bug in the explain API where the master node timeout was being set to null if it wasn't provided instead of using its default value causing a NPE	2018-08-23 16:36:57 +01:00
Tal Levy	dfc70ddcc7	rename pre-phase/pre-action to new/init (#32996 ) this will keep things more consistent with the initial PhaseAfterStep, which has a phase name of `new`	2018-08-22 11:41:09 -07:00
Tal Levy	55cb08a352	move ESLoggerFactory usage to LogManager (#33043 ) Work done in #32513 has deprecated the old constructor in favor of log4j2's LogManager	2018-08-21 17:59:32 -07:00
Tal Levy	96869b253c	conditionally update CS only if StepInfo changes (#33004 ) If we are waiting on a condition to be met, and the reason it is not completed is unchanged, we find ourselves updating cluster state over and over again and kicking off the ILM listeners to re-check. This is overkill and can generate way too many cluster state updates	2018-08-21 12:29:41 -07:00
Tal Levy	6780ab9d5c	add user authentication test for ILM (#32826 )	2018-08-21 12:27:53 -07:00
Colin Goodheart-Smithe	75cae4560c	fix compile error after merge	2018-08-21 14:47:54 +01:00
Tal Levy	5ce082cd0a	copy LifecyclePolicy to protocol.xpack (#32915 ) This is the final PR for copying over the necessary components for clients to parse/render LifecyclePolicy. Changes include: - move of named-x-content server objects away from client - move validation into the client copy of LifecyclePolicy - move LifecycleAction into an interface with `getName`	2018-08-20 08:32:22 -07:00
Lee Hinman	77016add19	Store phase steps for index in PolicyStepsRegistry (#32926 ) * Store phase steps for index in PolicyStepsRegistry This changes the way that steps are retrieved from `PolicyStepsRegistry` to store the steps on a per-index basis (in memory for now, though that will change in subsequent PRs). These steps are rebuilt as the index changes phases. This also fixes a bug where an action with the same phase and name was not being considered changed (and thus updated) in the compiled steps list. These are now correctly considered as "upsert" diffs. Relates to #29823	2018-08-17 22:45:15 -06:00
Tal Levy	33522d4fb4	update cluster-state task execution to halt on new phase (#32886 ) As we migrate to a per-phase execution model, we need to prepare our cluster-state-step execution model to be aligned. It is the case that the final iteration into the next "currentStep" from the next phase would not be available in the registry yet. This change exits the execution loop early as to not jump into executing the next phase's steps before the registry is properly updated	2018-08-15 19:30:21 -07:00
Tal Levy	4baa721459	remove `type` config from LifecyclePolicy JSON (#32660 ) Since there is only one production policy, Timeseries, there is no reason to expose the `type` argument to the user.	2018-08-15 14:47:22 -07:00
Tal Levy	b218b1c68d	introduce random timeseries lifecycle policy util method (#32852 ) It is useful to have a random TimeseriesLifecycleType-backed LifecyclePolicy for testing. This PR exposes a helper method to create one and use it for serialization tests in LifecyclePolicyTests	2018-08-14 12:57:43 -07:00
Tal Levy	41e6d98af8	move qa yaml tests to inside the ILM plugin (#32693 ) The qa tests with security haven't actually gone as far as testing security roles yet, so this is a start in the hopes of both bringing the tests into the ilm plugin	2018-08-09 16:09:20 -07:00
Colin Goodheart-Smithe	8750d622fc	Adds REST client support for starting and stopping ILM (#32609 ) * Adds REST client support for PutOperationMode in ILM * Corrects licence headers * iter * add request converter test * Fixes tests * Creates start and stop actions for controlling ILM operation * Addresses review comments	2018-08-09 20:39:06 +01:00
Colin Goodheart-Smithe	5ff4f9347f	Adds explain lifecycle API to the Rest Client (#32606 )	2018-08-09 10:18:45 +01:00
Tal Levy	2fc3f1d04c	move replicas action functionality into AllocateAction (#32523 ) Since replica counts and allocation rules are set separately, it is not always clear how many replicas are to be allocated in the allocate action. Moving the replicas action to occur at the same time as the allocate action resolves this confusion, which could end in an undesired state. This means that the ReplicasAction is removed, and a new optional replicas parameter is added to AllocateAction.	2018-08-08 11:43:29 -07:00
Tal Levy	0ad252d502	change default indices.lifecycle.poll_interval to something sane (#32521 ) This was originally set to a few seconds while prototyping things. This interval is for the scheduled trigger of policies. Policies have this extra trigger beyond just on cluster-state changes because cluster-state changes may not be happening in a cluster for whatever reason, and we need to continue making progress. Updating this value to be larger is reasonable since not all operations are expected to be completed in the span of seconds, but instead in minutes and hours. 10 minutes is sane.	2018-08-06 14:41:27 -07:00
Jason Tedor	5de236e1e7	Rename ILM, ILM endpoints and drop _xpack (#32564 ) This commit does the following: - renames index-lifecycle plugin to ilm - modifies the endpoints to ilm instead of index_lifecycle - drops _xpack from the endpoints - drops a few duplicate endpoints	2018-08-02 13:05:11 -04:00

311 Commits