Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.
relates #59140
Fixed an issue #59082 introduced. We have to wait for no more operations
in all tests here not just the one we were waiting in already so that the cleanup
operation from the parent class can run without failure.
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
* Add support for snapshot and restore to data streams (#57675)
This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.
Closes#57127
Relates to #53100
* version fix
* compilation fix
* compilation fix
* remove unused changes
* compilation fix
* test fix
This changes the actions that would attempt to make the managed index read only to
check if the managed index is the write index of a data stream before proceeding.
The updated actions are shrink, readonly, freeze and forcemerge.
(cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
In #55592 and #55416, we deprecated the settings for enabling and disabling
basic license features and turned those settings into no-ops. Since doing so,
we've had feedback that this change may not give users enough time to cleanly
switch from non-ILM index management tools to ILM. If two index managers
operate simultaneously, results could be strange and difficult to
reconstruct. We don't know of any cases where SLM will cause a problem, but we
are restoring that setting as well, to be on the safe side.
This PR is not a strict commit reversion. First, we are keeping the new
xpack.watcher.use_ilm_index_management setting, introduced when
xpack.ilm.enabled was made a no-op, so that users can begin migrating to using
it. Second, the SLM setting was modified in the same commit as a group of other
settings, so I have taken just the changes relating to SLM.
As the datastream information is stored in the `ClusterState.Metadata` we exposed
the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for
the steps to be able to identify when a managed index is part of a DataStream.
If a managed index is part of a DataStream the rollover target is the DataStream
name and the highest generation index is the write index (ie. the rolled index).
(cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Without the flag we run into the situation where a broken repository (broken by some old 6.x
version of ES that is missing some snap-${uuid}.dat blobs fails to run the SLM retention task
since it always errors out).
The following settings are now no-ops:
* xpack.flattened.enabled
* xpack.logstash.enabled
* xpack.rollup.enabled
* xpack.slm.enabled
* xpack.sql.enabled
* xpack.transform.enabled
* xpack.vectors.enabled
Since these settings no longer need to be checked, we can remove settings
parameters from a number of constructors and methods, and do so in this
commit.
We also update documentation to remove references to these settings.
* Allow Deleting Multiple Snapshots at Once (#55474)
Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
* Make xpack.monitoring.enabled setting a no-op
This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved
removing the setting from the setup for integration tests. Monitoring may
introduce some complexity for test setup and teardown, so we should keep an eye
out for turbulence and failures
* Docs for making deprecated setting a no-op
This commit converts the remaining isXXXAllowed methods to instead of
use isAllowed with a Feature value. There are a couple other methods
that are static, as well as some licensed features that check the
license directly, but those will be dealt with in other followups.
* Make xpack.ilm.enabled setting a no-op
* Add watcher setting to not use ILM
* Update documentation for no-op setting
* Remove NO_ILM ml index templates
* Remove unneeded setting from test setup
* Inline variable definitions for ML templates
* Use identical parameter names in templates
* New ILM/watcher setting falls back to old setting
* Add fallback unit test for watcher/ilm setting
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.
This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
Backport from: #54726
The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.
In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.
Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).
Relates to #53100
When retrieving the snapshots for a set of repos or deleting a single snapshot, it's possible for
the body of the `ActionListener`'s `onResponse` method to throw an Exception. In this case, the
`errHandler` passed in may not be executed, resulting in the `running` boolean not being reset back
to false.
This commit uses `ActionListener.wrap(...)` instead of creating a new ActionListener, which ensures
that if the `onResponse` fails in any way, the `onFailure` handler is still called.
Resolves#55217
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.
After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
Prior to the change in #51631 indices were moved to the `TerminalPolicyStep` when their ILM actions
had completed. Once we switched ILM to stop in the last policy configured, these steps because
inaccessible from the policy's perspective. This meant that indices upgraded from ES prior to 7.7.0
could see the following error spammed in their logs every 10 minutes (by default) for every index in
this state:
```
[2020-04-14T15:52:23,764][ERROR][o.e.x.i.IndexLifecycleRunner] [midgar] current step [{"phase":"completed","action":"completed","name":"completed"}] for index [foo] with policy [full] is not recognized
```
This changes the runner to ignore these steps, which is what is desired anyway since the index is
already in the terminal phase.
This commits adds a timeout when moving ILM back on to a failed step. In
case the master is struggling with processing the cluster update requests
these ones will expire (as we'll send them again anyway on the next ILM
loop run)
ILM more descriptive source messages for cluster updates
Use the configured ILM step master timeout setting
(cherry picked from commit ff6c5ed16616eadfcddd9c95317d370f0d126583)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909)
This changes the priority of the cluster state update that stops ILM
altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to
temporarily stop ILM if a cluster is overwhelmed, but a `NORMAL`
priority can see the "stop ILM update" not make it up the tasks queue.
On the same note, we're keeping the `start ILM` cluster update priority
to `NORMAL` on purpose such that we only start `ILM` if the cluster can
handle it.
(cherry picked from commit d67df3a7cd2a8619c2c9efac4dde3ba83271f2fa)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This is a backport of #54803 for 7.x.
This pull request cherry picks the squashed commit from #54803 with the additional commits:
6f50c92 which adjusts master code to 7.x
a114549 to mute a failing ILM test (#54818)
48cbca1 and 50186b2 that cleans up and fixes the previous test
aae12bb that adds a missing feature flag (#54861)
6f330e3 that adds missing serialization bits (#54864)
bf72c02 that adjust the version in YAML tests
a51955f that adds some plumbing for the transport client used in integration tests
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
This is a follow up to a previous commit that renamed MetaData to
Metadata in all of the places. In that commit in master, we renamed
META_DATA to METADATA, but lost this on the backport. This commit
addresses that.
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
Backport of #53982
In order to prepare the `AliasOrIndex` abstraction for the introduction of data streams,
the abstraction needs to be made more flexible, because currently it really can be only
an alias or an index.
* Renamed `AliasOrIndex` to `IndexAbstraction`.
* Introduced a `IndexAbstraction.Type` enum to indicate what a `IndexAbstraction` instance is.
* Replaced the `isAlias()` method that returns a boolean with the `getType()` method that returns the new Type enum.
* Moved `getWriteIndex()` up from the `IndexAbstraction.Alias` to the `IndexAbstraction` interface.
* Moved `getAliasName()` up from the `IndexAbstraction.Alias` to the `IndexAbstraction` interface and renamed it to `getName()`.
* Removed unnecessary casting to `IndexAbstraction.Alias` by just checking the `getType()` method.
Relates to #53100
This commit causes negative TimeValues, other than -1 which is sometimes used as
a sentinel value, to be rejected during parsing.
Also introduces a hack to allow ILM to load policies which were written to the
cluster state with a negative min_age, treating those values as 0, which should
match the behavior of prior versions.
This commit adjusts the aliases used for the ILM and SLM history indices
to be hidden aliases.
Also tweaks the configuration of the `IndexTemplateRegistry`s used by
these history system to only upgrade the template from the master node,
as documents are indexed from the master node, so the template version
should only be upgraded from the master node.
The setting, `xpack.logstash.enabled`, exists to enable or disable the
logstash extensions found within x-pack. In practice, this setting had
no effect on the functionality of the extension. Given this, the
setting is now deprecated in preparation for removal.
Backport of #53367
This avoids NPE when executing SLM policy when no config was provided.
Related to #44465Closes#53171
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Avoid race condition in ILMHistorySotre (#53039)
* Avoid race condition in ILMHistorySotre
This change modifies ILMHistoryStore to always apply correct settings and mappings,
even if template is deleted and not yet recreated. This ensures that ILM history index
is correctly managed by ILM and also fixes flaky history tests that were prone to
triggenring this race.
This commit also refactors and simplifies ILM history tests.
Closes#50353 and #52853
* Review comment
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* fixed tests
* backport #53306
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.
In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
Backport of #52596
* Refactor Inflexible Snapshot Repository BwC (#52365)
Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere.
Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.
This commit adds more logging to the actions that the SLM retention task does. It will help in the
event that we need to diagnose any additional issues or problems while running retention.
We marked the `init` ILM step as retryable but our test used `waitUntil`
without an assert so we didn’t catch the fact that we were not actually
able to retry this step as our ILM state didn’t contain any information
about the policy execution (as we were in the process of initialising
it).
This commit manually sets the current step to `init` when we’re moving
the ilm policy into the ERROR step (this enables us to successfully
move to the error step and later retry the step)
* ShrunkenIndexCheckStep: Use correct logger
(cherry picked from commit f78d4b3d91345a2a8fc0f48b90dd66c9959bd7ff)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Modifies SLM's and ILM's history indices to be hidden indices for added
protection against accidental querying and deletion, and improves
IndexTemplateRegistry to handle upgrading index templates.
Also modifies the REST test cleanup to delete hidden indices.
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
* Adding best_compression (#49974)
This commit adds a `codec` parameter to the ILM `forcemerge` action. When setting the codec to `best_compression` ILM will close the index, then update the codec setting, re-open the index, and finally perform a force merge.
* Fix ForceMergeAction toSteps construction (#51825)
There was a duplicate force merge step and the test continued to fail. This commit clarifies the
`toStep` method and changes the `assertBestCompression` method for better readability.
Resolves#51822
* Update version constants
Co-authored-by: Sivagurunathan Velayutham <sivadeva.93@gmail.com>
Currently when an ILM policy finishes its execution, the index moves into the `TerminalPolicyStep`,
denoted by a completed/completed/completed phase/action/step lifecycle execution state.
This commit changes the behavior so that the index lifecycle execution state halts at the last
configured phase's `PhaseCompleteStep`, so for instance, if an index were configured with a policy
containing a `hot` and `cold` phase, the index would stop at the `cold/complete/complete`
`PhaseCompleteStep`. This allows an ILM user to update the policy to add any later phases and have
indices configured to use that policy pick up execution at the newly added "later" phase. For
example, if a `delete` phase were added to the policy specified about, the index would then move
from `cold/complete/complete` into the `delete` phase.
Relates to #48431
This commit switches the strategy for managing dot-prefixed indices that
should be hidden indices from using "fake" system indices to an explicit
exclusions list that must be updated when those indices are converted to
hidden indices.
This commit deprecates the creation of dot-prefixed index names (e.g.
.watches) unless they are either 1) a hidden index, or 2) registered by
a plugin that extends SystemIndexPlugin. This is the first step
towards more thorough protections for system indices.
This commit also modifies several plugins which use dot-prefixed indices
to register indices they own as system indices, and adds a plugin to
register .tasks as a system index.
* Separate aliases used for tests in TimeSeriesLifecycleActionsIT
This is related to #51375 and hopes to help illuminate why some of those tests are failing. This
commit switches the aliases used in the test to use a random alias name every time (since there were
some complaints in the tests about aliases having more than one write index). With this we hope to
determine the actual cause of the failure in the test.
This also adds additional information to the exception returned when calling move-to-step with the
incorrect current step.
* Fix rest test
* Use ESSingleNodeTestCase instead of ESIntegTestCase (#51345)
(cherry picked from commit abcf1c41faf05a0b0196fb06e57c3de8c3d67688)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>