In the yaml cluster upgrade tests, we start a scroll in a mixed-version cluster,
then attempt to continue the scroll after the upgrade is complete. This test
occasionally fails because the scroll can expire before the cluster is done
upgrading.
The current scroll keep-alive time 5m. This PR bumps it to 10m, which gives a
good buffer since in failing tests the time was only exceeded by ~30 seconds.
Addresses #46529.
The testclusters shutdown code was killing child processes
of the ES JVM before the ES JVM. This causes any running
ML jobs to be recorded as failed, as the ES JVM notices that
they have disconnected from it without being told to stop,
as they would if they crashed. In many test suites this
doesn't matter because the test cluster will never be
restarted, but in the case of upgrade tests it makes it
impossible to test what happens when an ML job is running
at the time of the upgrade.
This change reverses the order of killing the ES process
tree such that the parent processes are killed before their
children. A list of children is stored before killing the
parent so that they can subsequently be killed (if they
don't exit by themselves as a side effect of the parent
dying).
Backport of #50175
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.
Closes#42042
I think the problem is that the master is trying to relocate the "upgraded_scroll" shard back to
the node on which it was previously allocated, but to which it can't be allocated now due to the
shard lock being held because of an in-progress scroll. As the master keeps on retrying and
retrying (and indefinitely tries so because max_retries does not apply to relocations, it blocks
any other lower-prioritized task from completing, which leads to the rolling upgrade tests failing
(see #48395).
Closes#48395
rename internal indexes of transform plugin
- rename audit index and create an alias for accessing it, BWC: add an alias for old indexes to
keep them working, kibana UI will switch to use the read alias
- rename config index and provide BWC to read from old and new ones
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
* Bwc testclusters all (#46265)
Convert all bwc projects to testclusters
* Fix bwc versions config
* WIP fix rolling upgrade
* Fix bwc tests on old versions
* Fix rolling upgrade
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.
There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.
I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.
Other changes:
* Rework `waitForDocs` to scale its timeout. Instead of calling
`assertBusy` in a loop, work out a reasonable overall timeout and await
just once.
* Some tests failed after switching to `assertBusy` and had to be fixed.
* Correct the expect templates in AbstractUpgradeTestCase. The ES
Security team confirmed that they don't use templates any more, so
remove this from the expected templates. Also rewrite how the setup
code checks for templates, in order to give more information.
* Remove an expected ML template from XPackRestTestConstants The ML team
advised that the ML tests shouldn't be waiting for any
`.ml-notifications*` templates, since such checks should happen in the
production code instead.
* Also rework the template checking code in `XPackRestTestHelper` to give
more helpful failure messages.
* Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
When upgrading data nodes to a newer version before
master nodes there was a risk that a transform running
on an upgraded data node would index a document into
the new transforms internal index before its index
template was created. This would cause the index to
be created with entirely dynamic mappings.
This change introduces a check before indexing any
internal transforms document to ensure that the required
index template exists and create it if it doesn't.
Backport of #46553
rename data frame transform plugin to transform:
- rename plugin data-frame to transform
- change all package names from o.e.*.dataframe.* to o.e.*.transform.*
- necessary changes to fix loading/testing
* [ML][Transforms] fixing rolling upgrade continuous transform test (#45823)
* [ML][Transforms] fixing rolling upgrade continuous transform test
* adjusting wait assert logic
* adjusting wait conditions
* [ML][Transforms] allow executor to call start on started task (#46347)
* making sure we only upgrade from 7.4.0 in test
This check was introduced in #41392 but had the unwanted side-effect
that the keystore settings in such blocks would note be added in the
node's keystore. Given that we have a mid-term plan for FIPS testing
that would made such checks unnecessary, and that the conditional
in these two cases is not really that important, this change removes
this conditional logic so that full-cluster-restart and rolling
upgrade tests will run with PEM files for key/certificate material
no matter if we're in a FIPS JVM or not.
Resolves: #45475
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:
- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state
Backport of #45276
This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.
This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.
Backport of #44350
* [ML][Data Frame] Adding bwc tests for pivot transform (#43506)
* [ML][Data Frame] Adding bwc tests for pivot transform
* adding continuous transforms
* adding continuous dataframes to bwc
* adding continuous data frame tests
* Adding rolling upgrade tests for continuous df
* Fixing test
* Adjusting indices used in BWC, and handling NPE for seq_no_stats
* updating and muting specific bwc test
* Adjusting bwc tests for backport
This commit removes some trace logging for the token service in the
rolling upgrade tests. If there is an active investigation here, it
would be best to annotate this line with a comment in the source
indicating such. From my digging, it does not appear there is an active
investigation that relies on this logging, so we remove it.
This commit changes the way token ids are hashed so that the output is
url safe without requiring encoding. This follows the pattern that we
use for document ids that are autogenerated, see UUIDs and the
associated classes for additional details.
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
This commit introduces the `.security-tokens` and `.security-tokens-7`
alias-index pair. Because index snapshotting is at the index level granularity
(ie you cannot snapshot a subset of an index) snapshoting .`security` had
the undesirable effect of storing ephemeral security tokens. The changes
herein address this issue by moving tokens "seamlessly" (without user
intervention) to another index, so that a "Security Backup" (ie snapshot of
`.security`) would not be bloated by ephemeral data.
This commit removes xpack dependencies of many xpack qa modules.
(for some qa modules this will require some more work)
The reason behind this change is that qa rest modules should not depend
on the x-pack plugins, because the plugins are an implementation detail and
the tests should only know about the rest interface and qa cluster that is
being tested.
Also some qa modules rely on xpack plugins and hlrc (which is a valid
dependency for rest qa tests) creates a cyclic dependency and this is
something that we should avoid. Also Eclipse can't handle gradle cyclic
dependencies (see #41064).
* don't copy xpack-core's plugin property into the test resource of qa
modules. Otherwise installing security manager fails, because it tries
to find the XPackPlugin class.
* moved hlrc parsing tests from xpack to hlrc module and removed dependency on hlrc from xpack core
* deprecated old base test class
* added deprecated jdoc tag
* split test between xpack-core part and hlrc part
* added lang-mustache test dependency, this previously came in via
hlrc dependency.
* added hlrc dependency on a qa module
* duplicated ClusterPrivilegeName class in xpack-core, since x-pack
core no longer has a dependency on hlrc.
* replace ClusterPrivilegeName usages with string literals
* moved tests to dedicated to hlrc packages in order to remove Hlrc part from the name and make sure to use imports instead of full qualified class where possible
* remove ESTestCase. from method invocation and use method directly,
because these tests indirectly extend from ESTestCase
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).
Relates to #40366
* [ML] Addressing bug streaming DatafeedConfig aggs from (<= 6.5.4) -> 6.7.0 (#40610)
* Addressing stream failure and adding tests to catch such in the future
* Add aggs to full cluster restart tests
* Test BWC for datafeeds with and without aggs
The wire serialisation is different for null/non-null
aggs, so it's worth testing both cases.
* Fixing bwc test, removing types
* Fixing BWC test for datafeed
* Update 40_ml_datafeed_crud.yml
* Update build.gradle
This change removes the variants of the rolling upgrade and full
cluster restart tests that use or do not use a system key. These tests
were added during 5.x when the system key was still used for security
and now the system key is only used as the watcher encryption key so
duplicating rolling upgrade and full cluster restarts is not needed.
The change here removes the subprojects for testing these scenarios and
defaults to always run with the watcher sensitive values encrypted for
these tests.
This commit enables full-cluster-restart and rolling-upgrade tests
to run with nodes using a JVM in fips approved only node by using
PEM key material instead of a JKS for the transport layer in that
case.