Testclusters currently provides protection from clusters living past the
life of a build by adding a shutdown hook to java. While this works in
some cases, it does not cover all cases like where the daemon is killed
with SIGKILL.
To handle these other cases, this commit replaces the shutdown hooks with
a separate process (one per build) that manages reaping external services
if gradle dies.
This commit adds the commit hash to the global build info, and adds the
git revision as an extension. There are a couple motivations for this
change:
- the current mechanism of getting the build hash does not work with
git worktrees (because jgit does not understand them)
- a follow-up will want to use the git revision when building the
Docker images, so we want it available as an extension
- it allows us to simplify our usage around the build hash as we no
longer have to hack around silliness in the info-scm plugin
A follow-up will also stop using the short hash in the product build, so
that we use the full hash there. We already know that short hashes in
our codebase do collide, so we should move to the full hash to avoid
this problem.
In https://github.com/elastic/elasticsearch/pull/41913 setting up the
temp dir for ES was moved from the env script to individual cli scripts.
However, moving it to the windows service cli was missed. This commit
restores setting up the temp dir for the windows service control script.
Backport of #43177 so that VersionProperties is Java 8 compatible and
can be used by https://github.com/elastic/elasticsearch-hadoop
to retrieve snapshot versions for Lucene.
(cherry picked from commit ec3ac9b62452f04ce44dea0a904a6e2b31dd8076)
A tool to work with snapshots.
Co-authored by @original-brownbear.
This commit adds snapshot tool and the single command cleanup, that
cleans up orphaned files for S3.
Snapshot tool lives in x-pack/snapshot-tool.
(cherry picked from commit fc4aed44dd975d83229561090f957a95cc76b287)
* Detect process third party audit being killed by OOM
It's very common for the third party audit to be killed by the OOM
killer when the system is running low on memory.
Since the forbidden APIs call is expected to fail, we were ignoring
these and incorrectly interpreting the partial output.
With this change we detect and provide a proper error message when this
happens.
The test EmptyDirTaskTests#testCreateEmptyDirNoPermissions may fail on
Windows. However, the test is only meaningful for Unix permissions
structures, so we should assume a Unix-family OS and skip the test on
Windows.
Fixes#44064
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.
Due to recent changes are done for converting `repository-hdfs` to test
clusters (#41252), the `integTestSecure*` tasks did not depend on
`secureHdfsFixture` which when running would fail as the fixture
would not be available. This commit adds the dependency of the fixture
to the task.
The `secureHdfsFixture` is a `AntFixture` which is spawned a process.
Internally it waits for 30 seconds for the resources to be made available.
For my local machine, it took almost 45 seconds to be available so I have
added the wait time as an input to the `AntFixture` defaults to 30 seconds
and set it to 60 seconds in case of secure hdfs fixture.
The integ test for secure hdfs was disabled for a long time and so
the changes done in #42090 to fix the tests are also done in this commit.
* Improoce how log is tailed in testclusters on failure
- only print last few lines
- print all errors and warnings
- compact repeating errors and warnings
* Test fixtures improovements
Don't disable some of the precommit tasks on fixtures.
This no longer makes sense now that a project can both produce and use a
fixture.
In order for this to be possible, had to add an additional configuration
to make JarHell class accessible to the task even if it's not a
dependency of the project and fix some of the third party audit fallout
from #43671 which wasn't detected at the time due to the issue being
fixed here.
Closes#43918
* TestClusters: Convert the security plugin
This PR moves security tests to use TestClusters.
The TLS test required support in testclusters itself, so the correct
wait condition is configgured based on the cluster settings.
* PR review
Several types of distributions are built and tested in elasticsearch,
ranging from the current version, to building or downloading snapshot or
released versions. Currently tests relying on these have to contain
logic deciding where and how to pull down these distributions.
This commit adds an distributiond download plugin for each project to
manage which versions and variants the project needs. It abstracts away
all need for knowing where a particular version comes from, like a local
distribution or bwc project, or pulling from the elastic download
service. This will be used in a followup PR by the testclusters and
vagrant tests.
When starting BWC nodes, it could be that runtime Java home is set. Yet,
runtime Java home can advance beyond what a BWC node might be compatible
with. For example, if runtime Java home is set to JDK 13 and we are
starting a 7.1.2 node, we do not have any guarantees that 7.1.2 is
compatible with JDK 13 (since we never did any work to make it so). This
will continue to be the case as JDK releases advance, but we still need
to test against BWC nodes. This commit stops applying runtime Java home
when starting a BWC node. Instead, we would use the bundled JDK.
We initially added `requireDocker` for a way for tasks to say that they
absolutely must have it, like the build docker image tasks.
Projects using the test fixtures plugin are not in this both, as the
intent with these is that they will be skipped if docker and docker-compose
is not available.
Before this change we were lenient, the docker image build would succeed
but produce nothing. The implementation was also confusing as it was not
immediately obvious this was the case due to all the indirection in the
code.
The reason we have this leniency is that when we added the docker image
build, docker was a fairly new requirement for us, and we didn't have
it deployed in CI widely enough nor had CI configured to prefer workers
with docker when possible. We are in a much better position now.
The other reason was other stack teams running `./gradlew assemble`
in their respective CI and the possibility of breaking them if docker is
not installed. We have been advocating for building specific distros for
some time now and I will also send out an additional notice
The PR also removes the use of `requireDocker` from tests that actually
use test fixtures and are ok without it, and fixes a bug in test
fixtures that would cause incorrect configuration and allow some tasks
to run when docker was not available and they shouldn't have.
Closes #42680 and #42829 see also #42719
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.
This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
This commit removes the jdk11 download in vagrant provisioning and
converts it to using the jdk downloader for the system jdk, and sets up
a separate jdk for use by the test (which will be converted to running
gradle in a followup).
This commit adds a guard around reading the spooled LoggedExec output.
It is possible the exec command did not output anything, and failed,
which would trigger a failure to read the output file.
This commit fixes the logging in LoggedExec which uses an in memory
buffer to read from a local reference, instead of with
getStandardOutput() of the Exec task. This is due to gradle internally
wrapping with a TeeOutputStream, breaking our cast.
Previously we used LoggedExec for running the internal bwc builds.
However, this had bad performance implications as all the output was
buffered into memory, thus we changed back to normal Exec. This commit
adds a `spoolOutput` setting to LoggedExec which can be used for
commands with large amounts of output, and switches the bwc builds to
use this flag.
* Fix slow sync test clustres artifacts task
The task was mistakenly adding a combinational explosion of task
actions all doing the same thing.
With this PR this is fixed and each version - distribution pair is only
extracted once.
I appologieze for the SSD wear.
* Look for configurations on the root project
* Add dependency on configurations
* This should be a `copy` so we don't blow away all the other distros
* Don't copy example plugin build directory in integration tests
This commit disables rhel 8 from being tested in vagrant packaging
tests. The vagrant image we use is beta release, but RHEL 8 was just
released, which has caused the package mirrors for the beta to stop
working.
This commit reworks the tests for jdk download to test the old and new
url pattern from oracle. Additionally it limits to one repository
created per version, based on the old or new pattern, and restricts
other repositories from trying to resolve jdks.
closes#41998
We currently download 3 variants of the same version of the jdk for
bundling into the distributions. Additionally, the vagrant images do
their own downloading. This commit moves the jdk downloading into a
utility gradle plugin. This will be used in a future PR by the packaging
tests.
The new plugin exposes a "jdks" project extension which allows creating
named jdks. Once the jdk version and platform are set for a named jdk,
the jdk object may be used as a lazy String for the jdk home path, or a
file collection for copying.
testclusters detect from settings that security is enabled
if a user is not specified using the DSL introduced in this PR, a default one is created
the appropriate wait conditions are used authenticating with the first user defined in the DSL ( or the default user ).
an example DSL to create a user is user username:"test_user" password:"x-pack-test-password" role: "superuser" all keys are optional and default to the values shown in this example
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
* Revert "Revert "Clean up clusters between tests (#41187)""
This reverts commit 9efc853aa668e285ede733d37b6fc7a0f4b02041.
* Remove the jdk directory to save space on bwc tests
This PR adresses the same concern as #41187 in a different way.
It removes only the JDK from the distribution once the cluster stops,
so we keep the same disk space requirements as before adding the JDK.
This is still a temporary measure, testclusters already deals with this
by doing the equivalent of `cp -l` instead of an actual copy.
Today we allow adding entries from a file or from a string, yet we
internally maintain this distinction such that if you try to add a value
from a file for a setting that expects a string or add a value from a
string for a setting that expects a file, you will have a bad time. This
causes a pain for operators such that for each setting they need to know
this difference. Yet, we do not need to maintain this distinction
internally as they are bytes after all. This commit removes that
distinction and includes logic to upgrade legacy keystores.
To reduce configuration time, we fork some threads to compute the Java
version for the various configured Javas. However, as the number of
JAVA${N}_HOME variable increases, the current implementation creates as
many threads as there are such variables, which could be more than the
number of physical cores on the machine. It is not likely that we would
see benefits to trying to execute all of these once beyond the number of
physical cores (maybe simultaneous multi-threading helps though, who
knows. Therefore, this commit limits the parallelization here to the
number number of physical cores.
If no Java versions are set then when we size the executor thread pool
we end up with zero threads, which is illegal. This commit avoids that
problem by only starting the executor when needed.
ClusterFormationTasks auto configured these properties for clusters.
This PR adds FIPS specific configuration across all test clusters from
the main build script to prevent coupling betwwen testclusters and the
build plugin.
Closes#40904
This will help with reproduction lines and running tests form IDEs and
other operations that are quick and executed often enough for the
configuration time to matter.
Running Gradle with a FIPS JVM is not supproted, so if the runtime JVM
is the same one, no need to spend time checking for fips support.
Verification of the JAVA<version>_HOME env vars is now async and
parallel so it doesn't block configuration.
This PR adds additional cleanup when stopping the node.
The data dir is excepted because it gets reused in some tests.
Without this cleanup the number of working dir copies could grew to
exhaust all available disk space.
This is related to #36652. We intend to deprecate a number of transport
settings in 7.x and remove them in 8.0. This commit removes the string
usages of these settings.
With the 7.0.0 release, we switched to download the packages instead of
using locally built ones.
This PR fixes the artifact names to include the architecture as
introduced in the 7.0.0 release.
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
This commit sets the version to ensure that we use the bundled Java when
running integration tests for all eligible versions. In particular,
since we started bundling Java with 7.0.0, this commits sets said
version to 7.0.0.
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).
Relates to #40366
By default, in integ tests we wait for the standalone cluster to start
by using the ant Get task to retrieve the cluster health endpoint.
However the ant task has no facilities for customising the trusted
CAs for a https resource, so if the integ test cluster has TLS enabled
on the http interface (using a custom CA) we need a separate utility
for that purpose.
Backport of: #40573
Replaces the vagrant based kerberos fixtures with docker based test fixtures plugin.
The configuration is now entirely static on the docker side and no longer driven by Gradle,
also two different services are being configured since there are two different consumers of the fixture that can run in parallel and require different configurations.
* Add support for setting and keystore settings
* system properties and env var config
* use testclusters for repository-s3
* Some cleanup of the build.gradle file for plugin-s3
* add runner {} to rest integ test task
The platformTest gradle task was a packaging test meant to ensure unit
tests run on the various supported operating systems without relying on
CI to maintain a full matrix of platforms. Howevever, it never really
worked out as intended and is now additional code in our vagrant setup
to maintain. This commit removes the platformTest task.
* Revert "Configure TMP for test nodes on Windows (#39959)"
This reverts commit 97562a874fcb1f29fb05272ab860a0307e97d1aa.
* Configure a tmp dir without spaces
* Pass on TMP instead of changing it
Here are the highlights of this release:
- Feature variants AKA "optional dependencies"
- Type-safe accessors in Kotlin precompiled script plugins
- Gradle Module Metadata 1.0
For more details see https://docs.gradle.org/5.3/release-notes.html
This commit adds a variant for every official distribution that omits
the bundled jdk. The "no-jdk" naming is conveyed through the package
classifier, alongside the platform. Package tests are also added for
each new distribution.
This breaks on windows where TMP dir default to C:\Windows and startup
fails with a permission error.
I tried to create a tmp dir and pass in `TMP` env, but it lead to a
class not found error, and since testclusers is already independent of
the calling environment I stopped there.
The changes in #39732 mean that nodes in the IntegTest clusters will
now run with whichever java version is defined as `runtime.java` and
not JAVA_HOME anymore.
This means that these nodes will also run in JVM with fips approved
mode enabled and as such, need to have access to the password for the
BCFKS keystore that is used as the default keystore/truststore.
This change sets the two necessary system properties.
Resolves#39855
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
* Back port build changes from #39102
This back-ports how versions are determined and bwc test are set up from
#39102 without enabling the bwc from current version tests so it's
easier/possible to backmerge future buld changes.
It's expected that the tets are lacking many of the required fixes in
this version to enable them.
* methods to run bin script
* Add support for specifying and installing plugins
* Add OS specific distirbution support
* Add test to verify plugin installed
* Remove use of Gradle internal OperatingSystem
* Un-mute and fix BuildExamplePluginsIT
There doesn't seem to be anything wrong with the test iteself.
I think the failure were CI performance related, but while it was muted,
some failures managed to sneak in.
Closes#38784
* PR review
When test clusters are stood up, one of the steps in the wait task is to wait for
ports files to appear. An exception throw was added if this were to time out
instead of failing with no information, but the exception text uses a missing
variable which further obfuscates the problem.
Backports #39321
Packaging tests did not honor bwc tests being off.
This was also the reason for which we were building the BWC versinons
even if the tests are off, so this closes#35347.
This fixes a bug in the sensing of the current OS family in the test cluster
formation code. Previously all builds would assume every environment
was windows and would jump to using the windows zip build. This fixes
the OS sensing code as well as updates some tests to account for
different build flavors.
Backport of #38457
This commit fixes a bug which resulted in every RandomizedTestingTask
generating its own unique test seed rather than using the global test
seed generated by the the RandomizedTestingPlugin.
The issue didn't affect build invocations which explicitly ran with
-Dtest.seed. In those cases, the Ant junit task correctly uses the
global seed.
Since the Ant random testing target looks for an Ant project property
for determining the existing seed we now explicitly set this inside
RandomizedTestingPlugin. It just so happens that, incidentally, Ant will
inherit system properties, thus wy the -Dtest.seed mechanism continued
to work.
In the ClusterConfiguration class of the build source, there is an Ant waitfor block
that runs to ensure that the seed node's transport ports file is created before
trying to read it. If the wait times out, the file read fails with not much helpful info.
This just adds a timeout property to the waitfor block and throws a descriptive
exception instead.
Backports #37574
This commit moves validation logic for ensuring our testclusters
configuration doesn't contain unexpected artifacts into the plugin
itself. This change allows us to remove the custom copy task
implementation altogether.
Additionally, the error message has been improved to display component
ids in addition to the artifacts to make it easier to figure out what
actual dependency is at fault.
Handle the case of `Description` being null which is a valid case as
described in the `HeartBeatEvent`'s javadoc, which previously resulted
in exceptions that "pollute" the build output.
Follows: #28563
Backport: #38799
Recently we changed where we source released artifacts for usage in
backwards compatibility tests. We now source these from
artifacts.elastic.co. To avoid polluting the download stats from builds,
we want to add the X-Elastic-No-KPI header to requests from
artifacts.elastic.co. To do this, we hack the Ivy feature of custom HTTP
header credentials and specify our desired headers.
When we are preparing to release a major version the rules around
"unreleased" versions and branches get a bit more complex.
This change implements the following rules:
- If the tip version on the previous major is a .0 (e.g. 6.7.0) then
the tip of the minor before that (e.g. 6.6.1) must be unreleased.
(This is because 6.7.0 would be "staged" in preparation for release,
but 6.6.1 would be open for bug fixes on the release 6.6.x line)
(in VersionCollection & VersionUtils)
- The "major.x" branch (if it exists) will always point to the latest
minor in that series. Anything that is not the latest minor, must
therefore be on a the "major.minor" branch
For example, if v7.1.0 exists then the "7.x" branch must be 7.1.0,
and 7.0.0 must be on the "7.0" branch
(in VersionCollection)
This commit adds the 7.1 version constant to the 7.x branch.
Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
Renames the following settings to remove the mention of `zen` in their names:
- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in
tests, but for 7.x nodes this setting is not required as it has no effect.
This commit removes this setting so that nodes are started with more realistic
configurations, and deprecates it.
The script is used to create a cache on ephemeral CI workers.
Changes:
- create and use a `pullFixture` task that always exists regardless
of docker support
- wire dependencies correctly so any pre fixture setup runs for pull
as well
- set up java env vars so bwc versions can build
This commit adds classifiers to the distributions indicating the
OS (for archives) and platform. The current OSes are for windows, darwin (ie
macos) and linux. This change will allow future OS/architecture specific
changes to the distributions. Note the docs using distribution links
have been updated, but will be reworked in a followup to make OS
specific instructions for the archives.
This commit fixes the distribution flavor passed to the docs tests to be
the same as the distribution. These two values are now in sync (either
oss or default) for the docs tests.
- Cluster logs are only indented as node name is already in the logs
- silence logging on shutdown
- have fully qualified name as node and cluster name
Reverts #36259 in part to make randomized test fail if no tests are ran.
This is useful when filtering tests as it's easy to make a typo and
think the test ran trough successfully.
* Testing conventions now checks for tests in main
This is the last outstanding feature of the old NamingConventionsTask,
so time to remove it.
* PR review
This change adds a docker compose configuration that's used with
the `elasticsearch.test.fixtures` plugin to start up the image
and check that the TCP ports are up.
We can build on this to add other checks for culster health,
run REST tests, etc.
We can add multiple containers and configurations to the compose
file (e.x. test different env vars) and form clusters.