This upgrade required a few significant changes. Firstly, the build
scan plugin has been renamed, and changed to be a Settings plugin rather
than a project plugin so the declaration of this has moved to our
settings.gradle file. Second, we were using a rather old version of the
Nebula ospackage plugin for building deb and rpm packages, the migration
to the latest version required some updates to get things working as
expected as we had some workarounds in place that are no longer
applicable with the latest bug fixes.
(cherry picked from commit 87f9c16e2f8870e3091062cde37b43042c3ae1c5)
The docker build task depends on the docker context being built, but it
was not explicitly setup as an input. This commit adds the task as an
input to the docker build.
relates #49613
Backport of #49079. Reimplement a number of the tests from
elastic/elasticsearch-docker.
There is also one Docker image fix here, which is that two of the provided
config files had different file permissions to the rest. I've fixed this
with another RUN chmod while building the image, and adjusted the
corresponding packaging test.
This commits sets an output marker file for the docker build tasks so
that it can be tracked as up to date. It also fixes the docker build
context task to omit the build date as in input property which always
left the task as out of date.
relates #49359
Backport of #47573.
Closes#43603. Allow environment variables to be passed to ES in a Docker
container via a file, by setting an environment variable with the `_FILE`
suffix that points to the file with the intended value of the env var.
The Apache Commons Daemon has some helpful features for Java
applications, like nice little next boxes for min heap, max heap, and
thread stack size. Our elasticsearch-service.bat script parses those
values out of the ES_JAVA_OPTS environment variable and provides them to
the Apache Commons Daemon invocation command in order to provide
sensible defaults. However, we failed to remove those values from the
ES_JAVA_OPTS environment variable, which meant they ended up in the
"Java Options" text box and would, from there, override whatever the
user put in the specific boxes for heap size or thread stack size.
This commit modifies the loop that parses ES_JAVA_OPTS to construct a
new enviroment variable containing only the values that aren't parsed
out for heap size or thread stack size, then uses that new enviroment
variable in the commons daemon invocation command.
JDK 14 has removed CMS. This commit restricts the support for CMS to JDK
8 through JDK 13, and defaults to G1 GC on JDK 14. We will revisit all
defaults in the future, but this ensures that we run with a
properly-configured garbage collector on JDK 14+.
This fixes a regression introduced in #42042. The logic here was
mistakenly inverted such that we only run these tests in a FIPS JVM
which is the opposite of what we intend.
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
This commit fixes the names of the UBI-based Docker build contexts to
lift the ubi component of the name into the archive base name, instead
of the classifier.
Backport of #46599 and #47640. Add packaging tests for Docker.
* Introduce packaging tests for Docker (#46599)
Closes#37617. Add packaging tests for our Docker images, similar to what
we have for RPMs or Debian packages. This works by running a container and
probing it e.g. via `docker exec`. Test can also be run in Vagrant, by
exporting the Docker images to disk and loading them again in VMs. Docker
is installed via `Vagrantfile` in a selection of boxes.
* Only define Docker pkg tests if Docker is available (#47640)
Closes#47639, and unmutes tests that were muted in b958467.
The Docker packaging tests were being defined irrespective of whether
Docker was actually available in the current environment. Instead,
implement exclude lists so that in environments where Docker is not
available, no Docker packaging tests are defined. For CI hosts, the build
checks `.ci/dockerOnLinuxExclusions`. The Vagrant VMs can defined the
extension property `shouldTestDocker` property to opt-in to packaging
tests.
As part of this, define a seperate utility class for checking Docker,
and call that instead of defining checks in-line in BuildPlugin.groovy
This commit fixes the name of the UBI-based Docker image build contexts
to include "7" (to set us up for the future where we are likely to have
a ubi8-based image).
This commit registers the UBI-based Docker image projects in the build
so that their assemble tasks are executed when the top-level assemble
task is executed.
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.
Closes#42042
This commit simplifies the JDK copy specification in the archives build
so that it's a single line as opposed to an if/else with a repeated
body. This approach reduces the maintenance cost of this code.
* Always pass user-specified MaxDirectMemorySize
We had been testing whether a user had passed a value for
MaxDirectMemorySize by parsing the output of "java -XX:PrintFlagsFinal
-version". If MaxDirectMemorySize equals zero, we set it to half of max
heap. The problem is that on Windows with JDK 8, a JDK bug incorrectly
truncates values over 4g and returns multiples of 4g as zero. In order
to always respect the user-defined settings, we need to check our input
to see if an "-XX:MaxDirectMemorySize" value has been passed.
* Always warn for Windows/jdk8 ergo issue
Even if a user has set MaxDirectMemorySize, they aren't future-proof for
this JDK bug. With this change, we issue a general warning for the
windows/JDK8 issue, and a specific warning if MaxDirectMemorySize is
unset.
After some consideration, we are electing to make "ubi" part of the
image name instead of part of the tag. This commit implements that
change for the Elasticsearch UBI-based Docker images.
This commit bumps the bundled JDK to 13.0.1+9. Since AdoptOpenJDK did
not release 13.0.1+9 for Windows, this commit also enables that the
bundled JDK version can vary by platform.
This commit moves JVM options that we are setting on behalf of the user
that we do not expect them to fiddle with out of the jvm.options
configuration file and into the JVM options parser. In this way, we
discourage fiddling with these settings, but more importantly, we ensure
that as we evolve or add to these settings that a user would pick these
pick instead of being left behind if they have a modified jvm.options
file and do not pick any new that come with the distribution.
Our JVM ergonomics extract max heap size from JDK PrintFlagsFinal output.
On JDK 8, there is a system-dependent bug where memory sizes are cast to
32-bit integers. On affected systems (namely, Windows), when 1/4 of physical
memory is more than the maximum integer value, the output of PrintFlagsFinal
will be inaccurate. In the pathological case, where the max heap size would
be a multiple of 4g, the test will fail.
The practical effect of this bug, beyond test failures, is that we may set
MaxDirectMemorySize to an incorrect value on Windows. This commit adds a
warning about this situation during startup.
This commit removes the option to change the netty system properties to
reenable the direct buffer pooling. It also removes the need for us to
disable the buffer pooling in the system properties file. Instead, we
programmatically craete an allocator that is used by our networking
layer.
This commit does introduce an Elasticsearch property which allows the
user to fallback on the netty default allocator. If they choose this
option, they can configure the default allocator how they wish using the
standard netty properties.
This test has been failing very frequently on Windows checks for 7.4. We've also seen it for 7.x as well as for the new 7.5 tests. I'm muting during investigation so we can see if this is masking any other issues.
* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
* Use versions specific distribution folders so we don't need to clean up (#46539)
* Retry deleting distro dir on windows
When retarting the cluster we clean up old distribution files that might
still be in use by the OS.
Windows closes resources of ded processes async, so we do a couple of
retries to get arround it.
Closes#46014
* Avoid having to delete the distro folder.
* Remove the use of ClusterFormationTasks form RestTestTask (#47022)
This PR removes a use-case of the ClusterFormationTasks and converts a
project that flew under the radar so far.
There's probably more clean-up possible here, but for now the goal is
to be able to remove that code after `RunTask` is also updated.
* Migrate some 7.x only projects
Compilation was accidentally broken here when a backport used code from
JDK 9, which is not supported in 7.x. This commit addresses this by
using JDK 8 compatiable APIs.
This commit moves the ES_TMPDIR substitution that we do for JVM options
into the JVM options parser itself. This solves a problem where the fact
that the we do not make the substitution before ergonomics parsing can
lead to the JVM that we start for computing the ergonomic values failing
to start. Additionally, moving this substitution here enables us to
simplify the shell scripts since we do not need to implement this there,
and twice for Bash and Windows.
This PR makes the necesary adaptations to the tests and adds a power shell script to
invoke the OS tests on GCP instances connected as CI workers.
Also noticed that logs were not being produced by the tests and that theses were not using log4j so fixed that too.
One of the difficulties in working on theses tests was that the tests just stalled with no indication where the problem is.
To ease with the debugging, after process explorer suggested that the tests are running some commands, we now have multiple timeouts: one for the tests ( which will generate a thread dump ) and one for individual commands ( that bails with the command being ran and output and error so far ) to make it easier to see what went wrong.
The tests were blocking because apparently the pipes to the sub-process were not closing, thus the threads were blocking on them and we were blocking indefinitely on the join. I'm not sure why this doesn't happen in vagrant, but we now properly deal with it.
Since the bundled jdk was added to Elasticsearch, there are now 2 ways
java can be missing. Either JAVA_HOME is set but does not exist, or the
bundled jdk does not exist. This commit improves the error messages in
those two cases, and also ensures our tests cover both cases.
Looks like there's a workaround with aufs used in debian 8.
Adding `tsflags=nodocs` works around this issue and results in smaller
image files also.
Closes#47097 and elastic/infra#14780
This is the Java side of https://github.com/elastic/ml-cpp/pull/593
with a fallback so that ml-cpp bundles with either the
new or old directory structure work for the time being.
A few days after merging the C++ changes a followup to
this change will be made that removes the fallback.
Relates to #47007 . the `gradle-ospackage-plugin` plugin doesn't
properly support symlink on windows.
This PR changes the way we configure tasks to prevent building these
packages as part of a windows check.
G1 GC were setup to use an `InitiatingHeapOccupancyPercent` of 75. This
could leave used memory at a very high level for an extended duration,
triggering the real memory circuit breaker even at low activity levels.
The value is a threshold for old generation usage relative to total heap
size and thus it should leave room for the new generation. Default in
G1 is to allow up to 60 percent for new generation and this could mean that the
threshold was effectively at 135% heap usage. GC would still kick in of course and
eventually enough mixed collections would take place such that adaptive adjustment
of IHOP kicks in.
The JVM has adaptive setting of the IHOP, but this does not kick in
until it has sampled a few collections. A newly started, relatively
quiet server with primarily new generation activity could thus
experience heap above 95% frequently for a duration.
The changes here are two-fold:
1. Use 30% default for IHOP (the JVM default of 45 could still mean
105% heap usage threshold and did not fully ensure not to hit the
circuit breaker with low activity)
2. Set G1ReservePercent=25. This is used by the adaptive IHOP mechanism,
meaning old/mixed GC should kick in no later than at 75% heap. This
ensures IHOP stays compatible with the real memory circuit breaker also
after being adjusted by adaptive IHOP.
This PR adds some restrictions around testfixtures to make sure the same service ( as defiend in docker-compose.yml ) is not shared between multiple projects.
Sharing would break running with --parallel.
Projects can still share fixtures as long as each has it;s own service within.
This is still useful to share some of the setup and configuration code of the fixture.
Project now also have to specify a service name when calling useCluster to refer to a specific service.
If this is not the case all services will be claimed and the fixture can't be shared.
For this reason fixtures have to explicitly specify if they are using themselves ( fixture and tests in the same project ).
This commit disables caching of BWC snapshot distributions in the "trunk" (aka master) branch.
Since the previous major release branches move quickly we rarely get cache hits for these
tasks, and the artifacts themselves are very large. This means the overhead here is high and
savings basically zero. We conditionally disable task output caching in this scenario in CI to
avoid excessive build cache overhead as well as causing too much turn in the cache itself which
would lead to lots of cache entry evictions.
This commit teaches the build how to bundle AdoptOpenJDK with our
artifacts, and switches to AdoptOpenJDK as the bundled JDK. We keep the
functionality to also bundle Oracle OpenJDK distributions.
We currently configure io.netty.allocator.numDirectArenas to be 0 in the
jvm erconomics class. This is a config that we always want to set, so it
makes sense to move it to jvm.options.
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field. It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.
This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).
This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
In the Sys V init scripts, we check for Java. This is not needed, since
the same check happens in elasticsearch-env when starting up. Having
this duplicate check has bitten us in the past, where we made a change
to the logic in elasticsearch-env, but missed updating it here. Since
there is no need for this duplicate check, we remove it from the Sys V
init scripts.
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.
This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.
Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).