This commit moves JVM options that we are setting on behalf of the user
that we do not expect them to fiddle with out of the jvm.options
configuration file and into the JVM options parser. In this way, we
discourage fiddling with these settings, but more importantly, we ensure
that as we evolve or add to these settings that a user would pick these
pick instead of being left behind if they have a modified jvm.options
file and do not pick any new that come with the distribution.
Our JVM ergonomics extract max heap size from JDK PrintFlagsFinal output.
On JDK 8, there is a system-dependent bug where memory sizes are cast to
32-bit integers. On affected systems (namely, Windows), when 1/4 of physical
memory is more than the maximum integer value, the output of PrintFlagsFinal
will be inaccurate. In the pathological case, where the max heap size would
be a multiple of 4g, the test will fail.
The practical effect of this bug, beyond test failures, is that we may set
MaxDirectMemorySize to an incorrect value on Windows. This commit adds a
warning about this situation during startup.
This commit removes the option to change the netty system properties to
reenable the direct buffer pooling. It also removes the need for us to
disable the buffer pooling in the system properties file. Instead, we
programmatically craete an allocator that is used by our networking
layer.
This commit does introduce an Elasticsearch property which allows the
user to fallback on the netty default allocator. If they choose this
option, they can configure the default allocator how they wish using the
standard netty properties.
This test has been failing very frequently on Windows checks for 7.4. We've also seen it for 7.x as well as for the new 7.5 tests. I'm muting during investigation so we can see if this is masking any other issues.
* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
* Use versions specific distribution folders so we don't need to clean up (#46539)
* Retry deleting distro dir on windows
When retarting the cluster we clean up old distribution files that might
still be in use by the OS.
Windows closes resources of ded processes async, so we do a couple of
retries to get arround it.
Closes#46014
* Avoid having to delete the distro folder.
* Remove the use of ClusterFormationTasks form RestTestTask (#47022)
This PR removes a use-case of the ClusterFormationTasks and converts a
project that flew under the radar so far.
There's probably more clean-up possible here, but for now the goal is
to be able to remove that code after `RunTask` is also updated.
* Migrate some 7.x only projects
Compilation was accidentally broken here when a backport used code from
JDK 9, which is not supported in 7.x. This commit addresses this by
using JDK 8 compatiable APIs.
This commit moves the ES_TMPDIR substitution that we do for JVM options
into the JVM options parser itself. This solves a problem where the fact
that the we do not make the substitution before ergonomics parsing can
lead to the JVM that we start for computing the ergonomic values failing
to start. Additionally, moving this substitution here enables us to
simplify the shell scripts since we do not need to implement this there,
and twice for Bash and Windows.
This PR makes the necesary adaptations to the tests and adds a power shell script to
invoke the OS tests on GCP instances connected as CI workers.
Also noticed that logs were not being produced by the tests and that theses were not using log4j so fixed that too.
One of the difficulties in working on theses tests was that the tests just stalled with no indication where the problem is.
To ease with the debugging, after process explorer suggested that the tests are running some commands, we now have multiple timeouts: one for the tests ( which will generate a thread dump ) and one for individual commands ( that bails with the command being ran and output and error so far ) to make it easier to see what went wrong.
The tests were blocking because apparently the pipes to the sub-process were not closing, thus the threads were blocking on them and we were blocking indefinitely on the join. I'm not sure why this doesn't happen in vagrant, but we now properly deal with it.
Since the bundled jdk was added to Elasticsearch, there are now 2 ways
java can be missing. Either JAVA_HOME is set but does not exist, or the
bundled jdk does not exist. This commit improves the error messages in
those two cases, and also ensures our tests cover both cases.
Looks like there's a workaround with aufs used in debian 8.
Adding `tsflags=nodocs` works around this issue and results in smaller
image files also.
Closes#47097 and elastic/infra#14780
This is the Java side of https://github.com/elastic/ml-cpp/pull/593
with a fallback so that ml-cpp bundles with either the
new or old directory structure work for the time being.
A few days after merging the C++ changes a followup to
this change will be made that removes the fallback.
Relates to #47007 . the `gradle-ospackage-plugin` plugin doesn't
properly support symlink on windows.
This PR changes the way we configure tasks to prevent building these
packages as part of a windows check.
G1 GC were setup to use an `InitiatingHeapOccupancyPercent` of 75. This
could leave used memory at a very high level for an extended duration,
triggering the real memory circuit breaker even at low activity levels.
The value is a threshold for old generation usage relative to total heap
size and thus it should leave room for the new generation. Default in
G1 is to allow up to 60 percent for new generation and this could mean that the
threshold was effectively at 135% heap usage. GC would still kick in of course and
eventually enough mixed collections would take place such that adaptive adjustment
of IHOP kicks in.
The JVM has adaptive setting of the IHOP, but this does not kick in
until it has sampled a few collections. A newly started, relatively
quiet server with primarily new generation activity could thus
experience heap above 95% frequently for a duration.
The changes here are two-fold:
1. Use 30% default for IHOP (the JVM default of 45 could still mean
105% heap usage threshold and did not fully ensure not to hit the
circuit breaker with low activity)
2. Set G1ReservePercent=25. This is used by the adaptive IHOP mechanism,
meaning old/mixed GC should kick in no later than at 75% heap. This
ensures IHOP stays compatible with the real memory circuit breaker also
after being adjusted by adaptive IHOP.
This PR adds some restrictions around testfixtures to make sure the same service ( as defiend in docker-compose.yml ) is not shared between multiple projects.
Sharing would break running with --parallel.
Projects can still share fixtures as long as each has it;s own service within.
This is still useful to share some of the setup and configuration code of the fixture.
Project now also have to specify a service name when calling useCluster to refer to a specific service.
If this is not the case all services will be claimed and the fixture can't be shared.
For this reason fixtures have to explicitly specify if they are using themselves ( fixture and tests in the same project ).
This commit disables caching of BWC snapshot distributions in the "trunk" (aka master) branch.
Since the previous major release branches move quickly we rarely get cache hits for these
tasks, and the artifacts themselves are very large. This means the overhead here is high and
savings basically zero. We conditionally disable task output caching in this scenario in CI to
avoid excessive build cache overhead as well as causing too much turn in the cache itself which
would lead to lots of cache entry evictions.
This commit teaches the build how to bundle AdoptOpenJDK with our
artifacts, and switches to AdoptOpenJDK as the bundled JDK. We keep the
functionality to also bundle Oracle OpenJDK distributions.
We currently configure io.netty.allocator.numDirectArenas to be 0 in the
jvm erconomics class. This is a config that we always want to set, so it
makes sense to move it to jvm.options.
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field. It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.
This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).
This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
In the Sys V init scripts, we check for Java. This is not needed, since
the same check happens in elasticsearch-env when starting up. Having
this duplicate check has bitten us in the past, where we made a change
to the logic in elasticsearch-env, but missed updating it here. Since
there is no need for this duplicate check, we remove it from the Sys V
init scripts.
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.
This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.
Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
* Add input and outut tracking of built bwc versions
This PR adds tracking of the bwc versions git has as input and all the
expected files as output.
The effect is that `gradlew` is not called at all when the git has
doesn't change and the version was allready built.
Previusly gradlew would be called for the bwc version and it would have
to configure the project and go trough up to date checks to figure out
that nothing changed.
This helps when working on bwc tests locally needing to run the test
multiple times.
This should also help in CI not re-build bwc versions across different
runs.
* Enable caching of bwc builds
This commit addresses an issue when trying to using Elasticsearch on
systems with Sys V init and the bundled JDK was not being used. Instead,
we were still inadvertently trying to fallback on the path. This commit
removes that fallback as that is against our intentions for 7.x where we
only support the bundled JDK or an explicit JDK via JAVA_HOME.
This commit makes the gitRevision property a lazy loaded value by
returning an Object implementing toString(). The Dockerfile template is
also changed to use groovy templates instead of the mavenfilter hack, so
converting to String will not happen until runtime.
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.
This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
Prior to this PR we always checked out the latest bwc branches and had
an external mechanism to store the bwc versions used for every CI run so
we could both reproduce those builds and run additional tests using the
same combination.
This adds complexities in setting up and maintaining CI and makes it
difficult to set up multi jobs.
This change replaces that mechanism with a time based approach
that looks at the commit date of the current revision and picks the
newest on the bwc branch that's still older than that.
It also makes sure there are no merge commits in this interval.
This new behavior will is ment to be enabled in CI only, for everything
except PR checks that will still use last available bwc revision.
This commit applies a normalization process to environment paths, both
in how they are stored internally, also their settings values. This
normalization is done via two means:
- we make the paths absolute
- we remove redundant name elements from the path (what Java calls
"normalization")
This change ensures that when we compare and refer to these paths within
the system, we are using a common ground. For example, prior to the
change if the data path was relative, we would not compare it correctly
to paths from disk usage. This is because the paths in disk usage were
being made absolute.
The org.label-schema labels on Docker images have been superseded by
pre-defined OCI annotations. However, there is still a lot of tooling in
use that relies on the org.label-schema, so we do not want to drop
them. This commit adds values for the org.opencontainers.image
pre-defined annotation keys. Additionally, we correct an issue with the
label used to represent the license, to use the org.label-schema.license
label. While this label was never accepted into the org.label-schema
specfication (because this specification was superseded, it's not that
it was explicitly rejected) there are containers out there using this
label. In particular, our base image is and so we need to override
otherwise we inherit, and end up mis-reporting the license.
Today our systemd service defaults to a service type of simple. This
means that systemd assumes Elasticsearch is ready as soon as the
ExecStart (bin/elasticsearch) process is forked off. This means that the
service appears ready long before it actually is, so before it is ready
to receive requests. It also means that services that want to depend on
Elasticsearch being ready to start can not as there is not a reliable
mechanism to determine this. This commit changes the service type to
notify. This requires that Elasticsearch sends a notification message
via libsystemd sd_notify method. This commit does that by using JNA to
invoke this native method. Additionally, we use this integration to also
notify systemd when we are stopping.
The field has to be defined in log4j2.properties and should be an
escaped JSON for now (it is a broken JSON at the moment). This should later be refactored into a JSON array
of strings.
* Mute failing test
tracked in #44552
* mute EvilSecurityTests
tracking in #44558
* Fix line endings in ESJsonLayoutTests
* Mute failing ForecastIT test on windows
Tracking in #44609
* mute BasicRenormalizationIT.testDefaultRenormalization
tracked in #44613
* fix mute testDefaultRenormalization
* Increase busyWait timeout windows is slow
* Mute failure unconfigured node name
* mute x-pack internal cluster test windows
tracking #44610
* Mute JvmErgonomicsTests on windows
Tracking #44669
* mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot
Tracking #44671
* Mute NodeTests on Windows
Tracking #44256
In https://github.com/elastic/elasticsearch/pull/41913 setting up the
temp dir for ES was moved from the env script to individual cli scripts.
However, moving it to the windows service cli was missed. This commit
restores setting up the temp dir for the windows service control script.
Today when checksumming a plugin zip during plugin install, we read all
of the bytes of the zip into memory at once. When trying to run the
plugin installer on a small heap (say, 64 MiB), this can lead to the
plugin installer running out of memory when checksumming large
plugins. This commit addresses this by reading the plugin bytes in 8 KiB
chunks, thus using a constant amount of memory independent of the size
of the plugin.
This commit adds some default CLI JVM options to control the heap size
and the garbage collector used for the CLI tools. We do this because
otherwise the JVM will default to large initial and max heap sizes based
on the RAM visible to the JVM (which could be all the physical RAM on
the machine if not run in a container-aware JVM). This commit therefore
sets the initial heap size to 4m, the max heap size to 64m, the garbage
collector to the serial collector, and leaves this user-configurable by
honoring ES_JAVA_OPTS last.
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)
These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.
Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.
closes#41350
backport #41354
This change makes the process of verifying the signature of
official plugins FIPS 140 compliant by defaulting to use the
BouncyCastle FIPS provider and adding a dependency to bcpg-fips
that implement parts of openPGP in a FIPS compliant manner.
In already FIPS 140 enabled environments that use the
BouncyCastle FIPS provider, the bcfips dependency is redundant
but doesn't cause an issue as it will be added only in the classpath
of the cli-tools
This is a backport of #44224
When using gradle run by itself, this uses the default distro with a
basic license and enables security. There is a setup command to create
a elastic-admin user but only when the license is a trial license. Now
that security is available with the basic license, we should always run
this command when using the default distribution.
We initially added `requireDocker` for a way for tasks to say that they
absolutely must have it, like the build docker image tasks.
Projects using the test fixtures plugin are not in this both, as the
intent with these is that they will be skipped if docker and docker-compose
is not available.
Before this change we were lenient, the docker image build would succeed
but produce nothing. The implementation was also confusing as it was not
immediately obvious this was the case due to all the indirection in the
code.
The reason we have this leniency is that when we added the docker image
build, docker was a fairly new requirement for us, and we didn't have
it deployed in CI widely enough nor had CI configured to prefer workers
with docker when possible. We are in a much better position now.
The other reason was other stack teams running `./gradlew assemble`
in their respective CI and the possibility of breaking them if docker is
not installed. We have been advocating for building specific distros for
some time now and I will also send out an additional notice
The PR also removes the use of `requireDocker` from tests that actually
use test fixtures and are ok without it, and fixes a bug in test
fixtures that would cause incorrect configuration and allow some tasks
to run when docker was not available and they shouldn't have.
Closes #42680 and #42829 see also #42719
Enable audit logs in docker by creating console appenders for audit loggers.
also rename field @timestamp to timestamp and add field type with value audit
The docker build contains now two log4j configuration for oss or default versions. The build now allows override the default configuration.
Also changed the format of a timestamp from ISO8601 to include time zone as per this discussion #36833 (comment)
closes#42666
backport#42671
Before this change we would recurse to cache bwc versions.
This proved to be problematic due to the number of steps it was
generating taking too long.
Also this required tricky maintenance to break the recursion for old
branches we don't really care about.
With this change we now cache specific branches only.
Previously we used LoggedExec for running the internal bwc builds.
However, this had bad performance implications as all the output was
buffered into memory, thus we changed back to normal Exec. This commit
adds a `spoolOutput` setting to LoggedExec which can be used for
commands with large amounts of output, and switches the bwc builds to
use this flag.
The elasticsearch-cli helper script does not use the tempdir created by
elasticsearch-env, yet the env script still creates it. This can lead to
lots of temp directories being created when running cli scripts in an
automated fashion. This commit passes a fake tmpdir to the env script to
avoid creation.
closes#34445
This commit adds deletion of the bin directory to postrm cleanup. While
the package's bin files are cleaned up by the package manager, plugins
may have created subdirectories under bin. We already cleanup plugins,
but not the extra bin dirs their installation created.
closes#18109
Java 8 presents the JVM options slightly differently when displaying via
-XX:+PrintFlagsFinal. This commit adapts the JVM options parser for this
possibility.
Relates #42009
This commit removes manual parsing of JVM options when calculating
ergonomics. This is to avoid a situation that we parse values
differently than the JVM would. In fact, we already have a bug along
these lines today. It is possible to start the JVM with the same flag
multiple times on the command line. In this case, the last value
wins. For example, -Xmx1g -Xmx2g would start the JVM with a heap size of
two gigabytes. Our JVM ergonomics ignores this possibility and instead
the first value is winning!
Our strategy to avoid manual parsing of the JVM options is to start the
Java command line parser (without actually starting a JVM) by invoking
java with the same command line flags as presented and request that the
JVM tell us what values it would start with. This ensures that we have
the correct values when making ergonomic decisions.
Moreover, our strategy also is ignoring ES_JAVA_OPTS which could
override the heap size as well leading to incorrect ergonomic
choices. This commit address this issue too.
The deb package has been updated several times in the past to contain
overrides in order to pass lintian inspection. However, there have never
been any tests to ensure we do not fallback to failure. This commit
updates the overrides file given things that have changed since 2.x like
adding ML and bundling the jdk.
closes#17185
We currently download 3 variants of the same version of the jdk for
bundling into the distributions. Additionally, the vagrant images do
their own downloading. This commit moves the jdk downloading into a
utility gradle plugin. This will be used in a future PR by the packaging
tests.
The new plugin exposes a "jdks" project extension which allows creating
named jdks. Once the jdk version and platform are set for a named jdk,
the jdk object may be used as a lazy String for the jdk home path, or a
file collection for copying.
testclusters detect from settings that security is enabled
if a user is not specified using the DSL introduced in this PR, a default one is created
the appropriate wait conditions are used authenticating with the first user defined in the DSL ( or the default user ).
an example DSL to create a user is user username:"test_user" password:"x-pack-test-password" role: "superuser" all keys are optional and default to the values shown in this example
We have faked some Ivy repositories on a few artifact locations. Today
when Gradle attempts to resolve these artifacts, it follows its default
strategy to search for Gradle metadata, then Maven POM files, then Ivy
descriptors, and finally will fallback to looking directly for the
artifact. This wastes times on remote network calls that will 404 anyway
since these metadata resources will not exist for these fake Ivy
repositories. This commit overrides the Gradle strategy to look directly
for artifacts.
When Elasticsearch is run from a package installation, the running
process does not have permissions to write to the keystore. This is
because of the root:root ownership of /etc/elasticsearch. This is why we
create the keystore if it does not exist during package installation. If
the keystore needs to be upgraded, that is currently done by the running
Elasticsearch process. Yet, as just mentioned, the Elasticsearch process
would not have permissions to do that during runtime. Instead, this
needs to be done during package upgrade. This commit adds an upgrade
command to the keystore CLI for this purpose, and that is invoked during
package upgrade if the keystore already exists. This ensures that we are
always on the latest keystore format before the Elasticsearch process is
invoked, and therefore no upgrade would be needed then. While this bug
has always existed, we have not heard of reports of it in practice. Yet,
this bug becomes a lot more likely with a recent change to the format of
the keystore to remove the distinction between file and string entries.
We use Bouncy Castle to verify signatures when installing official
plugins. This leads to illegal access warnings because Bouncy Castle
accesses the Sun security provider constructor. This commit adds an
add-opens flag to suppress this illegal access.
This commit bumps the bundled JDK to version 12.0.1. Note that we had to
add a new pattern here as Oracle has changed the source of the
builds. This commit will be backported to 6.7 in a different form to
bump the bundled JDK in the Docker images too.
We had been obtaining JDK distributions from download.java.net. This
site is now presenting a certificate that does not list
download.java.net as a SAN. Therefore with host verification, the build
can not use this site. This commit switches to using download.oracle.com
which appears to be an alternative name for the same CNAME
download.oracle.com.edgekey.net. This allows our builds to resume.
hamcrest has some improvements in newer versions, like FileMatchers
that make assertions regarding file exists cleaner. This commit upgrades
to the latest version of hamcrest so we can start using new and improved
matchers.
The pid dir for both systemd and init.d is already managed by those
respective systems (tmpfiles.d and the init script, respectively). Since
the /var/run dir is often mounted as tmpfs, it does not make sense to
have the elasticsearch pid dir added by the package installation. This
commit removes that empty dir from deb and rpm.
This commit adds a filter to the files include from modules to only
include platform specific files relevant to the distribution being
built. For example, the deb files on linux would now only include linux
ML binaries, and not windows or macos files.
* fix the packer cache script
This PR disabled the explicit pull since it seems this always tries to
work with a registry.
Functionality will not be affected since we will still build the images
on pull.
Instead of allowing docker-compose to rebuild it.
With this change we tag the image with a test label, and use that
in the testing as this is simpler that dealing with a dynamically
generated docker-compose file.
This commit changes the bwc builds from a single task for a branch to a
task for each bwc artifact. This reduces the bwc build time when only
needing a specific artifact, for example when running cluster restart
tests on a mac, the windows artifacts or rpm/debs are not needed.
This commit fixes an issue when the artifact used to build the Docker
image is sourced from artifacts.elastic.co. In particular, the artifact
was not downloaded to the proper location.
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
We previously found a bug in the JVM where AVX-512 instructions could
crash the JVM to crash with a segmentation fault. This bug impacted JDK
9 and JDK 10, but was most prominent on JDK 10 because AVX-512 was
enabled there by default. In JDK 11, this bug is reported fixed so this
commit restricts the disabling of AVX-512 to JDK 10 only. Since we no
longer support JDK 10 for any versions that this commit will be
integrated into (7.1, 8.0), we simply remove the disabling of this flag
from the JVM options.
This commit deprecates versions of Java prior to Java 11. This commit
will cause a warning to be printed to standard error when any command
line tool is invoked, or when Elasticsearch is started. Additionally, we
log a deprecation message when Elasticsearch is started.
* Add notice for bundled jdk
This commit adds the license/notice for the bundled openjdk.
* First draft
* iteration
* Fix package notices
* Iteration
* One more iteration
While yum does retry retrieving files 10 times by default [1], slow
network fetches, governed by `minrate` cause immediate aborts without
getting retried.
Wrap yum commands in a 10 iteration retry loop.
[1] http://man7.org/linux/man-pages/man5/yum.conf.5.html
Backport of #40349
On windows, JAVA_HOME is currently resolved when the windows service is started. However, this is contrary to what our documentation states. This commit moves resolution to service install. This has the side effect of making java existence checking optional in elasticsearch-env.bat, since the rest of the service commands do not require java.
closes#30720
This change removes the use of hardcoded port values for the
idp-fixture in favor of the mapped ephemeral ports. This should prevent
failures due to port conflicts in CI.
* Revert "Configure TMP for test nodes on Windows (#39959)"
This reverts commit 97562a874fcb1f29fb05272ab860a0307e97d1aa.
* Configure a tmp dir without spaces
* Pass on TMP instead of changing it
Now that we have the bundled JDK in the Docker images, we should use
them as opposed to procuring a JDK ourselves. This commit replaces the
JDK in the Docker image with the bundled JDK.