This commit prepares the JvmOptionsParser to be more unit testable by
refactoring the class to have some input that it pulls from external
sources passed in as arguments. We do not change any functionality in
this commit, nor add any unit tests, we are only preparing the way.
Now that the FIPS 140 security provider is simply a test dependency
we don't need the thirdPartyAudit exceptions, but plugin-cli and
transport-netty4 do need jarHell disabled as they use the non fips
BouncyCastle security provider as a test dependency too.
This commit introduces the ability to override JVM options by adding
custom JVM options files to a jvm.options.d directory. This simplifies
administration of Elasticsearch by not requiring administrators to keep
the root jvm.options file in sync with changes that we make to the root
jvm.options file. Instead, they are not expected to modify this file but
instead supply their own in jvm.options.d. In Docker installations, this
means they can bind mount this directory in. In future versions of
Elasticsearch, we can consider removing the root jvm.options file
(instead, providing all options there as system JVM options).
* Reload secure settings with password (#43197)
If a password is not set, we assume an empty string to be
compatible with previous behavior.
Only allow the reload to be broadcast to other nodes if TLS is
enabled for the transport layer.
* Add passphrase support to elasticsearch-keystore (#38498)
This change adds support for keystore passphrases to all subcommands
of the elasticsearch-keystore cli tool and adds a subcommand for
changing the passphrase of an existing keystore.
The work to read the passphrase in Elasticsearch when
loading, which will be addressed in a different PR.
Subcommands of elasticsearch-keystore can handle (open and create)
passphrase protected keystores
When reading a keystore, a user is only prompted for a passphrase
only if the keystore is passphrase protected.
When creating a keystore, a user is allowed (default behavior) to create one with an
empty passphrase
Passphrase can be set to be empty when changing/setting it for an
existing keystore
Relates to: #32691
Supersedes: #37472
* Restore behavior for force parameter (#44847)
Turns out that the behavior of `-f` for the add and add-file sub
commands where it would also forcibly create the keystore if it
didn't exist, was by design - although undocumented.
This change restores that behavior auto-creating a keystore that
is not password protected if the force flag is used. The force
OptionSpec is moved to the BaseKeyStoreCommand as we will presumably
want to maintain the same behavior in any other command that takes
a force option.
* Handle pwd protected keystores in all CLI tools (#45289)
This change ensures that `elasticsearch-setup-passwords` and
`elasticsearch-saml-metadata` can handle a password protected
elasticsearch.keystore.
For setup passwords the user would be prompted to add the
elasticsearch keystore password upon running the tool. There is no
option to pass the password as a parameter as we assume the user is
present in order to enter the desired passwords for the built-in
users.
For saml-metadata, we prompt for the keystore password at all times
even though we'd only need to read something from the keystore when
there is a signing or encryption configuration.
* Modify docs for setup passwords and saml metadata cli (#45797)
Adds a sentence in the documentation of `elasticsearch-setup-passwords`
and `elasticsearch-saml-metadata` to describe that users would be
prompted for the keystore's password when running these CLI tools,
when the keystore is password protected.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
* Elasticsearch keystore passphrase for startup scripts (#44775)
This commit allows a user to provide a keystore password on Elasticsearch
startup, but only prompts when the keystore exists and is encrypted.
The entrypoint in Java code is standard input. When the Bootstrap class is
checking for secure keystore settings, it checks whether or not the keystore
is encrypted. If so, we read one line from standard input and use this as the
password. For simplicity's sake, we allow a maximum passphrase length of 128
characters. (This is an arbitrary limit and could be increased or eliminated.
It is also enforced in the keystore tools, so that a user can't create a
password that's too long to enter at startup.)
In order to provide a password on standard input, we have to account for four
different ways of starting Elasticsearch: the bash startup script, the Windows
batch startup script, systemd startup, and docker startup. We use wrapper
scripts to reduce systemd and docker to the bash case: in both cases, a
wrapper script can read a passphrase from the filesystem and pass it to the
bash script.
In order to simplify testing the need for a passphrase, I have added a
has-passwd command to the keystore tool. This command can run silently, and
exit with status 0 when the keystore has a password. It exits with status 1 if
the keystore doesn't exist or exists and is unencrypted.
A good deal of the code-change in this commit has to do with refactoring
packaging tests to cleanly use the same tests for both the "archive" and the
"package" cases. This required not only moving tests around, but also adding
some convenience methods for an abstraction layer over distribution-specific
commands.
* Adjust docs for password protected keystore (#45054)
This commit adds relevant parts in the elasticsearch-keystore
sub-commands reference docs and in the reload secure settings API
doc.
* Fix failing Keystore Passphrase test for feature branch (#50154)
One problem with the passphrase-from-file tests, as written, is that
they would leave a SystemD environment variable set when they failed,
and this setting would cause elasticsearch startup to fail for other
tests as well. By using a try-finally, I hope that these tests will fail
more gracefully.
It appears that our Fedora and Ubuntu environments may be configured to
store journald information under /var rather than under /run, so that it
will persist between boots. Our destructive tests that read from the
journal need to account for this in order to avoid trying to limit the
output we check in tests.
* Run keystore management tests on docker distros (#50610)
* Add Docker handling to PackagingTestCase
Keystore tests need to be able to run in the Docker case. We can do this
by using a DockerShell instead of a plain Shell when Docker is running.
* Improve ES startup check for docker
Previously we were checking truncated output for the packaged JDK as
an indication that Elasticsearch had started. With new preliminary
password checks, we might get a false positive from ES keystore
commands, so we have to check specifically that the Elasticsearch
class from the Bootstrap package is what's running.
* Test password-protected keystore with Docker (#50803)
This commit adds two tests for the case where we mount a
password-protected keystore into a Docker container and provide a
password via a Docker environment variable.
We also fix a logging bug where we were logging the identifier for an
array of strings rather than the contents of that array.
* Add documentation for keystore startup prompting (#50821)
When a keystore is password-protected, Elasticsearch will prompt at
startup. This commit adds documentation for this prompt for the archive,
systemd, and Docker cases.
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
* Warn when unable to upgrade keystore on debian (#51011)
For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This
commit brings the same logic to the code for Debian packages. See the
posttrans file for gets executed for RPMs.
* Restore handling of string input
Adds tests that were mistakenly removed. One of these tests proved
we were not handling the the stdin (-x) option correctly when no
input was added. This commit restores the original approach of
reading stdin one char at a time until there is no more (-1, \r, \n)
instead of using readline() that might return null
* Apply spotless reformatting
* Use '--since' flag to get recent journal messages
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.
Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.
It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.
Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.
* Use new journald wrapper pattern
* Update version added in secure settings request
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Today we are repeatedly checking if the current build is a snapshot
build or not by reading the system property build.snapshot. This commit
formalizes this by adding a build parameter to indicate whether or not
the current build is a snapshot build.
In 2bb31fe (v0.6.0!) we added DEBUG-level logging to the default config of
action loggers "for easier debugging". This change to the default config lives
on to this day. It does not obviously make debugging any easier any more, but
it does result in a good deal of log noise sometimes. This commit removes this
special case from the default config.
Closes#51198
This change changes the way to run our test suites in
JVMs configured in FIPS 140 approved mode. It does so by:
- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.
- When runtime java is 11 and higher, using BouncyCastle FIPS
Cryptographic provider and BCJSSE in FIPS mode. These are
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests
- When runtime java is 8, using BouncyCastle FIPS
Cryptographic provider and SunJSSE in FIPS mode.
Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
Adding back accidentally removed jvm option that is required to enforce
start of the week = Monday in IsoCalendarDataProvider.
Adding a `feature` to yml test in order to skip running it in JDK8
commit that removed it 398c802
commit that backports SystemJvmOptions c4fbda3
relates 7.x backport of code that enforces CalendarDataProvider use #48349
Backport of #50927.
Closes#49653. When using _FILE environment variables to supply values
to Elasticsearch, following symlinks when checking that file permissions
are secure.
When installing multiple plugins at once, this commit changes the
behavior to report installed plugins as we go. In the case of failure,
we emit a message that we are rolling back any plugins that were
installed successfully, and also that they were successfully rolled
back. In the case a plugin is not successfully rolled back, we report
this clearly too, alerting the user that there might still be state on
disk they would have to clean up.
This commit allows the plugin installer to install multiple plugins in a
single invocation. The installation will be treated as a transaction, so
that all of the plugins are install successfully, or none of the plugins
are installed.
We renamed README.textile to README.asciidoc but a bunch of tests and
the package build itself still pointed at the old name. This switches
them the new name.
A previous commit taught Elasticsearch packages to respect ES_PATH_CONF
during installs. Missed in that commit was respecting ES_PATH_CONF on
upgrades. This commit does that. Additionally, while ES_PATH_CONF is not
currently used in pre-install, this commit adds respect to the preinst
script in case we do in the future.
Backport of #49612.
The current Docker entrypoint script picks up environment variables and
translates them into -E command line arguments. However, since any tool
executes via `docker exec` doesn't run the entrypoint, it results in
a poorer user experience.
Therefore, refactor the env var handling so that the -E options are
generated in `elasticsearch-env`. These have to be appended to any
existing command arguments, since some CLI tools have subcommands and
-E arguments must come after the subcommand.
Also extract the support for `_FILE` env vars into a separate script, so
that it can be called from more than once place (the behaviour is
idempotent).
Finally, add noop -E handling to CronEvalTool for parity, and support
`-E` in MultiCommand before subcommands.
We respect ES_PATH_CONF everywhere except package install. This commit
addresses this by respecting ES_PATH_CONF when installing the RPM/Debian
packages.
This commit tweaks the workaround introduced in #49211 to support
Gradle 6.0. In the workaround, we specifically override the address
the Gradle daemon binds to by passing the desired address via the
OPENSHIFT_IP environment variable. This works fine for builds using
Gradle 6.0, but for older Gradle versions this causes issues with
inter-daemon communication, specifically when we build BWC branches
not on Gradle 6.0. The fix here is to strip that environment variable
out when building the target BWC branch if that branch is on an
older Gradle version.
This is all temporary and will be removed when this bug fix is released
in Gradle 6.1.
Closes#50025
This upgrade required a few significant changes. Firstly, the build
scan plugin has been renamed, and changed to be a Settings plugin rather
than a project plugin so the declaration of this has moved to our
settings.gradle file. Second, we were using a rather old version of the
Nebula ospackage plugin for building deb and rpm packages, the migration
to the latest version required some updates to get things working as
expected as we had some workarounds in place that are no longer
applicable with the latest bug fixes.
(cherry picked from commit 87f9c16e2f8870e3091062cde37b43042c3ae1c5)
The docker build task depends on the docker context being built, but it
was not explicitly setup as an input. This commit adds the task as an
input to the docker build.
relates #49613
Backport of #49079. Reimplement a number of the tests from
elastic/elasticsearch-docker.
There is also one Docker image fix here, which is that two of the provided
config files had different file permissions to the rest. I've fixed this
with another RUN chmod while building the image, and adjusted the
corresponding packaging test.
This commits sets an output marker file for the docker build tasks so
that it can be tracked as up to date. It also fixes the docker build
context task to omit the build date as in input property which always
left the task as out of date.
relates #49359
Backport of #47573.
Closes#43603. Allow environment variables to be passed to ES in a Docker
container via a file, by setting an environment variable with the `_FILE`
suffix that points to the file with the intended value of the env var.
The Apache Commons Daemon has some helpful features for Java
applications, like nice little next boxes for min heap, max heap, and
thread stack size. Our elasticsearch-service.bat script parses those
values out of the ES_JAVA_OPTS environment variable and provides them to
the Apache Commons Daemon invocation command in order to provide
sensible defaults. However, we failed to remove those values from the
ES_JAVA_OPTS environment variable, which meant they ended up in the
"Java Options" text box and would, from there, override whatever the
user put in the specific boxes for heap size or thread stack size.
This commit modifies the loop that parses ES_JAVA_OPTS to construct a
new enviroment variable containing only the values that aren't parsed
out for heap size or thread stack size, then uses that new enviroment
variable in the commons daemon invocation command.
JDK 14 has removed CMS. This commit restricts the support for CMS to JDK
8 through JDK 13, and defaults to G1 GC on JDK 14. We will revisit all
defaults in the future, but this ensures that we run with a
properly-configured garbage collector on JDK 14+.
This fixes a regression introduced in #42042. The logic here was
mistakenly inverted such that we only run these tests in a FIPS JVM
which is the opposite of what we intend.
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
This commit fixes the names of the UBI-based Docker build contexts to
lift the ubi component of the name into the archive base name, instead
of the classifier.
Backport of #46599 and #47640. Add packaging tests for Docker.
* Introduce packaging tests for Docker (#46599)
Closes#37617. Add packaging tests for our Docker images, similar to what
we have for RPMs or Debian packages. This works by running a container and
probing it e.g. via `docker exec`. Test can also be run in Vagrant, by
exporting the Docker images to disk and loading them again in VMs. Docker
is installed via `Vagrantfile` in a selection of boxes.
* Only define Docker pkg tests if Docker is available (#47640)
Closes#47639, and unmutes tests that were muted in b958467.
The Docker packaging tests were being defined irrespective of whether
Docker was actually available in the current environment. Instead,
implement exclude lists so that in environments where Docker is not
available, no Docker packaging tests are defined. For CI hosts, the build
checks `.ci/dockerOnLinuxExclusions`. The Vagrant VMs can defined the
extension property `shouldTestDocker` property to opt-in to packaging
tests.
As part of this, define a seperate utility class for checking Docker,
and call that instead of defining checks in-line in BuildPlugin.groovy
This commit fixes the name of the UBI-based Docker image build contexts
to include "7" (to set us up for the future where we are likely to have
a ubi8-based image).
This commit registers the UBI-based Docker image projects in the build
so that their assemble tasks are executed when the top-level assemble
task is executed.
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.
Closes#42042
This commit simplifies the JDK copy specification in the archives build
so that it's a single line as opposed to an if/else with a repeated
body. This approach reduces the maintenance cost of this code.
* Always pass user-specified MaxDirectMemorySize
We had been testing whether a user had passed a value for
MaxDirectMemorySize by parsing the output of "java -XX:PrintFlagsFinal
-version". If MaxDirectMemorySize equals zero, we set it to half of max
heap. The problem is that on Windows with JDK 8, a JDK bug incorrectly
truncates values over 4g and returns multiples of 4g as zero. In order
to always respect the user-defined settings, we need to check our input
to see if an "-XX:MaxDirectMemorySize" value has been passed.
* Always warn for Windows/jdk8 ergo issue
Even if a user has set MaxDirectMemorySize, they aren't future-proof for
this JDK bug. With this change, we issue a general warning for the
windows/JDK8 issue, and a specific warning if MaxDirectMemorySize is
unset.
After some consideration, we are electing to make "ubi" part of the
image name instead of part of the tag. This commit implements that
change for the Elasticsearch UBI-based Docker images.
This commit bumps the bundled JDK to 13.0.1+9. Since AdoptOpenJDK did
not release 13.0.1+9 for Windows, this commit also enables that the
bundled JDK version can vary by platform.
This commit moves JVM options that we are setting on behalf of the user
that we do not expect them to fiddle with out of the jvm.options
configuration file and into the JVM options parser. In this way, we
discourage fiddling with these settings, but more importantly, we ensure
that as we evolve or add to these settings that a user would pick these
pick instead of being left behind if they have a modified jvm.options
file and do not pick any new that come with the distribution.
Our JVM ergonomics extract max heap size from JDK PrintFlagsFinal output.
On JDK 8, there is a system-dependent bug where memory sizes are cast to
32-bit integers. On affected systems (namely, Windows), when 1/4 of physical
memory is more than the maximum integer value, the output of PrintFlagsFinal
will be inaccurate. In the pathological case, where the max heap size would
be a multiple of 4g, the test will fail.
The practical effect of this bug, beyond test failures, is that we may set
MaxDirectMemorySize to an incorrect value on Windows. This commit adds a
warning about this situation during startup.
This commit removes the option to change the netty system properties to
reenable the direct buffer pooling. It also removes the need for us to
disable the buffer pooling in the system properties file. Instead, we
programmatically craete an allocator that is used by our networking
layer.
This commit does introduce an Elasticsearch property which allows the
user to fallback on the netty default allocator. If they choose this
option, they can configure the default allocator how they wish using the
standard netty properties.
This test has been failing very frequently on Windows checks for 7.4. We've also seen it for 7.x as well as for the new 7.5 tests. I'm muting during investigation so we can see if this is masking any other issues.
* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
* Use versions specific distribution folders so we don't need to clean up (#46539)
* Retry deleting distro dir on windows
When retarting the cluster we clean up old distribution files that might
still be in use by the OS.
Windows closes resources of ded processes async, so we do a couple of
retries to get arround it.
Closes#46014
* Avoid having to delete the distro folder.
* Remove the use of ClusterFormationTasks form RestTestTask (#47022)
This PR removes a use-case of the ClusterFormationTasks and converts a
project that flew under the radar so far.
There's probably more clean-up possible here, but for now the goal is
to be able to remove that code after `RunTask` is also updated.
* Migrate some 7.x only projects
Compilation was accidentally broken here when a backport used code from
JDK 9, which is not supported in 7.x. This commit addresses this by
using JDK 8 compatiable APIs.
This commit moves the ES_TMPDIR substitution that we do for JVM options
into the JVM options parser itself. This solves a problem where the fact
that the we do not make the substitution before ergonomics parsing can
lead to the JVM that we start for computing the ergonomic values failing
to start. Additionally, moving this substitution here enables us to
simplify the shell scripts since we do not need to implement this there,
and twice for Bash and Windows.
This PR makes the necesary adaptations to the tests and adds a power shell script to
invoke the OS tests on GCP instances connected as CI workers.
Also noticed that logs were not being produced by the tests and that theses were not using log4j so fixed that too.
One of the difficulties in working on theses tests was that the tests just stalled with no indication where the problem is.
To ease with the debugging, after process explorer suggested that the tests are running some commands, we now have multiple timeouts: one for the tests ( which will generate a thread dump ) and one for individual commands ( that bails with the command being ran and output and error so far ) to make it easier to see what went wrong.
The tests were blocking because apparently the pipes to the sub-process were not closing, thus the threads were blocking on them and we were blocking indefinitely on the join. I'm not sure why this doesn't happen in vagrant, but we now properly deal with it.
Since the bundled jdk was added to Elasticsearch, there are now 2 ways
java can be missing. Either JAVA_HOME is set but does not exist, or the
bundled jdk does not exist. This commit improves the error messages in
those two cases, and also ensures our tests cover both cases.
Looks like there's a workaround with aufs used in debian 8.
Adding `tsflags=nodocs` works around this issue and results in smaller
image files also.
Closes#47097 and elastic/infra#14780
This is the Java side of https://github.com/elastic/ml-cpp/pull/593
with a fallback so that ml-cpp bundles with either the
new or old directory structure work for the time being.
A few days after merging the C++ changes a followup to
this change will be made that removes the fallback.
Relates to #47007 . the `gradle-ospackage-plugin` plugin doesn't
properly support symlink on windows.
This PR changes the way we configure tasks to prevent building these
packages as part of a windows check.
G1 GC were setup to use an `InitiatingHeapOccupancyPercent` of 75. This
could leave used memory at a very high level for an extended duration,
triggering the real memory circuit breaker even at low activity levels.
The value is a threshold for old generation usage relative to total heap
size and thus it should leave room for the new generation. Default in
G1 is to allow up to 60 percent for new generation and this could mean that the
threshold was effectively at 135% heap usage. GC would still kick in of course and
eventually enough mixed collections would take place such that adaptive adjustment
of IHOP kicks in.
The JVM has adaptive setting of the IHOP, but this does not kick in
until it has sampled a few collections. A newly started, relatively
quiet server with primarily new generation activity could thus
experience heap above 95% frequently for a duration.
The changes here are two-fold:
1. Use 30% default for IHOP (the JVM default of 45 could still mean
105% heap usage threshold and did not fully ensure not to hit the
circuit breaker with low activity)
2. Set G1ReservePercent=25. This is used by the adaptive IHOP mechanism,
meaning old/mixed GC should kick in no later than at 75% heap. This
ensures IHOP stays compatible with the real memory circuit breaker also
after being adjusted by adaptive IHOP.
This PR adds some restrictions around testfixtures to make sure the same service ( as defiend in docker-compose.yml ) is not shared between multiple projects.
Sharing would break running with --parallel.
Projects can still share fixtures as long as each has it;s own service within.
This is still useful to share some of the setup and configuration code of the fixture.
Project now also have to specify a service name when calling useCluster to refer to a specific service.
If this is not the case all services will be claimed and the fixture can't be shared.
For this reason fixtures have to explicitly specify if they are using themselves ( fixture and tests in the same project ).
This commit disables caching of BWC snapshot distributions in the "trunk" (aka master) branch.
Since the previous major release branches move quickly we rarely get cache hits for these
tasks, and the artifacts themselves are very large. This means the overhead here is high and
savings basically zero. We conditionally disable task output caching in this scenario in CI to
avoid excessive build cache overhead as well as causing too much turn in the cache itself which
would lead to lots of cache entry evictions.
This commit teaches the build how to bundle AdoptOpenJDK with our
artifacts, and switches to AdoptOpenJDK as the bundled JDK. We keep the
functionality to also bundle Oracle OpenJDK distributions.
We currently configure io.netty.allocator.numDirectArenas to be 0 in the
jvm erconomics class. This is a config that we always want to set, so it
makes sense to move it to jvm.options.
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field. It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.
This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).
This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
In the Sys V init scripts, we check for Java. This is not needed, since
the same check happens in elasticsearch-env when starting up. Having
this duplicate check has bitten us in the past, where we made a change
to the logic in elasticsearch-env, but missed updating it here. Since
there is no need for this duplicate check, we remove it from the Sys V
init scripts.
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.
This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.
Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
* Add input and outut tracking of built bwc versions
This PR adds tracking of the bwc versions git has as input and all the
expected files as output.
The effect is that `gradlew` is not called at all when the git has
doesn't change and the version was allready built.
Previusly gradlew would be called for the bwc version and it would have
to configure the project and go trough up to date checks to figure out
that nothing changed.
This helps when working on bwc tests locally needing to run the test
multiple times.
This should also help in CI not re-build bwc versions across different
runs.
* Enable caching of bwc builds
This commit addresses an issue when trying to using Elasticsearch on
systems with Sys V init and the bundled JDK was not being used. Instead,
we were still inadvertently trying to fallback on the path. This commit
removes that fallback as that is against our intentions for 7.x where we
only support the bundled JDK or an explicit JDK via JAVA_HOME.
This commit makes the gitRevision property a lazy loaded value by
returning an Object implementing toString(). The Dockerfile template is
also changed to use groovy templates instead of the mavenfilter hack, so
converting to String will not happen until runtime.
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.
This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
Prior to this PR we always checked out the latest bwc branches and had
an external mechanism to store the bwc versions used for every CI run so
we could both reproduce those builds and run additional tests using the
same combination.
This adds complexities in setting up and maintaining CI and makes it
difficult to set up multi jobs.
This change replaces that mechanism with a time based approach
that looks at the commit date of the current revision and picks the
newest on the bwc branch that's still older than that.
It also makes sure there are no merge commits in this interval.
This new behavior will is ment to be enabled in CI only, for everything
except PR checks that will still use last available bwc revision.
This commit applies a normalization process to environment paths, both
in how they are stored internally, also their settings values. This
normalization is done via two means:
- we make the paths absolute
- we remove redundant name elements from the path (what Java calls
"normalization")
This change ensures that when we compare and refer to these paths within
the system, we are using a common ground. For example, prior to the
change if the data path was relative, we would not compare it correctly
to paths from disk usage. This is because the paths in disk usage were
being made absolute.
The org.label-schema labels on Docker images have been superseded by
pre-defined OCI annotations. However, there is still a lot of tooling in
use that relies on the org.label-schema, so we do not want to drop
them. This commit adds values for the org.opencontainers.image
pre-defined annotation keys. Additionally, we correct an issue with the
label used to represent the license, to use the org.label-schema.license
label. While this label was never accepted into the org.label-schema
specfication (because this specification was superseded, it's not that
it was explicitly rejected) there are containers out there using this
label. In particular, our base image is and so we need to override
otherwise we inherit, and end up mis-reporting the license.
Today our systemd service defaults to a service type of simple. This
means that systemd assumes Elasticsearch is ready as soon as the
ExecStart (bin/elasticsearch) process is forked off. This means that the
service appears ready long before it actually is, so before it is ready
to receive requests. It also means that services that want to depend on
Elasticsearch being ready to start can not as there is not a reliable
mechanism to determine this. This commit changes the service type to
notify. This requires that Elasticsearch sends a notification message
via libsystemd sd_notify method. This commit does that by using JNA to
invoke this native method. Additionally, we use this integration to also
notify systemd when we are stopping.
The field has to be defined in log4j2.properties and should be an
escaped JSON for now (it is a broken JSON at the moment). This should later be refactored into a JSON array
of strings.
* Mute failing test
tracked in #44552
* mute EvilSecurityTests
tracking in #44558
* Fix line endings in ESJsonLayoutTests
* Mute failing ForecastIT test on windows
Tracking in #44609
* mute BasicRenormalizationIT.testDefaultRenormalization
tracked in #44613
* fix mute testDefaultRenormalization
* Increase busyWait timeout windows is slow
* Mute failure unconfigured node name
* mute x-pack internal cluster test windows
tracking #44610
* Mute JvmErgonomicsTests on windows
Tracking #44669
* mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot
Tracking #44671
* Mute NodeTests on Windows
Tracking #44256
In https://github.com/elastic/elasticsearch/pull/41913 setting up the
temp dir for ES was moved from the env script to individual cli scripts.
However, moving it to the windows service cli was missed. This commit
restores setting up the temp dir for the windows service control script.
Today when checksumming a plugin zip during plugin install, we read all
of the bytes of the zip into memory at once. When trying to run the
plugin installer on a small heap (say, 64 MiB), this can lead to the
plugin installer running out of memory when checksumming large
plugins. This commit addresses this by reading the plugin bytes in 8 KiB
chunks, thus using a constant amount of memory independent of the size
of the plugin.
This commit adds some default CLI JVM options to control the heap size
and the garbage collector used for the CLI tools. We do this because
otherwise the JVM will default to large initial and max heap sizes based
on the RAM visible to the JVM (which could be all the physical RAM on
the machine if not run in a container-aware JVM). This commit therefore
sets the initial heap size to 4m, the max heap size to 64m, the garbage
collector to the serial collector, and leaves this user-configurable by
honoring ES_JAVA_OPTS last.
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)
These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.
Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.
closes#41350
backport #41354
This change makes the process of verifying the signature of
official plugins FIPS 140 compliant by defaulting to use the
BouncyCastle FIPS provider and adding a dependency to bcpg-fips
that implement parts of openPGP in a FIPS compliant manner.
In already FIPS 140 enabled environments that use the
BouncyCastle FIPS provider, the bcfips dependency is redundant
but doesn't cause an issue as it will be added only in the classpath
of the cli-tools
This is a backport of #44224
When using gradle run by itself, this uses the default distro with a
basic license and enables security. There is a setup command to create
a elastic-admin user but only when the license is a trial license. Now
that security is available with the basic license, we should always run
this command when using the default distribution.
We initially added `requireDocker` for a way for tasks to say that they
absolutely must have it, like the build docker image tasks.
Projects using the test fixtures plugin are not in this both, as the
intent with these is that they will be skipped if docker and docker-compose
is not available.
Before this change we were lenient, the docker image build would succeed
but produce nothing. The implementation was also confusing as it was not
immediately obvious this was the case due to all the indirection in the
code.
The reason we have this leniency is that when we added the docker image
build, docker was a fairly new requirement for us, and we didn't have
it deployed in CI widely enough nor had CI configured to prefer workers
with docker when possible. We are in a much better position now.
The other reason was other stack teams running `./gradlew assemble`
in their respective CI and the possibility of breaking them if docker is
not installed. We have been advocating for building specific distros for
some time now and I will also send out an additional notice
The PR also removes the use of `requireDocker` from tests that actually
use test fixtures and are ok without it, and fixes a bug in test
fixtures that would cause incorrect configuration and allow some tasks
to run when docker was not available and they shouldn't have.
Closes #42680 and #42829 see also #42719
Enable audit logs in docker by creating console appenders for audit loggers.
also rename field @timestamp to timestamp and add field type with value audit
The docker build contains now two log4j configuration for oss or default versions. The build now allows override the default configuration.
Also changed the format of a timestamp from ISO8601 to include time zone as per this discussion #36833 (comment)
closes#42666
backport#42671
Before this change we would recurse to cache bwc versions.
This proved to be problematic due to the number of steps it was
generating taking too long.
Also this required tricky maintenance to break the recursion for old
branches we don't really care about.
With this change we now cache specific branches only.
Previously we used LoggedExec for running the internal bwc builds.
However, this had bad performance implications as all the output was
buffered into memory, thus we changed back to normal Exec. This commit
adds a `spoolOutput` setting to LoggedExec which can be used for
commands with large amounts of output, and switches the bwc builds to
use this flag.
The elasticsearch-cli helper script does not use the tempdir created by
elasticsearch-env, yet the env script still creates it. This can lead to
lots of temp directories being created when running cli scripts in an
automated fashion. This commit passes a fake tmpdir to the env script to
avoid creation.
closes#34445
This commit adds deletion of the bin directory to postrm cleanup. While
the package's bin files are cleaned up by the package manager, plugins
may have created subdirectories under bin. We already cleanup plugins,
but not the extra bin dirs their installation created.
closes#18109
Java 8 presents the JVM options slightly differently when displaying via
-XX:+PrintFlagsFinal. This commit adapts the JVM options parser for this
possibility.
Relates #42009
This commit removes manual parsing of JVM options when calculating
ergonomics. This is to avoid a situation that we parse values
differently than the JVM would. In fact, we already have a bug along
these lines today. It is possible to start the JVM with the same flag
multiple times on the command line. In this case, the last value
wins. For example, -Xmx1g -Xmx2g would start the JVM with a heap size of
two gigabytes. Our JVM ergonomics ignores this possibility and instead
the first value is winning!
Our strategy to avoid manual parsing of the JVM options is to start the
Java command line parser (without actually starting a JVM) by invoking
java with the same command line flags as presented and request that the
JVM tell us what values it would start with. This ensures that we have
the correct values when making ergonomic decisions.
Moreover, our strategy also is ignoring ES_JAVA_OPTS which could
override the heap size as well leading to incorrect ergonomic
choices. This commit address this issue too.
The deb package has been updated several times in the past to contain
overrides in order to pass lintian inspection. However, there have never
been any tests to ensure we do not fallback to failure. This commit
updates the overrides file given things that have changed since 2.x like
adding ML and bundling the jdk.
closes#17185
We currently download 3 variants of the same version of the jdk for
bundling into the distributions. Additionally, the vagrant images do
their own downloading. This commit moves the jdk downloading into a
utility gradle plugin. This will be used in a future PR by the packaging
tests.
The new plugin exposes a "jdks" project extension which allows creating
named jdks. Once the jdk version and platform are set for a named jdk,
the jdk object may be used as a lazy String for the jdk home path, or a
file collection for copying.
testclusters detect from settings that security is enabled
if a user is not specified using the DSL introduced in this PR, a default one is created
the appropriate wait conditions are used authenticating with the first user defined in the DSL ( or the default user ).
an example DSL to create a user is user username:"test_user" password:"x-pack-test-password" role: "superuser" all keys are optional and default to the values shown in this example
We have faked some Ivy repositories on a few artifact locations. Today
when Gradle attempts to resolve these artifacts, it follows its default
strategy to search for Gradle metadata, then Maven POM files, then Ivy
descriptors, and finally will fallback to looking directly for the
artifact. This wastes times on remote network calls that will 404 anyway
since these metadata resources will not exist for these fake Ivy
repositories. This commit overrides the Gradle strategy to look directly
for artifacts.
When Elasticsearch is run from a package installation, the running
process does not have permissions to write to the keystore. This is
because of the root:root ownership of /etc/elasticsearch. This is why we
create the keystore if it does not exist during package installation. If
the keystore needs to be upgraded, that is currently done by the running
Elasticsearch process. Yet, as just mentioned, the Elasticsearch process
would not have permissions to do that during runtime. Instead, this
needs to be done during package upgrade. This commit adds an upgrade
command to the keystore CLI for this purpose, and that is invoked during
package upgrade if the keystore already exists. This ensures that we are
always on the latest keystore format before the Elasticsearch process is
invoked, and therefore no upgrade would be needed then. While this bug
has always existed, we have not heard of reports of it in practice. Yet,
this bug becomes a lot more likely with a recent change to the format of
the keystore to remove the distinction between file and string entries.
We use Bouncy Castle to verify signatures when installing official
plugins. This leads to illegal access warnings because Bouncy Castle
accesses the Sun security provider constructor. This commit adds an
add-opens flag to suppress this illegal access.
This commit bumps the bundled JDK to version 12.0.1. Note that we had to
add a new pattern here as Oracle has changed the source of the
builds. This commit will be backported to 6.7 in a different form to
bump the bundled JDK in the Docker images too.
We had been obtaining JDK distributions from download.java.net. This
site is now presenting a certificate that does not list
download.java.net as a SAN. Therefore with host verification, the build
can not use this site. This commit switches to using download.oracle.com
which appears to be an alternative name for the same CNAME
download.oracle.com.edgekey.net. This allows our builds to resume.
hamcrest has some improvements in newer versions, like FileMatchers
that make assertions regarding file exists cleaner. This commit upgrades
to the latest version of hamcrest so we can start using new and improved
matchers.
The pid dir for both systemd and init.d is already managed by those
respective systems (tmpfiles.d and the init script, respectively). Since
the /var/run dir is often mounted as tmpfs, it does not make sense to
have the elasticsearch pid dir added by the package installation. This
commit removes that empty dir from deb and rpm.
This commit adds a filter to the files include from modules to only
include platform specific files relevant to the distribution being
built. For example, the deb files on linux would now only include linux
ML binaries, and not windows or macos files.
* fix the packer cache script
This PR disabled the explicit pull since it seems this always tries to
work with a registry.
Functionality will not be affected since we will still build the images
on pull.
Instead of allowing docker-compose to rebuild it.
With this change we tag the image with a test label, and use that
in the testing as this is simpler that dealing with a dynamically
generated docker-compose file.
This commit changes the bwc builds from a single task for a branch to a
task for each bwc artifact. This reduces the bwc build time when only
needing a specific artifact, for example when running cluster restart
tests on a mac, the windows artifacts or rpm/debs are not needed.
This commit fixes an issue when the artifact used to build the Docker
image is sourced from artifacts.elastic.co. In particular, the artifact
was not downloaded to the proper location.
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
We previously found a bug in the JVM where AVX-512 instructions could
crash the JVM to crash with a segmentation fault. This bug impacted JDK
9 and JDK 10, but was most prominent on JDK 10 because AVX-512 was
enabled there by default. In JDK 11, this bug is reported fixed so this
commit restricts the disabling of AVX-512 to JDK 10 only. Since we no
longer support JDK 10 for any versions that this commit will be
integrated into (7.1, 8.0), we simply remove the disabling of this flag
from the JVM options.
This commit deprecates versions of Java prior to Java 11. This commit
will cause a warning to be printed to standard error when any command
line tool is invoked, or when Elasticsearch is started. Additionally, we
log a deprecation message when Elasticsearch is started.
* Add notice for bundled jdk
This commit adds the license/notice for the bundled openjdk.
* First draft
* iteration
* Fix package notices
* Iteration
* One more iteration
While yum does retry retrieving files 10 times by default [1], slow
network fetches, governed by `minrate` cause immediate aborts without
getting retried.
Wrap yum commands in a 10 iteration retry loop.
[1] http://man7.org/linux/man-pages/man5/yum.conf.5.html
Backport of #40349
On windows, JAVA_HOME is currently resolved when the windows service is started. However, this is contrary to what our documentation states. This commit moves resolution to service install. This has the side effect of making java existence checking optional in elasticsearch-env.bat, since the rest of the service commands do not require java.
closes#30720
This change removes the use of hardcoded port values for the
idp-fixture in favor of the mapped ephemeral ports. This should prevent
failures due to port conflicts in CI.
* Revert "Configure TMP for test nodes on Windows (#39959)"
This reverts commit 97562a874fcb1f29fb05272ab860a0307e97d1aa.
* Configure a tmp dir without spaces
* Pass on TMP instead of changing it
Now that we have the bundled JDK in the Docker images, we should use
them as opposed to procuring a JDK ourselves. This commit replaces the
JDK in the Docker image with the bundled JDK.
This commit adds cd $ES_HOME to elasticsearch-env and removes it from
elasticsearch. This way, both elasticsearch and elasticsearch-cli are
executed with the working directory set to $ES_HOME. The need for the
fix arose from the following bug:
1. Explicitly set path.data to relative to ES_HOME path in
elasticsearch.yml.
2. Run elasticsearch from any directory. Elasticsearch is able to
correctly start.
3. Stop elasticsearch.
4. Run elasticsearch-node unsafe-bootstrap, not from ES_HOME directory.
It will fail with an exception.
This commit fixes the issue and adds a new test.
This PR fixes the issue and adds a new test.
Also tests >=100 are renamed because alphabetic order does not work for
them.
(cherry picked from commit 2ffc29306ff7366efc598e7b4dd2ce528895cd3a
with fixes by #40083 and #40118)
This commit adds a variant for every official distribution that omits
the bundled jdk. The "no-jdk" naming is conveyed through the package
classifier, alongside the platform. Package tests are also added for
each new distribution.
This breaks on windows where TMP dir default to C:\Windows and startup
fails with a permission error.
I tried to create a tmp dir and pass in `TMP` env, but it lead to a
class not found error, and since testclusers is already independent of
the calling environment I stopped there.
The posix_spawn method of launching a process from Java
goes via an intermediate process called jspawnhelper
which lives in the lib directory rather than the bin
directory and hence got missed by the original chmod
loop. This change adds jspawnhelper as a special case.
It's the only program that's in the lib directory in a
macOS JDK 11.
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
* Back port build changes from #39102
This back-ports how versions are determined and bwc test are set up from
#39102 without enabling the bwc from current version tests so it's
easier/possible to backmerge future buld changes.
It's expected that the tets are lacking many of the required fixes in
this version to enable them.
This commit adds a new build type (together with deb/rpm/tar/zip) to
represent the official Docker images. This build type will be displayed
in APIs such as the main and nodes info APIs.
With this commit we provide more info in an existing error message that is
raised when the file `jvm.dll` cannot be found on Windows when installing
Elasticsearch as a service.
This commit makes the rpm metadata indicate the pre 7.0 noarch packages
are obsoleted by this package. This fixes an issue where upgrading with
yum would cause an error thinking there was nothing to upgrade.
closes#39414
This commit sets the BWC projects to build in parallel if Gradle was
invoked with parallal project execution enabled. This substantially
speeds up the time of building the BWC projects since there are many
dependent projects needed to build a BWC version.
Finding java on the path is sometimes confusing for users and
unexpected, as well as leading to a different java being used than a
user expects. This commit adds warning messages when starting
elasticsearch (or any tools like the plugin cli) and using java found
on the PATH instead of via JAVA_HOME.
As the Dockerfile evolved we don't need anymore certain commands like
`unzip`, `which` and `wget` allowing us to slightly shrink the images.
Backport of: #39040
Currently init scripts fail when `/proc/sys/vm/max_map_count` is not present
with `-bash: [: too many arguments`.
Fix conditional logic to avoid trying to set the `max_map_count` sysctl if not
present.
Backport of: #35933
Relates: #27236
This commit enables the copyDockerfile task to render a Dockerfile that
sources the Elasticsearch binary from artifacts.elastic.co. This is
needed for reproducibility and transparency for the official Docker
images in the Docker library.
This commit adds the 7.1 version constant to the 7.x branch.
Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
Renames the following settings to remove the mention of `zen` in their names:
- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
This commit adds classifiers to the distributions indicating the
OS (for archives) and platform. The current OSes are for windows, darwin (ie
macos) and linux. This change will allow future OS/architecture specific
changes to the distributions. Note the docs using distribution links
have been updated, but will be reworked in a followup to make OS
specific instructions for the archives.
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.
To populate additional fields node.id and cluster.uuid which are not available at start time,
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter.
Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions
Docker log4j2 configuration is now almost the same as the one use for ES binary.
The only difference is that docker is using console appenders, whereas ES is using file appenders.
relates: #32850
The /etc/elasticsearch directory is currently configured as a config
file with noreplace. However, the directory itself is not config, and
can lead to an entire /etc/elasticsearch.rpmsave directory in some
situations. This commit fixes the ospackage config to not specify those
file bits for the directory itself, but only the files underneath it.
* Exit batch files explictly using ERRORLEVEL
This makes sure the exit code is preserved when calling the batch
files from different contexts other than DOS
Fixes#29582
This also fixes specific error codes being masked by an explict
exit /b 1
causing the useful exitcodes from ExitCodes to be lost.
* fix line breaks for calling cli to match the bash scripts
* indent size of bash files is 2, make sure editorconfig does the same for bat files
* update indenting to match bash files
* update elasticsearch-keystore.bat indenting
* Update elasticsearch-node.bat to exit outside of endlocal
elasticsearch-node tool helps to restore cluster if half or more of
master eligible nodes are lost. Of course, all bets are off, regarding
data consistency.
There are two parts of the tool: unsafe-bootstrap to be used when there
is still at least one master-eligible node alive and detach-cluster,
when there are no master-eligible nodes left.
This commit implements the first part.
Docs for the tool will be added separately as a part of #37812.
* Testing conventions now checks for tests in main
This is the last outstanding feature of the old NamingConventionsTask,
so time to remove it.
* PR review
This change adds a docker compose configuration that's used with
the `elasticsearch.test.fixtures` plugin to start up the image
and check that the TCP ports are up.
We can build on this to add other checks for culster health,
run REST tests, etc.
We can add multiple containers and configurations to the compose
file (e.x. test different env vars) and form clusters.
Currently integration tests which use either bwc snapshot versions or
the current version of elasticsearch depend on project substitutions to
link to the build of those artifacts. Likewise, vagrant tests use
dependency substitutions to get to bwc snapshots of rpm and debs.
This commit changes those to depend on the relevant project/configuration
and removes the dependency substitutions for distributions we do not
publish.
The integ tests currently use the raw zip project name as the
distribution type. This commit simplifies this specification to be
"default" or "oss". Whether zip or tar is used should be an internal
implementation detail of the integ test setup, which can (in the future)
be platform specific.
With the release of 11.0.2, the old URLs no longer work. This exposed a
few small bugs in the gradle config. One was that --no-cache was not
present in the docker build command, so it was not failing at
first. Then once only the ext.expansions was changed and the docker
build task was not, it was not executing it.
This commit updates the file docker's entrypoint script looks for when
deciding to process the ELASTIC_PASSWORD env var. The x-pack subdir
of bin no longer exists in 7.0, where the backcompat layer for x-pack
script locations was removed.
closes#37240
Some systems default to a nofile ulimit of 65535. To reduce the pain of
deploying Elasticsearch to such systems, this commit lowers the required
limit from 65536 to 65535.
This commit removes permission editing commands from the postinst
scriptlet. Instead, we now fully configure the owner/group (as well as
sticky bit) for these files and directories.
closes#37143
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit makes the assemble tasks in the bwc projects noops by
setting the dependsOn directly. While we can not remove things from
dependsOn, we can still completely override the dependencies.
closes#33581
This commit adds a unique id to cluster blocks, so that they can be uniquely
identified if needed. This is important for the Close Index API where multiple
concurrent closing requests can be executed at the same time. By adding a
UUID to the cluster block, we can generate unique "closing block" that can
later be verified on shards and then checked again from the cluster state
before closing the index. When the verification on shard is done, the closing
block is replaced by the regular INDEX_CLOSED_BLOCK instance.
If something goes wrong, calling the Open Index API will remove the block.
Related to #33888
With this commit we instruct curl to retry with a backoff when
downloading the JDK for the Elasticsearch Docker image. This avoids
build failures on transient network issues. Note that this option
requires curl 7.12.3 or better.
Relates #37103
Relates #37113
We added some special handling for installing and removing the
ingest-geoip and ingest-user-agent plugins when we converted them to
modules. This special handling was done to minimize breaking users in a
minor release. However, do not want to maintain this behavior forever so
this commit removes that special handling in the master branch so that
starting with 7.0.0 this special handling will be gone.
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
The following updates were made:
* Add deprecation warnings to `RestUpdateAction`, plus a test in `RestUpdateActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Add HLRC integration tests for the typed APIs.
* Update documentation (for both the REST API and Java HLRC).
* Fix failing integration tests.
Because of an earlier PR, the REST yml tests were already updated (one version without types, and another legacy version that retains types).
The commit changes how indices are closed in the MetaDataIndexStateService.
It now uses a 3 steps process where writes are blocked on indices to be closed,
then some verifications are done on shards using the TransportVerifyShardBeforeCloseAction
added in #36249, and finally indices states are moved to CLOSE and their routing
tables removed.
The closing process also takes care of using the pre-7.0 way to close indices if the
cluster contains mixed version of nodes and a node does not support the TransportVerifyShardBeforeCloseAction. It also closes unassigned indices.
Related to #33888
When a security manager is present, the JVM will cache positive hostname
lookups indefinitely. This can be problematic, especially in the modern
world with cloud services where DNS addresses can change, or
environments using Docker containers where IP addresses could be
considered ephemeral. This behavior impacts cluster discovery,
cross-cluster replication and cross-cluster search, reindex from remote,
snapshot repositories, webhooks in Watcher, external authentication
mechanisms, and the Elastic Stack Monitoring Service. The experience of
watching a DNS lookup change yet not be reflected within Elasticsearch
is a poor experience for users. The reason the JVM has this is guard
against DNS cache posioning attacks. Yet, there is already a defense in
the modern world against such attacks: TLS. With proper certificate
validation, even if a resolver falls prey to a DNS cache poisoning
attack, using TLS would neuter the attack. Therefore we have a policy
with dubious security value that significantly impacts usability. As
such we make the usability/security tradeoff towards usability, since
the security risks are very low. This commit introduces new system
properties that Elasticsearch observes to override the JVM DNS cache
policy.
* Don't print download progress in batch mode
With this change we will no longer provide the progress bar in batch
mode.
Assuming that this is mode is mainly for consumption by tools which
will serialize the output, we shouldn't print a progress bar to be
for every percentile.
* PR review
For each API, the following updates were made:
- Add deprecation warnings to `Rest*Action`, plus tests in `Rest*ActionTests`.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This commit introduces the building of the Docker images as bonafide
packaging formats alongside our existing archive and packaging
distributions. This build is migrated from a dedicated repository, and
converted to Gradle in the process.
Currently is `java` is not in $PATH the preinst script fails
prematurely and prevents an appropriate message from getting displayed
to the user.
Make package installation more user friendly when java is not in
$PATH and add a test for it.
Also use a she-bang in the preinst script, as, at least in Debian,
maintainer scripts must start with the #! convention [1].
Relates #31845
[1] https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html
In the long run we want to move all of startup to a Java program. This
will simplify our startup scripts and make maintenance of startup less
dependent on the underlying platform that we run on. This commit moves
the creation of the temporary directory off of system-dependent commands
and onto a simple Java program.
The list of official plugins accidentally included `qa` projects like,
well, `qa` and `amazon-ec2`. This changes the mechanism that we use to
build the list and adds a test to catch this.
Closes#35623
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
* Introduce property to set version qualifier
- VersionProperties.elasticsearch is now a string which can have qualifier
and snapshot too
- The Version class in the build no longer cares about snapshot and
qualifier.
This commit updates the procrun manager and service exes to 1.1.0. There
are a few bug fixes, including for a bug which can cause lingering
processes when removing the service.
Back in #32983 I broke running the integ-test-zip tests against an
external cluster by adding a test that reads the contents of the log
file. This fixes running against an external cluster by explicitly
skipping that test if running against an external cluster.
The BWC builds for the 6.x branch should be using JDK 11. This commit
fixes the BWC builds to specify that they use JDK 11 instead of JDK 10
which is now incompatible with the 6.x build.
To pass the HOSTNAME envrionment variable to the Windows service, we
have to add some command line flags to the service invocation. Namely,
we have to specify that we are passing HOSTNAME variable, and we will
pass for it the value of %%COMPUTERNAME%%. This ensures that if the
hostname is changed, we pick this up the next time that the service is
started. This change is needed for the service now that we use the
HOSTNAME as the default node name.
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0
Relates to #31389
When we implemented `refresh=wait_for` I added a test with the wrong
name. This caused us to not run it. The test asserted that running
several operations with `refresh=wait_for` did not fail if the index was
`_close`d while the operations were waiting. But to be honest, failure
here isn't that bad. The index being waited on is closed. You can't do
anything with it any way. The most important thing is actually that
these operations don't hang forever. Because hanging forever means that
the resources used by the operations aren't freed.
Anyway, when I noticed the error I reenabled the test. But they don't
pass consistently because *sometimes* the operations being tested fail.
They don't seem to hang and they always fail with "this index is closed
so you can't do anything with it" sorts of messages.
When the test started failing we disabled it again. This reenables the
test but causes it to ignore these "index is closed" failures. We'd
prefer they not happen at all but in the grand scheme of things they are
fine and making sure these operations don't hang is much more important.
This also updates the test to bring it more in line with my current
understanding of the "right" way to use the low level rest client.
* Add commented out JVM options for G1GC
These options are available now that we will be supporting G1GC for Java 10 and
above. They are also designed so that the CMS options don't have to be commented
out in order for the G1 options to take effect.
* Update wording
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
I created a test a few days ago and declared a package that doesn't line
up with the directory structure. Oops. I a little surprised nothing
complained. But this fixes it.
I disabled one branch a few hours ago because it failed in CI. It looks
like other branches can also fail so I'll disable them as well and look
more closely on Monday.
Change the logging infrastructure to handle when the node name isn't
available in `elasticsearch.yml`. In that case the node name is not
available until long after logging is configured. The biggest change is
that the node name logging no longer fixed at pattern build time.
Instead it is read from a `SetOnce` on every print. If it is unset it is
printed as `unknown` so we have something that fits in the pattern.
On normal startup we don't log anything until the node name is available
so we never see the `unknown`s.
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
Gradle triggers the build of artifacts even if assemble is disabled.
Most users will not need bwc distributions after running `./gradlew
assemble` so instead of forcing them to add `-x buildBwcVersion`, we
detect this and skip the configuration of the artifacts.
- third party audit detects jar hell with JDK so we disable it
- jdk non portable in forbiddenapis detects classes being used from the
JDK ( for fips ) that are not portable, this is intended so we don't
scan for it on fips.
- different exclusion rules for third party audit on fips
Closes#33179
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. This
changes all calls in the `client` and `distribution` projects to use
the new versions.
On some Linux distributions tmpfiles.d cleans files and
directories under /tmp if they haven't been accessed for
10 days.
This can cause problems for ML as ML is currently the only
component that uses the temp directory more than a few
seconds after startup. If you didn't open an ML job for
10 days and then tried to open one then the temp directory
would have been deleted.
This commit prevents the problem occurring in the case of
Elasticsearch being managed by systemd, as systemd private
temp directories are not subject to periodic cleanup (by
default).
Additionally there are now some docs to warn people about
the risk and suggest a manual mitigation for .tar.gz users.
First, some background: we have 15 different methods to get a logger in
Elasticsearch but they can be broken down into three broad categories
based on what information is provided when building the logger.
Just a class like:
```
private static final Logger logger = ESLoggerFactory.getLogger(ActionModule.class);
```
or:
```
protected final Logger logger = Loggers.getLogger(getClass());
```
The class and settings:
```
this.logger = Loggers.getLogger(getClass(), settings);
```
Or more information like:
```
Loggers.getLogger("index.store.deletes", settings, shardId)
```
The goal of the "class and settings" variant is to attach the node name
to the logger. Because we don't always have the settings available, we
often use the "just a class" variant and get loggers without node names
attached. There isn't any real consistency here. Some loggers get the
node name because it is convenient and some do not.
This change makes the node name available to all loggers all the time.
Almost. There are some caveats are testing that I'll get to. But in
*production* code the node name is node available to all loggers. This
means we can stop using the "class and settings" variants to fetch
loggers which was the real goal here, but a pleasant side effect is that
the ndoe name is now consitent on every log line and optional by editing
the logging pattern. This is all powered by setting the node name
statically on a logging formatter very early in initialization.
Now to tests: tests can't set the node name statically because
subclasses of `ESIntegTestCase` run many nodes in the same jvm, even in
the same class loader. Also, lots of tests don't run with a real node so
they don't *have* a node name at all. To support multiple nodes in the
same JVM tests suss out the node name from the thread name which works
surprisingly well and easy to test in a nice way. For those threads
that are not part of an `ESIntegTestCase` node we stick whatever useful
information we can get form the thread name in the place of the node
name. This allows us to keep the logger format consistent.
Explicitly include all subdirectories of these folders in
/usr/share/elasticsearch in package distributions so that they are
managed by the package manager. This change does really have an
effect in the 7.x series, where there are no subdirectories in bin, and
we were already doing this in lib and modules. It does have an effect in
the 6.x series where the bin/x-pack subdirectory was not previously
tracked by the package manager and could be left behind on removal in
rpm distributions.