The Apache Commons Daemon has some helpful features for Java
applications, like nice little next boxes for min heap, max heap, and
thread stack size. Our elasticsearch-service.bat script parses those
values out of the ES_JAVA_OPTS environment variable and provides them to
the Apache Commons Daemon invocation command in order to provide
sensible defaults. However, we failed to remove those values from the
ES_JAVA_OPTS environment variable, which meant they ended up in the
"Java Options" text box and would, from there, override whatever the
user put in the specific boxes for heap size or thread stack size.
This commit modifies the loop that parses ES_JAVA_OPTS to construct a
new enviroment variable containing only the values that aren't parsed
out for heap size or thread stack size, then uses that new enviroment
variable in the commons daemon invocation command.
JDK 14 has removed CMS. This commit restricts the support for CMS to JDK
8 through JDK 13, and defaults to G1 GC on JDK 14. We will revisit all
defaults in the future, but this ensures that we run with a
properly-configured garbage collector on JDK 14+.
This commit moves JVM options that we are setting on behalf of the user
that we do not expect them to fiddle with out of the jvm.options
configuration file and into the JVM options parser. In this way, we
discourage fiddling with these settings, but more importantly, we ensure
that as we evolve or add to these settings that a user would pick these
pick instead of being left behind if they have a modified jvm.options
file and do not pick any new that come with the distribution.
This commit removes the option to change the netty system properties to
reenable the direct buffer pooling. It also removes the need for us to
disable the buffer pooling in the system properties file. Instead, we
programmatically craete an allocator that is used by our networking
layer.
This commit does introduce an Elasticsearch property which allows the
user to fallback on the netty default allocator. If they choose this
option, they can configure the default allocator how they wish using the
standard netty properties.
This commit moves the ES_TMPDIR substitution that we do for JVM options
into the JVM options parser itself. This solves a problem where the fact
that the we do not make the substitution before ergonomics parsing can
lead to the JVM that we start for computing the ergonomic values failing
to start. Additionally, moving this substitution here enables us to
simplify the shell scripts since we do not need to implement this there,
and twice for Bash and Windows.
Since the bundled jdk was added to Elasticsearch, there are now 2 ways
java can be missing. Either JAVA_HOME is set but does not exist, or the
bundled jdk does not exist. This commit improves the error messages in
those two cases, and also ensures our tests cover both cases.
G1 GC were setup to use an `InitiatingHeapOccupancyPercent` of 75. This
could leave used memory at a very high level for an extended duration,
triggering the real memory circuit breaker even at low activity levels.
The value is a threshold for old generation usage relative to total heap
size and thus it should leave room for the new generation. Default in
G1 is to allow up to 60 percent for new generation and this could mean that the
threshold was effectively at 135% heap usage. GC would still kick in of course and
eventually enough mixed collections would take place such that adaptive adjustment
of IHOP kicks in.
The JVM has adaptive setting of the IHOP, but this does not kick in
until it has sampled a few collections. A newly started, relatively
quiet server with primarily new generation activity could thus
experience heap above 95% frequently for a duration.
The changes here are two-fold:
1. Use 30% default for IHOP (the JVM default of 45 could still mean
105% heap usage threshold and did not fully ensure not to hit the
circuit breaker with low activity)
2. Set G1ReservePercent=25. This is used by the adaptive IHOP mechanism,
meaning old/mixed GC should kick in no later than at 75% heap. This
ensures IHOP stays compatible with the real memory circuit breaker also
after being adjusted by adaptive IHOP.
We currently configure io.netty.allocator.numDirectArenas to be 0 in the
jvm erconomics class. This is a config that we always want to set, so it
makes sense to move it to jvm.options.
The field has to be defined in log4j2.properties and should be an
escaped JSON for now (it is a broken JSON at the moment). This should later be refactored into a JSON array
of strings.
In https://github.com/elastic/elasticsearch/pull/41913 setting up the
temp dir for ES was moved from the env script to individual cli scripts.
However, moving it to the windows service cli was missed. This commit
restores setting up the temp dir for the windows service control script.
This commit adds some default CLI JVM options to control the heap size
and the garbage collector used for the CLI tools. We do this because
otherwise the JVM will default to large initial and max heap sizes based
on the RAM visible to the JVM (which could be all the physical RAM on
the machine if not run in a container-aware JVM). This commit therefore
sets the initial heap size to 4m, the max heap size to 64m, the garbage
collector to the serial collector, and leaves this user-configurable by
honoring ES_JAVA_OPTS last.
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)
These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.
Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.
closes#41350
backport #41354
The elasticsearch-cli helper script does not use the tempdir created by
elasticsearch-env, yet the env script still creates it. This can lead to
lots of temp directories being created when running cli scripts in an
automated fashion. This commit passes a fake tmpdir to the env script to
avoid creation.
closes#34445
This commit removes manual parsing of JVM options when calculating
ergonomics. This is to avoid a situation that we parse values
differently than the JVM would. In fact, we already have a bug along
these lines today. It is possible to start the JVM with the same flag
multiple times on the command line. In this case, the last value
wins. For example, -Xmx1g -Xmx2g would start the JVM with a heap size of
two gigabytes. Our JVM ergonomics ignores this possibility and instead
the first value is winning!
Our strategy to avoid manual parsing of the JVM options is to start the
Java command line parser (without actually starting a JVM) by invoking
java with the same command line flags as presented and request that the
JVM tell us what values it would start with. This ensures that we have
the correct values when making ergonomic decisions.
Moreover, our strategy also is ignoring ES_JAVA_OPTS which could
override the heap size as well leading to incorrect ergonomic
choices. This commit address this issue too.
We use Bouncy Castle to verify signatures when installing official
plugins. This leads to illegal access warnings because Bouncy Castle
accesses the Sun security provider constructor. This commit adds an
add-opens flag to suppress this illegal access.
We previously found a bug in the JVM where AVX-512 instructions could
crash the JVM to crash with a segmentation fault. This bug impacted JDK
9 and JDK 10, but was most prominent on JDK 10 because AVX-512 was
enabled there by default. In JDK 11, this bug is reported fixed so this
commit restricts the disabling of AVX-512 to JDK 10 only. Since we no
longer support JDK 10 for any versions that this commit will be
integrated into (7.1, 8.0), we simply remove the disabling of this flag
from the JVM options.
On windows, JAVA_HOME is currently resolved when the windows service is started. However, this is contrary to what our documentation states. This commit moves resolution to service install. This has the side effect of making java existence checking optional in elasticsearch-env.bat, since the rest of the service commands do not require java.
closes#30720
* Revert "Configure TMP for test nodes on Windows (#39959)"
This reverts commit 97562a874fcb1f29fb05272ab860a0307e97d1aa.
* Configure a tmp dir without spaces
* Pass on TMP instead of changing it
This commit adds cd $ES_HOME to elasticsearch-env and removes it from
elasticsearch. This way, both elasticsearch and elasticsearch-cli are
executed with the working directory set to $ES_HOME. The need for the
fix arose from the following bug:
1. Explicitly set path.data to relative to ES_HOME path in
elasticsearch.yml.
2. Run elasticsearch from any directory. Elasticsearch is able to
correctly start.
3. Stop elasticsearch.
4. Run elasticsearch-node unsafe-bootstrap, not from ES_HOME directory.
It will fail with an exception.
This commit fixes the issue and adds a new test.
This PR fixes the issue and adds a new test.
Also tests >=100 are renamed because alphabetic order does not work for
them.
(cherry picked from commit 2ffc29306ff7366efc598e7b4dd2ce528895cd3a
with fixes by #40083 and #40118)
This breaks on windows where TMP dir default to C:\Windows and startup
fails with a permission error.
I tried to create a tmp dir and pass in `TMP` env, but it lead to a
class not found error, and since testclusers is already independent of
the calling environment I stopped there.
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
With this commit we provide more info in an existing error message that is
raised when the file `jvm.dll` cannot be found on Windows when installing
Elasticsearch as a service.
Finding java on the path is sometimes confusing for users and
unexpected, as well as leading to a different java being used than a
user expects. This commit adds warning messages when starting
elasticsearch (or any tools like the plugin cli) and using java found
on the PATH instead of via JAVA_HOME.
Renames the following settings to remove the mention of `zen` in their names:
- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.
To populate additional fields node.id and cluster.uuid which are not available at start time,
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter.
Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions
Docker log4j2 configuration is now almost the same as the one use for ES binary.
The only difference is that docker is using console appenders, whereas ES is using file appenders.
relates: #32850
* Exit batch files explictly using ERRORLEVEL
This makes sure the exit code is preserved when calling the batch
files from different contexts other than DOS
Fixes#29582
This also fixes specific error codes being masked by an explict
exit /b 1
causing the useful exitcodes from ExitCodes to be lost.
* fix line breaks for calling cli to match the bash scripts
* indent size of bash files is 2, make sure editorconfig does the same for bat files
* update indenting to match bash files
* update elasticsearch-keystore.bat indenting
* Update elasticsearch-node.bat to exit outside of endlocal
elasticsearch-node tool helps to restore cluster if half or more of
master eligible nodes are lost. Of course, all bets are off, regarding
data consistency.
There are two parts of the tool: unsafe-bootstrap to be used when there
is still at least one master-eligible node alive and detach-cluster,
when there are no master-eligible nodes left.
This commit implements the first part.
Docs for the tool will be added separately as a part of #37812.
When a security manager is present, the JVM will cache positive hostname
lookups indefinitely. This can be problematic, especially in the modern
world with cloud services where DNS addresses can change, or
environments using Docker containers where IP addresses could be
considered ephemeral. This behavior impacts cluster discovery,
cross-cluster replication and cross-cluster search, reindex from remote,
snapshot repositories, webhooks in Watcher, external authentication
mechanisms, and the Elastic Stack Monitoring Service. The experience of
watching a DNS lookup change yet not be reflected within Elasticsearch
is a poor experience for users. The reason the JVM has this is guard
against DNS cache posioning attacks. Yet, there is already a defense in
the modern world against such attacks: TLS. With proper certificate
validation, even if a resolver falls prey to a DNS cache poisoning
attack, using TLS would neuter the attack. Therefore we have a policy
with dubious security value that significantly impacts usability. As
such we make the usability/security tradeoff towards usability, since
the security risks are very low. This commit introduces new system
properties that Elasticsearch observes to override the JVM DNS cache
policy.
Currently is `java` is not in $PATH the preinst script fails
prematurely and prevents an appropriate message from getting displayed
to the user.
Make package installation more user friendly when java is not in
$PATH and add a test for it.
Also use a she-bang in the preinst script, as, at least in Debian,
maintainer scripts must start with the #! convention [1].
Relates #31845
[1] https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html
In the long run we want to move all of startup to a Java program. This
will simplify our startup scripts and make maintenance of startup less
dependent on the underlying platform that we run on. This commit moves
the creation of the temporary directory off of system-dependent commands
and onto a simple Java program.
This commit updates the procrun manager and service exes to 1.1.0. There
are a few bug fixes, including for a bug which can cause lingering
processes when removing the service.
To pass the HOSTNAME envrionment variable to the Windows service, we
have to add some command line flags to the service invocation. Namely,
we have to specify that we are passing HOSTNAME variable, and we will
pass for it the value of %%COMPUTERNAME%%. This ensures that if the
hostname is changed, we pick this up the next time that the service is
started. This change is needed for the service now that we use the
HOSTNAME as the default node name.
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0
Relates to #31389
* Add commented out JVM options for G1GC
These options are available now that we will be supporting G1GC for Java 10 and
above. They are also designed so that the CMS options don't have to be commented
out in order for the G1 options to take effect.
* Update wording
First, some background: we have 15 different methods to get a logger in
Elasticsearch but they can be broken down into three broad categories
based on what information is provided when building the logger.
Just a class like:
```
private static final Logger logger = ESLoggerFactory.getLogger(ActionModule.class);
```
or:
```
protected final Logger logger = Loggers.getLogger(getClass());
```
The class and settings:
```
this.logger = Loggers.getLogger(getClass(), settings);
```
Or more information like:
```
Loggers.getLogger("index.store.deletes", settings, shardId)
```
The goal of the "class and settings" variant is to attach the node name
to the logger. Because we don't always have the settings available, we
often use the "just a class" variant and get loggers without node names
attached. There isn't any real consistency here. Some loggers get the
node name because it is convenient and some do not.
This change makes the node name available to all loggers all the time.
Almost. There are some caveats are testing that I'll get to. But in
*production* code the node name is node available to all loggers. This
means we can stop using the "class and settings" variants to fetch
loggers which was the real goal here, but a pleasant side effect is that
the ndoe name is now consitent on every log line and optional by editing
the logging pattern. This is all powered by setting the node name
statically on a logging formatter very early in initialization.
Now to tests: tests can't set the node name statically because
subclasses of `ESIntegTestCase` run many nodes in the same jvm, even in
the same class loader. Also, lots of tests don't run with a real node so
they don't *have* a node name at all. To support multiple nodes in the
same JVM tests suss out the node name from the thread name which works
surprisingly well and easy to test in a nice way. For those threads
that are not part of an `ESIntegTestCase` node we stick whatever useful
information we can get form the thread name in the place of the node
name. This allows us to keep the logger format consistent.
The C2 compiler in JDK 10 appears to have an issue compiling to AVX-512
instructions (on hardware that supports such). As a workaround, this
commit adds a JVM flag on JDK 10+ to disable the use of AVX-512
instructions until a fix is introduced to the JDK. Instead, we use a
flag to enable AVX and AVX2 only.
Note: Based on my reading of the C2 code, this flag does not appear to
have any impact on hardware that does not support AVX2. I have tested
this manually on an Intel Atom C2538 processor that supports neither AVX
nor AVX2. I have also tested this manually on an Intel i5-3317U
processor that supports AVX but not AVX2.
The java version checker requires being written with java 7 APIs.
In order to use java 8 apis in other launcher utilities, this commit
moves the java version checker back to its own jar.
This commit adjusts the indentation in the CLI scripts to give a clear
visual indication that the line being indented is a continuation of the
previous line.
A previous refactoring of the CLI scripts migrated all of the CLI tools
to shell to a common script, elasticsearch-cli. This approach is fine in
Bash where it is easy to tear arguments apart but it doesn't work so
well on Windows where quoting is insane. To avoid having to tear the
arguments apart to separate the first argument to elasticsearch-cli from
the remaining arguments, we instead choose a strategy where we can avoid
tearing the arguments apart. To do this, we will instead pass the main
class by an environment variable and then we can pass the arguments
straight through. This will let us avoid awful quoting issues on
Windows. This is the Windows side of that effort and the Bash side was
in a previous commit.
A previous refactoring of the CLI scripts migrated all of the CLI tools
to shell to a common script, elasticsearch-cli. This approach is fine in
Bash where it is easy to tear arguments apart but it doesn't work so
well on Windows where quoting is insane. To avoid having to tear the
arguments apart to separate the first argument to elasticsearch-cli from
the remaining arguments, we instead choose a strategy where we can avoid
tearing the arguments apart. To do this, we will instead pass the main
class by an environment variable and then we can pass the arguments
straight through. This will let us avoid awful quoting issues on
Windows. This is the non-Windows side of that effort and the Windows
side will be in a follow-up.