OldIndexBackwardsCompatibilityIT#testOldClusterStates tested whether global and index metadata could be read from data directory,
this can also be tested in full cluster qa test that checks cluster state via api.
Relates to #24939
Ports all of RepositoryUpgradabilityIT to qa:full-cluster-restart and ports as much of RestoreBackwardsCompatIT as possible into qa:full-cluster-restart.
This commit adds a gradle project, set inside the root build.gradle,
which controls all our bwc tests. This allows for seamless (ie no errant
CI failures) backporting of behavior.
In #25201, a setting was added to allow setting the retry timeout for the rest client under the
impression that this would allow requests to go longer than 30s. However, there is also a socket
timeout that needs to be set to greater than 30s, which this change adds a setting for.
This commit adds a setting to change the request timeout for the rest client. This is useful as the
default timeout is 30s, which is also the same default for calls like cluster health. If both are
the same then the response from the cluster health api will not be received as the client usually
times out first making test failures harder to debug.
Relates #25185
Duplicate data paths already fail to work because we would attempt to
take out a node lock on the directory a second time which will fail
after the first lock attempt succeeds. However, how this failure
manifests is not apparent at all and is quite difficult to
debug. Instead, we should explicitly reject duplicate data paths to make
the failure cause more obvious.
Relates #25178
We have a custom logger implementation known as a prefix logger that is
used to write every message by the logger with a given prefix. This is
useful for node-level, index-level, and shard-level messages where we
want to log the node name, index name, and shard ID, respectively, if
possible. The mechanism that we employ is that of a marker. Log4j has a
built-in facility for managing these markers, but its effectively a
memory leak because these markers are held in a map and can never be
released. This is problematic for us since indices and shards do not
necessarily have infinite life spans and so on a node where there are
many indices being creted and destroyed, this infinite lifespan can be a
problem indeed. To solve this, we use our own cache of markers. This is
necessary to prevent too many instances of the marker for the same
prefix from being created (just think of all the shard-level components
that exist in the system), and to workaround the effective leak in
Log4j. These markers are stored as weak references in a weak hash
map. It is these weak references that are unneeded. When a key is
removed from a weak hash map, the corresponding entry is placed on a
reference queue that is eventually cleared. This commit simplifies
prefix logger by removing this unnecessary weak reference wrapper.
Relates #22460
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This adds an option to `ClusterConfiguration` to preserve the
`shared` directory when starting up a new cluster and switches
the `qa:full-cluster-restart` tests to use it rather than
disable the clean shared task.
Relates to #24846
We're using Vagrant in more places now than before. This commit includes a plugin that verifies
the Vagrant and Virtualbox installations for projects that depend on them. This shared code
should fix up the errors we've seen from CI builds relating to the new Kerberos fixture.
* Adds nodes usage API to monitor usages of actions
The nodes usage API has 2 main endpoints
/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
Still to do:
* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation
* Rafactors UsageService so counts are done by the handlers
* Fixing up docs tests
* Adds a name to all rest actions
* Addresses review comments
We default to 0 replicas in the rolling restart scenario already to ensure
we test against worst case. Yet, this adds a dummy index to ensure we also
recover and index with replicas just fine.
The Lucene version expectation in the verify Lucene version test is
backwards, mixing up the expected and actual values. This commit
reorders them to fix this issue.
The Lucene version constants for 5.4.1 and 5.5.0 are wrong, they are
listed as 6.5.0 instead of 6.5.1. This commit fixes these issues, and
adds a test to ensure that this does not happen again.
Relates #24923
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
These tests spin up two nodes of an older version of Elasticsearch,
create some stuff, shut down the nodes, start the current version,
and verify that the created stuff works.
You can run `gradle qa:full-cluster-restart:check` to run these
tests against the head of the previous branch of Elasticsearch
(5.x for master, 5.4 for 5.x, etc) or you can run
`gradle qa:full-cluster-restart:bwcTest` to run this test against
all "index compatible" versions, one after the other. For master
this is every released version in the 5.x.y version *and* the tip
of the 5.x branch.
I'd love to add more to these tests in the future but these
currently just cover the functionality of the `create_bwc_index.py`
script and start to cover the assertions in the
`OldIndexBackwardsCompatibilityIT` test.
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
ScriptEngine implementations have an overridable method to indicate they
are safe to use as inline scripts. Since groovy was removed fro 6.0,
there are no longer any implementations which used the default false
value. Furthermore, the value was not actually read anywhere. This
commit removes the method. The ScriptEngineRegistry was also no longer
necessary as it only was used to build a map from language to engine.
This commit renames the backwards-5.0 qa test to mixed-cluster and
creates a test within the project per wire compat version. Like with
rolling upgrade tests, the integTest task will run against the most
recent version, while all versions will be tested with the bwcTest task.
Now that we generate the versions list from Versions.java we can
drop the list of versions maintained for vagrant testing. One nice
thing that the vagrant testing did was to check if the list of
versions was out of date. This moves that test to the core
project.
This commit changes the rolling upgrade test to create a set of rest
test tasks per wire compat version. The most recent wire compat version
is always tested with the `integTest` task, and all versions can be
tested with `bwcTest`.
This commit expands the logic for version extraction from Version.java
to include a list of all versions for backcompat purposes. The tests
using bwcVersion are converted to use this list, but those tests
(rolling upgrade and backwards-5.0) are still not randomized; that will
happen in another followup.
Today the `_field_caps` API doesn't implement its request serialization
correctly since indices and indices options are not serialized at all.
This will likely break with all transport clients etc. and if this request
must be send across the network. This commit fixes this and adds correct
handling if we have only remote indices to prevent the inclusion of
all local indices.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.
Relates #20257
In windows we can't reliable git the pid so we skip the
reindex-from-remote tests from old versions of elasticsearch. This
is OK because we aren't really testing windows here anyway. It isn't
great, but should be safe.
Two of the versions of Elasticsearch we need to run for these tests
can't run in Java 9 so we skip the entire test if we are running in
java 9. For now. I'd like to reenable it to run against java 8 if
there is one available, but that can wait for another time.
Relates to #24561
Adds tests for reindex-from-remote for the latest 2.4, 1.7, and
0.90 releases. 2.4 and 1.7 are fairly popular versions but 0.90
is a point of pride.
This fixes any issues those tests revealed.
Closes#23828Closes#24520
This commit renames ScriptEngineService to ScriptEngine. It is often
confusing because we have the ScriptService, and then
ScriptEngineService implementations, but the latter are not services as
we see in other places in elasticsearch.
File scripts have 2 related settings: the path of file scripts, and
whether they can be dynamically reloaded. This commit deprecates those
settings.
relates #21798
Today we rely on background syncs to relay the global checkpoint under
the mandate of the primary to its replicas. This means that the global
checkpoint on a replica can lag far behind the primary. The commit moves
to inlining global checkpoints with replication requests. When a
replication operation is performed, the primary will send the latest
global checkpoint inline with the replica requests. This keeps the
replicas closer in-sync with the primary.
However, consider a replication request that is not followed by another
replication request for an indefinite period of time. When the replicas
respond to the primary with their local checkpoint, the primary will
advance its global checkpoint. During this indefinite period of time,
the replicas will not be notified of the advanced global
checkpoint. This necessitates a need for another sync. To achieve this,
we perform a global checkpoint sync when a shard falls idle.
Relates #24513
Previously, Mustache would call `toString` on the `_ingest.timestamp`
field and return a date format that did not match Elasticsearch's
defaults for date-mapping parsing. The new ZonedDateTime class in Java 8
happens to do format itself in the same way ES is expecting.
This commit adds support for a feature flag that enables the usage of this new date format
that has more native behavior.
Fixes#23168.
This new fix can be found in the form of a cluster setting called
`ingest.new_date_format`. By default, in 5.x, the existing behavior
will remain the same. One will set this property to `true` in order to
take advantage of this update for ingest-pipeline convenience.
When installing plugin permissions, we try to set the permissions on all
installed files ourselves because a umask from the user could violate
everything needed to get the permissions right. Sadly, directories were
not handled correctly at all and so we were still left with broken
installations with umasks like 0077. This commit fixes this issue, adds
a thorough unit test for the situation, and most importantly, adds a
test that sets the umask before installing the plugin.
Relates #24527
The license header in this file was after and on the same line as the
package statement. This commit moves the package statement to be after
the license header.
Netty uses the number of processors for sizing various resources (e.g.,
thread pools, buffer pools, etc.). However, it uses the runtime number
of available processors which might not match the configured number of
processors as set in Elasticsearch to limit the number of threads (for
example, in Docker containers). A new feature was added to Netty that
enables configuring the number of processors Netty should see for sizing
this various resources. This commit takes advantage of this feature to
set this number of available processors to be equal to the configured
number of processors set in Elasticsearch.
Relates #24420
The tribe service can take a while to initialize, depending on how many cluster it needs to connect to. This change moves writing the ports file used by tests to before the tribe service is started.
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
The bats tests are descructive and must be run as root. This is a
horrible combination on any sane system but perfectly fine to do
in a VM. This change modifies the tests so they revuse to start
unless they are in an environment with an `/etc/is_vagrant_vm`
file. The Vagrantfile creates it on startup.
Closes#24137
If the user explicitly configured path.data to include
default.path.data, then we should not fail the node if we find indices
in default.path.data. This commit addresses this.
Relates #24285
In #24251 we fix an issue with stored search templates that
this test would have discovered: stored search templates cause
the node to refuse to start. Technically a "restart" test would
have caught this as well and would have caught it more quickly.
But we already *have* an upgrade test and we don't have restart tests.
And testing this on upgrade is a good thing too.
It seems that Wildfly 10 can not be made to start in a fully-functional
form on JDK 9, so this commit skips running the Wildfly integration
tests on JDK 9.
An important use case for our users is deploying our clients inside of
applications containers like Wildly. Sometimes, we make changes that
unintentionally break this use case. We need to know before we ship a
release that we have broken such use cases. As Wildfly is one of the
bigger application containers, this commit starts by adding an
integration test that deploys an application using the transport client
to Wildfly and ensures that all is well. Future work can add similar
integration tests for the low-level and high-level REST clients.
Relates #24147
The plugin cli currently resides inside the elasticsearch jar. This
commit moves it into a plugin-cli jar. This is change alone is a no-op;
it does not change anything about what is loaded at runtime. But it will
allow easier testing (with fixtures in the future to test ES or maven
installation), as well as eventually not loading these classes when
starting elasticsearch.
This leniency was left in after plugin installer refactoring for 2.0
because some tests still relied on it. However, the need for this
leniency no longer exists.
* Replicate write failures
Currently, when a primary write operation fails after generating
a sequence number, the failure is not communicated to the replicas.
Ideally, every operation which generates a sequence number on primary
should be recorded in all replicas.
In this change, a sequence number is associated with write operation
failure. When a failure with an assinged seqence number arrives at a
replica, the failure cause and sequence number is recorded in the translog
and the sequence number is marked as completed via executing `Engine.noOp`
on the replica engine.
* use zlong to serialize seq_no
* Incorporate feedback
* track write failures in translog as a noop in primary
* Add tests for replicating write failures.
Test that document failure (w/ seq no generated) are recorded
as no-op in the translog for primary and replica shards
* Update to master
* update shouldExecuteOnReplica comment
* rename indexshard noop to markSeqNoAsNoOp
* remove redundant conditional
* Consolidate possible replica action for bulk item request
depanding on it's primary execution
* remove bulk shard result abstraction
* fix failure handling logic for bwc
* add more tests
* minor fix
* cleanup
* incorporate feedback
* incorporate feedback
* add assert to remove handling noop primary response when 5.0 nodes are not supported
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
After splitting integ tests into cluster configuration and the test
runner task, we still have dependencies of the test runner added as deps
of the cluster. This commit adds dependencies directly to the cluster,
so that the runner can have other dependencies independent of what is
needed for the cluster.
We have some packaging tests where we use a custom jvm.options file. The
flags we use here are barebones, just enough to exercise that using a
custom jvm.options files actually works. Due to this, we are missing a
Log4j flag that prevents Log4j from trying to use JMX. If Log4j tries to
use JMX, it hits a security manager exception and tries to log
this. This attempt to log happens before we've configured
logging. Previously, Elasticsearch was lenient here so this was treated
as harmless and the test could march on. Now, we fail startup if we
detect an attempt to log before logging is configured so this prevents
Elasticsearch from starting if we do not have jvm.options files in place
that prevent these log messages from being written before logging is
being configured. This commit adds jvm.options files in the places need
to prevent this.
These tests were marked as awaits fix due to JNA requiring a version of
glibc greater than or equal to version 2.14. Since we still support
systems that would not have this version, we have released our own JNA
dependency that is built to support earlier versions of glibc. This
commit removes some await fixes that were added to tests that failed as
a result of this situation.
It can easily happen that we touch a logger before logging is configured
due to chains of static intializers and other such scenarios. This
commit adds detection for this mechanism that will fail startup if we
touch a logger before logging is configured. This is a bug that will
cause builds to fail.
Relates #24076
This is related to #23893. This commit allows users to use wilcards for
cluster names when executing a cross cluster search.
So instead of defining every cluster such as:
GET one:*,two:*,three:*/_search
A user could just search:
GET *:*/_search
As ":" characters are currently allowed in index names, if the text
up to the first ":" does not match a defined cluster name, the entire
string is treated as an index name.
All our actions that are invoked from rest actions have corresponding
transport actions. This adds the transport action for RestRemoteClusterInfoAction
for consistency.
Relates to #23969
This commit fixes the handling of POSIX permissions on Windows in the
spawner tests. Since POSIX permissions do not exist there, we first have
to check if we are on a filesystem that supports POSIX or not before
attempting to set the permissions.
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
This commit adds a single node discovery type. With this discovery type,
a node will elect itself as master and never form a cluster with another
node.
Relates #23595
We currently have the last minor version of the previous major hardcoded
in tests like rolling upgrade. This change programatically finds this
during gradle initialization by parsing versions from Version.java.
These tests were disabled due to an issue in Netty which has since been
resolved and integrated into Elasticsearch.
Relates elastic/elasticsearch#20260
The current rest backcompat tests, which run against a mixed cluster of
5.x and 6.0 nodes, depend on snapshot builds of 5.x. However, this has
the potential for inconsistency that results in CI failures, and happens
quite often, whenever some backcompat logic is added to 5.x, but the bwc
test on master fails because the 5.x code has not yet been published as
a snapshot.
This change creates a git clone of the 5.x branch,
builds the zip distribution, and ties that into gradle substitutions for
the 5.x version.
We currently use POSIX exit codes in all of our CLIs. However, posix
only suggests these exit codes are standard across tools. It does not
prescribe particular uses for codes outside of that range. This commit
adds 2 exit codes specific to plugin installation to make distinguishing
an incorrectly built plugin and a plugin already existing easier.
closes#15295
This commit catches the underlying failure when trying to list plugin
information when a plugin is incompatible with the current version of
elasticsearch. This could happen when elasticsearch is upgraded but old
plugins still exist. With this change, all plugins will be output,
instead of failing at the first out of date plugin.
closes#20691
This commit marks the EvilJNANativesTests as awaiting fixes due to these
tests failing on platforms that do not provide at least version 2.14 of
glibc.
This commit adds a system property that enables end-users to explicitly
enforce the bootstrap checks, independently of the binding of the
transport protocol. This can be useful for single-node production
systems that do not bind the transport protocol (and thus the bootstrap
checks would not be enforced).
Relates #23585
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
* Make document write requests immutable
Previously, write requests were mutated at the
transport level to update request version, version type
and sequence no before replication.
Now that all write requests go through the shard bulk
transport action, we can use the primary response stored
in item level bulk requests to pass the updated version,
seqence no. to replicas.
* incorporate feedback
* minor cleanup
* Add bwc test to ensure correct index version propagates to replica
* Fix bwc for propagating write operation versions
* Add assertion on replica request version type
* fix tests using internal version type for replica op
* Fix assertions to assert version type in replica and recovery
* add bwc tests for version checks in concurrent indexing
* incorporate feedback
In the packaging tests we make some requests to Elasticsearch as part of
the tests. These requests were not setting the content-type header. This
commit addresses this.
In the packaging tests we make some requests to Elasticsearch as part of
the tests. These requests were not setting the content-type header. This
commit addresses this.
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.
While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.
Relates #19388
These images have been rebuilt to be preloaded with java 8 installed.
This change re-enables the systems. It also removes some redundancy in
the rpm checks I found while testing the new images, and fixes a
potential issue with generated resources in plugins where a stale dir
can cause junk to get into the distribution.
As part of #22116 we are going to forbid usage of api
java.net.URL#openStream(). However in a number of places across the
we use this method to read files from the local filesystem. This commit
introduces a helper method openFileURLStream(URL url) to read files
from URLs. It does specific validation to only ensure that file:/
urls are read.
Additionlly, this commit removes unneeded method
FileSystemUtil.newBufferedReader(URL, Charset). This method used the
openStream () method which will soon be forbidden. Instead we use the
Files.newBufferedReader(Path, Charset).
Today either all nodes in the cluster connect to remote clusters of only nodes
that have remote clusters configured in their node config. To allow global remote
cluster configuration but restrict connections to a set of nodes in the cluster
this change adds a new setting `search.remote.connect` (defaults to `true`) to allow
to disable remote cluster connections on a per node basis.
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
When a primary is relocated from an old node to a new node, it can have
ops in its translog that do not have a sequence number assigned. When a
file-based recovery is started, this can lead to skipping these ops when
replaying the translog due to a bug in the recovery logic. This commit
addresses this bug and adds a test in the BWC tests.
Relates #22945
Today if a user invokes the remove plugin command without specifying the
name of a plugin to remove, we arrive at a null pointer exception. This
commit adds logic to cleanly handle this situation and provide clear
feedback to the user.
Relates #22930
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
This commit introduces sequence-number-based recovery. When a replica
has fallen out of sync, rather than performing a file-based recovery we
first attempt to replay operations since the last local checkpoint on
the replica. To do this, at the start of recovery the replica tells the
primary what its local checkpoint is. The primary will then wait for all
operations between that local checkpoint and the current maximum
sequence number to complete; this is to ensure that there are no gaps in
the operations that will be replayed from the primary to the
replica. This is a best-effort attempt as we currently have no
guarantees on the primary that these operations will be available; if we
are not able to replay all operations in the desired range, we just
fallback to file-based recovery. Later work will strengthen the
guarantees.
Relates #22484
This is related to #22116. URLRepository requires SocketPermission
connect. This commit introduces a new module called "repository-url"
where URLRepository will reside. With the new module, permissions can
be removed from core.
There are presently 7 ctor args used in any rest handlers:
* `Settings`: Every handler uses it to initialize a logger and
some other strange things.
* `RestController`: Every handler registers itself with it.
* `ClusterSettings`: Used by `RestClusterGetSettingsAction` to
render the default values for cluster settings.
* `IndexScopedSettings`: Used by `RestGetSettingsAction` to get
the default values for index settings.
* `SettingsFilter`: Used by a few handlers to filter returned
settings so we don't expose stuff like passwords.
* `IndexNameExpressionResolver`: Used by `_cat/indices` to
filter the list of indices.
* `Supplier<DiscoveryNodes>`: Used to fill enrich the response
by handlers that list tasks.
We probably want to reduce these arguments over time but
switching construction away from guice gives us tighter
control over the list of available arguments.
These parameters are passed to plugins using
`ActionPlugin#initRestHandlers` which is expected to build and
return that handlers immediately. This felt simpler than
returning an reference to the ctors given all the different
possible args.
Breaks java plugins by moving rest handlers off of guice.
This commit adds a MessyRestTestPlugin to the gradle build. It extends
StandaloneRestTestPlugin. The main piece of functionality that it adds
is to copy plugin-metadata from dependencies into the
generated-resources for the current test source. This is necessary to
ensure that permissions for dependencies are applied when running the
tests.
A current limitation is that the permissions are applied differently
than in the distribution sources. When permissions are granted to all
depedencies for a module or plugin, the permissions are granted to all
dependencies on the classpath for tests besides a few hardcoded
exclusions:
- es core
- es test framework
- lucene test framework
- randomized runner
- junit library
For certain situations, end-users need the base path for Elasticsearch
logs. Exposing this as a property is better than hard-coding the path
into the logging configuration file as otherwise the logging
configuration file could easily diverge from the Elasticsearch
configuration file. Additionally, Elasticsearch will only have
permissions to write to the log directory configured in the
Elasticsearch configuration file. This commit adds a property that
exposes this base path.
One use-case for this is configuring a rollover strategy to retain logs
for a certain period of time. As such, we add an example of this to the
documentation.
Additionally, we expose the property es.logs.cluster_name as this is
used as the name of the log files in the default configuration.
Finally, we expose es.logs.node_name in cases where node.name is
explicitly set in case users want to include the node name as part of
the name of the log files.
Relates #22625
When logger.level is set, we end up configuring a logger named "level"
because we look for all settings of the form "logger\..+" as configuring
a logger. Yet, logger.level is special and is meant to only configure
the default logging level. This commit causes is to avoid not
configuring a logger named level.
Relates #22624
This commit removes a logging test that is now obsolete. This test was
added when we included a forked version of some Log4j 2 classes to
workaround a bug in Log4j 2. This bug was fixed and a version of Log4j 2
incorporating this fix was previously integrated into Elaticsearch. At
that time, the forked versions were removed, and this test should have
been removed with it.
* Fix Translog.Delete serialization for sequence numbers
Translog.Delete used `.writeVLong` instead of `.writeLong` for the sequence
number and primary term (and their respective "read" variants). This could lead
to issues where a 5.x node sent a translog operation with a negative sequence
number (-2 for unassigned seq no) that tripped an assertion serializing a
negative number and causing ES to exit.
Adds a unit test for serialization and a mixed-cluster REST test, since that was
how this was originally caught.
* Use more realistic values for random seqNum and primary term
* Add comment with TODO for removal in 7.0
* Change comment into an assert
Instead of `search.remote.seeds.${clustername}` we now specify the seeds as:
`search.remote.${clustername}.seeds` which is a real list setting compared to an unvalidated
group setting before.
`ToXContentObject` extends `ToXContent` without adding new methods to it, while allowing to mark classes that output complete xcontent objects to distinguish them from classes that require starting and ending an anonymous object externally.
Ideally ToXContent would be renamed to ToXContentFragment, but that would be a huge change in our codebase, hence we simply document the fact that toXContent outputs fragments with no guarantees that the output is valid per se without an external ancestor.
Relates to #16347
this commit adds full support for proxy nodes on the search layer.
This allows to connection only to a small set of nodes on a remote cluster
to exectue the search. The nodes will proxy the request to the correct node in the
cluster while the coordinting node doesn't need to be connected to the target node.
This change is the first towards providing the ability to store
sensitive settings in elasticsearch. It adds the
`elasticsearch-keystore` tool, which allows managing a java keystore.
The keystore is loaded upon node startup in Elasticsearch, and used by
the Setting infrastructure when a setting is configured as secure.
There are a lot of caveats to this PR. The most important is it only
provides the tool and setting infrastructure for secure strings. It does
not yet provide for keystore passwords, keypairs, certificates, or even
convert any existing string settings to secure string settings. Those
will all come in follow up PRs. But this PR was already too big, so this
at least gets a basic version of the infrastructure in.
The two main things to look at. The first is the `SecureSetting` class,
which extends `Setting`, but removes the assumption for the raw value of the
setting to be a string. SecureSetting provides, for now, a single
helper, `stringSetting()` to create a SecureSetting which will return a
SecureString (which is like String, but is closeable, so that the
underlying character array can be cleared). The second is the
`KeyStoreWrapper` class, which wraps the java `KeyStore` to provide a
simpler api (we do not need the entire keystore api) and also extend
the serialized format to add metadata needed for loading the keystore
with no assumptions about keystore type (so that we can change this in
the future) as well as whether the keystore has a password (so that we
can know whether prompting is necessary when we add support for keystore
passwords).
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
Today if an older version of a plugin exists, we fail to notify the user
with a helpful error message. This happens because during plugin
verification, we attempt to read the plugin descriptors for all existing
plugins. When an older version of a plugin is sitting on disk, we will
attempt to read this old plugin descriptor and fail due to a version
mismatch. This leads to an unhelpful error message. Instead, we should
check for existence of the plugin as part of the verification phase, but
before attempting to read plugin descriptors for existing plugins. This
enables us to provide a helpful error message to the user.
Relates #22305
* Internal: Refactor SettingCommand into EnvironmentAwareCommand
This change renames and changes the behavior of SettingCommand to have
its primary method take in a fully initialized Environment for
elasticsearch instead of just a map of settings. All of the subclasses
of SettingCommand already did this at some point, so this just removes
duplication.
We are currenlty checking that no deprecation warnings are emitted in our query tests. That can be moved to ESTestCase (disabled in ESIntegTestCase) as it allows us to easily catch where our tests use deprecated features and assert on the expected warnings.
Sequence BWC logic consists of two elements:
1) Wire level BWC using stream versions.
2) A changed to the global checkpoint maintenance semantics.
For the sequence number infra to work with a mixed version clusters, we have to consider situation where the primary is on an old node and replicas are on new ones (i.e., the replicas will receive operations without seq#) and also the reverse (i.e., the primary sends operations to a replica but the replica can't process the seq# and respond with local checkpoint). An new primary with an old replica is a rare because we do not allow a replica to recover from a new primary. However, it can occur if the old primary failed and a new replica was promoted or during primary relocation where the source primary is treated as a replica until the master starts the target.
1) Old Primary & New Replica - this case is easy as is taken care of by the wire level BWC. All incoming requests will have their seq# set to `UNASSIGNED_SEQ_NO`, which doesn't confuse the local checkpoint logic (keeping it at `NO_OPS_PERFORMED`)
2) New Primary & Old replica - this one is trickier as the global checkpoint service currently takes all in sync replicas into consideration for the global checkpoint calculation. In order to deal with old replicas, we change the semantics to say all *new node* in sync replicas. That means the replicas on old nodes don't count for the global checkpointing. In this state the seq# infra is not fully operational (you can't search on it, because copies may miss it) but it is maintained on shards that can support it. The old replicas will have to go through a file based recovery at some point and will get the seq# information at that point. There is still an edge case where a new primary fails and an old replica takes over. I'lll discuss this one with @ywelsch as I prefer to avoid it completely.
This PR also re-enables the BWC tests which were disabled. As such it had to fix any BWC issue that had crept in. Most notably an issue with the removal of the `timestamp` field in #21670.
The commit also includes a fix for the default value of the seq number field in replicated write requests (it was 0 but should be -2), that surface some other minor bugs which are fixed as well.
Last - I added some debugging tools like more sane node names and forcing replication request to implement a `toString`
Problem: We introduced the ability to shorten the rank eval request by using a
template in #20231. When playing with the API it turned out that there might be
use cases where - e.g. due to various heuristics - folks might want to translate
the original user query into more than just one type of Elasticsearch query.
Solution: Give each template an id that can later be referenced in the
actual requests.
Closes#21257
Today in the codebase we refer to seccomp everywhere instead of system
call filter even if we are not specifically referring to Linux. This
commit is a purely mechanical change to refer to system call filter
where appropriate instead of the general seccomp, and only leaves
seccomp in place when actually referring to the Linux implementation.
Relates #22243
Inline scripts defined in Ingest Pipelines are now compiled at creation time to preemptively catch errors on initialization of the pipeline.
Fixes#21842.
Moves the last of the "easy" parser construction into
`RestRequest`, this time with a new method
`RestRequest#contentParser`. The rest of the production
code that builds `XContentParser` isn't "easy" because it is
exposed in the Transport Client API (a Builder) object.
This commit enables CLI commands to be closeable and installs a runtime
shutdown hook to ensure that if the JVM shuts down (as opposed to
aborting) the close method is called.
It is not enough to wrap uses of commands in main methods in
try-with-resources blocks as these will not run if, say, the virtual
machine is terminated in response to SIGINT, or system shutdown event.
Relates #22126
Add checks to RankEvalSpec to safe guard against missing parameters.
Fail early in case no metric is supplied, no rated requests are supplied or the search source builder is missing but no template is supplied neither.
Add stricter checks around rank eval request parsing: Fail if in a rated request we see both, a verbatim request as well as request
template parameters.
Relates to #21260
Our query DSL supports empty queries (`{}`), which have a different meaning depending on the query that holds it, either ignored, match_all or match_none. We deprecated the support for empty queries in 5.0, where we log a deprecation warning wherever they are used.
The way we supported it once we moved query parsing to the coordinating node was having an Optional<QueryBuilder> return type in all of our parse methods (called fromXContent). See #17624. The central place for this was QueryParseContext#parseInnerQueryBuilder. We can now remove all the optional return types and simply throw an exception whenever an empty query is found.
Move rank-eval template compilation down to TransportRankEvalAction
Closes#21777 and #21465
Search templates for rank_eval endpoint so far only worked when sent through
REST end point
However we also allow templates to be set through a Java API call to
"setTemplate" on that same spec. This doesn't go through template execution so
fails further down the line.
To make this work, moved template execution further down, probably to
TransportRankEvalAction.
* Add template IT test for Java API
* Move template compilation to TransportRankEvalAction
Action filters currently have the ability to filter both the request and
response. But the response side was not actually used. This change
removes support for filtering responses with action filters.
We don't use the test infra nor do we run the tests. They might all be
entirely out of date. We also have a different BWC test infra in-place.
This change removes all of the legacy infra.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
This commit adds `qa/multi-cluster-search` which currently does a
simple search across 2 clusters. This commit also adds support for IPv6
addresses and fixes an issue where all shards of the local cluster are searched
when only a remote index was given.
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
We kept `netty_3` as a fallback in the 5.x series but now that master
is 6.0 we don't need this or in other words all issues coming up with
netty 4 will be blockers for 6.0.
* master: (22 commits)
Add proper toString() method to UpdateTask (#21582)
Fix `InternalEngine#isThrottled` to not always return `false`. (#21592)
add `ignore_missing` option to SplitProcessor (#20982)
fix trace_match behavior for when there is only one grok pattern (#21413)
Remove dead code from GetResponse.java
Fixes date range query using epoch with timezone (#21542)
Do not cache term queries. (#21566)
Updated dynamic mapper section
Docs: Clarify date_histogram bucket sizes for DST time zones
Handle release of 5.0.1
Fix skip reason for stats API parameters test
Reduce skip version for stats API parameter tests
Strict level parsing for indices stats
Remove cluster update task when task times out (#21578)
[DOCS] Mention "all-fields" mode doesn't search across nested documents
InternalTestCluster: when restarting a node we should validate the cluster is formed via the node we just restarted
Fixed bad asciidoc in boolean mapping docs
Fixed bad asciidoc ID in node stats
Be strict when parsing values searching for booleans (#21555)
Fix time zone rounding edge case for DST overlaps
...
There is not yet a BWC layer in sequence numbers. This commit sets the
BWC version to 6.0.0 for the BWC and rolling upgrade tests until this
BWC layer is built.
Adds a version constant for it, bwc indices, and a vagrant upgrade-from
version. Also bumps the "upgrade from" version for the backwards-5.0
test and adds `skip`s for tests that don't fail against 5.0 so we skip
them during the backwards testing.
Finally, this skips the "Shrink index via API" test because it fails
consistently for me. Inconsistently for CI, but consistently for me.
I'll work on making it consistent tomorrow.
In #21348 the command executed to run the packaging tests has been changed to "sudo -E bats ...", forcing all environment variables from the vagrant user to be passed to the `sudo` command. This breaks a test on opensuse-13 (the one where it checks that elasticsearch cannot be started when `java` is not found) because all the PATH from the user is passed to the sudo command.
This commit restores the previous behavior while allowing only necessary testing environment variables to be passed using a /etc/sudoers.d file.
This changes adds a test discovery (which internally uses the existing
mock zenping by default). Having the mock the test framework selects be a discovery
greatly simplifies discovery setup (no more weird callback to a Node
method).
Today when a node starts, we create dynamic socket permissions based on
the configured HTTP ports and transport ports. If no ports are
configured, we use the default port ranges. When a tribe node starts, a
tribe node creates an internal node client for connecting to each remote
cluster. If neither an explicit HTTP port nor transport ports were
specified, the default port ranges are large enough for the tribe node
and its internal node clients. If an explicit HTTP port or transport
port was specified for the tribe node, then socket permissions for those
ports will be created, but not for the internal node clients. Whether
the internal node clients have explicit ports specified, or attempt to
bind within the default range, socket permissions for these will not
have been created and the internal node clients will hit a permissions
issue when attempting to bind. This commit addresses this issue by also
accounting for tribe nodes when creating the dynamic socket
permissions. Additionally, we add our first real integration test for
tribe nodes.
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
This commit changes the current :elactisearch:qa:vagrant build file and transforms it into a Gradle plugin in order to reuse it in other projects.
Most of the code from the build.gradle file has been moved into the VagrantTestPlugin class. To avoid duplicated VMs when running vagrant tests, the Gradle plugin sets the following environment variables before running vagrant commands:
VAGRANT_CWD: absolute path to the folder that contains the Vagrantfile
VAGRANT_PROJECT_DIR: absolute path to the Gradle project that use the VagrantTestPlugin
The VAGRANT_PROJECT_DIR is used to share project folders and files with the vagrant VM. These folders and files are exported when running the task `gradle vagrantSetUp` which:
- collects all project archives dependencies and copies them into `${project.buildDir}/bats/archives`
- copy all project bats testing files from 'src/test/resources/packaging/tests' into `${project.buildDir}/bats/tests`
- copy all project bats utils files from 'src/test/resources/packaging/utils' into `${project.buildDir}/bats/utils`
It is also possible to inherit and grab the archives/tests/utils files from project dependencies using the plugin configuration:
apply plugin: 'elasticsearch.vagrant'
esvagrant {
inheritTestUtils true|false
inheritTestArchives true|false
inheritTests true|false
}
dependencies {
// Inherit Bats test utils from :qa:vagrant project
bats project(path: ':qa:vagrant', configuration: 'bats')
}
The folders `${project.buildDir}/bats/archives`, `${project.buildDir}/bats/tests` and `${project.buildDir}/bats/utils` are then exported to the vagrant VMs and mapped to the BATS_ARCHIVES, BATS_TESTS and BATS_UTILS environnement variables.
The following Gradle tasks have also be renamed:
* gradle vagrantSetUp
This task copies all the necessary files to the project build directory (was `prepareTestRoot`)
* gradle vagrantSmokeTest
This task starts the VMs and echoes a "Hello world" within each VM (was: `smokeTest`)
On some systems these utilities are in /usr/lib/systemd/systemd-sysctl
and /usr/sbin/sysctl, and on others the /usr is dropped. This commit
accounts for that fact.
Our docs claim that we set vm.max_map_count automatically. This is not
quite the case. The story is that on SysV init we set vm.max_map_count
each time the service starts, which is good. On systemd, we create a
sysctl.d conf file that sets vm.map_max_count, but this is only
meaningful if the system is rebooted after package install. This commit
modifies the post-install script so that we run systemd-sysctl so that
the vm.max_map_count change occurs after package install without a
reboot.
Relates #21507
This commit ensure that VirtualBox is available in version 5.1+ in the system before running packaging tests. It also check for Vagrant version is now greater than 1.8.6.
The environment variable ES_JVM_OPTIONS allows end-users to specify a
custom location for the jvm.options file. Unfortunately, this
environment variable is not exported from the SysV init scripts. This
commit addresses this issue, and includes a test that ES_JVM_OPTIONS and
ES_JAVA_OPTS work for the SysV init packages.
Relates #21445
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
Today if you start Elasticsearch with the status logger configured to
the warn level, or use a transport client with the default status logger
level, you will see warn messages about deprecation loggers being
created with different message factories and that formatting might be
broken. This happens because the deprecation logger is constructed using
the message factory from its parent, an artifact leftover from the first
Log4j 2 implementation that used a custom message factory. When that
custom message factory was removed, this constructor invocation should
have been changed to not explicitly use the message factory from the
parent. This commit fixes this invocation. However, we also had some
status checking to all tests to ensure that there are no warn status log
messages that might indicate a configuration problem with Log4j 2. These
assertions blow up badly without the fix for the deprecation logger
construction, and also caught a misconfiguration in one of the logging
tests.
Relates #21339
The usage information for `elasticsearch-plugin` is quiet verbose and makes the
actual error message that is shown when trying to remove an non-existing plugin
hard to spot. This changes the error code to not trigger printing the usage
information.
Closes#21250
Plugins: Remove pluggability of ZenPing
ZenPing is the part of zen discovery which knows how to ping nodes.
There is only one alternative implementation, which is just for testing.
This change removes the ability to add custom zen pings, and instead
hooks in the MockZenPing for tests through an overridden method in
MockNode. This also folds in the ZenPingService (which was really just a
single method) into ZenDiscovery, and removes the idea of having
multiple ZenPing instances. Finally, this was the last usage of the
ExtensionPoint classes, so that is also removed here.
When installing a plugin when the plugins directory does not exist, the
install plugin command outputs a line saying that it is creating this
directory. The packaging tests for the archive distributions accounted
for this including an assertion that this line was output. The packages
have since been updated to include an empty plugins folder, so this line
will no longer be output. This commit removes this stale assertion from
the packaging tests.
Relates #21275
Today when installing Elasticsearch from an archive distribution (tar.gz
or zip), an empty plugins folder is not included. This means that if you
install Elasticsearch and immediately run elasticsearch-plugin list, you
will receive an error message about the plugins directory missing. While
the plugins directory would be created when starting Elasticsearch for
the first time, it would be better to just include an empty plugins
directory in the archive distributions. This commit makes this the
case. Note that the package distributions already include an empty
plugins folder.
Relates #21204
This adds support for templating in rank eval requests.
Relates to #20231
Problem: In it's current state the rank-eval request API forces the user to repeat complete queries for each test request. In most use cases the structure of the query to test will be stable with only parameters changing across requests, so this looks like lots of boilerplate json for something that could be expressed in a more concise way.
Uses templating/ ScriptServices to enable users to submit only one test request template and let them only specify template parameters on a per test request basis.
Vagrant tests use a static list of dependencies to upgrade from
and we weren't including 5.0.0 deps in that list. Also when the
list was incorrect we weren't sorting the "current" list so it
was difficult to read.
Also adds 2.4.1 to the list but *doesn't* add 5.0.0 because we
still can't resolve it yet. We still only print an error when
the list is wrong but don't abort the build. We'll abort the build
once we've fixed resolution for 5.0.0 and we can re-add it.
We are upgrading from out of date versions in our tests right now and we
can't fix that because the current versions to upgrade from aren't in
maven central. We'll resolve the resolution issue soon, but for now
let's get the build green.
Since with j`ava-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64`, the OpenJDK packaged for CentOS and OEL override the default value (`false`) for the JVM option `AssumeMP` and force it to `true` (see [this patch](https://git.centos.org/blob/rpms!!java-1.8.0-openjdk.git/ab03fcc7a277355a837dd4c8500f8f90201ea353/SOURCES!always_assumemp.patch))
Because it is forced to true by default for these packages, the following warning message is printed to the standard output when the Vagrant box has only 1 CPU:
> OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
This message will then fail the test introduced in #20422 where we check if no entries have been added to the journal after the service has been started.
This commit restore the default value for the `AssumeMP` option for CentOS and OracleServer.
This commit mutes a check on the output of journalctl after the Elasticsearch's systemd service has been started. It expected no entries in the journal but since OpenJDK build 1.8.0_111-b15 the following warning message is printed:
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
`LocalDiscovery` is a discovery implementation that uses static in memory maps to keep track of current live nodes. This is used extensively in our tests in order to speed up cluster formation (i.e., shortcut the 3 second ping period used by `ZenDiscovery` by default). This is sad as that mean that most of the test run using a different discovery semantics than what is used in production. Instead of replacing the entire discovery logic, we can use a similar approach to only shortcut the pinging components.
This change proposes the removal of all non-tcp transport implementations. The
mock transport can be used by default to run tests instead of local transport that has
roughly the same performance compared to TCP or at least not noticeably slower.
This is a master only change, deprecation notice in 5.x will be committed as a
separate change.
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
this change adds a hard limit to `index.number_of_shard` that prevents
indices from being created that have more than 1024 shards. This is still
a huge limit and can only be changed via settings a system property.
Today when executing the install plugin command without a plugin id, we
end up throwing an NPE because the plugin id is null yet we just keep
going (ultimatley we try to lookup the null plugin id in a set, the
direct cause of the NPE). This commit modifies the install command so
that a missing plugin id is detected and help is provided to the user.
Relates #20660
When testing tribe nodes in an integration test, we should pass the classpath
plugins of the node down to the tribe client nodes. Without this the tribe client
nodes could be prevented from communicating with the tribes.
Today when CLI tools are executed, logging statements can intentionally
or unintentionally be executed when logging is not configured. This
leads to log messages that the status logger is not configured. This
commit reworks logging configuration for CLI tools so that logging is
always configured.
Relates #20575
This PR introduces backward compatibility index tests to test the rolling upgrade process amongst Elasticsearch instances within the same major version. The test executes in three phases. In the first phase, we form a cluster of 2 ES instances on an old version. In the second phase, we keep one of the nodes from the old cluster, kill the other node, but preserve its data directory and start an instance of the current version of ES using the same data directory as the killed instance. In the third phase, we kill the other old version ES instance from the first phase and launch a new instance, using the same data directory as the killed instance. Therefore, during phase 3, we have fully migrated and have all current versions of ES running. In each phase, we run REST tests that index documents and search them, ensuring at each stage that the documents from the previous phase are still there.
Note that because we haven't released a GA yet of 5.0, the tests currently don't start an old version cluster in the first phase. Once GA is released, this will be changed to make the backward compatibility version 5.0, while the current version in the cluster will be 5.x.
automatically between tasks, as we want some of the nodes from
the previous task to continue running in the next task. This
commit enables a cluster configuration setting to not stop
nodes automatically after a task runs, but instead the creator
of the test task must stop the running nodes explicitly in a
cleanup phase.
cluster, we wait for the cluster health to indicate the
necessary nodes have formed a cluster. This check was an
exact value (equality) check. However, if we are trying to
connect the nodes in the cluster to nodes from a previously
formed cluster (of the same name), then we will have more
nodes returned by the cluster health check than the current
task's configured number of nodes. Hence, this check needs
to be a >= check. This commit fixes it.
Today when starting Elasticsearch without a Log4j 2 configuration file,
we end up throwing an array index out of bounds exception. This is
because we are passing no configuration files to Log4j. Instead, we
should throw a useful error message to the user. This commit modifies
the Log4j configuration setup to throw a user exception if no Log4j
configuration files are present in the config directory.
Relates #20493
BATS upgrade tests fails on master branch because it tries to install 2.x versions to upgrade from instead of 5.x versions. And since #18554 we should only test upgrades from 5.0.0-alpha4 versions.
This commit changes the vagrant tests so that it tries to list all the previous releases from version N-1. If nothing is found, it will fetch the current version and will run the upgrade tests with it. It works nicely with the current master 6.0.0-alpha1-SNAPSHOT. Once 5.0.0 is released it should run the test with it.
When uninstalling or upgrading elasticsearch using the RPM package some empty directories remain on the filesystem:
/usr/share/elasticsearch/bin
/usr/share/elasticsearch/lib
/usr/share/elasticsearch/modules
/usr/share/elasticsearch/modules/foo
Having empty directories in modules can prevent elasticsearch to start after an upgrade: the plugins service expects to find a plugin-descriptor.properties file in every sub directory of modules.
This PR cleans things a bit so that these empty directories are removed on upgrade/removal like it was in 2.x.
The Log4j shutdown hack test tests that a hack we have in place to
workaround a bug in Log4j during shutdown is effective. Log4j can use
JMX to control logging levels, but we disable this through the use of a
system property log4j2.disable.jmx (mainly because there is no need for
this feature, but it also means granting additional security
permissions). The bug in Log4j is that during shutdown, it neglects to
check whether or not its usage of JMX is disable and so it attempts to
unregister management beans, leading to a permissions violation. The
test works by attempting to shutdown Log4j and thus triggering the bad
code path. With the Log4j hack in place, we have introduced jar hell so
that its our code running instead of code from the Log4j jar. Our code
correctly checks that the usage of JMX is disabled and thus does not
trip on a permissions violation. The test was a little complicated in
that it attempted to just grant the minimal permissions needed for Log4j
to do its thing, but this can sometimes lead to other unwanted
permissions violations because the permissions put in place are more
restrictive necessary. This commit simplifies this situation by
rewriting the test to only deny Log4j the sole permission needed to
trigger the bug.
Relates #20476
When upgrading elasticsearch using the RPM package, the scripts directory is removed if it's empty but it won't be recreated by the upgraded package. But after that the service won't start because the scripts dir is missing.
Today when setting the logging level via the command-line or an API
call, the expectation is that the logging level should trickle down the
hiearchy to descendant loggers. However, this is not necessarily the
case. For example, if loggers x and x.y are already configured then
setting the logging level on x will not descend to x.y. This is because
the logging config for x.y has already been forked from the logging
config for x. Therefore, we must explicitly descend the hierarchy when
setting the logging level and that is what this commit does.
Relates #20463
This commit introduces a new plugin for file-based unicast hosts
discovery. This allows specifying the unicast hosts participating
in discovery through a `unicast_hosts.txt` file located in the
`config/discovery-file` directory. The plugin will use the hosts
specified in this file as the set of hosts to ping during discovery.
The format of the `unicast_hosts.txt` file is to have one host/port
entry per line. The hosts file is read and parsed every time
discovery makes ping requests, thus a new version of the file that
is published to the config directory will automatically be picked
up.
Closes#20323
Today we add a prefix when logging within Elasticsearch. This prefix
contains the node name, and index and shard-level components if
appropriate.
Due to some implementation details with Log4j 2 , this does not work for
integration tests; instead what we see is the node name for the last
node to startup. The implementation detail here is that Log4j 2 there is
only one logger for a name, message factory pair, and the key derived
from the message factory is the class name of the message factory. So,
when the last node starts up and starts setting prefixes on its message
factories, it will impact the loggers for the other nodes.
Additionally, the prefixes are lost when logging an exception. This is
due to another implementation detail in Log4j 2. Namely, since we log
exceptions using a parameterized message, Log4j 2 decides that that
means that we do not want to use the message factory that we have
provided (the prefix message factory) and so logs the exception without
the prefix.
This commit fixes both of these issues.
Relates #20429
This commit adds a -q/--quiet option to Elasticsearch so that it does not log anything in the console and closes stdout & stderr streams. This is useful for SystemD to avoid duplicate logs in both journalctl and /var/log/elasticsearch/elasticsearch.log while still allows the JVM to print error messages in stdout/stderr if needed.
closes#17220
The plugin command now displays the version of the plugin, which is
compared to a string without the version. This removes the version from
the string.
The evil logger tests rely on external configuration. This configuration
is shared between these tests which means that changing the
configuration for one test can cause an unrelated test to fail. In
particular, removing the appenders on the root logger so that inherited
loggers in one test do not have a console and file appender by default
breaks tests that were expecting the root logger to have these
appenders. This commit separates these configs so that these tests are
not subject to this problem.
Log4j has a bug where on shutdown it ignores that JMX might be disabled;
since it does not respect this on shutdown, it proceeds to attempt to
access JMX leading to a security exception that should have otherwise
not occurred had it respected that JMX is disabled. This commit
intentionally introduces jar hell with the Server class to work around
this bug until a fix is released.
Relates #20389
Previously we would disable console logging in certain circumstances
(for example, if Elasticsearch is not in the foreground, or if
Elasticsearch is in the foreground but an exception was thrown during
bootstrap). This commit makes this handling work with Log4j 2. This will
prevent users from seeing double bootstrap check failure messages.
Relates #20387
Previous versions of Elasticsearch permitted unquoted JSON field names even though this is against the JSON spec. This leniency was disabled by default in the 5.x series of Elasticsearch but a backwards compatibility layer was added via a system property with the intention of removing this layer in 6.0.0. This commit removes this backwards compatibility layer.
Relates #20388
The 5.x series of Elasticsearch emits a warning if any of the old
logging configuration formats are present. This commit removes that
warning.
Relates #20386
By default, when an exception causes the JVM to terminate, the stack
trace is printed. In the case of failing bootstrap checks, this stack
trace is useless to the user, and might even distract them from seeing
that the bootstrap checks failed for reasons under their control. With
this commit, we cause the stack trace for a failing bootstrap check to
be truncated.
We also modify some methods to not declare that they throw the top level
checked exception type Exception, but instead explicitly declare the
exceptions that they throw. These exceptions are caught and wrapped in a
BootstrapException so that we can percolate only two exception types out
of Bootstrap#init as checked exception, BootstrapException and
NodeValidationException.
Relates #19989
The logging configuration tests write to log files which are deleted at
the end of the test. If these files are not closed, some operating
systems will complain when these deletes are performed. This commit
ensures that the logging system is properly shutdown so that these files
can be properly deleted.
The evil logging tests write to log files which are deleted at the end
of the test. If these files are not closed, some operating systems will
complain when these deletes are performed. This commit ensures that the
logging system is properly shutdown so that these files can be properly
deleted.
This commit expands on the message printed when config files are
preserved when removing a plugin to give the user an indication of the
reason the config files are preserved.
When removing a plugin with a config directory, we preserve the config
directory. This is because the workflow for upgrading a plugin involves
removing and then installing the plugin again and losing the plugin
config in this case would be terrible. This commit causes a message
regarding this to be printed in case the user wants to manually delete
these files.
* master:
Avoid NPE in LoggingListener
Randomly use Netty 3 plugin in some tests
Skip smoke test client on JDK 9
Revert "Don't allow XContentBuilder#writeValue(TimeValue)"
[docs] Remove coming in 2.0.0
Don't allow XContentBuilder#writeValue(TimeValue)
[doc] Remove leftover from CONSOLE conversion
Parameter improvements to Cluster Health API wait for shards (#20223)
Add 2.4.0 to packaging tests list
Docs: clarify scale is applied at origin+offest (#20242)
When Netty 4 was introduced, it was not the default network
implementation. Some tests were constructed to randomly use Netty 4
instead of the default network implementation. When Netty 4 was made the
default implementation, these tests were not updated. Thus, these tests
are randomly choosing between the default network implementation (Netty
4) and Netty 4. This commit updates these tests to reverse the role of
Netty 3 and Netty 4 so that the randomization is choosing between Netty
3 and the default (again, now Netty 4).
Relates #20265
This commit adds an assumption to SmokeTestClientIT tests on JDK 9. The
underlying issue is that Netty attempts to access sun.nio.ch but this
package is not exported from java.base on JDK 9. This throws an uncaught
InaccessibleObjectException causing the test to fail. This assumption
can be removed when Netty 4.1.6 is released as it will include a fix for
this scenario.
Relates #20260
This commit enables CLI tools to have console logging. For the CLI
tools, we skip configuring the logging infrastructure via the config
file, and instead set the level only via a system property.
This commit fixes failing evil logging configuration tests. The test for
resolving multiple configuration files was failing after
9a58fc2348 removed some of the
configuration needed for this test. The solution is revert the removal
of that configuration, but remove additivity from the test logger to
prevent the evil logger tests from failing.
This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greather than one is dangerous
as users sometimes end up unknowingly starting a second node and start
thinking that they have encountered data loss.
Relates #19964