This commit adds a gradle project, set inside the root build.gradle,
which controls all our bwc tests. This allows for seamless (ie no errant
CI failures) backporting of behavior.
In #25201, a setting was added to allow setting the retry timeout for the rest client under the
impression that this would allow requests to go longer than 30s. However, there is also a socket
timeout that needs to be set to greater than 30s, which this change adds a setting for.
This commit adds a setting to change the request timeout for the rest client. This is useful as the
default timeout is 30s, which is also the same default for calls like cluster health. If both are
the same then the response from the cluster health api will not be received as the client usually
times out first making test failures harder to debug.
Relates #25185
Duplicate data paths already fail to work because we would attempt to
take out a node lock on the directory a second time which will fail
after the first lock attempt succeeds. However, how this failure
manifests is not apparent at all and is quite difficult to
debug. Instead, we should explicitly reject duplicate data paths to make
the failure cause more obvious.
Relates #25178
We have a custom logger implementation known as a prefix logger that is
used to write every message by the logger with a given prefix. This is
useful for node-level, index-level, and shard-level messages where we
want to log the node name, index name, and shard ID, respectively, if
possible. The mechanism that we employ is that of a marker. Log4j has a
built-in facility for managing these markers, but its effectively a
memory leak because these markers are held in a map and can never be
released. This is problematic for us since indices and shards do not
necessarily have infinite life spans and so on a node where there are
many indices being creted and destroyed, this infinite lifespan can be a
problem indeed. To solve this, we use our own cache of markers. This is
necessary to prevent too many instances of the marker for the same
prefix from being created (just think of all the shard-level components
that exist in the system), and to workaround the effective leak in
Log4j. These markers are stored as weak references in a weak hash
map. It is these weak references that are unneeded. When a key is
removed from a weak hash map, the corresponding entry is placed on a
reference queue that is eventually cleared. This commit simplifies
prefix logger by removing this unnecessary weak reference wrapper.
Relates #22460
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This adds an option to `ClusterConfiguration` to preserve the
`shared` directory when starting up a new cluster and switches
the `qa:full-cluster-restart` tests to use it rather than
disable the clean shared task.
Relates to #24846
We're using Vagrant in more places now than before. This commit includes a plugin that verifies
the Vagrant and Virtualbox installations for projects that depend on them. This shared code
should fix up the errors we've seen from CI builds relating to the new Kerberos fixture.
* Adds nodes usage API to monitor usages of actions
The nodes usage API has 2 main endpoints
/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
Still to do:
* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation
* Rafactors UsageService so counts are done by the handlers
* Fixing up docs tests
* Adds a name to all rest actions
* Addresses review comments
We default to 0 replicas in the rolling restart scenario already to ensure
we test against worst case. Yet, this adds a dummy index to ensure we also
recover and index with replicas just fine.
The Lucene version expectation in the verify Lucene version test is
backwards, mixing up the expected and actual values. This commit
reorders them to fix this issue.
The Lucene version constants for 5.4.1 and 5.5.0 are wrong, they are
listed as 6.5.0 instead of 6.5.1. This commit fixes these issues, and
adds a test to ensure that this does not happen again.
Relates #24923
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
These tests spin up two nodes of an older version of Elasticsearch,
create some stuff, shut down the nodes, start the current version,
and verify that the created stuff works.
You can run `gradle qa:full-cluster-restart:check` to run these
tests against the head of the previous branch of Elasticsearch
(5.x for master, 5.4 for 5.x, etc) or you can run
`gradle qa:full-cluster-restart:bwcTest` to run this test against
all "index compatible" versions, one after the other. For master
this is every released version in the 5.x.y version *and* the tip
of the 5.x branch.
I'd love to add more to these tests in the future but these
currently just cover the functionality of the `create_bwc_index.py`
script and start to cover the assertions in the
`OldIndexBackwardsCompatibilityIT` test.
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
ScriptEngine implementations have an overridable method to indicate they
are safe to use as inline scripts. Since groovy was removed fro 6.0,
there are no longer any implementations which used the default false
value. Furthermore, the value was not actually read anywhere. This
commit removes the method. The ScriptEngineRegistry was also no longer
necessary as it only was used to build a map from language to engine.
This commit renames the backwards-5.0 qa test to mixed-cluster and
creates a test within the project per wire compat version. Like with
rolling upgrade tests, the integTest task will run against the most
recent version, while all versions will be tested with the bwcTest task.
Now that we generate the versions list from Versions.java we can
drop the list of versions maintained for vagrant testing. One nice
thing that the vagrant testing did was to check if the list of
versions was out of date. This moves that test to the core
project.
This commit changes the rolling upgrade test to create a set of rest
test tasks per wire compat version. The most recent wire compat version
is always tested with the `integTest` task, and all versions can be
tested with `bwcTest`.
This commit expands the logic for version extraction from Version.java
to include a list of all versions for backcompat purposes. The tests
using bwcVersion are converted to use this list, but those tests
(rolling upgrade and backwards-5.0) are still not randomized; that will
happen in another followup.
Today the `_field_caps` API doesn't implement its request serialization
correctly since indices and indices options are not serialized at all.
This will likely break with all transport clients etc. and if this request
must be send across the network. This commit fixes this and adds correct
handling if we have only remote indices to prevent the inclusion of
all local indices.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.
Relates #20257
In windows we can't reliable git the pid so we skip the
reindex-from-remote tests from old versions of elasticsearch. This
is OK because we aren't really testing windows here anyway. It isn't
great, but should be safe.
Two of the versions of Elasticsearch we need to run for these tests
can't run in Java 9 so we skip the entire test if we are running in
java 9. For now. I'd like to reenable it to run against java 8 if
there is one available, but that can wait for another time.
Relates to #24561
Adds tests for reindex-from-remote for the latest 2.4, 1.7, and
0.90 releases. 2.4 and 1.7 are fairly popular versions but 0.90
is a point of pride.
This fixes any issues those tests revealed.
Closes#23828Closes#24520
This commit renames ScriptEngineService to ScriptEngine. It is often
confusing because we have the ScriptService, and then
ScriptEngineService implementations, but the latter are not services as
we see in other places in elasticsearch.
File scripts have 2 related settings: the path of file scripts, and
whether they can be dynamically reloaded. This commit deprecates those
settings.
relates #21798
Today we rely on background syncs to relay the global checkpoint under
the mandate of the primary to its replicas. This means that the global
checkpoint on a replica can lag far behind the primary. The commit moves
to inlining global checkpoints with replication requests. When a
replication operation is performed, the primary will send the latest
global checkpoint inline with the replica requests. This keeps the
replicas closer in-sync with the primary.
However, consider a replication request that is not followed by another
replication request for an indefinite period of time. When the replicas
respond to the primary with their local checkpoint, the primary will
advance its global checkpoint. During this indefinite period of time,
the replicas will not be notified of the advanced global
checkpoint. This necessitates a need for another sync. To achieve this,
we perform a global checkpoint sync when a shard falls idle.
Relates #24513
Previously, Mustache would call `toString` on the `_ingest.timestamp`
field and return a date format that did not match Elasticsearch's
defaults for date-mapping parsing. The new ZonedDateTime class in Java 8
happens to do format itself in the same way ES is expecting.
This commit adds support for a feature flag that enables the usage of this new date format
that has more native behavior.
Fixes#23168.
This new fix can be found in the form of a cluster setting called
`ingest.new_date_format`. By default, in 5.x, the existing behavior
will remain the same. One will set this property to `true` in order to
take advantage of this update for ingest-pipeline convenience.
When installing plugin permissions, we try to set the permissions on all
installed files ourselves because a umask from the user could violate
everything needed to get the permissions right. Sadly, directories were
not handled correctly at all and so we were still left with broken
installations with umasks like 0077. This commit fixes this issue, adds
a thorough unit test for the situation, and most importantly, adds a
test that sets the umask before installing the plugin.
Relates #24527
The license header in this file was after and on the same line as the
package statement. This commit moves the package statement to be after
the license header.
Netty uses the number of processors for sizing various resources (e.g.,
thread pools, buffer pools, etc.). However, it uses the runtime number
of available processors which might not match the configured number of
processors as set in Elasticsearch to limit the number of threads (for
example, in Docker containers). A new feature was added to Netty that
enables configuring the number of processors Netty should see for sizing
this various resources. This commit takes advantage of this feature to
set this number of available processors to be equal to the configured
number of processors set in Elasticsearch.
Relates #24420
The tribe service can take a while to initialize, depending on how many cluster it needs to connect to. This change moves writing the ports file used by tests to before the tribe service is started.
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
The bats tests are descructive and must be run as root. This is a
horrible combination on any sane system but perfectly fine to do
in a VM. This change modifies the tests so they revuse to start
unless they are in an environment with an `/etc/is_vagrant_vm`
file. The Vagrantfile creates it on startup.
Closes#24137
If the user explicitly configured path.data to include
default.path.data, then we should not fail the node if we find indices
in default.path.data. This commit addresses this.
Relates #24285
In #24251 we fix an issue with stored search templates that
this test would have discovered: stored search templates cause
the node to refuse to start. Technically a "restart" test would
have caught this as well and would have caught it more quickly.
But we already *have* an upgrade test and we don't have restart tests.
And testing this on upgrade is a good thing too.
It seems that Wildfly 10 can not be made to start in a fully-functional
form on JDK 9, so this commit skips running the Wildfly integration
tests on JDK 9.
An important use case for our users is deploying our clients inside of
applications containers like Wildly. Sometimes, we make changes that
unintentionally break this use case. We need to know before we ship a
release that we have broken such use cases. As Wildfly is one of the
bigger application containers, this commit starts by adding an
integration test that deploys an application using the transport client
to Wildfly and ensures that all is well. Future work can add similar
integration tests for the low-level and high-level REST clients.
Relates #24147
The plugin cli currently resides inside the elasticsearch jar. This
commit moves it into a plugin-cli jar. This is change alone is a no-op;
it does not change anything about what is loaded at runtime. But it will
allow easier testing (with fixtures in the future to test ES or maven
installation), as well as eventually not loading these classes when
starting elasticsearch.
This leniency was left in after plugin installer refactoring for 2.0
because some tests still relied on it. However, the need for this
leniency no longer exists.
* Replicate write failures
Currently, when a primary write operation fails after generating
a sequence number, the failure is not communicated to the replicas.
Ideally, every operation which generates a sequence number on primary
should be recorded in all replicas.
In this change, a sequence number is associated with write operation
failure. When a failure with an assinged seqence number arrives at a
replica, the failure cause and sequence number is recorded in the translog
and the sequence number is marked as completed via executing `Engine.noOp`
on the replica engine.
* use zlong to serialize seq_no
* Incorporate feedback
* track write failures in translog as a noop in primary
* Add tests for replicating write failures.
Test that document failure (w/ seq no generated) are recorded
as no-op in the translog for primary and replica shards
* Update to master
* update shouldExecuteOnReplica comment
* rename indexshard noop to markSeqNoAsNoOp
* remove redundant conditional
* Consolidate possible replica action for bulk item request
depanding on it's primary execution
* remove bulk shard result abstraction
* fix failure handling logic for bwc
* add more tests
* minor fix
* cleanup
* incorporate feedback
* incorporate feedback
* add assert to remove handling noop primary response when 5.0 nodes are not supported
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
After splitting integ tests into cluster configuration and the test
runner task, we still have dependencies of the test runner added as deps
of the cluster. This commit adds dependencies directly to the cluster,
so that the runner can have other dependencies independent of what is
needed for the cluster.
We have some packaging tests where we use a custom jvm.options file. The
flags we use here are barebones, just enough to exercise that using a
custom jvm.options files actually works. Due to this, we are missing a
Log4j flag that prevents Log4j from trying to use JMX. If Log4j tries to
use JMX, it hits a security manager exception and tries to log
this. This attempt to log happens before we've configured
logging. Previously, Elasticsearch was lenient here so this was treated
as harmless and the test could march on. Now, we fail startup if we
detect an attempt to log before logging is configured so this prevents
Elasticsearch from starting if we do not have jvm.options files in place
that prevent these log messages from being written before logging is
being configured. This commit adds jvm.options files in the places need
to prevent this.
These tests were marked as awaits fix due to JNA requiring a version of
glibc greater than or equal to version 2.14. Since we still support
systems that would not have this version, we have released our own JNA
dependency that is built to support earlier versions of glibc. This
commit removes some await fixes that were added to tests that failed as
a result of this situation.
It can easily happen that we touch a logger before logging is configured
due to chains of static intializers and other such scenarios. This
commit adds detection for this mechanism that will fail startup if we
touch a logger before logging is configured. This is a bug that will
cause builds to fail.
Relates #24076
This is related to #23893. This commit allows users to use wilcards for
cluster names when executing a cross cluster search.
So instead of defining every cluster such as:
GET one:*,two:*,three:*/_search
A user could just search:
GET *:*/_search
As ":" characters are currently allowed in index names, if the text
up to the first ":" does not match a defined cluster name, the entire
string is treated as an index name.
All our actions that are invoked from rest actions have corresponding
transport actions. This adds the transport action for RestRemoteClusterInfoAction
for consistency.
Relates to #23969
This commit fixes the handling of POSIX permissions on Windows in the
spawner tests. Since POSIX permissions do not exist there, we first have
to check if we are on a filesystem that supports POSIX or not before
attempting to set the permissions.
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
This commit adds a single node discovery type. With this discovery type,
a node will elect itself as master and never form a cluster with another
node.
Relates #23595
We currently have the last minor version of the previous major hardcoded
in tests like rolling upgrade. This change programatically finds this
during gradle initialization by parsing versions from Version.java.
These tests were disabled due to an issue in Netty which has since been
resolved and integrated into Elasticsearch.
Relates elastic/elasticsearch#20260
The current rest backcompat tests, which run against a mixed cluster of
5.x and 6.0 nodes, depend on snapshot builds of 5.x. However, this has
the potential for inconsistency that results in CI failures, and happens
quite often, whenever some backcompat logic is added to 5.x, but the bwc
test on master fails because the 5.x code has not yet been published as
a snapshot.
This change creates a git clone of the 5.x branch,
builds the zip distribution, and ties that into gradle substitutions for
the 5.x version.
We currently use POSIX exit codes in all of our CLIs. However, posix
only suggests these exit codes are standard across tools. It does not
prescribe particular uses for codes outside of that range. This commit
adds 2 exit codes specific to plugin installation to make distinguishing
an incorrectly built plugin and a plugin already existing easier.
closes#15295
This commit catches the underlying failure when trying to list plugin
information when a plugin is incompatible with the current version of
elasticsearch. This could happen when elasticsearch is upgraded but old
plugins still exist. With this change, all plugins will be output,
instead of failing at the first out of date plugin.
closes#20691