Now that we generate the versions list from Versions.java, we can
drop the list of versions maintained for vagrant testing. One nice
thing the vagrant testing did was check whether the list of
versions was out of date. This moves that check into the core
project.
This commit changes the rolling upgrade test to create a set of rest
test tasks per wire compat version. The most recent wire compat version
is always tested with the `integTest` task, and all versions can be
tested with `bwcTest`.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used for YAML files is
consistent across all of Elasticsearch.
This is almost exclusively for docs tests, which frequently match the
entire response. This allows something like:
```
- set: {nodes.$master.http.publish_address: host}
- match:
    $body:
      {
        "nodes": {
          $host: {
            ... stuff in here ...
          }
        }
      }
```
This should make it possible for the docs tests to work with
unpredictable keys.
The disruption tests sit in a single test suite, which forces these tests
to run single-threaded. We can split this test suite into multiple suites
(logically, of course), enabling them to run in parallel and reducing the
total run time of all integration tests in core. This commit splits the
discovery-with-service-disruptions test suite into three suites:
- master disruptions
- discovery disruptions
- cluster disruptions
The last one could probably be better named; it is meant to represent
performing actions in the cluster (indexing, failing a shard, etc.)
while a disruption is taking place.
Relates #24662
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field, but as a first step this change only moves the `has_child` and `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core, but they are deployed by default as a module.
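For illustration only, a minimal sketch of how a consumer might build one of these queries through the module; the `JoinQueryBuilders` helper and its signature here are assumptions, not something this change documents:
```
// Hypothetical sketch; JoinQueryBuilders and its method signature are assumptions.
import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.join.query.JoinQueryBuilders;

public class HasChildExample {
    // Matches parent documents that have at least one "comment" child with the given tag.
    public static QueryBuilder childrenWithTag(String tag) {
        return JoinQueryBuilders.hasChildQuery(
            "comment",                            // child type
            QueryBuilders.termQuery("tag", tag),  // query run against the children
            ScoreMode.None);                      // do not use child scores
    }
}
```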
Relates #20257
This allows other plugins to use a client to call the functionality
that is in the core modules without duplicating the logic.
Plugins can now safely send the request and response classes via the
client even if the requests are executed locally. All relevant classes
are loaded by the core classloader such that plugins can share them.
This re-adds the commit that was reverted in 952feb58e4.
This allows other plugins to use a client to call the functionality
that is in the core modules without duplicating the logic.
Plugins can now safely send the request and response classes via the
client even if the requests are executed locally. All relevant classes
are loaded by the core classloader such that plugins can share them.
This starts breaking up the `UpdateHelper.prepare` method so that each piece can
be individually unit tested. No actual functionality has changed.
Note, however, that I added a TODO about `ctx.op` leniency, which I'd love to
remove in a separate PR if desired.
Previously you could set the system property tests.es.path.data and
start the run task with a custom data directory. A change in core to
detect duplicate settings broke this. That change should stay, but we
can work around this easily by only setting path.data to the data
directory if tests.es.path.data is not set.
We start the test JVMs with various options. There are two that we
should remove, as they do not make sense:
- we require Java 8 yet there was a conditional build option for Java 7
- we do not set MaxDirectMemorySize in our default JVM options, we
should not in the test JVMs either
Relates #24501
Use the fedora-25 Vagrant box in VagrantTestPlugin, which was missing from
9a3ab3e800 and caused packaging test failures.
Additionally, update TESTING.asciidoc.
* Fix JVM test argline
The argline was being overridden by '-XX:-OmitStackTraceInFastThrow',
which led to failures in tests that expected the JVM to be in a certain
state based on the value of tests.jvm.argline; those arguments were never
passed to the JVM. Additionally, we need to respect the provided JVM
argline if it already contains a flag for OmitStackTraceInFastThrow. This
commit fixes this by only setting OmitStackTraceInFastThrow if it is not
already set.
* Add comment
* Fix comment
* More elaborate comment
* Sigh
TransportService and RemoteClusterService are already closely coupled today,
and to simplify remote cluster integration down the road, RemoteClusterService
can be a direct dependency of TransportService. This change moves
RemoteClusterService into TransportService with the goal of making it a hidden
implementation detail of TransportService in follow-up changes.
This adds `-XX:-OmitStackTraceInFastThrow` to the JVM arguments,
which *should* prevent the JVM from omitting stack traces at
common exception sites. Even though these sites are common, we'd
still like the full exceptions in order to debug them.
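As an illustration of the JVM behavior the flag disables (not code from this change), a hot exception site eventually starts throwing preallocated exceptions with empty stack traces when OmitStackTraceInFastThrow is left enabled:
```
// Run with and without -XX:-OmitStackTraceInFastThrow to see the difference.
public class FastThrowDemo {
    public static void main(String[] args) {
        Object o = null;
        for (int i = 0; i < 1_000_000; i++) {
            try {
                o.hashCode(); // implicit null check throws an NPE on every iteration
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    // the JIT has replaced the throw with a preallocated, trace-less NPE
                    System.out.println("stack trace omitted after " + i + " throws");
                    return;
                }
            }
        }
        System.out.println("stack traces were kept for every throw");
    }
}
```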
This also adds the flag when running tests and adapts some tests
that had workarounds for the absence of the flag.
Closes #24376
This commit removes a few duplicates from the list of checkstyle
suppressions that accumulated while auto-generating the list of
suppressions based on the files that currently exceed the 140-column
limit.
The list of checkstyle suppressions was auto-generated based on the list
of files that currently violate the 140-column limit. An errant entry
was committed that did no harm, but this commit removes it.
We are back to having a 140-column limit. While at some point we will
apply auto-formatting tools to the code base, that is a down-the-road
thing and adjusting the suppressions now shaves minutes off the build
time.
Relates #24408
Separates cluster state publishing from applying cluster states:
- ClusterService is split into two classes MasterService and ClusterApplierService. MasterService has the responsibility to calculate cluster state updates for actions that want to change the cluster state (create index, update shard routing table, etc.). ClusterApplierService has the responsibility to apply cluster states that have been successfully published and invokes the cluster state appliers and listeners.
- ClusterApplierService keeps track of the last applied state, but MasterService is stateless and uses the last cluster state that is provided by the discovery module to calculate the next prospective state. The ClusterService class is still kept around, which now just delegates actions to ClusterApplierService and MasterService.
- The discovery implementation is now responsible for managing the last cluster state that is used by the consensus layer and the master service. It also exposes the initial cluster state which is used by the ClusterApplierService. The discovery implementation is also responsible for adding the right cluster-level blocks to the initial state.
- NoneDiscovery has been renamed to TribeDiscovery as it is exclusively used by TribeService. It adds the tribe blocks to the initial state.
- ZenDiscovery is synchronized on state changes to the last cluster state that is used by the consensus layer and the master service, and no longer submits cluster state update tasks to make changes to the discovery state (except when becoming master).
Control flow for cluster state updates is now as follows:
- State updates are sent to MasterService
- MasterService gets the latest committed cluster state from the discovery implementation and calculates the next cluster state to publish
- MasterService submits the new prospective cluster state to the discovery implementation for publishing
- Discovery implementation publishes cluster states to all nodes and, once the state is committed, asks the ClusterApplierService to apply the newly committed state.
- ClusterApplierService applies state to local node.
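A purely illustrative sketch of that flow follows; the interfaces below are simplified stand-ins and deliberately not the real MasterService/ClusterApplierService signatures:
```
// Simplified stand-ins for illustration only; not the actual Elasticsearch classes.
import java.util.function.UnaryOperator;

final class ClusterState {}

interface Discovery {
    ClusterState committedState();                              // latest committed cluster state
    void publish(ClusterState newState, Runnable onCommitted);  // publish, then signal commit
}

interface ClusterApplier {
    void applyNewClusterState(ClusterState committedState);     // apply on the local node
}

class MasterServiceSketch {
    private final Discovery discovery;
    private final ClusterApplier applier;

    MasterServiceSketch(Discovery discovery, ClusterApplier applier) {
        this.discovery = discovery;
        this.applier = applier;
    }

    void submitStateUpdate(UnaryOperator<ClusterState> task) {
        ClusterState current = discovery.committedState();       // stateless: ask discovery for the latest state
        ClusterState prospective = task.apply(current);           // compute the next prospective state
        discovery.publish(prospective,                            // discovery publishes and commits
            () -> applier.applyNewClusterState(prospective));     // then the applier applies it locally
    }
}
```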
Creates a new task `namingConventionsMain`, that runs on the
`buildSrc` and `test:framework` projects and fails the build if
any of the classes in the main artifacts are named like tests or
are non-abstract subclasses of ESTestCase.
It also fixes the three tests that would cause it to fail.
Start moving built-in analysis components into the new analysis-common
module. The goals of this project are:
1. Remove core's dependency on lucene-analyzers-common.jar, which should
shrink the dependencies for the transport client and the high-level REST client.
2. Prove that analysis plugins can do all the "built in" things by moving all
"built in" behavior to a plugin.
3. Force tests not to depend on any oddball analyzer behavior. If tests
need anything more than the standard analyzer they can use the mock
analyzer provided by Lucene's test infrastructure.
This commit adds a call to jstack to see where each node is stuck when
starting up, if a timeout occurs. This also decreases the timeout back
to 30 seconds.
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now a function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
After splitting integ tests into cluster configuration and the test
runner task, we still have dependencies of the test runner added as deps
of the cluster. This commit adds dependencies directly to the cluster,
so that the runner can have other dependencies independent of what is
needed for the cluster.
This new version of jna is rebuilt from the official release of jna, but
with native libs linked against an older glibc in order to support all
platforms Elasticsearch supports.
Closes #23640
This reverts the line limit change in #23623. This PR doesn't touch the suppression file, since we are moving towards automatic code formatting, which makes it mainly obsolete.
Adds the option for a plugin to specify extra directories containing notices
and licenses files to be incorporated into the overall notices file that is
generated for the plugin.
This can be useful, for example, where the plugin has a non-Java dependency
that itself incorporates many 3rd party components.
The buffer limit should have been configurable already, but the factory constructor was package private, so it was truly configurable only from the org.elasticsearch.client package. Also, the HttpAsyncResponseConsumerFactory interface was package private, so it could only be implemented from within the org.elasticsearch.client package.
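For example, a consumer outside the org.elasticsearch.client package should now be able to do something like the following; the exact `performRequest` overload shown is an assumption about this era of the low-level REST client:
```
import java.util.Collections;

import org.apache.http.HttpHost;
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory;
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class BufferLimitExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Raise the heap buffer limit for large responses (the default is 100MB).
            HttpAsyncResponseConsumerFactory factory =
                new HeapBufferedResponseConsumerFactory(500 * 1024 * 1024);
            Response response = client.performRequest(
                "GET", "/_search", Collections.emptyMap(), null, factory);
            System.out.println(response.getStatusLine());
        }
    }
}
```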
Closes #23958
This commit upgrades the Log4j dependencies from version 2.7 to version
2.8.2. This release includes a fix for a case where Log4j could lose
exceptions in the presence of a security manager.
Relates #23995
This commit puts all the classes in the repository-s3 plugin into a
single package. In addition to simplifying the plugin, it will make it
easier to test, as things that should be package private will not be
difficult to use inside tests.
The method Boolean#getBoolean is dangerous. It is too easy to mistakenly
invoke this method thinking that it is parsing a string as a
boolean. However, what it actually does is get a system property with
the specified string, and then attempt to use the usual crappy boolean
parsing in the JDK to parse that system property as a boolean with
complete leniency (it parses every input value into either true or
false); that is, this method amounts to invoking
Boolean#parseBoolean(String) on the result of
System#getProperty(String). Boo. This commit bans usage of this method.
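A small illustration of the trap (the `es.enabled` property name is just for the example):
```
public class GetBooleanTrap {
    public static void main(String[] args) {
        // Boolean.getBoolean reads a system property; it does not parse its argument.
        System.out.println(Boolean.getBoolean("true"));        // false: no system property named "true"
        System.out.println(Boolean.parseBoolean("true"));      // true: actually parses the string

        System.setProperty("es.enabled", "yes");
        System.out.println(Boolean.getBoolean("es.enabled"));  // false: "yes" is leniently parsed as false
        // ... which is exactly this, under the hood:
        System.out.println(Boolean.parseBoolean(System.getProperty("es.enabled"))); // false
    }
}
```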
Relates #23864
Now that we are on gradle 3.3, we can take advantage of a fix that was
made in 2.14 which properly handles disabling transitive dependencies in
pom generation. As it was, we actually ended up generating two
exclusions sections in the generated pom. This is yet another example of
why we need validation on the pom files with our generation here, but I
leave that for another day because I still don't know a good way to do
it.
This moves `updateReplicaRequest` to `createPrimaryResponse` and separates the
translog updating into its own function so that each function's purpose is more
easily understood (and testable).
It also separates the logic for `MappingUpdatePerformer` into two functions,
`updateMappingsIfNeeded` and `verifyMappings` so they don't do too much in a
single function. This allows finer-grained error testing for when a mapping
fails to parse or be applied.
Finally, it separates parsing and version validation for
`executeIndexRequestOnReplica` into a separate
method (`prepareIndexOperationOnReplica`) and adds a test for it.
Relates to #23359
This commit adds a build listener to the integ test runner which will
print out an excerpt of the logs from the integ test cluster if the test
run fails. There are future improvements that can be made (for example,
to dump excerpts from dependent clusters like in tribe node tests), but
this is a start, and it would help with simple rest test failures where we
currently don't know what actually happened on the node.
This will allow us to get rid of deprecation warnings that appear when
using 3.3, and also get rid of extra logic for 2.13 required because of
the progress logger.
Search took time uses an absolute clock to measure elapsed time, and
then tries to deal with the complexities of using an absolute clock for
this purpose. Instead, we should use a high-precision monotonic relative
clock that is designed exactly for measuring elapsed time. This commit
modifies the search infrastructure to use a relative clock for measuring
took time, but still provides an absolute clock for the components of
search that require a real clock (e.g., index name expression
resolution, etc.).
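For illustration (not code from this change), the difference between the two kinds of clocks in plain Java:
```
import java.util.concurrent.TimeUnit;

public class TookTimeExample {
    public static void main(String[] args) throws InterruptedException {
        // Relative, monotonic clock: only ever moves forward, designed for measuring elapsed time.
        long start = System.nanoTime();
        TimeUnit.MILLISECONDS.sleep(25); // stand-in for running the search phases
        long tookMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("took " + tookMillis + "ms");

        // Absolute wall clock: can jump with NTP adjustments, but is the right tool
        // when a real point in time is needed, e.g. index name expression resolution.
        System.out.println("wall clock: " + System.currentTimeMillis());
    }
}
```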
Relates #23662
This commit creates a keystore and adds settings to it during the
cluster formation for integration tests. Users can define a
`keyStoreSetting` in build files for settings that need to be placed in
the keystore.
This commit moves the checkstyle rule of max line length from 140
characters to 100 characters. We whitelist all existing violations and
will address them in follow-ups.
Relates #23623
In Gradle 3.4, the buildSrc plugin seems to be packaged into a jar before it is accessed by the rest of the build, and the signatures file for the third-party audit task can no longer be accessed, as getClass().getResource('/forbidden/third-party-audit.txt') then points to a file entry inside a JAR, which cannot be loaded directly as a File object. This commit changes the third-party audit task to pass the content of the signatures file as a String instead.
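The underlying idea, sketched here in Java rather than the actual Groovy build code, is to read the classpath resource as a stream and hand its content around as a String instead of assuming it maps to a filesystem path:
```
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SignaturesLoader {
    public static String loadSignatures() {
        // Works whether the resource lives in a directory or inside a packaged jar.
        try (InputStream in = SignaturesLoader.class.getResourceAsStream("/forbidden/third-party-audit.txt")) {
            if (in == null) {
                throw new IllegalStateException("signatures resource not found");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```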
While trying to improve the failure output in #23547, the stderr was
also captured from jrunscript. This was under the assumption that stderr
is only written to in case of an error. However, with java 9, when
JAVA_TOOL_OPTIONS are set, they are output to stderr. And our CI sets
JAVA_TOOL_OPTIONS for some reason. This commit fixes the jrunscript call
to use a separate buffer for stderr.
A previous change to the multi-search request execution to avoid stack
overflows regressed on limiting the number of concurrent search requests
from a batched multi-search request. In particular, the replacement of
the tail-recursive call with a loop could asynchronously fire off all of
the remaining search requests in the batch while the maximum number of
concurrent search requests were already executing. This commit attempts
to address this issue by taking a more careful approach to the initial
problem of recursive calls. The cause of the initial problem was the
possibility of individual requests completing on the same thread that
invoked the search action execution. This can happen, for example, in
cases when an individual request does not resolve to any shards. To
address this problem, when an individual request completes we check if it
completed on the same thread that fired off the request. In that case we
loop, and otherwise we safely recurse. Sadly, there was a unit test to
check that the maximum number of concurrent search requests was not
exceeded, but that test was broken while modifying the test to reproduce
a case that led to the possibility of stack overflow. As such, we
randomize whether or not search actions execute on the same thread as the
thread that invoked the action.
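An illustrative sketch of the loop-or-recurse decision described above; the names and structure are simplified stand-ins, not the actual TransportMultiSearchAction code, and only a single chain of requests is shown:
```
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class SameThreadRecursionSketch {

    interface SearchExecutor {
        // May invoke the listener on the calling thread (e.g. the request resolves to
        // zero shards) or later on a different thread.
        void execute(String request, Consumer<String> listener);
    }

    static void executeNext(Queue<String> pending, SearchExecutor executor) {
        String request;
        while ((request = pending.poll()) != null) {
            final Thread callingThread = Thread.currentThread();
            final AtomicBoolean completedOnCallingThread = new AtomicBoolean(false);
            executor.execute(request, response -> {
                if (Thread.currentThread() == callingThread) {
                    // Completed synchronously: let the loop below fire the next request
                    // instead of recursing and growing the stack.
                    completedOnCallingThread.set(true);
                } else {
                    // Completed on a different thread: the stack is fresh, so it is safe
                    // to recurse and continue draining the queue from here.
                    executeNext(pending, executor);
                }
            });
            if (completedOnCallingThread.get() == false) {
                // The request went asynchronous; its listener continues the chain.
                return;
            }
        }
    }
}
```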
Relates #23538
This commit improves the output when jrunscript fails to include the
full output of the command. It also makes the quoting that is needed for
windows only happen on windows (which worked on java 8, but for some
reason does not work with java 9)
This commit upgrades to the newest version of randomized runner. There
is a new additional check that allows ensuring the working directory
for each child jvm is empty. By default, this check will fail the test
run. However, for Elasticsearch, we default to wiping the directory. For
example, if you previously told the runner not to wipe the directory in
order to investigate a failure, the wipe option will delete this data
upon re-running the test.
While the esplugin extension already had an input for the base notice
file of the plugin, the NoticeTask did not actually know how to use
that, and always used the base notice file from Elasticsearch.
This commit adds an `ignoreSha` configuration to the `dependencyLicense`
task, which allows skipping the sha check for a given dependency jar.
This is useful for locally built jars, which will constantly change.
Adds a common base class for testing subclasses of
`InternalSingleBucketAggregation`. They are so similar they
call into question the utility of having all of these classes.
We maybe could just use `InternalSingleBucketAggregation` in
all those cases.... But for now, let's test the classes!
Relates to #22278
This commit changes the build hash to be the string "Unknown" when for
some reason this build hash is not available. This aligns the value with
the value we use when the hash is not available from the jar
manifest. This situation can occur when running tests from a worktree
which is not currently handled correctly by JGit, the upstream
dependency that is used to acquire the hash. This causes problems when
running tests locally because the warning header pattern expects a hash
or the string "Unknown". While the warning header pattern could be changed
to allow "N/A" as well, it seemed more sensible to align the value here
with the value used when the hash is not available from the jar manifest.
Relates #23421
This change adds a new test task, platformTest, which runs `gradle test
integTest` within a vagrant host. This will allow us to still test on
all the supported platforms, but be able to standardize on the tools
provided in the host system, for example, with a modern version of git
that can allow #22946.
In order to have sufficient memory and cpu to run the tests, the
vagrantfile has been updated to use 8GB and 4 cpus by default. This can
be customized with the `VAGRANT_MEMORY` and `VAGRANT_CPUS` environment
variables. Also, to save time and show this can work, it currently uses
the same Vagrantfile the packaging tests do. There are a lot of cleanups
we can do to how the gradle-vagrant tasks work, including generating the
Vagrantfile altogether, but I think this is fine for now, as the same
machines that would run platformTest run packagingTest, and they are
relatively beefy machines, so the higher memory and cpu, with either
task, should not be an issue.