Now that debian is disabled, we are seeing similar failures with fedora
not able to install java. This commit temporarily disables fedora until
it is once again stable.
This changes the way that replica failures are handled such that not all
failures will cause the replica shard to be failed or marked as stale.
In some cases such as refresh operations, or global checkpoint syncs, it is
"okay" for the operation to fail without the shard being failed (because no data
is out of sync). In these cases, instead of failing the shard we should simply
fail the operation, and, in the event it is a user-facing operation, return a
5xx response code including the shard-specific failures.
This was accomplished by having two forms of the `Replicas` proxy, one that is
for non-write operations that does not fail the shard, and one that is for write
operations that will fail the shard when an operation fails.
Relates to #10708
Debian 8 has been having issues with the openjdk package dependencies
being broken. This comment comments out debian-8 from the boxes which
packaging tests will run on CI.
This is related to #22116. Core no longer needs `SocketPermission`
`connect`.
This permission is relegated to these modules/plugins:
- transport-netty4 module
- reindex module
- repository-url module
- discovery-azure-classic plugin
- discovery-ec2 plugin
- discovery-gce plugin
- repository-azure plugin
- repository-gcs plugin
- repository-hdfs plugin
- repository-s3 plugin
And for tests:
- mocksocket jar
- rest client
- httpcore-nio jar
- httpasyncclient jar
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
This change switches to using jrunscript, instead of jjs, for detecting
version properties of java, which is available on java versions prior to 8.
closes#22898
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
This commit introduces sequence-number-based recovery. When a replica
has fallen out of sync, rather than performing a file-based recovery we
first attempt to replay operations since the last local checkpoint on
the replica. To do this, at the start of recovery the replica tells the
primary what its local checkpoint is. The primary will then wait for all
operations between that local checkpoint and the current maximum
sequence number to complete; this is to ensure that there are no gaps in
the operations that will be replayed from the primary to the
replica. This is a best-effort attempt as we currently have no
guarantees on the primary that these operations will be available; if we
are not able to replay all operations in the desired range, we just
fallback to file-based recovery. Later work will strengthen the
guarantees.
Relates #22484
This is related to #22116. URLRepository requires SocketPermission
connect. This commit introduces a new module called "repository-url"
where URLRepository will reside. With the new module, permissions can
be removed from core.
Add unit tests for `TopHitsAggregator` and convert some snippets in
docs for `top_hits` aggregation to `// CONSOLE`.
Relates to #22278
Relates to #18160
These files should have been removed in an earlier commit. This commit also simplifies usage of ProgressLoggerWrapper by using the Groovy delegation instead of using explicit delegation.
move "es." internal headers to separate metadata set in ElasticsearchException and stop returning them as response headers
Closes#17593
* [TEST] remove ESExceptionTests, move its methods to ElasticsearchExceptionTests or ExceptionSerializationTests
Instead of using Gradle-version specific compilation options, use distinct source sets. This also allows compilation of buildSrc/build-tools under IDEs that
don't understand the version-specific compilation options.
Relates to #22669
This changes build files so that building Elasticsearch works with both Gradle 2.13 as well as higher versions of Gradle (tested 2.14 and 3.3), enabling a smooth transition from Gradle 2.13 to 3.x.
* Upgrade to Lucene 6.4.0
`ValueSource`s are now converted to `DoubleValueSource`s using the Lucene adapter made for the migration to the new API in 6.4.0.
* S3 repository: Deprecate specifying credentials through env vars and sys props
This is a follow up to #22479, where storing credentials secure way was
added.
This commit adds a MessyRestTestPlugin to the gradle build. It extends
StandaloneRestTestPlugin. The main piece of functionality that it adds
is to copy plugin-metadata from dependencies into the
generated-resources for the current test source. This is necessary to
ensure that permissions for dependencies are applied when running the
tests.
A current limitation is that the permissions are applied differently
than in the distribution sources. When permissions are granted to all
depedencies for a module or plugin, the permissions are granted to all
dependencies on the classpath for tests besides a few hardcoded
exclusions:
- es core
- es test framework
- lucene test framework
- randomized runner
- junit library
This changes build files so that building Elasticsearch works with both Gradle 2.13 as well as higher versions of Gradle (tested 2.14 and 3.3), enabling a smooth transition from Gradle 2.13 to 3.x.
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
Changes the error message when `action.auto_create_index` or
`index.mapper.dynamic` forbids automatic creation of an index
from `no such index` to one of:
* `no such index and [action.auto_create_index] is [false]`
* `no such index and [index.mapper.dynamic] is [false]`
* `no such index and [action.auto_create_index] contains [-<pattern>] which forbids automatic creation of the index`
* `no such index and [action.auto_create_index] ([all patterns]) doesn't match`
This should make it more clear *why* there is `no such index`.
Closes#22435
Today we have quite some abstractions that are essentially providing a simple
dispatch method to the plugins defining a `HttpServerTransport`. This commit
removes `HttpServer` and `HttpServerAdaptor` and introduces a simple `Dispatcher` functional
interface that delegate to `RestController` by default.
Relates to #18482
The IndexingOperationListener interface did not provide any
information about the shard id when a document was indexed.
This commit adds the shard id as the first parameter to all methods
in the IndexingOperationListener.
It is no longer needed. It used to contain a lot of strings
used by serialization but those have since been removed. Now
it is just another thing to pass around that we don't really
need.
Currently, such tasks are only created for default boxes (centos-7, ubuntu-1404) and not all boxes and this can be misleading for developers who want to debug testing scripts on non-default boxes.
The RestHighLevelClient class takes as as an argument a low level client instance RestClient. The first method added is ping, which returns true if the call to HEAD / went ok and false if an IOException was thrown. Any other exception gets bubbled up.
There are two kinds of tests, a unit test (RestHighLevelClientTests) that verifies the interaction between high level and low level client, and an integration test (MainActionIT) which relies on an externally started es cluster to send requests to.
Randomized runner uses a flag, tests.asserts, which we have previously
not used, but is used in lucene for disabling assertions. This change
modifies the gradle configuration to look for this flag and pass through
to the test runner to determine whether -ea and -esa are added to the
java commandline for tests.
This integrates the mocksocket jar with elasticsearch tests. Mocksocket wraps actions requiring SocketPermissions in doPrivilege blocks. This will eventually allow SocketPermissions to be assigned to the mocksocket jar opposed to the entire elasticsearch codebase.
This commit removes a leftover checkstyle suppression for a source file
that was temporarily forked into the codebase to hack around a bug in
Log4j. When that source file was removed, the suppression was left
behind.
The backwards compatibility tests rely on gradle's built-in mechanisms for resolving dependencies
to get the zip of the older version we test against. By default, this will cache snapshots for
24 hours, which can lead to unexpected failures in CI. This change makes the special configurations
for backwards compatibility always update their snapshots by setting the amount of time to cache
to 0 seconds.
This commit adds a test for applying logging levels in hierarchical
order, and addresses an issue with restoring the logging levels at the
end of a test or suite.
When starting a standalone cluster, we do not able assertions. This is
problematic because it means that we miss opportunities to catch
bugs. This commit enables assertions for standalone integration tests,
and fixes a couple bugs that were uncovered by enabling these.
Relates #22334
If we conditionally do random things, e.g. initialize a node only after the first test, we have to make sure that we unconditionally create a new seed calling random.nextLong(), then initialize the node under a private randomness context. This makes sure that any random usage through Randomness.get() will retrieve the proper random instance through RandomizedContext.current().getRandom(). When running under private randomness, the context will return the Random instance that was created with the provided seed (forked from the main random instance) rather than the main Random that's exposed to tests as well. Otherwise tests become non repeatable because that initialization part happens only before the first executed test.
Moved field values `toXContent` logic to `GetField` (from `GetResult`), which outputs its own fields, and can also parse them now. Also added `fromXContent` to `GetResult` and `GetResponse`.
The start object and end object for `GetResponse` output have been moved to `GetResult#toXContent`, from the corresponding rest action. This makes it possible to have `toXContent` and `fromXContent` completely symmetric, as parsing requires looping till an end object is found which is weird when the corresponding `toXContent` doesn't print that out.
This also introduces the foundation for testing retrieval of _source and stored field values.
Sequence BWC logic consists of two elements:
1) Wire level BWC using stream versions.
2) A changed to the global checkpoint maintenance semantics.
For the sequence number infra to work with a mixed version clusters, we have to consider situation where the primary is on an old node and replicas are on new ones (i.e., the replicas will receive operations without seq#) and also the reverse (i.e., the primary sends operations to a replica but the replica can't process the seq# and respond with local checkpoint). An new primary with an old replica is a rare because we do not allow a replica to recover from a new primary. However, it can occur if the old primary failed and a new replica was promoted or during primary relocation where the source primary is treated as a replica until the master starts the target.
1) Old Primary & New Replica - this case is easy as is taken care of by the wire level BWC. All incoming requests will have their seq# set to `UNASSIGNED_SEQ_NO`, which doesn't confuse the local checkpoint logic (keeping it at `NO_OPS_PERFORMED`)
2) New Primary & Old replica - this one is trickier as the global checkpoint service currently takes all in sync replicas into consideration for the global checkpoint calculation. In order to deal with old replicas, we change the semantics to say all *new node* in sync replicas. That means the replicas on old nodes don't count for the global checkpointing. In this state the seq# infra is not fully operational (you can't search on it, because copies may miss it) but it is maintained on shards that can support it. The old replicas will have to go through a file based recovery at some point and will get the seq# information at that point. There is still an edge case where a new primary fails and an old replica takes over. I'lll discuss this one with @ywelsch as I prefer to avoid it completely.
This PR also re-enables the BWC tests which were disabled. As such it had to fix any BWC issue that had crept in. Most notably an issue with the removal of the `timestamp` field in #21670.
The commit also includes a fix for the default value of the seq number field in replicated write requests (it was 0 but should be -2), that surface some other minor bugs which are fixed as well.
Last - I added some debugging tools like more sane node names and forcing replication request to implement a `toString`
If you write a yaml test with a `warnings` section in a `do` block
that doesn't also have a corresponding `skip` section for `warnings`
then client test runners that don't support `warnings` will fail.
This causes the elasticsearch build to fail so we catch these errors
earlier.
Related to #21811
Changes the build to recognize `NORELEASE` as well as `NOCOMMIT` to
mean the same thing as `norelease` and `nocommit` respectively. This
is useful because people have been using them that way but haven't
realized that only the lowercase versions worked.
This also explicitly forbids silly things like `NoReLeAsE` and
`noCOMMIT`, failing the build and telling you to spell them properly.
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output
REST tests use the default OOTB low/high disk watermarks of 85%/90%, which can make some tests fail if run on a machine with a fuller disk. This commit changes the watermarks in the same way as in IntegTestCase so that they're essentially ignored.
Add indices and filter information to search shards api output
The search shards api returns info about which shards are going to be hit by executing a search with provided parameters: indices, routing, preference. Indices can also be aliases, which can also hold filters. The output includes an array of shards and a summary of all the nodes the shards are allocated on. This commit adds a new indices section to the search shards output that includes one entry per index, where each index can be associated with an optional filter in case the index was hit through a filtered alias.
This is relevant since we have moved parsing of alias filters to the coordinating node.
Relates to #20916
Today there is no way to get notified if a node is disconnected. Client code
must poll the TransportClient constantly to detect that a node is not connected
anymore in order to react and add new nodes or notify altering etc. For instance
if a hostname gets resolved to an IP but that host is disconnected clients want
to reconnect by resolving the hostname again which is a common situation in cloud
environments.
Closes#21424
This commit adds the ability to support running with plugins in tests that make use of
backwards compatibility nodes. This can be used to test rolling upgrades with plugins
to ensure they do not cause issues during a rolling upgrade of elasticsearch.
In #21348 the command executed to run the packaging tests has been changed to "sudo -E bats ...", forcing all environment variables from the vagrant user to be passed to the `sudo` command. This breaks a test on opensuse-13 (the one where it checks that elasticsearch cannot be started when `java` is not found) because all the PATH from the user is passed to the sudo command.
This commit restores the previous behavior while allowing only necessary testing environment variables to be passed using a /etc/sudoers.d file.
JDK9 removed pathname canonicalization when constructing FilePermission objects, which breaks some of the FilePermissions added by Elasticsearch. This commit adds the system property jdk.io.permissionsUseCanonicalPath which makes JDK9 behave like JDK8 w.r.t. FilePermission objects (see #21534).
Today when a node starts, we create dynamic socket permissions based on
the configured HTTP ports and transport ports. If no ports are
configured, we use the default port ranges. When a tribe node starts, a
tribe node creates an internal node client for connecting to each remote
cluster. If neither an explicit HTTP port nor transport ports were
specified, the default port ranges are large enough for the tribe node
and its internal node clients. If an explicit HTTP port or transport
port was specified for the tribe node, then socket permissions for those
ports will be created, but not for the internal node clients. Whether
the internal node clients have explicit ports specified, or attempt to
bind within the default range, socket permissions for these will not
have been created and the internal node clients will hit a permissions
issue when attempting to bind. This commit addresses this issue by also
accounting for tribe nodes when creating the dynamic socket
permissions. Additionally, we add our first real integration test for
tribe nodes.
JDK9 removed pathname canonicalization when constructing FilePermission objects, which breaks some of the FilePermissions added by
Elasticsearch. This commit adds the system property jdk.io.permissionsUseCanonicalPath which makes JDK9 behave like JDK8 w.r.t. FilePermissions (see
https://github.com/elastic/elasticsearch/issues/21534).
This commit changes the current :elactisearch:qa:vagrant build file and transforms it into a Gradle plugin in order to reuse it in other projects.
Most of the code from the build.gradle file has been moved into the VagrantTestPlugin class. To avoid duplicated VMs when running vagrant tests, the Gradle plugin sets the following environment variables before running vagrant commands:
VAGRANT_CWD: absolute path to the folder that contains the Vagrantfile
VAGRANT_PROJECT_DIR: absolute path to the Gradle project that use the VagrantTestPlugin
The VAGRANT_PROJECT_DIR is used to share project folders and files with the vagrant VM. These folders and files are exported when running the task `gradle vagrantSetUp` which:
- collects all project archives dependencies and copies them into `${project.buildDir}/bats/archives`
- copy all project bats testing files from 'src/test/resources/packaging/tests' into `${project.buildDir}/bats/tests`
- copy all project bats utils files from 'src/test/resources/packaging/utils' into `${project.buildDir}/bats/utils`
It is also possible to inherit and grab the archives/tests/utils files from project dependencies using the plugin configuration:
apply plugin: 'elasticsearch.vagrant'
esvagrant {
inheritTestUtils true|false
inheritTestArchives true|false
inheritTests true|false
}
dependencies {
// Inherit Bats test utils from :qa:vagrant project
bats project(path: ':qa:vagrant', configuration: 'bats')
}
The folders `${project.buildDir}/bats/archives`, `${project.buildDir}/bats/tests` and `${project.buildDir}/bats/utils` are then exported to the vagrant VMs and mapped to the BATS_ARCHIVES, BATS_TESTS and BATS_UTILS environnement variables.
The following Gradle tasks have also be renamed:
* gradle vagrantSetUp
This task copies all the necessary files to the project build directory (was `prepareTestRoot`)
* gradle vagrantSmokeTest
This task starts the VMs and echoes a "Hello world" within each VM (was: `smokeTest`)
We currently have a lot of log messages in our CI output like
```
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
[thirdPartyAudit] WARNING: The referenced class 'org.noggit.JSONParser' cannot be loaded. Please fix the classpath!
... repeated 1827281 times ...
```
This changes these messages to be logged at the DEBUG level, so they
will not show up by default.
This change simply makes the level of the ant timestamp for waiting on
the integ test cluster echo at the info level instead of warn (the
default) so that it is only output when running with gradle --info, or
when the wait condition fails.
Today we still have a leftover from older percolators where lucene
query instances where created ahead of time and rewritten later.
This `LateParsingQuery` was resolving `now()` when it's really used which we
don't need anymore. As a side-effect this failed to execute some highlighting
queries when they get rewritten since at that point `now` access it not permitted
anymore to prevent bugs when queries get cached.
Closes#21295
Dependencies are currently marked as non-transitive in generated POM files by adding a wildcard (*) exclusion. This breaks compatibility with the dependency manager Apache Ivy as it incorrectly translates POMs with * excludes to Ivy XML with * excludes which results in the main artifact being excluded as well (see https://issues.apache.org/jira/browse/IVY-1531). To stay compatible with the current release of Ivy this commit uses explicit excludes for each transitive artifact instead to ensure that the main artifact is not excluded. This should be revisited when we upgrade Gradle to a higher version as the current one (2.13) as Gradle automatically translates non-transitive dependencies to * excludes in 2.14+.
Since we now validate all consumed request parameter, users can't specify
`_cat/nodes?full_id=true|false` anymore since this parameter is consumed late.
This commit adds a test for this parameter and consumes it before request is processed.
Closes#21266
Setting `discovery.initial_state_timeout: 0s` to make `discovery.zen.minimum_master_nodes: N`
work reliably can cause issues in clusters that rely on state recovery once the cluster is available.
This change makes the use or `discovery.zen.minimum_master_nodes` optional for clusters where this behavior is desirable.
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
Today we only use a single node to send requests to when we run REST tests.
In some cases we have more than one node (ie. in the BWC case) where we should
send requests to all nodes in a round-robin fashion. This change passes all
available node endpoints to the rest test.
Additionally, this change adds the setting of `discovery.zen.minimum_master_nodes`
to the cluster formation forcing the nodes to wait for all other nodes until the cluster
is formed. This allows for a more realistic master election and allows all master eligable
nodes to become master while before always the first node in the cluster became the master.
This also adds logging to each test run to log the master nodes version and the minimum node
version in the cluster to help debugging BWC test failures.
This fixes our cluster formation task to run REST tests against a mixed version cluster.
Yet, due to some limitations in our test framework `indices.rollover` tests are currently
disabled for the BWC case since they select the current master as the merge node which
happens to be a BWC node and we can't relocate all shards to it since the primaries are on
a higher version node. This will be fixed in a followup.
Closes#21142
Note: This has been cherry-picked from 5.0 and fixes several rest tests
as well as a BWC break in `OsStats.java`
Today the request interceptor can't support async calls since the response
of the async call would execute on a different thread ie. a client or listener
thread. This means in-turn that the intercepted handler is not executed with the
thread it was supposed to run and therefor can, if it's executing blocking
operations, potentially deadlock an entire server.
* Move all zen discovery classes into o.e.discovery.zen
This collapses sub packages of zen into zen. These all had just a couple
classes each, and there is really no reason to have the subpackages.
* fix checkstyle
When running `gradle run`, a developer usually intends to get a running
instance as if they had run elasticsearch from the command line. This is
different than the isolated environment we use for integration testing
plugins. This change switches the run task to use the zip distribution,
so that all modules included in the normal distribution are included.
Cleaning up a few remaining occurences of using junits ExpectedException rule in
favor of using LuceneTestCase#expectThrows() which is more concise and versatile.
This change adds a overloaded `XContentMapValues#filter` method that returns
a function enclosing the compiled automatons that can be reused across filter
calls. This for instance prevents compiling automatons over and over again when
hits are filtered or in the SourceFieldMapper for each document.
Closes#20839
Settings updates are important to be able to help and administer a cluster in distress. We shouldn't block it due to circuit breakers. An extreme example is where we are actually trying to increase and unreasonable low setting for the circuit breaker itself.
See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/242/
Instead provide services where they are needed. The class worked
well as a temporary measure to easy removal of guice from the index
level but now we can remove it entirely.
-1 @Inject annotation
This commit upgrades the Log4j 2 dependency to version 2.7 and removes
some hacks that we had in place to work around bugs in Log4j 2 version
2.6.2.
Relates #20805