Duplicate data paths already fail to work because we would attempt to
take out a node lock on the directory a second time which will fail
after the first lock attempt succeeds. However, how this failure
manifests is not apparent at all and is quite difficult to
debug. Instead, we should explicitly reject duplicate data paths to make
the failure cause more obvious.
Relates #25178
We have a custom logger implementation known as a prefix logger that is
used to write every message by the logger with a given prefix. This is
useful for node-level, index-level, and shard-level messages where we
want to log the node name, index name, and shard ID, respectively, if
possible. The mechanism that we employ is that of a marker. Log4j has a
built-in facility for managing these markers, but its effectively a
memory leak because these markers are held in a map and can never be
released. This is problematic for us since indices and shards do not
necessarily have infinite life spans and so on a node where there are
many indices being creted and destroyed, this infinite lifespan can be a
problem indeed. To solve this, we use our own cache of markers. This is
necessary to prevent too many instances of the marker for the same
prefix from being created (just think of all the shard-level components
that exist in the system), and to workaround the effective leak in
Log4j. These markers are stored as weak references in a weak hash
map. It is these weak references that are unneeded. When a key is
removed from a weak hash map, the corresponding entry is placed on a
reference queue that is eventually cleared. This commit simplifies
prefix logger by removing this unnecessary weak reference wrapper.
Relates #22460
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This adds an option to `ClusterConfiguration` to preserve the
`shared` directory when starting up a new cluster and switches
the `qa:full-cluster-restart` tests to use it rather than
disable the clean shared task.
Relates to #24846
We're using Vagrant in more places now than before. This commit includes a plugin that verifies
the Vagrant and Virtualbox installations for projects that depend on them. This shared code
should fix up the errors we've seen from CI builds relating to the new Kerberos fixture.
* Adds nodes usage API to monitor usages of actions
The nodes usage API has 2 main endpoints
/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
Still to do:
* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation
* Rafactors UsageService so counts are done by the handlers
* Fixing up docs tests
* Adds a name to all rest actions
* Addresses review comments
We default to 0 replicas in the rolling restart scenario already to ensure
we test against worst case. Yet, this adds a dummy index to ensure we also
recover and index with replicas just fine.
The Lucene version expectation in the verify Lucene version test is
backwards, mixing up the expected and actual values. This commit
reorders them to fix this issue.
The Lucene version constants for 5.4.1 and 5.5.0 are wrong, they are
listed as 6.5.0 instead of 6.5.1. This commit fixes these issues, and
adds a test to ensure that this does not happen again.
Relates #24923
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
These tests spin up two nodes of an older version of Elasticsearch,
create some stuff, shut down the nodes, start the current version,
and verify that the created stuff works.
You can run `gradle qa:full-cluster-restart:check` to run these
tests against the head of the previous branch of Elasticsearch
(5.x for master, 5.4 for 5.x, etc) or you can run
`gradle qa:full-cluster-restart:bwcTest` to run this test against
all "index compatible" versions, one after the other. For master
this is every released version in the 5.x.y version *and* the tip
of the 5.x branch.
I'd love to add more to these tests in the future but these
currently just cover the functionality of the `create_bwc_index.py`
script and start to cover the assertions in the
`OldIndexBackwardsCompatibilityIT` test.
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
ScriptEngine implementations have an overridable method to indicate they
are safe to use as inline scripts. Since groovy was removed fro 6.0,
there are no longer any implementations which used the default false
value. Furthermore, the value was not actually read anywhere. This
commit removes the method. The ScriptEngineRegistry was also no longer
necessary as it only was used to build a map from language to engine.
This commit renames the backwards-5.0 qa test to mixed-cluster and
creates a test within the project per wire compat version. Like with
rolling upgrade tests, the integTest task will run against the most
recent version, while all versions will be tested with the bwcTest task.
Now that we generate the versions list from Versions.java we can
drop the list of versions maintained for vagrant testing. One nice
thing that the vagrant testing did was to check if the list of
versions was out of date. This moves that test to the core
project.
This commit changes the rolling upgrade test to create a set of rest
test tasks per wire compat version. The most recent wire compat version
is always tested with the `integTest` task, and all versions can be
tested with `bwcTest`.
This commit expands the logic for version extraction from Version.java
to include a list of all versions for backcompat purposes. The tests
using bwcVersion are converted to use this list, but those tests
(rolling upgrade and backwards-5.0) are still not randomized; that will
happen in another followup.
Today the `_field_caps` API doesn't implement its request serialization
correctly since indices and indices options are not serialized at all.
This will likely break with all transport clients etc. and if this request
must be send across the network. This commit fixes this and adds correct
handling if we have only remote indices to prevent the inclusion of
all local indices.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.
Relates #20257
In windows we can't reliable git the pid so we skip the
reindex-from-remote tests from old versions of elasticsearch. This
is OK because we aren't really testing windows here anyway. It isn't
great, but should be safe.
Two of the versions of Elasticsearch we need to run for these tests
can't run in Java 9 so we skip the entire test if we are running in
java 9. For now. I'd like to reenable it to run against java 8 if
there is one available, but that can wait for another time.
Relates to #24561
Adds tests for reindex-from-remote for the latest 2.4, 1.7, and
0.90 releases. 2.4 and 1.7 are fairly popular versions but 0.90
is a point of pride.
This fixes any issues those tests revealed.
Closes#23828Closes#24520
This commit renames ScriptEngineService to ScriptEngine. It is often
confusing because we have the ScriptService, and then
ScriptEngineService implementations, but the latter are not services as
we see in other places in elasticsearch.
File scripts have 2 related settings: the path of file scripts, and
whether they can be dynamically reloaded. This commit deprecates those
settings.
relates #21798
Today we rely on background syncs to relay the global checkpoint under
the mandate of the primary to its replicas. This means that the global
checkpoint on a replica can lag far behind the primary. The commit moves
to inlining global checkpoints with replication requests. When a
replication operation is performed, the primary will send the latest
global checkpoint inline with the replica requests. This keeps the
replicas closer in-sync with the primary.
However, consider a replication request that is not followed by another
replication request for an indefinite period of time. When the replicas
respond to the primary with their local checkpoint, the primary will
advance its global checkpoint. During this indefinite period of time,
the replicas will not be notified of the advanced global
checkpoint. This necessitates a need for another sync. To achieve this,
we perform a global checkpoint sync when a shard falls idle.
Relates #24513
Previously, Mustache would call `toString` on the `_ingest.timestamp`
field and return a date format that did not match Elasticsearch's
defaults for date-mapping parsing. The new ZonedDateTime class in Java 8
happens to do format itself in the same way ES is expecting.
This commit adds support for a feature flag that enables the usage of this new date format
that has more native behavior.
Fixes#23168.
This new fix can be found in the form of a cluster setting called
`ingest.new_date_format`. By default, in 5.x, the existing behavior
will remain the same. One will set this property to `true` in order to
take advantage of this update for ingest-pipeline convenience.
When installing plugin permissions, we try to set the permissions on all
installed files ourselves because a umask from the user could violate
everything needed to get the permissions right. Sadly, directories were
not handled correctly at all and so we were still left with broken
installations with umasks like 0077. This commit fixes this issue, adds
a thorough unit test for the situation, and most importantly, adds a
test that sets the umask before installing the plugin.
Relates #24527
The license header in this file was after and on the same line as the
package statement. This commit moves the package statement to be after
the license header.
Netty uses the number of processors for sizing various resources (e.g.,
thread pools, buffer pools, etc.). However, it uses the runtime number
of available processors which might not match the configured number of
processors as set in Elasticsearch to limit the number of threads (for
example, in Docker containers). A new feature was added to Netty that
enables configuring the number of processors Netty should see for sizing
this various resources. This commit takes advantage of this feature to
set this number of available processors to be equal to the configured
number of processors set in Elasticsearch.
Relates #24420
The tribe service can take a while to initialize, depending on how many cluster it needs to connect to. This change moves writing the ports file used by tests to before the tribe service is started.
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
The bats tests are descructive and must be run as root. This is a
horrible combination on any sane system but perfectly fine to do
in a VM. This change modifies the tests so they revuse to start
unless they are in an environment with an `/etc/is_vagrant_vm`
file. The Vagrantfile creates it on startup.
Closes#24137
If the user explicitly configured path.data to include
default.path.data, then we should not fail the node if we find indices
in default.path.data. This commit addresses this.
Relates #24285
In #24251 we fix an issue with stored search templates that
this test would have discovered: stored search templates cause
the node to refuse to start. Technically a "restart" test would
have caught this as well and would have caught it more quickly.
But we already *have* an upgrade test and we don't have restart tests.
And testing this on upgrade is a good thing too.
It seems that Wildfly 10 can not be made to start in a fully-functional
form on JDK 9, so this commit skips running the Wildfly integration
tests on JDK 9.
An important use case for our users is deploying our clients inside of
applications containers like Wildly. Sometimes, we make changes that
unintentionally break this use case. We need to know before we ship a
release that we have broken such use cases. As Wildfly is one of the
bigger application containers, this commit starts by adding an
integration test that deploys an application using the transport client
to Wildfly and ensures that all is well. Future work can add similar
integration tests for the low-level and high-level REST clients.
Relates #24147
The plugin cli currently resides inside the elasticsearch jar. This
commit moves it into a plugin-cli jar. This is change alone is a no-op;
it does not change anything about what is loaded at runtime. But it will
allow easier testing (with fixtures in the future to test ES or maven
installation), as well as eventually not loading these classes when
starting elasticsearch.
This leniency was left in after plugin installer refactoring for 2.0
because some tests still relied on it. However, the need for this
leniency no longer exists.
* Replicate write failures
Currently, when a primary write operation fails after generating
a sequence number, the failure is not communicated to the replicas.
Ideally, every operation which generates a sequence number on primary
should be recorded in all replicas.
In this change, a sequence number is associated with write operation
failure. When a failure with an assinged seqence number arrives at a
replica, the failure cause and sequence number is recorded in the translog
and the sequence number is marked as completed via executing `Engine.noOp`
on the replica engine.
* use zlong to serialize seq_no
* Incorporate feedback
* track write failures in translog as a noop in primary
* Add tests for replicating write failures.
Test that document failure (w/ seq no generated) are recorded
as no-op in the translog for primary and replica shards
* Update to master
* update shouldExecuteOnReplica comment
* rename indexshard noop to markSeqNoAsNoOp
* remove redundant conditional
* Consolidate possible replica action for bulk item request
depanding on it's primary execution
* remove bulk shard result abstraction
* fix failure handling logic for bwc
* add more tests
* minor fix
* cleanup
* incorporate feedback
* incorporate feedback
* add assert to remove handling noop primary response when 5.0 nodes are not supported
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
After splitting integ tests into cluster configuration and the test
runner task, we still have dependencies of the test runner added as deps
of the cluster. This commit adds dependencies directly to the cluster,
so that the runner can have other dependencies independent of what is
needed for the cluster.
We have some packaging tests where we use a custom jvm.options file. The
flags we use here are barebones, just enough to exercise that using a
custom jvm.options files actually works. Due to this, we are missing a
Log4j flag that prevents Log4j from trying to use JMX. If Log4j tries to
use JMX, it hits a security manager exception and tries to log
this. This attempt to log happens before we've configured
logging. Previously, Elasticsearch was lenient here so this was treated
as harmless and the test could march on. Now, we fail startup if we
detect an attempt to log before logging is configured so this prevents
Elasticsearch from starting if we do not have jvm.options files in place
that prevent these log messages from being written before logging is
being configured. This commit adds jvm.options files in the places need
to prevent this.
These tests were marked as awaits fix due to JNA requiring a version of
glibc greater than or equal to version 2.14. Since we still support
systems that would not have this version, we have released our own JNA
dependency that is built to support earlier versions of glibc. This
commit removes some await fixes that were added to tests that failed as
a result of this situation.
It can easily happen that we touch a logger before logging is configured
due to chains of static intializers and other such scenarios. This
commit adds detection for this mechanism that will fail startup if we
touch a logger before logging is configured. This is a bug that will
cause builds to fail.
Relates #24076
This is related to #23893. This commit allows users to use wilcards for
cluster names when executing a cross cluster search.
So instead of defining every cluster such as:
GET one:*,two:*,three:*/_search
A user could just search:
GET *:*/_search
As ":" characters are currently allowed in index names, if the text
up to the first ":" does not match a defined cluster name, the entire
string is treated as an index name.
All our actions that are invoked from rest actions have corresponding
transport actions. This adds the transport action for RestRemoteClusterInfoAction
for consistency.
Relates to #23969
This commit fixes the handling of POSIX permissions on Windows in the
spawner tests. Since POSIX permissions do not exist there, we first have
to check if we are on a filesystem that supports POSIX or not before
attempting to set the permissions.
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
This commit adds a single node discovery type. With this discovery type,
a node will elect itself as master and never form a cluster with another
node.
Relates #23595
We currently have the last minor version of the previous major hardcoded
in tests like rolling upgrade. This change programatically finds this
during gradle initialization by parsing versions from Version.java.
These tests were disabled due to an issue in Netty which has since been
resolved and integrated into Elasticsearch.
Relates elastic/elasticsearch#20260
The current rest backcompat tests, which run against a mixed cluster of
5.x and 6.0 nodes, depend on snapshot builds of 5.x. However, this has
the potential for inconsistency that results in CI failures, and happens
quite often, whenever some backcompat logic is added to 5.x, but the bwc
test on master fails because the 5.x code has not yet been published as
a snapshot.
This change creates a git clone of the 5.x branch,
builds the zip distribution, and ties that into gradle substitutions for
the 5.x version.
We currently use POSIX exit codes in all of our CLIs. However, posix
only suggests these exit codes are standard across tools. It does not
prescribe particular uses for codes outside of that range. This commit
adds 2 exit codes specific to plugin installation to make distinguishing
an incorrectly built plugin and a plugin already existing easier.
closes#15295
This commit catches the underlying failure when trying to list plugin
information when a plugin is incompatible with the current version of
elasticsearch. This could happen when elasticsearch is upgraded but old
plugins still exist. With this change, all plugins will be output,
instead of failing at the first out of date plugin.
closes#20691
This commit marks the EvilJNANativesTests as awaiting fixes due to these
tests failing on platforms that do not provide at least version 2.14 of
glibc.
This commit adds a system property that enables end-users to explicitly
enforce the bootstrap checks, independently of the binding of the
transport protocol. This can be useful for single-node production
systems that do not bind the transport protocol (and thus the bootstrap
checks would not be enforced).
Relates #23585
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
* Make document write requests immutable
Previously, write requests were mutated at the
transport level to update request version, version type
and sequence no before replication.
Now that all write requests go through the shard bulk
transport action, we can use the primary response stored
in item level bulk requests to pass the updated version,
seqence no. to replicas.
* incorporate feedback
* minor cleanup
* Add bwc test to ensure correct index version propagates to replica
* Fix bwc for propagating write operation versions
* Add assertion on replica request version type
* fix tests using internal version type for replica op
* Fix assertions to assert version type in replica and recovery
* add bwc tests for version checks in concurrent indexing
* incorporate feedback
In the packaging tests we make some requests to Elasticsearch as part of
the tests. These requests were not setting the content-type header. This
commit addresses this.
In the packaging tests we make some requests to Elasticsearch as part of
the tests. These requests were not setting the content-type header. This
commit addresses this.
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.
While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.
Relates #19388
These images have been rebuilt to be preloaded with java 8 installed.
This change re-enables the systems. It also removes some redundancy in
the rpm checks I found while testing the new images, and fixes a
potential issue with generated resources in plugins where a stale dir
can cause junk to get into the distribution.
As part of #22116 we are going to forbid usage of api
java.net.URL#openStream(). However in a number of places across the
we use this method to read files from the local filesystem. This commit
introduces a helper method openFileURLStream(URL url) to read files
from URLs. It does specific validation to only ensure that file:/
urls are read.
Additionlly, this commit removes unneeded method
FileSystemUtil.newBufferedReader(URL, Charset). This method used the
openStream () method which will soon be forbidden. Instead we use the
Files.newBufferedReader(Path, Charset).
Today either all nodes in the cluster connect to remote clusters of only nodes
that have remote clusters configured in their node config. To allow global remote
cluster configuration but restrict connections to a set of nodes in the cluster
this change adds a new setting `search.remote.connect` (defaults to `true`) to allow
to disable remote cluster connections on a per node basis.
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
When a primary is relocated from an old node to a new node, it can have
ops in its translog that do not have a sequence number assigned. When a
file-based recovery is started, this can lead to skipping these ops when
replaying the translog due to a bug in the recovery logic. This commit
addresses this bug and adds a test in the BWC tests.
Relates #22945
Today if a user invokes the remove plugin command without specifying the
name of a plugin to remove, we arrive at a null pointer exception. This
commit adds logic to cleanly handle this situation and provide clear
feedback to the user.
Relates #22930
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
This commit introduces sequence-number-based recovery. When a replica
has fallen out of sync, rather than performing a file-based recovery we
first attempt to replay operations since the last local checkpoint on
the replica. To do this, at the start of recovery the replica tells the
primary what its local checkpoint is. The primary will then wait for all
operations between that local checkpoint and the current maximum
sequence number to complete; this is to ensure that there are no gaps in
the operations that will be replayed from the primary to the
replica. This is a best-effort attempt as we currently have no
guarantees on the primary that these operations will be available; if we
are not able to replay all operations in the desired range, we just
fallback to file-based recovery. Later work will strengthen the
guarantees.
Relates #22484
This is related to #22116. URLRepository requires SocketPermission
connect. This commit introduces a new module called "repository-url"
where URLRepository will reside. With the new module, permissions can
be removed from core.
There are presently 7 ctor args used in any rest handlers:
* `Settings`: Every handler uses it to initialize a logger and
some other strange things.
* `RestController`: Every handler registers itself with it.
* `ClusterSettings`: Used by `RestClusterGetSettingsAction` to
render the default values for cluster settings.
* `IndexScopedSettings`: Used by `RestGetSettingsAction` to get
the default values for index settings.
* `SettingsFilter`: Used by a few handlers to filter returned
settings so we don't expose stuff like passwords.
* `IndexNameExpressionResolver`: Used by `_cat/indices` to
filter the list of indices.
* `Supplier<DiscoveryNodes>`: Used to fill enrich the response
by handlers that list tasks.
We probably want to reduce these arguments over time but
switching construction away from guice gives us tighter
control over the list of available arguments.
These parameters are passed to plugins using
`ActionPlugin#initRestHandlers` which is expected to build and
return that handlers immediately. This felt simpler than
returning an reference to the ctors given all the different
possible args.
Breaks java plugins by moving rest handlers off of guice.
This commit adds a MessyRestTestPlugin to the gradle build. It extends
StandaloneRestTestPlugin. The main piece of functionality that it adds
is to copy plugin-metadata from dependencies into the
generated-resources for the current test source. This is necessary to
ensure that permissions for dependencies are applied when running the
tests.
A current limitation is that the permissions are applied differently
than in the distribution sources. When permissions are granted to all
depedencies for a module or plugin, the permissions are granted to all
dependencies on the classpath for tests besides a few hardcoded
exclusions:
- es core
- es test framework
- lucene test framework
- randomized runner
- junit library
For certain situations, end-users need the base path for Elasticsearch
logs. Exposing this as a property is better than hard-coding the path
into the logging configuration file as otherwise the logging
configuration file could easily diverge from the Elasticsearch
configuration file. Additionally, Elasticsearch will only have
permissions to write to the log directory configured in the
Elasticsearch configuration file. This commit adds a property that
exposes this base path.
One use-case for this is configuring a rollover strategy to retain logs
for a certain period of time. As such, we add an example of this to the
documentation.
Additionally, we expose the property es.logs.cluster_name as this is
used as the name of the log files in the default configuration.
Finally, we expose es.logs.node_name in cases where node.name is
explicitly set in case users want to include the node name as part of
the name of the log files.
Relates #22625
When logger.level is set, we end up configuring a logger named "level"
because we look for all settings of the form "logger\..+" as configuring
a logger. Yet, logger.level is special and is meant to only configure
the default logging level. This commit causes is to avoid not
configuring a logger named level.
Relates #22624
This commit removes a logging test that is now obsolete. This test was
added when we included a forked version of some Log4j 2 classes to
workaround a bug in Log4j 2. This bug was fixed and a version of Log4j 2
incorporating this fix was previously integrated into Elaticsearch. At
that time, the forked versions were removed, and this test should have
been removed with it.
* Fix Translog.Delete serialization for sequence numbers
Translog.Delete used `.writeVLong` instead of `.writeLong` for the sequence
number and primary term (and their respective "read" variants). This could lead
to issues where a 5.x node sent a translog operation with a negative sequence
number (-2 for unassigned seq no) that tripped an assertion serializing a
negative number and causing ES to exit.
Adds a unit test for serialization and a mixed-cluster REST test, since that was
how this was originally caught.
* Use more realistic values for random seqNum and primary term
* Add comment with TODO for removal in 7.0
* Change comment into an assert
Instead of `search.remote.seeds.${clustername}` we now specify the seeds as:
`search.remote.${clustername}.seeds` which is a real list setting compared to an unvalidated
group setting before.
`ToXContentObject` extends `ToXContent` without adding new methods to it, while allowing to mark classes that output complete xcontent objects to distinguish them from classes that require starting and ending an anonymous object externally.
Ideally ToXContent would be renamed to ToXContentFragment, but that would be a huge change in our codebase, hence we simply document the fact that toXContent outputs fragments with no guarantees that the output is valid per se without an external ancestor.
Relates to #16347
this commit adds full support for proxy nodes on the search layer.
This allows to connection only to a small set of nodes on a remote cluster
to exectue the search. The nodes will proxy the request to the correct node in the
cluster while the coordinting node doesn't need to be connected to the target node.
This change is the first towards providing the ability to store
sensitive settings in elasticsearch. It adds the
`elasticsearch-keystore` tool, which allows managing a java keystore.
The keystore is loaded upon node startup in Elasticsearch, and used by
the Setting infrastructure when a setting is configured as secure.
There are a lot of caveats to this PR. The most important is it only
provides the tool and setting infrastructure for secure strings. It does
not yet provide for keystore passwords, keypairs, certificates, or even
convert any existing string settings to secure string settings. Those
will all come in follow up PRs. But this PR was already too big, so this
at least gets a basic version of the infrastructure in.
The two main things to look at. The first is the `SecureSetting` class,
which extends `Setting`, but removes the assumption for the raw value of the
setting to be a string. SecureSetting provides, for now, a single
helper, `stringSetting()` to create a SecureSetting which will return a
SecureString (which is like String, but is closeable, so that the
underlying character array can be cleared). The second is the
`KeyStoreWrapper` class, which wraps the java `KeyStore` to provide a
simpler api (we do not need the entire keystore api) and also extend
the serialized format to add metadata needed for loading the keystore
with no assumptions about keystore type (so that we can change this in
the future) as well as whether the keystore has a password (so that we
can know whether prompting is necessary when we add support for keystore
passwords).
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
Today if an older version of a plugin exists, we fail to notify the user
with a helpful error message. This happens because during plugin
verification, we attempt to read the plugin descriptors for all existing
plugins. When an older version of a plugin is sitting on disk, we will
attempt to read this old plugin descriptor and fail due to a version
mismatch. This leads to an unhelpful error message. Instead, we should
check for existence of the plugin as part of the verification phase, but
before attempting to read plugin descriptors for existing plugins. This
enables us to provide a helpful error message to the user.
Relates #22305
* Internal: Refactor SettingCommand into EnvironmentAwareCommand
This change renames and changes the behavior of SettingCommand to have
its primary method take in a fully initialized Environment for
elasticsearch instead of just a map of settings. All of the subclasses
of SettingCommand already did this at some point, so this just removes
duplication.
We are currenlty checking that no deprecation warnings are emitted in our query tests. That can be moved to ESTestCase (disabled in ESIntegTestCase) as it allows us to easily catch where our tests use deprecated features and assert on the expected warnings.
Sequence BWC logic consists of two elements:
1) Wire level BWC using stream versions.
2) A changed to the global checkpoint maintenance semantics.
For the sequence number infra to work with a mixed version clusters, we have to consider situation where the primary is on an old node and replicas are on new ones (i.e., the replicas will receive operations without seq#) and also the reverse (i.e., the primary sends operations to a replica but the replica can't process the seq# and respond with local checkpoint). An new primary with an old replica is a rare because we do not allow a replica to recover from a new primary. However, it can occur if the old primary failed and a new replica was promoted or during primary relocation where the source primary is treated as a replica until the master starts the target.
1) Old Primary & New Replica - this case is easy as is taken care of by the wire level BWC. All incoming requests will have their seq# set to `UNASSIGNED_SEQ_NO`, which doesn't confuse the local checkpoint logic (keeping it at `NO_OPS_PERFORMED`)
2) New Primary & Old replica - this one is trickier as the global checkpoint service currently takes all in sync replicas into consideration for the global checkpoint calculation. In order to deal with old replicas, we change the semantics to say all *new node* in sync replicas. That means the replicas on old nodes don't count for the global checkpointing. In this state the seq# infra is not fully operational (you can't search on it, because copies may miss it) but it is maintained on shards that can support it. The old replicas will have to go through a file based recovery at some point and will get the seq# information at that point. There is still an edge case where a new primary fails and an old replica takes over. I'lll discuss this one with @ywelsch as I prefer to avoid it completely.
This PR also re-enables the BWC tests which were disabled. As such it had to fix any BWC issue that had crept in. Most notably an issue with the removal of the `timestamp` field in #21670.
The commit also includes a fix for the default value of the seq number field in replicated write requests (it was 0 but should be -2), that surface some other minor bugs which are fixed as well.
Last - I added some debugging tools like more sane node names and forcing replication request to implement a `toString`
Problem: We introduced the ability to shorten the rank eval request by using a
template in #20231. When playing with the API it turned out that there might be
use cases where - e.g. due to various heuristics - folks might want to translate
the original user query into more than just one type of Elasticsearch query.
Solution: Give each template an id that can later be referenced in the
actual requests.
Closes#21257
Today in the codebase we refer to seccomp everywhere instead of system
call filter even if we are not specifically referring to Linux. This
commit is a purely mechanical change to refer to system call filter
where appropriate instead of the general seccomp, and only leaves
seccomp in place when actually referring to the Linux implementation.
Relates #22243
Inline scripts defined in Ingest Pipelines are now compiled at creation time to preemptively catch errors on initialization of the pipeline.
Fixes#21842.
Moves the last of the "easy" parser construction into
`RestRequest`, this time with a new method
`RestRequest#contentParser`. The rest of the production
code that builds `XContentParser` isn't "easy" because it is
exposed in the Transport Client API (a Builder) object.
This commit enables CLI commands to be closeable and installs a runtime
shutdown hook to ensure that if the JVM shuts down (as opposed to
aborting) the close method is called.
It is not enough to wrap uses of commands in main methods in
try-with-resources blocks as these will not run if, say, the virtual
machine is terminated in response to SIGINT, or system shutdown event.
Relates #22126
Add checks to RankEvalSpec to safe guard against missing parameters.
Fail early in case no metric is supplied, no rated requests are supplied or the search source builder is missing but no template is supplied neither.
Add stricter checks around rank eval request parsing: Fail if in a rated request we see both, a verbatim request as well as request
template parameters.
Relates to #21260
Our query DSL supports empty queries (`{}`), which have a different meaning depending on the query that holds it, either ignored, match_all or match_none. We deprecated the support for empty queries in 5.0, where we log a deprecation warning wherever they are used.
The way we supported it once we moved query parsing to the coordinating node was having an Optional<QueryBuilder> return type in all of our parse methods (called fromXContent). See #17624. The central place for this was QueryParseContext#parseInnerQueryBuilder. We can now remove all the optional return types and simply throw an exception whenever an empty query is found.
Move rank-eval template compilation down to TransportRankEvalAction
Closes#21777 and #21465
Search templates for rank_eval endpoint so far only worked when sent through
REST end point
However we also allow templates to be set through a Java API call to
"setTemplate" on that same spec. This doesn't go through template execution so
fails further down the line.
To make this work, moved template execution further down, probably to
TransportRankEvalAction.
* Add template IT test for Java API
* Move template compilation to TransportRankEvalAction
Action filters currently have the ability to filter both the request and
response. But the response side was not actually used. This change
removes support for filtering responses with action filters.
We don't use the test infra nor do we run the tests. They might all be
entirely out of date. We also have a different BWC test infra in-place.
This change removes all of the legacy infra.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
This commit adds `qa/multi-cluster-search` which currently does a
simple search across 2 clusters. This commit also adds support for IPv6
addresses and fixes an issue where all shards of the local cluster are searched
when only a remote index was given.
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
We kept `netty_3` as a fallback in the 5.x series but now that master
is 6.0 we don't need this or in other words all issues coming up with
netty 4 will be blockers for 6.0.
* master: (22 commits)
Add proper toString() method to UpdateTask (#21582)
Fix `InternalEngine#isThrottled` to not always return `false`. (#21592)
add `ignore_missing` option to SplitProcessor (#20982)
fix trace_match behavior for when there is only one grok pattern (#21413)
Remove dead code from GetResponse.java
Fixes date range query using epoch with timezone (#21542)
Do not cache term queries. (#21566)
Updated dynamic mapper section
Docs: Clarify date_histogram bucket sizes for DST time zones
Handle release of 5.0.1
Fix skip reason for stats API parameters test
Reduce skip version for stats API parameter tests
Strict level parsing for indices stats
Remove cluster update task when task times out (#21578)
[DOCS] Mention "all-fields" mode doesn't search across nested documents
InternalTestCluster: when restarting a node we should validate the cluster is formed via the node we just restarted
Fixed bad asciidoc in boolean mapping docs
Fixed bad asciidoc ID in node stats
Be strict when parsing values searching for booleans (#21555)
Fix time zone rounding edge case for DST overlaps
...
There is not yet a BWC layer in sequence numbers. This commit sets the
BWC version to 6.0.0 for the BWC and rolling upgrade tests until this
BWC layer is built.
Adds a version constant for it, bwc indices, and a vagrant upgrade-from
version. Also bumps the "upgrade from" version for the backwards-5.0
test and adds `skip`s for tests that don't fail against 5.0 so we skip
them during the backwards testing.
Finally, this skips the "Shrink index via API" test because it fails
consistently for me. Inconsistently for CI, but consistently for me.
I'll work on making it consistent tomorrow.
In #21348 the command executed to run the packaging tests has been changed to "sudo -E bats ...", forcing all environment variables from the vagrant user to be passed to the `sudo` command. This breaks a test on opensuse-13 (the one where it checks that elasticsearch cannot be started when `java` is not found) because all the PATH from the user is passed to the sudo command.
This commit restores the previous behavior while allowing only necessary testing environment variables to be passed using a /etc/sudoers.d file.
This changes adds a test discovery (which internally uses the existing
mock zenping by default). Having the mock the test framework selects be a discovery
greatly simplifies discovery setup (no more weird callback to a Node
method).
Today when a node starts, we create dynamic socket permissions based on
the configured HTTP ports and transport ports. If no ports are
configured, we use the default port ranges. When a tribe node starts, a
tribe node creates an internal node client for connecting to each remote
cluster. If neither an explicit HTTP port nor transport ports were
specified, the default port ranges are large enough for the tribe node
and its internal node clients. If an explicit HTTP port or transport
port was specified for the tribe node, then socket permissions for those
ports will be created, but not for the internal node clients. Whether
the internal node clients have explicit ports specified, or attempt to
bind within the default range, socket permissions for these will not
have been created and the internal node clients will hit a permissions
issue when attempting to bind. This commit addresses this issue by also
accounting for tribe nodes when creating the dynamic socket
permissions. Additionally, we add our first real integration test for
tribe nodes.
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
This commit changes the current :elactisearch:qa:vagrant build file and transforms it into a Gradle plugin in order to reuse it in other projects.
Most of the code from the build.gradle file has been moved into the VagrantTestPlugin class. To avoid duplicated VMs when running vagrant tests, the Gradle plugin sets the following environment variables before running vagrant commands:
VAGRANT_CWD: absolute path to the folder that contains the Vagrantfile
VAGRANT_PROJECT_DIR: absolute path to the Gradle project that use the VagrantTestPlugin
The VAGRANT_PROJECT_DIR is used to share project folders and files with the vagrant VM. These folders and files are exported when running the task `gradle vagrantSetUp` which:
- collects all project archives dependencies and copies them into `${project.buildDir}/bats/archives`
- copy all project bats testing files from 'src/test/resources/packaging/tests' into `${project.buildDir}/bats/tests`
- copy all project bats utils files from 'src/test/resources/packaging/utils' into `${project.buildDir}/bats/utils`
It is also possible to inherit and grab the archives/tests/utils files from project dependencies using the plugin configuration:
apply plugin: 'elasticsearch.vagrant'
esvagrant {
inheritTestUtils true|false
inheritTestArchives true|false
inheritTests true|false
}
dependencies {
// Inherit Bats test utils from :qa:vagrant project
bats project(path: ':qa:vagrant', configuration: 'bats')
}
The folders `${project.buildDir}/bats/archives`, `${project.buildDir}/bats/tests` and `${project.buildDir}/bats/utils` are then exported to the vagrant VMs and mapped to the BATS_ARCHIVES, BATS_TESTS and BATS_UTILS environnement variables.
The following Gradle tasks have also be renamed:
* gradle vagrantSetUp
This task copies all the necessary files to the project build directory (was `prepareTestRoot`)
* gradle vagrantSmokeTest
This task starts the VMs and echoes a "Hello world" within each VM (was: `smokeTest`)
On some systems these utilities are in /usr/lib/systemd/systemd-sysctl
and /usr/sbin/sysctl, and on others the /usr is dropped. This commit
accounts for that fact.
Our docs claim that we set vm.max_map_count automatically. This is not
quite the case. The story is that on SysV init we set vm.max_map_count
each time the service starts, which is good. On systemd, we create a
sysctl.d conf file that sets vm.map_max_count, but this is only
meaningful if the system is rebooted after package install. This commit
modifies the post-install script so that we run systemd-sysctl so that
the vm.max_map_count change occurs after package install without a
reboot.
Relates #21507
This commit ensure that VirtualBox is available in version 5.1+ in the system before running packaging tests. It also check for Vagrant version is now greater than 1.8.6.
The environment variable ES_JVM_OPTIONS allows end-users to specify a
custom location for the jvm.options file. Unfortunately, this
environment variable is not exported from the SysV init scripts. This
commit addresses this issue, and includes a test that ES_JVM_OPTIONS and
ES_JAVA_OPTS work for the SysV init packages.
Relates #21445
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
Today if you start Elasticsearch with the status logger configured to
the warn level, or use a transport client with the default status logger
level, you will see warn messages about deprecation loggers being
created with different message factories and that formatting might be
broken. This happens because the deprecation logger is constructed using
the message factory from its parent, an artifact leftover from the first
Log4j 2 implementation that used a custom message factory. When that
custom message factory was removed, this constructor invocation should
have been changed to not explicitly use the message factory from the
parent. This commit fixes this invocation. However, we also had some
status checking to all tests to ensure that there are no warn status log
messages that might indicate a configuration problem with Log4j 2. These
assertions blow up badly without the fix for the deprecation logger
construction, and also caught a misconfiguration in one of the logging
tests.
Relates #21339