set watcher logger to debug level.
These tests haven't run in such a long time,
we first need to get a better picture how/if
these tests fail today.
Backport of #51478
See #33185
Prior to the change the watcher index listener didn't implement the
`postIndex(ShardId, Engine.Index, Engine.IndexResult)` method. This
caused document level exceptions like VersionConflictEngineException
to be ignored. This commit fixes this.
The watcher indexing listener did implement the `postIndex(ShardId, Engine.Index, Exception)`
method, but that only handles engine level exceptions.
This change also unmutes the SmokeTestWatcherTestSuiteIT#testMonitorClusterHealth test again.
Relates to #32299
Previously this test failed waiting for yellow:
https://gradle-enterprise.elastic.co/s/fv55holsa36tg/console-log#L2676
Oddly cluster health returned red status, but there were no unassigned, relocating or initializing shards.
Placed the waiting for green in a try-catch block, so that when this fails again then cluster state gets printed.
Relates to #48381
This change adds a new `kibana_admin` role, and deprecates
the old `kibana_user` and`kibana_dashboard_only_user`roles.
The deprecation is implemented via a new reserved metadata
attribute, which can be consumed from the API and also triggers
deprecation logging when used (by a user authenticating to
Elasticsearch).
Some docs have been updated to avoid references to these
deprecated roles.
Backport of: #46456
Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
Previously custom realms were limited in what services and components
they had easy access to. It was possible to work around this because a
security extension is packaged within a Plugin, so there were ways to
store this components in static/SetOnce variables and access them from
the realm, but those techniques were fragile, undocumented and
difficult to discover.
This change includes key services as an argument to most of the methods
on SecurityExtension so that custom realm / role provider authors can
have easy access to them.
Backport of: #50534
Some changes had to be made in order to make the test pass due to the removal or types.
Added some more assertions. The failure description in this comment [0] indicates that the rest handler couldn't be found. The test passes now.
I plan to merge this into master and see how CI reacts, if it handles this change well then I will also unmute this test in 7 dot x branch.
Also check watch count after stopping watcher in test teardown and
disabled slm in smoke test watcher qa test.
Relates to #41172
0: https://github.com/elastic/elasticsearch/issues/41172#issuecomment-496993976
* Update remote cluster stats to support simple mode (#49961)
Remote cluster stats API currently only returns useful information if
the strategy in use is the SNIFF mode. This PR modifies the API to
provide relevant information if the user is in the SIMPLE mode. This
information is the configured addresses, max socket connections, and
open socket connections.
* Send hostname in SNI header in simple remote mode (#50247)
Currently an intermediate proxy must route conncctions to the
appropriate remote cluster when using simple mode. This commit offers
a additional mechanism for the proxy to route the connections by
including the hostname in the TLS SNI header.
* Rename the remote connection mode simple to proxy (#50291)
This commit renames the simple connection mode to the proxy connection
mode for remote cluster connections. In order to do this, the mode specific
settings which we namespaced by their mode (ex: sniff.seed and
proxy.addresses) have been reverted.
* Modify proxy mode to support a single address (#50391)
Currently, the remote proxy connection mode uses a list setting for the
proxy address. This commit modifies this so that the setting is
proxy_address and only supports a single remote proxy address.
In the yaml cluster upgrade tests, we start a scroll in a mixed-version cluster,
then attempt to continue the scroll after the upgrade is complete. This test
occasionally fails because the scroll can expire before the cluster is done
upgrading.
The current scroll keep-alive time 5m. This PR bumps it to 10m, which gives a
good buffer since in failing tests the time was only exceeded by ~30 seconds.
Addresses #46529.
The testclusters shutdown code was killing child processes
of the ES JVM before the ES JVM. This causes any running
ML jobs to be recorded as failed, as the ES JVM notices that
they have disconnected from it without being told to stop,
as they would if they crashed. In many test suites this
doesn't matter because the test cluster will never be
restarted, but in the case of upgrade tests it makes it
impossible to test what happens when an ML job is running
at the time of the upgrade.
This change reverses the order of killing the ES process
tree such that the parent processes are killed before their
children. A list of children is stored before killing the
parent so that they can subsequently be killed (if they
don't exit by themselves as a side effect of the parent
dying).
Backport of #50175
This commit back ports three commits related to enabling the simple
connection strategy.
Allow simple connection strategy to be configured (#49066)
Currently the simple connection strategy only exists in the code. It
cannot be configured. This commit moves in the direction of allowing it
to be configured. It introduces settings for the addresses and socket
count. Additionally it introduces new settings for the sniff strategy
so that the more generic number of connections and seed node settings
can be deprecated.
The simple settings are not yet registered as the registration is
dependent on follow-up work to validate the settings.
Ensure at least 1 seed configured in remote test (#49389)
This fixes#49384. Currently when we select a random subset of seed
nodes from a list, it is possible for 0 seeds to be selected. This test
depends on at least 1 seed being selected.
Add the simple strategy to cluster settings (#49414)
This is related to #49067. This commit adds the simple connection
strategy settings and strategy mode setting to the cluster settings
registry. With these changes, the simple connection mode can be used.
Additionally, it adds validation to ensure that settings cannot be
misconfigured.
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
The timeout was increased to 60s to allow this test more time to reach a
yellow state. However, the test will still on occasion fail even with the
60s timeout.
Related: #48381
Related: #48434
Related: #47950
Related: #40178
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.
Closes#42042
Previous behavior while copying HTTP headers to the ThreadContext,
would allow multiple HTTP headers with the same name, handling only
the first occurrence and disregarding the rest of the values. This
can be confusing when dealing with multiple Headers as it is not
obvious which value is read and which ones are silently dropped.
According to RFC-7230, a client must not send multiple header fields
with the same field name in a HTTP message, unless the entire field
value for this header is defined as a comma separated list or this
specific header is a well-known exception.
This commits changes the behavior in order to be more compliant to
the aforementioned RFC by requiring the classes that implement
ActionPlugin to declare if a header can be multi-valued or not when
registering this header to be copied over to the ThreadContext in
ActionPlugin#getRestHeaders.
If the header is allowed to be multivalued, then all such headers
are read from the HTTP request and their values get concatenated in
a comma-separated string.
If the header is not allowed to be multivalued, and the HTTP
request contains multiple such Headers with different values, the
request is rejected with a 400 status.
I think the problem is that the master is trying to relocate the "upgraded_scroll" shard back to
the node on which it was previously allocated, but to which it can't be allocated now due to the
shard lock being held because of an in-progress scroll. As the master keeps on retrying and
retrying (and indefinitely tries so because max_retries does not apply to relocations, it blocks
any other lower-prioritized task from completing, which leads to the rolling upgrade tests failing
(see #48395).
Closes#48395
7.5+ for SLM requires [stats] object to exist in the cluster state.
When doing an in-place upgrade from 7.4 to 7.5+ [stats] does not exist
in cluster state, result in an exception on startup [1].
This commit moves the [stats] to be an optional object in the parser
and if not found will default to an empty stats object.
[1] Caused by: java.lang.IllegalArgumentException: Required [stats]
rename internal indexes of transform plugin
- rename audit index and create an alias for accessing it, BWC: add an alias for old indexes to
keep them working, kibana UI will switch to use the read alias
- rename config index and provide BWC to read from old and new ones
* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
* Use versions specific distribution folders so we don't need to clean up (#46539)
* Retry deleting distro dir on windows
When retarting the cluster we clean up old distribution files that might
still be in use by the OS.
Windows closes resources of ded processes async, so we do a couple of
retries to get arround it.
Closes#46014
* Avoid having to delete the distro folder.
* Remove the use of ClusterFormationTasks form RestTestTask (#47022)
This PR removes a use-case of the ClusterFormationTasks and converts a
project that flew under the radar so far.
There's probably more clean-up possible here, but for now the goal is
to be able to remove that code after `RunTask` is also updated.
* Migrate some 7.x only projects
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
* Bwc testclusters all (#46265)
Convert all bwc projects to testclusters
* Fix bwc versions config
* WIP fix rolling upgrade
* Fix bwc tests on old versions
* Fix rolling upgrade
Fixes multiple Active Directory related tests that run against the
samba fixture. Some were failing since we changed the realm settings
format in 7.0 and a few were slightly broken in other ways.
We can move to cleanup the tests in a follow up but this work fits
better to be done with or after we move the tests from a Samba
based fixture to a real(-ish) Microsoft Active Directory based
fixture.
Resolves: #33425, #35738
Due to a regression bug the metadata Active Directory realm
setting is ignored (it works correctly for the LDAP realm type).
This commit redresses it.
Closes#45848
Fixes multiple Active Directory related tests that run against the
samba fixture. Some were failing since we changed the realm settings
format in 7.0 and a few were slightly broken in other ways.
We can move to cleanup the tests in a follow up but this work fits
better to be done with or after we move the tests from a Samba
based fixture to a real(-ish) Microsoft Active Directory based
fixture.
Resolves: #33425, #35738
With this change the test setup for ML config upgrade
tests only waits for v6.6+ ML index templates to be
installed if the old cluster is running version 6.6.0
or higher.
Previously it was always waiting, but timing out without
failing the test if the templates were not installed
within 10 seconds, effectively just adding a pointless
10 second sleep to BWC tests against versions earlier
than 6.6.0. This problem was exposed by #47112.
Fixes#47286
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.
There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.
I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.
Other changes:
* Rework `waitForDocs` to scale its timeout. Instead of calling
`assertBusy` in a loop, work out a reasonable overall timeout and await
just once.
* Some tests failed after switching to `assertBusy` and had to be fixed.
* Correct the expect templates in AbstractUpgradeTestCase. The ES
Security team confirmed that they don't use templates any more, so
remove this from the expected templates. Also rewrite how the setup
code checks for templates, in order to give more information.
* Remove an expected ML template from XPackRestTestConstants The ML team
advised that the ML tests shouldn't be waiting for any
`.ml-notifications*` templates, since such checks should happen in the
production code instead.
* Also rework the template checking code in `XPackRestTestHelper` to give
more helpful failure messages.
* Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
This PR adds some restrictions around testfixtures to make sure the same service ( as defiend in docker-compose.yml ) is not shared between multiple projects.
Sharing would break running with --parallel.
Projects can still share fixtures as long as each has it;s own service within.
This is still useful to share some of the setup and configuration code of the fixture.
Project now also have to specify a service name when calling useCluster to refer to a specific service.
If this is not the case all services will be claimed and the fixture can't be shared.
For this reason fixtures have to explicitly specify if they are using themselves ( fixture and tests in the same project ).
When upgrading data nodes to a newer version before
master nodes there was a risk that a transform running
on an upgraded data node would index a document into
the new transforms internal index before its index
template was created. This would cause the index to
be created with entirely dynamic mappings.
This change introduces a check before indexing any
internal transforms document to ensure that the required
index template exists and create it if it doesn't.
Backport of #46553
rename data frame transform plugin to transform:
- rename plugin data-frame to transform
- change all package names from o.e.*.dataframe.* to o.e.*.transform.*
- necessary changes to fix loading/testing
* [ML][Transforms] fixing rolling upgrade continuous transform test (#45823)
* [ML][Transforms] fixing rolling upgrade continuous transform test
* adjusting wait assert logic
* adjusting wait conditions
* [ML][Transforms] allow executor to call start on started task (#46347)
* making sure we only upgrade from 7.4.0 in test
This check was introduced in #41392 but had the unwanted side-effect
that the keystore settings in such blocks would note be added in the
node's keystore. Given that we have a mid-term plan for FIPS testing
that would made such checks unnecessary, and that the conditional
in these two cases is not really that important, this change removes
this conditional logic so that full-cluster-restart and rolling
upgrade tests will run with PEM files for key/certificate material
no matter if we're in a FIPS JVM or not.
Resolves: #45475
As of #43939 Watcher tests now correctly block until all Watch executions
kicked off by that test are finished. Prior we allowed tests to finish with
outstanding watch executions. It was known that this would increase the
time needed to finish a test. However, running the tests on CI can be slow
and on at least 1 occasion it took 60s to actually finish.
This PR simply increases the max allowable timeout for Watcher tests
to clean up after themselves.
This commit allows the Transport Actions for the SSO realms to
indicate the realm that should be used to authenticate the
constructed AuthenticationToken. This is useful in the case that
many authentication realms of the same type have been configured
and where the caller of the API(Kibana or a custom web app) already
know which realm should be used so there is no need to iterate all
the realms of the same type.
The realm parameter is added in the relevant REST APIs as optional
so as not to introduce any breaking change.
When Watcher is stopped and there are still outstanding watches running
Watcher will report it self as stopped. In normal cases, this is not problematic.
However, for integration tests Watcher is started and stopped between
each test to help ensure a clean slate for each test. The tests are blocking
only on the stopped state and make an implicit assumption that all watches are
finished if the Watcher is stopped. This is an incorrect assumption since
Stopped really means, "I will not accept any more watches". This can lead to
un-predictable behavior in the tests such as message : "Watch is already queued
in thread pool" and state: "not_executed_already_queued".
This can also change the .watcher-history if watches linger between tests.
This commit changes the semantics of a manual stopping watcher to now mean:
"I will not accept any more watches AND all running watches are complete".
There is now an intermediary step "Stopping" and callback to allow transition
to a "Stopped" state when all Watches have completed.
Additionally since this impacts how long the tests will block waiting for a
"Stopped" state, the timeout has been increased.
Related: #42409
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.
This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.
Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
The vagrant based tests currently reside in a single project, creating
dozens of tasks to manage starting and stopping the vagrant VM along
with running java and bats tests within each image. This all-in-one
pattern makes parallelizing packaging tests difficult.
This commit rewrites the vagrant testing infrastructure to be
independent of the actual test runners, thus allowing each platform to
be handled in a separate subproject. Additionally, the java and bats
tests are changed to be run through a "destructive" gradle task, which
is run inside the VM. The combination of these will allow
parallelization both locally (through running several VMs at once) as
well as running the destructive tasks in CI machines dedicated to each
platform (thus removing the need for vagrant in CI).
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.
This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:
- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state
Backport of #45276
This commit applies a normalization process to environment paths, both
in how they are stored internally, also their settings values. This
normalization is done via two means:
- we make the paths absolute
- we remove redundant name elements from the path (what Java calls
"normalization")
This change ensures that when we compare and refer to these paths within
the system, we are using a common ground. For example, prior to the
change if the data path was relative, we would not compare it correctly
to paths from disk usage. This is because the paths in disk usage were
being made absolute.
This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.
This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.
Backport of #44350
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.