* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
* Use versions specific distribution folders so we don't need to clean up (#46539)
* Retry deleting distro dir on windows
When retarting the cluster we clean up old distribution files that might
still be in use by the OS.
Windows closes resources of ded processes async, so we do a couple of
retries to get arround it.
Closes#46014
* Avoid having to delete the distro folder.
* Remove the use of ClusterFormationTasks form RestTestTask (#47022)
This PR removes a use-case of the ClusterFormationTasks and converts a
project that flew under the radar so far.
There's probably more clean-up possible here, but for now the goal is
to be able to remove that code after `RunTask` is also updated.
* Migrate some 7.x only projects
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
* Bwc testclusters all (#46265)
Convert all bwc projects to testclusters
* Fix bwc versions config
* WIP fix rolling upgrade
* Fix bwc tests on old versions
* Fix rolling upgrade
Fixes multiple Active Directory related tests that run against the
samba fixture. Some were failing since we changed the realm settings
format in 7.0 and a few were slightly broken in other ways.
We can move to cleanup the tests in a follow up but this work fits
better to be done with or after we move the tests from a Samba
based fixture to a real(-ish) Microsoft Active Directory based
fixture.
Resolves: #33425, #35738
Due to a regression bug the metadata Active Directory realm
setting is ignored (it works correctly for the LDAP realm type).
This commit redresses it.
Closes#45848
Fixes multiple Active Directory related tests that run against the
samba fixture. Some were failing since we changed the realm settings
format in 7.0 and a few were slightly broken in other ways.
We can move to cleanup the tests in a follow up but this work fits
better to be done with or after we move the tests from a Samba
based fixture to a real(-ish) Microsoft Active Directory based
fixture.
Resolves: #33425, #35738
With this change the test setup for ML config upgrade
tests only waits for v6.6+ ML index templates to be
installed if the old cluster is running version 6.6.0
or higher.
Previously it was always waiting, but timing out without
failing the test if the templates were not installed
within 10 seconds, effectively just adding a pointless
10 second sleep to BWC tests against versions earlier
than 6.6.0. This problem was exposed by #47112.
Fixes#47286
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.
There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.
I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.
Other changes:
* Rework `waitForDocs` to scale its timeout. Instead of calling
`assertBusy` in a loop, work out a reasonable overall timeout and await
just once.
* Some tests failed after switching to `assertBusy` and had to be fixed.
* Correct the expect templates in AbstractUpgradeTestCase. The ES
Security team confirmed that they don't use templates any more, so
remove this from the expected templates. Also rewrite how the setup
code checks for templates, in order to give more information.
* Remove an expected ML template from XPackRestTestConstants The ML team
advised that the ML tests shouldn't be waiting for any
`.ml-notifications*` templates, since such checks should happen in the
production code instead.
* Also rework the template checking code in `XPackRestTestHelper` to give
more helpful failure messages.
* Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
This PR adds some restrictions around testfixtures to make sure the same service ( as defiend in docker-compose.yml ) is not shared between multiple projects.
Sharing would break running with --parallel.
Projects can still share fixtures as long as each has it;s own service within.
This is still useful to share some of the setup and configuration code of the fixture.
Project now also have to specify a service name when calling useCluster to refer to a specific service.
If this is not the case all services will be claimed and the fixture can't be shared.
For this reason fixtures have to explicitly specify if they are using themselves ( fixture and tests in the same project ).
When upgrading data nodes to a newer version before
master nodes there was a risk that a transform running
on an upgraded data node would index a document into
the new transforms internal index before its index
template was created. This would cause the index to
be created with entirely dynamic mappings.
This change introduces a check before indexing any
internal transforms document to ensure that the required
index template exists and create it if it doesn't.
Backport of #46553
rename data frame transform plugin to transform:
- rename plugin data-frame to transform
- change all package names from o.e.*.dataframe.* to o.e.*.transform.*
- necessary changes to fix loading/testing
* [ML][Transforms] fixing rolling upgrade continuous transform test (#45823)
* [ML][Transforms] fixing rolling upgrade continuous transform test
* adjusting wait assert logic
* adjusting wait conditions
* [ML][Transforms] allow executor to call start on started task (#46347)
* making sure we only upgrade from 7.4.0 in test
This check was introduced in #41392 but had the unwanted side-effect
that the keystore settings in such blocks would note be added in the
node's keystore. Given that we have a mid-term plan for FIPS testing
that would made such checks unnecessary, and that the conditional
in these two cases is not really that important, this change removes
this conditional logic so that full-cluster-restart and rolling
upgrade tests will run with PEM files for key/certificate material
no matter if we're in a FIPS JVM or not.
Resolves: #45475
As of #43939 Watcher tests now correctly block until all Watch executions
kicked off by that test are finished. Prior we allowed tests to finish with
outstanding watch executions. It was known that this would increase the
time needed to finish a test. However, running the tests on CI can be slow
and on at least 1 occasion it took 60s to actually finish.
This PR simply increases the max allowable timeout for Watcher tests
to clean up after themselves.
This commit allows the Transport Actions for the SSO realms to
indicate the realm that should be used to authenticate the
constructed AuthenticationToken. This is useful in the case that
many authentication realms of the same type have been configured
and where the caller of the API(Kibana or a custom web app) already
know which realm should be used so there is no need to iterate all
the realms of the same type.
The realm parameter is added in the relevant REST APIs as optional
so as not to introduce any breaking change.
When Watcher is stopped and there are still outstanding watches running
Watcher will report it self as stopped. In normal cases, this is not problematic.
However, for integration tests Watcher is started and stopped between
each test to help ensure a clean slate for each test. The tests are blocking
only on the stopped state and make an implicit assumption that all watches are
finished if the Watcher is stopped. This is an incorrect assumption since
Stopped really means, "I will not accept any more watches". This can lead to
un-predictable behavior in the tests such as message : "Watch is already queued
in thread pool" and state: "not_executed_already_queued".
This can also change the .watcher-history if watches linger between tests.
This commit changes the semantics of a manual stopping watcher to now mean:
"I will not accept any more watches AND all running watches are complete".
There is now an intermediary step "Stopping" and callback to allow transition
to a "Stopped" state when all Watches have completed.
Additionally since this impacts how long the tests will block waiting for a
"Stopped" state, the timeout has been increased.
Related: #42409
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.
This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.
Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
The vagrant based tests currently reside in a single project, creating
dozens of tasks to manage starting and stopping the vagrant VM along
with running java and bats tests within each image. This all-in-one
pattern makes parallelizing packaging tests difficult.
This commit rewrites the vagrant testing infrastructure to be
independent of the actual test runners, thus allowing each platform to
be handled in a separate subproject. Additionally, the java and bats
tests are changed to be run through a "destructive" gradle task, which
is run inside the VM. The combination of these will allow
parallelization both locally (through running several VMs at once) as
well as running the destructive tasks in CI machines dedicated to each
platform (thus removing the need for vagrant in CI).
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.
This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:
- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state
Backport of #45276
This commit applies a normalization process to environment paths, both
in how they are stored internally, also their settings values. This
normalization is done via two means:
- we make the paths absolute
- we remove redundant name elements from the path (what Java calls
"normalization")
This change ensures that when we compare and refer to these paths within
the system, we are using a common ground. For example, prior to the
change if the data path was relative, we would not compare it correctly
to paths from disk usage. This is because the paths in disk usage were
being made absolute.
This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.
This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.
Backport of #44350
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.
* [ML][Data Frame] Adding bwc tests for pivot transform (#43506)
* [ML][Data Frame] Adding bwc tests for pivot transform
* adding continuous transforms
* adding continuous dataframes to bwc
* adding continuous data frame tests
* Adding rolling upgrade tests for continuous df
* Fixing test
* Adjusting indices used in BWC, and handling NPE for seq_no_stats
* updating and muting specific bwc test
* Adjusting bwc tests for backport
This adds the ability to execute an action for each element that occurs
in an array, for example you could sent a dedicated slack action for
each search hit returned from a search.
There is also a limit for the number of actions executed, which is
hardcoded to 100 right now, to prevent having watches run forever.
The watch history logs each action result and the total number of actions
the were executed.
Relates #34546
* fix org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests (#41777)
This commit un-mutes org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests
which was failing intermittently due to a logic bug. It is not possible to use the real
Watcher scheduler (which is needed for this test) and reliabliby count the .triggered-watches
since current count of documents in the .triggered-watches index is based on the timing of the
scheduler and the ability to delete based on the Watcher and Write thread pools.
This commit simply removes the .triggered-watch check and relies soley on the .watcher-history
index as an indication that operations that can occur when the Watcher threadpool is rejecting.
closes#41734
* fix unlikely bug that can prevent Watcher from restarting (#42030)
The bug fixed here is unlikely to happen. It requires ES to be started with
ILM disabled, Watcher enabled, and Watcher explicitly stopped and restarted.
Due to template validation Watcher does not fully start and can result in a
partially started state. This is an unlikely scenerio outside of the testing
framework.
Note - this bug was introduced while the test that would have caught it was
muted. The test remains muted since the underlying cuase of the random failures
has not been identified. When this test is un-muted it will now work.
As defined in https://tools.ietf.org/html/rfc6749#section-2.3.1
both client id and client secret need to be encoded with the
application/x-www-form-urlencoded encoding algorithm when used as
credentials for HTTP Basic Authentication in requests to the OP.
Resolves#43709
Now that the fix krb5-kdc fixture (entropy problem in docker container)
is in and the converting `kerberos-tests` to testclusters is done,
enabling the kerberos-tests
Closes#40678
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
Kibana wants to create access_token/refresh_token pair using Token
management APIs in exchange for kerberos tickets. `client_credentials`
grant_type requires every user to have `cluster:admin/xpack/security/token/create`
cluster privilege.
This commit introduces `_kerberos` grant_type for generating `access_token`
and `refresh_token` in exchange for a valid base64 encoded kerberos ticket.
In addition, `kibana_user` role now has cluster privilege to create tokens.
This allows Kibana to create access_token/refresh_token pair in exchange for
kerberos tickets.
Note:
The lifetime from the kerberos ticket is not used in ES and so even after it expires
the access_token/refresh_token pair will be valid. Care must be taken to invalidate
such tokens using token management APIs if required.
Closes#41943
This commit removes some trace logging for the token service in the
rolling upgrade tests. If there is an active investigation here, it
would be best to annotate this line with a comment in the source
indicating such. From my digging, it does not appear there is an active
investigation that relies on this logging, so we remove it.
Infra has fixed#10462 by installing `haveged` on CI workers.
This commit enables the disabled fixture and tests, and mounts
`/dev/urandom` for the container so there is enough
entropy required for kdc.
Note: hdfs-repository tests have been disabled, will raise a separate issue for it.
Closes#40624Closes#40678
In hamcrest 2.1 warnings for unchecked varargs were fixed by hamcrest using @SafeVarargs for those matchers where this warning occurred.
This PR is aimed to remove these annotations when Matchers.contains ,Matchers.containsInAnyOrder or Matchers.hasItems was used
backport #41528
This commit changes the way token ids are hashed so that the output is
url safe without requiring encoding. This follows the pattern that we
use for document ids that are autogenerated, see UUIDs and the
associated classes for additional details.
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
This commit updates the default ciphers and TLS protocols that are used
when the runtime JDK supports them. New cipher support has been
introduced in JDK 11 and 12 along with performance fixes for AES GCM.
The ciphers are ordered with PFS ciphers being most preferred, then
AEAD ciphers, and finally those with mainstream hardware support. When
available stronger encryption is preferred for a given cipher.
This is a backport of #41385 and #41808. There are known JDK bugs with
TLSv1.3 that have been fixed in various versions. These are:
1. The JDK's bundled HttpsServer will endless loop under JDK11 and JDK
12.0 (Fixed in 12.0.1) based on the way the Apache HttpClient performs
a close (half close).
2. In all versions of JDK 11 and 12, the HttpsServer will endless loop
when certificates are not trusted or another handshake error occurs. An
email has been sent to the openjdk security-dev list and #38646 is open
to track this.
3. In JDK 11.0.2 and prior there is a race condition with session
resumption that leads to handshake errors when multiple concurrent
handshakes are going on between the same client and server. This bug
does not appear when client authentication is in use. This is
JDK-8213202, which was fixed in 11.0.3 and 12.0.
4. In JDK 11.0.2 and prior there is a bug where resumed TLS sessions do
not retain peer certificate information. This is JDK-8212885.
The way these issues are addressed is that the current java version is
checked and used to determine the supported protocols for tests that
provoke these issues.
testclusters detect from settings that security is enabled
if a user is not specified using the DSL introduced in this PR, a default one is created
the appropriate wait conditions are used authenticating with the first user defined in the DSL ( or the default user ).
an example DSL to create a user is user username:"test_user" password:"x-pack-test-password" role: "superuser" all keys are optional and default to the values shown in this example
This commit introduces the `.security-tokens` and `.security-tokens-7`
alias-index pair. Because index snapshotting is at the index level granularity
(ie you cannot snapshot a subset of an index) snapshoting .`security` had
the undesirable effect of storing ephemeral security tokens. The changes
herein address this issue by moving tokens "seamlessly" (without user
intervention) to another index, so that a "Security Backup" (ie snapshot of
`.security`) would not be bloated by ephemeral data.
This is related to #36652. We intend to deprecate a number of transport
settings in 7.x and remove them in 8.0. This commit removes the string
usages of these settings.
This commit removes xpack dependencies of many xpack qa modules.
(for some qa modules this will require some more work)
The reason behind this change is that qa rest modules should not depend
on the x-pack plugins, because the plugins are an implementation detail and
the tests should only know about the rest interface and qa cluster that is
being tested.
Also some qa modules rely on xpack plugins and hlrc (which is a valid
dependency for rest qa tests) creates a cyclic dependency and this is
something that we should avoid. Also Eclipse can't handle gradle cyclic
dependencies (see #41064).
* don't copy xpack-core's plugin property into the test resource of qa
modules. Otherwise installing security manager fails, because it tries
to find the XPackPlugin class.
This commit adds an OpenID Connect authentication realm to
elasticsearch. Elasticsearch (with the assistance of kibana or
another web component) acts as an OpenID Connect Relying
Party and supports the Authorization Code Grant and Implicit
flows as described in http://ela.st/oidc-spec. It adds support
for consuming and verifying signed ID Tokens, both RP
initiated and 3rd party initiated Single Sign on and RP
initiated signle logout.
It also adds an OpenID Connect Provider in the idp-fixture to
be used for the associated integration tests.
This is a backport of #40674
Backport of (#41087)
* Use environment settings instead of state settings for Watcher config
Prior to this we used the settings from cluster state to see whether ILM was
enabled of disabled, however, these settings don't accurately reflect the
`xpack.ilm.enabled` setting in `elasticsearch.yml`.
This commit changes to using the `Environment` settings, which correctly reflect
the ILM enabled setting.
Resolves#41042
* moved hlrc parsing tests from xpack to hlrc module and removed dependency on hlrc from xpack core
* deprecated old base test class
* added deprecated jdoc tag
* split test between xpack-core part and hlrc part
* added lang-mustache test dependency, this previously came in via
hlrc dependency.
* added hlrc dependency on a qa module
* duplicated ClusterPrivilegeName class in xpack-core, since x-pack
core no longer has a dependency on hlrc.
* replace ClusterPrivilegeName usages with string literals
* moved tests to dedicated to hlrc packages in order to remove Hlrc part from the name and make sure to use imports instead of full qualified class where possible
* remove ESTestCase. from method invocation and use method directly,
because these tests indirectly extend from ESTestCase
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
* Avoid sharing source directories as it breaks intellij
* Subprojects share main project output classes directory
* Fix jar hell
* Fix sql security with ssl integ tests
* Relax dependency ordering rule so we don't explode on cycles
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).
Relates to #40366
By default, in integ tests we wait for the standalone cluster to start
by using the ant Get task to retrieve the cluster health endpoint.
However the ant task has no facilities for customising the trusted
CAs for a https resource, so if the integ test cluster has TLS enabled
on the http interface (using a custom CA) we need a separate utility
for that purpose.
Backport of: #40573
* [ML] Addressing bug streaming DatafeedConfig aggs from (<= 6.5.4) -> 6.7.0 (#40610)
* Addressing stream failure and adding tests to catch such in the future
* Add aggs to full cluster restart tests
* Test BWC for datafeeds with and without aggs
The wire serialisation is different for null/non-null
aggs, so it's worth testing both cases.
* Fixing bwc test, removing types
* Fixing BWC test for datafeed
* Update 40_ml_datafeed_crud.yml
* Update build.gradle
This change removes the variants of the rolling upgrade and full
cluster restart tests that use or do not use a system key. These tests
were added during 5.x when the system key was still used for security
and now the system key is only used as the watcher encryption key so
duplicating rolling upgrade and full cluster restarts is not needed.
The change here removes the subprojects for testing these scenarios and
defaults to always run with the watcher sensitive values encrypted for
these tests.
Replaces the vagrant based kerberos fixtures with docker based test fixtures plugin.
The configuration is now entirely static on the docker side and no longer driven by Gradle,
also two different services are being configured since there are two different consumers of the fixture that can run in parallel and require different configurations.
This change removes the use of hardcoded port values for the
idp-fixture in favor of the mapped ephemeral ports. This should prevent
failures due to port conflicts in CI.
The Migration Assistance API has been functionally replaced by the
Deprecation Info API, and the Migration Upgrade API is not used for the
transition from ES 6.x to 7.x, and does not need to be kept around to
repair indices that were not properly upgraded before upgrading the
cluster, as was the case in 6.
This commit enables full-cluster-restart and rolling-upgrade tests
to run with nodes using a JVM in fips approved only node by using
PEM key material instead of a JKS for the transport layer in that
case.