We have a few places where we register license state listeners on
transient components (i.e., resources that can be open and closed during
the lifecycle of the server). In one case (the opt-out query cache) we
were never removing the registered listener, effectively a terrible
memory leak. In another case, we were not un-registered the listener
that we registered, since we were not referencing the same instance of
Runnable. This commit does two things:
- introduces a marker interface LicenseStateListener so that it is
easier to identify these listeners in the codebase and avoid classes
that need to register a license state listener from having to
implement Runnable which carries a different semantic meaning than
we want here
- fixes the two places where we are currently leaking license state
listeners
* Merging the zen2 branch has made it impossible to reproduce the test failure here, likely as a result of the state now being written atomically
* Closes#36189
Some IDEs (specifically Eclipse 4.8.0) needs some explicit casting for type
inference. This was added in a recent change but we should also have comments so
we remember why this needs to be there (at least for now).
Currently, we only log that WriteStateException has occurred, in
GatewayMetaState.
This PR goal is to re-throw WriteStateException to upper layers.
If dirty flag is set to false, we wrap WriteStateException in
UncheckedIOException, we prefer unchecked exception not to add
explicit throws in the multitude of layers.
If dirty flag is set to true - the world is broken. And we need to
halt the JVM. Instead of explicit halting in GatewayMetaState, we
prefer to throw IOError, which will be subsequently handled by
ElasticsearchUncaughtExceptionHandler and JVM will be halted.
This PR also adds tests for WriteStateException.
The Eclipse IDE java compiler seems to need some special hints about what types
some functions used in the tests return. Correcting this for some test that were
newly merged to master.
- GetSslCertificatesRequest need not implement toXContentObject
- getRequest() returns a new Request object
- Add tests for GetSslCertificatesResponse
- Adjust docs to the new format
Today the InternalTestClusterTests sometimes set up a cluster with a single
master, start some other ndoes, shut the original master down, and then reset
the cluster. This doesn't really work, because the original master may be
stale. This change avoids shutting down the only master in this situation.
This change removes the custom serialization of the total hits
and reuses the shard's serialization of Lucene#read/writeTopDocs
in the client code. This also removes the incorrect assertion that
trips randomly in bwc tests.
Closes#36284
Set rest_total_hits_as_int in search requests only on the new cluster,
the old cluster can be on a version older than don't support this
parameter (< 6.6) and total hits is not an object in 6x anyway.
Closes#36291
Currently we use Math.random() in a few places in the tests which makes these
tests not reproducable with the random seed mechanism that comes with
ESTestCase. The change removes those instances.
This commit hides ClusterStates that have a STATE_NOT_RECOVERED_BLOCK from
ClusterStateAppliers. This is needed, because some appliers, such as IngestService, rely on
the fact, that cluster states with STATE_NOT_RECOVERED_BLOCK won't contain anything useful.
Once the state is recovered it's fully available for the appliers. This commit also switches many of
the remaining tests that require state persistence/recovery from Zen1 to Zen2.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
Closes#19528
* Port a test Colin wrote for the TDigest library to validate TDigests storing over 2*MAXINT values. This appears to have been fixed in version 3.2 of TDigest, which Elasticsearch has been using for some time, so no changes were necessary to resolve this issue.
This is a follow-up to #36086. It renames the internal repository
actions to be prefixed by "internal". This allows the system user to
execute the actions.
Additionally, this PR stops casting Client to NodeClient. The client we
have is a NodeClient so executing the actions will be local.
The shard deletion logic (triggered by IndicesStore), which also leads to index metadata deletion on
non-master-eligible data nodes, currently races against the new cluster state persistence logic
triggered by accepting cluster states. One thread is writing the index metadata while another one is
deleting the index metadata, leading to exceptions and assertions tripping (see below). The solution
proposed by this PR is to move the cluster state persistence of non-master-eligible nodes back to
the cluster applier service, just as it used to be for Zen1. This ensures that the index metadata
deletion logic, which is triggered by the shard deletion logic, runs on the same thread on which we
persist the cluster state.
and replaced poll interval setting with a hardcoded poll interval.
The hard coded interval will be removed in a follow up change to make
use of cluster state API's wait_for_metatdata_version.
Before the auto following was bootstrapped from thread pool scheduler,
but now auto followers for new remote clusters are bootstrapped when
a new cluster state is published.
Originates from #35895
Relates to #33007
Closes#35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.