A disruption test case need to use a lower checkpoint sync interval
since they verify sequence numbers after the test waiting max 10 seconds
for it to stabilize.
Closes#42637
In the reindex from old tests we require Java 8. Today when configuring
the reindex from old tests, we eagerly evalulate Java 8 home, which
means that we require JAVA8_HOME to be set even if the reindex from old
test tasks are not in the task graph. This is an onerous requirement if,
for example, all that you want to do is build a distribution. This
commit addresses this by making evaluation of Java 8 home lazy, so that
it is only done and required if the reindex from old test tasks would be
executed.
* Fix Incorrect Time Math in MockTransport
* The timeunit here must be nanos for the current time (we even convert it accordingly in the logging)
* Also, changed the log message when dumping stack traces a little to make it easier to grep for (otherwise it's the same as the message on unregister)
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
When multiple commands are called in sequence, fetch shards
from mutable, up-to-date routing nodes to ensure each command's
changes are visible to subsequent commands.
This addresses an issue uncovered during work on #41050.
The problem this commit addresses is that state recovery is not reset on a node that then becomes
master with a cluster state that has a state not recovered flag in it. The situation that was observed
in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we
have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains),
when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state
(with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and
as it was previously leader in term1 and successfully completed state recovery, does never retry
state recovery in term 3.
Closes#39172
* Make unwrapCorrupt Check Suppressed Ex. (#41889)
* As discussed in #24800 we want to check for suppressed corruption
indicating exceptions here as well to more reliably categorize
corruption related exceptions
* Closes#24800, 41201
* Cleanup Bulk Delete Exception Logging
* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
* Add Infrastructure to Run 3rd Party Repository Tests
* Add infrastructure to run third party repository tests using our standard JUnit infrastructure
* This is a prerequisite of #42189
testRetentionLeaseIsAddedIfItDisappearsWhileFollowing does not reset the
mock transport service after test. Surviving transport interceptors from
that test can sneaky remove retention leases and make other tests fail.
Closes#39331Closes#39509Closes#41428Closes#41679Closes#41737Closes#41756
ES does not accept doc type starting with underscore until 6.2.0. We
have to use "doc" instead of "_doc" in FullClusterRestartIT if we are
upgrading from a 6.2.0- cluster.
Closes#42581
If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.
Closes#39757
* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
* Fixed by adding a wait for no in-progress publications
* Also added debug logging that would've identified this problem
* Closes#36813
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
* The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
* relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
* Safer Wait for Snapshot Success in ClusterPrivilegeTests
* The snapshot state returned by the API might become SUCCESS before it's fully removed from the cluster state.
* We should fix this race in the transport API but it's not trivial and will be part of the incoming big round of refactoring the repository interaction, this added check fixes the test for now
* closes#38030
* Remove Obsolete BwC Logic from BlobStoreRepository
* We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here
* Some other minor+obvious cleanups
* Dump Stacktrace on Slow IO-Thread Operations
* Follow up to #39729 extending the functionality to actually dump the
stack when the thread is blocked not afterwards
* Logging the stacktrace after the thread became unblocked is only of
limited use because we don't know what happened in the slow callback
from that (only whether we were blocked on a read,write,connect etc.)
* Relates #41745
* Fix testTracerLog Network Tests
* Start appender before using it like we do for e.g. the Netty leak detection appender to avoid interference from actions on the network threads that might still be dangling from previous tests in the same suite
* Closes#41890