The message `... took [31s] above the warn threshold of 30s` suggests
incorrectly that the task took 61 seconds. This commit adds the clarifying
words `which is`.
This commit introduces a background sync for retention leases. The idea
here is that we do a heavyweight sync when adding a new retention lease,
and then periodically we want to background sync any retention lease
renewals to the replicas. As long as the background sync interval is
significantly lower than the extended lifetime of a retention lease, it
is okay if from time to time a replica misses a sync (it will still have
an older version of the lease that is retaining more data as we assume
that renewals do not decrease the retaining sequence number). There are
two follow-ups that will come after this commit. The first is to address
the fact that we have not adapted the should periodically flush logic to
possibly flush the retention leases. We want to do something like flush
if we have not flushed in the last five minutes and there are renewed
retention leases since the last time that we flushed. An additional
follow-up will remove the syncing of retention leases when a retention
lease expires. Today this sync could be invoked in the background by a
merge operation. Rather, we will move the syncing of retention lease
expiration to be done under the background sync. The background sync
will use the heavyweight sync (write action) if a lease has expired, and
will use the lightweight background sync (replication action) otherwise.
Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still
being a very long time for a node to be completely unresponsive.
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.
When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1970 to 2262.
Note that aggregations will still be in milliseconds.
However docvalue fields will have full nanosecond resolution
Relates #27330
Today the following settings in the `discovery.zen` namespace are still used:
- `discovery.zen.no_master_block`
- `discovery.zen.hosts_provider`
- `discovery.zen.ping.unicast.concurrent_connects`
- `discovery.zen.ping.unicast.hosts.resolve_timeout`
- `discovery.zen.ping.unicast.hosts`
This commit deprecates all other settings in this namespace so that they can be
removed in the next major version.
* This was a merge mistake on my end I think, obviously we only need to loop over the shards once not twice here to find those that we missed in INIT state
* Fix Incorrect Transport Response Handler Type
* The response type here is not empty and was always wrong but this only became visible now that 0a604e3b24 was introduced
* As a result of 0a604e3b24 we started actually handling the response
of this request and logging/handling exceptions before that we simply dropped the classcast exception here quietly using the empty response handler
* fix busy assert not handling `Exception`
* Closes#38226
* Closes#38256
Today we have DiscoveryDisruptionIT tests for checking that discovery can still
work once the cluster has formed, even if the cluster is misconfigured and only
has a single master-eligible node in its unicast hosts list. In fact with Zen2
we can go one better: we do not need any nodes in the unicast hosts list,
because nodes also use the contents of the last-committed cluster state for
discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to
the overenthusiastic fault-detection timeouts.
This commit replaces these tests with deterministic `CoordinatorTests` that
verify the same behaviour. It also removes some duplication by extracting a
test method called `testFollowerCheckerAfterMasterReelection()`
Closes#37687
We should increase primary term before renewing leases; otherwise, the
term of the latest RetentionLeases will be lower than the current term.
Relates #37951
If the innerLength is 0, the version won't be increased; then there will
be two RetentionLeases with the same term and version, but their leases
are different.
Relates #37951Closes#38245
Because concurrent sync requests from a primary to its replicas could be
in flight, it can be the case that an older retention leases collection
arrives and is processed on the replica after a newer retention leases
collection has arrived and been processed. Without a defense, in this
case the replica would overwrite the newer retention leases with the
older retention leases. This commit addresses this issue by introducing
a versioning scheme to retention leases. This versioning scheme is used
to resolve out-of-order processing on the replica. We persist this
version into Lucene and restore it on recovery. The encoding of
retention leases is starting to get a little ugly. We can consider
addressing this in a follow-up.
* Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot
* Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)
In Zen 1 there are commit timeout and publish timeout and these
settings could be changed on-the-fly.
In Zen 2, there is only commit timeout and this setting is static.
RareClusterStateIT is actively using these settings and the fact, they
are dynamic.
This commit adds cancelCommitedPublication method to Coordinator to
be used by tests. This method will cancel current committed publication
if there is any.
When there is BlockClusterStateProcessing on the non-master node, the
publication will be accepted and committed, but not yet applied. So we
can use the method above to cancel it.
Also, this commit replaces callback + AtomicReference with ActionFuture,
which makes test code easier to read.
Using index.routing.allocation.require._host does not correctly work because the boolean logic in
filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when
opType ==OpType.AND
When restoring shards of existing indices, the RestoreService also
restores the values of primary terms stored in the snapshot index
metadata. The primary terms are not updated and could potentially
conflict with current index primary terms if the restored primary terms
are lower than the existing ones.
This situation is likely to happen with replicated closed indices
(because primary terms are increased when the index is transitioning
from open to closed state, and the snapshotted primary terms are the
one at the time the index was opened) (see #38024) and maybe also
with CCR.
This commit changes the RestoreService so that it updates the primary
terms using the maximum value between the snapshotted values and
the existing values.
Related to #33888
If custom date formats are used, there may be combinations that the new
performat DateFormatters.from() method has not covered yet. This adds a
few such corner cases and ensures the tests are correctly commented
out.
In #38158 we ensured that global ordinals are not loaded when another execution hint is explicitly set on the source. This change is a follow up that addresses a comment
dd6043c1c0 (r252984782) added after the merge.
This commit adds the second part of `elasticsearch-node` tool -
`detach-cluster` command in addition to `unsafe-bootstrap` command.
Also, this commit changes the semantics of `unsafe-bootstrap`, now
`unsafe-bootstrap` changes clusterUUID.
So the algorithm of running `elasticsearch-node` tool is the following:
1) Stop all nodes in the cluster.
2) Pick master-eligible node with the highest (term, version) pair and
run the `unsafe-bootstrap` command on it. If there are no survived
master-eligible nodes - skip this step.
3) Run `detach-cluster` command on the remaining survived nodes.
Detach cluster makes the following changes to the node metadata:
1) Sets clusterUUID committed to false.
2) Sets currentTerm and term to 0.
3) Removes voting tombstones and sets voting configurations to special
constant MUST_JOIN_ELECTED_MASTER, that prevents initial cluster
bootstrap.
`ElasticsearchNodeCommand` base abstract class is introduced, because
`UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in
common.
Also, this commit adds "ordinal" parameter to both commands, because it's
impossible to write IT otherwise.
For MUST_JOIN_ELECTED_MASTER case special handling is introduced in
`ClusterFormationFailureHelper`.
Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed
from `UnsafeBootstrapMasterIT`).
The current CloseWhileRelocatingShardsIT test adds some "send behavior"
rule to a target node's mocked transport service in order to detect when shard
relocating are started. These rules are never cleared and prevent the test to
complete normally after the rebalance is re-enabled again.
This commit changes the test so that rules are cleared and most verifications
are done before the rebalance is reenabled again.
Closes#38090
Folks at the Lucene project do not seem to be interested in classifying corruptions and
distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525),
so we'll just cop out as well.
Closes#34322
With #37000 we made sure that fnial reduction is automatically disabled
whenever a localClusterAlias is provided with a SearchRequest.
While working on #37838, we found a scenario where we do need to set a
localClusterAlias yet we would like to perform a final reduction in the
remote cluster: when searching on a single remote cluster.
Relates to #32125
This commit adds support for a separate finalReduce flag to
SearchRequest and makes use of it in TransportSearchAction in case we
are searching against a single remote cluster.
This also makes sure that num_reduce_phases is correct when searching
against a single remote cluster: it makes little sense to return
`num_reduce_phases` set to `2`, which looks especially weird in case
the search was performed against a single remote shard. We should
perform one reduction phase only in this case and `num_reduce_phases`
should reflect that.
* line length
This change forbids negative field boost in the `query_string`, `simple_query_string`
and `multi_match` queries.
Negative boosts are not allowed in Lucene 8 (scores must be positive).
The backport of this change to 6x will turn the error into a deprecation warning
in order to raise the awareness of this breaking change in 7.0.
Closes#33309
Currently, there are a few tests that use autoMinMasterNodes=false and
hence override addExtraClusterBootstrapSettings, mostly this is 10-30
lines of codes that are copy-pasted from class to class.
This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex`
which is suitable for all classes and copy-paste could be removed.
Removing code is always a good thing!
The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality.
Closes#37705
Today we use `AbstractDisruptionTestCase` to test the behaviour of things like
master elections in the presence of cluster disruptions. These tests have
rather enthusiastic fault detection settings, detecting a fault if a single
ping fails, with a one-second timeout. Furthermore there are some tests that
assert the identity of the master remains unchanged during some disruption, and
these assertions fail rather often thanks to the overly sensitive fault
detector.
However in a number of these tests the fault detector need not be this
sensitive. This commit moves some such tests into their own test suite and uses
more sensible fault-detection settings to avoid the kind of master instability
that is causing CI failures.
Closes#37699
The self written epoch date formatters were not properly able to format
an Instant to a string due to a misconfiguration.
This fix also removes a until now existing runtime behaviour under java
8 regarding the names of the aggregation buckets, which are now the same
as before and have been under java 11.
The '{' as a first character in log line is causing problems for beats when parsing plaintext logs. This can happen if the submitted document has an additional '\n' at the beginning and we are not reformatting.
Trimming the source part of a SlogLog solves that and keeps the logs readable.
closes#38080