This commit simplifies the throttling logic in InitialSearchPhase and removes some asserts from it. Also, a few formatting changes are applied to its code and surrounding classes.
A publication can succeed and complete before all nodes have applied the
published state and acknowledged it, thanks to the publication timeout; however
we need every node eventually either to apply the published state (or a later
state) or be removed from the cluster. This change introduces the LagDetector
which achieves this liveness property by removing any lagging nodes from the
cluster.
The `wait_for_metadata_version` parameter will instruct the cluster state
api to only return a cluster state until the metadata's version is equal or
greater than the version specified in `wait_for_metadata_version`. If
the specified `wait_for_timeout` has expired then a timed out response
is returned. (a response with no cluster state and wait for timed out flag set to true)
In the case metadata's version is equal or higher than `wait_for_metadata_version`
then the api will immediately return.
This feature is useful to avoid external components from constantly
polling the cluster state to whether somethings have changed in the
cluster state's metadata.
Code that operates on-top of the engine requires all readers returned to be
unwrapped into ElasticsearchDirectoryReader. The special reader
the FrozenEngine uses wasn't wrapped.
Today voting tombstones are stored in CoordinationMetaData as
Set<DiscoveryNode>.
DiscoveryNode is not a lightweight object and have a lot of fields.
It also has toXContent method, but no fromXContent method and the
output of toXContent is not enough to re-create DiscoveryNode
object.
And votingTombstone set should be persisted as a part of MetaData.
On the other hand, the only thing required from the tombstone is the
nodeId.
This PR adds VotingTombstone class for voting tombstones, which
consists of two fields for now - nodeId and nodeName. It could be
extended/shrank in the future if needed.
This PR also resolves TODO's related to the voting tombstones xcontent
story.
Example of CoordinationMetaData.toXContent with voting tombstones:
{
"term": 1,
"last_committed_config": [
"fkwLdOBvXSlgRTBfgNAL",
"tmQiPGHvUxXzPkkCDSJo",
"HhOmtQBZAThpHIGWhxpz",
"qZHWGpoDNPYRNIiqKsDl"
],
"last_accepted_config": [
"lhqacKmriwhHGFZcvqbx",
"MYysmBuROkvJRlDcusyd"
],
"voting_tombstones": [
{
"node_id": "McjbZbRkEz",
"node_name": "pdKIWeNJUO"
},
{
"node_id": "cpXkVibGwo",
"node_name": "UnCvFgdVsc"
},
{
"node_id": "EylRNOztbc",
"node_name": "ohOhkbMWZX"
}
]
}
Today when rolling a transog generation we copy the checkpoint from
`translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by
appropriate `fsync()` calls. The copy operation is not atomic, so if we crash
at the wrong moment we can leave an incomplete checkpoint file on disk. In
practice the checkpoint is so small that it's either empty or fully written.
However, we do not correctly handle the case where it's empty when the node
restarts.
In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically.
This commit extracts the atomic copy operation from `recoverFromFiles()` and
re-uses it in `rollGeneration()`.
This pull request exposes two new methods in the IndexShard and
TransportReplicationAction classes in order to allow transport replication
actions to acquire all index shard operation permits for their execution.
It first adds the acquireAllPrimaryOperationPermits() and the
acquireAllReplicaOperationsPermits() methods to the IndexShard class
which allow to acquire all operations permits on a shard while exposing
a Releasable. It also refactors the TransportReplicationAction class to
expose two protected methods (acquirePrimaryOperationPermit() and
acquireReplicaOperationPermit()) that can be overridden when a transport
replication action requires the acquisition of all permits on primary and/or
replica shard during execution.
Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which
illustrates how a transport replication action can grab all permits before
adding a cluster block in the cluster state, making subsequent operations
that requires a single permit to fail).
Related to elastic #33888
We didn't check that the ExplainLifecycleRequest was constructed with at least
one index before, now that we do we must also make sure the tests
mutateInstance() method used in equals/hashCode checks doesn't accidentally
create an empty index array.
Closes#35822
This endpoint was not previously documented as it was not
particularly useful to end users. However, since the HLRC
will support the endpoint we need some documentation to
link to.
The purpose of the endpoint is to provide defaults and
limits used by ML. These are needed to fully understand
configurations that have missing values because the missing
value means the default should be used.
Relates #35777
* Forbid negative scores in functon_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
After #35332 has been merged, we noticed some test failures like #35597
in which one or more replica shards failed to be promoted as primaries
because the primary replica re-synchronization never succeed.
After some digging it appeared that the execution of the resync action was
blocked because of the presence of a global cluster block in the cluster state
(in this case, the "no master" block), making the resync action to fail when
executed on the primary.
Until #35332 such failures never happened because the
TransportResyncReplicationAction is skipping the reroute phase, the only
place where blocks were checked. Now with #35332 blocks are checked
during reroute and also during the execution of the transport replication
action on the primary. After some internal discussion, we decided that the TransportResyncReplicationAction should never be blocked. This action is
part of the replica to primary promotion and makes sure that replicas are in
sync and should not be blocked when the cluster state has no master or
when the index is read only.
This commit changes the TransportResyncReplicationAction to make obvious
that it does not honor blocks. It also adds a simple test that fails if the resync
action is blocked during the primary action execution.
Closes#35597
This commit moves the documentation and examples for the `index_prefixes`
option on text fields to its own file, to bring it in line with other mapping
parameters, and expands a bit on both.
`testIncompatibleDiffResendsFullState` sometimes makes a 2-node cluster and
then partitions one of the nodes from the leader, which makes the leader stand
down. Then when the partition is removed the cluster re-forms but does so by
sending full cluster states, not diffs, causing the test to fail.
Additionally `testDiffBasedPublishing` sometimes fails if a publication is
delivered out-of-order, wiping out a fresher last-received cluster state with a
less-fresh one. This is fixed here by passing the received cluster state to the
coordinator before recording it as the last-received one, relying on the
coordinator's freshness checks.
When there is no persistent tasks metadata we could hit a null pointer
exception when executing a follower stats request. This is because we
inspect the persistent tasks metadata. Yet, if no tasks have been
registered, this is null (as opposed to empty). We need to avoid
de-referencing the persistent tasks metadata in this case. That is what
this commit does, and we add a test for this situation.
By setting the cron to 2017, we ensure it won't trigger. This makes it
easier to test because we know the job will _only_ be in STARTED,
and we can ignore INDEXING states due to transient triggers.
Closes#35779