This fixes our cluster formation task to run REST tests against a mixed version cluster.
Yet, due to some limitations in our test framework `indices.rollover` tests are currently
disabled for the BWC case since they select the current master as the merge node which
happens to be a BWC node and we can't relocate all shards to it since the primaries are on
a higher version node. This will be fixed in a followup.
Closes#21142
Note: This has been cherry-picked from 5.0 and fixes several rest tests
as well as a BWC break in `OsStats.java`
The network disruption type "network delay" continues delaying existing requests even after the disruption has been cleared. This commit ensures that the requests get to execute right after the delay rule is cleared.
This test failed when the node that was shutting down was not yet removed from the cluster state on the master.
The cluster allocation explain API will not see any unassigned shards until the node shutting down is removed from the
cluster state.
Previously, if a node left the cluster (for example, due to a long GC),
during a snapshot, the master node would mark the snapshot as failed, but
the node itself could continue snapshotting the data on its shards to the
repository. If the node rejoins the cluster, the master may assign it to
hold the replica shard (where it held the primary before getting kicked off
the cluster). The initialization of the replica shard would repeatedly fail
with a ShardLockObtainFailedException until the snapshot thread finally
finishes and relinquishes the lock on the Store.
This commit resolves the situation by ensuring that when a shard is removed
from a node (such as when a node rejoins the cluster and realizes it no longer
holds the active shard copy), any snapshotting of the removed shards is aborted.
In the scenario above, when the node rejoins the cluster, it will see in the cluster
state that the node no longer holds the primary shard, so IndicesClusterStateService
will remove the shard, thereby causing any snapshots of that shard to be aborted.
Closes#20876
The cluster state on a node is updated either
- by incoming cluster states that are received from the active master or
- by the node itself when it notices that the master has gone.
In the second case, the node adds the NO_MASTER_BLOCK and removes the current master as active master from its cluster state. In one particular case, it would also update the list of nodes, removing the master node that just failed. In the future, we want a clear separation between actions that can be executed by a master publishing a cluster state and a node locally updating its cluster state when no active master is around.
This commit fixes responses to HEAD requests so that the value of the
Content-Length is correct per the HTTP spec. Namely, the value of this
header should be equal to the Content-Length if the request were not a
HEAD request.
This commit also fixes a memory leak on HEAD requests to the main action
that arose from the bytes on a builder not being released due to them
being dropped on the floor to ensure that the response to the main
action did not have a body.
Relates #21123
This commit adds preformatted tags to the Javadoc for
OsProbe#readSysFsCgroupCpuAcctCpuStat to render the form of the cpu.stat
file in a fixed-width font.
When acquiring cgroup stats, we check if such stats are available by
invoking a method areCgroupStatsAvailable. This method checks
availability by looking for existence of some virtual files in
/proc/self/cgroup and /sys/fs/cgroups. If these stats are not available,
the getCgroup method returns null. The OsProbeTests#testCgroupProbe did
not account for this. On some systems where tests run, the cgroup stats
might not be available yet this test method was expecting them to be (we
mock the relevant virtual file reads). This commit handles the execution
of this test on such systems by overriding the behavior of
OsProbe#areCgroupStatsAvailable. We test both the possibility of this
method returning true as well as false.
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.
Relates #21094
When system starts, it creates a temporary file named .es_temp_file to ensure the data directories are writable.
If system crashes after creating the .es_temp_file but before deleting this file, next time the system will not be able to start because the Files.createFile(resolve) will throw an exception if the file already exists.
Currently test that check that equals() and hashCode() are working as expected
for classes implementing them are quiet similar. This change moves common
assertions in this method to a common utility class. In addition, another common
utility function in most of these test classes that creates copies of input
object by running them through a StreamOutput and reading them back in, is moved
to ESTestCase so it can be shared across all these classes.
Closes#20629
Today the request interceptor can't support async calls since the response
of the async call would execute on a different thread ie. a client or listener
thread. This means in-turn that the intercepted handler is not executed with the
thread it was supposed to run and therefor can, if it's executing blocking
operations, potentially deadlock an entire server.
Before this commit `curl -XHEAD localhost:9200?pretty` would return
`Content-Length: 1` and a body which is fairly upsetting to standards
compliant tools. Now it'll return `Content-Length: 0` with an empty
body like every other `HEAD` request.
Relates to #21075
Refactors the BalancedShardsAllocator to create a method that
provides an allocation decision for allocating a single
unassigned shard or a single started shard that can no longer
remain on its current node. Having a separate method that
provides a detailed decision on the allocation of a single shard
will enable the cluster allocation explain API to directly
invoke these methods to provide allocation explanations.
This experimental setting enables relocation of shards that are being snapshotted, which can cause the shard allocation failures. This setting is undocumented and there is no good reason to set it in production.
This commit cleans up the code handling load averages in OsProbe:
- remove support for BSD; we do not support this OS
- add Javadocs
- strengthen assertions and testing
- add debug logging for exceptional situation
Relates #21037
This change moves providing UnicastHostsProvider for zen discovery to be
pull based, adding a getter in DiscoveryPlugin. A new setting is added,
discovery.zen.hosts_provider, to separate the discovery type from the
hosts provider for zen when it is selected. Unfortunately existing
plugins added ZenDiscovery with their own name in order to just provide
a hosts provider, so there are already many users setting the hosts
provider through discovery.type. This change also includes backcompat,
falling back to discovery.type when discovery.zen.hosts_provider is not
set.
The max score returned in the response of a query does not take rescorer into account.
This change updates the max_score when a rescorer is used in a query.
Fixes#20651
This change adds a TypesQuery that checks if the disjunction of types should be rewritten to a MatchAllDocs query. The check is done only if the number of terms is below a threshold (16 by default and configurable via max_boolean_clause).
* Move all zen discovery classes into o.e.discovery.zen
This collapses sub packages of zen into zen. These all had just a couple
classes each, and there is really no reason to have the subpackages.
* fix checkstyle
Date math index/alias expressions in mget will now be resolved to a concrete single index instead of failing the mget item with an `IndexNotFoundException`.
Added also an integration test to verify multi index aliases do not fail the entire mget request.
Closes#17957
This commit removes an undocumented output parameter node_info_format
from the cluster stats and node stats APIs. Currently the parameter does
not even work as it is not whitelisted as an output parameter. Since
this parameter is not documented, we opt to just remove it.
Relates #21021
When indices stats are requested via the node stats API, there is a
level parameter to request stats at the index, node, or shards
level. This parameter was not whitelisted when URL parsing was made
strict. This commit whitelists this parameter.
Additionally, there was some leniency in the parsing of this parameter
that has been removed.
Relates #21024
This change makes the ElectMasterService local to ZenDiscovery, no
longer created by guice, and thus also removes the ability for plugins
to customize. This extension point is no longer used by anything.
This was an error-prone version type that allowed overriding previous
version semantics. It could cause primaries and replicas to be out of
sync however, so it has been removed.
This is related to #20377, which removed the feature entirely. This
allows operations to continue to use the `force` version type if the
index was created before 6.0, in the event a document using it exists in
a translog being replayed.