By default gateway.recover_after_master_nodes is set to
discovery.zen.minimum_master_nodes but in this Zen2 test this is set to an
unreasonably large value. This change updates it so the cluster can properly
form.
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` in Lucene, or e.g. merging those shards is going to
be considered illegal. At the moment, we use the latest Lucene version when
creating a shard, which could cause shards to have different created versions,
e.g. in case of forced allocation. This commit makes sure to reuse the
appropriate Lucene version in order to avoid such issues.
Closes #33826
Today we configure the soft-deletes field iff soft-deletes are enabled.
Although this choice was correct, it prevents an engine with
soft-deletes disabled from opening a Lucene index with soft-deletes.
Moreover, this change should not have any side-effect if a Lucene index
does not have any soft-deletes.
Relates #36141
AutoFollowCoordinator should take into account that, after auto following
an index and while recording that the leader index has been followed,
the auto follow pattern may have been removed via the delete auto follow
pattern API.
Also fixed a bug so that, when a remote cluster connection has been removed,
the auto follow coordinator does not die when it tries to get a remote client
for that cluster.
Closes #35480
This commit implements proper metadata recovery for Zen2.
GatewayService is responsible for the recovery. In Zen1 GatewayService
creates an instance of Gateway, that is used to reach out to other cluster
nodes, get their state and calculate the most up-to-date state based on
versions. After that Gateway performs upgrade and archival of
ClusterSettings and closes bad indices. Then recovered state is passed to
GatewayService.GatewayRecoveryListener that mixes up current state
and restored state, removes state not recovered block, creates the
routing table and performs re-routing.
In Zen2 we should perform this kind of logic on cluster startup, except
mixing state (because there is nothing to mix) and opening routing table.
This commit refactors out all `ClusterUpdate` functions into a separate class
`ClusterStateUpdaters`, which is used by `Gateway` and `GatewayService`
in case of Zen1, and by `GatewayMetaState` and `GatewayService` in case of
Zen2.
This commit also switches all integration tests that are already using Zen2 from
InMemoryPersistedState to GatewayMetaState.
This commit changes how an operation which requires all index shard
operations permits is executed when a primary term update is required:
the operation and the update are combined so that the operation is
executed after the primary term update under the same blocking
operation.
Closes #35850
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
This test suite can stop all the shared master-eligible nodes, which breaks the
cluster since any non-shared master-eligible nodes are stopped first in the
reset process between tests.
Since this test suite can leave the cluster in this somewhat broken state, it
seems best that it uses a new cluster for each test.
This change adds a soft limit to open scroll contexts that can be controlled with the dynamic cluster setting `search.max_open_scroll_context` (defaults to 500).
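The idea of a soft limit on open scroll contexts can be sketched as a counter that is checked when a new context is requested. This is only a minimal illustration, not the actual implementation: the class `ScrollContextLimiter`, its methods, and the exception message are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names are illustrative and not taken from Elasticsearch.
final class ScrollContextLimiter {
    private final AtomicInteger openScrolls = new AtomicInteger();
    // Volatile so that a dynamic settings update is visible to all threads.
    private volatile int maxOpenScrollContext;

    ScrollContextLimiter(int maxOpenScrollContext) {
        this.maxOpenScrollContext = maxOpenScrollContext;
    }

    // Stands in for the update hook of a dynamic cluster setting.
    void setMaxOpenScrollContext(int newLimit) {
        this.maxOpenScrollContext = newLimit;
    }

    // Reserve a slot for a new scroll context, or reject the request.
    void acquire() {
        if (openScrolls.incrementAndGet() > maxOpenScrollContext) {
            openScrolls.decrementAndGet();
            throw new IllegalStateException(
                "too many open scroll contexts, limit is [" + maxOpenScrollContext + "]");
        }
    }

    // Called when a scroll context is closed or expires.
    void release() {
        openScrolls.decrementAndGet();
    }

    int open() {
        return openScrolls.get();
    }
}
```

In this sketch the limit is soft in the sense that lowering it does not close contexts that are already open; it only rejects new ones until enough have been released.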
The REST interface for the remove-policy-from-index API does not support
`_ilm/remove`; it requires that an `{index}` pattern be defined in
the URL path. This fixes the rest-api-spec to reflect the implementation.
Today if a node `A` sends a peers request to another node `B` then `B` will
react by sending a peers request back to `A`. However if `A` is not
master-eligible then this reaction is pointless and fails with an exception
saying `non-master-eligible node found`, adding noise to the logs. This change
suppresses this response to non-master-eligible nodes.
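The suppression described above amounts to a guard on the receiving side. The following is a rough sketch under stated assumptions: `Node`, `PeersRequestSketch` and the `probed` list are hypothetical stand-ins, and adding a name to `probed` stands in for sending a peers request back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the actual discovery code.
final class PeersRequestSketch {
    // Minimal node description, reduced to what the guard needs.
    static final class Node {
        final String name;
        final boolean masterEligible;

        Node(String name, boolean masterEligible) {
            this.name = name;
            this.masterEligible = masterEligible;
        }
    }

    // Names of the nodes this node reacted to by probing them back.
    final List<String> probed = new ArrayList<>();

    // Called on node B when a peers request arrives from node A (the source).
    void handlePeersRequest(Node source) {
        if (source.masterEligible) {
            // Stand-in for sending a peers request back to the source.
            probed.add(source.name);
        }
        // A non-master-eligible source is not probed back: doing so would
        // only fail and add noise to the logs.
    }
}
```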
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
* Add deprecation for /_xpack/monitoring/_bulk in favor of /_monitoring/bulk
* Removed xpack from the rest-api-spec and tests
* Removed xpack from the Action name
* Removed MonitoringRestHandler as an unnecessary abstraction
* Minor corrections to comments
Relates #35958
Given that we check the max buckets limit on each shard when collecting the buckets, and that non-final reduction cannot add buckets (see #35921), there is no point in counting and checking the number of buckets as part of non-final reduction phases.
This check is still needed in the final reduction phase, though, to make sure that the number of returned buckets is not above the allowed threshold.
Relates somewhat to #32125, as we will make use of non-final reduction phases in CCS alternate execution mode, which increases the chance that this check trips for nothing when reducing aggs in each remote cluster.
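A minimal sketch of where the check sits, assuming a simplified model in which each partial result is a map from bucket key to doc count. `ReduceSketch`, its parameters, and the exception type are hypothetical and do not mirror the real aggregation classes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a simplified stand-in for an aggregation reduce step.
final class ReduceSketch {
    static Map<String, Long> reduce(List<Map<String, Long>> partials,
                                    boolean isFinalReduce,
                                    int maxBuckets) {
        // Merge partial results by summing doc counts per bucket key.
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        // Only the final reduction enforces the limit on returned buckets;
        // intermediate reductions skip the count entirely.
        if (isFinalReduce && merged.size() > maxBuckets) {
            throw new IllegalStateException(
                "too many buckets: " + merged.size() + " > " + maxBuckets);
        }
        return merged;
    }
}
```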
When building a query, Lucene distinguishes two cases: queries that need to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However, the only case in ES that requires this distinction is the BoolQueryBuilder, which sets a different minimum_should_match when a `bool` query is built in a filter context.
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes #35293
* Method `canOpenIndex` is only used in tests, so it moved to the test classpath
* Simplify `org.elasticsearch.index.store.Store#renameTempFilesSafe`
* Delete some dead methods
Closes #36073
The problem showed up on debian 8 which uses aufs docker storage
driver by default as opposed to overlay2 used on other distros.
aufs does not support acls and thus the failure.
The --use-ntvfs option instructs samba not to rely on acls.
From what I can tell this is an implementation detail that should not
affect the tests (which continue to pass).
Currently if `java` is not in $PATH the preinst script fails
prematurely and prevents an appropriate message from getting displayed
to the user.
Make package installation more user friendly when java is not in
$PATH and add a test for it.
Also use a shebang in the preinst script, as, at least in Debian,
maintainer scripts must start with the #! convention [1].
Relates #31845
[1] https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
Introduces a debug log message when a bind fails and a trace message
when a bind succeeds.
It may seem strange to only debug a bind failure, but failures of this
nature are relatively common in some realm configurations (e.g. LDAP
realm with multiple user templates, or additional realms configured
after an LDAP realm).
This fixes a failure of InternalTestClusterTests#testBeforeTest which checks
that the cluster is set up the same when starting from the same seed. Trappily,
using ESTestCase#randomIntBetween() is no good, we have to use
InternalTestCluster#random via RandomNumbers#randomIntBetween() instead.
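The underlying point is that a reproducible setup must draw every random choice from the random source owned by the cluster, seeded with the cluster seed. A small sketch, assuming plain `java.util.Random`; `SeededSetupSketch` and its local `randomIntBetween` helper are hypothetical stand-ins for the real test utilities.

```java
import java.util.Random;

// Hypothetical sketch: illustrates seed-reproducible setup, not the real test cluster.
final class SeededSetupSketch {
    // Inclusive bounds; draws only from the Random that is passed in.
    static int randomIntBetween(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    // Every random choice comes from a Random derived from the cluster seed,
    // so the same seed always yields the same setup.
    static int[] setup(long clusterSeed) {
        Random clusterRandom = new Random(clusterSeed);
        int masterNodes = randomIntBetween(clusterRandom, 1, 3);
        int dataNodes = randomIntBetween(clusterRandom, 0, 5);
        return new int[] { masterNodes, dataNodes };
    }
}
```

Drawing one of these values from a different random source, whose state depends on what the test did earlier, would make the setup depend on more than the seed.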
In #36033 we removed a catch block because we thought we were preventing
exceptions by avoiding concurrent elections, missing the obvious fact that some
joins are supposed to be failing.
As a quick fix the catch was reinstated in 3a5dab6d8e
but this change adds finesse by only catching exceptions from the joins that we
expect to fail. It also inlines an always-false parameter to `initialState()`.
Today, we allow all nodes in an integration test to bootstrap. However this
seems to lead to test failures due to post-election instability. The change
avoids this instability by only bootstrapping a single node in the cluster.
The logic in the dockerComposeSupported method currently returns false
even when docker and docker compose are available on the build machine.
This change updates the check to see if docker compose is available in
one of the two paths and allows the `tests.fixture.enabled` property to
disable the tests even if docker compose is available.
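The shape of the updated check can be sketched as follows. This is an illustration only: the candidate paths are passed in because the message does not name them, and treating an unset `tests.fixture.enabled` as enabled is an assumption of this sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: not the actual build plugin code.
final class DockerComposeCheckSketch {
    static boolean dockerComposeSupported(List<Path> candidatePaths,
                                          String fixtureEnabledProperty) {
        // Available if the binary exists in any of the candidate locations.
        boolean available = candidatePaths.stream().anyMatch(Files::exists);
        // Assumption: an unset property means the fixtures stay enabled.
        boolean enabled = fixtureEnabledProperty == null
            || Boolean.parseBoolean(fixtureEnabledProperty);
        return available && enabled;
    }
}
```

A caller would pass the value of `System.getProperty("tests.fixture.enabled")` as the second argument.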
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail we still log
the task result, but that is far from ideal.
Closes #33764
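As a rough illustration of how a handful of retries adds up to about a minute, here is a sketch of an exponential backoff schedule and a retry loop. The initial delay of 500ms, the doubling factor, and the retry count of 7 are assumptions of this sketch, as are the names `RetrySketch`, `backoffDelaysMillis` and `storeWithRetries`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch: illustrative values and names only.
final class RetrySketch {
    // Delays double on every retry: initial, 2x, 4x, ...
    static List<Long> backoffDelaysMillis(long initialMillis, int maxRetries) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMillis;
        for (int i = 0; i < maxRetries; i++) {
            delays.add(delay);
            delay *= 2;
        }
        return delays;
    }

    // Returns true once an attempt succeeds. Returns false if the first
    // attempt and every retry failed, in which case the caller falls back
    // to logging the task result.
    static boolean storeWithRetries(BooleanSupplier attempt,
                                    List<Long> delays,
                                    LongConsumer sleep) {
        if (attempt.getAsBoolean()) {
            return true;
        }
        for (long delay : delays) {
            sleep.accept(delay); // wait before the next attempt
            if (attempt.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

With these illustrative values, `backoffDelaysMillis(500, 7)` yields delays that sum to 63.5 seconds. The `sleep` parameter is injected so the loop can be exercised without actually waiting.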
The new limit on the number of open shards in a cluster may be
interpreted by users as a sizing recommendation, but it is not. This
clarifies in the documentation that this is a safety limit, not a
recommendation.
Empty buckets don't need to be added when performing an incremental reduction step; they can be added later in the final reduction step. This will allow us to later remove the max buckets limit when performing non-final reduction.
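Deferring empty buckets to the final step can be sketched with a simplified histogram whose buckets are keyed by their lower bound. `EmptyBucketSketch` and its parameters are hypothetical; the sketch assumes a positive `interval` and keys aligned to it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a simplified histogram reduce, not the real aggregation code.
final class EmptyBucketSketch {
    static TreeMap<Long, Long> reduce(List<Map<Long, Long>> partials,
                                      long interval,
                                      boolean isFinalReduce) {
        // Merge partial histograms by summing doc counts per bucket key.
        TreeMap<Long, Long> merged = new TreeMap<>();
        for (Map<Long, Long> partial : partials) {
            partial.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        // Gaps are filled with empty buckets only in the final reduction, so
        // intermediate results never grow beyond the buckets that hold data.
        if (isFinalReduce && !merged.isEmpty()) {
            long last = merged.lastKey();
            for (long key = merged.firstKey(); key < last; key += interval) {
                merged.putIfAbsent(key, 0L);
            }
        }
        return merged;
    }
}
```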