This fixes missing to marking shard snapshots as failures when
multiple data-nodes are lost during the snapshot process or
shard snapshot failures have occured before a node left the cluster.
The problem was that we were simply not adding any shard entries for completed
shards on node-left events. This has no effect for a successful shard, but
for a failed shard would lead to that shard not being marked as failed during
snapshot finalization. Fixed by corectly keeping track of all previous completed
shard states as well in this case.
Also, added an assertion that without this fix would trip on almost every run of the
resiliency tests and adjusted the serialization of SnapshotsInProgress.Entry so
we have a proper assertion message.
Closes#47550
If a shard gets closed we properly abort its snapshot
before closing it. We should in thise case make sure to
not throw a confusing exception about trying to increment
the reference on an already closed shard in the async tasks
if the snapshot is already aborted.
Also, added an assertion to make sure that aborts are in
fact the only situation in which we run into a concurrently
closed store.
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.
This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
This adds a default for the `slm.retention_schedule` setting, setting it
to `0 30 1 * * ?` which is 1:30am every day.
Having retention unset meant that it would never be invoked and clean up
snapshots. We determined it would be better to have a default than never
to be run. When coming to a decision, we weighed the option of an
absolute time (such as 1:30am) versus a periodic invocation (like every
12 hours). In the end we decided on the absolute time because it has
better predictability and consistency than a periodic invocation, which
would rely on when the master node were elected or restarted.
Relates to #43663
We don't need to read the SnapshotInfo for a
snapshot to determine the indices that need to
be updated when it is deleted as the `RepositoryData`
contains that information already.
This PR makes it so the `RepositoryData` is used to
determine which indices to update and also removes
the special handling for deleting snapshot metadata
and the CS snapshot blob and has those simply be
deleted as part of the deleting of other unreferenced
blobs in the last step of the delete.
This makes the snapshot delete a little faster and
more resilient by removing two RPC calls
(the separate delete and the get).
Also, this shortens the diff with #46250 as a
side-effect.
While function scores using scripts do allow explanations, they are only
creatable with an expert plugin. This commit improves the situation for
the newer script score query by adding the ability to set the
explanation from the script itself.
To set the explanation, a user would check for `explanation != null` to
indicate an explanation is needed, and then call
`explanation.set("some description")`.
Today we control the extra translog (when soft-deletes is disabled) for
peer recoveries by size and age. If users manually (force) flush many
times within a short period, we can keep many small (or empty) translog
files as neither the size or age condition is reached. We can protect
the cluster from running out of the file descriptors in such a situation
by limiting the number of retaining translog files.
If we fail to read the global metadata in a snapshot
we would throw `SnapshotMissingException` but wouldn't
do so for the index metadata.
This is breaking SLM tests at a low rate because they
use `SnapshotMissingException` thrown from snapshot status APIs
to wait for a snapshot being gone.
Also, we should be consistent here in general and not leak the
`NoSuchFileException` to the transport layer for index meta.
Closes#46508
Since the property defaulted to `true` this deprecation logging
runs every time unless its set to `false` manually (in which case
it should've also logged but didn't).
I didn't add a tests and removed the tests we had in `7.x` that
covered this logging. I did move the check out of the `if (InetAddresses.isInetAddress(hostString) == false) {` condition so this is sort-of covered by the REST tests.
IMO, any unit-test of this would be somewhat redundant and would've forced
adding a field that just indicates that the deprecated property was used to
every instance which seemed pointless.
Closes#47436
* Remove eclipse conditionals
We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.
This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.
Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.
This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
* Fix Snapshot Finalization not Waiting for Index Metadata
We were mixing up the listeners here which led to the final listener
that should be called after all the metadata has been written
to be called before that.
I fixed this by removing the one redundant listener and flattening
the logic out.
* Closes#47425
Today when settings validate, they can only validate against settings
that are of the same type. While this strong-type is convenient from a
development perspective, it is too limiting in that some settings need
to validate against settings of a different type. For example, the list
setting xpack.monitoring.exporters.<namespace>.host wants to validate
that it is non-empty if and only if the string setting
xpack.monitoring.exporters.<namespace>.type is "http". Today this is
impossible since the settings validation framework only allows that
setting to validate against other list settings. This commit increases
the flexibility here to validate against settings of arbitrary type, at
the expense of losing strong-typing during development.
The passage formatter that the unified highlighter use doesn't handle terms with overlapping offsets.
For tokenizer that provides multiple segmentation of the same terms (edge ngram for instance) the formatter
should select the largest span in order to highlight the term only once. This change implements this logic.
Changes auto-id index requests to use optype CREATE, making it compliant with our docs.
This will also make these auto-id index requests compatible with the new "create-doc" index
privilege (which is based on the optype), the default optype is changed to create, just as it is
already documented.
The autoGeneratedTimestamp field is internally used to speed up indexing of operations with
auto-ids, as we can rule out duplicates. Setting this field externally can make the index
inconsistent, resulting in duplicate documents with same id.
Adds support for handling auto-id requests with optype CREATE. Also simplifies the code
handling this by using the standard indexing path when dealing with possible retry conflicts.
Relates #47169
Bulk requests currently do not allow adding "create" actions with auto-generated IDs.
This commit allows using the optype CREATE for append-only indexing operations. This is
mainly the user facing aspect of it.
Synonym queries (when two tokens/paths start at the same position) use the alias field instead
of the concrete field to build Lucene queries. This commit fixes this bug by resolving the alias field upfront in order to provide the concrete field to the actual query parser.
Today, we don't clear the shard info of the primary shard when a new
node joins; then we might risk of making replica allocation decisions
based on the stale information of the primary. The serious problem is
that we can cancel the current recovery which is more advanced than the
copy on the new node due to the old info we have from the primary.
With this change, we ensure the shard info from the primary is not older
than any node when allocating replicas.
Relates #46959
This work was done by Henning in #42518.
Co-authored-by: Henning Andersen <henning.andersen@elastic.co>
This commit adjusts randomization for the cluster shard limit tests so
that there is often more of a gap left between the limit and the size of
the first index. This allows the same randomization to be used for all
tests, and alleviates flakiness in
`testIndexCreationOverLimitFromTemplate`.
Today the comment boldly claims that this line of code keeps nodes above the
10-byte low watermark when in fact this is not true at all. This change fixes
this so that it really does keep nodes above the low watermark.
Fixes#45338. Again.
This is a preliminary of #46250 making the snapshot
delete work by doing all the metadata updates first
and then bulk deleting all of the now unreferenced
blobs.
Before this change, the metadata updates for each shard
and subsequent deletion of the blobs that have become unreferenced
due to the delete would happen sequentially shard-by-shard
parallelising only over all the indices in the snapshot.
This change makes it so the all the metadata updates
happen in parallel on a shard level first.
Once all of the updates of shard-level metadata have finished,
all the now unreferenced blobs are deleted in bulk.
This has two benefits (outside of making #46250 a smaller change):
* We have a lower likelihood of failing to update shard level metadata because
it happens with priority and a higher degree of parallelism
* Deleting of unreferenced data in the shards should go much faster in many cases (rolling indices, large number of indices with many unchanged shards) as well because a number of small bulk deletions (just two blobs for `index-N` and `snap-` for each unchanged shard) are grouped into larger bulk deletes of `100-1000` blobs depending on Cloud provider (even though the final bulk deletes are happening sequentially this should be much faster in almost all cases as you'd parallelism of 50 (GCS) to 500 (S3) snapshot threads to achieve the same delete rates when deleting from unchanged shards).
We cancel ongoing peer recoveries if a node joins the cluster with a completely
up-to-date copy of a shard, because we can use such a copy to recover a replica
instantly. However, today we only look for recoveries to cancel while there are
unassigned shards in the cluster. This means that we do not contemplate the
cancellation of the last few recoveries since recovering shards are not
unassigned. It might take much longer for these recoveries to complete than
would be necessary if they were cancelled.
This commit fixes this by checking for cancellable recoveries even if all
shards are assigned.
As a result of #45689 snapshot finalization started to
take significantly longer than before. This may be a
little unfortunate since it increases the likelihood
of failing to finalize after having written out all
the segment blobs.
This change parallelizes all the metadata writes that
can safely run in parallel in the finalization step to
speed the finalization step up again. Also, this will
generally speed up the snapshot process overall in case
of large number of indices.
This is also a nice to have for #46250 since we add yet
another step (deleting of old index- blobs in the shards
to the finalization.
The method Setting#getRaw leaks implementation details about settings,
namely that they are backed by strings. We do not want code to rely upon
this, so this commit makes Setting#getRaw private as a first step
towards hiding the implementaton details of settings from the rest of
the codebase.
Today plugins may provide upgraders for custom metadata and index metadata, but
these upgraders are bypassed during a rolling restart. Fortunately this
extension mechanism is unused by all known plugins. This commit removes these
extension points.
Relates #47297
Today if you create an index in a cluster without any data nodes then it will
report yellow health because it never attempts to assign any shards if there
are no data nodes, so the new shards remain at `AllocationStatus.NO_ATTEMPT`.
This commit moves the new primaries to `AllocationStatus.DECIDERS_NO` in this
situation, causing the cluster health to move to red.
Fixes#41073
Fixes a bug related to how "closed replicated indices" (introduced in 7.2) interact with the index
metadata storage mechanism, which has special handling for closed indices (but incorrectly
handles replicated closed indices). On non-master-eligible data nodes, it's possible for the
node's manifest file (which tracks the relevant metadata state that the node should persist) to
become out of sync with what's actually stored on disk, leading to an inconsistency that is then
detected at startup, refusing for the node to start up.
Closes#47276
Currently DateMathParser with roundUp = true is relying on the DateFormatter build with
combined optional sub parsers with defaulted fields (depending on the formatter).
That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
Java.time implementation expects optional parsers in order from most specific to
least specific (reverse in the example above).
It is causing a problem because the first parsing succeeds but does not consume the full input.
The second parser should be used.
We can work around this with keeping a list of RoundUpParsers and iterate over them choosing
the one that parsed full input. The same approach we used for regular (non date math) in
relates #40100
The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771
Those below will expect this change first
relates #46242
relates #45284
backport #46654
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.
There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.
I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.
Other changes:
* Rework `waitForDocs` to scale its timeout. Instead of calling
`assertBusy` in a loop, work out a reasonable overall timeout and await
just once.
* Some tests failed after switching to `assertBusy` and had to be fixed.
* Correct the expect templates in AbstractUpgradeTestCase. The ES
Security team confirmed that they don't use templates any more, so
remove this from the expected templates. Also rewrite how the setup
code checks for templates, in order to give more information.
* Remove an expected ML template from XPackRestTestConstants The ML team
advised that the ML tests shouldn't be waiting for any
`.ml-notifications*` templates, since such checks should happen in the
production code instead.
* Also rework the template checking code in `XPackRestTestHelper` to give
more helpful failure messages.
* Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
This commit replaces some uses of Setting#getRaw in the throttling
allocation decider settings. Instead, these settings should be using
fallback settings.
This commit removes some leniency that exists in getting the allow
rebalance setting. Fortunately, that leniency is dead code, this can
never happen. The reason this can never happen is because the settings
infrastructure will not allow setting an invalid value for this
setting. If you try to set this in the elasticsearch.yml, then the node
will fail to start, since parsing the setting will fail. If you try to
set this via an update settings API call, then parsing the setting will
fail and the settings update will be rejected. Therefore, this leniency
can never be activated, so we remove it.
This commit is the first of a few in an attempt to remove the public
uses of Setting#getRaw.
This is the Java side of https://github.com/elastic/ml-cpp/pull/593
with a fallback so that ml-cpp bundles with either the
new or old directory structure work for the time being.
A few days after merging the C++ changes a followup to
this change will be made that removes the fallback.
This change merges the `ShardSearchTransportRequest` and `ShardSearchLocalRequest`
into a single `ShardSearchRequest` that can be used to create a SearchContext.
Relates #46523
Backport of #46241
This PR changes the ingest executing to be non blocking
by adding an additional method to the Processor interface
that accepts a BiConsumer as handler and changing
IngestService#executeBulkRequest(...) to ingest document
in a non blocking fashion iff a processor executes
in a non blocking fashion.
This is the second PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.
The plan is to merge changes made to the server module separately from
the pr that will merge enrich into master, so that these changes can
be reviewed in isolation.
This change originates from the enrich branch and was introduced there
in #43361.
Today if metadata persistence is excessively slow on a master-ineligible node
then the `ClusterApplierService` emits a warning indicating that the
`GatewayMetaState` applier was slow, but gives no further details. If it is
excessively slow on a master-eligible node then we do not see any warning at
all, although we might see other consequences such as a lagging node or a
master failure.
With this commit we emit a warning if metadata persistence takes longer than a
configurable threshold, which defaults to `10s`. We also emit statistics that
record how much index metadata was persisted and how much was skipped since
this can help distinguish cases where IO was slow from cases where there are
simply too many indices involved.
Backport of #47005.
Currently the logic to check if a connection to a remote discovery node
exists and otherwise create a proxy connection is mixed with the
collect nodes, cluster connection lifecycle, and other
RemoteClusterConnection logic. This commit introduces a specialized
RemoteConnectionManager class which handles the open connections.
Additionally, it reworks the "round-robin" proxy logic to create the list
of potential connections at connection open/close time, opposed to each
time a connection is requested.
We can have a large number of shard copies in this test. For example,
the two recent failures have 24 and 27 copies respectively and all
replicas have to copy segment files as their stores are corrupted. Our
CI needs more than 30 seconds to start all these copies.
Note that in two recent failures, the cluster was green just after the
cluster health timed out.
Closes#41899