This commit attempts to remove the retention leases on the leader shards
when unfollowing an index. This is best effort, since the leader might
not be available.
This test had been disabled because of test failures, but it only affected the
6.x branch. The fix for 6.x is at #39054. On master/7.x/7.0 we can reenable the
test as-is.
This defaults to "true" (current behavior) and will throw an exception
if there is a property that cannot be recognized. If "false", it will
ignore anything unrecognizable.
(cherry picked from commit 38fbf9792bcf4fe66bb3f17589e5fe6d29748d07)
Today's calculation of the maximum number of shards per attribute is rather
convoluted. This commit clarifies that it returns
ceil(shardCount/numberOfAttributes).
Blob store compression was not enabled for some of the files in
snapshots due to constructor accessing sub-class fields. Fixed to
instead accept compress field as constructor param. Also fixed chunk
size validation to work.
Deprecated repositories.fs.compress setting as well to be able to unify
in a future commit.
The watcher trigger service could attempt to modify the perWatchStats
map simultaneously from multiple threads. This would cause the
internal state to become inconsistent, in particular the count()
method may return an incorrect value for the number of watches.
This changes replaces the implementation of the map with a
ConcurrentHashMap so that its internal state remains consistent even
when accessed from mutiple threads.
Backport of: #39092
This ensures we do not attempt to parse non english locale dates
in FIPS mode. The error, originally assumed to affect only Joda,
affects Java time in the same manner and manifests only with the
version of BouncyCastle FIPS certified provider we use in tests.
The upstream issue https://github.com/bcgit/bc-java/issues/405
indicates that the behavior is resolved in later versions of the
BouncyCastle library and should be tested again when the new
versions become FIPS 140 certified
the RethrottleTests assumed that tasks that were
unprepared to rethrottle would bubble up into the
Rethrottle response as an ElasticsearchException
wrapping an IllegalArgumentException. This seems to
have changed to potentially involve further levels of
wrapping.
This change makes the retry logic more resilient to
arbitrary nesting of the underlying IllegalArgumentException
Sometimes we turn objects into strings for logging or debugging using
`toString()`, but the default implementation is often unhelpful. This change
improves on this in two places I ran into recently.
The ML memory tracker does searches against ML results
and config indices. These searches can be asynchronous,
and if they are running while the node is closing then
they can cause problems for other components.
This change adds a stop() method to the MlMemoryTracker
that waits for in-flight searches to complete. Once
stop() has returned the MlMemoryTracker will not kick
off any new searches.
The MlLifeCycleService now calls MlMemoryTracker.stop()
before stopping stopping the node.
Fixes#37117
This test had a bug. We attempt to allow only the primary to be
allocated, to force all replicas to recovery from the primary after we
had set the state of the retention leases on the primary. However, in
building the index settings, we were overwriting the settings that
exclude the replicas from being allocated. This means that some of the
replicas would end up assigned and rather than receive retention leases
during recovery, they would be part of the replication group receiving
retention leases as they are manipulated. Since retention lease renewals
are only synced periodically, this means that the replica could be
lagging a little behind in some cases leading to an assertion tripping
in the test. This commit addresses this by ensuring that the replicas
are indeed not allocated until after the retention leases are done being
manipulated on the replica. We did this by not overwriting the exclude
settings.
Closes#39105
These two changes are interlinked.
Before this change unsetting ML upgrade mode would wait for all
datafeeds to be assigned and not waiting for their corresponding
jobs to initialise. However, this could be inappropriate, if
there was a reason other that upgrade mode why one job was unable
to be assigned or slow to start up. Unsetting of upgrade mode
would hang in this case.
This change relaxes the condition for considering upgrade mode to
be unset to simply that an assignment attempt has been made for
each ML persistent task that did not fail because upgrade mode
was enabled. Thus after unsetting upgrade mode there is no
guarantee that every ML persistent task is assigned, just that
each is not unassigned due to upgrade mode.
In order to make setting upgrade mode work immediately after
unsetting upgrade mode it was then also necessary to make it
possible to stop a datafeed that was not assigned. There was
no particularly good reason why this was not allowed in the past.
It is trivial to stop an unassigned datafeed because it just
involves removing the persistent task.
The parseMillis method was able to work on formats without timezones by
falling back to UTC. The Date Formatter interface did not support this, as
the calling code was using the `Instant.from` java time API.
This switches over to an internal method which adds UTC as a timezone.
Closes#39067
As the Dockerfile evolved we don't need anymore certain commands like
`unzip`, `which` and `wget` allowing us to slightly shrink the images.
Backport of: #39040
- Notes that you can adjust the `s3.client.*.endpoint` setting to point to a
repository held on an S3-compatible service.
- Notes that the default is `s3.amazonaws.com` and not to auto-detect the
endpoint.
- Reformats docs to width.
Closes#35925
Prior to this commit, if during fetch leader / follower GCP
a fatal error occurred, then the shard follow task was removed.
This is unexpected, because if such an error occurs during the lifetime of shard follow task then replication is stopped and the fatal error flag is set. This allows the ccr stats api to report the fatal exception that has occurred (instead of the user grepping through the elasticsearch logs).
This issue was found by a rare failure of the `FollowStatsIT#testFollowStatsApiIncludeShardFollowStatsWithRemovedFollowerIndex` test.
Closes#38779
* Disable specific locales for tests in fips mode
The Bouncy Castle FIPS provider that we use for running our tests
in fips mode has an issue with locale sensitive handling of Dates as
described in https://github.com/bcgit/bc-java/issues/405
This causes certificate validation to fail if any given test that
includes some form of certificate validation happens to run in one
of the locales. This manifested earlier in #33081 which was
handled insufficiently in #33299
This change ensures that the problematic 3 locales
* th-TH
* ja-JP-u-ca-japanese-x-lvariant-JP
* th-TH-u-nu-thai-x-lvariant-TH
will not be used when running our tests in a FIPS 140 JVM. It also
reverts #33299
The build script file for the `:libs:elasticsearch-ssl-config` and
`:libs:ssl-config-tests` projects was incorrectly named `eclipse.build.gradle`
while the expected name was `eclipse-build.gradle`.
In addition, this also adds a missing snippet in the `build.gradle` conf file,
that fixes the project setup for Eclipse users.
When retention leases fail to sync after an expiration check, we emit a
log message about this. This commit adds the retention leases that
failed to sync.
When the background retention lease sync fires, we check an see if any
retention leases are expired. If any did expire, we execute a full
retention lease sync (write action). Since this is happening on a
background thread, we do not block that thread waiting for success (it
will simply try again when the timer elapses). However, we were
swallowing exceptions that indicate failure. This commit addresses that
by logging the failures. Additionally, we add some trace logging to the
execution of syncing retention leases.
Fixed two potential causes for leaked threads during tests:
1. When adding a channel to serverChannels, we add it under a monitor
that we do not use when reading from it. This is potentially unsafe if
there is no other happens-before relationship ensuring the safety of
this.
2. Long-shot but if the thread pool was shutdown before entering this
code, we would silently forget about closing server channels so added
assert.
Strengthened the locking to ensure that once we stop the transport, no
new server channels can be made.
Relates to CI failure issue: #37543