These tests failed on CI multiple times in the past weeks because they use a
test cluster with a SUITE scope that recreates nodes between tests. With such
a scope, nodes can be recreated in between test executions and can inherit a
node id from a previous test execution, while they are assigned a random data
path. With the successive node recreations it is possible that a newly recreated
node shares the same node id (but different data path) as a non recreated node.
This commit changes the cluster scope of the CorruptedFileIT and FlushIT
tests which often fail.
The failure is reproducable with :
./gradlew :server:integTest -Dtests.seed=EF3A50C225CF377
-Dtests.class=org.elasticsearch.index.store.CorruptedFileIT
-Dtests.security.manager=true -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=America/Rio_Branco
-Dcompiler.java=11 -Druntime.java=8
This bug was introduced in #36893 and had the effect that
execution would continue after calling onFailure on the the
listener in checkIfTokenIsValid in the case that the token is
expired. In a case of many consecutive requests this could lead to
the unwelcome side effect of an expired access token producing a
successful authentication response.
* [Analysis] Deprecate Standard Html Strip Analyzer
Deprecate only Standard Html Strip Analyzer
If user create index with the analyzer since 7.0, es throws an exception.
If an index was created before 7.0, es issue deprecation log
We will remove it in 8.0
Related #4704
Today we create a global instance of RecoveryResponse then mutate it
when executing each recovery step. This is okay for the current
sequential recovery flow but not suitable for an asynchronous recovery
which we are targeting. With this commit, we return the result of each
step separately, then construct a RecoveryResponse at the end.
Relates #37174
After #30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
Previously the HLRC used a blocking ByteArrayEntity, but the Request
class also allows to set a NByteArrayEntity, and defaults to nonblocking
when calling the createJsonEntity method. This commit cleans up all the
uses of ByteArrayEntity in the RequestConverters to use the nonblocking
entity.
* Use `_count` aggregation value only for not-DISTINCT COUNT function calls
* COUNT DISTINCT will use the _exact_ version of a field (the `keyword` sub-field for example), if there is one
Traditionally remote clusters can be configured dynamically. However,
the compress and ping settings are not currently set to be configured
dynamically. This commit changes that.
This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that persists the retention leases in memory.
This commit converts the epoch time parsing implementation which uses
the java time api to create DateTimeFormatters instead of DateFormatter
implementations. This will allow multi formats for java time to be
implemented in a single DateTimeFormatter in a future change.
Remove several unused helper methods. Most of them are one-liners and should
be easier to be used from the corresponding primitive wrapper classes.
The bytes array conversion methods are unused as well, it should be easy to
re-create them if needed.
This commit fixes an issue with a settings builder method that allows
setting a duration by time unit. In particular, this method can suffer
from a loss of precision. For example, if the input duration is 1500
microseconds then internally we are converting this to "1ms",
demonstrating the loss of precision. Instead, we should internally
convert this to a TimeValue that correctly represents the input
duration, and then convert this to a string using a method that does not
lose the unit. That is what this commit does.
Version needs to be updated after backporting #36997 & #37142
where we added support for providing and serializing localClusterAlias
as well ass absoluteStartMillis.
Relates to #36997 & #37142
Today, a setting can declare that its validity depends on the values of other
related settings. However, the validity of a setting is not always checked
against the correct values of its dependent settings because those settings'
correct values may not be available when the validator runs.
This commit separates the validation of a settings updates into two phases,
with separate methods on the `Setting.Validator` interface. In the first phase
the setting's validity is checked in isolation, and in the second phase it is
checked again against the values of its related settings. Most settings only
use the first phase, and only the few settings with dependencies make use of
the second phase.
The `cluster.unsafe_initial_master_node_count` setting was introduced as a
temporary measure while the design of `cluster.initial_master_nodes` was being
finalised. This commit removes this temporary setting, replacing it with usages
of `cluster.initial_master_nodes` where appropriate.
This commit adds a unique id to cluster blocks, so that they can be uniquely
identified if needed. This is important for the Close Index API where multiple
concurrent closing requests can be executed at the same time. By adding a
UUID to the cluster block, we can generate unique "closing block" that can
later be verified on shards and then checked again from the cluster state
before closing the index. When the verification on shard is done, the closing
block is replaced by the regular INDEX_CLOSED_BLOCK instance.
If something goes wrong, calling the Open Index API will remove the block.
Related to #33888
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.
Shard history retention leases are aimed at preventing shard history
consumers from having to fallback to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication, and local shard
recoveries. A future consumer will be the changes API.
Further, index lifecycle management requires coordinating with some of
these consumers otherwise it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.
Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.
This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
- timestamps
- expiration
- persistence to and recovery from Lucene
- handoff during primary relocation
- sharing retention leases with replicas
- exposing leases in shard-level statistics
- integration with cross-cluster replication
These will occur individually in follow-up commits.
This commit addresses an issue when setting a byte size value setting
using a value that has a fractional component when converted to its
string representation. For example, trying to set a byte size value
setting to a value of 1536 bytes is problematic because internally this
is converted to the string "1.5k". When we go to get this setting, we
try to parse "1.5k" back to a byte size value, which does not support
fractional values. The problem is that internally we are relying on a
method which loses the unit when doing the string conversion. Instead,
we are going to use a method that does not lose the unit and therefore
we can roundtrip from the byte size value to the string and back to the
byte size value.
* Randomly doing non-atomic writes causes rare 0 byte reads from `index-N` files in tests
* Removing this randomness fixes these random failures and is valid because it does not reproduce a real-world failure-mode:
* Cloud-based Blob stores (S3, GCS, and Azure) do not have inconsistent partial reads of a blob, either you read a complete blob or nothing on them
* For file system based blob stores the atomic move we do (to atomically write a file) by setting `java.nio.file.StandardCopyOption#ATOMIC_MOVE` would throw if the file system does not provide for atomic moves
* Closes#37005
The initialization of a suite scope cluster had some sideffects on
subsequent runs which causes issues when tests must be reproduced.
This moves the suite scope initialization to a privte random context.
Closes#36202