Today the `CoordinatorTests` run the publication process as a single
atomic action; however in production it appears possible that another
master may be elected, publish its state, then fail, then we win another
election, all in between the time we sampled our previous cluster state
and started to publish the one we first thought of.
This violates the `assertClusterStateConsistency()` assertion that
verifies the cluster state update event matches the states we actually
published and applied.
This commit adjusts the tests to run the publication process more
asynchronously so as to allow time for this behaviour to occur. This
should eventually result in a reproduction of the failure in #61437 that
will let us analyse what's really going on there and help us fix it.
Today we use `long` to represent the number of parts of a blob. There's
no need for this extra range, it forces us to do some casting elsewhere,
and indeed when snapshotting we iterate over the parts using an `int`
which would be an infinite loop in case of overflow anyway:
for (int i = 0; i < fileInfo.numberOfParts(); i++) {
This commit changes the representation of the number of parts of a blob
to an `int`.
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
This commit removes the tasks module that only existed to define the
tasks result index, `.tasks`, as a system index. The definition for
the tasks results system index descriptor is moved to the
`SystemIndices` class with a check that no other plugin or module
attempts to define an entry with the same source.
Additionally, this change also makes the pattern for the tasks result
index a wildcard pattern since we will need this when the index is
upgraded (reindex to new name and then alias that to .tasks).
Backport of #61540
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435
Backport to add case insensitive support for regex queries.
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.
Closes#59235
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.
relates #55699
relates #52369
backports #55941
* Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache
* Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference
* Lighter `writeTo` implementation
* Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory
This is mostly motivated by the performance issues we are seeing around the GET mappings
REST API which (in case of a large number of indices) will create decompressing streams in a hot loop
which takes a significant amount of time for the system calls involved in instantiating deflaters
and inflaters.
Also, this fixes a leaked deflater when deserializing cached repository data.
This method might have materialize all the bytes in a reference into a fresh `byte[]`.
Using the stream is much safer and only trivially more expensive + in most cases we now run the fast path via `BytesArray` anyway.
This optimization is more relevant in the context of CCR. When a node in
the follower cluster leaves, we reallocate the shard-follow tasks on
that node to other nodes. The new tasks will overwhelm the follower
cluster with many put-mapping, update-settings requests, although most
of them are noop. This change detects and optimizes the noop
update-settings requests.
This continues #61301, migrating all of the mappers in `server` to the
new `MapperTestCase` which is nicer than `FieldMapperTestCase` because
it doesn't depend on all of Elasticsearch.
It's unnecessary (and adds one string comparison to every request) to special
case the favicon so I added it as a normal REST handler to simplify the code.
Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient.
This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray`
before deserializing, adding overhead for copying the bytes and managing the buffers.
This commit fixes a number of spots where `BytesArray` is the most common type of
`BytesReference` to special case this type and parse it more efficiently.
Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.
Today a remote cluster connection comprises a `PING` and a `REG`
channel. The `PING` channel is only used for health checks between the
elected master and the members of its own cluster, so is unused in a
remote cluster connection. This commit removes this unused connection.
For large responses to the get mappings request, the serialization
to XContent can be extremely slow (serializing mappings is expensive since
we have to decompress and deserialize the mapping source).
To not introduce instability on the IO thread handling the get mappings response
we should move the serialization to the management pool.
The trade-off of introducing one or two new context switches for responses that are
small enough to not cause trouble on the transport thread to prevent instability
in case of a large number of mappings in the cluster seems worth it.
It is not realistic to drop messages without eventually failing.
To retain the coverage of long pauses this PR adjusts the blackholed
behavior to fail a send after 24h (which is assumed to be longer than any
timeout in the system) instead of never.
Closes#61034
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.
To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.
The PR is fairly big but each commit can be reviewed individually.
Fixes#61033.
Previously we didn't retain the requested fields when performing a shallow copy
of the search source. This meant that when a search was rewritten, we could drop
the requested fields and fail to return them in the response.
Saving some cycles here and there on the IO loop:
* Don't instantiate new `Runnable` to execute on `SAME` in a few spots
* Don't instantiate complicated wrapped stream for empty messages
* Stop instantiating almost never used `ClusterStateObserver` in two spots
* Some minor cleanup and preventing pointless `Predicate<>` instantiation in transport master node action
In addition, this commit converts ScaledFloatFieldMapper as it was relying
on a number of static values taken from NumberFieldMapper that had changed
or been removed.
This switches a few tests for field mappers from `ESSingleNodeTestCase`
to `ESTestCase` because, in general, we prefer to avoid
`ESSingleNodeTestCase` when we can because it is slow and "big". "Big"
here means that it pulls in an entire node, making it difficult to
reason about what you are testing.
We have to set the recovery setting to `0` if we don't want throttling
from recoveries. Otherwise the randomized value used for this setting in
tests can lead to throttling unexpectedly.
Closes#61311
With #60683 we stopped forcing aggregating all docs using a single
Aggregator which made some of our accuracy assumptions about the stats
aggregator incorrect. This adds a test that does the forcing and asserts
the old accuracy and adds a test without the forcing with much looser
accuracy guarantees.
Closes#61132
The FieldNamesFieldMapper field has different behaviour for indexes created in
clusters earlier than v6.1, and the code to deal with this was still using the vestigial
FieldType field of FieldMapper in its indexing path. This meant that documents
added after an upgrade were not correctly indexing their field names field. This
commit corrects the parseCreateField method to use the default field type.
Fixes#61305
Today a common reason for a `ShardLockObtainFailedException` is when a
shard is removed from a node and then assigned straight back to it again
before the node has had a chance to shut the previous shard instance
down. For instance, this can happen if a node briefly leaves the cluster
holding a primary with no in-sync replicas.
The message in this case is typically as follows:
obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
This is pretty hard to interpret, and doesn't raise the important
question: "why didn't the shard shut down sooner?"
With this change we reword the message a bit, report the age of the
shard lock, and adjust the details to report that the lock is held by a
closing shard:
obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [12345ms]
Relates #38807
We have seen a situation where the total search operations are higher
than expected. Unfortunately, we did not have enough info to figure it
out. This commit adds the failures to the error to provide more context
and adjusts the log level in case of failure to debug.
We only work with heap byte buffers at this point and those we can and do unwrap the
`byte[]` ourselves and use `BytesArray` instead of a needless level of indirection via `ByteBuffer`.
There is a corner case here in which during partial snapshot the index is
deleted right between starting the snapshot in the CS and the data node getting to work
on it, causing the data node the fail that shard snapshot and making the snapshot `PARTIAL`.
Closes#61208
Adds a method to make a random date `DateFormatter` pattern. We expect
this'll be useful for runtime fields to compate their formatting with
the standard date field.
Currently we occasionally can get ArithmeticException from parsing bad input
values on 'date' fields that are passed on even if 'ignore_malformed' is set.
This change adds this exception to the ones we already catch for malformed
values.
Closes#52634
Today we allocate a new `byte[]` for each document written to the
cluster state. Some of these documents may be quite large. We need a
buffer that's at least as large as the largest document, but there's no
need to use a fresh buffer for each document.
With this commit we re-use the same `byte[]` much more, only allocating
it afresh if we need a larger one, and using the buffer needed for one
round of persistence as a hint for the size needed for the next one.
The 7.x branch preserves the legacy discovery mechanism from 6.x purely
for running internal cluster tests; this mechanism is otherwise
completely untested and unsupported. However it is still technically
possible to use it outside of the test suite if you dig through the
source code to work out what settings need to be set. With this change
we make it impossible to use this mechanism in production.
Closes#61177
The ReloadSecureSettingsIT makes requests to the reload settings apis.
In 7.x, the client used from the integ test infrastructure may be a
transport client. In that case, the expected exception type, and causes
the test to fail (though it will hang indefinitely due to not counting
down the latch, see
https://github.com/elastic/elasticsearch/pull/60800). This commit adds
unwrapping of the remote exception to get the underlying expected
exception.
closes#51546
Use transport blocking to make relocation take forever instead of relying on the relocation to take long enough to clash with the snapshot.
Closes#61069
It is disastrous if we commit an incremental cluster state update
without having written the full state first. We assert that this doesn't
happen, but it is hard to fully test the myriad ways that things might
fail in a messy production environment. Given the disastrous
consequences it is worth erring on the side of caution in this area.
This commit fails invalid writes even if assertions are disabled.
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.
These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).
This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`
And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`
Relates to #60848
Converting AllFieldMapper to parametrized form ended up not being run through BWC
testing, resulting in an incorrect implementation being committed. This commit fixes
the serialization, and adds unit tests as well as unmuting the BWC test that uncovered
the bug.
Fixes#60986