Lucene TopDocs collector are now able to early terminate the collection
based on the index sort. This change plugs this new functionality directly in the
query phase instead of relying on a dedicated early terminating collector.
These templates were generated with 5.0. We need those generated with 6.0 since
we do not guarantee compatibility with previous versions of the template anyway.
I removed the winlogbeat template which is a bit harder to generate as it requires
Windows, since we do not aim to be exhaustive.
This is related to #27802. This commit adds a jar called
elasticsearch-nio that contains the base nio classes that will be used
for the tcp nio transport and eventually the http nio transport.
The jar does not depend on elasticsearch:core, so all references to core
have been removed.
JDK 10 has gone fully-modular. This means that:
- when targeting JDK 8 with a JDK 10 compiler, we have to use the full
profile
- when targeting JDK 10 with a JDK 10 compiler, we have to use
-add-modules java.base
Today we only target JDK 8 (our minimum compatibility version) so we
need to change the compiler flags conditional on using a JDK 10
compiler. This commit does that.
Relates #27884
Today when we get a metadata snapshot directly from a store directory,
we acquire a metadata lock, then acquire an IndexWriter lock. However,
we create a CheckIndex in IndexShard without acquiring the metadata lock
first. This causes a recovery failed because the IndexWriter lock can be
still held by method snapshotStoreMetadata. This commit makes sure to
create a CheckIndex under the metadata lock.
Closes#24481Closes#27731
Relates #24787
Previously to this change when DocStats are added together (for example when adding the index size of all primary shards for an index) we naively added the `totalSizeInBytes` together. This worked most of the time but not when the index size on one or multiple shards was reported to be `-1` (no value).
This change improves the logic by considering if the current value or the value to be added is `-1`:
* If the current and new value are both `-1` the value remains at `-1`
* If the current value is `-1` and the new value is not `-1`, current value is changed to be equal to the new value
* If the current value is not `-1` and the new value is `-1` the new value is ignored and the current value is not changed
* If both the current and new values are not `-1` the current value is changed to be equal to the sum of the current and new values.
The change also re-enables the failing rollover YAML test that was failing due to this bug.
TestZenDiscovery is used to allow discovery based on in memory structures. This isn't a relevant for the cloud providers tests (but isn't a problem at the moment either)
We used to shrink the version map under an external lock. This is
quite ambigious and instead we can simply issue an empty refresh to
shrink it.
Closes#27852
Today we prevent that the same thread acquires the same lock more than once.
This restriction is a relict form the early days of this concurrency construct
and can be removed.
The last operation executed in IndicesClientDocumentationIT.testCreate()
is an asynchronous index creation. Because nothing waits for its
completion, on slow machines the index can sometimes be created after
the testCreate() test is finished, and it can fail the following test.
Closes#27754
While the LiveVersionMap is an internal class that belongs to the engine we do
rely on some external locking to enforce the desired semantics. Yet, in tests
we mimic the outer locking but we don't have any way to enforce or assert on
that the lock is actually hold. This change moves the KeyedLock inside the
LiveVersionMap that allows the engine to access it as before but enables
assertions in the LiveVersionMap to ensure the lock for the modifying or
reading key is actually hold.
Currently FiltersAggregationBuilder#doRewrite creates a new FiltersAggregationBuilder which doesn't correctly copy the original "keyed" field if a non-keyed filter gets rewritten.
This can cause rendering bugs of the output aggregations like the one reported in #27841.
Closes#27841
Elasticsearch offers a number of http requests that can take a while to
execute. In #27713 we introduced an http read timeout that defaulted to
30 seconds. This means that if no reads happened for 30 seconds (even
after a request is received), the connection would be closed due to
timeout.
This commit disables the read timeout by default to allow us to evaluate
the impact of read timeouts and to avoid introducing distruptive
behavior.
Currently, method corruptTranslogFiles corrupts some translog files
whose translog_gen are at least the min_required_translog_gen from the
translog checkpoint. However this condition is not enough for
recoverFromTranslog to be always failed. If we corrupt only translog
operations from only translog files whose translog_gen are smaller than
the min_translog_gen of a recovering index commit, recoverFromTranslog
will be ok as we won't read translog operations from those files.
This commit makes sure corruptTranslogFiles to corrupt some translog
files that will be used in recoverFromTranslog.
Closes#27538
When the first parameter of `ESTestCase#randomValueOtherThan` is `null`
then run the supplier until it returns non-`null`. Previously,
`randomValueOtherThan` just ran the supplier one time which was
confusing.
Unexpectedly, it looks like not tests rely on the original `null`
handling.
Closes#27821
A FieldCapabilities request can cover multiple indices (or aliases pointing to multiple indices).
When rewriting the request for each index, store the original requested indices.
We currently have a complicated port assignment scheme to make sure that the nodes span off by the internal test cluster will be assigned fixed port ranges that will also not collide between clusters. The port ranges need to be fixed in advance so that the nodes will be able to find each other via `UnicastZenPing`.
This approach worked well for the last few years but we are now at a point that our testing has grown beyond it and we exceed the 5 reusable ranges per JVM. This means that nodes are not always assigned the first 5 ports in their range which causes cluster formation issues. On top of that, most of the clusters that are span up don't even rely on `UnicastZenPing` but rather `MockZenPings` that uses in memory maps for discovery (with the down side that they are not influenced by network disruption simulations).
This PR changes `InternalTestCluster` to use port 0 as a fixed assignment. This will allow the OS to manage ports and will ensure we don't have collisions. For tests that need to simulate network disruptions (and thus can't use `MockZenPings`), a new `UnicastHostProvider` is introduced that is based on the current state of the test cluster. Since that is only resolved at run time, it is aware of the port assignments of the OS.
Closes#27818Closes#27762
This commit moves GlobalCheckpointTracker from the engine to IndexShard, where it better fits logically: Tracking the global checkpoint based on the local checkpoints of all shards in the replication group is not a property of the engine, but rather a property fulfilled by the current primary shard. The LocalCheckpointTracker on the other hand is driven by the contents of the local translog. By moving GlobalCheckpointTracker to IndexShard, it makes little sense to keep the SequenceNumbersService class around - it would only wrap the LocalCheckpointTracker. This commit therefore removes the class and replaces occurrences of SequenceNumbersService in the engine directly by LocalCheckpointTracker.
AnalysisFactoryTestCase checks that the ES custom token filter multi-term
awareness matches the underlying lucene factory. For the trim filter this
won't be the case until LUCENE-8093 is released in 7.3, so we add a
temporary exclusion
Closes#27310
This commit fixes the version tests for release tests. The problem here
is that during release tests all version should be treated as released
so the assertions must be modified accordingly.
Relates #27815
When snapshotting the primary we capture a lucene commit at an arbitrary moment from a sequence number perspective. This means that it is possible that the commit misses operations and that there is a gap between the local checkpoint in the commit and the maximum sequence number.
When we restore, this will create a primary that "misses" operations and currently will mean that the sequence number system is stuck (i.e., the local checkpoint will be stuck). To fix this we should fill in gaps when we restore, in a similar fashion to normal store recovery.
Currently randomNonNegativeLong() returns 0 half as often as any positive long,
but random number generators are typically expected to return
uniformly-distributed values unless otherwise specified. This fixes this issue
by mapping Long.MIN_VALUE directly onto 0 rather than resampling.