DfsPhase captures terms used for scoring a query in order to build global term statistics across
multiple shards for more accurate scoring. It currently does this by building the query's `Weight`
and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()`
for each collected term. This duplicates work, however, as the various `Weight` implementations
will already have collected these statistics at construction time.
This commit replaces this round-about way of collecting stats, instead using a delegating
IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()`
is called from the Weight.
It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term
statistics, but ignore field statistics. `Rescorer.extractTerms` has been removed, and replaced with
a new method on `RescoreContext` that returns any queries used by the rescore implementation.
The delegating IndexSearcher then collects term contexts and statistics in the same way described
above for each Query.
With this commit we add the `.cfs` file extension to the list of file
types that are memory-mapped by hybridfs. `.cfs` files combine all files
of a Lucene segment into a single file in order to save file handles. As
this strategy is only used for "small" segments (less than 10% of the
shard size), it is benefical to memory-map them instead of accessing
them via NIO.
Relates #36668
Today if soft deletes are enabled then we read the operations needed for peer
recovery from Lucene. However we do not currently make any attempt to retain
history in Lucene specifically for peer recoveries so we may discard it and
fall back to a more expensive file-based recovery. Yet we still retain
sufficient history in the translog to perform an operations-based peer
recovery.
In the long run we would like to fix this by retaining more history in Lucene,
possibly using shard history retention leases (#37165). For now, however, this
commit reverts to performing peer recoveries using the history retained in the
translog regardless of whether soft deletes are enabled or not.
Previously, if a version conflict occurred and a previous primary
response was present, the original primary response would be used both
for sending to replica and back to client. This was made in the past as an
attempt to fix issues with conflicts after relocations where a bulk request
would experience a closed shard half way through and thus have to retry
on the new primary. It could then fail on its own update.
With sequence numbers, this leads to an issue, since if a primary is
demoted (network partitions), it will send along the original response
in the request. In case of a conflict on the new primary, the old
response is sent to the replica. That data could be stale, leading to
inconsistency between primary and replica.
Relocations now do an explicit hand-off from old to new primary and
ensures that no operations are active while doing this. Above is thus no
longer necessary. This change removes the special handling of conflicts
and ignores primary responses when executing shard bulk requests on the
primary.
Prior to this commit, when an indexing operation resulted in an
`Engine.Result.Type.MAPPING_UPDATE_REQUIRED`, TransportShardBulkAction
immediately retries the indexing operation to see if it succeeds. In the event
that it succeeds the context does not wait until the mapping update has
propagated through the cluster state before finishing the indexing.
In some of our tests we rely on mappings being available as soon as they've been
introduced in a document that indexed correctly. By removing the immediate retry
we always wait for this to be the case.
Resolves#38428
Supercedes #38579
Relates to #38711
One of the test methods wasn't run because it was private. Making this method
public and fixing some issues around mocking the threadpool that otherwise would
lead to an NPE.
This changes the output of the `_cat/indices` API with `Security` enabled.
It is possible to only display the index name (and possibly the index
health, depending on the request options) but not its stats (doc count, merges,
size, etc). This is the case for closed indices which have index metadata in the
cluster state but no associated shards, hence no shard stats.
However, when `Security` is enabled, and the request contains wildcards,
**open** indices without stats are a common occurrence. This is because the
index names in the response table are picked up directly from the cluster state
which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the
stats data which is populated by the indices stats API which does go through the
index name resolver.
This is a bug, because it is circumventing `Security`'s function to hide
unauthorized indices.
This has been fixed by displaying the index names as they are resolved by the indices
stats API. The outputs of these two APIs is now very similar: same index names,
similar data but different format.
Closes#37190
#37767 changed the expected exception for "no such cluster" error from
`IllegalStateException` to a dedicated `NoSuchRemoteClusterException`.
An assertion in `testCollectNodes` needs to be updated accordingly.
When a primary shard is recovered from its store, we trim the last
commit (when it's unsafe). If that primary crashes before the recovery
completes, we will lose the committed retention leases because they are
baked in the last commit. With this change, we copy the retention leases
from the last commit to the safe commit when trimming unsafe commits.
Relates #37165
In this case, we were incrementing the policy too much. This means on
every iteration we actually keep increasing the minimum retained
sequence number, even with leases in place. It was a bug from when the
soft deletes policy had retention leases incorporated into it. This
commit fixes this bug by ensuring we only increment in the proper
places, and adds careful tests for the various situations.
Forward port of https://github.com/elastic/elasticsearch/pull/38757
This change reverts the initial 7.0 commits and replaces them
with the 6.7 variant that still allows for the ecs flag.
This commit differs from the 6.7 variants in that ecs flag will
now default to true.
6.7: `ecs` : default `false`
7.x: `ecs` : default `true`
8.0: no option, but behaves as `true`
* Revert "Ingest node - user agent, move device to an object (#38115)"
This reverts commit 5b008a34aa.
* Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)"
This reverts commit cac6b8e06f.
* cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57
Add ECS schema for user-agent ingest processor (#37727)
* cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099
Ingest node - user agent, move device to an object (#38115) (#38121)
* cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes)
Dep. check for ECS changes to User Agent processor (#38362)
* make true the default for the ecs option, and update 7.0 references and tests
The hardcoded '\n' in string will not work in Windows where there is a
different line separator. A System.lineSeparator should be used to make
it work on all platforms
closes#38705
backport #38771
- Disables the request cache on the test, to prevent cached
values from potentially interfering with test results
- Changes the test to execute a single query, in hopes of making
failures more reproducible
Backport of #38583
There were two documents (seq=2 and seq=103) missing on the follower in
one of the failures of `testFailOverOnFollower`. I spent several hours
on that failure but could not figure out the reason. I adjust log and
unmute this test so we can collect more information.
Relates #38633
We need to use the current primary term instead of 1L for the initial
retention leases; otherwise, the primary term of the committed
retention leases won't match the current primary term if the
retention leases never gets updated.
This commit introduces actions for some common retention lease
operations that clients need to be able to perform remotely. These
actions include add/renew/remove.
fix tests to use clock in milliseconds precision in watcher code
make sure the date comparison in string format is using same formatters
some of the code was modified in #38514 possibly because of merge conflicts
closes#38581
Backport #38738
A recent test failure triggered an edge case scenario where failures may be coming back with the same shard id, yet from different clusters.
This commit adapts the failures comparator to take the cluster alias into account when merging failures as part of CCS requests execution.
Also the corresponding test has been split in two: with and without
search shard target set to the failure.
Closes#38672
When a retention lease already exists on an add retention lease
invocation, or a retention lease is not found on a renew retention lease
invocation today we throw an illegal argument exception. This puts a
burden on the caller to catch that specific exception and parse the
message. This commit relieves the burden from the caller by adding
dedicated exception types for these situations.
This commit introduces the ability to remove retention leases. Explicit
removal will be needed to manage retention leases used to increase the
likelihood of operation-based recoveries syncing, and for consumers such
as ILM.
geo_shape indexes created before 6.6 use geohash string encoding as default tree parameter and quadtree encoding for 6.6 and later. This commit fixes bwc to use geohash encoding in LegacyGeoshapeFieldMapper for indexes created before 6.6.
The assertion that the stats2 map is empty in
IndicesQueryCache.close has been observed to
fail very occasionally in internal cluster tests.
The likely cause is a cross-thread visibility
problem for a count variable. This change
makes that count volatile.
Relates #37117
Backport of #38714
The Close Index API has been refactored in 6.7.0 and it now performs
pre-closing sanity checks on shards before an index is closed: the maximum
sequence number must be equals to the global checkpoint. While this is a
strong requirement for regular shards, we identified the need to relax this
check in the case of CCR following shards.
The following shards are not in charge of managing the max sequence
number or global checkpoint, which are pulled from a leader shard. They
also fetch and process batches of operations from the leader in an unordered
way, potentially leaving gaps in the history of ops. If the following shard lags
a lot it's possible that the global checkpoint and max seq number never get
in sync, preventing the following shard to be closed and a new PUT Follow
action to be issued on this shard (which is our recommended way to
resume/restart a CCR following).
This commit allows each Engine implementation to define the specific
verification it must perform before closing the index. In order to allow
following/frozen/closed shards to be closed whatever the max seq number
or global checkpoint are, the FollowingEngine and ReadOnlyEngine do
not perform any check before the index is closed.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Added a constructor accepting `StreamInput` as argument, which allowed to
make most of the instance members final as well as remove the default
constructor.
Removed a test only constructor in favour of invoking the existing
constructor that takes a `SearchRequest` as first argument.
Also removed profile members and related methods as they were all unused.
The existing formatter being used was not on par with the joda formatter
as it was missing the ability to parse a comma as a separator between
seconds and milliseconds.
While a real iso8601 would be much more complex, this might be
sufficient for some more use-cases.
The ingest date formatter now also uses the iso8601 formatter by
default.
Closes#38345
The benchmarks showed a sharp decrease in aggregation performance for
the UTC case.
This commit uses the same calculation as joda time, which requires no
conversion into any java time object, also, the check for an fixedoffset
has been put into the ctor to reduce the need for runtime calculations.
The same goes for the amount of the used unit in milliseconds.
Closes#37826
Whenever phase failure is raised in AbstractSearchAsyncAction, we go and
release search contexts of shards that successfully returned their
results, prior to notifying the listener of the failure. In case we are
executing a CCS request, it's important to look-up the connection to
send the release context request to.
This commit makes sure that the lookup takes the cluster alias into
account. We used to use `null` at all times instead which is not correct
and was not caught as any exception is caught without re-throwing it.
The test was relying on toString in ZonedDateTime which is different to
what is formatted by strict_date_time when milliseconds are 0
The method is just delegating to dateFormatter, so that scenario should
be covered there.
closes#38359
Backport #38610
Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR.
The implementation uses the parallel file writer functionality that is also used by peer recoveries.
This commit adds the 7.1 version constant to the 7.x branch.
Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
This commit changes the `TransportVerifyShardBeforeCloseAction` so that it
always forces the flush of the shard. It seems that #37961 is not sufficient to
ensure that the translog and the Lucene commit share the exact same max
seq no and global checkpoint information in case of one or more noop
operations have been made.
The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest`
and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` test this trivial
situation and they both fail 1 on 10 executions.
Relates to #33888
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
This commit integrates retention leases with recovery. With this change,
we copy the current retention leases on primary to the replica during
phase two of recovery. At this point, the replica will have been added
to the replication group and so is already receiving retention lease
sync requests from the primary. This means that if any retention lease
syncs are triggered on the primary after we sample the retention leases
here during phase two, that sync request will also arrive on the replica
ensuring that the replica is from this point on up to date with the
retention leases on the primary. We have to copy these during phase two
since we will be applying indexing operations, potentially triggering
merges, and therefore must ensure the correct retention leases are in
place beforehand.
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.
This is a continuation of #38014
Fixed NPE that caused CCR tests (IndexFollowingIT and likely others)
to fail.
schedule could bubble rejected exception to uncaught exception
handler when not using SAME executor if thread pool is terminated.
Now ignore rejected exception silently if executor is shutdown.
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:
> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document
We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem.
This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.
Relates to #1078
This commit lifts the control of when retention leases are expired to
index shard. In this case, we move expiration to an explicit action
rather than a side-effect of calling
ReplicationTracker#getRetentionLeases. This explicit action is invoked
on a timer. If any retention leases expire, then we hard sync the
retention leases to the replicas. Otherwise, we proceed with a
background sync.