Today some uncaught shard failures such as RejectedExecutionException skips the release of shard context
and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced,
this behavior can lead to missing data since scrolls always move forward.
In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure
occurs locally. The shard search context will no longer exist in subsequent scroll requests which will lead to consistent shard failures
in the responses.
This change also modifies the retry tests of the reindex feature. Reindex retries scroll search request that contains a shard failure and
move on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above.
That means that reindex will now report scroll failures when search rejection happen during the operation instead of skipping document
silently.
Finally this change removes an old TODO that was fulfilled with #61062.
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.
Relates #61446
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object.
This commit fixes this and adds tests for it.
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
CCR shard follow task can hit CircuitBreakingException on the leader
cluster (read changes requests) or the follower cluster (bulk requests).
CCR should retry on CircuitBreakingException as it's a transient error.
We were missing a few `@Override` annotations in runtime fields which
let us drift from the methods we were supposed to override. Oops. This
adds them and links the methods.
The `fromId` method would show up in profiling and JIT analysis as not-inlinable because it's too large
in the contexts it's used in in many cases and was consuming a surprising amount of cycles for computing the
min compat versions.
-> extract cold path from `fromId` to make JIT happy and cache minimumg compatible versions to fields.
For runtime fields we have written quite some lucene queries that work against runtime values that are the result of the execution of the different script contexts that runtime fields support.
The all (but one) share the same main logic: use a two phase iterator, iterate over all documents, and decide whether the current doc matches or not based on what the script returns. I went ahead and shared this bit of code in the base class for all queries on top of runtime fields.
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats
These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
Avoiding a number of noop updates that were observed to cause trouble (as in needless noop CS publishing) which can become an issue when working with a large number of concurrent snapshot operations.
Also this sets up some simplifications made in the clone snapshot branch.
Backporting #62205 to 7.x branch.
This is similar to what happens for indices. Initially we decided to let each test cleanup the
data streams it created.
The reason behind this was that client yaml test runners would need to be modified to do this too and
because data steams were new, we waited with that and let each test cleanup the data stream it created.
However we sometimes have very hard to debug test failures, because many tests fail because another test
failed mid way and didn't clean up the data streams it created. Given that and data streams exist in
the code base for a while now, we should automatically delete all data streams after each yaml test.
Relates to #62190
* preserve data streams for rolling upgrade yaml tests
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing. The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects. However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects. As a first
step it makes sense to move the returned mappings closer
to the standard format.
This is a small building block towards fixing #55616
- Extract distribution archives defaults into plugin
- Added basic test coverage
- Avoid packaging/unpackaging cycle when relying on locally build distributions
- Provide DSL for setting up distribution archives
- Cleanup archives build script
The hard bounds were incorrectly scaled for intervals, which was
causing incorrect buckets to show up or no buckets at all for
interval other than 1.
Closes#62126
Add test for item-level error when no write index defined for an alia…
Co-authored-by: Jake Landis <jake.landis@elastic.co>
Co-authored-by: bellengao <gbl_long@163.com>
This commit removes `integTest` task from all es-plugins.
Most relevant projects have been converted to use yamlRestTest, javaRestTest,
or internalClusterTest in prior PRs.
A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)
related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
This prevent `keyword` valued runtime scripts from emitting too many
values or values that take up too much space. Without this you can put
allocate a ton of memory with the script by sticking it into a tight
loop. Painless has some protections against this but:
1. I don't want to rely on them out of sheer paranoia
2. They don't really kick in when the script uses callbacks like we do
anyway.
Relates to #59332
The `global_ordinals` implementation of `terms` had a bug when
`min_doc_count: 0` that'd cause sub-aggregations to have array index out
of bounds exceptions. Ooops. My fault. This fixes the bug by assigning
ordinals to those buckets.
Closes#62084
* [ML] only persist progress if it has changed
We already search for the previously stored progress document.
For optimization purposes, and to prevent restoring the same
progress after a failed analytics job is stopped,
this commit does an equality check between the previously stored progress and current progress
If the progress has changed, persistence continues as normal.
The windows service script does a little munging of the parsed JVM
option string, converting whitespaces to semicolons. We recently added
an optional Java 14 JDK flag to our system JVM flags. On earlier JDKs,
the windows service batch script would encounter a double whitespace
when this option was missing and convert it into double semicolons.
Double semicolons, in turn, don't work in the arguments to the windows
service command, and led to a lot of JVM options being dropped,
including "es.path.conf", which is required for startup.
This commit puts in a double defense. First, it removes any empty-string
options from the system options list in the Java Options Parser code.
Second, it munges out double semicolons if they do appear in the parsed
option output.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
When a tree model is provided, it is possible that it is a stump.
Meaning, it only has one node with no splits
This implies that the tree has no features. In this case,
having zero feature_names is appropriate. In any other case,
this should be considered a validation failure.
This commit adds the validation if there is more than 1 node,
that the feature_names in the model are non-empty.
closes#60759
Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly.
Especially when it comes to mixed master+data nodes and concurrent snapshots these
hurt delete operation performance needlessly.
Fetch failures are currently tracked byy AsyncSearchTask like ordinary shard failures. Though they should be treated differently or they end up causing weird scenarios like total=num_shards and successful=num_shards as the query phase ran fine yet the failed count would reflect the number of shards where fetch failed.
Given that partial results only include aggs for now and are complete even if fetch fails, we can ignore fetch failures in async search, as they will be anyways included in the response. They are in fact either received as a failure when all shards fail during fetch, or as part of the final response when only some shards fail during fetch.
As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback.
This commit fixes the call and adds a test for this scenario.
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.
With this commit we address that while also making sure that the description highlights that the task is originated from an async search.
Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.