There is a race in this test where the index request will return
once the dynamic mapping update has been observed by the cluster
state observer internally used by the indexing but not hit all
state appliers and thus isn't showing up as the applied state returned
by `clusterService.state()` yet.
We have a special FetchPhaseExecutionException which contains some useful
information about which shard and doc a fetch phase has failed in. However, this
is not used in many places - currently only the ExplainPhase and the highlighters
throw one, and the FetchPhase itself catches IOExceptions and just passes them
to the ExceptionsHelper with no extra context.
This commit changes FetchPhase to throw FetchPhaseExecutionException if it
encounters problems in any of its subphases, and removes the special handling
from the explain and highlight phases. It also removes the need to pass shard ids
around when building HitContext objects.
The complexity of removing a timeout listener was `O(n)` which
means that in case of many queued up CS update tasks (such as in the
case of an avalanche of dynamic mapping updates) we're dealing with
quadratic complexity for timing out N tasks which was observed to be
an issue in practice.
This PR makes the complexity of timing out a task `O(1)` and generally
simplifies the iteration logic of listeners and applies to be a little
more efficient and inline better.
The case InnerHitBuilderTests#testEqualsAndHashcode creates a copy of the object
by serializing + deserializing it, then applies a modification. If the 'fields'
list is empty, then deserializing it results in Collections.emptyList. Because
this is immutable, then modifying it can throw an UnsupportedOperationException.
This PR takes the same approach as for docvalue_fields, where we create a new
list instead of trying to add to an empty one.
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation
Addresses #61949.
Followup to #61681:
- reuse the current iterator in `reset()` if possible
- simply some integer-overflow-avoidance in `skip()`
- clarify some comments
- address some IntelliJ warnings
Disabling the `query_string` queries `allow_leading_wildcard` parameter didn't
work after a change probably introduced in #60959 because the various field types
`wildcardQuery` don't check the leading characters like
QueryParserBase#getWildcardQuery does. This PR adds the missing check also
before calling the field types wildcard generating method.
Closes#62267
Just a number of obvious spots where we were allocating
duplicate empty structures or otherwise inefficient that I
found while investigating snapshot cluster state update performance.
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.
Closes#61627
Today some uncaught shard failures such as RejectedExecutionException skips the release of shard context
and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced,
this behavior can lead to missing data since scrolls always move forward.
In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure
occurs locally. The shard search context will no longer exist in subsequent scroll requests which will lead to consistent shard failures
in the responses.
This change also modifies the retry tests of the reindex feature. Reindex retries scroll search request that contains a shard failure and
move on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above.
That means that reindex will now report scroll failures when search rejection happen during the operation instead of skipping document
silently.
Finally this change removes an old TODO that was fulfilled with #61062.
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.
Relates #61446
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object.
This commit fixes this and adds tests for it.
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
The `fromId` method would show up in profiling and JIT analysis as not-inlinable because it's too large
in the contexts it's used in in many cases and was consuming a surprising amount of cycles for computing the
min compat versions.
-> extract cold path from `fromId` to make JIT happy and cache minimumg compatible versions to fields.
Avoiding a number of noop updates that were observed to cause trouble (as in needless noop CS publishing) which can become an issue when working with a large number of concurrent snapshot operations.
Also this sets up some simplifications made in the clone snapshot branch.
The hard bounds were incorrectly scaled for intervals, which was
causing incorrect buckets to show up or no buckets at all for
interval other than 1.
Closes#62126
The `global_ordinals` implementation of `terms` had a bug when
`min_doc_count: 0` that'd cause sub-aggregations to have array index out
of bounds exceptions. Ooops. My fault. This fixes the bug by assigning
ordinals to those buckets.
Closes#62084
Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly.
Especially when it comes to mixed master+data nodes and concurrent snapshots these
hurt delete operation performance needlessly.
As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback.
This commit fixes the call and adds a test for this scenario.
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.
With this commit we address that while also making sure that the description highlights that the task is originated from an async search.
Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams so I this commit adjusts the API
to just normal streams and adds the wrapping where necessary.
Kibana often highlights *everything* like this:
```
POST /_search
{
"query": ...,
"size": 500,
"highlight": {
"fields": {
"*": { ... }
}
}
}
```
This can get slow when there are hundreds of mapped fields. I tested
this locally and unscientifically and it took a request from 20ms to
150ms when there are 100 fields. I've seen clusters with 2000 fields
where simple search go from 500ms to 1500ms just by turning on this sort
of highlighting. Even when the query is just a `range` that and the
fields are all numbers and stuff so it won't highlight anything.
This speeds up the `unified` highlighter in this case in a few ways:
1. Build the highlighting infrastructure once field rather than once pre
document per field. This cuts out a *ton* of work analyzing the query
over and over and over again.
2. Bail out of the highlighter before loading values if we can't produce
any results.
Combined these take that local 150ms case down to 65ms. This is unlikely
to be really useful when there are only a few fetched docs and only a
few fields, but we often end up having many fields with many fetched
docs.
This also adds the ability to define a serialization check on Parameters, used
in this case to only serialize format and locale parameters if the mapper is a
date range.
Currently we open and close the checkpoint file channel for every fsync.
This file channel can be kept open for the lifecycle of a translog
writer. This avoids the overhead of opening the file, checking file
permissions, and closing the file on every fsync.
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.
In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.
Backport of #60371
The interface is never used as an abstraction - implementations are are called directly,
and most of them don't need to implement the preProcess method.
An important goal of the disk threshold decider is to ensure that nodes
use less disk space than the high watermark, and to take action if a
node ever exceeds this watermark. Today we do not have any
integration-style tests of this high-level behaviour. This commit
introduces a small test harness that can adjust the apparent size of the
disk and verify that the disk threshold decider moves shards around in
response.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Just a few random things to optimize motivated by somewhat sub-standard performance
for large snapshot cluster states with many concurrent snapshots observed in production.
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
to lookup every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).
Relates #51857
The null_value parameter for date fields is always parsed using DateFormatter.parseMillis,
which is incorrect for nanosecond resolution fields. This commit changes the parsing logic
to always use DateFieldType.parse() to parse the null value.
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.
We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.
Co-authored-by: Nik Everett <nik9000@gmail.com>
Flattening both streams into a single stream here saves a few objects and some indirection.
Also, removed the redundant `offset` field which added nothing but complexity by forcing the
incrementation of two counters on every read.
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added from their accidental inclusion in release builds.
Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.