We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.
With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.
With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.
Closes#40059
Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever reducing aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips as each cluster ends up reducing its own pipeline aggregators locally while that should only be done by the CCS coordinating node later. This causes issues as after their reduction, pipeline aggs cannot be further reduced, which is what happens with CCS causing errors like "java.lang.UnsupportedOperationException: Not supported" being returned.
Each coordinating node should rather honour the reduce context flag that
indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone.
Note that his bug affects only pipeline aggs that don't have a parent in
the aggs tree, while all the others work well.
Relates to #40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.
When minimizing round-trips, each cluster returns its own independent
search response. In case sort by field and/or field collapsing were
requested, when one cluster has no results to return, the information
about the field that sorting was based on (SortField array) as well as
the field (and the values) that collapsing was performed on are missing
in the search response. That causes problems as we can't build the
proper `TopDocs` instance which would need to be either `TopFieldDocs`
or `CollapseTopFieldDocs`. The merge routine expects that all the top
docs are of the same exact type which can't be guaranteed. Given that
the problematic results are empty, hence have no impact on the final
results, we can simply skip them.
Relates to #32125Closes#40067
This commit clarifies how the gateway selection works when configuring
remote clusters for CCR or CCS. Specifically, it clarifies compatibility
between different versions which is a very common question.
When using DFS_QUERY_THEN_FETCH search type, the dfs phase is run and
its results are used in the query phase to make scoring accurate.
When using CCS, depending on whether the DFS phase runs in the CCS
coordinating node (like if all shards were local) or in each remote
cluster (when minimizing round-trips), scoring will differ.
This commit disables minimizing round-trips whenever DFS is requested,
as it is not currently possible to ensure that scoring is accurate in
that case.
Relates to #32125
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see #37748 and #37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
The setup-passwords tool gives cryptic messages in case where custom discovery providers are
used (see #33580). As the URL auto-detection logic should be seen as best effort, this commit
improves the exception message to make it clearer what needs to be done to fix the issue.
Relates #33580
This named writable was never registered, so it means that we could not
read auto-follow patterns that were registered in the cluster
state. This causes them to be lost on restarts, a bad bug. This commit
addresses this by registering this named writable, and we add a basic
CCR restart test to ensure that CCR keeps functioning properly when the
follower is restarted.
The cache used in linearizability checker now uses approximately 6x less
memory by changing the cache from a set of (bits, state) tuples into a
map from bits -> { state }.
Each combination of states is kept once only, building on the
assumption that the number of state permutations is small compared to
the number of bits permutations. For those histories that are difficult
to check we will have many bits combinations that use the same state
permutations.
We end up now using approximately 15 bytes per entry compared to 101
bytes before, ie. a 6x improvement, allowing us to linearizability check
significantly longer histories.
Re-enabled linearizability checker in CoordinatorTests, hoping above
ensures we no longer run out of memory.
Resolves#39437
The Migration Assistance API has been functionally replaced by the
Deprecation Info API, and the Migration Upgrade API is not used for the
transition from ES 6.x to 7.x, and does not need to be kept around to
repair indices that were not properly upgraded before upgrading the
cluster, as was the case in 6.
We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped
using it since #25692 (6.0). This change removes that action and related
code in 7.x and 8.0.
Relates #19287
Relates #25692
When the method ensureGreen in QA tests is timed out, it does not
provide enough info for us to investigate why the testing index is
not green yet. With this change, we will dump the cluster state if
ensureGreen timed out.
Relates #32027
This PR introduces AsyncRecoveryTarget which executes remote calls of
peer recovery asynchronously. In this change, we also add a new
assertion to ensure that method sendBatch, which sends a batch of
history operations in phase2, is never called recursively on the same
thread. This new assertion will also be used in method sendFileChunks.
This commit unmutes NetworkDisruptionIT.
It makes changes necessary for Zen2 - avoids usage of
autoMinMasterNodes and selects cluster size, such that there is no
need to call AddVotingExclusion.
This test also introduces refactors a single method
prepareDistruptedCluster to be used by both test methods.
Unfortunately, NetworkDisruption is broken and the
testNetworkPartitionRemovalRestoresConnections "is fixed" by
introducing assertBusy - #38348.
Relates #36205
Relates #38348
(cherry picked from commit 97707c7f892636e5b75c3df546b067414acb27cd)
This commit adds cd $ES_HOME to elasticsearch-env and removes it from
elasticsearch. This way, both elasticsearch and elasticsearch-cli are
executed with the working directory set to $ES_HOME. The need for the
fix arose from the following bug:
1. Explicitly set path.data to relative to ES_HOME path in
elasticsearch.yml.
2. Run elasticsearch from any directory. Elasticsearch is able to
correctly start.
3. Stop elasticsearch.
4. Run elasticsearch-node unsafe-bootstrap, not from ES_HOME directory.
It will fail with an exception.
This commit fixes the issue and adds a new test.
This PR fixes the issue and adds a new test.
Also tests >=100 are renamed because alphabetic order does not work for
them.
(cherry picked from commit 2ffc29306ff7366efc598e7b4dd2ce528895cd3a
with fixes by #40083 and #40118)
We were leaking a reference to an AutoFollowCoordinator during
construction, violating safe publication according to the JLS
specification. This commit addresses this by waiting to register
AutoFollowCoordinator with the ClusterApplierService after the
AutoFollowCoordinator is fully constructed. We also remove ourselves as
a listener when stopping.
* Take into consideration aliases that can be used as aggregates
and in the ORDER BY element so that the groupings are re-ordered inside
the composite aggregation according to the ORDER BY ordering.
(cherry picked from commit 110c0b90b9cf2e9344ab3f412cfa8f8cd94ad71f)
For cases where fields can have multi values, allow the behavior to be
customized through a dedicated configuration field.
By default this will be enabled on the drivers so that existing datasets
work instead of throwing an exception.
For regular SQL usage, the behavior is false so that the user is aware
of the underlying data.
Fix#39700
(cherry picked from commit 2b351571961f172fd59290ee079126bbd081ceaf)
When shutting down a node, auto-followers will keep trying to run. This
is happening even as transport services and other components are being
closed. In some cases, this can lead to a stack overflow as we rapidly
try to check the license state of the remote cluster, can not because
the transport service is shutdown, and then immeidately retry
again. This can happen faster than the shutdown, and we die with stack
overflow. This commit adds a stop command to auto-followers so that this
retry loop occurs at most once on shutdown.
This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by
sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin.
This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed.
Closes#30758
The `GET /_cluster/state` API returns an internal representation of the cluster
state that does change from version to version. It's useful for debugging, but
it is not intended for regular use by clients.
This change adjusts the documentation of `GET /_cluster/state` to clarify that
this API yields an internal representation that should not be expected to
remain stable between versions.
Relates #40061, #40016
This change adds an option to the `FieldSortBuilder` that allows to transform the type
of a numeric field into another. Possible values for this option are `long` that transforms
the source field into an integer and `double` that transforms the source field into a floating point.
This new option is useful for cross-index search when the sort field is mapped differently on some
indices. For instance if a field is mapped as a floating point in one index and as an integer in another
it is possible to align the type for both indices using the `numeric_type` option:
```
{
"sort": {
"field": "my_field",
"numeric_type": "double" <1>
}
}
```
<1> Ensure that values for this field are transformed to a floating point if needed.
This commit enables full-cluster-restart and rolling-upgrade tests
to run with nodes using a JVM in fips approved only node by using
PEM key material instead of a JKS for the transport layer in that
case.
With SUN security provider, a CertificateException is thrown when
attempting to parse a Certificate from a PEM file on disk with
`sun.security.provider.X509Provider#parseX509orPKCS7Cert`
When using the BouncyCastle Security provider (as we do in fips
tests) the parsing happens in
CertificateFactory#engineGenerateCertificates which doesn't throw
an exception but returns an empty list.
In order to have a consistent behavior, this change makes it so
that we throw a CertificateException when attempting to read
a PEM file from disk and failing to do so in either Security
Provider
Resolves: #39580
`SecurityIndexManager` is hardcoded to handle only the `.security`-`.security-7` alias-index pair.
This commit removes the hardcoded bits, so that the `SecurityIndexManager` can be reused
for other indices, such as the planned security tokens index (`.security-tokens-7`).
This change adjusts the LDAP connection timeout for retrieving
attributes while performing the SAML IT to 5 seconds, from 5 ms
that it previously was.
Resolves: #40025
When an auto-follower coordinator times out waiting for the remote
cluster state, we do not log any indication of this. While this is
expected behavior in quiet deployments, it is still useful to see this
information for tracing the behavior of the auto-follow
coordinator. This commit adds a trace log message indicating that the
timeout.
This commit removes the cluster state size field from the cluster state
response, and drops the backwards compatibility layer added in 6.7.0 to
continue to support this field. As calculation of this field was
expensive and had dubious value, we have elected to remove this field.
Since other classes besides intervals can be serialized as part of
the Cursor, the getNamedWritables method should be moved from Intervals
to a more generic class Literals.
Relates to #39973
Currently, we maintain a transport name ("mock-nio", "nio", "netty")
that is passed to a `TcpTransportChannel` when a request is received.
The value of this name is to associate with the task when we register a
task with the task manager. However, it is only possible to run ES with
one transport, so having an implementation specific name is unnecessary.
This commit removes the name and replaces it with the generic
"transport".
Currently if a field alias is updated, any percolator queries that contain the
alias will still refer to its old target. This PR documents the issue while we
look into addressing it.
Relates to #37212.
The `sampler` agg creates a BestDocsDeferringCollector, which internally
initializes a priority queue of size `shardSize`. This queue is
populated with empty `Object` sentinels, which is roughly 16b per
object.
Similarly, the Diversified samplers create a DiversifiedTopDocsCollectors
which internally track PQ slots with ScoreDocKeys, weighing in around
28kb
If the user sets a very abusive `shard_size`, this could easily OOM
a node or cluster since these PQ are allocated up-front without
any checks.
This commit makes sure that when we create the collector, it
cannot be larger than the maxDoc so that we don't accidentally blow
up the node. We ensure the size is not greater than the overall
index maxDoc. A similar treatment is done for `maxDocsPerValue`
parameter of the diversified samplers
For good measure, this also adds in some CB accounting to try and track
memory usage.
Finally, a redundant array creation is removed to reduce a bit of
temporary memory.