Primary shard allocation observes limits in forcing allocation
Previously, during primary shards allocation of shards
with prior allocation IDs, if all nodes returned a
NO decision for allocation (e.g. the settings blocked
allocation on that node), we would chose one of those
nodes and force the primary shard to be allocated to it.
However, this meant that primary shard allocation
would not adhere to the decision of the MaxRetryAllocationDecider,
which would lead to attempting to allocate a shard
which has failed N number of times already (presumably
due to some configuration issue).
This commit solves this issue by introducing the
notion of force allocating a primary shard to a node
and each decider implementation must implement whether
this is allowed or not. In the case of MaxRetryAllocationDecider,
it just forwards the request to canAllocate.
Closes#19446
Squashes all the subpackages of `org.elasticsearch.rest.action` down to
the following:
* `o.e.rest.action.admin` - Administrative actions
* `o.e.rest.action.cat` - Actions that make tables for `grep`ing
* `o.e.rest.action.document` - Actions that act on documents
* `o.e.rest.action.ingest` - Actions that act on ingest pipelines
* `o.e.rest.action.search` - Actions that search
I'm tempted to merge `search` into `document` but the `document`
package feels fairly complete as is and `Suggest` isn't actually always
about documents either....
I'm also tempted to merge `ingest` into `admin.cluster` because the
latter contains the actions for dealing with stored scripts.
I've moved the `o.e.rest.action.support` into `o.e.rest.action`.
I've also added `package-info.java`s to all packges in `o.e.rest`. I
figure if the package is too small to deserve a `package-info.java` file
then it is too small to deserve to be a package....
Also fixes checkstyle in all moved classes.
* Enable BoostingQuery with FVH highlighter
* apply boost with negativeBoost
* flatten boosting query with its own boost and update boost query to a single layer
* Assign scroll keepAlive when deserializing
The scroll time value was never assign when deserializing from the transport layer, meaning that it would always be null when received from another node, although the originating search request might have it set to some value.
* add tests for SearchRequest serialization and fail fast with illegal arguments
To ease testing, also introduced equals, hashcode and toString methods in SearchRequest and Scroll.
The serialization test brought up a few wrong assumptions about non null instance members, for which some null checks were needed to avoid NPEs when serializing.
* make Scroll implement Writeable rather than Streamable
* [TEST] add serialization test for ShardSearchTransportRequest
This also covers ShardSearchLocalRequest implicitly as most of the serialization code is in it.
that have analyzer aliases in their analysis settings will still work, but
any attempts to create an alias for analyzers in newly created indices
will result in an IllegalArgumentException.
As a result, the setting `index.analysis.analyzer.{analyzerName}.alias` is
no longer supported.
Closes#18244
The term persisted task was used to indicate that a task should store its results upon its completion. We would like to use this term to indicate that a task can survive restart of nodes instead. This commit removes usages of the term "persist" when it means store results.
This commit fixes the number of max local storage nodes setting used in
the discovery disruption tests. In some cases (randomly but rarely), the
acked indexing test can run with five nodes instead of three, breaching
the max local storage nodes configuration.
As the most complicated `FetchSubPhase` highlighting gets its own package
(`o.e.seach.fetch.subphase.highlight`. No other `FetchSubPhase`s get their
own package. Instead they all reside together in `o.e.search.fetch.subphase`.
Add package descriptions to `o.e.search.fetch` and subpackages.
This commit adds a function to shard-level query result to determine whether
there are any hits that needs fetching. Currently, a shard-level query result
can have hits when there are search hits and/or completion suggestion hits.
The newly added function encapsulates the checks to determine if a shard-level
query result has any fetchable hits, which is used in optimizing for sorting
documents and releasing search request contexts.
If a primary fails, an active replica is promoted to primary. Once we do the promotion, however, we are sure that the active replica is not relocating anymore. The reason is that when the primary fails, we first remove/cancel all initializing replicas (also if they are relocation targets). This is the only safe thing to do anyhow, because promoting relocating replica to primary would also mean that the replica recovery of the replica relocation target is suddenly promoted to primary relocation, which the recovery code treats in a different way.
This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greather than one is dangerous
as users sometimes end up unknowingly starting a second node and start
thinking that they have encountered data loss.
Relates #19964
ContextIndexSearcher#explain ignores the dfs data to create the normalized weight.
This change fixes this discrepancy by using the dfs data to create the normalized weight when needed.
This commit separates the description of the links in the network that are to be disrupted from the failure that is to be applied to the links (disconnect/unresponsive/delay). Previously we had subclasses for the various kind of network disruption schemes combining on one hand failure mode (disconnect/unresponsive/delay) as well as the network links to cut (two partitions / bridge partitioning) into a single class.
Reducing the ping timeouts on a test that does not simulate network failures can cause node disconnects within the test on a slow CI machine.
The test testSearchWithRelocationAndSlowClusterStateProcessing does not expect such disconnects, leading to shard relocation in the test to abort prematurely.
Today in the uncaught exception handler, we attempt to halt the virtual
machine on fatal errors. Yet, halting the virtual machine requires
privileges which might not be granted to the caller when the exception
is thrown for example from a scripting engine. This means that if an
OutOfMemoryError or another fatal error is hit inside a script, the
virtual machine will not exit because the halt call will be denied for
securiry privileges. In this commit, we mark this halt call as trusted
so that the virtual machine can be halted if a fatal error is
encountered in a script.
Relates #19923
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, eg. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.