We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.
This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:
```
POST /_cluster/reroute?retry_failed=true
{
"commands": [
{
"allocate_replica": {
"index": "blahblah",
"shard": 2,
"node": "node-4"
}
}
]
}
```
Fixes#39546
* Take into consideration a wider range of Numbers when extracting the
values from source, more specifically - BigInteger and BigDecimal.
(cherry picked from commit 561b8d73dd7b03c50242e4e3f0128b2142959176)
This commit refactors put mapping request validation for reuse. The
concrete case that we are after here is the ability to apply effectively
the same framework to indices aliases requests. This commit refactors
the put mapping request validation framework to allow for that.
Fsyncing directories on Windows is not possible. We always suppressed
this by allowing that an AccessDeniedException is thrown when attemping
to open the directory for reading. Yet, this suppression also allowed
other IOExceptions to be suppressed, and that was a bug (e.g., the
directory not existing, or a filesystem error and reasons that we might
get an access denied there, like genuine permissions issues). This
leniency was previously removed yet it exposed that we were suppressing
this case on Windows. Rather than relying on exceptions for flow control
and continuing to suppress there, we simply return early if attempting
to fsync a directory on Windows (we will not put this burden on the
caller).
This commit fixes a test bug in the request validators random test. In
particular, an assertion was not properly nested in a guard that would
ensure that was at least one failure.
Relates #43000
When applying put mapping validators, we apply all the validators in the
collection. If a failure occurs, we collect that as a top-level
exception, and suppress any additional failures into the top-level
exception. However, if a request passes the validator after a top-level
exception has been collected, we would try to suppress a null exception
into the top-level exception. This is a violation of the
Throwable#addSuppressed API. This commit addresses this, and adds test
to cover the logic of collecting the failures when validating a put
mapping request.
Today we test for translog corruption by incrementing a byte by 1 somewhere in
a file, and verify that this leads to a `TranslogCorruptionException`.
However, we rely on _all_ corruptions leading to this exception in the
`RemoveCorruptedShardDataCommand`: this command fails if a translog file
corruption leads to a different kind of exception, and `EOFException` and
`NegativeArraySizeException` are both possible. This commit strengthens the
translog corruption detection tests by simulating the following:
- a random value is written
- the file is truncated
It also makes sure that we return a `TranslogCorruptionException` in all such
cases.
Fixes#42661
Backport of #42744
This code has not been needed since the removal of tribe nodes, it was
left behind when those were dropped (note that regular transport
permissions are handled through transport profiles, even if they are not
explicitly in use).
Today in the method IOUtils#fsync we ignore IOExceptions when fsyncing a
directory. However, the catch block here is too broad, for example it
would be ignoring IOExceptions when we try to open a non-existant
file. This commit addresses that by scoping the ignored exceptions only
to the invocation of FileChannel#force.
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
Relates #24344
Today we assert that the connection thread is blocked by the time the test gets
to the barrier, but in fact this is not a valid assertion. The following
`Thread.sleep()` will cause the test to fail reasonably often.
```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
if (connectionBlock != null) {
try {
+ Thread.sleep(50);
connectionBlock.run();
} catch (Exception e) {
throw new AssertionError(e);
```
This change relaxes the test to allow some time for the connection thread to
hit the barrier.
Fixes#40170
Kibana alerting is going to be built using API Keys, and should be
permitted on a basic license.
This commit moves API Keys (but not Tokens) to the Basic license
Relates: elastic/kibana#36836
Backport of: #42787
Currently, when the SSLEngine needs to produce handshake or close data,
we must manually call the nonApplicationWrite method. However, this data
is only required when something triggers the need (starting handshake,
reading from the wire, initiating close, etc). As we have a dedicated
outbound buffer, this data can be produced automatically. Additionally,
with this refactoring, we combine handshake and application mode into a
single mode. This is necessary as there are non-application messages that
are sent post handshake in TLS 1.3. Finally, this commit modifies the
SSLDriver tests to test against TLS 1.3.
This is related to #31908. In order to use the external version in a
reindex from remote request, the search request must be configured to
request the version (as it is not returned by default). This commit
modifies the search request to request the version. Additionally, it
modifies our current reindex from remote tests to randomly use the
external version_type.
Changed order of listener invocation so that we notify before
registering search context and notify after unregistering same.
This ensures that count up/down like what we do in ShardSearchStats
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (same for onFreeContext/onNewContext, but we
currently have no assertions failing in those).
Closes#28053
The test failed because we had only a single document in the index
that got deleted such that some assertions that expected at least
one live doc failed.
Relates to: #40741
Single updates use a different internal code path than updates that are wrapped in a bulk request.
While working on a refactoring to bring both closer together I've noticed that bulk updates were
failing some of the tests that single updates passed. In particular, bulk updates cause
NullPointerExceptions to be thrown and listeners not being properly notified when being rejected
from the thread pool.
This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.
Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.
A static code analysis revealed that we are not closing
the input stream in the find_file_structure endpoint.
This actually makes no difference in practice, as the
particular InputStream implementation in this case is
org.elasticsearch.common.bytes.BytesReferenceStreamInput
and its close() method is a no-op. However, it is good
practice to close the stream anyway.
This change adds the earliest and latest timestamps into
the field stats for fields of type "date" in the output of
the ML find_file_structure endpoint. This will enable the
cards for date fields in the file data visualizer in the UI
to be made to look more similar to the cards for date
fields in the index data visualizer in the UI.
increases the scheduler interval to fire less frequently, namely changing it from 1s to 10s. The scheduler interval is used for retrying after an error condition.
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
The `replace` option in the phonetic token filter can have suprising side
effects, e.g. such as described in #26921. This PR adds a note to be mindful
about such scenarios and offers alternatives to using the `replace` option.
Closes#26921