The recovery process started during primary relocation of shadow replicas accesses the engine on the source shard after it's been closed, which results in the source shard failing itself.
Right now closing a shard looks like it strands refresh listeners,
causing tests like
`delete/50_refresh/refresh=wait_for waits until changes are visible in search`
to fail. Here is a build that fails:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+multi_cluster_search+multijob-darwin-compatibility/4/console
This attempts to fix the problem by implements `Closeable` on
`RefreshListeners` and rejecting listeners when closed. More importantly
the act of closing the instance flushes all pending listeners
so we shouldn't have any stranded listeners on close.
Because it was needed for testing, this also adds the number of
pending listeners to the `CommonStats` object and all API to which
that flows: `_cat/nodes`, `_cat/indices`, `_cat/shards`, and
`_nodes/stats`.
In pre 2.x versions, if the repository was set to compress snapshots,
then snapshots would be compressed with the LZF algorithm. In 5.x,
Elasticsearch no longer supports the LZF compression algorithm. This
presents an issue when retrieving snapshots in a repository or upgrading
repository data to the 5.x version, because Elasticsearch throws an
exception when it tries to read the snapshot metadata because it was
compressed using LZF.
This commit gracefully handles the situation by introducing a new
incompatible-snapshots blob to the repository. For any pre-2.x snapshot
that cannot be read, that snapshot is removed from the list of active
snapshots, because the snapshot could not be restored anyway. Instead,
the snapshot is recorded in the incompatible-snapshots blob. When
listing snapshots, both active snapshots and incompatible snapshots will
be listed, with incompatible snapshots showing a `INCOMPATIBLE` state.
Any attempt to restore an incompatible snapshot will result in an
exception.
`ToXContentObject` extends `ToXContent` without adding new methods to it, while allowing to mark classes that output complete xcontent objects to distinguish them from classes that require starting and ending an anonymous object externally.
Ideally ToXContent would be renamed to ToXContentFragment, but that would be a huge change in our codebase, hence we simply document the fact that toXContent outputs fragments with no guarantees that the output is valid per se without an external ancestor.
Relates to #16347
This is related to #22116. A logIfNecessary() call makes a call to
NetworkInterface.getInterfaceAddresses() requiring SocketPermission
connect privileges. By moving this to bootstrap the logging call can be
made before installing the SecurityManager.
1. Escape sequences we're working. For example `\\` is now correctly
interpreted as `\` instead of `\\`. Same with `\'` being `'` and
`\"` being `"`.
2. `'` delimited strings weren't allowed to contain `"`s but it looked
like they were intended to support it. Now they do.
3. Improves the error message when the script contains an invalid
escape sequence inside a string to include a list of the valid
escape sequences.
Closes#22372
The JDK 9 compiler (b151) emits the warning "No processor claimed any of these annotations" for annotations that would be runtime annotation. Maybe a
regression from https://bugs.openjdk.java.net/browse/JDK-8039469. This is a quick fix so that compilation works again.
Today when an index is shrunk the version information is not carried over
from the source to the target index. This can cause major issues like mapping
incompatibilities for instance if an index from a previous major version is shrunk.
This commit ensures that all version information from the soruce index is preserved
when a shrunk index is created.
Closes#22373
Provides an example of using is and an example return description
and explains that we've added descriptions for some tasks but not
even close to all of them. And that we expect to change the
descriptions as we learn more.
Closes#22407
* Fix example
Getting a single task is always detailed, no need to specify.
* Rewrite like imotov wants it
Netty plays a lot of games with recycling byte buffers in thread local
caches, and using a pooled byte buffer allocator to reduce pressure on
the garbage collector.
The recycler in particular appears to be fraught with peril. It appears
that there are circumstances where the recycler does not recycle quickly
enough and can exceed its capacity leading to heap exhaustion and out of
memory errors. If you spend a few minutes reading the history of the
recycler on the Netty GitHub issues, it appears it has been nothing but
a source of trouble, and the project itself has an open issue that
proposes disabling by default and possibly even removing the recycler.
The pooled byte buffer allocator has problems itself. It sizes the pool
based on the number of runtime processors and can indeed grab a very
large percentage of the heap (in some cases 50% or more). Additionally,
the Netty project continues to struggle with leaks here.
We are seeing users struggle with issues in 5.x that I think are largely
driven by some of the problems here with Netty.
This change proposes to disable the recycler, and to disable the pooled
byte buffer allocator. I think that disabling these features will return
some of the stablity that these features appear to be losing us.
I have done performance testing on my workstation with disabling these
and I do not see a difference in performance. I propose that we make
this change in master and let some nightly benchmarks run to confirm
that there is not a difference in performance. If we are comfortable
with the performance changes, I propose backporting this to all active
branches.
Relates #22452
ParseFieldMatcher as well as ParseFieldMatcherSupplier will be soon removed, hence the ObjectParser's context doesn't need to be a ParseFieldMatcherSupplier anymore. That will allow to remove ParseFieldMatcherSupplier's implementations, little by little.
The test currently checks that the recovering shard is not failed when it is not a primary relocation that has moved past the finalization step.
Checking if it has moved past that step is done by intercepting the request between the replication source and the target and checking if it has seen
then WAIT_FOR_CLUSTERSTATE action as this is the next action that is called after finalization. This action can, however, occur only after the shard was
already failed, and thus trip the assertion. This commit changes the check to look out for the FINALIZE action, independently of whether it succeeded or not.
#22325 changed the recovery retry logic to use unique recovery ids. The change also introduced an issue, however, which made it possible for the shard store to be closed under CancellableThreads, triggering assertions in the node locking logic. This commit limits the use of CancellableThreads only to the part where we wait on the old recovery target to be closed.
The RestHighLevelClient class takes as as an argument a low level client instance RestClient. The first method added is ping, which returns true if the call to HEAD / went ok and false if an IOException was thrown. Any other exception gets bubbled up.
There are two kinds of tests, a unit test (RestHighLevelClientTests) that verifies the interaction between high level and low level client, and an integration test (MainActionIT) which relies on an externally started es cluster to send requests to.
Today we execute the low level handshake on the TCP layer in #connectToNode.
If #openConnection is used directly, which is truly expert, no handshake is executed
which allows connecting to nodes that are not necessarily compatible. This change
moves the handshake to #openConnection to prevent bypassing this logic.
Randomized runner uses a flag, tests.asserts, which we have previously
not used, but is used in lucene for disabling assertions. This change
modifies the gradle configuration to look for this flag and pass through
to the test runner to determine whether -ea and -esa are added to the
java commandline for tests.