Today the `GetDiscoveredNodesAction` waits, possibly indefinitely, to discover
enough nodes to bootstrap the cluster. However it is possible that the cluster
forms before a node has discovered the expected collection of nodes, in which
case the action will wait indefinitely despite the fact that it is no longer
required.
This commit changes the behaviour so that the action fails once a node receives
a cluster state with a nonempty configuration, indicating that the cluster has
been successfully bootstrapped and therefore the `GetDiscoveredNodesAction`
need wait no longer.
Relates #36380 and #36381; reverts 558f4ec278.
Previously, we used a CamelCase to CAMEL_CASE transformation to get the
primary name of a function from its class name which led to some issues
since there are functions that we don't want to be registered this way
(e.g.: IFNULL). To simplify the logic and avoid and "magic"
transformations in the FunctionRegistry a primary name must be provided
explicitely for each function.
The same change is applied for the function resolution (when a function
is used in an SQL statement). There is no CamelCase to CAMEL_CASE
transformation but only upper-casing is applied (fuNcTiOn -> FUNCTION).
This commit adds support for the index templates exist API, creating
new client-side request types for that API and the get index
templates API. Also adds links in hlrc docs to pages for supported
index template APIs
This commit creates JodaDateFormatter to replace
FormatDateTimeFormatter. It converts all uses of the old class
to DateFormatter to allow a future change to use JavaDateFormatter
when appropriate.
This allows you to plug the behavior that the LLRC uses to handle
warnings on a per request basis.
We entertained the idea of allowing you to set the warnings behavior to
strict mode on a per request basis but that wouldn't allow the high
level rest client to fail when it sees an unexpected warning.
We also entertained the idea of adding a list of "required warnings" to
the `RequestOptions` but that won't work well with failures that occur
*sometimes* like those we see in mixed clusters.
Adding a list of "allowed warnings" to the `RequestOptions` would work
for mixed clusters but it'd leave many of the assertions in our tests
weaker than we'd like.
This behavior plugging implementation allows us to make a "required
warnings" option when we need it and an "allowed warnings" behavior when
we need it.
I don't think this behavior is going to be commonly used by used outside
of the Elasticsearch build, but I expect they'll be a few commendably
paranoid folks who could use this behavior.
Moves all remaining (rolling-upgrade and mixed-version) REST tests to use Zen2. To avoid adding
extra configuration, it relies on Zen2 being set as the default discovery type. This required a few
smaller changes in other tests. I've removed AzureMinimumMasterNodesTests which tests Zen1
functionality and dates from a time where host providers were not configurable and each cloud
plugin had its own discovery.type, subclassing the ZenDiscovery class. I've also adapted a few tests
which were unnecessarily adding addTestZenDiscovery = false for the same legacy reasons. Finally,
this also moves the unconfigured-node-name REST test to Zen2, testing the auto-bootstrapping
functionality in development mode when no discovery configuration is provided.
Closes#34820
With this change we allow for no tests being ran in randomized testing
task, and forbid empty testing tasks from the testing conventions task.
We will no longer have to disable the task if all tests are muted.
* Improve logged exec output readability
- Split error and out streams and log them separately
- Log everything in a single call to prevent interference from
other log messages
Renamed the follow qa modules:
`multi-cluster-downgraded-to-basic-license` to `downgraded-to-basic-license`
`multi-cluster-with-non-compliant-license` to `non-compliant-license`
`multi-cluster-with-security` to `security`
Moved the `chain` module into the `multi-cluster` module and
changed the `multi-cluster` to start 3 clusters.
Followup from #36031
Today we expose a new engine immediately during Lucene rollback. The new
engine is started with a safe commit which might not include all
acknowledged operation. With this change, we won't expose the new engine
until it has recovered from the local translog.
Note that this solution is not complete since it's able to reserve only
acknowledged operations before the global checkpoint. This is because we
replay translog up to the global checkpoint during rollback. A per-doc
Lucene rollback would solve this issue entirely.
Relates #32867
Today, we iterate the bitset of hardLiveDocs to calculate the number of
live docs. This calculation might be expensive if we enable soft-deletes
(by default) for old indices whose soft-deletes was disabled previously
and had hard-deletes.
Once soft-deletes is enabled, we no longer hard-update or hard-delete
documents directly. We have hard-deletes in two scenarios: (1) from old
segments where soft-deletes was disabled, (2) when IndexWriter hits
non-aborted exceptions. These two cases, IW flushes SegmentInfos before
exposing the hard-deletes; thus we can use the hard-delete count of
SegmentInfos.
Creating `CompositeBytesReference` has more overhead than a single
`ByteBufferReference`. Many of our messages will be contained to a
single `ByteBuffer`. This commit avoids creating composite instances
when there is 0 or 1 underlying `ByteBuffers`.
We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:
- Zen1 nodes will never vote for Zen2 nodes
- Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes
- Zen2 nodes that were previously part of a mixed cluster will automatically
(and unsafely) bootstrap themselves when the last Zen1 node leaves.
This commit makes FormatDateTimeFormatter and DateFormatter apis close
to each other, so that the former can be removed in favor of the latter.
This PR does not change the uses of FormatDateTimeFormatter yet, so that
that future change can be purely mechanical.
This commit gets rid of the 'NONE' and 'INFO' severity levels for
deprecation issues.
'NONE' is unused and does not make much sense as a severity level.
'INFO' can be separated into two categories: Either 1) we can
definitively tell there will be a problem with the cluster/node/index
configuration that can be resolved prior to upgrade, in which case
the issue should be a WARNING, or 2) we can't, because any issues would
be at the application level, for which the user should review the
deprecation logs and/or response headers.
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.
When the restore shard method is called, an empty shard is initialized.
This commit removes the parseDefaulting method from DateFormatter,
bringing it more inline with the joda equivalent
FormatDateTimeFormatter. This method was only needed for the java
time implementation of DateMathParser. Instead, a DateFormatter now
returns an implementation of DateMathParser for the given format,
allowing the java time implementation to construct the appropriate date
math parser internally.
ML jobs and datafeeds wrap collections into their unmodifiable
equivalents in their constructor. However, the copying builder
does not make a copy of some of those collections resulting
in wrapping those again and again. This can eventually result
to stack overflow.
This commit addressed this issue by copying the collections in
question in the copying builder constructor.
Closes#36360
We use MethodHandles.asType to cast argument types into the appropriate parameter types for
method calls when the target of the call is a def type at runtime. Currently, certain implicit casts
using the def type are asymmetric. It is possible to cast Integer -> float as an argument to parameter, but not from int -> Float (boxed to primitive with upcasting is okay, but primitive to
boxed with upcasting is not).
This PR introduces a solution to the issue by generating bridge methods for all whitelisted methods
that have at least a single boxed type as an argument. The bridge method will conduct appropriate
casts and then call the original method. This adds a bit of overhead for correctness. It should not be
used often as Painless avoids boxed types as much as possible.
Note that a large portion of this change is adding methods to do the appropriate def to boxed type
casts and a few mechanical changes as well. The most important method for review is
generateBridgeMethod in PainlessLookupBuilder.
* Add warning if cluster fails to form fast enough
Today if a leader is not discovered or elected then nodes are essentially
silent at INFO and above, and log copiously at DEBUG and below. A short delay
when electing a leader is not unusual, for instance if other nodes have not yet
started, but a persistent failure to elect a leader is a problem worthy of log
messages in the default configuration.
With this change, while there is no leader each node outputs a WARN-level log
message every 10 seconds (by default) indicating as such, describing the
current discovery state and the current quorum(s).
* Add note about whether the discovered nodes form a quorum or not
* Introduce separate ClusterFormationFailureHelper
... and back out the unnecessary changes elsewhere
* It can be volatile
In #34474, we added a new assertion to ensure that the
LocalCheckpointTracker is always consistent with Lucene index. However,
we reset LocalCheckpoinTracker in testDedupByPrimaryTerm cause this
assertion to be violated.
This commit removes resetCheckpoint from LocalCheckpointTracker and
rewrites testDedupByPrimaryTerm without resetting the local checkpoint.
Relates #34474
This test tries to compare the CB stats from an InternalEngine
and a FrozenEngine but is subject to segement merges that might finish
and get committed after we read the breaker stats. This can cause
occational test failures.
Closes#36207