We lost the `logger.transport.name` line in #65169 and I incorrectly
extrapolated from what was left and mangled it further in #66318. This
commit fixes things.
Today the docs use `logger.org.elasticsearch.transport: TRACE` as the
example for adjusting the logger config. This is a dangerous thing to
suggest since that's one of the most verbose loggers we have. An
accidental copy-and-paste of this example into a busy cluster can
cause damage.
This commit suggests `logger.org.elasticsearch.discovery: DEBUG`
instead, which is much more benign.
It also corrects the order of the levels and notes that `DEBUG` and
`TRACE` are only for expert use.
This commit fixes the cat tasks api parameter specification and the
handler so that the parameters are consumed during request preparation.
Closes#59493
Backport of #66272
This commit ensures that the after key is parsed with the doc value formatter.
This is needed for unsigned longs that uses shifted longs internally.
Closes#65685
In the assertion mentioned in the new comment we first get the `endingSnapshots`
and then check that we don't have a listener that isn't referred to by it so we need
to remove from the listers map before removing from `endingSnapshots` to avoid rare, random
assertion tripping here with concurrent repository operations.
With the upcoming validation for type compatibility of the sequence
keys, several tests are failing because some fields that contain IP
data were previously mapped as keyword. Fixed the mapping and created a
new snaphost of the correctness data in the gcs bucket.
Relates to: #66183
(cherry picked from commit 7f638f661c5a5c57a4ea7d3d3e2ccf5c81ae92d1)
If creating the latest translog file and retrieving a translog stats
happen within the same millisecond, then the earliestLastModifiedAge
will be zero.
Closes#66092
Today, CCR only checks the historyUUID of the leader shard when it has
operations to replicate. If the follower shard is already in-sync with
the leader shard, then CCR won't detect if the historyUUID of the leader
shard has been changed. While this is not an issue, it can annoy users
in the following situation:
The follower index is in-sync with the leader index
Users restore the leader index from snapshots
CCR won't detect the issue and report ok in its stats API
CCR suddenly stops working when users start indexing to the leader index
This commit makes sure that we always check historyUUID in every
read-request so we can detect and report the issue as soon as possible.
Backport of #65841
The option to use templates already defined in the cluster is not explicitly stated in the docs.
This PR adds and example to the `rank_eval` documentation.
This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.
Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` are cached and shared across hits, the lookup
is never updated to load from the new documents. This means we accidentally
load _source from a different document.
The regression was introduced in #60179, which started storing `SourceLookup`
on `FragmentsBuilder`.
Fixes#65533.
Closes#63869. Perform `docker pull` explicitly instead of as part of
`docker build`, and wrap it in a retry loop. This is an attempt to make
the build more resilient to transient errors.
Today we document the use of `_[networkInterface]_` to specify the
addresses of a network interface but do not spell out which parts of
this syntax should be taken literally and which are part of the
placeholder for the interface name.
This commit clarifies the docs.
NB this is a backport of just the docs changes from #66013.
Closes#65978.
This commit adjusts the behavior when calculating the diff between two
`AbstractScopedSettings` objects, so that the default values of settings
whose default values depend on the values of other settings are
correctly calculated. Previously, when calculating the diff, the default
value of a depended setting would be calculated based on the default
value of the setting(s) it depends on, rather than the current value of
those settings.
In #61906 we added the possibility for the master node to fetch
the size of a shard snapshot before allocating the shard to a
data node with enough disk space to host it. When merging
this change we agreed that any failure during size fetching
should not prevent the shard to be allocated.
Sadly it does not work as expected: the service only triggers
reroutes when fetching the size succeed but never when it
fails. It means that a shard might stay unassigned until
another cluster state update triggers a new allocation
(as in #64372). More sadly, the test I wrote was wrong as
it explicitly triggered a reroute.
This commit changes the InternalSnapshotsInfoService
so that it also triggers a reroute when fetching the snapshot
shard size failed, ensuring that the allocation can move
forward by using an UNAVAILABLE_EXPECTED_SHARD_SIZE
shard size. This unknown shard size is kept around in the
snapshot info service until no corresponding unassigned
shards need the information.
Backport of #65436
This change simplifies logic and allow more legit cases in Metadata.Builder.validateDataStreams.
It will only show conflict on names that are in form of .ds-<data stream name>-<[0-9]+> and will allow any names like .ds-<data stream name>-something-else-<[0-9]+>.
This fixes problem with rollover when you have 2 data streams with names like a and a-b - currently if a-b has generation greater than a you won't be able to rollover a anymore.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Today we document the settings used to control rebalancing and
disk-based shard allocation but there isn't really any discussion around
what these processes do so it's hard to know what, if any, adjustments
to make.
This commit adds some words to help folk understand this area better.
In case the local agg sorter queue gets full and no limit has been provided,
the local sorter will now erroneously call the failure callback for every
single row in the original rowset that's left over the local queue limit
(instead for just the first one). The failure response is dispatched in any
case, so this is relatively harmless. The sorter continues iterating on the
original response fetching subsequent pages. In case of correct Elasticsearch
behaviour, this is also harmless, it'll just trigger a number of internal
exceptions. However, in case of a pagination defect in Elasticsearch (like
GH#65685, where the same search_after is returned), this will result in an
effective spin loop, potentially rendering eventually the node unresponsive.
This PR simply breaks both the inner loop iterating over the current unsorted
rowset, as well as the outer one, iterating over the left pages.
It also fixes an outdated documentation limitation.
(cherry picked from commit 638402c387faf79bba38fcc95f371a73146efc0b)
* Add release notes for 7.10.1
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
TransportService doesn't respond to the pending requests of proxy
connections when the underlying connections get disconnected because
proxy connections do not override the getCacheKey method. Some CCS
requests would never be completed because of this bug.
The `lucene_version_path` variable is used in urls pointing to the Lucene docs.
They use underscore separators for the major and minor versions there.
Correcting this since this has been lost in the latest update on the 7.x branch.