When a shard has fallen search idle, and a non-realtime multi-get
request is executed, today such requests do not wait for the shard to
become search active and therefore such requests do not wait for a
refresh to see the latest changes to the index. This also prevents such
requests from triggering the shard as non-search idle, influencing the
behavior of scheduled refreshes. This commit addresses this by attaching
a listener to the shard search active state for multi-get requests. In
this way, when the next scheduled refresh is executed, the multi-get
request will then proceed.
testSyncFailsIfOperationIsInFlight could fail due to the index request
spawing a GCP sync (new since 7.4). Test now waits for it to finish
before testing that flushed sync fails.
With this change, we can avoid blocking writing threads when trimming
unreferenced readers; hence improving the translog writing performance
in async durability mode.
Close#46201
* Wait for all Rec. to Stop on Node Close
* This issue is in the `RecoverySourceHandler#acquireStore`. If we submit the store release to the generic threadpool while it is getting shut down we never complete the futue we wait on (in the generic pool as well) and fail to ever release the store potentially.
* Fixed by waiting for all recoveries to end on node close so that we aways have a healthy thread pool here
* Closes#45956
This is the first PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.
The plan is to merge changes made to the server module separately from
the pr that will merge enrich into master, so that these changes can
be reviewed in isolation.
When recovering a shard locally, we use a translog snapshot from
newSnapshotFromGen which consists of all readers from a certain
generation. In the test, we use newSnapshotFromMinSeqNo for the
expectation. The snapshot of this method includes only readers
containing operations in the requesting range.
Closes#46022
If a node is misconfigured to talk to remote node HTTP port (instead of
transport port) eventually it will receive an HTTP response from the
remote node on transport port (this happens when a node sends
accidentally line terminating byte in a transport request).
If this happens today it results in a non-friendly log message and a
long stack trace.
This commit adds a check if a malformed response is HTTP response. In
this case, a concise log message would appear.
(cherry picked from commit 911d02b7a9c3ce7fe316360c127a935ca4b11f37)
When a query contains a mandatory clause that doesn't track the max score per
block, we disable the max score optimization. Previously, we were doing this by
wrapping the collector with a FilterCollector that always returned
ScoreMode.COMPLETE.
However we weren't adjusting totalHitsThreshold, so the collector could still
call Scorer#setMinCompetitiveScore. It is against the method contract to call
setMinCompetitiveScore when the score mode is COMPLETE, and some scorers like
ReqOptSumScorer throw an error in this case.
This commit tries to disable the optimization by always setting
totalHitsThreshold to max int, as opposed to wrapping the collector.
Today we might carry on a big merge uncommitted and therefore
occupy a significant amount of diskspace for quite a long time
if for instance indexing load goes down and we are not quickly
reaching the translog size threshold. This change will cause a
flush if we hit a significant merge (512MB by default) which
frees diskspace sooner.
We only sync translog if the given offset hasn't synced yet. We can't
verify the global checkpoint from the latest translog checkpoint unless
a sync has occurred.
Closes#46065
Relates #45634
Today the `DiskThresholdDecider` attempts to account for already-relocating
shards when deciding how to allocate or relocate a shard. Its goal is to stop
relocating shards onto a node before that node exceeds the low watermark, and
to stop relocating shards away from a node as soon as the node drops below the
high watermark.
The decider handles multiple data paths by only accounting for relocating
shards that affect the appropriate data path. However, this mechanism does not
correctly account for _new_ relocating shards, which are unwittingly ignored.
This means that we may evict far too many shards from a node above the high
watermark, and may relocate far too many shards onto a node causing it to blow
right past the low watermark and potentially other watermarks too.
There are in fact two distinct issues that this PR fixes. New incoming shards
have an unknown data path until the `ClusterInfoService` refreshes its
statistics. New outgoing shards have a known data path, but we fail to account
for the change of the corresponding `ShardRouting` from `STARTED` to
`RELOCATING`, meaning that we fail to find the correct data path and treat the
path as unknown here too.
This PR also reworks the `MockDiskUsagesIT` test to avoid using fake data paths
for all shards. With the changes here, the data paths are handled in tests as
they are in production, except that their sizes are fake.
Fixes#45177
Today we assume that document failures can not occur for deletes. This
assumption is bogus, as they can fail for a variety of reasons such as
the Lucene index having reached the document limit. Because of this
assumption, we were asserting that such a document-level failure would
never happen. When this bogus assertion is violated, we fail the node, a
catastrophe. Instead, we need to treat this as a fatal engine exception.
Today we assume that document failures can not occur for no-ops. This
assumption is bogus, as they can fail for a variety of reasons such as
the Lucene index having reached the document limit. Because of this
assumption, we were asserting that such a document-level failure would
never happen. When this bogus assertion is violated, we fail the node, a
catastrophe. Instead, we need to treat this as a fatal engine exception.
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)
* Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
* Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
* Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type. This is similar to how Terms aggregator works.
* Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
Searching with `allowPartialSearchResults=false` could still return
partial search results during recovery. If a shard copy fails
with a "shard not available" exception, the failure would be ignored and
a partial result returned. The one case where this is known to happen
is when a shard copy is recovering when searching, since
`IllegalIndexShardStateException` is considered a "shard not available"
exception.
Relates to #42612
Since #45473, we trim translog below the local checkpoint of the safe
commit immediately if soft-deletes enabled. In
testRestoreLocalHistoryFromTranslog, we should have a safe commit after
recoverFromTranslog is called; then we will trim translog files which
contain only operations that are at most the global checkpoint.
With this change, we relax the assertion to ensure that we don't put
operations to translog while recovering history from the local translog.
If two translog syncs happen concurrently, then one can return before
its operations are marked as persisted. In general, this should not be
an issue; however, peer recoveries currently rely on this assumption.
Closes#29161
Today we create new engines under IndexShard#mutex. This is not ideal
because it can block the cluster state updates which also execute under
the same mutex. We can avoid this problem by creating new engines under
a separate mutex.
Closes#43699
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field. It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.
This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).
This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
Backport of #45799
This PR modifies the logic in IngestService to preserve the original content type
on the IndexRequest, such that when a document with a content type like SMILE
is submitted to a pipeline, the resulting document that is persisted will remain in
the original content type (SMILE in this case).
This is essentially the same issue fixed in #43362 but for http request
version instead of the request method. We have to deal with the
case of not being able to parse the request version, otherwise
channel closing fails.
Fixes#43850
The snapshot status when blocking can still be INIT in rare cases when
the new cluster state that has the snapshot in `STARTED` hasn't yet
become visible.
Fixes#45917
This commit enhances logging for 2 cases:
1. If non-TLS enabled node receives transport message from TLS enabled
node on transport port.
2. If non-TLS enabled node receives HTTPs request on transport port.
(cherry picked from commit 4f52ebd32eb58526b4c8022f8863210bf88fc9be)
Closing a `RemoteClusterConnection` concurrently with trying to connect
could result in double invoking the listener.
This fixes
RemoteClusterConnectionTest#testCloseWhileConcurrentlyConnecting
Closes#45845
In case of an in-progress snapshot this endpoint was broken because
it tried to execute repository operations in the callback on a
transport thread which is not allowed (only generic or snapshot
pool are allowed here).
This commit namespaces the existing processors setting under the "node"
namespace. In doing so, we deprecate the existing processors setting in
favor of node.processors.
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.
Backport of #45473
Relates #45136
Today, when rolling a new translog generation, we block all write
threads until a new generation is created. This choice is perfectly
fine except in a highly concurrent environment with the translog
async setting. We can reduce the blocking time by pre-sync the
current generation without writeLock before rolling. The new step
would fsync most of the data of the current generation without
blocking write threads.
Close#45371
Follow up on #32806.
The system property es.http.cname_in_publish_address is deprecated
starting from 7.0.0 and deprecation warning should be added if the
property is specified.
This PR will go to 7.x and master.
Follow-up PR to remove es.http.cname_in_publish_address property
completely will go to the master.
(cherry picked from commit a5ceca7715818f47ec87dd5f17f8812c584b592b)
* There is no point in listing out every shard over and over when the `index-N` blob in the shard contains a list of all the files
* Rebuilding the `index-N` from the `snap-${uuid}.dat` blobs does not provide any material benefit. It only would in the corner case of a corrupted `index-N` but otherwise uncorrupted blobs since we neither check the correctness of the content of all segment blobs nor do we do a similar recovery at the root of the repository.
* Also, at least in version `6.x` we only mark a shard snapshot as successful after writing out the updated `index-N` blob so all snapshots that would work with `7.x` and newer must have correct `index-N` blobs
=> Removed the rebuilding of the `index-N` content from `snap-${uuid}.dat` files and moved to only listing `index-N` when taking a snapshot instead of listing all files
=> Removed check of file existence against physical blob listing
=> Kept full listing on the delete side to retain full cleanup of blobs that aren't referenced by the `index-N`
This PR introduces a mechanism to cancel a search task when its corresponding connection gets closed. That would relief users from having to manually deal with tasks and cancel them if needed. Especially the process of finding the task_id requires calling get tasks which needs to call every node in the cluster.
The implementation is based on associating each http channel with its currently running search task, and cancelling the task when the previously registered close listener gets called.
Today we can release a Store using CancellableThreads. If we are holding
the last reference, then we will verify the node lock before deleting
the store. Checking node lock performs some I/O on FileChannel. If the
current thread is interrupted, then the channel will be closed and the
node lock will also be invalid.
Closes#45237
* Add is_write_index column to cat.aliases (#44772)
Aliases have had the option to set `is_write_index` since 6.4,
but the cat.aliases action was never updated.
* correct version bounds to 7.4
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.
This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.
Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
* Repository Cleanup Endpoint (#43900)
* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
This change adds a new option called user_dictionary_rules to
Kuromoji's tokenizer. It can be used to set additional tokenization rules
to the Japanese tokenizer directly in the settings (instead of using a file).
This commit also adds a check that no rules are duplicated since this is not allowed
in the UserDictionary.
Closes#25343
Backports PR #45737:
Similar to PR #45030 integration test testDontCacheScripts() was moved to unit test AvgAggregatorTests#testDontCacheScripts.
AvgIT class was removed.