* Follow up to #44165
* We should just catch all exceptions here and not return errors after the index-N update went through since a subsequent delete attempt by the user would fail with SnapshotMissingException since the snapshot now appears deleted. Also, `SnapshotException` isn't even thrown in the changed spot it seems in the first place and certainly not the only exception possible.
Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.
This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
Today the `BatchedRerouteService` submits its delayed reroute task at `HIGH`
priority, but in some cases a lower priority would be more appropriate. This
commit adds the facility to submit delayed reroute tasks at different
priorities, such that each submitted reroute task runs at a priority no lower
than the one requested. It does not change the fact that all delayed reroute
tasks are submitted at `HIGH` priority, but at least it makes this explicit.
This commit creates new base classes for master node actions whose
response types still implement Streamable. This simplifies both finding
remaining classes to convert, as well as creating new master node
actions that use Writeable for their responses.
relates #34389
Today we do not throw a `TranslogCorruptedException` in certain cases of
translog corruption, such as for a corrupted checkpoint file or when an
expected file (either checkpoint or translog) is completely missing. This means
that `elasticsearch-shard` will not truncate the translog in those cases.
This commit strengthens the translog corruption tests to corrupt and/or delete
both translog and checkpoint files, and ensures that a
`TranslogCorruptedException` is thrown in all cases. It also sometimes
simulates a recovery after a crash while rolling the translog generation,
including cases where the rolled checkpoint contains incorrect data.
It also adjusts (and renames) `RemoveCorruptedShardDataCommandIT.getDirs()` to
return only a single path, since in practice this was the only thing that could
happen and yet we were relying on its callers to verify this and not all
callers were doing so.
* Stronger Cleanup Shard Snapshot Directory on Delete
* Use `RepositoryData` to clean up unreferenced `snap-${uuid}.dat` blobs
from shard directories (and index-N) and as a result also clean up data
blobs that are only referenced by them
* Stop cleaning up anything but index-N on shard snapshot creation to
align behavior of shard index-N handling with root path index-N
handling
The responses super.writeTo() method was removed in #44092, so the corresponding
contructor that reads from a stream shouldn't call super itself, even though its
implementation is currently empty.
* Avoid Needless Set Instantiation in InboundMessage
* When `features` is empty (when there's no xpack) we constantly and needless instantiated a few objects here for the empty set on every message
* At least all the way back to 6.x we never use anything but `SMILE` in
production code with this class so I removed the more general
constructor and removed the format leniency from the deserialization
* HLRC: Fix '+' Not Correctly Encoded in GET Req.
* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes#33077
An indexing on a replica should never fail after it was successfully
indexed on a primary. Hence, we should fail an engine if we hit any
failure (document level or tragic failure) when processing an indexing
on a replica.
Relates #43228Closes#40435
Currently we loose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This change
fixes this so we write an additional flag indicating whether the value is null
or not, followed by the size of the list and its content.
Closes#44078
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.
relates #34389
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)
These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.
Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.
closes#41350
backport #41354
* With the removal of the incompatible snapshots list in RepositoryData
the get snapshots and get all snapshots methods are equivalent so I
removed one of them
* The assertion added in #44214 is tripped by tests running dedicated
test clusters per test needlessly.This breaks existing tests like the one in #44245.
* Closes#44245
* Safer Shard Snapshot Delete
* We shouldn't delete the snapshot meta file before we update the index
in the shard folder. If we fail to update the index-N after deleting the
existing index-N is broken because the snap- blob it references is gone.
Today if a master-eligible node is converted to a master-ineligible node it may
remain in the voting configuration, meaning that the master node may count its
publish responses as an indication that it has properly persisted the cluster
state. However master-ineligible nodes do not properly persist the cluster
state, so it is not safe to count these votes.
This change adjusts `CoordinationState` to take account of this from a safety
point of view, and also adjusts the `Coordinator` to prevent such nodes from
joining the cluster. Instead, it triggers a reconfiguration to remove from the
voting configuration a node that now appears to be master-ineligible before
processing its join.
Backport of #43688, see #44260.
* We don't have to calculate the start and end times form the shards for the status API, we have the start time available from the CS or the `SnapshotInfo` in the repo and can either take the end time form the `SnapshotInfo` or
take the most recent time from the shard stats for in progress snapshots
* Closes#43074
For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup
(@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses
which hasn't been an issue before LUCENE-8811, but we slightly need to increase
the minimal possible clause count now.
Closes#44192
This will help in investigations where the real memory circuit breaker is tripped to better understand
on what the actual memory is used, i.e. whether it's a temporary thing (e.g. requests) in contrast to
more permanently allocated memory (e.g. accounting).
* Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures
* The scenario that get's us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot
* Not deleting other dangling blobs since that don't follow the snap-, meta- or tempfile naming schemes to not accidentally delete blobs not created by the snapshot logic
* Follow up to #42189
* Same safety logic, get list of all blobs before writing index-N blobs, delete things after index-N blobs was written
* Fix ShrinkIndexIT
* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.
* Closes#44164
In some cases we need to parse some XContent that is already parsed into
a map. This is currently happening in handling source in SQL and ingest
processors as well as parsing null_value values in geo mappings. To avoid
re-serializing and parsing the value again or writing another map-based
parser this commit adds an iterator that iterates over a map as if it was
XContent. This makes reusing existing XContent parser on maps possible.
Relates to #43554
With connection management now being non-blocking, we can make NodeConnectionsService
avoid the use of MANAGEMENT threads that are blocked during the connection attempts.
I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool
from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level,
which resulted in races.
* Reduce Number of List Calls During Snapshot Create and Delete
Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
* Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
* Improve Repository Consistency Check in Tests (#44099)
* Check that index metadata as well as snapshot metadata always exists
when referenced by other metadata
* Fix SnapshotResiliencyTests on ExtraFS (#44113)
* As a result of #44099 we're now checking more directories and have to
ignore the `extraN` folders for those like we do for indices already
* Closes#44112
* The incompatible snapshots logic was created to track 1.x snapshots that
became incompatible with 2.x
* It serves no purpose at this point
* It adds an additional GET request to every loading of
RepositoryData (from loading the incompatible snapshots blob)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanism. This commit removes
these restriction from the validator is used in SQL and ML retrieval.
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
complicate subclasses to think they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extendiong one of these classes when not necessary,
since there is nothing to actually implement.
This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.
relates #34389
AggregatorFactory was generic over itself, but it doesn't appear we
use this functionality anywhere (e.g. to allow the super class
to declare arguments/return types generically for subclasses to
override). Most places use a wildcard constraint, and even when a
concrete type is specified it wasn't used.
But since AggFactories are widely used, this led to
the generic touching many pieces of code and making type signatures
fairly complex
PrimaryAllocationIT#testForceStaleReplicaToBePromotedToPrimary
relies on the flushing when a shard is no long assigned. This behavior,
however, can be randomly disabled in MockInternalEngine.
Closes#44049
The `ShardFailedClusterStateTaskExecutor` fails some shards, which performs a
reroute, but then sometimes schedules a followup reroute. It's not clear from
the code why this followup is necessary, so this commit adds a short comment
describing why it's necessary.
This commit replaces usages of Streamable with Writeable for the
MultiSearchRequest class.
I ran into this when developing a custom action that reuses
MultiSearchRequest in the enrich branch.
Relates to #34389
Today the `ClusterInfoService` requires the `DiskThresholdMonitor` at
construction time so that it can notify it when nodes report changes in their
disk usage, but this is awkward to construct: the `DiskThresholdMonitor`
requires a `RerouteService` which requires an `AllocationService` which comees
from the `ClusterModule` which requires the `ClusterInfoService`.
Today we break the cycle with a `LazilyInitializedRerouteService` which is
itself a little ugly. This commit replaces this with a more traditional
subject/observer relationship between the `ClusterInfoService` and the
`DiskThresholdMonitor`.
Today `testOnlyBlocksOnConnectionsToNewNodes` fails (extremely rarely) if the
last attempt to connect to `node0` is delayed for so long that the test runs
`nodeConnectionsBlocks.clear()` before the connection attempt obtains the
expected connection block. We can turn this into a reliable failure with this
delay:
```diff
diff --git a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
index f48413824d3..9a1d0336bcd 100644
--- a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
+++ b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
@@ -300,6 +300,13 @@ public class NodeConnectionsService extends AbstractLifecycleComponent {
private final Runnable connectActivity = () -> threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() {
@Override
protected void doRun() {
+
+ try {
+ Thread.sleep(500);
+ } catch (InterruptedException e) {
+ throw new AssertionError("unexpected", e);
+ }
+
assert Thread.holdsLock(mutex) == false : "mutex unexpectedly held";
transportService.connectToNode(discoveryNode);
consecutiveFailureCount.set(0);
```
This commit reverts the extra logging introduced in #43979 and fixes this
failure by waiting for the connection attempt to hit the barrier before
removing it.
Fixes#40170
Since #42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.
It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in #42636.
Fixes#44073
Refactor ScrollableHitSource to pump data out and have a simplified
interface (callers should no longer call startNextScroll, instead they
simply mark that they are done with the previous result, triggering a
new batch of data). This eases making reindex resilient, since we will
sometimes need to rerun search during retries.
Relates #43187 and #42612
Today the `index.routing.allocation.initial_recovery._id` setting can only be
set on indices that are the result of a shrink, but the filtered allocation
decider also applies this filter to shards with a recovery source of
`EMPTY_STORE`. The only way to have this setting set while the recovery source
is `EMPTY_STORE` is to force-allocate an empty primary, but such a forced
allocation ignores this allocation decider.
This commit simplifies the allocation decider so that the `initial_recovery`
setting only applies to shards with a recovery source of `LOCAL_SHARDS`.
* Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode
* See comment in the test: The problem is that when the snapshot delete works out partially on master failover and the retry fails on `SnapshotMissingException` no repository cleanup is run => we still failed even with repo cleanup logic in the delete path now
* Fixed the test by rerunning a create snapshot and delete loop to clean up the repo before verifying file counts
* Closes#39852
* Missed this one in #42189 and it randomly runs into a situation where the broken mock repo is broken such that we can't get to a consistent end state via a delete
* Closes#43498
Today we prevent any index that is actively snapshotted or restored to be closed.
This verification is done during the execution of the first phase of index closing
(ie before blocking the indices).
We should also do this verification again in the last phase of index closing
(ie after the shard sanity checks and right before actually changing the index
state and the routing table) because a snapshot/restore could sneak in while
the shards are verified-before-close.
Provides a hook for aggregations to introspect the `ValuesSourceType` for a user supplied Missing value on an unmapped field, when the type would otherwise be `ANY`. Mapped field behavior is unchanged, and still applies the `ValuesSourceType` of the field. This PR just provides the hook for doing this, no existing aggregations have their behavior changed.
* In the current codebase it is hardly obvious what code operates on a shard and is run by a datanode what code operates on the global metadata and is run on master
* Fixed by adjusting the method names accordingly
* The nested context classes don't add much if any value, they simply spread out the parameters that go into a shard snapshot create or delete all over the place since their
constructors can be inlined in all spots
* Fixed by flattening the nested classes into BlobStoreRepository
* Also:
* Inlined the other single use inner classes
* Remove some obvious dead code
* Move assert methods that were only used in a single test class to the child they belong to
* Inline some redundant methods
If an index is the result of a shrink then it will have a value set for
`index.routing.allocation.initial_recovery._id`. If this index is subsequently
split then this value will be copied over, forcing the initial allocation of
the split shards to occur on the node on which the shrink took place. Moreover
if this node no longer exists then the split will fail. This commit suppresses
the copying of this setting when splitting an index.
Fixes#43955
* Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes
* Runs after ever delete operation
* Relates #13159 (fixing most of this issues caused by unreferenced indices, leaving some meta files to be cleaned up only)
Joda allowed for date_optional_time and strict_date_optional_time a decimal point to be . dot or , comma
For our java.time implementation we should also extend this for strict_date_optional_time-nanos
the approach to fix this is the same as in iso8601 parser
closes#43730
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.
Relates #34099
This commit converts the ConnectionManager's openConnection and connectToNode methods to
async-style. This will allow us to not block threads anymore when opening connections. This PR also
adapts the cluster coordination subsystem to make use of the new async APIs, allowing to remove
some hacks in the test infrastructure that had to account for the previous synchronous nature of the
connection APIs.
These assertions do not hold true when a master fails during publication and quickly becomes
master again, publishing a new cluster state in a higher term which races against the previous
cluster state publication to self (which does not matter anyway).
Relates #43994Closes#44012
This commit ensures that cluster state publications to self also go through the transport layer. This
allows voting-only nodes to intercept the publication to self.
Fixes an issue discovered by a test failure where a voting-only node, which was the only
bootstrapped node, would not step down as master after state transfer because publishing to self
would succeed.
Closes#43631
Today `RetentionLeaseSyncAction.Request` and
`RetentionLeaseBackgroundSyncAction.Request` both describe themselves as
`Request{...}` in the value returned from their respective `toString()`
methods. This commit adds the name of the owning class to both so we have
something a bit easier to search for and so we can distinguish foreground from
background syncs in logs and test failures and so on.
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
Currently we log a deprecation warning to the types removal in
RestGetIndicesAction even if the REST method is HEAD, which is used by the
indices.exists API. Since the body is empty in this case we should not need to
show the deprecation warning.
Closes#43905
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
on 7.x and 7.3 because the translog cannot be recovered.
While I can't reproduce the issue, I think it has been introduced in #43752
which changed ReadOnlyEngine so that it opens the translog in its
constructor in order to load the translog stats. This opening writes a
new checkpoint file, but because 7.x/7.3 does not wait for shards to be
started after being closed, the test immediately starts to copy shard files
to a new directory and possibly does not copy all the required translog files.
By waiting for the shards to be started after being closed, we ensure
that the shards (and engines) have been correctly initialized and that
the translog checkpoint file is not currently being written.
closes#43964
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.
As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
Disjunction over two individual terms in a phrase query with multi-word synonyms
wrongly applies a prefix query to each of these terms. This change fixes this bug
by inversing the logic to use prefixes on `phrase_prefix` queries only.
Closes#43308
This commit changes async IO processor to release the promiseSemaphore
before notifying consumers. This ensures that a bad consumer that
sometimes does blocking (or otherwise slow) operations does not halt the
processor. This should slightly increase the concurrency for shard
fsync, but primarily improves safety so that one bad piece of code has
less effect on overall system performance.
Enables libs/geo parser to return a geometry format object that can
perform both serialization and deserialization functions. This can
be useful for ingest nodes that are trying to modify an existing
geometry in the source.
Relates to #43554
This is a prerequisite of #42189:
* Add directory delete method to blob container specific to each implementation:
* Some notes on the implementations:
* AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
* For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
* HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.
Microbenchmark results before this change:
Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
1113.839 ±(99.9%) 6.338 ns/op [Average]
(min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)
Microbenchmark results with this change applied:
Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
433.190 ±(99.9%) 0.644 ns/op [Average]
(min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
CI (99.9%): [432.546, 433.833] (assumes normal distribution)
The microbenchmark in question was:
@Fork(3)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@SuppressWarnings("unused") //invoked by benchmarking framework
public class RegexStartsWithBenchmark {
private static final String testString = "abcdefghijklmnopqrstuvwxyz";
private static final String[] patterns;
static {
patterns = new String[testString.length() + 1];
for (int i = 0; i <= testString.length(); i++) {
patterns[i] = testString.substring(0, i) + "*";
}
}
@Benchmark
public void performSimpleMatch() {
for (int i = 0; i < patterns.length; i++) {
Regex.simpleMatch(patterns[i], testString);
}
}
}
* Optimize Snapshot Finalization
* Delete index-N blobs and segement blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
* Add Ability to List Child Containers to BlobContainer (#42653)
* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
This change fixes the name of the index_prefix sub field when the `index_prefix`
option is set on a text field that is nested under an object or a multi-field.
We don't use the full path of the parent field to set the index_prefix field name
so the field is registered under the wrong name. This doesn't break queries since
we always retrieve the prefix field through its parent field but this breaks other
APIs like _field_caps which tries to find the parent of the `index_prefix` field
in the mapping but fails.
Closes#43741
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.
Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
Currently the repsonse of the "_reload_search_analyzer" endpoint contains the
index names and nodeIds of indices were analyzers reloading was triggered. This
change add the names of the search-time analyzers that were reloaded.
Closes#43804
With Lucene rollback (#33473), we should never have more than one
primary term for each sequence number. Therefore we don't have to sort
by the primary term when reading soft-deletes.
The `RoutingService` has a confusing name, since it doesn't really have
anything to do with routing. Its responsibility is submitting reroute commands
to the master.
This commit renames this class to `BatchedRerouteService`, and extracts the
`RerouteService` interface to avoid passing `BiConsumer`s everywhere. It also
removes that `BatchedRerouteService extends AbstractLifecycleComponent` since
this service has no meaningful lifecycle. Finally, it introduces a small
wrapper class to allow for lazy initialization to deal with the dependency loop
when constructing a `Node`.
This adds a `rare_terms` aggregation. It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.
This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).
This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again. If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter. If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".
The map keys are the "rare" terms after collection is done.
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.
This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.
relates #34389
Today the `DiskThresholdMonitor` limits the frequency with which it submits
reroute tasks, but it might still submit these tasks faster than the master can
process them if, for instance, each reroute takes over 60 seconds. This causes
a problem since the reroute task runs with priority `IMMEDIATE` and is always
scheduled when there is a node over the high watermark, so this can starve any
other pending tasks on the master.
This change avoids further updates from the monitor while its last task(s) are
still in progress, and it measures the time of each update from the completion
time of the reroute task rather than its start time, to allow a larger window
for other tasks to run.
It also now makes use of the `RoutingService` to submit the reroute task, in
order to batch this task with any other pending reroutes. It enhances the
`RoutingService` to notify its listeners on completion.
Fixes#40174
Relates #42559
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.
relates #34389
Today when an index is closed all its shards are forced flushed
but the translog files are left around. As explained in #42445
we'd like to trim the translog for closed indices in order to
consume less disk space. This commit reuses the existing
AsyncTrimTranslogTask task and reenables it for closed indices.
At the time the task is executed, we should have the guarantee
that nothing holds the translog files that are going to be removed.
It also leaves a short period of time (10 min) during which translog
files of a recently closed index are still present on disk. This could
also help in some cases where the closed index is reopened
shortly after being closed (in order to update an index setting
for example).
Relates to #42445
This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection
betweeen the query and the live docs instead of checking the live docs on every document that match the query.
This commit adds a wildcard intervals source, similar to the prefix. It
also changes the term parameter in prefix to read prefix, to bring it
in to line with the pattern parameter in wildcard.
Closes#43198
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.
This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.
Relates to #29051
When we added support for TokenFilterFactories to specialise how they were used when parsing
synonym files, PreConfiguredTokenFilters were set up to either apply themselves, or be ignored.
This behaviour is a leftover from an earlier iteration, and also has an incorrect default.
This commit makes preconfigured token filters usable in synonym file parsing by default, and brings
those filters that should not be used into line with index-specific filter factories; in indexes created
before version 7 we emit a deprecation warning, and we throw an error in indexes created after.
Fixes#38793
TransportNodesAction provides a mechanism to easily broadcast a request
to many nodes, and collect the respones into a high level response. Each
node has its own request type, with a base class of BaseNodeRequest.
This base request requires passing the nodeId to which the request will
be sent. However, that nodeId is not used anywhere. It is private to the
base class, yet serialized to each node, where the node could just as
easily find the nodeId of the node it is on locally.
This commit removes passing the nodeId through to the node request
creation, and guards its serialization so that we can remove the base
request class altogether in the future.
GatewayIndexStateIT#testRecoverBrokenIndexMetadata replies on the
flushing on shutdown. This behaviour, however, can be randomly disabled
in MockInternalEngine.
Closes#43034
Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will
still try to become elected and bring full master nodes into the cluster.
Search requests executed through the SecurityIndexSearcherWrapper throw
an UnsupportedOperationException if they match a sparse role query.
When low level cancellation is activated (which is the default since #42857),
the context index searcher creates a weight that doesn't handle #scorer.
This change fixes this bug and adds a test to ensure that we check this case.
Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable
flag. This is a bit fragile due to the high randomization of query builders
performed by this general test. Also we might only rarely check the
"interesting" cases because they rarely get generated when fully randomizing the
query builder.
This change moved the general checks out ot #testToQuery and instead adds
dedicated cache tests for those query builders that exhibit something other than
the default behaviour.
Closes#43200
When a named token filter or char filter is passed as part of an Analyze API
request with no index, we currently try and build the relevant filter using no
index settings. However, this can miss cases where there is a pre-configured
filter defined in the analysis registry. One example here is the elision filter, which
has a pre-configured version built with the french elision set; when used as part
of normal analysis, this preconfigured set is used, but when used as part of the
Analyze API we end up with NPEs because it tries to instantiate the filter with
no index settings.
This commit changes the Analyze API to check for pre-configured filters in the case
that the request has no index defined, and is using a name rather than a custom
definition for a filter.
It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram`
tokenizer to make their settings consistent with the defaults used when creating
them with no settings
Closes#43002Closes#43621Closes#43582
This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.
Relates to #43198
This commit modifies the RemoteInfo to clarify that a search query
must always be serialized as JSON. Additionally, it adds an assertion
to ensure that this is the case. This fixes#43406.
Additionally, this PR implements AbstractXContentTestCase for the
reindex request. This is related to #43456.
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.
Closes#14340
Co-authored-by: David Turner <david.turner@elastic.co>
Today the `ClusterFormationFailureHelper` says `... discovery will continue
using ... from last-known cluster state` and lists all the nodes in the
last-known cluster state. In fact we ignore the master-ineligible nodes in the
last-known cluster state during discovery. This commit fixes this by listing
only the master-eligible nodes from the cluster state in this message.
If the master removes the relocating shard, but recovery isn't aware of
it, then we can enter an invalid state where ReplicationTracker does not
include the local shard.
Today we assert that a warning is logged after no more than
`discovery.cluster_formation_warning_timeout`, but the deterministic scheduler
adds a small amount of extra randomness to the timing of future events, causing
the following build to fail:
./gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testLogsWarningPeriodicallyIfClusterNotFormed" -Dtests.seed=DF35C28D4FA9EE2D
This commit adds an allowance for this extra time.
After two recent changes (#38824 and #33888), the _cat/indices API
no longer report information for active recovering indices and
non-replicated closed indices. It also misreport replicated closed
indices that are potentially not authorized for the user.
This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
information about the indices.
Closes#39933
This refactors AggregatorTestCase to allow testing mock scripts.
The main change is to QueryShardContext. This was previously mocked,
but to get the ScriptService you have to invoke a final method
which can't be mocked.
Instead, we just create a mostly-empty QueryShardContext and populate
the fields that are needed for testing. It also introduces a few
new helper methods that can be overridden to change the default
behavior a bit.
Most tests should be able to override getMockScriptService() to supply
a ScriptService to the context, which is later used by the aggs.
More complicated tests can override queryShardContextMock() as before.
Adds a test to MaxAggregatorTests to test out the new functionality.
At the end of a peer recovery the primary wants to mark the replica as in-sync. For that the
persisted local checkpoint of the replica needs to have caught up with the global checkpoint on the
primary. If translog durability is set to ASYNC, this means that information about the persisted local
checkpoint can lag on the primary and might need to be explicitly fetched through a global
checkpoint sync action. Unfortunately, that action will only be triggered after 30 seconds, and, even
worse, will only run based on what the in-sync shard copies say (see
IndexShard.maybeSyncGlobalCheckpoint). As the replica has not been marked as in-sync yet, it is
not taken into consideration, and the primary might have its global checkpoint equal to the max seq
no, so it thinks nothing needs to be done.
Closes#43486
Long and Double ValuesSource set the current document on the script
before executing, but Bytes was missing this method call. That meant
it was possible to generate an OutOfBoundsException when using
a "value" script (field + script) on keyword or other bytes
fields.
This adds in the method call, and a few yaml tests to verify correct
behavior.
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.
Similar to #42798Closes#42051
(cherry picked from commit cd1ed662f847a0055ede7dfbd325e214ec4d1490)
This commit replaces usages of Streamable with Writeable for the
AcknowledgedResponse and its subclasses, plus associated actions.
Note that where possible response fields were made final and default
constructors were removed.
This is a large PR, but the change is mostly mechanical.
Relates to #34389
Backport of #43414
Unsupported HTTP methods are detected during requests dispatching
which generates an appropriate error response. Sadly, this error is
never sent back to the client because the method of the original
request is checked again in DefaultRestChannel which throws again
an IllegalArgumentException that is never handled.
This pull request changes the DefaultRestChannel so that the latest
exception is swallowed, allowing the error message to be sent back
to the client. It also eagerly adds the objects to close to the toClose
list so that resources are more likely to be released if something
goes wrong during the response creation and sending.
We currently assert that adding deletion tombstones to Lucene must always succeed if it's not a
tragic exception, and the same should also hold true for NOOP tombstones. We rely on this
assumption, as without this, we have the risk of creating gaps in the history, which will break
operation-based recoveries and CCR.
Depending on git configuration, line feed on checked out files may be
platform dependent, which causes problems to some msearch tests as the
line separator must always be `/n`. With this change we move two files
to the test code so that we control exactly what line separator is used,
given that the corresponding tests fail on windows.
Closes#43464
Removes `@TestLogging` annotations in `*DisruptionIT` tests, so that the only
tests with annotations are those with open issues. Also adds links to the open
issues in the remaining cases.
Relates #43403
The current toXContent implementation can fail when the superclasses toXContent
is called (see #43423). This change makes sure that
DefaultShardOperationFailedException#toXContent is final and implementations
need to add special fields in #innerToXContent. All implementations should write
to self-contained xContent objects. Also adding a test for xContent deserialization
to CloseIndexResponseTests.
Closes#43423
Today when searching for an exclusive range the java date math parser rounds up the value
with the granularity of the operation. So when searching for values that are greater than
"now-2M" the parser rounds up the operation to "now-1M". This behavior was introduced when
we migrated to java date but it looks like a bug since the joda math parser rounds up values
but only when a rounding is used. So "now/M" is rounded to "now-1ms" (minus 1ms to get the largest inclusive value)
in the joda parser if the result should be exclusive but no rounding is applied if the input
is a simple operation like "now-1M". This change restores the joda behavior in order to have
a consistent parsing in all versions.
Closes#43277
Currently the fromXContent logic for reindex requests is implemented in
the rest action. This is inconsistent with other requests where the
logic is implemented in the request. Additionally, it requires access to
the rest action in order to parse the request. This commit moves the
logic and tests into the ReindexRequest.
AggregatorTestCase will NPE if only a single, null MappedFieldType
is provided (which is required to simulate an unmapped field). While
it's possible to test unmapped fields by supplying other, non-related
field types... that's clunky and unnecessary. AggregatorTestCase
just needs to filter out null field types when setting up.
Today the `DisruptibleMockTransport` always allows a connection to a node to be
established, and then fails requests sent to that node such as the subsequent
handshake. Since #42342, we log handshake failures on an open connection as a
warning, and this makes the test logs rather noisy. This change fails the
connection attempt first, avoiding these unrealistic warnings.
The test needed adaption after #43205, as the ReplicationTracker now distinguishes between the
knowledge of the persisted global checkpoint and the computed global checkpoint on the primary
Follow-up to #43205
Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is
that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This
leaves room for the history below the global checkpoint to still change in case of a crash. As we rely
on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard
copies / follower clusters going out of sync.
This commit required changing some core classes in the system:
- The LocalCheckpointTracker keeps track now not only of the information whether an operation has
been processed, but also whether that operation has been persisted to disk.
- TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once
they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this.
- ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all
shard copies when in primary mode. The computed global checkpoint (which represents the
minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored
in the checkpoint entry for the local shard copy, has been moved to an extra field.
- The periodic global checkpoint sync now also takes async durability into account, where the local
checkpoints on shards only advance when the translog is asynchronously fsynced. This means that
the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is
not sufficient anymore.
- The new index closing API does not work when combined with async durability. The shard
verification step is now requires an additional pre-flight step to fsync the translog, so that the main
verify shard step has the most up-to-date global checkpoint at disposition.
The RemoteClusterService should close the current
RemoteClusterConnection and should build it again if
the seeds are changed, similarly to what is done when
the ping interval or the compression settings are changed.
Closes#37799
The fact that SearchPhaseContext extends ActionListener makes it hard
to reason about when the original listener is notified and to trace
those calls. Also, the corresponding onFailure and onResponse were
only needed in two places, one each, where they can be replaced by a
more intuitive call, like sendSearchResponse for onResponse.
Today the fielddata for global ordinals re-creates docvalues readers of each segment
when building the iterator of a single segment. This is required because the lookup of
global ordinals needs to access the docvalues's TermsEnum of each segment to retrieve
the original terms. This also means that we need to create NxN (where N is the number of segment in the index) docvalues iterators
each time we want to collect global ordinal values. This wasn't an issue in previous versions since docvalues readers are stateless
before 6.0 so they are reused on each segment but now that docvalues are iterators we need to create a new instance each time
we want to access the values. In order to avoid creating too many iterators this change splits
the global ordinals fielddata in two classes, one that is used to cache a single instance per directory reader and one
that is created from the cached instance that can be used by a single consumer. The latter creates the TermsEnum of each segment
once and reuse them to create the segment's iterator. This prevents the creation of all TermsEnums each time we want to access
the value of a single segment, hence reducing the number of docvalues iterator to create to Nx2 (one iterator and one lookup per segment).
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
It's possible for the passed in `IndexMetaData` to be null (for
instance, cluster state passed in does not have the index in its
metadata) which in turn can cause a `NullPointerException` when
evaluating the conditions for an index. This commit adds null protection
and unit tests for this case.
Resolves#43296
While investigating memory consumption of deeply nested aggregations for #43091
the memory used to keep track of the doc ids and buckets in the BestBucketsDeferringCollector
showed up as one of the main contributor. In my tests half of the memory held in the
BestBucketsDeferringCollector is associated to segments that don't have matching docs
in the selected buckets. This is expected on fields that have a big cardinality since each
bucket can appear in very few segments. By allocating the builders lazily this change
reduces the memory consumption by a factor 2 (from 1GB to 512MB), hence reducing the
impact on gcs for these volatile allocations. This commit also switches the PackedLongValues.Builder
with a RoaringDocIdSet in order to handle very sparse buckets more efficiently.
I ran all my tests on the `geoname` rally track with the following query:
````
{
"size": 0,
"aggs": {
"country_population": {
"terms": {
"size": 100,
"field": "country_code.raw"
},
"aggs": {
"admin1_code": {
"terms": {
"size": 100,
"field": "admin1_code.raw"
},
"aggs": {
"admin2_code": {
"terms": {
"size": 100,
"field": "admin2_code.raw"
},
"aggs": {
"sum_population": {
"sum": {
"field": "population"
}
}
}
}
}
}
}
}
}
}
````
This PR is a backport a of #43214 from v8.0.0
A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).
Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.
This PR removes the do versions and just use equals/hashCode ensuring the super when necessary.
* Follow up to #42109:
* Adjust test to only check that interface lookup by name works not actually lookup IPs which is brittle since virtual interfaces can be destroyed/created by Docker while the tests are running
Co-authored-by: Jason Tedor <jason@tedor.me>
* Return 0 for negative "free" and "total" memory reported by the OS
We've had a situation where the MX bean reported negative values for the
free memory of the OS, in those rare cases we want to return a value of
0 rather than blowing up later down the pipeline.
In the event that there is a serialization or creation error with regard
to memory use, this adds asserts so the failure will occur as soon as
possible and give us a better location for investigation.
Resolves#42157
* Fix test passing in invalid memory value
* Fix another test passing in invalid memory value
* Also change mem check in MachineLearning.machineMemoryFromStats
* Add background documentation for why we prevent negative return values
* Clarify comment a bit more
This PR reverts #35230.
Previously, we reply on soft-deletes to fill the mismatch between the
version map and the Lucene index. This is no longer needed after #43202
where we rebuild the version map when opening an engine. Moreover,
PrunePostingsMergePolicy can prune _id of soft-deleted documents out of
order; thus the lookup result including soft-deletes sometimes does not
return the latest version (although it's okay as we only use a valid
result in an engine).
With this change, we use only live documents in Lucene to resolve the
indexing strategy. This is perfectly safe since we keep all deleted
documents after the local checkpoint in the version map.
Closes#42979
Backport of: https://github.com/elastic/elasticsearch/pull/43222
This commit replaces usages of Streamable with Writeable for the
SingleShardRequest / TransportSingleShardAction classes and subclasses of
these classes.
Note that where possible response fields were made final and default
constructors were removed.
Relates to #34389
Today the `ClusterFormationFailureHelper` does not include the local node in
the list of nodes it claims to have discovered. This means that it sometimes
reports that it has not discovered a quorum when in fact it has. This commit
adds the local node to the set of discovered nodes.
Today we suppress election attempts on master-eligible nodes that are not in
the voting configuration. In fact this restriction is not necessary: any
master-eligible node can safely become master as long as it has a fresh enough
cluster state and can gather a quorum of votes. Moreover, this restriction is
sometimes undesirable: there may be a reason why we do not want any of the
nodes in the voting configuration to become master.
The reason for this restriction is as follows. If you want to shut the master
down then you might first exclude it from the voting configuration. When this
exclusion succeeds you might reasonably expect that a new master has been
elected, since the voting config exclusion is almost always a step towards
shutting the node down. If we allow nodes outside the voting configuration to
be the master then the excluded node will continue to be master, which is
confusing.
This commit adjusts the logic to allow master-eligible nodes to attempt an
election even if they are not in the voting configuration. If such a master is
successfully elected then it adds itself to the voting configuration. This
commit also adjusts the logic that causes master nodes to abdicate when they
are excluded from the voting configuration, to avoid the confusion described
above.
Relates #37712, #37802.
With this change, we will rebuild the live version map and local
checkpoint using documents (including soft-deleted) from the safe commit
when opening an internal engine. This allows us to safely prune away _id
of all soft-deleted documents as the version map is always in-sync with
the Lucene index.
Relates #40741
Supersedes #42979
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.
This avoids various exceptions stemming from accessing the stopped
node(s).
Adds `equals()` and `hashcode()` methods to `DiscoveryNodeRole` to compare
these objects' values for equality, and adds a field to allow us to distinguish
unknown roles from known ones with the same name and abbreviation, for clearer
test failures.
Relates #43175
With this change we only have to add one line to add a new version.
The intent is to make it less error prone and easier to write a script
to automate the process.
There are a few tests within NodeTests that submit items to the
threadpool and then close the node. The tests are designed to check
how running tasks are affected during node close. These tests can cause
CI failures since the submitted tasks may not be running when the node
is closed and then execute after the thread context is closed, which
triggers an unexpected exception. This change ensures the threads are
running so we avoid the unexpected exception and can test these cases.
The test of task submittal while a node is closing is also important so
an additional but muted test has been added that tests the case where a
task may be getting submitted while the node is closing and ensuring we
do not trigger anything unexpected in these cases.
Relates #42774
Relates #42577
This PR proposes to model big integers as longs (and big decimals as doubles)
in the context of dynamic mappings.
Previously, the dynamic mapping logic did not recognize big integers or
decimals, and would an error of the form "No matching token for number_type
[BIG_INTEGER]" when a dynamic big integer was encountered. It now accepts these
numeric types and interprets them as 'long' and 'double' respectively. This
allows `dynamic_templates` to accept and and remap them as another type such as
`keyword` or `scaled_float`.
Addresses #37846.
The test fails if querying the roles via a transport client, since the
transport client does not have the plugin necessary to interpret the additional
role correctly. This commit adds this plugin to the transport client used.
Relates #43175Fixes#43223
Currently the randomization of the q.b. in these tests can create query strings
that can cause caching to be disabled for this query if we query all fields and
there is a date field present. This is pretty much an anomaly that we shouldn't
generally test for in the "testToQuery" tests where cache policies are checked.
This change makes sure we don't create offending query strings so the cache
checks never hit these cases and adds a special test method to check this edge
case.
Closes#43112
roundUp parsers were losing the composite pattern information when new
JavaDateFormatter was created from methods withLocale or withZone.
The roundUp parser should be preserved when calling these methods. This is the same approach in withLocale/Zone methods as in daa2ec8a60/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.javacloses#42835
Given the significant performance impact that NIOFS has when term dicts are
loaded off-heap this change enforces FstLoadMode#AUTO that loads term dicts
off heap only if the underlying index input indicates a memory map.
Relates to #43150
As the ValuesSourceType evolves, it is important to be
confident that new enum constants do not break
backwards-compatibility on the stream. Having dedicated
unit tests for this class will help be sure of that.
This change adds the terms index (`.tip`) to the list of extensions
that are memory-mapped by hybridfs. These files used to be accessed
only once to load the terms index on-heap but since #42838 they can
now be used to read the binary FST directly so it is benefical to
memory-map them instead of accessing them via NIO.
Fixes an issue where tests would sometimes hang for 5 seconds when restarting a node. The reason
is that the SeedHostsResolver is blockingly waiting on a result for the full 5 seconds when the
corresponding threadpool is shut down.
We are still using `FileSwitchDirectory` in the case a user configures file based pre-load of mmaps. This is trappy for multiple reasons if the both directories used by `FileSwitchDirectory` point to the same filesystem directory. One issue is LUCENE-8835 that cause issues like #37111 - unless LUCENE-8835 isn't fixed we should not use it in elasticsearch. Instead we use a similar trick as we use for HybridFS and subclass mmap directory directly.
When set to false, allowPartialSearchResults option does not check if the
shard failures have been reseted to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.
Closes#40743
Currently suggesters return null values on empty shards. Usually this gets replaced
by results from other non-epmty shards, but if the index is completely epmty (e.g. after
creation) the search responses "suggest" is also "null" and we don't render a corresponding
output in the REST response. This is an irritating edge case that requires special handling on
the user side (see #42473) and should be fixed.
This change makes sure every suggester type (completion, terms, phrase) returns at least an
empty skeleton suggestion output, even for empty shards. This way, even if we don't find
any suggestions anywhere, we still return and output the empty suggestion.
Closes#42473
When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.
This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflect the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.
Closes#40900
The SearchRequest#crossClusterSearch method is currently used only as
part of cross cluster search request, when minimizing roundtrips.
It will soon be used also when splitting a search into two: one for
throttled and one for non throttled indices. It will probably be used
for other usecases as well in the future, hence it makes sense to generalize its name to subSearchRequest.
Though not in use in elasticsearch currently, it seems surprising that
ThreadPool.scheduler().scheduleAtFixedRate would hang. A recurring
scheduled task is never completed (except on failure) and we test for
exceptions using RunnableFuture.get(), which hangs for periodic tasks.
Fixed by checking that task is done before calling .get().
Today the master eagerly reroutes the cluster as part of processing node joins.
However, it is not necessary to do this reroute straight away, and it is
sometimes preferable to defer it until later. For instance, when the master
wins its election it processes joins and performs a reroute, but it would be
better to defer the reroute until after the master has become properly
established.
This change defers this reroute into a separate task, and batches multiple such
tasks together.
#40741 introduced a merge policy that can drop the postings for the `_id`
field on soft deleted documents. The TermsSliceQuery assumes that every document
has has an entry in the postings for that field so it doesn't check if the terms
index exists or not. This change fixes this bug by checking if the terms index for
the `_id` field is null and ignore the segment entirely if it's the case. This should
be harmless since segments without an `_id` terms index should only contain soft deleted
documents.
Closes#42996
If linearizability checking fails with OOM (or other exception), we did
not get the serialized history written into the log, making it difficult
to debug in cases where the problem is hard to reproduce. Fixed to
always attempt dumping the serialized history.
Related to #42244
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.
This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string`
does not take the length of the term into account when computing the distance and always use
a max distance of 1. This change fixes this disrepancy by ensuring that the term is passed when
the fuzziness is computed.
We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.
This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:
```
POST /_cluster/reroute?retry_failed=true
{
"commands": [
{
"allocate_replica": {
"index": "blahblah",
"shard": 2,
"node": "node-4"
}
}
]
}
```
Fixes#39546
This commit refactors put mapping request validation for reuse. The
concrete case that we are after here is the ability to apply effectively
the same framework to indices aliases requests. This commit refactors
the put mapping request validation framework to allow for that.
This commit fixes a test bug in the request validators random test. In
particular, an assertion was not properly nested in a guard that would
ensure that was at least one failure.
Relates #43000
When applying put mapping validators, we apply all the validators in the
collection. If a failure occurs, we collect that as a top-level
exception, and suppress any additional failures into the top-level
exception. However, if a request passes the validator after a top-level
exception has been collected, we would try to suppress a null exception
into the top-level exception. This is a violation of the
Throwable#addSuppressed API. This commit addresses this, and adds test
to cover the logic of collecting the failures when validating a put
mapping request.
Today we test for translog corruption by incrementing a byte by 1 somewhere in
a file, and verify that this leads to a `TranslogCorruptionException`.
However, we rely on _all_ corruptions leading to this exception in the
`RemoveCorruptedShardDataCommand`: this command fails if a translog file
corruption leads to a different kind of exception, and `EOFException` and
`NegativeArraySizeException` are both possible. This commit strengthens the
translog corruption detection tests by simulating the following:
- a random value is written
- the file is truncated
It also makes sure that we return a `TranslogCorruptionException` in all such
cases.
Fixes#42661
Backport of #42744
This code has not been needed since the removal of tribe nodes, it was
left behind when those were dropped (note that regular transport
permissions are handled through transport profiles, even if they are not
explicitly in use).
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
Relates #24344
Today we assert that the connection thread is blocked by the time the test gets
to the barrier, but in fact this is not a valid assertion. The following
`Thread.sleep()` will cause the test to fail reasonably often.
```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
if (connectionBlock != null) {
try {
+ Thread.sleep(50);
connectionBlock.run();
} catch (Exception e) {
throw new AssertionError(e);
```
This change relaxes the test to allow some time for the connection thread to
hit the barrier.
Fixes#40170
Changed order of listener invocation so that we notify before
registering search context and notify after unregistering same.
This ensures that count up/down like what we do in ShardSearchStats
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (same for onFreeContext/onNewContext, but we
currently have no assertions failing in those).
Closes#28053
The test failed because we had only a single document in the index
that got deleted such that some assertions that expected at least
one live doc failed.
Relates to: #40741
Single updates use a different internal code path than updates that are wrapped in a bulk request.
While working on a refactoring to bring both closer together I've noticed that bulk updates were
failing some of the tests that single updates passed. In particular, bulk updates cause
NullPointerExceptions to be thrown and listeners not being properly notified when being rejected
from the thread pool.
This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.
Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
Some clusters might have been already migrated to version 7 without being warned about the joda-java migration changes.
Deprecation api on that version will give them guidance on what patterns need to be changed.
relates. This change is using the same logic like in 6.8 that is: verifying the pattern is from the incompatible set ('y'-Y', 'C', 'Z' etc), not from predifined set, not prefixed with 8. AND was also created in 6.x. Mappings created in 7.x are considered migrated and should not generate warnings
There is no pipeline check (present on 6.8) as it is impossible to verify when the pipeline was created, and therefore to make sure the format is depracated or not
#42010
The full-text query parsers accept field pattern that are expanded using the mapping.
Alias field are also detected during the expansion but they are not deduplicated with the
concrete fields that are found from other patterns (or the same). This change ensures
that we deduplicate the target fields of the full-text query parsers in order to avoid
adding the same clause multiple times. Boolean queries are already able to deduplicate
clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect
these duplicates early on.
This change makes use of the reader attributes added in LUCENE-8671
to ensure that `_id` fields are always on-heap for best update performance
and term dicts are generally off-heap on Read-Only engines.
Closes#38390
This test is failing because recoveries of these empty shards are not
completing in a reasonable time, but the reason for this is still obscure. This
commit adds yet more logging.
Relates #40174, #42424
This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.
Relates #41815
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.
Closes#39688
(cherry picked from commit c8988d5cf5a85f9b28ce148dbf100aaa6682a757)
The control flow in TransportAnalyzeAction is currently spread across two large
methods, and is quite difficult to follow. This commit tidies things up a bit, to make
it clearer when we use pre-defined analyzers and when we use custom built ones.
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.
This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
IntervalBuilder#analyzeText will currently return null if it is passed an
empty TokenStream, which can lead to a confusing NullPointerException
later on during querying. This commit changes the code to return
NO_INTERVALS instead.
Fixes#42587
We were checking if an exception was caused by a specific reason "Not a
directory". Alas, this reason is locale-dependent and can fail on
systems that are not set to en_US.UTF-8. This commit addresses this by
deriving what the locale-dependent error message would be and using that
for comparison with the actual exception thrown.
Relates #41689
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.
To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).
Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
Today Elasticsearch does not prevent you from reconfiguring a node's
`path.data` to point to data paths that previously belonged to more than one
node. There's no good reason to be able to do this, and the consequences can be
quietly disastrous. Furthermore, #42489 might result in a user trying to split
up a previously-shared collection of data paths by hand and there's definitely
scope for mixing the paths up across nodes when doing this.
This change adds a check during startup to ensure that each data path belongs
to the same node.
Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.
Relates to #27096
(cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)
Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log
anything above `DEBUG` level. However there are some situations where it is
appropriate for them to log at a higher level:
- if the low-level handshake succeeds but the high-level one fails then this
indicates a config error that the user should resolve, and the exception
will help them to do so.
- if leader checks fail repeatedly then we restart discovery, and the exception
will help to determine what went wrong.
Resolves#42153
Refactors the WKT and GeoJSON parsers from an utility class into an
instantiatable objects. This is a preliminary step in
preparation for moving out coordinate validators from Geometry
constructors. This should allow us to make validators plugable.
A disruption test case need to use a lower checkpoint sync interval
since they verify sequence numbers after the test waiting max 10 seconds
for it to stabilize.
Closes#42637
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
When multiple commands are called in sequence, fetch shards
from mutable, up-to-date routing nodes to ensure each command's
changes are visible to subsequent commands.
This addresses an issue uncovered during work on #41050.
The problem this commit addresses is that state recovery is not reset on a node that then becomes
master with a cluster state that has a state not recovered flag in it. The situation that was observed
in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we
have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains),
when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state
(with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and
as it was previously leader in term1 and successfully completed state recovery, does never retry
state recovery in term 3.
Closes#39172
* Make unwrapCorrupt Check Suppressed Ex. (#41889)
* As discussed in #24800 we want to check for suppressed corruption
indicating exceptions here as well to more reliably categorize
corruption related exceptions
* Closes#24800, 41201
* Cleanup Bulk Delete Exception Logging
* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.
Closes#39757
* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
* Fixed by adding a wait for no in-progress publications
* Also added debug logging that would've identified this problem
* Closes#36813
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
* The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
* relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
* Remove Obsolete BwC Logic from BlobStoreRepository
* We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here
* Some other minor+obvious cleanups
* Some Cleanup in o.e.i.engine
* Remove dead code and parameters
* Reduce visibility in some obvious spots
* Add missing `assert`s (not that important here since the methods
themselves will probably be dead-code eliminated) but still
If a single pipeline is updated then the internal representation of
all pipelines was updated. With this change, only the internal representation
of the pipelines that have been modified will be updated.
Prior to this change the IngestMetadata of the previous and current cluster
was used to determine whether the internal representation of pipelines
should be updated. If applying the previous cluster state change failed then
subsequent cluster state changes that have no changes to IngestMetadata
will not attempt to update the internal representation of the pipelines.
This commit, changes how the IngestService updates the internal representation
by keeping track of the underlying configuration and use that to detect
against the new IngestMetadata whether a pipeline configuration has been
changed and if so, then the internal pipeline representation will be updated.
We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug log to retention leases and enables
them in CcrRetentionLeaseIT.
This commit removes the act of renewing some retention leases during a
retention lease recovery test. Having renewal does not add anything
extra to this test, but does allow for some situations where the test
can fail spuriously (i.e., in a way that does not indicate that
production code is broken).
If we close an engine while a refresh is happening, then we might leak
refCount of some SegmentReaders. We need to skip the ram accounting
circuit breaker check until we have a new Lucene snapshot which includes
the fix for LUCENE-8809.
This also adds a test to the engine but left it muted so we won't forget
to reenable this check.
Closes#30290
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.
Changed the tool to always check shards regardless of corruption marker
presence.
Related to #41298
Previously sorting on a missing nested field would fail with an
Exception:
`[nested_field] failed to find nested object under path [nested_path]`
despite `unmapped_type` being set on the query.
Fixes: #33644
(cherry picked from commit 631142d5dd088a10de8dcd939b50a14301173283)
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in #36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.
This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.
Fixes#41967, in which the master entered the stabilisation phase with over 800
tasks to process.
By default, we track total hits up to 10k but we might index more than
10k documents `testPrimaryRelocationWhileIndexing`. With this change, we
always request for the accurate total hits in the test.
> java.lang.AssertionError: Count is 10000+ hits but 11684 was expected.
Both of these classes are basically a bloated wrapper around a simple
construct that can simply be a DirectoryFactory interface. This change
removes both classes and replaces them with a simple stateless interface
that creates a new `Directory` per shard. The concept of `index.store` is preserved
since it makes sense from a configuration perspective.
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.
This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.
Fixes#38331
Relates #38432
Today `RetentionLeaseIT` calls `fail(e.toString())` on some exceptions, losing
the stack trace that came with the exception. This commit adjusts this to
re-throw the exception wrapped in an `AssertionError` so we can see more
details about failures such as #41430.
This commit adds a log message containing the routing table, emitted on each
iteration of the failing assertBusy() in #40174. It also modernizes the code a
bit.
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
`org.elasticsearch.action.bulk.BulkProcessor` is a threadsafe class that
allows for simple semantics to deal with sending bulk requests. Once a
bulk reaches it's pre-defined size, documents, or flush interval it will
execute sending the bulk. One configurable option is the number of concurrent
outstanding bulk requests. That concurrency is implemented in
`org.elasticsearch.action.bulk.BulkRequestHandler` via a semaphore. However,
the only code that currently calls into this code is blocked by `synchronized`
methods. This results in the in-ability for the BulkProcessor to behave concurrently
despite supporting configurable amounts of concurrent requests.
This change removes the `synchronized` method in favor an explicit
lock around the non-thread safe parts of the method. The call into
`org.elasticsearch.action.bulk.BulkRequestHandler` is no longer blocking, which
allows `org.elasticsearch.action.bulk.BulkRequestHandler` to handle it's own concurrency.
Lucene 8 has the ability to skip blocks of non-competitive documents.
However some queries don't track their maximum score (`script_score`, `span`, ...)
so they always return Float.POSITIVE_INFINITY as maximum score. This can slow down
some boolean queries if other clauses have bounded max scores. This commit disables
the max score optimization when we detect a mandatory scoring clause with unbounded
max scores. Optional clauses are not checked since they can still skip documents
when the unbounded clause is after the current document.
Currently AnalysisRegistry#processNormalizerFactory creates a normalizer and
only later checks whether it should be added to the normalizer map passed in. In
case we throw an exception it isn't closed. This can be prevented by moving the
check that throws the exception earlier.
Allow for SimpleQueryString, QueryString and MultiMatchQuery
to set the `fields` parameter to the wildcard `*`. If so, set
the leniency to `true`, to achieve the same behaviour as from the
`"default_field" : "*" setting.
Furthermore, check if `*` is in the list of the `default_field` but
not necessarily as the 1st element.
Closes: #39577
(cherry picked from commit e75ff0c748e6b68232c2b08e19ac4a4934918264)
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures
Relates #41068, #38931
ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be
easily replaced with direct usages of the constructor that takes a
StreamInput as argument.
In case a search request holds only the suggest section, the query phase
is skipped and only the suggest phase is executed instead. There will
never be hits returned, and in case the explain flag is set to true, the
explain sub phase throws a null pointer exception as the query is null.
Usually a null query is replaced with a match all query as part of SearchContext#preProcess which is though skipped as well with suggest
only searches. To address this, we skip the explain sub fetch phase
for search requests that only requested suggestions.
Closes#31260
Removing of payload in BulkRequest (#39843) had a side effect of making
`BulkRequest.add(DocWriteRequest<?>...)` (with varargs) recursive, thus
leading to StackOverflowError. This PR adds a small change in
RequestConvertersTests to show the error and the corresponding fix in
`BulkRequest`.
Fixes#41668
* Remove IndexShard dependency from Repository
In order to simplify repository testing especially for BlobStoreRepository
it's important to remove the dependency on IndexShard and reduce it to
Store and MapperService (in the snapshot case). This significantly reduces
the dependcy footprint for Repository and allows unittesting without starting
nodes or instantiate entire shard instances. This change deprecates the old
method signatures and adds a unittest for FileRepository to show the advantage
of this change.
In addition, the unittesting surfaced a bug where the internal file names that
are private to the repository were used in the recovery stats instead of the
target file names which makes it impossible to relate to the actual lucene files
in the recovery stats.
* don't delegate deprecated methods
* apply comments
* test
When using High Level Rest Client Java API to produce search query, using AggregationBuilders.topHits("th").sort("_score", SortOrder.DESC)
caused query to contain duplicate sort clauses.
A shard that is undergoing peer recovery is subject to logging warnings of the form
org.elasticsearch.action.FailedNodeException: Failed node [XYZ]
...
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ...
These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e.
there is an IndexShard instance, but no proper IndexCommit just yet).
As these failures are currently bubbled up to the master, they cause unnecessary reroutes and
confusion amongst users due to being logged as warnings.
Closes #40107
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.
This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
Simplifies the voting configuration reconfiguration logic by switching to an explicit Comparator for
the priorities. Does not make changes to the behavior of the component.
Flushing at the end of a peer recovery (if needed) can bring these
benefits:
1. Closing an index won't end up with the red state for a recovering
replica should always be ready for closing whether it performs the
verifying-before-close step or not.
2. Good opportunities to compact store (i.e., flushing and merging
Lucene, and trimming translog)
Closes#40024Closes#39588
This change verifies and aborts recovery if source and target have the
same syncId but different sequenceId. This commit also adds an upgrade
test to ensure that we always utilize syncId.
The verifying-before-close step ensures the global checkpoints on all
shard copies are in sync; thus, we don' t need to sync global
checkpoints for closed indices.
Relate #33888
There is an off-by-one error in this test. It leads to the recovery
thread never being started, and that means joining on it will wait
indefinitely. This commit addresses that by fixing the off-by-one error.
Relates #42325
I forgot to git add these before pushing, sorry. This commit fixes
compilation in IndexShardTests, they are needed here and not in master
due to differences in how Java infers types in generics between JDK 8
and JDK 11.
Today when executing an action on a primary shard under permit, we do
not enforce that the shard is in primary mode before executing the
action. This commit addresses this by wrapping actions to be executed
under permit in a check that the shard is in primary mode before
executing the action.
Today we are persisting the retention leases at least every thirty
seconds by a scheduled background sync. This sync causes an fsync to
disk and when there are a large number of shards allocated to slow
disks, these fsyncs can pile up and can severely impact the system. This
commit addresses this by only persisting and fsyncing the retention
leases if they have changed since the last time that we persisted and
fsynced the retention leases.
* Cleanup Various Uses of ActionListener
* Use shorter `map`, `runAfter` or `wrap` where functionally equivalent to anonymous class
* Use ActionRunnable where functionally equivalent
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.
This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.
This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
Secure settings currently error if they exist inside elasticsearch.yml.
This commit adds validation that non-secure settings do not exist inside
the keystore.
closes#41831
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
If `keyedFilters` is null it assumes there are unkeyed filters...which
will NPE if the unkeyed filters was actually empty.
This refactors to simplify the filter assignment a bit, adds an empty
check and tidies up some formatting.
This commit fixes a test bug that ends up comparing the result of two consecutive calls to System.currentTimeMillis that can be different
on slow CIs.
Closes#42064