Today we have several implementations of executing SearchOperationListener
in SearchService. While all of them seem to be safe at least on, the one that
executes scroll searches can cause illegal execution of SearchOperationListener
that can then in-turn trigger assertions in ShardSearchStats. This change
adds a SearchOperationListenerExecutor that uses try-with blocks to ensure
listeners are called in a safe way.
Relates to #37185
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.
The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.
Relates #27330
Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it.
This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.
1. testSimpleOnlyMasterNodeElection - requires cluster bootstrap when
the first master node is started.
2. testElectOnlyBetweenMasterNodes - requires cluster bootstrap when
the first master node is started and requires adding voting exclusion
before shutting down the first master node.
3. testAliasFilterValidation - requires cluster bootstrap when the
first master node is started.
It's not safe to continue writing state using MetaDataStateFormat
after dirty WriteStateException occurred if it's not recovered by
successful subsequent state write.
We've encountered test failure of testFailRandomlyAndReadAnyState.
The test breaks in the following way. There are 3 state paths. And what
happens next
Successful write at the beginning of the test yields 0 0 0 state
files in the directories.
1st write in the loop is unsuccessful, but not dirty - 0 0 0.
2nd write in the loop is not successful and dirty (failure during
fsync), however before removing new files we have 1 1 1. But now during
deletion, the first deletion fails and we get - 1 0 0.
3rd write in the loop is unsuccessful, but not dirty - so we want to
keep old generation, which happens to be the 1st generation, so now we
have 1 x x in state folders. Now we assert that we either load 0 or 1
state from the state folders and select only 2rd and 3th folder to
emulate disk failures - this results in NPE because there is nothing in
these folders.
Fortunately, this won’t be a problem in real life, because if there is
a dirty exception, we shut down the node and make sure we perform a
successful write on the node startup.
This adds a set of helper classes to determine if an agg "has a value".
This is needed because InternalAggs represent "empty" in different
manners according to convention. Some use `NaN`, `+/- Inf`, `0.0`, etc.
A user can pass the Internal agg type to one of these helper methods
and it will report if the agg contains a value or not, which allows the
user to differentiate "empty" from a real `NaN`.
These helpers are best-effort in some cases. For example, several
pipeline aggs share a single return class but use different conventions
to mark "empty", so the helper uses the loosest definition that applies
to all the aggs that use the class.
Sums in particular are unreliable. The InternalSum simply returns 0.0
if the agg is empty (which is correct, no values == sum of zero). But this
also means the helper cannot differentiate from "empty" and `+1 + -1`.
This commit removes the warn-date from warning headers. Previously we
were stamping every warning header with when the request
occurred. However, this has a severe performance penalty when
deprecation logging is called frequently, as obtaining the current time
and formatting it properly is expensive. A previous change moved to
using the startup time as the time to stamp on every warning header, but
this was only to prove that the timestamping was expensive. Since the
warn-date is optional, we elect to remove it from the warning
header. Prior to this commit, we worked in Kibana to make the warn-date
treated as optional there so that we can follow-up in Elasticsearch and
remove the warn-date. This commit does that.
Prefer publishing to master-eligible nodes first, so that cluster state updates are committed more
quickly, and master-eligible nodes also turned more quickly into followers after a leader election.
PersistentTasksClusterService decides if a task should be reassigned by
checking there is a node in the cluster with the same Id. If a node is
restarted PersistentTasksClusterService may not observe the change and
decide the task still has a valid assignment because the node's
ephemeral Id is not used in that decision. This change un-assigns tasks
as the nodes in the cluster change.
* Fail start of non-data node if node has data
Check that nodes started with node.data=false cannot start if they have
shard data to avoid (old) indexes being resurrected into the cluster in red status.
Issue #27073
When publications were cancelled because a node turned to follower or candidate, it would still
show as time out, which can be confusing in the logs. This change adapts the improper call of
onTimeout by generalizing it to a cancel method.
The method calls "enabled" in addition to what the super.index() does, but this
seems to be done explicitely now in the TypeParsers `parse` method. The removed
method has been deprecated since at least 6.0. Also making some of the Builders
methods and ctos private since they are only used internally in this class.
Today when bootstrapping a Zen2 cluster we wait for every node in the
`initial_master_nodes` setting to be discovered, so that we can map the
node names or addresses in the `initial_master_nodes` list to their IDs for
inclusion in the initial voting configuration. This means that if any of
the expected master-eligible nodes fails to start then bootstrapping will
not occur and the cluster will not form. This is not ideal, and we would
prefer the cluster to bootstrap even if some of the master-eligible nodes
do not start.
Safe bootstrapping requires that all pairs of quorums of all initial
configurations overlap, and this is particularly troublesome to ensure
given that nodes may be concurrently and independently attempting to
bootstrap the cluster. The solution is to bootstrap using an initial
configuration whose size matches the size of the expected set of
master-eligible nodes, but with the unknown IDs replaced by "placeholder"
IDs that can never belong to any node. Any quorum of received votes in any
of these placeholder-laden initial configurations is also a quorum of the
"true" initial set of master-eligible nodes, giving the guarantee that it
intersects all other quorums as required.
Note that this change means that the initial configuration is not
necessarily robust to any node failures. Normally the cluster will form and
then auto-reconfigure to a more robust configuration in which the
placeholder IDs are replaced by the IDs of genuine nodes as they join the
cluster; however if a node fails between bootstrapping and this
auto-reconfiguration then the cluster may become unavailable. This we feel
to be less likely than a node failing to start at all.
This commit also enormously simplifies the cluster bootstrapping process.
Today, the cluster bootstrapping process involves two (local) transport actions
in order to support a flexible bootstrapping API and to make it easily
accessible to plugins. However this flexibility is not required for the current
design so it is adding a good deal of unnecessary complexity. Here we remove
this complexity in favour of a much simpler ClusterBootstrapService
implementation that does all the work itself.
In order to be able to parse epoch seconds and epoch milli seconds own
java time fields had been introduced. These fields are however not
compatible with the way that java time allows one to configure default
fields (when a part of a timestamp cannot be read then a default value
is added), which is used for the formatters that are rounding up to the
next value.
This commit allows java date formatters to configure its round up parsing
by setting default values via a consumer. By default all formats are setting
JavaDateFormatter.ROUND_UP_BASE_FIELDS for rounding up. The epoch
however parsers both need to set different fields. The merged date
formatters do not set any fields, they just append all the round up formatters.
Also the formatter now properly copies the locale and the timezone,
fractional parsing has been set to nano seconds with proper width.
Since version 6.7.0 the Close Index API guarantees that all translog
operations have been correctly flushed before the index is closed. If
the index is reopened as a Frozen index (which uses a ReadOnlyEngine)
we can verify that the maximum sequence number from the last Lucene
commit is indeed equal to the last known global checkpoint and refuses
to open the read only engine if it's not the case. In this PR the check is
only done for indices created on or after 6.7.0 as they are guaranteed
to be closed using the new Close Index API.
Related #33888
This commit introduces a NetworkMessage class. This class has two
subclasses - InboundMessage and OutboundMessage. These messages can
be serialized and deserialized independent of the transport. This allows
more granular testing. Additionally, the serialization mechanism is now
a simple Supplier. This builds the framework to eventually move the
serialization of transport messages to the network thread. This is the
one serialization component that is not currently performed on the
network thread (transport deserialization and http serialization and
deserialization are all on the network thread).
Currently we create dedicated network threads for both the http and
transport implementations. Since these these threads should never
perform blocking operations, these threads could be shared. This commit
modifies the nio-transport to have 0 http workers be default. If the
default configs are used, this will cause the http transport to be run
on the transport worker threads. The http worker setting will still exist
in case the user would like to configure dedicated workers. Additionally,
this commmit deletes dedicated acceptor threads. We have never had these
for the netty transport and they can be added back if a need is
determined in the future.
* The repo id was determined wrong when the delete picked up on an in progress snapshot
* NOTE: This solution is still a best-effort fix and there's a slight chance of running into concurrency issues here
when multiple create and delete requests for the same snapshot name are happening concurrently, but these require a sequence
of multiple cluster state updates between the changed method reading the genId and submitting its cluster state update task
* Added test reproduced the issue reliably in about 50% of runs
* Closes#37581
This will be used in cross-cluster search when reduction will be
performed locally on each cluster. The CCS coordinating node will send
one search request per remote cluster involved and will get one search
response back from each one of them. Such responses contain all the info
to be able to perform an additional reduction and return results back
to the user.
Relates to #32125
This commit optimizes some of the performance issues from using
deprecation logging:
- we optimize encoding the deprecation value
- we optimize formatting the deprecation string
- we optimize away getting the current time (by using cached startup
time)
To make further refactoring of GeoGrid aggregations
easier (related: #30320), splitting out these inner
class dependencies into their own files makes it
easier to map the relationship between classes
From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates:
- Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`.
- Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.
This adds deprecation to _type in the script contexts for ingest and update.
This adds a DeprecationMap that wraps the ctx Map containing _type for these
specific contexts.
Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since
being switched to Zen2 because the bootstrap setting was left around when nodes got restarted
with data folders wiped.
The test in question here was starting one node (which autobootstrapped to that single node), then
another node. The first node was then shut down (after excluding it from the voting configuration),
its data folder wiped, and restarted. After restart, the node had an empty data folder yet
initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of
its own, and not rejoin the existing cluster with the other node.
Currently when adding a response header, we do some de-duplication, and
maybe drop the header on the floor if we have reached capacity. Yet, we
still update the thread local tracking the response headers. This is
really expensive because under the hood there is a shared reference that
we synchronize on. In the case of a request processed across many shards
in a tight loop, this contention can be detrimental to performance. We
can avoid updating the thread local in these cases though, when the
response header is duplicate of one that we have already seen, or when
it's dropped on the floor. This commit addresses these performance
issues by avoiding the unnecessary set.
As of today the Close Index API does its best to close indices,
but closing an index with ongoing recoveries might or might not
be acknowledged depending of the values of the max seq number
and global checkpoint at the time the
TransportVerifyShardBeforeClose action is executed.
These tests failed because they always expect that the index is
correctly closed on the first try, which is not always the case.
Instead we need to retry the closing until it succeed.
Closes#37571
* Fixes `testTwoNodeFirstNodeCleared` by manipulating voting config exclusions.
* Removes `testRecoveryDifferentNodeOrderStartup` since state recovery is now
handled entirely on the elected master, so the order in which the data nodes
start is irrelevant.
This test was actually passing, for the wrong reason: it asserts a
`MasterNotDiscoveredException` is thrown, expecting this to be due to a failure
to perform state recovery, but in fact it's thrown because the node is not
correctly bootstrapped.
This change adds deprecation warning to the indices.get_mapping API in case the
"inlcude_type_name" parameter is set to "true" and changes the parsing code in
GetMappingsResponse to parse the type-less response instead of the one
containing types. As a consequence the HLRC client doesn't need to force
"include_type_name=true" any more and the GetMappingsResponseTests can be
adapted to the new format as well. Also removing some "include_type_name"
parameters in yaml test and docs where not necessary.
delete and close index actions threw IllegalArgumentExceptions
when attempting to run against an index that has a snapshot
in progress.
This change introduces a dedicated SnapshotInProgressException
for these scenarios. This is done to explicitly signal to clients that
this is the reason the action failed, and it is a retryable error.
relates to #37541.
This is a continuation of #28667 and has as goal to convert all executors to propagate errors to the
uncaught exception handler. Notable missing ones were the direct executor and the scheduler. This
commit also makes it the property of the executor, not the runnable, to ensure this property. A big
part of this commit also consists of vastly improving the test coverage in this area.
This change adds a way to customize how phrase prefix queries should be created
on field types. The match phrase prefix query is exposed in field types in order
to allow optimizations based on the options set on the field.
For instance the text field uses the configured prefix field (if available) to
build a span near that mixes the original field and the prefix field on the last
position.
This change also contains a small refactoring of the match/multi_match query that
simplifies the interactions between the builders.
Closes#31921
This test failed because the refresh at the end of the test is not guaranteed to run before the
indexing is completed, and therefore there's no guarantee that the refresh will free all operations.
This triggers an assertion failure in the test clean-up, which asserts that there are no more pending
operations.
Some systems default to a nofile ulimit of 65535. To reduce the pain of
deploying Elasticsearch to such systems, this commit lowers the required
limit from 65536 to 65535.
We flush quite often in testAddNewReplicas to create the safe index
commit with gaps in sequence numbers. This test is failing recently
because CI is too slow to complete 5 small flushes in 10 seconds.
This commit increases timeout for this test and also ensures to always
terminate the background indexing. The latter is to eliminate unrelated
failures if this test fails again.
Closes#37183
All tests except testRestorePersistentSettings (renamed to
testExceptionWhenRestoringPersistentSettings) worked fine.
testExceptionWhenRestoringPersistentSettings re-written to use a custom
setting, because "minimum master node" setting is no longer available
in Zen2. It turns out there is no good replacement for "minimum master
node" setting for this test, that's why the custom setting is
introduced.
Unfortunately, there is #37485 bug and currently
RestoreService does not perform setting validation. That's why the
test is annotated with @AwaitsFix, the idea is to merge this commit and
then fix the issue and enable the test. (The test passes with a simple
fix, that adds a single line to RestoreService).
Currently all proxied actions are denied for the `SystemPrivilege`.
Unfortunately, there are use cases (CCR) where we would like to proxy
actions to a remote node that are normally performed by the
system context. This commit allows the system context to perform
proxy actions if they are actions that the system context is normally
allowed to execute.
Currently it takes a type, but this isn't really needed now that indices can
have at most one type. The only downside is that we might return a different
error when trying to index into a type that doesnt't exist yet.
This commit removes some leniency from REST handling where we move to
reject all requests that have a body where the body is not used during
the course of handling the request. For example,
DELETE /index
{
"query" : {
"term" : {
"field" : "value"
}
}
}
is now rejected.
* The internal create request is absolutely redundant, the only difference to the transport request is that we resolved the snapshot
name when moving from the transport to the internal version
* Removed it and passed the transport request into the snapshot service instead
* nicer way of resolve snapshot name in callback
The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class.
It no longer extends the AbstractComponent so there is no need for this constructor
There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor.
This is part 1. which will be backported to 6.x with a migration guide/deprecation log.
part 2 will have this constructor removed in 7
relates #35560
relates #34488
This commit adds one more underlying implementation of MockPersistedState.
Previously only InMemoryPersistentState was used, not GatewayMetaState
is used rarely.
When adding GatewayMetaState support the main question was: do we want to
emulate exceptions as we do today in MockPersistedState before
delegating to GatewayMetaState or do we want these exceptions to
propagate from the lower level, i.e. file system exceptions?
On the one hand, lower level exception propagation is already tested in
GatewayMetaStateTests, so this won't improve the coverage.
On the other hand, the benefit of low-level exceptions is to see how all these
components work in conjunction. Finally, we abandoned the idea of low-level
exceptions because we don't have a way to deal with IOError today in
CoordinatorTests, but hacking GatewayMetaState not to throw
IOError seems unnatural.
So MockPersistedState rarely throws an exception before delegating to
GatewayMetaState, which is not supposed to throw the exception.
This commit required two changes:
Move GatewayMetaStateUT to upper-level from
GatewayMetaStatePersistedStateTests, because otherwise, it's not easy
to construct GatewayMetaState instance in CoordinatorTests.
Move addition of STATE_NOT_RECOVERED_BLOCK from GatewayMetaState
constructor to GatewayMetaState.applyClusterUpdaters, because
CoordinatorTests class assumes that there is no such block and most of
them fail.
The completion suggester ignores the original weight of the suggestion when duplicates are removed. This change fixes this bug and keeps the best weighted suggestion among the duplicates. It also removes the custom implementation of the top docs suggest collector now that https://issues.apache.org/jira/browse/LUCENE-8529 is committed in Lucene.
Closes#35836
This commit prepares the required infra to make send a translog snapshot
of the recovery source non-blocking. I'll make a follow-up to make the send
snapshot method non-blocking.
Relates #37291
There were 5 tests in MinimumMasterNodesIT. 2 of them removed, 3 of
them changed and renamed.
1) testSimpleMinimumMasterNodes -> testTwoNodesNoMasterBlock. The
flow of this test is left intact but in order to make it work on
Zen2, additional work for the cluster bootstrapping and voting
exclusions is needed.
2) testDynamicUpdateMinimumMasterNodes -> removed, there is nothing
that corresponds to the dynamic change of the minimum master nodes
setting.
3) testCanNotBringClusterDown -> removed, it also plays with changing
minimum master nodes dynamically.
4) testMultipleNodesShutdownNonMasterNodes ->
testThreeNodesNoMasterBlock. Previously this test was checking that
there would be no master block, if min_master_nodes=3 and 4 nodes are
started, then 2 nodes are brought down. Zen2 dynamically accommodates
to the number of nodes in the cluster, so it's possible that there
still will be a master in 2 nodes cluster. For Zen2, we start up 3
nodes. And shut down 2 of them (w/o voting exclusions), which results
in no master block.
5) testCanNotPublishWithoutMinMastNodes ->
testCanNotCommitStateThreeNodes. Test flow is not changed. But
previously there was no check that nodes in the bigger part of
network partition will elect the master, before healing the network
partition. For Zen2 it does not work, because persistent setting
addition is accepted on the old master and if it's elected new master
again, this setting will appear in the cluster state.
Also, I have a feeling that we need to remove this class, but could not
come up with a good name.
Adds the node's current term and the term and version of the the last-accepted
cluster state to the message reported by the `ClusterFormationFailureHelper`,
since these values may be of importance when tracking down a cluster formation
failure.
The test that remote clusters used by ML datafeeds have
a license that allows ML was not accounting for the
possibility that the remote cluster name could be
wildcarded. This change fixes that omission.
Fixes#36228
The test testSendSnapshotSendsOps is currently using a mock instance of
RecoveryTargetHandler which will be hard to modify when we make the
RecoveryTargetHandler non-blocking. This commit prepares for the
incoming changes by replacing the mock instance with a stub.
The Inject Annotation was removed from IndicesClusterStateService as
part of reformatting in e11a32e, but this causes CreationException on
cluster startup.
This commit adds a simple method for executing a runnable against a
shard under a primary permit. Today there is only a single caller for
this method, but this there are two upcoming use-cases for which having
this method will help keep the code simpler.
This commit reformats some classes in the index universe with the
purpose of breaking some long method definitions and invocations into a
line per parameter. This has the advantage that for an upcoming change
to these definitions and invocations, the diff for that change will be a
single line per definition or invocation. That makes these sorts of
changes easier to read.
This commit is a simple introduction of the serialization of retention
leases, which will be needed when they are sent across the wire while
synchronizing retention leases to replicas.
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
Today file-chunks are sent sequentially one by one in peer-recovery. This is a
correct choice since the implementation is straightforward and recovery is
network bound in most of the time. However, if the connection is encrypted, we
might not be able to saturate the network pipe because encrypting/decrypting
are cpu bound rather than network-bound.
With this commit, a source node can send multiple (default to 2) file-chunks
without waiting for the acknowledgments from the target.
Below are the benchmark results for PMC and NYC_taxis.
- PMC (20.2 GB)
| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| -------- | -------- | -------- | -------- |
| Plain | 184s | 137s | 106s | 105s | 106s |
| TLS | 346s | 294s | 176s | 153s | 117s |
| Compress | 1556s | 1407s | 1193s | 1183s | 1211s |
- NYC_Taxis (38.6GB)
| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| ---------| ---------| ---------| -------- |
| Plain | 321s | 249s | 191s | * | * |
| TLS | 618s | 539s | 323s | 290s | 213s |
| Compress | 2622s | 2421s | 2018s | 2029s | n/a |
Relates #33844
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
Without pulling out the supplier function to the enclosing class, Eclipse 4.8
complains with the following error "No enclosing instance of type
CoordinatorTests.Cluster is available due to some intermediate constructor
invocation"
DeprecationLogger has warning de-duplication logic but it is expensive to run as it involves parsing existing warning headers. This PR changes the upstream bulk indexing code to do its own "event thinning" rather than relying on DeprecationLogger's trimming.
Closes#37411
The constructors in PutPipelineRequest and SimulatePipelineRequest that guess
the xContent type from the provided source are deprecated since 6.0 and each
have a counterpart that takes the xContent type as an explicit argument.
Removing these ctors together with the builders and methods in
ClusterAdminClient that don't have the xContent type as argument.
Today the SyncedFlushService flow is written with multiple nested
callbacks which are hard to read. This commit replaces them with
sequential step listeners.
We recently migrated suggestions to `Writeable`. That allows us to also
clean up empty constructors and methods that called them as they are no
longer needed. They are replaced by constructors that accept a
`StreamInput` instance.
Today a peer-recovery may run into a deadlock if the value of
node_concurrent_recoveries is too high. This happens because the
peer-recovery is executed in a blocking fashion. This commit attempts
to make the recovery source partially non-blocking. I will make three
follow-ups to make it fully non-blocking: (1) send translog operations,
(2) primary relocation, (3) send commit files.
Relates #36195
* Fix PrimaryAllocationIT Race Condition
* Forcing a stale primary allocation on a green index was tripping the assertion that was removed
* Added a test that this case still errors out correctly
* Made the ability to wipe stopped datanode's data public on the internal test cluster and used it to ensure correct behaviour on the fixed test
* Previously it simply passed because the test finished before the index went green and would NPE when the index was green at the time of the shard store status request, that would then come up empty
* Closes#37345
This commit introduces StepListener which provides a simple way to write
a flow consisting of multiple asynchronous steps without having nested
callbacks.
Relates #37291
With the `include_type_name` available now for indices.get on 6.x after the
backport, the corresponsing yaml test can include anything from 6.7 on.
Also changing the RestGetIndicesActionTests base test class.
* Due to a race between retrying the snapshot creation and the failed snapshot create trying to delete the snapshot there is no guarantee that the snapshot is eventually created by retries
* Adjusted the assertion accordingly
* Closes#36779
This commit moves DisruptableMockTransport to use a more accurate representation of connection
management, which allows to use the full connection manager and does not require mocking out
any behavior. With this, we can implement restarting nodes in CoordinatorTests.
Updates perform realtime get, perform the requested update and then index the document again
using optimistic concurrency control. This PR changes the logic to use sequence numbers instead
of versioning.
Note that the current versioning logic isn't suffering from the same problem as external OCC
requests because the get and indexing is always done on the same primary.
Relates #36148
Relates #10708
* Add benchmark
* Use java time API instead of exception handling
when several formatters are used, the existing way of parsing those is
to throw an exception catch it, and try the next one. This is is
considerably slower than the approach taken in joda time, so that
indexing is reduced when a date format like `x||y` is used and y is the
date format being used.
This commit now uses the java API to parse the date by appending the
date time formatters to each other and does not rely on exception
handling.
* fix benchmark
* fix tests by changing formatter, also expose printer
* restore optional printing logic to fix tests
* fix tests
* incorporate review comments
This commit makes the use of empty retention lease suppliers to always
be an empty list as opposed to in some cases an empty set. This commit
is solely for consistency reasons, there is no functional change here.
This commit adds some simple validation that the values input to the
retention lease constructor our valid values. We will later rely on
these values being within the validated range.
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
* Add include_type_name to the get field mappings API.
* Make sure the API specification lists include_type_name as a boolean.
* Add include_type_name to the get index templates API.
* Add include_type_name to the put index templates API.
When the deprecation log is written to within scripting support code
like ScriptDocValues, it runs under the reduces privileges of scripts.
Sometimes this can trigger log rolling, which then causes uncaught
security errors, as was handled in #28485. While doing individual
deprecation handling within each deprecation scripting location is
possible, there are a growing number of deprecations in scripts.
This commit wraps the logging call within the deprecation logger use a
doPrivileged block, just was we would within individual logging call
sites for scripting utilities.
* Get indices shard store status before enqueuing the reallocation state update task to prevent
tasks that would fail because a node does not hold a stale copy of the shard on a best effort basis
* Closes#37098
Adds join validation to Zen2, which prevents a node from joining a cluster when the node does not
have the right ES version or does not satisfy any other of the join validation constraints.
This SearchType was deprecated since at least 6.0 and according to the
documentation is only kept around for pre-5.3 requests. Removing and leaving a
comment as placeholder so we don't reuse the byte value associated with it
without further consideration.
* Java Time: Fix timezone parsing
An independent test uncovered an issue when parsing a timezone
containing a colon like `01:00` - some formats did not properly support
this.
This commit adds test for all formats in the dueling tests and fixes a
few issues with existing date formatters.
* fix tests, so they run under java8
* Tests: Add ElasticsearchAssertions.awaitLatch method
Some tests are using assertTrue(latch.await(...)) in their code. This
leads to an assertion error without any error message. This adds a
method which has a nicer error message and can be used in tests.
* fix forbidden apis
* fix spaces
Today we still wrap recovery source readers on merge even if we
keep all documents recovery source. This basically disables bulk
merging for stored fields. This change skips wrapping if all docs
sources are kept anyway.
This change fixes an unreleased bug that trips an assertion because a static instance
shared among threads is modified during the search. This commit copies the static
instance in order to ensure that each thread can modify the value without modifying
the other instances.
Closes#37179Closes#37266
* ingest: compile mustache template only if field includes '{{''
Prior to this change, any field in an ingest node processor that supports
script templates would be compiled as mustache template regardless if they
contain a template or not. Compiling normal text as mustache templates is
harmless. However, each compilation counts against the script compilation
circuit breaker. A large number of processors without any templates or scripts
could un-intuitively trip the too many script compilations circuit breaker.
This change simple checks for '{{' in the text before it attempts to compile.
fixes#37120
Update BucketSortPipelineAggregator to use a List and Collections.sort() for sorting instead of a priority queue. This preserves the order for equal values. Closes#36322.
* TESTS: Real Coordinator in SnapshotServiceTests
* Introduce real coordinator in SnapshotServiceTests to be able to test network disruptions realistically
* Make adjustments to cluster applier service so that we can pass a mocked single threaded executor for tests
This change adds support for the 'include_type_name' parameter for the
indices.get API. This parameter, which defaults to `false` starting in 7.0,
changes the response to not include the indices type names any longer.
If the parameter is set in the request, we additionally emit a deprecation
warning since using the parameter should be only temporarily necessary while
adapting to the new response format and we will remove it with the next major
version.
This change turns an assertion into an IllegalStateException in SearchPhaseController#getTotalHits.
The goal is to help identify the cause of the failures in https://github.com/elastic/elasticsearch/issues/37179
which seems to fail only in CI.
The assertion will be restored when the issue is solved (NORELEASE).
The test intercepts TransportVerifyShardBeforeCloseAction shard
requests, so it needs a minimum of 2 primary shards on 2 different
nodes to correctly intercepts requests.
These tests failed on CI multiple times in the past weeks because they use a
test cluster with a SUITE scope that recreates nodes between tests. With such
a scope, nodes can be recreated in between test executions and can inherit a
node id from a previous test execution, while they are assigned a random data
path. With the successive node recreations it is possible that a newly recreated
node shares the same node id (but different data path) as a non recreated node.
This commit changes the cluster scope of the CorruptedFileIT and FlushIT
tests which often fail.
The failure is reproducable with :
./gradlew :server:integTest -Dtests.seed=EF3A50C225CF377
-Dtests.class=org.elasticsearch.index.store.CorruptedFileIT
-Dtests.security.manager=true -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=America/Rio_Branco
-Dcompiler.java=11 -Druntime.java=8
* [Analysis] Deprecate Standard Html Strip Analyzer
Deprecate only Standard Html Strip Analyzer
If user create index with the analyzer since 7.0, es throws an exception.
If an index was created before 7.0, es issue deprecation log
We will remove it in 8.0
Related #4704
Today we create a global instance of RecoveryResponse then mutate it
when executing each recovery step. This is okay for the current
sequential recovery flow but not suitable for an asynchronous recovery
which we are targeting. With this commit, we return the result of each
step separately, then construct a RecoveryResponse at the end.
Relates #37174
Traditionally remote clusters can be configured dynamically. However,
the compress and ping settings are not currently set to be configured
dynamically. This commit changes that.
This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that persists the retention leases in memory.
This commit converts the epoch time parsing implementation which uses
the java time api to create DateTimeFormatters instead of DateFormatter
implementations. This will allow multi formats for java time to be
implemented in a single DateTimeFormatter in a future change.
Remove several unused helper methods. Most of them are one-liners and should
be easier to be used from the corresponding primitive wrapper classes.
The bytes array conversion methods are unused as well, it should be easy to
re-create them if needed.
This commit fixes an issue with a settings builder method that allows
setting a duration by time unit. In particular, this method can suffer
from a loss of precision. For example, if the input duration is 1500
microseconds then internally we are converting this to "1ms",
demonstrating the loss of precision. Instead, we should internally
convert this to a TimeValue that correctly represents the input
duration, and then convert this to a string using a method that does not
lose the unit. That is what this commit does.
Version needs to be updated after backporting #36997 & #37142
where we added support for providing and serializing localClusterAlias
as well ass absoluteStartMillis.
Relates to #36997 & #37142
Today, a setting can declare that its validity depends on the values of other
related settings. However, the validity of a setting is not always checked
against the correct values of its dependent settings because those settings'
correct values may not be available when the validator runs.
This commit separates the validation of a settings updates into two phases,
with separate methods on the `Setting.Validator` interface. In the first phase
the setting's validity is checked in isolation, and in the second phase it is
checked again against the values of its related settings. Most settings only
use the first phase, and only the few settings with dependencies make use of
the second phase.
The `cluster.unsafe_initial_master_node_count` setting was introduced as a
temporary measure while the design of `cluster.initial_master_nodes` was being
finalised. This commit removes this temporary setting, replacing it with usages
of `cluster.initial_master_nodes` where appropriate.
This commit adds a unique id to cluster blocks, so that they can be uniquely
identified if needed. This is important for the Close Index API where multiple
concurrent closing requests can be executed at the same time. By adding a
UUID to the cluster block, we can generate unique "closing block" that can
later be verified on shards and then checked again from the cluster state
before closing the index. When the verification on shard is done, the closing
block is replaced by the regular INDEX_CLOSED_BLOCK instance.
If something goes wrong, calling the Open Index API will remove the block.
Related to #33888
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.
Shard history retention leases are aimed at preventing shard history
consumers from having to fallback to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication, and local shard
recoveries. A future consumer will be the changes API.
Further, index lifecycle management requires coordinating with some of
these consumers otherwise it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.
Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.
This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
- timestamps
- expiration
- persistence to and recovery from Lucene
- handoff during primary relocation
- sharing retention leases with replicas
- exposing leases in shard-level statistics
- integration with cross-cluster replication
These will occur individually in follow-up commits.
This commit addresses an issue when setting a byte size value setting
using a value that has a fractional component when converted to its
string representation. For example, trying to set a byte size value
setting to a value of 1536 bytes is problematic because internally this
is converted to the string "1.5k". When we go to get this setting, we
try to parse "1.5k" back to a byte size value, which does not support
fractional values. The problem is that internally we are relying on a
method which loses the unit when doing the string conversion. Instead,
we are going to use a method that does not lose the unit and therefore
we can roundtrip from the byte size value to the string and back to the
byte size value.
* Randomly doing non-atomic writes causes rare 0 byte reads from `index-N` files in tests
* Removing this randomness fixes these random failures and is valid because it does not reproduce a real-world failure-mode:
* Cloud-based Blob stores (S3, GCS, and Azure) do not have inconsistent partial reads of a blob, either you read a complete blob or nothing on them
* For file system based blob stores the atomic move we do (to atomically write a file) by setting `java.nio.file.StandardCopyOption#ATOMIC_MOVE` would throw if the file system does not provide for atomic moves
* Closes#37005
I started referring to this parameter name from various places in #37149 so I
think it's a good idea to simplify things by referring to a common constant.
We have recently added support for providing a local cluster alias to a
SearchRequest through a package protected constructor. When executing
cross-cluster search requests with local reduction on each cluster, the
CCS coordinating node will have to provide such cluster alias to each
remote cluster, as well as the absolute start time of the search action
in milliseconds from the time epoch, to be used when evaluating date
math expressions both while executing queries / scripts as well as when
resolving index names.
This commit adds support for providing the start time together with the
cluster alias. It is a final member in the search request, which will
only be set when using cross-cluster search with local reduction (also
known as alternate execution mode). When not provided, the coordinating
node will determine the current time and pass it through (by calling
`System.currentTimeMillis`).
Relates to #32125
This pull request changes the Freeze Index and Close Index actions so
that these actions always requires a Task. The task's id is then propagated
from the Freeze action to the Close action, and then to the Verify shard action.
This way it is possible to track which Freeze task initiates the closing of an index,
and which consecutive verifiy shard are executed for the index closing.
As suggested in #36775, this pull request renames the following methods:
ClusterBlocks.hasGlobalBlock(int)
ClusterBlocks.hasGlobalBlock(RestStatus)
ClusterBlocks.hasGlobalBlock(ClusterBlockLevel)
to something that better reflects the property of the ClusterBlock that is searched for:
ClusterBlocks.hasGlobalBlockWithId(int)
ClusterBlocks.hasGlobalBlockWithStatus(RestStatus)
ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
This commit addresses an issue when setting a time value setting using a
value that has a fractional component when converted to its string
representation. For example, trying to set a time value setting to a
value of 1500ms is problematic because internally this is converted to
the string "1.5s". When we go to get this setting, we try to parse
"1.5s" back to a time value, which does not support fractional
values. The problem is that internally we are relying on a method which
loses the unit when doing the string conversion. Instead, we are going
to use a method that does not lose the unit and therefore we can
roundtrip from the time value to the string and back to the time value.
* The retries on the failing master can lead to concurrently trying to create and delete a snapshot, catch this for now to fix this test
* closes#36779
We don't want two FSDirectories manage pending deletes separately
and optimize file listing. This confuses IndexWriter and causes exceptions
when files are deleted twice but are pending for deletion. This change
move to using a NIOFS subclass that only delegates to MMAP for opening files
all metadata and pending deletes are managed on top.
Closes#37111
Relates to #36668
The query object was incorrectly added to the remote object in the
xcontent. This fix moves the query back into the source, if it was
passed in as part of the RemoteInfo. It also adds a IPv6 test for
reindex from remote such that we can properly validate this.
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected.
Relates #33028
Today we block using the generic thread-pool on the target side
until the source side has fully executed the recovery. We still
block on the source side executing the recovery in a blocking fashion
but there is no reason to block on the target side. This will
release generic threads early if there are many concurrent recoveries
happen.
Relates to #36195
Today it's very difficult to see which indices are frozen or rather
throttled via the commonly used monitoring APIs. This change adds
a cell to the `_cat/indices` API to render if an index is `search.throttled`
Relates to #34352
With #36997 we added support for providing a local cluster alias with a
`SearchRequest`. We intended to make sure that when provided as part of
a search request, the cluster alias would never be used for connection
lookups. Yet due to a bug we would still end up looking up the
connection from the remote ones.
This commit adds a test to make sure that whenever we set the cluster
alias to the `SearchRequest` (which can only be done at transport), such
alias is used as index prefix in the returned hits. No errors are thrown
despite no remote clusters are configured indicating that such alias is
never used for connection look-ups.
Also, we add explicit support for the empty cluster alias when printing
out index names through `RemoteClusterAware#buildRemoteIndexName`.
In fact we don't want to print out `:index` when the cluster alias is
set to empty string, but rather `index`. Yet, the semantic of empty
string is different compared to `null` as it will still disable final
reduction. This will be used in CCS when searching against remote
clusters as well as the local one, the local one will have empty prefix
yet it will need to disable final reduction so that its results will be
properly merged with the ones coming from the remote clusters.
Today when electing a master in Zen2 we use the cluster state version to
determine whether a node has a fresh-enough cluster state to become master.
However the cluster state version is not a reliable measure of freshness in the
Zen1 world; furthermore in 6.x the cluster state version is not persisted. This
means that when upgrading from 6.x via a full cluster restart a cluster state
update may be lost if a stale master wins the initial election.
This change fixes this by using the metadata version as a measure of freshness
when in term 0, since this is persisted in 6.x and does more reliably indicate
the freshness of nodes.
It also makes changes parallel to elastic/elasticsearch-formal-models#40 to
support situations in which nodes accept cluster state versions in term 0: this
does not happen in a pure Zen2 cluster, but can happen in mixed clusters and
during upgrades.
Now that we unwrap mappings in DocumentMapperParser#extractMappings, it is not
necessary for the mapping definition to always be nested under the type. This
leniency around the mapping format was added in 2341825358.
* The static threadpool leaks a lot of memory in these tests because it prevents things like the connect listeners from `org.elasticsearch.transport.TcpTransport#initiateConnection`
to be GCed between tests (since they keep being referenced by the threadpool) which in turn reference channels and their underlying buffers
* I could not find any slowdown in executing these tests from this change, if anything they are slightly faster now on my machine
* Relates #36906 (which may be caused by slowness from leaking memory and also becomes testable in a loop by this change)
Types can be used both in the source and dest section of the body which will
be translated to search and index requests respectively. Adding a deprecation warning
for those cases and removing examples using more than one type in reindex since
support for this is going to be removed.
The `composite` aggregation uses a TreeMap to keep track of the best buckets.
This ensures a log(n) time cost to insert new buckets but also to retrieve buckets
that are already present in the map. In order to speed up the retrieval of buckets
this change replaces the TreeMap with a priority queue and a HashMap. The insertion
cost is still log(n) but the retrieval of buckets through the HashMap is now done in constant
time. This optimization can bring significant improvement since each document needs
to check if its associated buckets are already present in the current best buckets.
With this commit we rename `node.store.allow_mmapfs` to
`node.store.allow_mmap`. Previously this setting has controlled whether
`mmapfs` could be used as a store type. With the introduction of
`hybridfs` which also relies on memory-mapping,
`node.store.allow_mmapfs` also applies to `hybridfs` and thus we rename
it in order to convey that it is actually used to allow memory-mapping
but not a specific store type.
Relates #36668
Relates #37070
The QueryStringQueryBuilder does not currently delegate to the field mapper's prefixQuery
method, so does not use indexed prefixes. This commit corrects this.
It also fixes a bug where a query a* would not match the word a if indexed prefixes were used with
a minchar setting of 2.
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.
Such optimization requires us access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we will introduce multiple reduction steps (one per cluster) each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster
in which case the current code would optimize number of terms causing
a loss of precision.
While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.
This commit removes the single shard optimization, meaning that we will
always have heuristic enabled on how many number of buckets to collect
on the shards, even when searching against a single shard.
This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.
Relates to #32125
With #36221 we introduced shards counting to address a rare failure.
This caused a worse problem in this test when replicas were allocated
and shards failures were randomly returned. The latch has to take into
account additional attempts caused by the shard failures, which means
that in order for run to be called, performPhaseOnShard will be called
(numShards + numFailures) times.
To address this, we need to decide upfront which shard is going to fail,
making sure that at least one shards is successful otherwise the whole
request fails.
Closes#37074
With this commit we introduce a new store type `hybridfs` that is a
hybrid between `mmapfs` and `niofs`. This store type chooses different
strategies to read Lucene files based on the read access pattern (random
or linear) in order to optimize performance.
This store type has been available in earlier versions of Elasticsearch
as `default_fs`. We have chosen a different name now in order to convey
the intent of the store type instead of tying it to the fact whether it
is the default choice.
Relates #36668
SearchAsyncActionTests may fail with RejectedExecutionException as InitialSearchPhase may try to execute a runnable after the test has successfully completed, and the corresponding executor was already shut down. The latch was located in getNextPhase that is almost correct, but does not cover for the last finishAndRunNext round that gets executed after onShardResult is invoked.
This commit moves the latch to count the number of shards, and allowing the test to count down later, after finishAndRunNext has been potentially forked. This way nothing else will be executed once the executor is shut down at the end of the tests.
Closes#36221Closes#33699
* Fixes the issue reproduced in the added tests:
* When having open index requests on a shard that are waiting for a refresh, relocating that shard
becomes blocked until that refresh happens (which could be never as in the test scenario).
With #36997 we added the ability to provide a cluster alias with a
SearchRequest.
The next step is to disable the final reduction whenever a cluster alias
is provided with the SearchRequest. A cluster alias will be provided
when executing a cross-cluster search request with alternate execution
mode, where each cluster does its own reduction locally. In order for
the CCS node to be able to later perform an additional reduction of the
results, we need to make sure that all the needed info stays available.
This means that terms aggregations can be reduced but not pruned, and
pipeline aggs should not be executed. The final reduction will happen
later in the CCS coordinating node.
Relates to #36997 & #32125
With the upcoming cross-cluster search alternate execution mode, the CCS
node will be able to split a CCS request into multiple search requests,
one per remote cluster involved. In order to do that, the CCS node has
to be able to signal to each remote cluster that such sub-requests are
part of a CCS request. Each cluster does not know about the other
clusters involved, and does not know either what alias it is given in
the CCS node, hence the CCS coordinating node needs to be able to provide
the alias as part of the search request so that it is used as index prefix
in the returned search hits.
The cluster alias is a notion that's already supported in the search shards
iterator and search shard target, but it is currently used in CCS as both
index prefix and connection lookup key when fanning out to all the shards.
With CCS alternate execution mode the provided cluster alias needs to be
used only as index prefix, as shards are local to each cluster hence no
cluster alias should be used for connection lookups.
The local cluster alias can be set to the SearchRequest at the transport layer
only, and its constructor/getter methods are package private.
Relates to #32125
Improves on #36449 which did not cover the situation where a node had bumped its term during
the election, and not when receiving the first follower check. This was uncovered while refactoring
NodeJoinTests so that they don't need to access to an internal field of Coordinator anymore (which
can now be made private).
* This speeds up the test from an average 25s down to 7s runtime
* There is no need for artificially slowing down the snapshot to reproduce the issue of an out of sync routing table in practice.
Over hundreds of test runs the test's snapshot shard service still runs in the index not found exception every time reproducing this issue.
* Relates #36294
Keys are compared in BucketSortPipelineAggregation so making key type (ArrayMap) implement Comparable. Maps are compared using the entry set's iterator so ordered maps order is maintain. For each entry first comparing key then value. Assuming all keys are strings. When comparing entries' values if type is not identical and\or type not implementing Comparable, throwing exception. Not implementing equals() and hashCode() functions as parent's ones are sufficient. Tests included.
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
We introduce a typeless API in #35790 where we translate the default
docType "_doc" to the user-defined docType. However, we do not rewrite
the SourceToParse with the resolved docType. This leads to a situation
where we have two translog operations for the same document with
different types:
- prvOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='_doc', seqNo=1,
primaryTerm=1, version=1, autoGeneratedIdTimestamp=1545125562123}]
- newOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='not_doc', seqNo=1,
primaryTerm=1, version=1, autoGeneratedIdTimestamp=-1}]
Closes#36769
In this test, we verify that the LocalCheckpointTracker is initialized
with the operations of the safe commit. And the test fails because
Engine#Index does not implement the equals method (should not
implement as it consists of a mutable ParsedDocument).
Closes#36470
This is a follow-up to some discussions around #36399. Currently we have
relatively confusing compression behavior where compression can be
configured for requests based on transport.compress or a specific
setting for a remote cluster. However, we can only compress responses
based on transport.compress as we do not know where a request is
coming from (currently).
This commit modifies the behavior to NEVER compress responses based on
settings. Instead, a response will only be compressed if the request was
compressed. This commit also updates the documentation to more clearly
described transport level compression.
When the script contexts were created in 6, the use of params.ctx was
deprecated. This commit cleans up that code and ensures that params.ctx
is null in both watcher script contexts.
Relates: #34059
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.
It implements the ccr clean_session action with this client.
* Adds term number and greppable phrase 'coordinator becoming' to Coordinator
mode changes
* Adds term and version to messages from the ClusterApplier about master
changes
* Reduces some LeaderChecker messages to TRACE level
This commit modifies ESSingleNodeTestCase and ESIntegTestCase and
several concrete test classes to use node names when bootstrapping the
cluster.
Today ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING
setting is used to bootstrap clusters in tests. Instead, we want to use
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING and get rid of
the former setting eventually.
There were two main problems when refactoring InternalTestCluster:
1. Nodes are created one-by-one in buildNode method. And node.name
is created in this method as well. It's not suitable for bootstrapping,
because we need to have the names of all master eligible nodes in
advance, before creating the node with bootstrapping configuration set.
We address this issue by separating buildNode into two methods:
getNodeSettings and buildNode. We first iterate over all nodes to
get nodes settings, then change the setting for the bootstrapping node
and then proceed with building the node.
2. If autoManageMinMasterNodes = false, there is no way for the test to
set the list of bootstrapping nodes because node names are not known in
advance. This problem is solved by adding updateNodesSettings method
to NodeConfigurationSource and ESIntegTestCase (which could be
overridden by concrete integration test class). Once we have the list
of settings for all nodes, the integration test class is allowed to
update it. In our case, we update the
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING setting.
... MlDistributedFailureIT.testLoseDedicatedMasterNode.
An intermittent failure has been observed in
`MlDistributedFailureIT. testLoseDedicatedMasterNode`.
The test launches a cluster comprised by a dedicated master node
and a data and ML node. It creates a job and datafeed and starts them.
It then shuts down and restarts the master node. Finally, the test asserts
that the two tasks have been reassigned within 10s.
The intermittent failure is due to the assertions that the tasks have been
reassigned failing. Investigating the failure revealed that the `assertBusy`
that performs that assertion times out. Furthermore, it appears that the
job task is not reassigned because the memory tracking info is stale.
Memory tracking info is refreshed asynchronously when a job is attempted
to be reassigned. Tasks are attempted to be reassigned either due to a relevant
cluster state change or periodically. The periodic interval is controlled by a cluster
setting called `cluster.persistent_tasks.allocation.recheck_interval` and defaults to 30s.
What seems to be happening in this test is that if all cluster state changes after the
master node is restarted come through before the async memory info refresh completes,
then the job might take up to 30s until it is attempted to reassigned. Thus the `assertBusy`
times out.
This commit changes the test to reduce the periodic check that reassigns persistent
tasks to `200ms`. If the above theory is correct, this should eradicate those failures.
Closes#36760
Negative timestamps are currently supported in joda time. These are
dates before epoch. However, it doesn't really make sense to have a
negative timestamp, since this is a modern format. Any dates before
epoch can be represented with normal date formats, like ISO8601.
Additionally, implementing negative epoch timestamp parsing in java time
has an edge case which would more than double the code required. This
commit deprecates use of negative epoch timestamps.
Currently the ByteBufferReference does not duplicate the buffer.
This means that any changes to the buffer's limit or position will
impact the reference. This can lead to unexpected behavior. This commit
uses the ByteBuffer#slice method to ensure that the reference retains
its own ByteBuffer.
This commit partially reverts #36447 by using the ability of Joda time's
DateTimeFormatterBuilder to append multiple parsers instead of using the
MergedDateFormatter. The MergedDateFormatter will be removed in a future
change, as it is not as performant due to creating potentially many
exceptions during heavy date parsing. This change is a stop-gap until
that followup is ready.
closes#36602
The test testClusterInfoServiceInformationClearOnError relies on timing behavior. It sets
InternalClusterInfoService.INTERNAL_CLUSTER_INFO_TIMEOUT_SETTING to 1s and relies on the
fact that the stats request completes within that timeframe (which our ever-so-slow CI seems to
violate at times). Unfortunately the logging has been misimplemented in InternalClusterInfoService,
so the corresponding log messages showing that the requests have timed out are missing for this.
The issue can be locally reproduced by reducing the timeout to something lower.
Closes#36554
If the master sends both its follower checks just before disconnection then
neither will receive a response, meaning it must wait for the checks to time
out and send another in order to detect the disconnection and stand down.
Closes#36788
Single-node discovery is not persisting cluster states, which was caused by a recent 7.0-only
refactoring. This commit ensures that the cluster state is properly persisted when using single-node
discovery and adds a corresponding test.
* Replace 0L with an UNASSIGNED_PRIMARY_TERM constant
0 is an illegal value for a primary term that is often used to indicate
the primary term isn't assigned or is irrelevant. This PR replaces the
usage of 0 with a constant, to improve readability and so it can be
tracked and if needed, replaced.
* feedback
The default index_prefix settings will index prefixes of between 2 and 5 characters in length.
Currently, if a prefix search falls outside of this range at either end we fall back to a standard prefix
expansion, which is still very expensive for single character prefixes. However, we have an option
here to use a wildcard expansion rather than a prefix expansion, so that a query of a* gets remapped
to a? against the _index_prefix field - likely to be a very small set of terms, and certain to be much
smaller than a* against the whole index.
This commit adds this extra level of mapping for any prefix term whose length is one less than
the min_chars parameter of the index_prefixes field.
Today we might corrupt a fully deleted segment which is then pruned
once a snapshot is taken. This causes random test failures in CorruptedFileIT.
This change hardens the selection of files to corrupt and removes some fragile
code preventing fully deleted segments to be taken into account.
Closes#36526
Commit #36786 updated docs and strings to reference transport.port instead of
transport.tcp.port. However, this breaks backwards compatibility tests
as the tests rely on string configurations and transport.port does not
exist prior to 6.6. This commit reverts the places were we reference
transport.tcp.port for tests. This work will need to be reintroduced in
a backwards compatible way.
Lucene 7.6 uses a smaller encoding for LatLonShape. This commit forks the LatLonShape classes to Elasticsearch's local lucene package. These classes will be removed on the release of Lucene 7.6.
This is related to #36652. In 7.0 we plan to deprecate a number of
settings that make reference to the concept of a tcp transport. We
mostly just have a single transport type now (based on tcp). Settings
should only reference tcp if they are referring to socket options. This
commit updates the settings in the docs. And removes string usages of
the old settings. Additionally it adds a missing remote compress setting
to the docs.
TransportWriteAction.WriteReplicaResult is not properly synchronized, which can lead to a data race
between the thread that calls respond and the AsyncAfterWriteAction that calls either onSuccess or
onFailure. This data race results in the response listener not being called, which ultimately results in
a stuck replication task on the replica.
This commit is related to #36127. It adds a CcrRestoreSourceService to
track Engine.IndexCommitRef need for in-process file restores. When a
follower starts restoring a shard through the CcrRepository it opens a
session with the leader through the PutCcrRestoreSessionAction. The
leader responds to the request by telling the follower what files it
needs to fetch for a restore. This is not yet implemented.
Once, the restore is complete, the follower closes the session with the
DeleteCcrRestoreSessionAction action.
This commit removes the originalSettings member from Node. It was only
needed to allows test clusters to recreate the node in certain
situations. Instead, the test cluster now keeps track of these settings.
The joda epoch parsing code currently supports passing epoch time as a
number in scientific notation. However, no systems appear to exist which
output timestamps in scientific notation. In java time, it is
particularly complex to implement scientific notation timestamp parsing
within a DateTimeFormatter. This commit adds a deprecation warning when
the epoch time parsers in joda parse scientific notation, so that it can
be removed when switching to java time.
joda are passed a time in scientific notation.
* [Geo] Expose BKDBackedGeoShapes as new VECTOR strategy
This commit exposes lucene's LatLonShape field as a new
strategy in GeoShapeFieldMapper. To use the new indexing
approach, strategy should be set to "vector" in the
geo_shape field mapper. If the tree parameter is set
the mapper will throw an IAE. Note the following:
When using vector strategy:
* geo_shape query does not support querying by POINT,
MULTIPOINT, or GEOMETRYCOLLECTION.
* LINESTRING and MULTILINESTRING queries do not support
WITHIN relation.
* CONTAINS relation is not supported.
* The tree, precision, tree_levels, distance_error_pct,
and points_only parameters will not throw an exception
but they have no effect and will be marked as
deprecated..
All other features are supported.
* revert change to PercolatorFieldMapper
* fix ExistsQuery for geo_shape vector strategy
* add deprecation logging for tree, precision, tree_levels, distance_error_pct, and points_only
* initial update to geoshape docs, including mapping migration updates
* initial support for GeoCollection queries
* fix docs and javadoc errors
* clean up geocollection queries
* set deprecated mapping tests to NOTCONSOLE
* fix geo-shape mapper asciidoc mapping and test warnings
* add support for point queries using LatLonShapeBoundingBoxQuery
* update GeoShapeQueryBuilderTests to include POINT queries for VECTOR strategy. Other comment cleanups
* add lucene geometry build testing to ShapeBuilder tests
* remove deprecated prefix tree mapping from geo-shape.asciidoc
* refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoShapeFieldMapper
Both classes derive from BaseGeoShapeFieldMapper that provides shared parameters:
coerce, ignoreMalformed, ignore_z_value, orientation.
* update docs to remove vector strategy
* fix GeometryCollectionBuilder#buildLucene to return the object created by the shape builder
* fix LineLength failure in GeoJsonShapeParserTests
* ShapeMapper refactor changes from PR feedback
* fix typo in geo-shape.asciidoc
* ignore circle test in docs
* update indexing-approach ref to geoshape-indexing-approach
* add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTestCase
* fix deprecatedParameters setup
* update indexing approach
* fixing unexpected warnings failures
* move orientation back to field type
* remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldMapper to work with double array as a point
* fix indexing-approach link in circle section of geoshape docs
* add strategy to deprecation warnings check
* fix test failures
* fix typo in QueryStringQueryBuilderTests
* fix total hits to totalHits().value
* fix version number
* add version check to BaseGeoShapeFieldMapper
* fix line length!
* revert version check in BaseGeoShapeFieldMapper
* Fix serialization of mappings of legacy shapes.
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
MapperService#getAllMetaFields returns an array, which is created out of
an `ObjectHashSet`. Such set does not guarantee deterministic hash
ordering. The array returned by its toArray may be sorted differently
at each run. This caused some repeatability issues in our tests (see #29080)
as we pick random fields from the array of possible metadata fields,
but that won't be repeatable if the input array is sorted differently at
every run. Once setting the tests seed, hppc picks that up and the sorting is
deterministic, but failures don't repeat with the seed that gets printed out
originally (as a seed was not originally set).
See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173.
With this commit, we simply create a static sorted array that is used for
`getAllMetaFields`. The change is in production code but really affects
only testing as the only production usage of this method was to iterate
through all values when parsing fields in the high-level REST client code.
Anyways, this seems like a good change as returning an array would imply
that it's deterministically sorted.
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.
This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.
This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existance. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
Relates to #35190
This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.
However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.
This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.
fixes#36219
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
The remote connection info API leads to resolving addresses of seed
nodes when invoked. This is problematic because if a hostname fails to
resolve, we would not display any remote connection info. Yet, a
hostname not resolving can happen across remote clusters, especially in
the modern world of cloud services with dynamically chaning
IPs. Instead, the remote connection info API should be providing the
configured seed nodes. This commit changes the remote connection info to
display the configured seed nodes, avoiding a hostname resolution. Note
that care was taken to preserve backwards compatibility with previous
versions that expect the remote connection info to serialize a transport
address instead of a string representing the hostname.
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.
Relates #36148
Relates #10708
Investigating #36556 was made a little trickier because the feedback from the
failing assertion wasn't very informative, and the messages attached to other
nearby assertions were misleading. This commit improves the feedback from these
assertions and tidies up a few other issues in the test suite.
* If removing half the nodes completely removes a shard from the cluster we can't count it in the assertion
* Also:
* Remove unused logger parameter
* Fix typo in var name
* Closes#35365
This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control)
in the delete and index transport actions and requests. A follow up will come with adding sequence
numbers to the update and get results.
Relates #36148
Relates #10708
This commit updates our transport settings for 7.0. It generally takes a
few approaches. First, for normal transport settings, it usestransport.
instead of transport.tcp. Second, it uses transport.tcp, http.tcp,
or network.tcp for all settings that are proxies for OS level socket
settings. Third, it marks the network.tcp.connect_timeout setting for
removal. Network service level settings are only settings that apply to
both the http and transport modules. There is no connect timeout in
http. Fourth, it moves all the transport settings to a single class
TransportSettings similar to the HttpTransportSettings class.
This commit does not actually remove any settings. It just adds the new
renamed settings and adds todos for settings that will be deprecated.
The existing joda compat methods isEquals isAfter and isBefore all took
in a ZonedDateTime, but since all of the scripting is now using the new
JodaCompatZonedDateTime, these are changed to take that in instead.
The following updates were made:
* Add deprecation warnings to `RestUpdateAction`, plus a test in `RestUpdateActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Add HLRC integration tests for the typed APIs.
* Update documentation (for both the REST API and Java HLRC).
* Fix failing integration tests.
Because of an earlier PR, the REST yml tests were already updated (one version without types, and another legacy version that retains types).
SearchSortValuesTests extends now `AbstractSerializingTestCase` which removes some code duplication and standardizes the way we test `fromXContent`, serialization and equals/hashcode.
Also, we were never creating `SearchSortValues` through their public constructor that accept an array of `DocValueFormat` together with the array of raw sort values. That is covered now, which involved some conversion from `BytesRef` to String in the test.
Also, the previous test was not using doing any equality check against the original and parsed versions in `testFromXContent` due to values being parsed with different types in some cases, which is now covered by converting those values using a new method added to `RandomObjects`. The code was already there as part of `randomStoredFieldValues`, but it is now exposed to be used in other scenarios.
For cross cluster search alternate execution mode (see #32125), we will need to take a search request that spans across multiple clusters (based on index prefixes e.g. cluster1:index, cluster2:index etc.) and split it into multiple search requests to be sent to each cluster. A copy constructor added to `SearchRequest` would make that easy and well maintainable in the future.
Something along the same lines already happens in `BulkByScrollParallelizationHelper`, but the corresponding code went outdated as some new fields were added to `SearchRequest` which were not added to the bulk by scroll code. A copy constructor helps making the task of copying a search request maintainable over time.
* Add IntervalQueryBuilder with support for match and combine intervals
* Add relative intervals
* feedback
* YAML test - broekn
* yaml test; begin to add block source
* Add block; make disjunction its own source
* WIP
* Extract IntervalBuilder and add tests for it
* Fix eq/hashcode in Disjunction
* New yaml test
* checkstyle
* license headers
* test fix
* YAML format
* YAML formatting again
* yaml tests; javadoc
* Add OR test -> requires fix from LUCENE-8586
* Add docs
* Re-do API
* Clint's API
* Delete bash script
* doc fixes
* imports
* docs
* test fix
* feedback
* comma
* docs fixes
* Tidy up doc references to old rule
Today we assert that the fake node ID is greater than the real node's ID. In
fact we want to assert that it's greater than _all_ proper UUIDs. This adds
assertions to that effect.
* Adds deprecation logging to ScriptDocValues#getValues.
First commit addressing issue #22919.
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
* Fixes two build errors in #34279
* Removes unused import in ScriptDocValuesDatesTest
* Removes used of `.values` in example in diversified-sampler-aggregation.asciidoc
* Removes use of .values from painless test.
Part of #34279
* Updates tests to use `doc[foo]` syntax rather than `doc[foo].values`.
* Removes use of `getValues()` and replaces use of `doc[foo].values` with `doc[foo]`.
* Indentation fix.
* Remove unnecessary list construction at previous `getValues()` callsite in ScriptDocValues.GeoPoints.
* Update migration doc and add link to `getValue` in ScriptDocValues javadoc.
* Fix compile
* Fix javadoc issue
* Removes ScriptDocValues#getValues usage from painless whitelist.
Today, ResyncTask.Status is not registered, but appears as a task status
sometimes, leading to `Failed to deserialize response from handler` exceptions:
java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.tasks.Task$Status][resync]
This commit adds the missing registration.
ConcurrentHashMap does not always behave correctly if removing elements and
concurrently checking for its emptyiness. Work around this by protecting all
usages with a mutex (there was only one usage unprotected by the mutex anyway)
and then we don't even need a ConcurrentHashMap at all.
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of the `SearchHits`, specifically:
- lucene `SortField[]` which contains info about the fields that sorting was performed on and their type, which depends on mappings (that the CCS node does not know about)
- collapse field (`String`) that field collapsing was executed on, if requested
- collapse values (`Object[]`) that field collapsing was based on, if requested
This info is needed to be able to reconstruct the `TopFieldDocs` or `CollapseFieldTopDocs` in the CCS coordinating node to feed the `mergeTopDocs` method and reduce multiple search responses received (one per cluster) into one.
This commit adds such information to the `SearchHits` class. It's nullable info that is not serialized through the REST layer. `SearchPhaseController` sets such info at the end of the hits reduction phase.
* Enable parallel restore operations
* Add uuid to restore in progress entries to uniquely identify them
* Adjust restore in progress entries to be a map in cluster state
* Added tests for:
* Parallel restore from two different snapshots
* Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator
* Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers
In this test, we keep track of a list of index commits then verify that
we reload exactly every operation from the safe commit. If a background
merge is triggered, then we might have a new index commit which is not
recorded in the tracking list. This change disables merges in the test.
Closes#36470
A previous fix of a similar problem in #35201 wasn't general enough, we also
need to catch cases where the randomly generated query string starts with some
version of "now" and hits a date field.
Closes#36595
This commit adds deprecation warnings when using format specifiers with
joda data formats that will change with java time. It also adds the "8"
prefix which may be used to force the new java time format parsing.
The getters and setters for useDisMax() have been deprecated since at least 6.0,
also there hasn't been any reference to the query parameter in the
documentation. Removing it from the builder and tests and replacing it with
`tieBreaker(1.0f)` where necessary.
The commit changes how indices are closed in the MetaDataIndexStateService.
It now uses a 3 steps process where writes are blocked on indices to be closed,
then some verifications are done on shards using the TransportVerifyShardBeforeCloseAction
added in #36249, and finally indices states are moved to CLOSE and their routing
tables removed.
The closing process also takes care of using the pre-7.0 way to close indices if the
cluster contains mixed version of nodes and a node does not support the TransportVerifyShardBeforeCloseAction. It also closes unassigned indices.
Related to #33888
This commit turns MultiValuesSourceFieldConfig into a proper
ToXContentObject for easy testing and verification of its
to/from XContent methods.
Closes#36474.
We are attempting to replace the usage of the `MockTcpTransport` with
the `MockNioTransport`. This commit replaces usages of
`MockTcpTransport` in two zen test cases.
When a security manager is present, the JVM will cache positive hostname
lookups indefinitely. This can be problematic, especially in the modern
world with cloud services where DNS addresses can change, or
environments using Docker containers where IP addresses could be
considered ephemeral. This behavior impacts cluster discovery,
cross-cluster replication and cross-cluster search, reindex from remote,
snapshot repositories, webhooks in Watcher, external authentication
mechanisms, and the Elastic Stack Monitoring Service. The experience of
watching a DNS lookup change yet not be reflected within Elasticsearch
is a poor experience for users. The reason the JVM has this is guard
against DNS cache posioning attacks. Yet, there is already a defense in
the modern world against such attacks: TLS. With proper certificate
validation, even if a resolver falls prey to a DNS cache poisoning
attack, using TLS would neuter the attack. Therefore we have a policy
with dubious security value that significantly impacts usability. As
such we make the usability/security tradeoff towards usability, since
the security risks are very low. This commit introduces new system
properties that Elasticsearch observes to override the JVM DNS cache
policy.
Previously persistent task assignment was checked in the
following situations:
- Persistent tasks are changed
- A node joins or leaves the cluster
- The routing table is changed
- Custom metadata in the cluster state is changed
- A new master node is elected
However, there could be situations when a persistent
task that could not be assigned to a node could become
assignable due to some other change, such as memory
usage on the nodes.
This change adds a timed recheck of persistent task
assignment to account for such situations. The timer
is suspended while checks triggered by cluster state
changes are in-flight to avoid adding burden to an
already busy cluster.
Closes#35792
* We should compare the target value with the to be applied value before interpreting the update as a change
* This speeds up the test failing in #36496 considerably by preventing state updates on noop setting updates
This commit add support to engine operations for resolving and verifying the sequence number and
primary term of the last modification to a document before performing an operation. This is
infrastructure to move our (optimistic concurrency control)[http://en.wikipedia.org/wiki/Optimistic_concurrency_control] API to use sequence numbers instead of internal versioning.
Relates #36148
Relates #10708
There are certain BootstrapCheck checks that may need access environment-specific
values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment
via a constructor to bypass the shortcoming in BootstrapContext. This commit
pulls in the node's environment into BootstrapContext.
Another case is found in #36519, where it is useful to check the state of the
data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on
the environment to retrieve references to things like node data paths.
This means that the BootstrapContext will have the same Settings used in the
Environment, which currently differs from the Node's settings.
data files under the cluster name subdirectory has been deprecated and was
meant to be removed in 6.0. This commit removes some leftover referrences to
these paths.
Includes the following:
* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this
has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
* Make sure to test conversion for both typed and typeless HLRC requests.
* Update a few more statements to deprecatedAndMaybeLog.
* Make sure Rest*SearchTemplateActionTests extend RestActionTestCase.
Currently TransportRequestOptions allows specific requests to request
compression. This commit removes this and always compresses based on the
settings. Additionally, it removes TransportResponseOptions as they
are unused.
This closes#36399.
* Fix CorruptFileIT to also take last DV generation into account
We currently only prune old .liv generations. With soft_deletes it's important
to also prune DV generations.
* Fix CorruptionUtils to skip the footer bytes after the checksum is read.
Today we read a broken checksum since we also checksum the 8 footer bytes that include
the checksum algorithm and the footer magic.
Closes#36526
Currently, the `BytesStreamOutput` always zeros out the underlying byte
pages when they are acquired. This should not be necessary as the stream
overwrites the underlying bytes as serialization occurs.
`PageCacheRecycler` is the class that creates and holds pages of arrays
for various uses. `BigArrays` is just one user of these pages. This
commit moves the constants that define the page sizes for the recycler
to be on the recycler class.
This commit moves the node field in the JoinRequest object to be a
private field, adding a dedicated accessor. This is a minor breaking
change in that it is no longer possible for all callers to overwrite
this field, but that is a feature.
This commit moves the MergedDateFormatter to a package private class and
reworks joda DateFormatter instances to use that instead of a single
DateTimeFormatter with multiple parsers. This will allow the java and
joda multi formats to share the same format parsing method in a
followup.
Currently our handshake requests do not include a version. This is
unfortunate as we cannot rely on the stream version since it is not the
sending node's version. Instead it is the minimum compatibility version.
The handshake request is currently empty and we do nothing with it. This
should allow us to add data to the request without breaking backwards
compatibility.
This commit adds the version to the handshake request. Additionally, it
allows "future data" to be added to the request. This allows nodes to craft
a version compatible response. And will properly handle additional data in
future handshake requests. The proper handling of "future data" is useful
as this is the only request where we do not know the other node's version.
Finally, it renames the TcpTransportHandshaker to
TransportHandshaker.
This commit contains a few minor changes to our search code:
- adjust the visibility of a couple of methods in our search code to package private from public or protected.
- make some of the `SearchPhaseController` methods static where possible
- rename one of the `SearchPhaseController#reducedQueryPhase` methods (used only for scroll requests) to `reducedScrollQueryPhase` without the `isScrollRequest` argument which was always set to `true`
- replace leniency in `SearchPhaseController#setShardIndex` with an assert to make sure that we never set the shard index twice
- remove two null checks where the checked field can never be null
- resolve an unchecked warning
- replace `List#toArray` invocation that creates an array providing the true size with array creation of length 0
- correct a couple of typos in comments
The different `DocValueFormat` implementors throw `UnsupportedOperationException` for methods that they don't support. That is perfectly fine, and quite common as not all implementors support all of the possible formats. This makes it hard though to trace back which implementors support which formats as they all implement the same methods.
This commit introduces default methods in the `DocValueFormat` interface so that all methods throw `UnsupportedOperationException` by default. This way implementors can override only the methods that they specifically support.
This commit modifies BigArrays to take a circuit breaker name and
the circuit breaking service. The default instance of BigArrays that
is passed around everywhere always uses the request breaker. At the
network level, we want to be using the inflight request breaker. So this
change will allow that.
Additionally, as this change moves away from a single instance of
BigArrays, the class is modified to not be a Releasable anymore.
Releasing big arrays was always dispatching to the PageCacheRecycler,
so this change makes the PageCacheRecycler the class that needs to be
managed and torn-down.
Finally, this commit closes#31435 be making the serialization of
transport messages use the inflight request breaker. With this change,
we no longer push the global BigArrays instnace to the network level.
Today the Zen2 coordinator only applies a write block when there is no known
elected master, ignoring the `discovery.zen.no_master_block` setting. This
commit resolves this, applying the correct block according to the configuration
instead.
Today we do not enforce soft-deletes when accessing the Lucene changes
snapshot. This might lead to internal errors because we assume
soft-deletes are enabled in that code path.
1. CCR tests work without any changes
2. `testDanglingIndices` require changes the source code (added TODO).
3. `testIndexDeletionWhenNodeRejoins` because it's using just two
nodes, adding the node to exclusions is needed on restart.
4. `testCorruptTranslogTruncationOfReplica` starts dedicated master
one, because otherwise, the cluster does not form, if nodes are stopped
and one node is started back.
5. `testResolvePath` needs TEST cluster, because all nodes are stopped
at the end of the test and it's not possible to perform checks needed
by SUITE cluster.
6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2
retries snapshot creation as soon as network partition heals. This
results into the race between creating snapshot and test cleanup logic
(deleting index). Zen1 on the
other hand, also schedules retry, but it takes some time after network
partition heals, so cleanup logic executes latter and test passes. The
check that snapshot is eventually created is added to
the end of the test.
Today we log a slightly cryptic "cluster bootstrapping is disabled on this
node" message if bootstrapping hasn't been configured. Since there is today
only one way to bootstrap the cluster it seems preferable to spell out exactly
which setting is missing.
* Lower fielddata circuit breaker default limit
Lower fielddata circuit breaker default limit from 60% to 40% as we have
moved to doc_values for most of the cases.
* merge master in
* update tests
* update docs
Deals with a situation where a follower becomes disconnected from the leader, but only for such a
short time where it becomes candidate and puts up a NO_MASTER_BLOCK, but then receives a
follower check from the leader. If the leader does not notice the node disconnecting, it is important
for the node not to be turned back into a follower but try and join the leader again.
We still should prefer the node into a follower on a follower check when this follower check triggers
a term bump as this can help during a leader election to quickly have a leader turn all other nodes
into followers, even before the leader has had the chance to transfer a possibly very large cluster
state.
Closes#36428
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This commit converts the watcher execution context to use the joda
compat java time objects. It also again removes the joda methods from
the painless whitelist.
For each API, the following updates were made:
- Add deprecation warnings to `Rest*Action`, plus tests in `Rest*ActionTests`.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
Allow `auto_date_histogram` as a valid parent agg for derivative,
cumulative sum, moving average, moving function and serial
differencing pipeline aggregations.
Since all these aggs share the same requirement (sequentially
ordered parent aggs), this commit also refactors to share
the same validation code so that any newly added aggs won't
be forgotten.
Closes#35578
This change adds a way to provide the content type of the rest serialization
tests when creating random instances. This is used by SearchHitsTests to ensure
that the internal members of the class are created with the same xContentType
and that equals can be used to compare an instances created from an XContent
view.
Today the `GetDiscoveredNodesAction` waits, possibly indefinitely, to discover
enough nodes to bootstrap the cluster. However it is possible that the cluster
forms before a node has discovered the expected collection of nodes, in which
case the action will wait indefinitely despite the fact that it is no longer
required.
This commit changes the behaviour so that the action fails once a node receives
a cluster state with a nonempty configuration, indicating that the cluster has
been successfully bootstrapped and therefore the `GetDiscoveredNodesAction`
need wait no longer.
Relates #36380 and #36381; reverts 558f4ec278.
This commit creates JodaDateFormatter to replace
FormatDateTimeFormatter. It converts all uses of the old class
to DateFormatter to allow a future change to use JavaDateFormatter
when appropriate.
Moves all remaining (rolling-upgrade and mixed-version) REST tests to use Zen2. To avoid adding
extra configuration, it relies on Zen2 being set as the default discovery type. This required a few
smaller changes in other tests. I've removed AzureMinimumMasterNodesTests which tests Zen1
functionality and dates from a time where host providers were not configurable and each cloud
plugin had its own discovery.type, subclassing the ZenDiscovery class. I've also adapted a few tests
which were unnecessarily adding addTestZenDiscovery = false for the same legacy reasons. Finally,
this also moves the unconfigured-node-name REST test to Zen2, testing the auto-bootstrapping
functionality in development mode when no discovery configuration is provided.
Today we expose a new engine immediately during Lucene rollback. The new
engine is started with a safe commit which might not include all
acknowledged operation. With this change, we won't expose the new engine
until it has recovered from the local translog.
Note that this solution is not complete since it's able to reserve only
acknowledged operations before the global checkpoint. This is because we
replay translog up to the global checkpoint during rollback. A per-doc
Lucene rollback would solve this issue entirely.
Relates #32867
Today, we iterate the bitset of hardLiveDocs to calculate the number of
live docs. This calculation might be expensive if we enable soft-deletes
(by default) for old indices whose soft-deletes was disabled previously
and had hard-deletes.
Once soft-deletes is enabled, we no longer hard-update or hard-delete
documents directly. We have hard-deletes in two scenarios: (1) from old
segments where soft-deletes was disabled, (2) when IndexWriter hits
non-aborted exceptions. These two cases, IW flushes SegmentInfos before
exposing the hard-deletes; thus we can use the hard-delete count of
SegmentInfos.
Creating `CompositeBytesReference` has more overhead than a single
`ByteBufferReference`. Many of our messages will be contained to a
single `ByteBuffer`. This commit avoids creating composite instances
when there is 0 or 1 underlying `ByteBuffers`.
We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:
- Zen1 nodes will never vote for Zen2 nodes
- Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes
- Zen2 nodes that were previously part of a mixed cluster will automatically
(and unsafely) bootstrap themselves when the last Zen1 node leaves.
This commit makes FormatDateTimeFormatter and DateFormatter apis close
to each other, so that the former can be removed in favor of the latter.
This PR does not change the uses of FormatDateTimeFormatter yet, so that
that future change can be purely mechanical.
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.
When the restore shard method is called, an empty shard is initialized.
This commit removes the parseDefaulting method from DateFormatter,
bringing it more inline with the joda equivalent
FormatDateTimeFormatter. This method was only needed for the java
time implementation of DateMathParser. Instead, a DateFormatter now
returns an implementation of DateMathParser for the given format,
allowing the java time implementation to construct the appropriate date
math parser internally.
* Add warning if cluster fails to form fast enough
Today if a leader is not discovered or elected then nodes are essentially
silent at INFO and above, and log copiously at DEBUG and below. A short delay
when electing a leader is not unusual, for instance if other nodes have not yet
started, but a persistent failure to elect a leader is a problem worthy of log
messages in the default configuration.
With this change, while there is no leader each node outputs a WARN-level log
message every 10 seconds (by default) indicating as such, describing the
current discovery state and the current quorum(s).
* Add note about whether the discovered nodes form a quorum or not
* Introduce separate ClusterFormationFailureHelper
... and back out the unnecessary changes elsewhere
* It can be volatile
In #34474, we added a new assertion to ensure that the
LocalCheckpointTracker is always consistent with Lucene index. However,
we reset LocalCheckpoinTracker in testDedupByPrimaryTerm cause this
assertion to be violated.
This commit removes resetCheckpoint from LocalCheckpointTracker and
rewrites testDedupByPrimaryTerm without resetting the local checkpoint.
Relates #34474
Add support for inlined user dictionary in Nori
This change adds a new option called `user_dictionary_rules` to the
Nori a tokenizer`. It can be used to set additional tokenization rules
to the Korean tokenizer directly in the settings (instead of using a file).
Closes#35842
In real deployments it is important that clusters are properly configured to
avoid accidentally forming multiple independent clusters at cluster
bootstrapping time. However we also expect to be able to unpack Elasticsearch
and start up one or more nodes without any up-front configuration, and have
them do their best to find each other and form a cluster after a few seconds.
This change adds a delayed automatic bootstrapping process to nodes that start
up with no relevant settings set to support the desired out-of-the-box
experience without compromising safety in properly-configured deployments.
The conversion of timezones in JodaCompatibleZonedDateTime from joda to
java time requires the use of the DateUtils class to cater for corner
cases.
Closes#36306
This pull request adds the TransportShardCloseAction which is a
transport replication action that acquires all index shard permits for
its execution. This action will be used in the future by the
MetaDataIndexStateService in a new index closing process, where
we need to execute some sanity checks before closing an index.
The action executes the following verifications on the primary and replicas:
* there is no other on going operation active on the shard
* the data node holding the shard knows that the index is blocked for writes
* the shard's max sequence number is equal to the global checkpoint
When the verifications are done and successful, the shard is flushed.
Relates #33888
Includes:
LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates
Relates #36286
The test testLookupSeqNoByIdInLucene fails because it expects if any
change should be visible after a flush. However, that flush might be
ignored if the waitIfOngoing parameter is false (the default value), and
there is an ongoing flush triggered after merge is running.
Closes#35823
This commit moves back to use explicit dependsOn for test tasks on
check. Not all tasks extending RandomizedTestingTask should be run by
check directly.
* Add deprecation warnings to `Rest*TermVectorsAction`, plus tests in `Rest*TermVectorsActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Update documentation (for both the REST API and Java HLRC).
* For each REST yml test, create one version without types, and another legacy version that retains types (called *_with_types.yml).
* Merging the zen2 branch has made it impossible to reproduce the test failure here, likely as a result of the state now being written atomically
* Closes#36189
Some IDEs (specifically Eclipse 4.8.0) needs some explicit casting for type
inference. This was added in a recent change but we should also have comments so
we remember why this needs to be there (at least for now).
Currently, we only log that WriteStateException has occurred, in
GatewayMetaState.
This PR goal is to re-throw WriteStateException to upper layers.
If dirty flag is set to false, we wrap WriteStateException in
UncheckedIOException, we prefer unchecked exception not to add
explicit throws in the multitude of layers.
If dirty flag is set to true - the world is broken. And we need to
halt the JVM. Instead of explicit halting in GatewayMetaState, we
prefer to throw IOError, which will be subsequently handled by
ElasticsearchUncaughtExceptionHandler and JVM will be halted.
This PR also adds tests for WriteStateException.
The Eclipse IDE java compiler seems to need some special hints about what types
some functions used in the tests return. Correcting this for some test that were
newly merged to master.
This change removes the custom serialization of the total hits
and reuses the shard's serialization of Lucene#read/writeTopDocs
in the client code. This also removes the incorrect assertion that
trips randomly in bwc tests.
Closes#36284
Currently we use Math.random() in a few places in the tests which makes these
tests not reproducable with the random seed mechanism that comes with
ESTestCase. The change removes those instances.
This commit hides ClusterStates that have a STATE_NOT_RECOVERED_BLOCK from
ClusterStateAppliers. This is needed, because some appliers, such as IngestService, rely on
the fact, that cluster states with STATE_NOT_RECOVERED_BLOCK won't contain anything useful.
Once the state is recovered it's fully available for the appliers. This commit also switches many of
the remaining tests that require state persistence/recovery from Zen1 to Zen2.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
Closes#19528
* Port a test Colin wrote for the TDigest library to validate TDigests storing over 2*MAXINT values. This appears to have been fixed in version 3.2 of TDigest, which Elasticsearch has been using for some time, so no changes were necessary to resolve this issue.
The shard deletion logic (triggered by IndicesStore), which also leads to index metadata deletion on
non-master-eligible data nodes, currently races against the new cluster state persistence logic
triggered by accepting cluster states. One thread is writing the index metadata while another one is
deleting the index metadata, leading to exceptions and assertions tripping (see below). The solution
proposed by this PR is to move the cluster state persistence of non-master-eligible nodes back to
the cluster applier service, just as it used to be for Zen1. This ensures that the index metadata
deletion logic, which is triggered by the shard deletion logic, runs on the same thread on which we
persist the cluster state.
Closes#35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
This commit replaces usages of Streamable with Writeable for the
BaseTasksResponse / TransportTasksAction classes and subclasses of
these classes.
Note that where possible response fields were made final.
Relates to #34389
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
* Move `createRepository` call out of cluster state tasks
* Now only `RepositoriesService#applyClusterState` manipulates `this.repositories`
* Closes#9488
This commit makes `document`, `update`, `explain`, `termvectors` and `mapping`
typeless APIs work on indices that have a type whose name is not `_doc`.
Unfortunately, this needs to be a bit of a hack since I didn't want calls with
random type names to see documents with the type name that the user had chosen
upon type creation.
The `explain` and `termvectors` do not support being called without a type for
now so the test is just using `_doc` as a type for now, we will need to fix
tests later but this shouldn't require further changes server-side since passing
`_doc` as a type name is what typeless APIs do internally anyway.
Relates #35190
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to
be considered illegal. At the moment, we use the latest Lucene version when
creating a shard, which could cause shards to have different created versions
eg. in case of forced allocation. This commit makes sure to reuse the
appropriate Lucene version in order to avoid such issues.
Closes#33826
Today we configure the soft-deletes field iff soft-deletes enabled.
Although this choice was correct, it prevents an engine with
soft-deletes disabled from opening a Lucene index with soft-deletes.
Moreover, this change should not have any side-effect if a Lucene index
does not have any soft-deletes.
Relates #36141
This commit implements proper metadata recovery for Zen2.
GatewayService is responsible for the recovery. In Zen1 GatewayService
creates an instance of Gateway, that is used to reach out to other cluster
nodes, get their state and calculate the most up-to-date state based on
versions. After that Gateway performs upgrade and archival of
ClusterSettings and closes bad indices. Then recovered state is passed to GatewayService.GatewayRecoveryListener that mixes up current state
and restored state, removes state not recovered block, creates the
routing table and performs re-routing.
In Zen2 we should perform this kind of logic on cluster startup, except
mixing state (because there is nothing to mix) and opening routing table.
This commit refactors out all `ClusterUpdate` functions in a separate class
`ClusterStateUpdaters`, which is used by `Gateway` and `GatewayService`
in case of Zen1, and by `GatewayMetaState` and `GatewayService` in case of
Zen2.
This commit also switches all integration tests that are already using Zen2 from
InMemoryPersistedState to GatewayMetaState.
This commit changes how an operation which requires all index shard
operations permits is executed when a primary term update is required:
the operation and the update are combined so that the operation is
executed after the primary term update under the same blocking
operation.
Closes#35850
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
This test suite can stop all the shared master-eligible nodes, which breaks the
cluster since any non-shared master-eligible nodes are stopped first in the
reset process between tests.
Since this test suite can leave the cluster in this somewhat broken state, it
seems best that it uses a new cluster for each test.
This change adds a soft limit to open scroll contexts that can be controlled with the dynamic cluster setting `search.max_open_scroll_context` (defaults to 500).
Today if a node `A` sends a peers request to another node `B` then `B` will
react by sending a peers request back to `A`. However if `A` is not
master-eligible then this reaction is pointless and fails with an exception
saying `non-master-eligible node found`, adding noise to the logs. This change
suppresses this response to non-master-eligible nodes.
Given that we check the max buckets limit on each shard when collecting the buckets, and that non final reduction cannot add buckets (see #35921), there is no point in counting and checking the number of buckets as part of non final reduction phases.
Such check is still needed though in the final reduction phases to make sure that the number of returned buckets is not above the allowed threshold.
Relates somehow to #32125 as we will make use of non final reduction phases in CCS alternate execution mode and that increases the chance that this check trips for nothing when reducing aggs in each remote cluster.
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
* Moved method `canOpenIndex` is only used in tests -> moved to test CP
* Simplify `org.elasticsearch.index.store.Store#renameTempFilesSafe`
* Delete some dead methods
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
In #36033 we removed a catch block because we thought we were preventing
exceptions by avoiding concurrent elections, missing the obvious fact that some
joins are supposed to be failing.
As a quick fix the catch was reinstated in 3a5dab6d8e
but this change adds finesse by only catching exceptions from the joins that we
expect to fail. It also inlines an always-false parameter to `initialState()`.
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail when still log
the task result, but that is far from ideal.
Closes#33764
Empty buckets don't need to be added when performing an incremental reduction step, they can be added later in the final reduction step. This will allow us to later remove the max buckets limit when performing non final reduction.
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
Prior to #35441 `ConnectionManager` had a `Lifecycle` object to support
the ping runnable. After that commit, the connection amanger only needs
the existing `AtomicBoolean` to indicate if it is running.
New that we test with min_doc_count set to 0 as well, we may end up generating a lot more buckets. This commit adjusts the min bound and max bound, as well as the offset for each randomly generated agg instance so that we don't end up hitting the 10.000 max buckets limit.
Relates to #36064
In this test we were randomizing different values but minDocCount was hardcoded to 1. It's important to test other values, especially `0` as it's the default. The test needed some adapting in the way buckets are randomly generated: all aggs need to share the same interval, minDocCount and emptyBucketInfo. Also assertions need to take into account that more (or less) buckets are expected depending on minDocCount.
CompositeBytesReference#slice has two bugs:
- One that makes it fail if the reference is empty and an empty slice is
created, this is #35950 and is fixed by special-casing empty-slices.
- One performance bug that makes it always create a composite slice when
creating a slice that ends on a boundary, this is fixed by computing `limit`
as the index of the sub reference that holds the last element rather than
the next element after the slice.
Closes#35950
org.elasticsearch.rest.RestController#hasContentType checks to see if the
RestHandler supports the `application/x-ndjson` Content-Type. DeprecationRestHandler
is a wrapper around the real RestHandler, and prior to this change
would always return `false` due to the interface's default supportsContentStream().
This prevents API's that use multi-line JSON from properly being deprecated
resulting in an HTTP 406 error.
This change ensures that the DeprecationRestHandler honors the
supportsContentStream() of the wrapped RestHandler.
Relates to #35958
The support for rest_total_hits_as_int has already been merged to 6x
in #35848 so this change adds this new option to master. The plan was
to add this new option as part of #35848 but we've decided to wait a few
days before merging this breaking change so this commit just handles
the new option as a noop exactly like 6x for now. This will allow
users to migrate to this parameter before #35848 is merged.
Relates #33028
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.
The ping interval configuration is moved to the ConnectionProfile.
The server channel now responds to pings. This makes the keepalive
pings bidirectional.
On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
The nested agg can defer the collection of children if it is nested
under another aggregation. In such case accessing the score in the children
aggregation throws an error because the scorer has already advanced to the next
parent. This change fixes this error by caching the score of the parent in the
nested aggregation. Children aggregations that work on nested documents will be
able to access the _score. Also note that the _score in this case is always the
parent's score, there is no way to retrieve the score of a nested docs in aggregations.
Closes#35985Closes#34555
* [Zen2] Implement Tombstone REST APIs
* Adds REST API for withdrawing votes and clearing vote withdrawls
* Tests added to Netty4 module since we need a real Network impl. for Http endpoints
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
Today we sometimes create a setup in which the node is a quorum on its own,
which allows it to win a pre-voting round and schedule an election essentially
at will, causing it to discard all the joins it just received and fail the
test. This change excludes this case, preventing stray elections from ruining
things.
Today any node can win an election. However, the whole point of
master-eligibility is that master-ineligible nodes should not be elected as the
leader; furthermore master-ineligible nodes do not have any outgoing STATE
channels so cannot publish cluster states, so their leadership is ineffective
and disruptive.
This change ensures that the elected leader is master-eligible by preventing
master-ineligible nodes from scheduling an election.
A number of tokenfilters can produce multiple tokens at the same position. This
is a problem when using token chains to parse synonym files, as the SynonymMap
requires that there are no stacked tokens in its input.
This commit ensures that when used to parse synonyms, these tokenfilters either produce
a single version of their input token, or that they throw an error when mappings are
generated. In indexes created in elasticsearch 6.x deprecation warnings are emitted in place
of the error.
* asciifolding and cjk_bigram produce only the folded or bigrammed token
* decompounders, synonyms and keyword_repeat are skipped
* n-grams, word-delimiter-filter, multiplexer, fingerprint and phonetic throw errors
Fixes#34298
The ActiveShardCount is used by cluster state observers to wait for a
given number of shards to be active before returning to the caller. The
current implementation does not work when an index is closed while an
observer is waiting on shards to be active. In this case, a NPE is thrown
and the observer is never notified that the shards won't become active.
This commit fixes the ActiveShardCount.enoughShardsActive() so that it
does not fail when an index is closed, similarly to what is done when an
index is deleted.
In `InternalHistogramTests` we were randomizing different values but `minDocCount` was hardcoded to `1`. It's important to test other values, especially `0` as it's the default. To make this possible, the test needed some adapting in the way buckets are randomly generated: all aggs need to share the same `interval`, `minDocCount` and `emptyBucketInfo`. Also assertions need to take into account that more (or less) buckets are expected depending on `minDocCount`.
This was originated by #35921 and its need to test adding empty buckets as part of the reduce phase.
Also relates to #26856 as one more key comparison needed to use `Double.compare` to properly handle `NaN` values, which was triggered by the increased test coverage.
Right now using the `GET /_tasks/<taskid>` API and causing a task to opt
in to saving its result after being completed requires permissions on
the `.tasks` index. When we built this we thought that that was fine,
but we've since moved towards not leaking details like "persisting task
results after the task is completed is done by saving them into an index
named `.tasks`." A more modern way of doing this would be to save the
tasks into the index "under the hood" and to have APIs to manage the
saved tasks. This is the first step down that road: it drops the
requirement to have permissions to interact with the `.tasks` index when
fetching task statuses and when persisting statuses beyond the lifetime
of the task.
In particular, this moves the concept of the "origin" of an action into
a more prominent place in the Elasticsearch server. The origin of an
action is ignored by the server, but the security plugin uses the origin
to make requests on behalf of a user in such a way that the user need
not have permissions to perform these actions. It *can* be made to be
fairly precise. More specifically, we can create an internal user just
for the tasks API that just has permission to interact with the `.tasks`
index. This change doesn't do that, instead, it uses the ubiquitus
"xpack" user which has most permissions because it is simpler. Adding
the tasks user is something I'd like to get to in a follow up change.
Instead, the majority of this change is about moving the "origin"
concept from the security portion of x-pack into the server. This should
allow any code to use the origin. To keep the change managable I've also
opted to deprecate rather than remove the "origin" helpers in the
security code. Removing them is almost entirely mechanical and I'd like
to that in a follow up as well.
Relates to #35573
Currently when a Fuzziness instance with custom AUTO distance values gets
written to XContent, the customized lower and upper distance values are ommited
and can consequently not be parsed back. This changes this to write the String
including the optional custom values when writing to XContent and fixes the
tests that should have caught this in the first place, e.g. by adding the custom
low and high distance values to the equality check.
This change removes the deprecated useDisMax() and useAllFields() methods from
the QueryStringQueryBuilder and related tests. The disMax parameter has already
been a no-op since 6.0 and also the useAllFields has been deprecated since 6.0
and there is a direct replacement via defaultField.
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
Closes#22919
This PR adds deprecation warnings to the relevant `Rest*Action` classes, plus tests in `Rest*ActionTests`. No updates to REST tests, the Java HLRC, or documentation were necessary, since they didn't make use of types.
MultiSearchRequests issues through `_msearch` now validate all keys
in the metadata section. Previously unknown keys were ignored
while now an exception is thrown.
Closes#35869
This commit removes the dedicated `setSoLinger` method. This simplifies
the `TcpChannel` interface. This method has very little effect as the
SO_LINGER is not set prior to the channels being closed in the abstract
transport test case. We still will set SO_LINGER on the
`MockNioTransport`. However we can do this manually.
This pull request makes the `RestGetSourceAction` return a `ResourceNotFoundException` with a proper JSON response when source or document itself is missing (see issue #33384).
Here is below a sample JSON output:
```
{
"error": {
"root_cause": [
{
"type": "resource_not_found_exception",
"reason": "Source not found [index1]/[_doc]/[1]"
}
],
"type": "resource_not_found_exception",
"reason": "Source not found [index1]/[_doc]/[1]"
},
"status": 404
}
```
Today GatewayMetaState is capable of atomically storing MetaData to
disk. We've also moved fields that are needed to be persisted in Zen2
from ClusterState to ClusterState.MetaData.CoordinationMetaData.
This commit implements PersistedState interface.
version and currentTerm are persisted as a part of Manifest.
GatewayMetaState now implements both ClusterStateApplier and
PersistedState interfaces. We started with two descendants
Zen1GatewayMetaState and Zen2GatewayMetaState, but it turned
out to be not easy to glue it.
GatewayMetaState now constructs previousClusterState (including
MetaData) and previousManifest inside the constructor so that all
PersistedState methods are usable as soon as GatewayMetaState
instance is constructed. Also, loadMetaData is renamed to
getMetaData, because it just returns
previousClusterState.metaData().
Sadly, we don't have access to localNode (obtained from
TransportService in the constructor, so getLastAcceptedState
should be called, after setLocalNode method is invoked.
Currently, when deciding whether to write IndexMetaData to disk,
we're comparing current IndexMetaData version and received
IndexMetaData version. This is not safe in Zen2 if the term has changed.
So updateClusterState now accepts incremental write
method parameter. When it's set to false, we always write
IndexMetaData to disk.
Things that are not covered by GatewayMetaStateTests are covered
by GatewayMetaStatePersistedStateTests.
This commit also adds an option to use GatewayMetaState instead of
InMemoryPersistedState in TestZenDiscovery. However, by default
InMemoryPersistedState is used and only one test in PersistedStateIT
used GatewayMetaState. In order to use it for other tests, proper
state recovery should be implemented.
This removes the option to run a cluster without enforcing the
cluster-wide shard limit, making strict enforcement the default and only
behavior. The limit can still be adjusted as desired using the cluster
settings API.
This commit adds the support for exists query in the sorted execution mode
of the `composite` aggregation. We'll execute the aggregation from the sorted
points and use early termination if the main query is an `exists` query over the
first source of the `composite` aggregation.
Today we don't respect the indices options when they are passed
as request parameters to the `_msearch` endpoint. This is unintuitive
and doesn't cause any errors. This changes uses the top-level indices
options as the defaults for each sub search-request.
Closes#35851
More like this query allows to provide identifiers of documents to be retrieved as like/unlike items.
It can happen that at retrieval time an error is thrown, for instance caused by missing routing value when `_routing` is set required in the mapping.
Instead of ignoring such error and returning no documents for the query, the error should be re-thrown and returned to users. As part of this
change also mget and mtermvectors are unified in the way they throw such exception like it happens in other places, so that a `RoutingMissingException` is raised.
Closes#29678
This commit simplifies the throttling logic in InitialSearchPhase and removes some asserts from it. Also, a few formatting changes are applied to its code and surrounding classes.
A publication can succeed and complete before all nodes have applied the
published state and acknowledged it, thanks to the publication timeout; however
we need every node eventually either to apply the published state (or a later
state) or be removed from the cluster. This change introduces the LagDetector
which achieves this liveness property by removing any lagging nodes from the
cluster.
The `wait_for_metadata_version` parameter will instruct the cluster state
api to only return a cluster state until the metadata's version is equal or
greater than the version specified in `wait_for_metadata_version`. If
the specified `wait_for_timeout` has expired then a timed out response
is returned. (a response with no cluster state and wait for timed out flag set to true)
In the case metadata's version is equal or higher than `wait_for_metadata_version`
then the api will immediately return.
This feature is useful to avoid external components from constantly
polling the cluster state to whether somethings have changed in the
cluster state's metadata.
Code that operates on-top of the engine requires all readers returned to be
unwrapped into ElasticsearchDirectoryReader. The special reader
the FrozenEngine uses wasn't wrapped.
Today voting tombstones are stored in CoordinationMetaData as
Set<DiscoveryNode>.
DiscoveryNode is not a lightweight object and have a lot of fields.
It also has toXContent method, but no fromXContent method and the
output of toXContent is not enough to re-create DiscoveryNode
object.
And votingTombstone set should be persisted as a part of MetaData.
On the other hand, the only thing required from the tombstone is the
nodeId.
This PR adds VotingTombstone class for voting tombstones, which
consists of two fields for now - nodeId and nodeName. It could be
extended/shrank in the future if needed.
This PR also resolves TODO's related to the voting tombstones xcontent
story.
Example of CoordinationMetaData.toXContent with voting tombstones:
{
"term": 1,
"last_committed_config": [
"fkwLdOBvXSlgRTBfgNAL",
"tmQiPGHvUxXzPkkCDSJo",
"HhOmtQBZAThpHIGWhxpz",
"qZHWGpoDNPYRNIiqKsDl"
],
"last_accepted_config": [
"lhqacKmriwhHGFZcvqbx",
"MYysmBuROkvJRlDcusyd"
],
"voting_tombstones": [
{
"node_id": "McjbZbRkEz",
"node_name": "pdKIWeNJUO"
},
{
"node_id": "cpXkVibGwo",
"node_name": "UnCvFgdVsc"
},
{
"node_id": "EylRNOztbc",
"node_name": "ohOhkbMWZX"
}
]
}
Today when rolling a transog generation we copy the checkpoint from
`translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by
appropriate `fsync()` calls. The copy operation is not atomic, so if we crash
at the wrong moment we can leave an incomplete checkpoint file on disk. In
practice the checkpoint is so small that it's either empty or fully written.
However, we do not correctly handle the case where it's empty when the node
restarts.
In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically.
This commit extracts the atomic copy operation from `recoverFromFiles()` and
re-uses it in `rollGeneration()`.
This pull request exposes two new methods in the IndexShard and
TransportReplicationAction classes in order to allow transport replication
actions to acquire all index shard operation permits for their execution.
It first adds the acquireAllPrimaryOperationPermits() and the
acquireAllReplicaOperationsPermits() methods to the IndexShard class
which allow to acquire all operations permits on a shard while exposing
a Releasable. It also refactors the TransportReplicationAction class to
expose two protected methods (acquirePrimaryOperationPermit() and
acquireReplicaOperationPermit()) that can be overridden when a transport
replication action requires the acquisition of all permits on primary and/or
replica shard during execution.
Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which
illustrates how a transport replication action can grab all permits before
adding a cluster block in the cluster state, making subsequent operations
that requires a single permit to fail).
Related to elastic #33888
* Forbid negative scores in functon_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
After #35332 has been merged, we noticed some test failures like #35597
in which one or more replica shards failed to be promoted as primaries
because the primary replica re-synchronization never succeed.
After some digging it appeared that the execution of the resync action was
blocked because of the presence of a global cluster block in the cluster state
(in this case, the "no master" block), making the resync action to fail when
executed on the primary.
Until #35332 such failures never happened because the
TransportResyncReplicationAction is skipping the reroute phase, the only
place where blocks were checked. Now with #35332 blocks are checked
during reroute and also during the execution of the transport replication
action on the primary. After some internal discussion, we decided that the TransportResyncReplicationAction should never be blocked. This action is
part of the replica to primary promotion and makes sure that replicas are in
sync and should not be blocked when the cluster state has no master or
when the index is read only.
This commit changes the TransportResyncReplicationAction to make obvious
that it does not honor blocks. It also adds a simple test that fails if the resync
action is blocked during the primary action execution.
Closes#35597
`testIncompatibleDiffResendsFullState` sometimes makes a 2-node cluster and
then partitions one of the nodes from the leader, which makes the leader stand
down. Then when the partition is removed the cluster re-forms but does so by
sending full cluster states, not diffs, causing the test to fail.
Additionally `testDiffBasedPublishing` sometimes fails if a publication is
delivered out-of-order, wiping out a fresher last-received cluster state with a
less-fresh one. This is fixed here by passing the received cluster state to the
coordinator before recording it as the last-received one, relying on the
coordinator's freshness checks.
Adds an XContent sub parser class that can to wrap another
XContent parser at the beginning of an object and allow skiping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the
parsing fails and we need to ignore the geoshape due to the
ignore_malformed flag.
Supersedes #34498Closes#34047
* [GEO] Add support to ShapeBuilders for building Lucene geometry
This commit adds support for building lucene geometry from the ShapeBuilders.
This is needed for integrating LatLonShape as the primary indexing approach
for geo_shape field types. All unit and integration tests are updated to
add randomization for testing both jts/s4j shapes and lucene shapes.
This test is failing sometimes with Zen2 due to the lack of lag detection.
Zen1 does not have this problem as it only considers a join as valid if the
corresponding cluster state update is successfully published and committed
on the joining node.
Today we have a way to atomically persist global MetaData and
IndexMetaData to disk when new ClusterState is received. All other
ClusterState fields are not persisted.
However, there are other parts of ClusterState that should be
persisted, namely:
version
term
lastCommittedConfiguration
lastAcceptedConfiguration
votingTombstones
version is changed frequently, other fields are not. We decided
to group term, lastCommittedConfiguration,
lastAcceptedConfiguration and votingTombstones into
CoordinationMetaData class and make CoordinationMetaData a field
inside MetaData.
MetaData.toXContent and MetaData.fromXContent should take care of
CoordinationMetaData.
version stays as a top level field in ClusterState and will be
persisted as part of Manifest in a follow-up commit.
Also MetaData.isGlobalStateEquals should be extended to include
coordinationMetaData in comparison.
This commit favors exposing getters, such as getTerm directly in
ClusterState to avoid massive code changes.
An example of CoordinationMetaState.toXContent:
{
"term": 1,
"last_committed_config": [
"TiIuBcbBtpuXyDDVHXeD",
"ZIAoVbkjjLPLUuYLaTkw"
],
"last_accepted_config": [
"OwkXbXZNOZPJqccdFHdz",
"LouzsGYwmQzpeQMrboZe",
"fCKGRZdjLTqzXAqPUtGL",
"pLoxshjpJXwDhbgjfYJy",
"SjINLwFIlIEFZCbjrSFo",
"MDkVncJEVyZLJktopWje"
]
}
- Moves disruption tests to Zen2
- Registers a few missing settings
- Removes .put(TestZenDiscovery.USE_ZEN2.getKey(), true) from tests where Zen2 is now enabled
by default through the parent test class
- Moves QuorumGatewayIT back to Zen1, as it is not stable with Zen2 as it currently relies on
dangling indices due to the lack of proper CS persistence, which triggers secondary failures
Queries across multiple fields generate MatchNoDocsQuerys for fields that are
unmapped. In certain situation this can lead to erroneous behaviour,
for example when an umapped field is used in a query_string query across
several fields. If some of the tokens in the query string get eliminated by an
analyzer on the mapped fields, the same token will currently generate
MatchNoDocsQuerys combined into a disjunction, which in turn
leads to no matches in the overall query. Instead we should simply not add
MatchNoDocsQuerys to those disjunctions.
Closes#34708
This parameter in the `query_string` query was deprecated in 6.0 and ignored
since then. Its API methods and remaining uses can be removed in the upcoming
major version.
Relates to #35734
This commit adds a rest endpoint for freezing and unfreezing an index.
Among other cleanups mainly fixing an issue accessing package private APIs
from a plugin that got caught by integration tests this change also adds
documentation for frozen indices.
Note: frozen indices are marked as `beta` and available as a basic feature.
Relates to #34352
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
Elasticsearch node is responsible for storing cluster metadata.
There are 2 types of metadata: global metadata and index metadata.
`GatewayMetaState` implements `ClusterStateApplier` and receives all
`ClusterStateChanged` events and is responsible for storing modified
metadata to disk.
When new `ClusterStateChanged` event is received, `GatewayMetaState`
checks if global metadata has changed and if it's the case writes new
global metadata to disk. After that `GatewayMetaState` checks if index
metadata has changed or there are new indices assigned to this node and
if it's the case writes new index metadata to disk. Atomicity of global
metadata and index metadata writes is ensured by `MetaDataStateFormat`
class.
Unfortunately, there is no atomicity when more than one metadata changes
(global and index, or metadata for two indices). And atomicity is
important for Zen2 correctness.
This commit adds atomicity by adding a notion of manifest file,
represented by `MetaState` class. `MetaState` contains pointers to
current metadata.
More precisely, it stores global state generation as long and map from
`Index` to index metadata generation as long. Atomicity of writes for
manifest file is ensured by `MetaStateFormat` class.
The algorithm of writing changes to the disk would be the following:
1. Write global metadata state file to disk and remember
it's generation.
2. For each new/changed index write state file to disk and remember
it's generation. For each not-changed index use generation from
previous manifest file. If index is removed or this node is no longer
responsible for this index - forget about the index.
3. Create `MetaState` object using previously remembered generations and
write it to disk.
4. Remove old state files for global metadata, indices metadata and
manifest.
Additonally new implementation relies on enhanced `MetaDataStateFormat`
failure semantics, `applyClusterState` throws IOException, whose
descendant `WriteStateException` could be (and should be in Zen2)
explicitly handled.
The list of official plugins accidentally included `qa` projects like,
well, `qa` and `amazon-ec2`. This changes the mechanism that we use to
build the list and adds a test to catch this.
Closes#35623