Today query parsers throw TooManyClauses exception when a query creates
too many clauses. However graph phrase queries do not respect this limit.
This change adds a protection against crazy expansions that can happen when
building a graph phrase query. This is a temporary copy of the fix available
in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet.
This logic will be removed when we integrate the Lucene patch in a future
release.
* INGEST: Tests for Drop Processor
* UT for behavior of dropped callback
and drop processor
* Moved drop processor to `server`
project to enable this test
* Simple IT
* Relates #32278
Today the `CoordinatorTests` are not completely reliable. These changes make
them more so, by removing a couple of assertions that we do not expect to pass
(yet).
Today `SearchAsyncActionTests#testFanOutAndCollect` uses a simple `HashMap` for
the `nodeToContextMap` variable, which is then accessed from multiple threads
without, apparently, explicit synchronisation. This provides an explanation for
the test failure identified in #29242 in which `.toString()` returns `"[]"`
just before `.isEmpty` returns `false`, without any concurrent modifications.
This change converts `nodeToContextMap` to a `newConcurrentMap()` so that this
cannot occur. It also fixes a race condition in the detection of double-calling
the subsequent search phase.
Closes#29242.
We start tracking max seq_no_of_updates on the primary in #33842. This
commit replicates that value from a primary to its replicas in replication
requests or the translog phase of peer-recovery.
With this change, we guarantee that the value of max seq_no_of_updates
on a replica when any index/delete operation is performed at least the
max_seq_no_of_updates on the primary when that operation was executed.
Relates #33656
We currently fallback to local indices whenever a remote cluster is not found, as there may still be indices / aliases with the same name. Such behaviour is lenient but needs to be kept for backwards compatibility. Clarified that in the code so we don't forget.
Relates to #26247
It mistakenly uses the Elasticsearch major version instead of the Lucene major
version. I noticed it when backporting, it is not noticeable on master because
the only two Lucene versions that are supported, 7 and 8, encode norms the same
way, unlike Lucene 6.
Triggers a join when an active master is detected. In order to avoid spamming joins, deduplicates join request based on <target, join> pair. This ensures that a new join is sent whenever the term is incremented or when a new master is found.
Also changes the logging of join failures from DEBUG to INFO. These join failures should be happening rarely, and can either indicate a failed election (which should be rare) or a configuration issue.
Today the CoordinatorTests are not very reliable if two elections are scheduled
concurrently. Although we expect occasional failures due to this, in fact the
failures are much more common than expected due to a handful of issues. This PR
fixes these issues.
Today, TransportService uses System.currentTimeMillis() to get the current time
to report on things like timeouts, and enqueues lambdas for future execution.
However, in tests it is useful to be able to fake out the current time and to
see what all these enqueued lambdas are really for. This change alters the
situation so that we can obtain the time from the more easily-faked
ThreadPool#relativeTimeInMillis(), and implements some friendlier toString()
methods on the various Runnables so we can see what they are later.
Today, we know that CoordinatorTests sometimes fail to stabilise due to an
election collision. This change improves the logging that occurs when an
election collision occurs so it will be easier to see if this is happening when
analysing a test failure. We also wrap the call to
masterService.submitStateUpdateTask() in a context that logs the node on which
it runs.
We also introduce the InitialJoinAccumulator instead of using a placeholder
CandidateJoinAccumulator at startup, which reduces the cases to consider in
CandidateJoinAccumulator.close() and tightens up the assertions we can make
here.
As far as I can tell this guard against fragile analyzers is no longer relevant, since
we stopped setting special analyzers on numeric fields (3bf6f4). Instead of removing
the guard completely, I opted to keep a check for untokenized + unnormalized fields
to avoid going through the analysis process unnecessarily.
My motivation for simplifying this check is that I'd like to add support for
`split_queries_on_whitespace` to the new 'queryable object' fields. As it stands, I would
have to add a dedicated instanceof check for the new mapper, which is not optimal.
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
* TESTS: Make score Float#NaN when there is no max score
Fixes test failure due to maxScore set to Float#MinValue instead
on Float#NaN. In addition the initial value for maxScore is set to
Float#NEGATIVE_INFINITY so it is an illegal value.
Closes#33993
When executing a cross-cluster search, we need to search against all local indices (and no remote indices) in case no indices are specified. Also, if only remote indices are specified, no local indices will be queried. We previously added empty local indices whenever they were not present in the map of the grouped indices, then we would act differently later based on the extracted remote indices. Instead, we now add the empty array for local indices only in case we need to search all local indices; the entry for local indices is not added when local indices should not be searched. This way the grouped indices reflect reality and provide a better indication of what indices will be searched.
Settings validation in AutoQueueAdjustingExecutorBuilder always checked against
a default value which means that we never can change a max queue size that is lower
than the default. This change adds tests and fixes this validation.
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).
The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.
Relates #33656
It is important that the leader periodically checks that its followers are
still healthy and can remain part of its cluster. If these checks fail
repeatedly then the leader should remove the faulty node from the cluster. The
FollowerChecker, introduced in this commit, performs these periodic checks and
deals with retries.
With recent changes to the logging framework, the node name can no longer be injected into the logging output using the node.name setting, which means that for the CoordinatorTests (which are simulating a cluster in a fully deterministic fashion using a single thread), as all the different nodes are running under the same test thread, we are not able to distinguish which log lines are coming from which node. This commit readds logging for node ids in the CoordinatorTests, making two very small changes to DeterministicTaskQueue and TestThreadInfoPatternConverter.
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
Today we don't store the auto-generated timestamp of append-only
operations in Lucene; and assign -1 to every index operations
constructed from LuceneChangesSnapshot. This looks innocent but it
generates duplicate documents on a replica if a retry append-only
arrives first via peer-recovery; then an original append-only arrives
via replication. Since the retry append-only (delivered via recovery)
does not have timestamp, the replica will happily optimizes the original
request while it should not.
This change transmits the max auto-generated timestamp from the primary
to replicas before translog phase in peer recovery. This timestamp will
prevent replicas from optimizing append-only requests if retry
counterparts have been processed.
Relates #33656
Relates #33222
It is important that follower nodes periodically check that their leader is
still healthy and that they remain part of its cluster. If these checks fail
repeatedly then followers should attempt to find and join a new leader,
possibly electing one in the process. The LeaderChecker, introduced in this
commit, performs these periodic checks and deals with retries.
Currently, assertSeqNos assumes that the cluster is stable at the end of
the test (i.e., no more shard movement). However, this assumption does
not always hold. In these cases, we can stop the assertion instead of
failing a test.
Closes#33704
If a shard was serving as a replica when another shard was promoted to
primary, then its Lucene index was reset to the global checkpoint.
However, if the new primary fails before the primary/replica resync
completes and we are now being promoted, we have to restore the reverted
operations by replaying the translog to avoid losing acknowledged writes.
Relates #33473
Relates #32867
It's possible for the set "seqNos" to contain only the "unFinishedSeq"
in the testConcurrentReplica test. If this is the case, the call
`randomValueOtherThan` won't make any progress because the predicate
will never be false.
This commit removes this expectation because it's incorrect and it's no
longer needed as we have a dedicated test to verify the contains method.
Relates #33871
Drops `Settings` from some of the methods to lookup loggers and
deprecates another logger lookup that takes `Settings` because
`Settings` is no longer required to build a logger.
* ingest: support simulate with verbose for pipeline processor
This change better supports the use of simulate?verbose with the
pipeline processor. Prior to this change any pipeline processors
executed with simulate?verbose would not show all intermediate
processors for the inner pipelines.
This changes also moves the PipelineProcess and TrackingResultProcessor
classes to enable instance checks and to avoid overly public classes.
As well this updates the error message for when cycles are detected
in pipelines calling other pipelines.
Today all searches happen on the search threadpool which is the correct
behavior in almost any case. Yet, there are exceptions where for instance
searches searches should be passed through a single-thread
thread-pool to reduce impact on a node. This change adds a index-private setting that allows to mark an index as throttled for searches and forks off all non-stats searcher access to this thread-pool for indices that are marked as `index.search.throttled`
Transient settings override persistent settings, but in fact all of the tests
that run as part of `:server:test` and `:server:integTest` will pass if the
precedence is changed to be the other way round. This change adds a test that
verifies the precedence is as documented.
With this commit we clear the fielddata cache per field as it is
supposed to be. Previously we retrieved the proper field from the cache
but then cleared the entire cache anyway.
Closes#33798
Relates #33807
This change adds "contains" method to LocalCheckpointTracker.
One of the use cases is to check if a given operation has been processed
in an engine or not by looking up its seq_no in LocalCheckpointTracker.
Relates #33656
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.