If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.
Closes#39757
* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
* Fixed by adding a wait for no in-progress publications
* Also added debug logging that would've identified this problem
* Closes#36813
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
* The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
* relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
* Remove Obsolete BwC Logic from BlobStoreRepository
* We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here
* Some other minor+obvious cleanups
* Some Cleanup in o.e.i.engine
* Remove dead code and parameters
* Reduce visibility in some obvious spots
* Add missing `assert`s (not that important here since the methods
themselves will probably be dead-code eliminated) but still
If a single pipeline is updated then the internal representation of
all pipelines was updated. With this change, only the internal representation
of the pipelines that have been modified will be updated.
Prior to this change the IngestMetadata of the previous and current cluster
was used to determine whether the internal representation of pipelines
should be updated. If applying the previous cluster state change failed then
subsequent cluster state changes that have no changes to IngestMetadata
will not attempt to update the internal representation of the pipelines.
This commit, changes how the IngestService updates the internal representation
by keeping track of the underlying configuration and use that to detect
against the new IngestMetadata whether a pipeline configuration has been
changed and if so, then the internal pipeline representation will be updated.
We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug log to retention leases and enables
them in CcrRetentionLeaseIT.
This commit removes the act of renewing some retention leases during a
retention lease recovery test. Having renewal does not add anything
extra to this test, but does allow for some situations where the test
can fail spuriously (i.e., in a way that does not indicate that
production code is broken).
If we close an engine while a refresh is happening, then we might leak
refCount of some SegmentReaders. We need to skip the ram accounting
circuit breaker check until we have a new Lucene snapshot which includes
the fix for LUCENE-8809.
This also adds a test to the engine but left it muted so we won't forget
to reenable this check.
Closes#30290
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.
Changed the tool to always check shards regardless of corruption marker
presence.
Related to #41298
Previously sorting on a missing nested field would fail with an
Exception:
`[nested_field] failed to find nested object under path [nested_path]`
despite `unmapped_type` being set on the query.
Fixes: #33644
(cherry picked from commit 631142d5dd088a10de8dcd939b50a14301173283)
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in #36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.
This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.
Fixes#41967, in which the master entered the stabilisation phase with over 800
tasks to process.
By default, we track total hits up to 10k but we might index more than
10k documents `testPrimaryRelocationWhileIndexing`. With this change, we
always request for the accurate total hits in the test.
> java.lang.AssertionError: Count is 10000+ hits but 11684 was expected.
Both of these classes are basically a bloated wrapper around a simple
construct that can simply be a DirectoryFactory interface. This change
removes both classes and replaces them with a simple stateless interface
that creates a new `Directory` per shard. The concept of `index.store` is preserved
since it makes sense from a configuration perspective.
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.
This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.
Fixes#38331
Relates #38432
Today `RetentionLeaseIT` calls `fail(e.toString())` on some exceptions, losing
the stack trace that came with the exception. This commit adjusts this to
re-throw the exception wrapped in an `AssertionError` so we can see more
details about failures such as #41430.
This commit adds a log message containing the routing table, emitted on each
iteration of the failing assertBusy() in #40174. It also modernizes the code a
bit.
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
`org.elasticsearch.action.bulk.BulkProcessor` is a threadsafe class that
allows for simple semantics to deal with sending bulk requests. Once a
bulk reaches it's pre-defined size, documents, or flush interval it will
execute sending the bulk. One configurable option is the number of concurrent
outstanding bulk requests. That concurrency is implemented in
`org.elasticsearch.action.bulk.BulkRequestHandler` via a semaphore. However,
the only code that currently calls into this code is blocked by `synchronized`
methods. This results in the in-ability for the BulkProcessor to behave concurrently
despite supporting configurable amounts of concurrent requests.
This change removes the `synchronized` method in favor an explicit
lock around the non-thread safe parts of the method. The call into
`org.elasticsearch.action.bulk.BulkRequestHandler` is no longer blocking, which
allows `org.elasticsearch.action.bulk.BulkRequestHandler` to handle it's own concurrency.
Lucene 8 has the ability to skip blocks of non-competitive documents.
However some queries don't track their maximum score (`script_score`, `span`, ...)
so they always return Float.POSITIVE_INFINITY as maximum score. This can slow down
some boolean queries if other clauses have bounded max scores. This commit disables
the max score optimization when we detect a mandatory scoring clause with unbounded
max scores. Optional clauses are not checked since they can still skip documents
when the unbounded clause is after the current document.
Currently AnalysisRegistry#processNormalizerFactory creates a normalizer and
only later checks whether it should be added to the normalizer map passed in. In
case we throw an exception it isn't closed. This can be prevented by moving the
check that throws the exception earlier.
Allow for SimpleQueryString, QueryString and MultiMatchQuery
to set the `fields` parameter to the wildcard `*`. If so, set
the leniency to `true`, to achieve the same behaviour as from the
`"default_field" : "*" setting.
Furthermore, check if `*` is in the list of the `default_field` but
not necessarily as the 1st element.
Closes: #39577
(cherry picked from commit e75ff0c748e6b68232c2b08e19ac4a4934918264)
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures
Relates #41068, #38931
ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be
easily replaced with direct usages of the constructor that takes a
StreamInput as argument.
In case a search request holds only the suggest section, the query phase
is skipped and only the suggest phase is executed instead. There will
never be hits returned, and in case the explain flag is set to true, the
explain sub phase throws a null pointer exception as the query is null.
Usually a null query is replaced with a match all query as part of SearchContext#preProcess which is though skipped as well with suggest
only searches. To address this, we skip the explain sub fetch phase
for search requests that only requested suggestions.
Closes#31260
Removing of payload in BulkRequest (#39843) had a side effect of making
`BulkRequest.add(DocWriteRequest<?>...)` (with varargs) recursive, thus
leading to StackOverflowError. This PR adds a small change in
RequestConvertersTests to show the error and the corresponding fix in
`BulkRequest`.
Fixes#41668