Today the `ClusterFormationFailureHelper` does not include the local node in
the list of nodes it claims to have discovered. This means that it sometimes
reports that it has not discovered a quorum when in fact it has. This commit
adds the local node to the set of discovered nodes.
Today we suppress election attempts on master-eligible nodes that are not in
the voting configuration. In fact this restriction is not necessary: any
master-eligible node can safely become master as long as it has a fresh enough
cluster state and can gather a quorum of votes. Moreover, this restriction is
sometimes undesirable: there may be a reason why we do not want any of the
nodes in the voting configuration to become master.
The reason for this restriction is as follows. If you want to shut the master
down then you might first exclude it from the voting configuration. When this
exclusion succeeds you might reasonably expect that a new master has been
elected, since the voting config exclusion is almost always a step towards
shutting the node down. If we allow nodes outside the voting configuration to
be the master then the excluded node will continue to be master, which is
confusing.
This commit adjusts the logic to allow master-eligible nodes to attempt an
election even if they are not in the voting configuration. If such a master is
successfully elected then it adds itself to the voting configuration. This
commit also adjusts the logic that causes master nodes to abdicate when they
are excluded from the voting configuration, to avoid the confusion described
above.
Relates #37712, #37802.
With this change, we will rebuild the live version map and local
checkpoint using documents (including soft-deleted) from the safe commit
when opening an internal engine. This allows us to safely prune away _id
of all soft-deleted documents as the version map is always in-sync with
the Lucene index.
Relates #40741
Supersedes #42979
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.
This avoids various exceptions stemming from accessing the stopped
node(s).
Adds `equals()` and `hashCode()` methods to `DiscoveryNodeRole` to compare
these objects' values for equality, and adds a field to allow us to distinguish
unknown roles from known ones with the same name and abbreviation, for clearer
test failures.
Relates #43175
With this change we only have to add one line to add a new version.
The intent is to make it less error prone and easier to write a script
to automate the process.
There are a few tests within NodeTests that submit items to the
threadpool and then close the node. The tests are designed to check
how running tasks are affected during node close. These tests can cause
CI failures since the submitted tasks may not be running when the node
is closed and then execute after the thread context is closed, which
triggers an unexpected exception. This change ensures the threads are
running so we avoid the unexpected exception and can test these cases.
Testing task submittal while a node is closing is also important, so
an additional but muted test has been added for the case where a
task may be submitted while the node is closing, ensuring we
do not trigger anything unexpected in these cases.
Relates #42774
Relates #42577
This PR proposes to model big integers as longs (and big decimals as doubles)
in the context of dynamic mappings.
Previously, the dynamic mapping logic did not recognize big integers or
decimals, and would throw an error of the form "No matching token for
number_type [BIG_INTEGER]" when a dynamic big integer was encountered. It now
accepts these numeric types and interprets them as 'long' and 'double'
respectively. This allows `dynamic_templates` to accept and remap them as
another type such as `keyword` or `scaled_float`.
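As an illustrative sketch (the index and template names here are made up), a
`dynamic_templates` entry can now match dynamically-detected big integers via
the `long` type and remap them:
```
PUT big-numbers
{
  "mappings": {
    "dynamic_templates": [
      {
        "big_integers_as_keyword": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
```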
Addresses #37846.
The test fails if querying the roles via a transport client, since the
transport client does not have the plugin necessary to interpret the additional
role correctly. This commit adds this plugin to the transport client used.
Relates #43175
Fixes #43223
Currently the randomization of the query builder in these tests can create query strings
that can cause caching to be disabled for this query if we query all fields and
there is a date field present. This is pretty much an anomaly that we shouldn't
generally test for in the "testToQuery" tests where cache policies are checked.
This change makes sure we don't create offending query strings so the cache
checks never hit these cases and adds a special test method to check this edge
case.
Closes#43112
roundUp parsers were losing the composite pattern information when a new
JavaDateFormatter was created from the methods withLocale or withZone.
The roundUp parser should be preserved when calling these methods. This is the same approach in the withLocale/Zone methods as in daa2ec8a60/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java
Closes #42835
Given the significant performance impact that NIOFS has when term dicts are
loaded off-heap, this change enforces FstLoadMode#AUTO, which loads term dicts
off-heap only if the underlying index input indicates a memory map.
Relates to #43150
As the ValuesSourceType evolves, it is important to be
confident that new enum constants do not break
backwards-compatibility on the stream. Having dedicated
unit tests for this class will help be sure of that.
This change adds the terms index (`.tip`) to the list of extensions
that are memory-mapped by hybridfs. These files used to be accessed
only once to load the terms index on-heap but since #42838 they can
now be used to read the binary FST directly so it is beneficial to
memory-map them instead of accessing them via NIO.
Fixes an issue where tests would sometimes hang for 5 seconds when restarting a node. The reason
is that the SeedHostsResolver blocks waiting for a result for the full 5 seconds even when the
corresponding threadpool is shut down.
We are still using `FileSwitchDirectory` in the case a user configures file-based pre-load of mmaps. This is trappy for multiple reasons if both directories used by `FileSwitchDirectory` point to the same filesystem directory. One issue is LUCENE-8835, which causes issues like #37111 - until LUCENE-8835 is fixed we should not use it in elasticsearch. Instead we use a similar trick as we use for HybridFS and subclass mmap directory directly.
When set to false, the allowPartialSearchResults option does not check if the
shard failures have been reset to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.
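For reference, a hypothetical request exercising this option (index name made up):
```
GET my-index/_search?allow_partial_search_results=false
{
  "query": { "match_all": {} }
}
```
With the fix, such a request only fails if the atomic array contains at least one non-null shard failure.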
Closes#40743
Currently suggesters return null values on empty shards. Usually this gets replaced
by results from other non-empty shards, but if the index is completely empty (e.g. after
creation) the search responses "suggest" is also "null" and we don't render a corresponding
output in the REST response. This is an irritating edge case that requires special handling on
the user side (see #42473) and should be fixed.
This change makes sure every suggester type (completion, terms, phrase) returns at least an
empty skeleton suggestion output, even for empty shards. This way, even if we don't find
any suggestions anywhere, we still return and output the empty suggestion.
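For example, a suggest-only search against a freshly-created empty index (names are hypothetical):
```
GET new-index/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "sugestion",
      "term": { "field": "title" }
    }
  }
}
```
now yields a response whose "suggest" section contains "my-suggestion" with an empty options list instead of being omitted entirely.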
Closes#42473
When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.
This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflect the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.
Closes#40900
The SearchRequest#crossClusterSearch method is currently used only as
part of cross cluster search request, when minimizing roundtrips.
It will soon be used also when splitting a search into two: one for
throttled and one for non-throttled indices. It will probably be used
for other use cases as well in the future, hence it makes sense to generalize its name to subSearchRequest.
Though not in use in elasticsearch currently, it seems surprising that
ThreadPool.scheduler().scheduleAtFixedRate would hang. A recurring
scheduled task is never completed (except on failure) and we test for
exceptions using RunnableFuture.get(), which hangs for periodic tasks.
Fixed by checking that the task is done before calling .get().
Today the master eagerly reroutes the cluster as part of processing node joins.
However, it is not necessary to do this reroute straight away, and it is
sometimes preferable to defer it until later. For instance, when the master
wins its election it processes joins and performs a reroute, but it would be
better to defer the reroute until after the master has become properly
established.
This change defers this reroute into a separate task, and batches multiple such
tasks together.
#40741 introduced a merge policy that can drop the postings for the `_id`
field on soft deleted documents. The TermsSliceQuery assumes that every document
has an entry in the postings for that field so it doesn't check if the terms
index exists or not. This change fixes this bug by checking if the terms index for
the `_id` field is null and ignore the segment entirely if it's the case. This should
be harmless since segments without an `_id` terms index should only contain soft deleted
documents.
Closes#42996
If linearizability checking fails with OOM (or other exception), we did
not get the serialized history written into the log, making it difficult
to debug in cases where the problem is hard to reproduce. Fixed to
always attempt dumping the serialized history.
Related to #42244
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.
This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string`
query does not take the length of the term into account when computing the distance and always uses
a max distance of 1. This change fixes this discrepancy by ensuring that the term is passed when
the fuzziness is computed.
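A sketch of the fixed behaviour (index and field names are hypothetical):
```
GET my-index/_search
{
  "query": {
    "query_string": {
      "query": "quikc~auto",
      "default_field": "title"
    }
  }
}
```
With the fix, the edit distance follows the usual AUTO rules for the term length: 0 for terms of 1-2 characters, 1 for terms of 3-5 characters, and 2 otherwise.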
We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.
This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:
```
POST /_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": "blahblah",
        "shard": 2,
        "node": "node-4"
      }
    }
  ]
}
```
Fixes#39546
This commit refactors put mapping request validation for reuse. The
concrete case that we are after here is the ability to apply effectively
the same framework to indices aliases requests. This commit refactors
the put mapping request validation framework to allow for that.
This commit fixes a test bug in the request validators random test. In
particular, an assertion was not properly nested in a guard that would
ensure that there was at least one failure.
Relates #43000
When applying put mapping validators, we apply all the validators in the
collection. If a failure occurs, we collect that as a top-level
exception, and suppress any additional failures into the top-level
exception. However, if a request passes the validator after a top-level
exception has been collected, we would try to suppress a null exception
into the top-level exception. This is a violation of the
Throwable#addSuppressed API. This commit addresses this, and adds tests
to cover the logic of collecting the failures when validating a put
mapping request.
Today we test for translog corruption by incrementing a byte by 1 somewhere in
a file, and verify that this leads to a `TranslogCorruptionException`.
However, we rely on _all_ corruptions leading to this exception in the
`RemoveCorruptedShardDataCommand`: this command fails if a translog file
corruption leads to a different kind of exception, and `EOFException` and
`NegativeArraySizeException` are both possible. This commit strengthens the
translog corruption detection tests by simulating the following:
- a random value is written
- the file is truncated
It also makes sure that we return a `TranslogCorruptionException` in all such
cases.
Fixes#42661
Backport of #42744
This code has not been needed since the removal of tribe nodes, it was
left behind when those were dropped (note that regular transport
permissions are handled through transport profiles, even if they are not
explicitly in use).
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
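For illustration (index names are made up), the two settings are now clearly separated:
```
POST _reindex
{
  "max_docs": 1000,
  "source": {
    "index": "source-index",
    "size": 100
  },
  "dest": {
    "index": "dest-index"
  }
}
```
Here `max_docs` caps the total number of documents processed, while `size` inside `source` remains the per-batch scroll size.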
Relates #24344
Today we assert that the connection thread is blocked by the time the test gets
to the barrier, but in fact this is not a valid assertion. The following
`Thread.sleep()` will cause the test to fail reasonably often.
```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
if (connectionBlock != null) {
try {
+ Thread.sleep(50);
connectionBlock.run();
} catch (Exception e) {
throw new AssertionError(e);
```
This change relaxes the test to allow some time for the connection thread to
hit the barrier.
Fixes#40170
Changed order of listener invocation so that we notify before
registering search context and notify after unregistering same.
This ensures that counting up/down, as we do in ShardSearchStats,
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (same for onFreeContext/onNewContext, but we
currently have no assertions failing in those).
Closes#28053
The test failed because we had only a single document in the index
that got deleted such that some assertions that expected at least
one live doc failed.
Relates to: #40741
Single updates use a different internal code path than updates that are wrapped in a bulk request.
While working on a refactoring to bring both closer together I've noticed that bulk updates were
failing some of the tests that single updates passed. In particular, bulk updates cause
NullPointerExceptions to be thrown and listeners not being properly notified when being rejected
from the thread pool.
This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.
Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
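A sketch of attaching metadata at snapshot creation time (repository and snapshot names are hypothetical):
```
PUT _snapshot/my_repository/my_snapshot
{
  "metadata": {
    "taken_by": "alice",
    "reason": "pre-upgrade backup"
  }
}
```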
Some clusters might have been already migrated to version 7 without being warned about the joda-java migration changes.
The deprecation API on that version will give them guidance on what patterns need to be changed.
This change uses the same logic as in 6.8, that is: verifying that the pattern is from the incompatible set ('y', 'Y', 'C', 'Z' etc), not from the predefined set, not prefixed with 8, AND was also created in 6.x. Mappings created in 7.x are considered migrated and should not generate warnings.
There is no pipeline check (present in 6.8) as it is impossible to verify when the pipeline was created, and therefore to make sure the format is deprecated or not.
Relates #42010
The full-text query parsers accept field patterns that are expanded using the mapping.
Alias fields are also detected during the expansion but they are not deduplicated with the
concrete fields that are found from other patterns (or the same). This change ensures
that we deduplicate the target fields of the full-text query parsers in order to avoid
adding the same clause multiple times. Boolean queries are already able to deduplicate
clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect
these duplicates early on.
This change makes use of the reader attributes added in LUCENE-8671
to ensure that `_id` fields are always on-heap for best update performance
and term dicts are generally off-heap on Read-Only engines.
Closes#38390
This test is failing because recoveries of these empty shards are not
completing in a reasonable time, but the reason for this is still obscure. This
commit adds yet more logging.
Relates #40174, #42424
This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.
Relates #41815
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.
Closes#39688
(cherry picked from commit c8988d5cf5a85f9b28ce148dbf100aaa6682a757)
The control flow in TransportAnalyzeAction is currently spread across two large
methods, and is quite difficult to follow. This commit tidies things up a bit, to make
it clearer when we use pre-defined analyzers and when we use custom built ones.
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.
This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
IntervalBuilder#analyzeText will currently return null if it is passed an
empty TokenStream, which can lead to a confusing NullPointerException
later on during querying. This commit changes the code to return
NO_INTERVALS instead.
Fixes#42587
We were checking if an exception was caused by a specific reason "Not a
directory". Alas, this reason is locale-dependent and can fail on
systems that are not set to en_US.UTF-8. This commit addresses this by
deriving what the locale-dependent error message would be and using that
for comparison with the actual exception thrown.
Relates #41689
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.
To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).
Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
Today Elasticsearch does not prevent you from reconfiguring a node's
`path.data` to point to data paths that previously belonged to more than one
node. There's no good reason to be able to do this, and the consequences can be
quietly disastrous. Furthermore, #42489 might result in a user trying to split
up a previously-shared collection of data paths by hand and there's definitely
scope for mixing the paths up across nodes when doing this.
This change adds a check during startup to ensure that each data path belongs
to the same node.
Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.
Relates to #27096
(cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)
Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log
anything above `DEBUG` level. However there are some situations where it is
appropriate for them to log at a higher level:
- if the low-level handshake succeeds but the high-level one fails then this
indicates a config error that the user should resolve, and the exception
will help them to do so.
- if leader checks fail repeatedly then we restart discovery, and the exception
will help to determine what went wrong.
Resolves#42153
Refactors the WKT and GeoJSON parsers from a utility class into
instantiatable objects. This is a preliminary step in
preparation for moving coordinate validators out of the Geometry
constructors. This should allow us to make validators pluggable.
The disruption test cases need to use a lower checkpoint sync interval
since they verify sequence numbers after the test, waiting at most 10 seconds
for them to stabilize.
Closes#42637
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
When multiple commands are called in sequence, fetch shards
from mutable, up-to-date routing nodes to ensure each command's
changes are visible to subsequent commands.
This addresses an issue uncovered during work on #41050.
The problem this commit addresses is that state recovery is not reset on a node that then becomes
master with a cluster state that has a state not recovered flag in it. The situation that was observed
in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we
have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains),
when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state
(with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and
as it was previously leader in term 1 and successfully completed state recovery, it never retries
state recovery in term 3.
Closes#39172
* Make unwrapCorrupt Check Suppressed Ex. (#41889)
* As discussed in #24800 we want to check for suppressed corruption
indicating exceptions here as well to more reliably categorize
corruption related exceptions
* Closes#24800, 41201
* Cleanup Bulk Delete Exception Logging
* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.
Closes#39757
* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
* Fixed by adding a wait for no in-progress publications
* Also added debug logging that would've identified this problem
* Closes#36813
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
* The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
* relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
* Remove Obsolete BwC Logic from BlobStoreRepository
* We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here
* Some other minor+obvious cleanups
* Some Cleanup in o.e.i.engine
* Remove dead code and parameters
* Reduce visibility in some obvious spots
* Add missing `assert`s (not that important here since the methods
themselves will probably be dead-code eliminated) but still
Previously, if a single pipeline was updated, the internal representation of
all pipelines was updated. With this change, only the internal representation
of the pipelines that have been modified will be updated.
Prior to this change the IngestMetadata of the previous and current cluster
state was used to determine whether the internal representation of pipelines
should be updated. If applying the previous cluster state change failed then
subsequent cluster state changes that have no changes to IngestMetadata
will not attempt to update the internal representation of the pipelines.
This commit changes how the IngestService updates the internal representation
by keeping track of the underlying configuration and using that to detect
against the new IngestMetadata whether a pipeline configuration has been
changed; if so, the internal pipeline representation will be updated.
We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug logging to retention leases and enables
them in CcrRetentionLeaseIT.
This commit removes the act of renewing some retention leases during a
retention lease recovery test. Having renewal does not add anything
extra to this test, but does allow for some situations where the test
can fail spuriously (i.e., in a way that does not indicate that
production code is broken).
If we close an engine while a refresh is happening, then we might leak
refCount of some SegmentReaders. We need to skip the ram accounting
circuit breaker check until we have a new Lucene snapshot which includes
the fix for LUCENE-8809.
This also adds a test to the engine but left it muted so we won't forget
to reenable this check.
Closes#30290
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.
Changed the tool to always check shards regardless of corruption marker
presence.
Related to #41298
Previously sorting on a missing nested field would fail with an
Exception:
`[nested_field] failed to find nested object under path [nested_path]`
despite `unmapped_type` being set on the query.
Fixes: #33644
(cherry picked from commit 631142d5dd088a10de8dcd939b50a14301173283)
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in #36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.
This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.
Fixes#41967, in which the master entered the stabilisation phase with over 800
tasks to process.
By default, we track total hits up to 10k but we might index more than
10k documents in `testPrimaryRelocationWhileIndexing`. With this change, we
always request the accurate total hits in the test.
> java.lang.AssertionError: Count is 10000+ hits but 11684 was expected.
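A minimal sketch of the request-side change, with a hypothetical index name:
```
GET test-index/_search
{
  "track_total_hits": true,
  "query": { "match_all": {} }
}
```
Setting `track_total_hits` to `true` makes the reported total an exact count rather than the default lower bound of 10,000.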
Both of these classes are basically a bloated wrapper around a simple
construct that can simply be a DirectoryFactory interface. This change
removes both classes and replaces them with a simple stateless interface
that creates a new `Directory` per shard. The concept of `index.store` is preserved
since it makes sense from a configuration perspective.
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.
This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.
Fixes#38331
Relates #38432
Today `RetentionLeaseIT` calls `fail(e.toString())` on some exceptions, losing
the stack trace that came with the exception. This commit adjusts this to
re-throw the exception wrapped in an `AssertionError` so we can see more
details about failures such as #41430.
This commit adds a log message containing the routing table, emitted on each
iteration of the failing assertBusy() in #40174. It also modernizes the code a
bit.
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
`org.elasticsearch.action.bulk.BulkProcessor` is a threadsafe class that
allows for simple semantics to deal with sending bulk requests. Once a
bulk reaches its pre-defined size, document count, or flush interval, it will
execute sending the bulk. One configurable option is the number of concurrent
outstanding bulk requests. That concurrency is implemented in
`org.elasticsearch.action.bulk.BulkRequestHandler` via a semaphore. However,
the only code that currently calls into this code is blocked by `synchronized`
methods. This results in the inability of the BulkProcessor to behave concurrently
despite supporting configurable amounts of concurrent requests.
This change removes the `synchronized` method in favor of an explicit
lock around the non-thread-safe parts of the method. The call into
`org.elasticsearch.action.bulk.BulkRequestHandler` is no longer blocking, which
allows `org.elasticsearch.action.bulk.BulkRequestHandler` to handle its own concurrency.
Lucene 8 has the ability to skip blocks of non-competitive documents.
However some queries don't track their maximum score (`script_score`, `span`, ...)
so they always return Float.POSITIVE_INFINITY as maximum score. This can slow down
some boolean queries if other clauses have bounded max scores. This commit disables
the max score optimization when we detect a mandatory scoring clause with unbounded
max scores. Optional clauses are not checked since they can still skip documents
when the unbounded clause is after the current document.
Currently AnalysisRegistry#processNormalizerFactory creates a normalizer and
only later checks whether it should be added to the normalizer map passed in. In
case we throw an exception it isn't closed. This can be prevented by moving the
check that throws the exception earlier.
Allow for SimpleQueryString, QueryString and MultiMatchQuery
to set the `fields` parameter to the wildcard `*`. If so, set
the leniency to `true`, to achieve the same behaviour as with the
`"default_field": "*"` setting.
Furthermore, check if `*` is in the list of the `default_field` but
not necessarily as the 1st element.
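A sketch of one of the now-lenient forms (index name hypothetical):
```
GET my-index/_search
{
  "query": {
    "query_string": {
      "query": "some text",
      "fields": ["*"]
    }
  }
}
```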
Closes: #39577
(cherry picked from commit e75ff0c748e6b68232c2b08e19ac4a4934918264)
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures
Relates #41068, #38931
ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be
easily replaced with direct usages of the constructor that takes a
StreamInput as argument.
In case a search request holds only the suggest section, the query phase
is skipped and only the suggest phase is executed instead. There will
never be hits returned, and in case the explain flag is set to true, the
explain sub phase throws a null pointer exception as the query is null.
Usually a null query is replaced with a match all query as part of
SearchContext#preProcess, which is, however, also skipped with suggest-only
searches. To address this, we skip the explain sub fetch phase
for search requests that only requested suggestions.
Closes#31260
The removal of the payload in BulkRequest (#39843) had a side effect of making
`BulkRequest.add(DocWriteRequest<?>...)` (with varargs) recursive, thus
leading to StackOverflowError. This PR adds a small change in
RequestConvertersTests to show the error and the corresponding fix in
`BulkRequest`.
Fixes#41668
* Remove IndexShard dependency from Repository
In order to simplify repository testing, especially for BlobStoreRepository,
it's important to remove the dependency on IndexShard and reduce it to
Store and MapperService (in the snapshot case). This significantly reduces
the dependency footprint for Repository and allows unit testing without starting
nodes or instantiating entire shard instances. This change deprecates the old
method signatures and adds a unit test for FileRepository to show the advantage
of this change.
In addition, the unittesting surfaced a bug where the internal file names that
are private to the repository were used in the recovery stats instead of the
target file names which makes it impossible to relate to the actual lucene files
in the recovery stats.
* don't delegate deprecated methods
* apply comments
* test
When using the High Level Rest Client Java API to produce a search query, using AggregationBuilders.topHits("th").sort("_score", SortOrder.DESC)
caused the query to contain duplicate sort clauses.
A shard that is undergoing peer recovery is subject to logging warnings of the form
org.elasticsearch.action.FailedNodeException: Failed node [XYZ]
...
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ...
These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e.
there is an IndexShard instance, but no proper IndexCommit just yet).
As these failures are currently bubbled up to the master, they cause unnecessary reroutes and
confusion amongst users due to being logged as warnings.
Closes #40107
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.
This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
Simplifies the voting configuration reconfiguration logic by switching to an explicit Comparator for
the priorities. Does not make changes to the behavior of the component.
Flushing at the end of a peer recovery (if needed) can bring these
benefits:
1. Closing an index won't end up in a red state, as a recovering
replica will always be ready for closing whether it performs the
verifying-before-close step or not.
2. Good opportunities to compact the store (i.e., flushing and merging
Lucene, and trimming the translog)
Closes #40024
Closes #39588
This change verifies and aborts recovery if source and target have the
same syncId but different sequenceId. This commit also adds an upgrade
test to ensure that we always utilize syncId.
The verifying-before-close step ensures the global checkpoints on all
shard copies are in sync; thus, we don't need to sync global
checkpoints for closed indices.
Relates #33888
There is an off-by-one error in this test. It leads to the recovery
thread never being started, and that means joining on it will wait
indefinitely. This commit addresses that by fixing the off-by-one error.
Relates #42325
I forgot to git add these before pushing, sorry. This commit fixes
compilation in IndexShardTests, they are needed here and not in master
due to differences in how Java infers types in generics between JDK 8
and JDK 11.
Today when executing an action on a primary shard under permit, we do
not enforce that the shard is in primary mode before executing the
action. This commit addresses this by wrapping actions to be executed
under permit in a check that the shard is in primary mode before
executing the action.
Today we are persisting the retention leases at least every thirty
seconds by a scheduled background sync. This sync causes an fsync to
disk and when there are a large number of shards allocated to slow
disks, these fsyncs can pile up and can severely impact the system. This
commit addresses this by only persisting and fsyncing the retention
leases if they have changed since the last time that we persisted and
fsynced the retention leases.
* Cleanup Various Uses of ActionListener
* Use shorter `map`, `runAfter` or `wrap` where functionally equivalent to anonymous class
* Use ActionRunnable where functionally equivalent
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.
This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.
This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
Secure settings currently error if they exist inside elasticsearch.yml.
This commit adds validation that non-secure settings do not exist inside
the keystore.
closes#41831
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
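An illustrative request using the new parameter (index and field names are made up):
```
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1d"
      }
    }
  }
}
```
Using `"fixed_interval": "24h"` instead would bucket by strict 24-hour periods regardless of DST transitions.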
If `keyedFilters` is null it assumes there are unkeyed filters... which
will NPE if the unkeyed filters are actually empty.
This refactors to simplify the filter assignment a bit, adds an empty
check and tidies up some formatting.
This commit fixes a test bug that ends up comparing the result of two consecutive calls to System.currentTimeMillis that can be different
on slow CIs.
Closes#42064
This test can create and shuffle 2*(3*5*7) = 210 shards which is quite
heavy for our CI. This commit reduces the load, so we don't timeout on
CI.
Closes#28153
Adds an initial limited implementation of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort concentrates on the SQL option, AKA ISO 19125-2.
Queries that are supported as a result of this initial implementation
Metadata commands
- `DESCRIBE table` - returns the correct column types `GEOMETRY` for geo shapes and geo points.
- `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions
- `SYS TYPES` and `SYS COLUMNS` display correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points accordingly.
Returning geoshapes and geopoints from elasticsearch
- `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console.
- `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;` - return the geoshapes and geopoints in their WKT representation;
Using geopoints in elasticsearch
- The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases when used in queries, sorting and aggregations, these functions are translated into scripts. These functions can be used in the SELECT clause for both geopoints and geoshapes.
- `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText('POINT(1 2)'), point) < 10;` - returns all records for which `point` is located within 10m of `POINT(1 2)`. In this case the WHERE clause is translated into a range query.
Limitations:
Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values.
Relates to #29872
Backport of #42031
This change updates tests that use a CountDownLatch to synchronize the
running of threads when testing concurrent operations so that we ensure
the thread has been fully created and run by the scheduler. Previously,
these tests used a latch with a value of 1 and the test thread counted
down while the threads performing concurrent operations just waited.
This change updates the value of the latch to be 1 + the number of
threads. Each thread counts down and then waits. This means that each
thread has been constructed and has started running. All threads will
have a common start point now.
Today we do not expose the cluster UUID in any logs by default, but it would be
useful to see it. For instance if a user starts multiple nodes as separate
clusters then they will silently remain as separate clusters even if they are
subsequently reconfigured to look like a single cluster. This change logs the
committed cluster UUID the first time the node encounters it.
* Switch to using a list instead of a Set for the filters, so that the
order of these filters is kept.
(cherry picked from commit 74a743829799b64971e0ac5ae265f43f6c14e074)
If remote recovery copies an index commit which has gaps in sequence
numbers to a follower, then these assertions (introduced in #40823)
don't hold for follower replicas.
Closes#41037
Currently IndexAnalyzers keeps the three default analyzers as separate class members
although they should refer to the same analyzers held in the additional
analyzers map under the default names. This assumption should be made more
explicit by keeping all analyzers in the map. This change adapts the constructor
to check all the default entries are there and the getters to reach into the map
with the default names when needed.
The changes in #39317 brought to light some concurrency issues in the
close method of Recyclers as we do not wait for threads running in the
threadpool to be finished prior to the closing of the PageCacheRecycler
and the Recyclers that are used internally. #41695 was opened to
address the concurrent close issues but upon review, the closing of
these classes is not really needed as the instances should become
available for garbage collection once there is no longer a reference to
the closed node.
Closes#41683
We have a number of places in analysis-handling code where we check
if a field type is a keyword field, and if so then extract the normalizer rather
than pulling the index-time analyzer. However, a keyword normalizer is
really just a special case of an analyzer, so we should be able to simplify this
by setting the normalizer as the index-time analyzer at construction time.
* This test was resulting in a `PARTIAL` instead of a `SUCCESS` state for
the case of closing an index during snapshotting on 7.x
* The reason for this is the changed default behaviour regarding
waiting for active shards between 8.0 and 7.x
* Fixed by adjusting the waiting behaviour on the close index request
in the test
* Closes#39828
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce #32506
* Running the IndexStatsIT in a loop is a way of reproducing this
Previously, TransportSingleShardAction required constructing a new
empty response object. This response object's Streamable readFrom
was used. As part of the migration to Writeable, the interface here
was updated to leverage Writeable.Reader.
relates to #34389.
The close method in Node uses a StopWatch to time the closing of
various services. However, the call to log the timing was made before
any of the services had been closed and therefore no timing would be
printed out. This change moves the timing log call to be a closeable
that is the last item closed.
This change makes the default seed address tests account for the lack
of an IPv6 network. By default docker containers only run with IPv4 and
these tests fail in a vanilla installation of elasticsearch-ci. To
resolve this we only expect IPv6 seed addresses if IPv6 is available.
Relates #41404
If the max doc in the index is greater than the minimum total term frequency
among the requested fields we need to adjust max doc to be equal to the min ttf.
This was removed by mistake when fixing #41125.
Closes#41934
The ReadOnlyEngine wraps its reader with a SoftDeletesDirectoryReaderWrapper if soft deletes
are enabled. However the wrapping is done on top of the ElasticsearchDirectoryReader and that
trips an assertion later on since the cache keys of these directories are different. This commit
changes the order of the wrapping to put the ElasticsearchDirectoryReader first in order to
ensure that it is always retrieved first when we unwrap the directory.
Closes#41795
* Allow unknown task time in QueueResizingEsTPE
The afterExecute method previously asserted that a TimedRunnable task
must have a positive execution time. However, the code in TimedRunnable
returns a value of -1 when a task time is unknown. Here, we expand the
logic in the assertion to allow for that possibility, and we don't
update our task time average if the value is negative.
* Add a failure flag to TimedRunnable
In order to be sure that a task has an execution time of -1 because of
a failure, I'm adding a failure flag boolean to the TimedRunnable class.
If execution time is negative for some other reason, an assertion will
fail.
Backport of #41810
Fixes #41448
Prior to this change the ISO8601 date parser would only
parse an optional timezone if seconds were specified.
This change moves the timezone to the same level of
optional components as hour, so that timestamps without
minutes or seconds may optionally contain a timezone.
It also adds a unit test to cover all the supported
formats.
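For illustration, timestamps of the following shapes can now all carry a timezone (previously the first two were rejected because no seconds were given):
```
2019-06-04T10Z
2019-06-04T10:30+02:00
2019-06-04T10:30:15.123Z
```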
Allows configuring the number of test iterations via IntelliJ's config dialog, instead of having to add it
manually via the tests.iters system property.
* terminated_early should always be set in the response with terminate_after
Today we set `terminated_early` to true in the response if the query terminated
early due to `terminate_after`. However if `terminate_after` is greater than
the number of matching documents in a shard we don't set the flag in the response
to indicate that the query was exhaustive. This change fixes this discrepancy by setting
terminated_early to false in the response if the number of documents that match
the query is smaller than the provided `terminate_after` value.
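For instance, with a hypothetical index containing more than five matching documents:
```
GET my-index/_search
{
  "terminate_after": 5,
  "query": { "match": { "title": "foo" } }
}
```
returns `"terminated_early": true`; if fewer than five documents matched, the response now reports `"terminated_early": false` instead of omitting the flag.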
Closes#33949
Today Elasticsearch accepts, but silently ignores, port ranges in the
`discovery.seed_hosts` setting:
```
discovery.seed_hosts: 10.1.2.3:9300-9400
```
Silently ignoring part of a setting like this is trappy. With this change we
reject seed host addresses of this form.
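Ports must now be listed explicitly instead, for example:
```
discovery.seed_hosts: ["10.1.2.3:9300", "10.1.2.3:9301"]
```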
Closes#40786
Backport of #41404
Today if an exception is thrown when serializing a cluster state during
publication then the master enters a poisoned state where it cannot publish any
more cluster states, but nor does it stand down as master, yielding repeated
exceptions of the following form:
```
failed to commit cluster state version [12345]
org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed
at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1045) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:252) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.0.0.jar:7.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: cannot start publishing next value before accepting previous one
at org.elasticsearch.cluster.coordination.CoordinationState.handleClientValue(CoordinationState.java:280) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1030) ~[elasticsearch-7.0.0.jar:7.0.0]
... 11 more
```
This is because it already created the publication request using
`CoordinationState#handleClientValue()` but then it fails before accepting it.
This commit addresses this by performing the serialization before calling
`handleClientValue()`.
Relates #41090, which was the source of such a serialization exception.
The fractional seconds portion of strict_date_optional_time was
accidentally copied from the printer, which always prints at least 3
fractional digits. This commit fixes the formatter to also allow 1 or 2
digits of fractional seconds.
closes#41633
Add a test that stresses concurrent writes using ifSeqno/ifPrimaryTerm to do CAS style updates. Use linearizability checker to verify linearizability. Linearizability of successful CAS'es is guaranteed.
Changed linearizability checker to allow collecting history concurrently.
Changed unresponsive network simulation to wake up immediately when network disruption is cleared to ensure tests proceed in a timely manner (and this also seems more likely to provoke issues).
Full text queries whose text starts with "now" are not cacheable if they target a date field.
However we assume in the query builder tests that all queries are cacheable, and this assumption
fails when the randomly generated query string starts with "now". This fails only rarely,
since the probability that a random string starts with "now" is low, but this commit ensures that
isCacheable is correctly checked for full text queries that fall into this edge case.
Closes#41847
With this change, we will verify the consistency of version and source
(besides id, seq_no, and term) of live documents between shard copies
at the end of disruption tests.
If users close an index to change some non-dynamic index settings, then the current implementation forces replicas of that closed index to copy over segment files from the primary. With this change, we make peer recoveries of closed indices skip both phases.
Relates #33888
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
The exception message thrown when specifying illegal characters did
not accurately describe the allowed characters. This updates the
error message to reflect reality (any character except [, ] and >).
When Elasticsearch is run from a package installation, the running
process does not have permissions to write to the keystore. This is
because of the root:root ownership of /etc/elasticsearch. This is why we
create the keystore if it does not exist during package installation. If
the keystore needs to be upgraded, that is currently done by the running
Elasticsearch process. Yet, as just mentioned, the Elasticsearch process
would not have permissions to do that during runtime. Instead, this
needs to be done during package upgrade. This commit adds an upgrade
command to the keystore CLI for this purpose, and that is invoked during
package upgrade if the keystore already exists. This ensures that we are
always on the latest keystore format before the Elasticsearch process is
invoked, and therefore no upgrade would be needed then. While this bug
has always existed, we have not heard of reports of it in practice. Yet,
this bug becomes a lot more likely with a recent change to the format of
the keystore to remove the distinction between file and string entries.
Today you can add a null `Custom` to the cluster state or its metadata, but
attempting to publish such a cluster state will fail. Unfortunately, the
publication-time failure gives very little information about the source of the
problem. This change causes the failure to manifest earlier and adds
information about which `Custom` was null in order to simplify the
investigation.
Relates #41090.
This changes the error message for a negative result in a function score:
when using the ln modifier it suggests using ln1p or ln2p, and for the
log modifier it suggests using log1p or log2p.
This relates to #41509
This commit is a refactoring of how we filter addresses on
interfaces. In particular, we refactor all of these methods into a
common private method. We also change the order of logic to first check
if an address matches our filter and then check if the interface is
up. This is to possibly avoid problems we are seeing where devices are
flapping up and down while we are checking for loopback addresses. We do
not expect the loopback device to flap up and down so by reversing the
logic here we avoid that problem on CI machines. Finally, we expand the
error message when this does occur so that we know which device is
flapping.
This is related to #27260. Currently there is a setting
http.read_timeout that allows users to define a read timeout for the
http transport. This commit implements support for this functionality
with the transport-nio plugin. The behavior here is that a repeating
task will be scheduled for the interval defined. If there have been
no requests received since the last run and there are no inflight
requests, the channel will be closed.
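A rough sketch of this polling approach (the types and method names here are hypothetical, not the actual transport-nio classes):
```
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical view of a channel; the real transport-nio types differ.
interface HttpChannelView {
    long requestCount();     // total requests received so far
    int inflightRequests();  // requests currently being handled
    void close();
}

class ReadTimeoutTask implements Runnable {
    private final HttpChannelView channel;
    private final ScheduledExecutorService scheduler;
    private final long timeoutMillis;
    private long lastSeenRequestCount = -1;

    ReadTimeoutTask(HttpChannelView channel, ScheduledExecutorService scheduler, long timeoutMillis) {
        this.channel = channel;
        this.scheduler = scheduler;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run() {
        long seen = channel.requestCount();
        // Close only if nothing arrived since the last run and nothing is in flight.
        if (seen == lastSeenRequestCount && channel.inflightRequests() == 0) {
            channel.close();
        } else {
            lastSeenRequestCount = seen;
            scheduler.schedule(this, timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }
}
```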
Today a bulk shard request appears as follows in the detailed task list:
requests[42], index[my_index]
This change adds the shard index and refresh policy too:
requests[42], index[my_index][2], refresh[IMMEDIATE]
If closing a shard while resetting its engine,
IndexEventListener.afterIndexShardClosed would be called while there was
still an active IndexWriter on the shard. For integration tests, this
led to an exception during the check-index step called from
MockFSIndexStore.Listener. Fixed.
Relates to #38561
Today we allow adding entries from a file or from a string, yet we
internally maintain this distinction such that if you try to add a value
from a file for a setting that expects a string, or add a value from a
string for a setting that expects a file, you will have a bad time. This
is a pain for operators, who need to know this difference for each
setting. Yet, we do not need to maintain this distinction internally as
they are bytes after all. This commit removes that distinction and
includes logic to upgrade legacy keystores.
Today we choose to initialize max_seq_no_of_updates (MSU) on primaries
only so we can deal with a situation where a primary is on an old node
(before 6.5), which does not have MSU, while replicas are on new nodes
(6.5+). However, this strategy is quite complex and can lead to bugs
(for example #40249) since we have to assign a correct value (not too
low) to MSU in all possible situations (before recovering from translog,
restoring history on promotion, and handing off relocation).
Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes
in the cluster should have MSU. This change simplifies the
initialization of MSU by always assigning it a correct value in the
constructor of Engine regardless of whether it's a replica or primary.
Relates #33842
IntervalsSources can throw IllegalArgumentExceptions if they would produce
too many disjunctions. To mitigate against this when building random
sources, we limit the depth of the randomly generated source to four
nested sources.
Fixes #41402
* Add Repository Consistency Assertion to SnapshotResiliencyTests (#40857)
* Add Repository Consistency Assertion to SnapshotResiliencyTests
* Add some quick validation on not leaving behind any dangling metadata or dangling indices to the snapshot resiliency tests
* Added todo about expanding this assertion further
* Fix SnapshotResiliencyTest Repo Consistency Check (#41332)
* Fix SnapshotResiliencyTest Repo Consistency Check
* Due to the random creation of an empty `extra0` file by the Lucene mockFS, we saw broken tests because we use the existence of an index folder in assertions and index deletion doesn't go through if there are extra files in an index folder
* Fixed by removing the `extra0` file and the resulting empty directory trees before asserting repo consistency
* Closes #41326
* Reenable SnapshotResiliency Test (#41437)
This was fixed in https://github.com/elastic/elasticsearch/pull/41332
but I forgot to reenable the test.
* Fix compile on Java 8
A stuck peer recovery in #40913 reveals that we indefinitely retry on
new cluster states if indexing translog operations hits a mapper
exception. We should not wait and retry if the mapping on the target is
as recent as the mapping that the primary used to index the replaying
operations.
Relates #40913
This commit refactors GeoHashUtils class into a new Geohash utility class located in the ES geo library. The intent is to not only better control what geo methods are whitelisted for painless scripting but to clean up the geo utility API in general.
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.
* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread pool
* I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (parallel access to a blob store repository during deletes is now possible, since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
* Thanks to #39793 dynamic mapping updates don't contain blocking operations anymore so we don't have to manually put the mapping in this test and can keep it a little simpler
* Add Restore Operation to SnapshotResiliencyTests
* Expand the successful snapshot test case to also include restoring the snapshot
* Add indexing of documents as well to be able to meaningfully verify the restore
* This is part of the larger effort to test eventually consistent blob stores in #39504
* The check doesn't add much if anything practically, since the S3 repository is eventually consistent and we only log the non-existence of a blob anyway
* We don't do the check on writes for this very reason and documented it as such
* Removing the check saves one API call per single delete speeding up the deletion process and lowering costs
Today the `_field_caps` API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However if the request is an index pattern (or an alias, or both...) there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices in the response. It also adds a way to retrieve unmapped fields
in a per-field section called `unmapped`. This section is created for each field
that is present in some indices but not all, if the parameter `include_unmapped`
is set to true in the request (defaults to false).
* Introduce Delegating ActionListener Wrappers
* Dry up use cases of ActionListener that simply pass through the response or exception to another listener
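A minimal sketch of such a delegating wrapper (the `ActionListener` interface is simplified here):
```
import java.util.function.Function;

interface ActionListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

final class Listeners {
    // Map the response to another type; failures pass straight through to the delegate.
    static <T, R> ActionListener<T> map(ActionListener<R> delegate, Function<T, R> fn) {
        return new ActionListener<T>() {
            @Override
            public void onResponse(T response) {
                R mapped;
                try {
                    mapped = fn.apply(response);
                } catch (Exception e) {
                    delegate.onFailure(e);
                    return;
                }
                delegate.onResponse(mapped);
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e); // pure pass-through
            }
        };
    }
}
```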
* Name Snapshot Data Blobs by UUID
* There is no functional reason why we need incremental naming for these files but
* As explained in #38941 it is a possible source of corrupting the repository
* It wastes API calls for the list operation
* Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
* Even in the worst-case scenario of a downgrade the functionality would continue working, since the incremental names wouldn't conflict with the UUIDs and the number parsing for finding the next incremental name suppresses the exception when encountering a non-numeric value after the double underscore prefix
In order to support empty action metadata in the first msearch item,
we need to remove support for prepending msearch request body with an
empty line, which prevents us from parsing the empty line as action
metadata for the first search item.
Relates to #41011
The test is testing the java time API and fails in case it hits daylight saving time changes.
Java time has the right implementation and we don't need to test this.
More details on how the test was affected by the DST change are in this comment.
Closes #39617
Backport of #41493
* Due to #40866, one of the two parallel bulk requests can randomly be
rejected outright when the write queue is already full; we can catch
this situation and ignore it since we can still get the rejection for
the dynamic mapping update for the other request, and it's somewhat rare
to run into this anyway
* Closes #41363
Adds some validation to prevent duplicate source names from being
used in the composite agg.
Also refactored to use a ConstructingObjectParser and removed the
private ctor and setter for sources, making it mandatory.
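Schematically, the validation amounts to something like this (method and message are assumptions):
```
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CompositeSourceValidation {
    // Reject duplicate source names up front instead of failing later.
    static void validateSourceNames(List<String> sourceNames) {
        Set<String> seen = new HashSet<>();
        for (String name : sourceNames) {
            if (seen.add(name) == false) {
                throw new IllegalArgumentException(
                    "Composite source names must be unique, found duplicate [" + name + "]");
            }
        }
    }
}
```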
* The problem here is that if we run into a corrupted index-N file, instead of generating a new index-(N+1) file, we instead set the newest index generation to -1 and thus try to create `index-0`
* If `index-0` is corrupt, this prevents us from ever creating a new snapshot using the broken shard, because we are unable to create `index-0` since it already exists
* Fixed by still using the index generation for naming the next index file, even if it was read from a broken index file
* Added a test that makes sure restoring as well as snapshotting on top of the broken shard index file works as expected
* Closes #41304
* The test fails for the retry-backoff-enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866, which now might lead to an outright rejection of the request instead of its items individually
* Fixed by adding retry functionality to the top-level request as well
* Also fixed the duplicate test for the HLRC, which wasn't yet handling the no-backoff case the same way the non-client IT did
* Closes #41324
Today we assert that there are no operations in flight in this test. However we
will sometimes be in a situation where the operations are blocked, and we
distinguish these cases since #41271 causing the assertion to fail. This commit
addresses this by allowing operations to be blocked sometimes after a primary
promotion.
Fixes #41333.
The `composite` aggregation maps unknown fields as numerics, this means that
any `after` value that is set on a query with an unmapped field on some indices
will fail if the provided value is not numeric. This commit changes the default
value source to use keyword instead in order to be able to parse any type of after
values.
The `_id` field uses a binary encoding to index terms that is not compatible with
the utf8 automaton that the unified highlighter creates to reanalyze the input.
For this reason this commit ignores terms that target the `_id` field when
`require_field_match` is set to false.
Closes #37525
With the removal of the `_all` field the `mlt` query cannot infer a field name
to use to analyze the provided (un)like text if the `fields` parameter is not
explicitly set in the query and the `index.query.default_field` is not changed
in the index settings (by default it is set to `*`). For this reason the like text
is ignored and queries are only built from the provided document ids.
This change fixes this bug by throwing an error if the fields option is not set
and the `index.query.default_field` is equal to `*`. The error is thrown only
if like or unlike texts are provided in the query.
This fixes an issue where every N seconds a slow search request was triggered,
since the searcher access time was not set unless the shard was idle. This change
moves to a more proactive approach, setting the searcher as accessed all the time.
* Bulk requests can be thousands of items large and take more than O(10ms) to handle => we should not handle them on the transport threadpool so as not to block select loops
* relates #39128
* relates #39658
Today we do not distinguish "no operations in flight" from "operations are
blocked", since both return `0` from `IndexShard#getActiveOperationsCount()`.
We therefore cannot assert that every `TransportReplicationAction` performs its
actions under permit(s). This commit fixes this by returning
`IndexShard#OPERATIONS_BLOCKED` if operations are blocked, allowing these two
cases to be distinguished.
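Schematically (the constant name comes from the change itself; the surrounding code is assumed):
```
class PermitsSketch {
    static final int OPERATIONS_BLOCKED = -1;

    private boolean blocked;
    private int acquiredPermits;

    // 0 now unambiguously means "no operations in flight";
    // a shard whose permits are blocked reports the sentinel instead.
    synchronized int getActiveOperationsCount() {
        return blocked ? OPERATIONS_BLOCKED : acquiredPermits;
    }
}
```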
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain") into their modern equivalent ("America/Edmonton").
But rollup just stored the TZ as provided by the user.
When checking the TZ for query validation we used a string comparison,
which would fail due to the date_histo's upgrading behavior.
Instead, we should convert both to a TimeZone object and check if their
rules are compatible.
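A hedged java.time illustration of the idea (the actual rollup code may use a different time API):
```
import java.time.ZoneId;

public class TimeZoneRulesCheck {
    public static void main(String[] args) {
        ZoneId legacy = ZoneId.of("Canada/Mountain");
        ZoneId modern = ZoneId.of("America/Edmonton");
        // The IDs differ, but the underlying rules are the same.
        System.out.println(legacy.equals(modern));                       // false
        System.out.println(legacy.getRules().equals(modern.getRules())); // true
    }
}
```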
The `ignore_malformed` option on numeric fields currently works when the
bad value is a string, but not when it is a boolean. In that case we get a
parsing error from the xContent parser, which we need to catch in addition
to the error from the field mapper.
Closes #11498
Today the `?preference=custom_string_value` search preference will only change
its choice of a shard copy if something changes the `IndexShardRoutingTable`
for that specific shard. Users can use this behaviour to route searches to a
consistent set of shard copies, which means they can reliably hit copies with
hot caches, and use the other copies only for redundancy in case of failure.
However we do not assert this property anywhere, so we might break it in
future.
This commit adds a test that shows that searches are routed consistently even
if other indices are created/rebalanced/deleted.
Relates https://discuss.elastic.co/t/176598, #41115, #26791
Today we always trim unsafe commits (whose max_seq_no >= global
checkpoint) before starting a read-write or read-only engine. This is
mandatory for read-write engines because they must start with the safe
commit. This is also fine for read-only engines since in most cases
we should have exactly one commit after closing an index (trimming is a
noop). However, this is dangerous for following indices which might have
more than one commit when they are being closed.
With this change, we move the trimming logic to the ctor of InternalEngine
so we won't trim anything if we are going to open a read-only engine.
Currently enabling profiling disables top-hits optimizations, which is
unfortunate: it would be nice to be able to notice the difference in method
counts and timings depending on whether total hit counts are requested.
`Node#close` is pretty hard to rely on today:
- it might swallow exceptions
- it waits for 10 seconds for threads to terminate but doesn't signal anything
if threads are still not terminated after 10 seconds
This commit propagates `IOException`s and splits `Node#close` into
`Node#close` and `Node#awaitClose` so that the decision about what to do if
a node takes too long to close can be made on top of `Node#close`.
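A hedged sketch of what callers can now do (exact signatures may differ):
```
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.elasticsearch.node.Node;

class NodeShutdown {
    // close() starts the shutdown and now propagates IOExceptions;
    // awaitClose lets the caller decide how long to wait and what to do on timeout.
    static void shutdown(Node node) throws IOException, InterruptedException {
        node.close();
        if (node.awaitClose(10, TimeUnit.SECONDS) == false) {
            // Threads are still running; the caller can log, keep waiting, or halt.
        }
    }
}
```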
It also adds synchronization to lifecycle transitions to make them atomic. I
don't think it is a source of problems today, but it makes things easier to
reason about.
Today we check if an index has broken settings when checking if an index
needs to be upgraded. However, it can be the case that an index setting
became broken even if an index is already upgraded to the current
version if the user removed a plugin (or downgraded from the default
distribution to the non-default distribution) while on the same version
of Elasticsearch. In this case, some registered settings would go
missing and the index would now be broken. Yet, we miss this check and
instead of archiving the settings, the index becomes unassigned due to
the missing settings. This commit addresses this by checking for broken
settings whether or not the index is upgraded.
One of the two #getCorrections methods is only used in tests, so we can move
it and the required helper methods to that test. This also reduces the
visibility of several methods to package-private since the class isn't used
elsewhere outside the package.
Today we erroneously look for a node setting called `readonly` when deciding
whether or not to create a missing directory in a filesystem repository. This
change fixes this by using the repository setting instead.
Closes#41009
Relates #26909
In `TransportRolloverAction`, before doing a rollover we resolve the
source index name (the write index) from the alias in the rollover request.
Before evaluating the conditions and executing the rollover action, we
retrieve stats, but to do so we used the source index name
resolved from the alias instead of the alias itself.
This fails when the user is assigned a role with an index privilege on the
alias instead of the concrete index. This commit fixes this by using
the alias from the request.
After this change, we verified that when we retrieve all the stats (including
write and read indices) we consider only the source index.
Closes #40771
The unified highlighter returns the first sentence of the text when number_of_fragments
is set to 0 (full highlighting). This is a legacy of the removed postings highlighter
that was based on sentence break only. This commit changes this behavior in order
to respect the provided no_match_size value when number_of_fragments is set to 0.
This means that the behavior will be consistent for any value of the number_of_fragments option.
Closes #41066
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes #40250
Today the blended term query detects if a term exists in a field by looking at the term statistics in the index.
However the value that indicates that a term has no occurrence in a field has changed in Lucene. A non-existing term now returns
a doc and total term frequency of 0. Because of this discrepancy the blended term query picks 0 as the minimum frequency for a term
even if other fields have documents for this term. This confuses the term queries that the blending creates since some of them
contain a custom state that indicates a frequency of 0 even though the term has some occurrences in the field. For these terms an exception
is thrown because the term query always checks that the term state's frequency is greater than 0 if there are documents associated with it.
This change fixes this bug by ignoring terms with a doc freq of 0 when the blended term query picks the minimum term frequency among the
requested fields.
Closes #41118
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.
Relates #40706.
Today we fail to join a Zen2 cluster if the cluster UUID does not match our
own, but we do not perform the same validation when joining a Zen1 cluster.
This means that a Zen2 node will pass join validation and be added to a Zen1
cluster but will reject all cluster states from the master.
Relates #37775
Currently we throw an error when a range query's minimum value exceeds the
maximum value due to the fact that they are neighbouring values and both the
upper and the lower value are excluded from the interval.
Since this is a condition that the user usually doesn't specify consciously (at
least in the case of float and double values it's difficult to see which values
are adjacent) we should ignore those "wrong" intervals and create a
MatchNoDocsQuery in those cases.
We should still throw errors with an actionable message if the user specifies
the query interval in a way that min value > max value. This PR adds those
checks and tests for those cases.
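A schematic sketch of the two cases for a double-valued range (helper shape assumed):
```
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;

class DoubleRangeChecks {
    static Query checkBounds(double lower, double upper, boolean includeLower, boolean includeUpper) {
        if (lower > upper) {
            // The user really asked for min > max: fail with an actionable message.
            throw new IllegalArgumentException("range query: min value must not exceed max value");
        }
        if (includeLower == false && includeUpper == false && Math.nextUp(lower) == upper) {
            // Adjacent values with both endpoints excluded leave an empty interval:
            // match nothing instead of throwing a confusing error.
            return new MatchNoDocsQuery("empty range: bounds are adjacent and both excluded");
        }
        return null; // fall through to normal range query construction
    }
}
```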
Closes #40937
This is related to #36652. We intend to deprecate a number of transport
settings in 7.x and remove them in 8.0. This commit removes the string
usages of these settings.
Pipelines require a single-valued agg or a numeric to be returned.
If they don't get that, they throw an exception. Unfortunately, this
exception text is very confusing to users because it usually arises
from pathing "through" multiple terms aggs. The final target is a numeric,
but it's the intermediary aggs that cause the problem.
This commit adds the current agg name to the exception message
so the user knows which "level" is the issue.
Full text queries ignore unmapped fields since https://github.com/elastic/elasticsearch/issues/41022,
even if all fields in the query are unmapped.
This change makes sure that we ignore unmapped fields only if they are mixed
with mapped fields, and returns a MatchNoDocsQuery otherwise.
Closes #41022
This change adds either ToXContentObject or ToXContentFragment to classes
directly implementing ToXContent. This helps in reasoning about
whether those implementations output a full xcontent object or just a fragment.
Relates to #16347
When a polygon contains a self-intersection due to having the same point twice in non-consecutive positions, the polygon builder tries to split the polygon. During the split one of the polygons becomes invalid as it is not closed, and an error is thrown which is not related to the real issue.
We now detect this situation and throw a more meaningful error.
Today a new replica of a closed index does not have a safe commit
invariant when its engine is opened because we won't initialize the
global checkpoint on a recovering replica until the finalize step. With
this change, we can achieve that property by creating a new translog
with the global checkpoint from the primary at the end of phase 1.
This helps avoid memory issues when computing deep sub-aggregations. Because it
should be rare to use sub-aggregations with significant terms, we opted to always
choose breadth first as opposed to exposing a `collect_mode` option.
Closes #28652.
Since #40249, we always reinitialize max_seq_no_of_updates to max_seq_no
when a promoting primary restores history, regardless of whether it
rolled back previously or not.
Closes #40929
This commit removes the settings member variable from Node.
This member made it confusing which settings should actually be looked
at. Now all settings are accessed through the final environment.
A small refactoring that removes the primaryTerm field from ReplicasProxy and
instead passes it directly in to the methods that need it. Relates #40706.
This is a dependency of #39504
Motivation:
By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update.
With this change, the mapping update will trigger a new task in the `write` queue instead.
This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines.
The logical change is effectively all in `TransportShardBulkAction`, the rest of the changes is then simply mechanically moving the caller code and tests to being async and passing the `ActionListener` down.
Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and dry up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
It could be that we try to shut down the executor pool before all the
listeners have been invoked. It can happen that one was not invoked if
it timed out and was in the process of being notified that it timed out
on the executor. If we do the shutdown then, a listener will be met
with a rejected execution exception. To address this, we first wait until
all listeners have been notified (or timed out) before proceeding with
shutting down the executor.
Relates #40970
Added documentation for node repurpose tool and included documentation on how to repurpose nodes safely. Adjusted order of tools in `elasticsearch-node` tool since the repurpose tool is most likely to be used.
Co-Authored-By: David Turner <david.turner@elastic.co>
Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled,
but we still execute `balanceByWeights()` and perform some rather expensive
calculations before discovering that we cannot rebalance any shards. In a large
cluster this can make cluster state updates occur rather slowly. With this
change we check earlier whether rebalancing is globally disabled and, if so,
avoid the rebalancing process entirely.
Relates #40942 which was reverted because of egregiously faulty tests.
The number of user data attributes of an index commit has increased
from 6 to 8, but we forgot to adjust the initial size of the map that
holds them. This change increases the initial size of that map to avoid
resizing.
Reducing some methods' scope and marking them as static where possible. Removing
"alias" support from AnalysisRegistry#produceAnalyzer and changing that method to
return a NamedAnalyzer instead of having a side effect on the analyzer map passed in.
Also, CustomAnalyzerProvider doesn't seem to need the `environment` field.
The phrase suggesters have an option to remove terms that have
a frequency lower than a provided min_doc_freq. However this value is
overwritten by the frequency of the original term in the popular mode.
This change ensures that we keep the maximum value between the provided
min_doc_freq and the original term frequency as the threshold to select
candidates.
Fixes #16764
Today we are strict when parsing build flavor and types off the
wire. This means that if a later version introduces a new build flavor
or type, an older version would not be able to parse what that new
version is sending. For a practical example of this, we recently added
the build type "docker", and this means that in a rolling upgrade
scenario older nodes would not be able to understand the build type that
the newer node is sending. This breaks clusters and is bad. We do not
normally think of adding a new enumeration value as being a
serialization breaking change, it is just not a lesson that we have
learned before. We should be lenient here though, so that we can add
future changes without running the risk of breaking ourselves
horribly. It is either that, or we have super-strict testing
infrastructure here yet still I fear the possibility of mistakes. This
commit changes the parsing of build flavor and build type so that we are
still strict at startup, yet we are lenient with values coming across
the wire. This will help avoid us breaking rolling upgrades, or clients
that are on an older version.
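Schematically ("default" and "oss" are the known build flavors; the UNKNOWN fallback and the method shape are assumptions):
```
enum Flavor {
    DEFAULT("default"),
    OSS("oss"),
    UNKNOWN("unknown");

    final String displayName;

    Flavor(String displayName) {
        this.displayName = displayName;
    }

    // Strict at startup (we must know our own flavor), lenient on the wire
    // so that newer values from other nodes don't break deserialization.
    static Flavor fromDisplayName(String displayName, boolean strict) {
        for (Flavor flavor : values()) {
            if (flavor.displayName.equals(displayName)) {
                return flavor;
            }
        }
        if (strict) {
            throw new IllegalStateException("unexpected flavor [" + displayName + "]");
        }
        return UNKNOWN;
    }
}
```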
If the transport service is stopped, likely because we are shutting
down, and a retention lease background sync fires, the logs will display
a warning message and stacktrace. Yet, this situation is harmless and can
happen as a normal course of business when shutting down. This commit
suppresses the log messages in this case.
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see #40706).
This commit removes these customisation points and injects the test-only
behaviour using other techniques.
Many gradle projects specifically use the -try exclude flag, because
there are many cases where an auto-closeable resource is deliberately
never referenced in the body of the corresponding try statement.
Suppressing this warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from every gradle project and adds it to the
build plugin. It also removes exclude flags from gradle projects
that are already specified in the build plugin (for example -deprecation).
Relates to #40366
Primary-replica resync in a mixed cluster between 6.x and 5.6 can send
operations without a sequence number to a replica which has already processed
operations with sequence numbers. This leads to the failure of that
replica because we trip the sequence number assertion when writing resync
operations without a sequence number to the translog.
This change rejects an illegal combination of flush parameters where
force is true, but wait_if_ongoing is false. This combination is trappy
and should be forbidden.
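Schematically, the validation is simple (the message wording is an assumption):
```
class FlushRequestValidation {
    // Reject the trappy combination up front: a forced flush that cannot wait
    // would silently skip flushing if another flush is already running.
    static void validate(boolean force, boolean waitIfOngoing) {
        if (force && waitIfOngoing == false) {
            throw new IllegalArgumentException("wait_if_ongoing must be true for a force flush");
        }
    }
}
```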
Closes #36342
If there's an ongoing flush triggered by the translog flush threshold,
we may fail to execute a flush because waitIfOngoing is false by
default.
Relates to #36342
If a refresh, which is scheduled by the setting change, executes after
the index-2 operation and wins the refresh race (i.e., maybeRefresh) against
the scheduledRefresh that we are going to check, then the latter will
return false.
Closes #39565
Relates #39462
PR #40387
A user reported that the same query that takes ~900ms when querying an index
pattern only takes ~50ms when only querying indices that have matches. The
query is a date range query and we confirmed that the `can_match` phase works
as expected. I was able to reproduce this issue locally with a single node: with
900 1-shard indices, a query to an index pattern that matches all indices runs
in ~90ms while a query to the only index that has matches runs in 0-1ms.
This ended up not being related to the `can_match` phase but to the cost of
resolving aliases when querying an index pattern that matches lots of indices.
In that case, we first resolve the index pattern to a list of concrete indices
and then for each concrete index, we check whether it was matched through an
alias, meaning we might have to apply alias filters. Unfortunately this second
per-index operation runs in linear time with the number of matched concrete
indices, which means that alias resolution runs in O(num_indices^2) overall,
so queries get quadratically slower as an index pattern matches more indices.
I reorganized alias resolution into a one-step operation that runs in linear
time with the number of matched indices, and then a per-index operation that
runs in linear time with the number of aliases of that index. This makes alias
resolution run in O(num_indices * num_aliases_per_index) overall instead. When
testing the scenario described above, the `took` went down from ~90ms to ~10ms.
It is still more than the 0-1ms latency that one gets when only querying the
single index that has data, but still much better than what we had before.
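A schematic sketch of the linear-time reorganisation (types simplified; not the actual metadata code):
```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AliasResolutionSketch {
    // One pass over the alias metadata buckets the aliases by concrete index;
    // the later per-index step then only looks at that index's own aliases.
    static Map<String, List<String>> aliasesByIndex(Map<String, List<String>> indicesByAlias) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : indicesByAlias.entrySet()) {
            for (String index : entry.getValue()) {
                result.computeIfAbsent(index, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return result; // O(num_indices * num_aliases_per_index) total work
    }
}
```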
Closes #40248
If a field `field_name` was missing in a document,
doc['field_name'].get(0) incorrectly retrieved
a value of the previously accessed document.
This happened because the `get(int index)` function
was just accessing `values[index]` without
checking the number of values, `count`.
This PR fixes this.
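The fix boils down to a bounds check before the array access, roughly (field names assumed):
```
class LongScriptDocValuesSketch {
    private long[] values = new long[0];
    private int count; // number of values for the current document

    // Without this check, get(0) on a document with no values read a stale
    // entry left in `values` by the previously accessed document.
    public long get(int index) {
        if (index >= count) {
            throw new IndexOutOfBoundsException(
                "attempted to fetch value [" + index + "] but the document only has [" + count + "] values");
        }
        return values[index];
    }
}
```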
There were some test failures caused by the background retention lease sync running on a relocated
primary. This commit fixes the situation that triggered the assertion and reactivates the failing test.
Closes #40731
The Eclipse compiler (4.10, Photon) cannot build this test because it cannot
correctly infer the type arguments of the functions. Explicitly adding them
helps in this case.
This change adds the following internal refactorings:
* wraps input analyzers into an unmodifiable map in IndexAnalyzers ctor
* removes duplicated indexSetting in IndexAnalyzers
* removes references to IndexAnalyzers from DocumentMapperParser and TypeParser.ParserContext.
It can always be retrieved from MapperService directly in those cases
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.
Relates #40706
This commit deprecates versions of Java prior to Java 11. This commit
will cause a warning to be printed to standard error when any command
line tool is invoked, or when Elasticsearch is started. Additionally, we
log a deprecation message when Elasticsearch is started.
`TransportReplicationAction` is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.
This commit is a step towards separating the `TransportReplicationAction` into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.
Completion and DocStats are pulled from internal readers
instead of external ones since #33835 and #33847, which means we no longer
need to refresh after a stats call; refreshes happen internally
anyhow and will cause updated stats on ongoing indexing.
In 6.7.0 (#39378) we added a build type of DOCKER for the docker images, but
unfortunately earlier versions do not understand this and will reject any
transport messages that mention this build type.
This commit fixes this by reporting TAR instead of DOCKER when talking to older
nodes.
Relates (but does not fix) #40511
Relates #39378
In order to remain compatible with the existing joda based
implementation the parsing of milliseconds should support parsing single
digits instead of relying on three, even with strict formats.
This adds a few tests to duel against the existing joda based
implementation in order to ensure the parsing behaviour is the same.
Closes #40403
Currently, if Manifest write is unsuccessful (i.e. WriteStateException
is thrown) we perform cleanup of newly created metadata files.
However, this is wrong.
Consider the following sequence (caught by CI here
https://github.com/elastic/elasticsearch/issues/39077):
- cluster global data is written **successful**
- the associated manifest write **fails** (during the fsync, i.e. files
have been written)
- deleting (revert) the manifest files, **fails**, metadata is
therefore persisted
- deleting (revert) the cluster global data is **successful**
In this case, when trying to load metadata (after node restart
because of dirty WriteStateException), the following exception will
happen
```
java.io.IOException: failed to find global metadata [generation: 0]
```
because the manifest file is referencing missing global metadata file.
This commit checks whether the thrown WriteStateException is dirty and,
if it is, we don't perform any cleanup, because the new Manifest file
might have been created, but its deletion has failed.
In the future, we might add a more fine-grained check - perform the
cleanup if the WriteStateException is dirty but the Manifest deletion
was successful.
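A minimal self-contained sketch; `WriteStateException#isDirty` mirrors the behaviour described above, while the write and cleanup steps are stand-ins:
```
class WriteStateException extends java.io.IOException {
    private final boolean dirty;

    WriteStateException(boolean dirty, String message) {
        super(message);
        this.dirty = dirty;
    }

    boolean isDirty() {
        return dirty;
    }
}

class MetaStateWriterSketch {
    void writeManifestAndRollbackOnFailure() throws WriteStateException {
        try {
            writeManifest();
        } catch (WriteStateException e) {
            // A dirty failure may have produced a manifest that already
            // references the new metadata files, so keep them in that case.
            if (e.isDirty() == false) {
                cleanupNewMetadataFiles();
            }
            throw e;
        }
    }

    void writeManifest() throws WriteStateException { /* write and fsync the manifest */ }

    void cleanupNewMetadataFiles() { /* delete the newly created metadata files */ }
}
```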
Closes https://github.com/elastic/elasticsearch/issues/39077
(cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)
On mapping updates the `text` field mapper does not update
the field types for the underlying prefix and phrase fields.
In practice this shouldn't be considered a bug, but we have
an assert in the code that checks that the field types in the mapper service
are identical to the ones present in the field mappers.
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.
Related to #17617
As part of #40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
IOExceptions are never thrown in any of the existing pipeline aggregation
builders. Removing the throws IOException from the create method allows
removing it from a couple of other methods as well, which ends up simplifying
AggregationPhase (one less catch).
The cat recovery API is incredibly useful, yet it is missing the start
and stop times from its output. This commit adds these as
options to the cat recovery API. We elect to make them not visible by
default to avoid breaking output that users might rely on.
To give the script_score query the same features as the function_score
query, we need to add a randomScore function. This function produces
different random scores on different index shards. It is also able to
produce random scores based on the internal Lucene document IDs.
Today if you try and insert a very large number like `1e9999999` into a long
field we first construct this number as a `BigDecimal`, convert this to a
`BigInteger` and then reject it because it is out of range. Unfortunately
making such a large `BigInteger` is rather expensive.
We can avoid this expense by performing a (weaker) range check on the
`BigDecimal` representation of incoming `long`s too.
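A hedged sketch of the cheap pre-check (constants and message are assumptions):
```
import java.math.BigDecimal;

class LongRangeCheck {
    private static final BigDecimal LONG_MAX = BigDecimal.valueOf(Long.MAX_VALUE);
    private static final BigDecimal LONG_MIN = BigDecimal.valueOf(Long.MIN_VALUE);

    // Comparing BigDecimals is cheap even for 1e9999999, because it can be
    // decided on the exponent alone; only in-range values pay for an exact
    // conversion to a long.
    static long toLongExact(BigDecimal value) {
        if (value.compareTo(LONG_MAX) > 0 || value.compareTo(LONG_MIN) < 0) {
            throw new IllegalArgumentException("value [" + value + "] is out of range for a long");
        }
        return value.longValueExact();
    }
}
```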
Relates #26137
Closes #40323