When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.
This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflect the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.
Closes#40900
The SearchRequest#crossClusterSearch method is currently used only as
part of cross cluster search request, when minimizing roundtrips.
It will soon be used also when splitting a search into two: one for
throttled and one for non throttled indices. It will probably be used
for other usecases as well in the future, hence it makes sense to generalize its name to subSearchRequest.
Now emphasises the test is for indexed values.
Previous documentation only mentioned the state of the input JSON doc (null values) but this is only one of several reasons why an indexed value may not exist.
Closes#24256
In these tests, we sleep for a small multiple of the renew interval,
then check that the retention leases are not changed. If a renewal
request takes longer than that interval because of GC or slow CI, then
the retention leases are not the same as before sleep. With this change,
we relax to assert that we eventually stop the renewable process.
Closes#39509
* stop data frame task after it finishes
* test auto stop
* adapt tests
* persist the state correctly and move stop into listener
* Calling `onStop` even if persistence fails, changing `stop` to rely on doSaveState
Though not in use in elasticsearch currently, it seems surprising that
ThreadPool.scheduler().scheduleAtFixedRate would hang. A recurring
scheduled task is never completed (except on failure) and we test for
exceptions using RunnableFuture.get(), which hangs for periodic tasks.
Fixed by checking that task is done before calling .get().
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions describe anything meaningful). This commit removes the
description field from feature sets.
Unresponsive network simulation would throw away requests. However, then
we no longer have any guarantees that a transport action either succeeds
or fails, which could lead to hangs (example: unclosed IndexShard
permits).
Closes#42244
The tests for the ML TimeoutChecker rely on threads
not being interrupted after the TimeoutChecker is
closed. This change ensures this by making the
close() and setTimeoutExceeded() methods synchronized
so that the code inside them cannot execute
simultaneously.
Fixes#43097
When starting BWC nodes, it could be that runtime Java home is set. Yet,
runtime Java home can advance beyond what a BWC node might be compatible
with. For example, if runtime Java home is set to JDK 13 and we are
starting a 7.1.2 node, we do not have any guarantees that 7.1.2 is
compatible with JDK 13 (since we never did any work to make it so). This
will continue to be the case as JDK releases advance, but we still need
to test against BWC nodes. This commit stops applying runtime Java home
when starting a BWC node. Instead, we would use the bundled JDK.
* Documents the new deprecations options on the rest-api-spec
Relates #41439#38613#35262
* remove reference to path now that #41452 is merged, also fixed missing a comma rendering the example json invalid
* removed one more instance of path
* make sure json examples are self contained and not excerpts
(cherry picked from commit 4430f99750a3bf98373d69d2be59d71475c7aaad)
Today the master eagerly reroutes the cluster as part of processing node joins.
However, it is not necessary to do this reroute straight away, and it is
sometimes preferable to defer it until later. For instance, when the master
wins its election it processes joins and performs a reroute, but it would be
better to defer the reroute until after the master has become properly
established.
This change defers this reroute into a separate task, and batches multiple such
tasks together.
#40741 introduced a merge policy that can drop the postings for the `_id`
field on soft deleted documents. The TermsSliceQuery assumes that every document
has has an entry in the postings for that field so it doesn't check if the terms
index exists or not. This change fixes this bug by checking if the terms index for
the `_id` field is null and ignore the segment entirely if it's the case. This should
be harmless since segments without an `_id` terms index should only contain soft deleted
documents.
Closes#42996
Rest docs page update
- have the section be on separate pages
- add an Overview page
- add other formats examples
(cherry picked from commit 309bd691ff3f8625f67ca09fc1dd8e265f7e6c92)
It turns out that key rotation on the OP, can manifest as both
a BadJWSException and a BadJOSEException in nimbus-jose-jwt. As
such we cannot depend on matching only BadJWSExceptions to
determine if we should poll the remote JWKs for an update.
This has the side-effect that a remote JWKs source will be polled
exactly one additional time too for errors that have to do with
configuration, or for errors that might be caused by not synched
clocks, forged JWTs, etc. ( These will throw a BadJWTException
which extends BadJOSEException also )
If linearizability checking fails with OOM (or other exception), we did
not get the serialized history written into the log, making it difficult
to debug in cases where the problem is hard to reproduce. Fixed to
always attempt dumping the serialized history.
Related to #42244
* [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds
* only supporting doc_values for geo_point fields
* moving validation into GeoPointField ctor
The retention leases stats is null if the processing shard copy is being
closed. In this the case, we should check against null then retry to
avoid failing a test.
Closes#41237
WriteActionsTests#testBulk and WriteActionsTests#testIndex sometimes
fail with a pending retention lock. We might leak retention locks when
switching to async recovery. However, it's more likely that ongoing
recoveries prevent the retention lock from releasing.
This change increases the waiting time when we check for no pending
retention lock and also ensures no ongoing recovery in
WriteActionsTests.
Closes#41054
When using gradle run by itself, this uses the default distro with a
basic license and enables security. There is a setup command to create
a elastic-admin user but only when the license is a trial license. Now
that security is available with the basic license, we should always run
this command when using the default distribution.
If the source field name is a prefix of the target field name, the
source field still exists after rename processor has run. Adjusted test
case to handle that case.
Get resources action sorts on the resource id. When there are no resources at
all, then it is possible the index does not contain a mapping for the resource
id field. In that case, the search api fails by default.
This commit adjusts the search request to ignore unmapped fields.
Closeselastic/kibana#37870
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.
This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
We initially added `requireDocker` for a way for tasks to say that they
absolutely must have it, like the build docker image tasks.
Projects using the test fixtures plugin are not in this both, as the
intent with these is that they will be skipped if docker and docker-compose
is not available.
Before this change we were lenient, the docker image build would succeed
but produce nothing. The implementation was also confusing as it was not
immediately obvious this was the case due to all the indirection in the
code.
The reason we have this leniency is that when we added the docker image
build, docker was a fairly new requirement for us, and we didn't have
it deployed in CI widely enough nor had CI configured to prefer workers
with docker when possible. We are in a much better position now.
The other reason was other stack teams running `./gradlew assemble`
in their respective CI and the possibility of breaking them if docker is
not installed. We have been advocating for building specific distros for
some time now and I will also send out an additional notice
The PR also removes the use of `requireDocker` from tests that actually
use test fixtures and are ok without it, and fixes a bug in test
fixtures that would cause incorrect configuration and allow some tasks
to run when docker was not available and they shouldn't have.
Closes #42680 and #42829 see also #42719
Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string`
does not take the length of the term into account when computing the distance and always use
a max distance of 1. This change fixes this disrepancy by ensuring that the term is passed when
the fuzziness is computed.
We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.
This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:
```
POST /_cluster/reroute?retry_failed=true
{
"commands": [
{
"allocate_replica": {
"index": "blahblah",
"shard": 2,
"node": "node-4"
}
}
]
}
```
Fixes#39546
* Take into consideration a wider range of Numbers when extracting the
values from source, more specifically - BigInteger and BigDecimal.
(cherry picked from commit 561b8d73dd7b03c50242e4e3f0128b2142959176)