Shard requests like broadcast delete and delete by query, that needs to be executed on primary and all replicas, get read and written out to the transport on the same node. That means that if we add some field version checks are not enough to maintain bw comp since a newer node that holds the primary might receive the request from an older node, that didn't provide the field. Yet, when writing the request out again to a newer node that holds the replica, we do try and serialize the field although it's missing. The newer fields just needs to be set to optional in these cases, in addition to the version checks.
Re-enabled testDeleteByQuery and testDeleteRoutingRequired bw comp tests since this was the cause of their failures.
Closes#7406
The verifyThreadNames starts a node and checks that all new threads on the JVM are properly named. The current test uses the name of the new node which sometimes fails because our shared cluster spawns a new thread which is properly named but for not for the new name.
The commits relaxes the requirement of the test and on verify the threads are properly named (but not necessarily of the new node)
Items with no specified field now defaults to all the possible fields from the
document source. Previously, we had required 'fields' to be specified either
as a top level parameter or for each item. The default behavior is now similar
to the MLT API.
Closes#7382
This commit changes the way how files are selected for retransmission
on recovery / restore. Today this happens on a per-file basis where the
rather weak checksum and the file length in bytes is compared to check if
a file is identical. This is prone to fail in the case of a checksum collision
which can happen under certain circumstances.
The changes in this commit move the identity comparsion to a per-commit / per-segment
level where files are only treated as identical iff all the other files in the
commit / segment are the same. This "all or nothing" strategy is reducing the chance for
a collision dramatically since we also use a strong hash to identify commits / segments
based on the content of the ".si" / "segments.N" file.
Closes#7351
resetting the clients on each test (in after test) makes the tests running, especially in network mode, much slower, since transport client needs to be created each time when randmized to be used. Also, on OSX, the excessive connections causes bind exceptions eventually which makes running the network tests much harder on it.
closes#7329
Today we always use the latest commit point to return the metadata from
the store. This might cause problems for snapshot and restore since in
contrast to recovery it won't prevent concurrent flushes (lucene commits).
This can lead to all kinds of interesting effects if we are snapshotting
while flushing. This change uses the IndexCommit to open the metadata snapshot
from the store which is consistent with what we snapshot.
Closes#7376
Changes the serialisation of pre and post offset to use Long instead of VLong so that negative values are supported. This actually only showed up in the case where minDocCount=0 as the rounding is only serialised in this case.
Closes#7312
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.
Closes#7014
Today we have very confusing naming since some methods names claim to
read binary but in fact read utf-16 and convert to utf-8 under the hood.
This commit clarifies the interfaces and adds additional documentation.
Closes#7367
These options are not used anymore. Instead numeric values can now contain dups
and it is the responsibility of the aggregation to deal with it (eg. terms).
And otherwise all values sources are now sorted, which is the contract of the
interfaces that they implement.
Close#7276
A request that relates to indices (`IndicesRequest` or `CompositeIndicesRequest`) might be converted to some other internal request(s) (e.g. shard level request) that get distributed over the cluster. Those requests contain the concrete index they refer to, but it is not known which indices (or aliases or expressions) the original request related to.
This commit makes sure that the original indices are available as part of the shard level requests and makes them implement `IndicesRequest` as well.
Also every internal request should be created passing in the original request, so that the original headers, together with the eventual original indices and options, get copied to it. Corrected some places where this information was lost.
NOTE: As for the bulk api and other multi items api (e.g. multi_get), their shard level requests won't keep around the whole set of original indices, but only the ones that related to the bulk items sent to each shard, the important bit is that we keep the original names though, not only the concrete ones.
Closes#7319