This commit changes the way how files are selected for retransmission
on recovery / restore. Today this happens on a per-file basis where the
rather weak checksum and the file length in bytes is compared to check if
a file is identical. This is prone to fail in the case of a checksum collision
which can happen under certain circumstances.
The changes in this commit move the identity comparsion to a per-commit / per-segment
level where files are only treated as identical iff all the other files in the
commit / segment are the same. This "all or nothing" strategy is reducing the chance for
a collision dramatically since we also use a strong hash to identify commits / segments
based on the content of the ".si" / "segments.N" file.
Closes#7351
resetting the clients on each test (in after test) makes the tests running, especially in network mode, much slower, since transport client needs to be created each time when randmized to be used. Also, on OSX, the excessive connections causes bind exceptions eventually which makes running the network tests much harder on it.
closes#7329
Today we always use the latest commit point to return the metadata from
the store. This might cause problems for snapshot and restore since in
contrast to recovery it won't prevent concurrent flushes (lucene commits).
This can lead to all kinds of interesting effects if we are snapshotting
while flushing. This change uses the IndexCommit to open the metadata snapshot
from the store which is consistent with what we snapshot.
Closes#7376
Changes the serialisation of pre and post offset to use Long instead of VLong so that negative values are supported. This actually only showed up in the case where minDocCount=0 as the rounding is only serialised in this case.
Closes#7312
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.
Closes#7014
Today we have very confusing naming since some methods names claim to
read binary but in fact read utf-16 and convert to utf-8 under the hood.
This commit clarifies the interfaces and adds additional documentation.
Closes#7367
These options are not used anymore. Instead numeric values can now contain dups
and it is the responsibility of the aggregation to deal with it (eg. terms).
And otherwise all values sources are now sorted, which is the contract of the
interfaces that they implement.
Close#7276
A request that relates to indices (`IndicesRequest` or `CompositeIndicesRequest`) might be converted to some other internal request(s) (e.g. shard level request) that get distributed over the cluster. Those requests contain the concrete index they refer to, but it is not known which indices (or aliases or expressions) the original request related to.
This commit makes sure that the original indices are available as part of the shard level requests and makes them implement `IndicesRequest` as well.
Also every internal request should be created passing in the original request, so that the original headers, together with the eventual original indices and options, get copied to it. Corrected some places where this information was lost.
NOTE: As for the bulk api and other multi items api (e.g. multi_get), their shard level requests won't keep around the whole set of original indices, but only the ones that related to the bulk items sent to each shard, the important bit is that we keep the original names though, not only the concrete ones.
Closes#7319
This change fixes the creation circle shapes o it calculates it correctly instead of essentially using the diameter as the radius. The radius has to be converted into degrees but calculating the ratio of the desired radius to the circumference of the earth and then multiplying it by 360 (number of degrees around the earths circumference). This issue here was that it was only multiplied by 180 making the result out by a factor of 2. Also made the test for circles actually check to make sure it has the correct centre and radius.
Closes#7301
Recent test failures triggered by #7289 were caused by this, simply because internal node settings (transport type key) that are not supported by the external older nodes were copied to them by mistake.
* Removed & refactored unused module code
* Allowed to set transports programmatically
* Allow to set the source of the changed transport
Note: The current implementation breaks BWC as you need to specify a concrete
transport now instead of a module if you want to use a different
Transport or HttpServerTransport
Closes#7289
TransportShardSingleOperationAction is currently subclassed by different transport actions. Some of them are internal only, meaning that their execution will take place only in the same node where their parent execution took place. That means that their main transport handler doesn't need to be registered, the only transport handler that's needed is the shard level one.
Added `isSubAction` method (defaults to false) to the parent class that tells whether the action is a main one or a subaction, used to decide whether we need to register the main transport handler.
Closes#7285
These two aggregators basically do exactly the same thing, they just interpret
bytes differently. This refactoring found an (unreleased) bug in the long terms
aggregator which didn't work correctly with duplicate values.
Close#7279
This was causing too much work e.g. when pulling node stats or when
opening a new reader, because the least_used distributor would
unnecessarily check free disk space on all path.data entires every
time we try to open a file for reading or check its length.
Closes#7306Closes#7323