This commit removes custom comparators in favor of the ones that are in Lucene.
The major change is for nested documents: instead of having a comparator wrapper
that deals with nested documents, this is done at the fielddata level by having
a selector that returns the value to use for comparison.
Sorting with custom missing string values might be slower since it is using
TermValComparator since Lucene's TermOrdValComparator only supports sorting
missing values first or last. But other than this particular case, this change
will allow us to benefit from improvements on comparators from the Lucene side.
Close#5980
This change just changes the default for index.codec.bloom.load to
false: with recent performance improvements to ID lookup, such as
#6298, bloom filters don't give much of a performance gain anymore,
and they can consume non-trivial RAM when there are many tiny
documents.
For now, we still index the bloom filters, so if a given app wants
them back, it can just update the index.codec.bloom.load to true.
Closes#6959
GET only returned null even when stored if requested with GET like this:
`curl -XGET "http://localhost:9200/test/test/1?fields=_all"`
Instead, it should simply behave like a String field and return the
concatenated fields as String.
closes#6924
A new option `prune` has been added to allow users to control phrase suggestion pruning when `collate`
is set. If the new option is set, the phrase suggestion option will contain a boolean `collate_match`
indicating whether the respective result had hits in collation.
CLoses#6927
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.
Relates to #6967
In unicast discovery, we try to reuse existing discovery nodes based on the node address they have. If we find an existing node based on its address, and for some reason its not connected, don't add it to the list of nodes to disconnect from, as that (full) connection is useful down the road
closes#6966
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes#6964
This commits removes BytesValues/LongValues/DoubleValues/... and tries to use
Lucene's APIs such as NumericDocValues or RandomAccessOrds instead whenever
possible.
The next step would be to take advantage of the fact that APIs are the same in
Lucene and Elasticsearch in order to remove our custom comparators and use
Lucene's.
There are a few side-effects to this change:
- GeoDistanceComparator has been removed, DoubleValuesComparator is used instead
on top of dynamically computed values (was easier than migrating
GeoDistanceComparator).
- SortedNumericDocValues doesn't guarantee uniqueness so long/double terms
aggregators have been updated to make sure a document cannot fall twice in
the same bucket.
- Sorting by maximum value of a field or running a `max` aggregation is
potentially significantly faster thanks to the random-access API.
Our aggs and p/c aggregations benchmarks don't report differences with this
change on uninverted field data. However the fact that doc values don't need
to be wrapped anymore seems to help a lot. For example
TermsAggregationSearchBenchmark reports ~30% faster terms aggregations on doc
values on string fields with this change, which are now only ~18% slower than
uninverted field data although stored on disk.
Close#6908
The assertion on binary equality for streamable serialization
sometimes fails due to the usage of identify hashmaps inside
the InternalSearchHits serialization. This only happens if
the number of shards the result set is composed of is very high.
This commit makes the serialziation deterministic and removes
the need to serialize the ordinal due to in-order serialization.
This commit adds the ability to run bwc tests during the release
process to ensure the current release is backwards compatible with
the latest installed previous version. Installed means available
in the configured bwc test path.
Closes#6953
The change introduced in #6949 (do not serialize the cluster state) also means master now responds with an empty response rather then a JoinResponse. However, sendJoinRequestBlocking still expected a JoinRequest.
At the moment we serialize the cluster state in JoinResponse and ValidateJoinRequest. However this state is not used anywhere and can be removed to save on network overhead
Closes#6949