Although SORTED_SET doc values make things like terms aggregations very fast
thanks to the use of ordinals, ordinals are usually not that useful on numeric
data. We are more interested in the values themselves in order to be able to
compute sums, averages, etc. on these values. However, SORTED_SET is quite slow
at accessing values, so BINARY doc values are better suited at storing numeric
data.
floats and doubles are encoded without compression with little-endian byte order
(so that it may be optimizable through sun.misc.Unsafe in the future given that
most computers nowadays use the little-endian byte order) and byte, short, int,
and long are encoded using vLong encoding: they first encode the minimum value
using zig-zag encoding (so that negative values become positive) and then deltas
between successive values.
Close#3993
This prevents missing very short timeouts which fire before the calling thread had the chance to add the task to the queue and are therefore ignored. This is mostly of importance for testing where we explicitly want tasks to timeout and set it to a very low value.
- also, in geo distance, because it's based on range agg and that by default the order of the geo point per doc is unknown, always wrap it in a dedicated field data source which sorts the values if needed. But most of the times, a doc will be associated with a single point and therefore most of this wrapping is redundant and adds perf. cost for nothing.
- the idea here is for every request that "hits" a field data agg, we'll first iterate over the searchable segments and load their field data and compute the cross-segment info out of them. This info will be placed in the field context with which the value sources are created.
- we currently have some of this info on the IndexFieldData, but problem with getting it from there is that we may easily end up getting wrong info that originate in unsearchable segments.
The `geo_shape filter and query` option in geo_shape filter and query has been replaced with the `path` option, which allows these filter and query to fetch shapes from within objects as well.
Closes#4486
the build_release.py tool now also downloads and verfyfies the
released packages from S3. It checks integrity based on the sha1
checksums and runs the smoketest against the specs in the current branch.
The fixes introduced in #4235 and #4411 do not take into account, that a
template JSON in the config/ directory includes a template name, as opposed
when calling the Put Template API.
This PR allows to put both formats (either specifying a template name or not)
into files. However you template name/id may not be one of the template
element names like "template", "settings", "order" or "mapping".
Closes#4511
When a search on a shard to a remove node fails, and then replica exists on the local node, then the execution of the search is done on the network thread. This is problematic since we need to execute it on the actual search thread pool, but can also explain #4519, where the get happens on the network thread and it waits to send the get request till the network thread we use is freed (deadlock...)
fixes#4526
note, re-enable the geo shape fetch test, this fix should solve it as well
Allow to have a new index level setting index.codec.bloom.load (default to true), that can control if the boom filters will be loaded or not. This is an updateable setting, that can be updated on a live index using the update settings API.
Note though, when this setting is updated, a fresh Lucene index will be reopened, causing associate caches to be dropped potentially.
closes#4525
Note, this change also disables the returning lucene ram usage stats, due to a bug in Lucene, relates to #4512
The shards in the set are mutated after they are added to the
set such that the hashcode doesn't fit anymore. For this reason
this used an identity hashset before but the downside of this is
that the iteration order is not deterministic. We can just use a list
since shard removal is a very rare action and the size of the list is
very small such that iteration is fast.
The memory used for the Lucene index (term dict, bloom filter, ...) can now be reported per segment using the segments API, and on the segments flag on node/indices stats
closes#4512