The alias -> (index -> alias) map, specifically the index -> alias one, typically just hold one entry, yet we eagerly initialize it to the number of indices. When there are many indices, each with many aliases, this is a very large overhead per alias...
closes#6504
Our field data currently exposes hashes of the bytes values. That takes roughly
4 bytes per unique value, which is definitely not negligible on high-cardinality
fields.
These hashes have been used for 3 different purposes:
- term-based aggregations,
- parent/child queries,
- the percolator _id -> Query cache.
Both aggregations and parent/child queries have been moved to ordinals which
provide a greater speedup and lower memory usage. In the case of the percolator
it is used in conjunction with HashedBytesRef to not recompute the hash value
when getting resolving a query given its ID. However, removing this has no
impact on PercolatorStressBenchmark.
Close#6500
This was how terms aggregations managed to not be too slow initially by caching
reads into the terms dictionary using ordinals. However, this doesn't behave
nicely on high-cardinality fields since the reads into the terms dict are
random and this execution mode loads all unique terms into memory.
The `global_ordinals` execution mode (default since 1.2) is expected to be
better in all cases.
Close#6499
Under some rare circumstances:
- local transport,
- the range aggregation has both a parent and a child aggregation,
- the range aggregation got no documents on one shard or more and several
documents on one shard or more.
the range aggregation could return incorrect counts and sub aggregations.
The root cause is that since the reduce happens in-place and since the range
aggregation uses the same instance for all sub-aggregation in case of an
empty bucket, sometimes non-empty buckets would have been reduced into this
shared instance.
In order to avoid similar bugs in the future, aggregations have been updated
to return a new instance when reducing instead of doing it in-place.
Close#6435
Moved BulkProcessor tests from BulkTests to newly added BulkProcessorTests class.
Strenghtened BulkProcessorTests by adding randomizations to existing tests and new tests for concurrent requests and expcetions.
Also made sure that afterBulk is called only once per request if concurrentRequests==0.
Closes#5038
When a node sends a join request to the master, only send back the response after it has been added to the master cluster state and published.
This will fix the rare cases where today, a join request can return, and the master, since its under load, have not yet added the node to its cluster state, and the node that joined will start a fault detect against the master, failing since its not part of the cluster state.
Since now the join request is longer, also increase the join request timeout default.
closes#6480
Previously, one MLT query per field was created for each item. One issue with
this method is that the maximum number of selected terms was equal to the
number of items times 'max_query_terms'. Instead, users should have direct control
over the maximum number of selected terms allowed, regardless of the number of
queried items.
Another issue related to the previous method is that it could lead to the
selection of rather uninteresting terms, that because they were found in a
particular queried item. Instead, this new procedure enforces the selection of
interesting terms across ALL items, not within each item. This could lead to
search results where the best matching items share commonalities amongst the
best characteristics of all the items.
Closes#6404
This commit adds checks for nocommit and tabs in the source code.
The task is executed during the validate phase and can be disabled via
`-Dvalidate.skip`
* `english` returned the slow snowball English stemmer
* `porter2` returned the snowball Porter stemmer (v1)
* `portuguese` was used twice, preventing the second version from working
Changes:
* `english` now returns the fast PorterStemmer (for indices created from v1.3.0 onwards)
* `porter2` now returns the snowball English stemmer (for indices created from v1.3.0 onwards)
* `light_english` now returns the `kstem` stemmer (`kstem` still works)
* `portuguese_rslp` returns the PortugueseStemmer
* `dutch_kp` is a synonym for `kp`
Tests and docs updated
Fixes#6345Fixes#6213Fixes#6330
Bugs:
* "groups" and "types" were being ignored
* "completion_fields" as wildcards were not being resolved to fieldnames
Enhancements:
* Made "groups" and "types" support wildcards
* Added missing tests
Closes#6390
In order to return more information to the client, in case a TransportClient
can not connect to the cluster, this commit adds logging and also returns the
configured nodes in the NoNodeAvailableException
Also a minor bug has been fixed, which propagated exceptions wrong, so that an
invalid request was actually tried on every node, if a regular connection failure
on the first node had happened.
Closes#6376