After a shard fails on a node we assign a new replica on another node. This is important in order to avoid failing again due to node specific problems. In the rare case where two different replicas of the same shard failed in a short time span, we may fail to do so and assign one of them back to the node it's currently on. This happens if both shard failed events are processed within the same batch on the master.
Closes#5725
In TransportSearchTypeAction we need to increment successful ops
first before we increment and compare the exit condition otherwise if we
are fast we could concurrently update totalOps but then preempt one
of the threads which can cause the successor to read a wrong value from
successfulOps if second phase is very fast ie. searchType == count etc.
This can cause wrong success stats in the search response.
ElasticsearchRestTests extends now ElasticsearchIntegrationTest and makes use of our ordinary test infrastructure, in particular all randomized aspects now come for free instead of having to maintain a separate (custom) tests runner
We previously parsed only the tests that needed to be run given the version of the cluster the tests are running against. This doesn't happen anymore as it didn't buy much and it would be harder to support as the tests get now parsed before the test cluster gets started. Thus all the tests are now parsed regardless of their skip sections, afterwards the ones that don't need to be run will be skipped through assume directives.
Fixed REST tests that rely on a specific number of shards as this change introduces also random number of shards and replicas (through randomIndexTemplate)
Closes#5654
Refactor the rest layer handlers to simplify common code paths (like handling) failures, and introduce optional (enabled for netty) rest channel bytes recycling
- consolidated value source parsing under a single parser that is reused in all the value source aggs parsers
- consolidated include/exclude parsing under a single parser
- cleaned up value format handling, to have consistent behaviour across all values source aggs
since we use a shared cluster, calling clear on the mock array / page recycler can cause removing a valid on going reference, and then when its released, the release will fail because it can't be found.
There is no real reason to call clear, checking if pages/arrays have been released takes the snapshot behavior here into account.
This change also makes sure we don't use the mock classes in places where we don't really release.
Note, with this change DoubleTermsTests fails, since it causes failures when creating aggs in the pre process phase, causing obtained arrays not to be released. This needs to be fixed before pulling this change in.
create a new releasable bytes output, that can be recycled, and use it in netty and the translog, 2 areas where the recycling will help nicely.
Note, opted for statically typed enforced releasble bytes output, to make sure people take the extra care to control when the bytes reference are released.
Also, the mock page/array classes were fixed to not take into account potential recycling going during teardown, for example, on a shared cluster ping requests still happen, so recycling happen actively during teardown.
closes#5691
Whether or not the stacktrace is displayed is controlled by bootstrap
log level setting, so that bootstrap: DEBUG displays the stack trace on
output, like it does on log
Closes#5102