At the moment the IndexingMemoryController can try to update the index buffer memory of shards at any given moment. This update involves a flush, which may cause a FlushNotAllowedEngineException to be thrown in a concurrently finalizing recovery.
Closes #6642, closes #6667
Due to a bogus if-check in SearchSourceBuilder.fetchSource(String include, String exclude)
the excludes only got set when the includes were not null. Fixed this and added some
basic tests.
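A minimal sketch of the kind of check described above, contrasted with a fixed variant. The class and method names here are hypothetical stand-ins, and the buggy branch is only one plausible shape of such a check, not the actual SearchSourceBuilder code.

```java
// Hypothetical stand-in for the builder; not the actual Elasticsearch class.
final class FetchSourceSketch {
    String[] includes = new String[0];
    String[] excludes = new String[0];

    // Buggy shape (assumed): the exclude is only applied when an include is present.
    FetchSourceSketch fetchSourceBuggy(String include, String exclude) {
        if (include != null) {
            includes = new String[] { include };
            if (exclude != null) {
                excludes = new String[] { exclude };
            }
        }
        return this;
    }

    // Fixed shape: the two arguments are handled independently of each other.
    FetchSourceSketch fetchSourceFixed(String include, String exclude) {
        includes = include == null ? new String[0] : new String[] { include };
        excludes = exclude == null ? new String[0] : new String[] { exclude };
        return this;
    }
}
```

With the buggy variant, a call that passes only an exclude silently drops it; the fixed variant keeps it.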
Closes #6632
Sorting fails on unmapped fields so the new propagation delay of the mappings
exposed this issue. I added explicit mappings as part of index creation to fix it.
when the primary shard is recovering its translog, make sure to wait for new mapping introductions till the mappings have been updated on the master before finalizing the recovery itself
also, this change performs the mapping updates in a more optimized manner by batching the types to change into a single set and sending after the translog has been replayed
also, remove the wait for mapping on master in the local state tests since this new behavior covers it
closes #6666
remove waiting for mapping on master since we do it in recovery
Some IO APIs can return after writing or reading only part of the requested data. On these rare occasions, we should call the methods again to read/write the rest of the data. This has caused rare translog corruption while writing huge documents on Windows.
Notable parts of the commit:
- A new Channels class with utility methods for reading and writing to channels
- Reading from and writing to channels directly is added to the forbidden API list
- Added locking to SimpleFsTranslogFile
- Removed FileChannelInputStream which was not used
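The retry idea behind the partial read/write fix can be sketched as follows. This is an illustrative helper with hypothetical names, not the Channels class from the commit, and it assumes blocking channels.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Illustrative helper; names are hypothetical and do not mirror the commit's Channels class.
final class ChannelLoops {
    private ChannelLoops() {}

    // A single write() may consume only part of the buffer, so loop until it is drained.
    static void writeFully(WritableByteChannel channel, ByteBuffer source) throws IOException {
        while (source.hasRemaining()) {
            channel.write(source);
        }
    }

    // A single read() may fill only part of the buffer, so loop until it is full.
    // Fails with EOFException if the channel ends before the buffer is filled.
    static void readFully(ReadableByteChannel channel, ByteBuffer dest) throws IOException {
        while (dest.hasRemaining()) {
            if (channel.read(dest) < 0) {
                throw new EOFException("channel ended with " + dest.remaining() + " bytes still expected");
            }
        }
    }
}
```

Routing all channel access through helpers of this kind is what makes it practical to forbid direct read/write calls elsewhere in the code base.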
Closes #6441, #6576
All settings should be passed as settings and the environment should not
influence the test cluster settings. The settings we care about, i.e.
`es.node.mode` and `es.logger.level` should be passed via settings.
This allows tests to override these settings if they for instance need
`network` transport to operate at all.
Closes #6663
Today `index.version.created` depends on the version of the master
node in the cluster. This is potentially causing new features to be
expected on shards that didn't exist when the index was created.
There is no notion of `where was the shard allocated first` such that
`index.version.created` can't be reliably used as a feature flag.
With this change the `index.version.created` can be reliably used to
determine the smallest node version at the point in time when the index
was created. This means we can safely use certain features that would
for instance require reindexing and / or would not work unless the
entire index (all shards and segments) has been created with a certain
version or newer.
Closes #6660
When three threads are trying to write checksums at the same time, it's possible for all three threads to obtain the same checksum file name A. The first thread enters the synchronized section, creates the file with name A and exits. The second thread enters the synchronized section, sees that A exists, creates file A+1 and exits the critical section. It then proceeds to clean up and deletes all other checksum files, including A. If this happens before the third thread enters the synchronized section, the third thread can check for A and, since it no longer exists, create the checksum file A a second time, which triggers the "file _checksums-XXXXXXXXXXXXX was already written to" exception in MockDirectoryWrapper and fails recovery.
They were due to a combination of mappings propagation delays and the behavior
of MapperService.smartName(String) so mappings are now configured up-front.
Makes it possible to delete snapshots that are missing some of the metadata files. This can happen if snapshot creation failed because the repository drive ran out of disk space.
Closes #6383
The delayed mapping intro tests exposed a bug where, if a new mapping is introduced but not yet updated on the master and a full restart occurs, replay of the transaction log will not cause the new mapping to be re-introduced.
closes #6659
add comment on the method
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keep being introduced. We can move this process to another thread that is responsible for refreshing the source and then sending the mapping update to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try to batch as many updates as possible, which is especially handy when multiple shards of the same index exist on the same node. An internal setting that controls the time to wait for batches is also added (defaults to 0).
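A rough sketch of the batching idea, assuming a single worker thread and using hypothetical names throughout (MappingUpdateBatcher, submit, nextBatch, and string keys of the form "index/type" are all illustrative, not the actual mapping update action):

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; not the actual Elasticsearch mapping update action.
final class MappingUpdateBatcher {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final long additionalWaitMillis;

    MappingUpdateBatcher(long additionalWaitMillis) {
        this.additionalWaitMillis = additionalWaitMillis;
    }

    // Called from indexing threads: only enqueues, the expensive refresh happens elsewhere.
    void submit(String indexAndType) {
        pending.add(indexAndType);
    }

    // Called from the worker thread: blocks for the first request, optionally waits
    // a little for more to arrive, then drains everything queued so far.
    // The set collapses duplicate requests for the same index/type into one entry.
    Set<String> nextBatch() throws InterruptedException {
        Set<String> batch = new LinkedHashSet<>();
        batch.add(pending.take());
        if (additionalWaitMillis > 0) {
            Thread.sleep(additionalWaitMillis);
        }
        pending.drainTo(batch);
        return batch;
    }
}
```

With a wait of 0, the worker still batches whatever accumulated while it was busy with the previous batch, which is where several shards of the same index on one node would benefit.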
Testing wise, a new support method, ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll, is added to allow waiting for the concrete manifestation of mappings on all relevant nodes. Some tests mistakenly rely on the absence of pending tasks to mean mappings have been updated, so if we see timing-related failures later (all tests currently pass), those tests will need to be fixed to either awaitBusy on the master for the new mapping or, in the rare case, wait for the concrete mapping on all the nodes using the new method.
closes #6648
allow to change the additional time window dynamically
better sorting on mappers when refreshing source
also, no need to call nodes info in test, we already have the node names
clean calls to mapping update to provide doc mapper and UUID always
also use the internal cluster support method to get the list of nodes an index is on
reverse the order to pick the latest change first
remove unused field
and fix constructor param
move to start/stop on mapping update action
randomize INDICES_MAPPING_ADDITIONAL_MAPPING_CHANGE_TIME
Try and push our system to a state where there is only a single worker, trying to expose potential deadlocks when we by mistake execute blocking operations on the worker thread
closes #6635
only change recovery throttling to slow down recoveries. The recovery file chunk size updates are not picked up by ongoing recoveries. That causes the recovery to take too long even after the default settings are restored.
Also - change document creation to reuse field names in order to speed up the test.
We don't rely upon GC to clean up MappedByteBuffers, we unmap them
explicitly on close in lucene. But the JDK has crazy loops with
explicit GCs in exceptional cases to try to force unmapping.
In general we don't want any of our code or library code calling
this method, so it's banned in forbidden-apis as well.
We clone RateLimitedIndexOutput from lucene just to collect pausing
statistics. We can do this in a more straightforward way with a delegating
RateLimiter.
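The delegating approach might look roughly like the sketch below. PausingLimiter is a hypothetical, deliberately minimal contract introduced for this example; Lucene's RateLimiter has a richer API, and the names here are assumptions rather than the code from the commit.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical minimal limiter contract for this sketch only.
interface PausingLimiter {
    // Returns the number of nanoseconds the caller was paused for.
    long pause(long bytes);
}

// Wraps another limiter and records how long callers were paused in total.
final class StatsCollectingLimiter implements PausingLimiter {
    private final PausingLimiter delegate;
    private final AtomicLong totalPausedNanos = new AtomicLong();

    StatsCollectingLimiter(PausingLimiter delegate) {
        this.delegate = delegate;
    }

    @Override
    public long pause(long bytes) {
        long paused = delegate.pause(bytes);
        if (paused > 0) {
            totalPausedNanos.addAndGet(paused);
        }
        return paused;
    }

    long totalPausedNanos() {
        return totalPausedNanos.get();
    }
}
```

Because the statistics live in the wrapper, the output class that does the actual writing no longer needs to be cloned just to observe pauses.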
Closes #6625
Thread rejection should return the Too Many Requests (429) status code rather than 503, which is also used to signal that the cluster is not available
relates to #6627, but only for rejections for now
closes #6629
We want to make sure recycling cannot fail for any reason while sending back a response that was caused by a failure. For example, if recycling is put behind a circuit breaker at some point, sending an error back will not be affected by it.
closes #6631
the test failed but couldn't repro (yet), at the very least, make sure we have the exception message as the reason, can help to track down the failure itself when it happens again