OpenSearch

History

Yannick Welsch 7f8e1454ab Advance checkpoints only after persisting ops (#43205)	2019-06-20 11:12:38 +02:00

Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This leaves room for the history below the global checkpoint to still change in case of a crash. As we rely on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard copies / follower clusters going out of sync.

This commit required changing some core classes in the system:

- The LocalCheckpointTracker now keeps track not only of whether an operation has been processed, but also of whether that operation has been persisted to disk.
- TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this.
- ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all shard copies when in primary mode. The computed global checkpoint (which represents the minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored in the checkpoint entry for the local shard copy, has been moved to an extra field.
- The periodic global checkpoint sync now also takes async durability into account, where the local checkpoints on shards only advance when the translog is asynchronously fsynced. This means that the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is not sufficient anymore.
- The new index closing API does not work when combined with async durability. The shard verification step now requires an additional pre-flight step to fsync the translog, so that the main verify shard step has the most up-to-date global checkpoint at its disposal.
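The distinction between "processed" and "persisted" checkpoints described in the commit message can be illustrated with a small toy model. This is only a sketch in Python, not the actual Java implementation: the class names echo those in the commit message, but every method and attribute name here (mark_processed, mark_persisted, sync, compute_global_checkpoint, and so on) is a hypothetical chosen for illustration, and -1 is assumed to mean "no operations yet".

```python
class LocalCheckpointTracker:
    """Toy model: tracks the highest contiguous sequence number that has been
    processed and, separately, the highest contiguous one persisted to disk."""

    def __init__(self):
        self._processed = set()          # seq nos processed above the checkpoint
        self._persisted = set()          # seq nos fsynced above the checkpoint
        self.processed_checkpoint = -1   # assumed "no ops" marker
        self.persisted_checkpoint = -1

    def mark_processed(self, seq_no):
        self._processed.add(seq_no)
        self.processed_checkpoint = self._advance(
            self._processed, self.processed_checkpoint)

    def mark_persisted(self, seq_no):
        self._persisted.add(seq_no)
        self.persisted_checkpoint = self._advance(
            self._persisted, self.persisted_checkpoint)

    @staticmethod
    def _advance(pending, checkpoint):
        # Move the checkpoint forward while the next sequence number is present,
        # so gaps (out-of-order operations) hold the checkpoint back.
        while checkpoint + 1 in pending:
            pending.discard(checkpoint + 1)
            checkpoint += 1
        return checkpoint


class TranslogWriter:
    """Toy model: remembers sequence numbers that are written but not yet
    fsynced, and reports them to a callback once sync() is called."""

    def __init__(self, on_persisted):
        self._unsynced = []
        self._on_persisted = on_persisted

    def add(self, seq_no):
        self._unsynced.append(seq_no)

    def sync(self):
        # Stand-in for an fsync: everything buffered so far becomes durable.
        synced, self._unsynced = self._unsynced, []
        for seq_no in synced:
            self._on_persisted(seq_no)


def compute_global_checkpoint(persisted_local_checkpoints, in_sync):
    """Minimum persisted local checkpoint over all in-sync shard copies."""
    return min(persisted_local_checkpoints[copy] for copy in in_sync)


def demo():
    tracker = LocalCheckpointTracker()
    writer = TranslogWriter(tracker.mark_persisted)
    for seq_no in (0, 1, 2):
        writer.add(seq_no)
        tracker.mark_processed(seq_no)
    before = (tracker.processed_checkpoint, tracker.persisted_checkpoint)
    writer.sync()
    after = (tracker.processed_checkpoint, tracker.persisted_checkpoint)
    return before, after
```

In this model, demo() shows the persisted checkpoint staying behind the processed one until sync() runs, which mirrors the async-durability case mentioned above. compute_global_checkpoint ignores copies that are not in the in-sync set, so a lagging out-of-sync copy does not hold the global checkpoint back.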
..
ccr	Advance checkpoints only after persisting ops (#43205)	2019-06-20 11:12:38 +02:00
core	Advance checkpoints only after persisting ops (#43205)	2019-06-20 11:12:38 +02:00
data-frame	[ML][Data Frame] make response.count be total count of hits (#43241) (#43389)	2019-06-19 16:19:06 -05:00
deprecation	Fix hang in test for "too many fields" dep. check (#42909)	2019-06-06 08:28:32 -06:00
graph	Testclusters: graph (#43033)	2019-06-13 09:50:59 +03:00
ilm	[7.x] Narrow period of Shrink action in which ILM prevents stopping (#43254) (#43393)	2019-06-19 16:37:41 -06:00
logstash	Remove description from xpack feature sets (#43065)	2019-06-11 09:22:58 -07:00
ml	Remove stale test logging annotations (#43403)	2019-06-19 22:58:22 -04:00
monitoring	Return 0 for negative "free" and "total" memory reported by the OS (#42725)	2019-06-19 10:35:48 -06:00
rollup	Remove description from xpack feature sets (#43065)	2019-06-11 09:22:58 -07:00
security	Remove stale test logging annotations (#43403)	2019-06-19 22:58:22 -04:00
sql	Fix NPE in case of subsequent scrolled requests for a CSV/TSV formatted response (#43365)	2019-06-20 11:26:11 +03:00
src/test	[ML][Data Frame] make response.count be total count of hits (#43241) (#43389)	2019-06-19 16:19:06 -05:00
vectors	Move dense_vector and sparse_vector to module (#43280) (#43333)	2019-06-18 11:56:04 -04:00
watcher	Remove stale test logging annotations (#43403)	2019-06-19 22:58:22 -04:00
build.gradle	Remove trace logging from ML datafeeds in tests	2019-06-18 22:24:36 -04:00