OpenSearch

Commit Graph

Author	SHA1	Message	Date
Boaz Leskes	80b59e0d66	Discovery: Add a dedicated queue for incoming ClusterStates The initial implementation of two-phase-commit-based cluster state publishing (#13062) relied on a single in-memory "pending" cluster state that is only processed by ZenDiscovery once committed by the master. While this is fine on its own, it resulted in an issue with acknowledged APIs, such as the open index API, in the extreme case where a node falls behind and receives a commit message after a new cluster state has been published. Specifically: 1) Master receives an acked-API call and publishes cluster state CS1. 2) Master waits for min-master nodes to receive CS1 and commits it. 3) All nodes that have responded to CS1 are sent a commit message; however, node N hasn't responded yet. 4) Master waits for the publish timeout (defaults to 30s) for all nodes to process the commit. Node N fails to do so. 5) Master publishes cluster state CS2. Node N responds to the publishing of CS1 but receives CS2 before the commit for CS1 arrives. 6) The commit message for CS1 is processed on node N but fails because CS2 is pending. This causes the acked API in step 1 to return (but CS2 is not yet processed). In this case, the action indicated by CS1 is not yet executed on node N, and therefore the acked API call returns prematurely. Note that once CS2 is processed, the change in CS1 takes effect (cluster state operations are safe to batch and we do so all the time). An example failure can be found at: http://build-us-00.elastic.co/job/es_feature_two_phase_pub/314/ This commit extracts the already existing pending cluster state queue (processNewClusterStates) from ZenDiscovery into its own class, which serves as a temporary container for in-flight cluster states. Once committed, the cluster states are transferred to ZenDiscovery as before. This allows "lagging" cluster states to still be successfully committed and processed (and likely to be ignored, as a newer cluster state has already been processed). As a side effect, all batching logic is now extracted from ZenDiscovery and is unit tested.	2015-09-11 09:23:41 +02:00
Boaz Leskes	218979da1b	remove committedOrFailed and use committedOrFailedLatch for state	2015-08-28 12:31:46 +02:00
Boaz Leskes	10e8c410ea	more feedback	2015-08-28 12:31:46 +02:00
Boaz Leskes	0668e0d623	more feedback	2015-08-28 12:31:46 +02:00
Boaz Leskes	c9ee8dbd16	tighten up FailedToCommitClusterStateException semantics and other feedback	2015-08-28 12:31:45 +02:00
Boaz Leskes	98ed133dd7	reduce log chatter	2015-08-28 12:31:45 +02:00
Boaz Leskes	d9f6e302b5	doc feedback	2015-08-28 12:31:45 +02:00
Boaz Leskes	f70ed876d6	added docs	2015-08-28 12:31:45 +02:00
Boaz Leskes	c7c65b626f	commit timeout default should never be larger than publishing timeout	2015-08-28 12:31:45 +02:00
Boaz Leskes	6208248215	fix defaults in DiscoverySettings	2015-08-28 12:31:44 +02:00
Boaz Leskes	91dee8b311	reject older cluster state from the same master	2015-08-28 12:31:44 +02:00
Boaz Leskes	a56d67d8d7	force mock transport in testCanNotPublishWithoutMinMastNodes	2015-08-28 12:31:44 +02:00
Boaz Leskes	e3e0aa5049	Improved concurrency controls in SendingController to make sure that a CS is never committed after publishing is marked as timed out	2015-08-28 12:31:44 +02:00
Boaz Leskes	234a3794e5	improved timeout handling	2015-08-28 12:31:44 +02:00
Boaz Leskes	4d31681057	added constructor to FailedToCommitException	2015-08-28 12:31:43 +02:00
Boaz Leskes	7d3a36b20f	fix ZenDiscoveryUnitTest.testShouldIgnoreNewClusterState	2015-08-28 12:31:43 +02:00
Boaz Leskes	7390bcf833	add FailedToCommitException to registration	2015-08-28 12:31:43 +02:00
Boaz Leskes	b702843fe9	beefed up testing...	2015-08-28 12:31:43 +02:00
Boaz Leskes	81e07e81e0	simplified PublishClusterStateActionTests infra	2015-08-28 12:31:42 +02:00
Boaz Leskes	3815a41626	initial copy over from POC	2015-08-28 12:31:42 +02:00
Boaz Leskes	35f9ee7a62	Tests: better isolation of cluster ports Previously multiple clusters in the same JVM reused the same port ranges, leading to potentially big gaps in port selection, which in turn caused unicast-based discovery to fail, as it could not find another node in the default 5-port range. Also, the previous logic had http use a range that is assigned to another JVM.	2015-08-28 11:39:30 +02:00
Michael McCandless	07b5d22d91	disable new test on windows	2015-08-28 05:06:35 -04:00
Michael McCandless	fb703845dd	Merge pull request #13158 from mikemccand/new_path_for_shard_test Add unit test for ShardPath.selectNewPathForShard	2015-08-28 04:15:15 -04:00
Michael McCandless	b646ed9cd8	try to work on Windows too	2015-08-28 04:13:21 -04:00
Michael McCandless	8dbc1fbdbd	use ShardPath.getRootStatePath; allow forbidden API	2015-08-28 03:59:02 -04:00
Boaz Leskes	db5e225a25	Discovery: fix `discovery.zen.join_timeout` default value logic We default the value to be 20x the value of a ping timeout, however we only use the legacy ping timeout settings value for the calculation. Closes #13162	2015-08-28 09:47:15 +02:00
javanna	9b2e77903d	Internal: make ValidationException methods final and fix javadocs	2015-08-28 09:41:47 +02:00
javanna	37ec221df5	Internal: remove unused MapperQueryParser constructor	2015-08-28 09:38:29 +02:00
Jason Tedor	90bc784194	Work around for JDK-8039214 on JDK 9	2015-08-27 23:29:22 -04:00
Jason Tedor	3e88cc0bd0	Merge pull request #13170 from jasontedor/fix/lists-be-gone Remove and forbid use of com.google.common.collect.Lists	2015-08-27 22:19:44 -04:00
Jason Tedor	3067cacb66	Remove and forbid use of com.google.common.collect.Lists This commit removes and now forbids all uses of com.google.common.collect.Lists across the codebase. This is the first of many steps in the eventual removal of Guava as a dependency.	2015-08-27 22:14:33 -04:00
Igor Motov	2b87d7d919	Add `readonly` option for repositories Closes #7831 Closes #11753	2015-08-27 18:21:29 -04:00
Simon Willnauer	f64a875e03	use provided version in smoke test file paths	2015-08-27 23:20:01 +02:00
Nik Everett	144a641a5d	Merge pull request #13165 from nik9000/fix_ttl_test Use proper comparison operator ttl test	2015-08-27 16:49:27 -04:00
Nik Everett	19a79c99f9	[test] Use proper comparison operator lessThanOrEqualTo is more appropriate when comparing _ttl than lessThan because in rare cases, when tests run very fast, the ttl you fetch will still equal the one you sent.	2015-08-27 16:43:10 -04:00
Britta Weber	e6eeadd171	[test] make sure that the scripts in testScoreAccessWithinScript never compute log(0)	2015-08-27 22:02:51 +02:00
Ryan Ernst	48ea97cace	Merge pull request #13133 from rjernst/fix/bwc_creation Fix generation scripts for bwc indexes, and add 2.0 beta1 index	2015-08-27 10:21:32 -07:00
Ryan Ernst	38b8f20cc5	Make 0.x and 1.x indexes still work with get-bwc-version	2015-08-27 10:19:59 -07:00
Ryan Ernst	448d3498b1	Merge branch 'master' into fix/bwc_creation	2015-08-27 10:16:45 -07:00
Michael McCandless	e2e1b7f76a	reference original issue	2015-08-27 13:06:00 -04:00
Michael McCandless	30a3e431ec	polish	2015-08-27 13:01:36 -04:00
Michael McCandless	11f09f0a68	add basic unit test	2015-08-27 12:33:04 -04:00
Michael McCandless	4d38856f70	simplify API for ShardPath.selectNewPathForShard to enable unit testing: don't pass IndexShard	2015-08-27 12:32:21 -04:00
Lee Hinman	9f03f8cf44	Call `beforeIndexShardCreated` listener earlier in `createShard` Some listeners may need to do work before a shard's path is accessed (such as creating the directory in a plugin), so the listener should be called before anything happens (as its name implies).	2015-08-27 10:05:27 -06:00
Nik Everett	38fdacdbf7	Merge pull request #11306 from nik9000/default_detect_noop Default detect_noop to true	2015-08-27 11:22:13 -04:00
Michael McCandless	8f2ae59316	add asserts to make sure mocking 'took'	2015-08-27 11:19:55 -04:00
Michael McCandless	7a8a608d50	initial mock filesystem setup for test case	2015-08-27 10:55:04 -04:00
Nik Everett	9eb684da51	Default detect_noop to true detect_noop is pretty cheap and noop updates comparatively expensive, so this feels like a sensible default. Also had to do some testing and documentation around how _ttl works with detect_noop. Closes #11282	2015-08-27 10:34:18 -04:00
Simon Willnauer	9a1b5cf966	[TEST] comparing paths seems to be hard on Windows	2015-08-27 13:20:22 +02:00
Dan Tuffery	d8298e1d3a	Update query_dsl.asciidoc Fixed typo.	2015-08-27 12:47:15 +02:00
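
The pending-queue behaviour described in commit 80b59e0d66 can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: the class `PendingClusterStateQueue`, its method names, and the use of a bare integer version to identify a cluster state are all assumptions made for illustration. It only shows the idea that a late commit for CS1 can still succeed after CS2 has been received, and that processing a newer state also covers the older ones.

```python
from dataclasses import dataclass


@dataclass
class PendingState:
    version: int            # assumed: cluster states are ordered by version
    committed: bool = False


class PendingClusterStateQueue:
    """Hypothetical container for in-flight cluster states on one node."""

    def __init__(self):
        self._pending = {}  # version -> PendingState

    def add_pending(self, version):
        # Called when a published (not yet committed) state is received.
        self._pending[version] = PendingState(version)

    def mark_committed(self, version):
        # A late commit still succeeds while the state is in the queue,
        # even if a newer state has been received in the meantime.
        state = self._pending.get(version)
        if state is None:
            return False
        state.committed = True
        return True

    def next_to_process(self):
        # Return the newest committed state, or None if nothing is committed.
        committed = [s for s in self._pending.values() if s.committed]
        return max(committed, key=lambda s: s.version) if committed else None

    def mark_processed(self, version):
        # Processing a state also covers every older pending state (batching).
        for v in [v for v in self._pending if v <= version]:
            del self._pending[v]


queue = PendingClusterStateQueue()
queue.add_pending(1)                    # CS1 published to node N
queue.add_pending(2)                    # CS2 arrives before CS1's commit
late_commit_ok = queue.mark_committed(1)
first = queue.next_to_process().version
queue.mark_committed(2)
second = queue.next_to_process().version
queue.mark_processed(second)
```

In this sketch the commit for CS1 is accepted although CS2 is already pending, which is the case that failed with a single in-memory pending state.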

15167 Commits All Branches Search
