OpenSearch

Commit Graph

Author	SHA1	Message	Date
Martijn van Groningen	8e1ef0cff9	Rewrite shard follow node task logic (#31581 ) The current shard follow mechanism is complex and does not give us easy ways the have visibility into the system (e.g. why we are falling behind). The main reason why it is complex is because the current design is highly asynchronous. Also in the current model it is hard to apply backpressure other than reducing the concurrent reads from the leader shard. This PR has the following changes: * Rewrote the shard follow task to coordinate the shard follow mechanism between a leader and follow shard in a single threaded manner. This allows for better unit testing and makes it easier to add stats. * All write operations read from the shard changes api should be added to a buffer instead of directly sending it to the bulk shard operations api. This allows to apply backpressure. In this PR there is a limit that controls how many write ops are allowed in the buffer after which no new reads will be performed until the number of ops is below that limit. * The shard changes api includes the current global checkpoint on the leader shard copy. This allows reading to be a more self sufficient process; instead of relying on a background thread to fetch the leader shard's global checkpoint. * Reading write operations from the leader shard (via shard changes api) is a separate step then writing the write operations (via bulk shards operations api). Whereas before a read would immediately result into a write. * The bulk shard operations api returns the local checkpoint on the follow primary shard, to keep the shard follow task up to date with what has been written. * Moved the shard follow logic that was previously in ShardFollowTasksExecutor to ShardFollowNodeTask. * Moved over the changes from #31242 to make shard follow mechanism resilient from node and shard failures. Relates to #30086	2018-07-10 16:00:55 +02:00
Martijn van Groningen	ac654cbc10	Follow engine should not fill gaps upon promotion and recovery (#31751 ) Closes #31318	2018-07-03 13:15:06 +02:00
Martijn van Groningen	8ecfcc3b80	muted tests that will be replaced by the shard follow task refactoring: https://github.com/elastic/elasticsearch/pull/31581	2018-06-29 11:47:46 +02:00
Nhat Nguyen	1185ddbcc6	Replaces testClassesDir with testClassesDirs in ccr build Relates #30389	2018-06-28 11:24:41 -04:00
Nhat Nguyen	2c56df631d	Adjusts transport actions in CCR This commit adjusts the ccr’s actions accordingly to the recent changes in the upstream.	2018-06-23 18:10:15 -04:00
Nhat Nguyen	34f127be3c	CCR: Remove index name resolver from CCR actions Relates #31002	2018-06-20 13:20:24 -04:00
Nhat Nguyen	c74cd30ac6	Remove request type parameter from CCR actions Relates #31405	2018-06-19 10:49:05 -04:00
Martijn van Groningen	50ce990305	added missing serialization tests	2018-06-19 10:22:58 +02:00
Martijn van Groningen	73c9dd976b	Remove action request builders.	2018-06-15 12:32:08 +02:00
Tanguy Leroux	18938aab39	Adapt ShardFollowTasksExecutor after #31031	2018-06-15 11:46:08 +02:00
Martijn van Groningen	cc824ebb5e	[CCR] Added more validation to follow index api. (#31068 )	2018-06-15 07:39:53 +02:00
Nhat Nguyen	1ccb34ac77	Remove unused imports	2018-06-14 11:44:20 -04:00
Jason Tedor	64b4cdeda6	Merge remote-tracking branch 'elastic/master' into ccr * elastic/master: (53 commits) Painless: Restructure/Clean Up of Spec Documentation (#31013) Update ignore_unmapped serialization after backport Add back dropped substitution on merge high level REST api: cancel task (#30745) Enable engine factory to be pluggable (#31183) Remove vestiges of animal sniffer (#31178) Rename elasticsearch-nio to nio (#31186) Rename elasticsearch-core to core (#31185) Move cli sub-project out of server to libs (#31184) [DOCS] Fixes broken link in auditing settings QA: Better seed nodes for rolling restart [DOCS] Moves ML content to stack-docs [DOCS] Clarifies recommendation for audit index output type (#31146) Add nio-transport as option for http smoke tests (#31162) QA: Set better node names on rolling restart tests Add support for ignore_unmapped to geo sort (#31153) Share common parser in some AcknowledgedResponses (#31169) Fix random failure on SearchQueryIT#testTermExpansionExceptionOnSpanFailure Remove reference to multiple fields with one name (#31127) Remove BlobContainer.move() method (#31100) ...	2018-06-07 23:33:42 -04:00
Simon Willnauer	5c6711b8a4	Use a `_recovery_source` if source is omitted or modified (#31106 ) Today if a user omits the `_source` entirely or modifies the source on indexing we have no chance to re-create the document after it has been added. This is an issue for CCR and recovery based on soft deletes which we are going to make the default. This change adds an additional recovery source if the source is disabled or modified that is only kept around until the document leaves the retention policy window. This change adds a merge policy that efficiently removes this extra source on merge for all document that are live and not in the retention policy window anymore.	2018-06-07 07:39:28 +02:00
Jason Tedor	20a2f646e2	Fix off-by-one error in chunks coordinator (#31147 ) This commit fixes an off-by-error in the chunks coordinator where the batches would be of size one more than the batch size.	2018-06-06 19:53:49 -04:00
Jason Tedor	bf1152fcc6	Use follower primary term when applying operations (#31113 ) The primary shard copy on the following has authority of the replication operations that occur on the following side in cross-cluster replication. Yet today we are using the primary term directly from the operations on the leader side. Instead we should be replacing the primary term on the following side with the primary term of the primary on the following side. This commit does this by copying the translog operations with the corrected primary term. This ensures that we use this primary term while applying the operations on the primary, and when replicating them across to the replica (where the replica request was carrying the primary term of the primary shard copy on the follower).	2018-06-06 11:03:57 -04:00
Jason Tedor	d230548401	Remove use of deprecated methods to perform request (#31117 ) The old perform request methods on the REST client have been deprecated in favor using request-flavored methods. This commit addresses the use of these deprecated methods in the CCR test suite.	2018-06-06 05:09:55 -04:00
Nhat Nguyen	6ee6404e94	Adapt changes in PersistentTaskParams Relates #31045	2018-06-04 14:48:04 -04:00
Nhat Nguyen	87abb49145	Adapt changes in AcknowledgeResponse Relates #30983	2018-06-04 14:47:22 -04:00
Nhat Nguyen	9564b60194	Adjust CCR Actions after RequestBuilder is removed CCR side of #30966	2018-06-01 23:09:59 -04:00
Nhat Nguyen	2a9a2002e6	CCR: Tighten requesting range check on leader This commit clarifies the origin of the global checkpoint that the following shard uses and replaces illegal_state_exc E by an assertion. Relates #30980	2018-05-31 20:00:33 -04:00
Nhat Nguyen	fa54be2dcd	CCR: Do not minimization requesting range on leader (#30980 ) Today before reading operations on the leading shard, we minimization the requesting range with the global checkpoint. However, this might make the request invalid if the following shard generates a requesting range based on the global-checkpoint from a primary shard and sends that request to a replica whose global checkpoint is lagged. Another issue is that we are mutating the request when applying minimization. If the request becomes invalid on a replica, we will reroute the mutated request instead of the original one to the primary. This commit removes the minimization and replaces it by a range check with the local checkpoint.	2018-05-31 15:14:32 -04:00
Martijn van Groningen	7e8cf768cf	changed persistent task name to be of similar structure as the others	2018-05-31 15:16:13 +02:00
Martijn van Groningen	a82f2e31b4	[CCR] Also copy routing_num_shards from leader to follow index. (#30894 ) Bug was introduced when create and follow api was added in #30602	2018-05-31 08:03:53 +02:00
Nhat Nguyen	f25ee254cc	Mute ShardChangesIT#testFollowIndex	2018-05-30 14:29:58 -04:00
Martijn van Groningen	adca32eae7	no need to resolve index name as only concrete index names are used	2018-05-30 12:42:35 +02:00
Martijn van Groningen	4a20dca5fe	Required changes after merging in master.	2018-05-30 10:26:49 +02:00
Martijn van Groningen	51caefe46c	[CCR] Sync mappings between leader and follow index (#30115 ) The shard changes api returns the minimum IndexMetadata version the leader index needs to have. If the leader side is behind on IndexMetadata version then follow shard task waits with processing write operations until the mapping has been fetched from leader index and applied in follower index in the background. The cluster state api is used to fetch the leader mapping and put mapping api to apply the mapping in the follower index. This works because put mapping api accepts fields that are already defined. Relates to #30086	2018-05-28 07:37:27 +02:00
Martijn van Groningen	e477147143	[CCR] Add create and follow api (#30602 ) Also renamed FollowExisting* internal names to just Follow* and fixed tests	2018-05-26 15:05:40 +02:00
Martijn van Groningen	7942e4082a	build: enhance check task instead of overwriting it. (test task didn't run when check task ran)	2018-05-16 10:54:15 +02:00
Martijn van Groningen	596ec1848e	[CCR] Add validation checks that were left out of #30120 (#30463 )	2018-05-16 09:46:03 +02:00
Martijn van Groningen	23204e3d09	[CCR] Fixed follow and unfollow api url path according to design. The TODOs in the rest actions was incorrect. The problem was that these rest actions used `follow_index` as first named variable in the path under which the rest actions were registered. Other candidate rest actions that also have a named variable as first element in the path (but with a different name) get resolved as rest parameters too and passed down to the rest action that actually ends up getting executed. In the case of the follow index api, a `index` parameter got passed down to `RestFollowExistingAction`, but that param was never used. This caused the follow index api call to fail, because of unused http parameters. This change doesn't fixes that problem, but works around it by using `index` as named variable for the follow index (instead of `follow_index`). Relates to #30102	2018-05-16 09:07:50 +02:00
Martijn van Groningen	64b97313d5	[CCR] Make cross cluster replication work with security (#30239 ) If security is enabled today with ccr then the follow index api will fail with the fact that system user does not have privileges to use the shard changes api. The reason that system user is used is because the persistent tasks that keep the shards in sync runs in the background and the user that invokes the follow index api only start those background processes. I think it is better that the system user isn't used by the persistent tasks that keep shards in sync, but rather runs as the same user that invoked the follow index api and use the permissions that that user has. This is what this PR does, and this is done by keeping track of security headers inside the persistent task (similar to how rollup does this). This PR also adds a cluster ccr priviledge that allows a user to follow or unfollow an index. Finally if a user that wants to follow an index, it needs to have read and monitor privileges on the leader index and monitor and write privileges on the follow index.	2018-05-16 07:48:32 +02:00
Martijn van Groningen	bb6586dc5f	[CCR] Read changes from Lucene instead of translog (#30120 ) This commit adds an API to read translog snapshot from Lucene, then cut-over from the existing translog to the new API in CCR. Relates #30086 Relates #29530	2018-05-09 17:35:27 -04:00
Martijn van Groningen	ad499fc178	[CCR] added rest specs and simple rest test for follow and unfollow apis (#30123 ) [CCR] added rest specs and simple rest test for follow and unfollow apis, also Added an acknowledge field in follow and unfollow api responses. Currently these api return an empty response and fixed bug in unfollow api that didn't cleanup node tasks properly.	2018-05-07 14:18:28 +02:00
Nhat Nguyen	6e0d0feca0	Enable MockHttpTransport in ShardChangsIT CCR side of #29601	2018-05-04 13:44:18 -04:00
Nhat Nguyen	8fefa8a661	Update InternalEngine tests on ccr side for #30121 Relates #30121	2018-05-04 10:57:54 -04:00
Nhat Nguyen	d621fc7a00	Add tombstone document into Lucene for Noop (#30226 ) This commit adds a tombstone document into Lucene for every No-op. With this change, Lucene index is expected to have a complete history of operations like Translog. In fact, this guarantee is subjected to the soft-deletes retention merge policy. Relates #29530	2018-05-02 09:08:29 -04:00
Nhat Nguyen	eb4281edef	CCR side #30244 Relates #30244	2018-05-01 21:08:24 -04:00
Martijn van Groningen	8a2df6c3b9	[CCR] Only normalize -1 seqno in shard changes request. (#30238 ) Prior to this change a -1 seqno would be normalized earlier, which caused a leader shard containing a single operation to be ignored. Closes #30227	2018-05-01 08:40:23 +02:00
Martijn van Groningen	e6b88fa5a0	removed duplicated license	2018-04-25 12:18:02 +02:00
Martijn van Groningen	5a67a0f78f	Applying changes required for ccr after moving ccr code to elasticsearch	2018-04-25 08:03:29 +02:00
Martijn van Groningen	9b9d0f9057	Enabled licence header check and fixed unchecked casts. (#4408 )	2018-04-20 11:15:52 +02:00
Martijn van Groningen	cfd7847628	fixed issues after merging in master	2018-04-20 07:59:13 +02:00
Nhat Nguyen	f97aec7b8b	Sibling of enforce access to translog via engine Since elastic/elasticsearch#29542, we no longer expose translog instance but only provide creating translog snapshot method. This commit adapts that change in CCR branch. Relates elastic/elasticsearch#29542	2018-04-18 11:54:00 -04:00
Martijn van Groningen	56ca59a513	Add the ability to the follow index to follow an index in a remote cluster. The follow index api completely reuses CCS infrastructure that was exposed via: https://github.com/elastic/elasticsearch/pull/29495 This means that the leader index parameter support the same ccs index to indicate that an index resides in a different cluster. I also added a qa module that smoke tests the cross cluster nature of ccr. The idea is that this test just verifies that ccr can read data from a remote leader index and that is it, no crazy randomization or indirectly testing other features.	2018-04-17 07:36:40 +02:00
Martijn van Groningen	c0d42e9cd1	Fixed test	2018-04-16 10:48:46 +02:00
Martijn van Groningen	a94b38b88e	Fixed compile errors and test failures after merging master into ccr.	2018-04-13 16:35:09 +02:00
Martijn van Groningen	d77f756f5c	ccr: use indices stats api to fetch global checkpoint of the follower shards and keep track of shard follow stats inside shard follow stats' node task instead of persistent task status. By maintaining the shard follow stats inside its node task the stats update is quicker as no cluster state update is required. The stats are now transient; meaning if the task is going to run a different node then the stats are gone too. Currently only the processed global checkpoint is being tracked and this is being restored when a shard follow node task starts via the indices stats api (the reason of the first change of this change). Other stats that we may add in the future (like fetch_time, see: https://gist.github.com/s1monw/dba13daf8493bf48431b72365e110717) it is ok if we start from zero in case a shard follow task moves to another node.	2018-04-05 14:52:20 +02:00
Martijn van Groningen	d976fa44e7	Removed LocalCheckpointTracker usage.	2018-03-29 07:41:23 +02:00
Martijn van Groningen	a22a7d079d	ccr: Added maximum translog limit that a single shard changes response can return. This limit is based on the number of estimate bytes in each translog operation that fall between the minimum and maximum request sequence number. If this limit is met then the shard follow task executor will make sure that a subsequent shard changes request will be performed to fetch the remaining translog operations. This limit is needed in order to protect against returning too many translog operations in a single shard changes response. Relates to #2436	2018-03-28 15:49:57 +02:00
Martijn van Groningen	282740610b	Fixed test after merging in master branch.	2018-03-28 09:54:41 +02:00
Nhat Nguyen	51111a8106	CCR: Stop FollowExistingIndexAction after report failure (#4111 ) We check for the existence of both leader and follower index, then properly report to the caller. However, we do not return after reporting failure. This causes the caller receive exception twice: IllegalArgumentException then NullPointerException. This commit makes sure to stop the action after reporting failure.	2018-03-26 13:56:47 -04:00
Martijn van Groningen	9e4c68c389	Fixed compile and test errors after merging in master	2018-03-16 17:47:10 +01:00
Martijn van Groningen	10cfa21a68	required changes after merge master branch into ccr branch.	2018-02-22 15:03:33 +01:00
Martijn van Groningen	1a9a7ffe97	removed hack	2018-02-07 17:54:28 +01:00
Martijn van Groningen	c442d14f1d	Several changes that were required after merging master into the ccr branch.	2018-02-05 13:25:58 +01:00
Martijn van Groningen	4e818254ad	re-enabled java integration tests	2018-01-25 14:18:34 +01:00
Martijn van Groningen	05d3d2e49c	fix packages after merge	2018-01-24 09:28:42 +01:00
Jason Tedor	9b6bb2c635	Enable run task for CCR This commit enables the run task for ccr by specifying that the ccr project not be evaluated until after core is evaluated. This is important since ccr is alphabetically before core and thus Gradle evaluates it first. Relates #3665	2018-01-22 15:07:20 -05:00
Martijn van Groningen	83a82d83d0	Moved ccr source code to its own gradle module after xpack split.	2018-01-22 11:09:04 +01:00

... 2 3 4 5 6

261 Commits