OpenSearch

Commit Graph

Author	SHA1	Message	Date
Tim Brooks	8bde608979	Register CcrRepository based on settings update (#36086 ) This commit adds an empty CcrRepository snapshot/restore repository. When a new cluster is registered in the remote cluster settings, a new CcrRepository is registered for that cluster. This is implemented using a new concept of "internal repositories". RepositoryPlugin now allows implementations to return factories for "internal repositories". The "internal repositories" are different from normal repositories in that they cannot be registered through the external repository api. Additionally, "internal repositories" are local to a node and are not stored in the cluster state. The repository will be unregistered if the remote cluster is removed.	2018-12-04 14:36:50 -07:00
Yannick Welsch	70c361ea5a	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-04 21:26:11 +01:00
Adrien Grand	0df08dd458	Set Lucene version upon index creation. (#36038 ) It is important that all shards of a given index have the same `indexCreatedVersionMajor` to Lucene, or e.g. merging those shards is going to be considered illegal. At the moment, we use the latest Lucene version when creating a shard, which could cause shards to have different created versions, e.g. in case of forced allocation. This commit makes sure to reuse the appropriate Lucene version in order to avoid such issues. Closes #33826	2018-12-04 17:53:20 +01:00
Martijn van Groningen	6e1ff31222	[CCR] AutoFollowCoordinator should tolerate that auto follow patterns may be removed (#35945 ) AutoFollowCoordinator should take into account that, after it has auto followed an index and while it is recording that the leader index has been followed, the auto follow pattern may have been removed via the delete auto follow patterns api. Also fixed a bug where the auto follow coordinator would die when it tried to get a remote client for a cluster whose remote cluster connection had been removed. Closes #35480	2018-12-04 15:55:15 +01:00
Yannick Welsch	80ee7943c9	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-04 09:37:09 +01:00
Martijn van Groningen	43773a32a4	Replace Streamable w/ Writeable in BaseTasksRequest and subclasses (#35854 ) * Replace Streamable w/ Writeable in BaseTasksRequest and subclasses This commit replaces usages of Streamable with Writeable for the BaseTasksRequest / TransportTasksAction classes and subclasses of these classes. Relates to #34389	2018-12-03 08:04:29 +01:00
Martijn van Groningen	32f7fbd9f0	[TEST] Set 'index.unassigned.node_left.delayed_timeout' to 0 in ccr tests Some tests kill nodes, and otherwise it would take 60s by default for replicas to get allocated, which is longer than the tests wait for a green state. Relates to #35403	2018-11-30 11:03:36 +01:00
David Turner	7f257187af	[Zen2] Update default for USE_ZEN2 to true (#35998 ) Today the default for USE_ZEN2 is false and it is overridden in many places. By defaulting it to true we can be sure that the only places in which Zen2 does not work are those in which it is explicitly set to false.	2018-11-29 12:18:35 +00:00
Martijn van Groningen	1390f366d4	[CCR] Only auto follow indices when all primary shards have started (#35814 ) This change adds an extra check that verifies that all primary shards of an index that is about to be auto followed have been started. If not all primary shards have been started for an index, then the next auto follow run will try to auto follow this index again. Closes #35480	2018-11-29 09:46:09 +01:00
Jason Tedor	a3186e4a32	Deprecate X-Pack centric license endpoints (#35959 ) This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.	2018-11-28 08:24:35 -05:00
Jason Tedor	2887680acb	Avoid NPE in follower stats when no tasks metadata (#35802 ) When there is no persistent tasks metadata we could hit a null pointer exception when executing a follower stats request. This is because we inspect the persistent tasks metadata. Yet, if no tasks have been registered, this is null (as opposed to empty). We need to avoid de-referencing the persistent tasks metadata in this case. That is what this commit does, and we add a test for this situation.	2018-11-21 19:16:28 -05:00
Tim Brooks	a989b675b5	Remove NPE from IndexFollowingIT (#35717 ) Currently there is a common NPE in the IndexFollowingIT that does not indicate a test failure. It occurs when a cluster state listener is called and certain index metadata is not yet available. This commit checks that the metadata is not null before performing the logic that depends on the metadata.	2018-11-19 20:38:49 -07:00
Arthur Gavlyukovskiy	022726011c	Remove use of AbstractComponent in server (#35444 ) Removed extending of AbstractComponent and changed logger usage to explicit declaration. Abstract classes still have logger declaration using this.getClass() in order to show implementation class name in its logs. See #34488	2018-11-16 16:10:32 -05:00
Martijn van Groningen	0487181d0f	[TEST] Force flush to ensure multiple segments. Relates to #35333	2018-11-13 14:58:17 +01:00
Jason Tedor	3859d21661	Fix the names of CCR stats endpoints in usage API (#35438 ) This commit fixes the names of the CCR stats endpoints reported in the usage API.	2018-11-12 10:27:12 -05:00
Martijn van Groningen	ef10461caf	[TEST] Instead of ignoring the ccr downgrade to basic license qa test, avoid the assertions that check the log files, because that does not work on Windows. The rest of the test is still useful and should work on Windows CI. Currently this qa module fails on Windows CI because it has just one test and that test is ignored if the OS is Windows.	2018-11-12 10:17:33 +01:00
Martijn van Groningen	ae2af20ae5	[CCR] Validate remote cluster license as part of put auto follow pattern api call (#35364 ) Validate the remote cluster license as part of the put auto follow pattern api call, in addition to the validation performed when the auto follow coordinator starts auto following indices in the leader cluster. Also added a qa module that tests what happens to ccr after downgrading to a basic license. Existing active follow indices should continue to follow, but the auto follow feature should not pick up new leader indices.	2018-11-09 17:43:43 +01:00
Martijn van Groningen	807ce10f73	[TEST] Increased timeout for verifying ccr monitoring.	2018-11-09 15:40:15 +01:00
Martijn van Groningen	fba811fa3a	[TEST] increased the number of index and delete ops to make it less likely that all ops exist as soft delete docs.	2018-11-09 15:31:51 +01:00
Martijn van Groningen	83152b3835	[CCR] Get all auto follow patterns and no auto follow metadata (#35381 ) Return an empty response when querying all auto follow patterns while there is no auto follow metadata.	2018-11-09 14:24:27 +01:00
Martijn van Groningen	07a69a528b	[CCR] Rename leaderClient variables and parameters to remoteClient (#35368 )	2018-11-08 16:26:14 +01:00
Martijn van Groningen	8a85251da0	[CCR] Auto follow Coordinator fetch cluster state in system context (#35120 ) Auto follow Coordinator should fetch the leader cluster state using system context.	2018-11-08 10:48:27 +01:00
Martijn van Groningen	2f2090f562	[CCR] Adjust list of dynamic index settings that should be replicated (#35195 ) Adjust list of dynamic index settings that should be replicated and added a test that verifies whether builtin dynamic index settings are classified as replicated or non replicated (whitelisted).	2018-11-07 21:59:58 -05:00
Jason Tedor	4f4fc3b8f8	Replicate index settings to followers (#35089 ) This commit uses the index settings version so that a follower can replicate index settings changes as needed from the leader. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2018-11-07 21:20:51 -05:00
Martijn van Groningen	314b9ca44c	[CCR] Enforce auto follow pattern name restrictions (#35197 ) An auto follow pattern name: * cannot start with `_` * cannot contain a `,` * must be encodable in UTF-8 * must be no longer than 255 bytes when UTF-8 encoded	2018-11-07 20:16:26 +01:00
Martijn van Groningen	e685cfe8f9	[CCR] Fail with a better error if leader index is red (#35298 ) as part of fetching history uuids from leader index.	2018-11-07 13:23:30 +01:00
Martijn van Groningen	2395e16d84	[CCR] Change resume follow api to be a master node action (#35249 ) In order to start shard follow tasks, the resume follow api already needs to execute N requests to the elected master node. The pause follow API is also a master node action, so this change makes the way both APIs execute more consistent.	2018-11-07 07:38:44 +01:00
Martijn van Groningen	a937d7f5f3	[CCR] Add missing return statement. An error was thrown if the leader index had no soft deletes enabled, but the follower index was then created anyway. The test caught this bug, but very rarely due to a timing issue. Build failure instance: ``` 1> [2018-11-05T20:29:38,597][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] before test 1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]] 1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]] 1> [2018-11-05T20:29:38,609][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [leader-index] creating index, cause [api], templates [random-soft-deletes-template, one_shard_index_template], shards [2]/[0], mappings [] 1> [2018-11-05T20:29:38,628][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[leader-index][0]] ...]).
1> [2018-11-05T20:29:38,660][INFO ][o.e.x.c.a.TransportPutFollowAction] [node_s_0] [follower-index] creating index, cause [ccr_create_and_follow], shards [2]/[0] 1> [2018-11-05T20:29:38,675][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]] 1> [2018-11-05T20:29:38,676][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]] 1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] after test 1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] [LocalIndexFollowingIT#testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes]: cleaning up after test 1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [follower-index/TlWlXp0JSVasju2Kr_hksQ] deleting index 1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [leader-index/FQ6EwIWcRAKD8qvOg2eS8g] deleting index FAILURE 0.23s J0 | LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes <<< FAILURES! > Throwable #1: java.lang.AssertionError: > Expected: <false> > but: was <true> > at __randomizedtesting.SeedInfo.seed([7A3C89DA3BCA17DD:65C26CBF6FEF0B39]:0) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.elasticsearch.xpack.ccr.LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes(LocalIndexFollowingIT.java:83) > at java.lang.Thread.run(Thread.java:748) ``` Build failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+intake/46/console	2018-11-06 16:05:35 +01:00
Martijn van Groningen	46c238d792	[CCR] Improve error when operations are missing (#35179 ) Improve error when operations are missing	2018-11-06 08:42:47 +01:00
Martijn van Groningen	cac67f8bcc	[CCR] Add extra validation to unfollow api (#35245 ) Validate whether the follow index actually exists and whether the follow index actually has custom ccr metadata.	2018-11-06 08:00:34 +01:00
Nik Everett	f72ef9b5fd	Build: Pull "skip assemble on qa" to common build (#35214 ) Pull all of the logic that we use to skip the `assemble` and `dependenciesInfo` tasks on `qa` projects into one spot in our root build file.	2018-11-05 16:16:00 -05:00
Alexander Reelsen	409050e8de	Refactor: Remove settings from transport action CTOR (#35208 ) As settings are not used in the transport action constructor, this removes the passing of the settings in all the transport actions.	2018-11-05 13:08:18 +01:00
Martijn van Groningen	ddda2d419c	[CCR] Change max_read_request_size default (#35247 ) This changes the max_read_request_size default from unlimited to 32MB.	2018-11-05 12:51:42 +01:00
Nhat Nguyen	54e1231ebd	CCR/TEST: Limit indexing docs in FollowerFailOverIT (#35228 ) The suite FollowerFailOverIT is failing because some documents are not replicated to the follower. Maybe the FollowTask is not working as expected or the background indexers eat all resources while the follower cluster is trying to reform after a failover; then CI is not fast enough to replicate all the indexed docs within 60 seconds (sometimes I see 80k docs on the leader). This commit limits the number of documents to be indexed into the leader index by the background threads so that we can eliminate the latter case. This change also replaces a docCount assertion with a docIds assertion so we can have more information if these tests fail again. Relates #33337	2018-11-03 10:00:54 -04:00
David Kyle	43cff56aec	Mute FollowerFailOverIT.testFollowIndexAndCloseNode Issue #33337	2018-11-02 15:13:09 +00:00
Nhat Nguyen	4875d6fb0b	CCR: Add NodeClosedException to retryable list (#35191 ) This change adds NodeClosedException to the retry-able exception list.	2018-11-02 07:01:46 -04:00
Martijn van Groningen	19d6cf1b9e	[CCR] Change response classes to not use Streamable. (#35085 ) Only the response classes of the get auto follow pattern, follow and stats APIs were moved away from Streamable. The other APIs use `AcknowledgedResponse` or `BaseTasksResponse` as response class, and moving those classes away from Streamable is a bigger change.	2018-11-02 08:02:17 +01:00
Nhat Nguyen	dbc7c9259c	CCR/TEST: Add debug log to testFailOverOnFollower	2018-11-01 22:28:06 -04:00
Nik Everett	e28509fbfe	Core: Less settings to AbstractComponent (#35140 ) Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to stop passing around `Settings` in a ton of places. While this change touches many files, it touches them all in fairly small, mechanical ways, doing a few things per file: 1. Drop the `super(settings);` line on everything that extends `AbstractComponent`. 2. Drop the `settings` argument to the ctor if it is no longer used. 3. If the file doesn't use `logger` then drop `extends AbstractComponent` from it. 4. Clean up all compilation failures caused by the `settings` removal and drop any now unused `settings` instances and method arguments. I've intentionally not removed the `settings` argument from a few files: 1. TransportAction 2. AbstractLifecycleComponent 3. BaseRestHandler These files don't need `settings` either, but this change is large enough as is. Relates to #34488	2018-10-31 21:23:20 -04:00
Martijn van Groningen	6da2fb7d5b	Change CCR API request classes to use Writeable serialization instead of Streamable (#34911 ) Only the follow stats request couldn't be changed to use Writeable serialization, because that requires changes in `TransportTasksAction` and `BaseTasksRequest` base classes.	2018-10-30 11:03:30 +01:00
Jason Tedor	cc9894d78a	Fix name of CCR stats transport action This class name should include an indication that it is for CCR. This commit adds that to the name of this class.	2018-10-29 09:50:13 -04:00
Jason Tedor	26f5c509af	Fix CCR API specification (#34963 ) This commit fixes two issues with the CCR API specification: - remove the CCR stats endpoint, it is not currently implemented - fix the documentation links	2018-10-29 09:37:13 -04:00
Martijn van Groningen	b2daaf15d1	[CCR] move tests that modify test cluster from the main test class to a dedicated class.	2018-10-29 13:45:59 +01:00
Martijn van Groningen	1801518527	[CCR] Refactor stats APIs (#34912 ) * Changed the auto follow stats to also include follow stats. * Renamed the auto follow stats api to stats api and changed its url path from `/_ccr/auto_follow/stats` to `/_ccr/stats`. * Removed `/_ccr/stats` url path for the follow stats api, which makes the index parameter a required parameter. * Fixed docs.	2018-10-29 07:45:27 +01:00
Martijn van Groningen	bad5972f62	[CCR] Fix request serialization bug (#34917 ) and some parameters that were not set in tests.	2018-10-29 07:38:55 +01:00
Jason Tedor	43f6ba1c63	Fix put/resume follow request parsing (#34913 ) This commit adds some fields that were missing from put follow, and fixes a bug in resume follow.	2018-10-26 11:09:55 -04:00
Martijn van Groningen	306f1d78f8	[CCR] Retry when no index shard stats can be found (#34852 ) Index shard stats for the follower shard are fetched when a shard follow task is started. This is needed in order to bootstrap the shard follow task with the follower global checkpoint. Sometimes index shard stats are not available (e.g. during a restart) and we currently fail, while it is very likely that these stats will be available some time later.	2018-10-26 15:14:24 +02:00
Nhat Nguyen	ff49e79d40	CCR: Rename follow-task parameters and stats (#34836 ) * CCR: Rename follow parameters and stats This commit renames the follow-task parameters and its stats. Below are the changes: ## Params - remote_cluster (unchanged) - leader_index (unchanged) - max_read_request_operation_count -> max_read_request_operation_count - max_batch_size -> max_read_request_size - max_write_request_operation_count (new) - max_write_request_size (new) - max_concurrent_read_batches -> max_outstanding_read_requests - max_concurrent_write_batches -> max_outstanding_write_requests - max_write_buffer_size (unchanged) - max_write_buffer_count (unchanged) - max_retry_delay (unchanged) - poll_timeout -> read_poll_timeout ## Stats - remote_cluster (unchanged) - leader_index (unchanged) - follower_index (unchanged) - shard_id (unchanged) - leader_global_checkpoint (unchanged) - leader_max_seq_no (unchanged) - follower_global_checkpoint (unchanged) - follower_max_seq_no (unchanged) - last_requested_seq_no (unchanged) - number_of_concurrent_reads -> outstanding_read_requests - number_of_concurrent_writes -> outstanding_write_requests - buffer_size_in_bytes -> write_buffer_size_in_bytes (new) - number_of_queued_writes -> write_buffer_operation_count - mapping_version -> follower_mapping_version - total_fetch_time_millis -> total_read_time_millis - total_fetch_remote_time_millis -> total_read_remote_exec_time_millis - number_of_successful_fetches -> successful_read_requests - number_of_failed_fetches -> failed_read_requests - operation_received -> operations_read - total_transferred_bytes -> bytes_read - total_index_time_millis -> total_write_time_millis - number_of_successful_bulk_operations -> successful_write_requests - number_of_failed_bulk_operations -> failed_write_requests - number_of_operations_indexed -> operations_written - fetch_exception -> read_exceptions - time_since_last_read_millis -> time_since_last_read_millis * add test for max_write_request_(operation_count|size)	2018-10-25 10:36:15 +02:00
Martijn van Groningen	6fe0e62b7a	[CCR] Added write buffer size limit (#34797 ) This limit is based on the size in bytes of the operations in the write buffer. If this limit is exceeded then no more read operations will be coordinated until the size in bytes of the write buffer has dropped below the configured write buffer size limit. Renamed existing `max_write_buffer_size` to `max_write_buffer_count` to indicate that this limit is count based. Closes #34705	2018-10-24 23:48:49 +02:00
Andrey Atapin	5f588180f9	Improve IndexNotFoundException's default error message (#34649 ) This commit adds the index name to the error message when an index is not found.	2018-10-24 12:53:31 -07:00
Nhat Nguyen	d73768f812	CCR: Do not follow if leader does not have soft-deletes (#34767 ) We should not create a follower index, and should abort the follow request, if the leader does not have soft-deletes. Moreover, we also should not auto-follow an index if it does not have soft-deletes.	2018-10-24 11:19:39 -04:00
Alpar Torok	795d57b4f9	Auto configure all test tasks (#34666 ) With this change, we apply the common test config automatically to all newly created tasks instead of opting in specifically. For plugin authors using the plugin externally this means that the configuration will be applied to their RandomizedTestingTasks as well. The purpose of the change is to simplify setup and make it easier to change projects that use the `test` task but actually run integration tests to use a task called `integTest`, for clarity but also because we may want to configure and run them differently, e.g. using different levels of concurrency.	2018-10-24 16:05:50 +03:00
Martijn van Groningen	76240e6bbe	[CCR] Renamed leader_cluster to remote_cluster (#34776 ) and also some occurrences of clusterAlias to remoteCluster. Closes #34682	2018-10-24 13:39:36 +02:00
Boaz Leskes	be907516ad	Change ShardFollowTask defaults (#34793 ) Per #31717 this commit changes the defaults to the following: Batch size of 5120 ops. Maximum of 12 concurrent read requests. Maximum of 9 concurrent write requests. This is not necessarily our final values but it's good to have these as defaults for the purposes of initial testing.	2018-10-24 13:32:48 +02:00
Martijn van Groningen	18007a29b2	[CCR] Made leader cluster required in shard follow task. Left over from #34580	2018-10-24 08:38:25 +02:00
Martijn van Groningen	abf8cb6706	[CCR] Cleanup pause follow action (#34183 ) * Change the `TransportPauseFollowAction` to extend from `TransportMasterNodeAction` instead of `HandledAction`, this removes a sync cluster state api call. * Introduced `ResponseHandler` that removes duplicated code in `TransportPauseFollowAction` and `TransportResumeFollowAction`. * Changed `PauseFollowAction.Request` to not use `readFrom()`.	2018-10-24 08:12:39 +02:00
Martijn van Groningen	0efba0675e	[CCR] Add qa test library (#34611 ) * Introduced test qa lib that all CCR qa modules depend on to avoid test code duplication.	2018-10-23 23:24:32 +02:00
Nhat Nguyen	e242fd2e42	CCR: Add TransportService closed to retryable errors (#34722 ) Both testFollowIndexAndCloseNode and testFailOverOnFollower failed because the FollowTask received a TransportService closed exception, which is currently considered a fatal error. This behavior is not desirable since a closing node can throw that exception, and we should retry in that case. This change adds TransportService closed error to the list of retryable errors. Closes #34694	2018-10-23 14:23:29 -04:00
Martijn van Groningen	ed817fb265	[CCR] Move leader_index and leader_cluster parameters from resume follow to put follow api (#34638 ) As part of this change the leader index name and leader cluster name are stored in the CCR metadata in the follow index. The resume follow api will read that when a resume follow request is executed.	2018-10-23 19:37:45 +02:00
Nhat Nguyen	5923ea536e	CCR: Requires soft-deletes on the follower (#34725 ) Since #34412 and #34474, a follower must have soft-deletes enabled to work correctly. This change requires soft-deletes on the follower. Relates #34412 Relates #34474	2018-10-23 11:51:17 -04:00
Martijn van Groningen	e6d87cc09f	[CCR] Add total fetch time leader stat (#34577 ) Add total fetch time leader stat, which keeps track of how much time was spent on fetches from the leader cluster's perspective.	2018-10-23 16:41:06 +02:00
Martijn van Groningen	36baf3823d	[CCR] Auto follow pattern APIs adjustments (#34518 ) * Changed the resource id of auto follow patterns to be a user defined name instead of being the leader cluster alias name. * Fail when an unfollowed leader index matches with two or more auto follow patterns.	2018-10-23 15:48:51 +02:00
Jason Tedor	52fc502b7e	Fix the casing in the names of some CCR classes We should be consistent here. We were already using the casing "Ccr" and this is the preferred casing for Java class names. This commit adjusts the names of some classes that were using the casing "CCR" to be "Ccr".	2018-10-22 11:25:00 -04:00
Jason Tedor	7af19b8f81	Migrate wait for pending tasks helper to server (#34675 ) In some of our X-Pack REST tests we have to wait for pending tasks to complete. We are now needing this functionality in ESRestTestCase for the docs tests where we run against X-Pack features. This commit moves the helper method that we have in X-Pack to ESRestTestCase, and removes duplicate logic from waiting for rollup tasks to complete.	2018-10-22 11:14:02 -04:00
Martijn van Groningen	92e34732f5	[CCR] Remove ccr related metadata between tests for single node tests too	2018-10-22 09:15:22 +02:00
Martijn van Groningen	b6750cf6c2	[CCR] Muted tests Relates to #34696	2018-10-22 08:47:31 +02:00
Martijn van Groningen	f51301a1a6	[CCR] Moved integration test	2018-10-22 08:44:41 +02:00
Martijn van Groningen	b816837d39	[CCR] Always remove persistent tasks metadata between tests and better handle assertion errors between tests.	2018-10-22 08:15:43 +02:00
Nhat Nguyen	d90b6730c7	CCR: Following primary should process NoOps once (#34408 ) This is a follow-up for #34288. Relates #34412	2018-10-19 21:10:13 -04:00
Nhat Nguyen	630d5514a5	CCR/TEST: Adjust testFailOverOnFollower CI passed but the result is outdated after PR #34366 was merged.	2018-10-19 15:06:44 -04:00
Nhat Nguyen	bd92a28cfc	CCR: Replicate existing ops with old term on follower (#34412 ) Since #34288, we might hit a deadlock if the FollowTask has more fetchers than writers. This can happen in the following scenario: Suppose the leader has two operations [seq#0, seq#1]; the FollowTask has two fetchers and one writer. 1. The FollowTask issues two concurrent fetch requests: {from_seq_no: 0, num_ops:1} and {from_seq_no: 1, num_ops:1} to read seq#0 and seq#1 respectively. 2. The second request, which fetches seq#1, completes first, and then it triggers a write request containing only seq#1. 3. The primary of a follower fails after it has replicated seq#1 to replicas. 4. Since the old primary did not respond, the FollowTask issues another write request containing seq#1 (resend the previous write request). 5. The new primary has seq#1 already; thus it won't replicate seq#1 to replicas but will wait for the global checkpoint to advance to at least seq#1. The problem is that the FollowTask has only one writer and that writer is waiting for seq#0, which won't be delivered until the writer has completed. This PR proposes to replicate existing operations with the old primary term (instead of the current term) on the follower. In particular, when the following primary detects that it has processed an operation already, it will look up the term of an existing operation with the same seq_no in the Lucene index, then rewrite that operation with the old term before replicating it to the following replicas. This approach is wait-free but requires soft-deletes on the follower. Relates #34288	2018-10-19 13:56:00 -04:00
Nhat Nguyen	90ca5b1fde	Fill LocalCheckpointTracker with Lucene commit (#34474 ) Today we rely on the LocalCheckpointTracker to ensure there are no duplicates when enabling the optimization using max_seq_no_of_updates. The problem is that the LocalCheckpointTracker is not fully reloaded when opening an engine with an out-of-order index commit. Suppose the starting commit has seq#0 and seq#2; then the current LocalCheckpointTracker would return "false" when asked if seq#2 was processed before, although seq#2 is in the commit. This change scans the existing sequence numbers in the starting commit, then marks these as completed in the LocalCheckpointTracker to ensure a consistent state between the LocalCheckpointTracker and the Lucene commit.	2018-10-19 12:38:06 -04:00
Martijn van Groningen	56d4f69718	Renamed remaining leader_cluster_alias / cluster_alias to leader_cluster	2018-10-19 07:59:56 +02:00
Martijn van Groningen	44b461aff2	[CCR] Make leader cluster a required argument. (#34580 ) This change makes it no longer possible to follow / auto follow without specifying a leader cluster. If a local index needs to be followed then `cluster.remote.*.seeds` should point to nodes in the local cluster. Closes #34258	2018-10-19 07:41:46 +02:00
Martijn van Groningen	0d62f6102c	[CCR] Split cluster alias from leader index field into its own field in follow APIs (#34366 )	2018-10-18 12:11:48 +02:00
Jason Tedor	3e067123a1	Remove dead methods from ChainIT This commit removes some unused methods from ChainIT.	2018-10-16 10:45:33 -04:00
Martijn van Groningen	a1ec91395c	Changed CCR internal integration tests to use a leader and follower cluster instead of a single cluster (#34344 ) The `AutoFollowTests` need to restart the clusters between tests, because they use auto follow stats in assertions. Auto follow stats are only reset by stopping the elected master node. Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api. * Renamed AutoFollowTests to AutoFollowIT, because it is an integration test. Renamed ShardChangesIT to IndexFollowingIT, because shard changes is the name of an internal api and isn't a good name for an integration test. * move creation of NodeConfigurationSource to a separate method * Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.	2018-10-16 14:45:46 +02:00
Jason Tedor	e0b6721df4	Add dedicated test for chain replication (#34497 ) This commit adds a dedicated test that chain replication leader -> middle -> follow is successful.	2018-10-16 06:21:28 -04:00
Martijn van Groningen	f7df8718b9	[CCR] Don't fail shard follow tasks in case of a non-retryable error (#34404 )	2018-10-16 07:44:15 +02:00
Martijn van Groningen	51eca14288	[TEST] Make sure there are shards started so that `ESIntegTestCase#assertSameDocIdsOnShards()` does not fail with shard not found.	2018-10-15 10:24:28 +02:00
Martijn van Groningen	74dc2da873	Change shard changes api's threadpool from get to search (#34421 )	2018-10-15 08:09:00 +01:00
Nhat Nguyen	429c29e833	CCR/TEST: AwaitsFix testFailOverOnFollower Tracked at #34412	2018-10-13 21:05:33 -04:00
Nhat Nguyen	7bc11a8099	Unmute testFollowIndexAndCloseNode This issue was resolved by #34288. Closes #33337 Relates #34288	2018-10-10 15:48:22 -04:00
Nhat Nguyen	33791ac27c	CCR: Following primary should process operations once (#34288 ) Today we rewrite the operations from the leader with the term of the following primary because the follower should own its history. The problem is that a newly promoted primary may re-assign its term to operations which were replicated to replicas before by the previous primary. If this happens, some operations with the same seq_no may be assigned different terms. This is not good for the future optimistic locking using a combination of seqno and term. This change ensures that the primary of a follower only processes an operation if that operation was not processed before. The skipped operations are guaranteed to be delivered to replicas via either primary-replica resync or peer-recovery. However, the primary must not acknowledge until the global checkpoint is at least the highest seqno of all skipped ops (i.e., they all have been processed on every replica). Relates #31751 Relates #31113	2018-10-10 15:39:57 -04:00
Martijn van Groningen	268e134121	renamed test class	2018-10-08 15:05:50 +02:00
Martijn van Groningen	c6c83d19f7	[CCR] Clear fetch exceptions if an empty but successful shard changes response returns (#34256 ) Also fixed ShardFollowNodeTaskTests to not return ops when responseSize is empty. Otherwise ops are returned when no ops are expected to be returned. Co-authored-by: Jason Tedor <jason@tedor.me>	2018-10-06 07:53:37 -04:00
Martijn van Groningen	899e48395b	[CCR] Change unfollow API's privilege scheme. (#34175 ) Unfollow should be allowed / disallowed on a per index level instead of cluster level. Also renamed `create_follow_index` index privilege to `manage_follow_index` privilege and include unfollow and close APIs.	2018-10-06 07:38:28 -04:00
Jason Tedor	7d57bdb3a0	Follow stats structure (#34301 ) This commit modifies the follow stats API response structure to more clearly highlight the meaning of the higher-level fields. In particular, previously the response had a top-level key for each index. Instead, we nest the indices under an "indices" field, which is now an array. The values in this array are objects containing two fields: "index", which is the name of the follower index, and "shards", which is an array where each value is the follower stats for that shard. That is, we have gone from: { "bar": [ { "shard_id": 0... }... ]... } to { "indices": [ { "index": "bar", "shards": [ { "shard_id": 0... }... ] }... ] }	2018-10-05 06:38:20 -04:00
Jason Tedor	7478167d60	Rename CCR stats implementation (#34300 ) In the CCR docs we want to refer to the endpoint that returns following stats as the follow stats API. This commit renames the internal implementation of this endpoint to reflect this usage.	2018-10-05 06:25:24 -04:00
Nhat Nguyen	d7893fd1e4	TEST: Mute testFollowIndexAndCloseNode Tracked at #33337	2018-10-02 17:20:31 -04:00
Martijn van Groningen	7f5c2f1050	[CCR] Validate follower index historyUUIDs (#34078 ) The follower index shard history UUID will be fetched from the indices stats API when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations API will fail if the provided history UUID does not match the actual history UUID. No longer record the leader history UUID in shard follow task params, but rather use the leader history UUIDs directly from the follower index's custom metadata. The resume follow API will still fail if leader index shard history UUIDs are missing. Closes #33956	2018-10-02 18:01:06 +02:00
Martijn van Groningen	d12a64eac2	[CCR] Only use primary shards and get expected count from leader index (#34186 ) Closes #34173	2018-10-01 20:13:16 +02:00
Nhat Nguyen	a02debadfe	TEST: Unmute testFollowIndexAndCloseNode Since #34099, the FollowingEngine will skip an operation which was already processed before. With that change, it should be okay to unmute testFollowIndexAndCloseNode.	2018-10-01 11:59:33 -04:00
Jason Tedor	80f7c1dcc9	Fix compilation in unfollow action tests This arose when two commits were pushed at roughly the same time, both of which compiled successfully against master, but not when taken together. This commit fixes a reference in one of the commits that was changed in the other commit.	2018-09-30 14:30:08 -04:00
Jason Tedor	1893765055	Change CCR stats endpoint to be index-centric (#34169 ) This commit modifies the CCR stats endpoint for indices to be /{index}/_ccr/stats. This makes this endpoint consistent with other index-centric endpoints like indices stats.	2018-09-30 14:29:32 -04:00
Jason Tedor	e2bd2028d8	Allow specifying shard changes batch sizes in bytes (#34168 ) This commit changes the shard changes requests so that batch sizes can be specified using byte units (e.g., 4mb) instead of a raw byte value.	2018-09-30 14:22:22 -04:00
Martijn van Groningen	7c91c7a638	fixed test compile error	2018-09-30 19:31:30 +02:00
Martijn van Groningen	b1a27b2e6b	[CCR] Add unfollow API (#34132 ) The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients. For the unfollow API to work, index following needs to be stopped and the index needs to be closed. Closes #33931	2018-09-30 19:19:34 +02:00
Nhat Nguyen	ad61398879	CCR: Optimize indexing ops using seq_no on followers (#34099 ) This change introduces the indexing optimization using sequence numbers in the FollowingEngine. This optimization uses the max_seq_no_updates which is tracked on the primary of the leader and replicated to replicas and followers. Relates #33656	2018-09-28 20:42:26 -04:00
Martijn van Groningen	a984f8afb3	[CCR] Validate index privileges prior to following an index (#33758 ) Prior to following an index in the follow API, check whether the current user has sufficient privileges in the leader cluster to read and monitor the leader index. Also check this in the create and follow API prior to creating the follow index. Also introduced a READ_CCR cluster privilege that includes the minimal cluster-level actions that are required for CCR in the leader cluster, so that a user can follow indices in a cluster but not use the CCR admin APIs. Closes #33553 Co-authored-by: Jason Tedor <jason@tedor.me>	2018-09-28 17:51:23 +02:00

312 Commits