OpenSearch

Commit Graph

Author	SHA1	Message	Date
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014) Scheduler.schedule(...) would previously assume that the caller handles exceptions by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
Alpar Torok	b7de8e1d1e	Mute failing test Tracking #38100	2019-01-31 17:01:16 +02:00
Alpar Torok	f15d7b9b91	Mute failing test Tracking #38027	2019-01-31 16:55:52 +02:00
Nhat Nguyen	1a93976ff7	Correct arg names when update mapping/settings from leader (#38063) These two arguments were named incorrectly and caused confusion.	2019-01-31 02:45:42 -05:00
Tim Brooks	b88bdfe958	Add dispatching to `HandledTransportAction` (#38050 ) This commit allows implementors of the `HandledTransportAction` to specify what thread the action should be executed on. The motivation for this commit is that certain CCR requests should be performed on the generic threadpool.	2019-01-30 15:40:49 -07:00
Tim Brooks	aeab55e8d1	Reduce flakiness of ccr recovery timeouts test (#38035) This fixes #38027. Currently we assert that all shards have failed. However, it is possible that some shards do not have segment files created yet. The action that we block is fetching these segment files, so it is possible that some shards successfully recover. This commit changes the assertion to ensure that at least some of the shards have failed.	2019-01-30 14:13:23 -07:00
Martijn van Groningen	5433af28e3	Fixed test bug, lastFollowTime is null if there are no follower indices.	2019-01-30 19:33:16 +01:00
Martijn van Groningen	f51bc00fcf	Added ccr to xpack usage infrastructure (#37256 ) * Added ccr to xpack usage infrastructure Closes #37221	2019-01-30 07:58:26 +01:00
Tim Brooks	55b916afc0	Ensure task metadata not null in follow test (#37993) This commit fixes a potential race in the IndexFollowingIT. Currently it is possible that we fetch the task metadata, it is null, and that throws a null pointer exception. assertBusy does not catch null pointer exceptions. This commit asserts that the metadata is not null.	2019-01-29 15:58:31 -07:00
Tim Brooks	f3f9cabd67	Add timeout for ccr recovery action (#37840) This is related to #35975. It adds an action timeout setting that allows timeouts to be applied to the individual transport actions that are used during a ccr recovery.	2019-01-29 12:29:06 -07:00
Tim Brooks	00ace369af	Use `CcrRepository` to init follower index (#35719 ) This commit modifies the put follow index action to use a CcrRepository when creating a follower index. It routes the logic through the snapshot/restore process. A wait_for_active_shards parameter can be used to configure how long to wait before returning the response.	2019-01-29 11:47:29 -07:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support for JSON logging (#36833) In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine. To populate additional fields node.id and cluster.uuid which are not available at start time, a cluster state update will have to be received and the values passed to log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one cluster state update. Once the update is received, the nodeId and clusterUUID are set in a static field in a NodeAndClusterIdConverter. The following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace. See ESJsonLayout.java for more details and field descriptions. The Docker log4j2 configuration is now almost the same as the one used for the ES binary. The only difference is that Docker is using console appenders, whereas ES is using file appenders. relates: #32850	2019-01-29 07:20:09 +01:00
Nhat Nguyen	557fcf915e	Wait for mapping in testReadRequestsReturnLatestMappingVersion (#37886) If the index request is executed before the mapping update is applied on the IndexShard, the index request will perform a dynamic mapping update. This mapping update will time out (i.e., ProcessClusterEventTimeoutException) because the latch is not open. This leads to the failure of the index request and the test. This commit makes sure the mapping is ready before we execute the index request. Closes #37807	2019-01-28 15:25:56 -05:00
Martijn van Groningen	4e1a779773	Prepare ShardFollowNodeTask to bootstrap when it falls behind leader shard (#37562) * Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing. * Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved. * Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metadata header. Relates to #35975	2019-01-28 09:30:04 +01:00
Dimitrios Liappis	290c6637c2	Refactor into appropriate uses of scheduleUnlessShuttingDown (#37709 ) Replace `threadPool().schedule()` / catch `EsRejectedExecutionException` pattern with direct calls to `ThreadPool#scheduleUnlessShuttingDown()`. Closes #36318	2019-01-28 10:01:26 +02:00
Julie Tibshirani	7c130d235a	Mute CcrRepositoryIT#testFollowerMappingIsUpdated Tracked in #37887.	2019-01-25 14:55:47 -08:00
Tanguy Leroux	f1f54e0f61	TransportUnfollowAction should increase settings version (#37859 ) The TransportUnfollowAction updates the index settings but does not increase the settings version to reflect that change. This issue has been caught while working on the replication of closed indices (#33888). The IndexFollowingIT.testUnfollowIndex() started to fail and this specific assertion tripped. It does not happen on master branch today because index metadata for closed indices are never updated in IndexService instances, but this is something that is going to change with the replication of closed indices.	2019-01-25 16:31:26 +01:00
Martijn van Groningen	1151f3b3ff	Fail with a dedicated exception if remote connection is missing or connectivity to the remote connection is failing (#37767) Relates to #37681	2019-01-25 08:53:18 +01:00
Nhat Nguyen	76fb573569	Do not allow put mapping on follower (#37675 ) Today, the mapping on the follower is managed and replicated from its leader index by the ShardFollowTask. Thus, we should prevent users from modifying the mapping on the follower indices. Relates #30086	2019-01-24 12:13:00 -05:00
David Roberts	f12bfb4684	Mute FollowerFailOverIT testReadRequestsReturnsLatestMappingVersion Due to https://github.com/elastic/elasticsearch/issues/37807	2019-01-24 09:58:50 +00:00
Martijn van Groningen	2908ca1b35	Fix index filtering in follow info api. (#37752 ) The filtering by follower index was completely broken. Also the wrong persistent tasks were selected, causing the wrong status to be reported. Closes #37738	2019-01-24 08:50:23 +01:00
Nhat Nguyen	0096f1b2e4	Ensure changes requests return the latest mapping version (#37633) Today we keep the mapping on the follower in sync with the leader's using the mapping version from changes requests. There are two rare cases where the mapping on the follower is not synced properly: 1. The returned mapping version (from ClusterService) is older than the actual mapping. This happens because we expose the latest cluster state in ClusterService after applying it to IndexService. 2. It's possible for the FollowTask to receive a mapping older than the min_required_mapping. In that case, it should fetch the mapping again; otherwise, the follower won't have the right mapping. Relates to #31140	2019-01-23 13:41:13 -05:00
Tim Brooks	eb43ab6d60	Implement leader rate limiting for file restore (#37677 ) This is related to #35975. This commit implements rate limiting on the leader side using the CombinedRateLimiter.	2019-01-22 10:57:37 -07:00
Martijn van Groningen	ef2f5e4a13	Follow stats api should return a 404 when requesting stats for a non existing index (#37220 ) Currently it returns an empty response with a 200 response code. Closes #37021	2019-01-22 12:48:05 +01:00
Ryan Ernst	9a34b20233	Simplify integ test distribution types (#37618 ) The integ tests currently use the raw zip project name as the distribution type. This commit simplifies this specification to be "default" or "oss". Whether zip or tar is used should be an internal implementation detail of the integ test setup, which can (in the future) be platform specific.	2019-01-21 12:37:17 -08:00
Martijn van Groningen	88f4b0a326	Do not set fatal exception when shard follow task is stopped. (#37603 ) When shard follow task is cancelled while fetching operations then the fatal exception field should not be set.	2019-01-21 07:54:51 +01:00
Tim Brooks	fe753ee1d2	Do not add index event listener if CCR disabled (#37432) Currently we add the CcrRestoreSourceService as an index event listener. However, if ccr is disabled, this service is null and we attempt to add a null listener, throwing an exception. This commit only adds the listener if ccr is enabled.	2019-01-18 16:31:21 -07:00
Tim Brooks	cd41289396	Add local session timeouts to leader node (#37438) This is related to #35975. This commit adds timeout functionality to the local session on a leader node. When a session is started, a timeout is scheduled using a repeatable runnable. If the session is not accessed in between two runs, the session is closed. When the session is closed, the repeating task is cancelled. Additionally, this commit moves session uuid generation to the leader cluster and renames the PutCcrRestoreSessionRequest to StartCcrRestoreSessionRequest to reflect that change.	2019-01-18 14:48:20 -07:00
Martijn van Groningen	6846666b6b	Add ccr follow info api (#37408 ) * Add ccr follow info api This api returns all follower indices and per follower index the provided parameters at put follow / resume follow time and whether index following is paused or active. Closes #37127 * iter * [DOCS] Edits the get follower info API * [DOCS] Fixes link to remote cluster * [DOCS] Clarifies descriptions for configured parameters	2019-01-18 16:37:21 +01:00
Tim Brooks	978c818d0f	Use RestoreSnapshotRequest in CcrRepositoryIT Commit #37535 removed an internal restore request in favor of the RestoreSnapshotRequest. Commit #37449 added a new test that used the internal restore request. This commit modifies the new test to use the RestoreSnapshotRequest.	2019-01-17 15:31:27 -07:00
Tim Brooks	b6f06a48c0	Implement follower rate limiting for file restore (#37449 ) This is related to #35975. This commit implements rate limiting on the follower side using a new class `CombinedRateLimiter`.	2019-01-17 14:58:46 -07:00
Armin Braun	381d035cd6	Remove Redundant RestoreRequest Class (#37535 ) * Same as #37464 but for the restore side	2019-01-17 22:23:23 +01:00
Martijn van Groningen	b85bfd3e17	Added fatal_exception field for ccr stats in monitoring mapping. (#37563 )	2019-01-17 14:04:41 +01:00
Martijn van Groningen	99b09845da	Moved ccr integration to the package with other ccr integration tests.	2019-01-17 13:57:56 +01:00
Przemyslaw Gomulka	5e94f384c4	Remove the use of AbstractLifecycleComponent constructor (#37488) The AbstractLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constructor of its superclass. It no longer extends AbstractComponent, so there is no need for this constructor. There is also no need for AbstractLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to the super constructor. This is part 1, which will be backported to 6.x with a migration guide/deprecation log. Part 2 will have this constructor removed in 7. relates #35560 relates #34488	2019-01-16 09:05:30 +01:00
Martijn van Groningen	9554b2fecb	When removing an AutoFollower also mark it as removed. (#37402) Currently when there are no more auto follow patterns for a remote cluster then the AutoFollower instance for this remote cluster will be removed. If a new auto follow pattern for this remote cluster gets added quickly enough after the last delete then there may be two AutoFollower instances running for this remote cluster instead of one. Each AutoFollower instance stops automatically after it sees in the start() method that there are no more auto follow patterns for the remote cluster it is tracking. However, when an auto follow pattern gets removed and then added back quickly enough, the old AutoFollower may never detect that at some point there were no auto follow patterns for the remote cluster it is monitoring. The creation and removal of an AutoFollower instance happens independently in the `updateAutoFollowers()` method as part of a cluster state update. By adding the `removed` field, an AutoFollower instance will not miss the fact that there were no auto follow patterns at some point in time. The `updateAutoFollowers()` method now marks an AutoFollower instance as removed when it sees that there are no more patterns for a remote cluster. The `updateAutoFollowers()` method can then safely start a new AutoFollower instance. Relates to #36761	2019-01-15 16:24:19 +01:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Tim Brooks	5c68338a1c	Implement ccr file restore (#37130 ) This is related to #35975. It implements a file based restore in the CcrRepository. The restore transfers files from the leader cluster to the follower cluster. It does not implement any advanced resiliency features at the moment. Any request failure will end the restore.	2019-01-14 13:07:55 -07:00
Martijn van Groningen	de852765d6	unmuted test Relates to #37014	2019-01-14 14:27:42 +01:00
Martijn van Groningen	e4391afd98	Test fix, wait for auto follower to have stopped in the background Relates to #36761	2019-01-11 17:26:17 +01:00
Martijn van Groningen	6d81e7c3e7	[CCR] FollowingEngine should fail with 403 if operation has no seqno assigned (#37213 ) Fail with a 403 when indexing a document directly into a follower index. In order to test this change, I had to move specific assertions into a dedicated class and disable assertions for that class in the rest qa module. I think that is the right trade off.	2019-01-10 15:54:34 +01:00
Martijn van Groningen	df488720e0	[CCR] Make shard follow tasks more resilient for restarts (#37239) If a running shard follow task needs to be restarted and the remote connection seeds have changed then a shard follow task currently fails with a fatal error. The change creates the remote client lazily and adjusts the errors a shard follow task should retry. This issue was found in test failures in the recently added ccr rolling upgrade tests. The reason why this issue occurs more frequently in the rolling upgrade test is that ccr is set up in local mode (so the remote connection seed will become stale) and all nodes are restarted, which forces the shard follow tasks to get restarted at some point during the test. Note that these tests cannot be enabled yet, because this change will need to be backported to 6.x first (otherwise the issue still occurs on non upgraded nodes). I also changed the RestartIndexFollowingIT to set up the remote cluster via persistent settings and to also restart the leader cluster. This way what happens during the ccr rolling upgrade qa tests also happens in this test. Relates to #37231	2019-01-10 15:02:30 +01:00
Martijn van Groningen	1a41d84536	[CCR] Resume follow Api should not require a request body (#37217 ) Closes #37022	2019-01-10 09:48:26 +01:00
Martijn van Groningen	9122585359	[CCR] Added more logging.	2019-01-09 12:17:47 +01:00
Alpar Torok	6344e9a3ce	Testing conventions: add support for checking base classes (#36650 )	2019-01-08 13:39:03 +02:00
Jason Tedor	c8c596cead	Introduce retention lease expiration (#37195 ) This commit implements a straightforward approach to retention lease expiration. Namely, we inspect which leases are expired when obtaining the current leases through the replication tracker. At that moment, we clean the map that persists the retention leases in memory.	2019-01-07 22:03:52 -08:00
Jason Tedor	c0f8c89172	Introduce shard history retention leases (#37167) This commit is the first in a series which will culminate with fully-functional shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fall back to expensive file copy operations if shard history is not available from a certain point. These consumers include following indices in cross-cluster replication, and local shard recoveries. A future consumer will be the changes API. Further, index lifecycle management requires coordinating with some of these consumers otherwise it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem. Shard history retention leases are a property of the replication group managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases they have a limited lifespan that will expire if not renewed. The idea of these leases is that all operations above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away operations that are soft deleted). These leases will be periodically persisted to Lucene and restored during recovery, and broadcast to replicas under certain circumstances. This commit is merely putting the basics in place. This first commit only introduces the concept and integrates their use with the soft delete retention policy. We add some tests to demonstrate the basic management is correct, and that the soft delete policy is correctly influenced by the existence of any retention leases. We make no effort in this commit to implement any of the following: - timestamps - expiration - persistence to and recovery from Lucene - handoff during primary relocation - sharing retention leases with replicas - exposing leases in shard-level statistics - integration with cross-cluster replication These will occur individually in follow-up commits.	2019-01-07 07:43:57 -08:00
Jim Ferenczi	e38cf1d0dc	Add the ability to set the number of hits to track accurately (#36357) In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer, this option allows computing a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected. Relates #33028	2019-01-04 20:36:49 +01:00
Luca Cavanna	c1beb95aa1	Mute LocalIndexFollowingIT#testRemoveRemoteConnection Relates to #37014	2018-12-28 16:39:36 +01:00
Nhat Nguyen	7580d9d925	Make SourceToParse immutable (#36971 ) Today the routing of a SourceToParse is assigned in a separate step after the object is created. We can easily forget to set the routing. With this commit, the routing must be provided in the constructor of SourceToParse. Relates #36921	2018-12-24 14:06:50 -05:00
Martijn van Groningen	561b704129	[CCR] AutoFollowCoordinator and follower index already created (#36540 ) The AutoFollowCoordinator should be resilient to the fact that the follower index has already been created and in that case it should only update the auto follow metadata with the fact that the follower index was created. Relates to #33007	2018-12-24 10:16:38 +01:00
Martijn van Groningen	44fe265d82	[CCR] Added auto_follow_exception.timestamp field to auto follow stats (#36947 ) Currently auto follow stats users are unable to see whether an auto follow error was recent or old. The new timestamp field will help user distinguish between old and new errors.	2018-12-24 07:53:51 +01:00
Martijn van Groningen	4fb62fcba6	Make CCR resilient against missing remote cluster connections (#36682) Both index following and auto following should be resilient against missing remote connections. This happens in the case that they get accidentally removed by a user. When this happens, auto following and index following will retry to continue instead of failing with unrecoverable exceptions. Both the put follow and put auto follow APIs validate whether the remote cluster connection exists. The logic added in this change only applies when the remote connection gets removed during the lifetime of a follower index or auto follow pattern. This retry behavior is similar to how CCR deals with authorization errors. Closes #36667 Closes #36255	2018-12-24 07:28:34 +01:00
Martijn van Groningen	4ded4717fe	[CCR] Add `ccr.auto_follow_coordinator.wait_for_timeout` setting (#36714) This setting controls the wait for timeout the autofollow coordinator should use when sending cluster state requests to a remote cluster.	2018-12-21 09:36:40 +01:00
Tim Brooks	d9b2ed6135	Send clear session as routable remote request (#36805 ) This commit adds a RemoteClusterAwareRequest interface that allows a request to specify which remote node it should be routed to. The remote cluster aware client will attempt to route the request directly to this node. Otherwise it will send it as a proxy action to eventually end up on the requested node. It implements the ccr clean_session action with this client.	2018-12-20 17:43:12 -07:00
Tim Brooks	4cd570593d	Update index mappings when ccr restore complete (#36879) This is related to #35975. When the shard restore process is complete, the index mappings need to be updated to ensure that the data in the restored files is compatible with the follower mappings. This commit implements a mapping update as the final step in a shard restore.	2018-12-20 13:53:04 -07:00
Martijn van Groningen	b42074c1cc	[CCR] Report error if auto follower tries to auto follow a leader index with soft deletes disabled (#36886) Currently if a leader index with soft deletes disabled is auto followed then this index is silently ignored. This commit changes this behavior to mark these indices as auto followed and report an error, which is visible in auto follow stats. Marking the index as auto followed is important, because otherwise the auto follower will continuously try to auto follow and fail. Relates to #33007	2018-12-20 15:21:52 +01:00
Martijn van Groningen	7b1dfeff2e	Renamed `WHITE_LISTED_SETTINGS` to `NON_REPLICATED_SETTINGS` because the latter better describes the purpose of this field.	2018-12-20 15:08:04 +01:00
Martijn van Groningen	18691daebe	[TEST] Renamed ccr qa module.	2018-12-19 13:57:12 +01:00
Martijn van Groningen	3cc0cf03c6	[TEST] No need to specifically check licensesMetaData on master node.	2018-12-19 13:51:24 +01:00
Martijn van Groningen	a6af33ef0b	[TEST] Wait for license metadata to be installed	2018-12-19 13:03:45 +01:00
Alpar Torok	e9ef5bdce8	Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311 ) - Create a separate unitTest task instead of Gradle's built in - convert all configuration to use the new task - the built in task is now disabled	2018-12-19 08:25:20 +02:00
Tim Brooks	1fa105658e	Add CcrRestoreSourceService to track sessions (#36578) This commit is related to #36127. It adds a CcrRestoreSourceService to track the Engine.IndexCommitRef needed for in-process file restores. When a follower starts restoring a shard through the CcrRepository it opens a session with the leader through the PutCcrRestoreSessionAction. The leader responds to the request by telling the follower what files it needs to fetch for a restore. This is not yet implemented. Once the restore is complete, the follower closes the session with the DeleteCcrRestoreSessionAction.	2018-12-18 11:23:13 -07:00
Martijn van Groningen	1afcfc97bd	[TEST] Added more logging Relates to #36761	2018-12-18 16:01:02 +01:00
Boaz Leskes	5f76f39386	Rename seq# powered optimistic concurrency control parameters to ifSeqNo/ifPrimaryTerm (#36757) This PR renames the parameters previously introduced to the following: ### URL Parameters ``` PUT twitter/_doc/1?if_seq_no=501&if_primary_term=1 { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" } DELETE twitter/_doc/1?if_seq_no=501&if_primary_term=1 ``` ### Bulk API ``` POST _bulk { "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1", "if_seq_no": 501, "if_primary_term": 1 } } { "field1" : "value1" } { "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2", "if_seq_no": 501, "if_primary_term": 1 } } ``` ### Java API ``` IndexRequest.ifSeqNo(long seqNo) IndexRequest.ifPrimaryTerm(long primaryTerm) DeleteRequest.ifSeqNo(long seqNo) DeleteRequest.ifPrimaryTerm(long primaryTerm) ``` Relates #36148 Relates #10708	2018-12-18 14:35:18 +01:00
Martijn van Groningen	0ff1f1fa18	Muted tests. Relates to #36764	2018-12-18 13:39:01 +01:00
Martijn van Groningen	57e1a4bc9f	[TEST] Ensure shard follow tasks have really stopped. Relates to #36696	2018-12-18 10:43:27 +01:00
Tim Brooks	3dd5a5a3c5	Initialize startup `CcrRepositories` (#36730) Currently, the CcrRepositoryManager only listens for settings updates and installs new repositories. It does not install the repositories that are in the initial settings. This commit modifies the manager to install the initial repositories. Additionally, it modifies the ccr integration test to configure the remote leader node at startup, instead of using a settings update.	2018-12-17 13:19:32 -07:00
Martijn van Groningen	a181a25226	[CCR] Add time since last auto follow fetch to auto follow stats (#36542) For each remote cluster the auto follow coordinator starts an auto follower that checks the remote cluster state and determines whether an index needs to be auto followed. The time since last auto follow is reported per remote cluster and gives insight into whether the auto follow process is alive. Relates to #33007 Originates from #35895	2018-12-17 14:14:56 +01:00
Martijn van Groningen	f27d2c2927	[TEST] Pause index following at end of test, so that no unexpected failures happen at test teardown.	2018-12-17 07:55:27 +01:00
Nhat Nguyen	2028c2af14	TEST: Do not assert max_seq_of_updates if promotion If a primary promotion happens in the test testAddRemoveShardOnLeader, the max_seq_no_of_updates_or_deletes on a new primary might be higher than the max_seq_no_of_updates_or_deletes on the replicas or copies of the follower. Relates #36607	2018-12-16 16:48:04 -05:00
Martijn van Groningen	97107e99e8	Moved test to its rightful place.	2018-12-16 13:57:51 +01:00
Boaz Leskes	733a6d34c1	Add seq no powered optimistic locking support to the index and delete transport actions (#36619) This commit adds support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) in the delete and index transport actions and requests. A follow up will come with adding sequence numbers to the update and get results. Relates #36148 Relates #10708	2018-12-15 17:59:57 +01:00
Albert Zaharovits	a30e8c2fa3	HasPrivilegesResponse use TreeSet for fields (#36329 ) For class fields of type collection whose order is not important and for which duplicates are not permitted we declare them as `Set`s. Usually the definition is a `HashSet` but in this case `TreeSet` is used instead to aid testing.	2018-12-15 08:34:54 +02:00
Martijn van Groningen	68a674ef1f	[CCR] Fix follow stats API's follower index filtering feature (#36647) Currently follow stats for all follower indices are always returned, even if follow stats for only specific indices are requested.	2018-12-14 19:39:30 +01:00
Armin Braun	c5b3ac5578	SNAPSHOTS: Allow Parallel Restore Operations (#36397 ) * Enable parallel restore operations * Add uuid to restore in progress entries to uniquely identify them * Adjust restore in progress entries to be a map in cluster state * Added tests for: * Parallel restore from two different snapshots * Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator * Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers	2018-12-14 11:39:23 +01:00
Nhat Nguyen	a4b32f1143	Remove concurrency in testFailLeaderReplicaShard (#36607) testFailLeaderReplicaShard periodically fails because we concurrently index to the leader group and close one of its replicas. If a replication request hits a closing shard, we will fail that shard; however, failing a shard is not supported by the test framework - this makes the test fail.	2018-12-13 19:02:13 -05:00
Boaz Leskes	f6b5d7e013	Add sequence numbers based optimistic concurrency control support to Engine (#36467) This commit adds support to engine operations for resolving and verifying the sequence number and primary term of the last modification to a document before performing an operation. This is infrastructure to move our [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) API to use sequence numbers instead of internal versioning. Relates #36148 Relates #10708	2018-12-13 08:08:40 +01:00
Martijn van Groningen	883940ad92	[CCR] Change AutofollowCoordinator to use wait_for_metadata_version (#36264) Changed AutofollowCoordinator to make use of the wait_for_metadata_version feature in the cluster state API and removed the hard coded poll interval. Originates from #35895 Relates to #33007	2018-12-12 12:47:24 +01:00
Martijn van Groningen	4a825e2e86	[CCR] Clean followed leader index UUIDs in auto follow metadata (#36408 ) The auto follow coordinator keeps track of the UUIDs of indices that it has followed. The index UUID strings need to be cleaned up in the case that these indices are removed in the remote cluster. Relates to #33007	2018-12-12 09:55:37 +01:00
Nhat Nguyen	51800de2a8	Enable soft-deletes by default on 7.0.0 or later (#36141 ) This change enables soft-deletes by default on ES 7.0.0 or later. Relates #33222 Co-authored-by: Jason Tedor <jason@tedor.me>	2018-12-11 18:58:49 -05:00
Nhat Nguyen	f23701406b	CCR/TEST: Enable soft-deletes in ShardChangesActionTests Relates #36446	2018-12-11 15:00:09 -05:00
Andrey Ershov	8b821706cc	Switch more tests to zen2 (#36367 ) 1. CCR tests work without any changes 2. `testDanglingIndices` requires changes to the source code (added TODO). 3. `testIndexDeletionWhenNodeRejoins` because it's using just two nodes, adding the node to exclusions is needed on restart. 4. `testCorruptTranslogTruncationOfReplica` starts dedicated master one, because otherwise, the cluster does not form, if nodes are stopped and one node is started back. 5. `testResolvePath` needs TEST cluster, because all nodes are stopped at the end of the test and it's not possible to perform checks needed by SUITE cluster. 6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2 retries snapshot creation as soon as the network partition heals. This results in a race between creating the snapshot and the test cleanup logic (deleting the index). Zen1, on the other hand, also schedules a retry, but it takes some time after the network partition heals, so the cleanup logic executes later and the test passes. A check that the snapshot is eventually created is added to the end of the test.	2018-12-11 17:12:17 +01:00
Martijn van Groningen	633ab24017	[CCR] Restructured QA modules (#36404 ) Renamed the follow qa modules: `multi-cluster-downgraded-to-basic-license` to `downgraded-to-basic-license` `multi-cluster-with-non-compliant-license` to `non-compliant-license` `multi-cluster-with-security` to `security` Moved the `chain` module into the `multi-cluster` module and changed the `multi-cluster` to start 3 clusters. Followup from #36031	2018-12-09 19:34:48 +01:00
Nhat Nguyen	95bafb0593	TEST: Always enable soft-deletes in ShardChangesTests	2018-12-09 02:57:13 -05:00
Tim Brooks	8a53f2b464	Implement basic `CcrRepository` restore (#36287 ) This is related to #35975. It implements a basic restore functionality for the CcrRepository. When the restore process is kicked off, it configures the new index as expected for a follower index. This means that the index has a different uuid, the version is not incremented, and the Ccr metadata is installed. When the restore shard method is called, an empty shard is initialized.	2018-12-07 15:27:04 -07:00
Nhat Nguyen	f2df0a5be4	Remove LocalCheckpointTracker#resetCheckpoint (#34667 ) In #34474, we added a new assertion to ensure that the LocalCheckpointTracker is always consistent with Lucene index. However, resetting LocalCheckpointTracker in testDedupByPrimaryTerm causes this assertion to be violated. This commit removes resetCheckpoint from LocalCheckpointTracker and rewrites testDedupByPrimaryTerm without resetting the local checkpoint. Relates #34474	2018-12-07 12:22:20 -05:00
Ryan Ernst	37b3fc383f	Build: Use explicit deps on test tasks for check (#36325 ) This commit moves back to use explicit dependsOn for test tasks on check. Not all tasks extending RandomizedTestingTask should be run by check directly.	2018-12-06 14:13:49 -08:00
Yannick Welsch	a0ae1cc987	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 23:13:12 +01:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equal to `eq`) or a lower bound of the total (in which case it is equal to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
Yannick Welsch	cc11953724	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 16:55:45 +01:00
Tim Brooks	068c856e88	Rename internal repository actions to be internal (#36244 ) This is a follow-up to #36086. It renames the internal repository actions to be prefixed by "internal". This allows the system user to execute the actions. Additionally, this PR stops casting Client to NodeClient. The client we have is a NodeClient so executing the actions will be local.	2018-12-05 08:11:47 -07:00
Yannick Welsch	b20497560c	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 14:06:38 +01:00
Martijn van Groningen	a264cb6ddb	Refactor AutoFollowCoordinator to track leader indices per remote cluster (#36031 ) and replaced poll interval setting with a hardcoded poll interval. The hard coded interval will be removed in a follow up change to make use of cluster state API's wait_for_metadata_version. Previously, auto following was bootstrapped from the thread pool scheduler, but now auto followers for new remote clusters are bootstrapped when a new cluster state is published. Originates from #35895 Relates to #33007	2018-12-05 13:39:14 +01:00
Alpar Torok	60e45cd81d	Testing conventions task part 2 (#36107 ) Closes #35435 - make it easier to add additional testing tasks with the proper configuration and add some where they were missing. - mute or fix failing tests - add a check as part of testing conventions to find classes not included in any testing task.	2018-12-05 14:20:01 +02:00
Martijn van Groningen	11935cd480	Replace Streamable w/ Writeable in BaseTasksResponse and subclasses (#36176 ) This commit replaces usages of Streamable with Writeable for the BaseTasksResponse / TransportTasksAction classes and subclasses of these classes. Note that where possible response fields were made final. Relates to #34389	2018-12-05 13:14:10 +01:00
Yannick Welsch	42457b5960	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 11:39:38 +01:00
Martijn van Groningen	b9707c29a1	[CCR] Change get autofollow patterns API response format (#36203 ) The current response format is: ``` { "pattern1": { ... }, "pattern2": { ... } } ``` The new format is: ``` { "patterns": [ { "name": "pattern1", "pattern": { ... } }, { "name": "pattern2", "pattern": { ... } } ] } ``` This format is more structured and more friendly for parsing and generating specs. This is a breaking change, but it is better to do this now while ccr is still a beta feature than later. Follow up from #36049	2018-12-05 08:41:27 +01:00
Tim Brooks	8bde608979	Register CcrRepository based on settings update (#36086 ) This commit adds an empty CcrRepository snapshot/restore repository. When a new cluster is registered in the remote cluster settings, a new CcrRepository is registered for that cluster. This is implemented using a new concept of "internal repositories". RepositoryPlugin now allows implementations to return factories for "internal repositories". The "internal repositories" are different from normal repositories in that they cannot be registered through the external repository api. Additionally, "internal repositories" are local to a node and are not stored in the cluster state. The repository will be unregistered if the remote cluster is removed.	2018-12-04 14:36:50 -07:00
Yannick Welsch	70c361ea5a	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-04 21:26:11 +01:00
Adrien Grand	0df08dd458	Set Lucene version upon index creation. (#36038 ) It is important that all shards of a given index have the same `indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to be considered illegal. At the moment, we use the latest Lucene version when creating a shard, which could cause shards to have different created versions eg. in case of forced allocation. This commit makes sure to reuse the appropriate Lucene version in order to avoid such issues. Closes #33826	2018-12-04 17:53:20 +01:00
Martijn van Groningen	6e1ff31222	[CCR] AutoFollowCoordinator should tolerate that auto follow patterns may be removed (#35945 ) AutoFollowCoordinator should take into account that, after auto following an index and while recording that a leader index has been followed, the auto follow pattern may have been removed via the delete auto follow patterns api. Also fixed a bug so that, when a remote cluster connection has been removed, the auto follow coordinator does not die when it tries to get a remote client for that cluster. Closes #35480	2018-12-04 15:55:15 +01:00
Yannick Welsch	80ee7943c9	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-04 09:37:09 +01:00
Martijn van Groningen	43773a32a4	Replace Streamable w/ Writeable in BaseTasksRequest and subclasses (#35854 ) * Replace Streamable w/ Writeable in BaseTasksRequest and subclasses This commit replaces usages of Streamable with Writeable for the BaseTasksRequest / TransportTasksAction classes and subclasses of these classes. Relates to #34389	2018-12-03 08:04:29 +01:00
Martijn van Groningen	32f7fbd9f0	[TEST] Set 'index.unassigned.node_left.delayed_timeout' to 0 in ccr tests Some tests kill nodes and otherwise it would take 60s by default for replicas to get allocated and that is longer than we wait for getting in a green state in tests. Relates to #35403	2018-11-30 11:03:36 +01:00
David Turner	7f257187af	[Zen2] Update default for USE_ZEN2 to true (#35998 ) Today the default for USE_ZEN2 is false and it is overridden in many places. By defaulting it to true we can be sure that the only places in which Zen2 does not work are those in which it is explicitly set to false.	2018-11-29 12:18:35 +00:00
Martijn van Groningen	1390f366d4	[CCR] Only auto follow indices when all primary shards have started (#35814 ) This change adds an extra check that verifies that all primary shards of an index that is about to be auto followed have been started. If not all primary shards have been started for an index, then the next auto follow run will try to auto follow this index again. Closes #35480	2018-11-29 09:46:09 +01:00
Jason Tedor	a3186e4a32	Deprecate X-Pack centric license endpoints (#35959 ) This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.	2018-11-28 08:24:35 -05:00
Jason Tedor	2887680acb	Avoid NPE in follower stats when no tasks metadata (#35802 ) When there is no persistent tasks metadata we could hit a null pointer exception when executing a follower stats request. This is because we inspect the persistent tasks metadata. Yet, if no tasks have been registered, this is null (as opposed to empty). We need to avoid de-referencing the persistent tasks metadata in this case. That is what this commit does, and we add a test for this situation.	2018-11-21 19:16:28 -05:00
Tim Brooks	a989b675b5	Remove NPE from IndexFollowingIT (#35717 ) Currently there is a common NPE in the IndexFollowingIT that does not indicate the test failing. This is when a cluster state listener is called and certain index metadata is not yet available. This commit checks that the metadata is not null before performing the logic that depends on the metadata.	2018-11-19 20:38:49 -07:00
Arthur Gavlyukovskiy	022726011c	Remove use of AbstractComponent in server (#35444 ) Removed extending of AbstractComponent and changed logger usage to explicit declaration. Abstract classes still have logger declaration using this.getClass() in order to show implementation class name in its logs. See #34488	2018-11-16 16:10:32 -05:00
Martijn van Groningen	0487181d0f	[TEST] Force flush to ensure multiple segments. Relates to #35333	2018-11-13 14:58:17 +01:00
Jason Tedor	3859d21661	Fix the names of CCR stats endpoints in usage API (#35438 ) This commit fixes the names of the CCR stats endpoints reported in the usage API.	2018-11-12 10:27:12 -05:00
Martijn van Groningen	ef10461caf	[TEST] Instead of ignoring the ccr downgrade to basic license qa test avoid the assertions that check the log files, because that does not work on Windows. The rest of the test is still useful and should work on Windows CI. Currently on Windows CI this qa module fails because there is just one test and that test is ignored if OS is Windows.	2018-11-12 10:17:33 +01:00
Martijn van Groningen	ae2af20ae5	[CCR] Validate remote cluster license as part of put auto follow pattern api call (#35364 ) Validate remote cluster license as part of put auto follow pattern api call, in addition to the validation that happens when the auto follow coordinator starts auto following indices in the leader cluster. Also added qa module that tests what happens to ccr after downgrading to basic license. Existing active follow indices should continue to follow, but the auto follow feature should not pick up new leader indices.	2018-11-09 17:43:43 +01:00
Martijn van Groningen	807ce10f73	[TEST] Increased timeout for verifying ccr monitoring.	2018-11-09 15:40:15 +01:00
Martijn van Groningen	fba811fa3a	[TEST] increased the number of index and delete ops to make it less likely that all ops exist as soft delete docs.	2018-11-09 15:31:51 +01:00
Martijn van Groningen	83152b3835	[CCR] Get all auto follow patterns and no auto follow metadata (#35381 ) Return empty response when querying all auto follow patterns, but there is no auto follow metadata.	2018-11-09 14:24:27 +01:00
Martijn van Groningen	07a69a528b	[CCR] Rename leaderClient variables and parameters to remoteClient (#35368 )	2018-11-08 16:26:14 +01:00
Martijn van Groningen	8a85251da0	[CCR] Auto follow Coordinator fetch cluster state in system context (#35120 ) Auto follow Coordinator should fetch the leader cluster state using system context.	2018-11-08 10:48:27 +01:00
Martijn van Groningen	2f2090f562	[CCR] Adjust list of dynamic index settings that should be replicated (#35195 ) Adjust list of dynamic index settings that should be replicated and added a test that verifies whether builtin dynamic index settings are classified as replicated or non replicated (whitelisted).	2018-11-07 21:59:58 -05:00
Jason Tedor	4f4fc3b8f8	Replicate index settings to followers (#35089 ) This commit uses the index settings version so that a follower can replicate index settings changes as needed from the leader. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2018-11-07 21:20:51 -05:00
Martijn van Groningen	314b9ca44c	[CCR] Enforce auto follow pattern name restrictions (#35197 ) An auto follow pattern name: * cannot start with `_` * cannot contain a `,` * must be encodable in UTF-8 * must be no longer than 255 bytes when UTF-8 encoded	2018-11-07 20:16:26 +01:00
Martijn van Groningen	e685cfe8f9	[CCR] Fail with a better error if leader index is red (#35298 ) as part of fetching history uuids from leader index.	2018-11-07 13:23:30 +01:00
Martijn van Groningen	2395e16d84	[CCR] Change resume follow api to be a master node action (#35249 ) In order to start shard follow tasks, the resume follow api already needs to execute N requests to the elected master node. The pause follow API is also a master node action, so this change makes the way both APIs execute more consistent.	2018-11-07 07:38:44 +01:00
Martijn van Groningen	a937d7f5f3	[CCR] Forgot missing return statement. An error was thrown if the leader index had no soft deletes enabled, but it then continued creating the follower index. The test caught this bug, but only very rarely, due to a timing issue. Build failure instance: ``` 1> [2018-11-05T20:29:38,597][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] before test 1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]] 1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]] 1> [2018-11-05T20:29:38,609][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [leader-index] creating index, cause [api], templates [random-soft-deletes-template, one_shard_index_template], shards [2]/[0], mappings [] 1> [2018-11-05T20:29:38,628][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[leader-index][0]] ...]). 1> [2018-11-05T20:29:38,660][INFO ][o.e.x.c.a.TransportPutFollowAction] [node_s_0] [follower-index] creating index, cause [ccr_create_and_follow], shards [2]/[0] 1> [2018-11-05T20:29:38,675][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]] 1> [2018-11-05T20:29:38,676][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]] 1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] after test 1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] [LocalIndexFollowingIT#testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes]: cleaning up after test 1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [follower-index/TlWlXp0JSVasju2Kr_hksQ] deleting index 1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [leader-index/FQ6EwIWcRAKD8qvOg2eS8g] deleting index FAILURE 0.23s J0 | LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes <<< FAILURES! > Throwable #1: java.lang.AssertionError: > Expected: <false> > but: was <true> > at __randomizedtesting.SeedInfo.seed([7A3C89DA3BCA17DD:65C26CBF6FEF0B39]:0) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.elasticsearch.xpack.ccr.LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes(LocalIndexFollowingIT.java:83) > at java.lang.Thread.run(Thread.java:748) ``` Build failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+intake/46/console	2018-11-06 16:05:35 +01:00
Martijn van Groningen	46c238d792	[CCR] Improve error when operations are missing (#35179 ) Improve error when operations are missing	2018-11-06 08:42:47 +01:00
Martijn van Groningen	cac67f8bcc	[CCR] Add extra validation to unfollow api (#35245 ) Validate whether the follow index actually exists and whether the follow index actually has custom ccr metadata.	2018-11-06 08:00:34 +01:00
Nik Everett	f72ef9b5fd	Build: Pull "skip assemble on qa" to common build (#35214 ) Pull all of the logic that we use to skip the `assemble` and `dependenciesInfo` tasks on `qa` projects into one spot in our root build file.	2018-11-05 16:16:00 -05:00
Alexander Reelsen	409050e8de	Refactor: Remove settings from transport action CTOR (#35208 ) As settings are not used in the transport action constructor, this removes the passing of the settings in all the transport actions.	2018-11-05 13:08:18 +01:00
Martijn van Groningen	ddda2d419c	[CCR] Change max_read_request_size default (#35247 ) This changes the max_read_request_size default from unlimited to 32MB.	2018-11-05 12:51:42 +01:00
Nhat Nguyen	54e1231ebd	CCR/TEST: Limit indexing docs in FollowerFailOverIT (#35228 ) The suite FollowerFailOverIT is failing because some documents are not replicated to the follower. Maybe the FollowTask is not working as expected or the background indexers eat all resources while the follower cluster is trying to reform after a failover; then CI is not fast enough to replicate all the indexed docs within 60 seconds (sometimes I see 80k docs on the leader). This commit limits the number of documents to be indexed into the leader index by the background threads so that we can eliminate the latter case. This change also replaces a docCount assertion with a docIds assertion so we can have more information if these tests fail again. Relates #33337	2018-11-03 10:00:54 -04:00
David Kyle	43cff56aec	Mute FollowerFailOverIT.testFollowIndexAndCloseNode Issue #33337	2018-11-02 15:13:09 +00:00
Nhat Nguyen	4875d6fb0b	CCR: Add NodeClosedException to retryable list (#35191 ) This change adds NodeClosedException to the retry-able exception list.	2018-11-02 07:01:46 -04:00
Martijn van Groningen	19d6cf1b9e	[CCR] Change response classes to not use Streamable. (#35085 ) Only the response classes of get auto follow pattern, the follow and stats APIs were moved away from Streamable. The other APIs use `AcknowledgedResponse` or `BaseTasksResponse` as response class and moving that class away from Streamable is a bigger change.	2018-11-02 08:02:17 +01:00
Nhat Nguyen	dbc7c9259c	CCR/TEST: Add debug log to testFailOverOnFollower	2018-11-01 22:28:06 -04:00
Nik Everett	e28509fbfe	Core: Less settings to AbstractComponent (#35140 ) Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to stop passing around `Settings` in a ton of places. While this change touches many files, it touches them all in fairly small, mechanical ways, doing a few things per file: 1. Drop the `super(settings);` line on everything that extends `AbstractComponent`. 2. Drop the `settings` argument to the ctor if it is no longer used. 3. If the file doesn't use `logger` then drop `extends AbstractComponent` from it. 4. Clean up all compilation failures caused by the `settings` removal and drop any now unused `settings` instances and method arguments. I've intentionally not removed the `settings` argument from a few files: 1. TransportAction 2. AbstractLifecycleComponent 3. BaseRestHandler These files don't need `settings` either, but this change is large enough as is. Relates to #34488	2018-10-31 21:23:20 -04:00
Martijn van Groningen	6da2fb7d5b	Change CCR API request classes to use Writeable serialization instead of Streamable (#34911 ) Only the follow stats request couldn't be changed to use Writeable serialization, because that requires changes in `TransportTasksAction` and `BaseTasksRequest` base classes.	2018-10-30 11:03:30 +01:00
Jason Tedor	cc9894d78a	Fix name of CCR stats transport action This class name should include an indication that it is for CCR. This commit adds that to the name of this class.	2018-10-29 09:50:13 -04:00
Jason Tedor	26f5c509af	Fix CCR API specification (#34963 ) This commit fixes two issues with the CCR API specification: - remove the CCR stats endpoint, it is not currently implemented - fix the documentation links	2018-10-29 09:37:13 -04:00
Martijn van Groningen	b2daaf15d1	[CCR] move tests that modify test cluster from the main test class to a dedicated class.	2018-10-29 13:45:59 +01:00
Martijn van Groningen	1801518527	[CCR] Refactor stats APIs (#34912 ) * Changed the auto follow stats to also include follow stats. * Renamed the auto follow stats api to stats api and changed its url path from `/_ccr/auto_follow/stats` to `/_ccr/stats`. * Removed `/_ccr/stats` url path for the follow stats api, which makes the index parameter a required parameter. * Fixed docs.	2018-10-29 07:45:27 +01:00
Martijn van Groningen	bad5972f62	[CCR] Fix request serialization bug (#34917 ) and some parameters that were not set in tests.	2018-10-29 07:38:55 +01:00
Jason Tedor	43f6ba1c63	Fix put/resume follow request parsing (#34913 ) This commit adds some fields that were missing from put follow, and fixes a bug in resume follow.	2018-10-26 11:09:55 -04:00
Martijn van Groningen	306f1d78f8	[CCR] Retry when no index shard stats can be found (#34852 ) Index shard stats for the follower shard are fetched when a shard follow task is started. This is needed in order to bootstrap the shard follow task with the follower global checkpoint. Sometimes index shard stats are not available (e.g. during a restart) and we currently fail, while it is very likely that these stats will be available some time later.	2018-10-26 15:14:24 +02:00
Nhat Nguyen	ff49e79d40	CCR: Rename follow-task parameters and stats (#34836 ) * CCR: Rename follow parameters and stats This commit renames the follow-task parameters and its stats. Below are the changes: ## Params - remote_cluster (unchanged) - leader_index (unchanged) - max_batch_operation_count -> max_read_request_operation_count - max_batch_size -> max_read_request_size - max_write_request_operation_count (new) - max_write_request_size (new) - max_concurrent_read_batches -> max_outstanding_read_requests - max_concurrent_write_batches -> max_outstanding_write_requests - max_write_buffer_size (unchanged) - max_write_buffer_count (unchanged) - max_retry_delay (unchanged) - poll_timeout -> read_poll_timeout ## Stats - remote_cluster (unchanged) - leader_index (unchanged) - follower_index (unchanged) - shard_id (unchanged) - leader_global_checkpoint (unchanged) - leader_max_seq_no (unchanged) - follower_global_checkpoint (unchanged) - follower_max_seq_no (unchanged) - last_requested_seq_no (unchanged) - number_of_concurrent_reads -> outstanding_read_requests - number_of_concurrent_writes -> outstanding_write_requests - buffer_size_in_bytes -> write_buffer_size_in_bytes (new) - number_of_queued_writes -> write_buffer_operation_count - mapping_version -> follower_mapping_version - total_fetch_time_millis -> total_read_time_millis - total_fetch_remote_time_millis -> total_read_remote_exec_time_millis - number_of_successful_fetches -> successful_read_requests - number_of_failed_fetches -> failed_read_requests - operation_received -> operations_read - total_transferred_bytes -> bytes_read - total_index_time_millis -> total_write_time_millis [?] - number_of_successful_bulk_operations -> successful_write_requests - number_of_failed_bulk_operations -> failed_write_requests - number_of_operations_indexed -> operations_written - fetch_exception -> read_exceptions - time_since_last_fetch_millis -> time_since_last_read_millis * add test for max_write_request_(operation_count|size)	2018-10-25 10:36:15 +02:00
Martijn van Groningen	6fe0e62b7a	[CCR] Added write buffer size limit (#34797 ) This limit is based on the size in bytes of the operations in the write buffer. If this limit is exceeded then no more read operations will be coordinated until the size in bytes of the write buffer has dropped below the configured write buffer size limit. Renamed existing `max_write_buffer_size` to `max_write_buffer_count` to indicate that limit is count based. Closes #34705	2018-10-24 23:48:49 +02:00
Andrey Atapin	5f588180f9	Improve IndexNotFoundException's default error message (#34649 ) This commit adds the index name to the error message when an index is not found.	2018-10-24 12:53:31 -07:00
Nhat Nguyen	d73768f812	CCR: Do not follow if leader does not have soft-deletes (#34767 ) We should not create a follower index, and should abort the follow request, if the leader does not have soft-deletes. Moreover, we also should not auto-follow an index if it does not have soft-deletes.	2018-10-24 11:19:39 -04:00
Alpar Torok	795d57b4f9	Auto configure all test tasks (#34666 ) With this change, we apply the common test config automatically to all newly created tasks instead of opting in specifically. For plugin authors using the plugin externally this means that the configuration will be applied to their RandomizedTestingTasks as well. The purpose of the task is to simplify setup and make it easier to change projects that use the `test` task but actually run integration tests to use a task called `integTest` for clarity, but also because we may want to configure and run them differently, e.g. using different levels of concurrency.	2018-10-24 16:05:50 +03:00
Martijn van Groningen	76240e6bbe	[CCR] Renamed leader_cluster to remote_cluster (#34776 ) and also some occurrences of clusterAlias to remoteCluster. Closes #34682	2018-10-24 13:39:36 +02:00
Boaz Leskes	be907516ad	Change ShardFollowTask defaults (#34793 ) Per #31717 this commit changes the defaults to the following: Batch size of 5120 ops. Maximum of 12 concurrent read requests. Maximum of 9 concurrent write requests. These are not necessarily our final values but it's good to have these as defaults for the purposes of initial testing.	2018-10-24 13:32:48 +02:00
Martijn van Groningen	18007a29b2	[CCR] Made leader cluster required in shard follow task. Left over from #34580	2018-10-24 08:38:25 +02:00
Martijn van Groningen	abf8cb6706	[CCR] Cleanup pause follow action (#34183 ) * Change the `TransportPauseFollowAction` to extend from `TransportMasterNodeAction` instead of `HandledAction`, this removes a sync cluster state api call. * Introduced `ResponseHandler` that removes duplicated code in `TransportPauseFollowAction` and `TransportResumeFollowAction`. * Changed `PauseFollowAction.Request` to not use `readFrom()`.	2018-10-24 08:12:39 +02:00
Martijn van Groningen	0efba0675e	[CCR] Add qa test library (#34611 ) * Introduced test qa lib that all CCR qa modules depend on to avoid test code duplication.	2018-10-23 23:24:32 +02:00
Nhat Nguyen	e242fd2e42	CCR: Add TransportService closed to retryable errors (#34722 ) Both testFollowIndexAndCloseNode and testFailOverOnFollower failed because they responded to the FollowTask a TransportService closed exception which is currently considered as a fatal error. This behavior is not desirable since a closing node can throw that exception, and we should retry in that case. This change adds TransportService closed error to the list of retryable errors. Closes #34694	2018-10-23 14:23:29 -04:00
Martijn van Groningen	ed817fb265	[CCR] Move leader_index and leader_cluster parameters from resume follow to put follow api (#34638 ) As part of this change the leader index name and leader cluster name are stored in the CCR metadata in the follow index. The resume follow api will read that when a resume follow request is executed.	2018-10-23 19:37:45 +02:00
Nhat Nguyen	5923ea536e	CCR: Requires soft-deletes on the follower (#34725 ) Since #34412 and #34474, a follower must have soft-deletes enabled to work correctly. This change requires soft-deletes on the follower. Relates #34412 Relates #34474	2018-10-23 11:51:17 -04:00
Martijn van Groningen	e6d87cc09f	[CCR] Add total fetch time leader stat (#34577 ) Add total fetch time leader stat, that keeps track how much time was spent on fetches from the leader cluster perspective.	2018-10-23 16:41:06 +02:00
Martijn van Groningen	36baf3823d	[CCR] Auto follow pattern APIs adjustments (#34518 ) * Changed the resource id of auto follow patterns to be a user defined name instead of being the leader cluster alias name. * Fail when an unfollowed leader index matches with two or more auto follow patterns.	2018-10-23 15:48:51 +02:00
Jason Tedor	52fc502b7e	Fix the casing in the names of some CCR classes We should be consistent here. We were already using the casing "Ccr" and this is the preferred casing for Java class names. This commit adjusts the names of some classes that were using the casing "CCR" to be "Ccr".	2018-10-22 11:25:00 -04:00
Jason Tedor	7af19b8f81	Migrate wait for pending tasks helper to server (#34675 ) In some of our X-Pack REST tests we have to wait for pending tasks to complete. We are now needing this functionality in ESRestTestCase for the docs tests where we run against X-Pack features. This commit moves the helper method that we have in X-Pack to ESRestTestCase, and removes duplicate logic from waiting for rollup tasks to complete.	2018-10-22 11:14:02 -04:00
Martijn van Groningen	92e34732f5	[CCR] Remove ccr related metadata between tests for single node tests too	2018-10-22 09:15:22 +02:00
Martijn van Groningen	b6750cf6c2	[CCR] Muted tests Relates to #34696	2018-10-22 08:47:31 +02:00
Martijn van Groningen	f51301a1a6	[CCR] Moved integration test	2018-10-22 08:44:41 +02:00
Martijn van Groningen	b816837d39	[CCR] Always remove persistent tasks metadata between tests and better handle assertion errors between tests.	2018-10-22 08:15:43 +02:00
Nhat Nguyen	d90b6730c7	CCR: Following primary should process NoOps once (#34408 ) This is a follow-up for #34288. Relates #34412	2018-10-19 21:10:13 -04:00
Nhat Nguyen	630d5514a5	CCR/TEST: Adjust testFailOverOnFollower CI passed but the result is outdated after PR #34366 was merged.	2018-10-19 15:06:44 -04:00
Nhat Nguyen	bd92a28cfc	CCR: Replicate existing ops with old term on follower (#34412 ) Since #34288, we might hit deadlock if the FollowTask has more fetchers than writers. This can happen in the following scenario: Suppose the leader has two operations [seq#0, seq#1]; the FollowTask has two fetchers and one writer. 1. The FollowTask issues two concurrent fetch requests: {from_seq_no: 0, num_ops:1} and {from_seq_no: 1, num_ops:1} to read seq#0 and seq#1 respectively. 2. The second request which fetches seq#1 completes first, and then it triggers a write request containing only seq#1. 3. The primary of a follower fails after it has replicated seq#1 to replicas. 4. Since the old primary did not respond, the FollowTask issues another write request containing seq#1 (resend the previous write request). 5. The new primary has seq#1 already; thus it won't replicate seq#1 to replicas but will wait for the global checkpoint to advance to at least seq#1. The problem is that the FollowTask has only one writer and that writer is waiting for seq#0 which won't be delivered until the writer completes. This PR proposes to replicate existing operations with the old primary term (instead of the current term) on the follower. In particular, when the following primary detects that it has processed an operation already, it will look up the term of an existing operation with the same seq_no in the Lucene index, then rewrite that operation with the old term before replicating it to the following replicas. This approach is wait-free but requires soft-deletes on the follower. Relates #34288	2018-10-19 13:56:00 -04:00
Nhat Nguyen	90ca5b1fde	Fill LocalCheckpointTracker with Lucene commit (#34474 ) Today we rely on the LocalCheckpointTracker to ensure no duplicate when enabling optimization using max_seq_no_of_updates. The problem is that the LocalCheckpointTracker is not fully reloaded when opening an engine with an out-of-order index commit. Suppose the starting commit has seq#0 and seq#2, then the current LocalCheckpointTracker would return "false" when asked if seq#2 was processed before although seq#2 is in the commit. This change scans the existing sequence numbers in the starting commit, then marks these as completed in the LocalCheckpointTracker to ensure the consistent state between LocalCheckpointTracker and Lucene commit.	2018-10-19 12:38:06 -04:00
Martijn van Groningen	56d4f69718	Renamed remaining leader_cluster_alias / cluster_alias to leader_cluster	2018-10-19 07:59:56 +02:00
Martijn van Groningen	44b461aff2	[CCR] Make leader cluster a required argument. (#34580 ) This change makes it no longer possible to follow / auto follow without specifying a leader cluster. If a local index needs to be followed then `cluster.remote.*.seeds` should point to nodes in the local cluster. Closes #34258	2018-10-19 07:41:46 +02:00
Martijn van Groningen	0d62f6102c	[CCR] Split cluster alias from leader index field into its own field in follow APIs (#34366 )	2018-10-18 12:11:48 +02:00
Jason Tedor	3e067123a1	Remove dead methods from ChainIT This commit removes some unused methods from ChainIT.	2018-10-16 10:45:33 -04:00
Martijn van Groningen	a1ec91395c	Changed CCR internal integration tests to use a leader and follower cluster instead of a single cluster (#34344 ) The `AutoFollowTests` needs to restart the clusters between each test, because it is using auto follow stats in assertions. Auto follow stats are only reset by stopping the elected master node. Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api. * Renamed AutoFollowTests to AutoFollowIT, because it is an integration test. Renamed ShardChangesIT to IndexFollowingIT, because shard changes is the name of an internal api and isn't a good name for an integration test. * move creation of NodeConfigurationSource to a separate method * Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.	2018-10-16 14:45:46 +02:00
Jason Tedor	e0b6721df4	Add dedicated test for chain replication (#34497 ) This commit adds a dedicated test that chain replication leader -> middle -> follow is successful.	2018-10-16 06:21:28 -04:00
Martijn van Groningen	f7df8718b9	[CCR] Don't fail shard follow tasks in case of a non-retryable error (#34404 )	2018-10-16 07:44:15 +02:00
Martijn van Groningen	51eca14288	[TEST] Make sure there are shards started so that `ESIntegTestCase#assertSameDocIdsOnShards()` does not fail with shard not found.	2018-10-15 10:24:28 +02:00
Martijn van Groningen	74dc2da873	Change shard changes api's threadpool from get to search (#34421 )	2018-10-15 08:09:00 +01:00
Nhat Nguyen	429c29e833	CCR/TEST: AwaitsFix testFailOverOnFollower Tracked at #34412	2018-10-13 21:05:33 -04:00
Nhat Nguyen	7bc11a8099	Unmute testFollowIndexAndCloseNode This issue was resolved by #34288. Closes #33337 Relates #34288	2018-10-10 15:48:22 -04:00
Nhat Nguyen	33791ac27c	CCR: Following primary should process operations once (#34288 ) Today we rewrite the operations from the leader with the term of the following primary because the follower should own its history. The problem is that a newly promoted primary may re-assign its term to operations which were replicated to replicas before by the previous primary. If this happens, some operations with the same seq_no may be assigned different terms. This is not good for the future optimistic locking using a combination of seqno and term. This change ensures that the primary of a follower only processes an operation if that operation was not processed before. The skipped operations are guaranteed to be delivered to replicas via either primary-replica resync or peer-recovery. However, the primary must not acknowledge until the global checkpoint is at least the highest seqno of all skipped ops (i.e., they all have been processed on every replica). Relates #31751 Relates #31113	2018-10-10 15:39:57 -04:00
Martijn van Groningen	268e134121	renamed test class	2018-10-08 15:05:50 +02:00
Martijn van Groningen	c6c83d19f7	[CCR] Clear fetch exceptions if an empty but successful shard changes response returns (#34256 ) Also fixed ShardFollowNodeTaskTests to not return ops when responseSize is empty. Otherwise ops are returned when no ops are expected to be returned. Co-authored-by: Jason Tedor <jason@tedor.me>	2018-10-06 07:53:37 -04:00
Martijn van Groningen	899e48395b	[CCR] Change unfollow API's privilege scheme. (#34175 ) Unfollow should be allowed / disallowed on a per index level instead of cluster level. Also renamed `create_follow_index` index privilege to `manage_follow_index` privilege and include unfollow and close APIs.	2018-10-06 07:38:28 -04:00
Jason Tedor	7d57bdb3a0	Follow stats structure (#34301 ) This commit modifies the follow stats API response structure to more clearly highlight meaning of the higher level fields. In particular, previously the response had a top-level key for each index. Instead, we nest the indices under an "indices" field which is now an array. The values in this array are objects containing two fields: "index" which is the name of the follower index, and "shards" which is an array where each value in the array is the follower stats for that shard. That is, we have gone from: { "bar": [ { "shard_id": 0... }... ]... } to { "indices": [ { "index": "bar", "shards": [ { "shard_id": 0... }... ] }... }	2018-10-05 06:38:20 -04:00
Jason Tedor	7478167d60	Rename CCR stats implementation (#34300 ) In the CCR docs we want to refer to the endpoint that returns following stats as the follow stats API. This commit renames the internal implementation of this endpoint to reflect this usage.	2018-10-05 06:25:24 -04:00
Nhat Nguyen	d7893fd1e4	TEST: Mute testFollowIndexAndCloseNode Tracked at #33337	2018-10-02 17:20:31 -04:00
Martijn van Groningen	7f5c2f1050	[CCR] Validate follower index historyUUIDs (#34078 ) The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid. No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will continue to fail if leader index shard history UUIDs are missing. Closes #33956	2018-10-02 18:01:06 +02:00
Martijn van Groningen	d12a64eac2	[CCR] Only use primary shards and get expected count from leader index (#34186 ) Closes #34173	2018-10-01 20:13:16 +02:00
Nhat Nguyen	a02debadfe	TEST: Unmute testFollowIndexAndCloseNode Since #34099, the FollowingEngine will skip an operation which was already processed before. With that change, it should be okay to unmute testFollowIndexAndCloseNode.	2018-10-01 11:59:33 -04:00
Jason Tedor	80f7c1dcc9	Fix compilation in unfollow action tests This arose when two commits were pushed at roughly the same time, both of which compiled successfully against master, but not when taken together. This commit fixes a reference in one of the commits that was changed in the other commit.	2018-09-30 14:30:08 -04:00
Jason Tedor	1893765055	Change CCR stats endpoint to be index-centric (#34169 ) This commit modifies the CCR stats endpoint for indices to be /{index}/_ccr/stats. This makes this endpoint consistent with other index-centric endpoints like indices stats.	2018-09-30 14:29:32 -04:00
Jason Tedor	e2bd2028d8	Allow specifying shard changes batch sizes in bytes (#34168 ) This commit changes the shard changes requests from using a raw byte value to being able to be specified using bytes units (e.g., 4mb).	2018-09-30 14:22:22 -04:00
Martijn van Groningen	7c91c7a638	fixed test compile error	2018-09-30 19:31:30 +02:00
Martijn van Groningen	b1a27b2e6b	[CCR] Add unfollow API (#34132 ) The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients. For the unfollow api to work the index follow needs to be stopped and the index needs to be closed. Closes #33931	2018-09-30 19:19:34 +02:00
Nhat Nguyen	ad61398879	CCR: Optimize indexing ops using seq_no on followers (#34099 ) This change introduces the indexing optimization using sequence numbers in the FollowingEngine. This optimization uses the max_seq_no_of_updates which is tracked on the primary of the leader and replicated to replicas and followers. Relates #33656	2018-09-28 20:42:26 -04:00
Martijn van Groningen	a984f8afb3	[CCR] Validate index privileges prior to following an index (#33758 ) Prior to following an index in the follow API, check whether current user has sufficient privileges in the leader cluster to read and monitor the leader index. Also check this in the create and follow API prior to creating the follow index. Also introduced READ_CCR cluster privilege that include the minimal cluster level actions that are required for ccr in the leader cluster. So a user can follow indices in a cluster, but not use the ccr admin APIs. Closes #33553 Co-authored-by: Jason Tedor <jason@tedor.me>	2018-09-28 17:51:23 +02:00
Martijn van Groningen	3d7e3b2ab1	[TEST] changed naming of test methods to not refer to old api names.	2018-09-28 17:43:53 +02:00
Martijn van Groningen	eb00348b57	[CCR] Adjust list retryable errors (#33985 ) The following changes were made: * Added ElasticsearchSecurityException, for the case where the current user has insufficient privileges while an index is being followed. Prior to following, ccr checks whether the current user has sufficient privileges and if not the follow api fails with an error. * Added Index block exception. If the leader index gets closed, this exception is returned. * Added ClusterBlockException service unavailable, for example when the leader cluster is without an elected master. * Removed IndexNotFoundException. If the leader / follower index has been deleted, ccr will need to stop the shard follow tasks with an error. Closes #33954	2018-09-28 13:33:09 +02:00
Martijn van Groningen	506c1c2d47	Retry errors when fetching follower global checkpoint. (#34019 ) Closes #34016	2018-09-28 10:34:08 +02:00
Martijn van Groningen	9129948f60	Rename CCR APIs (#34027 ) * Renamed CCR APIs Renamed: * `/{index}/_ccr/create_and_follow` to `/{index}/_ccr/follow` * `/{index}/_ccr/unfollow` to `/{index}/_ccr/pause_follow` * `/{index}/_ccr/follow` to `/{index}/_ccr/resume_follow` Relates to #33931	2018-09-28 08:02:20 +02:00
Martijn van Groningen	17b3b97899	Fixed CCR stats api serialization issues and (#33983 ) always use `IndicesOptions.strictExpand()` for indices options. The follow index may be closed and we still want to get stats from shard follow task, and whether the provided index name matches with follow index name is checked when locating the task itself in the ccr stats transport action.	2018-09-28 07:45:32 +02:00
Nhat Nguyen	48c169e065	CCR: replicates max seq_no of updates to follower (#34051 ) This commit replicates the max_seq_no_of_updates on the leading index to the primaries of the following index via ShardFollowNodeTask. The max_seq_no_of_updates is then transmitted to the replicas of the follower via replication requests (that's BulkShardOperationsRequest). Relates #33656	2018-09-26 08:00:10 -04:00
Martijn van Groningen	eae5487477	[CCR] set minimum version to 6.5.0	2018-09-26 09:31:36 +02:00
Martijn van Groningen	96b3417985	[CCR] Don't auto follow follow indices in the same cluster. (#33944 )	2018-09-26 07:34:51 +02:00
Nhat Nguyen	5166dd0a4c	Replicate max seq_no of updates to replicas (#33967 ) We start tracking max_seq_no_of_updates on the primary in #33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery. With this change, we guarantee that the value of max_seq_no_of_updates on a replica when any index/delete operation is performed is at least the max_seq_no_of_updates on the primary when that operation was executed. Relates #33656	2018-09-25 08:07:57 -04:00
Martijn van Groningen	793b2a94b4	[CCR] Expose auto follow stats to monitoring (#33886 )	2018-09-25 07:19:46 +02:00
Nhat Nguyen	6ec36b1273	CCR: Make AutoFollowMetadata immutable (#33977 ) We should make AutoFollowMetadata immutable to avoid being inconsistent when one thread modifies it while other reads it.	2018-09-24 17:47:10 -04:00
Martijn van Groningen	2795ef561f	[CCR] Add get auto follow pattern api (#33849 ) Relates to #33007	2018-09-24 20:26:13 +02:00
Nhat Nguyen	ddd5ce5740	TEST: Avoid invalid ranges in ShardChangesActionTests (#33976 ) If numWrites is between 2 and 9, we will issue an invalid range because the from_seq_no is negative. This commit makes sure that numWrites is at least 10, and adds an explicit test to verify invalid request ranges.	2018-09-23 22:28:41 -04:00
Nhat Nguyen	7944a0cb25	Track max seq_no of updates or deletes on primary (#33842 ) This PR is the first step to use seq_no to optimize indexing operations. The idea is to track the max seq_no of either update or delete ops on a primary, and transfer this information to replicas, and replicas use it to optimize indexing plan for index operations (with assigned seq_no). The max_seq_no_of_updates on primary is initialized once when a primary finishes its local recovery or peer recovery in relocation or being promoted. After that, the max_seq_no_of_updates is only advanced internally inside an engine when processing update or delete operations. Relates #33656	2018-09-22 08:02:57 -04:00
Martijn van Groningen	e1e5f40727	[CCR] Move headers from auto follow pattern to auto follow metadata (#33846 ) This ensures that we will not serialize the headers as part of the auto follow pattern in the to be added get auto follow api.	2018-09-21 18:08:29 +02:00
Martijn van Groningen	384ce58535	removed unused fields	2018-09-20 08:56:23 +02:00
Martijn van Groningen	44c7c4b166	[CCR] Add auto follow stats api (#33801 ) GET /_ccr/auto_follow/stats Returns: ``` { "number_of_successful_follow_indices": ... "number_of_failed_follow_indices": ... "number_of_failed_remote_cluster_state_requests": ... "recent_auto_follow_errors": [ ... ] } ``` Relates to #33007	2018-09-20 07:16:20 +02:00
Martijn van Groningen	d9947c631a	[CCR] Rename idle_shard_retry_delay to poll_timeout in auto follow patterns (#33821 )	2018-09-19 13:13:20 +02:00
Martijn van Groningen	013b64a07c	[CCR] Change FollowIndexAction.Request class to be more user friendly (#33810 ) Instead of having one constructor that accepts all arguments, all parameters should be provided via setters. Only leader and follower index are required arguments. This makes using this class in tests and transport client easier.	2018-09-19 07:18:24 +02:00
Martijn van Groningen	805a12361f	[CCR] Fail with a descriptive error if leader index does not exist (#33797 ) Closes #33737	2018-09-18 21:47:02 +02:00
Martijn van Groningen	9fe5a273aa	[TEST] handle failed search requests differently	2018-09-18 15:55:27 +02:00
Martijn van Groningen	47b86d6e6a	[CCR] Changed AutoFollowCoordinator to keep track of certain statistics (#33684 ) The following stats are being kept track of: 1) The total number of times that auto following a leader index succeeded. 2) The total number of times that auto following a leader index failed. 3) The total number of times that fetching a remote cluster state failed. 4) The most recent 256 auto follow failures per auto leader index (e.g. create_and_follow api call fails) or cluster alias (e.g. fetching remote cluster state fails). Each auto follow run now produces a result that is being used to update the stats being kept track of in AutoFollowCoordinator. Relates to #33007	2018-09-18 09:43:50 +02:00
Martijn van Groningen	15f30d689b	[CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and (#33777 ) * [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and properly map fetch_exception.exception field as object. The extra caused by level is not necessary here: ``` "fetch_exceptions": [ { "from_seq_no": 1, "retries": 106, "exception": { "type": "exception", "reason": "[index1] IndexNotFoundException[no such index]", "caused_by": { "type": "index_not_found_exception", "reason": "no such index", "index_uuid": "_na_", "index": "index1" } } } ], ```	2018-09-17 22:33:37 +02:00
Martijn van Groningen	d8dc042514	[CCR] Handle leader index with no mapping correctly (#33770 ) When a leader index is created, it may not have a mapping yet. Currently if you follow such an index the shard follow tasks fail with NoSuchElementException, because they expect a single mapping. This commit fixes that, by allowing that a leader index does not yet have a mapping.	2018-09-17 19:47:40 +02:00
Martijn van Groningen	7046cc467f	[CCR] Make index.xpack.ccr.following_index an internal setting (#33768 )	2018-09-17 18:08:19 +02:00
Martijn van Groningen	5d2a01dcc3	[CCR] Fail with a good error if a follow index does not have ccr metadata (#33761 ) instead of a NPE.	2018-09-17 18:00:16 +02:00
Jason Tedor	2d81fc3873	Keep CCR REST API specification with all of X-Pack (#33743 ) This commit moves the CCR REST API specification out of the CCR sub-project to locate them with the rest of the REST API specifications for X-Pack.	2018-09-17 09:59:22 -04:00
Martijn van Groningen	481f8a9a07	[CCR] Make auto follow patterns work with security (#33501 ) Relates to #33007	2018-09-17 07:29:00 +02:00
Jason Tedor	770ad53978	Introduce long polling for changes (#33683 ) Rather than scheduling pings to the leader index when we are caught up to the leader, this commit introduces long polling for changes. We will fire off a request to the leader which if we are already caught up will enter a poll on the leader side to listen for global checkpoint changes. These polls will timeout after a default of one minute, but can also be specified when creating the following task. We use these time outs as a way to keep statistics up to date, to not exaggerate time since last fetches, and to avoid pipes being broken.	2018-09-16 10:35:23 -04:00
Jason Tedor	069605bd91	Do not count shard changes tasks against REST tests (#33738 ) When executing CCR REST tests it is going to be expected after global checkpoint polling goes in that shard changes tasks can still be pending at the end of the test. One way to deal with this is to set a low timeout on these polls, but then that means we are not executing our REST tests with our default production settings and instead would be using an unrealistic low timeout. Alternatively, since we expect these tasks to be there, we can not count them against the test. That is what this commit does.	2018-09-16 07:32:12 -04:00
Jason Tedor	73417bf09a	Move CCR REST tests to a sub-project of ccr This commit moves these REST tests (possibly temporarily) to a sub-project of ccr. We do this (again, possibly temporarily) to keep them within the ccr sub-project yet there are changes within 6.x that prevent these from being in the top-level project (the cluster formation tasks are trying to install x-pack-ccr into the integ-test-zip). Therefore, we isolate these for now until we can understand why there are differences between 6.x and master.	2018-09-15 10:18:59 -04:00
Jason Tedor	aa56892f2f	Move CCR REST tests to ccr sub-project (#33731 ) This commit moves the CCR REST tests to the ccr sub-project as another step towards running :x-pack:plugin:ccr:check giving us full coverage on CCR.	2018-09-15 09:18:15 -04:00
Jason Tedor	f037edb8e3	Move CCR monitoring tests to ccr sub-project (#33730 ) This commit moves the CCR monitoring tests from the monitoring sub-project to the ccr sub-project.	2018-09-15 09:16:33 -04:00
Martijn van Groningen	82a6ae1dae	[CCR] Move ccr tests in core module back to ccr module (#33711 ) When developing ccr it is not ideal if tests are in multiple modules. Even though the classes these tests test are in the core module, it is easier if these tests are in ccr module in order to avoid running the test task in core module, which results in running many non ccr tests. This way when developing ccr we can run locally: ./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check before pushing to PR branches and be confident that the PR build passes, without running x-pack:plugin:core:check task.	2018-09-14 17:18:00 +02:00
Jason Tedor	2282150f34	Expose retries for CCR fetch failures (#33694 ) This commit exposes the number of times that a fetch has been tried to the CCR stats endpoint, and to CCR monitoring.	2018-09-14 08:52:46 -04:00
Martijn van Groningen	222f42274e	[CCR] Check whether the rejected execution exception has the shutdown flag set (#33703 ) and if so debug log it and otherwise rethrow. This should fix a couple of test failures where during test teardown tests failed due to uncaught exceptions being detected.	2018-09-14 13:28:11 +02:00
Martijn van Groningen	4bcad95fe7	[TEST] wait for no initializing shards	2018-09-14 09:59:24 +02:00
Martijn van Groningen	53ba253aa4	[CCR] Add validation for max_retry_delay (#33648 )	2018-09-13 20:52:00 +02:00
Martijn van Groningen	a69ae6b89f	[CCR] Add metadata to keep track of the index uuid of the leader index in the follow index (#33367 ) The follow index api checks if the recorded uuid in the follow index matches with uuid of the leader index and fails otherwise. This validation will prevent a follow index from following an incompatible leader index. The create_and_follow api will automatically add this custom index metadata when it creates the follow index. Closes #31505	2018-09-13 11:36:52 +02:00
Jason Tedor	eb715d5290	Add follower index to CCR monitoring and status (#33645 ) This commit adds the follower index to CCR shard follow task status, and to monitoring.	2018-09-12 17:35:06 -04:00
Martijn van Groningen	b5d8495789	[CCR] Add auto follow pattern APIs to transport client. (#33629 )	2018-09-12 21:50:22 +02:00
Martijn van Groningen	5fa81310cc	[CCR] Added history uuid validation (#33546 ) For correctness we need to verify whether the history uuid of the leader index shards never changes while that index is being followed. * The history UUIDs are recorded as custom index metadata in the follow index. * The follow api validates whether the current history UUIDs of the leader index shards are the same as the recorded history UUIDs. If not the follow api fails. * While a follow index is following a leader index; shard follow tasks on each shard changes api call verify whether their current history uuid is the same as the recorded history uuid. Relates to #30086 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>	2018-09-12 19:42:00 +02:00
Martijn van Groningen	901d8035d9	[CCR] Update es monitoring mapping and (#33635 ) * [CCR] Update es monitoring mapping and change qa tests to query based on leader index. Co-authored-by: Jason Tedor <jason@tedor.me>	2018-09-12 19:36:17 +02:00
Tanguy Leroux	bcac7f5e55	Fix checkstyle violation in ShardFollowNodeTask	2018-09-12 16:03:52 +02:00
Jason Tedor	23f12e42c1	Expose CCR stats to monitoring (#33617 ) This commit exposes the CCR stats endpoint to monitoring collection. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2018-09-12 09:13:07 -04:00
Martijn van Groningen	96c49e5ed0	[CCR] Improve shard follow task's retryable error handling (#33371 ) Improve failure handling of retryable errors by retrying remote calls in an exponential backoff like manner. The delay between retries would not be longer than the configured max retry delay. Also retryable errors will be retried indefinitely. Relates to #30086	2018-09-12 12:49:51 +02:00
Jason Tedor	20476b9e06	Disable CCR REST endpoints if CCR disabled (#33619 ) This commit avoids enabling the CCR REST endpoints if CCR is disabled.	2018-09-12 01:54:34 -04:00
Jason Tedor	eca37e6e0a	Expose CCR to the transport client (#33608 ) This commit exposes CCR to the transport client.	2018-09-11 16:37:52 -04:00
Martijn van Groningen	74d41857c6	mute test on windows Relates #33570	2018-09-10 16:49:17 +02:00
Martijn van Groningen	8eebca32d2	[CCR] Delay auto follow license check (#33557 ) * [CCR] Delay auto follow license check so that we're sure that there are auto follow patterns configured Otherwise we log a warning in case someone is running with basic or gold license and has not used the ccr feature.	2018-09-10 13:23:02 +02:00
Martijn van Groningen	c4adcee3ea	[CCR] Add create_follow_index privilege (#33559 ) This is a new index privilege that the user needs to have in the follow cluster. This privilege is required in addition to the `manage_ccr` cluster privilege in order to execute the create and follow api. Closes #33555	2018-09-10 13:08:20 +02:00
Jason Tedor	d1b99877fa	Remove underscore from auto-follow API (#33550 ) This commit removes the leading underscore from _auto_follow in the auto-follow API endpoints.	2018-09-09 14:42:49 -04:00
Nhat Nguyen	902d20cbbe	CCR: Use single global checkpoint to normalize range (#33545 ) We may use different global checkpoints to validate/normalize the range of a change request if the global checkpoint is advanced between these calls. If this is the case, then we generate an invalid request range.	2018-09-09 13:18:30 -04:00
Jason Tedor	6eca627409	Reverse logic for CCR license checks (#33549 ) This commit reverses the logic for CCR license checks in a few actions. This is done so that the successful case, which tends to be a larger block of code, does not require indentation.	2018-09-09 10:22:22 -04:00
Jason Tedor	edc492419b	Add latch countdown on failure in CCR license tests (#33548 ) We have some listeners in the CCR license tests that invoke Assert#fail if the onSuccess method for the listener is unexpectedly invoked. This can leave the main test thread hanging until the test suite times out rather than failing quickly. This commit adds some latch countdowns so that we fail quickly if these cases are hit.	2018-09-09 09:52:40 -04:00
Jason Tedor	c67b0ba33e	Create temporary directory if needed in CCR test In the multi-cluster-with-non-compliant-license tests, we try to write out a java.policy to a temporary directory. However, if this temporary directory does not already exist then writing the java.policy file will fail. This commit ensures that the temporary directory exists before we attempt to write the java.policy file.	2018-09-09 07:16:56 -04:00
Jason Tedor	5a38c930fc	Add license checks for auto-follow implementation (#33496 ) This commit adds license checks for the auto-follow implementation. We check the license on put auto-follow patterns, and then for every coordination round we check that the local and remote clusters are licensed for CCR. In the case of non-compliance, we skip coordination yet continue to schedule follow-ups.	2018-09-09 07:06:55 -04:00
Simon Willnauer	c12d232215	Pass Directory instead of DirectoryService to Store (#33466 ) Instead of passing DirectoryService which causes yet another dependency on Store we can just pass in a Directory since we will just call `DirectoryService#newDirectory()` on it anyway.	2018-09-07 14:00:24 +02:00
Nhat Nguyen	8afe09a749	Pass TranslogRecoveryRunner to engine from outside (#33449 ) This commit allows us to use different TranslogRecoveryRunner when recovering an engine from its local translog. This change is a prerequisite for the commit-based rollback PR. Relates #32867	2018-09-06 11:59:16 -04:00
Martijn van Groningen	ef207edbf0	test: do not schedule when test has stopped	2018-09-06 14:14:24 +02:00
Martijn van Groningen	cdd82bb203	test: fetch `SeqNoStats` inside try-catch block Relates to #33457	2018-09-06 11:49:08 +02:00
Martijn van Groningen	a721d09c81	[CCR] Added auto follow patterns feature (#33118 ) Auto Following Patterns is a cross cluster replication feature that keeps track of whether indices are being created in the leader cluster with names that match a specific pattern and, if so, automatically lets the follower cluster follow these newly created indices. This change adds an `AutoFollowCoordinator` component that is only active on the elected master node. Periodically this component checks the cluster state of remote clusters for new leader indices that match the configured auto follow patterns that have been defined in `AutoFollowMetadata` custom metadata. This change also adds two new APIs to manage auto follow patterns. A put auto follow pattern api: ``` PUT /_ccr/_autofollow/{{remote_cluster}} { "leader_index_pattern": ["logs-*", ...], "follow_index_pattern": "{{leader_index}}-copy", "max_concurrent_read_batches": 2 ... // other optional parameters } ``` and delete auto follow pattern api: ``` DELETE /_ccr/_autofollow/{{remote_cluster_alias}} ``` The auto follow patterns are directly tied to the remote cluster aliases configured in the follow cluster. Relates to #33007 Co-authored-by: Jason Tedor jason@tedor.me	2018-09-06 08:01:58 +02:00
Jason Tedor	d71ced1b00	Generalize search.remote settings to cluster.remote (#33413 ) With features like CCR building on the CCS infrastructure, the settings prefix search.remote makes less sense as the namespace for these remote cluster settings than does a more general namespace like cluster.remote. This commit replaces these settings with cluster.remote with a fallback to the deprecated settings search.remote.	2018-09-05 20:43:44 -04:00
Nhat Nguyen	16b53b5ab5	Mute testValidateFollowingIndexSettings Tracked at #33379	2018-09-04 09:03:26 -04:00
Alpar Torok	7f7e8fd733	Disable assemble task instead of removing it (#33348 )	2018-09-04 07:32:14 +03:00
Nhat Nguyen	3a1dad1050	Mute testFollowIndexAndCloseNode Tracked at #33337	2018-09-02 19:17:51 -04:00
Nhat Nguyen	c6b011f8ea	TEST: Increase timeout testFollowIndexAndCloseNode (#33333 ) This test fails several times due to timeout when asserting the number of docs on the following and leading indices. This change reduces the number of docs to index and increases the timeout.	2018-09-02 09:28:47 -04:00
Martijn van Groningen	66b164c2a6	[CCR] Replaced custom follow and unfollow api's response classes with AcknowledgedResponse (#33260 ) These response classes did not add any value and in that case just AcknowledgedResponse should be used. I also changed the formatting of methods to take one line per parameter in FollowIndexAction.java and UnfollowIndexAction.java files to make reviewing diffs in the future easier.	2018-08-31 21:16:06 +07:00
Nhat Nguyen	d3f32273eb	Merge branch 'master' into ccr	2018-08-30 23:22:58 -04:00
Martijn van Groningen	41c7fc8d37	[CCR] Introduce leader index name & last fetch time stats to stats api response (#33155 )	2018-08-29 10:54:58 +07:00
Nhat Nguyen	e2b931e80b	Use Lucene history in primary-replica resync (#33178 ) This commit makes primary-replica resyncer use Lucene as the source of history operation instead of translog if soft-deletes is enabled. With this change, we no longer expose translog snapshot directly in IndexShard. Relates #29530	2018-08-28 10:44:15 -04:00
Jason Tedor	5954354e62	Fix ShardFollowNodeTask.Status equals and hash code (#33189 ) These were broken when fetch exceptions were introduced to the status object but equals and hash code were not updated then. This commit addresses that.	2018-08-28 08:53:45 -04:00
Jason Tedor	cd91992c89	Only fetch mapping updates when necessary (#33182 ) Today we fetch the mapping from the leader and apply it as a mapping update whenever the index metadata version on the leader changes. Yet, the index metadata can change for many reasons other than a mapping update (e.g., settings updates, adding an alias, or a replica being promoted to a primary among many other reasons). This commit builds on the addition of a mapping version to the index metadata to only fetch mapping updates when the mapping version increases. This reduces the number of these fetches and application of mappings on the follower to the bare minimum.	2018-08-28 06:06:22 -04:00
Jason Tedor	0e5d42ca38	Merge branch 'master' into ccr * master: Adjust BWC version on mapping version Token API supports the client_credentials grant (#33106) Build: forked compiler max memory matches jvmArgs (#33138) Introduce mapping version to index metadata (#33147) SQL: Enable aggregations to create a separate bucket for missing values (#32832) Fix grammar in contributing docs SECURITY: Fix Compile Error in ReservedRealmTests (#33166) APM server monitoring (#32515) Support only string `format` in date, root object & date range (#28117) [Rollup] Move toBuilders() methods out of rollup config objects (#32585) Fix forbiddenapis on java 11 (#33116) Apply publishing to genreate pom (#33094) Have circuit breaker succeed on unknown mem usage Do not lose default mapper on metadata updates (#33153) Fix a mappings update test (#33146) Reload Secure Settings REST specs & docs (#32990) Refactor CachingUsernamePassword realm (#32646)	2018-08-27 13:49:59 -04:00
Martijn van Groningen	47e9e72df2	reduce maximum number of writes to speed up test	2018-08-27 12:14:46 +07:00
Jason Tedor	ef9607ea0c	Track fetch exceptions for shard follow tasks (#33047 ) This commit adds tracking and reporting for fetch exceptions. We track fetch exceptions per fetch, keeping track of up to the maximum number of concurrent fetches. With each failing fetch, we associate the from sequence number with the exception that caused the fetch. We report these in the CCR stats endpoint, and add some testing for this tracking.	2018-08-24 14:21:23 -04:00
Jason Tedor	7fa8a728c4	Make CCR QA tests build again (#33113 ) Welp, I broke this. I merged a change to auto-discover the CCR QA tests by making :x-pack:plugin:ccr:check auto-discover the check tasks in the qa sub-project. Yet, the check tasks for these sub-projects did not depend on the necessary test tasks (as we were previously doing this directly from the ccr build file). This commit fixes this!	2018-08-24 09:48:54 -04:00
Martijn van Groningen	b0f22d67c4	fixed not returning response instance	2018-08-24 16:56:29 +07:00
Martijn van Groningen	575f33941c	Required changes after merging in master branch.	2018-08-24 12:51:26 +07:00
Jason Tedor	9623cf6cde	Find CCR QA sub-projects automatically (#33027 ) Today we are by-hand maintaining a list of CCR QA sub-projects that the check task depends on. This commit simplifies this by finding these sub-projects automatically and adding their check task as dependencies of the CCR check task.	2018-08-21 12:51:55 -04:00
Jason Tedor	b08d02e3b7	Implement CCR licensing (#33002 ) This commit implements licensing for CCR. CCR will require a platinum license, and administrative endpoints will be disabled when a license is non-compliant.	2018-08-20 23:33:18 -04:00
Nhat Nguyen	919888eba7	TEST: Enable debug log testValidateFollowingIndexSettings	2018-08-06 14:55:56 -04:00
Nhat Nguyen	c394eb9ae9	CCR: Expose the operation primary term Relates #32442	2018-08-06 10:55:37 -04:00
Jason Tedor	3b739b9fd5	Avoid NPE on shard changes action (#32630 ) If a leader index is deleted while there is an active follower, the follower will send shard changes requests bound for the leader index. Today this will result in a null pointer exception because there will not be an index routing table for the index. A null pointer exception looks like a bug to a user so this commit addresses this by throwing an index not found exception instead.	2018-08-06 08:01:47 -04:00
Jason Tedor	32c2759bb9	Remove extra blank line in CcrStatsAction.java This commit removes an extra blank line that was accidentally committed to CcrStatsAction.java.	2018-08-03 09:55:04 -04:00
Jason Tedor	d640c9ddf9	Introduce CCR stats endpoint (#32350 ) This commit introduces the CCR stats endpoint which provides shard-level stats on the status of CCR follower tasks.	2018-08-03 09:09:45 -04:00
Jason Tedor	2387616c80	Remove _xpack from CCR APIs (#32563 ) For a new feature like CCR we will go without this extra layer of indirection. This commit replaces all /_xpack/ccr/_(\S+) endpoints by /_ccr/$1 endpoints.	2018-08-02 20:21:43 -04:00
Nhat Nguyen	8cfbb64d6e	ShardFollowNodeTask should fetch operation once (#32455 ) Today ShardFollowNodeTask might fetch some operations more than once. This happens because we ask the leading for up to max_batch_count operations (instead of the left-over size) for the left-over request. The leading then can freely respond up to the max_batch_count, and at the same time, if one of the previous requests completed, we might issue another read request whose range overlaps with the response of the left-over request. Closes #32453	2018-07-30 20:53:09 -04:00
Nhat Nguyen	aa3b6e098c	Reject follow request if following setting not enabled on follower (#32448 ) Today we do not check if the `following_index` setting of the follower is enabled or not when processing a follow-request. If that setting is disabled, the follower will use the default engine, not the following engine. This change checks and rejects such invalid follow requests. Relates #30086	2018-07-29 21:57:45 -04:00
Nhat Nguyen	8474f8a01c	Validate source of an index in LuceneChangesSnapshot (#32288 ) Today it's possible to encounter an Index operation in Lucene whose _source is disabled, and _recovery_source was pruned by the MergePolicy. If it's the case, we create a Translog#Index without source and let the caller validate it later. However, this approach is challenging for the caller. Deletes and No-Ops don't allow invoking the "source()" method. The caller has to make sure to call "source()" only on index operations. The current implementation in CCR does not follow this and fails to replicate deletes or no-ops. Moreover, it's easier to reason about if a Translog#Index always has the source.	2018-07-27 08:16:52 -04:00
Nhat Nguyen	cd8b80da58	Use shadow plugin in ccr/qa	2018-07-25 00:16:33 -04:00
Nhat Nguyen	a5d8f0b55a	CCR: use shadow plugin Relates #32240	2018-07-24 22:48:11 -04:00
Nhat Nguyen	88190299df	CCR: Fix incorrect read request completion condition (#32266 ) Today we consider a read request is exhausted if from_seqno is equal to or greater than the max_required_seqno. However, if we stop when from_seqno equals to the max_required_seqno, we will miss an operation whose seqno is max_required_seqno because we have not seen that operation yet.	2018-07-22 22:14:27 -04:00
Martijn van Groningen	b6b596e471	[CCR] Add random shard follow task test (#32188 ) Added shard follow task unit tests that tests whether the shard follow task is able to process randomly generated shard changes api responses.	2018-07-21 12:38:05 +02:00
Nhat Nguyen	8e15504443	TEST: Fix range issue in ShardChangesActionTests We modified the way we calculate to_seqno in #32121 but did not adjust this test accordingly. If min_seqno equals to max_seqno, the size should be one instead of zero. Relates #32121	2018-07-20 17:20:41 -04:00
Nhat Nguyen	fe574f89f8	CCR: Translog op on primary should have versionType Normally translog operations will not be replayed on the primary. Following engine is an exception where we replay translog on both primary and replica as a non-primary strategy. Even though we won't use the version_type in the following engine, we still need to pass a valid value for the primary operation in order not to trip assertions in an engine. This commit passes version_type EXTERNAL for translog operation if its origin is primary. Relates #31945	2018-07-20 08:39:38 -04:00
Martijn van Groningen	a6b7497fdc	[CCR] Add more unit tests for shard follow task (#32121 ) The added tests are based on specific scenarios as described in the test plan. Before this change the ShardFollowNodeTaskTests contained more random like tests, but these have been removed and in a followup pr better random tests will be added in a new test class as is described in the test plan.	2018-07-20 14:12:05 +02:00
Nhat Nguyen	d0f3ed5abd	Merge branch 'master' into ccr * master: Painless: Simplify Naming in Lookup Package (#32177) Handle missing values in painless (#32207) add support for write index resolution when creating/updating documents (#31520) ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864) Remove indication of future multi-homing support (#32187) Rest test - allow for snapshots to take 0 milliseconds Make x-pack-core generate a pom file Rest HL client: Add put watch action (#32026) Build: Remove pom generation for plugin zip files (#32180) Fix comments causing errors with Java 11 Fix rollup on date fields that don't support epoch_millis (#31890) Detect and prevent configuration that triggers a Gradle bug (#31912) [test] port linux package packaging tests (#31943) Revert "Introduce a Hashing Processor (#31087)" (#32178) Remove empty @return from JavaDoc Adjust SSLDriver behavior for JDK11 changes (#32145) [test] use randomized runner in packaging tests (#32109) Add support for field aliases. (#32172) Painless: Fix caching bug and clean up addPainlessClass. (#32142) Call setReferences() on custom referring tokenfilters in _analyze (#32157) Fix BwC Tests looking for UUID Pre 6.4 (#32158) Improve docs for search preferences (#32159) use before instead of onOrBefore Add more contexts to painless execute api (#30511) Add EC2 credential test for repository-s3 (#31918) A replica can be promoted and started in one cluster state update (#32042) Fix Java 11 javadoc compile problem Fix CP for namingConventions when gradle home has spaces (#31914) Fix `range` queries on `_type` field for singe type indices (#31756) [DOCS] Update TLS on Docker for 6.3 (#32114) ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are Remove versionType from translog (#31945) Switch distribution to new style Requests (#30595) Build: Skip jar tests if jar disabled Painless: Add PainlessClassBuilder (#32141) Build: Make additional test deps of check (#32015) Disable C2 from using AVX-512 on JDK 10 (#32138) Build: Move shadow customizations into common code (#32014) Painless: Fix Bug with Duplicate PainlessClasses (#32110) Remove empty @param from Javadoc Re-disable packaging tests on suse boxes Docs: Fix missing example script quote (#32010) [ML] Wait for aliases in multi-node tests (#32086) [ML] Move analyzer dependencies out of categorization config (#32123) Ensure to release translog snapshot in primary-replica resync (#32045) Handle TokenizerFactory TODOs (#32063) Relax TermVectors API to work with textual fields other than TextFieldType (#31915) Updates the build to gradle 4.9 (#32087) Mute :qa:mixed-cluster indices.stats/10_index/Index - all’ Check that client methods match API defined in the REST spec (#31825) Enable testing in FIPS140 JVM (#31666) Fix put mappings java API documentation (#31955) Add exclusion option to `keep_types` token filter (#32012) [Test] Modify assert statement for ssl handshake (#32072)	2018-07-19 23:03:01 -04:00
Martijn van Groningen	d88c76e02b	[CCR] Initial replication group based tests (#32024 ) Tests shard follow task in the context of a leader and follower ReplicationGroup, in order to test how the shard follow logic reacts to certain shard related failure scenarios. More tests will need to be added, but this indicates what changes need to be made to have these tests. Relates to #30102	2018-07-17 17:39:49 +02:00
Martijn van Groningen	006c79a80d	[CCR] Improve retry mechanism when making remote calls from shard follow task (#31930 ) Closes #31816	2018-07-17 10:25:51 +02:00
Martijn van Groningen	815faf34fc	[CCR] Move api parameters from url to request body. (#31949 ) Relates to #30102	2018-07-11 10:16:43 +02:00
Martijn van Groningen	8e1ef0cff9	Rewrite shard follow node task logic (#31581 ) The current shard follow mechanism is complex and does not give us easy ways to have visibility into the system (e.g. why we are falling behind). The main reason it is complex is that the current design is highly asynchronous. Also in the current model it is hard to apply backpressure other than reducing the concurrent reads from the leader shard. This PR has the following changes: * Rewrote the shard follow task to coordinate the shard follow mechanism between a leader and follow shard in a single threaded manner. This allows for better unit testing and makes it easier to add stats. * All write operations read from the shard changes api should be added to a buffer instead of directly sending them to the bulk shard operations api. This allows applying backpressure. In this PR there is a limit that controls how many write ops are allowed in the buffer, after which no new reads will be performed until the number of ops is below that limit. * The shard changes api includes the current global checkpoint on the leader shard copy. This allows reading to be a more self-sufficient process, instead of relying on a background thread to fetch the leader shard's global checkpoint. * Reading write operations from the leader shard (via shard changes api) is a separate step from writing the write operations (via bulk shards operations api). Whereas before a read would immediately result in a write. * The bulk shard operations api returns the local checkpoint on the follow primary shard, to keep the shard follow task up to date with what has been written. * Moved the shard follow logic that was previously in ShardFollowTasksExecutor to ShardFollowNodeTask. * Moved over the changes from #31242 to make the shard follow mechanism resilient to node and shard failures. Relates to #30086	2018-07-10 16:00:55 +02:00
Martijn van Groningen	ac654cbc10	Follow engine should not fill gaps upon promotion and recovery (#31751 ) Closes #31318	2018-07-03 13:15:06 +02:00
Martijn van Groningen	8ecfcc3b80	muted tests that will be replaced by the shard follow task refactoring: https://github.com/elastic/elasticsearch/pull/31581	2018-06-29 11:47:46 +02:00
Nhat Nguyen	1185ddbcc6	Replaces testClassesDir with testClassesDirs in ccr build Relates #30389	2018-06-28 11:24:41 -04:00
Nhat Nguyen	2c56df631d	Adjusts transport actions in CCR This commit adjusts the ccr’s actions accordingly to the recent changes in the upstream.	2018-06-23 18:10:15 -04:00
Nhat Nguyen	34f127be3c	CCR: Remove index name resolver from CCR actions Relates #31002	2018-06-20 13:20:24 -04:00
Nhat Nguyen	c74cd30ac6	Remove request type parameter from CCR actions Relates #31405	2018-06-19 10:49:05 -04:00
Martijn van Groningen	50ce990305	added missing serialization tests	2018-06-19 10:22:58 +02:00
Martijn van Groningen	73c9dd976b	Remove action request builders.	2018-06-15 12:32:08 +02:00
Tanguy Leroux	18938aab39	Adapt ShardFollowTasksExecutor after #31031	2018-06-15 11:46:08 +02:00
Martijn van Groningen	cc824ebb5e	[CCR] Added more validation to follow index api. (#31068 )	2018-06-15 07:39:53 +02:00
Nhat Nguyen	1ccb34ac77	Remove unused imports	2018-06-14 11:44:20 -04:00
Jason Tedor	64b4cdeda6	Merge remote-tracking branch 'elastic/master' into ccr * elastic/master: (53 commits) Painless: Restructure/Clean Up of Spec Documentation (#31013) Update ignore_unmapped serialization after backport Add back dropped substitution on merge high level REST api: cancel task (#30745) Enable engine factory to be pluggable (#31183) Remove vestiges of animal sniffer (#31178) Rename elasticsearch-nio to nio (#31186) Rename elasticsearch-core to core (#31185) Move cli sub-project out of server to libs (#31184) [DOCS] Fixes broken link in auditing settings QA: Better seed nodes for rolling restart [DOCS] Moves ML content to stack-docs [DOCS] Clarifies recommendation for audit index output type (#31146) Add nio-transport as option for http smoke tests (#31162) QA: Set better node names on rolling restart tests Add support for ignore_unmapped to geo sort (#31153) Share common parser in some AcknowledgedResponses (#31169) Fix random failure on SearchQueryIT#testTermExpansionExceptionOnSpanFailure Remove reference to multiple fields with one name (#31127) Remove BlobContainer.move() method (#31100) ...	2018-06-07 23:33:42 -04:00
Simon Willnauer	5c6711b8a4	Use a `_recovery_source` if source is omitted or modified (#31106 ) Today if a user omits the `_source` entirely or modifies the source on indexing we have no chance to re-create the document after it has been added. This is an issue for CCR and recovery based on soft deletes which we are going to make the default. This change adds an additional recovery source if the source is disabled or modified that is only kept around until the document leaves the retention policy window. This change adds a merge policy that efficiently removes this extra source on merge for all documents that are live and not in the retention policy window anymore.	2018-06-07 07:39:28 +02:00
Jason Tedor	20a2f646e2	Fix off-by-one error in chunks coordinator (#31147 ) This commit fixes an off-by-one error in the chunks coordinator where the batches would be of size one more than the batch size.	2018-06-06 19:53:49 -04:00
Jason Tedor	bf1152fcc6	Use follower primary term when applying operations (#31113 ) The primary shard copy on the following has authority of the replication operations that occur on the following side in cross-cluster replication. Yet today we are using the primary term directly from the operations on the leader side. Instead we should be replacing the primary term on the following side with the primary term of the primary on the following side. This commit does this by copying the translog operations with the corrected primary term. This ensures that we use this primary term while applying the operations on the primary, and when replicating them across to the replica (where the replica request was carrying the primary term of the primary shard copy on the follower).	2018-06-06 11:03:57 -04:00
Jason Tedor	d230548401	Remove use of deprecated methods to perform request (#31117 ) The old perform request methods on the REST client have been deprecated in favor using request-flavored methods. This commit addresses the use of these deprecated methods in the CCR test suite.	2018-06-06 05:09:55 -04:00
Nhat Nguyen	6ee6404e94	Adapt changes in PersistentTaskParams Relates #31045	2018-06-04 14:48:04 -04:00
Nhat Nguyen	87abb49145	Adapt changes in AcknowledgeResponse Relates #30983	2018-06-04 14:47:22 -04:00
Nhat Nguyen	9564b60194	Adjust CCR Actions after RequestBuilder is removed CCR side of #30966	2018-06-01 23:09:59 -04:00
Nhat Nguyen	2a9a2002e6	CCR: Tighten requesting range check on leader This commit clarifies the origin of the global checkpoint that the following shard uses and replaces illegal_state_exception by an assertion. Relates #30980	2018-05-31 20:00:33 -04:00
Nhat Nguyen	fa54be2dcd	CCR: Do not minimize requesting range on leader (#30980 ) Today before reading operations on the leading shard, we minimize the requesting range with the global checkpoint. However, this might make the request invalid if the following shard generates a requesting range based on the global-checkpoint from a primary shard and sends that request to a replica whose global checkpoint is lagged. Another issue is that we are mutating the request when applying minimization. If the request becomes invalid on a replica, we will reroute the mutated request instead of the original one to the primary. This commit removes the minimization and replaces it by a range check with the local checkpoint.	2018-05-31 15:14:32 -04:00
Martijn van Groningen	7e8cf768cf	changed persistent task name to be of similar structure as the others	2018-05-31 15:16:13 +02:00
Martijn van Groningen	a82f2e31b4	[CCR] Also copy routing_num_shards from leader to follow index. (#30894 ) Bug was introduced when create and follow api was added in #30602	2018-05-31 08:03:53 +02:00
Nhat Nguyen	f25ee254cc	Mute ShardChangesIT#testFollowIndex	2018-05-30 14:29:58 -04:00
Martijn van Groningen	adca32eae7	no need to resolve index name as only concrete index names are used	2018-05-30 12:42:35 +02:00
Martijn van Groningen	4a20dca5fe	Required changes after merging in master.	2018-05-30 10:26:49 +02:00
Martijn van Groningen	51caefe46c	[CCR] Sync mappings between leader and follow index (#30115 ) The shard changes api returns the minimum IndexMetadata version the leader index needs to have. If the leader side is behind on IndexMetadata version then follow shard task waits with processing write operations until the mapping has been fetched from leader index and applied in follower index in the background. The cluster state api is used to fetch the leader mapping and put mapping api to apply the mapping in the follower index. This works because put mapping api accepts fields that are already defined. Relates to #30086	2018-05-28 07:37:27 +02:00
Martijn van Groningen	e477147143	[CCR] Add create and follow api (#30602 ) Also renamed FollowExisting* internal names to just Follow* and fixed tests	2018-05-26 15:05:40 +02:00
Martijn van Groningen	7942e4082a	build: enhance check task instead of overwriting it. (test task didn't run when check task ran)	2018-05-16 10:54:15 +02:00
Martijn van Groningen	596ec1848e	[CCR] Add validation checks that were left out of #30120 (#30463 )	2018-05-16 09:46:03 +02:00
Martijn van Groningen	23204e3d09	[CCR] Fixed follow and unfollow api url path according to design. The TODOs in the rest actions were incorrect. The problem was that these rest actions used `follow_index` as first named variable in the path under which the rest actions were registered. Other candidate rest actions that also have a named variable as first element in the path (but with a different name) get resolved as rest parameters too and passed down to the rest action that actually ends up getting executed. In the case of the follow index api, an `index` parameter got passed down to `RestFollowExistingAction`, but that param was never used. This caused the follow index api call to fail, because of unused http parameters. This change doesn't fix that problem, but works around it by using `index` as named variable for the follow index (instead of `follow_index`). Relates to #30102	2018-05-16 09:07:50 +02:00
Martijn van Groningen	64b97313d5	[CCR] Make cross cluster replication work with security (#30239 ) If security is enabled today with ccr then the follow index api will fail because the system user does not have privileges to use the shard changes api. The reason that the system user is used is that the persistent tasks that keep the shards in sync run in the background and the user that invokes the follow index api only starts those background processes. I think it is better that the system user isn't used by the persistent tasks that keep shards in sync, and that they instead run as the same user that invoked the follow index api and use the permissions that that user has. This is what this PR does, and this is done by keeping track of security headers inside the persistent task (similar to how rollup does this). This PR also adds a cluster ccr privilege that allows a user to follow or unfollow an index. Finally, a user that wants to follow an index needs to have read and monitor privileges on the leader index and monitor and write privileges on the follow index.	2018-05-16 07:48:32 +02:00
Martijn van Groningen	bb6586dc5f	[CCR] Read changes from Lucene instead of translog (#30120 ) This commit adds an API to read translog snapshot from Lucene, then cut-over from the existing translog to the new API in CCR. Relates #30086 Relates #29530	2018-05-09 17:35:27 -04:00
Martijn van Groningen	ad499fc178	[CCR] added rest specs and simple rest test for follow and unfollow apis (#30123 ) [CCR] added rest specs and simple rest test for follow and unfollow apis, also Added an acknowledge field in follow and unfollow api responses. Currently these api return an empty response and fixed bug in unfollow api that didn't cleanup node tasks properly.	2018-05-07 14:18:28 +02:00
Nhat Nguyen	6e0d0feca0	Enable MockHttpTransport in ShardChangesIT CCR side of #29601	2018-05-04 13:44:18 -04:00
Nhat Nguyen	8fefa8a661	Update InternalEngine tests on ccr side for #30121 Relates #30121	2018-05-04 10:57:54 -04:00
Nhat Nguyen	d621fc7a00	Add tombstone document into Lucene for Noop (#30226 ) This commit adds a tombstone document into Lucene for every No-op. With this change, Lucene index is expected to have a complete history of operations like Translog. In fact, this guarantee is subjected to the soft-deletes retention merge policy. Relates #29530	2018-05-02 09:08:29 -04:00
Nhat Nguyen	eb4281edef	CCR side #30244 Relates #30244	2018-05-01 21:08:24 -04:00
Martijn van Groningen	8a2df6c3b9	[CCR] Only normalize -1 seqno in shard changes request. (#30238 ) Prior to this change a -1 seqno would be normalized earlier, which caused a leader shard containing a single operation to be ignored. Closes #30227	2018-05-01 08:40:23 +02:00
Martijn van Groningen	e6b88fa5a0	removed duplicated license	2018-04-25 12:18:02 +02:00
Martijn van Groningen	5a67a0f78f	Applying changes required for ccr after moving ccr code to elasticsearch	2018-04-25 08:03:29 +02:00
Martijn van Groningen	9b9d0f9057	Enabled licence header check and fixed unchecked casts. (#4408 )	2018-04-20 11:15:52 +02:00
Martijn van Groningen	cfd7847628	fixed issues after merging in master	2018-04-20 07:59:13 +02:00
Nhat Nguyen	f97aec7b8b	Sibling of enforce access to translog via engine Since elastic/elasticsearch#29542, we no longer expose translog instance but only provide creating translog snapshot method. This commit adapts that change in CCR branch. Relates elastic/elasticsearch#29542	2018-04-18 11:54:00 -04:00
Martijn van Groningen	56ca59a513	Add the ability to the follow index to follow an index in a remote cluster. The follow index api completely reuses CCS infrastructure that was exposed via: https://github.com/elastic/elasticsearch/pull/29495 This means that the leader index parameter support the same ccs index to indicate that an index resides in a different cluster. I also added a qa module that smoke tests the cross cluster nature of ccr. The idea is that this test just verifies that ccr can read data from a remote leader index and that is it, no crazy randomization or indirectly testing other features.	2018-04-17 07:36:40 +02:00
Martijn van Groningen	c0d42e9cd1	Fixed test	2018-04-16 10:48:46 +02:00
Martijn van Groningen	a94b38b88e	Fixed compile errors and test failures after merging master into ccr.	2018-04-13 16:35:09 +02:00
Martijn van Groningen	d77f756f5c	ccr: use indices stats api to fetch global checkpoint of the follower shards and keep track of shard follow stats inside shard follow stats' node task instead of persistent task status. By maintaining the shard follow stats inside its node task the stats update is quicker as no cluster state update is required. The stats are now transient, meaning if the task is going to run on a different node then the stats are gone too. Currently only the processed global checkpoint is being tracked and this is being restored when a shard follow node task starts via the indices stats api (the reason for the first change of this change). For other stats that we may add in the future (like fetch_time, see: https://gist.github.com/s1monw/dba13daf8493bf48431b72365e110717) it is ok if we start from zero in case a shard follow task moves to another node.	2018-04-05 14:52:20 +02:00
Martijn van Groningen	d976fa44e7	Removed LocalCheckpointTracker usage.	2018-03-29 07:41:23 +02:00
Martijn van Groningen	a22a7d079d	ccr: Added maximum translog limit that a single shard changes response can return. This limit is based on the estimated number of bytes in each translog operation that falls between the minimum and maximum request sequence number. If this limit is met then the shard follow task executor will make sure that a subsequent shard changes request will be performed to fetch the remaining translog operations. This limit is needed in order to protect against returning too many translog operations in a single shard changes response. Relates to #2436	2018-03-28 15:49:57 +02:00

660 Commits