OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	c8c596cead	Introduce retention lease expiration (#37195 ) This commit implements a straightforward approach to retention lease expiration. Namely, we inspect which leases are expired when obtaining the current leases through the replication tracker. At that moment, we clean the map that persists the retention leases in memory.	2019-01-07 22:03:52 -08:00
Jason Tedor	c0f8c89172	Introduce shard history retention leases (#37167 ) This commit is the first in a series which will culminate with fully-functional shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fallback to expensive file copy operations if shard history is not available from a certain point. These consumers include following indices in cross-cluster replication, and local shard recoveries. A future consumer will be the changes API. Further, index lifecycle management requires coordinating with some of these consumers otherwise it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem. Shard history retention leases are a property of the replication group managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases they have a limited lifespan that will expire if not renewed. The idea of these leases is that all operations above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away operations that are soft deleted). These leases will be periodically persisted to Lucene and restored during recovery, and broadcast to replicas under certain circumstances. This commit is merely putting the basics in place. This first commit only introduces the concept and integrates their use with the soft delete retention policy. We add some tests to demonstrate the basic management is correct, and that the soft delete policy is correctly influenced by the existence of any retention leases. We make no effort in this commit to implement any of the following: - timestamps - expiration - persistence to and recovery from Lucene - handoff during primary relocation - sharing retention leases with replicas - exposing leases in shard-level statistics - integration with cross-cluster replication These will occur individually in follow-up commits.	2019-01-07 07:43:57 -08:00
Jim Ferenczi	e38cf1d0dc	Add the ability to set the number of hits to track accurately (#36357 ) In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected. Relates #33028	2019-01-04 20:36:49 +01:00
Luca Cavanna	c1beb95aa1	Mute LocalIndexFollowingIT#testRemoveRemoteConnection Relates to #37014	2018-12-28 16:39:36 +01:00
Nhat Nguyen	7580d9d925	Make SourceToParse immutable (#36971 ) Today the routing of a SourceToParse is assigned in a separate step after the object is created. We can easily forget to set the routing. With this commit, the routing must be provided in the constructor of SourceToParse. Relates #36921	2018-12-24 14:06:50 -05:00
Martijn van Groningen	561b704129	[CCR] AutoFollowCoordinator and follower index already created (#36540 ) The AutoFollowCoordinator should be resilient to the fact that the follower index has already been created and in that case it should only update the auto follow metadata with the fact that the follower index was created. Relates to #33007	2018-12-24 10:16:38 +01:00
Martijn van Groningen	44fe265d82	[CCR] Added auto_follow_exception.timestamp field to auto follow stats (#36947 ) Currently auto follow stats users are unable to see whether an auto follow error was recent or old. The new timestamp field will help user distinguish between old and new errors.	2018-12-24 07:53:51 +01:00
Martijn van Groningen	4fb62fcba6	Make CCR resilient against missing remote cluster connections (#36682 ) Both index following and auto following should be resilient against missing remote connections. This happens in the case that they get accidentally removed by a user. When this happens auto following and index following will retry to continue instead of failing with unrecoverable exceptions. Both the put follow and put auto follow APIs validate whether the remote cluster connection exists. The logic added in this change only exists in case during the lifetime of a follower index or auto follow pattern the remote connection gets removed. This retry behavior is similar to how CCR deals with authorization errors. Closes #36667 Closes #36255	2018-12-24 07:28:34 +01:00
Martijn van Groningen	4ded4717fe	[CCR] Add `ccr.auto_follow_coordinator.wait_for_timeout` setting (#36714 ) This setting controls the wait for timeout the autofollow coordinator should use when sending cluster state requests to a remote cluster.	2018-12-21 09:36:40 +01:00
Tim Brooks	d9b2ed6135	Send clear session as routable remote request (#36805 ) This commit adds a RemoteClusterAwareRequest interface that allows a request to specify which remote node it should be routed to. The remote cluster aware client will attempt to route the request directly to this node. Otherwise it will send it as a proxy action to eventually end up on the requested node. It implements the ccr clean_session action with this client.	2018-12-20 17:43:12 -07:00
Tim Brooks	4cd570593d	Update index mappings when ccr restore complete (#36879 ) This is related to #35975. When the shard restore process is complete, the index mappings need to be updated to ensure that the data in the restored files is compatible with the follower mappings. This commit implements a mapping update as the final step in a shard restore.	2018-12-20 13:53:04 -07:00
Martijn van Groningen	b42074c1cc	[CCR] Report error if auto follower tries auto follow a leader index with soft deletes disabled (#36886 ) Currently if a leader index with soft deletes disabled is auto followed then this index is silently ignored. This commit changes this behavior to mark these indices as auto followed and report an error, which is visible in auto follow stats. Marking the index as auto follow is important, because otherwise the auto follower will continuously try to auto follow and fail. Relates to #33007	2018-12-20 15:21:52 +01:00
Martijn van Groningen	7b1dfeff2e	Renamed `WHITE_LISTED_SETTINGS` to `NON_REPLICATED_SETTINGS` because the latter better describes the purpose of this field.	2018-12-20 15:08:04 +01:00
Martijn van Groningen	18691daebe	[TEST] Renamed ccr qa module.	2018-12-19 13:57:12 +01:00
Martijn van Groningen	3cc0cf03c6	[TEST] No need to specifically check licensesMetaData on master node.	2018-12-19 13:51:24 +01:00
Martijn van Groningen	a6af33ef0b	[TEST] Wait for license metadata to be installed	2018-12-19 13:03:45 +01:00
Alpar Torok	e9ef5bdce8	Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311 ) - Create a separate unitTest task instead of Gradle's built in - convert all configuration to use the new task - the built in task is now disabled	2018-12-19 08:25:20 +02:00
Tim Brooks	1fa105658e	Add CcrRestoreSourceService to track sessions (#36578 ) This commit is related to #36127. It adds a CcrRestoreSourceService to track Engine.IndexCommitRef needed for in-process file restores. When a follower starts restoring a shard through the CcrRepository it opens a session with the leader through the PutCcrRestoreSessionAction. The leader responds to the request by telling the follower what files it needs to fetch for a restore. This is not yet implemented. Once the restore is complete, the follower closes the session with the DeleteCcrRestoreSessionAction action.	2018-12-18 11:23:13 -07:00
Martijn van Groningen	1afcfc97bd	[TEST] Added more logging Relates to #36761	2018-12-18 16:01:02 +01:00
Boaz Leskes	5f76f39386	Rename seq# powered optimistic concurrency control parameters to ifSeqNo/ifPrimaryTerm (#36757 ) This PR renames the parameters previously introduced to the following: ### URL Parameters ``` PUT twitter/_doc/1?if_seq_no=501&if_primary_term=1 { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" } DELETE twitter/_doc/1?if_seq_no=501&if_primary_term=1 ``` ### Bulk API ``` POST _bulk { "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1", "if_seq_no": 501, "if_primary_term": 1 } } { "field1" : "value1" } { "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2", "if_seq_no": 501, "if_primary_term": 1 } } ``` ### Java API ``` IndexRequest.ifSeqNo(long seqNo) IndexRequest.ifPrimaryTerm(long primaryTerm) DeleteRequest.ifSeqNo(long seqNo) DeleteRequest.ifPrimaryTerm(long primaryTerm) ``` Relates #36148 Relates #10708	2018-12-18 14:35:18 +01:00
Martijn van Groningen	0ff1f1fa18	Muted tests. Relates to #36764	2018-12-18 13:39:01 +01:00
Martijn van Groningen	57e1a4bc9f	[TEST] Ensure shard follow tasks have really stopped. Relates to #36696	2018-12-18 10:43:27 +01:00
Tim Brooks	3dd5a5a3c5	Initialize startup `CcrRepositories` (#36730 ) Currently, the CcrRepositoryManager only listens for settings updates and installs new repositories. It does not install the repositories that are in the initial settings. This commit modifies the manager to install the initial repositories. Additionally, it modifies the ccr integration test to configure the remote leader node at startup, instead of using a settings update.	2018-12-17 13:19:32 -07:00
Martijn van Groningen	a181a25226	[CCR] Add time since last auto follow fetch to auto follow stats (#36542 ) For each remote cluster the auto follow coordinator, starts an auto follower that checks the remote cluster state and determines whether an index needs to be auto followed. The time since last auto follow is reported per remote cluster and gives insight whether the auto follow process is alive. Relates to #33007 Originates from #35895	2018-12-17 14:14:56 +01:00
Martijn van Groningen	f27d2c2927	[TEST] Pause index following at end of test, so that no unexpected failures happen at test teardown.	2018-12-17 07:55:27 +01:00
Nhat Nguyen	2028c2af14	TEST: Do not assert max_seq_of_updates if promotion If a primary promotion happens in the test testAddRemoveShardOnLeader, the max_seq_no_of_updates_or_deletes on a new primary might be higher than the max_seq_no_of_updates_or_deletes on the replicas or copies of the follower. Relates #36607	2018-12-16 16:48:04 -05:00
Martijn van Groningen	97107e99e8	Moved test to its rightful place.	2018-12-16 13:57:51 +01:00
Boaz Leskes	733a6d34c1	Add seq no powered optimistic locking support to the index and delete transport actions (#36619 ) This commit adds support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) in the delete and index transport actions and requests. A follow up will come with adding sequence numbers to the update and get results. Relates #36148 Relates #10708	2018-12-15 17:59:57 +01:00
Albert Zaharovits	a30e8c2fa3	HasPrivilegesResponse use TreeSet for fields (#36329 ) For class fields of type collection whose order is not important and for which duplicates are not permitted we declare them as `Set`s. Usually the definition is a `HashSet` but in this case `TreeSet` is used instead to aid testing.	2018-12-15 08:34:54 +02:00
Martijn van Groningen	68a674ef1f	[CCR] Fix follow stats API's follower index filtering feature (#36647 ) Currently always all follow stats for all follower indices are being returned even if follow stats for only specific indices are requested.	2018-12-14 19:39:30 +01:00
Armin Braun	c5b3ac5578	SNAPSHOTS: Allow Parallel Restore Operations (#36397 ) * Enable parallel restore operations * Add uuid to restore in progress entries to uniquely identify them * Adjust restore in progress entries to be a map in cluster state * Added tests for: * Parallel restore from two different snapshots * Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator * Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers	2018-12-14 11:39:23 +01:00
Nhat Nguyen	a4b32f1143	Remove concurrency in testFailLeaderReplicaShard (#36607 ) testFailLeaderReplicaShard periodically fails because we concurrently index to the leader group and close one of its replicas. If a replication request hits a closing shard, we will fail that shard; however, failing a shard is not supported by the test framework - this makes the test fail.	2018-12-13 19:02:13 -05:00
Boaz Leskes	f6b5d7e013	Add sequence numbers based optimistic concurrency control support to Engine (#36467 ) This commit adds support to engine operations for resolving and verifying the sequence number and primary term of the last modification to a document before performing an operation. This is infrastructure to move our [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) API to use sequence numbers instead of internal versioning. Relates #36148 Relates #10708	2018-12-13 08:08:40 +01:00
Martijn van Groningen	883940ad92	[CCR] Change AutofollowCoordinator to use wait_for_metadata_version (#36264 ) Changed AutofollowCoordinator to make use of the wait_for_metadata_version feature in the cluster state API and removed the hard coded poll interval. Originates from #35895 Relates to #33007	2018-12-12 12:47:24 +01:00
Martijn van Groningen	4a825e2e86	[CCR] Clean followed leader index UUIDs in auto follow metadata (#36408 ) The auto follow coordinator keeps track of the UUIDs of indices that it has followed. The index UUID strings need to be cleaned up in the case that these indices are removed in the remote cluster. Relates to #33007	2018-12-12 09:55:37 +01:00
Nhat Nguyen	51800de2a8	Enable soft-deletes by default on 7.0.0 or later (#36141 ) This change enables soft-deletes by default on ES 7.0.0 or later. Relates #33222 Co-authored-by: Jason Tedor <jason@tedor.me>	2018-12-11 18:58:49 -05:00
Nhat Nguyen	f23701406b	CCR/TEST: Enable soft-deletes in ShardChangesActionTests Relates #36446	2018-12-11 15:00:09 -05:00
Andrey Ershov	8b821706cc	Switch more tests to zen2 (#36367 ) 1. CCR tests work without any changes 2. `testDanglingIndices` requires changes to the source code (added TODO). 3. `testIndexDeletionWhenNodeRejoins` because it's using just two nodes, adding the node to exclusions is needed on restart. 4. `testCorruptTranslogTruncationOfReplica` starts a dedicated master, because otherwise the cluster does not form if nodes are stopped and one node is started back. 5. `testResolvePath` needs TEST cluster, because all nodes are stopped at the end of the test and it's not possible to perform checks needed by SUITE cluster. 6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2 retries snapshot creation as soon as the network partition heals. This results in a race between creating the snapshot and the test cleanup logic (deleting index). Zen1, on the other hand, also schedules a retry, but it takes some time after the network partition heals, so the cleanup logic executes later and the test passes. A check that the snapshot is eventually created is added to the end of the test.	2018-12-11 17:12:17 +01:00
Martijn van Groningen	633ab24017	[CCR] Restructured QA modules (#36404 ) Renamed the follow qa modules: `multi-cluster-downgraded-to-basic-license` to `downgraded-to-basic-license` `multi-cluster-with-non-compliant-license` to `non-compliant-license` `multi-cluster-with-security` to `security` Moved the `chain` module into the `multi-cluster` module and changed the `multi-cluster` to start 3 clusters. Followup from #36031	2018-12-09 19:34:48 +01:00
Nhat Nguyen	95bafb0593	TEST: Always enable soft-deletes in ShardChangesTests	2018-12-09 02:57:13 -05:00
Tim Brooks	8a53f2b464	Implement basic `CcrRepository` restore (#36287 ) This is related to #35975. It implements a basic restore functionality for the CcrRepository. When the restore process is kicked off, it configures the new index as expected for a follower index. This means that the index has a different uuid, the version is not incremented, and the Ccr metadata is installed. When the restore shard method is called, an empty shard is initialized.	2018-12-07 15:27:04 -07:00
Nhat Nguyen	f2df0a5be4	Remove LocalCheckpointTracker#resetCheckpoint (#34667 ) In #34474, we added a new assertion to ensure that the LocalCheckpointTracker is always consistent with Lucene index. However, we reset LocalCheckpointTracker in testDedupByPrimaryTerm, causing this assertion to be violated. This commit removes resetCheckpoint from LocalCheckpointTracker and rewrites testDedupByPrimaryTerm without resetting the local checkpoint. Relates #34474	2018-12-07 12:22:20 -05:00
Ryan Ernst	37b3fc383f	Build: Use explicit deps on test tasks for check (#36325 ) This commit moves back to use explicit dependsOn for test tasks on check. Not all tasks extending RandomizedTestingTask should be run by check directly.	2018-12-06 14:13:49 -08:00
Yannick Welsch	a0ae1cc987	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 23:13:12 +01:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equal to `eq`) or a lower bound of the total (in which case it is equal to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
Yannick Welsch	cc11953724	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 16:55:45 +01:00
Tim Brooks	068c856e88	Rename internal repository actions to be internal (#36244 ) This is a follow-up to #36086. It renames the internal repository actions to be prefixed by "internal". This allows the system user to execute the actions. Additionally, this PR stops casting Client to NodeClient. The client we have is a NodeClient so executing the actions will be local.	2018-12-05 08:11:47 -07:00
Yannick Welsch	b20497560c	Merge remote-tracking branch 'elastic/master' into zen2	2018-12-05 14:06:38 +01:00
Martijn van Groningen	a264cb6ddb	Refactor AutoFollowCoordinator to track leader indices per remote cluster (#36031 ) and replaced poll interval setting with a hardcoded poll interval. The hard coded interval will be removed in a follow up change to make use of cluster state API's wait_for_metadata_version. Before the auto following was bootstrapped from thread pool scheduler, but now auto followers for new remote clusters are bootstrapped when a new cluster state is published. Originates from #35895 Relates to #33007	2018-12-05 13:39:14 +01:00
Alpar Torok	60e45cd81d	Testing conventions task part 2 (#36107 ) Closes #35435 - make it easier to add additional testing tasks with the proper configuration and add some where they were missing. - mute or fix failing tests - add a check as part of testing conventions to find classes not included in any testing task.	2018-12-05 14:20:01 +02:00
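
The two retention lease commits at the top of this log (c0f8c89172 and c8c596cead) describe a lease as an identifier, a retaining sequence number, a timestamp, and a source, with expired leases cleaned from the in-memory map when the current leases are obtained. The following is a minimal sketch of that idea only, not the actual replication tracker code. The class name `RetentionLeaseSketch`, its method names, the injected clock, and the fixed `lifetimeMillis` are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of lease tracking with expiration on read. */
final class RetentionLeaseSketch {

    /** A lease as described in the commit message: id, retaining seq no, timestamp, source. */
    static final class Lease {
        final String id;
        final long retainingSeqNo;
        final long timestampMillis;
        final String source;

        Lease(String id, long retainingSeqNo, long timestampMillis, String source) {
            this.id = id;
            this.retainingSeqNo = retainingSeqNo;
            this.timestampMillis = timestampMillis;
            this.source = source;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final LongSupplier clock;     // assumption: injected so time can be controlled
    private final long lifetimeMillis;    // assumption: one fixed lifetime for every lease

    RetentionLeaseSketch(LongSupplier clock, long lifetimeMillis) {
        this.clock = clock;
        this.lifetimeMillis = lifetimeMillis;
    }

    /** Adds a lease, or renews an existing one by refreshing its timestamp. */
    void addOrRenew(String id, long retainingSeqNo, String source) {
        leases.put(id, new Lease(id, retainingSeqNo, clock.getAsLong(), source));
    }

    /** Returns the current leases, first removing expired ones from the in-memory map. */
    Map<String, Lease> currentLeases() {
        long now = clock.getAsLong();
        leases.values().removeIf(lease -> now - lease.timestampMillis > lifetimeMillis);
        return new HashMap<>(leases);
    }

    /** Minimum retaining sequence number over unexpired leases, or Long.MAX_VALUE if none. */
    long minRetainingSeqNo() {
        return currentLeases().values().stream()
            .mapToLong(lease -> lease.retainingSeqNo)
            .min()
            .orElse(Long.MAX_VALUE);
    }
}
```

In this sketch a merge policy would keep every operation above `minRetainingSeqNo()`; a lease that is not renewed within its lifetime stops holding history back the next time the leases are read.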

315 Commits