OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639 ) (#49711 ) * Make BlobStoreRepository Aware of ClusterState (#49639) This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Yannick Welsch	04e9cbd6eb	Revert "Remove obsolete resolving logic from TRA (#49647 )" This reverts commit `0827ea2175`.	2019-11-28 13:12:07 +01:00
Yannick Welsch	0827ea2175	Remove obsolete resolving logic from TRA (#49647 ) This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. Closes #20279	2019-11-28 12:11:27 +01:00
Tim Brooks	416178c7c8	Enable simple remote connection strategy (#49561 ) This commit back ports three commits related to enabling the simple connection strategy. Allow simple connection strategy to be configured (#49066) Currently the simple connection strategy only exists in the code. It cannot be configured. This commit moves in the direction of allowing it to be configured. It introduces settings for the addresses and socket count. Additionally it introduces new settings for the sniff strategy so that the more generic number of connections and seed node settings can be deprecated. The simple settings are not yet registered as the registration is dependent on follow-up work to validate the settings. Ensure at least 1 seed configured in remote test (#49389) This fixes #49384. Currently when we select a random subset of seed nodes from a list, it is possible for 0 seeds to be selected. This test depends on at least 1 seed being selected. Add the simple strategy to cluster settings (#49414) This is related to #49067. This commit adds the simple connection strategy settings and strategy mode setting to the cluster settings registry. With these changes, the simple connection mode can be used. Additionally, it adds validation to ensure that settings cannot be misconfigured.	2019-11-25 16:53:07 -07:00
Jason Tedor	71bcfbf1e3	Replace required pipeline with final pipeline (#49470 ) This commit enhances the required pipeline functionality by changing it so that default/request pipelines can also be executed, but the required pipeline is always executed last. This gives users the flexibility to execute their own indexing pipelines, but also ensure that any required pipelines are also executed. Since such pipelines are executed last, we change the name of required pipelines to final pipelines.	2019-11-22 14:37:36 -05:00
Nhat Nguyen	fec22130c2	Improve error message when pausing index (#48915 ) Throw an appropriate error message when the follower index is not found or is a regular index.	2019-11-20 15:58:44 -05:00
Armin Braun	0acba44a2e	Make Repository.getRepositoryData an Async API (#49299 ) (#49312 ) This API call in most implementations is fairly IO heavy and slow so it is more natural to be async in the first place. Concretely though, this change is a prerequisite of #49060 since determining the repository generation from the cluster state introduces situations where this call would have to wait for other operations to finish. Doing so in a blocking manner would break `SnapshotResiliencyTests` and waste a thread. Also, this sets up the possibility to in the future make use of async IO where provided by the underlying Repository implementation. In a follow-up `SnapshotsService#getRepositoryData` will be made async as well (did not do it here, since it's another huge change to do so). Note: This change for now does not alter the threading behaviour in any way (since `Repository#getRepositoryData` isn't forking) and is purely mechanical.	2019-11-19 16:49:12 +01:00
Tanguy Leroux	fcac3fbfd9	AutoFollowIT should not rely on assertBusy but should use latches instead (#49141 ) AutoFollowIT relies on assertBusy() calls to wait for a given number of leader indices to be created but this is prone to failures on CI. Instead, we should use latches to indicate when auto-follow patterns must be paused and resumed.	2019-11-18 09:40:56 +01:00
Jason Tedor	60d1d67aac	CCR should auto-retry rejected execution exceptions (#49213 ) If CCR encounters a rejected execution exception, today we treat this as fatal. This is not though, as the stuffed queue could drain. Requiring an administrator to manually restart the follow tasks that faced such an exception is a burden. This commit addresses this by making CCR auto-retry on rejected execution exceptions.	2019-11-17 12:48:46 -05:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Henning Andersen	66f0c8900f	Fix Transport Stopped Exception (#48930 ) (#49035 ) When a node shuts down, `TransportService` moves to stopped state and then closes connections. If a request is done in between, an exception was thrown that was not retried in replication actions. Now throw a wrapped `NodeClosedException` exception instead, which is correctly handled in replication action. Fixed other usages too. Relates #42612	2019-11-13 18:48:05 +01:00
Tanguy Leroux	e86b598813	Fix AutoFollowIT (#49025 ) This commit fixes an off-by-one bug in the AutoFollowIT test that causes failures because the leaderIndices counter is incremented during the evaluation of the leaderIndices.incrementAndGet() < 20 condition but the 20th index is not created, making the final assertion not verified. It also gives a bit more time for cluster state updates to be processed on the follower cluster. Closes #48982	2019-11-13 13:20:57 +01:00
Yannick Welsch	2dfa0133d5	Always use primary term from primary to index docs on replica (#47583 ) Ensures that we always use the primary term established by the primary to index docs on the replica. Makes the logic around replication less brittle by always using the operation primary term on the replica that is coming from the primary.	2019-11-13 12:13:45 +01:00
Jake Landis	c320b499a0	Prevent deadlock by using separate schedulers (#48697 ) (#48964 ) Currently the BulkProcessor class uses a single scheduler to schedule flushes and retries. Functionally these are very different concerns but can result in a deadlock. Specifically, the single shared scheduler can kick off a flush task, which only finishes its task when the bulk that is being flushed finishes. If (for whatever reason) any items in that bulk fail, it will (by default) schedule a retry. However, that retry will never run its task, since the flush task is consuming the one and only thread available from the shared scheduler. Since the BulkProcessor is mostly client based code, the client can provide their own scheduler. As-is the scheduler would require at minimum 2 worker threads to avoid the potential deadlock. Since the number of threads is a configuration option in the scheduler, the code cannot enforce this 2 worker rule until runtime. For this reason this commit splits the single task scheduler into 2 schedulers. This eliminates the potential for the flush task to block the retry task and removes this deadlock scenario. This commit also deprecates the Java APIs that presume a single scheduler, and updates any internal code to no longer use those APIs. Fixes #47599 Note - #41451 fixed the general case where a bulk fails and is retried that can result in a deadlock. This fix should address that case as well as the case when a bulk failure from the flush needs to be retried.	2019-11-11 16:31:21 -06:00
Yannick Welsch	af887be3e5	Hide orphaned tasks from follower stats (#48901 ) CCR follower stats can return information for persistent tasks that are in the process of being cleaned up. This is problematic for tests where CCR follower indices have been deleted, but their persistent follower task is only cleaned up asynchronously afterwards. If one of the following tests then accesses the follower stats, it might still get the stats for that follower task. In addition, some tests were not cleaning up their auto-follow patterns, leaving orphaned patterns behind. Other tests cleaned up their auto-follow patterns. As always the same name was used, it just depended on the test execution order whether this led to a failure or not. This commit fixes the offensive tests, and will also automatically remove auto-follow-patterns at the end of tests, like we do for many other features. Closes #48700	2019-11-08 13:56:53 +01:00
Nhat Nguyen	020ff0fef9	Do not intercept renew requests from other tests (#48833 ) We might have some outstanding renew retention lease requests after a shard has unfollowed. If testRetentionLeaseIsAddedIfItDisappearsWhileFollowing intercepts a renew request from other tests then we will never unlatch and the test will time out. Closes #45192	2019-11-02 21:15:05 -04:00
Armin Braun	a22f6fbe3c	Cleanup Redundant Futures in Recovery Code (#48805 ) (#48832 ) Follow up to #48110 cleaning up the redundant future uses that were left over from that change.	2019-11-02 17:28:12 +01:00
Nhat Nguyen	4c70770877	Add debug log for CcrRetentionLeaseIT (#48820 ) testRetentionLeaseIsAddedIfItDisappearsWhileFollowing is still failing although we already have several fixes. I think other tests interfere and cause this test to fail. We can use the test scope to isolate them. However, I prefer to add debug logs so we can find the source. Relates #45192	2019-11-01 22:07:35 -04:00
Armin Braun	e26d01e71f	Make CcrRepository#restore non-Blocking (#48814 ) (#48823 ) With the changes in #48110 there is no more need to block a generic thread when waiting for the multi file transfer in `CcrRepository`.	2019-11-01 21:02:47 +01:00
Armin Braun	52e5ceb321	Restore from Individual Shard Snapshot Files in Parallel (#48110 ) (#48686 ) Make restoring shard snapshots run in parallel on the `SNAPSHOT` thread-pool.	2019-10-30 14:36:30 +01:00
Tim Brooks	f5f1072824	Multiple remote connection strategy support (#48496 ) * Extract remote "sniffing" to connection strategy (#47253) Currently the connection strategy used by the remote cluster service is implemented as a multi-step sniffing process in the RemoteClusterConnection. We intend to introduce a new connection strategy that will operate in a different manner. This commit extracts the sniffing logic to a dedicated strategy class. Additionally, it implements dedicated tests for this class. Additionally, in previous commits we moved away from a world where the remote cluster connection was mutable. Instead, when setting updates are made, the connection is torn down and rebuilt. We still had methods and tests hanging around for the mutable behavior. This commit removes those. * Introduce simple remote connection strategy (#47480) This commit introduces a simple remote connection strategy which will open remote connections to a configurable list of user supplied addresses. These addresses can be remote Elasticsearch nodes or intermediate proxies. We will perform normal clustername and version validation, but otherwise rely on the remote cluster to route requests to the appropriate remote node. * Make remote setting updates support diff strategies (#47891) Currently the entire remote cluster settings infrastructure is designed around the sniff strategy. As we introduce an additional connection strategy this infrastructure needs to be modified to support it. This commit modifies the code so that the strategy implementations will tell the service if the connection needs to be torn down and rebuilt. As part of this commit, we will wait 10 seconds for new clusters to connect when they are added through the "update" settings infrastructure.	2019-10-25 09:29:41 -06:00
Tim Brooks	c0b545f325	Make BytesReference an interface (#48486 ) BytesReference is currently an abstract class which is extended by various implementations. This makes it very difficult to use the delegation pattern. The implication of this is that our releasable BytesReference is a PagedBytesReference type and cannot be used as a generic releasable bytes reference that delegates to any reference type. This commit makes BytesReference an interface and introduces an AbstractBytesReference for common functionality.	2019-10-24 15:39:30 -06:00
Armin Braun	7215201406	Track Shard-Snapshot Index Generation at Repository Root (#48371 ) This change adds a new field `"shards"` to `RepositoryData` that contains a mapping of `IndexId` to a `String[]`. This string array can be accessed by shard id to get the generation of a shard's shard folder (i.e. the `N` in the name of the currently valid `/indices/${indexId}/${shardId}/index-${N}` for the shard in question). This allows for creating a new snapshot in the shard without doing any LIST operations on the shard's folder. In the case of AWS S3, this saves about 1/3 of the cost for updating an empty shard (see #45736) and removes one out of two remaining potential issues with eventually consistent blob stores (see #38941 ... now only the root `index-${N}` is determined by listing). Also and equally if not more important, a number of possible failure modes on eventually consistent blob stores like AWS S3 are eliminated by moving all delete operations to the `master` node and moving from incremental naming of shard level index-N to uuid suffixes for these blobs. This change moves the deleting of the previous shard level `index-${uuid}` blob to the master node instead of the data node allowing for a safe and consistent update of the shard's generation in the `RepositoryData` by first updating `RepositoryData` and then deleting the now unreferenced `index-${newUUID}` blob. __No deletes are executed on the data nodes at all for any operation with this change.__ Note also: Previous issues with hanging data nodes interfering with master nodes are completely impossible, even on S3 (see next section for details). This change changes the naming of the shard level `index-${N}` blobs to a uuid suffix `index-${UUID}`. The reason for this is the fact that writing a new shard-level `index-` generation blob is not atomic anymore in its effect. Not only does the blob have to be written to have an effect, it must also be referenced by the root level `index-N` (`RepositoryData`) to become an effective part of the snapshot repository. This leads to a problem if we were to use incrementing names like we did before. If a blob `index-${N+1}` is written but due to the node/network/cluster/... crashes the root level `RepositoryData` has not been updated then a future operation will determine the shard's generation to be `N` and try to write a new `index-${N+1}` to the already existing path. Updates like that are problematic on S3 for consistency reasons, but also create numerous issues when thinking about stuck data nodes. Previously stuck data nodes that were tasked to write `index-${N+1}` but got stuck and tried to do so after some other node had already written `index-${N+1}` were prevented from doing so (except for on S3) by us not allowing overwrites for that blob and thus no corruption could occur. Were we to continue using incrementing names, we could not do this. The stuck node scenario would either allow for overwriting the `N+1` generation or force us to continue using a `LIST` operation to figure out the next `N` (which would make this change pointless). With uuid naming and moving all deletes to `master` this becomes a non-issue. Data nodes write updated shard generation `index-${uuid}` and `master` makes those `index-${uuid}` part of the `RepositoryData` that it deems correct and cleans up all those `index-` that are unused. Co-authored-by: Yannick Welsch <yannick@welsch.lu> Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2019-10-23 10:58:26 +01:00
Nhat Nguyen	d0a4bad95b	Use MultiFileTransfer in CCR remote recovery (#44514 ) Relates #44468	2019-10-21 23:30:52 -04:00
Armin Braun	e65c60915a	Cleanup FileRestoreContext Abstractions (#48173 ) (#48300 ) This class is only used by the blob store repository and CCR and the abstractions didn't really make sense with CCR ignoring the concrete `restoreFiles` method completely and having a method used only by the blobstore overridden as unsupported. => Moved to a more fitting set of abstractions => Dried up the stream wrapping in `BlobStoreRepository` a little now that the `restoreFile` method could be simplified Relates #48110 as it makes changing the API of `FileRestoreContext` to what is needed for async restores simpler	2019-10-21 17:30:35 +02:00
Armin Braun	dc08feadc6	Remove Redundant Version Param from Repository APIs (#48231 ) (#48298 ) This parameter isn't used by any implementation	2019-10-21 16:20:45 +02:00
Tanguy Leroux	0094bd5939	Fix AutoFollowIT.testPauseAndResumeWithMultipleAutoFollowPatterns (#48289 ) The test testPauseAndResumeWithMultipleAutoFollowPatterns failed multiple times, mostly because it creates too many leader indices and the following cluster cannot cope with cluster state updates generated by following indices creation and pause/resume auto-followers changes. This commit simplifies the test by creating at most 20 leader indices and by waiting for any new leader index to be picked up by the auto-follower before creating another leader index. It also pauses and resumes fewer auto-followers than previously. closes #47917	2019-10-21 14:31:58 +02:00
Armin Braun	1157775074	Remove Support for pre-5.x Indices in Restore (#48181 ) (#48199 ) The logic for handling empty segment files has been unnecessary ever since #24021 which removes the support for these files in 6.x -> we can safely remove the support for restoring these from 7.x+ to simplify the code.	2019-10-18 09:45:07 +02:00
Tanguy Leroux	742fa818b8	Add Pause/Resume Auto Follower APIs (#47510 ) (#47904 ) This commit adds two APIs that allow to pause and resume CCR auto-follower patterns: // pause auto-follower POST /_ccr/auto_follow/my_pattern/pause // resume auto-follower POST /_ccr/auto_follow/my_pattern/resume The ability to pause and resume auto-follow patterns can be useful in some situations, including the rolling upgrades of clusters using a bi-directional cross-cluster replication scheme (see #46665). This commit adds a new active flag to the AutoFollowPattern and adapts the AutoCoordinator and AutoFollower classes so that it stops fetching the remote's cluster state when all auto-follow patterns associated with the remote cluster are paused. When an auto-follower is paused, remote indices that match the pattern are just ignored: they are not added to the pattern's followed indices uids list that is maintained in the local cluster state. This way, when the auto-follow pattern is resumed the indices created in the remote cluster in the meantime will be picked up again and added as new following indices. Indices created and then deleted in the remote cluster will be ignored as they won't be seen at all by the auto-follower pattern at resume time. Backport of #47510 for 7.x	2019-10-13 09:22:51 +02:00
Tanguy Leroux	8f86469d3f	Do not auto-follow closed indices (#47721 ) (#47800 ) Backport of (#47721) for 7.x. Similarly to #47582, Auto-follow patterns creates following indices as long as the remote index matches the pattern and the remote primary shards are all started. But since 7.2 closed indices are also replicated, and it does not play well with CCR auto-follow patterns as they create following indices for closed leader indices too. This commit changes the getLeaderIndicesToFollow() so that closed indices are excluded from auto-follow patterns.	2019-10-09 19:16:23 +02:00
Tanguy Leroux	b5ac0204d2	Fail earlier Put Follow requests for closed leader indices (#47637 ) Backport of (#47582) Today when following a new leader index, we fetch the remote cluster state, check the remote cluster license, check the user privileges, retrieve the index shard stats before initiating a CCR restore session. But if the leader index to follow is closed, we're executing a bunch of operations that would inevitably fail at some point (on retrieving the index shard stats, because this type of request forbids closed indices when resolving indices). We could fail a Put Follow request at the first step by checking the leader index state directly from the remote cluster state. This also helps the Resume Follow API to fail a bit earlier.	2019-10-07 13:59:04 +02:00
Armin Braun	3d23cb44a3	Speed up Snapshot Finalization (#47283 ) (#47309 ) As a result of #45689 snapshot finalization started to take significantly longer than before. This may be a little unfortunate since it increases the likelihood of failing to finalize after having written out all the segment blobs. This change parallelizes all the metadata writes that can safely run in parallel in the finalization step to speed the finalization step up again. Also, this will generally speed up the snapshot process overall in case of large number of indices. This is also a nice to have for #46250 since we add yet another step (deleting of old index- blobs in the shards) to the finalization.	2019-09-30 23:28:59 +02:00
Yannick Welsch	9dc90e41fc	Remove "force" version type (#47228 ) It's been deprecated long ago and can be removed. Relates to #20377 Closes #19769	2019-09-30 11:58:34 +02:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expect templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Nhat Nguyen	444b47ce88	Relax maxSeqNoOfUpdates assertion in FollowingEngine (#47188 ) We disable MSU optimization if the local checkpoint is smaller than max_seq_no_of_updates. Hence, we need to relax the MSU assertion in FollowingEngine for that scenario. Suppose the leader has three operations: index-0, delete-1, and index-2 for the same doc Id. MSU on the leader is 1 as index-2 is an append. If the follower applies index-0 then index-2, then the assertion is violated. Closes #47137	2019-09-27 14:00:20 -04:00
Jason Tedor	bd77626177	Add the ability to require an ingest pipeline (#46847 ) This commit adds the ability to require an ingest pipeline on an index. Today we can have a default pipeline, but that could be overridden by a request pipeline parameter. This commit introduces a new index setting index.required_pipeline that acts similarly to index.default_pipeline, except that it can not be overridden by a request pipeline parameter. Additionally, a default pipeline and a request pipeline can not both be set. The required pipeline can be set to _none to ensure that no pipeline ever runs for index requests on that index.	2019-09-19 16:37:45 -04:00
Armin Braun	b0f09b279f	Make Snapshot Logic Write Metadata after Segments (#45689 ) (#46764 ) * Write metadata during snapshot finalization after segment files to prevent outdated metadata in case of dynamic mapping updates as explained in #41581 * Keep the old behavior of writing the metadata beforehand in the case of mixed version clusters for BwC reasons * Still overwrite the metadata in the end, so even a mixed version cluster is fixed by this change if a newer version master does the finalization * Fixes #41581	2019-09-17 13:09:39 +02:00
Nhat Nguyen	cabff5a7cd	Handle lower retaining seqno retention lease error (#46420 ) We renew the CCR retention lease at a fixed interval, therefore it's possible to have more than one in-flight renewal requests at the same time. If requests arrive out of order, then the assertion is violated. Closes #46416 Closes #46013	2019-09-13 08:50:19 -04:00
Armin Braun	41633cb9b5	More Efficient Ordering of Shard Upload Execution (#42791 ) (#46588 ) * More Efficient Ordering of Shard Upload Execution (#42791) * Change the upload order of snapshots to work file by file in parallel on the snapshot pool instead of merely shard-by-shard * Inspired by #39657 * Cleanup BlobStoreRepository Abort and Failure Handling (#46208)	2019-09-11 13:59:20 +02:00
Simon Willnauer	9b2ea07b17	Flush engine after big merge (#46066 ) (#46111 ) Today we might carry on a big merge uncommitted and therefore occupy a significant amount of diskspace for quite a long time if for instance indexing load goes down and we are not quickly reaching the translog size threshold. This change will cause a flush if we hit a significant merge (512MB by default) which frees diskspace sooner.	2019-08-29 17:54:15 +02:00
Nhat Nguyen	028e792e1d	Remove already exist assertion while renew ccr lease (#46009 ) If a CCR lease disappears while we are renewing it, then we will issue asyncAddRetentionLease to add that lease. And if asyncAddRetentionLease takes longer than retentionLeaseRenewInterval, then we can issue another asyncAddRetentionLease request. One of the asyncAddRetentionLease requests will fail with RetentionLeaseAlreadyExistsException, hence trip the assertion. Closes #45192	2019-08-29 09:44:40 -04:00
Nhat Nguyen	99b21d50b8	Include leases in ccr errmsg when ops no longer available (#45681 ) The setting index.soft_deletes.retention.operations is no longer needed nor recommended in CCR. We, therefore, should hint users about the retention leases period setting instead when operations are no longer available for replicating.	2019-08-20 10:40:12 -04:00
Armin Braun	a9e1402189	Remove Settings from BaseRestRequest Constructor (#45418 ) (#45429 ) * Resolving the todo, cleaning up the unused `settings` parameter * Cleaning up some other minor dead code in affected classes	2019-08-12 05:14:45 +02:00
Alpar Torok	634a070430	Restrict which tasks can use testclusters (#45198 ) * Restrict which tasks can use testclusters This PR fixes a problem between the interaction of test-clusters and build cache. Before this any task could have used a cluster without tracking it as input. With this change a new interface is introduced to track the tasks that can use clusters and we do consider the cluster as input for all of them.	2019-08-09 13:38:01 +03:00
David Turner	9ff320d967	Use index for peer recovery instead of translog (#45137 ) Today we recover a replica by copying operations from the primary's translog. However we also retain some historical operations in the index itself, as long as soft-deletes are enabled. This commit adjusts peer recovery to use the operations in the index for recovery rather than those in the translog, and ensures that the replication group retains enough history for use in peer recovery by means of retention leases. Reverts #38904 and #42211 Relates #41536 Backport of #45136 to 7.x.	2019-08-02 15:00:43 +01:00
Yannick Welsch	917510d3e4	Always use primary term of operation in InternalEngine (#45083 ) We keep adding the current primary term to operations for which we do not assign a sequence number. This does not make sense anymore as all operations which we care about have sequence numbers now. The goal of this commit is to clean things up in InternalEngine and reduce the complexity.	2019-08-01 17:30:00 +02:00
Yannick Welsch	e0d4544ef6	Close connection manager on current thread in RemoteClusterConnection (#44805 ) The problem is that RemoteClusterConnection closes the connection manager asynchronously, which races with the threadpool being shutdown at the end of the test. Closes #44339 Closes #44610	2019-07-25 09:34:41 +02:00
Tanguy Leroux	9944e193f9	[7.x] Clean up ShardFollowTasks for deleted indices (#44702 ) (#44790 ) Deleting a follower index does not delete its ShardFollowTasks, potentially leaving many persistent tasks in the cluster that cannot be allocated on nodes and unnecessarily fill the logs. This commit adds a cluster state listener (ShardFollowTaskCleaner) that completes (with a failure) any persistent task that refers to a non existent follower index. I think that this bug has been introduced by #34404: before this change the task would have been completed as failed and removed from the cluster state. Backport of #44702 and #44801 on 7.x	2019-07-25 09:33:57 +02:00
Alpar Torok	b34ac66d96	Mute multiple tests on Windows (7.x) (#44676 ) * Mute failing test tracked in #44552 * mute EvilSecurityTests tracking in #44558 * Fix line endings in ESJsonLayoutTests * Mute failing ForecastIT test on windows Tracking in #44609 * mute BasicRenormalizationIT.testDefaultRenormalization tracked in #44613 * fix mute testDefaultRenormalization * Increase busyWait timeout windows is slow * Mute failure unconfigured node name * mute x-pack internal cluster test windows tracking #44610 * Mute JvmErgonomicsTests on windows Tracking #44669 * mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot Tracking #44671 * Mute NodeTests on Windows Tracking #44256	2019-07-22 11:32:29 +03:00
Ryan Ernst	f193d14764	Convert remaining Action Response/Request to writeable.reader (#44528 ) (#44607 ) This commit converts readFrom to ctor with StreamInput on the remaining ActionResponse and ActionRequest classes. relates #34389	2019-07-19 13:33:38 -07:00
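
The deadlock described in commit c320b499a0 (a flush and its retry sharing one single-threaded scheduler) can be reproduced in miniature. The sketch below is a Python analogy and not the BulkProcessor code: `run_flush`, `demo`, and the 0.5 second wait are hypothetical names and values chosen for illustration, and the flush gives up after the wait so the example terminates instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def run_flush(flush_pool, retry_pool, wait_seconds=0.5):
    """Run one flush whose bulk needs a retry; return True if the retry ran in time."""

    def retry():
        return "retried"

    def flush():
        # The flush schedules a retry and only finishes once that retry has run.
        future = retry_pool.submit(retry)
        try:
            future.result(timeout=wait_seconds)
            return True
        except TimeoutError:
            # The retry never started: the only worker is busy running this flush.
            return False

    return flush_pool.submit(flush).result()


def demo():
    # One shared single-threaded scheduler: the flush starves its own retry.
    with ThreadPoolExecutor(max_workers=1) as shared:
        shared_ok = run_flush(shared, shared)
    # Separate schedulers for flushes and retries: the retry runs immediately.
    with ThreadPoolExecutor(max_workers=1) as flush_pool, \
            ThreadPoolExecutor(max_workers=1) as retry_pool:
        split_ok = run_flush(flush_pool, retry_pool)
    return shared_ok, split_ok


if __name__ == "__main__":
    print(demo())
```

With the shared pool the flush occupies the only worker while waiting, so the queued retry cannot start until the flush gives up; with two pools the retry has its own worker, which mirrors the fix the commit message describes.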

574 Commits