OpenSearch

Commit Graph

Author	SHA1	Message	Date
Tal Levy	5640197632	Refactor TransportSingleShardAction to serialize Writeable responses (#41985 ) (#42040 ) Previously, TransportSingleShardAction required constructing a new empty response object. This response object's Streamable readFrom was used. As part of the migration to Writeable, the interface here was updated to leverage Writeable.Reader. relates to #34389.	2019-05-09 22:08:31 -07:00
Jason Tedor	8bea3c3a58	Enable trace logging in CCR retention lease tests These tests are failing somewhat mysteriously, indicating that when we renew retention leases during a restore, the retention leases that we added before starting the restore suddenly do not exist. To make sense of this, this commit enables trace logging.	2019-05-07 22:44:55 -04:00
Ryan Ernst	6fd8924c5a	Switch run task to use real distro (#41590 ) The run task is supposed to run elasticsearch with the given plugin or module. However, for modules, this is most realistic if using the full distribution. This commit changes the run setup to use the default or oss as appropriate.	2019-05-06 12:34:07 -07:00
Hicham Mallah	4a88da70c5	Add index name to cluster block exception (#41489 ) Updates the error message to reveal the index name that is causing it. Closes #40870	2019-05-04 19:11:59 -04:00
Nhat Nguyen	c7924014fa	Verify consistency of version and source in disruption tests (#41614 ) (#41661 ) With this change, we will verify the consistency of version and source (besides id, seq_no, and term) of live documents between shard copies at the end of disruption tests.	2019-05-03 18:47:14 -04:00
Nhat Nguyen	887f3f2c83	Simplify initialization of max_seq_no of updates (#41161 ) Today we choose to initialize max_seq_no_of_updates on primaries only so we can deal with a situation where a primary is on an old node (before 6.5) which does not have MSU while replicas are on new nodes (6.5+). However, this strategy is quite complex and can lead to bugs (for example #40249) since we have to assign a correct value (not too low) to MSU in all possible situations (before recovering from translog, restoring history on promotion, and handing off relocation). Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes in the cluster should have MSU. This change simplifies the initialization of MSU by always assigning it a correct value in the constructor of Engine regardless of whether it's a replica or primary. Relates #33842	2019-04-30 15:14:52 -04:00
David Kyle	f737b05ad1	Mute CcrRetentionLeaseIT.testForgetFollower https://github.com/elastic/elasticsearch/issues/39850	2019-04-30 09:55:16 +01:00
Armin Braun	aad33121d8	Async Snapshot Repository Deletes (#40144 ) (#41571 ) Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete. * Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread pool * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable. * See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106 * Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore). * By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657	2019-04-26 15:36:09 +02:00
Christoph Büscher	52495843cc	[Docs] Fix common word repetitions (#39703 )	2019-04-25 20:47:47 +02:00
Armin Braun	40aef2b8aa	Introduce Delegating ActionListener Wrappers (#40129 ) (#41527 ) * Introduce Delegating ActionListener Wrappers * Dry up use cases of ActionListener that simply pass through the response or exception to another listener	2019-04-25 16:05:04 +02:00
Jason Tedor	21bf2fe3c4	Reduce security permissions in CCR plugin (#41391 ) It looks like these permissions were copy/pasted from another plugin yet almost none of these permissions are needed for the CCR plugin. This commit removes all these unneeded permissions from the CCR plugin.	2019-04-20 08:21:59 -04:00
Adrien Grand	86e56590a7	Revert "Disable CcrRetentionLeaseIT#testRetentionLeasesAreNotBeingRenewedAfterRecoveryCompletes." This reverts commit `343039e200`.	2019-04-18 11:31:00 +02:00
Adrien Grand	343039e200	Disable CcrRetentionLeaseIT#testRetentionLeasesAreNotBeingRenewedAfterRecoveryCompletes. Relates #39331.	2019-04-18 11:29:11 +02:00
Armin Braun	233df6b73b	Make Transport Shard Bulk Action Async (#39793 ) (#41112 ) This is a dependency of #39504 Motivation: By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update. With this change, the mapping update will trigger a new task in the `write` queue instead. This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines. The logical change is effectively all in `TransportShardBulkAction`, the rest of the changes are then simply mechanically moving the caller code and tests to being async and passing the `ActionListener` down. Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and dry up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.	2019-04-11 16:01:52 +02:00
Jason Tedor	bb6f060f74	Add log message to forget follower test This commit adds a log message to help debug failures in a forget follower test.	2019-04-09 23:33:29 -04:00
Julie Tibshirani	21c5d7e95f	Mute CcrRetentionLeaseIT#testRetentionLeasesAreNotBeingRenewedAfterRecoveryCompletes. Tracked in #39331.	2019-04-09 16:08:44 -07:00
Julie Tibshirani	cbae617898	Mute IndexFollowingIT#testFollowIndex as we await a fix. Tracked in #41037.	2019-04-09 14:56:37 -07:00
Mark Vieira	1287c7d91f	[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978 ) (#40993 ) * Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions. (cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6) * Fix forking JVM runner * Don't bump shadow plugin version	2019-04-09 11:52:50 -07:00
David Turner	2ff19bc1b7	Use Writeable for TransportReplAction derivatives (#40905 ) Relates #34389, backport of #40894.	2019-04-05 19:10:10 +01:00
Martijn van Groningen	809a5f13a4	Make -try xlint warning disabled by default. (#40833 ) Many gradle projects specifically use the -try exclude flag, because there are many cases where auto-closeable resource ignore is never referenced in body of corresponding try statement. Suppressing this warning specifically in each case that it happens using `@SuppressWarnings("try")` would be very verbose. This change removes `-try` from any gradle project and adds it to the build plugin. Also this change removes exclude flags from gradle projects that are already specified in build plugin (for example -deprecation). Relates to #40366	2019-04-05 08:02:26 +02:00
David Turner	1d2bc85586	Inline TransportReplAction#registerRequestHandlers (#40762 ) It is important that resync actions are not rejected on the primary even if its `write` threadpool is overloaded. Today we do this by exposing `registerRequestHandlers` to subclasses and overriding it in `TransportResyncReplicationAction`. This isn't ideal because it obscures the difference between this action and other replication actions, and also might allow subclasses to try and use some state before they are properly initialised. This change replaces this override with a constructor parameter to solve these issues. Relates #40706	2019-04-03 12:12:26 +01:00
Christoph Büscher	a13be65b01	Fixing typo in test error message (#40611 )	2019-03-28 22:12:24 +01:00
Tim Brooks	760cfffe4b	Move TransportMessageListener to TransportService (#40474 ) Currently the TransportMessageListener is applied and used in the Transport class. However, local requests and responses never make it to this class. This PR moves the listener add/remove methods to the TransportService. After this change the Transport can only have one listener set with it. This one listener is the TransportService, which will then propagate the events to the external listeners. Additionally this commit backports #40237 (Remove Tracer from MockTransportService).	2019-03-27 09:24:20 -06:00
alex101101	fb8ad0cf30	Add a soft limit to the field name length (#40309 ) Adds an optional limit to the length of field names, throws an IllegalArgumentException if the limit is breached. Closes #33651	2019-03-26 17:58:32 +01:00
Jason Tedor	10bbb082a4	Only run retention lease actions on active primary (#40386 ) In some cases, a request to perform a retention lease action can arrive on a primary shard before it is active. In this case, the primary shard would not yet be in primary mode, tripping an assertion in the replication tracker. Instead, we should not attempt to perform such actions on an initializing shard. This commit addresses this by not returning the primary shard in the single shard iterator if the primary shard is not yet active.	2019-03-23 09:39:39 -04:00
Nhat Nguyen	0e12065b54	Relax max_seq_no_of_updates assertion in follow tests If there's a failover on the follower, then its max_seq_no_of_updates is bootstrapped from its max_seq_no which might be higher than the max_seq_no_of_updates of the leader. We need to relax this check. Relates #40249	2019-03-21 19:41:55 -04:00
Jason Tedor	1e6941b138	Reduce retention lease sync intervals (#40302 ) This commit adjusts the frequency with which CCR renews retention leases and with which primaries sync retention leases to replicas. This helps Lucene reclaim soft-deleted documents more aggressively, which we have found in some use-cases can help improve performance, and either way will help keep disk space under more control.	2019-03-21 07:37:44 -04:00
Like	6f64267626	Make setting index.translog.sync_interval be dynamic (#37382 ) Currently, the index setting index.translog.sync_interval cannot be updated while the index is open, because it is not dynamic and can only be updated on a closed index. Closes #32763	2019-03-20 17:12:45 +01:00
Henning Andersen	4c2a8638ca	Cascading primary failure lead to MSU too low (#40249 ) If a replica were first reset due to one primary failover and then promoted (before resync completes), its MSU would not include changes since global checkpoint, leading to errors during translog replay. Fixed by re-initializing MSU before restoring local history.	2019-03-20 14:00:43 +01:00
Jason Tedor	f88e4181ca	Enable reading auto-follow patterns from x-content (#40130 ) This named writable was never registered, so it means that we could not read auto-follow patterns that were registered in the cluster state. This causes them to be lost on restarts, a bad bug. This commit addresses this by registering this named writable, and we add a basic CCR restart test to ensure that CCR keeps functioning properly when the follower is restarted.	2019-03-18 21:48:44 -04:00
Nhat Nguyen	38e9522218	Remove wait for cluster state step in peer recovery (#40004 ) We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped using it since #25692 (6.0). This change removes that action and related code in 7.x and 8.0. Relates #19287 Relates #25692	2019-03-18 15:17:21 -04:00
Jason Tedor	5be12e0999	Safe publication of AutoFollowCoordinator (#40153 ) We were leaking a reference to an AutoFollowCoordinator during construction, violating safe publication according to the JLS specification. This commit addresses this by waiting to register AutoFollowCoordinator with the ClusterApplierService after the AutoFollowCoordinator is fully constructed. We also remove ourselves as a listener when stopping.	2019-03-18 10:13:41 -04:00
Jason Tedor	b8ad337234	Stop auto-followers on shutdown (#40124 ) When shutting down a node, auto-followers will keep trying to run. This is happening even as transport services and other components are being closed. In some cases, this can lead to a stack overflow as we rapidly try to check the license state of the remote cluster, cannot because the transport service is shutdown, and then immediately retry again. This can happen faster than the shutdown, and we die with stack overflow. This commit adds a stop command to auto-followers so that this retry loop occurs at most once on shutdown.	2019-03-18 07:25:31 -04:00
Jason Tedor	0824eceacf	Add log message for auto-follower timeout When an auto-follower coordinator times out waiting for the remote cluster state, we do not log any indication of this. While this is expected behavior in quiet deployments, it is still useful to see this information for tracing the behavior of the auto-follow coordinator. This commit adds a trace log message indicating the timeout.	2019-03-16 10:46:20 -04:00
Jason Tedor	86d1d03c37	Remove cluster state size (#40109 ) This commit removes the cluster state size field from the cluster state response, and drops the backwards compatibility layer added in 6.7.0 to continue to support this field. As calculation of this field was expensive and had dubious value, we have elected to remove this field.	2019-03-15 17:16:25 -04:00
David Kyle	4eb3683d65	Mute CcrRetentionLeaseIT tests (#40090 )	2019-03-15 15:05:47 +00:00
Jake Landis	b0b0f66669	Remove types from internal monitoring templates and bump to api 7 (#39888 ) (#39926 ) This commit removes the "doc" type from monitoring internal indexes. The template still carries the "_doc" type since that is needed for the internal representation. This change impacts the following templates: monitoring-alerts.json monitoring-beats.json monitoring-es.json monitoring-kibana.json monitoring-logstash.json As part of the required changes, the system_api_version has been bumped from "6" to "7" and support for version "2" has been dropped. A new empty pipeline is now introduced for the version "7", and the formerly empty "6" pipeline will now remove the type and re-direct the request to the "7" index. Additionally, due to a difference in the internal representation (which requires the inclusion of "_doc" type) and external representation (which requires the exclusion of any type) a helper method is introduced to help convert internal to external representation, and used by the monitoring HTTP template exporter. Relates #38637	2019-03-11 13:17:27 -05:00
Martijn van Groningen	8925a2c6c2	Further tweak AutoFollowIT#testAutoFollowManyIndices: * reduce the number of leader indices to be auto followed * also check the number of follower indices being created * also check whether leader indices are marked as auto followed Relates to #36761	2019-03-11 10:01:56 +01:00
Daniel Mitterdorfer	1bc31aca03	Mute CcrRetentionLeaseIT#testRetentionLeaseRenewalIsCancelledWhenFollowingIsPaused (#39897 ) Relates #39509	2019-03-11 08:47:51 +01:00
Jason Tedor	6675bafc49	Simplify CcrRetentionLeaseIT#testForgetFollower This test was more complicated than necessary, where we were capturing requests to prevent removal of retention leases, so that our forget follower request could remove the retention leases instead. Instead, a pause is enough to ensure that the retention leases are not re-added after we remove them by the forget follower request. This commit simplifies this test, and should remove some spurious failures. Relates #39850	2019-03-08 12:33:17 -05:00
Martijn van Groningen	8666aa1ed2	unmuted and tweaked test Relates to #36761	2019-03-08 12:43:23 +01:00
Jason Tedor	0250d554b6	Introduce forget follower API (#39718 ) This commit introduces the forget follower API. This API is needed in cases that unfollowing a following index fails to remove the shard history retention leases on the leader index. This can happen explicitly through user action, or implicitly through an index managed by ILM. When this occurs, history will be retained longer than necessary. While the retention lease will eventually expire, it can be expensive to allow history to persist for that long, and also prevent ILM from performing actions like shrink on the leader index. As such, we introduce an API to allow for manual removal of the shard history retention leases in this case.	2019-03-07 11:08:45 -05:00
Nhat Nguyen	83688ce2d4	Unmute testFollowIndexAndCloseNode Resolved in #39584	2019-03-06 22:39:13 -05:00
David Turner	77dd711847	Tidy up GroupedActionListener (#39633 ) Today the `GroupedActionListener` accepts a `defaults` parameter but all callers pass an empty list. Also it is permitted to pass an empty group but this is trappy because the delegated listener is never called in that case. This commit removes the `defaults` parameter and forbids an empty group.	2019-03-06 09:25:10 +00:00
Jason Tedor	75a0d4f470	Rename retention lease setting (#39719 ) This commit renames the retention lease setting index.soft_deletes.retention.lease so that it is under the namespace index.soft_deletes.retention_lease. As such, we rename the setting to index.soft_deletes.retention_lease.period.	2019-03-05 22:04:45 -05:00
Nhat Nguyen	af4918ebff	Simplify AutoFollowCoordinator with GroupedListener (#39603 ) This change simplifies AutoFollowCoordinator by replacing a combination of AtomicArray and CountDown with GroupedActionListener.	2019-03-04 13:50:27 -05:00
Yannick Welsch	0f65390c29	Do not mutate engine during planning step (#39571 ) This cleans up the Engine implementation by separating the sequence number generation from the planning step in the engine, so that the planning step has no side effects. This makes it easier to see that every sequence number is properly accounted for.	2019-03-04 10:11:39 +01:00
Tanguy Leroux	e005eeb0b3	Backport support for replicating closed indices to 7.x (#39506 )(#39499 ) Backport support for replicating closed indices (#39499) Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on an 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888	2019-03-01 14:48:26 +01:00
Martijn van Groningen	24e478c58e	Fix test, more than one node may be connected. Relates to #37681	2019-02-26 10:40:09 +01:00
Martijn van Groningen	b159cc51c0	Ensure remote connection established and clean remote connection prior to leader cluster restart Relates to #37681	2019-02-26 09:06:30 +01:00

485 Commits