OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nhat Nguyen	1a93976ff7	Correct arg names when update mapping/settings from leader (#38063 ) These two arguments were named incorrectly and caused confusion.	2019-01-31 02:45:42 -05:00
Tim Brooks	b88bdfe958	Add dispatching to `HandledTransportAction` (#38050 ) This commit allows implementors of the `HandledTransportAction` to specify what thread the action should be executed on. The motivation for this commit is that certain CCR requests should be performed on the generic threadpool.	2019-01-30 15:40:49 -07:00
Tim Brooks	aeab55e8d1	Reduce flakiness of ccr recovery timeouts test (#38035 ) This fixes #38027. Currently we assert that all shards have failed. However, it is possible that some shards do not have segment files created yet. The action that we block is fetching these segment files so it is possible that some shards successfully recover. This commit changes the assertion to ensure that at least some of the shards have failed.	2019-01-30 14:13:23 -07:00
Martijn van Groningen	5433af28e3	Fixed test bug, lastFollowTime is null if there are no follower indices.	2019-01-30 19:33:16 +01:00
Martijn van Groningen	f51bc00fcf	Added ccr to xpack usage infrastructure (#37256 ) * Added ccr to xpack usage infrastructure Closes #37221	2019-01-30 07:58:26 +01:00
Tim Brooks	55b916afc0	Ensure task metadata not null in follow test (#37993 ) This commit fixes a potential race in the IndexFollowingIT. Currently it is possible that we fetch the task metadata, it is null, and that throws a null pointer exception. assertBusy does not catch null pointer exceptions. This commit asserts that the metadata is not null.	2019-01-29 15:58:31 -07:00
Tim Brooks	f3f9cabd67	Add timeout for ccr recovery action (#37840 ) This is related to #35975. It adds an action timeout setting that allows timeouts to be applied to the individual transport actions that are used during a ccr recovery.	2019-01-29 12:29:06 -07:00
Tim Brooks	00ace369af	Use `CcrRepository` to init follower index (#35719 ) This commit modifies the put follow index action to use a CcrRepository when creating a follower index. It routes the logic through the snapshot/restore process. A wait_for_active_shards parameter can be used to configure how long to wait before returning the response.	2019-01-29 11:47:29 -07:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support to JSON logging (#36833 ) In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine. To populate additional fields node.id and cluster.uuid which are not available at start time, a cluster state update will have to be received and the values passed to log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one cluster state update. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. Following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace see ESJsonLayout.java for more details and field descriptions Docker log4j2 configuration is now almost the same as the one used for the ES binary. The only difference is that docker is using console appenders, whereas ES is using file appenders. relates: #32850	2019-01-29 07:20:09 +01:00
Nhat Nguyen	557fcf915e	Wait for mapping in testReadRequestsReturnLatestMappingVersion (#37886 ) If the index request is executed before the mapping update is applied on the IndexShard, the index request will perform a dynamic mapping update. This mapping update will time out (i.e., ProcessClusterEventTimeoutException) because the latch is not open. This leads to the failure of the index request and the test. This commit makes sure the mapping is ready before we execute the index request. Closes #37807	2019-01-28 15:25:56 -05:00
Martijn van Groningen	4e1a779773	Prepare ShardFollowNodeTask to bootstrap when it falls behind leader shard (#37562 ) * Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing. * Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved. * Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metadata header. Relates to #35975	2019-01-28 09:30:04 +01:00
Dimitrios Liappis	290c6637c2	Refactor into appropriate uses of scheduleUnlessShuttingDown (#37709 ) Replace `threadPool().schedule()` / catch `EsRejectedExecutionException` pattern with direct calls to `ThreadPool#scheduleUnlessShuttingDown()`. Closes #36318	2019-01-28 10:01:26 +02:00
Julie Tibshirani	7c130d235a	Mute CcrRepositoryIT#testFollowerMappingIsUpdated Tracked in #37887.	2019-01-25 14:55:47 -08:00
Tanguy Leroux	f1f54e0f61	TransportUnfollowAction should increase settings version (#37859 ) The TransportUnfollowAction updates the index settings but does not increase the settings version to reflect that change. This issue has been caught while working on the replication of closed indices (#33888). The IndexFollowingIT.testUnfollowIndex() started to fail and this specific assertion tripped. It does not happen on master branch today because index metadata for closed indices are never updated in IndexService instances, but this is something that is going to change with the replication of closed indices.	2019-01-25 16:31:26 +01:00
Martijn van Groningen	1151f3b3ff	Fail with a dedicated exception if remote connection is missing or (#37767 ) or connectivity to the remote connection is failing. Relates to #37681	2019-01-25 08:53:18 +01:00
Nhat Nguyen	76fb573569	Do not allow put mapping on follower (#37675 ) Today, the mapping on the follower is managed and replicated from its leader index by the ShardFollowTask. Thus, we should prevent users from modifying the mapping on the follower indices. Relates #30086	2019-01-24 12:13:00 -05:00
David Roberts	f12bfb4684	Mute FollowerFailOverIT testReadRequestsReturnsLatestMappingVersion Due to https://github.com/elastic/elasticsearch/issues/37807	2019-01-24 09:58:50 +00:00
Martijn van Groningen	2908ca1b35	Fix index filtering in follow info api. (#37752 ) The filtering by follower index was completely broken. Also the wrong persistent tasks were selected, causing the wrong status to be reported. Closes #37738	2019-01-24 08:50:23 +01:00
Nhat Nguyen	0096f1b2e4	Ensure changes requests return the latest mapping version (#37633 ) Today we keep the mapping on the follower in sync with the leader's using the mapping version from changes requests. There are two rare cases where the mapping on the follower is not synced properly: 1. The returned mapping version (from ClusterService) is older than the actual mapping. This happens because we expose the latest cluster state in ClusterService after applying it to IndexService. 2. It's possible for the FollowTask to receive a mapping older than the min_required_mapping. In that case, it should fetch the mapping again; otherwise, the follower won't have the right mapping. Relates to #31140	2019-01-23 13:41:13 -05:00
Tim Brooks	eb43ab6d60	Implement leader rate limiting for file restore (#37677 ) This is related to #35975. This commit implements rate limiting on the leader side using the CombinedRateLimiter.	2019-01-22 10:57:37 -07:00
Martijn van Groningen	ef2f5e4a13	Follow stats api should return a 404 when requesting stats for a non existing index (#37220 ) Currently it returns an empty response with a 200 response code. Closes #37021	2019-01-22 12:48:05 +01:00
Ryan Ernst	9a34b20233	Simplify integ test distribution types (#37618 ) The integ tests currently use the raw zip project name as the distribution type. This commit simplifies this specification to be "default" or "oss". Whether zip or tar is used should be an internal implementation detail of the integ test setup, which can (in the future) be platform specific.	2019-01-21 12:37:17 -08:00
Martijn van Groningen	88f4b0a326	Do not set fatal exception when shard follow task is stopped. (#37603 ) When shard follow task is cancelled while fetching operations then the fatal exception field should not be set.	2019-01-21 07:54:51 +01:00
Tim Brooks	fe753ee1d2	Do not add index event listener if CCR disabled (#37432 ) Currently we add the CcrRestoreSourceService as an index event listener. However, if ccr is disabled, this service is null and we attempt to add a null listener throwing an exception. This commit only adds the listener if ccr is enabled.	2019-01-18 16:31:21 -07:00
Tim Brooks	cd41289396	Add local session timeouts to leader node (#37438 ) This is related to #35975. This commit adds timeout functionality to the local session on a leader node. When a session is started, a timeout is scheduled using a repeatable runnable. If the session is not accessed in between two runs the session is closed. When the session is closed, the repeating task is cancelled. Additionally, this commit moves session uuid generation to the leader cluster. It also renames the PutCcrRestoreSessionRequest to StartCcrRestoreSessionRequest to reflect that change.	2019-01-18 14:48:20 -07:00
Martijn van Groningen	6846666b6b	Add ccr follow info api (#37408 ) * Add ccr follow info api This api returns all follower indices and per follower index the provided parameters at put follow / resume follow time and whether index following is paused or active. Closes #37127 * iter * [DOCS] Edits the get follower info API * [DOCS] Fixes link to remote cluster * [DOCS] Clarifies descriptions for configured parameters	2019-01-18 16:37:21 +01:00
Tim Brooks	978c818d0f	Use RestoreSnapshotRequest in CcrRepositoryIT Commit #37535 removed an internal restore request in favor of the RestoreSnapshotRequest. Commit #37449 added a new test that used the internal restore request. This commit modifies the new test to use the RestoreSnapshotRequest.	2019-01-17 15:31:27 -07:00
Tim Brooks	b6f06a48c0	Implement follower rate limiting for file restore (#37449 ) This is related to #35975. This commit implements rate limiting on the follower side using a new class `CombinedRateLimiter`.	2019-01-17 14:58:46 -07:00
Armin Braun	381d035cd6	Remove Redundant RestoreRequest Class (#37535 ) * Same as #37464 but for the restore side	2019-01-17 22:23:23 +01:00
Martijn van Groningen	b85bfd3e17	Added fatal_exception field for ccr stats in monitoring mapping. (#37563 )	2019-01-17 14:04:41 +01:00
Martijn van Groningen	99b09845da	Moved ccr integration to the package with other ccr integration tests.	2019-01-17 13:57:56 +01:00
Przemyslaw Gomulka	5e94f384c4	Remove the use of AbstractLifecycleComponent constructor #37488 (#37488 ) The AbstractLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constructor of its super class. It no longer extends the AbstractComponent so there is no need for this constructor. There is also no need for AbstractLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor. This is part 1. which will be backported to 6.x with a migration guide/deprecation log. part 2 will have this constructor removed in 7 relates #35560 relates #34488	2019-01-16 09:05:30 +01:00
Martijn van Groningen	9554b2fecb	When removing an AutoFollower also mark it as removed. (#37402 ) Currently when there are no more auto follow patterns for a remote cluster then the AutoFollower instance for this remote cluster will be removed. If a new auto follow pattern for this remote cluster gets added quickly enough after the last delete then there may be two AutoFollower instance running for this remote cluster instead of one. Each AutoFollower instance stops automatically after it sees in the start() method that there are no more auto follow patterns for the remote cluster it is tracking. However when an auto follow pattern gets removed and then added back quickly enough then old AutoFollower may never detect that at some point there were no auto follow patterns for the remote cluster it is monitoring. The creation and removal of an AutoFollower instance happens independently in the `updateAutoFollowers()` as part of a cluster state update. By adding the `removed` field, an AutoFollower instance will not miss the fact there were no auto follow patterns at some point in time. The `updateAutoFollowers()` method now marks an AutoFollower instance as removed when it sees that there are no more patterns for a remote cluster. The updateAutoFollowers() method can then safely start a new AutoFollower instance. Relates to #36761	2019-01-15 16:24:19 +01:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
Tim Brooks	5c68338a1c	Implement ccr file restore (#37130 ) This is related to #35975. It implements a file based restore in the CcrRepository. The restore transfers files from the leader cluster to the follower cluster. It does not implement any advanced resiliency features at the moment. Any request failure will end the restore.	2019-01-14 13:07:55 -07:00
Martijn van Groningen	de852765d6	unmuted test Relates to #37014	2019-01-14 14:27:42 +01:00
Martijn van Groningen	e4391afd98	Test fix, wait for auto follower to have stopped in the background Relates to #36761	2019-01-11 17:26:17 +01:00
Martijn van Groningen	6d81e7c3e7	[CCR] FollowingEngine should fail with 403 if operation has no seqno assigned (#37213 ) Fail with a 403 when indexing a document directly into a follower index. In order to test this change, I had to move specific assertions into a dedicated class and disable assertions for that class in the rest qa module. I think that is the right trade off.	2019-01-10 15:54:34 +01:00
Martijn van Groningen	df488720e0	[CCR] Make shard follow tasks more resilient for restarts (#37239 ) If a running shard follow task needs to be restarted and the remote connection seeds have changed then a shard follow task currently fails with a fatal error. The change creates the remote client lazily and adjusts the errors a shard follow task should retry. This issue was found in test failures in the recently added ccr rolling upgrade tests. The reason why this issue occurs more frequently in the rolling upgrade test is because ccr is set up in local mode (so remote connection seed will become stale) and all nodes are restarted, which forces the shard follow tasks to get restarted at some point during the test. Note that these tests cannot be enabled yet, because this change will need to be backported to 6.x first. (otherwise the issue still occurs on non upgraded nodes) I also changed the RestartIndexFollowingIT to set up the remote cluster via persistent settings and to also restart the leader cluster. This way what happens during the ccr rolling upgrade qa tests, also happens in this test. Relates to #37231	2019-01-10 15:02:30 +01:00
Martijn van Groningen	1a41d84536	[CCR] Resume follow Api should not require a request body (#37217 ) Closes #37022	2019-01-10 09:48:26 +01:00
Martijn van Groningen	9122585359	[CCR] Added more logging.	2019-01-09 12:17:47 +01:00
Alpar Torok	6344e9a3ce	Testing conventions: add support for checking base classes (#36650 )	2019-01-08 13:39:03 +02:00
Jason Tedor	c8c596cead	Introduce retention lease expiration (#37195 ) This commit implements a straightforward approach to retention lease expiration. Namely, we inspect which leases are expired when obtaining the current leases through the replication tracker. At that moment, we clean the map that persists the retention leases in memory.	2019-01-07 22:03:52 -08:00
Jason Tedor	c0f8c89172	Introduce shard history retention leases (#37167 ) This commit is the first in a series which will culminate with fully-functional shard history retention leases. Shard history retention leases are aimed at preventing shard history consumers from having to fallback to expensive file copy operations if shard history is not available from a certain point. These consumers include following indices in cross-cluster replication, and local shard recoveries. A future consumer will be the changes API. Further, index lifecycle management requires coordinating with some of these consumers otherwise it could remove the source before all consumers have finished reading all operations. The notion of shard history retention leases that we are introducing here will also be used to address this problem. Shard history retention leases are a property of the replication group managed under the authority of the primary. A shard history retention lease is a combination of an identifier, a retaining sequence number, a timestamp indicating when the lease was acquired or renewed, and a string indicating the source of the lease. Being leases they have a limited lifespan that will expire if not renewed. The idea of these leases is that all operations above the minimum of all retaining sequence numbers will be retained during merges (which would otherwise clear away operations that are soft deleted). These leases will be periodically persisted to Lucene and restored during recovery, and broadcast to replicas under certain circumstances. This commit is merely putting the basics in place. This first commit only introduces the concept and integrates their use with the soft delete retention policy. We add some tests to demonstrate the basic management is correct, and that the soft delete policy is correctly influenced by the existence of any retention leases. 
We make no effort in this commit to implement any of the following: - timestamps - expiration - persistence to and recovery from Lucene - handoff during primary relocation - sharing retention leases with replicas - exposing leases in shard-level statistics - integration with cross-cluster replication These will occur individually in follow-up commits.	2019-01-07 07:43:57 -08:00
Jim Ferenczi	e38cf1d0dc	Add the ability to set the number of hits to track accurately (#36357 ) In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected. Relates #33028	2019-01-04 20:36:49 +01:00
Luca Cavanna	c1beb95aa1	Mute LocalIndexFollowingIT#testRemoveRemoteConnection Relates to #37014	2018-12-28 16:39:36 +01:00
Nhat Nguyen	7580d9d925	Make SourceToParse immutable (#36971 ) Today the routing of a SourceToParse is assigned in a separate step after the object is created. We can easily forget to set the routing. With this commit, the routing must be provided in the constructor of SourceToParse. Relates #36921	2018-12-24 14:06:50 -05:00
Martijn van Groningen	561b704129	[CCR] AutoFollowCoordinator and follower index already created (#36540 ) The AutoFollowCoordinator should be resilient to the fact that the follower index has already been created and in that case it should only update the auto follow metadata with the fact that the follower index was created. Relates to #33007	2018-12-24 10:16:38 +01:00
Martijn van Groningen	44fe265d82	[CCR] Added auto_follow_exception.timestamp field to auto follow stats (#36947 ) Currently auto follow stats users are unable to see whether an auto follow error was recent or old. The new timestamp field will help user distinguish between old and new errors.	2018-12-24 07:53:51 +01:00
Martijn van Groningen	4fb62fcba6	Make CCR resilient against missing remote cluster connections (#36682 ) Both index following and auto following should be resilient against missing remote connections. This happens in the case that they get accidentally removed by a user. When this happens auto following and index following will retry to continue instead of failing with unrecoverable exceptions. Both the put follow and put auto follow APIs validate the remote cluster connection. The logic added in this change only exists in case during the lifetime of a follower index or auto follow pattern the remote connection gets removed. This retry behavior is similar to how CCR deals with authorization errors. Closes #36667 Closes #36255	2018-12-24 07:28:34 +01:00

357 Commits