OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I adjust log and unmute this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Martijn van Groningen	40d5beaf41	muted test Relates to #38779	2019-02-12 16:54:54 +01:00
Martijn van Groningen	6290d59ffa	Use clear cluster names in order to make debugging easier. Relates to #37681	2019-02-12 10:19:39 +01:00
Yannick Welsch	bafc709326	Fix CCR concurrent file chunk fetching bug (#38736 ) Fixes a bug with concurrent file chunk fetching during recovery from remote where the wrong offset was used.	2019-02-11 19:15:57 +01:00
Tanguy Leroux	dc212de822	Specialize pre-closing checks for engine implementations (#38702 ) (#38722 ) The Close Index API has been refactored in 6.7.0 and it now performs pre-closing sanity checks on shards before an index is closed: the maximum sequence number must be equal to the global checkpoint. While this is a strong requirement for regular shards, we identified the need to relax this check in the case of CCR following shards. The following shards are not in charge of managing the max sequence number or global checkpoint, which are pulled from a leader shard. They also fetch and process batches of operations from the leader in an unordered way, potentially leaving gaps in the history of ops. If the following shard lags a lot it's possible that the global checkpoint and max seq number never get in sync, preventing the following shard from being closed and a new PUT Follow action from being issued on this shard (which is our recommended way to resume/restart a CCR following). This commit allows each Engine implementation to define the specific verification it must perform before closing the index. In order to allow following/frozen/closed shards to be closed whatever the max seq number or global checkpoint are, the FollowingEngine and ReadOnlyEngine do not perform any check before the index is closed. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2019-02-11 17:34:17 +01:00
Martijn van Groningen	92201ef563	Catch AlreadyClosedException and use other IndexShard instance (#38630 ) Closes #38617	2019-02-11 15:36:48 +01:00
Martijn van Groningen	a29bf2585e	Added unit test for FollowParameters class (#38500 ) (#38690 ) A unit test that tests FollowParameters directly was missing.	2019-02-11 10:53:04 +01:00
Martijn van Groningen	4625807505	Reuse FollowParameters' parse fields. (#38508 )	2019-02-11 08:46:36 +01:00
Martijn van Groningen	e213ad3e88	Mute test. Relates to #38695	2019-02-11 08:32:42 +01:00
Tim Brooks	023e3c207a	Concurrent file chunk fetching for CCR restore (#38656 ) Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR. The implementation uses the parallel file writer functionality that is also used by peer recoveries.	2019-02-09 21:19:57 -07:00
Nhat Nguyen	c202900915	Retry on wait_for_metadata_version timeout (#38521 ) Closes #37807 Backport of #38521	2019-02-09 19:51:58 -05:00
Christoph Büscher	d03b386f6a	Mute FollowerFailOverIT testFailOverOnFollower (#38634 ) Relates to #38633	2019-02-08 17:20:30 +01:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492 ) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
Tim Brooks	fb0ec26fd4	Set update mappings master node timeout to 30 min (#38439 ) This is related to #35975. We do not want a slow master to fail a recovery from remote process due to a slow put mappings call. This commit increases the master node timeout on this call to 30 mins.	2019-02-05 16:22:11 -06:00
Przemyslaw Gomulka	afcdbd2bc0	XPack: core/ccr/Security-cli migration to java-time (#38415 ) part of the migrating joda time work. refactoring x-pack plugins usages of joda to java-time refers #27330	2019-02-05 22:09:32 +01:00
Tim Brooks	4a15e2b29e	Make Ccr recovery file chunk size configurable (#38370 ) This commit adds a byte setting `ccr.indices.recovery.chunk_size`. This setting configs the size of file chunk requested while recovering from remote.	2019-02-05 13:34:00 -06:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR, as it prevents documents that would be recovered during normal following operations from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
Armin Braun	887fa2c97a	Mute testReadRequestsReturnLatestMappingVersion (#38438 ) * Relates #37807	2019-02-05 17:10:12 +01:00
Martijn van Groningen	0beb3c93d1	Clean up duplicate follow config parameter code (#37688 ) Introduced FollowParameters class that put follow, resume follow, put auto follow pattern requests and follow info response classes reuse. The FollowParameters class has the fields, getters etc. for the common parameters that all these APIs have. Also binary and xcontent serialization / parsing is handled by this class. The follow, resume follow, put auto follow pattern request classes originally used optional non primitive fields, so FollowParameters has that too and the follow info api can handle that now too. Also the followerIndex field can in production only be specified via the url path. If it is also specified via the request body then it must have the same value as is specified in the url path. This option only existed for xcontent testing. However the AbstractSerializingTestCase base class now also supports createXContextTestInstance() to provide a different test instance when testing xcontent, so allowing followerIndex to be specified via the request body is no longer needed. By moving the followerIndex field from Body to ResumeFollowAction.Request class and not allowing the followerIndex field to be specified via the request body the Body class is redundant and can be removed. The ResumeFollowAction.Request class can then directly use the FollowParameters class. For consistency I also removed the ability to specify followerIndex in the put follow api and the name in put auto follow pattern api via the request body.	2019-02-05 17:05:19 +01:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on OAuth2 spec. The access token is a short-lived token (defaults to 20m) and refresh token with a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys: - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs: 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using `Authorization` header, where the auth scheme is `ApiKey` and the credentials are the base64 encoding of API key Id and API key separated by a colon. Example: ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
Nhat Nguyen	cecfa5bd6d	Tighten mapping syncing in ccr remote restore (#38071 ) There are two issues regarding the way that we sync mapping from leader to follower when a ccr restore is completed: 1. The returned mapping from a cluster service might not be up to date as the mapping of the restored index commit. 2. We should not compare the mapping version of the follower and the leader. They are not related to one another. Moreover, I think we should only ensure that once the restore is done, the mapping on the follower should be at least the mapping of the copied index commit. We don't have to sync the mapping which is updated after we have opened a session. Relates #36879 Closes #37887	2019-02-04 17:53:41 -05:00
Tim Brooks	5a33816c86	Add test for `PutFollowAction` on a closed index (#38236 ) This is related to #35975. Currently when an index falls behind a leader it encounters a fatal exception. This commit adds a test for that scenario. Additionally, it tests that the user can stop following, close the follower index, and put follow again. After the indexing is re-bootstrapped, it will recover the documents it lost in normal following operations.	2019-02-04 16:37:42 -06:00
Nhat Nguyen	fb1e350c81	Mute testFollowIndexAndCloseNode (#38360 ) Tracked at #33337	2019-02-04 15:04:46 -05:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Nhat Nguyen	3ecdfe1060	Enable trace log in FollowerFailOverIT (#38148 ) This suite still fails about once per week with a worrying assertion. Sadly we are still unable to find the actual source. Expected: <SeqNoStats{maxSeqNo=229, localCheckpoint=86, globalCheckpoint=86}> but: was <SeqNoStats{maxSeqNo=229, localCheckpoint=-1, globalCheckpoint=86}> This change enables trace log in the suite so we will have a better picture if this fails again. Relates #3333	2019-02-01 15:44:39 -05:00
Julie Tibshirani	c2e9d13ebd	Default include_type_name to false in the yml test harness. (#38058 ) This PR removes the temporary change we made to the yml test harness in #37285 to automatically set `include_type_name` to `true` in index creation requests if it's not already specified. This is possible now that the vast majority of index creation requests were updated to be typeless in #37611. A few additional tests also needed updating here. Additionally, this PR updates the test harness to set `include_type_name` to `false` in index creation requests when communicating with 6.x nodes. This mirrors the logic added in #37611 to allow for typeless document write requests in test set-up code. With this update in place, we can remove many references to `include_type_name: false` from the yml tests.	2019-02-01 11:44:13 -08:00
Nhat Nguyen	f64b20383e	Replace awaitBusy with assertBusy in atLeastDocsIndexed (#38190 ) Unlike assertBusy, awaitBusy does not retry if the code-block throws an AssertionError. A refresh in atLeastDocsIndexed can fail because we call this method while we are closing some node in FollowerFailOverIT.	2019-02-01 13:31:17 -05:00
Tim Brooks	291c4e7a0c	Fix file reading in ccr restore service (#38117 ) Currently we use the raw byte array length when calling the IndexInput read call to determine how many bytes we want to read. However, due to how BigArrays works, the array length might be longer than the reference length. This commit fixes the issue and uses the BytesRef length when calling read. Additionally, it expands the index follow test to index many more documents. These documents should potentially lead to large enough segment files to trigger scenarios where this fix matters.	2019-01-31 18:02:24 -07:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that caller handles exception by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
Alpar Torok	b7de8e1d1e	Mute failing test Tracking #38100	2019-01-31 17:01:16 +02:00
Alpar Torok	f15d7b9b91	Mute failing test Tracking #38027	2019-01-31 16:55:52 +02:00
Nhat Nguyen	1a93976ff7	Correct arg names when update mapping/settings from leader (#38063 ) These two arguments were named incorrectly and caused confusion.	2019-01-31 02:45:42 -05:00
Tim Brooks	b88bdfe958	Add dispatching to `HandledTransportAction` (#38050 ) This commit allows implementors of the `HandledTransportAction` to specify what thread the action should be executed on. The motivation for this commit is that certain CCR requests should be performed on the generic threadpool.	2019-01-30 15:40:49 -07:00
Tim Brooks	aeab55e8d1	Reduce flakiness of ccr recovery timeouts test (#38035 ) This fixes #38027. Currently we assert that all shards have failed. However, it is possible that some shards do not have segment files created yet. The action that we block is fetching these segment files so it is possible that some shards successfully recover. This commit changes the assertion to ensure that at least some of the shards have failed.	2019-01-30 14:13:23 -07:00
Martijn van Groningen	5433af28e3	Fixed test bug, lastFollowTime is null if there are no follower indices.	2019-01-30 19:33:16 +01:00
Martijn van Groningen	f51bc00fcf	Added ccr to xpack usage infrastructure (#37256 ) * Added ccr to xpack usage infrastructure Closes #37221	2019-01-30 07:58:26 +01:00
Tim Brooks	55b916afc0	Ensure task metadata not null in follow test (#37993 ) This commit fixes a potential race in the IndexFollowingIT. Currently it is possible that we fetch the task metadata, it is null, and that throws a null pointer exception. assertBusy does not catch null pointer exceptions. This commit asserts that the metadata is not null.	2019-01-29 15:58:31 -07:00
Tim Brooks	f3f9cabd67	Add timeout for ccr recovery action (#37840 ) This is related to #35975. It adds an action timeout setting that allows timeouts to be applied to the individual transport actions that are used during a ccr recovery.	2019-01-29 12:29:06 -07:00
Tim Brooks	00ace369af	Use `CcrRepository` to init follower index (#35719 ) This commit modifies the put follow index action to use a CcrRepository when creating a follower index. It routes the logic through the snapshot/restore process. A wait_for_active_shards parameter can be used to configure how long to wait before returning the response.	2019-01-29 11:47:29 -07:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support to JSON logging (#36833 ) In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine. To populate additional fields node.id and cluster.uuid which are not available at start time, a cluster state update will have to be received and the values passed to log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one cluster state update. Once the update is received the nodeId and clusterUUID are set in a static field in a NodeAndClusterIdConverter. Following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace see ESJsonLayout.java for more details and field descriptions Docker log4j2 configuration is now almost the same as the one used for ES binary. The only difference is that docker is using console appenders, whereas ES is using file appenders. relates: #32850	2019-01-29 07:20:09 +01:00
Nhat Nguyen	557fcf915e	Wait for mapping in testReadRequestsReturnLatestMappingVersion (#37886 ) If the index request is executed before the mapping update is applied on the IndexShard, the index request will perform a dynamic mapping update. This mapping update will time out (i.e., ProcessClusterEventTimeoutException) because the latch is not open. This leads to the failure of the index request and the test. This commit makes sure the mapping is ready before we execute the index request. Closes #37807	2019-01-28 15:25:56 -05:00
Martijn van Groningen	4e1a779773	Prepare ShardFollowNodeTask to bootstrap when it falls behind leader shard (#37562 ) * Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing. * Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved. * Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metadata header. Relates to #35975	2019-01-28 09:30:04 +01:00
Dimitrios Liappis	290c6637c2	Refactor into appropriate uses of scheduleUnlessShuttingDown (#37709 ) Replace `threadPool().schedule()` / catch `EsRejectedExecutionException` pattern with direct calls to `ThreadPool#scheduleUnlessShuttingDown()`. Closes #36318	2019-01-28 10:01:26 +02:00
Julie Tibshirani	7c130d235a	Mute CcrRepositoryIT#testFollowerMappingIsUpdated Tracked in #37887.	2019-01-25 14:55:47 -08:00
Tanguy Leroux	f1f54e0f61	TransportUnfollowAction should increase settings version (#37859 ) The TransportUnfollowAction updates the index settings but does not increase the settings version to reflect that change. This issue has been caught while working on the replication of closed indices (#33888). The IndexFollowingIT.testUnfollowIndex() started to fail and this specific assertion tripped. It does not happen on master branch today because index metadata for closed indices are never updated in IndexService instances, but this is something that is going to change with the replication of closed indices.	2019-01-25 16:31:26 +01:00
Martijn van Groningen	1151f3b3ff	Fail with a dedicated exception if remote connection is missing or (#37767 ) or connectivity to the remote connection is failing. Relates to #37681	2019-01-25 08:53:18 +01:00
Nhat Nguyen	76fb573569	Do not allow put mapping on follower (#37675 ) Today, the mapping on the follower is managed and replicated from its leader index by the ShardFollowTask. Thus, we should prevent users from modifying the mapping on the follower indices. Relates #30086	2019-01-24 12:13:00 -05:00
David Roberts	f12bfb4684	Mute FollowerFailOverIT testReadRequestsReturnsLatestMappingVersion Due to https://github.com/elastic/elasticsearch/issues/37807	2019-01-24 09:58:50 +00:00

390 Commits
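
The `ApiKey` credential format described in commit fe36861ada (base64 of the API key Id and the API key, joined by a colon) can be sketched as follows. This is only an illustration: the helper name `api_key_header` is hypothetical, and the Id and key values are the ones obtained by decoding the token in that commit's own curl example, not real credentials.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> str:
    """Return an Authorization header value for the ApiKey scheme.

    Hypothetical helper: joins the key Id and key with a colon and
    base64-encodes the result, as the commit message describes.
    """
    raw = f"{key_id}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"ApiKey {token}"


# Placeholder values recovered by decoding the example token in the commit.
header = api_key_header("api-key-id", "api-key")
print(header)  # ApiKey YXBpLWtleS1pZDphcGkta2V5
```

The resulting string is what the commit's curl example passes after `Authorization:`.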