OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	34f2cc78f6	Fix Master Failover and DataNode Leave Blocking Snapshot (#38460 ) * Closes #38447	2019-02-05 23:56:59 +01:00
Jason Tedor	79a45b47da	Recover retention leases during peer recovery (#38435 ) This commit integrates retention leases with recovery. With this change, we copy the current retention leases on primary to the replica during phase two of recovery. At this point, the replica will have been added to the replication group and so is already receiving retention lease sync requests from the primary. This means that if any retention lease syncs are triggered on the primary after we sample the retention leases here during phase two, that sync request will also arrive on the replica ensuring that the replica is from this point on up to date with the retention leases on the primary. We have to copy these during phase two since we will be applying indexing operations, potentially triggering merges, and therefore must ensure the correct retention leases are in place beforehand.	2019-02-05 17:43:41 -05:00
Henning Andersen	20c66c5a05	Bubble-up exceptions from scheduler (#38317 ) Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error. This is a continuation of #38014 Fixed NPE that caused CCR tests (IndexFollowingIT and likely others) to fail. schedule could bubble rejected exception to uncaught exception handler when not using SAME executor if thread pool is terminated. Now ignore rejected exception silently if executor is shutdown.	2019-02-05 21:48:24 +01:00
Boaz Leskes	033ba725af	Remove support for internal versioning for concurrency control (#38254 ) Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometimes do the wrong thing. Here's the relevant excerpt from the resiliency status page: > When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document. We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach. Relates to #1078	2019-02-05 20:53:35 +01:00
Jason Tedor	b03d138122	Lift retention lease expiration to index shard (#38380 ) This commit lifts the control of when retention leases are expired to index shard. In this case, we move expiration to an explicit action rather than a side-effect of calling ReplicationTracker#getRetentionLeases. This explicit action is invoked on a timer. If any retention leases expire, then we hard sync the retention leases to the replicas. Otherwise, we proceed with a background sync.	2019-02-05 14:42:17 -05:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR, as it prevents documents that would be recovered during the normal following operation from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
Tal Levy	aef5775561	re-enables awaitsfixed datemath tests (#38376 ) Previously, date formats of `YYYY.MM.dd` would hit an issue where the year would jump towards the end of the calendar year. This was an issue that had since been resolved in tests by using `yyyy` to be the more accurate representation of the year. Closes #37037.	2019-02-05 11:20:40 -08:00
Julie Tibshirani	3ce7d2c9b6	Make sure to reject mappings with type _doc when include_type_name is false. (#38270 ) `CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when deserializing index creation requests, accidentally accepts mappings that are nested twice under the type key (as described in the bug report #38266). This in turn causes us to be too lenient in parsing typeless mappings. In particular, we accept the following index creation request, even though it should not contain the type key `_doc`: ``` PUT index?include_type_name=false { "mappings": { "_doc": { "properties": { ... } } } } ``` There is a similar issue for both 'put templates' and 'put mappings' requests as well. This PR makes the minimal changes to detect and reject these typed mappings in requests. It does not address #38266 generally, or attempt a larger refactor around types in these server-side requests, as I think this should be done at a later time.	2019-02-05 10:52:32 -08:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Turner	b7ab521eb1	Throw AssertionError when no master (#38432 ) Today we throw a fatal `RuntimeException` if an exception occurs in `getMasterName()`, and this includes the case where there is currently no master. However, sometimes we call this method inside an `assertBusy()` in order to allow for a cluster that is in the process of stabilising and electing a master. The trouble is that `assertBusy()` only retries on an `AssertionError` and not on a general `RuntimeException`, so the lack of a master is immediately fatal. This commit fixes the issue by asserting there is a master, triggering a retry if there is not. Fixes #38331	2019-02-05 17:11:20 +00:00
Armin Braun	2f6afd290e	Fix Concurrent Snapshot Ending And Stabilize Snapshot Finalization (#38368 ) * The problem in #38226 is that in some corner cases multiple calls to `endSnapshot` were made concurrently, leading to non-deterministic behavior (`beginSnapshot` was triggering a repository finalization while one that was triggered by a `deleteSnapshot` was already in progress) * Fixed by: * Making all `endSnapshot` calls originate from the cluster state being in a "completed" state (apart from on short-circuit on initializing an empty snapshot). This forced putting the failure string into `SnapshotsInProgress.Entry`. * Adding deduplication logic to `endSnapshot` * Also: * Streamlined the init behavior to work the same way (keep state on the `SnapshotsService` to decide which snapshot entries are stale) * closes #38226	2019-02-05 16:44:18 +01:00
Lee Hinman	d862453d68	Support unknown fields in ingest pipeline map configuration (#38352 ) We already support unknown objects in the list of pipelines, this changes the `PipelineConfiguration` to support fields other than just `id` and `config`. Relates to #36938	2019-02-05 07:52:17 -07:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350 ) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on OAuth2 spec. The access token is a short-lived token (defaults to 20m) and refresh token with a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys:- - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs:- 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using `Authorization` header, where the auth scheme is `ApiKey` and the credentials, is the base64 encoding of API key Id and API key separated by a colon. 
Example:- ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
Christoph Büscher	d255303584	Add typeless client side GetIndexRequest calls and response class (#37778 ) The HLRC client currently uses `org.elasticsearch.action.admin.indices.get.GetIndexRequest` and `org.elasticsearch.action.admin.indices.get.GetIndexResponse` in its get index calls. Both request and response are designed for the typed APIs, including some return types e.g. for `getMappings()` which in the maps it returns still use a level including the type name. In order to change this without breaking existing users of the HLRC API, this PR introduces two new request and response objects in the `org.elasticsearch.client.indices` client package. These are used by the IndicesClient#get and IndicesClient#exists calls now by default and support the type-less API. The old request and response objects are still kept for use in similarly named, but deprecated methods. The newly introduced client side classes are simplified versions of the server side request/response classes since they don't need to support wire serialization, and only the response needs fromXContent parsing (but no xContent-serialization, since this is the responsibility of the server-side class). Also changing the return type of `GetIndexResponse#getMapping` to `Map<String, MappingMetaData> getMappings()`, while it previously was returning another map keyed by the type-name. Similar getters return simple Maps instead of the ImmutableOpenMaps that the server side response objects return.	2019-02-05 03:41:05 +01:00
Gordon Brown	292e0f6fb7	Deprecate `_type` in simulate pipeline requests (#37949 ) As mapping types are being removed throughout Elasticsearch, the use of `_type` in pipeline simulation requests is deprecated. Additionally, the default `_type` used if one is not supplied has been changed to `_doc` for consistency with the rest of Elasticsearch.	2019-02-04 16:11:44 -07:00
Christoph Büscher	0ced775389	Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica (#38357 )	2019-02-04 22:30:34 +01:00
Mayya Sharipova	641704464d	Deprecate types in rollover index API (#38039 ) Relates to #35190	2019-02-04 16:07:45 -05:00
Zachary Tong	ab1150378b	Add Composite to AggregationBuilders (#38207 )	2019-02-04 13:47:04 -05:00
David Turner	2c1eab2b8a	Clarify slow cluster-state log messages (#38302 ) The message `... took [31s] above the warn threshold of 30s` suggests incorrectly that the task took 61 seconds. This commit adds the clarifying words `which is`.	2019-02-04 17:44:00 +00:00
Andrey Ershov	7bc8bc9605	ensureGreen (#38324 )	2019-02-04 16:36:04 +01:00
Jason Tedor	625d37a26a	Introduce retention lease background sync (#38262 ) This commit introduces a background sync for retention leases. The idea here is that we do a heavyweight sync when adding a new retention lease, and then periodically we want to background sync any retention lease renewals to the replicas. As long as the background sync interval is significantly lower than the extended lifetime of a retention lease, it is okay if from time to time a replica misses a sync (it will still have an older version of the lease that is retaining more data as we assume that renewals do not decrease the retaining sequence number). There are two follow-ups that will come after this commit. The first is to address the fact that we have not adapted the should periodically flush logic to possibly flush the retention leases. We want to do something like flush if we have not flushed in the last five minutes and there are renewed retention leases since the last time that we flushed. An additional follow-up will remove the syncing of retention leases when a retention lease expires. Today this sync could be invoked in the background by a merge operation. Rather, we will move the syncing of retention lease expiration to be done under the background sync. The background sync will use the heavyweight sync (write action) if a lease has expired, and will use the lightweight background sync (replication action) otherwise.	2019-02-04 10:35:29 -05:00
Christoph Büscher	5ee7232379	Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334 )	2019-02-04 16:10:06 +01:00
Christoph Büscher	715e581378	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330 )	2019-02-04 15:46:19 +01:00
Boaz Leskes	e49b593c81	Move TokenService to seqno powered cas (#38311 ) Relates #37872 Relates #10708	2019-02-04 15:25:41 +01:00
Yannick Welsch	ece8c659c5	Decrease leader and follower check timeout (#38298 ) Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still being a very long time for a node to be completely unresponsive.	2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka	9b64558efb	Migrating from joda to java.time. Watcher plugin (#35809 ) Part of the Joda time migration work. Migrates the Watcher plugin to use the JDK's java.time. Refers to #27330	2019-02-04 15:08:31 +01:00
Alexander Reelsen	87f3579125	Add nanosecond field mapper (#37755 ) This adds a dedicated field mapper that supports nanosecond resolution, at the price of a reduced date range. When using the date field mapper, the time is stored as milliseconds since the epoch in a long in Lucene. This field mapper stores the time in nanoseconds since the epoch, which means its range is much smaller, ranging roughly from 1970 to 2262. Note that aggregations will still be in milliseconds. However, docvalue fields will have full nanosecond resolution. Relates #27330	2019-02-04 11:31:16 +01:00
Christoph Büscher	15510da2af	Mute SharedClusterSnapshotRestoreIT#testAbortedSnapshotDuringInitDoesNotStart (#38304 )	2019-02-04 10:41:35 +01:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
Armin Braun	4561f425db	Remove Redundant Loop in SnapshotShardsService (#38283 ) * This was a merge mistake on my end, I think; we only need to loop over the shards once, not twice, to find those that we missed in INIT state	2019-02-04 09:06:39 +01:00
Alpar Torok	d58e899d45	Remove empty service files (#38192 )	2019-02-04 10:05:04 +02:00
Jason Tedor	d2cc1459a3	Fix ordering problem in add or renew lease test (#38280 ) We have to set the primary term before we add a retention lease, otherwise we can not assert the correct primary term.	2019-02-03 12:54:31 -05:00
Christoph Büscher	6ca7a913ea	Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (#38275 )	2019-02-03 12:54:13 +01:00
Armin Braun	89d7c57bd9	Fix Incorrect Transport Response Handler Type (#38264 ) * Fix Incorrect Transport Response Handler Type * The response type here is not empty and was always wrong, but this only became visible now that `0a604e3b24` was introduced * As a result of `0a604e3b24` we started actually handling the response of this request and logging/handling exceptions; before that, we simply dropped the class cast exception here quietly by using the empty response handler * Fix busy assert not handling `Exception` * Closes #38226 * Closes #38256	2019-02-03 08:48:15 +01:00
Nhat Nguyen	0861dc3581	Mute testCanRunUnsafeBootstrapAfterErroneousDetachWithoutLoosingMetaData (#38268 ) Tracked at #38267	2019-02-02 20:02:21 -05:00
Christoph Büscher	50cdc61874	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38257 )	2019-02-02 13:46:29 +01:00
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Nhat Nguyen	80d3092292	Fix primary term in testAddOrRenewRetentionLease (#38239 ) We should increase primary term before renewing leases; otherwise, the term of the latest RetentionLeases will be lower than the current term. Relates #37951	2019-02-02 02:38:53 -05:00
Nhat Nguyen	1ec04dff43	Fix testReplicaIgnoresOlderRetentionLeasesVersion (#38246 ) If the innerLength is 0, the version won't be increased; then there will be two RetentionLeases with the same term and version, but their leases are different. Relates #37951 Closes #38245	2019-02-02 02:37:37 -05:00
Nhat Nguyen	8bee5b8e06	Mute testAddOrRenewRetentionLease (#38240 ) Relates #38239	2019-02-01 21:27:10 -05:00
Boaz Leskes	f6e06a2b19	Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231 ) After backporting #37977, #37857 and #37872	2019-02-01 20:37:16 -05:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Nhat Nguyen	9c39dea7ae	AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227 ) Tracked at #38226	2019-02-01 16:24:02 -05:00
Armin Braun	03a1d21070	SnapshotShardsService Simplifications (#38025 ) * Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot * Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)	2019-02-01 20:46:14 +01:00
Luca Cavanna	ee57420de6	Adjust SearchRequest version checks (#38181 ) The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.	2019-02-01 19:23:13 +01:00
Andrey Ershov	04dc41b99e	Zen2ify RareClusterStateIT (#38184 ) In Zen 1 there are a commit timeout and a publish timeout, and these settings can be changed on the fly. In Zen 2, there is only a commit timeout, and this setting is static. RareClusterStateIT actively uses these settings and relies on the fact that they are dynamic. This commit adds a cancelCommitedPublication method to Coordinator to be used by tests. This method will cancel the current committed publication, if there is one. When there is a BlockClusterStateProcessing on the non-master node, the publication will be accepted and committed, but not yet applied, so we can use the method above to cancel it. Also, this commit replaces callback + AtomicReference with ActionFuture, which makes the test code easier to read.	2019-02-01 18:18:11 +01:00
Yannick Welsch	025bf28405	Fix _host based require filters (#38173 ) Using index.routing.allocation.require._host does not work correctly because the boolean logic in filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when opType == OpType.AND	2019-02-01 16:02:37 +01:00
Tanguy Leroux	da6269b456	RestoreService should update primary terms when restoring shards of existing indices (#38177 ) When restoring shards of existing indices, the RestoreService also restores the values of primary terms stored in the snapshot index metadata. The primary terms are not updated and could potentially conflict with current index primary terms if the restored primary terms are lower than the existing ones. This situation is likely to happen with replicated closed indices (because primary terms are increased when the index is transitioning from open to closed state, and the snapshotted primary terms are the one at the time the index was opened) (see #38024) and maybe also with CCR. This commit changes the RestoreService so that it updates the primary terms using the maximum value between the snapshotted values and the existing values. Related to #33888	2019-02-01 15:59:11 +01:00
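
The `ApiKey` authorization scheme described in fe36861ada (#38291) can be illustrated with a minimal Python sketch. It only builds the header value, the base64 encoding of the API key id and the API key joined by a colon. The helper name `api_key_header` and the sample id/key strings are assumptions made for illustration, not part of the commit.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> dict:
    """Build an Authorization header for the ApiKey scheme.

    Hypothetical helper: the credentials are base64("<id>:<key>"),
    as described in the commit message.
    """
    credentials = f"{key_id}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


# Sample values chosen so the result matches the curl example in the commit.
header = api_key_header("api-key-id", "api-key")
print(header["Authorization"])  # ApiKey YXBpLWtleS1pZDphcGkta2V5
```

The printed value is the same header the commit's `curl` example sends, so the sample credentials there decode to `api-key-id:api-key`.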

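The out-of-order defense described in f181e17038 (#37951) can be sketched roughly as below. This is a simplified model, not the Elasticsearch implementation: the class and method names (`RetentionLeases`, `Replica`, `supersedes`, `on_sync`) are hypothetical, and ordering by primary term first and version second is an assumption drawn from the related test-fix commits, which mention both values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionLeases:
    """Hypothetical versioned collection of retention leases."""
    primary_term: int
    version: int
    leases: frozenset = frozenset()

    def supersedes(self, other: "RetentionLeases") -> bool:
        # Assumed ordering: compare primary term first, then version.
        return (self.primary_term, self.version) > (other.primary_term, other.version)


class Replica:
    """Hypothetical replica that ignores stale retention lease syncs."""

    def __init__(self) -> None:
        self.current = RetentionLeases(primary_term=0, version=0)

    def on_sync(self, incoming: RetentionLeases) -> bool:
        # Apply the incoming collection only if it is strictly newer.
        if incoming.supersedes(self.current):
            self.current = incoming
            return True
        return False


replica = Replica()
newer = RetentionLeases(1, 2, frozenset({"lease-a", "lease-b"}))
older = RetentionLeases(1, 1, frozenset({"lease-a"}))

replica.on_sync(newer)  # arrives first and is applied
replica.on_sync(older)  # arrives late and is ignored
print(replica.current.version)
```

With this rule a late-arriving older collection cannot overwrite a newer one, which is the failure mode the commit message describes.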

2516 Commits