OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Jason Tedor	d43ac8fe11	Include in log retention leases that failed to sync When retention leases fail to sync after an expiration check, we emit a log message about this. This commit adds the retention leases that failed to sync.	2019-02-18 15:08:08 -05:00
Jason Tedor	bbb61002ba	Add some logging related to retention lease syncing (#39066 ) When the background retention lease sync fires, we check an see if any retention leases are expired. If any did expire, we execute a full retention lease sync (write action). Since this is happening on a background thread, we do not block that thread waiting for success (it will simply try again when the timer elapses). However, we were swallowing exceptions that indicate failure. This commit addresses that by logging the failures. Additionally, we add some trace logging to the execution of syncing retention leases.	2019-02-18 15:02:31 -05:00
Henning Andersen	99b2bc3461	Fix potential race during TcpTransport close (#39031 ) Fixed two potential causes for leaked threads during tests: 1. When adding a channel to serverChannels, we add it under a monitor that we do not use when reading from it. This is potentially unsafe if there is no other happens-before relationship ensuring the safety of this. 2. Long-shot but if the thread pool was shutdown before entering this code, we would silently forget about closing server channels so added assert. Strengthened the locking to ensure that once we stop the transport, no new server channels can be made. Relates to CI failure issue: #37543	2019-02-18 19:13:23 +01:00
Alan Woodward	ab4d5f404f	Add overlapping, before, after filters to intervals query (#38999 ) Lucene recently added `overlapping`, `before` and `after` filters to the intervals package. This commit exposes them in elasticsearch.	2019-02-18 15:06:24 +00:00
Adrien Grand	45b17e8645	Don't close caches while there might still be in-flight requests. (#38958 ) Many of our index components use ref-counting so that in the event that a shard is closed while there are still ongoing requests, then the index reader and the store only effectively get closed when ongoing requests have finished. However we don't apply the same principle to the request and query caches, which might get closed while there are still in-flight requests. This commit adds ref-counting to `IndicesService` so that the caches and other components it maintains only get closed when all shards are effectively closed. Closes #37117	2019-02-18 13:59:58 +01:00
Martijn van Groningen	ed08bc3537	Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (#38709 ) * During fetching remote mapping if remote client is missing then `NoSuchRemoteClusterException` was not handled. * When adding remote connection, check that it is really connected before continue-ing to run the tests. Relates to #38695	2019-02-18 09:41:44 +01:00
Jason Tedor	a5ce1e0bec	Integrate retention leases to recovery from remote (#38829 ) This commit is the first step in integrating shard history retention leases with CCR. In this commit we integrate shard history retention leases with recovery from remote. Before we start transferring files, we take out a retention lease on the primary. Then during the file copy phase, we repeatedly renew the retention lease. Finally, when recovery from remote is complete, we disable the background renewing of the retention lease.	2019-02-16 15:37:52 -05:00
Tim Brooks	b1c1daa63f	Add get file chunk timeouts with listener timeouts (#38758 ) This commit adds a `ListenerTimeouts` class that will wrap a `ActionListener` in a listener with a timeout scheduled on the generic thread pool. If the timeout expires before the listener is completed, `onFailure` will be called with an `ElasticsearchTimeoutException`. Timeouts for the get ccr file chunk action are implemented using this functionality. Additionally, this commit attempts to fix #38027 by also blocking proxied get ccr file chunk actions. This test being un-muted is useful to verify the timeout functionality.	2019-02-16 10:56:03 -07:00
Luca Cavanna	a1a49f201d	Tie break search shard iterator comparisons on cluster alias (#38853 ) `SearchShardIterator` inherits its `compareTo` implementation from `PlainShardIterator`. That is good in most of the cases, as such comparisons are based on the shard id which is unique, even when searching against indices with same names across multiple clusters (thanks to the index uuid being different). In case though the same cluster is registered multiple times with different aliases, the shard id is exactly the same, hence remote results will be returned before local ones with same shard id objects. That is because remote iterators are added before local ones, and we use a stable sorting method in `GroupShardIterators` constructor. This PR enhances `compareTo` for `SearchShardIterator` to tie break on cluster alias and introduces consistent `equals` and `hashcode` methods. This allows to remove a TODO in `SearchResponseMerger` which otherwise has to handle this special case specifically. Also, while at it I added missing tests around equals/hashcode and compareTo and expanded existing ones.	2019-02-16 09:41:03 +01:00
Nhat Nguyen	7e20a92888	Advance max_seq_no before add operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first add it to Lucene, then add it to translog, then finally marks its seq_no as completed. If a flush occurs after step1, but before step-3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Nhat Nguyen	20755e666c	Reduce global checkpoint sync interval in disruption tests (#38931 ) We verify seq_no_stats is aligned between copies at the end of some disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due to a lagged global checkpoint on replicas. The global checkpoint on replicas is lagged because we sync the global checkpoint 30 seconds (by default) after the last replication operation. This change reduces the global checkpoint sync-internal to 1s in the disruption tests. Closes #38318 Closes #36789	2019-02-15 21:04:20 -05:00
Nhat Nguyen	a67b9f6d1f	Relax testStressMaybeFlushOrRollTranslogGeneration (#38918 ) The predicate shouldPeriodicallyFlush is determined by the uncommitted translog size and the local checkpoint. The uncommitted translog size depends on the local checkpoint. The condition shouldPeriodicallyFlush can be true twice in in the test in the following scenario: 1. Index doc-0 and advances the local checkpoint to 0, the condition shouldPeriodicallyFlush remains false. 2. Index doc-1 and add it to translog, but the local checkpoint is not advanced yet (still 0). The condition shouldPeriodicallyFlush becomes true because the uncommitted translog size is 216bytes (2ops + gen-1 + gen-2) > 180bytes and the translog generation of the new index commit would advance from 1 to 2. > [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=0, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=2, > max_seq_no=1}] 1. The shouldPeriodicallyFlush becomes true again after the local checkpoint is advanced to 1 because the uncommitted translog size is 216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation of the new index commit would advance from 2 to 4. > [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=1, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=4, > max_seq_no=1}] We need to relax the assertion in this test to cover this situation. Closes #31629	2019-02-15 21:04:12 -05:00
Armin Braun	238425e5e7	Fix Issue with Concurrent Snapshot Init + Delete (#38518 ) * Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread * Closes #38489	2019-02-15 16:50:47 -08:00
Alan Woodward	176013e23c	Avoid double term construction in DfsPhase (#38716 ) DfsPhase captures terms used for scoring a query in order to build global term statistics across multiple shards for more accurate scoring. It currently does this by building the query's `Weight` and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()` for each collected term. This duplicates work, however, as the various `Weight` implementations will already have collected these statistics at construction time. This commit replaces this round-about way of collecting stats, instead using a delegating IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()` is called from the Weight. It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term statistics, but ignore field statistics. `Rescorer.extractTerms` has been removed, and replaced with a new method on `RescoreContext` that returns any queries used by the rescore implementation. The delegating IndexSearcher then collects term contexts and statistics in the same way described above for each Query.	2019-02-15 16:00:38 +00:00
Daniel Mitterdorfer	fcc7f553f5	Also mmap cfs files for hybridfs (#38940 ) (#38947 ) With this commit we add the `.cfs` file extension to the list of file types that are memory-mapped by hybridfs. `.cfs` files combine all files of a Lucene segment into a single file in order to save file handles. As this strategy is only used for "small" segments (less than 10% of the shard size), it is benefical to memory-map them instead of accessing them via NIO. Relates #36668	2019-02-15 15:34:40 +01:00
David Turner	578514e892	Recover peers from translog, ignoring soft deletes (#38904 ) Today if soft deletes are enabled then we read the operations needed for peer recovery from Lucene. However we do not currently make any attempt to retain history in Lucene specifically for peer recoveries so we may discard it and fall back to a more expensive file-based recovery. Yet we still retain sufficient history in the translog to perform an operations-based peer recovery. In the long run we would like to fix this by retaining more history in Lucene, possibly using shard history retention leases (#37165). For now, however, this commit reverts to performing peer recoveries using the history retained in the translog regardless of whether soft deletes are enabled or not.	2019-02-15 10:45:15 +01:00
Henning Andersen	a211e51343	ShardBulkAction ignore primary response on primary (#38901 ) Previously, if a version conflict occurred and a previous primary response was present, the original primary response would be used both for sending to replica and back to client. This was made in the past as an attempt to fix issues with conflicts after relocations where a bulk request would experience a closed shard half way through and thus have to retry on the new primary. It could then fail on its own update. With sequence numbers, this leads to an issue, since if a primary is demoted (network partitions), it will send along the original response in the request. In case of a conflict on the new primary, the old response is sent to the replica. That data could be stale, leading to inconsistency between primary and replica. Relocations now do an explicit hand-off from old to new primary and ensures that no operations are active while doing this. Above is thus no longer necessary. This change removes the special handling of conflicts and ignores primary responses when executing shard bulk requests on the primary.	2019-02-15 10:13:11 +01:00
Jason Tedor	00cb8d0be8	Mark coordinator test as awaits fix This test is failing frequently so this commit mutes it. Relates #38867	2019-02-14 12:43:31 -05:00
Lee Hinman	0c733c04be	Remove immediate operation retry after mapping update (#38873 ) Prior to this commit, when an indexing operation resulted in an `Engine.Result.Type.MAPPING_UPDATE_REQUIRED`, TransportShardBulkAction immediately retries the indexing operation to see if it succeeds. In the event that it succeeds the context does not wait until the mapping update has propagated through the cluster state before finishing the indexing. In some of our tests we rely on mappings being available as soon as they've been introduced in a document that indexed correctly. By removing the immediate retry we always wait for this to be the case. Resolves #38428 Supercedes #38579 Relates to #38711	2019-02-14 09:31:08 -07:00
Christoph Büscher	6c5cec4ff4	Enable silent FollowersCheckerTest (#38851 ) One of the test methods wasn't run because it was private. Making this method public and fixing some issues around mocking the threadpool that otherwise would lead to an NPE.	2019-02-14 16:16:48 +01:00
Albert Zaharovits	6243a9797f	_cat/indices with Security, hide names when wildcard (#38824 ) This changes the output of the `_cat/indices` API with `Security` enabled. It is possible to only display the index name (and possibly the index health, depending on the request options) but not its stats (doc count, merges, size, etc). This is the case for closed indices which have index metadata in the cluster state but no associated shards, hence no shard stats. However, when `Security` is enabled, and the request contains wildcards, open indices without stats are a common occurrence. This is because the index names in the response table are picked up directly from the cluster state which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the stats data which is populated by the indices stats API which does go through the index name resolver. This is a bug, because it is circumventing `Security`'s function to hide unauthorized indices. This has been fixed by displaying the index names as they are resolved by the indices stats API. The outputs of these two APIs is now very similar: same index names, similar data but different format. Closes #37190	2019-02-14 15:09:17 +02:00
David Roberts	6ea483a663	Mute DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex Due to https://github.com/elastic/elasticsearch/issues/38845	2019-02-14 11:46:22 +00:00
Luca Cavanna	7456117019	[TEST] address testCollectNodes rare failure (#38559 ) #37767 changed the expected exception for "no such cluster" error from `IllegalStateException` to a dedicated `NoSuchRemoteClusterException`. An assertion in `testCollectNodes` needs to be updated accordingly.	2019-02-14 10:57:14 +01:00
Nhat Nguyen	5d22e45990	Copy retention leases when trim unsafe commits (#37995 ) When a primary shard is recovered from its store, we trim the last commit (when it's unsafe). If that primary crashes before the recovery completes, we will lose the committed retention leases because they are baked in the last commit. With this change, we copy the retention leases from the last commit to the safe commit when trimming unsafe commits. Relates #37165	2019-02-13 17:27:48 -05:00
Jason Tedor	062eea8fcc	Fix excessive increments in soft delete policy (#38813 ) In this case, we were incrementing the policy too much. This means on every iteration we actually keep increasing the minimum retained sequence number, even with leases in place. It was a bug from when the soft deletes policy had retention leases incorporated into it. This commit fixes this bug by ensuring we only increment in the proper places, and adds careful tests for the various situations.	2019-02-13 14:04:45 -05:00
Jake Landis	46bb663a09	Make 7.x like 6.7 user agent ecs, but default to true (#38828 ) Forward port of https://github.com/elastic/elasticsearch/pull/38757 This change reverts the initial 7.0 commits and replaces them with the 6.7 variant that still allows for the ecs flag. This commit differs from the 6.7 variants in that ecs flag will now default to true. 6.7: `ecs` : default `false` 7.x: `ecs` : default `true` 8.0: no option, but behaves as `true` * Revert "Ingest node - user agent, move device to an object (#38115)" This reverts commit `5b008a34aa`. * Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)" This reverts commit `cac6b8e06f`. * cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57 Add ECS schema for user-agent ingest processor (#37727) * cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099 Ingest node - user agent, move device to an object (#38115) (#38121) * cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes) Dep. check for ECS changes to User Agent processor (#38362) * make true the default for the ecs option, and update 7.0 references and tests	2019-02-13 10:28:01 -06:00
Przemyslaw Gomulka	7404882105	Fix line separators in JSON logging tests backport#38771 #38834 The hardcoded '\n' in string will not work in Windows where there is a different line separator. A System.lineSeparator should be used to make it work on all platforms closes #38705 backport #38771	2019-02-13 13:34:33 +01:00
Zachary Tong	57f69082fd	Disable cache on QueryProfilerIT (#38748 ) - Disables the request cache on the test, to prevent cached values from potentially interfering with test results - Changes the test to execute a single query, in hopes of making failures more reproducible Backport of #38583	2019-02-12 13:11:52 -05:00
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I adjust log and unmute this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Nhat Nguyen	4a5070dcfb	Use current term in initial leases in engine test (#38285 ) We need to use the current primary term instead of 1L for the initial retention leases; otherwise, the primary term of the committed retention leases won't match the current primary term if the retention leases never gets updated.	2019-02-12 11:40:04 -05:00
Nhat Nguyen	eca5404572	Fix synchronization in LocalCheckpointTracker#contains (#38755 ) We are accessing the `CountedBitSet` in `LocalCheckpointTracker#contains` without proper synchronization. Relates #33871	2019-02-12 11:39:50 -05:00
Nhat Nguyen	225ebb6935	Ensure no snapshotted commit when close engine (#38663 ) With this change, we can automatically detect an implementation that acquires an index commit but fails to release.	2019-02-12 11:39:35 -05:00
Tanguy Leroux	51d6b9ab31	Fix CloseWhileRelocatingShardsIT (#38728 )	2019-02-12 14:04:44 +01:00
Jason Tedor	bbc9aa9979	Introduce retention lease actions (#38756 ) This commit introduces actions for some common retention lease operations that clients need to be able to perform remotely. These actions include add/renew/remove.	2019-02-12 07:38:03 -05:00
Przemyslaw Gomulka	7e178aa4a7	Enable IndexActionTests and WatcherIndexingListenerTests Backport #38738 fix tests to use clock in milliseconds precision in watcher code make sure the date comparison in string format is using same formatters some of the code was modified in #38514 possibly because of merge conflicts closes #38581 Backport #38738	2019-02-12 13:05:44 +01:00
Luca Cavanna	90fff54954	Tie break on cluster alias when merging shard search failures (#38715 ) A recent test failure triggered an edge case scenario where failures may be coming back with the same shard id, yet from different clusters. This commit adapts the failures comparator to take the cluster alias into account when merging failures as part of CCS requests execution. Also the corresponding test has been split in two: with and without search shard target set to the failure. Closes #38672	2019-02-12 11:25:44 +01:00
Jason Tedor	c7cdd6a46a	Add dedicated retention lease exceptions (#38754 ) When a retention lease already exists on an add retention lease invocation, or a retention lease is not found on a renew retention lease invocation today we throw an illegal argument exception. This puts a burden on the caller to catch that specific exception and parse the message. This commit relieves the burden from the caller by adding dedicated exception types for these situations.	2019-02-12 00:32:09 -05:00
Jason Tedor	b97c74bbab	Enable removal of retention leases (#38751 ) This commit introduces the ability to remove retention leases. Explicit removal will be needed to manage retention leases used to increase the likelihood of operation-based recoveries syncing, and for consumers such as ILM.	2019-02-11 21:19:11 -05:00
Nick Knize	e2f432a413	Fix the version check for LegacyGeoShapeFieldMapper (#38547 ) Change version check from 7.0 to 6.6 in BaseGeoShapeFieldMapper to correctly use LegacyGeoShapeFieldMapper for indexes created prior to 6.6.	2019-02-11 16:27:47 -06:00
Nick Knize	078da6d9bd	Fix GeoHash PrefixTree BWC (#38584 ) geo_shape indexes created before 6.6 use geohash string encoding as default tree parameter and quadtree encoding for 6.6 and later. This commit fixes bwc to use geohash encoding in LegacyGeoshapeFieldMapper for indexes created before 6.6.	2019-02-11 11:59:51 -06:00
David Roberts	d1848b96fc	Fix possible assertion failure in IndicesQueryCache.close (#38731 ) The assertion that the stats2 map is empty in IndicesQueryCache.close has been observed to fail very occasionally in internal cluster tests. The likely cause is a cross-thread visibility problem for a count variable. This change makes that count volatile. Relates #37117 Backport of #38714	2019-02-11 17:33:20 +00:00
Tanguy Leroux	dc212de822	Specialize pre-closing checks for engine implementations (#38702 ) (#38722 ) The Close Index API has been refactored in 6.7.0 and it now performs pre-closing sanity checks on shards before an index is closed: the maximum sequence number must be equals to the global checkpoint. While this is a strong requirement for regular shards, we identified the need to relax this check in the case of CCR following shards. The following shards are not in charge of managing the max sequence number or global checkpoint, which are pulled from a leader shard. They also fetch and process batches of operations from the leader in an unordered way, potentially leaving gaps in the history of ops. If the following shard lags a lot it's possible that the global checkpoint and max seq number never get in sync, preventing the following shard to be closed and a new PUT Follow action to be issued on this shard (which is our recommended way to resume/restart a CCR following). This commit allows each Engine implementation to define the specific verification it must perform before closing the index. In order to allow following/frozen/closed shards to be closed whatever the max seq number or global checkpoint are, the FollowingEngine and ReadOnlyEngine do not perform any check before the index is closed. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2019-02-11 17:34:17 +01:00
Luca Cavanna	6443b46184	Clean up ShardSearchLocalRequest (#38574 ) Added a constructor accepting `StreamInput` as argument, which allowed to make most of the instance members final as well as remove the default constructor. Removed a test only constructor in favour of invoking the existing constructor that takes a `SearchRequest` as first argument. Also removed profile members and related methods as they were all unused.	2019-02-11 15:55:46 +01:00
Alexander Reelsen	884b5063a4	Create ISO8601 joda compatible java time formatter (#38434 ) The existing formatter being used was not on par with the joda formatter as it was missing the ability to parse a comma as a separator between seconds and milliseconds. While a real iso8601 would be much more complex, this might be sufficient for some more use-cases. The ingest date formatter now also uses the iso8601 formatter by default. Closes #38345	2019-02-11 15:11:26 +01:00
Alexander Reelsen	e7868e92bd	Restore date aggregation performance in UTC case (#38221 ) (#38700 ) The benchmarks showed a sharp decrease in aggregation performance for the UTC case. This commit uses the same calculation as joda time, which requires no conversion into any java time object, also, the check for an fixedoffset has been put into the ctor to reduce the need for runtime calculations. The same goes for the amount of the used unit in milliseconds. Closes #37826	2019-02-11 16:30:48 +03:00
Luca Cavanna	fe8bd757b2	Look up connection using the right cluster alias when releasing contexts (#38570 ) Whenever phase failure is raised in AbstractSearchAsyncAction, we go and release search contexts of shards that successfully returned their results, prior to notifying the listener of the failure. In case we are executing a CCS request, it's important to look-up the connection to send the release context request to. This commit makes sure that the lookup takes the cluster alias into account. We used to use `null` at all times instead which is not correct and was not caught as any exception is caught without re-throwing it.	2019-02-11 13:40:42 +01:00
Przemyslaw Gomulka	ba9a4d13e1	mute Failing tests related to logging and joda-java migration backport(#38704 )(#38710 ) the tests awaits fix from #38693 and #38705 and #38581	2019-02-11 13:15:12 +01:00
Przemyslaw Gomulka	ab9e2f2e69	Move testToUtc test to DateFormattersTests #38698 Backport #38610 The test was relying on toString in ZonedDateTime which is different to what is formatted by strict_date_time when milliseconds are 0 The method is just delegating to dateFormatter, so that scenario should be covered there. closes #38359 Backport #38610	2019-02-11 11:34:25 +01:00
Like	b8be6cb5c7	Reject index.optimize_auto_generated_id setting (#28895 ) This commit rejects the index.optmize_auto_generated_id setting for indices created on or after 7.0.0. This setting was deprecated in 6.7.0.	2019-02-10 13:46:09 -05:00
Tim Brooks	023e3c207a	Concurrent file chunk fetching for CCR restore (#38656 ) Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR. The implementation uses the parallel file writer functionality that is also used by peer recoveries.	2019-02-09 21:19:57 -07:00
Christoph Büscher	e3c7b93917	Mute failure in InternalEngineTests (#38622 )	2019-02-08 16:29:54 +01:00
Dimitris Athanasiou	fe8182ece2	Mute RetentionLeastIT.testRetentionLeasesSyncOnRecovery on 7x (#38597 )	2019-02-08 11:32:28 +02:00
Jason Tedor	fdf6b3f23f	Add 7.1 version constant to 7.x branch (#38513 ) This commit adds the 7.1 version constant to the 7.x branch. Co-authored-by: Andy Bristol <andy.bristol@elastic.co> Co-authored-by: Tim Brooks <tim@uncontended.net> Co-authored-by: Christoph Büscher <cbuescher@posteo.de> Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com> Co-authored-by: markharwood <markharwood@gmail.com> Co-authored-by: Ioannis Kakavas <ioannis@elastic.co> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: David Roberts <dave.roberts@elastic.co> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Alpar Torok <torokalpar@gmail.com> Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Tim Vernum <tim@adjective.org> Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>	2019-02-07 16:32:27 -05:00
Jason Tedor	f8ed6c15c4	Enable BWC after backport recovering leases (#38485 ) This commit enables the BWC tests after backporting recovery of retention leases during peer recovery.	2019-02-06 08:03:19 -05:00
Jason Tedor	4b42281a4e	Collapse retention lease integration tests (#38483 ) This commit collapses the retention lease integration tests into a single suite.	2019-02-06 07:55:41 -05:00
Tanguy Leroux	510829f9f7	TransportVerifyShardBeforeCloseAction should force a flush (#38401 ) This commit changes the `TransportVerifyShardBeforeCloseAction` so that it always forces the flush of the shard. It seems that #37961 is not sufficient to ensure that the translog and the Lucene commit share the exact same max seq no and global checkpoint information in case of one or more noop operations have been made. The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest` and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` test this trivial situation and they both fail 1 on 10 executions. Relates to #33888	2019-02-06 13:22:54 +01:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492 ) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
Ioannis Kakavas	e1d464b22c	Mute testRetentionLeasesSyncOnRecovery (#38488 ) Relates: #38487	2019-02-06 08:52:54 +02:00
Armin Braun	34f2cc78f6	Fix Master Failover and DataNode Leave Blocking Snapshot (#38460 ) * Closes #38447	2019-02-05 23:56:59 +01:00
Jason Tedor	79a45b47da	Recover retention leases during peer recovery (#38435 ) This commit integrates retention leases with recovery. With this change, we copy the current retention leases on primary to the replica during phase two of recovery. At this point, the replica will have been added to the replication group and so is already receiving retention lease sync requests from the primary. This means that if any retention lease syncs are triggered on the primary after we sample the retention leases here during phase two, that sync request will also arrive on the replica ensuring that the replica is from this point on up to date with the retention leases on the primary. We have to copy these during phase two since we will be applying indexing operations, potentially triggering merges, and therefore must ensure the correct retention leases are in place beforehand.	2019-02-05 17:43:41 -05:00
Henning Andersen	20c66c5a05	Bubble-up exceptions from scheduler (#38317 ) Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error. This is a continuation of #38014 Fixed NPE that caused CCR tests (IndexFollowingIT and likely others) to fail. schedule could bubble rejected exception to uncaught exception handler when not using SAME executor if thread pool is terminated. Now ignore rejected exception silently if executor is shutdown.	2019-02-05 21:48:24 +01:00
Boaz Leskes	033ba725af	Remove support for internal versioning for concurrency control (#38254 ) Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page: > When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach. Relates to #1078	2019-02-05 20:53:35 +01:00
Jason Tedor	b03d138122	Lift retention lease expiration to index shard (#38380 ) This commit lifts the control of when retention leases are expired to index shard. In this case, we move expiration to an explicit action rather than a side-effect of calling ReplicationTracker#getRetentionLeases. This explicit action is invoked on a timer. If any retention leases expire, then we hard sync the retention leases to the replicas. Otherwise, we proceed with a background sync.	2019-02-05 14:42:17 -05:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segements. This does not work for Ccr as this will lead to documents that would be recovered in the normal followering operation from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
Tal Levy	aef5775561	re-enables awaitsfixed datemath tests (#38376 ) Previously, date formats of `YYYY.MM.dd` would hit an issue where the year would jump towards the end of the calendar year. This was an issue that had since been resolved in tests by using `yyyy` to be the more accurate representation of the year. Closes #37037.	2019-02-05 11:20:40 -08:00
Julie Tibshirani	3ce7d2c9b6	Make sure to reject mappings with type _doc when include_type_name is false. (#38270 ) `CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when deserializing index creation requests, accidentally accepts mappings that are nested twice under the type key (as described in the bug report #38266). This in turn causes us to be too lenient in parsing typeless mappings. In particular, we accept the following index creation request, even though it should not contain the type key `_doc`: ``` PUT index?include_type_name=false { "mappings": { "_doc": { "properties": { ... } } } } ``` There is a similar issue for both 'put templates' and 'put mappings' requests as well. This PR makes the minimal changes to detect and reject these typed mappings in requests. It does not address #38266 generally, or attempt a larger refactor around types in these server-side requests, as I think this should be done at a later time.	2019-02-05 10:52:32 -08:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Turner	b7ab521eb1	Throw AssertionError when no master (#38432 ) Today we throw a fatal `RuntimeException` if an exception occurs in `getMasterName()`, and this includes the case where there is currently no master. However, sometimes we call this method inside an `assertBusy()` in order to allow for a cluster that is in the process of stabilising and electing a master. The trouble is that `assertBusy()` only retries on an `AssertionError` and not on a general `RuntimeException`, so the lack of a master is immediately fatal. This commit fixes the issue by asserting there is a master, triggering a retry if there is not. Fixes #38331	2019-02-05 17:11:20 +00:00
Armin Braun	2f6afd290e	Fix Concurrent Snapshot Ending And Stabilize Snapshot Finalization (#38368 ) * The problem in #38226 is that in some corner cases multiple calls to `endSnapshot` were made concurrently, leading to non-deterministic behavior (`beginSnapshot` was triggering a repository finalization while one that was triggered by a `deleteSnapshot` was already in progress) * Fixed by: * Making all `endSnapshot` calls originate from the cluster state being in a "completed" state (apart from on short-circuit on initializing an empty snapshot). This forced putting the failure string into `SnapshotsInProgress.Entry`. * Adding deduplication logic to `endSnapshot` * Also: * Streamlined the init behavior to work the same way (keep state on the `SnapshotsService` to decide which snapshot entries are stale) * closes #38226	2019-02-05 16:44:18 +01:00
Lee Hinman	d862453d68	Support unknown fields in ingest pipeline map configuration (#38352 ) We already support unknown objects in the list of pipelines, this changes the `PipelineConfiguration` to support fields other than just `id` and `config`. Relates to #36938	2019-02-05 07:52:17 -07:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350 ) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on OAuth2 spec. The access token is a short-lived token (defaults to 20m) and refresh token with a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys:- - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs:- 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using `Authorization` header, where the auth scheme is `ApiKey` and the credentials, is the base64 encoding of API key Id and API key separated by a colon. Example:- ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
Christoph Büscher	d255303584	Add typless client side GetIndexRequest calls and response class (#37778 ) The HLRC client currently uses `org.elasticsearch.action.admin.indices.get.GetIndexRequest` and `org.elasticsearch.action.admin.indices.get.GetIndexResponse` in its get index calls. Both request and response are designed for the typed APIs, including some return types e.g. for `getMappings()` which in the maps it returns still use a level including the type name. In order to change this without breaking existing users of the HLRC API, this PR introduces two new request and response objects in the `org.elasticsearch.client.indices` client package. These are used by the IndicesClient#get and IndicesClient#exists calls now by default and support the type-less API. The old request and response objects are still kept for use in similarly named, but deprecated methods. The newly introduced client side classes are simplified versions of the server side request/response classes since they don't need to support wire serialization, and only the response needs fromXContent parsing (but no xContent-serialization, since this is the responsibility of the server-side class). Also changing the return type of `GetIndexResponse#getMapping` to `Map<String, MappingMetaData> getMappings()`, while it previously was returning another map keyed by the type-name. Similar getters return simple Maps instead of the ImmutableOpenMaps that the server side response objects return.	2019-02-05 03:41:05 +01:00
Gordon Brown	292e0f6fb7	Deprecate `_type` in simulate pipeline requests (#37949 ) As mapping types are being removed throughout Elasticsearch, the use of `_type` in pipeline simulation requests is deprecated. Additionally, the default `_type` used if one is not supplied has been changed to `_doc` for consistency with the rest of Elasticsearch.	2019-02-04 16:11:44 -07:00
Christoph Büscher	0ced775389	Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica (#38357 )	2019-02-04 22:30:34 +01:00
Mayya Sharipova	641704464d	Deprecate types in rollover index API (#38039 ) Relates to #35190	2019-02-04 16:07:45 -05:00
Zachary Tong	ab1150378b	Add Composite to AggregationBuilders (#38207 )	2019-02-04 13:47:04 -05:00
David Turner	2c1eab2b8a	Clarify slow cluster-state log messages (#38302 ) The message `... took [31s] above the warn threshold of 30s` suggests incorrectly that the task took 61 seconds. This commit adds the clarifying words `which is`.	2019-02-04 17:44:00 +00:00
Andrey Ershov	7bc8bc9605	ensureGreen (#38324 )	2019-02-04 16:36:04 +01:00
Jason Tedor	625d37a26a	Introduce retention lease background sync (#38262 ) This commit introduces a background sync for retention leases. The idea here is that we do a heavyweight sync when adding a new retention lease, and then periodically we want to background sync any retention lease renewals to the replicas. As long as the background sync interval is significantly lower than the extended lifetime of a retention lease, it is okay if from time to time a replica misses a sync (it will still have an older version of the lease that is retaining more data as we assume that renewals do not decrease the retaining sequence number). There are two follow-ups that will come after this commit. The first is to address the fact that we have not adapted the should periodically flush logic to possibly flush the retention leases. We want to do something like flush if we have not flushed in the last five minutes and there are renewed retention leases since the last time that we flushed. An additional follow-up will remove the syncing of retention leases when a retention lease expires. Today this sync could be invoked in the background by a merge operation. Rather, we will move the syncing of retention lease expiration to be done under the background sync. The background sync will use the heavyweight sync (write action) if a lease has expired, and will use the lightweight background sync (replication action) otherwise.	2019-02-04 10:35:29 -05:00
Christoph Büscher	5ee7232379	Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334 )	2019-02-04 16:10:06 +01:00
Christoph Büscher	715e581378	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330 )	2019-02-04 15:46:19 +01:00
Boaz Leskes	e49b593c81	Move TokenService to seqno powered cas (#38311 ) Relates #37872 Relates #10708	2019-02-04 15:25:41 +01:00
Yannick Welsch	ece8c659c5	Decrease leader and follower check timeout (#38298 ) Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still being a very long time for a node to be completely unresponsive.	2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka	9b64558efb	Migrating from joda to java.time. Watcher plugin (#35809 ) part of the migrating joda time work. Migrating watcher plugin to use JDK's java-time refers #27330	2019-02-04 15:08:31 +01:00
Alexander Reelsen	87f3579125	Add nanosecond field mapper (#37755 ) This adds a dedicated field mapper that supports nanosecond resolution - at the price of a reduced date range. When using the date field mapper, the time is stored as milliseconds since the epoch in a long in lucene. This field mapper stores the time in nanoseconds since the epoch - which means its range is much smaller, ranging roughly from 1970 to 2262. Note that aggregations will still be in milliseconds. However docvalue fields will have full nanosecond resolution Relates #27330	2019-02-04 11:31:16 +01:00
Christoph Büscher	15510da2af	Mute SharedClusterSnapshotRestoreIT#testAbortedSnapshotDuringInitDoesNotStart (#38304 )	2019-02-04 10:41:35 +01:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
Armin Braun	4561f425db	Remove Redundandant Loop in SnapshotShardsService (#38283 ) * This was a merge mistake on my end I think, obviously we only need to loop over the shards once not twice here to find those that we missed in INIT state	2019-02-04 09:06:39 +01:00
Alpar Torok	d58e899d45	Remove empty service files (#38192 )	2019-02-04 10:05:04 +02:00
Jason Tedor	d2cc1459a3	Fix ordering problem in add or renew lease test (#38280 ) We have to set the primary term before we add a retention lease, otherwise we can not assert the correct primary term.	2019-02-03 12:54:31 -05:00
Christoph Büscher	6ca7a913ea	Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (#38275 )	2019-02-03 12:54:13 +01:00
Armin Braun	89d7c57bd9	Fix Incorrect Transport Response Handler Type (#38264 ) * Fix Incorrect Transport Response Handler Type * The response type here is not empty and was always wrong but this only became visible now that `0a604e3b24` was introduced * As a result of `0a604e3b24` we started actually handling the response of this request and logging/handling exceptions before that we simply dropped the classcast exception here quietly using the empty response handler * fix busy assert not handling `Exception` * Closes #38226 * Closes #38256	2019-02-03 08:48:15 +01:00
Nhat Nguyen	0861dc3581	Mute testCanRunUnsafeBootstrapAfterErroneousDetachWithoutLoosingMetaData (#38268 ) Tracked at #38267	2019-02-02 20:02:21 -05:00
Christoph Büscher	50cdc61874	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38257 )	2019-02-02 13:46:29 +01:00
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Nhat Nguyen	80d3092292	Fix primary term in testAddOrRenewRetentionLease (#38239 ) We should increase primary term before renewing leases; otherwise, the term of the latest RetentionLeases will be lower than the current term. Relates #37951	2019-02-02 02:38:53 -05:00
Nhat Nguyen	1ec04dff43	FIx testReplicaIgnoresOlderRetentionLeasesVersion (#38246 ) If the innerLength is 0, the version won't be increased; then there will be two RetentionLeases with the same term and version, but their leases are different. Relates #37951 Closes #38245	2019-02-02 02:37:37 -05:00
Nhat Nguyen	8bee5b8e06	Mute testAddOrRenewRetentionLease (#38240 ) Relates #38239	2019-02-01 21:27:10 -05:00
Boaz Leskes	f6e06a2b19	Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231 ) After backporting #37977, #37857 and #37872	2019-02-01 20:37:16 -05:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Nhat Nguyen	9c39dea7ae	AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227 ) Tracked at #38226	2019-02-01 16:24:02 -05:00
Armin Braun	03a1d21070	SnapshotShardsService Simplifications (#38025 ) * Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot * Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)	2019-02-01 20:46:14 +01:00
Luca Cavanna	ee57420de6	Adjust SearchRequest version checks (#38181 ) The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.	2019-02-01 19:23:13 +01:00
Andrey Ershov	04dc41b99e	Zen2ify RareClusterStateIT (#38184 ) In Zen 1 there are commit timeout and publish timeout and these settings could be changed on-the-fly. In Zen 2, there is only commit timeout and this setting is static. RareClusterStateIT is actively using these settings and the fact, they are dynamic. This commit adds cancelCommitedPublication method to Coordinator to be used by tests. This method will cancel current committed publication if there is any. When there is BlockClusterStateProcessing on the non-master node, the publication will be accepted and committed, but not yet applied. So we can use the method above to cancel it. Also, this commit replaces callback + AtomicReference with ActionFuture, which makes test code easier to read.	2019-02-01 18:18:11 +01:00
Yannick Welsch	025bf28405	Fix _host based require filters (#38173 ) Using index.routing.allocation.require._host does not correctly work because the boolean logic in filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when opType ==OpType.AND	2019-02-01 16:02:37 +01:00
Tanguy Leroux	da6269b456	RestoreService should update primary terms when restoring shards of existing indices (#38177 ) When restoring shards of existing indices, the RestoreService also restores the values of primary terms stored in the snapshot index metadata. The primary terms are not updated and could potentially conflict with current index primary terms if the restored primary terms are lower than the existing ones. This situation is likely to happen with replicated closed indices (because primary terms are increased when the index is transitioning from open to closed state, and the snapshotted primary terms are the one at the time the index was opened) (see #38024) and maybe also with CCR. This commit changes the RestoreService so that it updates the primary terms using the maximum value between the snapshotted values and the existing values. Related to #33888	2019-02-01 15:59:11 +01:00
Desmond Vehar	c1c4abae10	Throw if two inner_hits have the same name (#37645 ) This change throws an error if two inner_hits have the same name Closes #37584	2019-02-01 15:53:50 +01:00
Alexander Reelsen	35ed137684	Ensure joda compatibility in custom date formats (#38171 ) If custom date formats are used, there may be combinations that the new performat DateFormatters.from() method has not covered yet. This adds a few such corner cases and ensures the tests are correctly commented out.	2019-02-01 15:42:56 +01:00
Jim Ferenczi	66e4fb4fb6	Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (#38169 ) In #38158 we ensured that global ordinals are not loaded when another execution hint is explicitly set on the source. This change is a follow up that addresses a comment `dd6043c1c0 (r252984782)` added after the merge.	2019-02-01 15:32:19 +01:00
Nhat Nguyen	2e475d63f7	Do not set timeout for IndexRequests in GatewayIndexStateIT (#38147 ) CI might not be fast enough to publish a dynamic mapping update within 100ms.	2019-02-01 09:30:03 -05:00
Andrey Ershov	c1270e97b0	Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (#38178 ) In Zen2 cluster bootstrap is required and some parameters are called differently in Zen2.	2019-02-01 15:24:08 +01:00
Andrey Ershov	bda591453c	Add elasticsearch-node detach-cluster command (#37979 ) This commit adds the second part of `elasticsearch-node` tool - `detach-cluster` command in addition to `unsafe-bootstrap` command. Also, this commit changes the semantics of `unsafe-bootstrap`, now `unsafe-bootstrap` changes clusterUUID. So the algorithm of running `elasticsearch-node` tool is the following: 1) Stop all nodes in the cluster. 2) Pick master-eligible node with the highest (term, version) pair and run the `unsafe-bootstrap` command on it. If there are no survived master-eligible nodes - skip this step. 3) Run `detach-cluster` command on the remaining survived nodes. Detach cluster makes the following changes to the node metadata: 1) Sets clusterUUID committed to false. 2) Sets currentTerm and term to 0. 3) Removes voting tombstones and sets voting configurations to special constant MUST_JOIN_ELECTED_MASTER, that prevents initial cluster bootstrap. `ElasticsearchNodeCommand` base abstract class is introduced, because `UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in common. Also, this commit adds "ordinal" parameter to both commands, because it's impossible to write IT otherwise. For MUST_JOIN_ELECTED_MASTER case special handling is introduced in `ClusterFormationFailureHelper`. Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed from `UnsafeBootstrapMasterIT`).	2019-02-01 14:53:55 +01:00
Alexander Reelsen	979e5576e5	Add tests for fractional epoch parsing (#38162 ) Fractional epoch parsing is supported, the tests we used were edge cases that did not make sense. This adds tests to properly check for this.	2019-02-01 14:48:37 +01:00
Tanguy Leroux	029e4b6278	Clear send behavior rule in CloseWhileRelocatingShardsIT (#38159 ) The current CloseWhileRelocatingShardsIT test adds some "send behavior" rule to a target node's mocked transport service in order to detect when shard relocating are started. These rules are never cleared and prevent the test to complete normally after the rebalance is re-enabled again. This commit changes the test so that rules are cleared and most verifications are done before the rebalance is reenabled again. Closes #38090	2019-02-01 12:58:46 +01:00
Yannick Welsch	ce469cfda5	Fix testCorruptedIndex (#38161 ) Folks at the Lucene project do not seem to be interested in classifying corruptions and distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525), so we'll just cop out as well. Closes #34322	2019-02-01 12:51:38 +01:00
Luca Cavanna	e18cac3659	Add finalReduce flag to SearchRequest (#38104 ) With #37000 we made sure that fnial reduction is automatically disabled whenever a localClusterAlias is provided with a SearchRequest. While working on #37838, we found a scenario where we do need to set a localClusterAlias yet we would like to perform a final reduction in the remote cluster: when searching on a single remote cluster. Relates to #32125 This commit adds support for a separate finalReduce flag to SearchRequest and makes use of it in TransportSearchAction in case we are searching against a single remote cluster. This also makes sure that num_reduce_phases is correct when searching against a single remote cluster: it makes little sense to return `num_reduce_phases` set to `2`, which looks especially weird in case the search was performed against a single remote shard. We should perform one reduction phase only in this case and `num_reduce_phases` should reflect that. * line length	2019-02-01 12:11:42 +01:00
Jim Ferenczi	6fa93ca493	Forbid negative field boosts in analyzed queries (#37930 ) This change forbids negative field boost in the `query_string`, `simple_query_string` and `multi_match` queries. Negative boosts are not allowed in Lucene 8 (scores must be positive). The backport of this change to 6x will turn the error into a deprecation warning in order to raise the awareness of this breaking change in 7.0. Closes #33309	2019-02-01 11:41:40 +01:00
Jim Ferenczi	57b1d245e8	Remove AtomiFieldData#getLegacyFieldValues (#38087 ) This function is unused now that we format the docvalue fields with the default formatter on the field (#30831)	2019-02-01 11:41:17 +01:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038 ) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings, mostly this is 10-30 lines of codes that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex` which is suitable for all classes and copy-paste could be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Jim Ferenczi	b7308aa03c	Don't load global ordinals with the `map` execution_hint (#37833 ) The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality. Closes #37705	2019-02-01 09:35:46 +01:00
David Turner	23f00e3676	Relax fault detector in some disruption tests (#38101 ) Today we use `AbstractDisruptionTestCase` to test the behaviour of things like master elections in the presence of cluster disruptions. These tests have rather enthusiastic fault detection settings, detecting a fault if a single ping fails, with a one-second timeout. Furthermore there are some tests that assert the identity of the master remains unchanged during some disruption, and these assertions fail rather often thanks to the overly sensitive fault detector. However in a number of these tests the fault detector need not be this sensitive. This commit moves some such tests into their own test suite and uses more sensible fault-detection settings to avoid the kind of master instability that is causing CI failures. Closes #37699	2019-02-01 08:10:49 +00:00
Alexander Reelsen	c02cd3e2fd	Fix java time epoch date formatters (#37829 ) The self written epoch date formatters were not properly able to format an Instant to a string due to a misconfiguration. This fix also removes a until now existing runtime behaviour under java 8 regarding the names of the aggregation buckets, which are now the same as before and have been under java 11.	2019-02-01 09:03:48 +01:00
Yannick Welsch	859e2f5bc8	Adapt timeouts in UpdateMappingIntegrationIT Relates to #37263 and possibly #36916	2019-02-01 08:58:31 +01:00
Adrien Grand	d83c748417	Fix test bug in DynamicMappingsIT. (#37906 ) Closes #37898	2019-02-01 08:35:29 +01:00
Przemyslaw Gomulka	2758578570	Trim the JSON source in indexing slow logs (#38081 ) The '{' as a first character in log line is causing problems for beats when parsing plaintext logs. This can happen if the submitted document has an additional '\n' at the beginning and we are not reformatting. Trimming the source part of a SlogLog solves that and keeps the logs readable. closes #38080	2019-02-01 08:12:12 +01:00
Armin Braun	0a604e3b24	Fix Two Races that Lead to Stuck Snapshots (#37686 ) * Fixes two broken spots: 1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting 2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state * Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk * Significantly extended test infrastructure to reproduce the above two issues * Two new test runs: 1. Reproducing the effects of node disconnects/restarts in isolation 2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes * Relates #32265 * Closes #32348	2019-02-01 05:45:40 +01:00
Nhat Nguyen	b8b843476d	Disable dynamic mapping in testSimpleGetFieldMappingsWithDefaults (#38045 ) Since #31140 we no longer require acking on the dynamic mapping of index requests. Thus, a returned mapping from a get mapping request does not necessarily contain the dynamic updates from the index request. This commit replaces the dynamic mapping update with a manual put mapping. Relates #31140 Closes #37928	2019-01-31 21:01:41 -05:00
Nhat Nguyen	a8ebe2a217	Fix random params in testSoftDeletesRetentionLock (#38114 ) Since #37992 the retainingSequenceNumber is initialized with 0 while the global checkpoint can be -1. Relates #37992	2019-01-31 20:50:41 -05:00
Lee Hinman	c67a9663af	Fix MasterServiceTests.testClusterStateUpdateLogging (#38116 ) This changes the test to not use a `CountDownlatch`, instead adding an assertion for the final logging message and waiting until the `MockAppender` has seen it before proceeding. Related to df2c06f6f30f7e23a6863a3f72fc3bdb7648885c Resolves #23739	2019-01-31 17:13:19 -07:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Pascal Christoph	a3d9ba3f4b	Log document id when MapperParsingException occurs (#37800 ) Closes #37658	2019-01-31 16:33:13 -05:00
Nhat Nguyen	237fcda2cc	Disable dynamic mapping update in testTransportBulkTasks (#38073 ) If a replica does not have a right mapping yet, we will retry the index request on that replica; then the actual tasks is higher than the expected tasks. Since #31140 this happens more frequently for we no longer require acking on the dynamic mapping of index requests. Relates #31140 Closes #37893	2019-01-31 13:16:52 -05:00
Przemyslaw Gomulka	28b5c7ce78	Do not set up NodeAndClusterIdStateListener in test (#38110 ) When extending ESIntegTestCase are run on the same jvm, the static field in NodeAndClusterIdConverter will throw an AlreadySet exceptions. overriding the configuration method from Node.configureNodeAndClusterIdStateListener in the MockNode will prevent the listener registration from happening relates #32850	2019-01-31 18:59:40 +01:00
Nhat Nguyen	8e95780f98	Soft-deletes policy should always fetch latest leases (#37940 ) If a new retention lease is added while a primary's soft-deletes policy is locked for peer-recovery, that lease won't be baked into the Lucene commit. Relates #37165 Relates #37375	2019-01-31 12:02:57 -05:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that caller handles exception by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
David Turner	7f738e8541	Minor logging improvements (#38084 ) Fixes some log messages that caused some minor confusion when digging through a log generated by a failing test.	2019-01-31 16:41:04 +00:00
Tal Levy	9923f0fe6a	fix a few versionAdded values in ElasticsearchExceptions (#37877 ) TooManyBucketsException was introduced in v6.2 and SnapshotInProgressException was introduced in v6.7	2019-01-31 08:28:20 -08:00
Tanguy Leroux	7a597cad0d	Reenable BWC tests after backport of #37899 (#38093 ) This commit adapts the version used in StartedShardEntry serialization after the backport of #37899 and reenables bwc tests. Related to #37899 Related to #38074	2019-01-31 16:53:28 +01:00
Henning Andersen	7487be3d3c	Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock	2019-01-31 15:31:01 +01:00
Jason Tedor	a9b12b38f0	Push primary term to replication tracker (#38044 ) This commit pushes the primary term into the replication tracker. This is a precursor to using the primary term to resolving ordering problems for retention leases. Namely, it can be that out-of-order retention lease sync requests arrive on a replica. To resolve this, we need a tuple of (primary term, version). For this to be, the primary term needs to be accessible in the replication tracker. As the primary term is part of the replication group anyway, this change conceptually makes sense.	2019-01-31 09:19:49 -05:00
Luca Cavanna	622fb7883b	Introduce ability to minimize round-trips in CCS (#37828 ) With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour. Relates to #32125	2019-01-31 15:12:14 +01:00
Armin Braun	ae9f4df361	Don't Assert Ack on when Publish Timeout is 0 in Test (#38077 ) * Publish timeout is set to `0` so out of order processing of states on the node can lead to a `false` ack response * See #30672 * Closes #36813	2019-01-31 14:35:11 +01:00
Alexander Reelsen	9f026bb8ad	Reduce object creation in Rounding class (#38061 ) This reduces objects creations in the rounding class (used by aggs) by properly creating the objects only once. Furthermore a few unneeded ZonedDateTime objects were created in order to create other objects out of them. This was changed as well. Running the benchmarks shows a much faster performance for all of the java time based Rounding classes.	2019-01-31 14:18:28 +01:00
Adrien Grand	a536fa7755	Treat put-mapping calls with `_doc` as a top-level key as typed calls. (#38032 ) Currently the put-mapping API assumes that because the type name is `_doc` then it is dealing with a typeless put-mapping call. Yet we still allow running the put-mapping API in a typed fashion with `_doc` as a type name. The current logic triggers surprising errors when doing a typed put-mapping call with `_doc` as a type name on an index that has a type already. This is a bit of a corner-case, but is more important on 6.x due to the fact that using the index API with `_doc` as a type name triggers typed calls to the put-mapping API with `_doc` as a type name.	2019-01-31 13:57:42 +01:00
David Turner	eadcb5f0f8	Fix size of rolling-upgrade bootstrap config (#38031 ) Zen2 nodes will bootstrap themselves once they believe there to be no remaining Zen1 master-eligible nodes in the cluster, as long as minimum_master_nodes is satisfied. Today the bootstrap configuration comprises just the ids of the known master-eligible nodes, and this might be too small to be safe. For instance, if there are 5 master-eligible nodes (so that minimum_master_nodes is 3) then the bootstrap configuration could comprise just 3 nodes, of which 2 form a quorum, and this does not intersect other quorums that might arise, leading to a split-brain. This commit fixes this by expanding the bootstrap configuration so that its quorums satisfy minimum_master_nodes, by adding some of the IDs of the other master-eligible nodes in the last-published cluster state.	2019-01-31 08:00:11 +00:00
Alexander Reelsen	b94acb608b	Speed up converting of temporal accessor to zoned date time (#37915 ) The existing implementation was slow due to exceptions being thrown if an accessor did not have a time zone. This implementation queries for having a timezone, local time and local date and also checks for an instant preventing to throw an exception and thus speeding up the conversion. This removes the existing method and create a new one named DateFormatters.from(TemporalAccessor accessor) to resemble the naming of the java time ones. Before this change an epoch millis parser using the toZonedDateTime method took approximately 50x longer. Relates #37826	2019-01-31 08:55:40 +01:00
Alexander Reelsen	160d1bd4dd	Work around JDK8 timezone bug in tests (#37968 ) The timezone GMT0 cannot be properly parsed on java8. The randomZone() method now excludes GMT0, if java8 is used. Closes #37814	2019-01-31 08:52:35 +01:00

1 2 3 4 5 ...

2640 Commits