OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Roberts	366eef99a1	Mute SharedClusterSnapshotRestoreIT.testCloseOrDeleteIndexDuringSnapshot Due to https://github.com/elastic/elasticsearch/issues/39828	2019-03-08 11:42:13 +00:00
David Turner	5d68143b18	Reformat elasticsearch-node messages (#39811 ) Flows the warning messages emitted by the `elasticsearch-node` tool to a width of 72 characters and tweaks the wording slightly.	2019-03-08 10:01:29 +00:00
Jake Landis	797d6b8a66	Execute ingest node pipeline before creating the index (#39607 ) (#39796 ) Prior to this commit (and after 6.5.0), if an ingest node changes the _index in a pipeline, the original target index would be created. For daily indexes this could create an extra, empty index per day. This commit changes the TransportBulkAction to execute the ingest node pipeline before attempting to create the index. This ensures that the only index created is the original or one set by the ingest node pipeline. This was the execution order prior to 6.5.0 (#32786). The execution order was changed in 6.5 to better support default pipelines. Specifically the execution order was changed to be able to read the settings from the index meta data. This commit also includes a change in logic such that if the target index does not exist when ingest node pipeline runs, it will now pull the default pipeline (if one exists) from the settings of the best matched of the index template. Relates #32786 Relates #32758 Closes #36545	2019-03-07 13:31:41 -06:00
Jason Tedor	0250d554b6	Introduce forget follower API (#39718) This commit introduces the forget follower API. This API is needed in cases where unfollowing a following index fails to remove the shard history retention leases on the leader index. This can happen explicitly through user action, or implicitly through an index managed by ILM. When this occurs, history will be retained longer than necessary. While the retention lease will eventually expire, it can be expensive to allow history to persist for that long, and it can also prevent ILM from performing actions like shrink on the leader index. As such, we introduce an API to allow for manual removal of the shard history retention leases in this case.	2019-03-07 11:08:45 -05:00
Armin Braun	213cc6673c	Remove Dead Code in o.e.util package (#39717 ) (#39779 ) * None of this code is used so we should delete it, we can always bring it back if needed	2019-03-07 08:31:46 +01:00
Nhat Nguyen	b69affda6a	Use unwrapped cause to determine if node is closing (#39723 ) We need to unwrap and use the actual cause when determining if the node with primary shard is shutting down because TransportService will throw a TransportException wrapped in a SendRequestTransportException. Relates #39584	2019-03-06 15:30:55 -05:00
Nhat Nguyen	1fe7cb594f	Don’t ack if unable to remove failing replica (#39584 ) Today when a replicated write operation fails to execute on a replica, the primary will reach out to the master to fail that replica (and mark it stale). We then won't ack that request until the master removes the failing replica; otherwise, we will lose the acked operation if the failed replica is still in the in-sync set. However, if a node with the primary is shutting down, we might ack such request even though we are unable to send a shard-failure request to the master. This happens because we ignore NodeClosedException which is triggered when the ClusterService is being closed. Closes #39467	2019-03-06 15:30:55 -05:00
markharwood	1873de5240	Bug fix for AnnotatedTextHighlighter - port of 39525 (#39749 ) Bug fix for AnnotatedTextHighlighter - port of 39525 Relates to #39395	2019-03-06 19:02:04 +00:00
Yannick Welsch	d094107592	Fix SharedClusterSnapshotRestoreIT Relates to #39644	2019-03-06 17:51:23 +01:00
Yannick Welsch	fef11f7efc	Allow snapshotting replicated closed indices (#39644 ) This adds the capability to snapshot replicated closed indices. It also changes snapshot requests in v8.0.0 to automatically expand wildcards to closed indices and hence start snapshotting closed indices by default. For v7.1.0 and above, wildcards are by default only expanded to open indices, which can be changed by explicitly setting the expand_wildcards option either to all or closed. Note that indices are always restored as open indices, even if they have been snapshotted as closed replicated indices. Relates to #33888	2019-03-06 16:08:20 +01:00
Simon Willnauer	e620fb2e4a	Add option to force load term dict into memory (#39741) Lucene added an optimization to leave the term dictionary on disk for non-id like fields. This change happened very late in the release process, so it's better to have an escape hatch if certain use-cases are hurt by this optimization. This setting might be removed in the future if it turns out to be unnecessary.	2019-03-06 15:29:04 +01:00
Christoph Büscher	6c503824c8	Fix occasional SearchServiceTests failure (#39697) Currently SearchServiceTests.testCloseSearchContextOnRewriteException can fail if a refresh happens while we test for the SearchPhaseExecutionException that is thrown later in the test. The test takes the current Store#refCount and expects it to be the same after the exception is thrown. If a refresh happens in that interval, however, the refCount will be different, causing the test to fail. This can be provoked e.g. by running this section in a tight loop. Switching off refresh for this test solves the issue.	2019-03-06 14:18:03 +01:00
Andrey Ershov	52fd102e23	Avoid serialising state if it was already serialised (#39179 ) When preparing the state to send to other nodes, we're serializing it for each node, despite using putIfAbsent. This commit checks if the state was already serialized for this node version before performing the potentially expensive computation. The map is not used by multiple threads, so computeIfAbsent is not needed (and could not be used here easily, because IOException could be thrown). (cherry picked from commit c99be63b43f5250f3cd220130df73c5e9e097459)	2019-03-06 11:54:13 +01:00
David Turner	295e39a8c8	Drop node if asymmetrically partitioned from master (#39598 ) When a node is joining the cluster we ensure that it can send requests to the master _at that time_. If it joins the cluster and _then_ loses the ability to send requests to the master then it should be removed from the cluster. Today this is not the case: the master can still receive responses to its follower checks, and receives acknowledgements to cluster state publications, so has no reason to remove the node. This commit changes the handling of follower checks so that they fail if they come from a master that the other node was following but which it now believes to have failed.	2019-03-06 09:41:57 +00:00
David Turner	77dd711847	Tidy up GroupedActionListener (#39633) Today the `GroupedActionListener` accepts a `defaults` parameter but all callers pass an empty list. Also it is permitted to pass an empty group but this is trappy because the delegated listener is never called in that case. This commit removes the `defaults` parameter and forbids an empty group.	2019-03-06 09:25:10 +00:00
Armin Braun	aaecaf59a4	Optimize Bulk Message Parsing and Message Length Parsing (#39634 ) (#39730 ) * Optimize Bulk Message Parsing and Message Length Parsing * findNextMarker took almost 1ms per invocation during the PMC rally track * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search * It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes * Fixed by adding bulk `int` read to BytesReference	2019-03-06 08:13:15 +01:00
Jason Tedor	75a0d4f470	Rename retention lease setting (#39719 ) This commit renames the retention lease setting index.soft_deletes.retention.lease so that it is under the namespace index.soft_deletes.retention_lease. As such, we rename the setting to index.soft_deletes.retention_lease.period.	2019-03-05 22:04:45 -05:00
Jason Tedor	504c792861	Add Docker build type (#39378 ) This commit adds a new build type (together with deb/rpm/tar/zip) to represent the official Docker images. This build type will be displayed in APIs such as the main and nodes info APIs.	2019-03-05 22:03:15 -05:00
Luca Cavanna	9d0211485c	Tie-break completion suggestions with same score and surface form (#39564) In case multiple completion suggestion entries have the same score and surface form, the order in which such options will be returned is currently not deterministic. With this commit we introduce tie-breaking for such situations, based on shard id, index name, index uuid and doc id like we already do for ordinary search hits. With this change we also make shardIndex mandatory when sorting and comparing completion suggestion options (it was previously only needed later, when fetching hits). Also, we need to make sure shardIndex is properly set when merging completion suggestions coming from multiple clusters in `SearchResponseMerger`.	2019-03-05 18:03:54 +01:00
Jim Ferenczi	160dc29f0e	Handle total hits equal to track_total_hits (#37907 ) This change ensures that a total hits equal to the value set for track_total_hits is not considered as a lower bound.	2019-03-05 16:28:48 +01:00
Armin Braun	750ec8ba53	Minor Cleanups in QueryPhase (#39680 ) (#39694 ) * Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for #39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`	2019-03-05 15:04:16 +01:00
Christoph Büscher	5cdea6ef17	Fix Fuzziness#asDistance(String) (#39643 ) Currently Fuzziness#asDistance(String) doesn't work for custom AUTO values. If the fuzziness is AUTO, the method returns the correct edit distance to use, depending on the input string, but for custom AUTO values it currently always returns an edit distance of 1. Correcting this and adding unit and integration tests to catch these cases. Closes #39614	2019-03-05 14:31:07 +01:00
Simon Willnauer	19f6a35358	Move BWC Version to 7.1.0 after backport Relates to #39512	2019-03-05 14:11:59 +01:00
Simon Willnauer	d112c89041	Allow inclusion of unloaded segments in stats (#39512) Today we have no way to fetch actual segment stats for segments that are currently unloaded. This is relevant in the case of frozen indices. This change makes it possible to monitor how much memory a frozen index would use if it were unfrozen.	2019-03-05 14:02:20 +01:00
Armin Braun	e8d9744340	Use Threadpool Time in ClusterApplierService (#39679 ) (#39685 ) * Use threadpool's time in `ClusterApplierService` to allow for deterministic tests * This is a part of/requirement for #39504	2019-03-05 12:37:49 +01:00
Gordon Brown	380dc27d91	Mute testCloseWhileRelocatingShards (#39589 )	2019-03-05 13:34:43 +02:00
Alan Woodward	0b14782b23	Add stopword support to IntervalBuilder (#39637 ) The match interval builder analyses input text and converts it to an IntervalSource, and as such may generate token streams with stopwords. This commit deals with these by using the extend factory to cover the gaps produced by these stopwords so that phrase and ordered queries work correctly.	2019-03-05 10:50:45 +00:00
Christoph Büscher	2fe1fa8972	Shortcut counts on exists queries (#39570 ) (#39660 ) `TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed. Closes #37475	2019-03-04 19:53:43 +01:00
Prabhakar S	98925e9a09	Fixing the custom object serialization bug in diffable utils. (#39544) While serializing custom objects, the length of the list is computed after filtering out the unsupported objects, but while writing objects the filter is not applied, thus resulting in writing unsupported objects which the receiver will fail to deserialize. Adding the condition to filter out unsupported custom objects.	2019-03-04 18:41:14 +01:00
Nhat Nguyen	801f13f201	Assert recovery done in testDoNotWaitForPendingSeqNo (#39595 ) Since #39006 we should be able to complete a peer-recovery without waiting for pending indexing operations. Thus, the assertion in testDoNotWaitForPendingSeqNo should be updated from false to true. Closes #39510	2019-03-04 10:21:23 -05:00
Yannick Welsch	936dbb00e3	Isolate Zen1 (#39470 ) Cherry-picks a few commits from #39466 to align 7.x with master branch.	2019-03-04 15:51:17 +01:00
Luca Cavanna	9ddaabba88	Remove private SearchHits.Total class (#39556) This is now possible as Lucene's `TotalHits` implements `equals`/`hashCode`; all the other methods can be in-lined in `SearchHits` instead, with no need for a specific wrapper class.	2019-03-04 13:46:45 +01:00
Armin Braun	547af21a12	Introduce Mapping ActionListener (#39538 ) (#39636 ) * Introduce Safer Chaining of Listeners * The motivation here is to make reasoning about chains of `ActionListener` a little easier, by providing a safe method for nesting `ActionListener` that guarantees that a response is never dropped. Also, it dries up the code a little by removing the need to repeat `listener::onFailure` and `listener.onResponse` over and over. * Refactored a number of obvious/easy spots to use the new listener constructor	2019-03-04 12:56:46 +01:00
Daniel Mitterdorfer	fca6a2f006	Avoid deprecated API usage in TaskOperationFailure (#39303 ) (#39628 ) With this commit we remove usage of the deprecated method `ExceptionsHelper#detailedMessage` in the class `TaskOperationFailure`. Relates #19069	2019-03-04 11:37:59 +01:00
David Turner	dd68244841	Wait for state recovery in testFreshestMasterElectedAfterFullClusterRestart (#39602 ) Zen1IT#testFreshestMasterElectedAfterFullClusterRestart fails sometimes because we request the cluster state before state recovery has completed, and therefore obtain the default value for the setting we're relying on. Confusingly, we were starting out by setting this setting to its default value, so the test looked like it was failing because of a production bug. This commit avoids this confusion in future by setting it to a non-default value at the start of the test. Fixes #39586.	2019-03-04 10:26:07 +00:00
Adrien Grand	782f873165	Don't swallow exceptions in Store#close(). (#39035 ) (#39622 ) Store#close() swallows any `IOException`. Relates #39030	2019-03-04 10:58:43 +01:00
Adrien Grand	934946a232	Don't swallow exception in ThreadPool.terminate. (#39038 ) (#39623 ) The use of `closeWhileHandlingException` means that any exception while trying to close the threadpool is going to be swallowed. Relates #39030	2019-03-04 10:58:29 +01:00
Adrien Grand	21540a5ada	Enhancements to IndicesQueryCache. (#39099 ) (#39626 ) This commit adds the following: - more tests to IndicesServiceCloseTests, one of them found a bug in the order in which `IndicesQueryCache#onClose` and `IndicesService.indicesRefCount#decRef` are called. - made `IndicesQueryCache.stats2` a synchronized map. All writes to it are already protected by the lock of the Lucene cache, but the final read from an assertion in `IndicesQueryCache#close()` was not so this change should avoid any potential visibility issues. - human-readable `toString`s to make debugging easier. Relates #37117	2019-03-04 10:58:12 +01:00
Armin Braun	68bc178017	Disable Bwc Tests (#39551 ) * Disable Bwc Tests * For #39550	2019-03-04 10:41:52 +01:00
Yannick Welsch	0f65390c29	Do not mutate engine during planning step (#39571) This cleans up the Engine implementation by separating the sequence number generation from the planning step in the engine, so that the planning step has no side effects. This makes it easier to see that every sequence number is properly accounted for.	2019-03-04 10:11:39 +01:00
David Turner	9ec24bae80	Mute testDoNotWaitForPendingSeqNo Relates #39510, #39595.	2019-03-03 22:03:53 -05:00
Mayya Sharipova	d0e65a45a2	Add debug log for flush for IndicesRequestCacheIT (#39475) Add a debug log when an index is flushed to investigate a failure in IndicesRequestCacheIT. The "DEBUG" level is used because "TRACE" produces too much output that is irrelevant to this issue. Relates to #32827	2019-03-01 13:12:45 -05:00
Luca Cavanna	29e3c18713	Mute failing IndexShardIT#testPendingRefreshWithIntervalChange Relates to #39565	2019-03-01 14:55:19 +01:00
Tanguy Leroux	e005eeb0b3	Backport support for replicating closed indices to 7.x (#39506) (#39499) Backport support for replicating closed indices (#39499) Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on an 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. 
Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888	2019-03-01 14:48:26 +01:00
Yannick Welsch	1a50af7dd4	Do not close bad indices on startup (#39500) With #17187, we verified IndexService creation during initial state recovery on the master and if the recovery failed the index was imported as closed, not allocating any shards. This was mainly done to prevent endless allocation loops and full log files on data-nodes when the index metadata contained broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all nodes (not only the elected master), which can significantly slow down startup on data nodes. Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed will no longer prevent its shards from being allocated. Fortunately, the original issue for endless allocation loops is no longer a problem due to #18467, where we limit the retries of failed allocations. The solution here is therefore to just undo #17187, as it's no longer necessary, and covered by #18467, which will solve the issue for Zen2 and replicated closed indices as well.	2019-03-01 09:23:46 +01:00
Tal Levy	b9b46fdec6	fix UpdateSettingsRequestStreamableTests.mutateInstance (#39386) (#39477) Mutations of the timeout values were using string representations. This resulted in very rare cases where the original timeout value was represented as something like "0ms" and the new random time-value generated was "0s". Although their string representations differ, their underlying TimeValue does not. This caused `-Dtests.seed=7F4C034C43C22B1B` to fail.	2019-02-28 21:02:32 -08:00
Mark Tozzi	609118c229	Override and mute InternalAutoDateHistogramTests#testReduceRandom() (#39536 ) pending resolution of #39497	2019-02-28 16:00:32 -05:00
Lee Hinman	dae48ba262	Add details about what acquired the shard lock last (#38807 ) This adds a `details` parameter to shard locking in `NodeEnvironment`. This is intended to be used for diagnosing issues such as ``` 1> [2019-02-11T14:34:19,262][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] deleting index 1> [2019-02-11T14:34:19,279][WARN ][o.e.i.IndicesService ] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] failed to delete index 1> org.elasticsearch.env.ShardLockObtainFailedException: [.tasks][0]: obtaining shard lock timed out after 0ms 1> at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:736) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:655) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:601) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:554) ~[main/:?] ``` In the hope that we will be able to determine why the shard is still locked. Relates to #30290 as well as some other CI failures	2019-02-28 10:50:47 -07:00
Armin Braun	e564c4d8ad	Add Package Level JavaDoc on Snapshots (#38108 ) (#39514 ) * Add Package Level JavaDoc on Snapshots	2019-02-28 18:23:01 +01:00
Simon Willnauer	5c96b90ed5	Never block on scheduled refresh if a refresh is running (#39462 ) Today we block on the ReferenceManager in the case of a scheduled refresh. Yet if there is a refresh happening concurrently we might block and create very smallish segments. Instead we should just move on to the next shard and free up the refresh thread instead.	2019-02-28 11:57:45 +01:00
Armin Braun	d3d7d9bb9d	Remove Dead Code + Duplication in o.e.c.routing (#36678 ) (#39493 ) * Removed obviously unused fields+methods * Inlined public methods that only had one caller * Simplified `Optional` chain * Simplified some obviously redundant conditions	2019-02-28 10:33:05 +01:00
Armin Braun	90ab4a6f6e	Stabilize RareClusterState (#38671) (#39468) * Use actual master node, not just a master-eligible node when trying to cancel publication. This only works on the master and for unlucky seeds we never try the master within the 10s that the busy assert runs. * Closes #36813	2019-02-28 08:01:52 +01:00
Tanguy Leroux	4dd274b51d	Unmute CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState() (#39452 ) This commit unmutes the test and comments out the offending call to linearizabilityChecker.isLinearizable() as suggested in #39437	2019-02-27 20:38:54 +01:00
Tanguy Leroux	983b5d1c0e	Mute SpecificMasterNodesIT.testElectOnlyBetweenMasterNodes() Tracked in #38331	2019-02-27 18:00:02 +01:00
Daniel Mitterdorfer	2ccba18809	Correct name of basic_date_time_no_millis (#39367 ) (#39454 ) With this commit we correct the name of the Java time based formatter for `basic_date_time_no_millis`.	2019-02-27 17:03:50 +01:00
Alan Woodward	71b8494181	Upgrade to lucene 8.0.0-snapshot-ff9509a8df (#39444 ) Backport of #39350 Contains the following: * LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory` * LUCENE-8292: `TermsEnum` is fully abstract * LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge * LUCENE-8676: Nori tokenizer deals correctly with large buffers * LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps * LUCENE-8664: Add `equals` and `hashCode` to `TotalHits` * LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold * LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle * LUCENE-8645: `Intervals#fixField` can merge intervals from different fields * LUCENE-8585: Create jump-tables for DocValues at index time	2019-02-27 14:36:08 +00:00
Armin Braun	f675b33d50	Increase Timeout in UnicastZenPingTests (#38893 ) (#39449 ) * Just like #37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause * Closes #26701	2019-02-27 15:22:17 +01:00
Jason Tedor	55e98f08d8	Provide a clearer error message on keystore add (#39327 ) When trying to add a setting to the keystore with an upper case name, we reject with an unclear error message. This commit makes that error message much clearer.	2019-02-27 08:10:23 -05:00
Armin Braun	27485871b8	Don't Ping on Handshake Connection (#39076 ) (#39446 ) * Don't Ping on Handshake Connection * It does not make sense to run pings on the handshake connection * Set the ping interval to `-1` to deactivate pings on it	2019-02-27 13:39:25 +01:00
Tanguy Leroux	6912e27ee0	Mute MinimumMasterNodesIT.testThreeNodesNoMasterBlock() Tracked in #39172	2019-02-27 13:13:22 +01:00
David Turner	41668f7723	Move PeerFinder's logger to the expected package (#39412 ) Today the abstract `org.elasticsearch.discovery.PeerFinder` uses the logger of its implementation, which in production is in `o.e.cluster.coordination`. This turns out to be confusing and unhelpful, so with this change we move to using the logger that belongs to `PeerFinder`.	2019-02-27 08:44:05 +00:00
Armin Braun	28b771f5db	Remove Dead Code Test Infrastructure (#39192 ) (#39436 ) * Just removing some obviously unused things	2019-02-27 09:38:47 +01:00
Tim Brooks	f24dae302d	Make security tests transport agnostic (#39411 ) Currently there are two security tests that specifically target the netty security transport. This PR moves the client authentication tests into `AbstractSimpleSecurityTransportTestCase` so that the nio transport will also be tested. Additionally the work to build transport configurations is moved out of the netty transport and tested independently.	2019-02-26 18:55:19 -07:00
Nhat Nguyen	a9e86bc941	Adjust testWaitForPendingSeqNo (#39404 ) Since #39006, we should either remove `testWaitForPendingSeqNo` or adjust it not to wait for the pending operations. This change picks the latter. Relates #39006	2019-02-26 16:21:56 -05:00
Mayya Sharipova	4ca514f18c	Fix testCacheWithFilteredAlias failure (#39401 ) Move refresh after Forcemerge Relates to #32827	2019-02-26 14:11:35 -05:00
Luca Cavanna	2619f48e4d	Rename SearchRequest#withLocalReduction (#39108) `withLocalReduction` is confusing as `local` effectively means "local to the remote clusters" rather than "local to the coordinating node" where the method is executed. I propose we rename the method to `crossClusterSearch` which better resembles what the static method is used for.	2019-02-26 16:30:54 +01:00
Luca Cavanna	c09773a76e	Completion suggestions to be reduced once instead of twice (#39255 ) We have been calling `reduce` against completion suggestions twice, once in `SearchPhaseController#reducedQueryPhase` where all suggestions get reduced, and once more in `SearchPhaseController#sortDocs` where we add the top completion suggestions to the `TopDocs` so their docs can be fetched. There is no need to do reduction twice. All suggestions can be reduced in one call, then we can filter the result and pass only the already reduced completion suggestions over to `sortDocs`. The small important detail is that `shardIndex`, which is currently used only to fetch suggestions hits, needs to be set before the first reduction, hence outside of `sortDocs` where we have been doing it until now.	2019-02-26 11:42:02 +01:00
Yannick Welsch	d42f422258	Add linearizability checker for coordination layer (#36943 ) Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports linearizable semantics. This commit adds a linearizability checker based on the Wing and Gong graph search algorithm with support for compositional checking and activates these checks for all CoordinatorTests.	2019-02-26 08:26:55 +01:00
Nhat Nguyen	575eed8582	Bubble up exception when processing NoOp (#39338) Today we do not bubble up exceptions when processing NoOps but always treat them as document-level failures. This incorrect treatment causes assert_no_failure to be tripped in peer-recovery if IndexWriter was closed exceptionally before. Closes #38898	2019-02-25 17:54:45 -05:00
Nhat Nguyen	e9dda75834	Enable soft-deletes by default for 7.0+ indices (#38929 ) Today when users upgrade to 7.0, existing indices will automatically switch to soft-deletes without an opt-out option. With this change, we only enable soft-deletes by default for new indices. Relates #36141	2019-02-25 17:54:29 -05:00
Igor Motov	d5046b1c25	[CI] Fixes testQueryRandomGeoCollection failure again (#39275 ) Moves the check for tiny polygons earlier in the test. It turned out that polygons can be so tiny that we cannot even figure out their orientation. Relates to #37356	2019-02-25 16:35:17 -05:00
Evgenia Badyanova	1ed3407930	Reduce garbage from allocations in deprecation logger (#38780) (#39370) 1. Setting length for formatWarning String to avoid AbstractStringBuilder.ensureCapacityInternal calls 2. Adding extra check for parameter array length == 0 to avoid unnecessarily creating StringBuilder in LoggerMessageFormat.format Helps to narrow the performance gap in throughput for the geonames benchmark (#37411) by 3%. For more details: https://github.com/elastic/elasticsearch/issues/37530#issuecomment-462758384 Relates to #37530 Relates to #37411 Relates to #35754	2019-02-25 16:23:22 -05:00
Lee Hinman	5c7dd6f0ee	Set mappings when creating indices in SuggestSearchIT (#39323 ) * Set mappings when creating indices in SuggestSearchIT These tests don't test dynamic mapping, so they can use preset mappings. This removes the possibility they may fail due to the mapping not being available since mapping updates are asynchronous. Resolves #39315 * Wrap creates in assertAcked	2019-02-25 13:27:03 -07:00
Mayya Sharipova	bf058d6e4d	Fix analyze NullPointerException when AnalyzeTokenList tokens is null (#39332) (#39361)	2019-02-25 12:49:18 -05:00
Nhat Nguyen	48219112e3	Do not wait for advancement of checkpoint in recovery (#39006 ) With this change, we won't wait for the local checkpoint to advance to the max_seq_no before starting phase2 of peer-recovery. We also remove the sequence number range check in peer-recovery. We can safely do these thanks to Yannick's finding. The replication group to be used is currently sampled after indexing into the primary (see `ReplicationOperation` class). This means that when initiating tracking of a new replica, we have to consider the following two cases: - There are operations for which the replication group has not been sampled yet. As we initiated the new replica as tracking, we know that those operations will be replicated to the new replica and follow the typical replication group semantics (e.g. marked as stale when unavailable). - There are operations for which the replication group has already been sampled. These operations will not be sent to the new replica. However, we know that those operations are already indexed into Lucene and the translog on the primary, as the sampling is happening after that. This means that by taking a snapshot of Lucene or the translog, we will be getting those ops as well. What we cannot guarantee anymore is that all ops up to `endingSeqNo` are available in the snapshot (i.e. also see comment in `RecoverySourceHandler` saying `We need to wait for all operations up to the current max to complete, otherwise we can not guarantee that all operations in the required range will be available for replaying from the translog of the source.`). This is not needed, though, as we can no longer guarantee that max seq no == local checkpoint. Relates #39000 Closes #38949 Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2019-02-25 12:10:14 -05:00
David Turner	236db51d34	Fix testSnapshotFileFailureDuringSnapshot (#39362 ) Today this test catches an exception and asserts that its proximate cause has message `Random IOException` but occasionally this exception is wrapped two layers deep, causing the test to fail. This commit adjusts the test to look at the root cause of the exception instead. 1> [2019-02-25T12:31:50,837][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotFileFailureDuringSnapshot] --> caught a top level exception, asserting what's expected 1> org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] Snapshot could not be read 1> at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:212) ~[main/:?] 1> at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135) ~[main/:?] 1> at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54) ~[main/:?] 1> at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127) ~[main/:?] 1> at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208) ~[main/:?] 1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[main/:?] 1> at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?] 
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202] 1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202] 1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202] 1> Caused by: org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots 1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:564) ~[main/:?] 1> at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?] 1> ... 9 more 1> Caused by: java.io.IOException: Random IOException 1> at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.maybeIOExceptionOrBlock(MockRepository.java:275) ~[test/:?] 1> at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.readBlob(MockRepository.java:317) ~[test/:?] 1> at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101) ~[main/:?] 1> at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90) ~[main/:?] 1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:560) ~[main/:?] 1> at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?] 1> ... 9 more FAILURE 0.59s J0 \| SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot <<< FAILURES! 
> Throwable #1: java.lang.AssertionError: > Expected: a string containing "Random IOException" > but: was "[test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots" > at __randomizedtesting.SeedInfo.seed([B73CA847D4B4F52D:884E042D2D899330]:0) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot(SharedClusterSnapshotRestoreIT.java:821) > at java.lang.Thread.run(Thread.java:748)	2019-02-25 16:43:55 +00:00
Marios Trivyzas	11fe8cd16f	[Tests] Fix flakiness by ensuring stable cluster (#39300 ) (#39356 ) In integration tests where `setBootstrapMasterNodeIndex()` is used in combination with `autoMinMasterNodes = false` the cluster can start bootstrapping once the number of nodes set with the `setBootstrapMasterNodeIndex` have been started but it's not ensured that all nodes have successfully joined to form the cluster. This behaviour was introduced with `5db7ed22a0` and in order to ensure that the cluster is properly formed before proceeding with the integration test, use `ensureStableCluster()` with the appropriate number of expected nodes. Fixes: #39220	2019-02-25 17:26:15 +01:00
David Turner	dc23be5a9d	Avoid creating a green index in RetentionLeaseIT (#39347 ) In #39224 we made shard history retention lease syncing ignore the `index.write.wait_for_active_shards` setting on the index, and added a test that showed that it was ignored. However the test as merged actually creates a green index, so the `wait_for_active_shards` setting has no effect. This change adjusts the test to create a yellow index to verify that `wait_for_active_shards` really is ignored.	2019-02-25 15:33:09 +00:00
Yannick Welsch	a2bc41621c	Clean GatewayAllocator when stepping down as master (#38885 ) This fixes an issue where a messy master election might prevent shard allocation to properly proceed. I've encountered this in failing CI tests when we were bootstrapping multiple nodes. Tests would sometimes time out on an `ensureGreen` after an unclean master election. The reason for this is how the async shard information fetching works and how the clean-up logic in GatewayAllocator is integrated with the rest of the system. When a node becomes master, it will, as part of the first cluster state update where it becomes master, already try allocating shards (see `JoinTaskExecutor`, in particular the call to `reroute`). This process, which runs on the MasterService thread, will trigger async shard fetching. If the node is still processing an earlier election failure in ClusterApplierService (e.g. due to a messy election), that will possibly trigger the clean-up logic in GatewayAllocator after the shard fetching has been initiated by MasterService, thereby cancelling the fetching, which means that no subsequent reroute (allocation) is triggered after the shard fetching results return. This means that no shard allocation will happen unless the user triggers an explicit reroute command. The bug imo is that GatewayAllocator is called from both MasterService and ClusterApplierService threads, with no clear happens-before relation. The fix here makes it so that the clean-up logic is also run on the MasterService thread instead of the ClusterApplierService thread, reestablishing a clear happens-before relation. Note that testing this is tricky. With the newly added test, I can quite often reproduce this by adding `Thread.sleep(10);` in ClusterApplierService (to make sure it does not go too quickly) and adding `Thread.sleep(50);` in `TransportNodesListGatewayStartedShards` to make sure that shard state fetching does not go too quickly either. 
Note that older versions of Zen discovery are affected by this as well, but did not exhibit this issue as often because master elections are much slower there.	2019-02-25 10:37:31 +01:00
David Turner	96c09b032d	Ignore waitForActiveShards when syncing leases (#39224 ) Adjust the retention lease sync actions so that they do not respect the `index.write.wait_for_active_shards` setting on an index, allowing them to sync retention leases even if insufficiently many shards are currently active to accept writes. Relates #39089	2019-02-25 08:53:43 +00:00
Nhat Nguyen	f17d408fbb	Add cause to assert_no_failure when replay translog (#39333 ) We tripped this assertion three times for the last two weeks. However, it only says "this IndexWriter is closed" without the actual cause. ``` [2019-02-14T11:46:31,144][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] fatal error in thread [elasticsearch[node-1][generic][T#2]], exiting java.lang.AssertionError: unexpected failure while replicating translog entry: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed ``` This change replaces an assert with an AssertionError so that we will have the actual cause in the next build failures. Relates #38898	2019-02-23 13:04:43 -05:00
Zachary Tong	c7516b03b6	Better HoltWinters parameter validation (#38747 ) We validate HW parameters (namely, window > 2 * period) when parsing the XContent... but that means transport clients can configure bad params. This change allows models to validate the window and throw an exception if they wish. It also makes some test changes: - Removes testBadModelParams(), which was a junk test (didn't do anything), and bad param checking is done elsewhere in unit tests - Fixes one of the windows in testHoltWintersNotEnoughData() - Ensures the period in testHoltWintersNotEnoughData() is >> window - Removes `setTypes()` since that's deprecated	2019-02-22 15:25:26 -05:00
Daniel Mitterdorfer	9fea21aca5	Remove ExceptionsHelper#detailedMessage in tests (#37921 ) (#39297 ) With this commit we remove all usages of the deprecated method `ExceptionsHelper#detailedMessage` in tests. We do not address production code here but rather in dedicated follow-up PRs to keep the individual changes manageable. Relates #19069	2019-02-22 14:03:29 +01:00
Tim Brooks	44df76251f	Rebuild remote connections on profile changes (#39146 ) Currently remote compression and ping schedule settings are dynamic. However, we do not listen for changes. This commit adds listeners for changes to those two settings. Additionally, when those settings change we now close existing connections and open new ones with the settings applied. Fixes #37201.	2019-02-21 14:00:39 -07:00
Tanguy Leroux	fc896e452c	ReadOnlyEngine should update translog recovery state information (#39238 ) (#39251 ) `ReadOnlyEngine` never recovers operations from translog and never updates translog information in the index shard's recovery state, even though the recovery state goes through the `TRANSLOG` stage during the recovery. It means that recovery information for frozen shards indicates an unknown number of recovered translog ops in the Recovery APIs (translog_ops: `-1` and translog_ops_percent: `-1.0%`) and this is confusing. This commit changes the `recoverFromTranslog()` method in `ReadOnlyEngine` so that it always recovers from an empty translog snapshot, allowing the recovery state translog information to be correctly updated. Related to #33888	2019-02-21 18:08:06 +01:00
Daniel Mitterdorfer	ef921fd157	Migrate Streamable to Writeable for cluster block package (#37391 ) (#39236 )	2019-02-21 15:21:44 +01:00
Marios Trivyzas	ecfd48b6d3	[Tests] Make testEngineGCDeletesSetting deterministic (#38942 ) (#39231 ) `InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool`, so it needs the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <b.leskes@gmail.com>	2019-02-21 14:30:59 +02:00
Marios Trivyzas	1316825f52	Replace superfluous usage of Counter with Supplier (#39048 ) (#39225 ) `Counter` was used as a means of a functional argument to pass the relative cached time before `Supplier` iface was introduced.	2019-02-21 12:42:54 +02:00
Ignacio Vera	be8a5315d7	Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (#39176 ) The type date_nanos does not direct doc-value iterators and it needs to extend `next_doc` in order to delegate the call to the wrapped iterator.	2019-02-21 11:10:51 +01:00
Tal Levy	8150ca40f2	mute test3MasterNodes2Failed	2019-02-20 17:35:37 -08:00
Nhat Nguyen	820ba8169e	Add retention leases replication tests (#38857 ) This commit introduces the retention leases to ESIndexLevelReplicationTestCase, then adds some tests verifying that the retention leases replication works correctly in spite of the presence of the primary failover or out of order delivery of retention leases sync requests. Relates #37165	2019-02-20 19:21:00 -05:00
Mirko Jotic	a6ae146ccc	Converting Derivative Pipeline Agg integration test into AggregatorTestCase. (#38679 ) Replicates the majority of existing Derivative pipeline integration tests into an AggregatorTestCase, with the goal of removing the integration tests in the near future.	2019-02-20 16:35:32 -05:00
Igor Motov	3d93011e32	Fix median calculation in MedianAbsoluteDeviationAggregatorTests (#38979 ) Fixes an error in median calculation in MedianAbsoluteDeviationAggregatorTests for odd number of sample points, which causes some rare test failures. Fixes #38937	2019-02-20 13:24:30 -05:00
Ioannis Kakavas	c783069804	Fix NPE on Stale Index in IndicesService(#39173 ) This is a backport of #38891 which closes #38845	2019-02-20 15:35:35 +02:00
David Turner	efffb3d5b7	Simplify calculation in AwarenessAllocationDecider (#38091 ) Today's calculation of the maximum number of shards per attribute is rather convoluted. This commit clarifies that it returns ceil(shardCount/numberOfAttributes).	2019-02-20 08:54:57 +00:00
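The ceiling in the entry above can be computed with integer arithmetic alone. The following is a minimal sketch, assuming a non-negative `shardCount` and a positive `numberOfAttributes`; the class and method names are illustrative and are not taken from `AwarenessAllocationDecider`.

```java
// Illustrative only: not the actual AwarenessAllocationDecider code.
final class CeilDivSketch {
    // Returns ceil(shardCount / numberOfAttributes) without floating point.
    // Assumes shardCount >= 0 and numberOfAttributes > 0 (assumption of this sketch).
    static int maxShardsPerAttribute(int shardCount, int numberOfAttributes) {
        return (shardCount + numberOfAttributes - 1) / numberOfAttributes;
    }

    public static void main(String[] args) {
        // 10 shard copies spread over 3 attribute values.
        System.out.println(maxShardsPerAttribute(10, 3));
    }
}
```

Adding `numberOfAttributes - 1` before the division rounds any non-zero remainder up, which avoids a round trip through `double` and `Math.ceil`.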
Henning Andersen	00a26b9dd2	Blob store compression fix (#39073 ) Blob store compression was not enabled for some of the files in snapshots due to constructor accessing sub-class fields. Fixed to instead accept compress field as constructor param. Also fixed chunk size validation to work. Deprecated repositories.fs.compress setting as well to be able to unify in a future commit.	2019-02-20 09:24:41 +01:00
Hendrik Muhs	50b3858f7c	add version 6.6.2	2019-02-19 20:28:06 +01:00
David Turner	0a9574c9d4	Add some missing toString() implementations (#39124 ) Sometimes we turn objects into strings for logging or debugging using `toString()`, but the default implementation is often unhelpful. This change improves on this in two places I ran into recently.	2019-02-19 17:52:41 +00:00
Jason Tedor	fef9bdb23f	Allow retention lease operations under blocks (#39089 ) This commit allows manipulating retention leases under blocks.	2019-02-19 10:26:49 -05:00
Jason Tedor	12f6963456	Fix retention leases sync on recovery test This test had a bug. We attempt to allow only the primary to be allocated, to force all replicas to recover from the primary after we had set the state of the retention leases on the primary. However, in building the index settings, we were overwriting the settings that exclude the replicas from being allocated. This means that some of the replicas would end up assigned and rather than receive retention leases during recovery, they would be part of the replication group receiving retention leases as they are manipulated. Since retention lease renewals are only synced periodically, this means that the replica could be lagging a little behind in some cases leading to an assertion tripping in the test. This commit addresses this by ensuring that the replicas are indeed not allocated until after the retention leases are done being manipulated on the replica. We did this by not overwriting the exclude settings. Closes #39105	2019-02-19 09:07:33 -05:00
Alexander Reelsen	7f8a640363	Fix DateFormatters.parseMillis when no timezone is given (#39100 ) The parseMillis method was able to work on formats without timezones by falling back to UTC. The Date Formatter interface did not support this, as the calling code was using the `Instant.from` java time API. This switches over to an internal method which adds UTC as a timezone. Closes #39067	2019-02-19 14:12:22 +01:00
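The UTC fallback described in the entry above can be illustrated with plain `java.time`. This is a sketch under stated assumptions, not the Elasticsearch implementation: `ParseMillisSketch` and its `parseMillis` method are hypothetical names, and the input is assumed to carry at least a full date and time.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical helper: not the DateFormatters code from the commit.
final class ParseMillisSketch {
    static long parseMillis(DateTimeFormatter formatter, String input) {
        TemporalAccessor parsed = formatter.parse(input);
        // When the text carried a zone or offset, an instant can be derived directly.
        if (parsed.isSupported(ChronoField.INSTANT_SECONDS)) {
            return Instant.from(parsed).toEpochMilli();
        }
        // Otherwise interpret the local date-time as UTC (the fallback in question).
        return LocalDateTime.from(parsed).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(parseMillis(DateTimeFormatter.ISO_LOCAL_DATE_TIME, "2019-02-19T14:12:22"));
    }
}
```

Calling `Instant.from` directly on a zone-less parse result throws a `DateTimeException`, which is the kind of failure the commit message attributes to the calling code.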
Jim Ferenczi	199155f5fb	Enforce Completion Context Limit (#38675 ) (#39075 ) This change adds a limit to the number of completion contexts that a completion field can define. Closes #32741	2019-02-19 08:52:24 +01:00
Albert Zaharovits	6bc88b00ec	Mute GatewayMetaStateTests.testAtomicityWithFailures (#39079 ) Mute test GatewayMetaStateTests.testAtomicityWithFailures	2019-02-19 00:25:45 +02:00
Jason Tedor	2d8f6b6501	Introduce retention lease state file (#39004 ) This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.	2019-02-18 16:53:46 -05:00
Jason Tedor	d43ac8fe11	Include in log retention leases that failed to sync When retention leases fail to sync after an expiration check, we emit a log message about this. This commit adds the retention leases that failed to sync.	2019-02-18 15:08:08 -05:00
Jason Tedor	bbb61002ba	Add some logging related to retention lease syncing (#39066 ) When the background retention lease sync fires, we check and see if any retention leases are expired. If any did expire, we execute a full retention lease sync (write action). Since this is happening on a background thread, we do not block that thread waiting for success (it will simply try again when the timer elapses). However, we were swallowing exceptions that indicate failure. This commit addresses that by logging the failures. Additionally, we add some trace logging to the execution of syncing retention leases.	2019-02-18 15:02:31 -05:00
Henning Andersen	99b2bc3461	Fix potential race during TcpTransport close (#39031 ) Fixed two potential causes for leaked threads during tests: 1. When adding a channel to serverChannels, we add it under a monitor that we do not use when reading from it. This is potentially unsafe if there is no other happens-before relationship ensuring the safety of this. 2. Long-shot but if the thread pool was shutdown before entering this code, we would silently forget about closing server channels so added assert. Strengthened the locking to ensure that once we stop the transport, no new server channels can be made. Relates to CI failure issue: #37543	2019-02-18 19:13:23 +01:00
Alan Woodward	ab4d5f404f	Add overlapping, before, after filters to intervals query (#38999 ) Lucene recently added `overlapping`, `before` and `after` filters to the intervals package. This commit exposes them in elasticsearch.	2019-02-18 15:06:24 +00:00
Adrien Grand	45b17e8645	Don't close caches while there might still be in-flight requests. (#38958 ) Many of our index components use ref-counting so that in the event that a shard is closed while there are still ongoing requests, then the index reader and the store only effectively get closed when ongoing requests have finished. However we don't apply the same principle to the request and query caches, which might get closed while there are still in-flight requests. This commit adds ref-counting to `IndicesService` so that the caches and other components it maintains only get closed when all shards are effectively closed. Closes #37117	2019-02-18 13:59:58 +01:00
Martijn van Groningen	ed08bc3537	Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (#38709 ) * During fetching remote mapping if remote client is missing then `NoSuchRemoteClusterException` was not handled. * When adding remote connection, check that it is really connected before continuing to run the tests. Relates to #38695	2019-02-18 09:41:44 +01:00
Jason Tedor	a5ce1e0bec	Integrate retention leases to recovery from remote (#38829 ) This commit is the first step in integrating shard history retention leases with CCR. In this commit we integrate shard history retention leases with recovery from remote. Before we start transferring files, we take out a retention lease on the primary. Then during the file copy phase, we repeatedly renew the retention lease. Finally, when recovery from remote is complete, we disable the background renewing of the retention lease.	2019-02-16 15:37:52 -05:00
Tim Brooks	b1c1daa63f	Add get file chunk timeouts with listener timeouts (#38758 ) This commit adds a `ListenerTimeouts` class that will wrap a `ActionListener` in a listener with a timeout scheduled on the generic thread pool. If the timeout expires before the listener is completed, `onFailure` will be called with an `ElasticsearchTimeoutException`. Timeouts for the get ccr file chunk action are implemented using this functionality. Additionally, this commit attempts to fix #38027 by also blocking proxied get ccr file chunk actions. This test being un-muted is useful to verify the timeout functionality.	2019-02-16 10:56:03 -07:00
Luca Cavanna	a1a49f201d	Tie break search shard iterator comparisons on cluster alias (#38853 ) `SearchShardIterator` inherits its `compareTo` implementation from `PlainShardIterator`. That is good in most of the cases, as such comparisons are based on the shard id which is unique, even when searching against indices with same names across multiple clusters (thanks to the index uuid being different). In case though the same cluster is registered multiple times with different aliases, the shard id is exactly the same, hence remote results will be returned before local ones with same shard id objects. That is because remote iterators are added before local ones, and we use a stable sorting method in `GroupShardIterators` constructor. This PR enhances `compareTo` for `SearchShardIterator` to tie break on cluster alias and introduces consistent `equals` and `hashcode` methods. This allows to remove a TODO in `SearchResponseMerger` which otherwise has to handle this special case specifically. Also, while at it I added missing tests around equals/hashcode and compareTo and expanded existing ones.	2019-02-16 09:41:03 +01:00
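The tie-break described in the entry above can be sketched as a comparator chain. This is an illustration only: `ShardIteratorSketch` is a hypothetical stand-in for `SearchShardIterator`, the shard id is reduced to an `int`, and ordering a `null` (local) alias before named aliases is an assumption of the sketch rather than something the commit message states.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical stand-in for SearchShardIterator; fields are simplified.
final class ShardIteratorSketch implements Comparable<ShardIteratorSketch> {
    private final String clusterAlias; // null stands for the local cluster (assumption)
    private final int shardId;         // simplified; the real shard id is an object

    // Compare on shard id first, then tie-break on cluster alias.
    private static final Comparator<ShardIteratorSketch> ORDER =
        Comparator.comparingInt((ShardIteratorSketch it) -> it.shardId)
            .thenComparing(it -> it.clusterAlias,
                Comparator.nullsFirst(Comparator.<String>naturalOrder()));

    ShardIteratorSketch(String clusterAlias, int shardId) {
        this.clusterAlias = clusterAlias;
        this.shardId = shardId;
    }

    @Override
    public int compareTo(ShardIteratorSketch other) {
        return ORDER.compare(this, other);
    }

    // equals and hashCode use the same two fields, so they stay consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ShardIteratorSketch)) return false;
        ShardIteratorSketch that = (ShardIteratorSketch) o;
        return shardId == that.shardId && Objects.equals(clusterAlias, that.clusterAlias);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clusterAlias, shardId);
    }
}
```

With the alias included in the comparison, two iterators for the same shard id registered under different aliases no longer compare as equal, so their relative order does not depend on insertion order and a stable sort.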
Nhat Nguyen	7e20a92888	Advance max_seq_no before add operation to Lucene (#38879 ) Today when processing an operation on a replica engine (or the following engine), we first add it to Lucene, then add it to translog, and finally mark its seq_no as completed. If a flush occurs after step 1, but before step 3, the max_seq_no in the commit's user_data will be smaller than the seq_no of some documents in the Lucene commit.	2019-02-15 21:04:28 -05:00
Nhat Nguyen	20755e666c	Reduce global checkpoint sync interval in disruption tests (#38931 ) We verify seq_no_stats is aligned between copies at the end of some disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due to a lagged global checkpoint on replicas. The global checkpoint on replicas is lagged because we sync the global checkpoint 30 seconds (by default) after the last replication operation. This change reduces the global checkpoint sync interval to 1s in the disruption tests. Closes #38318 Closes #36789	2019-02-15 21:04:20 -05:00
Nhat Nguyen	a67b9f6d1f	Relax testStressMaybeFlushOrRollTranslogGeneration (#38918 ) The predicate shouldPeriodicallyFlush is determined by the uncommitted translog size and the local checkpoint. The uncommitted translog size depends on the local checkpoint. The condition shouldPeriodicallyFlush can be true twice in the test in the following scenario: 1. Index doc-0 and advances the local checkpoint to 0, the condition shouldPeriodicallyFlush remains false. 2. Index doc-1 and add it to translog, but the local checkpoint is not advanced yet (still 0). The condition shouldPeriodicallyFlush becomes true because the uncommitted translog size is 216bytes (2ops + gen-1 + gen-2) > 180bytes and the translog generation of the new index commit would advance from 1 to 2. > [2019-02-13T23:33:58,257][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=0, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=2, > max_seq_no=1}] 1. The shouldPeriodicallyFlush becomes true again after the local checkpoint is advanced to 1 because the uncommitted translog size is 216bytes (2ops + gen-2 + gen-3) > 180bytes and the translog generation of the new index commit would advance from 2 to 4. > [2019-02-13T23:33:58,264][TRACE][o.e.i.e.Engine ] [node_s_0] > [test][0] committing writer with commit data [{local_checkpoint=1, > max_unsafe_auto_id_timestamp=-1, translog_uuid=fFp1Yqd4QiqKDD4ZrC8F-g, > min_retained_seq_no=0, history_uuid=cn31yrwVQk-Vs7qcg4bi_Q, > retention_leases=primary_term:1;version:0;, translog_generation=4, > max_seq_no=1}] We need to relax the assertion in this test to cover this situation. Closes #31629	2019-02-15 21:04:12 -05:00
Armin Braun	238425e5e7	Fix Issue with Concurrent Snapshot Init + Delete (#38518 ) * Fix Issue with Concurrent Snapshot Init + Delete by ensuring that we're not finalizing a snapshot in the repository while it is initializing on another thread * Closes #38489	2019-02-15 16:50:47 -08:00
Alan Woodward	176013e23c	Avoid double term construction in DfsPhase (#38716 ) DfsPhase captures terms used for scoring a query in order to build global term statistics across multiple shards for more accurate scoring. It currently does this by building the query's `Weight` and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()` for each collected term. This duplicates work, however, as the various `Weight` implementations will already have collected these statistics at construction time. This commit replaces this round-about way of collecting stats, instead using a delegating IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()` is called from the Weight. It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term statistics, but ignore field statistics. `Rescorer.extractTerms` has been removed, and replaced with a new method on `RescoreContext` that returns any queries used by the rescore implementation. The delegating IndexSearcher then collects term contexts and statistics in the same way described above for each Query.	2019-02-15 16:00:38 +00:00
Daniel Mitterdorfer	fcc7f553f5	Also mmap cfs files for hybridfs (#38940 ) (#38947 ) With this commit we add the `.cfs` file extension to the list of file types that are memory-mapped by hybridfs. `.cfs` files combine all files of a Lucene segment into a single file in order to save file handles. As this strategy is only used for "small" segments (less than 10% of the shard size), it is beneficial to memory-map them instead of accessing them via NIO. Relates #36668	2019-02-15 15:34:40 +01:00
David Turner	578514e892	Recover peers from translog, ignoring soft deletes (#38904 ) Today if soft deletes are enabled then we read the operations needed for peer recovery from Lucene. However we do not currently make any attempt to retain history in Lucene specifically for peer recoveries so we may discard it and fall back to a more expensive file-based recovery. Yet we still retain sufficient history in the translog to perform an operations-based peer recovery. In the long run we would like to fix this by retaining more history in Lucene, possibly using shard history retention leases (#37165). For now, however, this commit reverts to performing peer recoveries using the history retained in the translog regardless of whether soft deletes are enabled or not.	2019-02-15 10:45:15 +01:00
Henning Andersen	a211e51343	ShardBulkAction ignore primary response on primary (#38901 ) Previously, if a version conflict occurred and a previous primary response was present, the original primary response would be used both for sending to replica and back to client. This was made in the past as an attempt to fix issues with conflicts after relocations where a bulk request would experience a closed shard half way through and thus have to retry on the new primary. It could then fail on its own update. With sequence numbers, this leads to an issue, since if a primary is demoted (network partitions), it will send along the original response in the request. In case of a conflict on the new primary, the old response is sent to the replica. That data could be stale, leading to inconsistency between primary and replica. Relocations now do an explicit hand-off from old to new primary and ensures that no operations are active while doing this. Above is thus no longer necessary. This change removes the special handling of conflicts and ignores primary responses when executing shard bulk requests on the primary.	2019-02-15 10:13:11 +01:00
Jason Tedor	00cb8d0be8	Mark coordinator test as awaits fix This test is failing frequently so this commit mutes it. Relates #38867	2019-02-14 12:43:31 -05:00
Lee Hinman	0c733c04be	Remove immediate operation retry after mapping update (#38873 ) Prior to this commit, when an indexing operation resulted in an `Engine.Result.Type.MAPPING_UPDATE_REQUIRED`, TransportShardBulkAction immediately retries the indexing operation to see if it succeeds. In the event that it succeeds the context does not wait until the mapping update has propagated through the cluster state before finishing the indexing. In some of our tests we rely on mappings being available as soon as they've been introduced in a document that was indexed correctly. By removing the immediate retry we always wait for this to be the case. Resolves #38428 Supersedes #38579 Relates to #38711	2019-02-14 09:31:08 -07:00
Christoph Büscher	6c5cec4ff4	Enable silent FollowersCheckerTest (#38851 ) One of the test methods wasn't run because it was private. Making this method public and fixing some issues around mocking the threadpool that otherwise would lead to an NPE.	2019-02-14 16:16:48 +01:00
Albert Zaharovits	6243a9797f	_cat/indices with Security, hide names when wildcard (#38824 ) This changes the output of the `_cat/indices` API with `Security` enabled. It is possible to only display the index name (and possibly the index health, depending on the request options) but not its stats (doc count, merges, size, etc). This is the case for closed indices which have index metadata in the cluster state but no associated shards, hence no shard stats. However, when `Security` is enabled, and the request contains wildcards, open indices without stats are a common occurrence. This is because the index names in the response table are picked up directly from the cluster state which is not filtered by `Security`'s _indexNameExpressionResolver_, unlike the stats data which is populated by the indices stats API which does go through the index name resolver. This is a bug, because it is circumventing `Security`'s function to hide unauthorized indices. This has been fixed by displaying the index names as they are resolved by the indices stats API. The outputs of these two APIs are now very similar: same index names, similar data but different format. Closes #37190	2019-02-14 15:09:17 +02:00
David Roberts	6ea483a663	Mute DedicatedClusterSnapshotRestoreIT testRestoreShrinkIndex Due to https://github.com/elastic/elasticsearch/issues/38845	2019-02-14 11:46:22 +00:00
Luca Cavanna	7456117019	[TEST] address testCollectNodes rare failure (#38559 ) #37767 changed the expected exception for "no such cluster" error from `IllegalStateException` to a dedicated `NoSuchRemoteClusterException`. An assertion in `testCollectNodes` needs to be updated accordingly.	2019-02-14 10:57:14 +01:00
Nhat Nguyen	5d22e45990	Copy retention leases when trim unsafe commits (#37995 ) When a primary shard is recovered from its store, we trim the last commit (when it's unsafe). If that primary crashes before the recovery completes, we will lose the committed retention leases because they are baked in the last commit. With this change, we copy the retention leases from the last commit to the safe commit when trimming unsafe commits. Relates #37165	2019-02-13 17:27:48 -05:00
Jason Tedor	062eea8fcc	Fix excessive increments in soft delete policy (#38813 ) In this case, we were incrementing the policy too much. This means on every iteration we actually keep increasing the minimum retained sequence number, even with leases in place. It was a bug from when the soft deletes policy had retention leases incorporated into it. This commit fixes this bug by ensuring we only increment in the proper places, and adds careful tests for the various situations.	2019-02-13 14:04:45 -05:00
Jake Landis	46bb663a09	Make 7.x like 6.7 user agent ecs, but default to true (#38828 ) Forward port of https://github.com/elastic/elasticsearch/pull/38757 This change reverts the initial 7.0 commits and replaces them with the 6.7 variant that still allows for the ecs flag. This commit differs from the 6.7 variants in that ecs flag will now default to true. 6.7: `ecs` : default `false` 7.x: `ecs` : default `true` 8.0: no option, but behaves as `true` * Revert "Ingest node - user agent, move device to an object (#38115)" This reverts commit `5b008a34aa`. * Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)" This reverts commit `cac6b8e06f`. * cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57 Add ECS schema for user-agent ingest processor (#37727) * cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099 Ingest node - user agent, move device to an object (#38115) (#38121) * cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes) Dep. check for ECS changes to User Agent processor (#38362) * make true the default for the ecs option, and update 7.0 references and tests	2019-02-13 10:28:01 -06:00
Przemyslaw Gomulka	7404882105	Fix line separators in JSON logging tests backport#38771 #38834 The hardcoded '\n' in string will not work in Windows where there is a different line separator. A System.lineSeparator should be used to make it work on all platforms closes #38705 backport #38771	2019-02-13 13:34:33 +01:00
Zachary Tong	57f69082fd	Disable cache on QueryProfilerIT (#38748 ) - Disables the request cache on the test, to prevent cached values from potentially interfering with test results - Changes the test to execute a single query, in hopes of making failures more reproducible Backport of #38583	2019-02-12 13:11:52 -05:00
Nhat Nguyen	a3f39741be	Adjust log and unmute testFailOverOnFollower (#38762 ) There were two documents (seq=2 and seq=103) missing on the follower in one of the failures of `testFailOverOnFollower`. I spent several hours on that failure but could not figure out the reason. I adjust log and unmute this test so we can collect more information. Relates #38633	2019-02-12 11:42:25 -05:00
Nhat Nguyen	4a5070dcfb	Use current term in initial leases in engine test (#38285 ) We need to use the current primary term instead of 1L for the initial retention leases; otherwise, the primary term of the committed retention leases won't match the current primary term if the retention leases never get updated.	2019-02-12 11:40:04 -05:00
Nhat Nguyen	eca5404572	Fix synchronization in LocalCheckpointTracker#contains (#38755 ) We are accessing the `CountedBitSet` in `LocalCheckpointTracker#contains` without proper synchronization. Relates #33871	2019-02-12 11:39:50 -05:00
Nhat Nguyen	225ebb6935	Ensure no snapshotted commit when close engine (#38663 ) With this change, we can automatically detect an implementation that acquires an index commit but fails to release.	2019-02-12 11:39:35 -05:00
Tanguy Leroux	51d6b9ab31	Fix CloseWhileRelocatingShardsIT (#38728 )	2019-02-12 14:04:44 +01:00
Jason Tedor	bbc9aa9979	Introduce retention lease actions (#38756 ) This commit introduces actions for some common retention lease operations that clients need to be able to perform remotely. These actions include add/renew/remove.	2019-02-12 07:38:03 -05:00
Przemyslaw Gomulka	7e178aa4a7	Enable IndexActionTests and WatcherIndexingListenerTests Backport #38738 fix tests to use clock in milliseconds precision in watcher code make sure the date comparison in string format is using same formatters some of the code was modified in #38514 possibly because of merge conflicts closes #38581 Backport #38738	2019-02-12 13:05:44 +01:00
Luca Cavanna	90fff54954	Tie break on cluster alias when merging shard search failures (#38715 ) A recent test failure triggered an edge case scenario where failures may be coming back with the same shard id, yet from different clusters. This commit adapts the failures comparator to take the cluster alias into account when merging failures as part of CCS requests execution. Also the corresponding test has been split in two: with and without search shard target set to the failure. Closes #38672	2019-02-12 11:25:44 +01:00
Jason Tedor	c7cdd6a46a	Add dedicated retention lease exceptions (#38754 ) When a retention lease already exists on an add retention lease invocation, or a retention lease is not found on a renew retention lease invocation today we throw an illegal argument exception. This puts a burden on the caller to catch that specific exception and parse the message. This commit relieves the burden from the caller by adding dedicated exception types for these situations.	2019-02-12 00:32:09 -05:00
Jason Tedor	b97c74bbab	Enable removal of retention leases (#38751 ) This commit introduces the ability to remove retention leases. Explicit removal will be needed to manage retention leases used to increase the likelihood of operation-based recoveries syncing, and for consumers such as ILM.	2019-02-11 21:19:11 -05:00
Nick Knize	e2f432a413	Fix the version check for LegacyGeoShapeFieldMapper (#38547 ) Change version check from 7.0 to 6.6 in BaseGeoShapeFieldMapper to correctly use LegacyGeoShapeFieldMapper for indexes created prior to 6.6.	2019-02-11 16:27:47 -06:00
Nick Knize	078da6d9bd	Fix GeoHash PrefixTree BWC (#38584 ) geo_shape indexes created before 6.6 use geohash string encoding as default tree parameter and quadtree encoding for 6.6 and later. This commit fixes bwc to use geohash encoding in LegacyGeoshapeFieldMapper for indexes created before 6.6.	2019-02-11 11:59:51 -06:00
David Roberts	d1848b96fc	Fix possible assertion failure in IndicesQueryCache.close (#38731 ) The assertion that the stats2 map is empty in IndicesQueryCache.close has been observed to fail very occasionally in internal cluster tests. The likely cause is a cross-thread visibility problem for a count variable. This change makes that count volatile. Relates #37117 Backport of #38714	2019-02-11 17:33:20 +00:00
Tanguy Leroux	dc212de822	Specialize pre-closing checks for engine implementations (#38702 ) (#38722 ) The Close Index API has been refactored in 6.7.0 and it now performs pre-closing sanity checks on shards before an index is closed: the maximum sequence number must be equal to the global checkpoint. While this is a strong requirement for regular shards, we identified the need to relax this check in the case of CCR following shards. The following shards are not in charge of managing the max sequence number or global checkpoint, which are pulled from a leader shard. They also fetch and process batches of operations from the leader in an unordered way, potentially leaving gaps in the history of ops. If the following shard lags a lot it's possible that the global checkpoint and max seq number never get in sync, preventing the following shard from being closed and a new PUT Follow action from being issued on this shard (which is our recommended way to resume/restart a CCR following). This commit allows each Engine implementation to define the specific verification it must perform before closing the index. In order to allow following/frozen/closed shards to be closed whatever the max seq number or global checkpoint are, the FollowingEngine and ReadOnlyEngine do not perform any check before the index is closed. Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>	2019-02-11 17:34:17 +01:00
Luca Cavanna	6443b46184	Clean up ShardSearchLocalRequest (#38574 ) Added a constructor accepting `StreamInput` as argument, which allowed to make most of the instance members final as well as remove the default constructor. Removed a test only constructor in favour of invoking the existing constructor that takes a `SearchRequest` as first argument. Also removed profile members and related methods as they were all unused.	2019-02-11 15:55:46 +01:00
Alexander Reelsen	884b5063a4	Create ISO8601 joda compatible java time formatter (#38434 ) The existing formatter being used was not on par with the joda formatter as it was missing the ability to parse a comma as a separator between seconds and milliseconds. While a real iso8601 would be much more complex, this might be sufficient for some more use-cases. The ingest date formatter now also uses the iso8601 formatter by default. Closes #38345	2019-02-11 15:11:26 +01:00
Alexander Reelsen	e7868e92bd	Restore date aggregation performance in UTC case (#38221 ) (#38700 ) The benchmarks showed a sharp decrease in aggregation performance for the UTC case. This commit uses the same calculation as joda time, which requires no conversion into any java time object. Also, the check for a fixed offset has been put into the ctor to reduce the need for runtime calculations. The same goes for the amount of the used unit in milliseconds. Closes #37826	2019-02-11 16:30:48 +03:00
Luca Cavanna	fe8bd757b2	Look up connection using the right cluster alias when releasing contexts (#38570 ) Whenever phase failure is raised in AbstractSearchAsyncAction, we go and release search contexts of shards that successfully returned their results, prior to notifying the listener of the failure. In case we are executing a CCS request, it's important to look-up the connection to send the release context request to. This commit makes sure that the lookup takes the cluster alias into account. We used to use `null` at all times instead which is not correct and was not caught as any exception is caught without re-throwing it.	2019-02-11 13:40:42 +01:00
Przemyslaw Gomulka	ba9a4d13e1	mute Failing tests related to logging and joda-java migration backport(#38704 )(#38710 ) the tests awaits fix from #38693 and #38705 and #38581	2019-02-11 13:15:12 +01:00
Przemyslaw Gomulka	ab9e2f2e69	Move testToUtc test to DateFormattersTests #38698 Backport #38610 The test was relying on toString in ZonedDateTime which is different to what is formatted by strict_date_time when milliseconds are 0 The method is just delegating to dateFormatter, so that scenario should be covered there. closes #38359 Backport #38610	2019-02-11 11:34:25 +01:00
Like	b8be6cb5c7	Reject index.optimize_auto_generated_id setting (#28895 ) This commit rejects the index.optimize_auto_generated_id setting for indices created on or after 7.0.0. This setting was deprecated in 6.7.0.	2019-02-10 13:46:09 -05:00
Tim Brooks	023e3c207a	Concurrent file chunk fetching for CCR restore (#38656 ) Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR. The implementation uses the parallel file writer functionality that is also used by peer recoveries.	2019-02-09 21:19:57 -07:00
Christoph Büscher	e3c7b93917	Mute failure in InternalEngineTests (#38622 )	2019-02-08 16:29:54 +01:00
Dimitris Athanasiou	fe8182ece2	Mute RetentionLeaseIT.testRetentionLeasesSyncOnRecovery on 7x (#38597 )	2019-02-08 11:32:28 +02:00
Jason Tedor	fdf6b3f23f	Add 7.1 version constant to 7.x branch (#38513 ) This commit adds the 7.1 version constant to the 7.x branch. Co-authored-by: Andy Bristol <andy.bristol@elastic.co> Co-authored-by: Tim Brooks <tim@uncontended.net> Co-authored-by: Christoph Büscher <cbuescher@posteo.de> Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com> Co-authored-by: markharwood <markharwood@gmail.com> Co-authored-by: Ioannis Kakavas <ioannis@elastic.co> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: David Roberts <dave.roberts@elastic.co> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Alpar Torok <torokalpar@gmail.com> Co-authored-by: David Turner <david.turner@elastic.co> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Tim Vernum <tim@adjective.org> Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>	2019-02-07 16:32:27 -05:00
Jason Tedor	f8ed6c15c4	Enable BWC after backport recovering leases (#38485 ) This commit enables the BWC tests after backporting recovery of retention leases during peer recovery.	2019-02-06 08:03:19 -05:00
Jason Tedor	4b42281a4e	Collapse retention lease integration tests (#38483 ) This commit collapses the retention lease integration tests into a single suite.	2019-02-06 07:55:41 -05:00
Tanguy Leroux	510829f9f7	TransportVerifyShardBeforeCloseAction should force a flush (#38401 ) This commit changes the `TransportVerifyShardBeforeCloseAction` so that it always forces the flush of the shard. It seems that #37961 is not sufficient to ensure that the translog and the Lucene commit share the exact same max seq no and global checkpoint information in case one or more noop operations have been made. The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest` and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` test this trivial situation and they both fail in 1 out of 10 executions. Relates to #33888	2019-02-06 13:22:54 +01:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492 ) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
Ioannis Kakavas	e1d464b22c	Mute testRetentionLeasesSyncOnRecovery (#38488 ) Relates: #38487	2019-02-06 08:52:54 +02:00
Armin Braun	34f2cc78f6	Fix Master Failover and DataNode Leave Blocking Snapshot (#38460 ) * Closes #38447	2019-02-05 23:56:59 +01:00
Jason Tedor	79a45b47da	Recover retention leases during peer recovery (#38435 ) This commit integrates retention leases with recovery. With this change, we copy the current retention leases on primary to the replica during phase two of recovery. At this point, the replica will have been added to the replication group and so is already receiving retention lease sync requests from the primary. This means that if any retention lease syncs are triggered on the primary after we sample the retention leases here during phase two, that sync request will also arrive on the replica ensuring that the replica is from this point on up to date with the retention leases on the primary. We have to copy these during phase two since we will be applying indexing operations, potentially triggering merges, and therefore must ensure the correct retention leases are in place beforehand.	2019-02-05 17:43:41 -05:00
Henning Andersen	20c66c5a05	Bubble-up exceptions from scheduler (#38317 ) Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error. This is a continuation of #38014 Fixed NPE that caused CCR tests (IndexFollowingIT and likely others) to fail. schedule could bubble rejected exception to uncaught exception handler when not using SAME executor if thread pool is terminated. Now ignore rejected exception silently if executor is shutdown.	2019-02-05 21:48:24 +01:00
Boaz Leskes	033ba725af	Remove support for internal versioning for concurrency control (#38254 ) Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a. optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometimes do the wrong thing. Here's the relevant excerpt from the resiliency status page: > When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document. We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem. This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach. Relates to #1078	2019-02-05 20:53:35 +01:00
Jason Tedor	b03d138122	Lift retention lease expiration to index shard (#38380 ) This commit lifts the control of when retention leases are expired to index shard. In this case, we move expiration to an explicit action rather than a side-effect of calling ReplicationTracker#getRetentionLeases. This explicit action is invoked on a timer. If any retention leases expire, then we hard sync the retention leases to the replicas. Otherwise, we proceed with a background sync.	2019-02-05 14:42:17 -05:00
Tim Brooks	c2a8fe1f91	Prevent CCR recovery from missing documents (#38237 ) Currently the snapshot/restore process manually sets the global checkpoint to the max sequence number from the restored segments. This does not work for CCR, as it prevents documents that would be recovered in the normal following operation from being recovered. This commit fixes this issue by setting the initial global checkpoint to the existing local checkpoint.	2019-02-05 13:32:41 -06:00
Tal Levy	aef5775561	re-enables awaitsfixed datemath tests (#38376 ) Previously, date formats of `YYYY.MM.dd` would hit an issue where the year would jump towards the end of the calendar year. This was an issue that had since been resolved in tests by using `yyyy` to be the more accurate representation of the year. Closes #37037.	2019-02-05 11:20:40 -08:00
Julie Tibshirani	3ce7d2c9b6	Make sure to reject mappings with type _doc when include_type_name is false. (#38270 ) `CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when deserializing index creation requests, accidentally accepts mappings that are nested twice under the type key (as described in the bug report #38266). This in turn causes us to be too lenient in parsing typeless mappings. In particular, we accept the following index creation request, even though it should not contain the type key `_doc`: ``` PUT index?include_type_name=false { "mappings": { "_doc": { "properties": { ... } } } } ``` There is a similar issue for both 'put templates' and 'put mappings' requests as well. This PR makes the minimal changes to detect and reject these typed mappings in requests. It does not address #38266 generally, or attempt a larger refactor around types in these server-side requests, as I think this should be done at a later time.	2019-02-05 10:52:32 -08:00
David Turner	f2dd5dd6eb	Remove DiscoveryPlugin#getDiscoveryTypes (#38414 ) With this change we no longer support pluggable discovery implementations. No known implementations of `DiscoveryPlugin` actually override this method, so in practice this should have no effect on the wider world. However, we were using this rather extensively in tests to provide the `test-zen` discovery type. We no longer need a separate discovery type for tests as we no longer need to customise its behaviour. Relates #38410	2019-02-05 17:42:24 +00:00
David Turner	b7ab521eb1	Throw AssertionError when no master (#38432 ) Today we throw a fatal `RuntimeException` if an exception occurs in `getMasterName()`, and this includes the case where there is currently no master. However, sometimes we call this method inside an `assertBusy()` in order to allow for a cluster that is in the process of stabilising and electing a master. The trouble is that `assertBusy()` only retries on an `AssertionError` and not on a general `RuntimeException`, so the lack of a master is immediately fatal. This commit fixes the issue by asserting there is a master, triggering a retry if there is not. Fixes #38331	2019-02-05 17:11:20 +00:00
Armin Braun	2f6afd290e	Fix Concurrent Snapshot Ending And Stabilize Snapshot Finalization (#38368 ) * The problem in #38226 is that in some corner cases multiple calls to `endSnapshot` were made concurrently, leading to non-deterministic behavior (`beginSnapshot` was triggering a repository finalization while one that was triggered by a `deleteSnapshot` was already in progress) * Fixed by: * Making all `endSnapshot` calls originate from the cluster state being in a "completed" state (apart from on short-circuit on initializing an empty snapshot). This forced putting the failure string into `SnapshotsInProgress.Entry`. * Adding deduplication logic to `endSnapshot` * Also: * Streamlined the init behavior to work the same way (keep state on the `SnapshotsService` to decide which snapshot entries are stale) * closes #38226	2019-02-05 16:44:18 +01:00
Lee Hinman	d862453d68	Support unknown fields in ingest pipeline map configuration (#38352 ) We already support unknown objects in the list of pipelines, this changes the `PipelineConfiguration` to support fields other than just `id` and `config`. Relates to #36938	2019-02-05 07:52:17 -07:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350 ) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333 ) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yogesh Gaikwad	fe36861ada	Add support for API keys to access Elasticsearch (#38291 ) X-Pack security supports built-in authentication service `token-service` that allows access tokens to be used to access Elasticsearch without using Basic authentication. The tokens are generated by `token-service` based on OAuth2 spec. The access token is a short-lived token (defaults to 20m) and refresh token with a lifetime of 24 hours, making them unsuitable for long-lived or recurring tasks where the system might go offline thereby failing refresh of tokens. This commit introduces a built-in authentication service `api-key-service` that adds support for long-lived tokens aka API keys to access Elasticsearch. The `api-key-service` is consulted after `token-service` in the authentication chain. By default, if TLS is enabled then `api-key-service` is also enabled. The service can be disabled using the configuration setting. The API keys:- - by default do not have an expiration but expiration can be configured where the API keys need to be expired after a certain amount of time. - when generated will keep authentication information of the user that generated them. - can be defined with a role describing the privileges for accessing Elasticsearch and will be limited by the role of the user that generated them - can be invalidated via invalidation API - information can be retrieved via a get API - that have been expired or invalidated will be retained for 1 week before being deleted. The expired API keys remover task handles this. Following are the API key management APIs:- 1. Create API Key - `PUT/POST /_security/api_key` 2. Get API key(s) - `GET /_security/api_key` 3. Invalidate API Key(s) `DELETE /_security/api_key` The API keys can be used to access Elasticsearch using `Authorization` header, where the auth scheme is `ApiKey` and the credentials, is the base64 encoding of API key Id and API key separated by a colon. 
Example:- ``` curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health ``` Closes #34383	2019-02-05 14:21:57 +11:00
Christoph Büscher	d255303584	Add typeless client side GetIndexRequest calls and response class (#37778 ) The HLRC client currently uses `org.elasticsearch.action.admin.indices.get.GetIndexRequest` and `org.elasticsearch.action.admin.indices.get.GetIndexResponse` in its get index calls. Both request and response are designed for the typed APIs, including some return types e.g. for `getMappings()` which in the maps it returns still use a level including the type name. In order to change this without breaking existing users of the HLRC API, this PR introduces two new request and response objects in the `org.elasticsearch.client.indices` client package. These are used by the IndicesClient#get and IndicesClient#exists calls now by default and support the type-less API. The old request and response objects are still kept for use in similarly named, but deprecated methods. The newly introduced client side classes are simplified versions of the server side request/response classes since they don't need to support wire serialization, and only the response needs fromXContent parsing (but no xContent-serialization, since this is the responsibility of the server-side class). Also changing the return type of `GetIndexResponse#getMapping` to `Map<String, MappingMetaData> getMappings()`, while it previously was returning another map keyed by the type-name. Similar getters return simple Maps instead of the ImmutableOpenMaps that the server side response objects return.	2019-02-05 03:41:05 +01:00
Gordon Brown	292e0f6fb7	Deprecate `_type` in simulate pipeline requests (#37949 ) As mapping types are being removed throughout Elasticsearch, the use of `_type` in pipeline simulation requests is deprecated. Additionally, the default `_type` used if one is not supplied has been changed to `_doc` for consistency with the rest of Elasticsearch.	2019-02-04 16:11:44 -07:00
Christoph Büscher	0ced775389	Mute RareClusterStateIT.testDelayedMappingPropagationOnReplica (#38357 )	2019-02-04 22:30:34 +01:00
Mayya Sharipova	641704464d	Deprecate types in rollover index API (#38039 ) Relates to #35190	2019-02-04 16:07:45 -05:00
Zachary Tong	ab1150378b	Add Composite to AggregationBuilders (#38207 )	2019-02-04 13:47:04 -05:00
David Turner	2c1eab2b8a	Clarify slow cluster-state log messages (#38302 ) The message `... took [31s] above the warn threshold of 30s` suggests incorrectly that the task took 61 seconds. This commit adds the clarifying words `which is`.	2019-02-04 17:44:00 +00:00
Andrey Ershov	7bc8bc9605	ensureGreen (#38324 )	2019-02-04 16:36:04 +01:00
Jason Tedor	625d37a26a	Introduce retention lease background sync (#38262 ) This commit introduces a background sync for retention leases. The idea here is that we do a heavyweight sync when adding a new retention lease, and then periodically we want to background sync any retention lease renewals to the replicas. As long as the background sync interval is significantly lower than the extended lifetime of a retention lease, it is okay if from time to time a replica misses a sync (it will still have an older version of the lease that is retaining more data as we assume that renewals do not decrease the retaining sequence number). There are two follow-ups that will come after this commit. The first is to address the fact that we have not adapted the should periodically flush logic to possibly flush the retention leases. We want to do something like flush if we have not flushed in the last five minutes and there are renewed retention leases since the last time that we flushed. An additional follow-up will remove the syncing of retention leases when a retention lease expires. Today this sync could be invoked in the background by a merge operation. Rather, we will move the syncing of retention lease expiration to be done under the background sync. The background sync will use the heavyweight sync (write action) if a lease has expired, and will use the lightweight background sync (replication action) otherwise.	2019-02-04 10:35:29 -05:00
Christoph Büscher	5ee7232379	Mute SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes (#38334 )	2019-02-04 16:10:06 +01:00
Christoph Büscher	715e581378	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38330 )	2019-02-04 15:46:19 +01:00
Boaz Leskes	e49b593c81	Move TokenService to seqno powered cas (#38311 ) Relates #37872 Relates #10708	2019-02-04 15:25:41 +01:00
Yannick Welsch	ece8c659c5	Decrease leader and follower check timeout (#38298 ) Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still being a very long time for a node to be completely unresponsive.	2019-02-04 15:11:12 +01:00
Przemyslaw Gomulka	9b64558efb	Migrating from joda to java.time. Watcher plugin (#35809 ) part of the migrating joda time work. Migrating watcher plugin to use JDK's java-time refers #27330	2019-02-04 15:08:31 +01:00
Alexander Reelsen	87f3579125	Add nanosecond field mapper (#37755 ) This adds a dedicated field mapper that supports nanosecond resolution - at the price of a reduced date range. When using the date field mapper, the time is stored as milliseconds since the epoch in a long in lucene. This field mapper stores the time in nanoseconds since the epoch - which means its range is much smaller, ranging roughly from 1970 to 2262. Note that aggregations will still be in milliseconds. However docvalue fields will have full nanosecond resolution Relates #27330	2019-02-04 11:31:16 +01:00
Christoph Büscher	15510da2af	Mute SharedClusterSnapshotRestoreIT#testAbortedSnapshotDuringInitDoesNotStart (#38304 )	2019-02-04 10:41:35 +01:00
David Turner	1d82a6d9f9	Deprecate unused Zen1 settings (#38289 ) Today the following settings in the `discovery.zen` namespace are still used: - `discovery.zen.no_master_block` - `discovery.zen.hosts_provider` - `discovery.zen.ping.unicast.concurrent_connects` - `discovery.zen.ping.unicast.hosts.resolve_timeout` - `discovery.zen.ping.unicast.hosts` This commit deprecates all other settings in this namespace so that they can be removed in the next major version.	2019-02-04 08:52:08 +00:00
Armin Braun	4561f425db	Remove Redundant Loop in SnapshotShardsService (#38283 ) * This was a merge mistake on my end, I think; obviously we only need to loop over the shards once, not twice, here to find those that we missed in INIT state	2019-02-04 09:06:39 +01:00
Alpar Torok	d58e899d45	Remove empty service files (#38192 )	2019-02-04 10:05:04 +02:00
Jason Tedor	d2cc1459a3	Fix ordering problem in add or renew lease test (#38280 ) We have to set the primary term before we add a retention lease, otherwise we can not assert the correct primary term.	2019-02-03 12:54:31 -05:00
Christoph Büscher	6ca7a913ea	Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (#38275 )	2019-02-03 12:54:13 +01:00
Armin Braun	89d7c57bd9	Fix Incorrect Transport Response Handler Type (#38264 ) * Fix Incorrect Transport Response Handler Type * The response type here is not empty and was always wrong but this only became visible now that `0a604e3b24` was introduced * As a result of `0a604e3b24` we started actually handling the response of this request and logging/handling exceptions before that we simply dropped the classcast exception here quietly using the empty response handler * fix busy assert not handling `Exception` * Closes #38226 * Closes #38256	2019-02-03 08:48:15 +01:00
Nhat Nguyen	0861dc3581	Mute testCanRunUnsafeBootstrapAfterErroneousDetachWithoutLoosingMetaData (#38268 ) Tracked at #38267	2019-02-02 20:02:21 -05:00
Christoph Büscher	50cdc61874	Mute DedicatedClusterSnapshotRestoreIT#testRestoreShrinkIndex (#38257 )	2019-02-02 13:46:29 +01:00
David Turner	c311062476	Add CoordinatorTests for empty unicast hosts list (#38209 ) Today we have DiscoveryDisruptionIT tests for checking that discovery can still work once the cluster has formed, even if the cluster is misconfigured and only has a single master-eligible node in its unicast hosts list. In fact with Zen2 we can go one better: we do not need any nodes in the unicast hosts list, because nodes also use the contents of the last-committed cluster state for discovery. Additionally, the DiscoveryDisruptionIT tests were failing due to the overenthusiastic fault-detection timeouts. This commit replaces these tests with deterministic `CoordinatorTests` that verify the same behaviour. It also removes some duplication by extracting a test method called `testFollowerCheckerAfterMasterReelection()` Closes #37687	2019-02-02 07:54:56 +00:00
Nhat Nguyen	80d3092292	Fix primary term in testAddOrRenewRetentionLease (#38239 ) We should increase primary term before renewing leases; otherwise, the term of the latest RetentionLeases will be lower than the current term. Relates #37951	2019-02-02 02:38:53 -05:00
Nhat Nguyen	1ec04dff43	Fix testReplicaIgnoresOlderRetentionLeasesVersion (#38246 ) If the innerLength is 0, the version won't be increased; then there will be two RetentionLeases with the same term and version, but their leases are different. Relates #37951 Closes #38245	2019-02-02 02:37:37 -05:00
Nhat Nguyen	8bee5b8e06	Mute testAddOrRenewRetentionLease (#38240 ) Relates #38239	2019-02-01 21:27:10 -05:00
Boaz Leskes	f6e06a2b19	Adapt minimum versions for seq# powered operations in Watch related requests and UpdateRequest (#38231 ) After backporting #37977, #37857 and #37872	2019-02-01 20:37:16 -05:00
Jason Tedor	f181e17038	Introduce retention leases versioning (#37951 ) Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.	2019-02-01 17:19:19 -05:00
Nhat Nguyen	9c39dea7ae	AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (#38227 ) Tracked at #38226	2019-02-01 16:24:02 -05:00
Armin Braun	03a1d21070	SnapshotShardsService Simplifications (#38025 ) * Instead of replacing the `shardSnapshots` field, we mutate it, explicitly removing entries from it in only a single spot * Decreased the amount of indirection by moving all logic for starting a snapshot's newly discovered shard tasks into `startNewShards` (saves us two maps (keyed by snapshot) and iterations over them)	2019-02-01 20:46:14 +01:00
Luca Cavanna	ee57420de6	Adjust SearchRequest version checks (#38181 ) The finalReduce flag is now supported on 6.x too, hence we need to update the version checks in master.	2019-02-01 19:23:13 +01:00
Andrey Ershov	04dc41b99e	Zen2ify RareClusterStateIT (#38184 ) In Zen 1 there are a commit timeout and a publish timeout, and these settings can be changed on the fly. In Zen 2 there is only a commit timeout, and this setting is static. RareClusterStateIT actively uses these settings and relies on the fact that they are dynamic. This commit adds a cancelCommitedPublication method to Coordinator to be used by tests. This method cancels the current committed publication, if there is one. When there is a BlockClusterStateProcessing on the non-master node, the publication will be accepted and committed, but not yet applied, so we can use the method above to cancel it. Also, this commit replaces callback + AtomicReference with ActionFuture, which makes the test code easier to read.	2019-02-01 18:18:11 +01:00
Yannick Welsch	025bf28405	Fix _host based require filters (#38173 ) Using index.routing.allocation.require._host does not correctly work because the boolean logic in filter matching is broken (DiscoveryNodeFilters.match(...) will return false) when opType ==OpType.AND	2019-02-01 16:02:37 +01:00
Tanguy Leroux	da6269b456	RestoreService should update primary terms when restoring shards of existing indices (#38177 ) When restoring shards of existing indices, the RestoreService also restores the values of primary terms stored in the snapshot index metadata. The primary terms are not updated and could potentially conflict with current index primary terms if the restored primary terms are lower than the existing ones. This situation is likely to happen with replicated closed indices (because primary terms are increased when the index is transitioning from open to closed state, and the snapshotted primary terms are the one at the time the index was opened) (see #38024) and maybe also with CCR. This commit changes the RestoreService so that it updates the primary terms using the maximum value between the snapshotted values and the existing values. Related to #33888	2019-02-01 15:59:11 +01:00
Desmond Vehar	c1c4abae10	Throw if two inner_hits have the same name (#37645 ) This change throws an error if two inner_hits have the same name Closes #37584	2019-02-01 15:53:50 +01:00
Alexander Reelsen	35ed137684	Ensure joda compatibility in custom date formats (#38171 ) If custom date formats are used, there may be combinations that the new performant DateFormatters.from() method has not covered yet. This adds a few such corner cases and ensures the tests are correctly commented out.	2019-02-01 15:42:56 +01:00
Jim Ferenczi	66e4fb4fb6	Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (#38169 ) In #38158 we ensured that global ordinals are not loaded when another execution hint is explicitly set on the source. This change is a follow up that addresses a comment `dd6043c1c0 (r252984782)` added after the merge.	2019-02-01 15:32:19 +01:00
Nhat Nguyen	2e475d63f7	Do not set timeout for IndexRequests in GatewayIndexStateIT (#38147 ) CI might not be fast enough to publish a dynamic mapping update within 100ms.	2019-02-01 09:30:03 -05:00
Andrey Ershov	c1270e97b0	Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (#38178 ) In Zen2 cluster bootstrap is required and some parameters are called differently in Zen2.	2019-02-01 15:24:08 +01:00
Andrey Ershov	bda591453c	Add elasticsearch-node detach-cluster command (#37979 ) This commit adds the second part of `elasticsearch-node` tool - `detach-cluster` command in addition to `unsafe-bootstrap` command. Also, this commit changes the semantics of `unsafe-bootstrap`, now `unsafe-bootstrap` changes clusterUUID. So the algorithm of running `elasticsearch-node` tool is the following: 1) Stop all nodes in the cluster. 2) Pick master-eligible node with the highest (term, version) pair and run the `unsafe-bootstrap` command on it. If there are no survived master-eligible nodes - skip this step. 3) Run `detach-cluster` command on the remaining survived nodes. Detach cluster makes the following changes to the node metadata: 1) Sets clusterUUID committed to false. 2) Sets currentTerm and term to 0. 3) Removes voting tombstones and sets voting configurations to special constant MUST_JOIN_ELECTED_MASTER, that prevents initial cluster bootstrap. `ElasticsearchNodeCommand` base abstract class is introduced, because `UnsafeBootstrapMasterCommand` and `DetachClusterCommand` have a lot in common. Also, this commit adds "ordinal" parameter to both commands, because it's impossible to write IT otherwise. For MUST_JOIN_ELECTED_MASTER case special handling is introduced in `ClusterFormationFailureHelper`. Tests for both commands reside in `ElasticsearchNodeCommandIT` (renamed from `UnsafeBootstrapMasterIT`).	2019-02-01 14:53:55 +01:00
Alexander Reelsen	979e5576e5	Add tests for fractional epoch parsing (#38162 ) Fractional epoch parsing is supported, the tests we used were edge cases that did not make sense. This adds tests to properly check for this.	2019-02-01 14:48:37 +01:00
Tanguy Leroux	029e4b6278	Clear send behavior rule in CloseWhileRelocatingShardsIT (#38159 ) The current CloseWhileRelocatingShardsIT test adds some "send behavior" rule to a target node's mocked transport service in order to detect when shard relocating are started. These rules are never cleared and prevent the test to complete normally after the rebalance is re-enabled again. This commit changes the test so that rules are cleared and most verifications are done before the rebalance is reenabled again. Closes #38090	2019-02-01 12:58:46 +01:00
Yannick Welsch	ce469cfda5	Fix testCorruptedIndex (#38161 ) Folks at the Lucene project do not seem to be interested in classifying corruptions and distinguishing them from file-system exceptions (see https://issues.apache.org/jira/browse/LUCENE-8525), so we'll just cop out as well. Closes #34322	2019-02-01 12:51:38 +01:00
Luca Cavanna	e18cac3659	Add finalReduce flag to SearchRequest (#38104 ) With #37000 we made sure that final reduction is automatically disabled whenever a localClusterAlias is provided with a SearchRequest. While working on #37838, we found a scenario where we do need to set a localClusterAlias yet we would like to perform a final reduction in the remote cluster: when searching on a single remote cluster. Relates to #32125 This commit adds support for a separate finalReduce flag to SearchRequest and makes use of it in TransportSearchAction in case we are searching against a single remote cluster. This also makes sure that num_reduce_phases is correct when searching against a single remote cluster: it makes little sense to return `num_reduce_phases` set to `2`, which looks especially weird in case the search was performed against a single remote shard. We should perform one reduction phase only in this case and `num_reduce_phases` should reflect that. * line length	2019-02-01 12:11:42 +01:00
Jim Ferenczi	6fa93ca493	Forbid negative field boosts in analyzed queries (#37930 ) This change forbids negative field boost in the `query_string`, `simple_query_string` and `multi_match` queries. Negative boosts are not allowed in Lucene 8 (scores must be positive). The backport of this change to 6x will turn the error into a deprecation warning in order to raise the awareness of this breaking change in 7.0. Closes #33309	2019-02-01 11:41:40 +01:00
Jim Ferenczi	57b1d245e8	Remove AtomicFieldData#getLegacyFieldValues (#38087 ) This function is unused now that we format the docvalue fields with the default formatter on the field (#30831)	2019-02-01 11:41:17 +01:00
Andrey Ershov	bfd618cf83	Universal cluster bootstrap method for tests with autoMinMasterNodes=false (#38038 ) Currently, there are a few tests that use autoMinMasterNodes=false and hence override addExtraClusterBootstrapSettings, mostly this is 10-30 lines of codes that are copy-pasted from class to class. This PR introduces `InternalTestCluster.setBootstrapMasterNodeIndex` which is suitable for all classes and copy-paste could be removed. Removing code is always a good thing!	2019-02-01 11:34:31 +01:00
Jim Ferenczi	b7308aa03c	Don't load global ordinals with the `map` execution_hint (#37833 ) The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However this should be avoided if the execution_hint is explicitly set to map since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimation is only used if the execution hint is set to map, otherwise the global ordinals are still used to retrieve the accurate cardinality. Closes #37705	2019-02-01 09:35:46 +01:00
David Turner	23f00e3676	Relax fault detector in some disruption tests (#38101 ) Today we use `AbstractDisruptionTestCase` to test the behaviour of things like master elections in the presence of cluster disruptions. These tests have rather enthusiastic fault detection settings, detecting a fault if a single ping fails, with a one-second timeout. Furthermore there are some tests that assert the identity of the master remains unchanged during some disruption, and these assertions fail rather often thanks to the overly sensitive fault detector. However in a number of these tests the fault detector need not be this sensitive. This commit moves some such tests into their own test suite and uses more sensible fault-detection settings to avoid the kind of master instability that is causing CI failures. Closes #37699	2019-02-01 08:10:49 +00:00
Alexander Reelsen	c02cd3e2fd	Fix java time epoch date formatters (#37829 ) The self written epoch date formatters were not properly able to format an Instant to a string due to a misconfiguration. This fix also removes a until now existing runtime behaviour under java 8 regarding the names of the aggregation buckets, which are now the same as before and have been under java 11.	2019-02-01 09:03:48 +01:00
Yannick Welsch	859e2f5bc8	Adapt timeouts in UpdateMappingIntegrationIT Relates to #37263 and possibly #36916	2019-02-01 08:58:31 +01:00
Adrien Grand	d83c748417	Fix test bug in DynamicMappingsIT. (#37906 ) Closes #37898	2019-02-01 08:35:29 +01:00
Przemyslaw Gomulka	2758578570	Trim the JSON source in indexing slow logs (#38081 ) A '{' as the first character of a log line causes problems for Beats when parsing plaintext logs. This can happen if the submitted document has an additional '\n' at the beginning and we are not reformatting. Trimming the source part of a SlowLog solves that and keeps the logs readable. Closes #38080	2019-02-01 08:12:12 +01:00
Armin Braun	0a604e3b24	Fix Two Races that Lead to Stuck Snapshots (#37686 ) * Fixes two broken spots: 1. Master failover while deleting a snapshot that has no shards will get stuck if the new master finds the 0-shard snapshot in `INIT` when deleting 2. Aborted shards that were never seen in `INIT` state by the `SnapshotsShardService` will not be notified as failed, leading to the snapshot staying in `ABORTED` state and never getting deleted with one or more shards stuck in `ABORTED` state * Tried to make fixes as short as possible so we can backport to `6.x` with the least amount of risk * Significantly extended test infrastructure to reproduce the above two issues * Two new test runs: 1. Reproducing the effects of node disconnects/restarts in isolation 2. Reproducing the effects of disconnects/restarts in parallel with shard relocations and deletes * Relates #32265 * Closes #32348	2019-02-01 05:45:40 +01:00
Nhat Nguyen	b8b843476d	Disable dynamic mapping in testSimpleGetFieldMappingsWithDefaults (#38045 ) Since #31140 we no longer require acking on the dynamic mapping of index requests. Thus, a returned mapping from a get mapping request does not necessarily contain the dynamic updates from the index request. This commit replaces the dynamic mapping update with a manual put mapping. Relates #31140 Closes #37928	2019-01-31 21:01:41 -05:00
Nhat Nguyen	a8ebe2a217	Fix random params in testSoftDeletesRetentionLock (#38114 ) Since #37992 the retainingSequenceNumber is initialized with 0 while the global checkpoint can be -1. Relates #37992	2019-01-31 20:50:41 -05:00
Lee Hinman	c67a9663af	Fix MasterServiceTests.testClusterStateUpdateLogging (#38116 ) This changes the test to not use a `CountDownLatch`, instead adding an assertion for the final logging message and waiting until the `MockAppender` has seen it before proceeding. Related to df2c06f6f30f7e23a6863a3f72fc3bdb7648885c Resolves #23739	2019-01-31 17:13:19 -07:00
Yuri Astrakhan	f3cde06a1d	geotile_grid implementation (#37842 ) Implements `geotile_grid` aggregation This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240 This code uses the same base classes as `geohash_grid` agg, but uses a different hashing algorithm to allow zoom consistency. Each grid bucket is aligned to Web Mercator tiles.	2019-01-31 19:11:30 -05:00
Pascal Christoph	a3d9ba3f4b	Log document id when MapperParsingException occurs (#37800 ) Closes #37658	2019-01-31 16:33:13 -05:00
Nhat Nguyen	237fcda2cc	Disable dynamic mapping update in testTransportBulkTasks (#38073 ) If a replica does not have a right mapping yet, we will retry the index request on that replica; then the actual tasks is higher than the expected tasks. Since #31140 this happens more frequently for we no longer require acking on the dynamic mapping of index requests. Relates #31140 Closes #37893	2019-01-31 13:16:52 -05:00
Przemyslaw Gomulka	28b5c7ce78	Do not set up NodeAndClusterIdStateListener in test (#38110 ) When tests extending ESIntegTestCase are run on the same JVM, the static field in NodeAndClusterIdConverter will throw an AlreadySet exception. Overriding the configuration method Node.configureNodeAndClusterIdStateListener in MockNode prevents the listener registration from happening. Relates #32850	2019-01-31 18:59:40 +01:00
Nhat Nguyen	8e95780f98	Soft-deletes policy should always fetch latest leases (#37940 ) If a new retention lease is added while a primary's soft-deletes policy is locked for peer-recovery, that lease won't be baked into the Lucene commit. Relates #37165 Relates #37375	2019-01-31 12:02:57 -05:00
Henning Andersen	68ed72b923	Handle scheduler exceptions (#38014 ) Scheduler.schedule(...) would previously assume that caller handles exception by calling get() on the returned ScheduledFuture. schedule() now returns a ScheduledCancellable that no longer gives access to the exception. Instead, any exception thrown out of a scheduled Runnable is logged as a warning. This is a continuation of #28667, #36137 and also fixes #37708.	2019-01-31 17:51:45 +01:00
David Turner	7f738e8541	Minor logging improvements (#38084 ) Fixes some log messages that caused some minor confusion when digging through a log generated by a failing test.	2019-01-31 16:41:04 +00:00
Tal Levy	9923f0fe6a	fix a few versionAdded values in ElasticsearchExceptions (#37877 ) TooManyBucketsException was introduced in v6.2 and SnapshotInProgressException was introduced in v6.7	2019-01-31 08:28:20 -08:00
Tanguy Leroux	7a597cad0d	Reenable BWC tests after backport of #37899 (#38093 ) This commit adapts the version used in StartedShardEntry serialization after the backport of #37899 and reenables bwc tests. Related to #37899 Related to #38074	2019-01-31 16:53:28 +01:00
Henning Andersen	7487be3d3c	Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock	2019-01-31 15:31:01 +01:00
Jason Tedor	a9b12b38f0	Push primary term to replication tracker (#38044 ) This commit pushes the primary term into the replication tracker. This is a precursor to using the primary term to resolving ordering problems for retention leases. Namely, it can be that out-of-order retention lease sync requests arrive on a replica. To resolve this, we need a tuple of (primary term, version). For this to be, the primary term needs to be accessible in the replication tracker. As the primary term is part of the replication group anyway, this change conceptually makes sense.	2019-01-31 09:19:49 -05:00
Luca Cavanna	622fb7883b	Introduce ability to minimize round-trips in CCS (#37828 ) With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour. Relates to #32125	2019-01-31 15:12:14 +01:00
Armin Braun	ae9f4df361	Don't Assert Ack on when Publish Timeout is 0 in Test (#38077 ) * Publish timeout is set to `0` so out of order processing of states on the node can lead to a `false` ack response * See #30672 * Closes #36813	2019-01-31 14:35:11 +01:00
Alexander Reelsen	9f026bb8ad	Reduce object creation in Rounding class (#38061 ) This reduces objects creations in the rounding class (used by aggs) by properly creating the objects only once. Furthermore a few unneeded ZonedDateTime objects were created in order to create other objects out of them. This was changed as well. Running the benchmarks shows a much faster performance for all of the java time based Rounding classes.	2019-01-31 14:18:28 +01:00
Adrien Grand	a536fa7755	Treat put-mapping calls with `_doc` as a top-level key as typed calls. (#38032 ) Currently the put-mapping API assumes that because the type name is `_doc` then it is dealing with a typeless put-mapping call. Yet we still allow running the put-mapping API in a typed fashion with `_doc` as a type name. The current logic triggers surprising errors when doing a typed put-mapping call with `_doc` as a type name on an index that has a type already. This is a bit of a corner-case, but is more important on 6.x due to the fact that using the index API with `_doc` as a type name triggers typed calls to the put-mapping API with `_doc` as a type name.	2019-01-31 13:57:42 +01:00
David Turner	eadcb5f0f8	Fix size of rolling-upgrade bootstrap config (#38031 ) Zen2 nodes will bootstrap themselves once they believe there to be no remaining Zen1 master-eligible nodes in the cluster, as long as minimum_master_nodes is satisfied. Today the bootstrap configuration comprises just the ids of the known master-eligible nodes, and this might be too small to be safe. For instance, if there are 5 master-eligible nodes (so that minimum_master_nodes is 3) then the bootstrap configuration could comprise just 3 nodes, of which 2 form a quorum, and this does not intersect other quorums that might arise, leading to a split-brain. This commit fixes this by expanding the bootstrap configuration so that its quorums satisfy minimum_master_nodes, by adding some of the IDs of the other master-eligible nodes in the last-published cluster state.	2019-01-31 08:00:11 +00:00
Alexander Reelsen	b94acb608b	Speed up converting of temporal accessor to zoned date time (#37915 ) The existing implementation was slow due to exceptions being thrown if an accessor did not have a time zone. This implementation queries for having a timezone, local time and local date and also checks for an instant preventing to throw an exception and thus speeding up the conversion. This removes the existing method and create a new one named DateFormatters.from(TemporalAccessor accessor) to resemble the naming of the java time ones. Before this change an epoch millis parser using the toZonedDateTime method took approximately 50x longer. Relates #37826	2019-01-31 08:55:40 +01:00
Alexander Reelsen	160d1bd4dd	Work around JDK8 timezone bug in tests (#37968 ) The timezone GMT0 cannot be properly parsed on java8. The randomZone() method now excludes GMT0, if java8 is used. Closes #37814	2019-01-31 08:52:35 +01:00
Nhat Nguyen	f5398d6511	Mute testRetentionLeasesSyncOnExpiration Tracked at #37963	2019-01-31 00:57:27 -05:00
Jason Tedor	a6a534f1f0	Reenable BWC testing after retention lease stats (#38062 ) This commit adjusts the BWC version on retention leases in stats, so with this we also reenable BWC testing.	2019-01-30 20:34:27 -05:00
Tim Brooks	b88bdfe958	Add dispatching to `HandledTransportAction` (#38050 ) This commit allows implementors of the `HandledTransportAction` to specify what thread the action should be executed on. The motivation for this commit is that certain CCR requests should be performed on the generic threadpool.	2019-01-30 15:40:49 -07:00
Michael Basnight	945ad05d54	Update verify repository to allow unknown fields (#37619 ) The subparser in verify repository allows for unknown fields. This commit sets the value to true for the parser and modifies the test such that it accurately tests it. Relates #36938	2019-01-30 14:31:16 -06:00
David Turner	81c443c9de	Deprecate minimum_master_nodes (#37868 ) Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in tests, but for 7.x nodes this setting is not required as it has no effect. This commit removes this setting so that nodes are started with more realistic configurations, and deprecates it.	2019-01-30 20:09:15 +00:00
Armin Braun	a070b8acc0	Extract TransportRequestDeduplication from ShardStateAction (#37870 ) * Extracted the logic for master request duplication so it can be reused by the snapshotting logic * Removed custom listener used by `ShardStateAction` to not leak these into future users of this class * Changed semantics slightly to get rid of redundant instantiations of the composite listener * Relates #37686	2019-01-30 19:21:09 +01:00
Jason Tedor	6500b0cbd7	Expose retention leases in shard stats (#37991 ) This commit exposes retention leases via shard-level stats.	2019-01-30 13:20:40 -05:00
Jason Tedor	c468b2f7ca	Make primary terms fields private in index shard (#38036 ) This commit encapsulates the primary terms fields in index shard. This is a precursor to pushing the operation primary term down to the replication tracker.	2019-01-30 12:56:58 -05:00
Nhat Nguyen	ed460c2815	Log flush_stats and commit_stats in testMaybeFlush This test failed a few times over the last several months. It seems that we triggered a flush, but CI was too slow to finish it in several seconds. I added the flush stats and commit stats and unmuted this test. We should have a good clue if this test fails again. Relates #37896	2019-01-30 12:54:31 -05:00
Christoph Büscher	ecbaa38864	Remove deprecated Plugin#onModule extension points (#37866 ) Removes some guice index level extension point marked as @Deprecated since at least 6.0. They served as a signpost for plugin authors upgrading from 2.x but this shouldn't be relevant in 7.0 anymore.	2019-01-30 17:17:54 +01:00
Igor Motov	23805fa41a	Geo: Fix Empty Geometry Collection Handling (#37978 ) Fixes handling empty geometry collection and re-enables testParseGeometryCollection test. Fixes #37894	2019-01-30 09:20:30 -05:00
Luca Cavanna	b91d587275	Move SearchHit and SearchHits to Writeable (#37931 ) This allowed to make SearchHits immutable, while quite a few fields in SearchHit have to stay mutable unfortunately. Relates to #34389	2019-01-30 12:05:54 +01:00
Jason Tedor	ba285a56a7	Fix limit on retaining sequence number (#37992 ) We only assign non-negative sequence numbers to operations, so the lower limit on retaining sequence numbers should be that it is non-negative only.	2019-01-30 05:25:17 -05:00
Alexander Reelsen	9ec4abc31e	Ensure date parsing BWC compatibility (#37929 ) In order to retain BWC this changes the java date formatters to be able to parse nanoseconds resolution, even if only milliseconds are supported. This used to work on joda time as well so that a user could store a date like `2018-10-03T14:42:44.613469+0000` and then just lose the precision on anything lower than millisecond level.	2019-01-30 10:47:12 +01:00
Adrien Grand	c8af0f4bfa	Use mappings to format doc-value fields by default. (#30831 ) Doc-value fields now return a value that is based on the mappings rather than the script implementation by default. This deprecates the special `use_field_mapping` docvalue format which was added in #29639 only to ease the transition to 7.x and it is not necessary anymore in 7.0.	2019-01-30 10:31:51 +01:00
Adrien Grand	b63b50b945	Give precedence to index creation when mixing typed templates with typeless index creation and vice-versa. (#37871 ) Currently if you mix typed templates and typeless index creation or typeless templates and typed index creation then you will end up with an error because Elasticsearch tries to create an index that has multiple types: `_doc` and the explicit type name that you used. This commit proposes to give precedence to the index creation call so that the type from the template will be ignored if the index creation call is typeless while the template is typed, and the type from the index creation call will be used if there is a typeless template. This is consistent with the fact that index creation already "wins" if a field is defined differently in the index creation call and in a template: the definition from the index creation call is used in such cases. Closes #37773	2019-01-30 10:28:24 +01:00
Jim Ferenczi	2732bb5cf3	Fix fetch source option in expand search phase (#37908 ) This change fixes the copy of the fetch source option into the expand search request that is used to retrieve the documents of each collapsed group. Closes #23829	2019-01-30 08:46:14 +01:00
Jim Ferenczi	5dcc805dc9	Restore a noop _all metadata field for 6x indices (#37808 ) This commit restores a noop version of the AllFieldMapper that is instantiated only for indices created in 6x. We need this metadata field mapper to be present in this version in order to allow the upgrade of indices that explicitly disable _all (enabled: false). The mapping of these indices contains a reference to the _all field that we cannot remove in 7 so we'll need to keep this metadata mapper in 7x. Since indices created in 6x will not be compatible with 8, we'll remove this noop mapper in the next major version. Closes #37429	2019-01-30 08:45:50 +01:00
Marios Trivyzas	f5b9b4d89c	Add version 6.6.1 (#37975 )	2019-01-30 15:33:01 +11:00
markharwood	b889221f75	Types removal - deprecate include_type_name with index templates (#37484 ) Added deprecation warnings for use of include_type_name in put/get index templates. HLRC changes: GetIndexTemplateRequest has a new client-side class which is a copy of server's GetIndexTemplateResponse but modified to be typeless. PutIndexTemplateRequest has a new client-side counterpart which doesn't use types in the mappings Relates to #35190	2019-01-29 20:52:41 +00:00
jimczi	193017672a	Handle completion suggestion without contexts This change fixes the handling of completion suggestion without contexts. Relates #36996	2019-01-29 20:31:46 +01:00
Tim Brooks	00ace369af	Use `CcrRepository` to init follower index (#35719 ) This commit modifies the put follow index action to use a CcrRepository when creating a follower index. It routes the logic through the snapshot/restore process. A wait_for_active_shards parameter can be used to configure how long to wait before returning the response.	2019-01-29 11:47:29 -07:00
Albert Zaharovits	d05a4b9d14	Get Aliases with wildcard exclusion expression (#34230 ) This commit adds the code in the HTTP layer that will parse exclusion wildcard expressions. The existing code issues 404s for wildcards as well as explicit indices. But, in general, in an expression with exclude wildcards (-...*) following other include wildcards, there is no way to tell if the include wildcard produced no results or they were subsequently excluded. Therefore, the proposed change is breaking the behavior of 404s for wildcards. Specifically, no 404s will be returned for wildcards, even if they are not followed by exclude wildcards or the exclude wildcards could not possibly exclude what has previously been included. Only explicitly requested aliases will be called out as missing.	2019-01-29 18:56:20 +02:00
Boaz Leskes	218df3009a	Move update and delete by query to use seq# for optimistic concurrency control (#37857 ) The delete and update by query APIs both offer protection against overriding concurrent user changes to the documents they touch. They currently are using internal versioning. This PR changes that to rely on sequences numbers and primary terms. Relates #37639 Relates #36148 Relates #10708	2019-01-29 10:23:05 -05:00
Yannick Welsch	3c9f7031b9	Enforce cluster UUIDs (#37775 ) This commit adds join validation around cluster UUIDs, preventing a node from joining a cluster if it was previously part of another cluster. The commit introduces a new flag to the cluster state, clusterUUIDCommitted, which denotes whether the node has locked into a cluster with the given uuid. When a cluster is committed, this flag will turn to true, and subsequent cluster state updates will keep the information about committal. Note that coordinating-only nodes are still free to switch clusters at will (after restart), as they don't carry any persistent state.	2019-01-29 15:41:05 +01:00
Luca Cavanna	09a11a34ef	Remove clusterAlias instance member from QueryShardContext (#37923 ) The clusterAlias member is only used in the copy constructor, to be able to reconstruct the fully qualified index. It is also possible to remove the instance member and add a private constructor that accepts the already built Index object which contains the cluster alias.	2019-01-29 15:31:49 +01:00
Boaz Leskes	65a9b61a91	Add Seq# based optimistic concurrency control to UpdateRequest (#37872 ) The update request has a lesser known support for a one off update of a known document version. This PR adds a seq# based alternative to power these operations. Relates #36148 Relates #10708	2019-01-29 09:18:05 -05:00
Tanguy Leroux	5d1964bcbf	Ignore shard started requests when primary term does not match (#37899 ) This commit changes the StartedShardEntry so that it also contains the primary term of the shard to start. This way the master node can also check that the primary term from the start request is equal to the current shard's primary term in the cluster state, and it can ignore any shard started request that would concern a previous instance of the shard that would have been allocated to the same node. Such situations are likely to happen with frozen (or restored) indices and the replication of closed indices, because with replicated closed indices the shards will be initialized again after the index is closed and can potentially be re-initialized again if the index is reopened as a frozen index. In such cases the lifecycle of the shards would be something like: * shard is STARTED * index is closed * shard is INITIALIZING (index state is CLOSED, primary term is X) * index is reopened * shard is INITIALIZING again (index state is OPENED, potentially frozen, primary term is X+1) Adding the primary term to the shard started request makes it possible to discard potential StartedShardEntry requests received by the master node if the request concerns the shard with primary term X because it has been moved/reinitialized in the meanwhile under the primary term X+1. Relates to #33888	2019-01-29 15:09:40 +01:00
Luca Cavanna	2325fb9cb3	Remove test only SearchShardTarget constructor (#37912 ) Remove SearchShardTarget test only constructor and replace all the usages with calls to the other constructor that accepts a ShardId.	2019-01-29 14:58:11 +01:00
Luca Cavanna	42eec55837	Replace failure.get().addSuppressed with failure.accumulateAndGet() (#37649 ) Also add a test for concurrent incoming failures	2019-01-29 14:57:33 +01:00
Luca Cavanna	a6d4838a67	Clean up allowPartialSearchResults serialization (#37911 ) When serializing allowPartialSearchResults to the shards through ShardSearchTransportRequest, we use an optional boolean field, though the corresponding instance member is declared `boolean` which can never be null. We also have an assert to verify that the incoming search request provides a non-null value for the flag, and a comment explaining that null should be considered a bug. This commit makes the allowPartialSearchResults method in ShardSearchRequest return a `boolean` rather than a `Boolean` and changes the serialization from optional to non optional, in a bw comp manner.	2019-01-29 14:56:22 +01:00
Tanguy Leroux	460f10ce60	Close Index API should force a flush if a sync is needed (#37961 ) This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a forced flush, forcing the translog and the Lucene commit to contain the same max seq number and global checkpoint in the case the Translog contains operations that were not written in the IndexWriter (like a Delete that touches a non existing doc). This way the assertion added in #37426 won't trip. Related to #33888	2019-01-29 13:15:58 +01:00
Yannick Welsch	504a89feaf	Step down as master when configured out of voting configuration (#37802 ) Abdicates to another master-eligible node once the active master is reconfigured out of the voting configuration, for example through the use of voting configuration exclusions. Follow-up to #37712	2019-01-29 12:43:04 +01:00
Yannick Welsch	827c4f6567	Make Version.java aware of 6.x Lucene upgrade Relates to #37913	2019-01-29 10:44:01 +01:00
Przemyslaw Gomulka	891320f5ac	Elasticsearch support to JSON logging (#36833 ) In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine. To populate additional fields node.id and cluster.uuid which are not available at start time, a cluster state update will have to be received and the values passed to log4j pattern converter. A ClusterStateObserver.Listener is used to receive only one ClusterStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter. Following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace see ESJsonLayout.java for more details and field descriptions Docker log4j2 configuration is now almost the same as the one used for ES binary. The only difference is that docker is using console appenders, whereas ES is using file appenders. relates: #32850	2019-01-29 07:20:09 +01:00
Nhat Nguyen	9ceb218d85	Adjust bwc version for put mapping requests Relates #37675	2019-01-28 10:57:11 -05:00
Armin Braun	0d109396fa	Increase Timeout in #testSnapshotCanceled (#37890 ) * The test failure reported in the issue looks like a mere timeout. Logging suggests that the snapshot completes/aborts correctly but the busy loop polling the snapshot state times out too early. * Closes #37888	2019-01-28 14:13:02 +01:00
Luca Cavanna	a9adc16922	Mute failing SearchQueryIT test Relates to #37814	2019-01-28 13:41:13 +01:00
Alpar Torok	64b98db973	Add an alias for :server:integTest so it runs as part of internalClusterTest (#37910 )	2019-01-28 14:26:22 +02:00
Jason Tedor	194cdfe208	Sync retention leases on expiration (#37902 ) This commit introduces a sync of retention leases when a retention lease expires. As expiration of retention leases is lazy, their expiration is managed only when getting the current retention leases from the replication tracker. At this point, we callback to our full retention lease sync to sync and flush these on all shard copies. With this change, replicas do not locally manage expiration of retention leases; instead, that is done only on the primary.	2019-01-28 07:11:51 -05:00
Tanguy Leroux	758eb9d451	Track accurate total hits in CloseIndexIT The test was not using the TRACK_TOTAL_HITS_ACCURATE and thus encountered a different issue tracked in #37907. In the meanwhile we can adapt the test to not fail anymore. Closes #37897	2019-01-28 11:30:20 +01:00
Martijn van Groningen	4e1a779773	Prepare ShardFollowNodeTask to bootstrap when it falls behind leader shard (#37562 ) * Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing. * Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved. * Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metadata header. Relates to #35975	2019-01-28 09:30:04 +01:00
Jim Ferenczi	a056804831	Track total hits in tests that index more than 10,000 docs This change sets track_total_hits to true on a test that requires to check the total hits of a query that can return more than 10,000 docs. Closes #37895	2019-01-28 09:24:32 +01:00
Dimitrios Liappis	290c6637c2	Refactor into appropriate uses of scheduleUnlessShuttingDown (#37709 ) Replace `threadPool().schedule()` / catch `EsRejectedExecutionException` pattern with direct calls to `ThreadPool#scheduleUnlessShuttingDown()`. Closes #36318	2019-01-28 10:01:26 +02:00
Julie Tibshirani	b1735aa93b	Support both typed and typeless 'get mapping' requests in the HLRC. (#37796 ) From previous PRs, we've already added support for include_type_name to the get mapping API. We had also taken an approach to the HLRC where the server-side `GetMappingResponse#fromXContent` could only handle typeless input. This PR updates the HLRC for 'get mapping' to be in line with our new approach: * Add a typeless 'get mappings' method to the Java HLRC, that accepts new client-side request and response objects. This new response only handles typeless mapping definitions. * Switch the old version of `GetMappingResponse` back to expecting typed mappings, and deprecate the corresponding method on the HLRC. Finally, the PR also does some small, related clean-up around 'get field mappings'.	2019-01-27 16:02:22 -08:00
Jason Tedor	f24dce1122	Fix newlines in retention lease sync action tests There is a method invocation here spanning multiple lines. This commit breaks it up into a line per parameter as this is friendlier to future changes and diffs.	2019-01-27 08:16:14 -05:00
Jason Tedor	3801925cf0	Copy retention leases under lock When adding a retention lease, we make a reference copy of the retention leases under lock and then make a copy of that collection outside of the lock. However, since we merely copied a reference to the retention leases, after leaving a lock the underlying collection could change on us. Rather, we want to copy these under lock. This commit adds a dedicated method for doing this, asserts that we hold a lock when we use this method, and changes adding a retention lease to use this method. This commit was intended to be included with #37398 but was pushed to the wrong branch.	2019-01-27 08:13:47 -05:00
Jason Tedor	5fddb631a2	Introduce retention lease syncing (#37398 ) This commit introduces retention lease syncing from the primary to its replicas when a new retention lease is added. A follow-up commit will add a background sync of the retention leases as well so that renewed retention leases are synced to replicas.	2019-01-27 07:49:56 -05:00
Nhat Nguyen	780b4c72fe	Make ChannelActionListener a top-level class (#37797 ) We start using this class more often. Let's make it a top-level class.	2019-01-26 22:01:30 -05:00
Julie Tibshirani	afc60bb0e5	Mute DynamicMappingIT#testConflictingDynamicMappings Tracked in #37898.	2019-01-25 18:09:34 -08:00
Tal Levy	eb973a4744	fix GeoHashGridTests precision parsing error Previously, a hardcoded precision value of 4 was used by these tests resulting in no approximation errors. Now that the precision is between 1-12, precision values of 1 and 2 result in potential bucketing errors. This commit adjusts the range to be 4-12. Fixes #37892.	2019-01-25 17:29:04 -08:00
Julie Tibshirani	58301ead6d	Mute IndexShardIT#testMaybeFlush Tracked in #37896.	2019-01-25 17:12:16 -08:00
Julie Tibshirani	23b0d9b3ed	Mute RecoveryWhileUnderLoadIT#testRecoverWhileUnderLoadAllocateReplicasRelocatePrimariesTest Tracked in #37895.	2019-01-25 16:50:39 -08:00
Julie Tibshirani	e41ccdc1a0	Mute GeoWKTShapeParserTests#testParseGeometryCollection Tracked in #37894.	2019-01-25 16:15:16 -08:00
Julie Tibshirani	827ed12146	Mute TasksIT#testTransportBulkTasks Tracked in #37893.	2019-01-25 15:29:24 -08:00
Julie Tibshirani	a4020f4587	Mute SharedClusterSnapshotRestoreIT#testSnapshotCanceledOnRemovedShard Tracked in #37888.	2019-01-25 13:40:29 -08:00
Like	eb7bf16427	Migrate o.e.i.r.RecoveryState to Writeable (#37380 ) Relates to #34389	2019-01-25 15:52:04 -05:00
Nhat Nguyen	5cd4dfb0e4	Relax cluster metadata version check (#37834 ) If the in_sync_allocations of index-1 or index-2 is changed, the metadata version will be increased. This leads to the failure in the metadata version checks. We need to relax them. Closes #37820	2019-01-25 14:54:13 -05:00
Yuri Astrakhan	f1e71be8b2	Refactored GeoHashGrid unit tests (#37832 ) * Refactored GeoHashGrid unit tests This change allows other grid aggregations to reuse the same tests. The change mostly just moves code to the base classes, trying to keep changes to a bare minimum. * rename createInternalGeoHashGridBucket to createInternalGeoGridBucket * indentation	2019-01-25 13:37:24 -05:00
Zachary Tong	afd4618851	Fixes for a few randomized agg tests that fail hasValue() checks Closes #37743 Closes #37873	2019-01-25 12:39:42 -05:00
Igor Motov	68149b6058	Geo: replace intermediate geo objects with libs/geo (#37721 ) Replaces intermediate geo objects built by ShapeBuilders with objects from the libs/geo hierarchy. This should allow us to build all geo functionality around a single hierarchy. Follow up for #35320	2019-01-25 11:37:27 -05:00
Tanguy Leroux	a644bc095c	Add unit tests for ShardStateAction's ShardStartedClusterStateTaskExecutor (#37756 )	2019-01-25 16:51:53 +01:00
Vishnu Gt	27c3fb8e0d	Do not allow negative variances (#37384 ) Due to floating point error, it was possible for variances to become negative which should never happen. This bugfix sets variance to zero if it becomes negative as a result of fp error.	2019-01-25 09:56:34 -05:00
Tanguy Leroux	ef8dd12c6d	Limit number of documents indexed in CloseIndexIT test This test indexes an unlimited number of documents, this commit reduces this number to 25K and also tracks exact number of hits when counting the docs.	2019-01-25 15:09:27 +01:00
Christoph Büscher	b4b4cd6ebd	Clean codebase from empty statements (#37822 ) * Remove empty statements There are a couple of instances of undocumented empty statements all across the code base. While they are mostly harmless, they make the code hard to read and are potentially error-prone. Removing most of these instances and marking blocks that look empty by intention as such. * Change test, slightly more verbose but less confusing	2019-01-25 14:23:02 +01:00
Henning Andersen	49073dd2f6	Fail start on invalid index metadata (#37748 ) Nodes started with node.data=false and node.master=false can no longer start if they have index metadata. This avoids resurrecting old indexes into the cluster and ensures metadata is cleaned out before re-purposing a node that was previously master or data node. Issue #27073	2019-01-25 14:22:48 +01:00
Jim Ferenczi	cb451edb01	Allow nested fields in the composite aggregation (#37178 ) This change adds support for handling `nested` fields in the `composite` aggregation. A `nested` aggregation can be used as parent of a `composite` aggregation in order to target `nested` fields in the `sources`. Closes #28611	2019-01-25 14:00:39 +01:00
Alexander Reelsen	9e350d027e	Add BWC compatible processing to ingest date processors (#37407 ) The ingest date processor is currently only able to parse joda formats. However it is not using the existing elasticsearch classes but accesses joda directly. This means that our existing BWC layer does not notify the user about deprecated formats. This commit switches to use the existing Elasticsearch Joda methods to acquire a date format, that includes the BWC check and the ability to parse java 8 dates. The date parsing in ingest has also another extra feature, that the fallback year, when a date format without a year is used, is the current year, and not 1970 like usual. This is currently not properly supported in the DateFormatter class. As this is the only case for this feature and java time can take care of this using the toZonedDateTime() method, a workaround just for the joda time parser has been created, that can be removed soon again from 7.0.	2019-01-25 13:50:19 +01:00
Jim Ferenczi	787acb14b9	Track total hits up to 10,000 by default (#37466 ) This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that by default search requests will accurately track the total hit count up to `10,000` documents, requests that match more than this value will set the `"total.relation"` to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response. Scroll queries are not impacted, they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes #33028	2019-01-25 13:45:39 +01:00
Mayya Sharipova	70af3c7983	Correct deprec log in RestGetFieldMappingAction (#37843 ) * Correct deprec log in RestGetFieldMappingAction Correct a class used for deprecation logging in RestGetFieldMappingAction * Correct deprec log in RestCreateIndexAction Correct a class used for deprecation logging in RestCreateIndexAction	2019-01-25 07:13:46 -05:00
Andrey Ershov	9e7fd8caed	Migrate ZenDiscoveryIT to Zen2 (#37465 ) ZenDiscoveryIT contained 5 tests. 3 run without changes, testNodeRejectsClusterStateWithWrongMasterNode removed, testHandleNodeJoin_incompatibleClusterState changed.	2019-01-25 11:17:09 +01:00
Armin Braun	7692b607b9	Fix ClusterDisruptionIT#testAckedIndexing (#37853 ) * Stop threads before logging the list of exceptions * For the broken case of concurrent iteration in the finally block and the threads not having shut down, use `CopyOnWriteArrayList` to have concurrency safe iteration * Closes #37810	2019-01-25 09:38:29 +01:00
Martijn van Groningen	5a9dadb3ff	changed versionAdded now that #37767 is backported	2019-01-25 09:18:42 +01:00
Martijn van Groningen	1151f3b3ff	Fail with a dedicated exception if remote connection is missing or (#37767 ) or connectivity to the remote connection is failing. Relates to #37681	2019-01-25 08:53:18 +01:00
Ricardo Ferreira	df8fa9781e	Remove Abstract Component (#35898 ) TransportAction and BaseRestHandler no longer extend AbstractComponent. The AbstractComponent no longer has usages so it was deleted. Closes #34488	2019-01-25 08:35:19 +01:00
Yuri Astrakhan	6a13a252e9	Abstract GeoHashGridAggregatorFactory creation, renamed geohash -> hash (#37836 ) * Delegate `new GeoHashGridAggregatorFactory(...)` inside the `GeoGridAggregationBuilder` to the child classes. * Rename all `geohash...` to `hash...`	2019-01-24 23:45:18 -05:00
Nhat Nguyen	3ccd488755	Remove testMappingsPropagatedToMasterNodeImmediately This test is obsolete since #31140 where an index request with dynamic mapping update no longer requires acking. Closes #37816	2019-01-24 21:48:50 -05:00
Julie Tibshirani	e1d8df4ffa	Deprecate types in create index requests. (#37134 ) From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates: * Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests. * Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.	2019-01-24 13:17:47 -08:00
Boaz Leskes	af2f4c8f73	enable bwc tests and bump versions after backporting https://github.com/elastic/elasticsearch/pull/37639	2019-01-24 20:55:55 +01:00
Nhat Nguyen	864e465515	Adjust minRetainedSeqNo assertion in CombinedDeletionPolicyTests In these tests, we initialize the retained_seq_no with NO_OPS_PERFORMED, thus we should verify that the min of the retained_seq_no is at least NO_OPS_PERFORMED not 0. Closes #35994	2019-01-24 13:43:51 -05:00
Andrey Ershov	4974684003	Add tool elasticsearch-node unsafe-bootstrap (#37696 ) elasticsearch-node tool helps to restore cluster if half or more of master eligible nodes are lost. Of course, all bets are off, regarding data consistency. There are two parts of the tool: unsafe-bootstrap to be used when there is still at least one master-eligible node alive and detach-cluster, when there are no master-eligible nodes left. This commit implements the first part. Docs for the tool will be added separately as a part of #37812.	2019-01-24 19:25:55 +01:00
Tal Levy	289106a578	Refactor GeoHashGrid to be abstract and re-usable (#37742 ) This change split out all the specific GeoHash classes for the geohash_grid aggregation into abstract GeoGrid classes that can be re-used for specific hashing types, like `geohash`	2019-01-24 10:12:14 -08:00
Nhat Nguyen	76fb573569	Do not allow put mapping on follower (#37675 ) Today, the mapping on the follower is managed and replicated from its leader index by the ShardFollowTask. Thus, we should prevent users from modifying the mapping on the follower indices. Relates #30086	2019-01-24 12:13:00 -05:00
David Turner	187b233571	Read m_m_n from cluster states from 6.7 This completes the BWC serialisation changes required for a 6.7 master to inform other nodes of the node-level value of the `minimum_master_nodes` setting. Relates #37701, #37811	2019-01-24 17:05:49 +00:00
David Roberts	0e36adc35f	Mute SimpleClusterStateIT testMetadataVersion Due to https://github.com/elastic/elasticsearch/issues/37820	2019-01-24 16:50:55 +00:00
David Roberts	bd02ca4b7b	Mute NoMasterNodeIT testNoMasterActionsWriteMasterBlock Due to https://github.com/elastic/elasticsearch/issues/37823	2019-01-24 15:17:13 +00:00
Nhat Nguyen	a6abb28abf	Fix InternalEngineTests#assertOpsOnPrimary (#37746 ) The assertion `assertOpsOnPrimary` does not store seq_no and primary term of successful deletes to the `lastOpSeqNo` and `lastOpTerm`. This leads to failures of the subsequent CAS deletes or indexes with seq_no and term. Moreover, this assertion trips a translog assertion because it bumps the primary term of some operations but not the primary term of the engine. Relates #36467 Closes #37684	2019-01-24 10:02:48 -05:00
David Roberts	a81931bb2a	Mute DynamicMappingIT testMappingsPropagatedToMasterNodeImmediately Due to https://github.com/elastic/elasticsearch/issues/37816	2019-01-24 14:32:44 +00:00
Jason Tedor	7517e3a7bd	Optimize warning header de-duplication (#37725 ) Now that warning headers no longer contain a timestamp of when the warning was generated, we no longer need to extract the warning value from the warning to determine whether or not the warning value is duplicated. Instead, we can compare strings directly. Further, when de-duplicating warning headers, we are constantly rebuilding sets. Instead of doing that, we can carry the set with us and rebuild it if we find a new warning value. This commit applies both of these optimizations.	2019-01-24 08:39:24 -05:00
Yannick Welsch	feab59df03	Bubble exceptions up in ClusterApplierService (#37729 ) Exceptions thrown by the cluster applier service's settings and cluster appliers are bubbled up, and block the state from being applied instead of silently being ignored. In combination with the cluster state publishing lag detector, this will throw a node out of the cluster that can't properly apply cluster state updates.	2019-01-24 14:09:03 +01:00
Simon Willnauer	c7b16162ae	Remove unused ThreadBarrier class (#37666 ) This class is pretty complex and only used in a test where we can simply fail the test with an assertion error.	2019-01-24 13:52:22 +01:00
Yannick Welsch	2bf269e628	Fix docs for MappingUpdatedAction Follow-up to #31140	2019-01-24 12:44:36 +01:00
David Roberts	bcf5a4ca47	Mute ClusterDisruptionIT testAckedIndexing Due to https://github.com/elastic/elasticsearch/issues/37810	2019-01-24 10:58:02 +00:00
Yannick Welsch	64adb5ad5b	Set acking timeout to 0 on dynamic mapping update (#31140 ) As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not be required for acking to succeed in order for index requests with dynamic mapping updates to successfully complete. Relates to #30672 and Closes #30844	2019-01-24 11:39:46 +01:00
Armin Braun	36889e8a2f	Remove Custom Listeners from SnapshotsService (#37629 ) * Remove Custom Listeners from SnapshotsService Motivations: * Shorten the code some more * Use ActionListener#wrap to get easy to reason about behavior in failure scenarios * Remove duplication in the logic of handling snapshot completion listeners (listeners removing themselves and comparing snapshots to their targets) * Also here, move all listener handling into `SnapshotsService` and remove custom listener class by putting listeners in a map	2019-01-24 10:11:18 +01:00
David Turner	bdef2ab8c0	Use m_m_nodes from Zen1 master for Zen2 bootstrap (#37701 ) Today we support a smooth rolling upgrade from Zen1 to Zen2 by automatically bootstrapping the cluster once all the Zen1 nodes have left, as long as the `minimum_master_nodes` count is satisfied. However this means that Zen2 nodes also require the `minimum_master_nodes` setting for this one specific and transient situation. Since nodes only perform this automatic bootstrapping if they previously belonged to a Zen1 cluster, they can keep track of the `minimum_master_nodes` setting from the previous master instead of requiring it to be set on the Zen2 node.	2019-01-24 08:57:40 +00:00
Mayya Sharipova	fdb66039d4	Change `rational` to `saturation` in script_score (#37766 ) This change of the function name is necessary for conformity with feature queries. Closes #37714	2019-01-23 14:28:20 -05:00
Mayya Sharipova	c8565fe692	Deprecate types in get field mapping API (#37667 ) - Add deprecation warning to RestGetFieldMappingAction - Add two new java HLRC classes GetFieldMappingsRequest and GetFieldMappingsResponse. These classes use new typeless forms of a request and response, and differ in that from the server versions. Relates to #35190	2019-01-23 14:24:35 -05:00
Tim Brooks	f45b5fedb5	Add ability to listen to group of affix settings (#37679 ) Currently we have the ability to listen for setting changes to two group affix settings. However, it is possible that we might have the need to listen to more than two. This commit adds a method that allows consumer to listen to a list of affix settings for changes.	2019-01-23 12:05:39 -07:00
Jason Tedor	169cb38778	Liberalize StreamOutput#writeStringList (#37768 ) In some cases we only have a string collection instead of a string list that we want to serialize out. We have a convenience method for writing a list of strings, but no such method for writing a collection of strings. Yet, a list of strings is a collection of strings, so we can simply liberalize StreamOutput#writeStringList to be more generous in the collections that it accepts and write out collections of strings too. On the other side, we do not have a convenience method for reading a list of strings. This commit addresses both of these issues.	2019-01-23 12:52:17 -05:00
Benjamin Trent	1c2ae9185c	Add PersistentTasksClusterService::unassignPersistentTask method (#37576 ) * Add PersistentTasksClusterService::unassignPersistentTask method * adding cancellation test * Adding integration test for unallocating tasks from a node * Addressing review comments * addressing minor PR comments	2019-01-23 11:48:32 -06:00
Igor Motov	e3672aa551	Tests: disable testRandomGeoCollectionQuery on tiny polygons (#37579 ) Due to https://issues.apache.org/jira/browse/LUCENE-8634 this test may fail if a really tiny polygon is generated. This commit checks for tiny polygons and skips the final check, which is expected to fail until the lucene bug is fixed and new version of lucene is released.	2019-01-23 12:25:54 -05:00
Julie Tibshirani	f0fc6e8003	Make sure PutMappingRequest accepts content types other than JSON. (#37720 )	2019-01-23 08:51:05 -08:00
David Kyle	d193ca8aae	Use disassociate in preference to deassociate (#37704 )	2019-01-23 16:06:25 +00:00
Armin Braun	2439f68745	Delete Redundant RoutingServiceTests (#37750 ) * This test completely overrode the `reroute` method and hence did nothing but test the override itself * Removed the test since it tests nothing and simplified `reroute` accordingly	2019-01-23 16:39:02 +01:00
Nhat Nguyen	6a9838359c	Always return metadata version if metadata is requested (#37674 ) If the indices of a ClusterStateRequest are specified, we fail to include the cluster state metadata version in the response. Relates #37633	2019-01-23 10:24:51 -05:00
Luca Cavanna	12f5b02fd0	Streamline skip_unavailable handling (#37672 ) This commit moves the collectSearchShards method out of RemoteClusterService into TransportSearchAction that currently calls it. RemoteClusterService used to be used only for cross-cluster search but is now also used in cross-cluster replication where different API are called through the RemoteClusterAwareClient. There is no reason for the collectSearchShards and fetchShards methods to be respectively in RemoteClusterService and RemoteClusterConnection. The search shards API can be called through the RemoteClusterAwareClient too, the only missing bit is a way to handle failures based on the skip_unavailable setting for each cluster (currently only supported in RemoteClusterConnection#fetchShards) which is achieved by adding a isSkipUnavailable(String clusterAlias) method to RemoteClusterService. This change is useful for #32125 as we will very soon need to also call the search API against remote clusters, which will be done through RemoteClusterAwareClient. In that case we will also need to support skip_unavailable when calling the search API so we need some way to handle the skip_unavailable setting like we currently do for the search_shards call. Relates to #32125	2019-01-23 13:53:37 +01:00
Yannick Welsch	d5139e0590	Only bootstrap and elect node in current voting configuration (#37712 ) Adapts bootstrapping and leader election to only trigger on nodes that are actually part of the voting configuration.	2019-01-23 13:10:11 +01:00
Simon Willnauer	4ec3a6d922	Ensure either success or failure path for SearchOperationListener is called (#37467 ) Today we have several implementations of executing SearchOperationListener in SearchService. While all of them seem to be safe, at least one, the one that executes scroll searches, can cause illegal execution of SearchOperationListener that can then in turn trigger assertions in ShardSearchStats. This change adds a SearchOperationListenerExecutor that uses try-with blocks to ensure listeners are called in a safe way. Relates to #37185	2019-01-23 12:38:44 +01:00
Tanguy Leroux	6130d15172	Adapt SyncedFlushService (#37691 )	2019-01-23 11:08:54 +01:00
Alexander Reelsen	701d89caa2	Mute FilterAggregatorTests#testRandom Relates #37743	2019-01-23 11:00:37 +01:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363 ) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
Boaz Leskes	52ba407931	Expose sequence number and primary terms in search responses (#37639 ) Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.	2019-01-23 09:01:58 +01:00
Andrey Ershov	7c6566e14c	Migrate SpecificMasterNodesIT to Zen2 (#37532 ) 1. testSimpleOnlyMasterNodeElection - requires cluster bootstrap when the first master node is started. 2. testElectOnlyBetweenMasterNodes - requires cluster bootstrap when the first master node is started and requires adding voting exclusion before shutting down the first master node. 3. testAliasFilterValidation - requires cluster bootstrap when the first master node is started.	2019-01-23 07:22:41 +01:00
Andrey Ershov	e2e00cd245	Fix MetaStateFormat tests It's not safe to continue writing state using MetaDataStateFormat after a dirty WriteStateException has occurred if it's not recovered by a successful subsequent state write. We've encountered a test failure of testFailRandomlyAndReadAnyState. The test breaks in the following way. There are 3 state paths. Here is what happens next. Successful write at the beginning of the test yields 0 0 0 state files in the directories. 1st write in the loop is unsuccessful, but not dirty - 0 0 0. 2nd write in the loop is not successful and dirty (failure during fsync), however before removing new files we have 1 1 1. But now during deletion, the first deletion fails and we get - 1 0 0. 3rd write in the loop is unsuccessful, but not dirty - so we want to keep old generation, which happens to be the 1st generation, so now we have 1 x x in state folders. Now we assert that we either load 0 or 1 state from the state folders and select only the 2nd and 3rd folders to emulate disk failures - this results in NPE because there is nothing in these folders. Fortunately, this won’t be a problem in real life, because if there is a dirty exception, we shut down the node and make sure we perform a successful write on the node startup.	2019-01-23 07:21:26 +01:00
Zachary Tong	2ba9e361ab	Add helper classes to determine if aggs have a value (#36020 ) This adds a set of helper classes to determine if an agg "has a value". This is needed because InternalAggs represent "empty" in different manners according to convention. Some use `NaN`, `+/- Inf`, `0.0`, etc. A user can pass the Internal agg type to one of these helper methods and it will report if the agg contains a value or not, which allows the user to differentiate "empty" from a real `NaN`. These helpers are best-effort in some cases. For example, several pipeline aggs share a single return class but use different conventions to mark "empty", so the helper uses the loosest definition that applies to all the aggs that use the class. Sums in particular are unreliable. The InternalSum simply returns 0.0 if the agg is empty (which is correct, no values == sum of zero). But this also means the helper cannot differentiate from "empty" and `+1 + -1`.	2019-01-22 12:38:55 -05:00
Jason Tedor	715719ee3b	Remove warn-date from warning headers (#37622 ) This commit removes the warn-date from warning headers. Previously we were stamping every warning header with when the request occurred. However, this has a severe performance penalty when deprecation logging is called frequently, as obtaining the current time and formatting it properly is expensive. A previous change moved to using the startup time as the time to stamp on every warning header, but this was only to prove that the timestamping was expensive. Since the warn-date is optional, we elect to remove it from the warning header. Prior to this commit, we worked in Kibana to make the warn-date treated as optional there so that we can follow-up in Elasticsearch and remove the warn-date. This commit does that.	2019-01-22 12:29:24 -05:00
Yannick Welsch	23ba900840	Publish to masters first (#37673 ) Prefer publishing to master-eligible nodes first, so that cluster state updates are committed more quickly, and master-eligible nodes also turn into followers more quickly after a leader election.	2019-01-22 13:53:10 +01:00
David Kyle	3fad1eeaed	Un-assign persistent tasks as nodes exit the cluster (#37656 ) PersistentTasksClusterService decides if a task should be reassigned by checking there is a node in the cluster with the same Id. If a node is restarted PersistentTasksClusterService may not observe the change and decide the task still has a valid assignment because the node's ephemeral Id is not used in that decision. This change un-assigns tasks as the nodes in the cluster change.	2019-01-22 12:44:45 +00:00
Henning Andersen	228611843c	Fail start of non-data node if node has data (#37347 ) * Fail start of non-data node if node has data Check that nodes started with node.data=false cannot start if they have shard data to avoid (old) indexes being resurrected into the cluster in red status. Issue #27073	2019-01-22 13:27:12 +01:00
Yannick Welsch	2a7b7ccf1c	Use cancel instead of timeout for aborting publications (#37670 ) When publications were cancelled because a node turned to follower or candidate, they would still show as timed out, which can be confusing in the logs. This change adapts the improper call of onTimeout by generalizing it to a cancel method.	2019-01-22 12:51:03 +01:00
Christoph Büscher	0a93a0358b	Remove deprecated FieldNamesFieldMapper.Builder#index (#37305 ) The method calls "enabled" in addition to what the super.index() does, but this seems to be done explicitly now in the TypeParsers `parse` method. The removed method has been deprecated since at least 6.0. Also making some of the Builder's methods and ctors private since they are only used internally in this class.	2019-01-22 12:12:21 +01:00
David Turner	5db7ed22a0	Bootstrap a Zen2 cluster once quorum is discovered (#37463 ) Today when bootstrapping a Zen2 cluster we wait for every node in the `initial_master_nodes` setting to be discovered, so that we can map the node names or addresses in the `initial_master_nodes` list to their IDs for inclusion in the initial voting configuration. This means that if any of the expected master-eligible nodes fails to start then bootstrapping will not occur and the cluster will not form. This is not ideal, and we would prefer the cluster to bootstrap even if some of the master-eligible nodes do not start. Safe bootstrapping requires that all pairs of quorums of all initial configurations overlap, and this is particularly troublesome to ensure given that nodes may be concurrently and independently attempting to bootstrap the cluster. The solution is to bootstrap using an initial configuration whose size matches the size of the expected set of master-eligible nodes, but with the unknown IDs replaced by "placeholder" IDs that can never belong to any node. Any quorum of received votes in any of these placeholder-laden initial configurations is also a quorum of the "true" initial set of master-eligible nodes, giving the guarantee that it intersects all other quorums as required. Note that this change means that the initial configuration is not necessarily robust to any node failures. Normally the cluster will form and then auto-reconfigure to a more robust configuration in which the placeholder IDs are replaced by the IDs of genuine nodes as they join the cluster; however if a node fails between bootstrapping and this auto-reconfiguration then the cluster may become unavailable. This we feel to be less likely than a node failing to start at all. This commit also enormously simplifies the cluster bootstrapping process. 
Today, the cluster bootstrapping process involves two (local) transport actions in order to support a flexible bootstrapping API and to make it easily accessible to plugins. However this flexibility is not required for the current design so it is adding a good deal of unnecessary complexity. Here we remove this complexity in favour of a much simpler ClusterBootstrapService implementation that does all the work itself.	2019-01-22 11:03:51 +00:00
Adrien Grand	e9fcb25a28	Upgrade to lucene-8.0.0-snapshot-83f9835. (#37668 ) This snapshot uses a new file format for doc-values which is expected to make advance/advanceExact perform faster on sparse fields: https://issues.apache.org/jira/browse/LUCENE-8585	2019-01-22 11:44:29 +01:00
Alpar Torok	74d1cfbf7e	Mute failing test Tracking #37687	2019-01-22 10:50:27 +02:00
Alexander Reelsen	4fb68ea195	Fix java time formatters that round up (#37604 ) In order to be able to parse epoch seconds and epoch milliseconds, custom java time fields had been introduced. These fields are however not compatible with the way that java time allows one to configure default fields (when a part of a timestamp cannot be read then a default value is added), which is used for the formatters that are rounding up to the next value. This commit allows java date formatters to configure their round up parsing by setting default values via a consumer. By default all formats are setting JavaDateFormatter.ROUND_UP_BASE_FIELDS for rounding up. The epoch parsers, however, both need to set different fields. The merged date formatters do not set any fields, they just append all the round up formatters. Also the formatter now properly copies the locale and the timezone, and fractional parsing has been set to nanoseconds with proper width.	2019-01-22 09:42:17 +01:00
Alpar Torok	17d704347e	Mute failing test Tracking #37685	2019-01-22 10:31:23 +02:00
Tanguy Leroux	0290547ad7	Ensure that max seq # is equal to the global checkpoint when creating ReadOnlyEngines (#37426 ) Since version 6.7.0 the Close Index API guarantees that all translog operations have been correctly flushed before the index is closed. If the index is reopened as a Frozen index (which uses a ReadOnlyEngine) we can verify that the maximum sequence number from the last Lucene commit is indeed equal to the last known global checkpoint and refuses to open the read only engine if it's not the case. In this PR the check is only done for indices created on or after 6.7.0 as they are guaranteed to be closed using the new Close Index API. Related #33888	2019-01-22 09:22:33 +01:00
Alpar Torok	a713183cab	Mute failing discovery disruption tests Tracking #37539	2019-01-22 10:16:04 +02:00
Nhat Nguyen	7394892b4c	Make prepare engine step of recovery source non-blocking (#37573 ) Relates #37174	2019-01-21 21:35:10 -05:00
Tim Brooks	21838d73b5	Extract message serialization from `TcpTransport` (#37034 ) This commit introduces a NetworkMessage class. This class has two subclasses - InboundMessage and OutboundMessage. These messages can be serialized and deserialized independent of the transport. This allows more granular testing. Additionally, the serialization mechanism is now a simple Supplier. This builds the framework to eventually move the serialization of transport messages to the network thread. This is the one serialization component that is not currently performed on the network thread (transport deserialization and http serialization and deserialization are all on the network thread).	2019-01-21 14:14:18 -07:00
Tim Brooks	f516d68fb2	Share `NioGroup` between http and transport impls (#37396 ) Currently we create dedicated network threads for both the http and transport implementations. Since these threads should never perform blocking operations, these threads could be shared. This commit modifies the nio-transport to have 0 http workers by default. If the default configs are used, this will cause the http transport to be run on the transport worker threads. The http worker setting will still exist in case the user would like to configure dedicated workers. Additionally, this commit deletes dedicated acceptor threads. We have never had these for the netty transport and they can be added back if a need is determined in the future.	2019-01-21 13:50:56 -07:00
Armin Braun	3a3f5b39c3	Fix Race in Concurrent Snapshot Delete and Create (#37612 ) * The repo id was determined wrongly when the delete picked up on an in progress snapshot * NOTE: This solution is still a best-effort fix and there's a slight chance of running into concurrency issues here when multiple create and delete requests for the same snapshot name are happening concurrently, but these require a sequence of multiple cluster state updates between the changed method reading the genId and submitting its cluster state update task * The added test reproduced the issue reliably in about 50% of runs * Closes #37581	2019-01-21 13:10:33 +01:00
Luca Cavanna	09a6ba50ef	Add support for merging multiple search responses into one (#37566 ) This will be used in cross-cluster search when reduction will be performed locally on each cluster. The CCS coordinating node will send one search request per remote cluster involved and will get one search response back from each one of them. Such responses contain all the info to be able to perform an additional reduction and return results back to the user. Relates to #32125	2019-01-21 11:51:47 +01:00
Jason Tedor	adae233f77	Add some deprecation optimizations (#37597 ) This commit optimizes some of the performance issues from using deprecation logging: - we optimize encoding the deprecation value - we optimize formatting the deprecation string - we optimize away getting the current time (by using cached startup time)	2019-01-18 16:42:25 -05:00
Tal Levy	106f900dfb	refactor inner geogrid classes to own class files (#37596 ) To make further refactoring of GeoGrid aggregations easier (related: #30320), splitting out these inner class dependencies into their own files makes it easier to map the relationship between classes	2019-01-18 13:40:00 -08:00
Julie Tibshirani	8da7a27f3b	Deprecate types in the put mapping API. (#37280 ) From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates: - Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`. - Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.	2019-01-18 12:28:31 -08:00
Jack Conradson	de55b4dfd1	Add types deprecation to script contexts (#37554 ) This adds deprecation to _type in the script contexts for ingest and update. This adds a DeprecationMap that wraps the ctx Map containing _type for these specific contexts.	2019-01-18 09:13:49 -08:00
Yannick Welsch	377d96e376	Remove initial_master_nodes on node restart (#37580 ) Some tests (e.g. testRestoreIndexWithShardsMissingInLocalGateway) were split-braining since being switched to Zen2 because the bootstrap setting was left around when nodes got restarted with data folders wiped. The test in question here was starting one node (which autobootstrapped to that single node), then another node. The first node was then shut down (after excluding it from the voting configuration), its data folder wiped, and restarted. After restart, the node had an empty data folder yet initial_master_nodes set to itself (i.e. same name). This made the node sometimes form a cluster of its own, and not rejoin the existing cluster with the other node.	2019-01-18 16:36:42 +01:00
Jason Tedor	ed297b7369	Only update response headers if we have a new one (#37590 ) Currently when adding a response header, we do some de-duplication, and maybe drop the header on the floor if we have reached capacity. Yet, we still update the thread local tracking the response headers. This is really expensive because under the hood there is a shared reference that we synchronize on. In the case of a request processed across many shards in a tight loop, this contention can be detrimental to performance. We can avoid updating the thread local in these cases though, when the response header is duplicate of one that we have already seen, or when it's dropped on the floor. This commit addresses these performance issues by avoiding the unnecessary set.	2019-01-18 08:20:05 -05:00
Tanguy Leroux	29d3a708da	Fix BulkWithUpdatesIT and CloseIndexIT As of today the Close Index API does its best to close indices, but closing an index with ongoing recoveries might or might not be acknowledged depending on the values of the max seq number and global checkpoint at the time the TransportVerifyShardBeforeClose action is executed. These tests failed because they always expect that the index is correctly closed on the first try, which is not always the case. Instead we need to retry the closing until it succeeds. Closes #37571	2019-01-18 10:54:35 +01:00
David Turner	65e76b3f6f	Migrate RecoveryFromGatewayIT to Zen2 (#37520 ) * Fixes `testTwoNodeFirstNodeCleared` by manipulating voting config exclusions. * Removes `testRecoveryDifferentNodeOrderStartup` since state recovery is now handled entirely on the elected master, so the order in which the data nodes start is irrelevant.	2019-01-18 09:15:51 +00:00
David Turner	699d881739	Migrate IndicesExistsIT to Zen2 (#37526 ) This test was actually passing, for the wrong reason: it asserts a `MasterNotDiscoveredException` is thrown, expecting this to be due to a failure to perform state recovery, but in fact it's thrown because the node is not correctly bootstrapped.	2019-01-18 09:15:30 +00:00
Christoph Büscher	2f0e0b2426	Allow indices.get_mapping response parsing without types (#37492 ) This change adds a deprecation warning to the indices.get_mapping API in case the "include_type_name" parameter is set to "true" and changes the parsing code in GetMappingsResponse to parse the type-less response instead of the one containing types. As a consequence the HLRC client doesn't need to force "include_type_name=true" any more and the GetMappingsResponseTests can be adapted to the new format as well. Also removing some "include_type_name" parameters in yaml tests and docs where not necessary.	2019-01-18 09:33:36 +01:00
Armin Braun	62ddc8c776	Reenable UnicastZenPingTests#testSimplePings * This was muted needlessly, the problem in #26701 only applies to `6.x` * Relates #26701	2019-01-18 08:36:22 +01:00
Tim Brooks	b6f06a48c0	Implement follower rate limiting for file restore (#37449 ) This is related to #35975. This commit implements rate limiting on the follower side using a new class `CombinedRateLimiter`.	2019-01-17 14:58:46 -07:00
Armin Braun	381d035cd6	Remove Redundant RestoreRequest Class (#37535 ) * Same as #37464 but for the restore side	2019-01-17 22:23:23 +01:00


3028 Commits