OpenSearch

Author	SHA1	Message	Date
Jason Tedor	0250d554b6	Introduce forget follower API (#39718) This commit introduces the forget follower API. This API is needed in cases where unfollowing a following index fails to remove the shard history retention leases on the leader index. This can happen explicitly through user action, or implicitly through an index managed by ILM. When this occurs, history will be retained longer than necessary. While the retention lease will eventually expire, it can be expensive to allow history to persist for that long, and it can also prevent ILM from performing actions such as shrink on the leader index. Therefore, we introduce an API that allows manual removal of the shard history retention leases in this case.	2019-03-07 11:08:45 -05:00
Armin Braun	213cc6673c	Remove Dead Code in o.e.util package (#39717) (#39779) * None of this code is used, so we should delete it; we can always bring it back if needed	2019-03-07 08:31:46 +01:00
Nhat Nguyen	b69affda6a	Use unwrapped cause to determine if node is closing (#39723) We need to unwrap and use the actual cause when determining whether the node with the primary shard is shutting down, because TransportService will throw a TransportException wrapped in a SendRequestTransportException. Relates #39584	2019-03-06 15:30:55 -05:00
Nhat Nguyen	1fe7cb594f	Don’t ack if unable to remove failing replica (#39584) Today, when a replicated write operation fails to execute on a replica, the primary will reach out to the master to fail that replica (and mark it stale). We then won't ack that request until the master removes the failing replica; otherwise, we would lose the acked operation if the failed replica is still in the in-sync set. However, if the node with the primary is shutting down, we might ack such a request even though we are unable to send a shard-failure request to the master. This happens because we ignore NodeClosedException, which is triggered when the ClusterService is being closed. Closes #39467	2019-03-06 15:30:55 -05:00
markharwood	1873de5240	Bug fix for AnnotatedTextHighlighter - port of 39525 (#39749) Bug fix for AnnotatedTextHighlighter - port of 39525 Relates to #39395	2019-03-06 19:02:04 +00:00
Yannick Welsch	d094107592	Fix SharedClusterSnapshotRestoreIT Relates to #39644	2019-03-06 17:51:23 +01:00
Yannick Welsch	fef11f7efc	Allow snapshotting replicated closed indices (#39644) This adds the capability to snapshot replicated closed indices. It also changes snapshot requests in v8.0.0 to automatically expand wildcards to closed indices and hence start snapshotting closed indices by default. For v7.1.0 and above, wildcards are by default only expanded to open indices, which can be changed by explicitly setting the expand_wildcards option to either all or closed. Note that indices are always restored as open indices, even if they have been snapshotted as closed replicated indices. Relates to #33888	2019-03-06 16:08:20 +01:00
Simon Willnauer	e620fb2e4a	Add option to force load term dict into memory (#39741) Lucene added an optimization to leave the term dictionary on disk for non-ID-like fields. This change happened very late in the release process, so it's better to have an escape hatch in case certain use cases are hurt by this optimization. This setting might be removed in the future if it turns out to be unnecessary.	2019-03-06 15:29:04 +01:00
Christoph Büscher	6c503824c8	Fix occasional SearchServiceTests failure (#39697) Currently, SearchServiceTests.testCloseSearchContextOnRewriteException can fail if a refresh happens while we test for the SearchPhaseExecutionException that is thrown later in the test. The test takes the current Store#refCount and expects it to be the same after the exception is thrown. If a refresh happens in that interval, however, the refCount will be different, causing the test to fail. This can be provoked, e.g., by running this section in a tight loop. Switching off refresh for this test solves the issue.	2019-03-06 14:18:03 +01:00
Andrey Ershov	52fd102e23	Avoid serialising state if it was already serialised (#39179) When preparing the state to send to other nodes, we're serializing it for each node, despite using putIfAbsent. This commit checks whether the state was already serialized for this node version before performing the potentially expensive computation. The map is not used by multiple threads, so computeIfAbsent is not needed (and could not easily be used here, because IOException could be thrown). (cherry picked from commit c99be63b43f5250f3cd220130df73c5e9e097459)	2019-03-06 11:54:13 +01:00
David Turner	295e39a8c8	Drop node if asymmetrically partitioned from master (#39598) When a node is joining the cluster, we ensure that it can send requests to the master _at that time_. If it joins the cluster and _then_ loses the ability to send requests to the master, then it should be removed from the cluster. Today this is not the case: the master can still receive responses to its follower checks and receives acknowledgements of cluster state publications, so it has no reason to remove the node. This commit changes the handling of follower checks so that they fail if they come from a master that the other node was following but which it now believes to have failed.	2019-03-06 09:41:57 +00:00
David Turner	77dd711847	Tidy up GroupedActionListener (#39633) Today the `GroupedActionListener` accepts a `defaults` parameter, but all callers pass an empty list. Also, it is permitted to pass an empty group, but this is trappy because the delegate listener is never called in that case. This commit removes the `defaults` parameter and forbids an empty group.	2019-03-06 09:25:10 +00:00
Armin Braun	aaecaf59a4	Optimize Bulk Message Parsing and Message Length Parsing (#39634) (#39730) * Optimize Bulk Message Parsing and Message Length Parsing * findNextMarker took almost 1ms per invocation during the PMC rally track * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search * It is unnecessary to instantiate an object (the input stream wrapper) and throw it away just to read the `int` length from the message bytes * Fixed by adding a bulk `int` read to BytesReference	2019-03-06 08:13:15 +01:00
Jason Tedor	75a0d4f470	Rename retention lease setting (#39719) This commit renames the retention lease setting index.soft_deletes.retention.lease so that it falls under the namespace index.soft_deletes.retention_lease. Accordingly, the setting is renamed to index.soft_deletes.retention_lease.period.	2019-03-05 22:04:45 -05:00
Jason Tedor	504c792861	Add Docker build type (#39378) This commit adds a new build type (alongside deb/rpm/tar/zip) to represent the official Docker images. This build type will be displayed in APIs such as the main and nodes info APIs.	2019-03-05 22:03:15 -05:00
Luca Cavanna	9d0211485c	Tie-break completion suggestions with same score and surface form (#39564) When multiple completion suggestion entries have the same score and surface form, the order in which such options are returned is currently not deterministic. With this commit, we introduce tie-breaking for such situations based on shard id, index name, index uuid and doc id, as we already do for ordinary search hits. With this change, we also make shardIndex mandatory when sorting and comparing completion suggestion options (it was previously only needed later, when fetching hits). Also, we need to make sure shardIndex is properly set when merging completion suggestions coming from multiple clusters in `SearchResponseMerger`.	2019-03-05 18:03:54 +01:00
Jim Ferenczi	160dc29f0e	Handle total hits equal to track_total_hits (#37907) This change ensures that a total hit count equal to the value set for track_total_hits is not treated as a lower bound.	2019-03-05 16:28:48 +01:00
Armin Braun	750ec8ba53	Minor Cleanups in QueryPhase (#39680) (#39694) * Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for #39504 * Remove two redundant variables and lower visibility in two possible spots * Make field `final`	2019-03-05 15:04:16 +01:00
Christoph Büscher	5cdea6ef17	Fix Fuzziness#asDistance(String) (#39643) Currently, Fuzziness#asDistance(String) doesn't work for custom AUTO values. If the fuzziness is AUTO, the method returns the correct edit distance to use, depending on the input string, but for custom AUTO values it currently always returns an edit distance of 1. This commit corrects this and adds unit and integration tests to catch these cases. Closes #39614	2019-03-05 14:31:07 +01:00
Simon Willnauer	19f6a35358	Move BWC Version to 7.1.0 after backport Relates to #39512	2019-03-05 14:11:59 +01:00
Simon Willnauer	d112c89041	Allow inclusion of unloaded segments in stats (#39512) Today there is no way to fetch actual segment stats for segments that are currently unloaded. This is relevant in the case of frozen indices. This change makes it possible to monitor how much memory a frozen index would use if it were unfrozen.	2019-03-05 14:02:20 +01:00
Armin Braun	e8d9744340	Use Threadpool Time in ClusterApplierService (#39679) (#39685) * Use threadpool's time in `ClusterApplierService` to allow for deterministic tests * This is a part of/requirement for #39504	2019-03-05 12:37:49 +01:00
Gordon Brown	380dc27d91	Mute testCloseWhileRelocatingShards (#39589)	2019-03-05 13:34:43 +02:00
Alan Woodward	0b14782b23	Add stopword support to IntervalBuilder (#39637) The match interval builder analyses input text and converts it to an IntervalSource, so it may generate token streams containing stopwords. This commit handles these by using the extend factory to cover the gaps left by the stopwords, so that phrase and ordered queries work correctly.	2019-03-05 10:50:45 +00:00
Christoph Büscher	2fe1fa8972	Shortcut counts on exists queries (#39570) (#39660) `TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed. Closes #37475	2019-03-04 19:53:43 +01:00
Prabhakar S	98925e9a09	Fixing the custom object serialization bug in diffable utils. (#39544) While serializing custom objects, the length of the list is computed after filtering out the unsupported objects, but the filter is not applied when the objects are written, so unsupported objects are written anyway and then fail to deserialize on the receiver. This commit adds the condition to filter out unsupported custom objects.	2019-03-04 18:41:14 +01:00
Nhat Nguyen	801f13f201	Assert recovery done in testDoNotWaitForPendingSeqNo (#39595) Since #39006, we should be able to complete a peer recovery without waiting for pending indexing operations. Thus, the assertion in testDoNotWaitForPendingSeqNo should be updated from false to true. Closes #39510	2019-03-04 10:21:23 -05:00
Yannick Welsch	936dbb00e3	Isolate Zen1 (#39470) Cherry-picks a few commits from #39466 to align 7.x with the master branch.	2019-03-04 15:51:17 +01:00
Luca Cavanna	9ddaabba88	Remove private SearchHits.Total class (#39556) This is now possible because Lucene's `TotalHits` implements `equals`/`hashCode`; all the other methods can be inlined in `SearchHits` instead, so there is no need for a specific wrapper class.	2019-03-04 13:46:45 +01:00
Armin Braun	547af21a12	Introduce Mapping ActionListener (#39538) (#39636) * Introduce Safer Chaining of Listeners * The motivation here is to make reasoning about chains of `ActionListener` a little easier by providing a safe method for nesting `ActionListener` that guarantees that a response is never dropped. Also, it dries up the code a little by removing the need to repeat `listener::onFailure` and `listener.onResponse` over and over. * Refactored a number of obvious/easy spots to use the new listener constructor	2019-03-04 12:56:46 +01:00
Daniel Mitterdorfer	fca6a2f006	Avoid deprecated API usage in TaskOperationFailure (#39303) (#39628) With this commit, we remove usage of the deprecated method `ExceptionsHelper#detailedMessage` in the class `TaskOperationFailure`. Relates #19069	2019-03-04 11:37:59 +01:00
David Turner	dd68244841	Wait for state recovery in testFreshestMasterElectedAfterFullClusterRestart (#39602) Zen1IT#testFreshestMasterElectedAfterFullClusterRestart sometimes fails because we request the cluster state before state recovery has completed, and therefore obtain the default value for the setting we're relying on. Confusingly, we were starting out by setting this setting to its default value, so the test looked like it was failing because of a production bug. This commit avoids this confusion in the future by setting it to a non-default value at the start of the test. Fixes #39586.	2019-03-04 10:26:07 +00:00
Adrien Grand	782f873165	Don't swallow exceptions in Store#close(). (#39035) (#39622) Today, Store#close() swallows any `IOException`. Relates #39030	2019-03-04 10:58:43 +01:00
Adrien Grand	934946a232	Don't swallow exception in ThreadPool.terminate. (#39038) (#39623) The use of `closeWhileHandlingException` means that any exception thrown while trying to close the threadpool is swallowed. Relates #39030	2019-03-04 10:58:29 +01:00
Adrien Grand	21540a5ada	Enhancements to IndicesQueryCache. (#39099) (#39626) This commit adds the following: - more tests to IndicesServiceCloseTests, one of which found a bug in the order in which `IndicesQueryCache#onClose` and `IndicesService.indicesRefCount#decRef` are called. - made `IndicesQueryCache.stats2` a synchronized map. All writes to it are already protected by the lock of the Lucene cache, but the final read from an assertion in `IndicesQueryCache#close()` was not, so this change should avoid any potential visibility issues. - human-readable `toString`s to make debugging easier. Relates #37117	2019-03-04 10:58:12 +01:00
Armin Braun	68bc178017	Disable Bwc Tests (#39551) * Disable Bwc Tests * For #39550	2019-03-04 10:41:52 +01:00
Yannick Welsch	0f65390c29	Do not mutate engine during planning step (#39571) This cleans up the Engine implementation by separating the sequence number generation from the planning step in the engine, so that the planning step has no side effects. This makes it easier to see that every sequence number is properly accounted for.	2019-03-04 10:11:39 +01:00
David Turner	9ec24bae80	Mute testDoNotWaitForPendingSeqNo Relates #39510, #39595.	2019-03-03 22:03:53 -05:00
Mayya Sharipova	d0e65a45a2	Add debug log for flush for IndicesRequestCacheIT (#39475) Add a debug log when the index is flushed to investigate a failure in IndicesRequestCacheIT. The "DEBUG" level is used because "TRACE" produces too much output irrelevant to this issue. Relates to #32827	2019-03-01 13:12:45 -05:00
Luca Cavanna	29e3c18713	Mute failing IndexShardIT#testPendingRefreshWithIntervalChange Relates to #39565	2019-03-01 14:55:19 +01:00
Tanguy Leroux	e005eeb0b3	Backport support for replicating closed indices to 7.x (#39506) (#39499) Backport support for replicating closed indices (#39499) Before this change, closed indices were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not fully taken into account when balancing shards within the cluster, were not automatically replicated through shard copies, and could not easily be moved from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation that does not allow searching or indexing, has a low memory overhead (compared with searchable/indexable open shards), and allows shards to be recovered from peers or promoted to primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on an 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888	2019-03-01 14:48:26 +01:00
Yannick Welsch	1a50af7dd4	Do not close bad indices on startup (#39500) With #17187, we verified IndexService creation during initial state recovery on the master, and if the recovery failed, the index was imported as closed, without allocating any shards. This was mainly done to prevent endless allocation loops and full log files on data nodes when the index metadata contained broken settings/analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all nodes (not only the elected master), which can significantly slow down startup on data nodes. Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed will no longer prevent shards from being allocated. Fortunately, the original issue of endless allocation loops is no longer a problem thanks to #18467, where we limit the retries of failed allocations. The solution here is therefore to just undo #17187, as it's no longer necessary and is covered by #18467, which will solve the issue for Zen2 and replicated closed indices as well.	2019-03-01 09:23:46 +01:00
Tal Levy	b9b46fdec6	fix UpdateSettingsRequestStreamableTests.mutateInstance (#39386) (#39477) Mutations of the timeout values were using string representations. This resulted in very rare cases where the original timeout value was represented as something like "0ms" and the newly generated random time value was "0s". Although their string representations differ, their underlying TimeValues do not. This caused `-Dtests.seed=7F4C034C43C22B1B` to fail.	2019-02-28 21:02:32 -08:00
Mark Tozzi	609118c229	Override and mute InternalAutoDateHistogramTests#testReduceRandom() (#39536) Pending resolution of #39497	2019-02-28 16:00:32 -05:00
Lee Hinman	dae48ba262	Add details about what acquired the shard lock last (#38807) This adds a `details` parameter to shard locking in `NodeEnvironment`. This is intended to be used for diagnosing issues such as ``` 1> [2019-02-11T14:34:19,262][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] deleting index 1> [2019-02-11T14:34:19,279][WARN ][o.e.i.IndicesService ] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] failed to delete index 1> org.elasticsearch.env.ShardLockObtainFailedException: [.tasks][0]: obtaining shard lock timed out after 0ms 1> at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:736) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:655) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:601) ~[main/:?] 1> at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:554) ~[main/:?] ``` in the hope that we will be able to determine why the shard is still locked. Relates to #30290 as well as some other CI failures	2019-02-28 10:50:47 -07:00
Armin Braun	e564c4d8ad	Add Package Level JavaDoc on Snapshots (#38108) (#39514) * Add Package Level JavaDoc on Snapshots	2019-02-28 18:23:01 +01:00
Simon Willnauer	5c96b90ed5	Never block on scheduled refresh if a refresh is running (#39462) Today we block on the ReferenceManager in the case of a scheduled refresh. However, if a refresh is happening concurrently, we might block and create very small segments. Instead, we should just move on to the next shard and free up the refresh thread.	2019-02-28 11:57:45 +01:00
Armin Braun	d3d7d9bb9d	Remove Dead Code + Duplication in o.e.c.routing (#36678) (#39493) * Removed obviously unused fields+methods * Inlined public methods that only had one caller * Simplified `Optional` chain * Simplified some obviously redundant conditions	2019-02-28 10:33:05 +01:00
Armin Braun	90ab4a6f6e	Stabilize RareClusterState (#38671) (#39468) * Use the actual master node, not just a master-eligible node, when trying to cancel publication. This only works on the master, and for unlucky seeds we never try the master within the 10s that the busy assert runs. * Closes #36813	2019-02-28 08:01:52 +01:00
Tanguy Leroux	4dd274b51d	Unmute CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState() (#39452) This commit unmutes the test and comments out the offending call to linearizabilityChecker.isLinearizable(), as suggested in #39437	2019-02-27 20:38:54 +01:00
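
The `GroupedActionListener` change in #39633 forbids an empty group because the delegate would never be called. The sketch below illustrates that contract under stated assumptions; it is not the real class. `Listener` and `GroupedListener` are hypothetical stand-ins for `ActionListener` and `GroupedActionListener`, and the choices to notify the delegate once after every member of the group has answered and to report only the first failure are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: waits for groupSize results, then notifies the delegate once. */
final class GroupedListener<T> implements Listener<T> {
    private final Listener<Collection<T>> delegate;
    private final AtomicInteger remaining;
    private final List<T> results = Collections.synchronizedList(new ArrayList<>());
    private final AtomicReference<Exception> failure = new AtomicReference<>();

    GroupedListener(Listener<Collection<T>> delegate, int groupSize) {
        if (groupSize <= 0) {
            // An empty group would never call the delegate, so reject it up front.
            throw new IllegalArgumentException("groupSize must be positive but was " + groupSize);
        }
        this.delegate = delegate;
        this.remaining = new AtomicInteger(groupSize);
    }

    @Override
    public void onResponse(T response) {
        results.add(response);
        countDown();
    }

    @Override
    public void onFailure(Exception e) {
        failure.compareAndSet(null, e); // assumption: remember only the first failure
        countDown();
    }

    private void countDown() {
        // The last member of the group to arrive completes the delegate exactly once.
        if (remaining.decrementAndGet() == 0) {
            Exception e = failure.get();
            if (e != null) {
                delegate.onFailure(e);
            } else {
                delegate.onResponse(new ArrayList<>(results));
            }
        }
    }

    public static void main(String[] args) {
        Listener<Collection<String>> printer = new Listener<Collection<String>>() {
            @Override public void onResponse(Collection<String> all) { System.out.println("done: " + all); }
            @Override public void onFailure(Exception e) { System.out.println("failed: " + e); }
        };
        GroupedListener<String> grouped = new GroupedListener<>(printer, 2);
        grouped.onResponse("shard-0");
        grouped.onResponse("shard-1"); // prints "done: [shard-0, shard-1]"
    }
}

/** Hypothetical stand-in for ActionListener. */
interface Listener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}
```

Failing fast in the constructor turns a silent hang into an immediate, debuggable error at the call site.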
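
The tie-breaking in #39564 amounts to giving completion options a total order, so the result no longer depends on the order in which shard responses happen to arrive. The sketch below is a minimal illustration, not the actual suggester code. `CompletionOrder` and `Option` are hypothetical. It assumes options are compared by descending score and then by surface form, and it collapses the shard id, index name and index uuid tie-breakers into a single `shardIndex` integer for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CompletionOrder {
    /** Hypothetical stand-in for a completion suggestion option. */
    static final class Option {
        final float score;
        final String surfaceForm;
        final int shardIndex; // simplification: stands in for shard id, index name and index uuid
        final int docId;

        Option(float score, String surfaceForm, int shardIndex, int docId) {
            this.score = score;
            this.surfaceForm = surfaceForm;
            this.shardIndex = shardIndex;
            this.docId = docId;
        }

        @Override
        public String toString() {
            return surfaceForm + "@" + shardIndex + "/" + docId;
        }
    }

    /** Higher score first, then surface form, then shard and doc id as final tie-breakers. */
    static final Comparator<Option> ORDER =
            Comparator.comparingDouble((Option o) -> o.score).reversed()
                    .thenComparing((Option o) -> o.surfaceForm)
                    .thenComparingInt((Option o) -> o.shardIndex)
                    .thenComparingInt((Option o) -> o.docId);

    static List<Option> sorted(List<Option> options) {
        List<Option> copy = new ArrayList<>(options);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Option> arrival = Arrays.asList(
                new Option(1.0f, "foo", 1, 5),
                new Option(1.0f, "foo", 0, 7),
                new Option(2.0f, "bar", 3, 1));
        System.out.println(sorted(arrival)); // [bar@3/1, foo@0/7, foo@1/5]
    }
}
```

Without the last two comparators, the two `foo` options would compare as equal and keep whatever order they arrived in.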
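
The rule that Fuzziness#asDistance(String) is expected to apply for AUTO values (#39643) can be sketched as follows. This is a minimal illustration, not the real `Fuzziness` class: `AutoFuzziness` is hypothetical. It assumes the documented `AUTO:[low],[high]` semantics, with plain AUTO equivalent to `AUTO:3,6`: terms shorter than `low` must match exactly, terms shorter than `high` allow one edit, and longer terms allow two. Measuring term length in Unicode code points is also an assumption.

```java
/** Hypothetical sketch of AUTO:low,high fuzziness; not the real Fuzziness class. */
final class AutoFuzziness {
    private final int low;  // terms shorter than this must match exactly
    private final int high; // terms shorter than this allow one edit; longer terms allow two

    AutoFuzziness(int low, int high) {
        if (low < 0 || high < low) {
            throw new IllegalArgumentException("expected 0 <= low <= high but got " + low + "," + high);
        }
        this.low = low;
        this.high = high;
    }

    /** Plain AUTO, assumed equivalent to AUTO:3,6. */
    static AutoFuzziness defaults() {
        return new AutoFuzziness(3, 6);
    }

    /** Edit distance for a term, based on its length in code points (assumption). */
    int asDistance(String term) {
        int len = term.codePointCount(0, term.length());
        if (len < low) {
            return 0;
        } else if (len < high) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        AutoFuzziness auto = defaults();
        AutoFuzziness custom = new AutoFuzziness(2, 4);
        System.out.println(auto.asDistance("ab") + " " + auto.asDistance("abcd") + " " + auto.asDistance("abcdefg")); // 0 1 2
        System.out.println(custom.asDistance("a") + " " + custom.asDistance("abc") + " " + custom.asDistance("abcd")); // 0 1 2
    }
}
```

The bug described in the commit corresponds to ignoring the custom `low`/`high` bounds and returning 1 regardless of the term.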
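
The diffable-utils bug fixed in #39544 is a mismatch between a length prefix and the payload that follows it. The sketch below shows the corrected pattern using plain `java.io` streams rather than the real `DiffableUtils` and stream classes: the filter is applied once, and the same filtered list drives both the count and the writes. The class name, the string payload and the "unsupported" predicate are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilteredSerialization {
    /** Writes a length-prefixed list; count and elements come from the same filtered list. */
    static void write(DataOutputStream out, List<String> items, Predicate<String> supported) throws IOException {
        List<String> toWrite = new ArrayList<>();
        for (String item : items) {
            if (supported.test(item)) {
                toWrite.add(item);
            }
        }
        out.writeInt(toWrite.size());
        for (String item : toWrite) {
            out.writeUTF(item); // never writes an element the count did not include
        }
    }

    static List<String> read(DataInputStream in) throws IOException {
        int count = in.readInt();
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(in.readUTF());
        }
        return result;
    }

    /** Serializes and deserializes, failing if any bytes are left unread. */
    static List<String> roundTrip(List<String> items, Predicate<String> supported) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(out, items, supported);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                List<String> result = read(in);
                if (in.available() != 0) {
                    throw new IOException("count and payload disagree: trailing bytes");
                }
                return result;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical rule: names starting with "x-" are unsupported by the receiving node.
        Predicate<String> supported = s -> !s.startsWith("x-");
        System.out.println(roundTrip(Arrays.asList("a", "x-new", "b"), supported)); // [a, b]
    }
}
```

If `write` instead emitted all three items after a count of two, as in the original bug, the reader would stop early and the trailing-bytes check would fail.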

2640 Commits