Commit Graph

2666 Commits

Author SHA1 Message Date
Andrey Ershov 52fd102e23 Avoid serialising state if it was already serialised (#39179)
When preparing the state to send to other nodes, we're serializing it
for each node, despite using putIfAbsent.
This commit checks if the state was already serialized for this node
version before performing the potentially expensive computation.
The map is not used by multiple threads, so computeIfAbsent is not
needed (and could not be used here easily, because IOException could
be thrown).

(cherry picked from commit c99be63b43f5250f3cd220130df73c5e9e097459)
2019-03-06 11:54:13 +01:00
David Turner 295e39a8c8 Drop node if asymmetrically partitioned from master (#39598)
When a node is joining the cluster we ensure that it can send requests to the
master _at that time_. If it joins the cluster and _then_ loses the ability to
send requests to the master then it should be removed from the cluster. Today
this is not the case: the master can still receive responses to its follower
checks, and receives acknowledgements to cluster state publications, so has no
reason to remove the node.

This commit changes the handling of follower checks so that they fail if they
come from a master that the other node was following but which it now believes
to have failed.
2019-03-06 09:41:57 +00:00
David Turner 77dd711847 Tidy up GroupedActionListener (#39633)
Today the `GroupedActionListener` accepts a `defaults` parameter but all
callers pass an empty list. Also it is permitted to pass an empty group but
this is trappy because the delegated listener is never be called in that case.
This commit removes the `defaults` parameter and forbids an empty group.
2019-03-06 09:25:10 +00:00
Armin Braun aaecaf59a4
Optimize Bulk Message Parsing and Message Length Parsing (#39634) (#39730)
* Optimize Bulk Message Parsing and Message Length Parsing

* findNextMarker took almost 1ms per invocation during the PMC rally track
  * Fixed to be about an order of magnitude faster by using Netty's bulk `ByteBuf` search
* It is unnecessary to instantiate an object (the input stream wrapper) and throw it away, just to read the `int` length from the message bytes
  * Fixed by adding bulk `int` read to BytesReference
2019-03-06 08:13:15 +01:00
Jason Tedor 75a0d4f470
Rename retention lease setting (#39719)
This commit renames the retention lease setting
index.soft_deletes.retention.lease so that it is under the namespace
index.soft_deletes.retention_lease. As such, we rename the setting to
index.soft_deletes.retention_lease.period.
2019-03-05 22:04:45 -05:00
Jason Tedor 504c792861
Add Docker build type (#39378)
This commit adds a new build type (together with deb/rpm/tar/zip) to
represent the official Docker images. This build type will be displayed
in APIs such as the main and nodes info APIs.
2019-03-05 22:03:15 -05:00
Luca Cavanna 9d0211485c Tie-break completion suggestions with same score and surface form (#39564)
In case multiple completion suggestion entries have the same score and
surface form, the order in which such options will be returned is
currently not deterministic.

With this commmit we introduce tie-breaking for such situations, based
on shard id, index name, index uuid and doc id like we already do for
 ordinary search hits. With this change we also make shardIndex
mandatory when sorting and comparing completion suggestion options,
which was previously only needed later when fetching hits).

Also, we need to make sure shardIndex is properly set when merging
completion suggestions coming from multiple clusters in
`SearchResponseMerger`
2019-03-05 18:03:54 +01:00
Jim Ferenczi 160dc29f0e Handle total hits equal to track_total_hits (#37907)
This change ensures that a total hits equal to the value set for
track_total_hits is not considered as a lower bound.
2019-03-05 16:28:48 +01:00
Armin Braun 750ec8ba53
Minor Cleanups in QueryPhase (#39680) (#39694)
* Soften redundant cast to allow use of `DeterministicTaskQueue` in this class for #39504
* Remove two redundant variables and lower visibility in two possible spots
* Make field `final`
2019-03-05 15:04:16 +01:00
Christoph Büscher 5cdea6ef17 Fix Fuzziness#asDistance(String) (#39643)
Currently Fuzziness#asDistance(String) doesn't work for custom AUTO values. If
the fuzziness is AUTO, the method returns the correct edit distance to use,
depending on the input string, but for custom AUTO values it currently always
returns an edit distance of 1. Correcting this and adding unit and integration
tests to catch these cases.

Closes #39614
2019-03-05 14:31:07 +01:00
Simon Willnauer 19f6a35358 Move BWC Version to 7.1.0 after backport
Relates to #39512
2019-03-05 14:11:59 +01:00
Simon Willnauer d112c89041 Allow inclusion of unloaded segments in stats (#39512)
Today we have no chance to fetch actual segment stats for segments that
are currently unloaded. This is relevant in the case of frozen indices.
This allows to monitor how much memory a frozen index would use if it was
unfrozen.
2019-03-05 14:02:20 +01:00
Armin Braun e8d9744340
Use Threadpool Time in ClusterApplierService (#39679) (#39685)
* Use threadpool's time in `ClusterApplierService` to allow for deterministic tests
* This is a part of/requirement for #39504
2019-03-05 12:37:49 +01:00
Gordon Brown 380dc27d91 Mute testCloseWhileRelocatingShards (#39589) 2019-03-05 13:34:43 +02:00
Alan Woodward 0b14782b23 Add stopword support to IntervalBuilder (#39637)
The match interval builder analyses input text and converts it to an IntervalSource, and as such
may generate token streams with stopwords. This commit deals with these by using the extend
factory to cover the gaps produced by these stopwords so that phrase and ordered queries work
correctly.
2019-03-05 10:50:45 +00:00
Christoph Büscher 2fe1fa8972
Shortcut counts on exists queries (#39570) (#39660)
`TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. 
This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed.

Closes #37475
2019-03-04 19:53:43 +01:00
Prabhakar S 98925e9a09 Fixing the custom object serialization bug in diffable utils. (#39544)
While serializing custom objects, the length of the list is computed after
filtering out the unsupported objects but while writing objects the filter
is not applied thus resulting in writing unsupported objects which will fail
to deserialize by the receiever. Adding the condition to filter out unsupported
custom objects.
2019-03-04 18:41:14 +01:00
Nhat Nguyen 801f13f201 Assert recovery done in testDoNotWaitForPendingSeqNo (#39595)
Since #39006 we should be able to complete a peer-recovery without
waiting for pending indexing operations. Thus, the assertion in
testDoNotWaitForPendingSeqNo should be updated from false to true.

Closes #39510
2019-03-04 10:21:23 -05:00
Yannick Welsch 936dbb00e3
Isolate Zen1 (#39470)
Cherry-picks a few commits from #39466 to align 7.x with master branch.
2019-03-04 15:51:17 +01:00
Luca Cavanna 9ddaabba88 Remote private SearchHits.Total class (#39556)
This is now possible as Lucene's `TotalHits` implements `equals`/`hashcode`,
all the other methods can be in-lined in `SearchHits` instead, no need for
a specific wrapper class.
2019-03-04 13:46:45 +01:00
Armin Braun 547af21a12
Introduce Mapping ActionListener (#39538) (#39636)
* Introduce Safer Chaining of Listeners

* The motivation here is to make reasoning about chains of `ActionListener` a little easier, by providing a safe method for nesting `ActionListener` that guarantees that a response is never dropped. Also, it dries up the code a little by removing the need to repeat `listener::onFailure` and `listener.onResponse` over and over.
* Refactored a number of obvious/easy spots to use the new listener constructor
2019-03-04 12:56:46 +01:00
Daniel Mitterdorfer fca6a2f006
Avoid deprecated API usage in TaskOperationFailure (#39303) (#39628)
With this commit we remove usage of the deprecated method
`ExceptionsHelper#detailedMessage` in the class `TaskOperationFailure`.

Relates #19069
2019-03-04 11:37:59 +01:00
David Turner dd68244841
Wait for state recovery in testFreshestMasterElectedAfterFullClusterRestart (#39602)
Zen1IT#testFreshestMasterElectedAfterFullClusterRestart fails sometimes because
we request the cluster state before state recovery has completed, and therefore
obtain the default value for the setting we're relying on.

Confusingly, we were starting out by setting this setting to its default value,
so the test looked like it was failing because of a production bug. This commit
avoids this confusion in future by setting it to a non-default value at the
start of the test.

Fixes #39586.
2019-03-04 10:26:07 +00:00
Adrien Grand 782f873165
Don't swallow exceptions in Store#close(). (#39035) (#39622)
Store#close() swallows any `IOException`.

Relates #39030
2019-03-04 10:58:43 +01:00
Adrien Grand 934946a232
Don't swallow exception in ThreadPool.terminate. (#39038) (#39623)
The use of `closeWhileHandlingException` means that any exception while trying
to close the threadpool is going to be swallowed.

Relates #39030
2019-03-04 10:58:29 +01:00
Adrien Grand 21540a5ada
Enhancements to IndicesQueryCache. (#39099) (#39626)
This commit adds the following:
 - more tests to IndicesServiceCloseTests, one of them found a bug in the order
   in which `IndicesQueryCache#onClose` and
   `IndicesService.indicesRefCount#decRef` are called.
 - made `IndicesQueryCache.stats2` a synchronized map. All writes to it are
   already protected by the lock of the Lucene cache, but the final read from
   an assertion in `IndicesQueryCache#close()` was not so this change should
   avoid any potential visibility issues.
 - human-readable `toString`s to make debugging easier.

Relates #37117
2019-03-04 10:58:12 +01:00
Armin Braun 68bc178017
Disable Bwc Tests (#39551)
* Disable Bwc Tests
* For #39550
2019-03-04 10:41:52 +01:00
Yannick Welsch 0f65390c29 Do not mutate engine during planning step (#39571)
This cleans up the Engine implementation by separating the sequence number generation from the
planning step in the engine, to avoid for the planning step to have any side effects. This makes it
easier to see that every sequence number is properly accounted for.
2019-03-04 10:11:39 +01:00
David Turner 9ec24bae80 Mute testDoNotWaitForPendingSeqNo
Relates #39510, #39595.
2019-03-03 22:03:53 -05:00
Mayya Sharipova d0e65a45a2 Add debug log for flush for IndicesRequestCacheIT (#39475)
Add debug log when index is flushed to investigate a failure
in IndicesRequestCacheIT

"DEBUG" level is used as "TRACE" produces too  much output irrelevant for this
issue

Relates to #32827
2019-03-01 13:12:45 -05:00
Luca Cavanna 29e3c18713 Mute failing IndexShardIT#testPendingRefreshWithIntervalChange
Relates to #39565
2019-03-01 14:55:19 +01:00
Tanguy Leroux e005eeb0b3
Backport support for replicating closed indices to 7.x (#39506)(#39499)
Backport support for replicating closed indices (#39499)
    
    Before this change, closed indexes were simply not replicated. It was therefore
    possible to close an index and then decommission a data node without knowing
    that this data node contained shards of the closed index, potentially leading to
    data loss. Shards of closed indices were not completely taken into account when
    balancing the shards within the cluster, or automatically replicated through shard
    copies, and they were not easily movable from node A to node B using APIs like
    Cluster Reroute without being fully reopened and closed again.
    
    This commit changes the logic executed when closing an index, so that its shards
    are not just removed and forgotten but are instead reinitialized and reallocated on
    data nodes using an engine implementation which does not allow searching or
     indexing, which has a low memory overhead (compared with searchable/indexable
    opened shards) and which allows shards to be recovered from peer or promoted
    as primaries when needed.
    
    This new closing logic is built on top of the new Close Index API introduced in
    6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before
    closing them, and closing an index on a 8.0 cluster will reinitialize the index shards
    and therefore impact the cluster health.
    
    Some APIs have been adapted to make them work with closed indices:
    - Cluster Health API
    - Cluster Reroute API
    - Cluster Allocation Explain API
    - Recovery API
    - Cat Indices
    - Cat Shards
    - Cat Health
    - Cat Recovery
    
    This commit contains all the following changes (most recent first):
    * c6c42a1 Adapt NoOpEngineTests after #39006
    * 3f9993d Wait for shards to be active after closing indices (#38854)
    * 5e7a428 Adapt the Cluster Health API to closed indices (#39364)
    * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767)
    * 71f5c34 Recover closed indices after a full cluster restart (#39249)
    * 4db7fd9 Adapt the Recovery API for closed indices (#38421)
    * 4fd1bb2 Adapt more tests suites to closed indices (#39186)
    * 0519016 Add replica to primary promotion test for closed indices (#39110)
    * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631)
    * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955)
    * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
    * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329)
    * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327)
    * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326)
    * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024)
    * e53a9be Fix compilation error in IndexShardIT after merge with master
    * cae4155 Relax NoOpEngine constraints (#37413)
    * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
    * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903)
    
    Relates to #33888
2019-03-01 14:48:26 +01:00
Yannick Welsch 1a50af7dd4 Do not close bad indices on startup (#39500)
With #17187, we verified IndexService creation during initial state recovery on the master and if the
recovery failed the index was imported as closed, not allocating any shards. This was mainly done to
prevent endless allocation loops and full log files on data-nodes when the indexmetadata contained
broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all
nodes (not only the elected master), which can significantly slow down startup on data nodes.
Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed
will no longer not allocate any shards. Fortunately, the original issue for endless allocation loops is
no longer a problem due to #18467, where we limit the retries of failed allocations. The solution here
is therefore to just undo #17187, as it's no longer necessary, and covered by #18467, which will solve
the issue for Zen2 and replicated closed indices as well.
2019-03-01 09:23:46 +01:00
Tal Levy b9b46fdec6
fix UpdateSettingsRequestStreamableTests.mutateInstance (#39386) (#39477)
Mutations of the timeout values were using string-representations.

This resulted in very rare cases where the original timeout value was
represented as something like "0ms" and the new random time-value generated
was "0s". Although their string representations differ, their underlying
TimeValue does not. This resulted in `-Dtests.seed=7F4C034C43C22B1B` to
fail.
2019-02-28 21:02:32 -08:00
Mark Tozzi 609118c229 Override and mute InternalAutoDateHistogramTests#testReduceRandom() (#39536)
pending resolution of #39497
2019-02-28 16:00:32 -05:00
Lee Hinman dae48ba262 Add details about what acquired the shard lock last (#38807)
This adds a `details` parameter to shard locking in `NodeEnvironment`. This is
intended to be used for diagnosing issues such as

```
  1> [2019-02-11T14:34:19,262][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] deleting index
  1> [2019-02-11T14:34:19,279][WARN ][o.e.i.IndicesService     ] [node_s0] [.tasks/oSYOG0-9SHOx_pfAoiSExQ] failed to delete index
  1> org.elasticsearch.env.ShardLockObtainFailedException: [.tasks][0]: obtaining shard lock timed out after 0ms
  1> 	at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:736) ~[main/:?]
  1> 	at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:655) ~[main/:?]
  1> 	at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:601) ~[main/:?]
  1> 	at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:554) ~[main/:?]
```

In the hope that we will be able to determine why the shard is still locked.

Relates to #30290 as well as some other CI failures
2019-02-28 10:50:47 -07:00
Armin Braun e564c4d8ad
Add Package Level JavaDoc on Snapshots (#38108) (#39514)
* Add Package Level JavaDoc on Snapshots
2019-02-28 18:23:01 +01:00
Simon Willnauer 5c96b90ed5 Never block on scheduled refresh if a refresh is running (#39462)
Today we block on the ReferenceManager in the case of a scheduled refresh.
Yet if there is a refresh happening concurrently we might block and create
very smallish segments. Instead we should just move on to the next shard
and free up the refresh thread instead.
2019-02-28 11:57:45 +01:00
Armin Braun d3d7d9bb9d
Remove Dead Code + Duplication in o.e.c.routing (#36678) (#39493)
* Removed obviously unused fields+methods
* Inlined public methods that only had one caller
* Simplified `Optional` chain
* Simplified some obviously redundant conditions
2019-02-28 10:33:05 +01:00
Armin Braun 90ab4a6f6e
Stabilize RareClusterState (#38671) (#39468)
* Use actual master node, not just a master elligible node when trying to cancel publication. This only works on the master and for unlucky seeds we never try the master within the 10s that the busy assert runs.
* Closes #36813
2019-02-28 08:01:52 +01:00
Tanguy Leroux 4dd274b51d Unmute CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState() (#39452)
This commit unmutes the test and comments out the
offending call to linearizabilityChecker.isLinearizable() as suggested
in #39437
2019-02-27 20:38:54 +01:00
Tanguy Leroux 983b5d1c0e Mute SpecificMasterNodesIT.testElectOnlyBetweenMasterNodes()
Tracked in #38331
2019-02-27 18:00:02 +01:00
Daniel Mitterdorfer 2ccba18809
Correct name of basic_date_time_no_millis (#39367) (#39454)
With this commit we correct the name of the Java time based formatter
for `basic_date_time_no_millis`.
2019-02-27 17:03:50 +01:00
Alan Woodward 71b8494181
Upgrade to lucene 8.0.0-snapshot-ff9509a8df (#39444)
Backport of #39350

Contains the following:

* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
2019-02-27 14:36:08 +00:00
Armin Braun f675b33d50
Increase Timeout in UnicastZenPingTests (#38893) (#39449)
* Just like #37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause
* Closes #26701
2019-02-27 15:22:17 +01:00
Jason Tedor 55e98f08d8
Provide a clearer error message on keystore add (#39327)
When trying to add a setting to the keystore with an upper case name, we
reject with an unclear error message. This commit makes that error
message much clearer.
2019-02-27 08:10:23 -05:00
Armin Braun 27485871b8
Don't Ping on Handshake Connection (#39076) (#39446)
* Don't Ping on Handshake Connection

* It does not make sense to run pings on the handshake connection
   * Set the ping interval to `-1` to deactivate pings on it
2019-02-27 13:39:25 +01:00
Tanguy Leroux 6912e27ee0 Mute MinimumMasterNodesIT.testThreeNodesNoMasterBlock()
Tracked in #39172
2019-02-27 13:13:22 +01:00
David Turner 41668f7723 Move PeerFinder's logger to the expected package (#39412)
Today the abstract `org.elasticsearch.discovery.PeerFinder` uses the logger of
its implementation, which in production is in `o.e.cluster.coordination`. This
turns out to be confusing and unhelpful, so with this change we move to using
the logger that belongs to `PeerFinder`.
2019-02-27 08:44:05 +00:00
Armin Braun 28b771f5db
Remove Dead Code Test Infrastructure (#39192) (#39436)
* Just removing some obviously unused things
2019-02-27 09:38:47 +01:00