1979 Commits

Author SHA1 Message Date
Tim Brooks
ea7ea51050
Make TcpTransport#openConnection fully async (#36095)
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
2018-11-30 11:30:42 -07:00
Tim Brooks
26dcbcc8cc
Remove MockTcpTransport for ESIntegTestCase (#36089)
This commit removes the `MockTcpTransport` as a transport option for
`ESIntegTestCase`. It is the first step in replacing the usages of
`MockTcpTransport` with `MockNioTransport`.
2018-11-30 09:04:51 -07:00
Adrien Grand
fa3d365ee8
Fix CompositeBytesReference#slice to not throw AIOOBE with legal offsets. (#35955)
CompositeBytesReference#slice has two bugs:
 - One that makes it fail if the reference is empty and an empty slice is
   created, this is #35950 and is fixed by special-casing empty-slices.
 - One performance bug that makes it always create a composite slice when
   creating a slice that ends on a boundary, this is fixed by computing `limit`
   as the index of the sub reference that holds the last element rather than
   the next element after the slice.

Closes #35950
2018-11-30 10:32:46 +01:00
Zachary Tong
61c2db5ebb Revert "Deprecate X-Pack centric rollup endpoints (#35962)"
This reverts commit b84f1f6a3ad511538ea102badfefa934fa8f0336.
2018-11-29 12:58:23 -05:00
Zachary Tong
40c5445480 Revert "[TEST] Use deprecated form of rollup endpoint in mixed cluster (#36000)"
This reverts commit 85cdf4f9133190d9e0fbb7dc8b791262571cab52.
2018-11-29 12:56:25 -05:00
Tim Brooks
c305f9dc03
Make keepalive pings bidirectional and optimizable (#35441)
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.

The ping interval configuration is moved to the ConnectionProfile.

The server channel now responds to pings. This makes the keepalive
pings bidirectional.

On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
2018-11-29 08:55:53 -07:00
Zachary Tong
85cdf4f913
[TEST] Use deprecated form of rollup endpoint in mixed cluster (#36000)
When wiping rollup jobs, if we are in a mixed cluster with < v7.0 nodes
we need to fall back to the deprecated endpoint because we may talk
to a 6.x node.
2018-11-29 07:37:33 -05:00
David Turner
7f257187af
[Zen2] Update default for USE_ZEN2 to true (#35998)
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
2018-11-29 12:18:35 +00:00
Jason Tedor
b84f1f6a3a
Deprecate X-Pack centric rollup endpoints (#35962)
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
2018-11-27 20:34:17 -05:00
Tim Brooks
cc1fa799c8
Remove TcpChannel#setSoLinger method (#35924)
This commit removes the dedicated `setSoLinger` method. This simplifies
the `TcpChannel` interface. This method has very little effect as the
SO_LINGER is not set prior to the channels being closed in the abstract
transport test case. We still will set SO_LINGER on the
`MockNioTransport`. However we can do this manually.
2018-11-27 09:08:14 -07:00
Andrey Ershov
0e283f9670
[Zen2] PersistedState interface implementation (#35819)
Today GatewayMetaState is capable of atomically storing MetaData to
disk. We've also moved fields that are needed to be persisted in Zen2
from ClusterState to ClusterState.MetaData.CoordinationMetaData.

This commit implements PersistedState interface.

version and currentTerm are persisted as a part of Manifest.
GatewayMetaState now implements both ClusterStateApplier and
PersistedState interfaces. We started with two descendants
Zen1GatewayMetaState and Zen2GatewayMetaState, but it turned
out to be not easy to glue it.
GatewayMetaState now constructs previousClusterState (including
MetaData) and previousManifest inside the constructor so that all
PersistedState methods are usable as soon as GatewayMetaState
instance is constructed. Also, loadMetaData is renamed to
getMetaData, because it just returns
previousClusterState.metaData().
Sadly, we don't have access to localNode (obtained from 
TransportService in the constructor, so getLastAcceptedState
should be called, after setLocalNode method is invoked.
Currently, when deciding whether to write IndexMetaData to disk,
we're comparing current IndexMetaData version and received
IndexMetaData version. This is not safe in Zen2 if the term has changed.
So updateClusterState now accepts incremental write
method parameter. When it's set to false, we always write
IndexMetaData to disk.
Things that are not covered by GatewayMetaStateTests are covered
by GatewayMetaStatePersistedStateTests.
This commit also adds an option to use GatewayMetaState instead of
InMemoryPersistedState in TestZenDiscovery. However, by default
InMemoryPersistedState is used and only one test in PersistedStateIT
used GatewayMetaState. In order to use it for other tests, proper
state recovery should be implemented.
2018-11-27 15:04:52 +01:00
Christophe Bismuth
b95a4db6e6 Throw a parsing exception when boost is set in span_or query (#28390) (#34112) 2018-11-26 12:15:59 -05:00
Jim Ferenczi
e37a0ef844
Upgrade to lucene-8.0.0-snapshot-67cdd21996 (#35816) 2018-11-22 15:42:59 +01:00
Andrey Ershov
a056bd8c1c
[Zen2] Move ClusterState fields to be persisted to ClusterState.MetaData (#35625)
Today we have a way to atomically persist global MetaData and
IndexMetaData to disk when new ClusterState is received. All other
ClusterState fields are not persisted.
However, there are other parts of ClusterState that should be
persisted, namely:

version
term
lastCommittedConfiguration
lastAcceptedConfiguration
votingTombstones

version is changed frequently, other fields are not. We decided
to group term, lastCommittedConfiguration,
lastAcceptedConfiguration and votingTombstones into
CoordinationMetaData class and make CoordinationMetaData a field
inside MetaData.
MetaData.toXContent and MetaData.fromXContent should take care of
CoordinationMetaData.
version stays as a top level field in ClusterState and will be
persisted as part of Manifest in a follow-up commit.
Also MetaData.isGlobalStateEquals should be extended to include
coordinationMetaData in comparison.

This commit favors exposing getters, such as getTerm directly in
ClusterState to avoid massive code changes.

An example of CoordinationMetaState.toXContent:

{
  "term": 1,
  "last_committed_config": [
    "TiIuBcbBtpuXyDDVHXeD",
    "ZIAoVbkjjLPLUuYLaTkw"
  ],
  "last_accepted_config": [
    "OwkXbXZNOZPJqccdFHdz",
    "LouzsGYwmQzpeQMrboZe",
    "fCKGRZdjLTqzXAqPUtGL",
    "pLoxshjpJXwDhbgjfYJy",
    "SjINLwFIlIEFZCbjrSFo",
    "MDkVncJEVyZLJktopWje"
  ]
}
2018-11-21 17:03:26 +01:00
Andrey Ershov
6ac0cb1842 Merge branch master into zen2
2 types of conflicts during the merge:
1) Line length fix
2) Classes no longer extend AbstractComponent
2018-11-21 15:36:49 +01:00
Yannick Welsch
8939a7894f
Zen2: Move disruption tests to Zen2 (#35724)
- Moves disruption tests to Zen2
- Registers a few missing settings
- Removes .put(TestZenDiscovery.USE_ZEN2.getKey(), true) from tests where Zen2 is now enabled
by default through the parent test class
- Moves QuorumGatewayIT back to Zen1, as it is not stable with Zen2 as it currently relies on
dangling indices due to the lack of proper CS persistence, which triggers secondary failures
2018-11-21 14:43:33 +01:00
Armin Braun
7a210342ab
TESTS: Remove Dead Code in Disruption Tests (#35768)
* Neither this class nor the constructor are used anywhere
2018-11-21 10:33:50 +01:00
Christoph Büscher
5847f8379c
Move ScoreAccessor to test-framework (#35766)
This class is only used by RandomScoreFunctionIT and the MockScriptEngine, so it
shouldn't be part of the server codebase.
2018-11-21 10:28:31 +01:00
Armin Braun
33c713ba60
TESTS: More Logging in LongGcDisruptionTests (#35702)
* The existing logging is not helpful enough to track down which threads hang, we need the hanging thread's stacktraces too
* Relates #35686
2018-11-20 15:36:01 +01:00
Alpar Torok
8659af68e0
Auto skip license headers on no source (#35640)
* Unmute BuildExamplePluginsIT

* Skip licenseHeaders when there are no sources
2018-11-20 13:02:33 +02:00
Simon Willnauer
29ef442841
Add a _freeze / _unfreeze API (#35592)
This commit adds a rest endpoint for freezing and unfreezing an index.
Among other cleanups mainly fixing an issue accessing package private APIs
from a plugin that got caught by integration tests this change also adds
documentation for frozen indices.
Note: frozen indices are marked as `beta` and available as a basic feature.

Relates to #34352
2018-11-20 08:03:24 +01:00
Yannick Welsch
47ada69c46
Zen2: Move most integration tests to Zen2 (#35678)
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
2018-11-19 21:15:29 +01:00
Gordon Brown
b2057138a7
Remove AbstractComponent from AbstractLifecycleComponent (#35560)
AbstractLifecycleComponent now no longer extends AbstractComponent. In
order to accomplish this, many, many classes now instantiate their own
logger.
2018-11-19 09:51:32 -07:00
Arthur Gavlyukovskiy
022726011c Remove use of AbstractComponent in server (#35444)
Removed extending of AbstractComponent and changed logger usage to
explicit declaration. Abstract classes still have logger
declaration using this.getClass() in order to show implementation class
name in its logs.

See #34488
2018-11-16 16:10:32 -05:00
Jernej Klancic
baf33b3162 Removes AbstractComponent from several classes (#35566)
Removes inhertiting from AbstractComponent for some classes (mostly
in the plugins module).

Relates to #34488
2018-11-16 20:50:18 +01:00
Lee Hinman
ce35d049e9 [TEST] Fix ClusterApplierServiceTests.testClusterStateUpdateLogging
This changes the test to not use a `CountDownlatch`, instead adding an assertion
for the final logging message and waiting until the `MockAppender` has seen it
before proceeding.

Resolves #23739
2018-11-15 14:15:23 -07:00
David Turner
86ef041539
[Zen2] Introduce ClusterBootstrapService (#35488)
Today, the bootstrapping of a Zen2 cluster is driven externally, requiring
something else to wait for discovery to converge and then to inject the initial
configuration. This is hard to use in some situations, such as REST tests.

This change introduces the `ClusterBootstrapService` which brings the bootstrap
retry logic within each node and allows it to be controlled via an (unsafe)
node setting.
2018-11-15 20:09:22 +00:00
Tanguy Leroux
c9b4ef0dfd
Use RunOnce when appropriate (#35553)
This pull request replaces some blocks of code that must be run once 
and that are currently based on AtomicBoolean by the convient RunOnce 
class added in #35489.
2018-11-15 09:24:40 +01:00
Andrey Ershov
045fdd0d3b Merge master into zen2 2018-11-14 15:37:13 +03:00
Yannick Welsch
4cfdb0609e
Adapt InternalCluster#fullRestart to call onNodeStopped when all nodes are stopped (#35494)
Refactors and simplifies the logic around stopping nodes, making sure that for a full cluster restart
onNodeStopped is only called after the nodes are actually all stopped (and in particular not while
starting up some nodes again). This change also ensures that a closed node client is not being used
anymore (which required a small change to a test).

Relates to #35049
2018-11-14 13:24:56 +01:00
Zachary Tong
c346a0f027
[Rollup] Add wait_for_completion option to StopRollupJob API (#34811)
This adds a `wait_for_completion` flag which allows the user to block 
the Stop API until the task has actually moved to a stopped state, 
instead of returning immediately.  If the flag is set, a `timeout` parameter
can be specified to determine how long (at max) to block the API
call.  If unspecified, the timeout is 30s.

If the timeout is exceeded before the job moves to STOPPED, a
timeout exception is thrown.  Note: this is just signifying that the API
call itself timed out.  The job will remain in STOPPING and evenutally
flip over to STOPPED in the background.

If the user asks the API to block, we move over the the generic
threadpool so that we don't hold up a networking thread.
2018-11-13 16:37:17 -05:00
Julie Tibshirani
bc799e4a6f
Ignore warnings related to types deprecation in REST tests. (#35395) 2018-11-13 11:56:01 -08:00
David Turner
8e40a2bbe2
[Zen2] Introduce vote withdrawal (#35446)
If shutting down half or more of the master-eligible nodes, their votes must
first be explicitly withdrawn to ensure that the cluster doesn't lose its
quorum. This works via _voting tombstones_, stored in the cluster state, which
tell the reconfigurator to remove nodes from the voting configuration.

This change introduces voting tombstones to the cluster state, together with
transport APIs for adding and removing them, and makes use of these APIs in
`InternalTestCluster` to support tests which remove at least half of the
master-eligible nodes at once (e.g. shrinking from two master-eligible nodes to
one).
2018-11-13 19:32:32 +00:00
David Turner
fbd3cab410
[Zen2] Remove AbstractComponent usage (#35483)
AbstractComponent was deprecated in #35140 and is looking like it will be
removed at some point by #34888. Today all it does is provide a logger. This
change removes the usages of AbstractComponent that live solely in the zen2
feature branch to avoid some future merge pain, and replaces it where necessary
with some directly-created loggers.
2018-11-13 15:20:49 +00:00
Yannick Welsch
fe29b18c26 Fix compilation 2018-11-12 11:05:11 +01:00
Yannick Welsch
4e6c58c942 Merge remote-tracking branch 'elastic/master' into zen2 2018-11-12 10:03:59 +01:00
Tim Brooks
ba478827ad
Improve MockTcpTransport memory usage (#35402)
The MockTcpTransport is not friendly in regards to memory usage. It must
allocate multiple byte arrays for every message. This improves the
memory situation by failing fast if the message is improperly formatted.
Additionally, it uses reusable big arrays for at least half of the
allocated byte arrays.
2018-11-09 10:12:49 -07:00
Jim Ferenczi
7054e289fa
Add trace log of the request for the query and fetch phases (#34479)
This change adds a logger for the query and fetch phases that prints all requests
before their execution at the trace level. This will help debugging cases where an issue
occurs during the execution since only completed queries are logged by the slow logs.
2018-11-09 09:41:51 +01:00
Tim Brooks
93c2c604e5
Move compression config to ConnectionProfile (#35357)
This is related to #34483. It introduces a namespaced setting for
compression that allows users to configure compression on a per remote
cluster basis. The transport.tcp.compress remains as a fallback
setting. If transport.tcp.compress is set to true, then all requests
and responses are compressed. If it is set to false, only requests to
clusters based on the cluster.remote.cluster_name.transport.compress
setting are compressed. However, after this change regardless of any
local settings, responses will be compressed if the request that is
received was compressed.
2018-11-08 10:37:59 -07:00
Yannick Welsch
c315ead0ac
Zen2: Add diff-based publishing (#35290)
Enables diff-based publishing, which is an optimization where only the changing parts of the cluster
state are published to the nodes in the cluster, falling back to full cluster state publishing if the
receiver does not have the previous cluster state.
2018-11-08 17:16:09 +01:00
David Turner
6885a7cb0f
Introduce transport API for cluster bootstrapping (#34961)
- Introduces a transport API for bootstrapping a Zen2 cluster
- Introduces a transport API for requesting the set of nodes that a
  master-eligible node has discovered and for waiting until this comprises the
  expected number of nodes.
- Alters ESIntegTestCase to use these APIs when forming a cluster, rather than
  injecting the initial configuration directly.
2018-11-08 16:09:37 +00:00
Zachary Tong
54b445d74b [Test] Remove obsolete job/cluster cleanup code
Also makes sure the awaitBusy for job stoppage is checked, so
that we can fail if we timed out waiting for a job to stop.

Closes #35295
2018-11-08 10:23:23 -05:00
David Turner
77789a733d Merge branch 'master' into 2018-11-08-merge-master 2018-11-08 13:38:18 +00:00
Simon Willnauer
0cc0fd2d15
Add a frozen engine implementation (#34357)
This change adds a `frozen` engine that allows lazily open a directory reader
on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader
that also allows to release and reset the underlying index readers after any and before
secondary search phases.

Relates to #34352
2018-11-07 20:23:35 +01:00
Alpar Torok
8a85b2eada
Remove build qualifier from server's Version (#35172)
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored  by the build and red from `META-INF`.
2018-11-07 14:01:05 +02:00
Tim Brooks
f395b1eace
Open node connections asynchronously (#35144)
This is related to #29023. Additionally at other points we have
discussed a preference for removing the need to unnecessarily block
threads for opening new node connections. This commit lays the groudwork
for this by opening connections asynchronously at the transport level.
We still block, however, this work will make it possible to eventually
remove all blocking on new connections out of the TransportService
and Transport.
2018-11-06 17:58:20 -07:00
David Turner
7e356ac29b
[Zen2] Introduce auto_shrink_voting_configuration setting (#35217)
Today we allow the user to set the minimum size of a voting configuration. On
reflection we would rather this was simply '3' where possible, and we can use
the retirement API to control the removal of nodes more explicitly.

This change replaces the old reconfigurator setting with a new one,
`cluster.auto_shrink_voting_configuration`, which determines whether
Elasticsearch should automatically remove nodes from the voting configuration
or not.
2018-11-06 18:10:29 +00:00
Nick Knize
a5e1f4d3a2 Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224) 2018-11-06 11:55:23 +01:00
David Turner
2fb3d1a465
[Zen2] Fix some rarely-failing tests (#35198)
Recent changes have left a few Zen2 tests occasionally failing. This commit
fixes them.
2018-11-05 21:54:53 +00:00
Boaz Leskes
28078642b3
Engine.newChangesSnapshot may cause unneeded refreshes if called concurrently (#35169)
When the engine is asked for historical operations, we check if some of the requested operations
are not yet refreshed and if so we refresh before returning the operations. The refresh check is
based on capturing the local checkpoint before each refresh and comparing that value to the one
requested when `newChangesSnapshot` was called. If the requested range is above the captured
local checkpoint we issue a refresh.

This can currently cause unneeded extra refreshes if the method is called concurrently which may cause unwanted degradation in indexing performance. This is especially relevant for CCR where we always ask for a range below the global checkpoint. That range is guaranteed to be below the local
checkpoint of the shard and one refresh is enough to serve multiple changes requests.

This commit fixes this by introducing a dedicated mutex to make sure the test for whether a refresh
is needed actually wait for concurrents for concurrent refreshes that were caused by another
change refresh. 

Note that this is not a big change in semantics as refreshes are serialized by lucene anyway. I also
opted not to keep the synchronization to the changes snapshot request only even if in theory we
can apply it to all refreshes, not matter where they come from.
2018-11-04 13:43:33 +01:00