Commit Graph

2016 Commits

Author SHA1 Message Date
Nhat Nguyen 887f3f2c83 Simplify initialization of max_seq_no of updates (#41161)
Today we choose to initialize max_seq_no_of_updates on primaries only so
we can deal with a situation where a primary is on an old node (before
6.5) which does not have MUS while replicas on new nodes (6.5+).
However, this strategy is quite complex and can lead to bugs (for
example #40249) since we have to assign a correct value (not too low) to
MSU in all possible situations (before recovering from translog,
restoring history on promotion, and handing off relocation).

Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes
in the cluster should have MSU. This change simplifies the
initialization of MSU by always assigning it a correct value in the
constructor of Engine regardless of whether it's a replica or primary.

Relates #33842
2019-04-30 15:14:52 -04:00
Tim Brooks df3ef66294
Remove dedicated SSL network write buffer (#41654)
This is related to #27260. Currently for the SSLDriver we allocate a
dedicated network write buffer and encrypt the data into that buffer one
buffer at a time. This requires constantly switching between encrypting
and flushing. This commit adds a dedicated outbound buffer for SSL
operations that will internally allocate new packet sized buffers as
they are need (for writing encrypted data). This allows us to totally
encrypt an operation before writing it to the network. Eventually it can
be hooked up to buffer recycling.

This commit also backports the following commit:

Handle WRAP ops during SSL read

It is possible that a WRAP operation can occur while decrypting
handshake data in TLS 1.3. The SSLDriver does not currently handle this
well as it does not have access to the outbound buffer during read call.
This commit moves the buffer into the Driver to fix this issue. Data
wrapped during a read call will be queued for writing after the read
call is complete.
2019-04-29 17:59:13 -06:00
Nhat Nguyen 615a0211f0 Recovery should not indefinitely retry on mapping error (#41099)
A stuck peer recovery in #40913 reveals that we indefinitely retry on
new cluster states if indexing translog operations hits a mapper
exception. We should not wait and retry if the mapping on the target is
as recent as the mapping that the primary used to index the replaying
operations.

Relates #40913
2019-04-27 10:55:08 -04:00
Armin Braun aad33121d8
Async Snapshot Repository Deletes (#40144) (#41571)
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
2019-04-26 15:36:09 +02:00
Tim Brooks 1f8ff052a1
Revert "Remove dedicated SSL network write buffer (#41283)"
This reverts commit f65a86c258.
2019-04-25 18:39:25 -06:00
Tim Brooks f65a86c258
Remove dedicated SSL network write buffer (#41283)
This is related to #27260. Currently for the SSLDriver we allocate a
dedicated network write buffer and encrypt the data into that buffer one
buffer at a time. This requires constantly switching between encrypting
and flushing. This commit adds a dedicated outbound buffer for SSL
operations that will internally allocate new packet sized buffers as
they are need (for writing encrypted data). This allows us to totally
encrypt an operation before writing it to the network. Eventually it can
be hooked up to buffer recycling.
2019-04-25 14:30:54 -06:00
Armin Braun 40aef2b8aa
Introduce Delegating ActionListener Wrappers (#40129) (#41527)
* Introduce Delegating ActionListener Wrappers
* Dry up use cases of ActionListener that simply pass through the response or exception to another listener
2019-04-25 16:05:04 +02:00
Ryan Ernst 7e3875d781 Upgrade hamcrest to 2.1 (#41464)
hamcrest has some improvements in newer versions, like FileMatchers
that make assertions regarding file exists cleaner. This commit upgrades
to the latest version of hamcrest so we can start using new and improved
matchers.
2019-04-24 23:40:03 -07:00
Martijn Laarman 85b9dc18a7 fix #35262 define deprecations of API's as a whole and urls (#39063)
* fix #35262 define deprecations of API's as a whole and urls

* document hot threads deprecated paths

* deprecate scroll_id as part of the URL, documented only as part of the body which is a safer behaviour as well

* use version numbers up to patch version

* rest spec parser picks up deprecated paths as paths too

(cherry picked from commit 7e06023e7603b7584bfd9ee4e8a1ccd82c208ce7)
2019-04-23 14:28:36 +02:00
Jason Tedor 4a288af85f
Avoid concurrent modification in mock log appender (#41424)
It can be the case that while we are setting up expectations that also a
log message is appended. For example, if we are setting up these
expectations after a cluster has formed and messages start being sent
around the cluster. In this case, we would hit a concurrent modification
exception while we are mutating the expectations, and also while the
expectations are being iterated over as a message is appended. This
commit avoids this by using a copy-on-write array list which is safe for
concurrent modification and iteration. Note that another possible
approach here is to use synchronized, but that seems unnecessary since
we don't appear to rely on messages that are sent while we are setting
up expectations. Rather, we are setting up some expectations and some
situation that we think will cause those expectations to be met. Using
copy-on-write array list here is nice since we avoid bottlenecking these
tests on synchronizing these methods.
2019-04-22 21:47:16 -04:00
Adrien Grand 9fd5237fd4
Clean up Node#close. (#39317) (#41301)
`Node#close` is pretty hard to rely on today:
 - it might swallow exceptions
 - it waits for 10 seconds for threads to terminate but doesn't signal anything
   if threads are still not terminated after 10 seconds

This commit makes `IOException`s propagated and splits `Node#close` into
`Node#close` and `Node#awaitClose` so that the decision what to do if a node
takes too long to close can be done on top of `Node#close`.

It also adds synchronization to lifecycle transitions to make them atomic. I
don't think it is a source of problems today, but it makes things easier to
reason about.
2019-04-17 16:10:53 +02:00
Armin Braun c4e84e2b34
Add Bulk Delete Api to BlobStore (#40322) (#41253)
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes #40250
2019-04-16 17:19:05 +02:00
Tim Brooks ad3b7abaa3
Deprecate old transport settings (#41229)
This is related to #36652. We intend to remove a number of old transport
settings in 8.0. This commit deprecates those settings for 7.x.
2019-04-15 21:43:09 -06:00
Nhat Nguyen e9999dfa1d
Init global checkpoint after copy commit in peer recovery (#40823)
Today a new replica of a closed index does not have a safe commit
invariant when its engine is opened because we won't initialize the
global checkpoint on a recovering replica until the finalize step. With
this change, we can achieve that property by creating a new translog
with the global checkpoint from the primary at the end of phase 1.
2019-04-11 22:18:31 -04:00
David Turner b522de975d Move primary term from replicas proxy to repl op (#41119)
A small refactoring that removes the primaryTerm field from ReplicasProxy and
instead passes it directly in to the methods that need it. Relates #40706.
2019-04-11 21:19:27 +01:00
Armin Braun 233df6b73b
Make Transport Shard Bulk Action Async (#39793) (#41112)
This is a dependency of #39504

Motivation:
By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update.
With this change, the mapping update will trigger a new task in the `write` queue instead.
This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines.

The logical change is effectively all in `TransportShardBulkAction`, the rest of the changes is then simply mechanically moving the caller code and tests to being async and passing the `ActionListener` down.

Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and dry up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.
2019-04-11 16:01:52 +02:00
Mark Vieira 1287c7d91f
[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) (#40993)
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)

This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.

(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)

* Fix forking JVM runner

* Don't bump shadow plugin version
2019-04-09 11:52:50 -07:00
David Turner 2ff19bc1b7
Use Writeable for TransportReplAction derivatives (#40905)
Relates #34389, backport of #40894.
2019-04-05 19:10:10 +01:00
Martijn van Groningen 809a5f13a4
Make -try xlint warning disabled by default. (#40833)
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.

This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).

Relates to #40366
2019-04-05 08:02:26 +02:00
Jason Tedor f377155f10
Use default memory lock setting in testing (#40730)
Today we are running our internal tests with bootstrap.memory_lock
enabled. This is not out default setting, and not the recommended
value. This commit switches to use the default value, which is to not
enable bootstrap.memory_lock.
2019-04-02 17:56:32 -04:00
Alpar Torok 293297ae3d Fix repository-hdfs when no docker and unnecesary fixture
The hdfs-fixture is actually executed in plugin/repository-hdfs as a
dependency. The fixture is not needed and actually causes a failure
because we have two copies now and both use the same ports.
2019-03-29 16:55:12 +02:00
Alpar Torok 2b91fb1cc0 Avoid building hdfs-fixure use an image that works instead
Avoid the additional requirement for the debian package repos to be up,
and depend on dockerhub only instead.
2019-03-29 16:55:11 +02:00
Alpar Torok e8c0b53796 Add ability to mute and mute flaky fixture (#40630) 2019-03-29 12:10:04 +02:00
Alpar Torok d791e08932 Test fixtures krb5 (#40297)
Replaces the vagrant based kerberos fixtures with docker based test fixtures plugin.
The configuration is now entirely static on the docker side and no longer driven by Gradle,
also two different services are being configured since there are two different consumers of the fixture that can run in parallel and require different configurations.
2019-03-28 17:26:58 +02:00
Yannick Welsch fea91c6113 Mute testTracerLog
Relates to #40586
2019-03-28 14:05:54 +01:00
David Turner 5a2ba34174
Get node ID from nodes info in REST tests (#40052) (#40532)
We discussed recently that the cluster state API should be considered
"internal" and therefore our usual cast-iron stability guarantees do not hold
for this API.

However, there are a good number of REST tests that try to identify the master
node. Today they call `GET /_cluster/state` API and extract the master node ID
from the response. In fact many of these tests just want an arbitary node ID
(or perhaps a data node ID) so an alternative is to call `GET _nodes` or `GET
_nodes/data:true` and obtain a node ID from the keys of the `nodes` map in the
response.

This change adds the ability for YAML-based REST tests to extract an arbitrary
key from a map so that they can obtain a node ID from the nodes info API
instead of using the master node ID from the cluster state API.

Relates #40047.
2019-03-27 23:08:10 +00:00
Tim Brooks ab44f5fd5d
Add InboundHandler for inbound message handling (#40430)
This commit adds an InboundHandler to handle inbound message processing.
With this commit, this code is moved out of the TcpTransport.
Additionally, finer grained unit tests are added to ensure that the
inbound processing works as expected
2019-03-27 12:33:26 -06:00
Yannick Welsch 64b31f44af No mapper service and index caches for replicated closed indices (#40423)
Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with
full indexing and search capabilities allocated. We can save on a lot of heap memory for those
indices by not allocating a mapper service and caching infrastructure (which preallocates a constant
amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed
metricbeat indices (each index with one shard). After this change, the same instance can host 7300
replicated closed metricbeat instances (not that this would be a recommended configuration). Most
of the remaining memory is in the cluster state and the IndexSettings object.
2019-03-27 19:04:24 +01:00
Yannick Welsch 8f7c5732f1 Use default discovery implementation for single-node discovery (#40036)
Switches "discovery.type: single-node" from using a separate implementation for single-node discovery to using the existing standard discovery implementation, with two small adaptions:

-  auto-bootstrapping, but requiring initial_master_nodes not to be set.
- not actively pinging other nodes using the Peerfinder
- not allowing other nodes to join its single-node cluster (if they have e.g. been set up using regular discovery and connect to the single-disco node).
2019-03-27 19:04:24 +01:00
Tim Brooks 3860ddd1a4
Move outbound message handling to OutboundHandler (#40336)
Currently there are some components of message serializer and sending
that still occur in TcpTransport. This commit makes it possible to
send a message without the TcpTransport by moving all of the remaining
application logic to the OutboundHandler. Additionally, it adds unit
tests to ensure that this logic works as expected.
2019-03-27 11:47:36 -06:00
Tim Brooks 760cfffe4b
Move TransportMessageListener to TransportService (#40474)
Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propogate the events to the external listeners.

Additionally this commit back ports #40237

Remove Tracer from MockTransportService

Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propogate the events to the external listeners.
2019-03-27 09:24:20 -06:00
Henning Andersen bf444b9f02 Store Pending Deletions Fix (#40345)
FilterDirectory.getPendingDeletions does not delegate, fixed
temporarily by overriding in StoreDirectory.

This in turn caused duplicate file name use after a trimUnsafeCommits
had been done, since a new IndexWriter would not consider the pending
deletes in IndexFileDeleter. This should only happen on windows (AFAIK).

Reenabled doing index updates for all tests using
IndexShardTests.indexOnReplicaWithGaps (which could fail due to above
when using mocked WindowsFS).

Added getPendingDeletions delegation to all elasticsearch
FilterDirectory subclasses that were not trivial test-only overrides to
minimize the risk of hitting this issue in another case.
2019-03-26 15:30:44 +01:00
Armin Braun cafb83297c
Use Correct Enum in Wipe Snapshots Test Method (#40422) (#40438)
* Mistake was made in #39662
* The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
2019-03-26 09:04:15 +01:00
Nhat Nguyen efaf95628b Use separate translog dir in testDeleteWithFatalError
This test currently opens a new engine but shares the same translog
directory of the previous opening engine.
2019-03-20 10:22:27 -04:00
Mayya Sharipova 49a7c6e0e8
Expose proximity boosting (#39385) (#40251)
Expose DistanceFeatureQuery for geo, date and date_nanos types

Closes #33382
2019-03-20 09:24:41 -04:00
Henning Andersen 4c2a8638ca Cascading primary failure lead to MSU too low (#40249)
If a replica were first reset due to one primary failover and then
promoted (before resync completes), its MSU would not include changes
since global checkpoint, leading to errors during translog replay.

Fixed by re-initializing MSU before restoring local history.
2019-03-20 14:00:43 +01:00
Nhat Nguyen a13b4bc8c5 Always fail engine if delete operation fails (#40117)
Unlike index operations which can fail at the document level to
analyzing errors, delete operations should never fail at the document
level whether soft-deletes is enabled or not. With this change, we will
always fail the engine if we fail to apply a delete operation to Lucene.

Closes #33256
2019-03-19 13:09:23 -04:00
Nhat Nguyen 38e9522218 Remove wait for cluster state step in peer recovery (#40004)
We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped
using it since #25692 (6.0). This change removes that action and related
code in 7.x and 8.0.

Relates #19287
Relates #25692
2019-03-18 15:17:21 -04:00
Nhat Nguyen 9ba0bdf528 Dump cluster state if ensureGreen timed out in QA tests (#40133)
When the method ensureGreen in QA tests is timed out, it does not
provide enough info for us to investigate why the testing index is
not green yet. With this change, we will dump the cluster state if
ensureGreen timed out.

Relates #32027
2019-03-18 15:17:21 -04:00
Nhat Nguyen d720a64b9e Ensure sendBatch not called recursively (#39988)
This PR introduces AsyncRecoveryTarget which executes remote calls of
peer recovery asynchronously. In this change, we also add a new
assertion to ensure that method sendBatch, which sends a batch of
history operations in phase2, is never called recursively on the same
thread. This new assertion will also be used in method sendFileChunks.
2019-03-18 15:17:21 -04:00
Andrey Ershov 42602478b8 Unmute, fix, refactor and zen2ify NetworkDisruptionIT (#38351)
This commit unmutes NetworkDisruptionIT.

It makes changes necessary for Zen2 - avoids usage of
autoMinMasterNodes and selects cluster size, such that there is no
need to call AddVotingExclusion.
This test also introduces refactors a single method
prepareDistruptedCluster to be used by both test methods.
Unfortunately, NetworkDisruption is broken and the
testNetworkPartitionRemovalRestoresConnections "is fixed" by
introducing assertBusy - #38348.

Relates #36205
Relates #38348

(cherry picked from commit 97707c7f892636e5b75c3df546b067414acb27cd)
2019-03-18 16:39:43 +01:00
Jim Ferenczi eb540125ea
Fix IndexSearcherWrapper visibility (#39071) (#40145)
This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by
sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin.
This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed.

Closes #30758
2019-03-18 11:33:54 +01:00
Tim Brooks 0b50a670a4
Remove transport name from tcp channel (#40074)
Currently, we maintain a transport name ("mock-nio", "nio", "netty")
that is passed to a `TcpTransportChannel` when a request is received.
The value of this name is to associate with the task when we register a
task with the task manager. However, it is only possible to run ES with
one transport, so having an implementation specific name is unnecessary.
This commit removes the name and replaces it with the generic
"transport".
2019-03-15 12:04:13 -06:00
David Turner a323132503 Create retention leases file during recovery (#39359)
Today we load the shard history retention leases from disk whenever opening the
engine, and treat a missing file as an empty set of leases. However in some
cases this is inappropriate: we might be restoring from a snapshot (if the
target index already exists then there may be leases on disk) or
force-allocating a stale primary, and in neither case does it make sense to
restore the retention leases from disk.

With this change we write an empty retention leases file during recovery,
except for the following cases:

- During peer recovery the on-disk leases may be accurate and could be needed
  if the recovery target is made into a primary.

- During recovery from an existing store, as long as we are not
  force-allocating a stale primary.

Relates #37165
2019-03-15 07:49:49 +00:00
Jason Tedor 9181668edf
Stop returning cluster state size by default (#40016)
Computing the compressed size of the cluster state on every invocation
of cluster:monitor/state action is expensive, and the value of this
field is dubious anyway. Therefore we want to remove computing this
field. As a first step, we stop computing and return this field by
default. To avoid breaking users, we will give them a system property to
use to tide them over until the next major release when we will actually
remove this field. This comes with a deprecation warning too, and the
backport to the appropriate minor will also include a note in the
migration guide. There will be a follow-up to remove this field in the
next major version.
2019-03-14 08:57:55 -04:00
David Turner 049970af3e Only connect to new nodes on new cluster state (#39629)
Today, when applying new cluster state we attempt to connect to all of its
nodes as a blocking part of the application process. This is the right thing to
do with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness reasonably gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves #29025.
2019-03-12 19:26:13 +00:00
Tim Brooks 5612ed97ca
Add log warnings for long running event handling (#39729)
Recently we have had a number of test issues related to blocking
activity occuring on the io thread. This commit adds a log warning for
when handling event takes a >150 milliseconds. This is implemented
for the MockNioTransport which is the transport used in
ESIntegTestCase.
2019-03-08 13:07:24 -07:00
Alpar Torok 0f89427eb6
Back port build changes from same version bwc tests (#39744)
* Back port build changes from #39102

This back-ports how versions are determined and bwc test are set up from
 #39102 without enabling the bwc from current version tests so it's
 easier/possible to backmerge future buld changes.
It's expected that the tets are lacking many of the required fixes in
this version to enable them.
2019-03-07 17:25:09 +02:00
Alpar Torok 34ea84948c
Fix bwc tests failure to extract (#39619)
* Set correct packaging for older versions

Continue using zip packages for pre 7
Some other bwc fixes that hid behind this one.
Closes #39441  #39751
2019-03-07 09:07:11 +02:00
Armin Braun bb2f8485f1
Wipe Snapshots Before Indices in RestTests (#39662) (#39763)
* Wipe Snapshots Before Indices in RestTests

* If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe
   * Fixed by first deleting/aborting snapshots
2019-03-06 21:48:17 +01:00