Commit Graph

3581 Commits

Author SHA1 Message Date
Luca Cavanna 8cc3c0dd93 Remove task null check in TransportAction (#45014)
The task that TaskManager#register returns cannot be null. The method
enforces that it is not null after calling request#createTask. It is
then needless to check for null in the listener later. Also, added the
call to the delegate listener in a finally block, just to make sure.
2019-07-31 17:16:41 +02:00
Christoph Büscher e85b53a955
Remove left-over AwaitsFix in DedicatedClusterSnapshotRestoreIT (#45042)
The issue mentioned (#38845) seems to have been closed with #38891 so the test
can be re-activated.
2019-07-31 17:15:41 +02:00
Armin Braun c7d7230524
Stop Recreating Wrapped Handlers in RestController (#44964) (#45040)
* We shouldn't be recreating wrapped REST handlers over and over for every request. We only use this hook in x-pack and the wrapper there does not have any per request state.
  This is inefficient and could lead to some very unexpected memory behavior
   => I made the logic create the wrapper on handler registration and adjusted the x-pack wrapper implementation to correctly forward the circuit breaker and content stream flags
2019-07-31 17:11:34 +02:00
Zachary Tong c25f3dd5d0
Introduce 7.3.1 version (#45046) 2019-07-31 10:53:55 -04:00
Andrey Ershov c27ac3d24c Unmute testClusterJoinDespiteOfPublishingIssues and testElectMasterWithLatestVersion (#38555)
See my comments for #37539 and #37685

(cherry picked from commit 038d4ab2940340eca942e32b54044f183b7804d9)
2019-07-31 14:55:02 +02:00
David Roberts 5e3010a606 Use system context for looking up connected nodes (#43991)
When finding nodes in a connected cluster for cross cluster
search the requests to get cluster state on the connected
cluster should be made in the system context because
logically they are equivalent to checking a single detail
in the local cluster state and should not require that the
user who made the request that is using this method in its
implementation is authorized to view the entire cluster
state.

Fixes #43974
2019-07-31 09:09:56 +01:00
Igor Motov 1a1bb4707d Geo: move indexShape to AbstractGeometryFieldMapper.Indexer (#44979)
Move indexShape functionality into AbstractGeometryFieldMapper to make
it more unit testable.

Relates to #43644
2019-07-30 14:50:23 -04:00
Mayya Sharipova a154b73d99 Assure index ops are successful for SimpleNestedIT (#44815)
relates to #44486
2019-07-30 14:24:28 -04:00
Nhat Nguyen 979d0a71c7 Remove leniency during replay translog in peer recovery (#44989)
This change removes leniency in InternalEngine during replaying translog
in peer recovery.
2019-07-30 13:25:15 -04:00
Jake Landis 41a99c9e4a introduce 7.2.2 as a version (#44371)
* introduce 7.2.2 as a version
2019-07-30 18:52:34 +02:00
Jake Landis 03fea1c503 introduce 6.8.3 as a version (#44708) 2019-07-30 18:48:41 +02:00
David Kyle 78aa6143a6 Mute FilteringAllocationIT testTransientSettingsStillApplied
Relates to https://github.com/elastic/elasticsearch/issues/45003
2019-07-30 14:10:50 +01:00
Yannick Welsch c1b569ed4b Revert "Mute Zen1IT#testMixedClusterDisruption"
This reverts commit cf78ca58e3.
2019-07-30 13:10:14 +02:00
David Turner 55f1dd8da6 Close nodes properly in Coordinator tests (#44967)
Today closing a `ClusterNode` in an `AbstractCoordinatorTestCase` uses
`onNode()` so has no effect if the node is not in the current list of nodes.
It also discards the `Runnable` it creates without having run it, so has no
effect anyway.

This commit makes these tests much stricter about properly closing the nodes
started during `Coordinator` tests, by tracking the persisted states that are
opened, and adds an assertion to catch the trappy requirement that the closing
node still belongs to the cluster.
2019-07-30 11:47:36 +01:00
David Kyle cf78ca58e3 Mute Zen1IT#testMixedClusterDisruption 2019-07-30 11:33:39 +01:00
Jim Ferenczi 43bd8f2ba0 Fix aggregators early termination with breadth-first mode (#44963)
This commit fixes a bug when a deferred aggregator tries to early terminate the collection. In such case the CollectionTerminatedException is not caught and
the search fails on the shard. This change makes sure that we catch the exception in order to continue the deferred collection on the next leaf.

Fixes #44909
2019-07-30 11:26:40 +02:00
Andrey Ershov 5a0bd696fc
Snapshot tool S3 cleanup 7.x backport (#44575)
Backport of #44551
2019-07-30 11:02:08 +02:00
Nhat Nguyen 4813728783 Remove leniency in reset engine from translog (#44711)
Replaying operations from the local translog must never fail as those
operations were processed successfully on the primary before and the
mapping is up to update already. This change removes leniency during
resetting engine from translog in IndexShard and InternalEngine.
2019-07-29 16:31:45 -04:00
Jack Conradson 1a21682ed0 Fix JodaCompatibleZonedDateTime casts in Painless (#44874)
This is a temporary fix during the Joda to Java datetime transition. This will 
implicitly cast a JodaCompatibleZonedDateTime to a ZonedDateTime for 
both def and static types. This is necessary to insulate users from needing 
to know about JodaCompatibleZonedDateTime explicitly.
2019-07-29 12:05:26 -07:00
Igor Motov b6cef227a5 Geo: fix geo query decomposition (#44924)
The recent refactoring introduced an issue where queries where not
going through the decomposition processing.

Fixes #44891
2019-07-29 11:48:24 -04:00
Luca Cavanna a3cc32da64 TaskListener#onFailure to accept Exception instead of Throwable (#44946)
TaskListener accepts today Throwable in its onFailure method. Though
looking at where it is called (TransportAction), it can never be
notified of a Throwable.

This commit changes the signature of TaskListener#onFailure so that it
accepts an `Exception` rather than a `Throwable` as second argument.
2019-07-29 16:47:19 +02:00
Michał Perlak 245c9b7914 Optimize Min and Max BKD optimizations (#44315)
MinAggregator - skip BKD optimization when no result found after 1024 lookups.
MaxAggregator - skip unnecessary conversions.
2019-07-29 10:04:39 -04:00
Yannick Welsch 24873dd3e3 Do not block transport thread on startup (#44939)
We currently block the transport thread on startup, which has caused test failures. I think this is
some kind of deadlock situation. I don't think we should even block a transport thread, and
there's also no need to do so. We can just reject requests as long we're not fully set up. Note
that the HTTP layer is only started much later (after we've completed full start up of the
transport layer), so that one should be completely unaffected by this.

Closes #41745
2019-07-29 11:35:17 +02:00
Armin Braun f5efafd4d6
Cleanup Deadcode o.e.indices (#44931) (#44938)
* none of this is used anywhere
2019-07-29 10:38:35 +02:00
Igor Motov cfc8d17bb4 Geo: refactor geo mapper and query builder (#44884)
Refactors out the indexing and query generation logic out of the
mapper and query builder into a separate unit-testable classes.
2019-07-26 16:48:31 -04:00
Yannick Welsch 1561ab5420 Guard open connection call in RemoteClusterConnection (#44921)
Fixes an issue where a call to openConnection was not properly guarded, allowing an exception
to bubble up to the uncaught exception handler, causing test failures.

Closes #44912
2019-07-26 22:27:45 +02:00
Tanguy Leroux e1b626b947 Ensure index is green in SimpleClusterStateIT.testIndicesOptions() (#44893)
SimpleClusterStateIT testIndicesOptions failed in #44817 because it tries to close 
an index at the beginning of the test. With random index settings, it is possible that 
the index has a high number of shards (10) and replicas (1), which means that on 
CI this index can take time to be fully allocated.

The close index request can fail in the case where replicas are still recovering operations. 
Thiscommit adds a simple ensureGreen() at the beginning of the test to be sure that all 
replicas are started before trying to close the index.

closes #44817
2019-07-26 17:07:53 +02:00
Armin Braun 1340ff19bc
Fix Test Failure in ScalingThreadPoolTests (#44898) (#44901)
* Due to #44894 some constellations log a deprecation warning here now
* Fixed by checking for that
2019-07-26 17:05:50 +02:00
Tanguy Leroux 8848fcfb22 Ensure cluster is stable in ShrinkIndexIT.testShrinkThenSplitWithFailedNode (#44860)
The test ShrinkIndexIT.testShrinkThenSplitWithFailedNode sometimes fails 
because the resize operation is not acknowledged (see #44736). This resize 
operation creates a new index "splitagain" and it results in a cluster state 
update (TransportResizeAction uses MetaDataCreateIndexService.createIndex() 
to create the resized index). This cluster state update is expected to be 
acknowledged by all nodes (see IndexCreationTask.onAllNodesAcked()) but 
this is not always true: the data node that was just stopped in the test before 
executing the resize operation might still be considered as a "faulty" node
 (and not yet removed from the cluster nodes) by the FollowersChecker. The 
cluster state is then acked on all nodes but one, and it results in a non 
acknowledged resize operation.

This commit adds an ensureStableCluster() check after stopping the node in 
the test. The goal is to ensure that the data node has been correctly removed 
from the cluster and that all nodes are fully connected to each before moving 
forward with the resize operation.

Closes #44736
2019-07-26 10:14:27 +02:00
Jason Tedor 6ea2b5dec0
Deprecate setting processors to more than available (#44889)
Today the processors setting is permitted to be set to more than the
number of processors available to the JVM. The processors setting
directly sizes the number of threads in the various thread pools, with
most of these sizes being a linear function in the number of
processors. It doesn't make any sense to set processors very high as the
overhead from context switching amongst all the threads will overwhelm,
and changing the setting does not control how many physical CPU
resources there are on which to schedule the additional threads. We have
to draw a line somewhere and this commit deprecates setting processors
to more than the number of available processors. This is the right place
to draw the line given the linear growth as a function of processors in
most of the thread pools, and that some are capped at the number of
available processors already.
2019-07-26 17:06:44 +09:00
Ignacio Vera 821f6f893b
Upgrade to Lucene 8.2.0 release (#44859) (#44892) 2019-07-26 08:14:59 +02:00
Nhat Nguyen d128188c28 Return seq_no and primary_term in noop update (#44603)
With this change, we will return primary_term and seq_no of the current
document if an update is detected as a noop. We already return the
version; hence we should also return seq_no and primary_term.

Relates #42497
2019-07-25 19:16:56 -04:00
Yannick Welsch bd8470e738 Asynchronously connect to remote clusters (#44825)
Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters.

Relates to #40150
2019-07-25 22:59:59 +02:00
Yannick Welsch 0ce841915c Add Clone Index API (#44267)
Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the
difference that the number of primary shards is kept the same. In case where the filesystem
provides hard-linking capabilities, this is a very cheap operation.

Indexing cloning can be done by running `POST my_source_index/_clone/my_target_index` and it
supports the same options as the split and shrink APIs.

Closes #44128
2019-07-25 22:02:28 +02:00
Ryan Ernst 03dd22b56c Add missing ZonedDateTime methods for joda compat layer (#44829)
While joda no longer exists in the apis for 7.x, the compatibility layer
still exists with helper methods mimicking the behavior of joda for
ZonedDateTime objects returned for date fields in scripts. This layer
was originally intended to be removed in 7.0, but is now likely to exist
for the lifetime of 7.x.

This commit adds missing methods from ChronoZonedDateTime to the compat
class. These methods were not part of joda, but are needed to act like a
real ZonedDateTime.

relates #44411
2019-07-25 11:45:57 -07:00
Julie Tibshirani acb7f599a3 Fix an NPE when requesting inner hits and _source is disabled. (#44836)
This PR makes two changes to FetchSourceSubPhase when _source is disabled and
we're in a nested context:
* If no source filters are provided, return early to avoid an NPE.
* If there are source filters, make sure to throw an exception.

The behavior was chosen to match what currently happens in a non-nested context.
2019-07-25 10:38:00 -07:00
Nicholas Knize 48757da6e1 [GEO] Fix GeoShapeQueryBuilder to check for valid spatial relations
Refactor left out the spatial strategy check in GeoShapeQueryBuilder.relation
setter method. This commit adds that check back in.
2019-07-25 11:32:13 -05:00
Nick Knize 133f848e9f [Geo] Refactor GeoShapeQueryBuilder to derive from AbstractGeometryQueryBuilder (#44780)
Refactors GeoShapeQueryBuilder to derive from a new AbstractGeometryQueryBuilder that provides common parsing and build logic for spatial geometries. This will allow development of custom geometry queries by extending AbstractGeometryQueryBuilder preventing duplication of common spatial query logic.
2019-07-25 11:32:13 -05:00
Armin Braun 383d7b7713
Cleanup Dead Code in Index Creation (#44784) (#44822)
* Cleanup Dead Code in Index Creation
* This is all unused and the state of a create request is always `OPEN`
2019-07-25 10:50:04 +02:00
Yannick Welsch e0d4544ef6 Close connection manager on current thread in RemoteClusterConnection (#44805)
The problem is that RemoteClusterConnection closes the connection manager asynchronously, which races with the threadpool being shutdown at the end of the test.

Closes #44339
Closes #44610
2019-07-25 09:34:41 +02:00
Igor Motov f9943a3e53 Geo: deprecate ShapeBuilder in QueryBuilders (#44715)
Removes unnecessary now timeline decompositions from shape builders
and deprecates ShapeBuilders in QueryBuilder in favor of libs/geo
shapes.

Relates to #40908
2019-07-24 14:27:58 -04:00
David Turner 4cfd2fc6b2 Fix testFirstListElementsToCommaDelimitedStringReportsFirstElementsIfLong (#44785)
This test can fail (super-rarely) if it generates a list of length 11
containing a duplicate, because the `.distinct()` reduces the list length to 10
and then it is not abbreviated any more. This change generalises the test to
cover lists of any random length.
2019-07-24 16:10:41 +01:00
Tanguy Leroux a8905ef142
[7.x] Add CloseIndexResponse to HLRC (#44349) (#44788)
The CloseIndexResponse was improved in #39687; this commit
exposes it in the HLRC.

Backport of #44349 to 7.x.
2019-07-24 15:51:01 +02:00
Dimitris Athanasiou 5453188cef [TEST] Mute SharedClusterSnapshotRestoreIT.testParallelRestoreOperationsFromSingleSnapshot
This was supposed to be muted in #44675 and its backports but that PR accidentally muted
another test.

Relates #44671
2019-07-24 14:28:09 +03:00
Armin Braun 4a3218551c
Fix ConnectionManagerTests (#44769) (#44789)
* In both fake connection validators we were potentially executing the listener twice. This lead to the situation that the locking via `connectionLock` that ensures that each listener is only executed once ever
would fail and the lister would run twice (in which case the listeners for that node are already `null` and we get an NPE)
* The fact that two different tests fail is due to the fact that we weren't safely shutting down the threadpool which meant the the task that trips the assertion (on the generic pool) would leak into the next test and fail it
* Closes #44758
2019-07-24 13:12:57 +02:00
Jason Tedor 4c77d5e2c7
Remove stale permissions from untrusted policy (#44783)
We have some old permissions lying around, granted to untrusted code
from the days of yore when we supported Groovy and Javascript
scripting. This commit removes these stale permissions.
2019-07-24 15:59:16 +09:00
Jason Tedor 659ebf6cfb
Notify systemd when Elasticsearch is ready (#44673)
Today our systemd service defaults to a service type of simple. This
means that systemd assumes Elasticsearch is ready as soon as the
ExecStart (bin/elasticsearch) process is forked off. This means that the
service appears ready long before it actually is, so before it is ready
to receive requests. It also means that services that want to depend on
Elasticsearch being ready to start can not as there is not a reliable
mechanism to determine this. This commit changes the service type to
notify. This requires that Elasticsearch sends a notification message
via libsystemd sd_notify method. This commit does that by using JNA to
invoke this native method. Additionally, we use this integration to also
notify systemd when we are stopping.
2019-07-24 14:04:36 +09:00
Armin Braun 818103ff1e
Fix testRetentionLeasesClearedOnRestore (#44754) (#44766)
* Fix this test randomly failing when running into async translog persistence edge case and failing to successfully close index
* Also, slightly improve debug logging on close failure
* Closes #44681
2019-07-23 21:29:07 +02:00
Igor Motov 9338fc8536 GEO: Switch to using GeoTestUtil to generate random geo shapes (#44635)
Switches to more robust way of generating random test geometries by
reusing lucene's GeoTestUtil. Removes duplicate random geometry
generators by moving them to the test framework.

Closes #37278
2019-07-23 14:30:41 -04:00
Armin Braun e5bd3ad0e9
Remove some Dead Code in o.e.transport (#44653) (#44734)
* None of this is used
2019-07-23 10:52:37 +02:00
David Turner ee23968f05 Ignore unknown fields if overriding node metadata (#44689)
The `elasticsearch-node override-version` command fails if it cannot read the
existing node metadata file. However, it reads this file strictly and fails if
there are any unknown fields, which means it will not be useful if we add
another field in future.

This commit adds leniency to this command, allowing it to ignore any unknown
fields and proceed with the downgrade. A downgrade is already unsafe, and the
user is already copiously warned about this, so being lenient in this case does
not make things much worse.
2019-07-23 08:54:58 +01:00
Jason Tedor 6928a315c4
Check shard limit after applying index templates (#44619)
Today when creating an index and checking cluster shard limits, we check
the number of shards before applying index templates. At this point, we
do not know the actual number of shards that will be used to create the
index. In a case when the defaults are used and a template would
override, we could be grossly underestimating the number of shards that
would be created, and thus incorrectly applying the limits. This commit
addresses this by checking the shard limits after applying index
templates.
2019-07-23 16:50:42 +09:00
Ignacio Vera 05ec970723
Support BucketScript paths of type string and array. (#44694) (#44731) 2019-07-23 09:05:47 +02:00
Ioannis Kakavas 3714cb63da Allow parsing the value of java.version sysprop (#44017)
We often start testing with early access versions of new Java
versions and this have caused minor issues in our tests
(i.e. #43141) because the version string that the JVM reports
cannot be parsed as it ends with the string -ea.

This commit changes how we parse and compare Java versions to
allow correct parsing and comparison of the output of java.version
system property that might include an additional alphanumeric
part after the version numbers
 (see [JEP 223[(https://openjdk.java.net/jeps/223)). In short it 
handles a version number part, like before, but additionally a 
PRE part that matches ([a-zA-Z0-9]+).

It also changes a number of tests that would attempt to parse
java.specification.version in order to get the full version
of Java. java.specification.version only contains the major
version and is thus inappropriate when trying to compare against
a version that might contain a minor, patch or an early access
part. We know parse java.version that can be consistently
parsed.

Resolves #43141
2019-07-22 20:14:56 +03:00
Tanguy Leroux bcb3563dcf Remove AllocationService.reroute(ClusterState, String, boolean) (#44629)
This commit removes the method AllocationService.reroute(ClusterState, String, boolean) 
in favor of AllocationService.reroute(ClusterState, String).

Motivations are:
    there are already 3 other reroute methods in this class
    this method is always called with the debug parameter set to false
    almost all tests use the method reroute(ClusterState, String)
2019-07-22 17:12:21 +02:00
Evgenia Badiyanova 5273a548a4 Unmute PendingTasksBlocksIT tests 2019-07-22 10:59:21 -04:00
Armin Braun 6ceae5d586
Document Type of Collections Returned by StreamInput (#44686) (#44688)
* As a result of #44665 the collections returned by the deserialization methods on `StreamInput` may be either mutable or immutable now,
this PR adds documentation for that fact
2019-07-22 16:06:34 +02:00
Evgenia Badiyanova 8ee4c4d5ba Mute some tests in PendingTasksBlocksIT
Tracked in #44695.
2019-07-22 09:55:07 -04:00
David Turner dcb3b2c18a Fix testPendingTasksWithClusterNotRecoveredBlock
In 7.x we cannot start a new master-eligible node before the cluster has formed
since we first try and update minimum_master_nodes and this is blocked. This
commit changes the test to start a data-only node so that no such adjustment is
necessary.

Relates #44685
2019-07-22 14:42:20 +01:00
Mayya Sharipova 972a49312c Fix testQuotedQueryStringWithBoost test (#43385)
Add more logging to indexRandom

Seems that asynchronous indexing from indexRandom sometimes indexes
the same document twice, which will mess up the expected score calculations.

For example, indexing:
{ "index" : {"_id" : "1" } }
{"important" :"phrase match", "less_important": "nothing important"}
{ "index" : {"_id" : "2" } }
{"important" :"nothing important", "less_important" :"phrase match"}
Produces the expected scores: 13.8 for doc1, and 1.38 for doc2

indexing:
{ "index" : {"_id" : "1" } }
{"important" :"phrase match", "less_important": "nothing important"}
{ "index" : {"_id" : "2" } }
{"important" :"nothing important", "less_important" :"phrase match"}
{ "index" : {"_id" : "3" } }
{"important" :"phrase match", "less_important": "nothing important"}
Produces scores: 9.4 for doc1, and 1.96 for doc2 which are found in the
error logs.

Relates to #43144
2019-07-22 08:44:31 -04:00
Przemyslaw Gomulka a154f49b94
Fix stats in slow logs to be a escaped JSON backport(#44642) #44687
Fields in JSON logs should be an escaped JSON fields. It is a broken json value at the moment
"stats": "["group1", "group2"]", -> "stats": "[\"group1\", \"group2\"]",
This should later be refactored into a JSON array of strings (the same as types in 7.x)
2019-07-22 14:28:39 +02:00
David Turner 0ce3114779 Allow pending tasks before state recovery (#44685)
Today we block access to the pending tasks API before the cluster has recovered
its state. There's no real need to do so, and the master does meaningful work
even before performing state recovery so it might sometimes be useful to allow
access to this API. This commit changes this API to ignore all cluster blocks.

Fixes #44652
2019-07-22 13:15:10 +01:00
Przemyslaw Gomulka 09e9c4cb59
Fix types field in JSON Search Slow Logs (#44641)
The field has to be defined in log4j2.properties and should be an
escaped JSON for now (it is a broken JSON at the moment). This should later be refactored into a JSON array
of strings.
2019-07-22 12:02:20 +02:00
Przemyslaw Gomulka fe20e217a4
Deprecation messages with the same key but different x-opaque-id are allowed backport(#44587) #44682
Deprecation logger was filtering log entries by key, that means that if two log messages with the same key are logged from different users, then the second log messages will be filtered.
This change allows to log deprecation message with the same key by different users.

relates #41354
backport #44587
2019-07-22 11:38:11 +02:00
Armin Braun a6adcecd20 Fix Tring to Mutate Immutable Collections
Fixes two spots where #44665 caused a previously mutable collection to now be read as an immutable one, leading to errors
2019-07-22 11:04:05 +02:00
Armin Braun b9067ba1ba
Remove Needless Synchronization in FollowersChecker (#44631) (#44680)
* It seems redundant to synchronize here and check that the map hasn't checked via the `isRunning` under the mutex
* The map won't change if under the mutex that locks on all the updates to it
* Without the mutex it's very unlikely to change inside the method call relative to the likelihood of changing until the generic pool where we check for `isRunning` again anyway

-> just remove the synchronization (it's on the IO loop) and check since we do check the running state on the generic pool under the mutex anyway when we actually fail it
2019-07-22 10:57:30 +02:00
Jason Tedor ff76b0af8b
Copy field names in stored fields context
We have to copy the field names otherwise we either have a handle of a
list that a caller might mutate or we might mutate when they aren't
expecting it, or worse, a handle of a list that is not mutable (and we
end up mutating the list).

Relates #44665
2019-07-22 17:40:07 +09:00
Alpar Torok b34ac66d96
Mute multiple tests on Windows (7.x) (#44676)
* Mute failing test

tracked in #44552

* mute EvilSecurityTests

tracking in #44558

* Fix line endings in ESJsonLayoutTests

* Mute failing ForecastIT  test on windows

Tracking in #44609

* mute BasicRenormalizationIT.testDefaultRenormalization

tracked in #44613

* fix mute testDefaultRenormalization

* Increase busyWait timeout windows is slow

* Mute failure unconfigured node name

* mute x-pack internal cluster test windows

tracking #44610

* Mute JvmErgonomicsTests on windows

Tracking #44669

* mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot

Tracking #44671

* Mute NodeTests on Windows

Tracking #44256
2019-07-22 11:32:29 +03:00
Armin Braun 0e2e83f591
More Efficient Deserialization of Empty Collections in StreamInput (#44665) (#44674)
* We only had the `size == 0` optimization in some but not all spots of deserializing collections in this class, fixed the remaining spots.
* Also fixed the a similar spot when deserializing `ThreadContextStruct` that could now be simplified (it was apparently doing it's own version of this optimization for the first map it deserialized before ... but not for the second map -> made it not instantiate anything if both maps are empty since it's always the same object here anyway)
2019-07-22 09:31:12 +02:00
Armin Braun 0ac137a9a1
Optimize some StreamOutput Operations (#44660) (#44668)
* Optimize some StreamOutput Operations

* Writing numbers byte by byte adds a lot of unnecessary bounds checks to serialization
* Serializing to a threadlocal `byte[]` instead and bulk writing gives about a 50% speedup on `long` and `vlong` (for large numbers) writes and 30% for `int`, `vint` on Linux on an i9
* Using a threadlocal of the maximum string buffer size we used to allocate before also removes allocations when writing strings in general since we now never have to allocate a `byte[]` for that
   * And don't have to GC one either resolving the TODO removed here
2019-07-22 07:09:32 +02:00
Tal Levy 1a9cfe9110
Removal Streamable (#44647) (#44655)
This commit ends the grand adventure that was the
refactoring effort to migrate all usages of
Streamable to Writeable.

Closes #34389.
2019-07-20 19:10:49 -07:00
Ryan Ernst 4c05d25ec7
Convert Transport Request/Response to Writeable (#44636) (#44654)
This commit converts all remaining TransportRequest and
TransportResponse classes to implement Writeable, and disallows
Streamable implementations.

relates #34389
2019-07-20 11:25:58 -07:00
Ryan Ernst f4ee2e9e91
Convert direct implementations of Streamable to Writeable (#44605) (#44646)
This commit converts Streamable to Writeable for direct implementations.

relates #34389
2019-07-20 08:32:29 -07:00
Tal Levy 7c84636029
Remove StreamOutput #writeOptionalStreamable and #writeStreamableList (#44602) (#44643)
remove usages of writeOptionalStreamable and writeStreambaleList

relates #34389.
2019-07-19 15:55:53 -07:00
Ryan Ernst f193d14764
Convert remaining Action Response/Request to writeable.reader (#44528) (#44607)
This commit converts readFrom to ctor with StreamInput on the remaining
ActionResponse and ActionRequest classes.

relates #34389
2019-07-19 13:33:38 -07:00
Armin Braun f028ab43ad
Don't Swallow Interrupt in TransportService#onRequestReceived (#44622) (#44627)
* We shouldn't just swallow the interrupt here quietly and keep going on the IO thread
   * Currently interrupt continues here just the same way an invocation of `acceptIncomingRequests` woudl have made things continue
* Relates #44610
2019-07-19 20:35:29 +02:00
Christoph Büscher eafe54c81c Fix AnalysisMode propagation in NamedAnalyzer (#44626)
NamedAnalyzer should return the same AnalysisMode than any custom analyzer it
wraps, otherwise AnalysisMode.ALL. This used to be only CustomAnalyzer in the
past, but with the introduction of the ReloadableCustomAnalyzer this needs to be
added as an option where the analysis mode gets propagated.

Closes #44625
2019-07-19 18:18:43 +02:00
Nikita Glashenko 804476c35d Remove support for old translog checkpoint formats (#44280)
This commit removes support for the translog checkpoint format from versions
before 6.0.0 since 7.x versions are incompatible with indices from these
versions.

Relates #44720
Fixes #44210
2019-07-19 16:01:47 +01:00
Przemyslaw Gomulka 597d2dfaf5
Add types field to slow logs in 7.x (#44592)
By mistake in 7.x types field was removed from slow logs. Types are
still present in that version, so this have to be present as a JSON
field
relates #41354
backport that was causing this #44178
2019-07-19 08:31:00 +02:00
Ryan Ernst 60785a9fa8
Convert several direct uses of Streamable to Writeable (#44586) (#44604)
This commit converts several utility classes that implement Streamable
to have StreamInput constructors. It also adds a default version of
readFrom to Streamable so that overriding to throw UOE is not necessary.

relates #34389
2019-07-18 21:25:44 -07:00
Julie Tibshirani 336364fefe
Convert more classes in 'server' to Writeable. (#44600)
* Convert GetTask*.
* Convert RemoteInfo*.
* Convert GetFieldMappings*.
* Convert ValidateQueryRequest*.
* Convert MainResponse*.
* Convert MultiGet*.
* Convert Update*.
* Add a missing call to parent constructors.

Relates to #34389.
2019-07-18 18:45:10 -07:00
Ryan Ernst 13f46aa801
Convert index and persistent actions/response to writeable (#44582) (#44601)
This commit converts several more classes from streamable to writeable
in server, mostly within the o.e.index and o.e.persistent packages.

relates #34389
2019-07-18 18:32:09 -07:00
Tal Levy 03f5084ac7
remove usages of #readOptionalStreamable, #readStreamableList. (#44578) (#44598)
This commit removes references to Streamable from StreamInput.

This is all a part of the effort to remove Streamable usage.

relates #34389.
2019-07-18 16:19:02 -07:00
Ryan Ernst af093a4095
Convert ShardOperationFailedException to Writeable (#44532) (#44580)
This commit converts subclasses of ShardOperationFailedException to
implement ctors with StreamInput instead of readFrom. It also simplifies
IndicesShardStoresResponse.Failure to serialize its shardId after the
super data.

relates #34389
2019-07-18 13:29:19 -07:00
Armin Braun 3b5038b837
Implement Eventually Consistent Mock Repository for SnapshotResiliencyTests (#40893) (#44570)
* Add eventually consistent mock repository for reproducing and testing AWS S3 blob store behavior
* Relates #38941
2019-07-18 17:54:54 +02:00
Andrey Ershov ef6ddd15c6 Revert "Snapshot tool: S3 orphaned files
cleanup (#44551)"

This reverts commit 09edeeb3
2019-07-18 17:21:45 +02:00
Andrey Ershov 09edeeb38e Snapshot tool: S3 orphaned files cleanup (#44551)
A tool to work with snapshots.
Co-authored by @original-brownbear.
This commit adds snapshot tool and the single command cleanup, that
cleans up orphaned files for S3.
Snapshot tool lives in x-pack/snapshot-tool.

(cherry picked from commit fc4aed44dd975d83229561090f957a95cc76b287)
2019-07-18 16:38:00 +02:00
David Turner 452f7f67a0
Defer reroute when starting shards (#44539)
Today we reroute the cluster as part of the process of starting a shard, which
runs at `URGENT` priority. In large clusters, rerouting may take some time to
complete, and this means that a mere trickle of shard-started events can cause
starvation for other, lower-priority, tasks that are pending on the master.

However, it isn't really necessary to perform a reroute when starting a shard,
as long as one occurs eventually. This commit removes the inline reroute from
the process of starting a shard and replaces it with a deferred one that runs
at `NORMAL` priority, avoiding starvation of higher-priority tasks.

Backport of #44433 and #44543.
2019-07-18 14:10:40 +01:00
Alan Woodward ec0a0a41db Remove type parameter from ParserContext (#44478)
ParserContext.getType() is never called, so we can remove it and tidy up
the callers as well.
2019-07-18 11:07:46 +01:00
Luca Cavanna a8a16e6b08 Associate sub-requests to their parent task in multi search API (#44492)
Multi search accepts multiple search requests and runs them as
independent requests, each one as part of their own search task. Today
they don't get associated though with their parent multi search task,
which would be useful to monitor which msearch a certain search was part
of, if any, and also to cancel all of the sub-requests in case the
parent msearch gets cancelled (though this will also require making 
the multi search task cancellable as a follow-up).
2019-07-18 11:58:30 +02:00
David Turner 7598e0186a
Harmonise indentation of cluster settings (#44540)
Today the long list of `BUILT_IN_CLUSTER_SETTINGS` is indented differently
between `master` and `7.x`. This sometimes makes backporting painful. This
commit adjusts the indentation of earlier branches to match that in `master`.
2019-07-18 09:50:53 +01:00
Armin Braun 6565825a13
Avoid CharsRef Allocations in StreamInput (#44488) (#44519)
* Many messages deserialized from a `StreamInput` only contain short strings, some use-cases of instantiating a `StreamInput` don't deserialize any strings
  * Don't allocate `CharsRef` for small strings to save some allocations (especially on the IO threads)
  * Lazily allocate a larger `CharsRef` if needed for larger strings like we did before and have it live as long as the `StreamInput` like before as well
2019-07-18 08:52:37 +02:00
Tal Levy 38d2ada84f
deprecate Supplier<Response> constructors in HandledTransportAction (#44456) (#44533)
This commit deprecates all constructors of HandledTransportAction
that take in a Supplier instead of a Writeable.Reader for response
objects.

in addition to the deprecation, the following modules were updated to
leverage Writeable

- modules:ingest-common
- modules:lang-mustache

relates #34389.
2019-07-17 22:47:09 -07:00
Tal Levy 075a3f0e99
remove usage of ActionType#(String) (#44459) (#44526)
this commit removes usage of the deprecated
constructor with a single argument and no Writeable.Reader.

The purpose of this is to reduce the boilerplate necessary for
properly implementing a new action, as well as reducing the
chances of using the incorrect super constructor while classes
are being migrated to Writeable

relates #34389.
2019-07-17 20:28:11 -07:00
Nhat Nguyen 51180af91d Make peer recovery send file chunks async (#44468)
Relates #44040
Relates #36195
2019-07-17 22:25:43 -04:00
Nhat Nguyen 458f24c46a Reenable accounting circuit breaker (#44495)
We have a new Lucene 8.2 snapshot on master and 7.x; hence we can
re-enable the accounting on these branches.

Relates #30290
2019-07-17 22:25:43 -04:00
Julie Tibshirani 34c6067018
Convert several classes in 'server' to Writeable. (#44527)
* Convert FieldCapabilities*.
* Convert MultiTermVectors*.
* Convert SyncedFlush*.
* Convert SearchTemplateRequest.
* Convert MultiSearchTemplateRequest.
* Convert GrokProcessorGet*.
* Remove a stray reference to SearchTemplateRequest#readFrom.

Relates to #34389.
2019-07-17 19:04:21 -07:00
Ryan Ernst 2a2686e6e7
Convert remaining ActionTypes to writeable in xpack core (#44467) (#44525)
This commit converts all remaining ActionType response classes to
writeable in xpack core. It also converts a few from server which were
used by xpack core.

relates #34389
2019-07-17 18:01:45 -07:00
Ryan Ernst 17c4b2b839
Convert MasterNodeRequest to implement Writeable.Reader (#44452) (#44513)
This commit converts all MasterNodeRequest subclasses to fullfill
Writeable.Reader constructors.

relates #34389
2019-07-17 18:01:29 -07:00
Paul Sanwald 7114fe786b
Fix incorrect calculation of how many buckets will result from a merge operation. (#44461) (#44515) 2019-07-17 19:14:16 -04:00
Julie Tibshirani 8841779de8
Convert ClearScroll* to Writeable. (#44511)
This PR converts `ClearScrollRequest` and `ClearScrollResponse` to
`Writeable`.

Relates to #34389.
2019-07-17 15:49:38 -07:00
Jason Tedor 39c5f98de7
Introduce test issue logging (#44477)
Today we have an annotation for controlling logging levels in
tests. This annotation serves two purposes, one is to control the
logging level used in tests, when such control is needed to impact and
assert the behavior of loggers in tests. The other use is when a test is
failing and additional logging is needed. This commit separates these
two concerns into separate annotations.

The primary motivation for this is that we have a history of leaving
behind the annotation for the purpose of investigating test failures
long after the test failure is resolved. The accumulation of these stale
logging annotations has led to excessive disk consumption. Having
recently cleaned this up, we would like to avoid falling into this state
again. To do this, we are adding a link to the test failure under
investigation to the annotation when used for the purpose of
investigating test failures. We will add tooling to inspect these
annotations, in the same way that we have tooling on awaits fix
annotations. This will enable us to report on the use of these
annotations, and report when stale uses of the annotation exist.
2019-07-18 05:33:33 +09:00
Ryan Ernst 0755a13c9f
Convert AcknowledgedRequest to Writeable.Reader (#44412) (#44454)
This commit adds constructors to AcknolwedgedRequest subclasses to
implement Writeable.Reader, and ensures all future subclasses implement
the same.

relates #34389
2019-07-17 11:17:36 -07:00
Yannick Welsch c8b66c549d Ignore failures to set socket options on Mac (#44355)
Brings some temporary relief for test failures until #41071 is addressed.
2019-07-17 18:51:25 +02:00
Yannick Welsch f78e64e3e2 Terminate linearizability check early on large histories (#44444)
Large histories can be problematic and have the linearizability checker occasionally run OOM. As it's
very difficult to bound the size of the histories just right, this PR will let it instead run for 10 seconds
on large histories and then abort.

Closes #44429
2019-07-17 18:51:25 +02:00
Igor Motov d3cb7bbc8f Geo: fix GeoWKTShapeParserTests (#44448)
Changes in #44187 introduced some optimization in the way shapes are
generated. These changes were not captured in GeoWKTShapeParserTests.

Relates #44187
2019-07-17 12:09:38 -04:00
Igor Motov cd5a334864 Geo: extract dateline handling logic from ShapeBuilders (#44187)
Extracts dateline decomposition logic from ShapeBuilder into a separate
utility class that is used on the indexing side. The search side
will be handled as part of another PR at this time we will remove
the decomposition logic from ShapeBuilders as well. This PR also doesn't
change any existing logic including bugs.

Relates to #40908
2019-07-17 12:09:38 -04:00
Alan Woodward b6a0f098e6 Don't use index_phrases on graph queries (#44340)
Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976
2019-07-17 16:46:00 +01:00
Yannick Welsch ddd740162e
Do not use CancellableThreads for Zen1 (#44430)
Zen 1 stops pinging threads in ZenDiscovery by calling Thread.interrupt(). This is incompatible with
the CancellableThreads that only allow threads to be interrupted through cancellation. The use of
CancellableThreads was introduced in #42844 and added to UnicastZenPing as part of the
backport, as both Zen1 and Zen2 share the same SeedHostsResolver implementation. This commit
effectively undoes the change in the backport while still allowing to share same implementation.

Closes #44425
2019-07-17 17:32:47 +02:00
Zachary Tong 103ba976fd Convert BucketScript to static parser (#44385)
BucketScript was using the old-style parser and could easily be
converted over to the newer static parser.

Also adds a test for GapPolicy enum serialization
2019-07-17 10:22:42 -04:00
David Turner 377a6a47ac Improve handshake failure messages (#44485)
Today we report an exception on a handshake failure (e.g. cluster name
mismatch) but the message does not include all the details of the mismatch. If
the mismatch is something subtle like `my-cluster` instead of `my_cluster` then
we cannot diagnose this from the message alone. This commit adds the details of
the local cluster to the message, along with the details of the remote cluster,
improving the utility of the exception message if reported in isolation.
2019-07-17 13:33:28 +01:00
Armin Braun 91673e373a
Fix Incorrect Uncompressed Error Handling in InboundMessage (#44317) (#44483)
* Fix Incorrect Uncompressed Error Handling in InboundMessage

* CompressorFactory.compressor does not throw uncompressed exception on uncompressed bytes, it merely returns `null` in this case if the bytes are at least XContent so the current catch and re-throw logic is dead code
* Made it work again by throwing on a `null` return so we get a real error message instead of an NPE
2019-07-17 14:31:46 +02:00
Ignacio Vera eb348d2593
Upgrade to lucene-8.2.0-snapshot-6413aae226 (#44480) 2019-07-17 13:28:28 +02:00
Armin Braun c8db0e9b7e
Remove blobExists Method from BlobContainer (#44472) (#44475)
* We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface
   * Keep it as an implementation detail in the Azure repository
2019-07-17 11:56:02 +02:00
Tanguy Leroux e423b7341a Log non-acknowledged close index response in ReplicaToPrimaryPromotionIT
Relates #44479
2019-07-17 10:32:44 +02:00
David Turner dca8a918f3 Use applied cluster state in cluster health (#44426)
In #44348 we changed the cluster health action so that it sometimes uses the
cluster state directly from the master service rather than from the cluster
applier. If the state is not recovered then this is inappropriate, because
prior to state recovery the state available to the cluster applier contains no
indices. This commit moves us back to using the state from the applier.

Fixes #44416.
2019-07-17 08:36:13 +01:00
David Turner 0fd33b089f Report shard state changes better (#44419)
Today when the cluster health changes the `AllocationService` reports at most
ten shards that were started or failed, and always ends its message with `...`
suggesting that the list is truncated. This commit adjusts these messages to be
clearer about whether the list is truncated or not. When debug logging is
enabled the list is not truncated; if the list is truncated then its length is
logged, and if it is not truncated then no `...` is included in the message.
2019-07-17 08:36:06 +01:00
Ryan Ernst 6e50bafa8f
Convert Broadcast request and response to use writeable.reader (#44386) (#44453)
This commit converts the request and response classes for broadcast
actions to implement ctors for Writeable.Reader and forces all future
implementations to implement the same.

relates #34389
2019-07-16 23:24:02 -07:00
Tim Brooks 6b1a769638
Move CORS Config into :server package (#43779)
This commit moves the config that stores Cors options into the server
package. Currently both nio and netty modules must have a copy of this
config. Moving it into server allows one copy and the tests to be in a
common location.
2019-07-16 17:50:42 -06:00
Julie Tibshirani cc0ff3aa71 Ensure field caps doesn't error on rank feature fields. (#44370)
The contract for MappedFieldType#fielddataBuilder is to throw an
IllegalArgumentException if fielddata is not supported. The rank feature mappers
were instead throwing an UnsupportedOperationException, which caused
MappedFieldType#isAggregatable to fail.
2019-07-16 15:56:50 -07:00
Ryan Ernst c26edb4c43
Ensure replication response/requests implement writeable (#44392) (#44446)
This commit cleans up replication response and request so that the base
class does not allow subclasses to implement Streamable.

relates #34389
2019-07-16 12:53:08 -07:00
Przemysław Witek 9613700a63
[7.x] Implement MlConfigIndexMappingsFullClusterRestartIT test which verifies that .ml-config index mappings are properly updated during cluster upgrade (#44341) (#44366) 2019-07-16 21:22:40 +02:00
Nhat Nguyen 301c8daf4c Revert "Make peer recovery send file chunks async (#44040)"
This reverts commit a2b4687d89.
2019-07-16 14:18:35 -04:00
Christoph Büscher 67ec0a4e9b
Unmute SpecificMasterNodesIT test (#44436)
The underlying issue is closed and the fix in #42454 seems to have been
backported to 7.x and 7.3 so we can reactivate the test.
2019-07-16 19:41:59 +02:00
Yu 563a78829f Do not allow version in Rest Update API (#43516)
The versioning of Update API doesn't rely on version number anymore (and
rather on sequence number). But in rest api level we ignored the
"version" and "version_type" parameter, so that the server cannot raise
the exception when whey were set.

This PR restores "version" and "version_type" parsing in Update Rest API
so that we can get the appropriate errors.

Relates to #42497
2019-07-16 13:19:07 -04:00
Nhat Nguyen a2b4687d89 Make peer recovery send file chunks async (#44040) 2019-07-16 10:43:46 -04:00
Lee Hinman fb0461ac76
[7.x] Add Snapshot Lifecycle Management (#44382)
* Add Snapshot Lifecycle Management (#43934)

* Add SnapshotLifecycleService and related CRUD APIs

This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling based on
the policies defined schedule.

This also includes the get, put, and delete APIs for these policies

Relates to #38461

* Make scheduledJobIds return an immutable set

* Use Object.equals for SnapshotLifecyclePolicy

* Remove unneeded TODO

* Implement ToXContentFragment on SnapshotLifecyclePolicyItem

* Copy contents of the scheduledJobIds

* Handle snapshot lifecycle policy updates and deletions (#40062)

(Note this is a PR against the `snapshot-lifecycle-management` feature branch)

This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.

Relates to #38461

* Take a snapshot for the policy when the SLM policy is triggered (#40383)

(This is a PR for the `snapshot-lifecycle-management` branch)

This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.

This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.

Relates to #38461

* Record most recent snapshot policy success/failure (#40619)

Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.

This is the first step toward writing history in a more permanent
fashion.

* Validate snapshot lifecycle policies (#40654)

(This is a PR against the `snapshot-lifecycle-management` branch)

With the commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.

Part of #38461

* Hook SLM into ILM's start and stop APIs (#40871)

(This pull request is for the `snapshot-lifecycle-management` branch)

This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.

Relates to #38461

* Add tests for SnapshotLifecyclePolicyItem (#40912)

Adds serialization tests for SnapshotLifecyclePolicyItem.

* Fix improper import in build.gradle after master merge

* Add human readable version of modified date for snapshot lifecycle policy (#41035)

* Add human readable version of modified date for snapshot lifecycle policy

This small change changes it from:

```
...
"modified_date": 1554843903242,
...
```

To

```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```

Including the `"modified_date"` field when the `?human` field is used.

Relates to #38461

* Fix test

* Add API to execute SLM policy on demand (#41038)

This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.

```json
PUT /_ilm/snapshot/<policy>/_execute
```

And it returns the response with the generated snapshot name:

```json
{
  "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```

Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly trigged SLM job.

Relates to #38461

* Add next_execution to SLM policy metadata (#41221)

* Add next_execution to SLM policy metadata

This adds the next time a snapshot lifecycle policy will be executed when
retriving a policy's metadata, for example:

```json
GET /_ilm/snapshot?human
{
  "production" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:16:21.865Z",
    "modified_date_millis" : 1555362981865,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "*/30 * * * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      }
    },
    "next_execution" : "2019-04-15T21:16:30.000Z",
    "next_execution_millis" : 1555362990000
  },
  "other" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:12:19.959Z",
    "modified_date_millis" : 1555362739959,
    "policy" : {
      "name" : "<other-snap-{now/d}>",
      "schedule" : "0 30 2 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "other"
        ],
        "ignore_unavailable" : false,
        "include_global_state" : true
      }
    },
    "next_execution" : "2019-04-16T02:30:00.000Z",
    "next_execution_millis" : 1555381800000
  }
}
```

Relates to #38461

* Fix and enhance tests

* Figured out how to Cron

* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)

This commit changes the endpoint for snapshot lifecycle management from:

```
GET /_ilm/snapshot/<policy>
```

to:

```
GET /_slm/policy/<policy>
```

It mimics the ILM path only using `slm` instead of `ilm`.

Relates to #38461

* Add initial documentation for SLM (#41510)

* Add initial documentation for SLM

This adds the initial documentation for snapshot lifecycle management.

It also includes the REST spec API json files since they're sort of
documentation.

Relates to #38461

* Add `manage_slm` and `read_slm` roles (#41607)

* Add `manage_slm` and `read_slm` roles

This adds two more built in roles -

`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.

`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.

Relates to #38461

* Add execute to the test

* Fix ilm -> slm typo in test

* Record SLM history into an index (#41707)

It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.

This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.

Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s.  Watcher would cause the same problem, but it
is already disabled (for the same reason).

* High Level Rest Client support for SLM (#41767)

* High Level Rest Client support for SLM

This commit add HLRC support for SLM.

Relates to #38461

* Fill out documentation tests with tags

* Add more callouts and asciidoc for HLRC

* Update javadoc links to real locations

* Add security test testing SLM cluster privileges (#42678)

* Add security test testing SLM cluster privileges

This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.

Relates to #38461

* Don't redefine vars

*  Add Getting Started Guide for SLM  (#42878)

This commit adds a basic Getting Started Guide for SLM.

* Include SLM policy name in Snapshot metadata (#43132)

Keep track of which SLM policy in the metadata field of the Snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.

* Fix compilation after master merge

* [TEST] Move exception wrapping for devious exception throwing

Fixes an issue where an exception was created from one line and thrown in another.

* Fix SLM for the change to AcknowledgedResponse

* Add Snapshot Lifecycle Management Package Docs (#43535)

* Fix compilation for transport actions now that task is required

* Add a note mentioning the privileges needed for SLM (#43708)

* Add a note mentioning the privileges needed for SLM

This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.

Relates to #38461

* Mention that you can create snapshots for indices you can't read

* Fix REST tests for new number of cluster privileges

* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)

* Fix SnapshotHistoryStoreTests after merge

* Remove overridden newResponse functions that have been removed

* Fix compilation for backport

* Fix get snapshot output parsing in test

* [DOCS] Add redirects for removed autogen anchors (#44380)

* Switch <tt>...</tt> in javadocs for {@code ...}
2019-07-16 07:37:13 -06:00
Armin Braun 4a79ccd324
Cleaner Exception Handling on Shard Delete (#44384) (#44407)
* Follow up to #44165
* We should just catch all exceptions here and not return errors after the index-N update went through since a subsequent delete attempt by the user would fail with SnapshotMissingException since the snapshot now appears deleted. Also, `SnapshotException` isn't even thrown in the changed spot it seems in the first place and certainly not the only exception possible.
2019-07-16 12:20:52 +02:00
David Turner a09389c511 AwaitsFix GatewayIndexStateIT#testJustMasterNode
Relates #44416.
2019-07-16 11:02:32 +01:00
David Turner 8d68d1f54d Cluster health should await events plus other things (#44348)
Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
2019-07-16 06:34:02 +01:00
Ryan Ernst e0b82e92f3
Convert BaseNode(s) Request/Response classes to Writeable (#44301) (#44358)
This commit converts all BaseNodeResponse and BaseNodesResponse
subclasses to implement Writeable.Reader instead of Streamable.

relates #34389
2019-07-15 18:07:52 -07:00
David Turner 86ee8eab3f Allow RerouteService to reroute at lower priority (#44338)
Today the `BatchedRerouteService` submits its delayed reroute task at `HIGH`
priority, but in some cases a lower priority would be more appropriate. This
commit adds the facility to submit delayed reroute tasks at different
priorities, such that each submitted reroute task runs at a priority no lower
than the one requested. It does not change the fact that all delayed reroute
tasks are submitted at `HIGH` priority, but at least it makes this explicit.
2019-07-15 17:41:39 +01:00
Ryan Ernst 59658daef9
Separate streamable based master node actions (#44313)
This commit creates new base classes for master node actions whose
response types still implement Streamable. This simplifies both finding
remaining classes to convert, as well as creating new master node
actions that use Writeable for their responses.

relates #34389
2019-07-15 09:20:20 -07:00
David Turner e3d2af64c4 Throw TranslogCorruptedException in more cases (#44217)
Today we do not throw a `TranslogCorruptedException` in certain cases of
translog corruption, such as for a corrupted checkpoint file or when an
expected file (either checkpoint or translog) is completely missing. This means
that `elasticsearch-shard` will not truncate the translog in those cases.

This commit strengthens the translog corruption tests to corrupt and/or delete
both translog and checkpoint files, and ensures that a
`TranslogCorruptedException` is thrown in all cases. It also sometimes
simulates a recovery after a crash while rolling the translog generation,
including cases where the rolled checkpoint contains incorrect data.

It also adjusts (and renames) `RemoveCorruptedShardDataCommandIT.getDirs()` to
return only a single path, since in practice this was the only thing that could
happen and yet we were relying on its callers to verify this and not all
callers were doing so.
2019-07-15 15:20:33 +01:00
Armin Braun eb1106c465
Stronger Cleanup Shard Snapshot Directory on Delete (#44257) (#44337)
* Stronger Cleanup Shard Snapshot Directory on Delete

* Use `RepositoryData` to clean up unreferenced `snap-${uuid}.dat` blobs
from shard directories (and index-N) and as a result also clean up data
blobs that are only referenced by them
* Stop cleaning up anything but index-N on shard snapshot creation to
align behavior of shard index-N handling with root path index-N
handling
2019-07-15 12:59:38 +02:00
Christoph Büscher 22dc125dad AnalyzeAction.Response doesn't need to call super.readFrom() (#44331)
The responses super.writeTo() method was removed in #44092, so the corresponding
contructor that reads from a stream shouldn't call super itself, even though its
implementation is currently empty.
2019-07-15 11:53:25 +02:00
Armin Braun 7f5d40d235
Avoid Needless Set Instantiation in InboundMessage (#44318) (#44329)
* Avoid Needless Set Instantiation in InboundMessage

* When `features` is empty (when there's no xpack) we constantly and needless instantiated a few objects here for the empty set on every message
2019-07-15 10:59:51 +02:00
Armin Braun 0cc94a457d
Remove non-SMILE Serialization from ChecksumBlobStoreFormat (#44278) (#44326)
* At least all the way back to 6.x we never use anything but `SMILE` in
production code with this class so I removed the more general
constructor and removed the format leniency from the deserialization
2019-07-15 10:59:33 +02:00
Tanguy Leroux 76a96c3774 Remove ReusePeerRecoverySharedTest class (#44275) 2019-07-15 10:29:29 +02:00
Armin Braun d73e2f9c56
HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164) (#44324)
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes #33077
2019-07-15 10:21:54 +02:00
Nhat Nguyen 2203d447aa Fail engine if hit document failure on replicas (#43523)
An indexing on a replica should never fail after it was successfully
indexed on a primary. Hence, we should fail an engine if we hit any
failure (document level or tragic failure) when processing an indexing
on a replica.

Relates #43228
Closes #40435
2019-07-14 19:29:16 -04:00
Christoph Büscher 835b7a120d Fix AnalyzeAction response serialization (#44284)
Currently we loose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This change
fixes this so we write an additional flag indicating whether the value is null
or not, followed by the size of the list and its content.

Closes #44078
2019-07-14 10:35:11 +02:00
Ryan Ernst 1dcf53465c Reorder HandledTransportAction ctor args (#44291)
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.

relates #34389
2019-07-12 13:45:09 -07:00
Nikita Glashenko d187fcb9de Support WKT point conversion to geo_point type (#44107)
This PR adds support for parsing geo_point values from WKT POINT format.
Also, a few minor bugs in geo_point parsing were fixed.

Closes #41821
2019-07-12 14:31:07 -04:00
Przemyslaw Gomulka e23ecc5838
JSON logging refactoring and X-Opaque-ID support backport(#41354) (#44178)
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)

These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.

Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.

closes #41350
 backport #41354
2019-07-12 16:53:27 +02:00
Armin Braun 9b4f50b40a
Remove Redundant GetAllSnapshots Method from RepositoryData (#44259) (#44271)
* With the removal of the incompatible snapshots list in RepositoryData
the get snapshots and get all snapshots methods are equivalent so I
removed one of them
2019-07-12 15:03:09 +02:00
Yannick Welsch 068286ca4b Remove RemoteClusterConnection.ConnectedNodes (#44235)
This instead exposes the set of connected nodes on ConnectionManager.
2019-07-12 14:54:21 +02:00
Armin Braun 6c02cf0241
Fix InternalTestCluster StopRandomNode Assertion (#44258) (#44265)
* The assertion added in #44214 is tripped by tests running dedicated
test clusters per test needlessly.This breaks existing tests like the one in #44245.
* Closes #44245
2019-07-12 13:18:55 +02:00
Armin Braun ad6dce16f4
Safer Shard Snapshot Delete (#44165) (#44244)
* Safer Shard Snapshot Delete

* We shouldn't delete the snapshot meta file before we update the index
in the shard folder. If we fail to update the index-N after deleting the
existing index-N is broken because the snap- blob it references is gone.
2019-07-12 12:45:06 +02:00
David Turner 735c897ec6
Avoid counting votes from master-ineligible nodes (#43688)
Today if a master-eligible node is converted to a master-ineligible node it may
remain in the voting configuration, meaning that the master node may count its
publish responses as an indication that it has properly persisted the cluster
state. However master-ineligible nodes do not properly persist the cluster
state, so it is not safe to count these votes.

This change adjusts `CoordinationState` to take account of this from a safety
point of view, and also adjusts the `Coordinator` to prevent such nodes from
joining the cluster. Instead, it triggers a reconfiguration to remove from the
voting configuration a node that now appears to be master-ineligible before
processing its join.

Backport of #43688, see #44260.
2019-07-12 11:30:52 +01:00
Armin Braun 9e920f9612
Make Timestamps Returned by Snapshot APIs Consistent (#43148) (#44261)
* We don't have to calculate the start and end times form the shards for the status API, we have the start time available from the CS or the `SnapshotInfo` in the repo and can either take the end time form the `SnapshotInfo` or
take the most recent time from the shard stats for in progress snapshots
* Closes #43074
2019-07-12 12:05:35 +02:00
Mark Vieira 3cd9606566
Mute failing test 2019-07-11 13:32:49 -07:00
Armin Braun 0dd06cf7a5
Remove Dead Code Around Snapshots (#44109) (#44236)
* Just some random spots that have become unused with recent cleanups
2019-07-11 21:56:36 +02:00
Christoph Büscher 31725ef390 [Tests] Increase SimpleQueryStringIT allowed maxClauseCount (#44215)
For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup
(@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses
which hasn't been an issue before LUCENE-8811, but we slightly need to increase
the minimal possible clause count now.

Closes #44192
2019-07-11 20:16:20 +02:00
Yannick Welsch ae8f625d73 Report usages old child breakers when breaking on real memory (#44221)
This will help in investigations where the real memory circuit breaker is tripped to better understand
on what the actual memory is used, i.e. whether it's a temporary thing (e.g. requests) in contrast to
more permanently allocated memory (e.g. accounting).
2019-07-11 19:52:12 +02:00
Armin Braun 2768662822
Cleanup Stale Root Level Blobs in Sn. Repository (#43542) (#44226)
* Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures
   * The scenario that get's us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot
   * Not deleting other dangling blobs since that don't follow the snap-, meta- or tempfile naming schemes to not accidentally delete blobs not created by the snapshot logic
* Follow up to #42189
  * Same safety logic, get list of all blobs before writing index-N blobs, delete things after index-N blobs was written
2019-07-11 19:35:15 +02:00
Andrei Stefan e9f9f00940
SQL: add pretty printing to JSON format (#43756) (#44220)
(cherry picked from commit cbd9d4c259bf5a541bc49f65f7973174a36df449)
2019-07-11 20:02:24 +03:00
Christos Soulios c091b6c004
Migrating tests from AvgIT integration test to AvgAggregatorTests (#44076) (#44225)
This PR migrates most tests from AvgIT integration test to AvgAggregatorTests, as described in #42893
2019-07-11 19:20:13 +03:00
Armin Braun 5f22370b6b
Fix ShrinkIndexIT (#44214) (#44223)
* Fix ShrinkIndexIT

* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.
* Closes #44164
2019-07-11 17:58:00 +02:00
Igor Motov 1636701d69 CI: Disable SimpleQueryStringIT.testDocWithAllTypes
Tracked by #44192
2019-07-11 09:22:18 -05:00
Nick Knize 374030a53f
Upgrade to lucene-8.2.0-snapshot-860e0be5378 (#44171) (#44184)
Upgrades lucene library to lucene-8.2.0-snapshot-860e0be5378
2019-07-11 09:17:22 -05:00
Igor Motov 66a9b721f5 Add Map to XContentParser Wrapper (#44036)
In some cases we need to parse some XContent that is already parsed into
a map. This is currently happening in handling source in SQL and ingest
processors as well as parsing null_value values in geo mappings. To avoid
re-serializing and parsing the value again or writing another map-based
parser this commit adds an iterator that iterates over a map as if it was
XContent. This makes reusing existing XContent parser on maps possible.

Relates to #43554
2019-07-11 09:38:31 -04:00
Yannick Welsch ea5513f2cf Make NodeConnectionsService non-blocking (#44211)
With connection management now being non-blocking, we can make NodeConnectionsService
avoid the use of MANAGEMENT threads that are blocked during the connection attempts.

I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool
from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level,
which resulted in races.
2019-07-11 14:08:07 +02:00
Armin Braun 51f0e941d3
Reduce Number of List Calls During Snapshot Create and Delete (#44088) (#44209)
* Reduce Number of List Calls During Snapshot Create and Delete

Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
2019-07-11 13:52:36 +02:00
Armin Braun 8ce8c627dd
Some Cleanup in o.e.i.shard (#44097) (#44208)
* Some Cleanup in o.e.i.shard

* Extract one duplicated method
* Cleanup obviously unused code
2019-07-11 13:52:06 +02:00
Yannick Welsch 2ee07f1ff4 Simplify port usage in transport tests (#44157)
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports  and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
2019-07-11 13:35:37 +02:00
Armin Braun c0ed64bb92
Improve Repository Consistency Check in Tests (#44204)
* Improve Repository Consistency Check in Tests (#44099)

* Check that index metadata as well as snapshot metadata always exists
when referenced by other metadata

* Fix SnapshotResiliencyTests on ExtraFS (#44113)

* As a result of #44099 we're now checking more directories and have to
ignore the `extraN` folders for those like we do for indices already
* Closes #44112
2019-07-11 11:14:37 +02:00
Armin Braun 8a554f9737
Remove IncompatibleSnapshots Logic from Codebase (#44096) (#44183)
* The incompatible snapshots logic was created to track 1.x snapshots that
became incompatible with 2.x
   * It serves no purpose at this point
   * It adds an additional GET request to every loading of
RepositoryData (from loading the incompatible snapshots blob)
2019-07-11 07:15:51 +02:00
Igor Motov df2e1fb43e Geo: add validator that only checks altitude (#43893)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanism. This commit removes
these restriction from the validator is used in SQL and ML retrieval.
2019-07-10 16:55:03 -04:00
Ryan Ernst 8fda49a834 Remove unused import in TransportShardBulkAction
Accidentally left from backporting #44092
2019-07-10 13:43:47 -07:00
Christoph Büscher cbb19032df [Test] Additional logging for RemoteClusterClientTests (#44124) 2019-07-10 22:41:54 +02:00
Ryan Ernst c6efb9be2a Convert ReplicationResponse to Writeable (#43953)
This commit convers ReplicationResponse and all its subclasses to
support Writeable.Reader as a constructor.

relates #34389
2019-07-10 12:45:10 -07:00
Ryan Ernst fb77d8f461 Removed writeTo from TransportResponse and ActionResponse (#44092)
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
complicate subclasses to think they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extendiong one of these classes when not necessary,
since there is nothing to actually implement.

This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.

relates #34389
2019-07-10 12:42:04 -07:00
Zachary Tong 92ad588275
Remove generic on AggregatorFactory (#43664) (#44079)
AggregatorFactory was generic over itself, but it doesn't appear we
use this functionality anywhere (e.g. to allow the super class
to declare arguments/return types generically for subclasses to
override).  Most places use a wildcard constraint, and even when a
concrete type is specified it wasn't used.

But since AggFactories are widely used, this led to
the generic touching many pieces of code and making type signatures
fairly complex
2019-07-10 13:20:28 -04:00
Nhat Nguyen b158919542 Do not use mock engine in PrimaryAllocationIT (#44083)
PrimaryAllocationIT#testForceStaleReplicaToBePromotedToPrimary 
relies on the flushing when a shard is no long assigned. This behavior,
however, can be randomly disabled in MockInternalEngine.

Closes #44049
2019-07-10 12:26:34 -04:00
David Turner d0f1a756d9 Comment on the extra reroute after failing shards (#44152)
The `ShardFailedClusterStateTaskExecutor` fails some shards, which performs a
reroute, but then sometimes schedules a followup reroute. It's not clear from
the code why this followup is necessary, so this commit adds a short comment
describing why it's necessary.
2019-07-10 13:24:21 +01:00
David Roberts cad804df92 [TEST] Mute ShrinkIndexIT
Due to https://github.com/elastic/elasticsearch/issues/44164
2019-07-10 13:22:25 +01:00
Martijn van Groningen 913b6a64e8
Replace Streamable w/ Writable for MultiSearchRequest (#44057)
This commit replaces usages of Streamable with Writeable for the
MultiSearchRequest class.

I ran into this when developing a custom action that reuses
MultiSearchRequest in the enrich branch.

Relates to #34389
2019-07-10 11:13:28 +02:00
Armin Braun a23d1ed00d
Mute SearchWithRandomExceptionsIT (#44147) (#44149)
* This is failing quiete often and we can reproduce it now so we don't
need additional test logging on CI
* Relates #40435
2019-07-10 08:12:26 +02:00
David Turner aec44fecbc Decouple DiskThresholdMonitor & ClusterInfoService (#44105)
Today the `ClusterInfoService` requires the `DiskThresholdMonitor` at
construction time so that it can notify it when nodes report changes in their
disk usage, but this is awkward to construct: the `DiskThresholdMonitor`
requires a `RerouteService` which requires an `AllocationService` which comees
from the `ClusterModule` which requires the `ClusterInfoService`.

Today we break the cycle with a `LazilyInitializedRerouteService` which is
itself a little ugly. This commit replaces this with a more traditional
subject/observer relationship between the `ClusterInfoService` and the
`DiskThresholdMonitor`.
2019-07-09 18:43:32 +01:00
David Turner e70cad4c52 Remove node conn block after connection barrier (#44114)
Today `testOnlyBlocksOnConnectionsToNewNodes` fails (extremely rarely) if the
last attempt to connect to `node0` is delayed for so long that the test runs
`nodeConnectionsBlocks.clear()` before the connection attempt obtains the
expected connection block. We can turn this into a reliable failure with this
delay:

```diff
diff --git a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
index f48413824d3..9a1d0336bcd 100644
--- a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
+++ b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
@@ -300,6 +300,13 @@ public class NodeConnectionsService extends AbstractLifecycleComponent {
         private final Runnable connectActivity = () -> threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() {
             @Override
             protected void doRun() {
+
+                try {
+                    Thread.sleep(500);
+                } catch (InterruptedException e) {
+                    throw new AssertionError("unexpected", e);
+                }
+
                 assert Thread.holdsLock(mutex) == false : "mutex unexpectedly held";
                 transportService.connectToNode(discoveryNode);
                 consecutiveFailureCount.set(0);
```

This commit reverts the extra logging introduced in #43979 and fixes this
failure by waiting for the connection attempt to hit the barrier before
removing it.

Fixes #40170
2019-07-09 17:03:26 +01:00
David Turner 268971db03 Wait for blackholed connection before discovery (#44077)
Since #42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.

It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in #42636.

Fixes #44073
2019-07-09 10:59:53 +01:00
Henning Andersen 748a10866d Reindex ScrollableHitSource pump data out (#43864)
Refactor ScrollableHitSource to pump data out and have a simplified
interface (callers should no longer call startNextScroll, instead they
simply mark that they are done with the previous result, triggering a
new batch of data). This eases making reindex resilient, since we will
sometimes need to rerun search during retries.

Relates #43187 and #42612
2019-07-09 11:50:09 +02:00
David Turner fd9eebae81 Only apply initial recovery filter to shrunk shard (#44054)
Today the `index.routing.allocation.initial_recovery._id` setting can only be
set on indices that are the result of a shrink, but the filtered allocation
decider also applies this filter to shards with a recovery source of
`EMPTY_STORE`. The only way to have this setting set while the recovery source
is `EMPTY_STORE` is to force-allocate an empty primary, but such a forced
allocation ignores this allocation decider.

This commit simplifies the allocation decider so that the `initial_recovery`
setting only applies to shards with a recovery source of `LOCAL_SHARDS`.
2019-07-09 08:42:18 +01:00
Armin Braun 9eac5ceb1b
Dry up inputstream to bytesreference (#43675) (#44094)
* Dry up Reading InputStream to BytesReference
* Dry up spots where we use the same pattern to get from an InputStream to a BytesReferences
2019-07-09 09:18:25 +02:00
Armin Braun dc8f8e40eb
Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode (#43537) (#44082)
* Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode

* See comment in the test: The problem is that when the snapshot delete works out partially on master failover and the retry fails on `SnapshotMissingException` no repository cleanup is run => we still failed even with repo cleanup logic in the delete path now
   * Fixed the test by rerunning a create snapshot and delete loop to clean up the repo before verifying file counts
* Closes #39852
2019-07-09 06:32:08 +02:00
Armin Braun 03332b5aeb
Don't Consistency Check Broken Repository in Test (#43499) (#44071)
* Missed this one in #42189 and it randomly runs into a situation where the broken mock repo is broken such that we can't get to a consistent end state via a delete
* Closes #43498
2019-07-08 17:21:40 +02:00
Tanguy Leroux 251287f89d Check again on-going snapshots/restores of indices before closing (#43873)
Today we prevent any index that is actively snapshotted or restored to be closed. 
This verification is done during the execution of the first phase of index closing 
(ie before blocking the indices).

We should also do this verification again in the last phase of index closing 
(ie after the shard sanity checks and right before actually changing the index 
state and the routing table) because a snapshot/restore could sneak in while
 the shards are verified-before-close.
2019-07-08 17:07:04 +02:00
Mark Tozzi 299a52c17d
Enable validating user-supplied missing values on unmapped fields (#43718) (#43940)
Provides a hook for aggregations to introspect the `ValuesSourceType` for a user supplied Missing value on an unmapped field, when the type would otherwise be `ANY`.  Mapped field behavior is unchanged, and still applies the `ValuesSourceType` of the field.  This PR just provides the hook for doing this, no existing aggregations have their behavior changed.
2019-07-08 10:46:23 -04:00
Armin Braun 2918363e90
Simplify BlobStoreRepository (Flatten Nested Classes) (#42833) (#44060)
* In the current codebase it is hardly obvious what code operates on a shard and is run by a datanode what code operates on the global metadata and is run on master
   * Fixed by adjusting the method names accordingly
* The nested context classes don't add much if any value, they simply spread out the parameters that go into a shard snapshot create or delete all over the place since their
constructors can be inlined in all spots
   * Fixed by flattening the nested classes into BlobStoreRepository
* Also:
  * Inlined the other single use inner classes
2019-07-08 14:57:27 +02:00
Armin Braun afe81fd625
Some Cleanup in Test Framework (#44039) (#44059)
* Remove some obvious dead code
* Move assert methods that were only used in a single test class to the child they belong to
* Inline some redundant methods
2019-07-08 14:15:31 +02:00
David Turner 3f3bcb23c2 AwaitsFix testForceStaleReplicaToBePromotedToPrimary
Relates #44049
2019-07-08 11:26:57 +01:00
David Turner 3129f5b42e Do not copy initial recovery filter during split (#44053)
If an index is the result of a shrink then it will have a value set for
`index.routing.allocation.initial_recovery._id`. If this index is subsequently
split then this value will be copied over, forcing the initial allocation of
the split shards to occur on the node on which the shrink took place. Moreover
if this node no longer exists then the split will fail.  This commit suppresses
the copying of this setting when splitting an index.

Fixes #43955
2019-07-08 10:32:05 +01:00
Armin Braun af9b98e81c
Recursively Delete Unreferenced Index Directories (#42189) (#44051)
* Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes
* Runs after ever delete operation
* Relates  #13159 (fixing most of this issues caused by unreferenced indices, leaving some meta files to be cleaned up only)
2019-07-08 10:55:39 +02:00
Przemyslaw Gomulka 247f2dabad
Fix decimal point parsing for date_optional_time backport(#43859) #44050
Joda allowed for date_optional_time and strict_date_optional_time a decimal point to be . dot or , comma
For our java.time implementation we should also extend this for strict_date_optional_time-nanos
the approach to fix this is the same as in iso8601 parser
closes #43730
2019-07-08 09:56:01 +02:00
Armin Braun f6efc55556
Fix SnapshotResiliencyTest (#44015) (#44041)
* Closes #43989
2019-07-07 19:59:16 +02:00
Armin Braun 990ac4ca83
Some Cleanup in BlobStoreRepository (#43323) (#44043)
* Some Cleanup in BlobStoreRepository

* Extracted from #42833:
  * Dry up index and shard path handling
  * Shorten XContent handling
2019-07-07 19:50:46 +02:00
Nhat Nguyen 9089820d8f Enable indexing optimization using sequence numbers on replicas (#43616)
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.

Relates #34099
2019-07-05 22:12:08 -04:00
Yannick Welsch 504a43d43a Move ConnectionManager to async APIs (#42636)
This commit converts the ConnectionManager's openConnection and connectToNode methods to
async-style. This will allow us to not block threads anymore when opening connections. This PR also
adapts the cluster coordination subsystem to make use of the new async APIs, allowing to remove
some hacks in the test infrastructure that had to account for the previous synchronous nature of the
connection APIs.
2019-07-05 20:40:22 +02:00
Yannick Welsch 88783927d1 Weaken assertion in PublicationTransportHandler (#44014)
These assertions do not hold true when a master fails during publication and quickly becomes
master again, publishing a new cluster state in a higher term which races against the previous
cluster state publication to self (which does not matter anyway).

Relates #43994

Closes #44012
2019-07-05 18:27:42 +02:00